Incomplete translations show up in search results #2327

Open
mgeisler opened this issue Aug 28, 2024 · 6 comments

@mgeisler
Collaborator

I've been hesitant to generate and publish HTML for half-done translations since there is a risk that search engines get confused by the duplicate content. A half-done translation will result in lots and lots of pages with English content. We emit a correct <html lang="xx"> tag, so I guess it should be fine, but it's something to be aware of.

Originally posted by @mgeisler in #2189 (comment)

@mgeisler
Collaborator Author

I just noticed that this is starting to happen. I searched for "async rust comprehensive" because I wanted to find one of our examples, and the top result points to the Bengali translation:

image

My browser should tell Google that I can read English, Danish, and German, but I still got this result: the <html lang="bn"> tag at the top of the page has not been a strong enough signal here.

We could consider adding a noindex tag to very incomplete translations?

Cc @qwandor, @henrif75, @djmitche.

@henrif75
Collaborator

Maybe we could add a robots.txt to the directory of unfinished translations to avoid crawling? In that case, we would have to add a parameter to tell which ones are good for publishing.

@mgeisler
Collaborator Author

> Maybe we could add a robots.txt to the directory of unfinished translations to avoid crawling? In that case, we would have to add a parameter to tell which ones are good for publishing.

Yes, we could do something like that! We cannot use robots.txt since it has to be in the root of the domain (last I checked), but we can inject a noindex meta tag via the theme somehow.
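
To illustrate the point about robots.txt living at the root of the domain, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `example.github.io` and the `/book/bn/` path are assumptions made up for illustration, not the project's real layout. The rules only take effect if this file is served from the root of the host, which a project site in a subdirectory cannot do by itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they would have to appear in a robots.txt served
# from the root of the domain. The "/book/bn/" path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /book/bn/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    # The incomplete translation is blocked, the other pages are not.
    print(is_crawlable("https://example.github.io/book/bn/index.html"))
    print(is_crawlable("https://example.github.io/book/index.html"))
```

A rule like this would need one `Disallow` line per unfinished translation, so the list of publishable languages would still have to be maintained somewhere.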

@mgeisler
Collaborator Author

> In that case, we would have to add a parameter to tell which ones are good for publishing.

Yeah, good point: we should probably just follow the same principle as for when we include translations in the language picker.

@henrif75 henrif75 pinned this issue Sep 17, 2024
@henrif75 henrif75 unpinned this issue Sep 17, 2024
@djmitche
Collaborator

https://stackoverflow.com/questions/32784322/stopping-index-of-github-pages

  • Can use a user page to put robots.txt at the root of the domain. I think that would be https://github.com/google/google?
  • <meta name="robots" content="noindex"> may do the trick - maybe we just add that to incomplete translations?

@henrif75 henrif75 self-assigned this Sep 18, 2024
@mgeisler
Collaborator Author

>   • <meta name="robots" content="noindex"> may do the trick - maybe we just add that to incomplete translations?

Yes, I believe this is the correct solution for us!
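
As a rough sketch of what adding the tag to incomplete translations could look like, the snippet below post-processes generated HTML and inserts the tag right after the opening `<head>`. This is not how the project's theme works. The `PUBLISHED_LANGUAGES` set and the function names are hypothetical, and the set stands in for whatever rule decides which translations appear in the language picker.

```python
import re

NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Hypothetical: languages considered complete enough to be indexed.
# In practice this would follow the same rule as the language picker.
PUBLISHED_LANGUAGES = {"en", "da"}

# Matches the opening <head> tag (with optional attributes), not <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_noindex(html: str) -> str:
    """Insert the noindex tag after the opening <head> tag, at most once."""
    if NOINDEX_TAG in html:
        return html  # already tagged, keep the operation idempotent
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head> found, leave the page untouched
    return html[: match.end()] + "\n" + NOINDEX_TAG + html[match.end():]


def render_for_language(html: str, lang: str) -> str:
    """Tag pages of languages that are not in the published set."""
    if lang in PUBLISHED_LANGUAGES:
        return html
    return add_noindex(html)
```

Calling `render_for_language(page, "bn")` on a page with a `<head>` would return the page with the tag added, while a language in the published set is returned unchanged.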
