Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translation - scripts, the final part #154

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

marmarek
Copy link
Member

@marmarek marmarek commented Mar 31, 2021

This is the final part of the translation integration. Previous two:
#153
QubesOS/qubes-doc#1123

Similar to previous parts, the majority of work was done by @maiska and @tokideveloper. Thanks!

On top of their work, this PR integrates all those scripts with Gitlab CI pipeline. It handles pushing/pulling content to/from Transifex automatically. I have performed the first iteration of this, to populate https://github.com/QubesOS/qubes-translated repository with initial content (we did have some content translated before, so indeed some (small) parts are translated, mostly in Spanish language). For the test purposes, I've enabled languages es, de, fr, but this commit should not be merged until substantial part of the documentation there is translated (right now most of those pages are simply copies from English).

See README for high level description of how it works.

The resulting website, with translations enabled can be seen at https://wwwtest.qubes-os.org/

@marmarek
Copy link
Member Author

testdeploy

@qubesos-bot qubesos-bot temporarily deployed to qa March 31, 2021 22:08 Inactive
@tokideveloper
Copy link
Contributor

Great work! Thanks to @maiska and @marmarek!

BTW, the line

{% assign langmenu = (site.languages.size > 1) %}

doesn't seem to work as expected. At least, I see the lang menu with the English entry only. The following should work (without having tested it):

{% if site.languages.size > 1 %}
  {% assign langmenu = true %}
{% else %}
  {% assign langmenu = false %}
{% endif %}

It's a Liquid thing …

@marmarek
Copy link
Member Author

marmarek commented Apr 1, 2021

{% assign langmenu = (site.languages.size > 1) %}

It works, just ignore wwwpreview instance - it got built with empty _translated dir... See wwwtest instance, linked in the PR description.

@tokideveloper
Copy link
Contributor

{% assign langmenu = (site.languages.size > 1) %}

It works, just ignore wwwpreview instance - it got built with empty _translated dir... See wwwtest instance, linked in the PR description.

I see multiple langs in the menu in the wwwtest instance, so, it seems to work as you said.

However, I tried the following mini examples right now:

Example 1

{% assign a = (10 > 5) %}
{{ a }}

outputs nothing. Also, I got a syntax error warning.

Example 2

{% assign a = 10 > 5 %}
{{ a }}

outputs 10. Also, I got a syntax error warning.

Example 3

{% if 10 > 5 %}
  {% assign a = true %}
{% else %}
  {% assign a = false %}
{% endif %}
{{ a }}

outputs true. No warning.

Example 4

{% if (10 > 5) %}
  {% assign a = true %}
{% else %}
  {% assign a = false %}
{% endif %}
{{ a }}

outputs false. Also, I got a syntax error warning.

Do I have a different Liquid version?
What's the output of {{ langmenu }}?

Sorry for the noise. I just want to avoid errors in the long run.

@marmarek
Copy link
Member Author

marmarek commented Apr 2, 2021

You are correct, it worked only because langmenu was assigned 4. Thanks!


echo "============================ post processing step 2 ======================================"
#read b
ruby _utils/_translation_utils/merge_md_heading_ids.rb "$1" /tmp/tx-mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for the merge_md_heading_ids.rb script, called here, it is important that the checked out commit is not the most recent one but that commit that was the current one when the files have been pushed to Transifex the last time. The reason is that headlines may have changed or added or removed meanwhile and thus, the merge_md_heading_ids.rb script may produce a wrong output.

I can't see any attempt to checkout that correct commit or have I overseen something?

IMHO when pushing files to Transifex, the related commit hash should be saved. When pulling the files from Transifex, that related (old) commit should be checked out for the postprocessing steps.

Is it understandable what I mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, now I see. Both pushing and pulling belong to the same commit and thus everything is right. Thank you for clarification!

@marmarek
Copy link
Member Author

I've rebased it on top of recent version and added few improvement discussed in the meantime (including @tokideveloper 's updated script).
This PR includes also #177

@marmarek marmarek force-pushed the translation-scripts branch 2 times, most recently from cfdcf4e to e8d7029 Compare June 23, 2021 03:47
@marmarek
Copy link
Member Author

I'll simplify it further when rebasing on top of #178 (when it will be merged).

@marmarek
Copy link
Member Author

PipelineRefresh

@marmarek marmarek force-pushed the translation-scripts branch 2 times, most recently from fe3a31c to 095179a Compare June 24, 2021 12:14
@marmarek
Copy link
Member Author

PipelineRetry

@marmarek
Copy link
Member Author

PipelineRetry

@marmarek
Copy link
Member Author

PipelineRetry

Remove translated files when the source file get removed. This includes
also renaming files.
Do not attempt to verify correctness of the website before refreshing
all languages. If any structural change happened (rename, permalink
change etc), it will fail because of the old files, not the freshly
downloaded ones.

Make htmlproofer postprocessing script language agnostic. This is mostly
about removing "language" parameter, which wasn't used anyway.
@andrewdavidwong
Copy link
Member

Just to confirm... you're not waiting or expecting me to do anything with this, right @marmarek? I've been intentionally leaving this PR alone because you seem like you're managing it yourself (and of course the big "[DO NOT MERGE]" warning).

@marmarek
Copy link
Member Author

testdeploy

@qubesos-bot qubesos-bot temporarily deployed to qa June 24, 2021 17:30 Inactive
@marmarek
Copy link
Member Author

trestdeploy

@marmarek
Copy link
Member Author

testdeploy

@qubesos-bot qubesos-bot temporarily deployed to qa June 24, 2021 17:32 Inactive
@marmarek
Copy link
Member Author

You are correct. I want to get it working again first. It's almost there. For example QubesOS/qubes-translated@a102a9e was just made. There is still some issue with language switcher (just "en" visible there).

As for the "[DO NOT MERGE]" commits - it's just about that commit - it enables the translations to be visible for some languages (es, fr, de) - which I think we want only when sufficient part will be translated already.

@tokideveloper
Copy link
Contributor

You are correct. I want to get it working again first. It's almost there. For example QubesOS/qubes-translated@a102a9e was just made. There is still some issue with language switcher (just "en" visible there).

Currently (if language switcher enabled at all), for a certain page P, the language switcher only shows those languages where translations exist for P in qubes-translated. Should it be different?

@marmarek
Copy link
Member Author

Currently (if language switcher enabled at all), for a certain page P, the language switcher only shows those languages where translations exist for P in qubes-translated.

I think this design is fine (even if may look weird at places). The issue is I see only "en" on pages I manually verified the "translated" version exists in the qubes-translated repo. For example documentation index, or developer/building/qubes-builder.md.

@tokideveloper
Copy link
Contributor

tokideveloper commented Jun 24, 2021

Currently (if language switcher enabled at all), for a certain page P, the language switcher only shows those languages where translations exist for P in qubes-translated.

I think this design is fine (even if may look weird at places). The issue is I see only "en" on pages I manually verified the "translated" version exists in the qubes-translated repo. For example documentation index, or developer/building/qubes-builder.md.

That's strange because I tested it on my machine a few minutes ago and it works. Okay, actually I only tested two files: _doc/doc.md and _doc/user/how-to-guides/how-to-copy-and-paste-text.md. And actually, I just copied the two original files from qubesos.github.io/_doc to the newly created _qubes-translated directory (with underscore), set lang and permalink appropriately and translated title test-wise. That was sufficient to work: On these two pages, the switcher contains both en and de, on the other pages not.

EDIT: Of course, I enabled the switcher in _includes/header.html via {% assign langmenu = true %} in line 14.

@marmarek
Copy link
Member Author

testdeploy

@qubesos-bot qubesos-bot temporarily deployed to qa June 24, 2021 21:42 Inactive
@marmarek
Copy link
Member Author

Ok, that was not very smart from me. "testdeploy" command took the version without downloaded translations 🤦 Now it is better :)

@tokideveloper
Copy link
Contributor

Now it is better :)

It's a lot better! It looks great! 🎉 ❤️

@marmarek
Copy link
Member Author

https://wwwpreview.qubes-os.org/de/team/#inquiries - this doesn't look properly handled, especially "Business" should be "Geschäft" (it is translated in the qubes-translated repo). This is because _includes/team.html has:

  {% for team in site.data.team %}
    {% if team.type == "inquiries" %}
...

I can easily change that to appropriate entry from site.translated, but then team.type may not match "inquiries" if it gets translated. And in fact team.type core is translated to Kern in German version. I think team.type entry should not be translated, as it's just an internal metadata for the layout - the actual section titles are in team-page.yml file.

Some ideas:

  • mark team.type entry as non translatable (locked?)
  • don't translate team.yml file at all, but move translatable strings into team-page.yml ("name" in the inquiries section, "role" in others)

I'm not sure how to do either of those.

@marmarek
Copy link
Member Author

Anyway, now is the time to review both the content of this very PR, but more importantly its output at wwwpreview.

As for the scripts in _utils/_translation_utils, some extra review would be useful, but not strictly necessary - those were mostly written by @maiska and @tokideveloper and already reviewed (and sometimes modified) by me.

@tokideveloper
Copy link
Contributor

https://wwwpreview.qubes-os.org/de/team/#inquiries - this doesn't look properly handled, especially "Business" should be "Geschäft" (it is translated in the qubes-translated repo). This is because _includes/team.html has:

  {% for team in site.data.team %}
    {% if team.type == "inquiries" %}
...

I can easily change that to appropriate entry from site.translated, but then team.type may not match "inquiries" if it gets translated. And in fact team.type core is translated to Kern in German version. I think team.type entry should not be translated, as it's just an internal metadata for the layout - the actual section titles are in team-page.yml file.

Some ideas:

  • mark team.type entry as non translatable (locked?)

Yes! Maybe other entries need to be locked, too.

  • don't translate team.yml file at all, but move translatable strings into team-page.yml ("name" in the inquiries section, "role" in others)

No. I like the current design where the team itself has its own data file.

I'm not sure how to do either of those.

Me neither. @maiska, can you describe how to lock certain strings? Maybe via the API?

@tokideveloper
Copy link
Contributor

Anyway, now is the time to review both the content of this very PR, but more importantly its output at wwwpreview.

I agree. First, the output should be good. Afterwards, we can review the code.

As for the scripts in _utils/_translation_utils, some extra review would be useful, but not strictly necessary - those were mostly written by @maiska and @tokideveloper and already reviewed (and sometimes modified) by me.

In the past, I reviewed maiska's files and they seemed to be good. But I can do another review again.

@tokideveloper
Copy link
Contributor

  • mark team.type entry as non translatable (locked?)

Yes! Maybe other entries need to be locked, too.

The smart tag "locked" means "nobody can touch it" and the smart tag "notranslate" means in our case "replace with source string". We should use both. See https://docs.transifex.com/projects/preventing-resource-edits#locking-strings

Me neither. @maiska, can you describe how to lock certain strings? Maybe via the API?

See _utils/_translation_utils/tag_strings_as_locked.py. You may need to load the diff manually.

Copy link
Contributor

@tokideveloper tokideveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed all files. At least, I tried it and tried to understand the files.

@@ -0,0 +1 @@
current counter: 251
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that this value is correct once the PR is merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll rerun the script shortly after merging again. And generally, once it lands in master, the script will be called frequently.

- de
- fr
- es

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would start with en only for now. This way, we could test the workflow on the live system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the plan. Adding those extra languages is a part of the "[DO NOT MERGE] enable languages: de fr es" commit.

result = merge_ids_in_gfm_files(original_lines, translated_lines)
write_lines(result, item)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this Python script isn't needed anymore since it's not called and it has been ported to a Ruby script (which is called). See _utils/_translation_utils/merge_md_heading_ids.rb.

EDIT: Also, this Python version is not up-to-date.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I'll remove it.

# headline starting with '#'
gfm_lines[line_number] = '# ' + placeholder + "\n"
hl = look_for_headline(render(gfm_lines), placeholder)
elsif next_line.length >= 3 and (line_only_made_of(next_line, '=') or line_only_made_of(next_line, '-'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since next_line contains a \n at the end, the check above must be next_line.length >= 3 + 1 instead of next_line.length >= 3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.


mapping = create_dict_from_tx_config(ARGV[0], ARGV[1])
mapping.each do |key, value|
if !key.end_with?(".yml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there could be more files on Transifex in the future (e.g. HTML, PO etc), the check should check for accepted file endings like *.md: if key.end_with?(".md")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@tokideveloper
Copy link
Contributor

What I forget in the review: Some scripts run recursively through directories like _doc but don't ignore .git. To be corrected.

Copy link
Contributor

@tokideveloper tokideveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this review, I marked the spots where .git is not ignored.

Sometimes there seems to be no need to ignore .git since the root directory of such a walk is a repo subdirectory (not containing any .git directory). However, to be on the safe side, .git directories should be ignored in general.

mapping = {}
perms = []
yml_files = []
for dirname, subdirlist, filelist in walk(translated_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This walk doesn't ignore .git.


href = set()
reg ='(?<=href=\").*?(?=\")'
for dirname, subdirlist, filelist in walk(translated_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This walk doesn't ignore .git.

"""

perms = set()
for dirname, subdirlist, filelist in walk(translated_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This walk doesn't ignore .git.


return counter

for dir_name, subdir_list, file_list in os.walk(root_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This walk doesn't ignore .git.

@jermanuts
Copy link

Any progress?

@tokideveloper
Copy link
Contributor

Any progress?

This PR focuses on internationalization of the Qubes OS website including the documentation, using Markdown and Jekyll.

Meanwhile, we moved away from that approach towards a conversion of the documentation from Markdown to reStructuredText, render it using Sphinx and host it on ReadTheDocs. This is our (new) next step.

See QubesOS/qubes-doc#1367

The next steps afterwards would be internationalization and localization of the documentation and later the Qubes OS website.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants