[Flightgear-devel] Translations tools, file formats

James Turner

2017-05-18 15:16:39 UTC

Hi,

I would like to enable full translations of the launcher, which is easy using Qt, but raises some tooling questions. Qt has its own translation system, including a translation GUI tool [1], which would be easy for me to use. But then we would have some strings in one translation system (Qt’s) and others in the existing format.

Hence I wondered about either:

- using our existing translation format within Qt - quite possible, but has the issue that I am not sure our current solution scales well to the number of strings a GUI can use. And there are some features missing from our current system that would be nice to have (but are not blockers)

- using the Qt translation formats (which are well defined and public XML standards) and hence optionally, the translation tooling UI, for all of FG.

(Note for people who know Qt, I’m proposing to use the .TS files here, not worry about running lrelease and using .QM files, since that would imply a runtime Qt dependency which we don’t want)

The second would be my suggestion, because the worst case is basically where we are now (translators edit XML by hand), but translators can use Qt Linguist to detect untranslated strings, access dictionaries and more. In addition, with some build system tooling (and hence on Jenkins), we can generate a report of the untranslated strings for each language, and hence hopefully not miss any before a release.

(Of course this tooling could also be created in a scripting language around our existing XML format, if anyone cared to do so)
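
A report of that kind could be sketched in Python roughly as follows. This is only an illustration, assuming the usual Qt .ts layout (<context> and <message> elements, with unfinished entries carrying type="unfinished" on <translation>); the sample document and the function name untranslated_strings are made up for the example, and plural-form entries are deliberately left alone.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the Qt .ts layout (hypothetical content).
SAMPLE_TS = """
<TS version="2.1" language="fr">
  <context>
    <name>Launcher</name>
    <message>
      <source>Fly!</source>
      <translation>Voler !</translation>
    </message>
    <message>
      <source>Settings</source>
      <translation type="unfinished"></translation>
    </message>
  </context>
</TS>
"""


def untranslated_strings(ts_text):
    """Return (context, source) pairs whose translation is missing or unfinished."""
    root = ET.fromstring(ts_text)
    missing = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            translation = message.find("translation")
            unfinished = (
                translation is None
                or translation.get("type") == "unfinished"
                # Empty text with no child elements; entries with children
                # (plural forms) are not inspected by this sketch.
                or (len(translation) == 0
                    and not (translation.text or "").strip())
            )
            if unfinished:
                missing.append((name, message.findtext("source", default="")))
    return missing


if __name__ == "__main__":
    for context, source in untranslated_strings(SAMPLE_TS):
        print(f"{context}: {source!r} is not translated")
```

Run once per language file from the build, the non-empty results could be collected into the per-language report.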

Any thoughts on this? The amount of work involved is fairly low either way, but migrating strings between formats is slightly tedious, so I would like to agree the end destination before I create translation templates for the launcher.

[1] http://doc.qt.io/qt-5/linguist-translators.html

Kind regards,
James

Florent Rougon

2017-05-19 08:17:32 UTC

Hi James,

Post by James Turner
I would like to enable full translations of the launcher, which is easy using
Qt, but raises some tooling questions. Qt has its own translation system,
including a translation GUI tool [1], which would be easy for me to use. But
then we would have some strings in one translation system (Qt’s) and others in
the existing format.
- using our existing translation format within Qt - quite possible, but has
the issue that I am not sure our current solution scales well to the number of
strings a GUI can use. And there’s some missing features in our current system
that would be nice to have (but not blockers)

Quite true.

Post by James Turner
- using the Qt translation formats (which are well defined and public XML
standards) and hence optionally, the translation tooling UI, for all of FG.
(Note for people who know Qt, I’m proposing to use the .TS files here, not
worry about running lrelease and using .QM files, since that would imply a
runtime Qt dependency which we don’t want)

This seems like a very good idea. But we are going to have to replicate
the rules to handle plural forms, right? Also some way to replace the
%1, %2, etc. placeholders, unless we don't use them, of course---which
would be rather inconvenient.

However, since the XML system doesn't support these features, AFAIK,
just migrating from one system to the other doesn't really require these
to work from day 1 (maybe translators will spontaneously enter plural
forms in Qt Linguist, but I suppose the .ts parser could easily ignore
them: I have a .ts under the eyes, but I don't see any of the plural
forms; maybe they don't appear in this file at all?)

Regards

--
Florent

James Turner

2017-05-22 07:26:41 UTC

Post by Florent Rougon
This seems like a very good idea. But we are going to have to replicate
the rules to handle plural forms, right? Also some way to replace the
%1, %2, etc. placeholders, unless we don't use them, of course---which
would be rather inconvenient.
However, since the XML system doesn't support these features, AFAIK,
just migrating from one system to the other doesn't really require these
to work from day 1 (maybe translators will spontaneously enter plural
forms in Qt Linguist, but I suppose the .ts parser could easily ignore
them: I have a .ts under the eyes, but I don't see any of the plural
forms; maybe they don't appear in this file at all?)

I’m happy to add %1 support to ‘our’ translation API, I will need to check how complex the plural forms support is to see if it’s worth doing in our API.

But, anyway, I shall continue doing the research on the technical side of this, now I have some feeling this solution is acceptable.

Kind regards,
James

James Turner

2017-05-30 14:27:56 UTC

Post by James Turner
I’m happy to add %1 support to ‘our’ translation API, I will need to check how complex the plural forms support is to see if it’s worth doing in our API.
But, anyway, I shall continue doing the research on the technical side of this, now I have some feeling this solution is acceptable.

Update on this: the Qt tools support the standard XLIFF format, which also has a bunch of other open tooling. So I suggest we migrate to XLIFF for the translations in FGData, and can use this to build .qm assets for the launcher (the launcher translations will not be ‘live’ since they will be compiled into the build, but it’s less code to write and more compatible with standard Qt translations).

http://docs.oasis-open.org/xliff/xliff-core/v2.0/os/xliff-core-v2.0-os.html

https://en.wikipedia.org/wiki/XLIFF

(includes a list of translation tools that support XLIFF)

So the routes will be:

- translators author in XLIFF (probably quite a basic subset!)

- we need to convert the current translation files to XLIFF, one-off. Horrible python script time!

- current FG translations are read directly at runtime from XLIFF files in FGData (splash, tooltips, etc)

- this means writing a custom XML parser to hook that into the locale code, but our locale system is very simple. This is the bulk of the development work

- some build infrastructure so at compile time, for the launcher and other Qt-based UI, we do:

- convert XLIFF -> TS (using lupdate I think)
- run ‘lrelease’ as normal to generate the binary QM files
- link and install as normal for any Qt app

And from then on every string can be translated nicely.

For extracting the strings from Qt, I will need to run the other way (.TS -> XLIFF and merge the new entries) which is also supported.

Comments / thoughts on this approach? Offers of help are also appreciated. We can also, if we choose, evolve the FGLocale code to support more features, like the string-placeholder system which XLIFF supports, or adding other kinds of translation data. But let’s add those incrementally!

Kind regards,
James

Florent Rougon

2017-05-31 09:41:31 UTC

Hi,

Post by James Turner
Update on this: the Qt tools support the standard XLIFF format, which
also has a bunch of other open tooling. So I suggest we migrate to
XLIFF for the translations in FGData, and can use this to build .qm
assets for the launcher (the launcher translations will not be ‘live’
since they will be compiled into the build, but it’s less code to
write and more compatible with standard Qt translations)

If sacrificing the “liveness” of non-Qt translation files isn't too
much, then the embedded resources system could be pushed and used to
load compiled-in, possibly-compressed XLIFF files (it's tested and
working). The little advantage is to allow translations to work even
when FG_ROOT isn't set. Disadvantages are: more indirection, build step
needed to see the effect of a new or modified translation file, and if
the files to embed are in FGData instead of the flightgear repo (the
latter seems more appropriate here), then the FG build process needs to
know the path to FGData, normally by passing '-D FG_DATA_DIR:PATH=...'
to CMake.

Post by James Turner
- translators author in XLIFF (probably quite a basic subset!)

Directly with Qt Linguist or with some XLIFF-specific tool? One
important point if using the %1, %2, etc. placeholders, is to have a
tool that checks them in translated strings (to avoid locale-specific
segfaults, probably hard to identify).

Post by James Turner
- we need to convert the current translation files to XLIFF, one-off. Horrible python script time!

In principle, XML to XML conversion shouldn't be horrible. In this case,
there are elements like required “dual parsing” to match translations
with the English strings, some heterogeneity among the input files
(e.g., the tips are more than a simple map or dictionary: they form an
ordered sequence), and there is also the fact that things such as:

main.cxx ->
    fgSplashProgress("loading-nav-dat")

and

splash.cxx ->
    void fgSplashProgress( const char *identifier, unsigned int percent )
    {

      [...]

      std::string id = std::string("splash/") + identifier;
      text = globals->get_locale()->getLocalizedString(id.c_str(), "sys", "");

      [...]

    }

need to be arranged in a way that is not easy to automate at all (maybe
semi-automated processing followed by manual reviewing + fixing will be
best). I suppose that's what you meant.

Also, I see on the wikipedia page you indicated[1] that there are
several versions of the XLIFF format, so better check that all tools
you think will be needed work well with the chosen version (2.0 in the
links you gave).

Another thing: I believe the menu, options, sys, tips (and atc? so far
only available in English locale) separation won't be needed anymore.
All translations for a given locale can go in one XLIFF file (converted
to a .qm file for the launcher), right?

Post by James Turner
- current FG translations are read directly at runtime from XLIFF files in FGData (splash, tooltips, etc)

(or from embedded resources if deemed worth it)

Post by James Turner
- this means writing a custom XML parser to hook that into the locale code, but our locale system is very simple. This is the bulk of the development work

Yep.
[...]

Post by James Turner
For extracting the strings from Qt, I will need to run the other way
(.TS -> XLIFF and merge the new entries) which is also supported.
Comments / thoughts on this approach? Offers of help are also
appreciated. We can also, if we choose, evolve the FGLocale code to
support more features, like the string-placeholder system which XLIFF
supports, or adding other kinds of translation data. But let’s add
those incrementally!

This makes a lot of formats to deal with... and round-trips, but maybe
they can't be avoided. Did you rule out the .ts format because XLIFF is
more standard? And Qt can't work with it without prior conversion to
.qm?

In principle, I would be willing to help, for instance with the
conversion process of current translation files. However, I must say
that the ceiling is not exactly CAVOK here, figuratively speaking, so I
can't promise anything, sorry. :-|

Regards

[1] https://en.wikipedia.org/wiki/XLIFF

--
Florent

James Turner

2017-05-31 09:54:00 UTC

If sacrificing the “liveness” of non-Qt translation files isn't too
much, then the embedded resources system could be pushed and used to
load compiled-in, possibly-compressed XLIFF files (it's tested and
working). The little advantage is to allow translations to work even
when FG_ROOT isn't set. Disadvantages are: more indirection, build step
needed to see the effect of a new or modified translation file, and if
the files to embed are in FGData instead of the flightgear repo (the
latter seems more appropriate here), then the FG build process needs to
know the path to FGData, normally by passing '-D FG_DATA_DIR:PATH=...'
to CMake.

I’m okay with requiring the presence of FGData to enable translations in the build, but I wonder if people like the current ‘live’ workflow for working on translations? Not sure it’s a big deal either way.

The benefit of having help translations available even when FG_DATA is missing, is quite high however.

Post by James Turner
- translators author in XLIFF (probably quite a basic subset!)

Directly with Qt Linguist or with some XLIFF-specific tool? One
important point if using the %1, %2, etc. placeholders, is to have a
tool that checks them in translated strings (to avoid locale-specific
segfaults, probably hard to identify).

I would leave that up to each translation author. Note that missing placeholders don’t cause segfaults: these strings are not passed to vsprintf()-derived functions like sprintf, which would be a huge security risk.

Post by James Turner
- we need to convert the current translation files to XLIFF, one-off. Horrible python script time!

In principle, XML to XML conversion shouldn't be horrible. In this case,
there are elements like required “dual parsing” to match translations
with the English strings, some heterogeneity among the input files
(e.g., the tips are more than a simple map or dictionary: they form an
ordered sequence), and there is also the fact that things such as:
main.cxx ->
fgSplashProgress("loading-nav-dat")
and
splash.cxx ->
void fgSplashProgress( const char *identifier, unsigned int percent )
{
[...]
std::string id = std::string("splash/") + identifier;
text = globals->get_locale()->getLocalizedString(id.c_str(), "sys", "");
[...]
}
need to be arranged in a way that is not easy to automate at all (maybe
semi-automated processing followed by manual reviewing + fixing will be
best). I suppose that's what you meant.

Yep.

Also, I see on the wikipedia page you indicated[1] that there are
several versions of the XLIFF format, so better check that all tools
you think will be needed work well with the chosen version (2.0 in the
links you gave).

Yep

Another thing: I believe the menu, options, sys, tips (and atc? so far
only available in English locale) separation won't be needed anymore.
All translations for a given locale can go in one XLIFF file (converted
to a .qm file for the launcher), right?

That was my guess, but if people would find things more manageable in multiple files, we could still permit it.

This makes a lot of formats to deal with... and round-trips, but maybe
they can't be avoided. Did you rule out the .ts format because XLIFF is
more standard? And Qt can't work with it without prior conversion to
.qm?

Right, Qt can convert to/from it, and I figure people would be happier with an open standard for the main translations. (Plus it has plenty of other tooling besides the Qt ones).

In principle, I would be willing to help, for instance with the
conversion process of current translation files. However, I must say
that the ceiling is not exactly CAVOK here, figuratively speaking, so I
can't promise anything, sorry. :-|

No worries, but certainly I would take the help on the conversion process, if you are able.

This is not an urgent project. I have one other thing I want to look at which will actually touch on it (fixing key-bindings across platforms), since localised descriptions of key-bindings would be /another/ big usability win.

Kind regards,
James

Florent Rougon

2017-05-31 16:23:50 UTC

I’m okay with requiring the presence of FGData to enable translations in the
build, but I wonder if people like the current ‘live’ workflow for working on
translations? Not sure it’s a big deal either way.

Note that the presence of FGData at FG build time would only be required
if:
- the translation files are included as embedding resources (to allow
the translations to work without FG_ROOT set) and;
- said files are left in the FGData repo.

If embedding is used but the files are in the flightgear repo, then
passing "-D FG_DATA_DIR:PATH=..." to FG's CMake command isn't necessary
(my embedded resource compiler, fgrcc, accepts a --root option to
specify the “root directory” for finding resources[1]; if you point to a
directory in the FG source tree, then only this tree is needed).

[1] https://sourceforge.net/u/frougon/flightgear-flightgear/ci/embedded_resources/tree/src/EmbeddedResources/fgrcc.cxx#l750

The current ‘live’ workflow isn't a big deal for developers indeed (esp.
if the Qt strings require to run lrelease & co); it might be if we had
translators who want to test their work before submitting it, but can't
build FG themselves. Difficult to say in the present situation...

However, it is possible to have a switch in FGLocale to enable “live”
behavior precisely for this use case. This could go in the place where
FGLocale receives either an std::string or (nicer) an std::istream
subclass from the EmbeddedResourceManager. The normal code path would
get an std::unique_ptr<std::istream> from
EmbeddedResourceManager::getIStream():

https://sourceforge.net/u/frougon/flightgear-simgear/ci/embedded_resources/tree/simgear/embedded_resources/EmbeddedResourceManager.hxx#l146

This provides contents read from an embedded resource. In “live” mode,
you would use an std::istream obtained by direct opening of the real
XLIFF file instead. Pass this to:

void readXML(std::istream &input, XMLVisitor &visitor,
const std::string &path="");

and you're done.

The benefit of having help translations available even when FG_DATA is
missing, is quite high however.

Okay. I'll post more info about the embedded resources system, then.

I would leave that up to each translation author. Note that missing
placeholders don’t cause segfaults, these strings are not passed to vsprintf()
derived functions like sprintf, that would be a huge security risk.

Regarding the tool, my question was basically: can Qt Linguist edit
XLIFF files directly? (for user-friendliness, and because I have happily
used it with .ts/.qm files in the “past”)

Concerning the safety impact of a translation with bad %n placeholders:
I agree there should be no segfault with a straightforward
implementation of the functionality that QString::arg() provides. So,
indeed, the checks some translation tools implement (at least poedit for
.po files) are mainly useful for early prevention of bugs in such a case.
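
A check of that kind is small enough to sketch here. The following Python is only an illustration of the idea, assuming plain %1, %2, ... placeholders as used with QString::arg(); other Qt forms such as %n or %L1 are not handled, and the function names are invented for the example.

```python
import re

# Matches %1, %2, ... ; other Qt placeholder forms are ignored on purpose.
_PLACEHOLDER = re.compile(r"%(\d+)")


def placeholders(text):
    """Return the set of placeholder numbers used in a string."""
    return {int(number) for number in _PLACEHOLDER.findall(text)}


def check_placeholders(source, translation):
    """Return a list of problems; an empty list means the sets match."""
    src, dst = placeholders(source), placeholders(translation)
    problems = []
    for number in sorted(src - dst):
        problems.append(f"%{number} missing from translation")
    for number in sorted(dst - src):
        problems.append(f"%{number} not present in source")
    return problems


if __name__ == "__main__":
    # Reordering placeholders is fine; dropping one is reported.
    print(check_placeholders("Loading %1 of %2", "Chargement de %2 sur %1"))
    print(check_placeholders("Loading %1 of %2", "Chargement de %1"))
```

Comparing sets rather than sequences is deliberate: a translation may legitimately reorder the placeholders.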

(translated strings are still slightly security-sensitive regardless of
the implementation, in that an intentionally misleading string might
lead users to do something unwanted...)

Post by Florent Rougon
This makes a lot of formats to deal with... and round-trips, but maybe
they can't be avoided. Did you rule out the .ts format because XLIFF is
more standard? And Qt can't work with it without prior conversion to
.qm?

Right, Qt can convert to/from it, and I figure people would be happier with an
open standard for the main translations. (Plus it has plenty of other tooling
besides the Qt ones).

Okay, I didn't know this format before you mentioned it here.

No worries, but certainly I would take the help on the conversion process, if you are able.
This is not an urgent project, I have one other thing I want to look at which
will actually touch on it (fixing key-bindings across platforms) since
localised descriptions of key-bindings would be /another/ big usability win.

Ack, thanks.

Regards

--
Florent

James Turner

2017-05-31 22:30:26 UTC

Post by Florent Rougon
Regarding the tool, my question was basically: can Qt Linguist edit
XLIFF files directly? (for user-friendliness, and because I have happily
used it with .ts/.qm files in the “past”)

That’s one point I need to check for sure - I have spoken to some colleagues since we already used XLIFF for some work, but most of the docs refer to importing and exporting, not live editing.

(All the tools on the wikipedia page edit XLIFF directly, however)

BTW, apparently the version of XLIFF which the Qt translation tools speak is 1.2, not 2.0 as I was linking to in previous messages.

I will make some practical experiments around this, anyway.

Kind regards,
James

Florent Rougon

2017-06-01 07:54:22 UTC

That’s one point I need to check for sure - I have spoken to some colleagues
since we already used XLIFF for some work, but most of the docs refer to
importing and exporting, not live editing.

Well, if it can import and export, I don't really see what would be
missing.

(All the tools on the wikipedia page edit XLIFF directly, however)

Yes, I saw that list now, I just don't know these tools so I have no
opinion on them.

BTW, apparently the version of XLIFF which the Qt translation tools speak is
1.2, not 2.0 as I was linking to in previous messages.

Version 1.2 has the weird 'maxbytes' attribute that, according to the
wikipedia example[1], seems to depend on the encoding used for the XML
document, something like UTF-16 given this excerpt:

<trans-unit id="1" maxbytes="14">
<source xml:lang="en-US">Quetzal</source>
<target xml:lang="ja-JP">Quetzal</target>
</trans-unit>

The example given for XLIFF 2.0 doesn't have this. Not a big deal
though, as long as expat can read it, which I think is the case.
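
The arithmetic behind the UTF-16 guess is easy to check. This small Python snippet only shows that 14 bytes is what "Quetzal" occupies in UTF-16 without a byte order mark; it says nothing about what the XLIFF specification actually requires for maxbytes.

```python
text = "Quetzal"

# Byte length of the same 7-character string in a few encodings
# (the "-le" variants are used so that no byte order mark is added).
sizes = {
    "utf-8": len(text.encode("utf-8")),
    "utf-16-le": len(text.encode("utf-16-le")),
    "utf-32-le": len(text.encode("utf-32-le")),
}

if __name__ == "__main__":
    for encoding, size in sizes.items():
        print(f"{encoding}: {size} bytes")
```

Seven ASCII characters at two bytes each give 14, which matches the maxbytes="14" in the excerpt.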

I will make some practical experiments around this, anyway.

Thanks!

[1] https://en.wikipedia.org/wiki/XLIFF

--
Florent

Florent Rougon

2017-06-22 14:37:43 UTC

[ Replying to
<https://sourceforge.net/p/flightgear/mailman/message/35906901/>
which mostly belongs here ]

Yes, in theory a short string (thinking of typical localized messages of
a program) can be registered and retrieved the same as a README file,
but the infrastructure wasn't designed for this, and I'm not sure it's a
good fit. The locale fallback algorithm is rather simple and can be
copy-pasted to a more specialized infrastructure if that's really the
thing... which I doubt. :)

That being said, I don't see a fundamental problem with this:

- Make it so that we generate suitable XLIFF files (or whatever)
containing the translations with the metadata you want in order to
describe your action strings (action-name, action-description,
action-label-on, action-label-off...).

- Make a special-case in fgrcc or a standalone similar tool that reads
such files and writes a .cxx + .hxx pair of files that registers
each string as a single resource with the EmbeddedResourceManager
with the following metadata:
* a virtual path such as /actions/toggle-thrust-reversers/name
/actions/toggle-thrust-reversers/description
/actions/toggle-thrust-reversers/label-on
/actions/toggle-thrust-reversers/label-off
* the associated locale (nothing particular to do here: same way
as other resources are handled).

- Then it becomes very easy to add the methods you mentioned to the
EmbeddedResourceManager, that fetch the appropriate resource based
on the aforementioned systematic naming scheme. Once the virtual
path is constructed (for instance
/actions/toggle-thrust-reversers/label-on), the
EmbeddedResourceManager's locale-based selection mechanism will
naturally pick the best translation available.
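
To make the naming scheme concrete, here is a rough Python model of the lookup. It is not the EmbeddedResourceManager code: the fallback order (full locale, then language only, then default), the dictionary layout and all names below are assumptions made for the illustration.

```python
# Hypothetical registry: (virtual path, locale) -> string; "" is the default.
RESOURCES = {
    ("/actions/toggle-thrust-reversers/label-on", ""): "Reversers on",
    ("/actions/toggle-thrust-reversers/label-on", "fr"): "Inverseurs activés",
}


def candidate_locales(locale):
    """'fr_FR' -> ['fr_FR', 'fr', '']: most specific first, default last."""
    candidates = [locale] if locale else []
    if "_" in locale:
        candidates.append(locale.split("_", 1)[0])
    candidates.append("")
    return candidates


def lookup(resources, virtual_path, locale):
    """Return the best available translation for a virtual path."""
    for candidate in candidate_locales(locale):
        value = resources.get((virtual_path, candidate))
        if value is not None:
            return value
    raise KeyError(virtual_path)


def action_label_on(resources, action, locale):
    """Build the virtual path from the action name, then look it up."""
    return lookup(resources, f"/actions/{action}/label-on", locale)


if __name__ == "__main__":
    print(action_label_on(RESOURCES, "toggle-thrust-reversers", "fr_FR"))
    print(action_label_on(RESOURCES, "toggle-thrust-reversers", "de_DE"))
```

The convenience methods then reduce to string concatenation plus one generic lookup.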

If things are done this way, the resources are probably best written in
uncompressed form by the fgrcc-or-similar-tool (using
RawEmbeddedResource[1] or a similar class), for FG-runtime speed
reasons, and also because for very small strings, compression makes
things take *more* space (zlib header...). This is because with such a
scheme, each string is available via its own resource.
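
The size argument can be checked with Python's zlib module. The strings below are arbitrary short examples, not actual FlightGear resources, and the helper name is made up.

```python
import zlib


def zlib_overhead(text):
    """Return compressed size minus raw size, in bytes, for a UTF-8 string."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) - len(raw)


if __name__ == "__main__":
    for sample in ("label-on", "Toggle thrust reversers"):
        print(f"{sample!r}: {zlib_overhead(sample):+d} bytes after compression")
```

For strings this short, the zlib header and checksum outweigh anything the compressor can save, so the overhead comes out positive.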

Regards

[1] https://sourceforge.net/p/flightgear/simgear/ci/next/tree/simgear/embedded_resources/EmbeddedResource.hxx#l102

--
Florent

James Turner

2017-06-22 15:30:35 UTC

Yes, in theory a short string (thinking of typical localized messages of
a program) can be registered and retrieved the same as a README file,
but the infrastructure wasn't designed for this, and I'm not sure it's a
good fit. The locale fallback algorithm is rather simple and can be
copy-pasted to a more specialized infrastructure if that's really the
thing... which I doubt. :)

Sorry, I think this entire discussion has got a bit confused due to two poorly chosen words by me.

I’m not talking about the embedded resource system at all, I meant the code in FGLocale.

Which already has support for categories such as ‘tip’, ‘menu’ and ‘help’.

The issue would be generating the list of translations in a more automated way, probably a Python script which uses the existing Python module we have for parsing PropertyList-XML files, to find a list of translation tokens in a hard-coded set of XML files.

Is that clearer?

Kind regards,
James

Florent Rougon

2017-06-22 19:54:43 UTC

Post by James Turner
Sorry, I think this entire discussion has got a bit confused due to two poorly
chosen words by me.

[...]

Post by James Turner
The issue would be generating the list of translations in a more automated
way, probably a Python script which uses the existing Python module we have
for parsing PropertyList-XML files, to find a list of translation tokens in a
hard-coded set of XML files.
Is that clearer?

Right, I got confused because of the “your API”.

Extracting things from XML with Python is easy. What's less easy from my
POV is to do something useful with the extracted strings here, i.e.
merge them usefully into the (not yet) existing translation files (XLIFF
in your proposal). With gettext, and probably other systems, if you have
for instance “shut down engines” translated in a few languages, and later
the string is changed to “shut down the engines”, then the string is
marked as “fuzzy” and translators get to see this along with their
outdated translation in poedit. This is quite convenient, esp. for
largish strings that get small incremental changes over time.

Also, even if we didn't do this and only *added* a new translated string
every time our tool sees a string that is not in the existing XLIFF,
then the XLIFF files would grow, and grow, and grow... And newly-added
translations (when a new language is added) would cause translators to
translate the unused strings too. To mark obsolete strings (which
gettext does), the string extraction tool must have a global view of all
source files at once (and of course, the .pot file in the case of
gettext, which contains just the English strings with metadata, but no
translation).

In order to have this global view, I see two ways:

(a) Files which need custom string extraction are handled specially,
i.e. there would be a pool of XLIFF files containing only
translations for FG XML stuff, and a separate pool for strings
extracted from .cxx/.hxx files. The custom Python script would
read the XLIFF file(s) from the first pool and the FG XML files in
the same run to determine which strings are obsolete. No easy way
that I can see to handle fuzzy strings, unless there is
third-party support for that.

(b) Have only one pool of XLIFF files for FG's .cxx and .hxx files as
well as FG XML files. The advantage is that the build
infrastructure would be simpler as well as the translation
workflow (only one translation file to send for updates). The main
disadvantage is that I believe we would have to write our own
string extraction tool for FG XML as in (a) *and also* for
.cxx/hxx, which is doable but seems a bit non-trivial (depending
on how reliable you want the C++ parser to be...).

Maybe there is support in existing XLIFF tools to submit new strings and
see if they correspond to already-existing strings in a given XLIFF
file (= detect fuzzy strings), and possibly automatically merge the
submitted strings. I don't know that, I've never used XLIFF.
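
For what it's worth, a crude version of such a merge step can be sketched with Python's difflib. This is not how gettext decides what is fuzzy; the similarity cutoff, the function name and the sample strings are assumptions made for the illustration.

```python
import difflib


def classify(previous, current, cutoff=0.6):
    """Compare old template strings with freshly extracted ones.

    Returns (new, obsolete, fuzzy), where fuzzy maps each changed string
    to the old string it most resembles.
    """
    previous, current = set(previous), set(current)
    added = current - previous
    removed = previous - current
    fuzzy = {}
    for text in sorted(added):
        close = difflib.get_close_matches(text, sorted(removed), n=1,
                                          cutoff=cutoff)
        if close:
            fuzzy[text] = close[0]
    new = sorted(added - set(fuzzy))
    obsolete = sorted(removed - set(fuzzy.values()))
    return new, obsolete, fuzzy


if __name__ == "__main__":
    old = ["shut down engines", "Quit"]
    extracted = ["shut down the engines", "Open map"]
    print(classify(old, extracted))
```

Note that this needs the full list of currently used strings in one run, which is the global view discussed above.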

Regards

--
Florent

James Turner

2017-06-22 20:04:53 UTC

Post by Florent Rougon
(a) Files which need custom string extraction are handled specially,
i.e. there would be a pool of XLIFF files containing only
translations for FG XML stuff, and a separate pool for strings
extracted from .cxx/.hxx files. The custom Python script would
read the XLIFF file(s) from the first pool and the FG XML files in
the same run to determine which strings are obsolete. No easy way
that I can see to handle fuzzy strings, unless there is
third-party support for that.

My presumption is that we extract strings in two ways:

- from Qt files (CXX or QML) using the tr() / qsTr mechanism, and then via lupdate <-> XLIFF

- from XML files (which are a set fixed at build time, basically contents of FG_DATA/translations/default or en) using some Python

This will represent all the strings to be translated in a given version of FG.
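
As a rough idea of the XML side, the extraction could look like the Python below. It uses the standard library's ElementTree rather than our existing PropertyList module, and the sample document is made up; repeated element names (as in the tips, which form an ordered sequence) would need extra handling that this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical input in a PropertyList-like layout.
SAMPLE_XML = """
<PropertyList>
  <splash>
    <loading-nav-dat>Loading navigation data</loading-nav-dat>
  </splash>
  <fg-root-desc>Specify the root data path</fg-root-desc>
</PropertyList>
"""


def extract_strings(xml_text):
    """Map slash-separated ids (e.g. 'splash/loading-nav-dat') to leaf text."""
    root = ET.fromstring(xml_text)
    found = {}

    def walk(element, prefix):
        for child in element:
            path = f"{prefix}/{child.tag}" if prefix else child.tag
            if len(child):
                # Intermediate node: descend and extend the id.
                walk(child, path)
            elif child.text and child.text.strip():
                found[path] = child.text.strip()

    walk(root, "")
    return found


if __name__ == "__main__":
    for string_id, text in extract_strings(SAMPLE_XML).items():
        print(f"{string_id}: {text}")
```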

Am I missing something here which makes this impossible?

Kind regards,
James

Florent Rougon

2017-06-23 07:15:32 UTC

Post by James Turner
- from Qt files (CXX or QML) using the tr() / qsTr mechanism, and then
via lupdate <-> XLIFF

Okay, this doesn't cover the C++ non-Qt case. If I understand right, at
least basic use of tr() (by “basic use”, I mean excluding plural forms)
could be implemented on non-Qt builds by conditionally #defining tr(),
when Qt isn't detected, to invoke the appropriate FGLocale method that
would load translations from XLIFF (possibly via an embedded resource)?

Post by James Turner
- from XML files (which are a set fixed at build time, basically
contents of FG_DATA/translations/default or en) using some Python
This will represent all the strings to be translated in a given version of FG.

My main questions are:
- What are you going to do precisely with the strings extracted from
FG XML files?
- Do you intend to put them in their own pool of XLIFF files, or in
the same that was obtained above with lupdate <-> XLIFF?
- If the latter, how do you detect obsolete strings (those that aren't
used anymore and thus don't need to be translated)?

Regards

--
Florent

James Turner

2017-06-23 08:02:01 UTC

Post by Florent Rougon
Okay, this doesn't cover the C++ non-Qt case. If I understand right, at
least basic use of tr() (by “basic use”, I mean excluding plural forms)
could be implemented on non-Qt builds by conditionally #defining tr(),
when Qt isn't detected, to invoke the appropriate FGLocale method that
would load translations from XLIFF (possibly via an embedded resource)?

Correct, but, do we need the C++ non-Qt use case? My guess / hope is we don’t, for the sake of a simple life.

Post by Florent Rougon

Post by James Turner
- from XML files (which are a set fixed at build time, basically
contents of FG_DATA/translations/default or en) using some Python
This will represent all the strings to be translated in a given version of FG.

- What are you going to do precisely with the strings extracted from
FG XML files?
- Do you intend to put them in their own pool of XLIFF files, or in
the same that was obtained above with lupdate <-> XLIFF?
- If the latter, how do you detect obsolete strings (those that aren't
used anymore and thus don't need to be translated)?

A separate pool of XLIFFs: I can imagine:

Launcher.ts -> Launcher.xliff (from lupdate)
Gui.ts -> Gui.xliff (from lupdate)

FG_DATA/Locale/default/menu.xml -> menu.xliff
FG_DATA/Locale/default/tips -> tips.xliff

.. and so on ..

If we /want/ to add ‘non-Qt C++ extraction’ to this, we can, I just don’t think it’s needed out of the box.

Again I think the above steps can be run from any Git checkout (of FG + FGData) and generate an exact list of the strings to be translated corresponding to those commits. So no problem to detect obsolete strings. Synchronising the XLIFF files for each language (or combined one) based on the above templates is a standard job for the translation GUIs, eg Qt Linguist.

Kind regards,
James

Florent Rougon

2017-06-23 10:29:14 UTC

Correct, but, do we need the C++ non-Qt use case? My guess / hope is we don’t,
for the sake of a simple life.

Maybe, but unless Qt becomes a mandatory dep, there will still be code
that currently uses translations in non-Qt code paths, e.g., the code
that benefits from options.xml. So, how will it be handled?

If it's “as it is now, except that translations will be looked up from
XLIFF files”, it means we still need a master options.xml file for the
English strings that has to be manually synchronized with the C++ code
using it (not a big problem, and already the case). So, this file allows
a Python script to find all such strings and thus detect obsolete ones
in the previously-generated XLIFF template file (I imagine like
gettext's .pot: containing only English strings). No detection of fuzzy
strings unless there is third-party support, but we can live with that.
The C++ code would be using string ids (such as 'fg-root-desc' in
options.xml) which would have to be carried over to XLIFF by the script.
XLIFF 1.2 at least has attributes for this:

http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#id

but we'll have to be careful about scoping rules to avoid id collisions
between e.g. options.xml and menu.xml. XLIFF has a notion of <file> that
could be used for this if well supported by other tools we need such as
Linguist (cf.
<http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#SectionStructure>).
Otherwise, it should be possible to output specific XLIFF files for
options, menu, sys, etc. as you proposed below, but this may be more
inconvenient for translators. There is also the possibility of using
well-defined prefixes in XLIFF id attributes to simulate scoping within
a single <file> element.
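
The prefix idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_xliff, the "category/id" separator, the "original" attribute value and the "help" sample ids are all hypothetical and not taken from any existing FlightGear script. It writes a single XLIFF 1.2 <file> element and avoids collisions by prefixing each id with its category.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(strings_by_category, target_language):
    """Return an XLIFF 1.2 element with one <file> and category-prefixed ids.

    strings_by_category maps a category name ("options", "menu", ...) to a
    dict of {string id: English source text}.
    """
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_elt = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": "FlightGear",          # hypothetical label
        "source-language": "en",
        "target-language": target_language,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_elt, f"{{{XLIFF_NS}}}body")
    for category in sorted(strings_by_category):
        for string_id, text in sorted(strings_by_category[category].items()):
            # The category prefix simulates scoping inside a single <file>.
            unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit",
                                 {"id": f"{category}/{string_id}"})
            ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return xliff


# The same id ("help", hypothetical) in two categories no longer collides.
doc = build_xliff({"options": {"help": "Show help"},
                   "menu": {"help": "Help"}}, "fr")
print(ET.tostring(doc, encoding="unicode"))
```

Using one <file> element per category instead would only change where the category name is stored; the lookup key seen by the C++ side could stay the same.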

OTOH, if the FG code currently relying on options.xml is changed to have
stuff like tr("Specify the root data path"), then there is the problem
of string extraction and how to determine obsolete strings. I don't
think you meant this, since you talked about using separate pools of
XLIFFs.

Launcher.ts -> Launcher.xliff (from lupdate)
Gui.ts -> Gui.xliff (from lupdate)
FG_DATA/Locale/default/menu.xml -> menu.xliff
FG_DATA/Locale/default/tips -> tips.xliff
.. and so on ..
If we /want/ to add ‘non-Qt C++ extraction’ to this, we can, I just don’t
think it’s needed out of the box
Again I think the above steps can be run from any Git checkout (of FG +
FGData) and generate an exact list of the strings to be translated
corresponding to those commits. So no problem to detect obsolete strings.
Synchronising the XLIFF files for each language (or combined one) based on the
above templates is a standard job for the translation GUIs, eg Qt Linguist.

I was confused because you said the script to convert from FG XML to
XLIFF would only be used once. But I think there are two scripts,
actually:
- one to read non-English FG XML translation files and write
corresponding XLIFF, this one only used once;
- one to read the English (master) FG XML files such as options.xml,
and write the corresponding template XLIFF file(s). This one would
be used every time we need to update the translation files.

Did I describe the thing correctly?
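
For illustration, the second (recurring) script could look roughly like the sketch below. It assumes a flat master file in which every child of the root element is one translatable string, and it numbers same-named siblings in document order; both points are assumptions, as are the function names and the "old-desc" sample id.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_master_strings(xml_text):
    """Yield (tag, index, text) for each direct child of the root element."""
    root = ET.fromstring(xml_text)
    seen = Counter()
    for child in root:
        index = seen[child.tag]        # position among same-named siblings
        seen[child.tag] += 1
        yield child.tag, index, (child.text or "").strip()


def obsolete_ids(template_ids, master_xml_text):
    """Return ids found in an existing template but gone from the master."""
    current = {(tag, index)
               for tag, index, _ in extract_master_strings(master_xml_text)}
    return sorted(set(template_ids) - current)


# Hypothetical master file content, modelled on options.xml.
MASTER = """<PropertyList>
  <fg-root-desc>Specify the root data path</fg-root-desc>
</PropertyList>"""

print(list(extract_master_strings(MASTER)))
print(obsolete_ids([("fg-root-desc", 0), ("old-desc", 0)], MASTER))
```

Because the master file is the only input, the same run works from any checkout and the obsolete list always matches that commit.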

Regards

--
Florent

James Turner

2017-06-24 07:27:20 UTC

If it's “as it is now, except that translations will be looked up from
XLIFF files”, it means we still need a master options.xml file for the
English strings that has to be manually synchronized with the C++ code
using it (not a big problem, and already the case). So, this file allows
a Python script to find all such strings and thus detect obsolete ones
in the previously-generated XLIFF template file (I imagine like
gettext's .pot: containing only English strings). No detection of fuzzy
strings unless there is third-party support, but we can live with that.
The C++ code would be using string ids (such as 'fg-root-desc' in
options.xml) which would have to be carried over to XLIFF by the script.
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#id
but we'll have to be careful about scoping rules to avoid id collisions
between e.g. options.xml and menu.xml. XLIFF has a notion of <file> that
could be used for this if well supported by other tools we need such as
Linguist (cf.
<http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#SectionStructure>).
Otherwise, it should be possible to output specific XLIFF files for
options, menu, sys, etc. as you proposed below, but this may be more
inconvenient for translators. There is also the possibility of using
well-defined prefixes in XLIFF id attributes to simulate scoping within
a single <file> element.

Yes this is basically what I was imagining: we use the current string-ids as the lookup keys for XLIFF.

This is actually a common technique with Qt tr() as well: i.e writing tr(“open-file-menu-item”) instead of tr(“Open file…”); - it ensures you don’t have any untranslated files in the UI.

And yeah I think XLIFF has sufficient scoping rules to make this work, or we can use separate files initially - which is how we’ve been working all along so far.

Kind regards,
James

James Turner

2017-06-24 07:38:20 UTC

Post by James Turner
untranslated files in the UI.

‘untranslated strings’, of course

James Turner

2017-06-24 08:40:32 UTC

Post by Florent Rougon
I was confused because you said the script to convert from FG XML to
XLIFF would only be used once. But I think there are two scripts,
- one to read non-English FG XML translation files and write
corresponding XLIFF, this one only used once;
- one to read the English (master) FG XML files such as options.xml,
and write the corresponding template XLIFF file(s). This one would
be used every time we need to update the translation files.
Did I describe the thing correctly?

Missed this bit, yes this is my expectation of how things work, except the ‘master’ won’t be in English necessarily, but use translation IDs. We need to decide, which is better - the old argument for using english was having meaningful strings even in FG_DATA was lost, but embedded resources make the point irrelevant.

Kind regards,
James

Florent Rougon

2017-06-24 09:25:35 UTC

Post by James Turner
This is actually a common technique with Qt tr() as well: i.e writing
tr(“open-file-menu-item”) instead of tr(“Open file…”); - it ensures you don’t
have any untranslated files in the UI.

Ah, didn't know that...

Post by James Turner
Missed this bit, yes this is my expectation of how things work, except the
‘master’ won’t be in English necessarily, but use translation IDs. We need to
decide, which is better - the old argument for using english was having
meaningful strings even in FG_DATA was lost, but embedded resources make the
point irrelevant.

Agreed. I don't have strong feelings, but I have the impression that
keeping the current

Translations/en/menu.xml
Translations/en/options.xml
etc.

in $FG_ROOT as the master files will be more convenient than treating
English on the same level as other translations. Otherwise, adding a new
translated string will require to update the English XLIFF file in two
ways:

1) run the script that grabs ids from the master FG XML file and
removes (or marks as such) obsolete strings;
2) update the English translation (XLIFF) for the new id.

(+ rebuild FG of course, but it's necessary either way)

Regards

--
Florent

James Turner

2017-06-24 17:25:18 UTC

Post by Florent Rougon
Agreed. I don't have strong feelings, but I have the impression that
keeping the current
Translations/en/menu.xml
Translations/en/options.xml
etc.
in $FG_ROOT as the master files will be more convenient than treating
English on the same level as other translations.

Yes, well it’s also called ‘default’ in locale.xml. Perhaps we should rename Translations/en to Translations/default? And leave someone free to make an en_CA or en_UK to fix strange American spelling :)

But I agree that’s a good way to make it clear those files are special.

Kind regards,
James

Florent Rougon

2017-06-24 19:56:40 UTC

Yes, well it’s also called ‘default’ in locale.xml. Perhaps we should rename
Translations/en to Translations/default? And leave someone free to make an
en_CA or en_UK to fix strange American spelling :)

AFAIK, the "default" name in locale.xml isn't used except for people
having a very strange locale setup (LANG=default...). So, TTBOMK, it is
only decorative for now---the code always uses the first node as default
locale:

FGLocale::FGLocale(SGPropertyNode* root) :
_intl(root->getNode("/sim/intl",0, true)),
_defaultLocale(_intl->getChild("locale",0, true))
{
}

That being said, in the context of this discussion with English (en_US)
being treated specially, I agree it would make sense to rename "en" to
"default" for the sake of clarity.

Regards

--
Florent

James Turner

2017-06-25 09:56:23 UTC

Post by Florent Rougon
That being said, in the context of this discussion with English (en_US)
being treated specially, I agree it would make sense to rename "en" to
"default" for the sake of clarity.

Yes, this was my feeling.

The action/keybinding stuff is close to being committable - will aim to get it merged this week, since then I’m on vacation for two weeks. Once that’s done we can start working on the translation changes.

Kind regards,
James

James Turner

2017-06-26 15:27:42 UTC

Post by James Turner
The action/keybinding stuff is close to being committable - will aim to get it merged this week, since then I’m on vacation for two weeks. Once that’s done we can start working on the translation changes.

Actually re-thinking this - I am nervous about how many cases there are to test, so I want to add some automated test coverage, and I doubt I can get this done in a week. Rather than push it with some uncertainty and have people fighting weird key-binding issues for two weeks, I’d rather hold off. If I magically get very productive in the next few days and get the tests written, and they all pass first time, I will merge it, but otherwise, it will happen in August.

(If that does happen, I will also immediately go and buy some lottery tickets, since luck like that does not happen often….)

Kind regards,
James

Florent Rougon

2017-06-26 16:45:30 UTC

Post by James Turner
Actually re-thinking this - I am nervous about how many cases there are to
test, so I want to add some automated test coverage, and I doubt I can get
this done in a week. Rather than push it with some uncertainty and have people
fighting weird key-binding issues for two weeks, I’d rather hold off. If I
magically get very productive in the next few days and get the tests written,
and they all pass first time, I will merge it, but otherwise, it will happen
in August.

Okay, no problem for me with this...

Post by James Turner
(If that does happen, I will also immediately go and buy some lottery tickets,
since luck like that does not happen often….)

;)

--
Florent

Florent Rougon

2017-07-13 14:10:11 UTC

Hi,

Hoping you are having nice holidays. :)

The script in branch
<https://sourceforge.net/u/frougon/flightgear-fgmeta/ci/i18n-work/tree/>
(translation-infrastructure/convert-translation-files.py) converts all
FG XML translation files[1] to some XLIFF 1.2 that Qt Linguist can work
with (one .xlf file per locale, and one <file> element per .xlf file,
see the comments in the file):

http://imgur.com/a/fWcRW

I call the script this way (this is after renaming Translations/en to
Translations/default as you proposed here):

cd fgdata/Translations && \
fgmeta/translation-infrastructure/convert-translation-files.py \
--output-format=xliff --output-dir=wherever-you-want \
--master-dir=default \
de es fr it nl pl pt zh_CN

Options can be seen with 'convert-translation-files.py --help'.

Regards

[1] Except for default/atc.xml which is not translated, because it
probably doesn't make much sense this way due to phraseology being
too different in various areas around the world.

--
Florent

James Turner

2017-07-17 16:55:23 UTC

Post by Florent Rougon
The script in branch
<https://sourceforge.net/u/frougon/flightgear-fgmeta/ci/i18n-work/tree/>
(translation-infrastructure/convert-translation-files.py) converts all
FG XML translation files[1] to some XLIFF 1.2 that Qt Linguist can work
with (one .xlf file per locale, and one <file> element per .xlf file,
see the comments in the file):
http://imgur.com/a/fWcRW
I call the script this way (this is after renaming Translations/en to
Translations/default as you proposed here):
cd fgdata/Translations && \
fgmeta/translation-infrastructure/convert-translation-files.py \
--output-format=xliff --output-dir=wherever-you-want \
--master-dir=default \
de es fr it nl pl pt zh_CN
Options can be seen with 'convert-translation-files.py --help'.

Excellent, thanks!

I need to finish off the keybinding + action changes and will look at translations immediately afterwards.

Kind regards,
James

Florent Rougon

2017-07-18 14:30:44 UTC

This post might be inappropriate. Click to display it.

James Turner

2017-07-18 14:43:37 UTC

Post by Florent Rougon
I've pushed a change this morning (still in
- Most of the code is now available as a library (Python modules inside
the flightgear.meta Python package. It might be a good idea in the
future to lay out all the Python 3 FG meta code using a similar
structure: this would make it possible to use all the available Python
code together if needed: this l10n stuff, PropertyList files parsing,
whatever).

Yes, I would emphatically suggest we do that, I am only a moderate user of Python 3 so please point out any dumb things I’m doing.

Post by Florent Rougon
- More OOP to allow clean extensibility, in particular to deal with
FlightGear XML localization files that have different structures
depending on the category. This was already needed, but so trivial
(sys.xml having everything inside a <splash> element) that a simple
hack did the job. The current structure makes this cleaner: for each
category that needs different parsing rules, one can define a subclass
of flightgear.meta.i18n.L10NResourceManagerBase that knows how to
parse resource files of said category. This will allow one to have,

To be honest I would rather we change the C++ so the XML structure is consistent, I certainly don’t think we should encourage arbitrary XML structures. But, I am glad you did this work.

Post by Florent Rougon
I got a request in private about translating dialogs generated from
property tree subtrees, e.g. boolean props that could (?) have 'label'
and 'tooltip' attributes on the respective property nodes to describe
a dialog with checkboxes; by using values such as
dialogs/draw-masks:widgets:render-sunlight:label:0 or
dialogs/draw-masks:widgets:render-sunlight:tooltip:0 for such
attributes (see below), the FG C++ code could easily get translations
for said labels and tooltips from a
Translations/<langCode>/dialogs.xml file.

Personally I expect this to change with Qt-based dialogs so I would hold off, but it sounds pretty simple to do for the existing dialogs so go ahead if you wish. We have the usual issue that PUI can’t render non ASCII glyphs however, so I really question the value of translation support there.

Post by Florent Rougon
- The two currently-used structures for FlightGear XML localization
files are handled by the classes
flightgear.meta.i18n.BasicL10NResourceManager (for all categories but
'sys') and flightgear.meta.i18n.SysL10NResourceManager (for 'sys').
- To support this, flightgear.meta.i18n.AbstractTranslationUnitId can be
subclassed in order to provide nice Python id objects for translation
units, that contain all the needed metadata to describe where in a
complex hierarchy a given translation unit occurs. For instance, the
above example could lead to XLIFF ids (obtained from the corresponding
Python id objects) such as
dialogs/draw-masks:widgets:render-sunlight:label:0 (:0 being for the
PropertyNode index, as already the case). This will allow C++ code to
look up translated strings in the XLIFF files based on the structure
in the corresponding FlightGear XML localization files (dialogs.xml in
the example).
- Output XLIFF <trans-unit> 'id' attributes now look like
menu/aircraft-checklists:0 instead of menu:aircraft-checklists:0
(easier to read with a "deep" structure as in the above example).
- code to update Translations/<langCode>/<cat>.xml from
* strings in the latter but not in the former must be added with
an empty translation and approved="no" (new strings);
* strings in the former but not in the latter must be marked with
translate="no" (obsolete or vanished in Linguist-speak);
* strings present in both but where the sourceText differs must be
marked with approved="no" (= needs review by translators).
- currently, the produced XLIFF files only contain <trans-unit>
elements for strings that have a translation in the corresponding
Translations/<langCode>/<cat>.xml file. So, it's not possible yet to
finish an incomplete translation. However, all that is needed to fix
this is to apply the code from the previous point, once it's done,
to the translations produced in the current state (and probably all
strings will have to be marked with approved="no", because many are
probably out of sync with the present default translation).
- a command/option or a separate script to produce skeleton XLIFF
files for new languages (trivial: this is the default (= master)
Translation instance with only the targetLanguage and masterTransl
attributes properly set).
If all goes well, I'll add these missing pieces in the next days...
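
The three update rules quoted above can be sketched as follows. The quote is truncated, so this reads "the former" as the existing translation and "the latter" as the current master; that reading is an assumption. The Unit class and merge_with_master function are hypothetical names, not the API of flightgear.meta.i18n, and the sample ids are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    source: str              # English (master) text
    target: str = ""         # translated text, empty if missing
    approved: bool = False   # would map to approved="yes"/"no"
    translate: bool = True   # would map to translate="yes"/"no"


def merge_with_master(translation, master):
    """Return an updated copy of `translation` following the three rules.

    `translation` maps unit ids to Unit objects; `master` maps unit ids to
    the current English source text.
    """
    merged = {}
    for uid, source in master.items():
        old = translation.get(uid)
        if old is None:
            # New string: empty translation, not approved.
            merged[uid] = Unit(source=source)
        elif old.source != source:
            # Source text changed: keep the target but ask for a review.
            merged[uid] = replace(old, source=source, approved=False,
                                  translate=True)
        else:
            merged[uid] = replace(old, translate=True)
    for uid, old in translation.items():
        if uid not in master:
            # Obsolete string: kept, but flagged as not to be translated.
            merged[uid] = replace(old, translate=False)
    return merged


old = {
    "menu/a:0": Unit("Open", "Ouvrir", approved=True),
    "menu/b:0": Unit("Quit", "Quitter", approved=True),
    "menu/c:0": Unit("Gone", "Parti", approved=True),
}
master = {"menu/a:0": "Open", "menu/b:0": "Exit", "menu/d:0": "New"}
for uid, unit in sorted(merge_with_master(old, master).items()):
    print(uid, unit)
```

Running the same merge on a freshly converted translation would also add the strings that were never translated, which is the gap described in the quoted list.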

All sounds good to me, thanks!

Kind regards,
James

Florent Rougon

2017-07-18 15:49:24 UTC

Post by James Turner
Yes, I would emphatically suggest we do that, I am only a moderate user of
Python 3 so please point out any dumb things I’m doing.

Well, I haven't really read FG Python code from others so far. If there
is a particular module, maybe...

Post by James Turner
To be honest I would rather we change the C++ so the XML structure is
consistent, I certainly don’t think we should encourage arbitrary XML
structures. But, I am glad you did this work.

Okay, this didn't make things really more complex, just a matter of
using the right abstraction. So with what you said, the code doesn't need to
change (and if */'sys.xml' get flattened, then the tiny
SysL10NResourceManager class could even be removed).

Post by James Turner
Personally I expect this to change with Qt-based dialogs so I would personally
hold off, but it sounds pretty simple to do for the existing dialogs so go
ahead if you wish. We have the usual issue that PUI can’t render non ASCII
glyphs however, so I really question the value of translation support there.

Okay, I'm not going on any adventure here, I just made sure the Python
side could accommodate stuff like that. While it is possible to
implement translation unit ids with more metadata than currently used by
subclassing AbstractTranslationUnitId, we don't *have* to: the current
needs are well served by BasicTranslationUnitId, which basically contains:
- a category name (sys, menu, options...);
- an id (e.g., enable-mouse-pointer-desc: XML tag names in current FG
XML l10n files);
- an index (PropertyNode index needed when there are several sibling
nodes with the same tag name);
- logic to make this pleasant to use: essentially string conversions,
comparison special methods allowing standard sorting of said ids,
and __hash__() to use them in mappings.
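
As a rough illustration of the four points above, an id object of this kind could be written as a frozen dataclass. This is a hypothetical stand-in (the class name UnitId and its parse method are invented here), not the actual BasicTranslationUnitId code; the string form follows the menu/aircraft-checklists:0 pattern mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class UnitId:
    """Hypothetical stand-in for a basic translation unit id."""
    category: str    # e.g. "menu", "options", "sys"
    name: str        # XML tag name, e.g. "aircraft-checklists"
    index: int = 0   # PropertyNode index among same-named siblings

    def __str__(self):
        return f"{self.category}/{self.name}:{self.index}"

    @classmethod
    def parse(cls, text):
        """Inverse of str(): 'menu/aircraft-checklists:0' -> UnitId."""
        category, rest = text.split("/", 1)
        name, index = rest.rsplit(":", 1)
        return cls(category, name, int(index))


uid = UnitId.parse("menu/aircraft-checklists:0")
print(uid, uid == UnitId("menu", "aircraft-checklists"))
```

frozen=True together with the generated __eq__ makes instances hashable, and order=True supplies the comparison methods, so such ids sort by category, then name, then index and can be used as dictionary keys.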

Thanks for your feedback!

Regards

--
Florent

Alessandro Menti

2017-05-19 20:57:17 UTC

Hi James,
I would also choose the second alternative (Qt translation formats) to
rely (mostly) on existing tools.

Post by James Turner
The second would be my suggestion, because the worst case is basically where we are now (translators edit XML by hand) but can use Qt Linguist to detect untranslated strings, access dictionaries and more. In addition with some build system tooling (and hence on Jenkins) we can generate a translation report of which untranslated strings exist for each language, and hence hopefully not miss any before a release.
(Of course this tooling could also be created in a scripting language around our existing XML format, if anyone cared to do so)

There are a couple of online services that provide those features, e.g.
Transifex [1] or Crowdin [2], via a Web UI. Both are free for open
source projects and have command-line tools for pulling/pushing
translations automatically, which could be integrated into Jenkins;
while the former is - in my opinion - more popular, the second supports
more document formats.

Cheers,
Alessandro Menti

[1] https://www.transifex.com/
[2] https://crowdin.com/

James Turner

2017-05-22 07:25:07 UTC

Post by Alessandro Menti
I would also choose the second alternative (Qt translation formats) to
rely (mostly) on existing tools.

Post by James Turner
The second would be my suggestion, because the worst case is basically where we are now (translators edit XML by hand) but can use Qt Linguist to detect untranslated strings, access dictionaries and more. In addition with some build system tooling (and hence on Jenkins) we can generate a translation report of which untranslated strings exist for each language, and hence hopefully not miss any before a release.
(Of course this tooling could also be created in a scripting language around our existing XML format, if anyone cared to do so)

There are a couple of online services that provide those features, e.g.
Transifex [1] or Crowdin [2], via a Web UI. Both are free for open
source projects and have command-line tools for pulling/pushing
translations automatically, which could be integrated into Jenkins;
while the former is - in my opinion - more popular, the second supports
more document formats.

Thanks for the info on those, will take a look!

Kind regards,
James
