i18n/l10n summary

Jean-Christophe Helary

2017-05-28 05:29:01 UTC

The discussion so far seems to point at modifying 'message' and the like so that developers don't have to bother with any l10n mechanism on their part (besides writing clean strings).

====================================================

My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non-default preferred language (the default is currently English) is detected.
Yes but from what I've seen in package.el, a lot of translatable texts are not displayed with "message". Some use "error", some use other mechanisms.

Internally, they all boil down to a small set of C functions, which is
where we should make these changes.
====================================================

Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...

Jean-Christophe

Drew Adams

2017-05-28 14:27:10 UTC

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

(Caveat: I haven't been following this thread at all, so
ignore if not helpful.)

Is the idea that something will be done so that, for example,
`message' automatically uses a translation of the message to
the user's currently preferred language? I.e., if that is
what is planned, isn't it perhaps too systematic - all or
nothing?

If so, then perhaps there should be a way to easily, from
Lisp, specify the target language explicitly - e.g. by an
optional `message' argument or (better, because the scope
is controllable without changing the `message' calls) by
binding a variable.

This, as opposed to automatically and always just making
`message' target whatever language is currently being
used/preferred by the user.

And in that case, there should perhaps be a user option
that overrides such a language choice by Lisp code. IOW,
in general, let code control the language for a given
`message' call or for a given scope (by a variable), but
let a user customize Emacs to say whether to allow this.
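
The variable-binding idea can be sketched roughly as follows. This is only an illustration: every name here (`my-l10n-language`, `my-l10n-catalog`, `my-l10n-translate`) is hypothetical, nothing like it exists in Emacs, and the catalog is a toy alist rather than a real message catalog.

```elisp
;; Hypothetical sketch: none of these names exist in Emacs.
(defvar my-l10n-language nil
  "Hypothetical target language for messages; nil means untranslated.")

(defvar my-l10n-catalog
  '(("fr" . (("File saved" . "Fichier enregistré"))))
  "Hypothetical catalog: LANGUAGE -> alist of (ORIGINAL . TRANSLATION).")

(defun my-l10n-translate (string)
  "Return the translation of STRING for `my-l10n-language', or STRING itself."
  (let ((entries (cdr (assoc my-l10n-language my-l10n-catalog))))
    (or (cdr (assoc string entries)) string)))

;; Binding the variable limits the language choice to one dynamic scope,
;; without touching the calls made inside that scope.
(let ((my-l10n-language "fr"))
  (message "%s" (my-l10n-translate "File saved")))

;; Outside the binding the original string comes back unchanged.
(message "%s" (my-l10n-translate "File saved"))
```

Because the variable is declared with `defvar`, the `let` binding is dynamic, so any code called within its scope sees the chosen language; a user option could then decide whether such bindings are honoured at all.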

Jean-Christophe Helary

2017-05-28 14:36:01 UTC

Post by Drew Adams

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

(Caveat: I haven't been following this thread at all, so
ignore if not helpful.)
Is the idea that something will be done so that, for example,
`message' automatically uses a translation of the message to
the user's currently preferred language? I.e., if that is
what is planned, isn't it perhaps too systematic - all or
nothing?

The idea is to have the messages displayed according to the *specified* language.

How the language is specified has been considered out of the scope of this discussion, if I'm not wrong. But that's the idea.

Basically, there are 2 possible states:
1) Emacs new install → the language is the OS default
2) User setting within Emacs (allows for normal use and for l10n testing).

Jean-Christophe

Eli Zaretskii

2017-05-28 15:33:01 UTC

Post by Drew Adams
Is the idea that something will be done so that, for example,
`message' automatically uses a translation of the message to
the user's currently preferred language? I.e., if that is
what is planned, isn't it perhaps too systematic - all or
nothing?
If so, then perhaps there should be a way to easily, from
Lisp, specify the target language explicitly - e.g. by an
optional `message' argument or (better, because the scope
is controllable without changing the `message' calls) by
binding a variable.

(let ((current-language-environment "FOO"))
  (message "FOOBAR"))

Post by Drew Adams
And in that case, there should perhaps be a user option
that overrides such a language choice by Lisp code.

M-x set-language-environment RET

Jean-Christophe Helary

2017-06-05 12:55:08 UTC

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

Just as a reminder, we'll need to update all the texi files so that they include:
@documentlanguage
@documentencoding

Jean-Christophe

Jean-Christophe Helary

2017-07-17 23:22:07 UTC

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

Just as a reminder, we'll need to update all the texi files so that they include:
@documentlanguage
@documentencoding

Is it ok to proceed with that?

Jean-Christophe

Jean-Christophe Helary

2017-07-22 12:48:29 UTC

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

@documentlanguage
@documentencoding

Is it ok to proceed with that ?

I don't think there was a reply to that so I guess that's a yes. I'm going to proceed and send a patch here and then wait for comments.

Jean-Christophe

Eli Zaretskii

2017-07-22 13:06:19 UTC

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

@documentlanguage
@documentencoding

Is it ok to proceed with that ?

I don't think there was a reply to that so I guess that's a yes.

Actually, I didn't reply because I didn't understand what was the
question. Can you remind me?

Jean-Christophe Helary

2017-07-22 13:45:41 UTC

Post by Eli Zaretskii

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and
the likes so that developers don't have to bother with any l10n
mechanism on their part (besides for writing clean strings).

@documentlanguage
@documentencoding

Is it ok to proceed with that ?

I don't think there was a reply to that so I guess that's a yes.

Actually, I didn't reply because I didn't understand what was the
question. Can you remind me?

It was just about making sure that all the files (in fact only docstyle.texi) included "@documentlanguage en_US" and "@documentencoding UTF-8".

After checking the files I realized that all had "@include docstyle.texi" which already had "@documentencoding UTF-8", so I just added "@documentlanguage en_US" there.

Let me know if there is a problem with that.

Jean-Christophe

Eli Zaretskii

2017-07-22 14:08:54 UTC

en_US is the default in the absence of an explicit @documentlanguage,
so I'm not sure I understand why would we need to add it. It will
change nothing, AFAIK.

Jean-Christophe Helary

2017-07-22 23:54:36 UTC

Post by Eli Zaretskii

so I'm not sure I understand why would we need to add it. It will
change nothing, AFAIK.

When po4a extracts translatable text to create the pot files, docstyle.texi will also have a pot file that includes "@documentlanguage en_US", which will allow translators to change it to "@documentlanguage fr_FR" etc. If we don't add that to docstyle.texi, somebody will have to add the language string manually, and that's an extra task that we'll have to check.

Jean-Christophe

Eli Zaretskii

2017-07-23 14:39:23 UTC

Post by Eli Zaretskii
so I'm not sure I understand why would we need to add it. It will
change nothing, AFAIK.

Sorry, I don't think I follow. Does po4a understand Texinfo in
general and the @documentlanguage directive in particular? If it
does, why doesn't it also know that en_US is the default when no such
directive is present? And if it doesn't understand Texinfo, why would
the presence of @documentlanguage change anything in its output? And
where would someone need to add the language string manually -- in
what file?

Thanks.

Jean-Christophe Helary

2017-07-23 23:29:37 UTC

Post by Eli Zaretskii

Post by Eli Zaretskii
so I'm not sure I understand why would we need to add it. It will
change nothing, AFAIK.

Sorry, I don't think I follow. Does po4a understand Texinfo in

No, it doesn't. It just extracts text. po4a is the tool that has been developed to allow people who are familiar with po to work with documentation files. It just round-trips the document to and from the po format and is not at all as smart as gettext.

For example, the current docstyle.texi is converted to a docstyle.texi.fr.po whose contents are:

================================
# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2017-07-24 08:11+0900\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#. type: Plain text
#: docstyle.texi:6
msgid "@documentencoding UTF-8 @documentlanguage en_US"
msgstr ""
================================

It is arguable that we could have something better like:

#. type: documentencoding
#: docstyle.texi:2
msgid "UTF-8"
msgstr ""

#. type: documentlanguage
#: docstyle.texi:3
msgid "en_US"
msgstr ""

but that's a separate discussion.

Post by Eli Zaretskii
And if it doesn't understand Texinfo, why would
the presence of @documentlanguage change anything in its output? And
where would someone need to add the language string manually -- in
what file?

If the directive were not present, somebody would need to manually add a "@documentlanguage [appropriate language]" directive to the docstyle.texi file of that language.

Jean-Christophe

Eli Zaretskii

2017-07-24 14:47:16 UTC

Post by Eli Zaretskii
And where would someone need to add the language string manually -- in
what file?

OK, thanks. But AFAIU, po4a can be told to add text to the translated
document; couldn't we simply insert @documentlanguage in the
translated manual that way, without expecting the original manuals to
add this redundant directive to the English original?

Jean-Christophe Helary

2017-07-24 15:34:54 UTC

Post by Eli Zaretskii

Post by Eli Zaretskii
And where would someone need to add the language string manually -- in
what file?

OK, thanks. But AFAIU, po4a can be told to add text to the translated
document; couldn't we simply insert @documentlanguage in the
translated manual that way, without expecting the original manuals to
add this redundant directive to the English original?

I was not aware of this option. After checking the po4a manual, here is what it says:

https://po4a.alioth.debian.org/man/man7/po4a.7.php
"HOWTO add extra text to translations (like translator's name)?"

Because of the gettext approach, doing this becomes more difficult in po4a than it was when simply editing a new file along the original one. But it remains possible, thanks to the so-called addenda.
It may help the comprehension to consider addenda as a sort of patches applied to the localized document after processing. They are rather different from the usual patches (they have only one line of context, which can embed Perl regular expression, and they can only add new text without removing any), but the functionalities are the same.
Their goal is to allow the translator to add extra content to the document which is not translated from the original document. The most common usage is to add a section about the translation itself, listing contributors and explaining how to report bug against the translation.
An addendum must be provided as a separate file. The first line constitutes a header indicating where in the produced document they should be placed. The rest of the addendum file will be added verbatim at the determined position of the resulting document.
The header has a pretty rigid syntax: It must begin with the string PO4A-HEADER:, followed by a semi-colon (;) separated list of key=value fields. White spaces ARE important. Note that you cannot use the semi-colon char (;) in the value, and that quoting it doesn't help.

So we'd have to create and maintain as many files as there are translations, instead of just adding one directive to one texi file and being done with it.

Jean-Christophe

Eli Zaretskii

2017-07-24 15:51:01 UTC

Post by Jean-Christophe Helary
So we'd have to create and maintain as many files as there are translations

No, because the file can be auto-generated: its contents is simple and
known in advance as soon as the target language is specified.

Anyway, I think we should defer adding this directive until we have
the first translation of the first manual. It sounds wrong to me to
make preparations without having even a single client to benefit from
it.

Jean-Christophe Helary

2017-07-24 16:08:29 UTC

Post by Eli Zaretskii

Post by Jean-Christophe Helary
So we'd have to create and maintain as many files as there are translations

No, because the file can be auto-generated: its contents is simple and
known in advance as soon as the target language is specified.

But that's only a trick for one specific case. How is that different from the code I modified in package.el?
I've just opened tutorial.el and found that we have functions there that do not assume that English is the default. Why not generalize that?

Post by Eli Zaretskii
Anyway, I think we should defer adding this directive until we have
the first translation of the first manual. It sounds wrong to me to
make preparations without having even a single client to benefit from
it.

What I did on package.el was similar. Except for a few obvious errors, the new strings only benefit potential translators that we'll only have when the elisp strings are exposed for translation which is even further down the localization road...

Also, I *am* working on the French translation of the documentation, and that's how I found out about this directive. If you want to see the French manual before considering the directive, I'm fine with that, but I honestly see that as a punishment rather than as an incentive to contribute.

Jean-Christophe

Eli Zaretskii

2017-07-24 16:29:55 UTC

Post by Jean-Christophe Helary
Also, I *am* working on the French translation of the documentation, and that's how I found out about this directive. If you want to see the French manual before considering the directive, I'm fine with that, but I honestly see that as a punishment rather than as an incentive to contribute.

It's not a punishment, believe me, just a precaution. I've seen too
many cases where some project was started, infrastructure added in
preparation for it, then the project stalled, and we were left with
the infrastructure no one uses. What's worse, after enough time has
passed no one even remembers why we added that.

I'm guessing you are working in a local branch, is that true? (If
not, you should, because manuals are being changed all the time.)
Then just add the directive in your branch, and it will be merged when
your work is brought into master. Would that be okay with you?

Jean-Christophe Helary

2017-07-24 16:48:24 UTC

Post by Eli Zaretskii
I'm guessing you are working in a local branch, is that true? (If
not, you should, because manuals are being changed all the time.)

I'm regularly building po files from the updated texi set and moving all that to an OmegaT project that's not in the repository to avoid any git management. There is a bug in po4a that I reported that keeps it from creating the po file for org mode but I guess Martin Quinson is working on it...

Post by Eli Zaretskii
Then just add the directive in your branch, and it will be merged when
your work is brought into master. Would that be okay with you?

Perfect.

On a side note, I don't see how my French translation can be brought into master since we don't have a location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does not exist at the moment... But I guess that's something we'll discuss when we have things to add there...

Jean-Christophe

Eli Zaretskii

2017-07-24 16:55:13 UTC

Post by Jean-Christophe Helary
On a side note, I don't see how my French translation can be brought into master since we don't have a
location for translated texi file sets... For that we'd need to have a doc/en/... doc/fr/... structure, which does
not exist at the moment... But I guess that's something we'll discuss when we have things to add there...

We can discuss this later, indeed, but up front I don't see why we'd
need a separate directory, since I think the file name(s) will need to
be different anyway. Texinfo doesn't gracefully handle different
manuals that go under the same name, it wasn't designed to solve that
problem. So it's best to avoid the issue to begin with, and have,
say, emacs-fr.info, emacs-jp.info, etc.

Drew Adams

2017-05-28 21:52:47 UTC

Post by Drew Adams
If so, then perhaps there should be a way to easily, from
Lisp, specify the target language explicitly ... by
binding a variable.

(let ((current-language-environment "FOO"))
  (message "FOOBAR"))

Post by Drew Adams
And...a user option that overrides such a language choice
by Lisp code.

M-x set-language-environment RET

Great. Thx.

Philipp Stephani

2017-05-31 22:18:11 UTC

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and the likes
so that developers don't have to bother with any l10n mechanism on their
part (besides for writing clean strings).
====================================================
On May 27, 2017, at 10:52, Jean-Christophe Helary wrote:
My very uninformed idea is that we need an independent function that
handles the preferred language check and the catalog parsing based on a
key, and all the string displaying functions (message etc) would be
redefined to call that function when a non-default preferred language
(the default is currently English) is detected.
Yes but from what I've seen in package.el, a lot of translatable texts are
not displayed with "message". Some
use "error", some use other mechanisms.
Internally, they all boil down to a small set of C functions, which is
where we should make these changes.
====================================================
Since it's C, I'm not going to be able to contribute to that before I
understand the language, and the function definitions. I guess it's time I
open that K&R that's been on my shelves forever...

One small aspect would be to implement field numbers for `format' so that
argument indices can be explicitly specified. That is probably quite
important because the word order is different between languages, but it's
also useful in other situations (e.g. when repeating an argument). I've
implemented this in the attached patch.
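
To illustrate why field numbers matter for translation, here is a small sketch. It assumes an Emacs in which this feature is available (Emacs 26 or later); the French template is an illustrative translation, not taken from any real catalog.

```elisp
;; English template: arguments are consumed in order.
(format "Copied %s to %s" "a.txt" "b.txt")
;; => "Copied a.txt to b.txt"

;; Hypothetical translated template that mentions the destination first.
;; %2$s and %1$s select arguments by position, so the call site and the
;; argument order stay exactly the same.
(format "Vers %2$s, copie de %1$s" "a.txt" "b.txt")
;; => "Vers b.txt, copie de a.txt"
```

Only the template string changes between languages, which is what lets a translator reorder the sentence without touching the Lisp code.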

Jean-Christophe Helary

2017-05-31 22:29:02 UTC

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
====================================================

My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non-default preferred language (the default is currently English) is detected.
Yes but from what I've seen in package.el, a lot of translatable texts are not displayed with "message". Some use "error", some use other mechanisms.

Internally, they all boil down to a small set of C functions, which is
where we should make these changes.
====================================================
Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...

One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch.

That's really not a *small* aspect. Thank you for thinking about that. I'm in no position to review your work but I suppose others will.

Jean-Christophe

Paul Eggert

2017-06-01 05:18:08 UTC

Thanks for that patch: it's a good move forward for i18n. Some suggestions:

* Today I fixed the bug with "%%" and the 'error' function, so there's no need
for a FIXME or a workaround any more.

* In strings.texi, reorder the format spec description so that it matches the
textual order of a format spec. This should lessen confusion.

* Allow field numbers in a %% spec. All other components of a format spec are
allowed in %%, so odd to report an error for just field numbers.

* There is no need for a special diagnostic for field numbers greater than
PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers use.
This avoids a need for an alloca.

* Reword "Invalid field number `0'" to "Invalid format field number 0" to make
it more obvious that it's a format and there's no need to quote the 0.

Proposed further patch attached (it addresses the above points), along with a
copy of your patch rebased to current master for convenience.

Philipp Stephani

2017-06-01 08:17:37 UTC

Post by Paul Eggert
* Today I fixed the bug with "%%" and the 'error' function, so there's no need
for a FIXME or a workaround any more.
* In strings.texi, reorder the format spec description so that it matches the
textual order of a format spec. This should lessen confusion.
* Allow field numbers in a %% spec. All other components of a format spec are
allowed in %%, so odd to report an error for just field numbers.

The reason I banned that initially is that the behavior for the case "%1$%
%d" is confusing: will the %d take argument 1 or 2? (We should ban such
mixing instead, see below.)

Post by Paul Eggert
* There is no need for a special diagnostic for field numbers greater than
PTRDIFF_MAX. Just use the same diagnostic other too-large field numbers use.
This avoids a need for an alloca.
* Reword "Invalid field number `0'" to "Invalid format field number 0" to make
it more obvious that it's a format and there's no need to quote the 0.
Proposed further patch attached (it addresses the above points), along with a
copy of your patch rebased to current master for convenience.

Thanks, feel free to push.

Two further things:
- Probably there's a bug lurking because the info[n] ought to be indexed by
specification index, not argument index. Something like (format "%1$c %1$d"
?a) will probably do the wrong thing (untested).
- We should ban mixing explicit and implicit field numbers, like POSIX
printf(3) does. The gain from allowing to mix is negligible, and it makes
the implementation and the documentation needlessly complex.

Paul Eggert

2017-06-01 23:20:55 UTC

Post by Philipp Stephani
- Probably there's a bug lurking because the info[n] ought to be
indexed by specification index, not argument index. Something like
(format "%1$c %1$d" ?a) will probably do the wrong thing (untested).

Sorry, I'm not following. That call returns "a 97"; isn't that the
expected result?

Post by Philipp Stephani
- We should ban mixing explicit and implicit field numbers, like POSIX
printf(3) does. The gain from allowing to mix is negligible, and it
makes the implementation and the documentation needlessly complex.

Sounds good, and I installed the attached.

The 1st patch fixes a performance regression introduced by calling
strtoumax. I went whole-hog and removed all calls to strtoumax, since
they're all performance-significant, plus it makes for one less porting
issue to worry about.

The 2nd patch fixes the documentation along the lines that you
suggested. And on further thought, the tradition for Emacs is to
document supported behavior and not worry about slowing Emacs down to
check for undocumented usage (aside from preventing crashes), so with
that in mind the 2nd patch removes the check for %0$ (which never crashes).

Philipp Stephani

2017-06-02 06:52:42 UTC

Post by Paul Eggert

Post by Philipp Stephani
- Probably there's a bug lurking because the info[n] ought to be
indexed by specification index, not argument index. Something like
(format "%1$c %1$d" ?a) will probably do the wrong thing (untested).

Sorry, I'm not following. That call returns "a 97"; isn't that the
expected result?

Wrong example, try (format "%1$c %1$s" ?±)

Post by Paul Eggert
And on further thought, the tradition for Emacs is to
document supported behavior and not worry about slowing Emacs down to
check for undocumented usage

Would be great to break that tradition, but that's for another discussion.

Paul Eggert

2017-06-03 08:37:14 UTC

Post by Philipp Stephani
Wrong example, try (format "%1$c %1$s" ?±)

Ouch. Fixing that (without adversely affecting performance) would be a bit of a
hassle. Not sure that it's worth it. For now let's just document the limitation.
I installed the attached.

Post by Philipp Stephani

Post by Paul Eggert
And on further thought, the tradition for Emacs is to
document supported behavior and not worry about slowing Emacs down to
check for undocumented usage

Would be great to break that tradition, but that's for another discussion.

Indeed.

Andreas Schwab

2017-06-03 09:12:09 UTC

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index e80e778..f365c80 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -965,9 +965,10 @@ Formatting Strings
convert the argument with the given number instead of the next
-argument. Field numbers start at 1. A format can contain either
-numbered or unnumbered format specifications but not both, except that
+argument. Field numbers start at 1. A field number should differ
+from the other field numbers in the same format.

printf allows using an argument multiple times, Emacs should do as well.

Andreas.

--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Philipp Stephani

2017-06-03 09:34:07 UTC

Post by Paul Eggert

Post by Philipp Stephani
Wrong example, try (format "%1$c %1$s" ?±)

Ouch. Fixing that (without adversely affecting performance) would be a bit of a
hassle. Not sure that it's worth it. For now let's just document the limitation.
I installed the attached.

It's not hard to fix: the info array just needs to be indexed by spec
index, not argument index. Installed 7d413cb4da89e0bdd70068e6a5e1dbc57190ed10;
0147cdd4d96f1eaeef720ee0b89bddd27eaf4233 can now be reverted.
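
For reference, the two examples discussed in this sub-thread should behave as sketched below once one argument can be reused by several specs. This assumes an Emacs with field numbers and the indexing fix in place (Emacs 26 or later).

```elisp
;; One argument, two conversions: as a character and as a decimal number.
(format "%1$c %1$d" ?a)
;; => "a 97"

;; The multibyte case: the same argument printed with %c and with %s.
(format "%1$c %1$s" ?±)
;; => "± 177"
```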

Paul Eggert

2017-06-04 15:54:31 UTC

Post by Philipp Stephani
It's not hard to fix,

Thanks. However, on my platform those fixes slowed down a microbenchmark (format
"%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
recovered the performance loss for me. It avoids the need for the extra pass
through the format string and it caches more quantities in registers. While I
was at it I changed the doc to agree with POSIX that %% should not have
modifiers (not that we enforce this).

One little thing that has tripped me up in the past. In Emacs the preferred
style is typically to break assignments and initializations before the "=", not
after. Like this:

int some_long_name
  = some_long_expression;

I myself prefer the style you used, as it's more column-efficient and it
simplifies text searches for assignments to a variable, but there it is.

Eli Zaretskii

2017-06-04 16:45:10 UTC

Post by Paul Eggert
However, on my platform those fixes slowed down a microbenchmark (format
"%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
recovered the performance loss for me.

Would it make sense to start collecting all those benchmarks, e.g.
somewhere under test/, so that in due time we will have a sound
body of code to make sure we don't lose performance during
development?

TIA

Paul Eggert

2017-06-04 18:37:46 UTC

Post by Eli Zaretskii
Would it make sense to start collecting all those benchmarks, e.g.
somewhere under test/, so that in due time we will have a sound
body of code

I wouldn't collect microbenchmarks just for the sake of collecting them. They're
flaky, and the overhead of collecting them and maintaining the collection is
likely greater than any future benefit of having them around. If we want to have
a performance testsuite our main problem is building and supporting its
infrastructure, not coming up with examples.

If you're interested despite the above disclaimer, here's what I used:

(defun bench-with (n)
  (let ((start (float-time (get-internal-run-time)))
        (i 0))
    (while (< i n)
      (format "%d %s %c" 3 "def" ?‘)
      (setq i (1+ i)))
    (- (float-time (get-internal-run-time)) start)))

(defun bench-without (n)
  (let ((start (float-time (get-internal-run-time)))
        (i 0))
    (while (< i n)
      (setq i (1+ i)))
    (- (float-time (get-internal-run-time)) start)))

(defun bench (n)
  (- (bench-with n)
     (bench-without n)))
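
A similar measurement can probably be taken with the `benchmark-run` macro from the standard benchmark library. This is a sketch, not the method used above: it reports elapsed time plus garbage-collection figures instead of the processor time obtained from `get-internal-run-time`, so the numbers are not directly comparable, and the repetition count is an arbitrary choice.

```elisp
(require 'benchmark)

;; Evaluate the form 100000 times.  The result is a list of three numbers:
;; total elapsed seconds, number of garbage collections that ran, and
;; seconds spent in garbage collection.
(benchmark-run 100000
  (format "%d %s %c" 3 "def" ?‘))
```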

Paul Eggert

2017-12-03 05:43:28 UTC

Post by Paul Eggert

Post by Philipp Stephani
It's not hard to fix,

Thanks. However, on my platform those fixes slowed down a microbenchmark (format
"%d %s %c" 3 "def" ?‘) by 7%, so I installed the attached further patch, which
recovered the performance loss for me.

Oops, that introduced an off-by-one error that caused Emacs to access one past
the end of the 'info' array. To fix this I installed the attached further patch
into emacs-26 and merged emacs-26 into master.

Eli Zaretskii

2017-06-01 14:21:41 UTC

I think this should say "field numbers in format spec"
^^^^^
A typo.

Thank you for working on this valuable improvement.

Jean-Christophe Helary

2017-07-02 01:22:28 UTC

I was wondering whether that patch (and the fixes that came after) had been committed.

Have people started to use it?

Jean-Christophe

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).
====================================================

My very uninformed idea is that we need an independent function that handles the preferred language check and the catalog parsing based on a key, and all the string displaying functions (message etc) would be redefined to call that function when a non-default preferred language (the default is currently English) is detected.
Yes but from what I've seen in package.el, a lot of translatable texts are not displayed with "message". Some
use "error", some use other mechanisms.

Internally, they all boil down to a small set of C functions, which is
where we should make these changes.
====================================================
Since it's C, I'm not going to be able to contribute to that before I understand the language, and the function definitions. I guess it's time I open that K&R that's been on my shelves forever...
One small aspect would be to implement field numbers for `format' so that argument indices can be explicitly specified. That is probably quite important because the word order is different between languages, but it's also useful in other situations (e.g. when repeating an argument). I've implemented this in the attached patch.
<0001-Implement-field-numbers-in-format-strings.txt>

Eli Zaretskii

2017-07-02 02:34:09 UTC

Post by Jean-Christophe Helary
I was wondering if that patch (and the fixes that came after) had been committed ?

It was.

Jean-Christophe Helary

2018-04-25 12:58:03 UTC

Post by Jean-Christophe Helary
The discussion so far seems to point at modifying 'message' and the likes so that developers don't have to bother with any l10n mechanism on their part (besides for writing clean strings).

Since there does not seem to be an agreement on the "straightening" of the package.el strings that I posted here (even though Eli seemed to be fine with them and nobody raised any objection), I'm going to start working on other packages and I will continue to send patches here.

Eventually somebody will see why that is important and will commit my package.el modifications *or* tell me why they are not acceptable (which I'm totally fine with btw) :)

Jean-Christophe Helary
-----------------------------------------------
http://mac4translators.blogspot.com @brandelune
