All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] i18n: Not add stripped contents for translation
@ 2012-03-05  1:21 Jiang Xin
  2012-03-05  2:27 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jiang Xin @ 2012-03-05  1:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Ævar Arnfjörð, Jiang Xin

The last two chars of the concatenate str from the i18n marked strings
(", ") will be stripped out by strbuf_setlen.

        before: "new commits, modified content, "
        end:    "new commits, modified content"

If the translations won't end with COMMA+SPACE, will break the integrity
of the concatenate string. As for Chinese, COMMA+SPACE may translated to
"," -- a 3-byte UTF-8 Chinese comma character.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
---
 wt-status.c |    6 +++---
 1 个文件被修改,插入 3 行(+),删除 3 行(-)

diff --git a/wt-status.c b/wt-status.c
index 9ffc535..0042dbc 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -245,11 +245,11 @@ static void wt_status_print_change_data(struct wt_status *s,
 		if (d->new_submodule_commits || d->dirty_submodule) {
 			strbuf_addstr(&extra, " (");
 			if (d->new_submodule_commits)
-				strbuf_addf(&extra, _("new commits, "));
+				strbuf_addf(&extra, "%s, ", _("new commits"));
 			if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-				strbuf_addf(&extra, _("modified content, "));
+				strbuf_addf(&extra, "%s, ", _("modified content"));
 			if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-				strbuf_addf(&extra, _("untracked content, "));
+				strbuf_addf(&extra, "%s, ", _("untracked content"));
 			strbuf_setlen(&extra, extra.len - 2);
 			strbuf_addch(&extra, ')');
 		}
-- 
1.7.9.2.330.g152e4.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05  1:21 [PATCH] i18n: Not add stripped contents for translation Jiang Xin
@ 2012-03-05  2:27 ` Junio C Hamano
  2012-03-05  3:01   ` Jiang Xin
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-03-05  2:27 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Ævar Arnfjörð

Jiang Xin <worldhello.net@gmail.com> writes:

> The last two chars of the concatenate str from the i18n marked strings
> (", ") will be stripped out by strbuf_setlen.
>
>         before: "new commits, modified content, "
>         end:    "new commits, modified content"
>
> If the translations won't end with COMMA+SPACE, will break the integrity
> of the concatenate string. As for Chinese, COMMA+SPACE may translated to
> "," -- a 3-byte UTF-8 Chinese comma character.

Hmph.  Why would that be a bad thing in the first place?
For example, for the diff.c::print_stat_summary() message, you have this
translation:

>  1 个文件被修改,插入 3 行(+),删除 3 行(-)

where the original would be:

   %d file changed, %d insertions(+), %d deletions(-).

and I would imagine that it entirely is plausible if a native reader would
wish to read a Japanese translation like this:

   1個のファイルを変更、挿入 3 行(+)、削除 3 行(-)

without using ASCII comma, but using "、" instead.

> diff --git a/wt-status.c b/wt-status.c
> index 9ffc535..0042dbc 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -245,11 +245,11 @@ static void wt_status_print_change_data(struct wt_status *s,
>  		if (d->new_submodule_commits || d->dirty_submodule) {
>  			strbuf_addstr(&extra, " (");
>  			if (d->new_submodule_commits)
> -				strbuf_addf(&extra, _("new commits, "));
> +				strbuf_addf(&extra, "%s, ", _("new commits"));
>  			if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
> -				strbuf_addf(&extra, _("modified content, "));
> +				strbuf_addf(&extra, "%s, ", _("modified content"));
>  			if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
> -				strbuf_addf(&extra, _("untracked content, "));
> +				strbuf_addf(&extra, "%s, ", _("untracked content"));
>  			strbuf_setlen(&extra, extra.len - 2);
>  			strbuf_addch(&extra, ')');
>  		}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05  2:27 ` Junio C Hamano
@ 2012-03-05  3:01   ` Jiang Xin
  2012-03-05  3:42     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jiang Xin @ 2012-03-05  3:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Ævar Arnfjörð

2012/3/5 Junio C Hamano <gitster@pobox.com>:
>> If the translations won't end with COMMA+SPACE, will break the integrity
>> of the concatenate string. As for Chinese, COMMA+SPACE may translated to
>> "," -- a 3-byte UTF-8 Chinese comma character.
>
> Hmph.  Why would that be a bad thing in the first place?

Orignal source code:

244   case WT_STATUS_CHANGED:
245     if (d->new_submodule_commits || d->dirty_submodule) {
246       strbuf_addstr(&extra, " (");
247       if (d->new_submodule_commits)
248         strbuf_addf(&extra, _("new commits, "));
249       if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
250         strbuf_addf(&extra, _("modified content, "));
251       if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
252         strbuf_addf(&extra, _("untracked content, "));
253       strbuf_setlen(&extra, extra.len - 2);
254       strbuf_addch(&extra, ')');
255     }

The bad thing is strbuf_setlen() at line 253. We can not asume the translation
of ", " must be 2 characters.

If we change line 253 like the following, it may work but introduce another
string ", " for translation, and it looks weird.

253       strbuf_setlen(&extra, extra.len - strlen( _(", ") ));

It is because I translate the ", " to "," (chinese comma), the translated
chinese comma is 3 bytes long, striped two of them cause the output
looks weird.

It is also a bug hard to detect, If a careless translater lost the end space
in translation.


-- 
Jiang Xin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05  3:01   ` Jiang Xin
@ 2012-03-05  3:42     ` Junio C Hamano
  2012-03-05 19:34       ` Jens Lehmann
  2012-03-06  4:16       ` Jiang Xin
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2012-03-05  3:42 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Ævar Arnfjörð, Jens Lehmann

Jiang Xin <worldhello.net@gmail.com> writes:

> Orignal source code:
>
> 244   case WT_STATUS_CHANGED:
> 245     if (d->new_submodule_commits || d->dirty_submodule) {
> 246       strbuf_addstr(&extra, " (");
> 247       if (d->new_submodule_commits)
> 248         strbuf_addf(&extra, _("new commits, "));
> 249       if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
> 250         strbuf_addf(&extra, _("modified content, "));
> 251       if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
> 252         strbuf_addf(&extra, _("untracked content, "));
> 253       strbuf_setlen(&extra, extra.len - 2);
> 254       strbuf_addch(&extra, ')');
> 255     }
>
> The bad thing is strbuf_setlen() at line 253. We can not asume the translation
> of ", " must be 2 characters.

It sounds like you are merely working around a poor style in the original,
which should have been structured more like this in the first place, no?

        /* a helper function elsewhere, possibly inlined */
        static void add_iwsep_as_needed(struct strbuf *buf, int origlen)
        {
                if (buf->len != origlen)
                        strbuf_addstr(buf, _(","));
        }

        ...
        int origlen;

        strbuf_addstr(&extra, " ("))
        origlen = extra.len;
        if (a)
                strbuf_addstr(&extra, _("msg a"));
        if (b) {
                add_iwsep_as_needed(&extra, origlen);
                strbuf_addstr(&extra, _("msg b"));
        }
        if (c) {
                add_iwsep_as_needed(&extra, origlen);
                strbuf_addstr(&extra, _("msg c"));
        }
        strbuf_addstr(&extra, ")");

Cc'ing Jens whose 9297f77 (git status: Show detailed dirty status of
submodules in long format, 2010-03-08) introduced the "two-byte backstep".


This is a tangent and I am just showing aloud my ignorance, but I wonder
if there is a reasonably generic and "best current practice" way to
structure code to show an enumeration in human languages, for example,

	A, B, C and D.

in an easier-to-translate way.

I suspect that it might be sufficiently generic if we can make it possible
to allow the first and the last inter-word-separation and the token after
all the items to be different from other inter-word-separation tokens.

E.g. in English, the first one and all the "other" are ", ", the last
inter-word token is " and ", and the token at the very end is ".". In
Japanese some translators may want to say "AやBとCとD。", meaning the
first one is "や", "。" is used at the very end, and all the others may be
"と".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05  3:42     ` Junio C Hamano
@ 2012-03-05 19:34       ` Jens Lehmann
  2012-03-05 20:08         ` Junio C Hamano
  2012-03-06  4:16       ` Jiang Xin
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Lehmann @ 2012-03-05 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jiang Xin, Git List, Ævar Arnfjörð

Am 05.03.2012 04:42, schrieb Junio C Hamano:
> Jiang Xin <worldhello.net@gmail.com> writes:
> 
>> Orignal source code:
>>
>> 244   case WT_STATUS_CHANGED:
>> 245     if (d->new_submodule_commits || d->dirty_submodule) {
>> 246       strbuf_addstr(&extra, " (");
>> 247       if (d->new_submodule_commits)
>> 248         strbuf_addf(&extra, _("new commits, "));
>> 249       if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
>> 250         strbuf_addf(&extra, _("modified content, "));
>> 251       if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
>> 252         strbuf_addf(&extra, _("untracked content, "));
>> 253       strbuf_setlen(&extra, extra.len - 2);
>> 254       strbuf_addch(&extra, ')');
>> 255     }
>>
>> The bad thing is strbuf_setlen() at line 253. We can not asume the translation
>> of ", " must be 2 characters.
> 
> It sounds like you are merely working around a poor style in the original,
> which should have been structured more like this in the first place, no?
> 
>         /* a helper function elsewhere, possibly inlined */
>         static void add_iwsep_as_needed(struct strbuf *buf, int origlen)
>         {
>                 if (buf->len != origlen)
>                         strbuf_addstr(buf, _(","));
>         }
> 
>         ...
>         int origlen;
> 
>         strbuf_addstr(&extra, " ("))
>         origlen = extra.len;
>         if (a)
>                 strbuf_addstr(&extra, _("msg a"));
>         if (b) {
>                 add_iwsep_as_needed(&extra, origlen);
>                 strbuf_addstr(&extra, _("msg b"));
>         }
>         if (c) {
>                 add_iwsep_as_needed(&extra, origlen);
>                 strbuf_addstr(&extra, _("msg c"));
>         }
>         strbuf_addstr(&extra, ")");
> 
> Cc'ing Jens whose 9297f77 (git status: Show detailed dirty status of
> submodules in long format, 2010-03-08) introduced the "two-byte backstep".

I have no objections at all against changing the code that way to make it
possible to translate it in a sane way. This code predates the i18n effort
by a few months, so it didn't take this kind of problem into account.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05 19:34       ` Jens Lehmann
@ 2012-03-05 20:08         ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2012-03-05 20:08 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Jiang Xin, Git List, Ævar Arnfjörð

Jens Lehmann <Jens.Lehmann@web.de> writes:

> I have no objections at all against changing the code that way to make it
> possible to translate it in a sane way. This code predates the i18n effort
> by a few months, so it didn't take this kind of problem into account.

Ah, Ok, that explains it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-05  3:42     ` Junio C Hamano
  2012-03-05 19:34       ` Jens Lehmann
@ 2012-03-06  4:16       ` Jiang Xin
  2012-03-06  6:46         ` Junio C Hamano
  2012-03-06  6:47         ` Jiang Xin
  1 sibling, 2 replies; 10+ messages in thread
From: Jiang Xin @ 2012-03-06  4:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Ævar Arnfjörð, Jens Lehmann

2012/3/5 Junio C Hamano <gitster@pobox.com>:
> This is a tangent and I am just showing aloud my ignorance, but I wonder
> if there is a reasonably generic and "best current practice" way to
> structure code to show an enumeration in human languages, for example,
>
>        A, B, C and D.
>
> in an easier-to-translate way.
>
> I suspect that it might be sufficiently generic if we can make it possible
> to allow the first and the last inter-word-separation and the token after
> all the items to be different from other inter-word-separation tokens.
>
> E.g. in English, the first one and all the "other" are ", ", the last
> inter-word token is " and ", and the token at the very end is ".". In
> Japanese some translators may want to say "AやBとCとD。", meaning the
> first one is "や", "。" is used at the very end, and all the others may be
> "と".

I write a function for this.

/*
 * Make list of items easy for l10n translation.
 *
 *   1. Input list of items one by one through the 2nd argument,
 *      but leave the 1st argument as NULL.
 *
 *   2. Get the output joint string from the 1st argument.
 *
 * According to the number of items input. The joint string maybe:
 *
 *   a
 *   a and b
 *   a, b and c
 *   a, b, c and d
 *
 */
#define MAX_L10N_LIST_ITEMS_COUNT  128
void append_l10n_list_items(struct strbuf *ret, const char *item)
{
    static const char **itemlist = NULL;
    static int  count = 0;
    int i = 0;

    if (itemlist == NULL)
        itemlist = xmalloc(MAX_L10N_LIST_ITEMS_COUNT * sizeof(char*));

    if(item != NULL && count < MAX_L10N_LIST_ITEMS_COUNT )
        itemlist[count++] = item;

    if (ret != NULL) {
        if (count == 1) {
            strbuf_addstr(ret, itemlist[0]);
        }
        else if (count == 2) {
            strbuf_addf(ret, _("%s and %s"), itemlist[0], itemlist[1]);
        }
        else if (count > 2) {
            strbuf_addf(ret, _("%s, "), itemlist[0]);
            for (i=1; i<count-2; i++) {
                strbuf_addstr(ret, itemlist[i]);
                strbuf_addstr(ret, _(", "));
            }
            strbuf_addf(ret, _("%s and %s"), itemlist[count-2],
itemlist[count-1]);
        }
        free(itemlist);
        itemlist = NULL;
        count = 0;
    }
}


...
       strbuf_addstr(&extra, " ("))
       if (a)
               append_l10n_list_items(NULL, _("msg a"));
       if (b)
               append_l10n_list_items(NULL, _("msg b"));
       if (c)
               append_l10n_list_items(NULL, _("msg c"));

       append_l10n_list_items(&extra, NULL);
...

-- 
Jiang Xin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-06  4:16       ` Jiang Xin
@ 2012-03-06  6:46         ` Junio C Hamano
  2012-03-06  6:55           ` Jiang Xin
  2012-03-06  6:47         ` Jiang Xin
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-03-06  6:46 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Ævar Arnfjörð, Jens Lehmann

Jiang Xin <worldhello.net@gmail.com> writes:

>> This is a tangent and I am just showing aloud my ignorance, but I wonder
>> if there is a reasonably generic and "best current practice" way to
>> structure code to show an enumeration in human languages, for example,
> ...
> I write a function for this.

It is not a very good API design to use static array in a function
like your code.

You will often want to walk list of lists in a nested loop and want
to have two "appends" running in parallel.  Imagine you are merging
three branches A, B and C in an octopus merge, and want to say
"commits A1 and A2 were merged from A".  You would want to structure
the caller like this:

        clear_list(&list_of_branches);
        # this walks branches A, B and C
        foreach branch
                append_item(&list_of_branches, branch);

                clear_list(&list_of_commits);
                # this walks commits on the branch
                foreach commit on branch
                        append_item(&list_of_commits, commit);
                # this shows e.g. "commits A1 and A2"
                format_list_as_human_string(&list_of_commits);
        # this shows e.g. "branches A, B and C"
        format_list_as_human_string(&list_of_branches);

In any case, I was not asking anybody to come up with an original
solution. Rather, I was asking if somebody knew an already widely
used library-ish implementation we can just reuse.  If there is no
such thing, we may end up designing one ourselves, but we shouldn't
be doing so if we don't have to.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-06  4:16       ` Jiang Xin
  2012-03-06  6:46         ` Junio C Hamano
@ 2012-03-06  6:47         ` Jiang Xin
  1 sibling, 0 replies; 10+ messages in thread
From: Jiang Xin @ 2012-03-06  6:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Ævar Arnfjörð, Jens Lehmann

在 2012年3月6日 下午12:16,Jiang Xin <worldhello.net@gmail.com> 写道:
>
> /*
>  * Make list of items easy for l10n translation.
>  *
>  *   1. Input list of items one by one through the 2nd argument,
>  *      but leave the 1st argument as NULL.
>  *
>  *   2. Append the output joint string to strbuf from the 1st argument.
>  *
>  * According to the number of items input. The joint string maybe:
>  *
>  *   a
>  *   a and b
>  *   a, b and c
>  *   a, b, c and d
>  *
>  */

-#define MAX_L10N_LIST_ITEMS_COUNT  128
+#define MAX_L10N_LIST_ITEMS_COUNT      8

> void append_l10n_list_items(struct strbuf *ret, const char *item)
> {
>    static const char **itemlist = NULL;
>    static int  count = 0;
>    int i = 0;
>
>    if (itemlist == NULL)
>        itemlist = xmalloc(MAX_L10N_LIST_ITEMS_COUNT * sizeof(char*));

+    else if (count >= MAX_L10N_LIST_ITEMS_COUNT)
+        itemlist = xrealloc(itemlist, (count +
MAX_L10N_LIST_ITEMS_COUNT) * sizeof(char*));

-    if(item != NULL && count < MAX_L10N_LIST_ITEMS_COUNT )
+    if (item != NULL)

>        itemlist[count++] = item;
>
>    if (ret != NULL) {
>        if (count == 1) {
>            strbuf_addstr(ret, itemlist[0]);
>        }
>        else if (count == 2) {
>            strbuf_addf(ret, _("%s and %s"), itemlist[0], itemlist[1]);
>        }
>        else if (count > 2) {
>            strbuf_addf(ret, _("%s, "), itemlist[0]);
>            for (i=1; i<count-2; i++) {

-                 strbuf_addstr(ret, itemlist[i]);
-                 strbuf_addstr(ret, _(", "));
+                 strbuf_addf(ret, _("%s, %s"), itemlist[i], "");

>            }
>            strbuf_addf(ret, _("%s and %s"), itemlist[count-2],
> itemlist[count-1]);
>        }
>        free(itemlist);
>        itemlist = NULL;
>        count = 0;
>    }
> }
>
>
> ...
>       strbuf_addstr(&extra, " ("))
>       if (a)
>               append_l10n_list_items(NULL, _("msg a"));
>       if (b)
>               append_l10n_list_items(NULL, _("msg b"));
>       if (c)
>               append_l10n_list_items(NULL, _("msg c"));
>
>       append_l10n_list_items(&extra, NULL);
> ...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] i18n: Not add stripped contents for translation
  2012-03-06  6:46         ` Junio C Hamano
@ 2012-03-06  6:55           ` Jiang Xin
  0 siblings, 0 replies; 10+ messages in thread
From: Jiang Xin @ 2012-03-06  6:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

> In any case, I was not asking anybody to come up with an original
> solution. Rather, I was asking if somebody knew an already widely
> used library-ish implementation we can just reuse.  If there is no
> such thing, we may end up designing one ourselves, but we shouldn't
> be doing so if we don't have to.

Still awake? hardwork, admire. ;-)

-- 
Jiang Xin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-03-06  6:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-05  1:21 [PATCH] i18n: Not add stripped contents for translation Jiang Xin
2012-03-05  2:27 ` Junio C Hamano
2012-03-05  3:01   ` Jiang Xin
2012-03-05  3:42     ` Junio C Hamano
2012-03-05 19:34       ` Jens Lehmann
2012-03-05 20:08         ` Junio C Hamano
2012-03-06  4:16       ` Jiang Xin
2012-03-06  6:46         ` Junio C Hamano
2012-03-06  6:55           ` Jiang Xin
2012-03-06  6:47         ` Jiang Xin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.