From: "Torsten Bögershausen" <tboegi@web.de>
To: Jeff King <peff@peff.net>
Cc: 孟子易 <mengziyi540841@gmail.com>, git@vger.kernel.org
Subject: Re: bug report: symbolic-ref --short command echos the wrong text while use Chinese language
Date: Wed, 15 Feb 2023 17:26:49 +0100
Message-ID: <20230215162648.py7diaasrymezntl@tb-raspi4>
In-Reply-To: <Y+qbFN+PhHVuWT2T@coredump.intra.peff.net>
On Mon, Feb 13, 2023 at 03:18:28PM -0500, Jeff King wrote:
> On Mon, Feb 13, 2023 at 02:38:08PM +0800, 孟子易 wrote:
>
> > System: Mac Os (Ventura 13.2)
> > Language: Chinese simplified
> > Preconditions:
> > # git checkout -b 测试-加-增加-加-增加
> > # git symbolic-ref --short HEAD
> > Wrong Echo (Current Echo):
> > 测试-�
> > Correct Echo:
> > // I Don't know, may be "测试-加" ?
>
> Hmm, I can't reproduce here on Linux:
>
> $ git init
> $ git commit --allow-empty -m foo
> $ git checkout -b 测试-加-增加-加-增加
> $ git symbolic-ref --short HEAD
> 测试-加-增加-加-增加
Neither can I, on a pre-Ventura macOS ;-)
>
> I wonder if it is related to using macOS. The refs are stored as
> individual files in the filesystem, and HFS+ will do some unicode
> normalization. So I get:
>
> $ ls .git/refs/heads/ | xxd
> 00000000: 6d61 696e 0ae6 b58b e8af 952d e58a a02d main.......-...-
> 00000010: e5a2 9ee5 8aa0 2de5 8aa0 2de5 a29e e58a ......-...-.....
> 00000020: a00a
>
> Are your on-disk bytes different?
In my case they are the same.
Trying to convert from UTF-8 into UTF-8-MAC didn't change anything here.
Side note:
macOS Ventura is probably not using HFS+ but APFS, which doesn't do
Unicode decomposition at the file system level.
It would be helpful to pipe the result into xxd:
    git symbolic-ref --short HEAD | xxd
and then see whether the garbling happens inside or outside of Git.
>
> My instinct was that this might be related to the shortening code
> treating the names as bytes, rather than characters. But looking at
> shorten_unambiguous_ref(), it is really operating at the level of path
> components, and should never split a partial string.
>
> Another possibility: the shortening is done by applying our usual
> ref-resolving rules one by one via scanf(). There's an assumption in the
> code that the resulting string can never be longer than the input:
>
> /* buffer for scanf result, at most refname must fit */
> short_name = xstrdup(refname);
>
> ...
> for (i = nr_rules - 1; i > 0 ; --i) {
> ...
> if (1 != sscanf(refname, scanf_fmts[i], short_name))
> continue;
>
> Is it possible that this assumption is violated based on some particular
> combination of unicode normalization and locale? That seems unlikely to
> me, but it wouldn't be the first time I've been surprised by subtle
> unicode implications.
>
> Is it possible for you to run Git in a debugger and check the
> intermediate steps happening in refs_shorten_unambiguous_ref()?
>
> -Peff