From: "Torsten Bögershausen" <tboegi@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Calvin Wan <calvinwan@google.com>,
	Alexander Meshcheryakov <alexander.s.m@gmail.com>,
	git@vger.kernel.org
Subject: Re: [BUG] Unicode filenames handling in `git log --stat`
Date: Wed, 10 Aug 2022 19:35:54 +0200
Message-ID: <20220810173554.sl3bxtosnszygs5f@tb-raspi4>
In-Reply-To: <xmqqiln0p01z.fsf@gitster.g>

On Wed, Aug 10, 2022 at 08:53:28AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
> >  git log --stat
> > [snip]
> >  Arger.txt  | 1 +
> >  Ärger.txt | 1 +
> >    2 files changed, 2 insertions(+)
> >
> > From this very first experiment I would suspect that we use
> > strlen() somewhere rather than utf8.c::git_wcwidth()
>
> Yeah, that does sound like the case, and quite honestly, knowing
> that the diffstat code is way older than unicode-width code, which
> was added by you in mid 2014, I am not all that surprised if we used
> to use strlen() throughout and we still do by mistake.
>
> Thanks for a dose of sanity.
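
To make the mismatch concrete, here is a minimal sketch, assuming UTF-8 input. It compares strlen() with a plain code point count for the two file names above. count_code_points() is a hypothetical helper for illustration only, not anything in git, and it is not a real width computation (it ignores wide and combining characters).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count UTF-8 code points by skipping
 * continuation bytes (those of the form 10xxxxxx).
 */
static size_t count_code_points(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

"Ärger.txt" is written "\xC3\x84rger.txt" in UTF-8, so strlen() reports 10 bytes while the name has 9 code points. "Arger.txt" is 9 in both counts, which matches the one-column misalignment in the output above.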

Two updates here:
- The strlen() needs a replacement.
  It looks as if the following patch helps:

/* somewhere in diff.c */
static size_t screen_utf8_width(const char *start)
{
	const char *cp = start;
	size_t remain = strlen(start);
	size_t width = 0;

	while (remain) {
		/* utf8_width() advances cp and decrements remain */
		int n = utf8_width(&cp, &remain);
		/* cp is set to NULL on invalid UTF-8; n < 0 means non-printable */
		if (!cp || n < 0)
			return strlen(start); /* not UTF-8 ? Use strlen() */
		width += n;
	}
	return width;
}

@@ -2620,7 +2635,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			continue;
 		}
 		fill_print_name(file);
-		len = strlen(file->print_name);
+		len = screen_utf8_width(file->print_name);
 		if (max_len < len)
 			max_len = len;

@@ -2743,7 +2758,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		 * "scale" the filename
 		 */
 		len = name_width;
-		name_len = strlen(name);
+		name_len = screen_utf8_width(name);
 		if (name_width < name_len) {


=====================================
Let's see if I can make a proper patch out of it.
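
One detail a proper patch would presumably have to handle: printf("%-*s") pads by bytes, not by screen columns, so measuring the width correctly is not enough if the padding is still left to printf. Below is a minimal sketch of padding by display width. Both display_width() and format_padded() are hypothetical names for illustration, and display_width() only counts code points as a stand-in for a real width function.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a real display-width function:
 * counts UTF-8 code points, one column each.
 */
static size_t display_width(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/*
 * Write name into buf, followed by enough spaces to fill
 * `columns` screen columns. The padding is computed from the
 * display width instead of being left to "%-*s".
 */
static int format_padded(char *buf, size_t size,
			 const char *name, size_t columns)
{
	size_t width = display_width(name);
	int pad = width < columns ? (int)(columns - width) : 0;

	return snprintf(buf, size, "%s%*s", name, pad, "");
}
```

With columns set to 10, "Arger.txt" and "Ärger.txt" each get exactly one trailing space, so a following " |" would line up even though the second name is one byte longer.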

The second problem, although I had hoped otherwise, seems to be related to
what you dug up earlier:

>Sounds like a symptom observable when the width computed by
>utf8.c::git_wcwidth(), using the width table imported from
>unicode.org, and the width the terminal thinks each of the displayed
>character has, do not match (e.g. seen when ambiguous characters are
>involved, https://unicode.org/reports/tr11/#Ambiguous).
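
As a rough illustration of how the two sides can disagree, here is a sketch under stated assumptions. is_ambiguous() is a hypothetical and deliberately tiny table holding a few code points that EastAsianWidth.txt classifies as Ambiguous, and width_of() is a hypothetical helper that computes a width under either convention. It makes no claim about which convention git or any particular terminal uses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, incomplete table: a few code points with
 * East Asian Width class "A" (Ambiguous).
 */
static int is_ambiguous(uint32_t cp)
{
	return cp == 0x00A7 ||                    /* SECTION SIGN */
	       cp == 0x00B0 ||                    /* DEGREE SIGN */
	       (cp >= 0x03B1 && cp <= 0x03C1) ||  /* Greek small alpha..rho */
	       (cp >= 0x0410 && cp <= 0x044F);    /* Cyrillic A..ya */
}

/*
 * Width of a code point sequence. Ambiguous characters take
 * two columns when ambiguous_is_wide is set, one otherwise.
 * All other characters are assumed to take one column.
 */
static size_t width_of(const uint32_t *cps, size_t n, int ambiguous_is_wide)
{
	size_t w = 0;

	for (size_t i = 0; i < n; i++)
		w += (ambiguous_is_wide && is_ambiguous(cps[i])) ? 2 : 1;
	return w;
}
```

For "25°C" the narrow convention gives 4 columns and the wide one gives 5. If the program computing the padding and the terminal drawing the text pick different conventions, the columns drift by one for each ambiguous character.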

That needs a second patch, probably after some more digging into
how Unicode is rendered on the different systems.

Thread overview: 42+ messages
2022-08-09 13:11 [BUG] Unicode filenames handling in `git log --stat` Alexander Meshcheryakov
2022-08-09 18:20 ` Calvin Wan
2022-08-09 19:03   ` Alexander Meshcheryakov
2022-08-09 21:36     ` Calvin Wan
2022-08-10  5:55   ` Junio C Hamano
2022-08-10  8:40     ` Torsten Bögershausen
2022-08-10  8:56       ` Alexander Meshcheryakov
2022-08-10  9:51         ` Torsten Bögershausen
2022-08-10 11:41           ` Torsten Bögershausen
2022-08-10 15:53       ` Junio C Hamano
2022-08-10 17:35         ` Torsten Bögershausen [this message]
2022-08-14 13:35 ` [PATCH/RFC 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-08-14 23:12   ` Junio C Hamano
2022-08-15  6:34     ` Torsten Bögershausen
2022-08-18 21:00       ` Junio C Hamano
2022-08-27  8:50 ` [PATCH v2 " tboegi
2022-08-27  8:54   ` Torsten Bögershausen
2022-08-27  9:50     ` Eric Sunshine
2022-08-29 12:04   ` Johannes Schindelin
2022-08-29 17:54     ` Torsten Bögershausen
2022-08-29 18:37       ` Junio C Hamano
2022-09-02  9:47       ` Johannes Schindelin
2022-09-02  4:21 ` [PATCH v3 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-02  9:39   ` Johannes Schindelin
2022-09-02  4:21 ` [PATCH v3 2/2] diff.c: More changes and tests around utf8_strwidth() tboegi
2022-09-02 10:12   ` Johannes Schindelin
2022-09-03  5:39 ` [PATCH v4 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-05 20:46   ` Junio C Hamano
2022-09-07  4:30     ` Torsten Bögershausen
2022-09-07 18:31       ` Junio C Hamano
2022-09-03  5:39 ` [PATCH v4 2/2] diff.c: More changes and tests around utf8_strwidth() tboegi
2022-09-05 10:13   ` Johannes Schindelin
2022-09-14 15:13 ` [PATCH v5 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-09-14 16:40   ` Junio C Hamano
2022-09-26 18:43     ` Torsten Bögershausen
2022-10-10 21:58       ` Junio C Hamano
2022-10-20 15:46         ` Torsten Bögershausen
2022-10-20 17:43           ` Junio C Hamano
2022-10-21 15:19             ` Torsten Bögershausen
2022-10-21 21:59               ` Junio C Hamano
2022-10-23 20:02                 ` Torsten Bögershausen
2022-09-15  2:57   ` Junio C Hamano