From: Alexander Meshcheryakov <alexander.s.m@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Junio C Hamano <gitster@pobox.com>,
	Calvin Wan <calvinwan@google.com>,
	git@vger.kernel.org
Subject: Re: [BUG] Unicode filenames handling in `git log --stat`
Date: Wed, 10 Aug 2022 12:56:11 +0400
Message-ID: <CA+VDVVUKf48Q9A0hWPnBE+qG_7tBDuXKkdo+wWDU7iC3Wg=oEg@mail.gmail.com>
In-Reply-To: <20220810084017.gnnodcbt5lyibbf6@tb-raspi4>

I believe I have found the exact place where strlen() is used incorrectly.
It is in show_stats() in diff.c:

https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/diff.c#L2623

It should probably be replaced with one of utf8_width(), utf8_strnwidth(),
or utf8_strwidth() from utf8.c.
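
To illustrate why a byte count misaligns the stat column, here is a
minimal sketch. It is not Git's code: approx_display_width() and
padding_for() are hypothetical helpers that only skip UTF-8 continuation
bytes, whereas the real functions in utf8.c also account for wide and
zero-width characters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a width function: counts code points by
 * skipping UTF-8 continuation bytes (bit pattern 10xxxxxx). Assumes
 * valid UTF-8 where every code point occupies one terminal column.
 */
size_t approx_display_width(const char *s)
{
	size_t width = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/*
 * Number of spaces needed after `name` so that the '|' separator
 * lands in the same column for every file name.
 */
size_t padding_for(const char *name, size_t column_width)
{
	size_t w = approx_display_width(name);

	return w < column_width ? column_width - w : 0;
}
```

With "Ärger.txt" encoded as UTF-8, strlen() returns 10 because "Ä" takes
two bytes, while the sketch above returns 9, the same as for "Arger.txt".
Padding computed from strlen() therefore comes out one space short for
the name with the umlaut.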


On Wed, 10 Aug 2022 at 12:40, Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Tue, Aug 09, 2022 at 10:55:31PM -0700, Junio C Hamano wrote:
> > Calvin Wan <calvinwan@google.com> writes:
> >
> > > Hi Alexander,
> > >
> > > Thank you for the report! I attempted to reproduce with the steps you
> > > provided, but was unable to do so. What commands would I have to run
> > > on a clean git repository to reproduce this?
> >
> > Sounds like a symptom observable when the width computed by
> > utf8.c::git_gcwidth(), using the width table imported from
> > unicode.org, and the width the terminal thinks each of the displayed
> > character has, do not match (e.g. seen when ambiguous characters are
> > involved, https://unicode.org/reports/tr11/#Ambiguous).
> >
>
> I am not fully sure about that - I can reproduce it with Latin based
> file names as well:
>
>  git log --stat
> [snip]
>  Arger.txt  | 1 +
>  Ärger.txt | 1 +
>    2 files changed, 2 insertions(+)
>
> From this very first experiment I would suspect that we use
> strlen() somewhere rather than utf8.c::git_gcwidth()
>
> More digging needed (but I don't promise anything today)

