From: Danh Doan <congdanhqx@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, git@vger.kernel.org
Subject: Re: [PATCH 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list
Date: Fri, 1 Nov 2019 11:49:49 +0700
Message-ID: <20191101044949.GA26545@danh.dev>
In-Reply-To: <20191031192650.GA12834@sigill.intra.peff.net>

On 2019-10-31 15:26:50 -0400, Jeff King wrote:
> I'm confused about a few things here, though. I agree with you that the
> subjects here are only used for finding the fixup/squash relationships.
> But I don't understand the musl connection.

You're right.

Because of how musl's iconv is implemented, the problem just shows up
earlier there.

> Wouldn't failure to reencode here always be a problem? E.g., if I do:
> 
>   for encoding in utf-8 iso-8859-1; do
>     # commit using the encoding
>     echo $encoding >file && git add file
>     echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
>       git -c i18n.commitEncoding=$encoding commit -F -
>     # and then fixup without it
>     echo "$encoding fixed" >file && git add file
>     git commit --fixup HEAD
>   done
>   
>   GIT_EDITOR='echo; grep -v ^#' git rebase -i --root --autosquash
> 
> then the resulting todo-list output (on my glibc system) is:
> 
>   pick 3a5bace éñcödèd with utf-8
>   fixup aa9f09c fixup! éñcödèd with utf-8
>   pick 6e85d32 éñcödèd with iso-8859-1
>   pick 3ceac05 fixup! éñcödèd with iso-8859-1
> 
> I.e., we don't actually match up the second pair, and I think we
> probably ought to.

Yes, we ought to match up the second pair, and after switching from
get_commit_buffer() to logmsg_reencode(), we do.
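
To make the failure concrete, here is a minimal sketch that models the subject comparison outside of git. It assumes iconv(1) accepts the ISO-8859-1 and UTF-8 names. The function names are hypothetical: compare_raw stands in for matching on the bytes from get_commit_buffer(), and compare_reencoded stands in for matching after logmsg_reencode(). It also assumes, as in your example, that the "fixup!" commit was stored as UTF-8 because no i18n.commitEncoding was set for it.

```shell
#!/bin/sh
# Sketch only, not git code: models pairing a "fixup!" commit with its
# target by subject. Assumes iconv(1) knows ISO-8859-1 and UTF-8.

# Subject of the target commit, stored as ISO-8859-1 bytes
# (committed with i18n.commitEncoding=iso-8859-1).
target_subject () {
	printf '%s\n' 'éñcödèd with iso-8859-1' | iconv -f UTF-8 -t ISO-8859-1
}

# Subject of the fixup commit, stored as UTF-8, with the
# "fixup! " prefix stripped before matching.
fixup_target () {
	printf '%s\n' 'fixup! éñcödèd with iso-8859-1' | sed 's/^fixup! //'
}

# Analogous to matching on get_commit_buffer(): compare raw bytes.
compare_raw () {
	if [ "$(target_subject)" = "$(fixup_target)" ]; then
		echo match
	else
		echo "no match"
	fi
}

# Analogous to matching after logmsg_reencode(): convert to UTF-8 first.
compare_reencoded () {
	if [ "$(target_subject | iconv -f ISO-8859-1 -t UTF-8)" = "$(fixup_target)" ]; then
		echo match
	else
		echo "no match"
	fi
}

compare_raw        # prints "no match": same text, different bytes
compare_reencoded  # prints "match"
```

The text is identical in both cases. Only the byte representation differs, so the fix is to normalize before comparing rather than to change the matching rule.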

> 
> I guess the test in t3900 is less exotic; it uses the same encoding for
> both commits. And it's just that "foo" and "fixup! foo" can (and do in
> musl) end up with different encodings (because of the specific language,
> and the vagaries of each iconv implementation).
> 
> Would we have similar problems in all of the other functions which use
> get_commit_buffer() without reencoding? For instance if I do this:
> 
>   echo base >file && git add file && git commit -m base
>   for encoding in utf-8 iso-8859-1; do
>     echo $encoding >file && git add file
>     echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
>       git -c i18n.commitEncoding=$encoding commit -F -
>   done
>   git checkout -b side HEAD~2
>   git cherry-pick master master^
>   cat .git/sequencer/todo
> 
> then the resulting todo file has a mix of iso-8859-1 and utf-8.
> 
> It seems to me that we should always be working with the subjects in a
> single encoding internally,

I'm in favour of this idea.

> and likewise outputting in that format
> (which should probably be git_log_output_encoding(), for the instances
> where we show it to the user).

That is git's current behaviour, though the function is
get_log_output_encoding(), not git_log_output_encoding().
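
For reference, get_log_output_encoding() uses i18n.logOutputEncoding if it is set. Otherwise it falls back to i18n.commitEncoding, and then to UTF-8. Below is a hypothetical shell model of that fallback, not git code. One simplification: an empty value is treated here as unset.

```shell
#!/bin/sh
# Hypothetical model of get_log_output_encoding()'s fallback chain:
# i18n.logOutputEncoding, else i18n.commitEncoding, else UTF-8.
# Simplification: an empty argument is treated as "not set".
log_output_encoding () {
	log_enc=$1     # value of i18n.logOutputEncoding, may be empty
	commit_enc=$2  # value of i18n.commitEncoding, may be empty
	if [ -n "$log_enc" ]; then
		echo "$log_enc"
	elif [ -n "$commit_enc" ]; then
		echo "$commit_enc"
	else
		echo UTF-8
	fi
}

# Inside a repository the inputs would come from git config, e.g.:
#   log_output_encoding "$(git config i18n.logOutputEncoding)" \
#                       "$(git config i18n.commitEncoding)"
```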

> I.e., we should always call logmsg_reencode() instead of
> get_commit_buffer().
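
The "single internal encoding" idea applies to the mixed todo file from the cherry-pick example above. A minimal sketch: convert every subject from its declared encoding to UTF-8 before writing it out. The input format, one "<encoding> <subject bytes>" line per commit, is a hypothetical stand-in for reading each commit's encoding header.

```shell
#!/bin/sh
# Sketch only: normalize subjects stored in different encodings into
# one internal encoding (UTF-8). Hypothetical input format, one line
# per commit: "<encoding> <subject bytes>".
# The body runs in a subshell with LC_ALL=C so that read(1) treats
# non-UTF-8 bytes as plain bytes.
normalize_subjects () (
	LC_ALL=C
	export LC_ALL
	while read -r enc subject; do
		printf '%s\n' "$subject" | iconv -f "$enc" -t UTF-8
	done
)

# Mixed input: one UTF-8 subject, one stored as ISO-8859-1 bytes.
{
	printf 'UTF-8 %s\n' 'éñcödèd with utf-8'
	printf 'ISO-8859-1 '
	printf '%s\n' 'éñcödèd with iso-8859-1' | iconv -f UTF-8 -t ISO-8859-1
} | normalize_subjects
```

For output shown to the user, the same step would convert to the log output encoding instead of hard-coding UTF-8.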

-- 
Danh

Thread overview: 89+ messages
2019-10-31  9:26 [PATCH 0/3] Linux with musl libc improvement Doan Tran Cong Danh
2019-10-31  9:26 ` [PATCH 1/3] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-10-31 17:41   ` Jeff King
2019-11-01  1:33     ` Danh Doan
2019-10-31 19:50   ` brian m. carlson
2019-10-31  9:26 ` [PATCH 2/3] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-10-31 18:11   ` Jeff King
2019-10-31 20:02     ` brian m. carlson
2019-11-01  1:40     ` Danh Doan
2019-10-31  9:26 ` [PATCH 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-10-31 10:38   ` Johannes Schindelin
2019-10-31 19:26     ` Jeff King
2019-11-01  4:49       ` Danh Doan [this message]
2019-11-01  8:25 ` [PATCH v2 0/3] Linux with musl libc improvement Doan Tran Cong Danh
2019-11-01  8:25   ` [PATCH v2 1/3] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-11-01 16:54     ` Jeff King
2019-11-01  8:25   ` [PATCH v2 2/3] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-11-01 16:56     ` Jeff King
2019-11-02  0:43       ` Danh Doan
2019-11-01  8:25   ` [PATCH v2 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-11-01 16:59     ` Jeff King
2019-11-02  1:02       ` Danh Doan
2019-11-02 12:20         ` Danh Doan
2019-11-05  8:00         ` Jeff King
2019-11-06  1:30           ` Junio C Hamano
2019-11-06  4:03             ` Jeff King
2019-11-06 10:03               ` Danh Doan
2019-11-07  5:56                 ` Jeff King
2019-11-06  9:19 ` [PATCH v3 0/8] Correct internal working and output encoding Doan Tran Cong Danh
2019-11-06  9:19   ` [PATCH v3 1/8] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 2/8] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 3/8] t3900: demonstrate git-rebase problem with multi encoding Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 4/8] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 5/8] sequencer: reencode revert/cherry-pick's " Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 6/8] sequencer: reencode squashing commit's message Doan Tran Cong Danh
2019-11-06  9:20   ` [PATCH v3 7/8] sequencer: reencode old merge-commit message Doan Tran Cong Danh
2019-11-06 15:39     ` Eric Sunshine
2019-11-06  9:20   ` [PATCH v3 8/8] sequencer: reencode commit message for am/rebase --show-current-patch Doan Tran Cong Danh
2019-11-07  2:56 ` [PATCH v4 0/8] Correct internal working and output encoding Doan Tran Cong Danh
2019-11-07  2:56   ` [PATCH v4 1/8] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-11-07  2:56   ` [PATCH v4 2/8] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-11-07  6:18     ` Junio C Hamano
2019-11-07  2:56   ` [PATCH v4 3/8] t3900: demonstrate git-rebase problem with multi encoding Doan Tran Cong Danh
2019-11-07  6:02     ` Jeff King
2019-11-07  6:48       ` Danh Doan
2019-11-07  8:02         ` Jeff King
2019-11-07 10:51           ` Danh Doan
2019-11-11  8:22             ` Jeff King
2019-11-07  2:56   ` [PATCH v4 4/8] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-11-07  6:04     ` Jeff King
2019-11-07  2:56   ` [PATCH v4 5/8] sequencer: reencode revert/cherry-pick's " Doan Tran Cong Danh
2019-11-07  6:06     ` Jeff King
2019-11-07  2:56   ` [PATCH v4 6/8] sequencer: reencode squashing commit's message Doan Tran Cong Danh
2019-11-07  6:15     ` Jeff King
2019-11-07  2:56   ` [PATCH v4 7/8] sequencer: reencode old merge-commit message Doan Tran Cong Danh
2019-11-07  2:56   ` [PATCH v4 8/8] sequencer: reencode commit message for am/rebase --show-current-patch Doan Tran Cong Danh
2019-11-07  6:32     ` Jeff King
2019-11-07  7:48       ` Danh Doan
2019-11-07  8:03         ` Jeff King
2019-11-07 16:32           ` Danh Doan
2019-11-08  9:43 ` [PATCH v5 0/9] Improve odd encoding integration Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 1/9] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 2/9] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 3/9] t3900: demonstrate git-rebase problem with multi encoding Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 4/9] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 5/9] sequencer: reencode revert/cherry-pick's " Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 6/9] sequencer: reencode squashing commit's message Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 7/9] sequencer: reencode old merge-commit message Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 8/9] sequencer: reencode commit message for am/rebase --show-current-patch Doan Tran Cong Danh
2019-11-08  9:43   ` [PATCH v5 9/9] sequencer: fallback to sane label in making rebase todo list Doan Tran Cong Danh
2019-11-11  1:22   ` [PATCH v5 0/9] Improve odd encoding integration Junio C Hamano
2019-11-11  4:02   ` Junio C Hamano
2019-11-11  4:43     ` Danh Doan
2019-11-11  6:14     ` Junio C Hamano
2019-11-11  6:03 ` [PATCH v6 0/9] sequencer: handle other encoding better Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 1/9] t0028: eliminate non-standard usage of printf Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 2/9] configure.ac: define ICONV_OMITS_BOM if necessary Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 3/9] t3900: demonstrate git-rebase problem with multi encoding Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 4/9] sequencer: reencode to utf-8 before arrange rebase's todo list Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 5/9] sequencer: reencode revert/cherry-pick's " Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 6/9] sequencer: reencode squashing commit's message Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 7/9] sequencer: reencode old merge-commit message Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 8/9] sequencer: reencode commit message for am/rebase --show-current-patch Doan Tran Cong Danh
2019-11-11  6:03   ` [PATCH v6 9/9] sequencer: fallback to sane label in making rebase todo list Doan Tran Cong Danh
2019-11-11  8:39     ` Jeff King
2019-11-11 16:22       ` Phillip Wood
2019-11-11 18:26     ` Johannes Schindelin
2019-11-12  4:17       ` Junio C Hamano
2019-11-11  8:40   ` [PATCH v6 0/9] sequencer: handle other encoding better Jeff King