From: "Andrey Bienkowski via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Andrey Bienkowski <hexagonrecursion@gmail.com>,
	Andrey Bienkowski <hexagonrecursion@gmail.com>
Subject: [PATCH v2] doc: clarify the filename encoding in git diff
Date: Tue, 20 Apr 2021 11:24:37 +0000
Message-ID: <pull.996.v2.git.git.1618917877881.gitgitgadget@gmail.com>
In-Reply-To: <pull.996.git.git.1618838856399.gitgitgadget@gmail.com>

From: Andrey Bienkowski <hexagonrecursion@gmail.com>

AFAICT parsing the output of `git diff --name-only master...feature`
is the intended way of programmatically getting the list of files
modified by a feature branch. It is impossible to parse text unless you
know what encoding it is in. The output encoding of diff --name-only and
diff --name-status was not documented.

I asked on the mailing list and got this:
https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
> There's some discussion in Documentation/i18n.txt, which is included
> in various manpages (e.g.,
> https://git-scm.com/docs/git-log#_discussion) but it doesn't seem to
> be mentioned in git-diff.
>
> The short answer is: mostly utf8, but historically on platforms that
> don't care (like Linux) you could get away with other encodings.
>
> -Peff

My takeaway was to always parse it as utf8 regardless of platform or
environment.

Signed-off-by: Andrey Bienkowski <hexagonrecursion@gmail.com>
---
    doc: clarify the filename encoding in git diff --name-only and
    --name-status
    
    AFAICT parsing the output of git diff --name-only master...feature is
    the intended way of programmatically getting the list of files modified
    by a feature branch. It is impossible to parse text unless you know what
    encoding it is in. The output encoding of diff --name-only and diff
    --name-status was not documented.
    
    I asked on the mailing list and got this:
    https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
    
    > There's some discussion in Documentation/i18n.txt, which is included
    > in various manpages (e.g.,
    > https://git-scm.com/docs/git-log#_discussion) but it doesn't seem to
    > be mentioned in git-diff.
    >
    > The short answer is: mostly utf8, but historically on platforms that
    > don't care (like Linux) you could get away with other encodings.
    >
    > -Peff
    
    My takeaway was to always parse it as utf8 regardless of platform or
    environment.
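
For scripts that consume this output, one possible approach is sketched
below in Python. It is an illustration, not part of the patch. It assumes
the caller adds `-z`, so that paths are NUL-terminated and not quoted
according to core.quotePath, and it decodes each path as UTF-8 with the
`surrogateescape` error handler so that a path in some other encoding
still round-trips to its original bytes. The helper names `decode_path`,
`parse_name_only_z` and `changed_files` are hypothetical.

```python
import subprocess
from typing import List


def decode_path(raw: bytes) -> str:
    """Decode a path as UTF-8; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def parse_name_only_z(output: bytes) -> List[str]:
    """Split NUL-terminated `git diff --name-only -z` output into paths."""
    return [decode_path(p) for p in output.split(b"\0") if p]


def changed_files(base: str, head: str) -> List[str]:
    """List files changed on `head` since it diverged from `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "-z", f"{base}...{head}"],
        check=True,
        capture_output=True,  # stdout is bytes because text mode is off
    )
    return parse_name_only_z(result.stdout)
```

With this sketch, `changed_files("master", "feature")` would return the
paths from the example command above, and encoding a returned path with
the same error handler gives back the exact bytes git printed.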
    
    Changes since v1:
    
     * Replace "always" with "usually"
     * Add a link to https://git-scm.com/docs/git-log
     * Replace "usually" with "often"

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-996%2Fhexagonrecursion%2Futf8-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-996/hexagonrecursion/utf8-v2
Pull-Request: https://github.com/git/git/pull/996

Range-diff vs v1:

 1:  4f1987e5e09c ! 1:  6daa652b7b15 doc: clarify the filename encoding in git diff
     @@ Commit message
          doc: clarify the filename encoding in git diff
      
          AFAICT parsing the output of `git diff --name-only master...feature`
     -    is the intended way of programmatically getting the list of files modified
     +    is the intended way of programmatically getting the list of files
     +    modified
          by a feature branch. It is impossible to parse text unless you know what
          encoding it is in. The output encoding of diff --name-only and
          diff --name-status was not documented.
      
          I asked on the mailing list and got this:
          https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
     -    > There's some discussion in Documentation/i18n.txt, which is included in
     +    > There's some discussion in Documentation/i18n.txt, which is included
     +    in
          various manpages (e.g., https://git-scm.com/docs/git-log#_discussion)
          but it doesn't seem to be mentioned in git-diff.
          >
     @@ Documentation/diff-options.txt: explained for the configuration variable `core.q
       
       --name-only::
      -	Show only names of changed files.
     -+	Show only names of changed files. The file names are usually encoded in UTF-8.
     ++	Show only names of changed files. The file names are often encoded in UTF-8.
      +	For more information see the discussion about encoding in the linkgit:git-log[1]
      +	manual page.
       
       --name-status::
       	Show only names and status of changed files. See the description
       	of the `--diff-filter` option on what the status letters mean.
     -+	Just like `--name-only` the file names are usually encoded in UTF-8.
     ++	Just like `--name-only` the file names are often encoded in UTF-8.
       
       --submodule[=<format>]::
       	Specify how differences in submodules are shown.  When specifying


 Documentation/diff-options.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index aa2b5c11f20b..69de49f977b6 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,11 +293,14 @@ explained for the configuration variable `core.quotePath` (see
 linkgit:git-config[1]).
 
 --name-only::
-	Show only names of changed files.
+	Show only names of changed files. The file names are often encoded in UTF-8.
+	For more information see the discussion about encoding in the linkgit:git-log[1]
+	manual page.
 
 --name-status::
 	Show only names and status of changed files. See the description
 	of the `--diff-filter` option on what the status letters mean.
+	Just like `--name-only` the file names are often encoded in UTF-8.
 
 --submodule[=<format>]::
 	Specify how differences in submodules are shown.  When specifying

base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
-- 
gitgitgadget

Thread overview: 3+ messages
2021-04-19 13:27 [PATCH] doc: clarify the filename encoding in git diff Andrey Bienkowski via GitGitGadget
2021-04-19 21:33 ` Junio C Hamano
2021-04-20 11:24 ` Andrey Bienkowski via GitGitGadget [this message]
