* [PATCH] doc: clarify the filename encoding in git diff
@ 2021-04-19 13:27 Andrey Bienkowski via GitGitGadget
  2021-04-19 21:33 ` Junio C Hamano
  2021-04-20 11:24 ` [PATCH v2] " Andrey Bienkowski via GitGitGadget
From: Andrey Bienkowski via GitGitGadget @ 2021-04-19 13:27 UTC
  To: git; +Cc: Andrey Bienkowski, Andrey Bienkowski

From: Andrey Bienkowski <hexagonrecursion@gmail.com>

AFAICT parsing the output of `git diff --name-only master...feature`
is the intended way of programmatically getting the list of files modified
by a feature branch. It is impossible to parse text unless you know what
encoding it is in. The output encoding of diff --name-only and
diff --name-status was not documented.
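Part of why the encoding is easy to miss: with the default `core.quotePath=true`, `git diff --name-only` prints a path containing non-ASCII bytes in double quotes, with each such byte written as a C-style octal escape (for example `"\303\251t\303\251.txt"` for `été.txt` stored as UTF-8), so the raw bytes only become visible after unquoting. A minimal Python sketch of that unquoting step, assuming the escape set described for `core.quotePath` in git-config(1); `unquote_git_path` is a hypothetical helper, not part of git:

```python
# Hypothetical helper (not part of git): reverse the C-style quoting that
# git applies to "unusual" path names when core.quotePath is true.
_LETTER_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
    "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(line: str) -> str:
    """Decode one line of `git diff --name-only` output (newline stripped)."""
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        return line  # paths without unusual characters are printed as-is
    body = line[1:-1]
    raw = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            raw += ch.encode("utf-8")
            i += 1
        elif i + 1 < len(body) and body[i + 1] in _LETTER_ESCAPES:
            raw.append(_LETTER_ESCAPES[body[i + 1]])
            i += 2
        else:
            # Three octal digits stand for one raw byte, e.g. \303 == 0xC3.
            raw.append(int(body[i + 1:i + 4], 8))
            i += 4
    # Assume UTF-8, but keep undecodable bytes as lone surrogates
    # (Python's "surrogateescape") rather than failing.
    return raw.decode("utf-8", errors="surrogateescape")
```

Decoding only after the escapes have been turned back into bytes is the point of the design: the quoted line itself is plain ASCII, and the UTF-8 assumption applies to the bytes it stands for.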

I asked on the mailing list and got this:
https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
> There's some discussion in Documentation/i18n.txt, which is included in
> various manpages (e.g., https://git-scm.com/docs/git-log#_discussion)
> but it doesn't seem to be mentioned in git-diff.
>
> The short answer is: mostly utf8, but historically on platforms that
> don't care (like Linux) you could get away with other encodings.
>
> -Peff

My takeaway was to always parse it as utf8 regardless of platform or
environment.
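A minimal Python sketch of that takeaway, assuming the list is requested with `-z`, which per git-diff(1) makes git print each path verbatim (unquoted) and terminate it with NUL. `split_name_only` and `changed_files` are hypothetical helpers; names that are not valid UTF-8 (still possible on Linux, as Peff notes) are kept via Python's `surrogateescape` error handler instead of raising:

```python
from __future__ import annotations

import subprocess


def split_name_only(raw: bytes) -> list[str]:
    """Split `git diff --name-only -z` output into decoded path names."""
    # With -z every path is terminated by NUL, so the last field is empty.
    fields = raw.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()
    # Decode as UTF-8; undecodable bytes become lone surrogates, not errors.
    return [f.decode("utf-8", errors="surrogateescape") for f in fields]


def changed_files(base: str, branch: str) -> list[str]:
    """List files changed on `branch` since its merge base with `base`."""
    raw = subprocess.run(
        ["git", "diff", "--name-only", "-z", f"{base}...{branch}"],
        check=True,
        capture_output=True,
    ).stdout
    return split_name_only(raw)
```

For example, `changed_files("master", "feature")` would list the paths touched on `feature`; for a name that was not valid UTF-8, `name.encode("utf-8", "surrogateescape")` recovers the original bytes.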

Signed-off-by: Andrey Bienkowski <hexagonrecursion@gmail.com>
---
    doc: clarify the filename encoding in git diff --name-only and
    --name-status
    
    AFAICT parsing the output of git diff --name-only master...feature is
    the intended way of programmatically getting the list of files modified
    by a feature branch. It is impossible to parse text unless you know what
    encoding it is in. The output encoding of diff --name-only and diff
    --name-status was not documented.
    
    I asked on the mailing list and got this:
    https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
    
    > There's some discussion in Documentation/i18n.txt, which is included
    > in various manpages (e.g.,
    > https://git-scm.com/docs/git-log#_discussion) but it doesn't seem to
    > be mentioned in git-diff.
    >
    > The short answer is: mostly utf8, but historically on platforms that
    > don't care (like Linux) you could get away with other encodings.
    >
    > -Peff
    
    My takeaway was to always parse it as utf8 regardless of platform or
    environment.
    
    Changes since v1:
    
     * Replace "always" with "often"
     * Add a link to https://git-scm.com/docs/git-log

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-996%2Fhexagonrecursion%2Futf8-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-996/hexagonrecursion/utf8-v1
Pull-Request: https://github.com/git/git/pull/996

 Documentation/diff-options.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index aa2b5c11f20b..4ce36ef535ba 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,11 +293,14 @@ explained for the configuration variable `core.quotePath` (see
 linkgit:git-config[1]).
 
 --name-only::
-	Show only names of changed files.
+	Show only names of changed files. The file names are usually encoded in UTF-8.
+	For more information see the discussion about encoding in the linkgit:git-log[1]
+	manual page.
 
 --name-status::
 	Show only names and status of changed files. See the description
 	of the `--diff-filter` option on what the status letters mean.
+	Just like `--name-only` the file names are usually encoded in UTF-8.
 
 --submodule[=<format>]::
 	Specify how differences in submodules are shown.  When specifying

base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
-- 
gitgitgadget


* Re: [PATCH] doc: clarify the filename encoding in git diff
From: Junio C Hamano @ 2021-04-19 21:33 UTC
  To: Andrey Bienkowski via GitGitGadget; +Cc: git, Andrey Bienkowski

"Andrey Bienkowski via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Signed-off-by: Andrey Bienkowski <hexagonrecursion@gmail.com>
> ---
>     My takeaway was to always parse it as utf8 regardless of platform or
>     environment.
>     
>     Changes since v1:

I do not think the readers on the list have seen the "v1", but
anyway, the 

>      * Replace "always" with "often"

"often" here sounds more measured than ...

>  --name-only::
> -	Show only names of changed files.
> +	Show only names of changed files. The file names are usually encoded in UTF-8.
> +	For more information see the discussion about encoding in the linkgit:git-log[1]
> +	manual page.

... "usually" here ...

>  --name-status::
>  	Show only names and status of changed files. See the description
>  	of the `--diff-filter` option on what the status letters mean.
> +	Just like `--name-only` the file names are usually encoded in UTF-8.

... and here.

Thanks.


* [PATCH v2] doc: clarify the filename encoding in git diff
From: Andrey Bienkowski via GitGitGadget @ 2021-04-20 11:24 UTC
  To: git; +Cc: Andrey Bienkowski, Andrey Bienkowski

From: Andrey Bienkowski <hexagonrecursion@gmail.com>

AFAICT parsing the output of `git diff --name-only master...feature`
is the intended way of programmatically getting the list of files modified
by a feature branch. It is impossible to parse text unless you know what
encoding it is in. The output encoding of diff --name-only and
diff --name-status was not documented.

I asked on the mailing list and got this:
https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
> There's some discussion in Documentation/i18n.txt, which is included in
> various manpages (e.g., https://git-scm.com/docs/git-log#_discussion)
> but it doesn't seem to be mentioned in git-diff.
>
> The short answer is: mostly utf8, but historically on platforms that
> don't care (like Linux) you could get away with other encodings.
>
> -Peff

My takeaway was to always parse it as utf8 regardless of platform or
environment.
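`--name-status` adds a status letter in front of each path and, for renames and copies (`R` or `C` followed by a similarity score), names two paths, so the same decoding question applies to more than one field per entry. A minimal Python sketch of splitting the `-z` form, assuming the NUL-separated layout git-diff(1) describes for `-z` (status, then path, with source and destination paths for `R` and `C`); `parse_name_status` is a hypothetical helper:

```python
from __future__ import annotations


def parse_name_status(raw: bytes) -> list[tuple[str, str, str | None]]:
    """Parse `git diff --name-status -z` output into (status, path, new_path)."""
    fields = raw.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # the output ends with a NUL terminator
    entries = []
    i = 0
    while i < len(fields):
        status = fields[i].decode("ascii")  # e.g. "M", "A", "D", "R100"
        path = fields[i + 1].decode("utf-8", errors="surrogateescape")
        if status[0] in "RC":
            # Renames and copies list the source path, then the destination.
            new_path = fields[i + 2].decode("utf-8", errors="surrogateescape")
            i += 3
        else:
            new_path = None
            i += 2
        entries.append((status, path, new_path))
    return entries
```

Only the path fields go through the UTF-8 assumption; the status field is plain ASCII, so decoding it separately keeps a surprising path from corrupting the status letters.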

Signed-off-by: Andrey Bienkowski <hexagonrecursion@gmail.com>
---
    doc: clarify the filename encoding in git diff --name-only and
    --name-status
    
    AFAICT parsing the output of git diff --name-only master...feature is
    the intended way of programmatically getting the list of files modified
    by a feature branch. It is impossible to parse text unless you know what
    encoding it is in. The output encoding of diff --name-only and diff
    --name-status was not documented.
    
    I asked on the mailing list and got this:
    https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
    
    > There's some discussion in Documentation/i18n.txt, which is included
    > in various manpages (e.g.,
    > https://git-scm.com/docs/git-log#_discussion) but it doesn't seem to
    > be mentioned in git-diff.
    >
    > The short answer is: mostly utf8, but historically on platforms that
    > don't care (like Linux) you could get away with other encodings.
    >
    > -Peff
    
    My takeaway was to always parse it as utf8 regardless of platform or
    environment.
    
    Changes since v1:
    
     * Replace "always" with "usually"
     * Add a link to https://git-scm.com/docs/git-log
     * Replace "usually" with "often"

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-996%2Fhexagonrecursion%2Futf8-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-996/hexagonrecursion/utf8-v2
Pull-Request: https://github.com/git/git/pull/996

Range-diff vs v1:

 1:  4f1987e5e09c ! 1:  6daa652b7b15 doc: clarify the filename encoding in git diff
     @@ Commit message
          doc: clarify the filename encoding in git diff
      
          AFAICT parsing the output of `git diff --name-only master...feature`
     -    is the intended way of programmatically getting the list of files modified
     +    is the intended way of programmatically getting the list of files
     +    modified
          by a feature branch. It is impossible to parse text unless you know what
          encoding it is in. The output encoding of diff --name-only and
          diff --name-status was not documented.
      
          I asked on the mailing list and got this:
          https://public-inbox.org/git/YGx2EMHnwXWbp4ET@coredump.intra.peff.net/
     -    > There's some discussion in Documentation/i18n.txt, which is included in
     +    > There's some discussion in Documentation/i18n.txt, which is included
     +    in
          various manpages (e.g., https://git-scm.com/docs/git-log#_discussion)
          but it doesn't seem to be mentioned in git-diff.
          >
     @@ Documentation/diff-options.txt: explained for the configuration variable `core.q
       
       --name-only::
      -	Show only names of changed files.
     -+	Show only names of changed files. The file names are usually encoded in UTF-8.
     ++	Show only names of changed files. The file names are often encoded in UTF-8.
      +	For more information see the discussion about encoding in the linkgit:git-log[1]
      +	manual page.
       
       --name-status::
       	Show only names and status of changed files. See the description
       	of the `--diff-filter` option on what the status letters mean.
     -+	Just like `--name-only` the file names are usually encoded in UTF-8.
     ++	Just like `--name-only` the file names are often encoded in UTF-8.
       
       --submodule[=<format>]::
       	Specify how differences in submodules are shown.  When specifying


 Documentation/diff-options.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index aa2b5c11f20b..69de49f977b6 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,11 +293,14 @@ explained for the configuration variable `core.quotePath` (see
 linkgit:git-config[1]).
 
 --name-only::
-	Show only names of changed files.
+	Show only names of changed files. The file names are often encoded in UTF-8.
+	For more information see the discussion about encoding in the linkgit:git-log[1]
+	manual page.
 
 --name-status::
 	Show only names and status of changed files. See the description
 	of the `--diff-filter` option on what the status letters mean.
+	Just like `--name-only` the file names are often encoded in UTF-8.
 
 --submodule[=<format>]::
 	Specify how differences in submodules are shown.  When specifying

base-commit: 48bf2fa8bad054d66bd79c6ba903c89c704201f7
-- 
gitgitgadget

