All of lore.kernel.org
 help / color / mirror / Atom feed
* "add -p" + filenames with UTF-8 multibyte characters = "No changes"
@ 2009-02-15 18:40 Antonio García Domínguez
  2009-02-15 18:59 ` Teemu Likonen
  0 siblings, 1 reply; 8+ messages in thread
From: Antonio García Domínguez @ 2009-02-15 18:40 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 600 bytes --]

Hello all,

I seem to have run into a bug in "add -p" and "add -i" when trying to
stage diff hunks in tracked files with UTF-8 multibyte characters
(such as "á"). If I add "á", commit, then modify it and try to run
"add -p" on it, Git reports "No changes". "add -i" doesn't do
anything, either.

I've switched to 1.6.2.rc0.90.g0753 and the problem persists. If it
helps, I've attached a small shell script with a minimal recipe for
triggering the bug.

This should incorrectly report "No changes":
./test-accents.sh

And this should work fine:
FILE=a ./test-accents.sh

-Antonio

[-- Attachment #2: test-accents.sh --]
[-- Type: application/x-sh, Size: 185 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: "add -p" + filenames with UTF-8 multibyte characters = "No changes"
  2009-02-15 18:40 "add -p" + filenames with UTF-8 multibyte characters = "No changes" Antonio García Domínguez
@ 2009-02-15 18:59 ` Teemu Likonen
       [not found]   ` <2b8265360902151100n2eca0182odf9543c1dd8a7f98@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Teemu Likonen @ 2009-02-15 18:59 UTC (permalink / raw)
  To: Antonio García Domínguez; +Cc: git

On 2009-02-15 19:40 (+0100), Antonio García Domínguez wrote:

> I seem to have run into a bug in "add -p" and "add -i" when trying to
> stage diff hunks in tracked files with UTF-8 multibyte characters
> (such as "á"). If I add "á", commit, then modify it and try to run
> "add -p" on it, Git reports "No changes". "add -i" doesn't do
> anything, either.
>
> I've switched to 1.6.2.rc0.90.g0753 and the problem persists. If it
> helps, I've attached a small shell script with a minimal recipe for
> triggering the bug.

This bug is documented in BUGS section of "git add" manual (see "git
help add"). You can work it around with

    git config --global core.quotepath false

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: "add -p" + filenames with UTF-8 multibyte characters = "No  changes"
       [not found]   ` <2b8265360902151100n2eca0182odf9543c1dd8a7f98@mail.gmail.com>
@ 2009-02-15 19:11     ` Teemu Likonen
  2009-02-16  3:36       ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Teemu Likonen @ 2009-02-15 19:11 UTC (permalink / raw)
  To: Antonio García Domínguez; +Cc: git

[Added the Git list back to delivery.]

On 2009-02-15 20:00 (+0100), Antonio García Domínguez wrote:

>> This bug is documented in BUGS section of "git add" manual (see "git
>> help add"). You can work it around with
>>
>>    git config --global core.quotepath false
>
> Oh, sorry, I should have known better and RTFM. :-D

No problem at all. I was bitten by the same bug earlier, and as I wasn't
able to fix it I sent a documentation patch instead. :-)

"core.quotepath=false" is good for other purposes too. It prints UTF-8
filenames in diff headers in form that is actually readable. I think it
would be better default.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: "add -p" + filenames with UTF-8 multibyte characters = "No changes"
  2009-02-15 19:11     ` Teemu Likonen
@ 2009-02-16  3:36       ` Jeff King
  2009-02-16  4:00         ` Junio C Hamano
  2009-02-16  5:13         ` "add -p" + filenames with UTF-8 multibyte characters = "No changes" Teemu Likonen
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2009-02-16  3:36 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: Junio C Hamano, git

On Sun, Feb 15, 2009 at 09:11:00PM +0200, Teemu Likonen wrote:

> "core.quotepath=false" is good for other purposes too. It prints UTF-8
> filenames in diff headers in form that is actually readable. I think it
> would be better default.

I am not opposed to setting this as a default, but I think there may be
some encoding issues to be dealt with. At the very least, format-patch
generates messages without a content-type header. E.g.,:

  $ touch föö && git add . && git commit -m one
  $ echo content >föö && git commit -a -m two

  $ git format-patch --stdout HEAD^ | sed '/^$/q'

     vs

  $ git config core.quotepath false
  $ git format-patch --stdout HEAD^ | sed '/^$/q'

So now we have non-ascii in our email, but no header specifying
encoding. Previous experience has shown that intermediate MTAs (like
vger) will add their own header with whatever encoding they think is
sensible (in the case of vger, iso8859-1), corrupting the mail if they
guess wrong.

But what is the right encoding to specify? We can guess that it is
whatever the commit message is in (defaulting to utf-8). It is by no
means correct, but it would probably work pretty well in practice.

On the other hand, we already have the same problem for encoded file
_contents_. So maybe it is not a big problem in practice.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: "add -p" + filenames with UTF-8 multibyte characters = "No changes"
  2009-02-16  3:36       ` Jeff King
@ 2009-02-16  4:00         ` Junio C Hamano
  2009-02-17  7:02           ` [PATCH] git-add -i/-p: learn to unwrap C-quoted paths Junio C Hamano
  2009-02-16  5:13         ` "add -p" + filenames with UTF-8 multibyte characters = "No changes" Teemu Likonen
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2009-02-16  4:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Teemu Likonen, git

Jeff King <peff@peff.net> writes:

> But what is the right encoding to specify? We can guess that it is
> whatever the commit message is in (defaulting to utf-8). It is by no
> means correct, but it would probably work pretty well in practice.
>
> On the other hand, we already have the same problem for encoded file
> _contents_. So maybe it is not a big problem in practice.

I did not spell the specifics out because this change won't happen in any
near future anyway, but my thinking was to give a way for "add -p" to
either (1) internally run without quotepath regardless of the user's
settings or (2) unquote the paths correctly when it learns the set of
paths affected by the change.

I think the right approach is (2), because you need to unquote pathnames
with some byte values that even with core.quotepath=false will not pass
unquoted *anyway*.

I also happen to think that it may be a good idea to ignore core.quotepath
settings in format-patch, but that is a separate topic.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: "add -p" + filenames with UTF-8 multibyte characters = "No changes"
  2009-02-16  3:36       ` Jeff King
  2009-02-16  4:00         ` Junio C Hamano
@ 2009-02-16  5:13         ` Teemu Likonen
  1 sibling, 0 replies; 8+ messages in thread
From: Teemu Likonen @ 2009-02-16  5:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On 2009-02-15 22:36 (-0500), Jeff King wrote:

> I am not opposed to setting this as a default, but I think there may
> be some encoding issues to be dealt with. At the very least,
> format-patch generates messages without a content-type header. E.g.,:

> But what is the right encoding to specify? We can guess that it is
> whatever the commit message is in (defaulting to utf-8). It is by no
> means correct, but it would probably work pretty well in practice.

I have a small script which adds/rewrites MIME headers. It defaults to
UTF-8/8bit:

    #!/bin/sh

    charset="${1:-UTF-8}"
    encoding="${2:-8bit}"

    formail -I "MIME-Version: 1.0" \
            -I "Content-Type: text/plain; charset=$charset" \
            -I "Content-Transfer-Encoding: $encoding" \
            -s

I have used the script this way:

    git format-patch --stdout [...] | add-mime-headers | \
        formail -s sh -c 'cat >$FILENO.patch'

It may be difficult and unreliable to try to detect the encoding of file
content so maybe "git format-patch" could be taught an option like
"--charset=<charset>". (And perhaps also a configuration variable.) That
would just write MIME headers (charset) as the user wishes.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] git-add -i/-p: learn to unwrap C-quoted paths
  2009-02-16  4:00         ` Junio C Hamano
@ 2009-02-17  7:02           ` Junio C Hamano
  2009-02-17  8:09             ` Teemu Likonen
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2009-02-17  7:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Teemu Likonen, git

The underlying plumbing commands are not run with -z option, so the paths
returned from them need to be unquoted as needed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Junio C Hamano <gitster@pobox.com> writes:

 > I did not spell the specifics out because this change won't happen in any
 > near future anyway, but my thinking was to give a way for "add -p" to
 > either (1) internally run without quotepath regardless of the user's
 > settings or (2) unquote the paths correctly when it learns the set of
 > paths affected by the change.
 >
 > I think the right approach is (2), because you need to unquote pathnames
 > with some byte values that even with core.quotepath=false will not pass
 > unquoted *anyway*.

 I deliberately avoided using :utf8 because pathnames to git are always
 binary text, not necessarily utf8, and for the same reason I dropped the
 "shortcut by first character", as the pathnames are stored as raw
 sequence of bytes and using the first byte is obviously wrong.  It would
 be very much welcomed if somebody feels inclined to test and polish it.

 This hopefully makes the discussion on default value for core.quotepath
 independent of "git-add -i/-p".

 git-add--interactive.perl |   55 ++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 5f129a4..064d4c6 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -3,6 +3,8 @@
 use strict;
 use Git;
 
+binmode(STDOUT, ":raw");
+
 my $repo = Git->repository();
 
 my $menu_use_color = $repo->get_colorbool('color.interactive');
@@ -91,6 +93,47 @@ if (!defined $GIT_DIR) {
 }
 chomp($GIT_DIR);
 
+my %cquote_map = (
+ "b" => chr(8),
+ "t" => chr(9),
+ "n" => chr(10),
+ "v" => chr(11),
+ "f" => chr(12),
+ "r" => chr(13),
+ "\\" => "\\",
+ "\042" => "\042",
+);
+
+sub unquote_path {
+	local ($_) = @_;
+	my ($retval, $remainder);
+	if (!/^\042(.*)\042$/) {
+		return $_;
+	}
+	($_, $retval) = ($1, "");
+	while (/^([^\\]*)\\(.*)$/) {
+		$remainder = $2;
+		$retval .= $1;
+		for ($remainder) {
+			if (/^([0-3][0-7][0-7])(.*)$/) {
+				$retval .= chr(oct($1));
+				$_ = $2;
+				last;
+			}
+			if (/^([\\\042btnvfr])(.*)$/) {
+				$retval .= $cquote_map{$1};
+				$_ = $2;
+				last;
+			}
+			# This is malformed -- just return it as-is for now.
+			return $_[0];
+		}
+		$_ = $remainder;
+	}
+	$retval .= $_;
+	return $retval;
+}
+
 sub refresh {
 	my $fh;
 	open $fh, 'git update-index --refresh |'
@@ -104,7 +147,7 @@ sub refresh {
 sub list_untracked {
 	map {
 		chomp $_;
-		$_;
+		unquote_path($_);
 	}
 	run_cmd_pipe(qw(git ls-files --others --exclude-standard --), @ARGV);
 }
@@ -141,7 +184,8 @@ sub list_modified {
 
 	if (@ARGV) {
 		@tracked = map {
-			chomp $_; $_;
+			chomp $_;
+			unquote_path($_);
 		} run_cmd_pipe(qw(git ls-files --exclude-standard --), @ARGV);
 		return if (!@tracked);
 	}
@@ -153,6 +197,7 @@ sub list_modified {
 		if (($add, $del, $file) =
 		    /^([-\d]+)	([-\d]+)	(.*)/) {
 			my ($change, $bin);
+			$file = unquote_path($file);
 			if ($add eq '-' && $del eq '-') {
 				$change = 'binary';
 				$bin = 1;
@@ -168,6 +213,7 @@ sub list_modified {
 		}
 		elsif (($adddel, $file) =
 		       /^ (create|delete) mode [0-7]+ (.*)$/) {
+			$file = unquote_path($file);
 			$data{$file}{INDEX_ADDDEL} = $adddel;
 		}
 	}
@@ -175,6 +221,7 @@ sub list_modified {
 	for (run_cmd_pipe(qw(git diff-files --numstat --summary --), @tracked)) {
 		if (($add, $del, $file) =
 		    /^([-\d]+)	([-\d]+)	(.*)/) {
+			$file = unquote_path($file);
 			if (!exists $data{$file}) {
 				$data{$file} = +{
 					INDEX => 'unchanged',
@@ -196,6 +243,7 @@ sub list_modified {
 		}
 		elsif (($adddel, $file) =
 		       /^ (create|delete) mode [0-7]+ (.*)$/) {
+			$file = unquote_path($file);
 			$data{$file}{FILE_ADDDEL} = $adddel;
 		}
 	}
@@ -302,7 +350,8 @@ sub find_unique_prefixes {
 			}
 			%search = %{$search{$letter}};
 		}
-		if ($soft_limit && $j + 1 > $soft_limit) {
+		if (ord($letters[0]) > 127 ||
+		    ($soft_limit && $j + 1 > $soft_limit)) {
 			$prefix = undef;
 			$remainder = $ret;
 		}
-- 
1.6.2.rc1.47.g30e1d8

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] git-add -i/-p: learn to unwrap C-quoted paths
  2009-02-17  7:02           ` [PATCH] git-add -i/-p: learn to unwrap C-quoted paths Junio C Hamano
@ 2009-02-17  8:09             ` Teemu Likonen
  0 siblings, 0 replies; 8+ messages in thread
From: Teemu Likonen @ 2009-02-17  8:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On 2009-02-16 23:02 (-0800), Junio C Hamano wrote:

> The underlying plumbing commands are not run with -z option, so the paths
> returned from them need to be unquoted as needed.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

According to my quick tests this seems to work. You may want to squash
this patch in:


diff --git i/Documentation/git-add.txt w/Documentation/git-add.txt
index 7c129cb..6c79a87 100644
--- i/Documentation/git-add.txt
+++ w/Documentation/git-add.txt
@@ -263,13 +263,6 @@ diff::
   This lets you review what will be committed (i.e. between
   HEAD and index).
 
-Bugs
-----
-The interactive mode does not work with files whose names contain
-characters that need C-quoting.  `core.quotepath` configuration can be
-used to work this limitation around to some degree, but backslash,
-double-quote and control characters will still have problems.
-
 SEE ALSO
 --------
 linkgit:git-status[1]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-02-17  8:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-15 18:40 "add -p" + filenames with UTF-8 multibyte characters = "No changes" Antonio García Domínguez
2009-02-15 18:59 ` Teemu Likonen
     [not found]   ` <2b8265360902151100n2eca0182odf9543c1dd8a7f98@mail.gmail.com>
2009-02-15 19:11     ` Teemu Likonen
2009-02-16  3:36       ` Jeff King
2009-02-16  4:00         ` Junio C Hamano
2009-02-17  7:02           ` [PATCH] git-add -i/-p: learn to unwrap C-quoted paths Junio C Hamano
2009-02-17  8:09             ` Teemu Likonen
2009-02-16  5:13         ` "add -p" + filenames with UTF-8 multibyte characters = "No changes" Teemu Likonen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.