git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug: `gitsubmodule` does not list modules with unicode characters
@ 2013-03-23 16:28 Ilya Kulakov
  2013-03-25  8:30 ` Jens Lehmann
  0 siblings, 1 reply; 5+ messages in thread
From: Ilya Kulakov @ 2013-03-23 16:28 UTC (permalink / raw)
  To: git

The `git submodule` commands seem to ignore modules which paths contain
unicode characters.

Consider the following steps to reproduce the problem:

  1. Create a directory with name that contains at least one unicode character
     (e.g. "ûñïçödé-rèpø")

  2. Initialize git repository within this directory

  3. Add this repository as a submodule to another repository so that
     unicode characters will appear in the path to the module
     (e.g. "../ûñïçödé-rèpø")

  4. Check that .gitmodules file is updated and contains record
     about just added module

  5. List submodules with using `git submodule` and find out
     that just added module is not listed

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug: `gitsubmodule` does not list modules with unicode characters
  2013-03-23 16:28 Bug: `gitsubmodule` does not list modules with unicode characters Ilya Kulakov
@ 2013-03-25  8:30 ` Jens Lehmann
  2013-06-08  1:05   ` Fredrik Gustafsson
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Lehmann @ 2013-03-25  8:30 UTC (permalink / raw)
  To: Ilya Kulakov; +Cc: git

Am 23.03.2013 17:28, schrieb Ilya Kulakov:
> The `git submodule` commands seem to ignore modules which paths contain
> unicode characters.
> 
> Consider the following steps to reproduce the problem:
> 
>   1. Create a directory with name that contains at least one unicode character
>      (e.g. "ûñïçödé-rèpø")
> 
>   2. Initialize git repository within this directory
> 
>   3. Add this repository as a submodule to another repository so that
>      unicode characters will appear in the path to the module
>      (e.g. "../ûñïçödé-rèpø")
> 
>   4. Check that .gitmodules file is updated and contains record
>      about just added module
> 
>   5. List submodules with using `git submodule` and find out
>      that just added module is not listed

Thanks for your report. It is known that git submodule does not behave
very well when path names contain special characters. I'll look into
that when I find some time to see if we can easily fix your problem.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug: `gitsubmodule` does not list modules with unicode characters
  2013-03-25  8:30 ` Jens Lehmann
@ 2013-06-08  1:05   ` Fredrik Gustafsson
  2013-06-08  9:18     ` Jonathan Nieder
  0 siblings, 1 reply; 5+ messages in thread
From: Fredrik Gustafsson @ 2013-06-08  1:05 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Ilya Kulakov, git

On Mon, Mar 25, 2013 at 09:30:44AM +0100, Jens Lehmann wrote:
> Am 23.03.2013 17:28, schrieb Ilya Kulakov:
> > The `git submodule` commands seem to ignore modules which paths contain
> > unicode characters.
> > 
> > Consider the following steps to reproduce the problem:
> > 
> >   1. Create a directory with name that contains at least one unicode character
> >      (e.g. "ûñïçödé-rèpø")
> > 
> >   2. Initialize git repository within this directory
> > 
> >   3. Add this repository as a submodule to another repository so that
> >      unicode characters will appear in the path to the module
> >      (e.g. "../ûñïçödé-rèpø")
> > 
> >   4. Check that .gitmodules file is updated and contains record
> >      about just added module
> > 
> >   5. List submodules with using `git submodule` and find out
> >      that just added module is not listed
> 
> Thanks for your report. It is known that git submodule does not behave
> very well when path names contain special characters. I'll look into
> that when I find some time to see if we can easily fix your problem.

I've looked into this a bit.

git ls-files will return all filenames "c-style quoted". Hence the
filename åäö will be returned as "303245303244303266". This is of course
also wrong as it should be "\303\245\303\244\303\266".

However, if we tell git ls-files to use \0 instead of \n for line
termination. We get åäö returned. So how can the choose of line termination
effect the encoding?

Look in quote.c. The following patch will solve this particular problem
(but break other usecases!)

diff --git a/quote.c b/quote.c
index 911229f..2870ca5 100644
--- a/quote.c
+++ b/quote.c
@@ -284,7 +284,7 @@ void quote_two_c_style(struct strbuf *sb, const char *prefix, const char *path,
 void write_name_quoted(const char *name, FILE *fp, int terminator)
 {
 	if (terminator) {
-		quote_c_style(name, NULL, fp, 0);
+		fputs(name, fp);
 	} else {
 		fputs(name, fp);
 	}

Why don't we always print names quoted? IMHO the choose of line
termination should not do anything else than alter the line termination.

However, an other solution would be to use git ls-files -z in
git-submodule.sh and then rewrite the perl-code to handle \0 instead
of \n.

(The same perl-code I wanted to throw away 13 months ago but
Junio wanted to keep because perl can handle \0 and eventually -z should
be used according to him. He was right.)

However, a shortcut would be to the patch below. It will work as long as
there's no newline in the filename (is that really something we want to
support? If not, let's throw away perl and stick with the sed solution
below).

diff --git a/git-submodule.sh b/git-submodule.sh
index 79bfaac..31524d3 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -113,9 +113,10 @@ resolve_relative_url ()
 module_list()
 {
 	(
-		git ls-files --error-unmatch --stage -- "$@" ||
+		git ls-files --error-unmatch --stage -z -- "$@" ||
 		echo "unmatched pathspec exists"
 	) |
+	sed -e 's/\x00/\n/g' |
 	perl -e '
 	my %unmerged = ();
 	my ($null_sha1) = ("0" x 40);

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iveqy@iveqy.com

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Bug: `gitsubmodule` does not list modules with unicode characters
  2013-06-08  1:05   ` Fredrik Gustafsson
@ 2013-06-08  9:18     ` Jonathan Nieder
  2013-06-08 12:06       ` Fredrik Gustafsson
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Nieder @ 2013-06-08  9:18 UTC (permalink / raw)
  To: Fredrik Gustafsson; +Cc: Jens Lehmann, Ilya Kulakov, git

Fredrik Gustafsson wrote:

> I've looked into this a bit.

Thanks for investigating.

[...]
> Why don't we always print names quoted? IMHO the choose of line
> termination should not do anything else than alter the line termination.
>
> However, an other solution would be to use git ls-files -z in
> git-submodule.sh and then rewrite the perl-code to handle \0 instead
> of \n.

The whole point of "-z" is that by using a terminator that is guaranteed
not to appear in filenames, it avoids the need to quote filenames.
Otherwise at least \n would need to be quoted.

How about something like this patch?

-- >8 --
Subject: ls-files doc: clarify purpose of "-z" option

The purpose of the "-z" option is to avoid quoting issues by using a
delimiter that implies a binary-clean parser and cannot appear in
filenames, and in that spirit, "-z" turns off C-style quoting.  But
without looking carefully through the entire manpage, it is too easy
to miss that.

Reported-by: Fredrik Gustafsson <iveqy@iveqy.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-ls-files.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index c0856a6e..753c223f 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -75,7 +75,9 @@ OPTIONS
 	succeed.
 
 -z::
-	\0 line termination on output.
+	Terminate lines with NUL instead of LF.
+	This avoids the need to quote filenames; see the Output section
+	below for details.
 
 -x <pattern>::
 --exclude=<pattern>::
-- 
1.8.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Bug: `gitsubmodule` does not list modules with unicode characters
  2013-06-08  9:18     ` Jonathan Nieder
@ 2013-06-08 12:06       ` Fredrik Gustafsson
  0 siblings, 0 replies; 5+ messages in thread
From: Fredrik Gustafsson @ 2013-06-08 12:06 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Fredrik Gustafsson, Jens Lehmann, Ilya Kulakov, git

On Sat, Jun 08, 2013 at 02:18:36AM -0700, Jonathan Nieder wrote:
> The whole point of "-z" is that by using a terminator that is guaranteed
> not to appear in filenames, it avoids the need to quote filenames.
> Otherwise at least \n would need to be quoted.

Thanks, now I understand why.

> 
> How about something like this patch?
> 
> -- >8 --
> Subject: ls-files doc: clarify purpose of "-z" option
> 
> The purpose of the "-z" option is to avoid quoting issues by using a
> delimiter that implies a binary-clean parser and cannot appear in
> filenames, and in that spirit, "-z" turns off C-style quoting.  But
> without looking carefully through the entire manpage, it is too easy
> to miss that.
> 
> Reported-by: Fredrik Gustafsson <iveqy@iveqy.com>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
>  Documentation/git-ls-files.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index c0856a6e..753c223f 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -75,7 +75,9 @@ OPTIONS
>  	succeed.
>  
>  -z::
> -	\0 line termination on output.
> +	Terminate lines with NUL instead of LF.
> +	This avoids the need to quote filenames; see the Output section
> +	below for details.
>  
>  -x <pattern>::
>  --exclude=<pattern>::
> -- 
> 1.8.3
> 

That would be very helpfull. I would suggest to add something about
unicode also (and maybe about the quotes that's added?). I'm a bit
unsure about the formulating but how about something like this:


From 114c34ea482873b39c02e63eeaf866c3e9ebfc14 Mon Sep 17 00:00:00 2001
From: Jonathan Nieder <jrnieder@gmail.com>
Date: Sat, 8 Jun 2013 02:18:36 -0700
Subject: [PATCH] Subject: ls-files doc: clarify purpose of "-z" option

The purpose of the "-z" option is to avoid quoting issues by using a
delimiter that implies a binary-clean parser and cannot appear in
filenames, and in that spirit, "-z" turns off C-style quoting.  But
without looking carefully through the entire manpage, it is too easy
to miss that.

Reported-by: Fredrik Gustafsson <iveqy@iveqy.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-ls-files.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index c0856a6..ef785ba 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -75,7 +75,9 @@ OPTIONS
 	succeed.
 
 -z::
-	\0 line termination on output.
+	Terminate lines with NUL instead of LF.
+	This avoids the need to quote filenames; see the Output section
+	below for details.
 
 -x <pattern>::
 --exclude=<pattern>::
@@ -172,7 +174,8 @@ path. (see linkgit:git-read-tree[1] for more information on state)
 
 When `-z` option is not used, TAB, LF, and backslash characters
 in pathnames are represented as `\t`, `\n`, and `\\`,
-respectively.
+respectively. Multibyte characters are represented by they escaped
+equivalents.
 
 
 Exclude Patterns
-- 
1.8.1.5


-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iveqy@iveqy.com

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-06-08 12:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-23 16:28 Bug: `gitsubmodule` does not list modules with unicode characters Ilya Kulakov
2013-03-25  8:30 ` Jens Lehmann
2013-06-08  1:05   ` Fredrik Gustafsson
2013-06-08  9:18     ` Jonathan Nieder
2013-06-08 12:06       ` Fredrik Gustafsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).