All of lore.kernel.org
 help / color / mirror / Atom feed
* svn clone Checksum mismatch question
@ 2009-03-26 10:31 Gilbert Liddell
  2009-03-26 13:02 ` Björn Steinbrink
  2009-03-26 14:34 ` Peter Harris
  0 siblings, 2 replies; 28+ messages in thread
From: Gilbert Liddell @ 2009-03-26 10:31 UTC (permalink / raw)
  To: git


Hi,

I've just started using GIT this week, currently the project i'm working on
is held in subversion. I tested git svn clone with a small test project
(about 10 files) which worked a treat.

This morning i decided to test the clone with the full project i'm working
on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
got: e71db1010a0da06ea76d4163c452df72

Can someone help with why this error is happening? Is there an issue with
the GIT clone and large repositories?

Thanks in advance for your help,
Gilbert.
-- 
View this message in context: http://www.nabble.com/svn-clone-Checksum-mismatch-question-tp22719363p22719363.html
Sent from the git mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 10:31 svn clone Checksum mismatch question Gilbert Liddell
@ 2009-03-26 13:02 ` Björn Steinbrink
  2009-03-26 13:28   ` Gilbert Liddell
  2009-03-26 14:35   ` Anton Gyllenberg
  2009-03-26 14:34 ` Peter Harris
  1 sibling, 2 replies; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-26 13:02 UTC (permalink / raw)
  To: Gilbert Liddell; +Cc: git

On 2009.03.26 03:31:53 -0700, Gilbert Liddell wrote:
> This morning i decided to test the clone with the full project i'm working
> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
> got: e71db1010a0da06ea76d4163c452df72
> 
> Can someone help with why this error is happening? Is there an issue with
> the GIT clone and large repositories?

Which git version is that? There was some bug in git-svn that caused it
to fill the disk with temporary files, without noticing that those files
get truncated when the disk is full. That was fixed in some 1.6.0.x
release IIRC.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: svn clone Checksum mismatch question
  2009-03-26 13:02 ` Björn Steinbrink
@ 2009-03-26 13:28   ` Gilbert Liddell
  2009-03-26 13:54     ` Sverre Rabbelier
  2009-03-26 14:35   ` Anton Gyllenberg
  1 sibling, 1 reply; 28+ messages in thread
From: Gilbert Liddell @ 2009-03-26 13:28 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: git

Hi Björn,

Thanks for the reply, i'm using git version 1.6.2.msysgit.0.186.gf7512

Gilbert.

-----Original Message-----
From: Björn Steinbrink [mailto:B.Steinbrink@gmx.de] 
Sent: 26 March 2009 13:02
To: Gilbert Liddell
Cc: git@vger.kernel.org
Subject: Re: svn clone Checksum mismatch question

On 2009.03.26 03:31:53 -0700, Gilbert Liddell wrote:
> This morning i decided to test the clone with the full project i'm working
> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
> got: e71db1010a0da06ea76d4163c452df72
> 
> Can someone help with why this error is happening? Is there an issue with
> the GIT clone and large repositories?

Which git version is that? There was some bug in git-svn that caused it
to fill the disk with temporary files, without noticing that those files
get truncated when the disk is full. That was fixed in some 1.6.0.x
release IIRC.

Björn

Registered in Scotland
32 Fountain Drive
Inchinnan Business Park
Renfrewshire,PA4 9RF.
Company number:SC112872

This e-mail and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom
they are addressed.
If you have received this e-mail in error please notify the
originator of the message. This footer also confirms that this
e-mail message has been scanned for the presence of computer viruses.

Any views expressed in this message are those of the individual
sender, except where the sender specifies and with authority,
states them to be the views of Total Repair Solutions.

Scanning of this message and addition of this footer is performed
by SurfControl E-mail Filter software in conjunction with 
virus detection software.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 13:28   ` Gilbert Liddell
@ 2009-03-26 13:54     ` Sverre Rabbelier
  2009-03-26 14:18       ` Gilbert Liddell
  2009-03-26 14:34       ` Johannes Schindelin
  0 siblings, 2 replies; 28+ messages in thread
From: Sverre Rabbelier @ 2009-03-26 13:54 UTC (permalink / raw)
  To: Gilbert Liddell, Johannes Schindelin, Johannes Sixt
  Cc: Björn Steinbrink, git

Heya,

[We do not top post on this list, instead it is customary to reply
inline, as I and Björn have done]

2009/3/26 Gilbert Liddell <gliddell@totalrepair.co.uk>:
>2009/3/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
>> On 2009.03.26 03:31:53 -0700, Gilbert Liddell wrote:
>>> This morning i decided to test the clone with the full project i'm working
>>> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
>>> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
>>> got: e71db1010a0da06ea76d4163c452df72
>>>
>>> Can someone help with why this error is happening? Is there an issue with
>>> the GIT clone and large repositories?
>>
>> Which git version is that? There was some bug in git-svn that caused it
>> to fill the disk with temporary files, without noticing that those files
>> get truncated when the disk is full. That was fixed in some 1.6.0.x
>> release IIRC.
>
> Thanks for the reply, i'm using git version 1.6.2.msysgit.0.186.gf7512

Seems like it could be one of the known bugs of git-svn on windows?
(ccing Dscho and J6t)

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: svn clone Checksum mismatch question
  2009-03-26 13:54     ` Sverre Rabbelier
@ 2009-03-26 14:18       ` Gilbert Liddell
  2009-03-26 14:34       ` Johannes Schindelin
  1 sibling, 0 replies; 28+ messages in thread
From: Gilbert Liddell @ 2009-03-26 14:18 UTC (permalink / raw)
  To: Sverre Rabbelier, Johannes Schindelin, Johannes Sixt
  Cc: Björn Steinbrink, git

>2009/3/26 Gilbert Liddell <gliddell@totalrepair.co.uk>:
>>2009/3/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
>>> On 2009.03.26 03:31:53 -0700, Gilbert Liddell wrote:
>>>> This morning i decided to test the clone with the full project i'm working
>>>> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
>>>> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
>>>> got: e71db1010a0da06ea76d4163c452df72
>>>>
>>>> Can someone help with why this error is happening? Is there an issue with
>>>> the GIT clone and large repositories?
>>>
>>> Which git version is that? There was some bug in git-svn that caused it
>>> to fill the disk with temporary files, without noticing that those files
>>> get truncated when the disk is full. That was fixed in some 1.6.0.x
>>> release IIRC.
>>
>> Thanks for the reply, i'm using git version 1.6.2.msysgit.0.186.gf7512
>
>Seems like it could be one of the known bugs of git-svn on windows?
> (ccing Dscho and J6t)
>
>-- 
>Cheers,
>
>Sverre Rabbelier

Hi,

Apologies for the Top Posting.
I've not been able to find any info about this being a but with git-svn on Windows. I stumbled across this post that appears to be the same/similar issue - 
http://lists-archives.org/git/668493-git-svn-checksum-mismatch-importing-large-file.html

Gilbert.




Registered in Scotland
32 Fountain Drive
Inchinnan Business Park
Renfrewshire,PA4 9RF.
Company number:SC112872

This e-mail and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom
they are addressed.
If you have received this e-mail in error please notify the
originator of the message. This footer also confirms that this
e-mail message has been scanned for the presence of computer viruses.

Any views expressed in this message are those of the individual
sender, except where the sender specifies and with authority,
states them to be the views of Total Repair Solutions.

Scanning of this message and addition of this footer is performed
by SurfControl E-mail Filter software in conjunction with 
virus detection software.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 10:31 svn clone Checksum mismatch question Gilbert Liddell
  2009-03-26 13:02 ` Björn Steinbrink
@ 2009-03-26 14:34 ` Peter Harris
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Harris @ 2009-03-26 14:34 UTC (permalink / raw)
  To: Gilbert Liddell; +Cc: git

On Thu, Mar 26, 2009 at 6:31 AM, Gilbert Liddell wrote:
>
> This morning i decided to test the clone with the full project i'm working
> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
> got: e71db1010a0da06ea76d4163c452df72
>
> Can someone help with why this error is happening? Is there an issue with
> the GIT clone and large repositories?

(since you mentioned msysgit in another reply) What is your
core.autocrlf setting? Did you default it to 'true' or 'input' when
you installed msysgit?

Try "git config core.autocrlf false" and resume the import process
(with "git svn fetch" or similar).

Importing from svn with autocrlf on only works if every text file has
svn:eol-style=native set in every revision. *.sln files are even
worse, since they look like text to git, but they're really binary (so
nobody sets svn:eol-style on them).

Peter Harris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 13:54     ` Sverre Rabbelier
  2009-03-26 14:18       ` Gilbert Liddell
@ 2009-03-26 14:34       ` Johannes Schindelin
  1 sibling, 0 replies; 28+ messages in thread
From: Johannes Schindelin @ 2009-03-26 14:34 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Gilbert Liddell, Johannes Sixt, Björn Steinbrink, git

Hi,

On Thu, 26 Mar 2009, Sverre Rabbelier wrote:

> Seems like it could be one of the known bugs of git-svn on windows? 
> (ccing Dscho and J6t)

EOUTOFGITTIME,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 13:02 ` Björn Steinbrink
  2009-03-26 13:28   ` Gilbert Liddell
@ 2009-03-26 14:35   ` Anton Gyllenberg
  2009-03-27 11:18     ` Anton Gyllenberg
  1 sibling, 1 reply; 28+ messages in thread
From: Anton Gyllenberg @ 2009-03-26 14:35 UTC (permalink / raw)
  To: Björn Steinbrink, git; +Cc: Gilbert Liddell

2009/3/26 Björn Steinbrink <B.Steinbrink@gmx.de>:
> On 2009.03.26 03:31:53 -0700, Gilbert Liddell wrote:
>> This morning i decided to test the clone with the full project i'm working
>> on (11,000 files) and I get the error message Checksum mismatch: vn2.sln
>> 0f7a82f1d38b819 expected: fde799e5ba0d1d07e6b539016bea3260
>> got: e71db1010a0da06ea76d4163c452df72
>>
>> Can someone help with why this error is happening? Is there an issue with
>> the GIT clone and large repositories?
>
> Which git version is that? There was some bug in git-svn that caused it
> to fill the disk with temporary files, without noticing that those files
> get truncated when the disk is full. That was fixed in some 1.6.0.x
> release IIRC.

I don't know if this is the same issue, but the I get a similar error
on the public twisted-python repository on both windows and linux,
with several different versions and plenty of free disk space. As this
is a publicly accessible repository it should be easy to reproduce:

git svn init -s svn://svn.twistedmatrix.com/svn/Twisted twisted
cd twisted
git svn fetch -r 13611:HEAD

This ultimately dies with the following error:
W: +empty_dir: trunk/doc/core/howto/listings/finger/finger
r13612 = f6d995ac255e3dfa08a517a6e72fbcfe63feaaa0 (trunk)
Checksum mismatch:
branches/foom/--omg-optimized/twisted/internet/cdefer/cdefer.pyx
264b0c5f7b3a00d401d1a5dcce67a3734f0eede3
expected: c7ccddd195f132926e20bab573da7ef3
     got: f006323ff4714ca52c0228ce6390d415


I found this a long time ago but never got around to analyze or report it.

Anton

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-26 14:35   ` Anton Gyllenberg
@ 2009-03-27 11:18     ` Anton Gyllenberg
  2009-03-29  6:08       ` Eric Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Anton Gyllenberg @ 2009-03-27 11:18 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

I hope I didn't hijack the thread with an unrelated issue.

2009/3/26 Anton Gyllenberg <anton@iki.fi>:
> I don't know if this is the same issue, but the I get a similar error
> on the public twisted-python repository on both windows and linux,
> with several different versions and plenty of free disk space. As this
> is a publicly accessible repository it should be easy to reproduce:
>
> git svn init -s svn://svn.twistedmatrix.com/svn/Twisted twisted
> cd twisted
> git svn fetch -r 13611:HEAD
>
> This ultimately dies with the following error:
> W: +empty_dir: trunk/doc/core/howto/listings/finger/finger
> r13612 = f6d995ac255e3dfa08a517a6e72fbcfe63feaaa0 (trunk)
> Checksum mismatch:
> branches/foom/--omg-optimized/twisted/internet/cdefer/cdefer.pyx
> 264b0c5f7b3a00d401d1a5dcce67a3734f0eede3
> expected: c7ccddd195f132926e20bab573da7ef3
>     got: f006323ff4714ca52c0228ce6390d415

Looking into this, the mentioned blob
264b0c5f7b3a00d401d1a5dcce67a3734f0eede3 with md5sum
f006323ff4714ca52c0228ce6390d415 is not at path
branches/foom/--omg-optimized/twisted/internet/cdefer/cdefer.pyx. The
contents of the blob is the seemingly totally unrelated LICENSE file
that is found at trunk/LICENSE and
branches/foom/--omg-optimized/LICENSE. cdefer.pyx does have the md5sum
c7ccddd195f132926e20bab573da7ef3.  Note that the branch root directory
is branches/foom/--omg-optimized (like with the branch name being
foom/--omg-optimized), not just branches/foom. Is think git-svn relies
on the standard layout being branches directly under the branches/
directory, but I don't see how this would get the paths mixed up like
this.

Looking at what was done around this commit one finds odd stuff, like
deleting directories in trunk and then copying from a previous
revision of trunk to under the branch:
http://twistedmatrix.com/trac/changeset/13611

I created a local test svn repository and tried to do something
similar but git-svn had no problem with my test.

This is issue is not critical for me in any way but if somebody wants
to look into it I am happy to help out.

Anton

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-27 11:18     ` Anton Gyllenberg
@ 2009-03-29  6:08       ` Eric Wong
  2009-03-29  6:10         ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Eric Wong
  2009-03-30  7:26         ` svn clone Checksum mismatch question Anton Gyllenberg
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Wong @ 2009-03-29  6:08 UTC (permalink / raw)
  To: Anton Gyllenberg; +Cc: git

Anton Gyllenberg <anton@iki.fi> wrote:
> I hope I didn't hijack the thread with an unrelated issue.

No worries if you did.  You helped me find a stupid bug in git-svn
that's been there for 3 years.  I couldn't help the original poster at
all since he was working on a private repo and I can't support git-svn
under Windows.

> 2009/3/26 Anton Gyllenberg <anton@iki.fi>:
> > I don't know if this is the same issue, but the I get a similar error
> > on the public twisted-python repository on both windows and linux,
> > with several different versions and plenty of free disk space. As this
> > is a publicly accessible repository it should be easy to reproduce:
> >
> > git svn init -s svn://svn.twistedmatrix.com/svn/Twisted twisted
> > cd twisted
> > git svn fetch -r 13611:HEAD
> >
> > This ultimately dies with the following error:
> > W: +empty_dir: trunk/doc/core/howto/listings/finger/finger
> > r13612 = f6d995ac255e3dfa08a517a6e72fbcfe63feaaa0 (trunk)
> > Checksum mismatch:
> > branches/foom/--omg-optimized/twisted/internet/cdefer/cdefer.pyx
> > 264b0c5f7b3a00d401d1a5dcce67a3734f0eede3
> > expected: c7ccddd195f132926e20bab573da7ef3
> >     got: f006323ff4714ca52c0228ce6390d415
> 
> is branches/foom/--omg-optimized (like with the branch name being
> foom/--omg-optimized), not just branches/foom. Is think git-svn relies
> on the standard layout being branches directly under the branches/
> directory, but I don't see how this would get the paths mixed up like
> this.

Root problem: I misused "git ls-tree" for 3 years and nobody noticed.
At least I'm glad the checksum verification every step of the way caught
this bug and prevented propagating it into repository corruption.

> Looking at what was done around this commit one finds odd stuff, like
> deleting directories in trunk and then copying from a previous
> revision of trunk to under the branch:
> http://twistedmatrix.com/trac/changeset/13611
> 
> I created a local test svn repository and tried to do something
> similar but git-svn had no problem with my test.

I was fooled by the weird copy sequences, too.

> This is issue is not critical for me in any way but if somebody wants
> to look into it I am happy to help out.

I guess few folks in the UNIX world are crazy enough to make pathnames
prefixed with dashes :)  But I do wonder how/if many repositories out
there failed and nobody bothered to report it...

Patch in reply

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-29  6:08       ` Eric Wong
@ 2009-03-29  6:10         ` Eric Wong
  2009-03-29 20:33           ` Junio C Hamano
  2009-03-30  7:26         ` svn clone Checksum mismatch question Anton Gyllenberg
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Wong @ 2009-03-29  6:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Anton Gyllenberg

To find the blob object name given a tree and pathname, we were
incorrectly calling "git ls-tree" with a "--" argument followed
by the pathname of the file we wanted to get.

  git ls-tree <TREE> -- --dashed/path/name.c

Unlike many command-line interfaces, the "--" alone does not
symbolize the end of non-option arguments on the command-line.

ls-tree interprets the "--" as a prefix to match against, thus
the entire contents of the --dashed/* hierarchy would be
returned because the "--" matches "--dashed" and every path
under it.

Thanks to Anton Gyllenberg for pointing me toward the
Twisted repository as a real-world example of this case.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---

 Junio: This can go to maint.  Thanks

 git-svn.perl |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 8be6be0..f21cfb4 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -3387,15 +3387,18 @@ sub delete_entry {
 	return undef if ($gpath eq '');
 
 	# remove entire directories.
-	if (command('ls-tree', $self->{c}, '--', $gpath) =~ /^040000 tree/) {
+	my ($tree) = (command('ls-tree', '-z', $self->{c}, "./$gpath")
+	                 =~ /\A040000 tree ([a-f\d]{40})\t\Q$gpath\E\0/);
+	if ($tree) {
 		my ($ls, $ctx) = command_output_pipe(qw/ls-tree
 		                                     -r --name-only -z/,
-				                     $self->{c}, '--', $gpath);
+				                     $tree);
 		local $/ = "\0";
 		while (<$ls>) {
 			chomp;
-			$self->{gii}->remove($_);
-			print "\tD\t$_\n" unless $::_q;
+			my $rmpath = "$gpath/$_";
+			$self->{gii}->remove($rmpath);
+			print "\tD\t$rmpath\n" unless $::_q;
 		}
 		print "\tD\t$gpath/\n" unless $::_q;
 		command_close_pipe($ls, $ctx);
@@ -3414,8 +3417,8 @@ sub open_file {
 	goto out if is_path_ignored($path);
 
 	my $gpath = $self->git_path($path);
-	($mode, $blob) = (command('ls-tree', $self->{c}, '--', $gpath)
-	                     =~ /^(\d{6}) blob ([a-f\d]{40})\t/);
+	($mode, $blob) = (command('ls-tree', '-z', $self->{c}, "./$gpath")
+	                     =~ /\A(\d{6}) blob ([a-f\d]{40})\t\Q$gpath\E\0/);
 	unless (defined $mode && defined $blob) {
 		die "$path was not found in commit $self->{c} (r$rev)\n";
 	}
-- 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-29  6:10         ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Eric Wong
@ 2009-03-29 20:33           ` Junio C Hamano
  2009-03-29 21:56             ` Eric Wong
  2009-03-30  5:28             ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Björn Steinbrink
  0 siblings, 2 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-03-29 20:33 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Anton Gyllenberg

Eric Wong <normalperson@yhbt.net> writes:

> To find the blob object name given a tree and pathname, we were
> incorrectly calling "git ls-tree" with a "--" argument followed
> by the pathname of the file we wanted to get.
>
>   git ls-tree <TREE> -- --dashed/path/name.c
>
> Unlike many command-line interfaces, the "--" alone does not
> symbolize the end of non-option arguments on the command-line.
>
> ls-tree interprets the "--" as a prefix to match against, thus
> the entire contents of the --dashed/* hierarchy would be
> returned because the "--" matches "--dashed" and every path
> under it.

The above makes only half a sense to me.  In an empty directory:

    $ git init
    Initialized empty Git repository in /tmp/empty/.git
    $ mkdir -p ./--dashed/path
    $ >./--dashed/path/name
    $ git add .
    $ git ls-files
    --dashed/path/name
    $ git commit -a -m initial
    [master (root-commit) cd44284] initial
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 --dashed/path/name
    $ git ls-tree HEAD^{tree} --
    $ git ls-tree HEAD^{tree} -- --dashed/path/name
    100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/path/name
    $ mkdir ./--
    $ >./--/eman
    $ git add .
    $ git commit -m second
    [master 80f8ef9] second
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 --/eman
    $ git ls-tree HEAD^{tree} -- --dashed/path
    100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--/eman
    040000 tree 23e59e0c91294c39ac7c5a2e39efb01d878de9a0	--dashed/path
    $ exit

Perhaps the problem repository had a pathname that is exactly -- (in
addition to --dashed/), and ls-tree emitted everything under --/
hierarchy?  In other words, your fix to git-svn may be correct and I am
reading your problem description above incorrectly?

As the command always takes exactly one tree, it could be argued that it
is not a bug that it does not honour the usual -- convention, even though
I am tempted to think it is of a very dark shade of gray.  It is certainly
something that we would have done differently if we were implementing the
command today.

"Fixing" ls-tree would be trivial to ignore the first "--" if it precedes
other pathspecs (see below), but the command is a plumbing, and such a
change will break existing scripts that have relied on the existing
behaviour since 2005, so I do not think it is worth the risk of causing
such silent breakages to them.  Besides, with such a "fix", fixing of user
scripts will become much more cumbersome, as they need to detect the
version of git and drive ls-tree differently.


 builtin-ls-tree.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index 22008df..08c4307 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -186,6 +186,12 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	if (get_sha1(argv[1], sha1))
 		die("Not a valid object name %s", argv[1]);
 
+	if (3 < argc && !strcmp(argv[2], "--")) {
+		/* ls-tree <tree> -- pathspec */
+		argc--;
+		argv++;
+		warning("ignoring -- in 'ls-tree <tree> -- <pathspec>'");
+	}
 	pathspec = get_pathspec(prefix, argv + 2);
 	tree = parse_tree_indirect(sha1);
 	if (!tree)

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-29 20:33           ` Junio C Hamano
@ 2009-03-29 21:56             ` Eric Wong
  2009-03-30  6:44               ` Junio C Hamano
  2009-03-30  5:28             ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Björn Steinbrink
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Wong @ 2009-03-29 21:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Anton Gyllenberg

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> > To find the blob object name given a tree and pathname, we were
> > incorrectly calling "git ls-tree" with a "--" argument followed
> > by the pathname of the file we wanted to get.
> >
> >   git ls-tree <TREE> -- --dashed/path/name.c
> >
> > Unlike many command-line interfaces, the "--" alone does not
> > symbolize the end of non-option arguments on the command-line.
> >
> > ls-tree interprets the "--" as a prefix to match against, thus
> > the entire contents of the --dashed/* hierarchy would be
> > returned because the "--" matches "--dashed" and every path
> > under it.
> 
> The above makes only half a sense to me.  In an empty directory:

Ah, I think you missed this line:

"the entire contents of the --dashed/* hierarchy would be"

>     $ git init
>     Initialized empty Git repository in /tmp/empty/.git
>     $ mkdir -p ./--dashed/path
>     $ >./--dashed/path/name

# Add a second file
	>./--dashed/path/ame

>     $ git add .
>     $ git ls-files
>     --dashed/path/name
>     $ git commit -a -m initial
>     [master (root-commit) cd44284] initial
>      0 files changed, 0 insertions(+), 0 deletions(-)
>      create mode 100644 --dashed/path/name
>     $ git ls-tree HEAD^{tree} --
>     $ git ls-tree HEAD^{tree} -- --dashed/path/name
>     100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/path/name
>     $ mkdir ./--
>     $ >./--/eman
>     $ git add .
>     $ git commit -m second
>     [master 80f8ef9] second
>      0 files changed, 0 insertions(+), 0 deletions(-)
>      create mode 100644 --/eman
>     $ git ls-tree HEAD^{tree} -- --dashed/path
>     100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--/eman
>     040000 tree 23e59e0c91294c39ac7c5a2e39efb01d878de9a0	--dashed/path

This is similar to the problem I was experiencing.

>     $ exit
> 
> Perhaps the problem repository had a pathname that is exactly -- (in
> addition to --dashed/), and ls-tree emitted everything under --/
> hierarchy?  In other words, your fix to git-svn may be correct and I am
> reading your problem description above incorrectly?

I think so.

> As the command always takes exactly one tree, it could be argued that it
> is not a bug that it does not honour the usual -- convention, even though
> I am tempted to think it is of a very dark shade of gray.  It is certainly
> something that we would have done differently if we were implementing the
> command today.

Well, if somebody had a path in their repo called "--full-name" then it
would certainly be ambiguous and respecting "--" would help.  Something
we should definitely go back and fix if we have time travel[1]

> "Fixing" ls-tree would be trivial to ignore the first "--" if it precedes
> other pathspecs (see below), but the command is a plumbing, and such a
> change will break existing scripts that have relied on the existing
> behaviour since 2005, so I do not think it is worth the risk of causing
> such silent breakages to them.  Besides, with such a "fix", fixing of user
> scripts will become much more cumbersome, as they need to detect the
> version of git and drive ls-tree differently.

I concur completely.  I didn't propose a "fix" to ls-tree for exactly
the reasons you stated.


[1] But if we had time travel we could just release git before any other
SCM and hopefully not have to deal with SVN at all :)

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-29 20:33           ` Junio C Hamano
  2009-03-29 21:56             ` Eric Wong
@ 2009-03-30  5:28             ` Björn Steinbrink
  1 sibling, 0 replies; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-30  5:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git, Anton Gyllenberg

On 2009.03.29 13:33:02 -0700, Junio C Hamano wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> > To find the blob object name given a tree and pathname, we were
> > incorrectly calling "git ls-tree" with a "--" argument followed
> > by the pathname of the file we wanted to get.
> >
> >   git ls-tree <TREE> -- --dashed/path/name.c
> >
> > Unlike many command-line interfaces, the "--" alone does not
> > symbolize the end of non-option arguments on the command-line.
> >
> > ls-tree interprets the "--" as a prefix to match against, thus
> > the entire contents of the --dashed/* hierarchy would be
> > returned because the "--" matches "--dashed" and every path
> > under it.
> 
> The above makes only half a sense to me.  In an empty directory:
> 
>     $ git init
>     Initialized empty Git repository in /tmp/empty/.git
>     $ mkdir -p ./--dashed/path
>     $ >./--dashed/path/name
>     $ git add .
>     $ git ls-files
>     --dashed/path/name
>     $ git commit -a -m initial
>     [master (root-commit) cd44284] initial
>      0 files changed, 0 insertions(+), 0 deletions(-)
>      create mode 100644 --dashed/path/name
>     $ git ls-tree HEAD^{tree} --
>     $ git ls-tree HEAD^{tree} -- --dashed/path/name
>     100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/path/name
>     $ mkdir ./--
>     $ >./--/eman
>     $ git add .
>     $ git commit -m second
>     [master 80f8ef9] second
>      0 files changed, 0 insertions(+), 0 deletions(-)
>      create mode 100644 --/eman
>     $ git ls-tree HEAD^{tree} -- --dashed/path
>     100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--/eman
>     040000 tree 23e59e0c91294c39ac7c5a2e39efb01d878de9a0	--dashed/path
>     $ exit
> 
> Perhaps the problem repository had a pathname that is exactly -- (in
> addition to --dashed/), and ls-tree emitted everything under --/
> hierarchy?  In other words, your fix to git-svn may be correct and I am
> reading your problem description above incorrectly?

Your test case is flawed, because you only have a single path in
--dashed/

Initialized empty Git repository in /home/doener/test/.git/
$ mkdir ./--dashed
$ touch ./--dashed/{1,2}
$ git add .
$ git ls-files
--dashed/1
--dashed/2
$ git commit -m init
[master (root-commit) ae7cd83] init
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 --dashed/1
 create mode 100644 --dashed/2
$ git ls-tree HEAD^{tree}
040000 tree f353b342b53872c6a510229524f819c4fe0d5c1b	--dashed
$ git ls-tree HEAD^{tree} --
$ git ls-tree HEAD^{tree} -- --dashed
040000 tree f353b342b53872c6a510229524f819c4fe0d5c1b	--dashed
$ git ls-tree HEAD^{tree} -- --dashed/
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/2
$ git ls-tree HEAD^{tree} -- --dashed/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	--dashed/2

Or even more weird (at least to me):

Initialized empty Git repository in /home/doener/test/.git/
$ mkdir foo fab
$ touch {foo,fab}/{1,2}
$ git add .
$ git commit -m init
[master (root-commit) fdb7bb3] init
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 fab/1
 create mode 100644 fab/2
 create mode 100644 foo/1
 create mode 100644 foo/2
$ git ls-files foo/1 fab/1
fab/1
foo/1
$ git ls-files foo/1 fab/1 f
fab/1
foo/1
$ git ls-tree HEAD^{tree} foo/1 fab/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	fab/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	foo/1
$ git ls-tree HEAD^{tree} foo/1 fab/1 f
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	fab/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	fab/2
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	foo/1
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	foo/2

So if you go into some tree, any additional pattern that is a prefix of
the tree name will match the tree and its contents.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-29 21:56             ` Eric Wong
@ 2009-03-30  6:44               ` Junio C Hamano
  2009-03-30 17:41                 ` Eric Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2009-03-30  6:44 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Anton Gyllenberg

Eric Wong <normalperson@yhbt.net> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <normalperson@yhbt.net> writes:
>> 
>> > To find the blob object name given a tree and pathname, we were
>> > incorrectly calling "git ls-tree" with a "--" argument followed
>> > by the pathname of the file we wanted to get.
>> >
>> >   git ls-tree <TREE> -- --dashed/path/name.c
>> >
>> > Unlike many command-line interfaces, the "--" alone does not
>> > symbolize the end of non-option arguments on the command-line.
>> >
>> > ls-tree interprets the "--" as a prefix to match against, thus
>> > the entire contents of the --dashed/* hierarchy would be
>> > returned because the "--" matches "--dashed" and every path
>> > under it.
>> 
>> The above makes only half a sense to me.  In an empty directory:
>
> Ah, I think you missed this line:
>
> "the entire contents of the --dashed/* hierarchy would be"

Actually, that was what I was trying to demonstrate to be false.  Notice
the empty output from the first ls-tree with only -- and no other pathspec
on the command line.  "--" should not match "--dashed/*" anything (but
also notice that I said "should" here).

>>     $ git init
>>     Initialized empty Git repository in /tmp/empty/.git
>>     $ mkdir -p ./--dashed/path
>>     $ >./--dashed/path/name
>
> # Add a second file
> 	>./--dashed/path/ame

I think that is an independent bug.  Not just "--" but it appears "--d"
seems to hit it (and this is an ancient bug---even v1.0.0 seems to have
it).

> [1] But if we had time travel we could just release git before any other
> SCM and hopefully not have to deal with SVN at all :)

;-)

I suspect that ls-tree needs a fix, not about "--" but about the pathspec
filtering.  It appears that the part that decides if a subtree is worth
traversing into uses the correct "is a pathspec pattern match leading path
components?" semantics (i.e. "--dashed" matches but "--" doesn't), but
after traversing into subtrees, the part that emits the output uses a
broken semantics "does the path have any pathspec patter as its prefix?"
It shouldn't check for "prefix", but for "leading path components", in
other words, the match must happen at directory boundaries.

And I do not think *this* bug is too late to fix.  We should fix it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: svn clone Checksum mismatch question
  2009-03-29  6:08       ` Eric Wong
  2009-03-29  6:10         ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Eric Wong
@ 2009-03-30  7:26         ` Anton Gyllenberg
  1 sibling, 0 replies; 28+ messages in thread
From: Anton Gyllenberg @ 2009-03-30  7:26 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Makes perfect sense now that you explain it and I see the patch. I
actually tried debugging it a bit but got lost in the perl SVN code.
Thank you Eric for figuring this out and for all your work on git-svn
in general!

> I guess few folks in the UNIX world are crazy enough to make pathnames
> prefixed with dashes :)

I guess they wouldn't call the project Twisted without reason :-).

Anton

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-30  6:44               ` Junio C Hamano
@ 2009-03-30 17:41                 ` Eric Wong
  2009-03-30 18:05                   ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wong @ 2009-03-30 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Anton Gyllenberg

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> > Junio C Hamano <gitster@pobox.com> wrote:
> >> Eric Wong <normalperson@yhbt.net> writes:
> >> 
> >> > To find the blob object name given a tree and pathname, we were
> >> > incorrectly calling "git ls-tree" with a "--" argument followed
> >> > by the pathname of the file we wanted to get.
> >> >
> >> >   git ls-tree <TREE> -- --dashed/path/name.c
> >> >
> >> > Unlike many command-line interfaces, the "--" alone does not
> >> > symbolize the end of non-option arguments on the command-line.
> >> >
> >> > ls-tree interprets the "--" as a prefix to match against, thus
> >> > the entire contents of the --dashed/* hierarchy would be
> >> > returned because the "--" matches "--dashed" and every path
> >> > under it.
> >> 
> >> The above makes only half a sense to me.  In an empty directory:
> >
> > Ah, I think you missed this line:
> >
> > "the entire contents of the --dashed/* hierarchy would be"
> 
> Actually, that was what I was trying to demonstrate to be false.  Notice
> the empty output from the first ls-tree with only -- and no other pathspec
> on the command line.  "--" should not match "--dashed/*" anything (but
> also notice that I said "should" here).
> 
> >>     $ git init
> >>     Initialized empty Git repository in /tmp/empty/.git
> >>     $ mkdir -p ./--dashed/path
> >>     $ >./--dashed/path/name
> >
> > # Add a second file
> > 	>./--dashed/path/ame
> 
> I think that is an independent bug.  Not just "--" but it appears "--d"
> seems to hit it (and this is an ancient bug---even v1.0.0 seems to have
> it).

> I suspect that ls-tree needs a fix, not about "--" but about the pathspec
> filtering.  It appears that the part that decides if a subtree is worth
> traversing into uses the correct "is a pathspec pattern match leading path
> components?" semantics (i.e. "--dashed" matches but "--" doesn't), but
> after traversing into subtrees, the part that emits the output uses a
> broken semantics "does the path have any pathspec patter as its prefix?"
> It shouldn't check for "prefix", but for "leading path components", in
> other words, the match must happen at directory boundaries.
> 
> And I do not think *this* bug is too late to fix.  We should fix it.

>From the ls-tree documentation, I was under the impression that "--"
matching "--dashed" was intended:

  When paths are given, show them (note that this isn't really raw
  pathnames, but rather a list of patterns to match).

It doesn't make sense to me match like this, either; but I do think it
was intended and it will break things if people depend on the
existing behavior.

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-30 17:41                 ` Eric Wong
@ 2009-03-30 18:05                   ` Junio C Hamano
  2009-03-30 22:58                     ` Eric Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2009-03-30 18:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Anton Gyllenberg

Eric Wong <normalperson@yhbt.net> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> I think that is an independent bug.  Not just "--" but it appears "--d"
>> seems to hit it (and this is an ancient bug---even v1.0.0 seems to have
>> it).
>
>> I suspect that ls-tree needs a fix, not about "--" but about the pathspec
>> filtering.  It appears that the part that decides if a subtree is worth
>> traversing into uses the correct "is a pathspec pattern match leading path
>> components?" semantics (i.e. "--dashed" matches but "--" doesn't), but
>> after traversing into subtrees, the part that emits the output uses a
>> broken semantics "does the path have any pathspec patter as its prefix?"
>> It shouldn't check for "prefix", but for "leading path components", in
>> other words, the match must happen at directory boundaries.
>> 
>> And I do not think *this* bug is too late to fix.  We should fix it.
>
> From the ls-tree documentation, I was under the impression that "--"
> matching "--dashed" was intended:
>
>   When paths are given, show them (note that this isn't really raw
>   pathnames, but rather a list of patterns to match).
>
> It doesn't make sense to me match like this, either; but I do think it
> was intended and it will break things if people depend on the
> existing behavior.

Ok, but then the decision to descend into --dashed should be consistent
with that policy, no?  Right now, it appears that giving "--" alone says
"Anything under --dashed can never match that pattern, so I wouldn't
bother recursing into it".

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-30 18:05                   ` Junio C Hamano
@ 2009-03-30 22:58                     ` Eric Wong
  2009-03-31  7:11                       ` Björn Steinbrink
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wong @ 2009-03-30 22:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Anton Gyllenberg, Björn Steinbrink

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> > Junio C Hamano <gitster@pobox.com> wrote:
> >
> >> I think that is an independent bug.  Not just "--" but it appears "--d"
> >> seems to hit it (and this is an ancient bug---even v1.0.0 seems to have
> >> it).
> >
> >> I suspect that ls-tree needs a fix, not about "--" but about the pathspec
> >> filtering.  It appears that the part that decides if a subtree is worth
> >> traversing into uses the correct "is a pathspec pattern match leading path
> >> components?" semantics (i.e. "--dashed" matches but "--" doesn't), but
> >> after traversing into subtrees, the part that emits the output uses a
> >> broken semantics "does the path have any pathspec patter as its prefix?"
> >> It shouldn't check for "prefix", but for "leading path components", in
> >> other words, the match must happen at directory boundaries.
> >> 
> >> And I do not think *this* bug is too late to fix.  We should fix it.
> >
> > From the ls-tree documentation, I was under the impression that "--"
> > matching "--dashed" was intended:
> >
> >   When paths are given, show them (note that this isn't really raw
> >   pathnames, but rather a list of patterns to match).
> >
> > It doesn't make sense to me match like this, either; but I do think it
> > was intended and it will break things if people depend on the
> > existing behavior.
> 
> Ok, but then the decision to descend into --dashed should be consistent
> with that policy, no?  Right now, it appears that giving "--" alone says
> "Anything under --dashed can never match that pattern, so I wouldn't
> bother recursing into it".

Right.  Except in the case when there are multiple files inside --dashed/
as Björn's email illustrated.  So there seems to be a bug in the way
the number of files inside --dashed/ affects what "--" does when used
with "--dashed/1" (if --dashed/2 also exists).  Very confusing :x

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-30 22:58                     ` Eric Wong
@ 2009-03-31  7:11                       ` Björn Steinbrink
  2009-03-31  7:31                         ` Björn Steinbrink
  0 siblings, 1 reply; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-31  7:11 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Anton Gyllenberg

On 2009.03.30 15:58:34 -0700, Eric Wong wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > Eric Wong <normalperson@yhbt.net> writes:
> > > From the ls-tree documentation, I was under the impression that "--"
> > > matching "--dashed" was intended:
> > >
> > >   When paths are given, show them (note that this isn't really raw
> > >   pathnames, but rather a list of patterns to match).
> > >
> > > It doesn't make sense to me match like this, either; but I do think it
> > > was intended and it will break things if people depend on the
> > > existing behavior.

I guess that paragraph was meant to explain why "git ls-tree HEAD
Documentation" and "git ls-tree HEAD Documentation/" give different
results.  The first one shows the entry for the tree object, while the
second one shows the contents of the tree object. In contrast to "ls"
which would descend into the directory in both cases.

> > Ok, but then the decision to descend into --dashed should be consistent
> > with that policy, no?  Right now, it appears that giving "--" alone says
> > "Anything under --dashed can never match that pattern, so I wouldn't
> > bother recursing into it".
> 
> Right.  Except in the case when there are multiple files inside --dashed/
> as Björn's email illustrated.  So there seems to be a bug in the way
> the number of files inside --dashed/ affects what "--" does when used
> with "--dashed/1" (if --dashed/2 also exists).  Very confusing :x

It's not the number of files that matters. With just one file, you just
don't notice the buggy behaviour, because showing all files is the same
as showing the specified file.

And interestingly, the problem doesn't seem to be in
show_tree/show_recursive, but in match_tree_entry.

With "git ls-tree HEAD gitweb/git-favicon.png g" we descend into gitweb/
and at some point we get:

match = "g"
base = "gitweb/"

And we have:
if (baselen >= matchlen) {
	if (strncmp(base, match, matchlen))
		continue;
	/* The base is a subdirectory of a path which was specified */
	return 1;
}

So we return 1 there. The code doesn't do what the comment says, so I
guess we can be pretty sure that the behaviour is not intended.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-31  7:11                       ` Björn Steinbrink
@ 2009-03-31  7:31                         ` Björn Steinbrink
  2009-03-31  9:41                           ` Björn Steinbrink
  0 siblings, 1 reply; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-31  7:31 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Anton Gyllenberg

On 2009.03.31 09:11:00 +0200, Björn Steinbrink wrote:
> And interestingly, the problem doesn't seem to be in
> show_tree/show_recursive, but in match_tree_entry.
> 
> With "git ls-tree HEAD gitweb/git-favicon.png g" we descend into gitweb/
> and at some point we get:
> 
> match = "g"
> base = "gitweb/"
> 
> And we have:
> if (baselen >= matchlen) {
> 	if (strncmp(base, match, matchlen))
> 		continue;
> 	/* The base is a subdirectory of a path which was specified */
> 	return 1;
> }
> 
> So we return 1 there. The code doesn't do what the comment says, so I
> guess we can be pretty sure that the behaviour is not intended.

Yup, it's in match_tree_entry, you get the same thing with git show.
With git.git, you can try with:

git show 4fa535a -- Documentation/git-merge.txt D

I'll try to get a patch done, if noone beats me to it.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths
  2009-03-31  7:31                         ` Björn Steinbrink
@ 2009-03-31  9:41                           ` Björn Steinbrink
  2009-03-31 15:05                             ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
  0 siblings, 1 reply; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-31  9:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Anton Gyllenberg

On 2009.03.31 09:31:47 +0200, Björn Steinbrink wrote:
> On 2009.03.31 09:11:00 +0200, Björn Steinbrink wrote:
> > And interestingly, the problem doesn't seem to be in
> > show_tree/show_recursive, but in match_tree_entry.
> > 
> > With "git ls-tree HEAD gitweb/git-favicon.png g" we descend into gitweb/
> > and at some point we get:
> > 
> > match = "g"
> > base = "gitweb/"
> > 
> > And we have:
> > if (baselen >= matchlen) {
> > 	if (strncmp(base, match, matchlen))
> > 		continue;
> > 	/* The base is a subdirectory of a path which was specified */
> > 	return 1;
> > }
> > 
> > So we return 1 there. The code doesn't do what the comment says, so I
> > guess we can be pretty sure that the behaviour is not intended.
> 
> Yup, it's in match_tree_entry, you get the same thing with git show.
> With git.git, you can try with:
> 
> git show 4fa535a -- Documentation/git-merge.txt D
> 
> I'll try to get a patch done, if noone beats me to it.

Ah, crap, "git show" actually uses a different function,
tree_entry_interesting, which happens to have the same problem, but
needs a slightly different fix.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component
  2009-03-31  9:41                           ` Björn Steinbrink
@ 2009-03-31 15:05                             ` Björn Steinbrink
  2009-04-02  4:32                               ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Björn Steinbrink @ 2009-03-31 15:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, Anton Gyllenberg, git

Previously the code did a simple prefix match, which means that it
treated for example "foo/" as a subdirectory of "f".

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
---
I'm not exactly happy with the commit message, but that's the best I
could come up with. Probably shows how little I know about that code :-/
The test suite still passes and I'll try to provide a new testcase
tonight or tommorow.

 tree-diff.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 9f67af6..b05d0f4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -118,10 +118,16 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 				continue;
 
 			/*
-			 * The base is a subdirectory of a path which
-			 * was specified, so all of them are interesting.
+			 * If the base is a subdirectory of a path which
+			 * was specified, all of them are interesting.
 			 */
-			return 2;
+			if (!matchlen ||
+			    base[matchlen] == '/' ||
+			    match[matchlen - 1] == '/')
+				return 2;
+
+			/* Just a random prefix match */
+			continue;
 		}
 
 		/* Does the base match? */
-- 
1.6.2.1.426.gf94cd

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component
  2009-03-31 15:05                             ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
@ 2009-04-02  4:32                               ` Junio C Hamano
  2009-04-02  4:41                                 ` [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries Junio C Hamano
  2009-04-02 11:38                                 ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
  0 siblings, 2 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-04-02  4:32 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Eric Wong, Anton Gyllenberg, git

Björn Steinbrink <B.Steinbrink@gmx.de> writes:

> Previously the code did a simple prefix match, which means that it
> treated for example "foo/" as a subdirectory of "f".
>
> Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
> ---
> I'm not exactly happy with the commit message, but that's the best I
> could come up with. Probably shows how little I know about that code :-/
> The test suite still passes and I'll try to provide a new testcase
> tonight or tommorow.

I'm planning to queue this.

From: Björn Steinbrink <B.Steinbrink@gmx.de>
Date: Tue, 31 Mar 2009 17:05:01 +0200
Subject: [PATCH] tree_entry_interesting: a pathspec only matches at directory boundary

Previously the code did a simple prefix match, which means that a
path in a directory "frotz/" would have matched with pathspec "f".

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t4010-diff-pathspec.sh |    8 ++++++++
 tree-diff.c              |   12 +++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index ad3d9e4..4c4c8b1 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -62,4 +62,12 @@ test_expect_success \
     'git diff-index --cached $tree -- file0/ >current &&
      compare_diff_raw current expected'
 
+test_expect_success 'diff-tree pathspec' '
+	tree2=$(git write-tree) &&
+	echo "$tree2" &&
+	git diff-tree -r --name-only $tree $tree2 -- pa path1/a >current &&
+	>expected &&
+	test_cmp expected current
+'
+
 test_done
diff --git a/tree-diff.c b/tree-diff.c
index 9f67af6..b05d0f4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -118,10 +118,16 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 				continue;
 
 			/*
-			 * The base is a subdirectory of a path which
-			 * was specified, so all of them are interesting.
+			 * If the base is a subdirectory of a path which
+			 * was specified, all of them are interesting.
 			 */
-			return 2;
+			if (!matchlen ||
+			    base[matchlen] == '/' ||
+			    match[matchlen - 1] == '/')
+				return 2;
+
+			/* Just a random prefix match */
+			continue;
 		}
 
 		/* Does the base match? */

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries
  2009-04-02  4:32                               ` Junio C Hamano
@ 2009-04-02  4:41                                 ` Junio C Hamano
  2009-04-02 16:36                                   ` Linus Torvalds
  2009-04-02 11:38                                 ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2009-04-02  4:41 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Eric Wong, Anton Gyllenberg, git, Linus Torvalds

Previously the code did a simple prefix match, which means that a path in
a directory "frotz/" would have matched with pathspec "f".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * And this is a companion patch to fix ls-tree.  The test case uses a
   tree that has path3/1.txt and path3/2.txt in it.

   The bug Eric diagnosed and worked around in git-svn makes the current
   code show these two paths when pathspec "pa" and "path3/a" are given.
   The presense of "path3/a" makes the tree walker traverse down to path3
   subtree (in case something that matches "a" is in there---this is a
   correct behaviour), but then in that subtree, "pa" incorrectly matches
   "path3/1.txt".

   This logic dates back to 0ca14a5 (Start adding interfaces to read in
   partial trees, 2005-07-14).  I think it is just a simple oversight and
   we should fix it.

 t/t3101-ls-tree-dirname.sh |    6 ++++++
 tree.c                     |    8 ++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t3101-ls-tree-dirname.sh b/t/t3101-ls-tree-dirname.sh
index 4dd7d12..51cb4a3 100755
--- a/t/t3101-ls-tree-dirname.sh
+++ b/t/t3101-ls-tree-dirname.sh
@@ -135,4 +135,10 @@ test_expect_success \
 EOF
      test_output'
 
+test_expect_success 'ls-tree filter is leading path match' '
+	git ls-tree $tree pa path3/a >current &&
+	>expected &&
+	test_output
+'
+
 test_done
diff --git a/tree.c b/tree.c
index 03e782a..d82a047 100644
--- a/tree.c
+++ b/tree.c
@@ -60,8 +60,12 @@ static int match_tree_entry(const char *base, int baselen, const char *path, uns
 			/* If it doesn't match, move along... */
 			if (strncmp(base, match, matchlen))
 				continue;
-			/* The base is a subdirectory of a path which was specified. */
-			return 1;
+			/* pathspecs match only at the directory boundaries */
+			if (!matchlen ||
+			    base[matchlen] == '/' ||
+			    match[matchlen - 1] == '/')
+				return 1;
+			continue;
 		}
 
 		/* Does the base match? */
-- 
1.6.2.1.483.gcc994

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component
  2009-04-02  4:32                               ` Junio C Hamano
  2009-04-02  4:41                                 ` [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries Junio C Hamano
@ 2009-04-02 11:38                                 ` Björn Steinbrink
  2009-04-03 16:25                                   ` Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Björn Steinbrink @ 2009-04-02 11:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, Anton Gyllenberg, git

On 2009.04.01 21:32:05 -0700, Junio C Hamano wrote:
> I'm planning to queue this.
> [Improved version of my patch]

Ah, thanks. Got busy with other stuff, and tried to fix another related
bug in ls-tree which made me forget to send the match_tree_entry fix :-/

That other ls-tree bug is with recursing into subdirectories, because
match_tree_entry matches even when the base is a subdirectory, but
ls-tree doesn't actually want that behaviour, I think. For example:

$ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl
100644 blob ddbe633	git-gui/macosx/AppMain.tcl

$ git ls-tree --abbrev HEAD  git-gui/
100644 blob f96112d	git-gui/.gitattributes
100644 blob 6483b21	git-gui/.gitignore
100755 blob b3f937e	git-gui/GIT-VERSION-GEN
100644 blob 3ad8a21	git-gui/Makefile
100755 blob 12e117e	git-gui/git-gui--askpass
100755 blob e018e07	git-gui/git-gui.sh
040000 tree f723285	git-gui/lib
040000 tree 73f3c34	git-gui/macosx
040000 tree 11cd1a0	git-gui/po
040000 tree 144728d	git-gui/windows

$ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl git-gui/
100644 blob f96112d	git-gui/.gitattributes
100644 blob 6483b21	git-gui/.gitignore
100755 blob b3f937e	git-gui/GIT-VERSION-GEN
100644 blob 3ad8a21	git-gui/Makefile
100755 blob 12e117e	git-gui/git-gui--askpass
100755 blob e018e07	git-gui/git-gui.sh
040000 tree f723285	git-gui/lib
100644 blob ddbe633	git-gui/macosx/AppMain.tcl
100644 blob b3bf15f	git-gui/macosx/Info.plist
100644 blob 77d88a7	git-gui/macosx/git-gui.icns
040000 tree 11cd1a0	git-gui/po
040000 tree 144728d	git-gui/windows

The last ls-tree shows all entries from git-gui/macosx/, beacuse the
first pattern makes it descend into that tree and all entries are
matched by the git-gui/ prefix. So the combined ls-tree shows more than
what the individual calls show. Seems wrong to me, but I'm unsure how to
tackle that, assuming that match_tree_entry is right in allowing any
base that's a subdirectory of a specified pathspec.

Björn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries
  2009-04-02  4:41                                 ` [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries Junio C Hamano
@ 2009-04-02 16:36                                   ` Linus Torvalds
  0 siblings, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2009-04-02 16:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Björn Steinbrink, Eric Wong, Anton Gyllenberg, git



On Wed, 1 Apr 2009, Junio C Hamano wrote:
>
> Previously the code did a simple prefix match, which means that a path in
> a directory "frotz/" would have matched with pathspec "f".
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

I have this suspicion that we should be able to write it more readably, 
but yes:

	Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

because the current code is clearly buggy.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component
  2009-04-02 11:38                                 ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
@ 2009-04-03 16:25                                   ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-04-03 16:25 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Eric Wong, Anton Gyllenberg, git

Björn Steinbrink <B.Steinbrink@gmx.de> writes:

> On 2009.04.01 21:32:05 -0700, Junio C Hamano wrote:
>> I'm planning to queue this.
>> [Improved version of my patch]
>
> Ah, thanks. Got busy with other stuff, and tried to fix another related
> bug in ls-tree which made me forget to send the match_tree_entry fix :-/
>
> That other ls-tree bug is with recursing into subdirectories, because
> match_tree_entry matches even when the base is a subdirectory, but
> ls-tree doesn't actually want that behaviour, I think. For example:
>
> $ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl
> 100644 blob ddbe633	git-gui/macosx/AppMain.tcl
>
> $ git ls-tree --abbrev HEAD  git-gui/
> 100644 blob f96112d	git-gui/.gitattributes
> 100644 blob 6483b21	git-gui/.gitignore
> 100755 blob b3f937e	git-gui/GIT-VERSION-GEN
> 100644 blob 3ad8a21	git-gui/Makefile
> 100755 blob 12e117e	git-gui/git-gui--askpass
> 100755 blob e018e07	git-gui/git-gui.sh
> 040000 tree f723285	git-gui/lib
> 040000 tree 73f3c34	git-gui/macosx
> 040000 tree 11cd1a0	git-gui/po
> 040000 tree 144728d	git-gui/windows
>
> $ git ls-tree --abbrev HEAD git-gui/macosx/AppMain.tcl git-gui/
> 100644 blob f96112d	git-gui/.gitattributes
> 100644 blob 6483b21	git-gui/.gitignore
> 100755 blob b3f937e	git-gui/GIT-VERSION-GEN
> 100644 blob 3ad8a21	git-gui/Makefile
> 100755 blob 12e117e	git-gui/git-gui--askpass
> 100755 blob e018e07	git-gui/git-gui.sh
> 040000 tree f723285	git-gui/lib
> 100644 blob ddbe633	git-gui/macosx/AppMain.tcl
> 100644 blob b3bf15f	git-gui/macosx/Info.plist
> 100644 blob 77d88a7	git-gui/macosx/git-gui.icns
> 040000 tree 11cd1a0	git-gui/po
> 040000 tree 144728d	git-gui/windows
>
> The last ls-tree shows all entries from git-gui/macosx/, beacuse the
> first pattern makes it descend into that tree and all entries are
> matched by the git-gui/ prefix. So the combined ls-tree shows more than
> what the individual calls show. Seems wrong to me, but I'm unsure how to
> tackle that, assuming that match_tree_entry is right in allowing any
> base that's a subdirectory of a specified pathspec.

What might be confusing is how ls-tree, when run without an explicit
option -r (ecurse), comes up with candidates to show.  With pathspecs, you
are instructing it to recurse into subtrees as deep as needed to discover
all the paths that can match them, and your first pathspec causes the
contents of macosx subtree to be looked at for this reason.

Pathspecs are always OR'ed together, never AND'ed, and your second
pathspec git-gui covers them, so I do not think it is a bug.

If we had a glob support in the "leading path components" pathspec matcher
used by ls-tree/diff-tree, I would imagine you could do something like;

	ls-tree --recurse-to='*/*/*' $other $pathspec

to say "recurse three levels, but do not use */*/* as a filter to decide
what to show; the filters are only $other and $pathspec", iow, using

	(AND "*/*/*" (OR $other $pathspec))

semantics.  But I think this kind of complexity falls out of the "useful"
category and into a "mental exercise" category.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2009-04-03 16:28 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-26 10:31 svn clone Checksum mismatch question Gilbert Liddell
2009-03-26 13:02 ` Björn Steinbrink
2009-03-26 13:28   ` Gilbert Liddell
2009-03-26 13:54     ` Sverre Rabbelier
2009-03-26 14:18       ` Gilbert Liddell
2009-03-26 14:34       ` Johannes Schindelin
2009-03-26 14:35   ` Anton Gyllenberg
2009-03-27 11:18     ` Anton Gyllenberg
2009-03-29  6:08       ` Eric Wong
2009-03-29  6:10         ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Eric Wong
2009-03-29 20:33           ` Junio C Hamano
2009-03-29 21:56             ` Eric Wong
2009-03-30  6:44               ` Junio C Hamano
2009-03-30 17:41                 ` Eric Wong
2009-03-30 18:05                   ` Junio C Hamano
2009-03-30 22:58                     ` Eric Wong
2009-03-31  7:11                       ` Björn Steinbrink
2009-03-31  7:31                         ` Björn Steinbrink
2009-03-31  9:41                           ` Björn Steinbrink
2009-03-31 15:05                             ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
2009-04-02  4:32                               ` Junio C Hamano
2009-04-02  4:41                                 ` [PATCH] match_tree_entry(): a pathspec only matches at directory boundaries Junio C Hamano
2009-04-02 16:36                                   ` Linus Torvalds
2009-04-02 11:38                                 ` [PATCH] tree_entry_interesting: Only recurse when the pathspec is a leading path component Björn Steinbrink
2009-04-03 16:25                                   ` Junio C Hamano
2009-03-30  5:28             ` [PATCH] git-svn: fix ls-tree usage with dash-prefixed paths Björn Steinbrink
2009-03-30  7:26         ` svn clone Checksum mismatch question Anton Gyllenberg
2009-03-26 14:34 ` Peter Harris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.