All of lore.kernel.org
 help / color / mirror / Atom feed
* What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
@ 2011-08-23  7:25 Marat Radchenko
  2011-08-23 10:03 ` Michael J Gruber
  2011-08-29  7:41 ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-23  7:25 UTC (permalink / raw)
  To: git

$ time git show branch:file | diff -u - file > /dev/null 

real    0m0.003s
user    0m0.000s
sys     0m0.000s

$ time git diff branch -- file > /dev/null 

real    0m31.442s
user    0m31.040s
sys     0m0.380s

What does git diff do so it takes that much time? And is there any flag to git 
diff so that it will work as fast as show + diff? I thought these two are 
equivalent but from run time it is obvious that they aren't.

gprof output: http://slonopotamus.org/git-diff/git-diff-branch.gprof.txt

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23  7:25 What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`? Marat Radchenko
@ 2011-08-23 10:03 ` Michael J Gruber
  2011-08-23 10:52   ` Marat Radchenko
  2011-08-29  7:41 ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 56+ messages in thread
From: Michael J Gruber @ 2011-08-23 10:03 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

Marat Radchenko venit, vidit, dixit 23.08.2011 09:25:
> $ time git show branch:file | diff -u - file > /dev/null 
> 
> real    0m0.003s
> user    0m0.000s
> sys     0m0.000s
> 
> $ time git diff branch -- file > /dev/null 
> 
> real    0m31.442s
> user    0m31.040s
> sys     0m0.380s
> 
> What does git diff do so it takes that much time? And is there any flag to git 
> diff so that it will work as fast as show + diff? I thought these two are 
> equivalent but from run time it is obvious that they aren't.
> 
> gprof output: http://slonopotamus.org/git-diff/git-diff-branch.gprof.txt
> 

Is that a very large tree or a very slow file system?

>From the gprof output, it would appear that we compare a lot of cache
entries (or rather: spent most time on...). Do we enumerate all
differing files and only then limit diff output by path??

Michael

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 10:03 ` Michael J Gruber
@ 2011-08-23 10:52   ` Marat Radchenko
  2011-08-23 15:20     ` Michael Witten
  2011-08-23 15:34     ` Michael J Gruber
  0 siblings, 2 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-23 10:52 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

> Is that a very large tree or a very slow file system?
Tree is large (500k files), file system is irrelevant since all time is spend on CPU.

> Do we enumerate all
> differing files and only then limit diff output by path??

Dunno, that's why I am asking why it is so slow.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 10:52   ` Marat Radchenko
@ 2011-08-23 15:20     ` Michael Witten
  2011-08-23 15:34     ` Michael J Gruber
  1 sibling, 0 replies; 56+ messages in thread
From: Michael Witten @ 2011-08-23 15:20 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: Michael J Gruber, git

On Tue, Aug 23, 2011 at 10:52, Marat Radchenko <marat@slonopotamus.org> wrote:
>> Do we enumerate all differing files and only then
>> limit diff output by path??
>
> Dunno, that's why I am asking why it is so slow.

Why in the world do you think that question was directed at you?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 10:52   ` Marat Radchenko
  2011-08-23 15:20     ` Michael Witten
@ 2011-08-23 15:34     ` Michael J Gruber
  2011-08-23 16:45       ` Marat Radchenko
  2011-08-23 17:15       ` Junio C Hamano
  1 sibling, 2 replies; 56+ messages in thread
From: Michael J Gruber @ 2011-08-23 15:34 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

Marat Radchenko venit, vidit, dixit 23.08.2011 12:52:
>> Is that a very large tree or a very slow file system?
> Tree is large (500k files), file system is irrelevant since all time is spend on CPU.
> 
>> Do we enumerate all
>> differing files and only then limit diff output by path??
> 
> Dunno, that's why I am asking why it is so slow.

Well, we have to read the full tree before diffing. But I can't
reproduce the extreme difference which you observed (0.003s vs. 30s),
only a factor of ten or so for a repo with 100k files:

git init
seq 0 100000|while read n ; do echo a > a$n;done
git add .
git commit -m m
echo b > a0

On a ramdisk, I get:

time git diff  > /dev/null

real    0m0.160s
user    0m0.064s
sys     0m0.190s

time git diff -- a0 > /dev/null

real    0m0.070s
user    0m0.051s
sys     0m0.021s

time git diff HEAD > /dev/null

real    0m0.266s
user    0m0.145s
sys     0m0.212s

time git diff HEAD -- a0 > /dev/null

real    0m0.171s
user    0m0.136s
sys     0m0.033s

time git show HEAD:a0  > /dev/null

real    0m0.018s
user    0m0.009s
sys     0m0.007s

time git show HEAD:a0 | diff -u - a0 > /dev/null

real    0m0.019s
user    0m0.010s
sys     0m0.008s

Stumped.

In your case, do you have a lot of differing files besides the one you
are limitting to? Anyway, that does not seem to make a huge difference
in my timings. (Just tried.) Still Stumped.

Michael

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 15:34     ` Michael J Gruber
@ 2011-08-23 16:45       ` Marat Radchenko
  2011-08-23 17:15       ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-23 16:45 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

On вт 23 авг 2011 19:34:50 MSD, Michael J Gruber <git@drmicha.warpmail.net> wrote: 
> Well, we have to read the full tree before diffing.
I don't see why you need that.

> But I can't
> reproduce the extreme difference which you observed (0.003s vs. 30s)
Well, I have an extreme repo (sadly, private) that has already shown several scalability issues in various parts of git code. Hope this thread will help to improve it.

> In your case, do you have a lot of differing files besides the one you
> are limitting to?
1. I'm diffing a single (and rather small, <50kb) text file
2. Diff is done between two branches (master and bugfix for a particular release) one of which (master) already has several thousands of commits after fork, so yes, whole tree diffs a lot.

P.S. Fix for [1] might somewhat improve the situation but it still isn't clear to me why whole tree needs to be processed when specific path is given. Btw, it is 30s even with --no-renames.

[1] http://git.661346.n2.nabble.com/git-diff-is-slow-patience-is-fast-td6667216.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 15:34     ` Michael J Gruber
  2011-08-23 16:45       ` Marat Radchenko
@ 2011-08-23 17:15       ` Junio C Hamano
  2011-08-23 18:21         ` Marat Radchenko
  2011-08-23 20:07         ` Michael J Gruber
  1 sibling, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-23 17:15 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Marat Radchenko, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> Marat Radchenko venit, vidit, dixit 23.08.2011 12:52:
>>> Is that a very large tree or a very slow file system?
>> Tree is large (500k files), file system is irrelevant since all time is spend on CPU.
>> 
>>> Do we enumerate all
>>> differing files and only then limit diff output by path??
>> 
>> Dunno, that's why I am asking why it is so slow.
>
> Well, we have to read the full tree before diffing.

Not necessarily, especially when pathspec is given like the original post,
i.e. "git diff $tree_ish -- $path". We would need to open tree objects
that lead to the leaf of the $path and a blob, but other objects won't be
needed.

The default diff backend tries to come up with minimal changes by spending
extra cycles, so it is not so surprising if the file compared is large-ish
and/or has very many similar lines in itself (in which case there are many
potential matching line pairs between the preimage and the postimage to be
examined to produce a minimal diff).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 17:15       ` Junio C Hamano
@ 2011-08-23 18:21         ` Marat Radchenko
  2011-08-23 20:07         ` Michael J Gruber
  1 sibling, 0 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-23 18:21 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster <at> pobox.com> writes:
> Not necessarily, especially when pathspec is given like the original post,
> i.e. "git diff $tree_ish -- $path". We would need to open tree objects
> that lead to the leaf of the $path and a blob, but other objects won't be
> needed.
And git show branch:path already has that code.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 17:15       ` Junio C Hamano
  2011-08-23 18:21         ` Marat Radchenko
@ 2011-08-23 20:07         ` Michael J Gruber
  2011-08-25 16:09           ` Marat Radchenko
  2011-08-25 21:10           ` Junio C Hamano
  1 sibling, 2 replies; 56+ messages in thread
From: Michael J Gruber @ 2011-08-23 20:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marat Radchenko, git

Junio C Hamano venit, vidit, dixit 23.08.2011 19:15:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> Marat Radchenko venit, vidit, dixit 23.08.2011 12:52:
>>>> Is that a very large tree or a very slow file system?
>>> Tree is large (500k files), file system is irrelevant since all time is spend on CPU.
>>>
>>>> Do we enumerate all
>>>> differing files and only then limit diff output by path??
>>>
>>> Dunno, that's why I am asking why it is so slow.
>>
>> Well, we have to read the full tree before diffing.
> 
> Not necessarily, especially when pathspec is given like the original post,
> i.e. "git diff $tree_ish -- $path". We would need to open tree objects
> that lead to the leaf of the $path and a blob, but other objects won't be
> needed.

I meant: The way "git diff" is now, it does that.

> 
> The default diff backend tries to come up with minimal changes by spending
> extra cycles, so it is not so surprising if the file compared is large-ish
> and/or has very many similar lines in itself (in which case there are many
> potential matching line pairs between the preimage and the postimage to be
> examined to produce a minimal diff).

But the file in this case is not that large, and "git diff" spends 30s!

Michael

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 20:07         ` Michael J Gruber
@ 2011-08-25 16:09           ` Marat Radchenko
  2011-08-25 21:10           ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-25 16:09 UTC (permalink / raw)
  To: Michael J Gruber, Junio C Hamano; +Cc: git

On 08/24/2011 00:07:43 MSD, Michael J Gruber <git@drmicha.warpmail.net> wrote:

> Junio C Hamano venit, vidit, dixit 23.08.2011 19:15:
> > Michael J Gruber <git@drmicha.warpmail.net> writes:
> > 
> > > Marat Radchenko venit, vidit, dixit 23.08.2011 12:52:
> > > > > Is that a very large tree or a very slow file system?
> > > > Tree is large (500k files), file system is irrelevant since all
> > > > time is spend on CPU.
> > > > 
> > > > > Do we enumerate all
> > > > > differing files and only then limit diff output by path??
> > > > 
> > > > Dunno, that's why I am asking why it is so slow.
> > > 
> > > Well, we have to read the full tree before diffing.
> > 
> > Not necessarily, especially when pathspec is given like the original
> > post, i.e. "git diff $tree_ish -- $path". We would need to open tree
> > objects that lead to the leaf of the $path and a blob, but other
> > objects won't be needed.
> 
> I meant: The way "git diff" is now, it does that.
> 
> > 
> > The default diff backend tries to come up with minimal changes by
> > spending extra cycles, so it is not so surprising if the file compared
> > is large-ish and/or has very many similar lines in itself (in which
> > case there are many potential matching line pairs between the preimage
> > and the postimage to be examined to produce a minimal diff).
> 
> But the file in this case is not that large, and "git diff" spends 30s!

So, is some more info required from me or gprof output given in initial report + following discussion are enough to conclude what code needs to be improved?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23 20:07         ` Michael J Gruber
  2011-08-25 16:09           ` Marat Radchenko
@ 2011-08-25 21:10           ` Junio C Hamano
  2011-08-26  9:43             ` Marat Radchenko
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-25 21:10 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Marat Radchenko, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

>> The default diff backend tries to come up with minimal changes by spending
>> extra cycles, so it is not so surprising if the file compared is large-ish
>> and/or has very many similar lines in itself (in which case there are many
>> potential matching line pairs between the preimage and the postimage to be
>> examined to produce a minimal diff).
>
> But the file in this case is not that large, and "git diff" spends 30s!

If the difference was literally between

 $ git diff branch -- file
 $ git show branch:file | diff -u - file

that is, "file" is the name of a file in the top-level directory, I would
expect that former would open the top-level tree object for the branch,
read it thru until it finds "file", grabs a single blob and deflate it in
core, and compare that with the contents of a single file read from the
filesystem.

An interesting comparison may be to run this once:

   $ git show branch:file >fileI

and then compare between these two:

    $ diff -u fileI file
    $ git diff --no-index fileI file

If the latter is slower than the former in the same way as the original
experiment, that would mean that the tree traversal time does not have
anything to do with it (iow, your "The way 'git diff' is now, it does
that" is not just incorrect---we don't read the full tree to begin
with---but irrelevant).

If the "we try to come up with minimal changes by spending extra cycles" I
mentioned in my messages is indeed the cause, you may see diffferences
running the "git diff --no-index" version with backend options, e.g.

    $ git diff --no-index --patience fileI file
    $ git diff --no-index --histogram fileI file

Another thing to try may be to run the version from "next" that has
27af01d (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(),
2011-08-17), without any backend options.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-25 21:10           ` Junio C Hamano
@ 2011-08-26  9:43             ` Marat Radchenko
  0 siblings, 0 replies; 56+ messages in thread
From: Marat Radchenko @ 2011-08-26  9:43 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster <at> pobox.com> writes:
> An interesting comparison may be to run this once:
> 
>    $ git show branch:file >fileI
> 
> and then compare between these two:
> 
>     $ diff -u fileI file
>     $ git diff --no-index fileI file

In all times below, "file" is the same file i used in original report.

$ time diff -u fileI file > /dev/null 

real    0m0.002s
user    0m0.000s
sys     0m0.000s

$ time git diff --no-index fileI file > /dev/null 

real    0m0.331s
user    0m0.270s
sys     0m0.060s

Same with "pu" branch (has 27af01d):
$ time ~/git/git-diff --no-index fileI file > /dev/null 

real    0m0.307s
user    0m0.220s
sys     0m0.080s

> 
> If the latter is slower than the former in the same way as the original
> experiment, that would mean that the tree traversal time does not have
> anything to do with it (iow, your "The way 'git diff' is now, it does
> that" is not just incorrect---we don't read the full tree to begin
> with---but irrelevant).
I still suspect tree traversal. Original report with "pu" branch is still 30s.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-23  7:25 What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`? Marat Radchenko
  2011-08-23 10:03 ` Michael J Gruber
@ 2011-08-29  7:41 ` Nguyen Thai Ngoc Duy
  2011-08-29 14:48   ` Marat Radchenko
  1 sibling, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-08-29  7:41 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

On Tue, Aug 23, 2011 at 2:25 PM, Marat Radchenko <marat@slonopotamus.org> wrote:
> $ time git show branch:file | diff -u - file > /dev/null
>
> real    0m0.003s
> user    0m0.000s
> sys     0m0.000s
>
> $ time git diff branch -- file > /dev/null
>
> real    0m31.442s
> user    0m31.040s
> sys     0m0.380s
>
> What does git diff do so it takes that much time?

You said elsewhere in this thread this is private repo, so some more questions:

 - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
 - how many entries in the tree that contain "file"?
 - how is "git ls-files | wc -l"?
 - how about "time git diff branch another-branch -- file >/dev/null"?
That'd remove unpack-trees code.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29  7:41 ` Nguyen Thai Ngoc Duy
@ 2011-08-29 14:48   ` Marat Radchenko
  2011-08-29 16:09     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 56+ messages in thread
From: Marat Radchenko @ 2011-08-29 14:48 UTC (permalink / raw)
  To: git

Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>  - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
3 levels deep. Most parent dir (one after repo root) contains 20k files.

>  - how many entries in the tree that contain "file"?
Sorry, didn't understand this.

>  - how is "git ls-files | wc -l"?
$ time git ls-files | wc -l
603137

real    0m0.417s
user    0m0.440s
sys     0m0.060s

>  - how about "time git diff branch another-branch -- file >/dev/null"?
> That'd remove unpack-trees code.
Pretty fast:
$ time git diff branch:file other_branch:file > /dev/null

real    0m0.278s
user    0m0.210s
sys     0m0.060s

One more test:
git diff HEAD branch -- file > /dev/null 

real    0m0.276s
user    0m0.240s
sys     0m0.030s

So the only troubled variant is `git diff branch -- file`.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29 14:48   ` Marat Radchenko
@ 2011-08-29 16:09     ` Nguyen Thai Ngoc Duy
  2011-08-29 17:18       ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-08-29 16:09 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

On Mon, Aug 29, 2011 at 9:48 PM, Marat Radchenko <marat@slonopotamus.org> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>>  - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
> 3 levels deep. Most parent dir (one after repo root) contains 20k files.
>
>>  - how many entries in the tree that contain "file"?
> Sorry, didn't understand this.

You have already answered it. I was asking the size of parent dir, but
phrased poorly.

>>  - how is "git ls-files | wc -l"?
> $ time git ls-files | wc -l
> 603137
>
> real    0m0.417s
> user    0m0.440s
> sys     0m0.060s
>
>>  - how about "time git diff branch another-branch -- file >/dev/null"?
>> That'd remove unpack-trees code.
> Pretty fast:
>
> git diff HEAD branch -- file > /dev/null
>
> real    0m0.276s
> user    0m0.240s
> sys     0m0.030s

That may explain it. "git diff <ref>" walks through the index, unpacks
tree objects along the way, matches up entries with the same path from
the branch, the index then feeds matching entries to diff function. If
tree cutting is not done efficiently, it could very well walk through
every entry in the index (~600k entries in your case), unpacking all
tree objects along the way.

And it looks like to me that diff_cache() in diff-lib.c, responsible
for this case, does not do any prefix trimming. traverse_trees() also
does not seem to do "never_interesting" optimization like in
tree_interesting(), so if the traversed tree is big (~20k as you told
me), it will take some time, even though you are only interested in a
single entry.

> So the only troubled variant is `git diff branch -- file`.

No, I suspect "git diff --cached" would be also slow. "git merge"
would be definitely slow. But we can hardly improve these cases
because the commands are usually called tree-wide, no path limiting.

If you only work on a small subset of files, there's some (unfinished)
code in narrow clone implementation that cuts down index size, which
may speed up in big-index repositories like yours. Could be a good
reason for me (or someone) to extract that part and get it in before
full narrow clone is implemented.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29 16:09     ` Nguyen Thai Ngoc Duy
@ 2011-08-29 17:18       ` Junio C Hamano
  2011-08-29 20:42         ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 17:18 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Marat Radchenko, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> That may explain it. "git diff <ref>" walks through the index, unpacks
> tree objects along the way, matches up entries with the same path from
> the branch, the index then feeds matching entries to diff function.

Yeah, that would explain it. The implementation of diff-index has changed
a few times and the latest incarnation is based on unpack-trees, _merging_
the tree and the index, introduced by d1f2d7e (Make run_diff_index() use
unpack_trees(), not read_tree(), 2008-01-19).

Of course the merge machinery does not know anything about pruning with
pathspec, so it is understandable (not justifiable) it would walk the full
tree.

Will try to find time this week to cook up something.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29 17:18       ` Junio C Hamano
@ 2011-08-29 20:42         ` Junio C Hamano
  2011-08-29 20:50           ` Junio C Hamano
                             ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 20:42 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Marat Radchenko, git

Junio C Hamano <gitster@pobox.com> writes:

> Of course the merge machinery does not know anything about pruning with
> pathspec, so it is understandable (not justifiable) it would walk the full
> tree.
>
> Will try to find time this week to cook up something.

This is still rough, but seems to pass the test suite, and gives me some
performance boost when applied to the kernel tree:

    (without patch)
    $ time git diff --raw --cached v2.6.30 -- virt/kvm
    real    0m0.114s
    user    0m0.088s
    sys     0m0.028s

    (with patch)
    $ time ./git diff --raw --cached v2.6.30 -- virt/kvm
    real    0m0.075s
    user    0m0.068s
    sys     0m0.008s

What I do not like about it most is that we have an infrastructure that
links traverse_info across stackframes to store the paths unexpanded and
without extra linear allocation, but tree_entry_interesting() wants the
path as a single string. Hence unpack_trees() carries an extra baggage
"base" string, even though the general callback in tree-walk machinery
does not need it.

I think this can be trivially optimized to keep a pointer to a single
strbuf in the traverse_info (initialize it at the same points as this
patch sets info.pathspec), extending it as it digs deeper (copy the same
pointer to the strbuf to a child traverse_info and tuck the name of the
directory it descends into it, at the same points as this patch copies
info->pathspec from the parent), and resetting the length back when the
traversal into a subdirectory comes back.



 diff-lib.c     |    1 +
 tree-walk.c    |   39 +++++++++++++++++++++++++++++++++------
 tree-walk.h    |    1 +
 unpack-trees.c |    2 ++
 unpack-trees.h |    1 +
 5 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index f8454dd..ebe751e 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -468,6 +468,7 @@ static int diff_cache(struct rev_info *revs,
 	opts.unpack_data = revs;
 	opts.src_index = &the_index;
 	opts.dst_index = NULL;
+	opts.pathspec = &revs->diffopt.pathspec;
 
 	init_tree_desc(&t, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
diff --git a/tree-walk.c b/tree-walk.c
index 33f749e..808bb55 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -309,6 +309,18 @@ static void free_extended_entry(struct tree_desc_x *t)
 	}
 }
 
+static inline int prune_traversal(struct name_entry *e,
+				  struct traverse_info *info,
+				  struct strbuf *base,
+				  int still_interesting)
+{
+	if (!info->pathspec || still_interesting == 2)
+		return 2;
+	if (still_interesting < 0)
+		return still_interesting;
+	return tree_entry_interesting(e, base, 0, info->pathspec);
+}
+
 int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 {
 	int ret = 0;
@@ -316,10 +328,18 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	struct name_entry *entry = xmalloc(n*sizeof(*entry));
 	int i;
 	struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
+	struct strbuf base = STRBUF_INIT;
+	int interesting = 1;
 
 	for (i = 0; i < n; i++)
 		tx[i].d = t[i];
 
+	if (info->prev) {
+		strbuf_grow(&base, info->pathlen);
+		make_traverse_path(base.buf, info->prev, &info->name);
+		base.buf[info->pathlen-1] = '/';
+		strbuf_setlen(&base, info->pathlen);
+	}
 	for (;;) {
 		unsigned long mask, dirmask;
 		const char *first = NULL;
@@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 			mask |= 1ul << i;
 			if (S_ISDIR(entry[i].mode))
 				dirmask |= 1ul << i;
+			e = &entry[i];
 		}
 		if (!mask)
 			break;
-		ret = info->fn(n, mask, dirmask, entry, info);
-		if (ret < 0) {
-			error = ret;
-			if (!info->show_all_errors)
-				break;
+		interesting = prune_traversal(e, info, &base, interesting);
+		if (interesting < 0)
+			break;
+		if (interesting) {
+			ret = info->fn(n, mask, dirmask, entry, info);
+			if (ret < 0) {
+				error = ret;
+				if (!info->show_all_errors)
+					break;
+			}
+			mask &= ret;
 		}
-		mask &= ret;
 		ret = 0;
 		for (i = 0; i < n; i++)
 			if (mask & (1ul << i))
@@ -395,6 +421,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	for (i = 0; i < n; i++)
 		free_extended_entry(tx + i);
 	free(tx);
+	strbuf_release(&base);
 	return error;
 }
 
diff --git a/tree-walk.h b/tree-walk.h
index 39524b7..0089581 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -44,6 +44,7 @@ struct traverse_info {
 	struct traverse_info *prev;
 	struct name_entry name;
 	int pathlen;
+	struct pathspec *pathspec;
 
 	unsigned long conflicts;
 	traverse_callback_t fn;
diff --git a/unpack-trees.c b/unpack-trees.c
index cc616c3..670b464 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -444,6 +444,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 
 	newinfo = *info;
 	newinfo.prev = info;
+	newinfo.pathspec = info->pathspec;
 	newinfo.name = *p;
 	newinfo.pathlen += tree_entry_len(p->path, p->sha1) + 1;
 	newinfo.conflicts |= df_conflicts;
@@ -1040,6 +1041,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.fn = unpack_callback;
 		info.data = o;
 		info.show_all_errors = o->show_all_errors;
+		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
 			/*
diff --git a/unpack-trees.h b/unpack-trees.h
index 7998948..5e432f5 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -52,6 +52,7 @@ struct unpack_trees_options {
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
+	struct pathspec *pathspec;
 	merge_fn_t fn;
 	const char *msgs[NB_UNPACK_TREES_ERROR_TYPES];
 	/*

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29 20:42         ` Junio C Hamano
@ 2011-08-29 20:50           ` Junio C Hamano
  2011-08-29 21:09           ` Junio C Hamano
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
  2 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 20:50 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Marat Radchenko, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Of course the merge machinery does not know anything about pruning with
>> pathspec, so it is understandable (not justifiable) it would walk the full
>> tree.
>>
>> Will try to find time this week to cook up something.
>
> This is still rough, but seems to pass the test suite, and gives me some
> performance boost when applied to the kernel tree:

Ehh, no, the patch does not apply to the kernel tree, but I meant "when
used on the kernel tree".

>     (without patch)
>     $ time git diff --raw --cached v2.6.30 -- virt/kvm
>     real    0m0.114s
>     user    0m0.088s
>     sys     0m0.028s
>
>     (with patch)
>     $ time ./git diff --raw --cached v2.6.30 -- virt/kvm
>     real    0m0.075s
>     user    0m0.068s
>     sys     0m0.008s
>
> What I do not like about it most is that we have an infrastructure that
> links traverse_info across stackframes to store the paths unexpanded and
> without extra linear allocation, but tree_entry_interesting() wants the
> path as a single string. Hence unpack_trees() carries an extra baggage
> "base" string, even though the general callback in tree-walk machinery
> does not need it.
>
> I think this can be trivially optimized to keep a pointer to a single
> strbuf in the traverse_info (initialize it at the same points as this
> patch sets info.pathspec), extending it as it digs deeper (copy the same
> pointer to the strbuf to a child traverse_info and tuck the name of the
> directory it descends into it, at the same points as this patch copies
> info->pathspec from the parent), and resetting the length back when the
> traversal into a subdirectory comes back.
>
>
>
>  diff-lib.c     |    1 +
>  tree-walk.c    |   39 +++++++++++++++++++++++++++++++++------
>  tree-walk.h    |    1 +
>  unpack-trees.c |    2 ++
>  unpack-trees.h |    1 +
>  5 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index f8454dd..ebe751e 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -468,6 +468,7 @@ static int diff_cache(struct rev_info *revs,
>  	opts.unpack_data = revs;
>  	opts.src_index = &the_index;
>  	opts.dst_index = NULL;
> +	opts.pathspec = &revs->diffopt.pathspec;
>  
>  	init_tree_desc(&t, tree->buffer, tree->size);
>  	return unpack_trees(1, &t, &opts);
> diff --git a/tree-walk.c b/tree-walk.c
> index 33f749e..808bb55 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -309,6 +309,18 @@ static void free_extended_entry(struct tree_desc_x *t)
>  	}
>  }
>  
> +static inline int prune_traversal(struct name_entry *e,
> +				  struct traverse_info *info,
> +				  struct strbuf *base,
> +				  int still_interesting)
> +{
> +	if (!info->pathspec || still_interesting == 2)
> +		return 2;
> +	if (still_interesting < 0)
> +		return still_interesting;
> +	return tree_entry_interesting(e, base, 0, info->pathspec);
> +}
> +
>  int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>  {
>  	int ret = 0;
> @@ -316,10 +328,18 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>  	struct name_entry *entry = xmalloc(n*sizeof(*entry));
>  	int i;
>  	struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
> +	struct strbuf base = STRBUF_INIT;
> +	int interesting = 1;
>  
>  	for (i = 0; i < n; i++)
>  		tx[i].d = t[i];
>  
> +	if (info->prev) {
> +		strbuf_grow(&base, info->pathlen);
> +		make_traverse_path(base.buf, info->prev, &info->name);
> +		base.buf[info->pathlen-1] = '/';
> +		strbuf_setlen(&base, info->pathlen);
> +	}
>  	for (;;) {
>  		unsigned long mask, dirmask;
>  		const char *first = NULL;
> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>  			mask |= 1ul << i;
>  			if (S_ISDIR(entry[i].mode))
>  				dirmask |= 1ul << i;
> +			e = &entry[i];
>  		}
>  		if (!mask)
>  			break;
> -		ret = info->fn(n, mask, dirmask, entry, info);
> -		if (ret < 0) {
> -			error = ret;
> -			if (!info->show_all_errors)
> -				break;
> +		interesting = prune_traversal(e, info, &base, interesting);
> +		if (interesting < 0)
> +			break;
> +		if (interesting) {
> +			ret = info->fn(n, mask, dirmask, entry, info);
> +			if (ret < 0) {
> +				error = ret;
> +				if (!info->show_all_errors)
> +					break;
> +			}
> +			mask &= ret;
>  		}
> -		mask &= ret;
>  		ret = 0;
>  		for (i = 0; i < n; i++)
>  			if (mask & (1ul << i))
> @@ -395,6 +421,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>  	for (i = 0; i < n; i++)
>  		free_extended_entry(tx + i);
>  	free(tx);
> +	strbuf_release(&base);
>  	return error;
>  }
>  
> diff --git a/tree-walk.h b/tree-walk.h
> index 39524b7..0089581 100644
> --- a/tree-walk.h
> +++ b/tree-walk.h
> @@ -44,6 +44,7 @@ struct traverse_info {
>  	struct traverse_info *prev;
>  	struct name_entry name;
>  	int pathlen;
> +	struct pathspec *pathspec;
>  
>  	unsigned long conflicts;
>  	traverse_callback_t fn;
> diff --git a/unpack-trees.c b/unpack-trees.c
> index cc616c3..670b464 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -444,6 +444,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>  
>  	newinfo = *info;
>  	newinfo.prev = info;
> +	newinfo.pathspec = info->pathspec;
>  	newinfo.name = *p;
>  	newinfo.pathlen += tree_entry_len(p->path, p->sha1) + 1;
>  	newinfo.conflicts |= df_conflicts;
> @@ -1040,6 +1041,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  		info.fn = unpack_callback;
>  		info.data = o;
>  		info.show_all_errors = o->show_all_errors;
> +		info.pathspec = o->pathspec;
>  
>  		if (o->prefix) {
>  			/*
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 7998948..5e432f5 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -52,6 +52,7 @@ struct unpack_trees_options {
>  	const char *prefix;
>  	int cache_bottom;
>  	struct dir_struct *dir;
> +	struct pathspec *pathspec;
>  	merge_fn_t fn;
>  	const char *msgs[NB_UNPACK_TREES_ERROR_TYPES];
>  	/*

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
  2011-08-29 20:42         ` Junio C Hamano
  2011-08-29 20:50           ` Junio C Hamano
@ 2011-08-29 21:09           ` Junio C Hamano
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
  2 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 21:09 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Marat Radchenko, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Of course the merge machinery does not know anything about pruning with
>> pathspec, so it is understandable (not justifiable) it would walk the full
>> tree.
>>
>> Will try to find time this week to cook up something.
>
> This is still rough, but seems to pass the test suite, and gives me some
> performance boost when applied to the kernel tree:
>
>     (without patch)
>     $ time git diff --raw --cached v2.6.30 -- virt/kvm
>     real    0m0.114s
>     user    0m0.088s
>     sys     0m0.028s
>
>     (with patch)
>     $ time ./git diff --raw --cached v2.6.30 -- virt/kvm
>     real    0m0.075s
>     user    0m0.068s
>     sys     0m0.008s

Heh, this is even better without --cached.

    (without patch)
    $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6
    real    0m0.538s
    user    0m0.492s
    sys     0m0.048s

    (with patch)
    $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6
    real    0m0.038s
    user    0m0.028s
    sys     0m0.012s

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 20:42         ` Junio C Hamano
  2011-08-29 20:50           ` Junio C Hamano
  2011-08-29 21:09           ` Junio C Hamano
@ 2011-08-29 21:33           ` Junio C Hamano
  2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
                               ` (4 more replies)
  2 siblings, 5 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 21:33 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy

"git diff A B -- $pathspec" to compare two tree-ishes knew how to apply
pathspec to avoid opening trees that fall outside the area of interest,
but "git diff A -- $pathspec" used unpack_trees() machinery that was meant
for full-tree merges, and ended up reading the whole tree only to discard
potentially major part of the work it does.

Before and after applying this series, looking for changes in the kernel
repository with a fairly narrow pathspec gets a moderate speeds up.

  (without patch)
  $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
  0.48user 0.05system 0:00.53elapsed 100%CPU (0avgtext+0avgdata 163296maxresident)k
  0inputs+952outputs (0major+11163minor)pagefaults 0swaps

  (with patch)
  $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
  0.01user 0.00system 0:00.02elapsed 104%CPU (0avgtext+0avgdata 43856maxresident)k
  0inputs+24outputs (0major+3688minor)pagefaults 0swaps

Junio C Hamano (3):
  tree-walk: allow pruning with pathspec
  unpack-trees: allow pruning with pathspec
  diff-index: pass pathspec down to unpack-trees machinery

 diff-lib.c     |    1 +
 tree-walk.c    |   39 +++++++++++++++++++++++++++++++++------
 tree-walk.h    |    1 +
 unpack-trees.c |    2 ++
 unpack-trees.h |    1 +
 5 files changed, 38 insertions(+), 6 deletions(-)

-- 
1.7.7.rc0.70.g82660

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
@ 2011-08-29 21:33             ` Junio C Hamano
  2011-08-30 12:53               ` Nguyen Thai Ngoc Duy
  2011-10-09 15:39               ` Michael Haggerty
  2011-08-29 21:33             ` [PATCH 2/3] unpack-trees: " Junio C Hamano
                               ` (3 subsequent siblings)
  4 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 21:33 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy

The traverse_trees() machinery is primarily meant for merging two (or
more) trees, and because a merge is a full tree operation, it doesn't
support any pruning with pathspec.

Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19), however, we use unpack_trees() to traverse_trees() callchain
to perform "diff-index", which could waste a lot of work traversing trees
outside the user-supplied pathspec, only to discard at the blob comparison
level in diff-lib.c::oneway_diff() which is way too late.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 tree-walk.c |   39 +++++++++++++++++++++++++++++++++------
 tree-walk.h |    1 +
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 33f749e..808bb55 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -309,6 +309,18 @@ static void free_extended_entry(struct tree_desc_x *t)
 	}
 }
 
+static inline int prune_traversal(struct name_entry *e,
+				  struct traverse_info *info,
+				  struct strbuf *base,
+				  int still_interesting)
+{
+	if (!info->pathspec || still_interesting == 2)
+		return 2;
+	if (still_interesting < 0)
+		return still_interesting;
+	return tree_entry_interesting(e, base, 0, info->pathspec);
+}
+
 int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 {
 	int ret = 0;
@@ -316,10 +328,18 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	struct name_entry *entry = xmalloc(n*sizeof(*entry));
 	int i;
 	struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
+	struct strbuf base = STRBUF_INIT;
+	int interesting = 1;
 
 	for (i = 0; i < n; i++)
 		tx[i].d = t[i];
 
+	if (info->prev) {
+		strbuf_grow(&base, info->pathlen);
+		make_traverse_path(base.buf, info->prev, &info->name);
+		base.buf[info->pathlen-1] = '/';
+		strbuf_setlen(&base, info->pathlen);
+	}
 	for (;;) {
 		unsigned long mask, dirmask;
 		const char *first = NULL;
@@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 			mask |= 1ul << i;
 			if (S_ISDIR(entry[i].mode))
 				dirmask |= 1ul << i;
+			e = &entry[i];
 		}
 		if (!mask)
 			break;
-		ret = info->fn(n, mask, dirmask, entry, info);
-		if (ret < 0) {
-			error = ret;
-			if (!info->show_all_errors)
-				break;
+		interesting = prune_traversal(e, info, &base, interesting);
+		if (interesting < 0)
+			break;
+		if (interesting) {
+			ret = info->fn(n, mask, dirmask, entry, info);
+			if (ret < 0) {
+				error = ret;
+				if (!info->show_all_errors)
+					break;
+			}
+			mask &= ret;
 		}
-		mask &= ret;
 		ret = 0;
 		for (i = 0; i < n; i++)
 			if (mask & (1ul << i))
@@ -395,6 +421,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	for (i = 0; i < n; i++)
 		free_extended_entry(tx + i);
 	free(tx);
+	strbuf_release(&base);
 	return error;
 }
 
diff --git a/tree-walk.h b/tree-walk.h
index 39524b7..0089581 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -44,6 +44,7 @@ struct traverse_info {
 	struct traverse_info *prev;
 	struct name_entry name;
 	int pathlen;
+	struct pathspec *pathspec;
 
 	unsigned long conflicts;
 	traverse_callback_t fn;
-- 
1.7.7.rc0.70.g82660

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 2/3] unpack-trees: allow pruning with pathspec
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
  2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
@ 2011-08-29 21:33             ` Junio C Hamano
  2011-08-30 13:03               ` Nguyen Thai Ngoc Duy
  2011-08-30 15:24               ` David Michael Barr
  2011-08-29 21:33             ` [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery Junio C Hamano
                               ` (2 subsequent siblings)
  4 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 21:33 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy

Use the pathspec pruning of traverse_trees() from unpack_trees(). Again,
the unpack_trees() machinery is primarily meant for merging two (or more)
trees, and because a merge is a full tree operation, it didn't support any
pruning with pathspec, and this codepath probably should not be enabled
while running a merge, but the caller in diff-lib.c::diff_cache() should
be able to take advanrage of it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 unpack-trees.c |    2 ++
 unpack-trees.h |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index cc616c3..670b464 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -444,6 +444,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 
 	newinfo = *info;
 	newinfo.prev = info;
+	newinfo.pathspec = info->pathspec;
 	newinfo.name = *p;
 	newinfo.pathlen += tree_entry_len(p->path, p->sha1) + 1;
 	newinfo.conflicts |= df_conflicts;
@@ -1040,6 +1041,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.fn = unpack_callback;
 		info.data = o;
 		info.show_all_errors = o->show_all_errors;
+		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
 			/*
diff --git a/unpack-trees.h b/unpack-trees.h
index 7998948..5e432f5 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -52,6 +52,7 @@ struct unpack_trees_options {
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
+	struct pathspec *pathspec;
 	merge_fn_t fn;
 	const char *msgs[NB_UNPACK_TREES_ERROR_TYPES];
 	/*
-- 
1.7.7.rc0.70.g82660

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
  2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
  2011-08-29 21:33             ` [PATCH 2/3] unpack-trees: " Junio C Hamano
@ 2011-08-29 21:33             ` Junio C Hamano
  2012-01-11  6:31               ` Jonathan Nieder
  2011-08-29 21:56             ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Linus Torvalds
  2011-08-30 10:04             ` Michael J Gruber
  4 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 21:33 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy

And finally, pass the pathspec down through unpack_trees() to traverse_trees()
callchain.

Before and after applying this series, looking for changes in the kernel
repository with a fairly narrow pathspec gets a moderate speeds up.

  (without patch)
  $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
  0.48user 0.05system 0:00.53elapsed 100%CPU (0avgtext+0avgdata 163296maxresident)k
  0inputs+952outputs (0major+11163minor)pagefaults 0swaps

  (with patch)
  $ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
  0.01user 0.00system 0:00.02elapsed 104%CPU (0avgtext+0avgdata 43856maxresident)k
  0inputs+24outputs (0major+3688minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-lib.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index f8454dd..ebe751e 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -468,6 +468,7 @@ static int diff_cache(struct rev_info *revs,
 	opts.unpack_data = revs;
 	opts.src_index = &the_index;
 	opts.dst_index = NULL;
+	opts.pathspec = &revs->diffopt.pathspec;
 
 	init_tree_desc(&t, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
-- 
1.7.7.rc0.70.g82660

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
                               ` (2 preceding siblings ...)
  2011-08-29 21:33             ` [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery Junio C Hamano
@ 2011-08-29 21:56             ` Linus Torvalds
  2011-08-29 22:05               ` Junio C Hamano
  2011-08-30 10:04             ` Michael J Gruber
  4 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2011-08-29 21:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy

On Mon, Aug 29, 2011 at 2:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Before and after applying this series, looking for changes in the kernel
> repository with a fairly narrow pathspec gets a moderate speeds up.

"moderate speeds up"?

Looks like a big win to me. Admittedly it's already a pretty fast
operation, but script it and repeat it a million times, and that will
matter a lot more.

I guess the "--raw" diff part means that you are hiding the time to
make a real diff, which would otherwise swamp everything else. Even
so, this looks like a good improvement.

                    Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 21:56             ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Linus Torvalds
@ 2011-08-29 22:05               ` Junio C Hamano
  2011-08-29 22:11                 ` Linus Torvalds
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 22:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Nguyen Thai Ngoc Duy

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Aug 29, 2011 at 2:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Before and after applying this series, looking for changes in the kernel
>> repository with a fairly narrow pathspec gets a moderate speeds up.
>
> "moderate speeds up"?
>
> Looks like a big win to me. Admittedly it's already a pretty fast
> operation, but script it and repeat it a million times, and that will
> matter a lot more.
>
> I guess the "--raw" diff part means that you are hiding the time to
> make a real diff, which would otherwise swamp everything else. Even
> so, this looks like a good improvement.

The topic started by Marat Radchenko in

    http://thread.gmane.org/gmane.comp.version-control.git/179926

who was trying to pick a single path (we do not know how deep it is), and
was comparing between these two:

  $ time git show branch:file | diff -u - file > /dev/null 
  real    0m0.003s
  user    0m0.000s
  sys     0m0.000s
  $ time git diff branch -- file > /dev/null 
  real    0m31.442s
  user    0m31.040s
  sys     0m0.380s

Replacing "diff -u" with "git diff --no-index" in the former didn't make
much of a difference. It turned out that Marat had 603k paths in the index
and the most of the time was spent in the unpack machinery.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 22:05               ` Junio C Hamano
@ 2011-08-29 22:11                 ` Linus Torvalds
  2011-08-29 23:42                   ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2011-08-29 22:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyen Thai Ngoc Duy

On Mon, Aug 29, 2011 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> The topic started by Marat Radchenko in

Ugh, that's ugly. Is it verified to solve Marat's nasty case too?

But yeah, ack on the series.

                 Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 22:11                 ` Linus Torvalds
@ 2011-08-29 23:42                   ` Junio C Hamano
  2011-08-30  6:16                     ` Marat Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-29 23:42 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: Linus Torvalds, git, Nguyen Thai Ngoc Duy

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Aug 29, 2011 at 3:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> The topic started by Marat Radchenko in
>
> Ugh, that's ugly. Is it verified to solve Marat's nasty case too?

Not yet, though Marat is on the CC list of the original "this is still
rough" draft (the code is identical but it is not split up into
three-patch series) message.

Marat, if/when you have a chance could you try a patched git on your
original use case and see if it produces correct output with shorter
amount of time?

> But yeah, ack on the series.
>
>                  Linus

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 23:42                   ` Junio C Hamano
@ 2011-08-30  6:16                     ` Marat Radchenko
  2011-08-31  0:18                       ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Marat Radchenko @ 2011-08-30  6:16 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster <at> pobox.com> writes:
> Marat, if/when you have a chance could you try a patched git on your
> original use case and see if it produces correct output with shorter
> amount of time?

30s without patch and 0.3s with it. You rock :)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
                               ` (3 preceding siblings ...)
  2011-08-29 21:56             ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Linus Torvalds
@ 2011-08-30 10:04             ` Michael J Gruber
  2011-08-30 17:03               ` Junio C Hamano
  4 siblings, 1 reply; 56+ messages in thread
From: Michael J Gruber @ 2011-08-30 10:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

Junio C Hamano venit, vidit, dixit 29.08.2011 23:33:
> "git diff A B -- $pathspec" to compare two tree-ishes knew how to apply
> pathspec to avoid opening trees that fall outside the area of interest,
> but "git diff A -- $pathspec" used unpack_trees() machinery that was meant
> for full-tree merges, and ended up reading the whole tree only to discard
> potentially major part of the work it does.

Seems my analysis wasn't that far off (though, admittedly, unspecific):

MG:
> Well, we have to read the full tree before diffing. But I can't
...
> I meant: The way "git diff" is now, it does that.

JC:
> anything to do with it (iow, your "The way 'git diff' is now, it does
> that" is not just incorrect---we don't read the full tree to begin
> with---but irrelevant).

Anyways, as MR writes:

MR:
> 30s without patch and 0.3s with it. You rock ;)

I agree with all of that ;)

Michael

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
@ 2011-08-30 12:53               ` Nguyen Thai Ngoc Duy
  2011-08-30 17:44                 ` Junio C Hamano
  2011-10-09 15:39               ` Michael Haggerty
  1 sibling, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-08-30 12:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tue, Aug 30, 2011 at 4:33 AM, Junio C Hamano <gitster@pobox.com> wrote:
> @@ -316,10 +328,18 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)

Not related to the patch, but a bit of comment how this function works
and how it expects info->fn() to return would be appreciated.


>        struct name_entry *entry = xmalloc(n*sizeof(*entry));
>        int i;
>        struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
> +       struct strbuf base = STRBUF_INIT;
> +       int interesting = 1;

I suspect making "base" an argument to traverse_trees() may save us a
few more alloc/free (or a lot, depends on how crowded the tree is).
There are only two users of traverse_trees(): merge-tree.c and
unpack-trees.c, changes should be manageable.


> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>                        mask |= 1ul << i;
>                        if (S_ISDIR(entry[i].mode))
>                                dirmask |= 1ul << i;
> +                       e = &entry[i];
>                }

Why? "e" is not used in that loop or anywhere after that.


>                if (!mask)
>                        break;
> -               ret = info->fn(n, mask, dirmask, entry, info);
> -               if (ret < 0) {
> -                       error = ret;
> -                       if (!info->show_all_errors)
> -                               break;
> +               interesting = prune_traversal(e, info, &base, interesting);
> +               if (interesting < 0)
> +                       break;

I don't really understand this function to comment. But I guess when
interesting < 0, we only skip info->fn() and assume it returns "mask"
(its user unpack_callback() only returns either "mask" or -1). So the
code at the end of the main loop

		for (i = 0; i < n; i++)
			if (mask & (1ul << i))
				update_extended_entry(tx + i, entry + i);

should not be skipped (by putting a break here). IOW, there should be
no "break;" here.


>                }
> -               mask &= ret;
>                ret = 0;

While at there, remove "ret = 0;" too? I think this statement is useless.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/3] unpack-trees: allow pruning with pathspec
  2011-08-29 21:33             ` [PATCH 2/3] unpack-trees: " Junio C Hamano
@ 2011-08-30 13:03               ` Nguyen Thai Ngoc Duy
  2011-08-30 17:32                 ` Junio C Hamano
  2011-08-30 15:24               ` David Michael Barr
  1 sibling, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-08-30 13:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tue, Aug 30, 2011 at 4:33 AM, Junio C Hamano <gitster@pobox.com> wrote:
> and this codepath probably should not be enabled while running a merge

But do you think if it works with merge (ie. a partial merge)?

There's probably no interest in partial merge now, but if narrow clone
comes, index will be also narrowed and at least fast-forward merge
should work. I think it may work, but I'm not sure. unpack_callback()
and traverse_trees() look like magic to me. There may be some
interactions between index and unpacked trees that make partial merge
fail in a subtle way.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/3] unpack-trees: allow pruning with pathspec
  2011-08-29 21:33             ` [PATCH 2/3] unpack-trees: " Junio C Hamano
  2011-08-30 13:03               ` Nguyen Thai Ngoc Duy
@ 2011-08-30 15:24               ` David Michael Barr
  1 sibling, 0 replies; 56+ messages in thread
From: David Michael Barr @ 2011-08-30 15:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

On Tue, Aug 30, 2011 at 7:33 AM, Junio C Hamano <gitster@pobox.com> wrote:
> while running a merge, but the caller in diff-lib.c::diff_cache() should
> be able to take advanrage of it.

s/advanrage/advantage/ :)

Nice little hot-fix too. I love the understatement, "moderate speed up."

--
David Barr

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-30 10:04             ` Michael J Gruber
@ 2011-08-30 17:03               ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-30 17:03 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

Michael J Gruber <git@drmicha.warpmail.net> writes:

> Seems my analysis wasn't that far off (though, admittedly, unspecific):

Yes, the "diff-index" implementation was updated over time a few times and
I was confused. Sorry.

Thanks for helping diagnosing the issue.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/3] unpack-trees: allow pruning with pathspec
  2011-08-30 13:03               ` Nguyen Thai Ngoc Duy
@ 2011-08-30 17:32                 ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-30 17:32 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Linus Torvalds

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Tue, Aug 30, 2011 at 4:33 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> and this codepath probably should not be enabled while running a merge
>
> But do you think if it works with merge (ie. a partial merge)?

I highly doubt it.

The primary reason why I doubt it would work is actually the same as the
very reason why a merge would work with unpack_trees() machinery. It walks
all the entries in the trees involved, picking matching entries that would
go in the result, while also picking the corresponding entry from the index,
and compute their merge and depositing the result in the final index. If
you "optimize" it by not walking some parts of trees, you don't just omit
merging damages made to these parts by other trees, but you also omit
recording the contributions by the current HEAD and the index. You would
need to compensate for that by doing something that copies the contents
of the index for paths outside the area covered by pathspec.

And whole point of this series is not doing that. We don't want to say
"because they are outside pathspec, we pretend that these paths in index
do not have corresponding entries in the trees we are merging into" and
end up producing creation filepairs for them.

But I honestly am not interested giving it unnecessary deep thought at
this point in the cycle. I haven't found a need for a low-level partial
merge machinery implementation to support any higher level workflows in
git, and the path-level 3-way merge machinery already exists in the form
of ll_merge().

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-08-30 12:53               ` Nguyen Thai Ngoc Duy
@ 2011-08-30 17:44                 ` Junio C Hamano
  2011-08-31  1:35                   ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2011-08-30 17:44 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Linus Torvalds

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>>                        mask |= 1ul << i;
>>                        if (S_ISDIR(entry[i].mode))
>>                                dirmask |= 1ul << i;
>> +                       e = &entry[i];
>>                }
>
> Why? "e" is not used in that loop or anywhere after that.

This is trying to find _a_ surviving entry to be fed to prune_traversal()
which in turn uses tree_entry_interesting(). At this point in the code, we
are stuffing the entries of the same name from the input trees (and if one
tree is missing an entry of the chosen name, it will have NULL there), so
any non-empty entry would do. It corresponds to "first" but that is just a
simple string and not a name_entry tree_entry_interesting() wants.


>>                if (!mask)
>>                        break;
>> -               ret = info->fn(n, mask, dirmask, entry, info);
>> -               if (ret < 0) {
>> -                       error = ret;
>> -                       if (!info->show_all_errors)
>> -                               break;
>> +               interesting = prune_traversal(e, info, &base, interesting);
>> +               if (interesting < 0)
>> +                       break;
>
> I don't really understand this function to comment. But I guess when
> interesting < 0, we only skip info->fn() and assume it returns "mask"
> (its user unpack_callback() only returns either "mask" or -1).

We consume the entries we have used in merging (which is actually
"everything in entry[] array" as info->fn() returns "mask" itself) by
saying "update_extended_entry()" and the purpose of doing so is to prepare
to process the next entry of the tree we are traversing.

When tree_entry_interesting() returns negative, it tells us "no, and no
subsequent entries will be either", meaning "we are done with this tree".
As we are done, there is nothing to prepare for the next round; we are not
walking the remaining entries in the trees we are looking at. Is there any
point in calling update_extended_entry() I am missing?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec"
  2011-08-30  6:16                     ` Marat Radchenko
@ 2011-08-31  0:18                       ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2011-08-31  0:18 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

Marat Radchenko <marat@slonopotamus.org> writes:

> Junio C Hamano <gitster <at> pobox.com> writes:
>> Marat, if/when you have a chance could you try a patched git on your
>> original use case and see if it produces correct output with shorter
>> amount of time?
>
> 30s without patch and 0.3s with it.

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-08-30 17:44                 ` Junio C Hamano
@ 2011-08-31  1:35                   ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-08-31  1:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Wed, Aug 31, 2011 at 12:44 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>>> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>>>                        mask |= 1ul << i;
>>>                        if (S_ISDIR(entry[i].mode))
>>>                                dirmask |= 1ul << i;
>>> +                       e = &entry[i];
>>>                }
>>
>> Why? "e" is not used in that loop or anywhere after that.
>
> This is trying to find _a_ surviving entry to be fed to prune_traversal()
> which in turn uses tree_entry_interesting(). At this point in the code, we
> are stuffing the entries of the same name from the input trees (and if one
> tree is missing an entry of the chosen name, it will have NULL there), so
> any non-empty entry would do. It corresponds to "first" but that is just a
> simple string and not a name_entry tree_entry_interesting() wants.

Ah yes. I only searched in old code, "e" is used in the new
prune_traversal() call.

>>>                if (!mask)
>>>                        break;
>>> -               ret = info->fn(n, mask, dirmask, entry, info);
>>> -               if (ret < 0) {
>>> -                       error = ret;
>>> -                       if (!info->show_all_errors)
>>> -                               break;
>>> +               interesting = prune_traversal(e, info, &base, interesting);
>>> +               if (interesting < 0)
>>> +                       break;
>>
>> I don't really understand this function to comment. But I guess when
>> interesting < 0, we only skip info->fn() and assume it returns "mask"
>> (its user unpack_callback() only returns either "mask" or -1).
>
> We consume the entries we have used in merging (which is actually
> "everything in entry[] array" as info->fn() returns "mask" itself) by
> saying "update_extended_entry()" and the purpose of doing so is to prepare
> to process the next entry of the tree we are traversing.
>
> When tree_entry_interesting() returns negative, it tells us "no, and no
> subsequent entries will be either", meaning "we are done with this tree".
> As we are done, there is nothing to prepare for the next round; we are not
> walking the remaining entries in the trees we are looking at. Is there any
> point in calling update_extended_entry() I am missing?

No you're right again. Somehow I thought there would be another round
in the loop (ie. continue, not break). My bad.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
  2011-08-30 12:53               ` Nguyen Thai Ngoc Duy
@ 2011-10-09 15:39               ` Michael Haggerty
  2011-10-09 21:35                 ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 56+ messages in thread
From: Michael Haggerty @ 2011-10-09 15:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

On 08/29/2011 11:33 PM, Junio C Hamano wrote:
> diff --git a/tree-walk.c b/tree-walk.c
> index 33f749e..808bb55 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> [...]
> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>  			mask |= 1ul << i;
>  			if (S_ISDIR(entry[i].mode))
>  				dirmask |= 1ul << i;
> +			e = &entry[i];
>  		}
>  		if (!mask)
>  			break;
> -		ret = info->fn(n, mask, dirmask, entry, info);
> -		if (ret < 0) {
> -			error = ret;
> -			if (!info->show_all_errors)
> -				break;
> +		interesting = prune_traversal(e, info, &base, interesting);

According to gcc 4.2.4 (though, strangely, not gcc 4.4.3):

tree-walk.c: In function ‘traverse_trees’:
tree-walk.c:347: warning: ‘e’ may be used uninitialized in this function

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-10-09 15:39               ` Michael Haggerty
@ 2011-10-09 21:35                 ` Nguyen Thai Ngoc Duy
  2011-10-10  4:42                   ` Michael Haggerty
  0 siblings, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-10-09 21:35 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git, Linus Torvalds

On Mon, Oct 10, 2011 at 2:39 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 08/29/2011 11:33 PM, Junio C Hamano wrote:
>> diff --git a/tree-walk.c b/tree-walk.c
>> index 33f749e..808bb55 100644
>> --- a/tree-walk.c
>> +++ b/tree-walk.c
>> [...]
>> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>>                       mask |= 1ul << i;
>>                       if (S_ISDIR(entry[i].mode))
>>                               dirmask |= 1ul << i;
>> +                     e = &entry[i];
>>               }
>>               if (!mask)
>>                       break;
>> -             ret = info->fn(n, mask, dirmask, entry, info);
>> -             if (ret < 0) {
>> -                     error = ret;
>> -                     if (!info->show_all_errors)
>> -                             break;
>> +             interesting = prune_traversal(e, info, &base, interesting);
>
> According to gcc 4.2.4 (though, strangely, not gcc 4.4.3):
>
> tree-walk.c: In function ‘traverse_trees’:
> tree-walk.c:347: warning: ‘e’ may be used uninitialized in this function

False alarm. If e is not initialized in the for loop, mask would be
zero and therefore prune_traversal(e, info, &base, interesting), which
would use uninitialized "e", would never be called.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/3] traverse_trees(): allow pruning with pathspec
  2011-10-09 21:35                 ` Nguyen Thai Ngoc Duy
@ 2011-10-10  4:42                   ` Michael Haggerty
  0 siblings, 0 replies; 56+ messages in thread
From: Michael Haggerty @ 2011-10-10  4:42 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git, Linus Torvalds

On 10/09/2011 11:35 PM, Nguyen Thai Ngoc Duy wrote:
> On Mon, Oct 10, 2011 at 2:39 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 08/29/2011 11:33 PM, Junio C Hamano wrote:
>>> diff --git a/tree-walk.c b/tree-walk.c
>>> index 33f749e..808bb55 100644
>>> --- a/tree-walk.c
>>> +++ b/tree-walk.c
>>> [...]
>>> @@ -376,16 +396,22 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
>>>                       mask |= 1ul << i;
>>>                       if (S_ISDIR(entry[i].mode))
>>>                               dirmask |= 1ul << i;
>>> +                     e = &entry[i];
>>>               }
>>>               if (!mask)
>>>                       break;
>>> -             ret = info->fn(n, mask, dirmask, entry, info);
>>> -             if (ret < 0) {
>>> -                     error = ret;
>>> -                     if (!info->show_all_errors)
>>> -                             break;
>>> +             interesting = prune_traversal(e, info, &base, interesting);
>>
>> According to gcc 4.2.4 (though, strangely, not gcc 4.4.3):

I checked this a bit more carefully.  gcc 4.2.4 emits a warning when the
-O1 or -O2 optimization levels are used, but not with -O0.  gcc 4.4.3
does not emit a warning regardless of optimization level.

>> tree-walk.c: In function ‘traverse_trees’:
>> tree-walk.c:347: warning: ‘e’ may be used uninitialized in this function
> 
> False alarm. If e is not initialized in the for loop, mask would be
> zero and therefore prune_traversal(e, info, &base, interesting), which
> would use uninitialized "e", would never be called.

That's good to know.  Still, it might be worthwhile to initialize the
variable explicitly to avoid future confusion.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2011-08-29 21:33             ` [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery Junio C Hamano
@ 2012-01-11  6:31               ` Jonathan Nieder
  2012-01-11  8:05                 ` Junio C Hamano
                                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Jonathan Nieder @ 2012-01-11  6:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

Hi,

Junio C Hamano wrote:

> And finally, pass the pathspec down through unpack_trees() to traverse_trees()
> callchain.
>
> Before and after applying this series, looking for changes in the kernel
> repository with a fairly narrow pathspec gets a moderate speeds up.

Check this out:

	mkdir -p test-repo/subdir
	cd test-repo
	git init
	echo hi >subdir/hello.h
	git add subdir/hello.h
	git commit -m 'say hi'
	git diff-index --abbrev HEAD -- '*.h'

In git versions which include the patch described above, the unchanged
subdir/hello.h shows up as a newly added file.  Reverting that patch
(v1.7.7.1~22^2, diff-index: pass pathspec down to unpack-trees
machinery, 2011-08-29) makes "git diff HEAD" with wildcards work
again.

It looks like the pruning of the preimage by pathspec is too
aggressive and is omiting whole directories that do not match the
pathspec without realizing that the path to a file contained within
might match.

We could just add a test for this to the testsuite and do that revert,
but I'd rather not yet, in case this is a symptom of some deeper
unpack_trees() confusion.

Hints?

Thanks,
Jonathan

>  diff-lib.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index f8454dd..ebe751e 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -468,6 +468,7 @@ static int diff_cache(struct rev_info *revs,
>  	opts.unpack_data = revs;
>  	opts.src_index = &the_index;
>  	opts.dst_index = NULL;
> +	opts.pathspec = &revs->diffopt.pathspec;
>  
>  	init_tree_desc(&t, tree->buffer, tree->size);
>  	return unpack_trees(1, &t, &opts);
> -- 
> 1.7.7.rc0.70.g82660
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2012-01-11  6:31               ` Jonathan Nieder
@ 2012-01-11  8:05                 ` Junio C Hamano
  2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
  2012-01-12  4:09                 ` [PATCH] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2012-01-11  8:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy

Jonathan Nieder <jrnieder@gmail.com> writes:

> Junio C Hamano wrote:
>
>> And finally, pass the pathspec down through unpack_trees() to traverse_trees()
>> callchain.
>
> In git versions which include the patch described above, the unchanged
> subdir/hello.h shows up as a newly added file.  Reverting that patch
> (v1.7.7.1~22^2, diff-index: pass pathspec down to unpack-trees
> machinery, 2011-08-29) makes "git diff HEAD" with wildcards work
> again.

I suspect that the particular change on the side branch predates Nguyen's
effort to unify the pathspec semantics to teach the wildcard (i.e. not the
traditional "prefix match") to the tree traversal code, but it is fairly
late here, so I didn't check.

I think the right fix is to update the logic that still assumes that a
pathspec used for tree traversal is always prefix match when leaving the
traversal early, and instead use the proper matching logic that knows that
a wildcard pathspec needs to dig deeper into the tree regardless (I think
the pathspec implementation used in "git grep" got this right, but please
double check), so that we do not dig unnecessary subtrees when pathspecs
are all prefixes, but still keep digging when there is a wildcard match.

I suspect that it would lead to a fairly complete unification of the three
implementations of pathspec matching logic and would allow us to also do
something like "git log -- '*.h'" for free.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2012-01-11  6:31               ` Jonathan Nieder
  2012-01-11  8:05                 ` Junio C Hamano
@ 2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
  2012-01-11 12:47                   ` Nguyen Thai Ngoc Duy
  2012-01-11 20:40                   ` Junio C Hamano
  2012-01-12  4:09                 ` [PATCH] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
  2 siblings, 2 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-11 12:33 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git, Linus Torvalds

On Wed, Jan 11, 2012 at 1:31 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Check this out:
>
>        mkdir -p test-repo/subdir
>        cd test-repo
>        git init
>        echo hi >subdir/hello.h
>        git add subdir/hello.h
>        git commit -m 'say hi'
>        git diff-index --abbrev HEAD -- '*.h'

This seems to fix this.

diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..5cf58b6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1042,6 +1042,7 @@ int unpack_trees(unsigned len, struct tree_desc
*t, struct unpack_trees_options
                info.data = o;
                info.show_all_errors = o->show_all_errors;
                info.pathspec = o->pathspec;
+               info.pathspec->recursive = 1;

                if (o->prefix) {
                        /*

Still scratching my head why this flag is zero by default, which would
affect all other places.. Or perhaps the right fix would be this
instead

diff --git a/tree-walk.c b/tree-walk.c
index f82dba6..0345938 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const
struct name_entry *entry,
                                 * Match all directories. We'll try to
                                 * match files later on.
                                 */
-                               if (ps->recursive && S_ISDIR(entry->mode))
+                               if (S_ISDIR(entry->mode))
                                        return entry_interesting;
                        }

@@ -662,7 +662,7 @@ match_wildcards:
                 * Match all directories. We'll try to match files
                 * later on.
                 */
-               if (ps->recursive && S_ISDIR(entry->mode))
+               if (S_ISDIR(entry->mode))
                        return entry_interesting;
        }
        return never_interesting; /* No matches */
-- 
Duy

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
@ 2012-01-11 12:47                   ` Nguyen Thai Ngoc Duy
  2012-01-11 20:40                   ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-11 12:47 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git, Linus Torvalds

On Wed, Jan 11, 2012 at 7:33 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Still scratching my head why this flag is zero by default, which would
> affect all other places.. Or perhaps the right fix would be this
> instead

Yep, it seems the right one. recursive flag is introduced to
enable/disable max_depth feature, which is only used by grep, so we
want this feature to be off by default. max_depth == 0 has special
meaning, and we do not want another any other kind of initialization
but memset(), so "recursive" functions as the feature switch.

The code portions below deal with the case where we have tried and
failed prefix matching. Because this is wildcard, we don't know if
anything in the given directory may match or not, so we match all
directories, then filter unmatched files out at the end. This has
nothing to do with "recursive" flag as the max_depth switch.

Will think about it and test some more tomorrow when my mind is in better state.

> diff --git a/tree-walk.c b/tree-walk.c
> index f82dba6..0345938 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const
> struct name_entry *entry,
>                                 * Match all directories. We'll try to
>                                 * match files later on.
>                                 */
> -                               if (ps->recursive && S_ISDIR(entry->mode))
> +                               if (S_ISDIR(entry->mode))
>                                        return entry_interesting;
>                        }
>
> @@ -662,7 +662,7 @@ match_wildcards:
>                 * Match all directories. We'll try to match files
>                 * later on.
>                 */
> -               if (ps->recursive && S_ISDIR(entry->mode))
> +               if (S_ISDIR(entry->mode))
>                        return entry_interesting;
>        }
>        return never_interesting; /* No matches */
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery
  2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
  2012-01-11 12:47                   ` Nguyen Thai Ngoc Duy
@ 2012-01-11 20:40                   ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2012-01-11 20:40 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jonathan Nieder, git, Linus Torvalds

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> This seems to fix this.
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 7c9ecf6..5cf58b6 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1042,6 +1042,7 @@ int unpack_trees(unsigned len, struct tree_desc
> *t, struct unpack_trees_options
>                 info.data = o;
>                 info.show_all_errors = o->show_all_errors;
>                 info.pathspec = o->pathspec;
> +               info.pathspec->recursive = 1;
>
>                 if (o->prefix) {
>                         /*
>
> Still scratching my head why this flag is zero by default, which would
> affect all other places.

Ahh, thanks for diagnosing.

> ... Or perhaps the right fix would be this
> instead
>
> diff --git a/tree-walk.c b/tree-walk.c
> index f82dba6..0345938 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const
> struct name_entry *entry,
>                                  * Match all directories. We'll try to
>                                  * match files later on.
>                                  */
> -                               if (ps->recursive && S_ISDIR(entry->mode))
> +                               if (S_ISDIR(entry->mode))
>                                         return entry_interesting;
>                         }
>
> @@ -662,7 +662,7 @@ match_wildcards:
>                  * Match all directories. We'll try to match files
>                  * later on.
>                  */
> -               if (ps->recursive && S_ISDIR(entry->mode))
> +               if (S_ISDIR(entry->mode))
>                         return entry_interesting;
>         }
>         return never_interesting; /* No matches */

Doesn't that break "git diff-tree A B" without the "-r" option?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] tree_entry_interesting: make recursive mode default
  2012-01-11  6:31               ` Jonathan Nieder
  2012-01-11  8:05                 ` Junio C Hamano
  2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
@ 2012-01-12  4:09                 ` Nguyễn Thái Ngọc Duy
  2012-01-12  5:04                   ` Junio C Hamano
  2012-01-14  9:23                   ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Nguyễn Thái Ngọc Duy
  2 siblings, 2 replies; 56+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-01-12  4:09 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jonathan Nieder, Linus Torvalds,
	Nguyễn Thái Ngọc Duy

There is a bit of history behind this. Some time ago, t_e_i() only
supported prefix matching. diff-tree supported recursive and
non-recursive mode but it did not make any difference to prefix
matching.

Later on, t_e_i() gained limited recursion support to unify a similar
matching code used by git-grep. It introduced two new fields in struct
pathspec: max_depth and recursive. "recursive" field functions as a
feature switch so that this feature is off by default.

Some time after that, t_e_i() further gained wildcard support. With
wildcard matching, recursive and non-recursive diff-tree
mattered. "recursive" field was reused to distinguish recursion in
diff-tree.

This choice has a side effect that by default wildcard matching is in
non-recursive mode, which is not true. All current call sites except
"diff-tree without -r" (grep, traverse_commit_list, tree-walk and
general tree diff) prefer recursive mode.

This patch decouples the use of recursive field. The max_depth feature
switch is now controlled by max_depth_valid field. diff-tree recursion
is controlled by nonrecursive_diff_tree, which makes it recursive by
default.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Junio, log/diff family takes wildcards just fine (even before this
 patch, just less reliable). If it's not mentioned in release notes
 before, perhaps it's worth a line in 1.7.9 annoucement.

 builtin/grep.c           |    2 +-
 cache.h                  |    3 ++-
 dir.c                    |    4 ++--
 t/t4010-diff-pathspec.sh |    8 ++++++++
 tree-diff.c              |    3 +--
 tree-walk.c              |    8 ++++----
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9ce064a..c081bf4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1051,7 +1051,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	paths = get_pathspec(prefix, argv + i);
 	init_pathspec(&pathspec, paths);
 	pathspec.max_depth = opt.max_depth;
-	pathspec.recursive = 1;
+	pathspec.max_depth_valid = 1;
 
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
diff --git a/cache.h b/cache.h
index 10afd71..f58e05a 100644
--- a/cache.h
+++ b/cache.h
@@ -526,7 +526,8 @@ struct pathspec {
 	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
 	int nr;
 	unsigned int has_wildcard:1;
-	unsigned int recursive:1;
+	unsigned int max_depth_valid:1;
+	unsigned int nonrecursive_diff_tree:1;
 	int max_depth;
 	struct pathspec_item {
 		const char *match;
diff --git a/dir.c b/dir.c
index 0a78d00..5af3567 100644
--- a/dir.c
+++ b/dir.c
@@ -258,7 +258,7 @@ int match_pathspec_depth(const struct pathspec *ps,
 	int i, retval = 0;
 
 	if (!ps->nr) {
-		if (!ps->recursive || ps->max_depth == -1)
+		if (!ps->max_depth_valid || ps->max_depth == -1)
 			return MATCHED_RECURSIVELY;
 
 		if (within_depth(name, namelen, 0, ps->max_depth))
@@ -275,7 +275,7 @@ int match_pathspec_depth(const struct pathspec *ps,
 		if (seen && seen[i] == MATCHED_EXACTLY)
 			continue;
 		how = match_pathspec_item(ps->items+i, prefix, name, namelen);
-		if (ps->recursive && ps->max_depth != -1 &&
+		if (ps->max_depth_valid && ps->max_depth != -1 &&
 		    how && how != MATCHED_FNMATCH) {
 			int len = ps->items[i].len;
 			if (name[len] == '/')
diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index fbc8cd8..af5134b 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -48,6 +48,14 @@ test_expect_success \
      compare_diff_raw current expected'
 
 cat >expected <<\EOF
+:100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	path1/file1
+EOF
+test_expect_success \
+    '"*file1" should show path1/file1' \
+    'git diff-index --cached $tree -- "*file1" >current &&
+     compare_diff_raw current expected'
+
+cat >expected <<\EOF
 :100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	file0
 EOF
 test_expect_success \
diff --git a/tree-diff.c b/tree-diff.c
index 28ad6db..164693f 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -137,8 +137,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 	enum interesting t2_match = entry_not_interesting;
 
 	/* Enable recursion indefinitely */
-	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
-	opt->pathspec.max_depth = -1;
+	opt->pathspec.nonrecursive_diff_tree = !DIFF_OPT_TST(opt, RECURSIVE);
 
 	strbuf_init(&base, PATH_MAX);
 	strbuf_add(&base, base_str, baselen);
diff --git a/tree-walk.c b/tree-walk.c
index f82dba6..04f1b81 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -588,7 +588,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 		entry_not_interesting : all_entries_not_interesting;
 
 	if (!ps->nr) {
-		if (!ps->recursive || ps->max_depth == -1)
+		if (!ps->max_depth_valid || ps->max_depth == -1)
 			return all_entries_interesting;
 		return within_depth(base->buf + base_offset, baselen,
 				    !!S_ISDIR(entry->mode),
@@ -609,7 +609,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 			if (!match_dir_prefix(base_str, match, matchlen))
 				goto match_wildcards;
 
-			if (!ps->recursive || ps->max_depth == -1)
+			if (!ps->max_depth_valid || ps->max_depth == -1)
 				return all_entries_interesting;
 
 			return within_depth(base_str + matchlen + 1,
@@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 				 * Match all directories. We'll try to
 				 * match files later on.
 				 */
-				if (ps->recursive && S_ISDIR(entry->mode))
+				if (!ps->nonrecursive_diff_tree && S_ISDIR(entry->mode))
 					return entry_interesting;
 			}
 
@@ -662,7 +662,7 @@ match_wildcards:
 		 * Match all directories. We'll try to match files
 		 * later on.
 		 */
-		if (ps->recursive && S_ISDIR(entry->mode))
+		if (!ps->nonrecursive_diff_tree && S_ISDIR(entry->mode))
 			return entry_interesting;
 	}
 	return never_interesting; /* No matches */
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH] tree_entry_interesting: make recursive mode default
  2012-01-12  4:09                 ` [PATCH] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
@ 2012-01-12  5:04                   ` Junio C Hamano
  2012-01-12  5:44                     ` Nguyen Thai Ngoc Duy
  2012-01-14  9:23                   ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2012-01-12  5:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Jonathan Nieder, Linus Torvalds

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> There is a bit of history behind this. Some time ago, t_e_i() only
> supported prefix matching. diff-tree supported recursive and
> non-recursive mode but it did not make any difference to prefix
> matching.
>
> Later on, t_e_i() gained limited recursion support to unify a similar
> matching code used by git-grep. It introduced two new fields in struct
> pathspec: max_depth and recursive. "recursive" field functions as a
> feature switch so that this feature is off by default.
>
> Some time after that, t_e_i() further gained wildcard support. With
> wildcard matching, recursive and non-recursive diff-tree
> mattered. "recursive" field was reused to distinguish recursion in
> diff-tree.
>
> This choice has a side effect that by default wildcard matching is in
> non-recursive mode, which is not true. All current call sites except
> "diff-tree without -r" (grep, traverse_commit_list, tree-walk and
> general tree diff) prefer recursive mode.
>
> This patch decouples the use of recursive field. The max_depth feature
> switch is now controlled by max_depth_valid field. diff-tree recursion
> is controlled by nonrecursive_diff_tree, which makes it recursive by
> default.

Thanks, but I am curious about two (and a half) things.

 - The "max_depth" option has perfectly good and natural "invalid"
   sentinel values (i.e. 0 or negative). Why do we need a separate
   bitfield?

 - Special casing the non-recursive mode of diff-tree is perfectly
   acceptable, but nonrecursive_diff_tree does not sound like a very good
   name for it for two reasons. Perhaps there may be other users that want
   the "surface only" behaviour, so having "diff_tree" in the name limits
   its future (re)use. Also an option that is named negatively inevitably
   invites "if (!opt->non_whatever)" double negative. Can we come up with
   a better name, perhaps "onelevel_only" or something?

 - Shouldn't "onelevel_only" be the same as limiting to a single depth
   with "max_depth"?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] tree_entry_interesting: make recursive mode default
  2012-01-12  5:04                   ` Junio C Hamano
@ 2012-01-12  5:44                     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-12  5:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Nieder, Linus Torvalds

2012/1/12 Junio C Hamano <gitster@pobox.com>:
> Thanks, but I am curious about two (and a half) things.
>
>  - The "max_depth" option has perfectly good and natural "invalid"
>   sentinel values (i.e. 0 or negative). Why do we need a separate
>   bitfield?

We want infinite recursion by default, by max_depth default value is
0, which is non-recursive.

>  - Special casing the non-recursive mode of diff-tree is perfectly
>   acceptable, but nonrecursive_diff_tree does not sound like a very good
>   name for it for two reasons. Perhaps there may be other users that want
>   the "surface only" behaviour, so having "diff_tree" in the name limits
>   its future (re)use. Also an option that is named negatively inevitably
>   invites "if (!opt->non_whatever)" double negative. Can we come up with
>   a better name, perhaps "onelevel_only" or something?

I'm bad at naming, any other names are welcome. onelevel_only sounds good.

>  - Shouldn't "onelevel_only" be the same as limiting to a single depth
>   with "max_depth"?

I thought of that but not sure it's equivalent to max_depth == 1 or
max_depth == 0, so I separated it for safety and clarity. max_depth
feature is driven by git-grep and there were a few interpretations how
it should behave last time, so I'm not sure if its behavior may change
in future.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards
  2012-01-12  4:09                 ` [PATCH] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
  2012-01-12  5:04                   ` Junio C Hamano
@ 2012-01-14  9:23                   ` Nguyễn Thái Ngọc Duy
  2012-01-14  9:23                     ` [PATCH v2 2/2] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
  2012-01-15  2:38                     ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Junio C Hamano
  1 sibling, 2 replies; 56+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-01-14  9:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jonathan Nieder, Linus Torvalds,
	Nguyễn Thái Ngọc Duy

It's actually unlimited recursion if wildcards are active regardless
--max-depth

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Regarding Junio's question earlier:

 >  - Shouldn't "onelevel_only" be the same as limiting to a single depth
 >   with "max_depth"?

 Doing that would change the behavior of "git grep --max-depth=0 -- 'a*'"
 from unlimited recursion currently to limited. We did not come to agree
 how --max-depth should behave with wildcards last time it was discussed,
 so it's best separating two flags (in the next patch) for now.

 Documentation/git-grep.txt |    3 +++
 tree-walk.c                |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 15d6711..6a8b1e3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -79,6 +79,9 @@ OPTIONS
 --max-depth <depth>::
 	For each <pathspec> given on command line, descend at most <depth>
 	levels of directories. A negative value means no limit.
+	This option is ignored if <pathspec> contains active wildcards.
+	In other words if "a*" matches a directory named "a*",
+	"*" is matched literally so --max-depth is still effective.
 
 -w::
 --word-regexp::
diff --git a/tree-walk.c b/tree-walk.c
index f82dba6..492c7cd 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -661,6 +661,9 @@ match_wildcards:
 		/*
 		 * Match all directories. We'll try to match files
 		 * later on.
+		 * max_depth is ignored but we may consider support it
+		 * in future, see
+		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
 		 */
 		if (ps->recursive && S_ISDIR(entry->mode))
 			return entry_interesting;
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 2/2] tree_entry_interesting: make recursive mode default
  2012-01-14  9:23                   ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Nguyễn Thái Ngọc Duy
@ 2012-01-14  9:23                     ` Nguyễn Thái Ngọc Duy
  2012-01-15  3:12                       ` Junio C Hamano
  2012-01-15  2:38                     ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-01-14  9:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jonathan Nieder, Linus Torvalds,
	Nguyễn Thái Ngọc Duy

There is a bit of history behind this. Some time ago, t_e_i() only
supported prefix matching. diff-tree supported recursive and
non-recursive mode but it did not make any difference to prefix
matching.

Later on, t_e_i() gained limited recursion support to unify a similar
matching code used by git-grep. It introduced two new fields in struct
pathspec: max_depth and recursive. "recursive" field functions as a
feature switch so that this feature is off by default.

Some time after that, t_e_i() further gained wildcard support. With
wildcard matching, recursive and non-recursive diff-tree
mattered. "recursive" field was reused to distinguish recursion in
diff-tree.

This choice has a side effect that by default wildcard matching is in
non-recursive mode, which is not true. All current call sites except
"diff-tree without -r" (grep, traverse_commit_list, tree-walk and
general tree diff) prefer recursive mode.

This patch decouples the use of recursive field. The max_depth feature
switch is now controlled by max_depth_valid field. diff-tree recursion
is controlled by onelevel_only, which makes it recursive by default.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 The ugly name "nonrecursive_diff_tree" is changed to "onelevel_only".

 builtin/grep.c           |    2 +-
 cache.h                  |    3 ++-
 dir.c                    |    4 ++--
 t/t4010-diff-pathspec.sh |    8 ++++++++
 tree-diff.c              |    3 +--
 tree-walk.c              |    8 ++++----
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9ce064a..c081bf4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1051,7 +1051,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	paths = get_pathspec(prefix, argv + i);
 	init_pathspec(&pathspec, paths);
 	pathspec.max_depth = opt.max_depth;
-	pathspec.recursive = 1;
+	pathspec.max_depth_valid = 1;
 
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
diff --git a/cache.h b/cache.h
index 10afd71..0c4067d 100644
--- a/cache.h
+++ b/cache.h
@@ -526,7 +526,8 @@ struct pathspec {
 	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
 	int nr;
 	unsigned int has_wildcard:1;
-	unsigned int recursive:1;
+	unsigned int max_depth_valid:1;
+	unsigned int onelevel_only:1;
 	int max_depth;
 	struct pathspec_item {
 		const char *match;
diff --git a/dir.c b/dir.c
index 0a78d00..5af3567 100644
--- a/dir.c
+++ b/dir.c
@@ -258,7 +258,7 @@ int match_pathspec_depth(const struct pathspec *ps,
 	int i, retval = 0;
 
 	if (!ps->nr) {
-		if (!ps->recursive || ps->max_depth == -1)
+		if (!ps->max_depth_valid || ps->max_depth == -1)
 			return MATCHED_RECURSIVELY;
 
 		if (within_depth(name, namelen, 0, ps->max_depth))
@@ -275,7 +275,7 @@ int match_pathspec_depth(const struct pathspec *ps,
 		if (seen && seen[i] == MATCHED_EXACTLY)
 			continue;
 		how = match_pathspec_item(ps->items+i, prefix, name, namelen);
-		if (ps->recursive && ps->max_depth != -1 &&
+		if (ps->max_depth_valid && ps->max_depth != -1 &&
 		    how && how != MATCHED_FNMATCH) {
 			int len = ps->items[i].len;
 			if (name[len] == '/')
diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index fbc8cd8..af5134b 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -48,6 +48,14 @@ test_expect_success \
      compare_diff_raw current expected'
 
 cat >expected <<\EOF
+:100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	path1/file1
+EOF
+test_expect_success \
+    '"*file1" should show path1/file1' \
+    'git diff-index --cached $tree -- "*file1" >current &&
+     compare_diff_raw current expected'
+
+cat >expected <<\EOF
 :100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	file0
 EOF
 test_expect_success \
diff --git a/tree-diff.c b/tree-diff.c
index 28ad6db..fbc683c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -137,8 +137,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 	enum interesting t2_match = entry_not_interesting;
 
 	/* Enable recursion indefinitely */
-	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
-	opt->pathspec.max_depth = -1;
+	opt->pathspec.onelevel_only = !DIFF_OPT_TST(opt, RECURSIVE);
 
 	strbuf_init(&base, PATH_MAX);
 	strbuf_add(&base, base_str, baselen);
diff --git a/tree-walk.c b/tree-walk.c
index 492c7cd..fdecacc 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -588,7 +588,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 		entry_not_interesting : all_entries_not_interesting;
 
 	if (!ps->nr) {
-		if (!ps->recursive || ps->max_depth == -1)
+		if (!ps->max_depth_valid || ps->max_depth == -1)
 			return all_entries_interesting;
 		return within_depth(base->buf + base_offset, baselen,
 				    !!S_ISDIR(entry->mode),
@@ -609,7 +609,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 			if (!match_dir_prefix(base_str, match, matchlen))
 				goto match_wildcards;
 
-			if (!ps->recursive || ps->max_depth == -1)
+			if (!ps->max_depth_valid || ps->max_depth == -1)
 				return all_entries_interesting;
 
 			return within_depth(base_str + matchlen + 1,
@@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
 				 * Match all directories. We'll try to
 				 * match files later on.
 				 */
-				if (ps->recursive && S_ISDIR(entry->mode))
+				if (!ps->onelevel_only && S_ISDIR(entry->mode))
 					return entry_interesting;
 			}
 
@@ -665,7 +665,7 @@ match_wildcards:
 		 * in future, see
 		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
 		 */
-		if (ps->recursive && S_ISDIR(entry->mode))
+		if (!ps->onelevel_only && S_ISDIR(entry->mode))
 			return entry_interesting;
 	}
 	return never_interesting; /* No matches */
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards
  2012-01-14  9:23                   ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Nguyễn Thái Ngọc Duy
  2012-01-14  9:23                     ` [PATCH v2 2/2] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
@ 2012-01-15  2:38                     ` Junio C Hamano
  2012-01-15  9:48                       ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2012-01-15  2:38 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Jonathan Nieder, Linus Torvalds

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> It's actually unlimited recursion if wildcards are active regardless
> --max-depth
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Regarding Junio's question earlier:
>
>  >  - Shouldn't "onelevel_only" be the same as limiting to a single depth
>  >   with "max_depth"?
>
>  Doing that would change the behavior of "git grep --max-depth=0 -- 'a*'"
>  from unlimited recursion currently to limited. We did not come to agree
>  how --max-depth should behave with wildcards last time it was discussed,
>  so it's best separating two flags (in the next patch) for now.

Ok, I 100% agree with the "at least for now" reasoning. Thanks for digging
into the archive.

>  Documentation/git-grep.txt |    3 +++
>  tree-walk.c                |    3 +++
>  2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 15d6711..6a8b1e3 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -79,6 +79,9 @@ OPTIONS
>  --max-depth <depth>::
>  	For each <pathspec> given on command line, descend at most <depth>
>  	levels of directories. A negative value means no limit.
> +	This option is ignored if <pathspec> contains active wildcards.
> +	In other words if "a*" matches a directory named "a*",
> +	"*" is matched literally so --max-depth is still effective.

Do we have a definition of "active wildcard"?

> diff --git a/tree-walk.c b/tree-walk.c
> index f82dba6..492c7cd 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -661,6 +661,9 @@ match_wildcards:
>  		/*
>  		 * Match all directories. We'll try to match files
>  		 * later on.
> +		 * max_depth is ignored but we may consider support it
> +		 * in future, see
> +		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
>  		 */
>  		if (ps->recursive && S_ISDIR(entry->mode))
>  			return entry_interesting;

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] tree_entry_interesting: make recursive mode default
  2012-01-14  9:23                     ` [PATCH v2 2/2] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
@ 2012-01-15  3:12                       ` Junio C Hamano
  2012-01-15 10:03                         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2012-01-15  3:12 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Jonathan Nieder, Linus Torvalds

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This patch decouples the use of recursive field. The max_depth feature
> switch is now controlled by max_depth_valid field. diff-tree recursion
> is controlled by onelevel_only, which makes it recursive by default.

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 9ce064a..c081bf4 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -1051,7 +1051,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	paths = get_pathspec(prefix, argv + i);
>  	init_pathspec(&pathspec, paths);
>  	pathspec.max_depth = opt.max_depth;
> -	pathspec.recursive = 1;
> +	pathspec.max_depth_valid = 1;

We initialize opt.max_depth to "-1" (unlimited) and then let it be
overridden by the command line, and we set it to pathspec, so max_depth is
always valid, even when it is "-1", and is to be honoured.

> diff --git a/dir.c b/dir.c
> index 0a78d00..5af3567 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -258,7 +258,7 @@ int match_pathspec_depth(const struct pathspec *ps,
>  	int i, retval = 0;
>  
>  	if (!ps->nr) {
> -		if (!ps->recursive || ps->max_depth == -1)
> +		if (!ps->max_depth_valid || ps->max_depth == -1)
>  			return MATCHED_RECURSIVELY;
>  		if (within_depth(name, namelen, 0, ps->max_depth))

When there is no pathspec given, if we do not have a valid depth limiter,
or a valid depth limiter says "-1" (unlimited), it is always a match.
Otherwise we have check the depth. Looks correct.


> @@ -275,7 +275,7 @@ int match_pathspec_depth(const struct pathspec *ps,
>  		if (seen && seen[i] == MATCHED_EXACTLY)
>  			continue;
>  		how = match_pathspec_item(ps->items+i, prefix, name, namelen);
> -		if (ps->recursive && ps->max_depth != -1 &&
> +		if (ps->max_depth_valid && ps->max_depth != -1 &&
>  		    how && how != MATCHED_FNMATCH) {
>  			int len = ps->items[i].len;
>  			if (name[len] == '/')

Likewise. When there is a max_depth defined from the caller, and we did
not get the desired match, we check if we can go deeper to retry the
match. Looks correct.

These assume that tree-diff (the sole user of onelevel_only) never calls
into this function and ask to limit the recursion, though. Is it a good
thing for the longer-term code health?

In any case, both of the above seem to work without max_depth_valid.  If
the caller does not want to use depth limiting, it can set max_depth to
"-1", and all the code after this patch that check ps->max_depth_valid can
pretend that max_depth_valid is set to 1, no?

> diff --git a/tree-diff.c b/tree-diff.c
> index 28ad6db..fbc683c 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -137,8 +137,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
>  	enum interesting t2_match = entry_not_interesting;
>  
>  	/* Enable recursion indefinitely */
> -	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
> -	opt->pathspec.max_depth = -1;
> +	opt->pathspec.onelevel_only = !DIFF_OPT_TST(opt, RECURSIVE);

The comment "Enable recursion indefinitely" seems stale (not a problem
with this change).

> diff --git a/tree-walk.c b/tree-walk.c
> index 492c7cd..fdecacc 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -588,7 +588,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
>  		entry_not_interesting : all_entries_not_interesting;
>  
>  	if (!ps->nr) {
> -		if (!ps->recursive || ps->max_depth == -1)
> +		if (!ps->max_depth_valid || ps->max_depth == -1)
>  			return all_entries_interesting;
>  		return within_depth(base->buf + base_offset, baselen,
>  				    !!S_ISDIR(entry->mode),

When there is no pathspec given, if we do not have a valid depth limiter
(i.e. caller is diff-tree), or a valid depth limiter says "-1" (unlimited), 
everything in this tree is interesting. If we have depth limit, we need to
check it.

But how would onelevel_only option interact with this codepath? We used to
have recursive == false and max_depth == -1 in a non-recursive diff-tree,
so the old code would have returned all_entries_interesting. Now we rely
on max_depth_valid being invalid. Again, wouldn't this work without
max_depth_valid if max_depth is set to "-1" in diff-tree?

> @@ -609,7 +609,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
>  			if (!match_dir_prefix(base_str, match, matchlen))
>  				goto match_wildcards;
>  
> -			if (!ps->recursive || ps->max_depth == -1)
> +			if (!ps->max_depth_valid || ps->max_depth == -1)
>  				return all_entries_interesting;
>  			return within_depth(base_str + matchlen + 1,

Likewise. Everything is interesting in a matched entry, when not
depth-limited. Otherwise we would need to check the depth. Looks correct.

Again, how would onelevel_only option interact with this part?

> @@ -634,7 +634,7 @@ enum interesting tree_entry_interesting(const struct name_entry *entry,
>  				 * Match all directories. We'll try to
>  				 * match files later on.
>  				 */
> -				if (ps->recursive && S_ISDIR(entry->mode))
> +				if (!ps->onelevel_only && S_ISDIR(entry->mode))
>  					return entry_interesting;
>  			}
>  
> @@ -665,7 +665,7 @@ match_wildcards:
>  		 * in future, see
>  		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
>  		 */
> -		if (ps->recursive && S_ISDIR(entry->mode))
> +		if (!ps->onelevel_only && S_ISDIR(entry->mode))
>  			return entry_interesting;
>  	}
>  	return never_interesting; /* No matches */

Before we had recursive and max_depth. Now you have three instead of two.
The only thing we are trying to say with these three is if we want to
allow unlimited recursion, no recursion or recursion limited to a certain
depth. An integer option ought to work, and various codepaths that used to
look at the old two variables are converted to look at only just a few of
the new three variables, and never all three of them.

That makes my head hurt and makes me suspect there is something
fundamentally wrong in the patch.  Sigh...

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards
  2012-01-15  2:38                     ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Junio C Hamano
@ 2012-01-15  9:48                       ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-15  9:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Nieder, Linus Torvalds

2012/1/15 Junio C Hamano <gitster@pobox.com>:
>> @@ -79,6 +79,9 @@ OPTIONS
>>  --max-depth <depth>::
>>       For each <pathspec> given on command line, descend at most <depth>
>>       levels of directories. A negative value means no limit.
>> +     This option is ignored if <pathspec> contains active wildcards.
>> +     In other words if "a*" matches a directory named "a*",
>> +     "*" is matched literally so --max-depth is still effective.
>
> Do we have a definition of "active wildcard"?

Probably not. I did not know how to phrase it and ended up with
"active wildcard" and an example for clarification.
-- 
Duy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] tree_entry_interesting: make recursive mode default
  2012-01-15  3:12                       ` Junio C Hamano
@ 2012-01-15 10:03                         ` Nguyen Thai Ngoc Duy
  2012-01-16 22:15                           ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-15 10:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Nieder, Linus Torvalds

On Sat, Jan 14, 2012 at 07:12:03PM -0800, Junio C Hamano wrote:
> That makes my head hurt and makes me suspect there is something
> fundamentally wrong in the patch.  Sigh...

I'll need to think about it. In the meantime perhaps the following
bandage patch would suffice, rather than revert 2f88c19 (diff-index:
pass pathspec down to unpack-trees machinery)

-- 8< --
Subject: [PATCH] diff-index: enable recursive pathspec matching in unpack_trees

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff-lib.c               |    2 ++
 t/t4010-diff-pathspec.sh |    8 ++++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 62f4cd9..fc0dff3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -469,6 +469,8 @@ static int diff_cache(struct rev_info *revs,
 	opts.src_index = &the_index;
 	opts.dst_index = NULL;
 	opts.pathspec = &revs->diffopt.pathspec;
+	opts.pathspec->recursive = 1;
+	opts.pathspec->max_depth = -1;
 
 	init_tree_desc(&t, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index fbc8cd8..af5134b 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -48,6 +48,14 @@ test_expect_success \
      compare_diff_raw current expected'
 
 cat >expected <<\EOF
+:100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	path1/file1
+EOF
+test_expect_success \
+    '"*file1" should show path1/file1' \
+    'git diff-index --cached $tree -- "*file1" >current &&
+     compare_diff_raw current expected'
+
+cat >expected <<\EOF
 :100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	file0
 EOF
 test_expect_success \
-- 
1.7.8.36.g69ee2

-- 8< --
-- 
Duy

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] tree_entry_interesting: make recursive mode default
  2012-01-15 10:03                         ` Nguyen Thai Ngoc Duy
@ 2012-01-16 22:15                           ` Junio C Hamano
  2012-01-18  8:59                             ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2012-01-16 22:15 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Jonathan Nieder, Linus Torvalds

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Sat, Jan 14, 2012 at 07:12:03PM -0800, Junio C Hamano wrote:
>> That makes my head hurt and makes me suspect there is something
>> fundamentally wrong in the patch.  Sigh...
>
> I'll need to think about it. In the meantime perhaps the following
> bandage patch would suffice, rather than revert 2f88c19 (diff-index:
> pass pathspec down to unpack-trees machinery)

Yeah, the logic of this correction is very clear. Because diff_cache is
about walking a flat index, the "recursive pathspec" that allows us to
look into deeper levels in directory hierarchy should be set, and also we
should not be limiting the depth of the match in any way by setting the
max_depth to "unlimited".

Thanks.

> -- 8< --
> Subject: [PATCH] diff-index: enable recursive pathspec matching in unpack_trees
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  diff-lib.c               |    2 ++
>  t/t4010-diff-pathspec.sh |    8 ++++++++
>  2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index 62f4cd9..fc0dff3 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -469,6 +469,8 @@ static int diff_cache(struct rev_info *revs,
>  	opts.src_index = &the_index;
>  	opts.dst_index = NULL;
>  	opts.pathspec = &revs->diffopt.pathspec;
> +	opts.pathspec->recursive = 1;
> +	opts.pathspec->max_depth = -1;
>  
>  	init_tree_desc(&t, tree->buffer, tree->size);
>  	return unpack_trees(1, &t, &opts);
> diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
> index fbc8cd8..af5134b 100755
> --- a/t/t4010-diff-pathspec.sh
> +++ b/t/t4010-diff-pathspec.sh
> @@ -48,6 +48,14 @@ test_expect_success \
>       compare_diff_raw current expected'
>  
>  cat >expected <<\EOF
> +:100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	path1/file1
> +EOF
> +test_expect_success \
> +    '"*file1" should show path1/file1' \
> +    'git diff-index --cached $tree -- "*file1" >current &&
> +     compare_diff_raw current expected'
> +
> +cat >expected <<\EOF
>  :100644 100644 766498d93a4b06057a8e49d23f4068f1170ff38f 0a41e115ab61be0328a19b29f18cdcb49338d516 M	file0
>  EOF
>  test_expect_success \
> -- 
> 1.7.8.36.g69ee2
>
> -- 8< --

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] tree_entry_interesting: make recursive mode default
  2012-01-16 22:15                           ` Junio C Hamano
@ 2012-01-18  8:59                             ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 56+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-01-18  8:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Nieder, Linus Torvalds

On Mon, Jan 16, 2012 at 02:15:59PM -0800, Junio C Hamano wrote:
> Yeah, the logic of this correction is very clear. Because diff_cache is
> about walking a flat index, the "recursive pathspec" that allows us to
> look into deeper levels in directory hierarchy should be set, and also we
> should not be limiting the depth of the match in any way by setting the
> max_depth to "unlimited".

For the record, enabling wildcard in git-log is just as simple. But I
guess you don't want more changes this late in rc cycles.

-- 8< --
diff --git a/revision.c b/revision.c
index 064e351..c426271 100644
--- a/revision.c
+++ b/revision.c
@@ -1816,6 +1816,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 
 	if (revs->prune_data.nr) {
 		diff_tree_setup_paths(revs->prune_data.raw, &revs->pruning);
+		revs->pruning.pathspec.recursive = 1;
+		revs->pruning.pathspec.max_depth = -1;
 		/* Can't prune commits with rename following: the paths change.. */
 		if (!DIFF_OPT_TST(&revs->diffopt, FOLLOW_RENAMES))
 			revs->prune = 1;
-- 8< --

--
Duy

^ permalink raw reply related	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2012-01-18  9:00 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-23  7:25 What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`? Marat Radchenko
2011-08-23 10:03 ` Michael J Gruber
2011-08-23 10:52   ` Marat Radchenko
2011-08-23 15:20     ` Michael Witten
2011-08-23 15:34     ` Michael J Gruber
2011-08-23 16:45       ` Marat Radchenko
2011-08-23 17:15       ` Junio C Hamano
2011-08-23 18:21         ` Marat Radchenko
2011-08-23 20:07         ` Michael J Gruber
2011-08-25 16:09           ` Marat Radchenko
2011-08-25 21:10           ` Junio C Hamano
2011-08-26  9:43             ` Marat Radchenko
2011-08-29  7:41 ` Nguyen Thai Ngoc Duy
2011-08-29 14:48   ` Marat Radchenko
2011-08-29 16:09     ` Nguyen Thai Ngoc Duy
2011-08-29 17:18       ` Junio C Hamano
2011-08-29 20:42         ` Junio C Hamano
2011-08-29 20:50           ` Junio C Hamano
2011-08-29 21:09           ` Junio C Hamano
2011-08-29 21:33           ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Junio C Hamano
2011-08-29 21:33             ` [PATCH 1/3] traverse_trees(): allow pruning with pathspec Junio C Hamano
2011-08-30 12:53               ` Nguyen Thai Ngoc Duy
2011-08-30 17:44                 ` Junio C Hamano
2011-08-31  1:35                   ` Nguyen Thai Ngoc Duy
2011-10-09 15:39               ` Michael Haggerty
2011-10-09 21:35                 ` Nguyen Thai Ngoc Duy
2011-10-10  4:42                   ` Michael Haggerty
2011-08-29 21:33             ` [PATCH 2/3] unpack-trees: " Junio C Hamano
2011-08-30 13:03               ` Nguyen Thai Ngoc Duy
2011-08-30 17:32                 ` Junio C Hamano
2011-08-30 15:24               ` David Michael Barr
2011-08-29 21:33             ` [PATCH 3/3] diff-index: pass pathspec down to unpack-trees machinery Junio C Hamano
2012-01-11  6:31               ` Jonathan Nieder
2012-01-11  8:05                 ` Junio C Hamano
2012-01-11 12:33                 ` Nguyen Thai Ngoc Duy
2012-01-11 12:47                   ` Nguyen Thai Ngoc Duy
2012-01-11 20:40                   ` Junio C Hamano
2012-01-12  4:09                 ` [PATCH] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
2012-01-12  5:04                   ` Junio C Hamano
2012-01-12  5:44                     ` Nguyen Thai Ngoc Duy
2012-01-14  9:23                   ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Nguyễn Thái Ngọc Duy
2012-01-14  9:23                     ` [PATCH v2 2/2] tree_entry_interesting: make recursive mode default Nguyễn Thái Ngọc Duy
2012-01-15  3:12                       ` Junio C Hamano
2012-01-15 10:03                         ` Nguyen Thai Ngoc Duy
2012-01-16 22:15                           ` Junio C Hamano
2012-01-18  8:59                             ` Nguyen Thai Ngoc Duy
2012-01-15  2:38                     ` [PATCH v2 1/2] Document limited recursion pathspec matching with wildcards Junio C Hamano
2012-01-15  9:48                       ` Nguyen Thai Ngoc Duy
2011-08-29 21:56             ` [PATCH 0/3] Un-pessimize "diff-index $commit -- $pathspec" Linus Torvalds
2011-08-29 22:05               ` Junio C Hamano
2011-08-29 22:11                 ` Linus Torvalds
2011-08-29 23:42                   ` Junio C Hamano
2011-08-30  6:16                     ` Marat Radchenko
2011-08-31  0:18                       ` Junio C Hamano
2011-08-30 10:04             ` Michael J Gruber
2011-08-30 17:03               ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.