All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] t4202 (log): add failing test for log with subtree
@ 2013-04-22 12:08 Ramkumar Ramachandra
  2013-04-22 12:11 ` Matthieu Moy
  2013-04-22 13:15 ` Thomas Rast
  0 siblings, 2 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 12:08 UTC (permalink / raw)
  To: Git List; +Cc: Avery Pennarun, Junio C Hamano

When using 'git subtree' to add an external project at a given prefix,
log of a pathspec within the prefix fails to give the expected output.
This failure can be interpreted as log expecting all trees to be read
into /.  Document this bug by adding a failing test.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 I normally don't use git-subtree, but happened to notice this when I
 was checking it out: a 'git log <pathspec>', when referring to a file
 inside the subtree, doesn't work as expected: it only displays the
 HEAD commit.  I know this is not related to git-subtree at all, and
 has to do with how 'git log' (and probably 'git merge -Xsubtree',
 'git merge -s subtree') works.  I suspect that 'git log' expects all
 trees to be read into /, and I've tried to prove this with a failing
 test.

 I think this is a bug, but I might be missing something.  Can someone
 tell me what is actually happening?

 t/t4202-log.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 9243a97..523c1be 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -168,6 +168,37 @@ test_expect_success 'git log --follow' '
 
 '
 
+test_expect_failure 'log pathspec in tree read into prefix' '
+	git checkout --orphan subtree &&
+	git rm -rf . &&
+	echo foodle >ichi &&
+	git add ichi &&
+	test_tick &&
+	git commit -m foom &&
+	echo moodle >unrelated &&
+	git add unrelated &&
+	test_tick &&
+	git commit -m quux &&
+	subtree_h=$(git rev-parse HEAD) &&
+	git checkout master &&
+	orig_h=$(git rev-parse HEAD) &&
+	git read-tree --prefix=bar $subtree_h &&
+	new_t=$(git write-tree) &&
+	new_h=$(echo "new subtree" |
+	git commit-tree $new_t -p $orig_h -p $subtree_h) &&
+	git reset --hard $new_h &&
+	(
+		cd bar &&
+		git log --oneline ichi >../actual
+	) &&
+	cat >expect <<-\EOF &&
+	61dcd8e new subtree
+	130a8fb foom
+	EOF
+	git reset --hard HEAD~1 &&
+	test_cmp expect actual
+'
+
 cat > expect << EOF
 804a787 sixth
 394ef78 fifth
-- 
1.8.2.1.546.gea3475a

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:08 [PATCH] t4202 (log): add failing test for log with subtree Ramkumar Ramachandra
@ 2013-04-22 12:11 ` Matthieu Moy
  2013-04-22 12:17   ` Thomas Rast
  2013-04-22 12:29   ` Ramkumar Ramachandra
  2013-04-22 13:15 ` Thomas Rast
  1 sibling, 2 replies; 34+ messages in thread
From: Matthieu Moy @ 2013-04-22 12:11 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra <artagnon@gmail.com> writes:

>  was checking it out: a 'git log <pathspec>', when referring to a file
>  inside the subtree, doesn't work as expected: it only displays the
>  HEAD commit.

This is somehow expected: the subtree merge changed the filename during
merge (it is subtree/file.txt after the merge, and just file.txt
before), so "git log" without --follow just considers the file appeared.

OTOH, I think this is a known limitation of "git log --follow" that it
does not follow renames done by subtree merges.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:11 ` Matthieu Moy
@ 2013-04-22 12:17   ` Thomas Rast
  2013-04-22 12:37     ` Ramkumar Ramachandra
  2013-04-22 12:29   ` Ramkumar Ramachandra
  1 sibling, 1 reply; 34+ messages in thread
From: Thomas Rast @ 2013-04-22 12:17 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Ramkumar Ramachandra, Git List, Avery Pennarun, Junio C Hamano

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Ramkumar Ramachandra <artagnon@gmail.com> writes:
>
>>  was checking it out: a 'git log <pathspec>', when referring to a file
>>  inside the subtree, doesn't work as expected: it only displays the
>>  HEAD commit.
>
> This is somehow expected: the subtree merge changed the filename during
> merge (it is subtree/file.txt after the merge, and just file.txt
> before), so "git log" without --follow just considers the file appeared.
>
> OTOH, I think this is a known limitation of "git log --follow" that it
> does not follow renames done by subtree merges.

Umm, it should follow the rename.  The big limitation is that it is
unable to follow more than one name at a time, so if the file exists on
both sides of the subtree merge, it will follow the original name.  And
that's the common case; only the very first merge of the subtree has the
files only on one side.

You can see this by comparing

  git log --oneline --follow gitk-git/gitk

with

  git log --oneline -- gitk gitk-git/gitk

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:11 ` Matthieu Moy
  2013-04-22 12:17   ` Thomas Rast
@ 2013-04-22 12:29   ` Ramkumar Ramachandra
  2013-04-22 12:36     ` Matthieu Moy
  1 sibling, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 12:29 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Git List, Avery Pennarun, Junio C Hamano

Matthieu Moy wrote:
> This is somehow expected: the subtree merge changed the filename during
> merge (it is subtree/file.txt after the merge, and just file.txt
> before), so "git log" without --follow just considers the file appeared.

No, a merge does not "change" any filenames.  The history of the file
is very much present: run a git log HEAD^2 to see the entire history
of the subtree.  Even a git blame (without -M or -C) works just fine.

> OTOH, I think this is a known limitation of "git log --follow" that it
> does not follow renames done by subtree merges.

Um, no.  I think --follow is entirely orthogonal to the issue: unless
I'm mistaken, it looks for other blobs in history with heuristically
similar content.

The real issue has nothing to do with log itself: it has to do with
how rev-parse handles pathspecs.  A 'git rev-parse
HEAD:subproject/README' works fine, but 'git rev-parse
HEAD^2:subproject/README' fails.  However, 'git rev-parse
HEAD^2:README' works, but it is assuming that the path README is
present in /, when it is actually present in subproject/.  Now, I'm
not sure rev-parse is doing something unexpected, which is why I filed
the bug in log.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:29   ` Ramkumar Ramachandra
@ 2013-04-22 12:36     ` Matthieu Moy
  0 siblings, 0 replies; 34+ messages in thread
From: Matthieu Moy @ 2013-04-22 12:36 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Matthieu Moy wrote:
>> This is somehow expected: the subtree merge changed the filename during
>> merge (it is subtree/file.txt after the merge, and just file.txt
>> before), so "git log" without --follow just considers the file appeared.
>
> No, a merge does not "change" any filenames.

Read again my message, especially the "it is subtree/file.txt after the
merge, and just file.txt before" part. Replace "subtree" with "bar" and
"file.txt" with "ichi".

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:17   ` Thomas Rast
@ 2013-04-22 12:37     ` Ramkumar Ramachandra
  2013-04-22 12:54       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 12:37 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Matthieu Moy, Git List, Avery Pennarun, Junio C Hamano

Thomas Rast wrote:
> Umm, it should follow the rename.

There is no rename.  Unless there is a commit with the following diff
(with heuristically similar content), I don't see how --follow can
work:

    diff --git a/README b/README
    deleted file mode 100644

    diff --git a/subproject/README b/subproject/README
    new file mode 100644

Here, we just created one tiny merge commit after reading a tree into
a non-/ prefix.  There is no diff associated.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:37     ` Ramkumar Ramachandra
@ 2013-04-22 12:54       ` Ramkumar Ramachandra
  0 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 12:54 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Matthieu Moy, Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra wrote:
> Unless there is a commit with the following diff
> (with heuristically similar content), I don't see how --follow can
> work:
>
>     diff --git a/README b/README
>     deleted file mode 100644
>
>     diff --git a/subproject/README b/subproject/README
>     new file mode 100644

In other words, git diff-tree HEAD^2 HEAD is absolute nonsense in the
subtree case.

Let me outline how I think --follow works:
A 'git log --follow foo' executes a diff-tree with historical trees
until it finds one that removes (or adds, depending on which way you
look at it) the 'foo' entry.  It then inspects all the trees in the
corresponding commit for a blob that is heuristically similar to the
HEAD:foo blob, and follows it.

How can you logically extend this concept to handle my case?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 12:08 [PATCH] t4202 (log): add failing test for log with subtree Ramkumar Ramachandra
  2013-04-22 12:11 ` Matthieu Moy
@ 2013-04-22 13:15 ` Thomas Rast
  2013-04-22 13:19   ` Thomas Rast
  2013-04-22 14:30   ` Ramkumar Ramachandra
  1 sibling, 2 replies; 34+ messages in thread
From: Thomas Rast @ 2013-04-22 13:15 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra <artagnon@gmail.com> writes:
>

So after a long IRC discussion trying to hash out what you *want* it to
do, here's the summary for everyone else.  This test is wrong on several
counts.  For simplicity I'll denote by M the subtree merge, called
$new_h in the actual test.

> +test_expect_failure 'log pathspec in tree read into prefix' '
> +	git checkout --orphan subtree &&
> +	git rm -rf . &&
> +	echo foodle >ichi &&

'ichi' also exists in M^1 because you reused a name from another test.
So rename detection will never pair the eventual 'bar/ichi' with this
'ichi', because the 'ichi' path was *modified*, not deleted, w.r.t. M^1.

Just to clarify, rename detection is based on seeing

  A foo
  D bar

and changing that to

  R bar -> foo

assuming the blobs were reasonably similar.  And indeed, *copy*
detection (-C) is able to figure it out, because it considers all paths
that were modified as possible candidates for a copy source.

But --follow only uses rename detection.

> +	git add ichi &&
> +	test_tick &&
> +	git commit -m foom &&
> +	echo moodle >unrelated &&
> +	git add unrelated &&
> +	test_tick &&
> +	git commit -m quux &&
> +	subtree_h=$(git rev-parse HEAD) &&
> +	git checkout master &&
> +	orig_h=$(git rev-parse HEAD) &&
> +	git read-tree --prefix=bar $subtree_h &&

You need to supply a trailing / for the prefix, it's not implied.

> +	new_t=$(git write-tree) &&
> +	new_h=$(echo "new subtree" |
> +	git commit-tree $new_t -p $orig_h -p $subtree_h) &&
> +	git reset --hard $new_h &&
> +	(
> +		cd bar &&
> +		git log --oneline ichi >../actual

You need to use --follow, as otherwise the pathspec filtering is
considered fixed.

Note that as discussed in the rest of the thread, --follow is fairly
limited and doesn't *really* solve the problem.  But it does work
assuming the filenames are different and you only have one subtree
merge, like in this case.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 13:15 ` Thomas Rast
@ 2013-04-22 13:19   ` Thomas Rast
  2013-04-22 14:30   ` Ramkumar Ramachandra
  1 sibling, 0 replies; 34+ messages in thread
From: Thomas Rast @ 2013-04-22 13:19 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Thomas Rast <trast@inf.ethz.ch> writes:

>> +test_expect_failure 'log pathspec in tree read into prefix' '
>> +	git checkout --orphan subtree &&
>> +	git rm -rf . &&
>> +	echo foodle >ichi &&
>
> 'ichi' also exists in M^1 because you reused a name from another test.
> So rename detection will never pair the eventual 'bar/ichi' with this
> 'ichi', because the 'ichi' path was *modified*, not deleted, w.r.t. M^1.

Argh, that should read 'w.r.t. M^2', i.e. the subtree side.

The subtree side brings its own 'ichi', but it is moved to bar/ichi, so
there is a large difference between M:ichi (which came from M^1) and
M^2:ichi.


PS: As mentioned on IRC, even if you fix all that, a one-line file is
probably too small to pass the rename detection heuristics.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 13:15 ` Thomas Rast
  2013-04-22 13:19   ` Thomas Rast
@ 2013-04-22 14:30   ` Ramkumar Ramachandra
  2013-04-22 14:57     ` Ramkumar Ramachandra
  2013-04-22 15:24     ` Thomas Rast
  1 sibling, 2 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 14:30 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Thomas Rast wrote:
> [...]

I think you've misunderstood the whole thing.  The histories of M^1
and M^2 are completely unrelated: they're from different projects
altogether.  Considering the /ichi in M^2 a "rename" of the /ichi in
M^1 is completely wrong.  They have nothing to do with each other.  I
intentionally named it "ichi" in my orphan branch just to drive my
point.  I suspect you've got confused because I used an orphan branch
to emulate a different project's history.  If you want an end-user
understanding of the problem, use git subtree:

    $ cd /tmp
    $ git clone gh:artagnon/varlog
    $ cd varlog
    $ git subtree add --prefix=clayoven \
       gh:artagnon/clayoven master
    $ cd clayoven
    $ git log README.md

What do you expect?  The same output you would get if you cloned
gh:artagnon/clayoven separately and executed 'git log README.md' on
it.

Now, clayoven's README.md (the one in HEAD^2) has nothing to do with
varlog's README.md (the one in HEAD^1).  It's just incidental that
both projects have a README.md.  I repeat: clayoven and varlog have
_nothing_ to do with each other.  If I say git log --follow README.md
in the above example, I don't even get the HEAD commit.  And I
wouldn't expect to either.

I will repeat this: --follow has nothing to do with the problem I've
specified.  And it is not tied to "renaming" (ie. changing the
name/path of a file) as you've made it look.  If you're still not
convinced, I've included a testcase for --follow "following" over a
merge commit (include it after the --follow test in the t4202).  Try
it without the --follow and you'll see what I mean.  Neither the
filename nor the filepath of ichi has changed in this example.

-- 8< --
test_expect_success '--follow over merge' '
	git checkout -b featurebranch
	echo foodle >>ichi &&
	git add ichi &&
	test_tick &&
	git commit -m "add a line to the end of ichi" &&
	echo moodle >unrelated &&
	git add unrelated &&
	test_tick &&
	git commit -m quux &&
	git checkout master &&
	mv ichi ichi.bak &&
	echo gooble >ichi &&
	cat ichi.bak >>ichi &&
	git add ichi &&
	test_tick &&
	git commit -m "add a line to the beginning of ichi" &&
	git merge featurebranch &&
	git log --follow --oneline ichi >actual &&
	cat >expect <<-\EOF &&
	df26551 add a line to the beginning of ichi
	882d8d9 add a line to the end of ichi
	2fbe8c0 third
	f7dab8e second
	3a2fdcb initial
	EOF
	test_cmp expect actual
'

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 14:30   ` Ramkumar Ramachandra
@ 2013-04-22 14:57     ` Ramkumar Ramachandra
  2013-04-22 15:24     ` Thomas Rast
  1 sibling, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 14:57 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra wrote:
>     $ git log README.md
>
> What do you expect?  The same output you would get if you cloned
> gh:artagnon/clayoven separately and executed 'git log README.md' on
> it.

And the reason I brought up rev-parse in the first place is because
this problem is not unique to log.  Try a 'git shortlog README.md' if
you're not convinced.  The entire revision-walking-pathspec-filtering
mechanism in the log-like-family is broken on trees that are read
into a non-/ prefix.

And no, it doesn't have anything to do with renames or "rename
detection".  If it did, 'git shortlog README.md' should atleast give
me the commit that was responsible for the rename.  And if you're
still wondering about --follow, 'git shortlog --follow' (undocumented)
is completely broken.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 14:30   ` Ramkumar Ramachandra
  2013-04-22 14:57     ` Ramkumar Ramachandra
@ 2013-04-22 15:24     ` Thomas Rast
  2013-04-22 15:46       ` Ramkumar Ramachandra
                         ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Thomas Rast @ 2013-04-22 15:24 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Thomas Rast wrote:
>> [...]
>
> I think you've misunderstood the whole thing.  The histories of M^1
> and M^2 are completely unrelated: they're from different projects
> altogether.  Considering the /ichi in M^2 a "rename" of the /ichi in
> M^1 is completely wrong.  They have nothing to do with each other.  I
> intentionally named it "ichi" in my orphan branch just to drive my
> point.  I suspect you've got confused because I used an orphan branch
> to emulate a different project's history.  If you want an end-user
> understanding of the problem, use git subtree:
>
>     $ cd /tmp
>     $ git clone gh:artagnon/varlog
>     $ cd varlog
>     $ git subtree add --prefix=clayoven \
>        gh:artagnon/clayoven master
>     $ cd clayoven
>     $ git log README.md
>
> What do you expect?  The same output you would get if you cloned
> gh:artagnon/clayoven separately and executed 'git log README.md' on
> it.

No, I don't.  But that's probably because I know a few things about how
git-log works that your hypothetical $USER doesn't.

At the risk of restating what everyone agrees on: It's a design
principle of git that it only stores tree states, and anything about
diffs, files, renames, etc. is purely in the imagination of the user.

We support that imagination by having analysis tools with which some
things can be found out, but others can't.

So (I think?) in the above you claim that $USER interprets

  git log -- README.md

as

  Show me the history of README.md.

But there's no such thing as the history of a file!  The command instead
says

  If I filter all history for only changes affecting a path 'README.md'
  in the root of the repository[1], then what does it look like?


So please don't write tests that go contrary to that definition, because
they're *wrong*.  The current implementation precisely matches the
current definition of pathspec filtering.

You can try arguing for changing the definition, but unless you find one
that can be implemented fast enough to be generally usable, I will
oppose that change.


The only thing that's broken in any of this is that I think, as
explained on IRC, that a hypothetical fixed --follow -C should be able
to figure out this case.  By spending extra cycles on analysis,
naturally.


[1]  and also skipping lines of history that seem uninteresting at this
point already, compare --simplify-merges

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 15:24     ` Thomas Rast
@ 2013-04-22 15:46       ` Ramkumar Ramachandra
  2013-04-22 15:50       ` Ramkumar Ramachandra
  2013-04-22 16:32       ` Junio C Hamano
  2 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 15:46 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Thomas Rast wrote:
> So (I think?) in the above you claim that $USER interprets
>
>   git log -- README.md
>
> as
>
>   Show me the history of README.md.
>
> But there's no such thing as the history of a file!

I made no such claims.  I might not know as much as you or the others
on the list about git, but I can certainly grok how git stores
history.

There needs to be some amount of mutual respect for a healthy
conversation: if you start assuming midway that I don't understand
what history is, we have a problem.

> So please don't write tests that go contrary to that definition, because
> they're *wrong*.  The current implementation precisely matches the
> current definition of pathspec filtering.

Who said anything about changing any definitions?  Where are you
getting all this from?

How does "git log HEAD~3 -- README" work?  It sets up a revision walk
to start from HEAD~3 going all the way down to the root commit.  In
each of these commits, it looks for the entry "README" in the
corresponding tree.  It then runs diff-tree with the previous commit's
tree to see if the object (blob) corresponding to the "README" entry
is different: if so, it selects the commit and displays it.

Now, what am I saying?  I'm saying that this approach assumes that all
trees are read into /.  A pathspec "subproject/README" is _only_
present in the subtree-merge commit^{tree} and nowhere else.  The
current log algorithm might try to look for the entry
"subproject/README" (your pathspec) in all the commit^{tree}s of the
commits leading up to M^2.  That is _not_ the problem, as I have
already illustrated that --follow follows over merges.  The problem is
looking for the pathspec "subproject/README" in the first place: those
commit^{trees}s have the entry stored as "README".

Am I making any sense, or are you going to accuse me of not
understanding trees now?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 15:24     ` Thomas Rast
  2013-04-22 15:46       ` Ramkumar Ramachandra
@ 2013-04-22 15:50       ` Ramkumar Ramachandra
  2013-04-22 15:54         ` Ramkumar Ramachandra
  2013-04-22 16:32       ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 15:50 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Thomas Rast wrote:
> The only thing that's broken in any of this is that I think, as
> explained on IRC, that a hypothetical fixed --follow -C should be able
> to figure out this case.  By spending extra cycles on analysis,
> naturally.

For the 100th time, nothing has been "copied".  There is no need to
spend time on any analysis.  It's a very straightforward problem that
requires no computation or heuristics: it just requires you to strip
the leading "subproject/" when looking for pathspecs in the M^2
commit^{tree}s.  Done.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 15:50       ` Ramkumar Ramachandra
@ 2013-04-22 15:54         ` Ramkumar Ramachandra
  2013-04-22 17:29           ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 15:54 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra wrote:
> For the 100th time, nothing has been "copied".  There is no need to
> spend time on any analysis.  It's a very straightforward problem that
> requires no computation or heuristics: it just requires you to strip
> the leading "subproject/" when looking for pathspecs in the M^2
> commit^{tree}s.  Done.

And if you're still not convinced, run 'git log HEAD^2 -- README.md'
from the toplevel directory.  You'll get the log of README.md from the
subproject.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 15:24     ` Thomas Rast
  2013-04-22 15:46       ` Ramkumar Ramachandra
  2013-04-22 15:50       ` Ramkumar Ramachandra
@ 2013-04-22 16:32       ` Junio C Hamano
  2013-04-22 18:00         ` Ramkumar Ramachandra
  2013-04-22 20:39         ` Ramkumar Ramachandra
  2 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2013-04-22 16:32 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Ramkumar Ramachandra, Git List, Avery Pennarun

Thomas Rast <trast@inf.ethz.ch> writes:

> So (I think?) in the above you claim that $USER interprets
>
>   git log -- README.md
>
> as
>
>   Show me the history of README.md.
>
> But there's no such thing as the history of a file!  The command instead
> says
>
>   If I filter all history for only changes affecting a path 'README.md'
>   in the root of the repository[1], then what does it look like?

Correct.  The "design principle" I did not quote from your message
comes from one of the most important messages in the list archive
($gmane/217).

Three issues with "--follow" are:

 (1) It uses the given pathname as single pathspec and drill down
     the same way without "--follow" until it notices the path
     disappears and until then there is no attempt to detect renames
     is made.  And it only does -M variant of rename detection

 (2) The "same way" in (1) includes the merge simplification to cull
     side branches if one parent matches the end result.  In a
     history where the first parent of a merge M, i.e. M^1, did not
     have path F, its second parent M^2 had path F, and the merge
     result M deposited the contents of M^2:F at M:D/F, then running
     "log --follow F" starting from M would notice that the end
     result M did not have F in it, which is shared with its first
     parent M^1, and culls the side branch M^2.  The --full-history
     option may let you keep the history leading to M^2, though.

 (3) When (1) notices that the path being followed did not exist in
     any of the parents (be it a merge or a non-merge) and finds a
     different path with a similar looking content, it _switches_
     the pathspec to it, but the single pathspec it uses is a global
     state and affects traversals of other ancestry paths at the
     same time.  Because of this, "--follow" will not work correctly
     in a history that contains merges.  It often _appears_ to work
     only by accident.

The first two are relatively minor (the second is not even an
issue).

To solve the last one, you would need to keep track of which path it
is following per traversal path.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 15:54         ` Ramkumar Ramachandra
@ 2013-04-22 17:29           ` Ramkumar Ramachandra
  2013-04-22 19:15             ` Thomas Rast
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 17:29 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra wrote:
> And if you're still not convinced, run 'git log HEAD^2 -- README.md'
> from the toplevel directory.  You'll get the log of README.md from the
> subproject.

On IRC, Thomas explained to me that mixing in changes from various
branches into the pathspec will break this so-called determinism.  To
try it out for yourself, do:

    $ cd /tmp
    $ git clone gh:trast/subtree-mainline-example
    $ cd subtree-mainline-example
    $ git log HEAD^2 -- sub # only lists the side changes
    $ git log -- dir/sub # only lists the mainline changes

What we should really expect is a mix of the two.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 16:32       ` Junio C Hamano
@ 2013-04-22 18:00         ` Ramkumar Ramachandra
  2013-04-22 18:18           ` Matthieu Moy
  2013-04-22 19:09           ` Junio C Hamano
  2013-04-22 20:39         ` Ramkumar Ramachandra
  1 sibling, 2 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Rast, Git List, Avery Pennarun

Junio C Hamano wrote:
> [...]
>  (3) When (1) notices that the path being followed did not exist in
>      any of the parents (be it a merge or a non-merge) and finds a
>      different path with a similar looking content, it _switches_
>      the pathspec to it, but the single pathspec it uses is a global
>      state and affects traversals of other ancestry paths at the
>      same time.  Because of this, "--follow" will not work correctly
>      in a history that contains merges.  It often _appears_ to work
>      only by accident.

This explanation is all very nice, but isn't it completely tangential
to the issue at hand?

Let's say I have a subtree merge M located at HEAD~4.  I ask for the
log of 'HEAD~4 -- README' with --follow.  It follows until it gets to
M: at M, M^1:README is missing, but M^2:README is present.  Should it
follow down and show the history of M^2:README?

You can reserve the discussion about --follow working in the general
case for another thread.  Meanwhile, you're evading the issue of
assuming that all trees are read into /, and are really representing
the same project's history, while this is not the case.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 18:00         ` Ramkumar Ramachandra
@ 2013-04-22 18:18           ` Matthieu Moy
  2013-04-22 19:09           ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Matthieu Moy @ 2013-04-22 18:18 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Junio C Hamano, Thomas Rast, Git List, Avery Pennarun

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Meanwhile, you're evading the issue of assuming that all trees are
> read into /, and are really representing the same project's history,
> while this is not the case.

This is fundamenally how Git works. Git works with commit objects, each
commit object points to a tree object that represent "/" at this commit.

When you do a subtree merge, you include the tree that was in "/" in a
subtree of the master project. Files used to exist as /file, and now
exist as /subtree/file. There is nothing recording that the new
/subtree/file comes from /file in its second parent. Call this a
renaming or not, but "git log subtree/file" won't show you changes
touching "/file" by default, and this is the case for the history of the
subproject you are merging.

A subtree merge is really a rename of the subproject's file plus a
merge, done in the same commit. Try doing something like

mkdir -p subproject
git mv * subproject/
git commit

in your subproject before doing a merge, you'll get the same result,
except there will be one more commit.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 18:00         ` Ramkumar Ramachandra
  2013-04-22 18:18           ` Matthieu Moy
@ 2013-04-22 19:09           ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2013-04-22 19:09 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Thomas Rast, Git List, Avery Pennarun

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> [...]
>>  (3) When (1) notices that the path being followed did not exist in
>>      any of the parents (be it a merge or a non-merge) and finds a
>>      different path with a similar looking content, it _switches_
>>      the pathspec to it, but the single pathspec it uses is a global
>>      state and affects traversals of other ancestry paths at the
>>      same time.  Because of this, "--follow" will not work correctly
>>      in a history that contains merges.  It often _appears_ to work
>>      only by accident.
>
> This explanation is all very nice, but isn't it completely tangential
> to the issue at hand?

I think you are talking mostly about (2), but the primary purpose of
my message was not about your specific issue.

It was to give a larger picture to people who may be inclined to
tackle, and may be capable of tackling, the real issues in the
"--follow" codepath.  Anybody who wants to update it needs to be
aware of all three.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 17:29           ` Ramkumar Ramachandra
@ 2013-04-22 19:15             ` Thomas Rast
  2013-04-22 19:54               ` Ramkumar Ramachandra
  2013-04-22 21:06               ` Matthieu Moy
  0 siblings, 2 replies; 34+ messages in thread
From: Thomas Rast @ 2013-04-22 19:15 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Ramkumar Ramachandra wrote:
>> And if you're still not convinced, run 'git log HEAD^2 -- README.md'
>> from the toplevel directory.  You'll get the log of README.md from the
>> subproject.
>
> On IRC, Thomas explained to me that mixing in changes from various
> branches into the pathspec will break this so-called determinism.  To
> try it out for yourself, do:
>
>     $ cd /tmp
>     $ git clone gh:trast/subtree-mainline-example
>     $ cd subtree-mainline-example
>     $ git log HEAD^2 -- sub # only lists the side changes
>     $ git log -- dir/sub # only lists the mainline changes
>
> What we should really expect is a mix of the two.

So after lots of confusing misunderstandings across the entire thread,
and a long IRC discussion, I do have one take-away that I think is worth
writing down:

There is a market for a rename detection that works at the tree level.

Bear with me while I explain.

The average subtree merge (after the initial one) looks like this,
focusing on the subtree:

  M    new state in sub/
  | \
  |  * new state in /
  |
  * old state in sub/

(The example in the quote above additionally complicates the issue by
changing sub/ in the mainline.)

Ideally we'd like our hypothetical fixed --follow to accurately track a
pathspec 'sub/foo' so that in the mainline it remains the same, but in
the side it becomes 'foo'.  Because of reasons already explained in
earlier mails, -M does not suffice for all cases (in particular, it
fails if there is a /foo in the mainline too).  -C works, but is slow.

So how can we fix that?  We could try to somehow figure out that M:sub/
refers to the same thing as M^2:/, by looking at them at the tree level.
Let's provisionally call this --follow-tree-rename.

I don't quite know how to heuristically[1] detect such a rename, but
since 'merge -ssubtree' does it (if you don't specify a tree prefix
explicitly), it can't be That Hard(tm).  If we're extra lucky it's fast
enough to be enabled by default.

In the special case where the subtree was not modified in the mainline
since the last merge, the problem is pretty trivial: simply check if any
subtree of M agrees with the root tree of each merge parent; if so, diff
those trees instead of the root trees.

That should then help subtree users to get better logs.


[1]  the quoted example shows that you can't just go looking for
identical trees in the general case, so it is really a heuristic

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 19:15             ` Thomas Rast
@ 2013-04-22 19:54               ` Ramkumar Ramachandra
  2013-04-22 21:00                 ` Philip Oakley
  2013-04-22 21:06               ` Matthieu Moy
  1 sibling, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 19:54 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Avery Pennarun, Junio C Hamano

Thomas Rast wrote:
> There is a market for a rename detection that works at the tree level.

Exactly.  And making it auto-follow with a low threshold would be a
great default.  We'll have to deal with D/F conflicts that Junio was
talking about in (2), every once in a while.  We'll have a lot more
incentive to build the D/F conflict resolution process a nice UI.

git-core will actually start working with trees the way it works with blobs.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 16:32       ` Junio C Hamano
  2013-04-22 18:00         ` Ramkumar Ramachandra
@ 2013-04-22 20:39         ` Ramkumar Ramachandra
  1 sibling, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Rast, Git List, Avery Pennarun

Junio C Hamano wrote:
>  (1) It uses the given pathname as single pathspec and drill down
>      the same way without "--follow" until it notices the path
>      disappears and until then there is no attempt to detect renames
>      is made.  And it only does -M variant of rename detection

On this.  It might be profitable to auto-follow at a low threshold
(and change --follow to a number  argument like -M, -C).  I'm not
asking you to do it for more than one-file at a time, but I often view
one file's history: I was just going through the history of
builtin/merge-trees.c (what I was looking for was mostly in the
builtin-* variant), and it would've been nice if I didn't have to quit
and come back with a --follow.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 19:54               ` Ramkumar Ramachandra
@ 2013-04-22 21:00                 ` Philip Oakley
  2013-04-22 21:08                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Philip Oakley @ 2013-04-22 21:00 UTC (permalink / raw)
  To: Ramkumar Ramachandra, Thomas Rast
  Cc: Git List, Avery Pennarun, Junio C Hamano

From: "Ramkumar Ramachandra" <artagnon@gmail.com>
Sent: Monday, April 22, 2013 8:54 PM
> Thomas Rast wrote:
>> There is a market for a rename detection that works at the tree 
>> level.
>
> Exactly.  And making it auto-follow with a low threshold would be a
> great default.  We'll have to deal with D/F conflicts that Junio was
> talking about in (2), every once in a while.  We'll have a lot more
> incentive to build the D/F conflict resolution process a nice UI.
>
> git-core will actually start working with trees the way it works with 
> blobs.
> --

Is this not similar to the previous attempts at bulk rename detection? 
Such as $gmane/160233.

A practical bulk rename detection would be good. It doesn't have to be 
perfect.

Philip 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 19:15             ` Thomas Rast
  2013-04-22 19:54               ` Ramkumar Ramachandra
@ 2013-04-22 21:06               ` Matthieu Moy
  2013-04-22 21:59                 ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Matthieu Moy @ 2013-04-22 21:06 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Ramkumar Ramachandra, Git List, Avery Pennarun, Junio C Hamano

Thomas Rast <trast@inf.ethz.ch> writes:

> So how can we fix that?  We could try to somehow figure out that M:sub/
> refers to the same thing as M^2:/, by looking at them at the tree level.
> Let's provisionally call this --follow-tree-rename.

There was a patch serie long ago that implemented directory rename
detection:

  http://thread.gmane.org/gmane.comp.version-control.git/163328

I'm not sure why it hasn't been merged.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 21:00                 ` Philip Oakley
@ 2013-04-22 21:08                   ` Ramkumar Ramachandra
  2013-04-22 21:23                     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 21:08 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Thomas Rast, Git List, Avery Pennarun, Junio C Hamano

Philip Oakley wrote:
> Is this not similar to the previous attempts at bulk rename detection? Such
> as $gmane/160233.

Yes.  Does anyone know what happened to the series?

... and I wonder how git merge -s subtree works (still reading).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 21:08                   ` Ramkumar Ramachandra
@ 2013-04-22 21:23                     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 21:23 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Thomas Rast, Git List, Avery Pennarun, Junio C Hamano

Ramkumar Ramachandra wrote:
> Philip Oakley wrote:
>> Is this not similar to the previous attempts at bulk rename detection? Such
>> as $gmane/160233.
>
> Yes.  Does anyone know what happened to the series?

Hm, this and the series Matthieu posted ($gmane/163328) seems to be
solving the more generalized problem.  My initial problem was to do it
only when following a single file's history (probably a --tree-follow
corresponding to --follow).  And to probably restrict looking only in
merge commits (?).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 21:06               ` Matthieu Moy
@ 2013-04-22 21:59                 ` Junio C Hamano
  2013-04-22 22:52                   ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2013-04-22 21:59 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Thomas Rast, Ramkumar Ramachandra, Git List, Avery Pennarun

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Thomas Rast <trast@inf.ethz.ch> writes:
>
>> So how can we fix that?  We could try to somehow figure out that M:sub/
>> refers to the same thing as M^2:/, by looking at them at the tree level.
>> Let's provisionally call this --follow-tree-rename.
>
> There was a patch serie long ago that implemented directory rename
> detection:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/163328
>
> I'm not sure why it hasn't been merged.

I have forgotten about this topic (and its numerous iterations in
the past), but it appears me that people mostly lost interest after
v7 review cycle where the series looked like a solution that is
looking for a problem.

I took another quick look at it now, but it tells me that the series
was a good platform to discuss the design and the goals but it was
far from ready with comments like this "// FIXME: leaks like sieve".

As to the design (not the implementation), I find what are in the
documentation updates in [v9 2/6] more or less reasonable, even
though I still doubt that foo/ => bar/, when the constituent files
are not moved a la "mv foo/* bar/." is useful.  The change in [v9
3/6] is distasteful and should just be killed.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 21:59                 ` Junio C Hamano
@ 2013-04-22 22:52                   ` Ramkumar Ramachandra
  2013-04-22 22:59                     ` Ramkumar Ramachandra
  2013-04-22 23:55                     ` Junio C Hamano
  0 siblings, 2 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 22:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Junio C Hamano wrote:
> I have forgotten about this topic (and its numerous iterations in
> the past), but it appears me that people mostly lost interest after
> v7 review cycle where the series looked like a solution that is
> looking for a problem.

I agree.  Over-generalizing the problem is going nowhere.  I have some
ideas in my head, but I think I'm at the brink of insanity again:

We have a special type of merge commit that has 'bind' lines
corresponding to the trees of the commits we just merged in: each line
references a tree and a prefix.  Then, a diff between the merge's tree
and one of these trees can easily tell us what changes were introduced
by each side.  And here's the bomb: if we consider extending it to
include a blob-like buffer, we can use it to store submodule-like
information for each prefix.  Everything would just work out of the
box with a few minor adjustments:

1. We have to modify our packing algorithm to not reach out beyond ^1
of these special merge commits.  This means object-store isolation
(not $GITDIR isolation).  And it means that submodules can be cloned
and removed at will.

2. We have to maintain a symbolic ref for each prefix.  A commit
invoked from a special-commit referenced subdirectory will update this
symref.  Ofcourse, this means that we need to namespace our refs/ and
refs/remotes sensibly.

3. Git commands will take into account $CWD very strongly, and just
DTRT all the time.  Whether you're looking from the submodule
perspective or the superproject perspective.

Am I making any sense?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 22:52                   ` Ramkumar Ramachandra
@ 2013-04-22 22:59                     ` Ramkumar Ramachandra
  2013-04-22 23:55                     ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-22 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Ramkumar Ramachandra wrote:
> [...]

Oh, and I forgot the best part.  True floating submodules when the
bind line references a zeroed-out hex.  Naturally, the diff'ing to
figure out changes introduced by the submodule will be blocked.  Also
good for dropping in huge binary files at a particular path.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 22:52                   ` Ramkumar Ramachandra
  2013-04-22 22:59                     ` Ramkumar Ramachandra
@ 2013-04-22 23:55                     ` Junio C Hamano
  2013-04-23  7:53                       ` Ramkumar Ramachandra
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2013-04-22 23:55 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Am I making any sense?

I dunno.  Depends on the definition of "sense".

It sounds like you are repeating the same old "let's record renames
in the commit", and in a system (not Git) where recording renames
may make sense, you may be making sense.

But we will not record renames in the commit.  Time to re-read
$gmane/217, perhaps?

A subtree merge that slurps everything from a side branch under a
single directory, say gitk/, is not at all different from a normal
merge with many renames.

Imagine an alternate world where we had a small "git.git" project
with 11 files totallying 1244 lines.  Then Paul Mackerras forks that
project, remove everything from the top-level directory and adds a
Tck/Tk script "gitk".  Linus merges that history as-is, keeping all
our files intact (i.e. ignoring his removal) but taking the addition
of "gitk" from him.

Then after I inherit the project, I rename "gitk" to "gitk-git/git".
Paul does not rename his.  We keep developing in parallel and I
occassionally merge from his tree, which has "gitk" at the top,
while mine has it in a directory "gitk-git".  The ordinary rename
detection kicks in and integrates his updates to "gitk" into my
"gitk-git/gitk".

The only difference the above imaginary history has from the reality
is Paul's history does not share that root, but everything else is
the same.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-22 23:55                     ` Junio C Hamano
@ 2013-04-23  7:53                       ` Ramkumar Ramachandra
  2013-04-23 16:03                         ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-23  7:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Junio C Hamano wrote:
> It sounds like you are repeating the same old "let's record renames
> in the commit", and in a system (not Git) where recording renames
> may make sense, you may be making sense.
>
> But we will not record renames in the commit.  Time to re-read
> $gmane/217, perhaps?

Yeah, you're right.  It makes no sense to record renames, although I
have a different argument against it: any implementation that records
renames will depend on the path that was taken to get to the final
state, and this is completely wrong.  Subversion made this mistake,
and users pay a very heavy price: if the user didn't explicitly rename
a node (~= tree) and just did a delete + add, the repository is
more-or-less screwed wrt merging.

Forget everything that I said, and let's start over.

In the common usecase of subtrees, it might make sense to record some
additional submodule-like parameters in the subtree's tree object.
This additional information is entirely optional, and doesn't change
the way merges happen: we can have it as merely a heuristic-helper
(for git merge -s subtree).  Initially, let's just think of putting a
ref field in the tree, so that I can have the following setup:

- remote origin referring to my superproject.
- remote git/origin referring to the fetchdefault of my subproject git.
- remote git/ram referring to the pushdefault of my subproject.
- the tree object corresponding to quux/baz/git is additionally filled
in with the ref refs/remotes/git/origin/master.

Then, I can just say git pull quux/baz/git, and it will automatically
fetch changes from the ref and merge it into the subtree.  It's not to
say that I can't merge any other ref; just that I merge this ref most
of the time, and I want a DWIM for this case.

Further, this can speed up tree-rename detection greatly (in fact, I'm
thinking the first implementation will depend on this information
being present).  I inspect M^{tree} and I want to know how it's
composed from M^1^{tree} and M^2^{tree}.  Simple.  In M^{tree}, look
for trees that have this additional data filled in: then we can just
short-circuit the rename detection to matching the similarity of this
tree with M^1^{tree} and M^2^{tree}.

When this aux data is present in the tree, we can do one more thing:
have a symref tracking the commit-line corresponding to M^2. This
means that we can DWIM for things like 'git commit' inside the subtree
very easily.  When the aux information is not present, 'git commit'
will obviously commit to HEAD, essentially making the superproject own
those changes in the subtree (as it does now).

This might be the route to implementing narrow clones sensibly.  A
narrow clone does not mean that any directory in the entire repository
can be cloned separately: it just means that a tree with this aux data
need not be fetched in the initial clone.  For this to work, instead
of refs/remotes/git/origin/master, the aux data will need a
combination of upstream URL and ref: we can then automatically figure
out the name from the URL and deposit the fetched data in
refs/remotes/<name>/origin/<ref>.

Initializing nested submodules without the container is also easy: in
the superproject, you need to have aux-trees corresponding to
quux/baz/git and quux/baz/git/moo/clayoven.  It might additionally be
a good idea to track these aux-trees: but even if we don't go down
that route, we can always deposit a "template" file from the
superproject that won't interfere with the subtree merging process.

Now, I'm not sure what the value of splitting the object stores at
this point is.  The aux-tree can have a statthrough field to block
stat() calls from going through, so there's really no performance
issue.  If you want to separate one tree out of the superproject and
work on it separately, all you have to do is fetch the corresponding
ref.

On the issue of floating submodules.  It might make sense to zero out
the hex of the tree, as seen by the superproject.  The limitation is
that we can't introduce any changes to the submodule from the
superproject: it's basically a dummy entry sitting in the
superproject's tree.  It's a bit of a hack, but I think it's workable.

So, what do you think?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-23  7:53                       ` Ramkumar Ramachandra
@ 2013-04-23 16:03                         ` Junio C Hamano
  2013-04-23 16:29                           ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2013-04-23 16:03 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Further, this can speed up tree-rename detection greatly (in fact, I'm
> thinking the first implementation will depend on this information
> being present).  I inspect M^{tree} and I want to know how it's
> composed from M^1^{tree} and M^2^{tree}.  Simple.  In M^{tree}, look
> for trees that have this additional data filled in: then we can just
> short-circuit the rename detection to matching the similarity of this
> tree with M^1^{tree} and M^2^{tree}.

If you need the history context (i.e. M, M^1, M^2) around it to
interpret that additional information, isn't it essentially the same
as storing renames in the merge commit M?

Not very impressed, at least not yet.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] t4202 (log): add failing test for log with subtree
  2013-04-23 16:03                         ` Junio C Hamano
@ 2013-04-23 16:29                           ` Ramkumar Ramachandra
  0 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2013-04-23 16:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthieu Moy, Thomas Rast, Git List, Avery Pennarun

Junio C Hamano wrote:
> If you need the history context (i.e. M, M^1, M^2) around it to
> interpret that additional information, isn't it essentially the same
> as storing renames in the merge commit M?

There's one big difference.

The point is that for any kind of repository composition, you're going
to have to store tree renames at some level if you want to be able to
move the submodule around in the tree.  In the current submodule
system, you could say that you're storing tree renames in a blob
(submodule.<name>.path in .gitmodules).  Ofcourse, the difference is
that the other side of the tree entry is completely opaque to the
superproject (and I think we can do better than that now).  You should
use heuristics for all other kinds of renames, but I'm arguing that
repository composition is a special case that needs more thought.

You shouldn't be against storing renames in principle, because we
already do that in a way.  What you should be against is storing
renames in a way that will lock up the repository if a different path
is taken to the same state.

What I'm proposing is an optional field to speed up (and make more
reliable) rename detection in a very specific case.  I'm not putting
the information in a commit because that will lock up our repository:
If there are two different commits representing the same tree, but one
contains bind lines and the other doesn't, we've done something
seriously wrong: rebase and merge are screwed.  In my case, if there
are two perfectly identical trees except that one contains auxiliary
information, nothing goes wrong: the tree without the auxiliary
information will just be a little slower and can't support DWIM git
operations.

> Not very impressed, at least not yet.

Please bear with me.  I really think the repository composition
problem is worth solving.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-04-23 16:30 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-22 12:08 [PATCH] t4202 (log): add failing test for log with subtree Ramkumar Ramachandra
2013-04-22 12:11 ` Matthieu Moy
2013-04-22 12:17   ` Thomas Rast
2013-04-22 12:37     ` Ramkumar Ramachandra
2013-04-22 12:54       ` Ramkumar Ramachandra
2013-04-22 12:29   ` Ramkumar Ramachandra
2013-04-22 12:36     ` Matthieu Moy
2013-04-22 13:15 ` Thomas Rast
2013-04-22 13:19   ` Thomas Rast
2013-04-22 14:30   ` Ramkumar Ramachandra
2013-04-22 14:57     ` Ramkumar Ramachandra
2013-04-22 15:24     ` Thomas Rast
2013-04-22 15:46       ` Ramkumar Ramachandra
2013-04-22 15:50       ` Ramkumar Ramachandra
2013-04-22 15:54         ` Ramkumar Ramachandra
2013-04-22 17:29           ` Ramkumar Ramachandra
2013-04-22 19:15             ` Thomas Rast
2013-04-22 19:54               ` Ramkumar Ramachandra
2013-04-22 21:00                 ` Philip Oakley
2013-04-22 21:08                   ` Ramkumar Ramachandra
2013-04-22 21:23                     ` Ramkumar Ramachandra
2013-04-22 21:06               ` Matthieu Moy
2013-04-22 21:59                 ` Junio C Hamano
2013-04-22 22:52                   ` Ramkumar Ramachandra
2013-04-22 22:59                     ` Ramkumar Ramachandra
2013-04-22 23:55                     ` Junio C Hamano
2013-04-23  7:53                       ` Ramkumar Ramachandra
2013-04-23 16:03                         ` Junio C Hamano
2013-04-23 16:29                           ` Ramkumar Ramachandra
2013-04-22 16:32       ` Junio C Hamano
2013-04-22 18:00         ` Ramkumar Ramachandra
2013-04-22 18:18           ` Matthieu Moy
2013-04-22 19:09           ` Junio C Hamano
2013-04-22 20:39         ` Ramkumar Ramachandra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.