git.vger.kernel.org archive mirror
* Rename handling
@ 2007-03-19 16:10 John Goerzen
  2007-03-19 18:14 ` Steven Grimm
  2007-03-21  0:21 ` Jakub Narebski
  0 siblings, 2 replies; 31+ messages in thread
From: John Goerzen @ 2007-03-19 16:10 UTC (permalink / raw)
  To: git

Hi,

I've read the FAQ and Linus' philosophy on this topic, and have some
questions still.  I'm considering using Git and its philosophy on
renames is troubling me.

My use for version control presently has most changes being written and
committed by me directly, with occasional patches coming in from random
others.  As such, running something like 'git mv' when a rename occurs
is not a problem.

My main concerns with Git are:

1) git log does not show complete history of files that have been
   renamed or copied.

   If I have foo.txt, and rename it to bar.txt, the liberal use of -M
   can tease out a proper patch from a number of places.  But
   git log bar.txt, with any set of options I can possibly come up with,
   absolutely refuses to show me the history of bar.txt before it was
   renamed to bar.txt.  git log foo.txt also does not show me the old
   history for the file.

2) For me, a rename is a logical change to the source tree that I want
   to be recorded with absolute certainty, not guessed about later.
   Sometimes I may make API changes and it is useful to see how module
   names changed, with complete precision, later.  I do not want to be
   victim to an incorrect guess, which could be possible.

Is there any way to resolve this with Git, or do I basically have to
stick with Mercurial here?

-- John


* Re: Rename handling
  2007-03-19 16:10 Rename handling John Goerzen
@ 2007-03-19 18:14 ` Steven Grimm
  2007-03-19 18:35   ` Nicolas Pitre
                     ` (3 more replies)
  2007-03-21  0:21 ` Jakub Narebski
  1 sibling, 4 replies; 31+ messages in thread
From: Steven Grimm @ 2007-03-19 18:14 UTC (permalink / raw)
  To: John Goerzen; +Cc: git

John Goerzen wrote:
> 2) For me, a rename is a logical change to the source tree that I want
>    to be recorded with absolute certainty, not guessed about later.
>    Sometimes I may make API changes and it is useful to see how module
>    names changed, with complete precision, later.  I do not want to be
>    victim to an incorrect guess, which could be possible.
>   

If you commit your renames separately from your content changes, it'll 
be unambiguous and you won't have to worry about it. That's what I 
usually do when this is a concern and it has yet to break for me.

On the other hand, I agree with your general point; I really don't like 
being uncertain about whether renames are going to come out correctly or 
not ("it has always worked before" and "it is by design unable to fail" 
are two very different things.) In particular, I strongly disagree with 
the "names are just syntactic sugar, it's the content we're tracking" 
philosophy. Here's a simple example of why:

#include <xyz.h>

That simple statement is an intermingling of content and namespace. The 
presence of something like that actually breaks the "commit the rename 
separately" approach -- if you rename xyz.h to something else and commit 
just that rename, that revision won't compile, and I *really* don't like 
intentionally committing broken revisions.

Okay, so you say, rename xyz.h and update the references to it, but 
don't actually modify it. Fine, that works in this case. Now how about 
this one:

public abstract class Foo {
    private static Logger logger = Logger.getLogger(Foo.class.getName());
}

The references to the name "Foo.java" in that case are within the file 
itself (assuming you're using a Java compiler that requires the filename 
and class name to match, which the common ones do.) You can't change 
just the references without changing the file you're renaming. And, 
depending on how many self-references there are in this file, it's 
anyone's guess whether the content-based rename detection will consider 
the renamed file to be close enough to the old one to be a probable rename.
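
To make the worry concrete, here is a minimal Python sketch of a similarity check. It is only an illustration: `similarity` and `looks_like_rename` are hypothetical names, the line-based `difflib` ratio is a stand-in for git's own similarity metric (which works differently), and the 0.5 threshold is assumed to mirror git's default 50% for -M.

```python
import difflib


def similarity(old: str, new: str) -> float:
    """Rough line-based similarity in [0, 1]; a stand-in, not git's metric."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines())
    return matcher.ratio()


def looks_like_rename(old: str, new: str, threshold: float = 0.5) -> bool:
    """Treat the pair as a rename when enough lines survive unchanged."""
    return similarity(old, new) >= threshold


# A class with only a couple of self-references: most lines are untouched.
few_refs_old = "\n".join(
    [
        "public abstract class Foo {",
        "    private static Logger logger = Logger.getLogger(Foo.class.getName());",
    ]
    + [f"    int field{i};" for i in range(8)]
    + ["}"]
)
few_refs_new = few_refs_old.replace("Foo", "Bar")

# A file where every line mentions the class name: nothing survives.
many_refs_old = "\n".join(f"Foo.method{i}(Foo.class);" for i in range(5))
many_refs_new = many_refs_old.replace("Foo", "Bar")

print(looks_like_rename(few_refs_old, few_refs_new))
print(looks_like_rename(many_refs_old, many_refs_new))
```

Under these assumptions the first pair clears the threshold and the second does not, which is the "anyone's guess" situation described above: the outcome depends on how much of the file mentions its own name.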

Combine renames with major code refactoring where the content changes 
substantially, and all bets are off.

Now, having said all that, I'll argue in favor of the content-based 
rename support for a moment. It is extremely cool that git will actually 
detect renames in third-party packages where you've just untarred a new 
release into your git repository and committed it, but have given git no 
hints at all about the nature of the content changes. I'm not aware of 
any other version control system that'll do that, and I've taken 
advantage of that feature in the past. So by no stretch am I saying that 
content-based rename detection is worthless.

But I would sure rest a lot easier if "git mv" would record a "the user 
renamed this file" entry in some log somewhere and the merge code would 
see that entry and say, "aha, no need to guess at it, file X got renamed 
to Y." Bonus points if that record could apply to directories too, so 
you don't have the "I created a new file in a directory you renamed, and 
after git-pull my file is still sitting by itself in the old directory" 
bug. If no such record exists, then the current rename code should still 
be invoked to work its considerable magic.

So to answer your question, in my opinion if 100% guaranteed renames are 
high on your priority list, then Mercurial might be the better option 
for now. In practice, I've found that git's 99+% rename detection has 
yet to fail on me aside from the above directory renaming case, but at 
the end of the day it *is* guessing at your renames after the fact.

Okay, git gurus, show me no mercy. :)

-Steve


* Re: Rename handling
  2007-03-19 18:14 ` Steven Grimm
@ 2007-03-19 18:35   ` Nicolas Pitre
  2007-03-19 18:48     ` Linus Torvalds
  2007-03-19 19:36     ` Steven Grimm
  2007-03-19 19:03   ` Andy Parkins
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: Nicolas Pitre @ 2007-03-19 18:35 UTC (permalink / raw)
  To: Steven Grimm; +Cc: John Goerzen, git

On Mon, 19 Mar 2007, Steven Grimm wrote:

> So to answer your question, in my opinion if 100% guaranteed renames are high
> on your priority list, then Mercurial might be the better option for now. In
> practice, I've found that git's 99+% rename detection has yet to fail on me
> aside from the above directory renaming case, but at the end of the day it
> *is* guessing at your renames after the fact.
> 
> Okay, git gurus, show me no mercy. :)

Well...  the fact that you _still_ use GIT even in the face of a 1% 
probability that it might guess renames wrong (according to your own 
numbers) should mean that you didn't feel switching to Mercurial was
worth the 100% guarantee for rename identification.

And some will argue that explicit renames are susceptible to user error 
misidentifying the rename too, certainly in the 1% figure of all renames 
if not more.

So maybe, just maybe, at the end of the day getting renames right 100% 
of the time instead of 99% is not such a big thing after all.


Nicolas


* Re: Rename handling
  2007-03-19 18:35   ` Nicolas Pitre
@ 2007-03-19 18:48     ` Linus Torvalds
  2007-03-19 19:57       ` Steven Grimm
  2007-03-19 20:02       ` Robin Rosenberg
  2007-03-19 19:36     ` Steven Grimm
  1 sibling, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-19 18:48 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Steven Grimm, John Goerzen, git



On Mon, 19 Mar 2007, Nicolas Pitre wrote:
> 
> And some will argue that explicit renames are susceptible to user error 
> misidentifying the rename too, certainly in the 1% figure of all renames 
> if not more.

It's much worse than that. I will *guarantee* that renames are missed when 
they come in as traditional patches, for example. That's a 100% error rate 
right there, not some "1%" one.

And even if people never make mistakes, and people *only* use the native 
SCM "rename" functions, I guarantee that the downsides of thinking that 
files have identities is still much much bigger than the upsides. We've 
already shown that the git "blame" functionality is strictly more powerful 
than anything based on renames.

So learn to love the bomb. Rename tracking is *wrong*. 

		Linus


* Re: Rename handling
  2007-03-19 18:14 ` Steven Grimm
  2007-03-19 18:35   ` Nicolas Pitre
@ 2007-03-19 19:03   ` Andy Parkins
  2007-03-19 19:21     ` Steven Grimm
  2007-03-19 19:15   ` Daniel Barkalow
  2007-03-19 19:49   ` John Goerzen
  3 siblings, 1 reply; 31+ messages in thread
From: Andy Parkins @ 2007-03-19 19:03 UTC (permalink / raw)
  To: git; +Cc: Steven Grimm, John Goerzen

On Monday 2007, March 19, Steven Grimm wrote:

> On the other hand, I agree with your general point; I really don't
> like being uncertain about whether renames are going to come out
> correctly or not ("it has always worked before" and "it is by design

I agree with you, but I think that git does exactly what you want.  In 
fact I think git is better.

The beauty of git figuring out renames for itself is that git can figure 
it out later. 

 $ mv file1 file2
 $ git update-index --remove file1
 $ git add file2

The important thing here is that git wasn't used to do the move.  This 
is great when you're lost in a development haze and do the move without 
thinking.

> So to answer your question, in my opinion if 100% guaranteed renames
> are high on your priority list, then Mercurial might be the better
> option for now. In practice, I've found that git's 99+% rename
> detection has yet to fail on me aside from the above directory
> renaming case, but at the end of the day it *is* guessing at your
> renames after the fact.

It's not really a guess; through the magic of sha-1, and provided you 
are disciplined enough to commit the rename without any changes to the 
content you can be sure that the rename is tracked.  The sha-1 /must/ 
be the same before and after.  For this 100% case, git doesn't even 
need the "-M", git-blame, git-diff and git-merge will find it anyway.
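
The "sha-1 must be the same" case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not git's implementation: `blob_id` follows the way git names a blob (SHA-1 over a `blob <size>\0` header plus the content), while `exact_renames` and its dict-of-path-to-bytes snapshots are hypothetical helpers invented for the example.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Name content the way git names a blob: sha1('blob <size>\\0' + data)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def exact_renames(before: dict, after: dict) -> dict:
    """Pair each vanished path with a new path holding identical bytes.

    `before` and `after` map path -> file content (bytes) for two snapshots.
    """
    removed = {p: blob_id(c) for p, c in before.items() if p not in after}
    added = {p: blob_id(c) for p, c in after.items() if p not in before}

    # Group the new paths by content id so each one is used at most once.
    by_id = {}
    for path, oid in sorted(added.items()):
        by_id.setdefault(oid, []).append(path)

    renames = {}
    for old, oid in sorted(removed.items()):
        candidates = by_id.get(oid)
        if candidates:
            renames[old] = candidates.pop(0)
    return renames


before = {"file1": b"hello\n", "README": b"docs\n"}
after = {"file2": b"hello\n", "README": b"docs\n"}
print(exact_renames(before, after))
```

Note that when several removed files share the same content (empty files are the extreme case), the pairing in this sketch is arbitrary: identical ids say the bytes moved, not which name they moved from.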

Even better is that because the rename isn't recorded explicitly when 
you upgrade git and the detection gets better, all your history 
instantly gets interpreted correctly.

The only command I've found that doesn't do the "right thing" by default 
is git-log and I think that once the following works, all the "why 
doesn't git track renames" people will go quietly away:

 $ git init
 $ date > file1
 $ git add file1
 $ git commit -m "add file1"
 $ git mv file1 file2
 $ git commit -m "rename file1 to file2"
 $ git mv file2 file3
 $ git commit -m "rename file2 to file3"
 $ git log -- file3




Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: Rename handling
  2007-03-19 18:14 ` Steven Grimm
  2007-03-19 18:35   ` Nicolas Pitre
  2007-03-19 19:03   ` Andy Parkins
@ 2007-03-19 19:15   ` Daniel Barkalow
  2007-03-19 19:49   ` John Goerzen
  3 siblings, 0 replies; 31+ messages in thread
From: Daniel Barkalow @ 2007-03-19 19:15 UTC (permalink / raw)
  To: Steven Grimm; +Cc: John Goerzen, git

On Mon, 19 Mar 2007, Steven Grimm wrote:

> John Goerzen wrote:
> > 2) For me, a rename is a logical change to the source tree that I want
> >    to be recorded with absolute certainty, not guessed about later.
> >    Sometimes I may make API changes and it is useful to see how module
> >    names changed, with complete precision, later.  I do not want to be
> >    victim to an incorrect guess, which could be possible.
> >   
> 
> If you commit your renames separately from your content changes, it'll be
> unambiguous and you won't have to worry about it. That's what I usually do
> when this is a concern and it has yet to break for me.
> 
> On the other hand, I agree with your general point; I really don't like being
> uncertain about whether renames are going to come out correctly or not ("it
> has always worked before" and "it is by design unable to fail" are two very
> different things.) In particular, I strongly disagree with the "names are just
> syntactic sugar, it's the content we're tracking" philosophy.

We are tracking the names as part of the content. They're right there in 
the tree objects. It's not like, when you check out an older revision, you 
could get the right content under the wrong name. The philosophy is 
actually that we're tracking a series of states, and we're somewhat 
agnostic on the description of the difference between two states. And it 
often makes sense to postpone trying to describe this difference until you 
know why you want to know, because it's certainly possible that there are 
multiple reasonable interpretations, and some may give better results than 
others.

If you're trying to merge a rename-and-refactor change (often something 
like splitting a source or header file into two files) with a 
modification, and it's arguable what happened in the refactor, the 
interpretation which gives the state that's easiest to resolve correctly 
may depend on what the modification is. So you really want to leave it up 
to the merge code to choose the best guess at the result, without using a 
fixed description of what the changes are.

As for whether names or contents "matter more", we have both answers. "git 
log <names>" gives you the history of what has happened to change what 
appears with those names. "git blame <name>", on the other hand, gives you 
the history of the content which now appears at that name. You just need 
to ask the question you want the answer to.

	-Daniel
*This .sig left intentionally blank*


* Re: Rename handling
  2007-03-19 19:03   ` Andy Parkins
@ 2007-03-19 19:21     ` Steven Grimm
  2007-03-21  0:06       ` Jakub Narebski
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Grimm @ 2007-03-19 19:21 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, John Goerzen

Andy Parkins wrote:
> It's not really a guess; through the magic of sha-1, and provided you 
> are disciplined enough to commit the rename without any changes to the 
> content you can be sure that the rename is tracked.  The sha-1 /must/ 
> be the same before and after.  For this 100% case, git doesn't even 
> need the "-M", git-blame, git-diff and git-merge will find it anyway.
>   

I said as much in my mail. The problem is that "commit the rename 
without any changes to the content" is synonymous in many cases with 
"commit a revision that fails to compile." Which may or may not be 
acceptable in some environments but is, to me at least, a sign that 
something is inadequate in the version control system. I shouldn't be 
forced to have a broken build in my revision history just to be 100% 
certain my rename will be tracked accurately.

> The only command I've found that doesn't do the "right thing" by default 
> is git-log and I think that once the following works, all the "why 
> doesn't git track renames" people will go quietly away:
>
>  $ git init
>  $ date > file1
>  $ git add file1
>  $ git commit -m "add file1"
>  $ git mv file1 file2
>  $ git commit -m "rename file1 to file2"
>  $ git mv file2 file3
>  $ git commit -m "rename file2 to file3"
>  $ git log -- file3
>   

The following is actually my biggest beef with git's rename tracking, 
and it has nothing whatsoever to do with git-log (though I agree git-log 
needs to track renames too):

$ ls
dir1
$ ls dir1
file1 file2 file3
$ echo "#include file1" > dir1/file4
$ git add dir1/file4
$ git commit
$ git pull
$ ls
dir1 dir2
$ ls dir1
file4
$ ls dir2
file1 file2 file3

That's just plain broken in my opinion. One can perhaps contrive a test 
case or two where that's the desired behavior, but in the real world it 
is almost never what you actually want.

By the way, I don't think fixing that is necessarily related to how 
renames get detected, so in some sense it's a different bug report / 
feature request than the rename hints one. It would be possible to 
figure out the directory had been renamed based purely on content 
analysis; a bunch of files all individually renamed to the same places 
under a new directory, and a lack of any files at all left in the old 
one, probably means the directory got renamed. The content-based rename 
detector could handle this case.
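
A rough Python sketch of that inference, assuming the per-file renames have already been detected. The names `infer_dir_renames` and `relocate` are hypothetical, and the rule used (every rename out of a directory lands in one new directory, and nothing is left behind) is only the heuristic described in the paragraph above, not anything git is known to do here.

```python
import posixpath
from collections import Counter


def infer_dir_renames(renames: dict, remaining_paths: set) -> dict:
    """Guess directory renames from per-file renames.

    `renames` maps old path -> new path on the side that did the renaming;
    `remaining_paths` is every path that still exists on that side.
    """
    targets = {}
    for old, new in renames.items():
        old_dir = posixpath.dirname(old)
        targets.setdefault(old_dir, Counter())[posixpath.dirname(new)] += 1

    result = {}
    for old_dir, counts in targets.items():
        if len(counts) != 1:
            continue  # files scattered to several places: not a directory rename
        new_dir = next(iter(counts))
        still_populated = any(
            posixpath.dirname(p) == old_dir for p in remaining_paths
        )
        if new_dir != old_dir and not still_populated:
            result[old_dir] = new_dir
    return result


def relocate(path: str, dir_renames: dict) -> str:
    """Move a newly added file along with its renamed directory, if any."""
    directory, name = posixpath.split(path)
    return posixpath.join(dir_renames.get(directory, directory), name)


upstream_renames = {f"dir1/file{i}": f"dir2/file{i}" for i in (1, 2, 3)}
upstream_paths = set(upstream_renames.values())
dir_renames = infer_dir_renames(upstream_renames, upstream_paths)
print(relocate("dir1/file4", dir_renames))
```

With the layout from the transcript above, the sketch would place the locally added file4 next to its siblings in dir2 rather than leaving it alone in dir1.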

-Steve


* Re: Rename handling
  2007-03-19 18:35   ` Nicolas Pitre
  2007-03-19 18:48     ` Linus Torvalds
@ 2007-03-19 19:36     ` Steven Grimm
  2007-03-19 19:45       ` Steven Grimm
                         ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Steven Grimm @ 2007-03-19 19:36 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: John Goerzen, git

Nicolas Pitre wrote:
> Well...  the fact that you _still_ use GIT even in the face of a 1% 
> probability that it might guess renames wrong (according to your own 
> numbers) should mean that you didn't feel switching to Mercurial was
> worth the 100% guarantee for rename identification.
>   

Yes, that's right, for me that's the correct tradeoff. For the person 
who asked the original question, it may or may not be. He seemed a lot 
more worried about the situation than I am. In my environment renames 
are relatively rare events, but maybe in his they happen more often.

> And some will argue that explicit renames are susceptible to user error 
> misidentifying the rename too, certainly in the 1% figure of all renames 
> if not more.
>   

If you're using "git mv" instead of "mv" to do the rename, it is 
impossible to misidentify the rename since the rename and identification 
are happening in the same command with no additional inputs that could 
confuse anything. If you are talking about adding a new tool that can 
manually tag a rename after the fact, then I can't disagree with you 
except to say that the fact that no such command exists today means any 
estimate of user error rate is pure speculation.

Aside from that, the possibility of user error is an entirely different 
thing than the possibility of tool error -- if I misidentify a rename, I 
will blame myself, not the version control system, and rightly so. 
People are expected to make mistakes from time to time. But if my 
version control tool misidentifies a rename on my behalf, and there's 
nothing I can do about it because there's no way to influence the tool's 
concept of what got renamed to what, then I'm not going to consider it a 
failure of the tool, not a mistake on my part.

> So maybe, just maybe, at the end of the day getting renames right 100% 
> of the time instead of 99% is not such a big thing after all.
>   

For me personally, that is true -- but I'd still prefer that extra 1%.

-Steve


* Re: Rename handling
  2007-03-19 19:36     ` Steven Grimm
@ 2007-03-19 19:45       ` Steven Grimm
  2007-03-19 20:07         ` Linus Torvalds
  2007-03-19 20:17       ` Nicolas Pitre
  2007-03-19 20:44       ` Daniel Barkalow
  2 siblings, 1 reply; 31+ messages in thread
From: Steven Grimm @ 2007-03-19 19:45 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Nicolas Pitre, John Goerzen, git

Steven Grimm wrote:
> But if my version control tool misidentifies a rename on my behalf, 
> and there's nothing I can do about it because there's no way to 
> influence the tool's concept of what got renamed to what, then I'm not 
> going to consider it a failure of the tool, not a mistake on my part.

Err, then I am *going* to consider it a failure of the tool.

-Steve


* Re: Rename handling
  2007-03-19 18:14 ` Steven Grimm
                     ` (2 preceding siblings ...)
  2007-03-19 19:15   ` Daniel Barkalow
@ 2007-03-19 19:49   ` John Goerzen
  2007-03-19 22:27     ` Junio C Hamano
  3 siblings, 1 reply; 31+ messages in thread
From: John Goerzen @ 2007-03-19 19:49 UTC (permalink / raw)
  To: git

On 2007-03-19, Steven Grimm <koreth@midwinter.com> wrote:
> John Goerzen wrote:
>> 2) For me, a rename is a logical change to the source tree that I want
>>    to be recorded with absolute certainty, not guessed about later.
>>    Sometimes I may make API changes and it is useful to see how module
>>    names changed, with complete precision, later.  I do not want to be
>>    victim to an incorrect guess, which could be possible.
>>   
>
> If you commit your renames separately from your content changes, it'll 
> be unambiguous and you won't have to worry about it. That's what I 
> usually do when this is a concern and it has yet to break for me.

As I have been testing Mercurial and its addremove feature (which does
basically what Git is doing), I encountered some situations where
this broke, sometimes spectacularly.  This generally happened when there
were identical files in the source tree, or when there were identical
resulting files, or 0-byte files (the extreme pathological case, of
course).

Again, sometimes filenames have significance.  The presence or absence
of 0-byte files can impact what make does, for instance.  Not that I
advocate for this behavior, but just pointing out that it exists.

I understand what Linus is saying about applying patches from others and
agree that what git is doing is nice in this case.

But if most of my work is hacking directly on the code, I am going to
know better than the VCS what is being renamed, and would like to record
that.  Sometimes the filenames are part of the code.

I want the option.

-- John


* Re: Rename handling
  2007-03-19 18:48     ` Linus Torvalds
@ 2007-03-19 19:57       ` Steven Grimm
  2007-03-19 20:19         ` Martin Langhoff
  2007-03-19 20:22         ` Linus Torvalds
  2007-03-19 20:02       ` Robin Rosenberg
  1 sibling, 2 replies; 31+ messages in thread
From: Steven Grimm @ 2007-03-19 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, John Goerzen, git

Linus Torvalds wrote:
> It's much worse than that. I will *guarantee* that renames are missed when 
> they come in as traditional patches, for example. That's a 100% error rate 
> right there, not some "1%" one.
>   

That's an argument that the content-based rename detector is valuable 
and shouldn't be ditched, a sentiment with which I completely agree. It 
is a fabulous piece of work and handles cases that none of git's 
competitors get right, such as patches or tracking upstream 
distributions or tracking renames made by non-git-aware tools.

However, "Should we keep the existing rename detection?" is not the same 
question as, "Should the user be able to tell the system he's renaming 
something?" Or rather, given we already have git mv, "Should the system 
remember that the user has told it he's renaming something?" Right now 
git is throwing away metadata the user is feeding to it, metadata that 
could be used to eliminate the chance of a subsequent failure. As long 
as that metadata is used in *addition* to the existing logic, rather 
than as a *replacement*, the downside seems minimal. You won't have it 
in the case of a patch, granted, but that just means patches will use 
the existing, almost-always-right, rename detection, no harm done.

> So learn to love the bomb. Rename tracking is *wrong*. 
>   

Until someone comes up with a way to make content-based rename detection 
100% foolproof in the face of things like frequent self-references in 
Java or C++ classes, it may be a necessary evil (or at least a 
worthwhile one.)

-Steve


* Re: Rename handling
  2007-03-19 18:48     ` Linus Torvalds
  2007-03-19 19:57       ` Steven Grimm
@ 2007-03-19 20:02       ` Robin Rosenberg
  2007-03-19 20:34         ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Robin Rosenberg @ 2007-03-19 20:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Steven Grimm, John Goerzen, git

On Monday 19 March 2007 19:48, Linus Torvalds wrote:
> 
> On Mon, 19 Mar 2007, Nicolas Pitre wrote:
> > 
> > And some will argue that explicit renames are susceptible to user error 
> > misidentifying the rename too, certainly in the 1% figure of all renames 
> > if not more.
> 
> It's much worse than that. I will *guarantee* that renames are missed when 
> they come in as traditional patches, for example. That's a 100% error rate 
> right there, not some "1%" one.
> 
> And even if people never make mistakes, and people *only* use the native 
> SCM "rename" functions, I guarantee that the downsides of thinking that 
> files have identities is still much much bigger than the upsides. We've 
> already shown that the git "blame" functionality is strictly more powerful 
> than anything based on renames.
> 
> So learn to love the bomb. Rename tracking is *wrong*. 
> 
> 		Linus

How about this simple recipe for defeating rename tracking (real world):

User needs to modify A. User renames A to OLD_A within his/her IDE. SCM
records the rename. User now uses SaveAs to restore A, and SCM detects the 
*NEW* file A.

-- robin


* Re: Rename handling
  2007-03-19 19:45       ` Steven Grimm
@ 2007-03-19 20:07         ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-19 20:07 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Nicolas Pitre, John Goerzen, git



On Mon, 19 Mar 2007, Steven Grimm wrote:
> 
> Err, then I am *going* to consider it a failure of the tool.

Sure.

And then realize that nothing is perfect.

Git is just *closer* to perfect than any other SCM out there. That doesn't 
mean that you can never find cases where you consider it failed. It just 
means that it fails less than the alternatives.

To be perfect, an SCM would have to be able to infer a higher meaning. Who 
knows - maybe that will happen in a few centuries (or decades, but AI has 
not had a good track-record so far). 

Git tracks exactly the stuff that does *not* require it to infer higher 
meanings - purely data. I personally consider it one of git's greatest
strengths: it never matters *how* you get to some state, or what tools you
used (patches, imports from other SCM's, "git mv" with intelligent
developers, "git mv" with total klutzes, "plain mv", random monkeys
typing, whatever). Git tracks not "intent", but "hard data".

The fact that git then can use that unambiguous hard data to show you 
interesting patterns is a big deal. But you need to realize that it's an 
even *bigger* deal that git only traffics in hard data that leaves 
absolutely no room for mistakes.

Git simply doesn't *care* whether you applied a patch to create the 
rename, or whether you imported the series from a system that doesn't 
track renames, or whether you just forgot to do "git mv". You should be 
really really happy about that. 

Btw, the reason -M isn't on by default is not that it's more expensive in 
CPU-time (it is, but quite frankly, you will never really see that effect 
in practice). No, the real reason is that if you use "-M" and actually see 
renames, traditional tools no longer understand the patches. And sadly, 
there are still too many unwashed and ignorant people out there to make 
the default patch format be git-specific.

When the revolution comes, and we can shoot everybody who uses anything 
else, we'll turn -M on by default.  Don't despair, comrade!

		Linus


* Re: Rename handling
  2007-03-19 19:36     ` Steven Grimm
  2007-03-19 19:45       ` Steven Grimm
@ 2007-03-19 20:17       ` Nicolas Pitre
  2007-03-19 20:44       ` Daniel Barkalow
  2 siblings, 0 replies; 31+ messages in thread
From: Nicolas Pitre @ 2007-03-19 20:17 UTC (permalink / raw)
  To: Steven Grimm; +Cc: John Goerzen, git

On Mon, 19 Mar 2007, Steven Grimm wrote:

> Nicolas Pitre wrote:
> > And some will argue that explicit renames are susceptible to user error
> > misidentifying the rename too, certainly in the 1% figure of all renames if
> > not more.
> >   
> 
> If you're using "git mv" instead of "mv" to do the rename, it is impossible to
> misidentify the rename since the rename and identification are happening in
> the same command with no additional inputs that could confuse anything.

I was actually referring to someone using cp + rm + {svn|hg|whatever} to 
commit a rename in which case the tool won't know.  And I'm sure that 
happens more than 1% of the time.

> If you
> are talking about adding a new tool that can manually tag a rename after the
> fact, then I can't disagree with you except to say that the fact that no such
> command exists today means any estimate of user error rate is pure
> speculation.

Sure.  

> Aside from that, the possibility of user error is an entirely different thing
> than the possibility of tool error -- if I misidentify a rename, I will blame
> myself, not the version control system, and rightly so. People are expected to
> make mistakes from time to time. But if my version control tool misidentifies
> a rename on my behalf, and there's nothing I can do about it because there's
> no way to influence the tool's concept of what got renamed to what, then I'm
> not going to consider it a failure of the tool, not a mistake on my part.

But GIT's rename heuristics can be modified/improved, and all renames 
that were wrongly identified before will suddenly be fixed.

If the rename is part of the recorded history then there is nothing you 
can do to fix mistakes, be it human or tool based.

> > So maybe, just maybe, at the end of the day getting renames right 100% of
> > the time instead of 99% is not such a big thing after all.
> >   
> 
> For me personally, that is true -- but I'd still prefer that extra 1%.

I'm sure the human screwup rate is at _least_ in the 1% range.  So even if
you think you should know better, being a human you'll make a mistake 
eventually so you won't have that 100% anyway.


Nicolas


* Re: Rename handling
  2007-03-19 19:57       ` Steven Grimm
@ 2007-03-19 20:19         ` Martin Langhoff
  2007-03-20  8:33           ` Junio C Hamano
  2007-03-19 20:22         ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Langhoff @ 2007-03-19 20:19 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Linus Torvalds, Nicolas Pitre, John Goerzen, git

On 3/20/07, Steven Grimm <koreth@midwinter.com> wrote:
> However, "Should we keep the existing rename detection?" is not the same
> question as, "Should the user be able to tell the system he's renaming
> something?"

How is a $SCM-mv that remembers useful in any _interesting_ scenario?
You don't move it just because, you refactor and re-architect
something.

In that area, git's mergers still have a bit to go -- I do hope for a
day when I can say git-merge -s refactor or just git-merge -s
tryharder so that if hunks don't apply, git will try and trace
where the block of code the hunk applies to has gone.

So recording mv doesn't solve anything other than 1% of the cases --
those full file moves that git will discover anyway even if the file
changed a bit. And recording the mv gives you a false sense of being
useful. It's not.

There's more work in having
go-slow-and-really-try-to-merge-across-a-refactor mergers that could
at least hint at where the hunk is likely to belong now.

> Until someone comes up with a way to make content-based rename detection
> 100% foolproof in the face of things like frequent self-references

Well, if code changes, there are no guarantees. Patching is a
best-effort-but-not-too-smart thing. But in the end, a human needs to
look at it if it's tricky. Wiggle users know ;-)

And, at the end of the day, hitting programmers that move sh*t around
needlessly in the head works too. You wouldn't let them change
projectwide function/method name conventions willy nilly either.

cheers,


martin


* Re: Rename handling
  2007-03-19 19:57       ` Steven Grimm
  2007-03-19 20:19         ` Martin Langhoff
@ 2007-03-19 20:22         ` Linus Torvalds
  1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-19 20:22 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Nicolas Pitre, John Goerzen, git



On Mon, 19 Mar 2007, Steven Grimm wrote:
> 
> Until someone comes up with a way to make content-based rename detection 100%
> foolproof in the face of things like frequent self-references in Java or C++
> classes, it may be a necessary evil (or at least a worthwhile one.)

Try to implement it. Trust me, it will suck so badly that you'll realize 
that I am right.

File identities have *huge* problems. They are a total nightmare to 
implement well, and you'd lose a lot of the really good features that git 
has. But I suspect it's not worth trying to convince you. You really need 
to just try to do it.

I can name some of the problems:

 - you get a whole new source of really fundamentally nasty merge 
   conflicts that are really fundamentally hard to handle.

   I have first-hand experience, and I realize that lots of people simply 
   don't even *understand* the problems. But take it from me - having two 
   different repositories create the same file independently (which is not 
   at all uncommon - patches flying around make it really easy) - is the 
   tip of a very nasty iceberg.

   Git makes merges easy in comparison. Trying to track file ID's is a 
   huge gaping hole, leading to hell. I doubt you'll find an SCM that does 
   it, and does it well.

   (Side note: in a *centralized* setup, with no branches, this isn't a 
   problem. But with branches it already becomes a nastier issue, and when 
   decentralized you simply cannot avoid it at all).

   End result: merges suck. Unless you're smart, and use git.

 - The end result depends on the path you took to get there. You're going 
   to have a really hard time re-creating things exactly, and doing so 
   from patches (which is *the* most common way actual real development 
   gets done) is basically impossible. So you end up with the meaning 
   getting lost along the way (or being added), and now where is your 
   "trustworthy" file movement logic?

I've seen both problems with BK. Bitkeeper had a nice patch application 
graphical toolkit to help some of these issues. It was really nice, but it 
was still *painful*. Not having it would be a total disaster. Except if 
you simply don't need it - like git.

Merging in particular is something that git just DOES BETTER than 
anybody else. The lack of explicit file ID information means that you can 
just fix up the conflicts in the working tree using perfectly normal 
tools, and never have to do anything special with special "merge tools". 

You may still want to use nice graphical tools to help you, of course, but 
they aren't even git-specific any more. See the whole discussion about 
visual merge tools recently, and "git mergetool", and using a random tool 
to do the merge. Realize that IF FILE IDENTITIES MATTER, none of this 
works. Suddenly, you need to have special tools that resolve the file 
identity problem (never mind the fact that quite often there *is* no 
ideal resolution and "resolving" it ends up being "pick one or the other, 
and live with it forever").

In other words, you live in a dream-world where you know you'd like to 
tell the SCM what the rename is, but at the same time you don't really 
realize what the problems with that are downstream. You think it's a great 
thing, because you've not thought the consequences through (which is 
understandable - they are subtle, and the main reason I know them is (a) 
I'm just special. My mother told me so! and (b) I've used SCM's that do 
it about as well as you can, and I've hit the problems).

		Linus


* Re: Rename handling
  2007-03-19 20:02       ` Robin Rosenberg
@ 2007-03-19 20:34         ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-19 20:34 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Nicolas Pitre, Steven Grimm, John Goerzen, git



On Mon, 19 Mar 2007, Robin Rosenberg wrote:
> 
> How about this simple recipe for defeating rename tracking (real world):
> 
> User needs to modify A. User renames A to OLD_A within his/her IDE. SCM
> records the rename. User now uses SaveAs to restore A, and SCM detects the 
> *NEW* file A.

Well, the thing is, I don't think that's a very strong argument against 
rename tracking.

You can always make trivial examples of when something goes wrong. 
Computers (and SCM's) are stupid, and they simply do what they are told. 
Just about *any* policy can be trivially shown to be "totally broken" by 
having a user do something - usually something very simple - that breaks 
it on purpose.

Similarly, I don't think it's hard to show examples of where git's 
"content is king" does something that the user thinks could be done much 
better. And similarly, I don't think that's an argument against the 
content model that git uses.

No, the reason I like the content model is that there is never any hidden 
state that doesn't matter for the user. If you're a physicist, you could 
say that there is never any "action at a distance" with git. There are no 
hidden linkages that aren't obvious in the source tree that you commit or 
work on.

In contrast, the very *definition* of "explicit rename tracking" is to 
track those hidden linkages. They aren't visible to the developer, except 
when they clash, and that very invisibility is what makes both mistakes so 
easy (anybody who claims that they never do a rename as a del/add pair is 
simply lying or not doing very interesting development) *and* it is what 
makes handling merges so hard (because when there is a conflict, the 
conflict isn't actually in anything that is *visible*!).

I also pretty much guarantee that the reason git development has been so 
fast, stable and trouble-free comes exactly from the simple conceptual 
mindset and very concrete implementation. There simply are never any 
subtle issues. Content is content is content. It has no "history". Yes, 
git shows history, but it's literally a "series of snapshots", and the 
trees that are checked out are totally history-less. 

In contrast, if you have file rename tracking, then those trees are no 
longer stateless - they have an implied history associated with them, that 
matters. It's largely *invisible*, but that just makes it worse!

			Linus


* Re: Rename handling
  2007-03-19 19:36     ` Steven Grimm
  2007-03-19 19:45       ` Steven Grimm
  2007-03-19 20:17       ` Nicolas Pitre
@ 2007-03-19 20:44       ` Daniel Barkalow
  2 siblings, 0 replies; 31+ messages in thread
From: Daniel Barkalow @ 2007-03-19 20:44 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Nicolas Pitre, John Goerzen, git

On Mon, 19 Mar 2007, Steven Grimm wrote:

> Nicolas Pitre wrote:
> > So maybe, just maybe, at the end of the day getting renames right 100% of
> > the time instead of 99% is not such a big thing after all.
> 
> For me personally, that is true -- but I'd still prefer that extra 1%.

I think the discussion of 99% is misleading here. The heuristics aren't 
random; it's not like if you do 2000 renames, you can expect 20 of them to 
be mishandled. What's actually going on is that git will get 100% on 
unambiguous cases; it'll get 100% on slight ambiguities; it'll get 100% on 
mostly clear cases. On the ~2% of cases where the correct result is 
arguable, git will choose differently from you half of the time. If you do 
a rename and have to change most of the lines of the file, git might 
decide that you rewrote it from scratch. On the other hand, you might have 
had an easier time rewriting it from scratch. Even more extreme, if you 
use git-mv to rename a file, and then you totally replace the file with 
some other content, git will treat it as a remove and an add, rather than 
a rename and a total rewrite. But making it a remove and an add is the 
sensible interpretation of the change, anyway.

I'd actually guess that git's analysis is at least as likely to be useful 
as the reference human analysis that the 1% error rate is measured 
against.
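
To make the threshold idea concrete, here is a minimal sketch in Python.
It is only an illustration: difflib's line-based ratio stands in for
git's similarity score (git's real algorithm is different), and the
function names and the 0.5 cutoff are assumptions made up for this
example.

```python
import difflib

def similarity(old_text, new_text):
    """Fraction of lines the two versions share, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines(), autojunk=False
    )
    return matcher.ratio()

def classify_change(old_text, new_text, threshold=0.5):
    """Hypothetical rule: enough surviving content means 'rename'."""
    if similarity(old_text, new_text) >= threshold:
        return "rename"
    return "delete+add"

original = "\n".join("line %d" % i for i in range(10))
tweaked = original.replace("line 3", "line three")        # one line edited
rewritten = "\n".join("other %d" % i for i in range(10))  # nothing in common

print(classify_change(original, tweaked))
print(classify_change(original, rewritten))
```

The point is that the outcome is a deterministic function of the two
contents: a lightly edited file always lands on the "rename" side, and
only changes near the cutoff are open to argument.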

	-Daniel
*This .sig left intentionally blank*


* Re: Rename handling
  2007-03-19 19:49   ` John Goerzen
@ 2007-03-19 22:27     ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2007-03-19 22:27 UTC (permalink / raw)
  To: John Goerzen; +Cc: git

John Goerzen <jgoerzen@complete.org> writes:

> But if most of my work is hacking directly on the code, I am going to
> know better than the VCS what is being renamed, and would like to record
> that.  Sometimes the filenames are part of the code.
>
> I want the option.

This depends on what you want to do with that recorded rename.

Let's say you renamed foo.py to bar.py, but the content changed
so much while you renamed the file that it is no longer similar
enough for the default similarity threshold to consider them a
renamed pair.

Then you run:

	$ git log bar.py

and against your expectation, it says "the history of that name
ends here --- you created this file at this point from zero".

Now, what would you do at this point, of course after an
obligatory "f*ck, stupid tool" cussing?

You would do "$ git show -M" and you would not see the rename,
because we are assuming that the similarity is smaller than the
default threshold.

But it's your project and you know better than the tool.  You
know that bar.py used to be called foo.py.  So you can do:

	$ git diff $commit^:foo.py $commit:bar.py

if you cared what huge change you did while renaming, and then
continue digging from that point, perhaps with:

	$ git log $commit^ -- foo.py

Earlier, I said "you know better than the tool", but that is a
slight lie.  You probably knew better than the tool back when
you made that commit, but you cannot be better than any tool to
remember that 10 years (or 6 months) after making that commit.

That's why you need to record somewhere you renamed foo.py to
bar.py in this commit.

We happen to have a perfect place to record such a rename and it
is called the commit log message.  If a rename matters *so* much
to your project, you not only would want to record the fact that
you renamed it, but you would want to record *WHY*, and the
commit log message is the perfect place to do both.

A single path being renamed is a special case of content
movement across file boundaries.  Pairwise diff that "git log
-p" gives cannot express one file being split into two or two
files merged into one, and that is independent of explicitly
recorded rename or inferred one.  We already have a better
solution for that general problem and it is called git-blame.

By the way, blame is not perfect.  One thing I sometimes find
lacking from it is that it only shows the final assignment of
the blame.  Often, a development goes like (1) code evolves in
two or more different files, (2) at one point somebody goes in
and cleans things up to move the duplicated code from these
files and consolidate it into one file, (3) and the refactored
result gets polished further, changing drastically.  Many
times, step (2), especially when done by a competent person, is
done carefully so as not to change the logic of the code and to
avoid regressions, and lines from that revision do not appear in
the end result of the blame.

This is good when the reason why you are reading the blame
output is to figure out *why* a block of lines in the current
code is written that way (you do not care about the
restructuring history, but care more about the origin of the
code and the reasoning behind why it is that way), but when
doing archaeology sometimes I wish what happened in step (2) can
also be shown a bit more prominently in the final output.


* Re: Rename handling
  2007-03-19 20:19         ` Martin Langhoff
@ 2007-03-20  8:33           ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2007-03-20  8:33 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Steven Grimm, Linus Torvalds, Nicolas Pitre, John Goerzen, git

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> In that area, git's mergers still have a bit to go -- I do hope for a
> day when I can say git-merge -s refactor or just git-merge -s
> tryharder so that it if hunks don't apply, git will try and trace
> where the block of code the hunk applies to has gone.

Let's say Oliver creates this project with a single file,
frotz.c.

-- >8 -- Oliver:frotz.c -- >8 --
 1 struct frotz {
 2     int nitfol;
 3     int xyzzy;
 4 };
 5
 6 int add_frotz(struct frotz *it)
 7 {
 8     return it->nitfol + it->xyzzy;
 9 }
-- 8< -- Oliver:frotz.c -- 8< --

Then Alice and Bob fork this project, and make their own
modifications.

Alice does this:

-- >8 -- Alice:frotz.c -- >8 --
 1 struct frotz {
 2     int filfre;
 3     int nitfol;
 4     int xyzzy;
 5 };
 6
 7 int add_frotz(struct frotz *it)
 8 {
 9     return it->filfre + it->nitfol + it->xyzzy;
10 }
-- 8< -- Alice:frotz.c -- 8< --

while Bob does this:

-- >8 -- Bob:frotz.h -- >8 --
 1 #ifndef FROTZ_H
 2 #define FROTZ_H
 3
 4 struct frotz {
 5     int nitfol;
 6     int xyzzy;
 7 };
 8 #endif /* FROTZ_H */
-- 8< -- Bob:frotz.h -- 8< --
-- >8 -- Bob:frotz.c -- >8 --
 1 #include <frotz.h>
 2
 3 int add_frotz(struct frotz *it)
 4 {
 5     return it->nitfol + it->xyzzy;
 6 }
-- 8< -- Bob:frotz.c -- 8< --

Now, Alice wants to merge the results of their efforts.

The "perfect merge strategy" (git-merge-blame) should be able to
detect the situation and ask Alice:

	On Bob's branch, the original file "frotz.c" was split
	into "frotz.h" and "frotz.c", while on your branch, the
	file was not split.  Both branches changed the file(s).

	Do you want to take the split [Y/n]?

We can do this by blaming from Alice:frotz.c, Bob:frotz.c and
Bob:frotz.h down to Oliver (which is the common ancestor
commit).

Let's say Alice likes the split and said yes.  Then, from the
"git-blame -C Bob -- frotz.h" output, git-merge-blame should be
able to figure out that this is what happened:

    B:1 +#ifndef FROTZ_H
    B:2 +#define FROTZ_H
    B:3 +
O:1 B:4  struct frotz {
O:2 B:5     int nitfol;
O:3 B:6     int xyzzy;
O:4 B:7  };
    B:8 +#endif /* FROTZ_H */

Similarly for "git-blame -C Bob -- frotz.c":

    B:1 +#include <frotz.h>
O:5 B:2  
O:6 B:3  int add_frotz(struct frotz *it)
O:7 B:4  {
O:8 B:5      return it->nitfol + it->xyzzy;
O:9 B:6  }

and for "git-blame -C Alice -- frotz.c",

O:1 A:1  struct frotz {
    A:2 +    int filfre;
O:2 A:3      int nitfol;
O:3 A:4      int xyzzy;
O:4 A:5  };
O:5 A:6 
O:6 A:7  int add_frotz(struct frotz *it)
O:7 A:8  {
O:8     -    return it->nitfol + it->xyzzy;
    A:9 +    return it->filfre + it->nitfol + it->xyzzy;
O:9 A:10 }

Looking at this, and because Alice chose to take the file split
Bob made, the merge strategy can insert line A:2 between B:4 and
B:5 of frotz.h to add the filfre member to "struct frotz", and
update B:5 of frotz.c with A:9.
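
The per-line origin tables above can be approximated with a short
Python sketch.  This is not git-blame and not an existing merge
strategy: it uses difflib to line up a derived file against the common
ancestor, and the function name origin_of_lines is made up for this
illustration.

```python
import difflib

def origin_of_lines(base_lines, derived_lines):
    """For each derived line, the 1-based base line it came from, or None."""
    origins = [None] * len(derived_lines)
    matcher = difflib.SequenceMatcher(None, base_lines, derived_lines,
                                      autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            origins[block.b + k] = block.a + k + 1
    return origins

OLIVER = [
    "struct frotz {",
    "    int nitfol;",
    "    int xyzzy;",
    "};",
    "",
    "int add_frotz(struct frotz *it)",
    "{",
    "    return it->nitfol + it->xyzzy;",
    "}",
]
ALICE = [
    "struct frotz {",
    "    int filfre;",
    "    int nitfol;",
    "    int xyzzy;",
    "};",
    "",
    "int add_frotz(struct frotz *it)",
    "{",
    "    return it->filfre + it->nitfol + it->xyzzy;",
    "}",
]

# Lines with no origin (None) are the ones Alice added or rewrote.
for number, origin in enumerate(origin_of_lines(OLIVER, ALICE), start=1):
    print("O:%s A:%d" % (origin if origin is not None else "-", number))
```

Run against Bob's frotz.h and frotz.c in the same way, this gives the
three tables a real strategy would need to combine; the hard part (not
shown) is deciding where each unmatched line belongs after the split.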


* Re: Rename handling
  2007-03-19 19:21     ` Steven Grimm
@ 2007-03-21  0:06       ` Jakub Narebski
  2007-03-21  0:25         ` Johannes Schindelin
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Narebski @ 2007-03-21  0:06 UTC (permalink / raw)
  To: git

Steven Grimm wrote:

> The following is actually my biggest beef with git's rename tracking, 
> and it has nothing whatsoever to do with git-log (though I agree git-log 
> needs to track renames too):
> 
> $ ls
> dir1
> $ ls dir1
> file1 file2 file3
> $ echo "#include file1" > dir1/file4
> $ git add dir1/file4
> $ git commit
> $ git pull
> $ ls
> dir1 dir2
> $ ls dir1
> file4
> $ ls dir2
> file1 file2 file3
> 
> That's just plain broken in my opinion. One can perhaps contrive a test 
> case or two where that's the desired behavior, but in the real world it 
> is almost never what you actually want.
> 
> By the way, I don't think fixing that is necessarily related to how 
> renames get detected, so in some sense it's a different bug report / 
> feature request than the rename hints one. It would be possible to 
> figure out the directory had been renamed based purely on content 
> analysis; a bunch of files all individually renamed to the same places 
> under a new directory, and a lack of any files at all left in the old 
> one, probably means the directory got renamed. The content-based rename 
> detector could handle this case.
 
Actually this came up in some earlier talk about recording renames 
and rename detection (IIRC in the big thread about VCS comparison which
used to be on the Bazaar-NG wiki). And one of the arguments for why a
directory rename doesn't work as _you_ expected is (besides that it is
not that easy to code) the fact that the alternate solution (new files
going to the old subdir) can be correct and expected by _others_.
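
For reference, the content-based inference Steven describes could be
sketched like this in Python.  It is only an illustration of the
heuristic, not anything git implements here: the function names and the
data layout (a list of detected file renames plus the paths that still
exist afterwards) are assumptions.

```python
import posixpath
from collections import defaultdict

def infer_directory_renames(renames, remaining_paths):
    """Guess directory renames from per-file renames.

    renames: list of (old_path, new_path) pairs already detected.
    remaining_paths: paths still present on the renaming side afterwards.
    A directory counts as renamed when every file moved out of it went to
    one and the same new directory and nothing is left behind in it.
    """
    targets = defaultdict(set)
    for old, new in renames:
        if posixpath.basename(old) == posixpath.basename(new):
            targets[posixpath.dirname(old)].add(posixpath.dirname(new))
    still_used = {posixpath.dirname(p) for p in remaining_paths}
    return {
        old_dir: next(iter(new_dirs))
        for old_dir, new_dirs in targets.items()
        if len(new_dirs) == 1 and old_dir not in still_used
    }

def relocate(path, dir_renames):
    """Where a newly added file would go if the directory rename is honoured."""
    directory, name = posixpath.split(path)
    return posixpath.join(dir_renames.get(directory, directory), name)

upstream = [("dir1/file%d" % i, "dir2/file%d" % i) for i in (1, 2, 3)]
dir_renames = infer_directory_renames(upstream, remaining_paths=[])
print(dir_renames)
print(relocate("dir1/file4", dir_renames))
```

Whether relocate() should be applied at all is exactly the policy
question: the code can detect the pattern, but it cannot know which of
the two outcomes the user wanted.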

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: Rename handling
  2007-03-19 16:10 Rename handling John Goerzen
  2007-03-19 18:14 ` Steven Grimm
@ 2007-03-21  0:21 ` Jakub Narebski
  1 sibling, 0 replies; 31+ messages in thread
From: Jakub Narebski @ 2007-03-21  0:21 UTC (permalink / raw)
  To: git

John Goerzen wrote:

> I've read the FAQ and Linus' philosophy on this topic, and have some
> questions still.  I'm considering using Git and its philosophy on
> renames is troubling me.
> 
> My use for version control presently has most changes being written and
> committed by me directly, with occasional patches coming in from random
> others.  As such, running something like 'git mv' when a rename occurs
> is not a problem.
> 
> My main concerns with Git are:
> 
> 1) git log does not show complete history of files that have been
>    renamed or copied.
> 
>    If I have foo.txt, and rename it to bar.txt, the liberal use of -M
>    can tease out a proper patch from a number of places.  But
>    git log bar.txt, with any set of options I can possibly come up with,
>    absolutely refuses to show me the history of bar.txt before it was
>    renamed to bar.txt.  git log foo.txt also does not show me the old
>    history for the file.

That is because "bar.txt" in "git log bar.txt" is not the name of a file
to show the history of, but a path limiter (BTW, it is not an output
filter, as it also simplifies history). And you can say for example
"git log Documentation/", which I guess is not available in any SCM
besides Git.

There were at least two attempts to provide a kind of --follow=<filename>
to the git-log family of commands, to track/show history of a given file
across renames. See "Why does git not track renames?" entry in GitFaq
(http://git.or.cz/gitwiki/GitFaq) for some history; lately Linus has
sent a prototype of a blame-engine-based implementation of the --follow
option.

> 2) For me, a rename is a logical change to the source tree that I want
>    to be recorded with absolute certainty, not guessed about later.
>    Sometimes I may make API changes and it is useful to see how module
>    names changed, with complete precision, later.  I do not want to be
>    victim to an incorrect guess, which could be possible.

This goes very much against the Git philosophy of "tracking contents".
There was talk of allowing the recording of some optional _helper_
information about file renames, in the proposed 'note' field (header)
in commits, but it never materialized.

You can always say that there was a rename (or that a file was split into
two, or that a file was refactored) in the commit message.

Besides, the place where you want rename detection to work is during
a merge. I don't know what would happen if the merge base is so far back
that git doesn't recognize the rename; on the other hand, you get a huge
conflict to resolve anyway.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: Rename handling
  2007-03-21  0:06       ` Jakub Narebski
@ 2007-03-21  0:25         ` Johannes Schindelin
  2007-03-21 22:28           ` Steven Grimm
  0 siblings, 1 reply; 31+ messages in thread
From: Johannes Schindelin @ 2007-03-21  0:25 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Wed, 21 Mar 2007, Jakub Narebski wrote:

> Steven Grimm wrote:
> 
> > The following is actually my biggest beef with git's rename tracking, 
> > and it has nothing whatsoever to do with git-log (though I agree git-log 
> > needs to track renames too):
> > 
> > $ ls
> > dir1
> > $ ls dir1
> > file1 file2 file3
> > $ echo "#include file1" > dir1/file4
> > $ git add dir1/file4
> > $ git commit
> > $ git pull
> > $ ls
> > dir1 dir2
> > $ ls dir1
> > file4
> > $ ls dir2
> > file1 file2 file3
> > 
> > That's just plain broken in my opinion. One can perhaps contrive a test 
> > case or two where that's the desired behavior, but in the real world it 
> > is almost never what you actually want.
> > 
> > By the way, I don't think fixing that is necessarily related to how 
> > renames get detected, so in some sense it's a different bug report / 
> > feature request than the rename hints one. It would be possible to 
> > figure out the directory had been renamed based purely on content 
> > analysis; a bunch of files all individually renamed to the same places 
> > under a new directory, and a lack of any files at all left in the old 
> > one, probably means the directory got renamed. The content-based rename 
> > detector could handle this case.
>  
> Actually this came out in some earlier talk about recording renames 
> and renames detection (IIRC in the big thread about VCS comparison which
> used to be on Bazaar-NG wiki). And one of the arguments about why directory
> rename doesn't work as _you_ expected is (beside that it is not that
> easy to code) the fact that the alternate solution (new files going
> to old subdir) can be correct and expected by _others_.

By now, there have been enough arguments _for_ automatic rename detection, 
but I'll add another one.

A colleague of mine worked on a certain file in a branch, where he copied 
the file to another location, and heavily modified it. He did that in a 
branch, and when he was satisfied with the result, he deleted the old 
file, since he liked the new location better.

Now, when I pulled, imagine my surprise (knowing the history of the file), 
when the pull reported a rename with a substantial similarity!

So, the automatic renamer did an awesome job.

Ciao,
Dscho

P.S.: It would be so nice if somebody (preferably someone who previously 
thought manual renames were a pretty clever thing) would write up the 
arguments, and add that to the "why automatic renaming?" section of the 
FAQ...
 


* Re: Rename handling
  2007-03-21  0:25         ` Johannes Schindelin
@ 2007-03-21 22:28           ` Steven Grimm
  2007-03-21 23:01             ` Johannes Schindelin
  2007-03-22  0:10             ` Martin Langhoff
  0 siblings, 2 replies; 31+ messages in thread
From: Steven Grimm @ 2007-03-21 22:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git

Johannes Schindelin wrote:
> P.S.: It would be so nice if somebody (preferably someone who previously 
> thought manual renames were a pretty clever thing) would write up the 
> arguments, and add that to the "why automatic renaming?" section of the 
> FAQ...
>   

I completely understand the arguments in favor of automatic renaming. I 
have never once advocated getting rid of it. It is useful and valuable 
and works well within its constraints.

For some reason whenever I try to argue that we need, IN ADDITION to the 
automatic rename detection, a way to provide hints to the merge engine 
that a non-auto-detectable rename has occurred, the responses I get back 
are mostly of the form, "But the automatic rename detection handles all 
these cases that wouldn't be handled with manual rename marking!" It's 
as if one can either think autodetection is a good idea, or manual 
flagging is a good idea, but under no circumstances could they both be 
good ideas at the same time. (As evidenced by the comment above about 
"someone who previously thought manual renames were a pretty clever 
thing.") But they are not in fact mutually exclusive.

Say you're tracking a directory full of video files. Even a slight tweak 
to one of them (to put a logo in the corner, say, while moving it into 
an "accessible by the public" directory) will result in a file that has 
no content in common at all if you look at it as purely a stream of 
bytes. Short of decoding the thing to video frames and looking for 
similarities in the images, there's no way any merge tool will ever be 
able to tell the two versions are the same file unless the user 
indicates it. Any tool that saves its files in compressed form will have 
a similar problem: unless git knows how to uncompress the tool's files, 
a content comparison will often be useless to detect similarities.
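
A small Python experiment illustrates the compressed-file case.  It is a
sketch only: zlib stands in for whatever format a tool might use,
difflib's byte-level ratio stands in for a similarity score (it is not
git's algorithm), and the sample data is made up.

```python
import difflib
import zlib

def byte_similarity(a, b):
    """Byte-level similarity between two blobs, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

# Two versions of the same "document": the new one has one line prepended.
raw_old = "\n".join(
    "frame %03d: pixel row %d" % (i, (i * 7919) % 1000) for i in range(200)
).encode("ascii")
raw_new = b"logo overlay\n" + raw_old

# What the version control system would see if the tool saved compressed files.
packed_old = zlib.compress(raw_old)
packed_new = zlib.compress(raw_new)

raw_score = byte_similarity(raw_old, raw_new)
packed_score = byte_similarity(packed_old, packed_new)
print("uncompressed similarity: %.3f" % raw_score)
print("compressed similarity:   %.3f" % packed_score)
```

The uncompressed versions are nearly identical, while the compressed
ones usually score far lower, because a small change near the start
alters the encoding of everything after it.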

Of course, git actually does give you a way to mark renames manually: 
commit them by themselves without changing the content. The problem is 
that that overloads the "record this snapshot of the tree for posterity" 
command purely for the purpose of working around the merge tool's 
inability to detect the rename. If other people are like me, when they 
record a rename-only commit immediately followed by a content-change 
commit on the same files, the intermediate state of the tree (with just 
the renames) is not actually an interesting point in the history of the 
project. It's not a revision in anything but an internal git sense. It 
probably doesn't even compile or work correctly. It exists only because 
I'm forced to create it if I want to be 100% certain my renames will be 
tracked accurately. It is, in short, pollution in the history of my project.

It also means that if I want reliable renames, I can no longer impose 
the requirement that my project be in a buildable state at each commit. 
That doesn't seem like all that unreasonable a thing to want (but maybe 
it is?) -- I don't want to be in the situation where I say, e.g. "git 
checkout -b testbranch '@{1 day ago}'" and get a broken working copy 
because I happened to do it at just the wrong time of day. But with the 
"just commit your renames separately" approach, that's exactly what can 
happen.

Now, once again, none of the above is an argument against the automatic 
rename detection. For cases where renames are automatically detectable, 
it works fine and will continue to do so, and in fact doesn't have the 
problem of committing broken builds. I am not arguing it should be 
replaced or that the user should be required to tell git about every 
rename. But the lack of an additional manual option forces me into a 
particular workflow that I wouldn't otherwise use and prevents me from 
imposing the workflow rules I *do* want.

Hopefully that shed a little light on why I think manual rename support 
is not a totally idiotic idea.

-Steve


* Re: Rename handling
  2007-03-21 22:28           ` Steven Grimm
@ 2007-03-21 23:01             ` Johannes Schindelin
  2007-03-21 23:10               ` Linus Torvalds
  2007-03-22  0:10             ` Martin Langhoff
  1 sibling, 1 reply; 31+ messages in thread
From: Johannes Schindelin @ 2007-03-21 23:01 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Jakub Narebski, git

Hi,

On Wed, 21 Mar 2007, Steven Grimm wrote:

> Johannes Schindelin wrote:
> > P.S.: It would be so nice if somebody (preferably someone who 
> > previously thought manual renames were a pretty clever thing) would 
> > write up the arguments, and add that to the "why automatic renaming?" 
> > section of the FAQ...
> 
> For some reason whenever I try to argue that we need, IN ADDITION to the 
> automatic rename detection, a way to provide hints to the merge engine 
> that a non-auto-detectable rename has occurred, the responses I get back 
> are mostly of the form, "But the automatic rename detection handles all 
> these cases that wouldn't be handled with manual rename marking!"

No. That's not the argument. The argument goes like this:

	Whatever solution you choose to handle renames, it _will_ have 
	problems. We chose a solution which appears to have the fewest
	problems.

> Say you're tracking a directory full of video files. Even a slight tweak 
> to one of them (to put a logo in the corner, say, while moving it into 
> an "accessible by the public" directory) will result in a file that has 
> no content in common at all if you look at it as purely a stream of 
> bytes. Short of decoding the thing to video frames and looking for 
> similarities in the images, there's no way any merge tool will ever be 
> able to tell the two versions are the same file unless the user 
> indicates it.

That is a particularly bad example: you are not renaming files in that 
example!

> Of course, git actually does give you a way to mark renames manually: 
> commit them by themselves without changing the content. The problem is 
> that that overloads the "record this snapshot of the tree for posterity" 
> command purely for the purpose of working around the merge tool's 
> inability to detect the rename.

Not at all. You are actually recording the rename. So, you proved that you 
do have a method to record a rename manually.

> It also means that if I want reliable renames, I can no longer impose 
> the requirement that my project be in a buildable state at each commit. 
> That doesn't seem like all that unreasonable a thing to want (but maybe 
> it is?) -- I don't want to be in the situation where I say, e.g. "git 
> checkout -b testbranch '@{1 day ago}'" and get a broken working copy 
> because I happened to do it at just the wrong time of day. But with the 
> "just commit your renames separately" approach, that's exactly what can 
> happen.

Hey, I might be wrong. Why don't you prove me wrong? Code talks.

Ciao,
Dscho


* Re: Rename handling
  2007-03-21 23:01             ` Johannes Schindelin
@ 2007-03-21 23:10               ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-21 23:10 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Steven Grimm, Jakub Narebski, git



On Thu, 22 Mar 2007, Johannes Schindelin wrote:
> 
> That is a particularly bad example: you are not renaming files in that 
> example!

Well, yes and no.

I would actually say that it is a particularly *good* example.

With git, you can actually record renames exactly this way: you just need 
to make sure that you don't change the content, and you make it two 
independent commits.

That is in fact how some systems that support "explicit renames" actually 
do it: the rename is literally a separate option, and cannot necessarily 
go together with other actions (in particular, several file-ID-following 
systems do not allow "cross-renames" in the same commit, for example, and 
you actually have to do them as two separate commits).

Git *allows* you to do renames with changes. In fact, I'd normally 
encourage it. But it doesn't force it, and then renames are totally 
unambiguous except for the case where you simply have the *same*file* in 
multiple places, and you remove or add multiple copies (again, you can do 
that unambiguously too, if you limit it to *one* such rename per commit)
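
As a rough illustration of why content-preserving renames are
unambiguous, here is a Python sketch that pairs deleted and added paths
by object name.  The blob hashing follows git's "blob <size>\0<content>"
SHA-1 convention; everything else (trees as plain dicts, the function
names) is an assumption for the example and not how git's diff
machinery is structured.

```python
import hashlib
from collections import defaultdict

def blob_id(content):
    """Object name git gives a blob with this content (SHA-1)."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

def exact_renames(old_tree, new_tree):
    """Pair deleted and added paths whose content is byte-identical.

    Trees are dicts mapping path -> bytes.  Returns (renames, ambiguous):
    a pairing is unambiguous only when exactly one deleted path and one
    added path share the same blob id.
    """
    deleted, added = defaultdict(list), defaultdict(list)
    for path, content in old_tree.items():
        if path not in new_tree:
            deleted[blob_id(content)].append(path)
    for path, content in new_tree.items():
        if path not in old_tree:
            added[blob_id(content)].append(path)
    renames, ambiguous = [], []
    for oid, old_paths in deleted.items():
        new_paths = added.get(oid, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            renames.append((old_paths[0], new_paths[0]))
        elif new_paths:
            ambiguous.append((sorted(old_paths), sorted(new_paths)))
    return renames, ambiguous

before = {"a.c": b"int a;\n", "b.c": b"int b;\n"}
after = {"src/a.c": b"int a;\n", "b.c": b"int b;\n"}
print(exact_renames(before, after))
```

With identical copies of one file deleted and added in several places at
once, the same function reports the group as ambiguous, which is the
one exception mentioned above.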

		Linus


* Re: Rename handling
  2007-03-21 22:28           ` Steven Grimm
  2007-03-21 23:01             ` Johannes Schindelin
@ 2007-03-22  0:10             ` Martin Langhoff
  2007-03-22  2:01               ` Jakub Narebski
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Langhoff @ 2007-03-22  0:10 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Johannes Schindelin, Jakub Narebski, git

On 3/22/07, Steven Grimm <koreth@midwinter.com> wrote:
> Say you're tracking a directory full of video files. Even a slight tweak
> to one of them (to put a logo in the corner, say, while moving it into
> an "accessible by the public" directory) will result in a file that has
> no content in common at all if you look at it as purely a stream of

In that case, tracking the rename is not useful at all from the POV of
your SCM. The reason the SCM needs to understand content-movement (of
which renames are a special type) is to help you as much as possible
at merge time.

So - git as an SCM focusses on tracking your content, and helping you
merge. It does _that_ probably better than any other SCM. So git's
internal data structures care strictly about the stuff that is needed
for git's operation as an SCM.

And in the context of helping you merge, explicit rename tracking is a
red herring. This point is arguable - Linus said earlier "you can do
better by tracking content and ignoring explicit renames" and we are
now getting there in terms of having code that does better.

Of course in your case the fact that there was a rename is important
-- for users. This kind of information is not metadata for the SCM but
for users. So that goes into the commit message, which is freeform. So
- working with your scenario, if this happens often, I would suggest
having a pre-commit hook that prepares a nice commit text message
listing likely renames if they can be sussed out automatically.

Or having a custom git-mv that collects mv operations and then your
pre-commit-hook preps your commit message with that manifest of moved
files.
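
A rough sketch of the hook idea, with several assumptions: it uses the
prepare-commit-msg hook (the hook that current Git provides for editing
the message before it is used), the repository and file names are
hypothetical, and the "Renames:" manifest format is invented here for
illustration:

```shell
#!/bin/sh
set -e

# Throwaway repository; names and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'alpha\nbeta\ngamma\n' > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Install a hypothetical hook that appends a manifest of staged renames
# to the commit message file, whose path Git passes as $1.
mkdir -p .git/hooks
cat > .git/hooks/prepare-commit-msg <<'HOOK'
#!/bin/sh
renames=$(git diff --cached -M --name-status --diff-filter=R |
          awk -F '\t' '{ print "  " $2 " -> " $3 }')
if [ -n "$renames" ]; then
    printf '\nRenames:\n%s\n' "$renames" >> "$1"
fi
HOOK
chmod +x .git/hooks/prepare-commit-msg

# A rename staged with git mv is picked up by the hook at commit time.
git mv foo.txt bar.txt
git commit -q -m "Move foo to bar"

commit_message=$(git log -1 --format=%B)
echo "$commit_message"
```

The manifest is plain text for human readers; nothing in Git's merge
machinery reads it back, which matches the data-for-the-user split.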

Does it make sense? It is data-for-the-user, so it goes in the commit
msg. If it's data-for-the-SCM machinery, then it goes into the
tracking data git handles internally.

cheers,


martin


* Re: Rename handling
  2007-03-22  0:10             ` Martin Langhoff
@ 2007-03-22  2:01               ` Jakub Narebski
  2007-03-22  2:39                 ` Martin Langhoff
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Narebski @ 2007-03-22  2:01 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Steven Grimm, Johannes Schindelin, git

Martin Langhoff wrote:
> On 3/22/07, Steven Grimm <koreth@midwinter.com> wrote:
>>
>> Say you're tracking a directory full of video files. Even a slight tweak
>> to one of them (to put a logo in the corner, say, while moving it into
>> an "accessible by the public" directory) will result in a file that has
>> no content in common at all if you look at it as purely a stream of
> 
> In that case, tracking the rename is not useful at all from the POV of
> your SCM. The reason the SCM needs to understand content-movement (of
> which renames are a special type) is to help you as much as possible
> at merge time.
> 
> So - git as an SCM focusses on tracking your content, and helping you
> merge. It does _that_ probably better than any other SCM. So git's
> internal data structures care strictly about the stuff that is needed
> for git's operation as an SCM.
> 
> And in the context of helping you merge, explicit rename tracking is a
> red herring. This point is arguable - Linus said earlier "you can do
> better by tracking content and ignoring explicit renames" and we are
> now getting there in terms of having code that does better.

An additional issue that we have to think about with respect to rename
support for merges is that git uses 3-way merge, taking into account
_only_ the upstream commit (of the branch we want to merge to), the side
branch commit (of the branch we want to merge) and the common ancestor[*1*]
(merge base) for merging. What is important is that the intermediate
states, i.e. how we got to the current state, do not matter.
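
To make the three inputs concrete, here is a small sketch, assuming a
throwaway repository with hypothetical names (old.txt, new.txt, a branch
called "side"). One side renames a file, the other edits it, and rename
detection is run only between the merge base and a branch tip:

```shell
#!/bin/sh
set -e

# Throwaway repository; names and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"
first_commit=$(git rev-parse HEAD)

# Side branch: rename the file.
git checkout -q -b side
git mv old.txt new.txt
git commit -q -m "Rename old.txt to new.txt"

# Upstream (the branch we started on): edit the file under its old name.
git checkout -q -
sed 's/beta/BETA/' old.txt > old.tmp && mv old.tmp old.txt
git commit -q -a -m "Edit old.txt"

# The three inputs: upstream tip (HEAD), side tip, and their merge base.
base=$(git merge-base HEAD side)

# Rename detection between two endpoints only, not commit by commit.
endpoint_renames=$(git diff -M --name-status "$base" side)
echo "$endpoint_renames"

# The merge carries the upstream edit over to the renamed file.
git merge -q -m "Merge side" side
cat new.txt
```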

Well, one could argue that if we remember explicit (provided by user)
info about renames, for example in the proposed 'note' field of a commit
object, or in some other helper structure (we cannot remember the
information in a blob or tree), we can gather and remember information
about recorded explicit renames when finding the common ancestor...

Although I think it would be better and easier to just provide a rerere2
cache to git-rerere to record corrections to rename detection, and use
it in subsequent merges (this was proposed, but IIRC not implemented)...

> Of course in your case the fact that there was a rename is important
> -- for users. This kind of information is not metadata for the SCM but
> for users. So that goes into the commit message, which is freeform. So
> - working with your scenario, if this happens often, I would suggest
> having a pre-commit hook that prepares a nice commit text message
> listing likely renames if they can be sussed out automatically.
> 
> Or having a custom git-mv that collects mv operations and then your
> pre-commit-hook preps your commit message with that manifest of moved
> files.
> 
> Does it make sense? It is data-for-the-user, so it goes in the commit
> msg. If it's data-for-the-SCM machinery, then it goes into the
> tracking data git handles internally.

Still, it would be nice to have a --follow=<file> option for the git-log
family, besides path limiting. And this could make use of explicit recording
of renames (much more easily than merge can).


References
==========
[*1*] Well, it can be a bit more complicated if there is more than one
common ancestor; git uses the recursive merge strategy.

-- 
Jakub Narebski
Poland


* Re: Rename handling
  2007-03-22  2:01               ` Jakub Narebski
@ 2007-03-22  2:39                 ` Martin Langhoff
  2007-03-22  3:32                   ` Jakub Narebski
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Langhoff @ 2007-03-22  2:39 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Steven Grimm, Johannes Schindelin, git

On 3/22/07, Jakub Narebski <jnareb@gmail.com> wrote:
> An additional issue that we have to think about with respect to rename
> support for merges is that git uses 3-way merge, taking into account
> _only_ the upstream commit (of the branch we want to merge to), the side
> branch commit (of the branch we want to merge) and the common ancestor[*1*]
> (merge base) for merging. What is important is that the intermediate
> states, i.e. how we got to the current state, do not matter.
>
> Well, one could argue that if we remember explicit (provided by user)
> info about renames, for example in the proposed 'note' field of a commit
> object, or in some other helper structure (we cannot remember the
> information in a blob or tree), we can gather and remember information
> about recorded explicit renames when finding the common ancestor...

But we do have some of that already - if one of the trees being merged
is missing a path that changed on the other one, we walk back through
the ancestry looking for renames.

Or am I seeing things?

> Still, it would be nice to have a --follow=<file> option for the git-log
> family, besides path limiting.

+1 here - git log should have something equivalent to diff's -M. When
the file "disappears", run a diff-tree -M -C against the parents to
see whether there were any "related predecessors" to the file to add
to the pathspec. Of course, there could be more than one.
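
A sketch of that lookup done by hand, assuming a throwaway repository and
hypothetical file names (old.txt, new.txt): find the commit where the
path first appears, then run diff-tree -M -C against its parent to find
a likely predecessor:

```shell
#!/bin/sh
set -e

# Throwaway repository; names and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf '%s\n' alpha beta gamma delta epsilon zeta eta theta > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# Rename and edit in the same commit, so detection relies on similarity.
git mv old.txt new.txt
printf 'iota\n' >> new.txt
git commit -q -a -m "Rename old.txt to new.txt and extend it"

# With a pathspec, the history of new.txt stops where the path was added.
added_in=$(git log --diff-filter=A --format=%H -- new.txt)

# Look for a related predecessor in the diff against that commit's parent.
old_path=$(git diff-tree -r -M -C --name-status --diff-filter=RC \
               "$added_in^" "$added_in" |
           awk -F '\t' -v p=new.txt '$3 == p { print $2 }')
echo "predecessor of new.txt: $old_path"

# The history can then be continued under the old name.
git log --format=%s "$added_in^" -- "$old_path"
```

Git releases made after this thread have a `git log --follow -- <path>`
option that automates roughly this for a single file.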

For example, right now, git log git-cvsimport.perl ends at the big tool rename.

cheers,


martin


* Re: Rename handling
  2007-03-22  2:39                 ` Martin Langhoff
@ 2007-03-22  3:32                   ` Jakub Narebski
  2007-03-22  3:53                     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Narebski @ 2007-03-22  3:32 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Steven Grimm, Johannes Schindelin, git

Martin Langhoff wrote:
> On 3/22/07, Jakub Narebski <jnareb@gmail.com> wrote:
>> An additional issue that we have to think about with respect to rename
>> support for merges is that git uses 3-way merge, taking into account
>> _only_ the upstream commit (of the branch we want to merge to), the side
>> branch commit (of the branch we want to merge) and the common ancestor[*1*]
>> (merge base) for merging. What is important is that the intermediate
>> states, i.e. how we got to the current state, do not matter.
>>
>> Well, one could argue that if we remember explicit (provided by user)
>> info about renames, for example in the proposed 'note' field of a commit
>> object, or in some other helper structure (we cannot remember the
>> information in a blob or tree), we can gather and remember information
>> about recorded explicit renames when finding the common ancestor...
> 
> But we do have some of that already - if one of the trees being merged
> is missing a path that changed on the other one, we walk back through
> the ancestry looking for renames.
> 
> Or am I seeing things?

First, I was talking about hypothetical manually-provided helper information
about explicit renames, entered by the user, not guessed by the SCM.

Second, I had thought that rename detection is done on final states: upstream,
branch and ancestor, not on intermediate commits. I guess I thought wrong.
 
-- 
Jakub Narebski
Poland


* Re: Rename handling
  2007-03-22  3:32                   ` Jakub Narebski
@ 2007-03-22  3:53                     ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-03-22  3:53 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Martin Langhoff, Steven Grimm, Johannes Schindelin, git



On Thu, 22 Mar 2007, Jakub Narebski wrote:
> 
> Second, I had thought that rename detection is done on final states: upstream,
> branch and ancestor, not on intermediate commits. I guess I thought wrong.

No, you didn't think wrong, with a few caveats:

 - we *do* do intermediate commits occasionally (ie for the criss-cross 
   merge case and the "recursive" part of the merge strategy). But that's 
   strictly a "we had multiple potential merge bases" issue, not a "track 
   renames through every commit" kind of thing.

 - you should also see the 3-way merge as the *first* strategy. If it 
   fails, you could do more involved stuff (ie the "blame" merge 
   strategy).

Personally, I think the three-way merge (aka "stupid") is absolutely the 
right thing to do. SCM projects that always try to take intervening 
commits into account (*cough*darcs*cough*) are just doing masturbation. 
It's pointless. The history only matters as a "what was the common state" 
thing, the intermediate mistakes you did in between are meaningless.

But my point is that if you *wanted* to, you could do something fancy. I 
think it would likely be stupid and wrong, and just cause subtle mismerges 
rather than actually *help*, but that's just my opinion. Git itself 
doesn't *force* you to just take the end-points into account, although my 
opinion that they are the only things that matter certainly may have 
colored how we do things right now ;)

			Linus


