All of lore.kernel.org
 help / color / mirror / Atom feed
* git full diff output issues..
@ 2005-05-26 19:19 Linus Torvalds
  2005-05-26 19:25 ` Linus Torvalds
  2005-05-26 23:34 ` Chris Wedgwood
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2005-05-26 19:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List


While testing my "git-apply" thing (coming along quite nicely, thanks for
asking), I've hit a case that is nasty to parse.

This is from the 2.6.12-rc4 -> 2.6.12-rc5 patch:

	diff --git a/arch/um/kernel/checksum.c b/arch/um/kernel/checksum.c
	deleted file mode 100644
	diff --git a/arch/um/kernel/initrd.c b/arch/um/kernel/initrd.c
	new file mode 100644
	--- /dev/null
	+++ b/arch/um/kernel/initrd.c
	@@ -0,0 +1,78 @@

and the magic here is that deleted file that was empty to begin with, so 
it didn't have a patch, just a note on deletion.

Why is that nasty? Because we don't have the file _name_ in any good 
format. The filename only exists int he "diff --git" header, and that one 
has the space-parsing issue, which makes it less than optimal.

I'd suggest we enhance the "full diff" output for new and deleted files to 
match the rename output, ie we'd give the actual filename on that line 
too, to avoid any ambiguities.

So we'd change it from

	deleted file mode 100644

to

	deleted file mode 100644 arch/um/kernel/checksum.c

in this case..

Comments?

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:19 git full diff output issues Linus Torvalds
@ 2005-05-26 19:25 ` Linus Torvalds
  2005-05-26 19:36   ` Junio C Hamano
                     ` (2 more replies)
  2005-05-26 23:34 ` Chris Wedgwood
  1 sibling, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2005-05-26 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List



On Thu, 26 May 2005, Linus Torvalds wrote:
> 
> So we'd change it from
> 
> 	deleted file mode 100644
> 
> to
> 
> 	deleted file mode 100644 arch/um/kernel/checksum.c
> 
> in this case..

I just realized that this same thing is equally true of just plain mode 
changes, where wif we don't have any content we just get

	diff --git a/name b/name
	old mode xxxx
	new mode yyyy

so I might as well parse the diff header here (I don't want to repeat the 
name twice for mode changes). Oh well.

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:25 ` Linus Torvalds
@ 2005-05-26 19:36   ` Junio C Hamano
  2005-05-26 19:46   ` Junio C Hamano
  2005-05-26 19:53   ` Anton Altaparmakov
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2005-05-26 19:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Thu, 26 May 2005, Linus Torvalds wrote:
>> 
>> So we'd change it from
>> 
>> deleted file mode 100644
>> 
>> to
>> 
>> deleted file mode 100644 arch/um/kernel/checksum.c
>> 
>> in this case..

LT> I just realized that this same thing is equally true of just plain mode 
LT> changes, where wif we don't have any content we just get

LT> 	diff --git a/name b/name
LT> 	old mode xxxx
LT> 	new mode yyyy

LT> so I might as well parse the diff header here (I don't want to repeat the 
LT> name twice for mode changes). Oh well.

So what do you want?  created and deleted would acquire path and
mode thing doesn't?  I think adding path only to "new mode" line
would be a sensible compromise, since we are interested in what
the resulting tree would look like most of the time.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:25 ` Linus Torvalds
  2005-05-26 19:36   ` Junio C Hamano
@ 2005-05-26 19:46   ` Junio C Hamano
  2005-05-26 19:53   ` Anton Altaparmakov
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2005-05-26 19:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

I'd appreciate it if you take these two patches I sent last
night.

    * Add git-external-diff-script

    This is a demonstration of GIT_EXTERNAL_DIFF mechanism, and a
    testbed for tweaking and enhancing what the built-in diff should
    be.  This script is designed to output exactly the same output
    as the built-in diff driver produces when set as GIT_EXTERNAL_DIFF.

    * Diff updates.

    With the introduction of 'T', and the "apply-patch" Linus has
    been quietly working on without much advertisement, it started
    to make sense to emit usable information in the "diff --git"
    patch output format.  Earlier built-in diff driver punted and
    did not say anything about a symbolic link changing into a file
    or vice versa, but this version represents that as a pair of
    deletion and creation.

After that, you can experiment to flush out issues with the
current built-in using git-external-diff-script for quick
turnaround.  When you have a concrete "ok this is good" format
we can port that to C in diff.c:builtin_diff().


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:25 ` Linus Torvalds
  2005-05-26 19:36   ` Junio C Hamano
  2005-05-26 19:46   ` Junio C Hamano
@ 2005-05-26 19:53   ` Anton Altaparmakov
  2005-05-26 20:33     ` Linus Torvalds
  2 siblings, 1 reply; 13+ messages in thread
From: Anton Altaparmakov @ 2005-05-26 19:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Thu, 26 May 2005, Linus Torvalds wrote:
> On Thu, 26 May 2005, Linus Torvalds wrote:
> > 
> > So we'd change it from
> > 
> > 	deleted file mode 100644
> > 
> > to
> > 
> > 	deleted file mode 100644 arch/um/kernel/checksum.c
> > 
> > in this case..
> 
> I just realized that this same thing is equally true of just plain mode 
> changes, where wif we don't have any content we just get
> 
> 	diff --git a/name b/name
> 	old mode xxxx
> 	new mode yyyy
> 
> so I might as well parse the diff header here (I don't want to repeat the 
> name twice for mode changes). Oh well.

Given that git already has the metadata lines in the diff ("old mode", 
"deleted file mode", etc) why not simply add another metadata line "name" 
and what follows that is the name until an end of line character (or a NUL 
if you want file names with embedded new lines).  You can then only emit 
the "name" metadata line when no actual diff is present and hence the name 
is uncertain.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:53   ` Anton Altaparmakov
@ 2005-05-26 20:33     ` Linus Torvalds
  2005-05-26 20:56       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2005-05-26 20:33 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Junio C Hamano, Git Mailing List



On Thu, 26 May 2005, Anton Altaparmakov wrote:
> 
> Given that git already has the metadata lines in the diff ("old mode", 
> "deleted file mode", etc) why not simply add another metadata line "name" 
> and what follows that is the name until an end of line character (or a NUL 
> if you want file names with embedded new lines).  You can then only emit 
> the "name" metadata line when no actual diff is present and hence the name 
> is uncertain.

Yes, that would work. 

However, I ended up just validating the name parsing by making sure that 
when I parse the "git --diff" line, I only take the name if I can see it 
being the same for both the old and the new. IOW, if I see

	diff --git a/hi b/hello

then I won't take it, but if I see

	diff --git hi there/I am/being difficult   oopsie dir/I am/being difficult

then I get "I am/being difficult" by virtue of checking the two names 
against each other.

This means, btw, that the "git --diff" format must _not_ do

	diff --git a/file /dev/null
	deleted file mode 100644

because in that case I don't trust the filename enough. Of course, this
all only happens when deleting empty files, if the file had any contents,
then I will see the unambiguos filename on the '---' line, and again be
happy.

IOW, git-apply is being pretty anal about things, but it looks like that
works out well.

			Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 20:33     ` Linus Torvalds
@ 2005-05-26 20:56       ` Junio C Hamano
  2005-05-26 21:09         ` Linus Torvalds
  2005-06-05  8:46         ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2005-05-26 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Anton Altaparmakov, Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> This means, btw, that the "git --diff" format must _not_ do

LT> 	diff --git a/file /dev/null
LT> 	deleted file mode 100644

I just checked, and both built-in and git-external-diff-script
should be safe about this issue.

So what do you want from me at this point?  Nothing?
 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 20:56       ` Junio C Hamano
@ 2005-05-26 21:09         ` Linus Torvalds
  2005-05-26 21:33           ` Junio C Hamano
  2005-06-05  8:46         ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2005-05-26 21:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Anton Altaparmakov, Git Mailing List



On Thu, 26 May 2005, Junio C Hamano wrote:
> 
> So what do you want from me at this point?  Nothing?

Yeah, I'm happy. Sorry for the false alarm.

Anyway, at this point

	git-apply --stat

is actually already useful: it's a diffstat clone. Which is perhaps not
very useful in itself, but it has the advantage of being an easy way to
check that I do the right thing there, and may well be useful also for the
git-specific extensions (ie right now it's really purely a diffstat clone,
but there's nothign that says that it couldn't show rename information etc 
too, which diffstat doesn't understand).

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 21:09         ` Linus Torvalds
@ 2005-05-26 21:33           ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2005-05-26 21:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Anton Altaparmakov, Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Thu, 26 May 2005, Junio C Hamano wrote:
>> 
>> So what do you want from me at this point?  Nothing?

LT> Yeah, I'm happy. Sorry for the false alarm.

No problem.  I still kinda like Anton's proposal for conceptual
cleanness, but if the tool can cope with what we already have
then less cluttering in the output is better for human eyes.

Let me again remind you about git-external-diff-script patch.
When you encounter more gotcha in the built-in diff output
format in the future, it would be a valuable tool to experiment
and express what you would like to have git-apply to parse.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 19:19 git full diff output issues Linus Torvalds
  2005-05-26 19:25 ` Linus Torvalds
@ 2005-05-26 23:34 ` Chris Wedgwood
  2005-05-26 23:49   ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Wedgwood @ 2005-05-26 23:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Thu, May 26, 2005 at 12:19:21PM -0700, Linus Torvalds wrote:

> 	deleted file mode 100644 arch/um/kernel/checksum.c

why do we care about the mode here?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 23:34 ` Chris Wedgwood
@ 2005-05-26 23:49   ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2005-05-26 23:49 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Junio C Hamano, Git Mailing List



On Thu, 26 May 2005, Chris Wedgwood wrote:
>
> On Thu, May 26, 2005 at 12:19:21PM -0700, Linus Torvalds wrote:
> 
> > 	deleted file mode 100644 arch/um/kernel/checksum.c
> 
> why do we care about the mode here?

Junio makes the (correct) argument that patches should be reversible.

And the reverse of a delete is a create. And thus the file mode of the
file that got deleted matters.

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-05-26 20:56       ` Junio C Hamano
  2005-05-26 21:09         ` Linus Torvalds
@ 2005-06-05  8:46         ` Junio C Hamano
  2005-06-05 15:11           ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2005-06-05  8:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> This means, btw, that the "git --diff" format must _not_ do

LT> diff --git a/file /dev/null
LT> deleted file mode 100644

JCH> I just checked, and both built-in and git-external-diff-script
JCH> should be safe about this issue.

Sorry, I spoke too soon about a week and half ago X-<, and I am
bugging you about this because this clearly belongs to "fix"
category not "new stuff".

The case you mentioned (i.e. /dev/null) is fine but rename/copy
is "broken" according to the definition by git-apply.

What do you want the diff-patch format to say for this one?

    :100644 100644 SHA1-OLD SHA1-NEW R frotz.c nitfol.c

Currently I am saying:

    diff --git a/frotz.c b/nitfol.c
    similarity index 89%
    rename old frotz.c
    rename new nitfol.c
    --- a/frotz.c
    +++ b/nitfol.c
    @@ ...

and this makes git-apply barf, because a/ and b/ names are
different.  Is the following what you want?  That is, do you
always want p->two->path (name in the right hand side tree)?

    diff --git a/nitfol.c b/nitfol.c
    similarity index 89%
    rename old frotz.c
    rename new nitfol.c
    --- a/frotz.c
    +++ b/nitfol.c
    @@ ...

According to the current apply.c, git_header_name() does not
care as long as a/ and b/ names are the same (that is, I could
even say "diff --git a/junkio b/junkio" to make it grok the
above example, as long as I have the correct "rename old" and
"rename new" in the extended header part).  In that sense, it
all boils down to which name you, as a human consumer of the
patch, would want to see on the header, if we go the route of
making a/ and b/ name always the same.  However I suspect that
this slightly breaks patch reversibility.

If we do care about patch reversibility, having a/ and b/ names
to show the pre- and post- paths like my current output does
(which _does_ break the current apply.c name checking) is
probably the most sensible thing to keep things symmetric.  I am
not sure if it is worth it to make the name checking logic in
apply.c more complicated only to support this rename symmetry,
though.

Another possibility; since "diff --git" is a git-specific header
format anyway, we could quote things to help apply.c parsing it,
without introducing too much clutter for ordinary cases.  How
about taking advantage of the fact that most pathnames do not
contain spaces nor backslashes, and if we see them we simply
quote, like this?

    # no need for quote
    diff --git a/frotz.c b/nitfol.c
    rename old frotz.c
    rename new nitfol.c
    
    # patch for "frotz and nitfol.c"
    diff --git a/frotz\ and\ nitfol.c b/frotz\ and\ nitfol.c

    # rename but filename has spaces and a backslash
    diff --git a/old\ name\\with\ bs b/new\ name\\with\ bs
    rename old old name\with bs
    rename new new name\with bs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git full diff output issues..
  2005-06-05  8:46         ` Junio C Hamano
@ 2005-06-05 15:11           ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2005-06-05 15:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 5 Jun 2005, Junio C Hamano wrote:
> 
> The case you mentioned (i.e. /dev/null) is fine but rename/copy
> is "broken" according to the definition by git-apply.

No problem, the renames always get the names from the "rename" line, not 
the header. Same goes for copies.

It's only modified files that keep the same name _and_ the same content 
that don't have the name uniquely on a line somewhere.

> What do you want the diff-patch format to say for this one?
> 
>     :100644 100644 SHA1-OLD SHA1-NEW R frotz.c nitfol.c
> 
> Currently I am saying:
> 
>     diff --git a/frotz.c b/nitfol.c
>     similarity index 89%
>     rename old frotz.c
>     rename new nitfol.c
>     --- a/frotz.c
>     +++ b/nitfol.c

This finds the old names unambiguously in _two_ places: in the "--- " line 
(no question about where it begins: it's -p1, or where it ends - at the 
newline) _and_ on the "rename old xxxx" line.

The only case that was special was literally the "same name, no content 
changes, new mode" case, which looked like

	diff --git a/oldname.c b/oldname.c
	new mode 100755
	old mode 100644

and thus _only_ had the name in the (normally ambiguous wrt whitepsace)  
header line.

But by having the requirement that the format of the header line for that
case is "-p1" together with both names being the same, it's not ambigious 
any more.

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2005-06-05 15:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-05-26 19:19 git full diff output issues Linus Torvalds
2005-05-26 19:25 ` Linus Torvalds
2005-05-26 19:36   ` Junio C Hamano
2005-05-26 19:46   ` Junio C Hamano
2005-05-26 19:53   ` Anton Altaparmakov
2005-05-26 20:33     ` Linus Torvalds
2005-05-26 20:56       ` Junio C Hamano
2005-05-26 21:09         ` Linus Torvalds
2005-05-26 21:33           ` Junio C Hamano
2005-06-05  8:46         ` Junio C Hamano
2005-06-05 15:11           ` Linus Torvalds
2005-05-26 23:34 ` Chris Wedgwood
2005-05-26 23:49   ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.