All of lore.kernel.org
 help / color / mirror / Atom feed
* Reverting an uncommitted revert
@ 2009-05-20  2:34 Joshua Jensen
  2009-05-20  3:10 ` Nicolas Pitre
  0 siblings, 1 reply; 28+ messages in thread
From: Joshua Jensen @ 2009-05-20  2:34 UTC (permalink / raw)
  To: git

I had an interesting experience the other day I'd like to share.  It is 
a story of my stupidity plus a bug in the latest version of the Mac OS X 
'git gui'.

I had spent several evenings working on a piece of source code.  It was 
basically working, but I hadn't committed anything yet. Yes, I know I 
should have made a branch and performed incremental commits, but I 
didn't.  This is why I was stupid.

Because of a busy week, it was many days before I could get back to what 
I was working on.  By then, I had forgotten what I was doing, and so I 
pulled up the handy 'git gui' to show me my diffs.

'git gui' showed me a single whitespace change in the diff.  The scroll 
bar extended from top to bottom in the diff window and wouldn't move.  
(This is the bug, which is easily reproducible.)  I thought, "Gee, 
that's weird.  I thought I'd made more changes."  Not wanting the silly 
whitespace change, I chose Commit->Revert Changes from the menu.

A couple hours later, I was pulling my hair out trying to find the 
change that I knew, deep down, I had reverted.

I also knew there was no support for putting a reverted change in the 
reflog, but hey, the reflog has saved me before.  I looked.  It wasn't 
there.

So here's the idea: What if Git, upon a revert change (or git reset 
--hard HEAD), "committed" the changes to be reverted and then did the 
revert with a 'git reset --hard HEAD^'?  The reverted files would be 
disconnected from a branch, but they would be available in the reflog to 
retrieve.

Or do I just not 'get it'?

Thanks!

Josh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  2:34 Reverting an uncommitted revert Joshua Jensen
@ 2009-05-20  3:10 ` Nicolas Pitre
  2009-05-20  3:21   ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20  3:10 UTC (permalink / raw)
  To: Joshua Jensen; +Cc: git

On Tue, 19 May 2009, Joshua Jensen wrote:

> So here's the idea: What if Git, upon a revert change (or git reset --hard
> HEAD), "committed" the changes to be reverted and then did the revert with a
> 'git reset --hard HEAD^'?  The reverted files would be disconnected from a
> branch, but they would be available in the reflog to retrieve.

I think there is indeed some value in having a commit of the work 
directory dirty state automatically made before this state is discarded, 
and stuff a reference to that commit in the HEAD reflog.  I think such a 
feature would need to be made configurable and active by default.


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:10 ` Nicolas Pitre
@ 2009-05-20  3:21   ` Jeff King
  2009-05-20  3:35     ` Nicolas Pitre
  2009-05-20 17:50     ` Junio C Hamano
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2009-05-20  3:21 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Joshua Jensen, git

On Tue, May 19, 2009 at 11:10:14PM -0400, Nicolas Pitre wrote:

> > So here's the idea: What if Git, upon a revert change (or git reset --hard
> > HEAD), "committed" the changes to be reverted and then did the revert with a
> > 'git reset --hard HEAD^'?  The reverted files would be disconnected from a
> > branch, but they would be available in the reflog to retrieve.
> 
> I think there is indeed some value in having a commit of the work 
> directory dirty state automatically made before this state is discarded, 

Related to this, I have wondered if it might be useful to have an "index
reflog". If I do something like this:

  $ git add foo
  $ hack hack hack
  $ git add foo

Then the first added state of "foo" is available in the object database,
but it is not connected to the name "foo" in any way, which makes it
much harder to find. If we had a reflog pointing to trees representing
the index state after each change, then it would be simple (you could
look at "INDEX@{1}:foo" or similar).

I don't know if the performance is an issue. We are writing an extra
tree every time we touch the index, but in many cases you are already
writing a blob.

> and stuff a reference to that commit in the HEAD reflog.  I think such a 
> feature would need to be made configurable and active by default.

I'm not sure if putting it in the HEAD reflog is a good idea. The
reflogs and what goes in them are very user-visible. Do users make the
assumption that HEAD@{1} will be where they expect (e.g., because they
just did a reset, they expect HEAD@{1} to be the prevoius commit, but
with this patch it would actually be HEAD@{2}, with HEAD@{1} as the
worktree pseudo-commit)?

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:21   ` Jeff King
@ 2009-05-20  3:35     ` Nicolas Pitre
  2009-05-20  3:38       ` Jeff King
                         ` (2 more replies)
  2009-05-20 17:50     ` Junio C Hamano
  1 sibling, 3 replies; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20  3:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Joshua Jensen, git

On Tue, 19 May 2009, Jeff King wrote:

> On Tue, May 19, 2009 at 11:10:14PM -0400, Nicolas Pitre wrote:
> 
> > > So here's the idea: What if Git, upon a revert change (or git reset --hard
> > > HEAD), "committed" the changes to be reverted and then did the revert with a
> > > 'git reset --hard HEAD^'?  The reverted files would be disconnected from a
> > > branch, but they would be available in the reflog to retrieve.
> > 
> > I think there is indeed some value in having a commit of the work 
> > directory dirty state automatically made before this state is discarded, 
> 
> Related to this, I have wondered if it might be useful to have an "index
> reflog". If I do something like this:
> 
>   $ git add foo
>   $ hack hack hack
>   $ git add foo
> 
> Then the first added state of "foo" is available in the object database,
> but it is not connected to the name "foo" in any way, which makes it
> much harder to find. If we had a reflog pointing to trees representing
> the index state after each change, then it would be simple (you could
> look at "INDEX@{1}:foo" or similar).
> 
> I don't know if the performance is an issue. We are writing an extra
> tree every time we touch the index, but in many cases you are already
> writing a blob.

Well... Actually I think having a reflog dedicated to discarded state 
would probably be a better idea. And...

> > and stuff a reference to that commit in the HEAD reflog.  I think such a 
> > feature would need to be made configurable and active by default.
> 
> I'm not sure if putting it in the HEAD reflog is a good idea. The
> reflogs and what goes in them are very user-visible. Do users make the
> assumption that HEAD@{1} will be where they expect (e.g., because they
> just did a reset, they expect HEAD@{1} to be the prevoius commit, but
> with this patch it would actually be HEAD@{2}, with HEAD@{1} as the
> worktree pseudo-commit)?

Having a "trash" reflog would solve this unambiguously.  That could also 
include your index example above.  However, in the index case, I'd 
record a reflog entry only if you're about to discard a previously non 
committed entry.  If you do:

	$ git add foo
	$ git add bar
	$ git commit
	$ hack hack hack
	$ git add foo

then in this case there is nothing to be lost hence no additional entry 
in the "trash" reflog.


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:35     ` Nicolas Pitre
@ 2009-05-20  3:38       ` Jeff King
  2009-05-20  4:58       ` Ping Yin
  2009-05-20  9:15       ` Wincent Colaiuta
  2 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2009-05-20  3:38 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Joshua Jensen, git

On Tue, May 19, 2009 at 11:35:45PM -0400, Nicolas Pitre wrote:

> Having a "trash" reflog would solve this unambiguously.  That could also 
> include your index example above.


Yeah, I think combining the two in a single log makes sense.

> However, in the index case, I'd record a reflog entry only if you're
> about to discard a previously non committed entry.  If you do:
> 
> 	$ git add foo
> 	$ git add bar
> 	$ git commit
> 	$ hack hack hack
> 	$ git add foo
> 
> then in this case there is nothing to be lost hence no additional entry 
> in the "trash" reflog.

Good point. That both helps the performance and keeps the log less
cluttered.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:35     ` Nicolas Pitre
  2009-05-20  3:38       ` Jeff King
@ 2009-05-20  4:58       ` Ping Yin
  2009-05-20  9:15       ` Wincent Colaiuta
  2 siblings, 0 replies; 28+ messages in thread
From: Ping Yin @ 2009-05-20  4:58 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jeff King, Joshua Jensen, git

On Wed, May 20, 2009 at 11:35 AM, Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 19 May 2009, Jeff King wrote:
>
>> On Tue, May 19, 2009 at 11:10:14PM -0400, Nicolas Pitre wrote:
>>
>> > > So here's the idea: What if Git, upon a revert change (or git reset --hard
>> > > HEAD), "committed" the changes to be reverted and then did the revert with a
>> > > 'git reset --hard HEAD^'?  The reverted files would be disconnected from a
>> > > branch, but they would be available in the reflog to retrieve.
>> >
>> > I think there is indeed some value in having a commit of the work
>> > directory dirty state automatically made before this state is discarded,
>>
>> Related to this, I have wondered if it might be useful to have an "index
>> reflog". If I do something like this:
>>
>>   $ git add foo
>>   $ hack hack hack
>>   $ git add foo
>>
>> Then the first added state of "foo" is available in the object database,
>> but it is not connected to the name "foo" in any way, which makes it
>> much harder to find. If we had a reflog pointing to trees representing
>> the index state after each change, then it would be simple (you could
>> look at "INDEX@{1}:foo" or similar).
>>
>> I don't know if the performance is an issue. We are writing an extra
>> tree every time we touch the index, but in many cases you are already
>> writing a blob.
>
> Well... Actually I think having a reflog dedicated to discarded state
> would probably be a better idea. And...
>

Agree. The issues about losing changes due to wrong operation have
raised many times in the list, such as

git add .
git reset --hard

With reflog for discarded changes,  this can be easily recovered.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:35     ` Nicolas Pitre
  2009-05-20  3:38       ` Jeff King
  2009-05-20  4:58       ` Ping Yin
@ 2009-05-20  9:15       ` Wincent Colaiuta
  2009-05-20 10:16         ` Jakub Narebski
  2009-05-20 12:53         ` Nicolas Pitre
  2 siblings, 2 replies; 28+ messages in thread
From: Wincent Colaiuta @ 2009-05-20  9:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jeff King, Joshua Jensen, git

El 20/5/2009, a las 5:35, Nicolas Pitre escribió:

> Having a "trash" reflog would solve this unambiguously.  That could  
> also
> include your index example above.  However, in the index case, I'd
> record a reflog entry only if you're about to discard a previously non
> committed entry.  If you do:
>
> 	$ git add foo
> 	$ git add bar
> 	$ git commit
> 	$ hack hack hack
> 	$ git add foo
>
> then in this case there is nothing to be lost hence no additional  
> entry
> in the "trash" reflog.

That's true in the example you provide, but it doesn't really handle  
Jeff's initial example ("git add" twice on the same file), where it is  
possible to throw away intermediate state (by overwriting).

I personally haven't been bitten by a mistaken use of "git reset" and  
friends for a long time now, but I really like the ideas that have  
been mentioned so far in this thread. This "trash" or "index" or  
"reset" reflog would be a huge step forward in "idiot-proofing" Git.

Cheers,
Wincent

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  9:15       ` Wincent Colaiuta
@ 2009-05-20 10:16         ` Jakub Narebski
  2009-05-20 12:53         ` Nicolas Pitre
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2009-05-20 10:16 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: Nicolas Pitre, Jeff King, Joshua Jensen, git

Wincent Colaiuta <win@wincent.com> writes:
> El 20/5/2009, a las 5:35, Nicolas Pitre escribió:
> 
> > Having a "trash" reflog would solve this unambiguously.
[...]

> I personally haven't been bitten by a mistaken use of "git reset" and  
> friends for a long time now, but I really like the ideas that have  
> been mentioned so far in this thread. This "trash" or "index" or  
> "reset" reflog would be a huge step forward in "idiot-proofing" Git.

Well, having a kind of 'Attic' for reflogs for deleted branches would
help here in "mistake-proofing" Git (there were even IIRC some
preliminary test implementation), although it is not as necessary
nowadays because of reflog for HEAD, and ability to set longer expiry
time for it (IIRC).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  9:15       ` Wincent Colaiuta
  2009-05-20 10:16         ` Jakub Narebski
@ 2009-05-20 12:53         ` Nicolas Pitre
  2009-05-20 14:17           ` Shawn O. Pearce
  2009-05-20 15:23           ` Wincent Colaiuta
  1 sibling, 2 replies; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20 12:53 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: Jeff King, Joshua Jensen, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 832 bytes --]

On Wed, 20 May 2009, Wincent Colaiuta wrote:

> El 20/5/2009, a las 5:35, Nicolas Pitre escribió:
> 
> > Having a "trash" reflog would solve this unambiguously.  That could also
> > include your index example above.  However, in the index case, I'd
> > record a reflog entry only if you're about to discard a previously non
> > committed entry.  If you do:
> > 
> > 	$ git add foo
> > 	$ git add bar
> > 	$ git commit
> > 	$ hack hack hack
> > 	$ git add foo
> > 
> > then in this case there is nothing to be lost hence no additional entry
> > in the "trash" reflog.
> 
> That's true in the example you provide, but it doesn't really handle Jeff's
> initial example ("git add" twice on the same file), where it is possible to
> throw away intermediate state (by overwriting).

Did I disagree with Jeff's original example?


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 12:53         ` Nicolas Pitre
@ 2009-05-20 14:17           ` Shawn O. Pearce
  2009-05-20 16:55             ` Eric Raible
  2009-05-20 17:59             ` Junio C Hamano
  2009-05-20 15:23           ` Wincent Colaiuta
  1 sibling, 2 replies; 28+ messages in thread
From: Shawn O. Pearce @ 2009-05-20 14:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Wincent Colaiuta, Jeff King, Joshua Jensen, git

Nicolas Pitre <nico@cam.org> wrote:
> On Wed, 20 May 2009, Wincent Colaiuta wrote:
> > El 20/5/2009, a las 5:35, Nicolas Pitre escribi?:
> > 
> > > Having a "trash" reflog would solve this unambiguously.  That could also
> > > include your index example above.  However, in the index case, I'd
> > > record a reflog entry only if you're about to discard a previously non
> > > committed entry.  If you do:
> > > 
> > > 	$ git add foo
> > > 	$ git add bar
> > > 	$ git commit
> > > 	$ hack hack hack
> > > 	$ git add foo
> > > 
> > > then in this case there is nothing to be lost hence no additional entry
> > > in the "trash" reflog.
> > 
> > That's true in the example you provide, but it doesn't really handle Jeff's
> > initial example ("git add" twice on the same file), where it is possible to
> > throw away intermediate state (by overwriting).
> 
> Did I disagree with Jeff's original example?

No, but if you don't read closely enough to notice that the first
add affects path "foo" and the second add "bar", you might miss
what you were trying to say.  I almost thought you disagreed with
Peff until I read your example a 2nd time.  :-)

You did say "uncommitted entry causes reflog append", so in Peff's
original example of "git add a; vi a; git add a", we should be
creating a reflog entry for that first added state, which is clearly
not a disagreement.

FWIW, I think this is a great idea, but lack the time to code it
myself, otherwise I probably would start hacking on it right now.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 12:53         ` Nicolas Pitre
  2009-05-20 14:17           ` Shawn O. Pearce
@ 2009-05-20 15:23           ` Wincent Colaiuta
  2009-05-20 15:47             ` Nicolas Pitre
  1 sibling, 1 reply; 28+ messages in thread
From: Wincent Colaiuta @ 2009-05-20 15:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jeff King, Joshua Jensen, git

El 20/5/2009, a las 14:53, Nicolas Pitre escribió:

> On Wed, 20 May 2009, Wincent Colaiuta wrote:
>
>> El 20/5/2009, a las 5:35, Nicolas Pitre escribió:
>>
>>> Having a "trash" reflog would solve this unambiguously.  That  
>>> could also
>>> include your index example above.  However, in the index case, I'd
>>> record a reflog entry only if you're about to discard a previously  
>>> non
>>> committed entry.  If you do:
>>>
>>> 	$ git add foo
>>> 	$ git add bar
>>> 	$ git commit
>>> 	$ hack hack hack
>>> 	$ git add foo
>>>
>>> then in this case there is nothing to be lost hence no additional  
>>> entry
>>> in the "trash" reflog.
>>
>> That's true in the example you provide, but it doesn't really  
>> handle Jeff's
>> initial example ("git add" twice on the same file), where it is  
>> possible to
>> throw away intermediate state (by overwriting).
>
> Did I disagree with Jeff's original example?

No, but I may have misinterpreted you; I understood that you said you  
wouldn't store intermediate index state after successive "git add"  
executions.

Cheers,
Wincent

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 15:23           ` Wincent Colaiuta
@ 2009-05-20 15:47             ` Nicolas Pitre
  2009-05-20 16:13               ` Sverre Rabbelier
  0 siblings, 1 reply; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20 15:47 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: Jeff King, Joshua Jensen, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1524 bytes --]

On Wed, 20 May 2009, Wincent Colaiuta wrote:

> El 20/5/2009, a las 14:53, Nicolas Pitre escribió:
> 
> > On Wed, 20 May 2009, Wincent Colaiuta wrote:
> > 
> > > El 20/5/2009, a las 5:35, Nicolas Pitre escribió:
> > > 
> > > > Having a "trash" reflog would solve this unambiguously.  That could also
> > > > include your index example above.  However, in the index case, I'd
> > > > record a reflog entry only if you're about to discard a previously non
> > > > committed entry.  If you do:
> > > > 
> > > > 	$ git add foo
> > > > 	$ git add bar
> > > > 	$ git commit
> > > > 	$ hack hack hack
> > > > 	$ git add foo
> > > > 
> > > > then in this case there is nothing to be lost hence no additional entry
> > > > in the "trash" reflog.
> > > 
> > > That's true in the example you provide, but it doesn't really handle
> > > Jeff's
> > > initial example ("git add" twice on the same file), where it is possible
> > > to
> > > throw away intermediate state (by overwriting).
> > 
> > Did I disagree with Jeff's original example?
> 
> No, but I may have misinterpreted you; I understood that you said you wouldn't
> store intermediate index state after successive "git add" executions.

I wouldn't if those states don't discard data.  If you 'git add foo' and 
then 'git add bar' you don't lose anything.  If you do two 'git add foo' 
and the second one would discard the previous state of foo which is 
different from the new one _then_ it is worth adding a reflog entry for 
the otherwise about to be lost state. 


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 15:47             ` Nicolas Pitre
@ 2009-05-20 16:13               ` Sverre Rabbelier
  2009-05-20 16:58                 ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Sverre Rabbelier @ 2009-05-20 16:13 UTC (permalink / raw)
  To: Git List; +Cc: Wincent Colaiuta, Jeff King, Joshua Jensen, Nicolas Pitre

Heya,

On Wed, May 20, 2009 at 17:47, Nicolas Pitre <nico@cam.org> wrote:
> [...] _then_ it is worth adding a reflog entry for the otherwise about to
> be lost state.

On that note, the usefulness of such a feature is dependant on the
support we have for actually restoring an entry from this new reflog.
The current reflog is so amazingly useful because git has
awesome-cherry-pick-and-the-like-commit-handling powers that make it
easy to restore the otherwise lost state. But as far as I know,
there's no nice 'n easy support for restoring state to the information
contained in this new reflog, is there?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 14:17           ` Shawn O. Pearce
@ 2009-05-20 16:55             ` Eric Raible
  2009-05-20 17:59             ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Raible @ 2009-05-20 16:55 UTC (permalink / raw)
  To: git

Shawn O. Pearce <spearce <at> spearce.org> writes:

> You did say "uncommitted entry causes reflog append", so in Peff's
> original example of "git add a; vi a; git add a", we should be
> creating a reflog entry for that first added state, which is clearly
> not a disagreement.
> 
> FWIW, I think this is a great idea, but lack the time to code it
> myself, otherwise I probably would start hacking on it right now.
> 

I for one am very happy to hear to say this.

Git is generally quite safe, but still, there are holes for the unwary.
I've fallen into those holes (embarrassingly, more than once), and so
any movement towards filling them is quite welcome.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 16:13               ` Sverre Rabbelier
@ 2009-05-20 16:58                 ` Jeff King
  2009-05-20 18:04                   ` Nicolas Pitre
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2009-05-20 16:58 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Git List, Wincent Colaiuta, Joshua Jensen, Nicolas Pitre

On Wed, May 20, 2009 at 06:13:55PM +0200, Sverre Rabbelier wrote:

> On that note, the usefulness of such a feature is dependant on the
> support we have for actually restoring an entry from this new reflog.
> The current reflog is so amazingly useful because git has
> awesome-cherry-pick-and-the-like-commit-handling powers that make it
> easy to restore the otherwise lost state. But as far as I know,
> there's no nice 'n easy support for restoring state to the information
> contained in this new reflog, is there?

I was envisioning a reflog of tree objects, so you could do:

  $ git reflog show TRASH ;# show the reflog message
  $ git show TRASH@{1} ;# show what's in the tree
  $ git show TRASH@{1}:path/to/file ;# see a file
  $ git checkout TRASH@{1} path/to/file ;# restore a file

which should all work as-is.

I suspect "git log -g" might need some tweaking to get a tree rather
than a commit (but in theory we should just show the "Reflog *:" headers
and not the commit headers).

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20  3:21   ` Jeff King
  2009-05-20  3:35     ` Nicolas Pitre
@ 2009-05-20 17:50     ` Junio C Hamano
  2009-05-20 18:27       ` [PATCH] write-tree --ignore-cache-tree Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2009-05-20 17:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Nicolas Pitre, Joshua Jensen, git

Jeff King <peff@peff.net> writes:

> Related to this, I have wondered if it might be useful to have an "index
> reflog". If I do something like this:
>
>   $ git add foo
>   $ hack hack hack
>   $ git add foo
>
> Then the first added state of "foo" is available in the object database,
> but it is not connected to the name "foo" in any way, which makes it
> much harder to find. If we had a reflog pointing to trees representing
> the index state after each change, then it would be simple (you could
> look at "INDEX@{1}:foo" or similar).
>
> I don't know if the performance is an issue. We are writing an extra
> tree every time we touch the index, but in many cases you are already
> writing a blob.

It is not just "an extra tree every time".  For example, in the kernel
repository, one of the path that is deepest [*1*] (i.e. whose modification
affects the most number of trees) is:

    arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm/iop_reg_space_asm.h

If you modify this file and then "git add", and if you write-tree the
index at that point, you need to write a tree object for ".", arch/,
arch/cris, ..., arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm, 10
trees in total (if I am counting them right ;-).

If your cache-tree is fresh (and if you "git write-tree" every time you
"git add", that will make it stay fresh), you do not have to recompute
object names of other 1728 tree objects (they are unchanged) [*2*], which
should help somewhat, but the majority of time is spent in the I/O (and
perhaps slow fsync on ext3 ;-) of writing these 10 tree objects [*3*].

People like Shawn who work with Java projects, where the tree hierarchy
tends to be (unnecessary) deep with prefixes like org/spearce/jgit due to
the namespace issues will have bigger overhead than a relatively shallow
project like git.git itself.

[Footnotes]

*1* You can find it out yourself with...

git ls-files "$(
    git ls-files |
    sed -e 's|[^/]||g' |
    sort -u |
    tail -n 1 |
    sed -e 's|/|*/|g' -e 's/$/*/'
)" | head -n 1

*2* The total number of tree objects in a commit is...

echo $(git ls-tree -r -d HEAD | wc -l) 1 + p | dc

*3* write-tree with or without help from cache-tree in the kernel
repository with a hot cache (we are talking about running "git write-tree"
every time you do "git add" so the cold cache case does not matter) looks
like this:

$ l=arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm/iop_reg_space_asm.h
$ echo >>$l && git add $l
$ /usr/bin/time git write-tree
04bc92c40a5d0f0d44e162e140cb00964a52046b
0.02user 0.01system 0:00.03elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6387minor)pagefaults 0swaps

$ git reset --hard

$ echo >>$l && git add $l
$ /usr/bin/time git write-tree --ignore-cache-tree
04bc92c40a5d0f0d44e162e140cb00964a52046b
0.13user 0.04system 0:00.17elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+17141minor)pagefaults 0swaps

(The numbers are from my Athlon(tm) 64 X2 3800+ with slow IDE disks).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 14:17           ` Shawn O. Pearce
  2009-05-20 16:55             ` Eric Raible
@ 2009-05-20 17:59             ` Junio C Hamano
  2009-05-20 18:19               ` Nicolas Pitre
  2009-05-20 18:21               ` Jakub Narebski
  1 sibling, 2 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-05-20 17:59 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Wincent Colaiuta, Jeff King, Joshua Jensen, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> You did say "uncommitted entry causes reflog append", so in Peff's
> original example of "git add a; vi a; git add a", we should be
> creating a reflog entry for that first added state, which is clearly
> not a disagreement.
>
> FWIW, I think this is a great idea, but lack the time to code it
> myself, otherwise I probably would start hacking on it right now.

The devil's in the details.  There are at least four things you would need
to design before start hacking.

 (0) Do you want this to apply only to Porcelains, or do you want to use
     this for plumbing operations as well?

 (1) When would you "auto" write-tree?  When you do "git add" or anything
     that adds new contents to the index?  Or immediately before you do
     something destructive like "git reset"?  Or perhaps both?

 (2) Enumerate the operations that falls into the category you decided in
     the above question.  For example, "git apply --index" and "git apply
     --cached" would fall into the same category as "git add".  If you
     cover plumbing, you would also need to cover "git update-index".

 (3) What should happen when you cannot write the index out as a tree?  I
     think it is easier to make mistakes during a conflicted merge
     resolution than during a straight linear development of your own, and
     one of the cases that would benefit most would be that you have
     resolved a path to your satisfaction but then later you screw up
     while resolving some other paths, losing an earlier resolution.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 16:58                 ` Jeff King
@ 2009-05-20 18:04                   ` Nicolas Pitre
  2009-05-20 18:08                     ` Sverre Rabbelier
  2009-05-21  3:47                     ` Jeff King
  0 siblings, 2 replies; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20 18:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Sverre Rabbelier, Git List, Wincent Colaiuta, Joshua Jensen

On Wed, 20 May 2009, Jeff King wrote:

> On Wed, May 20, 2009 at 06:13:55PM +0200, Sverre Rabbelier wrote:
> 
> > On that note, the usefulness of such a feature is dependant on the
> > support we have for actually restoring an entry from this new reflog.
> > The current reflog is so amazingly useful because git has
> > awesome-cherry-pick-and-the-like-commit-handling powers that make it
> > easy to restore the otherwise lost state. But as far as I know,
> > there's no nice 'n easy support for restoring state to the information
> > contained in this new reflog, is there?
> 
> I was envisioning a reflog of tree objects, so you could do:
> 
>   $ git reflog show TRASH ;# show the reflog message
>   $ git show TRASH@{1} ;# show what's in the tree
>   $ git show TRASH@{1}:path/to/file ;# see a file
>   $ git checkout TRASH@{1} path/to/file ;# restore a file
> 
> which should all work as-is.
> 
> I suspect "git log -g" might need some tweaking to get a tree rather
> than a commit (but in theory we should just show the "Reflog *:" headers
> and not the commit headers).

Even simpler (for the user) would be to create a full commit with a 
synthetic message.  The advantage is that the commit would have for 
parent the HEAD commit at the moment the operation leading to the reflog 
entry was made, with the date tag, etc.  The message could even contain 
information about what exactly was discarded by the operation.  This 
commit would be referenced only by the trash reflog and not be part of 
any branch. And all your examples above should still work of course.


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 18:04                   ` Nicolas Pitre
@ 2009-05-20 18:08                     ` Sverre Rabbelier
  2009-05-21  3:47                     ` Jeff King
  1 sibling, 0 replies; 28+ messages in thread
From: Sverre Rabbelier @ 2009-05-20 18:08 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jeff King, Git List, Wincent Colaiuta, Joshua Jensen

Heya,

On Wed, May 20, 2009 at 20:04, Nicolas Pitre <nico@cam.org> wrote:
> Even simpler (for the user) would be to create a full commit with a
> synthetic message.

FWIW, I really like this idea, and it would automagically put the
'trash reflog' into the reach of git's
awesome-cherry-pick-and-the-like-commit-handling powers.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 17:59             ` Junio C Hamano
@ 2009-05-20 18:19               ` Nicolas Pitre
  2009-05-20 18:25                 ` Nicolas Pitre
                                   ` (2 more replies)
  2009-05-20 18:21               ` Jakub Narebski
  1 sibling, 3 replies; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20 18:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn O. Pearce, Wincent Colaiuta, Jeff King, Joshua Jensen, git

On Wed, 20 May 2009, Junio C Hamano wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > You did say "uncommitted entry causes reflog append", so in Peff's
> > original example of "git add a; vi a; git add a", we should be
> > creating a reflog entry for that first added state, which is clearly
> > not a disagreement.
> >
> > FWIW, I think this is a great idea, but lack the time to code it
> > myself, otherwise I probably would start hacking on it right now.
> 
> The devil's in the details.  There are at least four things you would need
> to design before start hacking.
> 
>  (0) Do you want this to apply only to Porcelains, or do you want to use
>      this for plumbing operations as well?

Depends if current plumbing already takes care of reflogs for normal 
operations.

>  (1) When would you "auto" write-tree?  When you do "git add" or anything
>      that adds new contents to the index?  Or immediately before you do
>      something destructive like "git reset"?  Or perhaps both?

Delaying any IO until it is clear that something is to be discarded is 
the best approach performance wise.  So perhaps not on the first 'git 
add' but certainly on the second one with an already cached path for 
which new (different) content is going to replace previous content.

>  (2) Enumerate the operations that falls into the category you decided in
>      the above question.  For example, "git apply --index" and "git apply
>      --cached" would fall into the same category as "git add".  If you
>      cover plumbing, you would also need to cover "git update-index".

Plumbing should probably have the mechanism to create those trash 
reflogs if desired, but not active by default.  Plumbing is normally for 
scripts, and once a script is debugged it shouldn't discard data by 
mistake.  Let's not forget that this is a feature for mistake inducing 
humans.

>  (3) What should happen when you cannot write the index out as a tree?  I
>      think it is easier to make mistakes during a conflicted merge
>      resolution than during a straight linear development of your own, and
>      one of the cases that would benefit most would be that you have
>      resolved a path to your satisfaction but then later you screw up
>      while resolving some other paths, losing an earlier resolution.

This one is tricky.  Maybe storing two reflog entries corresponding to 
the unresolved stages?


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 17:59             ` Junio C Hamano
  2009-05-20 18:19               ` Nicolas Pitre
@ 2009-05-20 18:21               ` Jakub Narebski
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2009-05-20 18:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn O. Pearce, Nicolas Pitre, Wincent Colaiuta, Jeff King,
	Joshua Jensen, git

Junio C Hamano <gitster@pobox.com> writes:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > You did say "uncommitted entry causes reflog append", so in Peff's
> > original example of "git add a; vi a; git add a", we should be
> > creating a reflog entry for that first added state, which is clearly
> > not a disagreement.
> >
> > FWIW, I think this is a great idea, but lack the time to code it
> > myself, otherwise I probably would start hacking on it right now.
> 
> The devil's in the details.  There are at least four things you would need
> to design before start hacking.
> 
>  (0) Do you want this to apply only to Porcelains, or do you want to use
>      this for plumbing operations as well?
> 
>  (1) When would you "auto" write-tree?  When you do "git add" or anything
>      that adds new contents to the index?  Or immediately before you do
>      something destructive like "git reset"?  Or perhaps both?
> 
>  (2) Enumerate the operations that falls into the category you decided in
>      the above question.  For example, "git apply --index" and "git apply
>      --cached" would fall into the same category as "git add".  If you
>      cover plumbing, you would also need to cover "git update-index".

I guess that if we will not be able to resolve performance issues,
then such safety net would be of necessity limited only to highly
destructive commands that are more likely to destroy huge amount of
work if handled not carefully.

> 
>  (3) What should happen when you cannot write the index out as a tree?  I
>      think it is easier to make mistakes during a conflicted merge
>      resolution than during a straight linear development of your own, and
>      one of the cases that would benefit most would be that you have
>      resolved a path to your satisfaction but then later you screw up
>      while resolving some other paths, losing an earlier resolution.

Hmmm... because of (3), and because of performance concerns,
especially for the deep hierarchy, perhaps it would be better to use
here backup copy of index, or to save disk space just a delta to index.
On the other hand this would introduce yet another mechanism...

BTW., ad (3): how stash deals with conflicted merge? Hmmm... it doesn't

 $ git stash
 foo: needs merge
 foo: needs merge
 foo: unmerged (257cc5642cb1a054f08cc83f2d943e56fd3ebe99)
 foo: unmerged (b7d6715e2df11b9c32b2341423273c6b3ad9ae8a)
 foo: unmerged (5716ca5987cbf97d6bb54920bea6adde242d87e6)
 fatal: git-write-tree: error building trees
 Cannot save the current index state

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 18:19               ` Nicolas Pitre
@ 2009-05-20 18:25                 ` Nicolas Pitre
  2009-05-20 18:57                 ` Shawn O. Pearce
  2009-05-21  6:16                 ` Junio C Hamano
  2 siblings, 0 replies; 28+ messages in thread
From: Nicolas Pitre @ 2009-05-20 18:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn O. Pearce, Wincent Colaiuta, Jeff King, Joshua Jensen, git

On Wed, 20 May 2009, Nicolas Pitre wrote:

> On Wed, 20 May 2009, Junio C Hamano wrote:
> 
> >  (3) What should happen when you cannot write the index out as a tree?  I
> >      think it is easier to make mistakes during a conflicted merge
> >      resolution than during a straight linear development of your own, and
> >      one of the cases that would benefit most would be that you have
> >      resolved a path to your satisfaction but then later you screw up
> >      while resolving some other paths, losing an earlier resolution.
> 
> This one is tricky.  Maybe storing two reflog entries corresponding to 
> the unresolved stages?

BTW, what happens if one tries to 'git stash' an unresolved index?  So 
far this trash reflog and git stash have quite in common, and stashing 
an unresolved index certainly has value too.


Nicolas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] write-tree --ignore-cache-tree
  2009-05-20 17:50     ` Junio C Hamano
@ 2009-05-20 18:27       ` Junio C Hamano
  2009-05-21  0:40         ` [PATCH 1/2] cache-tree.c::cache_tree_find(): simplify inernal API Junio C Hamano
  2009-05-21  0:44         ` [PATCH 2/2] Optimize "diff-index --cached" using cache-tree Junio C Hamano
  0 siblings, 2 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-05-20 18:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Nicolas Pitre, Joshua Jensen

This allows you to discard the cache-tree information before writing the
tree out of the index (i.e. it always recomputes the tree object names for
all the subtrees).

This is only useful as a debug option, so I did not bother documenting it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-write-tree.c |   12 +++++++++---
 cache-tree.c         |   10 +++++++---
 cache-tree.h         |    7 ++++++-
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/builtin-write-tree.c b/builtin-write-tree.c
index 9d64050..3a24ce8 100644
--- a/builtin-write-tree.c
+++ b/builtin-write-tree.c
@@ -13,7 +13,7 @@ static const char write_tree_usage[] =
 
 int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
 {
-	int missing_ok = 0, ret;
+	int flags = 0, ret;
 	const char *prefix = NULL;
 	unsigned char sha1[20];
 	const char *me = "git-write-tree";
@@ -22,9 +22,15 @@ int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
 	while (1 < argc) {
 		const char *arg = argv[1];
 		if (!strcmp(arg, "--missing-ok"))
-			missing_ok = 1;
+			flags |= WRITE_TREE_MISSING_OK;
 		else if (!prefixcmp(arg, "--prefix="))
 			prefix = arg + 9;
+		else if (!prefixcmp(arg, "--ignore-cache-tree"))
+			/*
+			 * This is only useful for debugging, so I
+			 * do not bother documenting it.
+			 */
+			flags |= WRITE_TREE_IGNORE_CACHE_TREE;
 		else
 			usage(write_tree_usage);
 		argc--; argv++;
@@ -33,7 +39,7 @@ int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
 	if (argc > 2)
 		die("too many options");
 
-	ret = write_cache_as_tree(sha1, missing_ok, prefix);
+	ret = write_cache_as_tree(sha1, flags, prefix);
 	switch (ret) {
 	case 0:
 		printf("%s\n", sha1_to_hex(sha1));
diff --git a/cache-tree.c b/cache-tree.c
index 37bf35e..6dd8411 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -538,28 +538,32 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat
 	return it;
 }
 
-int write_cache_as_tree(unsigned char *sha1, int missing_ok, const char *prefix)
+int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
 {
 	int entries, was_valid, newfd;
+	struct lock_file *lock_file;
 
 	/*
 	 * We can't free this memory, it becomes part of a linked list
 	 * parsed atexit()
 	 */
-	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
+	lock_file = xcalloc(1, sizeof(struct lock_file));
 
 	newfd = hold_locked_index(lock_file, 1);
 
 	entries = read_cache();
 	if (entries < 0)
 		return WRITE_TREE_UNREADABLE_INDEX;
+	if (flags & WRITE_TREE_IGNORE_CACHE_TREE)
+		cache_tree_free(&(active_cache_tree));
 
 	if (!active_cache_tree)
 		active_cache_tree = cache_tree();
 
 	was_valid = cache_tree_fully_valid(active_cache_tree);
-
 	if (!was_valid) {
+		int missing_ok = flags & WRITE_TREE_MISSING_OK;
+
 		if (cache_tree_update(active_cache_tree,
 				      active_cache, active_nr,
 				      missing_ok, 0) < 0)
diff --git a/cache-tree.h b/cache-tree.h
index e958835..eadcad8 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -30,11 +30,16 @@ struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 int cache_tree_fully_valid(struct cache_tree *);
 int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int);
 
+/* bitmasks to write_cache_as_tree flags */
+#define WRITE_TREE_MISSING_OK 1
+#define WRITE_TREE_IGNORE_CACHE_TREE 2
+
+/* error return codes */
 #define WRITE_TREE_UNREADABLE_INDEX (-1)
 #define WRITE_TREE_UNMERGED_INDEX (-2)
 #define WRITE_TREE_PREFIX_ERROR (-3)
 
-int write_cache_as_tree(unsigned char *sha1, int missing_ok, const char *prefix);
+int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix);
 void prime_cache_tree(struct cache_tree **, struct tree *);
 
 #endif
-- 
1.6.3.1.56.gd00e3.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 18:19               ` Nicolas Pitre
  2009-05-20 18:25                 ` Nicolas Pitre
@ 2009-05-20 18:57                 ` Shawn O. Pearce
  2009-05-21  6:16                 ` Junio C Hamano
  2 siblings, 0 replies; 28+ messages in thread
From: Shawn O. Pearce @ 2009-05-20 18:57 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, Wincent Colaiuta, Jeff King, Joshua Jensen, git

Nicolas Pitre <nico@cam.org> wrote:
> On Wed, 20 May 2009, Junio C Hamano wrote:
> > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > 
> > > You did say "uncommitted entry causes reflog append", so in Peff's
> > > original example of "git add a; vi a; git add a", we should be
> > > creating a reflog entry for that first added state, which is clearly
> > > not a disagreement.
> > >
> > > FWIW, I think this is a great idea, but lack the time to code it
> > > myself, otherwise I probably would start hacking on it right now.
> > 
> > The devil's in the details.  There are at least four things you would need
> > to design before start hacking.
> > 
> >  (0) Do you want this to apply only to Porcelains, or do you want to use
> >      this for plumbing operations as well?
> 
> Depends if current plumbing already takes care of reflogs for normal 
> operations.

git update-ref and git symbolic-ref, both plumbing, update the
reflog automatically.

Why?  Because that was the best way to ensure we always caught the
change, so we didn't miss updates and have a missing but important
state.

However, git rebase was moved from operating on its branch to work
on detached HEAD to avoid polluting the branch reflog with all of
the intermediate states.  So, here is a case where porcelain had to
be changed after the fact to work around the log-by-default nature
of the plumbing.
 
> >  (2) Enumerate the operations that falls into the category you decided in
> >      the above question.  For example, "git apply --index" and "git apply
> >      --cached" would fall into the same category as "git add".  If you
> >      cover plumbing, you would also need to cover "git update-index".
> 
> Plumbing should probably have the mechanism to create those trash 
> reflogs if desired, but not active by default.  Plumbing is normally for 
> scripts, and once a script is debugged it shouldn't discard data by 
> mistake.  Let's not forget that this is a feature for mistake inducing 
> humans.

git gui calls plumbing.  It uses git update-index directly.  IMHO,
git gui is porcelain, and wants this log, if its available.  But,
by the same token, git filter-branch is porcelain, and probably
does *not* want this logging on the index updates it makes.

So, if its off by default, we really need a way to allow git gui
to force it on.

But... given that it is a safety measure, I would prefer to make it
on by default, and permit scripts to disable it, and update tools
like git filter-branch to disable it, because that fits with our
other rules to try and err on the side of caution for the user's
data when given the choice between raw speed and some minor safety.
(E.g. we fsync objects to disk for safety, not because its fast.)
 
> >  (3) What should happen when you cannot write the index out as a tree?  I
> >      think it is easier to make mistakes during a conflicted merge
> >      resolution than during a straight linear development of your own, and
> >      one of the cases that would benefit most would be that you have
> >      resolved a path to your satisfaction but then later you screw up
> >      while resolving some other paths, losing an earlier resolution.
> 
> This one is tricky.  Maybe storing two reflog entries corresponding to 
> the unresolved stages?

Yea, and as was pointed out, git stash doesn't support this, but
it probably should be able to do it, assuming the index could write
out 3 (or maybe 4) trees in this case, one for each stage.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] cache-tree.c::cache_tree_find(): simplify inernal API
  2009-05-20 18:27       ` [PATCH] write-tree --ignore-cache-tree Junio C Hamano
@ 2009-05-21  0:40         ` Junio C Hamano
  2009-05-21  0:44         ` [PATCH 2/2] Optimize "diff-index --cached" using cache-tree Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-05-21  0:40 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Nicolas Pitre, Joshua Jensen

Earlier cache_tree_find() needs to be called with a valid cache_tree,
but repeated look-up may find an invalid or missing cache_tree in between.
Help simplify the callers by returning NULL to mean "nothing appropriate
found" when the input is NULL.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is a preliminary clean-up patch.  What comes next is much bigger.

 cache-tree.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 6dd8411..5481e43 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -514,6 +514,8 @@ struct cache_tree *cache_tree_read(const char *buffer, unsigned long size)
 
 static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *path)
 {
+	if (!it)
+		return NULL;
 	while (*path) {
 		const char *slash;
 		struct cache_tree_sub *sub;
-- 
1.6.3.1.56.gd00e3.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] Optimize "diff-index --cached" using cache-tree
  2009-05-20 18:27       ` [PATCH] write-tree --ignore-cache-tree Junio C Hamano
  2009-05-21  0:40         ` [PATCH 1/2] cache-tree.c::cache_tree_find(): simplify inernal API Junio C Hamano
@ 2009-05-21  0:44         ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-05-21  0:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Nicolas Pitre, Joshua Jensen

When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is what I've been wanting to do for a loooooooong time, ever since
   I introduced the cache-tree extension to the index data structure.
   Even though "diff --cached HEAD" is probably one of the least often
   used form of the diff from end users' point of view (you would run at
   most once before making an actual commit to give it a final review), it
   is more often used than a casual observer realizes, because it is the
   way Porcelains check what changes, if any, are staged.

 cache-tree.c   |   32 ++++++++++++++++++++++++++++++++
 cache-tree.h   |    3 +++
 diff-lib.c     |    2 ++
 unpack-trees.c |   17 +++++++++++++++++
 unpack-trees.h |    1 +
 5 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5481e43..16a65df 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -631,3 +631,35 @@ void prime_cache_tree(struct cache_tree **it, struct tree *tree)
 	*it = cache_tree();
 	prime_cache_tree_rec(*it, tree);
 }
+
+/*
+ * find the cache_tree that corresponds to the current level without
+ * exploding the full path into textual form.  The root of the
+ * cache tree is given as "root", and our current level is "info".
+ * (1) When at root level, info->prev is NULL, so it is "root" itself.
+ * (2) Otherwise, find the cache_tree that corresponds to one level
+ *     above us, and find ourselves in there.
+ */
+static struct cache_tree *find_cache_tree_from_traversal(struct cache_tree *root,
+							 struct traverse_info *info)
+{
+	struct cache_tree *our_parent;
+
+	if (!info->prev)
+		return root;
+	our_parent = find_cache_tree_from_traversal(root, info->prev);
+	return cache_tree_find(our_parent, info->name.path);
+}
+
+int cache_tree_matches_traversal(struct cache_tree *root,
+				 struct name_entry *ent,
+				 struct traverse_info *info)
+{
+	struct cache_tree *it;
+
+	it = find_cache_tree_from_traversal(root, info);
+	it = cache_tree_find(it, ent->path);
+	if (it && it->entry_count > 0 && !hashcmp(ent->sha1, it->sha1))
+		return it->entry_count;
+	return 0;
+}
diff --git a/cache-tree.h b/cache-tree.h
index eadcad8..3df641f 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -2,6 +2,7 @@
 #define CACHE_TREE_H
 
 #include "tree.h"
+#include "tree-walk.h"
 
 struct cache_tree;
 struct cache_tree_sub {
@@ -42,4 +43,6 @@ int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int)
 int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix);
 void prime_cache_tree(struct cache_tree **, struct tree *);
 
+extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
+
 #endif
diff --git a/diff-lib.c b/diff-lib.c
index a310fb2..1cb97af 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -446,6 +446,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	memset(&opts, 0, sizeof(opts));
 	opts.head_idx = 1;
 	opts.index_only = cached;
+	opts.diff_index_cached = cached;
 	opts.merge = 1;
 	opts.fn = oneway_diff;
 	opts.unpack_data = revs;
@@ -502,6 +503,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
 	memset(&opts, 0, sizeof(opts));
 	opts.head_idx = 1;
 	opts.index_only = 1;
+	opts.diff_index_cached = 1;
 	opts.merge = 1;
 	opts.fn = oneway_diff;
 	opts.unpack_data = &revs;
diff --git a/unpack-trees.c b/unpack-trees.c
index aaacaf1..8eb3ddb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -326,6 +326,23 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			if (src[0])
 				conflicts |= 1;
 		}
+
+		/* special case: "diff-index --cached" looking at a tree */
+		if (o->diff_index_cached &&
+		    n == 1 && dirmask == 1 && S_ISDIR(names->mode)) {
+			int matches;
+			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
+							       names, info);
+			/*
+			 * Everything under the name matches.  Adjust o->pos to
+			 * skip the entire hierarchy.
+			 */
+			if (matches) {
+				o->pos += matches;
+				return mask;
+			}
+		}
+
 		if (traverse_trees_recursive(n, dirmask, conflicts,
 					     names, info) < 0)
 			return -1;
diff --git a/unpack-trees.h b/unpack-trees.h
index 0d26f3d..1e0e232 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -27,6 +27,7 @@ struct unpack_trees_options {
 		     aggressive:1,
 		     skip_unmerged:1,
 		     initial_checkout:1,
+		     diff_index_cached:1,
 		     gently:1;
 	const char *prefix;
 	int pos;
-- 
1.6.3.1.56.gd00e3.dirty

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 18:04                   ` Nicolas Pitre
  2009-05-20 18:08                     ` Sverre Rabbelier
@ 2009-05-21  3:47                     ` Jeff King
  1 sibling, 0 replies; 28+ messages in thread
From: Jeff King @ 2009-05-21  3:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Sverre Rabbelier, Git List, Wincent Colaiuta, Joshua Jensen

On Wed, May 20, 2009 at 02:04:58PM -0400, Nicolas Pitre wrote:

> > I was envisioning a reflog of tree objects, so you could do:
> [...]
> Even simpler (for the user) would be to create a full commit with a 
> synthetic message.  The advantage is that the commit would have for 
> parent the HEAD commit at the moment the operation leading to the reflog 
> entry was made, with the date tag, etc.  The message could even contain 

I think you are right that this is better. I was trying to save the
creation of an extra commit object, but it probably is not that big a
deal (as Junio mentioned, we may already be creating multiple tree
objects). And having parent information is much richer for doing actual
merges of the changes introduced by that reflog commit, not just pulling
out individual pieces of its state.

So forget my idea of storing trees; it should definitely be commits.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Reverting an uncommitted revert
  2009-05-20 18:19               ` Nicolas Pitre
  2009-05-20 18:25                 ` Nicolas Pitre
  2009-05-20 18:57                 ` Shawn O. Pearce
@ 2009-05-21  6:16                 ` Junio C Hamano
  2 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2009-05-21  6:16 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn O. Pearce, Wincent Colaiuta, Jeff King, Joshua Jensen, git

Nicolas Pitre <nico@cam.org> writes:

>>  (1) When would you "auto" write-tree?  When you do "git add" or anything
>>      that adds new contents to the index?  Or immediately before you do
>>      something destructive like "git reset"?  Or perhaps both?
>
> Delaying any IO until it is clear that something is to be discarded is 
> the best approach performance wise.  So perhaps not on the first 'git 
> add' but certainly on the second one with an already cached path for 
> which new (different) content is going to replace previous content.

I hope that you are not forgetting that it costs cycles to determine "it
is clear that something is to be discarded".

The "diff-index --cached" optimization I did today may help, as it would
allow you to check with the last trash-log more cheaply, but it certainly
is not free.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2009-05-21  6:16 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-20  2:34 Reverting an uncommitted revert Joshua Jensen
2009-05-20  3:10 ` Nicolas Pitre
2009-05-20  3:21   ` Jeff King
2009-05-20  3:35     ` Nicolas Pitre
2009-05-20  3:38       ` Jeff King
2009-05-20  4:58       ` Ping Yin
2009-05-20  9:15       ` Wincent Colaiuta
2009-05-20 10:16         ` Jakub Narebski
2009-05-20 12:53         ` Nicolas Pitre
2009-05-20 14:17           ` Shawn O. Pearce
2009-05-20 16:55             ` Eric Raible
2009-05-20 17:59             ` Junio C Hamano
2009-05-20 18:19               ` Nicolas Pitre
2009-05-20 18:25                 ` Nicolas Pitre
2009-05-20 18:57                 ` Shawn O. Pearce
2009-05-21  6:16                 ` Junio C Hamano
2009-05-20 18:21               ` Jakub Narebski
2009-05-20 15:23           ` Wincent Colaiuta
2009-05-20 15:47             ` Nicolas Pitre
2009-05-20 16:13               ` Sverre Rabbelier
2009-05-20 16:58                 ` Jeff King
2009-05-20 18:04                   ` Nicolas Pitre
2009-05-20 18:08                     ` Sverre Rabbelier
2009-05-21  3:47                     ` Jeff King
2009-05-20 17:50     ` Junio C Hamano
2009-05-20 18:27       ` [PATCH] write-tree --ignore-cache-tree Junio C Hamano
2009-05-21  0:40         ` [PATCH 1/2] cache-tree.c::cache_tree_find(): simplify inernal API Junio C Hamano
2009-05-21  0:44         ` [PATCH 2/2] Optimize "diff-index --cached" using cache-tree Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.