All of lore.kernel.org
 help / color / mirror / Atom feed
* Git Notes idea.
@ 2008-12-16  8:15 Govind Salinas
  2008-12-16  8:51 ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Govind Salinas @ 2008-12-16  8:15 UTC (permalink / raw)
  To: Git Mailing List

Hi All,

I was thinking about possible ideas for my little pet project and I
had and idea for way to tack on notes to a commit, or any object
really.  I know that the idea has been flying around for a long time
but there has never been any implementation or a concept that people
liked enough to use (unless I have missed something).

Here is my idea.

.git/refs/notes  contains a tree-id (assuming that using a tree-id
will not cause any problems, otherwise a commit object can be used.
it does not *need* a history, but it *could* have one).

That tree has a structure similar to the layout of .git/objects, where
it is 2 letter subdirectories for the notes objects.

Given a git object (commit, tree, blob, tag), use its sha as the
path/filename in this tree.
    If I have a commit 1234567890123456789012345678901234567890 then
the notes tree will have a file
12/34567890123456789012345678901234567890

That file has a list of sha1s (one per line).  These shas are object
IDs for blobs that have the notes or whatever that you want attached
to the item.

I think you get the idea.  When looking up an item, it should be
fairly easy to have the notes tree and subtrees available for doing
lookups.  And as far as I know stuff under .git/refs can be
pushed/pulled even if its not under heads or remotes or tags using
already existing machinery.  I am not sure, but I think that would
satisfy gc operations as well.  Also, these trees and blobs never have
to be put in the working directory.

Does this sound like something that is workable?  I thought it might
appeal since it uses only features that are already present.

This could be extended so that you have different sets of notes under
.git/refs/notes/<my note set> or whatever.  So that you can have some
notes you keep private and some that you publish or whatever.

OK, hopefully this isn't a off the wall,
thats-what-you-get-for-being-up-at-2-AM idea.

Thanks,
Govind.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16  8:15 Git Notes idea Govind Salinas
@ 2008-12-16  8:51 ` Jeff King
  2008-12-16  8:53   ` Jeff King
                     ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Jeff King @ 2008-12-16  8:51 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Git Mailing List

On Tue, Dec 16, 2008 at 02:15:47AM -0600, Govind Salinas wrote:

> I was thinking about possible ideas for my little pet project and I
> had and idea for way to tack on notes to a commit, or any object
> really.  I know that the idea has been flying around for a long time
> but there has never been any implementation or a concept that people
> liked enough to use (unless I have missed something).

I think you look at the previous suggestions, because yours is very
similar. Which is good, I think, because the current status is that the
design is good, but nobody has gotten around to working on it yet. So
maybe you can fix that. :)

> .git/refs/notes  contains a tree-id (assuming that using a tree-id
> will not cause any problems, otherwise a commit object can be used.
> it does not *need* a history, but it *could* have one).

That is the same as the current proposal, except:

 - the proposal is to use a commit, so your notes are version-controlled

 - I have suggested supporting multiple note "bases" in the refs/notes
   namespace. This would allow you to share some notes but not others
   (e.g., if you had some automated notes related to a build/test
   system, you might not want to mix those with your human-written
   notes).

> That tree has a structure similar to the layout of .git/objects, where
> it is 2 letter subdirectories for the notes objects.

I don't think this has been suggested yet, but I'm not sure it is a good
idea. The usual reason for this split is that many filesystems handle
large directories badly; that isn't a problem here.

It does reduce the size of the resulting tree objects when a note is
modified (we make updates to two smaller trees instead of one big tree).
I don't know if this really matters all that much, since the trees
will end up deltified in a pack anyway.

And it does make the implementation slightly less simple, since we have
to deal with two levels of trees.

> Given a git object (commit, tree, blob, tag), use its sha as the
> path/filename in this tree.
>     If I have a commit 1234567890123456789012345678901234567890 then
> the notes tree will have a file
> 12/34567890123456789012345678901234567890
> 
> That file has a list of sha1s (one per line).  These shas are object
> IDs for blobs that have the notes or whatever that you want attached
> to the item.

This is slightly different than the current proposal. You are proposing
that each item have a "list of notes". My thinking was to have "named
notes" using a tree for each entry full of blobs. So you could look up
the "foo" note for a given commit, but that note would be a single
scalar (which could, of course, be interpreted according to its name).

> I think you get the idea.  When looking up an item, it should be
> fairly easy to have the notes tree and subtrees available for doing
> lookups.  And as far as I know stuff under .git/refs can be

It is easy, but it's slow because we have to do a linear search in the
(potentially huge) notes tree. And that's what held up the initial
implementation. I did a proof-of-concept a month or so ago that
pre-seeded an in-memory hash using the tree contents and got pretty
reasonable performance.

> pushed/pulled even if its not under heads or remotes or tags using
> already existing machinery.  I am not sure, but I think that would
> satisfy gc operations as well.  Also, these trees and blobs never have
> to be put in the working directory.

Right, though I think one of the benefits of this approach is that you
_could_ do a checkout on the notes tree if you wanted to do very
flexible editing.

> Does this sound like something that is workable?  I thought it might
> appeal since it uses only features that are already present.

Yes, it sounds workable, though if you diverge from what has already
been discussed, I think you should make an argument about why your
approach is better.

> This could be extended so that you have different sets of notes under
> .git/refs/notes/<my note set> or whatever.  So that you can have some
> notes you keep private and some that you publish or whatever.

Oops, I should have read your whole mail. Yes, that is a good idea. :)

For reference, here are the previous discussions that I think are
relevant:

  Johan Herland's original notes proposal (which I think is largely
  dead, replaced by the one below):
  http://thread.gmane.org/gmane.comp.version-control.git/46770

  Johannes Schindelin's notes proposal (which is more or less the
  current proposal, but I think the on-disk notes index was not
  well liked):
  http://thread.gmane.org/gmane.comp.version-control.git/52598

  My test with using a hash to speed it up:
  http://article.gmane.org/gmane.comp.version-control.git/99415

  Some discussion of the interaction of notes and rebase:
  http://thread.gmane.org/gmane.comp.version-control.git/100533

  Some thoughts from me on naming issues:
  http://article.gmane.org/gmane.comp.version-control.git/100402

  Some thoughts from me on the tree speedup:
  http://article.gmane.org/gmane.comp.version-control.git/101460

which I think should bring you up to speed. :)

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16  8:51 ` Jeff King
@ 2008-12-16  8:53   ` Jeff King
  2008-12-16 18:43   ` Govind Salinas
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2008-12-16  8:53 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Git Mailing List

On Tue, Dec 16, 2008 at 03:51:08AM -0500, Jeff King wrote:

> I think you look at the previous suggestions, because yours is very

Sorry, there is a typo there. I meant "I think you _should_ look at the
previous suggestions." Not saying in broken English that you already
have looked at them.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16  8:51 ` Jeff King
  2008-12-16  8:53   ` Jeff King
@ 2008-12-16 18:43   ` Govind Salinas
  2008-12-16 23:48     ` Johannes Schindelin
  2008-12-17  9:38     ` Jeff King
  2008-12-17  0:12   ` rebasing commits that have notes, was " Johannes Schindelin
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
  3 siblings, 2 replies; 41+ messages in thread
From: Govind Salinas @ 2008-12-16 18:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Tue, Dec 16, 2008 at 2:51 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Dec 16, 2008 at 02:15:47AM -0600, Govind Salinas wrote:
>
>> I was thinking about possible ideas for my little pet project and I
>> had and idea for way to tack on notes to a commit, or any object
>> really.  I know that the idea has been flying around for a long time
>> but there has never been any implementation or a concept that people
>> liked enough to use (unless I have missed something).
>
> I think you look at the previous suggestions, because yours is very
> similar. Which is good, I think, because the current status is that the
> design is good, but nobody has gotten around to working on it yet. So
> maybe you can fix that. :)
>

I was thinking I would do my first implementation in pyrite and if I find
that it works well I will port it.

>> .git/refs/notes  contains a tree-id (assuming that using a tree-id
>> will not cause any problems, otherwise a commit object can be used.
>> it does not *need* a history, but it *could* have one).
>
> That is the same as the current proposal, except:
>
>  - the proposal is to use a commit, so your notes are version-controlled
>
>  - I have suggested supporting multiple note "bases" in the refs/notes
>   namespace. This would allow you to share some notes but not others
>   (e.g., if you had some automated notes related to a build/test
>   system, you might not want to mix those with your human-written
>   notes).
>
>> That tree has a structure similar to the layout of .git/objects, where
>> it is 2 letter subdirectories for the notes objects.
>
> I don't think this has been suggested yet, but I'm not sure it is a good
> idea. The usual reason for this split is that many filesystems handle
> large directories badly; that isn't a problem here.
>

I just read the proposal from Johannes, he seems to want to use a
similar layout.  However, I would like to modify my proposal slightly
to make it work better when a gc is run.  I would modify the tree to
look like this...

let 1234567890123456789012345678901234567890 be the
id of the item that is annotated.

let abcdef7890123456789012345678901234567890 be the
id of the note to be attached

root/
     12/
         34567890123456789012345678901234567890/
             abcdef7890123456789012345678901234567890

This way all the notes are attached to a tree, so that gc won't
think they are unreferenced objects.

> It does reduce the size of the resulting tree objects when a note is
> modified (we make updates to two smaller trees instead of one big tree).
> I don't know if this really matters all that much, since the trees
> will end up deltified in a pack anyway.
>
> And it does make the implementation slightly less simple, since we have
> to deal with two levels of trees.
>
>> Given a git object (commit, tree, blob, tag), use its sha as the
>> path/filename in this tree.
>>     If I have a commit 1234567890123456789012345678901234567890 then
>> the notes tree will have a file
>> 12/34567890123456789012345678901234567890
>>
>> That file has a list of sha1s (one per line).  These shas are object
>> IDs for blobs that have the notes or whatever that you want attached
>> to the item.
>
> This is slightly different than the current proposal. You are proposing
> that each item have a "list of notes". My thinking was to have "named
> notes" using a tree for each entry full of blobs. So you could look up
> the "foo" note for a given commit, but that note would be a single
> scalar (which could, of course, be interpreted according to its name).
>


>> I think you get the idea.  When looking up an item, it should be
>> fairly easy to have the notes tree and subtrees available for doing
>> lookups.  And as far as I know stuff under .git/refs can be
>
> It is easy, but it's slow because we have to do a linear search in the
> (potentially huge) notes tree. And that's what held up the initial
> implementation. I did a proof-of-concept a month or so ago that
> pre-seeded an in-memory hash using the tree contents and got pretty
> reasonable performance.
>

Perhaps I am missing something, how is it a linear search?.  Since we
are keying off of the sha of the annotated object, using a hashtable for
a cache should be a fairly quick binary search.  If you just wanted to
use the tree objects, that should work almost as well since the first tree
will split them up nicely for you.

Also, how large do you expect the list to be under reasonable
circumstances.

>> pushed/pulled even if its not under heads or remotes or tags using
>> already existing machinery.  I am not sure, but I think that would
>> satisfy gc operations as well.  Also, these trees and blobs never have
>> to be put in the working directory.
>
> Right, though I think one of the benefits of this approach is that you
> _could_ do a checkout on the notes tree if you wanted to do very
> flexible editing.
>

Sure, why not.

>> Does this sound like something that is workable?  I thought it might
>> appeal since it uses only features that are already present.
>
> Yes, it sounds workable, though if you diverge from what has already
> been discussed, I think you should make an argument about why your
> approach is better.
>

Well after reading Johannes proposal, I find it to be surprisingly similar
since I had not seen it before.  However I think mine is a win in a few ways.
One, it allows multiple notes per object.  Two, it plays well with gc.  From
what I could follow, the files in his layout just have a ref to the note object
but gc would be required to know about this feature and not remove those
note blobs.  Third it allows multiple sets of notes.  Although it seems that
at least 1 and 3 have been discussed at some length.

>> This could be extended so that you have different sets of notes under
>> .git/refs/notes/<my note set> or whatever.  So that you can have some
>> notes you keep private and some that you publish or whatever.
>
> Oops, I should have read your whole mail. Yes, that is a good idea. :)
>
> For reference, here are the previous discussions that I think are
> relevant:
>

Thanks for the pointers, a couple quick thoughts...

>  Some discussion of the interaction of notes and rebase:
>  http://thread.gmane.org/gmane.comp.version-control.git/100533
>

On rebasing, I have a couple thoughts.  1) I think it only really makes
sense to make a public annotation to a commit that is public, and
once a commit is public it should not be rebased.  2) We could also
annotate the commit's tree instead of the commit.  That would make
it somewhat resistant to rebases, cherry-picks and amends.  And
once a tree has changed, the notes are probably less reliable
although the user should be able to force a note or notes to be
carried along.

>  Some thoughts from me on naming issues:
>  http://article.gmane.org/gmane.comp.version-control.git/100402
>

On naming.  I strongly support a ref/notes/sha1/sha1 approach.  If
having a type to the note is important, then perhaps the first line of
a note could be considered a type or a set of "tags".  This way you
have both naming/typing and one lookup per sha.  The only
drawback is that you have to open the blob to see the type.  A hybrid
approach that uses refs/notes/acked/sha/sha which is one lookup if
you know the type and the sha of the annotated object before hand
might be worth considering.  This would be similar to the public or
private notes that i mentioned before.

>  Some thoughts from me on the tree speedup:
>  http://article.gmane.org/gmane.comp.version-control.git/101460
>

I guess I must be missing something.  I have seen several references
to this not being a binary search several times in the links that you have
here.  But I fail to see why a binary search cannot be done.  That said, I
would still think that the existing hash table would be the way to go.

> which I think should bring you up to speed. :)

Thanks again.

-Govind

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16 18:43   ` Govind Salinas
@ 2008-12-16 23:48     ` Johannes Schindelin
  2008-12-17  9:45       ` Jeff King
                         ` (2 more replies)
  2008-12-17  9:38     ` Jeff King
  1 sibling, 3 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-16 23:48 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Jeff King, Git Mailing List

Hi,

On Tue, 16 Dec 2008, Govind Salinas wrote:

> I was thinking I would do my first implementation in pyrite and if I 
> find that it works well I will port it.

Given that there are a lot of building blocks in C already, I think it 
would be a waste of your time.

> I just read the proposal from Johannes, he seems to want to use a 
> similar layout.  However, I would like to modify my proposal slightly to 
> make it work better when a gc is run.  I would modify the tree to look 
> like this...
> 
> let 1234567890123456789012345678901234567890 be the
> id of the item that is annotated.
> 
> let abcdef7890123456789012345678901234567890 be the
> id of the note to be attached
> 
> root/
>      12/
>          34567890123456789012345678901234567890/
>              abcdef7890123456789012345678901234567890
> 
> This way all the notes are attached to a tree, so that gc won't
> think they are unreferenced objects.

In my proposal back then, your root/12/345... would be a blob, in Peff's, 
it would be a tree, and in both cases the blobs/trees would be referenced 
by refs/notes, so git gc would not kill them either.

The bigger issue is that commit objects can be gc'ed, and then their notes 
should be gc'ed, too.

And of course, there is the rebase issue (which I completely missed; I 
will read the mail Jeff referenced tomorrow).

Speaking about the blobs vs trees issue, I think it is no issue, as Peff 
and me already discussed: the notes could check if it is a tree or a 
blob, and handle both easily.

> Peff wrote:
>
> > It is easy, but it's slow because we have to do a linear search in the 
> > (potentially huge) notes tree. And that's what held up the initial 
> > implementation. I did a proof-of-concept a month or so ago that 
> > pre-seeded an in-memory hash using the tree contents and got pretty 
> > reasonable performance.
> 
> Perhaps I am missing something, how is it a linear search?

Yes, you are missing what I wrote in the original thread: tree objects 
must be read in a forward direction, one by one.

IIRC back then, Junio and/or Linus suggested that you could backward 
search with heuristics finding the beginning of a tree entry, and thus you 
could kind of bisect the tree to search for a specific tree entry (since 
tree objects have the contents sorted), but I presented a case where this 
breaks down at the GitTogether:

Tree entries consist of a mode (as 6-byte ASCII representing the octal 
value), then SPC, then a NUL-terminated path, and then a 20-byte SHA-1.  
(Just hexdump the output of "git cat-file tree HEAD:" in any repository to 
see it).

The only heuristic you could apply to find your way backward (or forward) 
to find the beginning of a tree entry would be to find the NUL character 
and verify that exactly 20 bytes after it, either the tree object ends, or 
there is a valid octal number followed by a SPC.

The only thing you would need for this heuristic to break down is a SHA-1 
which contains a \x00 (which is then mistaken for a string termination), 
and part of a filename that could be mistaken to be an octal number.

Take for example the SHA-1 of git-gui in v1.6.0.5, which has a NUL its 
14th byte, and just assume that the next tree entry has mode 100644 
and name "4040000 Some financial record.txt".

Then, the NUL could be mistaken for the end of the previous tree entry, 
and "040000 " as the mode for the next one, whose name would be assumed to 
be "Some financial record.txt".

Granted, if you find two NULs in 21 bytes, one of them must be the 
termination of the path, but which one?

Worse, even if you would find a method (complicated, and therefore 
necessarily fragile) to find the boundary of the tree entries reliably, 
you would _still_ have a linear time unpacking the darned tree object in 
the first place.

So you cannot do that for every commit you encounter and expect not to die 
of boredom in the process.

Peff's very cute idea was to decouple that process from the per-commit 
procedure, and basically make it a one-time cost (per Git call, and only 
when notes were asked for).

It will still be linear in the number of notes, but it would then be in a 
hashmap, with an expected linear cost per commit.

> Also, how large do you expect the list to be under reasonable 
> circumstances.

We did not intend Git to be used as a backup tool, did we?

One of the _worst_ design decisions is to limit yourself by expectations.

> >  Some discussion of the interaction of notes and rebase:
> >  http://thread.gmane.org/gmane.comp.version-control.git/100533
> 
> On rebasing, I have a couple thoughts.
>
> 1) I think it only really makes sense to make a public annotation to a 
>    commit that is public, and once a commit is public it should not be 
>    rebased.

Again, do not limit your design by your expectations.  People already talk 
about having cover letters for their patch series as notes, and Pasky 
seems to discuss tracking explicit renames with notes when he does not 
play Go, instead of maintaining repo.or.cz and git.or.cz.

> 2) We could also annotate the commit's tree instead of the commit.  
>    That would make it somewhat resistant to rebases, cherry-picks and 
>    amends.

To the contrary.  When I rebase, the tree _does_ change, otherwise I would 
have rebased onto something that had the same original tree as my 
rebase-base to begin with, which would make the rebase rather pointless.

> >  Some thoughts from me on naming issues: 
> >  http://article.gmane.org/gmane.comp.version-control.git/100402
> 
> On naming.  I strongly support a ref/notes/sha1/sha1 approach.

I think you meant refs/notes:<first byte in hex>/<rest of bytes>/<some 
arbitrary SHA-1>?

I am rather supporting refs/nots:<first byte in hex>/<rest of bytes> being 
either a blob, or a tree containing human readable tags, such as "bugfix" 
or "review" or some such.

> If having a type to the note is important, then perhaps the first line 
> of a note could be considered a type or a set of "tags".

That would be horrible!  Just to know if you need to unpack the blob, 
you'd have to unpack it!

Hth,
Dscho

^ permalink raw reply	[flat|nested] 41+ messages in thread

* rebasing commits that have notes, was Re: Git Notes idea.
  2008-12-16  8:51 ` Jeff King
  2008-12-16  8:53   ` Jeff King
  2008-12-16 18:43   ` Govind Salinas
@ 2008-12-17  0:12   ` Johannes Schindelin
  2008-12-17  9:15     ` Johan Herland
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
  3 siblings, 1 reply; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-17  0:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List

Hi,

On Tue, 16 Dec 2008, Jeff King wrote:

>   Some discussion of the interaction of notes and rebase:
>   http://thread.gmane.org/gmane.comp.version-control.git/100533

Oh, I misinterpreted that label... of course you can track rebases in 
notes, but some issue that we did not look into yet (I think) is the issue 
that you can cherry-pick and rebase commits and lose notes in the process.

It seems that the notes idea is not that unintrusive as I thought...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: rebasing commits that have notes, was Re: Git Notes idea.
  2008-12-17  0:12   ` rebasing commits that have notes, was " Johannes Schindelin
@ 2008-12-17  9:15     ` Johan Herland
  2008-12-17 17:55       ` Stephan Beyer
  0 siblings, 1 reply; 41+ messages in thread
From: Johan Herland @ 2008-12-17  9:15 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King, Govind Salinas

On Wednesday 17 December 2008, Johannes Schindelin wrote:
> Hi,
>
> On Tue, 16 Dec 2008, Jeff King wrote:
> >   Some discussion of the interaction of notes and rebase:
> >   http://thread.gmane.org/gmane.comp.version-control.git/100533
>
> Oh, I misinterpreted that label... of course you can track rebases in
> notes, but some issue that we did not look into yet (I think) is the
> issue that you can cherry-pick and rebase commits and lose notes in the
> process.
>
> It seems that the notes idea is not that unintrusive as I thought...

So we have two issues here:

1. Using notes to annotate the rebase/cherry-pick action itself.

2. Preserving (or at least giving the user the option of preserving) notes 
across a rebase/cherry-pick.


I think issue #1 has already been discussed, and is largely resolved: People 
can do this if they want to; it probably only makes sense when 
rebasing/cherry-picking public branches; etc... AFAICS there are no 
remaining problems here that needs an intrusive solution (see below for one 
such unintrusive alternative).

Issue #2, however, is a little more involved. We can discuss the merits of 
wanting to preserve notes across a rebase/cherry-pick itself; e.g. when it 
makes sense to preserve notes, and when it doesn't make sense, but I think 
this is orthogonal to the issue of HOW to preserve them, so instead of 
focusing on WHY, I'll focus on HOW:

If notes are named according to the "refs/notes:<first byte in hex>/<rest of 
bytes>/<referenced object SHA-1>" scheme (and AFAICS this is still being 
discussed, so it's indeed a big IF), then rebase/cherry-pick of the 
referenced object simply translates to a rename/copy of the corresponding 
note (this is of course assuming that the note itself does not contain the 
SHA-1 of the referenced object). This could probably be solved fairly 
unintrusively in the current code, but there are (as always) complications:

- The user may want to amend the note after the rebase/cherry-pick (just as 
(s)he may want to amend the commit message).

- In some cases it may even make sense to fold (parts of) the note _into_ 
the commit message.

- probably more reasons...

So what about the following proposal: Add hooks that are invoked by 
rebase/cherry-pick with the <from-SHA1> and <to-SHA1> as arguments. A 
typical hook script can then use this information to look up notes 
referencing <from-SHA1> and update these to reference <to-SHA1> instead, 
and in the process, prompt the user to do whatever changes (s)he wants to. 
The hook scripts can do other things as well, e.g. implementing issue #1 
above (adding notes for annotating the rebase/cherry-pick itself.)


Have fun! :)

...Johan


PS: What's the current status on git-sequencer? It's probably the best place 
to invoke these hooks.

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16 18:43   ` Govind Salinas
  2008-12-16 23:48     ` Johannes Schindelin
@ 2008-12-17  9:38     ` Jeff King
  2008-12-17 17:06       ` Govind Salinas
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2008-12-17  9:38 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Git Mailing List, Johannes Schindelin

On Tue, Dec 16, 2008 at 12:43:55PM -0600, Govind Salinas wrote:

> I was thinking I would do my first implementation in pyrite and if I find
> that it works well I will port it.

OK, though your performance will probably suck unless you dump the notes
tree into a local hash at the beginning of your program. And looking up
every commit's note during revision traversal is one of the intended
uses (e.g., decorating git-log output, or filtering commits based on a
particular note).

And as Dscho mentioned, most of what you need is already there in C.
You are welcome to implement whatever you want in pyrite, of course, but
there is a desire to have this accessible to the revision traversal
machinery. And that means if you want your version in pyrite to be
compatible with what ends up in git, the data structure design needs to
be suitable for both.

> I just read the proposal from Johannes, he seems to want to use a
> similar layout.  However, I would like to modify my proposal slightly
> to make it work better when a gc is run.  I would modify the tree to
> look like this...
> 
> let 1234567890123456789012345678901234567890 be the
> id of the item that is annotated.
> 
> let abcdef7890123456789012345678901234567890 be the
> id of the note to be attached
> 
> root/
>      12/
>          34567890123456789012345678901234567890/
>              abcdef7890123456789012345678901234567890
> 
> This way all the notes are attached to a tree, so that gc won't
> think they are unreferenced objects.

But you have lost the ordering in your list, then, since they will not
be ordered by sha1 of the note contents. I don't know if you care. The
second sha1 is pointless, anyway, since nobody will know that number as
a reference; why not just name them monotonically starting at 1?

One of the things I don't like about having several notes is that it
introduces an extra level of indirection that every user has to pay for,
whether they want it or not. If a note can be a blob _or_ a tree, then
those who want to use blobs can reap the performance benefit. Those who
want multiple named notes in a hierarchy can pay the extra indirection
cost.

I haven't measured how big a cost that is (but bearing in mind that we
might want to do this lookup once per revision in a traversal, even one
extra object lookup can have an impact).

I'm also still not convinced the fan-out is worthwhile, but I can see
how it might be. It would be nice to see numbers for both.

> Perhaps I am missing something, how is it a linear search?.  Since we

I think Johannes explained in detail in another message, but it is a
linear search to look up directly in a tree object. Of course you can
build a hash or a sorted fixed-size list as an index.

> Also, how large do you expect the list to be under reasonable
> circumstances.

As many notes as there are commits is my goal (e.g., it is not hard to
imagine an automated process to add notes on build status). Ideally, we
could handle as many notes as there are objects; I see no reason not to
allow annotating arbitrary sha1's (I don't know if there is a use for
that, but the more scalable the implementation, the better).

> >  Some thoughts from me on naming issues:
> >  http://article.gmane.org/gmane.comp.version-control.git/100402
> 
> On naming.  I strongly support a ref/notes/sha1/sha1 approach.  If
> having a type to the note is important, then perhaps the first line of
> a note could be considered a type or a set of "tags".  This way you

I don't think we are talking about the same thing. What I mean by naming
is "here is a shorthand for referring to notes" that is not necessarily
coupled with the implementation. That is, I would like to do something
like:

  git log --notes-filter="foo:bar == 1"

and have that "foo:bar" as a shorthand on each commit for:

  refs/notes/foo:$COMMIT/bar

Without a left-hand side (e.g., "bar"), we get:

  refs/notes/default:$COMMIT/bar

Or without a right-hand side (e.g., "foo:"), we get:

  refs/notes/foo:$COMMIT

So you can group related notes in the same tree (which gives you fast
lookup if you are looking at multiple ones, since you only have to do
the tree lookup once), or you can keep notes in separate trees (which
means you can distribute some but not others).

I think your "list of notes" proposal on top of that would be for any
note resolution to provide a tree instead of a blob, with sequenced
elements. I.e., foo:bar might have multiple notes, like:

  refs/notes/foo:$COMMIT/bar/1
  refs/notes/foo:$COMMIT/bar/2
  refs/notes/foo:$COMMIT/bar/3

> drawback is that you have to open the blob to see the type.  A hybrid
> approach that uses refs/notes/acked/sha/sha which is one lookup if

Right, that is possible, but implementing only half of what I suggested
above. I think you should be flexible enough to have grouped notes for
fast lookup, or ungrouped notes for more flexibility.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16 23:48     ` Johannes Schindelin
@ 2008-12-17  9:45       ` Jeff King
       [not found]       ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com>
  2008-12-17 12:21       ` Petr Baudis
  2 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2008-12-17  9:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List

On Wed, Dec 17, 2008 at 12:48:02AM +0100, Johannes Schindelin wrote:

> > Perhaps I am missing something, how is it a linear search?
> 
> Yes, you are missing what I wrote in the original thread: tree objects 
> must be read in a forward direction, one by one.

Thanks, that was a nice writeup of what was discussed at the
GitTogether.

> Peff's very cute idea was to decouple that process from the per-commit 
> procedure, and basically make it a one-time cost (per Git call, and only 
> when notes were asked for).

To be fair, it was not my cute idea, but somebody else's (I think David
Reiss). I just coded it quickly because somebody was talking about
iPhones or something. :)

> To the contrary.  When I rebase, the tree _does_ change, otherwise I
> would have rebased onto something that had the same original tree as
> my rebase-base to begin with, which would make the rebase rather
> pointless.

Another fun option would be to put notes on patch-ids. That would of
course be horrifically slow to look up, but would survive many
cherry-picks and rebases (but not all, of course).

I don't know if that is useful or not, but I don't see any reason why
the discussed implementation would forbid it (somebody would just have
to implement the lookups at a useful spot).

> > On naming.  I strongly support a ref/notes/sha1/sha1 approach.
> I think you meant refs/notes:<first byte in hex>/<rest of bytes>/<some
> arbitrary SHA-1>?

You seem to be in favor of the fan-out. Out of curiosity, did you ever
do numbers on whether the fan-out is actually helpful?

> I am rather supporting refs/nots:<first byte in hex>/<rest of bytes>
> being either a blob, or a tree containing human readable tags, such as
> "bugfix" or "review" or some such.

Yes, I am in favor of the "tree or blob" idea, too. I want this to not
just be about "I am a person writing a note" but "arbitrary data
attached to an object after it has been created". And that means
thinking up front about managing the namespace.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
       [not found]         ` <alpine.DEB.1.00.0812170420560.14632@racer>
@ 2008-12-17 10:11           ` Jeff King
  2008-12-17 11:38             ` Johannes Schindelin
  2008-12-19 17:18             ` Govind Salinas
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2008-12-17 10:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List

On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote:

> > I agree, I haven't thought of any fix along these lines other than to 
> > make gc do the clean up.
> 
> I have, and IIRC I briefly mentioned it back then.  Basically, you will 
> have to add a "git notes gc" or some such, which basically reads in the 
> whole notes, traverses all reachable commits, marking the corresponding 
> notes, and then writes out all marked notes (leaving the other notes 
> behind).

I was thinking something similar, but I think it is even easier. Make
the rule "if we still have the object, then we still have the note".
That has three benefits:

 - implementation is simple: for each note $n, delete it unless
   has_sha1_file($n).

 - it handles notes on non-commit objects

 - it kills off notes when an object is _pruned_, not when it stops
   being _reachable_. So if I delete a branch with a commit found
   nowhere else, its notes will hang around until it is actually pruned.
   If I pull it from lost+found, I still keep the notes.

Note that all of this garbage collection of notes is really just
removing them from the most current notes _tree_. If the notes structure
is actually composed of commits, then old notes that are "deleted" will
still be available historically.

> I wonder why you speak as if none of that had been implemented yet.  From 
> my work, it is obvious that hashtable is better than sorted list (except 
> for the fan-out, which obviously wants to be an array), and from Peff's it 
> is obvious that we want to keep the hashtables in memory.

If he is planning on doing a separate pyrite implementation, then it
_hasn't_ been implemented yet. And I don't care there if he uses
hash tables or sorted lists or whatever. I think the most important
thing is getting down the design of the _data structure_, so that we can
have a compatible implementation inside git itself.

> > IMO notes are just a generallized tag.
> 
> IMO notes have nothing to do with a tag.  Tags point to commits (or other 
> objects, but that's beside the point here).  Notes are pointed _to_ by 
> commits.

I think maybe we are just talking about semantics, but I think notes are
not pointed to by commits. There is an external mapping of commits to
notes, which is very different. I can give you the commit without you
knowing the notes, or that the notes even exist.

But in practice, I don't know if this distinction is going to influence
any of the design or use.

> Has the tree changed?  Sure it has.  Because Junio committed and pushed 
> some changes.

I think it is safe to say the tree generally changes for rebase, but not
necessarily for something like an amended commit, or a pull of a patch
sent by mail. So there are times when it changes, and times when it
doesn't.

And if there were some simple way of handling the times when it didn't
change at no general cost, I think going that way would be fine. But:

> And the worst part about your idea to attach notes to trees rather than 
> commits:  For things like Acked-by:... you very much want to annotate the 
> commit, _not_ the tree.  The tree is useless here.  It says nothing about 
> the patch, nothing about the explanation, and nothing about the history.

This is a huge cost, IMHO. I think you generally want to annotate
commits, not trees, and the semantic difference is important (but again,
I think all of the proposals are capable of doing either -- but if you
want a "show me the notes on this commit" feature to interoperate, it
needs to pick one).

> > root/
> >      12/
> >          34567890123456789012345678901234567890/
> >              <type>
> 
> Funny.  That is Peff's proposal.

Clearly we have independent verification that it's a good idea. ;)

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 10:11           ` Jeff King
@ 2008-12-17 11:38             ` Johannes Schindelin
  2008-12-17 19:20               ` Junio C Hamano
  2008-12-19 17:42               ` Govind Salinas
  2008-12-19 17:18             ` Govind Salinas
  1 sibling, 2 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-17 11:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List

Hi,

On Wed, 17 Dec 2008, Jeff King wrote:

> On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote:
> 
> > > I agree, I haven't thought of any fix along these lines other than 
> > > to make gc do the clean up.
> > 
> > I have, and IIRC I briefly mentioned it back then.  Basically, you 
> > will have to add a "git notes gc" or some such, which basically reads 
> > in the whole notes, traverses all reachable commits, marking the 
> > corresponding notes, and then writes out all marked notes (leaving the 
> > other notes behind).
> 
> I was thinking something similar, but I think it is even easier. Make
> the rule "if we still have the object, then we still have the note".
> That has three benefits:
> 
>  - implementation is simple: for each note $n, delete it unless
>    has_sha1_file($n).
> 
>  - it handles notes on non-commit objects
> 
>  - it kills off notes when an object is _pruned_, not when it stops
>    being _reachable_. So if I delete a branch with a commit found
>    nowhere else, its notes will hang around until it is actually pruned.
>    If I pull it from lost+found, I still keep the notes.
> 
> Note that all of this garbage collection of notes is really just 
> removing them from the most current notes _tree_. If the notes structure 
> is actually composed of commits, then old notes that are "deleted" will 
> still be available historically.

Right.  So my original proposal to use separate refs for separate purposes 
might make sense again, so you can have private as well as public notes.

> > I wonder why you speak as if none of that had been implemented yet.  
> > From my work, it is obvious that hashtable is better than sorted list 
> > (except for the fan-out, which obviously wants to be an array), and 
> > from Peff's it is obvious that we want to keep the hashtables in 
> > memory.
> 
> If he is planning on doing a separate pyrite implementation, then it 
> _hasn't_ been implemented yet. And I don't care there if he uses hash 
> tables or sorted lists or whatever. I think the most important thing is 
> getting down the design of the _data structure_, so that we can have a 
> compatible implementation inside git itself.

Well, I don't care about pyrite.  As far as I am concerned, it might as 
well use an incompatible version.  I really don't care.

> > > IMO notes are just a generallized tag.
> > 
> > IMO notes have nothing to do with a tag.  Tags point to commits (or 
> > other objects, but that's beside the point here).  Notes are pointed 
> > _to_ by commits.
> 
> I think maybe we are just talking about semantics, but I think notes are
> not pointed to by commits. There is an external mapping of commits to
> notes, which is very different. I can give you the commit without you
> knowing the notes, or that the notes even exist.
> 
> But in practice, I don't know if this distinction is going to influence 
> any of the design or use.

You are correct, of course, that the commit does not point to the notes 
explicitely, by having a SHA-1 _in_ the commit object.  But the main point 
still stands: you go from commit to note, not from note to commit.  And 
this is in stark contrast to tags, where you go from tag to commit, _not_ 
from commit to tag.

That is a fundamental _difference_ between tags and notes, so that I 
refuse to accept the notion of notes being a generalized form of tags.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-16 23:48     ` Johannes Schindelin
  2008-12-17  9:45       ` Jeff King
       [not found]       ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com>
@ 2008-12-17 12:21       ` Petr Baudis
  2 siblings, 0 replies; 41+ messages in thread
From: Petr Baudis @ 2008-12-17 12:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Govind Salinas, Jeff King, Git Mailing List

On Wed, Dec 17, 2008 at 12:48:02AM +0100, Johannes Schindelin wrote:
> Again, do not limit your design by your expectations.  People already talk 
> about having cover letters for their patch series as notes, and Pasky 
> seems to discuss tracking explicit renames with notes when he does not 
> play Go, instead of maintaining repo.or.cz and git.or.cz.

I don't really play Go that much anymore! ;-)

				Petr "Pasky" Baudis

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17  9:38     ` Jeff King
@ 2008-12-17 17:06       ` Govind Salinas
  2008-12-18 13:54         ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Govind Salinas @ 2008-12-17 17:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Johannes Schindelin

On Wed, Dec 17, 2008 at 3:38 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Dec 16, 2008 at 12:43:55PM -0600, Govind Salinas wrote:
>
>> I was thinking I would do my first implementation in pyrite and if I find
>> that it works well I will port it.
>
> OK, though your performance will probably suck unless you dump the notes
> tree into a local hash at the beginning of your program. And looking up
> every commit's note during revision traversal is one of the intended
> uses (e.g., decorating git-log output, or filtering commits based on a
> particular note).

Yes, I was thinking that this is the natural way to do things, save that I
would be lazy loading the trees into a cache instead of caching them
all up front.  This is one of the reasons that I think the fan out will
help.

> And as Dscho mentioned, most of what you need is already there in C.
> You are welcome to implement whatever you want in pyrite, of course, but
> there is a desire to have this accessible to the revision traversal
> machinery. And that means if you want your version in pyrite to be
> compatible with what ends up in git, the data structure design needs to
> be suitable for both.

Yes, I completely agree that I want it to have the same scheme as what
git will use.  That is the reason I posted this here.  Since no method
has been formally accepted (checked into master) I wanted to see if
I could nudge things along.  I wasn't aware that you and Dscho had
a (very similar) plan.  Please, if you guys are decided on the format
then I can just go off and start working on it.  But it sounds like there
isn't consensus yet.

<snip>
>> root/
>>      12/
>>          34567890123456789012345678901234567890/
>>              abcdef7890123456789012345678901234567890
>>
>> This way all the notes are attached to a tree, so that gc won't
>> think they are unreferenced objects.
>
> But you have lost the ordering in your list, then, since they will not
> be ordered by sha1 of the note contents. I don't know if you care. The
> second sha1 is pointless, anyway, since nobody will know that number as
> a reference; why not just name them monotonically starting at 1?

In a later mail I suggested that this be the type or name of the note.  Which
I hear is similar to what you suggested.

> One of the things I don't like about having several notes is that it
> introduces an extra level of indirection that every user has to pay for,
> whether they want it or not. If a note can be a blob _or_ a tree, then
> those who want to use blobs can reap the performance benefit. Those who
> want multiple named notes in a hierarchy can pay the extra indirection
> cost.
>
> I haven't measured how big a cost that is (but bearing in mind that we
> might want to do this lookup once per revision in a traversal, even one
> extra object lookup can have an impact).

That seems reasonable.

> I'm also still not convinced the fan-out is worthwhile, but I can see
> how it might be. It would be nice to see numbers for both.
>
<snip>
> Also, how large do you expect the list to be under reasonable
>> circumstances.
>
> As many notes as there are commits is my goal (e.g., it is not hard to
> imagine an automated process to add notes on build status). Ideally, we
> could handle as many notes as there are objects; I see no reason not to
> allow annotating arbitrary sha1's (I don't know if there is a use for
> that, but the more scalable the implementation, the better).

Ah, that is in line with what I was thinking as well.

>> >  Some thoughts from me on naming issues:
>> >  http://article.gmane.org/gmane.comp.version-control.git/100402
>>
>> On naming.  I strongly support a ref/notes/sha1/sha1 approach.  If
>> having a type to the note is important, then perhaps the first line of
>> a note could be considered a type or a set of "tags".  This way you
>
> I don't think we are talking about the same thing. What I mean by naming
> is "here is a shorthand for referring to notes" that is not necessarily
> coupled with the implementation. That is, I would like to do something
> like:
>
>  git log --notes-filter="foo:bar == 1"
>
> and have that "foo:bar" as a shorthand on each commit for:
>
>  refs/notes/foo:$COMMIT/bar
>
> Without a left-hand side (e.g., "bar"), we get:
>
>  refs/notes/default:$COMMIT/bar
>
> Or without a right-hand side (e.g., "foo:"), we get:
>
>  refs/notes/foo:$COMMIT
>

I like the overall plan, but I would suggest that --notes[=default] and
--note-type=whatever would be a little friendlier and less error prone.

Thanks for helping me think through this.

-Govind

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: rebasing commits that have notes, was Re: Git Notes idea.
  2008-12-17  9:15     ` Johan Herland
@ 2008-12-17 17:55       ` Stephan Beyer
  0 siblings, 0 replies; 41+ messages in thread
From: Stephan Beyer @ 2008-12-17 17:55 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Johannes Schindelin, Jeff King, Govind Salinas

Hi,

Johan Herland wrote:
> PS: What's the current status on git-sequencer?

Usable, but I still have a small todo list of things to add
or fix. (And the next days some more time again. I hope it's
enough. Sorry, I'm too lame.)

> It's probably the best place to invoke these hooks.

If you want to do stuff on top of git sequencer it's currently
the best to fetch
	git://repo.or.cz/git/sbeyer.git seq-builtin-dev
or just to wait. :-\

Regards,
  Stephan

-- 
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 11:38             ` Johannes Schindelin
@ 2008-12-17 19:20               ` Junio C Hamano
  2008-12-18  3:08                 ` Johannes Schindelin
  2008-12-19 17:42               ` Govind Salinas
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2008-12-17 19:20 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Govind Salinas, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> ...  But the main point 
> still stands: you go from commit to note, not from note to commit.  And 
> this is in stark contrast to tags, where you go from tag to commit, _not_ 
> from commit to tag.
>
> That is a fundamental _difference_ between tags and notes, so that I 
> refuse to accept the notion of notes being a generalized form of tags.

Hmm, how would you explain things like "git describe" (and "name-rev")?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 19:20               ` Junio C Hamano
@ 2008-12-18  3:08                 ` Johannes Schindelin
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-18  3:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Govind Salinas, Git Mailing List

Hi,

On Wed, 17 Dec 2008, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > ...  But the main point still stands: you go from commit to note, not 
> > from note to commit.  And this is in stark contrast to tags, where you 
> > go from tag to commit, _not_ from commit to tag.
> >
> > That is a fundamental _difference_ between tags and notes, so that I 
> > refuse to accept the notion of notes being a generalized form of tags.
> 
> Hmm, how would you explain things like "git describe" (and "name-rev")?

Programs?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 17:06       ` Govind Salinas
@ 2008-12-18 13:54         ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2008-12-18 13:54 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Git Mailing List, Johannes Schindelin

On Wed, Dec 17, 2008 at 11:06:15AM -0600, Govind Salinas wrote:

> Yes, I was thinking that this is the natural way to do things, save that I
> would be lazy loading the trees into a cache instead of caching them
> all up front.  This is one of the reasons that I think the fan out will
> help.

I was working under the assumption that you are going to do multiple
note lookups. If you are, then the fan-out isn't really going to help,
as you're going to end up pulling in all of the subtrees anyway. It
helps some if you're only doing a single lookup, but I don't know if
that is measurable.

> Yes, I completely agree that I want it to have the same scheme as what
> git will use.  That is the reason I posted this here.  Since no method
> has been formally accepted (checked into master) I wanted to see if
> I could nudge things along.  I wasn't aware that you and Dscho had
> a (very similar) plan.  Please, if you guys are decided on the format
> then I can just go off and start working on it.  But it sounds like there
> isn't consensus yet.

This is probably not the answer you want, but I think the final design
depends on some C experiments. For example, whether or not there should
be fan-out depends on how it affects performance, which means we need to
do at least partial implementations to compare. So it really is just
waiting for somebody to sit down and do it.

> I like the overall plan, but I would suggest that --notes[=default] and
> --note-type=whatever would be a little friendlier and less error prone.

But keeping it as a single string means there is no ambiguity when you
specify multiple notes at once. For example:

  git log \
    --note-filter='test:status == "fail" && importance > 3' \
    --pretty=format:%h%n%N(test:errors)

would do something like:

  foreach commit $C
    compare refs/notes/test:$C/status against the string "fail"
    compare refs/notes/default:$C/importance against the number 3
    if either don't match, skip the commit
    show the hash and the contents of refs/notes/test:$C/errors

and obviously that filter language is totally made up and we may or may
not want to do something that complex. But my point is that we are
defining a namespace of notes, and we want to be able to refer to a
multiple fully qualified names.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 10:11           ` Jeff King
  2008-12-17 11:38             ` Johannes Schindelin
@ 2008-12-19 17:18             ` Govind Salinas
  2008-12-19 17:38               ` Govind Salinas
  1 sibling, 1 reply; 41+ messages in thread
From: Govind Salinas @ 2008-12-19 17:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List

On Wed, Dec 17, 2008 at 4:11 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote:
>
>> > I agree, I haven't thought of any fix along these lines other than to
>> > make gc do the clean up.
>>
>> I have, and IIRC I briefly mentioned it back then.  Basically, you will
>> have to add a "git notes gc" or some such, which basically reads in the
>> whole notes, traverses all reachable commits, marking the corresponding
>> notes, and then writes out all marked notes (leaving the other notes
>> behind).
>
> I was thinking something similar, but I think it is even easier. Make
> the rule "if we still have the object, then we still have the note".
> That has three benefits:
>
>  - implementation is simple: for each note $n, delete it unless
>   has_sha1_file($n).
>
>  - it handles notes on non-commit objects
>
>  - it kills off notes when an object is _pruned_, not when it stops
>   being _reachable_. So if I delete a branch with a commit found
>   nowhere else, its notes will hang around until it is actually pruned.
>   If I pull it from lost+found, I still keep the notes.
>
> Note that all of this garbage collection of notes is really just
> removing them from the most current notes _tree_. If the notes structure
> is actually composed of commits, then old notes that are "deleted" will
> still be available historically.
>

This is my concern with keeping a history of the notes pseudo-branch. Let
us say that I do the following

1) on branch A commit a
2) add note a`
3) on branch B commit b
4) add note b`
5) on branch B commit c
6) add note c`
7) delete branch A
8) gc after a time such that a is pruned

Now either I will always have  anote a`

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-19 17:18             ` Govind Salinas
@ 2008-12-19 17:38               ` Govind Salinas
  2008-12-19 21:25                 ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Govind Salinas @ 2008-12-19 17:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List

Sorry, hit the send key accidentally.

On Wed, Dec 17, 2008 at 4:11 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote:
>
> > I agree, I haven't thought of any fix along these lines other than to
> > make gc do the clean up.
>
> I have, and IIRC I briefly mentioned it back then.  Basically, you will
>> have to add a "git notes gc" or some such, which basically reads in the
>> whole notes, traverses all reachable commits, marking the corresponding
>> notes, and then writes out all marked notes (leaving the other notes
>> behind).
>
> I was thinking something similar, but I think it is even easier. Make
> the rule "if we still have the object, then we still have the note".
> That has three benefits:
>
>  - implementation is simple: for each note $n, delete it unless
>   has_sha1_file($n).
>
>  - it handles notes on non-commit objects
>
>  - it kills off notes when an object is _pruned_, not when it stops
>   being _reachable_. So if I delete a branch with a commit found
>   nowhere else, its notes will hang around until it is actually pruned.
>   If I pull it from lost+found, I still keep the notes.
>
> Note that all of this garbage collection of notes is really just
> removing them from the most current notes _tree_. If the notes structure
> is actually composed of commits, then old notes that are "deleted" will
> still be available historically.
>

This is my concern with keeping a history of the notes pseudo-branch.  Let
me restate what you are saying with an example

1) on branch A commit a
2) add note a`
3) on branch B commit b
4) add note b`
5) on branch B commit c
6) add note c`
7) delete branch A
8) gc after a time such that a is pruned

Now either I will always have a note a` as an object forever even though
the only commit that points to it is gone or I have to re-write the history of
the notes branch from the point that it was added.

Given this problem, is it really such a good idea to keep the history?

Of course the other side of this conversation is that the merge operation
will be more complex since the following can also happen

9) push notes
10) user 2 pulls notes but still has commit a and note a`

On the other, other hand, pushing and pulling notes if a history is kept
will have to involve a lot of rebasing/merging.

Just to throw an idea out...

A possible solution is that notes are per-branch,

refs/notes/heads/master
refs/notes/heads/foo/bar
refs/notes/remotes/baz/bang

and then it is easier to deal with.  A published branch's notes are isolated
from the changes in unpublished branches.  And since published branches
aren't *supposed* to change, then the notes should also always be fast
forwards.  Similarly, if a branch is not considered stable, like pu or even
next, then the associated notes branch could be forced in the same way.

Rebase, cherry-pick and merge (and possibly branch/checkout) would have
to be updated to handle notes, which is the down side.  It also doesn't solve
the issue of a history causing us to keep notes after the aren't useful anymore.

So perhaps we could use the above layout with no history?

Thanks,
Govind.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-17 11:38             ` Johannes Schindelin
  2008-12-17 19:20               ` Junio C Hamano
@ 2008-12-19 17:42               ` Govind Salinas
  1 sibling, 0 replies; 41+ messages in thread
From: Govind Salinas @ 2008-12-19 17:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Git Mailing List

On Wed, Dec 17, 2008 at 5:38 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Wed, 17 Dec 2008, Jeff King wrote:

>> If he is planning on doing a separate pyrite implementation, then it
>> _hasn't_ been implemented yet. And I don't care there if he uses hash
>> tables or sorted lists or whatever. I think the most important thing is
>> getting down the design of the _data structure_, so that we can have a
>> compatible implementation inside git itself.
>
> Well, I don't care about pyrite.  As far as I am concerned, it might as
> well use an incompatible version.  I really don't care.

Well I do care.  It would not be a good thing for anyone to have 2 separate
systems for notes.  Let us say that someone who you work with uses pyrite
and you don't.  They will add notes which you can't see and vice versa.

Thanks,
Govind.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-19 17:38               ` Govind Salinas
@ 2008-12-19 21:25                 ` Jeff King
  2008-12-19 22:24                   ` Govind Salinas
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2008-12-19 21:25 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Johannes Schindelin, Git Mailing List

On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote:

> This is my concern with keeping a history of the notes pseudo-branch.  Let
> me restate what you are saying with an example
> 
> 1) on branch A commit a
> 2) add note a`
> 3) on branch B commit b
> 4) add note b`
> 5) on branch B commit c
> 6) add note c`
> 7) delete branch A
> 8) gc after a time such that a is pruned
> 
> Now either I will always have a note a` as an object forever even though
> the only commit that points to it is gone or I have to re-write the history of
> the notes branch from the point that it was added.

Yes, that's correct.

> Given this problem, is it really such a good idea to keep the history?

I think so. Otherwise how will you push and pull notes? You won't even
know which one is the more recent tree, let alone handle any merges
caused by editing notes in two places.

> On the other, other hand, pushing and pulling notes if a history is kept
> will have to involve a lot of rebasing/merging.

Depending on your workflow. It might just involve a lot of fast forwards
if the note writer is in one place.

> A possible solution is that notes are per-branch,
> 
> refs/notes/heads/master
> refs/notes/heads/foo/bar
> refs/notes/remotes/baz/bang

Sorry, I don't quite get it. You are asking for per-branch notes that
keep history, or per-branch notes that don't keep history?

If the former, then you haven't solved the cruft accumulation problem.
You can get obsolete notes in your note history by rebasing on a branch
that is long-running (which is OK as long as you haven't published
_those particular_ commits). Or are you proposing to rebase and cleanup
the notes history every time you do a destructive operation?

If the latter, then I don't see how you've solved the push-pull and
merge problem (which you need history for).

But in either case, I think the solution is non-intuitive. If I annotate
a commit, and then merge the commit from one branch to another,
shouldn't the annotation stay?


Really, I am not sure this is worth getting too concerned about. Since
we are talking about cruft in the _history_ of the notes branch, it
won't impact actual notes usage (which will always just deal with the
most recent tree). So really we are talking about some uninteresting
objects in the db, which wastes some space. In practice, I suspect this
won't be that large because notes themselves are going to be relatively
short and in many cases, repetitive (i.e., many annotations may have the
same blob hash for several commits). And if it is a space problem, then
the right solution is to periodically truncate the notes history by
rewriting.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-19 21:25                 ` Jeff King
@ 2008-12-19 22:24                   ` Govind Salinas
  2008-12-20  4:54                     ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Govind Salinas @ 2008-12-19 22:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List

On Fri, Dec 19, 2008 at 3:25 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote:
>
>> This is my concern with keeping a history of the notes pseudo-branch.  Let
>> me restate what you are saying with an example
>>
>> 1) on branch A commit a
>> 2) add note a`
>> 3) on branch B commit b
>> 4) add note b`
>> 5) on branch B commit c
>> 6) add note c`
>> 7) delete branch A
>> 8) gc after a time such that a is pruned
>>
>> Now either I will always have a note a` as an object forever even though
>> the only commit that points to it is gone or I have to re-write the history of
>> the notes branch from the point that it was added.
>
> Yes, that's correct.
>
>> Given this problem, is it really such a good idea to keep the history?
>
> I think so. Otherwise how will you push and pull notes? You won't even
> know which one is the more recent tree, let alone handle any merges
> caused by editing notes in two places.

Couldn't you simply merge your tree and theirs even if there is no
history.  You would have to find a way to handle merges in any event
since they could just as easily happen if you have a history.

>> On the other, other hand, pushing and pulling notes if a history is kept
>> will have to involve a lot of rebasing/merging.
>
> Depending on your workflow. It might just involve a lot of fast forwards
> if the note writer is in one place.
>
>> A possible solution is that notes are per-branch,
>>
>> refs/notes/heads/master
>> refs/notes/heads/foo/bar
>> refs/notes/remotes/baz/bang
>
> Sorry, I don't quite get it. You are asking for per-branch notes that
> keep history, or per-branch notes that don't keep history?

Both, at the end of my previous mail I said...

"So perhaps we could use the above layout with no history?"

But they are two separate fixes to 2 different problems.

> If the former, then you haven't solved the cruft accumulation problem.
> You can get obsolete notes in your note history by rebasing on a branch
> that is long-running (which is OK as long as you haven't published
> _those particular_ commits). Or are you proposing to rebase and cleanup
> the notes history every time you do a destructive operation?

Yes, it does not solve that problem.  But it does solve things like

Dev1 and Dev2 both have branches A and topic branch B. and they
are in refs/notes/public (or refs/notes or something not branch specific).

Dev1 adds 100 notes to topic B, lets say half of them are obsolete due
to rebases or whatever.  Dev2 pulls A and updates their notes
as well.  Now Dev2 has acquired all the notes from Dev1 including the
obsolete ones.  So you have 100 commits, 100 blobs and all the new
trees that go with them that the user was not interested in.

Run this across 1000 users and you have a lot of cruft.

Now, if instead we have a per-branch notes scheme, then you only get
the cruft from the branches you were interested in.  If you remove the
history you could end up with no cruft because gc should handle it.

> If the latter, then I don't see how you've solved the push-pull and
> merge problem (which you need history for).

What git-fetch would have to do is say.  This is a note.  The remote
sha is not the same as mine, i will treat this as a force and fetch the
objects without checking history and then run a merge on the 2
commits.  The notes merge could have its own strategy that checked
if an object exists before deciding to add a new item or delete a
removed one.  Then the user would only have to intervene if the
notes where edited.

> But in either case, I think the solution is non-intuitive. If I annotate
> a commit, and then merge the commit from one branch to another,
> shouldn't the annotation stay?

Sure, either the merge command could run 2 merges, one for the
real branch and one for the notes pseudo branch or the user
could be required to do that manually.  I would think that doing
it automatically would be good.  Especially if you use a special
merge strategy.

> Really, I am not sure this is worth getting too concerned about. Since
> we are talking about cruft in the _history_ of the notes branch, it
> won't impact actual notes usage (which will always just deal with the
> most recent tree). So really we are talking about some uninteresting
> objects in the db, which wastes some space. In practice, I suspect this
> won't be that large because notes themselves are going to be relatively
> short and in many cases, repetitive (i.e., many annotations may have the
> same blob hash for several commits). And if it is a space problem, then
> the right solution is to periodically truncate the notes history by
> rewriting.

You are correct of course that it will just be wasted space.  But I am
concerned that it could end up being a lot of wasted space.  I mean, what
if every person who contributed to the kernel contributed note cruft.  Users
have branches that they consider public, so they might go into the a public
note store if there is no per-branch store.  Or errant users could use the
public store without understanding how they are affecting the central repo,
including the obsolete ones.

If you *really* don't think its something to be worried about then I am OK
with that since you have a lot more experience with this, but it sounds hairy
to me.

Thanks,
Govind.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/4] Notes reloaded
  2008-12-16  8:51 ` Jeff King
                     ` (2 preceding siblings ...)
  2008-12-17  0:12   ` rebasing commits that have notes, was " Johannes Schindelin
@ 2008-12-19 23:34   ` Johannes Schindelin
  2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
                       ` (3 more replies)
  3 siblings, 4 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-19 23:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List

Hi,

On Tue, 16 Dec 2008, Jeff King wrote:

>   Johannes Schindelin's notes proposal (which is more or less 
>   the current proposal, but I think the on-disk notes index was not 
>   well liked): 
>   http://thread.gmane.org/gmane.comp.version-control.git/52598

I redid the benchmark (this time with a bit beefier machine), just 
comparing no notes with David's/Peff's idea:


-- snip --
$ GIT_NOTES_TIMING_TESTS=1 sh t3302-notes-index-expensive.sh -i -v
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/.git/
* expecting success: create_repo 10
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10/.git/
*   ok 1: setup 10

* expecting success: test_notes 10
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
0.08user 0.10system 0:00.18elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+58926minor)pagefaults 0swaps
notes
0.14user 0.07system 0:00.54elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+60319minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 100
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/100/.git/
*   ok 1: setup 100

* expecting success: test_notes 100
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
0.23user 0.21system 0:00.45elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68043minor)pagefaults 0swaps
notes
0.38user 0.21system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+78829minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 1000
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/1000/.git/
*   ok 1: setup 1000

* expecting success: test_notes 1000
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
2.06user 0.95system 0:04.26elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+159115minor)pagefaults 0swaps
notes
2.83user 1.54system 0:04.38elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+267416minor)pagefaults 0swaps
*   ok 3: notes timing

* expecting success: create_repo 10000
Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10000/.git/
*   ok 1: setup 10000

* expecting success: test_notes 10000
*   ok 2: notes work

* expecting success: time_notes 100
no-notes
20.46user 7.63system 0:28.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1083378minor)pagefaults 0swaps
notes
28.78user 13.74system 0:42.85elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2240296minor)pagefaults 0swaps
*   ok 3: notes timing

* passed all 0 test(s)
-- snap --


Keep in mind that the tests run "git log" 99 times, and show the 
accumulated time.

So it seems that an increase of roughly 40% in the user time, and roughly 
70% in the system time is the price to have notes associated with every 
single commit.

Note that in that very same repository, a single "git show" goes from

0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (0major+561minor)pagefaults 0swaps

to this:

0.03user 0.02system 0:00.04elapsed 113%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (0major+2294minor)pagefaults 0swaps

(In another run, it only used 90%CPU)

That's not too shabby, given that Git needs to unpack double the number of 
objects in this test when using notes vs. no notes.

For comparison, the numbers back then were something like 10% in user time 
with a penalty of an extraordinary magnitude everytime the notes are 
updated: around 800%.

Note: all these numbers are worst-case numbers, i.e. every commit has one 
note.

To be frank, I do not completely understand why the numbers are that high.  
I would have understood an increase roughly 4 seconds for reading the 
quite large tree 99 times, and then the same ~0.20 seconds back then.  
Maybe I made a huge mistake when implementing the thing.

And BTW, my code does not yet handle the case when 
refs/notes/commits:$commit is a tree instead of a blob.  That is left as 
an exercise to the reader.



Johannes Schindelin (4):
  Introduce commit notes
  Add a script to edit/inspect notes
  Speed up git notes lookup
  Add an expensive test for git-notes

 .gitignore                       |    1 +
 Documentation/config.txt         |   15 ++++
 Documentation/git-notes.txt      |   46 +++++++++++
 Makefile                         |    3 +
 cache.h                          |    3 +
 command-list.txt                 |    1 +
 commit.c                         |    1 +
 config.c                         |    5 +
 environment.c                    |    1 +
 git-notes.sh                     |   65 +++++++++++++++
 notes.c                          |  159 ++++++++++++++++++++++++++++++++++++++
 notes.h                          |    7 ++
 pretty.c                         |    5 +
 t/t3301-notes.sh                 |   65 +++++++++++++++
 t/t3302-notes-index-expensive.sh |   98 +++++++++++++++++++++++
 15 files changed, 475 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100644 notes.c
 create mode 100644 notes.h
 create mode 100755 t/t3301-notes.sh
 create mode 100755 t/t3302-notes-index-expensive.sh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/4] Introduce commit notes
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
@ 2008-12-19 23:35     ` Johannes Schindelin
  2008-12-20  6:53       ` Jeff King
  2008-12-19 23:35     ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/config.txt |   15 ++++++++++
 Makefile                 |    2 +
 cache.h                  |    3 ++
 commit.c                 |    1 +
 config.c                 |    5 +++
 environment.c            |    1 +
 notes.c                  |   68 ++++++++++++++++++++++++++++++++++++++++++++++
 notes.h                  |    7 +++++
 pretty.c                 |    5 +++
 9 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100644 notes.c
 create mode 100644 notes.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4089362..3248524 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -431,6 +431,21 @@ core.inithook::
 	linkgit:git-init[1].  The hook is called with the argument
 	"reinit" if an existing repository is re-initialized.
 
+core.notesRef::
+	When showing commit messages, also show notes which are stored in
+	the given ref.  This ref is expected to contain paths of the form
+	??/*, where the directory name consists of the first two
+	characters of the commit name, and the base name consists of
+	the remaining 38 characters.
++
+If such a path exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line.  If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be print.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
 alias.*::
 	Command aliases for the linkgit:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/Makefile b/Makefile
index 5e293fe..7871f52 100644
--- a/Makefile
+++ b/Makefile
@@ -371,6 +371,7 @@ LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
 LIB_H += merge-recursive.h
+LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack.h
 LIB_H += pack-refs.h
@@ -454,6 +455,7 @@ LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index b393c2d..30981de 100644
--- a/cache.h
+++ b/cache.h
@@ -375,6 +375,8 @@ static inline enum object_type object_type(unsigned int mode)
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 
 extern int is_bare_repository_cfg;
 extern int is_bare_repository(void);
@@ -548,6 +550,7 @@ enum rebase_setup_type {
 extern enum branch_track git_branch_track;
 extern enum rebase_setup_type autorebase;
 extern int keep_hard_links;
+extern char *notes_ref_name;
 
 #define GIT_REPO_VERSION 0
 extern int repository_format_version;
diff --git a/commit.c b/commit.c
index 00f4774..547b88f 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
 #include "utf8.h"
 #include "diff.h"
 #include "revision.h"
+#include "notes.h"
 
 int save_commit_buffer = 1;
 
diff --git a/config.c b/config.c
index 8ff2b4b..9ab9d21 100644
--- a/config.c
+++ b/config.c
@@ -469,6 +469,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.notesref")) {
+		notes_ref_name = xstrdup(value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.pager"))
 		return git_config_string(&pager_program, var, value);
 
diff --git a/environment.c b/environment.c
index fc91809..5724ad2 100644
--- a/environment.c
+++ b/environment.c
@@ -46,6 +46,7 @@ int keep_hard_links = 0;
 
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
+char *notes_ref_name;
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..91ec77f
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding)
+{
+	static const char *utf8 = "utf-8";
+	struct strbuf name = STRBUF_INIT;
+	const char *hex;
+	unsigned char sha1[20];
+	char *msg;
+	unsigned long msgoffset, msglen;
+	enum object_type type;
+
+	if (!initialized) {
+		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		if (env)
+			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		else if (!notes_ref_name)
+			notes_ref_name = GIT_NOTES_DEFAULT_REF;
+		if (notes_ref_name && read_ref(notes_ref_name, sha1))
+			notes_ref_name = NULL;
+		initialized = 1;
+	}
+
+	if (!notes_ref_name)
+		return;
+
+	strbuf_addf(&name, "%s:%s", notes_ref_name,
+			sha1_to_hex(commit->object.sha1));
+	if (get_sha1(name.buf, sha1))
+		return;
+
+	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+			type != OBJ_BLOB)
+		return;
+
+	if (output_encoding && *output_encoding &&
+			strcmp(utf8, output_encoding)) {
+		char *reencoded = reencode_string(msg, output_encoding, utf8);
+		if (reencoded) {
+			free(msg);
+			msg = reencoded;
+			msglen = strlen(msg);
+		}
+	}
+
+	/* we will end the annotation by a newline anyway */
+	if (msglen && msg[msglen - 1] == '\n')
+		msglen--;
+
+	strbuf_addstr(sb, "\nNotes:\n");
+
+	for (msgoffset = 0; msgoffset < msglen;) {
+		int linelen = strchrnul(msg, '\n') - msg;
+
+		strbuf_addstr(sb, "    ");
+		strbuf_add(sb, msg + msgoffset, linelen);
+		msgoffset += linelen;
+	}
+	free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index 5f9a0c7..c2bf451 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "log-tree.h"
+#include "notes.h"
 
 static char *user_format;
 
@@ -911,5 +912,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 	 */
 	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
+
+	if (fmt != CMIT_FMT_ONELINE)
+		get_commit_notes(commit, sb, encoding);
+
 	free(reencoded);
 }
-- 
1.6.1.rc3.368.g63acc

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/4] Add a script to edit/inspect notes
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
  2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
@ 2008-12-19 23:35     ` Johannes Schindelin
  2008-12-19 23:35     ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin
  2008-12-19 23:37     ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


The script 'git notes' allows you to edit and show commit notes, by
calling either

	git notes show <commit>

or

	git notes edit <commit>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 .gitignore                  |    1 +
 Documentation/git-notes.txt |   46 ++++++++++++++++++++++++++++++
 Makefile                    |    1 +
 command-list.txt            |    1 +
 git-notes.sh                |   65 +++++++++++++++++++++++++++++++++++++++++++
 t/t3301-notes.sh            |   65 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 179 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100755 t/t3301-notes.sh

diff --git a/.gitignore b/.gitignore
index 90bbb2a..8b90af1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -83,6 +83,7 @@ git-mktag
 git-mktree
 git-name-rev
 git-mv
+git-notes
 git-pack-redundant
 git-pack-objects
 git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..3d93625
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit.  To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string.  Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla".  This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+	Edit the notes for a given commit (defaults to HEAD).
+
+show::
+	Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the gitlink:git[7] suite
diff --git a/Makefile b/Makefile
index 7871f52..4bdc86e 100644
--- a/Makefile
+++ b/Makefile
@@ -259,6 +259,7 @@ SCRIPT_SH += git-merge-octopus.sh
 SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
+SCRIPT_SH += git-notes.sh
 SCRIPT_SH += git-parse-remote.sh
 SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index 3583a33..2dc2c33 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -73,6 +73,7 @@ git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
 git-mv                                  mainporcelain common
 git-name-rev                            plumbinginterrogators
+git-notes                               mainporcelain
 git-pack-objects                        plumbingmanipulators
 git-pack-redundant                      plumbinginterrogators
 git-pack-refs                           ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..bfdbaa8
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+MESSAGE="$GIT_DIR"/new-notes-$COMMIT
+trap '
+	test -f "$MESSAGE" && rm "$MESSAGE"
+' 0
+
+case "$ACTION" in
+edit)
+	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
+
+	GIT_INDEX_FILE="$MESSAGE".idx
+	export GIT_INDEX_FILE
+
+	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+	if [ -z "$CURRENT_HEAD" ]; then
+		PARENT=
+	else
+		PARENT="-p $CURRENT_HEAD"
+		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+		git cat-file blob :$COMMIT >> "$MESSAGE" 2> /dev/null
+	fi
+
+	${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
+
+	grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
+	mv "$MESSAGE".processed "$MESSAGE"
+	if [ -s "$MESSAGE" ]; then
+		BLOB=$(git hash-object -w "$MESSAGE") ||
+			die "Could not write into object database"
+		git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+			die "Could not write index"
+	else
+		test -z "$CURRENT_HEAD" &&
+			die "Will not initialise with empty tree"
+		git update-index --force-remove $COMMIT ||
+			die "Could not update index"
+	fi
+
+	TREE=$(git write-tree) || die "Could not write tree"
+	NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+		die "Could not annotate"
+	git update-ref -m "Annotate $COMMIT" \
+		"$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+	git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+	usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..ba42c45
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+	! MSG=3 git notes edit
+'
+
+test_expect_success setup '
+	: > a1 &&
+	git add a1 &&
+	test_tick &&
+	git commit -m 1st &&
+	: > a2 &&
+	git add a2 &&
+	test_tick &&
+	git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+	! MSG=1 GIT_NOTES_REF='/' git notes edit &&
+	! MSG=2 GIT_NOTES_REF='/' git notes show
+'
+
+test_expect_success 'create notes' '
+	git config core.notesRef refs/notes/commits &&
+	MSG=b1 git notes edit &&
+	test ! -f .git/new-notes &&
+	test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+	test b1 = $(git notes show) &&
+	git show HEAD^ &&
+	! git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+    2nd
+
+Notes:
+    b1
+EOF
+
+test_expect_success 'show notes' '
+	! (git cat-file commit HEAD | grep b1) &&
+	git log -1 > output &&
+	git diff expect output
+'
+
+test_done
-- 
1.6.1.rc3.368.g63acc

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/4] Speed up git notes lookup
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
  2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
  2008-12-19 23:35     ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin
@ 2008-12-19 23:35     ` Johannes Schindelin
  2008-12-19 23:37     ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea fo the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 notes.c |  113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 102 insertions(+), 11 deletions(-)

diff --git a/notes.c b/notes.c
index 91ec77f..68bcb24 100644
--- a/notes.c
+++ b/notes.c
@@ -4,16 +4,112 @@
 #include "refs.h"
 #include "utf8.h"
 #include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+	unsigned char commit_sha1[20];
+	unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+	struct entry *entries;
+	off_t count, size;
+};
 
 static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+	int i = ((*(unsigned int *)sha1) % map->size);
+
+	for (;;) {
+		unsigned char *current = map->entries[i].commit_sha1;
+
+		if (!hashcmp(sha1, current))
+			return i;
+
+		if (is_null_sha1(current))
+			return -1 - i;
+
+		if (++i == map->size)
+			i = 0;
+	}
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+		const unsigned char *notes_sha1)
+{
+	int index;
+
+	if (hash_map.count + 1 > hash_map.size >> 1) {
+		int i, old_size = hash_map.size;
+		struct entry *old = hash_map.entries;
+
+		hash_map.size = old_size ? old_size << 1 : 64;
+		hash_map.entries = (struct entry *)
+			xcalloc(sizeof(struct entry), hash_map.size);
+
+		for (i = 0; i < old_size; i++)
+			if (!is_null_sha1(old[i].commit_sha1)) {
+				index = -1 - hash_index(&hash_map,
+						old[i].commit_sha1);
+				memcpy(hash_map.entries + index, old + i,
+					sizeof(struct entry));
+			}
+		free(old);
+	}
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0) {
+		index = -1 - index;
+		hash_map.count++;
+	}
+
+	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned *mode;
+	struct tree_desc desc;
+	struct name_entry entry;
+	void *buf;
+
+	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+			get_tree_entry(commit_sha1, "", sha1, mode))
+		return;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die ("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+	while (tree_entry(&desc, &entry))
+		if (!get_sha1(entry.path, commit_sha1))
+			add_entry(commit_sha1, entry.sha1);
+	free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+	int index;
+
+	if (!hash_map.size)
+		return NULL;
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0)
+		return NULL;
+	return hash_map.entries[index].notes_sha1;
+}
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding)
 {
 	static const char *utf8 = "utf-8";
-	struct strbuf name = STRBUF_INIT;
-	const char *hex;
-	unsigned char sha1[20];
+	unsigned char *sha1;
 	char *msg;
 	unsigned long msgoffset, msglen;
 	enum object_type type;
@@ -24,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		if (notes_ref_name && read_ref(notes_ref_name, sha1))
-			notes_ref_name = NULL;
+		initialize_hash_map(notes_ref_name);
 		initialized = 1;
 	}
 
-	if (!notes_ref_name)
-		return;
-
-	strbuf_addf(&name, "%s:%s", notes_ref_name,
-			sha1_to_hex(commit->object.sha1));
-	if (get_sha1(name.buf, sha1))
+	sha1 = lookup_notes(commit->object.sha1);
+	if (!sha1)
 		return;
 
 	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
-- 
1.6.1.rc3.368.g63acc

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/4] Add an expensive test for git-notes
  2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
                       ` (2 preceding siblings ...)
  2008-12-19 23:35     ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin
@ 2008-12-19 23:37     ` Johannes Schindelin
  2008-12-19 23:49       ` Boyd Stephen Smith Jr.
  3 siblings, 1 reply; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-19 23:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


git-notes have the potential of being pretty expensive, so test with
a lot of commits.  A lot.  So to make things cheaper, you have to
opt-in explicitely, by setting the environment variable
GIT_NOTES_TIMING_TESTS.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	I would appreciate other people running the tests, and maybe 
	profiling the code.

	However, I will not be really online the next two weeks, so if you 
	feel like working on this series, go ahead.

	Merry Christmas.

 t/t3302-notes-index-expensive.sh |   98 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100755 t/t3302-notes-index-expensive.sh

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..00d27bf
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+	say Skipping timing tests
+	test_done
+	exit
+}
+
+create_repo () {
+	number_of_commits=$1
+	nr=0
+	parent=
+	test -d .git || {
+	git init &&
+	tree=$(git write-tree) &&
+	while [ $nr -lt $number_of_commits ]; do
+		test_tick &&
+		commit=$(echo $nr | git commit-tree $tree $parent) ||
+			return
+		parent="-p $commit"
+		nr=$(($nr+1))
+	done &&
+	git update-ref refs/heads/master $commit &&
+	{
+		export GIT_INDEX_FILE=.git/temp;
+		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
+		while read nr sha1; do
+			blob=$(echo note $nr | git hash-object -w --stdin) &&
+			echo $sha1 | sed "s/^/0644 $blob 0	/"
+		done | git update-index --index-info &&
+		tree=$(git write-tree) &&
+		test_tick &&
+		commit=$(echo notes | git commit-tree $tree) &&
+		git update-ref refs/notes/commits $commit
+	} &&
+	git config core.notesRef refs/notes/commits
+	}
+}
+
+test_notes () {
+	count=$1 &&
+	git config core.notesRef refs/notes/commits &&
+	git log | grep "^    " > output &&
+	i=1 &&
+	while [ $i -le $count ]; do
+		echo "    $(($count-$i))" &&
+		echo "    note $i" &&
+		i=$(($i+1));
+	done > expect &&
+	git diff expect output
+}
+
+cat > time_notes << \EOF
+	mode=$1
+	i=1
+	while [ $i -lt $2 ]; do
+		case $1 in
+		no-notes)
+			export GIT_NOTES_REF=non-existing
+		;;
+		notes)
+			unset GIT_NOTES_REF
+		;;
+		esac
+		git log >/dev/null
+		i=$(($i+1))
+	done
+EOF
+
+time_notes () {
+	for mode in no-notes notes
+	do
+		echo $mode
+		/usr/bin/time sh ../time_notes $mode $1
+	done
+}
+
+for count in 10 100 1000 10000; do
+
+	mkdir $count
+	(cd $count;
+
+	test_expect_success "setup $count" "create_repo $count"
+
+	test_expect_success 'notes work' "test_notes $count"
+
+	test_expect_success 'notes timing' "time_notes 100"
+	)
+done
+
+test_done
-- 
1.6.1.rc3.368.g63acc

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/4] Add an expensive test for git-notes
  2008-12-19 23:37     ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin
@ 2008-12-19 23:49       ` Boyd Stephen Smith Jr.
  2008-12-20 11:51         ` Johannes Schindelin
  0 siblings, 1 reply; 41+ messages in thread
From: Boyd Stephen Smith Jr. @ 2008-12-19 23:49 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Johannes Schindelin, Jeff King, Govind Salinas

[-- Attachment #1: Type: text/plain, Size: 452 bytes --]

On Friday 2008 December 19 17:37:15 Johannes Schindelin wrote:
> +       test_expect_success 'notes timing' "time_notes 100"
                                                         ^^^
Probably should be ${count}.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Git Notes idea.
  2008-12-19 22:24                   ` Govind Salinas
@ 2008-12-20  4:54                     ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2008-12-20  4:54 UTC (permalink / raw)
  To: Govind Salinas; +Cc: Johannes Schindelin, Git Mailing List

On Fri, Dec 19, 2008 at 04:24:01PM -0600, Govind Salinas wrote:

> > I think so. Otherwise how will you push and pull notes? You won't even
> > know which one is the more recent tree, let alone handle any merges
> > caused by editing notes in two places.
> 
> Couldn't you simply merge your tree and theirs even if there is no
> history.  You would have to find a way to handle merges in any event
> since they could just as easily happen if you have a history.

Let's say I have a tree T1 like this:

  $COMMIT_A -> $BLOB_A
  $COMMIT_B -> $BLOB_B1

and a tree T2 like this:

  $COMMIT_B -> $BLOB_B2
  $COMMIT_C -> $BLOB_C

what is the correct merge? Was $COMMIT_A added in T1, or deleted in T2?
How about $COMMIT_C? Even if you went with a strategy like "always add
from both" (which I don't think is a good idea, because deleted notes
will keep popping back up) you have a conflict with $COMMIT_B.  Should
it be B1 or B2? You can't tell if B1 became B2, vice versa, or if there
is a true merge conflict.

> > If the former, then you haven't solved the cruft accumulation problem.
> > You can get obsolete notes in your note history by rebasing on a branch
> > that is long-running (which is OK as long as you haven't published
> > _those particular_ commits). Or are you proposing to rebase and cleanup
> > the notes history every time you do a destructive operation?
> 
> Yes, it does not solve that problem.  But it does solve things like
> 
> Dev1 and Dev2 both have branches A and topic branch B. and they
> are in refs/notes/public (or refs/notes or something not branch specific).
> 
> Dev1 adds 100 notes to topic B, lets say half of them are obsolete due
> to rebases or whatever.  Dev2 pulls A and updates their notes
> as well.  Now Dev2 has acquired all the notes from Dev1 including the
> obsolete ones.  So you have 100 commits, 100 blobs and all the new
> trees that go with them that the user was not interested in.
> 
> Run this across 1000 users and you have a lot of cruft.
> 
> Now, if instead we have a per-branch notes scheme, then you only get
> the cruft from the branches you were interested in.  If you remove the
> history you could end up with no cruft because gc should handle it.

OK. But my point is that this is an incomplete solution. You can _still_
get cruft, and you _still_ have to deal with that cruft some other way.
So we will still end up having to implement something else.  And I might
even be fine with a partial solution that helped some if it didn't come
with a cost, but I think the "notes stick to branches" behavior is
strictly worse.

> > If the latter, then I don't see how you've solved the push-pull and
> > merge problem (which you need history for).
> 
> What git-fetch would have to do is say.  This is a note.  The remote
> sha is not the same as mine, i will treat this as a force and fetch the
> objects without checking history and then run a merge on the 2
> commits.  The notes merge could have its own strategy that checked
> if an object exists before deciding to add a new item or delete a
> removed one.  Then the user would only have to intervene if the
> notes where edited.

I don't like that because:

  - the user is going to end up manually resolving merge conflicts for
    things that _should_ have been fast forwards. But much worse, it's
    going to be on content they may never even have seen before. How
    will they decide which is which?

  - how do you push notes? There's no opportunity to handle the merge
    on the remote side. And you can't just pull, merge locally, and push
    what is now a fast-forward, because there is no concept of
    fast-forward without history.

  - Suddenly pulling and pushing notes isn't just taken care of by the
    usual ref transfer mechanisms. We have to implement a whole new
    system.

> You are correct of course that it will just be wasted space.  But I am
> concerned that it could end up being a lot of wasted space.  I mean, what
> if every person who contributed to the kernel contributed note cruft.  Users

What if every person who contributed to the kernel contributed history
cruft? It's really the same problem, and it is solved by people keeping
their trees clean (via rebase) and being picky about how data comes into
your tree (i.e., don't pull from people with cruft). I suspect Linus
wouldn't pull notes at all (and they wouldn't make it over patch
transmission anyway). But in a workflow that is pulling the notes, the
right time to clean up history is probably before publishing. That is,
you can rebase and clean up your notes history just before you push it
to somewhere public, just like you might clean up messy history.

> If you *really* don't think its something to be worried about then I
> am OK with that since you have a lot more experience with this, but it
> sounds hairy to me.

It is hairy, and I wish there were a better solution. But I think every
other option is much worse.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
@ 2008-12-20  6:53       ` Jeff King
  2008-12-20  7:55         ` Robin Rosenberg
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2008-12-20  6:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List

On Sat, Dec 20, 2008 at 12:35:06AM +0100, Johannes Schindelin wrote:

> Commit notes are blobs which are shown together with the commit
> message.  These blobs are taken from the notes ref, which you can
> configure by the config variable core.notesRef, which in turn can
> be overridden by the environment variable GIT_NOTES_REF.

Hmm. I wanted to try some performance comparisons based on this
implementation, but I can't get your 1/4 to apply. Conflicts in
config.txt and cache.h when applying to master, and "sha1 information is
lacking or useless" for a 3-way merge. What did you base this on?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-20  6:53       ` Jeff King
@ 2008-12-20  7:55         ` Robin Rosenberg
  2008-12-20  8:05           ` Jeff King
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
  1 sibling, 1 reply; 41+ messages in thread
From: Robin Rosenberg @ 2008-12-20  7:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Govind Salinas, Git Mailing List

lördag 20 december 2008 07:53:38 skrev Jeff King:
> On Sat, Dec 20, 2008 at 12:35:06AM +0100, Johannes Schindelin wrote:
> 
> > Commit notes are blobs which are shown together with the commit
> > message.  These blobs are taken from the notes ref, which you can
> > configure by the config variable core.notesRef, which in turn can
> > be overridden by the environment variable GIT_NOTES_REF.
> 
> Hmm. I wanted to try some performance comparisons based on this
> implementation, but I can't get your 1/4 to apply. Conflicts in
> config.txt and cache.h when applying to master, and "sha1 information is
> lacking or useless" for a 3-way merge. What did you base this on?

patch(1) however can crunch it, with the exception of cache.h. Shouldn't
git am/appy and patch agree on git generated patches (without binary diffs)?

-- robin

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-20  7:55         ` Robin Rosenberg
@ 2008-12-20  8:05           ` Jeff King
  2008-12-20  8:17             ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2008-12-20  8:05 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Johannes Schindelin, Govind Salinas, Git Mailing List

On Sat, Dec 20, 2008 at 08:55:14AM +0100, Robin Rosenberg wrote:

> > Hmm. I wanted to try some performance comparisons based on this
> > implementation, but I can't get your 1/4 to apply. Conflicts in
> > config.txt and cache.h when applying to master, and "sha1 information is
> > lacking or useless" for a 3-way merge. What did you base this on?
> 
> patch(1) however can crunch it, with the exception of cache.h. Shouldn't
> git am/appy and patch agree on git generated patches (without binary diffs)?

No. git apply is intentionally much more strict about applying under the
assumption that it is better to force a conflict than to silently apply
something that has a reasonable chance of being completely wrong.

And usually it is not a big deal because falling back to the 3-way merge
is a much nicer way of handling any conflicts _anyway_ (I find .rej
files so much more useless than conflict markers, personally).

In this case I was able to:

  1. git am /the/patch
  2. patch -p1 <.git/rebase-apply/patch
  3. manually inspect the results for sanity, and fix up the cache.h
     bit that failed totally
  4. git add -u && git add notes.[ch]
  5. git am --resolved

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-20  8:05           ` Jeff King
@ 2008-12-20  8:17             ` Junio C Hamano
  2008-12-20  8:23               ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2008-12-20  8:17 UTC (permalink / raw)
  To: Jeff King
  Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List

Jeff King <peff@peff.net> writes:

> No. git apply is intentionally much more strict about applying under the
> assumption that it is better to force a conflict than to silently apply
> something that has a reasonable chance of being completely wrong.
>
> And usually it is not a big deal because falling back to the 3-way merge
> is a much nicer way of handling any conflicts _anyway_ (I find .rej
> files so much more useless than conflict markers, personally).
>
> In this case I was able to:
>
>   1. git am /the/patch
>   2. patch -p1 <.git/rebase-apply/patch
>   3. manually inspect the results for sanity, and fix up the cache.h
>      bit that failed totally
>   4. git add -u && git add notes.[ch]
>   5. git am --resolved

I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and
run "git am" instead of step 5.

Why was the patch, that was based on something that is clearly different
from what other people work on, sent to the list in the first place?  IOW,
what good does it do to show your patch if other people (plural) need to
spend a lot of time and head-scratching?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-20  8:17             ` Junio C Hamano
@ 2008-12-20  8:23               ` Jeff King
  2008-12-20 20:09                 ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2008-12-20  8:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List

On Sat, Dec 20, 2008 at 12:17:46AM -0800, Junio C Hamano wrote:

> >   1. git am /the/patch
> >   2. patch -p1 <.git/rebase-apply/patch
> >   3. manually inspect the results for sanity, and fix up the cache.h
> >      bit that failed totally
> >   4. git add -u && git add notes.[ch]
> >   5. git am --resolved
> 
> I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and
> run "git am" instead of step 5.

How do you track down the source of the conflict to do the patch fixup?
In this case, it was a context line in the patch that had been deleted
in my version. Do you just find the appropriate chunk in what you have
already and visually compare?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/4] Add an expensive test for git-notes
  2008-12-19 23:49       ` Boyd Stephen Smith Jr.
@ 2008-12-20 11:51         ` Johannes Schindelin
  0 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 11:51 UTC (permalink / raw)
  To: Boyd Stephen Smith Jr.; +Cc: Git Mailing List, Jeff King, Govind Salinas

[-- Attachment #1: Type: TEXT/PLAIN, Size: 437 bytes --]

Hi,

On Fri, 19 Dec 2008, Boyd Stephen Smith Jr. wrote:

> On Friday 2008 December 19 17:37:15 Johannes Schindelin wrote:
> > +       test_expect_success 'notes timing' "time_notes 100"
>                                                          ^^^
> Probably should be ${count}.

No.  It times a git log 100 times (actually, 99 times due to a thinko).  
This is only to protect against jitter, otherwise I'd do it only once.

Hth,
Dscho

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/4] Notes, reloaded
  2008-12-20  6:53       ` Jeff King
  2008-12-20  7:55         ` Robin Rosenberg
@ 2008-12-20 12:04         ` Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin
                             ` (3 more replies)
  1 sibling, 4 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 12:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


My apologies; I forgot to rebase the series from my personal fork to Junio's
next.  This is the version that should apply cleanly.

Johannes Schindelin (4):
  Introduce commit notes
  Add a script to edit/inspect notes
  Speed up git notes lookup
  Add an expensive test for git-notes

 .gitignore                       |    1 +
 Documentation/config.txt         |   15 ++++
 Documentation/git-notes.txt      |   46 +++++++++++
 Makefile                         |    3 +
 cache.h                          |    3 +
 command-list.txt                 |    1 +
 commit.c                         |    1 +
 config.c                         |    5 +
 environment.c                    |    1 +
 git-notes.sh                     |   65 +++++++++++++++
 notes.c                          |  159 ++++++++++++++++++++++++++++++++++++++
 notes.h                          |    7 ++
 pretty.c                         |    5 +
 t/t3301-notes.sh                 |   65 +++++++++++++++
 t/t3302-notes-index-expensive.sh |   98 +++++++++++++++++++++++
 15 files changed, 475 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100644 notes.c
 create mode 100644 notes.h
 create mode 100755 t/t3301-notes.sh
 create mode 100755 t/t3302-notes-index-expensive.sh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/4] Introduce commit notes
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
@ 2008-12-20 12:05           ` Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


Commit notes are blobs which are shown together with the commit
message.  These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.

The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).

The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/config.txt |   15 ++++++++++
 Makefile                 |    2 +
 cache.h                  |    3 ++
 commit.c                 |    1 +
 config.c                 |    5 +++
 environment.c            |    1 +
 notes.c                  |   68 ++++++++++++++++++++++++++++++++++++++++++++++
 notes.h                  |    7 +++++
 pretty.c                 |    5 +++
 9 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100644 notes.c
 create mode 100644 notes.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 21ea165..b35a32a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -422,6 +422,21 @@ relatively high IO latencies.  With this set to 'true', git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.
 
+core.notesRef::
+	When showing commit messages, also show notes which are stored in
+	the given ref.  This ref is expected to contain paths of the form
+	??/*, where the directory name consists of the first two
+	characters of the commit name, and the base name consists of
+	the remaining 38 characters.
++
+If such a path exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line.  If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be print.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
 alias.*::
 	Command aliases for the linkgit:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/Makefile b/Makefile
index 5fcff9a..dc0a324 100644
--- a/Makefile
+++ b/Makefile
@@ -370,6 +370,7 @@ LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
 LIB_H += merge-recursive.h
+LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack.h
 LIB_H += pack-refs.h
@@ -451,6 +452,7 @@ LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index 9dabcc7..9192853 100644
--- a/cache.h
+++ b/cache.h
@@ -375,6 +375,8 @@ static inline enum object_type object_type(unsigned int mode)
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 
 extern int is_bare_repository_cfg;
 extern int is_bare_repository(void);
@@ -546,6 +548,7 @@ enum rebase_setup_type {
 
 extern enum branch_track git_branch_track;
 extern enum rebase_setup_type autorebase;
+extern char *notes_ref_name;
 
 #define GIT_REPO_VERSION 0
 extern int repository_format_version;
diff --git a/commit.c b/commit.c
index c99db16..10e532a 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
 #include "utf8.h"
 #include "diff.h"
 #include "revision.h"
+#include "notes.h"
 
 int save_commit_buffer = 1;
 
diff --git a/config.c b/config.c
index 790405a..e5d5b4b 100644
--- a/config.c
+++ b/config.c
@@ -469,6 +469,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.notesref")) {
+		notes_ref_name = xstrdup(value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.pager"))
 		return git_config_string(&pager_program, var, value);
 
diff --git a/environment.c b/environment.c
index e278bce..0edae21 100644
--- a/environment.c
+++ b/environment.c
@@ -45,6 +45,7 @@ enum rebase_setup_type autorebase = AUTOREBASE_NEVER;
 
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
+char *notes_ref_name;
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..91ec77f
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding)
+{
+	static const char *utf8 = "utf-8";
+	struct strbuf name = STRBUF_INIT;
+	const char *hex;
+	unsigned char sha1[20];
+	char *msg;
+	unsigned long msgoffset, msglen;
+	enum object_type type;
+
+	if (!initialized) {
+		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		if (env)
+			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		else if (!notes_ref_name)
+			notes_ref_name = GIT_NOTES_DEFAULT_REF;
+		if (notes_ref_name && read_ref(notes_ref_name, sha1))
+			notes_ref_name = NULL;
+		initialized = 1;
+	}
+
+	if (!notes_ref_name)
+		return;
+
+	strbuf_addf(&name, "%s:%s", notes_ref_name,
+			sha1_to_hex(commit->object.sha1));
+	if (get_sha1(name.buf, sha1))
+		return;
+
+	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+			type != OBJ_BLOB)
+		return;
+
+	if (output_encoding && *output_encoding &&
+			strcmp(utf8, output_encoding)) {
+		char *reencoded = reencode_string(msg, output_encoding, utf8);
+		if (reencoded) {
+			free(msg);
+			msg = reencoded;
+			msglen = strlen(msg);
+		}
+	}
+
+	/* we will end the annotation by a newline anyway */
+	if (msglen && msg[msglen - 1] == '\n')
+		msglen--;
+
+	strbuf_addstr(sb, "\nNotes:\n");
+
+	for (msgoffset = 0; msgoffset < msglen;) {
+		int linelen = strchrnul(msg, '\n') - msg;
+
+		strbuf_addstr(sb, "    ");
+		strbuf_add(sb, msg + msgoffset, linelen);
+		msgoffset += linelen;
+	}
+	free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+		const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index f6ff312..2d2872f 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "log-tree.h"
+#include "notes.h"
 
 static char *user_format;
 
@@ -881,5 +882,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 	 */
 	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
+
+	if (fmt != CMIT_FMT_ONELINE)
+		get_commit_notes(commit, sb, encoding);
+
 	free(reencoded);
 }
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 2/4] Add a script to edit/inspect notes
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin
@ 2008-12-20 12:05           ` Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin
  2008-12-20 12:06           ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


The script 'git notes' allows you to edit and show commit notes, by
calling either

	git notes show <commit>

or

	git notes edit <commit>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 .gitignore                  |    1 +
 Documentation/git-notes.txt |   46 ++++++++++++++++++++++++++++++
 Makefile                    |    1 +
 command-list.txt            |    1 +
 git-notes.sh                |   65 +++++++++++++++++++++++++++++++++++++++++++
 t/t3301-notes.sh            |   65 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 179 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100755 t/t3301-notes.sh

diff --git a/.gitignore b/.gitignore
index 327e660..ac5fbf4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -82,6 +82,7 @@ git-mktag
 git-mktree
 git-name-rev
 git-mv
+git-notes
 git-pack-redundant
 git-pack-objects
 git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..3d93625
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit.  To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string.  Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla".  This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+	Edit the notes for a given commit (defaults to HEAD).
+
+show::
+	Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the gitlink:git[7] suite
diff --git a/Makefile b/Makefile
index dc0a324..5082f0f 100644
--- a/Makefile
+++ b/Makefile
@@ -258,6 +258,7 @@ SCRIPT_SH += git-merge-octopus.sh
 SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
+SCRIPT_SH += git-notes.sh
 SCRIPT_SH += git-parse-remote.sh
 SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index 3583a33..2dc2c33 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -73,6 +73,7 @@ git-mktag                               plumbingmanipulators
 git-mktree                              plumbingmanipulators
 git-mv                                  mainporcelain common
 git-name-rev                            plumbinginterrogators
+git-notes                               mainporcelain
 git-pack-objects                        plumbingmanipulators
 git-pack-redundant                      plumbinginterrogators
 git-pack-refs                           ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..bfdbaa8
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+MESSAGE="$GIT_DIR"/new-notes-$COMMIT
+trap '
+	test -f "$MESSAGE" && rm "$MESSAGE"
+' 0
+
+case "$ACTION" in
+edit)
+	GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
+
+	GIT_INDEX_FILE="$MESSAGE".idx
+	export GIT_INDEX_FILE
+
+	CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+	if [ -z "$CURRENT_HEAD" ]; then
+		PARENT=
+	else
+		PARENT="-p $CURRENT_HEAD"
+		git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+		git cat-file blob :$COMMIT >> "$MESSAGE" 2> /dev/null
+	fi
+
+	${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
+
+	grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
+	mv "$MESSAGE".processed "$MESSAGE"
+	if [ -s "$MESSAGE" ]; then
+		BLOB=$(git hash-object -w "$MESSAGE") ||
+			die "Could not write into object database"
+		git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+			die "Could not write index"
+	else
+		test -z "$CURRENT_HEAD" &&
+			die "Will not initialise with empty tree"
+		git update-index --force-remove $COMMIT ||
+			die "Could not update index"
+	fi
+
+	TREE=$(git write-tree) || die "Could not write tree"
+	NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+		die "Could not annotate"
+	git update-ref -m "Annotate $COMMIT" \
+		"$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+	git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+	usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..ba42c45
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+	! MSG=3 git notes edit
+'
+
+test_expect_success setup '
+	: > a1 &&
+	git add a1 &&
+	test_tick &&
+	git commit -m 1st &&
+	: > a2 &&
+	git add a2 &&
+	test_tick &&
+	git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+	! MSG=1 GIT_NOTES_REF='/' git notes edit &&
+	! MSG=2 GIT_NOTES_REF='/' git notes show
+'
+
+test_expect_success 'create notes' '
+	git config core.notesRef refs/notes/commits &&
+	MSG=b1 git notes edit &&
+	test ! -f .git/new-notes &&
+	test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+	test b1 = $(git notes show) &&
+	git show HEAD^ &&
+	! git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+    2nd
+
+Notes:
+    b1
+EOF
+
+test_expect_success 'show notes' '
+	! (git cat-file commit HEAD | grep b1) &&
+	git log -1 > output &&
+	git diff expect output
+'
+
+test_done
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 3/4] Speed up git notes lookup
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin
  2008-12-20 12:05           ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin
@ 2008-12-20 12:05           ` Johannes Schindelin
  2008-12-20 12:06           ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.

The idea fo the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.

Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object.  This
allows you to annotate arbitrary branches, or objects.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 notes.c |  113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 102 insertions(+), 11 deletions(-)

diff --git a/notes.c b/notes.c
index 91ec77f..68bcb24 100644
--- a/notes.c
+++ b/notes.c
@@ -4,16 +4,112 @@
 #include "refs.h"
 #include "utf8.h"
 #include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+	unsigned char commit_sha1[20];
+	unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+	struct entry *entries;
+	off_t count, size;
+};
 
 static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+	int i = ((*(unsigned int *)sha1) % map->size);
+
+	for (;;) {
+		unsigned char *current = map->entries[i].commit_sha1;
+
+		if (!hashcmp(sha1, current))
+			return i;
+
+		if (is_null_sha1(current))
+			return -1 - i;
+
+		if (++i == map->size)
+			i = 0;
+	}
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+		const unsigned char *notes_sha1)
+{
+	int index;
+
+	if (hash_map.count + 1 > hash_map.size >> 1) {
+		int i, old_size = hash_map.size;
+		struct entry *old = hash_map.entries;
+
+		hash_map.size = old_size ? old_size << 1 : 64;
+		hash_map.entries = (struct entry *)
+			xcalloc(sizeof(struct entry), hash_map.size);
+
+		for (i = 0; i < old_size; i++)
+			if (!is_null_sha1(old[i].commit_sha1)) {
+				index = -1 - hash_index(&hash_map,
+						old[i].commit_sha1);
+				memcpy(hash_map.entries + index, old + i,
+					sizeof(struct entry));
+			}
+		free(old);
+	}
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0) {
+		index = -1 - index;
+		hash_map.count++;
+	}
+
+	hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+	hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+	unsigned char sha1[20], commit_sha1[20];
+	unsigned *mode;
+	struct tree_desc desc;
+	struct name_entry entry;
+	void *buf;
+
+	if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+			get_tree_entry(commit_sha1, "", sha1, mode))
+		return;
+
+	buf = fill_tree_descriptor(&desc, sha1);
+	if (!buf)
+		die ("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+	while (tree_entry(&desc, &entry))
+		if (!get_sha1(entry.path, commit_sha1))
+			add_entry(commit_sha1, entry.sha1);
+	free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+	int index;
+
+	if (!hash_map.size)
+		return NULL;
+
+	index = hash_index(&hash_map, commit_sha1);
+	if (index < 0)
+		return NULL;
+	return hash_map.entries[index].notes_sha1;
+}
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 		const char *output_encoding)
 {
 	static const char *utf8 = "utf-8";
-	struct strbuf name = STRBUF_INIT;
-	const char *hex;
-	unsigned char sha1[20];
+	unsigned char *sha1;
 	char *msg;
 	unsigned long msgoffset, msglen;
 	enum object_type type;
@@ -24,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
 		else if (!notes_ref_name)
 			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		if (notes_ref_name && read_ref(notes_ref_name, sha1))
-			notes_ref_name = NULL;
+		initialize_hash_map(notes_ref_name);
 		initialized = 1;
 	}
 
-	if (!notes_ref_name)
-		return;
-
-	strbuf_addf(&name, "%s:%s", notes_ref_name,
-			sha1_to_hex(commit->object.sha1));
-	if (get_sha1(name.buf, sha1))
+	sha1 = lookup_notes(commit->object.sha1);
+	if (!sha1)
 		return;
 
 	if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 4/4] Add an expensive test for git-notes
  2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
                             ` (2 preceding siblings ...)
  2008-12-20 12:05           ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin
@ 2008-12-20 12:06           ` Johannes Schindelin
  3 siblings, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2008-12-20 12:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Govind Salinas, Git Mailing List


git-notes have the potential of being pretty expensive, so test with
a lot of commits.  A lot.  So to make things cheaper, you have to
opt-in explicitely, by setting the environment variable
GIT_NOTES_TIMING_TESTS.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t3302-notes-index-expensive.sh |   98 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100755 t/t3302-notes-index-expensive.sh

diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..00d27bf
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+	say Skipping timing tests
+	test_done
+	exit
+}
+
+create_repo () {
+	number_of_commits=$1
+	nr=0
+	parent=
+	test -d .git || {
+	git init &&
+	tree=$(git write-tree) &&
+	while [ $nr -lt $number_of_commits ]; do
+		test_tick &&
+		commit=$(echo $nr | git commit-tree $tree $parent) ||
+			return
+		parent="-p $commit"
+		nr=$(($nr+1))
+	done &&
+	git update-ref refs/heads/master $commit &&
+	{
+		export GIT_INDEX_FILE=.git/temp;
+		git rev-list HEAD | cat -n | sed "s/^[ 	][ 	]*/ /g" |
+		while read nr sha1; do
+			blob=$(echo note $nr | git hash-object -w --stdin) &&
+			echo $sha1 | sed "s/^/0644 $blob 0	/"
+		done | git update-index --index-info &&
+		tree=$(git write-tree) &&
+		test_tick &&
+		commit=$(echo notes | git commit-tree $tree) &&
+		git update-ref refs/notes/commits $commit
+	} &&
+	git config core.notesRef refs/notes/commits
+	}
+}
+
+test_notes () {
+	count=$1 &&
+	git config core.notesRef refs/notes/commits &&
+	git log | grep "^    " > output &&
+	i=1 &&
+	while [ $i -le $count ]; do
+		echo "    $(($count-$i))" &&
+		echo "    note $i" &&
+		i=$(($i+1));
+	done > expect &&
+	git diff expect output
+}
+
+cat > time_notes << \EOF
+	mode=$1
+	i=1
+	while [ $i -lt $2 ]; do
+		case $1 in
+		no-notes)
+			export GIT_NOTES_REF=non-existing
+		;;
+		notes)
+			unset GIT_NOTES_REF
+		;;
+		esac
+		git log >/dev/null
+		i=$(($i+1))
+	done
+EOF
+
+time_notes () {
+	for mode in no-notes notes
+	do
+		echo $mode
+		/usr/bin/time sh ../time_notes $mode $1
+	done
+}
+
+for count in 10 100 1000 10000; do
+
+	mkdir $count
+	(cd $count;
+
+	test_expect_success "setup $count" "create_repo $count"
+
+	test_expect_success 'notes work' "test_notes $count"
+
+	test_expect_success 'notes timing' "time_notes 100"
+	)
+done
+
+test_done
-- 
1.6.1.rc3.412.ga72b

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4] Introduce commit notes
  2008-12-20  8:23               ` Jeff King
@ 2008-12-20 20:09                 ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2008-12-20 20:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Sat, Dec 20, 2008 at 12:17:46AM -0800, Junio C Hamano wrote:
>
>> >   1. git am /the/patch
>> >   2. patch -p1 <.git/rebase-apply/patch
>> >   3. manually inspect the results for sanity, and fix up the cache.h
>> >      bit that failed totally
>> >   4. git add -u && git add notes.[ch]
>> >   5. git am --resolved
>> 
>> I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and
>> run "git am" instead of step 5.
>
> How do you track down the source of the conflict to do the patch fixup?

Old fashioned way, by looking at the patch and the file that the patch is
supposed to apply and reading the contexts.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2008-12-20 20:11 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-16  8:15 Git Notes idea Govind Salinas
2008-12-16  8:51 ` Jeff King
2008-12-16  8:53   ` Jeff King
2008-12-16 18:43   ` Govind Salinas
2008-12-16 23:48     ` Johannes Schindelin
2008-12-17  9:45       ` Jeff King
     [not found]       ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com>
     [not found]         ` <alpine.DEB.1.00.0812170420560.14632@racer>
2008-12-17 10:11           ` Jeff King
2008-12-17 11:38             ` Johannes Schindelin
2008-12-17 19:20               ` Junio C Hamano
2008-12-18  3:08                 ` Johannes Schindelin
2008-12-19 17:42               ` Govind Salinas
2008-12-19 17:18             ` Govind Salinas
2008-12-19 17:38               ` Govind Salinas
2008-12-19 21:25                 ` Jeff King
2008-12-19 22:24                   ` Govind Salinas
2008-12-20  4:54                     ` Jeff King
2008-12-17 12:21       ` Petr Baudis
2008-12-17  9:38     ` Jeff King
2008-12-17 17:06       ` Govind Salinas
2008-12-18 13:54         ` Jeff King
2008-12-17  0:12   ` rebasing commits that have notes, was " Johannes Schindelin
2008-12-17  9:15     ` Johan Herland
2008-12-17 17:55       ` Stephan Beyer
2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
2008-12-20  6:53       ` Jeff King
2008-12-20  7:55         ` Robin Rosenberg
2008-12-20  8:05           ` Jeff King
2008-12-20  8:17             ` Junio C Hamano
2008-12-20  8:23               ` Jeff King
2008-12-20 20:09                 ` Junio C Hamano
2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin
2008-12-20 12:06           ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin
2008-12-19 23:35     ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin
2008-12-19 23:35     ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin
2008-12-19 23:37     ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin
2008-12-19 23:49       ` Boyd Stephen Smith Jr.
2008-12-20 11:51         ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.