* Git Notes idea. @ 2008-12-16 8:15 Govind Salinas 2008-12-16 8:51 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Govind Salinas @ 2008-12-16 8:15 UTC (permalink / raw) To: Git Mailing List Hi All, I was thinking about possible ideas for my little pet project and I had and idea for way to tack on notes to a commit, or any object really. I know that the idea has been flying around for a long time but there has never been any implementation or a concept that people liked enough to use (unless I have missed something). Here is my idea. .git/refs/notes contains a tree-id (assuming that using a tree-id will not cause any problems, otherwise a commit object can be used. it does not *need* a history, but it *could* have one). That tree has a structure similar to the layout of .git/objects, where it is 2 letter subdirectories for the notes objects. Given a git object (commit, tree, blob, tag), use its sha as the path/filename in this tree. If I have a commit 1234567890123456789012345678901234567890 then the notes tree will have a file 12/34567890123456789012345678901234567890 That file has a list of sha1s (one per line). These shas are object IDs for blobs that have the notes or whatever that you want attached to the item. I think you get the idea. When looking up an item, it should be fairly easy to have the notes tree and subtrees available for doing lookups. And as far as I know stuff under .git/refs can be pushed/pulled even if its not under heads or remotes or tags using already existing machinery. I am not sure, but I think that would satisfy gc operations as well. Also, these trees and blobs never have to be put in the working directory. Does this sound like something that is workable? I thought it might appeal since it uses only features that are already present. This could be extended so that you have different sets of notes under .git/refs/notes/<my note set> or whatever. So that you can have some notes you keep private and some that you publish or whatever. OK, hopefully this isn't a off the wall, thats-what-you-get-for-being-up-at-2-AM idea. Thanks, Govind. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 8:15 Git Notes idea Govind Salinas @ 2008-12-16 8:51 ` Jeff King 2008-12-16 8:53 ` Jeff King ` (3 more replies) 0 siblings, 4 replies; 41+ messages in thread From: Jeff King @ 2008-12-16 8:51 UTC (permalink / raw) To: Govind Salinas; +Cc: Git Mailing List On Tue, Dec 16, 2008 at 02:15:47AM -0600, Govind Salinas wrote: > I was thinking about possible ideas for my little pet project and I > had and idea for way to tack on notes to a commit, or any object > really. I know that the idea has been flying around for a long time > but there has never been any implementation or a concept that people > liked enough to use (unless I have missed something). I think you look at the previous suggestions, because yours is very similar. Which is good, I think, because the current status is that the design is good, but nobody has gotten around to working on it yet. So maybe you can fix that. :) > .git/refs/notes contains a tree-id (assuming that using a tree-id > will not cause any problems, otherwise a commit object can be used. > it does not *need* a history, but it *could* have one). That is the same as the current proposal, except: - the proposal is to use a commit, so your notes are version-controlled - I have suggested supporting multiple note "bases" in the refs/notes namespace. This would allow you to share some notes but not others (e.g., if you had some automated notes related to a build/test system, you might not want to mix those with your human-written notes). > That tree has a structure similar to the layout of .git/objects, where > it is 2 letter subdirectories for the notes objects. I don't think this has been suggested yet, but I'm not sure it is a good idea. The usual reason for this split is that many filesystems handle large directories badly; that isn't a problem here. It does reduce the size of the resulting tree objects when a note is modified (we make updates to two smaller trees instead of one big tree). I don't know if this really matters all that much, since the trees will end up deltified in a pack anyway. And it does make the implementation slightly less simple, since we have to deal with two levels of trees. > Given a git object (commit, tree, blob, tag), use its sha as the > path/filename in this tree. > If I have a commit 1234567890123456789012345678901234567890 then > the notes tree will have a file > 12/34567890123456789012345678901234567890 > > That file has a list of sha1s (one per line). These shas are object > IDs for blobs that have the notes or whatever that you want attached > to the item. This is slightly different than the current proposal. You are proposing that each item have a "list of notes". My thinking was to have "named notes" using a tree for each entry full of blobs. So you could look up the "foo" note for a given commit, but that note would be a single scalar (which could, of course, be interpreted according to its name). > I think you get the idea. When looking up an item, it should be > fairly easy to have the notes tree and subtrees available for doing > lookups. And as far as I know stuff under .git/refs can be It is easy, but it's slow because we have to do a linear search in the (potentially huge) notes tree. And that's what held up the initial implementation. I did a proof-of-concept a month or so ago that pre-seeded an in-memory hash using the tree contents and got pretty reasonable performance. > pushed/pulled even if its not under heads or remotes or tags using > already existing machinery. I am not sure, but I think that would > satisfy gc operations as well. Also, these trees and blobs never have > to be put in the working directory. Right, though I think one of the benefits of this approach is that you _could_ do a checkout on the notes tree if you wanted to do very flexible editing. > Does this sound like something that is workable? I thought it might > appeal since it uses only features that are already present. Yes, it sounds workable, though if you diverge from what has already been discussed, I think you should make an argument about why your approach is better. > This could be extended so that you have different sets of notes under > .git/refs/notes/<my note set> or whatever. So that you can have some > notes you keep private and some that you publish or whatever. Oops, I should have read your whole mail. Yes, that is a good idea. :) For reference, here are the previous discussions that I think are relevant: Johan Herland's original notes proposal (which I think is largely dead, replaced by the one below): http://thread.gmane.org/gmane.comp.version-control.git/46770 Johannes Schindelin's notes proposal (which is more or less the current proposal, but I think the on-disk notes index was not well liked): http://thread.gmane.org/gmane.comp.version-control.git/52598 My test with using a hash to speed it up: http://article.gmane.org/gmane.comp.version-control.git/99415 Some discussion of the interaction of notes and rebase: http://thread.gmane.org/gmane.comp.version-control.git/100533 Some thoughts from me on naming issues: http://article.gmane.org/gmane.comp.version-control.git/100402 Some thoughts from me on the tree speedup: http://article.gmane.org/gmane.comp.version-control.git/101460 which I think should bring you up to speed. :) -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 8:51 ` Jeff King @ 2008-12-16 8:53 ` Jeff King 2008-12-16 18:43 ` Govind Salinas ` (2 subsequent siblings) 3 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2008-12-16 8:53 UTC (permalink / raw) To: Govind Salinas; +Cc: Git Mailing List On Tue, Dec 16, 2008 at 03:51:08AM -0500, Jeff King wrote: > I think you look at the previous suggestions, because yours is very Sorry, there is a typo there. I meant "I think you _should_ look at the previous suggestions." Not saying in broken English that you already have looked at them. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 8:51 ` Jeff King 2008-12-16 8:53 ` Jeff King @ 2008-12-16 18:43 ` Govind Salinas 2008-12-16 23:48 ` Johannes Schindelin 2008-12-17 9:38 ` Jeff King 2008-12-17 0:12 ` rebasing commits that have notes, was " Johannes Schindelin 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin 3 siblings, 2 replies; 41+ messages in thread From: Govind Salinas @ 2008-12-16 18:43 UTC (permalink / raw) To: Jeff King; +Cc: Git Mailing List On Tue, Dec 16, 2008 at 2:51 AM, Jeff King <peff@peff.net> wrote: > On Tue, Dec 16, 2008 at 02:15:47AM -0600, Govind Salinas wrote: > >> I was thinking about possible ideas for my little pet project and I >> had and idea for way to tack on notes to a commit, or any object >> really. I know that the idea has been flying around for a long time >> but there has never been any implementation or a concept that people >> liked enough to use (unless I have missed something). > > I think you look at the previous suggestions, because yours is very > similar. Which is good, I think, because the current status is that the > design is good, but nobody has gotten around to working on it yet. So > maybe you can fix that. :) > I was thinking I would do my first implementation in pyrite and if I find that it works well I will port it. >> .git/refs/notes contains a tree-id (assuming that using a tree-id >> will not cause any problems, otherwise a commit object can be used. >> it does not *need* a history, but it *could* have one). > > That is the same as the current proposal, except: > > - the proposal is to use a commit, so your notes are version-controlled > > - I have suggested supporting multiple note "bases" in the refs/notes > namespace. This would allow you to share some notes but not others > (e.g., if you had some automated notes related to a build/test > system, you might not want to mix those with your human-written > notes). > >> That tree has a structure similar to the layout of .git/objects, where >> it is 2 letter subdirectories for the notes objects. > > I don't think this has been suggested yet, but I'm not sure it is a good > idea. The usual reason for this split is that many filesystems handle > large directories badly; that isn't a problem here. > I just read the proposal from Johannes, he seems to want to use a similar layout. However, I would like to modify my proposal slightly to make it work better when a gc is run. I would modify the tree to look like this... let 1234567890123456789012345678901234567890 be the id of the item that is annotated. let abcdef7890123456789012345678901234567890 be the id of the note to be attached root/ 12/ 34567890123456789012345678901234567890/ abcdef7890123456789012345678901234567890 This way all the notes are attached to a tree, so that gc won't think they are unreferenced objects. > It does reduce the size of the resulting tree objects when a note is > modified (we make updates to two smaller trees instead of one big tree). > I don't know if this really matters all that much, since the trees > will end up deltified in a pack anyway. > > And it does make the implementation slightly less simple, since we have > to deal with two levels of trees. > >> Given a git object (commit, tree, blob, tag), use its sha as the >> path/filename in this tree. >> If I have a commit 1234567890123456789012345678901234567890 then >> the notes tree will have a file >> 12/34567890123456789012345678901234567890 >> >> That file has a list of sha1s (one per line). These shas are object >> IDs for blobs that have the notes or whatever that you want attached >> to the item. > > This is slightly different than the current proposal. You are proposing > that each item have a "list of notes". My thinking was to have "named > notes" using a tree for each entry full of blobs. So you could look up > the "foo" note for a given commit, but that note would be a single > scalar (which could, of course, be interpreted according to its name). > >> I think you get the idea. When looking up an item, it should be >> fairly easy to have the notes tree and subtrees available for doing >> lookups. And as far as I know stuff under .git/refs can be > > It is easy, but it's slow because we have to do a linear search in the > (potentially huge) notes tree. And that's what held up the initial > implementation. I did a proof-of-concept a month or so ago that > pre-seeded an in-memory hash using the tree contents and got pretty > reasonable performance. > Perhaps I am missing something, how is it a linear search?. Since we are keying off of the sha of the annotated object, using a hashtable for a cache should be a fairly quick binary search. If you just wanted to use the tree objects, that should work almost as well since the first tree will split them up nicely for you. Also, how large do you expect the list to be under reasonable circumstances. >> pushed/pulled even if its not under heads or remotes or tags using >> already existing machinery. I am not sure, but I think that would >> satisfy gc operations as well. Also, these trees and blobs never have >> to be put in the working directory. > > Right, though I think one of the benefits of this approach is that you > _could_ do a checkout on the notes tree if you wanted to do very > flexible editing. > Sure, why not. >> Does this sound like something that is workable? I thought it might >> appeal since it uses only features that are already present. > > Yes, it sounds workable, though if you diverge from what has already > been discussed, I think you should make an argument about why your > approach is better. > Well after reading Johannes proposal, I find it to be surprisingly similar since I had not seen it before. However I think mine is a win in a few ways. One, it allows multiple notes per object. Two, it plays well with gc. From what I could follow, the files in his layout just have a ref to the note object but gc would be required to know about this feature and not remove those note blobs. Third it allows multiple sets of notes. Although it seems that at least 1 and 3 have been discussed at some length. >> This could be extended so that you have different sets of notes under >> .git/refs/notes/<my note set> or whatever. So that you can have some >> notes you keep private and some that you publish or whatever. > > Oops, I should have read your whole mail. Yes, that is a good idea. :) > > For reference, here are the previous discussions that I think are > relevant: > Thanks for the pointers, a couple quick thoughts... > Some discussion of the interaction of notes and rebase: > http://thread.gmane.org/gmane.comp.version-control.git/100533 > On rebasing, I have a couple thoughts. 1) I think it only really makes sense to make a public annotation to a commit that is public, and once a commit is public it should not be rebased. 2) We could also annotate the commit's tree instead of the commit. That would make it somewhat resistant to rebases, cherry-picks and amends. And once a tree has changed, the notes are probably less reliable although the user should be able to force a note or notes to be carried along. > Some thoughts from me on naming issues: > http://article.gmane.org/gmane.comp.version-control.git/100402 > On naming. I strongly support a ref/notes/sha1/sha1 approach. If having a type to the note is important, then perhaps the first line of a note could be considered a type or a set of "tags". This way you have both naming/typing and one lookup per sha. The only drawback is that you have to open the blob to see the type. A hybrid approach that uses refs/notes/acked/sha/sha which is one lookup if you know the type and the sha of the annotated object before hand might be worth considering. This would be similar to the public or private notes that i mentioned before. > Some thoughts from me on the tree speedup: > http://article.gmane.org/gmane.comp.version-control.git/101460 > I guess I must be missing something. I have seen several references to this not being a binary search several times in the links that you have here. But I fail to see why a binary search cannot be done. That said, I would still think that the existing hash table would be the way to go. > which I think should bring you up to speed. :) Thanks again. -Govind ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 18:43 ` Govind Salinas @ 2008-12-16 23:48 ` Johannes Schindelin 2008-12-17 9:45 ` Jeff King ` (2 more replies) 2008-12-17 9:38 ` Jeff King 1 sibling, 3 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-16 23:48 UTC (permalink / raw) To: Govind Salinas; +Cc: Jeff King, Git Mailing List Hi, On Tue, 16 Dec 2008, Govind Salinas wrote: > I was thinking I would do my first implementation in pyrite and if I > find that it works well I will port it. Given that there are a lot of building blocks in C already, I think it would be a waste of your time. > I just read the proposal from Johannes, he seems to want to use a > similar layout. However, I would like to modify my proposal slightly to > make it work better when a gc is run. I would modify the tree to look > like this... > > let 1234567890123456789012345678901234567890 be the > id of the item that is annotated. > > let abcdef7890123456789012345678901234567890 be the > id of the note to be attached > > root/ > 12/ > 34567890123456789012345678901234567890/ > abcdef7890123456789012345678901234567890 > > This way all the notes are attached to a tree, so that gc won't > think they are unreferenced objects. In my proposal back then, your root/12/345... would be a blob, in Peff's, it would be a tree, and in both cases the blobs/trees would be referenced by refs/notes, so git gc would not kill them either. The bigger issue is that commit objects can be gc'ed, and then their notes should be gc'ed, too. And of course, there is the rebase issue (which I completely missed; I will read the mail Jeff referenced tomorrow). Speaking about the blobs vs trees issue, I think it is no issue, as Peff and me already discussed: the notes could check if it is a tree or a blob, and handle both easily. > Peff wrote: > > > It is easy, but it's slow because we have to do a linear search in the > > (potentially huge) notes tree. And that's what held up the initial > > implementation. I did a proof-of-concept a month or so ago that > > pre-seeded an in-memory hash using the tree contents and got pretty > > reasonable performance. > > Perhaps I am missing something, how is it a linear search? Yes, you are missing what I wrote in the original thread: tree objects must be read in a forward direction, one by one. IIRC back then, Junio and/or Linus suggested that you could backward search with heuristics finding the beginning of a tree entry, and thus you could kind of bisect the tree to search for a specific tree entry (since tree objects have the contents sorted), but I presented a case where this breaks down at the GitTogether: Tree entries consist of a mode (as 6-byte ASCII representing the octal value), then SPC, then a NUL-terminated path, and then a 20-byte SHA-1. (Just hexdump the output of "git cat-file tree HEAD:" in any repository to see it). The only heuristic you could apply to find your way backward (or forward) to find the beginning of a tree entry would be to find the NUL character and verify that exactly 20 bytes after it, either the tree object ends, or there is a valid octal number followed by a SPC. The only thing you would need for this heuristic to break down is a SHA-1 which contains a \x00 (which is then mistaken for a string termination), and part of a filename that could be mistaken to be an octal number. Take for example the SHA-1 of git-gui in v1.6.0.5, which has a NUL its 14th byte, and just assume that the next tree entry has mode 100644 and name "4040000 Some financial record.txt". Then, the NUL could be mistaken for the end of the previous tree entry, and "040000 " as the mode for the next one, whose name would be assumed to be "Some financial record.txt". Granted, if you find two NULs in 21 bytes, one of them must be the termination of the path, but which one? Worse, even if you would find a method (complicated, and therefore necessarily fragile) to find the boundary of the tree entries reliably, you would _still_ have a linear time unpacking the darned tree object in the first place. So you cannot do that for every commit you encounter and expect not to die of boredom in the process. Peff's very cute idea was to decouple that process from the per-commit procedure, and basically make it a one-time cost (per Git call, and only when notes were asked for). It will still be linear in the number of notes, but it would then be in a hashmap, with an expected linear cost per commit. > Also, how large do you expect the list to be under reasonable > circumstances. We did not intend Git to be used as a backup tool, did we? One of the _worst_ design decisions is to limit yourself by expectations. > > Some discussion of the interaction of notes and rebase: > > http://thread.gmane.org/gmane.comp.version-control.git/100533 > > On rebasing, I have a couple thoughts. > > 1) I think it only really makes sense to make a public annotation to a > commit that is public, and once a commit is public it should not be > rebased. Again, do not limit your design by your expectations. People already talk about having cover letters for their patch series as notes, and Pasky seems to discuss tracking explicit renames with notes when he does not play Go, instead of maintaining repo.or.cz and git.or.cz. > 2) We could also annotate the commit's tree instead of the commit. > That would make it somewhat resistant to rebases, cherry-picks and > amends. To the contrary. When I rebase, the tree _does_ change, otherwise I would have rebased onto something that had the same original tree as my rebase-base to begin with, which would make the rebase rather pointless. > > Some thoughts from me on naming issues: > > http://article.gmane.org/gmane.comp.version-control.git/100402 > > On naming. I strongly support a ref/notes/sha1/sha1 approach. I think you meant refs/notes:<first byte in hex>/<rest of bytes>/<some arbitrary SHA-1>? I am rather supporting refs/nots:<first byte in hex>/<rest of bytes> being either a blob, or a tree containing human readable tags, such as "bugfix" or "review" or some such. > If having a type to the note is important, then perhaps the first line > of a note could be considered a type or a set of "tags". That would be horrible! Just to know if you need to unpack the blob, you'd have to unpack it! Hth, Dscho ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 23:48 ` Johannes Schindelin @ 2008-12-17 9:45 ` Jeff King [not found] ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com> 2008-12-17 12:21 ` Petr Baudis 2 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2008-12-17 9:45 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List On Wed, Dec 17, 2008 at 12:48:02AM +0100, Johannes Schindelin wrote: > > Perhaps I am missing something, how is it a linear search? > > Yes, you are missing what I wrote in the original thread: tree objects > must be read in a forward direction, one by one. Thanks, that was a nice writeup of what was discussed at the GitTogether. > Peff's very cute idea was to decouple that process from the per-commit > procedure, and basically make it a one-time cost (per Git call, and only > when notes were asked for). To be fair, it was not my cute idea, but somebody else's (I think David Reiss). I just coded it quickly because somebody was talking about iPhones or something. :) > To the contrary. When I rebase, the tree _does_ change, otherwise I > would have rebased onto something that had the same original tree as > my rebase-base to begin with, which would make the rebase rather > pointless. Another fun option would be to put notes on patch-ids. That would of course be horrifically slow to look up, but would survive many cherry-picks and rebases (but not all, of course). I don't know if that is useful or not, but I don't see any reason why the discussed implementation would forbid it (somebody would just have to implement the lookups at a useful spot). > > On naming. I strongly support a ref/notes/sha1/sha1 approach. > I think you meant refs/notes:<first byte in hex>/<rest of bytes>/<some > arbitrary SHA-1>? You seem to be in favor of the fan-out. Out of curiosity, did you ever do numbers on whether the fan-out is actually helpful? > I am rather supporting refs/nots:<first byte in hex>/<rest of bytes> > being either a blob, or a tree containing human readable tags, such as > "bugfix" or "review" or some such. Yes, I am in favor of the "tree or blob" idea, too. I want this to not just be about "I am a person writing a note" but "arbitrary data attached to an object after it has been created". And that means thinking up front about managing the namespace. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com>]
[parent not found: <alpine.DEB.1.00.0812170420560.14632@racer>]
* Re: Git Notes idea. [not found] ` <alpine.DEB.1.00.0812170420560.14632@racer> @ 2008-12-17 10:11 ` Jeff King 2008-12-17 11:38 ` Johannes Schindelin 2008-12-19 17:18 ` Govind Salinas 0 siblings, 2 replies; 41+ messages in thread From: Jeff King @ 2008-12-17 10:11 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote: > > I agree, I haven't thought of any fix along these lines other than to > > make gc do the clean up. > > I have, and IIRC I briefly mentioned it back then. Basically, you will > have to add a "git notes gc" or some such, which basically reads in the > whole notes, traverses all reachable commits, marking the corresponding > notes, and then writes out all marked notes (leaving the other notes > behind). I was thinking something similar, but I think it is even easier. Make the rule "if we still have the object, then we still have the note". That has three benefits: - implementation is simple: for each note $n, delete it unless has_sha1_file($n). - it handles notes on non-commit objects - it kills off notes when an object is _pruned_, not when it stops being _reachable_. So if I delete a branch with a commit found nowhere else, its notes will hang around until it is actually pruned. If I pull it from lost+found, I still keep the notes. Note that all of this garbage collection of notes is really just removing them from the most current notes _tree_. If the notes structure is actually composed of commits, then old notes that are "deleted" will still be available historically. > I wonder why you speak as if none of that had been implemented yet. From > my work, it is obvious that hashtable is better than sorted list (except > for the fan-out, which obviously wants to be an array), and from Peff's it > is obvious that we want to keep the hashtables in memory. If he is planning on doing a separate pyrite implementation, then it _hasn't_ been implemented yet. And I don't care there if he uses hash tables or sorted lists or whatever. I think the most important thing is getting down the design of the _data structure_, so that we can have a compatible implementation inside git itself. > > IMO notes are just a generallized tag. > > IMO notes have nothing to do with a tag. Tags point to commits (or other > objects, but that's beside the point here). Notes are pointed _to_ by > commits. I think maybe we are just talking about semantics, but I think notes are not pointed to by commits. There is an external mapping of commits to notes, which is very different. I can give you the commit without you knowing the notes, or that the notes even exist. But in practice, I don't know if this distinction is going to influence any of the design or use. > Has the tree changed? Sure it has. Because Junio committed and pushed > some changes. I think it is safe to say the tree generally changes for rebase, but not necessarily for something like an amended commit, or a pull of a patch sent by mail. So there are times when it changes, and times when it doesn't. And if there were some simple way of handling the times when it didn't change at no general cost, I think going that way would be fine. But: > And the worst part about your idea to attach notes to trees rather than > commits: For things like Acked-by:... you very much want to annotate the > commit, _not_ the tree. The tree is useless here. It says nothing about > the patch, nothing about the explanation, and nothing about the history. This is a huge cost, IMHO. I think you generally want to annotate commits, not trees, and the semantic difference is important (but again, I think all of the proposals are capable of doing either -- but if you want a "show me the notes on this commit" feature to interoperate, it needs to pick one). > > root/ > > 12/ > > 34567890123456789012345678901234567890/ > > <type> > > Funny. That is Peff's proposal. Clearly we have independent verification that it's a good idea. ;) -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 10:11 ` Jeff King @ 2008-12-17 11:38 ` Johannes Schindelin 2008-12-17 19:20 ` Junio C Hamano 2008-12-19 17:42 ` Govind Salinas 2008-12-19 17:18 ` Govind Salinas 1 sibling, 2 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-17 11:38 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List Hi, On Wed, 17 Dec 2008, Jeff King wrote: > On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote: > > > > I agree, I haven't thought of any fix along these lines other than > > > to make gc do the clean up. > > > > I have, and IIRC I briefly mentioned it back then. Basically, you > > will have to add a "git notes gc" or some such, which basically reads > > in the whole notes, traverses all reachable commits, marking the > > corresponding notes, and then writes out all marked notes (leaving the > > other notes behind). > > I was thinking something similar, but I think it is even easier. Make > the rule "if we still have the object, then we still have the note". > That has three benefits: > > - implementation is simple: for each note $n, delete it unless > has_sha1_file($n). > > - it handles notes on non-commit objects > > - it kills off notes when an object is _pruned_, not when it stops > being _reachable_. So if I delete a branch with a commit found > nowhere else, its notes will hang around until it is actually pruned. > If I pull it from lost+found, I still keep the notes. > > Note that all of this garbage collection of notes is really just > removing them from the most current notes _tree_. If the notes structure > is actually composed of commits, then old notes that are "deleted" will > still be available historically. Right. So my original proposal to use separate refs for separate purposes might make sense again, so you can have private as well as public notes. > > I wonder why you speak as if none of that had been implemented yet. > > From my work, it is obvious that hashtable is better than sorted list > > (except for the fan-out, which obviously wants to be an array), and > > from Peff's it is obvious that we want to keep the hashtables in > > memory. > > If he is planning on doing a separate pyrite implementation, then it > _hasn't_ been implemented yet. And I don't care there if he uses hash > tables or sorted lists or whatever. I think the most important thing is > getting down the design of the _data structure_, so that we can have a > compatible implementation inside git itself. Well, I don't care about pyrite. As far as I am concerned, it might as well use an incompatible version. I really don't care. > > > IMO notes are just a generallized tag. > > > > IMO notes have nothing to do with a tag. Tags point to commits (or > > other objects, but that's beside the point here). Notes are pointed > > _to_ by commits. > > I think maybe we are just talking about semantics, but I think notes are > not pointed to by commits. There is an external mapping of commits to > notes, which is very different. I can give you the commit without you > knowing the notes, or that the notes even exist. > > But in practice, I don't know if this distinction is going to influence > any of the design or use. You are correct, of course, that the commit does not point to the notes explicitely, by having a SHA-1 _in_ the commit object. But the main point still stands: you go from commit to note, not from note to commit. And this is in stark contrast to tags, where you go from tag to commit, _not_ from commit to tag. That is a fundamental _difference_ between tags and notes, so that I refuse to accept the notion of notes being a generalized form of tags. Ciao, Dscho ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 11:38 ` Johannes Schindelin @ 2008-12-17 19:20 ` Junio C Hamano 2008-12-18 3:08 ` Johannes Schindelin 2008-12-19 17:42 ` Govind Salinas 1 sibling, 1 reply; 41+ messages in thread From: Junio C Hamano @ 2008-12-17 19:20 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Jeff King, Govind Salinas, Git Mailing List Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > ... But the main point > still stands: you go from commit to note, not from note to commit. And > this is in stark contrast to tags, where you go from tag to commit, _not_ > from commit to tag. > > That is a fundamental _difference_ between tags and notes, so that I > refuse to accept the notion of notes being a generalized form of tags. Hmm, how would you explain things like "git describe" (and "name-rev")? ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 19:20 ` Junio C Hamano @ 2008-12-18 3:08 ` Johannes Schindelin 0 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-18 3:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Govind Salinas, Git Mailing List Hi, On Wed, 17 Dec 2008, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > ... But the main point still stands: you go from commit to note, not > > from note to commit. And this is in stark contrast to tags, where you > > go from tag to commit, _not_ from commit to tag. > > > > That is a fundamental _difference_ between tags and notes, so that I > > refuse to accept the notion of notes being a generalized form of tags. > > Hmm, how would you explain things like "git describe" (and "name-rev")? Programs? Ciao, Dscho ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 11:38 ` Johannes Schindelin 2008-12-17 19:20 ` Junio C Hamano @ 2008-12-19 17:42 ` Govind Salinas 1 sibling, 0 replies; 41+ messages in thread From: Govind Salinas @ 2008-12-19 17:42 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Jeff King, Git Mailing List On Wed, Dec 17, 2008 at 5:38 AM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > Hi, > > On Wed, 17 Dec 2008, Jeff King wrote: >> If he is planning on doing a separate pyrite implementation, then it >> _hasn't_ been implemented yet. And I don't care there if he uses hash >> tables or sorted lists or whatever. I think the most important thing is >> getting down the design of the _data structure_, so that we can have a >> compatible implementation inside git itself. > > Well, I don't care about pyrite. As far as I am concerned, it might as > well use an incompatible version. I really don't care. Well I do care. It would not be a good thing for anyone to have 2 separate systems for notes. Let us say that someone who you work with uses pyrite and you don't. They will add notes which you can't see and vice versa. Thanks, Govind. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 10:11 ` Jeff King 2008-12-17 11:38 ` Johannes Schindelin @ 2008-12-19 17:18 ` Govind Salinas 2008-12-19 17:38 ` Govind Salinas 1 sibling, 1 reply; 41+ messages in thread From: Govind Salinas @ 2008-12-19 17:18 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List On Wed, Dec 17, 2008 at 4:11 AM, Jeff King <peff@peff.net> wrote: > On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote: > >> > I agree, I haven't thought of any fix along these lines other than to >> > make gc do the clean up. >> >> I have, and IIRC I briefly mentioned it back then. Basically, you will >> have to add a "git notes gc" or some such, which basically reads in the >> whole notes, traverses all reachable commits, marking the corresponding >> notes, and then writes out all marked notes (leaving the other notes >> behind). > > I was thinking something similar, but I think it is even easier. Make > the rule "if we still have the object, then we still have the note". > That has three benefits: > > - implementation is simple: for each note $n, delete it unless > has_sha1_file($n). > > - it handles notes on non-commit objects > > - it kills off notes when an object is _pruned_, not when it stops > being _reachable_. So if I delete a branch with a commit found > nowhere else, its notes will hang around until it is actually pruned. > If I pull it from lost+found, I still keep the notes. > > Note that all of this garbage collection of notes is really just > removing them from the most current notes _tree_. If the notes structure > is actually composed of commits, then old notes that are "deleted" will > still be available historically. > This is my concern with keeping a history of the notes pseudo-branch. Let us say that I do the following 1) on branch A commit a 2) add note a` 3) on branch B commit b 4) add note b` 5) on branch B commit c 6) add note c` 7) delete branch A 8) gc after a time such that a is pruned Now either I will always have anote a` ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-19 17:18 ` Govind Salinas @ 2008-12-19 17:38 ` Govind Salinas 2008-12-19 21:25 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Govind Salinas @ 2008-12-19 17:38 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List Sorry, hit the send key accidentally. On Wed, Dec 17, 2008 at 4:11 AM, Jeff King <peff@peff.net> wrote: > On Wed, Dec 17, 2008 at 04:43:57AM +0100, Johannes Schindelin wrote: > > > I agree, I haven't thought of any fix along these lines other than to > > make gc do the clean up. > > I have, and IIRC I briefly mentioned it back then. Basically, you will >> have to add a "git notes gc" or some such, which basically reads in the >> whole notes, traverses all reachable commits, marking the corresponding >> notes, and then writes out all marked notes (leaving the other notes >> behind). > > I was thinking something similar, but I think it is even easier. Make > the rule "if we still have the object, then we still have the note". > That has three benefits: > > - implementation is simple: for each note $n, delete it unless > has_sha1_file($n). > > - it handles notes on non-commit objects > > - it kills off notes when an object is _pruned_, not when it stops > being _reachable_. So if I delete a branch with a commit found > nowhere else, its notes will hang around until it is actually pruned. > If I pull it from lost+found, I still keep the notes. > > Note that all of this garbage collection of notes is really just > removing them from the most current notes _tree_. If the notes structure > is actually composed of commits, then old notes that are "deleted" will > still be available historically. > This is my concern with keeping a history of the notes pseudo-branch. Let me restate what you are saying with an example 1) on branch A commit a 2) add note a` 3) on branch B commit b 4) add note b` 5) on branch B commit c 6) add note c` 7) delete branch A 8) gc after a time such that a is pruned Now either I will always have a note a` as an object forever even though the only commit that points to it is gone or I have to re-write the history of the notes branch from the point that it was added. Given this problem, is it really such a good idea to keep the history? Of course the other side of this conversation is that the merge operation will be more complex since the following can also happen 9) push notes 10) user 2 pulls notes but still has commit a and note a` On the other, other hand, pushing and pulling notes if a history is kept will have to involve a lot of rebasing/merging. Just to throw an idea out... A possible solution is that notes are per-branch, refs/notes/heads/master refs/notes/heads/foo/bar refs/notes/remotes/baz/bang and then it is easier to deal with. A published branch's notes are isolated from the changes in unpublished branches. And since published branches aren't *supposed* to change, then the notes should also always be fast forwards. Similarly, if a branch is not considered stable, like pu or even next, then the associated notes branch could be forced in the same way. Rebase, cherry-pick and merge (and possibly branch/checkout) would have to be updated to handle notes, which is the down side. It also doesn't solve the issue of a history causing us to keep notes after the aren't useful anymore. So perhaps we could use the above layout with no history? Thanks, Govind. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-19 17:38 ` Govind Salinas @ 2008-12-19 21:25 ` Jeff King 2008-12-19 22:24 ` Govind Salinas 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2008-12-19 21:25 UTC (permalink / raw) To: Govind Salinas; +Cc: Johannes Schindelin, Git Mailing List On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote: > This is my concern with keeping a history of the notes pseudo-branch. Let > me restate what you are saying with an example > > 1) on branch A commit a > 2) add note a` > 3) on branch B commit b > 4) add note b` > 5) on branch B commit c > 6) add note c` > 7) delete branch A > 8) gc after a time such that a is pruned > > Now either I will always have a note a` as an object forever even though > the only commit that points to it is gone or I have to re-write the history of > the notes branch from the point that it was added. Yes, that's correct. > Given this problem, is it really such a good idea to keep the history? I think so. Otherwise how will you push and pull notes? You won't even know which one is the more recent tree, let alone handle any merges caused by editing notes in two places. > On the other, other hand, pushing and pulling notes if a history is kept > will have to involve a lot of rebasing/merging. Depending on your workflow. It might just involve a lot of fast forwards if the note writer is in one place. > A possible solution is that notes are per-branch, > > refs/notes/heads/master > refs/notes/heads/foo/bar > refs/notes/remotes/baz/bang Sorry, I don't quite get it. You are asking for per-branch notes that keep history, or per-branch notes that don't keep history? If the former, then you haven't solved the cruft accumulation problem. You can get obsolete notes in your note history by rebasing on a branch that is long-running (which is OK as long as you haven't published _those particular_ commits). Or are you proposing to rebase and cleanup the notes history every time you do a destructive operation? If the latter, then I don't see how you've solved the push-pull and merge problem (which you need history for). But in either case, I think the solution is non-intuitive. If I annotate a commit, and then merge the commit from one branch to another, shouldn't the annotation stay? Really, I am not sure this is worth getting too concerned about. Since we are talking about cruft in the _history_ of the notes branch, it won't impact actual notes usage (which will always just deal with the most recent tree). So really we are talking about some uninteresting objects in the db, which wastes some space. In practice, I suspect this won't be that large because notes themselves are going to be relatively short and in many cases, repetitive (i.e., many annotations may have the same blob hash for several commits). And if it is a space problem, then the right solution is to periodically truncate the notes history by rewriting. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-19 21:25 ` Jeff King @ 2008-12-19 22:24 ` Govind Salinas 2008-12-20 4:54 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Govind Salinas @ 2008-12-19 22:24 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Git Mailing List On Fri, Dec 19, 2008 at 3:25 PM, Jeff King <peff@peff.net> wrote: > On Fri, Dec 19, 2008 at 11:38:55AM -0600, Govind Salinas wrote: > >> This is my concern with keeping a history of the notes pseudo-branch. Let >> me restate what you are saying with an example >> >> 1) on branch A commit a >> 2) add note a` >> 3) on branch B commit b >> 4) add note b` >> 5) on branch B commit c >> 6) add note c` >> 7) delete branch A >> 8) gc after a time such that a is pruned >> >> Now either I will always have a note a` as an object forever even though >> the only commit that points to it is gone or I have to re-write the history of >> the notes branch from the point that it was added. > > Yes, that's correct. > >> Given this problem, is it really such a good idea to keep the history? > > I think so. Otherwise how will you push and pull notes? You won't even > know which one is the more recent tree, let alone handle any merges > caused by editing notes in two places. Couldn't you simply merge your tree and theirs even if there is no history. You would have to find a way to handle merges in any event since they could just as easily happen if you have a history. >> On the other, other hand, pushing and pulling notes if a history is kept >> will have to involve a lot of rebasing/merging. > > Depending on your workflow. It might just involve a lot of fast forwards > if the note writer is in one place. > >> A possible solution is that notes are per-branch, >> >> refs/notes/heads/master >> refs/notes/heads/foo/bar >> refs/notes/remotes/baz/bang > > Sorry, I don't quite get it. You are asking for per-branch notes that > keep history, or per-branch notes that don't keep history? Both, at the end of my previous mail I said... "So perhaps we could use the above layout with no history?" But they are two separate fixes to 2 different problems. > If the former, then you haven't solved the cruft accumulation problem. > You can get obsolete notes in your note history by rebasing on a branch > that is long-running (which is OK as long as you haven't published > _those particular_ commits). Or are you proposing to rebase and cleanup > the notes history every time you do a destructive operation? Yes, it does not solve that problem. But it does solve things like Dev1 and Dev2 both have branches A and topic branch B. and they are in refs/notes/public (or refs/notes or something not branch specific). Dev1 adds 100 notes to topic B, lets say half of them are obsolete due to rebases or whatever. Dev2 pulls A and updates their notes as well. Now Dev2 has acquired all the notes from Dev1 including the obsolete ones. So you have 100 commits, 100 blobs and all the new trees that go with them that the user was not interested in. Run this across 1000 users and you have a lot of cruft. Now, if instead we have a per-branch notes scheme, then you only get the cruft from the branches you were interested in. If you remove the history you could end up with no cruft because gc should handle it. > If the latter, then I don't see how you've solved the push-pull and > merge problem (which you need history for). What git-fetch would have to do is say. This is a note. The remote sha is not the same as mine, i will treat this as a force and fetch the objects without checking history and then run a merge on the 2 commits. The notes merge could have its own strategy that checked if an object exists before deciding to add a new item or delete a removed one. Then the user would only have to intervene if the notes where edited. > But in either case, I think the solution is non-intuitive. If I annotate > a commit, and then merge the commit from one branch to another, > shouldn't the annotation stay? Sure, either the merge command could run 2 merges, one for the real branch and one for the notes pseudo branch or the user could be required to do that manually. I would think that doing it automatically would be good. Especially if you use a special merge strategy. > Really, I am not sure this is worth getting too concerned about. Since > we are talking about cruft in the _history_ of the notes branch, it > won't impact actual notes usage (which will always just deal with the > most recent tree). So really we are talking about some uninteresting > objects in the db, which wastes some space. In practice, I suspect this > won't be that large because notes themselves are going to be relatively > short and in many cases, repetitive (i.e., many annotations may have the > same blob hash for several commits). And if it is a space problem, then > the right solution is to periodically truncate the notes history by > rewriting. You are correct of course that it will just be wasted space. But I am concerned that it could end up being a lot of wasted space. I mean, what if every person who contributed to the kernel contributed note cruft. Users have branches that they consider public, so they might go into the a public note store if there is no per-branch store. Or errant users could use the public store without understanding how they are affecting the central repo, including the obsolete ones. If you *really* don't think its something to be worried about then I am OK with that since you have a lot more experience with this, but it sounds hairy to me. Thanks, Govind. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-19 22:24 ` Govind Salinas @ 2008-12-20 4:54 ` Jeff King 0 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2008-12-20 4:54 UTC (permalink / raw) To: Govind Salinas; +Cc: Johannes Schindelin, Git Mailing List On Fri, Dec 19, 2008 at 04:24:01PM -0600, Govind Salinas wrote: > > I think so. Otherwise how will you push and pull notes? You won't even > > know which one is the more recent tree, let alone handle any merges > > caused by editing notes in two places. > > Couldn't you simply merge your tree and theirs even if there is no > history. You would have to find a way to handle merges in any event > since they could just as easily happen if you have a history. Let's say I have a tree T1 like this: $COMMIT_A -> $BLOB_A $COMMIT_B -> $BLOB_B1 and a tree T2 like this: $COMMIT_B -> $BLOB_B2 $COMMIT_C -> $BLOB_C what is the correct merge? Was $COMMIT_A added in T1, or deleted in T2? How about $COMMIT_C? Even if you went with a strategy like "always add from both" (which I don't think is a good idea, because deleted notes will keep popping back up) you have a conflict with $COMMIT_B. Should it be B1 or B2? You can't tell if B1 became B2, vice versa, or if there is a true merge conflict. > > If the former, then you haven't solved the cruft accumulation problem. > > You can get obsolete notes in your note history by rebasing on a branch > > that is long-running (which is OK as long as you haven't published > > _those particular_ commits). Or are you proposing to rebase and cleanup > > the notes history every time you do a destructive operation? > > Yes, it does not solve that problem. But it does solve things like > > Dev1 and Dev2 both have branches A and topic branch B. and they > are in refs/notes/public (or refs/notes or something not branch specific). > > Dev1 adds 100 notes to topic B, lets say half of them are obsolete due > to rebases or whatever. Dev2 pulls A and updates their notes > as well. Now Dev2 has acquired all the notes from Dev1 including the > obsolete ones. So you have 100 commits, 100 blobs and all the new > trees that go with them that the user was not interested in. > > Run this across 1000 users and you have a lot of cruft. > > Now, if instead we have a per-branch notes scheme, then you only get > the cruft from the branches you were interested in. If you remove the > history you could end up with no cruft because gc should handle it. OK. But my point is that this is an incomplete solution. You can _still_ get cruft, and you _still_ have to deal with that cruft some other way. So we will still end up having to implement something else. And I might even be fine with a partial solution that helped some if it didn't come with a cost, but I think the "notes stick to branches" behavior is strictly worse. > > If the latter, then I don't see how you've solved the push-pull and > > merge problem (which you need history for). > > What git-fetch would have to do is say. This is a note. The remote > sha is not the same as mine, i will treat this as a force and fetch the > objects without checking history and then run a merge on the 2 > commits. The notes merge could have its own strategy that checked > if an object exists before deciding to add a new item or delete a > removed one. Then the user would only have to intervene if the > notes where edited. I don't like that because: - the user is going to end up manually resolving merge conflicts for things that _should_ have been fast forwards. But much worse, it's going to be on content they may never even have seen before. How will they decide which is which? - how do you push notes? There's no opportunity to handle the merge on the remote side. And you can't just pull, merge locally, and push what is now a fast-forward, because there is no concept of fast-forward without history. - Suddenly pulling and pushing notes isn't just taken care of by the usual ref transfer mechanisms. We have to implement a whole new system. > You are correct of course that it will just be wasted space. But I am > concerned that it could end up being a lot of wasted space. I mean, what > if every person who contributed to the kernel contributed note cruft. Users What if every person who contributed to the kernel contributed history cruft? It's really the same problem, and it is solved by people keeping their trees clean (via rebase) and being picky about how data comes into your tree (i.e., don't pull from people with cruft). I suspect Linus wouldn't pull notes at all (and they wouldn't make it over patch transmission anyway). But in a workflow that is pulling the notes, the right time to clean up history is probably before publishing. That is, you can rebase and clean up your notes history just before you push it to somewhere public, just like you might clean up messy history. > If you *really* don't think its something to be worried about then I > am OK with that since you have a lot more experience with this, but it > sounds hairy to me. It is hairy, and I wish there were a better solution. But I think every other option is much worse. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 23:48 ` Johannes Schindelin 2008-12-17 9:45 ` Jeff King [not found] ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com> @ 2008-12-17 12:21 ` Petr Baudis 2 siblings, 0 replies; 41+ messages in thread From: Petr Baudis @ 2008-12-17 12:21 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Govind Salinas, Jeff King, Git Mailing List On Wed, Dec 17, 2008 at 12:48:02AM +0100, Johannes Schindelin wrote: > Again, do not limit your design by your expectations. People already talk > about having cover letters for their patch series as notes, and Pasky > seems to discuss tracking explicit renames with notes when he does not > play Go, instead of maintaining repo.or.cz and git.or.cz. I don't really play Go that much anymore! ;-) Petr "Pasky" Baudis ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-16 18:43 ` Govind Salinas 2008-12-16 23:48 ` Johannes Schindelin @ 2008-12-17 9:38 ` Jeff King 2008-12-17 17:06 ` Govind Salinas 1 sibling, 1 reply; 41+ messages in thread From: Jeff King @ 2008-12-17 9:38 UTC (permalink / raw) To: Govind Salinas; +Cc: Git Mailing List, Johannes Schindelin On Tue, Dec 16, 2008 at 12:43:55PM -0600, Govind Salinas wrote: > I was thinking I would do my first implementation in pyrite and if I find > that it works well I will port it. OK, though your performance will probably suck unless you dump the notes tree into a local hash at the beginning of your program. And looking up every commit's note during revision traversal is one of the intended uses (e.g., decorating git-log output, or filtering commits based on a particular note). And as Dscho mentioned, most of what you need is already there in C. You are welcome to implement whatever you want in pyrite, of course, but there is a desire to have this accessible to the revision traversal machinery. And that means if you want your version in pyrite to be compatible with what ends up in git, the data structure design needs to be suitable for both. > I just read the proposal from Johannes, he seems to want to use a > similar layout. However, I would like to modify my proposal slightly > to make it work better when a gc is run. I would modify the tree to > look like this... > > let 1234567890123456789012345678901234567890 be the > id of the item that is annotated. > > let abcdef7890123456789012345678901234567890 be the > id of the note to be attached > > root/ > 12/ > 34567890123456789012345678901234567890/ > abcdef7890123456789012345678901234567890 > > This way all the notes are attached to a tree, so that gc won't > think they are unreferenced objects. But you have lost the ordering in your list, then, since they will not be ordered by sha1 of the note contents. I don't know if you care. The second sha1 is pointless, anyway, since nobody will know that number as a reference; why not just name them monotonically starting at 1? One of the things I don't like about having several notes is that it introduces an extra level of indirection that every user has to pay for, whether they want it or not. If a note can be a blob _or_ a tree, then those who want to use blobs can reap the performance benefit. Those who want multiple named notes in a hierarchy can pay the extra indirection cost. I haven't measured how big a cost that is (but bearing in mind that we might want to do this lookup once per revision in a traversal, even one extra object lookup can have an impact). I'm also still not convinced the fan-out is worthwhile, but I can see how it might be. It would be nice to see numbers for both. > Perhaps I am missing something, how is it a linear search?. Since we I think Johannes explained in detail in another message, but it is a linear search to look up directly in a tree object. Of course you can build a hash or a sorted fixed-size list as an index. > Also, how large do you expect the list to be under reasonable > circumstances. As many notes as there are commits is my goal (e.g., it is not hard to imagine an automated process to add notes on build status). Ideally, we could handle as many notes as there are objects; I see no reason not to allow annotating arbitrary sha1's (I don't know if there is a use for that, but the more scalable the implementation, the better). > > Some thoughts from me on naming issues: > > http://article.gmane.org/gmane.comp.version-control.git/100402 > > On naming. I strongly support a ref/notes/sha1/sha1 approach. If > having a type to the note is important, then perhaps the first line of > a note could be considered a type or a set of "tags". This way you I don't think we are talking about the same thing. What I mean by naming is "here is a shorthand for referring to notes" that is not necessarily coupled with the implementation. That is, I would like to do something like: git log --notes-filter="foo:bar == 1" and have that "foo:bar" as a shorthand on each commit for: refs/notes/foo:$COMMIT/bar Without a left-hand side (e.g., "bar"), we get: refs/notes/default:$COMMIT/bar Or without a right-hand side (e.g., "foo:"), we get: refs/notes/foo:$COMMIT So you can group related notes in the same tree (which gives you fast lookup if you are looking at multiple ones, since you only have to do the tree lookup once), or you can keep notes in separate trees (which means you can distribute some but not others). I think your "list of notes" proposal on top of that would be for any note resolution to provide a tree instead of a blob, with sequenced elements. I.e., foo:bar might have multiple notes, like: refs/notes/foo:$COMMIT/bar/1 refs/notes/foo:$COMMIT/bar/2 refs/notes/foo:$COMMIT/bar/3 > drawback is that you have to open the blob to see the type. A hybrid > approach that uses refs/notes/acked/sha/sha which is one lookup if Right, that is possible, but implementing only half of what I suggested above. I think you should be flexible enough to have grouped notes for fast lookup, or ungrouped notes for more flexibility. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 9:38 ` Jeff King @ 2008-12-17 17:06 ` Govind Salinas 2008-12-18 13:54 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Govind Salinas @ 2008-12-17 17:06 UTC (permalink / raw) To: Jeff King; +Cc: Git Mailing List, Johannes Schindelin On Wed, Dec 17, 2008 at 3:38 AM, Jeff King <peff@peff.net> wrote: > On Tue, Dec 16, 2008 at 12:43:55PM -0600, Govind Salinas wrote: > >> I was thinking I would do my first implementation in pyrite and if I find >> that it works well I will port it. > > OK, though your performance will probably suck unless you dump the notes > tree into a local hash at the beginning of your program. And looking up > every commit's note during revision traversal is one of the intended > uses (e.g., decorating git-log output, or filtering commits based on a > particular note). Yes, I was thinking that this is the natural way to do things, save that I would be lazy loading the trees into a cache instead of caching them all up front. This is one of the reasons that I think the fan out will help. > And as Dscho mentioned, most of what you need is already there in C. > You are welcome to implement whatever you want in pyrite, of course, but > there is a desire to have this accessible to the revision traversal > machinery. And that means if you want your version in pyrite to be > compatible with what ends up in git, the data structure design needs to > be suitable for both. Yes, I completely agree that I want it to have the same scheme as what git will use. That is the reason I posted this here. Since no method has been formally accepted (checked into master) I wanted to see if I could nudge things along. I wasn't aware that you and Dscho had a (very similar) plan. Please, if you guys are decided on the format then I can just go off and start working on it. But it sounds like there isn't consensus yet. <snip> >> root/ >> 12/ >> 34567890123456789012345678901234567890/ >> abcdef7890123456789012345678901234567890 >> >> This way all the notes are attached to a tree, so that gc won't >> think they are unreferenced objects. > > But you have lost the ordering in your list, then, since they will not > be ordered by sha1 of the note contents. I don't know if you care. The > second sha1 is pointless, anyway, since nobody will know that number as > a reference; why not just name them monotonically starting at 1? In a later mail I suggested that this be the type or name of the note. Which I hear is similar to what you suggested. > One of the things I don't like about having several notes is that it > introduces an extra level of indirection that every user has to pay for, > whether they want it or not. If a note can be a blob _or_ a tree, then > those who want to use blobs can reap the performance benefit. Those who > want multiple named notes in a hierarchy can pay the extra indirection > cost. > > I haven't measured how big a cost that is (but bearing in mind that we > might want to do this lookup once per revision in a traversal, even one > extra object lookup can have an impact). That seems reasonable. > I'm also still not convinced the fan-out is worthwhile, but I can see > how it might be. It would be nice to see numbers for both. > <snip> > Also, how large do you expect the list to be under reasonable >> circumstances. > > As many notes as there are commits is my goal (e.g., it is not hard to > imagine an automated process to add notes on build status). Ideally, we > could handle as many notes as there are objects; I see no reason not to > allow annotating arbitrary sha1's (I don't know if there is a use for > that, but the more scalable the implementation, the better). Ah, that is in line with what I was thinking as well. >> > Some thoughts from me on naming issues: >> > http://article.gmane.org/gmane.comp.version-control.git/100402 >> >> On naming. I strongly support a ref/notes/sha1/sha1 approach. If >> having a type to the note is important, then perhaps the first line of >> a note could be considered a type or a set of "tags". This way you > > I don't think we are talking about the same thing. What I mean by naming > is "here is a shorthand for referring to notes" that is not necessarily > coupled with the implementation. That is, I would like to do something > like: > > git log --notes-filter="foo:bar == 1" > > and have that "foo:bar" as a shorthand on each commit for: > > refs/notes/foo:$COMMIT/bar > > Without a left-hand side (e.g., "bar"), we get: > > refs/notes/default:$COMMIT/bar > > Or without a right-hand side (e.g., "foo:"), we get: > > refs/notes/foo:$COMMIT > I like the overall plan, but I would suggest that --notes[=default] and --note-type=whatever would be a little friendlier and less error prone. Thanks for helping me think through this. -Govind ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Git Notes idea. 2008-12-17 17:06 ` Govind Salinas @ 2008-12-18 13:54 ` Jeff King 0 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2008-12-18 13:54 UTC (permalink / raw) To: Govind Salinas; +Cc: Git Mailing List, Johannes Schindelin On Wed, Dec 17, 2008 at 11:06:15AM -0600, Govind Salinas wrote: > Yes, I was thinking that this is the natural way to do things, save that I > would be lazy loading the trees into a cache instead of caching them > all up front. This is one of the reasons that I think the fan out will > help. I was working under the assumption that you are going to do multiple note lookups. If you are, then the fan-out isn't really going to help, as you're going to end up pulling in all of the subtrees anyway. It helps some if you're only doing a single lookup, but I don't know if that is measurable. > Yes, I completely agree that I want it to have the same scheme as what > git will use. That is the reason I posted this here. Since no method > has been formally accepted (checked into master) I wanted to see if > I could nudge things along. I wasn't aware that you and Dscho had > a (very similar) plan. Please, if you guys are decided on the format > then I can just go off and start working on it. But it sounds like there > isn't consensus yet. This is probably not the answer you want, but I think the final design depends on some C experiments. For example, whether or not there should be fan-out depends on how it affects performance, which means we need to do at least partial implementations to compare. So it really is just waiting for somebody to sit down and do it. > I like the overall plan, but I would suggest that --notes[=default] and > --note-type=whatever would be a little friendlier and less error prone. But keeping it as a single string means there is no ambiguity when you specify multiple notes at once. For example: git log \ --note-filter='test:status == "fail" && importance > 3' \ --pretty=format:%h%n%N(test:errors) would do something like: foreach commit $C compare refs/notes/test:$C/status against the string "fail" compare refs/notes/default:$C/importance against the number 3 if either don't match, skip the commit show the hash and the contents of refs/notes/test:$C/errors and obviously that filter language is totally made up and we may or may not want to do something that complex. But my point is that we are defining a namespace of notes, and we want to be able to refer to a multiple fully qualified names. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* rebasing commits that have notes, was Re: Git Notes idea. 2008-12-16 8:51 ` Jeff King 2008-12-16 8:53 ` Jeff King 2008-12-16 18:43 ` Govind Salinas @ 2008-12-17 0:12 ` Johannes Schindelin 2008-12-17 9:15 ` Johan Herland 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin 3 siblings, 1 reply; 41+ messages in thread From: Johannes Schindelin @ 2008-12-17 0:12 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List Hi, On Tue, 16 Dec 2008, Jeff King wrote: > Some discussion of the interaction of notes and rebase: > http://thread.gmane.org/gmane.comp.version-control.git/100533 Oh, I misinterpreted that label... of course you can track rebases in notes, but some issue that we did not look into yet (I think) is the issue that you can cherry-pick and rebase commits and lose notes in the process. It seems that the notes idea is not that unintrusive as I thought... Ciao, Dscho ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: rebasing commits that have notes, was Re: Git Notes idea. 2008-12-17 0:12 ` rebasing commits that have notes, was " Johannes Schindelin @ 2008-12-17 9:15 ` Johan Herland 2008-12-17 17:55 ` Stephan Beyer 0 siblings, 1 reply; 41+ messages in thread From: Johan Herland @ 2008-12-17 9:15 UTC (permalink / raw) To: git; +Cc: Johannes Schindelin, Jeff King, Govind Salinas On Wednesday 17 December 2008, Johannes Schindelin wrote: > Hi, > > On Tue, 16 Dec 2008, Jeff King wrote: > > Some discussion of the interaction of notes and rebase: > > http://thread.gmane.org/gmane.comp.version-control.git/100533 > > Oh, I misinterpreted that label... of course you can track rebases in > notes, but some issue that we did not look into yet (I think) is the > issue that you can cherry-pick and rebase commits and lose notes in the > process. > > It seems that the notes idea is not that unintrusive as I thought... So we have two issues here: 1. Using notes to annotate the rebase/cherry-pick action itself. 2. Preserving (or at least giving the user the option of preserving) notes across a rebase/cherry-pick. I think issue #1 has already been discussed, and is largely resolved: People can do this if they want to; it probably only makes sense when rebasing/cherry-picking public branches; etc... AFAICS there are no remaining problems here that needs an intrusive solution (see below for one such unintrusive alternative). Issue #2, however, is a little more involved. We can discuss the merits of wanting to preserve notes across a rebase/cherry-pick itself; e.g. when it makes sense to preserve notes, and when it doesn't make sense, but I think this is orthogonal to the issue of HOW to preserve them, so instead of focusing on WHY, I'll focus on HOW: If notes are named according to the "refs/notes:<first byte in hex>/<rest of bytes>/<referenced object SHA-1>" scheme (and AFAICS this is still being discussed, so it's indeed a big IF), then rebase/cherry-pick of the referenced object simply translates to a rename/copy of the corresponding note (this is of course assuming that the note itself does not contain the SHA-1 of the referenced object). This could probably be solved fairly unintrusively in the current code, but there are (as always) complications: - The user may want to amend the note after the rebase/cherry-pick (just as (s)he may want to amend the commit message). - In some cases it may even make sense to fold (parts of) the note _into_ the commit message. - probably more reasons... So what about the following proposal: Add hooks that are invoked by rebase/cherry-pick with the <from-SHA1> and <to-SHA1> as arguments. A typical hook script can then use this information to look up notes referencing <from-SHA1> and update these to reference <to-SHA1> instead, and in the process, prompt the user to do whatever changes (s)he wants to. The hook scripts can do other things as well, e.g. implementing issue #1 above (adding notes for annotating the rebase/cherry-pick itself.) Have fun! :) ...Johan PS: What's the current status on git-sequencer? It's probably the best place to invoke these hooks. -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: rebasing commits that have notes, was Re: Git Notes idea. 2008-12-17 9:15 ` Johan Herland @ 2008-12-17 17:55 ` Stephan Beyer 0 siblings, 0 replies; 41+ messages in thread From: Stephan Beyer @ 2008-12-17 17:55 UTC (permalink / raw) To: Johan Herland; +Cc: git, Johannes Schindelin, Jeff King, Govind Salinas Hi, Johan Herland wrote: > PS: What's the current status on git-sequencer? Usable, but I still have a small todo list of things to add or fix. (And the next days some more time again. I hope it's enough. Sorry, I'm too lame.) > It's probably the best place to invoke these hooks. If you want to do stuff on top of git sequencer it's currently the best to fetch git://repo.or.cz/git/sbeyer.git seq-builtin-dev or just to wait. :-\ Regards, Stephan -- Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 0/4] Notes reloaded 2008-12-16 8:51 ` Jeff King ` (2 preceding siblings ...) 2008-12-17 0:12 ` rebasing commits that have notes, was " Johannes Schindelin @ 2008-12-19 23:34 ` Johannes Schindelin 2008-12-19 23:35 ` [PATCH 1/4] Introduce commit notes Johannes Schindelin ` (3 more replies) 3 siblings, 4 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-19 23:34 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List Hi, On Tue, 16 Dec 2008, Jeff King wrote: > Johannes Schindelin's notes proposal (which is more or less > the current proposal, but I think the on-disk notes index was not > well liked): > http://thread.gmane.org/gmane.comp.version-control.git/52598 I redid the benchmark (this time with a bit beefier machine), just comparing no notes with David's/Peff's idea: -- snip -- $ GIT_NOTES_TIMING_TESTS=1 sh t3302-notes-index-expensive.sh -i -v Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/.git/ * expecting success: create_repo 10 Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10/.git/ * ok 1: setup 10 * expecting success: test_notes 10 * ok 2: notes work * expecting success: time_notes 100 no-notes 0.08user 0.10system 0:00.18elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+58926minor)pagefaults 0swaps notes 0.14user 0.07system 0:00.54elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60319minor)pagefaults 0swaps * ok 3: notes timing * expecting success: create_repo 100 Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/100/.git/ * ok 1: setup 100 * expecting success: test_notes 100 * ok 2: notes work * expecting success: time_notes 100 no-notes 0.23user 0.21system 0:00.45elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+68043minor)pagefaults 0swaps notes 0.38user 0.21system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+78829minor)pagefaults 0swaps * ok 3: notes timing * expecting success: create_repo 1000 Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/1000/.git/ * ok 1: setup 1000 * expecting success: test_notes 1000 * ok 2: notes work * expecting success: time_notes 100 no-notes 2.06user 0.95system 0:04.26elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+159115minor)pagefaults 0swaps notes 2.83user 1.54system 0:04.38elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+267416minor)pagefaults 0swaps * ok 3: notes timing * expecting success: create_repo 10000 Initialized empty Git repository in /home/gitte/git/t/trash directory.t3302-notes-index-expensive/10000/.git/ * ok 1: setup 10000 * expecting success: test_notes 10000 * ok 2: notes work * expecting success: time_notes 100 no-notes 20.46user 7.63system 0:28.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1083378minor)pagefaults 0swaps notes 28.78user 13.74system 0:42.85elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2240296minor)pagefaults 0swaps * ok 3: notes timing * passed all 0 test(s) -- snap -- Keep in mind that the tests run "git log" 99 times, and show the accumulated time. So it seems that an increase of roughly 40% in the user time, and roughly 70% in the system time is the price to have notes associated with every single commit. Note that in that very same repository, a single "git show" goes from 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+561minor)pagefaults 0swaps to this: 0.03user 0.02system 0:00.04elapsed 113%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2294minor)pagefaults 0swaps (In another run, it only used 90%CPU) That's not too shabby, given that Git needs to unpack double the number of objects in this test when using notes vs. no notes. For comparison, the numbers back then were something like 10% in user time with a penalty of an extraordinary magnitude everytime the notes are updated: around 800%. Note: all these numbers are worst-case numbers, i.e. every commit has one note. To be frank, I do not completely understand why the numbers are that high. I would have understood an increase roughly 4 seconds for reading the quite large tree 99 times, and then the same ~0.20 seconds back then. Maybe I made a huge mistake when implementing the thing. And BTW, my code does not yet handle the case when refs/notes/commits:$commit is a tree instead of a blob. That is left as an exercise to the reader. Johannes Schindelin (4): Introduce commit notes Add a script to edit/inspect notes Speed up git notes lookup Add an expensive test for git-notes .gitignore | 1 + Documentation/config.txt | 15 ++++ Documentation/git-notes.txt | 46 +++++++++++ Makefile | 3 + cache.h | 3 + command-list.txt | 1 + commit.c | 1 + config.c | 5 + environment.c | 1 + git-notes.sh | 65 +++++++++++++++ notes.c | 159 ++++++++++++++++++++++++++++++++++++++ notes.h | 7 ++ pretty.c | 5 + t/t3301-notes.sh | 65 +++++++++++++++ t/t3302-notes-index-expensive.sh | 98 +++++++++++++++++++++++ 15 files changed, 475 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-notes.txt create mode 100755 git-notes.sh create mode 100644 notes.c create mode 100644 notes.h create mode 100755 t/t3301-notes.sh create mode 100755 t/t3302-notes-index-expensive.sh ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 1/4] Introduce commit notes 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin @ 2008-12-19 23:35 ` Johannes Schindelin 2008-12-20 6:53 ` Jeff King 2008-12-19 23:35 ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin ` (2 subsequent siblings) 3 siblings, 1 reply; 41+ messages in thread From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List Commit notes are blobs which are shown together with the commit message. These blobs are taken from the notes ref, which you can configure by the config variable core.notesRef, which in turn can be overridden by the environment variable GIT_NOTES_REF. The notes ref is a branch which contains "files" whose names are the names of the corresponding commits (i.e. the SHA-1). The rationale for putting this information into a ref is this: we want to be able to fetch and possibly union-merge the notes, maybe even look at the date when a note was introduced, and we want to store them efficiently together with the other objects. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Documentation/config.txt | 15 ++++++++++ Makefile | 2 + cache.h | 3 ++ commit.c | 1 + config.c | 5 +++ environment.c | 1 + notes.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++ notes.h | 7 +++++ pretty.c | 5 +++ 9 files changed, 107 insertions(+), 0 deletions(-) create mode 100644 notes.c create mode 100644 notes.h diff --git a/Documentation/config.txt b/Documentation/config.txt index 4089362..3248524 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -431,6 +431,21 @@ core.inithook:: linkgit:git-init[1]. The hook is called with the argument "reinit" if an existing repository is re-initialized. +core.notesRef:: + When showing commit messages, also show notes which are stored in + the given ref. This ref is expected to contain paths of the form + ??/*, where the directory name consists of the first two + characters of the commit name, and the base name consists of + the remaining 38 characters. ++ +If such a path exists in the given ref, the referenced blob is read, and +appended to the commit message, separated by a "Notes:" line. If the +given ref itself does not exist, it is not an error, but means that no +notes should be print. ++ +This setting defaults to "refs/notes/commits", and can be overridden by +the `GIT_NOTES_REF` environment variable. + alias.*:: Command aliases for the linkgit:git[1] command wrapper - e.g. after defining "alias.last = cat-file commit HEAD", the invocation diff --git a/Makefile b/Makefile index 5e293fe..7871f52 100644 --- a/Makefile +++ b/Makefile @@ -371,6 +371,7 @@ LIB_H += ll-merge.h LIB_H += log-tree.h LIB_H += mailmap.h LIB_H += merge-recursive.h +LIB_H += notes.h LIB_H += object.h LIB_H += pack.h LIB_H += pack-refs.h @@ -454,6 +455,7 @@ LIB_OBJS += match-trees.o LIB_OBJS += merge-file.o LIB_OBJS += merge-recursive.o LIB_OBJS += name-hash.o +LIB_OBJS += notes.o LIB_OBJS += object.o LIB_OBJS += pack-check.o LIB_OBJS += pack-refs.o diff --git a/cache.h b/cache.h index b393c2d..30981de 100644 --- a/cache.h +++ b/cache.h @@ -375,6 +375,8 @@ static inline enum object_type object_type(unsigned int mode) #define GITATTRIBUTES_FILE ".gitattributes" #define INFOATTRIBUTES_FILE "info/attributes" #define ATTRIBUTE_MACRO_PREFIX "[attr]" +#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF" +#define GIT_NOTES_DEFAULT_REF "refs/notes/commits" extern int is_bare_repository_cfg; extern int is_bare_repository(void); @@ -548,6 +550,7 @@ enum rebase_setup_type { extern enum branch_track git_branch_track; extern enum rebase_setup_type autorebase; extern int keep_hard_links; +extern char *notes_ref_name; #define GIT_REPO_VERSION 0 extern int repository_format_version; diff --git a/commit.c b/commit.c index 00f4774..547b88f 100644 --- a/commit.c +++ b/commit.c @@ -5,6 +5,7 @@ #include "utf8.h" #include "diff.h" #include "revision.h" +#include "notes.h" int save_commit_buffer = 1; diff --git a/config.c b/config.c index 8ff2b4b..9ab9d21 100644 --- a/config.c +++ b/config.c @@ -469,6 +469,11 @@ static int git_default_core_config(const char *var, const char *value) return 0; } + if (!strcmp(var, "core.notesref")) { + notes_ref_name = xstrdup(value); + return 0; + } + if (!strcmp(var, "core.pager")) return git_config_string(&pager_program, var, value); diff --git a/environment.c b/environment.c index fc91809..5724ad2 100644 --- a/environment.c +++ b/environment.c @@ -46,6 +46,7 @@ int keep_hard_links = 0; /* Parallel index stat data preload? */ int core_preload_index = 0; +char *notes_ref_name; /* This is set by setup_git_dir_gently() and/or git_default_config() */ char *git_work_tree_cfg; diff --git a/notes.c b/notes.c new file mode 100644 index 0000000..91ec77f --- /dev/null +++ b/notes.c @@ -0,0 +1,68 @@ +#include "cache.h" +#include "commit.h" +#include "notes.h" +#include "refs.h" +#include "utf8.h" +#include "strbuf.h" + +static int initialized; + +void get_commit_notes(const struct commit *commit, struct strbuf *sb, + const char *output_encoding) +{ + static const char *utf8 = "utf-8"; + struct strbuf name = STRBUF_INIT; + const char *hex; + unsigned char sha1[20]; + char *msg; + unsigned long msgoffset, msglen; + enum object_type type; + + if (!initialized) { + const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT); + if (env) + notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT); + else if (!notes_ref_name) + notes_ref_name = GIT_NOTES_DEFAULT_REF; + if (notes_ref_name && read_ref(notes_ref_name, sha1)) + notes_ref_name = NULL; + initialized = 1; + } + + if (!notes_ref_name) + return; + + strbuf_addf(&name, "%s:%s", notes_ref_name, + sha1_to_hex(commit->object.sha1)); + if (get_sha1(name.buf, sha1)) + return; + + if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen || + type != OBJ_BLOB) + return; + + if (output_encoding && *output_encoding && + strcmp(utf8, output_encoding)) { + char *reencoded = reencode_string(msg, output_encoding, utf8); + if (reencoded) { + free(msg); + msg = reencoded; + msglen = strlen(msg); + } + } + + /* we will end the annotation by a newline anyway */ + if (msglen && msg[msglen - 1] == '\n') + msglen--; + + strbuf_addstr(sb, "\nNotes:\n"); + + for (msgoffset = 0; msgoffset < msglen;) { + int linelen = strchrnul(msg, '\n') - msg; + + strbuf_addstr(sb, " "); + strbuf_add(sb, msg + msgoffset, linelen); + msgoffset += linelen; + } + free(msg); +} diff --git a/notes.h b/notes.h new file mode 100644 index 0000000..79d21b6 --- /dev/null +++ b/notes.h @@ -0,0 +1,7 @@ +#ifndef NOTES_H +#define NOTES_H + +void get_commit_notes(const struct commit *commit, struct strbuf *sb, + const char *output_encoding); + +#endif diff --git a/pretty.c b/pretty.c index 5f9a0c7..c2bf451 100644 --- a/pretty.c +++ b/pretty.c @@ -6,6 +6,7 @@ #include "string-list.h" #include "mailmap.h" #include "log-tree.h" +#include "notes.h" static char *user_format; @@ -911,5 +912,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit, */ if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body) strbuf_addch(sb, '\n'); + + if (fmt != CMIT_FMT_ONELINE) + get_commit_notes(commit, sb, encoding); + free(reencoded); } -- 1.6.1.rc3.368.g63acc ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-19 23:35 ` [PATCH 1/4] Introduce commit notes Johannes Schindelin @ 2008-12-20 6:53 ` Jeff King 2008-12-20 7:55 ` Robin Rosenberg 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin 0 siblings, 2 replies; 41+ messages in thread From: Jeff King @ 2008-12-20 6:53 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Govind Salinas, Git Mailing List On Sat, Dec 20, 2008 at 12:35:06AM +0100, Johannes Schindelin wrote: > Commit notes are blobs which are shown together with the commit > message. These blobs are taken from the notes ref, which you can > configure by the config variable core.notesRef, which in turn can > be overridden by the environment variable GIT_NOTES_REF. Hmm. I wanted to try some performance comparisons based on this implementation, but I can't get your 1/4 to apply. Conflicts in config.txt and cache.h when applying to master, and "sha1 information is lacking or useless" for a 3-way merge. What did you base this on? -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-20 6:53 ` Jeff King @ 2008-12-20 7:55 ` Robin Rosenberg 2008-12-20 8:05 ` Jeff King 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin 1 sibling, 1 reply; 41+ messages in thread From: Robin Rosenberg @ 2008-12-20 7:55 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Govind Salinas, Git Mailing List lördag 20 december 2008 07:53:38 skrev Jeff King: > On Sat, Dec 20, 2008 at 12:35:06AM +0100, Johannes Schindelin wrote: > > > Commit notes are blobs which are shown together with the commit > > message. These blobs are taken from the notes ref, which you can > > configure by the config variable core.notesRef, which in turn can > > be overridden by the environment variable GIT_NOTES_REF. > > Hmm. I wanted to try some performance comparisons based on this > implementation, but I can't get your 1/4 to apply. Conflicts in > config.txt and cache.h when applying to master, and "sha1 information is > lacking or useless" for a 3-way merge. What did you base this on? patch(1) however can crunch it, with the exception of cache.h. Shouldn't git am/appy and patch agree on git generated patches (without binary diffs)? -- robin ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-20 7:55 ` Robin Rosenberg @ 2008-12-20 8:05 ` Jeff King 2008-12-20 8:17 ` Junio C Hamano 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2008-12-20 8:05 UTC (permalink / raw) To: Robin Rosenberg; +Cc: Johannes Schindelin, Govind Salinas, Git Mailing List On Sat, Dec 20, 2008 at 08:55:14AM +0100, Robin Rosenberg wrote: > > Hmm. I wanted to try some performance comparisons based on this > > implementation, but I can't get your 1/4 to apply. Conflicts in > > config.txt and cache.h when applying to master, and "sha1 information is > > lacking or useless" for a 3-way merge. What did you base this on? > > patch(1) however can crunch it, with the exception of cache.h. Shouldn't > git am/appy and patch agree on git generated patches (without binary diffs)? No. git apply is intentionally much more strict about applying under the assumption that it is better to force a conflict than to silently apply something that has a reasonable chance of being completely wrong. And usually it is not a big deal because falling back to the 3-way merge is a much nicer way of handling any conflicts _anyway_ (I find .rej files so much more useless than conflict markers, personally). In this case I was able to: 1. git am /the/patch 2. patch -p1 <.git/rebase-apply/patch 3. manually inspect the results for sanity, and fix up the cache.h bit that failed totally 4. git add -u && git add notes.[ch] 5. git am --resolved -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-20 8:05 ` Jeff King @ 2008-12-20 8:17 ` Junio C Hamano 2008-12-20 8:23 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Junio C Hamano @ 2008-12-20 8:17 UTC (permalink / raw) To: Jeff King Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List Jeff King <peff@peff.net> writes: > No. git apply is intentionally much more strict about applying under the > assumption that it is better to force a conflict than to silently apply > something that has a reasonable chance of being completely wrong. > > And usually it is not a big deal because falling back to the 3-way merge > is a much nicer way of handling any conflicts _anyway_ (I find .rej > files so much more useless than conflict markers, personally). > > In this case I was able to: > > 1. git am /the/patch > 2. patch -p1 <.git/rebase-apply/patch > 3. manually inspect the results for sanity, and fix up the cache.h > bit that failed totally > 4. git add -u && git add notes.[ch] > 5. git am --resolved I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and run "git am" instead of step 5. Why was the patch, that was based on something that is clearly different from what other people work on, sent to the list in the first place? IOW, what good does it do to show your patch if other people (plural) need to spend a lot of time and head-scratching? ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-20 8:17 ` Junio C Hamano @ 2008-12-20 8:23 ` Jeff King 2008-12-20 20:09 ` Junio C Hamano 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2008-12-20 8:23 UTC (permalink / raw) To: Junio C Hamano Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List On Sat, Dec 20, 2008 at 12:17:46AM -0800, Junio C Hamano wrote: > > 1. git am /the/patch > > 2. patch -p1 <.git/rebase-apply/patch > > 3. manually inspect the results for sanity, and fix up the cache.h > > bit that failed totally > > 4. git add -u && git add notes.[ch] > > 5. git am --resolved > > I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and > run "git am" instead of step 5. How do you track down the source of the conflict to do the patch fixup? In this case, it was a context line in the patch that had been deleted in my version. Do you just find the appropriate chunk in what you have already and visually compare? -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/4] Introduce commit notes 2008-12-20 8:23 ` Jeff King @ 2008-12-20 20:09 ` Junio C Hamano 0 siblings, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2008-12-20 20:09 UTC (permalink / raw) To: Jeff King Cc: Robin Rosenberg, Johannes Schindelin, Govind Salinas, Git Mailing List Jeff King <peff@peff.net> writes: > On Sat, Dec 20, 2008 at 12:17:46AM -0800, Junio C Hamano wrote: > >> > 1. git am /the/patch >> > 2. patch -p1 <.git/rebase-apply/patch >> > 3. manually inspect the results for sanity, and fix up the cache.h >> > bit that failed totally >> > 4. git add -u && git add notes.[ch] >> > 5. git am --resolved >> >> I usually skip 2-4 and edit .git/rebase-apply/patch in place instead, and >> run "git am" instead of step 5. > > How do you track down the source of the conflict to do the patch fixup? Old fashioned way, by looking at the patch and the file that the patch is supposed to apply and reading the contexts. ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 0/4] Notes, reloaded 2008-12-20 6:53 ` Jeff King 2008-12-20 7:55 ` Robin Rosenberg @ 2008-12-20 12:04 ` Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin ` (3 more replies) 1 sibling, 4 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 12:04 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List My apologies; I forgot to rebase the series from my personal fork to Junio's next. This is the version that should apply cleanly. Johannes Schindelin (4): Introduce commit notes Add a script to edit/inspect notes Speed up git notes lookup Add an expensive test for git-notes .gitignore | 1 + Documentation/config.txt | 15 ++++ Documentation/git-notes.txt | 46 +++++++++++ Makefile | 3 + cache.h | 3 + command-list.txt | 1 + commit.c | 1 + config.c | 5 + environment.c | 1 + git-notes.sh | 65 +++++++++++++++ notes.c | 159 ++++++++++++++++++++++++++++++++++++++ notes.h | 7 ++ pretty.c | 5 + t/t3301-notes.sh | 65 +++++++++++++++ t/t3302-notes-index-expensive.sh | 98 +++++++++++++++++++++++ 15 files changed, 475 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-notes.txt create mode 100755 git-notes.sh create mode 100644 notes.c create mode 100644 notes.h create mode 100755 t/t3301-notes.sh create mode 100755 t/t3302-notes-index-expensive.sh ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 1/4] Introduce commit notes 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin @ 2008-12-20 12:05 ` Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin ` (2 subsequent siblings) 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List Commit notes are blobs which are shown together with the commit message. These blobs are taken from the notes ref, which you can configure by the config variable core.notesRef, which in turn can be overridden by the environment variable GIT_NOTES_REF. The notes ref is a branch which contains "files" whose names are the names of the corresponding commits (i.e. the SHA-1). The rationale for putting this information into a ref is this: we want to be able to fetch and possibly union-merge the notes, maybe even look at the date when a note was introduced, and we want to store them efficiently together with the other objects. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Documentation/config.txt | 15 ++++++++++ Makefile | 2 + cache.h | 3 ++ commit.c | 1 + config.c | 5 +++ environment.c | 1 + notes.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++ notes.h | 7 +++++ pretty.c | 5 +++ 9 files changed, 107 insertions(+), 0 deletions(-) create mode 100644 notes.c create mode 100644 notes.h diff --git a/Documentation/config.txt b/Documentation/config.txt index 21ea165..b35a32a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -422,6 +422,21 @@ relatively high IO latencies. With this set to 'true', git will do the index comparison to the filesystem data in parallel, allowing overlapping IO's. +core.notesRef:: + When showing commit messages, also show notes which are stored in + the given ref. This ref is expected to contain paths of the form + ??/*, where the directory name consists of the first two + characters of the commit name, and the base name consists of + the remaining 38 characters. ++ +If such a path exists in the given ref, the referenced blob is read, and +appended to the commit message, separated by a "Notes:" line. If the +given ref itself does not exist, it is not an error, but means that no +notes should be print. ++ +This setting defaults to "refs/notes/commits", and can be overridden by +the `GIT_NOTES_REF` environment variable. + alias.*:: Command aliases for the linkgit:git[1] command wrapper - e.g. after defining "alias.last = cat-file commit HEAD", the invocation diff --git a/Makefile b/Makefile index 5fcff9a..dc0a324 100644 --- a/Makefile +++ b/Makefile @@ -370,6 +370,7 @@ LIB_H += ll-merge.h LIB_H += log-tree.h LIB_H += mailmap.h LIB_H += merge-recursive.h +LIB_H += notes.h LIB_H += object.h LIB_H += pack.h LIB_H += pack-refs.h @@ -451,6 +452,7 @@ LIB_OBJS += match-trees.o LIB_OBJS += merge-file.o LIB_OBJS += merge-recursive.o LIB_OBJS += name-hash.o +LIB_OBJS += notes.o LIB_OBJS += object.o LIB_OBJS += pack-check.o LIB_OBJS += pack-refs.o diff --git a/cache.h b/cache.h index 9dabcc7..9192853 100644 --- a/cache.h +++ b/cache.h @@ -375,6 +375,8 @@ static inline enum object_type object_type(unsigned int mode) #define GITATTRIBUTES_FILE ".gitattributes" #define INFOATTRIBUTES_FILE "info/attributes" #define ATTRIBUTE_MACRO_PREFIX "[attr]" +#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF" +#define GIT_NOTES_DEFAULT_REF "refs/notes/commits" extern int is_bare_repository_cfg; extern int is_bare_repository(void); @@ -546,6 +548,7 @@ enum rebase_setup_type { extern enum branch_track git_branch_track; extern enum rebase_setup_type autorebase; +extern char *notes_ref_name; #define GIT_REPO_VERSION 0 extern int repository_format_version; diff --git a/commit.c b/commit.c index c99db16..10e532a 100644 --- a/commit.c +++ b/commit.c @@ -5,6 +5,7 @@ #include "utf8.h" #include "diff.h" #include "revision.h" +#include "notes.h" int save_commit_buffer = 1; diff --git a/config.c b/config.c index 790405a..e5d5b4b 100644 --- a/config.c +++ b/config.c @@ -469,6 +469,11 @@ static int git_default_core_config(const char *var, const char *value) return 0; } + if (!strcmp(var, "core.notesref")) { + notes_ref_name = xstrdup(value); + return 0; + } + if (!strcmp(var, "core.pager")) return git_config_string(&pager_program, var, value); diff --git a/environment.c b/environment.c index e278bce..0edae21 100644 --- a/environment.c +++ b/environment.c @@ -45,6 +45,7 @@ enum rebase_setup_type autorebase = AUTOREBASE_NEVER; /* Parallel index stat data preload? */ int core_preload_index = 0; +char *notes_ref_name; /* This is set by setup_git_dir_gently() and/or git_default_config() */ char *git_work_tree_cfg; diff --git a/notes.c b/notes.c new file mode 100644 index 0000000..91ec77f --- /dev/null +++ b/notes.c @@ -0,0 +1,68 @@ +#include "cache.h" +#include "commit.h" +#include "notes.h" +#include "refs.h" +#include "utf8.h" +#include "strbuf.h" + +static int initialized; + +void get_commit_notes(const struct commit *commit, struct strbuf *sb, + const char *output_encoding) +{ + static const char *utf8 = "utf-8"; + struct strbuf name = STRBUF_INIT; + const char *hex; + unsigned char sha1[20]; + char *msg; + unsigned long msgoffset, msglen; + enum object_type type; + + if (!initialized) { + const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT); + if (env) + notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT); + else if (!notes_ref_name) + notes_ref_name = GIT_NOTES_DEFAULT_REF; + if (notes_ref_name && read_ref(notes_ref_name, sha1)) + notes_ref_name = NULL; + initialized = 1; + } + + if (!notes_ref_name) + return; + + strbuf_addf(&name, "%s:%s", notes_ref_name, + sha1_to_hex(commit->object.sha1)); + if (get_sha1(name.buf, sha1)) + return; + + if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen || + type != OBJ_BLOB) + return; + + if (output_encoding && *output_encoding && + strcmp(utf8, output_encoding)) { + char *reencoded = reencode_string(msg, output_encoding, utf8); + if (reencoded) { + free(msg); + msg = reencoded; + msglen = strlen(msg); + } + } + + /* we will end the annotation by a newline anyway */ + if (msglen && msg[msglen - 1] == '\n') + msglen--; + + strbuf_addstr(sb, "\nNotes:\n"); + + for (msgoffset = 0; msgoffset < msglen;) { + int linelen = strchrnul(msg, '\n') - msg; + + strbuf_addstr(sb, " "); + strbuf_add(sb, msg + msgoffset, linelen); + msgoffset += linelen; + } + free(msg); +} diff --git a/notes.h b/notes.h new file mode 100644 index 0000000..79d21b6 --- /dev/null +++ b/notes.h @@ -0,0 +1,7 @@ +#ifndef NOTES_H +#define NOTES_H + +void get_commit_notes(const struct commit *commit, struct strbuf *sb, + const char *output_encoding); + +#endif diff --git a/pretty.c b/pretty.c index f6ff312..2d2872f 100644 --- a/pretty.c +++ b/pretty.c @@ -6,6 +6,7 @@ #include "string-list.h" #include "mailmap.h" #include "log-tree.h" +#include "notes.h" static char *user_format; @@ -881,5 +882,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit, */ if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body) strbuf_addch(sb, '\n'); + + if (fmt != CMIT_FMT_ONELINE) + get_commit_notes(commit, sb, encoding); + free(reencoded); } -- 1.6.1.rc3.412.ga72b ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 2/4] Add a script to edit/inspect notes 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin @ 2008-12-20 12:05 ` Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin 2008-12-20 12:06 ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List The script 'git notes' allows you to edit and show commit notes, by calling either git notes show <commit> or git notes edit <commit> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- .gitignore | 1 + Documentation/git-notes.txt | 46 ++++++++++++++++++++++++++++++ Makefile | 1 + command-list.txt | 1 + git-notes.sh | 65 +++++++++++++++++++++++++++++++++++++++++++ t/t3301-notes.sh | 65 +++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 179 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-notes.txt create mode 100755 git-notes.sh create mode 100755 t/t3301-notes.sh diff --git a/.gitignore b/.gitignore index 327e660..ac5fbf4 100644 --- a/.gitignore +++ b/.gitignore @@ -82,6 +82,7 @@ git-mktag git-mktree git-name-rev git-mv +git-notes git-pack-redundant git-pack-objects git-pack-refs diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt new file mode 100644 index 0000000..3d93625 --- /dev/null +++ b/Documentation/git-notes.txt @@ -0,0 +1,46 @@ +git-notes(1) +============ + +NAME +---- +git-notes - Add/inspect commit notes + +SYNOPSIS +-------- +[verse] +'git-notes' (edit | show) [commit] + +DESCRIPTION +----------- +This command allows you to add notes to commit messages, without +changing the commit. To discern these notes from the message stored +in the commit object, the notes are indented like the message, after +an unindented line saying "Notes:". + +To disable commit notes, you have to set the config variable +core.notesRef to the empty string. Alternatively, you can set it +to a different ref, something like "refs/notes/bugzilla". This setting +can be overridden by the environment variable "GIT_NOTES_REF". + + +SUBCOMMANDS +----------- + +edit:: + Edit the notes for a given commit (defaults to HEAD). + +show:: + Show the notes for a given commit (defaults to HEAD). + + +Author +------ +Written by Johannes Schindelin <johannes.schindelin@gmx.de> + +Documentation +------------- +Documentation by Johannes Schindelin + +GIT +--- +Part of the gitlink:git[7] suite diff --git a/Makefile b/Makefile index dc0a324..5082f0f 100644 --- a/Makefile +++ b/Makefile @@ -258,6 +258,7 @@ SCRIPT_SH += git-merge-octopus.sh SCRIPT_SH += git-merge-one-file.sh SCRIPT_SH += git-merge-resolve.sh SCRIPT_SH += git-mergetool.sh +SCRIPT_SH += git-notes.sh SCRIPT_SH += git-parse-remote.sh SCRIPT_SH += git-pull.sh SCRIPT_SH += git-quiltimport.sh diff --git a/command-list.txt b/command-list.txt index 3583a33..2dc2c33 100644 --- a/command-list.txt +++ b/command-list.txt @@ -73,6 +73,7 @@ git-mktag plumbingmanipulators git-mktree plumbingmanipulators git-mv mainporcelain common git-name-rev plumbinginterrogators +git-notes mainporcelain git-pack-objects plumbingmanipulators git-pack-redundant plumbinginterrogators git-pack-refs ancillarymanipulators diff --git a/git-notes.sh b/git-notes.sh new file mode 100755 index 0000000..bfdbaa8 --- /dev/null +++ b/git-notes.sh @@ -0,0 +1,65 @@ +#!/bin/sh + +USAGE="(edit | show) [commit]" +. git-sh-setup + +test -n "$3" && usage + +test -z "$1" && usage +ACTION="$1"; shift + +test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)" +test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits" + +COMMIT=$(git rev-parse --verify --default HEAD "$@") || +die "Invalid commit: $@" + +MESSAGE="$GIT_DIR"/new-notes-$COMMIT +trap ' + test -f "$MESSAGE" && rm "$MESSAGE" +' 0 + +case "$ACTION" in +edit) + GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE" + + GIT_INDEX_FILE="$MESSAGE".idx + export GIT_INDEX_FILE + + CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ') + if [ -z "$CURRENT_HEAD" ]; then + PARENT= + else + PARENT="-p $CURRENT_HEAD" + git read-tree "$GIT_NOTES_REF" || die "Could not read index" + git cat-file blob :$COMMIT >> "$MESSAGE" 2> /dev/null + fi + + ${VISUAL:-${EDITOR:-vi}} "$MESSAGE" + + grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed + mv "$MESSAGE".processed "$MESSAGE" + if [ -s "$MESSAGE" ]; then + BLOB=$(git hash-object -w "$MESSAGE") || + die "Could not write into object database" + git update-index --add --cacheinfo 0644 $BLOB $COMMIT || + die "Could not write index" + else + test -z "$CURRENT_HEAD" && + die "Will not initialise with empty tree" + git update-index --force-remove $COMMIT || + die "Could not update index" + fi + + TREE=$(git write-tree) || die "Could not write tree" + NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) || + die "Could not annotate" + git update-ref -m "Annotate $COMMIT" \ + "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD +;; +show) + git show "$GIT_NOTES_REF":$COMMIT +;; +*) + usage +esac diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh new file mode 100755 index 0000000..ba42c45 --- /dev/null +++ b/t/t3301-notes.sh @@ -0,0 +1,65 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test commit notes' + +. ./test-lib.sh + +cat > fake_editor.sh << \EOF +echo "$MSG" > "$1" +echo "$MSG" >& 2 +EOF +chmod a+x fake_editor.sh +VISUAL=./fake_editor.sh +export VISUAL + +test_expect_success 'cannot annotate non-existing HEAD' ' + ! MSG=3 git notes edit +' + +test_expect_success setup ' + : > a1 && + git add a1 && + test_tick && + git commit -m 1st && + : > a2 && + git add a2 && + test_tick && + git commit -m 2nd +' + +test_expect_success 'need valid notes ref' ' + ! MSG=1 GIT_NOTES_REF='/' git notes edit && + ! MSG=2 GIT_NOTES_REF='/' git notes show +' + +test_expect_success 'create notes' ' + git config core.notesRef refs/notes/commits && + MSG=b1 git notes edit && + test ! -f .git/new-notes && + test 1 = $(git ls-tree refs/notes/commits | wc -l) && + test b1 = $(git notes show) && + git show HEAD^ && + ! git notes show HEAD^ +' + +cat > expect << EOF +commit 268048bfb8a1fb38e703baceb8ab235421bf80c5 +Author: A U Thor <author@example.com> +Date: Thu Apr 7 15:14:13 2005 -0700 + + 2nd + +Notes: + b1 +EOF + +test_expect_success 'show notes' ' + ! (git cat-file commit HEAD | grep b1) && + git log -1 > output && + git diff expect output +' + +test_done -- 1.6.1.rc3.412.ga72b ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 3/4] Speed up git notes lookup 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin @ 2008-12-20 12:05 ` Johannes Schindelin 2008-12-20 12:06 ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 12:05 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List To avoid looking up each and every commit in the notes ref's tree object, which is very expensive, speed things up by slurping the tree object's contents into a hash_map. The idea fo the hashmap singleton is from David Reiss, initial benchmarking by Jeff King. Note: the implementation allows for arbitrary entries in the notes tree object, ignoring those that do not reference a valid object. This allows you to annotate arbitrary branches, or objects. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- notes.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 102 insertions(+), 11 deletions(-) diff --git a/notes.c b/notes.c index 91ec77f..68bcb24 100644 --- a/notes.c +++ b/notes.c @@ -4,16 +4,112 @@ #include "refs.h" #include "utf8.h" #include "strbuf.h" +#include "tree-walk.h" + +struct entry { + unsigned char commit_sha1[20]; + unsigned char notes_sha1[20]; +}; + +struct hash_map { + struct entry *entries; + off_t count, size; +}; static int initialized; +static struct hash_map hash_map; + +static int hash_index(struct hash_map *map, const unsigned char *sha1) +{ + int i = ((*(unsigned int *)sha1) % map->size); + + for (;;) { + unsigned char *current = map->entries[i].commit_sha1; + + if (!hashcmp(sha1, current)) + return i; + + if (is_null_sha1(current)) + return -1 - i; + + if (++i == map->size) + i = 0; + } +} + +static void add_entry(const unsigned char *commit_sha1, + const unsigned char *notes_sha1) +{ + int index; + + if (hash_map.count + 1 > hash_map.size >> 1) { + int i, old_size = hash_map.size; + struct entry *old = hash_map.entries; + + hash_map.size = old_size ? old_size << 1 : 64; + hash_map.entries = (struct entry *) + xcalloc(sizeof(struct entry), hash_map.size); + + for (i = 0; i < old_size; i++) + if (!is_null_sha1(old[i].commit_sha1)) { + index = -1 - hash_index(&hash_map, + old[i].commit_sha1); + memcpy(hash_map.entries + index, old + i, + sizeof(struct entry)); + } + free(old); + } + + index = hash_index(&hash_map, commit_sha1); + if (index < 0) { + index = -1 - index; + hash_map.count++; + } + + hashcpy(hash_map.entries[index].commit_sha1, commit_sha1); + hashcpy(hash_map.entries[index].notes_sha1, notes_sha1); +} + +static void initialize_hash_map(const char *notes_ref_name) +{ + unsigned char sha1[20], commit_sha1[20]; + unsigned *mode; + struct tree_desc desc; + struct name_entry entry; + void *buf; + + if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) || + get_tree_entry(commit_sha1, "", sha1, mode)) + return; + + buf = fill_tree_descriptor(&desc, sha1); + if (!buf) + die ("Could not read %s for notes-index", sha1_to_hex(sha1)); + + while (tree_entry(&desc, &entry)) + if (!get_sha1(entry.path, commit_sha1)) + add_entry(commit_sha1, entry.sha1); + free(buf); +} + +static unsigned char *lookup_notes(const unsigned char *commit_sha1) +{ + int index; + + if (!hash_map.size) + return NULL; + + index = hash_index(&hash_map, commit_sha1); + if (index < 0) + return NULL; + return hash_map.entries[index].notes_sha1; +} void get_commit_notes(const struct commit *commit, struct strbuf *sb, const char *output_encoding) { static const char *utf8 = "utf-8"; - struct strbuf name = STRBUF_INIT; - const char *hex; - unsigned char sha1[20]; + unsigned char *sha1; char *msg; unsigned long msgoffset, msglen; enum object_type type; @@ -24,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb, notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT); else if (!notes_ref_name) notes_ref_name = GIT_NOTES_DEFAULT_REF; - if (notes_ref_name && read_ref(notes_ref_name, sha1)) - notes_ref_name = NULL; + initialize_hash_map(notes_ref_name); initialized = 1; } - if (!notes_ref_name) - return; - - strbuf_addf(&name, "%s:%s", notes_ref_name, - sha1_to_hex(commit->object.sha1)); - if (get_sha1(name.buf, sha1)) + sha1 = lookup_notes(commit->object.sha1); + if (!sha1) return; if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen || -- 1.6.1.rc3.412.ga72b ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 4/4] Add an expensive test for git-notes 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin ` (2 preceding siblings ...) 2008-12-20 12:05 ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin @ 2008-12-20 12:06 ` Johannes Schindelin 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 12:06 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List git-notes have the potential of being pretty expensive, so test with a lot of commits. A lot. So to make things cheaper, you have to opt-in explicitely, by setting the environment variable GIT_NOTES_TIMING_TESTS. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- t/t3302-notes-index-expensive.sh | 98 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 98 insertions(+), 0 deletions(-) create mode 100755 t/t3302-notes-index-expensive.sh diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh new file mode 100755 index 0000000..00d27bf --- /dev/null +++ b/t/t3302-notes-index-expensive.sh @@ -0,0 +1,98 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test commit notes index (expensive!)' + +. ./test-lib.sh + +test -z "$GIT_NOTES_TIMING_TESTS" && { + say Skipping timing tests + test_done + exit +} + +create_repo () { + number_of_commits=$1 + nr=0 + parent= + test -d .git || { + git init && + tree=$(git write-tree) && + while [ $nr -lt $number_of_commits ]; do + test_tick && + commit=$(echo $nr | git commit-tree $tree $parent) || + return + parent="-p $commit" + nr=$(($nr+1)) + done && + git update-ref refs/heads/master $commit && + { + export GIT_INDEX_FILE=.git/temp; + git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" | + while read nr sha1; do + blob=$(echo note $nr | git hash-object -w --stdin) && + echo $sha1 | sed "s/^/0644 $blob 0 /" + done | git update-index --index-info && + tree=$(git write-tree) && + test_tick && + commit=$(echo notes | git commit-tree $tree) && + git update-ref refs/notes/commits $commit + } && + git config core.notesRef refs/notes/commits + } +} + +test_notes () { + count=$1 && + git config core.notesRef refs/notes/commits && + git log | grep "^ " > output && + i=1 && + while [ $i -le $count ]; do + echo " $(($count-$i))" && + echo " note $i" && + i=$(($i+1)); + done > expect && + git diff expect output +} + +cat > time_notes << \EOF + mode=$1 + i=1 + while [ $i -lt $2 ]; do + case $1 in + no-notes) + export GIT_NOTES_REF=non-existing + ;; + notes) + unset GIT_NOTES_REF + ;; + esac + git log >/dev/null + i=$(($i+1)) + done +EOF + +time_notes () { + for mode in no-notes notes + do + echo $mode + /usr/bin/time sh ../time_notes $mode $1 + done +} + +for count in 10 100 1000 10000; do + + mkdir $count + (cd $count; + + test_expect_success "setup $count" "create_repo $count" + + test_expect_success 'notes work' "test_notes $count" + + test_expect_success 'notes timing' "time_notes 100" + ) +done + +test_done -- 1.6.1.rc3.412.ga72b ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 2/4] Add a script to edit/inspect notes 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin 2008-12-19 23:35 ` [PATCH 1/4] Introduce commit notes Johannes Schindelin @ 2008-12-19 23:35 ` Johannes Schindelin 2008-12-19 23:35 ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin 2008-12-19 23:37 ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List The script 'git notes' allows you to edit and show commit notes, by calling either git notes show <commit> or git notes edit <commit> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- .gitignore | 1 + Documentation/git-notes.txt | 46 ++++++++++++++++++++++++++++++ Makefile | 1 + command-list.txt | 1 + git-notes.sh | 65 +++++++++++++++++++++++++++++++++++++++++++ t/t3301-notes.sh | 65 +++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 179 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-notes.txt create mode 100755 git-notes.sh create mode 100755 t/t3301-notes.sh diff --git a/.gitignore b/.gitignore index 90bbb2a..8b90af1 100644 --- a/.gitignore +++ b/.gitignore @@ -83,6 +83,7 @@ git-mktag git-mktree git-name-rev git-mv +git-notes git-pack-redundant git-pack-objects git-pack-refs diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt new file mode 100644 index 0000000..3d93625 --- /dev/null +++ b/Documentation/git-notes.txt @@ -0,0 +1,46 @@ +git-notes(1) +============ + +NAME +---- +git-notes - Add/inspect commit notes + +SYNOPSIS +-------- +[verse] +'git-notes' (edit | show) [commit] + +DESCRIPTION +----------- +This command allows you to add notes to commit messages, without +changing the commit. To discern these notes from the message stored +in the commit object, the notes are indented like the message, after +an unindented line saying "Notes:". + +To disable commit notes, you have to set the config variable +core.notesRef to the empty string. Alternatively, you can set it +to a different ref, something like "refs/notes/bugzilla". This setting +can be overridden by the environment variable "GIT_NOTES_REF". + + +SUBCOMMANDS +----------- + +edit:: + Edit the notes for a given commit (defaults to HEAD). + +show:: + Show the notes for a given commit (defaults to HEAD). + + +Author +------ +Written by Johannes Schindelin <johannes.schindelin@gmx.de> + +Documentation +------------- +Documentation by Johannes Schindelin + +GIT +--- +Part of the gitlink:git[7] suite diff --git a/Makefile b/Makefile index 7871f52..4bdc86e 100644 --- a/Makefile +++ b/Makefile @@ -259,6 +259,7 @@ SCRIPT_SH += git-merge-octopus.sh SCRIPT_SH += git-merge-one-file.sh SCRIPT_SH += git-merge-resolve.sh SCRIPT_SH += git-mergetool.sh +SCRIPT_SH += git-notes.sh SCRIPT_SH += git-parse-remote.sh SCRIPT_SH += git-pull.sh SCRIPT_SH += git-quiltimport.sh diff --git a/command-list.txt b/command-list.txt index 3583a33..2dc2c33 100644 --- a/command-list.txt +++ b/command-list.txt @@ -73,6 +73,7 @@ git-mktag plumbingmanipulators git-mktree plumbingmanipulators git-mv mainporcelain common git-name-rev plumbinginterrogators +git-notes mainporcelain git-pack-objects plumbingmanipulators git-pack-redundant plumbinginterrogators git-pack-refs ancillarymanipulators diff --git a/git-notes.sh b/git-notes.sh new file mode 100755 index 0000000..bfdbaa8 --- /dev/null +++ b/git-notes.sh @@ -0,0 +1,65 @@ +#!/bin/sh + +USAGE="(edit | show) [commit]" +. git-sh-setup + +test -n "$3" && usage + +test -z "$1" && usage +ACTION="$1"; shift + +test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)" +test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits" + +COMMIT=$(git rev-parse --verify --default HEAD "$@") || +die "Invalid commit: $@" + +MESSAGE="$GIT_DIR"/new-notes-$COMMIT +trap ' + test -f "$MESSAGE" && rm "$MESSAGE" +' 0 + +case "$ACTION" in +edit) + GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE" + + GIT_INDEX_FILE="$MESSAGE".idx + export GIT_INDEX_FILE + + CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ') + if [ -z "$CURRENT_HEAD" ]; then + PARENT= + else + PARENT="-p $CURRENT_HEAD" + git read-tree "$GIT_NOTES_REF" || die "Could not read index" + git cat-file blob :$COMMIT >> "$MESSAGE" 2> /dev/null + fi + + ${VISUAL:-${EDITOR:-vi}} "$MESSAGE" + + grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed + mv "$MESSAGE".processed "$MESSAGE" + if [ -s "$MESSAGE" ]; then + BLOB=$(git hash-object -w "$MESSAGE") || + die "Could not write into object database" + git update-index --add --cacheinfo 0644 $BLOB $COMMIT || + die "Could not write index" + else + test -z "$CURRENT_HEAD" && + die "Will not initialise with empty tree" + git update-index --force-remove $COMMIT || + die "Could not update index" + fi + + TREE=$(git write-tree) || die "Could not write tree" + NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) || + die "Could not annotate" + git update-ref -m "Annotate $COMMIT" \ + "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD +;; +show) + git show "$GIT_NOTES_REF":$COMMIT +;; +*) + usage +esac diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh new file mode 100755 index 0000000..ba42c45 --- /dev/null +++ b/t/t3301-notes.sh @@ -0,0 +1,65 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test commit notes' + +. ./test-lib.sh + +cat > fake_editor.sh << \EOF +echo "$MSG" > "$1" +echo "$MSG" >& 2 +EOF +chmod a+x fake_editor.sh +VISUAL=./fake_editor.sh +export VISUAL + +test_expect_success 'cannot annotate non-existing HEAD' ' + ! MSG=3 git notes edit +' + +test_expect_success setup ' + : > a1 && + git add a1 && + test_tick && + git commit -m 1st && + : > a2 && + git add a2 && + test_tick && + git commit -m 2nd +' + +test_expect_success 'need valid notes ref' ' + ! MSG=1 GIT_NOTES_REF='/' git notes edit && + ! MSG=2 GIT_NOTES_REF='/' git notes show +' + +test_expect_success 'create notes' ' + git config core.notesRef refs/notes/commits && + MSG=b1 git notes edit && + test ! -f .git/new-notes && + test 1 = $(git ls-tree refs/notes/commits | wc -l) && + test b1 = $(git notes show) && + git show HEAD^ && + ! git notes show HEAD^ +' + +cat > expect << EOF +commit 268048bfb8a1fb38e703baceb8ab235421bf80c5 +Author: A U Thor <author@example.com> +Date: Thu Apr 7 15:14:13 2005 -0700 + + 2nd + +Notes: + b1 +EOF + +test_expect_success 'show notes' ' + ! (git cat-file commit HEAD | grep b1) && + git log -1 > output && + git diff expect output +' + +test_done -- 1.6.1.rc3.368.g63acc ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 3/4] Speed up git notes lookup 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin 2008-12-19 23:35 ` [PATCH 1/4] Introduce commit notes Johannes Schindelin 2008-12-19 23:35 ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin @ 2008-12-19 23:35 ` Johannes Schindelin 2008-12-19 23:37 ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin 3 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-19 23:35 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List To avoid looking up each and every commit in the notes ref's tree object, which is very expensive, speed things up by slurping the tree object's contents into a hash_map. The idea fo the hashmap singleton is from David Reiss, initial benchmarking by Jeff King. Note: the implementation allows for arbitrary entries in the notes tree object, ignoring those that do not reference a valid object. This allows you to annotate arbitrary branches, or objects. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- notes.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 102 insertions(+), 11 deletions(-) diff --git a/notes.c b/notes.c index 91ec77f..68bcb24 100644 --- a/notes.c +++ b/notes.c @@ -4,16 +4,112 @@ #include "refs.h" #include "utf8.h" #include "strbuf.h" +#include "tree-walk.h" + +struct entry { + unsigned char commit_sha1[20]; + unsigned char notes_sha1[20]; +}; + +struct hash_map { + struct entry *entries; + off_t count, size; +}; static int initialized; +static struct hash_map hash_map; + +static int hash_index(struct hash_map *map, const unsigned char *sha1) +{ + int i = ((*(unsigned int *)sha1) % map->size); + + for (;;) { + unsigned char *current = map->entries[i].commit_sha1; + + if (!hashcmp(sha1, current)) + return i; + + if (is_null_sha1(current)) + return -1 - i; + + if (++i == map->size) + i = 0; + } +} + +static void add_entry(const unsigned char *commit_sha1, + const unsigned char *notes_sha1) +{ + int index; + + if (hash_map.count + 1 > hash_map.size >> 1) { + int i, old_size = hash_map.size; + struct entry *old = hash_map.entries; + + hash_map.size = old_size ? old_size << 1 : 64; + hash_map.entries = (struct entry *) + xcalloc(sizeof(struct entry), hash_map.size); + + for (i = 0; i < old_size; i++) + if (!is_null_sha1(old[i].commit_sha1)) { + index = -1 - hash_index(&hash_map, + old[i].commit_sha1); + memcpy(hash_map.entries + index, old + i, + sizeof(struct entry)); + } + free(old); + } + + index = hash_index(&hash_map, commit_sha1); + if (index < 0) { + index = -1 - index; + hash_map.count++; + } + + hashcpy(hash_map.entries[index].commit_sha1, commit_sha1); + hashcpy(hash_map.entries[index].notes_sha1, notes_sha1); +} + +static void initialize_hash_map(const char *notes_ref_name) +{ + unsigned char sha1[20], commit_sha1[20]; + unsigned *mode; + struct tree_desc desc; + struct name_entry entry; + void *buf; + + if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) || + get_tree_entry(commit_sha1, "", sha1, mode)) + return; + + buf = fill_tree_descriptor(&desc, sha1); + if (!buf) + die ("Could not read %s for notes-index", sha1_to_hex(sha1)); + + while (tree_entry(&desc, &entry)) + if (!get_sha1(entry.path, commit_sha1)) + add_entry(commit_sha1, entry.sha1); + free(buf); +} + +static unsigned char *lookup_notes(const unsigned char *commit_sha1) +{ + int index; + + if (!hash_map.size) + return NULL; + + index = hash_index(&hash_map, commit_sha1); + if (index < 0) + return NULL; + return hash_map.entries[index].notes_sha1; +} void get_commit_notes(const struct commit *commit, struct strbuf *sb, const char *output_encoding) { static const char *utf8 = "utf-8"; - struct strbuf name = STRBUF_INIT; - const char *hex; - unsigned char sha1[20]; + unsigned char *sha1; char *msg; unsigned long msgoffset, msglen; enum object_type type; @@ -24,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb, notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT); else if (!notes_ref_name) notes_ref_name = GIT_NOTES_DEFAULT_REF; - if (notes_ref_name && read_ref(notes_ref_name, sha1)) - notes_ref_name = NULL; + initialize_hash_map(notes_ref_name); initialized = 1; } - if (!notes_ref_name) - return; - - strbuf_addf(&name, "%s:%s", notes_ref_name, - sha1_to_hex(commit->object.sha1)); - if (get_sha1(name.buf, sha1)) + sha1 = lookup_notes(commit->object.sha1); + if (!sha1) return; if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen || -- 1.6.1.rc3.368.g63acc ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 4/4] Add an expensive test for git-notes 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin ` (2 preceding siblings ...) 2008-12-19 23:35 ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin @ 2008-12-19 23:37 ` Johannes Schindelin 2008-12-19 23:49 ` Boyd Stephen Smith Jr. 3 siblings, 1 reply; 41+ messages in thread From: Johannes Schindelin @ 2008-12-19 23:37 UTC (permalink / raw) To: Jeff King; +Cc: Govind Salinas, Git Mailing List git-notes have the potential of being pretty expensive, so test with a lot of commits. A lot. So to make things cheaper, you have to opt-in explicitely, by setting the environment variable GIT_NOTES_TIMING_TESTS. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- I would appreciate other people running the tests, and maybe profiling the code. However, I will not be really online the next two weeks, so if you feel like working on this series, go ahead. Merry Christmas. t/t3302-notes-index-expensive.sh | 98 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 98 insertions(+), 0 deletions(-) create mode 100755 t/t3302-notes-index-expensive.sh diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh new file mode 100755 index 0000000..00d27bf --- /dev/null +++ b/t/t3302-notes-index-expensive.sh @@ -0,0 +1,98 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test commit notes index (expensive!)' + +. ./test-lib.sh + +test -z "$GIT_NOTES_TIMING_TESTS" && { + say Skipping timing tests + test_done + exit +} + +create_repo () { + number_of_commits=$1 + nr=0 + parent= + test -d .git || { + git init && + tree=$(git write-tree) && + while [ $nr -lt $number_of_commits ]; do + test_tick && + commit=$(echo $nr | git commit-tree $tree $parent) || + return + parent="-p $commit" + nr=$(($nr+1)) + done && + git update-ref refs/heads/master $commit && + { + export GIT_INDEX_FILE=.git/temp; + git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" | + while read nr sha1; do + blob=$(echo note $nr | git hash-object -w --stdin) && + echo $sha1 | sed "s/^/0644 $blob 0 /" + done | git update-index --index-info && + tree=$(git write-tree) && + test_tick && + commit=$(echo notes | git commit-tree $tree) && + git update-ref refs/notes/commits $commit + } && + git config core.notesRef refs/notes/commits + } +} + +test_notes () { + count=$1 && + git config core.notesRef refs/notes/commits && + git log | grep "^ " > output && + i=1 && + while [ $i -le $count ]; do + echo " $(($count-$i))" && + echo " note $i" && + i=$(($i+1)); + done > expect && + git diff expect output +} + +cat > time_notes << \EOF + mode=$1 + i=1 + while [ $i -lt $2 ]; do + case $1 in + no-notes) + export GIT_NOTES_REF=non-existing + ;; + notes) + unset GIT_NOTES_REF + ;; + esac + git log >/dev/null + i=$(($i+1)) + done +EOF + +time_notes () { + for mode in no-notes notes + do + echo $mode + /usr/bin/time sh ../time_notes $mode $1 + done +} + +for count in 10 100 1000 10000; do + + mkdir $count + (cd $count; + + test_expect_success "setup $count" "create_repo $count" + + test_expect_success 'notes work' "test_notes $count" + + test_expect_success 'notes timing' "time_notes 100" + ) +done + +test_done -- 1.6.1.rc3.368.g63acc ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 4/4] Add an expensive test for git-notes 2008-12-19 23:37 ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin @ 2008-12-19 23:49 ` Boyd Stephen Smith Jr. 2008-12-20 11:51 ` Johannes Schindelin 0 siblings, 1 reply; 41+ messages in thread From: Boyd Stephen Smith Jr. @ 2008-12-19 23:49 UTC (permalink / raw) To: Git Mailing List; +Cc: Johannes Schindelin, Jeff King, Govind Salinas [-- Attachment #1: Type: text/plain, Size: 452 bytes --] On Friday 2008 December 19 17:37:15 Johannes Schindelin wrote: > + test_expect_success 'notes timing' "time_notes 100" ^^^ Probably should be ${count}. -- Boyd Stephen Smith Jr. ,= ,-_-. =. bss@iguanasuicide.net ((_/)o o(\_)) ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-' http://iguanasuicide.net/ \_/ [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/4] Add an expensive test for git-notes 2008-12-19 23:49 ` Boyd Stephen Smith Jr. @ 2008-12-20 11:51 ` Johannes Schindelin 0 siblings, 0 replies; 41+ messages in thread From: Johannes Schindelin @ 2008-12-20 11:51 UTC (permalink / raw) To: Boyd Stephen Smith Jr.; +Cc: Git Mailing List, Jeff King, Govind Salinas [-- Attachment #1: Type: TEXT/PLAIN, Size: 437 bytes --] Hi, On Fri, 19 Dec 2008, Boyd Stephen Smith Jr. wrote: > On Friday 2008 December 19 17:37:15 Johannes Schindelin wrote: > > + test_expect_success 'notes timing' "time_notes 100" > ^^^ > Probably should be ${count}. No. It times a git log 100 times (actually, 99 times due to a thinko). This is only to protect against jitter, otherwise I'd do it only once. Hth, Dscho ^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2008-12-20 20:11 UTC | newest] Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-12-16 8:15 Git Notes idea Govind Salinas 2008-12-16 8:51 ` Jeff King 2008-12-16 8:53 ` Jeff King 2008-12-16 18:43 ` Govind Salinas 2008-12-16 23:48 ` Johannes Schindelin 2008-12-17 9:45 ` Jeff King [not found] ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com> [not found] ` <alpine.DEB.1.00.0812170420560.14632@racer> 2008-12-17 10:11 ` Jeff King 2008-12-17 11:38 ` Johannes Schindelin 2008-12-17 19:20 ` Junio C Hamano 2008-12-18 3:08 ` Johannes Schindelin 2008-12-19 17:42 ` Govind Salinas 2008-12-19 17:18 ` Govind Salinas 2008-12-19 17:38 ` Govind Salinas 2008-12-19 21:25 ` Jeff King 2008-12-19 22:24 ` Govind Salinas 2008-12-20 4:54 ` Jeff King 2008-12-17 12:21 ` Petr Baudis 2008-12-17 9:38 ` Jeff King 2008-12-17 17:06 ` Govind Salinas 2008-12-18 13:54 ` Jeff King 2008-12-17 0:12 ` rebasing commits that have notes, was " Johannes Schindelin 2008-12-17 9:15 ` Johan Herland 2008-12-17 17:55 ` Stephan Beyer 2008-12-19 23:34 ` [PATCH 0/4] Notes reloaded Johannes Schindelin 2008-12-19 23:35 ` [PATCH 1/4] Introduce commit notes Johannes Schindelin 2008-12-20 6:53 ` Jeff King 2008-12-20 7:55 ` Robin Rosenberg 2008-12-20 8:05 ` Jeff King 2008-12-20 8:17 ` Junio C Hamano 2008-12-20 8:23 ` Jeff King 2008-12-20 20:09 ` Junio C Hamano 2008-12-20 12:04 ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin 2008-12-20 12:05 ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin 2008-12-20 12:06 ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin 2008-12-19 23:35 ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin 2008-12-19 23:35 ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin 2008-12-19 23:37 ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin 2008-12-19 23:49 ` Boyd Stephen Smith Jr. 2008-12-20 11:51 ` Johannes Schindelin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.