All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Govind Salinas <govind@sophiasuchtig.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: Git Notes idea.
Date: Wed, 17 Dec 2008 04:38:44 -0500	[thread overview]
Message-ID: <20081217093843.GA18265@coredump.intra.peff.net> (raw)
In-Reply-To: <5d46db230812161043m4a5873a8w4c323d634b639ba0@mail.gmail.com>

On Tue, Dec 16, 2008 at 12:43:55PM -0600, Govind Salinas wrote:

> I was thinking I would do my first implementation in pyrite and if I find
> that it works well I will port it.

OK, though your performance will probably suck unless you dump the notes
tree into a local hash at the beginning of your program. And looking up
every commit's note during revision traversal is one of the intended
uses (e.g., decorating git-log output, or filtering commits based on a
particular note).

And as Dscho mentioned, most of what you need is already there in C.
You are welcome to implement whatever you want in pyrite, of course, but
there is a desire to have this accessible to the revision traversal
machinery. And that means if you want your version in pyrite to be
compatible with what ends up in git, the data structure design needs to
be suitable for both.

> I just read the proposal from Johannes, he seems to want to use a
> similar layout.  However, I would like to modify my proposal slightly
> to make it work better when a gc is run.  I would modify the tree to
> look like this...
> 
> let 1234567890123456789012345678901234567890 be the
> id of the item that is annotated.
> 
> let abcdef7890123456789012345678901234567890 be the
> id of the note to be attached
> 
> root/
>      12/
>          34567890123456789012345678901234567890/
>              abcdef7890123456789012345678901234567890
> 
> This way all the notes are attached to a tree, so that gc won't
> think they are unreferenced objects.

But you have lost the ordering in your list, then, since they will not
be ordered by sha1 of the note contents. I don't know if you care. The
second sha1 is pointless, anyway, since nobody will know that number as
a reference; why not just name them monotonically starting at 1?

One of the things I don't like about having several notes is that it
introduces an extra level of indirection that every user has to pay for,
whether they want it or not. If a note can be a blob _or_ a tree, then
those who want to use blobs can reap the performance benefit. Those who
want multiple named notes in a hierarchy can pay the extra indirection
cost.

I haven't measured how big a cost that is (but bearing in mind that we
might want to do this lookup once per revision in a traversal, even one
extra object lookup can have an impact).

I'm also still not convinced the fan-out is worthwhile, but I can see
how it might be. It would be nice to see numbers for both.

> Perhaps I am missing something, how is it a linear search?.  Since we

I think Johannes explained in detail in another message, but it is a
linear search to look up directly in a tree object. Of course you can
build a hash or a sorted fixed-size list as an index.

> Also, how large do you expect the list to be under reasonable
> circumstances.

As many notes as there are commits is my goal (e.g., it is not hard to
imagine an automated process to add notes on build status). Ideally, we
could handle as many notes as there are objects; I see no reason not to
allow annotating arbitrary sha1's (I don't know if there is a use for
that, but the more scalable the implementation, the better).

> >  Some thoughts from me on naming issues:
> >  http://article.gmane.org/gmane.comp.version-control.git/100402
> 
> On naming.  I strongly support a ref/notes/sha1/sha1 approach.  If
> having a type to the note is important, then perhaps the first line of
> a note could be considered a type or a set of "tags".  This way you

I don't think we are talking about the same thing. What I mean by naming
is "here is a shorthand for referring to notes" that is not necessarily
coupled with the implementation. That is, I would like to do something
like:

  git log --notes-filter="foo:bar == 1"

and have that "foo:bar" as a shorthand on each commit for:

  refs/notes/foo:$COMMIT/bar

Without a left-hand side (e.g., "bar"), we get:

  refs/notes/default:$COMMIT/bar

Or without a right-hand side (e.g., "foo:"), we get:

  refs/notes/foo:$COMMIT

So you can group related notes in the same tree (which gives you fast
lookup if you are looking at multiple ones, since you only have to do
the tree lookup once), or you can keep notes in separate trees (which
means you can distribute some but not others).

I think your "list of notes" proposal on top of that would be for any
note resolution to provide a tree instead of a blob, with sequenced
elements. I.e., foo:bar might have multiple notes, like:

  refs/notes/foo:$COMMIT/bar/1
  refs/notes/foo:$COMMIT/bar/2
  refs/notes/foo:$COMMIT/bar/3

> drawback is that you have to open the blob to see the type.  A hybrid
> approach that uses refs/notes/acked/sha/sha which is one lookup if

Right, that is possible, but implementing only half of what I suggested
above. I think you should be flexible enough to have grouped notes for
fast lookup, or ungrouped notes for more flexibility.

-Peff

  parent reply	other threads:[~2008-12-17  9:40 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-16  8:15 Git Notes idea Govind Salinas
2008-12-16  8:51 ` Jeff King
2008-12-16  8:53   ` Jeff King
2008-12-16 18:43   ` Govind Salinas
2008-12-16 23:48     ` Johannes Schindelin
2008-12-17  9:45       ` Jeff King
     [not found]       ` <5d46db230812161815s1c48af9dwc96a4701fb2a669b@mail.gmail.com>
     [not found]         ` <alpine.DEB.1.00.0812170420560.14632@racer>
2008-12-17 10:11           ` Jeff King
2008-12-17 11:38             ` Johannes Schindelin
2008-12-17 19:20               ` Junio C Hamano
2008-12-18  3:08                 ` Johannes Schindelin
2008-12-19 17:42               ` Govind Salinas
2008-12-19 17:18             ` Govind Salinas
2008-12-19 17:38               ` Govind Salinas
2008-12-19 21:25                 ` Jeff King
2008-12-19 22:24                   ` Govind Salinas
2008-12-20  4:54                     ` Jeff King
2008-12-17 12:21       ` Petr Baudis
2008-12-17  9:38     ` Jeff King [this message]
2008-12-17 17:06       ` Govind Salinas
2008-12-18 13:54         ` Jeff King
2008-12-17  0:12   ` rebasing commits that have notes, was " Johannes Schindelin
2008-12-17  9:15     ` Johan Herland
2008-12-17 17:55       ` Stephan Beyer
2008-12-19 23:34   ` [PATCH 0/4] Notes reloaded Johannes Schindelin
2008-12-19 23:35     ` [PATCH 1/4] Introduce commit notes Johannes Schindelin
2008-12-20  6:53       ` Jeff King
2008-12-20  7:55         ` Robin Rosenberg
2008-12-20  8:05           ` Jeff King
2008-12-20  8:17             ` Junio C Hamano
2008-12-20  8:23               ` Jeff King
2008-12-20 20:09                 ` Junio C Hamano
2008-12-20 12:04         ` [PATCH v2 0/4] Notes, reloaded Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 1/4] Introduce commit notes Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 2/4] Add a script to edit/inspect notes Johannes Schindelin
2008-12-20 12:05           ` [PATCH v2 3/4] Speed up git notes lookup Johannes Schindelin
2008-12-20 12:06           ` [PATCH v2 4/4] Add an expensive test for git-notes Johannes Schindelin
2008-12-19 23:35     ` [PATCH 2/4] Add a script to edit/inspect notes Johannes Schindelin
2008-12-19 23:35     ` [PATCH 3/4] Speed up git notes lookup Johannes Schindelin
2008-12-19 23:37     ` [PATCH 4/4] Add an expensive test for git-notes Johannes Schindelin
2008-12-19 23:49       ` Boyd Stephen Smith Jr.
2008-12-20 11:51         ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081217093843.GA18265@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=govind@sophiasuchtig.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.