More precise tag following

* More precise tag following
@ 2007-01-26 11:07 Junio C Hamano
  2007-01-26 11:53 ` Junio C Hamano
  2007-01-27  8:01 ` Shawn O. Pearce
  0 siblings, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-26 11:07 UTC (permalink / raw)
  To: git

What if (I know, this discussion does not belong here until
1.5.0 final) we had a "reverse" database that keeps track of
what tag references which object, and "git rev-list" knows how
to exploit it?  That is, just like generating a list of objects
that are reachable with --objects option, if we can add a new
option --with-tag very cheaply to list tag objects that would
reach what are in the generated list of objects?

The way current git-fetch "follows" tags is very imprecise,
although it is good enough in practice.  If you happen to
locally have an object that is tagged (and currently we get the
list of non-tag objects that tags eventually refer to in an
out-of-band-ish way), then we fetch the tag and everything it
reaches.  This means if you copied a single commit that is tagged
from somewhere without objects that it refers to, we would end
up fetching beyond that commit to complete it.  Which would not
result in a corrupted repository, but ideally we should not be
fetching the tag in such a case.  And with something like
enhanced rev-list that knows --with-tag it might be possible (I
need to think a bit more about have/want exchange and what
should happen later in fetch-pack and push-pack protocol,
though).

The application of this actually may not be limited to tag
following.  We could define a tag-like objects that attaches to
other objects and enhance its meanings (annotates them) and
treat them the same way as tags for objects traversal and
transfer purposes, so if we were to do this, the option to
exploit reverse database would be called --with-annotation and
not --with-tag.

 - If a single-path following turns out to be too expensive
   (there was a longstanding talk about "git log --single-follow
   $path"; "git blame" also follows a single path although the
   target it follows can fork into two or more when following
   cut&pastes) because we need to explode multi-level trees for
   each commit while traversing commit ancestry, we could define
   an annotation to a commit that lists the set of paths the
   commit touches relative to each of its parents (so the object
   contains N lists of paths), so that pathspec limiting needs
   to open and read only one object to figure out that the trees
   do not have to be opened to skip the commit and/or a merge
   can be simplified.

 - We could define an annotation to a commit that describes what
   fake parents it should have instead of the real ones
   (i.e. grafts implemented in the object database).

Just an idle thought.

^ permalink raw reply	[flat|nested] 92+ messages in thread