All of lore.kernel.org
 help / color / mirror / Atom feed
* Looking up objects that point to other objects
@ 2011-08-26 19:01 Ævar Arnfjörð Bjarmason
  2011-08-26 20:10 ` Jeff King
  0 siblings, 1 reply; 2+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-08-26 19:01 UTC (permalink / raw)
  To: Git Mailing List

Here's a couple of tasks that require brute-force with the Git object
format that I've wanted to do at some point.

 * Associate a blob with trees

   Given a blob sha1 find trees that reference it.

 * Associate trees with commits / other trees.

   Given a tree find which commit points to that tree, or a parent
   tree N levels up the stack that a commit points to.

Has anyone written tools to do this? They'd obviously be very CPU and
I/O intensive, but occasionally I encounter cases where I'd find this
useful, e.g. to find what commit contains this huge blob, or what
trees / commits are involved with a corrupted object.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Looking up objects that point to other objects
  2011-08-26 19:01 Looking up objects that point to other objects Ævar Arnfjörð Bjarmason
@ 2011-08-26 20:10 ` Jeff King
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2011-08-26 20:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Fri, Aug 26, 2011 at 09:01:22PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Here's a couple of tasks that require brute-force with the Git object
> format that I've wanted to do at some point.
> 
>  * Associate a blob with trees
> 
>    Given a blob sha1 find trees that reference it.
> 
>  * Associate trees with commits / other trees.
> 
>    Given a tree find which commit points to that tree, or a parent
>    tree N levels up the stack that a commit points to.

I've more frequently wanted to find the entrance and exit points of a
particular blob in history, and used something like:

  git log --all --no-abbrev -c --raw --format='commit %H' |
  perl -le '
    my @blobs = map { qr/$_/ } @ARGV;
    while(<STDIN>) {
      if (/^commit (.*)/) {
        $commit = $1;
      }
      else {
        foreach my $re (@blobs) {
          next unless /$re/;
          print $commit;
          last;
        }
      }
    }
  ' $blobs

which is fairly efficient. It's brute-force, but at least it all happens
in O(1) processes. It can find blobs in git.git in about a minute or so
on my machine.

I don't think there's a way to ask for all of the trees in a commit in a
single process. It wouldn't be hard to write in C, of course, but it's
not something the current tools support. However, you can use the above
script to narrow the range of commits that you know contain a blob, and
then individually run ls-tree each one. It's better at least than running
ls-tree on every commit in the repo.

Anything that iterates over commits is going to end up seeing the same
trees again and again. I think you could probably do better by thinking
of it like a directed graph problem. Nodes are sha1s, and edges are any
references of interest:

  1. For a commit, make an edge from the commit to its tree.

  2. For a tree, make an edge from the tree to each of its entries.

And then the problem is reduced to "find all commit nodes that have a
path to $blob". Which you can do by breadth-first search from the
commits (or backwards from the blob).

-Peff

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-08-26 20:11 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-26 19:01 Looking up objects that point to other objects Ævar Arnfjörð Bjarmason
2011-08-26 20:10 ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.