[PATCH/RFC 0/6] commit caching

From: Jeff King @ 2013-01-29  9:14 UTC
  To: git; +Cc: Duy Nguyen, Shawn O. Pearce

This is the cleaned-up version of the commit caching patches I mentioned
here:

  http://article.gmane.org/gmane.comp.version-control.git/212329

The basic idea is to generate a cache file that sits alongside a
packfile and stores each commit's timestamp, tree, and parents in a
more compact, easy-to-access format.
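A minimal sketch of what such a cache could look like, assuming
(hypothetically) one fixed-width record per commit, sorted by commit
sha1 so that a lookup is a binary search. Storing the commit's own sha1
plus its tree, two parents, and a 4-byte timestamp is one layout that
comes to 84 bytes per commit. `struct commit_record` and
`find_commit_record` are illustrative names, not the series' actual
file format or API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache record: one per commit, 84 bytes with no padding. */
struct commit_record {
	unsigned char sha1[20];    /* the commit itself (sort key) */
	unsigned char tree[20];    /* its root tree */
	unsigned char parent1[20]; /* first parent, all zeros if none */
	unsigned char parent2[20]; /* second parent, all zeros if none */
	uint32_t timestamp;        /* committer date; host byte order here,
				    * a real on-disk format would fix it */
};

_Static_assert(sizeof(struct commit_record) == 84,
	       "record must be 84 bytes");

/* bsearch comparator: key is a raw 20-byte sha1, elem is a record. */
static int record_cmp(const void *key, const void *elem)
{
	const struct commit_record *r = elem;
	return memcmp(key, r->sha1, 20);
}

/* Return the record for sha1, or NULL if the commit is not cached.
 * records must be sorted by sha1. */
const struct commit_record *find_commit_record(const struct commit_record *records,
					       size_t nr,
					       const unsigned char *sha1)
{
	assert(records != NULL && sha1 != NULL);
	return bsearch(sha1, records, nr, sizeof(*records), record_cmp);
}
```

Commits with more than two parents would need some overflow mechanism,
which this sketch leaves out.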

The timings for this version are roughly similar to what I posted
earlier. Unlike the earlier version, this one keeps the data for a
single commit together for better cache locality (though I don't think
it made a big difference in my tests, since my cold-cache timing test
ends up touching every commit anyway). In short, for an extra 31M of
disk space (~4%), the warm-cache time for "git rev-list --all" drops
from ~4.2s to ~0.66s.

The big thing it does not (yet) do is use offsets to reference sha1s, as
Shawn suggested.  This would potentially drop the on-disk size from 84
bytes to 16 bytes per commit (or about 6M total for linux.git).
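A sketch of the offset-based variant, assuming each sha1 is replaced by
a 32-bit position in a shared table of object names (for example, the
sorted name table a pack .idx file already contains) and that a
record's own position identifies its commit. Under those assumptions a
record is 4 + 3*4 = 16 bytes. The names `struct compact_commit_record`,
`name_at`, `cache_bytes`, and `NO_PARENT` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no such parent". */
#define NO_PARENT UINT32_MAX

/* Hypothetical compact record: each sha1 becomes a 32-bit position in a
 * shared table of object names; the commit itself is identified by the
 * record's own position, so it needs no field of its own. */
struct compact_commit_record {
	uint32_t timestamp;   /* committer date */
	uint32_t tree_pos;    /* position of the root tree's name */
	uint32_t parent1_pos; /* first parent, or NO_PARENT */
	uint32_t parent2_pos; /* second parent, or NO_PARENT */
};

_Static_assert(sizeof(struct compact_commit_record) == 16,
	       "compact record must be 16 bytes");

/* Resolve a position to its 20-byte name in a table of nr_names sha1s.
 * Returns NULL for out-of-range positions, which includes NO_PARENT as
 * long as the table holds fewer than UINT32_MAX names. */
const unsigned char *name_at(const unsigned char *names, uint32_t nr_names,
			     uint32_t pos)
{
	assert(names != NULL);
	return pos < nr_names ? names + (size_t)pos * 20 : NULL;
}

/* Total cache size in bytes for nr commits at record_size bytes each. */
unsigned long long cache_bytes(unsigned long long nr, size_t record_size)
{
	return nr * record_size;
}
```

For example, 31M at 84 bytes per commit is roughly 369,000 commits, and
`cache_bytes(369047, sizeof(struct compact_commit_record))` gives
5,904,752 bytes, which is consistent with the "about 6M" estimate.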

Combined with using compression level 0 for trees (which compress
poorly anyway; storing them uncompressed increases the size by only 2%),
my "git rev-list --objects --all" time drops from ~40s to ~25s.
Profiling with perf shows that most of the remaining time is spent in
lookup_object. I've spent a fair bit of time trying to optimize it, with
no luck; I think it is already fairly close to optimal. The problem is
simply that we call it a very large number of times, since it is the
mechanism by which we recognize that we have already processed a given
sha1.
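The lookup_object cost is easier to see with a simplified model of a
"seen" check: an open-addressing hash table of sha1s that uses the first
four bytes of each (already well-distributed) sha1 as the hash and
resolves collisions by linear probing. This is a generic illustration,
loosely in the spirit of git's object hash table but not its actual
code; `struct seen_set`, `seen_init`, `seen_insert`, and `seen_free` are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of 20-byte sha1s: open addressing, linear probing. */
struct seen_set {
	unsigned char (*slots)[20]; /* slot contents */
	unsigned char *used;        /* 1 if the slot is occupied */
	size_t size;                /* capacity */
	size_t nr;                  /* number of occupied slots */
};

/* The sha1 bytes are already well mixed, so the first 4 bytes suffice. */
static size_t sha1_hash(const unsigned char *sha1, size_t size)
{
	uint32_t h;
	memcpy(&h, sha1, sizeof(h));
	return h % size;
}

void seen_free(struct seen_set *s)
{
	free(s->slots);
	free(s->used);
	s->slots = NULL;
	s->used = NULL;
	s->size = s->nr = 0;
}

int seen_init(struct seen_set *s, size_t size)
{
	assert(size > 1);
	s->slots = calloc(size, sizeof(*s->slots));
	s->used = calloc(size, 1);
	s->size = size;
	s->nr = 0;
	if (!s->slots || !s->used) {
		seen_free(s);
		return -1;
	}
	return 0;
}

/* Return 1 if sha1 was already present, 0 if it was newly inserted. */
int seen_insert(struct seen_set *s, const unsigned char *sha1)
{
	size_t i = sha1_hash(sha1, s->size);
	while (s->used[i]) {
		if (!memcmp(s->slots[i], sha1, 20))
			return 1;
		i = (i + 1) % s->size;
	}
	/* Keep at least one empty slot so probing always terminates. */
	assert(s->nr + 1 < s->size);
	memcpy(s->slots[i], sha1, 20);
	s->used[i] = 1;
	s->nr++;
	return 0;
}
```

Each probe is cheap, but a traversal like "rev-list --objects" performs
roughly one such check per tree entry it walks, so the total cost is
dominated by the number of calls rather than the cost of each one.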

  [1/6]: csum-file: make sha1write const-correct
  [2/6]: strbuf: add string-chomping functions
  [3/6]: introduce pack metadata cache files
  [4/6]: introduce a commit metapack
  [5/6]: add git-metapack command
  [6/6]: commit: look up commit info in metapack

-Peff
