git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Nicolas Pitre <nico@cam.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [RFC] Cache negative delta pairs
Date: Thu, 29 Jun 2006 14:53:35 -0400	[thread overview]
Message-ID: <20060629185335.GA6704@coredump.intra.peff.net> (raw)
In-Reply-To: <Pine.LNX.4.64.0606291410420.1213@localhost.localdomain>

On Thu, Jun 29, 2006 at 02:24:57PM -0400, Nicolas Pitre wrote:

> > I assumed the window would change over time (though our total is still
> > likely to hang around N*10 rather than N^2).
> It doesn't change unless you force a different window size.

Sorry, I meant "the items in the window for a given object would change
over time." 

> > This will fail to hit the cache anytime the window changes. How often
> > does the window change? In my test case, I would think anytime I added a
> > bunch of new photos, it would be likely that one of them would make it
> > into the window, thus invalidating the cache entry and forcing me to try
> > against every object in the window (even though I've already tried
> > 9/10).
> Sure.  But on the lot how often will that happen?

Reasonably often, according to my test. I did this to simulate usage
over time:
  - create an empty repo
  - from my test repo of 515 images, grab 20 at a time and add/commit
    them
  - after each commit, record the SHA1 of (object, window[0..n]) for
    each object to be delta'd
If doing the cache on the sha1 of the whole window is a good idea, then
we should see many of the same hashes from commit to commit. If we
don't, that means the newly added files are being placed in the old
windows, thus disrupting their hashes.

The results were that there was typically only 1 reusable window each
time I added 20 files. At that point, caching is largely pointless.

> And even then, since my suggested method implies only one cache lookup 
> in a much smaller cache instead of 10 lookups in a larger cache for each 
> objects it might end up faster overall even if sometimes some windows 
> don't match and deltas are recomputed needlessly.

I didn't benchmark, but I doubt it will have significant impact.
Especially on my photo test repo, the lookups are dominated by the
create_delta time by several orders of magnitude.

> Of course a greater depth might allow for a hit where there isn't any 
> otherwise.  But changing the delta depth is not something someone does 
> that often, and when the depth is changed then you better use -f with 
> git-repack as well which like I said should also ignore the cache.

That sounds reasonable to me for depth. What about other reasons for
try_delta to fail? Preferred base?

> > > So given my GIT repository such a cache would be 7610 * 40 = 304400 
> > > bytes if we stick to the full 40 bytes of sha1 to hash bad combinations.
> > Keep in mind that it will grow every time the window changes.
> What do you mean by window change?

I meant that when the window we use for a given object changes, it will
insert a new cache entry. But if we deal with invalidating unused cache
entries as you suggested before, it won't matter.

-Peff

  reply	other threads:[~2006-06-29 18:53 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20060628223744.GA24421@coredump.intra.peff.net>
2006-06-29  3:09 ` [RFC] Cache negative delta pairs Junio C Hamano
2006-06-29  3:50   ` Jeff King
2006-06-29  3:58   ` Jeff King
2006-06-29  4:30     ` Jeff King
2006-06-29 16:39     ` Nicolas Pitre
2006-06-29 18:07       ` Jeff King
2006-06-29 18:48         ` Nicolas Pitre
2006-06-29 18:58           ` Jeff King
2006-06-29 19:06             ` Nicolas Pitre
2006-06-29  4:09   ` Jeff King
2006-06-29 15:42   ` Nicolas Pitre
2006-06-29 16:35     ` Nicolas Pitre
2006-06-29 18:00     ` Jeff King
2006-06-29 18:24       ` Nicolas Pitre
2006-06-29 18:53         ` Jeff King [this message]
2006-06-29 19:04           ` Nicolas Pitre
2006-06-29 19:52             ` Jeff King
2006-06-29 20:24               ` Nicolas Pitre
2006-06-29 21:04                 ` Linus Torvalds
2006-06-29 21:24                   ` Nicolas Pitre
2006-06-29 21:30                     ` Linus Torvalds
2006-06-29 21:39                       ` Jeff King
2006-06-29 21:43                       ` Joel Becker
2006-06-29 21:47                       ` Nicolas Pitre
2006-06-29 22:12                         ` Junio C Hamano
2006-06-30  3:44                           ` [PATCH] consider previous pack undeltified object state only when reusing delta data Nicolas Pitre
2006-06-30  9:45                             ` Johannes Schindelin
2006-06-30 12:28                               ` Andreas Ericsson
2006-06-30 16:55                                 ` Nicolas Pitre
2006-07-03  8:11                                   ` Andreas Ericsson
2006-06-29 21:35                   ` [RFC] Cache negative delta pairs Jeff King
2006-06-29 21:54                   ` Junio C Hamano
2006-06-29 22:22                   ` Junio C Hamano
2006-06-29 21:26                 ` Junio C Hamano
2006-06-29 21:37                 ` Jeff King
2006-06-29 22:31   ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060629185335.GA6704@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).