All of lore.kernel.org
 help / color / mirror / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Martin Fick <mfick@codeaurora.org>,
	Git List <git@vger.kernel.org>
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Sat, 10 Aug 2013 08:24:39 +0700	[thread overview]
Message-ID: <CACsJy8BZv0nr92foiYpbscpg86awFZVerpeXcz5CuYWeg3guEA@mail.gmail.com> (raw)
In-Reply-To: <20130809221615.GA7160@sigill.intra.peff.net>

On Sat, Aug 10, 2013 at 5:16 AM, Jeff King <peff@peff.net> wrote:
> Another solution could involve not writing the duplicate of Y in the
> first place. The reason we do not store thin-packs on disk is that you
> run into problems with cycles in the delta graph (e.g., A deltas against
> B, which deltas against C, which deltas against A; at one point you had
> a full copy of one object which let you create the cycle, but you later
> deleted it as redundant with the delta, and now you cannot reconstruct
> any of the objects).
>
> You could possibly solve this with cycle detection, though it would be
> complicated (you need to do it not just when getting rid of objects, but
> when sending a pack, to make sure you don't send a cycle of deltas that
> the other end cannot use). You _might_ be able to get by with a kind of
> "two-level" hack: consider your main pack as "group A" and newly pushed
> packs as "group B". Allow storing thin deltas on disk from group B
> against group A, but never the reverse (nor within group B). That makes
> sure you don't have cycles, and it eliminates even more I/O than any
> repacking solution (because you never write the extra copy of Y to disk
> in the first place). But I can think of two problems:
>
>   1. You still want to repack more often than every 300 packs, because
>      having many packs cost both in space, but also in object lookup
>      time (we can do a log(N) search through each pack index, but have
>      to search linearly through the set of indices).
>
>   2. As you accumulate group B packs with new objects, the deltas that
>      people send will tend to be against objects in group B. They are
>      closer to the tip of history, and therefore make better deltas for
>      history built on top.
>
> That's all just off the top of my head. There are probably other flaws,
> too, as I haven't considered it too hard.

Some refinements on this idea

 - We could keep packs in group B ordered as the packs come in. The
new pack can depend on the previous ones.
 - A group index in addition to separate index for each pack would
solve linear search object lookup problem.
-- 
Duy

  reply	other threads:[~2013-08-10  1:25 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06  2:38 [PATCH] git exproll: steps to tackle gc aggression Ramkumar Ramachandra
2013-08-06 12:24 ` Duy Nguyen
2013-08-06 17:39   ` Junio C Hamano
2013-08-07  4:43     ` Ramkumar Ramachandra
2013-08-08  7:13       ` Junio C Hamano
2013-08-08  7:44         ` Ramkumar Ramachandra
2013-08-08 16:56           ` Junio C Hamano
2013-08-08 17:34             ` Martin Fick
2013-08-08 18:52               ` Junio C Hamano
2013-08-08 19:14                 ` Ramkumar Ramachandra
2013-08-08 17:36             ` Ramkumar Ramachandra
2013-08-08 19:37               ` Junio C Hamano
2013-08-08 20:04                 ` Ramkumar Ramachandra
2013-08-08 21:09                   ` Martin Langhoff
2013-08-09 11:00                   ` Jeff King
2013-08-09 13:34                     ` Ramkumar Ramachandra
2013-08-09 17:35                       ` Junio C Hamano
2013-08-09 22:16                       ` Jeff King
2013-08-10  1:24                         ` Duy Nguyen [this message]
2013-08-10  9:50                           ` Jeff King
2013-08-10  5:26                         ` Junio C Hamano
2013-08-10  8:42                         ` Ramkumar Ramachandra
2013-08-10  9:24                           ` Duy Nguyen
2013-08-10  9:28                             ` Duy Nguyen
2013-08-10  9:43                             ` Jeff King
2013-08-10  9:50                               ` Duy Nguyen
2013-08-10 10:05                                 ` Ramkumar Ramachandra
2013-08-10 10:16                                   ` Duy Nguyen
2013-08-10  9:38                           ` Jeff King
2013-08-07  0:10   ` Martin Fick
2013-08-08  2:18     ` Duy Nguyen
2013-08-07  0:25 ` Martin Fick
2013-08-07  4:36   ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8BZv0nr92foiYpbscpg86awFZVerpeXcz5CuYWeg3guEA@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.