All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC PATCH 0/5] Add delta islands support
Date: Tue, 24 Jul 2018 06:16:28 -0400	[thread overview]
Message-ID: <20180724101627.GC3578@sigill.intra.peff.net> (raw)
In-Reply-To: <20180722054836.28935-1-chriscool@tuxfamily.org>

On Sun, Jul 22, 2018 at 07:48:31AM +0200, Christian Couder wrote:

> I kept Peff as the author and took the liberty to add his
> Signed-off-by to all the patches.

Yes, that's fine. This is a topic I've been meaning to upstream for a
long time, so I'm happy to see it progressing in that direction.

A few reasons I've been hesitant:

 - this produces worse packs for people cloning. The packs we serve at
   GitHub for heavily-forked repositories can be significantly larger
   than what you could get by repacking in isolation.

   I think this is inherent in the scheme (we're losing some delta
   opportunities). But I think it's also made worse because the delta
   window gets clogged with candidates that are forbidden by the island
   config. Repacking with a big --window helps (and doesn't take as long
   as it otherwise might because we can reject some object pairs based
   on islands before doing any computation on the content).

   I also haven't done much modeling on how this scheme progresses over
   time. Imagine users tend to create forks at random points in time and
   then let them get stale. You're going to accumulate a bunch of
   islands that increasingly don't overlap with each other. And as time
   goes on, your delta opportunities get worse and worse.

   Which isn't to say the scheme doesn't work to some degree. It has
   been running in production at GitHub for several years. But it's very
   cobbled together, and I had always hoped to give it some more careful
   thought before sending it upstream. ;) That may be "the perfect is
   the enemy of the good" though (or at least the enemy of the
   mediocre).

 - the whole regex island-naming configuration feels complicated and
   hacky to me. I haven't been able to come up with a better solution,
   though.

 - the whole islandCore thing is it's own weirdness (more below)

> The patches look good to me except perhaps that "pack.islandCore" is
> not documented. Maybe something could be added in
> Documentation/technical/ too, or maybe improving commit messages could
> be enough.

So basically what islandCore does is make one island "special", and we
write all of its objects out first in the packfile. The idea here was
that it would interact well with the pack-reuse code which tries to send
the early part of the pack out verbatim. And if you could pick the fork
which is most-often cloned, then that would be a win (so you'd pick
torvalds/linux.git here).

But:

 - the pack-reuse code we have upstream is crap; it doesn't kick in for
   this instance anyway, because it's too concerned with whether you're
   sending all of the pack. So if you have a 10GB pack and the "core"
   island is the entire first 1GB, you should be able to ship out that
   first GB easily. But the code says "well, that's only 10%!", which is
   not enough to trigger.

   I have replacement code (which we have been running in production)
   that is more clever about the threshold, and also can handle gaps in
   the continuity (so we might realize we need to send objects 1-5000,
   then skip a few, then 5037-8000, and so on). And after that, it's not
   at all clear to me if the islandCore thing is actually still helpful.

 - it's hard to configure, because it may actually change over time
   (whereas all the other island config is basically static and just impacts
   how we consider the refs that are there)

 - there may actually be _multiple_ forks that are "special". In
   torvalds/linux, for example, there's a fair bit of hierarchy, and
   there are people who forked Linus and have their own active
   sub-community. The actual island concept handles this kind of
   layering OK, but the islandCore thing is pretty static.

So I'm not sure if islandCore is even needed or not. But somebody would
have to do some experiments to figure it out (and obviously I should
share the replacement pack-reuse patches).

-Peff

  parent reply	other threads:[~2018-07-24 10:16 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-22  5:48 [RFC PATCH 0/5] Add delta islands support Christian Couder
2018-07-22  5:48 ` [RFC PATCH 1/5] packfile: make get_delta_base() non static Christian Couder
2018-07-24 16:19   ` Junio C Hamano
2018-07-27 11:29     ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 2/5] Add delta-islands.{c,h} Christian Couder
2018-07-22  8:50   ` Duy Nguyen
2018-07-22 13:57     ` Christian Couder
2018-08-05 18:53     ` Christian Couder
2018-08-06 14:17       ` Jeff King
2018-08-06 15:53       ` Duy Nguyen
2018-08-06 18:54         ` Christian Couder
2018-08-06 19:21           ` Duy Nguyen
2018-07-24 16:47   ` Junio C Hamano
2018-07-27 13:02     ` Jeff King
2018-07-27  9:40   ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 3/5] pack-objects: add delta-islands support Christian Couder
2018-07-22  8:55   ` Duy Nguyen
2018-08-05 17:28     ` Christian Couder
2018-07-23 18:52   ` Stefan Beller
2018-07-24  9:58     ` Jeff King
2018-07-24 17:20       ` Stefan Beller
2018-07-27 13:13         ` Jeff King
2018-07-27 17:22           ` Stefan Beller
2018-07-28  9:00             ` Jeff King
2018-07-28 12:12               ` Christian Couder
2018-07-24 17:03   ` Junio C Hamano
2018-07-24 17:11     ` Junio C Hamano
2018-08-05 17:40       ` Christian Couder
2018-08-06  8:44         ` Christian Couder
2018-08-06 13:58           ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 4/5] repack: " Christian Couder
2018-07-22  5:48 ` [RFC PATCH 5/5] t: add t9930-delta-islands.sh Christian Couder
2018-07-24 10:24   ` Jeff King
2018-07-24 10:16 ` Jeff King [this message]
2018-07-24 17:18   ` [RFC PATCH 0/5] Add delta islands support Junio C Hamano
2018-07-24 21:14     ` Jeff King
2018-07-26 13:34 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724101627.GC3578@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.