From: Jeff King <peff@peff.net>
To: Stefan Beller <sbeller@google.com>
Cc: Christian Couder <christian.couder@gmail.com>,
	git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC PATCH 3/5] pack-objects: add delta-islands support
Date: Fri, 27 Jul 2018 09:13:34 -0400
Message-ID: <20180727131333.GC18599@sigill.intra.peff.net>
In-Reply-To: <CAGZ79kZkagveB+jG9iLQ2ohaSfAzY5YtWC=BTdD1o9OQUrw90Q@mail.gmail.com>

On Tue, Jul 24, 2018 at 10:20:05AM -0700, Stefan Beller wrote:

> So in my understanding we have a "common base pack" and specific
> packs on top for each "island".

Sort of. This is another hacky part. The islands themselves are
generally just about forbidding deltas, and not any particular kind of
layering.
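
As a rough illustration of what "forbidding deltas" means here, the
check can be thought of as a subset test: an object may only be stored
as a delta against a base that is present in every island the object
itself belongs to. The sketch below is not code from the patch series;
the function name and the sample island names are made up for the
example.

```python
# Sketch of the island delta rule. All names are illustrative assumptions,
# not git's C API.

def can_use_as_delta_base(target_islands: frozenset, base_islands: frozenset) -> bool:
    """Allow the delta only if the base is in every island the target is in."""
    return target_islands <= base_islands


# Hypothetical island membership, keyed by a made-up object name.
islands_of = {
    "blob_shared": frozenset({"fork-a", "fork-b"}),
    "blob_only_a": frozenset({"fork-a"}),
}

# An object seen only by fork-a may delta against an object both forks have.
ok = can_use_as_delta_base(islands_of["blob_only_a"], islands_of["blob_shared"])

# The reverse is refused: a clone of fork-b would need blob_shared but does
# not have blob_only_a, so the on-disk delta would have to be thrown away.
refused = not can_use_as_delta_base(islands_of["blob_shared"], islands_of["blob_only_a"])
```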

But there's some magic layering only for the "core" island, which gets
to go first (and makes a sort of pseudo-pack at the front of the one
pack). And then everything else is written willy-nilly. This is a hack
to try to make the "blit the pack bytes out" code path for cloning fast.
And that has to pick _one_ winner, so ideally you'd point it at the
thing that gets cloned the most, and everybody else gets to be a loser.
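
The "core goes first" layering can be pictured as a stable partition of
the write order. This is only a sketch under assumed names (write_order,
the sample objects and the island called "core" are hypothetical); the
real write order in pack-objects takes more into account than island
membership.

```python
# Sketch: put objects reachable from the core island at the front of the
# pack, keeping the existing relative order inside each group.

def write_order(objects, islands_of, core_island):
    """Return objects with core-island members first (stable partition)."""
    def in_core(obj):
        return core_island in islands_of.get(obj, frozenset())

    core = [o for o in objects if in_core(o)]
    rest = [o for o in objects if not in_core(o)]
    return core + rest


# Hypothetical objects and island membership.
islands_of = {
    "c1": frozenset({"core", "fork-a"}),
    "x1": frozenset({"fork-a"}),
    "c2": frozenset({"core"}),
    "x2": frozenset({"fork-b"}),
}
order = write_order(["c1", "x1", "c2", "x2"], islands_of, "core")
```

With the objects grouped this way, a clone of the core island can be
served largely from one contiguous run at the start of the pack, which
is what the byte-reuse path wants. Only one island can get that
treatment.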

Again, this was designed for the current pack-reuse code we have
upstream, which we (GitHub) found to be pretty crappy (which I feel
justified in saying as one of the authors). I need to clean up and share
the alternative strategy we ended up with.

> Do you envision to have "groups of islands" (an atoll) for say all
> open source clones of linux.git, such that you can layer the packs?
> You would not just have the base pack + island pack, but have one
> pack that is common to most islands?

So no, we don't really layer in any sane way. If pack-objects were fed
the topological relationships between the forks, in theory we could
create a layered packfile that respects that.

But even that is not quite enough. At the time of forking, you might
imagine that torvalds/linux has the base pack, and then somebody forks
from them and contains all of those objects plus more, and somebody
forks from them, and so on. But that's just a snapshot. Later
torvalds/linux will get a bunch of new objects pushed to it. And some of
its forks will merge those objects, too. But some of them will just rot,
abandoned, as nobody ever touches them again.

So I don't think there's much to be gained by paying attention to the
external forking relationships. We have to discover afresh the
relationships between objects, and which refs (and thus which islands)
point to them.
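
"Discovering afresh" amounts to walking from each ref and marking every
reachable object with that ref's island. A minimal sketch, assuming a
ref layout of refs/virtual/<fork-id>/... and a toy object graph; the
regex, the function name and the sample data are assumptions for
illustration, and the real traversal also covers trees and blobs.

```python
import re
from collections import deque

# Assumed ref layout: the capture group names the island.
ISLAND_RE = re.compile(r"^refs/virtual/([0-9]+)/")


def islands_by_object(refs, parents):
    """Map each object to the set of islands whose refs can reach it.

    refs: ref name -> tip object
    parents: object -> list of objects it points to
    """
    marks = {}
    for ref, tip in refs.items():
        m = ISLAND_RE.match(ref)
        if not m:
            continue  # refs matching no island pattern contribute no marks
        island = m.group(1)
        queue = deque([tip])
        while queue:
            obj = queue.popleft()
            seen = marks.setdefault(obj, set())
            if island in seen:
                continue  # already marked for this island
            seen.add(island)
            queue.extend(parents.get(obj, []))
    return marks


# Two forks sharing a root commit, plus one ref outside any island.
refs = {
    "refs/virtual/1/heads/master": "c2",
    "refs/virtual/2/heads/master": "c3",
    "refs/heads/other": "c9",
}
parents = {"c2": ["c1"], "c3": ["c1"], "c1": []}
marks = islands_by_object(refs, parents)
```

Nothing in this walk knows which fork was created from which; the
island sets fall out of the current refs alone, so an abandoned fork
and an actively merged one are handled the same way.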

One thing I don't think we ever tried was doubling down on the
islandCore concept and making the "root" fork as tightly packed as it
could be (with the assumption that _most_ people grab that). And then
just respect the islands for all the other objects (remember this is an
optimization, so the worst case is somebody asks for an object during a
fetch and we have to throw away its on-disk delta).

That would solve the problem that fetching torvalds/linux from GitHub
yields a bigger pack than fetching it from kernel.org. But again, it's
making that root fork weirdly magical. People who fetch solely from
other forks won't get any benefit (and may even see worse packs).

-Peff
