From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Thu, 11 Feb 2016 16:46:53 -0500
Message-ID: <20160211214653.GA835@sigill.intra.peff.net>
In-Reply-To: <xmqq4mdfvspl.fsf@gitster.mtv.corp.google.com>

On Thu, Feb 11, 2016 at 01:32:22PM -0800, Junio C Hamano wrote:

> > ... One
> > alternative would be to amend the bundle format so that rather than a
> > single file, you get a bundle header whose end says "...and my matching
> > packfile is 1234-abcd". And then the client knows that they can fetch
> > that separately from the same source.
> 
> I would imagine that we would introduce bundle v3 format for this.

Yeah, I think so. And in fact, the "here are my packfiles..." bit should
probably be in the v3 header.
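
To make the idea concrete, here is a rough sketch of how a client might
read such a header. The prerequisite and ref lines follow the existing
v2 layout; the "pack <name>" line is purely an assumption for
illustration, since nothing in this thread settles on a syntax, and the
class and function names are made up as well.

```python
from dataclasses import dataclass, field


@dataclass
class SplitBundleHeader:
    prerequisites: list = field(default_factory=list)  # oids the receiver must already have
    refs: dict = field(default_factory=dict)           # refname -> oid
    packs: list = field(default_factory=list)          # names of separately fetched packfiles


def parse_split_bundle_header(text):
    """Parse a header-only bundle that points at external packfiles."""
    lines = text.splitlines()
    # The signature mirrors the existing "# v2 git bundle" first line.
    if not lines or lines[0] != "# v3 git bundle":
        raise ValueError("not a v3 bundle header")
    header = SplitBundleHeader()
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header, as in v2
        if line.startswith("-"):
            # v2-style prerequisite: "-<oid> <optional comment>"
            header.prerequisites.append(line[1:].split(" ", 1)[0])
        elif line.startswith("pack "):
            # HYPOTHETICAL line type: "pack <name>" names a matching packfile.
            header.packs.append(line[len("pack "):])
        else:
            # v2-style ref line: "<oid> <refname>"
            oid, refname = line.split(" ", 1)
            header.refs[refname] = oid
    return header
```

A client that finds "1234-abcd" in header.packs would then know to
fetch that packfile separately from the same source.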

> It may want to say "my matching packfiles are these" to accommodate a
> set of packs split at max-pack-size, but I am perfectly fine to say
> you must create a single pack when you use a bundle with separate
> header to keep things simpler.

Interesting. My initial thought is that one could replace "git bundle
create foo.bundle --all && split foo.bundle" with this (for storing or
transferring a bundle somewhere that cannot handle the whole thing in
one go). It has the advantage that you do not need to reassemble the
full bundle to extract the data.

But I think the negatives of splitting across packs would outweigh that.
You cannot have cross-pack deltas, so your total size would be much
larger (in general, I have yet to see a case where max-pack-size is
beneficial, beyond the obvious "your filesystem cannot store files
larger than N bytes").

So I don't think it would be helpful for normal bundle use.

It _could_ be helpful in the context we're talking about here, though.
If I create the split-bundle so that people can resumable-clone from me,
they can only clone up to that bundle's creation point (and
incrementally fetch the rest). But with a single pack, I can't update
the split-bundle without doing a full repack. With multiple packs, I
could regenerate the split-bundle header and just mention the new pack.

It wouldn't be as _efficient_ as a full repack of course, but it may be
a good, cheap interim solution between repacks.
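
As a sketch of that interim update, assuming the same kind of
hypothetical header (v2-style ref lines plus an invented "pack <name>"
line per packfile), regenerating the header is just a matter of
rewriting the ref tips and appending the new pack name. The existing
pack is never touched. All names here are illustrative.

```python
def render_split_bundle_header(refs, packs):
    """Serialize ref tips and pack names into a header-only bundle."""
    lines = ["# v3 git bundle"]
    # v2-style ref lines: "<oid> <refname>"
    lines += [f"{oid} {name}" for name, oid in sorted(refs.items())]
    # HYPOTHETICAL: one "pack <name>" line per packfile, oldest first.
    lines += [f"pack {name}" for name in packs]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the header


def add_incremental_pack(refs, packs, new_refs, new_pack):
    """Return updated (refs, packs) after packing only the new objects."""
    updated_refs = {**refs, **new_refs}  # moved tips override the old ones
    return updated_refs, packs + [new_pack]
```

A client that already holds the first pack would only need to fetch
the newly listed one, at the cost of losing deltas between the two.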

> >   2. Client goes to <url>. They see that they are fetching a bundle,
> >      and know not to do the usual smart-http or dumb-http protocols.
> >      They can fetch the bundle header resumably (though it's tiny, so it
> >      doesn't really matter).
> 
> Might be in megabytes range, though, with many refs.  It still is
> tiny, though ;-).

Yes, but it's the same amount we already spew for a ref advertisement,
which isn't resumable, either. :)

I think I'd probably make this a straight fetch in the first iteration,
and we can worry about making it resumable later on if people actually
care.
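
For what it's worth, the resumable part (whether for the header or,
more usefully, for the big packfile) should not need anything beyond a
plain HTTP Range request. A minimal sketch, assuming the server honors
byte ranges; the function names are made up and error handling is
mostly omitted:

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Return (request, bytes_on_disk), asking only for the missing tail."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    req = urllib.request.Request(url)
    if have:
        # Open-ended range: everything from the first byte we lack.
        req.add_header("Range", f"bytes={have}-")
    return req, have


def fetch_resumable(url, partial_path):
    """Download url to partial_path, continuing a previous attempt if any."""
    req, have = build_resume_request(url, partial_path)
    # Note: if the file is already complete, the server answers 416 and
    # urlopen raises urllib.error.HTTPError; a real client would catch that.
    with urllib.request.urlopen(req) as resp:
        # 206 means the range was honored; a plain 200 means the server
        # sent the whole file, so start over instead of appending.
        mode = "ab" if have and resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := resp.read(64 * 1024):
                out.write(chunk)
```

An interrupted transfer is then continued by calling fetch_resumable()
again with the same arguments.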

> > And you'll notice, too, that all of the bundle-http magic kicks in
> > during step 2 because the client sees they're grabbing a bundle. Which
> > means that the <url> in step 1 doesn't _have_ to be a bundle. It can be
> > "go fetch from kernel.org, then come back to me".
> 
> Or it could be a packfile (and the client discovers roots), as you
> mentioned in a separate message.  I personally do not think it buys
> us much, as long as we do a bundle represented as a header and a
> separate pack.

Yeah, I think I agree, and if it were just me, I'd implement the bundle
part and call it done. But the important thing to me is that we haven't
eliminated the possibility of doing the pure-pack thing on top, if we
choose to.

-Peff
