From: Nicolas Pitre <nico@fluxnic.net>
To: Ted Ts'o <tytso@mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
	Luke Kenneth Casson Leighton <luke.leighton@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: git pack/unpack over bittorrent - works!
Date: Fri, 03 Sep 2010 15:41:26 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1009031522590.19366@xanadu.home>
In-Reply-To: <20100903183120.GA4887@thunk.org>

On Fri, 3 Sep 2010, Ted Ts'o wrote:

> On Fri, Sep 03, 2010 at 10:12:29AM -0700, Junio C Hamano wrote:
> > Theodore Tso <tytso@MIT.EDU> writes:
> > 
> > > ...  So people who are willing
> > > to participate as part of the peer2peer network can download the
> > > instructions for how to make the canonical pack once a month, and use it
> > > to create the canonical pack.  If the "Gittorrent master" has spent a
> > > lot of time to carefully compute the most efficient set of delta
> > > pairings, they will get the slight benefit of a more efficient pack
> > > which they could use instead of their local one without having to use
> > > large values of --window and --depth to "git repack".
> > 
> > Hmm, is the idea essentially to tell people "Here is a snapshot of Linus
> > repository as of a few weeks ago, carefully repacked.  Instead of running
> > "git clone" yourself, please bootstrap your repository by copying it over
> > bittorrent and then "git pull" to update it"?
> 
> Essentially, yes.  I just don't think bittorrent makes sense for
> anything else, because the git protocol is so much more efficient for
> tiny incremental updates...
> 
> So the only other part of my idea is that we could construct a special
> set of instructions that would allow them to recreate the carefully
> repacked snapshot of Linus's repository without having to download it
> from a central seed site.  Instead, they could download a small set of
> instructions, and use that in combination with the objects already in
> their repository, to create a bit-identical version of the carefully
> repacked Linus repository.  It's basically a rip-off of jigdo, but
> applied to git repositories instead of Debian .iso files.

Small?  Well...

Let's see what such instructions for how to make the canonical pack 
might look like:

First you need the full ordered list of objects.  That's a 20-byte SHA1
per object.  The current Linux repo has 1704556 objects, therefore this
list is 33MB already.

Then you need to identify which of those objects are deltas, and against
which object.  Assuming we can index in the list of objects, that means,
say, one bit to identify a delta, and 31 bits for indexing the base. In
my case this is currently 1393087 deltas, meaning 5.3 MB of additional
information.
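
As a rough illustration of the "one bit plus 31 bits" entry described
above, here is a minimal Python sketch.  The big-endian 32-bit layout,
the flag in the top bit, and the names encode_entry/decode_entry are
assumptions made for illustration only; this is not an existing Git
format.

```python
import struct

DELTA_FLAG = 1 << 31          # top bit: this object is stored as a delta
INDEX_MASK = DELTA_FLAG - 1   # low 31 bits: position of the base object


def encode_entry(is_delta, base_index=0):
    """Pack one entry into 4 bytes (hypothetical layout)."""
    if not 0 <= base_index <= INDEX_MASK:
        raise ValueError("base index does not fit in 31 bits")
    word = (DELTA_FLAG | base_index) if is_delta else 0
    return struct.pack(">I", word)


def decode_entry(data):
    """Return (is_delta, base_index) from a 4-byte entry."""
    (word,) = struct.unpack(">I", data)
    return bool(word & DELTA_FLAG), word & INDEX_MASK


# The last index in a list of 1704556 objects fits easily in 31 bits.
entry = encode_entry(True, 1704555)
print(len(entry), decode_entry(entry))
```

At 4 bytes per delta, 1393087 deltas come to 5572348 bytes, which is
where the 5.3 MB figure comes from.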

But then, the deltas themselves can have variations in their encoding.
And we did change the heuristics for the actual delta encoding in the
past too (while remaining backward compatible), but for a canonical pack
creation we'd need to describe that in order to make things totally
reproducible.

So there are 2 choices here.  The first is to specify the Git version 
to make sure identical delta code is used.  That would put big pressure 
on that code to remain stable and not improve anymore, as any behavior 
change would create a compatibility issue forcing people to upgrade 
their Git version all at the same time.  That's not something I want to 
see the world rely upon.

The other choice is to actually provide the delta output as part of the 
instruction for the canonical pack creation.

In my case, the delta output represents:

$ git verify-pack -v .git/objects/pack/*.pack | \
  awk --posix  '/^[0-9a-f]{40}/ && $6 { tot += 1; size += $4 } \
                END { print tot, size }'
1393087 155022247

We therefore have 148 MB of purely delta data here.
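
For reference, the same tally can be sketched in Python.  This assumes
the "git verify-pack -v" line layout that the awk one-liner above
relies on: every object line starts with a 40-hex-digit SHA1, field
four is the size in the pack file, and only delta entries carry a sixth
field (the delta depth).  The function name delta_totals and the sample
lines are made up for illustration.

```python
import re

# An object line starts with a 40-character hex SHA1 followed by a space.
OBJECT_LINE = re.compile(r"^[0-9a-f]{40} ")


def delta_totals(lines):
    """Return (number of deltas, total size-in-pack of those deltas)."""
    count = 0
    size = 0
    for line in lines:
        if not OBJECT_LINE.match(line):
            continue  # summary lines such as "chain length = 1: ..."
        fields = line.split()
        if len(fields) >= 6:  # a depth field is present: this is a delta
            count += 1
            size += int(fields[3])
    return count, size


# Made-up sample: one base object, two deltas, two summary lines.
sample = [
    "a" * 40 + " blob   120 90 12",
    "b" * 40 + " blob   30 25 102 1 " + "a" * 40,
    "c" * 40 + " tree   14 20 127 2 " + "b" * 40,
    "non delta: 1 object",
    "chain length = 1: 1 object",
]
print(delta_totals(sample))
```

To run it against a real pack, the lines could come from
subprocess.run(["git", "verify-pack", "-v", path], capture_output=True,
text=True).stdout.splitlines(), with path pointing at a .pack file.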

So that makes for a grand total of 33 MB + 148 MB = 181 MB of data just
to be able to unambiguously reproduce a pack with a full guarantee of
perfect reproducibility.

But even with the presumption of stable delta code, the recipe would 
still take 38 MB that everyone would have to download every month, which 
is far more than what a monthly incremental update of a kernel repo 
requires.  Of course you could create a delta between consecutive 
recipes, but that is becoming rather awkward.
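
The size estimates above can be re-derived with a few lines of Python.
This is only a back-of-the-envelope sketch: it assumes "MB" means 2^20
bytes (which matches 155022247 bytes being quoted as 148 MB), and the
variable names are mine.  The exact sums come out slightly below the
totals obtained by adding the rounded figures.

```python
MIB = 1024 * 1024

# Figures quoted in this message.
num_objects = 1704556      # objects in the Linux repo
num_deltas = 1393087       # objects stored as deltas
delta_bytes = 155022247    # delta data reported by git verify-pack

object_list = num_objects * 20   # one 20-byte SHA1 per object
delta_index = num_deltas * 4     # 1 flag bit + 31 index bits per delta

# Recipe if the delta code is assumed stable: list plus delta index.
recipe_with_stable_delta_code = object_list + delta_index
# Recipe that ships the delta output itself: list plus delta data.
recipe_with_delta_output = object_list + delta_bytes

for label, size in [
    ("object list", object_list),
    ("delta index", delta_index),
    ("delta data", delta_bytes),
    ("recipe, stable delta code", recipe_with_stable_delta_code),
    ("recipe, delta output included", recipe_with_delta_output),
]:
    print(f"{label:30s} {size / MIB:7.1f} MiB")
```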

I still think that if someone really wants to apply the P2P principle à 
la BitTorrent to Git, then it should be based on the distributed 
exchange of _objects_ as I outlined in a previous email, and not file 
chunks like BitTorrent does.  The canonical Git _objects_ are fully 
defined, while their actual encoding may change.
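
To illustrate what "fully defined" means here, the following Python
sketch computes an object name the way Git does for a loose object: the
SHA1 of a "<type> <size>" header, a NUL byte, and the raw content.  The
helper name git_object_id is mine, and the zlib calls only stand in for
"some encoding"; this is not a sketch of the object-exchange protocol
itself.

```python
import hashlib
import zlib


def git_object_id(obj_type, payload):
    """SHA1 over '<type> <size>\\0' followed by the raw object content."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


payload = b"hello world\n"
oid = git_object_id("blob", payload)

# Two peers may encode the same object differently (here: different
# zlib levels), yet both encodings decode to the same content and
# therefore to the same object name.
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)
assert zlib.decompress(fast) == zlib.decompress(best) == payload
assert git_object_id("blob", zlib.decompress(fast)) == oid

print(oid)
```

The name depends only on the object's type and content, so peers can
verify what they receive without agreeing on how it was packed.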


Nicolas
