* Resumable clone/Gittorrent (again)
@ 2011-01-05 16:23 Nguyen Thai Ngoc Duy
  2011-01-05 16:56 ` Luke Kenneth Casson Leighton
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-05 16:23 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Nicolas Pitre, Luke Kenneth Casson Leighton

Hi,

I've been analyzing the BitTorrent protocol and have come up with this. The
last idea about a similar thing [1], gittorrent, was given by Nicolas.
This keeps close to that idea (i.e. the transfer protocol must be built around
git objects, not file chunks) with a slight difference.

The idea is to transfer a chain of objects (trees or blobs), consisting of a
base object and a delta chain. Objects are chained according to the
worktree layout, e.g. all objects of path/to/any/blob form one
chain, from a commit tip down to the root commits. Chains can have
gaps, and don't need to start from the commit tip. The transfer is
resumable because if a delta chain is corrupt at some point, we can
just request another chain from where it stops. The base object is
obviously resumable.
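
For illustration only (not part of the proposal), here is roughly how such a
per-path chain could be enumerated with stock git commands, assuming the
script runs inside a clone with "git" on the PATH:

    import subprocess

    def path_chain(tip, path):
        """Return [(commit, object id at `path`)] pairs, newest first."""
        commits = subprocess.check_output(
            ["git", "rev-list", tip, "--", path], text=True).split()
        chain = []
        for c in commits:
            # "git rev-parse <commit>:<path>" resolves the blob (or tree) at
            # that path, or fails where the path is absent (a gap in the chain).
            r = subprocess.run(["git", "rev-parse", f"{c}:{path}"],
                               capture_output=True, text=True)
            if r.returncode == 0:
                chain.append((c, r.stdout.strip()))
        return chain

    # e.g. path_chain("HEAD", "Documentation/git.txt")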

We start by fetching all commit contents reachable from a commit tip.
This is a chain, therefore resumable. From there each commit can be
examined. Missing trees and blobs will be fetched as chains. Every time
a delta is received, we can recreate the new object and verify it (we
should already have its SHA-1 from its parent trees/commits).
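
As a sketch of that verification step: a git object ID is just the SHA-1 of a
"<type> <size>\0" header followed by the raw content, so a reconstructed
object can be checked against the ID already known from its parent tree or
commit, e.g.:

    import hashlib

    def git_object_id(obj_type: str, content: bytes) -> str:
        # Object ID = SHA-1 of "<type> <size>\0" + content (e.g. "blob 12\0...").
        header = f"{obj_type} {len(content)}\0".encode()
        return hashlib.sha1(header + content).hexdigest()

    def verify_reconstructed(obj_type, content, expected_id):
        return git_object_id(obj_type, content) == expected_id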

These chains are quite independent, in the sense that a blob
chain is independent of any other blob chain (though it requires the tree
chains, of course), so we can fetch as many as we want in parallel once
we're done with the commit chain.

The last thing I like about these chains is that their number
is reasonable. It won't increase too fast over time (compared to
the number of commits). As such it maps well to BitTorrent's "pieces".
When a new gittorrent client comes in, a running client can advertise what
chains it has with a bitmap (*).

So it all looks good to me. It is resumable and verifiable. It can be
fetched in parallel from many servers. It maps pretty well to
BitTorrent (which means we can reuse BitTorrent's design). All transfers
should be compressed so the amount of data transferred is also acceptable (not
as optimized as upload-pack, but hopefully the overhead is low). A minor
bonus is that the latest commit (in full) will be available as soon as
possible.

One thing about the reachability test: in order to avoid running this test
expensively every time a chain is requested, the requested chain
must be given in extended SHA-1 form, starting from a commit tip (e.g.
$SHA1~12^2~34...). Parsing and following that syntax is cheaper, I
hope.
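
For example (illustrative only), a server could resolve such a spec with a
single rev-parse call rather than a full reachability walk:

    import subprocess

    def resolve_chain_spec(spec):
        # e.g. spec = "<advertised-tip-sha1>~12^2~34"; fails if it cannot
        # be followed from that tip, which doubles as the reachability check.
        return subprocess.check_output(
            ["git", "rev-parse", "--verify", spec], text=True).strip()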

Comments?

[1] http://article.gmane.org/gmane.comp.version-control.git/155222

(*) BitTorrent stores the list of pieces in the .torrent file. The bitmap
reflects what pieces a client has, so other clients can ask it for those
pieces. If we follow the same approach, we can record a list of all
possible directories/files in a repository in a .gittorrent file; then
gittorrent clients can advertise with bitmaps too, perhaps a two-bit map
(not fetched, fetching, fully fetched).
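
A minimal sketch of such a two-bit map (the encoding and the state values are
my own assumptions, not a fixed format):

    NOT_FETCHED, FETCHING, FETCHED = 0, 1, 2

    def encode_states(states):
        # Pack one 2-bit state per path/chain, four states per byte.
        out = bytearray((len(states) + 3) // 4)
        for i, s in enumerate(states):
            out[i // 4] |= (s & 0b11) << ((i % 4) * 2)
        return bytes(out)

    def decode_states(data, count):
        return [(data[i // 4] >> ((i % 4) * 2)) & 0b11 for i in range(count)]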
-- 
Duy


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 16:23 Resumable clone/Gittorrent (again) Nguyen Thai Ngoc Duy
@ 2011-01-05 16:56 ` Luke Kenneth Casson Leighton
  2011-01-05 17:13   ` Thomas Rast
  2011-01-05 23:28 ` Maaartin
  2011-01-07  3:21 ` Nicolas Pitre
  2 siblings, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-05 16:56 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List, Nicolas Pitre

On Wed, Jan 5, 2011 at 4:23 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Hi,
>
> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e the transfer protocol must be around git
> objects, not file chunks) with a bit difference.

> So it all looks good to me. It is resumable and verifiable. It can be
> fetched in parallel from many servers. It maps pretty good to
> BitTorrent (which means we can reuse BitTorrent design). All transfer
> should be compressed so the amount of transfer is also acceptable (not
> as optimized as upload-pack, but hopefully overhead is low). A minor
> point is latest commit (in full) will be available as soon as
> possible.

 ok.  what i wasn't aware of, about the bittorrent protocol, was that
multiple files, when placed into the same .torrent, are just
concatenated as one monolithic data block.  the "chunking" is just
then slapped on top of that.  the end result is that it's possible
that, if you want to get one specific file (or, in this case "object"
whether it be commit, blob or tree) then you *might* end up getting
two more "chunks" than you actually need, which, worst case could end
up being 2.999x the actual data really required.

 _this_ is the reason why people such as cameron dale criticised
bittorrent as a hierarchical / multi-file delivery mechanism (and
abandoned it in favour of apt-p2p), and i didn't understand this at
the time enough to be able to point out that i'd assumed they knew
what i was thinking :)

 what i was thinking was "duhh!" don't slap multiple files into a
single .torrent, put each file (or, in this case "object" whether it
be commit, blob, tree or other) into a separate torrent!  that's all -
problem goes away!

 now that of course leaves you with the problem that you now have
potentially hundreds if not thousands or tens of thousands of
.torrents to deal with, publish, find etc. etc.   and the solution to
_that_ is to give the name of the .torrent file something
meaningful.... like.... ooo, how about... the object's md5 sum? :)

 so _that_ problem's solved, which leaves just one more problem: how
to find such a ridiculously large number of objects in the first
place, and, surprise-surprise, there's a perfect solution to that, as
well, called DHTs.  and, surprise-surprise, what do you need as the
DHT key?  something like a 128-bit key?   ooo, how about ... the
object's md5 sum? that's 128-bit, that'll do :)

 but, better than that: there happens to have been announced very
recently an upgraded version of a bittorrent client, which claims to
be fantastic as it's no longer dependent on "internet search" sites,
because surprise-surprise, it uses peer-to-peer DHT to do the search
queries.

 so not only is there a solution to the problems but also there's even
a suitable codebase to work from in order to create a working
prototype.

 now i just have to find the damn thing... ah yes, it's called Tribler.

 l.


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 16:56 ` Luke Kenneth Casson Leighton
@ 2011-01-05 17:13   ` Thomas Rast
  2011-01-05 18:07     ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Rast @ 2011-01-05 17:13 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Nguyen Thai Ngoc Duy, Git Mailing List, Nicolas Pitre

Luke Kenneth Casson Leighton wrote:
>  now that of course leaves you with the problem that you now have
> potentially hundreds if not thousands or tens of thousands of
> .torrents to deal with, publish, find etc. etc.

Umm, I'm counting 202400 objects in my git.git and 1799525 in a clone
of linux-2.6.git.  So I'm not sure how far you want to split things
into single transfers, but going all the way down to objects will
massively hurt performance.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 17:13   ` Thomas Rast
@ 2011-01-05 18:07     ` Luke Kenneth Casson Leighton
  2011-01-06  1:47       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-05 18:07 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List, Nicolas Pitre

On Wed, Jan 5, 2011 at 5:13 PM, Thomas Rast <trast@student.ethz.ch> wrote:
> Luke Kenneth Casson Leighton wrote:
>>  now that of course leaves you with the problem that you now have
>> potentially hundreds if not thousands or tens of thousands of
>> .torrents to deal with, publish, find etc. etc.
>
> Umm, I'm counting 202400 objects in my git.git and 1799525 in a clone
> of linux-2.6.git.  So I'm not sure how far you want to split things
> into single transfers, but going all the way down to objects will
> massively hurt performance.

 yeah... this is a key reason why i came up with a protocol which
transferred the exact same pack-objects that HTTP and all the other
"point-to-point" git protocols use, to such good effect.

the problem was that i was going to rely on multiple clients being
able to generate the exact same pack-object, given the exact same
input, and then share that pack-object.  ok, that's not the problem,
that was just the plan :)

 nicolas kindly pointed out, at some length, that in a distributed
environment, however, that plan was naive, because whenever you
request a pack-object for use e.g. normally with HTTP or other git
point-to-point protocol, it's generated there-and-then using
heuristics and multi-threading that pretty much guarantees that even
if you were to make the exact same request of exactly the same system,
you'd get *different* pack-objects!  not to mention the fact that
different people have the same git objects stored in *different* ways
because the object stores, despite having the same commits in them,
were pulled at different times and end up with a completely different
set of git objects that represent those exact same commits that
everyone else has.

that's all a bit wordy, but you get the idea.

 so, nicolas recommended a "simpler" approach, which, well, apologies
nicolas but i didn't really like it - it seemed far too simplistic and
i'm not really one for spending time doing these kinds of
"intermediate baby steps" (wrong choice of words, no offense implied,
but i'm sure you know what i mean).  i much prefer to just hit all the
issues head-on, right from the start :)


so, in the intervening time since this was last discussed i've given
the pack-objects-distributing idea some thought (and NO, nicolas, just
to clarify, this is NOT grabbing the git packed objects that are
actually in the .git/objects store, so NO, this does NOT end up
bypassing security by giving people objects that are from another
branch, it really IS getting that lovely varying data which is
heuristic, store and threadnum dependent!).

 the plan is to turn that variation in the git pack-objects responses,
across multiple peers, into an *advantage* not a liability.  how?
like this:

 * a client requiring objects from commit abcd0123 up to commit
efga3456 sends out a DHT broadcast query to all and sundry who have
commits abcd0123 and everything in between up to efga3456.

 * those clients that can be bothered to respond, do so [refinements below]

 * the requestor selects a few of them, and asks them to create git
pack-objects.  this takes time, but that's ok.  once created, the size
of the git pack-object is sent as part of the acknowledgement.

 * the requestor, on receipt of all the sizes, selects the *smallest*
one to begin the p2p (.torrent) from (by asking the remote client to
create a .torrent specifically for that purpose, with the filename
abcd0123-efga3456).

 in this way you end up with not only an efficient git pack-object but
you get, to 99.5% certainty *THE* most efficient git pack-object.
distributed computing at its best :)
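
A toy sketch of that selection step (the peer objects and their
prepare_pack/make_torrent calls are hypothetical placeholders, not any real
API):

    def pick_seed(peers, commit_range, k=3):
        # Ask up to k peers to prepare a pack for the range; each reports the
        # resulting pack size.  Start the torrent from the smallest offer.
        offers = [(peer.prepare_pack(commit_range), peer) for peer in peers[:k]]
        size, best = min(offers, key=lambda o: o[0])
        return best.make_torrent(commit_range), size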

 now, an immediately obvious refinement of this is that those .torrent
(pack-objects) "stick around", in a cache (with a hard limit defined
on the cache size of course).  and so, when the client that requires a
pack-object makes the request, of course, those remote clients that
*already* have that cached pack-object for that specific commit-range
should be given first priority, to avoid other clients from having to
make massive amounts of git pack-objects.

 a further refinement is of course to collect statistics on the number
of peers doing downloads at the time, prioritising those pack-objects
which are most being distributed at the time.  this has fairly obvious
benefits :)

 yet *another* refinement is slightly less obvious, and it's this:
there *COULD* happen to be some existing pack-objects in the cache,
not of commit abcd0123-efga3456 but in a ready-made "chain": commits
abcd01234-beef7890 packed already and in the cache, and commits
beef7890-efga3456 likewise packed already and in the cache.  again:
the requestor should be informed of these, and make their mind up as
to what to do.

 it gets rather more complex when you have *part* of the chain already
pre-cached (and have to work out err, i got this bit and this bit, but
i'd have to generate a git pack-object for the bit in the middle, i'll
inform the requestor of this, they can make up their mind what to do),
but again i do not imagine for one second that this would be anything
more than an intriguing coding challenge and, importantly, an
optimisation challenge for gittorrent version 3.0 somewhere down the
line, rather than an all-out absolute requirement that it must, must
be done now, now, now.

 what else can i mention, that occurred to me... yeah - abandoning of
a download.  if, for some reason it becomes blindingly obvious that
the p2p transfer just isn't working out, then the requestor simply
stops the process and starts again.  a refinement of this, which is a
bit cheeky i know, is to keep *two* simultaneous requests and
downloads for the *exact* same git pack-object commit-chain but with
different data from different groups of peers, for a short period of
time, and then abandon one of them once it's clear which one is best.
this does seem a bit cheeky, but it has the advantage that if the one
that _was_ fastest goes tits-up, you can at least go back to the
previous one and, assuming that the cache hasn't been cleared, just
join in again.  but this is _really_ something that's wayyy down the
line, for gittorrent version 4.0 or 5.0 or so.

so, can you see that a) this is a far cry from the "simplistic
transfer of blobs and trees" b) it's *not* going to overload peoples'
systems by splattering (eek!) millions of md5 sums across the internet
as bittorrent files c) it _does_ fit neatly into the bittorrent
protocol d) it combines the best of git with the best of p2p
distributed networking principles...

... all of which creates a system which people will _still_ say is a
"hammer looking for nails" :)

... right up until the point where some idiot in the USA government
decides to seize sourceforge.net, github.com, gitorious.org and
savannah.gnu.org because they contain source code of software that
MIGHT be used for copyright infringement.  whilst i realise that the
only one of those that might be missed is sourceforget, you cannot
ignore the fact that the trust placed in governments and large
corporations to run the internet infrastructure is now completely
gone, and that the USA and other countries are now putting in place
hypocritical policies that put them into the same category that used
to be reserved for China, Saudi Arabia, Iran and other regimes accused
of being "Totalitarian".

 thoughts, anyone?  (other than on the last paragraph, please, if that's ok).

l.


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 16:23 Resumable clone/Gittorrent (again) Nguyen Thai Ngoc Duy
  2011-01-05 16:56 ` Luke Kenneth Casson Leighton
@ 2011-01-05 23:28 ` Maaartin
  2011-01-06  1:32   ` Nguyen Thai Ngoc Duy
  2011-01-07  3:21 ` Nicolas Pitre
  2 siblings, 1 reply; 25+ messages in thread
From: Maaartin @ 2011-01-05 23:28 UTC (permalink / raw)
  To: git

Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e the transfer protocol must be around git
> objects, not file chunks) with a bit difference.
>
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained in according to
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

I may be talking nonsense, please bear with me.

I'm not sure if it works well, since chains defined this way change over time.
I may request commits A and B while declaring that I possess commits C and D. One
server may be ahead of A, so should it send me more data or repack the chain so
that the non-requested versions get excluded? At the same time the server may
be missing B and possess only some of its ancestors. Should it send me only a
part of the chain, or should I rather ask a different server?

Moreover, in case a directory gets renamed, the content may get transferred
needlessly. This is probably no big problem.

I haven't read the whole other thread yet, but what about going the other way
round? Use a single commit as a chain, and create deltas assuming that all
ancestors are already available. The packs may arrive out of order, so the
decompression may have to wait. The number of commits may be one order of
magnitude larger than the number of paths (there are currently 2254 paths
and 24235 commits in git.git), so grouping consecutive commits into one larger
pack may be useful.

The advantage is that the packs stay stable over time; you may create them
using the most aggressive and time-consuming settings and store them forever.
You could create packs for single commits, packs for non-overlapping
consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean, with
commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
etc. The reason for this is obviously to allow reading groups of commits from
different servers so that they fit together (similar to buddy memory
allocation). Of course, there are things like branches bringing chaos into this
simple scheme, but I'm sure this can be solved somehow.
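
A small sketch of that grouping (my own illustration of the numbering above),
where level 1 pairs commits, level 2 pairs the pairs, and so on:

    def buddy_intervals(n_commits, max_level):
        # With commits numbered 0..n-1, emit [start, end] ranges aligned on
        # power-of-two boundaries: [0,1], [2,3], ..., then [0,3], [4,7], ...
        intervals = []
        for level in range(1, max_level + 1):
            size = 1 << level
            for start in range(0, n_commits - size + 1, size):
                intervals.append((start, start + size - 1))
        return intervals

    # buddy_intervals(8, 3) -> [(0,1), (2,3), (4,5), (6,7), (0,3), (4,7), (0,7)]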

Another problem is the client requesting commits A and B while declaring that it
possesses commits C and D. When both C and D are ancestors of either A or B, you
can ignore it (as you assume this while packing, anyway). The other case is
less probable, unless e.g. C is master and A is a development branch.
Currently, I've no idea how to optimize this or whether it could be
important.

I see no disadvantage when compared to path-based chains, but am probably 
overlooking something obvious.


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 23:28 ` Maaartin
@ 2011-01-06  1:32   ` Nguyen Thai Ngoc Duy
  2011-01-06  3:34     ` Maaartin-1
  0 siblings, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-06  1:32 UTC (permalink / raw)
  To: Maaartin; +Cc: git

On Thu, Jan 6, 2011 at 6:28 AM, Maaartin <grajcar1@seznam.cz> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>
>> I've been analyzing bittorrent protocol and come up with this. The
>> last idea about a similar thing [1], gittorrent, was given by Nicolas.
>> This keeps close to that idea (i.e the transfer protocol must be around git
>> objects, not file chunks) with a bit difference.
>>
>> The idea is to transfer a chain of objects (trees or blobs), including
>> base object and delta chain. Objects are chained in according to
>> worktree layout, e.g. all objects of path/to/any/blob will form a
>> chain, from a commit tip down to the root commits. Chains can have
>> gaps, and don't need to start from commit tip. The transfer is
>> resumable because if a delta chain is corrupt at some point, we can
>> just request another chain from where it stops. Base object is
>> obviously resumable.
>
> I may be talking nonsense, please bare with me.
>
> I'm not sure if it works well, since chains defined this way change over time.
> I may request commits A and B while declaring to possess commits C and D. One
> server may be ahead of A, so should it send me more data or repack the chain so
> that the non-requested versions get excluded? At the same time the server may
> be missing B and posses only some ancestors of it. Should it send me only a
> part of the chain or should I better ask a different server?

I'll keep it simple. A chain is defined by one commit head. Such a
chain can't change over time. But you can ask for just part of the
chain; rev-list syntax can be used here. For example, if you already
have commits C and D and 10 deltas in the chain (linear history for
simplicity here), requesting "give me A~10 ^C ^D" should give the required
commits.
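
Illustrated with real rev-list syntax (the names A, C, D stand for actual
SHA-1s):

    import subprocess

    def chain_part(tip_spec, have):
        # "git rev-list A~10 ^C ^D" lists everything reachable from A~10
        # that is not reachable from C or D -- the part of the chain we lack.
        args = ["git", "rev-list", tip_spec] + ["^" + h for h in have]
        return subprocess.check_output(args, text=True).split()

    # chain_part("A~10", ["C", "D"])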

> Moreover, in case a directory gets renamed, the content may get transfered
> needlessly. This is probably no big problem.

Yes, the chain constraint can backfire in these cases. We can mix
standard upload-pack/fetch-pack with this if the server can recognize
such cases, by cutting the commit history into chunks. The chunks around
directory renames can then be fetched with git-fetch.

> I haven't read the whole other thread yet, but what about going the other way
> round? Use a single commit as a chain, create deltas assuming that all
> ancestors are already available. The packs may arrive out of order, so the
> decompression may have to wait. The number of commits may be one order of
> magnitude larger than the the number of paths (there are currently 2254 paths
> and 24235 commits in git.git), so grouping consequent commits into one larger
> pack may be useful.

The number of commits can increase fast. I'd rather have a
small/stable number over time. Also, commits depend on other commits, so
you can't verify a commit until you have all of its parents. The same
applies to a file, but a file chain does not interfere with other
file chains.

> The advantage is that the packs stays stable over time, you may create them
> using the most aggressive and time-consuming settings and store them forever.
> You could create packs for single commits, packs for non-overlapping
> consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with
> commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
> etc. The reason for this is obviously to allow reading groups of commits from
> different servers so that they fit together (similar to Buddy memory
> allocation). Of course, there are things like branches bringing chaos in this
> simple scheme, but I'm sure this can be solved somehow.

Pack encoding can change. And packs can contain objects you don't want
to share (i.e. hidden from public view).

> Another problem is the client requesting commits A and B while declaring to
> possess commits C and D. When both C and D are ancestors of either A or B, you
> can ignore it (as you assume this while packing, anyway). The other case is
> less probable, unless e.g. C is the master and A is a developing branch.
> Currently. I've no idea how to optimize this and whether this could be
> important.

As I said, we can request just part of a chain (from A+B down to C+D).
git-fetch should be used if the repo is already quite up to date, though; it's
just more efficient.
-- 
Duy


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 18:07     ` Luke Kenneth Casson Leighton
@ 2011-01-06  1:47       ` Nguyen Thai Ngoc Duy
  2011-01-06 17:50         ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-06  1:47 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Thomas Rast, Git Mailing List, Nicolas Pitre

On Thu, Jan 6, 2011 at 1:07 AM, Luke Kenneth Casson Leighton
<luke.leighton@gmail.com> wrote:
>  the plan is to turn that variation in the git pack-objects responses,
> across multiple peers, into an *advantage* not a liability.  how?
> like this:
>
>  * a client requiring objects from commit abcd0123 up to commit
> efga3456 sends out a DHT broadcast query to all and sundry who have
> commits abcd0123 and everything in between up to efga3456.
>
>  * those clients that can be bothered to respond, do so [refinements below]
>
>  * the requestor selects a few of them, and asks them to create git
> pack-objects.  this takes time, but that's ok.  once created, the size
> of the git pack-object is sent as part of the acknowledgement.
>
>  * the requestor, on receipt of all the sizes, selects the *smallest*
> one to begin the p2p (.torrent) from (by asking the remote client to
> create a .torrent specifically for that purpose, with the filename
> abcd0123-ebga3456).

That defeats the purpose of distributing. You are putting pressure on
certain peers.

>  now, an immediately obvious refinement of this is that those .torrent
> (pack-objects) "stick around", in a cache (with a hard limit defined
> on the cache size of course).  and so, when the client that requires a
> pack-object makes the request, of course, those remote clients that
> *already* have that cached pack-object for that specific commit-range
> should be given first priority, to avoid other clients from having to
> make massive amounts of git pack-objects.

Caches have their limits too. Suppose I half-fetch a pack, then stop and
go wild for a month. The next month I restart the fetch, and the pack may
no longer be in the cache. A new pack may or may not be identical to the old
one.

Also if you go with packs, you are tied to the peer that generates
that pack. Two different peers can, in theory, generate different
packs (in encoding) for the same input.

Another thing with packs (ok, not exactly with packs) is how you
verify that you have got what you asked for. BitTorrent can verify every
piece a peer receives because the SHA-1 sums of those pieces are recorded
in the .torrent file. We have SHA-1 all over the place, but if you don't
have the base objects to undeltify against, you can't use those SHA-1s to verify.
Verification is an important step before you advertise to other peers
"I have these".

> so, can you see that a) this is a far cry from the "simplistic
> transfer of blobs and trees" b) it's *not* going to overload peoples'
> systems by splattering (eek!) millions of md5 sums across the internet
> as bittorrent files c) it _does_ fit neatly into the bittorrent
> protocol d) it combines the best of git with the best of p2p
> distributed networking principles...

How can you advertise what you have to another peer?
-- 
Duy


* Re: Resumable clone/Gittorrent (again)
  2011-01-06  1:32   ` Nguyen Thai Ngoc Duy
@ 2011-01-06  3:34     ` Maaartin-1
  2011-01-06  6:36       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 25+ messages in thread
From: Maaartin-1 @ 2011-01-06  3:34 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On 11-01-06 02:32, Nguyen Thai Ngoc Duy wrote:
> On Thu, Jan 6, 2011 at 6:28 AM, Maaartin <grajcar1@seznam.cz> wrote:
>> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

>> I haven't read the whole other thread yet, but what about going the other way
>> round? Use a single commit as a chain, create deltas assuming that all
>> ancestors are already available. The packs may arrive out of order, so the
>> decompression may have to wait. The number of commits may be one order of
>> magnitude larger than the the number of paths (there are currently 2254 paths
>> and 24235 commits in git.git), so grouping consequent commits into one larger
>> pack may be useful.
> 
> The number of commits can increase fast. I'd rather have a
> small/stable number over time.

In theory, I could create many commits per second. I could create many
unique paths per second, too. But I don't think that really happens. I
know of no larger repository than git.git and I don't want to download one
just to see how many commits, paths, and objects it contains, but I'd
suppose it's less than one million commits, which should be manageable,
especially when commits get grouped together as I described below.

> And commits depend on other commits so
> you can't verify a commit until you have got all of its parents. That
> does apply to file, but then this file chain does not interfere other
> file chains.

That's true, but the verification is something done locally on the
client; it consumes no network traffic and no server resources, so I
consider it cheap. I need less than half a minute (using only a
single core) to verify the whole git.git repository (36 MB). This
is no problem, even if it had to wait until the download finishes. I'm
sure the OP of [1] would be happy if he could wait for this.

>> The advantage is that the packs stays stable over time, you may create them
>> using the most aggressive and time-consuming settings and store them forever.
>> You could create packs for single commits, packs for non-overlapping
>> consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with
>> commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
>> etc. The reason for this is obviously to allow reading groups of commits from
>> different servers so that they fit together (similar to Buddy memory
>> allocation). Of course, there are things like branches bringing chaos in this
>> simple scheme, but I'm sure this can be solved somehow.
> 
> Pack encoding can change.

I see I didn't explain it clearly enough (or am missing something
completely). I know why the packs normally used by git can't be used for
this purpose. Let me retry: let's assume there's a commit chain
A-B-C-D-E-F-..., the client already has commit B and requests commit F.
It may send requests to up to 4 servers, asking for C, D, E, and F,
respectively. The server being asked for E _creates_ a pack containing
all the information needed to create E given _all of_ A, B, C, D. As
the base for any blob/whatever in E it may choose any blob contained in any
of these commits. Of course, it may also choose a blob already packed in
this pack. It may not choose any other blob, so any client having all
ancestors of E can use the pack. Different server and/or program
versions may create different packs for E, but all of them are
_interchangeable_. Because of this, it makes sense to _store_ them for
future reuse.

Compared to the way git packing normally works, this is a restriction,
but I don't think it leads to significantly worse compression. You guys
working on git can confirm or disprove it.

> And packs can contain objects you don't want
> to share (i.e. hidden from public view).

This pack would contain only commit E. I also described pairing, intended
for greater efficiency. In this case a server creates a pack that allows,
e.g., creating commits E and F given all their ancestors (while another
server creates a pack for C and D). This way the number of packs needed
may be a fraction of the total number of commits requested.

>> Another problem is the client requesting commits A and B while declaring to
>> possess commits C and D. When both C and D are ancestors of either A or B, you
>> can ignore it (as you assume this while packing, anyway). The other case is
>> less probable, unless e.g. C is the master and A is a developing branch.
>> Currently. I've no idea how to optimize this and whether this could be
>> important.
> 
> As I said, we can request just part of a chain (from A+B to C+D).
> git-fetch should be used if the repo is quite uptodate though. It's
> just more efficient.

[1] http://article.gmane.org/gmane.comp.version-control.git/164564


* Re: Resumable clone/Gittorrent (again)
  2011-01-06  3:34     ` Maaartin-1
@ 2011-01-06  6:36       ` Nguyen Thai Ngoc Duy
  2011-01-08  1:04         ` Maaartin-1
  0 siblings, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-06  6:36 UTC (permalink / raw)
  To: Maaartin-1; +Cc: git

On Thu, Jan 6, 2011 at 10:34 AM, Maaartin-1 <grajcar1@seznam.cz> wrote:
> In theory, I could create many commits per seconds. I could create many
> unique paths per seconds, too. But I don't think it really happens. I do
> know no larger repository than git.git and I don't want to download it
> just to see how many commits, paths, and object it contains, but I'd
> suppose it's less than one million commits, which should be manageable,
> especially when commits get grouped together as I described below.

In practice, commits are created every day in an active project. Paths,
on the other hand, are added less often (perhaps except in webkit).

I've got some numbers:

 - wine.git has 72k commits, 260k trees, 200k blobs, 12k paths
 - git.git has 24k commits, 39k trees, 24k blobs, 2.7k paths
 - linux-2.6.git has 160k commits, 760k trees, 442k blobs, 46k paths

Large repos are more interesting because small ones can be cloned with
git-clone.

Listing all those commits in linux-2.6.git takes 160k*20 = 3M (I suppose
compression is useless because SHA-1s are random). A compressed listing
of those 46k paths takes 200k.

>> And commits depend on other commits so
>> you can't verify a commit until you have got all of its parents. That
>> does apply to file, but then this file chain does not interfere other
>> file chains.
>
> That's true, but the verification is something done locally on the
> client, it consumes no network traffic and no server resources, so I
> consider it to be cheap. I need less than half a minute (using only a
> single core) for verifying of the whole git.git repository (36 MB). This
> is no problem, even when it had to wait until the download finishes. I'm
> sure, the OP of [1] would be happy if he could wait for this.

The point is that you need to fetch a commit's parents first in order to
verify that commit. Fetching a whole commit is more expensive than a
file. So while you can fetch a few commit bases and request packs
from those bases in parallel, the cost of the initial commit bases will be
high.

> I see I didn't explain it clear enough (or am missing something
> completely). I know why the packs normally used by git can't be used for
> this purpose. Let me retry: Let's assume there's a commit chain
> A-B-C-D-E-F-..., the client has already commit B and requests commit F.
> It may send requests to up to 4 servers, asking for C, D, E, and F,
> respectively. The server being asked for E _creates_ a pack containing
> all the information needed to create E given _all of_ A, B, C, D. As
> base for any blob/whatever in E it may choose any blob contained in any
> of these commits. Of course, it may also choose a blob already packed in
> this pack. It may not choose any other blob, so any client having all
> ancestors of E can use the pack. Different server and/or program
> versions may create different packs for E, but all of them are
> _interchangeable_. Because of this, it makes sense to _store_ it for
> future reuse.

They are interchangeable as a whole, yes. But you cannot fetch half
the pack from server A and the other half from server B. You can try
to recover as many deltas as possible in a broken pack, but how do you
request a server to send the rest of the pack to you?
-- 
Duy


* Re: Resumable clone/Gittorrent (again)
  2011-01-06  1:47       ` Nguyen Thai Ngoc Duy
@ 2011-01-06 17:50         ` Luke Kenneth Casson Leighton
  0 siblings, 0 replies; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-06 17:50 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Thomas Rast, Git Mailing List, Nicolas Pitre

On Thu, Jan 6, 2011 at 1:47 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Thu, Jan 6, 2011 at 1:07 AM, Luke Kenneth Casson Leighton
> <luke.leighton@gmail.com> wrote:
>>  the plan is to turn that variation in the git pack-objects responses,
>> across multiple peers, into an *advantage* not a liability.  how?
>> like this:
>>
>>  * a client requiring objects from commit abcd0123 up to commit
>> efga3456 sends out a DHT broadcast query to all and sundry who have
>> commits abcd0123 and everything in between up to efga3456.
>>
>>  * those clients that can be bothered to respond, do so [refinements below]
>>
>>  * the requestor selects a few of them, and asks them to create git
>> pack-objects.  this takes time, but that's ok.  once created, the size
>> of the git pack-object is sent as part of the acknowledgement.
>>
>>  * the requestor, on receipt of all the sizes, selects the *smallest*
>> one to begin the p2p (.torrent) from (by asking the remote client to
>> create a .torrent specifically for that purpose, with the filename
>> abcd0123-ebga3456).
>
> That defeats the purpose of distributing. You are putting pressure on
> certain peers.

 that's unavoidable, but it's not actually as bad as it seems.  think
about it.  normally, "pressure" is put onto a git server, by forcing
that server to perform multiple "git pack-object" calculations,
repeatedly, for each and every "git pull".

 so, the principle behind this RFC (is it an RFC? yes, kinda...) is
that a) you cache those git pack-objects, thus avoiding heavy CPU
usage b) you make the requests to _many_ peers that you'll likely find
already are in the process of distributing that particular
commit-range _anyway_ so will _definitely_ have it  ... etc. etc.

 so there's a ton of reasons why it's quite a big improvement over the
present star-network arrangement.

>
>>  now, an immediately obvious refinement of this is that those .torrent
>> (pack-objects) "stick around", in a cache (with a hard limit defined
>> on the cache size of course).  and so, when the client that requires a
>> pack-object makes the request, of course, those remote clients that
>> *already* have that cached pack-object for that specific commit-range
>> should be given first priority, to avoid other clients from having to
>> make massive amounts of git pack-objects.
>
> Cache have its limits too. Suppose I half-fetch a pack then stop and
> go wild for a month. The next month I restart the fetch, the pack may
> no longer in cache. A new pack may or may not be identical to the old
> pack.

 correct.  that's not in the slightest bit a problem.  the peer which
has that new pack will be asked to make a new .torrent for _that_
pack.  with a new name that uniquely identifies it (the md5sum of the
pack would do as the .torrent filename)

> Also if you go with packs, you are tied to the peer that generates
> that pack. Two different peers can, in theory, generate different
> packs (in encoding) for the same input.

 yes.  correct.  i _did_ say that you pick the one that is the
smallest of the two (or three.  or 10).  in this way you actually do
much better than you would otherwise in a "star network" such as a
standard HTTP git server, because you've asked 2, 3 or 10 (whatever)
peers, and you'll end up with _the_ most efficient representation of
that commit-range.  statistically speaking, of course :)


> Another thing with packs (ok, not exactly with packs) is how you
> verify that's you have got what you asked.

 ok - how do you verify that you've got what you asked for, when you
fetch from a git server using HTTP?

> Bittorrent can verify every
> piece a peer receives because sha-1 sum of those pieces are recorded
> in .torrent file.

 yes.  this is simply a part of the bittorrent protocol, to ensure
that the file being transferred is correctly transferred.

 these verifications steps should be _trusted_ and should _not_ be
confused with anything else (i've deleted the rest of the paragraph
you wrote, in order to reduce any opportunity for confusion).

 if you mentally keep git separate from bittorrent it helps.  imagine
that bittorrent is merely a drop-in replacement for git over HTTP
(nicolas kindly explained the plugin system for git which would add
another protocol for downloading of git repos, and yes this can all be
implemented as a plugin)


>> so, can you see that a) this is a far cry from the "simplistic
>> transfer of blobs and trees" b) it's *not* going to overload peoples'
>> systems by splattering (eek!) millions of md5 sums across the internet
>> as bittorrent files c) it _does_ fit neatly into the bittorrent
>> protocol d) it combines the best of git with the best of p2p
>> distributed networking principles...
>
> How can you advertise what you have to another peer?

 you don't.  it's done "on-demand".

 the concept of "git push" becomes virtually a null-op, updating the
bittorrent tracker and that's... about it.

 it's where "git pull" that all the work is done, starting with that
DHT query [no i know "bittorrent the protocol" doesn't have DHT, but
many bittorrent clients _do_ have DHT, and Tribler has an extremely
good one].

 l.


* Re: Resumable clone/Gittorrent (again)
  2011-01-05 16:23 Resumable clone/Gittorrent (again) Nguyen Thai Ngoc Duy
  2011-01-05 16:56 ` Luke Kenneth Casson Leighton
  2011-01-05 23:28 ` Maaartin
@ 2011-01-07  3:21 ` Nicolas Pitre
  2011-01-07  6:34   ` Nguyen Thai Ngoc Duy
  2011-01-07 15:59   ` Luke Kenneth Casson Leighton
  2 siblings, 2 replies; 25+ messages in thread
From: Nicolas Pitre @ 2011-01-07  3:21 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List, Luke Kenneth Casson Leighton

On Wed, 5 Jan 2011, Nguyen Thai Ngoc Duy wrote:

> Hi,
> 
> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e the transfer protocol must be around git
> objects, not file chunks) with a bit difference.
> 
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained in according to
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

How do you actually define your chain?  Given that Git is conceptually 
snapshot based, there is currently no relationship between two blobs 
forming the content for two different versions of the same file.  Even 
delta objects are not really part of the Git data model as they are only 
an encoding variation of a given primary object.  In fact, we may and 
actually do have deltas where the base object is not from the same 
worktree file as the delta object itself.

The only thing that 
ties this all together is the commit graph.  And that graph might have 
multiple forks and merges, so any attempt at a linear representation
as a chain is rather futile.  Therefore it is not clear to me how you
can define a chain with a beginning and an end, and how this can be 
resumed midway.

> We start by fetching all commit contents reachable from a commit tip.

Sure.  This is doable today and is called a shallow clone with depth=1.

> This is a chain, therefore resumable.

I don't get that part though.  How is this resumable?  That's the very 
issue we have with a clone.

I proposed a solution to that already, which is to use 
git-upload-archive for one of the tip commits, since the data stream 
produced by upload-archive (once decompressed) is actually 
deterministic.  Once completed, this can be converted into a shallow 
clone on the client side, and can be deepened in smaller steps 
afterwards.

> From there each commit can be
> examined. Missing trees and blobs will be fetched as chains. Everytime
> a delta is received, we can recreate the new object and verify it (we
> should have its SHA-1 from its parent trees/commits).

What if the delta is based on an object from another chain?  How do you 
determine which chain to ask for to get that base?

> Because these chains are quite independent, in a sense that a blob
> chain is independent from another blob chain (but requires tree
> chains, of course). We can fetch as many as we want in parallel, once
> we're done with the commit chain.

But in practice, most of those chains will end up containing objects 
which are duplicates of objects in another chain.  How do you tell the 
remote that you want part of a chain because you've got 96% of it in 
another chain already?

> The last thing I like about these chains is that the number of chains
> is reasonable. It won't increase too fast over time (as compared to
> the number of commits). As such it maps well to BitTorrent's "pieces".

My problem right now is that I don't see how this maps well to Git.


Nicolas


* Re: Resumable clone/Gittorrent (again)
  2011-01-07  3:21 ` Nicolas Pitre
@ 2011-01-07  6:34   ` Nguyen Thai Ngoc Duy
  2011-01-07 15:59   ` Luke Kenneth Casson Leighton
  1 sibling, 0 replies; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-07  6:34 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Git Mailing List, Luke Kenneth Casson Leighton

On Fri, Jan 7, 2011 at 10:21 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
> How do you actually define your chain?  Given that Git is conceptually
> snapshot based, there is currently no relationship between two blobs
> forming the content for two different versions of the same file.  Even
> delta objects are not really part of the Git data model as they are only
> an encoding variation of a given primary object.  In fact, we may and
> actually do have deltas where the base object is not from the same
> worktree file as the delta object itself.
>
> The only thing that
> ties this all together is the commit graph.  And that graph might have
> multiple forks and merges so any attempt at a linearity representation
> into a chain is rather futile.  Therefore it is not clear to me how you
> can define a chain with a beginning and an end, and how this can be
> resumed midway.

There's no need to be linear. OK, it's not a chain but a DAG of
objects that share the same path, with the same structure as the commit DAG.

>> We start by fetching all commit contents reachable from a commit tip.
>
> Sure.  This is doable today and is called a shalow clone with depth=1.

I meant only commit objects, no trees nor blobs.

>> This is a chain, therefore resumable.
>
> I don't get that part though.  How is this resumable?  That's the very
> issue we have with a clone.

I assume that all commits are sent in an order such that parent commits
always come after the commit in question. We can make a pack of undeltified
commit objects in that order. That would make sure we can recover a
connected commit DAG from the tip even if the pack cannot be sent
completely to the client.

We can then traverse the commit graph we have and request a pack of the
missing commits to grow the commit DAG, until we have all commits.
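
For illustration, stock rev-list already produces such an ordering
("--topo-order" guarantees no parent is shown before all of its children), so
a pack written in that order stays connected from the tip even if truncated:

    import subprocess

    def commits_tip_first(tip):
        # Children come before parents, so any prefix of this list is a
        # connected subgraph hanging off the tip.
        return subprocess.check_output(
            ["git", "rev-list", "--topo-order", tip], text=True).split()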

> I proposed a solution to that already, which is to use
> git-upload-archive for one of the tip commit since the data stream
> produced by upload-archive (once decompressed) is actually
> deterministic.  Once completed, this can be converted into a shalow
> clone on the client side, and can be deepened in smaller steps
> afterwards.

You see, I don't send trees and blobs in this phase. There are three
phases. Phase 1 fetches all commits. Once we have all commits, we can
use them to request packs of trees with the same path. Those packs are
like the commit pack, but deltified. That's phase 2. When we have
enough trees, we can proceed to phase 3: fetching packs of blobs.

>> From there each commit can be
>> examined. Missing trees and blobs will be fetched as chains. Everytime
>> a delta is received, we can recreate the new object and verify it (we
>> should have its SHA-1 from its parent trees/commits).
>
> What if the delta is based on an object from another chain?  How do you
> determine which chain to ask for to get that base?

Chains should be independent. If a chain is based on another chain and
we have not got its base yet (because the other chain is not
complete), we can fetch the base separately. In theory we need to
fetch one version of every path for them to become bases. So it's
like a broken-down version of git-upload-archive.

>> Because these chains are quite independent, in a sense that a blob
>> chain is independent from another blob chain (but requires tree
>> chains, of course). We can fetch as many as we want in parallel, once
>> we're done with the commit chain.
>
> But in practice, most of those chains will end up containing objects
> which are duplicate of objects in another chain.  How do you tell the
> remote that you want part of a chain because you've got 96% of it in
> another chain already?

Because all clients should have the full commit graph (without trees and
blobs) before doing anything else, they should be able to specify a rev
list for the chain they need. So if you only need SHA1~76..SHA1~100 of
a path, say so to the remote side. SHA1 must be one of the refs on the remote
side, so it can parse the syntax and quickly verify whether "SHA1~76" is
reachable/allowed to be transferred.

>> The last thing I like about these chains is that the number of chains
>> is reasonable. It won't increase too fast over time (as compared to
>> the number of commits). As such it maps well to BitTorrent's "pieces".
>
> My problem right now is that I don't see how this maps well to Git.

Git sees a repository as a history of snapshots. This way I see it as a
bunch of "git log -- path" runs, which is not that bad.
-- 
Duy


* Re: Resumable clone/Gittorrent (again)
  2011-01-07  3:21 ` Nicolas Pitre
  2011-01-07  6:34   ` Nguyen Thai Ngoc Duy
@ 2011-01-07 15:59   ` Luke Kenneth Casson Leighton
  2011-01-08  2:17     ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-07 15:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List

On Fri, Jan 7, 2011 at 3:21 AM, Nicolas Pitre <nico@fluxnic.net> wrote:

>> The last thing I like about these chains is that the number of chains
>> is reasonable. It won't increase too fast over time (as compared to
>> the number of commits). As such it maps well to BitTorrent's "pieces".
>
> My problem right now is that I don't see how this maps well to Git.

 bittorrent as "just another file getting method" maps very well.

 only with some modifications to the bittorrent protocol would the
concept map well to bittorrent "pieces" because the pieces are at
present a) fixed size b) defined by a heuristic based on the file size
(logarithmic) so that the number of pieces are kept to within a
reasonable limit.

 bottom line: my take on this is (sorry to say, nguyen) that i don't
believe bittorrent "pieces" map well to the chains concept, unless the
chains are perfectly fixed identical sizes [which they could well be?
am i mistaken about this?]

 l.


* Re: Resumable clone/Gittorrent (again)
  2011-01-06  6:36       ` Nguyen Thai Ngoc Duy
@ 2011-01-08  1:04         ` Maaartin-1
  2011-01-08  2:40           ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 25+ messages in thread
From: Maaartin-1 @ 2011-01-08  1:04 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

On 11-01-06 07:36, Nguyen Thai Ngoc Duy wrote:
> On Thu, Jan 6, 2011 at 10:34 AM, Maaartin-1 <grajcar1@seznam.cz> wrote:
>> In theory, I could create many commits per seconds. I could create many
>> unique paths per seconds, too. But I don't think it really happens. I do
>> know no larger repository than git.git and I don't want to download it
>> just to see how many commits, paths, and object it contains, but I'd
>> suppose it's less than one million commits, which should be manageable,
>> especially when commits get grouped together as I described below.
> 
> In pratice, commits are created every day in an active project. Paths
> on the other hand are added less often (perhaps except webkit).
> 
> I've got some numbers:
> 
>  - wine.git has 72k commits, 260k trees, 200k blobs, 12k paths
>  - git.git has 24k commits, 39k trees, 24k blobs, 2.7k paths
>  - linux-2.6.git has 160k commits, 760k trees, 442k blobs, 46k paths
> 
> Large repos are more interesting because small ones can be cloned with
> git-clone.

Sure. Linux is the winner and has 4 times as many commits as paths.

> Listing all those commits in linux-2.6.git takes 160k*20=3M (I suppose
> compressing is useless because SHA-1 is random). A compressed listing
> of those 46k paths takes 200k.

Sure, Linux has only 4 times as many commits as paths, but the commits
need 30 times more storage. What does that tell us?

IMHO it speaks in favor of my proposal. Imagine a path changing with
nearly every commit. The root directory is such a path, and near-top
directories come close (as may other files like todo lists). For
each such file you need 3MB just for storing the commit SHAs. Of
course, you can invent a scheme that makes storing all the SHAs unnecessary,
but that is another complication.

OTOH, with the commits used as directory entries we get quite a large
directory. Is this the problem you wanted to make me aware of?

> The point is you need to fetch its parent commits first in order to
> verify a commit. Fetching a whole commit is more expensive than a
> file. So while you can fetch a few commit bases and request for packs
> from those bases in parallel, the cost of initial commit bases will be
> high.

You've lost me. I assume you mean something like: there may be
very large commits (e.g., in a project not versioned from the very
beginning). I'd suggest splitting such commits into two parts by
classifying the blobs (and trees) using a fixed bit of their SHAs. Of
course, this can be repeated in order to get even smaller parts. Let's
assume a commit X gets split into X0 and X1. As before, for compressing
X0 you may use the content of any predecessor of X. For compressing
X1 you may additionally use the content of X0. This way the compression
ratio could stay close to optimal, IMHO.
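
A sketch of that split (purely illustrative; which bit of the object ID gets
tested is arbitrary):

    def split_by_id_bit(object_ids, bit=0):
        # Partition a commit's blob/tree IDs into X0/X1 by one fixed bit of
        # their (hex) SHA-1; repeat with further bits for smaller parts.
        x0, x1 = [], []
        for oid in object_ids:
            selected = (int(oid[0], 16) >> (3 - bit)) & 1
            (x1 if selected else x0).append(oid)
        return x0, x1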

> They are interchangeable as a whole, yes. But you cannot fetch half
> the pack from server A and the other half from server B. You can try
> to recover as many deltas as possible in a broken pack, but how do you
> request a server to send the rest of the pack to you?

Indeed, it's not resumable. For most commits that's not needed, since they
are very small. Why? There are more commits than paths, so the commits
are smaller than the paths on average. I expect my scheme to allow for
nearly as good compression as git usually achieves; in particular, I'd hope it's
no worse than when packing paths.

However, there may be very large commits in my scheme (and maybe also
very large "path-packs" in yours). Such large commits get split as I
described above. Small commits get paired (possibly multiple times) as I
described earlier. You end up with only reasonably sized pieces of data,
let's say between 256 and 512 kB, so you don't need to resume.

Actually, with a really bad connection, you could ask the very server
from which you obtained an incomplete pack to resume from a given byte
offset (similar to HTTP ranges). The server may or may not still have it. This
time it should try to keep it available for you in case your connection
aborts again. Don't get me wrong -- this is just additional help for
very badly connected people.


* Re: Resumable clone/Gittorrent (again)
  2011-01-07 15:59   ` Luke Kenneth Casson Leighton
@ 2011-01-08  2:17     ` Nguyen Thai Ngoc Duy
  2011-01-08 17:21       ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-08  2:17 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Nicolas Pitre, Git Mailing List

On Fri, Jan 7, 2011 at 10:59 PM, Luke Kenneth Casson Leighton
<luke.leighton@gmail.com> wrote:
>  bottom line: my take on this is (sorry to say, nguyen) that i don't
> believe bittorrent "pieces" map well to the chains concept, unless the
> chains are perfectly fixed identical sizes [which they could well be?
> am i mistaken about this?]

there are a few characteristics of bittorrent pieces that i see:
verifiable, resumable, uniquely identifiable across peers and
reasonably small in count.

The fixed size helps peers uniquely identify any pieces by splitting
the whole transfer equally and indexing them in 1-dimension. It's a
rule to help identify pieces, not the only valid rule. In git, given a
commit SHA-1 we can identify any parts (or "pieces") that are
reachable from that commit using rev-list syntax. Therefore the
"identifiable" characteristic still holds even if we don't split into
fixed-size pieces.
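
A tiny sketch of why such names suffice, assuming each peer keeps a local
parents() lookup for the commits it already holds; it only handles the ~N
and ^K commit steps, and trees/blobs would additionally be named by path
within the resolved commit:

    import re

    def resolve(tip, spec, parents):
        """tip: commit SHA-1; spec: a suffix like '~12^2~34'; parents(sha) -> [sha, ...].
        Every peer with the same commit graph resolves the same name to the same
        commit, so no fixed-size piece index is needed."""
        cur = tip
        for op, num in re.findall(r"([~^])(\d*)", spec):
            n = int(num) if num else 1
            if op == "~":                    # follow the first parent n times
                for _ in range(n):
                    cur = parents(cur)[0]
            else:                            # '^k' picks the k-th parent of a merge
                cur = parents(cur)[n - 1]
        return cur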
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-08  1:04         ` Maaartin-1
@ 2011-01-08  2:40           ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-08  2:40 UTC (permalink / raw)
  To: Maaartin-1; +Cc: git

On Sat, Jan 8, 2011 at 8:04 AM, Maaartin-1 <grajcar1@seznam.cz> wrote:
>> Listing all those commits in linux-2.6.git takes 160k*20=3M (I suppose
>> compressing is useless because SHA-1 is random). A compressed listing
>> of those 46k paths takes 200k.
>
> Sure, Linux has only 4 times as much commits as paths, but the commits
> need 30 times more storage. What does it tell us?
>
> IMHO it speaks in favor of my proposal. Imagine a path changing with
> nearly every commit. The root directory is such a path, and near-top
> directories come close to it (as may other files like todo-lists). For
> each such file you need 3 MB just for storing the commit SHAs. Of
> course, you can invent a schema that makes storing all the SHAs
> unnecessary, but that is another complication.
>
> OTOH, with the commits used as directory entries, we get quite a large
> directory. Is this the problem you wanted to make me aware of?

I merely point out that if we use commit SHA-1s as "pieces", then when
a new peer comes in and asks a running peer "what pieces have you got
(so that I can start fetching from you)?", you will need more
bandwidth for that kind of information.

>> The point is you need to fetch its parent commits first in order to
>> verify a commit. Fetching a whole commit is more expensive than a
>> file. So while you can fetch a few commit bases and request for packs
>> from those bases in parallel, the cost of initial commit bases will be
>> high.
>
> You've lost me. I assume you mean something like this: there may be
> very large commits (e.g., in a project not versioned from the very
> beginning). I'd suggest splitting such commits into two parts by
> classifying the blobs (and trees) using a fixed bit of their SHAs. Of
> course, this can be repeated in order to get even smaller parts. Let's
> assume a commit X gets split into X0 and X1. As before, for compressing
> X0 you may use the content of any predecessor of X. For compressing X1
> you may additionally use the content of X0. This way the compression
> rate could stay close to optimal, IMHO.

Well, if you are going to split a commit, then splitting by paths
sounds more natural to me (assuming that people don't often move files).

>> They are interchangeable as a whole, yes. But you cannot fetch half
>> the pack from server A and the other half from server B. You can try
>> to recover as many deltas as possible in a broken pack, but how do you
>> request a server to send the rest of the pack to you?
>
> Indeed, it's not resumable. For most commits that isn't needed, since
> they are very small. Why? There are more commits than paths, so the
> commits are smaller than paths on average. I expect my schema to allow
> for nearly as good compression as git usually achieves; in particular,
> I'd hope it's no worse than when packing by paths.

A commit diff consists of all the tree and blob diffs compared to the
parent commit (let's ignore merges). How can it be smaller than just a
single tree/blob diff (of the same path, compared to the parent
commit)?

> However, there may be very large commits in my schema (and maybe also
> very large "path-packs" in yours). Such large commits get split as I
> described above. Small commits get paired (possibly multiple times) as I
> described earlier. You end up with only reasonably sized pieces of data,
> let's say between 256 and 512 kB, so you don't need to resume.

Yeah, I started thinking about how to transfer efficiently and came to
a similar thing: assume we have a good packing order and know what to
pack; then we close the pack when its size exceeds a limit and start
sending another pack. If this pack stream is corrupt, resume from the
corrupt pack onward. I currently hardcode the limit at 4k, not larger,
because pack overhead is already very low (12 header bytes and 20
SHA-1 trailer bytes per pack).
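
Roughly like this, as a sketch; send_pack() and the already-serialized
payloads are stand-ins, and only the 12-byte header and the 20-byte SHA-1
trailer mirror the real pack format:

    import hashlib

    LIMIT = 4096   # the hardcoded per-pack size limit mentioned above

    def stream_packs(payloads, send_pack):
        """payloads: serialized object entries (bytes), already in packing order.
        Closes a pack once it grows past LIMIT; per-pack overhead is only the
        12-byte header plus the 20-byte SHA-1 trailer."""
        body, count = b"", 0
        for entry in payloads:
            body += entry
            count += 1
            if len(body) >= LIMIT:
                flush(body, count, send_pack)
                body, count = b"", 0
        if body:
            flush(body, count, send_pack)

    def flush(body, count, send_pack):
        header = b"PACK" + (2).to_bytes(4, "big") + count.to_bytes(4, "big")
        pack = header + body
        send_pack(pack + hashlib.sha1(pack).digest())   # trailer checksums the pack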

> Actually, with a really bad connection, you could ask the very server
> from which you obtained an incomplete pack to resume from a given byte
> offset (similar to HTTP ranges). The server may or may not have it. This
> time it should try to keep it available for you in case your connection
> aborts again. Don't get me wrong -- this is just an additional help for
> very badly connected people.

Replace "byte range" with rev-list syntax (SHA1~10..SHA1~20) and we
have a quite fine-grained way of asking for data. Deltas are usually
very small (I only observed cache.h, so this could be a wrong
assumption). But if SHA1~10..SHA1~11 has a diff so big that it keeps
failing, sending byte ranges of the blob in SHA1~11 is probably the
only way left.
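
A possible shape for such requests, purely illustrative (none of these
field names exist in any protocol yet):

    def next_request(range_spec, failed_blob=None):
        if failed_blob is None:
            # normal case: ask for everything a rev-list range introduces
            return {"want": range_spec}                  # e.g. "SHA1~20..SHA1~10"
        # fallback: one blob's diff keeps failing, so ask for raw byte ranges of it
        return {"want_blob": failed_blob, "bytes": [0, 64 * 1024]}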
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-08  2:17     ` Nguyen Thai Ngoc Duy
@ 2011-01-08 17:21       ` Luke Kenneth Casson Leighton
  2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
  2011-01-10 21:38         ` Sam Vilain
  0 siblings, 2 replies; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-08 17:21 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Nicolas Pitre, Git Mailing List

On Sat, Jan 8, 2011 at 2:17 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Fri, Jan 7, 2011 at 10:59 PM, Luke Kenneth Casson Leighton
> <luke.leighton@gmail.com> wrote:
>>  bottom line: my take on this is (sorry to say, nguyen) that i don't
>> believe bittorrent "pieces" map well to the chains concept, unless the
>> chains are perfectly fixed identical sizes [which they could well be?
>> am i mistaken about this?]
>
> there are a few characteristics of bittorrent pieces that i see:
> verifiable, resumable, uniquely identifiable across peers and
> reasonably small in count.
>
> The fixed size helps peers uniquely identify any pieces by splitting
> the whole transfer equally and indexing them in 1-dimension.

 ok - you haven't answered the question: are the chains perfectly
fixed identical sizes?

 if so they can be slotted into the bittorrent protocol by simply
pre-selecting the size to match.  with the downside that if there are
a million such "chains" you now pretty much overwhelm the peers with
the amount of processing, network traffic and memory requirements to
maintain the "pieces" map.

 if not then you now need to modify the bittorrent protocol to cope
with variable-length block sizes: the protocol only allows for the
last block to be of variable-length.

 also, it's worth pointing out that the entire code-base of every
single bittorrent client that you will ever be able to find revolves
around the concept of reassembly of files from "pieces".

 bottom line: the bittorrent protocol and the available bittorrent
source code libraries, both of which save you a great deal of time in
getting something up-and-running, are _not_ the right fit for the
concept of placing the proposed "chains" into bittorrent "pieces".

 translation: if you wish to pursue the "chains" concept, either a
heavily-modified bittorrent protocol and client _or_ an entirely new
peer-to-peer protocol is far more appropriate.

 orrr, doing what i initially suggested, which is to leave the
bittorrent protocol as-is and to open one .torrent per "chain".
especially if these "chains" vary considerably in size (k to gb)

l.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-08 17:21       ` Luke Kenneth Casson Leighton
@ 2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
  2011-01-09 13:55           ` Luke Kenneth Casson Leighton
  2011-01-10 21:38         ` Sam Vilain
  1 sibling, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-09  3:34 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Nicolas Pitre, Git Mailing List

On Sun, Jan 9, 2011 at 12:21 AM, Luke Kenneth Casson Leighton
<luke.leighton@gmail.com> wrote:
>  ok - you haven't answered the question: are the chains perfectly
> fixed identical sizes?

No.

>  if so they can be slotted into the bittorrent protocol by simply
> pre-selecting the size to match.  with the downside that if there are
> a million such "chains" you now pretty much overwhelm the peers with
> the amount of processing, network traffic and memory requirements to
> maintain the "pieces" map.

No, there are only thousands of them (less than 100k for the repos I
examined). That's precisely the reason I stay away from commits as
pieces: the number of commits can potentially go up to millions.

>  if not then you now need to modify the bittorrent protocol to cope
> with variable-length block sizes: the protocol only allows for the
> last block to be of variable-length.

Ah, I see. I do not reuse the bittorrent code out there, just its
ideas, adapted to the git model. If you don't want to modify the
bittorrent protocol at all, seed a bundle (as mentioned in another
thread).
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
@ 2011-01-09 13:55           ` Luke Kenneth Casson Leighton
  2011-01-09 17:48             ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-09 13:55 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Nicolas Pitre, Git Mailing List

On Sun, Jan 9, 2011 at 3:34 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Sun, Jan 9, 2011 at 12:21 AM, Luke Kenneth Casson Leighton
> <luke.leighton@gmail.com> wrote:
>>  ok - you haven't answered the question: are the chains perfectly
>> fixed identical sizes?
>
> No.
>
>>  if so they can be slotted into the bittorrent protocol by simply
>> pre-selecting the size to match.  with the downside that if there are
>> a million such "chains" you now pretty much overwhelm the peers with
>> the amount of processing, network traffic and memory requirements to
>> maintain the "pieces" map.
>
> No, there are only thousands of them (less than 100k for the repos I
> examined). That's precisely the reason I stay away from commits as
> pieces: the number of commits can potentially go up to millions.

 ok - thousands is still a lot.  i recommend that you examine:

 * the heuristics algorithm in bittorrent for piece-selection
 * large repositories such as webkit (1.2gb) and the linux kernel (600mb)

 you still have to come up with a mapping from "chains" to "pieces".
in the bittorrent protocol the mapping is done *entirely* implicitly
and algorithmically.  the "meta" info in the .torrent contains
filenames and file lengths.  stack the files one after the other in a
big long data block, get a chopper and just go "whack, whack, whack"
at regular piece-long points, that's your "pieces".  so, reassembly is
a complete bitch, and picking just _one_ file to download rather than
the whole lot becomes a total pain.
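
 for reference, that implicit mapping really is just this arithmetic (a
sketch of stock bittorrent behaviour, nothing git-specific):

    def piece_for(file_lengths, file_index, offset, piece_length):
        """Files are laid end to end and chopped at fixed piece_length boundaries,
        so one file's data usually straddles pieces that also hold bytes of its
        neighbours -- which is exactly why fetching just one file is painful."""
        global_offset = sum(file_lengths[:file_index]) + offset
        return global_offset // piece_length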

why the bloody hell the bittorrent protocol doesn't just have a file
id i _really_ don't know, it would have made things a damn sight
easier.  anyway - if you're going to modify and "be inspired by" the
bittorrent protocol, you really should look at adding some sort of
"chain" identification - f*** the "chains"-to-"pieces" algorithm, just
add a unique chain id to the relevant bittorrent[-like] command.


>>  if not then you now need to modify the bittorrent protocol to cope
>> with variable-length block sizes: the protocol only allows for the
>> last block to be of variable-length.
>
> Ah, I see. I do not reuse the bittorrent code out there, just its
> ideas, adapted to the git model.

 that's hard work and you're now into "unproven" territory.  for the
successful R&D proof-of-concept code that i wrote, i _deliberately_
stayed away from "adapting" a proven bittorrent protocol, and as a
result managed to get that proof-of-concept up and running within ...
i think it was... 3 days.  most of the time was spent arseing about
adding in a VFS layer into bittornado, in order to libratise it.

i mention that just to give you something to think about.  if you're
up to the challenge of writing your own p2p protocol, however, GREAT!
you'll become a world expert on _both_ peer-to-peer protocols _and_
git :)

 l.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-09 13:55           ` Luke Kenneth Casson Leighton
@ 2011-01-09 17:48             ` Nguyen Thai Ngoc Duy
  2011-01-13 11:39               ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 25+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-01-09 17:48 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Nicolas Pitre, Git Mailing List

On Sun, Jan 9, 2011 at 8:55 PM, Luke Kenneth Casson Leighton
<luke.leighton@gmail.com> wrote:
>  you still have to come up with a mapping from "chains" to "pieces".
> in the bittorrent protocol the mapping is done *entirely* implicitly
> and algorithmically.

Given a commit SHA-1, the mapping can be done algorithmically because
the graph from the commit tip is fixed. Perhaps not mapping all at
once, but as you have more pieces in the graph, you can map more.

> the "meta" info in the .torrent contains
> filenames and file lengths.  stack the files one after the other in a
> big long data block, get a chopper and just go "whack, whack, whack"
> at regular piece-long points, that's your "pieces".  so, reassembly is
> a complete bitch, and picking just _one_ file to download rather than
> the whole lot becomes a total pain.

Well, there won't be .torrent files. Torrent files serve as checksums
for file pieces (let's forget the tracker part). We do SHA-1 checksums
on every object in git. The object graph without the real content _is_
the "info" part of a .torrent file. Instead of passing around torrent
files, I only need to pass around the SHA-1 of the commit tip(s). That
should be enough for any peer to discover the rest.
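
As a sketch, the "info"-equivalent can be recomputed locally from nothing
but a tip with stock plumbing, once the commits and trees are present:

    import subprocess

    def manifest(tip):
        """Enumerate every object reachable from the tip; each line of
        rev-list --objects is '<sha1> [path]', and the sha1 alone names a piece."""
        out = subprocess.run(["git", "rev-list", "--objects", tip],
                             capture_output=True, text=True, check=True).stdout
        return [line.split()[0] for line in out.splitlines()]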

Reassembling, in its simplest form, is just dumping loose objects into
$GIT_DIR/objects. But it's been six years since git's birth now, so
I'll pack them instead.

>  that's hard work and you're now into "unproven" territory.  for the
> successful R&D proof-of-concept code that i wrote, i _deliberately_
> stayed away from "adapting" a proven bittorrent protocol, and as a
> result managed to get that proof-of-concept up and running within ...
> i think it was... 3 days.  most of the time was spent arseing about
> adding in a VFS layer into bittornado, in order to libratise it.
>
> i mention that just to give you something to think about.  if you're
> up to the challenge of writing your own p2p protocol, however, GREAT!
> you'll become a world expert on _both_ peer-to-peer protocols _and_
> git :)

Maybe I have gone insane ;) But I have another aim for this work: to
adjust it to the narrow clone area (pretty much path-based clones). So
while it may not become a real torrent for git (i.e. p2p exchanging,
depending on my needs), a restartable clone from multiple sources is
still worth it.
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-08 17:21       ` Luke Kenneth Casson Leighton
  2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
@ 2011-01-10 21:38         ` Sam Vilain
  1 sibling, 0 replies; 25+ messages in thread
From: Sam Vilain @ 2011-01-10 21:38 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Nguyen Thai Ngoc Duy, Nicolas Pitre, Git Mailing List

On 09/01/11 06:21, Luke Kenneth Casson Leighton wrote:
> On Sat, Jan 8, 2011 at 2:17 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> there are a few characteristics of bittorrent pieces that i see:
>> verifiable, resumable, uniquely identifiable across peers and
>> reasonably small in count.
>>
>> The fixed size helps peers uniquely identify any pieces by splitting
>> the whole transfer equally and indexing them in 1-dimension.
>  ok - you haven't answered the question: are the chains perfectly
> fixed identical sizes?
>
>  if so they can be slotted into the bittorrent protocol by simply
> pre-selecting the size to match.  with the downside that if there are
> a million such "chains" you now pretty much overwhelm the peers with
> the amount of processing, network traffic and memory requirements to
> maintain the "pieces" map.

I'll respond also to this sub-point.  This can be done; but instead of
doing it at the pack level, you take the list of objects between A and B
(for a fetch from A to B), order them by some deterministic order
(called the "commit reel" in the Gittorrent RFC) and then carve that
list up into chunks based on the uncompressed object sizes.

The ordering defined in the RFC is such that it is possible to create
"thin" packs for discrete ranges of commits using existing plumbing, so
that the total transfer size is relatively similar to a complete clone. 
In experiments the network overhead was found to be around 10-20% in
this way.
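
A rough sketch of the carving step, assuming the reel order has already
been computed; object_size() stands in for something like
git cat-file --batch-check, and the 512 kB target is only an example, not
a figure from the RFC:

    CHUNK_TARGET = 512 * 1024   # illustrative uncompressed size per chunk

    def carve(reel, object_size):
        """reel: object SHA-1s already in the deterministic 'commit reel' order.
        Cuts the list into chunks by accumulated uncompressed size, so every
        peer that computes the same reel gets the same chunk boundaries."""
        chunk, total = [], 0
        for sha in reel:
            chunk.append(sha)
            total += object_size(sha)
            if total >= CHUNK_TARGET:
                yield chunk
                chunk, total = [], 0
        if chunk:
            yield chunk             # final, possibly short chunk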

However I must discourage looking for "inspiration" from the Bittorrent
protocol; it reinvents many wheels unnecessarily, and contains much
shonky advice in it.  See the revision history for the gittorrent RFC
(github.com/samv/gittorrent) for the gory details.

Sam

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-09 17:48             ` Nguyen Thai Ngoc Duy
@ 2011-01-13 11:39               ` Luke Kenneth Casson Leighton
  2011-01-13 23:40                 ` Sam Vilain
  0 siblings, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-13 11:39 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Nicolas Pitre, Git Mailing List

On Sun, Jan 9, 2011 at 5:48 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Sun, Jan 9, 2011 at 8:55 PM, Luke Kenneth Casson Leighton
> <luke.leighton@gmail.com> wrote:
>>  you still have to come up with a mapping from "chains" to "pieces".
>> in the bittorrent protocol the mapping is done *entirely* implicitly
>> and algorithmically.
>
> Given a commit SHA-1, the mapping can be done algorithmically because
> the graph from the commit tip is fixed. Perhaps not mapping all at
> once, but as you have more pieces in the graph, you can map more.

 you're _sure_ about this?  what happens when new commits get added,
and change that graph?  are you _certain_ that you can write an
algorithm which is capable of generating exactly the same mapping,
even as more commits are added to the repository being mirrored, or,
does that situation not matter?

l.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-13 11:39               ` Luke Kenneth Casson Leighton
@ 2011-01-13 23:40                 ` Sam Vilain
  2011-01-14 14:26                   ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 25+ messages in thread
From: Sam Vilain @ 2011-01-13 23:40 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Nguyen Thai Ngoc Duy, Nicolas Pitre, Git Mailing List

On 14/01/11 00:39, Luke Kenneth Casson Leighton wrote:
> On Sun, Jan 9, 2011 at 5:48 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>> On Sun, Jan 9, 2011 at 8:55 PM, Luke Kenneth Casson Leighton
>> <luke.leighton@gmail.com> wrote:
>>>  you still have to come up with a mapping from "chains" to "pieces".
>>> in the bittorrent protocol the mapping is done *entirely* implicitly
>>> and algorithmically.
>> Given a commit SHA-1, the mapping can be done algorithmically because
>> the graph from the commit tip is fixed. Perhaps not mapping all at
>> once, but as you have more pieces in the graph, you can map more.
>  you're _sure_ about this?  what happens when new commits get added,
> and change that graph?  are you _certain_ that you can write an
> algorithm which is capable of generating exactly the same mapping,
> even as more commits are added to the repository being mirrored, or,
> does that situation not matter?

For a given set of start and end points, and a given sort algorithm,
walking the commit tree can yield deterministic results.

You need to first make sure topological sanity prevails, then order by
commit date where there are ties.  git rev-list --date-order does this. 
There is the possibility of commits with the same commit date, so if you
need to be really particular you can tie break on those, too.
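
One way to make the order fully deterministic, sketched under the
assumption that topological generation numbers are available (a parent's
generation is always smaller than its child's, so sorting on it first keeps
parents ahead of children); this is an illustration, not the ordering the
gittorrent RFC actually specifies:

    def reel_key(commit):
        """commit has .generation (max distance from a root commit), .commit_date
        (epoch seconds) and .sha (hex).  Generation keeps the order topologically
        sane, commit date covers the common case, and the SHA-1 breaks any
        remaining ties so the order is total and identical on every peer."""
        return (commit.generation, commit.commit_date, commit.sha)

    # commits.sort(key=reel_key)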

Did you look at any of the previous research I linked to before?

Sam

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-13 23:40                 ` Sam Vilain
@ 2011-01-14 14:26                   ` Luke Kenneth Casson Leighton
  2011-01-16  2:11                     ` Sam Vilain
  0 siblings, 1 reply; 25+ messages in thread
From: Luke Kenneth Casson Leighton @ 2011-01-14 14:26 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Nguyen Thai Ngoc Duy, Nicolas Pitre, Git Mailing List

On Thu, Jan 13, 2011 at 11:40 PM, Sam Vilain <sam@vilain.net> wrote:
> On 14/01/11 00:39, Luke Kenneth Casson Leighton wrote:

>> and change that graph?  are you _certain_ that you can write an
>> algorithm which is capable of generating exactly the same mapping,
>> even as more commits are added to the repository being mirrored, or,
>> does that situation not matter?
>
> For a given set of start and end points, and a given sort algorithm,
> walking the commit tree can yield deterministic results.

 excellent.  out of curiosity, is it as efficient as git pack-objects
for the same start and end points?

> Did you look at any of the previous research I linked to before?

 i've been following this since you first originally started it, sam
:)  it would have been nice if it was a completed implementation
that i could test and see "for real" what you're referring to (above)
- the fact that it's in perl and has "TODO" at some of the critical
points, after trying to work with it for several days i stopped and
went "i'm not getting anywhere with this" and focussed on bittorrent
"as a black box" instead.

 if i recall, the original gittorrent work that you did (mirror-sync),
the primary aim was to rely solely and exclusively on a one-to-one
direct link between one machine and another.  in other words, whilst
syncing, if that peer went "offline", you're screwed - you have to
start again.  is that a fair assessment?  please do correct any
assumptions that i've made.

 because on the basis _of_ that assumption, i decided not to proceed
with mirror-sync, instead to pursue a "cache git pack-objects"
approach and to use bittorrent "black-box-style".  which i
demonstrated (minus the cacheing) works perfectly well, several months
back.

 in this way (i know i didn't reply earlier - apologies), there is
absolutely no need to "take bittorrent apart", no need to modify it,
tinker with it, adjust it, redesign it, learn from it "for
inspiration" - you just get on with it, using the code, protocol and
everything about it as a black-box. "get this file to everyone and
anyone wot needs it".

 as well, after nicolas and others went to all the trouble to explain
what git pack-objects is, how it works, and how damn efficient it is,
i'm pretty much convinced that an approach to uniquely identify, then
pick and cache the *best* git pack-object made [by all the peers
requested to provide a particular commit range], is the best, most
efficient - and importantly simplest and easiest to understand -
approach so far that i've heard.  perhaps that's because i came up
with it, i dunno :)  but the important thing is that i can _show_ that
it works (http://gitorious.org/python-libbittorrent/pybtlib - go back
a few revisions)

 so - perhaps it would help if mirrorsync was revived, so that it can
be used to demonstrate what you mean (there aren't any instructions on
how to set up mirrorsync, for example).  that would then allow people
to do a comparative analysis of the approaches being taken.

 i'd be *very* interested - and i'm sure that there are others
likewise equally as interested - to see if the mirrorsync "commit tree
walking" algorithm can come up with a more efficient method of
transferring git repositories than git pack-objects can.

l.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Resumable clone/Gittorrent (again)
  2011-01-14 14:26                   ` Luke Kenneth Casson Leighton
@ 2011-01-16  2:11                     ` Sam Vilain
  0 siblings, 0 replies; 25+ messages in thread
From: Sam Vilain @ 2011-01-16  2:11 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Nguyen Thai Ngoc Duy, Nicolas Pitre, Git Mailing List

On 15/01/11 03:26, Luke Kenneth Casson Leighton wrote:
>>> and change that graph?  are you _certain_ that you can write an
>>> algorithm which is capable of generating exactly the same mapping,
>>> even as more commits are added to the repository being mirrored, or,
>>> does that situation not matter?
>> For a given set of start and end points, and a given sort algorithm,
>> walking the commit tree can yield deterministic results.
>  excellent.  out of curiosity, is it as efficient as git pack-objects
> for the same start and end points?

That isn't a sensible question; walking the revision tree is something
that many commands, including git pack-objects, do internally.

>> Did you look at any of the previous research I linked to before?
>  i've been following this since you first originally started it, sam
> :)  it would have been nice if it was a completed implementation
> that i could test and see "for real" what you're referring to (above)
> - the fact that it's in perl and has "TODO" at some of the critical
> points, after trying to work with it for several days i stopped and
> went "i'm not getting anywhere with this" and focussed on bittorrent
> "as a black box" instead.
>
>  if i recall, the original gittorrent work that you did (mirror-sync),
> the primary aim was to rely solely and exclusively on a one-to-one
> direct link between one machine and another.  in other words, whilst
> syncing, if that peer went "offline", you're screwed - you have to
> start again.  is that a fair assessment?  please do correct any
> assumptions that i've made.

Ok.  Well, first off - I didn't start gittorrent; that was Jonas
Fonseca, it was his Masters thesis.  Criticism about not having a
completed implementation to work with is therefore shared between him
and people who have come along since, such as myself.

I don't know why you got the idea that the protocol is one to one.  It's
one to one just like BitTorrent is - every communication is between two
nodes who share information about what they have and what they need,
before transferring data.  It is supposed to be restartable and it is
not supposed to matter which node data is exchanged with.  In that way,
you could in principle download from multiple nodes at once, or you
could have restartable transfers.  If you lose connectivity then the
most that should have to be re-transferred are incomplete blocks.

>  because on the basis _of_ that assumption, i decided not to proceed
> with mirror-sync, instead to pursue a "cache git pack-objects"
> approach and to use bittorrent "black-box-style".  which i
> demonstrated (minus the cacheing) works perfectly well, several months
> back.

Right, but as others have noted, there are significant drawbacks with
this approach.  For a start, to participate in such a network, you need
to get the particular exact pack that is currently being torrented; just
having a clone is not enough.  This is because the result of git
pack-objects is not repeatable.

That being said, for many projects that would be an acceptable compromise
for the advantages of a restartable clone.  That is why I suggest that a
torrent transfer, treated as a mirror which is infrequently updated, may
be a better approach than trying to overly automate everything.

>  as well, after nicolas and others went to all the trouble to explain
> what git pack-objects is, how it works, and how damn efficient it is,
> i'm pretty much convinced that an approach to uniquely identify, then
> pick and cache the *best* git pack-object made [by all the peers
> requested to provide a particular commit range], is the best, most
> efficient - and importantly simplest and easiest to understand -
> approach so far that i've heard.  perhaps that's because i came up
> with it, i dunno :)  but the important thing is that i can _show_ that
> it works (http://gitorious.org/python-libbittorrent/pybtlib - go back
> a few revisions)

That's great.  If you want to continue this simple approach and ignore
the gittorrent/mirror-sync path altogether, that's fine too.

Trying to determine the "best" pack-object may be counter-productive. 
Here's a simple approach which allows the repository owner to easily
arrange for efficient torrenting of essential object files:

Add to the .torrent manifest just these files:

  .git/objects/pack/pack-*.pack - just the files with .keep files
  .git/packed-refs - just the references which are completely available
via the .keep packs

In that way, a repository owner can periodically re-pack their repo,
mark the new pack files as .keep, then re-generate the .torrent file. 
All nodes will just have to transfer the new packs, not everything.
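
A sketch of that maintenance step; make_torrent() is a hypothetical helper
standing in for whatever tool builds the manifest:

    import glob, pathlib, subprocess

    # repack everything into fresh packs, pin them with .keep so later repacks
    # leave them alone, then rebuild the torrent manifest around them
    subprocess.run(["git", "repack", "-a", "-d"], check=True)
    packs = sorted(glob.glob(".git/objects/pack/pack-*.pack"))
    for pack in packs:
        pathlib.Path(pack[: -len(".pack")] + ".keep").write_text("seeded torrent\n")
    make_torrent(packs + [".git/packed-refs"], out="repo.torrent")   # hypothetical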

>  so - perhaps it would help if mirrorsync was revived, so that it can
> be used to demonstrate what you mean (there aren't any instructions on
> how to set up mirrorsync, for example).  that would then allow people
> to do a comparative analysis of the approaches being taken.

Ok, that sounds like a good plan - I'll see if I can devote some time to
an explanatory series with working examples with reference to the
existing code etc over the coming months.

Cheers,
Sam

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-01-16  2:16 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-05 16:23 Resumable clone/Gittorrent (again) Nguyen Thai Ngoc Duy
2011-01-05 16:56 ` Luke Kenneth Casson Leighton
2011-01-05 17:13   ` Thomas Rast
2011-01-05 18:07     ` Luke Kenneth Casson Leighton
2011-01-06  1:47       ` Nguyen Thai Ngoc Duy
2011-01-06 17:50         ` Luke Kenneth Casson Leighton
2011-01-05 23:28 ` Maaartin
2011-01-06  1:32   ` Nguyen Thai Ngoc Duy
2011-01-06  3:34     ` Maaartin-1
2011-01-06  6:36       ` Nguyen Thai Ngoc Duy
2011-01-08  1:04         ` Maaartin-1
2011-01-08  2:40           ` Nguyen Thai Ngoc Duy
2011-01-07  3:21 ` Nicolas Pitre
2011-01-07  6:34   ` Nguyen Thai Ngoc Duy
2011-01-07 15:59   ` Luke Kenneth Casson Leighton
2011-01-08  2:17     ` Nguyen Thai Ngoc Duy
2011-01-08 17:21       ` Luke Kenneth Casson Leighton
2011-01-09  3:34         ` Nguyen Thai Ngoc Duy
2011-01-09 13:55           ` Luke Kenneth Casson Leighton
2011-01-09 17:48             ` Nguyen Thai Ngoc Duy
2011-01-13 11:39               ` Luke Kenneth Casson Leighton
2011-01-13 23:40                 ` Sam Vilain
2011-01-14 14:26                   ` Luke Kenneth Casson Leighton
2011-01-16  2:11                     ` Sam Vilain
2011-01-10 21:38         ` Sam Vilain
