From: Jeff King <peff@peff.net>
To: Neal Kreitzinger <nkreitzinger@gmail.com>
Cc: Sergio Callegari <sergio.callegari@gmail.com>,
	Bo Chen <chen@chenirvine.org>,
	git@vger.kernel.org
Subject: Re: GSoC - Some questions on the idea of
Date: Thu, 10 May 2012 18:39:16 -0400
Message-ID: <20120510223916.GB31116@sigill.intra.peff.net>
In-Reply-To: <4FAC367E.8070006@gmail.com>

On Thu, May 10, 2012 at 04:43:26PM -0500, Neal Kreitzinger wrote:

> >Yes. The on-the-wire format is a packfile. We create a new packfile on
> >the fly, so we may find new deltas (e.g., between objects that were
> >stored on disk in two different packs), but we will mostly be reusing
> >deltas from the existing packs.
> >
> >So any time you improve the on-disk representation, you are also
> >improving the network bandwidth utilization.
> >
> The git-clone manpage says you can use the rsync protocol for the
> URL.  If you use rsync:// as the URL for your remote, does that get
> you the rsync delta-transfer algorithm's efficiency for the network
> bandwidth utilization part (as opposed to the on-disk representation
> part)?  (I'm new to rsync.)

Well, yes. If you use the rsync transport, it literally runs rsync,
which will use the regular rsync algorithm. But it won't be better than
the git protocol (and in fact will be much worse) for a few reasons:

  1. The object db files are all named after the sha1 of their content
     (the object sha1 for loose objects, and the sha1 of the whole pack
     for packfiles). Rsync will not run its comparison algorithm between
     files with different names. It will not re-transfer existing loose
     objects, but it will delete obsolete packfiles and re-transfer new
     ones in their entirety. So any fetch after an upstream repack is
     effectively a full re-clone.

  2. Even if you could use the rsync delta algorithm, it will never be
     as efficient as git. Git understands the structure of the packfile
     and can tell the other side "Hey, I have these objects," whereas
     rsync must guess from the bytes in the packfiles. That guessing is
     much more expensive to compute, and it can miss shared content
     entirely if the representation has changed (e.g., an object that
     used to be stored whole is now stored as a delta).

  3. Even if you could get exactly the right set of objects to transfer,
     and then use the rsync delta algorithm on them, git would still do
     better. Git's job is much easier: one side has both sets of
     objects (those to be sent and those not), and it generates and
     sends efficient deltas for the other side to apply to its
     objects. Rsync has a harder job: you have one set, the remote side
     has the other, and the two sides must agree on a delta by comparing
     checksums. So it will fundamentally never do as well.
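Point 1 can be made concrete with a minimal Python sketch. It makes these simplifying assumptions:

- Objects are stored uncompressed. Real git zlib-compresses loose objects.
- A "pack" is just the concatenated object contents.
- A pack file is named after the SHA-1 of its bytes, as described above.

The helper names (loose_object_name, pack_file_name, plan_by_file_name) are hypothetical. The sketch models only rsync's matching of files by name, not its transfer algorithm.

```python
import hashlib


def loose_object_name(obj_type: str, data: bytes) -> str:
    """Object name as git computes it: SHA-1 of "<type> <size>\\0" + content."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def pack_file_name(pack_bytes: bytes) -> str:
    """Simplified pack naming (assumption): SHA-1 of the pack's bytes."""
    return "objects/pack/pack-" + hashlib.sha1(pack_bytes).hexdigest() + ".pack"


def plan_by_file_name(local: dict, remote: dict) -> dict:
    """Plan an rsync-like mirror of `remote` onto `local`, keyed by file name.

    Only files present under the same name on both sides are candidates for
    delta comparison; everything else is sent whole or deleted.
    """
    send_whole = sorted(name for name in remote if name not in local)
    return {
        "send_whole": send_whole,
        "delete": sorted(name for name in local if name not in remote),
        "delta_candidates": sorted(name for name in remote if name in local),
        "bytes_sent_whole": sum(len(remote[name]) for name in send_whole),
    }


# One loose object that both sides already have (stored uncompressed here).
loose = b"a loose object\n"
name = loose_object_name("blob", loose)
loose_path = "objects/" + name[:2] + "/" + name[2:]

# The client has a pack of two objects; upstream repacked them plus a
# third object into one new pack (toy packs: concatenated contents).
a, b, c = b"first object\n", b"second object\n", b"third object\n"
old_pack, new_pack = a + b, a + b + c

local = {pack_file_name(old_pack): old_pack, loose_path: loose}
remote = {pack_file_name(new_pack): new_pack, loose_path: loose}
plan = plan_by_file_name(local, remote)

print(plan["delta_candidates"] == [loose_path])  # -> True: only the loose object
print(plan["bytes_sent_whole"])                  # -> 40: all of new_pack again
```

The shared loose object keeps its name, so rsync would find it unchanged and skip it. The repacked objects, however, live under a brand-new file name. The whole pack is therefore sent again, even though the client already had most of its contents.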
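Point 2 contrasts negotiating by object name with matching by bytes. The rough Python sketch below uses hypothetical helper names and compares two packs that hold the same two objects:

- One pack stores the second object whole.
- The other stores it with a made-up toy delta encoding, which is not git's real format.

The byte-level comparison uses fixed 8-byte blocks. This is a simplification of rsync's rolling-checksum search.

```python
import hashlib


def oid(data: bytes) -> str:
    """Git-style blob name: SHA-1 of "blob <size>\\0" followed by the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def objects_to_send(server_has: set, client_has: set) -> set:
    """Object-level negotiation: the client names what it has; the server sends the rest."""
    return server_has - client_has


def block_signatures(data: bytes, size: int = 8) -> set:
    """Byte-level view (simplified): one hash per fixed-size block.

    Real rsync uses a rolling weak checksum plus a strong hash so it can
    find matches at any offset; fixed blocks are enough to show the idea.
    """
    return {hashlib.md5(data[i:i + size]).digest() for i in range(0, len(data), size)}


base = b"line one\nline two\nline three\n"
target = base + b"line four\n"
objects = {oid(base), oid(target)}

# The same two objects, stored two different ways on disk.
pack_whole = base + target  # target stored whole
# Toy delta encoding (hypothetical, not git's format): base name + appended bytes.
pack_delta = base + b"<delta base=" + oid(base).encode() + b">line four\n"

# Object level: the client already has both objects, so nothing is sent.
print(objects_to_send(objects, objects))  # -> set()

# Byte level: the representation changed, so some blocks of the new pack
# match nothing in the old one, even though the objects are identical.
unmatched = block_signatures(pack_delta) - block_signatures(pack_whole)
print(len(unmatched) > 0)  # -> True
```

The object-level check is a single set difference over names. The byte-level side has to hash and compare data, and it still reports differences when the objects themselves are the same.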
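Point 3 can be illustrated by computing the same delta two ways. This hedged Python sketch uses two stand-ins:

- Python's difflib.SequenceMatcher stands in for git's own delta code. Git does not use difflib; it only plays the role of a side that holds both versions.
- The checksum side is a simplified rsync. The receiver sends one MD5 per 16-byte block of its copy, and the sender can reuse only whole matching blocks.

All function names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def sender_side_delta(base: bytes, target: bytes) -> list:
    """Delta computed by a side that holds BOTH base and target.

    difflib (a stand-in, not git's algorithm) finds matching runs directly,
    so only genuinely new bytes become literals.
    """
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply not copied.
    return ops


def rsync_style_delta(base: bytes, target: bytes, block: int = 16) -> list:
    """Delta agreed via checksums (simplified rsync).

    The receiver, holding only `base`, supplies one strong hash per full
    block; the sender, holding only `target`, can copy whole matching blocks.
    Real rsync also uses a rolling weak checksum to make the scan cheap.
    """
    signatures = {}
    for off in range(0, len(base) - block + 1, block):
        signatures.setdefault(hashlib.md5(base[off:off + block]).digest(), off)

    ops, literal, i = [], bytearray(), 0
    while i < len(target):
        window = target[i:i + block]
        off = signatures.get(hashlib.md5(window).digest()) if len(window) == block else None
        if off is None:
            literal.append(target[i])  # no block match here: emit one literal byte
            i += 1
            continue
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, block))
        i += block
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from base plus copy/insert instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += base[off:off + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Bytes that must travel as raw data rather than as copy instructions."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


# 512 pseudo-random bytes, then change 3 bytes in the middle.
base = b"".join(hashlib.sha256(bytes([k])).digest() for k in range(16))
target = base[:200] + b"XYZ" + base[203:]

git_like = sender_side_delta(base, target)
rsync_like = rsync_style_delta(base, target)
assert apply_delta(base, git_like) == target
assert apply_delta(base, rsync_like) == target
print(literal_bytes(git_like), literal_bytes(rsync_like))
```

With both versions in hand, the sender can isolate the few changed bytes. The checksum-based version should have to resend the entire 16-byte block containing the change as literal data, because no block of the receiver's copy matches it.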

-Peff
