All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Bo Chen <chen@chenirvine.org>
Cc: git@vger.kernel.org, peff@peff.net
Subject: Re: GSoC - Some questions on the idea of "Better big-file support".
Date: Wed, 28 Mar 2012 13:19:54 +0700	[thread overview]
Message-ID: <CACsJy8APtMsMJ=FrZjOP=DbzuFoemSLJTmkjaiK5Wkq9XtA4rg@mail.gmail.com> (raw)
In-Reply-To: <CA+M5ThS2iS-NMNDosk2oR25N=PMJJVTi1D=zg7MnMCUiRoX4BQ@mail.gmail.com>

On Wed, Mar 28, 2012 at 11:38 AM, Bo Chen <chen@chenirvine.org> wrote:
> Hi, Everyone. This is Bo Chen. I am interested in the idea of "Better
> big-file support".
>
> As it is described in the idea page,
> "Many large files (like media) do not delta very well. However, some
> do (like VM disk images). Git could split large objects into smaller
> chunks, similar to bup, and find deltas between these much more
> manageable chunks. There are some preliminary patches in this
> direction, but they are in need of review and expansion."
>
> Can anyone elaborate a little bit why many large files do not delta
> very well?

Large files are usually binary. Depends on the type of binary, they
may or may not delta well. Those that are compressed/encrypted
obviously don't delta well because one change can make the final
result completely different.

Another problem with delta-ing large files with git is, current code
needs to load two files in memory for delta. Consuming 4G for delta 2
2GB files does not sound good.

> Is it a general problem or a specific problem just for Git?
> I am really new to Git, can anyone give me some hints on which source
> codes I should read to learn more about the current code on delta
> operation? It is said that "there are some preliminary patches in this
> direction", where can I find these patches?

Read about rsync algorithm [2]. Bup [1] implements the same (I think)
algorithm, but on top of git. For preliminary patches, have a look at
jc/split-blob series at commit 4a1242d in git.git.

[1] https://github.com/apenwarr/bup
[2] http://en.wikipedia.org/wiki/Rsync#Algorithm
-- 
Duy

  reply	other threads:[~2012-03-28  6:20 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-28  4:38 GSoC - Some questions on the idea of "Better big-file support" Bo Chen
2012-03-28  6:19 ` Nguyen Thai Ngoc Duy [this message]
2012-03-28 11:33   ` GSoC - Some questions on the idea of Sergio
2012-03-30 19:44     ` Bo Chen
2012-03-30 19:51     ` Bo Chen
2012-03-30 20:34       ` Jeff King
2012-03-30 23:08         ` Bo Chen
2012-03-31 11:02           ` Sergio Callegari
2012-03-31 16:18             ` Neal Kreitzinger
2012-04-02 21:07               ` Jeff King
2012-04-03  9:58                 ` Sergio Callegari
2012-04-11  1:24                 ` Neal Kreitzinger
2012-04-11  6:04                   ` Jonathan Nieder
2012-04-11 16:29                     ` Neal Kreitzinger
2012-04-11 22:09                       ` Jeff King
2012-04-11 16:35                     ` Neal Kreitzinger
2012-04-11 16:44                     ` Neal Kreitzinger
2012-04-11 17:20                       ` Jonathan Nieder
2012-04-11 18:51                         ` Junio C Hamano
2012-04-11 19:03                           ` Jonathan Nieder
2012-04-11 18:23                     ` Neal Kreitzinger
2012-04-11 21:35                   ` Jeff King
2012-04-12 19:29                     ` Neal Kreitzinger
2012-04-12 21:03                       ` Jeff King
     [not found]                         ` <4F8A2EBD.1070407@gmail.com>
2012-04-15  2:15                           ` Jeff King
2012-04-15  2:33                             ` Neal Kreitzinger
2012-04-16 14:54                               ` Jeff King
2012-05-10 21:43                             ` Neal Kreitzinger
2012-05-10 22:39                               ` Jeff King
2012-04-12 21:08                       ` Neal Kreitzinger
2012-04-13 21:36                       ` Bo Chen
2012-03-31 15:19         ` Neal Kreitzinger
2012-04-02 21:40           ` Jeff King
2012-04-02 22:19             ` Junio C Hamano
2012-04-03 10:07               ` Jeff King
2012-03-31 16:49         ` Neal Kreitzinger
2012-03-31 20:28         ` Neal Kreitzinger
2012-03-31 21:27           ` Bo Chen
2012-04-01  4:22             ` Nguyen Thai Ngoc Duy
2012-04-01 23:30               ` Bo Chen
2012-04-02  1:00                 ` Nguyen Thai Ngoc Duy
2012-03-30 19:11   ` GSoC - Some questions on the idea of "Better big-file support" Bo Chen
2012-03-30 19:54     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8APtMsMJ=FrZjOP=DbzuFoemSLJTmkjaiK5Wkq9XtA4rg@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=chen@chenirvine.org \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.