From: Neal Kreitzinger <nkreitzinger@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Sergio Callegari <sergio.callegari@gmail.com>,
	Bo Chen <chen@chenirvine.org>,
	git@vger.kernel.org
Subject: Re: GSoC - Some questions on the idea of
Date: Tue, 10 Apr 2012 20:24:48 -0500
Message-ID: <4F84DD60.20903@gmail.com>
In-Reply-To: <20120402210708.GA28926@sigill.intra.peff.net>

On 4/2/2012 4:07 PM, Jeff King wrote:

> ...I think we need to first find out exactly
> how well the generic algorithm can perform. It may be "good enough"
> compared to the hassle that inconsistent application of a content-aware
> algorithm will cause.  So I wouldn't rule it out, but I'd rather try the
> bup-style splitting first, and see how good (or bad) it is.
>
(I read the bup DESIGN doc to see what bup-style splitting is.) When you
use bup's delta technology in git.git, I take it that you will use it for
both big-worktree-files *and* big-history-files (files that are not big
in the worktree but are not xdelta delta-friendly)?  IOW, all binaries
plus big-text-worktree-files.  Otherwise, small binaries will still grow
into large histories.
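
For readers who have not looked at the bup DESIGN doc, here is a minimal
sketch of the general idea behind content-defined splitting.  It is not
bup's actual rollsum: the plain byte sum, the 64-byte window, and the
11-bit mask are all assumptions chosen to keep the example short, and
split_chunks is a hypothetical name.  The point it illustrates is that a
cut depends only on the bytes in the window, so inserting data near the
start of a file leaves the later chunks unchanged.

```python
import random

WINDOW = 64            # bytes covered by the rolling sum (assumed value)
MASK = (1 << 11) - 1   # cut when the low 11 bits are all ones (assumed)


def split_chunks(data: bytes) -> list:
    """Split data at positions chosen by a rolling sum over the last WINDOW bytes."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - WINDOW]
        # Only cut once the window is full, so a cut depends on window content alone.
        if i >= WINDOW - 1 and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# Illustrative data only: random bytes standing in for a binary file.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(200_000))
original = split_chunks(data)
edited = split_chunks(b"inserted prefix" + data)
shared = set(original) & set(edited)
print(len(original), "chunks before,", len(shared), "of them reused after the edit")
```

A real implementation would use a stronger rolling checksum and would
also bound the minimum and maximum chunk size.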

If small binaries are not going to be bup-delta-compressed, then what 
about using xxd to convert the binary to text and then xdelta 
compressing the hex dump to achieve efficient delta compression in the 
pack file?  You could convert the hexdump back to binary with xxd for 
checkout and such.
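
A minimal sketch of that round trip, using Python's built-in hex
conversion as a stand-in for xxd.  The helper names and the 30 bytes per
line are assumptions for illustration.  It only shows that the
conversion is lossless and that the text form is more than twice the
size of the binary; it says nothing about whether the hex dump actually
deltas better.

```python
def to_hex_text(blob: bytes, bytes_per_line: int = 30) -> str:
    """Render a binary blob as lines of plain hex digits."""
    lines = [
        blob[i:i + bytes_per_line].hex()
        for i in range(0, len(blob), bytes_per_line)
    ]
    return "\n".join(lines) + "\n"


def from_hex_text(text: str) -> bytes:
    """Rebuild the original blob from the hex text, ignoring whitespace."""
    return bytes.fromhex("".join(text.split()))


# Illustrative blob: every byte value, repeated a few times.
blob = bytes(range(256)) * 4
text = to_hex_text(blob)
restored = from_hex_text(text)
print(len(blob), "binary bytes ->", len(text), "characters of hex text")
print("round trip ok:", restored == blob)
```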

Maybe small binaries do xdelta well and the above is a moot point.  This
is all theory to me, but the reality is looming over my head: most of the
components I should be tracking are binaries, both small (large
history?) and big.  I am not tracking them yet because of "big-file"
concerns.  I don't want to have to refactor my vast git ecosystem with
filter-branch later because I slammed binaries into the main project or
superproject without proper systems programming.  (I'm not sure what the
C/Linux term is for 'systems programming', but in the mainframe world it
meant making sure everything was configured for efficient performance.)

Now that I say that out loud, I guess a superproject with binaries in
separate repos could be easily refactored by creating new, efficient
repos and making a new commit that points to them instead of the old,
inefficient repos.  That way, when someone checks out the binary repo
(submodule) into their worktree, they get the new efficiency instead of
the old inefficiency.  Over time, as folks become less likely to check
out old stuff, the old inefficiency goes away on its own.  I think.
(Submodules are mostly theory to me at this point also.)

v/r,
neal
