From: Neal Kreitzinger <nkreitzinger@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Sergio Callegari <sergio.callegari@gmail.com>,
Bo Chen <chen@chenirvine.org>,
git@vger.kernel.org
Subject: Re: GSoC - Some questions on the idea of "Better big-file support"
Date: Tue, 10 Apr 2012 20:24:48 -0500
Message-ID: <4F84DD60.20903@gmail.com>
In-Reply-To: <20120402210708.GA28926@sigill.intra.peff.net>
On 4/2/2012 4:07 PM, Jeff King wrote:
> ...I think we need to first find out exactly
> how well the generic algorithm can perform. It may be "good enough"
> compared to the hassle that inconsistent application of a content-aware
> algorithm will cause. So I wouldn't rule it out, but I'd rather try the
> bup-style splitting first, and see how good (or bad) it is.
>
(I read the bup DESIGN doc to see what bup-style splitting is.) When you
use bup-style splitting in git.git, I take it you will use it for both
big worktree files *and* big-history files (files that are not big in
the worktree but do not delta well with xdelta)? In other words, all
binaries plus big text files. Otherwise, small binaries will grow into
large histories.
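For anyone else who had to look it up: as I read the bup DESIGN doc,
bup-style splitting is content-defined chunking. A rolling checksum is
computed over a small sliding window, and a chunk boundary is declared
wherever its low bits match a fixed pattern, so an insertion early in a
file should disturb only the chunks near the edit instead of shifting
every later block. Below is a minimal Python sketch under stated
assumptions: the rsync-style rolling sum, the 64-byte window, the 13-bit
boundary mask and the name `split_chunks` are illustrative choices of
mine, not bup's actual code or parameters.

```python
import random

WINDOW = 64        # sliding-window size in bytes (assumed, for illustration)
SPLIT_BITS = 13    # cut when the low 13 bits are all ones, ~8 KiB average chunks (assumed)
MASK = (1 << SPLIT_BITS) - 1


def split_chunks(data: bytes, window: int = WINDOW, mask: int = MASK) -> list[bytes]:
    """Cut `data` wherever an rsync-style rolling checksum hits a fixed bit pattern.

    Once the window is full, the checksum depends only on the last `window`
    bytes, so cut points follow the content rather than absolute offsets.
    """
    chunks = []
    s1 = s2 = 0  # rolling sum of the window's bytes, and of position-weighted bytes
    start = 0
    for i, x in enumerate(data):
        if i < window:
            # Window still filling: each byte already in it gains one unit of weight.
            s1 = (s1 + x) & 0xFFFF
            s2 = (s2 + s1) & 0xFFFF
        else:
            # Window full: slide it by dropping data[i - window] and adding x.
            old = data[i - window]
            s1 = (s1 - old + x) & 0xFFFF
            s2 = (s2 - window * old + s1) & 0xFFFF
        if i >= window - 1 and (s2 & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk with no boundary after it
    return chunks


if __name__ == "__main__":
    rng = random.Random(1)
    base = bytes(rng.getrandbits(8) for _ in range(200_000))
    edited = base[:1000] + b"INSERTED" + base[1000:]  # small edit near the front
    old_chunks, new_chunks = split_chunks(base), split_chunks(edited)
    shared = set(old_chunks) & set(new_chunks)
    print(f"{len(old_chunks)} chunks before, {len(new_chunks)} after, {len(shared)} shared")
```

In the demo, only the chunk or two around the 8-byte insertion should
differ; a fixed-size split would instead shift every block after byte
1000. Storing each distinct chunk once is what would let lightly edited
big files share most of their storage.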
If small binaries are not going to be bup-split, then what about using
xxd to convert each binary to text and then xdelta-compressing the hex
dump, to get efficient delta compression in the pack file? The hex dump
could be converted back to binary with xxd for checkout and such.
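A minimal Python sketch of the round trip I have in mind, assuming a
plain hex dump with a fixed number of bytes per line (30 here, chosen to
resemble `xxd -p` output). The helper names are hypothetical, and
`difflib` stands in for a line-oriented delta only to show which hex
lines change.

```python
import difflib


def to_hex_lines(blob: bytes, width: int = 30) -> str:
    """Render bytes as plain hex text, `width` bytes per line (meant to resemble `xxd -p`)."""
    return "".join(blob[i:i + width].hex() + "\n" for i in range(0, len(blob), width))


def from_hex_lines(text: str) -> bytes:
    """Undo to_hex_lines (the analogue of `xxd -r -p`); whitespace is ignored."""
    return bytes.fromhex("".join(text.split()))


def added_lines(old: str, new: str) -> int:
    """How many lines a line-oriented diff has to add to turn `old` into `new`."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(1 for line in diff if line.startswith("+") and not line.startswith("+++"))


if __name__ == "__main__":
    blob = bytes(range(256)) * 12                       # 3072-byte stand-in for a small binary
    assert from_hex_lines(to_hex_lines(blob)) == blob   # lossless round trip

    patched = bytearray(blob)
    patched[1500] ^= 0xFF                               # overwrite one byte in place
    inserted = b"\x00" + blob                           # insert one byte at the front

    print("in-place edit:", added_lines(to_hex_lines(blob), to_hex_lines(bytes(patched))))
    print("1-byte insert:", added_lines(to_hex_lines(blob), to_hex_lines(inserted)))
```

With fixed-width lines, an in-place byte change touches only one hex
line, but inserting a single byte shifts every later line, so a
line-oriented delta of the dump becomes a near-full rewrite in that
case. `difflib` is only a stand-in here; it is not how git computes pack
deltas.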
Maybe small binaries delta well with xdelta and the above is moot. This
is all theory to me, but the reality is looming over my head: most of
the components I should be tracking are binaries, both small (with
potentially large histories) and big. I am not tracking them yet because
of "big-file" concerns. I don't want to have to refactor my vast git
ecosystem with filter-branch later because I slammed binaries into the
main project or superproject without proper systems programming. (I'm
not sure what the C/Linux term is for "systems programming"; in the
mainframe world it meant making sure everything was configured for
efficient performance.)
Now that I say that out loud, I guess a superproject with binaries in
separate repos could be refactored easily: create new, efficient repos
and make a new commit that points to them instead of the old,
inefficient ones. That way, when someone checks out the binary repo
(submodule) into their worktree, they get the new efficiency instead of
the old inefficiency. Over time, as folks become less likely to check
out old commits, the old inefficiency goes away on its own. I think.
(Submodules are mostly theory to me at this point also.)
v/r,
neal