Re: [RFC] Add --create-cache to repack

From: Shawn Pearce <spearce@spearce.org>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	John Hawley <warthog19@eaglescrag.net>
Subject: Re: [RFC] Add --create-cache to repack
Date: Fri, 28 Jan 2011 20:35:48 -0800	[thread overview]
Message-ID: <AANLkTik01sVOKD+gW_BembUDN-LRbqJsPCCBBVyJJ11T@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1101282055190.8580@xanadu.home>

On Fri, Jan 28, 2011 at 20:08, Nicolas Pitre <nico@fluxnic.net> wrote:
>> pack is actually smaller 376.30 MiB vs. C Git's 380.59 MiB.  I point
>> out this data because improvements made to JGit may show similar
>> improvements to CGit given how close they are in running time.
>
> What are those improvements?

None right now.  JGit is similar to CGit algorithm-wise.  (Actually it
looks like JGit has a faster diff implementation, but that's a
different email.)

If you are asking about why JGit created a slightly smaller pack
file... it splits the delta window during threaded delta search
differently than CGit does, and we align our blocks slightly
differently when comparing two objects to generate a delta sequence
for them.  These two variations mean JGit produces different deltas
than CGit does.  Sometimes we are smaller, sometimes we are larger.
But its a small difference, on the order of 1-4 MiB for something like
linux-2.6.  I don't think its worthwhile trying to analyze the
specific differences in implementations and retrofit those differences
into the other one.

What I was trying to say was, _if_ we made a change to JGit and it
dropped the running time, that same change in CGit should have _at
least_ the same running time improvement, if not better.  I was
pointing out that this cached-pack change dropped the running time by
1 minute, so CGit should also see a similar improvement (if not
better).  I would prefer to test against CGit for this sort of thing,
but its been too long since I last poked pack-objects.c and the
revision code in CGit, while the JGit equivalents are really fresh in
my head.

> Now, the fact that JGit is so close to CGit must be because the actual
> cost is outside of them such as within zlib, otherwise the C code should
> normally always be faster, right?

Yup, I mostly agree with this statement.  CGit does a lot of
malloc/free activity when reading objects in.  JGit does too, but we
often fit into the young generation for the GC, which sometimes can be
faster to clean and recycle memory in.  We're not too far off from C
code.

But yes... our profile looks like this too:

> Looking at the profile for "git rev-list --objects --all > /dev/null"
> for the object enumeration phase, we have:
>
> # Samples: 1814637
> #
> # Overhead          Command  Shared Object  Symbol
> # ........  ...............  .............  ......
> #
>    28.81%              git  /home/nico/bin/git  [.] lookup_object
>    12.21%              git  /lib64/libz.so.1.2.3  [.] inflate
>    10.49%              git  /lib64/libz.so.1.2.3  [.] inflate_fast
>     7.47%              git  /lib64/libz.so.1.2.3  [.] inflate_table
>     6.66%              git  /lib64/libc-2.11.2.so  [.] __GI_memcpy
>     5.66%              git  /home/nico/bin/git  [.] find_pack_entry_one
>     2.98%              git  /home/nico/bin/git  [.] decode_tree_entry
> [...]
>
> So we've got lookup_object() clearly at the top.

Isn't this the hash table lookup inside the revision pool, to see if
the object has already been visited?  That seems horrible, 28% of the
CPU is going to probing that table.

>  I suspect the
> hashcmp() in there, which probably gets inlined, is responsible for most
> cycles.

Probably true.  I know our hashcmp() is inlined, its actually written
by hand as 5 word compares, and is marked final, so the JIT is rather
likely to inline it.

>  There is certainly a better way here, and probably in JGit you
> rely on some optimized facility provided by the language/library to
> perform that lookup.  So there is probably some easy improvements that
> can be made here.

Nope.  Actually we have to bend over backwards and work against the
language to get anything even reasonably sane for performance.  Our
"solution" in JGit has actually been used by Rob Pike to promote his
Go programming language and why Java sucks as a language.  Its a great
quote of mine that someone dragged up off the git@vger mailing list
and started using to promote Go.

At least once I week I envy how easy it is to use hashcmp() and
hashcpy() inside of CGit.  JGit's management of hashes is sh*t because
we have to bend so hard around the language.

> Otherwise it is at least 12.21 + 10.49 + 7.47 + 2.71 = 32.88% spent
> directly in the zlib code, making it the biggest cost.

Yea, that's what we have too, about 33% inside of zlib code... which
is the same implementation that CGit uses.

>  This is rather
> unavoidable unless the data structure is changed.

We already knew this from our pack v4 experiments years ago.

>  And pack v4 would
> probably move things such as find_pack_entry_one, decode_tree_entry,
> process_tree and tree_entry off the radar as well.

This is hard to do inside of CGit if I recall... but yes, changing the
way trees are handled would really improve things.

> The object writeout phase should pretty much be network bound.

Yes.

>> I fully implemented the reuse of a cached pack behind a thin pack idea
>> I was trying to describe in this thread.  It saved 1m7s off the JGit
>> running time, but increased the data transfer by 25 MiB.
>
> Yeah... this sucks.

Very much.  :-(

But this is a fundamental issue with our incremental fetch support
anyway.  In this exact case if the client was at that 1 month old
commit, and fetched current master, he would pull 25 MiB of data.. but
only needed about 4-6 MiB worth of deltas if it was properly delta
compressed against the content we know he already has.  Our server
side optimization of only pushing the immediate "have" list of the
client into the delta search window limits how much we can compress
the data we are sending.  If we were willing to push more in on the
server side, we could shrink the incremental fetch more.  But that's a
CPU problem on the server.

-- 
Shawn.