All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ramkumar Ramachandra <artagnon@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>,
	Martin Fick <mfick@codeaurora.org>,
	Git List <git@vger.kernel.org>
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Fri, 9 Aug 2013 19:04:23 +0530	[thread overview]
Message-ID: <CALkWK0nSC-Aty55QO+DrM5Zf2t=DK8iMfbhv_HD44Z_m8d19Pg@mail.gmail.com> (raw)
In-Reply-To: <20130809110000.GD18878@sigill.intra.peff.net>

Jeff King wrote:
> It depends on what each side has it, doesn't it? We generally try to
> reuse on-disk deltas when we can, since they require no computation. If
> I have object A delta'd against B, and I know that the other side wants
> A and has B (or I am also sending B), I can simply send what I have on
> disk. So we do not just blit out the existing pack as-is, but we may
> reuse portions of it as appropriate.

I'll raise some (hopefully interesting) points. Let's take the example
of a simple push: I start send-pack, which in turn starts receive_pack
on the server and connects its stdin/stdout to it (using git_connect).
Now, it reads the (sha1, ref) pairs it receives on stdin and spawns
pack-objects --stdout with the right arguments as the response, right?
Overall, nothing special: just pack-objects invoked with specific
arguments.

How does pack-objects work? ll_find_deltas() spawns some worker
threads to find_deltas(). This is where this get fuzzy for me: it
calls try_delta() in a nested loop, trying to find the smallest delta,
right? I'm not sure whether the interfaces it uses to read objects
differentiates between packed deltas versus packed undeltified objects
versus loose objects at all.

Now, the main problem I see is that a pack has to be self-contained: I
can't have an object "bar" which is a delta against an object that is
not present in the pack, right? Therefore no matter what the server
already has, I have to prepare deltas only against the data that I'm
sending across.

> Of course we may have to reconstruct deltas for trees in order to find
> the correct set of objects (i.e., the history traversal). But that is a
> separate phase from generating the pack's object content, and we do not
> reuse any of the traversal work in later phases.

I see. Are we talking about tree-walk.c here? This is not unique to
packing at all; we need to do this kind of traversal for any git
operation that digs into older history, no? I recall a discussion
about using generation numbers to speed up the walk: I tried playing
with your series (where you use a cache to keep the generation
numbers), but got nowhere. Does it make sense to think about speeding
up the walk (how?).

  reply	other threads:[~2013-08-09 13:35 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06  2:38 [PATCH] git exproll: steps to tackle gc aggression Ramkumar Ramachandra
2013-08-06 12:24 ` Duy Nguyen
2013-08-06 17:39   ` Junio C Hamano
2013-08-07  4:43     ` Ramkumar Ramachandra
2013-08-08  7:13       ` Junio C Hamano
2013-08-08  7:44         ` Ramkumar Ramachandra
2013-08-08 16:56           ` Junio C Hamano
2013-08-08 17:34             ` Martin Fick
2013-08-08 18:52               ` Junio C Hamano
2013-08-08 19:14                 ` Ramkumar Ramachandra
2013-08-08 17:36             ` Ramkumar Ramachandra
2013-08-08 19:37               ` Junio C Hamano
2013-08-08 20:04                 ` Ramkumar Ramachandra
2013-08-08 21:09                   ` Martin Langhoff
2013-08-09 11:00                   ` Jeff King
2013-08-09 13:34                     ` Ramkumar Ramachandra [this message]
2013-08-09 17:35                       ` Junio C Hamano
2013-08-09 22:16                       ` Jeff King
2013-08-10  1:24                         ` Duy Nguyen
2013-08-10  9:50                           ` Jeff King
2013-08-10  5:26                         ` Junio C Hamano
2013-08-10  8:42                         ` Ramkumar Ramachandra
2013-08-10  9:24                           ` Duy Nguyen
2013-08-10  9:28                             ` Duy Nguyen
2013-08-10  9:43                             ` Jeff King
2013-08-10  9:50                               ` Duy Nguyen
2013-08-10 10:05                                 ` Ramkumar Ramachandra
2013-08-10 10:16                                   ` Duy Nguyen
2013-08-10  9:38                           ` Jeff King
2013-08-07  0:10   ` Martin Fick
2013-08-08  2:18     ` Duy Nguyen
2013-08-07  0:25 ` Martin Fick
2013-08-07  4:36   ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALkWK0nSC-Aty55QO+DrM5Zf2t=DK8iMfbhv_HD44Z_m8d19Pg@mail.gmail.com' \
    --to=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.