git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: gitster@pobox.com
Cc: jonathantanmy@google.com, git@vger.kernel.org
Subject: Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
Date: Tue,  7 Aug 2018 16:23:04 -0700	[thread overview]
Message-ID: <20180807232304.181764-1-jonathantanmy@google.com> (raw)
In-Reply-To: <xmqqzhxxki84.fsf@gitster-ct.c.googlers.com>

> for_each_object_in_pack() is a fine way to make sure that you list
> everythning in a pack, but I suspect it is a horrible way to feed a
> list of objects to pack-objects, as it goes by the .idx order, which
> is by definition a way to enumerate objects in a randomly useless
> order.

That's true.

> Do we already have an access to the in-core reverse index for the
> pack at this point in the code?

As far as I can tell, no. These patches construct the list of promisor
objects in repack.c (which does not use the revindex at all), to be
processed by pack-objects in a different process (which does use the
revindex in reuse-delta mode, which is the default). I guess that we
could have access to the revindex if we were to libify pack-objects and
run it in the same process as repack.c or if we were to add a special
mode to pack-objects that reads for itself the list of all the promisor
objects.

> If so, we can enumerate the objects
> in the offset order without incurring a lot of cost (building the
> in-core revindex is the more expensive part).  When writing a pack,
> we try to make sure that related objects are written out close to
> each other [*1*], so listing them in the offset order (with made-up
> pathname information to _force_ that objects that live close
> together in the original pack are grouped together by getting
> similar names) might give us a better than horrible deltification.
> I dunno.

Before I write the NEEDSWORK comment, just to clarify - do you mean
better than horrible object locality? I think deltification still occurs
because pack-objects sorts the input when doing the windowed
deltification, for example:

  git rev-parse HEAD:fetch-pack.c HEAD HEAD^ HEAD^^ \
	HEAD^^^ v2.17.0:fetch-pack.c | git pack-objects --window=2 abc

produces a packfile with a delta, as checked by `verify-pack -v`.

> 	Side note *1*: "related" has two axis, and one is relevant
> 	for better deltification, while the other is not useful.
> 	The one I have in mind here is that we write set of blobs
> 	that belong to the same "delta family" together.

As far as I can see, they do not need to be adjacent in the packfile to
have one be a delta against the other.

> I do not think such a "make it a bit better than horrible" is necessary
> in an initial version, but it deserves an in-code NEEDSWORK comment
> to remind us that we need to measure and experiment.

OK, I'll do this in the next reroll.

  reply	other threads:[~2018-08-07 23:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-07 20:12 [PATCH 0/2] Repacking of promisor packfiles Jonathan Tan
2018-08-07 20:12 ` [PATCH 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
2018-08-07 20:57   ` Junio C Hamano
2018-08-07 23:23     ` Jonathan Tan [this message]
2018-08-08  0:25       ` Junio C Hamano
2018-08-08 18:45         ` Jonathan Tan
2018-08-08 15:50       ` Jeff King
2018-08-08 16:17         ` Junio C Hamano
2018-08-07 22:37   ` Junio C Hamano
2018-08-07 23:25     ` Jonathan Tan
2018-08-08 16:34       ` Junio C Hamano
2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
2018-08-08 22:34   ` [PATCH v2 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
2018-08-09 17:05     ` Junio C Hamano
2018-08-09 18:12       ` Jonathan Tan
2018-08-18 16:05     ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180807232304.181764-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).