git.vger.kernel.org archive mirror
From: Rafael Silva <rafaeloliveira.cs@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Jonathan Tan" <jonathantanmy@google.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Mon, 12 Apr 2021 09:09:46 +0200
Message-ID: <gohp6kim4sf07b.fsf@cpm12071.fritz.box>
In-Reply-To: <YG4hfge2y/AmcklZ@coredump.intra.peff.net>


Jeff King <peff@peff.net> writes:

> On Mon, Apr 05, 2021 at 03:02:33AM +0200, Rafael Silva wrote:
>
>> I'm not entirely sure about this (not this late in the day), but it seems to
>> me that we should simply skip the "missing" (promisor) files when
>> operating on a partial clone.
>> 
>> Perhaps something like:
>> 
>> --- >8 ---
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 525c2d8552..fedf58323d 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -3468,6 +3468,8 @@ static int loosened_object_can_be_discarded(const struct object_id *oid,
>>  {
>>         if (!unpack_unreachable_expiration)
>>                 return 0;
>> +       if (exclude_promisor_objects && is_promisor_object(oid))
>> +               return 1;
>>         if (mtime > unpack_unreachable_expiration)
>>                 return 0;
>>         if (oid_array_lookup(&recent_objects, oid) >= 0)
>> --- >8 ---
>
> you are avoiding writing out the file. But we should realize much
> earlier that it is not something we need to even consider loosening.
>
> In the loop in loosen_unused_packed_objects(), we skip packs that are
> marked as "keep", so we'd skip the new promisor pack entirely. But we'd
> still see all these objects in the _old_ promisor pack. However, for
> each object there, we call has_sha1_pack_kept_or_nonlocal(), so that
> would likewise realize that each object is already being kept in the
> other pack.
>

Agreed. Realizing sooner that we shouldn't even consider loosening the
objects from the packfile is a better solution.
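
To make sure I understand the idea, here is a simplified model of the
check described above. It is only a sketch: `struct toy_pack`,
`in_kept_pack()` and `count_loosen_candidates()` are hypothetical names
using plain strings as object names, not git's actual data structures,
and it ignores the other conditions the real loop applies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a packfile (hypothetical type, not git's packed_git). */
struct toy_pack {
	bool keep;          /* pack is marked "keep", e.g. via --keep-pack */
	const char **oids;  /* names of the objects stored in this pack */
	size_t nr;          /* number of entries in oids */
};

/* Return true if any pack marked "keep" contains the given object. */
static bool in_kept_pack(const struct toy_pack *packs, size_t nr_packs,
			 const char *oid)
{
	for (size_t i = 0; i < nr_packs; i++) {
		if (!packs[i].keep)
			continue;
		for (size_t j = 0; j < packs[i].nr; j++)
			if (!strcmp(packs[i].oids[j], oid))
				return true;
	}
	return false;
}

/*
 * Count the objects that remain candidates for loosening: kept packs
 * are skipped entirely, and objects in other packs are skipped when a
 * kept pack already holds them.
 */
static size_t count_loosen_candidates(const struct toy_pack *packs,
				      size_t nr_packs)
{
	size_t candidates = 0;

	for (size_t i = 0; i < nr_packs; i++) {
		if (packs[i].keep)
			continue;
		for (size_t j = 0; j < packs[i].nr; j++)
			if (!in_kept_pack(packs, nr_packs, packs[i].oids[j]))
				candidates++;
	}
	return candidates;
}
```

With an old pack holding three objects and a new kept pack holding two
of them, only the one object missing from the kept pack remains a
candidate in this model.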

> Something like this seems to work, but I only lightly tested it, and it
> could probably use some refactoring to make it less horrible:
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index fdee8e4578..457525953a 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -574,6 +574,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		repack_promisor_objects(&po_args, &names);
>  
>  		if (existing_packs.nr && delete_redundant) {
> +			/*
> +			 * tell pack-objects about our new promisor pack, which
> +			 * we will also be keeping
> +			 */
> +			for_each_string_list_item(item, &names) {
> +				/*
> +				 * yuck, we seem to only have the name with the
> +				 * packdir prefixed
> +				 */
> +				const char *prefix;
> +				if (!skip_prefix(packtmp, packdir, &prefix) ||
> +				    *prefix++ != '/')
> +					BUG("confused by packtmp");
> +				strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
> +					     prefix, item->string);
> +			}
> +
>  			if (unpack_unreachable) {
>  				strvec_pushf(&cmd.args,
>  					     "--unpack-unreachable=%s",
>
> Do you want to try to work with that?
>

Yes, I'll try to work with that, together with the refactoring that you
mentioned in the code and in the other replies.

Thanks for the suggestion.
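
As a starting point for that refactoring, the prefix handling could
perhaps be isolated in a small helper. The following is a standalone
sketch, not code from git: `skip_prefix_sketch()` only imitates the
behaviour of git's skip_prefix(), and `format_keep_pack_arg()` is a
hypothetical name. It assumes packtmp has the form "<packdir>/<base>".

```c
#include <stdio.h>
#include <string.h>

/*
 * Standalone imitation of git's skip_prefix(): if str starts with
 * prefix, point *out just past it and return 1; otherwise return 0.
 */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Write "--keep-pack=<base>-<hash>.pack" into buf, where <base> is
 * packtmp with the leading "<packdir>/" removed. Returns 0 on success
 * and -1 if packtmp does not live under packdir or buf is too small.
 */
static int format_keep_pack_arg(char *buf, size_t size,
				const char *packtmp, const char *packdir,
				const char *hash)
{
	const char *base;
	int n;

	if (!skip_prefix_sketch(packtmp, packdir, &base) || *base++ != '/')
		return -1;
	n = snprintf(buf, size, "--keep-pack=%s-%s.pack", base, hash);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Returning an error instead of calling BUG() is only for the sake of a
self-contained example; in repack.c the BUG() from your patch seems
appropriate, since a packtmp outside packdir would be a programming
error.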

>> A quick benchmark did show some promising result:
>> 
>>     # built from: 2e36527f23 (The sixth batch, 2021-04-02)
>>     Benchmark #1: ./bin-wrappers/git -C git.git gc
>>           Time (mean ± σ):     135.669 s ±  0.665 s    [User: 42.789 s, System: 91.332 s]
>>           Range (min … max):   134.905 s … 136.115 s    3 runs
>> 
>>     # built from: 2e36527f23 + minor patch (from above)
>>     Benchmark #2: ./bin-wrappers/git -C git.git gc
>>           Time (mean ± σ):     12.586 s ±  0.031 s    [User: 11.462 s, System: 1.365 s]
>>           Range (min … max):   12.553 s … 12.616 s    3 runs
>> 
>>     Summary:
>>           'Benchmark #2' ran 10.78 ± 0.06 times faster than 'Benchmark #1'
>
> It's still quite a bit slower than a non-partial clone because the
> traversal with --exclude-promisor-objects is slow. I think that's
> because it has to open up all of the objects in the promisor pack to see
> what they refer to. I don't know if we can do better (and it's largely
> an orthogonal problem to what you're solving here, so it probably makes
> sense to just punt on it for now).
>
> -Peff

Makes sense.

-- 
Thanks
Rafael
