From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Rafael Silva <rafaeloliveira.cs@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Tue, 13 Apr 2021 20:05:52 +0200 [thread overview]
Message-ID: <20210413180552.GI2947267@szeder.dev> (raw)
In-Reply-To: <YHTcHY+P7RuZJGab@coredump.intra.peff.net>
On Mon, Apr 12, 2021 at 07:47:41PM -0400, Jeff King wrote:
> On Mon, Apr 12, 2021 at 11:36:53PM +0200, SZEDER Gábor wrote:
>
> > All what you wrote above makes sense to me, but I've never looked at
> > how partial clones work, so it doesn't mean anything... In any case
> > your patch brings great speedups, but, unfortunately, the memory usage
> > remains as high as it was:
> >
> > $ /usr/bin/time --format=elapsed: %E max RSS: %Mk /home/szeder/src/git/bin-wrappers/git -C git-partial.git/ gc
> > Enumerating objects: 188450, done.
> > Counting objects: 100% (188450/188450), done.
> > Delta compression using up to 4 threads
> > Compressing objects: 100% (66623/66623), done.
> > Writing objects: 100% (188450/188450), done.
> > Total 188450 (delta 120109), reused 188450 (delta 120109), pack-reused 0
> > elapsed: 0:15.18 max RSS: 1888332k
> >
> > And git.git is not all that large, I wonder how much memory would be
> > necessary to 'gc' a 'blob:none' clone of e.g. chromium?! :)
> >
> > BTW, this high memory usage in a partial clone is not specific to
> > 'repack', 'fsck' suffers just as much:
>
> I think the issue is in the exclude-promisor-object code paths of the
> traversal. Try this:
>
> [two clones of git.git, one full and one partial]
> $ git clone --no-local --bare /path/to/git full.git
> $ git clone --no-local --bare --filter=blob:none /path/to/git partial.git
>
> [full clone is quick to traverse all objects]
> $ time git -C full.git rev-list --count --all
> 63215
> real 0m0.369s
> user 0m0.365s
> sys 0m0.004s
>
> [partial is, too; it's the same amount of work because we're just
> looking at the commits here]
> $ time git -C partial.git rev-list --count --all
> 63215
> real 0m0.373s
> user 0m0.364s
> sys 0m0.009s
>
> [but now ask it to exclude promisor objects, and it's much slower;
> this is because is_promisor_object() opens up each tree in the pack in
> order to see which "promised" objects it mentions]
I don't understand this: 'git rev-list --count --all' only counts
commit objects, so why should it open any trees at all?
> $ time git -C partial.git rev-list --exclude-promisor-objects --count --all
> 0
> real 0m11.723s
> user 0m11.354s
> sys 0m0.369s
>
> And I think that is the source for the memory use, too. Without that
> option, we peak at 11MB heap. With it, 1.6GB. Oops. Looks like there
> might be some "leaks" when we parse tree objects and then leave the
> buffers connected to the structs (technically not a leak because we
> still have the pointers, but obviously having every tree in memory at
> once is not good).
>
> The patch below drops the peak heap to 165MB. Still quite a bit more,
> but I think it's a combination of delta-base cache (96MB) plus extra
> structs for all the non-commit objects whose flags we marked.
>
> It does seem like there's probably a good space/time tradeoff to be made
> here (e.g., caching the list of "promisor" object ids for the pack
> instead of inflating and reading all of the trees on the fly).
>
> diff --git a/packfile.c b/packfile.c
> index 8668345d93..b79cbc8cd4 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid,
> return 0;
> while (tree_entry_gently(&desc, &entry))
> oidset_insert(set, &entry.oid);
> + free_tree_buffer(tree);
> } else if (obj->type == OBJ_COMMIT) {
> struct commit *commit = (struct commit *) obj;
> struct commit_list *parents = commit->parents;
> diff --git a/revision.c b/revision.c
> index 553c0faa9b..fac2577748 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3271,8 +3271,15 @@ static int mark_uninteresting(const struct object_id *oid,
> void *cb)
> {
> struct rev_info *revs = cb;
> + /*
> + * yikes, do we really need to parse here? maybe
Heh, a "yikes" here and a "yuck" in your previous patch... This issue
was worth reporting :)
> + * lookup_unknown_object() would be sufficient, or
> + * even oid_object_info() followed by the correct type
> + */
> struct object *o = parse_object(revs->repo, oid);
> o->flags |= UNINTERESTING | SEEN;
> + if (o->type == OBJ_TREE)
> + free_tree_buffer((struct tree *)o);
> return 0;
> }
>
>
> -Peff
next prev parent reply other threads:[~2021-04-13 18:05 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-03 9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05 1:02 ` Rafael Silva
2021-04-07 21:17 ` Jeff King
2021-04-08 0:02 ` Jonathan Tan
2021-04-08 0:35 ` Jeff King
2021-04-12 7:09 ` Rafael Silva
2021-04-12 21:36 ` SZEDER Gábor
2021-04-12 21:49 ` Bryan Turner
2021-04-12 23:51 ` Jeff King
2021-04-12 23:47 ` Jeff King
2021-04-13 7:12 ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13 7:15 ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17 ` Junio C Hamano
2021-04-14 5:18 ` Jeff King
2021-04-13 7:16 ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13 7:17 ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22 ` Junio C Hamano
2021-04-13 18:10 ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14 ` Jonathan Tan
2021-04-14 19:22 ` Rafael Silva
2021-04-13 18:05 ` SZEDER Gábor [this message]
2021-04-14 5:14 ` rather slow 'git repack' in 'blob:none' partial clones Jeff King
2021-04-11 10:59 ` SZEDER Gábor
2021-04-12 7:53 ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14 ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50 ` Jonathan Tan
2021-04-18 14:15 ` Rafael Silva
2021-04-14 19:14 ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15 1:04 ` Jonathan Tan
2021-04-15 3:51 ` Junio C Hamano
2021-04-15 9:03 ` Jeff King
2021-04-15 9:05 ` Jeff King
2021-04-18 7:12 ` Rafael Silva
2021-04-15 18:06 ` Junio C Hamano
2021-04-18 8:40 ` Rafael Silva
2021-04-14 22:10 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15 9:15 ` Jeff King
2021-04-18 8:20 ` Rafael Silva
2021-04-18 13:57 ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57 ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 19:15 ` Jonathan Tan
2021-04-21 18:54 ` Rafael Silva
2021-04-19 23:09 ` Junio C Hamano
2021-04-21 19:25 ` Rafael Silva
2021-04-21 19:32 ` [PATCH v3] " Rafael Silva
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210413180552.GI2947267@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=peff@peff.net \
--cc=rafaeloliveira.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).