From: Bryan Turner <bturner@atlassian.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Jeff King <peff@peff.net>,
    Rafael Silva <rafaeloliveira.cs@gmail.com>,
    Jonathan Tan <jonathantanmy@google.com>,
    Git Mailing List <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Mon, 12 Apr 2021 14:49:00 -0700
Message-ID: <CAGyf7-HTCDm_SB5CfQWJWjvuCVYuJ4=h65=zG-N1XTgNRs+j0w@mail.gmail.com>
In-Reply-To: <20210412213653.GH2947267@szeder.dev>
On Mon, Apr 12, 2021 at 2:37 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> And a somewhat related issue: when the server doesn't support filters,
> then 'git clone --filter=...' prints a warning and proceeds to clone
> the full repo. Reading ba95710a3b ({fetch,upload}-pack: support
> filter in protocol v2, 2018-05-03) this seems to be intentional and I
> tend to think that it makes sense (though I managed to overlook that
> warning twice today... I surely wouldn't have overlooked a hard
> error, but that would perhaps be too harsh in this case, dunno).
> However, the resulting full clone is still marked as partial:
>
> $ git clone --bare --filter=blob:none https://git.kernel.org/pub/scm/git/git.git git-not-really-partial.git
> Cloning into bare repository 'git-not-really-partial.git'...
> warning: filtering not recognized by server, ignoring
> remote: Enumerating objects: 591, done.
> remote: Counting objects: 100% (591/591), done.
> remote: Compressing objects: 100% (293/293), done.
> remote: Total 305662 (delta 372), reused 393 (delta 298), pack-reused 305071
> Receiving objects: 100% (305662/305662), 96.83 MiB | 2.10 MiB/s, done.
> Resolving deltas: 100% (228123/228123), done.
> $ ls -l git-not-really-partial.git/objects/pack/
> total 107568
> -r--r--r-- 1 szeder szeder 8559608 Apr 12 21:13 pack-53f3ee0dfeaa8cea65c78473cd5904bf5ddfaa20.idx
> -r--r--r-- 1 szeder szeder 101535430 Apr 12 21:13 pack-53f3ee0dfeaa8cea65c78473cd5904bf5ddfaa20.pack
> -rw------- 1 szeder szeder 49012 Apr 12 21:13 pack-53f3ee0dfeaa8cea65c78473cd5904bf5ddfaa20.promisor
> $ cat git-not-really-partial.git/config
> [core]
> repositoryformatversion = 1
> filemode = true
> bare = true
> [remote "origin"]
> url = https://git.kernel.org/pub/scm/git/git.git
> promisor = true
> partialclonefilter = blob:none

I ran into this same surprising behavior recently, too. I was adding
some automated testing to Bitbucket for partial clones and initially
tried to use whether the repository was configured with a partial
clone filter as one of my checks, only to find that the filter was
still set even when the server didn't support filters. The only way I
could find to detect that a requested partial clone didn't actually
happen was to parse the 'git clone' output and look for the warning.

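
To illustrate the check described above, here is a minimal sketch in
shell. The helper names (filter_was_ignored, configured_filter) and the
clone.err file are made up for illustration; the warning text is the one
shown in the quoted clone output. It assumes git's messages are not
translated (for example by running the clone with LC_ALL=C), since the
match is on the literal English string.

```shell
#!/bin/sh

# Hypothetical helper: succeeds (exit status 0) if the captured stderr of
# 'git clone --filter=...' contains the "filter ignored" warning.
# $1: file holding the captured stderr of the clone
filter_was_ignored() {
	grep -q 'warning: filtering not recognized by server, ignoring' "$1"
}

# Hypothetical helper: prints the filter recorded in the clone's config.
# As noted above, this value is set even when the server ignored the
# filter, so it cannot distinguish the two cases on its own.
# $1: repository path, $2: remote name (defaults to origin)
configured_filter() {
	git -C "$1" config --get "remote.${2:-origin}.partialclonefilter"
}

# Usage sketch (URL taken from the quoted example, repo.git is arbitrary):
#   LC_ALL=C git clone --bare --filter=blob:none \
#       https://git.kernel.org/pub/scm/git/git.git repo.git 2>clone.err
#   if filter_was_ignored clone.err; then
#       echo "server ignored the filter; clone is not really partial"
#   fi
```

Scraping stderr like this is fragile, because it depends on the exact
wording of the warning, which is why it only works as a stopgap.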
>
> I wonder whether this is intentional, or that it is really the desired
> behavior, considering that 'gc/repack/fsck' still treat it as a
> partial clone, and, consequently, are affected by this slowness and
> much higher memory usage, and since the repo now contains a lot more
> objects than expected (all the blobs as well), they are much slower:
>
> $ /usr/bin/time --format='elapsed: %E max RSS: %Mk' git -C git-not-really-partial.git/ gc
> Enumerating objects: 305662, done.
> Counting objects: 100% (305662/305662), done.
> Delta compression using up to 4 threads
> Compressing objects: 100% (75200/75200), done.
> Writing objects: 100% (305662/305662), done.
> Total 305662 (delta 228123), reused 305662 (delta 228123), pack-reused 0
> Removing duplicate objects: 100% (256/256), done.
> elapsed: 4:28.96 max RSS: 1985100k
> # with Peff's patch above:
> $ /usr/bin/time --format='elapsed: %E max RSS: %Mk' /home/szeder/src/git/bin-wrappers/git -C git-not-really-partial.git/ gc
> Enumerating objects: 305662, done.
> Counting objects: 100% (305662/305662), done.
> Delta compression using up to 4 threads
> Compressing objects: 100% (75200/75200), done.
> Writing objects: 100% (305662/305662), done.
> Total 305662 (delta 228123), reused 305662 (delta 228123), pack-reused 0
> elapsed: 1:21.83 max RSS: 1959740k
>