From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH 0/7] rev-parse: implement object type filter
Date: Wed, 10 Mar 2021 16:39:22 -0500
Message-ID: <YEk8iiDf/FMxzhIF@coredump.intra.peff.net>
In-Reply-To: <cover.1614600555.git.ps@pks.im>

On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:

> Altogether, this ends up with the following queries, both of which have
> been executed in a well-packed linux.git repository:
> 
>     # Previous query which uses object names as a heuristic to filter
>     # non-blob objects, which bars us from using bitmap indices because
>     # they cannot print paths.
>     $ time git rev-list --objects --filter=blob:limit=200 \
>         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
>     4502300
> 
>     real 1m23.872s
>     user 1m30.076s
>     sys  0m6.002s
> 
>     # New query.
>     $ time git rev-list --objects --filter-provided \
>         --filter=object:type=blob --filter=blob:limit=200 \
>         --use-bitmap-index --all | wc -l
>     22585
> 
>     real 0m19.216s
>     user 0m16.768s
>     sys  0m2.450s

Those produce very different answers. I guess that is because the first
one still includes a bunch of tree objects (any tree with a path
survives the sed). You'd do much better to get the actual types from
cat-file, and filter on that. That also lets you use bitmaps for the
traversal portion. E.g.:

  $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
         git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
         perl -lne 'print $1 if /^blob (.*)/' | wc -l
  14966
  
  real	0m6.248s
  user	0m7.810s
  sys	0m0.440s

which is faster than what you showed above (this is also on linux.git,
but my result is different; maybe you have more refs than I do?). But a
purely internal filter should be able to beat a pipeline like this, so I
suspect my computer is just faster (or maybe your extra refs just aren't
well covered by bitmaps). Running with your patches I get:

  $ time git rev-list --objects --use-bitmap-index --all \
             --filter-provided --filter=object:type=blob \
             --filter=blob:limit=200 | wc -l
  16339

  real	0m1.309s
  user	0m1.234s
  sys	0m0.079s

which is indeed faster. It's quite curious that the answer is not the
same, though! I think yours has some bugs. If I sort and diff the
results, I see some commits mentioned in the output. Perhaps this is
--filter-provided not working, as they all seem to be ref tips.
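
For reference, the sort-and-diff comparison can be sketched roughly like
this. The helper name only_in_second and the file names in the comments
are made up for illustration; the rev-list flags are spelled as they
appear in this series, so treat them as assumptions about the patched
build rather than a released interface.

```shell
# Print the lines that appear only in the second listing.
# Both arguments are files holding one object ID per line.
only_in_second() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    sort -u "$1" >"$tmp_a"
    sort -u "$2" >"$tmp_b"
    # comm -13 suppresses lines unique to the first file and lines
    # common to both, leaving only those unique to the second file.
    comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Hypothetical usage (file names are placeholders):
#
#   git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
#     git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
#     perl -lne 'print $1 if /^blob (.*)/' >via-cat-file.txt
#
#   git rev-list --objects --use-bitmap-index --all \
#     --filter-provided --filter=object:type=blob \
#     --filter=blob:limit=200 >via-filter.txt
#
#   only_in_second via-cat-file.txt via-filter.txt |
#     git cat-file --batch-check='%(objecttype) %(objectname)'
```

Feeding the leftover IDs back through cat-file shows their types, which
is one way to spot non-blob entries in the filtered output.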

> To be able to more efficiently answer this query, I've implemented
> multiple things:
> 
> - A new object type filter `--filter=object:type=<type>` for
>   git-rev-list(1), which is implemented both for normal graph walks and
>   for the packfile bitmap index.
> 
> - Given that above usecase requires two filters (the object type
>   and blob size filters), bitmap filters were extended to support
>   combined filters.

That's probably reasonable, especially because it lets us use bitmaps. I
do have a dream that we'll eventually be able to support more extensive
formatting via log/rev-list, which would allow:

  git rev-list --use-bitmap-index --objects --all \
               --format='%(objecttype) %(objectname)' |
  perl -lne 'print $1 if /^blob (.*)/'

That should be faster than the separate cat-file (which has to re-lookup
each object, in addition to the extra pipe overhead), but I expect the
--filter solution should always be faster still, as it can very quickly
eliminate the majority of the objects at the bitmap level.

> - git-rev-list(1) doesn't filter user-provided objects and always prints
>   them. I don't want the listed commits though and only their referenced
>   potential LFS blobs. So I've added a new flag `--filter-provided`
>   which marks all provided objects as not-user-provided such that they
>   get filtered the same as all the other objects.

Yeah, this "user-provided" behavior was quite a surprise to me when I
started implementing the bitmap versions of the existing filters. It's
nice to have the option to specify which you want.

-Peff
