* [PATCH] builtin/show-ref.c: avoid over-iterating with --heads, --tags
@ 2022-06-03 21:55 Taylor Blau
2022-06-04 2:33 ` Derrick Stolee
0 siblings, 1 reply; 3+ messages in thread
From: Taylor Blau @ 2022-06-03 21:55 UTC (permalink / raw)
To: git; +Cc: gitster
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.
But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.
Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:
git commit --allow-empty -m "base" &&
seq 1 500000 | sed 's,\(.*\),create refs/__hidden__/\1 HEAD,' |
git update-ref --stdin &&
git pack-refs --all
Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.
Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:
- When given either option, we can avoid the general-purpose
for_each_ref() call altogether, since we know that it won't give us
any references that we wouldn't filter out already.
- We can make two separate calls to `for_each_fullref_in()` (and
avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
since we know that the set of references enumerated by each is
disjoint, so we'll never see the same reference appear in both
calls.
- We have to use the "fullref" variant (instead of just
`for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
fully-qualified reference names to appear in `show-ref`'s output.
When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.
Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").
Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):
$ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
Benchmark 1: git show-ref --heads
Time (mean ± σ): 49.9 ms ± 6.2 ms [User: 45.6 ms, System: 4.1 ms]
Range (min … max): 46.1 ms … 73.6 ms 43 runs
Benchmark 2: git.compile show-ref --heads
Time (mean ± σ): 2.8 ms ± 0.4 ms [User: 1.4 ms, System: 1.2 ms]
Range (min … max): 1.3 ms … 5.6 ms 957 runs
Summary
'git.compile show-ref --heads' ran
18.03 ± 3.38 times faster than 'git show-ref --heads'
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
This is obviously a little extreme of an example for most use-cases,
but it does come up more often than not within GitHub, where we have
many repositories with lots of hidden references that we're wasting time
enumerating with `show-ref`.
builtin/show-ref.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 7f8a5332f8..5fa207a044 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -52,14 +52,6 @@ static int show_ref(const char *refname, const struct object_id *oid,
if (show_head && !strcmp(refname, "HEAD"))
goto match;
- if (tags_only || heads_only) {
- int match;
-
- match = heads_only && starts_with(refname, "refs/heads/");
- match |= tags_only && starts_with(refname, "refs/tags/");
- if (!match)
- return 0;
- }
if (pattern) {
int reflen = strlen(refname);
const char **p = pattern, *m;
@@ -216,7 +208,14 @@ int cmd_show_ref(int argc, const char **argv, const char *prefix)
if (show_head)
head_ref(show_ref, NULL);
- for_each_ref(show_ref, NULL);
+ if (heads_only || tags_only) {
+ if (heads_only)
+ for_each_fullref_in("refs/heads/", show_ref, NULL);
+ if (tags_only)
+ for_each_fullref_in("refs/tags/", show_ref, NULL);
+ } else {
+ for_each_ref(show_ref, NULL);
+ }
if (!found_match) {
if (verify && !quiet)
die("No match");
--
2.36.1.94.gb0d54bedca
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] builtin/show-ref.c: avoid over-iterating with --heads, --tags
2022-06-03 21:55 [PATCH] builtin/show-ref.c: avoid over-iterating with --heads, --tags Taylor Blau
@ 2022-06-04 2:33 ` Derrick Stolee
2022-06-06 17:07 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Derrick Stolee @ 2022-06-04 2:33 UTC (permalink / raw)
To: Taylor Blau, git; +Cc: gitster
On 6/3/2022 5:55 PM, Taylor Blau wrote:
> When `show-ref` is combined with the `--heads` or `--tags` options, it
> can avoid iterating parts of a repository's references that it doesn't
> care about.
>
> But it doesn't take advantage of this potential optimization. When this
> command was introduced back in 358ddb62cf (Add "git show-ref" builtin
> command, 2006-09-15), `for_each_ref_in()` did exist. But since most
> repositories don't have many (any?) references that aren't branches or
> tags already, this makes little difference in practice.
...
> Nonetheless, in our synthetic example above, this provides a significant
> speed-up ("git" is roughly v2.36, "git.compile" is this patch):
>
> $ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
> Benchmark 1: git show-ref --heads
> Time (mean ± σ): 49.9 ms ± 6.2 ms [User: 45.6 ms, System: 4.1 ms]
> Range (min … max): 46.1 ms … 73.6 ms 43 runs
>
> Benchmark 2: git.compile show-ref --heads
> Time (mean ± σ): 2.8 ms ± 0.4 ms [User: 1.4 ms, System: 1.2 ms]
> Range (min … max): 1.3 ms … 5.6 ms 957 runs
>
> Summary
> 'git.compile show-ref --heads' ran
> 18.03 ± 3.38 times faster than 'git show-ref --heads'
Excellent speedup.
> - for_each_ref(show_ref, NULL);
> + if (heads_only || tags_only) {
> + if (heads_only)
> + for_each_fullref_in("refs/heads/", show_ref, NULL);
> + if (tags_only)
> + for_each_fullref_in("refs/tags/", show_ref, NULL);
This looks a little silly if these were truly "only" (they
could not both be true), but indeed they could both be true
and the names are just slightly misleading.
Indeed, t1403-show-ref.sh tests all combinations of these
options.
> + } else {
> + for_each_ref(show_ref, NULL);
> + }
Thanks for finding this!
-Stolee
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] builtin/show-ref.c: avoid over-iterating with --heads, --tags
2022-06-04 2:33 ` Derrick Stolee
@ 2022-06-06 17:07 ` Junio C Hamano
0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2022-06-06 17:07 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Taylor Blau, git
Derrick Stolee <derrickstolee@github.com> writes:
>> - for_each_ref(show_ref, NULL);
>> + if (heads_only || tags_only) {
>> + if (heads_only)
>> + for_each_fullref_in("refs/heads/", show_ref, NULL);
>> + if (tags_only)
>> + for_each_fullref_in("refs/tags/", show_ref, NULL);
>
> This looks a little silly if these were truly "only" (they
> could not both be true), but indeed they could both be true
> and the names are just slightly misleading.
>
> Indeed, t1403-show-ref.sh tests all combinations of these
> options.
Hmph, that is right. show-heads and show-tags, perhaps.
Thanks, both. Queued.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-06-06 17:16 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-03 21:55 [PATCH] builtin/show-ref.c: avoid over-iterating with --heads, --tags Taylor Blau
2022-06-04 2:33 ` Derrick Stolee
2022-06-06 17:07 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).