* [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff
2022-02-02 12:51 ` [PATCH v2 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
@ 2022-02-02 12:51 ` Patrick Steinhardt
2022-02-09 18:01 ` Christian Couder
2022-02-02 12:51 ` [PATCH v2 2/2] fetch: skip computing output width when not printing anything Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-02 12:51 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 3068 bytes --]
During packfile negotiation we iterate over all refs announced by the
remote side to check whether their IDs refer to commits already known to
us. If a commit is known to us already, then its date is a potential
cutoff point for commits we have in common with the remote side.
There is potentially a lot of commits announced by the remote depending
on how many refs there are in the remote repository, and for every one
of them we need to search for it in our object database and, if found,
parse the corresponding object to find out whether it is a candidate for
the cutoff date. This can be sped up by trying to look up commits via
the commit-graph first, which is a lot more efficient.
Benchmarks in a repository with about 2,1 million refs and an up-to-date
commit-graph show a 20% speedup when mirror-fetching:
Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
Time (mean ± σ): 75.264 s ± 1.115 s [User: 68.199 s, System: 10.094 s]
Range (min … max): 74.145 s … 76.862 s 5 runs
Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
Time (mean ± σ): 62.350 s ± 0.854 s [User: 55.412 s, System: 9.976 s]
Range (min … max): 61.224 s … 63.216 s 5 runs
Summary
'git fetch --atomic +refs/*:refs/* (HEAD)' ran
1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
fetch-pack.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index dd6ec449f2..c5967e228e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -696,26 +696,30 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
for (ref = *refs; ref; ref = ref->next) {
- struct object *o;
+ struct commit *commit;
- if (!has_object_file_with_flags(&ref->old_oid,
+ commit = lookup_commit_in_graph(the_repository, &ref->old_oid);
+ if (!commit) {
+ struct object *o;
+
+ if (!has_object_file_with_flags(&ref->old_oid,
OBJECT_INFO_QUICK |
- OBJECT_INFO_SKIP_FETCH_OBJECT))
- continue;
- o = parse_object(the_repository, &ref->old_oid);
- if (!o)
- continue;
+ OBJECT_INFO_SKIP_FETCH_OBJECT))
+ continue;
+ o = parse_object(the_repository, &ref->old_oid);
+ if (!o || o->type != OBJ_COMMIT)
+ continue;
+
+ commit = (struct commit *)o;
+ }
/*
* We already have it -- which may mean that we were
* in sync with the other side at some time after
* that (it is OK if we guess wrong here).
*/
- if (o->type == OBJ_COMMIT) {
- struct commit *commit = (struct commit *)o;
- if (!cutoff || cutoff < commit->date)
- cutoff = commit->date;
- }
+ if (!cutoff || cutoff < commit->date)
+ cutoff = commit->date;
}
trace2_region_leave("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
--
2.35.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff
2022-02-02 12:51 ` [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
@ 2022-02-09 18:01 ` Christian Couder
2022-02-10 11:43 ` Patrick Steinhardt
0 siblings, 1 reply; 16+ messages in thread
From: Christian Couder @ 2022-02-09 18:01 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Taylor Blau, Derrick Stolee
On Mon, Feb 7, 2022 at 7:03 AM Patrick Steinhardt <ps@pks.im> wrote:
> Benchmarks in a repository with about 2,1 million refs and an up-to-date
> commit-graph show a 20% speedup when mirror-fetching:
>
> Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
> Time (mean ± σ): 75.264 s ± 1.115 s [User: 68.199 s, System: 10.094 s]
> Range (min … max): 74.145 s … 76.862 s 5 runs
>
> Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
> Time (mean ± σ): 62.350 s ± 0.854 s [User: 55.412 s, System: 9.976 s]
> Range (min … max): 61.224 s … 63.216 s 5 runs
>
> Summary
> 'git fetch --atomic +refs/*:refs/* (HEAD)' ran
> 1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'
The commit message and code make sense to me, but I wonder if there is
a reason why --atomic is used when fetching.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff
2022-02-09 18:01 ` Christian Couder
@ 2022-02-10 11:43 ` Patrick Steinhardt
0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-10 11:43 UTC (permalink / raw)
To: Christian Couder; +Cc: git, Taylor Blau, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 2148 bytes --]
On Wed, Feb 09, 2022 at 07:01:54PM +0100, Christian Couder wrote:
> On Mon, Feb 7, 2022 at 7:03 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> > Benchmarks in a repository with about 2,1 million refs and an up-to-date
> > commit-graph show a 20% speedup when mirror-fetching:
> >
> > Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
> > Time (mean ± σ): 75.264 s ± 1.115 s [User: 68.199 s, System: 10.094 s]
> > Range (min … max): 74.145 s … 76.862 s 5 runs
> >
> > Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
> > Time (mean ± σ): 62.350 s ± 0.854 s [User: 55.412 s, System: 9.976 s]
> > Range (min … max): 61.224 s … 63.216 s 5 runs
> >
> > Summary
> > 'git fetch --atomic +refs/*:refs/* (HEAD)' ran
> > 1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'
>
> The commit message and code make sense to me, but I wonder if there is
> a reason why --atomic is used when fetching.
The repository that I was mirror-fetching into needs to update a big
bunch of references, and doing that via `--atomic` is more efficient
than doing it without, and this shows in the benchmark. I did another
benchmarking run without `--atomic`, and it is indeed about 30 seconds
slower for both cases. But interestingly the relative performance
improvement is still roughly the same:
Benchmark 1: git fetch +refs/*:refs/* (v2.35.0)
Time (mean ± σ): 115.587 s ± 2.009 s [User: 109.874 s, System: 11.305 s]
Range (min … max): 113.584 s … 118.820 s 5 runs
Benchmark 2: git fetch +refs/*:refs/* (pks-fetch-pack-optim-v1~)
Time (mean ± σ): 96.859 s ± 0.624 s [User: 91.948 s, System: 10.980 s]
Range (min … max): 96.180 s … 97.875 s 5 runs
Summary
'git fetch +refs/*:refs/* (pks-fetch-pack-optim-v1~)' ran
1.19 ± 0.02 times faster than 'git fetch +refs/*:refs/* (v2.35.0)'
I'll update the commit message to just use this new benchmark so that
the `--atomic` flag doesn't cause any questions.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 2/2] fetch: skip computing output width when not printing anything
2022-02-02 12:51 ` [PATCH v2 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2022-02-02 12:51 ` [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
@ 2022-02-02 12:51 ` Patrick Steinhardt
2022-02-09 18:10 ` Christian Couder
2022-02-10 12:28 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-02 12:51 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 2789 bytes --]
When updating references via git-fetch(1), then by default we report to
the user which references have been changed. This output is formatted in
a nice table such that the different columns are aligned. Because the
first column contains abbreviated object IDs we thus need to iterate
over all refs which have changed and compute the minimum length for
their respective abbreviated hashes. While this effort makes sense in
most cases, it is wasteful when the user passes the `--quiet` flag: we
don't print the summary, but still compute the length.
Skip computing the summary width when the user asked for us to be quiet.
This gives us a small speedup of nearly 10% when doing a dry-run
mirror-fetch in a repository with thousands of references being updated:
Benchmark 1: git fetch --prune --dry-run +refs/*:refs/* (HEAD~)
Time (mean ± σ): 34.048 s ± 0.233 s [User: 30.739 s, System: 4.640 s]
Range (min … max): 33.785 s … 34.296 s 5 runs
Benchmark 2: git fetch --prune --dry-run +refs/*:refs/* (HEAD)
Time (mean ± σ): 30.768 s ± 0.287 s [User: 27.534 s, System: 4.565 s]
Range (min … max): 30.432 s … 31.181 s 5 runs
Summary
'git fetch --prune --dry-run +refs/*:refs/* (HEAD)' ran
1.11 ± 0.01 times faster than 'git fetch --prune --dry-run +refs/*:refs/* (HEAD~)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/fetch.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f06b21f8e..ebbde5d56d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1093,12 +1093,15 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
struct ref *rm;
char *url;
int want_status;
- int summary_width = transport_summary_width(ref_map);
+ int summary_width = 0;
rc = open_fetch_head(&fetch_head);
if (rc)
return -1;
+ if (verbosity >= 0)
+ summary_width = transport_summary_width(ref_map);
+
if (raw_url)
url = transport_anonymize_url(raw_url);
else
@@ -1344,7 +1347,6 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
int url_len, i, result = 0;
struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
char *url;
- int summary_width = transport_summary_width(stale_refs);
const char *dangling_msg = dry_run
? _(" (%s will become dangling)")
: _(" (%s has become dangling)");
@@ -1373,6 +1375,8 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
}
if (verbosity >= 0) {
+ int summary_width = transport_summary_width(stale_refs);
+
for (ref = stale_refs; ref; ref = ref->next) {
struct strbuf sb = STRBUF_INIT;
if (!shown_url) {
--
2.35.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] fetch: skip computing output width when not printing anything
2022-02-02 12:51 ` [PATCH v2 2/2] fetch: skip computing output width when not printing anything Patrick Steinhardt
@ 2022-02-09 18:10 ` Christian Couder
2022-02-10 12:23 ` Patrick Steinhardt
0 siblings, 1 reply; 16+ messages in thread
From: Christian Couder @ 2022-02-09 18:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Taylor Blau, Derrick Stolee
On Wed, Feb 2, 2022 at 5:13 PM Patrick Steinhardt <ps@pks.im> wrote:
> Skip computing the summary width when the user asked for us to be quiet.
There is a --quiet option for git fetch, so here we can expect that it
will be used to test this speedup...
> This gives us a small speedup of nearly 10% when doing a dry-run
> mirror-fetch in a repository with thousands of references being updated:
>
> Benchmark 1: git fetch --prune --dry-run +refs/*:refs/* (HEAD~)
> Time (mean ± σ): 34.048 s ± 0.233 s [User: 30.739 s, System: 4.640 s]
> Range (min … max): 33.785 s … 34.296 s 5 runs
>
> Benchmark 2: git fetch --prune --dry-run +refs/*:refs/* (HEAD)
> Time (mean ± σ): 30.768 s ± 0.287 s [User: 27.534 s, System: 4.565 s]
> Range (min … max): 30.432 s … 31.181 s 5 runs
>
> Summary
> 'git fetch --prune --dry-run +refs/*:refs/* (HEAD)' ran
> 1.11 ± 0.01 times faster than 'git fetch --prune --dry-run +refs/*:refs/* (HEAD~)'
...but --prune and --dry-run are used for testing. It would be nice if
this discrepancy was explained a bit.
Otherwise the commit message and code look good to me.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] fetch: skip computing output width when not printing anything
2022-02-09 18:10 ` Christian Couder
@ 2022-02-10 12:23 ` Patrick Steinhardt
0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-10 12:23 UTC (permalink / raw)
To: Christian Couder; +Cc: git, Taylor Blau, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 2177 bytes --]
On Wed, Feb 09, 2022 at 07:10:38PM +0100, Christian Couder wrote:
> On Wed, Feb 2, 2022 at 5:13 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> > Skip computing the summary width when the user asked for us to be quiet.
>
> There is a --quiet option for git fetch, so here we can expect that it
> will be used to test this speedup...
>
> > This gives us a small speedup of nearly 10% when doing a dry-run
> > mirror-fetch in a repository with thousands of references being updated:
> >
> > Benchmark 1: git fetch --prune --dry-run +refs/*:refs/* (HEAD~)
> > Time (mean ± σ): 34.048 s ± 0.233 s [User: 30.739 s, System: 4.640 s]
> > Range (min … max): 33.785 s … 34.296 s 5 runs
> >
> > Benchmark 2: git fetch --prune --dry-run +refs/*:refs/* (HEAD)
> > Time (mean ± σ): 30.768 s ± 0.287 s [User: 27.534 s, System: 4.565 s]
> > Range (min … max): 30.432 s … 31.181 s 5 runs
> >
> > Summary
> > 'git fetch --prune --dry-run +refs/*:refs/* (HEAD)' ran
> > 1.11 ± 0.01 times faster than 'git fetch --prune --dry-run +refs/*:refs/* (HEAD~)'
>
> ...but --prune and --dry-run are used for testing. It would be nice if
> this discrepancy was explained a bit.
>
> Otherwise the commit message and code look good to me.
Yeah, I was hiding away the `--quiet` flag here by accident. I used
`--prune` and `--dry-run` to trigger more lines to be printed to console
and to not take into account the time it takes to update local refs and
fetch objects. But doing so without these flags also demonstrates what I
want to:
Benchmark 1: git fetch --quiet +refs/*:refs/* (pks-fetch-pack-optim-v1~)
Time (mean ± σ): 96.078 s ± 0.508 s [User: 91.378 s, System: 10.870 s]
Range (min … max): 95.449 s … 96.760 s 5 runs
Benchmark 2: git fetch --quiet +refs/*:refs/* (pks-fetch-pack-optim-v1)
Time (mean ± σ): 88.214 s ± 0.192 s [User: 83.274 s, System: 10.978 s]
Range (min … max): 87.998 s … 88.446 s 5 runs
So again, I'll update the commit message.
Thanks for your feedback!
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs
2022-02-02 12:51 ` [PATCH v2 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2022-02-02 12:51 ` [PATCH v2 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
2022-02-02 12:51 ` [PATCH v2 2/2] fetch: skip computing output width when not printing anything Patrick Steinhardt
@ 2022-02-10 12:28 ` Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
` (2 more replies)
2 siblings, 3 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-10 12:28 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Derrick Stolee, Christian Couder
[-- Attachment #1: Type: text/plain, Size: 4413 bytes --]
Hi,
this is the third version of my patch series which aimn to speed up
mirror-fetches in repos with huge amounts of refs. Again, the only
change compared to v2 is a change in commit messages: Chris has rightly
pointed out that the benchmarks were a bit confusing, so I've updated
them to hopefully be less so.
Thanks for your feedback!
Patrick
Patrick Steinhardt (2):
fetch-pack: use commit-graph when computing cutoff
fetch: skip computing output width when not printing anything
builtin/fetch.c | 8 ++++++--
fetch-pack.c | 28 ++++++++++++++++------------
2 files changed, 22 insertions(+), 14 deletions(-)
Range-diff against v2:
1: 6fac914f0f ! 1: 077d06764c fetch-pack: use commit-graph when computing cutoff
@@ Commit message
the commit-graph first, which is a lot more efficient.
Benchmarks in a repository with about 2,1 million refs and an up-to-date
- commit-graph show a 20% speedup when mirror-fetching:
+ commit-graph show an almost 20% speedup when mirror-fetching:
- Benchmark 1: git fetch --atomic +refs/*:refs/* (v2.35.0)
- Time (mean ± σ): 75.264 s ± 1.115 s [User: 68.199 s, System: 10.094 s]
- Range (min … max): 74.145 s … 76.862 s 5 runs
+ Benchmark 1: git fetch +refs/*:refs/* (v2.35.0)
+ Time (mean ± σ): 115.587 s ± 2.009 s [User: 109.874 s, System: 11.305 s]
+ Range (min … max): 113.584 s … 118.820 s 5 runs
- Benchmark 2: git fetch --atomic +refs/*:refs/* (HEAD)
- Time (mean ± σ): 62.350 s ± 0.854 s [User: 55.412 s, System: 9.976 s]
- Range (min … max): 61.224 s … 63.216 s 5 runs
+ Benchmark 2: git fetch +refs/*:refs/* (HEAD)
+ Time (mean ± σ): 96.859 s ± 0.624 s [User: 91.948 s, System: 10.980 s]
+ Range (min … max): 96.180 s … 97.875 s 5 runs
Summary
- 'git fetch --atomic +refs/*:refs/* (HEAD)' ran
- 1.21 ± 0.02 times faster than 'git fetch --atomic +refs/*:refs/* (v2.35.0)'
+ 'git fetch +refs/*:refs/* (HEAD)' ran
+ 1.19 ± 0.02 times faster than 'git fetch +refs/*:refs/* (v2.35.0)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
2: 4b9bbcf795 ! 2: ef1fd07be5 fetch: skip computing output width when not printing anything
@@ Commit message
don't print the summary, but still compute the length.
Skip computing the summary width when the user asked for us to be quiet.
- This gives us a small speedup of nearly 10% when doing a dry-run
- mirror-fetch in a repository with thousands of references being updated:
+ This gives us a speedup of nearly 10% when doing a mirror-fetch in a
+ repository with thousands of references being updated:
- Benchmark 1: git fetch --prune --dry-run +refs/*:refs/* (HEAD~)
- Time (mean ± σ): 34.048 s ± 0.233 s [User: 30.739 s, System: 4.640 s]
- Range (min … max): 33.785 s … 34.296 s 5 runs
+ Benchmark 1: git fetch --quiet +refs/*:refs/* (HEAD~)
+ Time (mean ± σ): 96.078 s ± 0.508 s [User: 91.378 s, System: 10.870 s]
+ Range (min … max): 95.449 s … 96.760 s 5 runs
- Benchmark 2: git fetch --prune --dry-run +refs/*:refs/* (HEAD)
- Time (mean ± σ): 30.768 s ± 0.287 s [User: 27.534 s, System: 4.565 s]
- Range (min … max): 30.432 s … 31.181 s 5 runs
+ Benchmark 2: git fetch --quiet +refs/*:refs/* (HEAD)
+ Time (mean ± σ): 88.214 s ± 0.192 s [User: 83.274 s, System: 10.978 s]
+ Range (min … max): 87.998 s … 88.446 s 5 runs
Summary
- 'git fetch --prune --dry-run +refs/*:refs/* (HEAD)' ran
- 1.11 ± 0.01 times faster than 'git fetch --prune --dry-run +refs/*:refs/* (HEAD~)'
+ 'git fetch --quiet +refs/*:refs/* (HEAD)' ran
+ 1.09 ± 0.01 times faster than 'git fetch --quiet +refs/*:refs/* (HEAD~)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
--
2.35.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/2] fetch-pack: use commit-graph when computing cutoff
2022-02-10 12:28 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
@ 2022-02-10 12:28 ` Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 2/2] fetch: skip computing output width when not printing anything Patrick Steinhardt
2022-02-10 18:04 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Junio C Hamano
2 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-10 12:28 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Derrick Stolee, Christian Couder
[-- Attachment #1: Type: text/plain, Size: 3045 bytes --]
During packfile negotiation we iterate over all refs announced by the
remote side to check whether their IDs refer to commits already known to
us. If a commit is known to us already, then its date is a potential
cutoff point for commits we have in common with the remote side.
There is potentially a lot of commits announced by the remote depending
on how many refs there are in the remote repository, and for every one
of them we need to search for it in our object database and, if found,
parse the corresponding object to find out whether it is a candidate for
the cutoff date. This can be sped up by trying to look up commits via
the commit-graph first, which is a lot more efficient.
Benchmarks in a repository with about 2,1 million refs and an up-to-date
commit-graph show an almost 20% speedup when mirror-fetching:
Benchmark 1: git fetch +refs/*:refs/* (v2.35.0)
Time (mean ± σ): 115.587 s ± 2.009 s [User: 109.874 s, System: 11.305 s]
Range (min … max): 113.584 s … 118.820 s 5 runs
Benchmark 2: git fetch +refs/*:refs/* (HEAD)
Time (mean ± σ): 96.859 s ± 0.624 s [User: 91.948 s, System: 10.980 s]
Range (min … max): 96.180 s … 97.875 s 5 runs
Summary
'git fetch +refs/*:refs/* (HEAD)' ran
1.19 ± 0.02 times faster than 'git fetch +refs/*:refs/* (v2.35.0)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
fetch-pack.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index dd6ec449f2..c5967e228e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -696,26 +696,30 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
for (ref = *refs; ref; ref = ref->next) {
- struct object *o;
+ struct commit *commit;
- if (!has_object_file_with_flags(&ref->old_oid,
+ commit = lookup_commit_in_graph(the_repository, &ref->old_oid);
+ if (!commit) {
+ struct object *o;
+
+ if (!has_object_file_with_flags(&ref->old_oid,
OBJECT_INFO_QUICK |
- OBJECT_INFO_SKIP_FETCH_OBJECT))
- continue;
- o = parse_object(the_repository, &ref->old_oid);
- if (!o)
- continue;
+ OBJECT_INFO_SKIP_FETCH_OBJECT))
+ continue;
+ o = parse_object(the_repository, &ref->old_oid);
+ if (!o || o->type != OBJ_COMMIT)
+ continue;
+
+ commit = (struct commit *)o;
+ }
/*
* We already have it -- which may mean that we were
* in sync with the other side at some time after
* that (it is OK if we guess wrong here).
*/
- if (o->type == OBJ_COMMIT) {
- struct commit *commit = (struct commit *)o;
- if (!cutoff || cutoff < commit->date)
- cutoff = commit->date;
- }
+ if (!cutoff || cutoff < commit->date)
+ cutoff = commit->date;
}
trace2_region_leave("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
--
2.35.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/2] fetch: skip computing output width when not printing anything
2022-02-10 12:28 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
@ 2022-02-10 12:28 ` Patrick Steinhardt
2022-02-10 18:04 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Junio C Hamano
2 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2022-02-10 12:28 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Derrick Stolee, Christian Couder
[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]
When updating references via git-fetch(1), then by default we report to
the user which references have been changed. This output is formatted in
a nice table such that the different columns are aligned. Because the
first column contains abbreviated object IDs we thus need to iterate
over all refs which have changed and compute the minimum length for
their respective abbreviated hashes. While this effort makes sense in
most cases, it is wasteful when the user passes the `--quiet` flag: we
don't print the summary, but still compute the length.
Skip computing the summary width when the user asked for us to be quiet.
This gives us a speedup of nearly 10% when doing a mirror-fetch in a
repository with thousands of references being updated:
Benchmark 1: git fetch --quiet +refs/*:refs/* (HEAD~)
Time (mean ± σ): 96.078 s ± 0.508 s [User: 91.378 s, System: 10.870 s]
Range (min … max): 95.449 s … 96.760 s 5 runs
Benchmark 2: git fetch --quiet +refs/*:refs/* (HEAD)
Time (mean ± σ): 88.214 s ± 0.192 s [User: 83.274 s, System: 10.978 s]
Range (min … max): 87.998 s … 88.446 s 5 runs
Summary
'git fetch --quiet +refs/*:refs/* (HEAD)' ran
1.09 ± 0.01 times faster than 'git fetch --quiet +refs/*:refs/* (HEAD~)'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/fetch.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5b3b18a72f..7ef305c66d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1094,12 +1094,15 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
struct ref *rm;
char *url;
int want_status;
- int summary_width = transport_summary_width(ref_map);
+ int summary_width = 0;
rc = open_fetch_head(&fetch_head);
if (rc)
return -1;
+ if (verbosity >= 0)
+ summary_width = transport_summary_width(ref_map);
+
if (raw_url)
url = transport_anonymize_url(raw_url);
else
@@ -1345,7 +1348,6 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
int url_len, i, result = 0;
struct ref *ref, *stale_refs = get_stale_heads(rs, ref_map);
char *url;
- int summary_width = transport_summary_width(stale_refs);
const char *dangling_msg = dry_run
? _(" (%s will become dangling)")
: _(" (%s has become dangling)");
@@ -1374,6 +1376,8 @@ static int prune_refs(struct refspec *rs, struct ref *ref_map,
}
if (verbosity >= 0) {
+ int summary_width = transport_summary_width(stale_refs);
+
for (ref = stale_refs; ref; ref = ref->next) {
struct strbuf sb = STRBUF_INIT;
if (!shown_url) {
--
2.35.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs
2022-02-10 12:28 ` [PATCH v3 0/2] fetch: speed up mirror-fetches with many refs Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 1/2] fetch-pack: use commit-graph when computing cutoff Patrick Steinhardt
2022-02-10 12:28 ` [PATCH v3 2/2] fetch: skip computing output width when not printing anything Patrick Steinhardt
@ 2022-02-10 18:04 ` Junio C Hamano
2 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2022-02-10 18:04 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Taylor Blau, Derrick Stolee, Christian Couder
1Patrick Steinhardt <ps@pks.im> writes:
> this is the third version of my patch series which aimn to speed up
> mirror-fetches in repos with huge amounts of refs. Again, the only
> change compared to v2 is a change in commit messages: Chris has rightly
> pointed out that the benchmarks were a bit confusing, so I've updated
> them to hopefully be less so.
>
> Thanks for your feedback!
> Patrick
>
> Patrick Steinhardt (2):
> fetch-pack: use commit-graph when computing cutoff
> fetch: skip computing output width when not printing anything
Both changes are based on quite sensible idea. If we have
precomputed dates for each commit, it makes sense to look it up
before parsing the commit. If we are not preparing output, there is
no point in computing the output width.
Very simple and potentially effective.
Will queue. Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread