All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>,
	"Felipe Contreras" <felipe.contreras@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Chris Torek" <chris.torek@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Taylor Blau" <me@ttaylorr.com>
Subject: [PATCH v4 2/6] connected: do not sort input revisions
Date: Thu, 5 Aug 2021 13:25:28 +0200	[thread overview]
Message-ID: <9d7f484907e2bd2492e6676238579e9f0c6ed374.1628162156.git.ps@pks.im> (raw)
In-Reply-To: <cover.1628162156.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 7277 bytes --]

In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.

Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).

Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.

Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:

    Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
      Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
      Range (min … max):    7.543 s …  7.742 s    10 runs

    Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
      Range (min … max):    4.909 s …  5.048 s    10 runs

    Summary
      'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
        1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
     are visible to clients.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/rev-list-options.txt |  8 ++++++-
 connected.c                        |  1 +
 revision.c                         | 13 ++++++++--
 t/t6000-rev-list-misc.sh           | 38 ++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 24569b06d1..b7bd27e171 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -968,6 +968,11 @@ list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 	objects.
 endif::git-rev-list[]
 
+--unsorted-input::
+	Show commits in the order they were given on the command line instead
+	of sorting them in reverse chronological order by commit time. Cannot
+	be combined with `--no-walk` or `--no-walk=sorted`.
+
 --no-walk[=(sorted|unsorted)]::
 	Only show the given commits, but do not traverse their ancestors.
 	This has no effect if a range is specified. If the argument
@@ -975,7 +980,8 @@ endif::git-rev-list[]
 	given on the command line. Otherwise (if `sorted` or no argument
 	was given), the commits are shown in reverse chronological order
 	by commit time.
-	Cannot be combined with `--graph`.
+	Cannot be combined with `--graph`. Cannot be combined with
+	`--unsorted-input` if `sorted` or no argument was given.
 
 --do-walk::
 	Overrides a previous `--no-walk`.
diff --git a/connected.c b/connected.c
index b18299fdf0..b5f9523a5f 100644
--- a/connected.c
+++ b/connected.c
@@ -106,6 +106,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	if (opt->progress)
 		strvec_pushf(&rev_list.args, "--progress=%s",
 			     _("Checking connectivity"));
+	strvec_push(&rev_list.args, "--unsorted-input");
 
 	rev_list.git_cmd = 1;
 	rev_list.env = opt->env;
diff --git a/revision.c b/revision.c
index 86bbcd10d2..793f76a509 100644
--- a/revision.c
+++ b/revision.c
@@ -2256,6 +2256,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--author-date-order")) {
 		revs->sort_order = REV_SORT_BY_AUTHOR_DATE;
 		revs->topo_order = 1;
+	} else if (!strcmp(arg, "--unsorted-input")) {
+		if (revs->no_walk && !revs->unsorted_input)
+			die(_("--unsorted-input is incompatible with --no-walk and --no-walk=sorted"));
+		revs->unsorted_input = 1;
 	} else if (!strcmp(arg, "--early-output")) {
 		revs->early_output = 100;
 		revs->topo_order = 1;
@@ -2651,6 +2655,8 @@ static int handle_revision_pseudo_opt(const char *submodule,
 	} else if (!strcmp(arg, "--not")) {
 		*flags ^= UNINTERESTING | BOTTOM;
 	} else if (!strcmp(arg, "--no-walk")) {
+		if (revs->unsorted_input)
+			die(_("--no-walk is incompatible with --no-walk=unsorted and --unsorted-input"));
 		revs->no_walk = 1;
 	} else if (skip_prefix(arg, "--no-walk=", &optarg)) {
 		/*
@@ -2658,9 +2664,12 @@ static int handle_revision_pseudo_opt(const char *submodule,
 		 * not allowed, since the argument is optional.
 		 */
 		revs->no_walk = 1;
-		if (!strcmp(optarg, "sorted"))
+		if (!strcmp(optarg, "sorted")) {
+			if (revs->unsorted_input)
+				die(_("--no-walk=sorted is incompatible with --no-walk=unsorted "
+				    "and --unsorted-input"));
 			revs->unsorted_input = 0;
-		else if (!strcmp(optarg, "unsorted"))
+		} else if (!strcmp(optarg, "unsorted"))
 			revs->unsorted_input = 1;
 		else
 			return error("invalid argument to --no-walk");
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 12def7bcbf..8e213eb413 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -169,4 +169,42 @@ test_expect_success 'rev-list --count --objects' '
 	test_line_count = $count actual
 '
 
+test_expect_success 'rev-list --unsorted-input results in different sorting' '
+	git rev-list --unsorted-input HEAD HEAD~ >first &&
+	git rev-list --unsorted-input HEAD~ HEAD >second &&
+	! test_cmp first second &&
+	sort first >first.sorted &&
+	sort second >second.sorted &&
+	test_cmp first.sorted second.sorted
+'
+
+test_expect_success 'rev-list --unsorted-input compatible with --no-walk=unsorted' '
+	git rev-list --unsorted-input --no-walk=unsorted HEAD HEAD~ >actual &&
+	git rev-parse HEAD >expect &&
+	git rev-parse HEAD~ >>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list --unsorted-input incompatible with --no-walk=sorted' '
+	cat >expect <<-EOF &&
+		fatal: --no-walk is incompatible with --no-walk=unsorted and --unsorted-input
+	EOF
+	test_must_fail git rev-list --unsorted-input --no-walk HEAD 2>error &&
+	test_cmp expect error &&
+
+	cat >expect <<-EOF &&
+		fatal: --no-walk=sorted is incompatible with --no-walk=unsorted and --unsorted-input
+	EOF
+	test_must_fail git rev-list --unsorted-input --no-walk=sorted HEAD 2>error &&
+	test_cmp expect error &&
+
+	cat >expect <<-EOF &&
+		fatal: --unsorted-input is incompatible with --no-walk and --no-walk=sorted
+	EOF
+	test_must_fail git rev-list --no-walk --unsorted-input HEAD 2>error &&
+	test_cmp expect error &&
+	test_must_fail git rev-list --no-walk=sorted --unsorted-input HEAD 2>error &&
+	test_cmp expect error
+'
+
 test_done
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2021-08-05 11:25 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-28  5:33 [PATCH v2 0/3] Speed up connectivity checks via bitmaps Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Patrick Steinhardt
2021-06-28  7:49   ` Ævar Arnfjörð Bjarmason
2021-06-29  6:18     ` Patrick Steinhardt
2021-06-29 12:09       ` Ævar Arnfjörð Bjarmason
2021-06-28  5:33 ` [PATCH v2 2/3] receive-pack: skip connectivity checks on delete-only commands Patrick Steinhardt
2021-06-28  8:00   ` Ævar Arnfjörð Bjarmason
2021-06-28  8:06     ` Ævar Arnfjörð Bjarmason
2021-06-29  6:26     ` Patrick Steinhardt
2021-06-30  1:31   ` Jeff King
2021-06-30  1:35     ` Jeff King
2021-06-30 13:52     ` Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 3/3] connected: implement connectivity check using bitmaps Patrick Steinhardt
2021-06-28 20:23   ` Taylor Blau
2021-06-29 22:44     ` Taylor Blau
2021-06-30  2:04       ` Jeff King
2021-06-30  3:07         ` Taylor Blau
2021-06-30  5:45           ` Jeff King
2021-07-02 17:44             ` Taylor Blau
2021-07-02 21:21               ` Jeff King
2021-06-30  1:51   ` Jeff King
2021-07-20 14:26     ` Patrick Steinhardt
2021-08-02  9:37 ` [PATCH v3 0/4] Speed up connectivity checks Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 1/4] connected: do not sort input revisions Patrick Steinhardt
2021-08-02 12:49     ` Ævar Arnfjörð Bjarmason
2021-08-03  8:50       ` Patrick Steinhardt
2021-08-04 11:01         ` Ævar Arnfjörð Bjarmason
2021-08-02 19:00     ` Junio C Hamano
2021-08-03  8:55       ` Patrick Steinhardt
2021-08-03 21:47         ` Junio C Hamano
2021-08-02  9:38   ` [PATCH v3 2/4] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-02 12:53     ` Ævar Arnfjörð Bjarmason
2021-08-02  9:38   ` [PATCH v3 3/4] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-02 12:55     ` Ævar Arnfjörð Bjarmason
2021-08-05 10:12       ` Patrick Steinhardt
2021-08-02 19:40     ` Junio C Hamano
2021-08-03  9:07       ` Patrick Steinhardt
2021-08-06 14:17         ` Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-02 20:01     ` Junio C Hamano
2021-08-03  9:16       ` Patrick Steinhardt
2021-08-03 21:56         ` Junio C Hamano
2021-08-05 11:01           ` Patrick Steinhardt
2021-08-05 16:16             ` Junio C Hamano
2021-08-04 10:51         ` Ævar Arnfjörð Bjarmason
2021-08-05 11:25   ` [PATCH v4 0/6] Speed up connectivity checks Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 1/6] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-05 18:47       ` Junio C Hamano
2021-08-05 11:25     ` Patrick Steinhardt [this message]
2021-08-05 18:44       ` [PATCH v4 2/6] connected: do not sort input revisions Junio C Hamano
2021-08-06  6:00         ` Patrick Steinhardt
2021-08-06 16:50           ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 3/6] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 4/6] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 5/6] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 6/6] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-09  8:00 ` [PATCH v5 0/5] Speed up connectivity checks Patrick Steinhardt
2021-08-09  8:02   ` Patrick Steinhardt
2021-08-09  8:11 ` Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 1/5] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 2/5] connected: do not sort input revisions Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 3/5] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 4/5] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-09  8:12   ` [PATCH v5 5/5] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9d7f484907e2bd2492e6676238579e9f0c6ed374.1628162156.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=avarab@gmail.com \
    --cc=chris.torek@gmail.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.