All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH v6 02/16] revision.c: consult Bloom filters for root commits
Date: Wed, 31 Jan 2024 17:52:23 -0500	[thread overview]
Message-ID: <a6dc377f1b46d4afd442023426fc8dd5cd8ceb8e.1706741516.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1706741516.git.me@ttaylorr.com>

The commit-graph stores changed-path Bloom filters which represent the
set of paths included in a tree-level diff between a commit's root tree
and that of its parent.

When a commit has no parents, the tree-diff is computed against that
commit's root tree and the empty tree. In other words, every path in
that commit's tree is stored in the Bloom filter (since they all appear
in the diff).

Consult these filters during pathspec-limited traversals in the function
`rev_same_tree_as_empty()`. Doing so yields a performance improvement
where we can avoid enumerating the full set of paths in a parentless
commit's root tree when we know that the path(s) of interest were not
listed in that commit's changed-path Bloom filter.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Original-patch-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 revision.c           | 26 ++++++++++++++++++++++----
 t/t4216-log-bloom.sh |  8 ++++++--
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index 2424c9bd67..0e6f7d02b6 100644
--- a/revision.c
+++ b/revision.c
@@ -833,17 +833,28 @@ static int rev_compare_tree(struct rev_info *revs,
 	return tree_difference;
 }
 
-static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
+static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit,
+				  int nth_parent)
 {
 	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
+	int bloom_ret = -1;
 
 	if (!t1)
 		return 0;
 
+	if (!nth_parent && revs->bloom_keys_nr) {
+		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
+		if (!bloom_ret)
+			return 1;
+	}
+
 	tree_difference = REV_TREE_SAME;
 	revs->pruning.flags.has_changes = 0;
 	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
 
+	if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
+		count_bloom_filter_false_positive++;
+
 	return tree_difference == REV_TREE_SAME;
 }
 
@@ -881,7 +892,7 @@ static int compact_treesame(struct rev_info *revs, struct commit *commit, unsign
 		if (nth_parent != 0)
 			die("compact_treesame %u", nth_parent);
 		old_same = !!(commit->object.flags & TREESAME);
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, nth_parent))
 			commit->object.flags |= TREESAME;
 		else
 			commit->object.flags &= ~TREESAME;
@@ -977,7 +988,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		return;
 
 	if (!commit->parents) {
-		if (rev_same_tree_as_empty(revs, commit))
+		/*
+		 * Pretend as if we are comparing ourselves to the
+		 * (non-existent) first parent of this commit object. Even
+		 * though no such parent exists, its changed-path Bloom filter
+		 * (if one exists) is relative to the empty tree, using Bloom
+		 * filters is allowed here.
+		 */
+		if (rev_same_tree_as_empty(revs, commit, 0))
 			commit->object.flags |= TREESAME;
 		return;
 	}
@@ -1058,7 +1076,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 
 		case REV_TREE_NEW:
 			if (revs->remove_empty_trees &&
-			    rev_same_tree_as_empty(revs, p)) {
+			    rev_same_tree_as_empty(revs, p, nth_parent)) {
 				/* We are adding all the specified
 				 * paths from this parent, so the
 				 * history beyond this parent is not
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index b7baf49d62..cc6ebc8140 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -88,7 +88,11 @@ test_bloom_filters_not_used () {
 		# if the Bloom filter system is initialized, ensure that no
 		# filters were used
 		data="statistics:{"
-		data="$data\"filter_not_present\":0,"
+		# unusable filters (e.g., those computed with a
+		# different value of commitGraph.changedPathsVersion)
+		# are counted in the filter_not_present bucket, so any
+		# value is OK there.
+		data="$data\"filter_not_present\":[0-9][0-9]*,"
 		data="$data\"maybe\":0,"
 		data="$data\"definitely_not\":0,"
 		data="$data\"false_positive\":0}"
@@ -175,7 +179,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' '
 
 test_bloom_filters_used_when_some_filters_are_missing () {
 	log_args=$1
-	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":9"
+	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":10"
 	setup "$log_args" &&
 	grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" &&
 	test_cmp log_wo_bloom log_w_bloom
-- 
2.43.0.509.g253f65a7fc


  parent reply	other threads:[~2024-01-31 22:52 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 22:52 [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` Taylor Blau [this message]
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15   ` SZEDER Gábor
2024-02-28  0:11   ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6dc377f1b46d4afd442023426fc8dd5cd8ceb8e.1706741516.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.