From: Jonathan Tan <jonathantanmy@google.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters
Date: Tue, 29 Aug 2023 15:04:32 -0700 [thread overview]
Message-ID: <20230829220432.558674-1-jonathantanmy@google.com> (raw)
In-Reply-To: <ZO5DheoYy/syeJ+9@nand.local>
Here are answers to your questions, but I'll also reply to the latest
version stating that I'm OK with this patch set being merged (with my
reasons).
Taylor Blau <me@ttaylorr.com> writes:
> Apologies for not quite grokking this, but I am still somewhat confused.
>
> Suppose we applied something like your suggestion on top of this patch,
> like so:
>
> --- 8< ---
> diff --git a/bloom.c b/bloom.c
> index 739fa093ba..ee81976342 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
> struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
> {
> struct bloom_filter *filter;
> - int hash_version;
> + struct bloom_filter_settings *settings;
>
> filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
> if (!filter)
> return NULL;
>
> prepare_repo_settings(r);
> - hash_version = r->settings.commit_graph_changed_paths_version;
> + settings = get_bloom_filter_settings(r);
>
> - if (!(hash_version == -1 || hash_version == filter->version))
> + if (filter->version != (settings->hash_version == 2 ? 2 : 1))
> return NULL; /* unusable filter */
> return filter;
> }
> --- >8 ---
>
> We have a handful of failing tests, notably including t4216.1, which
> tries to read settings->hash_version, but fails since settings is NULL.
> I believe this happens when we have a computed Bloom filter prepared for
> some commit, but have not yet written a commit graph containing that (or
> any) Bloom filter.
Ah, I discovered this independently too. I think it happens when we
write version 2 Bloom filters to a repo that has no Bloom filters
currently. So, r->settings.commit_graph_changed_paths_version is set to
2, but settings is NULL. I think the solution has to be a combination
of both (use commit_graph_changed_paths_version, but if it is -1, check
get_bloom_filter_settings()). But having said that, as I said above, we
can go with what you have currently.
> But I think we're talking about different things here. The point of
> get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is
> to only return Bloom filters that are readable under the configuration
> commitGraph.changedPathsVersion.
But this is my contention - sometimes commitGraph.changedPathsVersion is
-1, so we need to find out which version is readable from elsewhere.
> We have a handle on what version was used to compute Bloom filters in
> each layer of the graph by reading its "version" field, which is written
> via load_bloom_filter_from_graph().
>
> So we want to return a non-NULL filter exactly when we (a) have a Bloom
> filter computed for the given commit, and (b) its version matches the
> version specified in commitGraph.chagnedPathsVersion.
Yes, except add "or autodetected from the first commit graph file that
we read if nothing is specified in commitGraph.changedPathsVersion" to
the end.
> Are you saying that we need to do the normalization ahead of time so
> that we don't emit filters that have different hash versions when
> working in a commit-graph chain where each layer may use different Bloom
> filter settings?
By "emit" do you mean write filters to disk? If yes, I'm worried about
reading them, not writing them - for example, when running "git log"
with a path. I am worried precisely about the commit-graph chain in
which different layers have different Bloom settings, yes.
next prev parent reply other threads:[~2023-08-29 22:05 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46 ` Jonathan Tan
2023-08-17 19:55 ` Taylor Blau
2023-08-21 20:21 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48 ` Jonathan Tan
2023-08-21 20:23 ` Taylor Blau
2023-08-24 22:20 ` Jonathan Tan
2023-08-24 22:47 ` Taylor Blau
2023-08-24 23:05 ` Jonathan Tan
2023-08-25 19:00 ` Taylor Blau
2023-08-29 16:49 ` Jonathan Tan
2023-08-29 19:14 ` Taylor Blau
2023-08-29 22:04 ` Jonathan Tan [this message]
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00 ` Jonathan Tan
2023-08-21 20:40 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06 ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230829220432.558674-1-jonathantanmy@google.com \
--to=jonathantanmy@google.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).