From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 1/6] bloom: annotate filters with hash version
Date: Thu, 17 Aug 2023 15:55:06 -0400
Message-ID: <ZN57Gsz+wk9n6/Da@nand.local>
In-Reply-To: <20230811214651.3326180-1-jonathantanmy@google.com>
On Fri, Aug 11, 2023 at 02:46:51PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > In subsequent commits, we will want to load existing Bloom filters out
> > of a commit-graph, even when the hash version they were computed with
> > does not match the value of `commitGraph.changedPathVersion`.
> >
> > In order to differentiate between the two, add a "filter" field to each
> > Bloom filter.
>
> You mean "version", I think.
Oops, yes -- I'm not sure how my editor tab-completed "filter" there,
but oh, well :-).
> > @@ -55,6 +55,7 @@ struct bloom_filter_settings {
> > struct bloom_filter {
> > unsigned char *data;
> > size_t len;
> > + int version;
> > };
>
> We might want to shrink the sizes of len (we have a changed path limit
> so we know exactly how big Bloom filters can get) and version so that
> this struct doesn't take up more space. But if other reviewers think
> that this is OK, I'm fine with that.
I think that making len a size_t here is an appropriate choice. Even
though the maximum length of a Bloom filter is well below SIZE_MAX
(2^64-1 on 64-bit platforms), we are often looking at a memory-mapped
region here, so keeping track of it with a size_t / off_t seems
reasonable to me.
> Another thing that we might want to track is whether the Bloom filter is
> a reference to an existing buffer (and thus does not need to be freed)
> or a reference to a malloc-ed buffer that we must free. But both before
> and after this patch set, a malloc-ed buffer is never overridden by a
> reference-to-existing-buffer, so we should still be fine for now. (This
> patch set does add a scenario in which a reference-to-existing buffer is
> overridden by a malloc-ed buffer, but that's the only new scenario.)
Yeah, I think there is some opportunity for clean-up here. I'll take a
look...
Thanks,
Taylor