From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 1/6] bloom: annotate filters with hash version
Date: Thu, 17 Aug 2023 15:55:06 -0400
Message-ID: <ZN57Gsz+wk9n6/Da@nand.local>
In-Reply-To: <20230811214651.3326180-1-jonathantanmy@google.com>

On Fri, Aug 11, 2023 at 02:46:51PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > In subsequent commits, we will want to load existing Bloom filters out
> > of a commit-graph, even when the hash version they were computed with
> > does not match the value of `commitGraph.changedPathVersion`.
> >
> > In order to differentiate between the two, add a "filter" field to each
> > Bloom filter.
>
> You mean "version", I think.

Oops, yes -- I'm not sure how my editor tab-completed "version" there,
but oh, well :-).

> > @@ -55,6 +55,7 @@ struct bloom_filter_settings {
> >  struct bloom_filter {
> >  	unsigned char *data;
> >  	size_t len;
> > +	int version;
> >  };
>
> We might want to shrink the sizes of len (we have a changed path limit
> so we know exactly how big Bloom filters can get) and version so that
> this struct doesn't take up more space. But if other reviewers think
> that this is OK, I'm fine with that.

I think that making len a size_t here is an appropriate choice. Even
though the maximum length of a Bloom filter is well below the 2^64-1
threshold, we are often looking at a memory-mapped region here, so
keeping track of it with a size_t / off_t seems reasonable to me.
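
To make the size trade-off concrete, here is a rough sketch that compares
the layout from the patch against a compact one. The compact struct, its
uint32_t fields, and the helper name are assumptions for illustration only;
neither is in the patch or in bloom.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the patch under discussion (renamed here to avoid clashing
 * with the real struct bloom_filter). */
struct bloom_filter_patched {
	unsigned char *data;
	size_t len;
	int version;
};

/* Hypothetical compact layout: narrower len and version fields. */
struct bloom_filter_compact {
	unsigned char *data;
	uint32_t len;
	uint32_t version;
};

/* Bytes saved per filter by the compact layout on the current platform. */
static size_t bloom_filter_bytes_saved(void)
{
	return sizeof(struct bloom_filter_patched) -
	       sizeof(struct bloom_filter_compact);
}
```

On a typical LP64 platform I would expect the patched struct to take 24
bytes and the compact one 16, but padding is up to the implementation, so
measure before relying on those numbers.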

> Another thing that we might want to track is whether the Bloom filter is
> a reference to an existing buffer (and thus does not need to be freed)
> or a reference to a malloc-ed buffer that we must free. But both before
> and after this patch set, a malloc-ed buffer is never overridden by a
> reference-to-existing-buffer, so we should still be fine for now. (This
> patch set does add a scenario in which a reference-to-existing buffer is
> overridden by a malloc-ed buffer, but that's the only new scenario.)

Yeah, I think there is some opportunity for clean-up here. I'll take a
look...
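
As a starting point, the ownership tracking could look something like the
sketch below. Everything here is hypothetical (the struct name, the
owns_data bit, and the three helpers); it only illustrates telling a
borrowed buffer apart from a malloc-ed one, and is not what the series
does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of struct bloom_filter with an ownership bit. */
struct owned_bloom_filter {
	unsigned char *data;
	size_t len;
	int version;
	unsigned owns_data : 1; /* 1 if data was malloc-ed and must be freed */
};

/* Free the buffer only if we own it, then reset the filter. */
static void filter_release(struct owned_bloom_filter *f)
{
	if (f->owns_data)
		free(f->data);
	f->data = NULL;
	f->len = 0;
	f->owns_data = 0;
}

/* Point at an existing buffer (e.g. a memory-mapped region). */
static void filter_borrow(struct owned_bloom_filter *f, unsigned char *buf,
			  size_t len, int version)
{
	filter_release(f);
	f->data = buf;
	f->len = len;
	f->version = version;
}

/* Replace the contents with a malloc-ed copy; returns 0 on success. */
static int filter_copy(struct owned_bloom_filter *f, const unsigned char *src,
		       size_t len, int version)
{
	unsigned char *copy = malloc(len ? len : 1);
	if (!copy)
		return -1;
	memcpy(copy, src, len);
	filter_release(f); /* release only after copying, in case src aliases */
	f->data = copy;
	f->len = len;
	f->version = version;
	f->owns_data = 1;
	return 0;
}
```

With a bit like this, overriding a borrowed buffer with a malloc-ed one (the
new scenario you describe) or the reverse both go through filter_release(),
so neither direction leaks or frees memory it does not own.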

Thanks,
Taylor

Thread overview: 24+ messages
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46   ` Jonathan Tan
2023-08-17 19:55     ` Taylor Blau [this message]
2023-08-21 20:21       ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48   ` Jonathan Tan
2023-08-21 20:23     ` Taylor Blau
2023-08-24 22:20   ` Jonathan Tan
2023-08-24 22:47     ` Taylor Blau
2023-08-24 23:05       ` Jonathan Tan
2023-08-25 19:00         ` Taylor Blau
2023-08-29 16:49           ` Jonathan Tan
2023-08-29 19:14             ` Taylor Blau
2023-08-29 22:04               ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00   ` Jonathan Tan
2023-08-21 20:40     ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06   ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46   ` Taylor Blau
