git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Garima Singh <garimasigit@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>,
	Garima Singh via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Taylor Blau" <me@ttaylorr.com>, "Jeff King" <peff@peff.net>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Emily Shaffer" <emilyshaffer@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Garima Singh" <garima.singh@microsoft.com>
Subject: Re: [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths
Date: Fri, 21 Feb 2020 19:32:03 -0500	[thread overview]
Message-ID: <a6c08b27-18d7-cf30-c076-3f6451a21519@gmail.com> (raw)
In-Reply-To: <86tv3qqxyn.fsf@gmail.com>


On 2/16/2020 11:49 AM, Jakub Narebski wrote:
>> From: Garima Singh <garima.singh@microsoft.com>
>>
>> Add the core Bloom filter logic for computing the paths changed between a
>> commit and its first parent. For details on what Bloom filters are and how they
>> work, please refer to Dr. Derrick Stolee's blog post [1]. It provides a concise
>> explaination of the adoption of Bloom filters as described in [2] and [3].
>                                                                            ^^- to add

Not sure what this means. Can you please clarify. 

>> 1. We currently use 7 and 10 for the number of hashes and the size of each
>>    entry respectively. They served as great starting values, the mathematical
>>    details behind this choice are described in [1] and [4]. The implementation,
>                                                                                 ^^- to add

Not sure what this means. Can you please clarify.

>> 3. The filters are sized according to the number of changes in the each commit,
>>    with minimum size of one 64 bit word.
> 
> If I understand it correctly (but which might not be entirely clear),
> the filter size in bits is the number of changes^* times 10, rounded up
> to the nearest multiple of 64.
> 
> [*] where the number of changes is the number of changed files (new blob
> objects) _and_ the number of changed directories (new tree objects,
> excluding root tree object change).
> 

Yes. 

> The interesting corner case, which might be worth specifying explicitly,
> is what happens in the case there are _no changes_ with respect to first
> parent (which can happen with either commit created with `git commit
> --allow-empty`, or merge created e.g. with `git merge --strategy=ours`).
> Is this case represented as Bloom filter of length 0, or as a Bloom
> filter of length  of one 64-bit word which is minimal length composed of
> all 0's (0x0000000000000000)?
> 

See t0095-bloom.sh: The filter for a commit with no changes is of length 0.
I will call it out specifically in the appropriate commit message as well. 

>>
>> [1] https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/
> 
> I would write it in full, similar to subsequent bibliographical entries,
> that is:
> 
>   [1] Derrick Stolee
>       "Supercharging the Git Commit Graph IV: Bloom Filters"
>       https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/
> 
> But that is just a matter of style.
> 

Sounds good. Will do. 

>>
>> [4] Thomas Mueller Graf, Daniel Lemire
>>     "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters"
>>     https://arxiv.org/abs/1912.08258
>>
>> [5] https://en.wikipedia.org/wiki/MurmurHash#Algorithm
>>
>> Helped-by: Jeff King <peff@peff.net>
>> Helped-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
>> ---
>>  Makefile              |   2 +
>>  bloom.c               | 228 ++++++++++++++++++++++++++++++++++++++++++
>>  bloom.h               |  56 +++++++++++
>>  t/helper/test-bloom.c |  84 ++++++++++++++++
>>  t/helper/test-tool.c  |   1 +
>>  t/helper/test-tool.h  |   1 +
>>  t/t0095-bloom.sh      | 113 +++++++++++++++++++++
>>  7 files changed, 485 insertions(+)
>>  create mode 100644 bloom.c
>>  create mode 100644 bloom.h
>>  create mode 100644 t/helper/test-bloom.c
>>  create mode 100755 t/t0095-bloom.sh
> 
> As I wrote earlier, In my opinion this patch could be split into three
> individual single-functionality pieces, to make it easier to review and
> aid in bisectability if needed.
> 

Doing this in v3. 


>> +
>> +static uint32_t rotate_right(uint32_t value, int32_t count)
>> +{
>> +	uint32_t mask = 8 * sizeof(uint32_t) - 1;
>> +	count &= mask;
>> +	return ((value >> count) | (value << ((-count) & mask)));
>> +}
> 
> Hmmm... both the algoritm on Wikipedia, and reference implementation use
> rotate *left*, not rotate *right* in the implementation of Murmur3 hash,
> see
> 
>   https://en.wikipedia.org/wiki/MurmurHash#Algorithm
>   https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp#L23
> 
> 
> inline uint32_t rotl32 ( uint32_t x, int8_t r )
> {
>   return (x << r) | (x >> (32 - r));
> }
> 

Thanks! Fixed this in v3. More on it later. 

>> +
>> +/*
>> + * Calculate a hash value for the given data using the given seed.
>> + * Produces a uniformly distributed hash value.
>> + * Not considered to be cryptographically secure.
>> + * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm
>> + **/
>     ^^-- why two _trailing_ asterisks?
> 

Oops. Fixed. 

>> +static uint32_t seed_murmur3(uint32_t seed, const char *data, int len)
> 
> In short, I think that the name of the function should be murmur3_32, or
> murmurhash3_32, or possibly murmur3_32_seed, or something like that.
>

Renamed it to murmur3_seeded in v3. The input and output types in the 
signature make it clear that it is 32-bit version.
 
>> +{
>> +	const uint32_t c1 = 0xcc9e2d51;
>> +	const uint32_t c2 = 0x1b873593;
>> +	const uint32_t r1 = 15;
>> +	const uint32_t r2 = 13;
>> +	const uint32_t m = 5;
>> +	const uint32_t n = 0xe6546b64;
>> +	int i;
>> +	uint32_t k1 = 0;
>> +	const char *tail;
>> +
>> +	int len4 = len / sizeof(uint32_t);
>> +
>> +	const uint32_t *blocks = (const uint32_t*)data;
>> +
>> +	uint32_t k;
>> +	for (i = 0; i < len4; i++)
>> +	{
>> +		k = blocks[i];
> 
> IMPORTANT: There is a comment around there in the example implementation
> in C on Wikipedia that this operation above is a source of differing
> results across endianness.  

Thanks! SZEDER found this on his CI pipeline and we have fixed it to 
process the data in 1 byte words to avoid hitting any endian-ness issues. 
See this part of the thread that carries the fix and the related discussion. 
  https://lore.kernel.org/git/ba856e20-0a3c-e2d2-6744-b9abfacdc465@gmail.com/
I will be squashing those changes in appropriately in v3.  
 
>> +		k1 *= c2;
>> +		seed ^= k1;
>> +		break;
>> +	}
>> +
>> +	seed ^= (uint32_t)len;
>> +	seed ^= (seed >> 16);
>> +	seed *= 0x85ebca6b;
>> +	seed ^= (seed >> 13);
>> +	seed *= 0xc2b2ae35;
>> +	seed ^= (seed >> 16);
>> +
>> +	return seed;
>> +}
> 
> In https://public-inbox.org/git/ba856e20-0a3c-e2d2-6744-b9abfacdc465@gmail.com/
> you posted "[PATCH] Process bloom filter data as 1 byte words".
> This may avoid the Big-endian vs Little-endian confusion,
> that is wrong results on Big-endian architectures, but
> it also may slow down the algorithm.
> 

Oh cool! You have seen that patch. And yes, we understand that it might add 
a little overhead but at this point it is more important to be correct on all
architectures instead of micro-optimizing and introducing different 
implementations for Little-endian and Big-endian. This would make this 
series overly complicated. Optimizing the hashing techniques would deserve a
series of its own, which we can definitely revisit later.

> The public domain implementation in PMurHash.c in SMHasher
> (re)implementation in Chromium (see URL above) fall backs to 1-byte
> operations only if it doesn't know the endianness (or if it is neither
> little-endian, nor big-endian, i.e. middle-endian or mixed-endian --
> though I doubt that Git works correctly on mixed-endian anyway).
> 
> 
> Sidenote: it looks like the current implementation if Murmur hash in
> Cromium uses MurmurHash3_x86_32, i.e. little-endian unaligned-safe
> implementation, but prepares data by swapping with StringToLE32
> https://github.com/chromium/chromium/blob/master/components/variations/variations_murmur_hash.h
> 
> 
> Assuming that the terminating NUL ("\0") character of a c-string is not
> included in hash calculations, then murmur3_x86_32 hash has the
> following results (all results are for seed equal 0):
> 
> ''               -> 0x00000000
> ' '              -> 0x7ef49b98
> 'Hello world!'   -> 0x627b0c2c
> 'The quick brown fox jumps over the lazy dog'   -> 0x2e4ff723
> 
> C source (from Wikipedia): https://godbolt.org/z/ofa2p8
> C++ source (Appleby's):    https://godbolt.org/z/BoSt6V
> 
> The implementation provided in this patch, with rotate_right (instead of
> rotate_left) gives, on little-endian machine, different results:
> 
> ''               -> 0x00000000
> ' '              -> 0xd1f27e64
> 'Hello world!'   -> 0xa0791ad7
> 'The quick brown fox jumps over the lazy dog'   -> 0x99f1676c
> 
> https://github.com/gitgitgadget/git/blob/e1b076a714d611e59d3d71c89221e41a3427fae4/bloom.c#L21
> C source (via GitGitGadget): https://godbolt.org/z/R9s8Tt
> 

Thanks! This is an excellent catch! Fixing the rotate_right to rotate_left, 
gives us the same answers as the two implementations you pointed out. I have
added the appropriate unit tests in v3 and they match the values you obtained 
from the other implementations. Thanks a lot for the rigor! 

We based our implementation on the pseudo code and not on the sample code 
presented here: https://en.wikipedia.org/wiki/MurmurHash#Algorithm
We just didn't parse the ROL instruction correctly. 

>> +
>> +void load_bloom_filters(void)
>> +{
>> +	init_bloom_filter_slab(&bloom_filters);
>> +}
> 
> 
> Actually this function doesn't load anything.  Perhaps it should be
> named init_bloom_filters() or init_bloom_filters_storage(), or
> bloom_filters_init()?
>

Changed to init_bloom_filters() in v3. Thanks! 
 
>> +
>> +void fill_bloom_key(const char *data,
>> +					int len,
>> +					struct bloom_key *key,
>> +					struct bloom_filter_settings *settings)
> 
> The last parameter could be of 'const bloom_filter_settings *' type.
> 

Done. 

>> +{
>> +	int i;
>> +	const uint32_t seed0 = 0x293ae76f;
>> +	const uint32_t seed1 = 0x7e646e2c;
> 
> Where did those seeds values came from?
> 

Those values were chosen randomly. They will be fixed constants for the 
current hashing version. I will add a note calling this out in the 
appropriate commit messages and the Documentation in v3. 

>> +	const uint32_t hash0 = seed_murmur3(seed0, data, len);
>> +	const uint32_t hash1 = seed_murmur3(seed1, data, len);
>> +
>> +	key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t));
>> +	for (i = 0; i < settings->num_hashes; i++)
>> +		key->hashes[i] = hash0 + i * hash1;
> 
> Note that in [3] authors say that double hashing technique has some
> problems.  For one, we should ensure that hash1 is not zero, and even
> better that it is odd (which makes it relatively prime to filter size
> which is multiple of 64).  It also suffers from something called
> "approximate fingerprint collisions".
> 
> That is why the define "enhanced double hashing" technique, which does
> not suffer from those problems (Algorithm 2, page 11/15).
> 
>   +	for (i = 0; i < settings->num_hashes; i++) {
>   +		key->hashes[i] = hash0;
>   +
>   +		hash0 = hash0 + hash1;
>   +		hash1 = hash1 + i;
>   +	}
> 
> This can also be written in closed form, based on equation (6)
> 
>   +	for (i = 0; i < settings->num_hashes; i++)
>   +		key->hashes[i] = hash0 + i * hash1 + i*(i*i - 1)/6;
> 
> 
> In later paper [6] the closed form for "enhanced double hashing"
> (p. 188) is slightly modified (or rather they use different variant of
> this technique):
> 
>   +	for (i = 0; i < settings->num_hashes; i++)
>   +		key->hashes[i] = hash0 + i * hash1 + i*i;
> 
> This is a variant of more generic "enhanced double hashing", section
> 5.2 (Enhanced) Double Hashing Schemes (page 199):
> 
>         h_1(u) + i h_2(u) + f(i)    mod m
> 
> with f(i) = i^2 = i*i.
> 
> They have tested that enhanced double hashing with both f(i) equal i*i
> and equal i*i*i, and triple hashing technique, and they have found that
> it performs slightly better than straight double hashing technique
> (Fig. 1, page 212, section 3).
> 

Thanks for the detailed research here! The hash becoming zero and the 
approximate fingerprint collision are both extremely rare situations. In both
cases, we would just see git log having to diff more trees than if it didn't 
occur. While these techniques would be great optimizations to do, especially
if this implementation gets pulled into more generic hashing applications
in the code, we think that for the purposes of the current series - it is not 
worth it. I say this because Azure Repos has been using this exact hashing 
technique for several years now without any glitches. And we think it would
be great to rely on this battle tested strategy in atleast the first version
of this feature. 

>> +}
>> +
>> +void add_key_to_filter(struct bloom_key *key,
>> +					   struct bloom_filter *filter,
>> +					   struct bloom_filter_settings *settings)
> 
> Here again the 'settings' argument can be const (as can the 'key'
> parameter).
> 

Done. 

>> +
>> +struct bloom_filter *get_bloom_filter(struct repository *r,
>> +				      struct commit *c)
>> +{
>> +	struct bloom_filter *filter;
>> +	struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS;
>> +	int i;
>> +	struct diff_options diffopt;
>> +
>> +	if (!bloom_filters.slab_size)
>> +		return NULL;
> 
> This is testing that commit slab for per-commit Bloom filters is
> initialized, isn't it?
> 
> First, should we write the condition as
> 
> 	if (!bloom_filters.slab_size)
> 
> or would the following be more readable
> 
> 	if (bloom_filters.slab_size == 0)
> 

Sure. Switched to `if (bloom_filter.slab_size == 0)` in v3. 

> Second, should we return NULL, or should we just initialize the slab?
> Or is non-existence of slab treated as a signal that the Bloom filters
> mechanism is turned off?
> 

Yes. We purposefully choose to return NULL and ignore the mechanism 
overall because we use Bloom filters best effort only. 

>> +
>> +	if (diff_queued_diff.nr <= 512) {
>
> Second, there is a minor issue that diff_queue_struct.nr stores the
> number of filepairs, that is the number of changed files, while the
> number of elements added to Bloom filter is number of changed blobs and
> trees.  For example if the following files are changed:
> 
>   sub/dir/file1
>   sub/file2
> 
> then diff_queued_diff.nr is 2, but number of elements to be added to
> Bloom filter is 4.
> 
>   sub/dir/file1
>   sub/file2
>   sub/dir/
>   sub/
> 
> I'm not sure if it matters in practice.
> 

It does not matter much in practice, since the directories usually tend
to collapse across the changes. Still, I will add another limit after 
creating the hashmap entries to cap at 640 so that we have a maximum of 
100 changes in the bloom filter. 

We plan to make these values configurable later. 

>> +		struct hashmap pathmap;
>> +		struct pathmap_hash_entry* e;
>> +		struct hashmap_iter iter;
>> +		hashmap_init(&pathmap, NULL, NULL, 0);
> 
> Stylistic issue: I have just noticed that here (and in some other
> places), but not in all cases, you declare pointer types with asterisk
> cuddled to type name, not to variable name, which contradicts
> CodingGuidelines

Thanks for noticing that! Fixed all of these in v3. 

>> +
>> +		for (i = 0; i < diff_queued_diff.nr; i++) {
>> +			const char* path = diff_queued_diff.queue[i]->two->path;
> 
> Is that correct that we consider only post-image name for storing
> changes in Bloom filter?  Currently if file was renamed (or deleted), it
> is considered changed, and `git log -- <old-name>` lists commit that
> changed file name too.
>

The tests in t4216-log-bloom.sh ensure that the output of `git log -- <oldname>` 
remains unchanged for renamed and deleted files, when using bloom filters. 
I realize that I fat fingered over checking the old name, and didn't have an 
explicit deleted file in the test. I have added them in v3, and the tests pass. 
So the behavior is preserved and as expected when using Bloom filters. 
Thanks for paying close attention! 
 
>> +			const char* p = path;
> 
> It should be "const char *" for both.
> 
>> +
>> +			/*
>> +			* Add each leading directory of the changed file, i.e. for
>> +			* 'dir/subdir/file' add 'dir' and 'dir/subdir' as well, so
>> +			* the Bloom filter could be used to speed up commands like
>> +			* 'git log dir/subdir', too.
>> +			*
>> +			* Note that directories are added without the trailing '/'.
>> +			*/
>> +			do {
>> +				char* last_slash = strrchr(p, '/');
>> +
>> +				FLEX_ALLOC_STR(e, path, path);
> 
> Here first 'path' is the field name, i.e. pathmap_hash_entry.path,
> second 'path' is the name of local variable, aliased also to 'p'.
> 
>> +				hashmap_entry_init(&e->entry, strhash(p));
> 
> I don't know why both 'path' and 'p' are used, while both point to the
> same memory (and thus have the same contents).  It is a bit confusing.
> See also my previous comment.
> 

Cleaned up in v3. Thanks! 
>> +		filter->data = NULL;
>> +		filter->len = 0;
> 
> This needs to be explicitly stated both in the commit message and in the
> API documentation (in comments) that bloom_filter.len == 0 means "no
> data", while "no changes" is represented as bloom_filter with len == 1
> and *data == (uint64_t)0;
> 
> EDIT: actually "no changes" is also represented as bloom_filter with len
> equal 0, as it turns out.
> 
> One possible alternative could be representing "no data" value with
> Bloom filter of length 1 and all 64 bits set to 1, and "no changes"
> represented as filter of length 0.  This is not unambiguous choice!
>

There is no gain in distinguishing between the absence of a filter and
a commit having no changes. The effect on `git log -- path` is the same in 
both cases. We fall back to the normal diffing algorithm in revision.c.
I will make this clearer in the appropriate commit messages and in the 
Documentation in v3. 
 
>> +}
>> diff --git a/bloom.h b/bloom.h
>> new file mode 100644
>> index 0000000000..7f40c751f7
>> --- /dev/null
>> +++ b/bloom.h
>> @@ -0,0 +1,56 @@
>> +#ifndef BLOOM_H
>> +#define BLOOM_H
> 
> Should we #include the stdint.h header for uint32_t and uint64_t types?
> 

git-compat-util.h takes care of this. 

>> +
>> +struct commit;
>> +struct repository;
>> +struct commit_graph;
>> +
> 
> Perhaps we should add block comment for this struct, like there is one
> for struct bloom_filter below.
> 

Done in v3.

>> +struct bloom_filter_settings {
>> +	uint32_t hash_version;
>> +	uint32_t num_hashes;
>> +	uint32_t bits_per_entry;
> 
> I guess that the type uint32_t was chosen to make it easier to store
> this information and later retrieve it from the commit-graph file, isn't
> it?  Otherwise those types are much too large for sensible range of
> values (which would all fit in 8-bits byte).
> 

Yes.

>> +
>> +/*
>> + * A bloom_key represents the k hash values for a
>> + * given hash input. These can be precomputed and
>> + * stored in a bloom_key for re-use when testing
>> + * against a bloom_filter.
> 
> We might want to add that the number of hash values is given by Bloom
> filter settings, and it is assumed to be the same for all bloom_key
> variables / objects.
> 

Incorporated in v3. 

>> +. ./test-lib.sh
>> +
>> +test_expect_success 'get bloom filters for commit with no changes' '
>> +	git init &&
>> +	git commit --allow-empty -m "c0" &&
>> +	cat >expect <<-\EOF &&
>> +	Filter_Length:0
>> +	Filter_Data:
>> +	EOF
>> +	test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual &&
>> +	test_cmp expect actual
>> +'
> 
> A few things.  First, I wonder why we need to provide object ID;
> couldn't 'test-tool bloom get_filter_for_commit' parse commit-ish
> argument, or would it make it too complicated for no reason?
> 

Yes it was overkill for what I need in the test. 

>> +
>> +test_expect_success 'get bloom filter for commit with 10 changes' '
>> +	rm actual &&
>> +	rm expect &&
>> +	mkdir smallDir &&
>> +	for i in $(test_seq 0 9)
>> +	do
>> +		echo $i >smallDir/$i
>> +	done &&
>> +	git add smallDir &&
>> +	git commit -m "commit with 10 changes" &&
>> +	cat >expect <<-\EOF &&
>> +	Filter_Length:4
>> +	Filter_Data:508928809087080a|8a7648210804001|4089824400951000|841ab310098051a8|
>> +	EOF
>> +	test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual &&
>> +	test_cmp expect actual
>> +'
> 
> This test is in my opinion fragile, as it unnecessarily test the
> implementation details instead of the functionality provided.  If we
> change the hashing scheme (for example going from double hashing to some
> variant of enhanced double hashing), or change the base hash function
> (for example from Murmur3_32 to xxHash_64), or change the number of hash
> functions (perhaps because changing of number of bits per element, and
> thus optimal number of hash functions from 7 to 6), or change from
> 64-bit word blocks to 32-bit word blocks, the test would have to be
> changed.
> 

Regarding this and the rest of you comments on t0095-log-bloom.sh:

I am tweaking it as necessary but the entire point of these tests is to
break for the things you called out. They need to be intricately tied
to the current hashing strategy and are hence intended to be fragile so 
as to catch any subtle or accidental changes in the hashing computation. 
Any change like the ones you have called out would require a hash version
change and all the compatibility reactions that come with it. 

I have added more tests around the murmur3_seeded method in v3. Removed
some of the redundant ones. 

The other more evolved test cases you call out are covered in the e2e
integration tests in t4216-log-bloom.sh

> 
> Reviewed-by: Jakub Narębski <jnareb@gmail.com>
> 
> Thanks for working on this.
> 
> Best,
> 

Thank you once again for an excellent and in-depth review of this patch! 
You have helped make this code so much better!

Cheers! 
Garima Singh

  reply	other threads:[~2020-02-22  0:32 UTC|newest]

Thread overview: 159+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 22:05 [PATCH 0/9] [RFC] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget
2020-01-01 20:20   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 2/9] commit-graph: write changed paths bloom filters Garima Singh via GitGitGadget
2019-12-21 16:48   ` Philip Oakley
2020-01-06 18:44   ` Jakub Narebski
2020-01-13 19:48     ` Garima Singh
2019-12-20 22:05 ` [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-01-07 12:19   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 4/9] commit-graph: document bloom filter format Garima Singh via GitGitGadget
2020-01-07 14:46   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file Garima Singh via GitGitGadget
2020-01-07 16:01   ` Jakub Narebski
2020-01-14 15:14     ` Garima Singh
2019-12-20 22:05 ` [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Garima Singh via GitGitGadget
2020-01-08  0:32   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 7/9] commit-graph: reuse existing bloom filters during write Garima Singh via GitGitGadget
2020-01-09 19:12   ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-01-11  0:27   ` Jakub Narebski
2020-01-15  0:08     ` Garima Singh
2019-12-20 22:05 ` [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Garima Singh via GitGitGadget
2020-01-11 19:56   ` Jakub Narebski
2020-01-15  0:55     ` Garima Singh
2019-12-20 22:14 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Junio C Hamano
2019-12-22  9:26 ` Christian Couder
2019-12-22  9:38   ` Jeff King
2020-01-01 12:04     ` Jakub Narebski
2019-12-22  9:30 ` Jeff King
2019-12-22  9:32   ` [PATCH 1/3] commit-graph: examine changed-path objects in pack order Jeff King
2019-12-27 14:51     ` Derrick Stolee
2019-12-29  6:12       ` Jeff King
2019-12-29  6:28         ` Jeff King
2019-12-30 14:37         ` Derrick Stolee
2019-12-30 14:51           ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 2/3] commit-graph: free large diffs, too Jeff King
2019-12-27 14:52     ` Derrick Stolee
2019-12-22  9:32   ` [PATCH 3/3] commit-graph: stop using full rev_info for diffs Jeff King
2019-12-27 14:53     ` Derrick Stolee
2019-12-26 14:21   ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Derrick Stolee
2019-12-29  6:03     ` Jeff King
2019-12-27 16:11   ` Derrick Stolee
2019-12-29  6:24     ` Jeff King
2019-12-30 16:04       ` Derrick Stolee
2019-12-30 17:02       ` Junio C Hamano
2019-12-31 16:45 ` Jakub Narebski
2020-01-13 16:54   ` Garima Singh
2020-01-20 13:48     ` Jakub Narebski
2020-01-21 16:14       ` Garima Singh
2020-02-02 18:43         ` Jakub Narebski
2020-01-21 23:40 ` Emily Shaffer
2020-01-27 18:24   ` Garima Singh
2020-02-01 23:32   ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 00/11] " Garima Singh via GitGitGadget
2020-02-05 22:56   ` [PATCH v2 01/11] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-02-09 12:39     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-02-15 17:17     ` Jakub Narebski
2020-02-16 16:49     ` Jakub Narebski
2020-02-22  0:32       ` Garima Singh [this message]
2020-02-23 13:38         ` Jakub Narebski
2020-02-24 17:34           ` Garima Singh
2020-02-24 18:20             ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 03/11] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-02-17  0:00     ` Jakub Narebski
2020-02-22  0:37       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-02-17 21:56     ` Jakub Narebski
2020-02-22  0:55       ` Garima Singh
2020-02-23 17:34         ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-02-18 17:59     ` Jakub Narebski
2020-02-24 18:29       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 06/11] commit-graph: examine commits by generation number Derrick Stolee via GitGitGadget
2020-02-19  0:32     ` Jakub Narebski
2020-02-24 20:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 07/11] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-02-19 15:13     ` Jakub Narebski
2020-02-24 21:14       ` Garima Singh
2020-02-25 11:40         ` Jakub Narebski
2020-02-25 15:58           ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 08/11] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-02-20 18:48     ` Jakub Narebski
2020-02-24 21:45       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-02-20 20:28     ` Jakub Narebski
2020-02-24 21:51       ` Garima Singh
2020-02-25 12:10         ` Jakub Narebski
2020-02-20 22:10     ` Bryan Turner
2020-02-22  1:44       ` Garima Singh
2020-02-05 22:56   ` [PATCH v2 10/11] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-02-21 17:31     ` Jakub Narebski
2020-02-21 22:45     ` Jakub Narebski
2020-02-05 22:56   ` [PATCH v2 11/11] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-02-22  0:11     ` Jakub Narebski
2020-02-07 13:52   ` [PATCH v2 00/11] Changed Paths Bloom Filters SZEDER Gábor
2020-02-07 15:09     ` Garima Singh
2020-02-07 15:36       ` Derrick Stolee
2020-02-07 16:15         ` SZEDER Gábor
2020-02-07 16:33           ` Derrick Stolee
2020-02-11 19:08       ` Garima Singh
2020-02-08 23:04   ` Jakub Narebski
2020-02-21 17:41     ` Garima Singh
2020-03-29 18:36       ` Junio C Hamano
2020-03-30  0:31   ` [PATCH v3 00/16] " Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 01/16] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 02/16] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 03/16] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 04/16] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 05/16] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 06/16] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 07/16] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 08/16] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 09/16] diff: skip batch object download when possible Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 10/16] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 11/16] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 12/16] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 13/16] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 14/16] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 15/16] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-03-30  0:31     ` [PATCH v3 16/16] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-06 16:59     ` [PATCH v4 00/15] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 01/15] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 02/15] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 03/15] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 04/15] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-06-27 15:53         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 05/15] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-08-04 14:47         ` SZEDER Gábor
2020-08-04 16:25           ` Derrick Stolee
2020-08-04 17:00             ` SZEDER Gábor
2020-08-04 17:31               ` Derrick Stolee
2020-08-05 17:08                 ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 06/15] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 07/15] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 08/15] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-05-29  8:57         ` SZEDER Gábor
2020-05-29 13:35           ` Derrick Stolee
2020-05-31 17:23             ` SZEDER Gábor
2020-07-09 17:00         ` [PATCH] commit-graph: fix "Writing out commit graph" progress counter SZEDER Gábor
2020-07-09 18:01           ` Derrick Stolee
2020-07-09 18:20             ` Derrick Stolee
2020-04-06 16:59       ` [PATCH v4 10/15] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-06-19 14:02         ` SZEDER Gábor
2020-06-19 19:28           ` Junio C Hamano
2020-07-27 21:33         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 11/15] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-06-07 22:21         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 12/15] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-06-26  6:34         ` SZEDER Gábor
2020-04-06 16:59       ` [PATCH v4 13/15] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 14/15] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-04-06 16:59       ` [PATCH v4 15/15] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-08 15:51       ` [PATCH v4 00/15] Changed Paths Bloom Filters Derrick Stolee
2020-04-08 19:21         ` Junio C Hamano
2020-04-08 20:05         ` Jakub Narębski
2020-04-12 20:34         ` Taylor Blau
2020-03-05 19:49 ` [PATCH 0/9] [RFC] " Garima Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6c08b27-18d7-cf30-c076-3f6451a21519@gmail.com \
    --to=garimasigit@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=emilyshaffer@gmail.com \
    --cc=garima.singh@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jnareb@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    --subject='Re: [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox