From: Jonathan Tan <jonathantanmy@google.com> To: stolee@gmail.com Cc: szeder.dev@gmail.com, git@vger.kernel.org, peff@peff.net, gitster@pobox.com, avarab@gmail.com, sbeller@google.com, pclouds@gmail.com, jonathantanmy@google.com Subject: Re: [PATCH 0/4] Bloom filter experiment Date: Tue, 16 Oct 2018 16:41:29 -0700 [thread overview] Message-ID: <20181016234129.194138-1-jonathantanmy@google.com> (raw) In-Reply-To: <61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com> > | Implementation | Queries | Maybe | FP # | FP % | > |----------------|---------|-------|------|-------| > | Szeder | 66095 | 1142 | 256 | 0.38% | > | Jonathan | 66459 | 107 | 89 | 0.16% | > | Stolee | 53025 | 492 | 479 | 0.90% | > > (Note that we must have used different starting points, which is why my > "Queries" is so much smaller.) I suspect it's because your bloom filter implementation covers only the first parent (if I'm understanding get_bloom_filter() correctly). When I only covered the first parent in my initial test (see patch 2 of [1]), I got (following the columns in the table above): 53096 107 89 0.001676 Also, I think that the rejecting power (Queries - Maybe)/(Total tree comparisons if no bloom filters were used) needs to be in the evaluation criteria somewhere, as that indicates how many tree comparisons we managed to avoid. Also, we probably should also test on a file that changes more frequently :-) [1] https://public-inbox.org/git/cover.1539219248.git.jonathantanmy@google.com/ > The increase in false-positive percentage is expected in my > implementation. I'm using the correct filter sizes to hit a <1% FP > ratio. This could be lowered by changing the settings, and the size > would dynamically grow. For my Git repo (which contains > git-for-windows/git and microsoft/git) this implementation grows the > commmit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to > Szeder's 8MB filter). For 105,260 commits, that rounds out to less than > 20 bytes per commit (compared to Jonathan's 256 bytes per commit). Mine has 256 bits per commit, which is 32 bytes per commit (still more than yours). Having said all that, thanks for writing up your version - in particular, variable sized filters (like in yours) seem to be the way to go. > We'll see how much time I have to do this polish, but I think the > benefit is proven. Agreed.
next prev parent reply other threads:[~2018-10-16 23:41 UTC|newest] Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-10-03 13:23 We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason 2018-10-03 13:36 ` SZEDER Gábor 2018-10-03 13:42 ` Derrick Stolee 2018-10-03 14:18 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:01 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:17 ` SZEDER Gábor 2018-10-03 14:22 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:53 ` SZEDER Gábor 2018-10-03 15:19 ` Ævar Arnfjörð Bjarmason 2018-10-03 16:59 ` SZEDER Gábor 2018-10-05 6:09 ` Junio C Hamano 2018-10-10 22:07 ` SZEDER Gábor 2018-10-10 23:01 ` Ævar Arnfjörð Bjarmason 2018-10-03 19:08 ` Stefan Beller 2018-10-03 19:21 ` Jeff King 2018-10-03 20:35 ` Ævar Arnfjörð Bjarmason 2018-10-03 17:47 ` Stefan Beller 2018-10-03 18:47 ` Ævar Arnfjörð Bjarmason 2018-10-03 18:51 ` Jeff King 2018-10-03 18:59 ` Derrick Stolee 2018-10-03 19:18 ` Jeff King 2018-10-08 16:41 ` SZEDER Gábor 2018-10-08 16:57 ` Derrick Stolee 2018-10-08 18:10 ` SZEDER Gábor 2018-10-08 18:29 ` Derrick Stolee 2018-10-09 3:08 ` Jeff King 2018-10-09 13:48 ` Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph) Derrick Stolee 2018-10-09 18:45 ` Ævar Arnfjörð Bjarmason 2018-10-09 18:46 ` Jeff King 2018-10-09 19:03 ` Derrick Stolee 2018-10-09 21:14 ` Jeff King 2018-10-09 23:12 ` Bloom Filters Jeff King 2018-10-09 23:13 ` [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Jeff King 2018-10-09 23:14 ` [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Jeff King 2018-10-10 0:48 ` Junio C Hamano 2018-10-11 3:13 ` Jeff King 2018-10-09 23:14 ` [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Jeff King 2018-10-10 0:58 ` Junio C Hamano 2018-10-11 3:20 ` Jeff King 2018-10-11 12:33 ` Bloom Filters Derrick Stolee 2018-10-11 13:43 ` Jeff King 2018-10-09 21:30 ` We should add a "git gc --auto" after "git clone" due to commit graph SZEDER Gábor 2018-10-09 19:34 ` [PATCH 0/4] Bloom filter experiment SZEDER Gábor 2018-10-09 19:34 ` [PATCH 1/4] Add a (very) barebones Bloom filter implementation SZEDER Gábor 2018-10-09 19:34 ` [PATCH 2/4] commit-graph: write a Bloom filter containing changed paths for each commit SZEDER Gábor 2018-10-09 21:06 ` Jeff King 2018-10-09 21:37 ` SZEDER Gábor 2018-10-09 19:34 ` [PATCH 3/4] revision.c: use the Bloom filter to speed up path-limited revision walks SZEDER Gábor 2018-10-09 19:34 ` [PATCH 4/4] revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics SZEDER Gábor 2018-10-09 19:47 ` [PATCH 0/4] Bloom filter experiment Derrick Stolee 2018-10-11 1:21 ` [PATCH 0/2] Per-commit filter proof of concept Jonathan Tan 2018-10-11 1:21 ` [PATCH 1/2] One filter per commit Jonathan Tan 2018-10-11 12:49 ` Derrick Stolee 2018-10-11 19:11 ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan 2018-10-11 1:21 ` [PATCH 2/2] Only make bloom filter for first parent Jonathan Tan 2018-10-11 7:37 ` [PATCH 0/2] Per-commit filter proof of concept Ævar Arnfjörð Bjarmason 2018-10-15 14:39 ` [PATCH 0/4] Bloom filter experiment Derrick Stolee 2018-10-16 4:45 ` Junio C Hamano 2018-10-16 11:13 ` Derrick Stolee 2018-10-16 12:57 ` Ævar Arnfjörð Bjarmason 2018-10-16 13:03 ` Derrick Stolee 2018-10-18 2:00 ` Junio C Hamano 2018-10-16 23:41 ` Jonathan Tan [this message] 2018-10-08 23:02 ` We should add a "git gc --auto" after "git clone" due to commit graph Junio C Hamano 2018-10-03 14:32 ` Duy Nguyen 2018-10-03 16:45 ` Duy Nguyen 2018-10-04 21:42 ` [RFC PATCH] " Ævar Arnfjörð Bjarmason 2018-10-05 12:05 ` Derrick Stolee 2018-10-05 13:05 ` Ævar Arnfjörð Bjarmason 2018-10-05 13:45 ` Derrick Stolee 2018-10-05 14:04 ` Ævar Arnfjörð Bjarmason 2018-10-05 19:21 ` Jeff King 2018-10-05 19:41 ` Derrick Stolee 2018-10-05 19:47 ` Jeff King 2018-10-05 20:00 ` Derrick Stolee 2018-10-05 20:02 ` Jeff King 2018-10-05 20:01 ` Ævar Arnfjörð Bjarmason 2018-10-05 20:09 ` Jeff King
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20181016234129.194138-1-jonathantanmy@google.com \ --to=jonathantanmy@google.com \ --cc=avarab@gmail.com \ --cc=git@vger.kernel.org \ --cc=gitster@pobox.com \ --cc=pclouds@gmail.com \ --cc=peff@peff.net \ --cc=sbeller@google.com \ --cc=stolee@gmail.com \ --cc=szeder.dev@gmail.com \ --subject='Re: [PATCH 0/4] Bloom filter experiment' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).