From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com>, peff@peff.net, stolee@gmail.com, avarab@gmail.com, szeder.dev@gmail.com Subject: [PATCH 0/2] Per-commit filter proof of concept Date: Wed, 10 Oct 2018 18:21:19 -0700 [thread overview] Message-ID: <cover.1539219248.git.jonathantanmy@google.com> (raw) In-Reply-To: <20181009193445.21908-1-szeder.dev@gmail.com> I did my own experiments on top of what Szeder provided - the first patch is to have one fixed-size bloom filter per commit, and the second patch makes that bloom filter apply to only the first parent of each commit. The results are: Original (szeder's) $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) time ./git commit-graph write 0:10.28 $ ls -l .git/objects/info/bloom 8388616 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 66459 definitely not: 65276 maybe: 1183 false positives: 297 fp ratio: 0.004469 0:00.24 With patch 1 $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write 0:16.22 $ ls -l .git/objects/info/bloom 1832620 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 66459 definitely not: 46637 maybe: 19822 false positives: 18936 fp ratio: 0.284928 0:01.53 With patch 2 $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write 0:06.70 $ ls -l .git/objects/info/bloom 1832620 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 53096 definitely not: 52989 maybe: 107 false positives: 89 fp ratio: 0.001676 0:01.29 For comparison, a non-GIT_USE_POC_BLOOM_FILTER rev-list takes 3.517 seconds. I haven't investigated why patch 1 takes longer than the original to create the bloom filter. Using per-commit filters and restricting the bloom filter to a single parent increases the relative power of the filter in omitting tree inspections compared to the original (107/53096 vs 1183/66459), but the lack of coverage w.r.t. the non-first parents had a more significant effect than I thought (1.29s vs .24s). It might be best to have one filter for each (commit, parent) pair (or, at least, the first two parents of each commit - we probably don't need to care that much about octopus merges) - this would take up more disk space than if we only store filters for the first parent, but is still less than the original example of storing information for all commits in one filter. There are more possibilities like dynamic filter sizing, different hashing, and hashing to support wildcard matches, which I haven't looked into. Jonathan Tan (2): One filter per commit Only make bloom filter for first parent bloom-filter.c | 31 ++++++++++++++++++------------- bloom-filter.h | 12 ++++-------- commit-graph.c | 30 ++++++++++++++---------------- revision.c | 29 +++++++++++++++-------------- 4 files changed, 51 insertions(+), 51 deletions(-) -- 2.19.0.271.gfe8321ec05.dirty
next prev parent reply other threads:[~2018-10-11 1:21 UTC|newest] Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-10-03 13:23 We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason 2018-10-03 13:36 ` SZEDER Gábor 2018-10-03 13:42 ` Derrick Stolee 2018-10-03 14:18 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:01 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:17 ` SZEDER Gábor 2018-10-03 14:22 ` Ævar Arnfjörð Bjarmason 2018-10-03 14:53 ` SZEDER Gábor 2018-10-03 15:19 ` Ævar Arnfjörð Bjarmason 2018-10-03 16:59 ` SZEDER Gábor 2018-10-05 6:09 ` Junio C Hamano 2018-10-10 22:07 ` SZEDER Gábor 2018-10-10 23:01 ` Ævar Arnfjörð Bjarmason 2018-10-03 19:08 ` Stefan Beller 2018-10-03 19:21 ` Jeff King 2018-10-03 20:35 ` Ævar Arnfjörð Bjarmason 2018-10-03 17:47 ` Stefan Beller 2018-10-03 18:47 ` Ævar Arnfjörð Bjarmason 2018-10-03 18:51 ` Jeff King 2018-10-03 18:59 ` Derrick Stolee 2018-10-03 19:18 ` Jeff King 2018-10-08 16:41 ` SZEDER Gábor 2018-10-08 16:57 ` Derrick Stolee 2018-10-08 18:10 ` SZEDER Gábor 2018-10-08 18:29 ` Derrick Stolee 2018-10-09 3:08 ` Jeff King 2018-10-09 13:48 ` Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph) Derrick Stolee 2018-10-09 18:45 ` Ævar Arnfjörð Bjarmason 2018-10-09 18:46 ` Jeff King 2018-10-09 19:03 ` Derrick Stolee 2018-10-09 21:14 ` Jeff King 2018-10-09 23:12 ` Bloom Filters Jeff King 2018-10-09 23:13 ` [PoC -- do not apply 1/3] initial tree-bitmap proof of concept Jeff King 2018-10-09 23:14 ` [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode Jeff King 2018-10-10 0:48 ` Junio C Hamano 2018-10-11 3:13 ` Jeff King 2018-10-09 23:14 ` [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding Jeff King 2018-10-10 0:58 ` Junio C Hamano 2018-10-11 3:20 ` Jeff King 2018-10-11 12:33 ` Bloom Filters Derrick Stolee 2018-10-11 13:43 ` Jeff King 2018-10-09 21:30 ` We should add a "git gc --auto" after "git clone" due to commit graph SZEDER Gábor 2018-10-09 19:34 ` [PATCH 0/4] Bloom filter experiment SZEDER Gábor 2018-10-09 19:34 ` [PATCH 1/4] Add a (very) barebones Bloom filter implementation SZEDER Gábor 2018-10-09 19:34 ` [PATCH 2/4] commit-graph: write a Bloom filter containing changed paths for each commit SZEDER Gábor 2018-10-09 21:06 ` Jeff King 2018-10-09 21:37 ` SZEDER Gábor 2018-10-09 19:34 ` [PATCH 3/4] revision.c: use the Bloom filter to speed up path-limited revision walks SZEDER Gábor 2018-10-09 19:34 ` [PATCH 4/4] revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics SZEDER Gábor 2018-10-09 19:47 ` [PATCH 0/4] Bloom filter experiment Derrick Stolee 2018-10-11 1:21 ` Jonathan Tan [this message] 2018-10-11 1:21 ` [PATCH 1/2] One filter per commit Jonathan Tan 2018-10-11 12:49 ` Derrick Stolee 2018-10-11 19:11 ` [PATCH] Per-commit and per-parent filters for 2 parents Jonathan Tan 2018-10-11 1:21 ` [PATCH 2/2] Only make bloom filter for first parent Jonathan Tan 2018-10-11 7:37 ` [PATCH 0/2] Per-commit filter proof of concept Ævar Arnfjörð Bjarmason 2018-10-15 14:39 ` [PATCH 0/4] Bloom filter experiment Derrick Stolee 2018-10-16 4:45 ` Junio C Hamano 2018-10-16 11:13 ` Derrick Stolee 2018-10-16 12:57 ` Ævar Arnfjörð Bjarmason 2018-10-16 13:03 ` Derrick Stolee 2018-10-18 2:00 ` Junio C Hamano 2018-10-16 23:41 ` Jonathan Tan 2018-10-08 23:02 ` We should add a "git gc --auto" after "git clone" due to commit graph Junio C Hamano 2018-10-03 14:32 ` Duy Nguyen 2018-10-03 16:45 ` Duy Nguyen 2018-10-04 21:42 ` [RFC PATCH] " Ævar Arnfjörð Bjarmason 2018-10-05 12:05 ` Derrick Stolee 2018-10-05 13:05 ` Ævar Arnfjörð Bjarmason 2018-10-05 13:45 ` Derrick Stolee 2018-10-05 14:04 ` Ævar Arnfjörð Bjarmason 2018-10-05 19:21 ` Jeff King 2018-10-05 19:41 ` Derrick Stolee 2018-10-05 19:47 ` Jeff King 2018-10-05 20:00 ` Derrick Stolee 2018-10-05 20:02 ` Jeff King 2018-10-05 20:01 ` Ævar Arnfjörð Bjarmason 2018-10-05 20:09 ` Jeff King
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=cover.1539219248.git.jonathantanmy@google.com \ --to=jonathantanmy@google.com \ --cc=avarab@gmail.com \ --cc=git@vger.kernel.org \ --cc=peff@peff.net \ --cc=stolee@gmail.com \ --cc=szeder.dev@gmail.com \ --subject='Re: [PATCH 0/2] Per-commit filter proof of concept' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).