[PATCH 0/9] [RFC] Changed Paths Bloom Filters

* [PATCH 0/9] [RFC] Changed Paths Bloom Filters
@ 2019-12-20 22:05 Garima Singh via GitGitGadget
  2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget
                   ` (15 more replies)
  0 siblings, 16 replies; 159+ messages in thread
From: Garima Singh via GitGitGadget @ 2019-12-20 22:05 UTC (permalink / raw)
  To: git; +Cc: stolee, szeder.dev, jonathantanmy, jeffhost, me, peff, Junio C Hamano

Hey! 

The commit graph feature brought in a lot of performance improvements across
multiple commands. However, file based history continues to be a performance
pain point, especially in large repositories. 

Adopting changed path bloom filters has been discussed on the list before,
and a prototype version was worked on by SZEDER Gábor, Jonathan Tan and Dr.
Derrick Stolee [1]. This series is based on Dr. Stolee's approach [2] and
presents an updated and more polished RFC version of the feature. 

Performance Gains: We tested the performance of git log -- path on the git
repo, the linux repo and some internal large repos, with a variety of paths
of varying depths.

On the git and linux repos: We observed a 2x to 5x speed up.

On a large internal repo with files seated 6-10 levels deep in the tree: We
observed 10x to 20x speed ups, with some paths going up to 28 times faster.

Future Work (not included in the scope of this series):

 1. Supporting multiple path based revision walk
 2. Adopting it in git blame logic. 
 3. Interactions with line log git log -L

This series is intended to start the conversation and many of the commit
messages include specific call outs for suggestions and thoughts. 

Cheers! Garima Singh

[1] https://lore.kernel.org/git/20181009193445.21908-1-szeder.dev@gmail.com/
[2] 
https://lore.kernel.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/

Garima Singh (9):
  commit-graph: add --changed-paths option to write
  commit-graph: write changed paths bloom filters
  commit-graph: use MAX_NUM_CHUNKS
  commit-graph: document bloom filter format
  commit-graph: write changed path bloom filters to commit-graph file.
  commit-graph: test commit-graph write --changed-paths
  commit-graph: reuse existing bloom filters during write.
  revision.c: use bloom filters to speed up path based revision walks
  commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag

 Documentation/git-commit-graph.txt            |   5 +
 .../technical/commit-graph-format.txt         |  17 ++
 Makefile                                      |   1 +
 bloom.c                                       | 257 +++++++++++++++++
 bloom.h                                       |  51 ++++
 builtin/commit-graph.c                        |   9 +-
 ci/run-build-and-tests.sh                     |   1 +
 commit-graph.c                                | 116 +++++++-
 commit-graph.h                                |   9 +-
 revision.c                                    |  67 ++++-
 revision.h                                    |   5 +
 t/README                                      |   3 +
 t/helper/test-read-graph.c                    |   4 +
 t/t4216-log-bloom.sh                          |  77 ++++++
 t/t5318-commit-graph.sh                       |   2 +
 t/t5324-split-commit-graph.sh                 |   1 +
 t/t5325-commit-graph-bloom.sh                 | 258 ++++++++++++++++++
 17 files changed, 875 insertions(+), 8 deletions(-)
 create mode 100644 bloom.c
 create mode 100644 bloom.h
 create mode 100755 t/t4216-log-bloom.sh
 create mode 100755 t/t5325-commit-graph-bloom.sh

base-commit: b02fd2accad4d48078671adf38fe5b5976d77304
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-497%2Fgarimasi514%2FcoreGit-bloomFilters-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-497/garimasi514/coreGit-bloomFilters-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/497
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 159+ messages in thread