From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: philipoakley@iee.email, peff@peff.net, Derrick Stolee <dstolee@microsoft.com>, Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2] bloom: ignore renames when computing changed paths
Date: Thu, 09 Apr 2020 13:00:11 +0000
Message-ID: <pull.601.v2.git.1586437211842.gitgitgadget@gmail.com>
In-Reply-To: <pull.601.git.1586363907252.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The changed-path Bloom filters record an entry in the filter for
every path that was changed. This includes every add and delete,
regardless of whether a rename was detected. Detecting renames
causes significant performance issues, but also will trigger
downloading missing blobs in partial clone.

The simple fix is to disable rename detection when computing a
changed-path Bloom filter. This should already be disabled by
default, but it is good to explicitly enforce the intended
behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    bloom: ignore renames when computing changed paths

    I promised [1] I would adapt the commit that was dropped from
    gs/commit-graph-path-filter [2] on top of gs/commit-graph-path-filter
    and jt/avoid-prefetch-when-able-in-diff.

    However, I noticed that the change was extremely simple and has value
    without basing it on jt/avoid-prefetch-when-able-in-diff.

    This change applied to gs/commit-graph-path-filter has obvious CPU time
    improvements for computing changed-path Bloom filters (that I did not
    measure). The partial clone improvements require
    jt/avoid-prefetch-when-able-in-diff to be included, too, but the code
    does not depend on it at compile time.

    Thanks, -Stolee

    [1] https://lore.kernel.org/git/7de2f54b-8704-a0e1-12aa-0ca9d3d70f6f@gmail.com/
    [2] https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-601%2Fderrickstolee%2Fdiff-and-bloom-filters-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-601/derrickstolee/diff-and-bloom-filters-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/601

Range-diff vs v1:

 1:  5fae00adcf0 ! 1:  f4df00a0dd4 bloom: ignore renames when computing changed paths
     @@ Commit message

          The changed-path Bloom filters record an entry in the filter for
          every path that was changed. This includes every add and delete,
     -    regardless of whther a rename was detected. Detecting renames
     +    regardless of whether a rename was detected. Detecting renames
          causes significant performance issues, but also will trigger
          downloading missing blobs in partial clone.

          The simple fix is to disable rename detection when computing a
     -    changed-path Bloom filter.
     +    changed-path Bloom filter. This should already be disabled by
     +    default, but it is good to explicitly enforce the intended
     +    behavior.

          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

 bloom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/bloom.c b/bloom.c
index c5b461d1cfe..dd9bab9bbd6 100644
--- a/bloom.c
+++ b/bloom.c
@@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,

 	repo_diff_setup(r, &diffopt);
 	diffopt.flags.recursive = 1;
+	diffopt.detect_rename = 0;
 	diffopt.max_changes = max_changes;
 	diff_setup_done(&diffopt);

base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f
--
gitgitgadget
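
The commit message's point that the filter records every add and delete
regardless of rename detection can be illustrated with a toy model. The
sketch below is not Git's bloom.c: the names toy_filter, toy_change,
toy_add, toy_maybe_contains and toy_filter_from_changes are hypothetical,
and the 64-bit, two-probe filter with an FNV-1a hash is an assumption
chosen for brevity rather than Git's actual filter layout or hash. It only
shows that a rename reported as a delete/add pair already puts both paths
into the filter, so pairing them up as a rename would add nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy Bloom filter: 64 bits, two probes. Hypothetical, not Git's format. */
struct toy_filter {
	uint64_t bits;
};

/* One changed path as a diff without rename detection would report it. */
struct toy_change {
	char status;       /* 'A' (add), 'D' (delete) or 'M' (modify) */
	const char *path;
};

/* 32-bit FNV-1a with a seed mixed in, standing in for Git's real hash. */
static uint32_t toy_hash(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Bit mask of the two probe positions for a path. */
static uint64_t toy_mask(const char *path)
{
	assert(path != NULL);
	return (UINT64_C(1) << (toy_hash(path, 0) % 64)) |
	       (UINT64_C(1) << (toy_hash(path, 0x9e3779b9u) % 64));
}

static void toy_add(struct toy_filter *f, const char *path)
{
	f->bits |= toy_mask(path);
}

/* Returns 1 for "maybe changed" (false positives possible), 0 for "no". */
static int toy_maybe_contains(const struct toy_filter *f, const char *path)
{
	uint64_t mask = toy_mask(path);

	return (f->bits & mask) == mask;
}

/* Every changed path goes in, whatever its status letter is. */
static struct toy_filter toy_filter_from_changes(const struct toy_change *c,
						 size_t n)
{
	struct toy_filter f = { 0 };

	for (size_t i = 0; i < n; i++)
		toy_add(&f, c[i].path);
	return f;
}
```

Feeding this a rename of old.c to new.c as the pair { 'D', "old.c" },
{ 'A', "new.c" } yields a filter that answers "maybe" for both paths,
which is all a later path-limited history walk needs from it.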