From: Elijah Newren <newren@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Derrick Stolee <dstolee@microsoft.com>,
Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames
Date: Mon, 1 Mar 2021 08:39:07 -0800 [thread overview]
Message-ID: <CABPp-BEVvfMLGFthb5EQpYORSe9BifQRNb40+mgcg=bGCABc5Q@mail.gmail.com> (raw)
In-Reply-To: <pull.845.git.1614484707.gitgitgadget@gmail.com>
On Sat, Feb 27, 2021 at 7:58 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series depends textually on ort-perf-batch-8, but semantically it's
> almost completely unrelated and can be reviewed without any familiarity with
> any of the previous patch series.
>
> === Basic Optimization idea ===
>
> This series determines paths which meet special cases where detection of
> renames is irrelevant, where the irrelevance is due to the fact that the
> merge machinery will arrive at the same result regardless of whether a
> rename is detected for any of those paths. This series represents
> "Optimization #2" from my Git Merge 2020 talk[1], though this series has
> some improvements on the optimization relative to what I had at that time.
>
> The basic idea here is that if side A of history:
>
> * only modifies/adds/deletes a few files
> * adds new files to few if any of the directories that side B deleted or
> renamed
>
> then when we do rename detection on side B we can avoid even looking at most
> (and perhaps even all) paths that side B deleted. Since commits being
> rebased or cherry-picked tend to only modify a few files, this optimization
> tends to be particularly effective for rebases and cherry-picks.
>
> Basing rename detection on what the other side of history did to a file
> means that extra information needs to be fed from merge-ort to
> diffcore-rename in order to take advantage of such an optimization.
>
> === Comparison to previous series ===
>
> This series differs from my two previous optimizations[2][3] (focusing on
> basename-guided rename detection) in two important aspects:
>
> * there are no behavioral changes (there is no heuristic involved)
>
> * this optimization is merge specific (it does not help the diff/status/log
> family of commands, just merge/rebase/cherry-pick and such)
>
> === Results ===
>
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
>
> Before Series After Series
> no-renames: 12.596 s ± 0.061 s 5.680 s ± 0.096 s
> mega-renames: 130.465 s ± 0.259 s 13.812 s ± 0.162 s
> just-one-mega: 3.958 s ± 0.010 s 506.0 ms ± 3.9 ms
>
>
> However, interestingly, if we had ignored the basename-guided rename
> detection optimizations[2][3], then this optimization series would have
> improved the performance as follows:
>
> Before Basename Series After Just This Series
> no-renames: 13.815 s ± 0.062 s 5.728 s ± 0.104 s
> mega-renames: 1799.937 s ± 0.493 s 18.213 s ± 0.139 s
> just-one-mega 51.289 s ± 0.019 s 891.9 ms ± 7.0 ms
>
>
> As a reminder, before any merge-ort/diffcore-rename performance work, the
> performance results we started with (as noted in the same commit message)
> were:
>
> no-renames-am: 6.940 s ± 0.485 s
> no-renames: 18.912 s ± 0.174 s
> mega-renames: 5964.031 s ± 10.459 s
> just-one-mega: 149.583 s ± 0.751 s
>
>
> === Competition between optimizations ===
>
> We now have three major rename-related optimizations:
>
> * exact rename detection
> * basename-guided rename detection[2][3]
> * skip-because-unnecessary (this series)
>
> It is possible for all three to potentially apply for specific paths (they
> do for the majority of renamed paths in our testcases), but we cannot use
> more than one for any given path. It turns out that the priority we give
> each optimization is very important and can drastically affect performance.
> We get best results by prioritizing them as follows:
>
> 1. exact rename detection
> 2. skip-because-unnecessary
> 3. basename-guided rename detection
>
> The third-to-last patch of this series also discusses this ordering and
> another minor variant of the skip-because-unnecessary optimization that was
> tried (and resulted in less effective performance gains than reported here),
> as well as some of the preparatory work over the past few years that this
> series relies on in order to enable this optimization.
Oops. When I restructured the series I carefully re-read the commit
messages to make sure I didn't forget to update one...but I apparently
forgot to update the cover letter. The discussion was actually split
across a few patches by the refactoring, and what is now the
third-to-last patch doesn't contain any of that discussion.
> === Near optimal? ===
>
> You may remember that there was a row labelled "everything else" from the
> commit message of 557ac0350d that represented the maximum possible speed-up
> from accelerating rename detection alone; as stated in that commit, those
> rows represented how fast the code could be if we had somehow infinitely
> parallelized the inexact rename detection. However, if you compare those
> "maximum speedup" numbers to what we have above, you'll note that the
> infinitely parallelized inexact rename detection would have been slightly
> slower than the results we have now achieved. (The reason this is possible,
> despite the fact that we still spend time in rename detection after our
> optimizations, is because we implemented two optimizations outside of
> diffcore_rename() along the way.) However, this good news does also come
> with a downside -- it means that our remaining optimization potential is
> somewhat limited, and subsequent optimization series will have to fight for
> much smaller gains.
>
> [1]
> https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf
> [2]
> https://lore.kernel.org/git/pull.843.git.1612651937.gitgitgadget@gmail.com/
> [3]
> https://lore.kernel.org/git/pull.844.git.1613289544.gitgitgadget@gmail.com/
>
> Elijah Newren (8):
> diffcore-rename: enable filtering possible rename sources
> merge-ort: precompute subset of sources for which we need rename
> detection
> merge-ort: add data structures for an alternate tree traversal
> merge-ort: introduce wrappers for alternate tree traversal
> merge-ort: precompute whether directory rename detection is needed
> merge-ort: use relevant_sources to filter possible rename sources
> merge-ort: skip rename detection entirely if possible
> diffcore-rename: avoid doing basename comparisons for irrelevant
> sources
>
> diffcore-rename.c | 63 ++++++++++---
> diffcore.h | 1 +
> merge-ort.c | 236 +++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 285 insertions(+), 15 deletions(-)
>
>
> base-commit: 4be565c472088d4144063b736308bf2a57331f45
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-845%2Fnewren%2Fort-perf-batch-9-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-845/newren/ort-perf-batch-9-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/845
> --
> gitgitgadget
next prev parent reply other threads:[~2021-03-01 16:44 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-28 3:58 [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-01 16:39 ` Elijah Newren [this message]
2021-03-04 7:54 ` [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren
2021-03-09 0:09 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-09 22:21 ` Derrick Stolee
2021-03-09 22:40 ` Elijah Newren
2021-03-09 22:45 ` Derrick Stolee
2021-03-09 0:09 ` [PATCH v2 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-09 23:06 ` Derrick Stolee
2021-03-09 0:09 ` [PATCH v2 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-09 22:51 ` Derrick Stolee
2021-03-09 22:57 ` Elijah Newren
2021-03-09 0:09 ` [PATCH v2 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-09 22:08 ` [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-10 15:08 ` Derrick Stolee
2021-03-11 0:38 ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-15 13:57 ` [PATCH v3 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-15 17:10 ` [PATCH v2 " Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABPp-BEVvfMLGFthb5EQpYORSe9BifQRNb40+mgcg=bGCABc5Q@mail.gmail.com' \
--to=newren@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jonathantanmy@google.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).