git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames
Date: Mon, 1 Mar 2021 08:39:07 -0800	[thread overview]
Message-ID: <CABPp-BEVvfMLGFthb5EQpYORSe9BifQRNb40+mgcg=bGCABc5Q@mail.gmail.com> (raw)
In-Reply-To: <pull.845.git.1614484707.gitgitgadget@gmail.com>

On Sat, Feb 27, 2021 at 7:58 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series depends textually on ort-perf-batch-8, but semantically it's
> almost completely unrelated and can be reviewed without any familiarity with
> any of the previous patch series.
>
> === Basic Optimization idea ===
>
> This series determines paths which meet special cases where detection of
> renames is irrelevant, where the irrelevance is due to the fact that the
> merge machinery will arrive at the same result regardless of whether a
> rename is detected for any of those paths. This series represents
> "Optimization #2" from my Git Merge 2020 talk[1], though this series has
> some improvements on the optimization relative to what I had at that time.
>
> The basic idea here is that if side A of history:
>
>  * only modifies/adds/deletes a few files
>  * adds new files to few if any of the directories that side B deleted or
>    renamed
>
> then when we do rename detection on side B we can avoid even looking at most
> (and perhaps even all) paths that side B deleted. Since commits being
> rebased or cherry-picked tend to only modify a few files, this optimization
> tends to be particularly effective for rebases and cherry-picks.
>
> Basing rename detection on what the other side of history did to a file
> means that extra information needs to be fed from merge-ort to
> diffcore-rename in order to take advantage of such an optimization.
>
> === Comparison to previous series ===
>
> This series differs from my two previous optimizations[2][3] (focusing on
> basename-guided rename detection) in two important aspects:
>
>  * there are no behavioral changes (there is no heuristic involved)
>
>  * this optimization is merge specific (it does not help the diff/status/log
>    family of commands, just merge/rebase/cherry-pick and such)
>
> === Results ===
>
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
>
>                      Before Series           After Series
> no-renames:       12.596 s ±  0.061 s     5.680 s ±  0.096 s
> mega-renames:    130.465 s ±  0.259 s    13.812 s ±  0.162 s
> just-one-mega:     3.958 s ±  0.010 s   506.0  ms ±  3.9  ms
>
>
> However, interestingly, if we had ignored the basename-guided rename
> detection optimizations[2][3], then this optimization series would have
> improved the performance as follows:
>
>                Before Basename Series   After Just This Series
> no-renames:      13.815 s ±  0.062 s      5.728 s ±  0.104 s
> mega-renames:  1799.937 s ±  0.493 s     18.213 s ±  0.139 s
> just-one-mega    51.289 s ±  0.019 s    891.9  ms ±  7.0  ms
>
>
> As a reminder, before any merge-ort/diffcore-rename performance work, the
> performance results we started with (as noted in the same commit message)
> were:
>
> no-renames-am:      6.940 s ±  0.485 s
> no-renames:        18.912 s ±  0.174 s
> mega-renames:    5964.031 s ± 10.459 s
> just-one-mega:    149.583 s ±  0.751 s
>
>
> === Competition between optimizations ===
>
> We now have three major rename-related optimizations:
>
>  * exact rename detection
>  * basename-guided rename detection[2][3]
>  * skip-because-unnecessary (this series)
>
> It is possible for all three to potentially apply for specific paths (they
> do for the majority of renamed paths in our testcases), but we cannot use
> more than one for any given path. It turns out that the priority we give
> each optimization is very important and can drastically affect performance.
> We get best results by prioritizing them as follows:
>
>  1. exact rename detection
>  2. skip-because-unnecessary
>  3. basename-guided rename detection
>
> The third-to-last patch of this series also discusses this ordering and
> another minor variant of the skip-because-unnecessary optimization that was
> tried (and resulted in less effective performance gains than reported here),
> as well as some of the preparatory work over the past few years that this
> series relies on in order to enable this optimization.

Oops.  When I restructured the series I carefully re-read the commit
messages to make sure I didn't forget to update one...but I apparently
forgot to update the cover letter.  The discussion was actually split
across a few patches by the refactoring, and what is now the
third-to-last patch doesn't contain any of that discussion.

> === Near optimal? ===
>
> You may remember that there was a row labelled "everything else" from the
> commit message of 557ac0350d that represented the maximum possible speed-up
> from accelerating rename detection alone; as stated in that commit, those
> rows represented how fast the code could be if we had somehow infinitely
> parallelized the inexact rename detection. However, if you compare those
> "maximum speedup" numbers to what we have above, you'll note that the
> infinitely parallelized inexact rename detection would have been slightly
> slower than the results we have now achieved. (The reason this is possible,
> despite the fact that we still spend time in rename detection after our
> optimizations, is because we implemented two optimizations outside of
> diffcore_rename() along the way.) However, this good news does also come
> with a downside -- it means that our remaining optimization potential is
> somewhat limited, and subsequent optimization series will have to fight for
> much smaller gains.
>
> [1]
> https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf
> [2]
> https://lore.kernel.org/git/pull.843.git.1612651937.gitgitgadget@gmail.com/
> [3]
> https://lore.kernel.org/git/pull.844.git.1613289544.gitgitgadget@gmail.com/
>
> Elijah Newren (8):
>   diffcore-rename: enable filtering possible rename sources
>   merge-ort: precompute subset of sources for which we need rename
>     detection
>   merge-ort: add data structures for an alternate tree traversal
>   merge-ort: introduce wrappers for alternate tree traversal
>   merge-ort: precompute whether directory rename detection is needed
>   merge-ort: use relevant_sources to filter possible rename sources
>   merge-ort: skip rename detection entirely if possible
>   diffcore-rename: avoid doing basename comparisons for irrelevant
>     sources
>
>  diffcore-rename.c |  63 ++++++++++---
>  diffcore.h        |   1 +
>  merge-ort.c       | 236 +++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 285 insertions(+), 15 deletions(-)
>
>
> base-commit: 4be565c472088d4144063b736308bf2a57331f45
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-845%2Fnewren%2Fort-perf-batch-9-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-845/newren/ort-perf-batch-9-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/845
> --
> gitgitgadget

  parent reply	other threads:[~2021-03-01 16:44 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-28  3:58 [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-01 16:39 ` Elijah Newren [this message]
2021-03-04  7:54 ` [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren
2021-03-09  0:09 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-09 22:21     ` Derrick Stolee
2021-03-09 22:40       ` Elijah Newren
2021-03-09 22:45         ` Derrick Stolee
2021-03-09  0:09   ` [PATCH v2 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-09 23:06     ` Derrick Stolee
2021-03-09  0:09   ` [PATCH v2 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-09 22:51     ` Derrick Stolee
2021-03-09 22:57       ` Elijah Newren
2021-03-09  0:09   ` [PATCH v2 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-09 22:08   ` [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-10 15:08   ` Derrick Stolee
2021-03-11  0:38   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-15 13:57     ` [PATCH v3 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-15 17:10   ` [PATCH v2 " Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BEVvfMLGFthb5EQpYORSe9BifQRNb40+mgcg=bGCABc5Q@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).