git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Elijah Newren <newren@gmail.com>
Subject: [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames
Date: Wed, 24 Mar 2021 21:32:26 +0000	[thread overview]
Message-ID: <pull.859.git.1616621553.gitgitgadget@gmail.com> (raw)

This series builds on ort-readiness, but it's semantically orthogonal to the
previous series and represents a new independent optimization.

=== Basic Optimization idea ===

This series avoids repeatedly detecting the same renames in a sequence of
merges such as a rebase or cherry-pick of several commits. When there are
many renames between the old base and the new base, traditionally all those
renames are re-detected for every commit that is transplanted. This
optimization avoids redoing that work.

This represents "Optimization #4" from my Git Merge 2020 talk[1]; the
details are a bit more involved than I realized at the time, but the high
level idea is the same.

=== Comparison to previous series ===

I previously noted that we had three major rename-related optimizations:

 * exact rename detection (applies when unmodified on renamed side)
 * skip-because-irrelevant (applies when unmodified on unrenamed side)
 * basename-guided rename detection (applies when basename unchanged)

This one adds a fourth (remember-renames), with some interesting properties:

 * unlike basename-guided rename detection, there are no behavioral changes
   (there is no heuristic involved)[2].

 * like skip-because-irrelevant, this optimization does not apply to all git
   commands using the rename machinery. In fact, this one is even more
   restrictive since it is ONLY useful for rebases and cherry-picks (not
   even merges), and only for second and later commits in a linear series.

 * unlike the three previous optimizations, there are no requirements about
   the types of changes done to the file; it just caches renames on the
   "upstream" side of history for subsequent commit picking.

It's also worth noting despite wording about "remembering" or "caching"
renames, that this optimization does NOT write this cache to disk; it's an
in-memory only cache. When the rebase or cherry-pick completes (or hits a
conflict and stops), the cache is discarded.

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:        5.665 s ±  0.129 s     5.624 s ±  0.077 s 
mega-renames:     11.435 s ±  0.158 s    10.213 s ±  0.032 s
just-one-mega:   494.2  ms ±  6.1  ms   497.6  ms ±  5.3  ms


By design, this optimization could not help the just-one-mega testcase. The
gains for the other two testcases may look somewhat smaller than one would
expect given the description (only ~10% for the mega-renames testcase), but
the point was to spend less time detecting renames...and there just wasn't
that much time spent in renames for these testcases before this series for
us to remove. However, if we undid the basename-guided rename detection and
skip-because-unnecessary optimizations, then this series alone would have
improved performance as follows:

                   Before Basename Series   After Just This Series
    no-renames:      13.815 s ±  0.062 s      5.814 s ±  0.094 s
    mega-renames:  1799.937 s ±  0.493 s    303.225 s ±  1.330 s


Showing that this optimization has the ability to improve things when the
other optimizations do not apply. In fact, when I originally implemented
this optimization, it improved the mega-renames testcase by a factor of 2
(at the time, I did not have all the optimizations from ort-perf-batch-7
thru ort-perf-batch-10 in their current shape).

As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Further discussion of results ===

If we change our focus from absolute time taken, to the percentage of
overall time spent on rename detection, then we find the following picture
comparing our starting point at the beginning of the performance work to
what we achieve at the end of this series:

       Percentage of time spent on rename detection
   
                  commit 557ac0350d      After this Series
no-renames:             39.4%                   0.2%
mega-renames:           96.6%                   2.7%
just-one-mega:          95.0%                   9.0%


Since this optimization is only applicable for the first two testcases
(because the third only involves rebasing a single commit), even if this
optimization had removed all the remaining time in rename detection it would
have only sped it up the mega-renames case by ~12% rather than the 10% it
achieved. This table makes it clear that our attempts to accelerate rename
detection have succeeded, and any further work to accelerate merges needs to
start concentrating on other areas.

[1]
https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

[2] Well, almost no changes. There's technically a very narrow way that this
could change the behavior; see the really long "Technically," bullet point
in patch 3 for discussion of this.

Elijah Newren (7):
  merge-ort: add data structures for in-memory caching of rename
    detection
  merge-ort: populate caches of rename detection results
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: avoid accidental API mis-use
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: add helper functions for using cached renames
  merge-ort, diffcore-rename: employ cached renames when possible

 diffcore-rename.c |  22 +++-
 diffcore.h        |   3 +-
 merge-ort.c       | 273 +++++++++++++++++++++++++++++++++++++++++++---
 merge-ort.h       |   2 +
 4 files changed, 282 insertions(+), 18 deletions(-)


base-commit: c2d2a1ccaea70b7dc8c0539ba9d3a132f9687040
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-859%2Fnewren%2Fort-perf-batch-11-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-859/newren/ort-perf-batch-11-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/859
-- 
gitgitgadget

             reply	other threads:[~2021-03-24 21:33 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-24 21:32 Elijah Newren via GitGitGadget [this message]
2021-03-24 21:32 ` [PATCH 1/7] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 2/7] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 3/7] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 4/7] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 5/7] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 6/7] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 7/7] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-03-24 22:04 ` [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames Junio C Hamano
2021-03-24 23:25   ` Elijah Newren
2021-03-25 18:59     ` Junio C Hamano
2021-03-29 22:34       ` Elijah Newren
2021-03-30 12:07         ` Derrick Stolee
2021-05-04  2:12 ` [PATCH v2 00/13] " Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-17 13:32     ` Derrick Stolee
2021-05-18  3:42       ` Elijah Newren
2021-05-18 13:54         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 06/13] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-05-17 13:41     ` Derrick Stolee
2021-05-18  3:55       ` Elijah Newren
2021-05-18 13:57         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-17 13:51     ` Derrick Stolee
2021-05-20  0:48       ` Elijah Newren
2021-05-04  2:12   ` [PATCH v2 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-17 14:01     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-17 14:10     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-17 14:16     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-17 14:23     ` Derrick Stolee
2021-05-20  0:36       ` Elijah Newren
2021-05-22 11:17         ` Derrick Stolee
2021-05-14 17:37   ` [PATCH v2 00/13] Optimization batch 11: avoid repeatedly detecting same renames Elijah Newren
2021-05-14 21:04     ` Derrick Stolee
2021-05-20  6:09   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-20 11:32       ` Bagas Sanjaya
2021-05-20 15:14         ` Kerry, Richard
2021-05-20 16:34         ` Elijah Newren
2021-05-20  6:09     ` [PATCH v3 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 06/13] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-22 11:17     ` [PATCH v3 00/13] Optimization batch 11: avoid repeatedly detecting same renames Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.859.git.1616621553.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --subject='Re: [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox