From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>,
Jonathan Nieder <jrnieder@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>,
gitster@pobox.com, Elijah Newren <newren@gmail.com>
Subject: [PATCH v4 0/3] And so it begins...merge/rename performance work
Date: Sat, 23 Jan 2021 22:01:09 -0800 [thread overview]
Message-ID: <20210124060112.1258291-1-newren@gmail.com> (raw)
In-Reply-To: <20210115192958.3336755-1-newren@gmail.com>
This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.
Changes since v3:
* Add a couple preliminary patches needed for later performance work,
including fixing a big memory leak (in some series that has already
been merged to master, I should have included a small additional
section of code from my 'ort' branch, but I overlooked it).
* Update the performance numbers in the commit message of the final
patch based on the changes; the memory leak makes a noticeable
difference on the overall timings since it basically represents
a combination of all the data allocated during the merge algorithm.
Elijah Newren (3):
merge-ort: fix massive leak
merge-ort: ignore the directory rename split conflict for now
merge-ort: begin performance work; instrument with trace2_region_*
calls
diffcore-rename.c | 8 +++++
merge-ort.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 94 insertions(+), 1 deletion(-)
Range-diff:
-: ---------- > 1: 549a63cd2a merge-ort: fix massive leak
-: ---------- > 2: 817a197dbc merge-ort: ignore the directory rename split conflict for now
1: 36d1f87d05 ! 3: 7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
@@ Commit message
Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
10 runs for the other two cases):
- merge-recursive merge-ort
- no-renames: 18.912 s ± 0.174 s 12.975 s ± 0.037 s
- mega-renames: 5964.031 s ± 10.459 s 5154.338 s ± 19.139 s
- just-one-mega: 149.583 s ± 0.751 s 146.703 s ± 0.852 s
+ merge-recursive merge-ort
+ no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s
+ mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s
+ just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s
A single re-run of each with some breakdowns:
- --- no-renames ---
- merge-recursive merge-ort
- overall runtime: 19.302 s 13.017 s
- inexact rename detection: 7.603 s 7.695 s
- everything else: 11.699 s 5.322 s
+ --- no-renames ---
+ merge-recursive merge-ort
+ overall runtime: 19.302 s 14.257 s
+ inexact rename detection: 7.603 s 7.906 s
+ everything else: 11.699 s 6.351 s
- --- mega-renames ---
- merge-recursive merge-ort
- overall runtime: 5950.195 s 5132.851 s
- inexact rename detection: 5746.309 s 5119.215 s
- everything else: 203.886 s 13.636 s
+ --- mega-renames ---
+ merge-recursive merge-ort
+ overall runtime: 5950.195 s 5499.672 s
+ inexact rename detection: 5746.309 s 5487.120 s
+ everything else: 203.886 s 17.552 s
- --- just-one-mega ---
- merge-recursive merge-ort
- overall runtime: 151.001 s 146.478 s
- inexact rename detection: 143.448 s 145.901 s
- everything else: 7.553 s 0.577 s
+ --- just-one-mega ---
+ merge-recursive merge-ort
+ overall runtime: 151.001 s 158.582 s
+ inexact rename detection: 143.448 s 157.835 s
+ everything else: 7.553 s 0.747 s
=== Timing observations ===
+ 0) Maximum speedup
+
+ The "everything else" row represents the maximum speedup we could
+ achieve if we were to somehow infinitely parallelize inexact rename
+ detection, but leave everything else alone. The fact that this is so
+ much smaller than the real runtime (even in the case with virtually no
+ renames) makes it clear just how overwhelmingly large the time spent on
+ rename detection can be.
+
1) no-renames
1a) merge-ort is faster than merge-recursive, which is nice. However,
@@ Commit message
2a) Obviously rename detection is a huge cost; it's where most the time
is spent. We need to cut that down. If we could somehow infinitely
parallelize it and drive its time to 0, the merge-recursive time would
- drop to about 204s, and the merge-ort time would drop to about 14s. I
+ drop to about 204s, and the merge-ort time would drop to about 17s. I
think this particular stat shows I've subtly baked a couple performance
- improvements into merge-ort[A] (one of them large) and into
- fast-rebase[B] already.
-
- [A] Avoid quadratic behavior with O(N) insertions or removals
- of entries in the index & avoid unconditional dropping and
- re-reading of the index
- [B] Avoid updating the on-disk index or the working directory
- for intermediate patches -- only update at the end
-
- 2b) rename-detection is somehow ~10% cheaper for merge-ort than
- merge-recursive. This was and is a big surprise to me. Both of them
- call diff_tree_oid() and diffcore_std() with the EXACT same inputs. I
- don't have an explanation, but it is very consistent even after
- re-running many times. Interestingly, the rename detection for the
- first patch is more expensive (just barely) for merge-ort than
- merge-recursive, and that is also consistent. I won't investigate this
- further, as I'm just going to focus on 1a & 2a.
+ improvements into merge-ort and into fast-rebase already.
3) just-one-mega
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
assert(opt->branch1 && opt->branch2);
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
- assert(opt->obuf.len == 0);
-
- assert(opt->priv == NULL);
+ assert(!opt->priv->toplevel_dir ||
+ 0 == strlen(opt->priv->toplevel_dir));
+ }
+ trace2_region_leave("merge", "sanity checks", opt->repo);
/* Default to histogram diff. Actually, just hardcode it...for now. */
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
/* Initialization of opt->priv, our internal merge data */
+ trace2_region_enter("merge", "allocate/init", opt->repo);
- opt->priv = xcalloc(1, sizeof(*opt->priv));
-
- /* Initialization of various renames fields */
+ if (opt->priv) {
+ clear_or_reinit_internal_opts(opt->priv, 1);
+ trace2_region_leave("merge", "allocate/init", opt->repo);
@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
* subset of the overall paths that have special output.
*/
--
2.30.0.135.g7f7d4a3e17
next prev parent reply other threads:[~2021-01-24 6:02 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-08 20:59 ` Taylor Blau
2021-01-08 21:50 ` Elijah Newren
2021-01-08 21:55 ` Taylor Blau
2021-01-09 0:52 ` Elijah Newren
2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-13 22:11 ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-14 19:08 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
2021-01-14 20:18 ` Elijah Newren
2021-01-15 19:29 ` [PATCH v3 " Elijah Newren
2021-01-15 19:29 ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-24 6:01 ` Elijah Newren [this message]
2021-01-24 6:01 ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
2021-01-24 19:11 ` Derrick Stolee
2021-01-24 6:01 ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
2021-01-24 6:01 ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210124060112.1258291-1-newren@gmail.com \
--to=newren@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.