All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	gitster@pobox.com, Elijah Newren <newren@gmail.com>
Subject: [PATCH v4 0/3] And so it begins...merge/rename performance work
Date: Sat, 23 Jan 2021 22:01:09 -0800	[thread overview]
Message-ID: <20210124060112.1258291-1-newren@gmail.com> (raw)
In-Reply-To: <20210115192958.3336755-1-newren@gmail.com>

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v3:
  * Add a couple preliminary patches needed for later performance work,
    including fixing a big memory leak (in some series that has already
    been merged to master, I should have included a small additional
    section of code from my 'ort' branch, but I overlooked it).
  * Update the performance numbers in the commit message of the final
    patch based on the changes; the memory leak makes a noticeable
    difference on the overall timings since it basically represents
    a combination of all the data allocated during the merge algorithm.

Elijah Newren (3):
  merge-ort: fix massive leak
  merge-ort: ignore the directory rename split conflict for now
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++
 merge-ort.c       | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 1 deletion(-)

Range-diff:
-:  ---------- > 1:  549a63cd2a merge-ort: fix massive leak
-:  ---------- > 2:  817a197dbc merge-ort: ignore the directory rename split conflict for now
1:  36d1f87d05 ! 3:  7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
         Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
         10 runs for the other two cases):
     
    -                      merge-recursive         merge-ort
    -      no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
    -      mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
    -      just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s
    +                           merge-recursive           merge-ort
    +        no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    +        mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    +        just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s
     
         A single re-run of each with some breakdowns:
     
    -                                      ---  no-renames  ---
    -                                merge-recursive   merge-ort
    -      overall runtime:              19.302 s        13.017 s
    -      inexact rename detection:      7.603 s         7.695 s
    -      everything else:              11.699 s         5.322 s
    +                                        ---  no-renames  ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:              19.302 s        14.257 s
    +        inexact rename detection:      7.603 s         7.906 s
    +        everything else:              11.699 s         6.351 s
     
    -                                      --- mega-renames ---
    -                                merge-recursive   merge-ort
    -      overall runtime:            5950.195 s      5132.851 s
    -      inexact rename detection:   5746.309 s      5119.215 s
    -      everything else:             203.886 s        13.636 s
    +                                        --- mega-renames ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:            5950.195 s      5499.672 s
    +        inexact rename detection:   5746.309 s      5487.120 s
    +        everything else:             203.886 s        17.552 s
     
    -                                      --- just-one-mega ---
    -                                merge-recursive   merge-ort
    -      overall runtime:             151.001 s       146.478 s
    -      inexact rename detection:    143.448 s       145.901 s
    -      everything else:               7.553 s         0.577 s
    +                                        --- just-one-mega ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:             151.001 s       158.582 s
    +        inexact rename detection:    143.448 s       157.835 s
    +        everything else:               7.553 s         0.747 s
     
         === Timing observations ===
     
    +    0) Maximum speedup
    +
    +    The "everything else" row represents the maximum speedup we could
    +    achieve if we were to somehow infinitely parallelize inexact rename
    +    detection, but leave everything else alone.  The fact that this is so
    +    much smaller than the real runtime (even in the case with virtually no
    +    renames) makes it clear just how overwhelmingly large the time spent on
    +    rename detection can be.
    +
         1) no-renames
     
         1a) merge-ort is faster than merge-recursive, which is nice.  However,
    @@ Commit message
         2a) Obviously rename detection is a huge cost; it's where most the time
         is spent.  We need to cut that down.  If we could somehow infinitely
         parallelize it and drive its time to 0, the merge-recursive time would
    -    drop to about 204s, and the merge-ort time would drop to about 14s.  I
    +    drop to about 204s, and the merge-ort time would drop to about 17s.  I
         think this particular stat shows I've subtly baked a couple performance
    -    improvements into merge-ort[A] (one of them large) and into
    -    fast-rebase[B] already.
    -
    -        [A] Avoid quadratic behavior with O(N) insertions or removals
    -            of entries in the index & avoid unconditional dropping and
    -            re-reading of the index
    -        [B] Avoid updating the on-disk index or the working directory
    -            for intermediate patches -- only update at the end
    -
    -    2b) rename-detection is somehow ~10% cheaper for merge-ort than
    -    merge-recursive.  This was and is a big surprise to me.  Both of them
    -    call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
    -    don't have an explanation, but it is very consistent even after
    -    re-running many times.  Interestingly, the rename detection for the
    -    first patch is more expensive (just barely) for merge-ort than
    -    merge-recursive, and that is also consistent.  I won't investigate this
    -    further, as I'm just going to focus on 1a & 2a.
    +    improvements into merge-ort and into fast-rebase already.
     
         3) just-one-mega
     
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	assert(opt->branch1 && opt->branch2);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
    - 	assert(opt->obuf.len == 0);
    - 
    - 	assert(opt->priv == NULL);
    + 		assert(!opt->priv->toplevel_dir ||
    + 		       0 == strlen(opt->priv->toplevel_dir));
    + 	}
     +	trace2_region_leave("merge", "sanity checks", opt->repo);
      
      	/* Default to histogram diff.  Actually, just hardcode it...for now. */
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res
      
      	/* Initialization of opt->priv, our internal merge data */
     +	trace2_region_enter("merge", "allocate/init", opt->repo);
    - 	opt->priv = xcalloc(1, sizeof(*opt->priv));
    - 
    - 	/* Initialization of various renames fields */
    + 	if (opt->priv) {
    + 		clear_or_reinit_internal_opts(opt->priv, 1);
    + 		trace2_region_leave("merge", "allocate/init", opt->repo);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
      	 * subset of the overall paths that have special output.
      	 */
-- 
2.30.0.135.g7f7d4a3e17


  parent reply	other threads:[~2021-01-24  6:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 20:51 [PATCH 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-08 20:51 ` [PATCH 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-08 20:59   ` Taylor Blau
2021-01-08 21:50     ` Elijah Newren
2021-01-08 21:55       ` Taylor Blau
2021-01-09  0:52         ` Elijah Newren
2021-01-13 22:11 ` [PATCH v2 0/1] And so it begins...merge/rename performance work Elijah Newren
2021-01-13 22:11   ` [PATCH v2 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-14 19:08   ` [PATCH v2 0/1] And so it begins...merge/rename performance work Junio C Hamano
2021-01-14 20:18     ` Elijah Newren
2021-01-15 19:29   ` [PATCH v3 " Elijah Newren
2021-01-15 19:29     ` [PATCH v3 1/1] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren
2021-01-24  6:01     ` Elijah Newren [this message]
2021-01-24  6:01       ` [PATCH v4 1/3] merge-ort: fix massive leak Elijah Newren
2021-01-24 19:11         ` Derrick Stolee
2021-01-24  6:01       ` [PATCH v4 2/3] merge-ort: ignore the directory rename split conflict for now Elijah Newren
2021-01-24  6:01       ` [PATCH v4 3/3] merge-ort: begin performance work; instrument with trace2_region_* calls Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210124060112.1258291-1-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.