All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: "Derrick Stolee" <dstolee@microsoft.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: Re: [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames
Date: Tue, 9 Mar 2021 17:08:37 -0500	[thread overview]
Message-ID: <9fce692f-86f9-59c1-07de-1eaabdbcb827@gmail.com> (raw)
In-Reply-To: <pull.845.v2.git.1615248599.gitgitgadget@gmail.com>

On 3/8/2021 7:09 PM, Elijah Newren via GitGitGadget wrote:> The basic idea here is:
> 
> We only need expensive rename detection on the subset of files changed on
> both sides of history (for the most part).
> 
> This is because:
> 
>  1. The primary reason for rename detection in merges is enabling three-way
>     content merges
>  2. The purpose of three-way content merges is reconciling changes when
> 
> both sides of history modified some file 3. If a file was only modified by
> the side that renamed the file, then detecting the rename is irrelevant;
> we'll get the same answer without knowing about the rename. 4. (Well...there
> are rare cases where we need the rename for reasons other than three-way
> content merges. Patch 5 explains those.)

Makes sense. Don't compute information you won't need. I look forward to
trying to figure out the special cases here.
 
>                      Before Series           After Series
> no-renames:       12.596 s ±  0.061 s     5.680 s ±  0.096 s
> mega-renames:    130.465 s ±  0.259 s    13.812 s ±  0.162 s
> just-one-mega:     3.958 s ±  0.010 s   506.0  ms ±  3.9  ms

These are _very_ impressive numbers for such a "simple" idea.
 
> However, interestingly, if we had ignored the basename-guided rename
> detection optimizations[2][3], then this optimization series would have
> improved the performance as follows:
> 
>                Before Basename Series   After Just This Series
> no-renames:      13.815 s ±  0.062 s      5.728 s ±  0.104 s
> mega-renames:  1799.937 s ±  0.493 s     18.213 s ±  0.139 s
> just-one-mega    51.289 s ±  0.019 s    891.9  ms ±  7.0  ms

And here it is even more impressive. I see that your optimizations are
having combined effects but also are doing valuable things on their
own.

> We get best results by prioritizing them as follows:
> 
>  1. exact rename detection
>  2. skip-because-unnecessary
>  3. basename-guided rename detection

This makes sense to me, since even the basename-guided rename is
doing some non-trivial work. It would be good to reduce that
effort.

> it means that our remaining optimization potential is
> somewhat limited, and subsequent optimization series will have to fight for
> much smaller gains.

This is a good place to end up. Let the code rest for a bit after
we are done here, and maybe we'll find new cases to care about
later. We could chase the long tail forever, but these steps are
a huge accomplishment!

Getting to reading now.

-Stolee

  parent reply	other threads:[~2021-03-09 22:09 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-28  3:58 [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-02-28  3:58 ` [PATCH 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-01 16:39 ` [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren
2021-03-04  7:54 ` Elijah Newren
2021-03-09  0:09 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-09 22:21     ` Derrick Stolee
2021-03-09 22:40       ` Elijah Newren
2021-03-09 22:45         ` Derrick Stolee
2021-03-09  0:09   ` [PATCH v2 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-09 23:06     ` Derrick Stolee
2021-03-09  0:09   ` [PATCH v2 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-09  0:09   ` [PATCH v2 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-09 22:51     ` Derrick Stolee
2021-03-09 22:57       ` Elijah Newren
2021-03-09  0:09   ` [PATCH v2 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-09 22:08   ` Derrick Stolee [this message]
2021-03-10 15:08   ` [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-11  0:38   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-11  0:38     ` [PATCH v3 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-15 13:57     ` [PATCH v3 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-15 17:10   ` [PATCH v2 " Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fce692f-86f9-59c1-07de-1eaabdbcb827@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.