git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 6/7] diffcore-rename: simplify and accelerate register_rename_src()
Date: Wed, 9 Dec 2020 16:25:18 -0800	[thread overview]
Message-ID: <CABPp-BGPfdFj1q09XVa7UOVz-0K9yf9Lp-h2wh6+nVVbVu94yg@mail.gmail.com> (raw)
In-Reply-To: <X9FScb3pzY0EBLvS@nand.local>

On Wed, Dec 9, 2020 at 2:40 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Dec 06, 2020 at 02:54:35AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > reigster_rename_src() took pains to create an array in rename_src which
> > was sorted by pathname of the contained diff_filepair.  However, the
> > fact that this array was sorted was not needed anywhere, and thus
> > represented wasted time.  Simply append to the end of the array, which
> > in a usecase of note saved 45% of diffcore_rename() setup time for me.
> >
> > Technically, there is one difference in the end result for the code.  IF
>
> s/IF/If ?

Indeed; will fix.

> > the caller of diffcore-rename provides a set of pairs with a duplicate
> > filename in the sources (an erroneous input condition), the old code
> > would simply ignore the duplicate (without warning or error).  The new
> > code will simply swallow it and thus allow multiple pairings for the
> > "same" source file.  Checking for this error condition is expensive
> > (basically undoing the optimization) and the only existing callers
> > already avoid passing such an invalid input.  If someone really wants to
> > add some error checking here on this input condition, I believe they
> > should add a separate function which can be called before
> > diffcore_rename(), and then other callers that don't need additional
> > checks because of their design (such as merge-ort) can avoid the extra
> > overhead.
>
> It seems like this is currently impossible to trigger, making any extra
> (expensive) checking of it worthless, no?

I believe that's what it currently amounts to, and I debated just
ripping the paragraph out.

However, a natural question that can easily arise for current
reviewers or future readers of the patch is why was there ever sorting
in the first place if the sorting isn't used?  That question came up
for me, and I dug into it.  Sorting is also used with rename_dst in
nearby code in the file, and there are a few reasons for it there.
The quick indexing that rename_dst needs doesn't apply to rename_src,
but it's not as obvious whether the
broken-trees-with-duplicate-entries rationale applies or not.  I spent
a while digging into it.  (And it's possible that I didn't correctly
check the callers or that in the seven months since I wrote this
message someone added another caller of this code that does pass
multiple diff_filepair-s for the "same" source file.)  Anyway, this
paragraph exists because I had to go down a goose chase to answer this
natural question, and I wanted to provide an answer to anyone else
asking the same question.  Also, in the off chance that anyone did
want to add callers that passed multiple copies of any source file, I
wanted to point out that this modified algorithm would result in a
slight behavioral difference, but that otherwise the modified
algorithm gives identical results.  (And if some future reader
stumbled on the paragraph because they had made such a change, I
wanted to provide a quick suggestion of how to get what they wanted
without adversely affecting performance.)

I hope that helps.  I'm sorry if my worrying about these cases and
discussing them made the patch harder to read or review, but I feel
like diffcore-rename is one of those low-level components you want to
be careful with.  diffcore-rename is one of those parts of the code
where a bug in it might not result in a directly observable breakage
to users but in some secondary or tertiary side-effect showing weird
results (e.g. in a merge you won't necessarily see that A was renamed
to B, instead you get a three-way content merge of original A, other
A, and new B -- or don't get a three-way content merge you might
expect).

> > Also, note that I dropped the return type on the function since it was
> > unconditionally discarded anyway.
> >
> > This patch is being submitted in a different order than its original
> > development, but in a large rebase of many commits with lots of renames
> > and with several optimizations to inexact rename detection,
> > diffcore_rename() setup time was a sizeable chunk of overall runtime.
> > This patch dropped execution time of rebasing 35 commits with lots of
> > renames by 2% overall.
>
> Neat!
>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  diffcore-rename.c | 30 +++---------------------------
> >  1 file changed, 3 insertions(+), 27 deletions(-)
> >
> > diff --git a/diffcore-rename.c b/diffcore-rename.c
> > index 3d637ba4645..816d2fbac44 100644
> > --- a/diffcore-rename.c
> > +++ b/diffcore-rename.c
> > @@ -76,36 +76,12 @@ static struct diff_rename_src {
> > [...]
>
> This looks obviously correct.

Thanks for taking a look!

  reply	other threads:[~2020-12-10  0:26 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-06  2:54 [PATCH 0/7] diffcore-rename improvements Elijah Newren via GitGitGadget
2020-12-06  2:54 ` [PATCH 1/7] diffcore-rename: avoid usage of global in too_many_rename_candidates() Elijah Newren via GitGitGadget
2020-12-09 22:06   ` Taylor Blau
2020-12-06  2:54 ` [PATCH 2/7] diffcore-rename: remove unnecessary if-clause Elijah Newren via GitGitGadget
2020-12-09 22:10   ` Taylor Blau
2020-12-10  0:32     ` Elijah Newren
2020-12-10  2:03     ` Junio C Hamano
2020-12-10  2:17       ` Elijah Newren
2020-12-10  6:56         ` Junio C Hamano
2020-12-06  2:54 ` [PATCH 3/7] diffcore-rename: rename num_create to num_targets Elijah Newren via GitGitGadget
2020-12-10  2:20   ` Junio C Hamano
2020-12-10  2:25     ` Elijah Newren
2020-12-06  2:54 ` [PATCH 4/7] diffcore-rename: change a few comments to use 'add' instead of 'create' Elijah Newren via GitGitGadget
2020-12-10  2:29   ` Junio C Hamano
2020-12-06  2:54 ` [PATCH 5/7] diffcore-rename: reduce jumpiness in progress counters Elijah Newren via GitGitGadget
2020-12-09 22:24   ` Taylor Blau
2020-12-10  2:36   ` Junio C Hamano
2020-12-06  2:54 ` [PATCH 6/7] diffcore-rename: simplify and accelerate register_rename_src() Elijah Newren via GitGitGadget
2020-12-09 22:40   ` Taylor Blau
2020-12-10  0:25     ` Elijah Newren [this message]
2020-12-10  0:41       ` Taylor Blau
2020-12-10  2:51   ` Junio C Hamano
2020-12-06  2:54 ` [PATCH 7/7] Accelerate rename_dst setup Elijah Newren via GitGitGadget
2020-12-09 23:01   ` Taylor Blau
2020-12-10  0:57     ` Elijah Newren
2020-12-10  1:43       ` Junio C Hamano
2020-12-06  3:01 ` [PATCH 0/7] diffcore-rename improvements Elijah Newren
2020-12-11  9:08 ` [PATCH v2 0/9] " Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 1/9] diffcore-rename: rename num_create to num_destinations Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 2/9] diffcore-rename: avoid usage of global in too_many_rename_candidates() Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 3/9] diffcore-rename: simplify limit check Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 4/9] diffcore-rename: reduce jumpiness in progress counters Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 5/9] t4058: add more tests and documentation for duplicate tree entry handling Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 6/9] t4058: explore duplicate tree entry handling in a bit more detail Elijah Newren via GitGitGadget
2021-04-21 12:29     ` Ævar Arnfjörð Bjarmason
2021-04-21 17:38       ` Elijah Newren
2020-12-11  9:08   ` [PATCH v2 7/9] diffcore-rename: simplify and accelerate register_rename_src() Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 8/9] diffcore-rename: accelerate rename_dst setup Elijah Newren via GitGitGadget
2020-12-11  9:08   ` [PATCH v2 9/9] diffcore-rename: remove unneccessary duplicate entry checks Elijah Newren via GitGitGadget
2020-12-29  8:31     ` Christian Couder
2020-12-29 18:09       ` Elijah Newren
2020-12-29 20:05   ` [PATCH v3 0/9] diffcore-rename improvements Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 1/9] diffcore-rename: rename num_create to num_destinations Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 2/9] diffcore-rename: avoid usage of global in too_many_rename_candidates() Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 3/9] diffcore-rename: simplify limit check Elijah Newren via GitGitGadget
2021-11-09 21:14       ` Başar Uğur
2021-11-10 20:06         ` Elijah Newren
2021-11-11  9:02           ` Başar Uğur
2021-11-11 16:19             ` Elijah Newren
2020-12-29 20:05     ` [PATCH v3 4/9] diffcore-rename: reduce jumpiness in progress counters Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 5/9] t4058: add more tests and documentation for duplicate tree entry handling Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 6/9] t4058: explore duplicate tree entry handling in a bit more detail Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 7/9] diffcore-rename: simplify and accelerate register_rename_src() Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 8/9] diffcore-rename: accelerate rename_dst setup Elijah Newren via GitGitGadget
2020-12-29 20:05     ` [PATCH v3 9/9] diffcore-rename: remove unnecessary duplicate entry checks Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGPfdFj1q09XVa7UOVz-0K9yf9Lp-h2wh6+nVVbVu94yg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).