git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>,
	Karsten Blees <blees@dcon.de>, Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 0/2] Optimization batch 6: make full use of exact renames
Date: Wed, 03 Feb 2021 13:56:21 -0800
Message-ID: <xmqqlfc4byt6.fsf@gitster.c.googlers.com>
In-Reply-To: <pull.842.v2.git.1612382628.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Wed, 03 Feb 2021 20:03:45 +0000")

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series depends on en/merge-ort-perf and makes full use of exact
> renames; see commit messages for details.
>
> Thanks to Stolee and Junio for reviewing v1.
>
> Changes since v1:
>
>  * Update rename_src_nr when updating rename_src
>  * Introduce want_copies in the first patch and use it in a few more places
>  * Move a comment below a few exit-early if-checks.
>
> Elijah Newren (2):
>   diffcore-rename: no point trying to find a match better than exact
>   diffcore-rename: filter rename_src list when possible
>
>  diffcore-rename.c | 69 +++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 61 insertions(+), 8 deletions(-)

Thanks, these look better.

With these changes, I guess there are only two things in the rename
machinery that I still find somewhat embarrassing, both of which
have been there since I invented it.

 - We still need to go full matrix while finding the "best"
   pairing.  I cannot think of a way to avoid it (that is what makes
   it embarrassing) but wish there were some way to.

   In an early attempt, I tried to retire rename_src[j], once
   rename_dst[i] has been found to be a "good enough" match for it,
   from the pool of rename src candidates to find a good match for
   rename_dst[k] for i < k, but a naive implementation of it would not
   work well for obvious reasons---rename_src[j] may match a lot
   better with rename_dst[k] than rename_dst[i] but we do not know
   that until we try to estimate similarity with rename_dst[k].


 - The .cnt_data member was designed to be a concise summary of the
   blob characteristics so that two .cnt_data can be "compared"
   fairly cheaply to see how "similar" two blobs are [*], but (1) it
   is rather big to be called a "concise summary", and (2) it was
   not chosen after real performance measurement, and we've been
   using it for the past 15 years without revisiting its design.

   Side note: In a very early prototype, the approach to assess
   similarity between two blobs was very different---there was no
   attempt to compute "concise summary" for each blob, but we just
   attempted to create delta (as in the pack data) between src and
   dst blobs and measured how small a delta we can use to transform
   from src to dst.
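
To make the first point concrete, here is a minimal Python sketch of
why retiring a source early can lose the better pairing. It is not
the code in diffcore-rename.c: the names MINIMUM_SCORE, full_matrix
and greedy_retire are hypothetical, the scores are made up, and copy
detection is ignored. score[i][j] stands for the estimated similarity
between rename_dst[i] and rename_src[j].

```python
MINIMUM_SCORE = 50  # hypothetical "good enough" threshold, in percent


def full_matrix(score):
    """Score every (dst, src) pair, then hand out sources best-score first."""
    pairs = sorted(
        (
            (score[i][j], i, j)
            for i in range(len(score))
            for j in range(len(score[i]))
            if score[i][j] >= MINIMUM_SCORE
        ),
        reverse=True,
    )
    used_dst, used_src, result = set(), set(), {}
    for _, i, j in pairs:
        if i in used_dst or j in used_src:
            continue
        result[i] = j
        used_dst.add(i)
        used_src.add(j)
    return result


def greedy_retire(score):
    """Walk dst in order and retire a src as soon as it is 'good enough'."""
    retired, result = set(), {}
    for i, row in enumerate(score):
        for j, s in enumerate(row):
            if j not in retired and s >= MINIMUM_SCORE:
                result[i] = j
                retired.add(j)
                break
    return result


# Made-up scores: dst 0 is a so-so match for src 0, dst 1 a much better one.
score = [
    [60, 0],
    [95, 0],
]
```

With these scores, full_matrix() pairs rename_dst[1] with
rename_src[0], while greedy_retire() hands rename_src[0] to
rename_dst[0] and leaves rename_dst[1] unmatched, because the 95 is
never looked at once the source has been retired.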

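The idea of comparing two per-blob summaries instead of the blobs
themselves can be sketched as follows. This is only a toy stand-in
and makes no claim about the actual layout of .cnt_data: it keys the
summary by the line itself rather than by a hash, and the names
summarize and similarity are hypothetical.

```python
from collections import Counter


def summarize(blob: bytes) -> Counter:
    """Map each distinct line to the total number of bytes it contributes."""
    summary = Counter()
    for line in blob.splitlines(keepends=True):
        summary[line] += len(line)
    return summary


def similarity(src: Counter, dst: Counter) -> float:
    """Fraction of the larger blob covered by bytes common to both."""
    common = sum(min(n, dst[line]) for line, n in src.items())
    larger = max(sum(src.values()), sum(dst.values()))
    return common / larger if larger else 1.0


old = summarize(b"a\nb\nc\nd\n")
new = summarize(b"a\nb\nc\nx\n")
estimate = similarity(old, new)
```

Each blob is summarized once, and every src/dst comparison after that
touches only the two summaries. The cost of that design is the size
of the summary, which in this toy version is as large as the blob
itself.
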

