From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v2 06/13] merge-ort: add data structures for in-memory caching of rename detection
Date: Mon, 17 May 2021 20:55:07 -0700
Message-ID: <CABPp-BFOSBVP-9A6BQegpaPRA+iU=ZQCiJYrTEkq0H9b+xRjEQ@mail.gmail.com>
In-Reply-To: <b9bb5b44-47ce-8198-c546-8f07d03ef863@gmail.com>

On Mon, May 17, 2021 at 6:41 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/3/21 10:12 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > When there are many renames between the old base of a series of commits
> > and the new base for a series of commits, the sequence of merges
> > employed to transplant those commits (from a cherry-pick or rebase
> > operation) will repeatedly detect the exact same renames.  This is
> > wasted effort.
> >
> > Add data structures which will be used to cache rename detection
> > results, along with the initialization and deallocation of these data
> > structures.  Future commits will populate these caches, detect the
> > appropriate circumstances when they can be used, and employ them to
> > avoid re-detecting the same renames repeatedly.
>
> I appreciate the definitions and boilerplate for these data
> structures being isolated to their own patch.
>
> > @@ -140,6 +140,37 @@ struct rename_info {
> >       int callback_data_nr, callback_data_alloc;
> >       char *callback_data_traverse_path;
> >
> > +     /*
> > +      * cached_pairs: Caching of renames and deletions.
> > +      *
> > +      * These are mappings recording renames and deletions of individual
> > +      * files (not directories).  They are thus a map from an old
> > +      * filename to either NULL (for deletions) or a new filename (for
> > +      * renames).
> > +      */
> > +     struct strmap cached_pairs[3];
> > +
> > +     /*
> > +      * cached_target_names: just the destinations from cached_pairs
> > +      *
> > +      * We sometimes want a fast lookup to determine if a given filename
> > +      * is one of the destinations in cached_pairs.  cached_target_names
> > +      * is thus duplicative information, but it provides a fast lookup.
> > +      */
> > +     struct strset cached_target_names[3];
>
> These two work well together. Very clear.
>
> > +     /*
> > +      * cached_irrelevant: Caching of rename_sources that aren't relevant.
> > +      *
> > +      * cached_pairs records both renames and deletes.  Sometimes we
> > +      * do not know if a path is a rename or a delete because we pass
> > +      * RELEVANT_LOCATION to diffcore_rename_extended() and based on
> > +      * various optimizations it returns without detecting whether that
> > +      * path is actually a rename or a delete.  We need to cache such
> > +      * paths too, but separately from cached_pairs.
> > +      */
> > +     struct strset cached_irrelevant[3];
>
> I'm having a hard time parsing what these "irrelevant" paths will be.
> It seems like diffcore_rename_extended() will report something other
> than "rename" or "delete" for some paths. Could we explicitly mark
> that state as "irrelevant"?

Yes; in the code, that state is called RELEVANT_NO_MORE.

>         /*
>          * cached_irrelevant: Caching of rename_sources that aren't relevant.
>          *
>          * cached_pairs records both renames and deletes.  Sometimes we
>          * do not know if a path is a rename or a delete because we pass
>          * RELEVANT_LOCATION to diffcore_rename_extended() which might
>          * describe a path as "irrelevant" instead of as a "rename" or "delete".
>          *  We need to cache such paths too, but separately from cached_pairs.
>          */
>
> Does this make sense? diffcore_rename_extended() might need an update
> to match this extra, explicit state.

Hmm, let's flesh out the description a bit and try to be more
explicit.  How about:

    /*
     * cached_irrelevant: Caching of rename_sources that aren't relevant.
     *
     * If we try to detect a rename for a source path and succeed, it's
     * part of a rename.  If we try to detect a rename for a source path
     * and fail, then it's a delete.  If we do not try to detect a rename
     * for a path, then we don't know if it's a rename or a delete.  If
     * merge-ort doesn't think the path is relevant, then we just won't
     * cache anything for that path.  But there's a slight problem in
     * that merge-ort can think a path is RELEVANT_LOCATION, but due to
     * commit 9bd342137e ("diffcore-rename: determine which
     * relevant_sources are no longer relevant", 2021-03-13),
     * diffcore-rename can downgrade the path to RELEVANT_NO_MORE.  To
     * avoid excessive calls to diffcore_rename_extended() we still need
     * to cache such paths, though we cannot record them as either
     * renames or deletes.  So we cache them here as "turned out to be
     * irrelevant *for this commit*", since they are often also
     * irrelevant for subsequent commits, though we will have to do some
     * extra checking to see whether such paths become relevant for
     * rename detection when cherry-picking/rebasing subsequent commits.
     */
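
To make the four possible outcomes for a source path concrete, here is
a rough, self-contained sketch of the lookup order described above.  It
is not the merge-ort code: the names (cache_state, cached_pair,
side_cache, lookup_source) are hypothetical, and plain arrays with a
linear scan stand in for the strmap/strset members of struct
rename_info.  The extra checking needed before trusting an "irrelevant"
entry for a later commit is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; merge-ort itself keeps this data in strmap/strset. */
enum cache_state {
	CACHE_UNKNOWN,    /* nothing cached: rename detection still has to run */
	CACHE_RENAME,     /* in cached_pairs, mapped to a new filename */
	CACHE_DELETE,     /* in cached_pairs, mapped to NULL */
	CACHE_IRRELEVANT  /* in cached_irrelevant (was RELEVANT_NO_MORE) */
};

struct cached_pair {
	const char *old_name;
	const char *new_name;   /* NULL records a deletion */
};

/* Stand-in for one side's cached_pairs[i] and cached_irrelevant[i]. */
struct side_cache {
	const struct cached_pair *pairs;
	size_t pairs_nr;
	const char *const *irrelevant;
	size_t irrelevant_nr;
};

/*
 * Classify a source path using only cached information.  On
 * CACHE_RENAME, *new_name is set to the cached destination; in every
 * other case it is set to NULL.
 */
static enum cache_state lookup_source(const struct side_cache *cache,
				      const char *path,
				      const char **new_name)
{
	size_t i;

	assert(cache && path && new_name);
	*new_name = NULL;

	/* Renames and deletes share one map, keyed by the old filename. */
	for (i = 0; i < cache->pairs_nr; i++) {
		if (!strcmp(cache->pairs[i].old_name, path)) {
			*new_name = cache->pairs[i].new_name;
			return *new_name ? CACHE_RENAME : CACHE_DELETE;
		}
	}

	/* Paths whose rename-vs-delete status was never determined. */
	for (i = 0; i < cache->irrelevant_nr; i++)
		if (!strcmp(cache->irrelevant[i], path))
			return CACHE_IRRELEVANT;

	return CACHE_UNKNOWN;
}
```

A caller would treat CACHE_UNKNOWN as "still needs a call to
diffcore_rename_extended()", and the other three states as answers that
can be reused without repeating rename detection.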