All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 01/11] merge-ort: add basic data structures for handling renames
Date: Fri, 11 Dec 2020 01:41:09 -0800	[thread overview]
Message-ID: <CABPp-BG5_PzmKMV6YMs+fkTB-NFqDbJhduNoOWY5nt+wCE006Q@mail.gmail.com> (raw)
In-Reply-To: <38f68a01-4d22-d88d-dfda-c85f67340819@gmail.com>

On Thu, Dec 10, 2020 at 6:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > This will grow later, but we only need a few fields for basic rename
> > handling.
>
> Perhaps these things will be extremely clear as the patch
> series continues, but...
>
> > +struct rename_info {
> > +     /*
> > +      * pairs: pairing of filenames from diffcore_rename()
> > +      *
> > +      * Index 1 and 2 correspond to sides 1 & 2 as used in
> > +      * conflict_info.stages.  Index 0 unused.
>
> Hm. This seems wasteful. I'm sure that you have a reason to use
> index 0 in the future instead of just avoiding instances of [i-1]
> indexes.

Yes, it is...and it gets more wasteful when I increase the number of
fields that are arrays of size 3 with none of them using index 0.
Currently, there's only 1 such field; later there will be 10.

However, this does not scale with the number of files or size of the
repository or anything like that; it's a flat overhead.  At this point
in my patch submissions, that overhead is 16 bytes per merge.  Later
when I have 10 variables that are arrays of size three, it'll be 940
bytes per merge.

I'm not planning on using index 0 later; the reason for this really is
to avoid off-by-one errors (it's one of the two biggest problems in
computer science, right?).  The off-by-one problem becomes huge when
you consider all the references:

* The conflict_info type has stages which is an array of size three --
index 0 is always the base commit, index 1 is side1, and index 2 is
side2.  There is one of these per path involved in the merge, and are
used all over the place, so it's nice to think in terms of "1 is
side1, 2 is side2".  (There is also a pathnames variable of size three
with the same indexing rules, and a bunch of bitmasks that rely on
2<<0 == base, 2<<1 == side1, and 2<<2 == side2.)

* These other 10 variables that are arrays of size 3 in the
rename_info struct are all keeping track of information for side1 and
side2.  When you consider the number of references for all 10 of them
combined across the codebase, it adds up to quite a bit.

I'm certain that if I would have had to use off-by-one indexing for
these 10 variables, while using not-off-by-one indexing for the stages
and pathnames in conflict_info, I'm certain I would have messed it up
many dozen times and spent countless hours tracking down bugs.  And I
think the result would be a lot harder to review.  And future
developers would come along and fall into that trap and get various
indices off.

I'm willing to pay a one-time overhead of 940 bytes to avoid that.

> > +      */
> > +     struct diff_queue_struct pairs[3];
> > +
> > +     /*
> > +      * needed_limit: value needed for inexact rename detection to run
> > +      *
> > +      * If the current rename limit wasn't high enough for inexact
> > +      * rename detection to run, this records the limit needed.  Otherwise,
> > +      * this value remains 0.
> > +      */
> > +     int needed_limit;
> > +};
> > +
> >  struct merge_options_internal {
> >       /*
> >        * paths: primary data structure in all of merge ort.
> > @@ -96,6 +115,11 @@ struct merge_options_internal {
> >        */
> >       struct strmap output;
> >
> > +     /*
> > +      * renames: various data relating to rename detection
> > +      */
> > +     struct rename_info *renames;
> > +
>
> And here, you create this as a pointer, but...
> >       /* Initialization of opt->priv, our internal merge data */
> >       opt->priv = xcalloc(1, sizeof(*opt->priv));
> > +     opt->priv->renames = xcalloc(1, sizeof(*opt->priv->renames));
>
> ...unconditionally allocate it here. Perhaps there are other cases
> where 'struct merge_options_internal' is allocated without the renames
> member?
>
> Searching merge-ort.c at this point does not appear to have any
> other allocations of opt->priv or struct merge_options_internal.
> Perhaps it would be best to include struct rename_info not as a
> pointer?

That's a really good point; I'll try it out.

> If you do have a reason to keep it as a pointer, then perhaps it
> should be freed in clear_internal_opts()?

Eek.  It's there in my 'ort' branch, but one of the problems trying to
rearrange and clean things up to make nice digestible series is that
you sometimes forget to bring important parts along.  Whoops; good
catch.  I'm going to try just turning renames into an embedded struct
instead of a pointer, though.  If it doesn't work out, I'll make sure
to clear it.

  reply	other threads:[~2020-12-11  9:42 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-11  2:03   ` Derrick Stolee
2020-12-11  9:41     ` Elijah Newren [this message]
2020-12-09 19:41 ` [PATCH 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-11  2:39   ` Derrick Stolee
2020-12-11  9:40     ` Elijah Newren
2020-12-13  7:47     ` Elijah Newren
2020-12-14 14:33       ` Derrick Stolee
2020-12-14 15:42         ` Johannes Schindelin
2020-12-14 16:11           ` Elijah Newren
2020-12-14 16:50             ` Johannes Schindelin
2020-12-14 17:35         ` Elijah Newren
2020-12-09 19:41 ` [PATCH 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-11  2:54   ` Derrick Stolee
2020-12-11 17:38     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-11  3:00   ` Derrick Stolee
2020-12-11 18:43     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-11  3:24   ` Derrick Stolee
2020-12-11 20:03     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-11  3:32   ` Derrick Stolee
2020-12-09 19:41 ` [PATCH 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-11  3:39   ` Derrick Stolee
2020-12-11 21:56     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-15 14:09     ` Derrick Stolee
2020-12-15 16:56       ` Elijah Newren
2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-15 14:23     ` Derrick Stolee
2020-12-15 17:07       ` Elijah Newren
2020-12-15 14:27     ` Derrick Stolee
2020-12-14 16:21   ` [PATCH v2 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-15 14:27     ` Derrick Stolee
2020-12-14 16:21   ` [PATCH v2 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
2020-12-15 14:31     ` Derrick Stolee
2020-12-15 17:11       ` Elijah Newren
2020-12-15 14:34   ` [PATCH v2 00/11] merge-ort: add basic rename detection Derrick Stolee
2020-12-15 22:09     ` Junio C Hamano
2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 08/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 09/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BG5_PzmKMV6YMs+fkTB-NFqDbJhduNoOWY5nt+wCE006Q@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.