git.vger.kernel.org archive mirror
From: Elijah Newren <newren@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative
Date: Thu, 27 May 2021 21:10:27 -0700	[thread overview]
Message-ID: <CABPp-BGXwC5Gtz5PVkXnw2wXzacqjUACB=sgCF5e=BwJiWsYPQ@mail.gmail.com> (raw)
In-Reply-To: <YLBIIL+yShDVlWo5@nand.local>

On Thu, May 27, 2021 at 6:32 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Thu, May 27, 2021 at 08:37:17AM +0000, Elijah Newren via GitGitGadget wrote:
> > -static int string_list_df_name_compare(const char *one, const char *two)
> > +static int sort_dirs_next_to_their_children(const void *a, const void *b)
>
> I looked at the new implementation of this function (and
> df_name_compare() to double check it) and am convinced that it's
> correct, with the exception of one question.
>
> Some thoughts I had while trying to make sure this was all OK:
>
> > +     const char *one = ((struct string_list_item *)a)->string;
> > +     const char *two = ((struct string_list_item *)b)->string;
> > +     unsigned char c1, c2;
> > +
> > +     while (*one && (*one == *two)) {
> > +             one++;
> > +             two++;
> > +     }
>
> This advances 'one' and 'two' to point either at the end of 'a' (and the
> same position within 'b') or at the first place where the two have
> different characters. If 'b' is shorter than 'a', '*one != *two' will
> terminate the loop (since '*two' will be NUL, and '*one' will not).
>
> > +     c1 = *one;
> > +     if (!c1)
> > +             c1 = '/';
> > +
> > +     c2 = *two;
> > +     if (!c2)
> > +             c2 = '/';
>
> This stores the character at which the two differ, or '/' if we got to
> the end. Hmm, is this right? (The guard in 'df_name_compare()' reads
> 'if (!c1 && S_ISDIR(mode1))'.) Suppose both strings were "foo"; then
> both c1 and c2 would be '/', and I think we would return -1.
>
> That doesn't seem quite right to me. I think it *would* be right if we
> checked the mode of each entry before assigning c1 or c2 to '/',
> though. (Being generally unfamiliar with this area, I haven't looked to
> see whether getting access to the modes of each entry at this point is
> easy or not).

Good reasoning; I should have been clearer about one of the
assumptions that this function operates under which precludes that
possibility.

> > +
> > +     if (c1 == c2) {
> > +             /* Getting here means one is a leading directory of the other */

Your example case of both strings being "foo" obviously conflicts with
this comment, but the comment is correct.  This function will never be
given two equal strings because it is called to sort the keys of a
strmap, and strmap keys are unique by construction.  (If one side of
history has "foo" as a directory, and the other side has "foo" as a
file, then there is still only one "foo" in opt->priv->paths.  Every
entry in opt->priv->paths records 3 hashes and modes and whatnot in
order to know what each side of history had at the given path.)
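
For illustration, here is a minimal, self-contained sketch of the quoted
comparator driven by qsort().  It is adapted to sort a plain array of
'const char *' rather than 'struct string_list_item', the helper
sort_paths() is hypothetical, and because the quoted hunk elides the body
of the final 'else', the 'return c1 - c2' there is an assumption.  Like
the real function, it relies on every key being unique.

```c
#include <stdlib.h>

/*
 * Sketch of the comparator quoted above, adapted to an array of
 * 'const char *'.  Assumption: the elided 'else' branch returns c1 - c2.
 * Precondition: no two paths are equal (strmap keys are unique).
 */
static int sort_dirs_next_to_their_children(const void *a, const void *b)
{
	const char *one = *(const char *const *)a;
	const char *two = *(const char *const *)b;
	unsigned char c1, c2;

	/* Skip the common prefix of both paths. */
	while (*one && (*one == *two)) {
		one++;
		two++;
	}

	/* Treat end-of-string as though a '/' had been appended. */
	c1 = *one;
	if (!c1)
		c1 = '/';
	c2 = *two;
	if (!c2)
		c2 = '/';

	if (c1 == c2)
		/* One is a leading directory of the other; shorter sorts first. */
		return (*one) ? 1 : -1;
	return c1 - c2;
}

/* Hypothetical helper: sort n unique paths so directories precede children. */
static void sort_paths(const char **paths, size_t n)
{
	qsort(paths, n, sizeof(*paths), sort_dirs_next_to_their_children);
}
```

Given foo/bar, foo, foo.txt and bar, this should produce bar, foo.txt,
foo, foo/bar.  Plain strcmp() order would instead give bar, foo,
foo.txt, foo/bar, separating "foo" from its child "foo/bar".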

Also, even in that case (when two strings are equal), getting the
right return value would only matter if we cared about a stable sort.
But we call this function with QSORT, not STABLE_QSORT.

Interestingly, this function technically doesn't even need to fully
sort the array.  For example, if you took the output of 'git
ls-tree -rt HEAD' and permuted the order of files within the same
directory, that level of quasi-sortedness would be good enough for
my purposes; I just need files underneath the same directory to be
"together" and the containing directory to appear immediately before
them.  A later call to write_tree() handles sorting within a single
directory to ensure full sortedness.  Unfortunately, I don't know of
a way to exploit that less strict sorting requirement at this point
in the code to improve performance further, so this function just
implements "pretend that every path has a '/' appended and then fully
sort them".
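
To make "pretend that every path has a '/' appended" concrete, the
following hypothetical reference comparator compares two paths exactly
as strcmp() would compare the same paths with a '/' appended, without
building the longer strings.  It is not the code from the patch; for
any two distinct paths it should agree in sign with the faster loop
quoted above, and it differs only for equal strings, where it returns 0
rather than -1.

```c
#include <string.h>

/* Hypothetical helper: s[i] for i < len, '/' at i == len, NUL beyond. */
static unsigned char char_with_slash(const char *s, size_t len, size_t i)
{
	if (i < len)
		return (unsigned char)s[i];
	return i == len ? '/' : '\0';
}

/*
 * Hypothetical reference comparator: same result as strcmp() on "a/" and
 * "b/", computed without allocating the appended copies.
 */
static int slash_appended_cmp(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);
	size_t i;

	for (i = 0; ; i++) {
		unsigned char ca = char_with_slash(a, la, i);
		unsigned char cb = char_with_slash(b, lb, i);

		if (ca != cb)
			return ca - cb;
		if (!ca)
			return 0; /* both ended together: a and b are equal */
	}
}
```

The faster version avoids the strlen() calls and the per-character
bounds checks by stopping at the first difference and substituting '/'
for NUL only at that single position.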

> > +             return (*one) ? 1 : -1;
> > +     }
>
> > +     else
>
> I did find this spacing awkward (would have expected '} else' or even
> '} else {'), but it hardly matters.

I'll fix that up while adding some comments about the purpose of the
function and the fact that it assumes there will be no duplicates in
the array being sorted.

Thread overview: 39+ messages
2021-05-27  8:37 [PATCH 0/5] Optimization batch 12: miscellaneous unthemed stuff Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-05-27 21:00   ` René Scharfe
2021-05-27 22:47     ` Elijah Newren
2021-05-28 16:12       ` René Scharfe
2021-05-28 18:09         ` Elijah Newren
2021-05-28  1:32   ` Taylor Blau
2021-05-28  4:10     ` Elijah Newren [this message]
2021-05-27  8:37 ` [PATCH 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-01 14:58 ` [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-02 11:29     ` Derrick Stolee
2021-06-01 14:58   ` [PATCH v2 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-06-03 12:54     ` Derrick Stolee
2021-06-03 14:13       ` Elijah Newren
2021-06-01 14:58   ` [PATCH v2 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-03 12:55   ` [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04  4:39   ` [PATCH v3 0/4] " Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-04 13:11     ` [PATCH v3 0/4] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04 15:48       ` Elijah Newren
2021-06-04 16:30         ` Elijah Newren
2021-06-04 16:35         ` Jeff King
2021-06-04 18:42           ` Derrick Stolee
2021-06-04 19:43             ` Elijah Newren
2021-06-04 19:53             ` Jeff King
2021-06-08 16:11     ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget