All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative
Date: Fri, 28 May 2021 11:09:25 -0700	[thread overview]
Message-ID: <CABPp-BHoZTFwcqZvLLdPR0deuBRjzeY9DgPVkCu4GYkt9=gc-Q@mail.gmail.com> (raw)
In-Reply-To: <106a5592-4e5e-cf06-16e4-100ca3c3fb5f@web.de>

On Fri, May 28, 2021 at 9:12 AM René Scharfe <l.s.r@web.de> wrote:
>
> Am 28.05.21 um 00:47 schrieb Elijah Newren:
> > On Thu, May 27, 2021 at 2:00 PM René Scharfe <l.s.r@web.de> wrote:
> >>
> >> Am 27.05.21 um 10:37 schrieb Elijah Newren via GitGitGadget:
...
> >>>  /*** Function Grouping: functions related to process_entries() ***/
> >>>
> >>> -static int string_list_df_name_compare(const char *one, const char *two)
> >>> +static int sort_dirs_next_to_their_children(const void *a, const void *b)
> >>>  {
> >>> -     int onelen = strlen(one);
> >>> -     int twolen = strlen(two);
> >>
> >> The old code scans both strings fully, while the new one stops when it
> >> reaches a difference and doesn't look at any further characters.  How
> >> much does that contribute to the speedup?  (I suspect a lot.)
> >
> > Oh, indeed, good catch.  It appears to be responsible for essentially all of it.
>
> Then you can keep the original function signature (as well as the use of
> string_list_sort) and avoid explicit casts.

Fair enough; though I'll probably still apply the function rename,
even if I keep the old signature.  The function has a completely
different purpose in merge-ort.c than it does in merge-recursive.c.

> The same function exists in merge-recursive.c, by the way.  I suspect we
> could avoid sorting entirely there by taking advantage of the index
> order and a mechanism like the one in the second half of
> fsck.c::verify_ordered().  That's a bit tricky, though (for me anyway).
>
> All tests still pass when I replace string_list_df_name_compare() with
> strcmp() in merge-recursive.c, so the first thing needed would be tests
> that highlight the difference between those comparison functions,
> however.  Not sure if it's worth it -- merge-recursive is on its way
> out, right?

Interesting idea (and interesting that the function might not matter
anymore in merge-recursive.c and might be replaceable by strcmp), but
yeah merge-recursive.c is on its way out.  In fact, there are two
other reasons to also not modify merge-recursive.c along these lines:
(1) We want to keep merge-recursive stable right now as we push people
to try out merge-ort, so that we have a good comparison point if they
run into problems; if we modify merge-recursive, especially in tricky
ways, then our comparison point might get invalidated.  (2) This
codepath is relatively hot in merge-ort.c because it has optimized all
the other stuff, so saving approximately half a second is significant.
In merge-recursive.c, saving half a second is a small blip in the
overall timing that would be lost in the run-to-run variability.
(Saving 0.7s off of 10.127 s ±  0.073 s is huge; but removing that
much time from 5964.031 s ± 10.459 is wasted effort.)

> Not sure if d/f conflicts could also be detected in merge-ort.c without
> sorting -- the original order is lost when the paths are thrown into
> a strmap.

While string_list_df_name_compare() in merge-recursive.c is used for
the purpose of detecting D/F conflicts, this function in merge-ort.c
had nothing to do with detecting D/F conflicts.  In fact, D/F
conflicts were already detected in collect_merge_info_callback() on
this line:

    unsigned df_conflict = (filemask != 0) && (dirmask != 0);

and recorded in setup_path_info(), which was done long before it got
to this point in the code.  The point of this sorting in merge-ort.c
is just to put paths in an order that allows us to write out trees as
we process them and record the resulting hash for the next directory
up the chain as we go.

  reply	other threads:[~2021-05-28 18:09 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-27  8:37 [PATCH 0/5] Optimization batch 12: miscellaneous unthemed stuff Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-05-27 21:00   ` René Scharfe
2021-05-27 22:47     ` Elijah Newren
2021-05-28 16:12       ` René Scharfe
2021-05-28 18:09         ` Elijah Newren [this message]
2021-05-28  1:32   ` Taylor Blau
2021-05-28  4:10     ` Elijah Newren
2021-05-27  8:37 ` [PATCH 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-05-27  8:37 ` [PATCH 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-01 14:58 ` [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 1/5] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-02 11:29     ` Derrick Stolee
2021-06-01 14:58   ` [PATCH v2 2/5] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 3/5] diffcore-rename: enable limiting rename detection to relevant destinations Elijah Newren via GitGitGadget
2021-06-03 12:54     ` Derrick Stolee
2021-06-03 14:13       ` Elijah Newren
2021-06-01 14:58   ` [PATCH v2 4/5] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-01 14:58   ` [PATCH v2 5/5] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-03 12:55   ` [PATCH v2 0/5] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04  4:39   ` [PATCH v3 0/4] " Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-04  4:39     ` [PATCH v3 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget
2021-06-04 13:11     ` [PATCH v3 0/4] Optimization batch 12: miscellaneous unthemed stuff Derrick Stolee
2021-06-04 15:48       ` Elijah Newren
2021-06-04 16:30         ` Elijah Newren
2021-06-04 16:35         ` Jeff King
2021-06-04 18:42           ` Derrick Stolee
2021-06-04 19:43             ` Elijah Newren
2021-06-04 19:53             ` Jeff King
2021-06-08 16:11     ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 1/4] merge-ort: replace string_list_df_name_compare with faster alternative Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 2/4] diffcore-rename: avoid unnecessary strdup'ing in break_idx Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 3/4] Fix various issues found in comments Elijah Newren via GitGitGadget
2021-06-08 16:11       ` [PATCH v4 4/4] merge-ort: miscellaneous touch-ups Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHoZTFwcqZvLLdPR0deuBRjzeY9DgPVkCu4GYkt9=gc-Q@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.