All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Derrick Stolee <dstolee@gmail.com>, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort
Date: Wed, 23 Jun 2021 01:11:19 -0700	[thread overview]
Message-ID: <CABPp-BEuHbFbTWXKhW0MYkcW9ScJfUungwL6y9g=W=bZ=cmCbg@mail.gmail.com> (raw)
In-Reply-To: <8f1edc60-f754-541b-1d66-7f5ec49eff55@gmail.com>

On Tue, Jun 22, 2021 at 7:14 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/22/2021 2:45 PM, Elijah Newren wrote:
> > On Tue, Jun 22, 2021 at 9:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> I want to focus on this item:
>
> >> 2. I watched for the partial clone logic to kick in and download blobs.
> >>    Some of these were inevitable: we need the blobs to resolve edit/edit
> >>    conflicts. Most cases none were downloaded at all, so this series is
> >>    working as advertised. There _was_ a case where the inexact rename
> >>    detection requested a large list of files (~2900 in three batches) but
> >>    _then_ said "inexact rename detection was skipped due to too many
> >>    files". This is a case that would be nice to resolve in this series. I
> >>    will try to find exactly where in the code this is being triggered and
> >>    report back.
> >
> > This suggests perhaps that EITHER there was a real modify/delete
> > conflict (because you have to do full rename detection to rule out
> > that the modify/delete was part of some rename), OR that there was a
> > renamed file modified on both sides that did not keep its original
> > basename (because that combination is needed to bypass the various
> > optimizations and make it fall back to full inexact rename detection).
> > Further, in either case, there were enough adds/deletes that full
> > inexact detection is still a bit expensive.  It'd be interesting to
> > know which case it was.  What happens if you set merge.renameLimit to
> > something higher (the default is surprisingly small)?
>
> The behavior I'd like to see is that the partial clone logic is not
> run if we are going to download more than merge.renameLimit files.
> Whatever is getting these missing blobs is earlier than the limit
> check, but it should be after instead.
>
> It's particularly problematic that Git does all the work to get the
> blobs, but then gives up and doesn't even use them for rename
> detection.

I agree with what should happen, but I'm surprised it's not already
happening.  The give up check comes from too_many_rename_candidates(),
which is called before dpf_options.missing_object_cb is even set to
inexact_prefetch.  So I'm not sure how the fetching comes first.  Is
there a microsoft specific patch that changes the order somehow?  Is
there something I'm mis-reading in the code?

> I'm happy that we download necessary blobs when there are a few
> dozen files that need inexact renames. When it gets into the
> thousands, then we jump into a different category of user experience.
>
> Having a stop-gap of rename detection limits is an important way to
> avoid huge amounts of file downloads in these huge repo cases. Users
> can always opt into a larger limit if they really do want that rename
> detection to work at such a large scale, but we still need protections
> for the vast majority of cases where a user isn't willing to pay the
> cost of downloading these blobs.

Sure, that's fair.

But I'm still curious what the particular shape is for the data in
question.  What does the error say merge.renameLimit would be need to
set to?  If it's set higher, do some of the files resolve nicely (i.e.
they really were renames modified on both sides with a different
basename), or are they modify/delete conflicts and we're paying the
cost of rename detection to verify they were deleted and we really do
have a conflict?  I'm curious if there's more to learn and more
optimization potential, basically.  Your repository is bigger, so
there may be more to learn from it than from the testcases I've tried
so far.  :-)

  reply	other threads:[~2021-06-23  8:11 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-05  1:27 [PATCH 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-09 21:12   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-09 21:16   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-15 22:41 ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-17  4:49     ` Junio C Hamano
2021-06-15 22:41   ` [PATCH v2 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-17  5:04     ` Junio C Hamano
2021-06-22  8:02       ` Elijah Newren
2021-06-16 17:54   ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-17  5:05   ` Junio C Hamano
2021-06-22  8:04   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-22 16:10     ` [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-22 18:45       ` Elijah Newren
2021-06-23  2:14         ` Derrick Stolee
2021-06-23  8:11           ` Elijah Newren [this message]
2021-06-23 17:31             ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BEuHbFbTWXKhW0MYkcW9ScJfUungwL6y9g=W=bZ=cmCbg@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.