All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] diff: only prefetch for certain output formats
Date: Thu, 30 Jan 2020 00:51:36 -0500	[thread overview]
Message-ID: <20200130055136.GA2184413@coredump.intra.peff.net> (raw)
In-Reply-To: <20200130013900.181477-1-jonathantanmy@google.com>

On Wed, Jan 29, 2020 at 05:39:00PM -0800, Jonathan Tan wrote:

> > You could get pretty specific by putting logic inside diffcore_rename(),
> > which would know if anything is left over after exact rename detection,
> > but I suspect just checking:
> > 
> >   (options->break_opt != -1 || options->detect_rename)
> > 
> > in diffcore_std() would be OK in practice.
> 
> Thanks for taking a look at this patch and for the pointers, especially
> to rename detection. I investigated and found that in practice, with
> respect to rename detection, options->detect_rename is insufficient to
> determine exactly when we need to fetch; we need to fetch when
> (for example) a file is deleted and another added, but not when a file
> is merely changed, and these rules are not reflected in
> options->detect_rename. These rules indeed are in diffcore_rename(), as
> you mentioned, but putting logic inside diffcore_rename() (or copying
> the same logic over to diffcore_std()) complicates things for too little
> benefit, I think.
> 
> To add to this, rename detection is turned on by default, so it wouldn't
> even fix the original issue with "status".
> 
> So I'll abandon this patch, at least until someone finds a use case for
> diffing with no rename detection on a partial clone and would rather not
> have a prefetch.

Ah, true, "options->detect_rename" would be overly broad.

I actually don't think it would be that bad to put the logic in
diffcore_rename(). If we wait until the right moment (after inexact
renames have been resolved, and when we see if there are any candidates
left), it should just be a matter of walking over the candidate lists.

Something like this (it would need the add_if_missing() helper from
diffcore_std()):

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 531d7adeaf..d519ffcc45 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -458,6 +458,7 @@ void diffcore_rename(struct diff_options *options)
 	int i, j, rename_count, skip_unmodified = 0;
 	int num_create, dst_cnt;
 	struct progress *progress = NULL;
+	struct oid_array to_fetch = OID_ARRAY_INIT;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -538,6 +539,25 @@ void diffcore_rename(struct diff_options *options)
 		break;
 	}
 
+	/*
+	 * At this point we know there's actual work to do: we have rename
+	 * destinations that didn't find an exact match, and we have potential
+	 * sources. So we'll have to do inexact rename detection, which
+	 * requires looking at the blobs. It's worth pre-fetching them as a
+	 * group now.
+	 */
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].pair)
+			continue; /* already found exact match */
+		add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
+	}
+	for (i = 0; i < rename_src_nr; i++) {
+		add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
+	}
+	if (to_fetch.nr)
+		promisor_remote_get_direct(options->repo,
+					   to_fetch.oid, to_fetch.nr);
+
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),

  reply	other threads:[~2020-01-30  5:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28 21:35 [RFC PATCH] diff: only prefetch for certain output formats Jonathan Tan
2020-01-29  5:09 ` Jeff King
2020-01-30  1:39   ` Jonathan Tan
2020-01-30  5:51     ` Jeff King [this message]
2020-01-30 23:20       ` Jonathan Tan
2020-01-31  0:14         ` Jeff King
2020-01-31 18:08           ` Junio C Hamano
2020-02-01 11:29             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200130055136.GA2184413@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.