git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, stolee@gmail.com, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 2/2] diff: restrict when prefetching occurs
Date: Thu, 02 Apr 2020 13:08:51 -0700	[thread overview]
Message-ID: <xmqq7dyx3b1o.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <a3322cdedf019126305fcead5918d523a1b2dfbc.1585854639.git.jonathantanmy@google.com> (Jonathan Tan's message of "Thu, 2 Apr 2020 12:19:17 -0700")

Jonathan Tan <jonathantanmy@google.com> writes:

> +	int prefetched = 0;
> +	int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT |
> +		DIFF_FORMAT_NUMSTAT |
> +		DIFF_FORMAT_PATCH |
> +		DIFF_FORMAT_SHORTSTAT |
> +		DIFF_FORMAT_DIRSTAT;

Would this want to be a "const int" (or even #define), I wonder.  I
do not care too much between the two, but leaving it as a variable
makes me a bit nervous.

> +	/*
> +	 * Check if the user requested a blob-data-requiring diff output and/or
> +	 * break-rewrite detection (which requires blob data). If yes, prefetch
> +	 * the diff pairs.
> +	 *
> +	 * If no prefetching occurs, diffcore_rename() will prefetch if it
> +	 * decides that it needs inexact rename detection.
> +	 */

Name-only etc. that Derrick mentioned in the other thread would be
relevant only when rename detection is active, and you'd do that in
diffcore_rename().  Good.

> +	if (options->repo == the_repository && has_promisor_remote() &&
> +	    (options->output_format & output_formats_to_prefetch ||
> +	     (!options->found_follow && options->break_opt != -1))) {
>  		int i;
>  		struct diff_queue_struct *q = &diff_queued_diff;
>  		struct oid_array to_fetch = OID_ARRAY_INIT;
>  
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
> -			add_if_missing(options->repo, &to_fetch, p->one);
> -			add_if_missing(options->repo, &to_fetch, p->two);
> +			diff_add_if_missing(options->repo, &to_fetch, p->one);
> +			diff_add_if_missing(options->repo, &to_fetch, p->two);
>  		}
> +
> +		prefetched = 1;
> +

Wouldn't it logically make more sense to do this after calling
promisor_remote_get_direct() and if to_fetch.nr is not 0, ...

>  		/*
>  		 * NEEDSWORK: Consider deduplicating the OIDs sent.
>  		 */
>  		promisor_remote_get_direct(options->repo,
>  					   to_fetch.oid, to_fetch.nr);
> +

... namely, here?

When (q->nr != 0), to_fetch.nr may not be zero, I suspect, but the
original code before [1/2] protected against to_fetch.nr==0 case, so
...?

>  		oid_array_clear(&to_fetch);
>  	}
>  
> @@ -6537,7 +6555,7 @@ void diffcore_std(struct diff_options *options)
>  			diffcore_break(options->repo,
>  				       options->break_opt);
>  		if (options->detect_rename)
> -			diffcore_rename(options);
> +			diffcore_rename(options, prefetched);
>  		if (options->break_opt != -1)
>  			diffcore_merge_broken();
>  	}
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index e189f407af..79ac1b4bee 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -7,6 +7,7 @@
>  #include "object-store.h"
>  #include "hashmap.h"
>  #include "progress.h"
> +#include "promisor-remote.h"
>  
>  /* Table of rename/copy destinations */
>  
> @@ -448,7 +449,7 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
>  	return count;
>  }
>  
> -void diffcore_rename(struct diff_options *options)
> +void diffcore_rename(struct diff_options *options, int prefetched)
>  {
>  	int detect_rename = options->detect_rename;
>  	int minimum_score = options->rename_score;
> @@ -538,6 +539,40 @@ void diffcore_rename(struct diff_options *options)
>  		break;
>  	}
>  
> +	if (!prefetched) {
> +		/*
> +		 * At this point we know there's actual work to do: we have rename
> +		 * destinations that didn't find an exact match, and we have potential
> +		 * sources. So we'll have to do inexact rename detection, which
> +		 * requires looking at the blobs.
> +		 *
> +		 * If we haven't already prefetched, it's worth pre-fetching
> +		 * them as a group now.
> +		 */

This comment makes me wonder if it would be even better to

 - prepare an empty to_fetch OID array in the caller,

 - if the output format is one of the ones that wants prefetch, add
   object names to to_fetch in the caller, BUT not fetch there.

 - pass &to_fetch by the caller to this function, and this code here
   may add even more objects,

 - then do the prefetch here (so a single promisor interaction will
   grab objects the caller would have fetched before calling us and
   the ones we want here), and then clear the to_fetch array.

 - the caller, after seeing this function returns, checks to_fetch
   and if it is not empty, fetches (i.e. the caller prepared list of
   objects based on the output type, we ended up not calling this
   helper, and then finally the caller does the prefetch).

That way, the "unless we have already prefetched" logic can go, and
we can lose one indentation level, no?


> +		int i;
> +		struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +		for (i = 0; i < rename_dst_nr; i++) {
> +			if (rename_dst[i].pair)
> +				continue; /* already found exact match */
> +			diff_add_if_missing(options->repo, &to_fetch, rename_dst[i].two);
> +		}
> +		for (i = 0; i < rename_src_nr; i++) {
> +			if (skip_unmodified &&
> +			    diff_unmodified_pair(rename_src[i].p))
> +				/*
> +				 * The "for" loop below will not need these
> +				 * blobs, so skip prefetching.
> +				 */
> +				continue;
> +			diff_add_if_missing(options->repo, &to_fetch, rename_src[i].p->one);
> +		}
> +		if (to_fetch.nr)
> +			promisor_remote_get_direct(options->repo,
> +						   to_fetch.oid, to_fetch.nr);

You no longer need the if(), no?

> +		oid_array_clear(&to_fetch);
> +	}

  reply	other threads:[~2020-04-02 20:09 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-31  2:04 [PATCH] diff: restrict when prefetching occurs Jonathan Tan
2020-03-31 12:14 ` Derrick Stolee
2020-03-31 16:50   ` Jonathan Tan
2020-03-31 17:48     ` Derrick Stolee
2020-03-31 18:21       ` Junio C Hamano
2020-03-31 18:15 ` Junio C Hamano
2020-04-02 19:19 ` [PATCH v2 0/2] Restrict when prefetcing occurs Jonathan Tan
2020-04-02 19:19   ` [PATCH v2 1/2] promisor-remote: accept 0 as oid_nr in function Jonathan Tan
2020-04-02 19:46     ` Junio C Hamano
2020-04-02 23:01       ` Jonathan Tan
2020-04-02 19:19   ` [PATCH v2 2/2] diff: restrict when prefetching occurs Jonathan Tan
2020-04-02 20:08     ` Junio C Hamano [this message]
2020-04-02 23:09       ` Jonathan Tan
2020-04-02 23:25         ` Junio C Hamano
2020-04-02 23:54         ` Junio C Hamano
2020-04-03 21:35           ` Jonathan Tan
2020-04-03 22:12             ` Junio C Hamano
2020-04-02 20:28   ` [PATCH v2 0/2] Restrict when prefetcing occurs Junio C Hamano
2020-04-06 11:44     ` Derrick Stolee
2020-04-06 11:57       ` Garima Singh
2020-04-07 22:11 ` [PATCH v3 0/4] " Jonathan Tan
2020-04-07 22:11   ` [PATCH v3 1/4] promisor-remote: accept 0 as oid_nr in function Jonathan Tan
2020-04-07 22:11   ` [PATCH v3 2/4] diff: make diff_populate_filespec_options struct Jonathan Tan
2020-04-07 23:44     ` Junio C Hamano
2020-04-07 22:11   ` [PATCH v3 3/4] diff: refactor object read Jonathan Tan
2020-04-07 22:11   ` [PATCH v3 4/4] diff: restrict when prefetching occurs Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7dyx3b1o.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).