From: Jonathan Tan <jonathantanmy@google.com>
To: gitster@pobox.com
Cc: jonathantanmy@google.com, git@vger.kernel.org, stolee@gmail.com,
	peff@peff.net
Subject: Re: [PATCH v2 2/2] diff: restrict when prefetching occurs
Date: Thu,  2 Apr 2020 16:09:37 -0700
Message-ID: <20200402230937.47323-1-jonathantanmy@google.com>
In-Reply-To: <xmqq7dyx3b1o.fsf@gitster.c.googlers.com>

> > +	int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT |
> > +		DIFF_FORMAT_NUMSTAT |
> > +		DIFF_FORMAT_PATCH |
> > +		DIFF_FORMAT_SHORTSTAT |
> > +		DIFF_FORMAT_DIRSTAT;
> 
> Would this want to be a "const int" (or even #define), I wonder.  I
> do not care too much between the two, but leaving it as a variable
> makes me a bit nervous.

OK, will switch to "const int".
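
To make that concrete, here is a minimal sketch of the mask as a "const int". The flag values are placeholders chosen for illustration (the real DIFF_FORMAT_* constants are defined in diff.h), and wants_prefetch() is a hypothetical helper, not something in the patch:

```c
#include <assert.h>

/* Placeholder bit values for illustration only; diff.h defines the real ones. */
enum {
	DIFF_FORMAT_RAW       = 1 << 0,
	DIFF_FORMAT_DIFFSTAT  = 1 << 1,
	DIFF_FORMAT_NUMSTAT   = 1 << 2,
	DIFF_FORMAT_PATCH     = 1 << 3,
	DIFF_FORMAT_SHORTSTAT = 1 << 4,
	DIFF_FORMAT_DIRSTAT   = 1 << 5
};

/* Hypothetical helper: does this output format need blob contents? */
static int wants_prefetch(int output_format)
{
	/* const, so the mask cannot be modified after initialization */
	const int output_formats_to_prefetch = DIFF_FORMAT_DIFFSTAT |
		DIFF_FORMAT_NUMSTAT |
		DIFF_FORMAT_PATCH |
		DIFF_FORMAT_SHORTSTAT |
		DIFF_FORMAT_DIRSTAT;

	return (output_format & output_formats_to_prefetch) != 0;
}
```

With these placeholder values, wants_prefetch(DIFF_FORMAT_PATCH) is true and wants_prefetch(DIFF_FORMAT_RAW) is false.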

> > +	if (options->repo == the_repository && has_promisor_remote() &&
> > +	    (options->output_format & output_formats_to_prefetch ||
> > +	     (!options->found_follow && options->break_opt != -1))) {
> >  		int i;
> >  		struct diff_queue_struct *q = &diff_queued_diff;
> >  		struct oid_array to_fetch = OID_ARRAY_INIT;
> >  
> >  		for (i = 0; i < q->nr; i++) {
> >  			struct diff_filepair *p = q->queue[i];
> > -			add_if_missing(options->repo, &to_fetch, p->one);
> > -			add_if_missing(options->repo, &to_fetch, p->two);
> > +			diff_add_if_missing(options->repo, &to_fetch, p->one);
> > +			diff_add_if_missing(options->repo, &to_fetch, p->two);
> >  		}
> > +
> > +		prefetched = 1;
> > +
> 
> Wouldn't it logically make more sense to do this after calling
> promisor_remote_get_direct() and if to_fetch.nr is not 0, ...
> 
> >  		/*
> >  		 * NEEDSWORK: Consider deduplicating the OIDs sent.
> >  		 */
> >  		promisor_remote_get_direct(options->repo,
> >  					   to_fetch.oid, to_fetch.nr);
> > +
> 
> ... namely, here?
> 
> When (q->nr != 0), to_fetch.nr may not be zero, I suspect, but the
> original code before [1/2] protected against to_fetch.nr==0 case, so
> ...?

My idea is that this prefetch covers a superset of what diffcore_rename()
wants to prefetch, so once we have run the prefetch logic here (even if
nothing ends up being fetched, which can happen when we already have all
the objects), we do not need to run it again in diffcore_rename().
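
A toy model of that flag, to show why it is set even when nothing is fetched. Everything here (struct toy_state and the toy_* functions) is a hypothetical stand-in, not the real diff code; "scans" counts how often the queue is walked looking for missing objects:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in state; not git's real data structures. */
struct toy_state {
	int scans;       /* times the queue was walked for missing objects */
	int fetch_calls; /* batched requests sent to the promisor remote */
	int prefetched;  /* set once the caller has run its prefetch logic */
};

/* Stand-in for one batched fetch; an empty list sends nothing. */
static void toy_fetch(struct toy_state *s, size_t nr_missing)
{
	if (nr_missing)
		s->fetch_calls++;
}

/* Caller side, modelling the prefetch in diffcore_std(). */
static void toy_std_prefetch(struct toy_state *s, int format_needs_blobs,
			     size_t nr_missing)
{
	if (!format_needs_blobs)
		return;
	s->scans++;
	toy_fetch(s, nr_missing);
	s->prefetched = 1; /* set even when nr_missing == 0 */
}

/* Callee side, modelling the prefetch in diffcore_rename(). */
static void toy_rename_prefetch(struct toy_state *s, size_t nr_missing)
{
	if (s->prefetched)
		return; /* the caller already covered a superset */
	s->scans++;
	toy_fetch(s, nr_missing);
}
```

In this model, if the caller's scan finds nothing missing, the callee still skips its own scan, so the queue is walked only once.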

> > +	if (!prefetched) {
> > +		/*
> > +		 * At this point we know there's actual work to do: we have rename
> > +		 * destinations that didn't find an exact match, and we have potential
> > +		 * sources. So we'll have to do inexact rename detection, which
> > +		 * requires looking at the blobs.
> > +		 *
> > +		 * If we haven't already prefetched, it's worth pre-fetching
> > +		 * them as a group now.
> > +		 */
> 
> This comment makes me wonder if it would be even better to
> 
>  - prepare an empty to_fetch OID array in the caller,
> 
>  - if the output format is one of the ones that wants prefetch, add
>    object names to to_fetch in the caller, BUT not fetch there.
> 
>  - pass &to_fetch by the caller to this function, and this code here
>    may add even more objects,
> 
>  - then do the prefetch here (so a single promisor interaction will
>    grab objects the caller would have fetched before calling us and
>    the ones we want here), and then clear the to_fetch array.
> 
>  - the caller, after seeing this function returns, checks to_fetch
>    and if it is not empty, fetches (i.e. the caller prepared list of
>    objects based on the output type, we ended up not calling this
>    helper, and then finally the caller does the prefetch).
> 
> That way, the "unless we have already prefetched" logic can go, and
> we can lose one indentation level, no?

Would this mean that the only prefetch occurs in diffcore_rename()? I
don't think that will work, for 2 reasons:

 - diffcore_std() calls diffcore_break() (which also reads blobs) before
   diffcore_rename(), and
 - (more importantly) there's a code path in diffcore_std() that does
   not call diffcore_rename(), so we would still need some prefetching
   logic in diffcore_std() in case diffcore_rename() is not called.
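
The two points above can be sketched with a simplified model. This is not the real diffcore_std(); struct toy_repo and the toy_* functions are hypothetical, and a "lazy fetch" here means a blob fetched one at a time because nothing batched it beforehand:

```c
#include <assert.h>

/* Toy stand-in for a partial clone; not git's real data structures. */
struct toy_repo {
	int missing;       /* blobs not yet available locally */
	int batch_fetches; /* batched prefetch requests */
	int lazy_fetches;  /* blobs fetched one at a time on first read */
};

static void toy_batch_fetch(struct toy_repo *r)
{
	if (r->missing) {
		r->batch_fetches++;
		r->missing = 0;
	}
}

/* Reading blobs without a prior prefetch fetches each one lazily. */
static void toy_read_blobs(struct toy_repo *r)
{
	r->lazy_fetches += r->missing;
	r->missing = 0;
}

static void toy_break(struct toy_repo *r)  { toy_read_blobs(r); }
static void toy_rename(struct toy_repo *r) { toy_batch_fetch(r); toy_read_blobs(r); }
static void toy_output(struct toy_repo *r) { toy_read_blobs(r); }

/* Hypothetical pipeline in which the only prefetch lives in toy_rename(). */
static void toy_std_rename_only(struct toy_repo *r, int do_break,
				int detect_rename)
{
	if (do_break)
		toy_break(r);  /* runs before any prefetch */
	if (detect_rename)
		toy_rename(r); /* skipped entirely on some code paths */
	toy_output(r);
}
```

In this model, a run without rename detection, or with break detection enabled, ends up with one lazy fetch per missing blob and no batched request at all.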

> > +		if (to_fetch.nr)
> > +			promisor_remote_get_direct(options->repo,
> > +						   to_fetch.oid, to_fetch.nr);
> 
> You no longer need the if(), no?

Ah...I'll remove the if().
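
For reference, a minimal sketch of why the guard becomes redundant once the fetch function itself tolerates an empty list, which is what [1/2] is meant to allow. toy_get_direct() and toy_caller() are hypothetical stand-ins, not promisor_remote_get_direct() itself:

```c
#include <assert.h>
#include <stddef.h>

static int toy_requests; /* number of requests actually sent */

/* Stand-in for a fetch function that accepts nr == 0 as a no-op. */
static void toy_get_direct(const int *oids, size_t nr)
{
	(void)oids;
	if (!nr)
		return;
	toy_requests++;
}

/* The caller can pass the list through without an "if (nr)" guard. */
static void toy_caller(const int *oids, size_t nr)
{
	toy_get_direct(oids, nr);
}
```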
