All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] fixup! diff: batch fetching of missing blobs
Date: Mon, 08 Apr 2019 13:06:03 +0900	[thread overview]
Message-ID: <xmqqmul1b0pg.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <CACsJy8AjyY1Azbf4VfnJMA3O8NNBV1P16dZRrHOUJdsYQHUu4g@mail.gmail.com> (Duy Nguyen's message of "Sat, 6 Apr 2019 11:17:46 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

>> Avoid a usage of the_repository by propagating the configured repository
>> to add_if_missing(). Also, prefetch only if the repository being diffed
>> is the_repository (because we do not support lazy fetching for any other
>> repository anyway).

If we are willing to stay limited to the default repository anyway,
allowing add_if_missing() to take an arbitrary repository does not
really matter, but before the caller of add_if_missing() befcomes
ready to work on an arbitrary repository, this change has to happen.

To update the caller, it seems to me that fetch_objects() must learn
to take an arbitrary repository, but is that the only thing needed?
After that, the function that the caller resides in and callchain
upwards can learn to take a repository instance if we want to be
able to diff inside an arbitrary repository.

But.  Such a change still would not allow us to compare a tree in
one repository against a tree in another repository.  It is likely
that a caller with such a need would simply make sure that objects
in both repositories are available by using the in-core alternate
object store mechanism, making it a more-or-less moot point to be
able to pass a repository instance through the callchain X-<.  We
probably should make it, and spell it out somewhere in a long term
vision shared among the developers, an explicit goal to get rid of
the internal (ab)use of the alternate object store mechanism.

With squashing the fix-up commit in, the 2/2 patch has become like
so.

Thanks, both.

-- >8 --
From: Jonathan Tan <jonathantanmy@google.com>
Date: Fri, 5 Apr 2019 10:09:34 -0700
Subject: [PATCH] diff: batch fetching of missing blobs

When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.

This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c                        |  33 +++++++++++
 t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100755 t/t4067-diff-partial-clone.sh

diff --git a/diff.c b/diff.c
index ec5c095199..811afbdfb1 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@
 #include "packfile.h"
 #include "parse-options.h"
 #include "help.h"
+#include "fetch-object.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -6366,8 +6367,40 @@ void diffcore_fix_diff_index(void)
 	QSORT(q->queue, q->nr, diffnamecmp);
 }
 
+static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
+			   const struct diff_filespec *filespec)
+{
+	if (filespec && filespec->oid_valid &&
+	    oid_object_info_extended(r, &filespec->oid, NULL,
+				     OBJECT_INFO_FOR_PREFETCH))
+		oid_array_append(to_fetch, &filespec->oid);
+}
+
 void diffcore_std(struct diff_options *options)
 {
+	if (options->repo == the_repository &&
+	    repository_format_partial_clone) {
+		/*
+		 * Prefetch the diff pairs that are about to be flushed.
+		 */
+		int i;
+		struct diff_queue_struct *q = &diff_queued_diff;
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+
+		for (i = 0; i < q->nr; i++) {
+			struct diff_filepair *p = q->queue[i];
+			add_if_missing(&to_fetch, options->repo, p->one);
+			add_if_missing(&to_fetch, options->repo, p->two);
+		}
+		if (to_fetch.nr)
+			/*
+			 * NEEDSWORK: Consider deduplicating the OIDs sent.
+			 */
+			fetch_objects(repository_format_partial_clone,
+				      to_fetch.oid, to_fetch.nr);
+		oid_array_clear(&to_fetch);
+	}
+
 	/* NOTE please keep the following in sync with diff_tree_combined() */
 	if (options->skip_stat_unmatch)
 		diffcore_skip_stat_unmatch(options);
diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
new file mode 100755
index 0000000000..90c8fb2901
--- /dev/null
+++ b/t/t4067-diff-partial-clone.sh
@@ -0,0 +1,103 @@
+#!/bin/sh
+
+test_description='behavior of diff when reading objects in a partial clone'
+
+. ./test-lib.sh
+
+test_expect_success 'git show batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo c >server/c &&
+	echo d >server/d &&
+	git -C server add c d &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'diff skips same-OID blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	echo another-a >server/a &&
+	git -C server add a &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	echo a | git hash-object --stdin >hash-old-a &&
+	echo another-a | git hash-object --stdin >hash-new-a &&
+	echo b | git hash-object --stdin >hash-b &&
+
+	# Ensure that only a and another-a are fetched.
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
+	grep "want $(cat hash-old-a)" trace &&
+	grep "want $(cat hash-new-a)" trace &&
+	! grep "want $(cat hash-b)" trace
+'
+
+test_expect_success 'diff with rename detection batches blobs' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	printf "b\nb\nb\nb\nb\n" >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+	rm server/b &&
+	printf "b\nb\nb\nb\nbX\n" >server/c &&
+	git -C server add c &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is exactly 1 negotiation by checking that there is
+	# only 1 "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
+	grep "similarity index" out &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_done
-- 
2.21.0-196-g041f5ea1cf


  parent reply	other threads:[~2019-04-08  4:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-26 22:09 [PATCH] diff: batch fetching of missing blobs Jonathan Tan
2019-03-27 10:10 ` SZEDER Gábor
2019-03-27 22:02 ` Johannes Schindelin
2019-03-28  6:52 ` Jeff King
2019-03-29 21:39 ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jonathan Tan
2019-03-29 21:39   ` [PATCH v2 1/2] sha1-file: support OBJECT_INFO_FOR_PREFETCH Jonathan Tan
2019-04-05 14:13     ` Johannes Schindelin
2019-04-05 22:00     ` Jeff King
2019-03-29 21:39   ` [PATCH v2 2/2] diff: batch fetching of missing blobs Jonathan Tan
2019-04-04  2:47     ` SZEDER Gábor
2019-04-05 13:38       ` Johannes Schindelin
2019-04-07  6:00         ` Christian Couder
2019-04-08  2:36           ` Junio C Hamano
2019-04-08  5:51             ` Junio C Hamano
2019-04-08  6:03               ` Junio C Hamano
2019-04-08  6:45                 ` Christian Couder
2019-04-08  6:40             ` Christian Couder
2019-04-08  7:59               ` Junio C Hamano
2019-04-08  9:56                 ` Christian Couder
2019-04-05  9:39     ` Duy Nguyen
2019-04-05 17:09       ` [PATCH] fixup! " Jonathan Tan
2019-04-05 20:16         ` Johannes Schindelin
2019-04-06  4:17         ` Duy Nguyen
2019-04-08  3:46           ` Junio C Hamano
2019-04-08  4:06           ` Junio C Hamano [this message]
2019-04-08  9:58             ` Duy Nguyen
2019-04-09  6:36               ` Junio C Hamano
2019-04-05 14:17     ` [PATCH v2 2/2] " Johannes Schindelin
2019-04-05 22:12   ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqmul1b0pg.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.