All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, gitster@pobox.com
Subject: [PATCH] fetch: remove fetch_if_missing=0
Date: Fri,  1 Nov 2019 13:38:30 -0700	[thread overview]
Message-ID: <20191101203830.231676-1-jonathantanmy@google.com> (raw)

In fetch_pack() (and all functions it calls), pass
OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
a tree or blob that we do not want to be lazy-fetched even if it is
absent. Thus, the only lazy-fetches occurring for trees and blobs are
when resolving deltas.

Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
this, and also add a test ensuring that such objects are not
lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
places too, but I have limited myself to builtin/fetch.c in this commit
because I have not written tests for the other commands yet.)

Note that commits and tags may still be lazy-fetched. I limited myself
to objects that could be trees or blobs here because Git does not
support creating such commit- and tag-excluding clones yet, and even if
such a clone were manually created, Git does not have good support for
fetching a single commit (when fetching a commit, it and all its
ancestors would be sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I've verified that this also solves the bug explained in:
https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/

Quoting what I wrote from there:

> (Alternatively, we
> could add the equivalent of OBJECT_INFO_SKIP_FETCH_OBJECT to functions
> like parse_commit() that are used by files like negotiator/default.c, or
> split up commit parsing into object reading - which already has that
> flag - and commit parsing.)

This commit adds OBJECT_INFO_SKIP_FETCH_OBJECT to functions, but not the
parse_commit() mentioned - as stated above, this commit only handles
objects that could be trees or blobs.
---
 builtin/fetch.c          |  5 ++-
 fetch-pack.c             |  3 +-
 t/t5616-partial-clone.sh | 69 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0130b44112..37178e2d34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 		struct object *o;
 
 		if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK))
+						OBJECT_INFO_QUICK |
+							OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		o = parse_object(the_repository, &ref->old_oid);
 		if (!o)
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 79f7b65f8c..1081ed2de0 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
+setup_triangle () {
+	rm -rf big-blob.txt server client promisor-remote &&
+
+	touch big-blob.txt &&
+	for i in $(seq 1 100)
+	do
+		echo line $i >>big-blob.txt
+	done &&
+
+	# Create a server with 2 commits: a commit with a big blob and a child
+	# commit with an incremental change. Also, create a partial clone
+	# client that only contains the first commit.
+	git init server &&
+	git -C server config --local uploadpack.allowfilter 1 &&
+	cp big-blob.txt server &&
+	git -C server add big-blob.txt &&
+	git -C server commit -m "initial" &&
+	git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
+	echo another line >>server/big-blob.txt &&
+	git -C server commit -am "append line to big blob" &&
+
+	# Create a promisor remote that only contains the blob from the first
+	# commit, and set it as the promisor remote of client. Thus, whenever
+	# the client lazy fetches, the lazy fetch will succeed only if it is
+	# for this blob.
+	git init promisor-remote &&
+	test_commit -C promisor-remote one && # so that ref advertisement is not empty
+	git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
+	git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
+	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
+}
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas' '
+	setup_triangle &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' '
+	setup_triangle &&
+
+	git -C server config --local protocol.version 2 &&
+	git -C client config --local protocol.version 2 &&
+	git -C promisor-remote config --local protocol.version 2 &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify that protocol version 2 was used.
+	grep "fetch< version 2" trace &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


             reply	other threads:[~2019-11-01 20:38 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-01 20:38 Jonathan Tan [this message]
2019-11-01 22:05 ` [PATCH] fetch: remove fetch_if_missing=0 Jonathan Nieder
2019-11-02  5:55   ` Junio C Hamano
2019-11-02  6:11     ` Eric Sunshine
2019-11-02  5:59   ` Junio C Hamano
2019-11-05 18:53   ` Jonathan Tan
2019-11-05 18:58     ` Jonathan Nieder
2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
2019-11-05 20:06   ` Eric Sunshine
2019-11-06  1:45   ` Junio C Hamano
2019-11-08  6:33   ` Junio C Hamano
2019-11-08  7:40     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191101203830.231676-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.