All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
Date: Tue, 14 May 2019 14:10:55 -0700	[thread overview]
Message-ID: <4fcaa4481b5fd2a76aa21263f997e00913db0e0f.1557868134.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1557868134.git.jonathantanmy@google.com>

When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".

If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.

The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.

Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)

To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This both ensures that all bases are attempted to be
fetched, and ensures that we make only one request per index-pack
invocation, and not one request per missing object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c     | 26 +++++++++++++++--
 t/t5616-partial-clone.sh | 61 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ccf4eb7e9b..0d55f73b0b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -14,6 +14,7 @@
 #include "thread-utils.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "fetch-object.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -1351,6 +1352,25 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		sorted_by_pos[i] = &ref_deltas[i];
 	QSORT(sorted_by_pos, nr_ref_deltas, delta_pos_compare);
 
+	if (repository_format_partial_clone) {
+		/*
+		 * Prefetch the delta bases.
+		 */
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+		for (i = 0; i < nr_ref_deltas; i++) {
+			struct ref_delta_entry *d = sorted_by_pos[i];
+			if (!oid_object_info_extended(the_repository, &d->oid,
+						      NULL,
+						      OBJECT_INFO_FOR_PREFETCH))
+				continue;
+			oid_array_append(&to_fetch, &d->oid);
+		}
+		if (to_fetch.nr)
+			fetch_objects(repository_format_partial_clone,
+				      to_fetch.oid, to_fetch.nr);
+		oid_array_clear(&to_fetch);
+	}
+
 	for (i = 0; i < nr_ref_deltas; i++) {
 		struct ref_delta_entry *d = sorted_by_pos[i];
 		enum object_type type;
@@ -1650,8 +1670,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	int report_end_of_input = 0;
 
 	/*
-	 * index-pack never needs to fetch missing objects, since it only
-	 * accesses the repo to do hash collision checks
+	 * index-pack never needs to fetch missing objects except when
+	 * REF_DELTA bases are missing (which are explicitly handled). It only
+	 * accesses the repo to do hash collision checks and to check which
+	 * REF_DELTA bases need to be fetched.
 	 */
 	fetch_if_missing = 0;
 
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 7cc0c71556..f1baf83502 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -339,4 +339,65 @@ test_expect_success 'when partial cloning, tolerate server not sending target of
 	! test -e "$HTTPD_ROOT_PATH/one-time-sed"
 '
 
+test_expect_success 'tolerate server sending REF_DELTA against missing promisor objects' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	rm -rf "$SERVER" repo &&
+	test_create_repo "$SERVER" &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	# Create a commit with a blob to be used as a delta base.
+	for i in $(test_seq 10)
+	do
+		echo "this is a line" >>"$SERVER/foo.txt"
+	done &&
+	git -C "$SERVER" add foo.txt &&
+	git -C "$SERVER" commit -m bar &&
+	git -C "$SERVER" rev-parse HEAD:foo.txt >deltabase &&
+
+	git -c protocol.version=2 clone --no-checkout \
+		--filter=blob:none $HTTPD_URL/one_time_sed/server repo &&
+
+	# Sanity check to ensure that the client does not have that blob.
+	git -C repo rev-list --objects --exclude-promisor-objects \
+		-- $(cat deltabase) >objlist &&
+	test_line_count = 0 objlist &&
+
+	# Another commit. This commit will be fetched by the client.
+	echo "abcdefghijklmnopqrstuvwxyz" >>"$SERVER/foo.txt" &&
+	git -C "$SERVER" add foo.txt &&
+	git -C "$SERVER" commit -m baz &&
+
+	# Pack a thin pack containing, among other things, HEAD:foo.txt
+	# delta-ed against HEAD^:foo.txt.
+	printf "%s\n--not\n%s\n" \
+		$(git -C "$SERVER" rev-parse HEAD) \
+		$(git -C "$SERVER" rev-parse HEAD^) |
+		git -C "$SERVER" pack-objects --thin --stdout >thin.pack &&
+
+	# Ensure that the pack contains one delta against HEAD^:foo.txt. Since
+	# the delta contains at least 26 novel characters, the size cannot be
+	# contained in 4 bits, so the object header will take up 2 bytes. The
+	# most significant nybble of the first byte is 0b1111 (0b1 to indicate
+	# that the header continues, and 0b111 to indicate REF_DELTA), followed
+	# by any 3 nybbles, then the OID of the delta base.
+	git -C "$SERVER" rev-parse HEAD^:foo.txt >deltabase &&
+	printf "f.,..%s" $(intersperse "," <deltabase) >want &&
+	hex_unpack <thin.pack | intersperse "," >have &&
+	grep $(cat want) have &&
+
+	replace_packfile thin.pack &&
+
+	# Use protocol v2 because the sed command looks for the "packfile"
+	# section header.
+	test_config -C "$SERVER" protocol.version 2 &&
+
+	# Fetch the thin pack and ensure that index-pack is able to handle the
+	# REF_DELTA object with a missing promisor delta base.
+	git -C repo -c protocol.version=2 fetch &&
+
+	# Ensure that the one-time-sed script was used.
+	! test -e "$HTTPD_ROOT_PATH/one-time-sed"
+'
+
 test_done
-- 
2.21.0.1020.gf2820cf01a-goog


  parent reply	other threads:[~2019-05-14 21:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 21:10 [PATCH 0/2] Partial clone fix: handling received REF_DELTA Jonathan Tan
2019-05-14 21:10 ` [PATCH 1/2] t5616: refactor packfile replacement Jonathan Tan
2019-05-15  8:36   ` Johannes Schindelin
2019-05-15 18:22     ` Jonathan Tan
2019-05-14 21:10 ` Jonathan Tan [this message]
2019-05-15  8:46   ` [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases Johannes Schindelin
2019-05-15 18:28     ` Jonathan Tan
2019-05-17 18:33       ` Johannes Schindelin
2019-05-15 23:16   ` Jeff King
2019-05-16  1:43     ` Junio C Hamano
2019-05-16  4:04       ` Jeff King
2019-05-16 18:26     ` Jonathan Tan
2019-05-16 21:12       ` Jeff King
2019-05-16 21:30         ` Jonathan Tan
2019-05-16 21:42           ` Jeff King
2019-05-16 23:15             ` Jonathan Tan
2019-05-17  1:09               ` Jeff King
2019-05-17  1:22                 ` Jeff King
2019-05-17  4:39                   ` Jeff King
2019-05-17  4:42                     ` Jeff King
2019-05-17  7:20                     ` Duy Nguyen
2019-05-17  8:55                       ` Jeff King
2019-05-18 11:39                         ` Duy Nguyen
2019-05-20 23:04                           ` Nicolas Pitre
2019-05-21 21:20                             ` Jeff King
2019-06-03 22:23   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4fcaa4481b5fd2a76aa21263f997e00913db0e0f.1557868134.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.