All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs
Date: Mon,  8 Oct 2018 14:48:16 -0700	[thread overview]
Message-ID: <20181008214816.42856-1-jonathantanmy@google.com> (raw)

Whenever a sparse checkout occurs, the existence of all blobs in the
index is verified, whether or not they are included or excluded by the
.git/info/sparse-checkout specification. This degrades performance,
significantly in the case of a partial clone, because a lazy fetch
occurs whenever the existence of a missing blob is checked.

At the point of invoking cache_tree_update() in unpack_trees(),
CE_SKIP_WORKTREE is already set on all excluded blobs
(mark_new_skip_worktree() is called twice to set CE_NEW_SKIP_WORKTREE,
then apply_sparse_checkout() is called which copies over
CE_NEW_SKIP_WORKTREE to CE_SKIP_WORKTREE). cache_tree_update() can use
this information to know which blobs are excluded, and thus skip the
checking of these.

Because cache_tree_update() is used from multiple places, this new
behavior is guarded by a new flag WRITE_TREE_SKIP_WORKTREE_MISSING_OK.
Implement this new flag, and teach unpack_trees() to invoke
cache_tree_update() with this new flag.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 cache-tree.c                     |  6 +++++-
 cache-tree.h                     |  4 ++++
 t/t1090-sparse-checkout-scope.sh | 33 ++++++++++++++++++++++++++++++++
 unpack-trees.c                   |  1 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5ce51468f0..340caf2d34 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -246,6 +246,8 @@ static int update_one(struct cache_tree *it,
 	int missing_ok = flags & WRITE_TREE_MISSING_OK;
 	int dryrun = flags & WRITE_TREE_DRY_RUN;
 	int repair = flags & WRITE_TREE_REPAIR;
+	int skip_worktree_missing_ok =
+		flags & WRITE_TREE_SKIP_WORKTREE_MISSING_OK;
 	int to_invalidate = 0;
 	int i;
 
@@ -356,7 +358,9 @@ static int update_one(struct cache_tree *it,
 		}
 
 		if (is_null_oid(oid) ||
-		    (mode != S_IFGITLINK && !missing_ok && !has_object_file(oid))) {
+		    (mode != S_IFGITLINK && !missing_ok &&
+		     !(skip_worktree_missing_ok && ce_skip_worktree(ce)) &&
+		     !has_object_file(oid))) {
 			strbuf_release(&buffer);
 			if (expected_missing)
 				return -1;
diff --git a/cache-tree.h b/cache-tree.h
index 0ab6784ffe..655d487619 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -40,6 +40,10 @@ void cache_tree_verify(struct index_state *);
 #define WRITE_TREE_DRY_RUN 4
 #define WRITE_TREE_SILENT 8
 #define WRITE_TREE_REPAIR 16
+/*
+ * Do not check for the presence of cache entry objects with CE_SKIP_WORKTREE.
+ */
+#define WRITE_TREE_SKIP_WORKTREE_MISSING_OK 32
 
 /* error return codes */
 #define WRITE_TREE_UNREADABLE_INDEX (-1)
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 25d7c700f6..090b7fc3d3 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -63,4 +63,37 @@ test_expect_success 'return to full checkout of master' '
 	test "$(cat b)" = "modified"
 '
 
+test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
+	test_create_repo server &&
+	git clone "file://$(pwd)/server" client &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	echo a >server/a &&
+	echo bb >server/b &&
+	mkdir server/c &&
+	echo ccc >server/c/c &&
+	git -C server add a b c/c &&
+	git -C server commit -m message &&
+
+	test_config -C client core.sparsecheckout 1 &&
+	test_config -C client extensions.partialclone origin &&
+	echo "!/*" >client/.git/info/sparse-checkout &&
+	echo "/a" >>client/.git/info/sparse-checkout &&
+	git -C client fetch --filter=blob:none origin &&
+	git -C client checkout FETCH_HEAD &&
+
+	git -C client rev-list HEAD \
+		--quiet --objects --missing=print >unsorted_actual &&
+	(
+		printf "?" &&
+		git hash-object server/b &&
+		printf "?" &&
+		git hash-object server/c/c
+	) >unsorted_expect &&
+	sort unsorted_actual >actual &&
+	sort unsorted_expect >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 51bfac6aa0..39e0e7a6c7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1635,6 +1635,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 				o->result.cache_tree = cache_tree();
 			if (!cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
+						  WRITE_TREE_SKIP_WORKTREE_MISSING_OK |
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
-- 
2.19.0.271.gfe8321ec05.dirty


             reply	other threads:[~2018-10-08 21:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-08 21:48 Jonathan Tan [this message]
2018-10-09  9:27 ` [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs Junio C Hamano
2018-10-09  9:30 ` Junio C Hamano
2018-10-09 14:49   ` Ben Peart
2018-10-09 14:48 ` Ben Peart
2018-10-09 18:40 ` [PATCH v2] cache-tree: skip some blob checks in partial clone Jonathan Tan
2018-10-10  1:19   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181008214816.42856-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.