All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH] apply: do not fetch when checking object existence
Date: Mon, 27 Jul 2020 18:04:03 -0700	[thread overview]
Message-ID: <20200728010403.95142-1-jonathantanmy@google.com> (raw)

There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).

Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.

Then, use this new function to fix a bug in apply.c.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I mentioned the idea for this change here:
https://lore.kernel.org/git/20200721225020.1352772-1-jonathantanmy@google.com/

I included the same have_repository check introduced in 3e8b7d3c77
("has_sha1_file: don't bother if we are not in a repository",
2017-04-13), but I couldn't verify if it is still needed. In particular,
even at that commit, all the tests pass (after I make a small change
to an irrelevant test about the order of entries in a cookie file).
---
 apply.c        |  2 +-
 object-store.h | 25 +++++++++++++++++++++++--
 sha1-file.c    | 12 ++++++++++++
 t/t4150-am.sh  | 16 ++++++++++++++++
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/apply.c b/apply.c
index 8bff604dbe..402d80602a 100644
--- a/apply.c
+++ b/apply.c
@@ -3178,7 +3178,7 @@ static int apply_binary(struct apply_state *state,
 		return 0; /* deletion patch */
 	}
 
-	if (has_object_file(&oid)) {
+	if (has_object(the_repository, &oid, 0)) {
 		/* We already have the postimage */
 		enum object_type type;
 		unsigned long size;
diff --git a/object-store.h b/object-store.h
index f439d47af8..c4fc9dd74e 100644
--- a/object-store.h
+++ b/object-store.h
@@ -239,12 +239,33 @@ int read_loose_object(const char *path,
 		      unsigned long *size,
 		      void **contents);
 
+/* Retry packed storage after checking packed and loose storage */
+#define HAS_OBJECT_RECHECK_PACKED 1
+
+/*
+ * Returns 1 if the object exists. This function will not lazily fetch objects
+ * in a partial clone.
+ */
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags);
+
+/*
+ * These macros and functions are deprecated. If checking existence for an
+ * object that is likely to be missing and/or whose absence is relatively
+ * inconsequential (or is consequential but the caller is prepared to handle
+ * it), use has_object(), which has better defaults (no lazy fetch in a partial
+ * clone and no rechecking of packed storage). In the unlikely event that a
+ * caller needs to assert existence of an object that it fully expects to
+ * exist, and wants to trigger a lazy fetch in a partial clone, use
+ * oid_object_info_extended() with a NULL struct object_info.
+ *
+ * These functions can be removed once all callers have migrated to
+ * has_object() and/or oid_object_info_extended().
+ */
 #ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
 #define has_sha1_file_with_flags(sha1, flags) repo_has_sha1_file_with_flags(the_repository, sha1, flags)
 #define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
 #endif
-
-/* Same as the above, except for struct object_id. */
 int repo_has_object_file(struct repository *r, const struct object_id *oid);
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags);
diff --git a/sha1-file.c b/sha1-file.c
index ccd34dd9e8..ff444d7abb 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1988,6 +1988,18 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	return ret;
 }
 
+int has_object(struct repository *r, const struct object_id *oid,
+	       unsigned flags)
+{
+	int quick = !(flags & HAS_OBJECT_RECHECK_PACKED);
+	unsigned object_info_flags = OBJECT_INFO_SKIP_FETCH_OBJECT |
+		(quick ? OBJECT_INFO_QUICK : 0);
+
+	if (!startup_info->have_repository)
+		return 0;
+	return oid_object_info_extended(r, oid, NULL, object_info_flags) >= 0;
+}
+
 int repo_has_object_file_with_flags(struct repository *r,
 				    const struct object_id *oid, int flags)
 {
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index bda4586a79..94a2c76522 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -1133,4 +1133,20 @@ test_expect_success 'am and .gitattibutes' '
 	)
 '
 
+test_expect_success 'apply binary blob in partial clone' '
+	printf "\\000" >binary &&
+	git add binary &&
+	git commit -m "binary blob" &&
+	git format-patch --stdout -m HEAD^ >patch &&
+
+	test_create_repo server &&
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	test_when_finished "rm -rf client" &&
+
+	# Exercise to make sure that it works
+	git -C client am ../patch
+'
+
 test_done
-- 
2.28.0.rc0.142.g3c755180ce-goog


             reply	other threads:[~2020-07-28  1:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  1:04 Jonathan Tan [this message]
2020-07-28  1:19 ` [PATCH] apply: do not fetch when checking object existence Junio C Hamano
2020-07-28 18:23   ` Jonathan Tan
2020-08-05 23:06 ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 1/4] sha1-file: introduce no-lazy-fetch has_object() Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 2/4] apply: do not lazy fetch when applying binary Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 3/4] pack-objects: no fetch when allow-{any,promisor} Jonathan Tan
2020-08-05 23:06   ` [PATCH v2 4/4] fsck: do not lazy fetch known non-promisor object Jonathan Tan
2020-08-06 20:00   ` [PATCH v2 0/4] No-lazy-fetch has_object() and some fixes Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200728010403.95142-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.