All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fail early when partial clone prefetch fails
@ 2022-09-27 22:12 Jonathan Tan
  2022-09-27 22:12 ` [PATCH 1/2] promisor-remote: remove a return value Jonathan Tan
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-09-27 22:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This fixes something we noticed at $DAYJOB, when a partial clone
prefetch intermittently fails but all subsequent fetches work. More
details are in patch 2's commit message.

Jonathan Tan (2):
  promisor-remote: remove a return value
  promisor-remote: die upon failing fetch

 object-file.c     |  4 ----
 promisor-remote.c | 23 ++++++++++++++---------
 promisor-remote.h | 11 +++++------
 3 files changed, 19 insertions(+), 19 deletions(-)

-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] promisor-remote: remove a return value
  2022-09-27 22:12 [PATCH 0/2] Fail early when partial clone prefetch fails Jonathan Tan
@ 2022-09-27 22:12 ` Jonathan Tan
  2022-09-29 19:23   ` Junio C Hamano
  2022-09-27 22:12 ` [PATCH 2/2] promisor-remote: die upon failing fetch Jonathan Tan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Jonathan Tan @ 2022-09-27 22:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

No caller of promisor_remote_get_direct() is checking its return value,
so remove it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 promisor-remote.c | 12 ++++--------
 promisor-remote.h | 11 +++++------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 68f46f5ec7..8b4d650b4c 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -230,18 +230,17 @@ static int remove_fetched_oids(struct repository *repo,
 	return remaining_nr;
 }
 
-int promisor_remote_get_direct(struct repository *repo,
-			       const struct object_id *oids,
-			       int oid_nr)
+void promisor_remote_get_direct(struct repository *repo,
+				const struct object_id *oids,
+				int oid_nr)
 {
 	struct promisor_remote *r;
 	struct object_id *remaining_oids = (struct object_id *)oids;
 	int remaining_nr = oid_nr;
 	int to_free = 0;
-	int res = -1;
 
 	if (oid_nr == 0)
-		return 0;
+		return;
 
 	promisor_remote_init(repo);
 
@@ -256,12 +255,9 @@ int promisor_remote_get_direct(struct repository *repo,
 				continue;
 			}
 		}
-		res = 0;
 		break;
 	}
 
 	if (to_free)
 		free(remaining_oids);
-
-	return res;
 }
diff --git a/promisor-remote.h b/promisor-remote.h
index edc45ab0f5..df36eb08ef 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -39,13 +39,12 @@ static inline int has_promisor_remote(void)
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
- * a time until all objects are fetched. Returns 0 upon success, and non-zero
- * otherwise.
+ * a time until all objects are fetched.
  *
- * If oid_nr is 0, this function returns 0 (success) immediately.
+ * If oid_nr is 0, this function returns immediately.
  */
-int promisor_remote_get_direct(struct repository *repo,
-			       const struct object_id *oids,
-			       int oid_nr);
+void promisor_remote_get_direct(struct repository *repo,
+				const struct object_id *oids,
+				int oid_nr);
 
 #endif /* PROMISOR_REMOTE_H */
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] promisor-remote: die upon failing fetch
  2022-09-27 22:12 [PATCH 0/2] Fail early when partial clone prefetch fails Jonathan Tan
  2022-09-27 22:12 ` [PATCH 1/2] promisor-remote: remove a return value Jonathan Tan
@ 2022-09-27 22:12 ` Jonathan Tan
  2022-09-28 17:43   ` Glen Choo
  2022-09-29 20:14   ` Junio C Hamano
  2022-09-29 20:15 ` [PATCH 0/2] Fail early when partial clone prefetch fails Junio C Hamano
  2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
  3 siblings, 2 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-09-27 22:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In a partial clone, an attempt to read a missing object results in an
attempt to fetch that single object. In order to avoid multiple
sequential fetches, which would occur when multiple objects are missing
(which is the typical case), some commands have been taught to prefetch
in a batch: such a command would, in a partial clone, notice that
several objects that it will eventually need are missing, and call
promisor_remote_get_direct() with all such objects at once.

When this batch prefetch fails, these commands fall back to the
sequential fetches. But at $DAYJOB we have noticed that this results in
a bad user experience: a command would take unexpectedly long to finish
if the batch prefetch would fail for some intermittent reason, but all
subsequent fetches would work. It would be a better user experience for
such a command would just fail.

Therefore, make it a fatal error if the prefetch fails and at least one
object being fetched is known to be a promisor object. (The latter
criterion is to make sure that we are not misleading the user that such
an object would be present from the promisor remote. For example, a
missing object may be a result of repository corruption and not because
it is expectedly missing due to the repository being a partial clone.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-file.c     |  4 ----
 promisor-remote.c | 11 ++++++++++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/object-file.c b/object-file.c
index 5b270f046d..5e30960234 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1599,10 +1599,6 @@ static int do_oid_object_info_extended(struct repository *r,
 		if (fetch_if_missing && repo_has_promisor_remote(r) &&
 		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
-			/*
-			 * TODO Investigate checking promisor_remote_get_direct()
-			 * TODO return value and stopping on error here.
-			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
 			continue;
diff --git a/promisor-remote.c b/promisor-remote.c
index 8b4d650b4c..faa7612941 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -4,6 +4,7 @@
 #include "config.h"
 #include "transport.h"
 #include "strvec.h"
+#include "packfile.h"
 
 struct promisor_remote_config {
 	struct promisor_remote *promisors;
@@ -238,6 +239,7 @@ void promisor_remote_get_direct(struct repository *repo,
 	struct object_id *remaining_oids = (struct object_id *)oids;
 	int remaining_nr = oid_nr;
 	int to_free = 0;
+	int i;
 
 	if (oid_nr == 0)
 		return;
@@ -255,9 +257,16 @@ void promisor_remote_get_direct(struct repository *repo,
 				continue;
 			}
 		}
-		break;
+		goto all_fetched;
+	}
+
+	for (i = 0; i < remaining_nr; i++) {
+		if (is_promisor_object(&remaining_oids[i]))
+			die(_("could not fetch %s from promisor remote"),
+			    oid_to_hex(&remaining_oids[i]));
 	}
 
+all_fetched:
 	if (to_free)
 		free(remaining_oids);
 }
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] promisor-remote: die upon failing fetch
  2022-09-27 22:12 ` [PATCH 2/2] promisor-remote: die upon failing fetch Jonathan Tan
@ 2022-09-28 17:43   ` Glen Choo
  2022-09-30 22:10     ` Jonathan Tan
  2022-09-29 20:14   ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Glen Choo @ 2022-09-28 17:43 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan

No real comments on the code - I'm not familiar enough with it, but it
seems really simple anyway.

Jonathan Tan <jonathantanmy@google.com> writes:

> When this batch prefetch fails, these commands fall back to the
> sequential fetches. But at $DAYJOB we have noticed that this results in
> a bad user experience: a command would take unexpectedly long to finish
> if the batch prefetch would fail for some intermittent reason, but all
> subsequent fetches would work. It would be a better user experience for
> such a command would just fail.

I'm not certain that this fail-fast approach is always a better user
experience:

- I could imagine that for a small-enough set of objects (say, a very
  restrictive set of sparse specifications), one-by-one fetching would be
  good enough.
- Even if one-by-one fetching isn't fast, I'd imagine that each
  individual fetch is more likely to succeed than a batch prefetch, and
  as a user, I would prefer to ^C an operation that takes longer than I
  expected than to have retry the repeatedly.

Here are some other arguments that you didn't mention, but I find more
convincing:

- Running prefetch in a non-interactive process (e.g. running a job in
  CI) and the user would prefer to fail fast than to have the job run
  longer than expected, e.g. they could script retries manually
  (although, maybe we should do that ourselves).

- Fetching might be subject to a quota, which will be exhausted by
  one-by-one fetching.

As such, it _might_ make sense to make this behavior configurable, since
we may sometimes want it and sometimes not.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] promisor-remote: remove a return value
  2022-09-27 22:12 ` [PATCH 1/2] promisor-remote: remove a return value Jonathan Tan
@ 2022-09-29 19:23   ` Junio C Hamano
  2022-09-30 22:02     ` Jonathan Tan
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2022-09-29 19:23 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> No caller of promisor_remote_get_direct() is checking its return value,
> so remove it.

The above is a good explanation why this change will not hurt the
current users, but is it a good thing to do in the first place?

Isn't it an indication that all the existing callers are sloppy?  Or
does it mean that the value returned from this function is not a
useful signal to the caller?

Looking at the implementation, the function seems to want to
indicate if the fetching succeeded (with 0) or failed (with -1).
What is curious is that the function seems to consider it a success
even if some requested objects were never downloaded after
consulting all the promisor remotes.  There is no way for the caller
to obtain the list of objects it wanted to have but the function
failed to deliver, either (other than obviously redoing what
promisor-remote.c::remove_fetched_oids() does itself).

Are these what makes the returned value from the function useless
and the callers ignore it?  If these are fixed, would it make the
indication of error more useful?

Looking at some of the existing callers:

 * builtin/index-pack.c calls it to fill the "gap" and after giving
   the function a chance to download "missing" objects, goes on its
   prosessing as if nothing has happened.  We will properly die if
   any object we need is still missing, so unless we are interested
   in failing early, checking the return value of the function does
   not help all that much.

 * The caller in builtin/pack-objects.c is similar.  We have a list
   of bunch of objects we want to pack, some of which may be
   missing.  We ask all missing ones from the list in bulk before
   doing anything, but we rely on the regular codepath to notice if
   the promisor remote did not give us necessary objects.

I didn't look at others, but the above two are probably OK, unless
we want to get a signal to be cautious about poorly managed promisor
remotes.  Not distinguishing the reason why we do not have objects
between a corrupt repository and a promisor remote that breaks its
earlier promises makes it impossible to say "hey, that promisor
remote is failing its expectation, perhaps we should wean us off
from it".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] promisor-remote: die upon failing fetch
  2022-09-27 22:12 ` [PATCH 2/2] promisor-remote: die upon failing fetch Jonathan Tan
  2022-09-28 17:43   ` Glen Choo
@ 2022-09-29 20:14   ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-09-29 20:14 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/object-file.c b/object-file.c
> index 5b270f046d..5e30960234 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1599,10 +1599,6 @@ static int do_oid_object_info_extended(struct repository *r,
>  		if (fetch_if_missing && repo_has_promisor_remote(r) &&
>  		    !already_retried &&
>  		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
> -			/*
> -			 * TODO Investigate checking promisor_remote_get_direct()
> -			 * TODO return value and stopping on error here.
> -			 */
>  			promisor_remote_get_direct(r, real, 1);
>  			already_retried = 1;
>  			continue;
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 8b4d650b4c..faa7612941 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -4,6 +4,7 @@
>  #include "config.h"
>  #include "transport.h"
>  #include "strvec.h"
> +#include "packfile.h"
>  
>  struct promisor_remote_config {
>  	struct promisor_remote *promisors;
> @@ -238,6 +239,7 @@ void promisor_remote_get_direct(struct repository *repo,
>  	struct object_id *remaining_oids = (struct object_id *)oids;
>  	int remaining_nr = oid_nr;
>  	int to_free = 0;
> +	int i;
>  
>  	if (oid_nr == 0)
>  		return;
> @@ -255,9 +257,16 @@ void promisor_remote_get_direct(struct repository *repo,
>  				continue;
>  			}
>  		}
> -		break;
> +		goto all_fetched;

OK.  So when fetch_object() says it got everything we asked it to
give, there is no behaviour change.  But if invocations of
fetch_objects() on all promisor remotes end unsuccessfully, we see
if some are supposed to be available from the promisor remote and
die.

An obvious alternative would be to do without the previous step, and
instead tighten the callers to react to error return, and then from
the new code, return a different error return value to allow the
caller to tell what kind of breakage it got.  But this extra die()
with more sloppy callers would probably work just fine in practice.

> +	}
> +
> +	for (i = 0; i < remaining_nr; i++) {
> +		if (is_promisor_object(&remaining_oids[i]))
> +			die(_("could not fetch %s from promisor remote"),
> +			    oid_to_hex(&remaining_oids[i]));
>  	}
>  
> +all_fetched:
>  	if (to_free)
>  		free(remaining_oids);
>  }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Fail early when partial clone prefetch fails
  2022-09-27 22:12 [PATCH 0/2] Fail early when partial clone prefetch fails Jonathan Tan
  2022-09-27 22:12 ` [PATCH 1/2] promisor-remote: remove a return value Jonathan Tan
  2022-09-27 22:12 ` [PATCH 2/2] promisor-remote: die upon failing fetch Jonathan Tan
@ 2022-09-29 20:15 ` Junio C Hamano
  2022-09-30 21:50   ` Jonathan Tan
  2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
  3 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2022-09-29 20:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> This fixes something we noticed at $DAYJOB, when a partial clone
> prefetch intermittently fails but all subsequent fetches work. More
> details are in patch 2's commit message.
>
> Jonathan Tan (2):
>   promisor-remote: remove a return value
>   promisor-remote: die upon failing fetch
>
>  object-file.c     |  4 ----
>  promisor-remote.c | 23 ++++++++++++++---------
>  promisor-remote.h | 11 +++++------
>  3 files changed, 19 insertions(+), 19 deletions(-)

No changes to t/ directory?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Fail early when partial clone prefetch fails
  2022-09-29 20:15 ` [PATCH 0/2] Fail early when partial clone prefetch fails Junio C Hamano
@ 2022-09-30 21:50   ` Jonathan Tan
  2022-09-30 22:17     ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Tan @ 2022-09-30 21:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

Junio C Hamano <gitster@pobox.com> writes:
> No changes to t/ directory?

Hmm...any test you would like to see in particular? Here, we just fail
with a specific error message when any sort of fetching from the
promisor remote due to a lookup of a missing object.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] promisor-remote: remove a return value
  2022-09-29 19:23   ` Junio C Hamano
@ 2022-09-30 22:02     ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-09-30 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > No caller of promisor_remote_get_direct() is checking its return value,
> > so remove it.
> 
> The above is a good explanation why this change will not hurt the
> current users, but is it a good thing to do in the first place?
> 
> Isn't it an indication that all the existing callers are sloppy?  Or
> does it mean that the value returned from this function is not a
> useful signal to the caller?
> 
> Looking at the implementation, the function seems to want to
> indicate if the fetching succeeded (with 0) or failed (with -1).
> What is curious is that the function seems to consider it a success
> even if some requested objects were never downloaded after
> consulting all the promisor remotes.  There is no way for the caller
> to obtain the list of objects it wanted to have but the function
> failed to deliver, either (other than obviously redoing what
> promisor-remote.c::remove_fetched_oids() does itself).
> 
> Are these what makes the returned value from the function useless
> and the callers ignore it?  If these are fixed, would it make the
> indication of error more useful?
> 
> Looking at some of the existing callers:
> 
>  * builtin/index-pack.c calls it to fill the "gap" and after giving
>    the function a chance to download "missing" objects, goes on its
>    prosessing as if nothing has happened.  We will properly die if
>    any object we need is still missing, so unless we are interested
>    in failing early, checking the return value of the function does
>    not help all that much.
> 
>  * The caller in builtin/pack-objects.c is similar.  We have a list
>    of bunch of objects we want to pack, some of which may be
>    missing.  We ask all missing ones from the list in bulk before
>    doing anything, but we rely on the regular codepath to notice if
>    the promisor remote did not give us necessary objects.
> 
> I didn't look at others, but the above two are probably OK, unless
> we want to get a signal to be cautious about poorly managed promisor
> remotes.  

True - in all cases, we either fall back to a code path where we try to
read a single object (if promisor_remote_get_direct() was called by a
prefetch) or the same code path that is taken when a non-partial-clone
tries to read a missing object. So it's not fully clear what's going on,
but it is safe. I should have added a note about this in the commit
message and will do so in the next version.

> Not distinguishing the reason why we do not have objects
> between a corrupt repository and a promisor remote that breaks its
> earlier promises makes it impossible to say "hey, that promisor
> remote is failing its expectation, perhaps we should wean us off
> from it".

I'll add a note that in a subsequent patch, we will add a message that
can indeed distinguish this case.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] promisor-remote: die upon failing fetch
  2022-09-28 17:43   ` Glen Choo
@ 2022-09-30 22:10     ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-09-30 22:10 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> I'm not certain that this fail-fast approach is always a better user
> experience:
> 
> - I could imagine that for a small-enough set of objects (say, a very
>   restrictive set of sparse specifications), one-by-one fetching would be
>   good enough.

I think that in this case, if you couldn't fetch a small set, you
wouldn't be able to fetch a single object too.

> - Even if one-by-one fetching isn't fast, I'd imagine that each
>   individual fetch is more likely to succeed than a batch prefetch, and
>   as a user, I would prefer to ^C an operation that takes longer than I
>   expected than to have retry the repeatedly.

Hmm...but when you ^C, you have to retry it too right?

> Here are some other arguments that you didn't mention, but I find more
> convincing:
> 
> - Running prefetch in a non-interactive process (e.g. running a job in
>   CI) and the user would prefer to fail fast than to have the job run
>   longer than expected, e.g. they could script retries manually
>   (although, maybe we should do that ourselves).

That's true.

> - Fetching might be subject to a quota, which will be exhausted by
>   one-by-one fetching.

I'll add a note that lengthy execution can be bad for quota reasons as
well (in addition to others).

> As such, it _might_ make sense to make this behavior configurable, since
> we may sometimes want it and sometimes not.

I don't think there is a compelling reason to fallback to single object
fetching (which is there not because it is useful, but just so that Git
commands that haven't been updated to prefetch will still function), so
I'd rather not add an option for this.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Fail early when partial clone prefetch fails
  2022-09-30 21:50   ` Jonathan Tan
@ 2022-09-30 22:17     ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-09-30 22:17 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> No changes to t/ directory?
>
> Hmm...any test you would like to see in particular? Here, we just fail
> with a specific error message when any sort of fetching from the
> promisor remote due to a lookup of a missing object.

I presume that we already have a test that interact with a promisor
remote that does not keep its promise?  If we don't then adding one
or two would help.  If we do, then how the command fails would
probably be different (observable via trace2) with your "early
break" change, no?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/2] Fail early when partial clone prefetch fails
  2022-09-27 22:12 [PATCH 0/2] Fail early when partial clone prefetch fails Jonathan Tan
                   ` (2 preceding siblings ...)
  2022-09-29 20:15 ` [PATCH 0/2] Fail early when partial clone prefetch fails Junio C Hamano
@ 2022-10-04 21:13 ` Jonathan Tan
  2022-10-04 21:13   ` [PATCH v2 1/2] promisor-remote: remove a return value Jonathan Tan
                     ` (2 more replies)
  3 siblings, 3 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-10-04 21:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Thanks everyone for your comments. I don't think that there is a test 
for when a lazy fetch fails, for some reason, so I added one.

Jonathan Tan (2):
  promisor-remote: remove a return value
  promisor-remote: die upon failing fetch

 object-file.c            |  4 ----
 promisor-remote.c        | 23 ++++++++++++++---------
 promisor-remote.h        | 11 +++++------
 t/t0410-partial-clone.sh | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 19 deletions(-)

Range-diff against v1:
1:  fdb35bc9a6 ! 1:  207332c2df promisor-remote: remove a return value
    @@ Commit message
         No caller of promisor_remote_get_direct() is checking its return value,
         so remove it.
     
    +    Not checking the return value means that the user would not know
    +    whether the failure of reading an object is due to the promisor remote
    +    not supplying the object or because of local repository corruption, but
    +    this will be fixed in a subsequent patch.
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
2:  06a649b01a ! 2:  9bbf130c2c promisor-remote: die upon failing fetch
    @@ Commit message
         When this batch prefetch fails, these commands fall back to the
         sequential fetches. But at $DAYJOB we have noticed that this results in
         a bad user experience: a command would take unexpectedly long to finish
    -    if the batch prefetch would fail for some intermittent reason, but all
    -    subsequent fetches would work. It would be a better user experience for
    -    such a command would just fail.
    +    (and possibly use up a lot of bandwidth) if the batch prefetch would
    +    fail for some intermittent reason, but all subsequent fetches would
    +    work. It would be a better user experience for such a command would
    +    just fail.
     
         Therefore, make it a fatal error if the prefetch fails and at least one
         object being fetched is known to be a promisor object. (The latter
    @@ promisor-remote.c: void promisor_remote_get_direct(struct repository *repo,
      	if (to_free)
      		free(remaining_oids);
      }
    +
    + ## t/t0410-partial-clone.sh ##
    +@@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects' '
    + 	grep "$HASH" out
    + '
    + 
    ++test_expect_success 'fetching of a promised object that promisor remote no longer has' '
    ++	rm -f err &&
    ++	test_create_repo unreliable-server &&
    ++	git -C unreliable-server config uploadpack.allowanysha1inwant 1 &&
    ++	git -C unreliable-server config uploadpack.allowfilter 1 &&
    ++	test_commit -C unreliable-server foo &&
    ++
    ++	git clone --filter=blob:none --no-checkout "file://$(pwd)/unreliable-server" unreliable-client &&
    ++
    ++	rm -rf unreliable-server/.git/objects/* &&
    ++	test_must_fail git -C unreliable-client checkout HEAD 2>err &&
    ++	grep "could not fetch.*from promisor remote" err
    ++'
    ++
    + test_expect_success 'fetching of missing objects works with ref-in-want enabled' '
    + 	# ref-in-want requires protocol version 2
    + 	git -C server config protocol.version 2 &&
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] promisor-remote: remove a return value
  2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
@ 2022-10-04 21:13   ` Jonathan Tan
  2022-10-04 21:13   ` [PATCH v2 2/2] promisor-remote: die upon failing fetch Jonathan Tan
  2022-10-05 18:09   ` [PATCH v2 0/2] Fail early when partial clone prefetch fails Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-10-04 21:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Junio C Hamano

No caller of promisor_remote_get_direct() is checking its return value,
so remove it.

Not checking the return value means that the user would not know
whether the failure of reading an object is due to the promisor remote
not supplying the object or because of local repository corruption, but
this will be fixed in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 promisor-remote.c | 12 ++++--------
 promisor-remote.h | 11 +++++------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 68f46f5ec7..8b4d650b4c 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -230,18 +230,17 @@ static int remove_fetched_oids(struct repository *repo,
 	return remaining_nr;
 }
 
-int promisor_remote_get_direct(struct repository *repo,
-			       const struct object_id *oids,
-			       int oid_nr)
+void promisor_remote_get_direct(struct repository *repo,
+				const struct object_id *oids,
+				int oid_nr)
 {
 	struct promisor_remote *r;
 	struct object_id *remaining_oids = (struct object_id *)oids;
 	int remaining_nr = oid_nr;
 	int to_free = 0;
-	int res = -1;
 
 	if (oid_nr == 0)
-		return 0;
+		return;
 
 	promisor_remote_init(repo);
 
@@ -256,12 +255,9 @@ int promisor_remote_get_direct(struct repository *repo,
 				continue;
 			}
 		}
-		res = 0;
 		break;
 	}
 
 	if (to_free)
 		free(remaining_oids);
-
-	return res;
 }
diff --git a/promisor-remote.h b/promisor-remote.h
index edc45ab0f5..df36eb08ef 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -39,13 +39,12 @@ static inline int has_promisor_remote(void)
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
- * a time until all objects are fetched. Returns 0 upon success, and non-zero
- * otherwise.
+ * a time until all objects are fetched.
  *
- * If oid_nr is 0, this function returns 0 (success) immediately.
+ * If oid_nr is 0, this function returns immediately.
  */
-int promisor_remote_get_direct(struct repository *repo,
-			       const struct object_id *oids,
-			       int oid_nr);
+void promisor_remote_get_direct(struct repository *repo,
+				const struct object_id *oids,
+				int oid_nr);
 
 #endif /* PROMISOR_REMOTE_H */
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] promisor-remote: die upon failing fetch
  2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
  2022-10-04 21:13   ` [PATCH v2 1/2] promisor-remote: remove a return value Jonathan Tan
@ 2022-10-04 21:13   ` Jonathan Tan
  2022-10-05 18:09   ` [PATCH v2 0/2] Fail early when partial clone prefetch fails Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2022-10-04 21:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Junio C Hamano

In a partial clone, an attempt to read a missing object results in an
attempt to fetch that single object. In order to avoid multiple
sequential fetches, which would occur when multiple objects are missing
(which is the typical case), some commands have been taught to prefetch
in a batch: such a command would, in a partial clone, notice that
several objects that it will eventually need are missing, and call
promisor_remote_get_direct() with all such objects at once.

When this batch prefetch fails, these commands fall back to the
sequential fetches. But at $DAYJOB we have noticed that this results in
a bad user experience: a command would take unexpectedly long to finish
(and possibly use up a lot of bandwidth) if the batch prefetch would
fail for some intermittent reason, but all subsequent fetches would
work. It would be a better user experience for such a command would
just fail.

Therefore, make it a fatal error if the prefetch fails and at least one
object being fetched is known to be a promisor object. (The latter
criterion is to make sure that we are not misleading the user that such
an object would be present from the promisor remote. For example, a
missing object may be a result of repository corruption and not because
it is expectedly missing due to the repository being a partial clone.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c            |  4 ----
 promisor-remote.c        | 11 ++++++++++-
 t/t0410-partial-clone.sh | 14 ++++++++++++++
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/object-file.c b/object-file.c
index 6c8e3b1660..a5e0160d28 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1599,10 +1599,6 @@ static int do_oid_object_info_extended(struct repository *r,
 		if (fetch_if_missing && repo_has_promisor_remote(r) &&
 		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
-			/*
-			 * TODO Investigate checking promisor_remote_get_direct()
-			 * TODO return value and stopping on error here.
-			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
 			continue;
diff --git a/promisor-remote.c b/promisor-remote.c
index 8b4d650b4c..faa7612941 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -4,6 +4,7 @@
 #include "config.h"
 #include "transport.h"
 #include "strvec.h"
+#include "packfile.h"
 
 struct promisor_remote_config {
 	struct promisor_remote *promisors;
@@ -238,6 +239,7 @@ void promisor_remote_get_direct(struct repository *repo,
 	struct object_id *remaining_oids = (struct object_id *)oids;
 	int remaining_nr = oid_nr;
 	int to_free = 0;
+	int i;
 
 	if (oid_nr == 0)
 		return;
@@ -255,9 +257,16 @@ void promisor_remote_get_direct(struct repository *repo,
 				continue;
 			}
 		}
-		break;
+		goto all_fetched;
+	}
+
+	for (i = 0; i < remaining_nr; i++) {
+		if (is_promisor_object(&remaining_oids[i]))
+			die(_("could not fetch %s from promisor remote"),
+			    oid_to_hex(&remaining_oids[i]));
 	}
 
+all_fetched:
 	if (to_free)
 		free(remaining_oids);
 }
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 1e864cf317..5b7bee888d 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -215,6 +215,20 @@ test_expect_success 'fetching of missing objects' '
 	grep "$HASH" out
 '
 
+test_expect_success 'fetching of a promised object that promisor remote no longer has' '
+	rm -f err &&
+	test_create_repo unreliable-server &&
+	git -C unreliable-server config uploadpack.allowanysha1inwant 1 &&
+	git -C unreliable-server config uploadpack.allowfilter 1 &&
+	test_commit -C unreliable-server foo &&
+
+	git clone --filter=blob:none --no-checkout "file://$(pwd)/unreliable-server" unreliable-client &&
+
+	rm -rf unreliable-server/.git/objects/* &&
+	test_must_fail git -C unreliable-client checkout HEAD 2>err &&
+	grep "could not fetch.*from promisor remote" err
+'
+
 test_expect_success 'fetching of missing objects works with ref-in-want enabled' '
 	# ref-in-want requires protocol version 2
 	git -C server config protocol.version 2 &&
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] Fail early when partial clone prefetch fails
  2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
  2022-10-04 21:13   ` [PATCH v2 1/2] promisor-remote: remove a return value Jonathan Tan
  2022-10-04 21:13   ` [PATCH v2 2/2] promisor-remote: die upon failing fetch Jonathan Tan
@ 2022-10-05 18:09   ` Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-10-05 18:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Thanks everyone for your comments. I don't think that there is a test 
> for when a lazy fetch fails, for some reason, so I added one.
>
> Jonathan Tan (2):
>   promisor-remote: remove a return value
>   promisor-remote: die upon failing fetch
>
>  object-file.c            |  4 ----
>  promisor-remote.c        | 23 ++++++++++++++---------
>  promisor-remote.h        | 11 +++++------
>  t/t0410-partial-clone.sh | 14 ++++++++++++++
>  4 files changed, 33 insertions(+), 19 deletions(-)

Thanks.  Will replace.

I think the C/H code was good already in the first round, so
hopefully this can go 'next' early in this cycle.

>
> Range-diff against v1:
> 1:  fdb35bc9a6 ! 1:  207332c2df promisor-remote: remove a return value
>     @@ Commit message
>          No caller of promisor_remote_get_direct() is checking its return value,
>          so remove it.
>      
>     +    Not checking the return value means that the user would not know
>     +    whether the failure of reading an object is due to the promisor remote
>     +    not supplying the object or because of local repository corruption, but
>     +    this will be fixed in a subsequent patch.
>     +
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>          Signed-off-by: Junio C Hamano <gitster@pobox.com>
>      
> 2:  06a649b01a ! 2:  9bbf130c2c promisor-remote: die upon failing fetch
>     @@ Commit message
>          When this batch prefetch fails, these commands fall back to the
>          sequential fetches. But at $DAYJOB we have noticed that this results in
>          a bad user experience: a command would take unexpectedly long to finish
>     -    if the batch prefetch would fail for some intermittent reason, but all
>     -    subsequent fetches would work. It would be a better user experience for
>     -    such a command would just fail.
>     +    (and possibly use up a lot of bandwidth) if the batch prefetch would
>     +    fail for some intermittent reason, but all subsequent fetches would
>     +    work. It would be a better user experience for such a command would
>     +    just fail.
>      
>          Therefore, make it a fatal error if the prefetch fails and at least one
>          object being fetched is known to be a promisor object. (The latter
>     @@ promisor-remote.c: void promisor_remote_get_direct(struct repository *repo,
>       	if (to_free)
>       		free(remaining_oids);
>       }
>     +
>     + ## t/t0410-partial-clone.sh ##
>     +@@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects' '
>     + 	grep "$HASH" out
>     + '
>     + 
>     ++test_expect_success 'fetching of a promised object that promisor remote no longer has' '
>     ++	rm -f err &&
>     ++	test_create_repo unreliable-server &&
>     ++	git -C unreliable-server config uploadpack.allowanysha1inwant 1 &&
>     ++	git -C unreliable-server config uploadpack.allowfilter 1 &&
>     ++	test_commit -C unreliable-server foo &&
>     ++
>     ++	git clone --filter=blob:none --no-checkout "file://$(pwd)/unreliable-server" unreliable-client &&
>     ++
>     ++	rm -rf unreliable-server/.git/objects/* &&
>     ++	test_must_fail git -C unreliable-client checkout HEAD 2>err &&
>     ++	grep "could not fetch.*from promisor remote" err
>     ++'
>     ++
>     + test_expect_success 'fetching of missing objects works with ref-in-want enabled' '
>     + 	# ref-in-want requires protocol version 2
>     + 	git -C server config protocol.version 2 &&

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-05 18:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 22:12 [PATCH 0/2] Fail early when partial clone prefetch fails Jonathan Tan
2022-09-27 22:12 ` [PATCH 1/2] promisor-remote: remove a return value Jonathan Tan
2022-09-29 19:23   ` Junio C Hamano
2022-09-30 22:02     ` Jonathan Tan
2022-09-27 22:12 ` [PATCH 2/2] promisor-remote: die upon failing fetch Jonathan Tan
2022-09-28 17:43   ` Glen Choo
2022-09-30 22:10     ` Jonathan Tan
2022-09-29 20:14   ` Junio C Hamano
2022-09-29 20:15 ` [PATCH 0/2] Fail early when partial clone prefetch fails Junio C Hamano
2022-09-30 21:50   ` Jonathan Tan
2022-09-30 22:17     ` Junio C Hamano
2022-10-04 21:13 ` [PATCH v2 " Jonathan Tan
2022-10-04 21:13   ` [PATCH v2 1/2] promisor-remote: remove a return value Jonathan Tan
2022-10-04 21:13   ` [PATCH v2 2/2] promisor-remote: die upon failing fetch Jonathan Tan
2022-10-05 18:09   ` [PATCH v2 0/2] Fail early when partial clone prefetch fails Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.