All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fetch: when deepening, check connectivity fully
@ 2018-06-27 17:32 Jonathan Tan
  2018-06-27 19:57 ` Junio C Hamano
  2018-06-27 20:17 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Jonathan Tan @ 2018-06-27 17:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Do not stop at ancestors of our refs when deepening during fetching.
This is because when performing such a fetch, shallow entries may have
been updated; the client can reasonably assume that the existing refs
are connected up to the old shallow points, but not the new.

This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch.c          |  5 ++++-
 connected.c              |  6 ++++--
 connected.h              |  7 +++++++
 t/t5537-fetch-shallow.sh | 23 +++++++++++++++++++++++
 4 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ea5b9669ad..2cfd13342f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -780,6 +780,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
+	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
 	fp = fopen(filename, "a");
 	if (!fp)
@@ -791,7 +792,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_connected(iterate_ref_map, &rm, NULL)) {
+	if (deepen)
+		opt.is_deepening_fetch = 1;
+	if (check_connected(iterate_ref_map, &rm, &opt)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
diff --git a/connected.c b/connected.c
index 91feb78815..1bba888eff 100644
--- a/connected.c
+++ b/connected.c
@@ -58,8 +58,10 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	argv_array_push(&rev_list.args, "--stdin");
 	if (repository_format_partial_clone)
 		argv_array_push(&rev_list.args, "--exclude-promisor-objects");
-	argv_array_push(&rev_list.args, "--not");
-	argv_array_push(&rev_list.args, "--all");
+	if (!opt->is_deepening_fetch) {
+		argv_array_push(&rev_list.args, "--not");
+		argv_array_push(&rev_list.args, "--all");
+	}
 	argv_array_push(&rev_list.args, "--quiet");
 	if (opt->progress)
 		argv_array_pushf(&rev_list.args, "--progress=%s",
diff --git a/connected.h b/connected.h
index a53f03a61a..322dc76372 100644
--- a/connected.h
+++ b/connected.h
@@ -38,6 +38,13 @@ struct check_connected_options {
 	 * Insert these variables into the environment of the child process.
 	 */
 	const char **env;
+
+	/*
+	 * If non-zero, check the ancestry chain completely, not stopping at
+	 * any existing ref. This is necessary when deepening existing refs
+	 * during a fetch.
+	 */
+	unsigned is_deepening_fetch : 1;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index df8d2f095a..ac4beac96b 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -186,4 +186,27 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_success 'shallow fetches check connectivity without stopping at existing refs' '
+	cp -R .git server.git &&
+
+	# Normally, the connectivity check stops at ancestors of existing refs.
+	git init client &&
+	GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
+	grep "run_command: git rev-list" trace >rev-list-command &&
+	grep -e "--not --all" rev-list-command &&
+
+	# But it does not for a shallow fetch...
+	rm -rf client trace &&
+	git init client &&
+	GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 "$(pwd)/server.git" &&
+	grep "run_command: git rev-list" trace >rev-list-command &&
+	! grep -e "--not --all" rev-list-command &&
+
+	# ...and when deepening.
+	rm trace &&
+	GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow "$(pwd)/server.git" &&
+	grep "run_command: git rev-list" trace >rev-list-command &&
+	! grep -e "--not --all" rev-list-command
+'
+
 test_done
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 17:32 [PATCH] fetch: when deepening, check connectivity fully Jonathan Tan
@ 2018-06-27 19:57 ` Junio C Hamano
  2018-06-27 22:40   ` Jonathan Tan
  2018-06-27 20:17 ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2018-06-27 19:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Do not stop at ancestors of our refs when deepening during fetching.
> This is because when performing such a fetch, shallow entries may have
> been updated; the client can reasonably assume that the existing refs
> are connected up to the old shallow points, but not the new.

OK.

> diff --git a/connected.c b/connected.c
> index 91feb78815..1bba888eff 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -58,8 +58,10 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  	argv_array_push(&rev_list.args, "--stdin");
>  	if (repository_format_partial_clone)
>  		argv_array_push(&rev_list.args, "--exclude-promisor-objects");
> -	argv_array_push(&rev_list.args, "--not");
> -	argv_array_push(&rev_list.args, "--all");
> +	if (!opt->is_deepening_fetch) {
> +		argv_array_push(&rev_list.args, "--not");
> +		argv_array_push(&rev_list.args, "--all");
> +	}
>  	argv_array_push(&rev_list.args, "--quiet");
>  	if (opt->progress)
>  		argv_array_pushf(&rev_list.args, "--progress=%s",

Hmph, remind me how old and/or new shallow cut-off points affect
this traversal?  In order to verify that everything above the new
cut-off points, opt->shallow_file should be pointing at the new
cut-off points, then we do the full sweep (without --not --all) to
ensure we won't find missing or broken objects but still stop at the
new cut-off points, and then only after check_connected() passes,
update the shallow file to make new shallow cut-off points active
(and if the traversal fails, then we do nto install the new shallow
cut-off points)?

If so, that sounds sensible.  By not having "--not --all", we
potentially would do a full sweep, but if we are really deepening to
the root commits, then we do need one full sweep anyway (as these
deepest parts of the history were previously hidden by the shallow
cutoff points), and if we are only deepening the history by a bit,
even if we do not have "--not --all", we would hit the updated
shallow cutoff points (which may be at older parts of the history)
and stop, and for correctness we need to visit there anyway, so I
think we are not being overly pessimistic with this change, either.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 17:32 [PATCH] fetch: when deepening, check connectivity fully Jonathan Tan
  2018-06-27 19:57 ` Junio C Hamano
@ 2018-06-27 20:17 ` Junio C Hamano
  2018-06-27 22:51   ` Jonathan Tan
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2018-06-27 20:17 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> +test_expect_success 'shallow fetches check connectivity without stopping at existing refs' '
> +	cp -R .git server.git &&
> +
> +	# Normally, the connectivity check stops at ancestors of existing refs.
> +	git init client &&
> +	GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
> +	grep "run_command: git rev-list" trace >rev-list-command &&
> +	grep -e "--not --all" rev-list-command &&
> +
> +	# But it does not for a shallow fetch...
> +	rm -rf client trace &&
> +	git init client &&
> +	GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 "$(pwd)/server.git" &&
> +	grep "run_command: git rev-list" trace >rev-list-command &&
> +	! grep -e "--not --all" rev-list-command &&
> +
> +	# ...and when deepening.
> +	rm trace &&
> +	GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow "$(pwd)/server.git" &&
> +	grep "run_command: git rev-list" trace >rev-list-command &&
> +	! grep -e "--not --all" rev-list-command
> +'

Hmph, don't we quote these in the trace output, requiring us to grep
for "'--not' '--all'" or somesuch?  

I do not think of a better way to do the above without a huge effort
offhand, and the approach taken by the above may be the best we
could do, but it looks like quite a brittle test that knows too much
about the current implementation.  "rev-list $new_commits --not
--all" is a so very common and useful pattern that it is not all
that implausible that we may want to come up with a new option to do
so, or more likely we may want to do that with an in-process API
without spawning an external rev-list (hence making it impossible to
observe via GIT_TRACE).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 19:57 ` Junio C Hamano
@ 2018-06-27 22:40   ` Jonathan Tan
  2018-06-27 22:56     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Tan @ 2018-06-27 22:40 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Hmph, remind me how old and/or new shallow cut-off points affect
> this traversal?  In order to verify that everything above the new
> cut-off points, opt->shallow_file should be pointing at the new
> cut-off points, then we do the full sweep (without --not --all) to
> ensure we won't find missing or broken objects but still stop at the
> new cut-off points, and then only after check_connected() passes,
> update the shallow file to make new shallow cut-off points active
> (and if the traversal fails, then we do nto install the new shallow
> cut-off points)?

That is the way it should work, but after thinking about it once more, I
realize that it isn't.

opt->shallow_file is not set to anything. And fetch-pack updates the
shallow file by itself (at least, that is my understanding of
update_shallow() in fetch-pack.c) before fetch calls check_connected(),
meaning that if check_connected() fails, there is still no rollback of
the shallow file.

So to directly answer your first question, only the new shallow cut-off
points affect this traversal, and not the old ones.

> If so, that sounds sensible.  By not having "--not --all", we
> potentially would do a full sweep, but if we are really deepening to
> the root commits, then we do need one full sweep anyway (as these
> deepest parts of the history were previously hidden by the shallow
> cutoff points), and if we are only deepening the history by a bit,
> even if we do not have "--not --all", we would hit the updated
> shallow cutoff points (which may be at older parts of the history)
> and stop, and for correctness we need to visit there anyway, so I
> think we are not being overly pessimistic with this change, either.

Yes, this analysis makes sense.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 20:17 ` Junio C Hamano
@ 2018-06-27 22:51   ` Jonathan Tan
  2018-06-27 22:58     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Tan @ 2018-06-27 22:51 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +test_expect_success 'shallow fetches check connectivity without stopping at existing refs' '
> > +	cp -R .git server.git &&
> > +
> > +	# Normally, the connectivity check stops at ancestors of existing refs.
> > +	git init client &&
> > +	GIT_TRACE="$(pwd)/trace" git -C client fetch "$(pwd)/server.git" &&
> > +	grep "run_command: git rev-list" trace >rev-list-command &&
> > +	grep -e "--not --all" rev-list-command &&
> > +
> > +	# But it does not for a shallow fetch...
> > +	rm -rf client trace &&
> > +	git init client &&
> > +	GIT_TRACE="$(pwd)/trace" git -C client fetch --depth=1 "$(pwd)/server.git" &&
> > +	grep "run_command: git rev-list" trace >rev-list-command &&
> > +	! grep -e "--not --all" rev-list-command &&
> > +
> > +	# ...and when deepening.
> > +	rm trace &&
> > +	GIT_TRACE="$(pwd)/trace" git -C client fetch --unshallow "$(pwd)/server.git" &&
> > +	grep "run_command: git rev-list" trace >rev-list-command &&
> > +	! grep -e "--not --all" rev-list-command
> > +'
> 
> Hmph, don't we quote these in the trace output, requiring us to grep
> for "'--not' '--all'" or somesuch?  

I thought so too, but this was changed in commit 1fbdab21bb ("trace:
avoid unnecessary quoting", 2018-01-16).

> I do not think of a better way to do the above without a huge effort
> offhand, and the approach taken by the above may be the best we
> could do, but it looks like quite a brittle test that knows too much
> about the current implementation.  "rev-list $new_commits --not
> --all" is a so very common and useful pattern that it is not all
> that implausible that we may want to come up with a new option to do
> so, or more likely we may want to do that with an in-process API
> without spawning an external rev-list (hence making it impossible to
> observe via GIT_TRACE).

I agree. The best way to do it would probably be to intercept the fetch
response and substitute an empty packfile for the packfile returned by
the fetch, like the one-time-sed mechanism [1], but I think that it is
outside the scope of this patch.

[1] https://public-inbox.org/git/afe5d7d3f876893fdad318665805df1e056717c6.1485381677.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 22:40   ` Jonathan Tan
@ 2018-06-27 22:56     ` Junio C Hamano
  2018-06-29 22:30       ` Jonathan Tan
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2018-06-27 22:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> Hmph, remind me how old and/or new shallow cut-off points affect
>> this traversal?  In order to verify that everything above the new
>> cut-off points, opt->shallow_file should be pointing at the new
>> cut-off points, then we do the full sweep (without --not --all) to
>> ensure we won't find missing or broken objects but still stop at the
>> new cut-off points, and then only after check_connected() passes,
>> update the shallow file to make new shallow cut-off points active
>> (and if the traversal fails, then we do nto install the new shallow
>> cut-off points)?
>
> That is the way it should work, but after thinking about it once more, I
> realize that it isn't.
>
> opt->shallow_file is not set to anything. And fetch-pack updates the
> shallow file by itself (at least, that is my understanding of
> update_shallow() in fetch-pack.c) before fetch calls check_connected(),
> meaning that if check_connected() fails, there is still no rollback of
> the shallow file.

Ouch.  We need to fix that; otherwise, a broken server will keep
giving you a corrupt repository even with this fix, no?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 22:51   ` Jonathan Tan
@ 2018-06-27 22:58     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2018-06-27 22:58 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> Hmph, don't we quote these in the trace output, requiring us to grep
>> for "'--not' '--all'" or somesuch?  
>
> I thought so too, but this was changed in commit 1fbdab21bb ("trace:
> avoid unnecessary quoting", 2018-01-16).

Yup; fortunately or unfortunately, that and the "partial clone"
stuff are among the differences between v2.16 track and newer, so we
can just forget about v2.16.x and lower and target this fix to
maint-2.17 and above.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fetch: when deepening, check connectivity fully
  2018-06-27 22:56     ` Junio C Hamano
@ 2018-06-29 22:30       ` Jonathan Tan
  0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Tan @ 2018-06-29 22:30 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > That is the way it should work, but after thinking about it once more, I
> > realize that it isn't.
> >
> > opt->shallow_file is not set to anything. And fetch-pack updates the
> > shallow file by itself (at least, that is my understanding of
> > update_shallow() in fetch-pack.c) before fetch calls check_connected(),
> > meaning that if check_connected() fails, there is still no rollback of
> > the shallow file.
> 
> Ouch.  We need to fix that; otherwise, a broken server will keep
> giving you a corrupt repository even with this fix, no?

Yes, that is true - the repository will be left in a corrupt state.

I did some more investigation, and usually things are OK because
unpack-trees runs a fsck_walk on all the objects it unpacks (with
--shallow-file appropriately set). Things are bad only if the packfile
is empty (or, presumably, if the packfile only has unrelated objects).

I managed to use the one-time-sed mechanism to craft a response that
triggers this case. This both enables me to avoid the brittle check of
"rev-list" appearing in the GIT_TRACE output (as discussed in [1]), and
to verify that a rollback of the shallow file works, once such a patch
is written. I'll look into writing such a patch, so feel free to hold
off on this until that patch is done.

[1] https://public-inbox.org/git/20180627225105.155996-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-06-29 22:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-27 17:32 [PATCH] fetch: when deepening, check connectivity fully Jonathan Tan
2018-06-27 19:57 ` Junio C Hamano
2018-06-27 22:40   ` Jonathan Tan
2018-06-27 22:56     ` Junio C Hamano
2018-06-29 22:30       ` Jonathan Tan
2018-06-27 20:17 ` Junio C Hamano
2018-06-27 22:51   ` Jonathan Tan
2018-06-27 22:58     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.