* [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
@ 2022-02-01 15:49 Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 1/6] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
                   ` (9 more replies)
  0 siblings, 10 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup

If a filter is changed on a partial clone repository, for example from
blob:none to blob:limit=1m, there is currently no straightforward way to
bulk-refetch the objects that match the new filter for existing local
commits. This is because the client will report commits as "have" during
negotiation and any dependent objects won't be included in the transferred
pack. Another use case is discussed at [1].
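
To make the negotiation problem concrete, here is a toy model in C. Every name in it is hypothetical and it is not how Git represents history or packs: it assumes a linear history in which each commit introduces exactly one blob, and it counts only blobs (commits and trees are ignored). Once the client reports all of its commits as "have", changing the filter transfers nothing, whereas a fetch that sends no "have" lines re-sends every blob that passes the new filter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not Git's data structures): a linear
 * history where commit i introduces one blob of blob_size[i] bytes.
 */
struct toy_history {
	size_t nr_commits;
	const size_t *blob_size;
};

/*
 * Count the blobs a server would send when the client wants the tip.
 * nr_have: number of oldest commits the client reports as "have"
 *          (0 models a fetch with no negotiation at all).
 * limit:   models blob:limit=<limit>; blobs of size >= limit are omitted.
 */
static size_t blobs_sent(const struct toy_history *h, size_t nr_have,
			 size_t limit)
{
	size_t i, sent = 0;

	assert(nr_have <= h->nr_commits);
	/* Blobs introduced by commits the client already has are never sent. */
	for (i = nr_have; i < h->nr_commits; i++)
		if (h->blob_size[i] < limit)
			sent++;
	return sent;
}
```

For example, with blob sizes {100, 5000, 2000000} and blob:limit=1m (1048576 bytes), `blobs_sent(&h, 3, 1048576)` is 0 while `blobs_sent(&h, 0, 1048576)` is 2.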

This patch series proposes adding a --refilter option to fetch & fetch-pack
to enable doing a full fetch with a different filter, as if the local has no
commits in common with the remote. It builds upon cbe566a071
("negotiator/noop: add noop fetch negotiator", 2020-08-18).

To note:

 1. This will produce duplicated objects between the existing and newly
    fetched packs, but gc will clean them up.
 2. This series doesn't check that there's a new filter in any way, whether
    configured via config or passed via --filter=. Personally I think that's
    fine.
 3. If a user fetches with --refilter applying a more restrictive filter
    than previously (eg: blob:limit=1m then blob:limit=1k) the eventual
    state is a no-op, since any referenced object already in the local
    repository is never removed. Potentially this could be improved in
    future by more advanced gc, possibly along the lines discussed at [2].

[1]
https://public-inbox.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2]
https://public-inbox.org/git/A4BAD509-FA1F-49C3-87AF-CF4B73C559F1@gmail.com/

Robert Coup (6):
  fetch-negotiator: add specific noop initializor
  fetch-pack: add partial clone refiltering
  builtin/fetch-pack: add --refilter option
  fetch: add --refilter option
  t5615-partial-clone: add test for --refilter
  doc/partial-clone: mention --refilter option

 Documentation/fetch-options.txt           |  9 ++++
 Documentation/git-fetch-pack.txt          |  4 ++
 Documentation/technical/partial-clone.txt |  3 ++
 builtin/fetch-pack.c                      |  4 ++
 builtin/fetch.c                           | 18 ++++++-
 fetch-negotiator.c                        |  5 ++
 fetch-negotiator.h                        |  8 ++++
 fetch-pack.c                              | 57 +++++++++++++++--------
 fetch-pack.h                              |  1 +
 remote-curl.c                             |  6 +++
 t/t5616-partial-clone.sh                  | 42 ++++++++++++++++-
 transport-helper.c                        |  3 ++
 transport.c                               |  4 ++
 transport.h                               |  4 ++
 14 files changed, 146 insertions(+), 22 deletions(-)


base-commit: 5d01301f2b865aa8dba1654d3f447ce9d21db0b5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1138%2Frcoup%2Frc-partial-clone-refilter-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1138/rcoup/rc-partial-clone-refilter-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1138
-- 
gitgitgadget

* [PATCH 1/6] fetch-negotiator: add specific noop initializor
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 2/6] fetch-pack: add partial clone refiltering Robert Coup via GitGitGadget
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a specific initializor for the noop fetch negotiator. This is
introduced to support allowing partial clones to skip commit negotiation
when refetching to apply a modified filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-negotiator.c | 5 +++++
 fetch-negotiator.h | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/fetch-negotiator.c b/fetch-negotiator.c
index 273390229fe..fe316afbf03 100644
--- a/fetch-negotiator.c
+++ b/fetch-negotiator.c
@@ -23,3 +23,8 @@ void fetch_negotiator_init(struct repository *r,
 		return;
 	}
 }
+
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
+{
+	noop_negotiator_init(negotiator);
+}
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
index ea78868504b..e348905a1f0 100644
--- a/fetch-negotiator.h
+++ b/fetch-negotiator.h
@@ -53,7 +53,15 @@ struct fetch_negotiator {
 	void *data;
 };
 
+/*
+ * Initialize a negotiator based on the repository settings.
+ */
 void fetch_negotiator_init(struct repository *r,
 			   struct fetch_negotiator *negotiator);
 
+/*
+ * Initialize a noop negotiator.
+ */
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator);
+
 #endif
-- 
gitgitgadget
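
A simplified sketch of why a noop negotiator removes the "have" lines from the request. The real `struct fetch_negotiator` has additional callbacks (see fetch-negotiator.h) and its `next` callback returns `const struct object_id *`; the `toy_` types and `walk_state` below are hypothetical stand-ins that keep only a `next` callback and a `data` pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct fetch_negotiator. */
struct toy_negotiator {
	const char *(*next)(struct toy_negotiator *);
	void *data;
};

/* A "walking" negotiator: proposes each local commit id in turn. */
struct walk_state {
	const char **ids;
	size_t nr, pos;
};

static const char *walk_next(struct toy_negotiator *n)
{
	struct walk_state *s = n->data;

	return s->pos < s->nr ? s->ids[s->pos++] : NULL;
}

/* The noop negotiator never proposes a "have". */
static const char *noop_next(struct toy_negotiator *n)
{
	(void)n;
	return NULL;
}

/* Count how many "have" lines a client would send with negotiator n. */
static size_t count_haves(struct toy_negotiator *n)
{
	size_t count = 0;

	assert(n != NULL);
	while (n->next(n))
		count++;
	return count;
}
```

With a noop negotiator the server sees only "want" lines, so it behaves as it would for a fresh clone with the requested filter.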


* [PATCH 2/6] fetch-pack: add partial clone refiltering
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 1/6] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-04 18:02   ` Jonathan Tan
  2022-02-01 15:49 ` [PATCH 3/6] builtin/fetch-pack: add --refilter option Robert Coup via GitGitGadget
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Allow refetching with a new partial clone filter by not attempting to
find or negotiate common commits with the remote, and always forcing
a full filtered fetch.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-pack.c | 57 ++++++++++++++++++++++++++++++++++------------------
 fetch-pack.h |  1 +
 2 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index dd6ec449f2d..dd670441656 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
 		const char *remote_hex;
 		struct object *o;
 
-		/*
-		 * If that object is complete (i.e. it is an ancestor of a
-		 * local ref), we tell them we have it but do not have to
-		 * tell them about its ancestors, which they already know
-		 * about.
-		 *
-		 * We use lookup_object here because we are only
-		 * interested in the case we *know* the object is
-		 * reachable and we have already scanned it.
-		 */
-		if (((o = lookup_object(the_repository, remote)) != NULL) &&
-				(o->flags & COMPLETE)) {
-			continue;
+		if (!args->refilter) {
+			/*
+			 * If that object is complete (i.e. it is an ancestor of a
+			 * local ref), we tell them we have it but do not have to
+			 * tell them about its ancestors, which they already know
+			 * about.
+			 *
+			 * We use lookup_object here because we are only
+			 * interested in the case we *know* the object is
+			 * reachable and we have already scanned it.
+			 */
+			if (((o = lookup_object(the_repository, remote)) != NULL) &&
+					(o->flags & COMPLETE)) {
+				continue;
+			}
 		}
 
 		remote_hex = oid_to_hex(remote);
@@ -694,6 +696,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	save_commit_buffer = 0;
 
+	if (args->refilter)
+		return;
+
 	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
@@ -1022,9 +1027,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
-
-	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	unsigned is_refiltering = 0;
 
 	sort_ref_list(&ref, ref_compare_name);
 	QSORT(sought, nr_sought, cmp_ref_by_name);
@@ -1094,10 +1097,14 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	if (server_supports("filter")) {
 		server_supports_filtering = 1;
 		print_verbose(args, _("Server supports %s"), "filter");
-	} else if (args->filter_options.choice) {
+	} else if (args->filter_options.choice || args->refilter) {
 		warning("filtering not recognized by server, ignoring");
 	}
 
+	if (server_supports_filtering && args->refilter) {
+		is_refiltering = 1;
+	}
+
 	if (server_supports("deepen-since")) {
 		print_verbose(args, _("Server supports %s"), "deepen-since");
 		deepen_since_ok = 1;
@@ -1115,9 +1122,16 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	if (!server_supports_hash(the_hash_algo->name, NULL))
 		die(_("Server does not support this repository's object format"));
 
+	negotiator = &negotiator_alloc;
+	if (is_refiltering) {
+		fetch_negotiator_init_noop(negotiator);
+	} else {
+		fetch_negotiator_init(r, negotiator);
+	}
+
 	mark_complete_and_common_ref(negotiator, args, &ref);
 	filter_refs(args, &ref, sought, nr_sought);
-	if (everything_local(args, &ref)) {
+	if (!is_refiltering && everything_local(args, &ref)) {
 		packet_flush(fd[1]);
 		goto all_done;
 	}
@@ -1575,7 +1589,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->refilter)
+		fetch_negotiator_init_noop(negotiator);
+	else
+		fetch_negotiator_init(r, negotiator);
 
 	packet_reader_init(&reader, fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -1601,7 +1618,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Filter 'ref' by 'sought' and those that aren't local */
 			mark_complete_and_common_ref(negotiator, args, &ref);
 			filter_refs(args, &ref, sought, nr_sought);
-			if (everything_local(args, &ref))
+			if (!args->refilter && everything_local(args, &ref))
 				state = FETCH_DONE;
 			else
 				state = FETCH_SEND_REQUEST;
diff --git a/fetch-pack.h b/fetch-pack.h
index 7f94a2a5831..68df2230c55 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -42,6 +42,7 @@ struct fetch_pack_args {
 	unsigned update_shallow:1;
 	unsigned reject_shallow_remote:1;
 	unsigned deepen:1;
+	unsigned refilter:1;
 
 	/*
 	 * Indicate that the remote of this request is a promisor remote. The
-- 
gitgitgadget
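
The control flow this patch adds can be summarised as two pure functions. This is a hypothetical model rather than Git code: `toy_decision` records which negotiator gets initialised and whether the everything_local() shortcut may still end the fetch early. In the protocol v0 path (do_fetch_pack()) refiltering only takes effect when the server advertises "filter", while the v2 path (do_fetch_pack_v2()) tests args->refilter directly.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_negotiator_kind { NEGOTIATOR_CONFIGURED, NEGOTIATOR_NOOP };

struct toy_decision {
	enum toy_negotiator_kind negotiator;
	bool may_skip_fetch; /* true if everything_local() is still consulted */
};

/* Protocol v0: is_refiltering requires server-side filter support. */
static struct toy_decision decide_v0(bool refilter, bool server_supports_filter)
{
	bool is_refiltering = refilter && server_supports_filter;
	struct toy_decision d;

	d.negotiator = is_refiltering ? NEGOTIATOR_NOOP : NEGOTIATOR_CONFIGURED;
	d.may_skip_fetch = !is_refiltering;
	return d;
}

/* Protocol v2: the patch checks args->refilter alone. */
static struct toy_decision decide_v2(bool refilter)
{
	struct toy_decision d;

	d.negotiator = refilter ? NEGOTIATOR_NOOP : NEGOTIATOR_CONFIGURED;
	d.may_skip_fetch = !refilter;
	return d;
}
```

Laying it out this way makes the asymmetry between the two protocol paths easy to spot when reviewing.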


* [PATCH 3/6] builtin/fetch-pack: add --refilter option
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 1/6] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 2/6] fetch-pack: add partial clone refiltering Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 4/6] fetch: " Robert Coup via GitGitGadget
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a --refilter option to fetch-pack to force a full fetch when
applying a new filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/git-fetch-pack.txt | 4 ++++
 builtin/fetch-pack.c             | 4 ++++
 remote-curl.c                    | 6 ++++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index c9758847937..ad21c627d75 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -101,6 +101,10 @@ be in a separate packet, and the list must end with a flush packet.
 	current shallow boundary instead of from the tip of each
 	remote branch history.
 
+--refilter::
+	Skips negotiating commits with the server in order to reapply a new
+	partial clone filter and fetch all matching objects.
+
 --no-progress::
 	Do not show the progress.
 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c2d96f4c89a..eb478929236 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.from_promisor = 1;
 			continue;
 		}
+		if (!strcmp("--refilter", arg)) {
+			args.refilter = 1;
+			continue;
+		}
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
diff --git a/remote-curl.c b/remote-curl.c
index 0dabef2dd7c..0679f71e935 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -43,6 +43,7 @@ struct options {
 		/* see documentation of corresponding flag in fetch-pack.h */
 		from_promisor : 1,
 
+		refilter : 1,
 		atomic : 1,
 		object_format : 1,
 		force_if_includes : 1;
@@ -198,6 +199,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "from-promisor")) {
 		options.from_promisor = 1;
 		return 0;
+	} else if (!strcmp(name, "refilter")) {
+		options.refilter = 1;
+		return 0;
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
@@ -1182,6 +1186,8 @@ static int fetch_git(struct discovery *heads,
 		strvec_push(&args, "--deepen-relative");
 	if (options.from_promisor)
 		strvec_push(&args, "--from-promisor");
+	if (options.refilter)
+		strvec_push(&args, "--refilter");
 	if (options.filter)
 		strvec_pushf(&args, "--filter=%s", options.filter);
 	strvec_push(&args, url.buf);
-- 
gitgitgadget
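
How the option travels through the remote helper: remote-curl records "option refilter" in its option bitfield, then adds `--refilter` when fetch_git() builds the fetch-pack command line. The sketch below is hypothetical and heavily reduced (a fixed-size array instead of a strvec, only two options, and none of the other arguments fetch_git() pushes), but it keeps the relative order in which this patch pushes `--from-promisor` and `--refilter`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced version of remote-curl's struct options. */
struct toy_options {
	unsigned from_promisor : 1;
	unsigned refilter : 1;
};

/* Returns 0 if the option was recognised, 1 if it is unsupported. */
static int toy_set_option(struct toy_options *o, const char *name)
{
	if (!strcmp(name, "from-promisor")) {
		o->from_promisor = 1;
		return 0;
	} else if (!strcmp(name, "refilter")) {
		o->refilter = 1;
		return 0;
	}
	return 1;
}

/* Fill argv (room for 3 entries) and return the number of entries used. */
static size_t toy_build_args(const struct toy_options *o, const char *argv[3])
{
	size_t n = 0;

	argv[n++] = "fetch-pack";
	if (o->from_promisor)
		argv[n++] = "--from-promisor";
	if (o->refilter)
		argv[n++] = "--refilter";
	return n;
}
```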


* [PATCH 4/6] fetch: add --refilter option
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-02-01 15:49 ` [PATCH 3/6] builtin/fetch-pack: add --refilter option Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 5/6] t5615-partial-clone: add test for --refilter Robert Coup via GitGitGadget
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Teach fetch and transports the --refilter option to force a full fetch
when applying a new filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  9 +++++++++
 builtin/fetch.c                 | 18 +++++++++++++++++-
 transport-helper.c              |  3 +++
 transport.c                     |  4 ++++
 transport.h                     |  4 ++++
 5 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index e967ff1874c..004078bfea9 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -162,6 +162,15 @@ endif::git-pull[]
 	behavior for a remote may be specified with the remote.<name>.tagOpt
 	setting. See linkgit:git-config[1].
 
+ifndef::git-pull[]
+--refilter::
+	Reapply a partial clone filter from configuration or `--filter=`, such
+	as when the filter definition has changed. Instead of negotiating with
+	the server to avoid transferring commits and associated objects that are
+	already present locally, this option fetches all objects that match the
+	filter.
+endif::git-pull[]
+
 --refmap=<refspec>::
 	When fetching refs listed on the command line, use the
 	specified refspec (can be given more than once) to map the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 5f06b21f8e9..1dec90fb25f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -59,7 +59,7 @@ static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok;
 static int write_fetch_head = 1;
-static int verbosity, deepen_relative, set_upstream;
+static int verbosity, deepen_relative, set_upstream, refilter;
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
@@ -189,6 +189,9 @@ static struct option builtin_fetch_options[] = {
 	OPT_SET_INT_F(0, "unshallow", &unshallow,
 		      N_("convert to a complete repository"),
 		      1, PARSE_OPT_NONEG),
+	OPT_SET_INT_F(0, "refilter", &refilter,
+		      N_("re-fetch with a modified filter"),
+		      1, PARSE_OPT_NONEG),
 	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
 		   N_("prepend this to submodule path output"), PARSE_OPT_HIDDEN },
 	OPT_CALLBACK_F(0, "recurse-submodules-default",
@@ -1292,6 +1295,17 @@ static int check_exist_and_connected(struct ref *ref_map)
 	if (deepen)
 		return -1;
 
+	/*
+	 * Similarly, if we need to refilter a partial clone we already have
+	 * these commits reachable.  Running rev-list here will return with
+	 * a good (0) exit status and we'll bypass the fetch that we
+	 * really need to perform.  Claiming failure now will ensure
+	 * we perform the network exchange to reapply the filter.
+	 */
+	if (refilter)
+		return -1;
+
+
 	/*
 	 * check_connected() allows objects to merely be promised, but
 	 * we need all direct targets to exist.
@@ -1487,6 +1501,8 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (refilter)
+		set_option(transport, TRANS_OPT_REFILTER, "yes");
 	if (filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&filter_options);
diff --git a/transport-helper.c b/transport-helper.c
index a0297b0986c..3cab1ccc8d5 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -715,6 +715,9 @@ static int fetch_refs(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.refilter)
+		set_helper_option(transport, "refilter", "true");
+
 	if (data->transport_options.filter_options.choice) {
 		const char *spec = expand_list_objects_filter_spec(
 			&data->transport_options.filter_options);
diff --git a/transport.c b/transport.c
index 2a3e3241545..a150964e20d 100644
--- a/transport.c
+++ b/transport.c
@@ -243,6 +243,9 @@ static int set_git_option(struct git_transport_options *opts,
 		list_objects_filter_die_if_populated(&opts->filter_options);
 		parse_list_objects_filter(&opts->filter_options, value);
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_REFILTER)) {
+		opts->refilter = !!value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
 		opts->reject_shallow = !!value;
 		return 0;
@@ -377,6 +380,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.filter_options = data->options.filter_options;
+	args.refilter = data->options.refilter;
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
diff --git a/transport.h b/transport.h
index 3f16e50c196..6e21a5b0393 100644
--- a/transport.h
+++ b/transport.h
@@ -16,6 +16,7 @@ struct git_transport_options {
 	unsigned update_shallow : 1;
 	unsigned reject_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned refilter : 1;
 
 	/* see documentation of corresponding flag in fetch-pack.h */
 	unsigned from_promisor : 1;
@@ -216,6 +217,9 @@ void transport_check_allowed(const char *type);
 /* Filter objects for partial clone and fetch */
 #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
 
+/* Refilter a previous partial clone */
+#define TRANS_OPT_REFILTER "refilter"
+
 /* Request atomic (all-or-nothing) updates when pushing */
 #define TRANS_OPT_ATOMIC "atomic"
 
-- 
gitgitgadget
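
The early return added to check_exist_and_connected() matters because builtin/fetch.c skips the transport fetch when that check reports the wanted objects as present and connected, as the new comment in the patch explains. The model below is hypothetical: it keeps only the two early exits (deepen and refilter) and replaces the real rev-list connectivity walk with a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of check_exist_and_connected(): 0 means "everything
 * is already here", a negative value means "go to the network".
 */
static int toy_check_connected(bool deepen, bool refilter, bool all_local)
{
	if (deepen)
		return -1;
	/* Refiltering: local commits are reachable, but we still must fetch. */
	if (refilter)
		return -1;
	return all_local ? 0 : -1;
}

/* True when fetch would skip the network exchange entirely. */
static bool toy_skips_network(bool deepen, bool refilter, bool all_local)
{
	return toy_check_connected(deepen, refilter, all_local) == 0;
}
```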


* [PATCH 5/6] t5615-partial-clone: add test for --refilter
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-02-01 15:49 ` [PATCH 4/6] fetch: " Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-01 15:49 ` [PATCH 6/6] doc/partial-clone: mention --refilter option Robert Coup via GitGitGadget
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a test for partial clone refiltering under protocol v0 and v2.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 t/t5616-partial-clone.sh | 42 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 34469b6ac10..87b8095258f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -166,6 +166,46 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+# create new commits in "src" repo to establish a history on file.4.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.4.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.4.txt $x" >src/file.4.txt &&
+		if list_contains "a,b" "$x"; then
+			printf "%10000s" X >>src/file.4.txt
+		fi &&
+		if list_contains "c,d" "$x"; then
+			printf "%20000s" X >>src/file.4.txt
+		fi &&
+		git -C src add file.4.txt &&
+		git -C src commit -m "mod $x" || return 1
+	done &&
+	git -C src push -u srv main
+'
+
+# Do partial fetch to fetch smaller files; then verify that without --refilter
+# applying a new filter does not refetch missing large objects. Then use
+# --refilter to apply the new filter on existing commits. Test it under both
+# protocol v2 & v0.
+test_expect_success 'apply a different filter using --refilter' '
+	git -C pc1 fetch --filter=blob:limit=999 origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 4 observed &&
+
+	git -C pc1 fetch --filter=blob:limit=19999 --refilter origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 2 observed &&
+
+	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
+		--refilter origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 0 observed
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
@@ -225,7 +265,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 
 	# Auto-fetch all remaining trees and blobs with --missing=error
 	git -C dst rev-list --missing=error --objects main >fetched_objects &&
-	test_line_count = 70 fetched_objects &&
+	test_line_count = 88 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
-- 
gitgitgadget


* [PATCH 6/6] doc/partial-clone: mention --refilter option
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-02-01 15:49 ` [PATCH 5/6] t5615-partial-clone: add test for --refilter Robert Coup via GitGitGadget
@ 2022-02-01 15:49 ` Robert Coup via GitGitGadget
  2022-02-01 20:13 ` [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Junio C Hamano
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-01 15:49 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add the fetch --refilter option to the partial clone documentation.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/technical/partial-clone.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c66f24..e246b0778e5 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -181,6 +181,9 @@ Fetching Missing Objects
   currently fetches all objects referred to by the requested objects, even
   though they are not necessary.
 
+- Fetching with `--refilter` will request a complete new filtered packfile from
+  the remote, which can be used to change a filter without needing to
+  dynamically fetch missing objects.
 
 Using many promisor remotes
 ---------------------------
-- 
gitgitgadget

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-02-01 15:49 ` [PATCH 6/6] doc/partial-clone: mention --refilter option Robert Coup via GitGitGadget
@ 2022-02-01 20:13 ` Junio C Hamano
  2022-02-02 15:02   ` Robert Coup
  2022-02-02 18:59 ` Jonathan Tan
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-02-01 20:13 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

> If a filter is changed on a partial clone repository, for example from
> blob:none to blob:limit=1m, there is currently no straightforward way to
> bulk-refetch the objects that match the new filter for existing local
> commits. This is because the client will report commits as "have" during
> negotiation and any dependent objects won't be included in the transferred
> pack.

It sounds like a useful thing to have such a "refetch things"
option.

A lazy/partial clone is narrower than the full tree in the width
dimension, while a shallow clone is shallower than the full history
in the time dimension.  The latter already has the "--deepen"
support to say "the commits listed in my shallow boundary list may
claim that I already have these, but I actually don't have them;
please stop lying to the other side and refetch what I should have
fetched earlier".  I understand that this works in the other
dimension to "--widen" things?

Makes me wonder how well these two features work together (or if
they are mutually exclusive, that is fine as well as a starting
point).

If you update the filter specification to make it narrower (e.g. you
start from blob:limit=1m down to blob:limit=512k), would we transfer
nothing (which would be ideal), or would we end up refetching
everything that is smaller than 512k?

> This patch series proposes adding a --refilter option to fetch & fetch-pack
> to enable doing a full fetch with a different filter, as if the local has no
> commits in common with the remote. It builds upon cbe566a071
> ("negotiator/noop: add noop fetch negotiator", 2020-08-18).

I guess the answer to the last question is ...

> To note:
>
>  1. This will produce duplicated objects between the existing and newly
>     fetched packs, but gc will clean them up.

... it is not smart enough to tell them to exclude what we _ought_
to have by telling them what the _old_ filter spec was.  That's OK
for a starting point, I guess.  Hopefully, at the end of this
operation, we should garbage collect the duplicated objects by
default (with an option to turn it off)?

>  2. This series doesn't check that there's a new filter in any way, whether
>     configured via config or passed via --filter=. Personally I think that's
>     fine.

In other words, a repository that used to be a partial clone can
become a full clone by using the option _and_ not giving any filter.
I think that is an intuitive enough behaviour and a natural
consequence to the extreme of what the feature is.  Compared to
making a full "git clone", fetching from the old local (and narrow)
repository into it and then discarding the old one, it would not
have any performance or storage advantage, but it probably is more
convenient.

>  3. If a user fetches with --refilter applying a more restrictive filter
>     than previously (eg: blob:limit=1m then blob:limit=1k) the eventual
>     state is a no-op, since any referenced object already in the local
>     repository is never removed. Potentially this could be improved in
>     future by more advanced gc, possibly along the lines discussed at [2].

OK.  That matches my reaction to 1. above.
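
Junio's narrowing example, modelled as two counts with hypothetical helpers (sizes in bytes). `refilter_transfer()` reflects what the cover letter describes, where every blob passing the new filter is sent whether or not it is already local, while `ideal_transfer()` is the "transfer nothing" outcome, which would require the server to know which blobs the client already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one blob on the server. */
struct toy_blob {
	size_t size;
	bool present_locally;
};

/* As in this series: send every blob that passes blob:limit=<limit>. */
static size_t refilter_transfer(const struct toy_blob *b, size_t n,
				size_t limit)
{
	size_t i, sent = 0;

	for (i = 0; i < n; i++)
		if (b[i].size < limit)
			sent++;
	return sent;
}

/* Ideal: send only blobs that pass the filter and are missing locally. */
static size_t ideal_transfer(const struct toy_blob *b, size_t n,
			     size_t limit)
{
	size_t i, sent = 0;

	for (i = 0; i < n; i++)
		if (b[i].size < limit && !b[i].present_locally)
			sent++;
	return sent;
}
```

With blobs of 100k, 300k and 800k present (fetched under blob:limit=1m) and a 2m blob missing, narrowing to blob:limit=512k gives `refilter_transfer()` of 2 and `ideal_transfer()` of 0; the two re-sent blobs are the duplicates that gc would later clean up.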

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-01 20:13 ` [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Junio C Hamano
@ 2022-02-02 15:02   ` Robert Coup
  2022-02-16 13:24     ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-02-02 15:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, Derrick Stolee,
	Taylor Blau, Christian Couder, John Cai

Hi Junio

Thanks for your input. Hopefully some of the other partial-clone
interested folks will chime in too.

On Tue, 1 Feb 2022 at 20:13, Junio C Hamano <gitster@pobox.com> wrote:
>
> It sounds like a useful thing to have such a "refetch things"
> option.

Any improved suggestions on the argument name? I thought of
--refetch but `fetch --refetch` seemed more confusing to explain.

> Makes me wonder how well these two features work together (or if
> they are mutually exclusive, that is fine as well as a starting
> point).

I don't see any particular reason they can't work together - as you say,
the filtering is orthogonal to shallow on a conceptual level. I haven't
added a test for that scenario yet but will do for a v1.

> If you update the filter specification to make it narrower (e.g. you
> start from blob:limit=1m down to blob:limit=512k), would we transfer
> nothing (which would be ideal), or would we end up refetching
> everything that is smaller than 512k?

As you spot, the latter. I can't see a straightforward way of telling the
server "I have these trees/blobs already" without generating (one way
or the other) a list of millions of oids, then transferring & negotiating
with it.

> ... it is not smart enough to tell them to exclude what we _ought_
> to have by telling them what the _old_ filter spec was.  That's OK
> for a starting point, I guess.

The client doesn't really know what the local repository *has* —
potentially several filters could have been applied and used for fetches
at different points in the commit history, as well as objects dynamically
fetched in. Even a filter set in the config only applies to subsequent
fetches, and only if --filter isn't used to override it.

> Hopefully, at the end of this
> operation, we should garbage collect the duplicated objects by
> default (with an option to turn it off)?

I haven't explicitly looked into invoking gc yet, but yes, it'd be a bit of
a waste if it wasn't kicked off by default. Maybe by reusing gc.auto?

> In other words, a repository that used to be a partial clone can
> become a full clone by using the option _and_ not giving any filter.

For that specific case I think you can already do it by removing the
promisor flag in the remote config, potentially adding it back if you
wanted to keep it partial again from that point forward.

> I think that is an intuitive enough behaviour and a natural
> consequence to the extreme of what the feature is.  Compared to
> making a full "git clone", fetching from the old local (and narrow)
> repository into it and then discarding the old one, it would not
> have any performance or storage advantage, but it probably is more
> convenient.

It's certainly cleaner than abusing --deepen, or temporarily moving pack
files out of the way, or starting over with a fresh clone & copying config.

Thanks,

Rob :)

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-02-01 20:13 ` [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Junio C Hamano
@ 2022-02-02 18:59 ` Jonathan Tan
  2022-02-02 21:58   ` Robert Coup
  2022-02-07 19:37 ` Jeff Hostetler
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  9 siblings, 1 reply; 76+ messages in thread
From: Jonathan Tan @ 2022-02-02 18:59 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, jonathantanmy, stolee, me, christian.couder, johncai86, robert

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:
> If a filter is changed on a partial clone repository, for example from
> blob:none to blob:limit=1m, there is currently no straightforward way to
> bulk-refetch the objects that match the new filter for existing local
> commits. This is because the client will report commits as "have" during
> negotiation and any dependent objects won't be included in the transferred
> pack. Another use case is discussed at [1].

Reporting commits as "have" can be solved by forcing the noop
negotiator, but there is another issue (which you seem to have
discovered, glancing through your patches) in that fetch will abort (and
report success) if all wanted commits are present, even if not all
objects directly or indirectly referenced by those commits are present.
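A simplified C model of that early exit, for illustration only. The `toy_*` names are hypothetical, and Git's real everything_local() works with COMPLETE flags on commits rather than a struct like this; the point is only that the check looks at the wanted commits and never at the trees and blobs below them.

```c
#include <stddef.h>

/* Hypothetical view of a wanted ref in a partial clone: whether its tip
 * commit exists locally, and how many objects reachable from it are
 * missing (e.g. blobs omitted by an earlier filter). */
struct toy_ref {
	int tip_commit_present;
	size_t missing_reachable_objects;
};

/* Only the tips are consulted, so missing blobs/trees are not noticed. */
static int toy_everything_local(const struct toy_ref *refs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!refs[i].tip_commit_present)
			return 0;
	return 1;
}

/* A repair/refilter fetch has to bypass the early exit entirely. */
static int toy_should_send_request(const struct toy_ref *refs, size_t n,
				   int repair)
{
	return repair || !toy_everything_local(refs, n);
}
```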

> This patch series proposes adding a --refilter option to fetch & fetch-pack
> to enable doing a full fetch with a different filter, as if the local has no
> commits in common with the remote. It builds upon cbe566a071
> ("negotiator/noop: add noop fetch negotiator", 2020-08-18).

Thanks - I think this is a useful feature. This is useful even in a
non-partial-clone repo, to repair objects that were, say, accidentally
deleted from the local object store.

If it's acceptable to have a separate command to configure the new
filter in the repo config (or to delete it, if we want to convert a
partial clone into a regular repo), I think it's clearer to name this
option "--repair" or something like that, and explain it as a fetch that
does not take into account the contents of the local object store (not
as a fetch that changes the filter).

Having said that, the overall feature is worth having. While we decide
on the name, the implementation of this will not change much. I'll try
to review the implementation by the end of this week (but other reviews
are welcome too, needless to say).

> To note:
> 
>  1. This will produce duplicated objects between the existing and newly
>     fetched packs, but gc will clean them up.

Noted. This might be an argument for naming the option "--repair", since
the user would probably understand that there would be duplicate
objects.

>  2. This series doesn't check that there's a new filter in any way, whether
>     configured via config or passed via --filter=. Personally I think that's
>     fine.

Agreed.

>  3. If a user fetches with --refilter applying a more restrictive filter
>     than previously (eg: blob:limit=1m then blob:limit=1k) the eventual
>     state is a no-op, since any referenced object already in the local
>     repository is never removed. Potentially this could be improved in
>     future by more advanced gc, possibly along the lines discussed at [2].

That's true.

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-02 18:59 ` Jonathan Tan
@ 2022-02-02 21:58   ` Robert Coup
  2022-02-02 21:59     ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-02-02 21:58 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitgitgadget, git, stolee, me, christian.couder, johncai86

Hi Jonathan,

On Wed, 2 Feb 2022 at 19:00, Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks - I think this is a useful feature. This is useful even in a
> non-partial-clone repo, to repair objects that were, say, accidentally
> deleted from the local object store.

I'd

> If it's acceptable to have a separate command to configure the new
> filter in the repo config (or to delete it, if we want to convert a
> partial clone into a regular repo), I think it's clearer to name this
> option "--repair" or something like that, and explain it as a fetch that
> does not take into account the contents of the local object store (not
> as a fetch that changes the filter).

I quite like --repair, since the implementation really has zero to do with
filtering or partial clones beyond that being my use case for it. Specifying
a filter, shallow options, or even using a promisor remote aren't
even slightly necessary for the implementation as it turns out.

And as you say, that makes it easier to explain too:

"fetch --repair will fetch all objects from the remote (applying any filters
or shallow options as requested). Unlike a normal fetch it does not take into
account any content already in the local repository and acts more like an
initial clone. Any duplicate objects will get cleaned up during subsequent
maintenance."

"If you want to update your local repository with a different partial clone
filter, use `fetch --repair` to re-download all matching objects from the
remote."

> I'll try
> to review the implementation by the end of this week (but other reviews
> are welcome too, needless to say).

Thanks!

Rob :)

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-02 21:58   ` Robert Coup
@ 2022-02-02 21:59     ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-02 21:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitgitgadget, git, stolee, me, christian.couder, johncai86

On Wed, 2 Feb 2022 at 21:58, Robert Coup <robert@coup.net.nz> wrote:
>
> Hi Jonathan,
>
> On Wed, 2 Feb 2022 at 19:00, Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > Thanks - I think this is a useful feature. This is useful even in a
> > non-partial-clone repo, to repair objects that were, say, accidentally
> > deleted from the local object store.
>
> I'd

I'd be keen to hear the stories here :) Just losing loose object files?

Rob.

* Re: [PATCH 2/6] fetch-pack: add partial clone refiltering
  2022-02-01 15:49 ` [PATCH 2/6] fetch-pack: add partial clone refiltering Robert Coup via GitGitGadget
@ 2022-02-04 18:02   ` Jonathan Tan
  2022-02-11 14:56     ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Jonathan Tan @ 2022-02-04 18:02 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, jonathantanmy, stolee, me, christian.couder, johncai86, robert

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
>  		const char *remote_hex;
>  		struct object *o;
>  
> -		/*
> -		 * If that object is complete (i.e. it is an ancestor of a
> -		 * local ref), we tell them we have it but do not have to
> -		 * tell them about its ancestors, which they already know
> -		 * about.
> -		 *
> -		 * We use lookup_object here because we are only
> -		 * interested in the case we *know* the object is
> -		 * reachable and we have already scanned it.
> -		 */
> -		if (((o = lookup_object(the_repository, remote)) != NULL) &&
> -				(o->flags & COMPLETE)) {
> -			continue;
> +		if (!args->refilter) {
> +			/*
> +			* If that object is complete (i.e. it is an ancestor of a
> +			* local ref), we tell them we have it but do not have to
> +			* tell them about its ancestors, which they already know
> +			* about.
> +			*
> +			* We use lookup_object here because we are only
> +			* interested in the case we *know* the object is
> +			* reachable and we have already scanned it.
> +			*/
> +			if (((o = lookup_object(the_repository, remote)) != NULL) &&
> +					(o->flags & COMPLETE)) {
> +				continue;
> +			}

The approach that I would have expected is to not call
mark_complete_and_common_ref(), filter_refs(), everything_local(), and
find_common(), but your approach here is to ensure that
mark_complete_and_common_ref() and find_common() do not do anything.
Comparing the two approaches, the advantage of yours is that we only
need to make the change once to support both protocol v0 and v2
(although the change looks more substantial than just skipping function
calls), but it makes the code more difficult to read in that we now have
function calls that do nothing. What do you think about my approach?

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-02-02 18:59 ` Jonathan Tan
@ 2022-02-07 19:37 ` Jeff Hostetler
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  9 siblings, 0 replies; 76+ messages in thread
From: Jeff Hostetler @ 2022-02-07 19:37 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget, git
  Cc: Jonathan Tan, Derrick Stolee, Taylor Blau, Christian Couder,
	John Cai, Robert Coup



On 2/1/22 10:49 AM, Robert Coup via GitGitGadget wrote:
> If a filter is changed on a partial clone repository, for example from
> blob:none to blob:limit=1m, there is currently no straightforward way to
> bulk-refetch the objects that match the new filter for existing local
> commits. This is because the client will report commits as "have" during
> negotiation and any dependent objects won't be included in the transferred
> pack. Another use case is discussed at [1].
> 
> This patch series proposes adding a --refilter option to fetch & fetch-pack
> to enable doing a full fetch with a different filter, as if the local has no
> commits in common with the remote. It builds upon cbe566a071
> ("negotiator/noop: add noop fetch negotiator", 2020-08-18).
> 
> To note:
> 
>   1. This will produce duplicated objects between the existing and newly
>      fetched packs, but gc will clean them up.
>   2. This series doesn't check that there's a new filter in any way, whether
>      configured via config or passed via --filter=. Personally I think that's
>      fine.
>   3. If a user fetches with --refilter applying a more restrictive filter
>      than previously (eg: blob:limit=1m then blob:limit=1k) the eventual
>      state is a no-op, since any referenced object already in the local
>      repository is never removed. Potentially this could be improved in
>      future by more advanced gc, possibly along the lines discussed at [2].
> 

Yes, it would be nice to have a way to efficiently extend a
partial clone with a more inclusive filter.

It would be nice to be able to send the old filter-spec and the
new filter-spec and ask the server to send "new && !old" to keep
from having to resend the objects that the client already has.
But I'm not sure we know enough (on either side) to make that
computation.  And as you say, there is no guarantee that the client
has only used one filter in the past.
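For the simplest case, where both specs are `blob:limit` and the client has only ever used the old one, "new && !old" reduces to a size band. A hypothetical C sketch (the function name is made up, and it assumes `blob:limit=<n>` omits blobs of n bytes or more):

```c
#include <stddef.h>

/* Sketch: would the server need to send this blob when widening from
 * blob:limit=<old_limit> to blob:limit=<new_limit>? Assumes the client
 * holds exactly the blobs the old filter allowed (size < old_limit) and
 * nothing else, so "new && !old" is old_limit <= size < new_limit. */
static int blob_in_widened_band(size_t size, size_t old_limit,
				size_t new_limit)
{
	int matches_new = size < new_limit;
	int matches_old = size < old_limit;

	return matches_new && !matches_old;
}
```

If new_limit <= old_limit the band is empty, which is the narrowing case. Once the client's history mixes several filters, or objects were lazily fetched on demand, what the client has is no longer a function of a single spec, so this shortcut no longer holds.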

Jeff

* Re: [PATCH 2/6] fetch-pack: add partial clone refiltering
  2022-02-04 18:02   ` Jonathan Tan
@ 2022-02-11 14:56     ` Robert Coup
  2022-02-17  0:05       ` Jonathan Tan
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-02-11 14:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitgitgadget, git, stolee, me, christian.couder, johncai86

Hi Jonathan,

Thanks for taking a look at this.

On Fri, 4 Feb 2022 at 18:02, Jonathan Tan <jonathantanmy@google.com> wrote:
>
> The approach that I would have expected is to not call
> mark_complete_and_common_ref(), filter_refs(), everything_local(), and
> find_common(), but your approach here is to ensure that
> mark_complete_and_common_ref() and find_common() do not do anything.

v0: find_common() definitely still does something: during refiltering it skips
checking the local object db, but it's still responsible for sending
the "wants".

filter_refs() is necessary under v0 & v2 so the remote refs all get marked as
matched, otherwise we end up erroring after the transfer with
"error: no such remote ref refs/heads/main" etc.

> Comparing the two approaches, the advantage of yours is that we only
> need to make the change once to support both protocol v0 and v2
> (although the change looks more substantial than just skipping function
> calls), but it makes the code more difficult to read in that we now have
> function calls that do nothing. What do you think about my approach?

My original approach was to leave the negotiator in place and just add
conditionals around it in do_fetch_pack_v2. That worked OK for protocol v2, but
the v0 implementation didn't work. After that I switched to forcing the noop
negotiator in [1/6], which made both implementations match (& is tidier imo).

To make the test pass and skip those calls I need a patch like the below
— filter_refs() is still required during refiltering for the ref-matching. To me
this looks more complicated, but I'm happy to defer to your thinking.

Thanks,

Rob :)


diff --git a/fetch-pack.c b/fetch-pack.c
index dd67044165..870bfba267 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1125,15 +1125,16 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
        negotiator = &negotiator_alloc;
        if (is_refiltering) {
                fetch_negotiator_init_noop(negotiator);
+               filter_refs(args, &ref, sought, nr_sought);
        } else {
                fetch_negotiator_init(r, negotiator);
-       }

-       mark_complete_and_common_ref(negotiator, args, &ref);
-       filter_refs(args, &ref, sought, nr_sought);
-       if (!is_refiltering && everything_local(args, &ref)) {
-               packet_flush(fd[1]);
-               goto all_done;
+               mark_complete_and_common_ref(negotiator, args, &ref);
+               filter_refs(args, &ref, sought, nr_sought);
+               if (everything_local(args, &ref)) {
+                       packet_flush(fd[1]);
+                       goto all_done;
+               }
        }
        if (find_common(negotiator, args, fd, &oid, ref) < 0)
                if (!args->keep_pack)
@@ -1615,13 +1616,18 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
                        if (args->depth > 0 || args->deepen_since || args->deepen_not)
                                args->deepen = 1;

-                       /* Filter 'ref' by 'sought' and those that aren't local */
-                       mark_complete_and_common_ref(negotiator, args, &ref);
-                       filter_refs(args, &ref, sought, nr_sought);
-                       if (!args->refilter && everything_local(args, &ref))
-                               state = FETCH_DONE;
-                       else
+                       if (args->refilter) {
+                               filter_refs(args, &ref, sought, nr_sought);
                                state = FETCH_SEND_REQUEST;
+                       } else {
+                               /* Filter 'ref' by 'sought' and those that aren't local */
+                               mark_complete_and_common_ref(negotiator, args, &ref);
+                               filter_refs(args, &ref, sought, nr_sought);
+                               if (everything_local(args, &ref))
+                                       state = FETCH_DONE;
+                               else
+                                       state = FETCH_SEND_REQUEST;
+                       }

                        mark_tips(negotiator, args->negotiation_tips);
                        for_each_cached_alternate(negotiator,

* Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter
  2022-02-02 15:02   ` Robert Coup
@ 2022-02-16 13:24     ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-16 13:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Robert Coup via GitGitGadget, git, Junio C Hamano, Jonathan Tan

Hi Derrick,

On Wed, 2 Feb 2022 at 15:02, Robert Coup <robert@coup.net.nz> wrote:
> On Tue, 1 Feb 2022 at 20:13, Junio C Hamano <gitster@pobox.com> wrote:
> > "Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >  1. This will produce duplicated objects between the existing and newly
> > >     fetched packs, but gc will clean them up.
> >
> > Hopefully, at the end of this
> > operation, we should garbage collect the duplicated objects by
> > default (with an option to turn it off)?
>
> I haven't explicitly looked into invoking gc yet, but yes, it'd be a bit of
> a waste if it wasn't kicked off by default. Maybe reusing gc.auto

Just looking into this: after a fetch, run_auto_maintenance() is called, which
invokes `git maintenance run --auto` (being the successor to `git gc --auto`)...

In the refilter (repair) case, we'll have a new pack which substantially
duplicates what's in our existing packs. I'm after some advice here: I'm not
sure whether I want to encourage a gc pack consolidation, an incremental
repack, both, neither, or something else?

My current train of thought is that it invokes `git maintenance run --auto`
with gc.autoPackLimit=1 to force a gc pack consolidation (unless it's currently
0, i.e. disabled), and with maintenance.incremental-repack.auto=-1 (unless it's
currently 0, i.e. disabled) to force an incremental repack for anyone who
doesn't want to do pack consolidations.

In all cases the various enabled-ness settings should still be honoured to
skip tasks if the user doesn't want them ever happening automatically.

Implementation-wise I guess we'll need run_auto_maintenance() to be able
to pass some config options through to the maintenance subprocess.
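A rough C sketch of that proposal, building an argument vector for `git -c <name>=<value> maintenance run --auto`. The helper name, the fixed-size array and the way current config values are passed in are all hypothetical; Git itself would use its own strvec/config helpers. A setting is only overridden when the user hasn't disabled it (value 0).

```c
#include <stddef.h>

#define MAX_ARGS 10 /* 8 arguments at most, plus the NULL terminator */

/* Hypothetical helper: fill argv with a `git maintenance run --auto`
 * invocation that strongly encourages repacking after a repair fetch.
 * cur_auto_pack_limit:  the user's current gc.autoPackLimit
 * cur_incr_repack_auto: the user's current maintenance.incremental-repack.auto
 * 0 means "disabled" for both and is honoured by not overriding it.
 * Returns the number of arguments written; argv is NULL-terminated. */
static size_t build_post_repair_maintenance_argv(const char *argv[MAX_ARGS],
						 int cur_auto_pack_limit,
						 int cur_incr_repack_auto)
{
	size_t n = 0;

	argv[n++] = "git";
	if (cur_auto_pack_limit != 0) {
		/* more than one pack is enough to trigger consolidation */
		argv[n++] = "-c";
		argv[n++] = "gc.autoPackLimit=1";
	}
	if (cur_incr_repack_auto != 0) {
		/* a negative value makes the task run on every --auto */
		argv[n++] = "-c";
		argv[n++] = "maintenance.incremental-repack.auto=-1";
	}
	argv[n++] = "maintenance";
	argv[n++] = "run";
	argv[n++] = "--auto";
	argv[n] = NULL;
	return n;
}
```

Passing the overrides with `-c` keeps them scoped to the one maintenance subprocess rather than writing anything to the user's config.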

Does this sound like a reasonable approach?

Thanks,

Rob :)

* Re: [PATCH 2/6] fetch-pack: add partial clone refiltering
  2022-02-11 14:56     ` Robert Coup
@ 2022-02-17  0:05       ` Jonathan Tan
  0 siblings, 0 replies; 76+ messages in thread
From: Jonathan Tan @ 2022-02-17  0:05 UTC (permalink / raw)
  To: Robert Coup
  Cc: Jonathan Tan, gitgitgadget, git, stolee, me, christian.couder, johncai86

Robert Coup <robert@coup.net.nz> writes:
> On Fri, 4 Feb 2022 at 18:02, Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > The approach that I would have expected is to not call
> > mark_complete_and_common_ref(), filter_refs(), everything_local(), and
> > find_common(), but your approach here is to ensure that
> > mark_complete_and_common_ref() and find_common() do not do anything.
> 
> v0: find_common() definitely still does something: during refiltering it skips
> checking the local object db, but it's still responsible for sending
> the "wants".
> 
> filter_refs() is necessary under v0 & v2 so the remote refs all get marked as
> matched, otherwise we end up erroring after the transfer with
> "error: no such remote ref refs/heads/main" etc.
> 
> > Comparing the two approaches, the advantage of yours is that we only
> > need to make the change once to support both protocol v0 and v2
> > (although the change looks more substantial than just skipping function
> > calls), but it makes the code more difficult to read in that we now have
> > function calls that do nothing. What do you think about my approach?
> 
> My original approach was to leave the negotiator in place, and just conditional
> around it in do_fetch_pack_v2 — this worked ok for protocol v2 but the v0
> implementation didn't work. After that I switched to forcing noop in [1/6]
> which made both implementations match (& tidier imo).
> 
> To make the test pass and skip those calls I need a patch like the below
> — filter_refs() is still required during refiltering for the ref-matching. To me
> this looks more complicated, but I'm happy to defer to your thinking.
> 
> Thanks,
> 
> Rob :)

Thanks for investigating this; looks like I was perhaps too optimistic
in thinking that the code could be reorganized in the way I'm
suggesting.

I think that it's worth checking if we could refactor the parts that are
needed with --refilter out of filter_refs() and find_common() (and in
doing so, make these functions do only what their names imply). I
don't have time to do so right now, but I don't want to unnecessarily
block this patch set either, so I'll say that ideally another reviewer
(or you) would do that investigation, but I have no objection to merging
this patch set in this state (other than the change of the argument
name, perhaps to "--repair", and associated documentation).

* [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
                   ` (8 preceding siblings ...)
  2022-02-07 19:37 ` Jeff Hostetler
@ 2022-02-24 16:13 ` Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
                     ` (9 more replies)
  9 siblings, 10 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup

If a filter is changed on a partial clone repository, for example from
blob:none to blob:limit=1m, there is currently no straightforward way to
bulk-refetch the objects that match the new filter for existing local
commits. This is because the client will report commits as "have" during
fetch negotiation and any dependent objects won't be included in the
transferred pack. Another use case is discussed at [1].

This patch series introduces a --repair option to fetch & fetch-pack to
enable doing a full fetch without performing any commit negotiation with the
remote, as for a fresh clone. It builds upon cbe566a071 ("negotiator/noop:
add noop fetch negotiator", 2020-08-18). While a key use case is described
above for partial clones, a user could also use --repair to fix a corrupted
object database by performing a refetch of objects that should already be
present, establishing a better workflow than deleting the local repository
and re-cloning.
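For readers unfamiliar with cbe566a071, here is a simplified sketch of what a "noop" negotiator means. The struct is a cut-down, self-contained stand-in loosely modelled on the shape of Git's fetch negotiator (a table of callbacks); the `toy_*` types and names are illustrative, not Git's headers. The key point is that next() never yields a commit, so no "have" lines are sent and the server treats the fetch like a fresh clone.

```c
#include <stddef.h>

/* Illustrative stand-ins for Git's types. */
struct toy_commit { int id; };
struct toy_oid { unsigned char hash[20]; };

/* Simplified callback table, loosely modelled on a fetch negotiator. */
struct toy_negotiator {
	void (*known_common)(struct toy_negotiator *, struct toy_commit *);
	void (*add_tip)(struct toy_negotiator *, struct toy_commit *);
	const struct toy_oid *(*next)(struct toy_negotiator *);
	int (*ack)(struct toy_negotiator *, struct toy_commit *);
	void (*release)(struct toy_negotiator *);
	void *data;
};

/* Every noop callback ignores its input... */
static void noop_known_common(struct toy_negotiator *n, struct toy_commit *c)
{
	(void)n; (void)c;
}

static void noop_add_tip(struct toy_negotiator *n, struct toy_commit *c)
{
	(void)n; (void)c;
}

/* ...and next() has nothing to offer, so no "have" is ever sent. */
static const struct toy_oid *noop_next(struct toy_negotiator *n)
{
	(void)n;
	return NULL;
}

/* Should never be reached in practice: nothing was offered to ack. */
static int noop_ack(struct toy_negotiator *n, struct toy_commit *c)
{
	(void)n; (void)c;
	return 0;
}

static void noop_release(struct toy_negotiator *n)
{
	(void)n;
}

static void toy_negotiator_init_noop(struct toy_negotiator *n)
{
	n->known_common = noop_known_common;
	n->add_tip = noop_add_tip;
	n->next = noop_next;
	n->ack = noop_ack;
	n->release = noop_release;
	n->data = NULL;
}

/* Count the "have" lines a negotiator would emit (capped, for safety). */
static int toy_count_haves(struct toy_negotiator *n, int cap)
{
	int count = 0;

	while (count < cap && n->next(n))
		count++;
	return count;
}
```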

 * Using --repair will produce duplicated objects between the existing and
   newly fetched packs, but maintenance will clean them up when it runs
   automatically post-fetch (if enabled).
 * If a user fetches with --repair applying a more restrictive partial clone
   filter than previously (eg: blob:limit=1m then blob:limit=1k) the
   eventual state is a no-op, since any referenced object already in the
   local repository is never removed. More advanced repacking which could
   improve this scenario is currently proposed at [2].

[1]
https://public-inbox.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2]
https://public-inbox.org/git/21ED346B-A906-4905-B061-EDE53691C586@gmail.com/

Changes since RFC (v1):

 * Changed the name from "refilter" to "repair"
 * Removed dependency between server-side support for filtering and repair
 * Added a test case for a shallow clone
 * Post-fetch auto maintenance now strongly encourages
   repacking/consolidation

Robert Coup (8):
  fetch-negotiator: add specific noop initializor
  fetch-pack: add repairing
  builtin/fetch-pack: add --repair option
  fetch: add --repair option
  t5615-partial-clone: add test for fetch --repair
  maintenance: add ability to pass config options
  fetch: after repair, encourage auto gc repacking
  doc/partial-clone: mention --repair fetch option

 Documentation/fetch-options.txt           | 10 +++++
 Documentation/git-fetch-pack.txt          |  4 ++
 Documentation/technical/partial-clone.txt |  3 ++
 builtin/am.c                              |  2 +-
 builtin/commit.c                          |  2 +-
 builtin/fetch-pack.c                      |  4 ++
 builtin/fetch.c                           | 38 ++++++++++++++--
 builtin/merge.c                           |  2 +-
 builtin/rebase.c                          |  2 +-
 fetch-negotiator.c                        |  5 +++
 fetch-negotiator.h                        |  8 ++++
 fetch-pack.c                              | 50 +++++++++++++--------
 fetch-pack.h                              |  1 +
 remote-curl.c                             |  6 +++
 run-command.c                             |  8 +++-
 run-command.h                             |  5 ++-
 t/t5616-partial-clone.sh                  | 54 ++++++++++++++++++++++-
 transport-helper.c                        |  3 ++
 transport.c                               |  4 ++
 transport.h                               |  4 ++
 20 files changed, 186 insertions(+), 29 deletions(-)


base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1138%2Frcoup%2Frc-partial-clone-refilter-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1138/rcoup/rc-partial-clone-refilter-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1138

Range-diff vs v1:

 1:  ca50ce77ae4 ! 1:  d146d8aaaaf fetch-negotiator: add specific noop initializor
     @@ Commit message
      
          Add a specific initializor for the noop fetch negotiator. This is
          introduced to support allowing partial clones to skip commit negotiation
     -    when refetching to apply a modified filter.
     +    when fetching to repair or apply a modified filter.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
 2:  a3c1999936d ! 2:  2d817a65db5 fetch-pack: add partial clone refiltering
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch-pack: add partial clone refiltering
     +    fetch-pack: add repairing
      
     -    Allow refetching with a new partial clone filter by not attempting to
     -    find or negotiate common commits with the remote, and always forcing
     -    a full filtered fetch.
     +    Allow a 'repair fetch' where the contents of the local object store are
     +    ignored and a full fetch is performed, not attempting to find or
     +    negotiate common commits with the remote.
     +
     +    A key use case is to apply a new partial clone blob/tree filter and
     +    refetch all the associated matching content, which would otherwise not
     +    be transferred when the commit objects are already present locally.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ fetch-pack.c: static int find_common(struct fetch_negotiator *negotiator,
      -		if (((o = lookup_object(the_repository, remote)) != NULL) &&
      -				(o->flags & COMPLETE)) {
      -			continue;
     -+		if (!args->refilter) {
     ++		if (!args->repair) {
      +			/*
      +			* If that object is complete (i.e. it is an ancestor of a
      +			* local ref), we tell them we have it but do not have to
     @@ fetch-pack.c: static void mark_complete_and_common_ref(struct fetch_negotiator *
       
       	save_commit_buffer = 0;
       
     -+	if (args->refilter)
     ++	if (args->repair)
      +		return;
      +
       	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
       	for (ref = *refs; ref; ref = ref->next) {
     - 		struct object *o;
     + 		struct commit *commit;
      @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     - 	int agent_len;
       	struct fetch_negotiator negotiator_alloc;
       	struct fetch_negotiator *negotiator;
     --
     + 
      -	negotiator = &negotiator_alloc;
      -	fetch_negotiator_init(r, negotiator);
     -+	unsigned is_refiltering = 0;
     - 
     +-
       	sort_ref_list(&ref, ref_compare_name);
       	QSORT(sought, nr_sought, cmp_ref_by_name);
     -@@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     - 	if (server_supports("filter")) {
     - 		server_supports_filtering = 1;
     - 		print_verbose(args, _("Server supports %s"), "filter");
     --	} else if (args->filter_options.choice) {
     -+	} else if (args->filter_options.choice || args->refilter) {
     - 		warning("filtering not recognized by server, ignoring");
     - 	}
       
     -+	if (server_supports_filtering && args->refilter) {
     -+		is_refiltering = 1;
     -+	}
     -+
     - 	if (server_supports("deepen-since")) {
     - 		print_verbose(args, _("Server supports %s"), "deepen-since");
     - 		deepen_since_ok = 1;
      @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
       	if (!server_supports_hash(the_hash_algo->name, NULL))
       		die(_("Server does not support this repository's object format"));
       
      +	negotiator = &negotiator_alloc;
     -+	if (is_refiltering) {
     ++	if (args->repair) {
      +		fetch_negotiator_init_noop(negotiator);
      +	} else {
      +		fetch_negotiator_init(r, negotiator);
     @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
       	mark_complete_and_common_ref(negotiator, args, &ref);
       	filter_refs(args, &ref, sought, nr_sought);
      -	if (everything_local(args, &ref)) {
     -+	if (!is_refiltering && everything_local(args, &ref)) {
     ++	if (!args->repair && everything_local(args, &ref)) {
       		packet_flush(fd[1]);
       		goto all_done;
       	}
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
       
       	negotiator = &negotiator_alloc;
      -	fetch_negotiator_init(r, negotiator);
     -+	if (args->refilter)
     ++	if (args->repair)
      +		fetch_negotiator_init_noop(negotiator);
      +	else
      +		fetch_negotiator_init(r, negotiator);
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
       			mark_complete_and_common_ref(negotiator, args, &ref);
       			filter_refs(args, &ref, sought, nr_sought);
      -			if (everything_local(args, &ref))
     -+			if (!args->refilter && everything_local(args, &ref))
     ++			if (!args->repair && everything_local(args, &ref))
       				state = FETCH_DONE;
       			else
       				state = FETCH_SEND_REQUEST;
     @@ fetch-pack.h: struct fetch_pack_args {
       	unsigned update_shallow:1;
       	unsigned reject_shallow_remote:1;
       	unsigned deepen:1;
     -+	unsigned refilter:1;
     ++	unsigned repair:1;
       
       	/*
       	 * Indicate that the remote of this request is a promisor remote. The
 3:  d7f4b6c8052 ! 3:  a42d40ac294 builtin/fetch-pack: add --refilter option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    builtin/fetch-pack: add --refilter option
     +    builtin/fetch-pack: add --repair option
      
     -    Add a --refilter option to fetch-pack to force a full fetch when
     -    applying a new filter.
     +    Add a --repair option to fetch-pack to force a full fetch. Use when
     +    applying a new partial clone filter to refetch all matching objects.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ Documentation/git-fetch-pack.txt: be in a separate packet, and the list must end
       	current shallow boundary instead of from the tip of each
       	remote branch history.
       
     -+--refilter::
     -+	Skips negotiating commits with the server in order to reapply a new
     -+	partial clone filter and fetch all matching objects.
     ++--repair::
     ++	Skips negotiating commits with the server in order to fetch all matching
     ++	objects. Use to reapply a new partial clone blob/tree filter.
      +
       --no-progress::
       	Do not show the progress.
     @@ builtin/fetch-pack.c: int cmd_fetch_pack(int argc, const char **argv, const char
       			args.from_promisor = 1;
       			continue;
       		}
     -+		if (!strcmp("--refilter", arg)) {
     -+			args.refilter = 1;
     ++		if (!strcmp("--repair", arg)) {
     ++			args.repair = 1;
      +			continue;
      +		}
       		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
     @@ remote-curl.c: struct options {
       		/* see documentation of corresponding flag in fetch-pack.h */
       		from_promisor : 1,
       
     -+		refilter : 1,
     ++		repair : 1,
       		atomic : 1,
       		object_format : 1,
       		force_if_includes : 1;
     @@ remote-curl.c: static int set_option(const char *name, const char *value)
       	} else if (!strcmp(name, "from-promisor")) {
       		options.from_promisor = 1;
       		return 0;
     -+	} else if (!strcmp(name, "refilter")) {
     -+		options.refilter = 1;
     ++	} else if (!strcmp(name, "repair")) {
     ++		options.repair = 1;
      +		return 0;
       	} else if (!strcmp(name, "filter")) {
       		options.filter = xstrdup(value);
     @@ remote-curl.c: static int fetch_git(struct discovery *heads,
       		strvec_push(&args, "--deepen-relative");
       	if (options.from_promisor)
       		strvec_push(&args, "--from-promisor");
     -+	if (options.refilter)
     -+		strvec_push(&args, "--refilter");
     ++	if (options.repair)
     ++		strvec_push(&args, "--repair");
       	if (options.filter)
       		strvec_pushf(&args, "--filter=%s", options.filter);
       	strvec_push(&args, url.buf);
 4:  3e102724196 ! 4:  79c409d0542 fetch: add --refilter option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch: add --refilter option
     +    fetch: add --repair option
      
     -    Teach fetch and transports the --refilter option to force a full fetch
     -    when applying a new filter.
     +    Teach fetch and transports the --repair option to force a full fetch
     +    without negotiating common commits with the remote. Use when applying a
     +    new partial clone filter to refetch all matching objects.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ Documentation/fetch-options.txt: endif::git-pull[]
       	setting. See linkgit:git-config[1].
       
      +ifndef::git-pull[]
     -+--refilter::
     -+	Reapply a partial clone filter from configuration or `--filter=`, such
     -+	as when the filter definition has changed. Instead of negotiating with
     -+	the server to avoid transferring commits and associated objects that are
     -+	already present locally, this option fetches all objects that match the
     -+	filter.
     ++--repair::
     ++	Instead of negotiating with the server to avoid transferring commits and
     ++	associated objects that are already present locally, this option fetches
     ++	all objects as a fresh clone would. Use this to reapply a partial clone
     ++	filter from configuration or using `--filter=` when the filter
     ++	definition has changed.
      +endif::git-pull[]
      +
       --refmap=<refspec>::
     @@ builtin/fetch.c: static int prune_tags = -1; /* unspecified */
       static int all, append, dry_run, force, keep, multiple, update_head_ok;
       static int write_fetch_head = 1;
      -static int verbosity, deepen_relative, set_upstream;
     -+static int verbosity, deepen_relative, set_upstream, refilter;
     ++static int verbosity, deepen_relative, set_upstream, repair;
       static int progress = -1;
       static int enable_auto_gc = 1;
       static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
     @@ builtin/fetch.c: static struct option builtin_fetch_options[] = {
       	OPT_SET_INT_F(0, "unshallow", &unshallow,
       		      N_("convert to a complete repository"),
       		      1, PARSE_OPT_NONEG),
     -+	OPT_SET_INT_F(0, "refilter", &refilter,
     -+		      N_("re-fetch with a modified filter"),
     ++	OPT_SET_INT_F(0, "repair", &repair,
     ++		      N_("re-fetch without negotiating common commits"),
      +		      1, PARSE_OPT_NONEG),
       	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
       		   N_("prepend this to submodule path output"), PARSE_OPT_HIDDEN },
     @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
       		return -1;
       
      +	/*
     -+	 * Similarly, if we need to refilter a partial clone we already have
     -+	 * these commits reachable.  Running rev-list here will return with
     -+	 * a good (0) exit status and we'll bypass the fetch that we
     -+	 * really need to perform.  Claiming failure now will ensure
     -+	 * we perform the network exchange to reapply the filter.
     ++	 * Similarly, if we need to repair, we always want to perform a full
     ++	 * fetch ignoring existing objects.
      +	 */
     -+	if (refilter)
     ++	if (repair)
      +		return -1;
      +
      +
     @@ builtin/fetch.c: static struct transport *prepare_transport(struct remote *remot
       		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
       	if (update_shallow)
       		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
     -+	if (refilter)
     -+		set_option(transport, TRANS_OPT_REFILTER, "yes");
     ++	if (repair)
     ++		set_option(transport, TRANS_OPT_REPAIR, "yes");
       	if (filter_options.choice) {
       		const char *spec =
       			expand_list_objects_filter_spec(&filter_options);
     @@ transport-helper.c: static int fetch_refs(struct transport *transport,
       	if (data->transport_options.update_shallow)
       		set_helper_option(transport, "update-shallow", "true");
       
     -+	if (data->transport_options.refilter)
     -+		set_helper_option(transport, "refilter", "true");
     ++	if (data->transport_options.repair)
     ++		set_helper_option(transport, "repair", "true");
      +
       	if (data->transport_options.filter_options.choice) {
       		const char *spec = expand_list_objects_filter_spec(
     @@ transport.c: static int set_git_option(struct git_transport_options *opts,
       		list_objects_filter_die_if_populated(&opts->filter_options);
       		parse_list_objects_filter(&opts->filter_options, value);
       		return 0;
     -+	} else if (!strcmp(name, TRANS_OPT_REFILTER)) {
     -+		opts->refilter = !!value;
     ++	} else if (!strcmp(name, TRANS_OPT_REPAIR)) {
     ++		opts->repair = !!value;
      +		return 0;
       	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
       		opts->reject_shallow = !!value;
     @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
       	args.update_shallow = data->options.update_shallow;
       	args.from_promisor = data->options.from_promisor;
       	args.filter_options = data->options.filter_options;
     -+	args.refilter = data->options.refilter;
     ++	args.repair = data->options.repair;
       	args.stateless_rpc = transport->stateless_rpc;
       	args.server_options = transport->server_options;
       	args.negotiation_tips = data->options.negotiation_tips;
     @@ transport.h: struct git_transport_options {
       	unsigned update_shallow : 1;
       	unsigned reject_shallow : 1;
       	unsigned deepen_relative : 1;
     -+	unsigned refilter : 1;
     ++	unsigned repair : 1;
       
       	/* see documentation of corresponding flag in fetch-pack.h */
       	unsigned from_promisor : 1;
     @@ transport.h: void transport_check_allowed(const char *type);
       /* Filter objects for partial clone and fetch */
       #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
       
     -+/* Refilter a previous partial clone */
     -+#define TRANS_OPT_REFILTER "refilter"
     ++/* Refetch all objects without negotiating */
     ++#define TRANS_OPT_REPAIR "repair"
      +
       /* Request atomic (all-or-nothing) updates when pushing */
       #define TRANS_OPT_ATOMIC "atomic"
 5:  66c18850aad ! 5:  38af2bbee79 t5615-partial-clone: add test for --refilter
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    t5615-partial-clone: add test for --refilter
     +    t5615-partial-clone: add test for fetch --repair
      
     -    Add a test for partial clone refiltering under protocol v0 and v2.
     +    Add a test for doing a repair fetch to apply a changed partial clone
     +    filter under protocol v0 and v2.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ t/t5616-partial-clone.sh: test_expect_success 'manual prefetch of missing object
      +	git -C src push -u srv main
      +'
      +
     -+# Do partial fetch to fetch smaller files; then verify that without --refilter
     ++# Do partial fetch to fetch smaller files; then verify that without --repair
      +# applying a new filter does not refetch missing large objects. Then use
     -+# --refilter to apply the new filter on existing commits. Test it under both
     ++# --repair to apply the new filter on existing commits. Test it under both
      +# protocol v2 & v0.
     -+test_expect_success 'apply a different filter using --refilter' '
     ++test_expect_success 'apply a different filter using --repair' '
      +	git -C pc1 fetch --filter=blob:limit=999 origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 4 observed &&
      +
     -+	git -C pc1 fetch --filter=blob:limit=19999 --refilter origin &&
     ++	git -C pc1 fetch --filter=blob:limit=19999 --repair origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 2 observed &&
      +
      +	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
     -+		--refilter origin &&
     ++		--repair origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 0 observed
      +'
     ++
     ++test_expect_success 'fetch --repair works with a shallow clone' '
     ++	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
     ++	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
     ++	test_line_count = 6 observed &&
     ++
     ++	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --repair origin &&
     ++	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
     ++	test_line_count = 6 observed
     ++'
      +
       test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
       	test_create_repo submodule &&
 -:  ----------- > 6:  cfa6dca8ef4 maintenance: add ability to pass config options
 -:  ----------- > 7:  2338c15249a fetch: after repair, encourage auto gc repacking
 6:  6c4d4260bfc ! 8:  20942562a66 doc/partial-clone: mention --refilter option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    doc/partial-clone: mention --refilter option
     +    doc/partial-clone: mention --repair fetch option
      
     -    Add the fetch --refilter option to the partial clone documentation.
     +    Document it for partial clones as a means to refetch with a new filter.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ Documentation/technical/partial-clone.txt: Fetching Missing Objects
         currently fetches all objects referred to by the requested objects, even
         though they are not necessary.
       
     -+- Fetching with `--refilter` will request a complete new filtered packfile from
     ++- Fetching with `--repair` will request a complete new filtered packfile from
      +  the remote, which can be used to change a filter without needing to
      +  dynamically fetch missing objects.
       

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 1/8] fetch-negotiator: add specific noop initializor
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-25  6:19     ` Junio C Hamano
  2022-02-24 16:13   ` [PATCH v2 2/8] fetch-pack: add repairing Robert Coup via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a dedicated initializer for the noop fetch negotiator. This allows
partial clones to skip commit negotiation when fetching to repair the
repository or to apply a modified filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-negotiator.c | 5 +++++
 fetch-negotiator.h | 8 ++++++++
 2 files changed, 13 insertions(+)
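
For readers less familiar with the negotiator interface, here is a minimal, self-contained sketch of what a "noop" negotiator amounts to: a table of callbacks that ignore every hint about local commits and never propose a "have". The types and function names below are simplified, hypothetical stand-ins, not git's actual `struct fetch_negotiator` (in git, for instance, `next` yields an object ID rather than a commit).

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; git's real types differ. */
struct commit;

struct negotiator {
	void (*known_common)(struct negotiator *, struct commit *);
	void (*add_tip)(struct negotiator *, struct commit *);
	struct commit *(*next)(struct negotiator *);
	int (*ack)(struct negotiator *, struct commit *);
	void (*release)(struct negotiator *);
	void *data;
};

/* Local commits passed in as hints are simply ignored. */
static void noop_known_common(struct negotiator *n, struct commit *c)
{
	(void)n;
	(void)c;
}

static void noop_add_tip(struct negotiator *n, struct commit *c)
{
	(void)n;
	(void)c;
}

/* Never propose a commit, so no "have" lines are sent. */
static struct commit *noop_next(struct negotiator *n)
{
	(void)n;
	return NULL;
}

/* Server ACKs carry no information for this negotiator. */
static int noop_ack(struct negotiator *n, struct commit *c)
{
	(void)n;
	(void)c;
	return 0;
}

static void noop_release(struct negotiator *n)
{
	(void)n;
}

static void negotiator_init_noop(struct negotiator *n)
{
	n->known_common = noop_known_common;
	n->add_tip = noop_add_tip;
	n->next = noop_next;
	n->ack = noop_ack;
	n->release = noop_release;
	n->data = NULL;
}

/* Count the "have" candidates a negotiator yields, stopping at max. */
static int count_haves(struct negotiator *n, int max)
{
	int count = 0;

	while (count < max && n->next(n))
		count++;
	return count;
}
```

Because `next` always returns NULL, the client advertises nothing it has, and the server has to assume the client has no objects in common with it.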

diff --git a/fetch-negotiator.c b/fetch-negotiator.c
index 874797d767b..be383367f55 100644
--- a/fetch-negotiator.c
+++ b/fetch-negotiator.c
@@ -23,3 +23,8 @@ void fetch_negotiator_init(struct repository *r,
 		return;
 	}
 }
+
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
+{
+	noop_negotiator_init(negotiator);
+}
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
index ea78868504b..e348905a1f0 100644
--- a/fetch-negotiator.h
+++ b/fetch-negotiator.h
@@ -53,7 +53,15 @@ struct fetch_negotiator {
 	void *data;
 };
 
+/*
+ * Initialize a negotiator based on the repository settings.
+ */
 void fetch_negotiator_init(struct repository *r,
 			   struct fetch_negotiator *negotiator);
 
+/*
+ * Initialize a noop negotiator.
+ */
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 2/8] fetch-pack: add repairing
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-25  6:46     ` Junio C Hamano
  2022-02-24 16:13   ` [PATCH v2 3/8] builtin/fetch-pack: add --repair option Robert Coup via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Allow a 'repair fetch' in which the contents of the local object store
are ignored and a full fetch is performed without attempting to find or
negotiate common commits with the remote.

A key use case is to apply a new partial clone blob/tree filter and
refetch all the associated matching content, which would otherwise not
be transferred when the commit objects are already present locally.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-pack.c | 50 +++++++++++++++++++++++++++++++-------------------
 fetch-pack.h |  1 +
 2 files changed, 32 insertions(+), 19 deletions(-)
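
The find_common() hunk below changes which refs end up as "want" lines. A rough, hypothetical model of that selection follows: normally a ref whose tip is already COMPLETE locally is skipped, while a repair fetch wants every ref. The `wanted_ref` struct and `collect_wants()` are illustrative names only; the COMPLETE flag value here is an assumption and does not match git's.

```c
#include <stddef.h>

/* Hypothetical flag: tip is reachable from a local ref and fully scanned. */
#define COMPLETE (1u << 0)

struct wanted_ref {
	const char *name;
	unsigned flags;
};

/*
 * Collect the refs that would be sent as "want" lines into out[]
 * (which must hold at least nr entries) and return how many there are.
 * Without repair, refs already COMPLETE locally are skipped; with repair
 * every ref is wanted so the server builds a full pack.
 */
static size_t collect_wants(const struct wanted_ref *refs, size_t nr,
			    int repair, const char **out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!repair && (refs[i].flags & COMPLETE))
			continue;
		out[n++] = refs[i].name;
	}
	return n;
}
```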

diff --git a/fetch-pack.c b/fetch-pack.c
index 87657907e78..8103243947a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
 		const char *remote_hex;
 		struct object *o;
 
-		/*
-		 * If that object is complete (i.e. it is an ancestor of a
-		 * local ref), we tell them we have it but do not have to
-		 * tell them about its ancestors, which they already know
-		 * about.
-		 *
-		 * We use lookup_object here because we are only
-		 * interested in the case we *know* the object is
-		 * reachable and we have already scanned it.
-		 */
-		if (((o = lookup_object(the_repository, remote)) != NULL) &&
-				(o->flags & COMPLETE)) {
-			continue;
+		if (!args->repair) {
+			/*
+			 * If that object is complete (i.e. it is an ancestor of a
+			 * local ref), we tell them we have it but do not have to
+			 * tell them about its ancestors, which they already know
+			 * about.
+			 *
+			 * We use lookup_object here because we are only
+			 * interested in the case we *know* the object is
+			 * reachable and we have already scanned it.
+			 */
+			if (((o = lookup_object(the_repository, remote)) != NULL) &&
+					(o->flags & COMPLETE)) {
+				continue;
+			}
 		}
 
 		remote_hex = oid_to_hex(remote);
@@ -694,6 +696,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	save_commit_buffer = 0;
 
+	if (args->repair)
+		return;
+
 	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
 	for (ref = *refs; ref; ref = ref->next) {
 		struct commit *commit;
@@ -1027,9 +1032,6 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 
-	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
-
 	sort_ref_list(&ref, ref_compare_name);
 	QSORT(sought, nr_sought, cmp_ref_by_name);
 
@@ -1119,9 +1121,16 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	if (!server_supports_hash(the_hash_algo->name, NULL))
 		die(_("Server does not support this repository's object format"));
 
+	negotiator = &negotiator_alloc;
+	if (args->repair) {
+		fetch_negotiator_init_noop(negotiator);
+	} else {
+		fetch_negotiator_init(r, negotiator);
+	}
+
 	mark_complete_and_common_ref(negotiator, args, &ref);
 	filter_refs(args, &ref, sought, nr_sought);
-	if (everything_local(args, &ref)) {
+	if (!args->repair && everything_local(args, &ref)) {
 		packet_flush(fd[1]);
 		goto all_done;
 	}
@@ -1587,7 +1596,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->repair)
+		fetch_negotiator_init_noop(negotiator);
+	else
+		fetch_negotiator_init(r, negotiator);
 
 	packet_reader_init(&reader, fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -1613,7 +1625,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Filter 'ref' by 'sought' and those that aren't local */
 			mark_complete_and_common_ref(negotiator, args, &ref);
 			filter_refs(args, &ref, sought, nr_sought);
-			if (everything_local(args, &ref))
+			if (!args->repair && everything_local(args, &ref))
 				state = FETCH_DONE;
 			else
 				state = FETCH_SEND_REQUEST;
diff --git a/fetch-pack.h b/fetch-pack.h
index 7f94a2a5831..bbb663edda8 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -42,6 +42,7 @@ struct fetch_pack_args {
 	unsigned update_shallow:1;
 	unsigned reject_shallow_remote:1;
 	unsigned deepen:1;
+	unsigned repair:1;
 
 	/*
 	 * Indicate that the remote of this request is a promisor remote. The
-- 
gitgitgadget
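
The two decisions this patch adds to do_fetch_pack() and do_fetch_pack_v2() can be summarised as a small sketch: pick the noop negotiator when repairing, and never take the "everything is already local" early exit. The enum labels and function names are hypothetical simplifications. Note that in the patch `!args->repair && everything_local(...)` short-circuits, so everything_local() is not even called during a repair; here its result is simply passed in.

```c
/* Hypothetical labels, loosely modelled on fetch-pack.c. */
enum negotiator_kind { NEGOTIATOR_DEFAULT, NEGOTIATOR_NOOP };
enum fetch_state { FETCH_SEND_REQUEST, FETCH_DONE };

/* Repairing always uses the noop negotiator, so no "have" lines are sent. */
static enum negotiator_kind choose_negotiator(int repair)
{
	return repair ? NEGOTIATOR_NOOP : NEGOTIATOR_DEFAULT;
}

/*
 * Normally a fetch stops early when every wanted ref is already local.
 * A repair fetch ignores that shortcut and always sends a request.
 */
static enum fetch_state state_after_ref_filtering(int repair,
						  int everything_local)
{
	if (!repair && everything_local)
		return FETCH_DONE;
	return FETCH_SEND_REQUEST;
}
```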


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 3/8] builtin/fetch-pack: add --repair option
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 2/8] fetch-pack: add repairing Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 4/8] fetch: " Robert Coup via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a --repair option to fetch-pack to force a full fetch. Use when
applying a new partial clone filter to refetch all matching objects.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/git-fetch-pack.txt | 4 ++++
 builtin/fetch-pack.c             | 4 ++++
 remote-curl.c                    | 6 ++++++
 3 files changed, 14 insertions(+)
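
To illustrate how the option travels through remote-curl, here is a minimal sketch: the helper records a boolean option named "repair", then turns the recorded flags into fetch-pack arguments. `struct helper_options`, `helper_set_option()` and `build_fetch_pack_argv()` are hypothetical, simplified stand-ins for the `set_option()` and `fetch_git()` code touched below; returning 1 for unknown names is an assumption made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for remote-curl's option bitfield. */
struct helper_options {
	unsigned from_promisor : 1;
	unsigned repair : 1;
};

/* Record a boolean option received over the helper protocol. */
static int helper_set_option(struct helper_options *opts, const char *name)
{
	if (!strcmp(name, "from-promisor")) {
		opts->from_promisor = 1;
		return 0;
	} else if (!strcmp(name, "repair")) {
		opts->repair = 1;
		return 0;
	}
	return 1; /* unrecognized option */
}

/* Translate recorded options into fetch-pack arguments (at most max). */
static size_t build_fetch_pack_argv(const struct helper_options *opts,
				    const char **argv, size_t max)
{
	size_t argc = 0;

	if (argc < max)
		argv[argc++] = "fetch-pack";
	if (opts->from_promisor && argc < max)
		argv[argc++] = "--from-promisor";
	if (opts->repair && argc < max)
		argv[argc++] = "--repair";
	return argc;
}
```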

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index c9758847937..b9276813fa1 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -101,6 +101,10 @@ be in a separate packet, and the list must end with a flush packet.
 	current shallow boundary instead of from the tip of each
 	remote branch history.
 
+--repair::
+	Skips negotiating commits with the server in order to fetch all matching
+	objects. Use to reapply a new partial clone blob/tree filter.
+
 --no-progress::
 	Do not show the progress.
 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c2d96f4c89a..60c0082d52b 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.from_promisor = 1;
 			continue;
 		}
+		if (!strcmp("--repair", arg)) {
+			args.repair = 1;
+			continue;
+		}
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
diff --git a/remote-curl.c b/remote-curl.c
index 0dabef2dd7c..941ed627db5 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -43,6 +43,7 @@ struct options {
 		/* see documentation of corresponding flag in fetch-pack.h */
 		from_promisor : 1,
 
+		repair : 1,
 		atomic : 1,
 		object_format : 1,
 		force_if_includes : 1;
@@ -198,6 +199,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "from-promisor")) {
 		options.from_promisor = 1;
 		return 0;
+	} else if (!strcmp(name, "repair")) {
+		options.repair = 1;
+		return 0;
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
@@ -1182,6 +1186,8 @@ static int fetch_git(struct discovery *heads,
 		strvec_push(&args, "--deepen-relative");
 	if (options.from_promisor)
 		strvec_push(&args, "--from-promisor");
+	if (options.repair)
+		strvec_push(&args, "--repair");
 	if (options.filter)
 		strvec_pushf(&args, "--filter=%s", options.filter);
 	strvec_push(&args, url.buf);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 4/8] fetch: add --repair option
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 3/8] builtin/fetch-pack: add --repair option Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 5/8] t5615-partial-clone: add test for fetch --repair Robert Coup via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Teach fetch and transports the --repair option to force a full fetch
without negotiating common commits with the remote. Use when applying a
new partial clone filter to refetch all matching objects.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  9 +++++++++
 builtin/fetch.c                 | 15 ++++++++++++++-
 transport-helper.c              |  3 +++
 transport.c                     |  4 ++++
 transport.h                     |  4 ++++
 5 files changed, 34 insertions(+), 1 deletion(-)
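
The check_exist_and_connected() hunk relies on the caller only contacting the remote when that function returns nonzero. A hypothetical reduction of that logic is sketched below: a return of 0 means every ref tip is already present and connected, so the network fetch would be skipped, and repair (like deepen) forces a nonzero result. The function names are illustrative only.

```c
/*
 * Hypothetical reduction of check_exist_and_connected() in builtin/fetch.c.
 * 0 means the tips are present and connected, so the fetch can be
 * skipped; a negative value makes the caller perform the network fetch.
 */
static int quickfetch_check(int deepen, int repair, int tips_connected)
{
	/* Deepening needs objects beyond the current shallow boundary. */
	if (deepen)
		return -1;
	/* Repairing must refetch even though the tips exist locally. */
	if (repair)
		return -1;
	return tips_connected ? 0 : -1;
}

static int needs_network_fetch(int deepen, int repair, int tips_connected)
{
	return quickfetch_check(deepen, repair, tips_connected) != 0;
}
```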

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index f9036831898..1131aaad252 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -163,6 +163,15 @@ endif::git-pull[]
 	behavior for a remote may be specified with the remote.<name>.tagOpt
 	setting. See linkgit:git-config[1].
 
+ifndef::git-pull[]
+--repair::
+	Instead of negotiating with the server to avoid transferring commits and
+	associated objects that are already present locally, this option fetches
+	all objects as a fresh clone would. Use this to reapply a partial clone
+	filter from configuration or using `--filter=` when the filter
+	definition has changed.
+endif::git-pull[]
+
 --refmap=<refspec>::
 	When fetching refs listed on the command line, use the
 	specified refspec (can be given more than once) to map the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 79ee9591859..8e5e590dd6e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -59,7 +59,7 @@ static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok;
 static int write_fetch_head = 1;
-static int verbosity, deepen_relative, set_upstream;
+static int verbosity, deepen_relative, set_upstream, repair;
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
@@ -190,6 +190,9 @@ static struct option builtin_fetch_options[] = {
 	OPT_SET_INT_F(0, "unshallow", &unshallow,
 		      N_("convert to a complete repository"),
 		      1, PARSE_OPT_NONEG),
+	OPT_SET_INT_F(0, "repair", &repair,
+		      N_("re-fetch without negotiating common commits"),
+		      1, PARSE_OPT_NONEG),
 	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
 		   N_("prepend this to submodule path output"), PARSE_OPT_HIDDEN },
 	OPT_CALLBACK_F(0, "recurse-submodules-default",
@@ -1296,6 +1299,14 @@ static int check_exist_and_connected(struct ref *ref_map)
 	if (deepen)
 		return -1;
 
+	/*
+	 * Similarly, if we need to repair, we always want to perform a full
+	 * fetch ignoring existing objects.
+	 */
+	if (repair)
+		return -1;
+
+
 	/*
 	 * check_connected() allows objects to merely be promised, but
 	 * we need all direct targets to exist.
@@ -1492,6 +1503,8 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (repair)
+		set_option(transport, TRANS_OPT_REPAIR, "yes");
 	if (filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&filter_options);
diff --git a/transport-helper.c b/transport-helper.c
index a0297b0986c..a16a9626421 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -715,6 +715,9 @@ static int fetch_refs(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.repair)
+		set_helper_option(transport, "repair", "true");
+
 	if (data->transport_options.filter_options.choice) {
 		const char *spec = expand_list_objects_filter_spec(
 			&data->transport_options.filter_options);
diff --git a/transport.c b/transport.c
index 253d6671b1f..e9c9c35b8e5 100644
--- a/transport.c
+++ b/transport.c
@@ -243,6 +243,9 @@ static int set_git_option(struct git_transport_options *opts,
 		list_objects_filter_die_if_populated(&opts->filter_options);
 		parse_list_objects_filter(&opts->filter_options, value);
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_REPAIR)) {
+		opts->repair = !!value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
 		opts->reject_shallow = !!value;
 		return 0;
@@ -377,6 +380,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.filter_options = data->options.filter_options;
+	args.repair = data->options.repair;
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
diff --git a/transport.h b/transport.h
index a0bc6a1e9eb..f3621e8b43c 100644
--- a/transport.h
+++ b/transport.h
@@ -16,6 +16,7 @@ struct git_transport_options {
 	unsigned update_shallow : 1;
 	unsigned reject_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned repair : 1;
 
 	/* see documentation of corresponding flag in fetch-pack.h */
 	unsigned from_promisor : 1;
@@ -216,6 +217,9 @@ void transport_check_allowed(const char *type);
 /* Filter objects for partial clone and fetch */
 #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
 
+/* Refetch all objects without negotiating */
+#define TRANS_OPT_REPAIR "repair"
+
 /* Request atomic (all-or-nothing) updates when pushing */
 #define TRANS_OPT_ATOMIC "atomic"
 
-- 
gitgitgadget
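
For the in-process (non-helper) transport, the patch threads the flag from `set_option(transport, TRANS_OPT_REPAIR, "yes")` into the transport options and then into the fetch-pack arguments. A minimal sketch of that chain, using hypothetical simplified structs in place of `struct git_transport_options` and `struct fetch_pack_args`, and assuming unknown option names return 1:

```c
#include <stddef.h>
#include <string.h>

/* Same option string as the TRANS_OPT_REPAIR define in the patch. */
#define TRANS_OPT_REPAIR "repair"

/* Hypothetical, simplified stand-ins for the real option structs. */
struct transport_opts {
	unsigned repair : 1;
};

struct pack_args {
	unsigned repair : 1;
};

static int set_transport_option(struct transport_opts *opts,
				const char *name, const char *value)
{
	if (!strcmp(name, TRANS_OPT_REPAIR)) {
		/* Any non-NULL value (e.g. "yes") enables it; NULL clears it. */
		opts->repair = !!value;
		return 0;
	}
	return 1; /* unrecognized option */
}

/* Copy the transport-level flag into the per-fetch arguments. */
static void fill_pack_args(struct pack_args *args,
			   const struct transport_opts *opts)
{
	args->repair = opts->repair;
}
```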


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 5/8] t5615-partial-clone: add test for fetch --repair
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 4/8] fetch: " Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a test for doing a repair fetch to apply a changed partial clone
filter under protocol v0 and v2.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 t/t5616-partial-clone.sh | 52 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)
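
The expected counts in the new test follow from the blob sizes the setup writes. Each file.4.txt revision starts with "Mod file.4.txt <x>\n" (17 bytes); revisions a and b append 10000 bytes (10017 total), c and d append 20000 bytes (20017 total), and e and f stay at 17 bytes. Since `blob:limit=<n>` omits blobs of at least n bytes, a small sketch of the arithmetic is below. It assumes these six blobs are the only new blobs in `main..origin/main`; `count_filtered_blobs()` is an illustrative name.

```c
#include <stddef.h>

/*
 * blob:limit=<n> omits blobs whose size is at least n bytes.
 * Count how many of the given blob sizes such a filter leaves missing.
 */
static size_t count_filtered_blobs(const size_t *sizes, size_t nr,
				   size_t limit)
{
	size_t missing = 0;

	for (size_t i = 0; i < nr; i++)
		if (sizes[i] >= limit)
			missing++;
	return missing;
}
```

With sizes {10017, 10017, 20017, 20017, 17, 17}, limits of 999, 19999 and 29999 leave 4, 2 and 0 blobs missing, matching the `test_line_count` checks.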

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 34469b6ac10..230b2dcbc94 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -166,6 +166,56 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+# create new commits in "src" repo to establish a history on file.4.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.4.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.4.txt $x" >src/file.4.txt &&
+		if list_contains "a,b" "$x"; then
+			printf "%10000s" X >>src/file.4.txt
+		fi &&
+		if list_contains "c,d" "$x"; then
+			printf "%20000s" X >>src/file.4.txt
+		fi &&
+		git -C src add file.4.txt &&
+		git -C src commit -m "mod $x" || return 1
+	done &&
+	git -C src push -u srv main
+'
+
+# Do partial fetch to fetch smaller files; then verify that without --repair
+# applying a new filter does not refetch missing large objects. Then use
+# --repair to apply the new filter on existing commits. Test it under both
+# protocol v2 & v0.
+test_expect_success 'apply a different filter using --repair' '
+	git -C pc1 fetch --filter=blob:limit=999 origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 4 observed &&
+
+	git -C pc1 fetch --filter=blob:limit=19999 --repair origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 2 observed &&
+
+	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
+		--repair origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 0 observed
+'
+
+test_expect_success 'fetch --repair works with a shallow clone' '
+	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed &&
+
+	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --repair origin &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
@@ -225,7 +275,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 
 	# Auto-fetch all remaining trees and blobs with --missing=error
 	git -C dst rev-list --missing=error --objects main >fetched_objects &&
-	test_line_count = 70 fetched_objects &&
+	test_line_count = 88 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 5/8] t5615-partial-clone: add test for fetch --repair Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-25  6:57     ` Junio C Hamano
  2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
  2022-02-24 16:13   ` [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking Robert Coup via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 2 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Make run_auto_maintenance() accept optional config options for a
specific invocation of the auto-maintenance process.
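
To show the resulting command line concretely, the sketch below mimics the argument order the patch produces, using a plain array instead of git's strvec. build_maint_argv() is a hypothetical helper, not part of git. It illustrates that each "-c" pair must come before "maintenance", because -c is an option of git itself and not of the maintenance subcommand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the strvec code in run_auto_maintenance().
 * The child is run with git_cmd = 1, so these args follow "git".
 * Returns the number of args written (NULL-terminated), or -1 if
 * "out" cannot hold them.
 */
static int build_maint_argv(const char **out, size_t out_len, int quiet,
			    const char **config_opts, size_t nr_opts)
{
	size_t n = 0, i;

	assert(config_opts != NULL || nr_opts == 0);

	/* two per option, four fixed args, plus the NULL terminator */
	if (out_len < 2 * nr_opts + 5)
		return -1;

	for (i = 0; i < nr_opts; i++) {
		out[n++] = "-c";
		out[n++] = config_opts[i]; /* "some.option=value" */
	}
	out[n++] = "maintenance";
	out[n++] = "run";
	out[n++] = "--auto";
	out[n++] = quiet ? "--quiet" : "--no-quiet";
	out[n] = NULL;
	return (int)n;
}
```

With the two options used in patch 7/8 and quiet off, this yields "-c gc.autoPackLimit=1 -c maintenance.incremental-repack.auto=-1 maintenance run --auto --no-quiet", the same order the test in 7/8 looks for in its trace.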

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 builtin/am.c     | 2 +-
 builtin/commit.c | 2 +-
 builtin/fetch.c  | 2 +-
 builtin/merge.c  | 2 +-
 builtin/rebase.c | 2 +-
 run-command.c    | 8 +++++++-
 run-command.h    | 5 ++++-
 7 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 7de2c89ef22..298c6093bff 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1899,7 +1899,7 @@ next:
 	 */
 	if (!state->rebasing) {
 		am_destroy(state);
-		run_auto_maintenance(state->quiet);
+		run_auto_maintenance(state->quiet, NULL);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ed0374e30..84e7ab0a4cc 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1844,7 +1844,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_maintenance(quiet);
+	run_auto_maintenance(quiet, NULL);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 8e5e590dd6e..f32b24d182b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2227,7 +2227,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	}
 
 	if (enable_auto_gc)
-		run_auto_maintenance(verbosity < 0);
+		run_auto_maintenance(verbosity < 0, NULL);
 
  cleanup:
 	string_list_clear(&list, 0);
diff --git a/builtin/merge.c b/builtin/merge.c
index a94a03384ae..8d3e6d0de03 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -472,7 +472,7 @@ static void finish(struct commit *head_commit,
 			 * We ignore errors in 'gc --auto', since the
 			 * user should see them.
 			 */
-			run_auto_maintenance(verbosity < 0);
+			run_auto_maintenance(verbosity < 0, NULL);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index d858add3fe8..cbab6c05373 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -552,7 +552,7 @@ static int finish_rebase(struct rebase_options *opts)
 	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)), NULL);
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index a8501e38ceb..720fd7820c8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1798,9 +1798,10 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_maintenance(int quiet)
+int run_auto_maintenance(int quiet, const struct strvec *config_opts)
 {
 	int enabled;
+	int i;
 	struct child_process maint = CHILD_PROCESS_INIT;
 
 	if (!git_config_get_bool("maintenance.auto", &enabled) &&
@@ -1809,6 +1810,11 @@ int run_auto_maintenance(int quiet)
 
 	maint.git_cmd = 1;
 	maint.close_object_store = 1;
+
+	if (config_opts)
+		for (i = 0; i<config_opts->nr; i++)
+			strvec_pushl(&maint.args, "-c", config_opts->v[i], NULL);
+
 	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
 	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
 
diff --git a/run-command.h b/run-command.h
index 07bed6c31b4..24021abd41f 100644
--- a/run-command.h
+++ b/run-command.h
@@ -222,8 +222,11 @@ int run_command(struct child_process *);
 
 /*
  * Trigger an auto-gc
+ *
+ * config_opts is an optional list of additional config options to
+ * pass to the maintenance process in the form "some.option=value".
  */
-int run_auto_maintenance(int quiet);
+int run_auto_maintenance(int quiet, const struct strvec *config_opts);
 
 #define RUN_COMMAND_NO_STDIN		(1<<0)
 #define RUN_GIT_CMD			(1<<1)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-28 16:40     ` Ævar Arnfjörð Bjarmason
  2022-02-24 16:13   ` [PATCH v2 8/8] doc/partial-clone: mention --repair fetch option Robert Coup via GitGitGadget
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

After invoking `fetch --repair`, the object db will likely contain many
duplicate objects. If auto-maintenance is enabled, invoke it with
appropriate settings to encourage repacking/consolidation.

* gc.autoPackLimit: unless this is set to 0 (disabled), override the
  value to 1 to force pack consolidation.
* maintenance.incremental-repack.auto: unless this is set to 0, override
  the value to -1 to force incremental repacking.
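
The rule in the two bullets reads as: an unset key counts as enabled, and only an explicit 0 suppresses the override. A minimal sketch of that decision follows. override_for() is a hypothetical helper, and is_set stands in for git_config_get_int() having found the key.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the override string to pass to maintenance, or NULL when the
 * user has explicitly disabled the behaviour by setting the key to 0.
 */
static const char *override_for(int is_set, int value, const char *override)
{
	assert(override != NULL);
	if (!is_set)
		value = -1; /* unset: treated as any non-zero value */
	return value != 0 ? override : NULL;
}
```

For example, with gc.autoPackLimit unset, override_for(0, 0, "gc.autoPackLimit=1") returns the override string. With the key explicitly set to 0, it returns NULL and no override is passed.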

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  3 ++-
 builtin/fetch.c                 | 23 +++++++++++++++++++++--
 t/t5616-partial-clone.sh        |  6 ++++--
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 1131aaad252..73abafdfc41 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -169,7 +169,8 @@ ifndef::git-pull[]
 	associated objects that are already present locally, this option fetches
 	all objects as a fresh clone would. Use this to reapply a partial clone
 	filter from configuration or using `--filter=` when the filter
-	definition has changed.
+	definition has changed. Automatic post-fetch maintenance will perform
+	object database pack consolidation to remove any duplicate objects.
 endif::git-pull[]
 
 --refmap=<refspec>::
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f32b24d182b..7d023341ac0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2020,6 +2020,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	struct remote *remote = NULL;
 	int result = 0;
 	int prune_tags_ok = 1;
+	struct strvec auto_maint_opts = STRVEC_INIT;
+	int opt_val;
 
 	packet_trace_identity("fetch");
 
@@ -2226,10 +2228,27 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 					     NULL);
 	}
 
-	if (enable_auto_gc)
-		run_auto_maintenance(verbosity < 0, NULL);
+	if (enable_auto_gc) {
+		if (repair) {
+			/*
+			 * Hint auto-maintenance strongly to encourage repacking,
+			 * but respect config settings disabling it.
+			 */
+			if (git_config_get_int("gc.autopacklimit", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				strvec_push(&auto_maint_opts, "gc.autoPackLimit=1");
+
+			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				strvec_push(&auto_maint_opts, "maintenance.incremental-repack.auto=-1");
+		}
+		run_auto_maintenance(verbosity < 0, &auto_maint_opts);
+	}
 
  cleanup:
 	string_list_clear(&list, 0);
+	strvec_clear(&auto_maint_opts);
 	return result;
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 230b2dcbc94..60f1817cda6 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -187,7 +187,7 @@ test_expect_success 'push new commits to server for file.4.txt' '
 # Do partial fetch to fetch smaller files; then verify that without --repair
 # applying a new filter does not refetch missing large objects. Then use
 # --repair to apply the new filter on existing commits. Test it under both
-# protocol v2 & v0.
+# protocol v2 & v0. Check repacking auto-maintenance is kicked off.
 test_expect_success 'apply a different filter using --repair' '
 	git -C pc1 fetch --filter=blob:limit=999 origin &&
 	git -C pc1 rev-list --quiet --objects --missing=print \
@@ -199,11 +199,13 @@ test_expect_success 'apply a different filter using --repair' '
 		main..origin/main >observed &&
 	test_line_count = 2 observed &&
 
+	GIT_TRACE2_EVENT="$(pwd)/trace.log" \
 	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
 		--repair origin &&
 	git -C pc1 rev-list --quiet --objects --missing=print \
 		main..origin/main >observed &&
-	test_line_count = 0 observed
+	test_line_count = 0 observed &&
+	test_subcommand git -c gc.autoPackLimit=1 -c maintenance.incremental-repack.auto=-1 maintenance run --auto --no-quiet <trace.log
 '
 
 test_expect_success 'fetch --repair works with a shallow clone' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v2 8/8] doc/partial-clone: mention --repair fetch option
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking Robert Coup via GitGitGadget
@ 2022-02-24 16:13   ` Robert Coup via GitGitGadget
  2022-02-28 16:43   ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
  9 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-02-24 16:13 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Document it for partial clones as a means to refetch with a new filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/technical/partial-clone.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c66f24..268939f781d 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -181,6 +181,9 @@ Fetching Missing Objects
   currently fetches all objects referred to by the requested objects, even
   though they are not necessary.
 
+- Fetching with `--repair` will request a complete new filtered packfile from
+  the remote, which can be used to change a filter without needing to
+  dynamically fetch missing objects.
 
 Using many promisor remotes
 ---------------------------
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/8] fetch-negotiator: add specific noop initializor
  2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
@ 2022-02-25  6:19     ` Junio C Hamano
  2022-02-28 12:22       ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-02-25  6:19 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Derrick Stolee, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Robert Coup <robert@coup.net.nz>
>
> Add a specific initializor for the noop fetch negotiator. This is

"initializer"?

> introduced to support allowing partial clones to skip commit negotiation
> when fetching to repair or apply a modified filter.
>
> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  fetch-negotiator.c | 5 +++++
>  fetch-negotiator.h | 8 ++++++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/fetch-negotiator.c b/fetch-negotiator.c
> index 874797d767b..be383367f55 100644
> --- a/fetch-negotiator.c
> +++ b/fetch-negotiator.c
> @@ -23,3 +23,8 @@ void fetch_negotiator_init(struct repository *r,
>  		return;
>  	}
>  }
> +
> +void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
> +{
> +	noop_negotiator_init(negotiator);
> +}

Puzzling.  What makes this better than allowing noop-negotiator-init
to be called directly?

> diff --git a/fetch-negotiator.h b/fetch-negotiator.h
> index ea78868504b..e348905a1f0 100644
> --- a/fetch-negotiator.h
> +++ b/fetch-negotiator.h
> @@ -53,7 +53,15 @@ struct fetch_negotiator {
>  	void *data;
>  };
>  
> +/*
> + * Initialize a negotiator based on the repository settings.
> + */
>  void fetch_negotiator_init(struct repository *r,
>  			   struct fetch_negotiator *negotiator);
>  
> +/*
> + * Initialize a noop negotiator.
> + */
> +void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator);
> +
>  #endif

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 2/8] fetch-pack: add repairing
  2022-02-24 16:13   ` [PATCH v2 2/8] fetch-pack: add repairing Robert Coup via GitGitGadget
@ 2022-02-25  6:46     ` Junio C Hamano
  2022-02-28 12:14       ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-02-25  6:46 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Derrick Stolee, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -694,6 +696,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  
>  	save_commit_buffer = 0;
>  
> +	if (args->repair)
> +		return;
> +

Reading how the original value of save_commit_buffer is saved away
(the variable gets cleared and then gets restored before the function
returns in the normal codepath), this new code looks wrong.  Hitting
this early return after clearing the variable means nobody will
restore the saved value of the variable, no?
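
The hazard described here is the usual save/clear/restore pattern around a global. One way to keep an early exit safe is to route it through the same exit path that restores the value. Checking the flag before the global is touched would work equally well. The sketch uses hypothetical names: keep_buffers stands in for save_commit_buffer, and skip_marking for args->repair.

```c
#include <assert.h>

/* Hypothetical stand-in for a global such as save_commit_buffer. */
static int keep_buffers = 1;

/* Returns 1 if the (elided) walk ran, 0 if it was skipped. */
static int mark_refs(int skip_marking)
{
	int old_keep_buffers = keep_buffers;
	int marked = 0;

	keep_buffers = 0;

	if (skip_marking)
		goto out; /* the restore below still runs */

	/* ... the real walk would happen here, with buffers disabled ... */
	assert(keep_buffers == 0);
	marked = 1;

out:
	keep_buffers = old_keep_buffers;
	return marked;
}
```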

> @@ -1027,9 +1032,6 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  	struct fetch_negotiator negotiator_alloc;
>  	struct fetch_negotiator *negotiator;
>  
> -	negotiator = &negotiator_alloc;
> -	fetch_negotiator_init(r, negotiator);

I know why you want to force the "noop" negotiator while repairing,
but it is unclear why you need to move this down in the function.

>  	sort_ref_list(&ref, ref_compare_name);
>  	QSORT(sought, nr_sought, cmp_ref_by_name);
>  
> @@ -1119,9 +1121,16 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  	if (!server_supports_hash(the_hash_algo->name, NULL))
>  		die(_("Server does not support this repository's object format"));
>  
> +	negotiator = &negotiator_alloc;
> +	if (args->repair) {
> +		fetch_negotiator_init_noop(negotiator);
> +	} else {
> +		fetch_negotiator_init(r, negotiator);
> +	}

Hmph.  I am debating myself if hardcoding the implementation detail
of "when repairing, the noop negotiator is the only useful one" like
this code does is a sensible thing to do.  If we later need to tweak
the choice of negotiator used depending on the caller's needs,
perhaps fetch_negotiator_init() should gain a new flags word, i.e.

	fetch_negotiator_init(struct repository *,
			      struct fetch_negotiator *,
			      unsigned flags)

where "Use negotiator suitable for the repairing fetch" could be a
single bit in the flags word, making this caller more like:

	negotiator = &negotiator_alloc;
	flags = 0;
	if (args->repair)
		flags |= FETCH_NEGOTIATOR_REPAIRING;
	fetch_negotiator_init(r, negotiator, flags);

perhaps.  That way, [1/8] becomes unnecessary.

>  	mark_complete_and_common_ref(negotiator, args, &ref);
>  	filter_refs(args, &ref, sought, nr_sought);
> -	if (everything_local(args, &ref)) {
> +	if (!args->repair && everything_local(args, &ref)) {
>  		packet_flush(fd[1]);
>  		goto all_done;
>  	}
> @@ -1587,7 +1596,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  	struct strvec index_pack_args = STRVEC_INIT;
>  
>  	negotiator = &negotiator_alloc;
> -	fetch_negotiator_init(r, negotiator);
> +	if (args->repair)
> +		fetch_negotiator_init_noop(negotiator);
> +	else
> +		fetch_negotiator_init(r, negotiator);

Likewise.

>  	packet_reader_init(&reader, fd[0], NULL, 0,
>  			   PACKET_READ_CHOMP_NEWLINE |
> @@ -1613,7 +1625,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  			/* Filter 'ref' by 'sought' and those that aren't local */
>  			mark_complete_and_common_ref(negotiator, args, &ref);
>  			filter_refs(args, &ref, sought, nr_sought);
> -			if (everything_local(args, &ref))
> +			if (!args->repair && everything_local(args, &ref))
>  				state = FETCH_DONE;
>  			else
>  				state = FETCH_SEND_REQUEST;
> diff --git a/fetch-pack.h b/fetch-pack.h
> index 7f94a2a5831..bbb663edda8 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -42,6 +42,7 @@ struct fetch_pack_args {
>  	unsigned update_shallow:1;
>  	unsigned reject_shallow_remote:1;
>  	unsigned deepen:1;
> +	unsigned repair:1;
>  
>  	/*
>  	 * Indicate that the remote of this request is a promisor remote. The

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
@ 2022-02-25  6:57     ` Junio C Hamano
  2022-02-28 12:02       ` Robert Coup
  2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-02-25  6:57 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Derrick Stolee, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	if (config_opts)
> +		for (i = 0; i<config_opts->nr; i++)

Style.  SP on both sides of '<'.

> +			strvec_pushl(&maint.args, "-c", config_opts->v[i], NULL);

>  	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
>  	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");

It is unclear if it is generally a good idea to pass a hardcoded set
of configuration variables to begin with, but provided it makes
sense [*], the implementation seems OK.

	Side note.  And this is a big *IF*, as we can see in all
	other helper functions in run-commands.h, nobody has such a
	provision.  If supporting such a "feature" makes sense, we
	probably would need to do so with a common interface that
	can be used across run_command() API, not with an ad-hoc
	interface that is only usable with run_auto_maintenance(),
	which may look somewhat similar to how we have a common way
	to pass set of environment variables.

> diff --git a/run-command.h b/run-command.h
> index 07bed6c31b4..24021abd41f 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -222,8 +222,11 @@ int run_command(struct child_process *);
>  
>  /*
>   * Trigger an auto-gc
> + *
> + * config_opts is an optional list of additional config options to
> + * pass to the maintenance process in the form "some.option=value".
>   */
> -int run_auto_maintenance(int quiet);
> +int run_auto_maintenance(int quiet, const struct strvec *config_opts);
>  
>  #define RUN_COMMAND_NO_STDIN		(1<<0)
>  #define RUN_GIT_CMD			(1<<1)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
  2022-02-25  6:57     ` Junio C Hamano
@ 2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
  2022-02-28 11:51       ` Robert Coup
  1 sibling, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-25 10:29 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup


On Thu, Feb 24 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> Make run_auto_maintenance() accept optional config options for a
> specific invocation of the auto-maintenance process.
> [...]
> -int run_auto_maintenance(int quiet)
> +int run_auto_maintenance(int quiet, const struct strvec *config_opts)
>  {
>  	int enabled;
> +	int i;
>  	struct child_process maint = CHILD_PROCESS_INIT;
>  
>  	if (!git_config_get_bool("maintenance.auto", &enabled) &&
> @@ -1809,6 +1810,11 @@ int run_auto_maintenance(int quiet)
>  
>  	maint.git_cmd = 1;
>  	maint.close_object_store = 1;
> +
> +	if (config_opts)
> +		for (i = 0; i<config_opts->nr; i++)
> +			strvec_pushl(&maint.args, "-c", config_opts->v[i], NULL);
> +
>  	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
>  	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
>  
> diff --git a/run-command.h b/run-command.h
> index 07bed6c31b4..24021abd41f 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -222,8 +222,11 @@ int run_command(struct child_process *);
>  
>  /*
>   * Trigger an auto-gc
> + *
> + * config_opts is an optional list of additional config options to
> + * pass to the maintenance process in the form "some.option=value".
>   */
> -int run_auto_maintenance(int quiet);
> +int run_auto_maintenance(int quiet, const struct strvec *config_opts);
>  
>  #define RUN_COMMAND_NO_STDIN		(1<<0)
>  #define RUN_GIT_CMD			(1<<1)


Shouldn't this be using the git_config_push.*parameter() functions
(see git grep) instead of adding a new custom method to do this?

Perhaps there's some subtle distinction between the two that's important
here, or perhaps you just didn't know about that API...
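
For context on this suggestion: the git_config_push.*parameter() helpers pass overrides to child processes through the GIT_CONFIG_PARAMETERS environment variable, not as -c arguments. As a rough illustration only, the hypothetical push_config_param() below appends shell-style single-quoted "key=value" entries to such a string. Git's own helpers, and the exact format they produce, differ in details.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (currently *len bytes); -1 if out (size bytes) would overflow. */
static int append_str(char *out, size_t size, size_t *len, const char *s)
{
	size_t n = strlen(s);

	if (*len + n + 1 > size)
		return -1;
	memcpy(out + *len, s, n + 1); /* copies the terminating NUL too */
	*len += n;
	return 0;
}

/*
 * Append one "key=value" entry, single-quoted and space-separated from
 * any previous entry. Embedded single quotes become '\'' (POSIX shell
 * style). On failure, out may hold a partially written entry.
 */
static int push_config_param(char *out, size_t size, const char *text)
{
	size_t len;
	char one[2] = { 0, 0 };

	assert(out != NULL && size > 0);
	len = strlen(out);

	if (len && append_str(out, size, &len, " "))
		return -1;
	if (append_str(out, size, &len, "'"))
		return -1;
	for (; *text; text++) {
		if (*text == '\'') {
			/* close the quote, emit an escaped quote, reopen */
			if (append_str(out, size, &len, "'\\''"))
				return -1;
		} else {
			one[0] = *text;
			if (append_str(out, size, &len, one))
				return -1;
		}
	}
	return append_str(out, size, &len, "'");
}
```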

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
@ 2022-02-28 11:51       ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-28 11:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee

Hi Ævar,

On Fri, 25 Feb 2022 at 10:30, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> Shouldn't this be using the git_config_push.*parameter() functions
> (see git grep) instead of adding a new custom method to do this?
>
> Perhaps there's some subtle distinction between the two that's important
> here, or perhaps you just didn't know about that API...

I didn't know of it — it could probably use a docstring too ;-) A few
other sub-commands seemed to use the `-c key=value` approach so it
didn't seem like GIT_CONFIG_PARAMETERS was universal.

Easy enough to swap to, thanks. Assuming we want it, but I'm about to
reply to Junio's email.

Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-25  6:57     ` Junio C Hamano
@ 2022-02-28 12:02       ` Robert Coup
  2022-02-28 17:07         ` Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-02-28 12:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee

Hi Junio,

On Fri, 25 Feb 2022 at 06:57, Junio C Hamano <gitster@pobox.com> wrote:
>
> Style.  SP on both sides of '<'.

Thanks

> It is unclear if it is generally a good idea to pass a hardcoded set
> of configuration variables to begin with, but provided it makes
> sense [*], the implementation seems OK.

My RFC didn't do any repacking, and you (rightly) raised it at
https://lore.kernel.org/git/xmqqk0eecpl9.fsf@gitster.g/

>> To note:
>>
>>  1. This will produce duplicated objects between the existing and newly
>>     fetched packs, but gc will clean them up.
>
> ... it is not smart enough to tell them to exclude what we _ought_
> to have by telling them what the _old_ filter spec was.  That's OK
> for a starting point, I guess.  Hopefully, at the end of this
> operation, we should garbage collect the duplicated objects by
> default (with an option to turn it off)?

This was my best stab at (I think) safely doing it — it respects users
who have disabled auto-gc/repacking/maintenance, and doesn't add yet
another config variable. But we could also:

1. do nothing — the user's object DB will contain duplicate objects
until some repacking happens. This could be up to twice the size it
"should" be.
2. print a message to suggest running `git gc` / `git maintenance`

I'm keen on some input from Derrick or someone deeply familiar with
the maintenance/GC bits (particularly wrt incremental repacking) on
whether what I'm doing would cause issues for people with complex
setups I haven't thought of.

>         Side note.  And this is a big *IF*, as we can see in all
>         other helper functions in run-commands.h, nobody has such a
>         provision.  If supporting such a "feature" makes sense, we
>         probably would need to do so with a common interface that
>         can be used across run_command() API, not with an ad-hoc
>         interface that is only usable with run_auto_maintenance(),
>         which may look somewhat similar to how we have a common way
>         to pass set of environment variables.

Ævar pointed out GIT_CONFIG_PARAMETERS has an API to wrap its use, so
I can adapt the implementation to use that.

Thanks

Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 2/8] fetch-pack: add repairing
  2022-02-25  6:46     ` Junio C Hamano
@ 2022-02-28 12:14       ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-28 12:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee

Hi Junio,

On Fri, 25 Feb 2022 at 06:46, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -694,6 +696,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
> >
> >       save_commit_buffer = 0;
> >
> > +     if (args->repair)
> > +             return;
> > +
>
> Reading how the original value of save_commit_buffer is saved away
> (the variable gets cleared and then gets restored before the function
> returns in the normal codepath), this new code looks wrong.  Hitting
> this early return after clearing the variable means nobody will
> restore the saved value of the variable, no?

Good spotting, thank you.

>
> > @@ -1027,9 +1032,6 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
> >       struct fetch_negotiator negotiator_alloc;
> >       struct fetch_negotiator *negotiator;
> >
> > -     negotiator = &negotiator_alloc;
> > -     fetch_negotiator_init(r, negotiator);
>
> I know why you want to force the "noop" negotiator while repairing,
> but it is unclear why you need to move this down in the function.

Seemed cleaner to initialise the right negotiator once, rather than
clearing and re-initialising depending on repair mode.

> Hmph.  I am debating myself if hardcoding the implementation detail
> of "when repairing, the noop negotiator is the only useful one" like
> this code does is a sensible thing to do.  If we later need to tweak
> the choice of negotiator used depending on the caller's needs,
> perhaps fetch_negotiator_init() should gain a new flags word, i.e.

To me this feels a bit hypothetical, but maybe I'm missing a use case?
The point of repairing is not to negotiate common commits and do
(effectively) a clone-style fresh fetch. If a future negotiator
arrives with a special repair mode, or repairing itself becomes more
complex, then other things will probably need adapting too?

> where "Use negotiator suitable for the repairing fetch" could be a
> single bit in the flags word, making this caller more like:
>
>         negotiator = &negotiator_alloc;
>         flags = 0;
>         if (args->repair)
>                 flags |= FETCH_NEGOTIATOR_REPAIRING;
>         fetch_negotiator_init(r, negotiator, flags);
>
> perhaps.  That way, [1/8] becomes unnecessary.

With the current patch it is clear what's happening, that the user's
negotiator selection is deliberately being ignored for the purposes of
repairing. Conversely, calling negotiator_init() asking for a skipping
negotiator in repair mode and getting back a noop negotiator seems
unobvious.

Thanks,

Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/8] fetch-negotiator: add specific noop initializor
  2022-02-25  6:19     ` Junio C Hamano
@ 2022-02-28 12:22       ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-28 12:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee

Hi,

On Fri, 25 Feb 2022 at 06:19, Junio C Hamano <gitster@pobox.com> wrote:
>
> > Add a specific initializor for the noop fetch negotiator. This is
>
> "initializer"?

Thanks. A poor translation effort to American English ;-)

> > +
> > +void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
> > +{
> > +     noop_negotiator_init(negotiator);
> > +}
>
> Puzzling.  What makes this better than allowing noop-negotiator-init
> to be called directly?

It is simply for naming/API consistency with the regular method. Happy to
call noop_negotiator_init() directly if you prefer.

Rob :)

* Re: [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking
  2022-02-24 16:13   ` [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking Robert Coup via GitGitGadget
@ 2022-02-28 16:40     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-28 16:40 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup


On Thu, Feb 24 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> After invoking `fetch --repair`, the object db will likely contain many
> duplicate objects. If auto-maintenance is enabled, invoke it with
> appropriate settings to encourage repacking/consolidation.
>
> * gc.autoPackLimit: unless this is set to 0 (disabled), override the
>   value to 1 to force pack consolidation.
> * maintenance.incremental-repack.auto: unless this is set to 0, override
>   the value to -1 to force incremental repacking.
>
> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  Documentation/fetch-options.txt |  3 ++-
>  builtin/fetch.c                 | 23 +++++++++++++++++++++--
>  t/t5616-partial-clone.sh        |  6 ++++--
>  3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index 1131aaad252..73abafdfc41 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -169,7 +169,8 @@ ifndef::git-pull[]
>  	associated objects that are already present locally, this option fetches
>  	all objects as a fresh clone would. Use this to reapply a partial clone
>  	filter from configuration or using `--filter=` when the filter
> -	definition has changed.
> +	definition has changed. Automatic post-fetch maintenance will perform
> +	object database pack consolidation to remove any duplicate objects.
>  endif::git-pull[]
>  
>  --refmap=<refspec>::
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index f32b24d182b..7d023341ac0 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2020,6 +2020,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	struct remote *remote = NULL;
>  	int result = 0;
>  	int prune_tags_ok = 1;
> +	struct strvec auto_maint_opts = STRVEC_INIT;

[Nits, but aside from earlier comments about config options v.s. config[]

this variable...

> +	int opt_val;

...and this...
>  
>  	packet_trace_identity("fetch");
>  
> @@ -2226,10 +2228,27 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  					     NULL);
>  	}
>  
> -	if (enable_auto_gc)
> -		run_auto_maintenance(verbosity < 0, NULL);
> +	if (enable_auto_gc) {

...can just be declared in this scope.

> +		if (repair) {

I think having:

    if (enable_auto_gc && repair)

Might make this more readable without the extra indentation, but of
course then the variables need to be at the top-level... :)
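The flattened structure, with the variables hoisted to the enclosing scope, might look roughly like this self-contained model. The config store and mock_config_get_int() are hypothetical stand-ins; only the return convention (0 when found, 1 when missing) is taken from the patch's use of git_config_get_int(). The decision logic mirrors the diff: an override is added unless the user explicitly set the value to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory config: key/int-value pairs. */
struct mock_config_entry {
	const char *key;
	int value;
};

/*
 * Same convention as the git_config_get_int() call in the patch:
 * returns 0 and sets *dest when key exists, returns 1 (dest untouched)
 * when it does not.
 */
static int mock_config_get_int(const struct mock_config_entry *cfg,
			       size_t nr, const char *key, int *dest)
{
	for (size_t i = 0; i < nr; i++) {
		if (!strcmp(cfg[i].key, key)) {
			*dest = cfg[i].value;
			return 0;
		}
	}
	return 1;
}

/*
 * Collect the "-c" overrides to hand to auto-maintenance, using the
 * flattened "enable_auto_gc && repair" condition. Writes at most two
 * entries to out and returns how many were written.
 */
static size_t repair_maint_overrides(const struct mock_config_entry *cfg,
				     size_t nr, int enable_auto_gc,
				     int repair, const char *out[2])
{
	size_t count = 0;
	int opt_val;

	if (!(enable_auto_gc && repair))
		return 0;

	/* Unset counts as "not disabled"; only an explicit 0 opts out. */
	if (mock_config_get_int(cfg, nr, "gc.autopacklimit", &opt_val))
		opt_val = -1;
	if (opt_val != 0)
		out[count++] = "gc.autoPackLimit=1";

	if (mock_config_get_int(cfg, nr,
				"maintenance.incremental-repack.auto", &opt_val))
		opt_val = -1;
	if (opt_val != 0)
		out[count++] = "maintenance.incremental-repack.auto=-1";

	assert(count <= 2);
	return count;
}
```

For example, a user who set gc.autoPackLimit to 0 gets only the incremental-repack override, matching the "respect config settings disabling it" comment in the patch.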

> +			/*
> +			 * Hint auto-maintenance strongly to encourage repacking,
> +			 * but respect config settings disabling it.
> +			 */
> +			if (git_config_get_int("gc.autopacklimit", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)
> +				strvec_push(&auto_maint_opts, "gc.autoPackLimit=1");
> +
> +			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)
> +				strvec_push(&auto_maint_opts, "maintenance.incremental-repack.auto=-1");
> +		}
> +		run_auto_maintenance(verbosity < 0, &auto_maint_opts);
> +	}
>  
>   cleanup:
>  	string_list_clear(&list, 0);
> +	strvec_clear(&auto_maint_opts);
>  	return result;
>  }
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 230b2dcbc94..60f1817cda6 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -187,7 +187,7 @@ test_expect_success 'push new commits to server for file.4.txt' '
>  # Do partial fetch to fetch smaller files; then verify that without --repair
>  # applying a new filter does not refetch missing large objects. Then use
>  # --repair to apply the new filter on existing commits. Test it under both
> -# protocol v2 & v0.
> +# protocol v2 & v0. Check repacking auto-maintenance is kicked off.
>  test_expect_success 'apply a different filter using --repair' '
>  	git -C pc1 fetch --filter=blob:limit=999 origin &&
>  	git -C pc1 rev-list --quiet --objects --missing=print \
> @@ -199,11 +199,13 @@ test_expect_success 'apply a different filter using --repair' '
>  		main..origin/main >observed &&
>  	test_line_count = 2 observed &&
>  
> +	GIT_TRACE2_EVENT="$(pwd)/trace.log" \

Nit: better to use $PWD instead of $(pwd). It works here, but won't be
compatible with -x if we ever want to test stderr.

>  	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
>  		--repair origin &&
>  	git -C pc1 rev-list --quiet --objects --missing=print \
>  		main..origin/main >observed &&
> -	test_line_count = 0 observed
> +	test_line_count = 0 observed &&
> +	test_subcommand git -c gc.autoPackLimit=1 -c maintenance.incremental-repack.auto=-1 maintenance run --auto --no-quiet <trace.log
>  '
>  
>  test_expect_success 'fetch --repair works with a shallow clone' '


* Re: [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-02-24 16:13   ` [PATCH v2 8/8] doc/partial-clone: mention --repair fetch option Robert Coup via GitGitGadget
@ 2022-02-28 16:43   ` Ævar Arnfjörð Bjarmason
  2022-02-28 17:27     ` Robert Coup
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
  9 siblings, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-28 16:43 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Robert Coup


On Thu, Feb 24 2022, Robert Coup via GitGitGadget wrote:

> [...] While a key use case is described
> above for partial clones, a user could also use --repair to fix a corrupted
> object database by performing a refetch of objects that should already be
> present, establishing a better workflow than deleting the local repository
> and re-cloning.
>
>  * Using --repair will produce duplicated objects between the existing and
>    newly fetched packs, but maintenance will clean them up when it runs
>    automatically post-fetch (if enabled).
>  * If a user fetches with --repair applying a more restrictive partial clone
>    filter than previously (eg: blob:limit=1m then blob:limit=1k) the
>    eventual state is a no-op, since any referenced object already in the
>    local repository is never removed. More advanced repacking which could
>    improve this scenario is currently proposed at [2].

I realize this was probably based on feedback on v1 (I didn't go back
and re-read it, sorry).

But I feel strongly that we really should name this something other than
--repair. I don't care much if it isn't that :) But maybe
--expand-filters, --fleshen-partial or something like that?

So first (and partially as an aside): Is a "noop" negotiator really
what we want at all? Don't we instead want to be discovering those parts
of our history that are closed under reachability (if any) and say we
HAVE those things during negotiation?

I haven't tested, but maybe that's just more complex. E.g. if you have a
filter that's excluding >500MB blobs or whatever, you might have the full
history already, or maybe that 500MB blob was added last week, so you
have almost all of it.

But wouldn't that be a lot kinder to server resources + network at the
expense of some (presumably rare) extra local computation?

But secondly, on the "--repair" name: The reason I mentioned that is
that I'd really like us to actually have a "my repo is screwed, please
repair it".

But (and I haven't tested, but I'm pretty sure), this patch series isn't
going to give you that. The reasons are elaborated on in [1], basically
we try really hard to re-use local data, and due to that & the collision
detection we will often just hard die early in object walking.

But maybe I'm wrong, have you actually tested this with *broken* objects
as opposed to just missing ones with repo filters + promisors in play?
Our t/*fsck* and t/*corrupt*/ etc. tests have some of those.

And maybe I'm making a big deal out of nothing, but I fear that by
naming it --repair and giving it these semantics that we'd be closing
the door on things that are actually needed for some of the trickier
edge cases when it comes to repairing a bad repository.

Including but not limited to: having a loose BAD_OBJ and needing to
replace it with another loose object (due to the unpack limit); or the
branch we're updating can't be read locally, but is at an OID that's
(re-)included in the incoming pack, which is hopefully about to repair
our repository.

Or even that we have a SHA-1 collision, but we intentionally want to
override the collision detection because we know our local repo is bad,
but the remote can be trusted.

All of which are much more involved than just the "fleshen partial data"
you're aiming for here...

1. https://lore.kernel.org/git/87czo7haha.fsf@evledraar.gmail.com/

* Re: [PATCH v2 6/8] maintenance: add ability to pass config options
  2022-02-28 12:02       ` Robert Coup
@ 2022-02-28 17:07         ` Junio C Hamano
  0 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2022-02-28 17:07 UTC (permalink / raw)
  To: Robert Coup
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee

Robert Coup <robert@coup.net.nz> writes:

> Ævar pointed out GIT_CONFIG_PARAMETERS has an API to wrap its use, so
> I can adapt the implementation to use that.

Depending on how that "API to wrap" looks like, it may not be a wise
idea, though.  You'd need to consider how it interacts with those
who run "git -c var=val" to enter this codepath.

* Re: [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-02-28 16:43   ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
@ 2022-02-28 17:27     ` Robert Coup
  2022-02-28 18:54       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation Junio C Hamano
  2022-02-28 22:20       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 76+ messages in thread
From: Robert Coup @ 2022-02-28 17:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee

Hi Ævar,

Thanks for taking the time to look into this,

On Mon, 28 Feb 2022 at 16:54, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> I realize this was probably based on feedback on v1 (I didn't go back
> and re-read it, sorry).

Yes, `fetch --repair` was from Jonathan Tan's v1 feedback[1], where he
pointed out it could fill in lost objects from any remote in a more
generally useful fashion.

My goal here is to refetch with a different filter so that I get the
outcome of a `clone --filter=` without having to chuck my object
directory. But the actual implementation doesn't need to know anything
specific about filters, so the original "refilter" name I had isn't
really right.

> But I feel strongly that we really should name this something other than
> --repair. I don't care much if it isn't that :) But maybe
> --expand-filters, --fleshen-partial or something like that?

fleshen-partial sounds like a horror movie scene to me.

1. `--refetch`
2. `--as-clone`
3. `--expand-filter` (though TBC you don't necessarily need a filter)
4. `--refilter`
5. something else

> So first (and partially as an aside): Is a "noop" negotiatior really
> want we want at all? Don't we instead want to be discovering those parts
> of our history that are closed under reachability (if any) and say we
> HAVE those things during negotiation?

At an object level we don't have any means of knowing what has or
hasn't been obtained via fetch to a partial clone with different
`--filter` args (via config or cli), dynamic fault-ins, or sourced
from a different remote. Fetch negotiation only occurs for refs and
their associated commits/histories, but filtering occurs at the blob
or tree level — so we often HAVE a commit but not all of its
trees/blobs, whereupon negotiation skips that commit and all its
associated objects.
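A toy model of the point above: the server packs only what is reachable from the wanted tip but not from the client's "have" commits, so once a commit is reported as a have, any blobs missing beneath it are never sent. The bitmask representation and objects_sent() are hypothetical simplifications (linear history, each object introduced by exactly one commit, the filter as a plain mask), not how pack-objects works internally.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy linear history: commit i introduces the objects in intro[i]
 * (one bit per tree/blob). A "have" of commit `have` tells the server
 * the client already holds everything reachable from it, so only
 * objects introduced by commits have+1 .. tip are packed, then
 * restricted by the filter mask. have == -1 models the noop
 * negotiator: no haves at all.
 */
static uint32_t objects_sent(const uint32_t *intro, int nr, int have,
			     uint32_t filter_mask)
{
	uint32_t pack = 0;

	assert(have >= -1 && have < nr);
	for (int i = have + 1; i < nr; i++)
		pack |= intro[i];
	return pack & filter_mask;
}
```

With the tip already reported as a have, widening the filter sends nothing; with no haves, every tree and blob the new filter allows is sent, which is the behaviour the refetch option relies on.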

> But secondly, on the "--repair" name: The reason I mentioned that is
> that I'd really like us to actually have a "my repo is screwed, please
> repair it".

Feels like people would look at `fsck` for that over `fetch`? Maybe
not. Anyway, I get the point about the naming still not being right
:-)

> But (and I haven't tested, but I'm pretty sure), this patch series isn't
> going to give you that. The reasons are elaborated on in [1], basically
> we try really hard to re-use local data, and due to that & the collision
> detection we will often just hard die early in object walking.
>
> But maybe I'm wrong, have you actually tested this with *broken* objects
> as opposed to just missing ones with repo filters + promisors in play?
> Our t/*fsck* and t/*corrupt*/ etc. tests have some of those.

Correct: I haven't tested with such objects/broken ODBs. Ideally
repack/gc/etc would prefer a new-fixed pack over the old-broken
pack/object but that's not really what I'm aiming to achieve here or
am interested in.

1. https://lore.kernel.org/git/20220202185957.1928631-1-jonathantanmy@google.com/

Thanks,

Rob :)

* Re: [PATCH v2 0/8] fetch: add repair: full refetch without negotiation
  2022-02-28 17:27     ` Robert Coup
@ 2022-02-28 18:54       ` Junio C Hamano
  2022-02-28 22:20       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2022-02-28 18:54 UTC (permalink / raw)
  To: Robert Coup
  Cc: Ævar Arnfjörð Bjarmason,
	Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee

Robert Coup <robert@coup.net.nz> writes:

> fleshen-partial sounds like a horror movie scene to me.
>
> 1. `--refetch`

I do think, as part of a larger "repair" task, fetching objects
again from a known-good remote would be a sensible one step.  So
"re"fetch would be a good way to phrase it.

> 2. `--as-clone`

This I think is in the same vein.

> 3. `--expand-filter` (though TBC you don't necessarily need a filter)
> 4. `--refilter`

These focus on a wrong thing, I would think.  "I suspect my object
store may be missing some objects, and I want to repair it by
getting another copy from a known-good remote" would be the simplest
and easiest-to-explain form of the end-user requirement this feature
helps.  The end-user wish does not require "filter" at all.

But this reminds me of a few caveats.

The patch does implement a good way to lie about what we have to the
other end, forcing them to send everything that is necessary to
create a new clone.

But that is only a half of what is required to "repair a possibly
broken object store by re-fetching".  It is sufficient for repairing
"missing objects", but not enough for repairing "broken/corrupt"
objects.  The object store has a mechanism to resist collision
attacks by ignoring objects coming over the network that do not
match what we locally already have.
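Junio's point can be illustrated with a toy store that applies the same "local copy wins" rule. Everything in it is hypothetical (struct toy_store, toy_write_object() and the result codes); it returns an error code for mismatching bytes where, per the discussion in this thread, git's real collision check may die instead.

```c
#include <assert.h>
#include <string.h>

#define STORE_MAX 8

/* Hypothetical toy object store keyed by object id string. */
struct toy_store {
	const char *ids[STORE_MAX];
	const char *contents[STORE_MAX];
	int nr;
};

enum write_result { WRITE_ADDED, WRITE_KEPT_EXISTING, WRITE_REJECTED };

/*
 * An incoming object whose id already exists never replaces the local
 * bytes: identical bytes are skipped, differing bytes are refused.
 * Either way a corrupt local copy stays in place, so refetching can
 * fill in missing objects but cannot fix corrupt ones.
 */
static enum write_result toy_write_object(struct toy_store *s,
					  const char *id, const char *content)
{
	for (int i = 0; i < s->nr; i++) {
		if (!strcmp(s->ids[i], id))
			return strcmp(s->contents[i], content) ?
				WRITE_REJECTED : WRITE_KEPT_EXISTING;
	}
	assert(s->nr < STORE_MAX);
	s->ids[s->nr] = id;
	s->contents[s->nr] = content;
	s->nr++;
	return WRITE_ADDED;
}
```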

I offhand do not think it is a practical problem that this new
mechanism is incapable of repairing a repository with broken/corrupt
objects [*].  But tying the bland and broad word "repair" to the
feature that is meant to only deal with "missing" but not "corrupt"
form of object store breakage may lead to confusion.

	Side note: others who spent more time on partial/lazy clone
	might realize that broken/corrupt objects are real problems
	they want to tackle, though.  I dunno, and that is why I am
	raising it as a potential issue.

Thanks.

* Re: [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-02-28 17:27     ` Robert Coup
  2022-02-28 18:54       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation Junio C Hamano
@ 2022-02-28 22:20       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-28 22:20 UTC (permalink / raw)
  To: Robert Coup
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee


On Mon, Feb 28 2022, Robert Coup wrote:

> Hi Ævar,
>
> Thanks for taking the time to look into this,
>
> On Mon, 28 Feb 2022 at 16:54, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> I realize this was probably based on feedback on v1 (I didn't go back
>> and re-read it, sorry).
>
> Yes, `fetch --repair` was from Jonathan Tan's v1 feedback[1], where he
> pointed out it could fill in lost objects from any remote in a more
> generally useful fashion.
>
> My goal here is to refetch with a different filter so that I get the
> outcome of a `clone --filter=` without having to chuck my object
> directory. But the actual implementation doesn't need to know anything
> specific about filters, so the original "refilter" name I had isn't
> really right.

*nod*

>> But I feel strongly that we really should name this something other than
>> --repair. I don't care much if it isn't that :) But maybe
>> --expand-filters, --fleshen-partial or something like that?
>
> fleshen-partial sounds like a horror movie scene to me.
>
> 1. `--refetch`
> 2. `--as-clone`
> 3. `--expand-filter` (though TBC you don't necessarily need a filter)
> 4. `--refilter`
> 5. something else

*nod*

>> So first (and partially as an aside): Is a "noop" negotiator really
>> what we want at all? Don't we instead want to be discovering those parts
>> of our history that are closed under reachability (if any) and say we
>> HAVE those things during negotiation?
>
> At an object level we don't have any means of knowing what has or
> hasn't been obtained via fetch to a partial clone with different
> `--filter` args (via config or cli), dynamic fault-ins, or sourced
> from a different remote. Fetch negotiation only occurs for refs and
> their associated commits/histories, but filtering occurs at the blob
> or tree level — so we often HAVE a commit but not all of its
> trees/blobs, whereupon negotiation skips that commit and all its
> associated objects.

Yes, I'm basically asking if the negotiation part wouldn't *ideally* be
doing basically the same "everything is connected" check
receive-pack/fsck do.

I.e. you've got partial data with promisors locally, but if you walk
your branch histor(y|ies) you'll discover that N commits down we have
all the prerequisite objects locally.

As an aside there's a 1=1 mapping between that and what "git bundle
create" will do/verify to create a bundle without listed
prerequisites.

I.e. I think you'll find what it does with revision.c and PREREQ_* and
other flags INTERESTING (a lame pun on its use of UNINTERESTING :).

Presumably the code needed to drive such a negotiation would be useful
for other neat stuff, e.g. having a some-partial repo locally, wanting
to fetch the PACK to complete it from the server, and knowing you have
that data to create a fully connected (or incremental) bundle for that
repository, but I digress.
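The "closed under reachability" idea could be modelled as a single pass over commits in topological order. This is a toy sketch, not git's revision-walk machinery: struct toy_commit and its own_objects_present bit are hypothetical, and a real implementation would have to derive that bit by walking each commit's trees, which is the extra local computation mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARENTS 2

/*
 * Hypothetical toy commit: parent indices (-1 for none) plus whether
 * all of this commit's own trees/blobs are present locally.
 */
struct toy_commit {
	int parents[MAX_PARENTS];
	bool own_objects_present;
};

/*
 * Mark each commit whose entire reachable history, including trees
 * and blobs, is present locally. Assumes every parent index is lower
 * than its child's (a topological order), so one forward pass is
 * enough. Only commits marked closed would be safe to report as
 * "have" when refetching with a wider filter.
 */
static void mark_closed_under_reachability(const struct toy_commit *c,
					   size_t nr, bool *closed)
{
	for (size_t i = 0; i < nr; i++) {
		bool ok = c[i].own_objects_present;

		for (int p = 0; ok && p < MAX_PARENTS; p++) {
			int parent = c[i].parents[p];

			if (parent < 0)
				continue;
			assert((size_t)parent < i); /* topological order */
			ok = closed[parent];
		}
		closed[i] = ok;
	}
}
```

In a four-commit linear history where only the third commit lacks a large blob, the first two commits come out closed and could be advertised as haves; everything from the third commit onward would still be refetched.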

>> But secondly, on the "--repair" name: The reason I mentioned that is
>> that I'd really like us to actually have a "my repo is screwed, please
>> repair it".
>
> Feels like people would look at `fsck` for that over `fetch`? Maybe
> not. Anyway, I get the point about the naming still not being right
> :-)

I think that definitely would be fetch/gc over "fsck". I.e. if you've
got corruption fsck can only tell you that it's screwed.

It's fetch/gc (or "git bundle unbundle") that stand any chance of
actually doing the repair, since we'd need to stitch together the
(partially) corrupted local/remote content with a hopefully good
complement to it.

FWIW I had an ad-hoc implementation of this basically working by
disabling the negotiation + not doing any object existence/collision
checks before writing content to the repository.

That and teaching "repack" to not die and instead to carry on in the
face of object decoding failure (and hopefully discover a "duplicate"
but good copy later) + "gc" is enough to repair most corruption,
e.g. truncated loose object etc.
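The "carry on and prefer a later good duplicate" policy described above can be sketched as a selection loop. Everything here is hypothetical: toy_sum() stands in for recomputing an object's hash, and struct toy_copy for an object found in a loose file or pack. The sketch shows the policy only, not git's repack code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of an object found in some pack or loose file. */
struct toy_copy {
	const char *id;
	const char *bytes;
	unsigned expected_sum;	/* stand-in for "hash must match id" */
};

/* Trivial checksum standing in for recomputing the object hash. */
static unsigned toy_sum(const char *bytes)
{
	unsigned sum = 0;

	while (*bytes)
		sum = sum * 31u + (unsigned char)*bytes++;
	return sum;
}

/*
 * Return the first copy of `id` that verifies, skipping (rather than
 * aborting on) copies that fail, so a good duplicate fetched later can
 * stand in for a truncated or corrupt earlier one. NULL if none verify.
 */
static const struct toy_copy *pick_good_copy(const struct toy_copy *copies,
					     size_t nr, const char *id)
{
	assert(copies != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (strcmp(copies[i].id, id))
			continue;
		if (toy_sum(copies[i].bytes) != copies[i].expected_sum)
			continue;	/* corrupt copy: carry on */
		return &copies[i];
	}
	return NULL;
}
```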

>> But (and I haven't tested, but I'm pretty sure), this patch series isn't
>> going to give you that. The reasons are elaborated on in [1], basically
>> we try really hard to re-use local data, and due to that & the collision
>> detection we will often just hard die early in object walking.
>>
>> But maybe I'm wrong, have you actually tested this with *broken* objects
>> as opposed to just missing ones with repo filters + promisors in play?
>> Our t/*fsck* and t/*corrupt*/ etc. tests have some of those.
>
> Correct: I haven't tested with such objects/broken ODBs. Ideally
> repack/gc/etc would prefer a new-fixed pack over the old-broken
> pack/object but that's not really what I'm aiming to achieve here or
> am interested in.

I think I've only tested loose (bad) + pack (good); I think pack (bad) +
pack (good) probably has some bigger caveats (like the first error
aborting the whole pack read, due to deltas etc.).

But yeah, I'm not saying this should be on your radar at all, other than
the bikeshedding comment that a --repair that doesn't really do
"repair" would be unfortunate naming, especially if we're locked into
behavior orthogonal to that needed for an "actual" repair.

> 1. https://lore.kernel.org/git/20220202185957.1928631-1-jonathantanmy@google.com/


* [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                     ` (8 preceding siblings ...)
  2022-02-28 16:43   ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
@ 2022-03-04 15:04   ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
                       ` (8 more replies)
  9 siblings, 9 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup

If a filter is changed on a partial clone repository, for example from
blob:none to blob:limit=1m, there is currently no straightforward way to
bulk-refetch the objects that match the new filter for existing local
commits. This is because the client will report commits as "have" during
fetch negotiation and any dependent objects won't be included in the
transferred pack. Another use case is discussed at [1].

This patch series introduces a --refetch option to fetch & fetch-pack to
enable doing a full fetch without performing any commit negotiation with the
remote, as a fresh clone does. It builds upon cbe566a071 ("negotiator/noop:
add noop fetch negotiator", 2020-08-18).

 * Using --refetch will produce duplicated objects between the existing and
   newly fetched packs, but maintenance will clean them up when it runs
   automatically post-fetch (if enabled).
 * If a user fetches with --refetch applying a more restrictive partial
   clone filter than previously (eg: blob:limit=1m then blob:limit=1k) the
   eventual state is a no-op, since any referenced object already in the
   local repository is never removed. More advanced repacking which could
   improve this scenario is currently proposed at [2].

[1]
https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2]
https://lore.kernel.org/git/21ED346B-A906-4905-B061-EDE53691C586@gmail.com/

Changes since v2:

 * Changed the name from "repair" to "refetch". While it's conceivable to
   use it in some object DB repair situations that's not the focus of these
   changes.
 * Pass config options to maintenance via GIT_CONFIG_PARAMETERS
 * Split out auto-maintenance to a separate & more robust test
 * Minor fixes/improvements from reviews by Junio & Ævar

Changes since RFC (v1):

 * Changed the name from "refilter" to "repair"
 * Removed dependency between server-side support for filtering and repair
 * Added a test case for a shallow clone
 * Post-fetch auto maintenance now strongly encourages
   repacking/consolidation

Robert Coup (7):
  fetch-negotiator: add specific noop initializer
  fetch-pack: add refetch
  builtin/fetch-pack: add --refetch option
  fetch: add --refetch option
  t5615-partial-clone: add test for fetch --refetch
  fetch: after refetch, encourage auto gc repacking
  doc/partial-clone: mention --refetch fetch option

 Documentation/fetch-options.txt           | 10 +++
 Documentation/git-fetch-pack.txt          |  4 ++
 Documentation/technical/partial-clone.txt |  3 +
 builtin/fetch-pack.c                      |  4 ++
 builtin/fetch.c                           | 34 +++++++++-
 fetch-negotiator.c                        |  5 ++
 fetch-negotiator.h                        |  8 +++
 fetch-pack.c                              | 46 ++++++++-----
 fetch-pack.h                              |  1 +
 remote-curl.c                             |  6 ++
 t/t5616-partial-clone.sh                  | 81 ++++++++++++++++++++++-
 transport-helper.c                        |  3 +
 transport.c                               |  4 ++
 transport.h                               |  4 ++
 14 files changed, 193 insertions(+), 20 deletions(-)


base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1138%2Frcoup%2Frc-partial-clone-refilter-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1138/rcoup/rc-partial-clone-refilter-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1138

Range-diff vs v2:

 1:  d146d8aaaaf ! 1:  96a75be3d8a fetch-negotiator: add specific noop initializor
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch-negotiator: add specific noop initializor
     +    fetch-negotiator: add specific noop initializer
      
     -    Add a specific initializor for the noop fetch negotiator. This is
     +    Add a specific initializer for the noop fetch negotiator. This is
          introduced to support allowing partial clones to skip commit negotiation
     -    when fetching to repair or apply a modified filter.
     +    when performing a "refetch".
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
 2:  2d817a65db5 ! 2:  04ca6a07f85 fetch-pack: add repairing
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch-pack: add repairing
     +    fetch-pack: add refetch
      
     -    Allow a 'repair fetch' where the contents of the local object store are
     +    Allow a "refetch" where the contents of the local object store are
          ignored and a full fetch is performed, not attempting to find or
          negotiate common commits with the remote.
      
     @@ fetch-pack.c: static int find_common(struct fetch_negotiator *negotiator,
      -		if (((o = lookup_object(the_repository, remote)) != NULL) &&
      -				(o->flags & COMPLETE)) {
      -			continue;
     -+		if (!args->repair) {
     ++		if (!args->refetch) {
      +			/*
      +			* If that object is complete (i.e. it is an ancestor of a
      +			* local ref), we tell them we have it but do not have to
     @@ fetch-pack.c: static int find_common(struct fetch_negotiator *negotiator,
       
       		remote_hex = oid_to_hex(remote);
      @@ fetch-pack.c: static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
     + 	int old_save_commit_buffer = save_commit_buffer;
     + 	timestamp_t cutoff = 0;
       
     - 	save_commit_buffer = 0;
     - 
     -+	if (args->repair)
     ++	if (args->refetch)
      +		return;
      +
     + 	save_commit_buffer = 0;
     + 
       	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
     - 	for (ref = *refs; ref; ref = ref->next) {
     - 		struct commit *commit;
      @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     - 	struct fetch_negotiator negotiator_alloc;
       	struct fetch_negotiator *negotiator;
       
     --	negotiator = &negotiator_alloc;
     + 	negotiator = &negotiator_alloc;
      -	fetch_negotiator_init(r, negotiator);
     --
     - 	sort_ref_list(&ref, ref_compare_name);
     - 	QSORT(sought, nr_sought, cmp_ref_by_name);
     - 
     -@@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     - 	if (!server_supports_hash(the_hash_algo->name, NULL))
     - 		die(_("Server does not support this repository's object format"));
     - 
     -+	negotiator = &negotiator_alloc;
     -+	if (args->repair) {
     ++	if (args->refetch) {
      +		fetch_negotiator_init_noop(negotiator);
      +	} else {
      +		fetch_negotiator_init(r, negotiator);
      +	}
     -+
     + 
     + 	sort_ref_list(&ref, ref_compare_name);
     + 	QSORT(sought, nr_sought, cmp_ref_by_name);
     +@@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     + 
       	mark_complete_and_common_ref(negotiator, args, &ref);
       	filter_refs(args, &ref, sought, nr_sought);
      -	if (everything_local(args, &ref)) {
     -+	if (!args->repair && everything_local(args, &ref)) {
     ++	if (!args->refetch && everything_local(args, &ref)) {
       		packet_flush(fd[1]);
       		goto all_done;
       	}
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
       
       	negotiator = &negotiator_alloc;
      -	fetch_negotiator_init(r, negotiator);
     -+	if (args->repair)
     ++	if (args->refetch)
      +		fetch_negotiator_init_noop(negotiator);
      +	else
      +		fetch_negotiator_init(r, negotiator);
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
       			mark_complete_and_common_ref(negotiator, args, &ref);
       			filter_refs(args, &ref, sought, nr_sought);
      -			if (everything_local(args, &ref))
     -+			if (!args->repair && everything_local(args, &ref))
     ++			if (!args->refetch && everything_local(args, &ref))
       				state = FETCH_DONE;
       			else
       				state = FETCH_SEND_REQUEST;
     @@ fetch-pack.h: struct fetch_pack_args {
       	unsigned update_shallow:1;
       	unsigned reject_shallow_remote:1;
       	unsigned deepen:1;
     -+	unsigned repair:1;
     ++	unsigned refetch:1;
       
       	/*
       	 * Indicate that the remote of this request is a promisor remote. The
 3:  a42d40ac294 ! 3:  879d30c4473 builtin/fetch-pack: add --repair option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    builtin/fetch-pack: add --repair option
     +    builtin/fetch-pack: add --refetch option
      
     -    Add a --repair option to fetch-pack to force a full fetch. Use when
     +    Add a refetch option to fetch-pack to force a full fetch. Use when
          applying a new partial clone filter to refetch all matching objects.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
     @@ Documentation/git-fetch-pack.txt: be in a separate packet, and the list must end
       	current shallow boundary instead of from the tip of each
       	remote branch history.
       
     -+--repair::
     ++--refetch::
      +	Skips negotiating commits with the server in order to fetch all matching
      +	objects. Use to reapply a new partial clone blob/tree filter.
      +
     @@ builtin/fetch-pack.c: int cmd_fetch_pack(int argc, const char **argv, const char
       			args.from_promisor = 1;
       			continue;
       		}
     -+		if (!strcmp("--repair", arg)) {
     -+			args.repair = 1;
     ++		if (!strcmp("--refetch", arg)) {
     ++			args.refetch = 1;
      +			continue;
      +		}
       		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
     @@ remote-curl.c: struct options {
       		/* see documentation of corresponding flag in fetch-pack.h */
       		from_promisor : 1,
       
     -+		repair : 1,
     ++		refetch : 1,
       		atomic : 1,
       		object_format : 1,
       		force_if_includes : 1;
     @@ remote-curl.c: static int set_option(const char *name, const char *value)
       	} else if (!strcmp(name, "from-promisor")) {
       		options.from_promisor = 1;
       		return 0;
     -+	} else if (!strcmp(name, "repair")) {
     -+		options.repair = 1;
     ++	} else if (!strcmp(name, "refetch")) {
     ++		options.refetch = 1;
      +		return 0;
       	} else if (!strcmp(name, "filter")) {
       		options.filter = xstrdup(value);
     @@ remote-curl.c: static int fetch_git(struct discovery *heads,
       		strvec_push(&args, "--deepen-relative");
       	if (options.from_promisor)
       		strvec_push(&args, "--from-promisor");
     -+	if (options.repair)
     -+		strvec_push(&args, "--repair");
     ++	if (options.refetch)
     ++		strvec_push(&args, "--refetch");
       	if (options.filter)
       		strvec_pushf(&args, "--filter=%s", options.filter);
       	strvec_push(&args, url.buf);
 4:  79c409d0542 ! 4:  a503b98f333 fetch: add --repair option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch: add --repair option
     +    fetch: add --refetch option
      
     -    Teach fetch and transports the --repair option to force a full fetch
     +    Teach fetch and transports the --refetch option to force a full fetch
          without negotiating common commits with the remote. Use when applying a
          new partial clone filter to refetch all matching objects.
      
     @@ Documentation/fetch-options.txt: endif::git-pull[]
       	setting. See linkgit:git-config[1].
       
      +ifndef::git-pull[]
     -+--repair::
     ++--refetch::
      +	Instead of negotiating with the server to avoid transferring commits and
      +	associated objects that are already present locally, this option fetches
      +	all objects as a fresh clone would. Use this to reapply a partial clone
     @@ builtin/fetch.c: static int prune_tags = -1; /* unspecified */
       static int all, append, dry_run, force, keep, multiple, update_head_ok;
       static int write_fetch_head = 1;
      -static int verbosity, deepen_relative, set_upstream;
     -+static int verbosity, deepen_relative, set_upstream, repair;
     ++static int verbosity, deepen_relative, set_upstream, refetch;
       static int progress = -1;
       static int enable_auto_gc = 1;
       static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
     @@ builtin/fetch.c: static struct option builtin_fetch_options[] = {
       	OPT_SET_INT_F(0, "unshallow", &unshallow,
       		      N_("convert to a complete repository"),
       		      1, PARSE_OPT_NONEG),
     -+	OPT_SET_INT_F(0, "repair", &repair,
     ++	OPT_SET_INT_F(0, "refetch", &refetch,
      +		      N_("re-fetch without negotiating common commits"),
      +		      1, PARSE_OPT_NONEG),
       	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
     @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
       		return -1;
       
      +	/*
     -+	 * Similarly, if we need to repair, we always want to perform a full
     ++	 * Similarly, if we need to refetch, we always want to perform a full
      +	 * fetch ignoring existing objects.
      +	 */
     -+	if (repair)
     ++	if (refetch)
      +		return -1;
      +
      +
     @@ builtin/fetch.c: static struct transport *prepare_transport(struct remote *remot
       		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
       	if (update_shallow)
       		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
     -+	if (repair)
     -+		set_option(transport, TRANS_OPT_REPAIR, "yes");
     ++	if (refetch)
     ++		set_option(transport, TRANS_OPT_REFETCH, "yes");
       	if (filter_options.choice) {
       		const char *spec =
       			expand_list_objects_filter_spec(&filter_options);
     @@ transport-helper.c: static int fetch_refs(struct transport *transport,
       	if (data->transport_options.update_shallow)
       		set_helper_option(transport, "update-shallow", "true");
       
     -+	if (data->transport_options.repair)
     -+		set_helper_option(transport, "repair", "true");
     ++	if (data->transport_options.refetch)
     ++		set_helper_option(transport, "refetch", "true");
      +
       	if (data->transport_options.filter_options.choice) {
       		const char *spec = expand_list_objects_filter_spec(
     @@ transport.c: static int set_git_option(struct git_transport_options *opts,
       		list_objects_filter_die_if_populated(&opts->filter_options);
       		parse_list_objects_filter(&opts->filter_options, value);
       		return 0;
     -+	} else if (!strcmp(name, TRANS_OPT_REPAIR)) {
     -+		opts->repair = !!value;
     ++	} else if (!strcmp(name, TRANS_OPT_REFETCH)) {
     ++		opts->refetch = !!value;
      +		return 0;
       	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
       		opts->reject_shallow = !!value;
     @@ transport.c: static int fetch_refs_via_pack(struct transport *transport,
       	args.update_shallow = data->options.update_shallow;
       	args.from_promisor = data->options.from_promisor;
       	args.filter_options = data->options.filter_options;
     -+	args.repair = data->options.repair;
     ++	args.refetch = data->options.refetch;
       	args.stateless_rpc = transport->stateless_rpc;
       	args.server_options = transport->server_options;
       	args.negotiation_tips = data->options.negotiation_tips;
     @@ transport.h: struct git_transport_options {
       	unsigned update_shallow : 1;
       	unsigned reject_shallow : 1;
       	unsigned deepen_relative : 1;
     -+	unsigned repair : 1;
     ++	unsigned refetch : 1;
       
       	/* see documentation of corresponding flag in fetch-pack.h */
       	unsigned from_promisor : 1;
     @@ transport.h: void transport_check_allowed(const char *type);
       #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
       
      +/* Refetch all objects without negotiating */
     -+#define TRANS_OPT_REPAIR "repair"
     ++#define TRANS_OPT_REFETCH "refetch"
      +
       /* Request atomic (all-or-nothing) updates when pushing */
       #define TRANS_OPT_ATOMIC "atomic"
 5:  38af2bbee79 ! 5:  01f22e784a5 t5615-partial-clone: add test for fetch --repair
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    t5615-partial-clone: add test for fetch --repair
     +    t5615-partial-clone: add test for fetch --refetch
      
     -    Add a test for doing a repair fetch to apply a changed partial clone
     -    filter under protocol v0 and v2.
     +    Add a test for doing a refetch to apply a changed partial clone filter
     +    under protocol v0 and v2.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ t/t5616-partial-clone.sh: test_expect_success 'manual prefetch of missing object
      +	git -C src push -u srv main
      +'
      +
     -+# Do partial fetch to fetch smaller files; then verify that without --repair
     ++# Do partial fetch to fetch smaller files; then verify that without --refetch
      +# applying a new filter does not refetch missing large objects. Then use
     -+# --repair to apply the new filter on existing commits. Test it under both
     ++# --refetch to apply the new filter on existing commits. Test it under both
      +# protocol v2 & v0.
     -+test_expect_success 'apply a different filter using --repair' '
     ++test_expect_success 'apply a different filter using --refetch' '
      +	git -C pc1 fetch --filter=blob:limit=999 origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 4 observed &&
      +
     -+	git -C pc1 fetch --filter=blob:limit=19999 --repair origin &&
     ++	git -C pc1 fetch --filter=blob:limit=19999 --refetch origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 2 observed &&
      +
      +	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
     -+		--repair origin &&
     ++		--refetch origin &&
      +	git -C pc1 rev-list --quiet --objects --missing=print \
      +		main..origin/main >observed &&
      +	test_line_count = 0 observed
      +'
      +
     -+test_expect_success 'fetch --repair works with a shallow clone' '
     ++test_expect_success 'fetch --refetch works with a shallow clone' '
      +	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
      +	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
      +	test_line_count = 6 observed &&
      +
     -+	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --repair origin &&
     ++	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --refetch origin &&
      +	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
      +	test_line_count = 6 observed
      +'
 6:  cfa6dca8ef4 < -:  ----------- maintenance: add ability to pass config options
 7:  2338c15249a ! 6:  31046625987 fetch: after repair, encourage auto gc repacking
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    fetch: after repair, encourage auto gc repacking
     +    fetch: after refetch, encourage auto gc repacking
      
     -    After invoking `fetch --repair`, the object db will likely contain many
     +    After invoking `fetch --refetch`, the object db will likely contain many
          duplicate objects. If auto-maintenance is enabled, invoke it with
          appropriate settings to encourage repacking/consolidation.
      
     @@ Documentation/fetch-options.txt: ifndef::git-pull[]
       --refmap=<refspec>::
      
       ## builtin/fetch.c ##
     -@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
     - 	struct remote *remote = NULL;
     - 	int result = 0;
     - 	int prune_tags_ok = 1;
     -+	struct strvec auto_maint_opts = STRVEC_INIT;
     -+	int opt_val;
     - 
     - 	packet_trace_identity("fetch");
     - 
      @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       					     NULL);
       	}
       
      -	if (enable_auto_gc)
     --		run_auto_maintenance(verbosity < 0, NULL);
      +	if (enable_auto_gc) {
     -+		if (repair) {
     ++		if (refetch) {
      +			/*
      +			 * Hint auto-maintenance strongly to encourage repacking,
      +			 * but respect config settings disabling it.
      +			 */
     ++			int opt_val;
      +			if (git_config_get_int("gc.autopacklimit", &opt_val))
      +				opt_val = -1;
      +			if (opt_val != 0)
     -+				strvec_push(&auto_maint_opts, "gc.autoPackLimit=1");
     ++				git_config_push_parameter("gc.autoPackLimit=1");
      +
      +			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
      +				opt_val = -1;
      +			if (opt_val != 0)
     -+				strvec_push(&auto_maint_opts, "maintenance.incremental-repack.auto=-1");
     ++				git_config_push_parameter("maintenance.incremental-repack.auto=-1");
      +		}
     -+		run_auto_maintenance(verbosity < 0, &auto_maint_opts);
     + 		run_auto_maintenance(verbosity < 0);
      +	}
       
        cleanup:
       	string_list_clear(&list, 0);
     -+	strvec_clear(&auto_maint_opts);
     - 	return result;
     - }
      
       ## t/t5616-partial-clone.sh ##
     -@@ t/t5616-partial-clone.sh: test_expect_success 'push new commits to server for file.4.txt' '
     - # Do partial fetch to fetch smaller files; then verify that without --repair
     - # applying a new filter does not refetch missing large objects. Then use
     - # --repair to apply the new filter on existing commits. Test it under both
     --# protocol v2 & v0.
     -+# protocol v2 & v0. Check repacking auto-maintenance is kicked off.
     - test_expect_success 'apply a different filter using --repair' '
     - 	git -C pc1 fetch --filter=blob:limit=999 origin &&
     - 	git -C pc1 rev-list --quiet --objects --missing=print \
     -@@ t/t5616-partial-clone.sh: test_expect_success 'apply a different filter using --repair' '
     - 		main..origin/main >observed &&
     - 	test_line_count = 2 observed &&
     - 
     -+	GIT_TRACE2_EVENT="$(pwd)/trace.log" \
     - 	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
     - 		--repair origin &&
     - 	git -C pc1 rev-list --quiet --objects --missing=print \
     - 		main..origin/main >observed &&
     --	test_line_count = 0 observed
     -+	test_line_count = 0 observed &&
     -+	test_subcommand git -c gc.autoPackLimit=1 -c maintenance.incremental-repack.auto=-1 maintenance run --auto --no-quiet <trace.log
     +@@ t/t5616-partial-clone.sh: test_expect_success 'fetch --refetch works with a shallow clone' '
     + 	test_line_count = 6 observed
       '
       
     - test_expect_success 'fetch --repair works with a shallow clone' '
     ++test_expect_success 'fetch --refetch triggers repacking' '
     ++	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
     ++	export GIT_TRACE2_CONFIG_PARAMS &&
     ++
     ++	GIT_TRACE2_EVENT="$PWD/trace1.event" \
     ++	git -C pc1 fetch --refetch origin &&
     ++	test_subcommand git maintenance run --auto --no-quiet <trace1.event &&
     ++	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event &&
     ++	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event &&
     ++
     ++	GIT_TRACE2_EVENT="$PWD/trace2.event" \
     ++	git -c protocol.version=0 \
     ++		-c gc.autoPackLimit=0 \
     ++		-c maintenance.incremental-repack.auto=1234 \
     ++		-C pc1 fetch --refetch origin &&
     ++	test_subcommand git maintenance run --auto --no-quiet <trace2.event &&
     ++	grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event &&
     ++	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event &&
     ++
     ++	GIT_TRACE2_EVENT="$PWD/trace3.event" \
     ++	git -c protocol.version=0 \
     ++		-c gc.autoPackLimit=1234 \
     ++		-c maintenance.incremental-repack.auto=0 \
     ++		-C pc1 fetch --refetch origin &&
     ++	test_subcommand git maintenance run --auto --no-quiet <trace3.event &&
     ++	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event &&
     ++	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event
     ++'
     ++
     + test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
     + 	test_create_repo submodule &&
     + 	test_commit -C submodule mycommit &&
 8:  20942562a66 ! 7:  f923a06aab5 doc/partial-clone: mention --repair fetch option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    doc/partial-clone: mention --repair fetch option
     +    doc/partial-clone: mention --refetch fetch option
      
     -    Document it for partial clones as a means to refetch with a new filter.
     +    Document it for partial clones as a means to apply a new filter.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     @@ Documentation/technical/partial-clone.txt: Fetching Missing Objects
         currently fetches all objects referred to by the requested objects, even
         though they are not necessary.
       
     -+- Fetching with `--repair` will request a complete new filtered packfile from
     ++- Fetching with `--refetch` will request a complete new filtered packfile from
      +  the remote, which can be used to change a filter without needing to
      +  dynamically fetch missing objects.
       

-- 
gitgitgadget
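
The t5616 test in the range-diff above widens the filter from blob:limit=999 to blob:limit=19999 and then blob:limit=29999, and expects fewer missing objects after each refetch. The sketch below models only the filter rule. It assumes the documented `--filter=blob:limit=<n>[kmg]` semantics: blobs of at least <n> bytes are omitted, and the k/m/g suffixes scale by 1024. `parse_blob_limit()` and `blob_is_omitted()` are hypothetical helpers, not git's filter parser, and overflow checks are left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "blob:limit=<n>[kmg]" into a byte count (hypothetical helper).
 * Returns false on syntax errors; suffixes are assumed to be powers of 1024.
 */
static bool parse_blob_limit(const char *spec, uint64_t *out)
{
	static const char prefix[] = "blob:limit=";
	char *end;
	unsigned long long n;

	if (strncmp(spec, prefix, sizeof(prefix) - 1) != 0)
		return false;
	spec += sizeof(prefix) - 1;
	if (!isdigit((unsigned char)*spec))
		return false; /* rejects an empty value and negative numbers */
	n = strtoull(spec, &end, 10);
	switch (tolower((unsigned char)*end)) {
	case 'k': n *= 1024ULL; end++; break;
	case 'm': n *= 1024ULL * 1024; end++; break;
	case 'g': n *= 1024ULL * 1024 * 1024; end++; break;
	default: break;
	}
	if (*end != '\0')
		return false; /* trailing garbage such as "12x" */
	*out = n;
	return true;
}

/* A blob is left out of the pack once its size reaches the limit. */
static bool blob_is_omitted(uint64_t size, uint64_t limit)
{
	return size >= limit;
}
```

Under this rule a hypothetical 15000-byte blob is omitted by blob:limit=999 but kept by blob:limit=19999. Without --refetch, such a blob reachable from commits that are already present locally is not requested again, because those commits are reported as "have" during negotiation.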

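In patch 6 of the range-diff, git_config_push_parameter() replaces the strvec of options that was previously passed to run_auto_maintenance(). The decision about which settings to override is unchanged: a key that is unset, or set to any non-zero value, is overridden, while an explicit 0 is respected. The following is a minimal sketch of just that decision. It assumes an unset key is represented by a `has_value` flag, and the struct and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical model of an int config lookup: has_value == 0 means unset. */
struct toy_config_int {
	int has_value;
	int value;
};

/* Same test as the patch: unset is treated as -1, and anything but 0 is overridden. */
static int toy_should_override(struct toy_config_int c)
{
	int opt_val = c.has_value ? c.value : -1;
	return opt_val != 0;
}

/*
 * Collect the "key=value" parameters that would be pushed before running
 * auto-maintenance after a refetch. Returns how many were written (0..2).
 */
static int toy_refetch_overrides(struct toy_config_int auto_pack_limit,
				 struct toy_config_int incremental_repack_auto,
				 const char *out[2])
{
	int n = 0;

	if (toy_should_override(auto_pack_limit))
		out[n++] = "gc.autoPackLimit=1";
	if (toy_should_override(incremental_repack_auto))
		out[n++] = "maintenance.incremental-repack.auto=-1";
	return n;
}

/* Convenience lookup for checking the collected parameters. */
static int toy_has_override(const char *out[], int n, const char *param)
{
	for (int i = 0; i < n; i++)
		if (!strcmp(out[i], param))
			return 1;
	return 0;
}
```

The three fetches in the new 'fetch --refetch triggers repacking' test line up with these branches: no overriding config, gc.autoPackLimit=0 with maintenance.incremental-repack.auto=1234, and gc.autoPackLimit=1234 with maintenance.incremental-repack.auto=0.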

* [PATCH v3 1/7] fetch-negotiator: add specific noop initializer
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a specific initializer for the noop fetch negotiator. This is
introduced so that partial clones can skip commit negotiation when
performing a "refetch".

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-negotiator.c | 5 +++++
 fetch-negotiator.h | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/fetch-negotiator.c b/fetch-negotiator.c
index 874797d767b..be383367f55 100644
--- a/fetch-negotiator.c
+++ b/fetch-negotiator.c
@@ -23,3 +23,8 @@ void fetch_negotiator_init(struct repository *r,
 		return;
 	}
 }
+
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
+{
+	noop_negotiator_init(negotiator);
+}
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
index ea78868504b..e348905a1f0 100644
--- a/fetch-negotiator.h
+++ b/fetch-negotiator.h
@@ -53,7 +53,15 @@ struct fetch_negotiator {
 	void *data;
 };
 
+/*
+ * Initialize a negotiator based on the repository settings.
+ */
 void fetch_negotiator_init(struct repository *r,
 			   struct fetch_negotiator *negotiator);
 
+/*
+ * Initialize a noop negotiator.
+ */
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator);
+
 #endif
-- 
gitgitgadget
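
The following sketch shows why a noop negotiator is enough to force a full fetch. The fetch code asks the negotiator for "have" candidates through its next() callback, and the sketch assumes, as in negotiator/noop.c, that the noop version returns NULL immediately, so no "have" lines are offered. `struct toy_negotiator` is hypothetical: its callback names follow fetch-negotiator.h, but its argument types are reduced to strings. The "list" negotiator is a stand-in that is far simpler than git's default and skipping negotiators.

```c
#include <stddef.h>

/* Simplified, hypothetical mirror of struct fetch_negotiator. */
struct toy_negotiator {
	void (*known_common)(struct toy_negotiator *n, const char *commit);
	void (*add_tip)(struct toy_negotiator *n, const char *commit);
	const char *(*next)(struct toy_negotiator *n);
	int (*ack)(struct toy_negotiator *n, const char *commit);
	void (*release)(struct toy_negotiator *n);
	void *data;
};

/* Noop callbacks: ignore every hint and never propose a "have". */
static void noop_known_common(struct toy_negotiator *n, const char *c) { (void)n; (void)c; }
static void noop_add_tip(struct toy_negotiator *n, const char *c) { (void)n; (void)c; }
static const char *noop_next(struct toy_negotiator *n) { (void)n; return NULL; }
static int noop_ack(struct toy_negotiator *n, const char *c) { (void)n; (void)c; return 0; }
static void noop_release(struct toy_negotiator *n) { (void)n; }

static void toy_negotiator_init_noop(struct toy_negotiator *n)
{
	n->known_common = noop_known_common;
	n->add_tip = noop_add_tip;
	n->next = noop_next;
	n->ack = noop_ack;
	n->release = noop_release;
	n->data = NULL;
}

/* Stand-in "default" negotiator that offers each local tip once. */
struct tip_list {
	const char *tips[8];
	size_t nr, pos;
};

static void list_add_tip(struct toy_negotiator *n, const char *c)
{
	struct tip_list *l = n->data;
	if (l->nr < 8)
		l->tips[l->nr++] = c;
}

static const char *list_next(struct toy_negotiator *n)
{
	struct tip_list *l = n->data;
	return l->pos < l->nr ? l->tips[l->pos++] : NULL;
}

static void toy_negotiator_init_list(struct toy_negotiator *n, struct tip_list *storage)
{
	toy_negotiator_init_noop(n); /* reuse the noop callbacks for the rest */
	storage->nr = storage->pos = 0;
	n->add_tip = list_add_tip;
	n->next = list_next;
	n->data = storage;
}

/* Drive a negotiator like a fetch would: feed local tips, count "have" lines. */
static size_t count_haves(struct toy_negotiator *n, const char *tips[], size_t nr)
{
	size_t haves = 0;
	for (size_t i = 0; i < nr; i++)
		n->add_tip(n, tips[i]);
	while (n->next(n))
		haves++;
	n->release(n);
	return haves;
}
```

With the noop negotiator the server hears about no local commits, so it has to assume the client has none and sends everything matching the filter.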



* [PATCH v3 2/7] fetch-pack: add refetch
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Allow a "refetch", in which the contents of the local object store are
ignored and a full fetch is performed without attempting to find or
negotiate common commits with the remote.

A key use case is to apply a new partial clone blob/tree filter and
refetch all the associated matching content, which would otherwise not
be transferred when the commit objects are already present locally.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-pack.c | 46 +++++++++++++++++++++++++++++-----------------
 fetch-pack.h |  1 +
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 87657907e78..4e1e88eea09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
 		const char *remote_hex;
 		struct object *o;
 
-		/*
-		 * If that object is complete (i.e. it is an ancestor of a
-		 * local ref), we tell them we have it but do not have to
-		 * tell them about its ancestors, which they already know
-		 * about.
-		 *
-		 * We use lookup_object here because we are only
-		 * interested in the case we *know* the object is
-		 * reachable and we have already scanned it.
-		 */
-		if (((o = lookup_object(the_repository, remote)) != NULL) &&
-				(o->flags & COMPLETE)) {
-			continue;
+		if (!args->refetch) {
+			/*
+			 * If that object is complete (i.e. it is an ancestor of a
+			 * local ref), we tell them we have it but do not have to
+			 * tell them about its ancestors, which they already know
+			 * about.
+			 *
+			 * We use lookup_object here because we are only
+			 * interested in the case we *know* the object is
+			 * reachable and we have already scanned it.
+			 */
+			if (((o = lookup_object(the_repository, remote)) != NULL) &&
+					(o->flags & COMPLETE)) {
+				continue;
+			}
 		}
 
 		remote_hex = oid_to_hex(remote);
@@ -692,6 +694,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
 
+	if (args->refetch)
+		return;
+
 	save_commit_buffer = 0;
 
 	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
@@ -1028,7 +1033,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	struct fetch_negotiator *negotiator;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->refetch) {
+		fetch_negotiator_init_noop(negotiator);
+	} else {
+		fetch_negotiator_init(r, negotiator);
+	}
 
 	sort_ref_list(&ref, ref_compare_name);
 	QSORT(sought, nr_sought, cmp_ref_by_name);
@@ -1121,7 +1130,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	mark_complete_and_common_ref(negotiator, args, &ref);
 	filter_refs(args, &ref, sought, nr_sought);
-	if (everything_local(args, &ref)) {
+	if (!args->refetch && everything_local(args, &ref)) {
 		packet_flush(fd[1]);
 		goto all_done;
 	}
@@ -1587,7 +1596,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->refetch)
+		fetch_negotiator_init_noop(negotiator);
+	else
+		fetch_negotiator_init(r, negotiator);
 
 	packet_reader_init(&reader, fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -1613,7 +1625,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Filter 'ref' by 'sought' and those that aren't local */
 			mark_complete_and_common_ref(negotiator, args, &ref);
 			filter_refs(args, &ref, sought, nr_sought);
-			if (everything_local(args, &ref))
+			if (!args->refetch && everything_local(args, &ref))
 				state = FETCH_DONE;
 			else
 				state = FETCH_SEND_REQUEST;
diff --git a/fetch-pack.h b/fetch-pack.h
index 7f94a2a5831..8c7752fc821 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -42,6 +42,7 @@ struct fetch_pack_args {
 	unsigned update_shallow:1;
 	unsigned reject_shallow_remote:1;
 	unsigned deepen:1;
+	unsigned refetch:1;
 
 	/*
 	 * Indicate that the remote of this request is a promisor remote. The
-- 
gitgitgadget
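
The patch above changes two decisions. The first is which advertised refs become "want" lines in find_common(). The second is whether the v2 state machine can stop at FETCH_DONE. Below is a simplified sketch of both. It uses a hypothetical `struct toy_ref` whose `complete` flag stands in for the COMPLETE object flag, and it approximates everything_local() as "every ref is already complete locally".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of an advertised ref. */
struct toy_ref {
	const char *name;
	bool complete; /* tip already reachable from a local ref (COMPLETE) */
};

/*
 * find_common()-style selection: without refetch, refs whose tips are
 * already complete locally are skipped; with refetch, every ref is wanted.
 */
static size_t count_wants(const struct toy_ref *refs, size_t nr, bool refetch)
{
	size_t wants = 0;
	for (size_t i = 0; i < nr; i++) {
		if (!refetch && refs[i].complete)
			continue;
		wants++;
	}
	return wants;
}

enum toy_state { TOY_FETCH_DONE, TOY_FETCH_SEND_REQUEST };

/* Mirrors "if (!args->refetch && everything_local(args, &ref))". */
static enum toy_state next_state(const struct toy_ref *refs, size_t nr, bool refetch)
{
	if (!refetch && count_wants(refs, nr, false) == 0)
		return TOY_FETCH_DONE;
	return TOY_FETCH_SEND_REQUEST;
}
```

When every ref is already present, an ordinary fetch stops early, while a refetch still sends a request. Combined with the noop negotiator, this makes the server send a complete pack for the current filter.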



* [PATCH v3 3/7] builtin/fetch-pack: add --refetch option
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 4/7] fetch: " Robert Coup via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a refetch option to fetch-pack to force a full fetch. Use when
applying a new partial clone filter to refetch all matching objects.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/git-fetch-pack.txt | 4 ++++
 builtin/fetch-pack.c             | 4 ++++
 remote-curl.c                    | 6 ++++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index c9758847937..46747d5f429 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -101,6 +101,10 @@ be in a separate packet, and the list must end with a flush packet.
 	current shallow boundary instead of from the tip of each
 	remote branch history.
 
+--refetch::
+	Skips negotiating commits with the server in order to fetch all matching
+	objects. Use to reapply a new partial clone blob/tree filter.
+
 --no-progress::
 	Do not show the progress.
 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c2d96f4c89a..1f8aec97d47 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.from_promisor = 1;
 			continue;
 		}
+		if (!strcmp("--refetch", arg)) {
+			args.refetch = 1;
+			continue;
+		}
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
diff --git a/remote-curl.c b/remote-curl.c
index 0dabef2dd7c..fc75600d4c6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -43,6 +43,7 @@ struct options {
 		/* see documentation of corresponding flag in fetch-pack.h */
 		from_promisor : 1,
 
+		refetch : 1,
 		atomic : 1,
 		object_format : 1,
 		force_if_includes : 1;
@@ -198,6 +199,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "from-promisor")) {
 		options.from_promisor = 1;
 		return 0;
+	} else if (!strcmp(name, "refetch")) {
+		options.refetch = 1;
+		return 0;
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
@@ -1182,6 +1186,8 @@ static int fetch_git(struct discovery *heads,
 		strvec_push(&args, "--deepen-relative");
 	if (options.from_promisor)
 		strvec_push(&args, "--from-promisor");
+	if (options.refetch)
+		strvec_push(&args, "--refetch");
 	if (options.filter)
 		strvec_pushf(&args, "--filter=%s", options.filter);
 	strvec_push(&args, url.buf);
-- 
gitgitgadget
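
In this patch remote-curl only relays the option: set_option() records it, and fetch_git() turns it back into a fetch-pack argument. The sketch below shows that round trip. It uses a hypothetical `struct toy_options` and a fixed-size argv array in place of git's strvec. Returning 1 for an unknown option name is this sketch's assumed convention, and the argument order follows the fetch_git() hunk above: --from-promisor, --refetch, --filter=..., then the URL.

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ARGS 8

/* Hypothetical subset of remote-curl's struct options. */
struct toy_options {
	unsigned from_promisor : 1;
	unsigned refetch : 1;
	const char *filter;
};

/* Returns 0 when the option is handled, 1 when it is unsupported. */
static int toy_set_option(struct toy_options *o, const char *name, const char *value)
{
	if (!strcmp(name, "from-promisor")) {
		o->from_promisor = 1;
		return 0;
	} else if (!strcmp(name, "refetch")) {
		o->refetch = 1;
		return 0;
	} else if (!strcmp(name, "filter")) {
		o->filter = value;
		return 0;
	}
	return 1;
}

/*
 * Build a NULL-terminated fetch-pack argv. filter_buf receives the
 * formatted --filter= argument. Returns the number of arguments.
 */
static size_t toy_fetch_pack_argv(const struct toy_options *o, const char *url,
				  char *filter_buf, size_t filter_len,
				  const char *argv[TOY_MAX_ARGS])
{
	size_t n = 0;

	argv[n++] = "fetch-pack";
	if (o->from_promisor)
		argv[n++] = "--from-promisor";
	if (o->refetch)
		argv[n++] = "--refetch";
	if (o->filter) {
		snprintf(filter_buf, filter_len, "--filter=%s", o->filter);
		argv[n++] = filter_buf;
	}
	argv[n++] = url;
	argv[n] = NULL;
	return n;
}
```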



* [PATCH v3 4/7] fetch: add --refetch option
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-03-04 15:04     ` [PATCH v3 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 21:19       ` Junio C Hamano
  2022-03-04 15:04     ` [PATCH v3 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Teach fetch and transports the --refetch option to force a full fetch
without negotiating common commits with the remote. Use when applying a
new partial clone filter to refetch all matching objects.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  9 +++++++++
 builtin/fetch.c                 | 15 ++++++++++++++-
 transport-helper.c              |  3 +++
 transport.c                     |  4 ++++
 transport.h                     |  4 ++++
 5 files changed, 34 insertions(+), 1 deletion(-)
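
Two small behaviours in this patch can be modelled in isolation. First, set_git_option() stores the flag as `!!value`, so any non-NULL value such as "yes" enables it. Second, check_exist_and_connected() returns -1 when refetching, so the "everything is already here" shortcut is never taken. The sketch below uses hypothetical names. `objects_connected` stands in for the connectivity check git would otherwise run, and returning 1 for an unknown option is an assumption of the sketch.

```c
#include <string.h>

#define TOY_TRANS_OPT_REFETCH "refetch"

/* Hypothetical subset of struct git_transport_options. */
struct toy_transport_options {
	unsigned refetch : 1;
};

/* set_git_option()-style: a non-NULL value enables the flag, NULL clears it. */
static int toy_set_git_option(struct toy_transport_options *opts,
			      const char *name, const char *value)
{
	if (!strcmp(name, TOY_TRANS_OPT_REFETCH)) {
		opts->refetch = !!value;
		return 0;
	}
	return 1; /* unsupported in this sketch */
}

/* check_exist_and_connected()-style shortcut: -1 forces a real fetch. */
static int toy_check_exist_and_connected(int deepen, int refetch, int objects_connected)
{
	if (deepen)
		return -1;
	if (refetch)
		return -1;
	return objects_connected ? 0 : -1;
}
```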

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index f9036831898..21a247abfa4 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -163,6 +163,15 @@ endif::git-pull[]
 	behavior for a remote may be specified with the remote.<name>.tagOpt
 	setting. See linkgit:git-config[1].
 
+ifndef::git-pull[]
+--refetch::
+	Instead of negotiating with the server to avoid transferring commits and
+	associated objects that are already present locally, this option fetches
+	all objects as a fresh clone would. Use this to reapply a partial clone
+	filter from configuration or using `--filter=` when the filter
+	definition has changed.
+endif::git-pull[]
+
 --refmap=<refspec>::
 	When fetching refs listed on the command line, use the
 	specified refspec (can be given more than once) to map the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95832ba1dfd..f7bcf6fa64d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -59,7 +59,7 @@ static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok;
 static int write_fetch_head = 1;
-static int verbosity, deepen_relative, set_upstream;
+static int verbosity, deepen_relative, set_upstream, refetch;
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
@@ -190,6 +190,9 @@ static struct option builtin_fetch_options[] = {
 	OPT_SET_INT_F(0, "unshallow", &unshallow,
 		      N_("convert to a complete repository"),
 		      1, PARSE_OPT_NONEG),
+	OPT_SET_INT_F(0, "refetch", &refetch,
+		      N_("re-fetch without negotiating common commits"),
+		      1, PARSE_OPT_NONEG),
 	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
 		   N_("prepend this to submodule path output"), PARSE_OPT_HIDDEN },
 	OPT_CALLBACK_F(0, "recurse-submodules-default",
@@ -1296,6 +1299,14 @@ static int check_exist_and_connected(struct ref *ref_map)
 	if (deepen)
 		return -1;
 
+	/*
+	 * Similarly, if we need to refetch, we always want to perform a full
+	 * fetch ignoring existing objects.
+	 */
+	if (refetch)
+		return -1;
+
+
 	/*
 	 * check_connected() allows objects to merely be promised, but
 	 * we need all direct targets to exist.
@@ -1492,6 +1503,8 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (refetch)
+		set_option(transport, TRANS_OPT_REFETCH, "yes");
 	if (filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&filter_options);
diff --git a/transport-helper.c b/transport-helper.c
index a0297b0986c..b4dbbabb0c2 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -715,6 +715,9 @@ static int fetch_refs(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.refetch)
+		set_helper_option(transport, "refetch", "true");
+
 	if (data->transport_options.filter_options.choice) {
 		const char *spec = expand_list_objects_filter_spec(
 			&data->transport_options.filter_options);
diff --git a/transport.c b/transport.c
index 253d6671b1f..e2817b7a715 100644
--- a/transport.c
+++ b/transport.c
@@ -243,6 +243,9 @@ static int set_git_option(struct git_transport_options *opts,
 		list_objects_filter_die_if_populated(&opts->filter_options);
 		parse_list_objects_filter(&opts->filter_options, value);
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_REFETCH)) {
+		opts->refetch = !!value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
 		opts->reject_shallow = !!value;
 		return 0;
@@ -377,6 +380,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.filter_options = data->options.filter_options;
+	args.refetch = data->options.refetch;
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
diff --git a/transport.h b/transport.h
index a0bc6a1e9eb..12bc08fc339 100644
--- a/transport.h
+++ b/transport.h
@@ -16,6 +16,7 @@ struct git_transport_options {
 	unsigned update_shallow : 1;
 	unsigned reject_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned refetch : 1;
 
 	/* see documentation of corresponding flag in fetch-pack.h */
 	unsigned from_promisor : 1;
@@ -216,6 +217,9 @@ void transport_check_allowed(const char *type);
 /* Filter objects for partial clone and fetch */
 #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
 
+/* Refetch all objects without negotiating */
+#define TRANS_OPT_REFETCH "refetch"
+
 /* Request atomic (all-or-nothing) updates when pushing */
 #define TRANS_OPT_ATOMIC "atomic"
 
-- 
gitgitgadget

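As a rough illustration of how the new flag travels from `builtin/fetch.c` into `struct git_transport_options`, here is a toy model of the name-based dispatch in `set_git_option()`. The struct, the `toy_` names and the `"update-shallow"` option string are hypothetical stand-ins, not Git's code. Only the `!!value` idiom and the `"refetch"` option name come from the patch above.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct git_transport_options. */
struct toy_transport_options {
	unsigned update_shallow : 1;
	unsigned refetch : 1;
};

/*
 * Toy version of the strcmp() chain in set_git_option(): a recognised
 * option is set by any non-NULL value and cleared by NULL.
 * Returns 0 if the name was recognised, 1 otherwise.
 */
static int toy_set_option(struct toy_transport_options *opts,
			  const char *name, const char *value)
{
	if (!strcmp(name, "update-shallow")) {	/* hypothetical name */
		opts->update_shallow = !!value;
		return 0;
	} else if (!strcmp(name, "refetch")) {	/* TRANS_OPT_REFETCH */
		opts->refetch = !!value;
		return 0;
	}
	return 1;
}
```

Under this model only the pointer's truthiness matters, so passing `"yes"` (as `prepare_transport()` does) sets the bit, and each option maps to its own independent bitfield.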


* [PATCH v3 5/7] t5615-partial-clone: add test for fetch --refetch
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-03-04 15:04     ` [PATCH v3 4/7] fetch: " Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a test for doing a refetch to apply a changed partial clone filter
under protocol v0 and v2.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 t/t5616-partial-clone.sh | 52 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 34469b6ac10..87ebf4b0b1c 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -166,6 +166,56 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+# create new commits in "src" repo to establish a history on file.4.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.4.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.4.txt $x" >src/file.4.txt &&
+		if list_contains "a,b" "$x"; then
+			printf "%10000s" X >>src/file.4.txt
+		fi &&
+		if list_contains "c,d" "$x"; then
+			printf "%20000s" X >>src/file.4.txt
+		fi &&
+		git -C src add file.4.txt &&
+		git -C src commit -m "mod $x" || return 1
+	done &&
+	git -C src push -u srv main
+'
+
+# Do partial fetch to fetch smaller files; then verify that without --refetch
+# applying a new filter does not refetch missing large objects. Then use
+# --refetch to apply the new filter on existing commits. Test it under both
+# protocol v2 & v0.
+test_expect_success 'apply a different filter using --refetch' '
+	git -C pc1 fetch --filter=blob:limit=999 origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 4 observed &&
+
+	git -C pc1 fetch --filter=blob:limit=19999 --refetch origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 2 observed &&
+
+	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
+		--refetch origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 0 observed
+'
+
+test_expect_success 'fetch --refetch works with a shallow clone' '
+	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed &&
+
+	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --refetch origin &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
@@ -225,7 +275,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 
 	# Auto-fetch all remaining trees and blobs with --missing=error
 	git -C dst rev-list --missing=error --objects main >fetched_objects &&
-	test_line_count = 70 fetched_objects &&
+	test_line_count = 88 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
-- 
gitgitgadget

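The expected `test_line_count` values in the 'apply a different filter using --refetch' test follow from the sizes of the six `file.4.txt` blobs. Below is a small C model of that arithmetic. It assumes that `blob:limit=<n>` omits blobs of `n` bytes or more; no blob here sits near a limit, so the exact boundary rule does not change the counts. `blob_size()` and `count_missing()` are illustrative helpers, not part of Git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Size of the file.4.txt blob written for commit <x> in the test:
 * "Mod file.4.txt <x>\n" is 17 bytes; printf "%10000s" X appends
 * exactly 10000 bytes for a and b, "%20000s" appends 20000 for c and d.
 */
static size_t blob_size(char x)
{
	size_t base = strlen("Mod file.4.txt a\n");	/* 17 */

	if (x == 'a' || x == 'b')
		return base + 10000;
	if (x == 'c' || x == 'd')
		return base + 20000;
	return base;	/* e and f */
}

/* Count the blobs a blob:limit=<limit> filter would leave missing. */
static int count_missing(size_t limit)
{
	int missing = 0;

	for (const char *p = "abcdef"; *p; p++)
		if (blob_size(*p) >= limit)	/* assumed: omit size >= limit */
			missing++;
	return missing;
}
```

So `limit=999` leaves the blobs for a to d missing (4), `limit=19999` leaves only c and d (2), and `limit=29999` fetches everything (0).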


* [PATCH v3 6/7] fetch: after refetch, encourage auto gc repacking
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-03-04 15:04     ` [PATCH v3 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-04 15:04     ` [PATCH v3 7/7] doc/partial-clone: mention --refetch fetch option Robert Coup via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

After invoking `fetch --refetch`, the object db will likely contain many
duplicate objects. If auto-maintenance is enabled, invoke it with
appropriate settings to encourage repacking/consolidation.

* gc.autoPackLimit: unless this is set to 0 (disabled), override the
  value to 1 to force pack consolidation.
* maintenance.incremental-repack.auto: unless this is set to 0, override
  the value to -1 to force incremental repacking.

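The override rule in the two bullets above reduces to: keep an explicit `0`, otherwise force the value. Here is a minimal model of it. `effective_setting()` is a hypothetical helper. The patch itself inlines this logic in `cmd_fetch()` using `git_config_get_int()`, which returns nonzero when the key is unset.

```c
/*
 * Hypothetical helper modelling the patch's override rule.
 * config_found: nonzero if the key is set (git_config_get_int()
 *               returned 0); config_value: its value when set.
 * forced:       the value the patch pushes (1 for gc.autoPackLimit,
 *               -1 for maintenance.incremental-repack.auto).
 */
static int effective_setting(int config_found, int config_value, int forced)
{
	/* An explicit 0 means "disabled" and is respected. */
	if (config_found && config_value == 0)
		return 0;
	/* Unset, or any other value, is overridden to force repacking. */
	return forced;
}
```

For example, `gc.autoPackLimit=0` combined with `maintenance.incremental-repack.auto=1234` yields `0` and `-1`. This matches the second fetch in the 'fetch --refetch triggers repacking' test.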
Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  3 ++-
 builtin/fetch.c                 | 19 ++++++++++++++++++-
 t/t5616-partial-clone.sh        | 29 +++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 21a247abfa4..49ae48dca32 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -169,7 +169,8 @@ ifndef::git-pull[]
 	associated objects that are already present locally, this option fetches
 	all objects as a fresh clone would. Use this to reapply a partial clone
 	filter from configuration or using `--filter=` when the filter
-	definition has changed.
+	definition has changed. Automatic post-fetch maintenance will perform
+	object database pack consolidation to remove any duplicate objects.
 endif::git-pull[]
 
 --refmap=<refspec>::
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7bcf6fa64d..1557e8d57c5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2226,8 +2226,25 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 					     NULL);
 	}
 
-	if (enable_auto_gc)
+	if (enable_auto_gc) {
+		if (refetch) {
+			/*
+			 * Hint auto-maintenance strongly to encourage repacking,
+			 * but respect config settings disabling it.
+			 */
+			int opt_val;
+			if (git_config_get_int("gc.autopacklimit", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("gc.autoPackLimit=1");
+
+			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("maintenance.incremental-repack.auto=-1");
+		}
 		run_auto_maintenance(verbosity < 0);
+	}
 
  cleanup:
 	string_list_clear(&list, 0);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 87ebf4b0b1c..4a3778d04a8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -216,6 +216,35 @@ test_expect_success 'fetch --refetch works with a shallow clone' '
 	test_line_count = 6 observed
 '
 
+test_expect_success 'fetch --refetch triggers repacking' '
+	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
+	export GIT_TRACE2_CONFIG_PARAMS &&
+
+	GIT_TRACE2_EVENT="$PWD/trace1.event" \
+	git -C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace1.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace2.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=0 \
+		-c maintenance.incremental-repack.auto=1234 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace2.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace3.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=1234 \
+		-c maintenance.incremental-repack.auto=0 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace3.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
-- 
gitgitgadget



* [PATCH v3 7/7] doc/partial-clone: mention --refetch fetch option
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-03-04 15:04     ` [PATCH v3 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
@ 2022-03-04 15:04     ` Robert Coup via GitGitGadget
  2022-03-09  0:27     ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Calvin Wan
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  8 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-04 15:04 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Document it for partial clones as a means to apply a new filter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/technical/partial-clone.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c66f24..99f0eb30406 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -181,6 +181,9 @@ Fetching Missing Objects
   currently fetches all objects referred to by the requested objects, even
   though they are not necessary.
 
+- Fetching with `--refetch` will request a complete new filtered packfile from
+  the remote, which can be used to change a filter without needing to
+  dynamically fetch missing objects.
 
 Using many promisor remotes
 ---------------------------
-- 
gitgitgadget


* Re: [PATCH v3 4/7] fetch: add --refetch option
  2022-03-04 15:04     ` [PATCH v3 4/7] fetch: " Robert Coup via GitGitGadget
@ 2022-03-04 21:19       ` Junio C Hamano
  2022-03-07 11:31         ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-03-04 21:19 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int verbosity, deepen_relative, set_upstream, refetch;
>  static int progress = -1;
>  static int enable_auto_gc = 1;
>  static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
> @@ -190,6 +190,9 @@ static struct option builtin_fetch_options[] = {
>  	OPT_SET_INT_F(0, "unshallow", &unshallow,
>  		      N_("convert to a complete repository"),
>  		      1, PARSE_OPT_NONEG),
> +	OPT_SET_INT_F(0, "refetch", &refetch,
> +		      N_("re-fetch without negotiating common commits"),
> +		      1, PARSE_OPT_NONEG),

I guess the existing --unshallow has the same problem, but it strikes
me odd that these aren't doing a bog-standard OPT_BOOL(), with default
value of "false", like, say "--update-head-ok" does.

That will naturally support things like

	git fetch --refetch --no-refetch

where a later option overrides what an earlier option did.
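Junio's point is that an `OPT_BOOL()` option accepts both the positive and the `--no-` form, with the last one winning. By contrast, `OPT_SET_INT_F(..., PARSE_OPT_NONEG)` rejects `--no-refetch` outright. Below is a toy model of the last-one-wins behavior. `parse_refetch()` is hypothetical and does not use Git's parse-options API.

```c
#include <string.h>

/*
 * Toy model of an OPT_BOOL()-style flag (not Git's parse-options API):
 * both --refetch and --no-refetch are accepted and arguments are
 * processed left to right, so the last occurrence wins.
 */
static int parse_refetch(int argc, const char **argv)
{
	int refetch = 0;	/* default "false", as with OPT_BOOL() */

	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--refetch"))
			refetch = 1;
		else if (!strcmp(argv[i], "--no-refetch"))
			refetch = 0;
	}
	return refetch;
}
```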



* Re: [PATCH v3 4/7] fetch: add --refetch option
  2022-03-04 21:19       ` Junio C Hamano
@ 2022-03-07 11:31         ` Robert Coup
  2022-03-07 17:27           ` Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-03-07 11:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Hi Junio,

On Fri, 4 Mar 2022 at 21:19, Junio C Hamano <gitster@pobox.com> wrote:
> I guess the existing --unshallow has the same problem, but it strikes
> me odd that these aren't doing a bog-standard OPT_BOOL(), with default
> value of "false", like, say "--update-head-ok" does.
>
> That will naturally support things like
>
>         git fetch --refetch --no-refetch
>
> where a later option overrides what an earlier option did.
>

Ah, I literally copied the unshallow one since it seemed boolean-ish
and that's what I wanted. I'll look into it.

Thanks,
Rob :)


* Re: [PATCH v3 4/7] fetch: add --refetch option
  2022-03-07 11:31         ` Robert Coup
@ 2022-03-07 17:27           ` Junio C Hamano
  2022-03-09 10:00             ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Junio C Hamano @ 2022-03-07 17:27 UTC (permalink / raw)
  To: Robert Coup
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Robert Coup <robert@coup.net.nz> writes:

> Ah, I literally copied the unshallow one since it seemed boolean-ish
> and that's what I wanted. I'll look into it.

You may (or may not --- I didn't look) find options that need to be
OPT_BOOL() other than "unshallow" one.

While it would be excellent to add a separate "preliminary clean-up"
step before you add OPT_BOOL("refetch") to fix them to use
OPT_BOOL() instead of OPT_SET_INT(), that will add extra review
cycles to the series by extending the scope of it.

It is OK to leave them, in addition to the new one you are adding,
to use the old and a bit incorrect pattern, as long as you leave a
prominent note that we need to clean them up later.

Thanks.


* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-03-04 15:04     ` [PATCH v3 7/7] doc/partial-clone: mention --refetch fetch option Robert Coup via GitGitGadget
@ 2022-03-09  0:27     ` Calvin Wan
  2022-03-09  9:57       ` Robert Coup
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  8 siblings, 1 reply; 76+ messages in thread
From: Calvin Wan @ 2022-03-09  0:27 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: Calvin Wan, git, Jonathan Tan, John Cai, Jeff Hostetler,
	Junio C Hamano, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:
> If a filter is changed on a partial clone repository, for example from
> blob:none to blob:limit=1m, there is currently no straightforward way to
> bulk-refetch the objects that match the new filter for existing local
> commits. This is because the client will report commits as "have" during
> fetch negotiation and any dependent objects won't be included in the
> transferred pack. Another use case is discussed at [1].
> 
> This patch series introduces a --refetch option to fetch & fetch-pack to
> enable doing a full fetch without performing any commit negotiation with the
> remote, as a fresh clone does. It builds upon cbe566a071 ("negotiator/noop:
> add noop fetch negotiator", 2020-08-18).

Hi Robert,

This is my first time sending a review to the list, so forgive me for any
mistakes I make or conventions I miss. Feedback about my review would be much
appreciated!

Overall I think your patch is well written and the implementation accomplishes
what you describe in your cover letter. However, I would like to discuss another
possible design. Currently, the user has to know to run ‘--refetch’
after changing the partial clone filter configuration in order to fetch the
objects that match the new filter. Ideally I believe this should be the default
behavior: instead of adding an option, if Git recorded which filter was last
used for a fetch, it could automatically ‘refetch’ everything whenever the
last-used filter differs from the configured one. I’m not sure the config is
the best place to store the last-used filter, but we can use it as an example
for now. The tradeoff is between adding another config variable and requiring
the user to know to pass an extra option to fetch after changing the config.
If other use cases for a fetch that bypasses commit negotiation come up in the
future (I know you described one in v2), the ‘--refetch’ option can easily be
re-added on top of this.

Looking forward to hearing your thoughts!
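The proposal above amounts to comparing a recorded last-used filter against the configured one. Below is a hypothetical sketch in which `NULL` means "no filter". Neither `should_auto_refetch()` nor any recorded last-used filter exists in Git.

```c
#include <string.h>

/* Two filter specs match if both are NULL ("no filter") or equal strings. */
static int same_filter(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return !strcmp(a, b);
}

/*
 * Hypothetical: last_used would be a filter spec recorded by the previous
 * fetch, configured the current remote.<name>.partialclonefilter.
 * Returns 1 when the proposed design would refetch without negotiation.
 */
static int should_auto_refetch(const char *last_used, const char *configured)
{
	return !same_filter(last_used, configured);
}
```

A plain string comparison treats equivalent specs such as `blob:limit=1m` and `blob:limit=1048576` as different, so a real implementation would likely need to compare parsed filters. Under this rule, unsetting the filter (`configured == NULL`) also triggers a refetch.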


* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-03-09  0:27     ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Calvin Wan
@ 2022-03-09  9:57       ` Robert Coup
  2022-03-09 21:32         ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-03-09  9:57 UTC (permalink / raw)
  To: Calvin Wan
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Hi Calvin,

On Wed, 9 Mar 2022 at 00:27, Calvin Wan <calvinwan@google.com> wrote:
>
> This is my first time sending a review to the list, so forgive me for any
> mistakes I make or conventions I miss. Feedback about my review would be much
> appreciated!

I'm as new as you to contributing to Git :-)

> Overall I think your patch is well written and the implementation accomplishes
> what you describe in your cover letter,

Thanks! I think if you're completely content you can add your
Reviewed-by trailer as described at
https://git-scm.com/docs/SubmittingPatches#commit-trailers

> I would like to discuss another
> possible design I thought of. Currently, the user has to know to run ‘--refetch’
> after changing the partial clone filter configuration in order to fetch the
> commits that match the new filter. Ideally I believe this behavior should be
> default so therefore instead of adding an option, if git knew what filter was
> last used in the fetch, it could automatically ‘refetch’ everything if there is
> a change between the last used filter and the default filter.

So, if you do a partial clone using `git clone --filter=...` then the
filter is saved into the config at `remote.<name>.partialclonefilter`
and is re-used by default for subsequent fetches from that remote. But
there's nothing to stop `git fetch --filter=...` being run multiple
times with different filters to carefully set up a repository for a
particular use case, or any notion that there has to be "one" filter
in place for a remote.

Running `git fetch --filter=...` doesn't update the remote's partial
clone filter in the config, and IMO it shouldn't for the above reason.

I think having `git config remote.<name>.partialclonefilter
<new-filter>` print out something to the user along the lines of "Your
change to/removal of the filter won't fetch in additional objects
associated with existing commits; you can do this with `fetch
--refetch <remote>`" could be helpful, but after a very quick look I
can't see anything like that at the moment for other config settings
(i.e. no plumbing in place to easily reuse), and I'm not motivated to
add such plumbing.

Cheers,

Rob :)


* Re: [PATCH v3 4/7] fetch: add --refetch option
  2022-03-07 17:27           ` Junio C Hamano
@ 2022-03-09 10:00             ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-03-09 10:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Hi Junio,

On Mon, 7 Mar 2022 at 17:27, Junio C Hamano <gitster@pobox.com> wrote:
>
> You may (or may not --- I didn't look) find options that need to be
> OPT_BOOL() other than "unshallow" one.
>
> While it would be excellent to add a separate "preliminary clean-up"
> step before you add OPT_BOOL("refetch") to fix them to use
> OPT_BOOL() instead of OPT_SET_INT(), that will add extra review
> cycles to the series by extending the scope of it.
>
> It is OK to leave them, in addition to the new one you are adding,
> to use the old and a bit incorrect pattern, as long as you leave a
> prominent note that we need to clean them up later.

I think for this series I will just do a re-roll updating refetch to
use OPT_BOOL, and I can do a separate series to look for other
existing booleans using OPT_SET_INT.

Rob :)


* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation
  2022-03-09  9:57       ` Robert Coup
@ 2022-03-09 21:32         ` Junio C Hamano
  2022-03-10  1:07           ` Calvin Wan
  2022-03-10 14:29           ` Robert Coup
  0 siblings, 2 replies; 76+ messages in thread
From: Junio C Hamano @ 2022-03-09 21:32 UTC (permalink / raw)
  To: Robert Coup
  Cc: Calvin Wan, Robert Coup via GitGitGadget, git, Jonathan Tan,
	John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Robert Coup <robert@coup.net.nz> writes:

> So, if you do a partial clone using `git clone --filter=...` then the
> filter is saved into the config at `remote.<name>.partialclonefilter`
> and is re-used by default for subsequent fetches from that remote. But
> there's nothing to stop `git fetch --filter=...` being run multiple
> times with different filters to carefully set up a repository for a
> particular use case, or any notion that there has to be "one" filter
> in place for a remote.

The way I read Calvin's suggestion was that you won't allow such a
random series of "git fetch"es without updating the "this is the
filter that is consistent with the contents of this repository"
record, which will lead to inconsistencies.  I.e.

 - we must maintain the "filter that is consistent with the contents
   of this repository", which this series does not do, but we should.

 - the "--refetch" is unnecessary and redundant, as long as such a
   record is maintained; when a filter settings changes, we should
   do the equivalent of "--refetch" automatically.

IOW, ...

> Running `git fetch --filter=...` doesn't update the remote's partial
> clone filter in the config, and IMO it shouldn't for the above reason.

... isn't "git fetch --filter" that does not update the configured
filter (and does not do a refetch automatically) a bug that made the
"refetch" necessary in the first place?

Or perhaps I read Calvin incorrectly?


* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation
  2022-03-09 21:32         ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation Junio C Hamano
@ 2022-03-10  1:07           ` Calvin Wan
  2022-03-10 14:29           ` Robert Coup
  1 sibling, 0 replies; 76+ messages in thread
From: Calvin Wan @ 2022-03-10  1:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robert Coup, Robert Coup via GitGitGadget, git, Jonathan Tan,
	John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On 3/9/2022 1:32 PM, Junio C Hamano wrote:

 > - we must maintain the "filter that is consistent with the contents
 >   of this repository", which this series does not do, but we should.

If this is possible/reasonably implementable then...

 > - the "--refetch" is unnecessary and redundant, as long as such a
 >  record is maintained; when a filter settings changes, we should
 >  do the equivalent of "--refetch" automatically.

... this is the logical conclusion that can be drawn.

> isn't "git fetch --filter" that does not update the configured
> filter (and does not do a refetch automatically) a bug that made the
> "refetch" necessary in the first place?

There are two cases to be considered here:
1. The user changes the filter config using "git config"
2. The user runs "git fetch --filter"

The first case is the use case advocated by Robert for "--refetch".
I originally suggested saving the last used filter into the config
and if the last used filter != the configured filter, then fetch
automatically "refetches" everything.

The second case is the one Junio believes I am referring to;
however, I am uncertain whether to classify it as a bug or a
feature. As above, I suggested saving this as the last-used
filter, to be compared against the configured filter.
Whether this filter should also become the configured filter is a
separate discussion, because I can see pros and cons either way.

Ultimately the expectation is that if I run "git fetch" without --filter,
then I will fetch based on the config filter. And if I have previously
run "git fetch" with the same filter, whether from the config or
passed via --filter, then Git will only fetch new refs/objects.



On Wed, Mar 9, 2022 at 1:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Robert Coup <robert@coup.net.nz> writes:
>
> > So, if you do a partial clone using `git clone --filter=...` then the
> > filter is saved into the config at `remote.<name>.partialclonefilter`
> > and is re-used by default for subsequent fetches from that remote. But
> > there's nothing to stop `git fetch --filter=...` being run multiple
> > times with different filters to carefully set up a repository for a
> > particular use case, or any notion that there has to be "one" filter
> > in place for a remote.
>
> The way I read Calvin's suggestion was that you won't allow such a
> random series of "git fetch"es without updating the "this is the
> filter that is consistent with the contents of this repository"
> record, which will lead to inconsistencies.  I.e.
>
>  - we must maintain the "filter that is consistent with the contents
>    of this repository", which this series does not do, but we should.
>
>  - the "--refetch" is unnecessary and redundant, as long as such a
>    record is maintained; when a filter settings changes, we should
>    do the equivalent of "--refetch" automatically.
>
> IOW, ...
>
> > Running `git fetch --filter=...` doesn't update the remote's partial
> > clone filter in the config, and IMO it shouldn't for the above reason.
>
> ... isn't "git fetch --filter" that does not update the configured
> filter (and does not do a refetch automatically) a bug that made the
> "refetch" necessary in the first place?
>
> Or perhaps I read Calvin incorrectly?


* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation
  2022-03-09 21:32         ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation Junio C Hamano
  2022-03-10  1:07           ` Calvin Wan
@ 2022-03-10 14:29           ` Robert Coup
  2022-03-21 17:58             ` Calvin Wan
  1 sibling, 1 reply; 76+ messages in thread
From: Robert Coup @ 2022-03-10 14:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Calvin Wan, Robert Coup via GitGitGadget, git, Jonathan Tan,
	John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Hi,

On Wed, 9 Mar 2022 at 21:32, Junio C Hamano <gitster@pobox.com> wrote:
>
> The way I read Calvin's suggestion was that you won't allow such a
> random series of "git fetch"es without updating the "this is the
> filter that is consistent with the contents of this repository"
> record, which will lead to inconsistencies.  I.e.
>
>  - we must maintain the "filter that is consistent with the contents
>    of this repository", which this series does not do, but we should.

I don't think we should strive to keep this "consistency" —

>  - the "--refetch" is unnecessary and redundant, as long as such a
>    record is maintained; when a filter settings changes, we should
>    do the equivalent of "--refetch" automatically.

— we don't know how much data has been pulled in by fetches from
different promisor and non-promisor remotes (past & present); or
dynamically faulted in through branch switching or history
exploration. And I can't see any particular benefit in attempting to
keep track of that?

Ævar suggested in future maybe we could figure out which commits a
user definitively has all the blobs & trees for and refetch could
negotiate from that position to improve efficiency: nothing in this
series precludes such an enhancement.

> ... isn't "git fetch --filter" that does not update the configured
> filter (and does not do a refetch automatically) a bug that made the
> "refetch" necessary in the first place?

I don't believe it's a bug. A fairly obvious partial clone example
I've used before on repos where I want the commit history but not all
the associated data (especially when the history is littered with
giant blobs I don't care about):

  git clone example.com/myrepo --filter=blob:none
  # does a partial clone with no blobs
  # checkout faults in the blobs present at HEAD in bulk to populate
  # the working tree
  git config --unset remote.origin.partialclonefilter
  # going forward, future fetches include all associated blobs for new commits

Getting all the blobs for all history is something I'm explicitly
trying not to do in this example, but if the next fetch from origin
automatically did a "refetch" after I removed the filter that's
exactly what would happen.

We don't expect users to update `diff.algorithm` in config to run a
minimal diff: using the `--diff-algorithm=` option on the command line
overrides the config. And the same philosophy applies with fetch:
`remote.<name>.partialclonefilter` provides the default filter for
fetches, and a user can override it via `git fetch --filter=`. To me
this is how Git commands are expected to work.

Partial clones are still relatively new and advanced, and I don't
believe we should try and over-predict too much what the correct
behaviour is for a user.

I'd be happy adding something to the documentation for the
`remote.<name>.partialclonefilter` config setting to explain that
changing or removing the filter won't backfill the local object DB and
the user would need `fetch --refetch` for that.
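The precedence described above (command line over config, and no backfill unless `--refetch` is given) can be summarised in a small C sketch. `struct fetch_plan` and `plan_fetch()` are hypothetical names used only for illustration; they are not part of Git and do not reflect how builtin/fetch.c is organised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what one "git fetch" would do. */
struct fetch_plan {
	const char *filter;     /* NULL: no filter, fetch everything */
	int backfill_existing;  /* 1: also send objects for commits we already have */
};

/*
 * cli_filter:    value of --filter= on the command line, or NULL
 * config_filter: value of remote.<name>.partialclonefilter, or NULL
 * refetch:       nonzero if --refetch was given
 */
static struct fetch_plan plan_fetch(const char *cli_filter,
				    const char *config_filter, int refetch)
{
	struct fetch_plan plan;

	/* A command-line option overrides the configured default. */
	plan.filter = cli_filter ? cli_filter : config_filter;

	/*
	 * Changing or clearing the filter on its own only affects new
	 * commits; existing commits are backfilled only on an explicit
	 * refetch.
	 */
	plan.backfill_existing = refetch ? 1 : 0;
	return plan;
}
```

With this model, unsetting `remote.origin.partialclonefilter` gives `plan_fetch(NULL, NULL, 0)`: no filter for new commits, and no bulk download of old blobs.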

Thanks,
Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation
  2022-03-10 14:29           ` Robert Coup
@ 2022-03-21 17:58             ` Calvin Wan
  2022-03-21 21:34               ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Calvin Wan @ 2022-03-21 17:58 UTC (permalink / raw)
  To: Robert Coup
  Cc: Junio C Hamano, Robert Coup via GitGitGadget, git, Jonathan Tan,
	John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Hi Robert,

Documentation for the config setting is an acceptable solution!
Apologies for the late response -- wanted to wait and see if anyone
else on the list had any last thoughts. Also I noticed you were hoping
that Jonathan Tan could take a look at your patch on the What's
Cooking thread. Before I sent my first review out, I discussed your
patch with him so he's been briefed.

Reviewed-by: Calvin Wan <calvinwan@google.com>


On Thu, Mar 10, 2022 at 6:29 AM Robert Coup <robert@coup.net.nz> wrote:
>
> Hi,
>
> On Wed, 9 Mar 2022 at 21:32, Junio C Hamano <gitster@pobox.com> wrote:
> >
> > The way I read Calvin's suggestion was that you won't allow such a
> > random series of "git fetch"es without updating the "this is the
> > filter that is consistent with the contents of this repository"
> > record, which will lead to inconsistencies.  I.e.
> >
> >  - we must maintain the "filter that is consistent with the contents
> >    of this repository", which this series does not do, but we should.
>
> I don't think we should strive to keep this "consistency" —
>
> >  - the "--refetch" is unnecessary and redundant, as long as such a
> >    record is maintained; when a filter settings changes, we should
> >    do the equivalent of "--refetch" automatically.
>
> — we don't know how much data has been pulled in by fetches from
> different promisor and non-promisor remotes (past & present); or
> dynamically faulted in through branch switching or history
> exploration. And I can't see any particular benefit in attempting to
> keep track of that?
>
> Ævar suggested in future maybe we could figure out which commits a
> user definitively has all the blobs & trees for and refetch could
> negotiate from that position to improve efficiency: nothing in this
> series precludes such an enhancement.
>
> > ... isn't "git fetch --fitler" that does not update the configured
> > filter (and does not do a refetch automatically) a bug that made the
> > "refetch" necessary in the first place?
>
> I don't believe it's a bug. A fairly obvious partial clone example
> I've used before on repos where I want the commit history but not all
> the associated data (especially when the history is littered with
> giant blobs I don't care about):
>
>   git clone example.com/myrepo --filter=blob:none
>   # does a partial clone with no blobs
>   # checkout faults in the blobs present at HEAD in bulk to populate
> the working tree
>   git config --unset remote.origin.partialclonefilter
>   # going forward, future fetches include all associated blobs for new commits
>
> Getting all the blobs for all history is something I'm explicitly
> trying not to do in this example, but if the next fetch from origin
> automatically did a "refetch" after I removed the filter that's
> exactly what would happen.
>
> We don't expect users to update `diff.algorithm` in config to run a
> minimal diff: using the `--diff-algorithm=` option on the command line
> overrides the config. And the same philosophy applies with fetch:
> `remote.<name>.partialclonefilter` provides the default filter for
> fetches, and a user can override it via `git fetch --filter=`. To me
> this is how Git commands are expected to work.
>
> Partial clones are still relatively new and advanced, and I don't
> believe we should try and over-predict too much what the correct
> behaviour is for a user.
>
> I'd be happy adding something to the documentation for the
> `remote.<name>.partialclonefilter` config setting to explain that
> changing or removing the filter won't backfill the local object DB and
> the user would need `fetch --refetch` for that.
>
> Thanks,
> Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v3 0/7] fetch: add repair: full refetch without negotiation
  2022-03-21 17:58             ` Calvin Wan
@ 2022-03-21 21:34               ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-03-21 21:34 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Junio C Hamano, Robert Coup via GitGitGadget, git

Hi Calvin

On Mon, 21 Mar 2022 at 17:58, Calvin Wan <calvinwan@google.com> wrote:
> Documentation for the config setting is an acceptable solution!

Great :)

> Apologies for the late response -- wanted to wait and see if anyone
> else on the list had any last thoughts. Also I noticed you were hoping
> that Jonathan Tan could take a look at your patch on the What's
> Cooking thread. Before I sent my first review out, I discussed your
> patch with him so he's been briefed.
>
> Reviewed-by: Calvin Wan <calvinwan@google.com>

Thank you.

@Junio - I'm on leave for the remainder of this week, so expect the
re-roll sometime next week.

Rob :)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering")
  2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-03-09  0:27     ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Calvin Wan
@ 2022-03-28 14:02     ` Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
                         ` (6 more replies)
  8 siblings, 7 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup

If a filter is changed on a partial clone repository, for example from
blob:none to blob:limit=1m, there is currently no straightforward way to
bulk-refetch the objects that match the new filter for existing local
commits. This is because the client will report commits as "have" during
fetch negotiation and any dependent objects won't be included in the
transferred pack. Another use case is discussed at [1].

This patch series introduces a --refetch option to fetch & fetch-pack to
enable doing a full fetch without performing any commit negotiation with the
remote, as a fresh clone does. It builds upon cbe566a071 ("negotiator/noop:
add noop fetch negotiator", 2020-08-18).
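To make the "have" problem concrete, here is a toy C model, not Git's code. It assumes a linear history in which commit i introduces one blob of `blob_size[i]` bytes, and a `blob:limit=<n>` filter that omits blobs of size at least n bytes. A normal fetch only walks commits the client does not have, while a refetch sends no "have" and walks the whole history as a clone would.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical): n commits in a line; the client already has
 * commits 0..local_commits-1. Returns how many blobs the server would put
 * in the pack when fetching with a blob size limit.
 */
static size_t blobs_sent(const size_t *blob_size, size_t n,
			 size_t local_commits, size_t limit, int refetch)
{
	/*
	 * Normally the client's tip is sent as a "have", so the server
	 * assumes everything reachable from it is present and only walks
	 * the new commits. With --refetch no "have" is sent and the walk
	 * starts at the first commit.
	 */
	size_t first = refetch ? 0 : local_commits;
	size_t count = 0;

	assert(local_commits <= n);
	for (size_t i = first; i < n; i++)
		if (blob_size[i] < limit) /* blobs >= limit are filtered out */
			count++;
	return count;
}
```

For blob sizes {100, 5000, 200000, 300} with three commits already local and a 1 MiB limit, a normal fetch sends 1 blob, while a refetch sends all 4.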

 * Using --refetch will produce duplicated objects between the existing and
   newly fetched packs, but maintenance will clean them up when it runs
   automatically post-fetch (if enabled).
 * If a user fetches with --refetch applying a more restrictive partial
   clone filter than previously (e.g. blob:limit=1m then blob:limit=1k) the
   eventual state is a no-op, since any referenced object already in the
   local repository is never removed. More advanced repacking which could
   improve this scenario is currently proposed at [2].
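The "no-op" outcome for a more restrictive filter follows from fetch only ever adding objects. The hypothetical C sketch below tracks which blobs are present as a bitmask. It assumes, as in the previous paragraph, that nothing prunes objects between fetches, so each refetch yields the union of what was present and what the pack contained.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: bit i of the mask is set when blob i is present
 * locally. Blobs of size >= limit are omitted by the filter.
 */
static unsigned long after_refetch(unsigned long present,
				   const size_t *blob_size, size_t n,
				   size_t limit)
{
	unsigned long fetched = 0;

	assert(n <= 8 * sizeof(unsigned long));
	for (size_t i = 0; i < n; i++)
		if (blob_size[i] < limit)
			fetched |= 1UL << i;
	return present | fetched; /* a fetch never removes objects */
}
```

After a 1 MiB refetch of blobs sized {500, 2000, 5000}, all three bits are set. A later refetch with a 1 KiB limit leaves the mask unchanged, because the larger blobs are still present locally.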

[1]
https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2]
https://lore.kernel.org/git/21ED346B-A906-4905-B061-EDE53691C586@gmail.com/

Changes since v3:

 * Mention fetch --refetch in the remote.<name>.partialclonefilter
   documentation.

Changes since v2:

 * Changed the name from "repair" to "refetch". While it's conceivable to
   use it in some object DB repair situations that's not the focus of these
   changes.
 * Pass config options to maintenance via GIT_CONFIG_PARAMETERS
 * Split out auto-maintenance to a separate & more robust test
 * Minor fixes/improvements from reviews by Junio & Ævar

Changes since RFC (v1):

 * Changed the name from "refilter" to "repair"
 * Removed dependency between server-side support for filtering and repair
 * Added a test case for a shallow clone
 * Post-fetch auto maintenance now strongly encourages
   repacking/consolidation

Reviewed-by: Calvin Wan <calvinwan@google.com>

Robert Coup (7):
  fetch-negotiator: add specific noop initializer
  fetch-pack: add refetch
  builtin/fetch-pack: add --refetch option
  fetch: add --refetch option
  t5615-partial-clone: add test for fetch --refetch
  fetch: after refetch, encourage auto gc repacking
  docs: mention --refetch fetch option

 Documentation/config/remote.txt           |  6 +-
 Documentation/fetch-options.txt           | 10 +++
 Documentation/git-fetch-pack.txt          |  4 ++
 Documentation/technical/partial-clone.txt |  3 +
 builtin/fetch-pack.c                      |  4 ++
 builtin/fetch.c                           | 34 +++++++++-
 fetch-negotiator.c                        |  5 ++
 fetch-negotiator.h                        |  8 +++
 fetch-pack.c                              | 46 ++++++++-----
 fetch-pack.h                              |  1 +
 remote-curl.c                             |  6 ++
 t/t5616-partial-clone.sh                  | 81 ++++++++++++++++++++++-
 transport-helper.c                        |  3 +
 transport.c                               |  4 ++
 transport.h                               |  4 ++
 15 files changed, 197 insertions(+), 22 deletions(-)


base-commit: abf474a5dd901f28013c52155411a48fd4c09922
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1138%2Frcoup%2Frc-partial-clone-refilter-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1138/rcoup/rc-partial-clone-refilter-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1138

Range-diff vs v3:

 1:  96a75be3d8a = 1:  6cd6d4a59f6 fetch-negotiator: add specific noop initializer
 2:  04ca6a07f85 = 2:  03f0de3d28c fetch-pack: add refetch
 3:  879d30c4473 = 3:  f7942344ff8 builtin/fetch-pack: add --refetch option
 4:  a503b98f333 = 4:  78501bbf281 fetch: add --refetch option
 5:  01f22e784a5 = 5:  6c17167ac1e t5615-partial-clone: add test for fetch --refetch
 6:  31046625987 = 6:  28c07219fd8 fetch: after refetch, encourage auto gc repacking
 7:  f923a06aab5 ! 7:  da1e6de7a9f doc/partial-clone: mention --refetch fetch option
     @@ Metadata
      Author: Robert Coup <robert@coup.net.nz>
      
       ## Commit message ##
     -    doc/partial-clone: mention --refetch fetch option
     +    docs: mention --refetch fetch option
      
     -    Document it for partial clones as a means to apply a new filter.
     +    Document it for partial clones as a means to apply a new filter, and
     +    reference it from the remote.<name>.partialclonefilter config parameter.
      
          Signed-off-by: Robert Coup <robert@coup.net.nz>
      
     + ## Documentation/config/remote.txt ##
     +@@ Documentation/config/remote.txt: remote.<name>.promisor::
     + 	objects.
     + 
     + remote.<name>.partialclonefilter::
     +-	The filter that will be applied when fetching from this
     +-	promisor remote.
     ++	The filter that will be applied when fetching from this	promisor remote.
     ++	Changing or clearing this value will only affect fetches for new commits.
     ++	To fetch associated objects for commits already present in the local object
     ++	database, use the `--refetch` option of linkgit:git-fetch[1].
     +
       ## Documentation/technical/partial-clone.txt ##
      @@ Documentation/technical/partial-clone.txt: Fetching Missing Objects
         currently fetches all objects referred to by the requested objects, even

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 1/7] fetch-negotiator: add specific noop initializer
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a specific initializer for the noop fetch negotiator. This allows
partial clones to skip commit negotiation when performing a "refetch".
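The idea behind the noop negotiator can be shown with a simplified, hypothetical C model: a table of callbacks in which `next()` never offers a "have". Git's real `struct fetch_negotiator`, declared in fetch-negotiator.h, works on Git's commit and object ID types. In this sketch, strings stand in for those types, and every `toy_` name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a fetch negotiator: callbacks plus private data. */
struct toy_negotiator {
	void (*add_tip)(struct toy_negotiator *, const char *commit);
	const char *(*next)(struct toy_negotiator *); /* next "have", or NULL */
	int (*ack)(struct toy_negotiator *, const char *commit);
	void *data;
};

/* The noop strategy ignores every tip ... */
static void noop_add_tip(struct toy_negotiator *n, const char *commit)
{
	(void)n;
	(void)commit;
}

/* ... never offers a "have" ... */
static const char *noop_next(struct toy_negotiator *n)
{
	(void)n;
	return NULL;
}

/* ... and treats acknowledgements as uninteresting. */
static int noop_ack(struct toy_negotiator *n, const char *commit)
{
	(void)n;
	(void)commit;
	return 0;
}

static void toy_negotiator_init_noop(struct toy_negotiator *n)
{
	n->add_tip = noop_add_tip;
	n->next = noop_next;
	n->ack = noop_ack;
	n->data = NULL;
}

/* Count the "have" lines a client would send using this negotiator. */
static size_t count_haves(struct toy_negotiator *n,
			  const char **tips, size_t nr_tips)
{
	size_t haves = 0;

	for (size_t i = 0; i < nr_tips; i++)
		n->add_tip(n, tips[i]);
	while (n->next(n))
		haves++;
	return haves;
}
```

Because `count_haves()` is always 0 for the noop strategy, the server assumes the client has nothing and sends everything matching the wants and filter.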

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-negotiator.c | 5 +++++
 fetch-negotiator.h | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/fetch-negotiator.c b/fetch-negotiator.c
index 874797d767b..be383367f55 100644
--- a/fetch-negotiator.c
+++ b/fetch-negotiator.c
@@ -23,3 +23,8 @@ void fetch_negotiator_init(struct repository *r,
 		return;
 	}
 }
+
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator)
+{
+	noop_negotiator_init(negotiator);
+}
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
index ea78868504b..e348905a1f0 100644
--- a/fetch-negotiator.h
+++ b/fetch-negotiator.h
@@ -53,7 +53,15 @@ struct fetch_negotiator {
 	void *data;
 };
 
+/*
+ * Initialize a negotiator based on the repository settings.
+ */
 void fetch_negotiator_init(struct repository *r,
 			   struct fetch_negotiator *negotiator);
 
+/*
+ * Initialize a noop negotiator.
+ */
+void fetch_negotiator_init_noop(struct fetch_negotiator *negotiator);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 2/7] fetch-pack: add refetch
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-31 15:09         ` Ævar Arnfjörð Bjarmason
  2022-03-28 14:02       ` [PATCH v4 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
                         ` (4 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Allow a "refetch" where the contents of the local object store are
ignored and a full fetch is performed, not attempting to find or
negotiate common commits with the remote.

A key use case is to apply a new partial clone blob/tree filter and
refetch all the associated matching content, which would otherwise not
be transferred when the commit objects are already present locally.
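The two shortcuts that this patch guards with `args->refetch` can be modelled in C. The sketch is hypothetical. `remote_complete[i]` stands for "remote ref i already points at a commit that is complete locally". The sketch approximates `everything_local()` as "no ref needs to be wanted", which is simpler than what the real function checks.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_FETCH_DONE, TOY_FETCH_SEND_REQUEST };

/* How many refs would be sent as "want" lines. */
static size_t count_wants(const int *remote_complete, size_t nr, int refetch)
{
	size_t wants = 0;

	for (size_t i = 0; i < nr; i++) {
		/* Normally, refs that are already complete locally are skipped. */
		if (!refetch && remote_complete[i])
			continue;
		wants++;
	}
	return wants;
}

/* Approximation of the "everything is local, stop early" decision. */
static enum toy_state next_state(const int *remote_complete, size_t nr,
				 int refetch)
{
	if (!refetch && count_wants(remote_complete, nr, 0) == 0)
		return TOY_FETCH_DONE;
	return TOY_FETCH_SEND_REQUEST;
}
```

When every ref is already complete, a normal fetch stops without requesting anything, while a refetch still wants every ref.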

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 fetch-pack.c | 46 +++++++++++++++++++++++++++++-----------------
 fetch-pack.h |  1 +
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 87657907e78..4e1e88eea09 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
 		const char *remote_hex;
 		struct object *o;
 
-		/*
-		 * If that object is complete (i.e. it is an ancestor of a
-		 * local ref), we tell them we have it but do not have to
-		 * tell them about its ancestors, which they already know
-		 * about.
-		 *
-		 * We use lookup_object here because we are only
-		 * interested in the case we *know* the object is
-		 * reachable and we have already scanned it.
-		 */
-		if (((o = lookup_object(the_repository, remote)) != NULL) &&
-				(o->flags & COMPLETE)) {
-			continue;
+		if (!args->refetch) {
+			/*
+			* If that object is complete (i.e. it is an ancestor of a
+			* local ref), we tell them we have it but do not have to
+			* tell them about its ancestors, which they already know
+			* about.
+			*
+			* We use lookup_object here because we are only
+			* interested in the case we *know* the object is
+			* reachable and we have already scanned it.
+			*/
+			if (((o = lookup_object(the_repository, remote)) != NULL) &&
+					(o->flags & COMPLETE)) {
+				continue;
+			}
 		}
 
 		remote_hex = oid_to_hex(remote);
@@ -692,6 +694,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
 
+	if (args->refetch)
+		return;
+
 	save_commit_buffer = 0;
 
 	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
@@ -1028,7 +1033,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	struct fetch_negotiator *negotiator;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->refetch) {
+		fetch_negotiator_init_noop(negotiator);
+	} else {
+		fetch_negotiator_init(r, negotiator);
+	}
 
 	sort_ref_list(&ref, ref_compare_name);
 	QSORT(sought, nr_sought, cmp_ref_by_name);
@@ -1121,7 +1130,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	mark_complete_and_common_ref(negotiator, args, &ref);
 	filter_refs(args, &ref, sought, nr_sought);
-	if (everything_local(args, &ref)) {
+	if (!args->refetch && everything_local(args, &ref)) {
 		packet_flush(fd[1]);
 		goto all_done;
 	}
@@ -1587,7 +1596,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
-	fetch_negotiator_init(r, negotiator);
+	if (args->refetch)
+		fetch_negotiator_init_noop(negotiator);
+	else
+		fetch_negotiator_init(r, negotiator);
 
 	packet_reader_init(&reader, fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -1613,7 +1625,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Filter 'ref' by 'sought' and those that aren't local */
 			mark_complete_and_common_ref(negotiator, args, &ref);
 			filter_refs(args, &ref, sought, nr_sought);
-			if (everything_local(args, &ref))
+			if (!args->refetch && everything_local(args, &ref))
 				state = FETCH_DONE;
 			else
 				state = FETCH_SEND_REQUEST;
diff --git a/fetch-pack.h b/fetch-pack.h
index 7f94a2a5831..8c7752fc821 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -42,6 +42,7 @@ struct fetch_pack_args {
 	unsigned update_shallow:1;
 	unsigned reject_shallow_remote:1;
 	unsigned deepen:1;
+	unsigned refetch:1;
 
 	/*
 	 * Indicate that the remote of this request is a promisor remote. The
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 3/7] builtin/fetch-pack: add --refetch option
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 4/7] fetch: " Robert Coup via GitGitGadget
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a refetch option to fetch-pack to force a full fetch. Use when
applying a new partial clone filter to refetch all matching objects.
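The remote-curl part of this patch is a translation step: the helper option `refetch` sets a flag, which later becomes a `--refetch` argument for fetch-pack. The following is a hypothetical C sketch of that flow. The `toy_` names are invented, and only two flags are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the helper's option flags. */
struct toy_options {
	unsigned from_promisor : 1;
	unsigned refetch : 1;
};

/* Returns 0 when the option is recognised, 1 when unsupported. */
static int toy_set_option(struct toy_options *o, const char *name)
{
	if (!strcmp(name, "from-promisor")) {
		o->from_promisor = 1;
		return 0;
	} else if (!strcmp(name, "refetch")) {
		o->refetch = 1;
		return 0;
	}
	return 1;
}

/* Fill out[] with the fetch-pack flags; returns how many were added. */
static size_t toy_fetch_pack_args(const struct toy_options *o,
				  const char **out, size_t max)
{
	size_t n = 0;

	assert(max >= 2);
	if (o->from_promisor)
		out[n++] = "--from-promisor";
	if (o->refetch)
		out[n++] = "--refetch";
	return n;
}
```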

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/git-fetch-pack.txt | 4 ++++
 builtin/fetch-pack.c             | 4 ++++
 remote-curl.c                    | 6 ++++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt
index c9758847937..46747d5f429 100644
--- a/Documentation/git-fetch-pack.txt
+++ b/Documentation/git-fetch-pack.txt
@@ -101,6 +101,10 @@ be in a separate packet, and the list must end with a flush packet.
 	current shallow boundary instead of from the tip of each
 	remote branch history.
 
+--refetch::
+	Skips negotiating commits with the server in order to fetch all matching
+	objects. Use to reapply a new partial clone blob/tree filter.
+
 --no-progress::
 	Do not show the progress.
 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c2d96f4c89a..1f8aec97d47 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.from_promisor = 1;
 			continue;
 		}
+		if (!strcmp("--refetch", arg)) {
+			args.refetch = 1;
+			continue;
+		}
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
diff --git a/remote-curl.c b/remote-curl.c
index ff44f41011e..67f178b1120 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -43,6 +43,7 @@ struct options {
 		/* see documentation of corresponding flag in fetch-pack.h */
 		from_promisor : 1,
 
+		refetch : 1,
 		atomic : 1,
 		object_format : 1,
 		force_if_includes : 1;
@@ -198,6 +199,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "from-promisor")) {
 		options.from_promisor = 1;
 		return 0;
+	} else if (!strcmp(name, "refetch")) {
+		options.refetch = 1;
+		return 0;
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
@@ -1182,6 +1186,8 @@ static int fetch_git(struct discovery *heads,
 		strvec_push(&args, "--deepen-relative");
 	if (options.from_promisor)
 		strvec_push(&args, "--from-promisor");
+	if (options.refetch)
+		strvec_push(&args, "--refetch");
 	if (options.filter)
 		strvec_pushf(&args, "--filter=%s", options.filter);
 	strvec_push(&args, url.buf);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 4/7] fetch: add --refetch option
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-03-28 14:02       ` [PATCH v4 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-31 15:18         ` Ævar Arnfjörð Bjarmason
  2022-03-28 14:02       ` [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Teach fetch and transports the --refetch option to force a full fetch
without negotiating common commits with the remote. Use when applying a
new partial clone filter to refetch all matching objects.
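In builtin/fetch.c, `check_exist_and_connected()` decides whether fetch can skip contacting the remote because every ref tip already exists locally. This patch makes it bail out for `--refetch`, as it already does for deepening. The hypothetical C sketch below reduces that decision to three flags. The real function also verifies connectivity.

```c
#include <assert.h>

/*
 * Hypothetical model: returns 0 when the quick path (no transfer) may be
 * used, -1 when a real fetch has to be performed.
 */
static int toy_check_exist_and_connected(int deepen, int refetch,
					 int all_tips_present)
{
	if (deepen)
		return -1;  /* the remote must extend the history */
	if (refetch)
		return -1;  /* always perform a full fetch */
	return all_tips_present ? 0 : -1;
}
```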

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  9 +++++++++
 builtin/fetch.c                 | 15 ++++++++++++++-
 transport-helper.c              |  3 +++
 transport.c                     |  4 ++++
 transport.h                     |  4 ++++
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 6cdd9d43c5a..d03fce5aae0 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -163,6 +163,15 @@ endif::git-pull[]
 	behavior for a remote may be specified with the remote.<name>.tagOpt
 	setting. See linkgit:git-config[1].
 
+ifndef::git-pull[]
+--refetch::
+	Instead of negotiating with the server to avoid transferring commits and
+	associated objects that are already present locally, this option fetches
+	all objects as a fresh clone would. Use this to reapply a partial clone
+	filter from configuration or using `--filter=` when the filter
+	definition has changed.
+endif::git-pull[]
+
 --refmap=<refspec>::
 	When fetching refs listed on the command line, use the
 	specified refspec (can be given more than once) to map the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9b4018f62c4..e391a5dbc55 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -59,7 +59,7 @@ static int prune_tags = -1; /* unspecified */
 
 static int all, append, dry_run, force, keep, multiple, update_head_ok;
 static int write_fetch_head = 1;
-static int verbosity, deepen_relative, set_upstream;
+static int verbosity, deepen_relative, set_upstream, refetch;
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
@@ -190,6 +190,9 @@ static struct option builtin_fetch_options[] = {
 	OPT_SET_INT_F(0, "unshallow", &unshallow,
 		      N_("convert to a complete repository"),
 		      1, PARSE_OPT_NONEG),
+	OPT_SET_INT_F(0, "refetch", &refetch,
+		      N_("re-fetch without negotiating common commits"),
+		      1, PARSE_OPT_NONEG),
 	{ OPTION_STRING, 0, "submodule-prefix", &submodule_prefix, N_("dir"),
 		   N_("prepend this to submodule path output"), PARSE_OPT_HIDDEN },
 	OPT_CALLBACK_F(0, "recurse-submodules-default",
@@ -1304,6 +1307,14 @@ static int check_exist_and_connected(struct ref *ref_map)
 	if (deepen)
 		return -1;
 
+	/*
+	 * Similarly, if we need to refetch, we always want to perform a full
+	 * fetch ignoring existing objects.
+	 */
+	if (refetch)
+		return -1;
+
+
 	/*
 	 * check_connected() allows objects to merely be promised, but
 	 * we need all direct targets to exist.
@@ -1517,6 +1528,8 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (refetch)
+		set_option(transport, TRANS_OPT_REFETCH, "yes");
 	if (filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&filter_options);
diff --git a/transport-helper.c b/transport-helper.c
index a0297b0986c..b4dbbabb0c2 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -715,6 +715,9 @@ static int fetch_refs(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.refetch)
+		set_helper_option(transport, "refetch", "true");
+
 	if (data->transport_options.filter_options.choice) {
 		const char *spec = expand_list_objects_filter_spec(
 			&data->transport_options.filter_options);
diff --git a/transport.c b/transport.c
index 70e9840a90e..3d64a43ab39 100644
--- a/transport.c
+++ b/transport.c
@@ -250,6 +250,9 @@ static int set_git_option(struct git_transport_options *opts,
 		list_objects_filter_die_if_populated(&opts->filter_options);
 		parse_list_objects_filter(&opts->filter_options, value);
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_REFETCH)) {
+		opts->refetch = !!value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_REJECT_SHALLOW)) {
 		opts->reject_shallow = !!value;
 		return 0;
@@ -384,6 +387,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.filter_options = data->options.filter_options;
+	args.refetch = data->options.refetch;
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
diff --git a/transport.h b/transport.h
index a0bc6a1e9eb..12bc08fc339 100644
--- a/transport.h
+++ b/transport.h
@@ -16,6 +16,7 @@ struct git_transport_options {
 	unsigned update_shallow : 1;
 	unsigned reject_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned refetch : 1;
 
 	/* see documentation of corresponding flag in fetch-pack.h */
 	unsigned from_promisor : 1;
@@ -216,6 +217,9 @@ void transport_check_allowed(const char *type);
 /* Filter objects for partial clone and fetch */
 #define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
 
+/* Refetch all objects without negotiating */
+#define TRANS_OPT_REFETCH "refetch"
+
 /* Request atomic (all-or-nothing) updates when pushing */
 #define TRANS_OPT_ATOMIC "atomic"
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-03-28 14:02       ` [PATCH v4 4/7] fetch: " Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-31 15:20         ` Ævar Arnfjörð Bjarmason
  2022-03-28 14:02       ` [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
  2022-03-28 14:02       ` [PATCH v4 7/7] docs: mention --refetch fetch option Robert Coup via GitGitGadget
  6 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Add a test for doing a refetch to apply a changed partial clone filter
under protocol v0 and v2.
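The expected line counts in the new test (4, then 2, then 0 missing objects) follow from the blob sizes the test creates. The C sketch below reproduces that arithmetic. It assumes that `blob:limit=<n>` omits blobs whose size is at least n bytes, and that the range `main..origin/main` contains only the six new commits, so the only missing objects are versions of file.4.txt. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Size of the file.4.txt blob for commit "mod <x>": the echoed line
 * "Mod file.4.txt <x>\n" is 17 bytes; printf "%10000s" X appends exactly
 * 10000 bytes for a and b, and "%20000s" appends 20000 bytes for c and d.
 */
static size_t file4_size(char x)
{
	size_t size = strlen("Mod file.4.txt a\n"); /* 17 bytes */

	if (x == 'a' || x == 'b')
		size += 10000;
	if (x == 'c' || x == 'd')
		size += 20000;
	return size;
}

/* Blobs omitted by --filter=blob:limit=<limit> across commits a..f. */
static int missing_blobs(size_t limit)
{
	int missing = 0;

	for (const char *x = "abcdef"; *x; x++)
		if (file4_size(*x) >= limit)
			missing++;
	return missing;
}
```

With a limit of 999, the four padded versions (10017 and 20017 bytes) are omitted. With a limit of 19999, only the 20017-byte versions are omitted. With a limit of 29999, nothing is omitted.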

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 t/t5616-partial-clone.sh | 52 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 34469b6ac10..87ebf4b0b1c 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -166,6 +166,56 @@ test_expect_success 'manual prefetch of missing objects' '
 	test_line_count = 0 observed.oids
 '
 
+# create new commits in "src" repo to establish a history on file.4.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.4.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.4.txt $x" >src/file.4.txt &&
+		if list_contains "a,b" "$x"; then
+			printf "%10000s" X >>src/file.4.txt
+		fi &&
+		if list_contains "c,d" "$x"; then
+			printf "%20000s" X >>src/file.4.txt
+		fi &&
+		git -C src add file.4.txt &&
+		git -C src commit -m "mod $x" || return 1
+	done &&
+	git -C src push -u srv main
+'
+
+# Do partial fetch to fetch smaller files; then verify that without --refetch
+# applying a new filter does not refetch missing large objects. Then use
+# --refetch to apply the new filter on existing commits. Test it under both
+# protocol v2 & v0.
+test_expect_success 'apply a different filter using --refetch' '
+	git -C pc1 fetch --filter=blob:limit=999 origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 4 observed &&
+
+	git -C pc1 fetch --filter=blob:limit=19999 --refetch origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 2 observed &&
+
+	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
+		--refetch origin &&
+	git -C pc1 rev-list --quiet --objects --missing=print \
+		main..origin/main >observed &&
+	test_line_count = 0 observed
+'
+
+test_expect_success 'fetch --refetch works with a shallow clone' '
+	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed &&
+
+	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --refetch origin &&
+	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
+	test_line_count = 6 observed
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
@@ -225,7 +275,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 
 	# Auto-fetch all remaining trees and blobs with --missing=error
 	git -C dst rev-list --missing=error --objects main >fetched_objects &&
-	test_line_count = 70 fetched_objects &&
+	test_line_count = 88 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
-- 
gitgitgadget


* [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-03-28 14:02       ` [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-31 15:22         ` Ævar Arnfjörð Bjarmason
  2022-03-28 14:02       ` [PATCH v4 7/7] docs: mention --refetch fetch option Robert Coup via GitGitGadget
  6 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

After invoking `fetch --refetch`, the object db will likely contain many
duplicate objects. If auto-maintenance is enabled, invoke it with
appropriate settings to encourage repacking/consolidation.

* gc.autoPackLimit: unless this is set to 0 (disabled), override the
  value to 1 to force pack consolidation.
* maintenance.incremental-repack.auto: unless this is set to 0, override
  the value to -1 to force incremental repacking.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/fetch-options.txt |  3 ++-
 builtin/fetch.c                 | 19 ++++++++++++++++++-
 t/t5616-partial-clone.sh        | 29 +++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index d03fce5aae0..622bd84768b 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -169,7 +169,8 @@ ifndef::git-pull[]
 	associated objects that are already present locally, this option fetches
 	all objects as a fresh clone would. Use this to reapply a partial clone
 	filter from configuration or using `--filter=` when the filter
-	definition has changed.
+	definition has changed. Automatic post-fetch maintenance will perform
+	object database pack consolidation to remove any duplicate objects.
 endif::git-pull[]
 
 --refmap=<refspec>::
diff --git a/builtin/fetch.c b/builtin/fetch.c
index e391a5dbc55..e3791f09ed5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2306,8 +2306,25 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 					     NULL);
 	}
 
-	if (enable_auto_gc)
+	if (enable_auto_gc) {
+		if (refetch) {
+			/*
+			 * Hint auto-maintenance strongly to encourage repacking,
+			 * but respect config settings disabling it.
+			 */
+			int opt_val;
+			if (git_config_get_int("gc.autopacklimit", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("gc.autoPackLimit=1");
+
+			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
+				opt_val = -1;
+			if (opt_val != 0)
+				git_config_push_parameter("maintenance.incremental-repack.auto=-1");
+		}
 		run_auto_maintenance(verbosity < 0);
+	}
 
  cleanup:
 	string_list_clear(&list, 0);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 87ebf4b0b1c..4a3778d04a8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -216,6 +216,35 @@ test_expect_success 'fetch --refetch works with a shallow clone' '
 	test_line_count = 6 observed
 '
 
+test_expect_success 'fetch --refetch triggers repacking' '
+	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
+	export GIT_TRACE2_CONFIG_PARAMS &&
+
+	GIT_TRACE2_EVENT="$PWD/trace1.event" \
+	git -C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace1.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace2.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=0 \
+		-c maintenance.incremental-repack.auto=1234 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace2.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event &&
+
+	GIT_TRACE2_EVENT="$PWD/trace3.event" \
+	git -c protocol.version=0 \
+		-c gc.autoPackLimit=1234 \
+		-c maintenance.incremental-repack.auto=0 \
+		-C pc1 fetch --refetch origin &&
+	test_subcommand git maintenance run --auto --no-quiet <trace3.event &&
+	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event &&
+	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event
+'
+
 test_expect_success 'partial clone with transfer.fsckobjects=1 works with submodules' '
 	test_create_repo submodule &&
 	test_commit -C submodule mycommit &&
-- 
gitgitgadget


* [PATCH v4 7/7] docs: mention --refetch fetch option
  2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-03-28 14:02       ` [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
@ 2022-03-28 14:02       ` Robert Coup via GitGitGadget
  2022-03-28 17:38         ` Junio C Hamano
  6 siblings, 1 reply; 76+ messages in thread
From: Robert Coup via GitGitGadget @ 2022-03-28 14:02 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Ævar Arnfjörð Bjarmason,
	Calvin Wan, Robert Coup, Robert Coup

From: Robert Coup <robert@coup.net.nz>

Document it for partial clones as a means to apply a new filter, and
reference it from the remote.<name>.partialclonefilter config parameter.

Signed-off-by: Robert Coup <robert@coup.net.nz>
---
 Documentation/config/remote.txt           | 6 ++++--
 Documentation/technical/partial-clone.txt | 3 +++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/remote.txt b/Documentation/config/remote.txt
index a8e6437a903..0678b4bcfef 100644
--- a/Documentation/config/remote.txt
+++ b/Documentation/config/remote.txt
@@ -82,5 +82,7 @@ remote.<name>.promisor::
 	objects.
 
 remote.<name>.partialclonefilter::
-	The filter that will be applied when fetching from this
-	promisor remote.
+	The filter that will be applied when fetching from this	promisor remote.
+	Changing or clearing this value will only affect fetches for new commits.
+	To fetch associated objects for commits already present in the local object
+	database, use the `--refetch` option of linkgit:git-fetch[1].
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index a0dd7c66f24..99f0eb30406 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -181,6 +181,9 @@ Fetching Missing Objects
   currently fetches all objects referred to by the requested objects, even
   though they are not necessary.
 
+- Fetching with `--refetch` will request a complete new filtered packfile from
+  the remote, which can be used to change a filter without needing to
+  dynamically fetch missing objects.
 
 Using many promisor remotes
 ---------------------------
-- 
gitgitgadget

* Re: [PATCH v4 7/7] docs: mention --refetch fetch option
  2022-03-28 14:02       ` [PATCH v4 7/7] docs: mention --refetch fetch option Robert Coup via GitGitGadget
@ 2022-03-28 17:38         ` Junio C Hamano
  0 siblings, 0 replies; 76+ messages in thread
From: Junio C Hamano @ 2022-03-28 17:38 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason, Calvin Wan, Robert Coup

"Robert Coup via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  remote.<name>.partialclonefilter::
> -	The filter that will be applied when fetching from this
> -	promisor remote.
> +	The filter that will be applied when fetching from this	promisor remote.
> +	Changing or clearing this value will only affect fetches for new commits.
> +	To fetch associated objects for commits already present in the local object
> +	database, use the `--refetch` option of linkgit:git-fetch[1].

Good advice to add.

Will replace.  I think we've seen this topic enough times and it
looked reasonably well done.  Let's mark it for 'next' unless we
hear objections soonish.

* Re: [PATCH v4 2/7] fetch-pack: add refetch
  2022-03-28 14:02       ` [PATCH v4 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
@ 2022-03-31 15:09         ` Ævar Arnfjörð Bjarmason
  2022-04-01 10:26           ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-31 15:09 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Calvin Wan, Robert Coup


On Mon, Mar 28 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> Allow a "refetch" where the contents of the local object store are
> ignored and a full fetch is performed, not attempting to find or
> negotiate common commits with the remote.
>
> A key use case is to apply a new partial clone blob/tree filter and
> refetch all the associated matching content, which would otherwise not
> be transferred when the commit objects are already present locally.

FWIW it's not clear to me re earlier comments on earlier iterations
whether the "noop fetch" is per-se wanted for this feature for some
reason, or if it's just much easier to implement than doing what I
suggested in:
https://lore.kernel.org/git/220228.86ee3m39jf.gmgdl@evledraar.gmail.com/

I don't think such a thing should hold this series up, but as it would
be a bit kinder to servers I think it's worth at least noting in the
commit message what's desired per se here, vs. what's just needed for
the convenience of implementation.

I.e. when this series was in an earlier iteration the scope was to
repair repository corruption, which, as I pointed out to you, it really
couldn't do without wider changes to the object store management,
and at that point having it be NOOP definitely makes sense. The object
lookups etc. take shortcuts that "fsck" wouldn't do, so we could be
negotiating on the basis of corrupt content.

But now that it's a "fetch what's missing" wouldn't it make more sense
to descend from our otherwise-negotiated tips, and find the OIDs that
are "complete", if any, and negotiate with those?

Which again, I think it's fine to say "yeah, that would be ideal, but
this is easier". I'm just checking if I'm missing some subtlety here...

> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  fetch-pack.c | 46 +++++++++++++++++++++++++++++-----------------
>  fetch-pack.h |  1 +
>  2 files changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 87657907e78..4e1e88eea09 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -312,19 +312,21 @@ static int find_common(struct fetch_negotiator *negotiator,
>  		const char *remote_hex;
>  		struct object *o;
>  
> -		/*
> -		 * If that object is complete (i.e. it is an ancestor of a
> -		 * local ref), we tell them we have it but do not have to
> -		 * tell them about its ancestors, which they already know
> -		 * about.
> -		 *
> -		 * We use lookup_object here because we are only
> -		 * interested in the case we *know* the object is
> -		 * reachable and we have already scanned it.
> -		 */
> -		if (((o = lookup_object(the_repository, remote)) != NULL) &&
> -				(o->flags & COMPLETE)) {
> -			continue;
> +		if (!args->refetch) {
> +			/*
> +			* If that object is complete (i.e. it is an ancestor of a
> +			* local ref), we tell them we have it but do not have to
> +			* tell them about its ancestors, which they already know
> +			* about.
> +			*
> +			* We use lookup_object here because we are only
> +			* interested in the case we *know* the object is
> +			* reachable and we have already scanned it.
> +			*/
> +			if (((o = lookup_object(the_repository, remote)) != NULL) &&
> +					(o->flags & COMPLETE)) {
> +				continue;
> +			}

nit: remove the {} here
nit: this double-indented if can just be one if-statement
nit: don't compare against NULL, use ! instead.
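Applying the three nits to the hunk above might look like the following minimal C sketch. `struct object`, the `COMPLETE` bit and `lookup_stub()` are simplified stand-ins (hypothetical, not git's real definitions), and `count_wanted()` only models the "skip refs we already have completely" step of find_common():

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for git's types; these are NOT git's definitions. */
#define COMPLETE (1u << 0)

struct object {
	unsigned int flags;
};

struct fetch_args {
	int refetch;
};

/* Hypothetical replacement for lookup_object(): NULL means "not parsed yet". */
static struct object *lookup_stub(struct object **known, size_t i)
{
	return known[i];
}

/* Count the remote refs that would still be sent as "want" lines. */
static int count_wanted(const struct fetch_args *args,
			struct object **known, size_t nr)
{
	size_t i;
	int wanted = 0;

	for (i = 0; i < nr; i++) {
		struct object *o;

		/* One if-statement, no braces, no explicit NULL comparison. */
		if (!args->refetch &&
		    (o = lookup_stub(known, i)) &&
		    (o->flags & COMPLETE))
			continue;
		wanted++;
	}
	return wanted;
}
```

Because `!args->refetch` short-circuits the whole condition, a refetch wants every ref, which is the same behaviour the patch gets from its nested block.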
>  		}
>  
>  		remote_hex = oid_to_hex(remote);
> @@ -692,6 +694,9 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	int old_save_commit_buffer = save_commit_buffer;
>  	timestamp_t cutoff = 0;
>  
> +	if (args->refetch)
> +		return;
> +
>  	save_commit_buffer = 0;

nit: This function has only two callers, perhaps it's clearer to do
this "early abort" in those calls?

>  	trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
> @@ -1028,7 +1033,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  	struct fetch_negotiator *negotiator;
>  
>  	negotiator = &negotiator_alloc;
> -	fetch_negotiator_init(r, negotiator);
> +	if (args->refetch) {
> +		fetch_negotiator_init_noop(negotiator);
> +	} else {
> +		fetch_negotiator_init(r, negotiator);
> +	}

More needless {}

>  	sort_ref_list(&ref, ref_compare_name);
>  	QSORT(sought, nr_sought, cmp_ref_by_name);
> @@ -1121,7 +1130,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  
>  	mark_complete_and_common_ref(negotiator, args, &ref);
>  	filter_refs(args, &ref, sought, nr_sought);
> -	if (everything_local(args, &ref)) {
> +	if (!args->refetch && everything_local(args, &ref)) {
>  		packet_flush(fd[1]);
>  		goto all_done;
>  	}

Here everything_local() is doing what I suggested for
mark_complete_and_common_ref() above, i.e. we check args->refetch first.

> @@ -1587,7 +1596,10 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  	struct strvec index_pack_args = STRVEC_INIT;
>  
>  	negotiator = &negotiator_alloc;
> -	fetch_negotiator_init(r, negotiator);
> +	if (args->refetch)
> +		fetch_negotiator_init_noop(negotiator);
> +	else
> +		fetch_negotiator_init(r, negotiator);

This one doesn't have braces (good), unlike do_fetch_pack()

* Re: [PATCH v4 4/7] fetch: add --refetch option
  2022-03-28 14:02       ` [PATCH v4 4/7] fetch: " Robert Coup via GitGitGadget
@ 2022-03-31 15:18         ` Ævar Arnfjörð Bjarmason
  2022-04-01 10:31           ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-31 15:18 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Calvin Wan, Robert Coup


On Mon, Mar 28 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> Teach fetch and transports the --refetch option to force a full fetch
> without negotiating common commits with the remote. Use when applying a
> new partial clone filter to refetch all matching objects.
>
> [...]
> +ifndef::git-pull[]
> +--refetch::
> +	Instead of negotiating with the server to avoid transferring commits and
> +	associated objects that are already present locally, this option fetches
> +	all objects as a fresh clone would. Use this to reapply a partial clone
> +	filter from configuration or using `--filter=` when the filter
> +	definition has changed.
> +endif::git-pull[]

Re my comment on negotiation specifics in 2/7, this documentation is
really over-promising depending on what the answer to that is:
https://lore.kernel.org/git/220331.86o81mp2w1.gmgdl@evledraar.gmail.com/

I.e. instead of saying that we WILL fetch all objects "just like a
clone" shouldn't we have less focus on implementation details here, and
assure the user that we'll make their object store "complete" as though
--filter hadn't been used, without going into the specifics of what'll
happen over the wire?
>  		return -1;
>  
> +	/*
> +	 * Similarly, if we need to refetch, we always want to perform a full
> +	 * fetch ignoring existing objects.
> +	 */
> +	if (refetch)
> +		return -1;
> +
> +

One too many \n here.

* Re: [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch
  2022-03-28 14:02       ` [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
@ 2022-03-31 15:20         ` Ævar Arnfjörð Bjarmason
  2022-04-01 10:36           ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-31 15:20 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Calvin Wan, Robert Coup


On Mon, Mar 28 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> Add a test for doing a refetch to apply a changed partial clone filter
> under protocol v0 and v2.
>
> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  t/t5616-partial-clone.sh | 52 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 34469b6ac10..87ebf4b0b1c 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -166,6 +166,56 @@ test_expect_success 'manual prefetch of missing objects' '
>  	test_line_count = 0 observed.oids
>  '
>  
> +# create new commits in "src" repo to establish a history on file.4.txt
> +# and push to "srv.bare".
> +test_expect_success 'push new commits to server for file.4.txt' '
> +	for x in a b c d e f
> +	do
> +		echo "Mod file.4.txt $x" >src/file.4.txt &&
> +		if list_contains "a,b" "$x"; then
> +			printf "%10000s" X >>src/file.4.txt
> +		fi &&
> +		if list_contains "c,d" "$x"; then
> +			printf "%20000s" X >>src/file.4.txt
> +		fi &&
> +		git -C src add file.4.txt &&
> +		git -C src commit -m "mod $x" || return 1
> +	done &&
> +	git -C src push -u srv main
> +'
> +
> +# Do partial fetch to fetch smaller files; then verify that without --refetch
> +# applying a new filter does not refetch missing large objects. Then use
> +# --refetch to apply the new filter on existing commits. Test it under both
> +# protocol v2 & v0.
> +test_expect_success 'apply a different filter using --refetch' '
> +	git -C pc1 fetch --filter=blob:limit=999 origin &&
> +	git -C pc1 rev-list --quiet --objects --missing=print \
> +		main..origin/main >observed &&
> +	test_line_count = 4 observed &&
> +
> +	git -C pc1 fetch --filter=blob:limit=19999 --refetch origin &&

Is 19999 just "arbitrary big number" here?

> +	git -C pc1 rev-list --quiet --objects --missing=print \
> +		main..origin/main >observed &&
> +	test_line_count = 2 observed &&
> +
> +	git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
> +		--refetch origin &&
> +	git -C pc1 rev-list --quiet --objects --missing=print \
> +		main..origin/main >observed &&
> +	test_line_count = 0 observed

Does this test_line_count *really* want to be = 0, or does this mean
test_must_be_empty?

I.e. are we expecting content here, just not ending in a \n, or nothing
at all?
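To illustrate the distinction being drawn, here is a minimal C sketch. It assumes `test_line_count` counts lines the way `wc -l` does (by counting newline bytes) and that `test_must_be_empty` requires a zero-byte file; the helper names are hypothetical. A non-empty output without a trailing newline passes a "0 lines" check but fails an emptiness check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes, which is what `wc -l` reports as the line count. */
static size_t count_newlines(const char *buf, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if (buf[i] == '\n')
			n++;
	return n;
}

/* The stricter check: the output must contain no bytes at all. */
static int is_empty(size_t len)
{
	return len == 0;
}
```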

> +'
> +
> +test_expect_success 'fetch --refetch works with a shallow clone' '
> +	git clone --no-checkout --depth=1 --filter=blob:none "file://$(pwd)/srv.bare" pc1s &&
> +	git -C pc1s rev-list --objects --missing=print HEAD >observed &&
> +	test_line_count = 6 observed &&
> +
> +	GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --refetch origin &&

Why the GIT_TRACE=1? Seems to not be used.

* Re: [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking
  2022-03-28 14:02       ` [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
@ 2022-03-31 15:22         ` Ævar Arnfjörð Bjarmason
  2022-04-01 10:51           ` Robert Coup
  0 siblings, 1 reply; 76+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-31 15:22 UTC (permalink / raw)
  To: Robert Coup via GitGitGadget
  Cc: git, Jonathan Tan, John Cai, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Calvin Wan, Robert Coup


On Mon, Mar 28 2022, Robert Coup via GitGitGadget wrote:

> From: Robert Coup <robert@coup.net.nz>
>
> After invoking `fetch --refetch`, the object db will likely contain many
> duplicate objects. If auto-maintenance is enabled, invoke it with
> appropriate settings to encourage repacking/consolidation.
>
> * gc.autoPackLimit: unless this is set to 0 (disabled), override the
>   value to 1 to force pack consolidation.
> * maintenance.incremental-repack.auto: unless this is set to 0, override
>   the value to -1 to force incremental repacking.
>
> Signed-off-by: Robert Coup <robert@coup.net.nz>
> ---
>  Documentation/fetch-options.txt |  3 ++-
>  builtin/fetch.c                 | 19 ++++++++++++++++++-
>  t/t5616-partial-clone.sh        | 29 +++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index d03fce5aae0..622bd84768b 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -169,7 +169,8 @@ ifndef::git-pull[]
>  	associated objects that are already present locally, this option fetches
>  	all objects as a fresh clone would. Use this to reapply a partial clone
>  	filter from configuration or using `--filter=` when the filter
> -	definition has changed.
> +	definition has changed. Automatic post-fetch maintenance will perform
> +	object database pack consolidation to remove any duplicate objects.
>  endif::git-pull[]
>  
>  --refmap=<refspec>::
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e391a5dbc55..e3791f09ed5 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2306,8 +2306,25 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  					     NULL);
>  	}
>  
> -	if (enable_auto_gc)
> +	if (enable_auto_gc) {
> +		if (refetch) {
> +			/*
> +			 * Hint auto-maintenance strongly to encourage repacking,
> +			 * but respect config settings disabling it.
> +			 */
> +			int opt_val;

nit: add a \n after this.

> +			if (git_config_get_int("gc.autopacklimit", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)

nit: don't compare against 0 or NULL, just use !opt_val

Isn't this whole thing also clearer as:

	int forget;

	if (git_conf...(..., &forget))
		git_config_push_parameter("gc.autoPackLimit=1");

Maybe I haven't eyeballed this enough, but aren't you ignoring explicit
gc.autoPackLimit=0 configuration? Whereas what you seem to want is "set
this config unless the user has it set", for which we only need to
check the git_config...(...) return value, no?
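The two policies can be compared with a small C model (hypothetical helpers, not git code). `present` stands for whether `git_config_get_int()` found the key, and `forced` is the override value (1 for gc.autoPackLimit, -1 for maintenance.incremental-repack.auto). In this model the patch keeps an explicit 0, so it does respect gc.autoPackLimit=0; where the two differ is an explicit non-zero value such as 1234, which the patch overrides and the suggested variant would keep:

```c
#include <assert.h>

/*
 * Value the auto-maintenance child would see under the v4 patch:
 * an unset key is treated as -1, and anything other than 0 (the
 * "disabled" setting) is replaced by the forced value.
 */
static int effective_patch(int present, int user_val, int forced)
{
	int opt_val = present ? user_val : -1;

	if (opt_val != 0)
		return forced;
	return 0;
}

/*
 * Value under the suggested variant: push the override only when
 * the key is not configured at all, otherwise keep the user's value.
 */
static int effective_suggested(int present, int user_val, int forced)
{
	if (!present)
		return forced;
	return user_val;
}
```

The 1234 cases in the 'fetch --refetch triggers repacking' test expect the patch's behaviour (gc.autopacklimit=1 and maintenance.incremental-repack.auto=-1 in the trace), so switching to the suggested shape would also change those expectations.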

> +				git_config_push_parameter("gc.autoPackLimit=1");
> +
> +			if (git_config_get_int("maintenance.incremental-repack.auto", &opt_val))
> +				opt_val = -1;
> +			if (opt_val != 0)
> +				git_config_push_parameter("maintenance.incremental-repack.auto=-1");

hrm, do we really need to set both of these, these days (not saying we
don't, just surprised)? I.e. both gc.* and maintenance.* config.

*skims the code*

Urgh, yes? too_many_packs() seems to check gc.* only, but
incremental_repack_auto_condition() checks this variable... :(

> +test_expect_success 'fetch --refetch triggers repacking' '
> +	GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&

Nit: Can we use GIT_CONFIG_KEY_* et al for this these days, or do we
still need this trace2 thingy?

> +	export GIT_TRACE2_CONFIG_PARAMS &&
> +

* Re: [PATCH v4 2/7] fetch-pack: add refetch
  2022-03-31 15:09         ` Ævar Arnfjörð Bjarmason
@ 2022-04-01 10:26           ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-04-01 10:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee, Calvin Wan

Hi Ævar,

< just when I was thinking I'm done... ;-) >

On Thu, 31 Mar 2022 at 16:17, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> FWIW it's not clear to me re earlier comments on earlier iterations
> whether the "noop fetch" is per-se wanted for this feature for some
> reason, or if it's just much easier to implement than doing what I
> suggested in:
> https://lore.kernel.org/git/220228.86ee3m39jf.gmgdl@evledraar.gmail.com/

The noop-fetch is an implementation detail IMO, and can be improved
if/when someone is motivated later.

> I don't think such a thing should hold this series up, but as it would
> be a bit kinder to servers I think it's worth at least noting in the
> commit message what's desired per-se here, v.s. what's just needed for
> the convenience of implementation.

Yes, doing some per-commit negotiation is conceptually nicer. The
user-facing alternative at this point is basically running "clone" in
some form; or temporarily moving aside the object DB and running
fetch. Both those (and the new approach) all end up putting the same
load on the server.

I can note it in the commit message.

> I.e. when this series was in an earlier iteration the scope was to
> repair repository corruption

That was never _my_ motivation; I was a bit eager in reflecting some
comments I received suggesting it would be useful for that as well.

> But now that it's a "fetch what's missing" wouldn't it make more sense
> to descend from our otherwise-negotiated tips, and find the OIDs that
> are "complete", if any, and negotiate with those?
>
> Which again, I think it's fine to say "yeah, that would be ideal, but
> this is easier". I'm just checking if I'm missing some subtlety here...

Yeah, that would be ideal, but this is easier. AFAICS I'm not putting
in place any barriers to making it smarter later.

> nit: remove the {} here
> nit: this double-indented if can just be one if-statement
> nit: don't compare against NULL, use ! instead.

Is the common interpretation for nits "Do another re-roll"; "If you're
doing another re-roll"; or "For future reference"?

> nit: This function has only two callers, perhaps it's clearer to do do
> this "early abort" in those calls?

I discussed similar in
https://lore.kernel.org/git/CACf-nVePhtm_HAzAKzcap0E8kiyyEJPY_+N+bbPcYPVUkjweFg@mail.gmail.com/
but yes, I think you're right about mark_complete_and_common_ref().
Will see what it looks like.

Thanks, Rob.

* Re: [PATCH v4 4/7] fetch: add --refetch option
  2022-03-31 15:18         ` Ævar Arnfjörð Bjarmason
@ 2022-04-01 10:31           ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-04-01 10:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee, Calvin Wan

Hi Ævar,

On Thu, 31 Mar 2022 at 16:20, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > +--refetch::
> > +     Instead of negotiating with the server to avoid transferring commits and
> > +     associated objects that are already present locally, this option fetches
> > +     all objects as a fresh clone would. Use this to reapply a partial clone
> > +     filter from configuration or using `--filter=` when the filter
> > +     definition has changed.
>
> Re my comment on negotiation specifics in 2/7, this documentation is
> really over-promising depending on what the answer to that is:
> https://lore.kernel.org/git/220331.86o81mp2w1.gmgdl@evledraar.gmail.com/
>
> I.e. instead of saying that we WILL fetch all objects "just like a
> clone" shouldn't we have less focus on implementation details here, and
> assure the user that we'll make their object store "complete" as though
> --filter hadn't been used, without going into the specifics of what'll
> happen over the wire?

That's not quite the case though: we'll make their object db complete
as if a fresh clone with any specified --filter (including none) from
that remote has been done.

IMO explaining what will happen and what to expect (particularly as this is
_not_ transferring some sort of object-delta-set, and clone transfers can
be big) is a good thing. If it improves later, then change the docs.

Alternatively, rewording along the lines of "it will fix up your
object db to match a changed filter" is implying that only the missing
bits will be fetched.

Thanks, Rob.

* Re: [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch
  2022-03-31 15:20         ` Ævar Arnfjörð Bjarmason
@ 2022-04-01 10:36           ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-04-01 10:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee, Calvin Wan

Hi,

On Thu, 31 Mar 2022 at 16:22, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>

> > +
> > +     git -C pc1 fetch --filter=blob:limit=19999 --refetch origin &&
>
> Is 19999 just "arbitrary big number" here?

A selected big number so we can change it and have different objects match.
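The expected counts of 4, 2 and 0 missing objects follow from the blob sizes the setup test creates. A small C sketch of that arithmetic, assuming `blob:limit=<n>` omits blobs of at least <n> bytes, that each of the six commits on file.4.txt has a distinct blob, and that a blob filter never omits trees or commits (the helper names are illustrative only):

```c
#include <assert.h>
#include <string.h>

/*
 * Blob sizes produced by the 'push new commits to server for file.4.txt'
 * test: `echo "Mod file.4.txt $x"` writes 17 bytes (16 characters plus a
 * newline), and `printf "%Ns" X` appends exactly N more bytes.
 */
static const int padding[6] = { 10000, 10000, 20000, 20000, 0, 0 }; /* a..f */

static int blob_size(int i)
{
	return (int)strlen("Mod file.4.txt a\n") + padding[i];
}

/* Count blobs a blob:limit=<limit> filter would leave missing locally. */
static int missing_after(int limit)
{
	int i, missing = 0;

	for (i = 0; i < 6; i++)
		if (blob_size(i) >= limit)
			missing++;
	return missing;
}
```

Since no blob size (17, 10017 or 20017 bytes) lands exactly on 999, 19999 or 29999, the result is the same whether the limit boundary is inclusive or exclusive.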

>
> > +     git -C pc1 rev-list --quiet --objects --missing=print \
> > +             main..origin/main >observed &&
> > +     test_line_count = 2 observed &&
> > +
> > +     git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 \
> > +             --refetch origin &&
> > +     git -C pc1 rev-list --quiet --objects --missing=print \
> > +             main..origin/main >observed &&
> > +     test_line_count = 0 observed
>
> Does this test_line_count *really* want to be = 0, or does this mean
> test_must_be_empty?
>
> I.e. are we expecting content here, just not ending in a \n, or nothing
> at all?

I'm expecting no missing objects after a refetch with a new filter
that matches all the objects in the repo.

> > +     GIT_TRACE=1 git -C pc1s fetch --filter=blob:limit=999 --refetch origin &&
>
> Why the GIT_TRACE=1? Seems to not be used.

Ah, extraneous debugging on my part.

Thanks, Rob :)


* Re: [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking
  2022-03-31 15:22         ` Ævar Arnfjörð Bjarmason
@ 2022-04-01 10:51           ` Robert Coup
  0 siblings, 0 replies; 76+ messages in thread
From: Robert Coup @ 2022-04-01 10:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Robert Coup via GitGitGadget, git, Jonathan Tan, John Cai,
	Jeff Hostetler, Junio C Hamano, Derrick Stolee, Calvin Wan

Hi Ævar,

On Thu, 31 Mar 2022 at 16:33, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:


> > +                     if (git_config_get_int("gc.autopacklimit", &opt_val))
> > +                             opt_val = -1;
> > +                     if (opt_val != 0)
>
> nit: don't compare against 0 or null,  just !opt_val

I did this since 0 has a specific meaning ("Setting this to 0
disables"); it's not just falsy in this context. Tomayto, tomahto?

>
> Isn't this whole thing also clearer as:
>
>         int forget;
>
>         if (git_conf...(..., &forget))
>                 git_config_push_parameter("gc.autoPackLimit=1");
>
> Maybe I haven't eyeballed this enough, but aren't you ignoring explicit
> gc.autoPackLimit=0 configuration? Whereas what you seem to want is "set
> this config unless the user has it set", for which we only need to
> check the git_config...(...) return value, no?

What I'm trying to achieve: if the user has not disabled auto-packing
(autoPackLimit=0), then pass autoPackLimit=1 to the subprocess to
encourage repacking.
Context/why: so that we don't double the size of the object store and
then not even attempt to repack it now, leaving that to some unspecified
point in the future, if at all.

How the code achieves it:
  load autoPackLimit into opt_val
  if autoPackLimit is not specified in config: set opt_val to -1
  if opt_val is not 0: pass autoPackLimit=1 to the subprocess

AFAICT if we just if(git_config_get_int()) then if they haven't set it
at all in config, we wouldn't encourage repacking in the subprocess.
Which isn't what I'm trying to achieve.

> hrm, do we really need to set both of these these days (not saying we
> don't, just surprised). I.e. both gc.* and maintenance.* config.
>
> *skims the code*
>
> Urgh, yes? too_many_packs() seems to check gc.* only, but
> incremental_repack_auto_condition() checks this variable... :(

Yes.

>
> > +test_expect_success 'fetch --refetch triggers repacking' '
> > +     GIT_TRACE2_CONFIG_PARAMS=gc.autoPackLimit,maintenance.incremental-repack.auto &&
>
> Nit: Can we use GIT_CONFIG_KEY_* et al for this these days, or do we
> still need this trace2 thingy?

I copied a pattern that existing tests use.

Thanks, Rob.


end of thread

Thread overview: 76+ messages
2022-02-01 15:49 [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 1/6] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 2/6] fetch-pack: add partial clone refiltering Robert Coup via GitGitGadget
2022-02-04 18:02   ` Jonathan Tan
2022-02-11 14:56     ` Robert Coup
2022-02-17  0:05       ` Jonathan Tan
2022-02-01 15:49 ` [PATCH 3/6] builtin/fetch-pack: add --refilter option Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 4/6] fetch: " Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 5/6] t5615-partial-clone: add test for --refilter Robert Coup via GitGitGadget
2022-02-01 15:49 ` [PATCH 6/6] doc/partial-clone: mention --refilter option Robert Coup via GitGitGadget
2022-02-01 20:13 ` [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter Junio C Hamano
2022-02-02 15:02   ` Robert Coup
2022-02-16 13:24     ` Robert Coup
2022-02-02 18:59 ` Jonathan Tan
2022-02-02 21:58   ` Robert Coup
2022-02-02 21:59     ` Robert Coup
2022-02-07 19:37 ` Jeff Hostetler
2022-02-24 16:13 ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 1/8] fetch-negotiator: add specific noop initializor Robert Coup via GitGitGadget
2022-02-25  6:19     ` Junio C Hamano
2022-02-28 12:22       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 2/8] fetch-pack: add repairing Robert Coup via GitGitGadget
2022-02-25  6:46     ` Junio C Hamano
2022-02-28 12:14       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 3/8] builtin/fetch-pack: add --repair option Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 4/8] fetch: " Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 5/8] t5615-partial-clone: add test for fetch --repair Robert Coup via GitGitGadget
2022-02-24 16:13   ` [PATCH v2 6/8] maintenance: add ability to pass config options Robert Coup via GitGitGadget
2022-02-25  6:57     ` Junio C Hamano
2022-02-28 12:02       ` Robert Coup
2022-02-28 17:07         ` Junio C Hamano
2022-02-25 10:29     ` Ævar Arnfjörð Bjarmason
2022-02-28 11:51       ` Robert Coup
2022-02-24 16:13   ` [PATCH v2 7/8] fetch: after repair, encourage auto gc repacking Robert Coup via GitGitGadget
2022-02-28 16:40     ` Ævar Arnfjörð Bjarmason
2022-02-24 16:13   ` [PATCH v2 8/8] doc/partial-clone: mention --repair fetch option Robert Coup via GitGitGadget
2022-02-28 16:43   ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
2022-02-28 17:27     ` Robert Coup
2022-02-28 18:54       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation Junio C Hamano
2022-02-28 22:20       ` [PATCH v2 0/8] fetch: add repair: full refetch without negotiation (was: "refiltering") Ævar Arnfjörð Bjarmason
2022-03-04 15:04   ` [PATCH v3 0/7] " Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 4/7] fetch: " Robert Coup via GitGitGadget
2022-03-04 21:19       ` Junio C Hamano
2022-03-07 11:31         ` Robert Coup
2022-03-07 17:27           ` Junio C Hamano
2022-03-09 10:00             ` Robert Coup
2022-03-04 15:04     ` [PATCH v3 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
2022-03-04 15:04     ` [PATCH v3 7/7] doc/partial-clone: mention --refetch fetch option Robert Coup via GitGitGadget
2022-03-09  0:27     ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Calvin Wan
2022-03-09  9:57       ` Robert Coup
2022-03-09 21:32         ` [PATCH v3 0/7] fetch: add repair: full refetch without negotiation Junio C Hamano
2022-03-10  1:07           ` Calvin Wan
2022-03-10 14:29           ` Robert Coup
2022-03-21 17:58             ` Calvin Wan
2022-03-21 21:34               ` Robert Coup
2022-03-28 14:02     ` [PATCH v4 0/7] fetch: add repair: full refetch without negotiation (was: "refiltering") Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 1/7] fetch-negotiator: add specific noop initializer Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 2/7] fetch-pack: add refetch Robert Coup via GitGitGadget
2022-03-31 15:09         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:26           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 3/7] builtin/fetch-pack: add --refetch option Robert Coup via GitGitGadget
2022-03-28 14:02       ` [PATCH v4 4/7] fetch: " Robert Coup via GitGitGadget
2022-03-31 15:18         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:31           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 5/7] t5615-partial-clone: add test for fetch --refetch Robert Coup via GitGitGadget
2022-03-31 15:20         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:36           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 6/7] fetch: after refetch, encourage auto gc repacking Robert Coup via GitGitGadget
2022-03-31 15:22         ` Ævar Arnfjörð Bjarmason
2022-04-01 10:51           ` Robert Coup
2022-03-28 14:02       ` [PATCH v4 7/7] docs: mention --refetch fetch option Robert Coup via GitGitGadget
2022-03-28 17:38         ` Junio C Hamano
