* [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists
@ 2022-08-22 15:12 Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git; +Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>',
where <uri> is a URI to a bundle file. This series adds the capability of
downloading and parsing a bundle list and then downloading the URIs in that
list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way:

 1. Patches 1-2 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 2. Patches 3-4 teach Git to parse "key=value" lines to construct a bundle
    list, and add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 3. Patch 5 teaches Git to parse Git config files into bundle lists.

 4. Patches 6-7 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists.
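
The depth limit in item 4 can be illustrated with a minimal sketch. It is
not the code from patches 6-7: the toy_node struct, count_bundles() and the
MAX_LIST_DEPTH value are all hypothetical, and the "download" is replaced by
walking an in-memory tree. It only shows why a constant depth bound is
enough to stop a list that (directly or indirectly) refers to itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit; the series defines its own constant. */
#define MAX_LIST_DEPTH 4

/* Toy stand-in for a downloaded resource: a bundle file or a bundle list. */
struct toy_node {
	int is_list;                   /* 0: bundle file, 1: bundle list */
	const struct toy_node **items; /* children, used when is_list is set */
	size_t nr;
};

/*
 * Count the bundle files reachable from 'node'. Returns -1 as soon as
 * the nesting reaches MAX_LIST_DEPTH, which also terminates cycles.
 */
int count_bundles(const struct toy_node *node, int depth)
{
	int total = 0;
	size_t i;

	assert(node);
	if (!node->is_list)
		return 1;
	if (depth >= MAX_LIST_DEPTH)
		return -1;

	for (i = 0; i < node->nr; i++) {
		int n = count_bundles(node->items[i], depth + 1);

		if (n < 0)
			return -1;
		total += n;
	}
	return total;
}
```

A caller would start with count_bundles(&root, 0). A list that contains
itself never reaches a leaf, so the depth check is the only thing that ends
the recursion.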

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a single
bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22

Thanks,

-Stolee

Derrick Stolee (5):
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: fetch a list of bundles

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  22 ++
 Makefile                        |   1 +
 bundle-uri.c                    | 442 +++++++++++++++++++++++++++++++-
 bundle-uri.h                    |  98 ++++++-
 t/helper/test-bundle-uri.c      |  90 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     |  93 +++++++
 t/t5750-bundle-uri-parse.sh     | 141 ++++++++++
 t/test-lib-functions.sh         |  11 +
 11 files changed, 889 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1333
-- 
gitgitgadget


* [PATCH 1/7] bundle-uri: create bundle_list struct and helpers
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-08-22 17:57   ` Junio C Hamano
  2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare for a user to supply a single bundle URI and expect
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant for now, but future modifications to the format
and these data structures may impose an ordering on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.
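
To make the shape of this API concrete, here is a simplified sketch. It is
not the code in this patch: it swaps Git's hashmap for a plain array, and
every toy_* name is hypothetical. It keeps the two ideas described above:
an iterator whose callback can stop the walk by returning non-zero, and a
mode that decides whether a client needs every item or just one.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_MODE_NONE = 0, TOY_MODE_ALL, TOY_MODE_ANY };

struct toy_bundle {
	const char *id;  /* unique within the list */
	const char *uri; /* where the bundle data lives */
};

struct toy_list {
	int version;
	enum toy_mode mode;
	const struct toy_bundle *bundles; /* array instead of a hashmap */
	size_t nr;
};

typedef int (*toy_iterator)(const struct toy_bundle *bundle, void *data);

/* Visit each bundle; a non-zero callback result ends the walk early. */
int toy_for_all(const struct toy_list *list, toy_iterator iter, void *data)
{
	size_t i;

	assert(list && iter);
	for (i = 0; i < list->nr; i++) {
		int result = iter(&list->bundles[i], data);

		if (result)
			return result;
	}
	return 0;
}

/*
 * Number of downloads needed for a complete copy: one for "any",
 * every item otherwise (this sketch treats NONE like ALL).
 */
size_t toy_required_downloads(const struct toy_list *list)
{
	assert(list);
	if (!list->nr)
		return 0;
	return list->mode == TOY_MODE_ANY ? 1 : list->nr;
}

/* Example callback: count the bundles visited via a size_t in 'data'. */
int toy_count(const struct toy_bundle *bundle, void *data)
{
	(void)bundle;
	(*(size_t *)data)++;
	return 0;
}
```

The real structure keys the items by 'id' in a hashtable, so lookups by id
do not need the linear scan an array would require.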

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..ceeef0b6641 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,67 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	free(bundle->id);
+	free(bundle->uri);
+	strbuf_release(&bundle->file);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static int find_temp_filename(struct strbuf *name)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..6692aa4b170 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,72 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * an empty string.
+	 */
+	struct strbuf file;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { \
+	.file = STRBUF_INIT, \
+}
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget



* [PATCH 2/7] bundle-uri: create base key-value pair parsing
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-08-22 18:20   ` Junio C Hamano
                     ` (2 more replies)
  2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This is
  currently expected to be an absolute URI, but the requirement will be
  relaxed to allow relative URIs in the future.
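
The key layout above can be sketched as a small standalone splitter. This
is an illustration only, not bundle_list_update(): the function name, its
return codes and the fixed-size id buffer are assumptions made to keep the
example self-contained. Like the patch, it treats the first '.' after the
"bundle." prefix as the end of "<id>".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a key of the form "bundle.<id>.<subkey>" (hypothetical helper).
 * Returns 0 for a per-bundle key: <id> is copied into 'id' and '*subkey'
 * points at the text after the second dot. Returns 1 for a global key
 * such as "bundle.version": 'id' is emptied and '*subkey' points at the
 * name after "bundle.". Returns -1 if the "bundle." prefix is missing or
 * <id> does not fit in 'id'.
 */
int split_bundle_key(const char *key, char *id, size_t id_size,
		     const char **subkey)
{
	static const char prefix[] = "bundle.";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *dot;
	size_t len;

	assert(key && id && id_size && subkey);
	if (strncmp(key, prefix, prefix_len))
		return -1;
	rest = key + prefix_len;

	dot = strchr(rest, '.');
	if (!dot) {
		id[0] = '\0';
		*subkey = rest;
		return 1;
	}

	len = (size_t)(dot - rest);
	if (len >= id_size)
		return -1;
	memcpy(id, rest, len);
	id[len] = '\0';
	*subkey = dot + 1;
	return 0;
}
```

For "bundle.one.uri" this yields the id "one" and the subkey "uri". For
"bundle.mode" it reports a global key named "mode".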

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 22 ++++++++++
 bundle-uri.c                    | 74 +++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)
 create mode 100644 Documentation/config/bundle.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..3515bfe38d1
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,22 @@
+bundle.*::
+	The `bundle.*` keys are used when communicating a list of bundle URIs.
+	See link:technical/bundle-uri.html[the bundle URI design document] for
+	more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index ceeef0b6641..ade7eccce39 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	const char *pkey, *dot;
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+
+	if (!skip_prefix(key, "bundle.", &pkey))
+		return 1;
+
+	dot = strchr(pkey, '.');
+	if (!dot) {
+		if (!strcmp(pkey, "version")) {
+			int version = atoi(value);
+			if (version != 1)
+				return 1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(pkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return 1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, pkey, dot - pkey);
+	dot++;
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		strbuf_init(&bundle->file, 0);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(dot, "uri")) {
+		free(bundle->uri);
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static int find_temp_filename(struct strbuf *name)
 {
 	int fd;
-- 
gitgitgadget



* [PATCH 3/7] bundle-uri: create "key=value" line parsing
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-08-22 19:17   ` Junio C Hamano
  2022-09-02 23:41   ` Josh Steadmon
  2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value" representing a
bundle list. Connect the API necessary for Git's transport to the
key-value pair parsing created in the previous change.
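
As a rough illustration of the line format, the following standalone sketch
splits a "key=value" line at the first '=' and rejects the same malformed
inputs that this patch reports as errors. The enum and function names are
hypothetical. Unlike bundle_uri_parse_line(), it does not copy the key or
update a bundle list; it only locates the two halves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum line_status {
	LINE_OK = 0,
	LINE_EMPTY,              /* "" */
	LINE_NO_EQUALS,          /* no '=' anywhere in the line */
	LINE_EMPTY_KEY_OR_VALUE  /* "=value" or "key=" */
};

/*
 * Locate the key and value in "key=value". On LINE_OK, '*key_len' is
 * the length of the key and '*value' points just past the first '='.
 * The outputs are left untouched on any other status.
 */
enum line_status split_key_value(const char *line, size_t *key_len,
				 const char **value)
{
	const char *equals;

	assert(line && key_len && value);
	if (!*line)
		return LINE_EMPTY;

	equals = strchr(line, '=');
	if (!equals)
		return LINE_NO_EQUALS;
	if (equals == line || !equals[1])
		return LINE_EMPTY_KEY_OR_VALUE;

	*key_len = (size_t)(equals - line);
	*value = equals + 1;
	return LINE_OK;
}
```

Because only the first '=' is significant, a URI value that itself contains
'=' (for example in a query string) stays intact in the value half.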

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 14 +++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index ade7eccce39..9a7d09349fe 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -301,3 +300,29 @@ cleanup:
 	strbuf_release(&filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index 6692aa4b170..f725c9796f7 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
-#endif
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
+#endif /* BUNDLE_URI_H */
-- 
gitgitgadget



* [PATCH 4/7] bundle-uri: unit test "key=value" parsing
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-08-22 15:12 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-09-01  2:56   ` Teng Long
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
stdin as a list of lines. These are fed into bundle_uri_parse_line() to
test how we construct a 'struct bundle_list' from that data. The list is
then output to stdout as if the key-value pairs were a Git config file.
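
The output shape can be sketched with a small formatter that writes one
bundle as a config section. This is a simplified stand-in and not the
helper added by this patch: format_bundle_section() is a hypothetical name,
it writes into a caller-supplied buffer rather than a FILE pointer, and it
rejects ids containing a double quote instead of escaping them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one bundle as a Git config section into 'buf'.
 * Returns the snprintf() result, i.e. the length the full text needs,
 * so a value >= 'size' means the output was truncated. Returns -1 for
 * an id containing '"', which this sketch does not try to escape.
 */
int format_bundle_section(char *buf, size_t size,
			  const char *id, const char *uri)
{
	assert(buf && size && id && uri);
	if (strchr(id, '"'))
		return -1;
	return snprintf(buf, size, "[bundle \"%s\"]\n\turi = %s\n", id, uri);
}
```

Emitting config syntax lets the tests compare expected and actual output
with 'git config --list', which ignores whitespace and section ordering
once the results are sorted.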

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |  1 +
 bundle-uri.c                | 33 ++++++++++++++
 bundle-uri.h                |  3 ++
 t/helper/test-bundle-uri.c  | 63 +++++++++++++++++++++++++
 t/helper/test-tool.c        |  1 +
 t/helper/test-tool.h        |  1 +
 t/t5750-bundle-uri-parse.sh | 91 +++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     | 11 +++++
 8 files changed, 204 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 9a7d09349fe..d56c5e33d5f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index f725c9796f7..41a1510a4ac 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -68,6 +68,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..5cb0c9196fa
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,63 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri parse-key-values <in",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc)
+		goto usage;
+
+	init_bundle_list(&list);
+	while (strbuf_getline(&sb, stdin) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf) < 0)
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..675c1f1d2f4
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget



* [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-08-22 19:25   ` Junio C Hamano
  2022-09-01  8:05   ` Teng Long
  2022-08-22 15:12 ` [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create parse_bundle_list_in_config_format() to parse a file in config
format and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to call git_config_from_file_with_options() because the
default action for git_config_from_file() is to die() on a parsing
error. The current warning isn't particularly helpful if it is shown to
a user, but it will be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version. I would rather have a slightly confusing
test helper than complicated product code.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 29 +++++++++++++++++++++
 bundle-uri.h                | 10 ++++++++
 t/helper/test-bundle-uri.c  | 45 ++++++++++++++++++++++++++-------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index d56c5e33d5f..dca88ed1e89 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -172,6 +173,34 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	list->mode = BUNDLE_MODE_NONE;
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static int find_temp_filename(struct strbuf *name)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 41a1510a4ac..294ac804140 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -71,6 +71,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 5cb0c9196fa..23ce0eebca3 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,27 +4,52 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
-	const char *usage[] = {
+	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <in",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	struct option options[] = {
 		OPT_END(),
 	};
+	const char **usage = key_value_usage;
 	struct strbuf sb = STRBUF_INIT;
 	struct bundle_list list;
 	int err = 0;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	while (strbuf_getline(&sb, stdin) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf) < 0)
-			err = error("bad line: '%s'", sb.buf);
+
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc)
+			goto usage;
+		while (strbuf_getline(&sb, stdin) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf) < 0)
+				err = error("bad line: '%s'", sb.buf);
+		}
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = parse_bundle_list_in_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
 
@@ -55,7 +80,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 675c1f1d2f4..dd9dc36bfd7 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -88,4 +88,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = <unknown>
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	warning: bundle list at '\''<uri>'\'' has no mode
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  7 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.
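
To illustrate the guard in isolation, here is a minimal standalone
sketch. It does not use Git's data structures: the toy 'next' array,
follow_lists() and EXAMPLE_MAX_DEPTH are hypothetical names invented
for this example. Only the value 4 and the "depth >= limit" comparison
are taken from the patch.

```c
/* Mirrors the value of max_bundle_uri_depth in the patch. */
#define EXAMPLE_MAX_DEPTH 4

/*
 * Toy model (an assumption for illustration): next[i] is the index of
 * the list that list i points to, or -1 when entry i is a plain bundle.
 * Returns 0 when a plain bundle is reached, -1 when the limit is hit.
 */
static int follow_lists(const int *next, int node, int depth)
{
	if (depth >= EXAMPLE_MAX_DEPTH)
		return -1; /* also terminates cycles such as 0 -> 1 -> 0 */
	if (next[node] < 0)
		return 0;
	return follow_lists(next, next[node], depth + 1);
}
```

With next = {1, 0} (two lists pointing at each other), a call to
follow_lists(next, 0, 0) gives up with -1 after four levels instead of
looping forever, while a chain of three lists ending in a bundle still
succeeds.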

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index dca88ed1e89..c9f3df28b2f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	struct strbuf filename = STRBUF_INIT;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if ((result = find_temp_filename(&filename)))
 		goto cleanup;
 
@@ -363,6 +377,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget



* [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-09-02 23:51   ` Josh Steadmon
  2022-09-05 12:50   ` Teng Long
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  7 siblings, 2 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs is downloaded
and inspected again, creating a recursive process.
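
The effect of the list mode on which entries get downloaded can be
sketched on its own. This is a simplified model, not the code in this
patch: enum example_mode and select_downloads() are hypothetical names,
and download failures are ignored here.

```c
enum example_mode {
	EXAMPLE_MODE_ALL,
	EXAMPLE_MODE_ANY,
};

/*
 * Mark which of the n entries would be downloaded. In ANY mode only
 * the first entry is taken; in ALL mode every entry is taken.
 * Returns the number of selected entries.
 */
static int select_downloads(enum example_mode mode, int n, int *selected)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (mode == EXAMPLE_MODE_ANY && count) {
			selected[i] = 0; /* one entry was already taken */
			continue;
		}
		selected[i] = 1;
		count++;
	}
	return count;
}
```

For a three-entry list this selects one entry in ANY mode and all
three in ALL mode, which matches the "count" check performed by the
iterator callback in the diff below.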

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.
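
Without heuristics, the unbundling step simply retries until a full
pass makes no further progress. A minimal sketch of that loop over the
four-bundle topology follows. It models commits as bits in a mask
rather than calling into Git, so struct example_bundle and
unbundle_until_stable() are assumptions made for illustration only.

```c
struct example_bundle {
	unsigned provides;   /* bit of the commit this bundle adds */
	unsigned requires;   /* bits of the prerequisite commits */
	int unbundled;       /* set once the bundle has been applied */
};

/*
 * Repeat passes over the list while the number of successes grows,
 * in the same shape as the loop in unbundle_all_bundles().
 * Returns how many bundles ended up unbundled.
 */
static int unbundle_until_stable(struct example_bundle *b, int n)
{
	unsigned have = 0;
	int last = -1, done = 0, i;

	while (last < done) {
		last = done;
		done = 0;
		for (i = 0; i < n; i++) {
			/* apply when already done or all prerequisites are present */
			if (b[i].unbundled || !(b[i].requires & ~have)) {
				b[i].unbundled = 1;
				have |= b[i].provides;
				done++;
			}
		}
	}
	return done;
}
```

If the list is visited in the worst order (bundle-4 first, bundle-1
last), the first pass applies only the base bundle, the second applies
the two middle bundles, and the third applies the merge bundle. A
bundle whose prerequisites never appear is left unapplied.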

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 211 +++++++++++++++++++++++++++++++++---
 bundle-uri.h                |   6 +
 t/t5558-clone-bundle-uri.sh |  93 ++++++++++++++++
 3 files changed, 293 insertions(+), 17 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c9f3df28b2f..37867afca27 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -35,9 +35,10 @@ void init_bundle_list(struct bundle_list *list)
 static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 				    void *data)
 {
-	free(bundle->id);
-	free(bundle->uri);
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
 	strbuf_release(&bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -334,18 +335,102 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	ctx->count++;
+	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = parse_bundle_list_in_config_format(bundle->uri,
+							 bundle->file.buf,
+							 &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -353,33 +438,125 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if ((result = find_temp_filename(&filename)))
+	if (!bundle->file.len &&
+	    (result = find_temp_filename(&bundle->file)))
 		goto cleanup;
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file.buf, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file.buf, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	strbuf_init(&bcopy->file, 0);
+	strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (result)
+		unlink(bundle->file.buf);
 	return result;
 }
 
+struct attempt_unbundle_context {
+	struct repository *r;
+	int success_count;
+	int failure_count;
+};
+
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct attempt_unbundle_context *ctx = data;
+
+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) {
+		ctx->success_count++;
+		info->unbundled = 1;
+	} else {
+		ctx->failure_count++;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	int last_success_count = -1;
+	struct attempt_unbundle_context ctx = {
+		.r = r,
+	};
+
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 */
+	while (last_success_count < ctx.success_count) {
+		last_success_count = ctx.success_count;
+
+		ctx.success_count = 0;
+		ctx.failure_count = 0;
+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
+	}
+
+	if (ctx.success_count)
+		git_config_set_multivar_gently("log.excludedecoration",
+						"refs/bundle/",
+						"refs/bundle/",
+						CONFIG_FLAGS_FIXED_VALUE |
+						CONFIG_FLAGS_MULTI_REPLACE);
+
+	if (ctx.failure_count)
+		warning(_("failed to unbundle %d bundles"),
+			ctx.failure_count);
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file.buf)
+		unlink_or_warn(info->file.buf);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup("<root>"),
+		.file = STRBUF_INIT,
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index 294ac804140..e9d85a6ecfb 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -35,6 +35,12 @@ struct remote_bundle_info {
 	 * an empty string.
 	 */
 	struct strbuf file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { \
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..592790b49f0 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,72 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-file rev-parse $oid || return 1
+	done
+'
+
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +141,33 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-http rev-parse $oid || return 1
+	done
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


* Re: [PATCH 1/7] bundle-uri: create bundle_list struct and helpers
  2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-08-22 17:57   ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-22 17:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * The remote_bundle_info struct contains information for a single bundle
> + * URI. This may be initialized simply by a given URI or might have
> + * additional metadata associated with it if the bundle was advertised by
> + * a bundle list.
> + */
> +struct remote_bundle_info {
> +	struct hashmap_entry ent;
> +
> +	/**
> +	 * The 'id' is a name given to the bundle for reference
> +	 * by other bundle infos.
> +	 */
> +	char *id;
> +
> +	/**
> +	 * The 'uri' is the location of the remote bundle so
> +	 * it can be downloaded on-demand. This will be NULL
> +	 * if there was no table of contents.
> +	 */
> +	char *uri;
> +
> +	/**
> +	 * If the bundle has been downloaded, then 'file' is a
> +	 * filename storing its contents. Otherwise, 'file' is
> +	 * an empty string.
> +	 */
> +	struct strbuf file;
> +};

Presumably the sequence of events is that first a bundle list is
obtained, with their .file member set to empty, then http worker(s)
download and deposit the contents to files at which time the .file
member is set to the resulting file.  The file downloader presumably
uses the usual "create a temporary file, download to it, and then
commit it by closing and then renaming" dance, and the downloading
http worker may want to have two strbufs somewhere it can access to
come up with the name of the temporary and the name of the final
file.  But once the result becomes a committed file, its name will
not change, or will it?

At this step without the code that actually uses the data, use of
strbuf, instead of "char *" like id and uri members do, smells like
a premature optimization, and it is unclear if the optimization is
even effective.

Other than that, looks good to me.


* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
  2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-08-22 18:20   ` Junio C Hamano
  2022-08-23 16:29     ` Derrick Stolee
  2022-08-31 22:02   ` Glen Choo
  2022-09-01  2:38   ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Teng Long
  2 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-22 18:20 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index e376d547ce0..4280af6992e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>  
>  include::config/browser.txt[]
>  
> +include::config/bundle.txt[]
> +

The file that records a list of bundles may borrow the format of git
config files, but will we store their contents in configuration
files in the receiving (or originating) repository?  With the
presence of fields like "bundle.version", I somehow doubt it.

Should "git config --help" list them?

> diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
> new file mode 100644
> index 00000000000..3515bfe38d1
> --- /dev/null
> +++ b/Documentation/config/bundle.txt

If the answer is "no", then this file looks out of place.

> diff --git a/bundle-uri.c b/bundle-uri.c
> index ceeef0b6641..ade7eccce39 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  	return 0;
>  }
>  
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.

Let's stick to the "error is negative" if we do not have a strong
reason not to.

> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);

Can atoi() safely fail?  Are we happy with a value that says "1A"
being parsed as 1?
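
As a sketch of what a stricter parse could look like, the following
uses only the C standard library. parse_int_strict() is a hypothetical
helper written for this example. Git's tree may already carry a
suitable helper, which is not assumed here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a whole string as a base-10 int.
 * Returns 0 and stores the result on success, -1 if the string is
 * empty, has trailing garbage (such as "1A"), or is out of range.
 */
static int parse_int_strict(const char *value, int *out)
{
	char *end;
	long n;

	if (!value || !*value)
		return -1;

	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || end == value || *end != '\0')
		return -1; /* overflow, no digits, or trailing characters */
	if (n < INT_MIN || n > INT_MAX)
		return -1;

	*out = (int)n;
	return 0;
}
```

With this helper "1" is accepted and "1A" is rejected, whereas atoi()
silently returns 1 for both. Note that strtol() still accepts leading
whitespace and a sign, so a caller wanting digits only would need one
more check.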

> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}

Is it OK for a bundle list described in the config-file format to
have "bundle.version" twice, giving different values?  It feels
counter-intuitive to apply the "last one wins" rule that is usual
for configuration files.

> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Likewise for bundle.mode

> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}

This explicitly implements "the last one wins".  Would it really
make sense for a server to serve a bundle list that says redundant
and wasteful pieces of information, i.e.

    [bundle "1"]
	uri = one
	uri = two

It is not like doing so would allow us to reuse an otherwise mostly
good file by appending new information and that would be a performance
or storage win.  So I am not quite sure why we want "the last one wins"
rule here.  It instead looks like something we want to sanity check
and complain about.
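
A "complain on duplicates" rule could look roughly like the sketch
below. It is standalone and deliberately avoids Git's helpers:
struct example_bundle and example_set_uri_once() are hypothetical
names, not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

struct example_bundle {
	char *uri; /* NULL until the first "uri" key is seen */
};

/*
 * Store the URI only if none was given before.
 * Returns 0 on success, -1 on a duplicate key or allocation failure.
 */
static int example_set_uri_once(struct example_bundle *bundle,
				const char *value)
{
	size_t len;

	if (bundle->uri)
		return -1; /* duplicate key: report it instead of overwriting */

	len = strlen(value) + 1;
	bundle->uri = malloc(len);
	if (!bundle->uri)
		return -1;
	memcpy(bundle->uri, value, len);
	return 0;
}
```

Feeding "one" and then "two" to the same entry keeps "one" and makes
the second call fail, so the parser can flag the list as malformed
rather than quietly letting the last value win.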

> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */

This is sensible.

> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;


* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing
  2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-08-22 19:17   ` Junio C Hamano
  2022-08-23 16:31     ` Derrick Stolee
  2022-09-02 23:41   ` Josh Steadmon
  1 sibling, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-08-22 19:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
writes:

> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
> +{
> +	int result;
> +	const char *equals;
> +	struct strbuf key = STRBUF_INIT;
> +
> +	if (!strlen(line))
> +		return error(_("bundle-uri: got an empty line"));
> +
> +	equals = strchr(line, '=');
> +
> +	if (!equals)
> +		return error(_("bundle-uri: line is not of the form 'key=value'"));
> +	if (line == equals || !*(equals + 1))
> +		return error(_("bundle-uri: line has empty key or value"));

The suggestions implied by my questions fall strictly into the "it does
not have to exist at this step and we can extend it later" category,
but for something whose equivalent can be stored in our configuration
file, it is curious why we _insist_ on refusing an empty string as the
value.

I do not miss the "key alone without even '=' means 'true'"
convention, personally, so insisting to have '=' is OK, but the
inability to have an empty string as a value looks a bit disturbing.

This depends on how the helper gets called, but most likely the
caller has a single line of pkt-line that it GAVE us to process, so
it sounds a bit wasteful to insist that "line" be const to us and
force us to use a separate strbuf, instead of just stuffing NUL
where we found '=' and passing the two halves to bundle_list_update().
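
The in-place variant could be sketched as follows, assuming the caller
owns a writable buffer. split_key_value() is a hypothetical name, and
unlike the patch this sketch accepts an empty value, in line with the
question raised above.

```c
#include <string.h>

/*
 * Split "key=value" in place by overwriting the first '=' with NUL.
 * On success *key and *value point into 'line' and 0 is returned.
 * Returns -1 when there is no '=' or the key is empty.
 */
static int split_key_value(char *line, char **key, char **value)
{
	char *equals = strchr(line, '=');

	if (!equals || equals == line)
		return -1;

	*equals = '\0';      /* terminate the key where '=' was */
	*key = line;
	*value = equals + 1; /* may be an empty string */
	return 0;
}
```

No allocation is needed, at the cost of modifying the caller's buffer.
That is only acceptable if nothing else needs the original line
afterwards, for example to print it in an error message.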

Not a huge deal, it is just something I found funny in the "back in
the days we coded together, Linus would never have written like
this" way.

Other than that small detail, the code looks OK to me.

> +	strbuf_add(&key, line, equals - line);
> +	result = bundle_list_update(key.buf, equals + 1, list);
> +	strbuf_release(&key);
> +
> +	return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> index 6692aa4b170..f725c9796f7 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
>   */
>  int fetch_bundle_uri(struct repository *r, const char *uri);
>  
> -#endif
> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +
> +/**
> + * Parse a "key=value" packet line from the bundle-uri verb.
> + *
> + * Returns 0 on success and non-zero on error.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list,
> +			  const char *line);
> +
> +#endif /* BUNDLE_URI_H */

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-08-22 19:25   ` Junio C Hamano
  2022-08-23 16:43     ` Derrick Stolee
  2022-08-31 22:18     ` Jonathan Tan
  2022-09-01  8:05   ` Teng Long
  1 sibling, 2 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-08-22 19:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> To allow for the incremental fetch case, teach Git to understand a
> bundle list that could be advertised at an independent bundle URI. Such
> a bundle list is likely to be inspected by human readers, even if only
> by the bundle provider creating the list. For this reason, we can take
> our expected "key=value" pairs and instead format them using Git config
> format.

"can" does not explain why it is a good idea.  "As a sequence of
key=value pairs is a lot more dense and harder to read than the
configuration file format, let's declare that it is the format we
use in a file that holds a bundle-list" would be.

I do not personally buy it, though.  As I hinted in an earlier step,
some traits we associate with our configuration file format, like the
"last one wins" semantics, are undesirable ones, so even if we reuse
the appearance of the text, the semantics would have to become
different (including "syntax errors lead to die()" mentioned
elsewhere in the proposed log message).

> Update 'test-tool bundle-uri' to take this config file format as input.
> It uses a filename instead of stdin because there is no existing way to
> parse a FILE pointer in the config machinery. Using
> git_config_from_mem() is overly complicated and more likely to introduce
> bugs than this simpler version. I would rather have a slightly confusing
> test helper than complicated product code.

All the troubles described above seem to come from the initial
mistake to try reusing the configuration file parser or reusing the
configuration file format, at least to me.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
  2022-08-22 18:20   ` Junio C Hamano
@ 2022-08-23 16:29     ` Derrick Stolee
  2022-08-31 22:10       ` Jonathan Tan
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-08-23 16:29 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon

On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index e376d547ce0..4280af6992e 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>>  
>>  include::config/browser.txt[]
>>  
>> +include::config/bundle.txt[]
>> +
> 
> The file that records a list of bundles may borrow the format of git
> config files, but will we store their contents in configuration
> files in the receiving (or originating) repository?  With the
> presence of fields like "bundle.version", I somehow doubt it.
> 
> Should "git config --help" list them?

I suppose that at this point, they should be left out, since
writing them to your Git config does nothing.

In the future, having these config values present will advertise
the bundle list during the 'bundle-uri' protocol v2 command. That
could use some clarification in the documentation, too, perhaps
with a "bundle.*" item discussing how all of the other items are
related to that advertisement.

>> +/**
>> + * Given a key-value pair, update the state of the given bundle list.
>> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
>> + * is not understood or the value is malformed.
> 
> Let's stick to the "error is negative" if we do not have a strong
> reason not to.

Right. Can do.
 
>> + */
>> +MAYBE_UNUSED
>> +static int bundle_list_update(const char *key, const char *value,
>> +			      struct bundle_list *list)
>> +{
>> +	const char *pkey, *dot;
>> +	struct strbuf id = STRBUF_INIT;
>> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
>> +	struct remote_bundle_info *bundle;
>> +
>> +	if (!skip_prefix(key, "bundle.", &pkey))
>> +		return 1;
>> +	dot = strchr(pkey, '.');
>> +	if (!dot) {
>> +		if (!strcmp(pkey, "version")) {
>> +			int version = atoi(value);
> 
> Can atoi() safely fail?  Are we happy of pkey that says "1A" and we
> parse it as "1"?
> 
>> +			if (version != 1)
>> +				return 1;
>> +
>> +			list->version = version;
>> +			return 0;
>> +		}
> 
> Is it OK for a bundle list described in the config-file format to
> have "bundle.version" twice, giving different values?  It feels
> counter-intuitive to apply the "last one wins" rule that is usual
> for configuration files.

...
> This explicitly implements "the last one wins".  Would it really
> make sense for a server to serve a bundle list that says redundant
> and wasteful pieces of information, i.e.
> 
>     [bundle "1"]
> 	url = one
> 	url = two
> 
> It is not like doing so would allow us to reuse an otherwise mostly
> good file by appending new information and that would be a performance
> or storage win.  So I am not quite sure why we want "the last one wins"
> rule here.  It instead looks like something we want to sanity check
> and complain about.

I could switch this to "expect at most one value" and add warnings for
duplicate keys. Should duplicate keys then mean "the bundle list is
malformed, abort downloading bundles"? That seems reasonable to me.
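
For concreteness, here is a minimal sketch of the "expect at most one
value" rule. The names `struct bundle_entry` and `bundle_entry_set_uri()`
are hypothetical, and it uses plain libc instead of Git's xstrdup(), so
treat it as an illustration of the idea rather than the code in this
series:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one bundle in the list. */
struct bundle_entry {
	char *uri; /* NULL until the first "uri" value is seen */
};

/*
 * Record the URI for a bundle. Returns 0 on success and -1 if the
 * key was already given (or on allocation failure), so the caller
 * can treat the whole list as malformed.
 */
static int bundle_entry_set_uri(struct bundle_entry *b, const char *value)
{
	size_t len;

	if (b->uri)
		return -1; /* duplicate key: refuse instead of "last one wins" */

	len = strlen(value) + 1;
	b->uri = malloc(len);
	if (!b->uri)
		return -1;
	memcpy(b->uri, value, len);
	return 0;
}
```

With this shape, a second "url = two" line yields a negative return and
the first value is left untouched, so the caller can decide whether to
warn or to abort the download.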

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing
  2022-08-22 19:17   ` Junio C Hamano
@ 2022-08-23 16:31     ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-08-23 16:31 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon

On 8/22/2022 3:17 PM, Junio C Hamano wrote:
> "Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
> writes:
> 
>> +/**
>> + * General API for {transport,connect}.c etc.
>> + */
>> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
>> +{
>> +	int result;
>> +	const char *equals;
>> +	struct strbuf key = STRBUF_INIT;
>> +
>> +	if (!strlen(line))
>> +		return error(_("bundle-uri: got an empty line"));
>> +
>> +	equals = strchr(line, '=');
>> +
>> +	if (!equals)
>> +		return error(_("bundle-uri: line is not of the form 'key=value'"));
>> +	if (line == equals || !*(equals + 1))
>> +		return error(_("bundle-uri: line has empty key or value"));
> 
> The suggestions implied by my asking fall strictly into the "it does
> not have to exist here at this step and we can later extend it", but
> for something whose equivalent can be stored in our configuration
> file, it is curious why we _insist_ to refuse an empty string as the
> value.
> 
> I do not miss the "key alone without even '=' means 'true'"
> convention, personally, so insisting to have '=' is OK, but the
> inability to have an empty string as a value looks a bit disturbing.

I'd be happy to switch this to allow an empty value.
 
> This depends on how the helper gets called, but most likely the
> caller has a single line of pkt-line that it GAVE us to process, so
> it sounds a bit wasteful to insist that "line" to be const to us and
> force us to use a separate strbuf, instead of just stuffing NUL at
> where we found '=' and pass the two halves to bundle_list_update().

I can look into using a non-const buffer.
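
A rough sketch of what that could look like, assuming the caller hands
us a writable buffer. The helper name `split_key_value()` is made up for
illustration; it also accepts an empty value, as discussed above, while
still rejecting an empty key:

```c
#include <string.h>

/*
 * Split a writable "key=value" line in place by overwriting the first
 * '=' with NUL. On success, *key and *value point into `line`.
 * An empty value is accepted; a missing '=' or an empty key is not.
 * Returns 0 on success and -1 on error.
 */
static int split_key_value(char *line, const char **key, const char **value)
{
	char *equals = strchr(line, '=');

	if (!equals || equals == line)
		return -1;

	*equals = '\0';
	*key = line;
	*value = equals + 1;
	return 0;
}
```

The two halves could then be passed straight to bundle_list_update()
without an intermediate strbuf.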

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 19:25   ` Junio C Hamano
@ 2022-08-23 16:43     ` Derrick Stolee
  2022-08-31 22:18     ` Jonathan Tan
  1 sibling, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-08-23 16:43 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon

On 8/22/2022 3:25 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> To allow for the incremental fetch case, teach Git to understand a
>> bundle list that could be advertised at an independent bundle URI. Such
>> a bundle list is likely to be inspected by human readers, even if only
>> by the bundle provider creating the list. For this reason, we can take
>> our expected "key=value" pairs and instead format them using Git config
>> format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.

That "more dense and harder to read" concern was indeed my reason for
wanting a different format.

> I do not personally buy it, though.  As I hinted in an earlier step,
> some traits we associate with our configuration file format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

The points you made earlier about "last one wins" semantics are the
biggest road-blocks to using the config file format, from what I've read
so far. We could change those semantics to be different from my current
implementation which respects the "last one wins" rule, and then that
makes the config format match not as closely. That burden of avoiding
multiple key values is not on the end-user but the bundle provider to
match the new expectations. (There might be something we should be careful
about when advertising the bundle list from our Git config in the
'bundle-uri' command in the next series.)

The "syntax errors lead to die()" is mitigated by using
CONFIG_ERROR_ERROR, which is what I meant by "Be careful to call..." I
should have been more clear that we are _not_ going to die() based on the
remote data. We might write an error message and then abort the bundle
download.

With all of these points in mind, I'd still prefer to use the config file
format as described in the design document. If you still don't agree, then
I'll change the format to be key=value pairs split with newlines, and
update the design document accordingly.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
  2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
  2022-08-22 18:20   ` Junio C Hamano
@ 2022-08-31 22:02   ` Glen Choo
  2022-09-01  2:38   ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Teng Long
  2 siblings, 0 replies; 94+ messages in thread
From: Glen Choo @ 2022-08-31 22:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);
> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}
> +
> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Drive-by comment from Review Club: we could simplify
"section.[subsection.]key" parsing using parse_config_key(). There are
other places in the code that do custom parsing like this, but maybe
they should use parse_config_key() too.
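
To illustrate the kind of split meant here, below is a self-contained
sketch. It is not Git's parse_config_key(); `split_bundle_key()` is a
hypothetical stand-in, hard-coded to the "bundle" section, that only
shows the shape of the result:

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "bundle.<id>.<key>" or "bundle.<key>" without copying.
 * On success, *key points at the final component; *id and *id_len
 * describe the middle component, or are NULL/0 when there is none.
 * Returns 0 on success and -1 if `var` is not in the "bundle" section.
 */
static int split_bundle_key(const char *var, const char **id,
			    size_t *id_len, const char **key)
{
	static const char prefix[] = "bundle.";
	size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *dot;

	if (strncmp(var, prefix, prefix_len))
		return -1;
	rest = var + prefix_len;

	/* The key is whatever follows the last dot. */
	dot = strrchr(rest, '.');
	if (!dot) {
		*id = NULL;
		*id_len = 0;
		*key = rest;
	} else {
		*id = rest;
		*id_len = (size_t)(dot - rest);
		*key = dot + 1;
	}
	return 0;
}
```

Note that this sketch splits at the last dot, whereas the quoted patch
uses strchr() and so splits at the first one; the two differ when an
<id> itself contains a dot.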

> +
> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}
> +
> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */
> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;
> -- 
> gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
  2022-08-23 16:29     ` Derrick Stolee
@ 2022-08-31 22:10       ` Jonathan Tan
  0 siblings, 0 replies; 94+ messages in thread
From: Jonathan Tan @ 2022-08-31 22:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jonathan Tan, Junio C Hamano, Derrick Stolee via GitGitGadget,
	git, me, newren, avarab, mjcheetham, steadmon

Derrick Stolee <derrickstolee@github.com> writes:
> On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > 
> >> diff --git a/Documentation/config.txt b/Documentation/config.txt
> >> index e376d547ce0..4280af6992e 100644
> >> --- a/Documentation/config.txt
> >> +++ b/Documentation/config.txt
> >> @@ -387,6 +387,8 @@ include::config/branch.txt[]
> >>  
> >>  include::config/browser.txt[]
> >>  
> >> +include::config/bundle.txt[]
> >> +
> > 
> > The file that records a list of bundles may borrow the format of git
> > config files, but will we store their contents in configuration
> > files in the receiving (or originating) repository?  With the
> > presence of fields like "bundle.version", I somehow doubt it.
> > 
> > Should "git config --help" list them?
> 
> I suppose that at this point, they should be left out, since
> writing them to your Git config does nothing.
> 
> In the future, having these config values present will advertise
> the bundle list during the 'bundle-uri' protocol v2 command. That
> could use some clarification in the documentation, too, perhaps
> with a "bundle.*" item discussing how all of the other items are
> related to that advertisement.

I think the main point of confusion is that these config variables
currently do nothing when in a repo config, but they will be
subsequently used once we implement advertising them, and it is
convenient that these configs delegate to other files that have the same
format (and that we can specify, at the CLI, a file of the same format).
Maybe documentation like this would clear up the confusion:

  bundle.*::
  	The `bundle.*` keys may appear in a repo's config, in a file
  	linked by bundle.<id>.uri, or in a file passed to "clone
  	--bundle-uri".
  +
  NEEDSWORK: Currently, only the latter 2 situations work. `bundle.*` keys
  appearing in a repo's config will take effect once support for
  advertising bundles in fetch protocol v2 is implemented.
  +
  See link:technical/bundle-uri.html[the bundle URI design document] for
  more details.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 19:25   ` Junio C Hamano
  2022-08-23 16:43     ` Derrick Stolee
@ 2022-08-31 22:18     ` Jonathan Tan
  1 sibling, 0 replies; 94+ messages in thread
From: Jonathan Tan @ 2022-08-31 22:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Tan, Derrick Stolee via GitGitGadget, git, me, newren,
	avarab, mjcheetham, steadmon, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > To allow for the incremental fetch case, teach Git to understand a
> > bundle list that could be advertised at an independent bundle URI. Such
> > a bundle list is likely to be inspected by human readers, even if only
> > by the bundle provider creating the list. For this reason, we can take
> > our expected "key=value" pairs and instead format them using Git config
> > format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
> 
> I do not personally buy it, though.  As I hinted in an earlier step,
> some traits we associate with our configuration file format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

One reason for using the configuration file format (which perhaps could
have been better explained in the commit message) is that we plan to
have a way for a repo to advertise a list of bundles during fetch.
I think that config is a natural place to put that, even with its "last
one wins" semantics.

It could be argued that we can just put a single URI in config and only
allow advertising of a single URI (and then use a different format for
the bundle lists with semantics that are stricter than "last one wins"),
but that seems unnecessarily restrictive (and would make the client make
one more network request). And if we're advertising multiple bundles, it
seems reasonable to make all bundle lists have the same format (whether
they are in config or in a separate file).

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4/7] bundle-uri: unit test "key=value" parsing
  2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
  2022-08-22 18:20   ` Junio C Hamano
  2022-08-31 22:02   ` Glen Choo
@ 2022-09-01  2:38   ` Teng Long
  2 siblings, 0 replies; 94+ messages in thread
From: Teng Long @ 2022-09-01  2:38 UTC (permalink / raw)
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
	steadmon, tenglong.tl

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> +	init_bundle_list(&list);
> +	while (strbuf_getline(&sb, stdin) != EOF) {
> +		if (bundle_uri_parse_line(&list, sb.buf) < 0)
> +			err = error("bad line: '%s'", sb.buf);
> +	}

A test command like this is useful for people who want to experiment
with the feature, thanks. On top of that, I have a small question about
the condition:

  if (bundle_uri_parse_line(&list, sb.buf) < 0)

"bundle_uri_parse_line" calls "bundle_list_update" and returns its
result, and "bundle_list_update" can return 1 on error. I'm not sure,
but maybe the line should be changed to:

  if (bundle_uri_parse_line(&list, sb.buf))

here.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 4/7] bundle-uri: unit test "key=value" parsing
  2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-09-01  2:56   ` Teng Long
  0 siblings, 0 replies; 94+ messages in thread
From: Teng Long @ 2022-09-01  2:56 UTC (permalink / raw)
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
	steadmon, tenglong.tl

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> +void print_bundle_list(FILE *fp, struct bundle_list *list)
> +{
> +	const char *mode;
> +
> +	switch (list->mode) {
> +	case BUNDLE_MODE_ALL:
> +		mode = "all";
> +		break;
> +
> +	case BUNDLE_MODE_ANY:
> +		mode = "any";
> +		break;
> +
> +	case BUNDLE_MODE_NONE:
> +	default:
> +		mode = "<unknown>";
> +	}
> +
> +	printf("[bundle]\n");
> +	printf("\tversion = %d\n", list->version);
> +	printf("\tmode = %s\n", mode);
> +
> +	for_all_bundles_in_list(list, summarize_bundle, fp);
> +}

"print_bundle_list" prints a "bundle_list" as Git config formatted
lines and takes a "FILE *fp" for its output. The call to
"for_all_bundles_in_list" uses it, but the other places do not. I'm not
sure, but maybe we should change them to:

  fprintf(fp, "[bundle]\n");
  fprintf(fp, "\tversion = %d\n", list->version);
  fprintf(fp, "\tmode = %s\n", mode);

here?

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
  2022-08-22 19:25   ` Junio C Hamano
@ 2022-09-01  8:05   ` Teng Long
  1 sibling, 0 replies; 94+ messages in thread
From: Teng Long @ 2022-09-01  8:05 UTC (permalink / raw)
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
	steadmon, tenglong.tl

Derrick Stolee <derrickstolee@github.com> writes:

> diff --git a/bundle-uri.h b/bundle-uri.h
> index 41a1510a4ac..294ac804140 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -71,6 +71,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  struct FILE;
>  void print_bundle_list(FILE *fp, struct bundle_list *list);
>
> +/**
> + * A bundle URI may point to a bundle list where the key=value
> + * pairs are provided in config file format. This method is
> + * exposed publicly for testing purposes.
> + */
> +
> +int parse_bundle_list_in_config_format(const char *uri,
> +				       const char *filename,
> +				       struct bundle_list *list);
> +

The comment explains why "parse_bundle_list_in_config_format" is
introduced, but I think this API stays useful if the config format is
eventually supported. We already have an API named
"bundle_uri_parse_line" that parses key-value pairs into a bundle list,
so for more consistent naming it may be better to rename
"parse_bundle_list_in_config_format" to "bundle_uri_parse_config_format".
I think it doesn't break anything; feel free to accept it or leave the
name as it is.

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing
  2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-08-22 19:17   ` Junio C Hamano
@ 2022-09-02 23:41   ` Josh Steadmon
  1 sibling, 0 replies; 94+ messages in thread
From: Josh Steadmon @ 2022-09-02 23:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason via GitGitGadget
  Cc: git, gitster, me, newren, avarab, mjcheetham, Derrick Stolee

On 2022.08.22 15:12, Ævar Arnfjörð Bjarmason via GitGitGadget wrote:
> From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
>  <avarab@gmail.com>
> 
> When advertising a bundle list over Git's protocol v2, we will use
> packet lines. Each line will be of the form "key=value" representing a
> bundle list. Connect the API necessary for Git's transport to the
> key-value pair parsing created in the previous change.

Since we're not actually implementing advertisement via proto v2 in this
series, could we add an additional paragraph noting that this is useful
now for implementing the test helper in the next patch?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-09-02 23:51   ` Josh Steadmon
  2022-09-05 12:50   ` Teng Long
  1 sibling, 0 replies; 94+ messages in thread
From: Josh Steadmon @ 2022-09-02 23:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, mjcheetham, Derrick Stolee

On 2022.08.22 15:12, Derrick Stolee via GitGitGadget wrote:
[snip]
> +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
> +{
> +	struct bundle_list_context *ctx = data;
> +
> +	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
> +		return 0;
> +
> +	ctx->count++;
> +	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
> +}

We should check whether fetch_bundle_uri_internal() actually succeeds
before we increment ctx->count here. Otherwise, if we're in
BUNDLE_MODE_ANY and the client gets unlucky that one of the servers
hosting a bundle file is offline, it won't retry any of the other
servers.
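
Something along these lines, as a sketch only. The types and helper
names (`struct download_ctx`, `should_try()`, `record_result()`) are
hypothetical simplifications of the callback quoted above, with the
actual download left to the caller:

```c
enum bundle_mode { MODE_ALL, MODE_ANY };

struct download_ctx {
	enum bundle_mode mode;
	int count; /* number of successful downloads so far */
};

/*
 * In "any" mode, one successful download is enough, so skip the
 * remaining bundles once count is non-zero. In "all" mode, always try.
 */
static int should_try(const struct download_ctx *ctx)
{
	return !(ctx->mode == MODE_ANY && ctx->count);
}

/*
 * Record the outcome of one download attempt. `res` follows the usual
 * convention: 0 on success, non-zero on failure. Only a success bumps
 * count, so a failed attempt does not stop later bundles from being
 * tried.
 */
static void record_result(struct download_ctx *ctx, int res)
{
	if (!res)
		ctx->count++;
}
```

The caller would check should_try() before each bundle, attempt the
download, and pass the result to record_result(). Whether a failure
should also end the iteration is a separate policy choice that this
sketch leaves open.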

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
  2022-09-02 23:51   ` Josh Steadmon
@ 2022-09-05 12:50   ` Teng Long
  2022-09-08 17:10     ` Derrick Stolee
  1 sibling, 1 reply; 94+ messages in thread
From: Teng Long @ 2022-09-05 12:50 UTC (permalink / raw)
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
	steadmon, tenglong.tl


Derrick Stolee <derrickstolee@github.com> writes:

>  int fetch_bundle_uri(struct repository *r, const char *uri)
>  {
> -	return fetch_bundle_uri_internal(r, uri, 0);
> +	int result;
> +	struct bundle_list list;
> +	struct remote_bundle_info bundle = {
> +		.uri = xstrdup(uri),
> +		.id = xstrdup("<root>"),

Very readable code, thank you very much.

I'm a little curious why we use the "<root>" as the init value of
".id"?

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-09-05 12:50   ` Teng Long
@ 2022-09-08 17:10     ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-08 17:10 UTC (permalink / raw)
  To: Teng Long, gitgitgadget
  Cc: avarab, git, gitster, me, mjcheetham, newren, steadmon, tenglong.tl

On 9/5/2022 8:50 AM, Teng Long wrote:
> 
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>>  int fetch_bundle_uri(struct repository *r, const char *uri)
>>  {
>> -	return fetch_bundle_uri_internal(r, uri, 0);
>> +	int result;
>> +	struct bundle_list list;
>> +	struct remote_bundle_info bundle = {
>> +		.uri = xstrdup(uri),
>> +		.id = xstrdup("<root>"),
> 
> Very readable code, thank you very much.
> 
> I'm a little curious why we use the "<root>" as the init value of
> ".id"?

In this case, we need a valid ID to initialize the bundle list (since
it will add the remote_bundle_info to the hash set), but it is
considered the info "above" all other lists. We could also specify an
empty string here, but we just can't use NULL.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
                     ` (10 more replies)
  7 siblings, 11 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
<uri> is a URI to a bundle file. This series adds the capability of
downloading and parsing a bundle list and then downloading the URIs in that
list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way:

 1. Patches 1-2 are cleanups from the previous part. The first was
    recommended by Teng Long and the second allows us to simplify our bundle
    list data structure slightly.

 2. Patches 3-4 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 5-6 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 7 teaches Git to parse Git config files into bundle lists.

 5. Patches 8-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists.

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a single
bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation now indicates more clearly that the bundle.*
   section has no effect in the repository config (at the moment; this will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.
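
To make the last point concrete, here is a minimal sketch of the counting rule
(not the code from the patch). It assumes the caller keeps iterating after a
failed download; 'struct download_ctx', 'try_download', 'download_each' and
'fake_download' are hypothetical names, and 'fake_download' merely simulates
failure for URIs that start with '!'.

```c
#include <assert.h>
#include <stddef.h>

enum bundle_list_mode {
	BUNDLE_MODE_NONE = 0,
	BUNDLE_MODE_ALL,
	BUNDLE_MODE_ANY
};

/* Hypothetical download callback: 0 on success, nonzero on failure. */
typedef int (*download_fn)(const char *uri);

struct download_ctx {
	enum bundle_list_mode mode;
	int count; /* number of successful downloads so far */
};

/* Simulated download: URIs starting with '!' fail. */
int fake_download(const char *uri)
{
	return uri[0] == '!' ? -1 : 0;
}

int try_download(struct download_ctx *ctx, const char *uri, download_fn fn)
{
	int res;

	assert(ctx != NULL && fn != NULL);

	/* In "any" mode, one successful download is enough. */
	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
		return 0;

	res = fn(uri);

	/* Count only successes so that "any" mode tries the next URI. */
	if (!res)
		ctx->count++;
	return res;
}

/* Try every URI in order; return the number of failed attempts. */
int download_each(struct download_ctx *ctx, const char **uris, size_t nr,
		  download_fn fn)
{
	size_t i;
	int failures = 0;

	for (i = 0; i < nr; i++)
		if (try_download(ctx, uris[i], fn))
			failures++;
	return failures;
}
```

With the URIs "!broken", "good-1" and "good-2" in "any" mode, the first
attempt fails without touching the count, the second succeeds, and the third
is skipped. Had the count been incremented before the attempt, the failure on
"!broken" would have prevented "good-1" from being tried.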

Thanks,
-Stolee

Derrick Stolee (7):
  bundle-uri: short-circuit capability parsing
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: fetch a list of bundles

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 bundle-uri.c                    | 466 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     |  93 +++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 13 files changed, 942 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v1:

  -:  ----------- >  1:  2ca431e6c37 bundle-uri: short-circuit capability parsing
  -:  ----------- >  2:  ee6c4b824c2 bundle-uri: use plain string in find_temp_filename()
  1:  c3943888658 !  3:  d9812440594 bundle-uri: create bundle_list struct and helpers
     @@ bundle-uri.c
      +static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
      +				    void *data)
      +{
     -+	free(bundle->id);
     -+	free(bundle->uri);
     -+	strbuf_release(&bundle->file);
     ++	FREE_AND_NULL(bundle->id);
     ++	FREE_AND_NULL(bundle->uri);
      +	return 0;
      +}
      +
     @@ bundle-uri.c
      +	return 0;
      +}
       
     - static int find_temp_filename(struct strbuf *name)
     + static char *find_temp_filename(void)
       {
      
       ## bundle-uri.h ##
     @@ bundle-uri.h
      +	 * if there was no table of contents.
      +	 */
      +	char *uri;
     -+
     -+	/**
     -+	 * If the bundle has been downloaded, then 'file' is a
     -+	 * filename storing its contents. Otherwise, 'file' is
     -+	 * an empty string.
     -+	 */
     -+	struct strbuf file;
      +};
      +
     -+#define REMOTE_BUNDLE_INFO_INIT { \
     -+	.file = STRBUF_INIT, \
     -+}
     ++#define REMOTE_BUNDLE_INFO_INIT { 0 }
      +
      +enum bundle_list_mode {
      +	BUNDLE_MODE_NONE = 0,
  2:  7e4e4656e53 !  4:  70daef66833 bundle-uri: create base key-value pair parsing
     @@ Commit message
            currently is expected to be an absolute URI, but will be relaxed to be
            a relative URI in the future.
      
     +    While parsing, return an error if a URI key is repeated, since we can
     +    make that restriction with bundle lists.
     +
     +    Make the git_parse_int() method global so we can parse the integer
     +    version value carefully.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/config.txt ##
     @@ Documentation/config.txt: include::config/branch.txt[]
       ## Documentation/config/bundle.txt (new) ##
      @@
      +bundle.*::
     -+	The `bundle.*` keys are used when communicating a list of bundle URIs
     -+	See link:technical/bundle-uri.html[the bundle URI design document] for
     -+	more details.
     ++	The `bundle.*` keys may appear in a bundle list file found via the
     ++	`git clone --bundle-uri` option. These keys currently have no effect
     ++	if placed in a repository config file, though this will change in the
     ++	future. See link:technical/bundle-uri.html[the bundle URI design
     ++	document] for more details.
      +
      +bundle.version::
      +	This integer value advertises the version of the bundle list format
     @@ Documentation/config/bundle.txt (new)
      +	of this `<id>`. This URI may be a bundle file or another bundle list.
      
       ## bundle-uri.c ##
     +@@
     + #include "run-command.h"
     + #include "hashmap.h"
     + #include "pkt-line.h"
     ++#include "config.h"
     + 
     + static int compare_bundles(const void *hashmap_cmp_fn_data,
     + 			   const struct hashmap_entry *he1,
      @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
       	return 0;
       }
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +static int bundle_list_update(const char *key, const char *value,
      +			      struct bundle_list *list)
      +{
     -+	const char *pkey, *dot;
      +	struct strbuf id = STRBUF_INIT;
      +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
      +	struct remote_bundle_info *bundle;
     ++	const char *subsection, *subkey;
     ++	size_t subsection_len;
      +
     -+	if (!skip_prefix(key, "bundle.", &pkey))
     -+		return 1;
     ++	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
     ++		return -1;
      +
     -+	dot = strchr(pkey, '.');
     -+	if (!dot) {
     -+		if (!strcmp(pkey, "version")) {
     -+			int version = atoi(value);
     ++	if (!subsection_len) {
     ++		if (!strcmp(subkey, "version")) {
     ++			int version;
     ++			if (!git_parse_int(value, &version))
     ++				return -1;
      +			if (version != 1)
     -+				return 1;
     ++				return -1;
      +
      +			list->version = version;
      +			return 0;
      +		}
      +
     -+		if (!strcmp(pkey, "mode")) {
     ++		if (!strcmp(subkey, "mode")) {
      +			if (!strcmp(value, "all"))
      +				list->mode = BUNDLE_MODE_ALL;
      +			else if (!strcmp(value, "any"))
      +				list->mode = BUNDLE_MODE_ANY;
      +			else
     -+				return 1;
     ++				return -1;
      +			return 0;
      +		}
      +
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +		return 0;
      +	}
      +
     -+	strbuf_add(&id, pkey, dot - pkey);
     -+	dot++;
     ++	strbuf_add(&id, subsection, subsection_len);
      +
      +	/*
      +	 * Check for an existing bundle with this <id>, or create one
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
      +		CALLOC_ARRAY(bundle, 1);
      +		bundle->id = strbuf_detach(&id, NULL);
     -+		strbuf_init(&bundle->file, 0);
      +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
      +		hashmap_add(&list->bundles, &bundle->ent);
      +	}
      +	strbuf_release(&id);
      +
     -+	if (!strcmp(dot, "uri")) {
     -+		free(bundle->uri);
     ++	if (!strcmp(subkey, "uri")) {
     ++		if (bundle->uri)
     ++			return -1;
      +		bundle->uri = xstrdup(value);
      +		return 0;
      +	}
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +	return 0;
      +}
      +
     - static int find_temp_filename(struct strbuf *name)
     + static char *find_temp_filename(void)
       {
       	int fd;
     +
     + ## config.c ##
     +@@ config.c: static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
     + 	return 0;
     + }
     + 
     +-static int git_parse_int(const char *value, int *ret)
     ++int git_parse_int(const char *value, int *ret)
     + {
     + 	intmax_t tmp;
     + 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
     +
     + ## config.h ##
     +@@ config.h: int config_with_options(config_fn_t fn, void *,
     + 
     + int git_parse_ssize_t(const char *, ssize_t *);
     + int git_parse_ulong(const char *, unsigned long *);
     ++int git_parse_int(const char *value, int *ret);
     + 
     + /**
     +  * Same as `git_config_bool`, except that it returns -1 on error rather
  3:  49c4f88b6fd !  5:  4df3f834029 bundle-uri: create "key=value" line parsing
     @@ Commit message
          bundle list. Connect the API necessary for Git's transport to the
          key-value pair parsing created in the previous change.
      
     +    We are not currently implementing this protocol v2 functionality, but
     +    instead preparing to expose this parsing to be unit-testable.
     +
          Co-authored-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
       			      struct bundle_list *list)
       {
      @@ bundle-uri.c: cleanup:
     - 	strbuf_release(&filename);
     + 	free(filename);
       	return result;
       }
      +
     @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list,
        */
       int fetch_bundle_uri(struct repository *r, const char *uri);
       
     --#endif
      +/**
      + * General API for {transport,connect}.c etc.
      + */
     @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list,
      +int bundle_uri_parse_line(struct bundle_list *list,
      +			  const char *line);
      +
     -+#endif /* BUNDLE_URI_H */
     + #endif
  4:  7580e1f09af !  6:  91c5b58f011 bundle-uri: unit test "key=value" parsing
     @@ Commit message
          in testing logic deep in the bundle URI feature.
      
          This change introduces the 'parse-key-values' subcommand, which parses
     -    stdin as a list of lines. These are fed into bundle_uri_parse_line() to
     -    test how we construct a 'struct bundle_list' from that data. The list is
     -    then output to stdout as if the key-value pairs were a Git config file.
     +    an input file as a list of lines. These are fed into
     +    bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
     +    from that data. The list is then output to stdout as if the key-value
     +    pairs were a Git config file.
     +
     +    We use an input file instead of stdin because of a future change to
     +    parse in config-file format that works better as an input file.
      
          Co-authored-by: Derrick Stolee <derrickstolee@github.com>
          Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +		mode = "<unknown>";
      +	}
      +
     -+	printf("[bundle]\n");
     -+	printf("\tversion = %d\n", list->version);
     -+	printf("\tmode = %s\n", mode);
     ++	fprintf(fp, "[bundle]\n");
     ++	fprintf(fp, "\tversion = %d\n", list->version);
     ++	fprintf(fp, "\tmode = %s\n", mode);
      +
      +	for_all_bundles_in_list(list, summarize_bundle, fp);
      +}
     @@ t/helper/test-bundle-uri.c (new)
      +#include "strbuf.h"
      +#include "string-list.h"
      +
     -+static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
     ++static int cmd__bundle_uri_parse(int argc, const char **argv)
      +{
     -+	const char *usage[] = {
     -+		"test-tool bundle-uri parse-key-values <in",
     ++	const char *key_value_usage[] = {
     ++		"test-tool bundle-uri parse-key-values <input>",
      +		NULL
      +	};
     ++	const char **usage = key_value_usage;
      +	struct option options[] = {
      +		OPT_END(),
      +	};
      +	struct strbuf sb = STRBUF_INIT;
      +	struct bundle_list list;
      +	int err = 0;
     ++	FILE *fp;
      +
      +	argc = parse_options(argc, argv, NULL, options, usage, 0);
     -+	if (argc)
     ++	if (argc != 1)
      +		goto usage;
      +
      +	init_bundle_list(&list);
     -+	while (strbuf_getline(&sb, stdin) != EOF) {
     -+		if (bundle_uri_parse_line(&list, sb.buf) < 0)
     ++	fp = fopen(argv[0], "r");
     ++	if (!fp)
     ++		die("failed to open '%s'", argv[0]);
     ++
     ++	while (strbuf_getline(&sb, fp) != EOF) {
     ++		if (bundle_uri_parse_line(&list, sb.buf))
      +			err = error("bad line: '%s'", sb.buf);
      +	}
      +	strbuf_release(&sb);
     ++	fclose(fp);
      +
      +	print_bundle_list(stdout, &list);
      +
     @@ t/helper/test-bundle-uri.c (new)
      +		goto usage;
      +
      +	if (!strcmp(argv[1], "parse-key-values"))
     -+		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
     ++		return cmd__bundle_uri_parse(argc - 1, argv + 1);
      +	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
      +
      +usage:
     @@ t/t5750-bundle-uri-parse.sh (new)
      +		uri = file:///usr/share/git/bundle.bdl
      +	EOF
      +
     -+	test-tool bundle-uri parse-key-values <in >actual 2>err &&
     ++	test-tool bundle-uri parse-key-values in >actual 2>err &&
      +	test_must_be_empty err &&
      +	test_cmp_config_output expect actual
      +'
     @@ t/t5750-bundle-uri-parse.sh (new)
      +		mode = all
      +	EOF
      +
     -+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
     ++	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
      +	test_cmp err.expect err &&
      +	test_cmp_config_output expect actual
      +'
     @@ t/t5750-bundle-uri-parse.sh (new)
      +		uri = file:///usr/share/git/bundle.bdl
      +	EOF
      +
     -+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
     ++	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
     ++	test_cmp err.expect err &&
     ++	test_cmp_config_output expect actual
     ++'
     ++
     ++test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
     ++	cat >in <<-\EOF &&
     ++	bundle.one.uri=http://example.com/bundle.bdl
     ++	bundle.two.uri=https://example.com/bundle.bdl
     ++	bundle.one.uri=https://example.com/bundle-2.bdl
     ++	bundle.three.uri=file:///usr/share/git/bundle.bdl
     ++	EOF
     ++
     ++	cat >err.expect <<-\EOF &&
     ++	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
     ++	EOF
     ++
     ++	# We fail, but try to continue parsing regardless
     ++	cat >expect <<-\EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = all
     ++	[bundle "one"]
     ++		uri = http://example.com/bundle.bdl
     ++	[bundle "two"]
     ++		uri = https://example.com/bundle.bdl
     ++	[bundle "three"]
     ++		uri = file:///usr/share/git/bundle.bdl
     ++	EOF
     ++
     ++	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
      +	test_cmp err.expect err &&
      +	test_cmp_config_output expect actual
      +'
  5:  1d1bd9c7103 !  7:  1492b8f5ef0 bundle-uri: parse bundle list in config format
     @@ Commit message
          our expected "key=value" pairs and instead format them using Git config
          format.
      
     -    Create parse_bundle_list_in_config_format() to parse a file in config
     -    format and convert that into a 'struct bundle_list' filled with its
     +    Create bundle_uri_parse_config_format() to parse a file in config format
     +    and convert that into a 'struct bundle_list' filled with its
          understanding of the contents.
      
     -    Be careful to call git_config_from_file_with_options() because the
     -    default action for git_config_from_file() is to die() on a parsing
     -    error. The current warning isn't particularly helpful if it arises to a
     -    user, but it will be made more verbose at a higher layer later.
     +    Be careful to use error_action CONFIG_ERROR_ERROR when calling
     +    git_config_from_file_with_options() because the default action for
     +    git_config_from_file() is to die() on a parsing error.  The current
     +    warning isn't particularly helpful if it arises to a user, but it will
     +    be made more verbose at a higher layer later.
      
          Update 'test-tool bundle-uri' to take this config file format as input.
          It uses a filename instead of stdin because there is no existing way to
          parse a FILE pointer in the config machinery. Using
          git_config_from_mem() is overly complicated and more likely to introduce
     -    bugs than this simpler version. I would rather have a slightly confusing
     -    test helper than complicated product code.
     +    bugs than this simpler version.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## bundle-uri.c ##
     -@@
     - #include "run-command.h"
     - #include "hashmap.h"
     - #include "pkt-line.h"
     -+#include "config.h"
     - 
     - static int compare_bundles(const void *hashmap_cmp_fn_data,
     - 			   const struct hashmap_entry *he1,
      @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
       	return 0;
       }
     @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
      +	return bundle_list_update(key, value, list);
      +}
      +
     -+int parse_bundle_list_in_config_format(const char *uri,
     -+				       const char *filename,
     -+				       struct bundle_list *list)
     ++int bundle_uri_parse_config_format(const char *uri,
     ++				   const char *filename,
     ++				   struct bundle_list *list)
      +{
      +	int result;
      +	struct config_options opts = {
      +		.error_action = CONFIG_ERROR_ERROR,
      +	};
      +
     -+	list->mode = BUNDLE_MODE_NONE;
      +	result = git_config_from_file_with_options(config_to_bundle_list,
      +						   filename, list,
      +						   &opts);
     @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
      +	return result;
      +}
      +
     - static int find_temp_filename(struct strbuf *name)
     + static char *find_temp_filename(void)
       {
       	int fd;
      
     @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list,
      + * pairs are provided in config file format. This method is
      + * exposed publicly for testing purposes.
      + */
     -+
     -+int parse_bundle_list_in_config_format(const char *uri,
     -+				       const char *filename,
     -+				       struct bundle_list *list);
     ++int bundle_uri_parse_config_format(const char *uri,
     ++				   const char *filename,
     ++				   struct bundle_list *list);
      +
       /**
        * Fetch data from the given 'uri' and unbundle the bundle data found
     @@ t/helper/test-bundle-uri.c
       #include "strbuf.h"
       #include "string-list.h"
       
     --static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
     +-static int cmd__bundle_uri_parse(int argc, const char **argv)
      +enum input_mode {
      +	KEY_VALUE_PAIRS,
      +	CONFIG_FILE,
     @@ t/helper/test-bundle-uri.c
      +
      +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
       {
     --	const char *usage[] = {
     -+	const char *key_value_usage[] = {
     - 		"test-tool bundle-uri parse-key-values <in",
     + 	const char *key_value_usage[] = {
     + 		"test-tool bundle-uri parse-key-values <input>",
       		NULL
       	};
      +	const char *config_usage[] = {
      +		"test-tool bundle-uri parse-config <input>",
      +		NULL
      +	};
     + 	const char **usage = key_value_usage;
       	struct option options[] = {
       		OPT_END(),
     - 	};
     -+	const char **usage = key_value_usage;
     - 	struct strbuf sb = STRBUF_INIT;
     - 	struct bundle_list list;
     +@@ t/helper/test-bundle-uri.c: static int cmd__bundle_uri_parse(int argc, const char **argv)
       	int err = 0;
     + 	FILE *fp;
       
      -	argc = parse_options(argc, argv, NULL, options, usage, 0);
     --	if (argc)
     +-	if (argc != 1)
      -		goto usage;
      +	if (mode == CONFIG_FILE)
      +		usage = config_usage;
     @@ t/helper/test-bundle-uri.c
      +			     PARSE_OPT_STOP_AT_NON_OPTION);
       
       	init_bundle_list(&list);
     --	while (strbuf_getline(&sb, stdin) != EOF) {
     --		if (bundle_uri_parse_line(&list, sb.buf) < 0)
     +-	fp = fopen(argv[0], "r");
     +-	if (!fp)
     +-		die("failed to open '%s'", argv[0]);
     + 
     +-	while (strbuf_getline(&sb, fp) != EOF) {
     +-		if (bundle_uri_parse_line(&list, sb.buf))
      -			err = error("bad line: '%s'", sb.buf);
     -+
      +	switch (mode) {
      +	case KEY_VALUE_PAIRS:
     -+		if (argc)
     ++		if (argc != 1)
      +			goto usage;
     -+		while (strbuf_getline(&sb, stdin) != EOF) {
     -+			if (bundle_uri_parse_line(&list, sb.buf) < 0)
     ++		fp = fopen(argv[0], "r");
     ++		if (!fp)
     ++			die("failed to open '%s'", argv[0]);
     ++		while (strbuf_getline(&sb, fp) != EOF) {
     ++			if (bundle_uri_parse_line(&list, sb.buf))
      +				err = error("bad line: '%s'", sb.buf);
      +		}
     ++		fclose(fp);
      +		break;
      +
      +	case CONFIG_FILE:
      +		if (argc != 1)
      +			goto usage;
     -+		err = parse_bundle_list_in_config_format("<uri>", argv[0], &list);
     ++		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
      +		break;
       	}
       	strbuf_release(&sb);
     +-	fclose(fp);
     + 
     + 	print_bundle_list(stdout, &list);
       
      @@ t/helper/test-bundle-uri.c: int cmd__bundle_uri(int argc, const char **argv)
       		goto usage;
       
       	if (!strcmp(argv[1], "parse-key-values"))
     --		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
     +-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
      +		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
      +	if (!strcmp(argv[1], "parse-config"))
      +		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
     @@ t/helper/test-bundle-uri.c: int cmd__bundle_uri(int argc, const char **argv)
       usage:
      
       ## t/t5750-bundle-uri-parse.sh ##
     -@@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
     +@@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
       	test_cmp_config_output expect actual
       '
       
     @@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsin
      +	cat >expect <<-\EOF &&
      +	[bundle]
      +		version = 1
     -+		mode = <unknown>
     ++		mode = all
      +	EOF
      +
      +	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
     @@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsin
      +	EOF
      +
      +	cat >err2 <<-EOF &&
     -+	warning: bundle list at '\''<uri>'\'' has no mode
     ++	error: bad config line 1 in file in2
      +	EOF
      +
      +	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
  6:  039e172849c !  8:  b5d570082fa bundle-uri: limit recursion depth for bundle lists
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +				     int depth)
       {
       	int result = 0;
     - 	struct strbuf filename = STRBUF_INIT;
     + 	char *filename;
       
      +	if (depth >= max_bundle_uri_depth) {
      +		warning(_("exceeded bundle URI recursion limit (%d)"),
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +		return -1;
      +	}
      +
     - 	if ((result = find_temp_filename(&filename)))
     + 	if (!(filename = find_temp_filename())) {
     + 		result = -1;
       		goto cleanup;
     - 
      @@ bundle-uri.c: cleanup:
       	return result;
       }
  7:  7b45c06cc9e !  9:  a6ab8f7c699 bundle-uri: fetch a list of bundles
     @@ Commit message
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## bundle-uri.c ##
     -@@ bundle-uri.c: void init_bundle_list(struct bundle_list *list)
     - static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
     - 				    void *data)
     +@@ bundle-uri.c: static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
       {
     --	free(bundle->id);
     --	free(bundle->uri);
     -+	FREE_AND_NULL(bundle->id);
     -+	FREE_AND_NULL(bundle->uri);
     - 	strbuf_release(&bundle->file);
     + 	FREE_AND_NULL(bundle->id);
     + 	FREE_AND_NULL(bundle->uri);
     ++	FREE_AND_NULL(bundle->file);
      +	bundle->unbundled = 0;
       	return 0;
       }
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +
      +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
      +{
     ++	int res;
      +	struct bundle_list_context *ctx = data;
      +
      +	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
      +		return 0;
      +
     -+	ctx->count++;
     -+	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
     ++	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
     ++
     ++	/*
     ++	 * Only increment count if the download succeeded. If our mode is
     ++	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
     ++	 * list in case they work instead.
     ++	 */
     ++	if (!res)
     ++		ctx->count++;
     ++	return res;
      +}
      +
      +static int download_bundle_list(struct repository *r,
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +
      +	init_bundle_list(&list_from_bundle);
      +
     -+	if ((result = parse_bundle_list_in_config_format(bundle->uri,
     -+							 bundle->file.buf,
     -+							 &list_from_bundle)))
     ++	if ((result = bundle_uri_parse_config_format(bundle->uri,
     ++						     bundle->file,
     ++						     &list_from_bundle)))
      +		goto cleanup;
      +
      +	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +				     struct bundle_list *list)
       {
       	int result = 0;
     --	struct strbuf filename = STRBUF_INIT;
     +-	char *filename;
      +	struct remote_bundle_info *bcopy;
       
       	if (depth >= max_bundle_uri_depth) {
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
       		return -1;
       	}
       
     --	if ((result = find_temp_filename(&filename)))
     -+	if (!bundle->file.len &&
     -+	    (result = find_temp_filename(&bundle->file)))
     +-	if (!(filename = find_temp_filename())) {
     ++	if (!bundle->file &&
     ++	    !(bundle->file = find_temp_filename())) {
     + 		result = -1;
       		goto cleanup;
     + 	}
       
     --	if ((result = copy_uri_to_file(filename.buf, uri))) {
     +-	if ((result = copy_uri_to_file(filename, uri))) {
      -		warning(_("failed to download bundle from URI '%s'"), uri);
     -+	if ((result = copy_uri_to_file(bundle->file.buf, bundle->uri))) {
     ++	if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
      +		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
       		goto cleanup;
       	}
       
     --	if ((result = !is_bundle(filename.buf, 0))) {
     +-	if ((result = !is_bundle(filename, 0))) {
      -		warning(_("file at URI '%s' is not a bundle"), uri);
     -+	if ((result = !is_bundle(bundle->file.buf, 1))) {
     ++	if ((result = !is_bundle(bundle->file, 1))) {
      +		result = fetch_bundle_list_in_config_format(
      +				r, list, bundle, depth);
      +		if (result)
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
       		goto cleanup;
       	}
       
     --	if ((result = unbundle_from_file(r, filename.buf))) {
     +-	if ((result = unbundle_from_file(r, filename))) {
      -		warning(_("failed to unbundle bundle from URI '%s'"), uri);
      -		goto cleanup;
      -	}
      +	/* Copy the bundle and insert it into the global list. */
      +	CALLOC_ARRAY(bcopy, 1);
      +	bcopy->id = xstrdup(bundle->id);
     -+	strbuf_init(&bcopy->file, 0);
     -+	strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len);
     ++	bcopy->file = xstrdup(bundle->file);
      +	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
      +	hashmap_add(&list->bundles, &bcopy->ent);
       
       cleanup:
     --	unlink(filename.buf);
     --	strbuf_release(&filename);
     -+	if (result)
     -+		unlink(bundle->file.buf);
     +-	if (filename)
     +-		unlink(filename);
     +-	free(filename);
     ++	if (result && bundle->file)
     ++		unlink(bundle->file);
       	return result;
       }
       
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      +{
      +	struct attempt_unbundle_context *ctx = data;
      +
     -+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) {
     ++	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
      +		ctx->success_count++;
      +		info->unbundled = 1;
      +	} else {
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      +
      +static int unlink_bundle(struct remote_bundle_info *info, void *data)
      +{
     -+	if (info->file.buf)
     -+		unlink_or_warn(info->file.buf);
     ++	if (info->file)
     ++		unlink_or_warn(info->file);
      +	return 0;
      +}
      +
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      +	struct bundle_list list;
      +	struct remote_bundle_info bundle = {
      +		.uri = xstrdup(uri),
     -+		.id = xstrdup("<root>"),
     -+		.file = STRBUF_INIT,
     ++		.id = xstrdup(""),
      +	};
      +
      +	init_bundle_list(&list);
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      
       ## bundle-uri.h ##
      @@ bundle-uri.h: struct remote_bundle_info {
     - 	 * an empty string.
     + 	 * if there was no table of contents.
       	 */
     - 	struct strbuf file;
     + 	char *uri;
     ++
     ++	/**
     ++	 * If the bundle has been downloaded, then 'file' is a
     ++	 * filename storing its contents. Otherwise, 'file' is
     ++	 * NULL.
     ++	 */
     ++	char *file;
      +
      +	/**
      +	 * If the bundle has been unbundled successfully, then
     @@ bundle-uri.h: struct remote_bundle_info {
      +	unsigned unbundled:1;
       };
       
     - #define REMOTE_BUNDLE_INFO_INIT { \
     + #define REMOTE_BUNDLE_INFO_INIT { 0 }
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' '

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-09 17:24     ` Junio C Hamano
  2022-09-09 14:33   ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When parsing the capability lines from the 'git remote-https' process,
we can stop reading the lines once we notice the 'get' capability.

Reported-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
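
To illustrate the early exit described above, here is a minimal
standalone sketch. It is not Git's code: has_get_capability() and its
array-of-strings input are assumptions made for illustration, whereas
the patch itself reads lines from the helper process with
strbuf_getline().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): scan capability lines,
 * stopping at the first empty line or as soon as "get" is seen.
 * The number of lines examined is stored in *examined.
 * Returns 1 if "get" was found, 0 otherwise.
 */
int has_get_capability(const char *const *lines, size_t nr,
		       size_t *examined)
{
	size_t i;
	int found_get = 0;

	*examined = 0;
	for (i = 0; i < nr; i++) {
		(*examined)++;
		if (!*lines[i])
			break; /* an empty line ends the capability list */
		if (!strcmp(lines[i], "get")) {
			found_get = 1;
			break; /* short-circuit: nothing else is needed */
		}
	}
	return found_get;
}
```

With the second 'break', lines after "get" are never examined, which is
the only behavior change this patch intends.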

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..7173ed065e9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	while (!strbuf_getline(&line, child_out)) {
 		if (!line.len)
 			break;
-		if (!strcmp(line.buf, "get"))
+		if (!strcmp(line.buf, "get")) {
 			found_get = 1;
+			break;
+		}
 	}
 	strbuf_release(&line);
 
-- 
gitgitgadget



* [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-09 17:56     ` Junio C Hamano
  2022-09-09 14:33   ` [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we don't actually need to modify this
filename directly later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)
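
The ownership change (the callee returns an allocated 'char *' and the
caller frees it) can be sketched outside of Git as follows. This is an
illustration under stated assumptions: sketch_temp_filename() is a
hypothetical name, plain mkstemp() stands in for odb_mkstemp(), and the
"/tmp" location is chosen only so the example runs anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for find_temp_filename(): reserve a unique
 * name with mkstemp(), then close and unlink the file so that only
 * the name remains. Like the original, this is briefly racy.
 * Returns a malloc'd string that the caller must free(), or NULL on
 * failure.
 */
char *sketch_temp_filename(void)
{
	char template[] = "/tmp/tmp_uri_XXXXXX"; /* assumed location */
	char *name;
	int fd = mkstemp(template);

	if (fd < 0)
		return NULL;

	close(fd);
	unlink(template);

	name = malloc(strlen(template) + 1);
	if (name)
		strcpy(name, template);
	return name;
}
```

A caller tests the result against NULL instead of checking an int
return code, and releases it with free() rather than strbuf_release().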

diff --git a/bundle-uri.c b/bundle-uri.c
index 7173ed065e9..c52b2a2a64a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -143,28 +144,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget



* [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare for a user to supply a single bundle URI and
expect that URI to point to a bundle. Instead, that URI will likely
point to a list of bundles provided in some format. Alternatively, the
Git server could advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)
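
The iterator contract of for_all_bundles_in_list() (call the callback
on every item, stop at the first non-zero result and return it) can be
sketched without Git's hashmap. Everything named sketch_* below, plus
find_by_id() and struct find_data, is hypothetical, and a plain array
stands in for the hashtable.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical mirror of remote_bundle_info. */
struct sketch_bundle {
	const char *id;
	const char *uri;
};

/* A plain array replaces the hashmap used by the patch. */
struct sketch_list {
	struct sketch_bundle *bundles;
	size_t nr;
};

typedef int (*sketch_iterator)(struct sketch_bundle *bundle, void *data);

/*
 * Call 'iter' on each bundle. The first non-zero result stops the
 * walk and is returned; 0 means every bundle was visited.
 */
int sketch_for_all(struct sketch_list *list, sketch_iterator iter,
		   void *data)
{
	size_t i;

	for (i = 0; i < list->nr; i++) {
		int result = iter(&list->bundles[i], data);

		if (result)
			return result;
	}
	return 0;
}

/* Example callback state: the id to look for and a visit counter. */
struct find_data {
	const char *wanted;
	size_t visited;
};

/* Returns 1 (stopping the walk) once the wanted id is seen. */
int find_by_id(struct sketch_bundle *bundle, void *data)
{
	struct find_data *fd = data;

	fd->visited++;
	return !strcmp(bundle->id, fd->wanted);
}
```

The same early-stop convention lets a callback report an error and
abort the walk, which is how clear_remote_bundle_info() could signal
failure if it ever needed to.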

diff --git a/bundle-uri.c b/bundle-uri.c
index c52b2a2a64a..7a0bada6eda 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,66 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static char *find_temp_filename(void)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..ff7e3fd3fb2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,63 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { 0 }
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget



* [PATCH v2 4/9] bundle-uri: create base key-value pair parsing
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-29 21:49     ` Jonathan Tan
  2022-09-09 14:33   ` [PATCH v2 5/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  currently is expected to be an absolute URI, but will be relaxed to be
  a relative URI in the future.

While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.

Make the git_parse_int() method global so we can parse the integer
version value carefully.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 24 +++++++++++
 bundle-uri.c                    | 76 +++++++++++++++++++++++++++++++++
 config.c                        |  2 +-
 config.h                        |  1 +
 5 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/bundle.txt
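
The key layout described above ("bundle.<subkey>" for list-wide keys,
"bundle.<id>.<subkey>" for per-bundle keys) can be sketched with a small
standalone splitter. split_bundle_key() is a hypothetical, simplified
stand-in for what the patch delegates to parse_config_key(); it handles
only the "bundle" section and is not Git's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical splitter for keys in the "bundle" section.
 *
 * For "bundle.<subkey>", sets *id to NULL and *id_len to 0.
 * For "bundle.<id>.<subkey>", *id points at <id> inside 'key' and
 * *id_len is its length (it is not NUL-terminated).
 * In both cases *subkey points at the text after the last dot.
 *
 * Returns 0 on success, -1 if the key is not in the "bundle" section
 * or has an empty subkey.
 */
int split_bundle_key(const char *key, const char **id, size_t *id_len,
		     const char **subkey)
{
	const char *rest, *last_dot;

	if (strncmp(key, "bundle.", 7))
		return -1;
	rest = key + 7;
	if (!*rest)
		return -1;

	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* A list-wide key such as bundle.version or bundle.mode. */
		*id = NULL;
		*id_len = 0;
		*subkey = rest;
		return 0;
	}
	if (!last_dot[1])
		return -1;

	/* A per-bundle key such as bundle.<id>.uri. */
	*id = rest;
	*id_len = (size_t)(last_dot - rest);
	*subkey = last_dot + 1;
	return 0;
}
```

Splitting at the last dot means an <id> may itself contain dots, so
"bundle.a.b.uri" yields the id "a.b" and the subkey "uri".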

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..daa21eb674a
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,24 @@
+bundle.*::
+	The `bundle.*` keys may appear in a bundle list file found via the
+	`git clone --bundle-uri` option. These keys currently have no effect
+	if placed in a repository config file, though this will change in the
+	future. See link:technical/bundle-uri.html[the bundle URI design
+	document] for more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index 7a0bada6eda..4ccd14c8936 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+	const char *subsection, *subkey;
+	size_t subsection_len;
+
+	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
+		return -1;
+
+	if (!subsection_len) {
+		if (!strcmp(subkey, "version")) {
+			int version;
+			if (!git_parse_int(value, &version))
+				return -1;
+			if (version != 1)
+				return -1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(subkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return -1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, subsection, subsection_len);
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(subkey, "uri")) {
+		if (bundle->uri)
+			return -1;
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/config.c b/config.c
index 015bec360f5..e93101249f6 100644
--- a/config.c
+++ b/config.c
@@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
 	return 0;
 }
 
-static int git_parse_int(const char *value, int *ret)
+int git_parse_int(const char *value, int *ret)
 {
 	intmax_t tmp;
 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
diff --git a/config.h b/config.h
index ca994d77147..ef9eade6414 100644
--- a/config.h
+++ b/config.h
@@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *,
 
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
+int git_parse_int(const char *value, int *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
gitgitgadget



* [PATCH v2 5/9] bundle-uri: create "key=value" line parsing
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value" representing a
bundle list. Connect the API necessary for Git's transport to the
key-value pair parsing created in the previous change.

We are not currently implementing this protocol v2 functionality, but
instead preparing to expose this parsing to be unit-testable.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 12 ++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)
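
The line validation performed before the key-value pair is handed to
bundle_list_update() can be sketched in isolation. split_key_value() is
a hypothetical name, and malloc() replaces Git's strbuf; the three
rejection cases mirror the error paths in the diff below.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: split "key=value" at the first '='.
 *
 * On success, *key is a newly allocated copy of the key (the caller
 * must free() it) and *value points into 'line', just after the '='.
 * Returns 0 on success, or -1 for an empty line, a line without '=',
 * an empty key, an empty value, or an allocation failure.
 */
int split_key_value(const char *line, char **key, const char **value)
{
	const char *equals;
	size_t key_len;

	if (!*line)
		return -1; /* empty line */

	equals = strchr(line, '=');
	if (!equals)
		return -1; /* not of the form key=value */
	if (equals == line || !equals[1])
		return -1; /* empty key or empty value */

	key_len = (size_t)(equals - line);
	*key = malloc(key_len + 1);
	if (!*key)
		return -1;
	memcpy(*key, line, key_len);
	(*key)[key_len] = '\0';

	*value = equals + 1;
	return 0;
}
```

Because only the first '=' is significant, a value such as a URI with a
query string keeps any later '=' characters intact.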

diff --git a/bundle-uri.c b/bundle-uri.c
index 4ccd14c8936..d4eb1ec7d4d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -308,3 +307,29 @@ cleanup:
 	free(filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index ff7e3fd3fb2..90583461929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
 #endif
-- 
gitgitgadget



* [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 5/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-09-09 14:33   ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 7/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.

We use an input file instead of stdin because of a future change to
parse in config-file format that works better as an input file.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |   1 +
 bundle-uri.c                |  33 ++++++++++
 bundle-uri.h                |   3 +
 t/helper/test-bundle-uri.c  |  70 +++++++++++++++++++++
 t/helper/test-tool.c        |   1 +
 t/helper/test-tool.h        |   1 +
 t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     |  11 ++++
 8 files changed, 241 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh
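
The tests below compare the helper's output with test_cmp_config_output,
which sorts both sides first, presumably because the order in which a
hashmap-backed list is printed is not guaranteed. The idea of an
order-insensitive comparison can be sketched in C as follows;
same_lines_any_order() and cmp_lines() are hypothetical names, and the
real helper is a shell function built on 'git config --list' and 'sort'.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_lines(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical sketch: return 1 if both arrays hold the same lines
 * regardless of order, 0 otherwise. Both arrays are sorted in place.
 */
int same_lines_any_order(const char **a, size_t a_nr,
			 const char **b, size_t b_nr)
{
	size_t i;

	if (a_nr != b_nr)
		return 0;

	qsort(a, a_nr, sizeof(*a), cmp_lines);
	qsort(b, b_nr, sizeof(*b), cmp_lines);

	for (i = 0; i < a_nr; i++)
		if (strcmp(a[i], b[i]))
			return 0;
	return 1;
}
```

Sorting both inputs keeps the tests stable even if the iteration order
of the bundle list changes in a later patch.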

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index d4eb1ec7d4d..74d5695e99e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 90583461929..0e56ab2ae5a 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..0329c56544f
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,70 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *key_value_usage[] = {
+		"test-tool bundle-uri parse-key-values <input>",
+		NULL
+	};
+	const char **usage = key_value_usage;
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+	FILE *fp;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc != 1)
+		goto usage;
+
+	init_bundle_list(&list);
+	fp = fopen(argv[0], "r");
+	if (!fp)
+		die("failed to open '%s'", argv[0]);
+
+	while (strbuf_getline(&sb, fp) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf))
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+	fclose(fp);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..fd142a66ad5
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.one.uri=https://example.com/bundle-2.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget



* [PATCH v2 7/9] bundle-uri: parse bundle list in config format
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.
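
For reference, the line-oriented "key=value" form mentioned above can be
validated roughly as follows. This is a hedged sketch, not the series'
bundle_uri_parse_line(): check_key_value_line() and enum line_status are
made-up names. The empty-line and empty-key-or-value cases follow the
tests in this series; the "no equals sign" case is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes; the real parser reports errors via error(). */
enum line_status {
	LINE_OK,
	LINE_EMPTY,
	LINE_NO_EQUALS,
	LINE_EMPTY_KEY_OR_VALUE
};

/* Classify one "key=value" line; on LINE_OK, *value points past the '='. */
static enum line_status check_key_value_line(const char *line,
					     const char **value)
{
	const char *equals;

	assert(line && value);
	if (!*line)
		return LINE_EMPTY;
	equals = strchr(line, '=');
	if (!equals)
		return LINE_NO_EQUALS;
	if (equals == line || !equals[1])
		return LINE_EMPTY_KEY_OR_VALUE;
	*value = equals + 1;
	return LINE_OK;
}
```

A caller in the style of the test helper would keep reading after a bad
line and only remember that an error happened.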

Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.
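
To illustrate what the callback sees: the config parser hands over
flattened keys such as "bundle.mode" or "bundle.one.uri" together with
their values. A minimal sketch of splitting such a key is below. It
assumes nothing about Git's own helpers; split_bundle_key() is a
hypothetical name used only for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "bundle.<id>.<subkey>" into its id and subkey.
 * On success, returns 0; *id points into 'key' with length *id_len, and
 * *subkey points at the final component. Keys such as "bundle.mode"
 * yield id == NULL. Returns -1 if 'key' is not in the "bundle" section
 * or has an empty final component.
 */
static int split_bundle_key(const char *key, const char **id,
			    size_t *id_len, const char **subkey)
{
	const char *first, *last;

	assert(key && id && id_len && subkey);
	if (strncmp(key, "bundle.", 7))
		return -1;

	first = key + 6;          /* the dot after "bundle" */
	last = strrchr(key, '.'); /* the dot before the final component */
	if (!last[1])
		return -1;

	*subkey = last + 1;
	if (first == last) {      /* e.g. "bundle.mode" */
		*id = NULL;
		*id_len = 0;
	} else {                  /* e.g. "bundle.one.uri" */
		*id = first + 1;
		*id_len = (size_t)(last - first - 1);
	}
	return 0;
}
```

With this split, "bundle.one.uri" yields the id "one" and the subkey
"uri", while "bundle.mode" applies to the list as a whole.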

Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error.  The current
warning isn't particularly helpful if it reaches a user, but it will
be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 27 ++++++++++++++++++++
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  | 49 +++++++++++++++++++++++++++---------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 74d5695e99e..92354aa3bbd 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 0e56ab2ae5a..bc13d4c9929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0329c56544f..25afd393428 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,12 +4,21 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
 	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <input>",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	const char **usage = key_value_usage;
 	struct option options[] = {
 		OPT_END(),
@@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv)
 	int err = 0;
 	FILE *fp;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc != 1)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	fp = fopen(argv[0], "r");
-	if (!fp)
-		die("failed to open '%s'", argv[0]);
 
-	while (strbuf_getline(&sb, fp) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf))
-			err = error("bad line: '%s'", sb.buf);
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc != 1)
+			goto usage;
+		fp = fopen(argv[0], "r");
+		if (!fp)
+			die("failed to open '%s'", argv[0]);
+		while (strbuf_getline(&sb, fp) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf))
+				err = error("bad line: '%s'", sb.buf);
+		}
+		fclose(fp);
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
-	fclose(fp);
 
 	print_bundle_list(stdout, &list);
 
@@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index fd142a66ad5..c2fe3f9c5a5 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	error: bad config line 1 in file in2
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 7/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-09 14:33   ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.
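
As a rough illustration of why a plain depth bound is enough, here is a
self-contained sketch. It is not the code in bundle-uri.c: struct node,
visit_limited() and MAX_DEPTH are hypothetical names, and a "list" is
just an in-memory node pointing at other nodes. The recursion ends even
on a cycle, without tracking which nodes were already visited.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4 /* same value as max_bundle_uri_depth in the patch */

/* Hypothetical stand-in for "content at a URI": a bundle or a list. */
struct node {
	int is_list;              /* 0: plain bundle, 1: bundle list */
	struct node *children[4]; /* entries of a list, NULL-terminated */
};

/*
 * Visit 'n' at the given depth. Returns 0 on success and -1 if the
 * depth limit was hit somewhere below. '*visited' counts the bundles
 * that were reached.
 */
static int visit_limited(const struct node *n, int depth, int *visited)
{
	size_t i;

	assert(n && visited);
	if (depth >= MAX_DEPTH)
		return -1; /* too deep: a cycle or a misconfigured list */

	if (!n->is_list) {
		(*visited)++;
		return 0;
	}

	for (i = 0; i < 4 && n->children[i]; i++)
		if (visit_limited(n->children[i], depth + 1, visited))
			return -1;
	return 0;
}
```

A list that names itself fails after four levels instead of looping
forever, while a list of plain bundles is walked normally.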

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 92354aa3bbd..b8ca6cd9493 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -336,11 +336,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -368,6 +382,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget



* [PATCH v2 9/9] bundle-uri: fetch a list of bundles
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33   ` Derrick Stolee via GitGitGadget
  2022-09-29 21:58     ` Jonathan Tan
  2022-09-26 13:19   ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs is downloaded
and inspected again, creating a recursive process.
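
The difference between the two modes can be sketched as below. This is
an illustration of the intent described above, not the iteration in the
patch: enum list_mode, download_by_mode() and fake_download() are
hypothetical names, and the choice to keep trying after a failure in
ANY mode is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum list_mode { MODE_ALL, MODE_ANY }; /* echoes BUNDLE_MODE_ALL/ANY */

/* Pretend downloader: returns 0 on success, nonzero on failure. */
typedef int (*download_fn)(const char *uri);

/* Example downloader for the sketch: only "file://" URIs succeed. */
static int fake_download(const char *uri)
{
	return strncmp(uri, "file://", 7) != 0;
}

/*
 * Try the URIs according to 'mode'. Returns the number of successful
 * downloads, or -1 if the mode's requirement was not met.
 */
static int download_by_mode(enum list_mode mode, const char **uris,
			    size_t nr, download_fn download)
{
	int ok = 0;
	size_t i;

	assert(uris && download);
	for (i = 0; i < nr; i++) {
		if (mode == MODE_ANY && ok)
			break; /* one bundle is sufficient */
		if (!download(uris[i]))
			ok++;
		else if (mode == MODE_ALL)
			return -1; /* every bundle is required */
	}
	return ok ? ok : -1;
}
```

Given one failing URI followed by two working ones, ANY mode stops
after the first success, while ALL mode reports failure.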

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.
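
The retry behaviour this topology exercises can be modelled in a few
lines. The sketch below is only a model under stated assumptions:
struct fake_bundle, its bitmask fields and unbundle_until_stable() are
hypothetical, and "unbundling" is reduced to checking that the needed
commits are already present. It repeats passes over the list until a
pass makes no progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one bit per commit in the test topology. */
struct fake_bundle {
	unsigned provides; /* bit for the tip this bundle adds */
	unsigned needs;    /* bits that must already be present */
	int unbundled;
};

/*
 * Repeat passes over the list until a pass makes no progress.
 * Returns the number of bundles that could not be applied.
 */
static size_t unbundle_until_stable(struct fake_bundle *b, size_t nr)
{
	unsigned have = 0;
	size_t i, remaining = nr;
	int progress = 1;

	assert(b || !nr);
	for (i = 0; i < nr; i++)
		b[i].unbundled = 0;

	while (progress) {
		progress = 0;
		for (i = 0; i < nr; i++) {
			if (b[i].unbundled)
				continue;
			if ((b[i].needs & have) != b[i].needs)
				continue; /* prerequisites missing; retry later */
			have |= b[i].provides;
			b[i].unbundled = 1;
			remaining--;
			progress = 1;
		}
	}
	return remaining;
}
```

Even when the list is visited in the worst order (bundle-4 first,
bundle-1 last), three productive passes apply all four bundles.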

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 216 +++++++++++++++++++++++++++++++++---
 bundle-uri.h                |  13 +++
 t/t5558-clone-bundle-uri.sh |  93 ++++++++++++++++
 3 files changed, 306 insertions(+), 16 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index b8ca6cd9493..6a2fea26a94 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 {
 	FREE_AND_NULL(bundle->id);
 	FREE_AND_NULL(bundle->uri);
+	FREE_AND_NULL(bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -336,18 +338,111 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	int res;
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+
+	/*
+	 * Only increment count if the download succeeded. If our mode is
+	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
+	 * list in case they work instead.
+	 */
+	if (!res)
+		ctx->count++;
+	return res;
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = bundle_uri_parse_config_format(bundle->uri,
+						     bundle->file,
+						     &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	char *filename;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -355,36 +450,125 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if (!(filename = find_temp_filename())) {
+	if (!bundle->file &&
+	    !(bundle->file = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
 	}
 
-	if ((result = copy_uri_to_file(filename, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	bcopy->file = xstrdup(bundle->file);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	if (filename)
-		unlink(filename);
-	free(filename);
+	if (result && bundle->file)
+		unlink(bundle->file);
 	return result;
 }
 
+struct attempt_unbundle_context {
+	struct repository *r;
+	int success_count;
+	int failure_count;
+};
+
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct attempt_unbundle_context *ctx = data;
+
+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
+		ctx->success_count++;
+		info->unbundled = 1;
+	} else {
+		ctx->failure_count++;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	int last_success_count = -1;
+	struct attempt_unbundle_context ctx = {
+		.r = r,
+	};
+
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 */
+	while (last_success_count < ctx.success_count) {
+		last_success_count = ctx.success_count;
+
+		ctx.success_count = 0;
+		ctx.failure_count = 0;
+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
+	}
+
+	if (ctx.success_count)
+		git_config_set_multivar_gently("log.excludedecoration",
+						"refs/bundle/",
+						"refs/bundle/",
+						CONFIG_FLAGS_FIXED_VALUE |
+						CONFIG_FLAGS_MULTI_REPLACE);
+
+	if (ctx.failure_count)
+		warning(_("failed to unbundle %d bundles"),
+			ctx.failure_count);
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file)
+		unlink_or_warn(info->file);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup(""),
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index bc13d4c9929..4dbc269823c 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -28,6 +28,19 @@ struct remote_bundle_info {
 	 * if there was no table of contents.
 	 */
 	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * NULL.
+	 */
+	char *file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..592790b49f0 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,72 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-file rev-parse $oid || return 1
+	done
+'
+
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +141,33 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-http rev-parse $oid || return 1
+	done
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


* Re: [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 14:33   ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
@ 2022-09-09 17:24     ` Junio C Hamano
  2022-09-19 17:55       ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-09-09 17:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When parsing the capability lines from the 'git remote-https' process,
> we can stop reading the lines once we notice the 'get' capability.
>
> Reported-by: Teng Long <dyroneteng@gmail.com>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 4a8cc74ed05..7173ed065e9 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
>  	while (!strbuf_getline(&line, child_out)) {
>  		if (!line.len)
>  			break;
> -		if (!strcmp(line.buf, "get"))
> +		if (!strcmp(line.buf, "get")) {
>  			found_get = 1;
> +			break;
> +		}
>  	}

Hmph, is this safe to do?  Who is feeding child_out?  Won't they
get upset if we do not slurp what they write to us?  Are we
expecting to read more from them after this part?  Won't we get
upset if we leave some other stuff when we read from child_out after
we saw "get"?  If we respond to child_in without reading all from
them, do we not get into a deadlock?

Perhaps these are all silly questions, but the description above
does not quite answer them.

>  	strbuf_release(&line);


* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 14:33   ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-09-09 17:56     ` Junio C Hamano
  2022-09-19 17:54       ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-09-09 17:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
> create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
> create a temporary filename. The odb_mkstemp() method uses a strbuf in
> its interface, but we do not need to continue carrying a strbuf
> throughout the bundle URI code.

What the patch does is not wrong per se, but it is unfortunate that,
even though we accepted a known-to-be-racy approach for expediency
earlier, the first update to that is not to replace it with a
non-racy and safe approach, but to make it easier to use, encouraging
use of the racy approach and giving it an appearance of clean code
X-<.
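
For readers unfamiliar with the concern, here is a hedged sketch of one
non-racy pattern. It assumes the race in question is the gap between
choosing a temporary name and later creating the file under that name.
write_to_new_temp() is a hypothetical helper built on POSIX mkstemp();
it is not part of this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a uniquely named file from 'template_path' (which must end in
 * "XXXXXX") and write 'len' bytes of 'data' through the descriptor
 * that mkstemp() returns. The file is never unlinked and re-created,
 * so no other process can claim the name in between.
 * Returns 0 on success, -1 on failure.
 */
static int write_to_new_temp(char *template_path, const void *data,
			     size_t len)
{
	const char *p = data;
	int fd;

	assert(template_path && (data || !len));
	fd = mkstemp(template_path); /* creates the file exclusively */
	if (fd < 0)
		return -1;

	while (len) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			close(fd);
			unlink(template_path);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

The key point is that the descriptor from mkstemp() is used for the
write, so the name is never free for another process to take.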



* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 17:56     ` Junio C Hamano
@ 2022-09-19 17:54       ` Derrick Stolee
  2022-09-19 18:16         ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-09-19 17:54 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 9/9/2022 1:56 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
>> create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
>> create a temporary filename. The odb_mkstemp() method uses a strbuf in
>> its interface, but we do not need to continue carrying a strbuf
>> throughout the bundle URI code.
> 
> What the patch does is not wrong per se, but it is unfortunate that,
> even though we accepted a known-to-be-racy approach for expediency
> earlier, the first update to that is not to replace it with a
> non-racy and safe approach, but to make it easier to use, encouraging
> use of the racy approach and giving it an appearance of clean code
> X-<.
 
Hopefully you would be encouraged by future efforts to replace this
temporary file name with something deterministic based on the URI so
we can restart downloads that were halted, even if the process needs
to restart. But for now, this change helps us to remove the strbuf
from the remote_bundle_info struct.

Thanks,
-Stolee


* Re: [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 17:24     ` Junio C Hamano
@ 2022-09-19 17:55       ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-19 17:55 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 9/9/2022 1:24 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When parsing the capability lines from the 'git remote-https' process,
>> we can stop reading the lines once we notice the 'get' capability.
>>
>> Reported-by: Teng Long <dyroneteng@gmail.com>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  bundle-uri.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index 4a8cc74ed05..7173ed065e9 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
>>  	while (!strbuf_getline(&line, child_out)) {
>>  		if (!line.len)
>>  			break;
>> -		if (!strcmp(line.buf, "get"))
>> +		if (!strcmp(line.buf, "get")) {
>>  			found_get = 1;
>> +			break;
>> +		}
>>  	}
> 
> Hmph, is this safe to do?  Who is feeding child_out?  Won't they
> get upset if we do not slurp what they write to us?  Are we
> expecting to read more from them after this part?  Won't we get
> upset if we leave some other stuff when we read from child_out after
> we saw "get"?  If we respond to child_in without reading all from
> them, do we not get into a deadlock?
> 
> Perhaps these are all silly questions, but the description above
> does not quite answer them.

In my testing, this has not been a problem, but that does not mean
that it is safe to do. I'll drop this patch in v3.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-19 17:54       ` Derrick Stolee
@ 2022-09-19 18:16         ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-09-19 18:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> ... something deterministic based on the URI so
> we can restart downloads that were halted, even if the process needs
> to restart. But for now, this change helps us to remove the strbuf
> from the remote_bundle_info struct.

Yay for a bright future ;-)

Thanks.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2022-09-09 14:33   ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-09-26 13:19   ` Derrick Stolee
  2022-09-26 19:10     ` Junio C Hamano
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-09-26 13:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 9/9/2022 10:33 AM, Derrick Stolee via GitGitGadget wrote:

> Updates in v2
> =============
> 
> Thank you to all of the voices who chimed in on the previous version. I'm
> sorry it took so long for me to get a new version.
> 
>  * I've done a rather thorough overhaul to minimize how often later patches
>    rewrite portions of earlier patches.
> 
>  * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
>    'char *' and only in the patch where it is first used.
> 
>  * The config documentation is more clearly indicating that the bundle.*
>    section has no effect in the repository config (at the moment, which will
>    change in the next series).
> 
>  * The bundle.version value is now parsed using git_parse_int().
> 
>  * The config key is now parsed using parse_config_key().
> 
>  * Commit messages clarify more about the context of the change in the
>    bigger picture of the bundle URI effort.
> 
>  * Some printf()s are correctly changed to fprintf()s.
> 
>  * The test helper CLI is unified across the two modes. They both take a
>    filename now.
> 
>  * The count of downloaded bundles is now only updated after a successful
>    download, allowing the "any" mode to keep trying after a failure.

If some of the reviewers from v1 could check that I responded to their
comments, then that would be a big help to getting this series moving
again.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-09-26 13:19   ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
@ 2022-09-26 19:10     ` Junio C Hamano
  2022-09-29 22:00       ` Jonathan Tan
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-09-26 19:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> If some of the reviewers from v1 could check that I responded to their
> comments, then that would be a big help to getting this series moving
> again.

Thanks for a ping.  Also, if reviewers who missed v1 can take a look
and give fresh insights, that would also help polishing the series
further.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 4/9] bundle-uri: create base key-value pair parsing
  2022-09-09 14:33   ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-09-29 21:49     ` Jonathan Tan
  0 siblings, 0 replies; 94+ messages in thread
From: Jonathan Tan @ 2022-09-29 21:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham,
	steadmon, Glen Choo, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  	return 0;
>  }
>  
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
[snip]
> +	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
> +		return -1;

The comment at the top should say -1 instead of 1.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 9/9] bundle-uri: fetch a list of bundles
  2022-09-09 14:33   ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-09-29 21:58     ` Jonathan Tan
  2022-09-30 12:49       ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Jonathan Tan @ 2022-09-29 21:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham,
	steadmon, Glen Choo, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
> +{
> +	int res;
> +	struct bundle_list_context *ctx = data;
> +
> +	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
> +		return 0;
> +
> +	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
> +
> +	/*
> +	 * Only increment count if the download succeeded. If our mode is
> +	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
> +	 * list in case they work instead.
> +	 */
> +	if (!res)
> +		ctx->count++;
> +	return res;
> +}

So this returns nonzero if a download fails...

> +static int download_bundle_list(struct repository *r,
> +				struct bundle_list *local_list,
> +				struct bundle_list *global_list,
> +				int depth)
> +{
> +	struct bundle_list_context ctx = {
> +		.r = r,
> +		.list = global_list,
> +		.depth = depth + 1,
> +		.mode = local_list->mode,
> +	};
> +
> +	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
> +}

...and for_all_bundles_in_list does not proceed with the rest of the
loop if any callback invocation returns nonzero. Don't we need to
continue retrying the others if the mode is ANY?

> +static int attempt_unbundle(struct remote_bundle_info *info, void *data)
> +{
> +	struct attempt_unbundle_context *ctx = data;
> +
> +	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
> +		ctx->success_count++;
> +		info->unbundled = 1;
> +	} else {
> +		ctx->failure_count++;
> +	}
> +
> +	return 0;
> +}

Do we need to handle the case in which a file is missing but it's
expected because the mode is ANY and another file was successfully
downloaded?

> +static int unbundle_all_bundles(struct repository *r,
> +				struct bundle_list *list)
> +{
> +	int last_success_count = -1;
> +	struct attempt_unbundle_context ctx = {
> +		.r = r,
> +	};
> +
> +	/*
> +	 * Iterate through all bundles looking for ones that can
> +	 * successfully unbundle. If any succeed, then perhaps another
> +	 * will succeed in the next attempt.
> +	 */
> +	while (last_success_count < ctx.success_count) {
> +		last_success_count = ctx.success_count;
> +
> +		ctx.success_count = 0;
> +		ctx.failure_count = 0;
> +		for_all_bundles_in_list(list, attempt_unbundle, &ctx);

I think it would have been clearer if the invocation to
for_all_bundles_in_list were to stop early if a bundle has been
successfully unbundled, and then you can just run this loop n times,
instead of needing to reset the success count each time in order to
check that the latest count is more than the prior one. But this works
too.

[snip tests]

I see that there are ALL tests, but could we have an ANY test as well?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-09-26 19:10     ` Junio C Hamano
@ 2022-09-29 22:00       ` Jonathan Tan
  2022-09-30 13:21         ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Jonathan Tan @ 2022-09-29 22:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee via GitGitGadget,
	git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Teng Long

Junio C Hamano <gitster@pobox.com> writes:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
> > If some of the reviewers from v1 could check that I responded to their
> > comments, then that would be a big help to getting this series moving
> > again.

Yes, all my comments from v1 were indeed addressed, thanks.

> Thanks for a ping.  Also, if reviewers who missed v1 can take a look
> and give fresh insights, that would also help polishing the series
> further.

I didn't miss v1 but I gave some new insights. :-) The patch set looks
good except for some comments I had on the last one.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 9/9] bundle-uri: fetch a list of bundles
  2022-09-29 21:58     ` Jonathan Tan
@ 2022-09-30 12:49       ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-30 12:49 UTC (permalink / raw)
  To: Jonathan Tan, Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, mjcheetham, steadmon,
	Glen Choo, Teng Long

On 9/29/2022 5:58 PM, Jonathan Tan wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
>> +{
>> +	int res;
>> +	struct bundle_list_context *ctx = data;
>> +
>> +	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
>> +		return 0;
>> +
>> +	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
>> +
>> +	/*
>> +	 * Only increment count if the download succeeded. If our mode is
>> +	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
>> +	 * list in case they work instead.
>> +	 */
>> +	if (!res)
>> +		ctx->count++;
>> +	return res;
>> +}
> 
> So this returns nonzero if a download fails...
> 
>> +static int download_bundle_list(struct repository *r,
>> +				struct bundle_list *local_list,
>> +				struct bundle_list *global_list,
>> +				int depth)
>> +{
>> +	struct bundle_list_context ctx = {
>> +		.r = r,
>> +		.list = global_list,
>> +		.depth = depth + 1,
>> +		.mode = local_list->mode,
>> +	};
>> +
>> +	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
>> +}
> 
> ...and for_all_bundles_in_list does not proceed with the rest of the
> loop if any callback invocation returns nonzero. Don't we need to
> continue retrying the others if the mode is ANY?

You are right! Thanks.
 
>> +static int attempt_unbundle(struct remote_bundle_info *info, void *data)
>> +{
>> +	struct attempt_unbundle_context *ctx = data;
>> +
>> +	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
>> +		ctx->success_count++;
>> +		info->unbundled = 1;
>> +	} else {
>> +		ctx->failure_count++;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> Do we need to handle the case in which a file is missing but it's
> expected because the mode is ANY and another file was successfully
> downloaded?

By "file is missing" I think you mean "we never successfully downloaded
that file" and I agree that we should skip those bundles. I'll add more
tests for ANY mode to hopefully catch these issues.

>> +static int unbundle_all_bundles(struct repository *r,
>> +				struct bundle_list *list)
>> +{
>> +	int last_success_count = -1;
>> +	struct attempt_unbundle_context ctx = {
>> +		.r = r,
>> +	};
>> +
>> +	/*
>> +	 * Iterate through all bundles looking for ones that can
>> +	 * successfully unbundle. If any succeed, then perhaps another
>> +	 * will succeed in the next attempt.
>> +	 */
>> +	while (last_success_count < ctx.success_count) {
>> +		last_success_count = ctx.success_count;
>> +
>> +		ctx.success_count = 0;
>> +		ctx.failure_count = 0;
>> +		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
> 
> I think it would have been clearer if the invocation to
> for_all_bundles_in_list were to stop early if a bundle has been
> successfully unbundled, and then you can just run this loop n times,
> instead of needing to reset the success count each time in order to
> check that the latest count is more than the prior one. But this works
> too.

It's a little bit backwards to have the "terminate early with nonzero
value" signal "success", but it would work. With careful commenting, I
think it's doable.
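
A minimal sketch of that inverted convention, under stated assumptions: the
struct, the 'needs' prerequisite index, and the helper names below are
hypothetical stand-ins for illustration, not the code in bundle-uri.c. The
callback returns nonzero on the first new success, which stops the iterator,
and the outer loop rescans until a full pass makes no progress.

```c
#include <stddef.h>

/* Hypothetical model of a bundle; not Git's remote_bundle_info. */
struct bundle {
	int needs;     /* index of a prerequisite bundle, or -1 for none */
	int unbundled; /* set once this bundle has been applied */
};

typedef int (*bundle_fn)(struct bundle *b, void *data);

/* Stops at the first nonzero callback result and returns it. */
static int for_each_bundle(struct bundle *list, size_t nr,
			   bundle_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int res = fn(&list[i], data);
		if (res)
			return res;
	}
	return 0;
}

/*
 * Returns 1 (ending the current pass) on the first bundle that newly
 * unbundles. Returns 0 to keep scanning when the bundle is already
 * done or its prerequisite is not yet present.
 */
static int attempt_unbundle(struct bundle *b, void *data)
{
	struct bundle *list = data;

	if (b->unbundled)
		return 0;
	if (b->needs >= 0 && !list[b->needs].unbundled)
		return 0;

	b->unbundled = 1;
	return 1;
}

static void unbundle_all(struct bundle *list, size_t nr)
{
	/* Nonzero here means "progress was made", so scan again. */
	while (for_each_bundle(list, nr, attempt_unbundle, list))
		;
}
```

Each nonzero return marks exactly one more bundle as unbundled, so the outer
loop runs at most one pass per bundle plus a final pass that finds nothing
new.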

> I see that there are ALL tests, but could we have an ANY test as well?

Yes, excellent point. They are absolutely necessary.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-09-29 22:00       ` Jonathan Tan
@ 2022-09-30 13:21         ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-30 13:21 UTC (permalink / raw)
  To: Jonathan Tan, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Teng Long

On 9/29/2022 6:00 PM, Jonathan Tan wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>> Derrick Stolee <derrickstolee@github.com> writes:
>>
>>> If some of the reviewers from v1 could check that I responded to their
>>> comments, then that would be a big help to getting this series moving
>>> again.
> 
> Yes, all my comments from v1 were indeed addressed, thanks.
> 
>> Thanks for a ping.  Also, if reviewers who missed v1 can take a look
>> and give fresh insights, that would also help polishing the series
>> further.
> 
> I didn't miss v1 but I gave some new insights. :-) The patch set looks
> good except for some comments I had on the last one.

Thanks for taking a detailed look. I've added extra "any" mode tests to
my local branch in addition to the code changes you recommended. I'll
plan to send a v3 early next week, giving time for any other review
comments to trickle in.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 0/9] Bundle URIs III: Parse and download from bundle lists
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2022-09-26 13:19   ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
@ 2022-10-04 12:34   ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
                       ` (9 more replies)
  10 siblings, 10 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
<uri> is a URI to a bundle file. This series adds the capability of downloading and
parsing a bundle list and then downloading the URIs in that list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way:

 1. Patches 1-2 are cleanups from the previous part. The first was
    recommended by Teng Long and the second allows us to simplify our bundle
    list data structure slightly.

 2. Patches 3-4 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 5-6 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 7 teaches Git to parse Git config files into bundle lists.

 5. Patches 8-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists.
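
As a rough illustration of the "key=value" line format mentioned in the list
above, here is a minimal sketch of splitting such a line at the first '='.
The helper name split_key_value() and its rejection of an empty key or value
are assumptions made for this example; it is not the parser added by the
series.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's implementation: split "key=value" at
 * the first '='. On success, *key and *value are heap-allocated copies
 * that the caller must free(). Returns 0 on success, -1 if the line
 * has no '=', has an empty key or value, or allocation fails.
 */
static int split_key_value(const char *line, char **key, char **value)
{
	const char *eq = strchr(line, '=');
	size_t key_len;

	if (!eq || eq == line || !eq[1])
		return -1;

	key_len = (size_t)(eq - line);
	*key = malloc(key_len + 1);
	*value = malloc(strlen(eq + 1) + 1);
	if (!*key || !*value) {
		free(*key);
		free(*value);
		return -1;
	}

	memcpy(*key, line, key_len);
	(*key)[key_len] = '\0';
	strcpy(*value, eq + 1);
	return 0;
}
```

A line such as "bundle.version=1" would split into the key "bundle.version"
and the value "1", which could then be handed to a key-value update routine.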

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a single
bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v3
=============

 * Fixed a comment about a return value of -1.
 * Fixed and tested scenario where early URIs fail in "any" mode and Git
   should try the rest of the list.
 * Instead of using 'success_count' and 'failure_count', use the iterator
   return value to terminate the "all" mode loop early.
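
The "any" mode fix can be sketched as follows. This is a simplified model
under stated assumptions: the types, the 'downloadable' flag that stands in
for a real download, and the helper names are hypothetical and are not the
code in bundle-uri.c. The point is that the iterator stops on the first
nonzero return, so the callback must swallow failures in "any" mode.

```c
#include <stddef.h>

enum bundle_mode { MODE_ALL, MODE_ANY };

/* Hypothetical model of a bundle; not Git's remote_bundle_info. */
struct bundle {
	int downloadable; /* stand-in for "the download would succeed" */
	int downloaded;
};

struct download_ctx {
	enum bundle_mode mode;
	int count; /* successful downloads so far */
};

typedef int (*bundle_fn)(struct bundle *b, void *data);

/* Stops at the first nonzero callback result and returns it. */
static int for_each_bundle(struct bundle *list, size_t nr,
			   bundle_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int res = fn(&list[i], data);
		if (res)
			return res;
	}
	return 0;
}

static int download_one(struct bundle *b, void *data)
{
	struct download_ctx *ctx = data;
	int res;

	/* In "any" mode, one successful download is enough. */
	if (ctx->mode == MODE_ANY && ctx->count)
		return 0;

	res = b->downloadable ? 0 : -1;
	if (!res) {
		b->downloaded = 1;
		ctx->count++;
	}

	/* In "any" mode, a failure must not stop the iteration. */
	return ctx->mode == MODE_ANY ? 0 : res;
}
```

With a list whose first entry fails and whose second succeeds, "any" mode
downloads only the second entry, while "all" mode stops at the first failure
and reports it.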


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation now indicates more clearly that the bundle.*
   section has no effect in the repository config (at the moment, which will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.

Thanks,
-Stolee

Derrick Stolee (7):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: fetch a list of bundles
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 bundle-uri.c                    | 449 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 143 ++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 13 files changed, 976 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v2:

  1:  2ca431e6c37 <  -:  ----------- bundle-uri: short-circuit capability parsing
  2:  ee6c4b824c2 =  1:  48beccb0f5e bundle-uri: use plain string in find_temp_filename()
  3:  d9812440594 =  2:  f0c4457951c bundle-uri: create bundle_list struct and helpers
  4:  70daef66833 !  3:  430e01cd2a4 bundle-uri: create base key-value pair parsing
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
       
      +/**
      + * Given a key-value pair, update the state of the given bundle list.
     -+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
     ++ * Returns 0 if the key-value pair is understood. Returns -1 if the key
      + * is not understood or the value is malformed.
      + */
      +MAYBE_UNUSED
  5:  4df3f834029 !  4:  cd915d57f3b bundle-uri: create "key=value" line parsing
     @@ Commit message
      
       ## bundle-uri.c ##
      @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
     -  * Returns 0 if the key-value pair is understood. Returns 1 if the key
     +  * Returns 0 if the key-value pair is understood. Returns -1 if the key
        * is not understood or the value is malformed.
        */
      -MAYBE_UNUSED
  6:  91c5b58f011 !  5:  4d8cac67f66 bundle-uri: unit test "key=value" parsing
     @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list,
      +
       /**
        * Given a key-value pair, update the state of the given bundle list.
     -  * Returns 0 if the key-value pair is understood. Returns 1 if the key
     +  * Returns 0 if the key-value pair is understood. Returns -1 if the key
      
       ## bundle-uri.h ##
      @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list,
  7:  1492b8f5ef0 =  6:  0ecae3a44b3 bundle-uri: parse bundle list in config format
  8:  b5d570082fa =  7:  7e6b32313b0 bundle-uri: limit recursion depth for bundle lists
  9:  a6ab8f7c699 !  8:  46799648b4c bundle-uri: fetch a list of bundles
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +	 */
      +	if (!res)
      +		ctx->count++;
     -+	return res;
     ++
     ++	/*
     ++	 * In BUNDLE_MODE_ANY, we need to continue iterating until we find
     ++	 * a bundle that works, so do not signal a failure here.
     ++	 */
     ++	return ctx->mode == BUNDLE_MODE_ANY ? 0 : res;
      +}
      +
      +static int download_bundle_list(struct repository *r,
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
       	return result;
       }
       
     -+struct attempt_unbundle_context {
     -+	struct repository *r;
     -+	int success_count;
     -+	int failure_count;
     -+};
     -+
     ++/**
     ++ * This loop iterator breaks the loop with nonzero return code on the
     ++ * first successful unbundling of a bundle.
     ++ */
      +static int attempt_unbundle(struct remote_bundle_info *info, void *data)
      +{
     -+	struct attempt_unbundle_context *ctx = data;
     ++	struct repository *r = data;
      +
     -+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) {
     -+		ctx->success_count++;
     ++	if (!info->file || info->unbundled)
     ++		return 0;
     ++
     ++	if (!unbundle_from_file(r, info->file)) {
      +		info->unbundled = 1;
     -+	} else {
     -+		ctx->failure_count++;
     ++		return 1;
      +	}
      +
      +	return 0;
     @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r,
      +static int unbundle_all_bundles(struct repository *r,
      +				struct bundle_list *list)
      +{
     -+	int last_success_count = -1;
     -+	struct attempt_unbundle_context ctx = {
     -+		.r = r,
     -+	};
     -+
      +	/*
      +	 * Iterate through all bundles looking for ones that can
      +	 * successfully unbundle. If any succeed, then perhaps another
      +	 * will succeed in the next attempt.
     ++	 *
     ++	 * Keep in mind that a non-zero result for the loop here means
     ++	 * the loop terminated early on a successful unbundling, which
     ++	 * signals that we can try again.
      +	 */
     -+	while (last_success_count < ctx.success_count) {
     -+		last_success_count = ctx.success_count;
     -+
     -+		ctx.success_count = 0;
     -+		ctx.failure_count = 0;
     -+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
     -+	}
     -+
     -+	if (ctx.success_count)
     -+		git_config_set_multivar_gently("log.excludedecoration",
     -+						"refs/bundle/",
     -+						"refs/bundle/",
     -+						CONFIG_FLAGS_FIXED_VALUE |
     -+						CONFIG_FLAGS_MULTI_REPLACE);
     -+
     -+	if (ctx.failure_count)
     -+		warning(_("failed to unbundle %d bundles"),
     -+			ctx.failure_count);
     ++	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
      +
      +	return 0;
      +}
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' '
      +		uri = file://$(pwd)/clone-from/bundle-4.bundle
      +	EOF
      +
     -+	git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file &&
     -+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
     -+	do
     -+		git -C clone-list-file rev-parse $oid || return 1
     -+	done
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-list-file cat-file --batch-check <oids
      +'
      +
     ++test_expect_success 'clone bundle list (file, any mode)' '
     ++	cat >bundle-list <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = any
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = $HTTPD_URL/bundle-0.bundle
     ++
     ++	[bundle "bundle-1"]
     ++		uri = $HTTPD_URL/bundle-1.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = $HTTPD_URL/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-any-file cat-file --batch-check <oids
     ++'
      +
       #########################################################################
       # HTTP tests begin here
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
      +		uri = $HTTPD_URL/bundle-4.bundle
      +	EOF
      +
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http &&
     -+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
     -+	do
     -+		git -C clone-list-http rev-parse $oid || return 1
     -+	done
     ++	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-list-http cat-file --batch-check <oids
     ++'
     ++
     ++test_expect_success 'clone bundle list (HTTP, any mode)' '
     ++	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     ++	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = any
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = $HTTPD_URL/bundle-0.bundle
     ++
     ++	[bundle "bundle-1"]
     ++		uri = $HTTPD_URL/bundle-1.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = $HTTPD_URL/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-any-http cat-file --batch-check <oids
      +'
      +
       # Do not add tests here unless they use the HTTP server, as they will
  -:  ----------- >  9:  d84544859e4 bundle-uri: suppress stderr from remote-https

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename()
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we don't actually need to modify this
filename directly later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.
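
The ownership pattern can be sketched outside of Git as follows. This is
only an illustration under stated assumptions: reserve_temp_name() is a
hypothetical name, and it uses POSIX mkstemp() directly rather than
odb_mkstemp(). The function hands back a heap-allocated string (or NULL on
failure), and the caller is responsible for calling free() on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in, not Git's find_temp_filename(): reserve a
 * temporary name from a template ending in "XXXXXX" and return it as
 * a heap-allocated string. Returns NULL on failure. The caller owns
 * the result and must free() it.
 */
static char *reserve_temp_name(const char *template)
{
	char *name = malloc(strlen(template) + 1);
	int fd;

	if (!name)
		return NULL;
	strcpy(name, template);

	fd = mkstemp(name);
	if (fd < 0) {
		free(name);
		return NULL;
	}

	/* Only the name is needed. This is briefly racy, as noted above. */
	close(fd);
	unlink(name);
	return name;
}
```

Returning NULL on failure lets the caller fold the error check into the
assignment, and a single free() at the cleanup label replaces the
strbuf_release() call.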

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..8b2f4e08c9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget



* [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 3/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare for a user to supply a single bundle URI and
expect that URI to point to a bundle. Instead, that URI will likely
point to a list of bundles provided in some format. Alternatively, the
Git server could advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of the bundle data.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
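The list-plus-iterator shape described above can be sketched in a few
lines of standalone C. This is only an illustration under simplifying
assumptions: it stores items in a small fixed array rather than a
hashmap, and every name in it (item_list, for_all_items(), count_item(),
require_uri()) is hypothetical rather than taken from Git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum list_mode { MODE_NONE = 0, MODE_ALL, MODE_ANY };

struct item {
	const char *id;
	const char *uri;
};

struct item_list {
	int version;
	enum list_mode mode;
	struct item items[8];
	size_t nr;
};

typedef int (*item_iterator)(struct item *item, void *data);

/* Zero the list, then apply the implied defaults. */
void item_list_init(struct item_list *list)
{
	assert(list);
	memset(list, 0, sizeof(*list));
	list->version = 1;
	list->mode = MODE_ALL;
}

/* Returns 0 on success, -1 if the fixed-size array is full. */
int item_list_add(struct item_list *list, const char *id, const char *uri)
{
	if (list->nr == sizeof(list->items) / sizeof(list->items[0]))
		return -1;
	list->items[list->nr].id = id;
	list->items[list->nr].uri = uri;
	list->nr++;
	return 0;
}

/* Visit every item; stop early and return the first non-zero result. */
int for_all_items(struct item_list *list, item_iterator iter, void *data)
{
	size_t i;

	for (i = 0; i < list->nr; i++) {
		int result = iter(&list->items[i], data);

		if (result)
			return result;
	}
	return 0;
}

/* Callback that counts items through the opaque 'data' pointer. */
int count_item(struct item *item, void *data)
{
	(void)item;
	++*(size_t *)data;
	return 0;
}

/* Callback that aborts the walk at the first item without a URI. */
int require_uri(struct item *item, void *data)
{
	(void)data;
	return item->uri ? 0 : -1;
}
```

Returning the callback's non-zero value lets one iterator serve both
"visit everything" and "stop at the first problem" callers.
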
 bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8b2f4e08c9c..f9a8db221bc 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,66 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static char *find_temp_filename(void)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..ff7e3fd3fb2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,63 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { 0 }
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget



* [PATCH v3 3/9] bundle-uri: create base key-value pair parsing
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 4/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  is currently expected to be an absolute URI, but this will be relaxed
  to allow relative URIs in the future.

While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.

Make the git_parse_int() method global so we can parse the integer
version value carefully.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
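As a rough standalone illustration of the key handling described above,
the sketch below splits a dotted key into an optional <id> and a subkey,
then applies the two global keys. It is not Git's implementation: the
names split_bundle_key(), apply_global_key() and 'enum sketch_mode' are
hypothetical, and the integer parsing uses plain strtol(), which (unlike
Git's config integer parsing) does not accept unit suffixes.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum sketch_mode { SKETCH_MODE_NONE = 0, SKETCH_MODE_ALL, SKETCH_MODE_ANY };

/*
 * Split "bundle.<subkey>" or "bundle.<id>.<subkey>".
 * On success returns 0; '*id' is NULL and '*id_len' is 0 for global keys.
 * Returns -1 if the key is not in the "bundle" section or has no subkey.
 */
int split_bundle_key(const char *key, const char **id, size_t *id_len,
		     const char **subkey)
{
	static const char prefix[] = "bundle.";
	const char *rest, *last_dot;

	assert(key && id && id_len && subkey);
	if (strncmp(key, prefix, sizeof(prefix) - 1))
		return -1;
	rest = key + sizeof(prefix) - 1;
	last_dot = strrchr(rest, '.');
	*id = last_dot ? rest : NULL;
	*id_len = last_dot ? (size_t)(last_dot - rest) : 0;
	*subkey = last_dot ? last_dot + 1 : rest;
	return **subkey ? 0 : -1;
}

/*
 * Apply a global key. Returns -1 for a malformed or unsupported value,
 * and 0 otherwise, including for unknown keys, which are ignored.
 */
int apply_global_key(const char *subkey, const char *value,
		     int *version, enum sketch_mode *mode)
{
	if (!strcmp(subkey, "version")) {
		char *end;
		long v;

		errno = 0;
		v = strtol(value, &end, 10);
		/* Reject non-numbers, trailing junk, and any version but 1. */
		if (errno || end == value || *end || v != 1)
			return -1;
		*version = (int)v;
		return 0;
	}
	if (!strcmp(subkey, "mode")) {
		if (!strcmp(value, "all"))
			*mode = SKETCH_MODE_ALL;
		else if (!strcmp(value, "any"))
			*mode = SKETCH_MODE_ANY;
		else
			return -1;
		return 0;
	}
	return 0;
}
```

Ignoring unknown keys while rejecting unknown versions and modes is what
allows new "heuristic" keys to be added without a version bump.
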
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 24 +++++++++++
 bundle-uri.c                    | 76 +++++++++++++++++++++++++++++++++
 config.c                        |  2 +-
 config.h                        |  1 +
 5 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/bundle.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..daa21eb674a
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,24 @@
+bundle.*::
+	The `bundle.*` keys may appear in a bundle list file found via the
+	`git clone --bundle-uri` option. These keys currently have no effect
+	if placed in a repository config file, though this will change in the
+	future. See link:technical/bundle-uri.html[the bundle URI design
+	document] for more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index f9a8db221bc..0bc59dd9c34 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns -1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+	const char *subsection, *subkey;
+	size_t subsection_len;
+
+	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
+		return -1;
+
+	if (!subsection_len) {
+		if (!strcmp(subkey, "version")) {
+			int version;
+			if (!git_parse_int(value, &version))
+				return -1;
+			if (version != 1)
+				return -1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(subkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return -1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, subsection, subsection_len);
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(subkey, "uri")) {
+		if (bundle->uri)
+			return -1;
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/config.c b/config.c
index 015bec360f5..e93101249f6 100644
--- a/config.c
+++ b/config.c
@@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
 	return 0;
 }
 
-static int git_parse_int(const char *value, int *ret)
+int git_parse_int(const char *value, int *ret)
 {
 	intmax_t tmp;
 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
diff --git a/config.h b/config.h
index ca994d77147..ef9eade6414 100644
--- a/config.h
+++ b/config.h
@@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *,
 
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
+int git_parse_int(const char *value, int *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
gitgitgadget



* [PATCH v3 4/9] bundle-uri: create "key=value" line parsing
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 3/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value", and together
the lines represent a bundle list. Connect the API necessary for Git's
transport to the key-value pair parsing created in the previous change.

We are not currently implementing this protocol v2 functionality, but
instead preparing to expose this parsing to be unit-testable.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
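The validation order for a "key=value" line can be shown with a small
standalone sketch. The names split_key_value() and 'enum line_status' are
hypothetical, and the caller-supplied key buffer (with its extra
"key too long" status) is an assumption made so the example avoids
dynamic allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum line_status {
	LINE_OK = 0,
	LINE_EMPTY,        /* the line has no characters at all */
	LINE_NO_EQUALS,    /* no '=' separator was found */
	LINE_EMPTY_PART,   /* the key or the value is empty */
	LINE_KEY_TOO_LONG  /* the key does not fit in the caller's buffer */
};

/*
 * Split 'line' at its first '='. On LINE_OK the key is copied into
 * 'key' and '*value' points at the text after the separator, which may
 * itself contain further '=' characters.
 */
enum line_status split_key_value(const char *line, char *key,
				 size_t key_size, const char **value)
{
	const char *equals;
	size_t key_len;

	assert(line && key && value);
	if (!*line)
		return LINE_EMPTY;

	equals = strchr(line, '=');
	if (!equals)
		return LINE_NO_EQUALS;
	if (equals == line || !equals[1])
		return LINE_EMPTY_PART;

	key_len = (size_t)(equals - line);
	if (key_len >= key_size)
		return LINE_KEY_TOO_LONG;

	memcpy(key, line, key_len);
	key[key_len] = '\0';
	*value = equals + 1;
	return LINE_OK;
}
```

Splitting at the first '=' matters for URIs with query strings, where
the value legitimately contains more '=' characters.
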
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 12 ++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0bc59dd9c34..372e6fac5cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -306,3 +305,29 @@ cleanup:
 	free(filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index ff7e3fd3fb2..90583461929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
 #endif
-- 
gitgitgadget



* [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 4/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-04 12:34     ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 6/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.

We use an input file instead of stdin because of a future change to
parse in config-file format that works better as an input file.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
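Because the bundles live in a hashmap, the helper's output order is not
guaranteed, which is why the tests compare sorted output. The standalone
sketch below shows that order-insensitive comparison in C rather than
shell. The function name same_lines_any_order() is hypothetical and the
sketch compares in-memory strings, not files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of 'const char *' elements. */
static int compare_lines(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Return 1 if both arrays hold the same lines regardless of order,
 * and 0 otherwise. The inputs are left untouched; sorting is done on
 * private copies of the pointer arrays.
 */
int same_lines_any_order(const char **a, size_t nr_a,
			 const char **b, size_t nr_b)
{
	const char **sorted_a, **sorted_b;
	int same = 1;
	size_t i;

	assert(a && b);
	if (nr_a != nr_b)
		return 0;
	if (!nr_a)
		return 1;

	sorted_a = malloc(nr_a * sizeof(*sorted_a));
	sorted_b = malloc(nr_b * sizeof(*sorted_b));
	if (!sorted_a || !sorted_b)
		abort();
	memcpy(sorted_a, a, nr_a * sizeof(*sorted_a));
	memcpy(sorted_b, b, nr_b * sizeof(*sorted_b));

	qsort(sorted_a, nr_a, sizeof(*sorted_a), compare_lines);
	qsort(sorted_b, nr_b, sizeof(*sorted_b), compare_lines);

	for (i = 0; i < nr_a && same; i++)
		same = !strcmp(sorted_a[i], sorted_b[i]);

	free(sorted_a);
	free(sorted_b);
	return same;
}
```

Sorting both sides before comparing keeps the expectations readable in
input order while tolerating whatever order the hashmap iterates in.
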
 Makefile                    |   1 +
 bundle-uri.c                |  33 ++++++++++
 bundle-uri.h                |   3 +
 t/helper/test-bundle-uri.c  |  70 +++++++++++++++++++++
 t/helper/test-tool.c        |   1 +
 t/helper/test-tool.h        |   1 +
 t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     |  11 ++++
 8 files changed, 241 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 372e6fac5cf..c02e7f62eb1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 90583461929..0e56ab2ae5a 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..0329c56544f
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,70 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *key_value_usage[] = {
+		"test-tool bundle-uri parse-key-values <input>",
+		NULL
+	};
+	const char **usage = key_value_usage;
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+	FILE *fp;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc != 1)
+		goto usage;
+
+	init_bundle_list(&list);
+	fp = fopen(argv[0], "r");
+	if (!fp)
+		die("failed to open '%s'", argv[0]);
+
+	while (strbuf_getline(&sb, fp) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf))
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+	fclose(fp);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..fd142a66ad5
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.one.uri=https://example.com/bundle-2.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget



* [PATCH v3 6/9] bundle-uri: parse bundle list in config format
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error. The current
warning isn't particularly helpful if it surfaces to a user, but it will
be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
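The reason the config format can reuse the "key=value" logic is that a
section header plus a variable name flattens to the same dotted key. The
standalone sketch below illustrates only that mapping. It is not Git's
config parser: flatten_config_key() is a hypothetical name, and the
sketch assumes lowercase section names and a subsection without escaped
quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "section.name" from a header such as [bundle], or
 * "section.sub.name" from a header such as [bundle "one"].
 * Returns 0 on success and -1 on a malformed header, a malformed
 * variable name, or an output buffer that is too small.
 */
int flatten_config_key(const char *header, const char *name,
		       char *out, size_t out_size)
{
	char section[64], sub[64];
	int end, n;

	assert(header && name && out);

	/* The variable name must be non-empty, without blanks or '='. */
	if (!*name || name[strcspn(name, " \t=")])
		return -1;

	/* %n records how far the match got, so trailing junk is caught. */
	end = -1;
	if (sscanf(header, "[%63[a-z] \"%63[^\"]\"]%n", section, sub, &end) == 2 &&
	    end > 0 && !header[end]) {
		n = snprintf(out, out_size, "%s.%s.%s", section, sub, name);
		return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
	}

	end = -1;
	if (sscanf(header, "[%63[a-z]]%n", section, &end) == 1 &&
	    end > 0 && !header[end]) {
		n = snprintf(out, out_size, "%s.%s", section, name);
		return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
	}

	return -1;
}
```

With keys flattened this way, a per-variable callback can pass each pair
straight to the same update routine that handles "key=value" lines.
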
 bundle-uri.c                | 27 ++++++++++++++++++++
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  | 49 +++++++++++++++++++++++++++---------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c02e7f62eb1..3d44ec2b1e6 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 0e56ab2ae5a..bc13d4c9929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0329c56544f..25afd393428 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,12 +4,21 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
 	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <input>",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	const char **usage = key_value_usage;
 	struct option options[] = {
 		OPT_END(),
@@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv)
 	int err = 0;
 	FILE *fp;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc != 1)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	fp = fopen(argv[0], "r");
-	if (!fp)
-		die("failed to open '%s'", argv[0]);
 
-	while (strbuf_getline(&sb, fp) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf))
-			err = error("bad line: '%s'", sb.buf);
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc != 1)
+			goto usage;
+		fp = fopen(argv[0], "r");
+		if (!fp)
+			die("failed to open '%s'", argv[0]);
+		while (strbuf_getline(&sb, fp) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf))
+				err = error("bad line: '%s'", sb.buf);
+		}
+		fclose(fp);
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
-	fclose(fp);
 
 	print_bundle_list(stdout, &list);
 
@@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index fd142a66ad5..c2fe3f9c5a5 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	error: bad config line 1 in file in2
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 6/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.
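The depth guard can be sketched in isolation. This is a simplified model
under stated assumptions, not the code in bundle-uri.c: struct list_node,
visit() and count_reachable_bundles() are hypothetical names, a node with
no children stands in for a bundle file, and MAX_DEPTH mirrors the value
4 used for max_bundle_uri_depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a node with nr == 0 is a bundle, otherwise a list. */
struct list_node {
	const struct list_node *children[4];
	size_t nr;
};

enum { MAX_DEPTH = 4 };

/*
 * Returns the number of bundles reached below 'n', or -1 once the
 * recursion limit is hit. The check runs before anything else, so a
 * list that links back to itself terminates instead of looping forever.
 */
static int visit(const struct list_node *n, int depth)
{
	int total = 0;
	size_t i;

	if (depth >= MAX_DEPTH)
		return -1;
	if (!n->nr)
		return 1;

	for (i = 0; i < n->nr; i++) {
		int res = visit(n->children[i], depth + 1);
		if (res < 0)
			return -1;
		total += res;
	}
	return total;
}

/* Public entry point: always starts at depth zero. */
int count_reachable_bundles(const struct list_node *root)
{
	assert(root);
	return visit(root, 0);
}
```

As in the patch, the public wrapper hides the depth argument so callers
cannot start the walk at an arbitrary depth.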

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 3d44ec2b1e6..8a7c11c6393 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -366,6 +380,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget



* [PATCH v3 8/9] bundle-uri: fetch a list of bundles
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 21:44       ` Jonathan Tan
  2022-10-04 12:34     ` [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  9 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs is downloaded
and inspected again, creating a recursive process.
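The difference between the two modes during download can be modeled
without any of Git's data structures. The sketch below assumes a plain
array of flags in place of real URIs; struct download_ctx, try_download()
and download_list() are hypothetical names chosen for illustration. It
follows the return-value convention of the iterator in this patch: ANY
never reports a failure to the loop, ALL reports the first one.

```c
#include <assert.h>
#include <stddef.h>

enum list_mode { MODE_ALL, MODE_ANY };

struct download_ctx {
	enum list_mode mode;
	int count;	/* successful downloads so far */
	int attempts;	/* downloads actually tried */
};

/* Hypothetical stand-in for a download: nonzero 'available' succeeds. */
static int try_download(int available)
{
	return available ? 0 : -1;
}

static int download_one(int available, struct download_ctx *ctx)
{
	int res;

	/* In ANY mode one success is enough; skip the remaining entries. */
	if (ctx->mode == MODE_ANY && ctx->count)
		return 0;

	ctx->attempts++;
	res = try_download(available);
	if (!res)
		ctx->count++;

	/* ANY keeps iterating after a failure; ALL stops the loop. */
	return ctx->mode == MODE_ANY ? 0 : res;
}

/* Iterate like a for-all helper: stop on the first nonzero result. */
int download_list(const int *available, size_t nr, struct download_ctx *ctx)
{
	size_t i;

	assert(available && ctx);
	for (i = 0; i < nr; i++) {
		int res = download_one(available[i], ctx);
		if (res)
			return res;
	}
	return 0;
}
```

Note that in ANY mode the loop returns 0 even when nothing succeeded, so
a caller that cares has to inspect the count separately.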

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.
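The retry behavior this topology exercises can be sketched as follows.
It is a toy model, not the patch's implementation: commits are bits in a
mask, struct toy_bundle and both functions are hypothetical, and
returning the number of applied bundles is a choice made for this sketch
(the function in the patch always returns 0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: commits are bits, a bundle needs some and adds some. */
struct toy_bundle {
	unsigned needs;		/* prerequisite commits */
	unsigned provides;	/* commits contained in the bundle */
	int unbundled;
};

/*
 * One pass over the list. Applies the first bundle whose prerequisites
 * are all present and returns 1 to signal progress, or 0 if none applies.
 */
static int attempt_pass(struct toy_bundle *b, size_t nr, unsigned *have)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (b[i].unbundled)
			continue;
		if ((b[i].needs & *have) != b[i].needs)
			continue;
		*have |= b[i].provides;
		b[i].unbundled = 1;
		return 1;
	}
	return 0;
}

/* Repeat passes until one makes no progress; return how many applied. */
int unbundle_all(struct toy_bundle *b, size_t nr, unsigned *have)
{
	int applied = 0;

	assert(b && have);
	while (attempt_pass(b, nr, have))
		applied++;
	return applied;
}
```

With the four bundles listed in the worst order (4, 3, 2, 1), each pass
unlocks exactly one more bundle, so the loop needs four productive passes
plus a final pass that finds nothing left to apply.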

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 202 +++++++++++++++++++++++++++++++++---
 bundle-uri.h                |  13 +++
 t/t5558-clone-bundle-uri.sh | 135 ++++++++++++++++++++++++
 3 files changed, 334 insertions(+), 16 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8a7c11c6393..aaa1848044a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 {
 	FREE_AND_NULL(bundle->id);
 	FREE_AND_NULL(bundle->uri);
+	FREE_AND_NULL(bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -334,18 +336,116 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	int res;
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+
+	/*
+	 * Only increment count if the download succeeded. If our mode is
+	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
+	 * list in case they work instead.
+	 */
+	if (!res)
+		ctx->count++;
+
+	/*
+	 * In BUNDLE_MODE_ANY, we need to continue iterating until we find
+	 * a bundle that works, so do not signal a failure here.
+	 */
+	return ctx->mode == BUNDLE_MODE_ANY ? 0 : res;
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = bundle_uri_parse_config_format(bundle->uri,
+						     bundle->file,
+						     &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	char *filename;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -353,36 +453,106 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if (!(filename = find_temp_filename())) {
+	if (!bundle->file &&
+	    !(bundle->file = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
 	}
 
-	if ((result = copy_uri_to_file(filename, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	bcopy->file = xstrdup(bundle->file);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	if (filename)
-		unlink(filename);
-	free(filename);
+	if (result && bundle->file)
+		unlink(bundle->file);
 	return result;
 }
 
+/**
+ * This loop iterator breaks the loop with nonzero return code on the
+ * first successful unbundling of a bundle.
+ */
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct repository *r = data;
+
+	if (!info->file || info->unbundled)
+		return 0;
+
+	if (!unbundle_from_file(r, info->file)) {
+		info->unbundled = 1;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 *
+	 * Keep in mind that a non-zero result for the loop here means
+	 * the loop terminated early on a successful unbundling, which
+	 * signals that we can try again.
+	 */
+	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file)
+		unlink_or_warn(info->file);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup(""),
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index bc13d4c9929..4dbc269823c 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -28,6 +28,19 @@ struct remote_bundle_info {
 	 * if there was no table of contents.
 	 */
 	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * NULL.
+	 */
+	char *file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..9690f19386f 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,92 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-file cat-file --batch-check <oids
+'
+
+test_expect_success 'clone bundle list (file, any mode)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-file cat-file --batch-check <oids
+'
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +161,55 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http cat-file --batch-check <oids
+'
+
+test_expect_success 'clone bundle list (HTTP, any mode)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-http cat-file --batch-check <oids
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget



* [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.
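For readers unfamiliar with the idea, here is a standalone sketch of
running a helper while keeping its stderr away from the user. The patch
itself does this through Git's run-command API (cp.err = -1); the block
below swaps in plain POSIX calls and /dev/null instead, and
run_with_stderr_suppressed() is a hypothetical name, so treat it as an
illustration rather than what Git does internally.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with its stderr sent to /dev/null.
 * Returns the child's exit code, or -1 if it could not be run to a
 * normal exit. Exit code 127 means setup or exec failed in the child.
 */
int run_with_stderr_suppressed(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv && argv[0]);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: replace stderr before exec'ing the helper. */
		int devnull = open("/dev/null", O_WRONLY);
		if (devnull < 0 || dup2(devnull, STDERR_FILENO) < 0)
			_exit(127);
		if (devnull != STDERR_FILENO)
			close(devnull);
		execvp(argv[0], argv);
		_exit(127);
	}

	/* Parent: wait for the child, retrying if interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller still sees the exit code, so it can emit its own warning on
failure, which matches the intent described in this commit message.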

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  1 +
 t/t5558-clone-bundle-uri.sh | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index aaa1848044a..92af0eae224 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	int found_get = 0;
 
 	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.err = -1;
 	cp.in = -1;
 	cp.out = -1;
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9690f19386f..a0ef0588e21 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -122,7 +122,11 @@ test_expect_success 'clone bundle list (file, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-file 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-file cat-file --batch-check <oids
 '
@@ -205,7 +209,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-any-http 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-http cat-file --batch-check <oids
 '
-- 
gitgitgadget


* Re: [PATCH v3 8/9] bundle-uri: fetch a list of bundles
  2022-10-04 12:34     ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-04 21:44       ` Jonathan Tan
  2022-10-07 13:29         ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Jonathan Tan @ 2022-10-04 21:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham,
	steadmon, Glen Choo, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static int unbundle_all_bundles(struct repository *r,
> +				struct bundle_list *list)
> +{
> +	/*
> +	 * Iterate through all bundles looking for ones that can
> +	 * successfully unbundle. If any succeed, then perhaps another
> +	 * will succeed in the next attempt.
> +	 *
> +	 * Keep in mind that a non-zero result for the loop here means
> +	 * the loop terminated early on a successful unbundling, which
> +	 * signals that we can try again.
> +	 */
> +	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
> +
> +	return 0;
> +}

This function always returns 0 regardless of how many successful 
iterations there were: we would need the number to be equal to the 
number of bundles in the list if ALL, and 1 if ANY. 

Which brings up the question...we probably need a test for when the 
unbundling is unsuccessful. 

Other than that, everything looks good, including the removal of one 
patch and the addition of the "bundle-uri: suppress stderr from 
remote-https" patch.


* Re: [PATCH v3 8/9] bundle-uri: fetch a list of bundles
  2022-10-04 21:44       ` Jonathan Tan
@ 2022-10-07 13:29         ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-10-07 13:29 UTC (permalink / raw)
  To: Jonathan Tan, Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, mjcheetham, steadmon,
	Glen Choo, Teng Long

On 10/4/22 5:44 PM, Jonathan Tan wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +static int unbundle_all_bundles(struct repository *r,
>> +				struct bundle_list *list)
>> +{
>> +	/*
>> +	 * Iterate through all bundles looking for ones that can
>> +	 * successfully unbundle. If any succeed, then perhaps another
>> +	 * will succeed in the next attempt.
>> +	 *
>> +	 * Keep in mind that a non-zero result for the loop here means
>> +	 * the loop terminated early on a successful unbundling, which
>> +	 * signals that we can try again.
>> +	 */
>> +	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
>> +
>> +	return 0;
>> +}
> 
> This function always returns 0 regardless of how many successful 
> iterations there were: we would need the number to be equal to the 
> number of bundles in the list if ALL, and 1 if ANY. 

The ALL mode is a bit more permissive than requiring literally
every bundle: if some fail to download or apply, then we continue
with whatever we were able to unbundle. The ALL mode indicates that
the bundles build on each other, so the client should download as
many as possible. By contrast, ANY indicates that they are independent
so the client should stop after the first successful download.

We could still find a way to indicate how many bundles were downloaded
in the return of this method, but we don't want to have additional
warnings based on that return value.

> Which brings up the question...we probably need a test for when the 
> unbundling is unsuccessful. 

I will add more failure scenarios, including no successful downloads
or only a partial success in ALL mode.
 
> Other than that, everything looks good, including the removal of one 
> patch and the addition of the "bundle-uri: suppress stderr from 
> remote-https" patch.

Thanks!

-Stolee


* [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2022-10-04 12:34     ` [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04     ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
                         ` (11 more replies)
  9 siblings, 12 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
<uri> is a URI to a bundle file. This series adds the capability of downloading and
parsing a bundle list and then downloading the URIs in that list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way (updated for v4):

 1. Patch 1 is a cleanup from the previous part. This allows us to simplify
    our bundle list data structure slightly.

 2. Patches 2-3 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 6 teaches Git to parse Git config files into bundle lists.

 5. Patches 7-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists. We also need to
    be careful when verifying the bundles due to ref caches, so some flags
    are added to unbundle() and verify_bundle().

 6. Patches 10-11 hide unhelpful warnings from users.

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a single
bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v4
=============

 * Properly updated the patch outline.

 * Jonathan Tan asked for more tests, and this revealed some interesting
   behaviors which I have now either fixed or made explicit:
   
   1. In "all" mode, we try to download and apply all bundles. Do not fail
      if a single bundle download fails.
   2. Previously, not all bundles were being applied, and this was noticed
      by the added checks for the refs/bundles/* refs at the end of the
      tests. This revealed the need for removing the reachability walk from
      verify_bundle() since the written refs/bundles/* refs were not being
      picked up by the loose ref cache. Since removing the reachability walk
      seemed like the faster (for users) option, I went that direction.
   3. While running those tests and examining the output carefully, I
      noticed several error messages related to missing prerequisites due to
      attempting unbundling in a random order. This doesn't appear in the
      later creationToken version, so I hadn't noticed it at the tip of my
      local work. These messages are removed with a new quiet mode for
      verify_bundle().
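
The "all" and "any" behaviors described above can be sketched in isolation.
The following is a simplified illustration rather than the code from
bundle-uri.c: fetch_list(), the 'fetch_fn' callback and fake_fetch() are
hypothetical names, and downloads are simulated by a function that fails for
any URI containing "missing".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum list_mode { MODE_ALL, MODE_ANY };

/* Hypothetical callback: returns 0 if the bundle at 'uri' was applied. */
typedef int (*fetch_fn)(const char *uri);

/* Stand-in for a real download: URIs containing "missing" fail. */
static int fake_fetch(const char *uri)
{
	return strstr(uri, "missing") ? -1 : 0;
}

/*
 * Try each URI in turn. A failed download never aborts the loop.
 * In MODE_ANY we stop after the first success; in MODE_ALL we keep
 * going so that every bundle that works gets applied.
 * Returns the number of bundles applied.
 */
static int fetch_list(const char **uris, size_t nr, enum list_mode mode,
		      fetch_fn fetch)
{
	int count = 0;
	size_t i;

	assert(fetch != NULL);
	for (i = 0; i < nr; i++) {
		if (fetch(uris[i]))
			continue; /* skip the failure, try the next URI */
		count++;
		if (mode == MODE_ANY)
			break;
	}
	return count;
}
```

The count is only incremented after a success, which is what lets "any" mode
keep trying after an early failure. A return value of zero means no bundle
helped, and the clone then proceeds without bundle data.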


Updates in v3
=============

 * Fixed a comment about a return value of -1.
 * Fixed and tested scenario where early URIs fail in "any" mode and Git
   should try the rest of the list.
 * Instead of using 'success_count' and 'failure_count', use the iterator
   return value to terminate the "all" mode loop early.


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation now indicates more clearly that the bundle.*
   section has no effect in the repository config (at the moment; this will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.

Thanks,

 * Stolee

Derrick Stolee (9):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle: add flags to verify_bundle(), skip walk
  bundle-uri: fetch a list of bundles
  bundle-uri: quiet failed unbundlings
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 builtin/bundle.c                |   5 +-
 bundle-uri.c                    | 458 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 bundle.c                        |  22 +-
 bundle.h                        |  16 +-
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 275 +++++++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 transport.c                     |   2 +-
 17 files changed, 1149 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v3:

  1:  48beccb0f5e =  1:  48beccb0f5e bundle-uri: use plain string in find_temp_filename()
  2:  f0c4457951c =  2:  f0c4457951c bundle-uri: create bundle_list struct and helpers
  3:  430e01cd2a4 =  3:  430e01cd2a4 bundle-uri: create base key-value pair parsing
  4:  cd915d57f3b =  4:  cd915d57f3b bundle-uri: create "key=value" line parsing
  5:  4d8cac67f66 =  5:  4d8cac67f66 bundle-uri: unit test "key=value" parsing
  6:  0ecae3a44b3 =  6:  0ecae3a44b3 bundle-uri: parse bundle list in config format
  7:  7e6b32313b0 =  7:  7e6b32313b0 bundle-uri: limit recursion depth for bundle lists
  -:  ----------- >  8:  83f2cd893a4 bundle: add flags to verify_bundle(), skip walk
  8:  46799648b4c !  9:  6b9c764c6b3 bundle-uri: fetch a list of bundles
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +		ctx->count++;
      +
      +	/*
     -+	 * In BUNDLE_MODE_ANY, we need to continue iterating until we find
     -+	 * a bundle that works, so do not signal a failure here.
     ++	 * To be opportunistic as possible, we continue iterating and
     ++	 * download as many bundles as we can, so we can apply the ones
     ++	 * that work, even in BUNDLE_MODE_ALL mode.
      +	 */
     -+	return ctx->mode == BUNDLE_MODE_ANY ? 0 : res;
     ++	return 0;
      +}
      +
      +static int download_bundle_list(struct repository *r,
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' '
      +
      +	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
      +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     -+	git -C clone-list-file cat-file --batch-check <oids
     ++	git -C clone-list-file cat-file --batch-check <oids &&
     ++
     ++	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
     ++	grep "refs/bundles/" refs >actual &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	refs/bundles/left
     ++	refs/bundles/merge
     ++	refs/bundles/right
     ++	EOF
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'clone bundle list (file, all mode, some failures)' '
     ++	cat >bundle-list <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = all
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = file://$(pwd)/clone-from/bundle-0.bundle
     ++
     ++	[bundle "bundle-1"]
     ++		uri = file://$(pwd)/clone-from/bundle-1.bundle
     ++
     ++	[bundle "bundle-2"]
     ++		uri = file://$(pwd)/clone-from/bundle-2.bundle
     ++
     ++	# No bundle-3 means bundle-4 will not apply.
     ++
     ++	[bundle "bundle-4"]
     ++		uri = file://$(pwd)/clone-from/bundle-4.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = file://$(pwd)/clone-from/bundle-5.bundle
     ++	EOF
     ++
     ++	GIT_TRACE2_PERF=1 \
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-all-some cat-file --batch-check <oids &&
     ++
     ++	git -C clone-all-some for-each-ref --format="%(refname)" >refs &&
     ++	grep "refs/bundles/" refs >actual &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	refs/bundles/left
     ++	EOF
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'clone bundle list (file, all mode, all failures)' '
     ++	cat >bundle-list <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = all
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = file://$(pwd)/clone-from/bundle-0.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = file://$(pwd)/clone-from/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-all-fail cat-file --batch-check <oids &&
     ++
     ++	git -C clone-all-fail for-each-ref --format="%(refname)" >refs &&
     ++	! grep "refs/bundles/" refs
      +'
      +
      +test_expect_success 'clone bundle list (file, any mode)' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' '
      +
      +	# Does not exist. Should be skipped.
      +	[bundle "bundle-0"]
     -+		uri = $HTTPD_URL/bundle-0.bundle
     ++		uri = file://$(pwd)/clone-from/bundle-0.bundle
      +
      +	[bundle "bundle-1"]
     -+		uri = $HTTPD_URL/bundle-1.bundle
     ++		uri = file://$(pwd)/clone-from/bundle-1.bundle
      +
      +	# Does not exist. Should be skipped.
      +	[bundle "bundle-5"]
     -+		uri = $HTTPD_URL/bundle-5.bundle
     ++		uri = file://$(pwd)/clone-from/bundle-5.bundle
      +	EOF
      +
      +	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
      +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     -+	git -C clone-any-file cat-file --batch-check <oids
     ++	git -C clone-any-file cat-file --batch-check <oids &&
     ++
     ++	git -C clone-any-file for-each-ref --format="%(refname)" >refs &&
     ++	grep "refs/bundles/" refs >actual &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	EOF
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'clone bundle list (file, any mode, all failures)' '
     ++	cat >bundle-list <<-EOF &&
     ++	[bundle]
     ++		version = 1
     ++		mode = any
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-0"]
     ++		uri = $HTTPD_URL/bundle-0.bundle
     ++
     ++	# Does not exist. Should be skipped.
     ++	[bundle "bundle-5"]
     ++		uri = $HTTPD_URL/bundle-5.bundle
     ++	EOF
     ++
     ++	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
     ++	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     ++	git -C clone-any-fail cat-file --batch-check <oids &&
     ++
     ++	git -C clone-any-fail for-each-ref --format="%(refname)" >refs &&
     ++	! grep "refs/bundles/" refs
      +'
      +
       #########################################################################
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
      +
      +	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
      +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     -+	git -C clone-any-http cat-file --batch-check <oids
     ++	git -C clone-any-http cat-file --batch-check <oids &&
     ++
     ++	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
     ++	grep "refs/bundles/" refs >actual &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	refs/bundles/left
     ++	refs/bundles/merge
     ++	refs/bundles/right
     ++	EOF
     ++	test_cmp expect actual
      +'
      +
       # Do not add tests here unless they use the HTTP server, as they will
  -:  ----------- > 10:  1cae3096624 bundle-uri: quiet failed unbundlings
  9:  d84544859e4 ! 11:  52a575f8a69 bundle-uri: suppress stderr from remote-https
     @@ bundle-uri.c: static int download_https_uri_to_file(const char *file, const char
       
      
       ## t/t5558-clone-bundle-uri.sh ##
     -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any mode)' '
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, all mode, some failures)' '
     + 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
     + 		clone-from clone-all-some 2>err &&
     + 	! grep "Repository lacks these prerequisite commits" err &&
     ++	! grep "fatal" err &&
     ++	grep "warning: failed to download bundle from URI" err &&
     + 
     + 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     + 	git -C clone-all-some cat-file --batch-check <oids &&
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, all mode, all failures)' '
     + 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
     + 		clone-from clone-all-fail 2>err &&
     + 	! grep "Repository lacks these prerequisite commits" err &&
     ++	! grep "fatal" err &&
     ++	grep "warning: failed to download bundle from URI" err &&
     + 
     + 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     + 	git -C clone-all-fail cat-file --batch-check <oids &&
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any mode, all failures)' '
       		uri = $HTTPD_URL/bundle-5.bundle
       	EOF
       
     --	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
     +-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
      +	git clone --bundle-uri="file://$(pwd)/bundle-list" \
     -+		clone-from clone-any-file 2>err &&
     ++		clone-from clone-any-fail 2>err &&
      +	! grep "fatal" err &&
      +	grep "warning: failed to download bundle from URI" err &&
      +
       	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     - 	git -C clone-any-file cat-file --batch-check <oids
     - '
     + 	git -C clone-any-fail cat-file --batch-check <oids &&
     + 
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any mode)' '
       		uri = $HTTPD_URL/bundle-5.bundle
       	EOF
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
      +	grep "warning: failed to download bundle from URI" err &&
      +
       	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     - 	git -C clone-any-http cat-file --batch-check <oids
     - '
     + 	git -C clone-any-http cat-file --batch-check <oids &&
     + 

-- 
gitgitgadget


* [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename()
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we never need to modify this
filename later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
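The interface change (return an owned 'char *' or NULL instead of filling a
caller-provided strbuf) can be tried outside of Git with the sketch below.
It is an illustration only: temp_name_in() is a hypothetical helper, and
POSIX mkstemp() stands in for odb_mkstemp(), which is internal to Git.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Reserve a unique name under 'dir' and return it as a heap string that
 * the caller must free(). Returns NULL on failure. The file is removed
 * again right away, so (as in the patch) the name is briefly racy.
 */
static char *temp_name_in(const char *dir)
{
	static const char suffix[] = "/tmp_uri_XXXXXX";
	size_t len;
	char *name;
	int fd;

	assert(dir != NULL);
	len = strlen(dir) + sizeof(suffix); /* sizeof counts the NUL */
	name = malloc(len);
	if (!name)
		return NULL;
	snprintf(name, len, "%s%s", dir, suffix);

	fd = mkstemp(name); /* replaces XXXXXX in place */
	if (fd < 0) {
		free(name);
		return NULL;
	}

	close(fd);
	unlink(name);
	return name;
}
```

A caller checks for NULL, uses the name, and ends with a single free(),
which mirrors the cleanup path in fetch_bundle_uri() after this patch.
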
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..8b2f4e08c9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget



* [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 03/11] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare for a user to supply a single bundle URI and expect
that URI to point to a bundle. Instead, that URI will likely point to a
list of bundles provided in some format. Alternatively, the Git server
could advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. The filename used when Git
downloads the contents to disk is not stored here yet; it is added as a
'char *' in the patch where it is first used.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
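For readers who want to experiment with the shape of this API outside of
Git, here is a reduced sketch. It is not the implementation in this patch:
a singly linked list replaces Git's hashmap, and all names ('struct item',
item_list_get(), for_all_items(), count_item(), ...) are hypothetical. What
it keeps is the implied defaults, the lookup-or-create by 'id', and an
iterator that stops on the first non-zero return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum mode { MODE_NONE = 0, MODE_ALL, MODE_ANY };

struct item {
	char *id;
	char *uri;
	struct item *next;
};

struct item_list {
	int version;
	enum mode mode;
	struct item *head;
};

typedef int (*item_iterator)(struct item *item, void *data);

static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

static void item_list_init(struct item_list *list)
{
	memset(list, 0, sizeof(*list));
	list->mode = MODE_ALL; /* implied defaults, as in the patch */
	list->version = 1;
}

/* Find the item with this id, or create it. Returns NULL on OOM. */
static struct item *item_list_get(struct item_list *list, const char *id)
{
	struct item *it;

	for (it = list->head; it; it = it->next)
		if (!strcmp(it->id, id))
			return it;

	it = calloc(1, sizeof(*it));
	if (!it)
		return NULL;
	it->id = dup_string(id);
	if (!it->id) {
		free(it);
		return NULL;
	}
	it->next = list->head;
	list->head = it;
	return it;
}

/* Visit each item; a non-zero return from 'iter' stops the walk. */
static int for_all_items(struct item_list *list, item_iterator iter,
			 void *data)
{
	struct item *it;

	for (it = list->head; it; it = it->next) {
		int result = iter(it, data);

		if (result)
			return result;
	}
	return 0;
}

/* Example iterator: count the items through 'data'. */
static int count_item(struct item *item, void *data)
{
	(void)item;
	(*(int *)data)++;
	return 0;
}

static void item_list_clear(struct item_list *list)
{
	struct item *it = list->head;

	while (it) {
		struct item *next = it->next;

		free(it->id);
		free(it->uri);
		free(it);
		it = next;
	}
	list->head = NULL;
}
```

The linear lookup is fine for a sketch; the patch uses a hashmap keyed on
'id', so lookups stay cheap for long lists.
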
 bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8b2f4e08c9c..f9a8db221bc 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,66 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static char *find_temp_filename(void)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..ff7e3fd3fb2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,63 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { 0 }
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget



* [PATCH v4 03/11] bundle-uri: create base key-value pair parsing
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 04/11] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                         ` (8 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  currently is expected to be an absolute URI, but will be relaxed to be
  a relative URI in the future.

While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.

Make the git_parse_int() method global so we can parse the integer
version value carefully.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
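The key handling described above can be exercised with a standalone sketch.
This is an approximation under stated assumptions, not the function added
by this patch: kv_list_update(), 'struct kv_list' and get_item() are
hypothetical, a fixed-size array replaces the hashmap, plain string
functions replace parse_config_key(), and strtol() replaces
git_parse_int().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum mode { MODE_ALL, MODE_ANY };

#define MAX_ITEMS 16
#define MAX_ID 64

struct item { char id[MAX_ID]; char *uri; };

struct kv_list {
	int version;
	enum mode mode;
	struct item items[MAX_ITEMS];
	size_t nr;
};

/* Look up the item named by id[0..len), creating it if needed. */
static struct item *get_item(struct kv_list *list, const char *id, size_t len)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (strlen(list->items[i].id) == len &&
		    !memcmp(list->items[i].id, id, len))
			return &list->items[i];
	if (list->nr == MAX_ITEMS || len >= MAX_ID)
		return NULL;
	memcpy(list->items[list->nr].id, id, len);
	list->items[list->nr].id[len] = '\0';
	list->items[list->nr].uri = NULL;
	return &list->items[list->nr++];
}

/* Returns 0 if the pair was accepted or ignored, -1 if rejected. */
static int kv_list_update(struct kv_list *list, const char *key,
			  const char *value)
{
	const char *rest, *dot;
	struct item *item;

	if (strncmp(key, "bundle.", 7))
		return -1;
	rest = key + 7;
	dot = strrchr(rest, '.');

	if (!dot) { /* global key: bundle.<subkey> */
		if (!strcmp(rest, "version")) {
			char *end;
			long v;

			errno = 0;
			v = strtol(value, &end, 10);
			if (errno || end == value || *end || v != 1)
				return -1;
			list->version = (int)v;
		} else if (!strcmp(rest, "mode")) {
			if (!strcmp(value, "all"))
				list->mode = MODE_ALL;
			else if (!strcmp(value, "any"))
				list->mode = MODE_ANY;
			else
				return -1;
		}
		return 0; /* unknown global keys are ignored */
	}

	/* per-item key: bundle.<id>.<subkey> */
	if (dot == rest || !dot[1])
		return -1;
	item = get_item(list, rest, (size_t)(dot - rest));
	if (!item)
		return -1;
	if (!strcmp(dot + 1, "uri")) {
		if (item->uri)
			return -1; /* repeated URI for the same <id> */
		item->uri = malloc(strlen(value) + 1);
		if (!item->uri)
			return -1;
		strcpy(item->uri, value);
	}
	return 0; /* unknown per-item keys are treated as hints */
}
```

Feeding it "bundle.mode" = "any" followed by "bundle.one.uri" = some URL
yields one item in "any" mode; a second "bundle.one.uri" for the same id is
rejected, while an unrecognized "bundle.one.<subkey>" is silently accepted.
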
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 24 +++++++++++
 bundle-uri.c                    | 76 +++++++++++++++++++++++++++++++++
 config.c                        |  2 +-
 config.h                        |  1 +
 5 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/bundle.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..daa21eb674a
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,24 @@
+bundle.*::
+	The `bundle.*` keys may appear in a bundle list file found via the
+	`git clone --bundle-uri` option. These keys currently have no effect
+	if placed in a repository config file, though this will change in the
+	future. See link:technical/bundle-uri.html[the bundle URI design
+	document] for more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index f9a8db221bc..0bc59dd9c34 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns -1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+	const char *subsection, *subkey;
+	size_t subsection_len;
+
+	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
+		return -1;
+
+	if (!subsection_len) {
+		if (!strcmp(subkey, "version")) {
+			int version;
+			if (!git_parse_int(value, &version))
+				return -1;
+			if (version != 1)
+				return -1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(subkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return -1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, subsection, subsection_len);
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(subkey, "uri")) {
+		if (bundle->uri)
+			return -1;
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/config.c b/config.c
index 015bec360f5..e93101249f6 100644
--- a/config.c
+++ b/config.c
@@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
 	return 0;
 }
 
-static int git_parse_int(const char *value, int *ret)
+int git_parse_int(const char *value, int *ret)
 {
 	intmax_t tmp;
 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
diff --git a/config.h b/config.h
index ca994d77147..ef9eade6414 100644
--- a/config.h
+++ b/config.h
@@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *,
 
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
+int git_parse_int(const char *value, int *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
gitgitgadget



* [PATCH v4 04/11] bundle-uri: create "key=value" line parsing
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 03/11] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                         ` (7 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value", and together
the lines describe a bundle list. Connect the API necessary for Git's
transport to the key-value pair parsing created in the previous change.

This change does not implement the protocol v2 functionality itself. It
only exposes the parsing so that it can be unit tested.
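
As an illustration only, the "key=value" split described above can be
sketched in standalone C. The helper name split_key_value() and the
fixed-size key buffer are assumptions made for this sketch; the patch
itself uses a strbuf and hands the result to bundle_list_update().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this patch: split "key=value" at
 * the first '=' and copy the key into 'key' (capacity 'key_len').
 * On success, *value points at the value inside 'line' and 0 is
 * returned. Returns -1 for an empty line, a missing '=', an empty key
 * or value, or a key that does not fit in the buffer.
 */
int split_key_value(const char *line, char *key, size_t key_len,
		    const char **value)
{
	const char *equals;
	size_t len;

	assert(line && key && value);

	if (!*line)
		return -1;

	equals = strchr(line, '=');
	if (!equals)
		return -1;
	if (equals == line || !equals[1])
		return -1;

	len = (size_t)(equals - line);
	if (len >= key_len)
		return -1;

	memcpy(key, line, len);
	key[len] = '\0';
	*value = equals + 1;
	return 0;
}
```

Splitting at the first '=' keeps values that themselves contain '='
(for example a URI with a query string) intact.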

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 12 ++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0bc59dd9c34..372e6fac5cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -306,3 +305,29 @@ cleanup:
 	free(filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index ff7e3fd3fb2..90583461929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
 #endif
-- 
gitgitgadget



* [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 04/11] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-10 16:04       ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 06/11] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.

We use an input file instead of stdin because a future change will add
parsing of the config-file format, which works better with an input
file.
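
To illustrate the output shape, here is a minimal standalone sketch of
formatting one bundle as a config section. The struct bundle_entry type
and format_bundle_entry() are hypothetical stand-ins for this sketch;
the patch prints from 'struct remote_bundle_info' with fprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for 'struct remote_bundle_info'. */
struct bundle_entry {
	const char *id;
	const char *uri;
};

/*
 * Format one entry as a config section into 'buf'. Returns the number
 * of bytes written (excluding the NUL), or -1 on error.
 */
int format_bundle_entry(char *buf, size_t size, const struct bundle_entry *e)
{
	int n;

	assert(buf && size && e && e->id && e->uri);

	/* No escaping is attempted here, so refuse ids that would need it. */
	if (strchr(e->id, '"') || strchr(e->id, '\\') || strchr(e->id, '\n'))
		return -1;

	n = snprintf(buf, size, "[bundle \"%s\"]\n\turi = %s\n", e->id, e->uri);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output error or truncated */
	return n;
}
```

Emitting config syntax lets the tests compare results through
'git config --list --file', which hides whitespace and ordering
differences.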

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |   1 +
 bundle-uri.c                |  33 ++++++++++
 bundle-uri.h                |   3 +
 t/helper/test-bundle-uri.c  |  70 +++++++++++++++++++++
 t/helper/test-tool.c        |   1 +
 t/helper/test-tool.h        |   1 +
 t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     |  11 ++++
 8 files changed, 241 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 372e6fac5cf..c02e7f62eb1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 90583461929..0e56ab2ae5a 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..0329c56544f
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,70 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *key_value_usage[] = {
+		"test-tool bundle-uri parse-key-values <input>",
+		NULL
+	};
+	const char **usage = key_value_usage;
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+	FILE *fp;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc != 1)
+		goto usage;
+
+	init_bundle_list(&list);
+	fp = fopen(argv[0], "r");
+	if (!fp)
+		die("failed to open '%s'", argv[0]);
+
+	while (strbuf_getline(&sb, fp) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf))
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+	fclose(fp);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..fd142a66ad5
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.one.uri=https://example.com/bundle-2.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget



* [PATCH v4 06/11] bundle-uri: parse bundle list in config format
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error. The current
warning isn't particularly helpful if a user sees it, but it will be
made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.
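
The config machinery passes each entry to its callback as a flattened
key, so a 'uri' entry under [bundle "one"] arrives as "bundle.one.uri",
the same key used in the "key=value" form. A rough standalone sketch of
how such a key breaks down follows. split_bundle_key() is a
hypothetical, simplified helper written for this illustration; the
real code relies on parse_config_key().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified take on what parse_config_key() does for
 * the "bundle" section. For "bundle.<id>.<subkey>", *subsection points
 * at <id> (not NUL-terminated, length in *subsection_len). For
 * "bundle.<subkey>", the subsection is NULL with length 0.
 * Returns 0 on success, -1 if the key is not a usable "bundle" key.
 */
int split_bundle_key(const char *key, const char **subsection,
		     size_t *subsection_len, const char **subkey)
{
	static const char prefix[] = "bundle.";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *last_dot;

	assert(key && subsection && subsection_len && subkey);

	if (strncmp(key, prefix, prefix_len))
		return -1;

	rest = key + prefix_len;
	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* e.g. "bundle.mode": a list-wide key, no subsection */
		*subsection = NULL;
		*subsection_len = 0;
		*subkey = rest;
	} else {
		/* e.g. "bundle.one.uri": the subkey follows the last dot */
		*subsection = rest;
		*subsection_len = (size_t)(last_dot - rest);
		*subkey = last_dot + 1;
	}

	return **subkey ? 0 : -1;
}
```

Because both input formats reduce to the same flattened keys, one
update routine can serve the packet-line and config-file paths.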

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 27 ++++++++++++++++++++
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  | 49 +++++++++++++++++++++++++++---------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c02e7f62eb1..3d44ec2b1e6 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 0e56ab2ae5a..bc13d4c9929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0329c56544f..25afd393428 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,12 +4,21 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
 	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <input>",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	const char **usage = key_value_usage;
 	struct option options[] = {
 		OPT_END(),
@@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv)
 	int err = 0;
 	FILE *fp;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc != 1)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	fp = fopen(argv[0], "r");
-	if (!fp)
-		die("failed to open '%s'", argv[0]);
 
-	while (strbuf_getline(&sb, fp) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf))
-			err = error("bad line: '%s'", sb.buf);
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc != 1)
+			goto usage;
+		fp = fopen(argv[0], "r");
+		if (!fp)
+			die("failed to open '%s'", argv[0]);
+		while (strbuf_getline(&sb, fp) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf))
+				err = error("bad line: '%s'", sb.buf);
+		}
+		fclose(fp);
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
-	fclose(fp);
 
 	print_bundle_list(stdout, &list);
 
@@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index fd142a66ad5..c2fe3f9c5a5 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	error: bad config line 1 in file in2
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 06/11] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.
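
The effect of the depth limit can be shown with a toy model. In this
sketch, which is an assumption-laden illustration and not code from
the patch, each list is an array index, links[i] names the list that
list i points to, and -1 marks a plain bundle. MAX_DEPTH and
follow_links() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4	/* mirrors max_bundle_uri_depth in the patch */

/*
 * Toy model: links[i] is the index of the list that list i points to,
 * or -1 if entry i is a plain bundle. Returns 0 once a bundle is
 * reached and -1 if the depth limit is hit first.
 */
int follow_links(const int *links, size_t nr, int start, int depth)
{
	assert(links && start >= 0 && (size_t)start < nr);

	if (depth >= MAX_DEPTH)
		return -1;	/* too deep; this also breaks cycles */
	if (links[start] < 0)
		return 0;	/* reached a bundle */
	return follow_links(links, nr, links[start], depth + 1);
}
```

A cycle such as { 1, 0 } never reaches a bundle, so the depth check is
what ends the recursion. No visited set is needed, at the cost of also
rejecting legitimate chains that exceed the limit.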

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 3d44ec2b1e6..8a7c11c6393 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -366,6 +380,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget



* [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 17:27         ` Junio C Hamano
  2022-10-10 16:04       ` [PATCH v4 09/11] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  11 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The verify_bundle() method checks if a bundle can be applied to a given
repository. It verifies not only that the bundle's prerequisite commits
exist in the repository, but also that these commits are reachable.

This behavior dates back to the original git-bundle builtin written in
2e0afafebd8 (Add git-bundle: move objects and references by archive,
2007-02-22), but the message does not go into detail why the
reachability check is important.

Since verify_bundle() is called from unbundle(), we need to add a
parameter to unbundle() that passes the flags through.

When unbundling from a list of bundles, Git will create refs that point
to the tips of the latest bundle, which makes this reachability walk
succeed, in theory. However, the loose refs cache does not get
invalidated and hence the reachability walk fails. By disabling the
reachability walk in the bundle URI code, we avoid this failure.
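
A minimal standalone sketch of the flag handling follows. The enum
matches the one added to bundle.h below, while flags_from_quiet() and
should_walk_reachability() are hypothetical helpers that only restate
the checks made inline by the callers and by verify_bundle().

```c
#include <assert.h>

/* Same shape as the enum added to bundle.h in this patch. */
enum verify_bundle_flags {
	VERIFY_BUNDLE_VERBOSE = (1 << 0),
	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
};

/* Hypothetical helper: map the old 'quiet' boolean to the new flags. */
enum verify_bundle_flags flags_from_quiet(int quiet)
{
	return quiet ? 0 : VERIFY_BUNDLE_VERBOSE;
}

/* Hypothetical helper: should the reachability walk run at all? */
int should_walk_reachability(enum verify_bundle_flags flags)
{
	/* Reject bits this sketch does not know about. */
	assert(!(flags & ~(VERIFY_BUNDLE_VERBOSE |
			   VERIFY_BUNDLE_SKIP_REACHABLE)));

	return !(flags & VERIFY_BUNDLE_SKIP_REACHABLE);
}
```

Using a bit field instead of the old 'int verbose' parameter lets
existing callers pass 0 and keep their behavior, while the bundle URI
code opts out of the walk with a single flag.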

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/bundle.c |  5 +++--
 bundle-uri.c     |  8 +++++++-
 bundle.c         | 12 +++++++-----
 bundle.h         | 15 +++++++++++++--
 transport.c      |  2 +-
 5 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 2adad545a2e..7d983a238f0 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -119,7 +119,8 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 		goto cleanup;
 	}
 	close(bundle_fd);
-	if (verify_bundle(the_repository, &header, !quiet)) {
+	if (verify_bundle(the_repository, &header,
+			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
 		ret = 1;
 		goto cleanup;
 	}
@@ -185,7 +186,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args) ||
+			 &extra_index_pack_args, 0) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:
diff --git a/bundle-uri.c b/bundle-uri.c
index 8a7c11c6393..ad5baabdd94 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
 		return 1;
 
-	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+	/*
+	 * Skip the reachability walk here, since we will be adding
+	 * a reachable ref pointing to the new tips, which will reach
+	 * the prerequisite commits.
+	 */
+	if ((result = unbundle(r, &header, bundle_fd, NULL,
+			       VERIFY_BUNDLE_SKIP_REACHABLE)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 0208e6d90d3..36ffeb1e0eb 100644
--- a/bundle.c
+++ b/bundle.c
@@ -189,7 +189,7 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
-		  int verbose)
+		  enum verify_bundle_flags flags)
 {
 	/*
 	 * Do fast check, then if any prereqs are missing then go line by line
@@ -222,7 +222,8 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (revs.pending.nr != p->nr ||
+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
 		goto cleanup;
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
@@ -259,7 +260,7 @@ int verify_bundle(struct repository *r,
 			clear_commit_marks(commit, ALL_REV_FLAGS);
 	}
 
-	if (verbose) {
+	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;
 
 		r = &header->references;
@@ -620,7 +621,8 @@ err:
 }
 
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args)
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
@@ -634,7 +636,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
 		strvec_clear(extra_index_pack_args);
 	}
 
-	if (verify_bundle(r, header, 0))
+	if (verify_bundle(r, header, flags))
 		return -1;
 	ip.in = bundle_fd;
 	ip.no_stdout = 1;
diff --git a/bundle.h b/bundle.h
index 0c052f54964..9f798c00d93 100644
--- a/bundle.h
+++ b/bundle.h
@@ -29,7 +29,14 @@ int read_bundle_header_fd(int fd, struct bundle_header *header,
 int create_bundle(struct repository *r, const char *path,
 		  int argc, const char **argv, struct strvec *pack_options,
 		  int version);
-int verify_bundle(struct repository *r, struct bundle_header *header, int verbose);
+
+enum verify_bundle_flags {
+	VERIFY_BUNDLE_VERBOSE = (1 << 0),
+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
+};
+
+int verify_bundle(struct repository *r, struct bundle_header *header,
+		  enum verify_bundle_flags flags);
 
 /**
  * Unbundle after reading the header with read_bundle_header().
@@ -40,9 +47,13 @@ int verify_bundle(struct repository *r, struct bundle_header *header, int verbos
  * Provide "extra_index_pack_args" to pass any extra arguments
  * (e.g. "-v" for verbose/progress), NULL otherwise. The provided
  * "extra_index_pack_args" (if any) will be strvec_clear()'d for you.
+ *
+ * Before unbundling, this method will call verify_bundle() with the
+ * given 'flags'.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args);
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags);
 int list_bundle_refs(struct bundle_header *header,
 		int argc, const char **argv);
 
diff --git a/transport.c b/transport.c
index 52db7a3cb09..c5d3042731a 100644
--- a/transport.c
+++ b/transport.c
@@ -178,7 +178,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args);
+		       &extra_index_pack_args, 0);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
gitgitgadget



* [PATCH v4 09/11] bundle-uri: fetch a list of bundles
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 10/11] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs is downloaded
and inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).
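
As a rough illustration of the recursion described above, here is a
minimal, self-contained sketch. It is not the code in this patch:
'struct node', fetch_node() and MAX_DEPTH are hypothetical names, the
"download" is simulated with an in-memory tree, and the sketch counts
one level per list, which may differ from the patch's own depth
bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

enum list_mode { MODE_ALL, MODE_ANY };

/* Hypothetical in-memory stand-in for "the content found at a URI". */
struct node {
	int is_bundle;              /* 1: a bundle file, 0: a bundle list */
	int available;              /* 0 simulates a failed download */
	enum list_mode mode;        /* only meaningful for lists */
	const struct node **items;  /* only meaningful for lists */
	size_t nr;
};

#define MAX_DEPTH 4

/*
 * Returns 0 on success and -1 on failure. '*found' counts the bundles
 * that were collected along the way.
 */
static int fetch_node(const struct node *n, int depth, int *found)
{
	size_t i;
	int ok = 0;

	assert(n && found);

	if (depth >= MAX_DEPTH)
		return -1; /* guards against cycles and deep nesting */
	if (!n->available)
		return -1; /* simulated download failure */
	if (n->is_bundle) {
		(*found)++;
		return 0;
	}

	for (i = 0; i < n->nr; i++) {
		/* In ANY mode, one successful entry is enough. */
		if (n->mode == MODE_ANY && ok)
			break;
		/* Failures are skipped so later entries still get a chance. */
		if (!fetch_node(n->items[i], depth + 1, found))
			ok++;
	}
	return 0;
}
```

With a list that (incorrectly) contains itself, the depth check is the
only thing that ends the recursion, which is the case the limit is
meant to cover.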

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.
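
The "keep trying until nothing new applies" behavior that this
topology exercises can be sketched as follows. This is only a model
under stated assumptions: commits are bits in a mask, and
'struct fake_bundle' and apply_until_stable() are hypothetical names
rather than anything in bundle-uri.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each commit is one bit in a mask. */
struct fake_bundle {
	unsigned provides;  /* commits delivered by this bundle */
	unsigned requires;  /* prerequisite commits */
	int unbundled;
};

/*
 * Apply bundles in list order, restarting the scan after every success,
 * until a full pass applies nothing. Returns the number of bundles
 * applied; '*have' accumulates the commits that are present.
 */
static int apply_until_stable(struct fake_bundle *b, size_t nr,
			      unsigned *have)
{
	int applied = 0, progress;
	size_t i;

	assert(b && have);

	do {
		progress = 0;
		for (i = 0; i < nr; i++) {
			if (b[i].unbundled)
				continue;
			if ((b[i].requires & *have) != b[i].requires)
				continue; /* prerequisites missing; retry later */
			*have |= b[i].provides;
			b[i].unbundled = 1;
			applied++;
			progress = 1;
			break; /* a success may unblock earlier entries */
		}
	} while (progress);

	return applied;
}
```

In this model, listing the four bundles in the worst order (merge
bundle first, base bundle last) still applies all of them, while
leaving out one of the middle bundles leaves the merge bundle
unapplied.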

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 203 ++++++++++++++++++++++++++---
 bundle-uri.h                |  13 ++
 t/t5558-clone-bundle-uri.sh | 248 ++++++++++++++++++++++++++++++++++++
 3 files changed, 448 insertions(+), 16 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index ad5baabdd94..c0a6fb05fad 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 {
 	FREE_AND_NULL(bundle->id);
 	FREE_AND_NULL(bundle->uri);
+	FREE_AND_NULL(bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -340,18 +342,117 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	int res;
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+
+	/*
+	 * Only increment count if the download succeeded. If our mode is
+	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
+	 * list in case they work instead.
+	 */
+	if (!res)
+		ctx->count++;
+
+	/*
+	 * To be opportunistic as possible, we continue iterating and
+	 * download as many bundles as we can, so we can apply the ones
+	 * that work, even in BUNDLE_MODE_ALL mode.
+	 */
+	return 0;
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = bundle_uri_parse_config_format(bundle->uri,
+						     bundle->file,
+						     &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	char *filename;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -359,36 +460,106 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if (!(filename = find_temp_filename())) {
+	if (!bundle->file &&
+	    !(bundle->file = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
 	}
 
-	if ((result = copy_uri_to_file(filename, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	bcopy->file = xstrdup(bundle->file);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	if (filename)
-		unlink(filename);
-	free(filename);
+	if (result && bundle->file)
+		unlink(bundle->file);
 	return result;
 }
 
+/**
+ * This loop iterator breaks the loop with nonzero return code on the
+ * first successful unbundling of a bundle.
+ */
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct repository *r = data;
+
+	if (!info->file || info->unbundled)
+		return 0;
+
+	if (!unbundle_from_file(r, info->file)) {
+		info->unbundled = 1;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 *
+	 * Keep in mind that a non-zero result for the loop here means
+	 * the loop terminated early on a successful unbundling, which
+	 * signals that we can try again.
+	 */
+	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file)
+		unlink_or_warn(info->file);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup(""),
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index bc13d4c9929..4dbc269823c 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -28,6 +28,19 @@ struct remote_bundle_info {
 	 * if there was no table of contents.
 	 */
 	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * NULL.
+	 */
+	char *file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..a86dc04f528 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,195 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-file cat-file --batch-check <oids &&
+
+	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, all mode, some failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	# No bundle-3 means bundle-4 will not apply.
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	GIT_TRACE2_PERF=1 \
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-all-some cat-file --batch-check <oids &&
+
+	git -C clone-all-some for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, all mode, all failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-all-fail cat-file --batch-check <oids &&
+
+	git -C clone-all-fail for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
+test_expect_success 'clone bundle list (file, any mode)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-file cat-file --batch-check <oids &&
+
+	git -C clone-any-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, any mode, all failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-fail cat-file --batch-check <oids &&
+
+	git -C clone-any-fail for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +264,65 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http cat-file --batch-check <oids
+'
+
+test_expect_success 'clone bundle list (HTTP, any mode)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-http cat-file --batch-check <oids &&
+
+	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget



* [PATCH v4 10/11] bundle-uri: quiet failed unbundlings
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 09/11] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04       ` [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. These failures
are reported to the user as error messages, even when the bundles
succeed in later attempts after the bundles they depend on are unbundled.

Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisite commits, allowing callers to
unbundle() to notice that the bundle failed to apply.

Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.
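
The pattern is "count first, then decide whether to print". A minimal
sketch of that idea, with hypothetical names (CHECK_QUIET,
count_missing()) and a plain array of present/missing markers standing
in for the real prerequisite lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum check_flags {
	CHECK_QUIET = (1 << 0),
};

/*
 * Count the entries of 'present' that are zero. Messages go to 'err'
 * unless CHECK_QUIET is set; the return value is the same either way.
 */
static int count_missing(const int *present, size_t nr, unsigned flags,
			 FILE *err)
{
	int ret = 0;
	size_t i;

	assert(present && err);

	for (i = 0; i < nr; i++) {
		if (present[i])
			continue;
		ret++;
		if (flags & CHECK_QUIET)
			continue; /* still counted, just not reported */
		if (ret == 1)
			fprintf(err, "error: missing prerequisites:\n");
		fprintf(err, "error: item %lu\n", (unsigned long)i);
	}
	return ret;
}
```

Because the counter is incremented before the quiet check, a caller
can still tell that the check failed without anything being printed.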

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  2 +-
 bundle.c                    | 10 ++++++++--
 bundle.h                    |  3 ++-
 t/t5558-clone-bundle-uri.sh | 25 ++++++++++++++++++++-----
 4 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c0a6fb05fad..18b993c207f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -309,7 +309,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_SKIP_REACHABLE)))
+			       VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 36ffeb1e0eb..143e7c4508f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -218,7 +218,10 @@ int verify_bundle(struct repository *r,
 			add_pending_object(&revs, o, name);
 			continue;
 		}
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
@@ -246,7 +249,10 @@ int verify_bundle(struct repository *r,
 		assert(o); /* otherwise we'd have returned early */
 		if (o->flags & SHOWN)
 			continue;
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
diff --git a/bundle.h b/bundle.h
index 9f798c00d93..ba453404163 100644
--- a/bundle.h
+++ b/bundle.h
@@ -32,7 +32,8 @@ int create_bundle(struct repository *r, const char *path,
 
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
-	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1),
+	VERIFY_BUNDLE_QUIET = (1 << 2),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a86dc04f528..9b159078386 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -99,7 +99,10 @@ test_expect_success 'clone bundle list (file, no heuristic)' '
 		uri = file://$(pwd)/clone-from/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-list-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-file cat-file --batch-check <oids &&
 
@@ -141,7 +144,10 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	EOF
 
 	GIT_TRACE2_PERF=1 \
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-some 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
 
@@ -169,7 +175,10 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-fail 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
 
@@ -195,7 +204,10 @@ test_expect_success 'clone bundle list (file, any mode)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-file cat-file --batch-check <oids &&
 
@@ -284,7 +296,10 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-list-http  2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-http cat-file --batch-check <oids
 '
-- 
gitgitgadget



* [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 10/11] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04       ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.
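
For readers unfamiliar with the general idea, here is one generic
POSIX way to keep a child's stderr away from the user while still
seeing its exit code. This is not how the patch does it: the patch
only sets 'cp.err = -1' and relies on Git's own child-process API.
run_quietly() is a hypothetical helper, and redirecting to /dev/null
is an assumption made for the sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with stderr sent to /dev/null. Returns the child's exit
 * code, or -1 if it could not be started or did not exit normally.
 */
static int run_quietly(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv && argv[0]);

	pid = fork();
	if (pid < 0)
		return -1;

	if (!pid) {
		/* Child: replace stderr before exec. */
		int fd = open("/dev/null", O_WRONLY);
		if (fd < 0 || dup2(fd, STDERR_FILENO) < 0)
			_exit(127);
		if (fd != STDERR_FILENO)
			close(fd);
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent still learns about the failure from the exit code and can
print its own, friendlier warning.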

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  1 +
 t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 18b993c207f..6bfba95f872 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	int found_get = 0;
 
 	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.err = -1;
 	cp.in = -1;
 	cp.out = -1;
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9b159078386..9155f31fa2c 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -147,6 +147,8 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-some 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
@@ -178,6 +180,8 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-fail 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
@@ -234,7 +238,11 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-fail 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-fail cat-file --batch-check <oids &&
 
@@ -323,7 +331,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-any-http 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-http cat-file --batch-check <oids &&
 
-- 
gitgitgadget


* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk
  2022-10-10 16:04       ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget
@ 2022-10-10 17:27         ` Junio C Hamano
  2022-10-10 18:13           ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-10-10 17:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The verify_bundle() method checks if a bundle can be applied to a given
> repository. This not only verifies that certain commits exist in the
> repository, but Git also checks that these commits are reachable.
>
> This behavior dates back to the original git-bundle builtin written in
> 2e0afafebd8 (Add git-bundle: move objects and references by archive,
> 2007-02-22), but the message does not go into detail why the
> reachability check is important.
>
> Since verify_bundle() is called from unbundle(), we need to add an
> option to pipe the flags through that method.

All makes sense.

> When unbundling from a list of bundles, Git will create refs that point
> to the tips of the latest bundle, which makes this reachability walk
> succeed, in theory. However, the loose refs cache does not get
> invalidated and hence the reachability walk fails. By disabling the
> reachability walk in the bundle URI code, we can get around this
> reachability check.

The above makes it sound like the real culprit is that cache goes
out of sync and the presented solution is a workaround; readers are
left in suspense if the "real" solution (as opposed to a workaround)
would come in a later step or in a future series.

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 8a7c11c6393..ad5baabdd94 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
>  		return 1;
>  
> -	if ((result = unbundle(r, &header, bundle_fd, NULL)))
> +	/*
> +	 * Skip the reachability walk here, since we will be adding
> +	 * a reachable ref pointing to the new tips, which will reach
> +	 * the prerequisite commits.
> +	 */
> +	if ((result = unbundle(r, &header, bundle_fd, NULL,
> +			       VERIFY_BUNDLE_SKIP_REACHABLE)))
>  		return 1;

This is not a new problem introduced in this new round, but if we
are updating this, can we fix it to omit assignment inside if
condition?

 * result is initialized to 0.

 * when unbundle returns non-zero, it is assigned to result and the
   function returns immediately, discarding whatever was assigned to
   the variable.

 * if unbundle returns zero, it is assigned to result and the
   control continues from here.  We know result is set to 0, but
   then that is what it was initialized earlier.
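
A minimal sketch of the shape being asked for, assuming a hypothetical
stub (fake_unbundle) in place of the real unbundle() call; only the
control flow is meant to carry over:

```c
/* Hypothetical stand-in for unbundle(): non-zero means failure. */
static int fake_unbundle(int should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * Same control flow as the hunk above, minus the assignment in the
 * condition. 'result' can only be 0 once we get past the check, so
 * storing the return value gains nothing.
 */
static int unbundle_from_file_sketch(int should_fail)
{
	int result = 0;

	if (fake_unbundle(should_fail))
		return 1;

	/* ... later steps may still update 'result' ... */
	return result;
}
```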



* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk
  2022-10-10 17:27         ` Junio C Hamano
@ 2022-10-10 18:13           ` Derrick Stolee
  2022-10-10 18:40             ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-10-10 18:13 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 10/10/2022 1:27 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> When unbundling from a list of bundles, Git will create refs that point
>> to the tips of the latest bundle, which makes this reachability walk
>> succeed, in theory. However, the loose refs cache does not get
>> invalidated and hence the reachability walk fails. By disabling the
>> reachability walk in the bundle URI code, we can get around this
>> reachability check.
> 
> The above makes it sound like the real culprit is that cache goes
> out of sync and the presented solution is a workaround; readers are
> left in suspense if the "real" solution (as opposed to a workaround)
> would come in a later step or in a future series.

I've been going over the refs code multiple times today trying to
fix this "real" culprit, with no luck. I can share this interesting
point:

 * The initial loop over the bundles tries to apply each, but the
   prerequisite objects are not present so we never reach the revision
   walk. A refs/bundle/* ref is added via update_ref().

 * The second loop over the bundles tries to apply each, but the only
   bundle with its prerequisites present also finds the commits as
   reachable (this must be where the loose ref cache is populated).
   Then, a refs/bundle/* ref is added via update_ref().

 * The third loop over the bundles finds a bundle whose prerequisites
   are present, but verify_bundle() rejected it because those commits
   were not seen from any ref.

Other than identifying that issue, I was unable to track down exactly
what is happening here or offer a fix. I had considered inserting
more cache frees deep in the refs code, but I wasn't sure what effect
that would have across the wider system.

>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index 8a7c11c6393..ad5baabdd94 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
>>  		return 1;
>>  
>> -	if ((result = unbundle(r, &header, bundle_fd, NULL)))
>> +	/*
>> +	 * Skip the reachability walk here, since we will be adding
>> +	 * a reachable ref pointing to the new tips, which will reach
>> +	 * the prerequisite commits.
>> +	 */
>> +	if ((result = unbundle(r, &header, bundle_fd, NULL,
>> +			       VERIFY_BUNDLE_SKIP_REACHABLE)))
>>  		return 1;
> 
> This is not a new problem introduced in this new round, but if we
> are updating this, can we fix it to omit assignment inside if
> condition?
> 
>  * result is initialized to 0.
> 
>  * when unbundle returns non-zero, it is assigned to result and the
>    function returns immediately, discarding whatever was assigned to
>    the variable.
> 
>  * if unbundle returns zero, it is assigned to result and the
>    control continues from here.  We know result is set to 0, but
>    then that is what it was initialized earlier.
 
Since we are not "trusting" the integer result of unbundle, we
can definitely stop this assignment in the if.

Thanks,
-Stolee


* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk
  2022-10-10 18:13           ` Derrick Stolee
@ 2022-10-10 18:40             ` Junio C Hamano
  2022-10-11 19:04               ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-10-10 18:40 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> I've been going over the refs code multiple times today trying to
> fix this "real" culprit, with no luck. I can share this interesting
> point:
>
>  * The initial loop over the bundles tries to apply each, but the
>    prerequisite objects are not present so we never reach the revision
>    walk. A refs/bundle/* ref is added via update_ref().
>
>  * The second loop over the bundles tries to apply each, but the only
>    bundle with its prerequisites present also finds the commits as
>    reachable (this must be where the loose ref cache is populated).
>    Then, a refs/bundle/* ref is added via update_ref().
>
>  * The third loop over the bundles finds a bundle whose prerequisites
>    are present, but verify_bundle() rejected it because those commits
>    were not seen from any ref.
>
> Other than identifying that issue, I was unable to track down exactly
> what is happening here or offer a fix. I had considered inserting
> more cache frees deep in the refs code, but I wasn't sure what effect
> that would have across the wider system.

OK.  That certainly is understandable.

As a comment in the proposed log message that BUNDLE_SKIP_REACHABLE
bit is a band aid papering over a problem we punted in this series,
to guide future developers, I think what you wrote is sufficient.
We do not want them to think that skipping the check is our
preferred longer term solution and add their own hack to keep
skipping the check when they resolve "the real culprit".

Thanks.


* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk
  2022-10-10 18:40             ` Junio C Hamano
@ 2022-10-11 19:04               ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-10-11 19:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

On 10/10/2022 2:40 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> I've been going over the refs code multiple times today trying to
>> fix this "real" culprit, with no luck. I can share this interesting
>> point:
>>
>>  * The initial loop over the bundles tries to apply each, but the
>>    prerequisite objects are not present so we never reach the revision
>>    walk. A refs/bundle/* ref is added via update_ref().
>>
>>  * The second loop over the bundles tries to apply each, but the only
>>    bundle with its prerequisites present also finds the commits as
>>    reachable (this must be where the loose ref cache is populated).
>>    Then, a refs/bundle/* ref is added via update_ref().
>>
>>  * The third loop over the bundles finds a bundle whose prerequisites
>>    are present, but verify_bundle() rejected it because those commits
>>    were not seen from any ref.
>>
>> Other than identifying that issue, I was unable to track down exactly
>> what is happening here or offer a fix. I had considered inserting
>> more cache frees deep in the refs code, but I wasn't sure what effect
>> that would have across the wider system.
> 
> OK.  That certainly is understandable.
> 
> As a comment in the proposed log message that BUNDLE_SKIP_REACHABLE
> bit is a band aid papering over a problem we punted in this series,
> to guide future developers, I think what you wrote is sufficient.
> We do not want them to think that skipping the check is our
> preferred longer term solution and add their own hack to keep
> skipping the check when they resolve "the real culprit".

I have discovered the real culprit, and my expectation was incorrect
about the loose ref cache. The key issue was that I was looking at
this loop:

	i = req_nr;
	while (i && (commit = get_revision(&revs)))
		if (commit->object.flags & PREREQ_MARK)
			i--;

and noticing that only one commit was being visited. I was not 
seeing the actually-important commit. But it wasn't the revision
walk's fault. The loop was terminating because "i" was reaching
zero!

It turns out that verify_bundle() is not clearing the PREREQ_MARK
flag, so multiple runs would incorrectly hit this short-circuit
and terminate the walk early.
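
The effect can be reproduced in isolation. The following is only a toy
model, assuming a hypothetical 'struct toy_commit' and an arbitrary bit
value for PREREQ_MARK; it is not the code in bundle.c, but it uses the
same countdown idea:

```c
#include <stddef.h>

#define PREREQ_MARK (1u << 0) /* bit value is arbitrary in this sketch */

/* Hypothetical, simplified commit: only the flags word matters here. */
struct toy_commit {
	unsigned flags;
};

/*
 * Mimic the countdown loop: visit 'walk' in order and stop as soon as
 * 'req_nr' commits carrying PREREQ_MARK have been seen. Returns how
 * many commits were visited before the loop ended.
 */
static size_t count_visited(struct toy_commit **walk, size_t walk_nr,
			    size_t req_nr)
{
	size_t i = req_nr, pos = 0;

	while (i && pos < walk_nr) {
		if (walk[pos]->flags & PREREQ_MARK)
			i--;
		pos++;
	}
	return pos;
}
```

If the first commit in the walk still carries a mark left over from an
earlier run and the real prerequisite comes third, a call with
req_nr == 1 stops after one commit. Once the stale mark is cleared, the
same call visits all three.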

I'll replace this patch with the correct fix soon.

Thanks,
-Stolee


* [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists
  2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2022-10-10 16:04       ` [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52       ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
                           ` (12 more replies)
  11 siblings, 13 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
<uri> is a URI to a bundle file. This series adds the capability of
downloading and parsing a bundle list and then downloading the URIs in that
list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way (updated for v4):

 1. Patch 1 is a cleanup from the previous part. This allows us to simplify
    our bundle list data structure slightly.

 2. Patches 2-3 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 4. Patch 6 teaches Git to parse Git config files into bundle lists.

 5. Patches 7-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists. This also fixes
    a previous bug when running verify_bundle() multiple times in the same
    process, as it did not clear the PREREQ_MARK flag upon leaving (see
    patch 8).

 6. Patches 10-12 suppress unhelpful warnings from user visibility.
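
To illustrate the depth limit mentioned in item 5, here is a rough
sketch. The names (toy_list, walk_list, MAX_LIST_DEPTH) and the limit
value are assumptions made for this example and are not taken from the
patches:

```c
#include <stddef.h>

#define MAX_LIST_DEPTH 4 /* assumed value, chosen only for this sketch */

/* Hypothetical bundle list that may contain further bundle lists. */
struct toy_list {
	struct toy_list **children;
	size_t nr;
};

/*
 * Visit 'list' and everything nested below it. Returns 0 on success
 * and -1 once the nesting exceeds MAX_LIST_DEPTH, which is also what
 * stops a list that (directly or indirectly) refers back to itself.
 */
static int walk_list(const struct toy_list *list, int depth)
{
	size_t i;

	if (depth > MAX_LIST_DEPTH)
		return -1;

	for (i = 0; i < list->nr; i++)
		if (walk_list(list->children[i], depth + 1))
			return -1;

	return 0;
}
```

A fixed bound is cruder than cycle detection, but it needs no
bookkeeping and also caps lists that are merely nested too deeply.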

[1]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a single
bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] https://github.com/derrickstolee/git/pull/21

[5] https://github.com/derrickstolee/git/pull/22


Updates in v5
=============

 * The bug about verify_bundle() not working multiple times in the same
   process is fixed without removing the revision walk. Instead, more flags
   needed to be cleared when the method cleans up after itself.


Updates in v4
=============

 * Properly updated the patch outline.

 * Jonathan Tan asked for more tests, and this revealed some interesting
   behaviors which I have now either fixed or made explicit:
   
   1. In "all" mode, we try to download and apply all bundles. Do not fail
      if a single bundle download fails.
   2. Previously, not all bundles were being applied, and this was noticed
      by the added checks for the refs/bundles/* refs at the end of the
      tests. This revealed the need for removing the reachability walk from
      verify_bundle() since the written refs/bundles/* refs were not being
      picked up by the loose ref cache. Since removing the reachability walk
      seemed like the faster (for users) option, I went that direction.
   3. While running those tests and examining the output carefully, I
      noticed several error messages related to missing prerequisites due to
      attempting unbundling in a random order. This doesn't appear in the
      later creationToken version, so I hadn't noticed it at the tip of my
      local work. These messages are removed with a new quiet mode for
      verify_bundle().


Updates in v3
=============

 * Fixed a comment about a return value of -1.
 * Fixed and tested scenario where early URIs fail in "any" mode and Git
   should try the rest of the list.
 * Instead of using 'success_count' and 'failure_count', use the iterator
   return value to terminate the "all" mode loop early.


Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

 * I've done a rather thorough overhaul to minimize how often later patches
   rewrite portions of earlier patches.

 * We no longer use a strbuf in struct remote_bundle_info. Instead, use a
   'char *' and only in the patch where it is first used.

 * The config documentation now indicates more clearly that the bundle.*
   section has no effect in the repository config (at the moment, which will
   change in the next series).

 * The bundle.version value is now parsed using git_parse_int().

 * The config key is now parsed using parse_config_key().

 * Commit messages clarify more about the context of the change in the
   bigger picture of the bundle URI effort.

 * Some printf()s are correctly changed to fprintf()s.

 * The test helper CLI is unified across the two modes. They both take a
   filename now.

 * The count of downloaded bundles is now only updated after a successful
   download, allowing the "any" mode to keep trying after a failure.

Thanks,

-Stolee

Derrick Stolee (10):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle: properly clear all revision flags
  bundle-uri: fetch a list of bundles
  bundle: add flags to verify_bundle()
  bundle-uri: quiet failed unbundlings
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 builtin/bundle.c                |   5 +-
 bundle-uri.c                    | 458 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 bundle.c                        |  42 +--
 bundle.h                        |  15 +-
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 275 +++++++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 transport.c                     |   2 +-
 17 files changed, 1156 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh


base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1333

Range-diff vs v4:

  1:  48beccb0f5e =  1:  48beccb0f5e bundle-uri: use plain string in find_temp_filename()
  2:  f0c4457951c =  2:  f0c4457951c bundle-uri: create bundle_list struct and helpers
  3:  430e01cd2a4 =  3:  430e01cd2a4 bundle-uri: create base key-value pair parsing
  4:  cd915d57f3b =  4:  cd915d57f3b bundle-uri: create "key=value" line parsing
  5:  4d8cac67f66 =  5:  4d8cac67f66 bundle-uri: unit test "key=value" parsing
  6:  0ecae3a44b3 =  6:  0ecae3a44b3 bundle-uri: parse bundle list in config format
  7:  7e6b32313b0 =  7:  7e6b32313b0 bundle-uri: limit recursion depth for bundle lists
  -:  ----------- >  8:  8dc5a8e4e63 bundle: properly clear all revision flags
  9:  6b9c764c6b3 =  9:  51e9b8474fb bundle-uri: fetch a list of bundles
  8:  83f2cd893a4 ! 10:  fba3a4a117e bundle: add flags to verify_bundle(), skip walk
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    bundle: add flags to verify_bundle(), skip walk
     +    bundle: add flags to verify_bundle()
      
     -    The verify_bundle() method checks if a bundle can be applied to a given
     -    repository. This not only verifies that certain commits exist in the
     -    repository, but Git also checks that these commits are reachable.
     -
     -    This behavior dates back to the original git-bundle builtin written in
     -    2e0afafebd8 (Add git-bundle: move objects and references by archive,
     -    2007-02-22), but the message does not go into detail why the
     -    reachability check is important.
     -
     -    Since verify_bundle() is called from unbundle(), we need to add an
     -    option to pipe the flags through that method.
     -
     -    When unbundling from a list of bundles, Git will create refs that point
     -    to the tips of the latest bundle, which makes this reachability walk
     -    succeed, in theory. However, the loose refs cache does not get
     -    invalidated and hence the reachability walk fails. By disabling the
     -    reachability walk in the bundle URI code, we can get around this
     -    reachability check.
     +    The verify_bundle() method has a 'verbose' option, but we will want to
     +    extend this method to have more granular control over its output. First,
     +    replace this 'verbose' option with a new 'flags' option with a single
     +    possible value: VERIFY_BUNDLE_VERBOSE.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
      +	 * a reachable ref pointing to the new tips, which will reach
      +	 * the prerequisite commits.
      +	 */
     -+	if ((result = unbundle(r, &header, bundle_fd, NULL,
     -+			       VERIFY_BUNDLE_SKIP_REACHABLE)))
     ++	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
       		return 1;
       
       	/*
     @@ bundle.c: static int list_refs(struct string_list *r, int argc, const char **arg
       	/*
       	 * Do fast check, then if any prereqs are missing then go line by line
      @@ bundle.c: int verify_bundle(struct repository *r,
     - 			error("%s", message);
       		error("%s %s", oid_to_hex(oid), name);
       	}
     --	if (revs.pending.nr != p->nr)
     -+	if (revs.pending.nr != p->nr ||
     -+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
     - 		goto cleanup;
     - 	req_nr = revs.pending.nr;
     - 	setup_revisions(2, argv, &revs, NULL);
     -@@ bundle.c: int verify_bundle(struct repository *r,
     - 			clear_commit_marks(commit, ALL_REV_FLAGS);
     - 	}
       
      -	if (verbose) {
      +	if (flags & VERIFY_BUNDLE_VERBOSE) {
     @@ bundle.h: int read_bundle_header_fd(int fd, struct bundle_header *header,
      +
      +enum verify_bundle_flags {
      +	VERIFY_BUNDLE_VERBOSE = (1 << 0),
     -+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
      +};
      +
      +int verify_bundle(struct repository *r, struct bundle_header *header,
 10:  1cae3096624 ! 11:  2e0bfa834f1 bundle-uri: quiet failed unbundlings
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## builtin/bundle.c ##
     +@@ builtin/bundle.c: static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
     + 	}
     + 	close(bundle_fd);
     + 	if (verify_bundle(the_repository, &header,
     +-			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
     ++			  quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) {
     + 		ret = 1;
     + 		goto cleanup;
     + 	}
     +
       ## bundle-uri.c ##
      @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
     + 	 * a reachable ref pointing to the new tips, which will reach
       	 * the prerequisite commits.
       	 */
     - 	if ((result = unbundle(r, &header, bundle_fd, NULL,
     --			       VERIFY_BUNDLE_SKIP_REACHABLE)))
     -+			       VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET)))
     +-	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
     ++	if ((result = unbundle(r, &header, bundle_fd, NULL,
     ++			       VERIFY_BUNDLE_QUIET)))
       		return 1;
       
       	/*
     @@ bundle.h: int create_bundle(struct repository *r, const char *path,
       
       enum verify_bundle_flags {
       	VERIFY_BUNDLE_VERBOSE = (1 << 0),
     --	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1)
     -+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1),
     -+	VERIFY_BUNDLE_QUIET = (1 << 2),
     ++	VERIFY_BUNDLE_QUIET = (1 << 1),
       };
       
       int verify_bundle(struct repository *r, struct bundle_header *header,
 11:  52a575f8a69 = 12:  5729ff2af4b bundle-uri: suppress stderr from remote-https

-- 
gitgitgadget


* [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename()
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                           ` (11 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we never need to modify this
filename later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.
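
The ownership pattern can be sketched outside of Git as follows. The
'toy_buf' type and its helpers are hypothetical stand-ins that only
imitate the role strbuf and strbuf_detach() play in the diff below:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a growable string buffer. */
struct toy_buf {
	char *buf;
	size_t len;
};

/* Append 's'; returns 0 on success, -1 on allocation failure. */
static int toy_buf_addstr(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);
	char *p = realloc(b->buf, b->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + b->len, s, n + 1);
	b->buf = p;
	b->len += n;
	return 0;
}

/* Hand the string to the caller and reset the buffer. */
static char *toy_buf_detach(struct toy_buf *b)
{
	char *ret = b->buf;

	b->buf = NULL;
	b->len = 0;
	return ret;
}

/* Build the name locally; the caller gets a 'char *' and must free() it. */
static char *make_name(const char *dir)
{
	struct toy_buf name = { NULL, 0 };

	if (toy_buf_addstr(&name, dir) || toy_buf_addstr(&name, "/tmp_uri")) {
		free(name.buf);
		return NULL;
	}
	return toy_buf_detach(&name);
}
```

The buffer stays an implementation detail of the helper, so callers
only check for NULL and free() the result when done.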

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..8b2f4e08c9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget



* [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 03/12] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                           ` (10 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare for a user to use a single bundle URI and expect
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. The filename used when Git
downloads the contents to disk is not stored here yet; it will be added
as a plain 'char *' in the patch where it is first used.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.
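
As a rough illustration of how the two modes differ for a consumer,
consider the sketch below. Everything in it (toy_mode, toy_bundle,
fetch_list, the 'available' field) is hypothetical and only models the
intent described above, not the download logic added later in the
series:

```c
#include <stddef.h>

enum toy_mode { TOY_MODE_ALL = 1, TOY_MODE_ANY };

/* Hypothetical bundle entry; 'available' fakes the download outcome. */
struct toy_bundle {
	const char *uri;
	int available; /* 1 if a download of this URI would succeed */
};

/*
 * Returns the number of bundles "downloaded". In ALL mode every
 * available bundle is taken; in ANY mode the first success is enough.
 * A failed entry is skipped in both modes so the rest is still tried.
 */
static size_t fetch_list(const struct toy_bundle *b, size_t nr,
			 enum toy_mode mode)
{
	size_t i, fetched = 0;

	for (i = 0; i < nr; i++) {
		if (!b[i].available)
			continue;
		fetched++;
		if (mode == TOY_MODE_ANY)
			break;
	}
	return fetched;
}
```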

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8b2f4e08c9c..f9a8db221bc 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,66 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static char *find_temp_filename(void)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..ff7e3fd3fb2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,63 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { 0 }
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget



* [PATCH v5 03/12] bundle-uri: create base key-value pair parsing
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 04/12] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                           ` (9 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  is currently expected to be an absolute URI, but this will be relaxed
  to allow relative URIs in the future.

While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.

Make the git_parse_int() method global so we can parse the integer
version value carefully.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 24 +++++++++++
 bundle-uri.c                    | 76 +++++++++++++++++++++++++++++++++
 config.c                        |  2 +-
 config.h                        |  1 +
 5 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/bundle.txt
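
As an illustration of the key layout described above, here is a minimal
standalone sketch of how a key such as "bundle.one.uri" breaks down into
an <id> and a subkey. The patch itself relies on Git's existing
parse_config_key() for this; split_bundle_key() below is a hypothetical
stand-in written only for this note, and it assumes the subkey is
whatever follows the last dot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: split "bundle[.<id>].<subkey>"
 * into its <id> (possibly empty) and <subkey> parts without copying.
 * Returns 0 on success, -1 if the key is outside the "bundle" section
 * or has an empty subkey.
 */
static int split_bundle_key(const char *key,
			    const char **id, size_t *id_len,
			    const char **subkey)
{
	static const char prefix[] = "bundle.";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *last_dot;

	assert(key && id && id_len && subkey);

	if (strncmp(key, prefix, prefix_len))
		return -1;
	rest = key + prefix_len;

	/* The subkey follows the last dot; an <id> may itself contain dots. */
	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* Global key such as "bundle.version" or "bundle.mode". */
		*id = NULL;
		*id_len = 0;
		*subkey = rest;
	} else {
		*id = rest;
		*id_len = (size_t)(last_dot - rest);
		*subkey = last_dot + 1;
	}

	return (*subkey)[0] ? 0 : -1;
}
```

With this sketch, "bundle.one.uri" yields an <id> of length 3 ("one")
and the subkey "uri", while "bundle.version" yields an empty <id>, which
corresponds to the !subsection_len branch in bundle_list_update().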

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..daa21eb674a
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,24 @@
+bundle.*::
+	The `bundle.*` keys may appear in a bundle list file found via the
+	`git clone --bundle-uri` option. These keys currently have no effect
+	if placed in a repository config file, though this will change in the
+	future. See link:technical/bundle-uri.html[the bundle URI design
+	document] for more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index f9a8db221bc..0bc59dd9c34 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns -1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+	const char *subsection, *subkey;
+	size_t subsection_len;
+
+	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
+		return -1;
+
+	if (!subsection_len) {
+		if (!strcmp(subkey, "version")) {
+			int version;
+			if (!git_parse_int(value, &version))
+				return -1;
+			if (version != 1)
+				return -1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(subkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return -1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, subsection, subsection_len);
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(subkey, "uri")) {
+		if (bundle->uri)
+			return -1;
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/config.c b/config.c
index 015bec360f5..e93101249f6 100644
--- a/config.c
+++ b/config.c
@@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
 	return 0;
 }
 
-static int git_parse_int(const char *value, int *ret)
+int git_parse_int(const char *value, int *ret)
 {
 	intmax_t tmp;
 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
diff --git a/config.h b/config.h
index ca994d77147..ef9eade6414 100644
--- a/config.h
+++ b/config.h
@@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *,
 
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
+int git_parse_int(const char *value, int *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
gitgitgadget



* [PATCH v5 04/12] bundle-uri: create "key=value" line parsing
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 03/12] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                           ` (8 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value" representing a
bundle list. Connect the API necessary for Git's transport to the
key-value pair parsing created in the previous change.

We are not currently implementing this protocol v2 functionality, but
are instead preparing to expose this parsing so it can be unit-tested.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 12 ++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)
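
For readers who want to see the splitting rule in isolation, the sketch
below mirrors the checks that bundle_uri_parse_line() performs before
handing off to bundle_list_update(). It is a standalone approximation:
split_key_value() and the split_result enum are hypothetical names made
up for this note, and the real function reports problems via error()
rather than an enum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; the patch itself returns error() values. */
enum split_result {
	SPLIT_OK = 0,
	SPLIT_EMPTY_LINE,
	SPLIT_NO_EQUALS,
	SPLIT_EMPTY_KEY_OR_VALUE
};

/*
 * Hypothetical stand-in: locate the key and value within "key=value"
 * without copying. On success, the key is line[0..*key_len) and
 * *value points just past the first '='.
 */
static enum split_result split_key_value(const char *line,
					 size_t *key_len,
					 const char **value)
{
	const char *equals;

	assert(line && key_len && value);

	if (!*line)
		return SPLIT_EMPTY_LINE;

	/* Split at the first '=' so that values (URIs) may contain '='. */
	equals = strchr(line, '=');
	if (!equals)
		return SPLIT_NO_EQUALS;
	if (equals == line || !equals[1])
		return SPLIT_EMPTY_KEY_OR_VALUE;

	*key_len = (size_t)(equals - line);
	*value = equals + 1;
	return SPLIT_OK;
}
```

Splitting at the first '=' rather than the last matters for URIs that
carry query strings, since everything after the first '=' is treated as
the value.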

diff --git a/bundle-uri.c b/bundle-uri.c
index 0bc59dd9c34..372e6fac5cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -306,3 +305,29 @@ cleanup:
 	free(filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index ff7e3fd3fb2..90583461929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
 #endif
-- 
gitgitgadget



* [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (3 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 04/12] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-12 12:52         ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 06/12] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                           ` (7 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.

We use an input file instead of stdin because a future change will parse
the config-file format, which is easier to handle from a named file.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |   1 +
 bundle-uri.c                |  33 ++++++++++
 bundle-uri.h                |   3 +
 t/helper/test-bundle-uri.c  |  70 +++++++++++++++++++++
 t/helper/test-tool.c        |   1 +
 t/helper/test-tool.h        |   1 +
 t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     |  11 ++++
 8 files changed, 241 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 372e6fac5cf..c02e7f62eb1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 90583461929..0e56ab2ae5a 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..0329c56544f
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,70 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *key_value_usage[] = {
+		"test-tool bundle-uri parse-key-values <input>",
+		NULL
+	};
+	const char **usage = key_value_usage;
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+	FILE *fp;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc != 1)
+		goto usage;
+
+	init_bundle_list(&list);
+	fp = fopen(argv[0], "r");
+	if (!fp)
+		die("failed to open '%s'", argv[0]);
+
+	while (strbuf_getline(&sb, fp) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf))
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+	fclose(fp);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..fd142a66ad5
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.one.uri=https://example.com/bundle-2.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget



* [PATCH v5 06/12] bundle-uri: parse bundle list in config format
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (4 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                           ` (6 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error.  The current
warning isn't particularly helpful if a user encounters it, but it will
be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 27 ++++++++++++++++++++
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  | 49 +++++++++++++++++++++++++++---------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 12 deletions(-)
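
The reason both input formats can share bundle_list_update() is that the
config machinery hands each entry to the callback as a flattened
"section.subsection.name" key. The sketch below shows that flattening in
isolation; flatten_config_key() is a hypothetical helper for this note,
not something the config code exposes, and it ignores the case
normalization that the real parser applies to section and variable
names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the flattened key that a config callback
 * would see for an entry 'name' under [section] or
 * [section "subsection"]. Returns what snprintf() returns.
 */
static int flatten_config_key(char *buf, size_t size,
			      const char *section,
			      const char *subsection,
			      const char *name)
{
	assert(buf && section && name);

	if (subsection && *subsection)
		return snprintf(buf, size, "%s.%s.%s",
				section, subsection, name);
	return snprintf(buf, size, "%s.%s", section, name);
}
```

So a 'uri' entry under [bundle "one"] reaches the callback as
"bundle.one.uri", the same key that a "bundle.one.uri=<value>" packet
line would produce.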

diff --git a/bundle-uri.c b/bundle-uri.c
index c02e7f62eb1..3d44ec2b1e6 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 0e56ab2ae5a..bc13d4c9929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0329c56544f..25afd393428 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,12 +4,21 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
 	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <input>",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	const char **usage = key_value_usage;
 	struct option options[] = {
 		OPT_END(),
@@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv)
 	int err = 0;
 	FILE *fp;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc != 1)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	fp = fopen(argv[0], "r");
-	if (!fp)
-		die("failed to open '%s'", argv[0]);
 
-	while (strbuf_getline(&sb, fp) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf))
-			err = error("bad line: '%s'", sb.buf);
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc != 1)
+			goto usage;
+		fp = fopen(argv[0], "r");
+		if (!fp)
+			die("failed to open '%s'", argv[0]);
+		while (strbuf_getline(&sb, fp) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf))
+				err = error("bad line: '%s'", sb.buf);
+		}
+		fclose(fp);
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
-	fclose(fp);
 
 	print_bundle_list(stdout, &list);
 
@@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index fd142a66ad5..c2fe3f9c5a5 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	error: bad config line 1 in file in2
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (5 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 06/12] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
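The depth guard described above can be sketched in isolation. This is
only an illustration under stated assumptions: 'struct node',
follow() and count_bundles() are hypothetical names that stand in for
"content found at a URI", and unlike the series this sketch aborts
the whole walk as soon as the limit is hit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4 /* mirrors max_bundle_uri_depth in the patch */

/* Hypothetical in-memory stand-in for the content behind a URI. */
struct node {
	int is_list;                /* 0: a bundle, 1: a bundle list */
	const struct node *next[2]; /* entries of a list; NULL if unused */
};

/*
 * Return the number of bundles reachable from 'n', or -1 when the
 * recursion limit is exceeded (for example because of a cycle).
 */
static int follow(const struct node *n, int depth)
{
	int total = 0;
	size_t i;

	assert(n);
	if (depth >= MAX_DEPTH)
		return -1;
	if (!n->is_list)
		return 1;

	for (i = 0; i < 2; i++) {
		int res;

		if (!n->next[i])
			continue;
		res = follow(n->next[i], depth + 1);
		if (res < 0)
			return -1;
		total += res;
	}
	return total;
}

/* Public entry point: always starts at depth zero. */
int count_bundles(const struct node *root)
{
	return follow(root, 0);
}
```

A list that names itself is rejected after four levels instead of
recursing forever, while a list of two plain bundles is counted
normally.
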
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 3d44ec2b1e6..8a7c11c6393 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -366,6 +380,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget



* [PATCH v5 08/12] bundle: properly clear all revision flags
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (6 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 16:17           ` Junio C Hamano
  2022-10-12 12:52         ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                           ` (4 subsequent siblings)
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The verify_bundle() method checks two things for a bundle's
prerequisites:

 1. Are these objects in the object store?
 2. Are these objects reachable from our references?

For the second question, multiple calls to verify_bundle() in the same
process can report an invalid bundle even though it is correct. The
reason is that not all of the commit marks are cleared on the commits
previously walked.

The revision walk machinery was first introduced in-process by
fb9a54150d3 (git-bundle: avoid fork() in verify_bundle(), 2007-02-22).
This implementation used "-1" as the set of flags to clear. The next
meaningful change came in 2b064697a5b (revision traversal: retire
BOUNDARY_SHOW, 2007-03-05), which introduced the PREREQ_MARK flag
instead of a flag normally controlled by the revision-walk machinery.

In 86a0a408b90 (commit: factor out
clear_commit_marks_for_object_array, 2011-10-01), the loop over the
array of commits was replaced with a new
clear_commit_marks_for_object_array(), but simultaneously the "-1" value
was replaced with "ALL_REV_FLAGS", which stopped un-setting the
PREREQ_MARK flag. This means that if multiple commits were marked by the
PREREQ_MARK in a previous run of verify_bundle(), then this loop could
terminate early due to 'i' going to zero:

	while (i && (commit = get_revision(&revs)))
		if (commit->object.flags & PREREQ_MARK)
			i--;

The flag clearing work was changed again in 63647391e6c (bundle: avoid
using the rev_info flag leak_pending, 2017-12-25), but that was only
cosmetic and did not change the behavior.

It may seem that it would be sufficient to add the PREREQ_MARK flag to
the clear_commit_marks() call in its current location. However, we
actually need to do it in the "cleanup:" step, since the first loop
checking "Are these objects in the object store?" might add the
PREREQ_MARK flag to some objects and then terminate without performing a
walk due to one missing object. By clearing the flags in all cases, we
avoid this issue when running verify_bundle() multiple times in the same
process.

Moving this loop to the cleanup step alone would cause a segfault when
running 'git bundle verify' outside of a repository, but only because
that error condition uses "goto cleanup" where returning directly is
perfectly safe. Nothing has been initialized at that point, so we can
return immediately without causing any leaks.

This behavior is verified carefully by a test that will be added soon
when Git learns to download bundle lists in a 'git clone --bundle-uri'
command.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
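The stale-mark failure described above can be reproduced with a toy
model. Everything here is an assumption made for illustration: the
bit values, 'struct commit' and verify() are hypothetical, the
"history" is a flat array walked tip first, and all prerequisites are
assumed to be part of that history.

```c
#include <assert.h>

/* Hypothetical bit layout; the real values live in Git's headers. */
#define SHOWN       0x01u
#define WALK_FLAGS  0x0fu /* stand-in for ALL_REV_FLAGS */
#define PREREQ_MARK 0x10u /* a bit outside the walk mask */

struct commit {
	unsigned flags;
};

/*
 * Mark the prerequisites, walk the history until as many marked
 * commits were seen as there are prerequisites, then count the
 * prerequisites that were never shown. Finally clear 'clear_mask'
 * from every commit. Returns the number of "missing" prerequisites.
 */
int verify(struct commit **history, int nr_history,
	   struct commit **prereqs, int nr_prereqs,
	   unsigned clear_mask)
{
	int i, k, missing = 0;

	assert(history && prereqs);
	for (i = 0; i < nr_prereqs; i++)
		prereqs[i]->flags |= PREREQ_MARK;

	i = nr_prereqs;
	for (k = 0; i && k < nr_history; k++) {
		history[k]->flags |= SHOWN;
		if (history[k]->flags & PREREQ_MARK)
			i--; /* a stale mark is counted here as well */
	}

	for (i = 0; i < nr_prereqs; i++)
		if (!(prereqs[i]->flags & SHOWN))
			missing++;

	for (k = 0; k < nr_history; k++)
		history[k]->flags &= ~clear_mask;
	return missing;
}
```

With clear_mask set to WALK_FLAGS, a second call whose prerequisites
sit deeper in the history stops at the stale marks left by the first
call and reports those prerequisites as missing. Passing
WALK_FLAGS | PREREQ_MARK makes both calls succeed.
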
 bundle.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/bundle.c b/bundle.c
index 0208e6d90d3..c277f3b9360 100644
--- a/bundle.c
+++ b/bundle.c
@@ -202,10 +202,8 @@ int verify_bundle(struct repository *r,
 	int i, ret = 0, req_nr;
 	const char *message = _("Repository lacks these prerequisite commits:");
 
-	if (!r || !r->objects || !r->objects->odb) {
-		ret = error(_("need a repository to verify a bundle"));
-		goto cleanup;
-	}
+	if (!r || !r->objects || !r->objects->odb)
+		return error(_("need a repository to verify a bundle"));
 
 	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
@@ -250,15 +248,6 @@ int verify_bundle(struct repository *r,
 		error("%s %s", oid_to_hex(oid), name);
 	}
 
-	/* Clean up objects used, as they will be reused. */
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		struct object_id *oid = e->util;
-		commit = lookup_commit_reference_gently(r, oid, 1);
-		if (commit)
-			clear_commit_marks(commit, ALL_REV_FLAGS);
-	}
-
 	if (verbose) {
 		struct string_list *r;
 
@@ -287,6 +276,14 @@ int verify_bundle(struct repository *r,
 				  list_objects_filter_spec(&header->filter));
 	}
 cleanup:
+	/* Clean up objects used, as they will be reused. */
+	for (i = 0; i < p->nr; i++) {
+		struct string_list_item *e = p->items + i;
+		struct object_id *oid = e->util;
+		commit = lookup_commit_reference_gently(r, oid, 1);
+		if (commit)
+			clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
+	}
 	release_revisions(&revs);
 	return ret;
 }
-- 
gitgitgadget



* [PATCH v5 09/12] bundle-uri: fetch a list of bundles
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (7 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-26 19:06           ` Junio C Hamano
  2022-10-12 12:52         ` [PATCH v5 10/12] bundle: add flags to verify_bundle() Derrick Stolee via GitGitGadget
                           ` (3 subsequent siblings)
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs is downloaded
and inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
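How the list mode steers the download loop can be sketched without
any networking. This is a simplified model, not the code in the
patch: 'enum list_mode' and download_list() are hypothetical names,
and each download is reduced to a precomputed success flag.

```c
#include <assert.h>

enum list_mode {
	MODE_ALL, /* every listed bundle is wanted */
	MODE_ANY, /* one successful download is enough */
};

/*
 * 'ok[i]' is nonzero when the (pretend) download of entry i
 * succeeds. Returns the number of entries that were downloaded.
 */
int download_list(const int *ok, int nr, enum list_mode mode)
{
	int i, count = 0;

	assert(ok || !nr);
	for (i = 0; i < nr; i++) {
		/* In ANY mode, skip the rest once one download worked. */
		if (mode == MODE_ANY && count)
			continue;
		/* A failure is not fatal: keep going and use what works. */
		if (ok[i])
			count++;
	}
	return count;
}
```

A failed entry never aborts the loop in either mode. In ANY mode a
failure simply means the next entry gets a chance, which matches the
tests below where nonexistent bundles are expected to be skipped.
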
 bundle-uri.c                | 203 ++++++++++++++++++++++++++---
 bundle-uri.h                |  13 ++
 t/t5558-clone-bundle-uri.sh | 248 ++++++++++++++++++++++++++++++++++++
 3 files changed, 448 insertions(+), 16 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8a7c11c6393..70bfd2defee 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 {
 	FREE_AND_NULL(bundle->id);
 	FREE_AND_NULL(bundle->uri);
+	FREE_AND_NULL(bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -334,18 +336,117 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	int res;
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+
+	/*
+	 * Only increment count if the download succeeded. If our mode is
+	 * BUNDLE_MODE_ANY, then we will want to try other URIs in the
+	 * list in case they work instead.
+	 */
+	if (!res)
+		ctx->count++;
+
+	/*
+	 * To be as opportunistic as possible, we continue iterating and
+	 * download as many bundles as we can, so we can apply the ones
+	 * that work, even in BUNDLE_MODE_ALL mode.
+	 */
+	return 0;
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = bundle_uri_parse_config_format(bundle->uri,
+						     bundle->file,
+						     &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	char *filename;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -353,36 +454,106 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if (!(filename = find_temp_filename())) {
+	if (!bundle->file &&
+	    !(bundle->file = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
 	}
 
-	if ((result = copy_uri_to_file(filename, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	bcopy->file = xstrdup(bundle->file);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	if (filename)
-		unlink(filename);
-	free(filename);
+	if (result && bundle->file)
+		unlink(bundle->file);
 	return result;
 }
 
+/**
+ * This loop iterator breaks the loop with nonzero return code on the
+ * first successful unbundling of a bundle.
+ */
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct repository *r = data;
+
+	if (!info->file || info->unbundled)
+		return 0;
+
+	if (!unbundle_from_file(r, info->file)) {
+		info->unbundled = 1;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 *
+	 * Keep in mind that a non-zero result for the loop here means
+	 * the loop terminated early on a successful unbundling, which
+	 * signals that we can try again.
+	 */
+	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file)
+		unlink_or_warn(info->file);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup(""),
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index bc13d4c9929..4dbc269823c 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -28,6 +28,19 @@ struct remote_bundle_info {
 	 * if there was no table of contents.
 	 */
 	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * NULL.
+	 */
+	char *file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..a86dc04f528 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,195 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-file cat-file --batch-check <oids &&
+
+	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, all mode, some failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	# No bundle-3 means bundle-4 will not apply.
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	GIT_TRACE2_PERF=1 \
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-all-some cat-file --batch-check <oids &&
+
+	git -C clone-all-some for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, all mode, all failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-all-fail cat-file --batch-check <oids &&
+
+	git -C clone-all-fail for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
+test_expect_success 'clone bundle list (file, any mode)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = file://$(pwd)/clone-from/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = file://$(pwd)/clone-from/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-file cat-file --batch-check <oids &&
+
+	git -C clone-any-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'clone bundle list (file, any mode, all failures)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-fail cat-file --batch-check <oids &&
+
+	git -C clone-any-fail for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +264,65 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http cat-file --batch-check <oids
+'
+
+test_expect_success 'clone bundle list (HTTP, any mode)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = any
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-0"]
+		uri = $HTTPD_URL/bundle-0.bundle
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	# Does not exist. Should be skipped.
+	[bundle "bundle-5"]
+		uri = $HTTPD_URL/bundle-5.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-any-http cat-file --batch-check <oids &&
+
+	git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget



* [PATCH v5 10/12] bundle: add flags to verify_bundle()
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (8 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52         ` [PATCH v5 11/12] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The verify_bundle() method has a 'verbose' option, but we will want to
extend this method to have more granular control over its output. First,
replace this 'verbose' option with a new 'flags' option with a single
possible value: VERIFY_BUNDLE_VERBOSE.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
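The move from a boolean parameter to a flags enum can be illustrated
on its own. The names below are hypothetical stand-ins (the patch
calls its flag VERIFY_BUNDLE_VERBOSE), and detail_lines() makes no
claim about what the real verify_bundle() prints.

```c
#include <assert.h>

/* Hypothetical names modeled on the enum added by the patch. */
enum verify_flags {
	VERIFY_VERBOSE = (1 << 0),
};

/* Translate a command-line style 'quiet' boolean into flags. */
enum verify_flags flags_from_quiet(int quiet)
{
	return quiet ? 0 : VERIFY_VERBOSE;
}

/*
 * Stand-in for a verifier: returns how many detail lines it would
 * print for 'nr_refs' references. Only the flag test matters here.
 */
int detail_lines(int nr_refs, enum verify_flags flags)
{
	assert(nr_refs >= 0);
	if (flags & VERIFY_VERBOSE)
		return nr_refs;
	return 0;
}
```

Callers that previously passed 0 for 'verbose' keep passing 0, and
further bits can be added later without touching the signature again.
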
 builtin/bundle.c |  5 +++--
 bundle-uri.c     |  7 ++++++-
 bundle.c         |  9 +++++----
 bundle.h         | 14 ++++++++++++--
 transport.c      |  2 +-
 5 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 2adad545a2e..7d983a238f0 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -119,7 +119,8 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 		goto cleanup;
 	}
 	close(bundle_fd);
-	if (verify_bundle(the_repository, &header, !quiet)) {
+	if (verify_bundle(the_repository, &header,
+			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
 		ret = 1;
 		goto cleanup;
 	}
@@ -185,7 +186,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args) ||
+			 &extra_index_pack_args, 0) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:
diff --git a/bundle-uri.c b/bundle-uri.c
index 70bfd2defee..d9060be707e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -303,7 +303,12 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
 		return 1;
 
-	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+	/*
+	 * Skip the reachability walk here, since we will be adding
+	 * a reachable ref pointing to the new tips, which will reach
+	 * the prerequisite commits.
+	 */
+	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index c277f3b9360..1f6a7f782e1 100644
--- a/bundle.c
+++ b/bundle.c
@@ -189,7 +189,7 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
-		  int verbose)
+		  enum verify_bundle_flags flags)
 {
 	/*
 	 * Do fast check, then if any prereqs are missing then go line by line
@@ -248,7 +248,7 @@ int verify_bundle(struct repository *r,
 		error("%s %s", oid_to_hex(oid), name);
 	}
 
-	if (verbose) {
+	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;
 
 		r = &header->references;
@@ -617,7 +617,8 @@ err:
 }
 
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args)
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
@@ -631,7 +632,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
 		strvec_clear(extra_index_pack_args);
 	}
 
-	if (verify_bundle(r, header, 0))
+	if (verify_bundle(r, header, flags))
 		return -1;
 	ip.in = bundle_fd;
 	ip.no_stdout = 1;
diff --git a/bundle.h b/bundle.h
index 0c052f54964..6652e819981 100644
--- a/bundle.h
+++ b/bundle.h
@@ -29,7 +29,13 @@ int read_bundle_header_fd(int fd, struct bundle_header *header,
 int create_bundle(struct repository *r, const char *path,
 		  int argc, const char **argv, struct strvec *pack_options,
 		  int version);
-int verify_bundle(struct repository *r, struct bundle_header *header, int verbose);
+
+enum verify_bundle_flags {
+	VERIFY_BUNDLE_VERBOSE = (1 << 0),
+};
+
+int verify_bundle(struct repository *r, struct bundle_header *header,
+		  enum verify_bundle_flags flags);
 
 /**
  * Unbundle after reading the header with read_bundle_header().
@@ -40,9 +46,13 @@ int verify_bundle(struct repository *r, struct bundle_header *header, int verbos
  * Provide "extra_index_pack_args" to pass any extra arguments
  * (e.g. "-v" for verbose/progress), NULL otherwise. The provided
  * "extra_index_pack_args" (if any) will be strvec_clear()'d for you.
+ *
+ * Before unbundling, this method will call verify_bundle() with the
+ * given 'flags'.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args);
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags);
 int list_bundle_refs(struct bundle_header *header,
 		int argc, const char **argv);
 
diff --git a/transport.c b/transport.c
index 52db7a3cb09..c5d3042731a 100644
--- a/transport.c
+++ b/transport.c
@@ -178,7 +178,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args);
+		       &extra_index_pack_args, 0);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
gitgitgadget



* [PATCH v5 11/12] bundle-uri: quiet failed unbundlings
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (9 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 10/12] bundle: add flags to verify_bundle() Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-12 16:32           ` Junio C Hamano
  2022-10-12 12:52         ` [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
  2022-10-26 14:34         ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. This is passed as
error messages to the user, even when they eventually succeed in later
attempts after their dependent bundles are unbundled.

Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisite commits, allowing callers of
unbundle() to notice that the bundle failed to apply.

Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/bundle.c            |  2 +-
 bundle-uri.c                |  3 ++-
 bundle.c                    | 10 ++++++++--
 bundle.h                    |  1 +
 t/t5558-clone-bundle-uri.sh | 25 ++++++++++++++++++++-----
 5 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 7d983a238f0..fd4586b09e0 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -120,7 +120,7 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 	}
 	close(bundle_fd);
 	if (verify_bundle(the_repository, &header,
-			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
+			  quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) {
 		ret = 1;
 		goto cleanup;
 	}
diff --git a/bundle-uri.c b/bundle-uri.c
index d9060be707e..d872acf5ab0 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -308,7 +308,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * a reachable ref pointing to the new tips, which will reach
 	 * the prerequisite commits.
 	 */
-	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
+	if ((result = unbundle(r, &header, bundle_fd, NULL,
+			       VERIFY_BUNDLE_QUIET)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 1f6a7f782e1..4ef7256aa11 100644
--- a/bundle.c
+++ b/bundle.c
@@ -216,7 +216,10 @@ int verify_bundle(struct repository *r,
 			add_pending_object(&revs, o, name);
 			continue;
 		}
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
@@ -243,7 +246,10 @@ int verify_bundle(struct repository *r,
 		assert(o); /* otherwise we'd have returned early */
 		if (o->flags & SHOWN)
 			continue;
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
diff --git a/bundle.h b/bundle.h
index 6652e819981..575c34245d1 100644
--- a/bundle.h
+++ b/bundle.h
@@ -32,6 +32,7 @@ int create_bundle(struct repository *r, const char *path,
 
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
+	VERIFY_BUNDLE_QUIET = (1 << 1),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a86dc04f528..9b159078386 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -99,7 +99,10 @@ test_expect_success 'clone bundle list (file, no heuristic)' '
 		uri = file://$(pwd)/clone-from/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-list-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-file cat-file --batch-check <oids &&
 
@@ -141,7 +144,10 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	EOF
 
 	GIT_TRACE2_PERF=1 \
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-some 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
 
@@ -169,7 +175,10 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-fail 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
 
@@ -195,7 +204,10 @@ test_expect_success 'clone bundle list (file, any mode)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-file cat-file --batch-check <oids &&
 
@@ -284,7 +296,10 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-list-http  2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-http cat-file --batch-check <oids
 '
-- 
gitgitgadget



* [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (10 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 11/12] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52         ` Derrick Stolee via GitGitGadget
  2022-10-26 18:54           ` Junio C Hamano
  2022-10-26 14:34         ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  1 +
 t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index d872acf5ab0..79a914f961b 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	int found_get = 0;
 
 	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.err = -1;
 	cp.in = -1;
 	cp.out = -1;
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9b159078386..9155f31fa2c 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -147,6 +147,8 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-some 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
@@ -178,6 +180,8 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-fail 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
@@ -234,7 +238,11 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-fail 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-fail cat-file --batch-check <oids &&
 
@@ -323,7 +331,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-any-http 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-http cat-file --batch-check <oids &&
 
-- 
gitgitgadget


* Re: [PATCH v5 08/12] bundle: properly clear all revision flags
  2022-10-12 12:52         ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
@ 2022-10-12 16:17           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-12 16:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The verify_bundle() method checks two things for a bundle's
> prerequisites:
>
>  1. Are these objects in the object store?
>  2. Are these objects reachable from our references?
>
> In this second question, multiple uses of verify_bundle() in the same
> process can report an invalid bundle even though it is correct. The
> reason is due to not clearing all of the commit marks on the commits
> previously walked.
> ...
> Moving this loop to the cleanup step alone would cause a segfault when
> running 'git bundle verify' outside of a repository, but this is because
> of that error condition using "goto cleanup" when returning is perfectly
> safe. Nothing has been initialized at that point, so we can return
> immediately without causing any leaks.

Nicely analyzed.  The implementation clearly follows the design
described above.  Much better than the previous iteration.

Thanks.


* Re: [PATCH v5 11/12] bundle-uri: quiet failed unbundlings
  2022-10-12 12:52         ` [PATCH v5 11/12] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
@ 2022-10-12 16:32           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-12 16:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When downloading a list of bundles in "all" mode, Git has no
> understanding of the dependencies between the bundles. Git attempts to
> unbundle the bundles in some order, but some may not pass the
> verify_bundle() step because of missing prerequisites. This is passed as
> error messages to the user, even when they eventually succeed in later
> attempts after their dependent bundles are unbundled.
>
> Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
> error messages from the missing prerequisite commits. The method still
> returns the number of missing prerequisite commits, allowing callers of
> unbundle() to notice that the bundle failed to apply.
>
> Use this flag in bundle-uri.c and test that the messages go away for
> 'git clone --bundle-uri' commands.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

Interesting that we ended up with <quiet, normal, verbose> verbosity
levels, but "bundle verify --verbose" does not (have to) exist, as
that is the default, and 0 (aka "normal") is no longer used to call
verify_bundle() by anybody.
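
As a rough illustration of the three levels, here is a minimal
standalone sketch of a flag-controlled check. It is not the code in
bundle.c: the sketch_ names, the 'present' array standing in for
prerequisite lookups, and the message text are all made up for the
example. The point it shows is that the quiet bit only suppresses
output; the returned count is unchanged.

```c
#include <stdio.h>

/* Hypothetical mirror of the flag layout in the patch; not Git's header. */
enum sketch_verify_flags {
	SKETCH_VERIFY_VERBOSE = (1 << 0),
	SKETCH_VERIFY_QUIET = (1 << 1),
};

/*
 * Count the entries of 'present' that are zero (stand-ins for missing
 * prerequisite commits). Messages go to 'err' unless the quiet bit is
 * set; the returned count is the same either way.
 */
static int sketch_count_missing(const int *present, const char **names,
				int nr, unsigned flags, FILE *err)
{
	int i, ret = 0;

	for (i = 0; i < nr; i++) {
		if (present[i])
			continue;
		ret++;
		if (flags & SKETCH_VERIFY_QUIET)
			continue;
		if (ret == 1)
			fprintf(err, "error: missing prerequisites:\n");
		fprintf(err, "error: %s\n", names[i]);
	}
	return ret;
}
```

Passing 0 corresponds to the "normal" level here: errors are printed,
but nothing extra.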

I actually was wondering whether, with SKIP_REACHABLE gone, we would
lose the "enum verify_bundle_flags" altogether, without the need for
a new "quiet" option.  But that would not work as unbundle() calls
verify_bundle() and callers of unbundle() do not necessarily want
the verification step to squelch errors.
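
To make the "all" mode case from the commit message concrete, here is
a small sketch of retrying bundles until a pass makes no progress. It
is only one way the "later attempts" could be organized and is not
taken from bundle-uri.c; struct sketch_bundle, its single 'prereq'
index, and both function names are assumptions for the example.

```c
#include <stddef.h>

/* Hypothetical model: each bundle depends on at most one other bundle. */
struct sketch_bundle {
	int prereq;	/* index of the bundle this one needs, or -1 */
	int applied;	/* set once the bundle has been unbundled */
};

/* Returns 0 on success, 1 if the prerequisite is not applied yet. */
static int sketch_unbundle(struct sketch_bundle *list, size_t i)
{
	int p = list[i].prereq;

	if (p >= 0 && !list[p].applied)
		return 1; /* stands in for a quiet verification failure */
	list[i].applied = 1;
	return 0;
}

/* Retry until a full pass applies nothing new; return the number applied. */
static size_t sketch_unbundle_all(struct sketch_bundle *list, size_t nr)
{
	size_t applied = 0;
	int progress = 1;

	while (progress) {
		size_t i;

		progress = 0;
		for (i = 0; i < nr; i++) {
			if (list[i].applied)
				continue;
			if (!sketch_unbundle(list, i)) {
				applied++;
				progress = 1;
			}
		}
	}
	return applied;
}
```

Failures in the early passes are expected in such a loop, which is why
the caller would want them quiet.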

So looks good overall.

Thanks.


* Re: [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists
  2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                           ` (11 preceding siblings ...)
  2022-10-12 12:52         ` [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
@ 2022-10-26 14:34         ` Derrick Stolee
  2022-10-26 16:06           ` Junio C Hamano
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-10-26 14:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 10/12/2022 8:52 AM, Derrick Stolee via GitGitGadget wrote:
> This is the third series building the bundle URI feature. It is built on top
> of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where <uri> is
> a URI to a bundle file. This series adds the capability of downloading and
> parsing a bundle list and then downloading the URIs in that list.
> 
> The core functionality of bundle lists is implemented by creating data
> structures from a list of key-value pairs. These pairs can come from a
> plain-text file in Git config format, but in the future, we will support the
> list being supplied by packet lines over Git's protocol v2 in the
> 'bundle-uri' command (reserved for the next series).

This version has been available for a while now without comment. Could
we consider it for merging to 'next' soon?

I want to wait for this series to merge into 'master' before sending
part IV on top, which advertises bundle URIs over protocol v2.

Thanks,
-Stolee


* Re: [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists
  2022-10-26 14:34         ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
@ 2022-10-26 16:06           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-26 16:06 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> This version has been available for a while now without comment. Could
> we consider it for merging to 'next' soon?

Could somebody who has reviewed it fully give an Ack (or two)?  I
know earlier rounds had some comments, but after v3 things have
quieted down.

I know the change from v4 to v5 has good improvements, but do not
claim to have read the other parts in detail.

Thanks.


* Re: [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https
  2022-10-12 12:52         ` [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
@ 2022-10-26 18:54           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-26 18:54 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When downloading bundles from a git-remote-https subprocess, the bundle
> URI logic wants to be opportunistic and download as much as possible and
> work with what did succeed. This is particularly important in the "any"
> mode, where any single bundle success will work.
>
> If the URI is not available, the git-remote-https process will die()
> with a "fatal:" error message, even though that error is not actually
> fatal to the super process. Since stderr is passed through, it looks
> like a fatal error to the user.
>
> Suppress stderr to avoid these errors from bubbling to the surface. The
> bundle URI API adds its own warning() messages on these failures.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                |  1 +
>  t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)

So this is the same in spirit as [11/12] to squelch errors from an
action that we are prepared to fail.  If we had an easy way to
squelch only one class of errors (e.g. the resource no longer exists
at the URI) while allowing others to pass (e.g. we downloaded but it
was corrupt), that might be even better when somebody is debugging
the thing, but to an end user, it is a hopefully ignorable failure
either way, as long as there are other alternatives in the set of
bundles in the "any" mode.
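
For readers unfamiliar with the pattern, here is a minimal sketch of
running a helper with its stderr hidden while the parent reports the
failure in its own words. It uses plain POSIX calls rather than Git's
run-command API, and it redirects to /dev/null, which may differ from
what 'cp.err = -1' does internally. The sketch_ function names are
assumptions; the warning text follows the string the tests grep for.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its stderr sent to /dev/null. Returns the child's
 * exit status, or -1 if the child could not be started or did not
 * exit normally.
 */
static int sketch_run_quietly(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid) {
		int devnull = open("/dev/null", O_WRONLY);

		if (devnull < 0 || dup2(devnull, STDERR_FILENO) < 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Try a download helper; on failure, warn in the parent's own voice. */
static int sketch_try_download(char *const argv[], const char *uri)
{
	int ret = sketch_run_quietly(argv);

	if (ret)
		fprintf(stderr,
			"warning: failed to download bundle from URI '%s'\n",
			uri);
	return ret;
}
```

Because every child message is dropped, this sketch has the same
limitation noted above: it cannot let one class of errors through.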


* Re: [PATCH v5 09/12] bundle-uri: fetch a list of bundles
  2022-10-12 12:52         ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-26 19:06           ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-26 19:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int fetch_bundle_list_in_config_format(struct repository *r,
> +					      struct bundle_list *global_list,
> +					      struct remote_bundle_info *bundle,
> +					      int depth)
> +{
> +	int result;
> +	struct bundle_list list_from_bundle;
> +
> +	init_bundle_list(&list_from_bundle);
> +
> +	if ((result = bundle_uri_parse_config_format(bundle->uri,
> +						     bundle->file,
> +						     &list_from_bundle)))
> +		goto cleanup;

It makes us a bit nervous to apply the config parser directly on
data controlled by a third-party.  bundle_uri_parse_config_format()
hopefully is careful enough to avoid including other local files and
calling generic callbacks that affect the actual configuration used by
the process.

It seems bundle_list_update() discards everything it does not (care
to) understand, and is safe to call from config_to_bundle_list(), which
in turn is called from here.
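
The "discard what is not understood" behavior can be sketched as
follows. This is not bundle_list_update() itself: struct sketch_list,
the function name, and the exact set of accepted keys ("bundle.mode"
and "bundle.<id>.uri") are assumptions made for the example. The idea
is that a key such as "include.path" never reaches any code that
could act on it.

```c
#include <string.h>

enum sketch_mode { SKETCH_MODE_NONE, SKETCH_MODE_ALL, SKETCH_MODE_ANY };

/* Hypothetical, much-reduced stand-in for a bundle list. */
struct sketch_list {
	enum sketch_mode mode;
	int nr_uris;
};

/* Returns 0 if the pair was understood and applied, -1 if ignored. */
static int sketch_list_update(struct sketch_list *list,
			      const char *key, const char *value)
{
	const char *sub;
	size_t len;

	if (strncmp(key, "bundle.", 7))
		return -1; /* not in our namespace: ignore */
	sub = key + 7;

	if (!strcmp(sub, "mode")) {
		if (!strcmp(value, "all"))
			list->mode = SKETCH_MODE_ALL;
		else if (!strcmp(value, "any"))
			list->mode = SKETCH_MODE_ANY;
		else
			return -1; /* unknown mode: ignore */
		return 0;
	}

	/* Accept "<id>.uri" with a non-empty <id>. */
	len = strlen(sub);
	if (len > 4 && !strcmp(sub + len - 4, ".uri")) {
		list->nr_uris++;
		return 0;
	}
	return -1;
}
```

A caller that feeds parsed pairs into such a function and ignores the
-1 results only ever changes the list, not the process configuration.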

OK.


end of thread, other threads:[~2022-10-26 19:09 UTC | newest]

Thread overview: 94+ messages
2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-08-22 17:57   ` Junio C Hamano
2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-08-22 18:20   ` Junio C Hamano
2022-08-23 16:29     ` Derrick Stolee
2022-08-31 22:10       ` Jonathan Tan
2022-08-31 22:02   ` Glen Choo
2022-09-01  2:38   ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Teng Long
2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-08-22 19:17   ` Junio C Hamano
2022-08-23 16:31     ` Derrick Stolee
2022-09-02 23:41   ` Josh Steadmon
2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-09-01  2:56   ` Teng Long
2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-08-22 19:25   ` Junio C Hamano
2022-08-23 16:43     ` Derrick Stolee
2022-08-31 22:18     ` Jonathan Tan
2022-09-01  8:05   ` Teng Long
2022-08-22 15:12 ` [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-09-02 23:51   ` Josh Steadmon
2022-09-05 12:50   ` Teng Long
2022-09-08 17:10     ` Derrick Stolee
2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
2022-09-09 17:24     ` Junio C Hamano
2022-09-19 17:55       ` Derrick Stolee
2022-09-09 14:33   ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
2022-09-09 17:56     ` Junio C Hamano
2022-09-19 17:54       ` Derrick Stolee
2022-09-19 18:16         ` Junio C Hamano
2022-09-09 14:33   ` [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-09-29 21:49     ` Jonathan Tan
2022-09-09 14:33   ` [PATCH v2 5/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 7/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-09-09 14:33   ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-09-29 21:58     ` Jonathan Tan
2022-09-30 12:49       ` Derrick Stolee
2022-09-26 13:19   ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
2022-09-26 19:10     ` Junio C Hamano
2022-09-29 22:00       ` Jonathan Tan
2022-09-30 13:21         ` Derrick Stolee
2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 3/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 4/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 6/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-10-04 12:34     ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-10-04 21:44       ` Jonathan Tan
2022-10-07 13:29         ` Derrick Stolee
2022-10-04 12:34     ` [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
2022-10-10 16:04     ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 03/11] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 04/11] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 06/11] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget
2022-10-10 17:27         ` Junio C Hamano
2022-10-10 18:13           ` Derrick Stolee
2022-10-10 18:40             ` Junio C Hamano
2022-10-11 19:04               ` Derrick Stolee
2022-10-10 16:04       ` [PATCH v4 09/11] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 10/11] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
2022-10-10 16:04       ` [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
2022-10-12 12:52       ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 03/12] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 04/12] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 06/12] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
2022-10-12 16:17           ` Junio C Hamano
2022-10-12 12:52         ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-10-26 19:06           ` Junio C Hamano
2022-10-12 12:52         ` [PATCH v5 10/12] bundle: add flags to verify_bundle() Derrick Stolee via GitGitGadget
2022-10-12 12:52         ` [PATCH v5 11/12] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget
2022-10-12 16:32           ` Junio C Hamano
2022-10-12 12:52         ` [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget
2022-10-26 18:54           ` Junio C Hamano
2022-10-26 14:34         ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee
2022-10-26 16:06           ` Junio C Hamano
