* [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists
@ 2022-08-22 15:12 Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC
  To: git; +Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

This is the third series building the bundle URI feature. It is built on top
of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
<uri> is a URI to a bundle file. This series adds the capability of
downloading and parsing a bundle list and then downloading the URIs in that
list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support the
list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way:

 1. Patches 1-2 create the bundle list data structures and the logic for
    populating the list from key-value pairs.

 2. Patches 3-4 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists correctly.
    These patches are adapted from Ævar's RFC [1] and were previously seen
    in my combined RFC [2].

 3. Patch 5 teaches Git to parse Git config files into bundle lists.

 4. Patches 6-7 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues with
    cycles or otherwise incorrectly configured bundle lists.

[1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/
[2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using
'git clone --bundle-uri=<uri>' where <uri> points to a bundle list instead
of a single bundle file.

As outlined in the design document [3], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [4]

 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [5]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com
[4] https://github.com/derrickstolee/git/pull/21
[5] https://github.com/derrickstolee/git/pull/22

Thanks,

 * Stolee

Derrick Stolee (5):
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: fetch a list of bundles

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  22 ++
 Makefile                        |   1 +
 bundle-uri.c                    | 442 +++++++++++++++++++++++++++++++-
 bundle-uri.h                    |  98 ++++++-
 t/helper/test-bundle-uri.c      |  90 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     |  93 +++++++
 t/t5750-bundle-uri-parse.sh     | 141 ++++++++++
 t/test-lib-functions.sh         |  11 +
 11 files changed, 889 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1333
-- 
gitgitgadget

* [PATCH 1/7] bundle-uri: create bundle_list struct and helpers
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare where a user uses a single bundle URI and expects
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..ceeef0b6641 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,67 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	free(bundle->id);
+	free(bundle->uri);
+	strbuf_release(&bundle->file);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static int find_temp_filename(struct strbuf *name)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..6692aa4b170 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,72 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * an empty string.
+	 */
+	struct strbuf file;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { \
+	.file = STRBUF_INIT, \
+}
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget

* Re: [PATCH 1/7] bundle-uri: create bundle_list struct and helpers
From: Junio C Hamano @ 2022-08-22 17:57 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * The remote_bundle_info struct contains information for a single bundle
> + * URI. This may be initialized simply by a given URI or might have
> + * additional metadata associated with it if the bundle was advertised by
> + * a bundle list.
> + */
> +struct remote_bundle_info {
> +	struct hashmap_entry ent;
> +
> +	/**
> +	 * The 'id' is a name given to the bundle for reference
> +	 * by other bundle infos.
> +	 */
> +	char *id;
> +
> +	/**
> +	 * The 'uri' is the location of the remote bundle so
> +	 * it can be downloaded on-demand. This will be NULL
> +	 * if there was no table of contents.
> +	 */
> +	char *uri;
> +
> +	/**
> +	 * If the bundle has been downloaded, then 'file' is a
> +	 * filename storing its contents. Otherwise, 'file' is
> +	 * an empty string.
> +	 */
> +	struct strbuf file;
> +};

Presumably the sequence of events are that first a bundle list is
obtained, with their .file member set to empty, then http worker(s)
download and deposit the contents to files at which time the .file
member is set to the resulting file.

The file downloader presumably uses the usual "create a temporary
file, download to it, and then commit it by closing and then
renaming" dance, and the downloading http worker may want to have
two strbufs somewhere it can access to come up with the name of the
temporary and the name of the final file. But once the result
becomes a committed file, its name will not change, or will it?

At this step without the code that actually uses the data, use of
strbuf, instead of "char *" like id and uri members do, smells like
a premature optimization, and it is unclear if the optimization is
even effective.

Other than that, looks good to me.

* [PATCH 2/7] bundle-uri: create base key-value pair parsing
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  currently is expected to be an absolute URI, but will be relaxed to be
  a relative URI in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 22 ++++++++++
 bundle-uri.c                    | 74 +++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)
 create mode 100644 Documentation/config/bundle.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..3515bfe38d1
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,22 @@
+bundle.*::
+	The `bundle.*` keys are used when communicating a list of bundle URIs
+	See link:technical/bundle-uri.html[the bundle URI design document] for
+	more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index ceeef0b6641..ade7eccce39 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	const char *pkey, *dot;
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+
+	if (!skip_prefix(key, "bundle.", &pkey))
+		return 1;
+
+	dot = strchr(pkey, '.');
+	if (!dot) {
+		if (!strcmp(pkey, "version")) {
+			int version = atoi(value);
+			if (version != 1)
+				return 1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(pkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return 1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, pkey, dot - pkey);
+	dot++;
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		strbuf_init(&bundle->file, 0);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(dot, "uri")) {
+		free(bundle->uri);
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static int find_temp_filename(struct strbuf *name)
 {
 	int fd;
-- 
gitgitgadget

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
From: Junio C Hamano @ 2022-08-22 18:20 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index e376d547ce0..4280af6992e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>
>  include::config/browser.txt[]
>
> +include::config/bundle.txt[]
> +

The file that records a list of bundles may borrow the format of git
config files, but will we store their contents in configuration
files in the receiving (or originating) repository? With the
presence of fields like "bundle.version", I somehow doubt it.

Should "git config --help" list them?

> diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
> new file mode 100644
> index 00000000000..3515bfe38d1
> --- /dev/null
> +++ b/Documentation/config/bundle.txt

If the answer is "no", then this file looks out of place.

> diff --git a/bundle-uri.c b/bundle-uri.c
> index ceeef0b6641..ade7eccce39 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  	return 0;
>  }
>
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.

Let's stick to the "error is negative" if we do not have a strong
reason not to.

> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);

Can atoi() safely fail? Are we happy of pkey that says "1A" and we
parse it as "1"?

> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}

Is it OK for a bundle list described in the config-file format to
have "bundle.version" twice, giving different values? It feels
counter-intuitive to apply the "last one wins" rule that is usual
for configuration files.

> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Likewise for bundle.mode

> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}

This explicitly implements "the last one wins". Would it really
make sense for a server to serve a bundle list that says redundant
and wasteful pieces of information, i.e.

    [bundle "1"]
	url = one
	url = two

It is not like doing so would allow us to reuse an otherwise mostly
good file by appending new information and that would be a performance
or storage win. So I am not quite sure why we want "the last one wins"
rule here. It instead looks like something we want to sanity check
and complain about.

> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */

This is sensible.

> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;
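
Two of the review points above, the atoi() question and the "error is negative" convention, can be illustrated with a small standalone sketch. This is not code from the series: parse_bundle_version() is a hypothetical helper name, and it relies on the C library's strtol() rather than any Git-internal parser. It assumes, as the patch states, that 1 is the only accepted version.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a bundle.version
 * value strictly. Returns the version (currently only 1) on success
 * and -1 on any error, following the "error is negative" convention.
 */
int parse_bundle_version(const char *value)
{
	char *end;
	long v;

	assert(value != NULL);

	errno = 0;
	v = strtol(value, &end, 10);

	/* Reject empty input, trailing garbage such as "1A", and overflow. */
	if (end == value || *end != '\0' || errno == ERANGE)
		return -1;

	/* Only version 1 is understood, as in the patch. */
	if (v != 1)
		return -1;

	return (int)v;
}
```

With this shape, a value of "1A" is reported as an error instead of being read as 1, and callers can test the result with the usual "< 0" check.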

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
From: Derrick Stolee @ 2022-08-23 16:29 UTC
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon

On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index e376d547ce0..4280af6992e 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>>
>>  include::config/browser.txt[]
>>
>> +include::config/bundle.txt[]
>> +
>
> The file that records a list of bundles may borrow the format of git
> config files, but will we store their contents in configuration
> files in the receiving (or originating) repository? With the
> presence of fields like "bundle.version", I somehow doubt it.
>
> Should "git config --help" list them?

I suppose that at this point, they should be left out, since
writing them to your Git config does nothing.

In the future, having these config values present will advertise
the bundle list during the 'bundle-uri' protocol v2 command. That
could use some clarification in the documentation, too, perhaps
with a "bundle.*" item discussing how all of the other items are
related to that advertisement.

>> +/**
>> + * Given a key-value pair, update the state of the given bundle list.
>> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
>> + * is not understood or the value is malformed.
>
> Let's stick to the "error is negative" if we do not have a strong
> reason not to.

Right. Can do.

>> + */
>> +MAYBE_UNUSED
>> +static int bundle_list_update(const char *key, const char *value,
>> +			      struct bundle_list *list)
>> +{
>> +	const char *pkey, *dot;
>> +	struct strbuf id = STRBUF_INIT;
>> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
>> +	struct remote_bundle_info *bundle;
>> +
>> +	if (!skip_prefix(key, "bundle.", &pkey))
>> +		return 1;
>> +
>> +	dot = strchr(pkey, '.');
>> +	if (!dot) {
>> +		if (!strcmp(pkey, "version")) {
>> +			int version = atoi(value);
>
> Can atoi() safely fail? Are we happy of pkey that says "1A" and we
> parse it as "1"?
>
>> +			if (version != 1)
>> +				return 1;
>> +
>> +			list->version = version;
>> +			return 0;
>> +		}
>
> Is it OK for a bundle list described in the config-file format to
> have "bundle.version" twice, giving different values? It feels
> counter-intuitive to apply the "last one wins" rule that is usual
> for configuration files.
...
> This explicitly implements "the last one wins". Would it really
> make sense for a server to serve a bundle list that says redundant
> and wasteful pieces of information, i.e.
>
>     [bundle "1"]
> 	url = one
> 	url = two
>
> It is not like doing so would allow us to reuse an otherwise mostly
> good file by appending new information and that would be a performance
> or storage win. So I am not quite sure why we want "the last one wins"
> rule here. It instead looks like something we want to sanity check
> and complain about.

I could switch this to "expect at most one value" and add warnings
for duplicate keys. Should duplicate keys then mean "the bundle list
is malformed, abort downloading bundles"? That seems reasonable to me.

Thanks,
-Stolee
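
The "expect at most one value" idea discussed above could look roughly like the following standalone sketch. It is only an illustration under stated assumptions: struct bundle_item and bundle_item_set_uri() are hypothetical names standing in for remote_bundle_info and the "uri" branch of bundle_list_update(), and plain malloc() is used in place of Git's allocation wrappers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for remote_bundle_info. */
struct bundle_item {
	char *uri; /* NULL until a bundle.<id>.uri value has been seen */
};

/*
 * Record the URI for an item. Returns 0 on success and -1 if a URI was
 * already recorded, so the caller can treat the list as malformed
 * instead of silently applying "last one wins".
 */
int bundle_item_set_uri(struct bundle_item *item, const char *value)
{
	size_t len;

	assert(item != NULL && value != NULL);

	if (item->uri)
		return -1; /* duplicate key: keep the first value */

	len = strlen(value) + 1;
	item->uri = malloc(len);
	if (!item->uri)
		return -1; /* allocation failure is also reported as an error */
	memcpy(item->uri, value, len);
	return 0;
}
```

Whether a duplicate should only produce a warning or abort the whole download is the open question in the message above; this sketch merely reports the condition and leaves that policy to the caller.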

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
From: Jonathan Tan @ 2022-08-31 22:10 UTC
  To: Derrick Stolee
  Cc: Jonathan Tan, Junio C Hamano, Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon

Derrick Stolee <derrickstolee@github.com> writes:
> On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> diff --git a/Documentation/config.txt b/Documentation/config.txt
> >> index e376d547ce0..4280af6992e 100644
> >> --- a/Documentation/config.txt
> >> +++ b/Documentation/config.txt
> >> @@ -387,6 +387,8 @@ include::config/branch.txt[]
> >>
> >>  include::config/browser.txt[]
> >>
> >> +include::config/bundle.txt[]
> >> +
> >
> > The file that records a list of bundles may borrow the format of git
> > config files, but will we store their contents in configuration
> > files in the receiving (or originating) repository? With the
> > presence of fields like "bundle.version", I somehow doubt it.
> >
> > Should "git config --help" list them?
>
> I suppose that at this point, they should be left out, since
> writing them to your Git config does nothing.
>
> In the future, having these config values present will advertise
> the bundle list during the 'bundle-uri' protocol v2 command. That
> could use some clarification in the documentation, too, perhaps
> with a "bundle.*" item discussing how all of the other items are
> related to that advertisement.

I think the main point of confusion is that these config variables
currently do nothing when in a repo config, but they will be
subsequently used once we implement advertising them, and it is
convenient that these configs delegate to other files that have the
same format (and that we can specify, at the CLI, a file of the same
format). Maybe documentation like this would clear up the confusion:

  bundle.*::
    The `bundle.*` keys may appear in a repo's config, in a file linked by
    bundle.<id>.uri, or in a file passed to "clone --bundle-uri".
    +
    NEEDSWORK: Currently, only the latter 2 situations work. `bundle.*` keys
    appearing in a repo's config will take effect once support for advertising
    bundles in fetch protocol v2 is implemented.
    +
    See link:technical/bundle-uri.html[the bundle URI design document] for
    more details.

* Re: [PATCH 2/7] bundle-uri: create base key-value pair parsing
From: Glen Choo @ 2022-08-31 22:02 UTC
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);
> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}
> +
> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Drive-by comment from Review Club: we could simplify
"section.[subsection.]key" parsing using parse_config_key(). There are
other places in the code that do custom parsing like this, but maybe
they should use parse_config_key() too.

> +
> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}
> +
> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */
> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;
> --
> gitgitgadget
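
To make the "section.[subsection.]key" suggestion above concrete, here is a standalone sketch of that kind of split for the "bundle" section. It is loosely modeled on the idea behind parse_config_key() but is not Git's implementation; split_bundle_key() is a hypothetical name. One difference from the patch is worth noting as an assumption: the sketch takes the key to be everything after the last dot (so an <id> may itself contain dots), whereas the patch splits at the first dot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical standalone helper, loosely modeled on the idea behind
 * parse_config_key(); it is not Git's implementation.
 *
 * Splits "bundle.<key>" or "bundle.<id>.<key>". On success returns 0,
 * points *key at the final component, and sets *id / *id_len to the
 * middle component (or NULL / 0 when there is none). Returns -1 if the
 * name does not start with "bundle." or has an empty component.
 */
int split_bundle_key(const char *var, const char **id, size_t *id_len,
		     const char **key)
{
	static const char prefix[] = "bundle.";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *last_dot;

	assert(var && id && id_len && key);

	if (strncmp(var, prefix, prefix_len))
		return -1;
	rest = var + prefix_len;

	/* The key is everything after the last dot. */
	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		*id = NULL;
		*id_len = 0;
		*key = rest;
	} else {
		*id = rest;
		*id_len = (size_t)(last_dot - rest);
		*key = last_dot + 1;
	}

	/* An empty key or an empty <id> is treated as malformed. */
	if (!**key || (*id && !*id_len))
		return -1;
	return 0;
}
```

A caller would then compare the key against "version", "mode", or "uri" and use the id and its length to look up the list entry, without doing its own pointer arithmetic on the dots.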
* [PATCH 4/7] bundle-uri: unit test "key=value" parsing 2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget 2022-08-22 18:20 ` Junio C Hamano 2022-08-31 22:02 ` Glen Choo @ 2022-09-01 2:38 ` Teng Long 2 siblings, 0 replies; 94+ messages in thread From: Teng Long @ 2022-09-01 2:38 UTC (permalink / raw) To: gitgitgadget Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren, steadmon, tenglong.tl Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > + init_bundle_list(&list); > + while (strbuf_getline(&sb, stdin) != EOF) { > + if (bundle_uri_parse_line(&list, sb.buf) < 0) > + err = error("bad line: '%s'", sb.buf); > + } A test command like this is useful for people who want to experiment with the feature. Thanks. On top of that, I have a small question about the condition: if (bundle_uri_parse_line(&list, sb.buf) < 0) "bundle_uri_parse_line" calls "bundle_list_update" internally and returns its result, and "bundle_list_update" can return "1". So I'm not sure, but maybe the line could be modified to: if (bundle_uri_parse_line(&list, sb.buf)) here. Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
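The concern about the "< 0" condition can be reproduced with a small standalone sketch. Both functions below are hypothetical stand-ins, not the real bundle-uri.c code: update_stub() returns 1 for a key it does not understand, as the comment on bundle_list_update() documents, and parse_line_stub() returns -1 for a syntax error (the value Git's error() returns) and otherwise forwards the update result. For brevity the stub only checks for a missing '=' or an empty key.

```c
#include <string.h>

/* Stand-in for bundle_list_update(): 0 if the key is understood, 1 otherwise. */
static int update_stub(const char *key)
{
	return strncmp(key, "bundle.", 7) ? 1 : 0;
}

/*
 * Stand-in for bundle_uri_parse_line(): -1 on a syntax error,
 * otherwise whatever update_stub() returns.
 */
static int parse_line_stub(const char *line)
{
	const char *equals = strchr(line, '=');
	char key[64];
	size_t len;

	if (!equals || equals == line)
		return -1;
	len = (size_t)(equals - line);
	if (len >= sizeof(key))
		return -1;
	memcpy(key, line, len);
	key[len] = '\0';
	return update_stub(key);
}

/* The check used in the test helper: only negative results count. */
static int caught_by_negative_check(const char *line)
{
	return parse_line_stub(line) < 0;
}

/* The check proposed in the review: any non-zero result counts. */
static int caught_by_nonzero_check(const char *line)
{
	return parse_line_stub(line) != 0;
}
```

Under these assumptions, a line such as "other.key=value" slips past the "< 0" check but is reported by the non-zero check, while a line without '=' is reported by both.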
* [PATCH 3/7] bundle-uri: create "key=value" line parsing 2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 2022-08-22 15:12 ` [PATCH 1/7] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget 2022-08-22 15:12 ` [PATCH 2/7] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-08-22 19:17 ` Junio C Hamano 2022-09-02 23:41 ` Josh Steadmon 2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget ` (4 subsequent siblings) 7 siblings, 2 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee, Ævar Arnfjörð Bjarmason From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> When advertising a bundle list over Git's protocol v2, we will use packet lines. Each line will be of the form "key=value" representing a bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++++++++- bundle-uri.h | 14 +++++++++++++- 2 files changed, 39 insertions(+), 2 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index ade7eccce39..9a7d09349fe 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list, * Returns 0 if the key-value pair is understood. Returns 1 if the key * is not understood or the value is malformed.
*/ -MAYBE_UNUSED static int bundle_list_update(const char *key, const char *value, struct bundle_list *list) { @@ -301,3 +300,29 @@ cleanup: strbuf_release(&filename); return result; } + +/** + * General API for {transport,connect}.c etc. + */ +int bundle_uri_parse_line(struct bundle_list *list, const char *line) +{ + int result; + const char *equals; + struct strbuf key = STRBUF_INIT; + + if (!strlen(line)) + return error(_("bundle-uri: got an empty line")); + + equals = strchr(line, '='); + + if (!equals) + return error(_("bundle-uri: line is not of the form 'key=value'")); + if (line == equals || !*(equals + 1)) + return error(_("bundle-uri: line has empty key or value")); + + strbuf_add(&key, line, equals - line); + result = bundle_list_update(key.buf, equals + 1, list); + strbuf_release(&key); + + return result; +} diff --git a/bundle-uri.h b/bundle-uri.h index 6692aa4b170..f725c9796f7 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); -#endif +/** + * General API for {transport,connect}.c etc. + */ + +/** + * Parse a "key=value" packet line from the bundle-uri verb. + * + * Returns 0 on success and non-zero on error. + */ +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + +#endif /* BUNDLE_URI_H */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing 2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 19:17 ` Junio C Hamano 2022-08-23 16:31 ` Derrick Stolee 2022-09-02 23:41 ` Josh Steadmon 1 sibling, 1 reply; 94+ messages in thread From: Junio C Hamano @ 2022-08-22 19:17 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee "Ævar Arnfjörð Bjarmason via GitGitGadget" <gitgitgadget@gmail.com> writes: > +/** > + * General API for {transport,connect}.c etc. > + */ > +int bundle_uri_parse_line(struct bundle_list *list, const char *line) > +{ > + int result; > + const char *equals; > + struct strbuf key = STRBUF_INIT; > + > + if (!strlen(line)) > + return error(_("bundle-uri: got an empty line")); > + > + equals = strchr(line, '='); > + > + if (!equals) > + return error(_("bundle-uri: line is not of the form 'key=value'")); > + if (line == equals || !*(equals + 1)) > + return error(_("bundle-uri: line has empty key or value")); The suggestions implied by my asking fall strictly into the "it does not have to exist here at this step and we can later extend it", but for something whose equivalent can be stored in our configuration file, it is curious why we _insist_ to refuse an empty string as the value. I do not miss the "key alone without even '=' means 'true'" convention, personally, so insisting to have '=' is OK, but the inability to have an empty string as a value looks a bit disturbing. This depends on how the helper gets called, but most likely the caller has a single line of pkt-line that it GAVE us to process, so it sounds a bit wasteful to insist that "line" to be const to us and force us to use a separate strbuf, instead of just stuffing NUL at where we found '=' and pass the two halves to bundle_list_update(). 
Not a huge deal, it is just something I found funny in the "back in the days we coded together, Linus would never have written like this" way. Other than that small detail, the code looks OK to me. > + strbuf_add(&key, line, equals - line); > + result = bundle_list_update(key.buf, equals + 1, list); > + strbuf_release(&key); > + > + return result; > +} > diff --git a/bundle-uri.h b/bundle-uri.h > index 6692aa4b170..f725c9796f7 100644 > --- a/bundle-uri.h > +++ b/bundle-uri.h > @@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list, > */ > int fetch_bundle_uri(struct repository *r, const char *uri); > > -#endif > +/** > + * General API for {transport,connect}.c etc. > + */ > + > +/** > + * Parse a "key=value" packet line from the bundle-uri verb. > + * > + * Returns 0 on success and non-zero on error. > + */ > +int bundle_uri_parse_line(struct bundle_list *list, > + const char *line); > + > +#endif /* BUNDLE_URI_H */ ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing 2022-08-22 19:17 ` Junio C Hamano @ 2022-08-23 16:31 ` Derrick Stolee 0 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee @ 2022-08-23 16:31 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon On 8/22/2022 3:17 PM, Junio C Hamano wrote: > "Ævar Arnfjörð Bjarmason via GitGitGadget" <gitgitgadget@gmail.com> > writes: > >> +/** >> + * General API for {transport,connect}.c etc. >> + */ >> +int bundle_uri_parse_line(struct bundle_list *list, const char *line) >> +{ >> + int result; >> + const char *equals; >> + struct strbuf key = STRBUF_INIT; >> + >> + if (!strlen(line)) >> + return error(_("bundle-uri: got an empty line")); >> + >> + equals = strchr(line, '='); >> + >> + if (!equals) >> + return error(_("bundle-uri: line is not of the form 'key=value'")); >> + if (line == equals || !*(equals + 1)) >> + return error(_("bundle-uri: line has empty key or value")); > > The suggestions implied by my asking fall strictly into the "it does > not have to exist here at this step and we can later extend it", but > for something whose equivalent can be stored in our configuration > file, it is curious why we _insist_ to refuse an empty string as the > value. > > I do not miss the "key alone without even '=' means 'true'" > convention, personally, so insisting to have '=' is OK, but the > inability to have an empty string as a value looks a bit disturbing. I'd be happy to switch this to allow an empty value. > This depends on how the helper gets called, but most likely the > caller has a single line of pkt-line that it GAVE us to process, so > it sounds a bit wasteful to insist that "line" to be const to us and > force us to use a separate strbuf, instead of just stuffing NUL at > where we found '=' and pass the two halves to bundle_list_update(). I can look into using a non-const buffer. 
Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
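The two points raised in this exchange (accepting an empty value, and splitting a writable buffer in place rather than copying the key into a strbuf) can be sketched together. This is only an illustration under the assumption that the caller owns a mutable line; split_key_value() is a hypothetical name and not a function in bundle-uri.c.

```c
#include <string.h>

/*
 * Hypothetical helper: split a writable "key=value" line in place.
 * Returns 0 and sets *key and *value on success; returns -1 if there
 * is no '=' or the key is empty. An empty value ("key=") is accepted.
 */
static int split_key_value(char *line, const char **key, const char **value)
{
	char *equals = strchr(line, '=');

	if (!equals || equals == line)
		return -1;

	*equals = '\0';		/* terminate the key where '=' was */
	*key = line;
	*value = equals + 1;	/* may point at the empty string */
	return 0;
}
```

The two halves could then be handed to a function like bundle_list_update() without an intermediate allocation. The trade-off is that the caller's buffer is modified, so the original line is no longer available for error messages afterwards.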
* Re: [PATCH 3/7] bundle-uri: create "key=value" line parsing 2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget 2022-08-22 19:17 ` Junio C Hamano @ 2022-09-02 23:41 ` Josh Steadmon 1 sibling, 0 replies; 94+ messages in thread From: Josh Steadmon @ 2022-09-02 23:41 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason via GitGitGadget Cc: git, gitster, me, newren, avarab, mjcheetham, Derrick Stolee On 2022.08.22 15:12, Ævar Arnfjörð Bjarmason via GitGitGadget wrote: > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > > When advertising a bundle list over Git's protocol v2, we will use > packet lines. Each line will be of the form "key=value" representing a > bundle list. Connect the API necessary for Git's transport to the > key-value pair parsing created in the previous change. Since we're not actually implementing advertisement via proto v2 in this series, could we add an additional paragraph noting that this is useful now for implementing the test helper in the next patch? ^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH 4/7] bundle-uri: unit test "key=value" parsing 2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2022-08-22 15:12 ` [PATCH 3/7] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-09-01 2:56 ` Teng Long 2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 7 siblings, 1 reply; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee, Ævar Arnfjörð Bjarmason From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Create a new 'test-tool bundle-uri' test helper. This helper will assist in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses stdin as a list of lines. These are fed into bundle_uri_parse_line() to test how we construct a 'struct bundle_list' from that data. The list is then output to stdout as if the key-value pairs were a Git config file.
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Makefile | 1 + bundle-uri.c | 33 ++++++++++++++ bundle-uri.h | 3 ++ t/helper/test-bundle-uri.c | 63 +++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5750-bundle-uri-parse.sh | 91 +++++++++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 11 +++++ 8 files changed, 204 insertions(+) create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh diff --git a/Makefile b/Makefile index 7d5f48069ea..7dee0329c49 100644 --- a/Makefile +++ b/Makefile @@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bitmap.o TEST_BUILTINS_OBJS += test-bloom.o +TEST_BUILTINS_OBJS += test-bundle-uri.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o TEST_BUILTINS_OBJS += test-crontab.o diff --git a/bundle-uri.c b/bundle-uri.c index 9a7d09349fe..d56c5e33d5f 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +static int summarize_bundle(struct remote_bundle_info *info, void *data) +{ + FILE *fp = data; + fprintf(fp, "[bundle \"%s\"]\n", info->id); + fprintf(fp, "\turi = %s\n", info->uri); + return 0; +} + +void print_bundle_list(FILE *fp, struct bundle_list *list) +{ + const char *mode; + + switch (list->mode) { + case BUNDLE_MODE_ALL: + mode = "all"; + break; + + case BUNDLE_MODE_ANY: + mode = "any"; + break; + + case BUNDLE_MODE_NONE: + default: + mode = "<unknown>"; + } + + printf("[bundle]\n"); + printf("\tversion = %d\n", list->version); + printf("\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} + /** * Given a key-value pair, update the state of the given bundle list. * Returns 0 if the key-value pair is understood. 
Returns 1 if the key diff --git a/bundle-uri.h b/bundle-uri.h index f725c9796f7..41a1510a4ac 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -68,6 +68,9 @@ int for_all_bundles_in_list(struct bundle_list *list, bundle_iterator iter, void *data); +struct FILE; +void print_bundle_list(FILE *fp, struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c new file mode 100644 index 00000000000..5cb0c9196fa --- /dev/null +++ b/t/helper/test-bundle-uri.c @@ -0,0 +1,63 @@ +#include "test-tool.h" +#include "parse-options.h" +#include "bundle-uri.h" +#include "strbuf.h" +#include "string-list.h" + +static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri parse-key-values <in", + NULL + }; + struct option options[] = { + OPT_END(), + }; + struct strbuf sb = STRBUF_INIT; + struct bundle_list list; + int err = 0; + + argc = parse_options(argc, argv, NULL, options, usage, 0); + if (argc) + goto usage; + + init_bundle_list(&list); + while (strbuf_getline(&sb, stdin) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf) < 0) + err = error("bad line: '%s'", sb.buf); + } + strbuf_release(&sb); + + print_bundle_list(stdout, &list); + + clear_bundle_list(&list); + + return !!err; + +usage: + usage_with_options(usage, options); +} + +int cmd__bundle_uri(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri <subcommand> [<options>]", + NULL + }; + struct option options[] = { + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION | + PARSE_OPT_KEEP_ARGV0); + if (argc == 1) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) + return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: + usage_with_options(usage, options); 
+} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c3..fbe2d9d8108 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "advise", cmd__advise_if_enabled }, { "bitmap", cmd__bitmap }, { "bloom", cmd__bloom }, + { "bundle-uri", cmd__bundle_uri }, { "chmtime", cmd__chmtime }, { "config", cmd__config }, { "crontab", cmd__crontab }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb799271631..b2aa1f39a8f 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -7,6 +7,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); +int cmd__bundle_uri(int argc, const char **argv); int cmd__chmtime(int argc, const char **argv); int cmd__config(int argc, const char **argv); int cmd__crontab(int argc, const char **argv); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh new file mode 100755 index 00000000000..675c1f1d2f4 --- /dev/null +++ b/t/t5750-bundle-uri-parse.sh @@ -0,0 +1,91 @@ +#!/bin/sh + +test_description="Test bundle-uri bundle_uri_parse_line()" + +TEST_NO_CREATE_REPO=1 +TEST_PASSES_SANITIZE_LEAK=true +. 
./test-lib.sh + +test_expect_success 'bundle_uri_parse_line() just URIs' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-key-values <in >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' ' + cat >in <<-\EOF && + =bogus-value + bogus-key= + EOF + + cat >err.expect <<-EOF && + error: bundle-uri: line has empty key or value + error: bad line: '\''=bogus-value'\'' + error: bundle-uri: line has empty key or value + error: bad line: '\''bogus-key='\'' + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + + bundle.two.uri=https://example.com/bundle.bdl + + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values <in >actual 
2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6da7273f1d5..3175d665add 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1956,3 +1956,14 @@ test_is_magic_mtime () { rm -f .git/test-mtime-actual return $ret } + +# Given two filenames, parse both using 'git config --list --file' +# and compare the sorted output of those commands. Useful when +# wanting to ignore whitespace differences and sorting concerns. +test_cmp_config_output () { + git config --list --file="$1" >config-expect && + git config --list --file="$2" >config-actual && + sort config-expect >sorted-expect && + sort config-actual >sorted-actual && + test_cmp sorted-expect sorted-actual +} -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH 4/7] bundle-uri: unit test "key=value" parsing 2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-01 2:56 ` Teng Long 0 siblings, 0 replies; 94+ messages in thread From: Teng Long @ 2022-09-01 2:56 UTC (permalink / raw) To: gitgitgadget Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren, steadmon, tenglong.tl Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > +void print_bundle_list(FILE *fp, struct bundle_list *list) > +{ > + const char *mode; > + > + switch (list->mode) { > + case BUNDLE_MODE_ALL: > + mode = "all"; > + break; > + > + case BUNDLE_MODE_ANY: > + mode = "any"; > + break; > + > + case BUNDLE_MODE_NONE: > + default: > + mode = "<unknown>"; > + } > + > + printf("[bundle]\n"); > + printf("\tversion = %d\n", list->version); > + printf("\tmode = %s\n", mode); > + > + for_all_bundles_in_list(list, summarize_bundle, fp); > +} "print_bundle_list" prints "bundle_list" as lines in Git config format, and it accepts a "FILE *fp" as its output. The "for_all_bundles_in_list" call uses it, but the other places do not seem to. I'm not sure, but maybe we should change them to: fprintf(fp, "[bundle]\n"); fprintf(fp, "\tversion = %d\n", list->version); fprintf(fp, "\tmode = %s\n", mode); here? Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
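The change proposed in this review can be sketched as a standalone function in which every line goes to the caller's stream. The types and names below (bundle_list_sketch, MODE_ALL, print_list_header() and so on) are simplified, hypothetical stand-ins for the structures in bundle-uri.h and are not the patch itself.

```c
#include <stdio.h>

/* Simplified stand-ins for the bundle list mode and struct. */
enum bundle_mode_sketch { MODE_NONE = 0, MODE_ALL, MODE_ANY };

struct bundle_list_sketch {
	int version;
	enum bundle_mode_sketch mode;
};

static const char *mode_name(enum bundle_mode_sketch mode)
{
	switch (mode) {
	case MODE_ALL:
		return "all";
	case MODE_ANY:
		return "any";
	case MODE_NONE:
	default:
		return "<unknown>";
	}
}

/* Every line goes to 'fp', so callers can direct the output anywhere. */
static void print_list_header(FILE *fp, const struct bundle_list_sketch *list)
{
	fprintf(fp, "[bundle]\n");
	fprintf(fp, "\tversion = %d\n", list->version);
	fprintf(fp, "\tmode = %s\n", mode_name(list->mode));
}
```

With printf() the header would always land on stdout even when the per-bundle lines are sent elsewhere through 'fp'. The test helper happens to pass stdout, which is presumably why the mismatch does not show up in the tests.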
* [PATCH 5/7] bundle-uri: parse bundle list in config format 2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2022-08-22 15:12 ` [PATCH 4/7] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget 2022-08-22 19:25 ` Junio C Hamano 2022-09-01 8:05 ` Teng Long 2022-08-22 15:12 ` [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 7 siblings, 2 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When a bundle provider wants to operate independently from a Git remote, they want to provide a single, consistent URI that users can use in their 'git clone --bundle-uri' commands. At this point, the Git client expects that URI to be a single bundle that can be unbundled and used to bootstrap the rest of the clone from the Git server. This single bundle cannot be re-used to assist with future incremental fetches. To allow for the incremental fetch case, teach Git to understand a bundle list that could be advertised at an independent bundle URI. Such a bundle list is likely to be inspected by human readers, even if only by the bundle provider creating the list. For this reason, we can take our expected "key=value" pairs and instead format them using Git config format. Create parse_bundle_list_in_config_format() to parse a file in config format and convert that into a 'struct bundle_list' filled with its understanding of the contents. Be careful to call git_config_from_file_with_options() because the default action for git_config_from_file() is to die() on a parsing error. 
The current warning isn't particularly helpful if it arises to a user, but it will be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce bugs than this simpler version. I would rather have a slightly confusing test helper than complicated product code. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 29 +++++++++++++++++++++ bundle-uri.h | 10 ++++++++ t/helper/test-bundle-uri.c | 45 ++++++++++++++++++++++++++------- t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++ 4 files changed, 125 insertions(+), 9 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index d56c5e33d5f..dca88ed1e89 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -6,6 +6,7 @@ #include "run-command.h" #include "hashmap.h" #include "pkt-line.h" +#include "config.h" static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, @@ -172,6 +173,34 @@ static int bundle_list_update(const char *key, const char *value, return 0; } +static int config_to_bundle_list(const char *key, const char *value, void *data) +{ + struct bundle_list *list = data; + return bundle_list_update(key, value, list); +} + +int parse_bundle_list_in_config_format(const char *uri, + const char *filename, + struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + + list->mode = BUNDLE_MODE_NONE; + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); + + if (!result && list->mode == BUNDLE_MODE_NONE) { + warning(_("bundle list at '%s' has no mode"), uri); + result = 1; + } + + return result; +} + static int find_temp_filename(struct strbuf *name) { int fd; diff --git a/bundle-uri.h b/bundle-uri.h 
index 41a1510a4ac..294ac804140 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -71,6 +71,16 @@ int for_all_bundles_in_list(struct bundle_list *list, struct FILE; void print_bundle_list(FILE *fp, struct bundle_list *list); +/** + * A bundle URI may point to a bundle list where the key=value + * pairs are provided in config file format. This method is + * exposed publicly for testing purposes. + */ + +int parse_bundle_list_in_config_format(const char *uri, + const char *filename, + struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c index 5cb0c9196fa..23ce0eebca3 100644 --- a/t/helper/test-bundle-uri.c +++ b/t/helper/test-bundle-uri.c @@ -4,27 +4,52 @@ #include "strbuf.h" #include "string-list.h" -static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, +}; + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { - const char *usage[] = { + const char *key_value_usage[] = { "test-tool bundle-uri parse-key-values <in", NULL }; + const char *config_usage[] = { + "test-tool bundle-uri parse-config <input>", + NULL + }; struct option options[] = { OPT_END(), }; + const char **usage = key_value_usage; struct strbuf sb = STRBUF_INIT; struct bundle_list list; int err = 0; - argc = parse_options(argc, argv, NULL, options, usage, 0); - if (argc) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); - while (strbuf_getline(&sb, stdin) != EOF) { - if (bundle_uri_parse_line(&list, sb.buf) < 0) - err = error("bad line: '%s'", sb.buf); + + switch (mode) { + case KEY_VALUE_PAIRS: + if (argc) + goto usage; + while (strbuf_getline(&sb, stdin) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf) < 0) + err = 
error("bad line: '%s'", sb.buf); + } + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; + err = parse_bundle_list_in_config_format("<uri>", argv[0], &list); + break; } strbuf_release(&sb); @@ -55,7 +80,9 @@ int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) - return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); error("there is no test-tool bundle-uri tool '%s'", argv[1]); usage: diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index 675c1f1d2f4..dd9dc36bfd7 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -88,4 +88,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' test_cmp_config_output expect actual ' +test_expect_success 'parse config format: just URIs' ' + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'parse config format edge cases: empty key or value' ' + cat >in1 <<-\EOF && + = bogus-value + EOF + + cat >err1 <<-EOF && + error: bad config line 1 in file in1 + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = <unknown> + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && + test_cmp err1 err && + test_cmp_config_output expect actual && + + cat >in2 <<-\EOF && + bogus-key = + EOF + + cat >err2 <<-EOF && + warning: bundle list at '\''<uri>'\'' has no mode + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && + test_cmp err2 err && 
+ test_cmp_config_output expect actual +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format 2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget @ 2022-08-22 19:25 ` Junio C Hamano 2022-08-23 16:43 ` Derrick Stolee 2022-08-31 22:18 ` Jonathan Tan 2022-09-01 8:05 ` Teng Long 1 sibling, 2 replies; 94+ messages in thread From: Junio C Hamano @ 2022-08-22 19:25 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > To allow for the incremental fetch case, teach Git to understand a > bundle list that could be advertised at an independent bundle URI. Such > a bundle list is likely to be inspected by human readers, even if only > by the bundle provider creating the list. For this reason, we can take > our expected "key=value" pairs and instead format them using Git config > format. "can" does not explain why it is a good idea. "As a sequence of key=value pairs is a lot more dense and harder to read than the configuration file format, let's declare that it is the format we use in a file that holds a bundle-list" would be. I do not personally buy it, though. As I hinted in an earlier step, some traits we associate with our configuration file format, like the "last one wins" semantics, are undesirable ones, so even if we reuse the appearance of the text, the semantics would have to become different (including "syntax errors lead to die()" mentioned elsewhere in the proposed log message). > Update 'test-tool bundle-uri' to take this config file format as input. > It uses a filename instead of stdin because there is no existing way to > parse a FILE pointer in the config machinery. Using > git_config_from_mem() is overly complicated and more likely to introduce > bugs than this simpler version. I would rather have a slightly confusing > test helper than complicated product code.
All the troubles described above seem to come from the initial mistake to try reusing the configuration file parser or reusing the configuration file format, at least to me. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format 2022-08-22 19:25 ` Junio C Hamano @ 2022-08-23 16:43 ` Derrick Stolee 2022-08-31 22:18 ` Jonathan Tan 1 sibling, 0 replies; 94+ messages in thread From: Derrick Stolee @ 2022-08-23 16:43 UTC (permalink / raw) To: Junio C Hamano, Derrick Stolee via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon On 8/22/2022 3:25 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> To allow for the incremental fetch case, teach Git to understand a >> bundle list that could be advertised at an independent bundle URI. Such >> a bundle list is likely to be inspected by human readers, even if only >> by the bundle provider creating the list. For this reason, we can take >> our expected "key=value" pairs and instead format them using Git config >> format. > > "can" does not explain why it is a good idea. "As a sequence of > key=value pairs is a lot more dense and harder to read than the > configuration file format, let's declare that it is the format we > use in a file that holds a bundle-list" would be. This "more dense and harder to read" was definitely my intention for wanting a different format. > I do not personally buy it, though. As I hinted in an earlier step, > some traits we associate with our configuration file format, like the > "last one wins" semantics, are undesirable ones, so even if we reuse > the appearance of the text, the semantics would have to become > different (including "syntax errors lead to die()" mentioned > elsewhere in the proposed log message). The points you made earlier about "last one wins" semantics are the biggest road-blocks to using the config file format, from what I've read so far. We could change those semantics to be different from my current implementation which respects the "last one wins" rule, and then that makes the config format match not as closely.
That burden of avoiding multiple key values is not on the end-user but the bundle provider to match the new expectations. (There might be something we should be careful about when advertising the bundle list from our Git config in the 'bundle-uri' command in the next series.) The "syntax errors lead to die()" is mitigated by using CONFIG_ERROR_ERROR, which is what I meant by "Be careful to call..." I should have been more clear that we are _not_ going to die() based on the remote data. We might write an error message and then abort the bundle download. With all of these points in mind, I'd still prefer to use the config file format as described in the design document. If you still don't agree, then I'll change the format to be key=value pairs split with newlines, and update the design document accordingly. Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
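
To make the "last one wins" concern concrete, here is a minimal sketch
of the stricter alternative discussed above, where a repeated key is
reported as an error (returning -1 rather than calling die()) instead of
silently overriding the earlier value. This is not code from the series:
kv_list and kv_list_add_strict() are hypothetical names, and the
fixed-size buffers are an assumption made only to keep the example
short.

```c
#include <assert.h>
#include <string.h>

#define KV_MAX_PAIRS 16
#define KV_MAX_KEY 64
#define KV_MAX_VALUE 128

/* Hypothetical container for parsed "key=value" lines. */
struct kv_list {
	char keys[KV_MAX_PAIRS][KV_MAX_KEY];
	char values[KV_MAX_PAIRS][KV_MAX_VALUE];
	int nr;
};

/*
 * Record one "key=value" line. Returns 0 on success and -1 on
 * malformed input, on overflow, or when the key was already seen.
 * The caller decides how to report the error; nothing here exits.
 */
int kv_list_add_strict(struct kv_list *list, const char *line)
{
	const char *eq;
	size_t klen, vlen;
	int i;

	assert(list && line);

	eq = strchr(line, '=');
	if (!eq || eq == line)
		return -1; /* no '=' or empty key */

	klen = (size_t)(eq - line);
	vlen = strlen(eq + 1);
	if (klen >= KV_MAX_KEY || vlen >= KV_MAX_VALUE ||
	    list->nr == KV_MAX_PAIRS)
		return -1;

	/* Reject a repeated key instead of letting the last one win. */
	for (i = 0; i < list->nr; i++)
		if (strlen(list->keys[i]) == klen &&
		    !strncmp(list->keys[i], line, klen))
			return -1;

	memcpy(list->keys[list->nr], line, klen);
	list->keys[list->nr][klen] = '\0';
	memcpy(list->values[list->nr], eq + 1, vlen + 1);
	list->nr++;
	return 0;
}
```

With this shape, a list that names the same "bundle.<id>.uri" key twice
is rejected as a whole by the caller, which places the burden on the
bundle provider rather than on the end-user.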

* Re: [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 19:25 ` Junio C Hamano
  2022-08-23 16:43 ` Derrick Stolee
@ 2022-08-31 22:18 ` Jonathan Tan
  1 sibling, 0 replies; 94+ messages in thread
From: Jonathan Tan @ 2022-08-31 22:18 UTC
  To: Junio C Hamano
  Cc: Jonathan Tan, Derrick Stolee via GitGitGadget, git, me, newren,
      avarab, mjcheetham, steadmon, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > To allow for the incremental fetch case, teach Git to understand a
> > bundle list that could be advertised at an independent bundle URI. Such
> > a bundle list is likely to be inspected by human readers, even if only
> > by the bundle provider creating the list. For this reason, we can take
> > our expected "key=value" pairs and instead format them using Git config
> > format.
>
> "can" does not explain why it is a good idea. "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
>
> I do not personally buy it, though. As I hinted in an earlier step,
> some traits we associate with our configuration file format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

One reason for using the configuration file format (which perhaps could
have been better explained in the commit message) is that we plan to
have a way for a repo to advertise a list of bundles during fetch. I
think that config is a natural place to put that, even with its "last
one wins" semantics.

It could be argued that we can just put a single URI in config and only
allow advertising of a single URI (and then use a different format for
the bundle lists with semantics that are stricter than "last one
wins"), but that seems unnecessarily restrictive (and would make the
client make one more network request). And if we're advertising
multiple bundles, it seems reasonable to make all bundle lists have the
same format (whether they are in config or in a separate file).

* [PATCH 5/7] bundle-uri: parse bundle list in config format
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
  2022-08-22 19:25 ` Junio C Hamano
@ 2022-09-01  8:05 ` Teng Long
  1 sibling, 0 replies; 94+ messages in thread
From: Teng Long @ 2022-09-01 8:05 UTC
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
      steadmon, tenglong.tl

Derrick Stolee <derrickstolee@github.com> writes:

> diff --git a/bundle-uri.h b/bundle-uri.h
> index 41a1510a4ac..294ac804140 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -71,6 +71,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  struct FILE;
>  void print_bundle_list(FILE *fp, struct bundle_list *list);
>
> +/**
> + * A bundle URI may point to a bundle list where the key=value
> + * pairs are provided in config file format. This method is
> + * exposed publicly for testing purposes.
> + */
> +
> +int parse_bundle_list_in_config_format(const char *uri,
> +				       const char *filename,
> +				       struct bundle_list *list);
> +

The comment says that "parse_bundle_list_in_config_format" is exposed
for testing purposes, but I think this API will be useful in its own
right if the config format is eventually supported.

We already have an API named "bundle_uri_parse_line", which parses
key-value pairs and packages them into a bundle list. For more
consistent naming, it may be better to rename
"parse_bundle_list_in_config_format" to
"bundle_uri_parse_config_format".

I don't think the rename breaks anything; feel free to accept it or
leave the name as it is.

Thanks.

* [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 5/7] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  7 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee,
      Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index dca88ed1e89..c9f3df28b2f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	struct strbuf filename = STRBUF_INIT;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if ((result = find_temp_filename(&filename)))
 		goto cleanup;
 
@@ -363,6 +377,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget
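
The effect of the depth limit on a cyclic set of bundle lists can be
shown in isolation. The following is a minimal sketch under stated
assumptions, not the code from the patch: struct node is a hypothetical
stand-in for downloaded content (a node with no children plays the role
of a bundle file, a node with children plays the role of a bundle
list), and MAX_DEPTH mirrors the value 4 chosen for
max_bundle_uri_depth.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4 /* same value as max_bundle_uri_depth in the patch */

/* Hypothetical model: zero children means "bundle", otherwise "list". */
struct node {
	const struct node *children[4];
	size_t nr_children;
};

/* Depth-first walk that gives up once the depth limit is reached. */
int walk_internal(const struct node *n, int depth, int *visited)
{
	size_t i;

	if (depth >= MAX_DEPTH)
		return -1; /* would be a warning() in the real code */

	(*visited)++;
	for (i = 0; i < n->nr_children; i++)
		if (walk_internal(n->children[i], depth + 1, visited))
			return -1;
	return 0;
}

/* Public entry point: always starts at depth 0, like fetch_bundle_uri(). */
int walk(const struct node *root, int *visited)
{
	assert(root && visited);
	*visited = 0;
	return walk_internal(root, 0, visited);
}
```

Two lists that point at each other are visited four times in total
before the walk fails, so a cycle costs a bounded number of downloads
and needs no separate cycle detection.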

* [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-08-22 15:12 ` [PATCH 6/7] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-08-22 15:12 ` Derrick Stolee via GitGitGadget
  2022-09-02 23:51   ` Josh Steadmon
  2022-09-05 12:50   ` Teng Long
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  7 siblings, 2 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-22 15:12 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Derrick Stolee,
      Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs are downloaded
and the content is inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 211 +++++++++++++++++++++++++++++++++---
 bundle-uri.h                |   6 +
 t/t5558-clone-bundle-uri.sh |  93 ++++++++++++++++
 3 files changed, 293 insertions(+), 17 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c9f3df28b2f..37867afca27 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -35,9 +35,10 @@ void init_bundle_list(struct bundle_list *list)
 static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 				    void *data)
 {
-	free(bundle->id);
-	free(bundle->uri);
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
 	strbuf_release(&bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -334,18 +335,102 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	ctx->count++;
+	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = parse_bundle_list_in_config_format(bundle->uri,
+							 bundle->file.buf,
+							 &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -353,33 +438,125 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	if ((result = find_temp_filename(&filename)))
+	if (!bundle->file.len &&
+	    (result = find_temp_filename(&bundle->file)))
 		goto cleanup;
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
-		warning(_("failed to download bundle from URI '%s'"), uri);
+	if ((result = copy_uri_to_file(bundle->file.buf, bundle->uri))) {
+		warning(_("failed to download bundle from URI '%s'"), bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
-		warning(_("file at URI '%s' is not a bundle"), uri);
+	if ((result = !is_bundle(bundle->file.buf, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
+		if (result)
+			warning(_("file at URI '%s' is not a bundle or bundle list"),
+				bundle->uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
-		warning(_("failed to unbundle bundle from URI '%s'"), uri);
-		goto cleanup;
-	}
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	strbuf_init(&bcopy->file, 0);
+	strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (result)
+		unlink(bundle->file.buf);
 	return result;
 }
 
+struct attempt_unbundle_context {
+	struct repository *r;
+	int success_count;
+	int failure_count;
+};
+
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct attempt_unbundle_context *ctx = data;
+
+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) {
+		ctx->success_count++;
+		info->unbundled = 1;
+	} else {
+		ctx->failure_count++;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	int last_success_count = -1;
+	struct attempt_unbundle_context ctx = {
+		.r = r,
+	};
+
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 */
+	while (last_success_count < ctx.success_count) {
+		last_success_count = ctx.success_count;
+
+		ctx.success_count = 0;
+		ctx.failure_count = 0;
+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
+	}
+
+	if (ctx.success_count)
+		git_config_set_multivar_gently("log.excludedecoration",
+					       "refs/bundle/",
+					       "refs/bundle/",
+					       CONFIG_FLAGS_FIXED_VALUE |
+					       CONFIG_FLAGS_MULTI_REPLACE);
+
+	if (ctx.failure_count)
+		warning(_("failed to unbundle %d bundles"),
+			ctx.failure_count);
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file.buf)
+		unlink_or_warn(info->file.buf);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup("<root>"),
+		.file = STRBUF_INIT,
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index 294ac804140..e9d85a6ecfb 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -35,6 +35,12 @@ struct remote_bundle_info {
 	 * an empty string.
 	 */
 	struct strbuf file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { \
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index ad666a2d28a..592790b49f0 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -41,6 +41,72 @@ test_expect_success 'clone with file:// bundle' '
 	test_cmp expect actual
 '
 
+# To get interesting tests for bundle lists, we need to construct a
+# somewhat-interesting commit history.
+#
+# ---------------- bundle-4
+#
+#       4
+#      / \
+# ----|---|------- bundle-3
+#     |   |
+#     |   3
+#     |   |
+# ----|---|------- bundle-2
+#     |   |
+#     2   |
+#     |   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'construct incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit 1 &&
+		git checkout -b left &&
+		test_commit 2 &&
+		git checkout -b right base &&
+		test_commit 3 &&
+		git checkout -b merge left &&
+		git merge right -m "4" &&
+
+		git bundle create bundle-1.bundle base &&
+		git bundle create bundle-2.bundle base..left &&
+		git bundle create bundle-3.bundle base..right &&
+		git bundle create bundle-4.bundle merge --not left right
+	)
+'
+
+test_expect_success 'clone bundle list (file, no heuristic)' '
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = file://$(pwd)/clone-from/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = file://$(pwd)/clone-from/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-file rev-parse $oid || return 1
+	done
+'
+
+
 #########################################################################
 # HTTP tests begin here
 
@@ -75,6 +141,33 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = $HTTPD_URL/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = $HTTPD_URL/bundle-2.bundle
+
+	[bundle "bundle-3"]
+		uri = $HTTPD_URL/bundle-3.bundle
+
+	[bundle "bundle-4"]
+		uri = $HTTPD_URL/bundle-4.bundle
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http &&
+	for oid in $(git -C clone-from for-each-ref --format="%(objectname)")
+	do
+		git -C clone-list-http rev-parse $oid || return 1
+	done
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget
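
The retry loop in unbundle_all_bundles() can be illustrated in
isolation. The following is a minimal sketch, not the code from the
patch: toy_bundle, try_apply() and apply_all() are hypothetical names,
and a bundle is modeled as something that applies only once the bundles
it names as prerequisites have been applied. Passes over the array
repeat until one pass makes no progress.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PREREQS 2

/* Hypothetical model of a downloaded bundle. */
struct toy_bundle {
	int prereq[TOY_MAX_PREREQS]; /* indices into the array, -1 = unused */
	int applied;
};

/* Apply bundle i if all of its prerequisites are already applied. */
int try_apply(struct toy_bundle *all, size_t i)
{
	size_t k;

	for (k = 0; k < TOY_MAX_PREREQS; k++) {
		int p = all[i].prereq[k];
		if (p >= 0 && !all[p].applied)
			return -1;
	}
	all[i].applied = 1;
	return 0;
}

/*
 * Keep making passes while at least one new bundle applied in the
 * previous pass. Returns the number of bundles left unapplied.
 */
size_t apply_all(struct toy_bundle *all, size_t nr)
{
	size_t i, failed = 0;
	int progress;

	assert(all || !nr);

	do {
		progress = 0;
		for (i = 0; i < nr; i++) {
			if (all[i].applied)
				continue;
			if (!try_apply(all, i))
				progress = 1;
		}
	} while (progress);

	for (i = 0; i < nr; i++)
		if (!all[i].applied)
			failed++;
	return failed;
}
```

With the topology from the test (a merge bundle on top of two
independent bundles on top of a base bundle) stored in the least
convenient order, the base applies in the first pass, the two middle
bundles in the second, and the merge bundle in the third. Bundles whose
prerequisites never arrive are simply counted as failures.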

* Re: [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-09-02 23:51 ` Josh Steadmon
  2022-09-05 12:50 ` Teng Long
  1 sibling, 0 replies; 94+ messages in thread
From: Josh Steadmon @ 2022-09-02 23:51 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, mjcheetham, Derrick Stolee

On 2022.08.22 15:12, Derrick Stolee via GitGitGadget wrote:
[snip]
> +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
> +{
> +	struct bundle_list_context *ctx = data;
> +
> +	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
> +		return 0;
> +
> +	ctx->count++;
> +	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
> +}

We should check whether fetch_bundle_uri_internal() actually succeeds
before we increment ctx->count here. Otherwise, if we're in
BUNDLE_MODE_ANY and the client gets unlucky that one of the servers
hosting a bundle file is offline, it won't retry any of the other
servers.
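
The behavior requested in this review can be sketched as follows. This
is a minimal, self-contained illustration rather than the eventual fix:
download_ctx, download_one(), download_list() and fake_download() are
hypothetical names, and the sketch assumes that the list iterator stops
at the first nonzero return value, which is why a failed download in
"any" mode is reported as 0 so that the next URI is still tried.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum list_mode { MODE_ALL, MODE_ANY };

struct download_ctx {
	enum list_mode mode;
	int count; /* number of downloads that succeeded so far */
};

/* Hypothetical downloader signature: returns 0 on success. */
typedef int (*download_fn)(const char *uri);

/* Toy downloader: any URI containing "offline" fails. */
int fake_download(const char *uri)
{
	return strstr(uri, "offline") ? -1 : 0;
}

int download_one(struct download_ctx *ctx, const char *uri, download_fn fn)
{
	int res;

	assert(ctx && uri && fn);

	/* In "any" mode, one successful download is enough. */
	if (ctx->mode == MODE_ANY && ctx->count)
		return 0;

	res = fn(uri);
	if (!res)
		ctx->count++; /* count only what actually succeeded */

	/* A failure is fatal in "all" mode, but not in "any" mode. */
	return ctx->mode == MODE_ANY ? 0 : res;
}

/* Stops at the first nonzero result, like the assumed list iterator. */
int download_list(struct download_ctx *ctx, const char **uris, size_t nr,
		  download_fn fn)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int res = download_one(ctx, uris[i], fn);
		if (res)
			return res;
	}
	/* "any" mode still fails if nothing at all could be downloaded. */
	return (ctx->mode == MODE_ANY && !ctx->count) ? -1 : 0;
}
```

With an unreachable first mirror, the "any" list falls through to the
second URI and skips the third, while the same URIs in "all" mode fail
at the first download.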

* [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
  2022-09-02 23:51 ` Josh Steadmon
@ 2022-09-05 12:50 ` Teng Long
  2022-09-08 17:10   ` Derrick Stolee
  1 sibling, 1 reply; 94+ messages in thread
From: Teng Long @ 2022-09-05 12:50 UTC
  To: gitgitgadget
  Cc: avarab, derrickstolee, git, gitster, me, mjcheetham, newren,
      steadmon, tenglong.tl

Derrick Stolee <derrickstolee@github.com> writes:

> int fetch_bundle_uri(struct repository *r, const char *uri)
> {
> -	return fetch_bundle_uri_internal(r, uri, 0);
> +	int result;
> +	struct bundle_list list;
> +	struct remote_bundle_info bundle = {
> +		.uri = xstrdup(uri),
> +		.id = xstrdup("<root>"),

Very readable code, thank you very much.

I'm a little curious why we use the "<root>" as the init value of
".id"?

Thanks.

* Re: [PATCH 7/7] bundle-uri: fetch a list of bundles
  2022-09-05 12:50 ` Teng Long
@ 2022-09-08 17:10 ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-08 17:10 UTC
  To: Teng Long, gitgitgadget
  Cc: avarab, git, gitster, me, mjcheetham, newren, steadmon, tenglong.tl

On 9/5/2022 8:50 AM, Teng Long wrote:
>
> Derrick Stolee <derrickstolee@github.com> writes:
>
>> int fetch_bundle_uri(struct repository *r, const char *uri)
>> {
>> -	return fetch_bundle_uri_internal(r, uri, 0);
>> +	int result;
>> +	struct bundle_list list;
>> +	struct remote_bundle_info bundle = {
>> +		.uri = xstrdup(uri),
>> +		.id = xstrdup("<root>"),
>
> Very readable code, thank you very much.
>
> I'm a little curious why we use the "<root>" as the init value of
> ".id"?

In this case, we need a valid ID to initialize the bundle list (since it
will add the remote_bundle_info to the hash set), but it is considered
the info "above" all other lists. We could also specify an empty string
here, but we just can't use NULL.

Thanks,
-Stolee
* [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists 2022-08-22 15:12 [PATCH 0/7] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2022-08-22 15:12 ` [PATCH 7/7] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget ` (10 more replies) 7 siblings, 11 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list. The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series). The patches are organized in this way: 1. Patches 1-2 are cleanups from the previous part. The first was recommended by Teng Long and the second allows us to simplify our bundle list data structure slightly. 2. Patches 3-4 create the bundle list data structures and the logic for populating the list from key-value pairs. 3. Patches 5-6 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2]. 4. 
Patch 7 teaches Git to parse Git config files into bundle lists. 5. Patches 8-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. [1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/ [2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/ At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file. As outlined in the design document [1], the next steps after this are: 1. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [2] 2. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [3] I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going: [3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com [4] https://github.com/derrickstolee/git/pull/21 [5] https://github.com/derrickstolee/git/pull/22 Updates in v2 ============= Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version. * I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches. * We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used. * The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series). 
* The bundle.version value is now parsed using git_parse_int(). * The config key is now parsed using parse_config_key(). * Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort. * Some printf()s are correctly changed to fprintf()s. * The test helper CLI is unified across the two modes. They both take a filename now. * The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure. Thanks, * Stolee Derrick Stolee (7): bundle-uri: short-circuit capability parsing bundle-uri: use plain string in find_temp_filename() bundle-uri: create bundle_list struct and helpers bundle-uri: create base key-value pair parsing bundle-uri: parse bundle list in config format bundle-uri: limit recursion depth for bundle lists bundle-uri: fetch a list of bundles Ævar Arnfjörð Bjarmason (2): bundle-uri: create "key=value" line parsing bundle-uri: unit test "key=value" parsing Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 ++ Makefile | 1 + bundle-uri.c | 466 ++++++++++++++++++++++++++++++-- bundle-uri.h | 93 +++++++ config.c | 2 +- config.h | 1 + t/helper/test-bundle-uri.c | 95 +++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5558-clone-bundle-uri.sh | 93 +++++++ t/t5750-bundle-uri-parse.sh | 171 ++++++++++++ t/test-lib-functions.sh | 11 + 13 files changed, 942 insertions(+), 19 deletions(-) create mode 100644 Documentation/config/bundle.txt create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1333 Range-diff vs v1: -: ----------- > 1: 2ca431e6c37 bundle-uri: short-circuit 
capability parsing -: ----------- > 2: ee6c4b824c2 bundle-uri: use plain string in find_temp_filename() 1: c3943888658 ! 3: d9812440594 bundle-uri: create bundle_list struct and helpers @@ bundle-uri.c +static int clear_remote_bundle_info(struct remote_bundle_info *bundle, + void *data) +{ -+ free(bundle->id); -+ free(bundle->uri); -+ strbuf_release(&bundle->file); ++ FREE_AND_NULL(bundle->id); ++ FREE_AND_NULL(bundle->uri); + return 0; +} + @@ bundle-uri.c + return 0; +} - static int find_temp_filename(struct strbuf *name) + static char *find_temp_filename(void) { ## bundle-uri.h ## @@ bundle-uri.h + * if there was no table of contents. + */ + char *uri; -+ -+ /** -+ * If the bundle has been downloaded, then 'file' is a -+ * filename storing its contents. Otherwise, 'file' is -+ * an empty string. -+ */ -+ struct strbuf file; +}; + -+#define REMOTE_BUNDLE_INFO_INIT { \ -+ .file = STRBUF_INIT, \ -+} ++#define REMOTE_BUNDLE_INFO_INIT { 0 } + +enum bundle_list_mode { + BUNDLE_MODE_NONE = 0, 2: 7e4e4656e53 ! 4: 70daef66833 bundle-uri: create base key-value pair parsing @@ Commit message currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. + While parsing, return an error if a URI key is repeated, since we can + make that restriction with bundle lists. + + Make the git_parse_int() method global so we can parse the integer + version value carefully. + Signed-off-by: Derrick Stolee <derrickstolee@github.com> ## Documentation/config.txt ## @@ Documentation/config.txt: include::config/branch.txt[] ## Documentation/config/bundle.txt (new) ## @@ +bundle.*:: -+ The `bundle.*` keys are used when communicating a list of bundle URIs -+ See link:technical/bundle-uri.html[the bundle URI design document] for -+ more details. ++ The `bundle.*` keys may appear in a bundle list file found via the ++ `git clone --bundle-uri` option. 
These keys currently have no effect ++ if placed in a repository config file, though this will change in the ++ future. See link:technical/bundle-uri.html[the bundle URI design ++ document] for more details. + +bundle.version:: + This integer value advertises the version of the bundle list format @@ Documentation/config/bundle.txt (new) + of this `<id>`. This URI may be a bundle file or another bundle list. ## bundle-uri.c ## +@@ + #include "run-command.h" + #include "hashmap.h" + #include "pkt-line.h" ++#include "config.h" + + static int compare_bundles(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *he1, @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, return 0; } @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, +static int bundle_list_update(const char *key, const char *value, + struct bundle_list *list) +{ -+ const char *pkey, *dot; + struct strbuf id = STRBUF_INIT; + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT; + struct remote_bundle_info *bundle; ++ const char *subsection, *subkey; ++ size_t subsection_len; + -+ if (!skip_prefix(key, "bundle.", &pkey)) -+ return 1; ++ if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey)) ++ return -1; + -+ dot = strchr(pkey, '.'); -+ if (!dot) { -+ if (!strcmp(pkey, "version")) { -+ int version = atoi(value); ++ if (!subsection_len) { ++ if (!strcmp(subkey, "version")) { ++ int version; ++ if (!git_parse_int(value, &version)) ++ return -1; + if (version != 1) -+ return 1; ++ return -1; + + list->version = version; + return 0; + } + -+ if (!strcmp(pkey, "mode")) { ++ if (!strcmp(subkey, "mode")) { + if (!strcmp(value, "all")) + list->mode = BUNDLE_MODE_ALL; + else if (!strcmp(value, "any")) + list->mode = BUNDLE_MODE_ANY; + else -+ return 1; ++ return -1; + return 0; + } + @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, + return 0; + } + -+ strbuf_add(&id, pkey, dot - pkey); -+ dot++; ++ strbuf_add(&id, 
subsection, subsection_len); + + /* + * Check for an existing bundle with this <id>, or create one @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) { + CALLOC_ARRAY(bundle, 1); + bundle->id = strbuf_detach(&id, NULL); -+ strbuf_init(&bundle->file, 0); + hashmap_entry_init(&bundle->ent, strhash(bundle->id)); + hashmap_add(&list->bundles, &bundle->ent); + } + strbuf_release(&id); + -+ if (!strcmp(dot, "uri")) { -+ free(bundle->uri); ++ if (!strcmp(subkey, "uri")) { ++ if (bundle->uri) ++ return -1; + bundle->uri = xstrdup(value); + return 0; + } @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, + return 0; +} + - static int find_temp_filename(struct strbuf *name) + static char *find_temp_filename(void) { int fd; + + ## config.c ## +@@ config.c: static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max) + return 0; + } + +-static int git_parse_int(const char *value, int *ret) ++int git_parse_int(const char *value, int *ret) + { + intmax_t tmp; + if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int))) + + ## config.h ## +@@ config.h: int config_with_options(config_fn_t fn, void *, + + int git_parse_ssize_t(const char *, ssize_t *); + int git_parse_ulong(const char *, unsigned long *); ++int git_parse_int(const char *value, int *ret); + + /** + * Same as `git_config_bool`, except that it returns -1 on error rather 3: 49c4f88b6fd ! 5: 4df3f834029 bundle-uri: create "key=value" line parsing @@ Commit message bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. + We are not currently implementing this protocol v2 functionality, but + instead preparing to expose this parsing to be unit-testable. 
+ Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, struct bundle_list *list) { @@ bundle-uri.c: cleanup: - strbuf_release(&filename); + free(filename); return result; } + @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); --#endif +/** + * General API for {transport,connect}.c etc. + */ @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list, +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + -+#endif /* BUNDLE_URI_H */ + #endif 4: 7580e1f09af ! 6: 91c5b58f011 bundle-uri: unit test "key=value" parsing @@ Commit message in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses - stdin as a list of lines. These are fed into bundle_uri_parse_line() to - test how we construct a 'struct bundle_list' from that data. The list is - then output to stdout as if the key-value pairs were a Git config file. + an input file as a list of lines. These are fed into + bundle_uri_parse_line() to test how we construct a 'struct bundle_list' + from that data. The list is then output to stdout as if the key-value + pairs were a Git config file. + + We use an input file instead of stdin because of a future change to + parse in config-file format that works better as an input file. 
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, + mode = "<unknown>"; + } + -+ printf("[bundle]\n"); -+ printf("\tversion = %d\n", list->version); -+ printf("\tmode = %s\n", mode); ++ fprintf(fp, "[bundle]\n"); ++ fprintf(fp, "\tversion = %d\n", list->version); ++ fprintf(fp, "\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} @@ t/helper/test-bundle-uri.c (new) +#include "strbuf.h" +#include "string-list.h" + -+static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) ++static int cmd__bundle_uri_parse(int argc, const char **argv) +{ -+ const char *usage[] = { -+ "test-tool bundle-uri parse-key-values <in", ++ const char *key_value_usage[] = { ++ "test-tool bundle-uri parse-key-values <input>", + NULL + }; ++ const char **usage = key_value_usage; + struct option options[] = { + OPT_END(), + }; + struct strbuf sb = STRBUF_INIT; + struct bundle_list list; + int err = 0; ++ FILE *fp; + + argc = parse_options(argc, argv, NULL, options, usage, 0); -+ if (argc) ++ if (argc != 1) + goto usage; + + init_bundle_list(&list); -+ while (strbuf_getline(&sb, stdin) != EOF) { -+ if (bundle_uri_parse_line(&list, sb.buf) < 0) ++ fp = fopen(argv[0], "r"); ++ if (!fp) ++ die("failed to open '%s'", argv[0]); ++ ++ while (strbuf_getline(&sb, fp) != EOF) { ++ if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + strbuf_release(&sb); ++ fclose(fp); + + print_bundle_list(stdout, &list); + @@ t/helper/test-bundle-uri.c (new) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) -+ return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); ++ return cmd__bundle_uri_parse(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: @@ t/t5750-bundle-uri-parse.sh (new) + uri = file:///usr/share/git/bundle.bdl + EOF + -+ test-tool 
bundle-uri parse-key-values <in >actual 2>err && ++ test-tool bundle-uri parse-key-values in >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' @@ t/t5750-bundle-uri-parse.sh (new) + mode = all + EOF + -+ test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err && ++ test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' @@ t/t5750-bundle-uri-parse.sh (new) + uri = file:///usr/share/git/bundle.bdl + EOF + -+ test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err && ++ test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && ++ test_cmp err.expect err && ++ test_cmp_config_output expect actual ++' ++ ++test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' ' ++ cat >in <<-\EOF && ++ bundle.one.uri=http://example.com/bundle.bdl ++ bundle.two.uri=https://example.com/bundle.bdl ++ bundle.one.uri=https://example.com/bundle-2.bdl ++ bundle.three.uri=file:///usr/share/git/bundle.bdl ++ EOF ++ ++ cat >err.expect <<-\EOF && ++ error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\'' ++ EOF ++ ++ # We fail, but try to continue parsing regardless ++ cat >expect <<-\EOF && ++ [bundle] ++ version = 1 ++ mode = all ++ [bundle "one"] ++ uri = http://example.com/bundle.bdl ++ [bundle "two"] ++ uri = https://example.com/bundle.bdl ++ [bundle "three"] ++ uri = file:///usr/share/git/bundle.bdl ++ EOF ++ ++ test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' 5: 1d1bd9c7103 ! 7: 1492b8f5ef0 bundle-uri: parse bundle list in config format @@ Commit message our expected "key=value" pairs and instead format them using Git config format. 
- Create parse_bundle_list_in_config_format() to parse a file in config - format and convert that into a 'struct bundle_list' filled with its + Create bundle_uri_parse_config_format() to parse a file in config format + and convert that into a 'struct bundle_list' filled with its understanding of the contents. - Be careful to call git_config_from_file_with_options() because the - default action for git_config_from_file() is to die() on a parsing - error. The current warning isn't particularly helpful if it arises to a - user, but it will be made more verbose at a higher layer later. + Be careful to use error_action CONFIG_ERROR_ERROR when calling + git_config_from_file_with_options() because the default action for + git_config_from_file() is to die() on a parsing error. The current + warning isn't particularly helpful if it arises to a user, but it will + be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce - bugs than this simpler version. I would rather have a slightly confusing - test helper than complicated product code. + bugs than this simpler version. 
Signed-off-by: Derrick Stolee <derrickstolee@github.com> ## bundle-uri.c ## -@@ - #include "run-command.h" - #include "hashmap.h" - #include "pkt-line.h" -+#include "config.h" - - static int compare_bundles(const void *hashmap_cmp_fn_data, - const struct hashmap_entry *he1, @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value, return 0; } @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value, + return bundle_list_update(key, value, list); +} + -+int parse_bundle_list_in_config_format(const char *uri, -+ const char *filename, -+ struct bundle_list *list) ++int bundle_uri_parse_config_format(const char *uri, ++ const char *filename, ++ struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + -+ list->mode = BUNDLE_MODE_NONE; + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value, + return result; +} + - static int find_temp_filename(struct strbuf *name) + static char *find_temp_filename(void) { int fd; @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list, + * pairs are provided in config file format. This method is + * exposed publicly for testing purposes. 
+ */ -+ -+int parse_bundle_list_in_config_format(const char *uri, -+ const char *filename, -+ struct bundle_list *list); ++int bundle_uri_parse_config_format(const char *uri, ++ const char *filename, ++ struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found @@ t/helper/test-bundle-uri.c #include "strbuf.h" #include "string-list.h" --static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) +-static int cmd__bundle_uri_parse(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, @@ t/helper/test-bundle-uri.c + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { -- const char *usage[] = { -+ const char *key_value_usage[] = { - "test-tool bundle-uri parse-key-values <in", + const char *key_value_usage[] = { + "test-tool bundle-uri parse-key-values <input>", NULL }; + const char *config_usage[] = { + "test-tool bundle-uri parse-config <input>", + NULL + }; + const char **usage = key_value_usage; struct option options[] = { OPT_END(), - }; -+ const char **usage = key_value_usage; - struct strbuf sb = STRBUF_INIT; - struct bundle_list list; +@@ t/helper/test-bundle-uri.c: static int cmd__bundle_uri_parse(int argc, const char **argv) int err = 0; + FILE *fp; - argc = parse_options(argc, argv, NULL, options, usage, 0); -- if (argc) +- if (argc != 1) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; @@ t/helper/test-bundle-uri.c + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); -- while (strbuf_getline(&sb, stdin) != EOF) { -- if (bundle_uri_parse_line(&list, sb.buf) < 0) +- fp = fopen(argv[0], "r"); +- if (!fp) +- die("failed to open '%s'", argv[0]); + +- while (strbuf_getline(&sb, fp) != EOF) { +- if (bundle_uri_parse_line(&list, sb.buf)) - err = error("bad line: '%s'", sb.buf); -+ + switch (mode) { + case KEY_VALUE_PAIRS: -+ if (argc) ++ if (argc != 1) + goto usage; -+ while (strbuf_getline(&sb, stdin) != EOF) { -+ if 
(bundle_uri_parse_line(&list, sb.buf) < 0) ++ fp = fopen(argv[0], "r"); ++ if (!fp) ++ die("failed to open '%s'", argv[0]); ++ while (strbuf_getline(&sb, fp) != EOF) { ++ if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } ++ fclose(fp); + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; -+ err = parse_bundle_list_in_config_format("<uri>", argv[0], &list); ++ err = bundle_uri_parse_config_format("<uri>", argv[0], &list); + break; } strbuf_release(&sb); +- fclose(fp); + + print_bundle_list(stdout, &list); @@ t/helper/test-bundle-uri.c: int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) -- return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); +- return cmd__bundle_uri_parse(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); @@ t/helper/test-bundle-uri.c: int cmd__bundle_uri(int argc, const char **argv) usage: ## t/t5750-bundle-uri-parse.sh ## -@@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' +@@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines test_cmp_config_output expect actual ' @@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsin + cat >expect <<-\EOF && + [bundle] + version = 1 -+ mode = <unknown> ++ mode = all + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && @@ t/t5750-bundle-uri-parse.sh: test_expect_success 'bundle_uri_parse_line() parsin + EOF + + cat >err2 <<-EOF && -+ warning: bundle list at '\''<uri>'\'' has no mode ++ error: bad config line 1 in file in2 + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && 6: 039e172849c ! 
8: b5d570082fa bundle-uri: limit recursion depth for bundle lists @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + int depth) { int result = 0; - struct strbuf filename = STRBUF_INIT; + char *filename; + if (depth >= max_bundle_uri_depth) { + warning(_("exceeded bundle URI recursion limit (%d)"), @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + return -1; + } + - if ((result = find_temp_filename(&filename))) + if (!(filename = find_temp_filename())) { + result = -1; goto cleanup; - @@ bundle-uri.c: cleanup: return result; } 7: 7b45c06cc9e ! 9: a6ab8f7c699 bundle-uri: fetch a list of bundles @@ Commit message Signed-off-by: Derrick Stolee <derrickstolee@github.com> ## bundle-uri.c ## -@@ bundle-uri.c: void init_bundle_list(struct bundle_list *list) - static int clear_remote_bundle_info(struct remote_bundle_info *bundle, - void *data) +@@ bundle-uri.c: static int clear_remote_bundle_info(struct remote_bundle_info *bundle, { -- free(bundle->id); -- free(bundle->uri); -+ FREE_AND_NULL(bundle->id); -+ FREE_AND_NULL(bundle->uri); - strbuf_release(&bundle->file); + FREE_AND_NULL(bundle->id); + FREE_AND_NULL(bundle->uri); ++ FREE_AND_NULL(bundle->file); + bundle->unbundled = 0; return 0; } @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ ++ int res; + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + -+ ctx->count++; -+ return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); ++ res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); ++ ++ /* ++ * Only increment count if the download succeeded. If our mode is ++ * BUNDLE_MODE_ANY, then we will want to try other URIs in the ++ * list in case they work instead. 
++ */ ++ if (!res) ++ ctx->count++; ++ return res; +} + +static int download_bundle_list(struct repository *r, @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + + init_bundle_list(&list_from_bundle); + -+ if ((result = parse_bundle_list_in_config_format(bundle->uri, -+ bundle->file.buf, -+ &list_from_bundle))) ++ if ((result = bundle_uri_parse_config_format(bundle->uri, ++ bundle->file, ++ &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + struct bundle_list *list) { int result = 0; -- struct strbuf filename = STRBUF_INIT; +- char *filename; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, return -1; } -- if ((result = find_temp_filename(&filename))) -+ if (!bundle->file.len && -+ (result = find_temp_filename(&bundle->file))) +- if (!(filename = find_temp_filename())) { ++ if (!bundle->file && ++ !(bundle->file = find_temp_filename())) { + result = -1; goto cleanup; + } -- if ((result = copy_uri_to_file(filename.buf, uri))) { +- if ((result = copy_uri_to_file(filename, uri))) { - warning(_("failed to download bundle from URI '%s'"), uri); -+ if ((result = copy_uri_to_file(bundle->file.buf, bundle->uri))) { ++ if ((result = copy_uri_to_file(bundle->file, bundle->uri))) { + warning(_("failed to download bundle from URI '%s'"), bundle->uri); goto cleanup; } -- if ((result = !is_bundle(filename.buf, 0))) { +- if ((result = !is_bundle(filename, 0))) { - warning(_("file at URI '%s' is not a bundle"), uri); -+ if ((result = !is_bundle(bundle->file.buf, 1))) { ++ if ((result = !is_bundle(bundle->file, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); + if (result) @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, goto cleanup; } -- if ((result = 
unbundle_from_file(r, filename.buf))) { +- if ((result = unbundle_from_file(r, filename))) { - warning(_("failed to unbundle bundle from URI '%s'"), uri); - goto cleanup; - } + /* Copy the bundle and insert it into the global list. */ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); -+ strbuf_init(&bcopy->file, 0); -+ strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len); ++ bcopy->file = xstrdup(bundle->file); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: -- unlink(filename.buf); -- strbuf_release(&filename); -+ if (result) -+ unlink(bundle->file.buf); +- if (filename) +- unlink(filename); +- free(filename); ++ if (result && bundle->file) ++ unlink(bundle->file); return result; } @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, +{ + struct attempt_unbundle_context *ctx = data; + -+ if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) { ++ if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) { + ctx->success_count++; + info->unbundled = 1; + } else { @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ -+ if (info->file.buf) -+ unlink_or_warn(info->file.buf); ++ if (info->file) ++ unlink_or_warn(info->file); + return 0; +} + @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), -+ .id = xstrdup("<root>"), -+ .file = STRBUF_INIT, ++ .id = xstrdup(""), + }; + + init_bundle_list(&list); @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, ## bundle-uri.h ## @@ bundle-uri.h: struct remote_bundle_info { - * an empty string. + * if there was no table of contents. */ - struct strbuf file; + char *uri; ++ ++ /** ++ * If the bundle has been downloaded, then 'file' is a ++ * filename storing its contents. 
Otherwise, 'file' is ++ * NULL. ++ */ ++ char *file; + + /** + * If the bundle has been unbundled successfully, then @@ bundle-uri.h: struct remote_bundle_info { + unsigned unbundled:1; }; - #define REMOTE_BUNDLE_INFO_INIT { \ + #define REMOTE_BUNDLE_INFO_INIT { 0 } ## t/t5558-clone-bundle-uri.sh ## @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' ' -- gitgitgadget ^ permalink raw reply [flat|nested] 94+ messages in thread
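The range-diff above describes three parsing rules: keys are split into "bundle", an optional <id>, and a subkey; bundle.version must parse cleanly as the integer 1; and a repeated bundle.<id>.uri is an error. A minimal sketch of those rules follows. It is not the code from the series: every sketch_* name is hypothetical, fixed-size arrays stand in for Git's hashmap, strtol() stands in for git_parse_int() (which also accepts unit suffixes), and rejecting an empty URI value is an extra assumption made only so that an empty string can mean "not set".

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_BUNDLES 8
#define SKETCH_MAX_STR 128

enum sketch_mode { SKETCH_MODE_NONE = 0, SKETCH_MODE_ALL, SKETCH_MODE_ANY };

struct sketch_bundle {
	char id[SKETCH_MAX_STR];
	char uri[SKETCH_MAX_STR]; /* "" means no URI recorded yet */
};

struct sketch_list {
	int version;
	enum sketch_mode mode;
	struct sketch_bundle bundles[SKETCH_MAX_BUNDLES];
	size_t nr;
};

/* Strict decimal parse: rejects empty input, trailing junk and overflow. */
static int sketch_parse_int(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end || errno || v < INT_MIN || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/* Look up the bundle with this id, adding an empty one if it is new. */
static struct sketch_bundle *sketch_get_bundle(struct sketch_list *list,
					       const char *id, size_t len)
{
	size_t i;

	assert(list->nr <= SKETCH_MAX_BUNDLES);
	for (i = 0; i < list->nr; i++)
		if (strlen(list->bundles[i].id) == len &&
		    !memcmp(list->bundles[i].id, id, len))
			return &list->bundles[i];
	if (list->nr == SKETCH_MAX_BUNDLES || len >= SKETCH_MAX_STR)
		return NULL;
	memcpy(list->bundles[list->nr].id, id, len);
	list->bundles[list->nr].id[len] = '\0';
	list->bundles[list->nr].uri[0] = '\0';
	return &list->bundles[list->nr++];
}

/* Apply one "key=value" line; return 0 on success, -1 if it is rejected. */
static int sketch_parse_line(struct sketch_list *list, const char *line)
{
	const char *eq = strchr(line, '=');
	const char *value, *rest, *dot;
	char key[SKETCH_MAX_STR];
	struct sketch_bundle *b;
	size_t key_len;

	if (!eq)
		return -1;
	key_len = (size_t)(eq - line);
	if (key_len >= sizeof(key))
		return -1;
	memcpy(key, line, key_len);
	key[key_len] = '\0';
	value = eq + 1;

	if (strncmp(key, "bundle.", 7))
		return -1;
	rest = key + 7;
	dot = strrchr(rest, '.'); /* the subkey follows the last dot */

	if (!dot) { /* list-wide key, no <id> */
		if (!strcmp(rest, "version")) {
			int version;
			if (sketch_parse_int(value, &version) || version != 1)
				return -1;
			list->version = version;
		} else if (!strcmp(rest, "mode")) {
			if (!strcmp(value, "all"))
				list->mode = SKETCH_MODE_ALL;
			else if (!strcmp(value, "any"))
				list->mode = SKETCH_MODE_ANY;
			else
				return -1;
		}
		return 0; /* other list-wide keys are ignored in this sketch */
	}

	if (dot == rest)
		return -1; /* empty <id> */
	b = sketch_get_bundle(list, rest, (size_t)(dot - rest));
	if (!b)
		return -1;
	if (!strcmp(dot + 1, "uri")) {
		if (b->uri[0] || !*value || strlen(value) >= sizeof(b->uri))
			return -1; /* repeated, empty, or oversized URI */
		strcpy(b->uri, value);
	}
	return 0;
}
```

Feeding it the lines from the "duplicate lines" test above would accept the first bundle.one.uri, reject the second, and leave the first URI in place, which mirrors the "fail, but keep parsing" behaviour the test expects.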
* [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget
  2022-09-09 17:24 ` Junio C Hamano
  2022-09-09 14:33 ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  ` (9 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When parsing the capability lines from the 'git remote-https' process,
we can stop reading the lines once we notice the 'get' capability.

Reported-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..7173ed065e9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	while (!strbuf_getline(&line, child_out)) {
 		if (!line.len)
 			break;
-		if (!strcmp(line.buf, "get"))
+		if (!strcmp(line.buf, "get")) {
 			found_get = 1;
+			break;
+		}
 	}
 	strbuf_release(&line);
 
-- 
gitgitgadget
* Re: [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 14:33 ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
@ 2022-09-09 17:24 ` Junio C Hamano
  2022-09-19 17:55 ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-09-09 17:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When parsing the capability lines from the 'git remote-https' process,
> we can stop reading the lines once we notice the 'get' capability.
>
> Reported-by: Teng Long <dyroneteng@gmail.com>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 4a8cc74ed05..7173ed065e9 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
>  	while (!strbuf_getline(&line, child_out)) {
>  		if (!line.len)
>  			break;
> -		if (!strcmp(line.buf, "get"))
> +		if (!strcmp(line.buf, "get")) {
>  			found_get = 1;
> +			break;
> +		}
>  	}

Hmph, is this safe to do? Who is feeding child_out? Won't they
get upset if we do not slurp what they write to us? Are we
expecting to read more from them after this part? Won't we get
upset if we leave some other stuff when we read from child_out after
we saw "get"? If we respond to child_in without reading all from
them, do we not get into a deadlock?

Perhaps these are all silly questions, but the description above
does not quite answer them.

>  	strbuf_release(&line);
* Re: [PATCH v2 1/9] bundle-uri: short-circuit capability parsing
  2022-09-09 17:24 ` Junio C Hamano
@ 2022-09-19 17:55 ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-09-19 17:55 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 9/9/2022 1:24 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When parsing the capability lines from the 'git remote-https' process,
>> we can stop reading the lines once we notice the 'get' capability.
>>
>> Reported-by: Teng Long <dyroneteng@gmail.com>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  bundle-uri.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index 4a8cc74ed05..7173ed065e9 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -56,8 +56,10 @@ static int download_https_uri_to_file(const char *file, const char *uri)
>>  	while (!strbuf_getline(&line, child_out)) {
>>  		if (!line.len)
>>  			break;
>> -		if (!strcmp(line.buf, "get"))
>> +		if (!strcmp(line.buf, "get")) {
>>  			found_get = 1;
>> +			break;
>> +		}
>>  	}
>
> Hmph, is this safe to do? Who is feeding child_out? Won't they
> get upset if we do not slurp what they write to us? Are we
> expecting to read more from them after this part? Won't we get
> upset if we leave some other stuff when we read from child_out after
> we saw "get"? If we respond to child_in without reading all from
> them, do we not get into a deadlock?
>
> Perhaps these are all silly questions, but the description above
> does not quite answer them.

In my testing, this has not been a problem, but that does not mean
that it is safe to do. I'll drop this patch in v3.

Thanks,
-Stolee
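One of the questions in this subthread is whether capability lines left unread would be mistaken for something else by a later read from the same stream. The following is a minimal sketch of the conservative reader, the one that keeps consuming lines until the blank terminator even after it has seen "get". It uses plain stdio in place of Git's strbuf_getline(), the function name is hypothetical, and it assumes no capability line is longer than the fixed buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read capability lines until the blank terminator line (or EOF).
 * Returns 1 if "get" was among them and 0 otherwise. The stream is
 * left positioned just past the terminator, so a later read starts
 * with whatever the other side sends next.
 */
static int sketch_read_capabilities(FILE *in)
{
	char line[256];
	int found_get = 0;

	assert(in != NULL);
	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0'; /* drop the newline */
		if (!*line)
			break;		/* a blank line ends the list */
		if (!strcmp(line, "get"))
			found_get = 1;	/* remember it, but keep reading */
	}
	return found_get;
}
```

With an early break after "get", any capabilities listed after it and the blank terminator would still be sitting in the stream for the next reader, which is the situation the review asks about.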
* [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-09-09 14:33 ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget
@ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget
  2022-09-09 17:56 ` Junio C Hamano
  2022-09-09 14:33 ` [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
  ` (8 subsequent siblings)
  10 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we never need to modify this
filename directly later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 7173ed065e9..c52b2a2a64a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -143,28 +144,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget
* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 14:33 ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-09-09 17:56 ` Junio C Hamano
  2022-09-19 17:54 ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Junio C Hamano @ 2022-09-09 17:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
> create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
> create a temporary filename. The odb_mkstemp() method uses a strbuf in
> its interface, but we do not need to continue carrying a strbuf
> throughout the bundle URI code.

What the patch does is not wrong per se, but it is unfortunate that,
even though we accepted a known-to-be-racy approach for expediency
earlier, the first update to that is not to replace it with a
non-racy and safe approach, but to make it easier to use, encouraging
use of the racy approach and giving it the appearance of clean code
X-<.
* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
  2022-09-09 17:56 ` Junio C Hamano
@ 2022-09-19 17:54 ` Derrick Stolee
  2022-09-19 18:16 ` Junio C Hamano
  0 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee @ 2022-09-19 17:54 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long

On 9/9/2022 1:56 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
>> create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
>> create a temporary filename. The odb_mkstemp() method uses a strbuf in
>> its interface, but we do not need to continue carrying a strbuf
>> throughout the bundle URI code.
>
> What the patch does is not wrong per se, but it is unfortunate that,
> even though we accepted a known-to-be-racy approach for expediency
> earlier, the first update to that is not to replace it with a
> non-racy and safe approach, but to make it easier to use, encouraging
> use of the racy approach and giving it the appearance of clean code
> X-<.

Hopefully you would be encouraged by future efforts to replace this
temporary file name with something deterministic based on the URI so
we can restart downloads that were halted, even if the process needs
to restart. But for now, this change helps us to remove the strbuf
from the remote_bundle_info struct.

Thanks,
-Stolee
* Re: [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename()
From: Junio C Hamano @ 2022-09-19 18:16 UTC
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
    mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> ... something deterministic based on the URI so
> we can restart downloads that were halted, even if the process needs
> to restart. But for now, this change helps us to remove the strbuf
> from the remote_bundle_info struct.

Yay for a bright future ;-)

Thanks.
* [PATCH v2 3/9] bundle-uri: create bundle_list struct and helpers 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 1/9] bundle-uri: short-circuit capability parsing Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 2/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget ` (7 subsequent siblings) 10 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> It will likely be rare where a user uses a single bundle URI and expects that URI to point to a bundle. Instead, that URI will likely be a list of bundles provided in some format. Alternatively, the Git server could advertise a list of bundles. In anticipation of these two ways of advertising multiple bundles, create a data structure that represents such a list. This will be populated using a common API, but for now focus on what data can be represented. Each list contains a number of remote_bundle_info structs. These contain an 'id' that is used to uniquely identify them in the list, and also a 'uri' that contains the location of its data. Finally, there is a strbuf containing the filename used when Git downloads the contents to disk. The list itself stores these remote_bundle_info structs in a hashtable using 'id' as the key. The order of the structs in the input is considered unimportant, but future modifications to the format and these data structures will place ordering possibilities on the set. 
The list also has a few "global" properties, including the version (used when parsing the list) and the mode. The mode is one of these two options: 1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined together. The client should download all of the advertised data to have a complete copy of the data. 2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete copy of the data. The client can choose arbitrarily from these options. In the future, the client may use pings to find the closest URI among geodistributed replicas, or use some other heuristic information added to the format. This API is currently unused, but will soon be expanded with parsing logic and then be consumed by the bundle URI download logic. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++ bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) diff --git a/bundle-uri.c b/bundle-uri.c index c52b2a2a64a..7a0bada6eda 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -4,6 +4,66 @@ #include "object-store.h" #include "refs.h" #include "run-command.h" +#include "hashmap.h" +#include "pkt-line.h" + +static int compare_bundles(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *he1, + const struct hashmap_entry *he2, + const void *id) +{ + const struct remote_bundle_info *e1 = + container_of(he1, const struct remote_bundle_info, ent); + const struct remote_bundle_info *e2 = + container_of(he2, const struct remote_bundle_info, ent); + + return strcmp(e1->id, id ? (const char *)id : e2->id); +} + +void init_bundle_list(struct bundle_list *list) +{ + memset(list, 0, sizeof(*list)); + + /* Implied defaults. 
*/ + list->mode = BUNDLE_MODE_ALL; + list->version = 1; + + hashmap_init(&list->bundles, compare_bundles, NULL, 0); +} + +static int clear_remote_bundle_info(struct remote_bundle_info *bundle, + void *data) +{ + FREE_AND_NULL(bundle->id); + FREE_AND_NULL(bundle->uri); + return 0; +} + +void clear_bundle_list(struct bundle_list *list) +{ + if (!list) + return; + + for_all_bundles_in_list(list, clear_remote_bundle_info, NULL); + hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent); +} + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data) +{ + struct remote_bundle_info *info; + struct hashmap_iter i; + + hashmap_for_each_entry(&list->bundles, &i, info, ent) { + int result = iter(info, data); + + if (result) + return result; + } + + return 0; +} static char *find_temp_filename(void) { diff --git a/bundle-uri.h b/bundle-uri.h index 8a152f1ef14..ff7e3fd3fb2 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -1,7 +1,63 @@ #ifndef BUNDLE_URI_H #define BUNDLE_URI_H +#include "hashmap.h" +#include "strbuf.h" + struct repository; +struct string_list; + +/** + * The remote_bundle_info struct contains information for a single bundle + * URI. This may be initialized simply by a given URI or might have + * additional metadata associated with it if the bundle was advertised by + * a bundle list. + */ +struct remote_bundle_info { + struct hashmap_entry ent; + + /** + * The 'id' is a name given to the bundle for reference + * by other bundle infos. + */ + char *id; + + /** + * The 'uri' is the location of the remote bundle so + * it can be downloaded on-demand. This will be NULL + * if there was no table of contents. 
+ */ + char *uri; +}; + +#define REMOTE_BUNDLE_INFO_INIT { 0 } + +enum bundle_list_mode { + BUNDLE_MODE_NONE = 0, + BUNDLE_MODE_ALL, + BUNDLE_MODE_ANY +}; + +/** + * A bundle_list contains an unordered set of remote_bundle_info structs, + * as well as information about the bundle listing, such as version and + * mode. + */ +struct bundle_list { + int version; + enum bundle_list_mode mode; + struct hashmap bundles; +}; + +void init_bundle_list(struct bundle_list *list); +void clear_bundle_list(struct bundle_list *list); + +typedef int (*bundle_iterator)(struct remote_bundle_info *bundle, + void *data); + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data); /** * Fetch data from the given 'uri' and unbundle the bundle data found -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
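For readers who want to experiment with the shape of this API outside of git.git, here is a minimal standalone sketch. It is not the code from the patch: it swaps Git's hashmap for a small fixed-size array, and every name in it (toy_bundle, toy_list, toy_list_add, toy_for_all, count_with_uri, TOY_MAX) is hypothetical. It only tries to mirror two behaviours described above: the implied defaults set at init time (version 1, mode "all"), and the iterator convention in which a non-zero callback result stops the walk and is handed back to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for bundle_list_mode and remote_bundle_info. */
enum toy_mode { TOY_MODE_NONE = 0, TOY_MODE_ALL, TOY_MODE_ANY };

struct toy_bundle {
	const char *id;   /* unique within the list */
	const char *uri;  /* may be NULL until a "uri" key is seen */
};

#define TOY_MAX 8

struct toy_list {
	int version;
	enum toy_mode mode;
	size_t nr;
	struct toy_bundle items[TOY_MAX];
};

typedef int (*toy_iterator)(struct toy_bundle *bundle, void *data);

/* Same implied defaults as the patch: version 1, mode "all". */
void toy_list_init(struct toy_list *list)
{
	memset(list, 0, sizeof(*list));
	list->mode = TOY_MODE_ALL;
	list->version = 1;
}

/* Returns 0 on success, -1 if the id is already present or the list is full. */
int toy_list_add(struct toy_list *list, const char *id, const char *uri)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].id, id))
			return -1;
	if (list->nr == TOY_MAX)
		return -1;
	list->items[list->nr].id = id;
	list->items[list->nr].uri = uri;
	list->nr++;
	return 0;
}

/* Visit each item; the first non-zero callback result ends the walk. */
int toy_for_all(struct toy_list *list, toy_iterator iter, void *data)
{
	size_t i;

	assert(list && iter);
	for (i = 0; i < list->nr; i++) {
		int result = iter(&list->items[i], data);

		if (result)
			return result;
	}
	return 0;
}

/* Example callback: count items, but stop with 1 at the first one lacking a URI. */
int count_with_uri(struct toy_bundle *bundle, void *data)
{
	size_t *count = data;

	if (!bundle->uri)
		return 1;
	(*count)++;
	return 0;
}
```

The array keeps insertion order, which the real hashmap does not promise; the commit message calls the order of the structs unimportant, so callers of the real API should not rely on it.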
* [PATCH v2 4/9] bundle-uri: create base key-value pair parsing
From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
    Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this mode
  is not understood, then Git will ignore the list. This mode indicates
  whether Git needs all of the bundle list items to make a complete view
  of the content or if any single item is sufficient.
The rest of the keys use a bundle identifier "<id>" as part of the key name. Keys using the same "<id>" describe a single bundle list item. * bundle.<id>.uri: This stores the URI of the bundle item. This currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. While parsing, return an error if a URI key is repeated, since we can make that restriction with bundle lists. Make the git_parse_int() method global so we can parse the integer version value carefully. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 +++++++++++ bundle-uri.c | 76 +++++++++++++++++++++++++++++++++ config.c | 2 +- config.h | 1 + 5 files changed, 104 insertions(+), 1 deletion(-) create mode 100644 Documentation/config/bundle.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index e376d547ce0..4280af6992e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -387,6 +387,8 @@ include::config/branch.txt[] include::config/browser.txt[] +include::config/bundle.txt[] + include::config/checkout.txt[] include::config/clean.txt[] diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt new file mode 100644 index 00000000000..daa21eb674a --- /dev/null +++ b/Documentation/config/bundle.txt @@ -0,0 +1,24 @@ +bundle.*:: + The `bundle.*` keys may appear in a bundle list file found via the + `git clone --bundle-uri` option. These keys currently have no effect + if placed in a repository config file, though this will change in the + future. See link:technical/bundle-uri.html[the bundle URI design + document] for more details. + +bundle.version:: + This integer value advertises the version of the bundle list format + used by the bundle list. Currently, the only accepted value is `1`. + +bundle.mode:: + This string value should be either `all` or `any`. 
This value describes + whether all of the advertised bundles are required to unbundle a + complete understanding of the bundled information (`all`) or if any one + of the listed bundle URIs is sufficient (`any`). + +bundle.<id>.*:: + The `bundle.<id>.*` keys are used to describe a single item in the + bundle list, grouped under `<id>` for identification purposes. + +bundle.<id>.uri:: + This string value defines the URI by which Git can reach the contents + of this `<id>`. This URI may be a bundle file or another bundle list. diff --git a/bundle-uri.c b/bundle-uri.c index 7a0bada6eda..4ccd14c8936 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -6,6 +6,7 @@ #include "run-command.h" #include "hashmap.h" #include "pkt-line.h" +#include "config.h" static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, @@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +/** + * Given a key-value pair, update the state of the given bundle list. + * Returns 0 if the key-value pair is understood. Returns 1 if the key + * is not understood or the value is malformed. + */ +MAYBE_UNUSED +static int bundle_list_update(const char *key, const char *value, + struct bundle_list *list) +{ + struct strbuf id = STRBUF_INIT; + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT; + struct remote_bundle_info *bundle; + const char *subsection, *subkey; + size_t subsection_len; + + if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey)) + return -1; + + if (!subsection_len) { + if (!strcmp(subkey, "version")) { + int version; + if (!git_parse_int(value, &version)) + return -1; + if (version != 1) + return -1; + + list->version = version; + return 0; + } + + if (!strcmp(subkey, "mode")) { + if (!strcmp(value, "all")) + list->mode = BUNDLE_MODE_ALL; + else if (!strcmp(value, "any")) + list->mode = BUNDLE_MODE_ANY; + else + return -1; + return 0; + } + + /* Ignore other unknown global keys. 
*/ + return 0; + } + + strbuf_add(&id, subsection, subsection_len); + + /* + * Check for an existing bundle with this <id>, or create one + * if necessary. + */ + lookup.id = id.buf; + hashmap_entry_init(&lookup.ent, strhash(lookup.id)); + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) { + CALLOC_ARRAY(bundle, 1); + bundle->id = strbuf_detach(&id, NULL); + hashmap_entry_init(&bundle->ent, strhash(bundle->id)); + hashmap_add(&list->bundles, &bundle->ent); + } + strbuf_release(&id); + + if (!strcmp(subkey, "uri")) { + if (bundle->uri) + return -1; + bundle->uri = xstrdup(value); + return 0; + } + + /* + * At this point, we ignore any information that we don't + * understand, assuming it to be hints for a heuristic the client + * does not currently understand. + */ + return 0; +} + static char *find_temp_filename(void) { int fd; diff --git a/config.c b/config.c index 015bec360f5..e93101249f6 100644 --- a/config.c +++ b/config.c @@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max) return 0; } -static int git_parse_int(const char *value, int *ret) +int git_parse_int(const char *value, int *ret) { intmax_t tmp; if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int))) diff --git a/config.h b/config.h index ca994d77147..ef9eade6414 100644 --- a/config.h +++ b/config.h @@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *, int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); +int git_parse_int(const char *value, int *ret); /** * Same as `git_config_bool`, except that it returns -1 on error rather -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH v2 4/9] bundle-uri: create base key-value pair parsing
From: Jonathan Tan @ 2022-09-29 21:49 UTC
To: Derrick Stolee via GitGitGadget
Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham,
    steadmon, Glen Choo, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  	return 0;
>  }
>
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{

[snip]

> +	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
> +		return -1;

The comment at the top should say -1 instead of 1.
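To make the key handling in this patch easier to follow, here is a small standalone sketch of how a key such as "bundle.one.uri" can be split into a subsection ("one") and a subkey ("uri"), and then sorted into the cases bundle_list_update() distinguishes. This is an illustration only: toy_parse_key() is a hypothetical stand-in modelled loosely on Git's parse_config_key(), not a copy of it, and toy_classify_key() and the toy_key_kind values do not exist in Git. Failure is reported as -1, in line with the return statements in the patch and the correction requested in the reply above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for parse_config_key(): split
 * "<section>[.<subsection>].<subkey>" at the first and the last dot.
 * Returns 0 on success, -1 if 'key' does not start with "<section>.".
 * On success *subsection points into 'key' (not NUL-terminated) or is
 * NULL when there is no subsection.
 */
int toy_parse_key(const char *key, const char *section,
		  const char **subsection, size_t *subsection_len,
		  const char **subkey)
{
	size_t section_len = strlen(section);
	const char *first_dot, *last_dot;

	assert(key && subsection && subsection_len && subkey);

	if (strncmp(key, section, section_len) || key[section_len] != '.')
		return -1;

	first_dot = key + section_len;
	/* Search from the end: a subsection may itself contain dots. */
	last_dot = strrchr(first_dot, '.');
	*subkey = last_dot + 1;

	if (last_dot == first_dot) {
		*subsection = NULL;
		*subsection_len = 0;
	} else {
		*subsection = first_dot + 1;
		*subsection_len = (size_t)(last_dot - *subsection);
	}
	return 0;
}

/* Illustrative classification; these names are not part of Git. */
enum toy_key_kind {
	TOY_KEY_BAD = -1,    /* not a "bundle." key at all */
	TOY_KEY_IGNORED = 0, /* well-formed, but unknown to this client */
	TOY_KEY_VERSION,
	TOY_KEY_MODE,
	TOY_KEY_URI
};

enum toy_key_kind toy_classify_key(const char *key)
{
	const char *subsection, *subkey;
	size_t subsection_len;

	if (toy_parse_key(key, "bundle", &subsection, &subsection_len, &subkey))
		return TOY_KEY_BAD;

	if (!subsection_len) {
		if (!strcmp(subkey, "version"))
			return TOY_KEY_VERSION;
		if (!strcmp(subkey, "mode"))
			return TOY_KEY_MODE;
		return TOY_KEY_IGNORED;
	}

	if (!strcmp(subkey, "uri"))
		return TOY_KEY_URI;
	return TOY_KEY_IGNORED;
}
```

Unknown keys are classified as ignored rather than bad, which matches the patch's stated intent of treating them as hints for heuristics the client does not yet understand.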
* [PATCH v2 5/9] bundle-uri: create "key=value" line parsing 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 4/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget ` (5 subsequent siblings) 10 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> When advertising a bundle list over Git's protocol v2, we will use packet lines. Each line will be of the form "key=value" representing a bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. We are not currently implementing this protocol v2 functionality, but instead preparing to expose this parsing to be unit-testable. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++++++++- bundle-uri.h | 12 ++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 4ccd14c8936..d4eb1ec7d4d 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list, * Returns 0 if the key-value pair is understood. Returns 1 if the key * is not understood or the value is malformed. 
*/ -MAYBE_UNUSED static int bundle_list_update(const char *key, const char *value, struct bundle_list *list) { @@ -308,3 +307,29 @@ cleanup: free(filename); return result; } + +/** + * General API for {transport,connect}.c etc. + */ +int bundle_uri_parse_line(struct bundle_list *list, const char *line) +{ + int result; + const char *equals; + struct strbuf key = STRBUF_INIT; + + if (!strlen(line)) + return error(_("bundle-uri: got an empty line")); + + equals = strchr(line, '='); + + if (!equals) + return error(_("bundle-uri: line is not of the form 'key=value'")); + if (line == equals || !*(equals + 1)) + return error(_("bundle-uri: line has empty key or value")); + + strbuf_add(&key, line, equals - line); + result = bundle_list_update(key.buf, equals + 1, list); + strbuf_release(&key); + + return result; +} diff --git a/bundle-uri.h b/bundle-uri.h index ff7e3fd3fb2..90583461929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); +/** + * General API for {transport,connect}.c etc. + */ + +/** + * Parse a "key=value" packet line from the bundle-uri verb. + * + * Returns 0 on success and non-zero on error. + */ +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + #endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
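The three checks that bundle_uri_parse_line() performs before handing the pair to bundle_list_update() can be tried in isolation with the sketch below. It is a simplified illustration rather than the patch's code: toy_split_line() and the toy_line_status values are hypothetical names, and instead of copying the key into a strbuf it reports the key length and a pointer to the value. As in the patch, the split happens at the first '=' so a value may itself contain '='.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative status codes; the patch reports these cases via error(). */
enum toy_line_status {
	TOY_LINE_OK = 0,
	TOY_LINE_EMPTY,             /* "got an empty line" */
	TOY_LINE_NO_EQUALS,         /* "not of the form 'key=value'" */
	TOY_LINE_EMPTY_KEY_OR_VALUE /* "empty key or value" */
};

/*
 * Split 'line' at its first '='. On TOY_LINE_OK, the key is the first
 * *key_len bytes of 'line' and *value points just past the '='.
 * The outputs are left untouched on any other status.
 */
enum toy_line_status toy_split_line(const char *line, size_t *key_len,
				    const char **value)
{
	const char *equals;

	assert(line && key_len && value);

	if (!*line)
		return TOY_LINE_EMPTY;

	equals = strchr(line, '=');
	if (!equals)
		return TOY_LINE_NO_EQUALS;
	if (equals == line || !equals[1])
		return TOY_LINE_EMPTY_KEY_OR_VALUE;

	*key_len = (size_t)(equals - line);
	*value = equals + 1;
	return TOY_LINE_OK;
}
```

Returning a length and a pointer avoids an allocation in the sketch; the patch copies the key into a strbuf because bundle_list_update() expects a NUL-terminated key.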
* [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 5/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 7/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget ` (4 subsequent siblings) 10 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> Create a new 'test-tool bundle-uri' test helper. This helper will assist in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses an input file as a list of lines. These are fed into bundle_uri_parse_line() to test how we construct a 'struct bundle_list' from that data. The list is then output to stdout as if the key-value pairs were a Git config file. We use an input file instead of stdin because of a future change to parse in config-file format that works better as an input file. 
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Makefile | 1 + bundle-uri.c | 33 ++++++++++ bundle-uri.h | 3 + t/helper/test-bundle-uri.c | 70 +++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 11 ++++ 8 files changed, 241 insertions(+) create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh diff --git a/Makefile b/Makefile index 7d5f48069ea..7dee0329c49 100644 --- a/Makefile +++ b/Makefile @@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bitmap.o TEST_BUILTINS_OBJS += test-bloom.o +TEST_BUILTINS_OBJS += test-bundle-uri.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o TEST_BUILTINS_OBJS += test-crontab.o diff --git a/bundle-uri.c b/bundle-uri.c index d4eb1ec7d4d..74d5695e99e 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +static int summarize_bundle(struct remote_bundle_info *info, void *data) +{ + FILE *fp = data; + fprintf(fp, "[bundle \"%s\"]\n", info->id); + fprintf(fp, "\turi = %s\n", info->uri); + return 0; +} + +void print_bundle_list(FILE *fp, struct bundle_list *list) +{ + const char *mode; + + switch (list->mode) { + case BUNDLE_MODE_ALL: + mode = "all"; + break; + + case BUNDLE_MODE_ANY: + mode = "any"; + break; + + case BUNDLE_MODE_NONE: + default: + mode = "<unknown>"; + } + + fprintf(fp, "[bundle]\n"); + fprintf(fp, "\tversion = %d\n", list->version); + fprintf(fp, "\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} + /** * Given a key-value pair, update the state of the given bundle list. * Returns 0 if the key-value pair is understood. 
Returns 1 if the key diff --git a/bundle-uri.h b/bundle-uri.h index 90583461929..0e56ab2ae5a 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list, bundle_iterator iter, void *data); +struct FILE; +void print_bundle_list(FILE *fp, struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c new file mode 100644 index 00000000000..0329c56544f --- /dev/null +++ b/t/helper/test-bundle-uri.c @@ -0,0 +1,70 @@ +#include "test-tool.h" +#include "parse-options.h" +#include "bundle-uri.h" +#include "strbuf.h" +#include "string-list.h" + +static int cmd__bundle_uri_parse(int argc, const char **argv) +{ + const char *key_value_usage[] = { + "test-tool bundle-uri parse-key-values <input>", + NULL + }; + const char **usage = key_value_usage; + struct option options[] = { + OPT_END(), + }; + struct strbuf sb = STRBUF_INIT; + struct bundle_list list; + int err = 0; + FILE *fp; + + argc = parse_options(argc, argv, NULL, options, usage, 0); + if (argc != 1) + goto usage; + + init_bundle_list(&list); + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + strbuf_release(&sb); + fclose(fp); + + print_bundle_list(stdout, &list); + + clear_bundle_list(&list); + + return !!err; + +usage: + usage_with_options(usage, options); +} + +int cmd__bundle_uri(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri <subcommand> [<options>]", + NULL + }; + struct option options[] = { + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION | + PARSE_OPT_KEEP_ARGV0); + if (argc == 1) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) + return 
cmd__bundle_uri_parse(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: + usage_with_options(usage, options); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c3..fbe2d9d8108 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "advise", cmd__advise_if_enabled }, { "bitmap", cmd__bitmap }, { "bloom", cmd__bloom }, + { "bundle-uri", cmd__bundle_uri }, { "chmtime", cmd__chmtime }, { "config", cmd__config }, { "crontab", cmd__crontab }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb799271631..b2aa1f39a8f 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -7,6 +7,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); +int cmd__bundle_uri(int argc, const char **argv); int cmd__chmtime(int argc, const char **argv); int cmd__config(int argc, const char **argv); int cmd__crontab(int argc, const char **argv); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh new file mode 100755 index 00000000000..fd142a66ad5 --- /dev/null +++ b/t/t5750-bundle-uri-parse.sh @@ -0,0 +1,121 @@ +#!/bin/sh + +test_description="Test bundle-uri bundle_uri_parse_line()" + +TEST_NO_CREATE_REPO=1 +TEST_PASSES_SANITIZE_LEAK=true +. 
./test-lib.sh + +test_expect_success 'bundle_uri_parse_line() just URIs' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-key-values in >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' ' + cat >in <<-\EOF && + =bogus-value + bogus-key= + EOF + + cat >err.expect <<-EOF && + error: bundle-uri: line has empty key or value + error: bad line: '\''=bogus-value'\'' + error: bundle-uri: line has empty key or value + error: bad line: '\''bogus-key='\'' + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + + bundle.two.uri=https://example.com/bundle.bdl + + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err 
&& + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.one.uri=https://example.com/bundle-2.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6da7273f1d5..3175d665add 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1956,3 +1956,14 @@ test_is_magic_mtime () { rm -f .git/test-mtime-actual return $ret } + +# Given two filenames, parse both using 'git config --list --file' +# and compare the sorted output of those commands. Useful when +# wanting to ignore whitespace differences and sorting concerns. +test_cmp_config_output () { + git config --list --file="$1" >config-expect && + git config --list --file="$2" >config-actual && + sort config-expect >sorted-expect && + sort config-actual >sorted-actual && + test_cmp sorted-expect sorted-actual +} -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v2 7/9] bundle-uri: parse bundle list in config format 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 6/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 10 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When a bundle provider wants to operate independently from a Git remote, they want to provide a single, consistent URI that users can use in their 'git clone --bundle-uri' commands. At this point, the Git client expects that URI to be a single bundle that can be unbundled and used to bootstrap the rest of the clone from the Git server. This single bundle cannot be re-used to assist with future incremental fetches. To allow for the incremental fetch case, teach Git to understand a bundle list that could be advertised at an independent bundle URI. Such a bundle list is likely to be inspected by human readers, even if only by the bundle provider creating the list. For this reason, we can take our expected "key=value" pairs and instead format them using Git config format. Create bundle_uri_parse_config_format() to parse a file in config format and convert that into a 'struct bundle_list' filled with its understanding of the contents. Be careful to use error_action CONFIG_ERROR_ERROR when calling git_config_from_file_with_options() because the default action for git_config_from_file() is to die() on a parsing error. 
The current warning isn't particularly helpful if it arises to a user, but it will be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce bugs than this simpler version. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++ bundle-uri.h | 9 +++++++ t/helper/test-bundle-uri.c | 49 +++++++++++++++++++++++++++--------- t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++ 4 files changed, 123 insertions(+), 12 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 74d5695e99e..92354aa3bbd 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value, return 0; } +static int config_to_bundle_list(const char *key, const char *value, void *data) +{ + struct bundle_list *list = data; + return bundle_list_update(key, value, list); +} + +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); + + if (!result && list->mode == BUNDLE_MODE_NONE) { + warning(_("bundle list at '%s' has no mode"), uri); + result = 1; + } + + return result; +} + static char *find_temp_filename(void) { int fd; diff --git a/bundle-uri.h b/bundle-uri.h index 0e56ab2ae5a..bc13d4c9929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list, struct FILE; void print_bundle_list(FILE *fp, struct bundle_list *list); +/** + * A bundle URI may point to a bundle list where the key=value + * pairs are provided in config file 
format. This method is + * exposed publicly for testing purposes. + */ +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c index 0329c56544f..25afd393428 100644 --- a/t/helper/test-bundle-uri.c +++ b/t/helper/test-bundle-uri.c @@ -4,12 +4,21 @@ #include "strbuf.h" #include "string-list.h" -static int cmd__bundle_uri_parse(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, +}; + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { const char *key_value_usage[] = { "test-tool bundle-uri parse-key-values <input>", NULL }; + const char *config_usage[] = { + "test-tool bundle-uri parse-config <input>", + NULL + }; const char **usage = key_value_usage; struct option options[] = { OPT_END(), @@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv) int err = 0; FILE *fp; - argc = parse_options(argc, argv, NULL, options, usage, 0); - if (argc != 1) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); - fp = fopen(argv[0], "r"); - if (!fp) - die("failed to open '%s'", argv[0]); - while (strbuf_getline(&sb, fp) != EOF) { - if (bundle_uri_parse_line(&list, sb.buf)) - err = error("bad line: '%s'", sb.buf); + switch (mode) { + case KEY_VALUE_PAIRS: + if (argc != 1) + goto usage; + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + fclose(fp); + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; + err = bundle_uri_parse_config_format("<uri>", argv[0], &list); + break; } 
strbuf_release(&sb); - fclose(fp); print_bundle_list(stdout, &list); @@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) - return cmd__bundle_uri_parse(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); error("there is no test-tool bundle-uri tool '%s'", argv[1]); usage: diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index fd142a66ad5..c2fe3f9c5a5 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines test_cmp_config_output expect actual ' +test_expect_success 'parse config format: just URIs' ' + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'parse config format edge cases: empty key or value' ' + cat >in1 <<-\EOF && + = bogus-value + EOF + + cat >err1 <<-EOF && + error: bad config line 1 in file in1 + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && + test_cmp err1 err && + test_cmp_config_output expect actual && + + cat >in2 <<-\EOF && + bogus-key = + EOF + + cat >err2 <<-EOF && + error: bad config line 1 in file in2 + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && + test_cmp err2 err && + test_cmp_config_output expect actual +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
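For readers following the thread, here is a rough, stand-alone sketch of how a config-format bundle list flattens into the dotted "key=value" pairs that bundle_list_update() consumes (for example `bundle.one.uri=...`). It is not Git's config parser and does not call git_config_from_file_with_options(); `struct section_state` and `flatten_config_line()` are hypothetical names invented for illustration, and the toy only understands `[section]`, `[section "sub"]`, comment lines and simple `key = value` lines (no quoting, escapes or continuation lines).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper state: the section prefix currently in effect. */
struct section_state {
	char prefix[128]; /* e.g. "bundle" or "bundle.one" */
};

/*
 * Toy illustration only; this is NOT Git's config parser.
 * Feed one line at a time. Returns 1 and fills 'out' with
 * "section[.subsection].key=value" for a variable line, 0 for a
 * section header, comment or blank line, and -1 for anything else.
 */
static int flatten_config_line(struct section_state *st, const char *line,
			       char *out, size_t outlen)
{
	char section[64], sub[64], key[64], value[256];

	assert(st && line && out);

	while (*line == ' ' || *line == '\t')
		line++;
	if (!*line || *line == '\n' || *line == '#' || *line == ';')
		return 0;

	if (*line == '[') {
		/* Try [section "sub"] first, then plain [section]. */
		if (sscanf(line, "[%63[^] \"] \"%63[^\"]\"]", section, sub) == 2)
			snprintf(st->prefix, sizeof(st->prefix), "%s.%s",
				 section, sub);
		else if (sscanf(line, "[%63[^] \"]]", section) == 1)
			snprintf(st->prefix, sizeof(st->prefix), "%s", section);
		else
			return -1;
		return 0;
	}

	/*
	 * This toy requires a preceding section header and a non-empty
	 * key and value; anything else is reported as a bad line.
	 */
	if (!st->prefix[0] ||
	    sscanf(line, "%63[^ =\t] = %255[^\n]", key, value) != 2)
		return -1;
	snprintf(out, outlen, "%s.%s=%s", st->prefix, key, value);
	return 1;
}
```

Feeding the lines of the 'just URIs' test input through this helper in order would produce pairs such as `bundle.mode=all` and `bundle.one.uri=http://example.com/bundle.bdl`, which is the shape the earlier "key=value" line parser in this series already accepts.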
* [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 7/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget 2022-09-09 14:33 ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 10 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The next change will start allowing us to parse bundle lists that are downloaded from a provided bundle URI. Those lists might point to other lists, which could proceed to an arbitrary depth (and even create cycles). Restructure fetch_bundle_uri() to have an internal version that has a recursion depth. Compare that to a new max_bundle_uri_depth constant that is twice as high as we expect this depth to be for any legitimate use of bundle list linking. We can consider making max_bundle_uri_depth a configurable value if there is demonstrated value in the future. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 92354aa3bbd..b8ca6cd9493 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -336,11 +336,25 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } -int fetch_bundle_uri(struct repository *r, const char *uri) +/** + * This limits the recursion on fetch_bundle_uri_internal() when following + * bundle lists. 
+ */ +static int max_bundle_uri_depth = 4; + +static int fetch_bundle_uri_internal(struct repository *r, + const char *uri, + int depth) { int result = 0; char *filename; + if (depth >= max_bundle_uri_depth) { + warning(_("exceeded bundle URI recursion limit (%d)"), + max_bundle_uri_depth); + return -1; + } + if (!(filename = find_temp_filename())) { result = -1; goto cleanup; @@ -368,6 +382,11 @@ cleanup: return result; } +int fetch_bundle_uri(struct repository *r, const char *uri) +{ + return fetch_bundle_uri_internal(r, uri, 0); +} + /** * General API for {transport,connect}.c etc. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
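The depth guard in the patch above can be seen in isolation with a small model. This is only a sketch: `struct toy_uri`, `follow_with_limit()` and `follow()` are hypothetical stand-ins, and "following a list" is reduced to chasing a pointer instead of downloading anything. The constant mirrors max_bundle_uri_depth = 4 and the check mirrors `depth >= max_bundle_uri_depth`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIST_DEPTH 4 /* mirrors max_bundle_uri_depth in the patch */

/* Hypothetical stand-in for a URI whose content is a list pointing elsewhere. */
struct toy_uri {
	const struct toy_uri *next; /* NULL means this URI is a plain bundle */
};

/* Returns 0 once a plain bundle is reached, -1 if the limit is hit first. */
static int follow_with_limit(const struct toy_uri *uri, int depth)
{
	assert(uri);
	if (depth >= MAX_LIST_DEPTH)
		return -1; /* the same check also terminates cycles */
	if (!uri->next)
		return 0;
	return follow_with_limit(uri->next, depth + 1);
}

/* Public entry point: callers never see the depth argument. */
static int follow(const struct toy_uri *uri)
{
	return follow_with_limit(uri, 0);
}
```

With a limit of four and the first URI at depth 0, a chain of three lists ending in a bundle is accepted, a chain of four lists is rejected, and a two-element cycle is rejected after four steps rather than looping forever.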
* [PATCH v2 9/9] bundle-uri: fetch a list of bundles 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 8/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 ` Derrick Stolee via GitGitGadget 2022-09-29 21:58 ` Jonathan Tan 2022-09-26 13:19 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget 10 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-09-09 14:33 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs is downloaded and inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle.
By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 216 +++++++++++++++++++++++++++++++++--- bundle-uri.h | 13 +++ t/t5558-clone-bundle-uri.sh | 93 ++++++++++++++++ 3 files changed, 306 insertions(+), 16 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index b8ca6cd9493..6a2fea26a94 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle, { FREE_AND_NULL(bundle->id); FREE_AND_NULL(bundle->uri); + FREE_AND_NULL(bundle->file); + bundle->unbundled = 0; return 0; } @@ -336,18 +338,111 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } +struct bundle_list_context { + struct repository *r; + struct bundle_list *list; + enum bundle_list_mode mode; + int count; + int depth; +}; + +/* + * This early definition is necessary because we use indirect recursion: + * + * While iterating through a bundle list that was downloaded as part + * of fetch_bundle_uri_internal(), iterator methods eventually call it + * again, but with depth + 1. 
+ */ +static int fetch_bundle_uri_internal(struct repository *r, + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list); + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ + int res; + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); + + /* + * Only increment count if the download succeeded. If our mode is + * BUNDLE_MODE_ANY, then we will want to try other URIs in the + * list in case they work instead. + */ + if (!res) + ctx->count++; + return res; +} + +static int download_bundle_list(struct repository *r, + struct bundle_list *local_list, + struct bundle_list *global_list, + int depth) +{ + struct bundle_list_context ctx = { + .r = r, + .list = global_list, + .depth = depth + 1, + .mode = local_list->mode, + }; + + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); +} + +static int fetch_bundle_list_in_config_format(struct repository *r, + struct bundle_list *global_list, + struct remote_bundle_info *bundle, + int depth) +{ + int result; + struct bundle_list list_from_bundle; + + init_bundle_list(&list_from_bundle); + + if ((result = bundle_uri_parse_config_format(bundle->uri, + bundle->file, + &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { + warning(_("unrecognized bundle mode from URI '%s'"), + bundle->uri); + result = -1; + goto cleanup; + } + + if ((result = download_bundle_list(r, &list_from_bundle, + global_list, depth))) + goto cleanup; + +cleanup: + clear_bundle_list(&list_from_bundle); + return result; +} + /** * This limits the recursion on fetch_bundle_uri_internal() when following * bundle lists. */ static int max_bundle_uri_depth = 4; +/** + * Recursively download all bundles advertised at the given URI + * to files. If the file is a bundle, then add it to the given + * 'list'. 
Otherwise, expect a bundle list and recurse on the + * URIs in that list according to the list mode (ANY or ALL). + */ static int fetch_bundle_uri_internal(struct repository *r, - const char *uri, - int depth) + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list) { int result = 0; - char *filename; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { warning(_("exceeded bundle URI recursion limit (%d)"), @@ -355,36 +450,125 @@ static int fetch_bundle_uri_internal(struct repository *r, return -1; } - if (!(filename = find_temp_filename())) { + if (!bundle->file && + !(bundle->file = find_temp_filename())) { result = -1; goto cleanup; } - if ((result = copy_uri_to_file(filename, uri))) { - warning(_("failed to download bundle from URI '%s'"), uri); + if ((result = copy_uri_to_file(bundle->file, bundle->uri))) { + warning(_("failed to download bundle from URI '%s'"), bundle->uri); goto cleanup; } - if ((result = !is_bundle(filename, 0))) { - warning(_("file at URI '%s' is not a bundle"), uri); + if ((result = !is_bundle(bundle->file, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); + if (result) + warning(_("file at URI '%s' is not a bundle or bundle list"), + bundle->uri); goto cleanup; } - if ((result = unbundle_from_file(r, filename))) { - warning(_("failed to unbundle bundle from URI '%s'"), uri); - goto cleanup; - } + /* Copy the bundle and insert it into the global list. 
*/ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); + bcopy->file = xstrdup(bundle->file); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: - if (filename) - unlink(filename); - free(filename); + if (result && bundle->file) + unlink(bundle->file); return result; } +struct attempt_unbundle_context { + struct repository *r; + int success_count; + int failure_count; +}; + +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ + struct attempt_unbundle_context *ctx = data; + + if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) { + ctx->success_count++; + info->unbundled = 1; + } else { + ctx->failure_count++; + } + + return 0; +} + +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ + int last_success_count = -1; + struct attempt_unbundle_context ctx = { + .r = r, + }; + + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. 
+ */ + while (last_success_count < ctx.success_count) { + last_success_count = ctx.success_count; + + ctx.success_count = 0; + ctx.failure_count = 0; + for_all_bundles_in_list(list, attempt_unbundle, &ctx); + } + + if (ctx.success_count) + git_config_set_multivar_gently("log.excludedecoration", + "refs/bundle/", + "refs/bundle/", + CONFIG_FLAGS_FIXED_VALUE | + CONFIG_FLAGS_MULTI_REPLACE); + + if (ctx.failure_count) + warning(_("failed to unbundle %d bundles"), + ctx.failure_count); + + return 0; +} + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ + if (info->file) + unlink_or_warn(info->file); + return 0; +} + int fetch_bundle_uri(struct repository *r, const char *uri) { - return fetch_bundle_uri_internal(r, uri, 0); + int result; + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), + .id = xstrdup(""), + }; + + init_bundle_list(&list); + + /* If a bundle is added to this global list, then it is required. */ + list.mode = BUNDLE_MODE_ALL; + + if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list))) + goto cleanup; + + result = unbundle_all_bundles(r, &list); + +cleanup: + for_all_bundles_in_list(&list, unlink_bundle, NULL); + clear_bundle_list(&list); + clear_remote_bundle_info(&bundle, NULL); + return result; } /** diff --git a/bundle-uri.h b/bundle-uri.h index bc13d4c9929..4dbc269823c 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -28,6 +28,19 @@ struct remote_bundle_info { * if there was no table of contents. */ char *uri; + + /** + * If the bundle has been downloaded, then 'file' is a + * filename storing its contents. Otherwise, 'file' is + * NULL. + */ + char *file; + + /** + * If the bundle has been unbundled successfully, then + * this boolean is true. 
+ */ + unsigned unbundled:1; }; #define REMOTE_BUNDLE_INFO_INIT { 0 } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index ad666a2d28a..592790b49f0 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -41,6 +41,72 @@ test_expect_success 'clone with file:// bundle' ' test_cmp expect actual ' +# To get interesting tests for bundle lists, we need to construct a +# somewhat-interesting commit history. +# +# ---------------- bundle-4 +# +# 4 +# / \ +# ----|---|------- bundle-3 +# | | +# | 3 +# | | +# ----|---|------- bundle-2 +# | | +# 2 | +# | | +# ----|---|------- bundle-1 +# \ / +# 1 +# | +# (previous commits) +test_expect_success 'construct incremental bundle list' ' + ( + cd clone-from && + git checkout -b base && + test_commit 1 && + git checkout -b left && + test_commit 2 && + git checkout -b right base && + test_commit 3 && + git checkout -b merge left && + git merge right -m "4" && + + git bundle create bundle-1.bundle base && + git bundle create bundle-2.bundle base..left && + git bundle create bundle-3.bundle base..right && + git bundle create bundle-4.bundle merge --not left right + ) +' + +test_expect_success 'clone bundle list (file, no heuristic)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + [bundle "bundle-3"] + uri = file://$(pwd)/clone-from/bundle-3.bundle + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" . 
clone-list-file && + for oid in $(git -C clone-from for-each-ref --format="%(objectname)") + do + git -C clone-list-file rev-parse $oid || return 1 + done +' + + ######################################################################### # HTTP tests begin here @@ -75,6 +141,33 @@ test_expect_success 'clone HTTP bundle' ' test_config -C clone-http log.excludedecoration refs/bundle/ ' +test_expect_success 'clone bundle list (HTTP, no heuristic)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + [bundle "bundle-2"] + uri = $HTTPD_URL/bundle-2.bundle + + [bundle "bundle-3"] + uri = $HTTPD_URL/bundle-3.bundle + + [bundle "bundle-4"] + uri = $HTTPD_URL/bundle-4.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http && + for oid in $(git -C clone-from for-each-ref --format="%(objectname)") + do + git -C clone-list-http rev-parse $oid || return 1 + done +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist. -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH v2 9/9] bundle-uri: fetch a list of bundles 2022-09-09 14:33 ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget @ 2022-09-29 21:58 ` Jonathan Tan 2022-09-30 12:49 ` Derrick Stolee 0 siblings, 1 reply; 94+ messages in thread From: Jonathan Tan @ 2022-09-29 21:58 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) > +{ > + int res; > + struct bundle_list_context *ctx = data; > + > + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) > + return 0; > + > + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); > + > + /* > + * Only increment count if the download succeeded. If our mode is > + * BUNDLE_MODE_ANY, then we will want to try other URIs in the > + * list in case they work instead. > + */ > + if (!res) > + ctx->count++; > + return res; > +} So this returns nonzero if a download fails... > +static int download_bundle_list(struct repository *r, > + struct bundle_list *local_list, > + struct bundle_list *global_list, > + int depth) > +{ > + struct bundle_list_context ctx = { > + .r = r, > + .list = global_list, > + .depth = depth + 1, > + .mode = local_list->mode, > + }; > + > + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); > +} ...and for_all_bundles_in_list does not proceed with the rest of the loop if any callback invocation returns nonzero. Don't we need to continue retrying the others if the mode is ANY? 
> +static int attempt_unbundle(struct remote_bundle_info *info, void *data) > +{ > + struct attempt_unbundle_context *ctx = data; > + > + if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) { > + ctx->success_count++; > + info->unbundled = 1; > + } else { > + ctx->failure_count++; > + } > + > + return 0; > +} Do we need to handle the case in which a file is missing but it's expected because the mode is ANY and another file was successfully downloaded? > +static int unbundle_all_bundles(struct repository *r, > + struct bundle_list *list) > +{ > + int last_success_count = -1; > + struct attempt_unbundle_context ctx = { > + .r = r, > + }; > + > + /* > + * Iterate through all bundles looking for ones that can > + * successfully unbundle. If any succeed, then perhaps another > + * will succeed in the next attempt. > + */ > + while (last_success_count < ctx.success_count) { > + last_success_count = ctx.success_count; > + > + ctx.success_count = 0; > + ctx.failure_count = 0; > + for_all_bundles_in_list(list, attempt_unbundle, &ctx); I think it would have been clearer if the invocation to for_all_bundles_in_list were to stop early if a bundle has been successfully unbundled, and then you can just run this loop n times, instead of needing to reset the success count each time in order to check that the latest count is more than the prior one. But this works too. [snip tests] I see that there are ALL tests, but could we have an ANY test as well? ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v2 9/9] bundle-uri: fetch a list of bundles 2022-09-29 21:58 ` Jonathan Tan @ 2022-09-30 12:49 ` Derrick Stolee 0 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee @ 2022-09-30 12:49 UTC (permalink / raw) To: Jonathan Tan, Derrick Stolee via GitGitGadget Cc: git, gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long On 9/29/2022 5:58 PM, Jonathan Tan wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) >> +{ >> + int res; >> + struct bundle_list_context *ctx = data; >> + >> + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) >> + return 0; >> + >> + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); >> + >> + /* >> + * Only increment count if the download succeeded. If our mode is >> + * BUNDLE_MODE_ANY, then we will want to try other URIs in the >> + * list in case they work instead. >> + */ >> + if (!res) >> + ctx->count++; >> + return res; >> +} > > So this returns nonzero if a download fails... > >> +static int download_bundle_list(struct repository *r, >> + struct bundle_list *local_list, >> + struct bundle_list *global_list, >> + int depth) >> +{ >> + struct bundle_list_context ctx = { >> + .r = r, >> + .list = global_list, >> + .depth = depth + 1, >> + .mode = local_list->mode, >> + }; >> + >> + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); >> +} > > ...and for_all_bundles_in_list does not proceed with the rest of the > loop if any callback invocation returns nonzero. Don't we need to > continue retrying the others if the mode is ANY? You are right! Thanks. 
>> +static int attempt_unbundle(struct remote_bundle_info *info, void *data) >> +{ >> + struct attempt_unbundle_context *ctx = data; >> + >> + if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) { >> + ctx->success_count++; >> + info->unbundled = 1; >> + } else { >> + ctx->failure_count++; >> + } >> + >> + return 0; >> +} > > Do we need to handle the case in which a file is missing but it's > expected because the mode is ANY and another file was successfully > downloaded? By "file is missing" I think you mean "we never successfully downloaded that file" and I agree that we should skip those bundles. I'll add more tests for ANY mode to hopefully catch these issues. >> +static int unbundle_all_bundles(struct repository *r, >> + struct bundle_list *list) >> +{ >> + int last_success_count = -1; >> + struct attempt_unbundle_context ctx = { >> + .r = r, >> + }; >> + >> + /* >> + * Iterate through all bundles looking for ones that can >> + * successfully unbundle. If any succeed, then perhaps another >> + * will succeed in the next attempt. >> + */ >> + while (last_success_count < ctx.success_count) { >> + last_success_count = ctx.success_count; >> + >> + ctx.success_count = 0; >> + ctx.failure_count = 0; >> + for_all_bundles_in_list(list, attempt_unbundle, &ctx); > > I think it would have been clearer if the invocation to > for_all_bundles_in_list were to stop early if a bundle has been > successfully unbundled, and then you can just run this loop n times, > instead of needing to reset the success count each time in order to > check that the latest count is more than the prior one. But this works > too. It's a little bit backwards to have the "terminate early with nonzero value" signal "success", but it would work. With careful commenting, I think it's doable. > I see that there are ALL tests, but could we have an ANY test as well? Yes, excellent point. They are absolutely necessary. Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
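The "any" versus "all" behaviour discussed in this review can be sketched independently of Git's iterator. The following is one possible shape, not the code that landed in v3: `enum toy_mode`, `download_list()` and `toy_download()` are hypothetical, and treating "no URI worked in ANY mode" as an error is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_MODE_ALL, TOY_MODE_ANY };

/* Hypothetical download callback: returns 0 on success, nonzero on failure. */
typedef int (*toy_download_fn)(const char *uri);

/*
 * ALL: every URI must download; stop at the first failure.
 * ANY: a failure is not fatal; keep trying until one URI succeeds.
 * 'nr_ok' (optional) receives the number of successful downloads.
 */
static int download_list(enum toy_mode mode, const char **uris, size_t nr,
			 toy_download_fn download, size_t *nr_ok)
{
	size_t i, ok = 0;
	int res = 0;

	assert(download);
	for (i = 0; i < nr; i++) {
		if (mode == TOY_MODE_ANY && ok)
			break; /* one is enough */
		if (!download(uris[i])) {
			ok++;
		} else if (mode == TOY_MODE_ALL) {
			res = -1;
			break;
		}
		/* In ANY mode a failure falls through to the next URI. */
	}
	if (mode == TOY_MODE_ANY && !ok && nr)
		res = -1; /* assumption: nothing in the list worked */
	if (nr_ok)
		*nr_ok = ok;
	return res;
}

/* Toy downloader for demonstration: only URIs starting with 'g' succeed. */
static int toy_download(const char *uri)
{
	return uri[0] == 'g' ? 0 : 1;
}
```

The point raised in the review is the fall-through case: because for_all_bundles_in_list() stops on the first nonzero return, a callback that propagates a download failure in ANY mode never reaches the later URIs, so the failure has to be swallowed there.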
* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2022-09-09 14:33 ` [PATCH v2 9/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget @ 2022-09-26 13:19 ` Derrick Stolee 2022-09-26 19:10 ` Junio C Hamano 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget 10 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee @ 2022-09-26 13:19 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget, git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long On 9/9/2022 10:33 AM, Derrick Stolee via GitGitGadget wrote: > Updates in v2 > ============= > > Thank you to all of the voices who chimed in on the previous version. I'm > sorry it took so long for me to get a new version. > > * I've done a rather thorough overhaul to minimize how often later patches > rewrite portions of earlier patches. > > * We no longer use a strbuf in struct remote_bundle_info. Instead, use a > 'char *' and only in the patch where it is first used. > > * The config documentation is more clearly indicating that the bundle.* > section has no effect in the repository config (at the moment, which will > change in the next series). > > * The bundle.version value is now parsed using git_parse_int(). > > * The config key is now parsed using parse_config_key(). > > * Commit messages clarify more about the context of the change in the > bigger picture of the bundle URI effort. > > * Some printf()s are correctly changed to fprintf()s. > > * The test helper CLI is unified across the two modes. They both take a > filename now. > > * The count of downloaded bundles is now only updated after a successful > download, allowing the "any" mode to keep trying after a failure. 
If some of the reviewers from v1 could check that I responded to their comments, then that would be a big help to getting this series moving again. Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists 2022-09-26 13:19 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee @ 2022-09-26 19:10 ` Junio C Hamano 2022-09-29 22:00 ` Jonathan Tan 0 siblings, 1 reply; 94+ messages in thread From: Junio C Hamano @ 2022-09-26 19:10 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long Derrick Stolee <derrickstolee@github.com> writes: > If some of the reviewers from v1 could check that I responded to their > comments, then that would be a big help to getting this series moving > again. Thanks for a ping. Also, if reviewers who missed v1 can take a look and give fresh insights, that would also help polishing the series further. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists 2022-09-26 19:10 ` Junio C Hamano @ 2022-09-29 22:00 ` Jonathan Tan 2022-09-30 13:21 ` Derrick Stolee 0 siblings, 1 reply; 94+ messages in thread From: Jonathan Tan @ 2022-09-29 22:00 UTC (permalink / raw) To: Junio C Hamano Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long Junio C Hamano <gitster@pobox.com> writes: > Derrick Stolee <derrickstolee@github.com> writes: > > > If some of the reviewers from v1 could check that I responded to their > > comments, then that would be a big help to getting this series moving > > again. Yes, all my comments from v1 were indeed addressed, thanks. > Thanks for a ping. Also, if reviewers who missed v1 can take a look > and give fresh insights, that would also help polishing the series > further. I didn't miss v1 but I gave some new insights. :-) The patch set looks good except for some comments I had on the last one. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists 2022-09-29 22:00 ` Jonathan Tan @ 2022-09-30 13:21 ` Derrick Stolee 0 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee @ 2022-09-30 13:21 UTC (permalink / raw) To: Jonathan Tan, Junio C Hamano Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long On 9/29/2022 6:00 PM, Jonathan Tan wrote: > Junio C Hamano <gitster@pobox.com> writes: >> Derrick Stolee <derrickstolee@github.com> writes: >> >>> If some of the reviewers from v1 could check that I responded to their >>> comments, then that would be a big help to getting this series moving >>> again. > > Yes, all my comments from v1 were indeed addressed, thanks. > >> Thanks for a ping. Also, if reviewers who missed v1 can take a look >> and give fresh insights, that would also help polishing the series >> further. > > I didn't miss v1 but I gave some new insights. :-) The patch set looks > good except for some comments I had on the last one. Thanks for taking a detailed look. I've added extra "any" mode tests to my local branch in addition to the code changes you recommended. I'll plan to send a v3 early next week, giving time for any other review comments to trickle in. Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH v3 0/9] Bundle URIs III: Parse and download from bundle lists 2022-09-09 14:33 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2022-09-26 13:19 ` [PATCH v2 0/9] Bundle URIs III: Parse and download from bundle lists Derrick Stolee @ 2022-10-04 12:34 ` Derrick Stolee via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget ` (9 more replies) 10 siblings, 10 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where <uri> is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list. The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series). The patches are organized in this way: 1. Patches 1-2 are cleanups from the previous part. The first was recommended by Teng Long and the second allows us to simplify our bundle list data structure slightly. 2. Patches 3-4 create the bundle list data structures and the logic for populating the list from key-value pairs. 3. Patches 5-6 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2]. 4.
Patch 7 teaches Git to parse Git config files into bundle lists. 5. Patches 8-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. [1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/ [2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/ At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file. As outlined in the design document [1], the next steps after this are: 1. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [2] 2. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [3] I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going: [3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com [4] https://github.com/derrickstolee/git/pull/21 [5] https://github.com/derrickstolee/git/pull/22 Updates in v3 ============= * Fixed a comment about a return value of -1. * Fixed and tested scenario where early URIs fail in "any" mode and Git should try the rest of the list. * Instead of using 'success_count' and 'failure_count', use the iterator return value to terminate the "all" mode loop early. Updates in v2 ============= Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version. * I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches. 
* We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used. * The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series). * The bundle.version value is now parsed using git_parse_int(). * The config key is now parsed using parse_config_key(). * Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort. * Some printf()s are correctly changed to fprintf()s. * The test helper CLI is unified across the two modes. They both take a filename now. * The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure. Thanks, * Stolee Derrick Stolee (7): bundle-uri: use plain string in find_temp_filename() bundle-uri: create bundle_list struct and helpers bundle-uri: create base key-value pair parsing bundle-uri: parse bundle list in config format bundle-uri: limit recursion depth for bundle lists bundle-uri: fetch a list of bundles bundle-uri: suppress stderr from remote-https Ævar Arnfjörð Bjarmason (2): bundle-uri: create "key=value" line parsing bundle-uri: unit test "key=value" parsing Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 ++ Makefile | 1 + bundle-uri.c | 449 ++++++++++++++++++++++++++++++-- bundle-uri.h | 93 +++++++ config.c | 2 +- config.h | 1 + t/helper/test-bundle-uri.c | 95 +++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5558-clone-bundle-uri.sh | 143 ++++++++++ t/t5750-bundle-uri-parse.sh | 171 ++++++++++++ t/test-lib-functions.sh | 11 + 13 files changed, 976 insertions(+), 18 deletions(-) create mode 100644 Documentation/config/bundle.txt create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7 Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1333 Range-diff vs v2: 1: 2ca431e6c37 < -: ----------- bundle-uri: short-circuit capability parsing 2: ee6c4b824c2 = 1: 48beccb0f5e bundle-uri: use plain string in find_temp_filename() 3: d9812440594 = 2: f0c4457951c bundle-uri: create bundle_list struct and helpers 4: 70daef66833 ! 3: 430e01cd2a4 bundle-uri: create base key-value pair parsing @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, +/** + * Given a key-value pair, update the state of the given bundle list. -+ * Returns 0 if the key-value pair is understood. Returns 1 if the key ++ * Returns 0 if the key-value pair is understood. Returns -1 if the key + * is not understood or the value is malformed. + */ +MAYBE_UNUSED 5: 4df3f834029 ! 4: cd915d57f3b bundle-uri: create "key=value" line parsing @@ Commit message ## bundle-uri.c ## @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, - * Returns 0 if the key-value pair is understood. Returns 1 if the key + * Returns 0 if the key-value pair is understood. Returns -1 if the key * is not understood or the value is malformed. */ -MAYBE_UNUSED 6: 91c5b58f011 ! 5: 4d8cac67f66 bundle-uri: unit test "key=value" parsing @@ bundle-uri.c: int for_all_bundles_in_list(struct bundle_list *list, + /** * Given a key-value pair, update the state of the given bundle list. - * Returns 0 if the key-value pair is understood. Returns 1 if the key + * Returns 0 if the key-value pair is understood. Returns -1 if the key ## bundle-uri.h ## @@ bundle-uri.h: int for_all_bundles_in_list(struct bundle_list *list, 7: 1492b8f5ef0 = 6: 0ecae3a44b3 bundle-uri: parse bundle list in config format 8: b5d570082fa = 7: 7e6b32313b0 bundle-uri: limit recursion depth for bundle lists 9: a6ab8f7c699 ! 
8: 46799648b4c bundle-uri: fetch a list of bundles @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + */ + if (!res) + ctx->count++; -+ return res; ++ ++ /* ++ * In BUNDLE_MODE_ANY, we need to continue iterating until we find ++ * a bundle that works, so do not signal a failure here. ++ */ ++ return ctx->mode == BUNDLE_MODE_ANY ? 0 : res; +} + +static int download_bundle_list(struct repository *r, @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, return result; } -+struct attempt_unbundle_context { -+ struct repository *r; -+ int success_count; -+ int failure_count; -+}; -+ ++/** ++ * This loop iterator breaks the loop with nonzero return code on the ++ * first successful unbundling of a bundle. ++ */ +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ -+ struct attempt_unbundle_context *ctx = data; ++ struct repository *r = data; + -+ if (info->unbundled || !unbundle_from_file(ctx->r, info->file)) { -+ ctx->success_count++; ++ if (!info->file || info->unbundled) ++ return 0; ++ ++ if (!unbundle_from_file(r, info->file)) { + info->unbundled = 1; -+ } else { -+ ctx->failure_count++; ++ return 1; + } + + return 0; @@ bundle-uri.c: static int fetch_bundle_uri_internal(struct repository *r, +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ -+ int last_success_count = -1; -+ struct attempt_unbundle_context ctx = { -+ .r = r, -+ }; -+ + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. ++ * ++ * Keep in mind that a non-zero result for the loop here means ++ * the loop terminated early on a successful unbundling, which ++ * signals that we can try again. 
+ */ -+ while (last_success_count < ctx.success_count) { -+ last_success_count = ctx.success_count; -+ -+ ctx.success_count = 0; -+ ctx.failure_count = 0; -+ for_all_bundles_in_list(list, attempt_unbundle, &ctx); -+ } -+ -+ if (ctx.success_count) -+ git_config_set_multivar_gently("log.excludedecoration", -+ "refs/bundle/", -+ "refs/bundle/", -+ CONFIG_FLAGS_FIXED_VALUE | -+ CONFIG_FLAGS_MULTI_REPLACE); -+ -+ if (ctx.failure_count) -+ warning(_("failed to unbundle %d bundles"), -+ ctx.failure_count); ++ while (for_all_bundles_in_list(list, attempt_unbundle, r)) ; + + return 0; +} @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' ' + uri = file://$(pwd)/clone-from/bundle-4.bundle + EOF + -+ git clone --bundle-uri="file://$(pwd)/bundle-list" . clone-list-file && -+ for oid in $(git -C clone-from for-each-ref --format="%(objectname)") -+ do -+ git -C clone-list-file rev-parse $oid || return 1 -+ done ++ git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-list-file cat-file --batch-check <oids +' + ++test_expect_success 'clone bundle list (file, any mode)' ' ++ cat >bundle-list <<-EOF && ++ [bundle] ++ version = 1 ++ mode = any ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-0"] ++ uri = $HTTPD_URL/bundle-0.bundle ++ ++ [bundle "bundle-1"] ++ uri = $HTTPD_URL/bundle-1.bundle ++ ++ # Does not exist. Should be skipped. 
++ [bundle "bundle-5"] ++ uri = $HTTPD_URL/bundle-5.bundle ++ EOF ++ ++ git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-any-file cat-file --batch-check <oids ++' + ######################################################################### # HTTP tests begin here @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' ' + uri = $HTTPD_URL/bundle-4.bundle + EOF + -+ git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http && -+ for oid in $(git -C clone-from for-each-ref --format="%(objectname)") -+ do -+ git -C clone-list-http rev-parse $oid || return 1 -+ done ++ git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-list-http cat-file --batch-check <oids ++' ++ ++test_expect_success 'clone bundle list (HTTP, any mode)' ' ++ cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && ++ cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && ++ [bundle] ++ version = 1 ++ mode = any ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-0"] ++ uri = $HTTPD_URL/bundle-0.bundle ++ ++ [bundle "bundle-1"] ++ uri = $HTTPD_URL/bundle-1.bundle ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-5"] ++ uri = $HTTPD_URL/bundle-5.bundle ++ EOF ++ ++ git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-any-http cat-file --batch-check <oids +' + # Do not add tests here unless they use the HTTP server, as they will -: ----------- > 9: d84544859e4 bundle-uri: suppress stderr from remote-https -- gitgitgadget ^ permalink raw reply [flat|nested] 94+ messages in thread
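The v3 rewrite of unbundle_all_bundles() depends on the iterator's return value: attempt_unbundle() returns nonzero on the first successful unbundle, for_all_bundles_in_list() stops there, and the outer loop repeats until a full pass makes no progress. A minimal self-contained sketch of that control flow follows. The toy_bundle struct, its 'needs' field (the index of a bundle that must be applied first), and all toy_* function names are hypothetical stand-ins; the real code calls unbundle_from_file() and iterates a hashmap.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct remote_bundle_info. */
struct toy_bundle {
	int needs;      /* index of a prerequisite bundle, or -1 for none */
	int unbundled;  /* set once this bundle has been applied */
};

struct toy_list {
	struct toy_bundle *items;
	size_t nr;
};

typedef int (*toy_iterator)(struct toy_list *list, struct toy_bundle *b);

/* Stops at, and returns, the first nonzero callback result. */
static int toy_for_all(struct toy_list *list, toy_iterator iter)
{
	size_t i;

	for (i = 0; i < list->nr; i++) {
		int result = iter(list, &list->items[i]);

		if (result)
			return result;
	}
	return 0;
}

/* Returns 1 for the first bundle newly applied in this pass, else 0. */
static int toy_attempt(struct toy_list *list, struct toy_bundle *b)
{
	if (b->unbundled)
		return 0;
	if (b->needs >= 0 && !list->items[b->needs].unbundled)
		return 0; /* prerequisite missing; retry on a later pass */

	b->unbundled = 1;
	return 1;
}

/*
 * Repeats passes until one completes without any new success.
 * Returns the number of bundles applied.
 */
static int toy_unbundle_all(struct toy_list *list)
{
	int applied = 0;

	while (toy_for_all(list, toy_attempt))
		applied++;
	return applied;
}
```

Each successful unbundle restarts the scan, so a bundle whose prerequisite appears later in iteration order is picked up on a subsequent pass, and the loop is guaranteed to end because every nonzero pass marks one more bundle as applied.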
* [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 ` Derrick Stolee via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget ` (8 subsequent siblings) 9 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The find_temp_filename() method was created in 53a50892be2 (bundle-uri: create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to create a temporary filename. The odb_mkstemp() method uses a strbuf in its interface, but we do not need to continue carrying a strbuf throughout the bundle URI code. Convert the find_temp_filename() method to use a 'char *' and modify its only caller. This makes sense that we don't actually need to modify this filename directly later, so using a strbuf is overkill. This change will simplify the data structure for tracking a bundle list to use plain strings instead of strbufs. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 4a8cc74ed05..8b2f4e08c9c 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -5,22 +5,23 @@ #include "refs.h" #include "run-command.h" -static int find_temp_filename(struct strbuf *name) +static char *find_temp_filename(void) { int fd; + struct strbuf name = STRBUF_INIT; /* * Find a temporary filename that is available. This is briefly * racy, but unlikely to collide. 
*/ - fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX"); + fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX"); if (fd < 0) { warning(_("failed to create temporary file")); - return -1; + return NULL; } close(fd); - unlink(name->buf); - return 0; + unlink(name.buf); + return strbuf_detach(&name, NULL); } static int download_https_uri_to_file(const char *file, const char *uri) @@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file) int fetch_bundle_uri(struct repository *r, const char *uri) { int result = 0; - struct strbuf filename = STRBUF_INIT; + char *filename; - if ((result = find_temp_filename(&filename))) + if (!(filename = find_temp_filename())) { + result = -1; goto cleanup; + } - if ((result = copy_uri_to_file(filename.buf, uri))) { + if ((result = copy_uri_to_file(filename, uri))) { warning(_("failed to download bundle from URI '%s'"), uri); goto cleanup; } - if ((result = !is_bundle(filename.buf, 0))) { + if ((result = !is_bundle(filename, 0))) { warning(_("file at URI '%s' is not a bundle"), uri); goto cleanup; } - if ((result = unbundle_from_file(r, filename.buf))) { + if ((result = unbundle_from_file(r, filename))) { warning(_("failed to unbundle bundle from URI '%s'"), uri); goto cleanup; } cleanup: - unlink(filename.buf); - strbuf_release(&filename); + if (filename) + unlink(filename); + free(filename); return result; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
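The interface change in this patch (build the name in a local buffer, then hand a plain heap string to the caller) can be sketched outside of Git as below. This is only an illustration of the ownership pattern, not of temporary-file creation: toy_buf, toy_buf_addstr(), toy_buf_detach(), and toy_make_name() are hypothetical names standing in for struct strbuf, strbuf_addstr(), strbuf_detach(), and find_temp_filename().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal stand-in for Git's struct strbuf. */
struct toy_buf {
	char *buf;
	size_t len;
};

/* Appends 's'; returns 0 on success, -1 on allocation failure. */
static int toy_buf_addstr(struct toy_buf *sb, const char *s)
{
	size_t add = strlen(s);
	char *grown = realloc(sb->buf, sb->len + add + 1);

	if (!grown)
		return -1;
	memcpy(grown + sb->len, s, add + 1); /* copies the NUL as well */
	sb->buf = grown;
	sb->len += add;
	return 0;
}

/* Hands the string to the caller and resets the buffer. */
static char *toy_buf_detach(struct toy_buf *sb)
{
	char *out = sb->buf;

	sb->buf = NULL;
	sb->len = 0;
	return out;
}

/*
 * Same shape as the new find_temp_filename(): the buffer is a local
 * detail, and the caller receives a plain string (or NULL on failure)
 * that it must free().
 */
static char *toy_make_name(const char *dir, const char *base)
{
	struct toy_buf name = { NULL, 0 };

	if (toy_buf_addstr(&name, dir) ||
	    toy_buf_addstr(&name, "/") ||
	    toy_buf_addstr(&name, base)) {
		free(name.buf);
		return NULL;
	}
	return toy_buf_detach(&name);
}
```

With this shape the caller only needs a 'char *' and a single free() in its cleanup path, which is what lets later patches store filenames as plain strings.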
* [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 ` Derrick Stolee via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 3/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget ` (7 subsequent siblings) 9 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> It will likely be rare where a user uses a single bundle URI and expects that URI to point to a bundle. Instead, that URI will likely be a list of bundles provided in some format. Alternatively, the Git server could advertise a list of bundles. In anticipation of these two ways of advertising multiple bundles, create a data structure that represents such a list. This will be populated using a common API, but for now focus on what data can be represented. Each list contains a number of remote_bundle_info structs. These contain an 'id' that is used to uniquely identify them in the list, and also a 'uri' that contains the location of its data. Finally, there is a strbuf containing the filename used when Git downloads the contents to disk. The list itself stores these remote_bundle_info structs in a hashtable using 'id' as the key. The order of the structs in the input is considered unimportant, but future modifications to the format and these data structures will place ordering possibilities on the set. The list also has a few "global" properties, including the version (used when parsing the list) and the mode. The mode is one of these two options: 1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined together. 
The client should download all of the advertised data to have a complete copy of the data. 2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete copy of the data. The client can choose arbitrarily from these options. In the future, the client may use pings to find the closest URI among geodistributed replicas, or use some other heuristic information added to the format. This API is currently unused, but will soon be expanded with parsing logic and then be consumed by the bundle URI download logic. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++ bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) diff --git a/bundle-uri.c b/bundle-uri.c index 8b2f4e08c9c..f9a8db221bc 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -4,6 +4,66 @@ #include "object-store.h" #include "refs.h" #include "run-command.h" +#include "hashmap.h" +#include "pkt-line.h" + +static int compare_bundles(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *he1, + const struct hashmap_entry *he2, + const void *id) +{ + const struct remote_bundle_info *e1 = + container_of(he1, const struct remote_bundle_info, ent); + const struct remote_bundle_info *e2 = + container_of(he2, const struct remote_bundle_info, ent); + + return strcmp(e1->id, id ? (const char *)id : e2->id); +} + +void init_bundle_list(struct bundle_list *list) +{ + memset(list, 0, sizeof(*list)); + + /* Implied defaults. 
*/ + list->mode = BUNDLE_MODE_ALL; + list->version = 1; + + hashmap_init(&list->bundles, compare_bundles, NULL, 0); +} + +static int clear_remote_bundle_info(struct remote_bundle_info *bundle, + void *data) +{ + FREE_AND_NULL(bundle->id); + FREE_AND_NULL(bundle->uri); + return 0; +} + +void clear_bundle_list(struct bundle_list *list) +{ + if (!list) + return; + + for_all_bundles_in_list(list, clear_remote_bundle_info, NULL); + hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent); +} + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data) +{ + struct remote_bundle_info *info; + struct hashmap_iter i; + + hashmap_for_each_entry(&list->bundles, &i, info, ent) { + int result = iter(info, data); + + if (result) + return result; + } + + return 0; +} static char *find_temp_filename(void) { diff --git a/bundle-uri.h b/bundle-uri.h index 8a152f1ef14..ff7e3fd3fb2 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -1,7 +1,63 @@ #ifndef BUNDLE_URI_H #define BUNDLE_URI_H +#include "hashmap.h" +#include "strbuf.h" + struct repository; +struct string_list; + +/** + * The remote_bundle_info struct contains information for a single bundle + * URI. This may be initialized simply by a given URI or might have + * additional metadata associated with it if the bundle was advertised by + * a bundle list. + */ +struct remote_bundle_info { + struct hashmap_entry ent; + + /** + * The 'id' is a name given to the bundle for reference + * by other bundle infos. + */ + char *id; + + /** + * The 'uri' is the location of the remote bundle so + * it can be downloaded on-demand. This will be NULL + * if there was no table of contents. 
+ */ + char *uri; +}; + +#define REMOTE_BUNDLE_INFO_INIT { 0 } + +enum bundle_list_mode { + BUNDLE_MODE_NONE = 0, + BUNDLE_MODE_ALL, + BUNDLE_MODE_ANY +}; + +/** + * A bundle_list contains an unordered set of remote_bundle_info structs, + * as well as information about the bundle listing, such as version and + * mode. + */ +struct bundle_list { + int version; + enum bundle_list_mode mode; + struct hashmap bundles; +}; + +void init_bundle_list(struct bundle_list *list); +void clear_bundle_list(struct bundle_list *list); + +typedef int (*bundle_iterator)(struct remote_bundle_info *bundle, + void *data); + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data); /** * Fetch data from the given 'uri' and unbundle the bundle data found -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
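For readers who want to experiment with the shape of this data structure outside of Git, here is a rough sketch, assuming a singly linked list in place of Git's hashmap. All toy_* names are hypothetical. Rejecting a duplicate id in toy_add() is a choice made for this sketch; in the series itself the lookup-before-insert lives in the parsing code of the next patch.

```c
#include <stdlib.h>
#include <string.h>

enum toy_mode { TOY_MODE_NONE = 0, TOY_MODE_ALL, TOY_MODE_ANY };

/* Hypothetical stand-in for struct remote_bundle_info. */
struct toy_info {
	char *id;
	char *uri;
	struct toy_info *next;
};

/* Hypothetical stand-in for struct bundle_list (list, not hashmap). */
struct toy_bundle_list {
	int version;
	enum toy_mode mode;
	struct toy_info *head;
};

static void toy_init(struct toy_bundle_list *list)
{
	memset(list, 0, sizeof(*list));
	/* Implied defaults, mirroring init_bundle_list(). */
	list->mode = TOY_MODE_ALL;
	list->version = 1;
}

/* Duplicates 's' with malloc(); returns NULL on failure. */
static char *toy_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy)
		memcpy(copy, s, n);
	return copy;
}

/* Returns the entry with this id, or NULL if there is none. */
static struct toy_info *toy_find(struct toy_bundle_list *list,
				 const char *id)
{
	struct toy_info *info;

	for (info = list->head; info; info = info->next)
		if (!strcmp(info->id, id))
			return info;
	return NULL;
}

/* Returns 0 on success, -1 on a duplicate id or allocation failure. */
static int toy_add(struct toy_bundle_list *list,
		   const char *id, const char *uri)
{
	struct toy_info *info;

	if (toy_find(list, id))
		return -1;
	info = calloc(1, sizeof(*info));
	if (!info)
		return -1;
	info->id = toy_strdup(id);
	info->uri = toy_strdup(uri);
	if (!info->id || !info->uri) {
		free(info->id);
		free(info->uri);
		free(info);
		return -1;
	}
	info->next = list->head;
	list->head = info;
	return 0;
}

/* Frees every entry and leaves the list empty. */
static void toy_clear(struct toy_bundle_list *list)
{
	while (list->head) {
		struct toy_info *info = list->head;

		list->head = info->next;
		free(info->id);
		free(info->uri);
		free(info);
	}
}
```

A linked list keeps the sketch short but makes lookup linear; the hashmap in the patch gives constant-time lookup by 'id', which matters once lists are populated one key-value pair at a time.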
* [PATCH v3 3/9] bundle-uri: create base key-value pair parsing
  2022-10-04 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 1/9] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 2/9] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34     ` Derrick Stolee via GitGitGadget
  2022-10-04 12:34     ` [PATCH v3 4/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.
The rest of the keys use a bundle identifier "<id>" as part of the key name. Keys using the same "<id>" describe a single bundle list item. * bundle.<id>.uri: This stores the URI of the bundle item. This currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. While parsing, return an error if a URI key is repeated, since we can make that restriction with bundle lists. Make the git_parse_int() method global so we can parse the integer version value carefully. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 +++++++++++ bundle-uri.c | 76 +++++++++++++++++++++++++++++++++ config.c | 2 +- config.h | 1 + 5 files changed, 104 insertions(+), 1 deletion(-) create mode 100644 Documentation/config/bundle.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index e376d547ce0..4280af6992e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -387,6 +387,8 @@ include::config/branch.txt[] include::config/browser.txt[] +include::config/bundle.txt[] + include::config/checkout.txt[] include::config/clean.txt[] diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt new file mode 100644 index 00000000000..daa21eb674a --- /dev/null +++ b/Documentation/config/bundle.txt @@ -0,0 +1,24 @@ +bundle.*:: + The `bundle.*` keys may appear in a bundle list file found via the + `git clone --bundle-uri` option. These keys currently have no effect + if placed in a repository config file, though this will change in the + future. See link:technical/bundle-uri.html[the bundle URI design + document] for more details. + +bundle.version:: + This integer value advertises the version of the bundle list format + used by the bundle list. Currently, the only accepted value is `1`. + +bundle.mode:: + This string value should be either `all` or `any`. 
This value describes + whether all of the advertised bundles are required to unbundle a + complete understanding of the bundled information (`all`) or if any one + of the listed bundle URIs is sufficient (`any`). + +bundle.<id>.*:: + The `bundle.<id>.*` keys are used to describe a single item in the + bundle list, grouped under `<id>` for identification purposes. + +bundle.<id>.uri:: + This string value defines the URI by which Git can reach the contents + of this `<id>`. This URI may be a bundle file or another bundle list. diff --git a/bundle-uri.c b/bundle-uri.c index f9a8db221bc..0bc59dd9c34 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -6,6 +6,7 @@ #include "run-command.h" #include "hashmap.h" #include "pkt-line.h" +#include "config.h" static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, @@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +/** + * Given a key-value pair, update the state of the given bundle list. + * Returns 0 if the key-value pair is understood. Returns -1 if the key + * is not understood or the value is malformed. + */ +MAYBE_UNUSED +static int bundle_list_update(const char *key, const char *value, + struct bundle_list *list) +{ + struct strbuf id = STRBUF_INIT; + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT; + struct remote_bundle_info *bundle; + const char *subsection, *subkey; + size_t subsection_len; + + if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey)) + return -1; + + if (!subsection_len) { + if (!strcmp(subkey, "version")) { + int version; + if (!git_parse_int(value, &version)) + return -1; + if (version != 1) + return -1; + + list->version = version; + return 0; + } + + if (!strcmp(subkey, "mode")) { + if (!strcmp(value, "all")) + list->mode = BUNDLE_MODE_ALL; + else if (!strcmp(value, "any")) + list->mode = BUNDLE_MODE_ANY; + else + return -1; + return 0; + } + + /* Ignore other unknown global keys. 
*/ + return 0; + } + + strbuf_add(&id, subsection, subsection_len); + + /* + * Check for an existing bundle with this <id>, or create one + * if necessary. + */ + lookup.id = id.buf; + hashmap_entry_init(&lookup.ent, strhash(lookup.id)); + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) { + CALLOC_ARRAY(bundle, 1); + bundle->id = strbuf_detach(&id, NULL); + hashmap_entry_init(&bundle->ent, strhash(bundle->id)); + hashmap_add(&list->bundles, &bundle->ent); + } + strbuf_release(&id); + + if (!strcmp(subkey, "uri")) { + if (bundle->uri) + return -1; + bundle->uri = xstrdup(value); + return 0; + } + + /* + * At this point, we ignore any information that we don't + * understand, assuming it to be hints for a heuristic the client + * does not currently understand. + */ + return 0; +} + static char *find_temp_filename(void) { int fd; diff --git a/config.c b/config.c index 015bec360f5..e93101249f6 100644 --- a/config.c +++ b/config.c @@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max) return 0; } -static int git_parse_int(const char *value, int *ret) +int git_parse_int(const char *value, int *ret) { intmax_t tmp; if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int))) diff --git a/config.h b/config.h index ca994d77147..ef9eade6414 100644 --- a/config.h +++ b/config.h @@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *, int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); +int git_parse_int(const char *value, int *ret); /** * Same as `git_config_bool`, except that it returns -1 on error rather -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
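The key layout handled by bundle_list_update() is either "bundle.<subkey>" for list-wide settings or "bundle.<id>.<subkey>" for a single list item. As a rough illustration of that split, here is a simplified analogue of the parse_config_key() call, restricted to the "bundle" section. toy_parse_bundle_key() is a hypothetical name, and unlike the real helper it also rejects an empty <id> or <subkey>.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified analogue of parse_config_key() for the
 * "bundle" section only. On success, '*id' points into 'key' (it is
 * not NUL-terminated; use '*id_len'), or is NULL for list-wide keys.
 * Returns 0 on success, -1 if 'key' is not a usable bundle key.
 */
static int toy_parse_bundle_key(const char *key,
				const char **id, size_t *id_len,
				const char **subkey)
{
	const char *rest, *last_dot;

	if (strncmp(key, "bundle.", 7))
		return -1;
	rest = key + 7;
	if (!*rest)
		return -1;

	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* "bundle.<subkey>": a list-wide key such as bundle.mode. */
		*id = NULL;
		*id_len = 0;
		*subkey = rest;
		return 0;
	}

	/* "bundle.<id>.<subkey>": reject an empty id or subkey. */
	if (last_dot == rest || !last_dot[1])
		return -1;

	/* Splitting at the last dot lets the id itself contain dots. */
	*id = rest;
	*id_len = (size_t)(last_dot - rest);
	*subkey = last_dot + 1;
	return 0;
}
```

Splitting at the last dot rather than the first is what allows an <id> such as "mirror.eu" to work, at the cost of never allowing dots in the subkey.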
* [PATCH v3 4/9] bundle-uri: create "key=value" line parsing 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2022-10-04 12:34 ` [PATCH v3 3/9] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget ` (5 subsequent siblings) 9 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> When advertising a bundle list over Git's protocol v2, we will use packet lines. Each line will be of the form "key=value" representing a bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. We are not currently implementing this protocol v2 functionality, but instead preparing to expose this parsing to be unit-testable. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++++++++- bundle-uri.h | 12 ++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 0bc59dd9c34..372e6fac5cf 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list, * Returns 0 if the key-value pair is understood. Returns -1 if the key * is not understood or the value is malformed. 
*/ -MAYBE_UNUSED static int bundle_list_update(const char *key, const char *value, struct bundle_list *list) { @@ -306,3 +305,29 @@ cleanup: free(filename); return result; } + +/** + * General API for {transport,connect}.c etc. + */ +int bundle_uri_parse_line(struct bundle_list *list, const char *line) +{ + int result; + const char *equals; + struct strbuf key = STRBUF_INIT; + + if (!strlen(line)) + return error(_("bundle-uri: got an empty line")); + + equals = strchr(line, '='); + + if (!equals) + return error(_("bundle-uri: line is not of the form 'key=value'")); + if (line == equals || !*(equals + 1)) + return error(_("bundle-uri: line has empty key or value")); + + strbuf_add(&key, line, equals - line); + result = bundle_list_update(key.buf, equals + 1, list); + strbuf_release(&key); + + return result; +} diff --git a/bundle-uri.h b/bundle-uri.h index ff7e3fd3fb2..90583461929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); +/** + * General API for {transport,connect}.c etc. + */ + +/** + * Parse a "key=value" packet line from the bundle-uri verb. + * + * Returns 0 on success and non-zero on error. + */ +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + #endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
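The line splitting done by bundle_uri_parse_line() can be tried in isolation with a sketch like the following. toy_split_line() is a hypothetical helper that mirrors the same four rejection cases (empty line, no '=', empty key, empty value) but reports them with a plain -1 instead of error(), and returns a key length rather than copying the key into a strbuf.

```c
#include <stddef.h>
#include <string.h>

/*
 * Splits "key=value" at the first '='. On success, stores the key
 * length in '*key_len' and a pointer to the value in '*value'.
 * Returns 0 on success, -1 for an empty line, a missing '=', or an
 * empty key or value.
 */
static int toy_split_line(const char *line, size_t *key_len,
			  const char **value)
{
	const char *equals;

	if (!*line)
		return -1;

	equals = strchr(line, '=');
	if (!equals)
		return -1;
	if (equals == line || !equals[1])
		return -1;

	*key_len = (size_t)(equals - line);
	*value = equals + 1;
	return 0;
}
```

Because the split happens at the first '=', a value may itself contain '=' characters, which is common in URIs with query strings.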
* [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2022-10-04 12:34 ` [PATCH v3 4/9] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 6/9] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget ` (4 subsequent siblings) 9 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> Create a new 'test-tool bundle-uri' test helper. This helper will assist in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses an input file as a list of lines. These are fed into bundle_uri_parse_line() to test how we construct a 'struct bundle_list' from that data. The list is then output to stdout as if the key-value pairs were a Git config file. We use an input file instead of stdin because of a future change to parse in config-file format that works better as an input file. 
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Makefile | 1 + bundle-uri.c | 33 ++++++++++ bundle-uri.h | 3 + t/helper/test-bundle-uri.c | 70 +++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 11 ++++ 8 files changed, 241 insertions(+) create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh diff --git a/Makefile b/Makefile index 7d5f48069ea..7dee0329c49 100644 --- a/Makefile +++ b/Makefile @@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bitmap.o TEST_BUILTINS_OBJS += test-bloom.o +TEST_BUILTINS_OBJS += test-bundle-uri.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o TEST_BUILTINS_OBJS += test-crontab.o diff --git a/bundle-uri.c b/bundle-uri.c index 372e6fac5cf..c02e7f62eb1 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +static int summarize_bundle(struct remote_bundle_info *info, void *data) +{ + FILE *fp = data; + fprintf(fp, "[bundle \"%s\"]\n", info->id); + fprintf(fp, "\turi = %s\n", info->uri); + return 0; +} + +void print_bundle_list(FILE *fp, struct bundle_list *list) +{ + const char *mode; + + switch (list->mode) { + case BUNDLE_MODE_ALL: + mode = "all"; + break; + + case BUNDLE_MODE_ANY: + mode = "any"; + break; + + case BUNDLE_MODE_NONE: + default: + mode = "<unknown>"; + } + + fprintf(fp, "[bundle]\n"); + fprintf(fp, "\tversion = %d\n", list->version); + fprintf(fp, "\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} + /** * Given a key-value pair, update the state of the given bundle list. * Returns 0 if the key-value pair is understood. 
Returns -1 if the key diff --git a/bundle-uri.h b/bundle-uri.h index 90583461929..0e56ab2ae5a 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list, bundle_iterator iter, void *data); +struct FILE; +void print_bundle_list(FILE *fp, struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c new file mode 100644 index 00000000000..0329c56544f --- /dev/null +++ b/t/helper/test-bundle-uri.c @@ -0,0 +1,70 @@ +#include "test-tool.h" +#include "parse-options.h" +#include "bundle-uri.h" +#include "strbuf.h" +#include "string-list.h" + +static int cmd__bundle_uri_parse(int argc, const char **argv) +{ + const char *key_value_usage[] = { + "test-tool bundle-uri parse-key-values <input>", + NULL + }; + const char **usage = key_value_usage; + struct option options[] = { + OPT_END(), + }; + struct strbuf sb = STRBUF_INIT; + struct bundle_list list; + int err = 0; + FILE *fp; + + argc = parse_options(argc, argv, NULL, options, usage, 0); + if (argc != 1) + goto usage; + + init_bundle_list(&list); + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + strbuf_release(&sb); + fclose(fp); + + print_bundle_list(stdout, &list); + + clear_bundle_list(&list); + + return !!err; + +usage: + usage_with_options(usage, options); +} + +int cmd__bundle_uri(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri <subcommand> [<options>]", + NULL + }; + struct option options[] = { + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION | + PARSE_OPT_KEEP_ARGV0); + if (argc == 1) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) + return 
cmd__bundle_uri_parse(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: + usage_with_options(usage, options); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c3..fbe2d9d8108 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "advise", cmd__advise_if_enabled }, { "bitmap", cmd__bitmap }, { "bloom", cmd__bloom }, + { "bundle-uri", cmd__bundle_uri }, { "chmtime", cmd__chmtime }, { "config", cmd__config }, { "crontab", cmd__crontab }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb799271631..b2aa1f39a8f 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -7,6 +7,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); +int cmd__bundle_uri(int argc, const char **argv); int cmd__chmtime(int argc, const char **argv); int cmd__config(int argc, const char **argv); int cmd__crontab(int argc, const char **argv); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh new file mode 100755 index 00000000000..fd142a66ad5 --- /dev/null +++ b/t/t5750-bundle-uri-parse.sh @@ -0,0 +1,121 @@ +#!/bin/sh + +test_description="Test bundle-uri bundle_uri_parse_line()" + +TEST_NO_CREATE_REPO=1 +TEST_PASSES_SANITIZE_LEAK=true +. 
./test-lib.sh + +test_expect_success 'bundle_uri_parse_line() just URIs' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-key-values in >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' ' + cat >in <<-\EOF && + =bogus-value + bogus-key= + EOF + + cat >err.expect <<-EOF && + error: bundle-uri: line has empty key or value + error: bad line: '\''=bogus-value'\'' + error: bundle-uri: line has empty key or value + error: bad line: '\''bogus-key='\'' + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + + bundle.two.uri=https://example.com/bundle.bdl + + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err 
&& + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.one.uri=https://example.com/bundle-2.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6da7273f1d5..3175d665add 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1956,3 +1956,14 @@ test_is_magic_mtime () { rm -f .git/test-mtime-actual return $ret } + +# Given two filenames, parse both using 'git config --list --file' +# and compare the sorted output of those commands. Useful when +# wanting to ignore whitespace differences and sorting concerns. +test_cmp_config_output () { + git config --list --file="$1" >config-expect && + git config --list --file="$2" >config-actual && + sort config-expect >sorted-expect && + sort config-actual >sorted-actual && + test_cmp sorted-expect sorted-actual +} -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v3 6/9] bundle-uri: parse bundle list in config format 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2022-10-04 12:34 ` [PATCH v3 5/9] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-04 12:34 ` Derrick Stolee via GitGitGadget 2022-10-04 12:34 ` [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 9 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When a bundle provider wants to operate independently from a Git remote, they want to provide a single, consistent URI that users can use in their 'git clone --bundle-uri' commands. At this point, the Git client expects that URI to be a single bundle that can be unbundled and used to bootstrap the rest of the clone from the Git server. This single bundle cannot be re-used to assist with future incremental fetches. To allow for the incremental fetch case, teach Git to understand a bundle list that could be advertised at an independent bundle URI. Such a bundle list is likely to be inspected by human readers, even if only by the bundle provider creating the list. For this reason, we can take our expected "key=value" pairs and instead format them using Git config format. Create bundle_uri_parse_config_format() to parse a file in config format and convert that into a 'struct bundle_list' filled with its understanding of the contents. Be careful to use error_action CONFIG_ERROR_ERROR when calling git_config_from_file_with_options() because the default action for git_config_from_file() is to die() on a parsing error. 
The current warning isn't particularly helpful if it arises to a user, but it will be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce bugs than this simpler version. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++ bundle-uri.h | 9 +++++++ t/helper/test-bundle-uri.c | 49 +++++++++++++++++++++++++++--------- t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++ 4 files changed, 123 insertions(+), 12 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index c02e7f62eb1..3d44ec2b1e6 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value, return 0; } +static int config_to_bundle_list(const char *key, const char *value, void *data) +{ + struct bundle_list *list = data; + return bundle_list_update(key, value, list); +} + +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); + + if (!result && list->mode == BUNDLE_MODE_NONE) { + warning(_("bundle list at '%s' has no mode"), uri); + result = 1; + } + + return result; +} + static char *find_temp_filename(void) { int fd; diff --git a/bundle-uri.h b/bundle-uri.h index 0e56ab2ae5a..bc13d4c9929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list, struct FILE; void print_bundle_list(FILE *fp, struct bundle_list *list); +/** + * A bundle URI may point to a bundle list where the key=value + * pairs are provided in config file 
format. This method is + * exposed publicly for testing purposes. + */ +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c index 0329c56544f..25afd393428 100644 --- a/t/helper/test-bundle-uri.c +++ b/t/helper/test-bundle-uri.c @@ -4,12 +4,21 @@ #include "strbuf.h" #include "string-list.h" -static int cmd__bundle_uri_parse(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, +}; + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { const char *key_value_usage[] = { "test-tool bundle-uri parse-key-values <input>", NULL }; + const char *config_usage[] = { + "test-tool bundle-uri parse-config <input>", + NULL + }; const char **usage = key_value_usage; struct option options[] = { OPT_END(), @@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv) int err = 0; FILE *fp; - argc = parse_options(argc, argv, NULL, options, usage, 0); - if (argc != 1) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); - fp = fopen(argv[0], "r"); - if (!fp) - die("failed to open '%s'", argv[0]); - while (strbuf_getline(&sb, fp) != EOF) { - if (bundle_uri_parse_line(&list, sb.buf)) - err = error("bad line: '%s'", sb.buf); + switch (mode) { + case KEY_VALUE_PAIRS: + if (argc != 1) + goto usage; + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + fclose(fp); + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; + err = bundle_uri_parse_config_format("<uri>", argv[0], &list); + break; } 
strbuf_release(&sb); - fclose(fp); print_bundle_list(stdout, &list); @@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) - return cmd__bundle_uri_parse(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); error("there is no test-tool bundle-uri tool '%s'", argv[1]); usage: diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index fd142a66ad5..c2fe3f9c5a5 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines test_cmp_config_output expect actual ' +test_expect_success 'parse config format: just URIs' ' + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'parse config format edge cases: empty key or value' ' + cat >in1 <<-\EOF && + = bogus-value + EOF + + cat >err1 <<-EOF && + error: bad config line 1 in file in1 + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && + test_cmp err1 err && + test_cmp_config_output expect actual && + + cat >in2 <<-\EOF && + bogus-key = + EOF + + cat >err2 <<-EOF && + error: bad config line 1 in file in2 + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && + test_cmp err2 err && + test_cmp_config_output expect actual +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 3d44ec2b1e6..8a7c11c6393 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -366,6 +380,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
--
gitgitgadget
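To illustrate why a constant depth limit is enough to terminate on cyclic bundle lists, here is a small standalone sketch. It is not the code from the patch: struct node, MAX_CHILDREN and count_bundles() are hypothetical, and "content at a URI" is modeled as an in-memory array entry that is either a bundle or a list of indices. Only the `depth >= limit` check mirrors fetch_bundle_uri_internal(). As a simplification, the sketch aborts on the first failing child regardless of list mode.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4	/* mirrors max_bundle_uri_depth in the patch */
#define MAX_CHILDREN 4	/* hypothetical bound for this toy model */

/* Hypothetical stand-in for "content at a URI": a bundle or a list. */
struct node {
	int is_list;
	size_t nr_children;
	size_t children[MAX_CHILDREN];	/* indices into the node array */
};

/*
 * Count the bundles reachable from nodes[idx]. Returns 0 on success, or
 * -1 as soon as any path reaches MAX_DEPTH. No visited set is needed: a
 * cycle simply runs into the depth limit.
 */
int count_bundles(const struct node *nodes, size_t idx, int depth, int *count)
{
	size_t i;

	assert(nodes && count);

	if (depth >= MAX_DEPTH)
		return -1;

	if (!nodes[idx].is_list) {
		(*count)++;
		return 0;
	}

	for (i = 0; i < nodes[idx].nr_children; i++)
		if (count_bundles(nodes, nodes[idx].children[i],
				  depth + 1, count))
			return -1;
	return 0;
}
```

The trade-off is that a legitimate but deeply nested chain of lists is rejected in the same way as a cycle, which is why the commit message picks a limit above the expected nesting.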
* [PATCH v3 8/9] bundle-uri: fetch a list of bundles 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2022-10-04 12:34 ` [PATCH v3 7/9] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 ` Derrick Stolee via GitGitGadget 2022-10-04 21:44 ` Jonathan Tan 2022-10-04 12:34 ` [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 9 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. 
By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 202 +++++++++++++++++++++++++++++++++--- bundle-uri.h | 13 +++ t/t5558-clone-bundle-uri.sh | 135 ++++++++++++++++++++++++ 3 files changed, 334 insertions(+), 16 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 8a7c11c6393..aaa1848044a 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle, { FREE_AND_NULL(bundle->id); FREE_AND_NULL(bundle->uri); + FREE_AND_NULL(bundle->file); + bundle->unbundled = 0; return 0; } @@ -334,18 +336,116 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } +struct bundle_list_context { + struct repository *r; + struct bundle_list *list; + enum bundle_list_mode mode; + int count; + int depth; +}; + +/* + * This early definition is necessary because we use indirect recursion: + * + * While iterating through a bundle list that was downloaded as part + * of fetch_bundle_uri_internal(), iterator methods eventually call it + * again, but with depth + 1. 
+ */ +static int fetch_bundle_uri_internal(struct repository *r, + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list); + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ + int res; + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); + + /* + * Only increment count if the download succeeded. If our mode is + * BUNDLE_MODE_ANY, then we will want to try other URIs in the + * list in case they work instead. + */ + if (!res) + ctx->count++; + + /* + * In BUNDLE_MODE_ANY, we need to continue iterating until we find + * a bundle that works, so do not signal a failure here. + */ + return ctx->mode == BUNDLE_MODE_ANY ? 0 : res; +} + +static int download_bundle_list(struct repository *r, + struct bundle_list *local_list, + struct bundle_list *global_list, + int depth) +{ + struct bundle_list_context ctx = { + .r = r, + .list = global_list, + .depth = depth + 1, + .mode = local_list->mode, + }; + + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); +} + +static int fetch_bundle_list_in_config_format(struct repository *r, + struct bundle_list *global_list, + struct remote_bundle_info *bundle, + int depth) +{ + int result; + struct bundle_list list_from_bundle; + + init_bundle_list(&list_from_bundle); + + if ((result = bundle_uri_parse_config_format(bundle->uri, + bundle->file, + &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { + warning(_("unrecognized bundle mode from URI '%s'"), + bundle->uri); + result = -1; + goto cleanup; + } + + if ((result = download_bundle_list(r, &list_from_bundle, + global_list, depth))) + goto cleanup; + +cleanup: + clear_bundle_list(&list_from_bundle); + return result; +} + /** * This limits the recursion on fetch_bundle_uri_internal() when following * bundle lists. 
*/ static int max_bundle_uri_depth = 4; +/** + * Recursively download all bundles advertised at the given URI + * to files. If the file is a bundle, then add it to the given + * 'list'. Otherwise, expect a bundle list and recurse on the + * URIs in that list according to the list mode (ANY or ALL). + */ static int fetch_bundle_uri_internal(struct repository *r, - const char *uri, - int depth) + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list) { int result = 0; - char *filename; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { warning(_("exceeded bundle URI recursion limit (%d)"), @@ -353,36 +453,106 @@ static int fetch_bundle_uri_internal(struct repository *r, return -1; } - if (!(filename = find_temp_filename())) { + if (!bundle->file && + !(bundle->file = find_temp_filename())) { result = -1; goto cleanup; } - if ((result = copy_uri_to_file(filename, uri))) { - warning(_("failed to download bundle from URI '%s'"), uri); + if ((result = copy_uri_to_file(bundle->file, bundle->uri))) { + warning(_("failed to download bundle from URI '%s'"), bundle->uri); goto cleanup; } - if ((result = !is_bundle(filename, 0))) { - warning(_("file at URI '%s' is not a bundle"), uri); + if ((result = !is_bundle(bundle->file, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); + if (result) + warning(_("file at URI '%s' is not a bundle or bundle list"), + bundle->uri); goto cleanup; } - if ((result = unbundle_from_file(r, filename))) { - warning(_("failed to unbundle bundle from URI '%s'"), uri); - goto cleanup; - } + /* Copy the bundle and insert it into the global list. 
*/ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); + bcopy->file = xstrdup(bundle->file); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: - if (filename) - unlink(filename); - free(filename); + if (result && bundle->file) + unlink(bundle->file); return result; } +/** + * This loop iterator breaks the loop with nonzero return code on the + * first successful unbundling of a bundle. + */ +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ + struct repository *r = data; + + if (!info->file || info->unbundled) + return 0; + + if (!unbundle_from_file(r, info->file)) { + info->unbundled = 1; + return 1; + } + + return 0; +} + +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. + * + * Keep in mind that a non-zero result for the loop here means + * the loop terminated early on a successful unbundling, which + * signals that we can try again. + */ + while (for_all_bundles_in_list(list, attempt_unbundle, r)) ; + + return 0; +} + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ + if (info->file) + unlink_or_warn(info->file); + return 0; +} + int fetch_bundle_uri(struct repository *r, const char *uri) { - return fetch_bundle_uri_internal(r, uri, 0); + int result; + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), + .id = xstrdup(""), + }; + + init_bundle_list(&list); + + /* If a bundle is added to this global list, then it is required. 
*/ + list.mode = BUNDLE_MODE_ALL; + + if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list))) + goto cleanup; + + result = unbundle_all_bundles(r, &list); + +cleanup: + for_all_bundles_in_list(&list, unlink_bundle, NULL); + clear_bundle_list(&list); + clear_remote_bundle_info(&bundle, NULL); + return result; } /** diff --git a/bundle-uri.h b/bundle-uri.h index bc13d4c9929..4dbc269823c 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -28,6 +28,19 @@ struct remote_bundle_info { * if there was no table of contents. */ char *uri; + + /** + * If the bundle has been downloaded, then 'file' is a + * filename storing its contents. Otherwise, 'file' is + * NULL. + */ + char *file; + + /** + * If the bundle has been unbundled successfully, then + * this boolean is true. + */ + unsigned unbundled:1; }; #define REMOTE_BUNDLE_INFO_INIT { 0 } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index ad666a2d28a..9690f19386f 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -41,6 +41,92 @@ test_expect_success 'clone with file:// bundle' ' test_cmp expect actual ' +# To get interesting tests for bundle lists, we need to construct a +# somewhat-interesting commit history. 
+# +# ---------------- bundle-4 +# +# 4 +# / \ +# ----|---|------- bundle-3 +# | | +# | 3 +# | | +# ----|---|------- bundle-2 +# | | +# 2 | +# | | +# ----|---|------- bundle-1 +# \ / +# 1 +# | +# (previous commits) +test_expect_success 'construct incremental bundle list' ' + ( + cd clone-from && + git checkout -b base && + test_commit 1 && + git checkout -b left && + test_commit 2 && + git checkout -b right base && + test_commit 3 && + git checkout -b merge left && + git merge right -m "4" && + + git bundle create bundle-1.bundle base && + git bundle create bundle-2.bundle base..left && + git bundle create bundle-3.bundle base..right && + git bundle create bundle-4.bundle merge --not left right + ) +' + +test_expect_success 'clone bundle list (file, no heuristic)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + [bundle "bundle-3"] + uri = file://$(pwd)/clone-from/bundle-3.bundle + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-file cat-file --batch-check <oids +' + +test_expect_success 'clone bundle list (file, any mode)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + # Does not exist. Should be skipped. 
+ [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-file cat-file --batch-check <oids +' + ######################################################################### # HTTP tests begin here @@ -75,6 +161,55 @@ test_expect_success 'clone HTTP bundle' ' test_config -C clone-http log.excludedecoration refs/bundle/ ' +test_expect_success 'clone bundle list (HTTP, no heuristic)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + [bundle "bundle-2"] + uri = $HTTPD_URL/bundle-2.bundle + + [bundle "bundle-3"] + uri = $HTTPD_URL/bundle-3.bundle + + [bundle "bundle-4"] + uri = $HTTPD_URL/bundle-4.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-http cat-file --batch-check <oids +' + +test_expect_success 'clone bundle list (HTTP, any mode)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-http cat-file --batch-check <oids +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist. 
-- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [PATCH v3 8/9] bundle-uri: fetch a list of bundles
  2022-10-04 12:34 ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-04 21:44   ` Jonathan Tan
  2022-10-07 13:29     ` Derrick Stolee
  0 siblings, 1 reply; 94+ messages in thread
From: Jonathan Tan @ 2022-10-04 21:44 UTC
To: Derrick Stolee via GitGitGadget
Cc: Jonathan Tan, git, gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static int unbundle_all_bundles(struct repository *r,
> +				struct bundle_list *list)
> +{
> +	/*
> +	 * Iterate through all bundles looking for ones that can
> +	 * successfully unbundle. If any succeed, then perhaps another
> +	 * will succeed in the next attempt.
> +	 *
> +	 * Keep in mind that a non-zero result for the loop here means
> +	 * the loop terminated early on a successful unbundling, which
> +	 * signals that we can try again.
> +	 */
> +	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
> +
> +	return 0;
> +}

This function always returns 0 regardless of how many successful
iterations there were: we would need the number to be equal to the
number of bundles in the list if ALL, and 1 if ANY.

Which brings up the question...we probably need a test for when the
unbundling is unsuccessful.

Other than that, everything looks good, including the removal of one
patch and the addition of the "bundle-uri: suppress stderr from
remote-https" patch.
* Re: [PATCH v3 8/9] bundle-uri: fetch a list of bundles
  2022-10-04 21:44   ` Jonathan Tan
@ 2022-10-07 13:29     ` Derrick Stolee
  0 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee @ 2022-10-07 13:29 UTC
To: Jonathan Tan, Derrick Stolee via GitGitGadget
Cc: git, gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Teng Long

On 10/4/22 5:44 PM, Jonathan Tan wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +static int unbundle_all_bundles(struct repository *r,
>> +				struct bundle_list *list)
>> +{
>> +	/*
>> +	 * Iterate through all bundles looking for ones that can
>> +	 * successfully unbundle. If any succeed, then perhaps another
>> +	 * will succeed in the next attempt.
>> +	 *
>> +	 * Keep in mind that a non-zero result for the loop here means
>> +	 * the loop terminated early on a successful unbundling, which
>> +	 * signals that we can try again.
>> +	 */
>> +	while (for_all_bundles_in_list(list, attempt_unbundle, r)) ;
>> +
>> +	return 0;
>> +}
>
> This function always returns 0 regardless of how many successful
> iterations there were: we would need the number to be equal to the
> number of bundles in the list if ALL, and 1 if ANY.

The ALL mode is a bit more permissive than requiring literally every
bundle: if some fail to download or apply, then we continue with
whatever we were able to unbundle. The ALL mode indicates that the
bundles build on each other, so the client should download as many as
possible. By contrast, ANY indicates that they are independent so the
client should stop after the first successful download.

We could still find a way to indicate how many bundles were downloaded
in the return of this method, but we don't want to have additional
warnings based on that return value.

> Which brings up the question...we probably need a test for when the
> unbundling is unsuccessful.

I will add more failure scenarios, including no successful downloads
or only a partial success in ALL mode.

> Other than that, everything looks good, including the removal of one
> patch and the addition of the "bundle-uri: suppress stderr from
> remote-https" patch.

Thanks!
-Stolee
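
To make the ALL/ANY distinction above concrete, here is a minimal
sketch in C. It is not the code from the patch: the toy_bundle struct,
its 'available' flag (which fakes the download outcome) and
toy_download_list() are hypothetical names invented for illustration.
It only assumes the behavior described in this reply: a failed
download is skipped in both modes, and ANY stops after the first
success.

```c
#include <stddef.h>

enum bundle_list_mode {
	BUNDLE_MODE_NONE = 0,
	BUNDLE_MODE_ALL,
	BUNDLE_MODE_ANY
};

/* Hypothetical list entry; 'available' stands in for a real download. */
struct toy_bundle {
	const char *uri;
	int available;	/* 1 if the pretend download would succeed */
	int fetched;	/* set to 1 once "downloaded" */
};

/*
 * Walk the list and "download" what we can. Failures are skipped in
 * both modes; ANY stops after the first success, ALL keeps going.
 * Returns the number of bundles fetched.
 */
int toy_download_list(struct toy_bundle *bundles, size_t nr,
		      enum bundle_list_mode mode)
{
	int count = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!bundles[i].available)
			continue; /* real code would warn here, not die */

		bundles[i].fetched = 1;
		count++;

		if (mode == BUNDLE_MODE_ANY)
			break;
	}

	return count;
}
```

With a list whose first entry is missing, ANY mode fetches exactly the
first available bundle, while ALL mode fetches every available one and
simply tolerates the gap.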
* [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https
  2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-10-04 12:34   ` [PATCH v3 8/9] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-04 12:34   ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04   ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-04 12:34 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic: download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to keep these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  1 +
 t/t5558-clone-bundle-uri.sh | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index aaa1848044a..92af0eae224 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	int found_get = 0;
 
 	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.err = -1;
 	cp.in = -1;
 	cp.out = -1;
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9690f19386f..a0ef0588e21 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -122,7 +122,11 @@ test_expect_success 'clone bundle list (file, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-file 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-file cat-file --batch-check <oids
 '
@@ -205,7 +209,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-any-http 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-http cat-file --batch-check <oids
 '
-- 
gitgitgadget
* [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists 2022-10-04 12:34 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2022-10-04 12:34 ` [PATCH v3 9/9] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget ` (11 more replies) 9 siblings, 12 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list. The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series). The patches are organized in this way (updated for v4): 1. Patch 1 is a cleanup from the previous part. This allows us to simplify our bundle list data structure slightly. 2. Patches 2-3 create the bundle list data structures and the logic for populating the list from key-value pairs. 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2]. 4. Patch 6 teaches Git to parse Git config files into bundle lists. 5. 
Patches 7-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. We also need to be careful when verifying the bundles due to ref caches, so some flags are added to unbundle() and verify_bundle(). 6. Patches 10-11 suppress unhelpful warnings from user visibility. [1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/ [2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/ At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file. As outlined in the design document [1], the next steps after this are: 1. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [2] 2. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [3] I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going: [3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com [4] https://github.com/derrickstolee/git/pull/21 [5] https://github.com/derrickstolee/git/pull/22 Updates in v4 ============= * Properly updated the patch outline. * Jonathan Tan asked for more tests, and this revealed some interesting behaviors which I have now either fixed or made explicit: 1. In "all" mode, we try to download and apply all bundles. Do not fail if a single bundle download fails. 2. Previously, not all bundles were being applied, and this was noticed by the added checks for the refs/bundles/* refs at the end of the tests. 
This revealed the need for removing the reachability walk from verify_bundle() since the written refs/bundles/* refs were not being picked up by the loose ref cache. Since removing the reachability walk seemed like the faster (for users) option, I went that direction. 3. While running those tests and examining the output carefully, I noticed several error messages related to missing prerequisites due to attempting unbundling in a random order. This doesn't appear in the later creationToken version, so I hadn't noticed it at the tip of my local work. These messages are removed with a new quiet mode for verify_bundle(). Updates in v3 ============= * Fixed a comment about a return value of -1. * Fixed and tested scenario where early URIs fail in "any" mode and Git should try the rest of the list. * Instead of using 'success_count' and 'failure_count', use the iterator return value to terminate the "all" mode loop early. Updates in v2 ============= Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version. * I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches. * We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used. * The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series). * The bundle.version value is now parsed using git_parse_int(). * The config key is now parsed using parse_config_key(). * Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort. * Some printf()s are correctly changed to fprintf()s. * The test helper CLI is unified across the two modes. They both take a filename now. 
* The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure. Thanks, * Stolee Derrick Stolee (9): bundle-uri: use plain string in find_temp_filename() bundle-uri: create bundle_list struct and helpers bundle-uri: create base key-value pair parsing bundle-uri: parse bundle list in config format bundle-uri: limit recursion depth for bundle lists bundle: add flags to verify_bundle(), skip walk bundle-uri: fetch a list of bundles bundle-uri: quiet failed unbundlings bundle-uri: suppress stderr from remote-https Ævar Arnfjörð Bjarmason (2): bundle-uri: create "key=value" line parsing bundle-uri: unit test "key=value" parsing Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 ++ Makefile | 1 + builtin/bundle.c | 5 +- bundle-uri.c | 458 ++++++++++++++++++++++++++++++-- bundle-uri.h | 93 +++++++ bundle.c | 22 +- bundle.h | 16 +- config.c | 2 +- config.h | 1 + t/helper/test-bundle-uri.c | 95 +++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5558-clone-bundle-uri.sh | 275 +++++++++++++++++++ t/t5750-bundle-uri-parse.sh | 171 ++++++++++++ t/test-lib-functions.sh | 11 + transport.c | 2 +- 17 files changed, 1149 insertions(+), 31 deletions(-) create mode 100644 Documentation/config/bundle.txt create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1333 Range-diff vs v3: 1: 48beccb0f5e = 1: 48beccb0f5e bundle-uri: use plain string in find_temp_filename() 2: f0c4457951c = 2: f0c4457951c bundle-uri: create bundle_list struct and helpers 3: 430e01cd2a4 = 3: 430e01cd2a4 bundle-uri: create base key-value pair parsing 4: 
cd915d57f3b = 4: cd915d57f3b bundle-uri: create "key=value" line parsing 5: 4d8cac67f66 = 5: 4d8cac67f66 bundle-uri: unit test "key=value" parsing 6: 0ecae3a44b3 = 6: 0ecae3a44b3 bundle-uri: parse bundle list in config format 7: 7e6b32313b0 = 7: 7e6b32313b0 bundle-uri: limit recursion depth for bundle lists -: ----------- > 8: 83f2cd893a4 bundle: add flags to verify_bundle(), skip walk 8: 46799648b4c ! 9: 6b9c764c6b3 bundle-uri: fetch a list of bundles @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + ctx->count++; + + /* -+ * In BUNDLE_MODE_ANY, we need to continue iterating until we find -+ * a bundle that works, so do not signal a failure here. ++ * To be opportunistic as possible, we continue iterating and ++ * download as many bundles as we can, so we can apply the ones ++ * that work, even in BUNDLE_MODE_ALL mode. + */ -+ return ctx->mode == BUNDLE_MODE_ANY ? 0 : res; ++ return 0; +} + +static int download_bundle_list(struct repository *r, @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' ' + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && -+ git -C clone-list-file cat-file --batch-check <oids ++ git -C clone-list-file cat-file --batch-check <oids && ++ ++ git -C clone-list-file for-each-ref --format="%(refname)" >refs && ++ grep "refs/bundles/" refs >actual && ++ cat >expect <<-\EOF && ++ refs/bundles/base ++ refs/bundles/left ++ refs/bundles/merge ++ refs/bundles/right ++ EOF ++ test_cmp expect actual ++' ++ ++test_expect_success 'clone bundle list (file, all mode, some failures)' ' ++ cat >bundle-list <<-EOF && ++ [bundle] ++ version = 1 ++ mode = all ++ ++ # Does not exist. Should be skipped. 
++ [bundle "bundle-0"] ++ uri = file://$(pwd)/clone-from/bundle-0.bundle ++ ++ [bundle "bundle-1"] ++ uri = file://$(pwd)/clone-from/bundle-1.bundle ++ ++ [bundle "bundle-2"] ++ uri = file://$(pwd)/clone-from/bundle-2.bundle ++ ++ # No bundle-3 means bundle-4 will not apply. ++ ++ [bundle "bundle-4"] ++ uri = file://$(pwd)/clone-from/bundle-4.bundle ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-5"] ++ uri = file://$(pwd)/clone-from/bundle-5.bundle ++ EOF ++ ++ GIT_TRACE2_PERF=1 \ ++ git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-all-some cat-file --batch-check <oids && ++ ++ git -C clone-all-some for-each-ref --format="%(refname)" >refs && ++ grep "refs/bundles/" refs >actual && ++ cat >expect <<-\EOF && ++ refs/bundles/base ++ refs/bundles/left ++ EOF ++ test_cmp expect actual ++' ++ ++test_expect_success 'clone bundle list (file, all mode, all failures)' ' ++ cat >bundle-list <<-EOF && ++ [bundle] ++ version = 1 ++ mode = all ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-0"] ++ uri = file://$(pwd)/clone-from/bundle-0.bundle ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-5"] ++ uri = file://$(pwd)/clone-from/bundle-5.bundle ++ EOF ++ ++ git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-all-fail cat-file --batch-check <oids && ++ ++ git -C clone-all-fail for-each-ref --format="%(refname)" >refs && ++ ! grep "refs/bundles/" refs +' + +test_expect_success 'clone bundle list (file, any mode)' ' @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with file:// bundle' ' + + # Does not exist. Should be skipped. 
+ [bundle "bundle-0"] -+ uri = $HTTPD_URL/bundle-0.bundle ++ uri = file://$(pwd)/clone-from/bundle-0.bundle + + [bundle "bundle-1"] -+ uri = $HTTPD_URL/bundle-1.bundle ++ uri = file://$(pwd)/clone-from/bundle-1.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] -+ uri = $HTTPD_URL/bundle-5.bundle ++ uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && -+ git -C clone-any-file cat-file --batch-check <oids ++ git -C clone-any-file cat-file --batch-check <oids && ++ ++ git -C clone-any-file for-each-ref --format="%(refname)" >refs && ++ grep "refs/bundles/" refs >actual && ++ cat >expect <<-\EOF && ++ refs/bundles/base ++ EOF ++ test_cmp expect actual ++' ++ ++test_expect_success 'clone bundle list (file, any mode, all failures)' ' ++ cat >bundle-list <<-EOF && ++ [bundle] ++ version = 1 ++ mode = any ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-0"] ++ uri = $HTTPD_URL/bundle-0.bundle ++ ++ # Does not exist. Should be skipped. ++ [bundle "bundle-5"] ++ uri = $HTTPD_URL/bundle-5.bundle ++ EOF ++ ++ git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail && ++ git -C clone-from for-each-ref --format="%(objectname)" >oids && ++ git -C clone-any-fail cat-file --batch-check <oids && ++ ++ git -C clone-any-fail for-each-ref --format="%(refname)" >refs && ++ ! 
grep "refs/bundles/" refs +' + ######################################################################### @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' ' + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && -+ git -C clone-any-http cat-file --batch-check <oids ++ git -C clone-any-http cat-file --batch-check <oids && ++ ++ git -C clone-list-file for-each-ref --format="%(refname)" >refs && ++ grep "refs/bundles/" refs >actual && ++ cat >expect <<-\EOF && ++ refs/bundles/base ++ refs/bundles/left ++ refs/bundles/merge ++ refs/bundles/right ++ EOF ++ test_cmp expect actual +' + # Do not add tests here unless they use the HTTP server, as they will -: ----------- > 10: 1cae3096624 bundle-uri: quiet failed unbundlings 9: d84544859e4 ! 11: 52a575f8a69 bundle-uri: suppress stderr from remote-https @@ bundle-uri.c: static int download_https_uri_to_file(const char *file, const char ## t/t5558-clone-bundle-uri.sh ## -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any mode)' ' +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, all mode, some failures)' ' + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-all-some 2>err && + ! grep "Repository lacks these prerequisite commits" err && ++ ! grep "fatal" err && ++ grep "warning: failed to download bundle from URI" err && + + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-some cat-file --batch-check <oids && +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, all mode, all failures)' ' + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-all-fail 2>err && + ! grep "Repository lacks these prerequisite commits" err && ++ ! 
grep "fatal" err && ++ grep "warning: failed to download bundle from URI" err && + + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-fail cat-file --batch-check <oids && +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any mode, all failures)' ' uri = $HTTPD_URL/bundle-5.bundle EOF -- git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && +- git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ -+ clone-from clone-any-file 2>err && ++ clone-from clone-any-fail 2>err && + ! grep "fatal" err && + grep "warning: failed to download bundle from URI" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && - git -C clone-any-file cat-file --batch-check <oids - ' + git -C clone-any-fail cat-file --batch-check <oids && + @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any mode)' ' uri = $HTTPD_URL/bundle-5.bundle EOF @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m + grep "warning: failed to download bundle from URI" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && - git -C clone-any-http cat-file --batch-check <oids - ' + git -C clone-any-http cat-file --batch-check <oids && + -- gitgitgadget ^ permalink raw reply [flat|nested] 94+ messages in thread
* [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename()
  2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-10 16:04   ` Derrick Stolee via GitGitGadget
  2022-10-10 16:04   ` [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we never need to modify this
filename later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..8b2f4e08c9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget
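
The ownership change in the patch above (return an allocated 'char *'
or NULL instead of filling a caller-supplied strbuf) can be sketched
outside of Git with plain POSIX calls. This is only an illustration
under assumptions: toy_temp_filename() is a hypothetical helper, and
mkstemp() stands in for Git's odb_mkstemp(). As in the patch, the file
is created to reserve a name and then removed, which is briefly racy.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's): pick an unused filename inside
 * 'dir'. Returns a malloc()ed string that the caller must free(),
 * or NULL if no temporary file could be created.
 */
char *toy_temp_filename(const char *dir)
{
	const char *suffix = "/tmp_uri_XXXXXX";
	size_t len = strlen(dir) + strlen(suffix) + 1;
	char *name = malloc(len);
	int fd;

	if (!name)
		return NULL;
	snprintf(name, len, "%s%s", dir, suffix);

	/* mkstemp() replaces the XXXXXX and creates the file. */
	fd = mkstemp(name);
	if (fd < 0) {
		free(name);
		return NULL;
	}

	/* Keep only the name; the file itself is not needed yet. */
	close(fd);
	unlink(name);
	return name;
}
```

A caller then follows the same shape as fetch_bundle_uri() in the
patch: treat NULL as failure, and on every exit path unlink() and
free() the returned string.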
* [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 03/11] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget ` (9 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> It will likely be rare where a user uses a single bundle URI and expects that URI to point to a bundle. Instead, that URI will likely be a list of bundles provided in some format. Alternatively, the Git server could advertise a list of bundles. In anticipation of these two ways of advertising multiple bundles, create a data structure that represents such a list. This will be populated using a common API, but for now focus on what data can be represented. Each list contains a number of remote_bundle_info structs. These contain an 'id' that is used to uniquely identify them in the list, and also a 'uri' that contains the location of its data. Finally, there is a strbuf containing the filename used when Git downloads the contents to disk. The list itself stores these remote_bundle_info structs in a hashtable using 'id' as the key. The order of the structs in the input is considered unimportant, but future modifications to the format and these data structures will place ordering possibilities on the set. The list also has a few "global" properties, including the version (used when parsing the list) and the mode. The mode is one of these two options: 1. 
BUNDLE_MODE_ALL: all listed URIs are intended to be combined together. The client should download all of the advertised data to have a complete copy of the data. 2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete copy of the data. The client can choose arbitrarily from these options. In the future, the client may use pings to find the closest URI among geodistributed replicas, or use some other heuristic information added to the format. This API is currently unused, but will soon be expanded with parsing logic and then be consumed by the bundle URI download logic. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++ bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) diff --git a/bundle-uri.c b/bundle-uri.c index 8b2f4e08c9c..f9a8db221bc 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -4,6 +4,66 @@ #include "object-store.h" #include "refs.h" #include "run-command.h" +#include "hashmap.h" +#include "pkt-line.h" + +static int compare_bundles(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *he1, + const struct hashmap_entry *he2, + const void *id) +{ + const struct remote_bundle_info *e1 = + container_of(he1, const struct remote_bundle_info, ent); + const struct remote_bundle_info *e2 = + container_of(he2, const struct remote_bundle_info, ent); + + return strcmp(e1->id, id ? (const char *)id : e2->id); +} + +void init_bundle_list(struct bundle_list *list) +{ + memset(list, 0, sizeof(*list)); + + /* Implied defaults. 
*/ + list->mode = BUNDLE_MODE_ALL; + list->version = 1; + + hashmap_init(&list->bundles, compare_bundles, NULL, 0); +} + +static int clear_remote_bundle_info(struct remote_bundle_info *bundle, + void *data) +{ + FREE_AND_NULL(bundle->id); + FREE_AND_NULL(bundle->uri); + return 0; +} + +void clear_bundle_list(struct bundle_list *list) +{ + if (!list) + return; + + for_all_bundles_in_list(list, clear_remote_bundle_info, NULL); + hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent); +} + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data) +{ + struct remote_bundle_info *info; + struct hashmap_iter i; + + hashmap_for_each_entry(&list->bundles, &i, info, ent) { + int result = iter(info, data); + + if (result) + return result; + } + + return 0; +} static char *find_temp_filename(void) { diff --git a/bundle-uri.h b/bundle-uri.h index 8a152f1ef14..ff7e3fd3fb2 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -1,7 +1,63 @@ #ifndef BUNDLE_URI_H #define BUNDLE_URI_H +#include "hashmap.h" +#include "strbuf.h" + struct repository; +struct string_list; + +/** + * The remote_bundle_info struct contains information for a single bundle + * URI. This may be initialized simply by a given URI or might have + * additional metadata associated with it if the bundle was advertised by + * a bundle list. + */ +struct remote_bundle_info { + struct hashmap_entry ent; + + /** + * The 'id' is a name given to the bundle for reference + * by other bundle infos. + */ + char *id; + + /** + * The 'uri' is the location of the remote bundle so + * it can be downloaded on-demand. This will be NULL + * if there was no table of contents. 
+ */ + char *uri; +}; + +#define REMOTE_BUNDLE_INFO_INIT { 0 } + +enum bundle_list_mode { + BUNDLE_MODE_NONE = 0, + BUNDLE_MODE_ALL, + BUNDLE_MODE_ANY +}; + +/** + * A bundle_list contains an unordered set of remote_bundle_info structs, + * as well as information about the bundle listing, such as version and + * mode. + */ +struct bundle_list { + int version; + enum bundle_list_mode mode; + struct hashmap bundles; +}; + +void init_bundle_list(struct bundle_list *list); +void clear_bundle_list(struct bundle_list *list); + +typedef int (*bundle_iterator)(struct remote_bundle_info *bundle, + void *data); + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data); /** * Fetch data from the given 'uri' and unbundle the bundle data found -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
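
The iterator contract introduced by this patch (a callback returning
non-zero stops the walk and that value is passed back to the caller)
can be sketched in a self-contained form. The sketch below is an
assumption-laden stand-in, not the patch's code: it uses a plain array
instead of Git's hashmap, and every toy_* name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed stand-ins for the structs in the patch. */
struct toy_bundle_info {
	const char *id;
	const char *uri;
};

struct toy_bundle_list {
	struct toy_bundle_info *items;
	size_t nr;
};

typedef int (*toy_bundle_iterator)(struct toy_bundle_info *bundle,
				   void *data);

/* Call 'iter' on each entry; stop early on the first non-zero result. */
int toy_for_all_bundles(struct toy_bundle_list *list,
			toy_bundle_iterator iter, void *data)
{
	for (size_t i = 0; i < list->nr; i++) {
		int result = iter(&list->items[i], data);

		if (result)
			return result;
	}
	return 0;
}

/* Callback that visits everything: always returns 0. */
int toy_count(struct toy_bundle_info *bundle, void *data)
{
	(void)bundle;
	(*(int *)data)++;
	return 0;
}

struct toy_find {
	const char *id;		/* input: id to look for */
	const char *uri;	/* output: uri of the match, if any */
};

/* Callback that stops the walk: returns 1 once the id matches. */
int toy_find_by_id(struct toy_bundle_info *bundle, void *data)
{
	struct toy_find *find = data;

	if (strcmp(bundle->id, find->id))
		return 0;
	find->uri = bundle->uri;
	return 1;
}
```

The same early-exit convention is what the later unbundling loop in
this series relies on: a non-zero return from the walk signals that
one callback succeeded and the caller may try another pass.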
* [PATCH v4 03/11] bundle-uri: create base key-value pair parsing 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 01/11] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 02/11] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 04/11] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget ` (8 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> There will be two primary ways to advertise a bundle list: as a list of packet lines in Git's protocol v2 and as a config file served from a bundle URI. Both of these fundamentally use a list of key-value pairs. We will use the same set of key-value pairs across these formats. Create a new bundle_list_update() method that is currently unused, but will be used in the next change. It inspects each key to see if it is understood and then applies it to the given bundle_list. Here are the keys that we teach Git to understand: * bundle.version: This value should be an integer. Git currently understands only version 1 and will ignore the list if the version is any other value. This version can be increased in the future if we need to add new keys that Git should not ignore. We can add new "heuristic" keys without incrementing the version. * bundle.mode: This value should be one of "all" or "any". If this mode is not understood, then Git will ignore the list. 
This mode indicates whether Git needs all of the bundle list items to make a complete view of the content or if any single item is sufficient. The rest of the keys use a bundle identifier "<id>" as part of the key name. Keys using the same "<id>" describe a single bundle list item. * bundle.<id>.uri: This stores the URI of the bundle item. This currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. While parsing, return an error if a URI key is repeated, since we can make that restriction with bundle lists. Make the git_parse_int() method global so we can parse the integer version value carefully. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 +++++++++++ bundle-uri.c | 76 +++++++++++++++++++++++++++++++++ config.c | 2 +- config.h | 1 + 5 files changed, 104 insertions(+), 1 deletion(-) create mode 100644 Documentation/config/bundle.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index e376d547ce0..4280af6992e 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -387,6 +387,8 @@ include::config/branch.txt[] include::config/browser.txt[] +include::config/bundle.txt[] + include::config/checkout.txt[] include::config/clean.txt[] diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt new file mode 100644 index 00000000000..daa21eb674a --- /dev/null +++ b/Documentation/config/bundle.txt @@ -0,0 +1,24 @@ +bundle.*:: + The `bundle.*` keys may appear in a bundle list file found via the + `git clone --bundle-uri` option. These keys currently have no effect + if placed in a repository config file, though this will change in the + future. See link:technical/bundle-uri.html[the bundle URI design + document] for more details. + +bundle.version:: + This integer value advertises the version of the bundle list format + used by the bundle list. Currently, the only accepted value is `1`. 
+ +bundle.mode:: + This string value should be either `all` or `any`. This value describes + whether all of the advertised bundles are required to unbundle a + complete understanding of the bundled information (`all`) or if any one + of the listed bundle URIs is sufficient (`any`). + +bundle.<id>.*:: + The `bundle.<id>.*` keys are used to describe a single item in the + bundle list, grouped under `<id>` for identification purposes. + +bundle.<id>.uri:: + This string value defines the URI by which Git can reach the contents + of this `<id>`. This URI may be a bundle file or another bundle list. diff --git a/bundle-uri.c b/bundle-uri.c index f9a8db221bc..0bc59dd9c34 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -6,6 +6,7 @@ #include "run-command.h" #include "hashmap.h" #include "pkt-line.h" +#include "config.h" static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, @@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +/** + * Given a key-value pair, update the state of the given bundle list. + * Returns 0 if the key-value pair is understood. Returns -1 if the key + * is not understood or the value is malformed. 
+ */ +MAYBE_UNUSED +static int bundle_list_update(const char *key, const char *value, + struct bundle_list *list) +{ + struct strbuf id = STRBUF_INIT; + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT; + struct remote_bundle_info *bundle; + const char *subsection, *subkey; + size_t subsection_len; + + if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey)) + return -1; + + if (!subsection_len) { + if (!strcmp(subkey, "version")) { + int version; + if (!git_parse_int(value, &version)) + return -1; + if (version != 1) + return -1; + + list->version = version; + return 0; + } + + if (!strcmp(subkey, "mode")) { + if (!strcmp(value, "all")) + list->mode = BUNDLE_MODE_ALL; + else if (!strcmp(value, "any")) + list->mode = BUNDLE_MODE_ANY; + else + return -1; + return 0; + } + + /* Ignore other unknown global keys. */ + return 0; + } + + strbuf_add(&id, subsection, subsection_len); + + /* + * Check for an existing bundle with this <id>, or create one + * if necessary. + */ + lookup.id = id.buf; + hashmap_entry_init(&lookup.ent, strhash(lookup.id)); + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) { + CALLOC_ARRAY(bundle, 1); + bundle->id = strbuf_detach(&id, NULL); + hashmap_entry_init(&bundle->ent, strhash(bundle->id)); + hashmap_add(&list->bundles, &bundle->ent); + } + strbuf_release(&id); + + if (!strcmp(subkey, "uri")) { + if (bundle->uri) + return -1; + bundle->uri = xstrdup(value); + return 0; + } + + /* + * At this point, we ignore any information that we don't + * understand, assuming it to be hints for a heuristic the client + * does not currently understand. 
+ */ + return 0; +} + static char *find_temp_filename(void) { int fd; diff --git a/config.c b/config.c index 015bec360f5..e93101249f6 100644 --- a/config.c +++ b/config.c @@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max) return 0; } -static int git_parse_int(const char *value, int *ret) +int git_parse_int(const char *value, int *ret) { intmax_t tmp; if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int))) diff --git a/config.h b/config.h index ca994d77147..ef9eade6414 100644 --- a/config.h +++ b/config.h @@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *, int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); +int git_parse_int(const char *value, int *ret); /** * Same as `git_config_bool`, except that it returns -1 on error rather -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
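For readers following along without the Git tree at hand, the key layout described in this patch ("bundle.version", "bundle.mode", "bundle.<id>.uri") can be illustrated with a small standalone sketch. This is not the code from the patch: split_bundle_key() is a hypothetical helper that only approximates what parse_config_key() does for the "bundle" section, assuming the subkey is whatever follows the last dot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: split "bundle.<subkey>" or
 * "bundle.<id>.<subkey>" into its parts. The <id> is returned as a
 * pointer into 'key' plus a length, so nothing is allocated.
 *
 * Returns 0 on success, -1 if the key is not in the "bundle" section.
 */
static int split_bundle_key(const char *key,
			    const char **id, size_t *id_len,
			    const char **subkey)
{
	const char *rest, *last_dot;

	assert(key && id && id_len && subkey);

	if (strncmp(key, "bundle.", 7))
		return -1;

	rest = key + 7;
	last_dot = strrchr(rest, '.');

	if (!last_dot) {
		/* "bundle.version" or "bundle.mode": there is no <id>. */
		*id = NULL;
		*id_len = 0;
		*subkey = rest;
	} else {
		/* "bundle.<id>.uri": the <id> may itself contain dots. */
		*id = rest;
		*id_len = (size_t)(last_dot - rest);
		*subkey = last_dot + 1;
	}
	return 0;
}
```

With this sketch, "bundle.one.uri" yields the id "one" and the subkey "uri", while "bundle.version" yields an empty id, which corresponds to the !subsection_len branch in bundle_list_update() above.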
* [PATCH v4 04/11] bundle-uri: create "key=value" line parsing 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 03/11] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget ` (7 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> When advertising a bundle list over Git's protocol v2, we will use packet lines. Each line will be of the form "key=value" representing a bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. We are not currently implementing this protocol v2 functionality, but instead preparing to expose this parsing to be unit-testable. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++++++++- bundle-uri.h | 12 ++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 0bc59dd9c34..372e6fac5cf 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list, * Returns 0 if the key-value pair is understood. Returns -1 if the key * is not understood or the value is malformed. 
*/ -MAYBE_UNUSED static int bundle_list_update(const char *key, const char *value, struct bundle_list *list) { @@ -306,3 +305,29 @@ cleanup: free(filename); return result; } + +/** + * General API for {transport,connect}.c etc. + */ +int bundle_uri_parse_line(struct bundle_list *list, const char *line) +{ + int result; + const char *equals; + struct strbuf key = STRBUF_INIT; + + if (!strlen(line)) + return error(_("bundle-uri: got an empty line")); + + equals = strchr(line, '='); + + if (!equals) + return error(_("bundle-uri: line is not of the form 'key=value'")); + if (line == equals || !*(equals + 1)) + return error(_("bundle-uri: line has empty key or value")); + + strbuf_add(&key, line, equals - line); + result = bundle_list_update(key.buf, equals + 1, list); + strbuf_release(&key); + + return result; +} diff --git a/bundle-uri.h b/bundle-uri.h index ff7e3fd3fb2..90583461929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); +/** + * General API for {transport,connect}.c etc. + */ + +/** + * Parse a "key=value" packet line from the bundle-uri verb. + * + * Returns 0 on success and non-zero on error. + */ +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + #endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
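The checks in bundle_uri_parse_line() can be tried outside of Git with a rough standalone sketch. The names split_key_value() and enum kv_result are made up for illustration; the sketch assumes the same three rejection cases as the patch (empty line, no '=', empty key or value) and, like the patch, splits on the first '=' only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes; the patch itself reports via error(). */
enum kv_result {
	KV_OK = 0,
	KV_EMPTY_LINE,
	KV_NO_EQUALS,
	KV_EMPTY_PART
};

/*
 * Split 'line' at its first '='. On KV_OK, '*key' is a freshly
 * allocated copy of the key (the caller frees it) and '*value'
 * points into 'line' just past the '='.
 */
static enum kv_result split_key_value(const char *line,
				      char **key, const char **value)
{
	const char *equals;
	size_t key_len;

	assert(line && key && value);
	*key = NULL;
	*value = NULL;

	if (!*line)
		return KV_EMPTY_LINE;

	equals = strchr(line, '=');
	if (!equals)
		return KV_NO_EQUALS;
	if (equals == line || !equals[1])
		return KV_EMPTY_PART;

	key_len = (size_t)(equals - line);
	*key = malloc(key_len + 1);
	if (!*key)
		abort();
	memcpy(*key, line, key_len);
	(*key)[key_len] = '\0';
	*value = equals + 1;
	return KV_OK;
}
```

Because only the first '=' is significant, a URI value that itself contains '=' (for example in a query string) is kept intact in the value.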
* [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 04/11] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 ` Ævar Arnfjörð Bjarmason via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 06/11] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget ` (6 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Ævar Arnfjörð Bjarmason From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= <avarab@gmail.com> Create a new 'test-tool bundle-uri' test helper. This helper will assist in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses an input file as a list of lines. These are fed into bundle_uri_parse_line() to test how we construct a 'struct bundle_list' from that data. The list is then output to stdout as if the key-value pairs were a Git config file. We use an input file instead of stdin because of a future change to parse in config-file format that works better as an input file. 
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- Makefile | 1 + bundle-uri.c | 33 ++++++++++ bundle-uri.h | 3 + t/helper/test-bundle-uri.c | 70 +++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 11 ++++ 8 files changed, 241 insertions(+) create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh diff --git a/Makefile b/Makefile index 7d5f48069ea..7dee0329c49 100644 --- a/Makefile +++ b/Makefile @@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bitmap.o TEST_BUILTINS_OBJS += test-bloom.o +TEST_BUILTINS_OBJS += test-bundle-uri.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o TEST_BUILTINS_OBJS += test-crontab.o diff --git a/bundle-uri.c b/bundle-uri.c index 372e6fac5cf..c02e7f62eb1 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +static int summarize_bundle(struct remote_bundle_info *info, void *data) +{ + FILE *fp = data; + fprintf(fp, "[bundle \"%s\"]\n", info->id); + fprintf(fp, "\turi = %s\n", info->uri); + return 0; +} + +void print_bundle_list(FILE *fp, struct bundle_list *list) +{ + const char *mode; + + switch (list->mode) { + case BUNDLE_MODE_ALL: + mode = "all"; + break; + + case BUNDLE_MODE_ANY: + mode = "any"; + break; + + case BUNDLE_MODE_NONE: + default: + mode = "<unknown>"; + } + + fprintf(fp, "[bundle]\n"); + fprintf(fp, "\tversion = %d\n", list->version); + fprintf(fp, "\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} + /** * Given a key-value pair, update the state of the given bundle list. * Returns 0 if the key-value pair is understood. 
Returns -1 if the key diff --git a/bundle-uri.h b/bundle-uri.h index 90583461929..0e56ab2ae5a 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list, bundle_iterator iter, void *data); +struct FILE; +void print_bundle_list(FILE *fp, struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c new file mode 100644 index 00000000000..0329c56544f --- /dev/null +++ b/t/helper/test-bundle-uri.c @@ -0,0 +1,70 @@ +#include "test-tool.h" +#include "parse-options.h" +#include "bundle-uri.h" +#include "strbuf.h" +#include "string-list.h" + +static int cmd__bundle_uri_parse(int argc, const char **argv) +{ + const char *key_value_usage[] = { + "test-tool bundle-uri parse-key-values <input>", + NULL + }; + const char **usage = key_value_usage; + struct option options[] = { + OPT_END(), + }; + struct strbuf sb = STRBUF_INIT; + struct bundle_list list; + int err = 0; + FILE *fp; + + argc = parse_options(argc, argv, NULL, options, usage, 0); + if (argc != 1) + goto usage; + + init_bundle_list(&list); + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + strbuf_release(&sb); + fclose(fp); + + print_bundle_list(stdout, &list); + + clear_bundle_list(&list); + + return !!err; + +usage: + usage_with_options(usage, options); +} + +int cmd__bundle_uri(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri <subcommand> [<options>]", + NULL + }; + struct option options[] = { + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION | + PARSE_OPT_KEEP_ARGV0); + if (argc == 1) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) + return 
cmd__bundle_uri_parse(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: + usage_with_options(usage, options); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c3..fbe2d9d8108 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "advise", cmd__advise_if_enabled }, { "bitmap", cmd__bitmap }, { "bloom", cmd__bloom }, + { "bundle-uri", cmd__bundle_uri }, { "chmtime", cmd__chmtime }, { "config", cmd__config }, { "crontab", cmd__crontab }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb799271631..b2aa1f39a8f 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -7,6 +7,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); +int cmd__bundle_uri(int argc, const char **argv); int cmd__chmtime(int argc, const char **argv); int cmd__config(int argc, const char **argv); int cmd__crontab(int argc, const char **argv); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh new file mode 100755 index 00000000000..fd142a66ad5 --- /dev/null +++ b/t/t5750-bundle-uri-parse.sh @@ -0,0 +1,121 @@ +#!/bin/sh + +test_description="Test bundle-uri bundle_uri_parse_line()" + +TEST_NO_CREATE_REPO=1 +TEST_PASSES_SANITIZE_LEAK=true +. 
./test-lib.sh + +test_expect_success 'bundle_uri_parse_line() just URIs' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-key-values in >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' ' + cat >in <<-\EOF && + =bogus-value + bogus-key= + EOF + + cat >err.expect <<-EOF && + error: bundle-uri: line has empty key or value + error: bad line: '\''=bogus-value'\'' + error: bundle-uri: line has empty key or value + error: bad line: '\''bogus-key='\'' + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + + bundle.two.uri=https://example.com/bundle.bdl + + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err 
&& + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.one.uri=https://example.com/bundle-2.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 6da7273f1d5..3175d665add 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1956,3 +1956,14 @@ test_is_magic_mtime () { rm -f .git/test-mtime-actual return $ret } + +# Given two filenames, parse both using 'git config --list --file' +# and compare the sorted output of those commands. Useful when +# wanting to ignore whitespace differences and sorting concerns. +test_cmp_config_output () { + git config --list --file="$1" >config-expect && + git config --list --file="$2" >config-actual && + sort config-expect >sorted-expect && + sort config-actual >sorted-actual && + test_cmp sorted-expect sorted-actual +} -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v4 06/11] bundle-uri: parse bundle list in config format 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 05/11] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget ` (5 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When a bundle provider wants to operate independently from a Git remote, they want to provide a single, consistent URI that users can use in their 'git clone --bundle-uri' commands. At this point, the Git client expects that URI to be a single bundle that can be unbundled and used to bootstrap the rest of the clone from the Git server. This single bundle cannot be re-used to assist with future incremental fetches. To allow for the incremental fetch case, teach Git to understand a bundle list that could be advertised at an independent bundle URI. Such a bundle list is likely to be inspected by human readers, even if only by the bundle provider creating the list. For this reason, we can take our expected "key=value" pairs and instead format them using Git config format. Create bundle_uri_parse_config_format() to parse a file in config format and convert that into a 'struct bundle_list' filled with its understanding of the contents. Be careful to use error_action CONFIG_ERROR_ERROR when calling git_config_from_file_with_options() because the default action for git_config_from_file() is to die() on a parsing error. 
The current warning isn't particularly helpful if it arises to a user, but it will be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce bugs than this simpler version. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 27 ++++++++++++++++++++ bundle-uri.h | 9 +++++++ t/helper/test-bundle-uri.c | 49 +++++++++++++++++++++++++++--------- t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++ 4 files changed, 123 insertions(+), 12 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index c02e7f62eb1..3d44ec2b1e6 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value, return 0; } +static int config_to_bundle_list(const char *key, const char *value, void *data) +{ + struct bundle_list *list = data; + return bundle_list_update(key, value, list); +} + +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); + + if (!result && list->mode == BUNDLE_MODE_NONE) { + warning(_("bundle list at '%s' has no mode"), uri); + result = 1; + } + + return result; +} + static char *find_temp_filename(void) { int fd; diff --git a/bundle-uri.h b/bundle-uri.h index 0e56ab2ae5a..bc13d4c9929 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list, struct FILE; void print_bundle_list(FILE *fp, struct bundle_list *list); +/** + * A bundle URI may point to a bundle list where the key=value + * pairs are provided in config file 
format. This method is + * exposed publicly for testing purposes. + */ +int bundle_uri_parse_config_format(const char *uri, + const char *filename, + struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c index 0329c56544f..25afd393428 100644 --- a/t/helper/test-bundle-uri.c +++ b/t/helper/test-bundle-uri.c @@ -4,12 +4,21 @@ #include "strbuf.h" #include "string-list.h" -static int cmd__bundle_uri_parse(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, +}; + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { const char *key_value_usage[] = { "test-tool bundle-uri parse-key-values <input>", NULL }; + const char *config_usage[] = { + "test-tool bundle-uri parse-config <input>", + NULL + }; const char **usage = key_value_usage; struct option options[] = { OPT_END(), @@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv) int err = 0; FILE *fp; - argc = parse_options(argc, argv, NULL, options, usage, 0); - if (argc != 1) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); - fp = fopen(argv[0], "r"); - if (!fp) - die("failed to open '%s'", argv[0]); - while (strbuf_getline(&sb, fp) != EOF) { - if (bundle_uri_parse_line(&list, sb.buf)) - err = error("bad line: '%s'", sb.buf); + switch (mode) { + case KEY_VALUE_PAIRS: + if (argc != 1) + goto usage; + fp = fopen(argv[0], "r"); + if (!fp) + die("failed to open '%s'", argv[0]); + while (strbuf_getline(&sb, fp) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf)) + err = error("bad line: '%s'", sb.buf); + } + fclose(fp); + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; + err = bundle_uri_parse_config_format("<uri>", argv[0], &list); + break; } 
strbuf_release(&sb); - fclose(fp); print_bundle_list(stdout, &list); @@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) - return cmd__bundle_uri_parse(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); error("there is no test-tool bundle-uri tool '%s'", argv[1]); usage: diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index fd142a66ad5..c2fe3f9c5a5 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines test_cmp_config_output expect actual ' +test_expect_success 'parse config format: just URIs' ' + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'parse config format edge cases: empty key or value' ' + cat >in1 <<-\EOF && + = bogus-value + EOF + + cat >err1 <<-EOF && + error: bad config line 1 in file in1 + EOF + + cat >expect <<-\EOF && + [bundle] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && + test_cmp err1 err && + test_cmp_config_output expect actual && + + cat >in2 <<-\EOF && + bogus-key = + EOF + + cat >err2 <<-EOF && + error: bad config line 1 in file in2 + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && + test_cmp err2 err && + test_cmp_config_output expect actual +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 06/11] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget ` (4 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The next change will start allowing us to parse bundle lists that are downloaded from a provided bundle URI. Those lists might point to other lists, which could proceed to an arbitrary depth (and even create cycles). Restructure fetch_bundle_uri() to have an internal version that has a recursion depth. Compare that to a new max_bundle_uri_depth constant that is twice as high as we expect this depth to be for any legitimate use of bundle list linking. We can consider making max_bundle_uri_depth a configurable value if there is demonstrated value in the future. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 3d44ec2b1e6..8a7c11c6393 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } -int fetch_bundle_uri(struct repository *r, const char *uri) +/** + * This limits the recursion on fetch_bundle_uri_internal() when following + * bundle lists. 
+ */ +static int max_bundle_uri_depth = 4; + +static int fetch_bundle_uri_internal(struct repository *r, + const char *uri, + int depth) { int result = 0; char *filename; + if (depth >= max_bundle_uri_depth) { + warning(_("exceeded bundle URI recursion limit (%d)"), + max_bundle_uri_depth); + return -1; + } + if (!(filename = find_temp_filename())) { result = -1; goto cleanup; @@ -366,6 +380,11 @@ cleanup: return result; } +int fetch_bundle_uri(struct repository *r, const char *uri) +{ + return fetch_bundle_uri_internal(r, uri, 0); +} + /** * General API for {transport,connect}.c etc. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
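The depth guard in fetch_bundle_uri_internal() can be pictured with a toy model. The sketch below is an assumption-laden illustration, not Git code: struct toy_list, follow_list() and MAX_DEPTH are hypothetical stand-ins, and each "list" points to at most one nested list, whereas a real bundle list may contain several.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the value of max_bundle_uri_depth in the patch. */
#define MAX_DEPTH 4

/* Hypothetical stand-in for a bundle list that links to another list. */
struct toy_list {
	const struct toy_list *next;
};

/*
 * Follow nested lists starting at 'list', counting each list that is
 * processed in '*visited'. Returns 0 when the chain ends, or -1 once
 * 'depth' reaches MAX_DEPTH, which is what stops a cycle.
 */
static int follow_list(const struct toy_list *list, int depth, int *visited)
{
	assert(list && visited);

	if (depth >= MAX_DEPTH)
		return -1;

	(*visited)++;

	if (!list->next)
		return 0;
	return follow_list(list->next, depth + 1, visited);
}
```

A chain of three lists finishes normally, while a list that points back to itself is processed four times (depths 0 through 3) before the guard returns -1. The guard bounds the work without needing to remember which URIs were already seen.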
* [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 07/11] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 17:27 ` Junio C Hamano 2022-10-10 16:04 ` [PATCH v4 09/11] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 11 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The verify_bundle() method checks if a bundle can be applied to a given repository. This not only verifies that certain commits exist in the repository, but Git also checks that these commits are reachable. This behavior dates back to the original git-bundle builtin written in 2e0afafebd8 (Add git-bundle: move objects and references by archive, 2007-02-22), but the message does not go into detail why the reachability check is important. Since verify_bundle() is called from unbundle(), we need to add an option to pipe the flags through that method. When unbundling from a list of bundles, Git will create refs that point to the tips of the latest bundle, which makes this reachability walk succeed, in theory. However, the loose refs cache does not get invalidated and hence the reachability walk fails. By disabling the reachability walk in the bundle URI code, we can get around this reachability check. 
Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- builtin/bundle.c | 5 +++-- bundle-uri.c | 8 +++++++- bundle.c | 12 +++++++----- bundle.h | 15 +++++++++++++-- transport.c | 2 +- 5 files changed, 31 insertions(+), 11 deletions(-) diff --git a/builtin/bundle.c b/builtin/bundle.c index 2adad545a2e..7d983a238f0 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -119,7 +119,8 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { goto cleanup; } close(bundle_fd); - if (verify_bundle(the_repository, &header, !quiet)) { + if (verify_bundle(the_repository, &header, + quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) { ret = 1; goto cleanup; } @@ -185,7 +186,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix) strvec_pushl(&extra_index_pack_args, "-v", "--progress-title", _("Unbundling objects"), NULL); ret = !!unbundle(the_repository, &header, bundle_fd, - &extra_index_pack_args) || + &extra_index_pack_args, 0) || list_bundle_refs(&header, argc, argv); bundle_header_release(&header); cleanup: diff --git a/bundle-uri.c b/bundle-uri.c index 8a7c11c6393..ad5baabdd94 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file) if ((bundle_fd = read_bundle_header(file, &header)) < 0) return 1; - if ((result = unbundle(r, &header, bundle_fd, NULL))) + /* + * Skip the reachability walk here, since we will be adding + * a reachable ref pointing to the new tips, which will reach + * the prerequisite commits. 
+ */ + if ((result = unbundle(r, &header, bundle_fd, NULL, + VERIFY_BUNDLE_SKIP_REACHABLE))) return 1; /* diff --git a/bundle.c b/bundle.c index 0208e6d90d3..36ffeb1e0eb 100644 --- a/bundle.c +++ b/bundle.c @@ -189,7 +189,7 @@ static int list_refs(struct string_list *r, int argc, const char **argv) int verify_bundle(struct repository *r, struct bundle_header *header, - int verbose) + enum verify_bundle_flags flags) { /* * Do fast check, then if any prereqs are missing then go line by line @@ -222,7 +222,8 @@ int verify_bundle(struct repository *r, error("%s", message); error("%s %s", oid_to_hex(oid), name); } - if (revs.pending.nr != p->nr) + if (revs.pending.nr != p->nr || + (flags & VERIFY_BUNDLE_SKIP_REACHABLE)) goto cleanup; req_nr = revs.pending.nr; setup_revisions(2, argv, &revs, NULL); @@ -259,7 +260,7 @@ int verify_bundle(struct repository *r, clear_commit_marks(commit, ALL_REV_FLAGS); } - if (verbose) { + if (flags & VERIFY_BUNDLE_VERBOSE) { struct string_list *r; r = &header->references; @@ -620,7 +621,8 @@ err: } int unbundle(struct repository *r, struct bundle_header *header, - int bundle_fd, struct strvec *extra_index_pack_args) + int bundle_fd, struct strvec *extra_index_pack_args, + enum verify_bundle_flags flags) { struct child_process ip = CHILD_PROCESS_INIT; strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL); @@ -634,7 +636,7 @@ int unbundle(struct repository *r, struct bundle_header *header, strvec_clear(extra_index_pack_args); } - if (verify_bundle(r, header, 0)) + if (verify_bundle(r, header, flags)) return -1; ip.in = bundle_fd; ip.no_stdout = 1; diff --git a/bundle.h b/bundle.h index 0c052f54964..9f798c00d93 100644 --- a/bundle.h +++ b/bundle.h @@ -29,7 +29,14 @@ int read_bundle_header_fd(int fd, struct bundle_header *header, int create_bundle(struct repository *r, const char *path, int argc, const char **argv, struct strvec *pack_options, int version); -int verify_bundle(struct repository *r, struct bundle_header *header, 
int verbose); + +enum verify_bundle_flags { + VERIFY_BUNDLE_VERBOSE = (1 << 0), + VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) +}; + +int verify_bundle(struct repository *r, struct bundle_header *header, + enum verify_bundle_flags flags); /** * Unbundle after reading the header with read_bundle_header(). @@ -40,9 +47,13 @@ int verify_bundle(struct repository *r, struct bundle_header *header, int verbos * Provide "extra_index_pack_args" to pass any extra arguments * (e.g. "-v" for verbose/progress), NULL otherwise. The provided * "extra_index_pack_args" (if any) will be strvec_clear()'d for you. + * + * Before unbundling, this method will call verify_bundle() with the + * given 'flags'. */ int unbundle(struct repository *r, struct bundle_header *header, - int bundle_fd, struct strvec *extra_index_pack_args); + int bundle_fd, struct strvec *extra_index_pack_args, + enum verify_bundle_flags flags); int list_bundle_refs(struct bundle_header *header, int argc, const char **argv); diff --git a/transport.c b/transport.c index 52db7a3cb09..c5d3042731a 100644 --- a/transport.c +++ b/transport.c @@ -178,7 +178,7 @@ static int fetch_refs_from_bundle(struct transport *transport, if (!data->get_refs_from_bundle_called) get_refs_from_bundle_inner(transport); ret = unbundle(the_repository, &data->header, data->fd, - &extra_index_pack_args); + &extra_index_pack_args, 0); transport->hash_algo = data->header.hash_algo; return ret; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
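For readers skimming the hunks, the way the new flag word is consumed can be sketched on its own. This is an illustration under assumptions, not the bundle.c code: the `TOY_` names, `flags_for_cli()`, and `should_walk()` are made up here; only the bit layout and the two conditions they mirror (`quiet ? 0 : VERIFY_BUNDLE_VERBOSE` and the `goto cleanup` test) come from the diff.

```c
#include <assert.h>

/* Same bit layout as enum verify_bundle_flags in the patch. */
enum toy_verify_flags {
	TOY_VERIFY_VERBOSE        = (1 << 0),
	TOY_VERIFY_SKIP_REACHABLE = (1 << 1)
};

/* Mirrors the builtin/bundle.c call site: --quiet turns verbosity off. */
static unsigned flags_for_cli(int quiet)
{
	return quiet ? 0 : TOY_VERIFY_VERBOSE;
}

/*
 * Mirrors the condition guarding the reachability walk: the walk only
 * runs when every prerequisite object was found and the caller did not
 * ask to skip it.
 */
static int should_walk(unsigned flags, int all_prereqs_present)
{
	/* Reject bits this sketch does not know about. */
	assert(!(flags & ~(unsigned)(TOY_VERIFY_VERBOSE | TOY_VERIFY_SKIP_REACHABLE)));

	return all_prereqs_present && !(flags & TOY_VERIFY_SKIP_REACHABLE);
}
```

Existing callers that pass 0 keep the old behavior, which is why the transport.c and "git bundle unbundle" call sites only gain a trailing 0 argument.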
* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk 2022-10-10 16:04 ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget @ 2022-10-10 17:27 ` Junio C Hamano 2022-10-10 18:13 ` Derrick Stolee 0 siblings, 1 reply; 94+ messages in thread From: Junio C Hamano @ 2022-10-10 17:27 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <derrickstolee@github.com> > > The verify_bundle() method checks if a bundle can be applied to a given > repository. This not only verifies that certain commits exist in the > repository, but Git also checks that these commits are reachable. > > This behavior dates back to the original git-bundle builtin written in > 2e0afafebd8 (Add git-bundle: move objects and references by archive, > 2007-02-22), but the message does not go into detail why the > reachability check is important. > > Since verify_bundle() is called from unbundle(), we need to add an > option to pipe the flags through that method. All makes sense. > When unbundling from a list of bundles, Git will create refs that point > to the tips of the latest bundle, which makes this reachability walk > succeed, in theory. However, the loose refs cache does not get > invalidated and hence the reachability walk fails. By disabling the > reachability walk in the bundle URI code, we can get around this > reachability check. The above makes it sound like the real culprit is that cache goes out of sync and the presented solution is a workaround; readers are left in suspense if the "real" solution (as opposed to a workaround) would come in a later step or in a future series. 
> diff --git a/bundle-uri.c b/bundle-uri.c > index 8a7c11c6393..ad5baabdd94 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file) > if ((bundle_fd = read_bundle_header(file, &header)) < 0) > return 1; > > - if ((result = unbundle(r, &header, bundle_fd, NULL))) > + /* > + * Skip the reachability walk here, since we will be adding > + * a reachable ref pointing to the new tips, which will reach > + * the prerequisite commits. > + */ > + if ((result = unbundle(r, &header, bundle_fd, NULL, > + VERIFY_BUNDLE_SKIP_REACHABLE))) > return 1; This is not a new problem introduced in this new round, but if we are updating this, can we fix it to omit assignment inside if condition? * result is initialized to 0. * when unbundle returns non-zero, it is assigned to result and the function returns immediately, discarding whatever was assigned to the variable. * if unbundle returns zero, it is assigned to result and the control continues from here. We know result is set to 0, but then that is what it was initialized earlier. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk 2022-10-10 17:27 ` Junio C Hamano @ 2022-10-10 18:13 ` Derrick Stolee 2022-10-10 18:40 ` Junio C Hamano 0 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee @ 2022-10-10 18:13 UTC (permalink / raw) To: Junio C Hamano, Derrick Stolee via GitGitGadget Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long On 10/10/2022 1:27 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> When unbundling from a list of bundles, Git will create refs that point >> to the tips of the latest bundle, which makes this reachability walk >> succeed, in theory. However, the loose refs cache does not get >> invalidated and hence the reachability walk fails. By disabling the >> reachability walk in the bundle URI code, we can get around this >> reachability check. > > The above makes it sound like the real culprit is that cache goes > out of sync and the presented solution is a workaround; readers are > left in suspense if the "real" solution (as opposed to a workaround) > would come in a later step or in a future series. I've been going over the refs code multiple times today trying to fix this "real" culprit, with no luck. I can share this interesting point: * The initial loop over the bundles tries to apply each, but the prerequisite objects are not present so we never reach the revision walk. A refs/bundle/* ref is added via update_ref(). * The second loop over the bundles tries to apply each, but the only bundle with its prerequisites present also finds the commits as reachable (this must be where the loose ref cache is populated). Then, a refs/bundle/* ref is added via update_ref(). * The third loop over the bundles finds a bundle whose prerequisites are present, but verify_bundle() rejected it because those commits were not seen from any ref. 
Other than identifying that issue, I was unable to track down exactly what is happening here or offer a fix. I had considered inserting more cache frees deep in the refs code, but I wasn't sure what effect that would have across the wider system. >> diff --git a/bundle-uri.c b/bundle-uri.c >> index 8a7c11c6393..ad5baabdd94 100644 >> --- a/bundle-uri.c >> +++ b/bundle-uri.c >> @@ -301,7 +301,13 @@ static int unbundle_from_file(struct repository *r, const char *file) >> if ((bundle_fd = read_bundle_header(file, &header)) < 0) >> return 1; >> >> - if ((result = unbundle(r, &header, bundle_fd, NULL))) >> + /* >> + * Skip the reachability walk here, since we will be adding >> + * a reachable ref pointing to the new tips, which will reach >> + * the prerequisite commits. >> + */ >> + if ((result = unbundle(r, &header, bundle_fd, NULL, >> + VERIFY_BUNDLE_SKIP_REACHABLE))) >> return 1; > > This is not a new problem introduced in this new round, but if we > are updating this, can we fix it to omit assignment inside if > condition? > > * result is initialized to 0. > > * when unbundle returns non-zero, it is assigned to result and the > function returns immediately, discarding whatever was assigned to > the variable. > > * if unbundle returns zero, it is assigned to result and the > control continues from here. We know result is set to 0, but > then that is what it was initialized earlier. Since we are not "trusting" the integer result of unbundle, we can definitely stop this assignment in the if. Thanks, -Stolee ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk 2022-10-10 18:13 ` Derrick Stolee @ 2022-10-10 18:40 ` Junio C Hamano 2022-10-11 19:04 ` Derrick Stolee 0 siblings, 1 reply; 94+ messages in thread From: Junio C Hamano @ 2022-10-10 18:40 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long Derrick Stolee <derrickstolee@github.com> writes: > I've been going over the refs code multiple times today trying to > fix this "real" culprit, with no luck. I can share this interesting > point: > > * The initial loop over the bundles tries to apply each, but the > prerequisite objects are not present so we never reach the revision > walk. A refs/bundle/* ref is added via update_ref(). > > * The second loop over the bundles tries to apply each, but the only > bundle with its prerequisites present also finds the commits as > reachable (this must be where the loose ref cache is populated). > Then, a refs/bundle/* ref is added via update_ref(). > > * The third loop over the bundles finds a bundle whose prerequisites > are present, but verify_bundle() rejected it because those commits > were not seen from any ref. > > Other than identifying that issue, I was unable to track down exactly > what is happening here or offer a fix. I had considered inserting > more cache frees deep in the refs code, but I wasn't sure what effect > that would have across the wider system. OK. That certainly is understandable. As a comment in the proposed log message that BUNDLE_SKIP_REACHABLE bit is a band aid papering over a problem we punted in this series, to guide future developers, I think what you wrote is sufficient. We do not want them to think that skipping the check is our preferred longer term solution and add their own hack to keep skipping the check when they resolve "the real culprit". Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk 2022-10-10 18:40 ` Junio C Hamano @ 2022-10-11 19:04 ` Derrick Stolee 0 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee @ 2022-10-11 19:04 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long On 10/10/2022 2:40 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> I've been going over the refs code multiple times today trying to >> fix this "real" culprit, with no luck. I can share this interesting >> point: >> >> * The initial loop over the bundles tries to apply each, but the >> prerequisite objects are not present so we never reach the revision >> walk. A refs/bundle/* ref is added via update_ref(). >> >> * The second loop over the bundles tries to apply each, but the only >> bundle with its prerequisites present also finds the commits as >> reachable (this must be where the loose ref cache is populated). >> Then, a refs/bundle/* ref is added via update_ref(). >> >> * The third loop over the bundles finds a bundle whose prerequisites >> are present, but verify_bundle() rejected it because those commits >> were not seen from any ref. >> >> Other than identifying that issue, I was unable to track down exactly >> what is happening here or offer a fix. I had considered inserting >> more cache frees deep in the refs code, but I wasn't sure what effect >> that would have across the wider system. > > OK. That certainly is understandable. > > As a comment in the proposed log message that BUNDLE_SKIP_REACHABLE > bit is a band aid papering over a problem we punted in this series, > to guide future developers, I think what you wrote is sufficient. > We do not want them to think that skipping the check is our > preferred longer term solution and add their own hack to keep > skipping the check when they resolve "the real culprit". 
I have discovered the real culprit, and my expectation was incorrect about the loose ref cache. The key issue was that I was looking at this loop:

	i = req_nr;
	while (i && (commit = get_revision(&revs)))
		if (commit->object.flags & PREREQ_MARK)
			i--;

and noticing that only one commit was being visited. I was not seeing the actually-important commit. But it wasn't the revision walk's fault. The loop was terminating because "i" was reaching zero!

It turns out that verify_bundle() is not clearing the PREREQ_MARK flag, so multiple runs would incorrectly hit this short-circuit and terminate the walk early.

I'll replace this patch with the correct fix soon.

Thanks,
-Stolee

^ permalink raw reply [flat|nested] 94+ messages in thread
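The failure mode described in this message can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not Git's code: `struct toy_commit`, the `TOY_` flag bits, `toy_verify()`, and its `clear_prereq_mark` switch are all hypothetical. It keeps only the shape of the loop quoted above, where the walk stops once as many marked commits as prerequisites have been seen.

```c
#include <assert.h>

#define TOY_PREREQ_MARK (1u << 0)
#define TOY_SHOWN       (1u << 1)

struct toy_commit {
	unsigned flags;
};

/*
 * walk[] is the order in which a pretend revision walk visits commits;
 * prereq[] are the commits one bundle requires (all present in walk[]).
 * Returns the number of prerequisites the walk never reached.
 */
static int toy_verify(struct toy_commit **walk, int walk_nr,
		      struct toy_commit **prereq, int prereq_nr,
		      int clear_prereq_mark)
{
	int i, pos = 0, missing = 0;

	assert(walk_nr >= 0 && prereq_nr >= 0);

	for (i = 0; i < prereq_nr; i++)
		prereq[i]->flags |= TOY_PREREQ_MARK;

	/* Same shape as the quoted loop: count down on marked commits. */
	i = prereq_nr;
	while (i && pos < walk_nr) {
		struct toy_commit *c = walk[pos++];
		c->flags |= TOY_SHOWN;
		if (c->flags & TOY_PREREQ_MARK)
			i--;
	}

	for (i = 0; i < prereq_nr; i++)
		if (!(prereq[i]->flags & TOY_SHOWN))
			missing++;

	/* Cleanup: SHOWN is always reset, the mark only on request. */
	for (i = 0; i < walk_nr; i++) {
		walk[i]->flags &= ~TOY_SHOWN;
		if (clear_prereq_mark)
			walk[i]->flags &= ~TOY_PREREQ_MARK;
	}

	return missing;
}
```

With two commits A and B visited in that order, verifying a bundle that needs A and then one that needs B reports B as unreached on the second call when the mark is left behind: the stale mark on A drives the counter to zero before B is visited. Passing a nonzero `clear_prereq_mark` makes both calls succeed.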
* [PATCH v4 09/11] bundle-uri: fetch a list of bundles 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 08/11] bundle: add flags to verify_bundle(), skip walk Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 10/11] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 11 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. 
By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 203 ++++++++++++++++++++++++++--- bundle-uri.h | 13 ++ t/t5558-clone-bundle-uri.sh | 248 ++++++++++++++++++++++++++++++++++++ 3 files changed, 448 insertions(+), 16 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index ad5baabdd94..c0a6fb05fad 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle, { FREE_AND_NULL(bundle->id); FREE_AND_NULL(bundle->uri); + FREE_AND_NULL(bundle->file); + bundle->unbundled = 0; return 0; } @@ -340,18 +342,117 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } +struct bundle_list_context { + struct repository *r; + struct bundle_list *list; + enum bundle_list_mode mode; + int count; + int depth; +}; + +/* + * This early definition is necessary because we use indirect recursion: + * + * While iterating through a bundle list that was downloaded as part + * of fetch_bundle_uri_internal(), iterator methods eventually call it + * again, but with depth + 1. 
+ */ +static int fetch_bundle_uri_internal(struct repository *r, + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list); + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ + int res; + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); + + /* + * Only increment count if the download succeeded. If our mode is + * BUNDLE_MODE_ANY, then we will want to try other URIs in the + * list in case they work instead. + */ + if (!res) + ctx->count++; + + /* + * To be opportunistic as possible, we continue iterating and + * download as many bundles as we can, so we can apply the ones + * that work, even in BUNDLE_MODE_ALL mode. + */ + return 0; +} + +static int download_bundle_list(struct repository *r, + struct bundle_list *local_list, + struct bundle_list *global_list, + int depth) +{ + struct bundle_list_context ctx = { + .r = r, + .list = global_list, + .depth = depth + 1, + .mode = local_list->mode, + }; + + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); +} + +static int fetch_bundle_list_in_config_format(struct repository *r, + struct bundle_list *global_list, + struct remote_bundle_info *bundle, + int depth) +{ + int result; + struct bundle_list list_from_bundle; + + init_bundle_list(&list_from_bundle); + + if ((result = bundle_uri_parse_config_format(bundle->uri, + bundle->file, + &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { + warning(_("unrecognized bundle mode from URI '%s'"), + bundle->uri); + result = -1; + goto cleanup; + } + + if ((result = download_bundle_list(r, &list_from_bundle, + global_list, depth))) + goto cleanup; + +cleanup: + clear_bundle_list(&list_from_bundle); + return result; +} + /** * This limits the recursion on fetch_bundle_uri_internal() when following * bundle lists. 
*/ static int max_bundle_uri_depth = 4; +/** + * Recursively download all bundles advertised at the given URI + * to files. If the file is a bundle, then add it to the given + * 'list'. Otherwise, expect a bundle list and recurse on the + * URIs in that list according to the list mode (ANY or ALL). + */ static int fetch_bundle_uri_internal(struct repository *r, - const char *uri, - int depth) + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list) { int result = 0; - char *filename; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { warning(_("exceeded bundle URI recursion limit (%d)"), @@ -359,36 +460,106 @@ static int fetch_bundle_uri_internal(struct repository *r, return -1; } - if (!(filename = find_temp_filename())) { + if (!bundle->file && + !(bundle->file = find_temp_filename())) { result = -1; goto cleanup; } - if ((result = copy_uri_to_file(filename, uri))) { - warning(_("failed to download bundle from URI '%s'"), uri); + if ((result = copy_uri_to_file(bundle->file, bundle->uri))) { + warning(_("failed to download bundle from URI '%s'"), bundle->uri); goto cleanup; } - if ((result = !is_bundle(filename, 0))) { - warning(_("file at URI '%s' is not a bundle"), uri); + if ((result = !is_bundle(bundle->file, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); + if (result) + warning(_("file at URI '%s' is not a bundle or bundle list"), + bundle->uri); goto cleanup; } - if ((result = unbundle_from_file(r, filename))) { - warning(_("failed to unbundle bundle from URI '%s'"), uri); - goto cleanup; - } + /* Copy the bundle and insert it into the global list. 
*/ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); + bcopy->file = xstrdup(bundle->file); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: - if (filename) - unlink(filename); - free(filename); + if (result && bundle->file) + unlink(bundle->file); return result; } +/** + * This loop iterator breaks the loop with nonzero return code on the + * first successful unbundling of a bundle. + */ +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ + struct repository *r = data; + + if (!info->file || info->unbundled) + return 0; + + if (!unbundle_from_file(r, info->file)) { + info->unbundled = 1; + return 1; + } + + return 0; +} + +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. + * + * Keep in mind that a non-zero result for the loop here means + * the loop terminated early on a successful unbundling, which + * signals that we can try again. + */ + while (for_all_bundles_in_list(list, attempt_unbundle, r)) ; + + return 0; +} + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ + if (info->file) + unlink_or_warn(info->file); + return 0; +} + int fetch_bundle_uri(struct repository *r, const char *uri) { - return fetch_bundle_uri_internal(r, uri, 0); + int result; + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), + .id = xstrdup(""), + }; + + init_bundle_list(&list); + + /* If a bundle is added to this global list, then it is required. 
*/ + list.mode = BUNDLE_MODE_ALL; + + if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list))) + goto cleanup; + + result = unbundle_all_bundles(r, &list); + +cleanup: + for_all_bundles_in_list(&list, unlink_bundle, NULL); + clear_bundle_list(&list); + clear_remote_bundle_info(&bundle, NULL); + return result; } /** diff --git a/bundle-uri.h b/bundle-uri.h index bc13d4c9929..4dbc269823c 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -28,6 +28,19 @@ struct remote_bundle_info { * if there was no table of contents. */ char *uri; + + /** + * If the bundle has been downloaded, then 'file' is a + * filename storing its contents. Otherwise, 'file' is + * NULL. + */ + char *file; + + /** + * If the bundle has been unbundled successfully, then + * this boolean is true. + */ + unsigned unbundled:1; }; #define REMOTE_BUNDLE_INFO_INIT { 0 } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index ad666a2d28a..a86dc04f528 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -41,6 +41,195 @@ test_expect_success 'clone with file:// bundle' ' test_cmp expect actual ' +# To get interesting tests for bundle lists, we need to construct a +# somewhat-interesting commit history. 
+# +# ---------------- bundle-4 +# +# 4 +# / \ +# ----|---|------- bundle-3 +# | | +# | 3 +# | | +# ----|---|------- bundle-2 +# | | +# 2 | +# | | +# ----|---|------- bundle-1 +# \ / +# 1 +# | +# (previous commits) +test_expect_success 'construct incremental bundle list' ' + ( + cd clone-from && + git checkout -b base && + test_commit 1 && + git checkout -b left && + test_commit 2 && + git checkout -b right base && + test_commit 3 && + git checkout -b merge left && + git merge right -m "4" && + + git bundle create bundle-1.bundle base && + git bundle create bundle-2.bundle base..left && + git bundle create bundle-3.bundle base..right && + git bundle create bundle-4.bundle merge --not left right + ) +' + +test_expect_success 'clone bundle list (file, no heuristic)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + [bundle "bundle-3"] + uri = file://$(pwd)/clone-from/bundle-3.bundle + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-file cat-file --batch-check <oids && + + git -C clone-list-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, all mode, some failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + # Does not exist. Should be skipped. 
+ [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + # No bundle-3 means bundle-4 will not apply. + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + GIT_TRACE2_PERF=1 \ + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-some cat-file --batch-check <oids && + + git -C clone-all-some for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, all mode, all failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-fail cat-file --batch-check <oids && + + git -C clone-all-fail for-each-ref --format="%(refname)" >refs && + ! grep "refs/bundles/" refs +' + +test_expect_success 'clone bundle list (file, any mode)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + # Does not exist. Should be skipped. 
+ [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-file cat-file --batch-check <oids && + + git -C clone-any-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, any mode, all failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-fail cat-file --batch-check <oids && + + git -C clone-any-fail for-each-ref --format="%(refname)" >refs && + ! 
grep "refs/bundles/" refs +' + ######################################################################### # HTTP tests begin here @@ -75,6 +264,65 @@ test_expect_success 'clone HTTP bundle' ' test_config -C clone-http log.excludedecoration refs/bundle/ ' +test_expect_success 'clone bundle list (HTTP, no heuristic)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + [bundle "bundle-2"] + uri = $HTTPD_URL/bundle-2.bundle + + [bundle "bundle-3"] + uri = $HTTPD_URL/bundle-3.bundle + + [bundle "bundle-4"] + uri = $HTTPD_URL/bundle-4.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-http cat-file --batch-check <oids +' + +test_expect_success 'clone bundle list (HTTP, any mode)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-http cat-file --batch-check <oids && + + git -C clone-list-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect actual +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist. 
-- gitgitgadget
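The "all" and "any" tests above rely on two behaviors: a bundle that cannot be downloaded is skipped rather than treated as fatal, and "any" mode stops after the first bundle that applies. A minimal sketch of that control flow, under stated assumptions, follows. It is not the code from bundle-uri.c: all names (sketch_mode, sketch_apply_list) are hypothetical, the 'ok' array stands in for the real download and unbundle steps, and prerequisite ordering between bundles is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two list modes exercised by the tests. */
enum sketch_mode { SKETCH_MODE_ALL, SKETCH_MODE_ANY };

/*
 * 'ok[i]' is nonzero when the i-th bundle could be downloaded and applied.
 * Returns the number of bundles applied and stores the number of bundles
 * that were tried in '*attempts'.
 */
int sketch_apply_list(enum sketch_mode mode, const int *ok, size_t nr,
		      size_t *attempts)
{
	int applied = 0;
	size_t i;

	*attempts = 0;
	for (i = 0; i < nr; i++) {
		(*attempts)++;
		if (!ok[i])
			continue; /* a failed bundle is skipped, not fatal */
		applied++;
		if (mode == SKETCH_MODE_ANY)
			break; /* one success is enough in "any" mode */
	}
	return applied;
}
```

With the input {missing, present, missing}, "any" mode applies one bundle after two attempts, while "all" mode keeps going through the whole list.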
* [PATCH v4 10/11] bundle-uri: quiet failed unbundlings 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 09/11] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-10 16:04 ` [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget 2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. These failures
are shown to the user as error messages, even when the bundles
eventually apply in later attempts after the bundles they depend on are
unbundled.

Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that suppresses
the error messages about missing prerequisite commits. The method still
returns the number of missing prerequisite commits, allowing callers of
unbundle() to notice that the bundle failed to apply.

Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.

Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 2 +- bundle.c | 10 ++++++++-- bundle.h | 3 ++- t/t5558-clone-bundle-uri.sh | 25 ++++++++++++++++++++----- 4 files changed, 31 insertions(+), 9 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index c0a6fb05fad..18b993c207f 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -309,7 +309,7 @@ static int unbundle_from_file(struct repository *r, const char *file) * the prerequisite commits. */ if ((result = unbundle(r, &header, bundle_fd, NULL, - VERIFY_BUNDLE_SKIP_REACHABLE))) + VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET))) return 1; /* diff --git a/bundle.c b/bundle.c index 36ffeb1e0eb..143e7c4508f 100644 --- a/bundle.c +++ b/bundle.c @@ -218,7 +218,10 @@ int verify_bundle(struct repository *r, add_pending_object(&revs, o, name); continue; } - if (++ret == 1) + ret++; + if (flags & VERIFY_BUNDLE_QUIET) + continue; + if (ret == 1) error("%s", message); error("%s %s", oid_to_hex(oid), name); } @@ -246,7 +249,10 @@ int verify_bundle(struct repository *r, assert(o); /* otherwise we'd have returned early */ if (o->flags & SHOWN) continue; - if (++ret == 1) + ret++; + if (flags & VERIFY_BUNDLE_QUIET) + continue; + if (ret == 1) error("%s", message); error("%s %s", oid_to_hex(oid), name); } diff --git a/bundle.h b/bundle.h index 9f798c00d93..ba453404163 100644 --- a/bundle.h +++ b/bundle.h @@ -32,7 +32,8 @@ int create_bundle(struct repository *r, const char *path, enum verify_bundle_flags { VERIFY_BUNDLE_VERBOSE = (1 << 0), - VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) + VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1), + VERIFY_BUNDLE_QUIET = (1 << 2), }; int verify_bundle(struct repository *r, struct bundle_header *header, diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index a86dc04f528..9b159078386 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -99,7 +99,10 @@ test_expect_success 'clone bundle list (file, no heuristic)' ' uri = 
file://$(pwd)/clone-from/bundle-4.bundle EOF - git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-list-file 2>err && + ! grep "Repository lacks these prerequisite commits" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-list-file cat-file --batch-check <oids && @@ -141,7 +144,10 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' ' EOF GIT_TRACE2_PERF=1 \ - git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-all-some 2>err && + ! grep "Repository lacks these prerequisite commits" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-all-some cat-file --batch-check <oids && @@ -169,7 +175,10 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' ' uri = file://$(pwd)/clone-from/bundle-5.bundle EOF - git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-all-fail 2>err && + ! grep "Repository lacks these prerequisite commits" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-all-fail cat-file --batch-check <oids && @@ -195,7 +204,10 @@ test_expect_success 'clone bundle list (file, any mode)' ' uri = file://$(pwd)/clone-from/bundle-5.bundle EOF - git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-any-file 2>err && + ! 
grep "Repository lacks these prerequisite commits" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-any-file cat-file --batch-check <oids && @@ -284,7 +296,10 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' ' uri = $HTTPD_URL/bundle-4.bundle EOF - git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http && + git clone --bundle-uri="$HTTPD_URL/bundle-list" \ + clone-from clone-list-http 2>err && + ! grep "Repository lacks these prerequisite commits" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-list-http cat-file --batch-check <oids ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
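The key point of the patch above is that quiet mode only silences the messages; the count of missing prerequisites is still returned so callers can tell that the bundle did not apply. A standalone sketch of that counting pattern follows. It is an illustration, not the code from bundle.c: the flag names, sketch_count_missing(), and the 'present' array (a stand-in for "is this prerequisite commit in the repository?") are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags shaped like the enum in the patch. */
enum sketch_verify_flags {
	SKETCH_VERIFY_VERBOSE = (1 << 0),
	SKETCH_VERIFY_QUIET = (1 << 1),
};

/*
 * Count the entries whose 'present' value is zero. Unless
 * SKETCH_VERIFY_QUIET is set, write a header once and then one line per
 * missing entry to 'err'. The return value is the same in both cases.
 */
int sketch_count_missing(const int *present, const char **names, size_t nr,
			 unsigned flags, FILE *err)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (present[i])
			continue;
		ret++;
		if (flags & SKETCH_VERIFY_QUIET)
			continue; /* still counted, just not reported */
		if (ret == 1)
			fprintf(err, "error: missing prerequisites:\n");
		fprintf(err, "error: %s\n", names[i]);
	}
	return ret;
}
```

Incrementing the counter before the quiet check is what keeps the return value identical in both modes; testing 'ret == 1' afterwards prints the header exactly once.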
* [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 10/11] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 ` Derrick Stolee via GitGitGadget 2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget 11 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-10 16:04 UTC (permalink / raw)
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic: download as much as possible, then
work with whatever succeeded. This is particularly important in the
"any" mode, where a single successful bundle is enough.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the parent process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to keep these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 1 + t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 18b993c207f..6bfba95f872 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri) int found_get = 0; strvec_pushl(&cp.args, "git-remote-https", uri, NULL); + cp.err = -1; cp.in = -1; cp.out = -1; diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 9b159078386..9155f31fa2c 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -147,6 +147,8 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' ' git clone --bundle-uri="file://$(pwd)/bundle-list" \ clone-from clone-all-some 2>err && ! grep "Repository lacks these prerequisite commits" err && + ! grep "fatal" err && + grep "warning: failed to download bundle from URI" err && git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-all-some cat-file --batch-check <oids && @@ -178,6 +180,8 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' ' git clone --bundle-uri="file://$(pwd)/bundle-list" \ clone-from clone-all-fail 2>err && ! grep "Repository lacks these prerequisite commits" err && + ! grep "fatal" err && + grep "warning: failed to download bundle from URI" err && git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-all-fail cat-file --batch-check <oids && @@ -234,7 +238,11 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' ' uri = $HTTPD_URL/bundle-5.bundle EOF - git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail && + git clone --bundle-uri="file://$(pwd)/bundle-list" \ + clone-from clone-any-fail 2>err && + ! 
grep "fatal" err && + grep "warning: failed to download bundle from URI" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-any-fail cat-file --batch-check <oids && @@ -323,7 +331,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' ' uri = $HTTPD_URL/bundle-5.bundle EOF - git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && + git clone --bundle-uri="$HTTPD_URL/bundle-list" \ + clone-from clone-any-http 2>err && + ! grep "fatal" err && + grep "warning: failed to download bundle from URI" err && + git -C clone-from for-each-ref --format="%(objectname)" >oids && git -C clone-any-http cat-file --batch-check <oids && -- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
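The patch above stops a helper's "fatal:" output from reaching the user while still letting the parent react to the failure. A rough POSIX sketch of the same idea follows, outside of Git's run-command API. It assumes that the effect of 'cp.err = -1' is that the child's stderr goes to a pipe held by the parent instead of the terminal; sketch_run_quiet_stderr() is a hypothetical helper, not a Git function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its stderr redirected into a pipe owned by the parent.
 * The parent drains and discards that output, then returns the child's
 * exit code, or -1 if the child could not be started or did not exit
 * normally.
 */
int sketch_run_quiet_stderr(char *const argv[])
{
	int fds[2];
	int status;
	pid_t pid;
	char buf[4096];

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: stderr now points at the write end of the pipe. */
		close(fds[0]);
		dup2(fds[1], STDERR_FILENO);
		close(fds[1]);
		execvp(argv[0], argv);
		_exit(127);
	}

	/* Parent: drain the pipe so the child never blocks on a full pipe. */
	close(fds[1]);
	while (read(fds[0], buf, sizeof(buf)) > 0)
		; /* discard the child's error output */
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A caller would check the return value and print its own message on failure, which mirrors how the bundle URI code reports a warning() in place of the helper's "fatal:" line.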
* [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists 2022-10-10 16:04 ` [PATCH v4 00/11] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (10 preceding siblings ...) 2022-10-10 16:04 ` [PATCH v4 11/11] bundle-uri: suppress stderr from remote-https Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 ` Derrick Stolee via GitGitGadget 2022-10-12 12:52 ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget ` (12 more replies) 11 siblings, 13 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee

This is the third series building the bundle URI feature. It is built on
top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>'
where <uri> is a URI to a bundle file. This series adds the capability of
downloading and parsing a bundle list and then downloading the URIs in
that list.

The core functionality of bundle lists is implemented by creating data
structures from a list of key-value pairs. These pairs can come from a
plain-text file in Git config format, but in the future, we will support
the list being supplied by packet lines over Git's protocol v2 in the
'bundle-uri' command (reserved for the next series).

The patches are organized in this way (updated for v4):

 1. Patch 1 is a cleanup from the previous part. This allows us to
    simplify our bundle list data structure slightly.
 2. Patches 2-3 create the bundle list data structures and the logic for
    populating the list from key-value pairs.
 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle
    list. Add unit tests that ensure this logic constructs lists
    correctly. These patches are adapted from Ævar's RFC [1] and were
    previously seen in my combined RFC [2].
 4. Patch 6 teaches Git to parse Git config files into bundle lists.
 5. Patches 7-9 implement the ability to download a bundle list and
    recursively download the contained bundles (and possibly the bundle
    lists within). This is limited by a constant depth to avoid issues
    with cycles or otherwise incorrectly configured bundle lists. This
    also fixes a previous bug when running verify_bundle() multiple times
    in the same process, as it did not clear the PREREQ_MARK flag upon
    leaving (see patch 8).
 6. Patches 10-12 suppress unhelpful warnings so that users do not see
    them.

[1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/
[2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone
--bundle-uri=<uri>' where <uri> points to a bundle list instead of a
single bundle file.

As outlined in the design document [1], the next steps after this are:

 1. Implement the protocol v2 verb, re-using the bundle list logic from
    (2). Use this to auto-discover bundle URIs during 'git clone' (behind
    a config option). [2]
 2. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token
    values. [3]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com
[4] https://github.com/derrickstolee/git/pull/21
[5] https://github.com/derrickstolee/git/pull/22

Updates in v5
=============

* The bug about verify_bundle() not working multiple times in the same
  process is fixed without removing the revision walk. Instead, more flags
  needed to be removed as the method cleaned up after itself.

Updates in v4
=============

* Properly updated the patch outline.
* Jonathan Tan asked for more tests, and this revealed some interesting
  behaviors which I have now either fixed or made explicit:

  1. In "all" mode, we try to download and apply all bundles. Do not fail
     if a single bundle download fails.
  2. Previously, not all bundles were being applied, and this was noticed
     by the added checks for the refs/bundles/* refs at the end of the
     tests. This revealed the need for removing the reachability walk
     from verify_bundle() since the written refs/bundles/* refs were not
     being picked up by the loose ref cache. Since removing the
     reachability walk seemed like the faster (for users) option, I went
     that direction.
  3. While running those tests and examining the output carefully, I
     noticed several error messages related to missing prerequisites due
     to attempting unbundling in a random order. This doesn't appear in
     the later creationToken version, so I hadn't noticed it at the tip
     of my local work. These messages are removed with a new quiet mode
     for verify_bundle().

Updates in v3
=============

* Fixed a comment about a return value of -1.
* Fixed and tested scenario where early URIs fail in "any" mode and Git
  should try the rest of the list.
* Instead of using 'success_count' and 'failure_count', use the iterator
  return value to terminate the "all" mode loop early.

Updates in v2
=============

Thank you to all of the voices who chimed in on the previous version. I'm
sorry it took so long for me to get a new version.

* I've done a rather thorough overhaul to minimize how often later patches
  rewrite portions of earlier patches.
* We no longer use a strbuf in struct remote_bundle_info. Instead, use a
  'char *' and only in the patch where it is first used.
* The config documentation is more clearly indicating that the bundle.*
  section has no effect in the repository config (at the moment, which
  will change in the next series).
* The bundle.version value is now parsed using git_parse_int().
* The config key is now parsed using parse_config_key().
* Commit messages clarify more about the context of the change in the
  bigger picture of the bundle URI effort.
* Some printf()s are correctly changed to fprintf()s.
* The test helper CLI is unified across the two modes. They both take a
  filename now.
* The count of downloaded bundles is now only updated after a successful
  download, allowing the "any" mode to keep trying after a failure.

Thanks,

* Stolee

Derrick Stolee (10):
  bundle-uri: use plain string in find_temp_filename()
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: parse bundle list in config format
  bundle-uri: limit recursion depth for bundle lists
  bundle: properly clear all revision flags
  bundle-uri: fetch a list of bundles
  bundle: add flags to verify_bundle()
  bundle-uri: quiet failed unbundlings
  bundle-uri: suppress stderr from remote-https

Ævar Arnfjörð Bjarmason (2):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing

 Documentation/config.txt        |   2 +
 Documentation/config/bundle.txt |  24 ++
 Makefile                        |   1 +
 builtin/bundle.c                |   5 +-
 bundle-uri.c                    | 458 ++++++++++++++++++++++++++++++--
 bundle-uri.h                    |  93 +++++++
 bundle.c                        |  42 +--
 bundle.h                        |  15 +-
 config.c                        |   2 +-
 config.h                        |   1 +
 t/helper/test-bundle-uri.c      |  95 +++++++
 t/helper/test-tool.c            |   1 +
 t/helper/test-tool.h            |   1 +
 t/t5558-clone-bundle-uri.sh     | 275 +++++++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 171 ++++++++++++
 t/test-lib-functions.sh         |  11 +
 transport.c                     |   2 +-
 17 files changed, 1156 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/config/bundle.txt
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git
pr-1333/derrickstolee/bundle-redo/list-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1333 Range-diff vs v4: 1: 48beccb0f5e = 1: 48beccb0f5e bundle-uri: use plain string in find_temp_filename() 2: f0c4457951c = 2: f0c4457951c bundle-uri: create bundle_list struct and helpers 3: 430e01cd2a4 = 3: 430e01cd2a4 bundle-uri: create base key-value pair parsing 4: cd915d57f3b = 4: cd915d57f3b bundle-uri: create "key=value" line parsing 5: 4d8cac67f66 = 5: 4d8cac67f66 bundle-uri: unit test "key=value" parsing 6: 0ecae3a44b3 = 6: 0ecae3a44b3 bundle-uri: parse bundle list in config format 7: 7e6b32313b0 = 7: 7e6b32313b0 bundle-uri: limit recursion depth for bundle lists -: ----------- > 8: 8dc5a8e4e63 bundle: properly clear all revision flags 9: 6b9c764c6b3 = 9: 51e9b8474fb bundle-uri: fetch a list of bundles 8: 83f2cd893a4 ! 10: fba3a4a117e bundle: add flags to verify_bundle(), skip walk @@ Metadata Author: Derrick Stolee <derrickstolee@github.com> ## Commit message ## - bundle: add flags to verify_bundle(), skip walk + bundle: add flags to verify_bundle() - The verify_bundle() method checks if a bundle can be applied to a given - repository. This not only verifies that certain commits exist in the - repository, but Git also checks that these commits are reachable. - - This behavior dates back to the original git-bundle builtin written in - 2e0afafebd8 (Add git-bundle: move objects and references by archive, - 2007-02-22), but the message does not go into detail why the - reachability check is important. - - Since verify_bundle() is called from unbundle(), we need to add an - option to pipe the flags through that method. - - When unbundling from a list of bundles, Git will create refs that point - to the tips of the latest bundle, which makes this reachability walk - succeed, in theory. However, the loose refs cache does not get - invalidated and hence the reachability walk fails. 
By disabling the - reachability walk in the bundle URI code, we can get around this - reachability check. + The verify_bundle() method has a 'verbose' option, but we will want to + extend this method to have more granular control over its output. First, + replace this 'verbose' option with a new 'flags' option with a single + possible value: VERIFY_BUNDLE_VERBOSE. Signed-off-by: Derrick Stolee <derrickstolee@github.com> @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + * a reachable ref pointing to the new tips, which will reach + * the prerequisite commits. + */ -+ if ((result = unbundle(r, &header, bundle_fd, NULL, -+ VERIFY_BUNDLE_SKIP_REACHABLE))) ++ if ((result = unbundle(r, &header, bundle_fd, NULL, 0))) return 1; /* @@ bundle.c: static int list_refs(struct string_list *r, int argc, const char **arg /* * Do fast check, then if any prereqs are missing then go line by line @@ bundle.c: int verify_bundle(struct repository *r, - error("%s", message); error("%s %s", oid_to_hex(oid), name); } -- if (revs.pending.nr != p->nr) -+ if (revs.pending.nr != p->nr || -+ (flags & VERIFY_BUNDLE_SKIP_REACHABLE)) - goto cleanup; - req_nr = revs.pending.nr; - setup_revisions(2, argv, &revs, NULL); -@@ bundle.c: int verify_bundle(struct repository *r, - clear_commit_marks(commit, ALL_REV_FLAGS); - } - if (verbose) { + if (flags & VERIFY_BUNDLE_VERBOSE) { @@ bundle.h: int read_bundle_header_fd(int fd, struct bundle_header *header, + +enum verify_bundle_flags { + VERIFY_BUNDLE_VERBOSE = (1 << 0), -+ VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) +}; + +int verify_bundle(struct repository *r, struct bundle_header *header, 10: 1cae3096624 ! 
11: 2e0bfa834f1 bundle-uri: quiet failed unbundlings @@ Commit message Signed-off-by: Derrick Stolee <derrickstolee@github.com> + ## builtin/bundle.c ## +@@ builtin/bundle.c: static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { + } + close(bundle_fd); + if (verify_bundle(the_repository, &header, +- quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) { ++ quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) { + ret = 1; + goto cleanup; + } + ## bundle-uri.c ## @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file) + * a reachable ref pointing to the new tips, which will reach * the prerequisite commits. */ - if ((result = unbundle(r, &header, bundle_fd, NULL, -- VERIFY_BUNDLE_SKIP_REACHABLE))) -+ VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET))) +- if ((result = unbundle(r, &header, bundle_fd, NULL, 0))) ++ if ((result = unbundle(r, &header, bundle_fd, NULL, ++ VERIFY_BUNDLE_QUIET))) return 1; /* @@ bundle.h: int create_bundle(struct repository *r, const char *path, enum verify_bundle_flags { VERIFY_BUNDLE_VERBOSE = (1 << 0), -- VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) -+ VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1), -+ VERIFY_BUNDLE_QUIET = (1 << 2), ++ VERIFY_BUNDLE_QUIET = (1 << 1), }; int verify_bundle(struct repository *r, struct bundle_header *header, 11: 52a575f8a69 = 12: 5729ff2af4b bundle-uri: suppress stderr from remote-https -- gitgitgadget ^ permalink raw reply [flat|nested] 94+ messages in thread
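The cover letter above notes that bundle lists may contain further bundle lists and that the recursion is bounded by a constant depth so that cycles cannot loop forever. A minimal sketch of such a depth-limited walk follows. It is only an illustration: struct sketch_node, sketch_fetch(), and the value of SKETCH_MAX_DEPTH are hypothetical, and the constant the series actually uses is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit chosen for the sketch only. */
#define SKETCH_MAX_DEPTH 4

struct sketch_node {
	int is_list;                   /* nonzero: a bundle list; zero: a bundle */
	struct sketch_node **children; /* entries, used only when is_list is set */
	size_t nr;
};

/*
 * Count the bundles reachable from 'node'. When the depth limit stops the
 * descent, set '*truncated' and skip that list instead of recursing.
 */
int sketch_fetch(const struct sketch_node *node, int depth, int *truncated)
{
	int count = 0;
	size_t i;

	if (!node->is_list)
		return 1;
	if (depth >= SKETCH_MAX_DEPTH) {
		*truncated = 1;
		return 0;
	}
	for (i = 0; i < node->nr; i++)
		count += sketch_fetch(node->children[i], depth + 1, truncated);
	return count;
}
```

A list that contains itself is visited once per level until the limit is reached, so the walk terminates without any cycle-detection bookkeeping. The trade-off is that a legitimately deep (but acyclic) list is cut off at the same depth.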
* [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() 2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 ` Derrick Stolee via GitGitGadget 2022-10-12 12:52 ` [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget ` (11 subsequent siblings) 12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The find_temp_filename() method was created in 53a50892be2 (bundle-uri:
create basic file-copy logic, 2022-08-09) and uses odb_mkstemp() to
create a temporary filename. The odb_mkstemp() method uses a strbuf in
its interface, but we do not need to continue carrying a strbuf
throughout the bundle URI code.

Convert the find_temp_filename() method to use a 'char *' and modify its
only caller. This makes sense because we never need to modify this
filename later, so using a strbuf is overkill.

This change will simplify the data structure for tracking a bundle list
to use plain strings instead of strbufs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 4a8cc74ed05..8b2f4e08c9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,22 +5,23 @@
 #include "refs.h"
 #include "run-command.h"
 
-static int find_temp_filename(struct strbuf *name)
+static char *find_temp_filename(void)
 {
 	int fd;
+	struct strbuf name = STRBUF_INIT;
 	/*
 	 * Find a temporary filename that is available. This is briefly
 	 * racy, but unlikely to collide.
 	 */
-	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	fd = odb_mkstemp(&name, "bundles/tmp_uri_XXXXXX");
 	if (fd < 0) {
 		warning(_("failed to create temporary file"));
-		return -1;
+		return NULL;
 	}
 
 	close(fd);
-	unlink(name->buf);
-	return 0;
+	unlink(name.buf);
+	return strbuf_detach(&name, NULL);
 }
 
 static int download_https_uri_to_file(const char *file, const char *uri)
@@ -141,28 +142,31 @@ static int unbundle_from_file(struct repository *r, const char *file)
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	char *filename;
 
-	if ((result = find_temp_filename(&filename)))
+	if (!(filename = find_temp_filename())) {
+		result = -1;
 		goto cleanup;
+	}
 
-	if ((result = copy_uri_to_file(filename.buf, uri))) {
+	if ((result = copy_uri_to_file(filename, uri))) {
 		warning(_("failed to download bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
-	if ((result = !is_bundle(filename.buf, 0))) {
+	if ((result = !is_bundle(filename, 0))) {
 		warning(_("file at URI '%s' is not a bundle"), uri);
 		goto cleanup;
 	}
 
-	if ((result = unbundle_from_file(r, filename.buf))) {
+	if ((result = unbundle_from_file(r, filename))) {
 		warning(_("failed to unbundle bundle from URI '%s'"), uri);
 		goto cleanup;
 	}
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (filename)
+		unlink(filename);
+	free(filename);
 	return result;
 }
-- 
gitgitgadget
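The ownership change in the patch above (the helper returns a heap-allocated name or NULL, and the caller frees it) can be illustrated outside of Git with plain POSIX calls. The sketch below is not Git code: it uses mkstemp() and strdup() in place of odb_mkstemp() and strbuf_detach(), and the function name and the /tmp template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Return a heap-allocated filename that did not exist a moment ago, or
 * NULL on failure. The caller owns the result and must free() it.
 *
 * As in the patch, this is briefly racy: the file is created and then
 * removed again, so another process could claim the name in between.
 */
char *sketch_find_temp_filename(void)
{
	char template[] = "/tmp/sketch_uri_XXXXXX";
	int fd = mkstemp(template);

	if (fd < 0)
		return NULL;
	close(fd);
	unlink(template);

	/* Hand a private copy to the caller; NULL if allocation fails. */
	return strdup(template);
}
```

Returning NULL on failure lets the caller fold the error check into the assignment, as fetch_bundle_uri() does in the patch, and a single free() in the cleanup path replaces strbuf_release().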
* [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52   ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 03/12] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare where a user uses a single bundle URI and expects
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set.

The list also has a few "global" properties, including the version
(used when parsing the list) and the mode. The mode is one of these two
options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 8b2f4e08c9c..f9a8db221bc 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -4,6 +4,66 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static char *find_temp_filename(void)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..ff7e3fd3fb2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,63 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { 0 }
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget

^ permalink raw reply related [flat|nested] 94+ messages in thread
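For readers who want to poke at the iteration contract outside of Git, here is a minimal standalone sketch. It assumes a fixed-size array in place of Git's hashmap, and every sketch_* name is hypothetical (none of them exist in Git). It only illustrates two points from the patch: init applies the implied defaults (mode "all", version 1), and the iterator stops at the first callback that returns non-zero and passes that value back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the structs in the patch (array, no hashmap). */
enum sketch_mode { SKETCH_MODE_NONE = 0, SKETCH_MODE_ALL, SKETCH_MODE_ANY };

struct sketch_bundle {
	const char *id;
	const char *uri;
};

struct sketch_list {
	int version;
	enum sketch_mode mode;
	struct sketch_bundle items[8];
	size_t nr;
};

typedef int (*sketch_iterator)(struct sketch_bundle *bundle, void *data);

/* Like init_bundle_list(): zero everything, then apply implied defaults. */
static void sketch_init(struct sketch_list *list)
{
	memset(list, 0, sizeof(*list));
	list->mode = SKETCH_MODE_ALL;
	list->version = 1;
}

/* Append one entry; returns -1 when the fixed-size array is full. */
static int sketch_add(struct sketch_list *list, const char *id, const char *uri)
{
	if (list->nr >= sizeof(list->items) / sizeof(list->items[0]))
		return -1;
	list->items[list->nr].id = id;
	list->items[list->nr].uri = uri;
	list->nr++;
	return 0;
}

/* Like for_all_bundles_in_list(): stop at the first non-zero result. */
static int sketch_for_all(struct sketch_list *list, sketch_iterator iter,
			  void *data)
{
	size_t i;

	assert(list && iter);
	for (i = 0; i < list->nr; i++) {
		int result = iter(&list->items[i], data);

		if (result)
			return result;
	}
	return 0;
}

/* Example callback: count bundles, aborting with 42 on a missing URI. */
static int sketch_count(struct sketch_bundle *bundle, void *data)
{
	if (!bundle->uri)
		return 42;
	(*(int *)data)++;
	return 0;
}
```

Calling sketch_for_all() with sketch_count on a list whose second entry has a NULL uri visits only the first entry and returns the callback's 42. Unlike this array, the hashmap in the patch gives no ordering guarantee, so which entries are visited before an early return is unspecified there.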
* [PATCH v5 03/12] bundle-uri: create base key-value pair parsing
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 01/12] bundle-uri: use plain string in find_temp_filename() Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 02/12] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52   ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 04/12] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unused, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  currently is expected to be an absolute URI, but will be relaxed to be
  a relative URI in the future.

While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.

Make the git_parse_int() method global so we can parse the integer
version value carefully.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config.txt        |  2 +
 Documentation/config/bundle.txt | 24 +++++++++++
 bundle-uri.c                    | 76 +++++++++++++++++++++++++++++++++
 config.c                        |  2 +-
 config.h                        |  1 +
 5 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/bundle.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e376d547ce0..4280af6992e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -387,6 +387,8 @@ include::config/branch.txt[]
 
 include::config/browser.txt[]
 
+include::config/bundle.txt[]
+
 include::config/checkout.txt[]
 
 include::config/clean.txt[]
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
new file mode 100644
index 00000000000..daa21eb674a
--- /dev/null
+++ b/Documentation/config/bundle.txt
@@ -0,0 +1,24 @@
+bundle.*::
+	The `bundle.*` keys may appear in a bundle list file found via the
+	`git clone --bundle-uri` option. These keys currently have no effect
+	if placed in a repository config file, though this will change in the
+	future. See link:technical/bundle-uri.html[the bundle URI design
+	document] for more details.
+
+bundle.version::
+	This integer value advertises the version of the bundle list format
+	used by the bundle list. Currently, the only accepted value is `1`.
+
+bundle.mode::
+	This string value should be either `all` or `any`. This value describes
+	whether all of the advertised bundles are required to unbundle a
+	complete understanding of the bundled information (`all`) or if any one
+	of the listed bundle URIs is sufficient (`any`).
+
+bundle.<id>.*::
+	The `bundle.<id>.*` keys are used to describe a single item in the
+	bundle list, grouped under `<id>` for identification purposes.
+
+bundle.<id>.uri::
+	This string value defines the URI by which Git can reach the contents
+	of this `<id>`. This URI may be a bundle file or another bundle list.
diff --git a/bundle-uri.c b/bundle-uri.c
index f9a8db221bc..0bc59dd9c34 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -6,6 +6,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "config.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -65,6 +66,81 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns -1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+	const char *subsection, *subkey;
+	size_t subsection_len;
+
+	if (parse_config_key(key, "bundle", &subsection, &subsection_len, &subkey))
+		return -1;
+
+	if (!subsection_len) {
+		if (!strcmp(subkey, "version")) {
+			int version;
+			if (!git_parse_int(value, &version))
+				return -1;
+			if (version != 1)
+				return -1;
+
+			list->version = version;
+			return 0;
+		}
+
+		if (!strcmp(subkey, "mode")) {
+			if (!strcmp(value, "all"))
+				list->mode = BUNDLE_MODE_ALL;
+			else if (!strcmp(value, "any"))
+				list->mode = BUNDLE_MODE_ANY;
+			else
+				return -1;
+			return 0;
+		}
+
+		/* Ignore other unknown global keys. */
+		return 0;
+	}
+
+	strbuf_add(&id, subsection, subsection_len);
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(subkey, "uri")) {
+		if (bundle->uri)
+			return -1;
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/config.c b/config.c
index 015bec360f5..e93101249f6 100644
--- a/config.c
+++ b/config.c
@@ -1214,7 +1214,7 @@ static int git_parse_unsigned(const char *value, uintmax_t *ret, uintmax_t max)
 	return 0;
 }
 
-static int git_parse_int(const char *value, int *ret)
+int git_parse_int(const char *value, int *ret)
 {
 	intmax_t tmp;
 	if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(int)))
diff --git a/config.h b/config.h
index ca994d77147..ef9eade6414 100644
--- a/config.h
+++ b/config.h
@@ -206,6 +206,7 @@ int config_with_options(config_fn_t fn, void *,
 
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
+int git_parse_int(const char *value, int *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
gitgitgadget

^ permalink raw reply related [flat|nested] 94+ messages in thread
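The key handling above can be sketched without Git's config machinery. The following is a rough standalone approximation of how bundle_list_update() uses parse_config_key(): split a key into an optional "<id>" and a trailing subkey. The helper names sketch_split_key() and sketch_parse_mode() are hypothetical, and the numeric return values of sketch_parse_mode() are arbitrary stand-ins for the enum in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modeled on how the patch uses parse_config_key():
 * split "bundle.<id>.<subkey>" or "bundle.<subkey>". Returns 0 on success
 * and -1 if the key is not in the "bundle" section.
 */
static int sketch_split_key(const char *key, const char **id, size_t *id_len,
			    const char **subkey)
{
	static const char section[] = "bundle.";
	const size_t section_len = sizeof(section) - 1;
	const char *rest, *last_dot;

	assert(key && id && id_len && subkey);
	if (strncmp(key, section, section_len))
		return -1;

	rest = key + section_len;
	last_dot = strrchr(rest, '.');
	if (!last_dot) {
		/* Global key such as "bundle.version": no <id> part. */
		*id = NULL;
		*id_len = 0;
		*subkey = rest;
	} else {
		/* The <id> runs up to the last dot, so it may contain dots. */
		*id = rest;
		*id_len = (size_t)(last_dot - rest);
		*subkey = last_dot + 1;
	}
	return 0;
}

/*
 * Like the "mode" branch in the patch: only "all" and "any" are accepted.
 * Returns 1 for "all", 2 for "any", and -1 for anything else.
 */
static int sketch_parse_mode(const char *value)
{
	if (!strcmp(value, "all"))
		return 1;
	if (!strcmp(value, "any"))
		return 2;
	return -1;
}
```

With this split, "bundle.one.uri" yields the id "one" (length 3) and the subkey "uri", while "bundle.version" yields no id and the subkey "version". Note that the real function reports an unknown mode or version as an error, but returns 0 for unknown subkeys so that future heuristic keys can be added without breaking older clients.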
* [PATCH v5 04/12] bundle-uri: create "key=value" line parsing
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-10-12 12:52   ` [PATCH v5 03/12] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52   ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value" representing a
bundle list. Connect the API necessary for Git's transport to the
key-value pair parsing created in the previous change.

We are not currently implementing this protocol v2 functionality, but
instead preparing to expose this parsing to be unit-testable.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 27 ++++++++++++++++++++++++++-
 bundle-uri.h | 12 ++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0bc59dd9c34..372e6fac5cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -71,7 +71,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -306,3 +305,29 @@ cleanup:
 	free(filename);
 	return result;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index ff7e3fd3fb2..90583461929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -67,4 +67,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
 #endif
-- 
gitgitgadget

^ permalink raw reply related [flat|nested] 94+ messages in thread
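To make the three rejection cases in bundle_uri_parse_line() easy to try in isolation, here is a small sketch that performs the same split on the first '=' but copies the key into a caller-provided buffer instead of a strbuf. The function sketch_split_line() and its status enum are hypothetical; the "key too long" status exists only because of the fixed buffer and has no counterpart in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_line_status {
	SKETCH_LINE_OK = 0,
	SKETCH_LINE_EMPTY,              /* "got an empty line" */
	SKETCH_LINE_NO_EQUALS,          /* "not of the form 'key=value'" */
	SKETCH_LINE_EMPTY_KEY_OR_VALUE, /* "line has empty key or value" */
	SKETCH_LINE_KEY_TOO_LONG        /* sketch-only: key buffer too small */
};

/*
 * Split 'line' at its first '='. On success the key is copied into 'key'
 * (NUL-terminated) and '*value' points into 'line' just past the '='.
 */
static enum sketch_line_status sketch_split_line(const char *line,
						 char *key, size_t key_size,
						 const char **value)
{
	const char *equals;
	size_t key_len;

	assert(line && key && key_size && value);
	if (!*line)
		return SKETCH_LINE_EMPTY;

	equals = strchr(line, '=');
	if (!equals)
		return SKETCH_LINE_NO_EQUALS;
	if (line == equals || !equals[1])
		return SKETCH_LINE_EMPTY_KEY_OR_VALUE;

	key_len = (size_t)(equals - line);
	if (key_len >= key_size)
		return SKETCH_LINE_KEY_TOO_LONG;

	memcpy(key, line, key_len);
	key[key_len] = '\0';
	*value = equals + 1;
	return SKETCH_LINE_OK;
}
```

Because the split happens at the first '=', a URI value that itself contains '=' (for example in a query string) is passed through intact, which matches the strchr() call in the patch.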
* [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-10-12 12:52   ` [PATCH v5 04/12] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-12 12:52   ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 06/12] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
an input file as a list of lines. These are fed into
bundle_uri_parse_line() to test how we construct a 'struct bundle_list'
from that data. The list is then output to stdout as if the key-value
pairs were a Git config file.

We use an input file instead of stdin because of a future change to
parse in config-file format that works better as an input file.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |   1 +
 bundle-uri.c                |  33 ++++++++++
 bundle-uri.h                |   3 +
 t/helper/test-bundle-uri.c  |  70 +++++++++++++++++++++
 t/helper/test-tool.c        |   1 +
 t/helper/test-tool.h        |   1 +
 t/t5750-bundle-uri-parse.sh | 121 ++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     |  11 ++++
 8 files changed, 241 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 7d5f48069ea..7dee0329c49 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 372e6fac5cf..c02e7f62eb1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -66,6 +66,39 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	fprintf(fp, "[bundle]\n");
+	fprintf(fp, "\tversion = %d\n", list->version);
+	fprintf(fp, "\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns -1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 90583461929..0e56ab2ae5a 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -59,6 +59,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..0329c56544f
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,70 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *key_value_usage[] = {
+		"test-tool bundle-uri parse-key-values <input>",
+		NULL
+	};
+	const char **usage = key_value_usage;
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+	FILE *fp;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc != 1)
+		goto usage;
+
+	init_bundle_list(&list);
+	fp = fopen(argv[0], "r");
+	if (!fp)
+		die("failed to open '%s'", argv[0]);
+
+	while (strbuf_getline(&sb, fp) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf))
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+	fclose(fp);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c3..fbe2d9d8108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb799271631..b2aa1f39a8f 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..fd142a66ad5
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,121 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.one.uri=https://example.com/bundle-2.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bad line: '\''bundle.one.uri=https://example.com/bundle-2.bdl'\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6da7273f1d5..3175d665add 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1956,3 +1956,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget

^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v5 06/12] bundle-uri: parse bundle list in config format
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-10-12 12:52   ` [PATCH v5 05/12] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-10-12 12:52   ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create bundle_uri_parse_config_format() to parse a file in config format
and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to use error_action CONFIG_ERROR_ERROR when calling
git_config_from_file_with_options() because the default action for
git_config_from_file() is to die() on a parsing error. The current
warning isn't particularly helpful if it arises to a user, but it will
be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 27 ++++++++++++++++++++
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  | 49 +++++++++++++++++++++++++++---------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 12 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index c02e7f62eb1..3d44ec2b1e6 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -173,6 +173,33 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static char *find_temp_filename(void)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 0e56ab2ae5a..bc13d4c9929 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -62,6 +62,15 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+int bundle_uri_parse_config_format(const char *uri,
+				   const char *filename,
+				   struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0329c56544f..25afd393428 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,12 +4,21 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
 	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <input>",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	const char **usage = key_value_usage;
 	struct option options[] = {
 		OPT_END(),
@@ -19,21 +28,35 @@ static int cmd__bundle_uri_parse(int argc, const char **argv)
 	int err = 0;
 	FILE *fp;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc != 1)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	fp = fopen(argv[0], "r");
-	if (!fp)
-		die("failed to open '%s'", argv[0]);
 
-	while (strbuf_getline(&sb, fp) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf))
-			err = error("bad line: '%s'", sb.buf);
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc != 1)
+			goto usage;
+		fp = fopen(argv[0], "r");
+		if (!fp)
+			die("failed to open '%s'", argv[0]);
+		while (strbuf_getline(&sb, fp) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf))
+				err = error("bad line: '%s'", sb.buf);
+		}
+		fclose(fp);
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = bundle_uri_parse_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
-	fclose(fp);
 
 	print_bundle_list(stdout, &list);
 
@@ -62,7 +85,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index fd142a66ad5..c2fe3f9c5a5 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -118,4 +118,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: duplicate lines
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	error: bad config line 1 in file in2
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related [flat|nested] 94+ messages in thread
* [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-10-12 12:52   ` [PATCH v5 06/12] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52   ` Derrick Stolee via GitGitGadget
  2022-10-12 12:52   ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 3d44ec2b1e6..8a7c11c6393 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -334,11 +334,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	char *filename;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	if (!(filename = find_temp_filename())) {
 		result = -1;
 		goto cleanup;
@@ -366,6 +380,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget
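The depth guard in this patch can be illustrated outside of Git. The following is a minimal sketch, not the code from bundle-uri.c: `struct node`, `visit()`, and `MAX_DEPTH` are hypothetical names, and a "node" stands in for a downloaded resource that is either a bundle or a list pointing at further resources. It shows that a list which links back to itself still terminates once the depth counter reaches the limit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a downloaded resource: either a bundle
 * (no children) or a bundle list that links to other resources.
 */
struct node {
	int is_bundle;
	struct node *children[4];
	size_t nr_children;
};

/* Same value as max_bundle_uri_depth in the patch; the name is ours. */
enum { MAX_DEPTH = 4 };

/*
 * Return the number of bundles reached from 'n'. Each time the depth
 * limit stops the descent, '*limit_hits' is incremented, which is the
 * point where the patch emits its warning and returns -1.
 */
static int visit(const struct node *n, int depth, int *limit_hits)
{
	int found = 0;
	size_t i;

	assert(n && limit_hits);

	if (depth >= MAX_DEPTH) {
		(*limit_hits)++;
		return 0;
	}
	if (n->is_bundle)
		return 1;
	for (i = 0; i < n->nr_children; i++)
		found += visit(n->children[i], depth + 1, limit_hits);
	return found;
}
```

With a list that contains one bundle and a link to itself, the walk ends after a bounded number of calls instead of looping forever. The limit is checked before anything else, so a resource at depth 4 is never inspected at all.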
* [PATCH v5 08/12] bundle: properly clear all revision flags
  2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-10-12 12:52   ` [PATCH v5 07/12] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-10-12 12:52   ` Derrick Stolee via GitGitGadget
  2022-10-12 16:17     ` Junio C Hamano
  2022-10-12 12:52   ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 94+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC
  To: git
  Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The verify_bundle() method checks two things for a bundle's
prerequisites:

 1. Are these objects in the object store?
 2. Are these objects reachable from our references?

In this second question, multiple uses of verify_bundle() in the same
process can report an invalid bundle even though it is correct. The
reason is due to not clearing all of the commit marks on the commits
previously walked.

The revision walk machinery was first introduced in-process by
fb9a54150d3 (git-bundle: avoid fork() in verify_bundle(), 2007-02-22).
This implementation used "-1" as the set of flags to clear. The next
meaningful change came in 2b064697a5b (revision traversal: retire
BOUNDARY_SHOW, 2007-03-05), which introduced the PREREQ_MARK flag
instead of a flag normally controlled by the revision-walk machinery.

In 86a0a408b90 (commit: factor out
clear_commit_marks_for_object_array, 2011-10-01), the loop over the
array of commits was replaced with a new
clear_commit_marks_for_object_array(), but simultaneously the "-1" value
was replaced with "ALL_REV_FLAGS", which stopped un-setting the
PREREQ_MARK flag. This means that if multiple commits were marked by the
PREREQ_MARK in a previous run of verify_bundle(), then this loop could
terminate early due to 'i' going to zero:

	while (i && (commit = get_revision(&revs)))
		if (commit->object.flags & PREREQ_MARK)
			i--;

The flag clearing work was changed again in 63647391e6c (bundle: avoid
using the rev_info flag leak_pending, 2017-12-25), but that was only
cosmetic and did not change the behavior.

It may seem that it would be sufficient to add the PREREQ_MARK flag to
the clear_commit_marks() call in its current location. However, we
actually need to do it in the "cleanup:" step, since the first loop
checking "Are these objects in the object store?" might add the
PREREQ_MARK flag to some objects and then terminate without performing a
walk due to one missing object. By clearing the flags in all cases, we
avoid this issue when running verify_bundle() multiple times in the same
process.

Moving this loop to the cleanup step alone would cause a segfault when
running 'git bundle verify' outside of a repository, but this is because
of that error condition using "goto cleanup" when returning is perfectly
safe. Nothing has been initialized at that point, so we can return
immediately without causing any leaks.

This behavior is verified carefully by a test that will be added soon
when Git learns to download bundle lists in a 'git clone --bundle-uri'
command.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/bundle.c b/bundle.c
index 0208e6d90d3..c277f3b9360 100644
--- a/bundle.c
+++ b/bundle.c
@@ -202,10 +202,8 @@ int verify_bundle(struct repository *r,
 	int i, ret = 0, req_nr;
 	const char *message = _("Repository lacks these prerequisite commits:");
 
-	if (!r || !r->objects || !r->objects->odb) {
-		ret = error(_("need a repository to verify a bundle"));
-		goto cleanup;
-	}
+	if (!r || !r->objects || !r->objects->odb)
+		return error(_("need a repository to verify a bundle"));
 
 	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
@@ -250,15 +248,6 @@ int verify_bundle(struct repository *r,
 		error("%s %s", oid_to_hex(oid), name);
 	}
 
-	/* Clean up objects used, as they will be reused. */
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		struct object_id *oid = e->util;
-		commit = lookup_commit_reference_gently(r, oid, 1);
-		if (commit)
-			clear_commit_marks(commit, ALL_REV_FLAGS);
-	}
-
 	if (verbose) {
 		struct string_list *r;
 
@@ -287,6 +276,14 @@ int verify_bundle(struct repository *r,
 			list_objects_filter_spec(&header->filter));
 	}
 cleanup:
+	/* Clean up objects used, as they will be reused. */
+	for (i = 0; i < p->nr; i++) {
+		struct string_list_item *e = p->items + i;
+		struct object_id *oid = e->util;
+		commit = lookup_commit_reference_gently(r, oid, 1);
+		if (commit)
+			clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
+	}
 	release_revisions(&revs);
 	return ret;
 }
-- 
gitgitgadget
* Re: [PATCH v5 08/12] bundle: properly clear all revision flags
  2022-10-12 12:52   ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget
@ 2022-10-12 16:17     ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-12 16:17 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The verify_bundle() method checks two things for a bundle's
> prerequisites:
>
>  1. Are these objects in the object store?
>  2. Are these objects reachable from our references?
>
> In this second question, multiple uses of verify_bundle() in the same
> process can report an invalid bundle even though it is correct. The
> reason is due to not clearing all of the commit marks on the commits
> previously walked.
> ...
> Moving this loop to the cleanup step alone would cause a segfault when
> running 'git bundle verify' outside of a repository, but this is because
> of that error condition using "goto cleanup" when returning is perfectly
> safe. Nothing has been initialized at that point, so we can return
> immediately without causing any leaks.

Nicely analyzed. The implementation clearly follows the design
described above. Much better than the previous iteration.

Thanks.
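The failure mode analyzed in patch 08 can be reproduced in a few lines. What follows is a simplified model, not Git's implementation: the bit values, `struct commit`, `count_missing()`, and `clear_marks()` are all stand-ins chosen for illustration, and the "walk" is a flat array rather than a real revision traversal. It assumes, as in verify_bundle(), that the walk stops once the expected number of marked commits has been seen and that any prerequisite not shown by the walk is reported as missing.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; Git's real flag definitions differ. */
#define SHOWN       0x01u /* set by the walk on each visited commit */
#define WALK_FLAGS  0x0fu /* stand-in for ALL_REV_FLAGS (includes SHOWN) */
#define PREREQ_MARK 0x10u /* stand-in for the bundle-private mark */

struct commit {
	unsigned flags;
};

/*
 * Mark the prerequisites, walk until 'remaining' marked commits have
 * been seen, then count prerequisites the walk never reached.
 */
static size_t count_missing(struct commit **prereqs, size_t nr_prereqs,
			    struct commit **walk, size_t nr_walk)
{
	size_t i, remaining = nr_prereqs, missing = 0;

	for (i = 0; i < nr_prereqs; i++)
		prereqs[i]->flags |= PREREQ_MARK;

	/* A stale PREREQ_MARK makes this loop stop too early. */
	for (i = 0; remaining && i < nr_walk; i++) {
		walk[i]->flags |= SHOWN;
		if (walk[i]->flags & PREREQ_MARK)
			remaining--;
	}

	for (i = 0; i < nr_prereqs; i++)
		if (!(prereqs[i]->flags & SHOWN))
			missing++;
	return missing;
}

/* Clear the bits in 'mask' from every commit in the array. */
static void clear_marks(struct commit **commits, size_t nr, unsigned mask)
{
	size_t i;

	for (i = 0; i < nr; i++)
		commits[i]->flags &= ~mask;
}
```

Clearing with `WALK_FLAGS` alone leaves `PREREQ_MARK` set on the first run's prerequisite. A second run with a different prerequisite then decrements its counter on the stale commit, stops the walk early, and reports a reachable commit as missing. Clearing with `WALK_FLAGS | PREREQ_MARK` avoids that.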
* [PATCH v5 09/12] bundle-uri: fetch a list of bundles 2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2022-10-12 12:52 ` [PATCH v5 08/12] bundle: properly clear all revision flags Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 ` Derrick Stolee via GitGitGadget 2022-10-26 19:06 ` Junio C Hamano 2022-10-12 12:52 ` [PATCH v5 10/12] bundle: add flags to verify_bundle() Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 12 siblings, 1 reply; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. 
By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- bundle-uri.c | 203 ++++++++++++++++++++++++++--- bundle-uri.h | 13 ++ t/t5558-clone-bundle-uri.sh | 248 ++++++++++++++++++++++++++++++++++++ 3 files changed, 448 insertions(+), 16 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 8a7c11c6393..70bfd2defee 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -37,6 +37,8 @@ static int clear_remote_bundle_info(struct remote_bundle_info *bundle, { FREE_AND_NULL(bundle->id); FREE_AND_NULL(bundle->uri); + FREE_AND_NULL(bundle->file); + bundle->unbundled = 0; return 0; } @@ -334,18 +336,117 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } +struct bundle_list_context { + struct repository *r; + struct bundle_list *list; + enum bundle_list_mode mode; + int count; + int depth; +}; + +/* + * This early definition is necessary because we use indirect recursion: + * + * While iterating through a bundle list that was downloaded as part + * of fetch_bundle_uri_internal(), iterator methods eventually call it + * again, but with depth + 1. 
+ */ +static int fetch_bundle_uri_internal(struct repository *r, + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list); + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ + int res; + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + + res = fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); + + /* + * Only increment count if the download succeeded. If our mode is + * BUNDLE_MODE_ANY, then we will want to try other URIs in the + * list in case they work instead. + */ + if (!res) + ctx->count++; + + /* + * To be opportunistic as possible, we continue iterating and + * download as many bundles as we can, so we can apply the ones + * that work, even in BUNDLE_MODE_ALL mode. + */ + return 0; +} + +static int download_bundle_list(struct repository *r, + struct bundle_list *local_list, + struct bundle_list *global_list, + int depth) +{ + struct bundle_list_context ctx = { + .r = r, + .list = global_list, + .depth = depth + 1, + .mode = local_list->mode, + }; + + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); +} + +static int fetch_bundle_list_in_config_format(struct repository *r, + struct bundle_list *global_list, + struct remote_bundle_info *bundle, + int depth) +{ + int result; + struct bundle_list list_from_bundle; + + init_bundle_list(&list_from_bundle); + + if ((result = bundle_uri_parse_config_format(bundle->uri, + bundle->file, + &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { + warning(_("unrecognized bundle mode from URI '%s'"), + bundle->uri); + result = -1; + goto cleanup; + } + + if ((result = download_bundle_list(r, &list_from_bundle, + global_list, depth))) + goto cleanup; + +cleanup: + clear_bundle_list(&list_from_bundle); + return result; +} + /** * This limits the recursion on fetch_bundle_uri_internal() when following * bundle lists. 
*/ static int max_bundle_uri_depth = 4; +/** + * Recursively download all bundles advertised at the given URI + * to files. If the file is a bundle, then add it to the given + * 'list'. Otherwise, expect a bundle list and recurse on the + * URIs in that list according to the list mode (ANY or ALL). + */ static int fetch_bundle_uri_internal(struct repository *r, - const char *uri, - int depth) + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list) { int result = 0; - char *filename; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { warning(_("exceeded bundle URI recursion limit (%d)"), @@ -353,36 +454,106 @@ static int fetch_bundle_uri_internal(struct repository *r, return -1; } - if (!(filename = find_temp_filename())) { + if (!bundle->file && + !(bundle->file = find_temp_filename())) { result = -1; goto cleanup; } - if ((result = copy_uri_to_file(filename, uri))) { - warning(_("failed to download bundle from URI '%s'"), uri); + if ((result = copy_uri_to_file(bundle->file, bundle->uri))) { + warning(_("failed to download bundle from URI '%s'"), bundle->uri); goto cleanup; } - if ((result = !is_bundle(filename, 0))) { - warning(_("file at URI '%s' is not a bundle"), uri); + if ((result = !is_bundle(bundle->file, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); + if (result) + warning(_("file at URI '%s' is not a bundle or bundle list"), + bundle->uri); goto cleanup; } - if ((result = unbundle_from_file(r, filename))) { - warning(_("failed to unbundle bundle from URI '%s'"), uri); - goto cleanup; - } + /* Copy the bundle and insert it into the global list. 
*/ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); + bcopy->file = xstrdup(bundle->file); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: - if (filename) - unlink(filename); - free(filename); + if (result && bundle->file) + unlink(bundle->file); return result; } +/** + * This loop iterator breaks the loop with nonzero return code on the + * first successful unbundling of a bundle. + */ +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ + struct repository *r = data; + + if (!info->file || info->unbundled) + return 0; + + if (!unbundle_from_file(r, info->file)) { + info->unbundled = 1; + return 1; + } + + return 0; +} + +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. + * + * Keep in mind that a non-zero result for the loop here means + * the loop terminated early on a successful unbundling, which + * signals that we can try again. + */ + while (for_all_bundles_in_list(list, attempt_unbundle, r)) ; + + return 0; +} + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ + if (info->file) + unlink_or_warn(info->file); + return 0; +} + int fetch_bundle_uri(struct repository *r, const char *uri) { - return fetch_bundle_uri_internal(r, uri, 0); + int result; + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), + .id = xstrdup(""), + }; + + init_bundle_list(&list); + + /* If a bundle is added to this global list, then it is required. 
*/ + list.mode = BUNDLE_MODE_ALL; + + if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list))) + goto cleanup; + + result = unbundle_all_bundles(r, &list); + +cleanup: + for_all_bundles_in_list(&list, unlink_bundle, NULL); + clear_bundle_list(&list); + clear_remote_bundle_info(&bundle, NULL); + return result; } /** diff --git a/bundle-uri.h b/bundle-uri.h index bc13d4c9929..4dbc269823c 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -28,6 +28,19 @@ struct remote_bundle_info { * if there was no table of contents. */ char *uri; + + /** + * If the bundle has been downloaded, then 'file' is a + * filename storing its contents. Otherwise, 'file' is + * NULL. + */ + char *file; + + /** + * If the bundle has been unbundled successfully, then + * this boolean is true. + */ + unsigned unbundled:1; }; #define REMOTE_BUNDLE_INFO_INIT { 0 } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index ad666a2d28a..a86dc04f528 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -41,6 +41,195 @@ test_expect_success 'clone with file:// bundle' ' test_cmp expect actual ' +# To get interesting tests for bundle lists, we need to construct a +# somewhat-interesting commit history. 
+# +# ---------------- bundle-4 +# +# 4 +# / \ +# ----|---|------- bundle-3 +# | | +# | 3 +# | | +# ----|---|------- bundle-2 +# | | +# 2 | +# | | +# ----|---|------- bundle-1 +# \ / +# 1 +# | +# (previous commits) +test_expect_success 'construct incremental bundle list' ' + ( + cd clone-from && + git checkout -b base && + test_commit 1 && + git checkout -b left && + test_commit 2 && + git checkout -b right base && + test_commit 3 && + git checkout -b merge left && + git merge right -m "4" && + + git bundle create bundle-1.bundle base && + git bundle create bundle-2.bundle base..left && + git bundle create bundle-3.bundle base..right && + git bundle create bundle-4.bundle merge --not left right + ) +' + +test_expect_success 'clone bundle list (file, no heuristic)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + [bundle "bundle-3"] + uri = file://$(pwd)/clone-from/bundle-3.bundle + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-file cat-file --batch-check <oids && + + git -C clone-list-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, all mode, some failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + # Does not exist. Should be skipped. 
+ [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + [bundle "bundle-2"] + uri = file://$(pwd)/clone-from/bundle-2.bundle + + # No bundle-3 means bundle-4 will not apply. + + [bundle "bundle-4"] + uri = file://$(pwd)/clone-from/bundle-4.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + GIT_TRACE2_PERF=1 \ + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-some cat-file --batch-check <oids && + + git -C clone-all-some for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, all mode, all failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = all + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-all-fail cat-file --batch-check <oids && + + git -C clone-all-fail for-each-ref --format="%(refname)" >refs && + ! grep "refs/bundles/" refs +' + +test_expect_success 'clone bundle list (file, any mode)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = file://$(pwd)/clone-from/bundle-0.bundle + + [bundle "bundle-1"] + uri = file://$(pwd)/clone-from/bundle-1.bundle + + # Does not exist. Should be skipped. 
+ [bundle "bundle-5"] + uri = file://$(pwd)/clone-from/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-file cat-file --batch-check <oids && + + git -C clone-any-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + EOF + test_cmp expect actual +' + +test_expect_success 'clone bundle list (file, any mode, all failures)' ' + cat >bundle-list <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-fail cat-file --batch-check <oids && + + git -C clone-any-fail for-each-ref --format="%(refname)" >refs && + ! 
grep "refs/bundles/" refs +' + ######################################################################### # HTTP tests begin here @@ -75,6 +264,65 @@ test_expect_success 'clone HTTP bundle' ' test_config -C clone-http log.excludedecoration refs/bundle/ ' +test_expect_success 'clone bundle list (HTTP, no heuristic)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + [bundle "bundle-2"] + uri = $HTTPD_URL/bundle-2.bundle + + [bundle "bundle-3"] + uri = $HTTPD_URL/bundle-3.bundle + + [bundle "bundle-4"] + uri = $HTTPD_URL/bundle-4.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-list-http cat-file --batch-check <oids +' + +test_expect_success 'clone bundle list (HTTP, any mode)' ' + cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" && + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = any + + # Does not exist. Should be skipped. + [bundle "bundle-0"] + uri = $HTTPD_URL/bundle-0.bundle + + [bundle "bundle-1"] + uri = $HTTPD_URL/bundle-1.bundle + + # Does not exist. Should be skipped. + [bundle "bundle-5"] + uri = $HTTPD_URL/bundle-5.bundle + EOF + + git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http && + git -C clone-from for-each-ref --format="%(objectname)" >oids && + git -C clone-any-http cat-file --batch-check <oids && + + git -C clone-list-file for-each-ref --format="%(refname)" >refs && + grep "refs/bundles/" refs >actual && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect actual +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist. 
-- gitgitgadget ^ permalink raw reply related [flat|nested] 94+ messages in thread
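The retry loop in unbundle_all_bundles() is a fixed-point iteration: keep scanning the list, apply any bundle whose prerequisites are now satisfied, and stop when a full pass applies nothing. The sketch below models that idea with bitmasks. It is an illustration under stated assumptions, not the patch's code: `struct bundle`, `attempt_one()`, and `unbundle_all()` are hypothetical names, and prerequisite checking is reduced to a mask comparison instead of a call to unbundle_from_file().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each commit is one bit in a mask. */
struct bundle {
	unsigned provides; /* commits this bundle adds */
	unsigned requires; /* prerequisite commits */
	int unbundled;
};

/*
 * Apply the first not-yet-applied bundle whose prerequisites are
 * present in '*have'. Return 1 if one was applied, which tells the
 * caller that another pass may succeed; return 0 otherwise.
 */
static int attempt_one(struct bundle *list, size_t nr, unsigned *have)
{
	size_t i;

	assert(list && have);

	for (i = 0; i < nr; i++) {
		struct bundle *b = &list[i];

		if (b->unbundled || (b->requires & ~*have))
			continue;
		*have |= b->provides;
		b->unbundled = 1;
		return 1;
	}
	return 0;
}

/* Repeat until a pass makes no progress; return how many were applied. */
static size_t unbundle_all(struct bundle *list, size_t nr, unsigned *have)
{
	size_t applied = 0;

	while (attempt_one(list, nr, have))
		applied++;
	return applied;
}
```

Modeling the test topology (bundle-1 provides commit 1, bundle-2 and bundle-3 each need commit 1, bundle-4 needs commits 2 and 3), the loop applies all four bundles even when the list is given in reverse order. If bundle-3 is left out, only bundle-1 and bundle-2 apply, which corresponds to the "all mode, some failures" test expecting only refs/bundles/base and refs/bundles/left.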
* Re: [PATCH v5 09/12] bundle-uri: fetch a list of bundles
  2022-10-12 12:52   ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-10-26 19:06     ` Junio C Hamano
  0 siblings, 0 replies; 94+ messages in thread
From: Junio C Hamano @ 2022-10-26 19:06 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo,
	Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int fetch_bundle_list_in_config_format(struct repository *r,
> +					      struct bundle_list *global_list,
> +					      struct remote_bundle_info *bundle,
> +					      int depth)
> +{
> +	int result;
> +	struct bundle_list list_from_bundle;
> +
> +	init_bundle_list(&list_from_bundle);
> +
> +	if ((result = bundle_uri_parse_config_format(bundle->uri,
> +						     bundle->file,
> +						     &list_from_bundle)))
> +		goto cleanup;

It makes us a bit nervous to apply the config parser directly on
data controlled by a third-party. bundle_uri_parse_config_format()
hopefully is careful enough to avoid including other local files and
call generic callbacks to affect the actual configuration used by
the process. It seems bundle_list_update() discards everything it
does not (care to) understand, and safe to call from
config_to_bundle_list(), which in turn is called from here.

OK.
* [PATCH v5 10/12] bundle: add flags to verify_bundle() 2022-10-12 12:52 ` [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2022-10-12 12:52 ` [PATCH v5 09/12] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 ` Derrick Stolee via GitGitGadget 2022-10-12 12:52 ` [PATCH v5 11/12] bundle-uri: quiet failed unbundlings Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 12 siblings, 0 replies; 94+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC (permalink / raw) To: git Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The verify_bundle() method has a 'verbose' option, but we will want to extend this method to have more granular control over its output. First, replace this 'verbose' option with a new 'flags' option with a single possible value: VERIFY_BUNDLE_VERBOSE. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- builtin/bundle.c | 5 +++-- bundle-uri.c | 7 ++++++- bundle.c | 9 +++++---- bundle.h | 14 ++++++++++++-- transport.c | 2 +- 5 files changed, 27 insertions(+), 10 deletions(-) diff --git a/builtin/bundle.c b/builtin/bundle.c index 2adad545a2e..7d983a238f0 100644 --- a/builtin/bundle.c +++ b/builtin/bundle.c @@ -119,7 +119,8 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { goto cleanup; } close(bundle_fd); - if (verify_bundle(the_repository, &header, !quiet)) { + if (verify_bundle(the_repository, &header, + quiet ? 
0 : VERIFY_BUNDLE_VERBOSE)) {
 		ret = 1;
 		goto cleanup;
 	}
@@ -185,7 +186,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args) ||
+			 &extra_index_pack_args, 0) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:
diff --git a/bundle-uri.c b/bundle-uri.c
index 70bfd2defee..d9060be707e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -303,7 +303,12 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
 		return 1;
 
-	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+	/*
+	 * Skip the reachability walk here, since we will be adding
+	 * a reachable ref pointing to the new tips, which will reach
+	 * the prerequisite commits.
+	 */
+	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index c277f3b9360..1f6a7f782e1 100644
--- a/bundle.c
+++ b/bundle.c
@@ -189,7 +189,7 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
-		  int verbose)
+		  enum verify_bundle_flags flags)
 {
 	/*
 	 * Do fast check, then if any prereqs are missing then go line by line
@@ -248,7 +248,7 @@ int verify_bundle(struct repository *r,
 		error("%s %s", oid_to_hex(oid), name);
 	}
 
-	if (verbose) {
+	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;
 
 		r = &header->references;
@@ -617,7 +617,8 @@ err:
 }
 
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args)
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
@@ -631,7 +632,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
 		strvec_clear(extra_index_pack_args);
 	}
 
-	if (verify_bundle(r, header, 0))
+	if (verify_bundle(r, header, flags))
 		return -1;
 	ip.in = bundle_fd;
 	ip.no_stdout = 1;
diff --git a/bundle.h b/bundle.h
index 0c052f54964..6652e819981 100644
--- a/bundle.h
+++ b/bundle.h
@@ -29,7 +29,13 @@ int read_bundle_header_fd(int fd, struct bundle_header *header,
 int create_bundle(struct repository *r, const char *path,
 		  int argc, const char **argv, struct strvec *pack_options,
 		  int version);
-int verify_bundle(struct repository *r, struct bundle_header *header, int verbose);
+
+enum verify_bundle_flags {
+	VERIFY_BUNDLE_VERBOSE = (1 << 0),
+};
+
+int verify_bundle(struct repository *r, struct bundle_header *header,
+		  enum verify_bundle_flags flags);
 
 /**
  * Unbundle after reading the header with read_bundle_header().
@@ -40,9 +46,13 @@ int verify_bundle(struct repository *r, struct bundle_header *header, int verbos
  * Provide "extra_index_pack_args" to pass any extra arguments
  * (e.g. "-v" for verbose/progress), NULL otherwise. The provided
  * "extra_index_pack_args" (if any) will be strvec_clear()'d for you.
+ *
+ * Before unbundling, this method will call verify_bundle() with the
+ * given 'flags'.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
-	     int bundle_fd, struct strvec *extra_index_pack_args);
+	     int bundle_fd, struct strvec *extra_index_pack_args,
+	     enum verify_bundle_flags flags);
 int list_bundle_refs(struct bundle_header *header,
 		int argc, const char **argv);
 
diff --git a/transport.c b/transport.c
index 52db7a3cb09..c5d3042731a 100644
--- a/transport.c
+++ b/transport.c
@@ -178,7 +178,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args);
+		       &extra_index_pack_args, 0);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
gitgitgadget

* [PATCH v5 11/12] bundle-uri: quiet failed unbundlings
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. This is passed as
error messages to the user, even when they eventually succeed in later
attempts after their dependent bundles are unbundled.

Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisite commits, allowing callers to
unbundle() to notice that the bundle failed to apply.

Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/bundle.c            |  2 +-
 bundle-uri.c                |  3 ++-
 bundle.c                    | 10 ++++++++--
 bundle.h                    |  1 +
 t/t5558-clone-bundle-uri.sh | 25 ++++++++++++++++++++-----
 5 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 7d983a238f0..fd4586b09e0 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -120,7 +120,7 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 	}
 	close(bundle_fd);
 	if (verify_bundle(the_repository, &header,
-			  quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) {
+			  quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) {
 		ret = 1;
 		goto cleanup;
 	}
diff --git a/bundle-uri.c b/bundle-uri.c
index d9060be707e..d872acf5ab0 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -308,7 +308,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * a reachable ref pointing to the new tips, which will reach
 	 * the prerequisite commits.
 	 */
-	if ((result = unbundle(r, &header, bundle_fd, NULL, 0)))
+	if ((result = unbundle(r, &header, bundle_fd, NULL,
+			       VERIFY_BUNDLE_QUIET)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 1f6a7f782e1..4ef7256aa11 100644
--- a/bundle.c
+++ b/bundle.c
@@ -216,7 +216,10 @@ int verify_bundle(struct repository *r,
 			add_pending_object(&revs, o, name);
 			continue;
 		}
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
@@ -243,7 +246,10 @@ int verify_bundle(struct repository *r,
 		assert(o); /* otherwise we'd have returned early */
 		if (o->flags & SHOWN)
 			continue;
-		if (++ret == 1)
+		ret++;
+		if (flags & VERIFY_BUNDLE_QUIET)
+			continue;
+		if (ret == 1)
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
diff --git a/bundle.h b/bundle.h
index 6652e819981..575c34245d1 100644
--- a/bundle.h
+++ b/bundle.h
@@ -32,6 +32,7 @@ int create_bundle(struct repository *r, const char *path,
 
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
+	VERIFY_BUNDLE_QUIET = (1 << 1),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a86dc04f528..9b159078386 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -99,7 +99,10 @@ test_expect_success 'clone bundle list (file, no heuristic)' '
 		uri = file://$(pwd)/clone-from/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-list-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-list-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-file cat-file --batch-check <oids &&
 
@@ -141,7 +144,10 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	EOF
 
 	GIT_TRACE2_PERF=1 \
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-some &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-some 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
 
@@ -169,7 +175,10 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-all-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-all-fail 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
 
@@ -195,7 +204,10 @@ test_expect_success 'clone bundle list (file, any mode)' '
 		uri = file://$(pwd)/clone-from/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-file &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-file 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-file cat-file --batch-check <oids &&
 
@@ -284,7 +296,10 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-list-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-list-http 2>err &&
+	! grep "Repository lacks these prerequisite commits" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-http cat-file --batch-check <oids
 '
-- 
gitgitgadget

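The counting-versus-reporting split that the patch above introduces can be illustrated outside of Git. The following is only a minimal sketch: the struct, the function, and the SKETCH_* names are hypothetical stand-ins and not part of bundle.c; only the flag values mirror the enum in the diff. It shows that a "quiet" flag can suppress the per-commit messages while leaving the returned count of missing prerequisites unchanged.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flags; the bit values mirror enum verify_bundle_flags. */
enum sketch_verify_flags {
	SKETCH_VERIFY_VERBOSE = (1 << 0),
	SKETCH_VERIFY_QUIET = (1 << 1),
};

/* Hypothetical model of one prerequisite commit. */
struct sketch_prereq {
	const char *name; /* label to print, e.g. an object id */
	int present;      /* nonzero if the repository already has it */
};

/*
 * Return the number of missing prerequisites. Unless
 * SKETCH_VERIFY_QUIET is set, also report each missing one on
 * stderr, printing a header line before the first.
 */
static int sketch_count_missing(const struct sketch_prereq *list,
				size_t nr, unsigned flags)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		if (list[i].present)
			continue;
		ret++; /* always counted, even when quiet */
		if (flags & SKETCH_VERIFY_QUIET)
			continue;
		if (ret == 1)
			fprintf(stderr, "error: Repository lacks these prerequisite commits:\n");
		fprintf(stderr, "error: %s\n", list[i].name);
	}
	return ret;
}
```

A caller that passes SKETCH_VERIFY_QUIET still gets a nonzero return for an incomplete repository, so it can treat the attempt as failed and retry later without anything having been printed.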
* Re: [PATCH v5 11/12] bundle-uri: quiet failed unbundlings
From: Junio C Hamano @ 2022-10-12 16:32 UTC
To: Derrick Stolee via GitGitGadget
Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When downloading a list of bundles in "all" mode, Git has no
> understanding of the dependencies between the bundles. Git attempts to
> unbundle the bundles in some order, but some may not pass the
> verify_bundle() step because of missing prerequisites. This is passed as
> error messages to the user, even when they eventually succeed in later
> attempts after their dependent bundles are unbundled.
>
> Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
> error messages from the missing prerequisite commits. The method still
> returns the number of missing prerequisite commits, allowing callers to
> unbundle() to notice that the bundle failed to apply.
>
> Use this flag in bundle-uri.c and test that the messages go away for
> 'git clone --bundle-uri' commands.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

Interesting that we ended up with <quiet, normal, verbose> verbosity
levels, but "bundle verify --verbose" does not (have to) exist, as
that is the default, and 0 (aka "normal") is no longer used to call
verify_bundle() by anybody.

I actually was wondering that with SKIP_REACHABLE gone, we would
lose the "enum verify_bundle_flags" altogether, without the need for
a new "quiet" option.  But that would not work as unbundle() calls
verify_bundle() and callers of unbundle() do not necessarily want
the verification step to squelch errors.

So looks good overall.

Thanks.

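The "all" mode behavior described in the quoted commit message (bundles tried in some order, with failures that may succeed on a later attempt) can be sketched as a loop that repeats while it makes progress. This is an illustration under stated assumptions, not the code in bundle-uri.c: the struct and function names are hypothetical, and each bundle is modeled as depending on at most one other bundle.

```c
#include <stddef.h>

/* Hypothetical model of a bundle in a list. */
struct sketch_bundle {
	int prereq;  /* index of the bundle that must be applied first, or -1 */
	int applied; /* set to 1 once the bundle has been applied */
};

/*
 * Try every bundle that has not been applied yet, and repeat the
 * whole pass as long as the previous pass applied at least one.
 * Returns the total number of bundles applied.
 */
static int sketch_unbundle_all(struct sketch_bundle *list, size_t nr)
{
	int total = 0;
	int progress;

	do {
		progress = 0;
		for (size_t i = 0; i < nr; i++) {
			struct sketch_bundle *b = &list[i];

			if (b->applied)
				continue;
			/*
			 * A missing prerequisite is not fatal here: a later
			 * pass may find it satisfied, which is why reporting
			 * it as an error on every attempt would be noisy.
			 */
			if (b->prereq >= 0 && !list[b->prereq].applied)
				continue;
			b->applied = 1;
			progress = 1;
			total++;
		}
	} while (progress);

	return total;
}
```

With a list ordered so that each bundle depends on the next one, the loop needs one pass per bundle, and every pass except the last would have produced a failure for the bundles still waiting. Those are the intermediate failures the quiet flag is meant to hide.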
* [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https
From: Derrick Stolee via GitGitGadget @ 2022-10-12 12:52 UTC
To: git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  1 +
 t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index d872acf5ab0..79a914f961b 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -230,6 +230,7 @@ static int download_https_uri_to_file(const char *file, const char *uri)
 	int found_get = 0;
 
 	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.err = -1;
 	cp.in = -1;
 	cp.out = -1;
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9b159078386..9155f31fa2c 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -147,6 +147,8 @@ test_expect_success 'clone bundle list (file, all mode, some failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-some 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-some cat-file --batch-check <oids &&
@@ -178,6 +180,8 @@ test_expect_success 'clone bundle list (file, all mode, all failures)' '
 	git clone --bundle-uri="file://$(pwd)/bundle-list" \
 		clone-from clone-all-fail 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-all-fail cat-file --batch-check <oids &&
@@ -234,7 +238,11 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="file://$(pwd)/bundle-list" clone-from clone-any-fail &&
+	git clone --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from clone-any-fail 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-fail cat-file --batch-check <oids &&
 
@@ -323,7 +331,11 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 		uri = $HTTPD_URL/bundle-5.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" clone-from clone-any-http &&
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-any-http 2>err &&
+	! grep "fatal" err &&
+	grep "warning: failed to download bundle from URI" err &&
+
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-any-http cat-file --batch-check <oids &&
 
-- 
gitgitgadget

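In Git's run-command API, setting the 'err' field of a child_process to -1 asks start_command() to allocate a pipe for the child's stderr, so the child's messages no longer go straight to the terminal. The same idea can be sketched with plain POSIX calls. The code below is not Git's implementation: the function name, the simulated "fatal:" message, the exit code, and the parent's warning text are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that fails the way a helper process might, with its
 * stderr redirected into a pipe instead of being passed through.
 * Whatever the child wrote is stored, NUL-terminated and truncated
 * if needed, in 'buf'. Returns the child's exit code, or -1 if the
 * setup failed.
 */
static int sketch_run_quiet_child(char *buf, size_t buflen)
{
	static const char msg[] = "fatal: unable to access URI\n";
	char tmp[256];
	size_t used = 0;
	ssize_t n;
	int fds[2];
	int status;
	pid_t pid;

	if (!buflen || pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (!pid) {
		/* Child: point stderr at the pipe, complain, and fail. */
		close(fds[0]);
		if (dup2(fds[1], STDERR_FILENO) < 0)
			_exit(127);
		close(fds[1]);
		if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
			_exit(127);
		_exit(128);
	}

	/* Parent: drain the pipe so the child cannot block on it. */
	close(fds[1]);
	while ((n = read(fds[0], tmp, sizeof(tmp))) > 0) {
		size_t room = buflen - 1 - used;
		size_t take = (size_t)n < room ? (size_t)n : room;

		memcpy(buf + used, tmp, take);
		used += take;
	}
	buf[used] = '\0';
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	/* The parent decides how loudly to report the failure. */
	if (WEXITSTATUS(status))
		fprintf(stderr, "warning: child failed; continuing without it\n");
	return WEXITSTATUS(status);
}
```

Because the parent holds the captured text, it could also inspect it before deciding what to show, which is one way to approach squelching only some classes of errors.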
* Re: [PATCH v5 12/12] bundle-uri: suppress stderr from remote-https
From: Junio C Hamano @ 2022-10-26 18:54 UTC
To: Derrick Stolee via GitGitGadget
Cc: git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When downloading bundles from a git-remote-https subprocess, the bundle
> URI logic wants to be opportunistic and download as much as possible and
> work with what did succeed. This is particularly important in the "any"
> mode, where any single bundle success will work.
>
> If the URI is not available, the git-remote-https process will die()
> with a "fatal:" error message, even though that error is not actually
> fatal to the super process. Since stderr is passed through, it looks
> like a fatal error to the user.
>
> Suppress stderr to avoid these errors from bubbling to the surface. The
> bundle URI API adds its own warning() messages on these failures.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                |  1 +
>  t/t5558-clone-bundle-uri.sh | 16 ++++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)

So this is the same in spirit as [11/12] to squelch errors from an
action that we are prepared to fail.  If we had an easy way to
squelch only one class of errors (e.g. the resource no longer exists
at the URI) while allowing others to pass (e.g. we downloaded but it
was corrupt), that might be even better when somebody is debugging
the thing, but to an end user, it is a hopefully ignorable failure
either way, as long as there are other alternatives in the set of
bundles in the "any" mode.

* Re: [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists
From: Derrick Stolee @ 2022-10-26 14:34 UTC
To: Derrick Stolee via GitGitGadget, git
Cc: gitster, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

On 10/12/2022 8:52 AM, Derrick Stolee via GitGitGadget wrote:
> This is the third series building the bundle URI feature. It is built on top
> of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=<uri>' where
> <uri> is a URI to a bundle file. This series adds the capability of downloading
> and parsing a bundle list and then downloading the URIs in that list.
>
> The core functionality of bundle lists is implemented by creating data
> structures from a list of key-value pairs. These pairs can come from a
> plain-text file in Git config format, but in the future, we will support the
> list being supplied by packet lines over Git's protocol v2 in the
> 'bundle-uri' command (reserved for the next series).

This version has been available for a while now without comment. Could
we consider it for merging to 'next' soon?

I want to wait for this series to merge into 'master' before sending
part IV on top, which advertises bundle URIs over protocol v2.

Thanks,
-Stolee

* Re: [PATCH v5 00/12] Bundle URIs III: Parse and download from bundle lists
From: Junio C Hamano @ 2022-10-26 16:06 UTC
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab, mjcheetham, steadmon, Glen Choo, Jonathan Tan, Teng Long

Derrick Stolee <derrickstolee@github.com> writes:

> This version has been available for a while now without comment. Could
> we consider it for merging to 'next' soon?

Could somebody who has reviewed it fully give an Ack (or two)?

I know earlier rounds had some comments, but after v3 things
have quieted down.  I know the change from v4 to v5 has good
improvements, but do not claim to have read the other parts in
detail.

Thanks.