git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Bundle URIs II: git clone --bundle-uri
@ 2022-07-25 20:34 Derrick Stolee via GitGitGadget
  2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee

This is the second series building the bundle URI feature as discussed in
the previous series that added the design document [1]. This series does not
modify the design document, so the patches are independent and can be
applied to the latest 'master'.

[1]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

This series brings in just enough logic that we can bootstrap clones from a
single bundle using git clone --bundle-uri=<X>.

 * Patch 1 adds a 'get' capability to 'git remote-https' which allows
   downloading the contents of a URI to a local file.
 * Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It
   is not used until patch 3.
 * Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to
   bootstrap a clone from a bundle, but get the remaining objects from the
   origin URL. (As of this patch, it only accepts a filename.)
 * Patch 4 extends the git clone --bundle-uri=<X> option to allow file://
   and https:// URIs.
 * Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the
   same time in git clone.

As outlined in [1], the next steps after this are:

 1. Allow parsing a bundle list as a config file at the given URI. The
    key-value format is unified with the protocol v2 verb (coming in (3)).
    [2]
 2. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [3]
 3. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [4]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[2] https://github.com/derrickstolee/git/pull/20

[3] https://github.com/derrickstolee/git/pull/21

[4] https://github.com/derrickstolee/git/pull/22

Note: this series includes some code pulled out of the first series [1], and
in particular the git fetch --bundle-uri=<X> option is removed. The
intention was to replace that with git bundle fetch <X>, but that conflicts
with the work to refactor how subcommands are parsed. The git bundle fetch
subcommand could be added later for maximum flexibility on the client side,
but we can also move forward without it.

Thanks, -Stolee

P.S. Here is the range-diff compared to v2 of the previous bundle URI
series:

1:  d444042dc4dcc < -:  ------------- docs: document bundle URI standard
2:  0a2cf60437f78 ! 1:  40808e92afb7b remote-curl: add 'get' capability
    @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
      
     +static void parse_get(struct strbuf *buf)
     +{
    -+	struct http_get_options opts = { 0 };
     +	struct strbuf url = STRBUF_INIT;
     +	struct strbuf path = STRBUF_INIT;
     +	const char *p, *space;
    @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
     +	strbuf_add(&url, p, space - p);
     +	strbuf_addstr(&path, space + 1);
     +
    -+	if (http_get_file(url.buf, path.buf, &opts))
    ++	if (http_get_file(url.buf, path.buf, NULL))
     +		die(_("failed to download file at URL '%s'"), url.buf);
     +
     +	strbuf_release(&url);
    @@ t/t5557-http-get.sh (new)
     +	get $url file1
     +	EOF
     +
    -+	test_must_fail git remote-http $url $url <input 2>err &&
    ++	test_must_fail git remote-http $url <input 2>err &&
     +	test_path_is_missing file1 &&
     +	grep "failed to download file at URL" err &&
     +	rm file1.temp
    @@ t/t5557-http-get.sh (new)
     +
     +	EOF
     +
    -+	GIT_TRACE2_PERF=1 git remote-http $url $url <input &&
    ++	GIT_TRACE2_PERF=1 git remote-http $url <input &&
     +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
     +'
     +
     +test_done
    -
    - ## transport-helper.c ##
    -@@ transport-helper.c: struct helper_data {
    - 		check_connectivity : 1,
    - 		no_disconnect_req : 1,
    - 		no_private_update : 1,
    --		object_format : 1;
    -+		object_format : 1,
    -+		get : 1;
    - 
    - 	/*
    - 	 * As an optimization, the transport code may invoke fetch before
    -@@ transport-helper.c: static struct child_process *get_helper(struct transport *transport)
    - 			data->no_private_update = 1;
    - 		} else if (starts_with(capname, "object-format")) {
    - 			data->object_format = 1;
    -+		} else if (!strcmp(capname, "get")) {
    -+			data->get = 1;
    - 		} else if (mandatory) {
    - 			die(_("unknown mandatory capability %s; this remote "
    - 			      "helper probably needs newer version of Git"),
3:  abec47564fd9c ! 2:  7d3159f0d9a29 bundle-uri: create basic file-copy logic
    @@ Commit message
         file, not a bundle list. Bundle lists will be implemented in a future
         change.
     
    +    Note that the discovery of a temporary filename is slightly racy because
    +    the odb_mkstemp() relies on the temporary file not existing. With the
    +    current implementation being limited to file copies, we could replace
    +    the copy_file() with copy_fd(). The tricky part comes in future changes
    +    that send the filename to 'git remote-https' and its 'get' capability.
    +    At that point, we need the file descriptor closed _and_ the file
    +    unlinked. If we were to keep the file descriptor open for the sake of
    +    normal file copies, then we would pollute the rest of the code for
    +    little benefit. This is especially the case because we expect that most
    +    bundle URI use will be based on HTTPS instead of file copies.
    +
         Signed-off-by: Derrick Stolee <derrickstolee@github.com>
     
      ## Makefile ##
    @@ bundle-uri.c (new)
     +#include "refs.h"
     +#include "run-command.h"
     +
    -+static void find_temp_filename(struct strbuf *name)
    ++static int find_temp_filename(struct strbuf *name)
     +{
     +	int fd;
     +	/*
    @@ bundle-uri.c (new)
     +	 * racy, but unlikely to collide.
     +	 */
     +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
    -+	if (fd < 0)
    -+		die(_("failed to create temporary file"));
    ++	if (fd < 0) {
    ++		warning(_("failed to create temporary file"));
    ++		return -1;
    ++	}
    ++
     +	close(fd);
     +	unlink(name->buf);
    ++	return 0;
     +}
     +
    -+static int copy_uri_to_file(const char *uri, const char *file)
    ++static int copy_uri_to_file(const char *file, const char *uri)
     +{
    -+	/* Copy as a file */
    -+	return copy_file(file, uri, 0444);
    ++	/* File-based URIs only for now. */
    ++	return copy_file(file, uri, 0);
     +}
     +
     +static int unbundle_from_file(struct repository *r, const char *file)
    @@ bundle-uri.c (new)
     +	int result = 0;
     +	int bundle_fd;
     +	struct bundle_header header = BUNDLE_HEADER_INIT;
    -+	struct strvec extra_index_pack_args = STRVEC_INIT;
     +	struct string_list_item *refname;
     +	struct strbuf bundle_ref = STRBUF_INIT;
     +	size_t bundle_prefix_len;
    @@ bundle-uri.c (new)
     +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
     +		return 1;
     +
    -+	if ((result = unbundle(r, &header, bundle_fd, &extra_index_pack_args)))
    ++	if ((result = unbundle(r, &header, bundle_fd, NULL)))
     +		return 1;
     +
     +	/*
    @@ bundle-uri.c (new)
     +	int result = 0;
     +	struct strbuf filename = STRBUF_INIT;
     +
    -+	find_temp_filename(&filename);
    -+	if ((result = copy_uri_to_file(uri, filename.buf)))
    ++	if ((result = find_temp_filename(&filename)))
    ++		goto cleanup;
    ++
    ++	if ((result = copy_uri_to_file(filename.buf, uri))) {
    ++		warning(_("failed to download bundle from URI '%s'"), uri);
     +		goto cleanup;
    ++	}
     +
    -+	if ((result = !is_bundle(filename.buf, 0)))
    ++	if ((result = !is_bundle(filename.buf, 0))) {
    ++		warning(_("file at URI '%s' is not a bundle"), uri);
     +		goto cleanup;
    ++	}
     +
    -+	if ((result = unbundle_from_file(r, filename.buf)))
    ++	if ((result = unbundle_from_file(r, filename.buf))) {
    ++		warning(_("failed to unbundle bundle from URI '%s'"), uri);
     +		goto cleanup;
    ++	}
     +
     +cleanup:
     +	unlink(filename.buf);
4:  f6255ec518857 < -:  ------------- fetch: add --bundle-uri option
-:  ------------- > 3:  29e645a54ba7f clone: add --bundle-uri option
5:  bfbd11b48bf1b ! 4:  f6bc3177332e8 bundle-uri: add support for http(s):// and file://
    @@ Metadata
      ## Commit message ##
         bundle-uri: add support for http(s):// and file://
     
    -    The previous change created the 'git fetch --bundle-uri=<uri>' option.
    +    The previous change created the 'git clone --bundle-uri=<uri>' option.
         Currently, <uri> must be a filename.
     
         Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
    @@ Commit message
         Signed-off-by: Derrick Stolee <derrickstolee@github.com>
     
      ## bundle-uri.c ##
    -@@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
    - 	unlink(name->buf);
    +@@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
    + 	return 0;
      }
      
    -+static int download_https_uri_to_file(const char *uri, const char *file)
    -+{
    +-static int copy_uri_to_file(const char *file, const char *uri)
    ++static int download_https_uri_to_file(const char *file, const char *uri)
    + {
    +-	/* File-based URIs only for now. */
    +-	return copy_file(file, uri, 0);
     +	int result = 0;
     +	struct child_process cp = CHILD_PROCESS_INIT;
     +	FILE *child_in = NULL, *child_out = NULL;
    @@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
     +	return result;
     +}
     +
    - static int copy_uri_to_file(const char *uri, const char *file)
    - {
    ++static int copy_uri_to_file(const char *filename, const char *uri)
    ++{
     +	const char *out;
    ++
     +	if (skip_prefix(uri, "https:", &out) ||
     +	    skip_prefix(uri, "http:", &out))
    -+		return download_https_uri_to_file(uri, file);
    ++		return download_https_uri_to_file(filename, uri);
     +
     +	if (!skip_prefix(uri, "file://", &out))
     +		out = uri;
     +
    - 	/* Copy as a file */
    --	return copy_file(file, uri, 0444);
    -+	return !!copy_file(file, out, 0);
    ++	/* Copy as a file */
    ++	return copy_file(filename, out, 0);
      }
      
      static int unbundle_from_file(struct repository *r, const char *file)
     
    - ## t/t5558-fetch-bundle-uri.sh ##
    -@@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
    + ## t/t5558-clone-bundle-uri.sh ##
    +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
          test_cmp expect actual
      '
      
    -+test_expect_success 'fetch file:// bundle' '
    -+	git init fetch-file &&
    -+	git -C fetch-file fetch --bundle-uri="file://$(pwd)/fetch-from/B.bundle" &&
    -+	git -C fetch-file rev-parse refs/bundles/topic >actual &&
    -+	git -C fetch-from rev-parse topic >expect &&
    ++test_expect_success 'clone with file:// bundle' '
    ++	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
    ++		clone-from clone-file &&
    ++	git -C clone-file rev-parse refs/bundles/topic >actual &&
    ++	git -C clone-from rev-parse topic >expect &&
     +	test_cmp expect actual
     +'
     +
    @@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
     +start_httpd
     +
     +test_expect_success 'fail to fetch from non-existent HTTP URL' '
    -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/does-not-exist" 2>err &&
    ++	test_when_finished rm -rf test &&
    ++	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
     +	grep "failed to download bundle from URI" err
     +'
     +
     +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
    ++	test_when_finished rm -rf test &&
     +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
    -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/bogus" 2>err &&
    ++	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
     +	grep "is not a bundle" err
     +'
     +
    -+test_expect_success 'fetch HTTP bundle' '
    -+	cp fetch-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
    -+	git init fetch-http &&
    -+	git -C fetch-http fetch --bundle-uri="$HTTPD_URL/B.bundle" &&
    -+	git -C fetch-http rev-parse refs/bundles/topic >actual &&
    -+	git -C fetch-from rev-parse topic >expect &&
    -+	test_cmp expect actual
    ++test_expect_success 'clone HTTP bundle' '
    ++	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
    ++
    ++	git clone --no-local --mirror clone-from \
    ++		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
    ++
    ++	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
    ++		"$HTTPD_URL/smart/fetch.git" clone-http &&
    ++	git -C clone-http rev-parse refs/bundles/topic >actual &&
    ++	git -C clone-from rev-parse topic >expect &&
    ++	test_cmp expect actual &&
    ++
    ++	test_config -C clone-http log.excludedecoration refs/bundle/
     +'
     +
     +# Do not add tests here unless they use the HTTP server, as they will
6:  a217e9a0640b4 < -:  ------------- fetch: add 'refs/bundle/' to log.excludeDecoration
-:  ------------- > 5:  e823b168ab725 clone: --bundle-uri cannot be combined with --depth


Derrick Stolee (5):
  remote-curl: add 'get' capability
  bundle-uri: create basic file-copy logic
  clone: add --bundle-uri option
  bundle-uri: add support for http(s):// and file://
  clone: --bundle-uri cannot be combined with --depth

 Documentation/git-clone.txt         |   7 ++
 Documentation/gitremote-helpers.txt |   9 ++
 Makefile                            |   1 +
 builtin/clone.c                     |  18 +++
 bundle-uri.c                        | 168 ++++++++++++++++++++++++++++
 bundle-uri.h                        |  14 +++
 remote-curl.c                       |  32 ++++++
 t/t5557-http-get.sh                 |  37 ++++++
 t/t5558-clone-bundle-uri.sh         |  81 ++++++++++++++
 t/t5606-clone-options.sh            |   8 ++
 10 files changed, 375 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h
 create mode 100755 t/t5557-http-get.sh
 create mode 100755 t/t5558-clone-bundle-uri.sh


base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1300%2Fderrickstolee%2Fbundle-redo%2Fclone-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1300/derrickstolee/bundle-redo/clone-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1300
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/5] remote-curl: add 'get' capability
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
@ 2022-07-25 20:34 ` Derrick Stolee via GitGitGadget
  2022-07-27 22:08   ` Josh Steadmon
  2022-07-27 23:00   ` Ævar Arnfjörð Bjarmason
  2022-07-25 20:34 ` [PATCH 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.

Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/gitremote-helpers.txt |  9 +++++++
 remote-curl.c                       | 32 +++++++++++++++++++++++++
 t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+)
 create mode 100755 t/t5557-http-get.sh

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 6f1e269ae43..ed8da428c98 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
 	Can guarantee that when a clone is requested, the received
 	pack is self contained and is connected.
 
+'get'::
+	Can use the 'get' command to download a file from a given URI.
+
 If a helper advertises 'connect', Git will use it if possible and
 fall back to another capability if the helper requests so when
 connecting (see the 'connect' command under COMMANDS).
@@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
 +
 Supported if the helper has the "stateless-connect" capability.
 
+'get' <uri> <path>::
+	Downloads the file from the given `<uri>` to the given `<path>`. If
+	`<path>.temp` exists, then Git assumes that the `.temp` file is a
+	partial download from a previous attempt and will resume the
+	download from that position.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/remote-curl.c b/remote-curl.c
index b8758757ece..73fbdbddd84 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1286,6 +1286,33 @@ static void parse_fetch(struct strbuf *buf)
 	strbuf_reset(buf);
 }
 
+static void parse_get(struct strbuf *buf)
+{
+	struct strbuf url = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	const char *p, *space;
+
+	if (!skip_prefix(buf->buf, "get ", &p))
+		die(_("http transport does not support %s"), buf->buf);
+
+	space = strchr(p, ' ');
+
+	if (!space)
+		die(_("protocol error: expected '<url> <path>', missing space"));
+
+	strbuf_add(&url, p, space - p);
+	strbuf_addstr(&path, space + 1);
+
+	if (http_get_file(url.buf, path.buf, NULL))
+		die(_("failed to download file at URL '%s'"), url.buf);
+
+	strbuf_release(&url);
+	strbuf_release(&path);
+	printf("\n");
+	fflush(stdout);
+	strbuf_reset(buf);
+}
+
 static int push_dav(int nr_spec, const char **specs)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1564,9 +1591,14 @@ int cmd_main(int argc, const char **argv)
 				printf("unsupported\n");
 			fflush(stdout);
 
+		} else if (skip_prefix(buf.buf, "get ", &arg)) {
+			parse_get(&buf);
+			fflush(stdout);
+
 		} else if (!strcmp(buf.buf, "capabilities")) {
 			printf("stateless-connect\n");
 			printf("fetch\n");
+			printf("get\n");
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
new file mode 100755
index 00000000000..50b7dbcf957
--- /dev/null
+++ b/t/t5557-http-get.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='test downloading a file by URL'
+
+. ./test-lib.sh
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'get by URL: 404' '
+	url="$HTTPD_URL/none.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file1
+	EOF
+
+	test_must_fail git remote-http $url <input 2>err &&
+	test_path_is_missing file1 &&
+	grep "failed to download file at URL" err &&
+	rm file1.temp
+'
+
+test_expect_success 'get by URL: 200' '
+	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
+
+	url="$HTTPD_URL/exists.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file2
+
+	EOF
+
+	GIT_TRACE2_PERF=1 git remote-http $url <input &&
+	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/5] bundle-uri: create basic file-copy logic
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
@ 2022-07-25 20:34 ` Derrick Stolee via GitGitGadget
  2022-07-27 22:09   ` Josh Steadmon
  2022-07-25 20:34 ` [PATCH 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.

For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.

Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile     |   1 +
 bundle-uri.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h |  14 +++++++
 3 files changed, 119 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h

diff --git a/Makefile b/Makefile
index 1624471badc..7d5f48069ea 100644
--- a/Makefile
+++ b/Makefile
@@ -906,6 +906,7 @@ LIB_OBJS += blob.o
 LIB_OBJS += bloom.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
+LIB_OBJS += bundle-uri.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += cbtree.o
diff --git a/bundle-uri.c b/bundle-uri.c
new file mode 100644
index 00000000000..b35babc36aa
--- /dev/null
+++ b/bundle-uri.c
@@ -0,0 +1,104 @@
+#include "cache.h"
+#include "bundle-uri.h"
+#include "bundle.h"
+#include "object-store.h"
+#include "refs.h"
+#include "run-command.h"
+
+static int find_temp_filename(struct strbuf *name)
+{
+	int fd;
+	/*
+	 * Find a temporary filename that is available. This is briefly
+	 * racy, but unlikely to collide.
+	 */
+	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	if (fd < 0) {
+		warning(_("failed to create temporary file"));
+		return -1;
+	}
+
+	close(fd);
+	unlink(name->buf);
+	return 0;
+}
+
+static int copy_uri_to_file(const char *file, const char *uri)
+{
+	/* File-based URIs only for now. */
+	return copy_file(file, uri, 0);
+}
+
+static int unbundle_from_file(struct repository *r, const char *file)
+{
+	int result = 0;
+	int bundle_fd;
+	struct bundle_header header = BUNDLE_HEADER_INIT;
+	struct string_list_item *refname;
+	struct strbuf bundle_ref = STRBUF_INIT;
+	size_t bundle_prefix_len;
+
+	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
+		return 1;
+
+	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+		return 1;
+
+	/*
+	 * Convert all refs/heads/ from the bundle into refs/bundles/
+	 * in the local repository.
+	 */
+	strbuf_addstr(&bundle_ref, "refs/bundles/");
+	bundle_prefix_len = bundle_ref.len;
+
+	for_each_string_list_item(refname, &header.references) {
+		struct object_id *oid = refname->util;
+		struct object_id old_oid;
+		const char *branch_name;
+		int has_old;
+
+		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+			continue;
+
+		strbuf_setlen(&bundle_ref, bundle_prefix_len);
+		strbuf_addstr(&bundle_ref, branch_name);
+
+		has_old = !read_ref(bundle_ref.buf, &old_oid);
+		update_ref("fetched bundle", bundle_ref.buf, oid,
+			   has_old ? &old_oid : NULL,
+			   REF_SKIP_OID_VERIFICATION,
+			   UPDATE_REFS_MSG_ON_ERR);
+	}
+
+	bundle_header_release(&header);
+	return result;
+}
+
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	int result = 0;
+	struct strbuf filename = STRBUF_INIT;
+
+	if ((result = find_temp_filename(&filename)))
+		goto cleanup;
+
+	if ((result = copy_uri_to_file(filename.buf, uri))) {
+		warning(_("failed to download bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+	if ((result = !is_bundle(filename.buf, 0))) {
+		warning(_("file at URI '%s' is not a bundle"), uri);
+		goto cleanup;
+	}
+
+	if ((result = unbundle_from_file(r, filename.buf))) {
+		warning(_("failed to unbundle bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+cleanup:
+	unlink(filename.buf);
+	strbuf_release(&filename);
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
new file mode 100644
index 00000000000..8a152f1ef14
--- /dev/null
+++ b/bundle-uri.h
@@ -0,0 +1,14 @@
+#ifndef BUNDLE_URI_H
+#define BUNDLE_URI_H
+
+struct repository;
+
+/**
+ * Fetch data from the given 'uri' and unbundle the bundle data found
+ * based on that information.
+ *
+ * Returns non-zero if no bundle information is found at the given 'uri'.
+ */
+int fetch_bundle_uri(struct repository *r, const char *uri);
+
+#endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/5] clone: add --bundle-uri option
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
  2022-07-25 20:34 ` [PATCH 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
@ 2022-07-25 20:34 ` Derrick Stolee via GitGitGadget
  2022-07-25 20:34 ` [PATCH 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt |  6 ++++++
 builtin/clone.c             | 15 +++++++++++++++
 t/t5558-clone-bundle-uri.sh | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)
 create mode 100755 t/t5558-clone-bundle-uri.sh

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 632bd1348ea..60fedf7eb5e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -323,6 +323,12 @@ or `--mirror` is given)
 	for `host.xz:foo/.git`).  Cloning into an existing directory
 	is only allowed if the directory is empty.
 
+--bundle-uri=<uri>::
+	Before fetching from the remote, fetch a bundle from the given
+	`<uri>` and unbundle the data into the local repository. The refs
+	in the bundle will be stored under the hidden `refs/bundle/*`
+	namespace.
+
 :git-clone: 1
 include::urls.txt[]
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c4ff4643ecd..4224d562758 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -34,6 +34,7 @@
 #include "list-objects-filter-options.h"
 #include "hook.h"
 #include "bundle.h"
+#include "bundle-uri.h"
 
 /*
  * Overall FIXMEs:
@@ -77,6 +78,7 @@ static int option_filter_submodules = -1;    /* unspecified */
 static int config_filter_submodules = -1;    /* unspecified */
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
+static const char *bundle_uri;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
 		    N_("initialize sparse-checkout file to include only files at root")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri,
+		   N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")),
 	OPT_END()
 };
 
@@ -1232,6 +1236,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (transport->smart_options && !deepen && !filter_options.choice)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
+	/*
+	 * Before fetching from the remote, download and install bundle
+	 * data from the --bundle-uri option.
+	 */
+	if (bundle_uri) {
+		/* At this point, we need the_repository to match the cloned repo. */
+		repo_init(the_repository, git_dir, work_tree);
+		if (fetch_bundle_uri(the_repository, bundle_uri))
+			warning(_("failed to fetch objects from bundle URI '%s'"),
+				bundle_uri);
+	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
 	refspec_ref_prefixes(&remote->fetch,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
new file mode 100755
index 00000000000..f709bcb729c
--- /dev/null
+++ b/t/t5558-clone-bundle-uri.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='test fetching bundles with --bundle-uri'
+
+. ./test-lib.sh
+
+test_expect_success 'fail to clone from non-existent file' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$(pwd)/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to clone from non-bundle file' '
+	test_when_finished rm -rf test &&
+	echo bogus >bogus &&
+	git clone --bundle-uri="$(pwd)/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'create bundle' '
+	git init clone-from &&
+	git -C clone-from checkout -b topic &&
+	test_commit -C clone-from A &&
+	test_commit -C clone-from B &&
+	git -C clone-from bundle create B.bundle topic
+'
+
+test_expect_success 'clone with path bundle' '
+	git clone --bundle-uri="clone-from/B.bundle" \
+		clone-from clone-path &&
+	git -C clone-path rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/5] bundle-uri: add support for http(s):// and file://
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-07-25 20:34 ` [PATCH 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-07-25 20:34 ` Derrick Stolee via GitGitGadget
  2022-07-27 22:09   ` Josh Steadmon
  2022-07-25 20:34 ` [PATCH 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change created the 'git clone --bundle-uri=<uri>' option.
Currently, <uri> must be a filename.

Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
and use git-remote-https as the way to download the data at that URI.
Otherwise, check to see if file:// is present and modify the prefix
accordingly.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index b35babc36aa..7086382b186 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
 	return 0;
 }
 
-static int copy_uri_to_file(const char *file, const char *uri)
+static int download_https_uri_to_file(const char *file, const char *uri)
 {
-	/* File-based URIs only for now. */
-	return copy_file(file, uri, 0);
+	int result = 0;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	FILE *child_in = NULL, *child_out = NULL;
+	struct strbuf line = STRBUF_INIT;
+	int found_get = 0;
+
+	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
+	cp.in = -1;
+	cp.out = -1;
+
+	if (start_command(&cp))
+		return 1;
+
+	child_in = fdopen(cp.in, "w");
+	if (!child_in) {
+		result = 1;
+		goto cleanup;
+	}
+
+	child_out = fdopen(cp.out, "r");
+	if (!child_out) {
+		result = 1;
+		goto cleanup;
+	}
+
+	fprintf(child_in, "capabilities\n");
+	fflush(child_in);
+
+	while (!strbuf_getline(&line, child_out)) {
+		if (!line.len)
+			break;
+		if (!strcmp(line.buf, "get"))
+			found_get = 1;
+	}
+	strbuf_release(&line);
+
+	if (!found_get) {
+		result = error(_("insufficient capabilities"));
+		goto cleanup;
+	}
+
+	fprintf(child_in, "get %s %s\n\n", uri, file);
+
+cleanup:
+	if (child_in)
+		fclose(child_in);
+	if (finish_command(&cp))
+		return 1;
+	if (child_out)
+		fclose(child_out);
+	return result;
+}
+
+static int copy_uri_to_file(const char *filename, const char *uri)
+{
+	const char *out;
+
+	if (skip_prefix(uri, "https:", &out) ||
+	    skip_prefix(uri, "http:", &out))
+		return download_https_uri_to_file(filename, uri);
+
+	if (!skip_prefix(uri, "file://", &out))
+		out = uri;
+
+	/* Copy as a file */
+	return copy_file(filename, out, 0);
 }
 
 static int unbundle_from_file(struct repository *r, const char *file)
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index f709bcb729c..ad666a2d28a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone with file:// bundle' '
+	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
+		clone-from clone-file &&
+	git -C clone-file rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+#########################################################################
+# HTTP tests begin here
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fail to fetch from non-existent HTTP URL' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to fetch from non-bundle HTTP URL' '
+	test_when_finished rm -rf test &&
+	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
+	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'clone HTTP bundle' '
+	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
+
+	git clone --no-local --mirror clone-from \
+		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
+
+	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
+		"$HTTPD_URL/smart/fetch.git" clone-http &&
+	git -C clone-http rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual &&
+
+	test_config -C clone-http log.excludedecoration refs/bundle/
+'
+
+# Do not add tests here unless they use the HTTP server, as they will
+# not run unless the HTTP dependencies exist.
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/5] clone: --bundle-uri cannot be combined with --depth
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-07-25 20:34 ` [PATCH 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-07-25 20:34 ` Derrick Stolee via GitGitGadget
  2022-07-27 22:11 ` [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Josh Steadmon
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-07-25 20:34 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change added the '--bundle-uri' option, but did not check
if the --depth parameter was included. Since bundles are not compatible
with shallow clones, provide an error message to the user who is
attempting this combination.

I am leaving this as its own change, separate from the one that
implements '--bundle-uri', because this is more of an advisory for the
user. There is nothing wrong with bootstrapping with bundles and then
fetching a shallow clone. However, that is likely going to involve too
much work for the client _and_ the server. The client will download all
of this bundle information containing the full history of the
repository only to ignore most of it. The server will get a shallow
fetch request, but with a list of haves that might cause a more painful
computation of that shallow pack-file.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt | 3 ++-
 builtin/clone.c             | 3 +++
 t/t5606-clone-options.sh    | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 60fedf7eb5e..d032d971dd7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -327,7 +327,8 @@ or `--mirror` is given)
 	Before fetching from the remote, fetch a bundle from the given
 	`<uri>` and unbundle the data into the local repository. The refs
 	in the bundle will be stored under the hidden `refs/bundle/*`
-	namespace.
+	namespace. This option is incompatible with `--depth`,
+	`--shallow-since`, and `--shallow-exclude`.
 
 :git-clone: 1
 include::urls.txt[]
diff --git a/builtin/clone.c b/builtin/clone.c
index 4224d562758..4463789680b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -937,6 +937,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		option_no_checkout = 1;
 	}
 
+	if (bundle_uri && deepen)
+		die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude"));
+
 	repo_name = argv[0];
 
 	path = get_repo_path(repo_name, &is_bundle);
diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh
index 8f676d6b0c0..f6bb02ab947 100755
--- a/t/t5606-clone-options.sh
+++ b/t/t5606-clone-options.sh
@@ -58,6 +58,14 @@ test_expect_success 'disallows --bare with --separate-git-dir' '
 
 '
 
+test_expect_success 'disallows --bundle-uri with shallow options' '
+	for option in --depth=1 --shallow-since=01-01-2000 --shallow-exclude=HEAD
+	do
+		test_must_fail git clone --bundle-uri=bundle $option from to 2>err &&
+		grep "bundle-uri is incompatible" err || return 1
+	done
+'
+
 test_expect_success 'reject cloning shallow repository' '
 	test_when_finished "rm -rf repo" &&
 	test_must_fail git clone --reject-shallow shallow-repo out 2>err &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] remote-curl: add 'get' capability
  2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
@ 2022-07-27 22:08   ` Josh Steadmon
  2022-07-27 23:00   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 36+ messages in thread
From: Josh Steadmon @ 2022-07-27 22:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, dyroneteng,
	Johannes.Schindelin, szeder.dev, mjcheetham, Derrick Stolee

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> A future change will want a way to download a file over HTTP(S) using
> the simplest of download mechanisms. We do not want to assume that the
> server on the other side understands anything about the Git protocol but
> could be a simple static web server.
> 
> Create the new 'get' capability for the remote helpers which advertises
> that the 'get' command is avalable. A caller can send a line containing
> 'get <url> <path>' to download the file at <url> into the file at
> <path>.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/gitremote-helpers.txt |  9 +++++++
>  remote-curl.c                       | 32 +++++++++++++++++++++++++
>  t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
>  3 files changed, 78 insertions(+)
>  create mode 100755 t/t5557-http-get.sh
> 
> diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
> index 6f1e269ae43..ed8da428c98 100644
> --- a/Documentation/gitremote-helpers.txt
> +++ b/Documentation/gitremote-helpers.txt
> @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
>  	Can guarantee that when a clone is requested, the received
>  	pack is self contained and is connected.
>  
> +'get'::
> +	Can use the 'get' command to download a file from a given URI.
> +
>  If a helper advertises 'connect', Git will use it if possible and
>  fall back to another capability if the helper requests so when
>  connecting (see the 'connect' command under COMMANDS).
> @@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
>  +
>  Supported if the helper has the "stateless-connect" capability.
>  
> +'get' <uri> <path>::
> +	Downloads the file from the given `<uri>` to the given `<path>`. If
> +	`<path>.temp` exists, then Git assumes that the `.temp` file is a
> +	partial download from a previous attempt and will resume the
> +	download from that position.
> +
>  If a fatal error occurs, the program writes the error message to
>  stderr and exits. The caller should expect that a suitable error
>  message has been printed if the child closes the connection without
> diff --git a/remote-curl.c b/remote-curl.c
> index b8758757ece..73fbdbddd84 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -1286,6 +1286,33 @@ static void parse_fetch(struct strbuf *buf)
>  	strbuf_reset(buf);
>  }
>  
> +static void parse_get(struct strbuf *buf)
> +{
> +	struct strbuf url = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	const char *p, *space;
> +
> +	if (!skip_prefix(buf->buf, "get ", &p))
> +		die(_("http transport does not support %s"), buf->buf);

Nit: since we're already calling skip_prefix(...) in cmd_main() below,
can we just pass the suffix to parse_get() and avoid having to skip the
prefix twice?


> +
> +	space = strchr(p, ' ');
> +
> +	if (!space)
> +		die(_("protocol error: expected '<url> <path>', missing space"));
> +
> +	strbuf_add(&url, p, space - p);
> +	strbuf_addstr(&path, space + 1);
> +
> +	if (http_get_file(url.buf, path.buf, NULL))
> +		die(_("failed to download file at URL '%s'"), url.buf);
> +
> +	strbuf_release(&url);
> +	strbuf_release(&path);
> +	printf("\n");
> +	fflush(stdout);
> +	strbuf_reset(buf);
> +}
> +
>  static int push_dav(int nr_spec, const char **specs)
>  {
>  	struct child_process child = CHILD_PROCESS_INIT;
> @@ -1564,9 +1591,14 @@ int cmd_main(int argc, const char **argv)
>  				printf("unsupported\n");
>  			fflush(stdout);
>  
> +		} else if (skip_prefix(buf.buf, "get ", &arg)) {
> +			parse_get(&buf);
> +			fflush(stdout);
> +
>  		} else if (!strcmp(buf.buf, "capabilities")) {
>  			printf("stateless-connect\n");
>  			printf("fetch\n");
> +			printf("get\n");
>  			printf("option\n");
>  			printf("push\n");
>  			printf("check-connectivity\n");
> diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
> new file mode 100755
> index 00000000000..50b7dbcf957
> --- /dev/null
> +++ b/t/t5557-http-get.sh
> @@ -0,0 +1,37 @@
> +#!/bin/sh
> +
> +test_description='test downloading a file by URL'
> +
> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp
> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] bundle-uri: create basic file-copy logic
  2022-07-25 20:34 ` [PATCH 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
@ 2022-07-27 22:09   ` Josh Steadmon
  2022-08-01 13:58     ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Steadmon @ 2022-07-27 22:09 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, dyroneteng,
	Johannes.Schindelin, szeder.dev, mjcheetham, Derrick Stolee

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Before implementing a way to fetch bundles into a repository, create the
> basic logic. Assume that the URI is actually a file path. Future logic
> will make this more careful to other protocols.
> 
> For now, we also only succeed if the content at the URI is a bundle
> file, not a bundle list. Bundle lists will be implemented in a future
> change.
> 
> Note that the discovery of a temporary filename is slightly racy because
> the odb_mkstemp() relies on the temporary file not existing. With the
> current implementation being limited to file copies, we could replace
> the copy_file() with copy_fd(). The tricky part comes in future changes
> that send the filename to 'git remote-https' and its 'get' capability.

> At that point, we need the file descriptor closed _and_ the file
> unlinked.

Ahh, it looks like this was the point I missed in my previous review.
IIUC, we need the file unlinked because http_get_file() will eventually
call finalize_object_file() to move a tempfile to the final object name,
and that will fail if we have an empty file already in place.


> If we were to keep the file descriptor open for the sake of
> normal file copies, then we would pollute the rest of the code for
> little benefit. This is especially the case because we expect that most
> bundle URI use will be based on HTTPS instead of file copies.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Makefile     |   1 +
>  bundle-uri.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  bundle-uri.h |  14 +++++++
>  3 files changed, 119 insertions(+)
>  create mode 100644 bundle-uri.c
>  create mode 100644 bundle-uri.h
> 
> diff --git a/Makefile b/Makefile
> index 1624471badc..7d5f48069ea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -906,6 +906,7 @@ LIB_OBJS += blob.o
>  LIB_OBJS += bloom.o
>  LIB_OBJS += branch.o
>  LIB_OBJS += bulk-checkin.o
> +LIB_OBJS += bundle-uri.o
>  LIB_OBJS += bundle.o
>  LIB_OBJS += cache-tree.o
>  LIB_OBJS += cbtree.o
> diff --git a/bundle-uri.c b/bundle-uri.c
> new file mode 100644
> index 00000000000..b35babc36aa
> --- /dev/null
> +++ b/bundle-uri.c
> @@ -0,0 +1,104 @@
> +#include "cache.h"
> +#include "bundle-uri.h"
> +#include "bundle.h"
> +#include "object-store.h"
> +#include "refs.h"
> +#include "run-command.h"
> +
> +static int find_temp_filename(struct strbuf *name)
> +{
> +	int fd;
> +	/*
> +	 * Find a temporary filename that is available. This is briefly
> +	 * racy, but unlikely to collide.
> +	 */
> +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
> +	if (fd < 0) {
> +		warning(_("failed to create temporary file"));
> +		return -1;
> +	}
> +
> +	close(fd);
> +	unlink(name->buf);
> +	return 0;
> +}
> +
> +static int copy_uri_to_file(const char *file, const char *uri)
> +{
> +	/* File-based URIs only for now. */
> +	return copy_file(file, uri, 0);
> +}
> +
> +static int unbundle_from_file(struct repository *r, const char *file)
> +{
> +	int result = 0;
> +	int bundle_fd;
> +	struct bundle_header header = BUNDLE_HEADER_INIT;
> +	struct string_list_item *refname;
> +	struct strbuf bundle_ref = STRBUF_INIT;
> +	size_t bundle_prefix_len;
> +
> +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
> +		return 1;
> +
> +	if ((result = unbundle(r, &header, bundle_fd, NULL)))
> +		return 1;
> +
> +	/*
> +	 * Convert all refs/heads/ from the bundle into refs/bundles/
> +	 * in the local repository.
> +	 */
> +	strbuf_addstr(&bundle_ref, "refs/bundles/");
> +	bundle_prefix_len = bundle_ref.len;
> +
> +	for_each_string_list_item(refname, &header.references) {
> +		struct object_id *oid = refname->util;
> +		struct object_id old_oid;
> +		const char *branch_name;
> +		int has_old;
> +
> +		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
> +			continue;
> +
> +		strbuf_setlen(&bundle_ref, bundle_prefix_len);
> +		strbuf_addstr(&bundle_ref, branch_name);
> +
> +		has_old = !read_ref(bundle_ref.buf, &old_oid);
> +		update_ref("fetched bundle", bundle_ref.buf, oid,
> +			   has_old ? &old_oid : NULL,
> +			   REF_SKIP_OID_VERIFICATION,
> +			   UPDATE_REFS_MSG_ON_ERR);
> +	}
> +
> +	bundle_header_release(&header);

We still also need to release bundle_ref here, right?


> +	return result;
> +}
> +
> +int fetch_bundle_uri(struct repository *r, const char *uri)
> +{
> +	int result = 0;
> +	struct strbuf filename = STRBUF_INIT;
> +
> +	if ((result = find_temp_filename(&filename)))
> +		goto cleanup;
> +
> +	if ((result = copy_uri_to_file(filename.buf, uri))) {
> +		warning(_("failed to download bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = !is_bundle(filename.buf, 0))) {
> +		warning(_("file at URI '%s' is not a bundle"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = unbundle_from_file(r, filename.buf))) {
> +		warning(_("failed to unbundle bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	unlink(filename.buf);
> +	strbuf_release(&filename);
> +	return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> new file mode 100644
> index 00000000000..8a152f1ef14
> --- /dev/null
> +++ b/bundle-uri.h
> @@ -0,0 +1,14 @@
> +#ifndef BUNDLE_URI_H
> +#define BUNDLE_URI_H
> +
> +struct repository;
> +
> +/**
> + * Fetch data from the given 'uri' and unbundle the bundle data found
> + * based on that information.
> + *
> + * Returns non-zero if no bundle information is found at the given 'uri'.
> + */
> +int fetch_bundle_uri(struct repository *r, const char *uri);
> +
> +#endif
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/5] bundle-uri: add support for http(s):// and file://
  2022-07-25 20:34 ` [PATCH 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-07-27 22:09   ` Josh Steadmon
  2022-08-01 14:00     ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Steadmon @ 2022-07-27 22:09 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, dyroneteng,
	Johannes.Schindelin, szeder.dev, mjcheetham, Derrick Stolee

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The previous change created the 'git clone --bundle-uri=<uri>' option.
> Currently, <uri> must be a filename.
> 
> Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
> and use git-remote-https as the way to download the data at that URI.
> Otherwise, check to see if file:// is present and modify the prefix
> accordingly.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
>  t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
>  2 files changed, 112 insertions(+), 3 deletions(-)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index b35babc36aa..7086382b186 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
>  	return 0;
>  }
>  
> -static int copy_uri_to_file(const char *file, const char *uri)
> +static int download_https_uri_to_file(const char *file, const char *uri)
>  {
> -	/* File-based URIs only for now. */
> -	return copy_file(file, uri, 0);
> +	int result = 0;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	FILE *child_in = NULL, *child_out = NULL;
> +	struct strbuf line = STRBUF_INIT;
> +	int found_get = 0;
> +
> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> +	cp.in = -1;
> +	cp.out = -1;
> +
> +	if (start_command(&cp))
> +		return 1;
> +
> +	child_in = fdopen(cp.in, "w");
> +	if (!child_in) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	child_out = fdopen(cp.out, "r");
> +	if (!child_out) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "capabilities\n");
> +	fflush(child_in);
> +
> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}
> +	strbuf_release(&line);
> +
> +	if (!found_get) {
> +		result = error(_("insufficient capabilities"));
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "get %s %s\n\n", uri, file);
> +
> +cleanup:
> +	if (child_in)
> +		fclose(child_in);
> +	if (finish_command(&cp))
> +		return 1;
> +	if (child_out)
> +		fclose(child_out);
> +	return result;
> +}
> +
> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> +	const char *out;
> +
> +	if (skip_prefix(uri, "https:", &out) ||
> +	    skip_prefix(uri, "http:", &out))
> +		return download_https_uri_to_file(filename, uri);

Since we're not using "out" here, should we replace the skip_prefix()s
with starts_with(), or perhaps even istarts_with()?


> +	if (!skip_prefix(uri, "file://", &out))
> +		out = uri;
> +
> +	/* Copy as a file */
> +	return copy_file(filename, out, 0);
>  }
>  
>  static int unbundle_from_file(struct repository *r, const char *file)
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index f709bcb729c..ad666a2d28a 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone with file:// bundle' '
> +	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
> +		clone-from clone-file &&
> +	git -C clone-file rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual
> +'
> +
> +#########################################################################
> +# HTTP tests begin here
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'fail to fetch from non-existent HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
> +	grep "failed to download bundle from URI" err
> +'
> +
> +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
> +	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
> +	grep "is not a bundle" err
> +'
> +
> +test_expect_success 'clone HTTP bundle' '
> +	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
> +
> +	git clone --no-local --mirror clone-from \
> +		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
> +
> +	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
> +		"$HTTPD_URL/smart/fetch.git" clone-http &&
> +	git -C clone-http rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual &&
> +
> +	test_config -C clone-http log.excludedecoration refs/bundle/
> +'
> +
> +# Do not add tests here unless they use the HTTP server, as they will
> +# not run unless the HTTP dependencies exist.
> +
>  test_done
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/5] Bundle URIs II: git clone --bundle-uri
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-07-25 20:34 ` [PATCH 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
@ 2022-07-27 22:11 ` Josh Steadmon
  2022-08-01 14:00   ` Derrick Stolee
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  6 siblings, 1 reply; 36+ messages in thread
From: Josh Steadmon @ 2022-07-27 22:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, avarab, dyroneteng,
	Johannes.Schindelin, szeder.dev, mjcheetham, Derrick Stolee

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> This is the second series building the bundle URI feature as discussed in
> the previous series that added the design document [1]. This series does not
> modify the design document, so the patches are independent and can be
> applied to the latest 'master'.
> 
> [1]
> https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com
> 
> This series brings in just enough logic that we can bootstrap clones from a
> single bundle using git clone --bundle-uri=<X>.
> 
>  * Patch 1 adds a 'get' capability to 'git remote-https' which allows
>    downloading the contents of a URI to a local file.
>  * Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It
>    is not used until patch 3.
>  * Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to
>    bootstrap a clone from a bundle, but get the remaining objects from the
>    origin URL. (As of this patch, it only accepts a filename.)
>  * Patch 4 extends the git clone --bundle-uri=<X> option to allow file://
>    and https:// URIs.
>  * Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the
>    same time in git clone.
> 
> As outlined in [1], the next steps after this are:
> 
>  1. Allow parsing a bundle list as a config file at the given URI. The
>     key-value format is unified with the protocol v2 verb (coming in (3)).
>     [2]
>  2. Implement the protocol v2 verb, re-using the bundle list logic from (2).
>     Use this to auto-discover bundle URIs during 'git clone' (behind a
>     config option). [3]
>  3. Implement the 'creationToken' heuristic, allowing incremental 'git
>     fetch' commands to download a bundle list from a configured URI, and
>     only download bundles that are new based on the creation token values.
>     [4]
> 
> I have prepared some of this work as pull requests on my personal fork so
> curious readers can look ahead to where we are going:
> 
> [2] https://github.com/derrickstolee/git/pull/20
> 
> [3] https://github.com/derrickstolee/git/pull/21
> 
> [4] https://github.com/derrickstolee/git/pull/22
> 
> Note: this series includes some code pulled out of the first series [1], and
> in particular the git fetch --bundle-uri=<X> option is removed. The
> intention was to replace that with git bundle fetch <X>, but that conflicts
> with the work to refactor how subcommands are parsed. The git bundle fetch
> subcommand could be added later for maximum flexibility on the client side,
> but we can also move forward without it.
> 
> Thanks, -Stolee
> 
> P.S. Here is the range-diff compared to v2 of the previous bundle URI
> series:
> 
> 1:  d444042dc4dcc < -:  ------------- docs: document bundle URI standard
> 2:  0a2cf60437f78 ! 1:  40808e92afb7b remote-curl: add 'get' capability
>     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
>       
>      +static void parse_get(struct strbuf *buf)
>      +{
>     -+	struct http_get_options opts = { 0 };
>      +	struct strbuf url = STRBUF_INIT;
>      +	struct strbuf path = STRBUF_INIT;
>      +	const char *p, *space;
>     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
>      +	strbuf_add(&url, p, space - p);
>      +	strbuf_addstr(&path, space + 1);
>      +
>     -+	if (http_get_file(url.buf, path.buf, &opts))
>     ++	if (http_get_file(url.buf, path.buf, NULL))
>      +		die(_("failed to download file at URL '%s'"), url.buf);
>      +
>      +	strbuf_release(&url);
>     @@ t/t5557-http-get.sh (new)
>      +	get $url file1
>      +	EOF
>      +
>     -+	test_must_fail git remote-http $url $url <input 2>err &&
>     ++	test_must_fail git remote-http $url <input 2>err &&
>      +	test_path_is_missing file1 &&
>      +	grep "failed to download file at URL" err &&
>      +	rm file1.temp
>     @@ t/t5557-http-get.sh (new)
>      +
>      +	EOF
>      +
>     -+	GIT_TRACE2_PERF=1 git remote-http $url $url <input &&
>     ++	GIT_TRACE2_PERF=1 git remote-http $url <input &&
>      +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
>      +'
>      +
>      +test_done
>     -
>     - ## transport-helper.c ##
>     -@@ transport-helper.c: struct helper_data {
>     - 		check_connectivity : 1,
>     - 		no_disconnect_req : 1,
>     - 		no_private_update : 1,
>     --		object_format : 1;
>     -+		object_format : 1,
>     -+		get : 1;
>     - 
>     - 	/*
>     - 	 * As an optimization, the transport code may invoke fetch before
>     -@@ transport-helper.c: static struct child_process *get_helper(struct transport *transport)
>     - 			data->no_private_update = 1;
>     - 		} else if (starts_with(capname, "object-format")) {
>     - 			data->object_format = 1;
>     -+		} else if (!strcmp(capname, "get")) {
>     -+			data->get = 1;
>     - 		} else if (mandatory) {
>     - 			die(_("unknown mandatory capability %s; this remote "
>     - 			      "helper probably needs newer version of Git"),
> 3:  abec47564fd9c ! 2:  7d3159f0d9a29 bundle-uri: create basic file-copy logic
>     @@ Commit message
>          file, not a bundle list. Bundle lists will be implemented in a future
>          change.
>      
>     +    Note that the discovery of a temporary filename is slightly racy because
>     +    the odb_mkstemp() relies on the temporary file not existing. With the
>     +    current implementation being limited to file copies, we could replace
>     +    the copy_file() with copy_fd(). The tricky part comes in future changes
>     +    that send the filename to 'git remote-https' and its 'get' capability.
>     +    At that point, we need the file descriptor closed _and_ the file
>     +    unlinked. If we were to keep the file descriptor open for the sake of
>     +    normal file copies, then we would pollute the rest of the code for
>     +    little benefit. This is especially the case because we expect that most
>     +    bundle URI use will be based on HTTPS instead of file copies.
>     +
>          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>      
>       ## Makefile ##
>     @@ bundle-uri.c (new)
>      +#include "refs.h"
>      +#include "run-command.h"
>      +
>     -+static void find_temp_filename(struct strbuf *name)
>     ++static int find_temp_filename(struct strbuf *name)
>      +{
>      +	int fd;
>      +	/*
>     @@ bundle-uri.c (new)
>      +	 * racy, but unlikely to collide.
>      +	 */
>      +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
>     -+	if (fd < 0)
>     -+		die(_("failed to create temporary file"));
>     ++	if (fd < 0) {
>     ++		warning(_("failed to create temporary file"));
>     ++		return -1;
>     ++	}
>     ++
>      +	close(fd);
>      +	unlink(name->buf);
>     ++	return 0;
>      +}
>      +
>     -+static int copy_uri_to_file(const char *uri, const char *file)
>     ++static int copy_uri_to_file(const char *file, const char *uri)
>      +{
>     -+	/* Copy as a file */
>     -+	return copy_file(file, uri, 0444);
>     ++	/* File-based URIs only for now. */
>     ++	return copy_file(file, uri, 0);
>      +}
>      +
>      +static int unbundle_from_file(struct repository *r, const char *file)
>     @@ bundle-uri.c (new)
>      +	int result = 0;
>      +	int bundle_fd;
>      +	struct bundle_header header = BUNDLE_HEADER_INIT;
>     -+	struct strvec extra_index_pack_args = STRVEC_INIT;
>      +	struct string_list_item *refname;
>      +	struct strbuf bundle_ref = STRBUF_INIT;
>      +	size_t bundle_prefix_len;
>     @@ bundle-uri.c (new)
>      +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
>      +		return 1;
>      +
>     -+	if ((result = unbundle(r, &header, bundle_fd, &extra_index_pack_args)))
>     ++	if ((result = unbundle(r, &header, bundle_fd, NULL)))
>      +		return 1;
>      +
>      +	/*
>     @@ bundle-uri.c (new)
>      +	int result = 0;
>      +	struct strbuf filename = STRBUF_INIT;
>      +
>     -+	find_temp_filename(&filename);
>     -+	if ((result = copy_uri_to_file(uri, filename.buf)))
>     ++	if ((result = find_temp_filename(&filename)))
>     ++		goto cleanup;
>     ++
>     ++	if ((result = copy_uri_to_file(filename.buf, uri))) {
>     ++		warning(_("failed to download bundle from URI '%s'"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>     -+	if ((result = !is_bundle(filename.buf, 0)))
>     ++	if ((result = !is_bundle(filename.buf, 0))) {
>     ++		warning(_("file at URI '%s' is not a bundle"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>     -+	if ((result = unbundle_from_file(r, filename.buf)))
>     ++	if ((result = unbundle_from_file(r, filename.buf))) {
>     ++		warning(_("failed to unbundle bundle from URI '%s'"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>      +cleanup:
>      +	unlink(filename.buf);
> 4:  f6255ec518857 < -:  ------------- fetch: add --bundle-uri option
> -:  ------------- > 3:  29e645a54ba7f clone: add --bundle-uri option
> 5:  bfbd11b48bf1b ! 4:  f6bc3177332e8 bundle-uri: add support for http(s):// and file://
>     @@ Metadata
>       ## Commit message ##
>          bundle-uri: add support for http(s):// and file://
>      
>     -    The previous change created the 'git fetch --bundle-uri=<uri>' option.
>     +    The previous change created the 'git clone --bundle-uri=<uri>' option.
>          Currently, <uri> must be a filename.
>      
>          Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
>     @@ Commit message
>          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>      
>       ## bundle-uri.c ##
>     -@@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
>     - 	unlink(name->buf);
>     +@@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
>     + 	return 0;
>       }
>       
>     -+static int download_https_uri_to_file(const char *uri, const char *file)
>     -+{
>     +-static int copy_uri_to_file(const char *file, const char *uri)
>     ++static int download_https_uri_to_file(const char *file, const char *uri)
>     + {
>     +-	/* File-based URIs only for now. */
>     +-	return copy_file(file, uri, 0);
>      +	int result = 0;
>      +	struct child_process cp = CHILD_PROCESS_INIT;
>      +	FILE *child_in = NULL, *child_out = NULL;
>     @@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
>      +	return result;
>      +}
>      +
>     - static int copy_uri_to_file(const char *uri, const char *file)
>     - {
>     ++static int copy_uri_to_file(const char *filename, const char *uri)
>     ++{
>      +	const char *out;
>     ++
>      +	if (skip_prefix(uri, "https:", &out) ||
>      +	    skip_prefix(uri, "http:", &out))
>     -+		return download_https_uri_to_file(uri, file);
>     ++		return download_https_uri_to_file(filename, uri);
>      +
>      +	if (!skip_prefix(uri, "file://", &out))
>      +		out = uri;
>      +
>     - 	/* Copy as a file */
>     --	return copy_file(file, uri, 0444);
>     -+	return !!copy_file(file, out, 0);
>     ++	/* Copy as a file */
>     ++	return copy_file(filename, out, 0);
>       }
>       
>       static int unbundle_from_file(struct repository *r, const char *file)
>      
>     - ## t/t5558-fetch-bundle-uri.sh ##
>     -@@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
>     + ## t/t5558-clone-bundle-uri.sh ##
>     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
>           test_cmp expect actual
>       '
>       
>     -+test_expect_success 'fetch file:// bundle' '
>     -+	git init fetch-file &&
>     -+	git -C fetch-file fetch --bundle-uri="file://$(pwd)/fetch-from/B.bundle" &&
>     -+	git -C fetch-file rev-parse refs/bundles/topic >actual &&
>     -+	git -C fetch-from rev-parse topic >expect &&
>     ++test_expect_success 'clone with file:// bundle' '
>     ++	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
>     ++		clone-from clone-file &&
>     ++	git -C clone-file rev-parse refs/bundles/topic >actual &&
>     ++	git -C clone-from rev-parse topic >expect &&
>      +	test_cmp expect actual
>      +'
>      +
>     @@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
>      +start_httpd
>      +
>      +test_expect_success 'fail to fetch from non-existent HTTP URL' '
>     -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/does-not-exist" 2>err &&
>     ++	test_when_finished rm -rf test &&
>     ++	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
>      +	grep "failed to download bundle from URI" err
>      +'
>      +
>      +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
>     ++	test_when_finished rm -rf test &&
>      +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
>     -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/bogus" 2>err &&
>     ++	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
>      +	grep "is not a bundle" err
>      +'
>      +
>     -+test_expect_success 'fetch HTTP bundle' '
>     -+	cp fetch-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
>     -+	git init fetch-http &&
>     -+	git -C fetch-http fetch --bundle-uri="$HTTPD_URL/B.bundle" &&
>     -+	git -C fetch-http rev-parse refs/bundles/topic >actual &&
>     -+	git -C fetch-from rev-parse topic >expect &&
>     -+	test_cmp expect actual
>     ++test_expect_success 'clone HTTP bundle' '
>     ++	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
>     ++
>     ++	git clone --no-local --mirror clone-from \
>     ++		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
>     ++
>     ++	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
>     ++		"$HTTPD_URL/smart/fetch.git" clone-http &&
>     ++	git -C clone-http rev-parse refs/bundles/topic >actual &&
>     ++	git -C clone-from rev-parse topic >expect &&
>     ++	test_cmp expect actual &&
>     ++
>     ++	test_config -C clone-http log.excludedecoration refs/bundle/
>      +'
>      +
>      +# Do not add tests here unless they use the HTTP server, as they will
> 6:  a217e9a0640b4 < -:  ------------- fetch: add 'refs/bundle/' to log.excludeDecoration
> -:  ------------- > 5:  e823b168ab725 clone: --bundle-uri cannot be combined with --depth
> 
> 
> Derrick Stolee (5):
>   remote-curl: add 'get' capability
>   bundle-uri: create basic file-copy logic
>   clone: add --bundle-uri option
>   bundle-uri: add support for http(s):// and file://
>   clone: --bundle-uri cannot be combined with --depth
> 
>  Documentation/git-clone.txt         |   7 ++
>  Documentation/gitremote-helpers.txt |   9 ++
>  Makefile                            |   1 +
>  builtin/clone.c                     |  18 +++
>  bundle-uri.c                        | 168 ++++++++++++++++++++++++++++
>  bundle-uri.h                        |  14 +++
>  remote-curl.c                       |  32 ++++++
>  t/t5557-http-get.sh                 |  37 ++++++
>  t/t5558-clone-bundle-uri.sh         |  81 ++++++++++++++
>  t/t5606-clone-options.sh            |   8 ++
>  10 files changed, 375 insertions(+)
>  create mode 100644 bundle-uri.c
>  create mode 100644 bundle-uri.h
>  create mode 100755 t/t5557-http-get.sh
>  create mode 100755 t/t5558-clone-bundle-uri.sh
> 
> 
> base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1300%2Fderrickstolee%2Fbundle-redo%2Fclone-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1300/derrickstolee/bundle-redo/clone-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1300
> -- 
> gitgitgadget

Looks good to me, with just one small fix and a couple of optional
nitpicks. Thaks for the series!

Reviewed-by: Josh Steadmon <steadmon@google.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] remote-curl: add 'get' capability
  2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
  2022-07-27 22:08   ` Josh Steadmon
@ 2022-07-27 23:00   ` Ævar Arnfjörð Bjarmason
  2022-08-01 13:55     ` Derrick Stolee
  1 sibling, 1 reply; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-27 23:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee


On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]

Add a:

	TEST_PASSES_SANITIZE_LEAK=true

Before:

> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp

This should presumably be a :

	test_when_finished "rm file.temp"

Before the "test_must_fail", otherwise we'll leave this around if
e.g. the "grep" in the &&-chain fails, but we surely want to clean this
up in that case..

> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&

Is this ad-hoc debugging that snuck into the test? Nothing here needs
this GIT_TRACE2_PERF...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] remote-curl: add 'get' capability
  2022-07-27 23:00   ` Ævar Arnfjörð Bjarmason
@ 2022-08-01 13:55     ` Derrick Stolee
  2022-08-01 14:39       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2022-08-01 13:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, newren, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon

On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
> 
> Add a:
> 
> 	TEST_PASSES_SANITIZE_LEAK=true

I trust this is important for your series that flags tests
that _could_ pass with this. Thanks.

>> +. ./test-lib.sh
>> +
>> +. "$TEST_DIRECTORY"/lib-httpd.sh
>> +start_httpd
>> +
>> +test_expect_success 'get by URL: 404' '
>> +	url="$HTTPD_URL/none.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file1
>> +	EOF
>> +
>> +	test_must_fail git remote-http $url <input 2>err &&
>> +	test_path_is_missing file1 &&
>> +	grep "failed to download file at URL" err &&
>> +	rm file1.temp
> 
> This should presumably be a :
> 
> 	test_when_finished "rm file.temp"

Yes. Thanks.

>> +'
>> +
>> +test_expect_success 'get by URL: 200' '
>> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
>> +
>> +	url="$HTTPD_URL/exists.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file2
>> +
>> +	EOF
>> +
>> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> 
> Is this ad-hoc debugging that snuck into the test? Nothing here needs
> this GIT_TRACE2_PERF...

Good eye. Removed.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] bundle-uri: create basic file-copy logic
  2022-07-27 22:09   ` Josh Steadmon
@ 2022-08-01 13:58     ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2022-08-01 13:58 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster, me,
	newren, avarab, dyroneteng, Johannes.Schindelin, szeder.dev,
	mjcheetham

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> Before implementing a way to fetch bundles into a repository, create the
>> basic logic. Assume that the URI is actually a file path. Future logic
>> will make this more careful to other protocols.
>>
>> For now, we also only succeed if the content at the URI is a bundle
>> file, not a bundle list. Bundle lists will be implemented in a future
>> change.
>>
>> Note that the discovery of a temporary filename is slightly racy because
>> the odb_mkstemp() relies on the temporary file not existing. With the
>> current implementation being limited to file copies, we could replace
>> the copy_file() with copy_fd(). The tricky part comes in future changes
>> that send the filename to 'git remote-https' and its 'get' capability.
> 
>> At that point, we need the file descriptor closed _and_ the file
>> unlinked.
> 
> Ahh, it looks like this was the point I missed in my previous review.
> IIUC, we need the file unlinked because http_get_file() will eventually
> call finalize_object_file() to move a tempfile to the final object name,
> and that will fail if we have an empty file already in place.

Yes, and I also was not sure what would happen if the empty file existed.
I tested it and thought allowing overwriting an existing file would be a
bigger problem than this choice of a filename.

We also discussed options about how it would be nice to have a predictable
filename so we could resume downloads _across Git process failures_
instead of just a network failure within a single Git process. This is
something to explore when creating that functionality.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/5] bundle-uri: add support for http(s):// and file://
  2022-07-27 22:09   ` Josh Steadmon
@ 2022-08-01 14:00     ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2022-08-01 14:00 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster, me,
	newren, avarab, dyroneteng, Johannes.Schindelin, szeder.dev,
	mjcheetham

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:

>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> +	const char *out;
>> +
>> +	if (skip_prefix(uri, "https:", &out) ||
>> +	    skip_prefix(uri, "http:", &out))
>> +		return download_https_uri_to_file(filename, uri);
> 
> Since we're not using "out" here, should we replace the skip_prefix()s
> with starts_with(), or perhaps even istarts_with()?

Sounds good to me. Can't abandon "out" because it is used
for the "file://" prefix, but istarts_with() is better here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/5] Bundle URIs II: git clone --bundle-uri
  2022-07-27 22:11 ` [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Josh Steadmon
@ 2022-08-01 14:00   ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2022-08-01 14:00 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster, me,
	newren, avarab, dyroneteng, Johannes.Schindelin, szeder.dev,
	mjcheetham

On 7/27/2022 6:11 PM, Josh Steadmon wrote:

> Looks good to me, with just one small fix and a couple of optional
> nitpicks. Thaks for the series!
> 
> Reviewed-by: Josh Steadmon <steadmon@google.com>

Thanks for the review!

-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] remote-curl: add 'get' capability
  2022-08-01 13:55     ` Derrick Stolee
@ 2022-08-01 14:39       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 36+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-01 14:39 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, gitster, me, newren,
	dyroneteng, Johannes.Schindelin, szeder.dev, mjcheetham,
	steadmon


On Mon, Aug 01 2022, Derrick Stolee wrote:

> On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>> 
>> Add a:
>> 
>> 	TEST_PASSES_SANITIZE_LEAK=true
>
> I trust this is important for your series that flags tests
> that _could_ pass with this. Thanks.

Not really, it does add a "check" mode, and this is one of the few tests
crop up when merged with "seen" if run in that mode, but currentlny it's
only run manually, and we have other outstanding deviations.

It's just something for "master": We have the "linux-leaks" job for a
while now, so it makes sense for new tests to declare if they're
leak-free or not, so we guard against the introduction of leaks with new
code, if possible.

I (or someone else) can always follow-up with something like[1], but as
it's easy it makes sense to just add it in the first place, thanks.

1. https://lore.kernel.org/git/patch-v3-13.15-28255ac3239-20220727T230800Z-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/5] Bundle URIs II: git clone --bundle-uri
  2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-07-27 22:11 ` [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Josh Steadmon
@ 2022-08-02 12:29 ` Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
                     ` (5 more replies)
  6 siblings, 6 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee

This is the second series building the bundle URI feature as discussed in
the previous series that added the design document [1]. This series does not
modify the design document, so the patches are independent and can be
applied to the latest 'master'.

[1]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

This series brings in just enough logic that we can bootstrap clones from a
single bundle using git clone --bundle-uri=<X>.

 * Patch 1 adds a 'get' capability to 'git remote-https' which allows
   downloading the contents of a URI to a local file.
 * Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It
   is not used until patch 3.
 * Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to
   bootstrap a clone from a bundle, but get the remaining objects from the
   origin URL. (As of this patch, it only accepts a filename.)
 * Patch 4 extends the git clone --bundle-uri=<X> option to allow file://
   and https:// URIs.
 * Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the
   same time in git clone.

As outlined in [1], the next steps after this are:

 1. Allow parsing a bundle list as a config file at the given URI. The
    key-value format is unified with the protocol v2 verb (coming in (3)).
    [2]
 2. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [3]
 3. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [4]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[2] https://github.com/derrickstolee/git/pull/20

[3] https://github.com/derrickstolee/git/pull/21

[4] https://github.com/derrickstolee/git/pull/22

Note: this series includes some code pulled out of the first series [1], and
in particular the git fetch --bundle-uri=<X> option is removed. The
intention was to replace that with git bundle fetch <X>, but that conflicts
with the work to refactor how subcommands are parsed. The git bundle fetch
subcommand could be added later for maximum flexibility on the client side,
but we can also move forward without it.


Updates in v2
=============

 * Several typos or small tweaks. See the range-diff for details.

Thanks, -Stolee

Derrick Stolee (5):
  remote-curl: add 'get' capability
  bundle-uri: create basic file-copy logic
  clone: add --bundle-uri option
  bundle-uri: add support for http(s):// and file://
  clone: --bundle-uri cannot be combined with --depth

 Documentation/git-clone.txt         |   7 ++
 Documentation/gitremote-helpers.txt |   9 ++
 Makefile                            |   1 +
 builtin/clone.c                     |  18 +++
 bundle-uri.c                        | 168 ++++++++++++++++++++++++++++
 bundle-uri.h                        |  14 +++
 remote-curl.c                       |  28 +++++
 t/t5557-http-get.sh                 |  39 +++++++
 t/t5558-clone-bundle-uri.sh         |  81 ++++++++++++++
 t/t5606-clone-options.sh            |   8 ++
 10 files changed, 373 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h
 create mode 100755 t/t5557-http-get.sh
 create mode 100755 t/t5558-clone-bundle-uri.sh


base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1300%2Fderrickstolee%2Fbundle-redo%2Fclone-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1300/derrickstolee/bundle-redo/clone-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1300

Range-diff vs v1:

 1:  40808e92afb ! 1:  4df4a1d679a remote-curl: add 'get' capability
     @@ Commit message
          'get <url> <path>' to download the file at <url> into the file at
          <path>.
      
     +    Reviewed-by: Josh Steadmon <steadmon@google.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/gitremote-helpers.txt ##
     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
       	strbuf_reset(buf);
       }
       
     -+static void parse_get(struct strbuf *buf)
     ++static void parse_get(const char *arg)
      +{
      +	struct strbuf url = STRBUF_INIT;
      +	struct strbuf path = STRBUF_INIT;
     -+	const char *p, *space;
     ++	const char *space;
      +
     -+	if (!skip_prefix(buf->buf, "get ", &p))
     -+		die(_("http transport does not support %s"), buf->buf);
     -+
     -+	space = strchr(p, ' ');
     ++	space = strchr(arg, ' ');
      +
      +	if (!space)
      +		die(_("protocol error: expected '<url> <path>', missing space"));
      +
     -+	strbuf_add(&url, p, space - p);
     ++	strbuf_add(&url, arg, space - arg);
      +	strbuf_addstr(&path, space + 1);
      +
      +	if (http_get_file(url.buf, path.buf, NULL))
     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
      +	strbuf_release(&path);
      +	printf("\n");
      +	fflush(stdout);
     -+	strbuf_reset(buf);
      +}
      +
       static int push_dav(int nr_spec, const char **specs)
     @@ remote-curl.c: int cmd_main(int argc, const char **argv)
       			fflush(stdout);
       
      +		} else if (skip_prefix(buf.buf, "get ", &arg)) {
     -+			parse_get(&buf);
     ++			parse_get(arg);
      +			fflush(stdout);
      +
       		} else if (!strcmp(buf.buf, "capabilities")) {
     @@ t/t5557-http-get.sh (new)
      +start_httpd
      +
      +test_expect_success 'get by URL: 404' '
     ++	test_when_finished "rm -f file.temp" &&
      +	url="$HTTPD_URL/none.txt" &&
      +	cat >input <<-EOF &&
      +	capabilities
     @@ t/t5557-http-get.sh (new)
      +
      +	test_must_fail git remote-http $url <input 2>err &&
      +	test_path_is_missing file1 &&
     -+	grep "failed to download file at URL" err &&
     -+	rm file1.temp
     ++	grep "failed to download file at URL" err
      +'
      +
      +test_expect_success 'get by URL: 200' '
     @@ t/t5557-http-get.sh (new)
      +
      +	EOF
      +
     -+	GIT_TRACE2_PERF=1 git remote-http $url <input &&
     ++	git remote-http $url <input &&
      +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
      +'
      +
 2:  7d3159f0d9a ! 2:  6a6f1a04889 bundle-uri: create basic file-copy logic
     @@ Commit message
          little benefit. This is especially the case because we expect that most
          bundle URI use will be based on HTTPS instead of file copies.
      
     +    Reviewed-by: Josh Steadmon <steadmon@google.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Makefile ##
     @@ bundle-uri.h (new)
      +int fetch_bundle_uri(struct repository *r, const char *uri);
      +
      +#endif
     +
     + ## t/t5557-http-get.sh ##
     +@@
     + 
     + test_description='test downloading a file by URL'
     + 
     ++TEST_PASSES_SANITIZE_LEAK=true
     ++
     + . ./test-lib.sh
     + 
     + . "$TEST_DIRECTORY"/lib-httpd.sh
 3:  29e645a54ba ! 3:  00debaf6e77 clone: add --bundle-uri option
     @@ Commit message
          Git to bootstrap the new repository with these bundles before fetching
          the remaining objects from the origin server.
      
     +    Reviewed-by: Josh Steadmon <steadmon@google.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/git-clone.txt ##
 4:  f6bc3177332 ! 4:  e4f2dcc7a45 bundle-uri: add support for http(s):// and file://
     @@ Commit message
          Otherwise, check to see if file:// is present and modify the prefix
          accordingly.
      
     +    Reviewed-by: Josh Steadmon <steadmon@google.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## bundle-uri.c ##
     @@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
      +{
      +	const char *out;
      +
     -+	if (skip_prefix(uri, "https:", &out) ||
     -+	    skip_prefix(uri, "http:", &out))
     ++	if (istarts_with(uri, "https:") ||
     ++	    istarts_with(uri, "http:"))
      +		return download_https_uri_to_file(filename, uri);
      +
      +	if (!skip_prefix(uri, "file://", &out))
 5:  e823b168ab7 ! 5:  acee1fae027 clone: --bundle-uri cannot be combined with --depth
     @@ Commit message
          fetch request, but with a list of haves that might cause a more painful
          computation of that shallow pack-file.
      
     +    Reviewed-by: Josh Steadmon <steadmon@google.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/git-clone.txt ##

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/5] remote-curl: add 'get' capability
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
@ 2022-08-02 12:29   ` Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.

Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/gitremote-helpers.txt |  9 +++++++
 remote-curl.c                       | 28 ++++++++++++++++++++++
 t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+)
 create mode 100755 t/t5557-http-get.sh

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 6f1e269ae43..ed8da428c98 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
 	Can guarantee that when a clone is requested, the received
 	pack is self contained and is connected.
 
+'get'::
+	Can use the 'get' command to download a file from a given URI.
+
 If a helper advertises 'connect', Git will use it if possible and
 fall back to another capability if the helper requests so when
 connecting (see the 'connect' command under COMMANDS).
@@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
 +
 Supported if the helper has the "stateless-connect" capability.
 
+'get' <uri> <path>::
+	Downloads the file from the given `<uri>` to the given `<path>`. If
+	`<path>.temp` exists, then Git assumes that the `.temp` file is a
+	partial download from a previous attempt and will resume the
+	download from that position.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/remote-curl.c b/remote-curl.c
index b8758757ece..72dfb8fb86a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1286,6 +1286,29 @@ static void parse_fetch(struct strbuf *buf)
 	strbuf_reset(buf);
 }
 
+static void parse_get(const char *arg)
+{
+	struct strbuf url = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	const char *space;
+
+	space = strchr(arg, ' ');
+
+	if (!space)
+		die(_("protocol error: expected '<url> <path>', missing space"));
+
+	strbuf_add(&url, arg, space - arg);
+	strbuf_addstr(&path, space + 1);
+
+	if (http_get_file(url.buf, path.buf, NULL))
+		die(_("failed to download file at URL '%s'"), url.buf);
+
+	strbuf_release(&url);
+	strbuf_release(&path);
+	printf("\n");
+	fflush(stdout);
+}
+
 static int push_dav(int nr_spec, const char **specs)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1564,9 +1587,14 @@ int cmd_main(int argc, const char **argv)
 				printf("unsupported\n");
 			fflush(stdout);
 
+		} else if (skip_prefix(buf.buf, "get ", &arg)) {
+			parse_get(arg);
+			fflush(stdout);
+
 		} else if (!strcmp(buf.buf, "capabilities")) {
 			printf("stateless-connect\n");
 			printf("fetch\n");
+			printf("get\n");
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
new file mode 100755
index 00000000000..09fb0bd9a8a
--- /dev/null
+++ b/t/t5557-http-get.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='test downloading a file by URL'
+
+. ./test-lib.sh
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'get by URL: 404' '
+	test_when_finished "rm -f file.temp" &&
+	url="$HTTPD_URL/none.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file1
+	EOF
+
+	test_must_fail git remote-http $url <input 2>err &&
+	test_path_is_missing file1 &&
+	grep "failed to download file at URL" err
+'
+
+test_expect_success 'get by URL: 200' '
+	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
+
+	url="$HTTPD_URL/exists.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file2
+
+	EOF
+
+	git remote-http $url <input &&
+	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/5] bundle-uri: create basic file-copy logic
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
@ 2022-08-02 12:29   ` Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.

For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.

Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile            |   1 +
 bundle-uri.c        | 104 ++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h        |  14 ++++++
 t/t5557-http-get.sh |   2 +
 4 files changed, 121 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h

diff --git a/Makefile b/Makefile
index 1624471badc..7d5f48069ea 100644
--- a/Makefile
+++ b/Makefile
@@ -906,6 +906,7 @@ LIB_OBJS += blob.o
 LIB_OBJS += bloom.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
+LIB_OBJS += bundle-uri.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += cbtree.o
diff --git a/bundle-uri.c b/bundle-uri.c
new file mode 100644
index 00000000000..b35babc36aa
--- /dev/null
+++ b/bundle-uri.c
@@ -0,0 +1,104 @@
+#include "cache.h"
+#include "bundle-uri.h"
+#include "bundle.h"
+#include "object-store.h"
+#include "refs.h"
+#include "run-command.h"
+
+static int find_temp_filename(struct strbuf *name)
+{
+	int fd;
+	/*
+	 * Find a temporary filename that is available. This is briefly
+	 * racy, but unlikely to collide.
+	 */
+	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	if (fd < 0) {
+		warning(_("failed to create temporary file"));
+		return -1;
+	}
+
+	close(fd);
+	unlink(name->buf);
+	return 0;
+}
+
+static int copy_uri_to_file(const char *file, const char *uri)
+{
+	/* File-based URIs only for now. */
+	return copy_file(file, uri, 0);
+}
+
+static int unbundle_from_file(struct repository *r, const char *file)
+{
+	int result = 0;
+	int bundle_fd;
+	struct bundle_header header = BUNDLE_HEADER_INIT;
+	struct string_list_item *refname;
+	struct strbuf bundle_ref = STRBUF_INIT;
+	size_t bundle_prefix_len;
+
+	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
+		return 1;
+
+	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+		return 1;
+
+	/*
+	 * Convert all refs/heads/ from the bundle into refs/bundles/
+	 * in the local repository.
+	 */
+	strbuf_addstr(&bundle_ref, "refs/bundles/");
+	bundle_prefix_len = bundle_ref.len;
+
+	for_each_string_list_item(refname, &header.references) {
+		struct object_id *oid = refname->util;
+		struct object_id old_oid;
+		const char *branch_name;
+		int has_old;
+
+		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+			continue;
+
+		strbuf_setlen(&bundle_ref, bundle_prefix_len);
+		strbuf_addstr(&bundle_ref, branch_name);
+
+		has_old = !read_ref(bundle_ref.buf, &old_oid);
+		update_ref("fetched bundle", bundle_ref.buf, oid,
+			   has_old ? &old_oid : NULL,
+			   REF_SKIP_OID_VERIFICATION,
+			   UPDATE_REFS_MSG_ON_ERR);
+	}
+
+	bundle_header_release(&header);
+	return result;
+}
+
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	int result = 0;
+	struct strbuf filename = STRBUF_INIT;
+
+	if ((result = find_temp_filename(&filename)))
+		goto cleanup;
+
+	if ((result = copy_uri_to_file(filename.buf, uri))) {
+		warning(_("failed to download bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+	if ((result = !is_bundle(filename.buf, 0))) {
+		warning(_("file at URI '%s' is not a bundle"), uri);
+		goto cleanup;
+	}
+
+	if ((result = unbundle_from_file(r, filename.buf))) {
+		warning(_("failed to unbundle bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+cleanup:
+	unlink(filename.buf);
+	strbuf_release(&filename);
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
new file mode 100644
index 00000000000..8a152f1ef14
--- /dev/null
+++ b/bundle-uri.h
@@ -0,0 +1,14 @@
+#ifndef BUNDLE_URI_H
+#define BUNDLE_URI_H
+
+struct repository;
+
+/**
+ * Fetch data from the given 'uri' and unbundle the bundle data found
+ * based on that information.
+ *
+ * Returns non-zero if no bundle information is found at the given 'uri'.
+ */
+int fetch_bundle_uri(struct repository *r, const char *uri);
+
+#endif
diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
index 09fb0bd9a8a..76a4bbd16af 100755
--- a/t/t5557-http-get.sh
+++ b/t/t5557-http-get.sh
@@ -2,6 +2,8 @@
 
 test_description='test downloading a file by URL'
 
+TEST_PASSES_SANITIZE_LEAK=true
+
 . ./test-lib.sh
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/5] clone: add --bundle-uri option
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
@ 2022-08-02 12:29   ` Derrick Stolee via GitGitGadget
  2022-08-02 12:29   ` [PATCH v2 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt |  6 ++++++
 builtin/clone.c             | 15 +++++++++++++++
 t/t5558-clone-bundle-uri.sh | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)
 create mode 100755 t/t5558-clone-bundle-uri.sh

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 632bd1348ea..60fedf7eb5e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -323,6 +323,12 @@ or `--mirror` is given)
 	for `host.xz:foo/.git`).  Cloning into an existing directory
 	is only allowed if the directory is empty.
 
+--bundle-uri=<uri>::
+	Before fetching from the remote, fetch a bundle from the given
+	`<uri>` and unbundle the data into the local repository. The refs
+	in the bundle will be stored under the hidden `refs/bundle/*`
+	namespace.
+
 :git-clone: 1
 include::urls.txt[]
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c4ff4643ecd..4224d562758 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -34,6 +34,7 @@
 #include "list-objects-filter-options.h"
 #include "hook.h"
 #include "bundle.h"
+#include "bundle-uri.h"
 
 /*
  * Overall FIXMEs:
@@ -77,6 +78,7 @@ static int option_filter_submodules = -1;    /* unspecified */
 static int config_filter_submodules = -1;    /* unspecified */
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
+static const char *bundle_uri;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
 		    N_("initialize sparse-checkout file to include only files at root")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri,
+		   N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")),
 	OPT_END()
 };
 
@@ -1232,6 +1236,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (transport->smart_options && !deepen && !filter_options.choice)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
+	/*
+	 * Before fetching from the remote, download and install bundle
+	 * data from the --bundle-uri option.
+	 */
+	if (bundle_uri) {
+		/* At this point, we need the_repository to match the cloned repo. */
+		repo_init(the_repository, git_dir, work_tree);
+		if (fetch_bundle_uri(the_repository, bundle_uri))
+			warning(_("failed to fetch objects from bundle URI '%s'"),
+				bundle_uri);
+	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
 	refspec_ref_prefixes(&remote->fetch,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
new file mode 100755
index 00000000000..f709bcb729c
--- /dev/null
+++ b/t/t5558-clone-bundle-uri.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='test fetching bundles with --bundle-uri'
+
+. ./test-lib.sh
+
+test_expect_success 'fail to clone from non-existent file' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$(pwd)/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to clone from non-bundle file' '
+	test_when_finished rm -rf test &&
+	echo bogus >bogus &&
+	git clone --bundle-uri="$(pwd)/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'create bundle' '
+	git init clone-from &&
+	git -C clone-from checkout -b topic &&
+	test_commit -C clone-from A &&
+	test_commit -C clone-from B &&
+	git -C clone-from bundle create B.bundle topic
+'
+
+test_expect_success 'clone with path bundle' '
+	git clone --bundle-uri="clone-from/B.bundle" \
+		clone-from clone-path &&
+	git -C clone-path rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-08-02 12:29   ` [PATCH v2 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-08-02 12:29   ` Derrick Stolee via GitGitGadget
  2022-08-02 21:32     ` Junio C Hamano
  2022-08-02 12:29   ` [PATCH v2 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  5 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change created the 'git clone --bundle-uri=<uri>' option.
Currently, <uri> must be a filename.

Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
and use git-remote-https as the way to download the data at that URI.
Otherwise, check to see if file:// is present and modify the prefix
accordingly.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index b35babc36aa..930e995d194 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
 	return 0;
 }
 
-static int copy_uri_to_file(const char *file, const char *uri)
+static int download_https_uri_to_file(const char *file, const char *uri)
 {
-	/* File-based URIs only for now. */
-	return copy_file(file, uri, 0);
+	int result = 0;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	FILE *child_in = NULL, *child_out = NULL;
+	struct strbuf line = STRBUF_INIT;
+	int found_get = 0;
+
+	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
+	cp.in = -1;
+	cp.out = -1;
+
+	if (start_command(&cp))
+		return 1;
+
+	child_in = fdopen(cp.in, "w");
+	if (!child_in) {
+		result = 1;
+		goto cleanup;
+	}
+
+	child_out = fdopen(cp.out, "r");
+	if (!child_out) {
+		result = 1;
+		goto cleanup;
+	}
+
+	fprintf(child_in, "capabilities\n");
+	fflush(child_in);
+
+	while (!strbuf_getline(&line, child_out)) {
+		if (!line.len)
+			break;
+		if (!strcmp(line.buf, "get"))
+			found_get = 1;
+	}
+	strbuf_release(&line);
+
+	if (!found_get) {
+		result = error(_("insufficient capabilities"));
+		goto cleanup;
+	}
+
+	fprintf(child_in, "get %s %s\n\n", uri, file);
+
+cleanup:
+	if (child_in)
+		fclose(child_in);
+	if (finish_command(&cp))
+		return 1;
+	if (child_out)
+		fclose(child_out);
+	return result;
+}
+
+static int copy_uri_to_file(const char *filename, const char *uri)
+{
+	const char *out;
+
+	if (istarts_with(uri, "https:") ||
+	    istarts_with(uri, "http:"))
+		return download_https_uri_to_file(filename, uri);
+
+	if (!skip_prefix(uri, "file://", &out))
+		out = uri;
+
+	/* Copy as a file */
+	return copy_file(filename, out, 0);
 }
 
 static int unbundle_from_file(struct repository *r, const char *file)
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index f709bcb729c..ad666a2d28a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone with file:// bundle' '
+	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
+		clone-from clone-file &&
+	git -C clone-file rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+#########################################################################
+# HTTP tests begin here
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fail to fetch from non-existent HTTP URL' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to fetch from non-bundle HTTP URL' '
+	test_when_finished rm -rf test &&
+	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
+	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'clone HTTP bundle' '
+	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
+
+	git clone --no-local --mirror clone-from \
+		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
+
+	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
+		"$HTTPD_URL/smart/fetch.git" clone-http &&
+	git -C clone-http rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual &&
+
+	test_config -C clone-http log.excludedecoration refs/bundle/
+'
+
+# Do not add tests here unless they use the HTTP server, as they will
+# not run unless the HTTP dependencies exist.
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 5/5] clone: --bundle-uri cannot be combined with --depth
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-08-02 12:29   ` [PATCH v2 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-08-02 12:29   ` Derrick Stolee via GitGitGadget
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-02 12:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change added the '--bundle-uri' option, but did not check
if the --depth parameter was included. Since bundles are not compatible
with shallow clones, provide an error message to the user who is
attempting this combination.

I am leaving this as its own change, separate from the one that
implements '--bundle-uri', because this is more of an advisory for the
user. There is nothing wrong with bootstrapping with bundles and then
fetching a shallow clone. However, that is likely going to involve too
much work for the client _and_ the server. The client will download all
of this bundle information containing the full history of the
repository only to ignore most of it. The server will get a shallow
fetch request, but with a list of haves that might cause a more painful
computation of that shallow pack-file.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt | 3 ++-
 builtin/clone.c             | 3 +++
 t/t5606-clone-options.sh    | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 60fedf7eb5e..d032d971dd7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -327,7 +327,8 @@ or `--mirror` is given)
 	Before fetching from the remote, fetch a bundle from the given
 	`<uri>` and unbundle the data into the local repository. The refs
 	in the bundle will be stored under the hidden `refs/bundle/*`
-	namespace.
+	namespace. This option is incompatible with `--depth`,
+	`--shallow-since`, and `--shallow-exclude`.
 
 :git-clone: 1
 include::urls.txt[]
diff --git a/builtin/clone.c b/builtin/clone.c
index 4224d562758..4463789680b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -937,6 +937,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		option_no_checkout = 1;
 	}
 
+	if (bundle_uri && deepen)
+		die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude"));
+
 	repo_name = argv[0];
 
 	path = get_repo_path(repo_name, &is_bundle);
diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh
index 8f676d6b0c0..f6bb02ab947 100755
--- a/t/t5606-clone-options.sh
+++ b/t/t5606-clone-options.sh
@@ -58,6 +58,14 @@ test_expect_success 'disallows --bare with --separate-git-dir' '
 
 '
 
+test_expect_success 'disallows --bundle-uri with shallow options' '
+	for option in --depth=1 --shallow-since=01-01-2000 --shallow-exclude=HEAD
+	do
+		test_must_fail git clone --bundle-uri=bundle $option from to 2>err &&
+		grep "bundle-uri is incompatible" err || return 1
+	done
+'
+
 test_expect_success 'reject cloning shallow repository' '
 	test_when_finished "rm -rf repo" &&
 	test_must_fail git clone --reject-shallow shallow-repo out 2>err &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-02 12:29   ` [PATCH v2 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-08-02 21:32     ` Junio C Hamano
  2022-08-04 15:34       ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2022-08-02 21:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> +	const char *out;
> +
> +	if (istarts_with(uri, "https:") ||
> +	    istarts_with(uri, "http:"))

Let's be a bit more strict to avoid mistakes and make the code
immediately obvious, e.g.

	if (istarts_with(uri, "https://") ||
	    istarts_with(uri, "http://"))

> +		return download_https_uri_to_file(filename, uri);
> +
> +	if (!skip_prefix(uri, "file://", &out))
> +		out = uri;

If we are using istarts_with because URI scheme name is case
insensitive, shouldn't we do the same for "file://" URL, not
just for "http(s)://" URL?  IOW

	if (!skip_iprefix(uri, "file://", &out))

> +static int download_https_uri_to_file(const char *file, const char *uri)
>  {
> +	int result = 0;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	FILE *child_in = NULL, *child_out = NULL;
> +	struct strbuf line = STRBUF_INIT;
> +	int found_get = 0;
> +
> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);

Does "git-remote-https" talk to a "http://" URL just fine when uri
parameter starts with "http://"?  Would it be the same if the uri
parameter begins with say "Http://"?

Other than that, looks sensible.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-02 21:32     ` Junio C Hamano
@ 2022-08-04 15:34       ` Derrick Stolee
  2022-08-04 18:19         ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2022-08-04 15:34 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon

On 8/2/2022 5:32 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> +	const char *out;
>> +
>> +	if (istarts_with(uri, "https:") ||
>> +	    istarts_with(uri, "http:"))
> 
> Let's be a bit more strict to avoid mistakes and make the code
> immediately obvious, e.g.
> 
> 	if (istarts_with(uri, "https://") ||
> 	    istarts_with(uri, "http://"))
> 
>> +		return download_https_uri_to_file(filename, uri);
>> +
>> +	if (!skip_prefix(uri, "file://", &out))
>> +		out = uri;
> 
> If we are using istarts_with because URI scheme name is case
> insensitive, shouldn't we do the same for "file://" URL, not
> just for "http(s)://" URL?  IOW
> 
> 	if (!skip_iprefix(uri, "file://", &out))

Good ideas. Of course, we don't have a skip_iprefix(), but
I can use "istarts_with()" and then manually add the length.
If we see more need for that in the future, we can consider
adding it.

(It's interesting that these uses in bundle-uri.c are the
only uses of istarts_with() that I see in the codebase.)

>> +static int download_https_uri_to_file(const char *file, const char *uri)
>>  {
>> +	int result = 0;
>> +	struct child_process cp = CHILD_PROCESS_INIT;
>> +	FILE *child_in = NULL, *child_out = NULL;
>> +	struct strbuf line = STRBUF_INIT;
>> +	int found_get = 0;
>> +
>> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> 
> Does "git-remote-https" talk to a "http://" URL just fine when uri
> parameter starts with "http://"?  Would it be the same if the uri
> parameter begins with say "Http://"?

I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
variable in lib-httpd.sh to "HtTP" and we get this fun error:

+ git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
Cloning into 'client'...
git: 'remote-HtTP' is not a git command. See 'git --help'.

So I guess I can keep case-sensitive comparisons here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-04 15:34       ` Derrick Stolee
@ 2022-08-04 18:19         ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2022-08-04 18:19 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	dyroneteng, Johannes.Schindelin, szeder.dev, mjcheetham,
	steadmon

Derrick Stolee <derrickstolee@github.com> writes:

>>> +	if (istarts_with(uri, "https:") ||
>>> +	    istarts_with(uri, "http:"))
>> 
>> Let's be a bit more strict to avoid mistakes and make the code
>> immediately obvious, e.g.
>> 
>> 	if (istarts_with(uri, "https://") ||
>> 	    istarts_with(uri, "http://"))

This part (i.e. check for "<scheme>://", not "<scheme>:") still
stands, as the latter could be an scp style Git URL that goes to the
host whose name is <scheme>.  As mentioned later, s/istarts/starts/
is probably a good thing to do here.

>> Does "git-remote-https" talk to a "http://" URL just fine when uri
>> parameter starts with "http://"?  Would it be the same if the uri
>> parameter begins with say "Http://"?
>
> I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
> variable in lib-httpd.sh to "HtTP" and we get this fun error:
>
> + git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
> Cloning into 'client'...
> git: 'remote-HtTP' is not a git command. See 'git --help'.
>
> So I guess I can keep case-sensitive comparisons here.

Guarding them to lowercase-only may sound like a cop-out to purists,
but I think it is reasonable thing to do.  The only folks that would
be offended by are protocol lawyers, and as your check shows, we are
treating <scheme> case sensitively already.

An obvious alternative is to downcase the "<scheme>://" part but I
do not think it is worth it; we have to do that everywhere and we
need to be confident that we covered all the code paths---the latter
is expensive.

There do exist skip_iprefix() that are used in many places, like
convert, http, mailinfo, etc. by the way.  That is a moot point as
we are not doing case insensitive comparison anymore.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri
  2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-08-02 12:29   ` [PATCH v2 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11   ` Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
                       ` (4 more replies)
  5 siblings, 5 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee

This is the second series building the bundle URI feature as discussed in
the previous series that added the design document [1]. This series does not
modify the design document, so the patches are independent and can be
applied to the latest 'master'.

[1]
https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

This series brings in just enough logic that we can bootstrap clones from a
single bundle using git clone --bundle-uri=<X>.

 * Patch 1 adds a 'get' capability to 'git remote-https' which allows
   downloading the contents of a URI to a local file.
 * Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It
   is not used until patch 3.
 * Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to
   bootstrap a clone from a bundle, but get the remaining objects from the
   origin URL. (As of this patch, it only accepts a filename.)
 * Patch 4 extends the git clone --bundle-uri=<X> option to allow file://
   and https:// URIs.
 * Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the
   same time in git clone.

As outlined in [1], the next steps after this are:

 1. Allow parsing a bundle list as a config file at the given URI. The
    key-value format is unified with the protocol v2 verb (coming in (3)).
    [2]
 2. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [3]
 3. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [4]

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[2] https://github.com/derrickstolee/git/pull/20

[3] https://github.com/derrickstolee/git/pull/21

[4] https://github.com/derrickstolee/git/pull/22

Note: this series includes some code pulled out of the first series [1], and
in particular the git fetch --bundle-uri=<X> option is removed. The
intention was to replace that with git bundle fetch <X>, but that conflicts
with the work to refactor how subcommands are parsed. The git bundle fetch
subcommand could be added later for maximum flexibility on the client side,
but we can also move forward without it.


Updates in v3
=============

 * The protocol matching for "http://", "https://", and "file://" now
   uniformly include the "//" portion and are case-sensitive.


Updates in v2
=============

 * Several typos or small tweaks. See the range-diff for details.

Thanks, -Stolee

Derrick Stolee (5):
  remote-curl: add 'get' capability
  bundle-uri: create basic file-copy logic
  clone: add --bundle-uri option
  bundle-uri: add support for http(s):// and file://
  clone: --bundle-uri cannot be combined with --depth

 Documentation/git-clone.txt         |   7 ++
 Documentation/gitremote-helpers.txt |   9 ++
 Makefile                            |   1 +
 builtin/clone.c                     |  18 +++
 bundle-uri.c                        | 168 ++++++++++++++++++++++++++++
 bundle-uri.h                        |  14 +++
 remote-curl.c                       |  28 +++++
 t/t5557-http-get.sh                 |  39 +++++++
 t/t5558-clone-bundle-uri.sh         |  81 ++++++++++++++
 t/t5606-clone-options.sh            |   8 ++
 10 files changed, 373 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h
 create mode 100755 t/t5557-http-get.sh
 create mode 100755 t/t5558-clone-bundle-uri.sh


base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1300%2Fderrickstolee%2Fbundle-redo%2Fclone-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1300/derrickstolee/bundle-redo/clone-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1300

Range-diff vs v2:

 1:  4df4a1d679a = 1:  4df4a1d679a remote-curl: add 'get' capability
 2:  6a6f1a04889 = 2:  6a6f1a04889 bundle-uri: create basic file-copy logic
 3:  00debaf6e77 = 3:  00debaf6e77 clone: add --bundle-uri option
 4:  e4f2dcc7a45 ! 4:  66a1ce40451 bundle-uri: add support for http(s):// and file://
     @@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
      +	struct strbuf line = STRBUF_INIT;
      +	int found_get = 0;
      +
     -+	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
     ++	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
      +	cp.in = -1;
      +	cp.out = -1;
      +
     @@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
      +{
      +	const char *out;
      +
     -+	if (istarts_with(uri, "https:") ||
     -+	    istarts_with(uri, "http:"))
     ++	if (starts_with(uri, "https:") ||
     ++	    starts_with(uri, "http:"))
      +		return download_https_uri_to_file(filename, uri);
      +
     -+	if (!skip_prefix(uri, "file://", &out))
     -+		out = uri;
     ++	if (skip_prefix(uri, "file://", &out))
     ++		uri = out;
      +
      +	/* Copy as a file */
     -+	return copy_file(filename, out, 0);
     ++	return copy_file(filename, uri, 0);
       }
       
       static int unbundle_from_file(struct repository *r, const char *file)
 5:  acee1fae027 = 5:  ed76d84c5a7 clone: --bundle-uri cannot be combined with --depth

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 1/5] remote-curl: add 'get' capability
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11     ` Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.

Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/gitremote-helpers.txt |  9 +++++++
 remote-curl.c                       | 28 ++++++++++++++++++++++
 t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+)
 create mode 100755 t/t5557-http-get.sh

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 6f1e269ae43..ed8da428c98 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
 	Can guarantee that when a clone is requested, the received
 	pack is self contained and is connected.
 
+'get'::
+	Can use the 'get' command to download a file from a given URI.
+
 If a helper advertises 'connect', Git will use it if possible and
 fall back to another capability if the helper requests so when
 connecting (see the 'connect' command under COMMANDS).
@@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
 +
 Supported if the helper has the "stateless-connect" capability.
 
+'get' <uri> <path>::
+	Downloads the file from the given `<uri>` to the given `<path>`. If
+	`<path>.temp` exists, then Git assumes that the `.temp` file is a
+	partial download from a previous attempt and will resume the
+	download from that position.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/remote-curl.c b/remote-curl.c
index b8758757ece..72dfb8fb86a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1286,6 +1286,29 @@ static void parse_fetch(struct strbuf *buf)
 	strbuf_reset(buf);
 }
 
+static void parse_get(const char *arg)
+{
+	struct strbuf url = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	const char *space;
+
+	space = strchr(arg, ' ');
+
+	if (!space)
+		die(_("protocol error: expected '<url> <path>', missing space"));
+
+	strbuf_add(&url, arg, space - arg);
+	strbuf_addstr(&path, space + 1);
+
+	if (http_get_file(url.buf, path.buf, NULL))
+		die(_("failed to download file at URL '%s'"), url.buf);
+
+	strbuf_release(&url);
+	strbuf_release(&path);
+	printf("\n");
+	fflush(stdout);
+}
+
 static int push_dav(int nr_spec, const char **specs)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1564,9 +1587,14 @@ int cmd_main(int argc, const char **argv)
 				printf("unsupported\n");
 			fflush(stdout);
 
+		} else if (skip_prefix(buf.buf, "get ", &arg)) {
+			parse_get(arg);
+			fflush(stdout);
+
 		} else if (!strcmp(buf.buf, "capabilities")) {
 			printf("stateless-connect\n");
 			printf("fetch\n");
+			printf("get\n");
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
new file mode 100755
index 00000000000..09fb0bd9a8a
--- /dev/null
+++ b/t/t5557-http-get.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='test downloading a file by URL'
+
+. ./test-lib.sh
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'get by URL: 404' '
+	test_when_finished "rm -f file.temp" &&
+	url="$HTTPD_URL/none.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file1
+	EOF
+
+	test_must_fail git remote-http $url <input 2>err &&
+	test_path_is_missing file1 &&
+	grep "failed to download file at URL" err
+'
+
+test_expect_success 'get by URL: 200' '
+	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
+
+	url="$HTTPD_URL/exists.txt" &&
+	cat >input <<-EOF &&
+	capabilities
+	get $url file2
+
+	EOF
+
+	git remote-http $url <input &&
+	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 2/5] bundle-uri: create basic file-copy logic
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11     ` Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.

For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.

Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile            |   1 +
 bundle-uri.c        | 104 ++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h        |  14 ++++++
 t/t5557-http-get.sh |   2 +
 4 files changed, 121 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h

diff --git a/Makefile b/Makefile
index 1624471badc..7d5f48069ea 100644
--- a/Makefile
+++ b/Makefile
@@ -906,6 +906,7 @@ LIB_OBJS += blob.o
 LIB_OBJS += bloom.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
+LIB_OBJS += bundle-uri.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += cbtree.o
diff --git a/bundle-uri.c b/bundle-uri.c
new file mode 100644
index 00000000000..b35babc36aa
--- /dev/null
+++ b/bundle-uri.c
@@ -0,0 +1,104 @@
+#include "cache.h"
+#include "bundle-uri.h"
+#include "bundle.h"
+#include "object-store.h"
+#include "refs.h"
+#include "run-command.h"
+
+static int find_temp_filename(struct strbuf *name)
+{
+	int fd;
+	/*
+	 * Find a temporary filename that is available. This is briefly
+	 * racy, but unlikely to collide.
+	 */
+	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	if (fd < 0) {
+		warning(_("failed to create temporary file"));
+		return -1;
+	}
+
+	close(fd);
+	unlink(name->buf);
+	return 0;
+}
+
+static int copy_uri_to_file(const char *file, const char *uri)
+{
+	/* File-based URIs only for now. */
+	return copy_file(file, uri, 0);
+}
+
+static int unbundle_from_file(struct repository *r, const char *file)
+{
+	int result = 0;
+	int bundle_fd;
+	struct bundle_header header = BUNDLE_HEADER_INIT;
+	struct string_list_item *refname;
+	struct strbuf bundle_ref = STRBUF_INIT;
+	size_t bundle_prefix_len;
+
+	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
+		return 1;
+
+	if ((result = unbundle(r, &header, bundle_fd, NULL)))
+		return 1;
+
+	/*
+	 * Convert all refs/heads/ from the bundle into refs/bundles/
+	 * in the local repository.
+	 */
+	strbuf_addstr(&bundle_ref, "refs/bundles/");
+	bundle_prefix_len = bundle_ref.len;
+
+	for_each_string_list_item(refname, &header.references) {
+		struct object_id *oid = refname->util;
+		struct object_id old_oid;
+		const char *branch_name;
+		int has_old;
+
+		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+			continue;
+
+		strbuf_setlen(&bundle_ref, bundle_prefix_len);
+		strbuf_addstr(&bundle_ref, branch_name);
+
+		has_old = !read_ref(bundle_ref.buf, &old_oid);
+		update_ref("fetched bundle", bundle_ref.buf, oid,
+			   has_old ? &old_oid : NULL,
+			   REF_SKIP_OID_VERIFICATION,
+			   UPDATE_REFS_MSG_ON_ERR);
+	}
+
+	bundle_header_release(&header);
+	return result;
+}
+
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	int result = 0;
+	struct strbuf filename = STRBUF_INIT;
+
+	if ((result = find_temp_filename(&filename)))
+		goto cleanup;
+
+	if ((result = copy_uri_to_file(filename.buf, uri))) {
+		warning(_("failed to download bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+	if ((result = !is_bundle(filename.buf, 0))) {
+		warning(_("file at URI '%s' is not a bundle"), uri);
+		goto cleanup;
+	}
+
+	if ((result = unbundle_from_file(r, filename.buf))) {
+		warning(_("failed to unbundle bundle from URI '%s'"), uri);
+		goto cleanup;
+	}
+
+cleanup:
+	unlink(filename.buf);
+	strbuf_release(&filename);
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
new file mode 100644
index 00000000000..8a152f1ef14
--- /dev/null
+++ b/bundle-uri.h
@@ -0,0 +1,14 @@
+#ifndef BUNDLE_URI_H
+#define BUNDLE_URI_H
+
+struct repository;
+
+/**
+ * Fetch data from the given 'uri' and unbundle the bundle data found
+ * based on that information.
+ *
+ * Returns non-zero if no bundle information is found at the given 'uri'.
+ */
+int fetch_bundle_uri(struct repository *r, const char *uri);
+
+#endif
diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
index 09fb0bd9a8a..76a4bbd16af 100755
--- a/t/t5557-http-get.sh
+++ b/t/t5557-http-get.sh
@@ -2,6 +2,8 @@
 
 test_description='test downloading a file by URL'
 
+TEST_PASSES_SANITIZE_LEAK=true
+
 . ./test-lib.sh
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 3/5] clone: add --bundle-uri option
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11     ` Derrick Stolee via GitGitGadget
  2022-08-22 21:24       ` Junio C Hamano
  2022-08-09 13:11     ` [PATCH v3 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
  2022-08-09 13:11     ` [PATCH v3 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
  4 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt |  6 ++++++
 builtin/clone.c             | 15 +++++++++++++++
 t/t5558-clone-bundle-uri.sh | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+)
 create mode 100755 t/t5558-clone-bundle-uri.sh

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 632bd1348ea..60fedf7eb5e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -323,6 +323,12 @@ or `--mirror` is given)
 	for `host.xz:foo/.git`).  Cloning into an existing directory
 	is only allowed if the directory is empty.
 
+--bundle-uri=<uri>::
+	Before fetching from the remote, fetch a bundle from the given
+	`<uri>` and unbundle the data into the local repository. The refs
+	in the bundle will be stored under the hidden `refs/bundle/*`
+	namespace.
+
 :git-clone: 1
 include::urls.txt[]
 
diff --git a/builtin/clone.c b/builtin/clone.c
index c4ff4643ecd..4224d562758 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -34,6 +34,7 @@
 #include "list-objects-filter-options.h"
 #include "hook.h"
 #include "bundle.h"
+#include "bundle-uri.h"
 
 /*
  * Overall FIXMEs:
@@ -77,6 +78,7 @@ static int option_filter_submodules = -1;    /* unspecified */
 static int config_filter_submodules = -1;    /* unspecified */
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
+static const char *bundle_uri;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
 		    N_("initialize sparse-checkout file to include only files at root")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri,
+		   N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")),
 	OPT_END()
 };
 
@@ -1232,6 +1236,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (transport->smart_options && !deepen && !filter_options.choice)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
+	/*
+	 * Before fetching from the remote, download and install bundle
+	 * data from the --bundle-uri option.
+	 */
+	if (bundle_uri) {
+		/* At this point, we need the_repository to match the cloned repo. */
+		repo_init(the_repository, git_dir, work_tree);
+		if (fetch_bundle_uri(the_repository, bundle_uri))
+			warning(_("failed to fetch objects from bundle URI '%s'"),
+				bundle_uri);
+	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
 	refspec_ref_prefixes(&remote->fetch,
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
new file mode 100755
index 00000000000..f709bcb729c
--- /dev/null
+++ b/t/t5558-clone-bundle-uri.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='test fetching bundles with --bundle-uri'
+
+. ./test-lib.sh
+
+test_expect_success 'fail to clone from non-existent file' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$(pwd)/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to clone from non-bundle file' '
+	test_when_finished rm -rf test &&
+	echo bogus >bogus &&
+	git clone --bundle-uri="$(pwd)/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'create bundle' '
+	git init clone-from &&
+	git -C clone-from checkout -b topic &&
+	test_commit -C clone-from A &&
+	test_commit -C clone-from B &&
+	git -C clone-from bundle create B.bundle topic
+'
+
+test_expect_success 'clone with path bundle' '
+	git clone --bundle-uri="clone-from/B.bundle" \
+		clone-from clone-path &&
+	git -C clone-path rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-08-09 13:11     ` [PATCH v3 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11     ` Derrick Stolee via GitGitGadget
  2022-08-29  4:58       ` Teng Long
  2022-08-09 13:11     ` [PATCH v3 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
  4 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change created the 'git clone --bundle-uri=<uri>' option.
Currently, <uri> must be a filename.

Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
and use git-remote-https as the way to download the data at that URI.
Otherwise, check to see if file:// is present and modify the prefix
accordingly.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index b35babc36aa..4a8cc74ed05 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
 	return 0;
 }
 
-static int copy_uri_to_file(const char *file, const char *uri)
+static int download_https_uri_to_file(const char *file, const char *uri)
 {
-	/* File-based URIs only for now. */
-	return copy_file(file, uri, 0);
+	int result = 0;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	FILE *child_in = NULL, *child_out = NULL;
+	struct strbuf line = STRBUF_INIT;
+	int found_get = 0;
+
+	strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
+	cp.in = -1;
+	cp.out = -1;
+
+	if (start_command(&cp))
+		return 1;
+
+	child_in = fdopen(cp.in, "w");
+	if (!child_in) {
+		result = 1;
+		goto cleanup;
+	}
+
+	child_out = fdopen(cp.out, "r");
+	if (!child_out) {
+		result = 1;
+		goto cleanup;
+	}
+
+	fprintf(child_in, "capabilities\n");
+	fflush(child_in);
+
+	while (!strbuf_getline(&line, child_out)) {
+		if (!line.len)
+			break;
+		if (!strcmp(line.buf, "get"))
+			found_get = 1;
+	}
+	strbuf_release(&line);
+
+	if (!found_get) {
+		result = error(_("insufficient capabilities"));
+		goto cleanup;
+	}
+
+	fprintf(child_in, "get %s %s\n\n", uri, file);
+
+cleanup:
+	if (child_in)
+		fclose(child_in);
+	if (finish_command(&cp))
+		return 1;
+	if (child_out)
+		fclose(child_out);
+	return result;
+}
+
+static int copy_uri_to_file(const char *filename, const char *uri)
+{
+	const char *out;
+
+	if (starts_with(uri, "https:") ||
+	    starts_with(uri, "http:"))
+		return download_https_uri_to_file(filename, uri);
+
+	if (skip_prefix(uri, "file://", &out))
+		uri = out;
+
+	/* Copy as a file */
+	return copy_file(filename, uri, 0);
 }
 
 static int unbundle_from_file(struct repository *r, const char *file)
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index f709bcb729c..ad666a2d28a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone with file:// bundle' '
+	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
+		clone-from clone-file &&
+	git -C clone-file rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual
+'
+
+#########################################################################
+# HTTP tests begin here
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fail to fetch from non-existent HTTP URL' '
+	test_when_finished rm -rf test &&
+	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
+	grep "failed to download bundle from URI" err
+'
+
+test_expect_success 'fail to fetch from non-bundle HTTP URL' '
+	test_when_finished rm -rf test &&
+	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
+	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
+	grep "is not a bundle" err
+'
+
+test_expect_success 'clone HTTP bundle' '
+	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
+
+	git clone --no-local --mirror clone-from \
+		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
+
+	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
+		"$HTTPD_URL/smart/fetch.git" clone-http &&
+	git -C clone-http rev-parse refs/bundles/topic >actual &&
+	git -C clone-from rev-parse topic >expect &&
+	test_cmp expect actual &&
+
+	test_config -C clone-http log.excludedecoration refs/bundle/
+'
+
+# Do not add tests here unless they use the HTTP server, as they will
+# not run unless the HTTP dependencies exist.
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 5/5] clone: --bundle-uri cannot be combined with --depth
  2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-08-09 13:11     ` [PATCH v3 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-08-09 13:11     ` Derrick Stolee via GitGitGadget
  4 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-08-09 13:11 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change added the '--bundle-uri' option, but did not check
if the --depth parameter was included. Since bundles are not compatible
with shallow clones, provide an error message to the user who is
attempting this combination.

I am leaving this as its own change, separate from the one that
implements '--bundle-uri', because this is more of an advisory for the
user. There is nothing wrong with bootstrapping with bundles and then
fetching a shallow clone. However, that is likely going to involve too
much work for the client _and_ the server. The client will download all
of this bundle information containing the full history of the
repository only to ignore most of it. The server will get a shallow
fetch request, but with a list of haves that might cause a more painful
computation of that shallow pack-file.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-clone.txt | 3 ++-
 builtin/clone.c             | 3 +++
 t/t5606-clone-options.sh    | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 60fedf7eb5e..d032d971dd7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -327,7 +327,8 @@ or `--mirror` is given)
 	Before fetching from the remote, fetch a bundle from the given
 	`<uri>` and unbundle the data into the local repository. The refs
 	in the bundle will be stored under the hidden `refs/bundle/*`
-	namespace.
+	namespace. This option is incompatible with `--depth`,
+	`--shallow-since`, and `--shallow-exclude`.
 
 :git-clone: 1
 include::urls.txt[]
diff --git a/builtin/clone.c b/builtin/clone.c
index 4224d562758..4463789680b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -937,6 +937,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		option_no_checkout = 1;
 	}
 
+	if (bundle_uri && deepen)
+		die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude"));
+
 	repo_name = argv[0];
 
 	path = get_repo_path(repo_name, &is_bundle);
diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh
index 8f676d6b0c0..f6bb02ab947 100755
--- a/t/t5606-clone-options.sh
+++ b/t/t5606-clone-options.sh
@@ -58,6 +58,14 @@ test_expect_success 'disallows --bare with --separate-git-dir' '
 
 '
 
+test_expect_success 'disallows --bundle-uri with shallow options' '
+	for option in --depth=1 --shallow-since=01-01-2000 --shallow-exclude=HEAD
+	do
+		test_must_fail git clone --bundle-uri=bundle $option from to 2>err &&
+		grep "bundle-uri is incompatible" err || return 1
+	done
+'
+
 test_expect_success 'reject cloning shallow repository' '
 	test_when_finished "rm -rf repo" &&
 	test_must_fail git clone --reject-shallow shallow-repo out 2>err &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 3/5] clone: add --bundle-uri option
  2022-08-09 13:11     ` [PATCH v3 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-08-22 21:24       ` Junio C Hamano
  2022-08-23 14:05         ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2022-08-22 21:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * Before fetching from the remote, download and install bundle
> +	 * data from the --bundle-uri option.
> +	 */
> +	if (bundle_uri) {
> +		/* At this point, we need the_repository to match the cloned repo. */
> +		repo_init(the_repository, git_dir, work_tree);
> +		if (fetch_bundle_uri(the_repository, bundle_uri))
> +			warning(_("failed to fetch objects from bundle URI '%s'"),
> +				bundle_uri);
> +	}

I do not offhand know why I suddenly started seeing the issue for
this relatively old commit I have had in my tree for at least 10
days, but I am getting this

builtin/clone.c: In function 'cmd_clone':
builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
 1248 |                 repo_init(the_repository, git_dir, work_tree);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


with the commit merged in 'seen'.

It seems that the updated ab/submodule-helper-prep is doing it.  

Why can't that (or any of Ævar's) topic focus on what it needs to
do, without churning the codebase and inflict damages to other
topics like this?  Quite frustrating.

I'll redo today's integration without the offending topic to give
other topics test coverage.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 3/5] clone: add --bundle-uri option
  2022-08-22 21:24       ` Junio C Hamano
@ 2022-08-23 14:05         ` Derrick Stolee
  2022-08-24 15:46           ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2022-08-23 14:05 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, avarab, dyroneteng, Johannes.Schindelin,
	szeder.dev, mjcheetham, steadmon

On 8/22/2022 5:24 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	/*
>> +	 * Before fetching from the remote, download and install bundle
>> +	 * data from the --bundle-uri option.
>> +	 */
>> +	if (bundle_uri) {
>> +		/* At this point, we need the_repository to match the cloned repo. */
>> +		repo_init(the_repository, git_dir, work_tree);
>> +		if (fetch_bundle_uri(the_repository, bundle_uri))
>> +			warning(_("failed to fetch objects from bundle URI '%s'"),
>> +				bundle_uri);
>> +	}
> 
> I do not offhand know why I suddenly started seeing the issue for
> this relatively old commit I have had in my tree for at least 10
> days, but I am getting this
> 
> builtin/clone.c: In function 'cmd_clone':
> builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
>  1248 |                 repo_init(the_repository, git_dir, work_tree);
>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> with the commit merged in 'seen'.

Thank you for pointing this out. Here is a patch that fixes
the issue from this patch:

--- >8 ---

From 6df8bc6d7ffdc1b115d85ef9550bab5dcf1811f8 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Tue, 23 Aug 2022 09:53:47 -0400
Subject: [PATCH] clone: warn on failure to repo_init()

The --bundle-uri option was added in 55568919616 (clone: add
--bundle-uri option, 2022-08-09), but this also introduced a call to
repo_init() whose return value was ignored. Fix that ignored value by
warning that the bundle URI process could not continue if it failed.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4463789680b..e21d42dfee5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 */
 	if (bundle_uri) {
 		/* At this point, we need the_repository to match the cloned repo. */
-		repo_init(the_repository, git_dir, work_tree);
-		if (fetch_bundle_uri(the_repository, bundle_uri))
+		if (repo_init(the_repository, git_dir, work_tree))
+			warning(_("failed to initialize the repo, skipping bundle URI"));
+		else if (fetch_bundle_uri(the_repository, bundle_uri))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
 	}
-- 
2.37.1.vfs.0.0.rebase



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 3/5] clone: add --bundle-uri option
  2022-08-23 14:05         ` Derrick Stolee
@ 2022-08-24 15:46           ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2022-08-24 15:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, avarab,
	dyroneteng, Johannes.Schindelin, szeder.dev, mjcheetham,
	steadmon

Derrick Stolee <derrickstolee@github.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
> Date: Tue, 23 Aug 2022 09:53:47 -0400
> Subject: [PATCH] clone: warn on failure to repo_init()
>
> The --bundle-uri option was added in 55568919616 (clone: add
> --bundle-uri option, 2022-08-09), but this also introduced a call to
> repo_init() whose return value was ignored. Fix that ignored value by
> warning that the bundle URI process could not continue if it failed.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

OK.  It looks like a sensible thing to do.

>  builtin/clone.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 4463789680b..e21d42dfee5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	 */
>  	if (bundle_uri) {
>  		/* At this point, we need the_repository to match the cloned repo. */
> -		repo_init(the_repository, git_dir, work_tree);
> -		if (fetch_bundle_uri(the_repository, bundle_uri))
> +		if (repo_init(the_repository, git_dir, work_tree))
> +			warning(_("failed to initialize the repo, skipping bundle URI"));
> +		else if (fetch_bundle_uri(the_repository, bundle_uri))
>  			warning(_("failed to fetch objects from bundle URI '%s'"),
>  				bundle_uri);
>  	}

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-09 13:11     ` [PATCH v3 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-08-29  4:58       ` Teng Long
  2022-08-30 13:33         ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Teng Long @ 2022-08-29  4:58 UTC (permalink / raw)
  To: gitgitgadget
  Cc: Johannes.Schindelin, avarab, derrickstolee, dyroneteng, git,
	gitster, me, mjcheetham, newren, steadmon, szeder.dev

Derrick Stolee <derrickstolee@github.com> writes:

> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}

Clear implementation for me to read, thanks.

I'm not sure but maybe a nit, should it be a "break;" after
"found_get = 1;" for omitting the left uncencerned capabilities
to quit earlier?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 4/5] bundle-uri: add support for http(s):// and file://
  2022-08-29  4:58       ` Teng Long
@ 2022-08-30 13:33         ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2022-08-30 13:33 UTC (permalink / raw)
  To: Teng Long, gitgitgadget
  Cc: Johannes.Schindelin, avarab, git, gitster, me, mjcheetham,
	newren, steadmon, szeder.dev

On 8/29/2022 12:58 AM, Teng Long wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> +	while (!strbuf_getline(&line, child_out)) {
>> +		if (!line.len)
>> +			break;
>> +		if (!strcmp(line.buf, "get"))
>> +			found_get = 1;
>> +	}
> 
> Clear implementation for me to read, thanks.
> 
> I'm not sure but maybe a nit, should it be a "break;" after
> "found_get = 1;" for omitting the left uncencerned capabilities
> to quit earlier?

Teng, you are right that it does not hurt to have a 'break'
here. Since this topic has already merged, that change will
need a forward-fix that I can add to a future series.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-08-30 13:33 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-25 20:34 [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
2022-07-25 20:34 ` [PATCH 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-27 22:08   ` Josh Steadmon
2022-07-27 23:00   ` Ævar Arnfjörð Bjarmason
2022-08-01 13:55     ` Derrick Stolee
2022-08-01 14:39       ` Ævar Arnfjörð Bjarmason
2022-07-25 20:34 ` [PATCH 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-07-27 22:09   ` Josh Steadmon
2022-08-01 13:58     ` Derrick Stolee
2022-07-25 20:34 ` [PATCH 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-07-25 20:34 ` [PATCH 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-07-27 22:09   ` Josh Steadmon
2022-08-01 14:00     ` Derrick Stolee
2022-07-25 20:34 ` [PATCH 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
2022-07-27 22:11 ` [PATCH 0/5] Bundle URIs II: git clone --bundle-uri Josh Steadmon
2022-08-01 14:00   ` Derrick Stolee
2022-08-02 12:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2022-08-02 12:29   ` [PATCH v2 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-08-02 12:29   ` [PATCH v2 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-08-02 12:29   ` [PATCH v2 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-08-02 12:29   ` [PATCH v2 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-08-02 21:32     ` Junio C Hamano
2022-08-04 15:34       ` Derrick Stolee
2022-08-04 18:19         ` Junio C Hamano
2022-08-02 12:29   ` [PATCH v2 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
2022-08-09 13:11   ` [PATCH v3 0/5] Bundle URIs II: git clone --bundle-uri Derrick Stolee via GitGitGadget
2022-08-09 13:11     ` [PATCH v3 1/5] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-08-09 13:11     ` [PATCH v3 2/5] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-08-09 13:11     ` [PATCH v3 3/5] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-08-22 21:24       ` Junio C Hamano
2022-08-23 14:05         ` Derrick Stolee
2022-08-24 15:46           ` Junio C Hamano
2022-08-09 13:11     ` [PATCH v3 4/5] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-08-29  4:58       ` Teng Long
2022-08-30 13:33         ` Derrick Stolee
2022-08-09 13:11     ` [PATCH v3 5/5] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).