All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Albert Cui" <albertqcui@gmail.com>,
	"Robin H . Johnson" <robbat2@gentoo.org>,
	"Teng Long" <dyroneteng@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC PATCH v2 13/13] bundle-uri: make the download program configurable
Date: Fri, 11 Mar 2022 17:24:25 +0100	[thread overview]
Message-ID: <RFC-patch-v2-13.13-40f37c8b9d5-20220311T155841Z-avarab@gmail.com> (raw)
In-Reply-To: <RFC-cover-v2-00.13-00000000000-20220311T155841Z-avarab@gmail.com>

As noted in a preceding commit we really should be using libcurl's C
API by default in get_bundle_uri(), but testing with a command-line
program can be very handy, and useful e.g. to implement custom or
ad-hoc caching.

E.g. using part of the recipe noted in a preceding commit to create
the "git-master-only.bdl" and "git-master-to-next.bdl" files, we can
implement a trivial caching shellscript as:

	cat >get-bundle.sh <<-\EOF &&
	#!/bin/sh
	set -xe

	uri="$1"

	bundle_cache_key () {
		echo "Computing cache key for URI '$1' (only getting the header)" >&2

		curl --silent --output - -- "$1" |
		sed -n -e '/^$/q' -e 'p' |
		git hash-object --stdin
	}

	get_cached_bundle_uri() {
		cache_key=$(bundle_cache_key "$1")

		path="/tmp/bundle-cache-$cache_key.bdl"

		if test -e "$path"
		then
			echo "Using cache '$path' for URI '$1'" >&2
			cat "$path"
		else
			echo "Downloading bundle URI $1" >&2
			curl --silent --output - -- "$uri" | tee "$path"
		fi
	}

	get_cached_bundle_uri "$1"
	EOF
	chmod +x get-bundle.sh &&
	rm -rf /tmp/git.git &&
	./git \
		-c protocol.version=2 \
		-c fetch.uriProtocols=file \
		-c transfer.bundleURI.downloader=./get-bundle.sh \
		-c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \
		-c transfer.injectBundleURI="file:///tmp/git-master-to-next.bdl" \
		clone --bare --no-tags --single-branch --branch next --template= \
		--verbose --verbose \
		https://github.com/git/git.git /tmp/git.git

Now, clearly that specific example is rather pointless. We're getting
a local file anyway, so "cat"-ing another local file doesn't make any
difference, it's even slightly slower & more redundant as we're having
to get it twice with "curl".

But the point is that this can be trivially improved for use in any
arbitrary custom caching strategy. E.g.:

 * A less dumber implementation that would stream the remote URL,
   check the header as we go, and disconnect if we've got that content
   locally.
 * Ditto, but using an ETag or other strategy.
 * N boxes could share a cache an NFS with a shared mount, or N
   disconnected git processes could use a common cache without the
   need for a front-line HTTP proxy server.

 * It would be trivial to extend this to guard against a "thundering
   herd" (e.g. concurrent CI) downloading the same bundle N times. As
   soon as we'd get the header we'd create a $cache_key.lock as we
   download the rest, and other concurrent clients spotting that would
   wait, then eventually cache "$cache_key".

   Still racy as N clients could download the header in parallel, but
   way less so (the header will be a tiny part of the payload).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/transfer.txt | 7 +++++++
 fetch-pack.c                      | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index ae85ca5760b..5310cd96cb9 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -84,6 +84,13 @@ transfer.bundleURI::
 	using bundles to bootstap is possible. Defaults to `true`,
 	i.e. bundle-uri is tried whenever a server offers it.
 
+transfer.bundleURI.downloader::
+	When set to `<program>` will be invoked when
+	`transfer.bundleURI` is in effect to download URIs containing
+	bundles. Expected to take one `URI` as an argument, and to
+	emit the bundle on STDOUT. Defaults to "curl --silent --output
+	- --". I.e. we'll invoke "curl --silent --output - -- <URI>".
+
 transfer.injectBundleURI::
 	Allows for the injection of `bundle-uri` lines into the
 	protocol v2 transport dialog (see `protocol.version` in
diff --git a/fetch-pack.c b/fetch-pack.c
index 4f1a7acb20d..6e22605f06c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1116,12 +1116,18 @@ static int get_bundle_uri(struct string_list_item *item, unsigned int nth,
 	const char *uri = item->string;
 	FILE *out;
 	int out_fd;
+	const char *tmp;
 
 	strvec_push(&cmd.args, "curl");
 	strvec_push(&cmd.args, "--silent");
 	strvec_push(&cmd.args, "--output");
 	strvec_push(&cmd.args, "-");
 	strvec_push(&cmd.args, "--");
+	if (!git_config_get_string_tmp("transfer.bundleURI.downloader", &tmp)) {
+		strvec_clear(&cmd.args);
+		strvec_push(&cmd.args, tmp);
+		cmd.use_shell = 1;
+	}
 	strvec_push(&cmd.args, item->string);
 	cmd.git_cmd = 0;
 	cmd.no_stdin = 1;
-- 
2.35.1.1337.g7e32d794afe


  parent reply	other threads:[~2022-03-11 16:25 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 21:25 [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 1/3] leak tests: mark t5701-git-serve.sh as passing SANITIZE=leak Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri" Ævar Arnfjörð Bjarmason
2021-10-26 14:00   ` Derrick Stolee
2021-10-26 15:00     ` Ævar Arnfjörð Bjarmason
2021-10-27  1:55       ` Derrick Stolee
2021-10-27 17:49         ` Ævar Arnfjörð Bjarmason
2021-10-27  2:01   ` Derrick Stolee
2021-10-27  8:29     ` Ævar Arnfjörð Bjarmason
2021-10-27 16:31       ` Derrick Stolee
2021-10-27 18:01         ` Ævar Arnfjörð Bjarmason
2021-10-27 19:23           ` Derrick Stolee
2021-10-27 20:22             ` Ævar Arnfjörð Bjarmason
2021-10-29 18:30               ` Derrick Stolee
2021-10-30 14:51           ` Philip Oakley
2021-10-25 21:25 ` [PATCH 3/3] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-10-26 14:05   ` Derrick Stolee
2021-10-29 18:46 ` [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Derrick Stolee
2021-10-30  7:21   ` Ævar Arnfjörð Bjarmason
2021-11-01 21:00     ` Derrick Stolee
2021-11-01 23:18       ` Ævar Arnfjörð Bjarmason
2022-03-11 16:24 ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 01/13] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 02/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 03/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 04/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 05/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 06/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 07/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 08/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 11/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` Ævar Arnfjörð Bjarmason [this message]
2022-03-11 21:28   ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Derrick Stolee
2022-04-18 17:23   ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 01/36] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 02/36] dir API: add a generalized path_match_flags() function Ævar Arnfjörð Bjarmason
2022-04-21 17:26       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 03/36] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-04-21 17:28       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 04/36] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 05/36] http: make http_get_file() external Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 06/36] remote: move relative_url() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 07/36] remote: allow relative_url() to return an absolute url Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 08/36] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 09/36] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 10/36] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 11/36] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 12/36] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 13/36] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 14/36] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 15/36] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 16/36] bundle-uri: make the download program configurable Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 17/36] remote-curl: add 'get' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 18/36] bundle: implement 'fetch' command for direct bundles Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 19/36] bundle: parse table of contents during 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 20/36] bundle: add --filter option to 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 21/36] bundle: allow relative URLs in table of contents Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 22/36] bundle: make it easy to call 'git bundle fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 23/36] clone: add --bundle-uri option Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 24/36] clone: --bundle-uri cannot be combined with --depth Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 25/36] bundle: only fetch bundles if timestamp is new Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 26/36] fetch: fetch bundles before fetching original data Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 27/36] protocol-caps: implement cap_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 28/36] serve: understand but do not advertise 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 29/36] serve: advertise 'features' when config exists Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 30/36] connect: implement get_recommended_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 31/36] transport: add connections for 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 32/36] clone: use server-recommended bundle URI Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 33/36] t5601: basic bundle URI test Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 34/36] protocol v2: add server-side "bundle-uri" skeleton (docs) Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 35/36] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 36/36] docs: document bundle URI standard Ævar Arnfjörð Bjarmason
2022-04-21 19:54     ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Derrick Stolee
2022-04-22  9:37       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=RFC-patch-v2-13.13-40f37c8b9d5-20220311T155841Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=albertqcui@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dyroneteng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=robbat2@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.