All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH 5/8] Documentation: add Packfile URIs design doc
Date: Fri, 29 May 2020 15:30:17 -0700	[thread overview]
Message-ID: <4eea9d927af1df11cdb0342e969b293a6e317d46.1590789428.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1590789428.git.jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/packfile-uri.txt | 78 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt  | 28 ++++++++-
 2 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..6a5a6440d5
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,78 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients then should understand that the returned packfile could be incomplete,
+and that it needs to download all the given URIs before the fetch or clone is
+complete.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-uris` parameter) by passing --no-packfile-uris to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index ef7514a3ee..7e1b3a0bfe 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -325,13 +325,26 @@ included in the client's request:
 	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
 	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
 
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header. Most sections are sent only when the packfile is sent.
 
     output = acknowledgements flush-pkt |
 	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
-	     [wanted-refs delim-pkt] packfile flush-pkt
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -349,6 +362,9 @@ header. Most sections are sent only when the packfile is sent.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -420,6 +436,16 @@ header. Most sections are sent only when the packfile is sent.
 	* The server MUST NOT send any refs which were not requested
 	  using 'want-ref' lines.
 
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
     packfile section
 	* This section is only included if the client has sent 'want'
 	  lines in its request and either requested that no more
-- 
2.27.0.rc0.183.gde8f92d652-goog


  parent reply	other threads:[~2020-05-29 22:30 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29 22:30 [PATCH 0/8] CDN offloading update Jonathan Tan
2020-05-29 22:30 ` [PATCH 1/8] http: use --stdin when getting dumb HTTP pack Jonathan Tan
2020-05-29 23:00   ` Junio C Hamano
2020-06-01 20:37     ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 2/8] http: improve documentation of http_pack_request Jonathan Tan
2020-05-29 22:30 ` [PATCH 3/8] http-fetch: support fetching packfiles by URL Jonathan Tan
2020-05-29 23:25   ` Junio C Hamano
2020-06-01 20:54     ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 4/8] Documentation: order protocol v2 sections Jonathan Tan
2020-05-29 23:32   ` Junio C Hamano
2020-06-01 20:57     ` Jonathan Tan
2020-05-29 22:30 ` Jonathan Tan [this message]
2020-05-30  0:15   ` [PATCH 5/8] Documentation: add Packfile URIs design doc Junio C Hamano
2020-05-30  0:22     ` Junio C Hamano
2020-06-01 23:10       ` Jonathan Tan
2020-06-01 23:07     ` Jonathan Tan
2020-06-10  1:14     ` Jonathan Tan
2020-06-10 17:16       ` Junio C Hamano
2020-06-10 18:04         ` Jonathan Tan
2020-05-29 22:30 ` [PATCH 6/8] upload-pack: refactor reading of pack-objects out Jonathan Tan
2020-05-29 22:30 ` [PATCH 7/8] fetch-pack: support more than one pack lockfile Jonathan Tan
2020-05-29 22:30 ` [PATCH 8/8] upload-pack: send part of packfile response as uri Jonathan Tan
2020-05-31 16:59   ` Junio C Hamano
2020-05-31 17:35   ` Junio C Hamano
2020-06-01 23:20     ` Jonathan Tan
2020-06-01 20:00   ` Jonathan Nieder
2020-06-10 20:57 ` [PATCH v2 0/9] CDN offloading update Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 1/9] http: use --stdin when indexing dumb HTTP pack Jonathan Tan
2020-06-11  1:10     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 2/9] http: refactor finish_http_pack_request() Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 3/9] http-fetch: refactor into function Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 4/9] http-fetch: support fetching packfiles by URL Jonathan Tan
2020-06-11  1:30     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 5/9] Documentation: order protocol v2 sections Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 6/9] Documentation: add Packfile URIs design doc Jonathan Tan
2020-06-11  1:55     ` Junio C Hamano
2020-11-25  9:15     ` Ævar Arnfjörð Bjarmason
2020-11-25 19:09       ` Jonathan Tan
2020-12-01 12:48         ` Ævar Arnfjörð Bjarmason
2020-06-10 20:57   ` [PATCH v2 7/9] upload-pack: refactor reading of pack-objects out Jonathan Tan
2020-06-10 20:57   ` [PATCH v2 8/9] fetch-pack: support more than one pack lockfile Jonathan Tan
2020-06-11  1:41     ` Junio C Hamano
2020-06-10 20:57   ` [PATCH v2 9/9] upload-pack: send part of packfile response as uri Jonathan Tan
2020-06-10 23:16   ` [PATCH v2 0/9] CDN offloading update Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4eea9d927af1df11cdb0342e969b293a6e317d46.1590789428.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.