git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Return of smart HTTP
@ 2009-10-09  5:22 Shawn O. Pearce
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
  0 siblings, 1 reply; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-09  5:22 UTC (permalink / raw)
  To: git

This is an RFC series to restart the smart HTTP transport work.

Those familiar with the native git:// protocol should be able to
quickly understand what I'm doing here by looking at only the last
two patches.

This time around I actually have the whole thing fully implemented
in JGit (both client and server), and am now trying to port that
over to C git.git, as well as document it in depth.

The JGit series can be found here at Eclipse.org:

  http://egit.eclipse.org/r/
  git://egit.eclipse.org/egit/parallelip-jgit refs/changes/50/50/4

This RFC C Git series only implements the server side, and only
has partial documentation.  I did some limited smoke testing with
the JGit client against this server, it seems to work as expected.

I plan on trying to write the C Git clients tomorrow.  The
send-pack/receive-pack protocol is trivial and shouldn't be
that hard, but the fetch-pack/upload-pack protocol is going
to be somewhat interesting...


Shawn O. Pearce (4):
  Document the HTTP transport protocol
  Git-aware CGI to provide dumb HTTP transport
  Add smart-http options to upload-pack, receive-pack
  Smart fetch and push over HTTP: server side

 .gitignore                                |    1 +
 Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
 Makefile                                  |    1 +
 builtin-receive-pack.c                    |   26 +-
 http-backend.c                            |  394 +++++++++++++++++++++
 upload-pack.c                             |   40 ++-
 6 files changed, 994 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/technical/http-protocol.txt
 create mode 100644 http-backend.c

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 [RFC PATCH 0/4] Return of smart HTTP Shawn O. Pearce
@ 2009-10-09  5:22 ` Shawn O. Pearce
  2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
                     ` (8 more replies)
  0 siblings, 9 replies; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-09  5:22 UTC (permalink / raw)
  To: git

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
 1 files changed, 542 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/http-protocol.txt

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000..316d9b6
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,542 @@
+HTTP transfer protocols
+=======================
+
+Git supports two HTTP based transfer protocols.  A "dumb" protocol
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol which requires a Git aware CGI
+(or server module).  This document describes both protocols.
+
+As a design feature smart clients can automatically upgrade "dumb"
+protocol URLs to smart URLs.  This permits all users to have the
+same published URL, and the peers automatically select the most
+efficient transport available to them.
+
+
+URL Format
+----------
+
+URLs for Git repositories accessed by HTTP use the standard HTTP
+URL syntax documented by RFC 1738, so they are of the form:
+
+  http://<host>:<port>/<path>
+
+Within this documentation the placeholder $GIT_URL will stand for
+the http:// repository URL entered by the end-user.
+
+Both the "smart" and "dumb" HTTP protocols used by Git operate
+by appending additional path components onto the end of the user
+supplied $GIT_URL string.
+
+Clients MUST strip a trailing '/', if present, from the user supplied
+$GIT_URL string to prevent empty path tokens ('//') from appearing
+in any URL sent to a server.  Compatible clients must expand
+'$GIT_URL/info/refs' as 'foo/info/refs' and not 'foo//info/refs'.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used if authentication is required
+to access a repository, and MAY be configured and enforced by the
+HTTP server software.
+
+Because Git repositories are accessed by standard path components
+server administrators MAY use directory based permissions within
+their HTTP server to control repository access.
+
+Clients SHOULD support Basic authentication as described by RFC 2616.
+Servers SHOULD support Basic authentication by relying upon the
+HTTP server placed in front of the Git server software.
+
+Servers MUST NOT require HTTP cookies for the purposes of
+authentication or access control.
+
+Clients and servers MAY support other common forms of HTTP based
+authentication, such as Digest authentication.
+
+
+SSL
+---
+
+Clients and servers SHOULD support SSL, particularly to protect
+passwords when relying on Basic HTTP authentication.
+
+
+Session State
+-------------
+
+The Git over HTTP protocol (much like HTTP itself) is stateless
+from the perspective of the HTTP server side.  All state must be
+retained and managed by the client process.  This permits simple
+round-robin load-balancing on the server side, without needing to
+worry about state mangement.
+
+Clients MUST NOT require state management on the server side in
+order to function correctly.
+
+Servers MUST NOT require HTTP cookies in order to function correctly.
+Clients MAY store and forward HTTP cookies during request processing
+as described by RFC 2616 (HTTP/1.1).  Servers SHOULD ignore any
+cookies sent by a client.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string.  The first four bytes
+of the line indicates the total length of the line, in hexadecimal.
+The total length includes the 4 bytes used to denote the length.
+A line SHOULD BE terminated by an LF, which if present MUST be
+included in the total length.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure all
+pkt-line parsing/formatting routines are 8-bit clean.  The maximum
+length of a pkt-line's data is 65532 bytes (65536 - 4).
+
+Examples (as C-style strings):
+
+  pkt-line          actual value
+  ---------------------------------
+  "0006a\n"         "a\n"
+  "0005a"           "a"
+  "000bfoobar\n"    "foobar\n"
+  "0004"            ""
+
+A pkt-line with a length of 0 ("0000") is a special case and MUST
+be treated as a message break or terminator in the payload.
+
+
+General Request Processing
+--------------------------
+
+Except where noted, all standard HTTP behavior SHOULD be assumed
+by both client and server.  This includes (but is not necessarily
+limited to):
+
+If there is no repository at $GIT_URL, the server MUST respond with
+the '404 Not Found' HTTP status code.
+
+If there is a repository at $GIT_URL, but access is not currently
+permitted, the server MUST respond with the '403 Forbidden' HTTP
+status code.
+
+Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
+Servers SHOULD support chunked encoding for both
+request and response bodies.
+
+Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
+Clients SHOULD support chunked encoding for both
+request and response bodies.
+
+Servers MAY return ETag and/or Last-Modified headers.
+
+Clients MAY revalidate cached entities by including If-Modified-Since
+and/or If-None-Match request headers.
+
+Servers MAY return '304 Not Modified' if the relevant headers appear
+in the request and the entity has not changed.  Clients MUST treat
+'304 Not Modified' identical to '200 OK' by reusing the cached entity.
+
+Clients MAY reuse a cached entity without revalidation if the
+Cache-Control and/or Expires header permits caching.  Clients and
+servers MUST follow RFC 2616 for cache controls.
+
+
+Discovering References
+----------------------
+
+All HTTP clients MUST begin either a fetch or a push exchange by
+discovering the references available on the remote repository.
+
+Dumb Clients
+~~~~~~~~~~~~
+
+HTTP clients that only support the "dumb" protocol MUST discover
+references by making a request for the special info/refs file of
+the repository.
+
+Dumb HTTP clients MUST NOT include search/query parameters when
+fetching the info/refs file.  (That is, '?' must not appear in the
+requested URL.)
+
+	C: GET $GIT_URL/info/refs HTTP/1.0
+
+	S: 200 OK
+	S:
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+The Content-Type of the returned info/refs entity SHOULD be
+"text/plain; charset=utf-8", but MAY be any content type.
+Clients MUST NOT attempt to validate the returned Content-Type.
+Dumb servers MUST NOT return a return type starting with
+"application/x-git-".
+
+Cache-Control headers MAY be returned to disable caching of the
+returned entity.
+
+When examining the response clients SHOULD only examine the HTTP
+status code.  Valid responses are '200 OK', or '304 Not Modified'.
+
+The returned content is a UNIX formatted text file describing
+each ref and its known value.  The file SHOULD be sorted by name
+according to the C locale ordering.  The file SHOULD NOT include
+the default ref named 'HEAD'.
+
+	info_refs     = *( ref_record )
+	ref_record    = any_ref | peeled_ref
+
+	any_ref       = id HT name LF
+	peeled_ref    = id HT name LF
+	                id HT name "^{}" LF
+	id            = 40*HEX
+
+	HEX           = "0".."9" | "a".."f"
+	LF            = <US-ASCII LF, linefeed (10)>
+	HT            = <US-ASCII HT, horizontal-tab (9)>
+
+Smart Clients
+~~~~~~~~~~~~~
+
+HTTP clients that support the "smart" protocol (or both the
+"smart" and "dumb" protocols) MUST discover references by making
+a paramterized request for the info/refs file of the repository.
+
+The request MUST contain exactly one query parameter,
+'service=$servicename', where $servicename MUST be the service
+name the client wishes to contact to complete the operation.
+The request MUST NOT contain additional query parameters.
+
+	C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+	dumb server reply:
+	S: 200 OK
+	S:
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+	smart server reply:
+	S: 200 OK
+	S: Content-Type: application/x-git-upload-pack-advertisement
+	S: Cache-Control: no-cache
+	S:
+	S: ....# service=git-upload-pack
+	S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
+	S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+	S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
+	S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+
+Dumb Server Response
+^^^^^^^^^^^^^^^^^^^^
+Dumb servers MUST respond with the dumb server reply format.
+
+See the prior section under dumb clients for a more detailed
+description of the dumb server response.
+
+Smart Server Response
+^^^^^^^^^^^^^^^^^^^^^
+Smart servers MUST respond with the smart server reply format.
+
+If the server does not recognize the requested service name, or the
+requested service name has been disabled by the server administrator,
+the server MUST respond with the '403 Forbidden' HTTP status code.
+
+Cache-Control headers SHOULD be used to disable caching of the
+returned entity.
+
+The Content-Type MUST be 'application/x-$servicename-advertisement'.
+Clients SHOULD fall back to the dumb protocol if another content
+type is returned.  When falling back to the dumb protocol clients
+SHOULD NOT make an additional request to $GIT_URL/info/refs, but
+instead SHOULD use the response already in hand.  Clients MUST NOT
+continue if they do not support the dumb protocol.
+
+Clients MUST validate the status code is either '200 OK' or
+'304 Not Modified'.
+
+Clients MUST validate the first five bytes of the response entity
+matches the regex "^[0-9a-f]{4}#".  If this test fails, clients
+MUST NOT continue.
+
+Clients MUST parse the entire response as a sequence of pkt-line
+records.
+
+Clients MUST verify the first pkt-line is "# service=$servicename".
+Servers MUST set $servicename to be the request parameter value.
+Servers SHOULD include an LF at the end of this line.
+Clients MUST ignore an LF at the end of the line.
+
+Servers MUST terminate the response with the magic "0000" end
+pkt-line marker.
+
+The returned response is a pkt-line stream describing each ref and
+its known value.  The stream SHOULD be sorted by name according to
+the C locale ordering.  The stream SHOULD include the default ref
+named 'HEAD' as the first ref.  The stream MUST include capability
+declarations behind a NUL on the first ref.
+
+	smart_reply    = PKT-LINE("# service=$servicename" LF)
+	                 ref_list
+	                 "0000"
+	ref_list       = empty_list | populated_list
+
+	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
+
+	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
+	                 *ref_record
+
+	cap_list      = *(SP capability) SP
+	ref_record    = any_ref | peeled_ref
+
+	any_ref       = PKT-LINE(id SP name LF)
+	peeled_ref    = PKT-LINE(id SP name LF)
+	                PKT-LINE(id SP name "^{}" LF
+	id            = 40*HEX
+
+	HEX           = "0".."9" | "a".."f"
+	NL            = <US-ASCII NUL, null (0)>
+	LF            = <US-ASCII LF,  linefeed (10)>
+	SP            = <US-ASCII SP,  horizontal-tab (9)>
+
+
+Smart Service git-upload-pack
+------------------------------
+This service reads from the remote repository.
+
+Clients MUST first perform ref discovery with
+'$GIT_URL/info/refs?service=git-upload-pack'.
+
+	C: POST $GIT_URL/git-upload-pack HTTP/1.0
+	C: Content-Type: application/x-git-upload-pack-request
+	C:
+	C: ....want 0a53e9ddeaddad63ad106860237bbf53411d11a7
+	C: ....have 441b40d833fdfa93eb2908e52742248faf0ee993
+	C: 0000
+
+	S: 200 OK
+	S: Content-Type: application/x-git-upload-pack-result
+	S: Cache-Control: no-cache
+	S:
+	S: ....ACK %s, continue
+	S: ....NAK
+
+Clients MUST NOT reuse or revalidate a cached reponse.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one 'want' command in the request body.
+Clients MUST NOT reference an id in a 'want' command which did not
+appear in the response obtained through ref discovery.
+
+	compute_request   = want_list
+	                    have_list
+	                    request_end
+	request_end       = "0000" | "done"
+
+	want_list         = PKT-LINE(want NUL cap_list LF)
+	                    *(want_pkt)
+	want_pkt          = PKT-LINE(want LF)
+	want              = "want" SP id
+	cap_list          = *(SP capability) SP
+
+	have_list         = *PKT-LINE("have" SP id LF)
+
+	command           = create | delete | update
+	create            = 40*"0" SP new_id SP name
+	delete            = old_id SP 40*"0" SP name
+	update            = old_id SP new_id SP name
+
+TODO: Document this further.
+TODO: Don't use uppercase for variable names below.
+
+Capability include-tag
+~~~~~~~~~~~~~~~~~~~~~~
+
+When packing an object that an annotated tag points at, include the
+tag object too.  Clients can request this if they want to fetch
+tags, but don't know which tags they will need until after they
+receive the branch data.  By enabling include-tag an entire call
+to upload-pack can be avoided.
+
+Capability thin-pack
+~~~~~~~~~~~~~~~~~~~~
+
+When packing a deltified object the base is not included if the base
+is reachable from an object listed in the COMMON set by the client.
+This reduces the bandwidth required to transfer, but it does slightly
+increase processing time for the client to save the pack to disk.
+
+The Negotiation Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The computation to select the minimal pack proceeds as follows
+(c = client, s = server):
+
+ init step:
+ (c) Use ref discovery to obtain the advertised refs.
+ (c) Place any object seen into set ADVERTISED.
+
+ (c) Build an empty set, COMMON, to hold the objects that are later
+     determined to be on both ends.
+ (c) Build a set, WANT, of the objects from ADVERTISED the client
+     wants to fetch, based on what it saw during ref discovery.
+
+ (c) Start a queue, C_PENDING, ordered by commit time (popping newest
+     first).  Add all client refs.  When a commit is popped from
+     the queue its parents should be automatically inserted back.
+     Commits MUST only enter the queue once.
+
+ one compute step:
+ (c) Send one $GIT_URL/git-upload-pack request:
+
+	C: 0032want <WANT #1>...............................
+	C: 0032want <WANT #2>...............................
+	....
+	C: 0032have <COMMON #1>.............................
+	C: 0032have <COMMON #2>.............................
+	....
+	C: 0032have <HAVE #1>...............................
+	C: 0032have <HAVE #2>...............................
+	....
+	C: 0000
+
+     The stream is organized into "commands", with each command
+     appearing by itself in a pkt-line.  Within a command line
+     the text leading up to the first space is the command name,
+     and the remainder of the line to the first LF is the value.
+     Command lines are terminated with an LF as the last byte of
+     the pkt-line value.
+
+     Commands MUST appear in the following order, if they appear
+     at all in the request stream:
+
+       * want
+       * have
+
+     The stream is terminated by a pkt-line flush ("0000").
+
+     A single "want" or "have" command MUST have one hex formatted
+     SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
+     multiple commands.
+
+     The HAVE list is created by popping the first 32 commits
+     from C_PENDING.  Less can be supplied if C_PENDING empties.
+
+     If the client has sent 256 HAVE commits and has not yet
+     received one of those back from S_COMMON, or the client has
+     emptied C_PENDING it should include a "done" command to let
+     the server know it won't proceed:
+
+	C: 0009done
+
+  (s) Parse the git-upload-pack request:
+
+      Verify all objects in WANT are directly reachable from refs.
+
+	  The server MAY walk backwards through history or through
+      the reflog to permit slightly stale requests.
+
+      If no WANT objects are received, send an error:
+
+TODO: Define error if no want lines are requested.
+
+      If any WANT object is not reachable, send an error:
+
+TODO: Define error if an invalid want is requested.
+
+     Create an empty list, S_COMMON.
+
+     If 'have' was sent:
+
+     Loop through the objects in the order supplied by the client.
+     For each object, if the server has the object reachable from
+     a ref, add it to S_COMMON.  If a commit is added to S_COMMON,
+     do not add any ancestors, even if they also appear in HAVE.
+
+  (s) Send the git-upload-pack response:
+
+     If the server has found a closed set of objects to pack or the
+     request ends with "done", it replies with the pack.
+
+TODO: Document the pack based response
+	S: PACK...
+
+     The returned stream is the side-band-64k protocol supported
+     by the git-upload-pack service, and the pack is embedded into
+     stream 1.  Progress messages from the server side may appear
+     in stream 2.
+
+     Here a "closed set of objects" is defined to have at least
+     one path from every WANT to at least one COMMON object.
+
+     If the server needs more information, it replies with a
+     status continue response:
+
+TODO: Document the non-pack response
+
+  (c) Parse the upload-pack response:
+
+TODO: Document parsing response
+
+      Do another compute step.
+
+
+Smart Service git-receive-pack
+------------------------------
+This service modifies the remote repository.
+
+Clients MUST first perform ref discovery with
+'$GIT_URL/info/refs?service=git-receive-pack'.
+
+	C: POST $GIT_URL/git-receive-pack HTTP/1.0
+	C: Content-Type: application/x-git-receive-pack-request
+	C:
+	C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+	C: 0000
+	C: PACK....
+
+	S: 200 OK
+	S: Content-Type: application/x-git-receive-pack-result
+	S: Cache-Control: no-cache
+	S:
+	S: ....
+
+Clients MUST NOT reuse or revalidate a cached reponse.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one command in the request body.
+Within the command portion of the request body clients SHOULD send
+the id obtained through ref discovery as old_id.
+
+	update_request    = command_list
+	                    "PACK" <binary data>
+
+	command_list      = PKT-LINE(command NUL cap_list LF)
+	                    *(command_pkt)
+	command_pkt       = PKT-LINE(command LF)
+	cap_list          = *(SP capability) SP
+
+	command           = create | delete | update
+	create            = 40*"0" SP new_id SP name
+	delete            = old_id SP 40*"0" SP name
+	update            = old_id SP new_id SP name
+
+TODO: Document this further.
+
+
+References
+----------
+
+link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
+link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
+
-- 
1.6.5.rc3.193.gdf7a

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
@ 2009-10-09  5:22   ` Shawn O. Pearce
  2009-10-09  5:22     ` [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack Shawn O. Pearce
  2009-10-09  5:52     ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport J.H.
  2009-10-09  8:01   ` [RFC PATCH 1/4] Document the HTTP transport protocol Sverre Rabbelier
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-09  5:22 UTC (permalink / raw)
  To: git

The git-http-backend CGI can be configured into any Apache server
using ScriptAlias, such as with the following configuration:

  LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
  LoadModule alias_module /usr/libexec/apache2/mod_alias.so
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Repositories are accessed via the translated PATH_INFO.

The CGI is backwards compatible with the dumb client, allowing all
older HTTP clients to continue to download repositories which are
managed by the CGI.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .gitignore     |    1 +
 Makefile       |    1 +
 http-backend.c |  261 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 263 insertions(+), 0 deletions(-)
 create mode 100644 http-backend.c

diff --git a/.gitignore b/.gitignore
index 51a37b1..353d22f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,6 +55,7 @@ git-get-tar-commit-id
 git-grep
 git-hash-object
 git-help
+git-http-backend
 git-http-fetch
 git-http-push
 git-imap-send
diff --git a/Makefile b/Makefile
index dd3d520..c80fb56 100644
--- a/Makefile
+++ b/Makefile
@@ -361,6 +361,7 @@ PROGRAMS += git-show-index$X
 PROGRAMS += git-unpack-file$X
 PROGRAMS += git-upload-pack$X
 PROGRAMS += git-var$X
+PROGRAMS += git-http-backend$X
 
 # List built-in command $C whose implementation cmd_$C() is not in
 # builtin-$C.o but is linked in as part of some other command.
diff --git a/http-backend.c b/http-backend.c
new file mode 100644
index 0000000..39cfd25
--- /dev/null
+++ b/http-backend.c
@@ -0,0 +1,261 @@
+#include "cache.h"
+#include "refs.h"
+#include "pkt-line.h"
+#include "object.h"
+#include "tag.h"
+#include "exec_cmd.h"
+#include "run-command.h"
+
+static const char content_type[] = "Content-Type";
+static const char content_length[] = "Content-Length";
+
+static char buffer[1024];
+
+static const char *http_date(unsigned long time)
+{
+	return show_date(time, 0, DATE_RFC2822);
+}
+
+static void format_write(const char *fmt, ...)
+{
+	va_list args;
+	unsigned n;
+
+	va_start(args, fmt);
+	n = vsnprintf(buffer, sizeof(buffer), fmt, args);
+	va_end(args);
+	if (n >= sizeof(buffer))
+		die("protocol error: impossibly long line");
+
+	safe_write(1, buffer, n);
+}
+
+static void write_status(unsigned code, const char *msg)
+{
+	format_write("Status: %u %s\r\n", code, msg);
+}
+
+static void write_header(const char *name, const char *value)
+{
+	format_write("%s: %s\r\n", name, value);
+}
+
+static void end_headers(void)
+{
+	safe_write(1, "\r\n", 2);
+}
+
+static void write_nocache(void)
+{
+	write_header("Expires", "Fri, 01 Jan 1980 00:00:00 GMT");
+	write_header("Pragma", "no-cache");
+	write_header("Cache-Control", "no-cache, max-age=0, must-revalidate");
+}
+
+static void write_cache_forever(void)
+{
+	unsigned long now = time(NULL);
+	write_header("Date", http_date(now));
+	write_header("Expires", http_date(now + 31536000));
+	write_header("Cache-Control", "public, max-age=31536000");
+}
+
+static NORETURN void not_found(const char *err, ...)
+{
+	va_list params;
+
+	write_status(404, "Not Found");
+	write_nocache();
+	end_headers();
+
+	va_start(params, err);
+	if (err && *err) {
+		vsnprintf(buffer, sizeof(buffer), err, params);
+		fprintf(stderr, "%s\n", buffer);
+	}
+	va_end(params);
+	exit(0);
+}
+
+static void write_file(const char *the_type, const char *name)
+{
+	const char *p = git_path("%s", name);
+	int fd;
+	struct stat sb;
+	uintmax_t remaining;
+
+	fd = open(p, O_RDONLY);
+	if (fd < 0)
+		not_found("Cannot open '%s': %s", p, strerror(errno));
+	if (fstat(fd, &sb) < 0)
+		die_errno("Cannot stat '%s'", p);
+	remaining = (uintmax_t)sb.st_size;
+
+	write_header(content_type, the_type);
+	write_header("Last-Modified", http_date(sb.st_mtime));
+	format_write("Content-Length: %" PRIuMAX "\r\n", remaining);
+	end_headers();
+
+	while (remaining) {
+		ssize_t n = xread(fd, buffer, sizeof(buffer));
+		if (n < 0)
+			die_errno("Cannot read '%s'", p);
+		n = safe_write(1, buffer, n);
+		if (n <= 0)
+			break;
+	}
+	close(fd);
+}
+
+static void get_text_file(char *name)
+{
+	write_nocache();
+	write_file("text/plain; charset=utf-8", name);
+}
+
+static void get_loose_object(char *name)
+{
+	write_cache_forever();
+	write_file("application/x-git-loose-object", name);
+}
+
+static void get_pack_file(char *name)
+{
+	write_cache_forever();
+	write_file("application/x-git-packed-objects", name);
+}
+
+static void get_idx_file(char *name)
+{
+	write_cache_forever();
+	write_file("application/x-git-packed-objects-toc", name);
+}
+
+static int show_text_ref(const char *name, const unsigned char *sha1,
+	int flag, void *cb_data)
+{
+	struct object *o = parse_object(sha1);
+	if (!o)
+		return 0;
+
+	format_write("%s\t%s\n", sha1_to_hex(sha1), name);
+	if (o->type == OBJ_TAG) {
+		o = deref_tag(o, name, 0);
+		if (!o)
+			return 0;
+		format_write("%s\t%s^{}\n", sha1_to_hex(o->sha1), name);
+	}
+
+	return 0;
+}
+
+static void get_info_refs(char *arg)
+{
+	write_nocache();
+	write_header(content_type, "text/plain; charset=utf-8");
+	end_headers();
+
+	for_each_ref(show_text_ref, NULL);
+}
+
+static void get_info_packs(char *arg)
+{
+	size_t objdirlen = strlen(get_object_directory());
+	struct packed_git *p;
+
+	write_nocache();
+	write_header(content_type, "text/plain; charset=utf-8");
+	end_headers();
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		if (!p->pack_local)
+			continue;
+		format_write("P %s\n", p->pack_name + objdirlen + 6);
+	}
+	safe_write(1, "\n", 1);
+}
+
+static NORETURN void die_webcgi(const char *err, va_list params)
+{
+	write_status(500, "Internal Server Error");
+	write_nocache();
+	end_headers();
+
+	vsnprintf(buffer, sizeof(buffer), err, params);
+	fprintf(stderr, "fatal: %s\n", buffer);
+	exit(0);
+}
+
+static struct service_cmd {
+	const char *method;
+	const char *pattern;
+	void (*imp)(char *);
+} services[] = {
+	{"GET", "/HEAD$", get_text_file},
+	{"GET", "/info/refs$", get_info_refs},
+	{"GET", "/objects/info/packs$", get_info_packs},
+	{"GET", "/objects/info/[^/]*$", get_text_file},
+	{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{38}$", get_loose_object},
+	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.pack$", get_pack_file},
+	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}
+};
+
+int main(int argc, char **argv)
+{
+	char *dir = getenv("PATH_TRANSLATED");
+	char *input_method = getenv("REQUEST_METHOD");
+	struct service_cmd *cmd = NULL;
+	char *cmd_arg = NULL;
+	int i;
+
+	set_die_routine(die_webcgi);
+
+	if (!dir)
+		die("No PATH_TRANSLATED from server");
+	if (!input_method)
+		die("No REQUEST_METHOD from server");
+	if (!strcmp(input_method, "HEAD"))
+		input_method = "GET";
+
+	for (i = 0; i < ARRAY_SIZE(services); i++) {
+		struct service_cmd *c = &services[i];
+		regex_t re;
+		regmatch_t out[1];
+
+		if (regcomp(&re, c->pattern, REG_EXTENDED))
+			die("Bogus regex in service table: %s", c->pattern);
+		if (!regexec(&re, dir, 1, out, 0)) {
+			size_t n = out[0].rm_eo - out[0].rm_so;
+
+			if (strcmp(input_method, c->method)) {
+				const char *proto = getenv("SERVER_PROTOCOL");
+				if (proto && !strcmp(proto, "HTTP/1.1"))
+					write_status(405, "Method Not Allowed");
+				else
+					write_status(400, "Bad Request");
+				write_nocache();
+				end_headers();
+				return 0;
+			}
+
+			cmd = c;
+			cmd_arg = xmalloc(n);
+			strncpy(cmd_arg, dir + out[0].rm_so + 1, n);
+			cmd_arg[n] = '\0';
+			dir[out[0].rm_so] = 0;
+			break;
+		}
+		regfree(&re);
+	}
+
+	if (!cmd)
+		not_found("Request not supported: '%s'", dir);
+
+	setup_path();
+	if (!enter_repo(dir, 0))
+		not_found("Not a git repository: '%s'", dir);
+
+	cmd->imp(cmd_arg);
+	return 0;
+}
-- 
1.6.5.rc3.193.gdf7a

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack
  2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
@ 2009-10-09  5:22     ` Shawn O. Pearce
  2009-10-09  5:22       ` [RFC PATCH 4/4] Smart fetch and push over HTTP: server side Shawn O. Pearce
  2009-10-09  5:52     ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport J.H.
  1 sibling, 1 reply; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-09  5:22 UTC (permalink / raw)
  To: git

When --smart-http is passed as a command line parameter to
upload-pack or receive-pack the programs now assume they may
perform only a single read-write cycle with stdin and stdout.
This fits with the HTTP POST request processing model where a
program may read the request, write a response, and must exit.

When --advertise-refs is passed as a command line parameter only
the initial ref advertisement is output, and the program exits
immediately.  This fits with the HTTP GET request model, where
no request content is received but a response must be produced.

HTTP headers and/or environment are not processed here, but
instead are assumed to be handled by the program invoking
either service backend.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 builtin-receive-pack.c |   26 ++++++++++++++++++++------
 upload-pack.c          |   40 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/builtin-receive-pack.c b/builtin-receive-pack.c
index b771fe9..a075785 100644
--- a/builtin-receive-pack.c
+++ b/builtin-receive-pack.c
@@ -615,6 +615,8 @@ static void add_alternate_refs(void)
 
 int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 {
+	int advertise_refs = 0;
+	int smart_http = 0;
 	int i;
 	char *dir = NULL;
 
@@ -623,7 +625,15 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		const char *arg = *argv++;
 
 		if (*arg == '-') {
-			/* Do flag handling here */
+			if (!strcmp(arg, "--advertise-refs")) {
+				advertise_refs = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--smart-http")) {
+				smart_http = 1;
+				continue;
+			}
+
 			usage(receive_pack_usage);
 		}
 		if (dir)
@@ -652,12 +662,16 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		" report-status delete-refs ofs-delta " :
 		" report-status delete-refs ";
 
-	add_alternate_refs();
-	write_head_info();
-	clear_extra_refs();
+	if (advertise_refs || !smart_http) {
+		add_alternate_refs();
+		write_head_info();
+		clear_extra_refs();
 
-	/* EOF */
-	packet_flush(1);
+		/* EOF */
+		packet_flush(1);
+	}
+	if (advertise_refs)
+		return 0;
 
 	read_head_info();
 	if (commands) {
diff --git a/upload-pack.c b/upload-pack.c
index 38ddac2..ae67039 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -39,6 +39,8 @@ static unsigned int timeout;
  */
 static int use_sideband;
 static int debug_fd;
+static int advertise_refs;
+static int smart_http;
 
 static void reset_timeout(void)
 {
@@ -509,6 +511,8 @@ static int get_common_commits(void)
 		if (!len) {
 			if (have_obj.nr == 0 || multi_ack)
 				packet_write(1, "NAK\n");
+			if (smart_http)
+				exit(0);
 			continue;
 		}
 		strip(line, len);
@@ -705,12 +709,32 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 	return 0;
 }
 
+static int mark_our_ref(const char *refname, const unsigned char *sha1, int flag, void *cb_data)
+{
+	struct object *o = parse_object(sha1);
+	if (!o)
+		die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));
+	if (!(o->flags & OUR_REF)) {
+		o->flags |= OUR_REF;
+		nr_our_refs++;
+	}
+	return 0;
+}
+
 static void upload_pack(void)
 {
-	reset_timeout();
-	head_ref(send_ref, NULL);
-	for_each_ref(send_ref, NULL);
-	packet_flush(1);
+	if (advertise_refs || !smart_http) {
+		reset_timeout();
+		head_ref(send_ref, NULL);
+		for_each_ref(send_ref, NULL);
+		packet_flush(1);
+	} else {
+		head_ref(mark_our_ref, NULL);
+		for_each_ref(mark_our_ref, NULL);
+	}
+	if (advertise_refs)
+		return;
+
 	receive_needs();
 	if (want_obj.nr) {
 		get_common_commits();
@@ -732,6 +756,14 @@ int main(int argc, char **argv)
 
 		if (arg[0] != '-')
 			break;
+		if (!strcmp(arg, "--advertise-refs")) {
+			advertise_refs = 1;
+			continue;
+		}
+		if (!strcmp(arg, "--smart-http")) {
+			smart_http = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--strict")) {
 			strict = 1;
 			continue;
-- 
1.6.5.rc3.193.gdf7a

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC PATCH 4/4] Smart fetch and push over HTTP: server side
  2009-10-09  5:22     ` [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack Shawn O. Pearce
@ 2009-10-09  5:22       ` Shawn O. Pearce
  0 siblings, 0 replies; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-09  5:22 UTC (permalink / raw)
  To: git

Requests for $GIT_URL/git-receive-pack and $GIT_URL/git-upload-pack
are forwarded to the corresponding backend process by directly
executing it and leaving stdin and stdout connected to the web
server.  Prior to starting the backend HTTP headers are sent, thereby
freeing the backend from needing to know about the HTTP protocol.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 http-backend.c |  135 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 134 insertions(+), 1 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 39cfd25..978f820 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -77,6 +77,95 @@ static NORETURN void not_found(const char *err, ...)
 	exit(0);
 }
 
+static NORETURN void forbidden(const char *err, ...)
+{
+	va_list params;
+
+	write_status(403, "Forbidden");
+	write_nocache();
+	end_headers();
+
+	va_start(params, err);
+	if (err && *err) {
+		vsnprintf(buffer, sizeof(buffer), err, params);
+		fprintf(stderr, "%s\n", buffer);
+	}
+	va_end(params);
+	exit(0);
+}
+
+struct http_service {
+	const char *name;
+	const char *config_name;
+	int enabled;
+};
+static struct http_service *service;
+
+static struct http_service http_service[] = {
+	{ "upload-pack", "uploadpack", 1 },
+	{ "receive-pack", "receivepack", 0 },
+};
+
+static int http_config(const char *var, const char *value, void *cb)
+{
+	if (!prefixcmp(var, "http.") &&
+	    !strcmp(var + 7, service->config_name)) {
+		service->enabled = git_config_bool(var, value);
+		return 0;
+	}
+
+	/* we are not interested in parsing any other configuration here */
+	return 0;
+}
+
+static void select_service(const char *name)
+{
+	int i;
+
+	if (prefixcmp(name, "git-"))
+		forbidden("Unsupported service: '%s'", name);
+
+	for (i = 0; i < ARRAY_SIZE(http_service); i++) {
+		service = &http_service[i];
+		if (!strcmp(service->name, name + 4)) {
+			git_config(http_config, NULL);
+			if (!service->enabled)
+				forbidden("Service not enabled: '%s'", name);
+			return;
+		}
+	}
+	forbidden("Unsupported service: '%s'", name);
+}
+
+static void run_service(const char **argv)
+{
+#ifndef WIN32
+	execv_git_cmd(argv);
+#else
+	struct child_process cld;
+
+	memset(&cld, 0, sizeof(cld));
+	cld.argv = argv;
+	cld.git_cmd = 1;
+	if (start_command(&cld))
+		die("Cannot start git-%s service", service->name);
+	close(0);
+	close(1);
+	finish_command(&cld);
+#endif
+}
+
+static void require_content_type(const char *need_type)
+{
+	const char *input_type = getenv("CONTENT_TYPE");
+	if (!input_type || strcmp(input_type, need_type)) {
+		write_status(415, "Unsupported Media Type");
+		write_nocache();
+		end_headers();
+		exit(0);
+	}
+}
+
 static void write_file(const char *the_type, const char *name)
 {
 	const char *p = git_path("%s", name);
@@ -151,6 +240,25 @@ static int show_text_ref(const char *name, const unsigned char *sha1,
 
 static void get_info_refs(char *arg)
 {
+	char *query = getenv("QUERY_STRING");
+
+	if (query && !prefixcmp(query, "service=")) {
+		const char *argv[] = {NULL /* service name */,
+			"--smart-http", "--advertise-refs",
+			".", NULL};
+
+		select_service(query + 8);
+
+		write_nocache();
+		format_write("%s: application/x-git-%s-advertisement\r\n",
+			content_type, service->name);
+		end_headers();
+		packet_write(1, "# service=git-%s\n", service->name);
+
+		argv[0] = service->name;
+		run_service(argv);
+	}
+
 	write_nocache();
 	write_header(content_type, "text/plain; charset=utf-8");
 	end_headers();
@@ -176,6 +284,28 @@ static void get_info_packs(char *arg)
 	safe_write(1, "\n", 1);
 }
 
+static void post_to_service(char *service_name)
+{
+	const char *argv[] = {NULL, "--smart-http", ".", NULL};
+	unsigned n;
+
+	select_service(service_name);
+
+	n = snprintf(buffer, sizeof(buffer),
+		"application/x-git-%s-request", service->name);
+	if (n >= sizeof(buffer))
+		die("impossibly long service name");
+	require_content_type(buffer);
+
+	write_nocache();
+	format_write("%s: application/x-git-%s-result\r\n",
+		content_type, service->name);
+	end_headers();
+
+	argv[0] = service->name;
+	run_service(argv);
+}
+
 static NORETURN void die_webcgi(const char *err, va_list params)
 {
 	write_status(500, "Internal Server Error");
@@ -198,7 +328,10 @@ static struct service_cmd {
 	{"GET", "/objects/info/[^/]*$", get_text_file},
 	{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{38}$", get_loose_object},
 	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.pack$", get_pack_file},
-	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}
+	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
+
+	{"POST", "/git-upload-pack$", post_to_service},
+	{"POST", "/git-receive-pack$", post_to_service}
 };
 
 int main(int argc, char **argv)
-- 
1.6.5.rc3.193.gdf7a

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport
  2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
  2009-10-09  5:22     ` [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack Shawn O. Pearce
@ 2009-10-09  5:52     ` J.H.
  1 sibling, 0 replies; 46+ messages in thread
From: J.H. @ 2009-10-09  5:52 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

I dunno I kinda object to it being called http-backend, personally I'd 
rather it be called git-smart since this is the smart http protocol ;-)

- John 'Warthog9' Hawley

Shawn O. Pearce wrote:
> The git-http-backend CGI can be configured into any Apache server
> using ScriptAlias, such as with the following configuration:
> 
>   LoadModule cgi_module /usr/libexec/apache2/mod_cgi.so
>   LoadModule alias_module /usr/libexec/apache2/mod_alias.so
>   ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
> 
> Repositories are accessed via the translated PATH_INFO.
> 
> The CGI is backwards compatible with the dumb client, allowing all
> older HTTP clients to continue to download repositories which are
> managed by the CGI.
> 
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
> ---
>  .gitignore     |    1 +
>  Makefile       |    1 +
>  http-backend.c |  261 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 263 insertions(+), 0 deletions(-)
>  create mode 100644 http-backend.c
> 
> diff --git a/.gitignore b/.gitignore
> index 51a37b1..353d22f 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -55,6 +55,7 @@ git-get-tar-commit-id
>  git-grep
>  git-hash-object
>  git-help
> +git-http-backend
>  git-http-fetch
>  git-http-push
>  git-imap-send
> diff --git a/Makefile b/Makefile
> index dd3d520..c80fb56 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -361,6 +361,7 @@ PROGRAMS += git-show-index$X
>  PROGRAMS += git-unpack-file$X
>  PROGRAMS += git-upload-pack$X
>  PROGRAMS += git-var$X
> +PROGRAMS += git-http-backend$X
>  
>  # List built-in command $C whose implementation cmd_$C() is not in
>  # builtin-$C.o but is linked in as part of some other command.
> diff --git a/http-backend.c b/http-backend.c
> new file mode 100644
> index 0000000..39cfd25
> --- /dev/null
> +++ b/http-backend.c
> @@ -0,0 +1,261 @@
> +#include "cache.h"
> +#include "refs.h"
> +#include "pkt-line.h"
> +#include "object.h"
> +#include "tag.h"
> +#include "exec_cmd.h"
> +#include "run-command.h"
> +
> +static const char content_type[] = "Content-Type";
> +static const char content_length[] = "Content-Length";
> +
> +static char buffer[1024];
> +
> +static const char *http_date(unsigned long time)
> +{
> +	return show_date(time, 0, DATE_RFC2822);
> +}
> +
> +static void format_write(const char *fmt, ...)
> +{
> +	va_list args;
> +	unsigned n;
> +
> +	va_start(args, fmt);
> +	n = vsnprintf(buffer, sizeof(buffer), fmt, args);
> +	va_end(args);
> +	if (n >= sizeof(buffer))
> +		die("protocol error: impossibly long line");
> +
> +	safe_write(1, buffer, n);
> +}
> +
> +static void write_status(unsigned code, const char *msg)
> +{
> +	format_write("Status: %u %s\r\n", code, msg);
> +}
> +
> +static void write_header(const char *name, const char *value)
> +{
> +	format_write("%s: %s\r\n", name, value);
> +}
> +
> +static void end_headers(void)
> +{
> +	safe_write(1, "\r\n", 2);
> +}
> +
> +static void write_nocache(void)
> +{
> +	write_header("Expires", "Fri, 01 Jan 1980 00:00:00 GMT");
> +	write_header("Pragma", "no-cache");
> +	write_header("Cache-Control", "no-cache, max-age=0, must-revalidate");
> +}
> +
> +static void write_cache_forever(void)
> +{
> +	unsigned long now = time(NULL);
> +	write_header("Date", http_date(now));
> +	write_header("Expires", http_date(now + 31536000));
> +	write_header("Cache-Control", "public, max-age=31536000");
> +}
> +
> +static NORETURN void not_found(const char *err, ...)
> +{
> +	va_list params;
> +
> +	write_status(404, "Not Found");
> +	write_nocache();
> +	end_headers();
> +
> +	va_start(params, err);
> +	if (err && *err) {
> +		vsnprintf(buffer, sizeof(buffer), err, params);
> +		fprintf(stderr, "%s\n", buffer);
> +	}
> +	va_end(params);
> +	exit(0);
> +}
> +
> +static void write_file(const char *the_type, const char *name)
> +{
> +	const char *p = git_path("%s", name);
> +	int fd;
> +	struct stat sb;
> +	uintmax_t remaining;
> +
> +	fd = open(p, O_RDONLY);
> +	if (fd < 0)
> +		not_found("Cannot open '%s': %s", p, strerror(errno));
> +	if (fstat(fd, &sb) < 0)
> +		die_errno("Cannot stat '%s'", p);
> +	remaining = (uintmax_t)sb.st_size;
> +
> +	write_header(content_type, the_type);
> +	write_header("Last-Modified", http_date(sb.st_mtime));
> +	format_write("Content-Length: %" PRIuMAX "\r\n", remaining);
> +	end_headers();
> +
> +	while (remaining) {
> +		ssize_t n = xread(fd, buffer, sizeof(buffer));
> +		if (n < 0)
> +			die_errno("Cannot read '%s'", p);
> +		n = safe_write(1, buffer, n);
> +		if (n <= 0)
> +			break;
> +	}
> +	close(fd);
> +}
> +
> +static void get_text_file(char *name)
> +{
> +	write_nocache();
> +	write_file("text/plain; charset=utf-8", name);
> +}
> +
> +static void get_loose_object(char *name)
> +{
> +	write_cache_forever();
> +	write_file("application/x-git-loose-object", name);
> +}
> +
> +static void get_pack_file(char *name)
> +{
> +	write_cache_forever();
> +	write_file("application/x-git-packed-objects", name);
> +}
> +
> +static void get_idx_file(char *name)
> +{
> +	write_cache_forever();
> +	write_file("application/x-git-packed-objects-toc", name);
> +}
> +
> +static int show_text_ref(const char *name, const unsigned char *sha1,
> +	int flag, void *cb_data)
> +{
> +	struct object *o = parse_object(sha1);
> +	if (!o)
> +		return 0;
> +
> +	format_write("%s\t%s\n", sha1_to_hex(sha1), name);
> +	if (o->type == OBJ_TAG) {
> +		o = deref_tag(o, name, 0);
> +		if (!o)
> +			return 0;
> +		format_write("%s\t%s^{}\n", sha1_to_hex(o->sha1), name);
> +	}
> +
> +	return 0;
> +}
> +
> +static void get_info_refs(char *arg)
> +{
> +	write_nocache();
> +	write_header(content_type, "text/plain; charset=utf-8");
> +	end_headers();
> +
> +	for_each_ref(show_text_ref, NULL);
> +}
> +
> +static void get_info_packs(char *arg)
> +{
> +	size_t objdirlen = strlen(get_object_directory());
> +	struct packed_git *p;
> +
> +	write_nocache();
> +	write_header(content_type, "text/plain; charset=utf-8");
> +	end_headers();
> +
> +	prepare_packed_git();
> +	for (p = packed_git; p; p = p->next) {
> +		if (!p->pack_local)
> +			continue;
> +		format_write("P %s\n", p->pack_name + objdirlen + 6);
> +	}
> +	safe_write(1, "\n", 1);
> +}
> +
> +static NORETURN void die_webcgi(const char *err, va_list params)
> +{
> +	write_status(500, "Internal Server Error");
> +	write_nocache();
> +	end_headers();
> +
> +	vsnprintf(buffer, sizeof(buffer), err, params);
> +	fprintf(stderr, "fatal: %s\n", buffer);
> +	exit(0);
> +}
> +
> +static struct service_cmd {
> +	const char *method;
> +	const char *pattern;
> +	void (*imp)(char *);
> +} services[] = {
> +	{"GET", "/HEAD$", get_text_file},
> +	{"GET", "/info/refs$", get_info_refs},
> +	{"GET", "/objects/info/packs$", get_info_packs},
> +	{"GET", "/objects/info/[^/]*$", get_text_file},
> +	{"GET", "/objects/[0-9a-f]{2}/[0-9a-f]{38}$", get_loose_object},
> +	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.pack$", get_pack_file},
> +	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file}
> +};
> +
> +int main(int argc, char **argv)
> +{
> +	char *dir = getenv("PATH_TRANSLATED");
> +	char *input_method = getenv("REQUEST_METHOD");
> +	struct service_cmd *cmd = NULL;
> +	char *cmd_arg = NULL;
> +	int i;
> +
> +	set_die_routine(die_webcgi);
> +
> +	if (!dir)
> +		die("No PATH_TRANSLATED from server");
> +	if (!input_method)
> +		die("No REQUEST_METHOD from server");
> +	if (!strcmp(input_method, "HEAD"))
> +		input_method = "GET";
> +
> +	for (i = 0; i < ARRAY_SIZE(services); i++) {
> +		struct service_cmd *c = &services[i];
> +		regex_t re;
> +		regmatch_t out[1];
> +
> +		if (regcomp(&re, c->pattern, REG_EXTENDED))
> +			die("Bogus regex in service table: %s", c->pattern);
> +		if (!regexec(&re, dir, 1, out, 0)) {
> +			size_t n = out[0].rm_eo - out[0].rm_so;
> +
> +			if (strcmp(input_method, c->method)) {
> +				const char *proto = getenv("SERVER_PROTOCOL");
> +				if (proto && !strcmp(proto, "HTTP/1.1"))
> +					write_status(405, "Method Not Allowed");
> +				else
> +					write_status(400, "Bad Request");
> +				write_nocache();
> +				end_headers();
> +				return 0;
> +			}
> +
> +			cmd = c;
> +			cmd_arg = xmalloc(n);
> +			strncpy(cmd_arg, dir + out[0].rm_so + 1, n);
> +			cmd_arg[n] = '\0';
> +			dir[out[0].rm_so] = 0;
> +			break;
> +		}
> +		regfree(&re);
> +	}
> +
> +	if (!cmd)
> +		not_found("Request not supported: '%s'", dir);
> +
> +	setup_path();
> +	if (!enter_repo(dir, 0))
> +		not_found("Not a git repository: '%s'", dir);
> +
> +	cmd->imp(cmd_arg);
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
  2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
@ 2009-10-09  8:01   ` Sverre Rabbelier
  2009-10-09  8:09     ` Sverre Rabbelier
  2009-10-09  8:54   ` Alex Blewitt
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Sverre Rabbelier @ 2009-10-09  8:01 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

Heya,

I had some spare time, I hope these comments from someone that is not
too familiar with the protocol are helpful :).

On Fri, Oct 9, 2009 at 07:22, Shawn O. Pearce <spearce@spearce.org> wrote:
> +Compatible clients must expand
> +'$GIT_URL/info/refs' as 'foo/info/refs' and not 'foo//info/refs'.

Does this not need s/must/MUST/

> +       S: ....# service=git-upload-pack
> +       S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
> +       S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
> +       S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
> +       S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}

Shouldn't this contain HEAD as the first ref?

> +       ref_list       = empty_list | populated_list
> +
> +       empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> +
> +       non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> +                        *ref_record

Does this need a s/non_empty_list/populated_list/ ?

> +       cap_list      = *(SP capability) SP

You never define capability.

> + (c) Send one $GIT_URL/git-upload-pack request:

I don't think you documented what $GIT_URL/git-upload-pack means.

> +     If the client has sent 256 HAVE commits and has not yet
> +     received one of those back from S_COMMON, or the client has
> +     emptied C_PENDING it should include a "done" command to let
> +     the server know it won't proceed:
> +
> +       C: 0009done

This should probably move down to after you define what S_COMMON is in
the first place.


> +     Here a "closed set of objects" is defined to have at least
> +     one path from every WANT to at least one COMMON object.

A 'path from' is perhaps a bit unclear.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  8:01   ` [RFC PATCH 1/4] Document the HTTP transport protocol Sverre Rabbelier
@ 2009-10-09  8:09     ` Sverre Rabbelier
  0 siblings, 0 replies; 46+ messages in thread
From: Sverre Rabbelier @ 2009-10-09  8:09 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

Heya,

On Fri, Oct 9, 2009 at 10:01, Sverre Rabbelier <srabbelier@gmail.com> wrote:
>> + (c) Send one $GIT_URL/git-upload-pack request:
>
> I don't think you documented what $GIT_URL/git-upload-pack means.

Ah, I didn't realize until I read 4/4 that this is just a regular
request to the 'http://<host>:<port>/git-upload-pack' url, I was
confused by the need to query
"http://<host>:<port>/info/refs?service=git-upload-pack".

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
  2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
  2009-10-09  8:01   ` [RFC PATCH 1/4] Document the HTTP transport protocol Sverre Rabbelier
@ 2009-10-09  8:54   ` Alex Blewitt
  2009-10-15 16:39     ` Shawn O. Pearce
  2009-10-09 19:27   ` Jakub Narebski
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Alex Blewitt @ 2009-10-09  8:54 UTC (permalink / raw)
  To: git

Shawn O. Pearce <spearce <at> spearce.org> writes:

> +URL Format
> +----------
> +
> +URLs for Git repositories accessed by HTTP use the standard HTTP
> +URL syntax documented by RFC 1738, so they are of the form:
> +
> +  http://<host>:<port>/<path>
> +
> +Within this documentation the placeholder $GIT_URL will stand for
> +the http:// repository URL entered by the end-user.

It's worth making clear here that $GIT_URL will be the path to the repository,
rather than necessarily just the host upon which the server sits. Perhaps
including an example, like http://example:8080/repos/example.git
would make it clearer that there can be a path (and so leading to
a request like http://example:8080/repos/example.git/info/refs?service=...

It's also worth clarifying, therefore, that multiple repositories can be served
by the same process (as with the git server today) by using different path(s).
And for those that are interested in submodules, it's worth confirming that
http://example/repos/master.git/child.git/info/refs?service= will ensure 
that the repository is the 'child' git rather than anything else.

> HEX = [0-9a-f]

Is there any reason not to support A-F as well in the hex spec, even if they
SHOULD use a-f? This may limit the appeal for some case-insensitive systems.

It would also be good to document, like with the git daemon, whether all
repositories under a path are exported or only those that have the magic
setting in the config like git-daemon-export-ok.

Lastly, it would be good to clarify when the result of this GET/POST exchange
is a text-based (and encoded in UTF-8) vs when binary data is returned; we 
don't want to get into the state where we're returning binary data and 
pretending that it's UTF-8.

Alex

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (2 preceding siblings ...)
  2009-10-09  8:54   ` Alex Blewitt
@ 2009-10-09 19:27   ` Jakub Narebski
  2009-10-09 19:50   ` Jeff King
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 46+ messages in thread
From: Jakub Narebski @ 2009-10-09 19:27 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> +
> +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> +	                 *ref_record
> +
> +	cap_list      = *(SP capability) SP

Errr... are you sure?  Because from examples it looks like cap_list
(capabilities list) is a list of space *separated* capabilities, while
the above requires also both leading and trailing space.  Shouldn't it
be

	cap_list      = capability *(SP capability)

Also the format for capability is not defined; I guess only 
a-z, 0-9, '-' and '_' are allowed in capability name.


BTW. is it possible to not have capability list?

> +	HEX           = "0".."9" | "a".."f"

Do you plan allowing also upper case letters, while server and client
SHOULD use lowercase?  Because if you do, then RFC 5234 which defines
ABNF you seem to be using here has HEXDIG defined.

> +	NL            = <US-ASCII NUL, null (0)>

Why not NUL?

> +	LF            = <US-ASCII LF,  linefeed (10)>
> +	SP            = <US-ASCII SP,  horizontal-tab (9)>
                                       ^^^^^^^^^^^^^^-- o'rly?

Those are pre-defined in ABNF, e.g.

	SP             =  %x20

> +References
> +----------
> +
> +link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
> +link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]

You should also reference the following RFCs:
 * "RFC 5234: Augmented BNF for Syntax Specifications: ABNF"
 * "RFC 2119: Key words for use in RFCs to Indicate Requirement Levels"

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (3 preceding siblings ...)
  2009-10-09 19:27   ` Jakub Narebski
@ 2009-10-09 19:50   ` Jeff King
  2009-10-15 16:52     ` Shawn O. Pearce
  2009-10-09 20:44   ` Junio C Hamano
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 46+ messages in thread
From: Jeff King @ 2009-10-09 19:50 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Thu, Oct 08, 2009 at 10:22:45PM -0700, Shawn O. Pearce wrote:

> +Servers MUST NOT require HTTP cookies for the purposes of
> +authentication or access control.
> [...]
> +Servers MUST NOT require HTTP cookies in order to function correctly.
> +Clients MAY store and forward HTTP cookies during request processing
> +as described by RFC 2616 (HTTP/1.1).  Servers SHOULD ignore any
> +cookies sent by a client.

Why not? I can grant that the current git implementation probably can't
handle it, but keep in mind this is talking about the protocol and not
the implementation. And I can see it being useful for sites like github
which already have a cookie-based login. Adapting the client to handle
this case would not be too difficult (it would just mean keeping cookie
state in a file between runs, or even just pulling it out of the normal
browser's cookie store). And people whose client didn't do this would
simply get an "access denied" response code.

Is there a technical reason not to allow it?

-Peff

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (4 preceding siblings ...)
  2009-10-09 19:50   ` Jeff King
@ 2009-10-09 20:44   ` Junio C Hamano
  2009-10-10 10:12     ` Antti-Juhani Kaijanaho
                       ` (4 more replies)
  2009-10-10 12:17   ` Tay Ray Chuan
                     ` (2 subsequent siblings)
  8 siblings, 5 replies; 46+ messages in thread
From: Junio C Hamano @ 2009-10-09 20:44 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Nice write-up.

>  Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
>  1 files changed, 542 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/technical/http-protocol.txt
>
> diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> new file mode 100644
> index 0000000..316d9b6
> --- /dev/null
> +++ b/Documentation/technical/http-protocol.txt
> @@ -0,0 +1,542 @@
> +HTTP transfer protocols
> +=======================
> ...
> +As a design feature smart clients can automatically upgrade "dumb"
> +protocol URLs to smart URLs.  This permits all users to have the
> +same published URL, and the peers automatically select the most
> +efficient transport available to them.

The first sentence feels backwards although the conclusion in the second
sentence is true.  It is more like smart ones trying smart protocol first,
and downgrading to "dumb" after noticing that the server is not smart.

> +Authentication
> +--------------
> ...
> +Clients SHOULD support Basic authentication as described by RFC 2616.
> +Servers SHOULD support Basic authentication by relying upon the
> +HTTP server placed in front of the Git server software.

It is perfectly fine to make it a requirement for a server to support the
Basic authentication, but should you make it a requirement that the
support is done by a specific implementation, i.e. "by relying upon..."?

> +Session State
> +-------------
> ...
> +retained and managed by the client process.  This permits simple
> +round-robin load-balancing on the server side, without needing to
> +worry about state mangement.

s/mangement/management/;

> +pkt-line Format
> +---------------
> ...
> +Examples (as C-style strings):
> +
> +  pkt-line          actual value
> +  ---------------------------------
> +  "0006a\n"         "a\n"
> +  "0005a"           "a"
> +  "000bfoobar\n"    "foobar\n"
> +  "0004"            ""
> +
> +A pkt-line with a length of 0 ("0000") is a special case and MUST
> +be treated as a message break or terminator in the payload.

Isn't this "MUST be" wrong?

It is not an advice to the implementors, but the protocol specification
itself defines what the flush packet means.  IOW, "The author of this
specification, Shawn, MUST treat a flush packet as a message break or
terminator in the payload, when designing this protocol."

> +General Request Processing
> +--------------------------
> +
> +Except where noted, all standard HTTP behavior SHOULD be assumed
> +by both client and server.  This includes (but is not necessarily
> +limited to):
> +
> +If there is no repository at $GIT_URL, the server MUST respond with
> +the '404 Not Found' HTTP status code.

We may also want to add

    If there is no object at $GIT_URL/some/path, the server MUST respond
    with the '404 Not Found' HTTP status code.

to help dumb clients.

> +Dumb Clients
> +~~~~~~~~~~~~
> +
> +HTTP clients that only support the "dumb" protocol MUST discover
> +references by making a request for the special info/refs file of
> +the repository.
> +
> +Dumb HTTP clients MUST NOT include search/query parameters when
> +fetching the info/refs file.  (That is, '?' must not appear in the
> +requested URL.)

It is unclear if '?' can be part of $GIT_URL. E.g.

    $ wget http://example.xz/serve.cgi?path=git.git/info/refs
    $ git clone http://example.xz/serve.cgi?path=git.git

It might be clearer to just say

    Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
    without any search/query parameters.  I.e.

	C: GET $GIT_URL/info/refs HTTP/1.0

to also exclude methods other than GET.

> +	C: GET $GIT_URL/info/refs HTTP/1.0
> +
> +	S: 200 OK
> ...
> +When examining the response clients SHOULD only examine the HTTP
> +status code.  Valid responses are '200 OK', or '304 Not Modified'.

Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
access to this repository") also valid?

> +The returned content is a UNIX formatted text file describing
> +each ref and its known value.  The file SHOULD be sorted by name
> +according to the C locale ordering.  The file SHOULD NOT include
> +the default ref named 'HEAD'.

I know you said "known" to imply "concurrent operations may change it
while the server is serving this client", but it feels rather awkward.

> +Smart Server Response
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +Smart servers MUST respond with the smart server reply format.
> +If the server does not recognize the requested service name, or the
> +requested service name has been disabled by the server administrator,
> +the server MUST respond with the '403 Forbidden' HTTP status code.

This is a bit confusing.

If you as a server administrator want to disable the smart upload-pack for
one repository (but not for other repositories), you would not be able to
force smart clients to fall back to the dumb protocol by giving "403" for
that repository.

Maybe in 2 years somebody smarter than us will have invented a more
efficient git-upload-pack-2 service, which is the only fetch protocol his
server supports other than dumb.  If your v1 smart client asks for the
original git-upload-pack service and gets a "403", you won't be able to
fall back to "dumb".

The solution for such cases likely is to pretend as if you are a dumb
server for the smart request.  That unfortunately means that the first
sentence is misleading, and the second sentence is also an inappropriate
advice.

> +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> +Clients SHOULD fall back to the dumb protocol if another content
> +type is returned.  When falling back to the dumb protocol clients
> +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> +instead SHOULD use the response already in hand.  Clients MUST NOT
> +continue if they do not support the dumb protocol.

The part I commented on (the beginning of Smart Server Response) was
written as a generic description, not specific to git-upload-pack service,
and the beginning of this paragraph also pretends to be a generic
description, but it is misleading.  This is a specific instruction to the
clients that asked for git-upload-pack service and got a dumb server
response (if the above were talking about something other than upload-pack
service, there is no guarantee that "response already in hand" is useful
to talk to dumb servers).

> +The returned response is a pkt-line stream describing each ref and
> +its known value.  The stream SHOULD be sorted by name according to
> +the C locale ordering.  The stream SHOULD include the default ref
> +named 'HEAD' as the first ref.  The stream MUST include capability
> +declarations behind a NUL on the first ref.
> +
> +	smart_reply    = PKT-LINE("# service=$servicename" LF)
> +	                 ref_list
> +	                 "0000"
> +	ref_list       = empty_list | populated_list
> +
> +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> +
> +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> +	                 *ref_record
> +
> +	cap_list      = *(SP capability) SP
> +	ref_record    = any_ref | peeled_ref
> +
> +	any_ref       = PKT-LINE(id SP name LF)
> +	peeled_ref    = PKT-LINE(id SP name LF)
> +	                PKT-LINE(id SP name "^{}" LF
> +	id            = 40*HEX
> +
> +	HEX           = "0".."9" | "a".."f"
> +	NL            = <US-ASCII NUL, null (0)>
> +	LF            = <US-ASCII LF,  linefeed (10)>
> +	SP            = <US-ASCII SP,  horizontal-tab (9)>

Did you define what "populated_list" is?

> +Smart Service git-upload-pack
> +------------------------------
> +This service reads from the remote repository.

The wording "remote repository" felt confusing.  I know it is "from the
repository served by the server", but if it were named without
"upload-pack", I might have mistaken that you are allowing to proxy a
request to access a third-party repository by this server.  The same
comment applies to the git-receive-pack service.

> +Capability include-tag
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When packing an object that an annotated tag points at, include the
> +tag object too.  Clients can request this if they want to fetch
> +tags, but don't know which tags they will need until after they
> +receive the branch data.  By enabling include-tag an entire call
> +to upload-pack can be avoided.
> +

I think you are avoiding an "extra" call; you would need one entire call
to upload-pack anyway for the primary transfer.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 20:44   ` Junio C Hamano
@ 2009-10-10 10:12     ` Antti-Juhani Kaijanaho
  2009-10-16  5:59       ` H. Peter Anvin
  2010-04-07 18:16     ` Tay Ray Chuan
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Antti-Juhani Kaijanaho @ 2009-10-10 10:12 UTC (permalink / raw)
  To: git

On 2009-10-09, Junio C Hamano <gitster@pobox.com> wrote:
>> +If there is no repository at $GIT_URL, the server MUST respond with
>> +the '404 Not Found' HTTP status code.
>
> We may also want to add
>
>     If there is no object at $GIT_URL/some/path, the server MUST respond
>     with the '404 Not Found' HTTP status code.
>
> to help dumb clients.

In both cases - is it really necessary to forbid the use of 410 (Gone)?

-- 
Mr. Antti-Juhani Kaijanaho, Jyvaskyla, Finland

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (5 preceding siblings ...)
  2009-10-09 20:44   ` Junio C Hamano
@ 2009-10-10 12:17   ` Tay Ray Chuan
  2010-04-06  4:57   ` Scott Chacon
  2013-09-10 17:07   ` [PATCH 00/14] document edits to original http protocol documentation Tay Ray Chuan
  8 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2009-10-10 12:17 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

Hi,

On Fri, Oct 9, 2009 at 1:22 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> +Smart Clients
> +~~~~~~~~~~~~~
> +
> +HTTP clients that support the "smart" protocol (or both the
> +"smart" and "dumb" protocols) MUST discover references by making
> +a paramterized request for the info/refs file of the repository.

s/paramterized/parameterized/ -- missing 'e'.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  8:54   ` Alex Blewitt
@ 2009-10-15 16:39     ` Shawn O. Pearce
  0 siblings, 0 replies; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-15 16:39 UTC (permalink / raw)
  To: Alex Blewitt; +Cc: git

Alex Blewitt <Alex.Blewitt@gmail.com> wrote:
> Shawn O. Pearce <spearce <at> spearce.org> writes:
> 
> > +URL Format
> > +----------
> 
> It's worth making clear here that $GIT_URL will be the path to the repository,
...

Thanks, noted.

> > HEX = [0-9a-f]
> 
> Is there any reason not to support A-F as well in the hex spec, even if they
> SHOULD use a-f?

Consistency.  I'd rather be strict and say HEX is [0-9a-f] and
demand that everyone try to standardize on the lower case form.

> This may limit the appeal for some case-insensitive systems.

Given that this particular notation of HEX is *only* used within
the protocol body to describe SHA-1 IDs, it won't make it to the
file system as-is.

A conforming Git implementation would first validate that this is in
fact a SHA-1 ID, likely translate it into a binary representation
(that is collapse the 40 byte hex to a 20 byte binary), and then
reformat it as a file system path if its looking for a loose object.
 
> It would also be good to document, like with the git daemon, whether all
> repositories under a path are exported or only those that have the magic
> setting in the config like git-daemon-export-ok.

This isn't something that matters to the protocol specification.
Its a server access control, not protocol detail.

Really, its an implementation detail of git-http-backend in git.git,
or of the RepositoryResolver and UploadPackFactory in JGit.

Therefore, its not going to be documented in this document.
 
> Lastly, it would be good to clarify when the result of this GET/POST exchange
> is a text-based (and encoded in UTF-8) vs when binary data is returned; we 
> don't want to get into the state where we're returning binary data and 
> pretending that it's UTF-8.

Oh, right.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 19:50   ` Jeff King
@ 2009-10-15 16:52     ` Shawn O. Pearce
  2009-10-15 17:39       ` Jeff King
  0 siblings, 1 reply; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-15 16:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> On Thu, Oct 08, 2009 at 10:22:45PM -0700, Shawn O. Pearce wrote:
> > +Servers MUST NOT require HTTP cookies for the purposes of
> > +authentication or access control.
> > [...]
> > +Servers MUST NOT require HTTP cookies in order to function correctly.
> 
> Why not? I can grant that the current git implementation probably can't
> handle it, but keep in mind this is talking about the protocol and not
> the implementation.

Good point... this document is about trying to explain the common
functionality that everyone can agree on.

> And I can see it being useful for sites like github
> which already have a cookie-based login.

What I'm concerned about is using the cookie jar.  My Mac OS X
laptop has 5 browsers installed, each with their own #@!*! cookie
jar: Safari, Opera, Firefox, Camino, Google Chrome.  How the hell
is the git client going to be able to use those cookies in order
to interact with a website that requires cookie authentication?

> Adapting the client to handle
> this case would not be too difficult (it would just mean keeping cookie
> state in a file between runs,

Saving our own cookie jar is easy, libcurl has some limited cookie
jar support already built in.  We just have to enable it.

> or even just pulling it out of the normal
> browser's cookie store).

See above, I don't think this will be very easy.

> And people whose client didn't do this would
> simply get an "access denied" response code.

And then they will email git ML or ask on #git why their git client
can't speak to some random website... and its because they used
"lynx" or yet-another-browser whose cookie jar format we can't read.

> Is there a technical reason not to allow it?

Not technical, but I want to reduce the amount of complexity that
a conforming client has to deal with to reduce support costs for
everyone involved.

I weakend the sections on cookies:

+ Authentication
+ --------------
....
+ Servers SHOULD NOT require HTTP cookies for the purposes of
+ authentication or access control.

and that's all we say on the matter.  I took out the Servers MUST
NOT line under session state.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-15 16:52     ` Shawn O. Pearce
@ 2009-10-15 17:39       ` Jeff King
  0 siblings, 0 replies; 46+ messages in thread
From: Jeff King @ 2009-10-15 17:39 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Thu, Oct 15, 2009 at 09:52:28AM -0700, Shawn O. Pearce wrote:

> > And I can see it being useful for sites like github
> > which already have a cookie-based login.
> 
> What I'm concerned about is using the cookie jar.  My Mac OS X
> laptop has 5 browsers installed, each with their own #@!*! cookie
> jar: Safari, Opera, Firefox, Camino, Google Chrome.  How the hell
> is the git client going to be able to use those cookies in order
> to interact with a website that requires cookie authentication?

Sure, it is obviously something that an implementation will have to deal
with. Either through manual configuration by the user or some
auto-detection magic that tries to cover every case (and I suspect if we
really wanted to do this, a patch to libcurl to handle different cookie
jar formats would probably be the best way to go).

But my main point was that it is an implementation issue, not a protocol
issue. The lines are a little blurry for us because there really aren't
very many git implementations, but I think your document is an attempt
to document just the protocol to allow interoperability between clients.

But I think you got my point:

> Not technical, but I want to reduce the amount of complexity that
> a conforming client has to deal with to reduce support costs for
> everyone involved.
> 
> I weakend the sections on cookies:
> 
> + Authentication
> + --------------
> ....
> + Servers SHOULD NOT require HTTP cookies for the purposes of
> + authentication or access control.
> 
> and that's all we say on the matter.  I took out the Servers MUST
> NOT line under session state.

I think this is a good compromise. It's not recommended at this point,
but there is no reason to disallow it if both sides can handle the
non-protocol part (i.e., storing and managing cookies). Thanks.

-Peff

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-10 10:12     ` Antti-Juhani Kaijanaho
@ 2009-10-16  5:59       ` H. Peter Anvin
  2009-10-16  7:19         ` Mike Hommey
  2009-10-16 14:23         ` Antti-Juhani Kaijanaho
  0 siblings, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2009-10-16  5:59 UTC (permalink / raw)
  To: Antti-Juhani Kaijanaho; +Cc: git

On 10/10/2009 03:12 AM, Antti-Juhani Kaijanaho wrote:
> On 2009-10-09, Junio C Hamano <gitster@pobox.com> wrote:
>>> +If there is no repository at $GIT_URL, the server MUST respond with
>>> +the '404 Not Found' HTTP status code.
>>
>> We may also want to add
>>
>>     If there is no object at $GIT_URL/some/path, the server MUST respond
>>     with the '404 Not Found' HTTP status code.
>>
>> to help dumb clients.
> 
> In both cases - is it really necessary to forbid the use of 410 (Gone)?
> 

410 means "we once had it, it's no longer here, no idea where it went."
 It's a largely useless code...

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-16  5:59       ` H. Peter Anvin
@ 2009-10-16  7:19         ` Mike Hommey
  2009-10-16 14:21           ` Shawn O. Pearce
  2009-10-16 14:23         ` Antti-Juhani Kaijanaho
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Hommey @ 2009-10-16  7:19 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Antti-Juhani Kaijanaho, git

On Thu, Oct 15, 2009 at 10:59:25PM -0700, H. Peter Anvin wrote:
> On 10/10/2009 03:12 AM, Antti-Juhani Kaijanaho wrote:
> > On 2009-10-09, Junio C Hamano <gitster@pobox.com> wrote:
> >>> +If there is no repository at $GIT_URL, the server MUST respond with
> >>> +the '404 Not Found' HTTP status code.
> >>
> >> We may also want to add
> >>
> >>     If there is no object at $GIT_URL/some/path, the server MUST respond
> >>     with the '404 Not Found' HTTP status code.
> >>
> >> to help dumb clients.
> > 
> > In both cases - is it really necessary to forbid the use of 410 (Gone)?
> > 
> 
> 410 means "we once had it, it's no longer here, no idea where it went."
>  It's a largely useless code...

There is an additional meaning to it, that is "it will never ever
return". It thus has a stronger meaning than 404. Sadly, not even search
engine spiders consider it as a hint to not crawl there in the future...

Mike

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-16  7:19         ` Mike Hommey
@ 2009-10-16 14:21           ` Shawn O. Pearce
  0 siblings, 0 replies; 46+ messages in thread
From: Shawn O. Pearce @ 2009-10-16 14:21 UTC (permalink / raw)
  To: Mike Hommey; +Cc: H. Peter Anvin, Antti-Juhani Kaijanaho, git

Mike Hommey <mh@glandium.org> wrote:
> On Thu, Oct 15, 2009 at 10:59:25PM -0700, H. Peter Anvin wrote:
> > On 10/10/2009 03:12 AM, Antti-Juhani Kaijanaho wrote:
> > > On 2009-10-09, Junio C Hamano <gitster@pobox.com> wrote:
> > >>> +If there is no repository at $GIT_URL, the server MUST respond with
> > >>> +the '404 Not Found' HTTP status code.
> > >>
> > >> We may also want to add
> > >>
> > >>     If there is no object at $GIT_URL/some/path, the server MUST respond
> > >>     with the '404 Not Found' HTTP status code.
> > >>
> > >> to help dumb clients.
> > > 
> > > In both cases - is it really necessary to forbid the use of 410 (Gone)?

My original text got taken a bit out of context here.  I guess MUST
was too strong of a word.  I more ment something like:

  If there is no repository at $GIT_URL, the server MUST NOT respond
  with '200 OK' and a valid info/refs response.  A server SHOULD
  respond with '404 Not Found', '410 Gone', or any other suitable
  HTTP status code which does not imply the resource exists as
  requested.

> > 410 means "we once had it, it's no longer here, no idea where it went."
> >  It's a largely useless code...
> 
> There is an additional meaning to it, that is "it will never ever
> return". It thus has a stronger meaning than 404. Sadly, not even search
> engine spiders consider it as a hint to not crawl there in the future...

I know.  I broke a URL on a site back in Janurary, MSN keeps crawling
it anyway.  F'king spiders.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-16  5:59       ` H. Peter Anvin
  2009-10-16  7:19         ` Mike Hommey
@ 2009-10-16 14:23         ` Antti-Juhani Kaijanaho
  1 sibling, 0 replies; 46+ messages in thread
From: Antti-Juhani Kaijanaho @ 2009-10-16 14:23 UTC (permalink / raw)
  To: git

On 2009-10-16, H. Peter Anvin <hpa@zytor.com> wrote:
> 410 means "we once had it, it's no longer here, no idea where it went."
>  It's a largely useless code...

That's not a reason to forbid it methinks.  And I quite like the difference
between "oops, mistyped the URI" and "oops, that URI is no longer valid".

-- 
Mr. Antti-Juhani Kaijanaho, Jyvaskyla, Finland

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (6 preceding siblings ...)
  2009-10-10 12:17   ` Tay Ray Chuan
@ 2010-04-06  4:57   ` Scott Chacon
  2010-04-06  6:09     ` Junio C Hamano
  2013-09-10 17:07   ` [PATCH 00/14] document edits to original http protocol documentation Tay Ray Chuan
  8 siblings, 1 reply; 46+ messages in thread
From: Scott Chacon @ 2010-04-06  4:57 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

Hey,

On Thu, Oct 8, 2009 at 10:22 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
> ---
>  Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
>  1 files changed, 542 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/technical/http-protocol.txt

I just spent a while looking for this in my email archive - why was
this document not added to the technical/ dir?  Can we put it there?

Scott

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2010-04-06  4:57   ` Scott Chacon
@ 2010-04-06  6:09     ` Junio C Hamano
       [not found]       ` <u2hd411cc4a1004060652k5a7f8ea4l67a9b079963f4dc4@mail.gmail.com>
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2010-04-06  6:09 UTC (permalink / raw)
  To: Scott Chacon; +Cc: Shawn O. Pearce, git

Scott Chacon <schacon@gmail.com> writes:

> On Thu, Oct 8, 2009 at 10:22 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
>> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
>> ---
>>  Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
>>  1 files changed, 542 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/technical/http-protocol.txt
>
> I just spent a while looking for this in my email archive - why was
> this document not added to the technical/ dir?  Can we put it there?

Perhaps because it was marked as RFC and not much discussion went on?
Sorry, but I cannot keep mental bandwidth to remember the threads from 6
months ago while doing this as a part-time non-job ;-)

I wonder what other three patches were about, at the same time...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC PATCH 1/4] Document the HTTP transport protocol
       [not found]       ` <u2hd411cc4a1004060652k5a7f8ea4l67a9b079963f4dc4@mail.gmail.com>
@ 2010-04-06 13:53         ` Scott Chacon
  2010-04-06 17:26           ` Junio C Hamano
  0 siblings, 1 reply; 46+ messages in thread
From: Scott Chacon @ 2010-04-06 13:53 UTC (permalink / raw)
  To: git list; +Cc: Shawn O. Pearce

Hey,

On Mon, Apr 5, 2010 at 11:09 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Scott Chacon <schacon@gmail.com> writes:
>
>> On Thu, Oct 8, 2009 at 10:22 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
>>> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
>>> ---
>>>  Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
>>>  1 files changed, 542 insertions(+), 0 deletions(-)
>>>  create mode 100644 Documentation/technical/http-protocol.txt
>>
>> I just spent a while looking for this in my email archive - why was
>> this document not added to the technical/ dir?  Can we put it there?
>
> Perhaps because it was marked as RFC and not much discussion went on?
> Sorry, but I cannot keep mental bandwidth to remember the threads from 6
> months ago while doing this as a part-time non-job ;-)
>

I understand, it wasn't meant as a criticism, I was just curious why
this file was never included.  That the series was marked as RFC makes
sense.  Could I request that this one patch be included?  Or if Shawn
has a more recent one?  I have found and extracted it and have it in a
topic branch locally, but if someone else wanted to reference it to
implement the HTTP stuff it would probably be really helpful to at
least have something in the main tree.

Thanks,
Scott

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2010-04-06 13:53         ` Scott Chacon
@ 2010-04-06 17:26           ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2010-04-06 17:26 UTC (permalink / raw)
  To: Scott Chacon; +Cc: git list, Shawn O. Pearce

Scott Chacon <schacon@gmail.com> writes:

> I understand, it wasn't meant as a criticism, I was just curious why
> this file was never included.  That the series was marked as RFC makes
> sense.  Could I request that this one patch be included?  Or if Shawn
> has a more recent one?

I also understand and I didn't mean to sound as if I took offense.  I very
much appreciate reminders like yours of old discussions and patches that
were basically good but did not reach conclusion at the end to avoid
wasted effort.

A pointer is good, but if you are reviving an old patch, it would be
much easier if you did a resend/forward for people to comment in-line,
by the way.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 20:44   ` Junio C Hamano
  2009-10-10 10:12     ` Antti-Juhani Kaijanaho
@ 2010-04-07 18:16     ` Tay Ray Chuan
  2010-04-07 18:19     ` Tay Ray Chuan
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2010-04-07 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

Hi,

(I'm reviving this thread to complete the document. What I have right
now is available at my github repo; you can see it at

  http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket

.

An inlined patch should be sent in soon.)

On Fri, 09 Oct 2009 13:44:53 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> > new file mode 100644
> > index 0000000..316d9b6
> > --- /dev/null
> > +++ b/Documentation/technical/http-protocol.txt
> > @@ -0,0 +1,542 @@
> > +HTTP transfer protocols
> > +=======================
> > ...
> > +As a design feature smart clients can automatically upgrade "dumb"
> > +protocol URLs to smart URLs.  This permits all users to have the
> > +same published URL, and the peers automatically select the most
> > +efficient transport available to them.
> 
> The first sentence feels backwards although the conclusion in the second
> sentence is true.  It is more like smart ones trying smart protocol first,
> and downgrading to "dumb" after noticing that the server is not smart.

I think Shawn is trying to describe this from the persepective of the
client implementation - a "dumb" url is first constructed, then
"upgraded" to a "smart" one.

> > +Authentication
> > +--------------
> > ...
> > +Clients SHOULD support Basic authentication as described by RFC 2616.
> > +Servers SHOULD support Basic authentication by relying upon the
> > +HTTP server placed in front of the Git server software.
> 
> It is perfectly fine to make it a requirement for a server to support the
> Basic authentication, but should you make it a requirement that the
> support is done by a specific implementation, i.e. "by relying upon..."?

I think the term "Server" in this document is implied as an amalgam of
the HTTP server, and a CGI script/program ("Git server software"). I
think what Shawn meant was for Basic authentication to be implemented
at the server layer, not in the CGI scripts/program.

> > +Session State
> > +-------------
> > ...
> > +retained and managed by the client process.  This permits simple
> > +round-robin load-balancing on the server side, without needing to
> > +worry about state mangement.
> 
> s/mangement/management/;

Done.

> > +pkt-line Format
> > +---------------
> > ...
> > +Examples (as C-style strings):
> > +
> > +  pkt-line          actual value
> > +  ---------------------------------
> > +  "0006a\n"         "a\n"
> > +  "0005a"           "a"
> > +  "000bfoobar\n"    "foobar\n"
> > +  "0004"            ""
> > +
> > +A pkt-line with a length of 0 ("0000") is a special case and MUST
> > +be treated as a message break or terminator in the payload.
> 
> Isn't this "MUST be" wrong?
> 
> It is not an advice to the implementors, but the protocol specification
> itself defines what the flush packet means.  IOW, "The author of this
> specification, Shawn, MUST treat a flush packet as a message break or
> terminator in the payload, when designing this protocol."

This section has been purged; we already have this in
Documentation/technical/protocol-common.txt.

> > +General Request Processing
> > +--------------------------
> > +
> > +Except where noted, all standard HTTP behavior SHOULD be assumed
> > +by both client and server.  This includes (but is not necessarily
> > +limited to):
> > +
> > +If there is no repository at $GIT_URL, the server MUST respond with
> > +the '404 Not Found' HTTP status code.
> 
> We may also want to add
> 
>     If there is no object at $GIT_URL/some/path, the server MUST respond
>     with the '404 Not Found' HTTP status code.
> 
> to help dumb clients.

Proposed re-wording:

  If there is no repository at $GIT_URL, or the resource pointed to by a
  location containing $GIT_URL does not exist, the server MUST NOT respond
  with '200 OK' response.  A server SHOULD respond with
  '404 Not Found', '410 Gone', or any other suitable HTTP status code
  which does not imply the resource exists as requested.

(The 'valid info/refs response' part has been dropped.)

> > +Dumb Clients
> > +~~~~~~~~~~~~
> > +
> > +HTTP clients that only support the "dumb" protocol MUST discover
> > +references by making a request for the special info/refs file of
> > +the repository.
> > +
> > +Dumb HTTP clients MUST NOT include search/query parameters when
> > +fetching the info/refs file.  (That is, '?' must not appear in the
> > +requested URL.)
> 
> It is unclear if '?' can be part of $GIT_URL. E.g.
> 
>     $ wget http://example.xz/serve.cgi?path=git.git/info/refs
>     $ git clone http://example.xz/serve.cgi?path=git.git
> 
> It might be clearer to just say
> 
>     Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
>     without any search/query parameters.  I.e.
> 
> 	C: GET $GIT_URL/info/refs HTTP/1.0
> 
> to also exclude methods other than GET.

Done.

> > +	C: GET $GIT_URL/info/refs HTTP/1.0
> > +
> > +	S: 200 OK
> > ...
> > +When examining the response clients SHOULD only examine the HTTP
> > +status code.  Valid responses are '200 OK', or '304 Not Modified'.
> 
> Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
> access to this repository") also valid?

I think "valid" for the client means "ok, continue processing
normally".

> > +The returned content is a UNIX formatted text file describing
> > +each ref and its known value.  The file SHOULD be sorted by name
> > +according to the C locale ordering.  The file SHOULD NOT include
> > +the default ref named 'HEAD'.
> 
> I know you said "known" to imply "concurrent operations may change it
> while the server is serving this client", but it feels rather awkward.

TODO

> > +Smart Server Response
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Smart servers MUST respond with the smart server reply format.
> > +If the server does not recognize the requested service name, or the
> > +requested service name has been disabled by the server administrator,
> > +the server MUST respond with the '403 Forbidden' HTTP status code.
> 
> This is a bit confusing.
> 
> If you as a server administrator want to disable the smart upload-pack for
> one repository (but not for other repositories), you would not be able to
> force smart clients to fall back to the dumb protocol by giving "403" for
> that repository.
> 
> Maybe in 2 years somebody smarter than us will have invented a more
> efficient git-upload-pack-2 service, which is the only fetch protocol his
> server supports other than dumb.  If your v1 smart client asks for the
> original git-upload-pack service and gets a "403", you won't be able to
> fall back to "dumb".
> 
> The solution for such cases likely is to pretend as if you are a dumb
> server for the smart request.  That unfortunately means that the first
> sentence is misleading, and the second sentence is also an inappropriate
> advice.

Proposed rewording:

  If the server does not recognize the requested service name, or the
  requested service name has been disabled by the server administrator,
  the server MUST respond with the '403 Forbidden' HTTP status code.
  
  Otherwise, smart servers MUST respond with the smart server reply
  format for the requested service name.

> > +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> > +Clients SHOULD fall back to the dumb protocol if another content
> > +type is returned.  When falling back to the dumb protocol clients
> > +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> > +instead SHOULD use the response already in hand.  Clients MUST NOT
> > +continue if they do not support the dumb protocol.
> 
> The part I commented on (the beginning of Smart Server Response) was
> written as a generic description, not specific to git-upload-pack service,
> and the beginning of this paragraph also pretends to be a generic
> description, but it is misleading.  This is a specific instruction to the
> clients that asked for git-upload-pack service and got a dumb server
> response (if the above were talking about something other than upload-pack
> service, there is no guarantee that "response already in hand" is useful
> to talk to dumb servers).

Previous hunk should fix this.

> > +	ref_list       = empty_list | populated_list
> > +
> > +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> > +
> > +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> > +	                 *ref_record
>
> [snip]
> 
> Did you define what "populated_list" is?

I think "non_empty_list" was meant.

Ideally, ref advertisements should be in protocol-common.txt.

> > +Smart Service git-upload-pack
> > +------------------------------
> > +This service reads from the remote repository.
> 
> The wording "remote repository" felt confusing.  I know it is "from the
> repository served by the server", but if it were named without
> "upload-pack", I might have mistaken that you are allowing to proxy a
> request to access a third-party repository by this server.  The same
> comment applies to the git-receive-pack service.

Would

  This service reads from the repository pointed to by $GIT_URL.

be an improvement?

> > +Capability include-tag
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When packing an object that an annotated tag points at, include the
> > +tag object too.  Clients can request this if they want to fetch
> > +tags, but don't know which tags they will need until after they
> > +receive the branch data.  By enabling include-tag an entire call
> > +to upload-pack can be avoided.
> > +
> 
> I think you are avoiding an "extra" call; you would need one entire call
> to upload-pack anyway for the primary transfer.

Done.

--
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 20:44   ` Junio C Hamano
  2009-10-10 10:12     ` Antti-Juhani Kaijanaho
  2010-04-07 18:16     ` Tay Ray Chuan
@ 2010-04-07 18:19     ` Tay Ray Chuan
  2010-04-07 19:11     ` (resend v2) " Tay Ray Chuan
  2010-04-07 19:24     ` Tay Ray Chuan
  4 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2010-04-07 18:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

Hi,

(I'm reviving this thread to complete the document. What I have right
now is available at my github repo; you can see it at

  http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket

.

An inlined patch should be sent in soon.)

On Fri, 09 Oct 2009 13:44:53 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> > new file mode 100644
> > index 0000000..316d9b6
> > --- /dev/null
> > +++ b/Documentation/technical/http-protocol.txt
> > @@ -0,0 +1,542 @@
> > +HTTP transfer protocols
> > +=======================
> > ...
> > +As a design feature smart clients can automatically upgrade "dumb"
> > +protocol URLs to smart URLs.  This permits all users to have the
> > +same published URL, and the peers automatically select the most
> > +efficient transport available to them.
> 
> The first sentence feels backwards although the conclusion in the second
> sentence is true.  It is more like smart ones trying smart protocol first,
> and downgrading to "dumb" after noticing that the server is not smart.

I think Shawn is trying to describe this from the persepective of the
client implementation - a "dumb" url is first constructed, then
"upgraded" to a "smart" one.

> > +Authentication
> > +--------------
> > ...
> > +Clients SHOULD support Basic authentication as described by RFC 2616.
> > +Servers SHOULD support Basic authentication by relying upon the
> > +HTTP server placed in front of the Git server software.
> 
> It is perfectly fine to make it a requirement for a server to support the
> Basic authentication, but should you make it a requirement that the
> support is done by a specific implementation, i.e. "by relying upon..."?

I think the term "Server" in this document is implied as an amalgam of
the HTTP server, and a CGI script/program ("Git server software"). I
think what Shawn meant was for Basic authentication to be implemented
at the server layer, not in the CGI scripts/program.

> > +Session State
> > +-------------
> > ...
> > +retained and managed by the client process.  This permits simple
> > +round-robin load-balancing on the server side, without needing to
> > +worry about state mangement.
> 
> s/mangement/management/;

Done.

> > +pkt-line Format
> > +---------------
> > ...
> > +Examples (as C-style strings):
> > +
> > +  pkt-line          actual value
> > +  ---------------------------------
> > +  "0006a\n"         "a\n"
> > +  "0005a"           "a"
> > +  "000bfoobar\n"    "foobar\n"
> > +  "0004"            ""
> > +
> > +A pkt-line with a length of 0 ("0000") is a special case and MUST
> > +be treated as a message break or terminator in the payload.
> 
> Isn't this "MUST be" wrong?
> 
> It is not an advice to the implementors, but the protocol specification
> itself defines what the flush packet means.  IOW, "The author of this
> specification, Shawn, MUST treat a flush packet as a message break or
> terminator in the payload, when designing this protocol."

This section has been purged; we already have this in
Documentation/technical/protocol-common.txt.

> > +General Request Processing
> > +--------------------------
> > +
> > +Except where noted, all standard HTTP behavior SHOULD be assumed
> > +by both client and server.  This includes (but is not necessarily
> > +limited to):
> > +
> > +If there is no repository at $GIT_URL, the server MUST respond with
> > +the '404 Not Found' HTTP status code.
> 
> We may also want to add
> 
>     If there is no object at $GIT_URL/some/path, the server MUST respond
>     with the '404 Not Found' HTTP status code.
> 
> to help dumb clients.

Proposed re-wording:

  If there is no repository at $GIT_URL, or the resource pointed to by a
  location containing $GIT_URL does not exist, the server MUST NOT respond
  with '200 OK' response.  A server SHOULD respond with
  '404 Not Found', '410 Gone', or any other suitable HTTP status code
  which does not imply the resource exists as requested.

(The 'valid info/refs response' part has been dropped.)

> > +Dumb Clients
> > +~~~~~~~~~~~~
> > +
> > +HTTP clients that only support the "dumb" protocol MUST discover
> > +references by making a request for the special info/refs file of
> > +the repository.
> > +
> > +Dumb HTTP clients MUST NOT include search/query parameters when
> > +fetching the info/refs file.  (That is, '?' must not appear in the
> > +requested URL.)
> 
> It is unclear if '?' can be part of $GIT_URL. E.g.
> 
>     $ wget http://example.xz/serve.cgi?path=git.git/info/refs
>     $ git clone http://example.xz/serve.cgi?path=git.git
> 
> It might be clearer to just say
> 
>     Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
>     without any search/query parameters.  I.e.
> 
> 	C: GET $GIT_URL/info/refs HTTP/1.0
> 
> to also exclude methods other than GET.

Done.

> > +	C: GET $GIT_URL/info/refs HTTP/1.0
> > +
> > +	S: 200 OK
> > ...
> > +When examining the response clients SHOULD only examine the HTTP
> > +status code.  Valid responses are '200 OK', or '304 Not Modified'.
> 
> Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
> access to this repository") also valid?

I think "valid" for the client means "ok, continue processing
normally".

> > +The returned content is a UNIX formatted text file describing
> > +each ref and its known value.  The file SHOULD be sorted by name
> > +according to the C locale ordering.  The file SHOULD NOT include
> > +the default ref named 'HEAD'.
> 
> I know you said "known" to imply "concurrent operations may change it
> while the server is serving this client", but it feels rather awkward.

TODO

> > +Smart Server Response
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Smart servers MUST respond with the smart server reply format.
> > +If the server does not recognize the requested service name, or the
> > +requested service name has been disabled by the server administrator,
> > +the server MUST respond with the '403 Forbidden' HTTP status code.
> 
> This is a bit confusing.
> 
> If you as a server administrator want to disable the smart upload-pack for
> one repository (but not for other repositories), you would not be able to
> force smart clients to fall back to the dumb protocol by giving "403" for
> that repository.
> 
> Maybe in 2 years somebody smarter than us will have invented a more
> efficient git-upload-pack-2 service, which is the only fetch protocol his
> server supports other than dumb.  If your v1 smart client asks for the
> original git-upload-pack service and gets a "403", you won't be able to
> fall back to "dumb".
> 
> The solution for such cases likely is to pretend as if you are a dumb
> server for the smart request.  That unfortunately means that the first
> sentence is misleading, and the second sentence is also an inappropriate
> advice.

Proposed rewording:

  If the server does not recognize the requested service name, or the
  requested service name has been disabled by the server administrator,
  the server MUST respond with the '403 Forbidden' HTTP status code.
  
  Otherwise, smart servers MUST respond with the smart server reply
  format for the requested service name.

> > +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> > +Clients SHOULD fall back to the dumb protocol if another content
> > +type is returned.  When falling back to the dumb protocol clients
> > +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> > +instead SHOULD use the response already in hand.  Clients MUST NOT
> > +continue if they do not support the dumb protocol.
> 
> The part I commented on (the beginning of Smart Server Response) was
> written as a generic description, not specific to git-upload-pack service,
> and the beginning of this paragraph also pretends to be a generic
> description, but it is misleading.  This is a specific instruction to the
> clients that asked for git-upload-pack service and got a dumb server
> response (if the above were talking about something other than upload-pack
> service, there is no guarantee that "response already in hand" is useful
> to talk to dumb servers).

Previous hunk should fix this.

> > +	ref_list       = empty_list | populated_list
> > +
> > +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> > +
> > +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> > +	                 *ref_record
>
> [snip]
> 
> Did you define what "populated_list" is?

I think "non_empty_list" was meant.

Ideally, ref advertisements should be in protocol-common.txt.

> > +Smart Service git-upload-pack
> > +------------------------------
> > +This service reads from the remote repository.
> 
> The wording "remote repository" felt confusing.  I know it is "from the
> repository served by the server", but if it were named without
> "upload-pack", I might have mistaken that you are allowing to proxy a
> request to access a third-party repository by this server.  The same
> comment applies to the git-receive-pack service.

Would

  This service reads from the repository pointed to by $GIT_URL.

be an improvement?

> > +Capability include-tag
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When packing an object that an annotated tag points at, include the
> > +tag object too.  Clients can request this if they want to fetch
> > +tags, but don't know which tags they will need until after they
> > +receive the branch data.  By enabling include-tag an entire call
> > +to upload-pack can be avoided.
> > +
> 
> I think you are avoiding an "extra" call; you would need one entire call
> to upload-pack anyway for the primary transfer.

Done.

--
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* (resend v2) Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 20:44   ` Junio C Hamano
                       ` (2 preceding siblings ...)
  2010-04-07 18:19     ` Tay Ray Chuan
@ 2010-04-07 19:11     ` Tay Ray Chuan
  2010-04-07 19:51       ` Junio C Hamano
  2010-04-07 19:24     ` Tay Ray Chuan
  4 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2010-04-07 19:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

(sorry for the multiple copies - something wrong with my MUA. Had to
give it a kiss in the a** to get it working.)

(v2 - added back headers. My apologies again.)

Hi,

(I'm reviving this thread to complete the document. What I have right
now is available at my github repo; you can see it at

  http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket

.

An inlined patch should be sent in soon.)

On Fri, 09 Oct 2009 13:44:53 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> > new file mode 100644
> > index 0000000..316d9b6
> > --- /dev/null
> > +++ b/Documentation/technical/http-protocol.txt
> > @@ -0,0 +1,542 @@
> > +HTTP transfer protocols
> > +=======================
> > ...
> > +As a design feature smart clients can automatically upgrade "dumb"
> > +protocol URLs to smart URLs.  This permits all users to have the
> > +same published URL, and the peers automatically select the most
> > +efficient transport available to them.
>
> The first sentence feels backwards although the conclusion in the second
> sentence is true.  It is more like smart ones trying smart protocol first,
> and downgrading to "dumb" after noticing that the server is not smart.

I think Shawn is trying to describe this from the persepective of the
client implementation - a "dumb" url is first constructed, then
"upgraded" to a "smart" one.

> > +Authentication
> > +--------------
> > ...
> > +Clients SHOULD support Basic authentication as described by RFC 2616.
> > +Servers SHOULD support Basic authentication by relying upon the
> > +HTTP server placed in front of the Git server software.
>
> It is perfectly fine to make it a requirement for a server to support the
> Basic authentication, but should you make it a requirement that the
> support is done by a specific implementation, i.e. "by relying upon..."?

I think the term "Server" in this document is implied as an amalgam of
the HTTP server, and a CGI script/program ("Git server software"). I
think what Shawn meant was for Basic authentication to be implemented
at the server layer, not in the CGI scripts/program.

> > +Session State
> > +-------------
> > ...
> > +retained and managed by the client process.  This permits simple
> > +round-robin load-balancing on the server side, without needing to
> > +worry about state mangement.
>
> s/mangement/management/;

Done.

> > +pkt-line Format
> > +---------------
> > ...
> > +Examples (as C-style strings):
> > +
> > +  pkt-line          actual value
> > +  ---------------------------------
> > +  "0006a\n"         "a\n"
> > +  "0005a"           "a"
> > +  "000bfoobar\n"    "foobar\n"
> > +  "0004"            ""
> > +
> > +A pkt-line with a length of 0 ("0000") is a special case and MUST
> > +be treated as a message break or terminator in the payload.
>
> Isn't this "MUST be" wrong?
>
> It is not an advice to the implementors, but the protocol specification
> itself defines what the flush packet means.  IOW, "The author of this
> specification, Shawn, MUST treat a flush packet as a message break or
> terminator in the payload, when designing this protocol."

This section has been purged; we already have this in
Documentation/technical/protocol-common.txt.

> > +General Request Processing
> > +--------------------------
> > +
> > +Except where noted, all standard HTTP behavior SHOULD be assumed
> > +by both client and server.  This includes (but is not necessarily
> > +limited to):
> > +
> > +If there is no repository at $GIT_URL, the server MUST respond with
> > +the '404 Not Found' HTTP status code.
>
> We may also want to add
>
>     If there is no object at $GIT_URL/some/path, the server MUST respond
>     with the '404 Not Found' HTTP status code.
>
> to help dumb clients.

Proposed re-wording:

  If there is no repository at $GIT_URL, or the resource pointed to by a
  location containing $GIT_URL does not exist, the server MUST NOT respond
  with '200 OK' response.  A server SHOULD respond with
  '404 Not Found', '410 Gone', or any other suitable HTTP status code
  which does not imply the resource exists as requested.

(The 'valid info/refs response' part has been dropped.)

> > +Dumb Clients
> > +~~~~~~~~~~~~
> > +
> > +HTTP clients that only support the "dumb" protocol MUST discover
> > +references by making a request for the special info/refs file of
> > +the repository.
> > +
> > +Dumb HTTP clients MUST NOT include search/query parameters when
> > +fetching the info/refs file.  (That is, '?' must not appear in the
> > +requested URL.)
>
> It is unclear if '?' can be part of $GIT_URL. E.g.
>
>     $ wget http://example.xz/serve.cgi?path=git.git/info/refs
>     $ git clone http://example.xz/serve.cgi?path=git.git
>
> It might be clearer to just say
>
>     Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
>     without any search/query parameters.  I.e.
>
> 	C: GET $GIT_URL/info/refs HTTP/1.0
>
> to also exclude methods other than GET.

Done.

> > +	C: GET $GIT_URL/info/refs HTTP/1.0
> > +
> > +	S: 200 OK
> > ...
> > +When examining the response clients SHOULD only examine the HTTP
> > +status code.  Valid responses are '200 OK', or '304 Not Modified'.
>
> Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
> access to this repository") also valid?

I think "valid" for the client means "ok, continue processing
normally".

> > +The returned content is a UNIX formatted text file describing
> > +each ref and its known value.  The file SHOULD be sorted by name
> > +according to the C locale ordering.  The file SHOULD NOT include
> > +the default ref named 'HEAD'.
>
> I know you said "known" to imply "concurrent operations may change it
> while the server is serving this client", but it feels rather awkward.

TODO

> > +Smart Server Response
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Smart servers MUST respond with the smart server reply format.
> > +If the server does not recognize the requested service name, or the
> > +requested service name has been disabled by the server administrator,
> > +the server MUST respond with the '403 Forbidden' HTTP status code.
>
> This is a bit confusing.
>
> If you as a server administrator want to disable the smart upload-pack for
> one repository (but not for other repositories), you would not be able to
> force smart clients to fall back to the dumb protocol by giving "403" for
> that repository.
>
> Maybe in 2 years somebody smarter than us will have invented a more
> efficient git-upload-pack-2 service, which is the only fetch protocol his
> server supports other than dumb.  If your v1 smart client asks for the
> original git-upload-pack service and gets a "403", you won't be able to
> fall back to "dumb".
>
> The solution for such cases likely is to pretend as if you are a dumb
> server for the smart request.  That unfortunately means that the first
> sentence is misleading, and the second sentence is also an inappropriate
> advice.

Proposed rewording:

  If the server does not recognize the requested service name, or the
  requested service name has been disabled by the server administrator,
  the server MUST respond with the '403 Forbidden' HTTP status code.

  Otherwise, smart servers MUST respond with the smart server reply
  format for the requested service name.

> > +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> > +Clients SHOULD fall back to the dumb protocol if another content
> > +type is returned.  When falling back to the dumb protocol clients
> > +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> > +instead SHOULD use the response already in hand.  Clients MUST NOT
> > +continue if they do not support the dumb protocol.
>
> The part I commented on (the beginning of Smart Server Response) was
> written as a generic description, not specific to git-upload-pack service,
> and the beginning of this paragraph also pretends to be a generic
> description, but it is misleading.  This is a specific instruction to the
> clients that asked for git-upload-pack service and got a dumb server
> response (if the above were talking about something other than upload-pack
> service, there is no guarantee that "response already in hand" is useful
> to talk to dumb servers).

Previous hunk should fix this.

> > +	ref_list       = empty_list | populated_list
> > +
> > +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> > +
> > +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> > +	                 *ref_record
>
> [snip]
>
> Did you define what "populated_list" is?

I think "non_empty_list" was meant.

Ideally, ref advertisements should be in protocol-common.txt.

> > +Smart Service git-upload-pack
> > +------------------------------
> > +This service reads from the remote repository.
>
> The wording "remote repository" felt confusing.  I know it is "from the
> repository served by the server", but if it were named without
> "upload-pack", I might have mistaken that you are allowing to proxy a
> request to access a third-party repository by this server.  The same
> comment applies to the git-receive-pack service.

Would

  This service reads from the repository pointed to by $GIT_URL.

be an improvement?

> > +Capability include-tag
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When packing an object that an annotated tag points at, include the
> > +tag object too.  Clients can request this if they want to fetch
> > +tags, but don't know which tags they will need until after they
> > +receive the branch data.  By enabling include-tag an entire call
> > +to upload-pack can be avoided.
> > +
>
> I think you are avoiding an "extra" call; you would need one entire call
> to upload-pack anyway for the primary transfer.

Done.

--
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* (resend v2) Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2009-10-09 20:44   ` Junio C Hamano
                       ` (3 preceding siblings ...)
  2010-04-07 19:11     ` (resend v2) " Tay Ray Chuan
@ 2010-04-07 19:24     ` Tay Ray Chuan
  4 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2010-04-07 19:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

(sorry for the multiple copies - something wrong with my MUA. Had to
give it a kiss in the a** to get it working.)

(v2 - added back headers. My apologies again.)

Hi,

(I'm reviving this thread to complete the document. What I have right
now is available at my github repo; you can see it at

  http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket

.

An inlined patch should be sent in soon.)

On Fri, 09 Oct 2009 13:44:53 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> > new file mode 100644
> > index 0000000..316d9b6
> > --- /dev/null
> > +++ b/Documentation/technical/http-protocol.txt
> > @@ -0,0 +1,542 @@
> > +HTTP transfer protocols
> > +=======================
> > ...
> > +As a design feature smart clients can automatically upgrade "dumb"
> > +protocol URLs to smart URLs.  This permits all users to have the
> > +same published URL, and the peers automatically select the most
> > +efficient transport available to them.
>
> The first sentence feels backwards although the conclusion in the second
> sentence is true.  It is more like smart ones trying smart protocol first,
> and downgrading to "dumb" after noticing that the server is not smart.

I think Shawn is trying to describe this from the persepective of the
client implementation - a "dumb" url is first constructed, then
"upgraded" to a "smart" one.

> > +Authentication
> > +--------------
> > ...
> > +Clients SHOULD support Basic authentication as described by RFC 2616.
> > +Servers SHOULD support Basic authentication by relying upon the
> > +HTTP server placed in front of the Git server software.
>
> It is perfectly fine to make it a requirement for a server to support the
> Basic authentication, but should you make it a requirement that the
> support is done by a specific implementation, i.e. "by relying upon..."?

I think the term "Server" in this document is implied as an amalgam of
the HTTP server, and a CGI script/program ("Git server software"). I
think what Shawn meant was for Basic authentication to be implemented
at the server layer, not in the CGI scripts/program.

> > +Session State
> > +-------------
> > ...
> > +retained and managed by the client process.  This permits simple
> > +round-robin load-balancing on the server side, without needing to
> > +worry about state mangement.
>
> s/mangement/management/;

Done.

> > +pkt-line Format
> > +---------------
> > ...
> > +Examples (as C-style strings):
> > +
> > +  pkt-line          actual value
> > +  ---------------------------------
> > +  "0006a\n"         "a\n"
> > +  "0005a"           "a"
> > +  "000bfoobar\n"    "foobar\n"
> > +  "0004"            ""
> > +
> > +A pkt-line with a length of 0 ("0000") is a special case and MUST
> > +be treated as a message break or terminator in the payload.
>
> Isn't this "MUST be" wrong?
>
> It is not an advice to the implementors, but the protocol specification
> itself defines what the flush packet means.  IOW, "The author of this
> specification, Shawn, MUST treat a flush packet as a message break or
> terminator in the payload, when designing this protocol."

This section has been purged; we already have this in
Documentation/technical/protocol-common.txt.

> > +General Request Processing
> > +--------------------------
> > +
> > +Except where noted, all standard HTTP behavior SHOULD be assumed
> > +by both client and server.  This includes (but is not necessarily
> > +limited to):
> > +
> > +If there is no repository at $GIT_URL, the server MUST respond with
> > +the '404 Not Found' HTTP status code.
>
> We may also want to add
>
>     If there is no object at $GIT_URL/some/path, the server MUST respond
>     with the '404 Not Found' HTTP status code.
>
> to help dumb clients.

Proposed re-wording:

  If there is no repository at $GIT_URL, or the resource pointed to by a
  location containing $GIT_URL does not exist, the server MUST NOT respond
  with '200 OK' response.  A server SHOULD respond with
  '404 Not Found', '410 Gone', or any other suitable HTTP status code
  which does not imply the resource exists as requested.

(The 'valid info/refs response' part has been dropped.)

> > +Dumb Clients
> > +~~~~~~~~~~~~
> > +
> > +HTTP clients that only support the "dumb" protocol MUST discover
> > +references by making a request for the special info/refs file of
> > +the repository.
> > +
> > +Dumb HTTP clients MUST NOT include search/query parameters when
> > +fetching the info/refs file.  (That is, '?' must not appear in the
> > +requested URL.)
>
> It is unclear if '?' can be part of $GIT_URL. E.g.
>
>     $ wget http://example.xz/serve.cgi?path=git.git/info/refs
>     $ git clone http://example.xz/serve.cgi?path=git.git
>
> It might be clearer to just say
>
>     Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
>     without any search/query parameters.  I.e.
>
> 	C: GET $GIT_URL/info/refs HTTP/1.0
>
> to also exclude methods other than GET.

Done.

> > +	C: GET $GIT_URL/info/refs HTTP/1.0
> > +
> > +	S: 200 OK
> > ...
> > +When examining the response clients SHOULD only examine the HTTP
> > +status code.  Valid responses are '200 OK', or '304 Not Modified'.
>
> Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
> access to this repository") also valid?

I think "valid" for the client means "ok, continue processing
normally".

> > +The returned content is a UNIX formatted text file describing
> > +each ref and its known value.  The file SHOULD be sorted by name
> > +according to the C locale ordering.  The file SHOULD NOT include
> > +the default ref named 'HEAD'.
>
> I know you said "known" to imply "concurrent operations may change it
> while the server is serving this client", but it feels rather awkward.

TODO

> > +Smart Server Response
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Smart servers MUST respond with the smart server reply format.
> > +If the server does not recognize the requested service name, or the
> > +requested service name has been disabled by the server administrator,
> > +the server MUST respond with the '403 Forbidden' HTTP status code.
>
> This is a bit confusing.
>
> If you as a server administrator want to disable the smart upload-pack for
> one repository (but not for other repositories), you would not be able to
> force smart clients to fall back to the dumb protocol by giving "403" for
> that repository.
>
> Maybe in 2 years somebody smarter than us will have invented a more
> efficient git-upload-pack-2 service, which is the only fetch protocol his
> server supports other than dumb.  If your v1 smart client asks for the
> original git-upload-pack service and gets a "403", you won't be able to
> fall back to "dumb".
>
> The solution for such cases likely is to pretend as if you are a dumb
> server for the smart request.  That unfortunately means that the first
> sentence is misleading, and the second sentence is also an inappropriate
> advice.

Proposed rewording:

  If the server does not recognize the requested service name, or the
  requested service name has been disabled by the server administrator,
  the server MUST respond with the '403 Forbidden' HTTP status code.

  Otherwise, smart servers MUST respond with the smart server reply
  format for the requested service name.

> > +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> > +Clients SHOULD fall back to the dumb protocol if another content
> > +type is returned.  When falling back to the dumb protocol clients
> > +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> > +instead SHOULD use the response already in hand.  Clients MUST NOT
> > +continue if they do not support the dumb protocol.
>
> The part I commented on (the beginning of Smart Server Response) was
> written as a generic description, not specific to git-upload-pack service,
> and the beginning of this paragraph also pretends to be a generic
> description, but it is misleading.  This is a specific instruction to the
> clients that asked for git-upload-pack service and got a dumb server
> response (if the above were talking about something other than upload-pack
> service, there is no guarantee that "response already in hand" is useful
> to talk to dumb servers).

Previous hunk should fix this.

> > +	ref_list       = empty_list | populated_list
> > +
> > +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> > +
> > +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> > +	                 *ref_record
>
> [snip]
>
> Did you define what "populated_list" is?

I think "non_empty_list" was meant.

Ideally, ref advertisements should be in protocol-common.txt.

> > +Smart Service git-upload-pack
> > +------------------------------
> > +This service reads from the remote repository.
>
> The wording "remote repository" felt confusing.  I know it is "from the
> repository served by the server", but if it were named without
> "upload-pack", I might have mistaken that you are allowing to proxy a
> request to access a third-party repository by this server.  The same
> comment applies to the git-receive-pack service.

Would

  This service reads from the repository pointed to by $GIT_URL.

be an improvement?

> > +Capability include-tag
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When packing an object that an annotated tag points at, include the
> > +tag object too.  Clients can request this if they want to fetch
> > +tags, but don't know which tags they will need until after they
> > +receive the branch data.  By enabling include-tag an entire call
> > +to upload-pack can be avoided.
> > +
>
> I think you are avoiding an "extra" call; you would need one entire call
> to upload-pack anyway for the primary transfer.

Done.

--
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: (resend v2) Re: [RFC PATCH 1/4] Document the HTTP transport protocol
  2010-04-07 19:11     ` (resend v2) " Tay Ray Chuan
@ 2010-04-07 19:51       ` Junio C Hamano
  2010-04-08  1:47         ` Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2010-04-07 19:51 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Shawn O. Pearce, git

Tay Ray Chuan <rctay89@gmail.com> writes:

> (I'm reviving this thread to complete the document. What I have right
> now is available at my github repo; you can see it at
>
>   http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket

I looked at the above page; it was quite readable.  You seem to have
picked up Shawn's non-patch responses to reviews quite well.

By the way, aren't there a better way than visiting:

    http://github.com/rctay/git/commits/feature/http-doc/Documentation/technical/http-protocol.txt

and then repeat (click each commit, go back)

to get a moral equivalent of "git log -p feature/http-doc -- $that_path"?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: (resend v2) Re: [RFC PATCH 1/4] Document the HTTP transport  protocol
  2010-04-07 19:51       ` Junio C Hamano
@ 2010-04-08  1:47         ` Tay Ray Chuan
  0 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2010-04-08  1:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

Hi,

On Thu, Apr 8, 2010 at 3:51 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Tay Ray Chuan <rctay89@gmail.com> writes:
>
>> (I'm reviving this thread to complete the document. What I have right
>> now is available at my github repo; you can see it at
>>
>>   http://github.com/rctay/git/compare/git/next...feature/http-doc#files_bucket
>
> I looked at the above page; it was quite readable.  You seem to have
> picked up Shawn's non-patch responses to reviews quite well.

Thanks.

> By the way, aren't there a better way than visiting:
>
>    http://github.com/rctay/git/commits/feature/http-doc/Documentation/technical/http-protocol.txt

to view just the blob - yeah, but I'm so used to using github's
"Compare view", it's the first thing I do.

> and then repeat (click each commit, go back)
>
> to get a moral equivalent of "git log -p feature/http-doc -- $that_path"?

The Compare view let's you select a range of revisions, so it's not equivalent.

-- 
Cheers,
Ray Chuan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 00/14] document edits to original http protocol documentation
  2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
                     ` (7 preceding siblings ...)
  2010-04-06  4:57   ` Scott Chacon
@ 2013-09-10 17:07   ` Tay Ray Chuan
  2013-09-10 17:07     ` [PATCH 01/14] Document the HTTP transport protocol Tay Ray Chuan
  8 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Scott Chacon, Jeff King, Shawn O . Pearce

This patch series are the changes based on the discussion on Shawn's
original text [1]. Some of them are minor, while some may potentially
change behaviour; see below for a classification of the changes.
Hopefully they can be examined by the git contributors here.

An earlier iteration of this patch series [2], including additional
changes by Nguyen [3], had been merged in 36d8020 (Merge branch
'sp/doc-smart-http', Aug 30). Since that iteration, the changes have
been corrected and consolidated. Effort has also been made to provide
the context for the changes; hopefully it helps with the review.

[1] http://mid.gmane.org/<1255065768-10428-2-git-send-email-spearce@spearce.org>
[2] https://github.com/rctay/git/blob/rc/http-doc/v1/p/Documentation/technical/http-protocol.txt
[3] http://mid.gmane.org/<1377092713-25434-1-git-send-email-pclouds@gmail.com>

(For convenience, a diff against 36d8020 is included at the end of this
message; it is in word-diff form, hopefully for better clarity of the
changes.)

Given that an earlier iteration had already been merged, perhaps that
could be replaced with merge -Xtheirs (just throwing ideas, my git-fu is
not that strong). This would make the changes on the original RFC
available eg. via git-blame, which may be helpful for implementations
made based on the original RFC, especially since these "early"
implementations may now be in violation of the recently-included copy of
the spec.

The patches have been grouped based on their "safeness" (with regard to
potentially changing the protocol spec), with a bias towards caution, as
follows:

Trivial changes (eg formatting, style):
  [PATCH 01/14] Document the HTTP transport protocol
  [PATCH 02/14] normalize indentation with protcol-common.txt
  [PATCH 03/14] capitalize key words according to RFC 2119
  [PATCH 04/14] normalize rules with RFC 5234
  [PATCH 05/14] drop rules, etc. common to the pack protocol
  [PATCH 10/14] fix example request/responses
  [PATCH 13/14] shift dumb server response details
  
Rewords based on discussions that have been settled, or seem safe:
  [PATCH 07/14] weaken specification over cookies for authentication
  [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients
  [PATCH 11/14] be clearer in place of 'remote repository' phrase
  
Potentially behaviour-changes, may need of discussion:
  [PATCH 06/14] reword behaviour on missing repository or objects
  [PATCH 08/14] mention different variations around $GIT_URL
  [PATCH 12/14] reduce confusion over smart server response behaviour
  [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability

Full, ordered listing:
  [PATCH 01/14] Document the HTTP transport protocol
  [PATCH 02/14] normalize indentation with protcol-common.txt
  [PATCH 03/14] capitalize key words according to RFC 2119
  [PATCH 04/14] normalize rules with RFC 5234
  [PATCH 05/14] drop rules, etc. common to the pack protocol
  [PATCH 06/14] reword behaviour on missing repository or objects
  [PATCH 07/14] weaken specification over cookies for authentication
  [PATCH 08/14] mention different variations around $GIT_URL
  [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients
  [PATCH 10/14] fix example request/responses
  [PATCH 11/14] be clearer in place of 'remote repository' phrase
  [PATCH 12/14] reduce confusion over smart server response behaviour
  [PATCH 13/14] shift dumb server response details
  [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability

This patch series is queued at:

  https://github.com/rctay/git/commits/rc/http-doc/v2/q

-- 
1.8.4.rc4.527.g303b16c

output of

  $ git diff -b --word-diff 36d8020 -- Documentation/technical/http-protocol.txt

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index a1173ee..acc68ac 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -11,6 +11,10 @@ protocol URLs to smart URLs.  This permits all users to have the
same published URL, and the peers automatically select the most
efficient transport available to them.

{+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL+}
{+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and+}
{+"OPTIONAL" in this document are to be interpreted as described in+}
{+RFC 2119.+}

URL Format
----------
@@ -33,16 +37,13 @@ An example of a dumb client requesting for a loose object:
  $GIT_URL:     http://example.com:8080/git/repo.git
  URL request:  http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355

An example of a smart request to a catch-all [-gateway:-]{+gateway (notice how the+}
{+'service' parameter is passed with '&', since a '?' was detected in+}
{+$GIT_URL):+}

  $GIT_URL:     http://example.com/daemon.cgi?svc=git&q=
  URL request:  http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack

[-An example of a request to a submodule:-]

[-  $GIT_URL:     http://example.com/git/repo.git/path/submodule.git-]
[-  URL request:  http://example.com/git/repo.git/path/submodule.git/info/refs-]

Clients MUST strip a trailing '/', if present, from the user supplied
$GIT_URL string to prevent empty path tokens ('//') from appearing
in any URL sent to a server.  Compatible clients MUST expand
@@ -103,9 +104,10 @@ Except where noted, all standard HTTP behavior SHOULD be assumed
by both client and server.  This includes (but is not necessarily
limited to):

If there is no repository at $GIT_URL, [-or-]{+the server MUST NOT respond with+}
{+'200 OK' and a valid info/refs response.  Also, if+} the resource pointed
to by a location matching $GIT_URL does not exist, the server MUST NOT
respond with '200 [-OK' response.-]{+OK'.+}  A server SHOULD respond with
'404 Not Found', '410 Gone', or any other suitable HTTP status code
which does not imply the resource exists as requested.

@@ -114,12 +116,12 @@ permitted, the server MUST respond with the '403 Forbidden' HTTP
status code.

Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
Servers SHOULD support chunked encoding for both
request and response bodies.

Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
Clients SHOULD support chunked encoding for both
request and response bodies.

Servers MAY return ETag and/or Last-Modified headers.

@@ -149,40 +151,16 @@ references by making a request for the special info/refs file of
the repository.

Dumb HTTP clients MUST make a GET request to $GIT_URL/info/refs,
without any search/query parameters.  {+E.g.+}

   C: GET $GIT_URL/info/refs HTTP/1.0

   S: 200 OK
   S:
   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	[-refs/heads/maint-]{+refs/heads/maint\n+}
   S: d049f6c27a2244e12041955e262a404c7faba355	[-refs/heads/master-]{+refs/heads/master\n+}
   S: 2cb58b79488a98d2721cea644875a8dd0026b115	[-refs/tags/v1.0-]{+refs/tags/v1.0\n+}
   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	[-refs/tags/v1.0^{}-]

[-The Content-Type of the returned info/refs entity SHOULD be-]
[-"text/plain; charset=utf-8", but MAY be any content type.-]
[-Clients MUST NOT attempt to validate the returned Content-Type.-]
[-Dumb servers MUST NOT return a return type starting with-]
[-"application/x-git-".-]

[-Cache-Control headers MAY be returned to disable caching of the-]
[-returned entity.-]

[-When examining the response clients SHOULD only examine the HTTP-]
[-status code.  Valid responses are '200 OK', or '304 Not Modified'.-]

[-The returned content is a UNIX formatted text file describing-]
[-each ref and its known value.  The file SHOULD be sorted by name-]
[-according to the C locale ordering.  The file SHOULD NOT include-]
[-the default ref named 'HEAD'.-]

[-  info_refs   =  *( ref_record )-]
[-  ref_record  =  any_ref / peeled_ref-]

[-  any_ref     =  obj-id HTAB refname LF-]
[-  peeled_ref  =  obj-id HTAB refname LF-]
[-		 obj-id HTAB refname "^{}" LF-]{+refs/tags/v1.0^{}\n+}

Smart Clients
~~~~~~~~~~~~~
@@ -196,15 +174,20 @@ The request MUST contain exactly one query parameter,
name the client wishes to contact to complete the operation.
The request MUST NOT contain additional query parameters.

{+TODO: "exactly" one query parameter may be too strict; see the catch-all+}
{+gateway $GIT_URL for an example where more than one parameter is passed.+}
{+In fact, the http client implementation in Git can handle similar+}
{+$GIT_URLs, and thus may pass more than parameter to the server.+}

   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0

   dumb server reply:
   S: 200 OK
   S:
   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	[-refs/heads/maint-]{+refs/heads/maint\n+}
   S: d049f6c27a2244e12041955e262a404c7faba355	[-refs/heads/master-]{+refs/heads/master\n+}
   S: 2cb58b79488a98d2721cea644875a8dd0026b115	[-refs/tags/v1.0-]{+refs/tags/v1.0\n+}
   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	[-refs/tags/v1.0^{}-]{+refs/tags/v1.0^{}\n+}

   smart server reply:
   S: 200 OK
@@ -216,13 +199,35 @@ The request MUST NOT contain additional query parameters.
   S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
   S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
   S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
   {+S: 0000+}

Dumb Server Response
^^^^^^^^^^^^^^^^^^^^
Dumb servers MUST respond with the dumb server reply format.

[-See-]{+The Content-Type of+} the [-prior section under dumb clients for-]{+returned info/refs entity SHOULD be+}
{+"text/plain; charset=utf-8", but MAY be any content type.+}
{+Clients MUST NOT attempt to validate the returned Content-Type.+}
{+Dumb servers MUST NOT return+} a [-more detailed-]
[-description-]{+return type starting with+}
{+"application/x-git-".+}

{+Cache-Control headers MAY be returned to disable caching+} of the
[-dumb server response.-]{+returned entity.+}

{+When examining the response clients SHOULD only examine the HTTP+}
{+status code.  Valid responses are '200 OK', or '304 Not Modified'.+}

{+The returned content is a UNIX formatted text file describing+}
{+each ref and its known value.  The file SHOULD be sorted by name+}
{+according to the C locale ordering.  The file SHOULD NOT include+}
{+the default ref named 'HEAD'.+}

{+  info_refs        =  *( ref_record )+}
{+  ref_record       =  any_ref / peeled_ref+}

{+  any_ref          =  obj-id HTAB refname LF+}
{+  peeled_ref       =  obj-id HTAB refname LF+}
{+		      obj-id HTAB refname "^{}" LF+}

Smart Server Response
^^^^^^^^^^^^^^^^^^^^^
@@ -268,23 +273,7 @@ named 'HEAD' as the first ref.  The stream MUST include capability
declarations behind a NUL on the first ref.

  smart_reply      =  PKT-LINE("# service=$servicename" LF)
		      [-ref_list-]
[-		     "0000"-]
[-  ref_list        =  empty_list / non_empty_list-]

[-  empty_list      =  PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)-]

[-  non_empty_list  =  PKT-LINE(obj-id SP name NUL cap_list LF)-]
[-		     *ref_record-]

[-  cap-list        =  capability *(SP capability)-]
[-  capability      =  1*(LC_ALPHA / DIGIT / "-" / "_")-]
[-  LC_ALPHA        =  %x61-7A-]

[-  ref_record      =  any_ref / peeled_ref-]
[-  any_ref         =  PKT-LINE(obj-id SP name LF)-]
[-  peeled_ref      =  PKT-LINE(obj-id SP name LF)-]
[-		     PKT-LINE(obj-id SP name "^{}" LF-]{+advertised-refs+}

Smart Service git-upload-pack
------------------------------
@@ -394,7 +383,7 @@ The computation to select the minimal pack proceeds as follows
     emptied C_PENDING it SHOULD include a "done" command to let
     the server know it won't proceed:

   C: [-0009done-]{+0009done\n+}

  (s) Parse the git-upload-pack request:

@@ -450,7 +439,7 @@ TODO: Document parsing response

Smart Service git-receive-pack
------------------------------
This service [-reads from-]{+modifies+} the repository pointed to by $GIT_URL.

Clients MUST first perform ref discovery with
'$GIT_URL/info/refs?service=git-receive-pack'.
@@ -458,7 +447,7 @@ Clients MUST first perform ref discovery with
   C: POST $GIT_URL/git-receive-pack HTTP/1.0
   C: Content-Type: application/x-git-receive-pack-request
   C:
   C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 [-refs/heads/maint\0 report-status-]{+refs/heads/maint\0report-status+}
   C: 0000
   C: PACK....

@@ -487,9 +476,9 @@ the id obtained through ref discovery as old_id.
  cap_list         =  *(SP capability) SP

  command          =  create / delete / update
  create           =  zero-id SP new_id SP [-name-]{+refname+}
  delete           =  old_id SP zero-id SP [-name-]{+refname+}
  update           =  old_id SP new_id SP [-name-]{+refname+}

TODO: Document this further.

@@ -498,6 +487,9 @@ References
----------

link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
{+link:http://www.ietf.org/rfc/rfc2119.txt[RFC 2119: Key words for use in RFCs to Indicate Requirement Levels]+}
link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
link:technical/pack-protocol.txt
{+link:technical/protocol-common.txt+}
link:technical/protocol-capabilities.txt

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 01/14] Document the HTTP transport protocol
  2013-09-10 17:07   ` [PATCH 00/14] document edits to original http protocol documentation Tay Ray Chuan
@ 2013-09-10 17:07     ` Tay Ray Chuan
  2013-09-10 17:07       ` [PATCH 02/14] normalize indentation with protcol-common.txt Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce

From: "Shawn O. Pearce" <spearce@spearce.org>

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--

This is the original

  <1255065768-10428-2-git-send-email-spearce@spearce.org>

with some minor changes, as follows:
 - fix mis-spelling 'paramterized'
 - fix mis-spelling 'mangement' (spotted by Junio)
 - fix missing ABNF reference for smart replies (spotted by Sverre, Junio)
---
 Documentation/technical/http-protocol.txt | 542 ++++++++++++++++++++++++++++++
 1 file changed, 542 insertions(+)
 create mode 100644 Documentation/technical/http-protocol.txt

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000..0a2a53d
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,542 @@
+HTTP transfer protocols
+=======================
+
+Git supports two HTTP based transfer protocols.  A "dumb" protocol
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol which requires a Git aware CGI
+(or server module).  This document describes both protocols.
+
+As a design feature smart clients can automatically upgrade "dumb"
+protocol URLs to smart URLs.  This permits all users to have the
+same published URL, and the peers automatically select the most
+efficient transport available to them.
+
+
+URL Format
+----------
+
+URLs for Git repositories accessed by HTTP use the standard HTTP
+URL syntax documented by RFC 1738, so they are of the form:
+
+  http://<host>:<port>/<path>
+
+Within this documentation the placeholder $GIT_URL will stand for
+the http:// repository URL entered by the end-user.
+
+Both the "smart" and "dumb" HTTP protocols used by Git operate
+by appending additional path components onto the end of the user
+supplied $GIT_URL string.
+
+Clients MUST strip a trailing '/', if present, from the user supplied
+$GIT_URL string to prevent empty path tokens ('//') from appearing
+in any URL sent to a server.  Compatible clients must expand
+'$GIT_URL/info/refs' as 'foo/info/refs' and not 'foo//info/refs'.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used if authentication is required
+to access a repository, and MAY be configured and enforced by the
+HTTP server software.
+
+Because Git repositories are accessed by standard path components
+server administrators MAY use directory based permissions within
+their HTTP server to control repository access.
+
+Clients SHOULD support Basic authentication as described by RFC 2616.
+Servers SHOULD support Basic authentication by relying upon the
+HTTP server placed in front of the Git server software.
+
+Servers MUST NOT require HTTP cookies for the purposes of
+authentication or access control.
+
+Clients and servers MAY support other common forms of HTTP based
+authentication, such as Digest authentication.
+
+
+SSL
+---
+
+Clients and servers SHOULD support SSL, particularly to protect
+passwords when relying on Basic HTTP authentication.
+
+
+Session State
+-------------
+
+The Git over HTTP protocol (much like HTTP itself) is stateless
+from the perspective of the HTTP server side.  All state must be
+retained and managed by the client process.  This permits simple
+round-robin load-balancing on the server side, without needing to
+worry about state management.
+
+Clients MUST NOT require state management on the server side in
+order to function correctly.
+
+Servers MUST NOT require HTTP cookies in order to function correctly.
+Clients MAY store and forward HTTP cookies during request processing
+as described by RFC 2616 (HTTP/1.1).  Servers SHOULD ignore any
+cookies sent by a client.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string.  The first four bytes
+of the line indicates the total length of the line, in hexadecimal.
+The total length includes the 4 bytes used to denote the length.
+A line SHOULD BE terminated by an LF, which if present MUST be
+included in the total length.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure all
+pkt-line parsing/formatting routines are 8-bit clean.  The maximum
+length of a pkt-line's data is 65532 bytes (65536 - 4).
+
+Examples (as C-style strings):
+
+  pkt-line          actual value
+  ---------------------------------
+  "0006a\n"         "a\n"
+  "0005a"           "a"
+  "000bfoobar\n"    "foobar\n"
+  "0004"            ""
+
+A pkt-line with a length of 0 ("0000") is a special case and MUST
+be treated as a message break or terminator in the payload.
+
+
+General Request Processing
+--------------------------
+
+Except where noted, all standard HTTP behavior SHOULD be assumed
+by both client and server.  This includes (but is not necessarily
+limited to):
+
+If there is no repository at $GIT_URL, the server MUST respond with
+the '404 Not Found' HTTP status code.
+
+If there is a repository at $GIT_URL, but access is not currently
+permitted, the server MUST respond with the '403 Forbidden' HTTP
+status code.
+
+Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
+Servers SHOULD support chunked encoding for both
+request and response bodies.
+
+Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
+Clients SHOULD support chunked encoding for both
+request and response bodies.
+
+Servers MAY return ETag and/or Last-Modified headers.
+
+Clients MAY revalidate cached entities by including If-Modified-Since
+and/or If-None-Match request headers.
+
+Servers MAY return '304 Not Modified' if the relevant headers appear
+in the request and the entity has not changed.  Clients MUST treat
+'304 Not Modified' identical to '200 OK' by reusing the cached entity.
+
+Clients MAY reuse a cached entity without revalidation if the
+Cache-Control and/or Expires header permits caching.  Clients and
+servers MUST follow RFC 2616 for cache controls.
+
+
+Discovering References
+----------------------
+
+All HTTP clients MUST begin either a fetch or a push exchange by
+discovering the references available on the remote repository.
+
+Dumb Clients
+~~~~~~~~~~~~
+
+HTTP clients that only support the "dumb" protocol MUST discover
+references by making a request for the special info/refs file of
+the repository.
+
+Dumb HTTP clients MUST NOT include search/query parameters when
+fetching the info/refs file.  (That is, '?' must not appear in the
+requested URL.)
+
+	C: GET $GIT_URL/info/refs HTTP/1.0
+
+	S: 200 OK
+	S:
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+The Content-Type of the returned info/refs entity SHOULD be
+"text/plain; charset=utf-8", but MAY be any content type.
+Clients MUST NOT attempt to validate the returned Content-Type.
+Dumb servers MUST NOT return a return type starting with
+"application/x-git-".
+
+Cache-Control headers MAY be returned to disable caching of the
+returned entity.
+
+When examining the response clients SHOULD only examine the HTTP
+status code.  Valid responses are '200 OK', or '304 Not Modified'.
+
+The returned content is a UNIX formatted text file describing
+each ref and its known value.  The file SHOULD be sorted by name
+according to the C locale ordering.  The file SHOULD NOT include
+the default ref named 'HEAD'.
+
+	info_refs     = *( ref_record )
+	ref_record    = any_ref | peeled_ref
+
+	any_ref       = id HT name LF
+	peeled_ref    = id HT name LF
+	                id HT name "^{}" LF
+	id            = 40*HEX
+
+	HEX           = "0".."9" | "a".."f"
+	LF            = <US-ASCII LF, linefeed (10)>
+	HT            = <US-ASCII HT, horizontal-tab (9)>
+
+Smart Clients
+~~~~~~~~~~~~~
+
+HTTP clients that support the "smart" protocol (or both the
+"smart" and "dumb" protocols) MUST discover references by making
+a parameterized request for the info/refs file of the repository.
+
+The request MUST contain exactly one query parameter,
+'service=$servicename', where $servicename MUST be the service
+name the client wishes to contact to complete the operation.
+The request MUST NOT contain additional query parameters.
+
+	C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+	dumb server reply:
+	S: 200 OK
+	S:
+	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+	smart server reply:
+	S: 200 OK
+	S: Content-Type: application/x-git-upload-pack-advertisement
+	S: Cache-Control: no-cache
+	S:
+	S: ....# service=git-upload-pack
+	S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
+	S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+	S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
+	S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+
+Dumb Server Response
+^^^^^^^^^^^^^^^^^^^^
+Dumb servers MUST respond with the dumb server reply format.
+
+See the prior section under dumb clients for a more detailed
+description of the dumb server response.
+
+Smart Server Response
+^^^^^^^^^^^^^^^^^^^^^
+Smart servers MUST respond with the smart server reply format.
+
+If the server does not recognize the requested service name, or the
+requested service name has been disabled by the server administrator,
+the server MUST respond with the '403 Forbidden' HTTP status code.
+
+Cache-Control headers SHOULD be used to disable caching of the
+returned entity.
+
+The Content-Type MUST be 'application/x-$servicename-advertisement'.
+Clients SHOULD fall back to the dumb protocol if another content
+type is returned.  When falling back to the dumb protocol clients
+SHOULD NOT make an additional request to $GIT_URL/info/refs, but
+instead SHOULD use the response already in hand.  Clients MUST NOT
+continue if they do not support the dumb protocol.
+
+Clients MUST validate the status code is either '200 OK' or
+'304 Not Modified'.
+
+Clients MUST validate the first five bytes of the response entity
+matches the regex "^[0-9a-f]{4}#".  If this test fails, clients
+MUST NOT continue.
+
+Clients MUST parse the entire response as a sequence of pkt-line
+records.
+
+Clients MUST verify the first pkt-line is "# service=$servicename".
+Servers MUST set $servicename to be the request parameter value.
+Servers SHOULD include an LF at the end of this line.
+Clients MUST ignore an LF at the end of the line.
+
+Servers MUST terminate the response with the magic "0000" end
+pkt-line marker.
+
+The returned response is a pkt-line stream describing each ref and
+its known value.  The stream SHOULD be sorted by name according to
+the C locale ordering.  The stream SHOULD include the default ref
+named 'HEAD' as the first ref.  The stream MUST include capability
+declarations behind a NUL on the first ref.
+
+	smart_reply    = PKT-LINE("# service=$servicename" LF)
+	                 ref_list
+	                 "0000"
+	ref_list       = empty_list | non_empty_list
+
+	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
+
+	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
+	                 *ref_record
+
+	cap_list      = *(SP capability) SP
+	ref_record    = any_ref | peeled_ref
+
+	any_ref       = PKT-LINE(id SP name LF)
+	peeled_ref    = PKT-LINE(id SP name LF)
+	                PKT-LINE(id SP name "^{}" LF
+	id            = 40*HEX
+
+	HEX           = "0".."9" | "a".."f"
+	NL            = <US-ASCII NUL, null (0)>
+	LF            = <US-ASCII LF,  linefeed (10)>
+	SP            = <US-ASCII SP,  horizontal-tab (9)>
+
+
+Smart Service git-upload-pack
+------------------------------
+This service reads from the remote repository.
+
+Clients MUST first perform ref discovery with
+'$GIT_URL/info/refs?service=git-upload-pack'.
+
+	C: POST $GIT_URL/git-upload-pack HTTP/1.0
+	C: Content-Type: application/x-git-upload-pack-request
+	C:
+	C: ....want 0a53e9ddeaddad63ad106860237bbf53411d11a7
+	C: ....have 441b40d833fdfa93eb2908e52742248faf0ee993
+	C: 0000
+
+	S: 200 OK
+	S: Content-Type: application/x-git-upload-pack-result
+	S: Cache-Control: no-cache
+	S:
+	S: ....ACK %s, continue
+	S: ....NAK
+
+Clients MUST NOT reuse or revalidate a cached reponse.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one 'want' command in the request body.
+Clients MUST NOT reference an id in a 'want' command which did not
+appear in the response obtained through ref discovery.
+
+	compute_request   = want_list
+	                    have_list
+	                    request_end
+	request_end       = "0000" | "done"
+
+	want_list         = PKT-LINE(want NUL cap_list LF)
+	                    *(want_pkt)
+	want_pkt          = PKT-LINE(want LF)
+	want              = "want" SP id
+	cap_list          = *(SP capability) SP
+
+	have_list         = *PKT-LINE("have" SP id LF)
+
+	command           = create | delete | update
+	create            = 40*"0" SP new_id SP name
+	delete            = old_id SP 40*"0" SP name
+	update            = old_id SP new_id SP name
+
+TODO: Document this further.
+TODO: Don't use uppercase for variable names below.
+
+Capability include-tag
+~~~~~~~~~~~~~~~~~~~~~~
+
+When packing an object that an annotated tag points at, include the
+tag object too.  Clients can request this if they want to fetch
+tags, but don't know which tags they will need until after they
+receive the branch data.  By enabling include-tag an entire call
+to upload-pack can be avoided.
+
+Capability thin-pack
+~~~~~~~~~~~~~~~~~~~~
+
+When packing a deltified object the base is not included if the base
+is reachable from an object listed in the COMMON set by the client.
+This reduces the bandwidth required to transfer, but it does slightly
+increase processing time for the client to save the pack to disk.
+
+The Negotiation Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The computation to select the minimal pack proceeds as follows
+(c = client, s = server):
+
+ init step:
+ (c) Use ref discovery to obtain the advertised refs.
+ (c) Place any object seen into set ADVERTISED.
+
+ (c) Build an empty set, COMMON, to hold the objects that are later
+     determined to be on both ends.
+ (c) Build a set, WANT, of the objects from ADVERTISED the client
+     wants to fetch, based on what it saw during ref discovery.
+
+ (c) Start a queue, C_PENDING, ordered by commit time (popping newest
+     first).  Add all client refs.  When a commit is popped from
+     the queue its parents should be automatically inserted back.
+     Commits MUST only enter the queue once.
+
+ one compute step:
+ (c) Send one $GIT_URL/git-upload-pack request:
+
+	C: 0032want <WANT #1>...............................
+	C: 0032want <WANT #2>...............................
+	....
+	C: 0032have <COMMON #1>.............................
+	C: 0032have <COMMON #2>.............................
+	....
+	C: 0032have <HAVE #1>...............................
+	C: 0032have <HAVE #2>...............................
+	....
+	C: 0000
+
+     The stream is organized into "commands", with each command
+     appearing by itself in a pkt-line.  Within a command line
+     the text leading up to the first space is the command name,
+     and the remainder of the line to the first LF is the value.
+     Command lines are terminated with an LF as the last byte of
+     the pkt-line value.
+
+     Commands MUST appear in the following order, if they appear
+     at all in the request stream:
+
+       * want
+       * have
+
+     The stream is terminated by a pkt-line flush ("0000").
+
+     A single "want" or "have" command MUST have one hex formatted
+     SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
+     multiple commands.
+
+     The HAVE list is created by popping the first 32 commits
+     from C_PENDING.  Less can be supplied if C_PENDING empties.
+
+     If the client has sent 256 HAVE commits and has not yet
+     received one of those back from S_COMMON, or the client has
+     emptied C_PENDING it should include a "done" command to let
+     the server know it won't proceed:
+
+	C: 0009done
+
+  (s) Parse the git-upload-pack request:
+
+      Verify all objects in WANT are directly reachable from refs.
+
+	  The server MAY walk backwards through history or through
+      the reflog to permit slightly stale requests.
+
+      If no WANT objects are received, send an error:
+
+TODO: Define error if no want lines are requested.
+
+      If any WANT object is not reachable, send an error:
+
+TODO: Define error if an invalid want is requested.
+
+     Create an empty list, S_COMMON.
+
+     If 'have' was sent:
+
+     Loop through the objects in the order supplied by the client.
+     For each object, if the server has the object reachable from
+     a ref, add it to S_COMMON.  If a commit is added to S_COMMON,
+     do not add any ancestors, even if they also appear in HAVE.
+
+  (s) Send the git-upload-pack response:
+
+     If the server has found a closed set of objects to pack or the
+     request ends with "done", it replies with the pack.
+
+TODO: Document the pack based response
+	S: PACK...
+
+     The returned stream is the side-band-64k protocol supported
+     by the git-upload-pack service, and the pack is embedded into
+     stream 1.  Progress messages from the server side may appear
+     in stream 2.
+
+     Here a "closed set of objects" is defined to have at least
+     one path from every WANT to at least one COMMON object.
+
+     If the server needs more information, it replies with a
+     status continue response:
+
+TODO: Document the non-pack response
+
+  (c) Parse the upload-pack response:
+
+TODO: Document parsing response
+
+      Do another compute step.
+
+
+Smart Service git-receive-pack
+------------------------------
+This service modifies the remote repository.
+
+Clients MUST first perform ref discovery with
+'$GIT_URL/info/refs?service=git-receive-pack'.
+
+	C: POST $GIT_URL/git-receive-pack HTTP/1.0
+	C: Content-Type: application/x-git-receive-pack-request
+	C:
+	C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+	C: 0000
+	C: PACK....
+
+	S: 200 OK
+	S: Content-Type: application/x-git-receive-pack-result
+	S: Cache-Control: no-cache
+	S:
+	S: ....
+
+Clients MUST NOT reuse or revalidate a cached reponse.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one command in the request body.
+Within the command portion of the request body clients SHOULD send
+the id obtained through ref discovery as old_id.
+
+	update_request    = command_list
+	                    "PACK" <binary data>
+
+	command_list      = PKT-LINE(command NUL cap_list LF)
+	                    *(command_pkt)
+	command_pkt       = PKT-LINE(command LF)
+	cap_list          = *(SP capability) SP
+
+	command           = create | delete | update
+	create            = 40*"0" SP new_id SP name
+	delete            = old_id SP 40*"0" SP name
+	update            = old_id SP new_id SP name
+
+TODO: Document this further.
+
+
+References
+----------
+
+link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
+link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
+
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/14] normalize indentation with protcol-common.txt
  2013-09-10 17:07     ` [PATCH 01/14] Document the HTTP transport protocol Tay Ray Chuan
@ 2013-09-10 17:07       ` Tay Ray Chuan
  2013-09-10 17:07         ` [PATCH 03/14] capitalize key words according to RFC 2119 Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Indent client/server query examples with 3 spaces.

Indent ABNF rules with 2 spaces.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--

This is in its own patch to minimize noise in diffs.
---
 Documentation/technical/http-protocol.txt | 226 +++++++++++++++---------------
 1 file changed, 113 insertions(+), 113 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 0a2a53d..70a1648 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -161,14 +161,14 @@ Dumb HTTP clients MUST NOT include search/query parameters when
 fetching the info/refs file.  (That is, '?' must not appear in the
 requested URL.)
 
-	C: GET $GIT_URL/info/refs HTTP/1.0
+   C: GET $GIT_URL/info/refs HTTP/1.0
 
-	S: 200 OK
-	S:
-	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
-	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
-	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
-	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+   S: 200 OK
+   S:
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
 
 The Content-Type of the returned info/refs entity SHOULD be
 "text/plain; charset=utf-8", but MAY be any content type.
@@ -187,17 +187,17 @@ each ref and its known value.  The file SHOULD be sorted by name
 according to the C locale ordering.  The file SHOULD NOT include
 the default ref named 'HEAD'.
 
-	info_refs     = *( ref_record )
-	ref_record    = any_ref | peeled_ref
+  info_refs        =  *( ref_record )
+  ref_record       =  any_ref | peeled_ref
 
-	any_ref       = id HT name LF
-	peeled_ref    = id HT name LF
-	                id HT name "^{}" LF
-	id            = 40*HEX
+  any_ref          =  id HT name LF
+  peeled_ref       =  id HT name LF
+		      id HT name "^{}" LF
+  id               =  40*HEX
 
-	HEX           = "0".."9" | "a".."f"
-	LF            = <US-ASCII LF, linefeed (10)>
-	HT            = <US-ASCII HT, horizontal-tab (9)>
+  HEX              =  "0".."9" | "a".."f"
+  LF               =  <US-ASCII LF, linefeed (10)>
+  HT               =  <US-ASCII HT, horizontal-tab (9)>
 
 Smart Clients
 ~~~~~~~~~~~~~
@@ -211,26 +211,26 @@ The request MUST contain exactly one query parameter,
 name the client wishes to contact to complete the operation.
 The request MUST NOT contain additional query parameters.
 
-	C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
-
-	dumb server reply:
-	S: 200 OK
-	S:
-	S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
-	S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
-	S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
-	S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
-
-	smart server reply:
-	S: 200 OK
-	S: Content-Type: application/x-git-upload-pack-advertisement
-	S: Cache-Control: no-cache
-	S:
-	S: ....# service=git-upload-pack
-	S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
-	S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
-	S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
-	S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+   dumb server reply:
+   S: 200 OK
+   S:
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+   smart server reply:
+   S: 200 OK
+   S: Content-Type: application/x-git-upload-pack-advertisement
+   S: Cache-Control: no-cache
+   S:
+   S: ....# service=git-upload-pack
+   S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
+   S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+   S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
+   S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
 
 Dumb Server Response
 ^^^^^^^^^^^^^^^^^^^^
@@ -281,28 +281,28 @@ the C locale ordering.  The stream SHOULD include the default ref
 named 'HEAD' as the first ref.  The stream MUST include capability
 declarations behind a NUL on the first ref.
 
-	smart_reply    = PKT-LINE("# service=$servicename" LF)
-	                 ref_list
-	                 "0000"
-	ref_list       = empty_list | non_empty_list
+  smart_reply      =  PKT-LINE("# service=$servicename" LF)
+		      ref_list
+		      "0000"
+  ref_list         =  empty_list | non_empty_list
 
-	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
+  empty_list       =  PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
 
-	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
-	                 *ref_record
+  non_empty_list   =  PKT-LINE(id SP name NUL cap_list LF)
+		      *ref_record
 
-	cap_list      = *(SP capability) SP
-	ref_record    = any_ref | peeled_ref
+  cap_list         =  *(SP capability) SP
+  ref_record       =  any_ref | peeled_ref
 
-	any_ref       = PKT-LINE(id SP name LF)
-	peeled_ref    = PKT-LINE(id SP name LF)
-	                PKT-LINE(id SP name "^{}" LF
-	id            = 40*HEX
+  any_ref          =  PKT-LINE(id SP name LF)
+  peeled_ref       =  PKT-LINE(id SP name LF)
+		      PKT-LINE(id SP name "^{}" LF
+  id               =  40*HEX
 
-	HEX           = "0".."9" | "a".."f"
-	NL            = <US-ASCII NUL, null (0)>
-	LF            = <US-ASCII LF,  linefeed (10)>
-	SP            = <US-ASCII SP,  horizontal-tab (9)>
+  HEX              =  "0".."9" | "a".."f"
+  NL               =  <US-ASCII NUL, null (0)>
+  LF               =  <US-ASCII LF,  linefeed (10)>
+  SP               =  <US-ASCII SP,  horizontal-tab (9)>
 
 
 Smart Service git-upload-pack
@@ -312,19 +312,19 @@ This service reads from the remote repository.
 Clients MUST first perform ref discovery with
 '$GIT_URL/info/refs?service=git-upload-pack'.
 
-	C: POST $GIT_URL/git-upload-pack HTTP/1.0
-	C: Content-Type: application/x-git-upload-pack-request
-	C:
-	C: ....want 0a53e9ddeaddad63ad106860237bbf53411d11a7
-	C: ....have 441b40d833fdfa93eb2908e52742248faf0ee993
-	C: 0000
+   C: POST $GIT_URL/git-upload-pack HTTP/1.0
+   C: Content-Type: application/x-git-upload-pack-request
+   C:
+   C: ....want 0a53e9ddeaddad63ad106860237bbf53411d11a7
+   C: ....have 441b40d833fdfa93eb2908e52742248faf0ee993
+   C: 0000
 
-	S: 200 OK
-	S: Content-Type: application/x-git-upload-pack-result
-	S: Cache-Control: no-cache
-	S:
-	S: ....ACK %s, continue
-	S: ....NAK
+   S: 200 OK
+   S: Content-Type: application/x-git-upload-pack-result
+   S: Cache-Control: no-cache
+   S:
+   S: ....ACK %s, continue
+   S: ....NAK
 
 Clients MUST NOT reuse or revalidate a cached reponse.
 Servers MUST include sufficient Cache-Control headers
@@ -336,23 +336,23 @@ Clients MUST send at least one 'want' command in the request body.
 Clients MUST NOT reference an id in a 'want' command which did not
 appear in the response obtained through ref discovery.
 
-	compute_request   = want_list
-	                    have_list
-	                    request_end
-	request_end       = "0000" | "done"
+  compute_request  =  want_list
+		      have_list
+		      request_end
+  request_end      =  "0000" | "done"
 
-	want_list         = PKT-LINE(want NUL cap_list LF)
-	                    *(want_pkt)
-	want_pkt          = PKT-LINE(want LF)
-	want              = "want" SP id
-	cap_list          = *(SP capability) SP
+  want_list        =  PKT-LINE(want NUL cap_list LF)
+		      *(want_pkt)
+  want_pkt         =  PKT-LINE(want LF)
+  want             =  "want" SP id
+  cap_list         =  *(SP capability) SP
 
-	have_list         = *PKT-LINE("have" SP id LF)
+  have_list        =  *PKT-LINE("have" SP id LF)
 
-	command           = create | delete | update
-	create            = 40*"0" SP new_id SP name
-	delete            = old_id SP 40*"0" SP name
-	update            = old_id SP new_id SP name
+  command          =  create | delete | update
+  create           =  40*"0" SP new_id SP name
+  delete           =  old_id SP 40*"0" SP name
+  update           =  old_id SP new_id SP name
 
 TODO: Document this further.
 TODO: Don't use uppercase for variable names below.
@@ -396,16 +396,16 @@ The computation to select the minimal pack proceeds as follows
  one compute step:
  (c) Send one $GIT_URL/git-upload-pack request:
 
-	C: 0032want <WANT #1>...............................
-	C: 0032want <WANT #2>...............................
-	....
-	C: 0032have <COMMON #1>.............................
-	C: 0032have <COMMON #2>.............................
-	....
-	C: 0032have <HAVE #1>...............................
-	C: 0032have <HAVE #2>...............................
-	....
-	C: 0000
+   C: 0032want <WANT #1>...............................
+   C: 0032want <WANT #2>...............................
+   ....
+   C: 0032have <COMMON #1>.............................
+   C: 0032have <COMMON #2>.............................
+   ....
+   C: 0032have <HAVE #1>...............................
+   C: 0032have <HAVE #2>...............................
+   ....
+   C: 0000
 
      The stream is organized into "commands", with each command
      appearing by itself in a pkt-line.  Within a command line
@@ -434,13 +434,13 @@ The computation to select the minimal pack proceeds as follows
      emptied C_PENDING it should include a "done" command to let
      the server know it won't proceed:
 
-	C: 0009done
+   C: 0009done
 
   (s) Parse the git-upload-pack request:
 
       Verify all objects in WANT are directly reachable from refs.
 
-	  The server MAY walk backwards through history or through
+      The server MAY walk backwards through history or through
       the reflog to permit slightly stale requests.
 
       If no WANT objects are received, send an error:
@@ -466,7 +466,7 @@ TODO: Define error if an invalid want is requested.
      request ends with "done", it replies with the pack.
 
 TODO: Document the pack based response
-	S: PACK...
+   S: PACK...
 
      The returned stream is the side-band-64k protocol supported
      by the git-upload-pack service, and the pack is embedded into
@@ -495,18 +495,18 @@ This service modifies the remote repository.
 Clients MUST first perform ref discovery with
 '$GIT_URL/info/refs?service=git-receive-pack'.
 
-	C: POST $GIT_URL/git-receive-pack HTTP/1.0
-	C: Content-Type: application/x-git-receive-pack-request
-	C:
-	C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
-	C: 0000
-	C: PACK....
+   C: POST $GIT_URL/git-receive-pack HTTP/1.0
+   C: Content-Type: application/x-git-receive-pack-request
+   C:
+   C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+   C: 0000
+   C: PACK....
 
-	S: 200 OK
-	S: Content-Type: application/x-git-receive-pack-result
-	S: Cache-Control: no-cache
-	S:
-	S: ....
+   S: 200 OK
+   S: Content-Type: application/x-git-receive-pack-result
+   S: Cache-Control: no-cache
+   S:
+   S: ....
 
 Clients MUST NOT reuse or revalidate a cached reponse.
 Servers MUST include sufficient Cache-Control headers
@@ -518,18 +518,18 @@ Clients MUST send at least one command in the request body.
 Within the command portion of the request body clients SHOULD send
 the id obtained through ref discovery as old_id.
 
-	update_request    = command_list
-	                    "PACK" <binary data>
+  update_request   =  command_list
+		      "PACK" <binary data>
 
-	command_list      = PKT-LINE(command NUL cap_list LF)
-	                    *(command_pkt)
-	command_pkt       = PKT-LINE(command LF)
-	cap_list          = *(SP capability) SP
+  command_list     =  PKT-LINE(command NUL cap_list LF)
+		      *(command_pkt)
+  command_pkt      =  PKT-LINE(command LF)
+  cap_list         =  *(SP capability) SP
 
-	command           = create | delete | update
-	create            = 40*"0" SP new_id SP name
-	delete            = old_id SP 40*"0" SP name
-	update            = old_id SP new_id SP name
+  command          =  create | delete | update
+  create           =  40*"0" SP new_id SP name
+  delete           =  old_id SP 40*"0" SP name
+  update           =  old_id SP new_id SP name
 
 TODO: Document this further.
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/14] capitalize key words according to RFC 2119
  2013-09-10 17:07       ` [PATCH 02/14] normalize indentation with protcol-common.txt Tay Ray Chuan
@ 2013-09-10 17:07         ` Tay Ray Chuan
  2013-09-10 17:07           ` [PATCH 04/14] normalize rules with RFC 5234 Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 70a1648..55753bb 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -11,6 +11,10 @@ protocol URLs to smart URLs.  This permits all users to have the
 same published URL, and the peers automatically select the most
 efficient transport available to them.
 
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+RFC 2119.
 
 URL Format
 ----------
@@ -29,7 +33,7 @@ supplied $GIT_URL string.
 
 Clients MUST strip a trailing '/', if present, from the user supplied
 $GIT_URL string to prevent empty path tokens ('//') from appearing
-in any URL sent to a server.  Compatible clients must expand
+in any URL sent to a server.  Compatible clients MUST expand
 '$GIT_URL/info/refs' as 'foo/info/refs' and not 'foo//info/refs'.
 
 
@@ -66,7 +70,7 @@ Session State
 -------------
 
 The Git over HTTP protocol (much like HTTP itself) is stateless
-from the perspective of the HTTP server side.  All state must be
+from the perspective of the HTTP server side.  All state MUST be
 retained and managed by the client process.  This permits simple
 round-robin load-balancing on the server side, without needing to
 worry about state management.
@@ -158,7 +162,7 @@ references by making a request for the special info/refs file of
 the repository.
 
 Dumb HTTP clients MUST NOT include search/query parameters when
-fetching the info/refs file.  (That is, '?' must not appear in the
+fetching the info/refs file.  (That is, '?' MUST NOT appear in the
 requested URL.)
 
    C: GET $GIT_URL/info/refs HTTP/1.0
@@ -390,7 +394,7 @@ The computation to select the minimal pack proceeds as follows
 
  (c) Start a queue, C_PENDING, ordered by commit time (popping newest
      first).  Add all client refs.  When a commit is popped from
-     the queue its parents should be automatically inserted back.
+     the queue its parents SHOULD be automatically inserted back.
      Commits MUST only enter the queue once.
 
  one compute step:
@@ -431,7 +435,7 @@ The computation to select the minimal pack proceeds as follows
 
      If the client has sent 256 HAVE commits and has not yet
      received one of those back from S_COMMON, or the client has
-     emptied C_PENDING it should include a "done" command to let
+     emptied C_PENDING it SHOULD include a "done" command to let
      the server know it won't proceed:
 
    C: 0009done
@@ -470,7 +474,7 @@ TODO: Document the pack based response
 
      The returned stream is the side-band-64k protocol supported
      by the git-upload-pack service, and the pack is embedded into
-     stream 1.  Progress messages from the server side may appear
+     stream 1.  Progress messages from the server side MAY appear
      in stream 2.
 
      Here a "closed set of objects" is defined to have at least
@@ -538,5 +542,6 @@ References
 ----------
 
 link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
+link:http://www.ietf.org/rfc/rfc2119.txt[RFC 2119: Key words for use in RFCs to Indicate Requirement Levels]
 link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/14] normalize rules with RFC 5234
  2013-09-10 17:07         ` [PATCH 03/14] capitalize key words according to RFC 2119 Tay Ray Chuan
@ 2013-09-10 17:07           ` Tay Ray Chuan
  2013-09-10 17:07             ` [PATCH 05/14] drop rules, etc. common to the pack protocol Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Drop LF, SP which are defined in RFC 5234.

Replace HT with HTAB (also defined in the RFC).

Use '/' instead of '|', as the RFC does.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 55753bb..ff91bb0 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -192,16 +192,13 @@ according to the C locale ordering.  The file SHOULD NOT include
 the default ref named 'HEAD'.
 
   info_refs        =  *( ref_record )
-  ref_record       =  any_ref | peeled_ref
+  ref_record       =  any_ref / peeled_ref
 
-  any_ref          =  id HT name LF
-  peeled_ref       =  id HT name LF
-		      id HT name "^{}" LF
+  any_ref          =  id HTAB name LF
+  peeled_ref       =  id HTAB name LF
+		      id HTAB name "^{}" LF
   id               =  40*HEX
 
-  HEX              =  "0".."9" | "a".."f"
-  LF               =  <US-ASCII LF, linefeed (10)>
-  HT               =  <US-ASCII HT, horizontal-tab (9)>
 
 Smart Clients
 ~~~~~~~~~~~~~
@@ -288,7 +285,7 @@ declarations behind a NUL on the first ref.
   smart_reply      =  PKT-LINE("# service=$servicename" LF)
 		      ref_list
 		      "0000"
-  ref_list         =  empty_list | non_empty_list
+  ref_list         =  empty_list / non_empty_list
 
   empty_list       =  PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
 
@@ -296,18 +293,13 @@ declarations behind a NUL on the first ref.
 		      *ref_record
 
   cap_list         =  *(SP capability) SP
-  ref_record       =  any_ref | peeled_ref
+  ref_record       =  any_ref / peeled_ref
 
   any_ref          =  PKT-LINE(id SP name LF)
   peeled_ref       =  PKT-LINE(id SP name LF)
 		      PKT-LINE(id SP name "^{}" LF
   id               =  40*HEX
 
-  HEX              =  "0".."9" | "a".."f"
-  NL               =  <US-ASCII NUL, null (0)>
-  LF               =  <US-ASCII LF,  linefeed (10)>
-  SP               =  <US-ASCII SP,  horizontal-tab (9)>
-
 
 Smart Service git-upload-pack
 ------------------------------
@@ -343,7 +335,7 @@ appear in the response obtained through ref discovery.
   compute_request  =  want_list
 		      have_list
 		      request_end
-  request_end      =  "0000" | "done"
+  request_end      =  "0000" / "done"
 
   want_list        =  PKT-LINE(want NUL cap_list LF)
 		      *(want_pkt)
@@ -353,7 +345,7 @@ appear in the response obtained through ref discovery.
 
   have_list        =  *PKT-LINE("have" SP id LF)
 
-  command          =  create | delete | update
+  command          =  create / delete / update
   create           =  40*"0" SP new_id SP name
   delete           =  old_id SP 40*"0" SP name
   update           =  old_id SP new_id SP name
@@ -530,7 +522,7 @@ the id obtained through ref discovery as old_id.
   command_pkt      =  PKT-LINE(command LF)
   cap_list         =  *(SP capability) SP
 
-  command          =  create | delete | update
+  command          =  create / delete / update
   create           =  40*"0" SP new_id SP name
   delete           =  old_id SP 40*"0" SP name
   update           =  old_id SP new_id SP name
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/14] drop rules, etc. common to the pack protocol
  2013-09-10 17:07           ` [PATCH 04/14] normalize rules with RFC 5234 Tay Ray Chuan
@ 2013-09-10 17:07             ` Tay Ray Chuan
  2013-09-10 17:07               ` [PATCH 06/14] reword behaviour on missing repository or objects Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Use obj-id in lieu of id (defined as 40*HEX).

Use zero-id in lieu of 40*"0".

Use refname in lieu of name (not defined).

Drop section on capabilities, since they are already available in
protocol-capabilities.txt.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--

pkt-line format section was dropped in response to Junio's comments:

  From:   Junio C Hamano <gitster@pobox.com>
  Message-ID: <7vskdss3ei.fsf@alter.siamese.dyndns.org>

  > +pkt-line Format
  > +---------------
  > ...
  > +Examples (as C-style strings):
  > +
  > +  pkt-line          actual value
  > +  ---------------------------------
  > +  "0006a\n"         "a\n"
  > +  "0005a"           "a"
  > +  "000bfoobar\n"    "foobar\n"
  > +  "0004"            ""
  > +
  > +A pkt-line with a length of 0 ("0000") is a special case and MUST
  > +be treated as a message break or terminator in the payload.

  Isn't this "MUST be" wrong?

  It is not an advice to the implementors, but the protocol specification
  itself defines what the flush packet means.  IOW, "The author of this
  specification, Shawn, MUST treat a flush packet as a message break or
  terminator in the payload, when designing this protocol."

Capabilities and 'command' ABNF rules under git-upload-pack were
dropped by Nguyễn:

  Message-ID: <1377092713-25434-1-git-send-email-pclouds@gmail.com>
---
 Documentation/technical/http-protocol.txt | 85 ++++---------------------------
 1 file changed, 10 insertions(+), 75 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index ff91bb0..a8d28ba 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -84,34 +84,6 @@ as described by RFC 2616 (HTTP/1.1).  Servers SHOULD ignore any
 cookies sent by a client.
 
 
-pkt-line Format
----------------
-
-Much (but not all) of the payload is described around pkt-lines.
-
-A pkt-line is a variable length binary string.  The first four bytes
-of the line indicates the total length of the line, in hexadecimal.
-The total length includes the 4 bytes used to denote the length.
-A line SHOULD BE terminated by an LF, which if present MUST be
-included in the total length.
-
-A pkt-line MAY contain binary data, so implementors MUST ensure all
-pkt-line parsing/formatting routines are 8-bit clean.  The maximum
-length of a pkt-line's data is 65532 bytes (65536 - 4).
-
-Examples (as C-style strings):
-
-  pkt-line          actual value
-  ---------------------------------
-  "0006a\n"         "a\n"
-  "0005a"           "a"
-  "000bfoobar\n"    "foobar\n"
-  "0004"            ""
-
-A pkt-line with a length of 0 ("0000") is a special case and MUST
-be treated as a message break or terminator in the payload.
-
-
 General Request Processing
 --------------------------
 
@@ -194,11 +166,9 @@ the default ref named 'HEAD'.
   info_refs        =  *( ref_record )
   ref_record       =  any_ref / peeled_ref
 
-  any_ref          =  id HTAB name LF
-  peeled_ref       =  id HTAB name LF
-		      id HTAB name "^{}" LF
-  id               =  40*HEX
-
+  any_ref          =  obj-id HTAB refname LF
+  peeled_ref       =  obj-id HTAB refname LF
+		      obj-id HTAB refname "^{}" LF
 
 Smart Clients
 ~~~~~~~~~~~~~
@@ -283,23 +253,7 @@ named 'HEAD' as the first ref.  The stream MUST include capability
 declarations behind a NUL on the first ref.
 
   smart_reply      =  PKT-LINE("# service=$servicename" LF)
-		      ref_list
-		      "0000"
-  ref_list         =  empty_list / non_empty_list
-
-  empty_list       =  PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
-
-  non_empty_list   =  PKT-LINE(id SP name NUL cap_list LF)
-		      *ref_record
-
-  cap_list         =  *(SP capability) SP
-  ref_record       =  any_ref / peeled_ref
-
-  any_ref          =  PKT-LINE(id SP name LF)
-  peeled_ref       =  PKT-LINE(id SP name LF)
-		      PKT-LINE(id SP name "^{}" LF
-  id               =  40*HEX
-
+		      advertised-refs
 
 Smart Service git-upload-pack
 ------------------------------
@@ -345,31 +299,9 @@ appear in the response obtained through ref discovery.
 
   have_list        =  *PKT-LINE("have" SP id LF)
 
-  command          =  create / delete / update
-  create           =  40*"0" SP new_id SP name
-  delete           =  old_id SP 40*"0" SP name
-  update           =  old_id SP new_id SP name
-
 TODO: Document this further.
 TODO: Don't use uppercase for variable names below.
 
-Capability include-tag
-~~~~~~~~~~~~~~~~~~~~~~
-
-When packing an object that an annotated tag points at, include the
-tag object too.  Clients can request this if they want to fetch
-tags, but don't know which tags they will need until after they
-receive the branch data.  By enabling include-tag an entire call
-to upload-pack can be avoided.
-
-Capability thin-pack
-~~~~~~~~~~~~~~~~~~~~
-
-When packing a deltified object the base is not included if the base
-is reachable from an object listed in the COMMON set by the client.
-This reduces the bandwidth required to transfer, but it does slightly
-increase processing time for the client to save the pack to disk.
-
 The Negotiation Algorithm
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 The computation to select the minimal pack proceeds as follows
@@ -523,9 +455,9 @@ the id obtained through ref discovery as old_id.
   cap_list         =  *(SP capability) SP
 
   command          =  create / delete / update
-  create           =  40*"0" SP new_id SP name
-  delete           =  old_id SP 40*"0" SP name
-  update           =  old_id SP new_id SP name
+  create           =  zero-id SP new_id SP refname
+  delete           =  old_id SP zero-id SP refname
+  update           =  old_id SP new_id SP refname
 
 TODO: Document this further.
 
@@ -536,4 +468,7 @@ References
 link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
 link:http://www.ietf.org/rfc/rfc2119.txt[RFC 2119: Key words for use in RFCs to Indicate Requirement Levels]
 link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
+link:technical/pack-protocol.txt
+link:technical/protocol-common.txt
+link:technical/protocol-capabilities.txt
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/14] reword behaviour on missing repository or objects
  2013-09-10 17:07             ` [PATCH 05/14] drop rules, etc. common to the pack protocol Tay Ray Chuan
@ 2013-09-10 17:07               ` Tay Ray Chuan
  2013-09-10 17:07                 ` [PATCH 07/14] weaken specification over cookies for authentication Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Shawn O. Pearce, Antti-Juhani Kaijanaho,
	H. Peter Anvin, Mike Hommey

From: "Shawn O. Pearce" <spearce@spearce.org>

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--
To Shawn: sign-off-by needed.

Based on:

  From: "Shawn O. Pearce" <spearce@spearce.org>
  Message-ID: <20091016142135.GR10505@spearce.org>

  Mike Hommey <mh@glandium.org> wrote:
  > On Thu, Oct 15, 2009 at 10:59:25PM -0700, H. Peter Anvin wrote:
  > > On 10/10/2009 03:12 AM, Antti-Juhani Kaijanaho wrote:
  > > > On 2009-10-09, Junio C Hamano <gitster@pobox.com> wrote:
  > > >>> +If there is no repository at $GIT_URL, the server MUST respond with
  > > >>> +the '404 Not Found' HTTP status code.
  > > >>
  > > >> We may also want to add
  > > >>
  > > >>     If there is no object at $GIT_URL/some/path, the server MUST respond
  > > >>     with the '404 Not Found' HTTP status code.
  > > >>
  > > >> to help dumb clients.
  > > >
  > > > In both cases - is it really necessary to forbid the use of 410 (Gone)?

  My original text got taken a bit out of context here.  I guess MUST
  was too strong of a word.  I more ment something like:

    If there is no repository at $GIT_URL, the server MUST NOT respond
    with '200 OK' and a valid info/refs response.  A server SHOULD
    respond with '404 Not Found', '410 Gone', or any other suitable
    HTTP status code which does not imply the resource exists as
    requested.

In addition, address behaviour on missing objects, as suggested by
Junio. His text (see quote in above excerpt) was not used, in favour of
a more general treatment (locations matching $GIT_URL, not just
objects).
---
 Documentation/technical/http-protocol.txt | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index a8d28ba..412b898 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -91,8 +91,12 @@ Except where noted, all standard HTTP behavior SHOULD be assumed
 by both client and server.  This includes (but is not necessarily
 limited to):
 
-If there is no repository at $GIT_URL, the server MUST respond with
-the '404 Not Found' HTTP status code.
+If there is no repository at $GIT_URL, the server MUST NOT respond with
+'200 OK' and a valid info/refs response.  Also, if the resource pointed
+to by a location matching $GIT_URL does not exist, the server MUST NOT
+respond with '200 OK'.  A server SHOULD respond with
+'404 Not Found', '410 Gone', or any other suitable HTTP status code
+which does not imply the resource exists as requested.
 
 If there is a repository at $GIT_URL, but access is not currently
 permitted, the server MUST respond with the '403 Forbidden' HTTP
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/14] weaken specification over cookies for authentication
  2013-09-10 17:07               ` [PATCH 06/14] reword behaviour on missing repository or objects Tay Ray Chuan
@ 2013-09-10 17:07                 ` Tay Ray Chuan
  2013-09-10 17:07                   ` [PATCH 08/14] mention different variations around $GIT_URL Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Shawn O. Pearce, Jeff King

From: "Shawn O. Pearce" <spearce@spearce.org>

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--
To Shawn: sign-off-by needed.

Based on the discussion in
  <20091009195035.GA15153@coredump.intra.peff.net>,
  <20091015165228.GO10505@spearce.org> (patch),
  <20091015173902.GA22262@sigill.intra.peff.net> (agreement)

  From: "Shawn O. Pearce" <spearce@spearce.org>
  Message-ID: <20091015165228.GO10505@spearce.org>

  I weakend the sections on cookies:

  + Authentication
  + --------------
  ....
  + Servers SHOULD NOT require HTTP cookies for the purposes of
  + authentication or access control.

  and that's all we say on the matter.  I took out the Servers MUST
  NOT line under session state.
---
 Documentation/technical/http-protocol.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 412b898..2382384 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -52,7 +52,7 @@ Clients SHOULD support Basic authentication as described by RFC 2616.
 Servers SHOULD support Basic authentication by relying upon the
 HTTP server placed in front of the Git server software.
 
-Servers MUST NOT require HTTP cookies for the purposes of
+Servers SHOULD NOT require HTTP cookies for the purposes of
 authentication or access control.
 
 Clients and servers MAY support other common forms of HTTP based
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/14] mention different variations around $GIT_URL
  2013-09-10 17:07                 ` [PATCH 07/14] weaken specification over cookies for authentication Tay Ray Chuan
@ 2013-09-10 17:07                   ` Tay Ray Chuan
  2013-09-10 17:07                     ` [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Alex Blewitt, Shawn O. Pearce

Based on

  From:	Alex Blewitt <Alex.Blewitt@gmail.com>
  Message-ID: <loom.20091009T104530-586@post.gmane.org>

  Shawn O. Pearce <spearce <at> spearce.org> writes:

  > +URL Format
  > +----------
  > +
  > +URLs for Git repositories accessed by HTTP use the standard HTTP
  > +URL syntax documented by RFC 1738, so they are of the form:
  > +
  > +  http://<host>:<port>/<path>
  > +
  > +Within this documentation the placeholder $GIT_URL will stand for
  > +the http:// repository URL entered by the end-user.

  It's worth making clear here that $GIT_URL will be the path to the repository,
  rather than necessarily just the host upon which the server sits. Perhaps
  including an example, like http://example:8080/repos/example.git
  would make it clearer that there can be a path (and so leading to
  a request like http://example:8080/repos/example.git/info/refs?service=...

  It's also worth clarifying, therefore, that multiple repositories can be served
  by the same process (as with the git server today) by using different path(s).
  And for those that are interested in submodules, it's worth confirming that
  http://example/repos/master.git/child.git/info/refs?service= will ensure
  that the repository is the 'child' git rather than anything else.

The submodule example (/master.git/child.git) seems potentially
confusing - it suggests a setup where the server has a route to a git
repo (child.git) with a parent path containing another git repo
(master.git). It is excluded lest we be seen as encouraging such
mind-boggling setups.

While providing an example $GIT_URL containing a '?' (the catch-all
gateway one), also mention a possible contradiction between the
exactly-one-param requirement and the http client implementation in Git.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 2382384..d0955c2 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -22,15 +22,28 @@ URL Format
 URLs for Git repositories accessed by HTTP use the standard HTTP
 URL syntax documented by RFC 1738, so they are of the form:
 
-  http://<host>:<port>/<path>
+  http://<host>:<port>/<path>?<searchpart>
 
 Within this documentation the placeholder $GIT_URL will stand for
 the http:// repository URL entered by the end-user.
 
-Both the "smart" and "dumb" HTTP protocols used by Git operate
+Servers SHOULD handle all requests to locations matching $GIT_URL, as
+both the "smart" and "dumb" HTTP protocols used by Git operate
 by appending additional path components onto the end of the user
 supplied $GIT_URL string.
 
+An example of a dumb client requesting for a loose object:
+
+  $GIT_URL:     http://example.com:8080/git/repo.git
+  URL request:  http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355
+
+An example of a smart request to a catch-all gateway (notice how the
+'service' parameter is passed with '&', since a '?' was detected in
+$GIT_URL):
+
+  $GIT_URL:     http://example.com/daemon.cgi?svc=git&q=
+  URL request:  http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack
+
 Clients MUST strip a trailing '/', if present, from the user supplied
 $GIT_URL string to prevent empty path tokens ('//') from appearing
 in any URL sent to a server.  Compatible clients MUST expand
@@ -186,6 +199,11 @@ The request MUST contain exactly one query parameter,
 name the client wishes to contact to complete the operation.
 The request MUST NOT contain additional query parameters.
 
+TODO: "exactly" one query parameter may be too strict; see the catch-all
+gateway $GIT_URL for an example where more than one parameter is passed.
+In fact, the http client implementation in Git can handle similar
+$GIT_URLs, and thus may pass more than parameter to the server.
+
    C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
 
    dumb server reply:
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients
  2013-09-10 17:07                   ` [PATCH 08/14] mention different variations around $GIT_URL Tay Ray Chuan
@ 2013-09-10 17:07                     ` Tay Ray Chuan
  2013-09-10 17:07                       ` [PATCH 10/14] fix example request/responses Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

From: Junio C Hamano <gitster@pobox.com>

It is unclear if '?' can be part of $GIT_URL. E.g.

    $ wget http://example.xz/serve.cgi?path=git.git/info/refs
    $ git clone http://example.xz/serve.cgi?path=git.git

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
--

Notes:
 - said "request to" instead of Junio's "request against", for
   consistency with the rest of the document.
 - said "E.g." instead of "I.e." since it's an example request and
   response

Based on:

  From:   Junio C Hamano <gitster@pobox.com>
  Message-ID: <7vskdss3ei.fsf@alter.siamese.dyndns.org>

  > +Dumb Clients
  > +~~~~~~~~~~~~
  > +
  > +HTTP clients that only support the "dumb" protocol MUST discover
  > +references by making a request for the special info/refs file of
  > +the repository.
  > +
  > +Dumb HTTP clients MUST NOT include search/query parameters when
  > +fetching the info/refs file.  (That is, '?' must not appear in the
  > +requested URL.)

  It is unclear if '?' can be part of $GIT_URL. E.g.

      $ wget http://example.xz/serve.cgi?path=git.git/info/refs
      $ git clone http://example.xz/serve.cgi?path=git.git

  It might be clearer to just say

      Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
      without any search/query parameters.  I.e.

          C: GET $GIT_URL/info/refs HTTP/1.0

  to also exclude methods other than GET.
---
 Documentation/technical/http-protocol.txt | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index d0955c2..5141c6a 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -150,9 +150,8 @@ HTTP clients that only support the "dumb" protocol MUST discover
 references by making a request for the special info/refs file of
 the repository.
 
-Dumb HTTP clients MUST NOT include search/query parameters when
-fetching the info/refs file.  (That is, '?' MUST NOT appear in the
-requested URL.)
+Dumb HTTP clients MUST make a GET request to $GIT_URL/info/refs,
+without any search/query parameters.  E.g.
 
    C: GET $GIT_URL/info/refs HTTP/1.0
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/14] fix example request/responses
  2013-09-10 17:07                     ` [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients Tay Ray Chuan
@ 2013-09-10 17:07                       ` Tay Ray Chuan
  2013-09-10 17:07                         ` [PATCH 11/14] be clearer in place of 'remote repository' phrase Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Add LF for responses.

For smart interactions, add pkt-line lengths and the flush-pkt (0000) line.

Drop the SP that followed NUL before capability list.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 35 ++++++++++++++++---------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 5141c6a..dbfff36 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -157,10 +157,10 @@ without any search/query parameters.  E.g.
 
    S: 200 OK
    S:
-   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
-   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
-   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
-   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint\n
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master\n
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0\n
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}\n
 
 The Content-Type of the returned info/refs entity SHOULD be
 "text/plain; charset=utf-8", but MAY be any content type.
@@ -208,21 +208,22 @@ $GIT_URLs, and thus may pass more than parameter to the server.
    dumb server reply:
    S: 200 OK
    S:
-   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
-   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
-   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
-   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint\n
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master\n
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0\n
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}\n
 
    smart server reply:
    S: 200 OK
    S: Content-Type: application/x-git-upload-pack-advertisement
    S: Cache-Control: no-cache
    S:
-   S: ....# service=git-upload-pack
-   S: ....95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0 multi_ack
-   S: ....d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
-   S: ....2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
-   S: ....a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+   S: 001e# service=git-upload-pack\n
+   S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
+   S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
+   S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
+   S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+   S: 0000
 
 Dumb Server Response
 ^^^^^^^^^^^^^^^^^^^^
@@ -286,8 +287,8 @@ Clients MUST first perform ref discovery with
    C: POST $GIT_URL/git-upload-pack HTTP/1.0
    C: Content-Type: application/x-git-upload-pack-request
    C:
-   C: ....want 0a53e9ddeaddad63ad106860237bbf53411d11a7
-   C: ....have 441b40d833fdfa93eb2908e52742248faf0ee993
+   C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
+   C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
    C: 0000
 
    S: 200 OK
@@ -383,7 +384,7 @@ The computation to select the minimal pack proceeds as follows
      emptied C_PENDING it SHOULD include a "done" command to let
      the server know it won't proceed:
 
-   C: 0009done
+   C: 0009done\n
 
   (s) Parse the git-upload-pack request:
 
@@ -447,7 +448,7 @@ Clients MUST first perform ref discovery with
    C: POST $GIT_URL/git-receive-pack HTTP/1.0
    C: Content-Type: application/x-git-receive-pack-request
    C:
-   C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+   C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0report-status
    C: 0000
    C: PACK....
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/14] be clearer in place of 'remote repository' phrase
  2013-09-10 17:07                       ` [PATCH 10/14] fix example request/responses Tay Ray Chuan
@ 2013-09-10 17:07                         ` Tay Ray Chuan
  2013-09-10 17:07                           ` [PATCH 12/14] reduce confusion over smart server response behaviour Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Based on:

  From:   Junio C Hamano <gitster@pobox.com>
  Message-ID: <7vskdss3ei.fsf@alter.siamese.dyndns.org>

  > +Smart Service git-upload-pack
  > +------------------------------
  > +This service reads from the remote repository.

  The wording "remote repository" felt confusing.  I know it is "from the
  repository served by the server", but if it were named without
  "upload-pack", I might have mistaken that you are allowing to proxy a
  request to access a third-party repository by this server.  The same
  comment applies to the git-receive-pack service.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index dbfff36..4bb1614 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -279,7 +279,7 @@ declarations behind a NUL on the first ref.
 
 Smart Service git-upload-pack
 ------------------------------
-This service reads from the remote repository.
+This service reads from the repository pointed to by $GIT_URL.
 
 Clients MUST first perform ref discovery with
 '$GIT_URL/info/refs?service=git-upload-pack'.
@@ -440,7 +440,7 @@ TODO: Document parsing response
 
 Smart Service git-receive-pack
 ------------------------------
-This service modifies the remote repository.
+This service modifies the repository pointed to by $GIT_URL.
 
 Clients MUST first perform ref discovery with
 '$GIT_URL/info/refs?service=git-receive-pack'.
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/14] reduce confusion over smart server response behaviour
  2013-09-10 17:07                         ` [PATCH 11/14] be clearer in place of 'remote repository' phrase Tay Ray Chuan
@ 2013-09-10 17:07                           ` Tay Ray Chuan
  2013-09-10 17:07                             ` [PATCH 13/14] shift dumb server response details Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

The MUST and the following 'If' scenario may seem contradictory at first
glance; swap their order to alleviate this.

Also mention that the response should specifically be for the requested
service, for clarity's sake.

Based on:

  From:   Junio C Hamano <gitster@pobox.com>
  Message-ID: <7vskdss3ei.fsf@alter.siamese.dyndns.org>

  > +Smart Server Response
  > +^^^^^^^^^^^^^^^^^^^^^
  > +
  > +Smart servers MUST respond with the smart server reply format.
  > +If the server does not recognize the requested service name, or the
  > +requested service name has been disabled by the server administrator,
  > +the server MUST respond with the '403 Forbidden' HTTP status code.

  This is a bit confusing.

  If you as a server administrator want to disable the smart upload-pack for
  one repository (but not for other repositories), you would not be able to
  force smart clients to fall back to the dumb protocol by giving "403" for
  that repository.

  Maybe in 2 years somebody smarter than us will have invented a more
  efficient git-upload-pack-2 service, which is the only fetch protocol his
  server supports other than dumb.  If your v1 smart client asks for the
  original git-upload-pack service and gets a "403", you won't be able to
  fall back to "dumb".

  The solution for such cases likely is to pretend as if you are a dumb
  server for the smart request.  That unfortunately means that the first
  sentence is misleading, and the second sentence is also an inappropriate
  advice.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 4bb1614..63a089a 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -234,12 +234,13 @@ description of the dumb server response.
 
 Smart Server Response
 ^^^^^^^^^^^^^^^^^^^^^
-Smart servers MUST respond with the smart server reply format.
-
 If the server does not recognize the requested service name, or the
 requested service name has been disabled by the server administrator,
 the server MUST respond with the '403 Forbidden' HTTP status code.
 
+Otherwise, smart servers MUST respond with the smart server reply
+format for the requested service name.
+
 Cache-Control headers SHOULD be used to disable caching of the
 returned entity.
 
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/14] shift dumb server response details
  2013-09-10 17:07                           ` [PATCH 12/14] reduce confusion over smart server response behaviour Tay Ray Chuan
@ 2013-09-10 17:07                             ` Tay Ray Chuan
  2013-09-10 17:07                               ` [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability on git-upload-pack Tay Ray Chuan
  0 siblings, 1 reply; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Shift details like ABNF from the client section to server section. This
is in line with the smart analogue.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
---
 Documentation/technical/http-protocol.txt | 49 +++++++++++++++----------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 63a089a..3098aa4 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -162,30 +162,6 @@ without any search/query parameters.  E.g.
    S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0\n
    S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}\n
 
-The Content-Type of the returned info/refs entity SHOULD be
-"text/plain; charset=utf-8", but MAY be any content type.
-Clients MUST NOT attempt to validate the returned Content-Type.
-Dumb servers MUST NOT return a return type starting with
-"application/x-git-".
-
-Cache-Control headers MAY be returned to disable caching of the
-returned entity.
-
-When examining the response clients SHOULD only examine the HTTP
-status code.  Valid responses are '200 OK', or '304 Not Modified'.
-
-The returned content is a UNIX formatted text file describing
-each ref and its known value.  The file SHOULD be sorted by name
-according to the C locale ordering.  The file SHOULD NOT include
-the default ref named 'HEAD'.
-
-  info_refs        =  *( ref_record )
-  ref_record       =  any_ref / peeled_ref
-
-  any_ref          =  obj-id HTAB refname LF
-  peeled_ref       =  obj-id HTAB refname LF
-		      obj-id HTAB refname "^{}" LF
-
 Smart Clients
 ~~~~~~~~~~~~~
 
@@ -229,8 +205,29 @@ Dumb Server Response
 ^^^^^^^^^^^^^^^^^^^^
 Dumb servers MUST respond with the dumb server reply format.
 
-See the prior section under dumb clients for a more detailed
-description of the dumb server response.
+The Content-Type of the returned info/refs entity SHOULD be
+"text/plain; charset=utf-8", but MAY be any content type.
+Clients MUST NOT attempt to validate the returned Content-Type.
+Dumb servers MUST NOT return a return type starting with
+"application/x-git-".
+
+Cache-Control headers MAY be returned to disable caching of the
+returned entity.
+
+When examining the response clients SHOULD only examine the HTTP
+status code.  Valid responses are '200 OK', or '304 Not Modified'.
+
+The returned content is a UNIX formatted text file describing
+each ref and its known value.  The file SHOULD be sorted by name
+according to the C locale ordering.  The file SHOULD NOT include
+the default ref named 'HEAD'.
+
+  info_refs        =  *( ref_record )
+  ref_record       =  any_ref / peeled_ref
+
+  any_ref          =  obj-id HTAB refname LF
+  peeled_ref       =  obj-id HTAB refname LF
+		      obj-id HTAB refname "^{}" LF
 
 Smart Server Response
 ^^^^^^^^^^^^^^^^^^^^^
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability on git-upload-pack
  2013-09-10 17:07                             ` [PATCH 13/14] shift dumb server response details Tay Ray Chuan
@ 2013-09-10 17:07                               ` Tay Ray Chuan
  0 siblings, 0 replies; 46+ messages in thread
From: Tay Ray Chuan @ 2013-09-10 17:07 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
--

Subject crafted by Ray Chuan, Nguyễn's s-o-b lifted from
<1377092713-25434-1-git-send-email-pclouds@gmail.com>.

---
 Documentation/technical/http-protocol.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 3098aa4..acc68ac 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -304,7 +304,8 @@ Servers SHOULD support all capabilities defined here.
 
 Clients MUST send at least one 'want' command in the request body.
 Clients MUST NOT reference an id in a 'want' command which did not
-appear in the response obtained through ref discovery.
+appear in the response obtained through ref discovery unless the
+server advertises capability "allow-tip-sha1-in-want".
 
   compute_request  =  want_list
 		      have_list
-- 
1.8.4.rc4.527.g303b16c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2013-09-10 17:09 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-09  5:22 [RFC PATCH 0/4] Return of smart HTTP Shawn O. Pearce
2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
2009-10-09  5:22     ` [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack Shawn O. Pearce
2009-10-09  5:22       ` [RFC PATCH 4/4] Smart fetch and push over HTTP: server side Shawn O. Pearce
2009-10-09  5:52     ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport J.H.
2009-10-09  8:01   ` [RFC PATCH 1/4] Document the HTTP transport protocol Sverre Rabbelier
2009-10-09  8:09     ` Sverre Rabbelier
2009-10-09  8:54   ` Alex Blewitt
2009-10-15 16:39     ` Shawn O. Pearce
2009-10-09 19:27   ` Jakub Narebski
2009-10-09 19:50   ` Jeff King
2009-10-15 16:52     ` Shawn O. Pearce
2009-10-15 17:39       ` Jeff King
2009-10-09 20:44   ` Junio C Hamano
2009-10-10 10:12     ` Antti-Juhani Kaijanaho
2009-10-16  5:59       ` H. Peter Anvin
2009-10-16  7:19         ` Mike Hommey
2009-10-16 14:21           ` Shawn O. Pearce
2009-10-16 14:23         ` Antti-Juhani Kaijanaho
2010-04-07 18:16     ` Tay Ray Chuan
2010-04-07 18:19     ` Tay Ray Chuan
2010-04-07 19:11     ` (resend v2) " Tay Ray Chuan
2010-04-07 19:51       ` Junio C Hamano
2010-04-08  1:47         ` Tay Ray Chuan
2010-04-07 19:24     ` Tay Ray Chuan
2009-10-10 12:17   ` Tay Ray Chuan
2010-04-06  4:57   ` Scott Chacon
2010-04-06  6:09     ` Junio C Hamano
     [not found]       ` <u2hd411cc4a1004060652k5a7f8ea4l67a9b079963f4dc4@mail.gmail.com>
2010-04-06 13:53         ` Scott Chacon
2010-04-06 17:26           ` Junio C Hamano
2013-09-10 17:07   ` [PATCH 00/14] document edits to original http protocol documentation Tay Ray Chuan
2013-09-10 17:07     ` [PATCH 01/14] Document the HTTP transport protocol Tay Ray Chuan
2013-09-10 17:07       ` [PATCH 02/14] normalize indentation with protcol-common.txt Tay Ray Chuan
2013-09-10 17:07         ` [PATCH 03/14] capitalize key words according to RFC 2119 Tay Ray Chuan
2013-09-10 17:07           ` [PATCH 04/14] normalize rules with RFC 5234 Tay Ray Chuan
2013-09-10 17:07             ` [PATCH 05/14] drop rules, etc. common to the pack protocol Tay Ray Chuan
2013-09-10 17:07               ` [PATCH 06/14] reword behaviour on missing repository or objects Tay Ray Chuan
2013-09-10 17:07                 ` [PATCH 07/14] weaken specification over cookies for authentication Tay Ray Chuan
2013-09-10 17:07                   ` [PATCH 08/14] mention different variations around $GIT_URL Tay Ray Chuan
2013-09-10 17:07                     ` [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients Tay Ray Chuan
2013-09-10 17:07                       ` [PATCH 10/14] fix example request/responses Tay Ray Chuan
2013-09-10 17:07                         ` [PATCH 11/14] be clearer in place of 'remote repository' phrase Tay Ray Chuan
2013-09-10 17:07                           ` [PATCH 12/14] reduce confusion over smart server response behaviour Tay Ray Chuan
2013-09-10 17:07                             ` [PATCH 13/14] shift dumb server response details Tay Ray Chuan
2013-09-10 17:07                               ` [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability on git-upload-pack Tay Ray Chuan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).