All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] Document the HTTP transport protocol
Date: Fri, 09 Oct 2009 13:44:53 -0700	[thread overview]
Message-ID: <7vskdss3ei.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: 1255065768-10428-2-git-send-email-spearce@spearce.org

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Nice write-up.

>  Documentation/technical/http-protocol.txt |  542 +++++++++++++++++++++++++++++
>  1 files changed, 542 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/technical/http-protocol.txt
>
> diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> new file mode 100644
> index 0000000..316d9b6
> --- /dev/null
> +++ b/Documentation/technical/http-protocol.txt
> @@ -0,0 +1,542 @@
> +HTTP transfer protocols
> +=======================
> ...
> +As a design feature smart clients can automatically upgrade "dumb"
> +protocol URLs to smart URLs.  This permits all users to have the
> +same published URL, and the peers automatically select the most
> +efficient transport available to them.

The first sentence feels backwards although the conclusion in the second
sentence is true.  It is more like smart ones trying smart protocol first,
and downgrading to "dumb" after noticing that the server is not smart.

> +Authentication
> +--------------
> ...
> +Clients SHOULD support Basic authentication as described by RFC 2616.
> +Servers SHOULD support Basic authentication by relying upon the
> +HTTP server placed in front of the Git server software.

It is perfectly fine to make it a requirement for a server to support the
Basic authentication, but should you make it a requirement that the
support is done by a specific implementation, i.e. "by relying upon..."?

> +Session State
> +-------------
> ...
> +retained and managed by the client process.  This permits simple
> +round-robin load-balancing on the server side, without needing to
> +worry about state mangement.

s/mangement/management/;

> +pkt-line Format
> +---------------
> ...
> +Examples (as C-style strings):
> +
> +  pkt-line          actual value
> +  ---------------------------------
> +  "0006a\n"         "a\n"
> +  "0005a"           "a"
> +  "000bfoobar\n"    "foobar\n"
> +  "0004"            ""
> +
> +A pkt-line with a length of 0 ("0000") is a special case and MUST
> +be treated as a message break or terminator in the payload.

Isn't this "MUST be" wrong?

It is not an advice to the implementors, but the protocol specification
itself defines what the flush packet means.  IOW, "The author of this
specification, Shawn, MUST treat a flush packet as a message break or
terminator in the payload, when designing this protocol."

> +General Request Processing
> +--------------------------
> +
> +Except where noted, all standard HTTP behavior SHOULD be assumed
> +by both client and server.  This includes (but is not necessarily
> +limited to):
> +
> +If there is no repository at $GIT_URL, the server MUST respond with
> +the '404 Not Found' HTTP status code.

We may also want to add

    If there is no object at $GIT_URL/some/path, the server MUST respond
    with the '404 Not Found' HTTP status code.

to help dumb clients.

> +Dumb Clients
> +~~~~~~~~~~~~
> +
> +HTTP clients that only support the "dumb" protocol MUST discover
> +references by making a request for the special info/refs file of
> +the repository.
> +
> +Dumb HTTP clients MUST NOT include search/query parameters when
> +fetching the info/refs file.  (That is, '?' must not appear in the
> +requested URL.)

It is unclear if '?' can be part of $GIT_URL. E.g.

    $ wget http://example.xz/serve.cgi?path=git.git/info/refs
    $ git clone http://example.xz/serve.cgi?path=git.git

It might be clearer to just say

    Dumb HTTP clients MUST make a GET request against $GIT_URL/info/refs,
    without any search/query parameters.  I.e.

	C: GET $GIT_URL/info/refs HTTP/1.0

to also exclude methods other than GET.

> +	C: GET $GIT_URL/info/refs HTTP/1.0
> +
> +	S: 200 OK
> ...
> +When examining the response clients SHOULD only examine the HTTP
> +status code.  Valid responses are '200 OK', or '304 Not Modified'.

Isn't 401 ("Ah, I was given a wrong URL") and 403 ("Ok, I do not have an
access to this repository") also valid?

> +The returned content is a UNIX formatted text file describing
> +each ref and its known value.  The file SHOULD be sorted by name
> +according to the C locale ordering.  The file SHOULD NOT include
> +the default ref named 'HEAD'.

I know you said "known" to imply "concurrent operations may change it
while the server is serving this client", but it feels rather awkward.

> +Smart Server Response
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +Smart servers MUST respond with the smart server reply format.
> +If the server does not recognize the requested service name, or the
> +requested service name has been disabled by the server administrator,
> +the server MUST respond with the '403 Forbidden' HTTP status code.

This is a bit confusing.

If you as a server administrator want to disable the smart upload-pack for
one repository (but not for other repositories), you would not be able to
force smart clients to fall back to the dumb protocol by giving "403" for
that repository.

Maybe in 2 years somebody smarter than us will have invented a more
efficient git-upload-pack-2 service, which is the only fetch protocol his
server supports other than dumb.  If your v1 smart client asks for the
original git-upload-pack service and gets a "403", you won't be able to
fall back to "dumb".

The solution for such cases likely is to pretend as if you are a dumb
server for the smart request.  That unfortunately means that the first
sentence is misleading, and the second sentence is also an inappropriate
advice.

> +The Content-Type MUST be 'application/x-$servicename-advertisement'.
> +Clients SHOULD fall back to the dumb protocol if another content
> +type is returned.  When falling back to the dumb protocol clients
> +SHOULD NOT make an additional request to $GIT_URL/info/refs, but
> +instead SHOULD use the response already in hand.  Clients MUST NOT
> +continue if they do not support the dumb protocol.

The part I commented on (the beginning of Smart Server Response) was
written as a generic description, not specific to git-upload-pack service,
and the beginning of this paragraph also pretends to be a generic
description, but it is misleading.  This is a specific instruction to the
clients that asked for git-upload-pack service and got a dumb server
response (if the above were talking about something other than upload-pack
service, there is no guarantee that "response already in hand" is useful
to talk to dumb servers).

> +The returned response is a pkt-line stream describing each ref and
> +its known value.  The stream SHOULD be sorted by name according to
> +the C locale ordering.  The stream SHOULD include the default ref
> +named 'HEAD' as the first ref.  The stream MUST include capability
> +declarations behind a NUL on the first ref.
> +
> +	smart_reply    = PKT-LINE("# service=$servicename" LF)
> +	                 ref_list
> +	                 "0000"
> +	ref_list       = empty_list | populated_list
> +
> +	empty_list     = PKT-LINE(id SP "capabilities^{}" NUL cap_list LF)
> +
> +	non_empty_list = PKT-LINE(id SP name NUL cap_list LF)
> +	                 *ref_record
> +
> +	cap_list      = *(SP capability) SP
> +	ref_record    = any_ref | peeled_ref
> +
> +	any_ref       = PKT-LINE(id SP name LF)
> +	peeled_ref    = PKT-LINE(id SP name LF)
> +	                PKT-LINE(id SP name "^{}" LF
> +	id            = 40*HEX
> +
> +	HEX           = "0".."9" | "a".."f"
> +	NL            = <US-ASCII NUL, null (0)>
> +	LF            = <US-ASCII LF,  linefeed (10)>
> +	SP            = <US-ASCII SP,  horizontal-tab (9)>

Did you define what "populated_list" is?

> +Smart Service git-upload-pack
> +------------------------------
> +This service reads from the remote repository.

The wording "remote repository" felt confusing.  I know it is "from the
repository served by the server", but if it were named without
"upload-pack", I might have mistaken that you are allowing to proxy a
request to access a third-party repository by this server.  The same
comment applies to the git-receive-pack service.

> +Capability include-tag
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When packing an object that an annotated tag points at, include the
> +tag object too.  Clients can request this if they want to fetch
> +tags, but don't know which tags they will need until after they
> +receive the branch data.  By enabling include-tag an entire call
> +to upload-pack can be avoided.
> +

I think you are avoiding an "extra" call; you would need one entire call
to upload-pack anyway for the primary transfer.

  parent reply	other threads:[~2009-10-09 20:51 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-09  5:22 [RFC PATCH 0/4] Return of smart HTTP Shawn O. Pearce
2009-10-09  5:22 ` [RFC PATCH 1/4] Document the HTTP transport protocol Shawn O. Pearce
2009-10-09  5:22   ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport Shawn O. Pearce
2009-10-09  5:22     ` [RFC PATCH 3/4] Add smart-http options to upload-pack, receive-pack Shawn O. Pearce
2009-10-09  5:22       ` [RFC PATCH 4/4] Smart fetch and push over HTTP: server side Shawn O. Pearce
2009-10-09  5:52     ` [RFC PATCH 2/4] Git-aware CGI to provide dumb HTTP transport J.H.
2009-10-09  8:01   ` [RFC PATCH 1/4] Document the HTTP transport protocol Sverre Rabbelier
2009-10-09  8:09     ` Sverre Rabbelier
2009-10-09  8:54   ` Alex Blewitt
2009-10-15 16:39     ` Shawn O. Pearce
2009-10-09 19:27   ` Jakub Narebski
2009-10-09 19:50   ` Jeff King
2009-10-15 16:52     ` Shawn O. Pearce
2009-10-15 17:39       ` Jeff King
2009-10-09 20:44   ` Junio C Hamano [this message]
2009-10-10 10:12     ` Antti-Juhani Kaijanaho
2009-10-16  5:59       ` H. Peter Anvin
2009-10-16  7:19         ` Mike Hommey
2009-10-16 14:21           ` Shawn O. Pearce
2009-10-16 14:23         ` Antti-Juhani Kaijanaho
2010-04-07 18:16     ` Tay Ray Chuan
2010-04-07 18:19     ` Tay Ray Chuan
2010-04-07 19:11     ` (resend v2) " Tay Ray Chuan
2010-04-07 19:51       ` Junio C Hamano
2010-04-08  1:47         ` Tay Ray Chuan
2010-04-07 19:24     ` Tay Ray Chuan
2009-10-10 12:17   ` Tay Ray Chuan
2010-04-06  4:57   ` Scott Chacon
2010-04-06  6:09     ` Junio C Hamano
     [not found]       ` <u2hd411cc4a1004060652k5a7f8ea4l67a9b079963f4dc4@mail.gmail.com>
2010-04-06 13:53         ` Scott Chacon
2010-04-06 17:26           ` Junio C Hamano
2013-09-10 17:07   ` [PATCH 00/14] document edits to original http protocol documentation Tay Ray Chuan
2013-09-10 17:07     ` [PATCH 01/14] Document the HTTP transport protocol Tay Ray Chuan
2013-09-10 17:07       ` [PATCH 02/14] normalize indentation with protcol-common.txt Tay Ray Chuan
2013-09-10 17:07         ` [PATCH 03/14] capitalize key words according to RFC 2119 Tay Ray Chuan
2013-09-10 17:07           ` [PATCH 04/14] normalize rules with RFC 5234 Tay Ray Chuan
2013-09-10 17:07             ` [PATCH 05/14] drop rules, etc. common to the pack protocol Tay Ray Chuan
2013-09-10 17:07               ` [PATCH 06/14] reword behaviour on missing repository or objects Tay Ray Chuan
2013-09-10 17:07                 ` [PATCH 07/14] weaken specification over cookies for authentication Tay Ray Chuan
2013-09-10 17:07                   ` [PATCH 08/14] mention different variations around $GIT_URL Tay Ray Chuan
2013-09-10 17:07                     ` [PATCH 09/14] reduce ambiguity over '?' in $GIT_URL for dumb clients Tay Ray Chuan
2013-09-10 17:07                       ` [PATCH 10/14] fix example request/responses Tay Ray Chuan
2013-09-10 17:07                         ` [PATCH 11/14] be clearer in place of 'remote repository' phrase Tay Ray Chuan
2013-09-10 17:07                           ` [PATCH 12/14] reduce confusion over smart server response behaviour Tay Ray Chuan
2013-09-10 17:07                             ` [PATCH 13/14] shift dumb server response details Tay Ray Chuan
2013-09-10 17:07                               ` [PATCH 14/14] mention effect of "allow-tip-sha1-in-want" capability on git-upload-pack Tay Ray Chuan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vskdss3ei.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.