git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: CONTENT_LENGTH can no longer be empty
       [not found] ` <20180905202613.GA20473@blodeuwedd>
@ 2018-09-06  6:10   ` Jonathan Nieder
  2018-09-06 19:35     ` [PATCH] http-backend: allow empty CONTENT_LENGTH Max Kirillov
  0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-06  6:10 UTC (permalink / raw)
  To: Jelmer Vernooij; +Cc: git, Max Kirillov

Hi,

Jelmer Vernooij wrote[1]:

> Git's http-backend has become slightly stricter about the content
> of the CONTENT_LENGTH variable. Previously, Dulwich would leave this
> variable empty but git now expects it to be set to 0 for GET requests
> without a body.
>
> I'm uploading a fixed version of dulwich.

Thanks for tracking it down!  This is likely due to v2.19.0-rc0~45^2~2
(http-backend: respect CONTENT_LENGTH as specified by rfc3875,
2018-06-10).

Max, RFC 3875 appears to allow a CONTENT_LENGTH of "" when no data is
attached to the request.  Should we check for this case (e.g. inserting
a *str check in

	if (str && !git_parse_ssize_t(str, &val))
		die("failed to parse CONTENT_LENGTH: %s", str);

?

Thanks,
Jonathan

[1] https://bugs.debian.org/907587

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-06  6:10   ` CONTENT_LENGTH can no longer be empty Jonathan Nieder
@ 2018-09-06 19:35     ` Max Kirillov
  2018-09-06 21:54       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-06 19:35 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

According to RFC3875, empty environment variable is equivalent to unset,
and for CONTENT_LENGTH it should mean zero body to read.

However, as discussed in [1], unset CONTENT_LENGTH is also used for
chunked encoding to indicate reading until EOF, so keep this behavior also
for empty CONTENT_LENGTH.

Add a test for the case.

[1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/

Signed-off-by: Max Kirillov <max@max630.net>
---
Hi.

This should fix it. I'm not sure should it treat it as 0 or "-1"
At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
So keep the existing behavior as much as possible
 http-backend.c                         |  2 +-
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..a1230d7ead 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
+	if (str && *str && !git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..ca34c2f054 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING=/repo.git/HEAD \
+		PATH_TRANSLATED="$PWD"/.git/HEAD \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-06 19:35     ` [PATCH] http-backend: allow empty CONTENT_LENGTH Max Kirillov
@ 2018-09-06 21:54       ` Junio C Hamano
  2018-09-07  3:27         ` Max Kirillov
  2018-09-06 22:45       ` Jonathan Nieder
  2018-09-07  3:36       ` [PATCH v2] " Max Kirillov
  2 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-06 21:54 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jonathan Nieder, Jelmer Vernooij, git, Jeff King

Max Kirillov <max@max630.net> writes:

> According to RFC3875, empty environment variable is equivalent to unset,
> and for CONTENT_LENGTH it should mean zero body to read.
>
> However, as discussed in [1], unset CONTENT_LENGTH is also used for
> chunked encoding to indicate reading until EOF, so keep this behavior also
> for empty CONTENT_LENGTH.

Makes sense.

>
> Add a test for the case.
>
> [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/
>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> Hi.
>
> This should fix it. I'm not sure should it treat it as 0 or "-1"
> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> So keep the existing behavior as much as possible

I am not sure what you mean by the above, between 0 and -1.  The
code signals the caller of get_content_length() that req_len is -1
which is used as a sign to read through to the EOF, so it appears to
me that the code treats missing content-length (i.e. str == NULL
case) as "-1".


>  http-backend.c                         |  2 +-
>  t/t5562-http-backend-content-length.sh | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/http-backend.c b/http-backend.c
> index e88d29f62b..a1230d7ead 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
>  	ssize_t val = -1;
>  	const char *str = getenv("CONTENT_LENGTH");
>  
> -	if (str && !git_parse_ssize_t(str, &val))
> +	if (str && *str && !git_parse_ssize_t(str, &val))
>  		die("failed to parse CONTENT_LENGTH: %s", str);
>  	return val;
>  }
> diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
> index 057dcb85d6..ca34c2f054 100755
> --- a/t/t5562-http-backend-content-length.sh
> +++ b/t/t5562-http-backend-content-length.sh
> @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
>  	grep "fatal:.*CONTENT_LENGTH" err
>  '
>  
> +test_expect_success 'empty CONTENT_LENGTH' '
> +	env \
> +		QUERY_STRING=/repo.git/HEAD \
> +		PATH_TRANSLATED="$PWD"/.git/HEAD \
> +		GIT_HTTP_EXPORT_ALL=TRUE \
> +		REQUEST_METHOD=GET \
> +		CONTENT_LENGTH="" \
> +		git http-backend <empty_body >act.out 2>act.err &&
> +	verify_http_result "200 OK"
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-06 19:35     ` [PATCH] http-backend: allow empty CONTENT_LENGTH Max Kirillov
  2018-09-06 21:54       ` Junio C Hamano
@ 2018-09-06 22:45       ` Jonathan Nieder
  2018-09-07  3:36       ` [PATCH v2] " Max Kirillov
  2 siblings, 0 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-06 22:45 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jelmer Vernooij, git, Jeff King

Hi,

Max Kirillov wrote:

> According to RFC3875, empty environment variable is equivalent to unset,
> and for CONTENT_LENGTH it should mean zero body to read.
>
> However, as discussed in [1], unset CONTENT_LENGTH is also used for
> chunked encoding to indicate reading until EOF, so keep this behavior also
> for empty CONTENT_LENGTH.
>
> Add a test for the case.
>
> [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/
>
> Signed-off-by: Max Kirillov <max@max630.net>

Reported-by: Jelmer Vernooij <jelmer@jelmer.uk>

Thanks for fixing it.

Can you include a summary of [1] instead of relying on the mailing
list archive?  Perhaps just omiting "as discussed in [1]" would do the
trick.  Alternatively, if there's a point from that discussion that's
relevant to the change, please include it here.  That way, people
finding this change later can save some time by avoiding having to dig
through that mailing list thread.

For example, it's probably worth mentioning that this was discovered
using dulwich's test suite.

[...]
> This should fix it. I'm not sure should it treat it as 0 or "-1"
> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> So keep the existing behavior as much as possible

That sounds worth figuring out so we can understand and possibly
document it better.  What are the ramifications of this choice ---
what would work / not work with each choice?

[...]
> --- a/t/t5562-http-backend-content-length.sh
> +++ b/t/t5562-http-backend-content-length.sh
> @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '

Yay, thanks for this as well.

Sincerely,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-06 21:54       ` Junio C Hamano
@ 2018-09-07  3:27         ` Max Kirillov
  2018-09-07  3:38           ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Max Kirillov @ 2018-09-07  3:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Max Kirillov, Jonathan Nieder, Jelmer Vernooij, git, Jeff King

On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote:
> Max Kirillov <max@max630.net> writes:
>> This should fix it. I'm not sure should it treat it as 0 or "-1"
>> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
>> So keep the existing behavior as much as possible
> 
> I am not sure what you mean by the above, between 0 and -1.  The
> code signals the caller of get_content_length() that req_len is -1
> which is used as a sign to read through to the EOF, so it appears to
> me that the code treats missing content-length (i.e. str == NULL
> case) as "-1".

I made a mistake in this, it should be "if I try to treat missing
CONTENT_LENGTH as 0". This, as far as I understand, what the
RFC specifies.

That is, after the following change, the test "large fetch-pack
requests can be split across POSTs" from t5551 starts faliing:

-- >8 --
@@ -353,8 +353,12 @@ static ssize_t get_content_length(void)
        ssize_t val = -1;
        const char *str = getenv("CONTENT_LENGTH");
 
-       if (str && *str && !git_parse_ssize_t(str, &val))
-               die("failed to parse CONTENT_LENGTH: %s", str);
+       if (str && *str) {
+               if (!git_parse_ssize_t(str, &val))
+                       die("failed to parse CONTENT_LENGTH: %s", str);
+       } else
+               val = 0;
+
        return val;
 }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2] http-backend: allow empty CONTENT_LENGTH
  2018-09-06 19:35     ` [PATCH] http-backend: allow empty CONTENT_LENGTH Max Kirillov
  2018-09-06 21:54       ` Junio C Hamano
  2018-09-06 22:45       ` Jonathan Nieder
@ 2018-09-07  3:36       ` Max Kirillov
  2018-09-08  0:19         ` Jonathan Nieder
  2018-09-09  4:10         ` [PATCH v4] http-backend: allow empty CONTENT_LENGTH Max Kirillov
  2 siblings, 2 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-07  3:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

According to RFC3875, empty environment variable is equivalent to unset,
and for CONTENT_LENGTH it should mean zero body to read.

However, unset CONTENT_LENGTH is also used for chunked encoding to indicate
reading until EOF. At least, the test "large fetch-pack requests can be split
across POSTs" from t5551 starts faliing, if unset or empty CONTENT_LENGTH is
treated as zero length body. So keep the existing behavior as much as possible.

Add a test for the case.

Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
Signed-off-by: Max Kirillov <max@max630.net>
---
Added the "reported-by" and explained inline the reason to keep existing behavior
 http-backend.c                         |  2 +-
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..a1230d7ead 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
+	if (str && *str && !git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..ca34c2f054 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING=/repo.git/HEAD \
+		PATH_TRANSLATED="$PWD"/.git/HEAD \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  3:27         ` Max Kirillov
@ 2018-09-07  3:38           ` Jeff King
  2018-09-07  4:20             ` Max Kirillov
  2018-09-07  4:59             ` Max Kirillov
  0 siblings, 2 replies; 39+ messages in thread
From: Jeff King @ 2018-09-07  3:38 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Junio C Hamano, Jonathan Nieder, Jelmer Vernooij, git

On Fri, Sep 07, 2018 at 06:27:40AM +0300, Max Kirillov wrote:

> On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote:
> > Max Kirillov <max@max630.net> writes:
> >> This should fix it. I'm not sure should it treat it as 0 or "-1"
> >> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> >> So keep the existing behavior as much as possible
> > 
> > I am not sure what you mean by the above, between 0 and -1.  The
> > code signals the caller of get_content_length() that req_len is -1
> > which is used as a sign to read through to the EOF, so it appears to
> > me that the code treats missing content-length (i.e. str == NULL
> > case) as "-1".
> 
> I made a mistake in this, it should be "if I try to treat missing
> CONTENT_LENGTH as 0". This, as far as I understand, what the
> RFC specifies.
> 
> That is, after the following change, the test "large fetch-pack
> requests can be split across POSTs" from t5551 starts faliing:
> 
> -- >8 --
> @@ -353,8 +353,12 @@ static ssize_t get_content_length(void)
>         ssize_t val = -1;
>         const char *str = getenv("CONTENT_LENGTH");
>  
> -       if (str && *str && !git_parse_ssize_t(str, &val))
> -               die("failed to parse CONTENT_LENGTH: %s", str);
> +       if (str && *str) {
> +               if (!git_parse_ssize_t(str, &val))
> +                       die("failed to parse CONTENT_LENGTH: %s", str);
> +       } else
> +               val = 0;
> +

Right, I'm pretty sure it is a problem if you treat a missing
CONTENT_LENGTH as "present, but zero". Because chunked encodings from
apache really do want us to read until EOF.

My understanding from Jelmer's report is that a present-but-empty
variable should be counted as "0" to mean "do not read any body bytes".
That matches my reading of RFC 3875, which says:

  If no data is attached, then NULL (or unset).

(and earlier they explicitly define NULL as the empty string). That
said, we do not do what they say for the "unset" case. And cannot
without breaking chunked encoding from apache. So I don't know how much
we want to follow that rfc to the letter, but at least it makes sense to
me to revert this case back to what Git used to do, and what the rfc
says.

In other words, I think the logic we want is:

  if (!str) {
	/*
	 * RFC3875 says this must mean "no body", but in practice we
	 * receive chunked encodings with no CONTENT_LENGTH. Tell the
	 * caller to read until EOF.
	 */
	val = -1;
  } else if (!*str) {
	/*
	 * An empty length should be treated as "no body" according to
	 * RFC3875, and this seems to hold in practice.
	 */
	val = 0;
  } else {
	/*
	 * We have a CONTENT_LENGTH; trust what's in it as long as it
	 * can be parsed.
	 */
	if (!git_parse_ssize_t(str, &val))
	        die(...);
  }

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  3:38           ` Jeff King
@ 2018-09-07  4:20             ` Max Kirillov
  2018-09-07  4:59             ` Max Kirillov
  1 sibling, 0 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-07  4:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Max Kirillov, Junio C Hamano, Jonathan Nieder, Jelmer Vernooij, git

On Thu, Sep 06, 2018 at 11:38:31PM -0400, Jeff King wrote:
> My understanding from Jelmer's report is that a present-but-empty
> variable should be counted as "0" to mean "do not read any body bytes".
> That matches my reading of RFC 3875, which says:
> 
>   If no data is attached, then NULL (or unset).
> 
> (and earlier they explicitly define NULL as the empty string). That
> said, we do not do what they say for the "unset" case. And cannot
> without breaking chunked encoding from apache. So I don't know how much
> we want to follow that rfc to the letter, but at least it makes sense to
> me to revert this case back to what Git used to do, and what the rfc
> says.

I could find this discussion about it:
https://lists.gt.net/apache/users/373042

Basically, it says the CGI RFC was written before chunked
encoding appeared, so implementations should choose between
caching all boody before calling script, or breaking the
spec some way. So apache does it so.

(I wonder how IIS would handle it)

> In other words, I think the logic we want is:
> 
>   if (!str) {
> 	/*
> 	 * RFC3875 says this must mean "no body", but in practice we
> 	 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> 	 * caller to read until EOF.
> 	 */
> 	val = -1;
>   } else if (!*str) {
> 	/*
> 	 * An empty length should be treated as "no body" according to
> 	 * RFC3875, and this seems to hold in practice.
> 	 */
> 	val = 0;
>   } else {
> 	/*
> 	 * We have a CONTENT_LENGTH; trust what's in it as long as it
> 	 * can be parsed.
> 	 */
> 	if (!git_parse_ssize_t(str, &val))
> 	        die(...);
>   }

I feel reluctant to treat empty and unset differently, but
probably this is the only thing which could be done.

I'll resumbmit some time later.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  3:38           ` Jeff King
  2018-09-07  4:20             ` Max Kirillov
@ 2018-09-07  4:59             ` Max Kirillov
  2018-09-07  9:49               ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Max Kirillov @ 2018-09-07  4:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jonathan Nieder, Jelmer Vernooij, git

Actually, another reason for the latest issue was that CONTENT_LENGTH
is parsed for GET requests at all. It should be parsed only for POST
requests, or, rather, only for upoad-pack and receive-pack requests.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  4:59             ` Max Kirillov
@ 2018-09-07  9:49               ` Junio C Hamano
  2018-09-08  5:41                 ` Max Kirillov
  2018-09-09  4:40                 ` Max Kirillov
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-07  9:49 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jeff King, Jonathan Nieder, Jelmer Vernooij, git

Max Kirillov <max@max630.net> writes:

> Actually, another reason for the latest issue was that CONTENT_LENGTH
> is parsed for GET requests at all. It should be parsed only for POST
> requests, or, rather, only for upoad-pack and receive-pack requests.

Not really.  The layered design of the HTTP protocol means that any
request type can have non-empty body, but request types for which
no semantics of the body is defined must ignore what is in the body,
which in turn means we need to parse and pay attention to the
content length etc. to find the end of the body, if only to ignore
it.

In any case, hopefully we can fix this before the final, as this is
a regression introduced during this cycle?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  3:36       ` [PATCH v2] " Max Kirillov
@ 2018-09-08  0:19         ` Jonathan Nieder
  2018-09-08  5:35           ` Max Kirillov
  2018-09-08  5:42           ` [PATCH v3] " Max Kirillov
  2018-09-09  4:10         ` [PATCH v4] http-backend: allow empty CONTENT_LENGTH Max Kirillov
  1 sibling, 2 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-08  0:19 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jelmer Vernooij, git, Jeff King

Hi,

Max Kirillov wrote:

> According to RFC3875, empty environment variable is equivalent to unset,
> and for CONTENT_LENGTH it should mean zero body to read.
>
> However, unset CONTENT_LENGTH is also used for chunked encoding to indicate
> reading until EOF. At least, the test "large fetch-pack requests can be split
> across POSTs" from t5551 starts faliing, if unset or empty CONTENT_LENGTH is
> treated as zero length body. So keep the existing behavior as much as possible.
>
> Add a test for the case.
>
> Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> Added the "reported-by" and explained inline the reason to keep existing behavior

Lovely, thanks.

To me, "keep the existing behavior as much as possible" isn't comforting
because it doesn't tell me *which* existing behavior.  Fortunately the patch
itself is comforting: it makes us treat "" the same way as unset, which is
exactly what the RFC requires.

So I'm happy with this version.  Thanks for your patient work.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2] http-backend: allow empty CONTENT_LENGTH
  2018-09-08  0:19         ` Jonathan Nieder
@ 2018-09-08  5:35           ` Max Kirillov
  2018-09-08  5:42           ` [PATCH v3] " Max Kirillov
  1 sibling, 0 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-08  5:35 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

Before 817f7dc223, CONTENT_LENGTH variable was never considered,
http-backend was just reading request body from standard input until EOF
when it, or a command started by it, needed it.

Then it was discovered that some HTTP do not close standard input, instead
expecting CGI scripts to obey CONTENT_LENGTH. In 817f7dc223, behavior
was changed to consider the CONTENT_LENGTH variable when it is set. Case
of unset CONTENT_LENGTH was kept to mean "read until EOF" which is not
compliant to the RFC3875 (which treats it as empty body), but
practically is used when client uses chunked encoding to submit big
request.

Case of empty CONTENT_LENGTH has slept through this conditions.
Apparently, it is used for GET requests, and RFC3875 does specify that
it also means empty body. Current implementation, however, fails to
parse it and aborts the request.

Fix the case of empty CONTENT_LENGTH to also be treated as "read until EOF".
It does not actually matter what does it mean because body is never read
anyway, it just should not cause parse error. Add a test for the case.

Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
Signed-off-by: Max Kirillov <max@max630.net>
---
Provided more thorough message, also fix test (it did not test actually the error before)

There will be more versions later, at least the one which suggested by Jeff
 http-backend.c                         |  2 +-
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..a1230d7ead 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
+	if (str && *str && !git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..b28c3c4765 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING="/repo.git/info/refs?service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  9:49               ` Junio C Hamano
@ 2018-09-08  5:41                 ` Max Kirillov
  2018-09-09  4:40                 ` Max Kirillov
  1 sibling, 0 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-08  5:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Max Kirillov, Jeff King, Jonathan Nieder, Jelmer Vernooij, git

On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote:
> Max Kirillov <max@max630.net> writes:
> 
>> Actually, another reason for the latest issue was that CONTENT_LENGTH
>> is parsed for GET requests at all. It should be parsed only for POST
>> requests, or, rather, only for upoad-pack and receive-pack requests.
> 
> Not really.  The layered design of the HTTP protocol means that any
> request type can have non-empty body, but request types for which
> no semantics of the body is defined must ignore what is in the body,
> which in turn means we need to parse and pay attention to the
> content length etc. to find the end of the body, if only to ignore
> it.

I don't think it is git's job to police web server implementations,
especially considering that there is a gap between letter of RFC and
actual behavior.  Anyway, it only runs the check for "*/info/refs" GET
request, which ends up in get_info_refs(). Other GET requests do not
check CONTENT_LENGTH. Also, the version of service which is started from
get_info_refs() do not consume input (I think, actually, the
"--stateless-rpc" argument is not needed there).

> In any case, hopefully we can fix this before the final, as this is
> a regression introduced during this cycle?

Yes, I'm working on it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3] http-backend: allow empty CONTENT_LENGTH
  2018-09-08  0:19         ` Jonathan Nieder
  2018-09-08  5:35           ` Max Kirillov
@ 2018-09-08  5:42           ` Max Kirillov
  2018-09-10  5:17             ` Jonathan Nieder
  1 sibling, 1 reply; 39+ messages in thread
From: Max Kirillov @ 2018-09-08  5:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

Before 817f7dc223, CONTENT_LENGTH variable was never considered,
http-backend was just reading request body from standard input until EOF
when it, or a command started by it, needed it.

Then it was discovered that some HTTP do not close standard input, instead
expecting CGI scripts to obey CONTENT_LENGTH. In 817f7dc223, behavior
was changed to consider the CONTENT_LENGTH variable when it is set. Case
of unset CONTENT_LENGTH was kept to mean "read until EOF" which is not
compliant to the RFC3875 (which treats it as empty body), but
practically is used when client uses chunked encoding to submit big
request.

Case of empty CONTENT_LENGTH has slept through this conditions.
Apparently, it is used for GET requests, and RFC3875 does specify that
it also means empty body. Current implementation, however, fails to
parse it and aborts the request.

Fix the case of empty CONTENT_LENGTH to also be treated as "read until EOF".
It does not actually matter what does it mean because body is never read
anyway, it just should not cause parse error. Add a test for the case.

Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
Signed-off-by: Max Kirillov <max@max630.net>
---
Provided more thorough message, also fix test (it did not test actually the error before)

There will be more versions later, at least the one which suggested by Jeff

PS: did I write v2, it should be v3 of course!
 http-backend.c                         |  2 +-
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..a1230d7ead 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
+	if (str && *str && !git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..b28c3c4765 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING="/repo.git/info/refs?service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  3:36       ` [PATCH v2] " Max Kirillov
  2018-09-08  0:19         ` Jonathan Nieder
@ 2018-09-09  4:10         ` Max Kirillov
  2018-09-10  5:25           ` Jonathan Nieder
  1 sibling, 1 reply; 39+ messages in thread
From: Max Kirillov @ 2018-09-09  4:10 UTC (permalink / raw)
  To: git, Jeff King; +Cc: Max Kirillov, Jonathan Nieder, Jelmer Vernooij

Before 817f7dc223, CONTENT_LENGTH variable was never considered,
http-backend was just reading request body from standard input until EOF
when it, or a command started by it, needed it.

Then it was discovered that some HTTP do not close standard input, instead
expecting CGI scripts to obey CONTENT_LENGTH. In 817f7dc223, behavior
was changed to consider the CONTENT_LENGTH variable when it is set. Case
of unset CONTENT_LENGTH was kept to mean "read until EOF" which is not
compliant to the RFC3875 (which treats it as empty body), but
practically is used when client uses chunked encoding to submit big
request.

Case of empty CONTENT_LENGTH has slept through this conditions.
Apparently, it is used for GET requests, and RFC3875 does specify that
it also means empty body. Current implementation, however, fails to
parse it and aborts the request.

Fix the case of empty CONTENT_LENGTH to be treated as zero-length body
is expected, as specified by RFC3875. It does not actually matter what
does it mean because body is never read anyway, it just should not cause
parse error. Add a test for the case.

Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
Authored-by: Jeff King <peff@peff.net>
Signed-off-by: Max Kirillov <max@max630.net>
---
The fix suggested by Jeff. I supposed there should be "signed-off"
The tests pass as well
 http-backend.c                         | 24 ++++++++++++++++++++++--
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..949821b46f 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
-		die("failed to parse CONTENT_LENGTH: %s", str);
+	if (!str) {
+		/*
+		 * RFC3875 says this must mean "no body", but in practice we
+		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
+		 * caller to read until EOF.
+		 */
+		val = -1;
+	} else if (!*str) {
+		/*
+		 * An empty length should be treated as "no body" according to
+		 * RFC3875, and this seems to hold in practice.
+		 */
+		val = 0;
+	} else {
+		/*
+		 * We have a non-empty CONTENT_LENGTH; trust what's in it as long
+		 * as it can be parsed.
+		 */
+		if (!git_parse_ssize_t(str, &val))
+			die("failed to parse CONTENT_LENGTH: '%s'", str);
+	}
+
 	return val;
 }
 
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..b28c3c4765 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING="/repo.git/info/refs?service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: allow empty CONTENT_LENGTH
  2018-09-07  9:49               ` Junio C Hamano
  2018-09-08  5:41                 ` Max Kirillov
@ 2018-09-09  4:40                 ` Max Kirillov
  1 sibling, 0 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-09  4:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Max Kirillov, Jeff King, Jonathan Nieder, Jelmer Vernooij, git

On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote:
> In any case, hopefully we can fix this before the final, as this is
> a regression introduced during this cycle?

I think I am going to stop at the v4. Unless there are some
corrections requested.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3] http-backend: allow empty CONTENT_LENGTH
  2018-09-08  5:42           ` [PATCH v3] " Max Kirillov
@ 2018-09-10  5:17             ` Jonathan Nieder
  2018-09-10 20:36               ` Max Kirillov
  0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-10  5:17 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jelmer Vernooij, git, Jeff King

From: Max Kirillov <max@max630.net>
Subject: http-backend test: make empty CONTENT_LENGTH test more realistic

This is a test of smart HTTP, so it should use the smart HTTP endpoints
(e.g. /info/refs?service=git-receive-pack), not dumb HTTP (HEAD).

Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Max Kirillov wrote:

> Provided more thorough message, also fix test (it did not test
> actually the error before)
>
> There will be more versions later, at least the one which suggested
> by Jeff

v2 is in "next", and I believe that version should already be
sufficient for Git 2.19.  Please correct me if I'm wrong.

Since v2 is in "next", I think any further refinements are supposed to
be incremental patches on top.  Here's an example (representing the
v2->v3 diff).  It's more of an RFC than a serious patch, because:

This version of the test doesn't seem to reproduce the bug.  When I
run the test against the unfixed version of http-backend, it passes.
Ideas?

Not about this patch: could this test share some infrustructure with
t5560-http-backend-noserver.sh?  If there were some common shell
library that they shared, the tests might be easier to read and write.

Thanks,
Jonathan

 t/t5562-http-backend-content-length.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index f94d01f69e..fceb3d39c1 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -155,8 +155,8 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 
 test_expect_success 'empty CONTENT_LENGTH' '
 	env \
-		QUERY_STRING=/repo.git/HEAD \
-		PATH_TRANSLATED="$PWD"/.git/HEAD \
+		QUERY_STRING="/repo.git/info/refs?service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
 		GIT_HTTP_EXPORT_ALL=TRUE \
 		REQUEST_METHOD=GET \
 		CONTENT_LENGTH="" \
-- 
2.19.0.rc2.392.g5ba43deb5a


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] http-backend: allow empty CONTENT_LENGTH
  2018-09-09  4:10         ` [PATCH v4] http-backend: allow empty CONTENT_LENGTH Max Kirillov
@ 2018-09-10  5:25           ` Jonathan Nieder
  2018-09-10 13:17             ` Jeff King
  2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
  0 siblings, 2 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-10  5:25 UTC (permalink / raw)
  To: Max Kirillov; +Cc: git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Max Kirillov wrote:

> Reported-By: Jelmer Vernooij <jelmer@jelmer.uk>
> Authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Max Kirillov <max@max630.net>

Nit: for this kind of case of forwarding someone else's patch, we put
a From field at the beginning of the body of the message.  "git
format-patch" can produce a message with that format if you commit
with 'git commit --author="Someone Else <person@example.com>"' and run
format-patch with --from="My Name <me@example.com>".  More details are
in the DISCUSSION section of git-format-patch(1).

As with v3, since v2 is already in "next" this should go incremental.

[...]
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
>  	ssize_t val = -1;
>  	const char *str = getenv("CONTENT_LENGTH");
>  
> -	if (str && !git_parse_ssize_t(str, &val))
> -		die("failed to parse CONTENT_LENGTH: %s", str);
> +	if (!str) {
> +		/*
> +		 * RFC3875 says this must mean "no body", but in practice we
> +		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> +		 * caller to read until EOF.
> +		 */
> +		val = -1;
> +	} else if (!*str) {
> +		/*
> +		 * An empty length should be treated as "no body" according to
> +		 * RFC3875, and this seems to hold in practice.
> +		 */
> +		val = 0;

Are there example callers that this version fixes?  Where can I read
more, or what can I run to experience it?

For example, v2.19.0-rc0~45^2~2 (http-backend: respect CONTENT_LENGTH
as specified by rfc3875, 2018-06-10) mentions IIS/Windows; does IIS
make use of this distinction?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] http-backend: allow empty CONTENT_LENGTH
  2018-09-10  5:25           ` Jonathan Nieder
@ 2018-09-10 13:17             ` Jeff King
  2018-09-10 16:37               ` Junio C Hamano
  2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
  1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-09-10 13:17 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jelmer Vernooij, Florian Manschwetus

On Sun, Sep 09, 2018 at 10:25:58PM -0700, Jonathan Nieder wrote:

> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
> >  	ssize_t val = -1;
> >  	const char *str = getenv("CONTENT_LENGTH");
> >  
> > -	if (str && !git_parse_ssize_t(str, &val))
> > -		die("failed to parse CONTENT_LENGTH: %s", str);
> > +	if (!str) {
> > +		/*
> > +		 * RFC3875 says this must mean "no body", but in practice we
> > +		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> > +		 * caller to read until EOF.
> > +		 */
> > +		val = -1;
> > +	} else if (!*str) {
> > +		/*
> > +		 * An empty length should be treated as "no body" according to
> > +		 * RFC3875, and this seems to hold in practice.
> > +		 */
> > +		val = 0;
> 
> Are there example callers that this version fixes?  Where can I read
> more, or what can I run to experience it?
> 
> For example, v2.19.0-rc0~45^2~2 (http-backend: respect CONTENT_LENGTH
> as specified by rfc3875, 2018-06-10) mentions IIS/Windows; does IIS
> make use of this distinction?

So this code is what I recommended based on my reading of the RFC, and
based on my understanding of the Debian bug. But I admit I'm confused.

I thought the complaint was that this:

  CONTENT_LENGTH= git http-backend

was reading a body, when it shouldn't be. And so setting it to 0 here
made sense.

But that couldn't have been what older versions were doing, since they
never looked at CONTENT_LENGTH at all, and instead always read to EOF.
So presumably the original problem wasn't that we tried to read a body,
but that the empty string caused git_parse_ssize_t to report failure,
and we called die(). Which probably should be explained by 574c513e8d
(http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
late for that.

So after that patch, we really do have the original behavior, and that's
enough for v2.19.

But the remaining question then is: what should clients expect on an
empty variable? We know what the RFC says, and we know what dulwich
expected, but I'm not sure we have real world cases beyond that. So it
might actually make sense to punt until we see one, though I don't mind
doing what the rfc says in the meantime. And then the explanation in the
commit message would be "do what the rfc says", and any test probably
ought to be feeding a non-empty empty and confirming that we don't read
it.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] http-backend: allow empty CONTENT_LENGTH
  2018-09-10 13:17             ` Jeff King
@ 2018-09-10 16:37               ` Junio C Hamano
  2018-09-10 18:46                 ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-10 16:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Max Kirillov, git, Jelmer Vernooij,
	Florian Manschwetus

Jeff King <peff@peff.net> writes:

> But that couldn't have been what older versions were doing, since they
> never looked at CONTENT_LENGTH at all, and instead always read to EOF.
> So presumably the original problem wasn't that we tried to read a body,
> but that the empty string caused git_parse_ssize_t to report failure,
> and we called die(). Which probably should be explained by 574c513e8d
> (http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
> late for that.
>
> So after that patch, we really do have the original behavior, and that's
> enough for v2.19.

To recap to make sure I am following it correctly:

 - pay attention to content-length when it is clearly given with a
   byte count, which is an improvement over v2.18

 - mimick what we have been doing until now when content-length is
   missing or set to an empty string, so we are regression free and
   bug-to-bug compatible relative to v2.18 in these two cases.

> But the remaining question then is: what should clients expect on an
> empty variable? We know what the RFC says, and we know what dulwich
> expected, but I'm not sure we have real world cases beyond that. So it
> might actually make sense to punt until we see one, though I don't mind
> doing what the rfc says in the meantime. And then the explanation in the
> commit message would be "do what the rfc says", and any test probably
> ought to be feeding a non-empty empty and confirming that we don't read
> it.

The RFC is pretty clear that no data is signaled by "NULL (or
unset)", meaning an empty string value and missing variable both
mean the same "no message body", but it further says that the
servers MUST set CONTENT_LENGTH if and only if there is a
message-body, which contradicts with itself (if you adhered to 'if
and only if', in no case you would set it to NULL).

Googling "cgi chunked encoding" seems to give us tons of hits to
show that people are puzzled, just like us, that the scripts would
not get to see Chunked (as the server is supposed to deChunk to
count content-length before calling the backend).  So I agree "do
what the rfc says" is a good thing to try early in the next cycle.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] http-backend: allow empty CONTENT_LENGTH
  2018-09-10 16:37               ` Junio C Hamano
@ 2018-09-10 18:46                 ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-09-10 18:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Max Kirillov, git, Jelmer Vernooij,
	Florian Manschwetus

On Mon, Sep 10, 2018 at 09:37:20AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But that couldn't have been what older versions were doing, since they
> > never looked at CONTENT_LENGTH at all, and instead always read to EOF.
> > So presumably the original problem wasn't that we tried to read a body,
> > but that the empty string caused git_parse_ssize_t to report failure,
> > and we called die(). Which probably should be explained by 574c513e8d
> > (http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
> > late for that.
> >
> > So after that patch, we really do have the original behavior, and that's
> > enough for v2.19.
> 
> To recap to make sure I am following it correctly:
> 
>  - pay attention to content-length when it is clearly given with a
>    byte count, which is an improvement over v2.18
> 
>  - mimick what we have been doing until now when content-length is
>    missing or set to an empty string, so we are regression free and
>    bug-to-bug compatible relative to v2.18 in these two cases.

That (and what you wrote below) matches my current understanding, too.
Though I did already admit to being confused. ;)

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3] http-backend: allow empty CONTENT_LENGTH
  2018-09-10  5:17             ` Jonathan Nieder
@ 2018-09-10 20:36               ` Max Kirillov
  2018-09-11  4:06                 ` Jonathan Nieder
  0 siblings, 1 reply; 39+ messages in thread
From: Max Kirillov @ 2018-09-10 20:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

On Sun, Sep 09, 2018 at 10:17:48PM -0700, Jonathan Nieder wrote:
> From: Max Kirillov <max@max630.net>
> Subject: http-backend test: make empty CONTENT_LENGTH test more realistic

Thank you, yes, this is what should have left

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-10  5:25           ` Jonathan Nieder
  2018-09-10 13:17             ` Jeff King
@ 2018-09-10 20:53             ` Max Kirillov
  2018-09-10 21:22               ` Jonathan Nieder
                                 ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-10 20:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Jeff King, Jelmer Vernooij, Florian Manschwetus, Max Kirillov

From: Jeff King <peff@peff.net>
Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

There is no known case where empty body it used by a server as
instruction to read until EOF, so there is no need to violate the RFC.
Make get_content_length() return 0 in this case.

Currently there is no practical difference, as the GET request
where it can be empty is handled without actual reading the body
(in get_info_refs() function), but it is better to stick to the correct
behavior.

Signed-off-by: Max Kirillov <max@max630.net>
---
The incremental. Hopefully I described the reason right. Needs "signed-off-by"
 http-backend.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 458642ef72..ea36a52118 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && *str && !git_parse_ssize_t(str, &val))
-		die("failed to parse CONTENT_LENGTH: %s", str);
+	if (!str) {
+		/*
+		 * RFC3875 says this must mean "no body", but in practice we
+		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
+		 * caller to read until EOF.
+		 */
+		val = -1;
+	} else if (!*str) {
+		/*
+		 * An empty length should be treated as "no body" according to
+		 * RFC3875, and this seems to hold in practice.
+		 */
+		val = 0;
+	} else {
+		/*
+		 * We have a non-empty CONTENT_LENGTH; trust what's in it as long
+		 * as it can be parsed.
+		 */
+		if (!git_parse_ssize_t(str, &val))
+			die("failed to parse CONTENT_LENGTH: '%s'", str);
+	}
+
 	return val;
 }
 
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
@ 2018-09-10 21:22               ` Jonathan Nieder
  2018-09-11  1:55                 ` Jeff King
  2018-09-11  1:58               ` Jeff King
  2018-09-11  3:42               ` [PATCH] http-backend: treat " Jonathan Nieder
  2 siblings, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-10 21:22 UTC (permalink / raw)
  To: Max Kirillov; +Cc: git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Hi,

Max Kirillov wrote:

> From: Jeff King <peff@peff.net>
> Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero

micronit: s/Treat/treat/

> There is no known case where empty body it used by a server as
> instruction to read until EOF, so there is no need to violate the RFC.
> Make get_content_length() return 0 in this case.
>
> Currently there is no practical difference, as the GET request
> where it can be empty is handled without actual reading the body
> (in get_info_refs() function), but it is better to stick to the correct
> behavior.
>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> The incremental. Hopefully I described the reason right. Needs "signed-off-by"

Thanks.  I am wondering if we should go all the way and do

	ssize_t val;
	const char *str = getenv("CONTENT_LENGTH");

	if (!str || !*str)
		return 0;
	if (!git_parse_ssize_t(str, &val))
		die(...);
	return val;

That would match the RFC, but it seems to make t5510-fetch.sh hang,
right after

  ok 165 - --negotiation-tip understands abbreviated SHA-1

When I run with -v -i -x, it stalls at

  ++ git -C '/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2
  Deleted tag 'alpha_1' (was a84e4a9)
  Deleted tag 'alpha_2' (was 7dd5cf4)
  Deleted tag 'beta_1' (was bcb5c65)
  Deleted tag 'beta_2' (was d3b6dcd)
  +++ pwd
  ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/trace'
  ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 origin alpha_s beta_s

Do you know why?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-10 21:22               ` Jonathan Nieder
@ 2018-09-11  1:55                 ` Jeff King
  2018-09-11  2:20                   ` Jonathan Nieder
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-09-11  1:55 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jelmer Vernooij, Florian Manschwetus

On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:

> Thanks.  I am wondering if we should go all the way and do
> 
> 	ssize_t val;
> 	const char *str = getenv("CONTENT_LENGTH");
> 
> 	if (!str || !*str)
> 		return 0;
> 	if (!git_parse_ssize_t(str, &val))
> 		die(...);
> 	return val;
> 
> That would match the RFC, but it seems to make t5510-fetch.sh hang,
> right after
> 
>   ok 165 - --negotiation-tip understands abbreviated SHA-1
> 
> When I run with -v -i -x, it stalls at
> 
>   ++ git -C '/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/httpd/www/server' tag -d alpha_1 alpha_2 beta_1 beta_2
>   Deleted tag 'alpha_1' (was a84e4a9)
>   Deleted tag 'alpha_2' (was 7dd5cf4)
>   Deleted tag 'beta_1' (was bcb5c65)
>   Deleted tag 'beta_2' (was d3b6dcd)
>   +++ pwd
>   ++ GIT_TRACE_PACKET='/usr/local/google/home/jrn/src/git/t/trash directory.t5510-fetch/trace'
>   ++ git -C client fetch --negotiation-tip=alpha_1 --negotiation-tip=beta_1 origin alpha_s beta_s
> 
> Do you know why?

Yes. :)

It's due to this comment in the patch you are replying to:

+       if (!str) {
+               /*
+                * RFC3875 says this must mean "no body", but in practice we
+                * receive chunked encodings with no CONTENT_LENGTH. Tell the
+                * caller to read until EOF.
+                */
+               val = -1;

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
  2018-09-10 21:22               ` Jonathan Nieder
@ 2018-09-11  1:58               ` Jeff King
  2018-09-11  3:42               ` [PATCH] http-backend: treat " Jonathan Nieder
  2 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-09-11  1:58 UTC (permalink / raw)
  To: Max Kirillov
  Cc: Jonathan Nieder, git, Jelmer Vernooij, Florian Manschwetus

On Mon, Sep 10, 2018 at 11:53:59PM +0300, Max Kirillov wrote:

> From: Jeff King <peff@peff.net>
> Subject: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
> 
> There is no known case where empty body it used by a server as
> instruction to read until EOF, so there is no need to violate the RFC.
> Make get_content_length() return 0 in this case.
> 
> Currently there is no practical difference, as the GET request
> where it can be empty is handled without actual reading the body
> (in get_info_refs() function), but it is better to stick to the correct
> behavior.

There could be a difference if there is a server which actually sets
CONTENT_LENGTH to the empty string for a chunked body. But we don't know
of any such server at this point.

> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> The incremental. Hopefully I described the reason right. Needs "signed-off-by"

Certainly this is:

  Signed-off-by: Jeff King <peff@peff.net>

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-11  1:55                 ` Jeff King
@ 2018-09-11  2:20                   ` Jonathan Nieder
  2018-09-11  2:30                     ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-11  2:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Max Kirillov, git, Jelmer Vernooij, Florian Manschwetus

Jeff King wrote:
> On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:

>> Thanks.  I am wondering if we should go all the way and do
>>
>> 	ssize_t val;
>> 	const char *str = getenv("CONTENT_LENGTH");
>>
>> 	if (!str || !*str)
>> 		return 0;
>> 	if (!git_parse_ssize_t(str, &val))
>> 		die(...);
>> 	return val;
>>
>> That would match the RFC, but it seems to make t5510-fetch.sh hang,
[...]
>> Do you know why?
>
> Yes. :)
>
> It's due to this comment in the patch you are replying to:
>
> +       if (!str) {
> +               /*
> +                * RFC3875 says this must mean "no body", but in practice we
> +                * receive chunked encodings with no CONTENT_LENGTH. Tell the
> +                * caller to read until EOF.
> +                */
> +               val = -1;

Ah!  So "in practice" includes "in Apache".  An old discussion[1] on
Apache's httpd-users list agrees.

The question then becomes: what does IIS do for zero-length requests?
Does any other web server fail to support "read until EOF" in general?

The CGI standard does not cover chunked encoding so we can't lean on
the standard for advice.  It's not clear to me yet whether this patch
improves on what's in "master".

Thanks,
Jonathan

[1] http://mail-archives.apache.org/mod_mbox/httpd-users/200909.mbox/%3C4AAACC38.3070200@rowe-clan.net%3E

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero
  2018-09-11  2:20                   ` Jonathan Nieder
@ 2018-09-11  2:30                     ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-09-11  2:30 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jelmer Vernooij, Florian Manschwetus

On Mon, Sep 10, 2018 at 07:20:28PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Mon, Sep 10, 2018 at 02:22:21PM -0700, Jonathan Nieder wrote:
> 
> >> Thanks.  I am wondering if we should go all the way and do
> >>
> >> 	ssize_t val;
> >> 	const char *str = getenv("CONTENT_LENGTH");
> >>
> >> 	if (!str || !*str)
> >> 		return 0;
> >> 	if (!git_parse_ssize_t(str, &val))
> >> 		die(...);
> >> 	return val;
> >>
> >> That would match the RFC, but it seems to make t5510-fetch.sh hang,
> [...]
> >> Do you know why?
> >
> > Yes. :)
> >
> > It's due to this comment in the patch you are replying to:
> >
> > +       if (!str) {
> > +               /*
> > +                * RFC3875 says this must mean "no body", but in practice we
> > +                * receive chunked encodings with no CONTENT_LENGTH. Tell the
> > +                * caller to read until EOF.
> > +                */
> > +               val = -1;
> 
> Ah!  So "in practice" includes "in Apache".  An old discussion[1] on
> Apache's httpd-users list agrees.
> 
> The question then becomes: what does IIS do for zero-length requests?
> Does any other web server fail to support "read until EOF" in general?
> 
> The CGI standard does not cover chunked encoding so we can't lean on
> the standard for advice.  It's not clear to me yet whether this patch
> improves on what's in "master".

I'd note that the case in question (no CONTENT_LENGTH at all) is not
changed between this patch and master. It's only the case of
CONTENT_LENGTH set to an empty string. But I agree that it is not clear
to me whether it is actually improving anything in practice.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
  2018-09-10 21:22               ` Jonathan Nieder
  2018-09-11  1:58               ` Jeff King
@ 2018-09-11  3:42               ` Jonathan Nieder
  2018-09-11  4:03                 ` Jonathan Nieder
  2018-09-11  4:18                 ` Junio C Hamano
  2 siblings, 2 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-11  3:42 UTC (permalink / raw)
  To: Max Kirillov; +Cc: git, Jeff King, Jelmer Vernooij, Florian Manschwetus

As discussed in v2.19.0-rc0~45^2~2 (http-backend: respect
CONTENT_LENGTH as specified by rfc3875, 2018-06-10), HTTP servers such
as IIS do not close a CGI script's standard input at the end of a
request, instead expecting CGI scripts to stop reading after
CONTENT_LENGTH bytes.  That commit taught http-backend to respect this
convention except when CONTENT_LENGTH is unset, in which case it
preserved the previous behavior of reading until EOF.

RFC 3875 (the CGI specification) explains:

   The CONTENT_LENGTH variable contains the size of the message-body
   attached to the request, if any, in decimal number of octets.  If no
   data is attached, then NULL (or unset).

      CONTENT_LENGTH = "" | 1*digit

And:

   This specification does not distinguish between zero-length (NULL)
   values and missing values.

But that specification was written before HTTP/1.1 and chunked
encoding.  With chunked encoding, the length of a request is not known
early and it is useful to start a CGI script to process it anyway, so
Apache and many other servers violate the spec: they leave
CONTENT_LENGTH unset and rely on EOF to indicate the end of request.
This is reproducible using t5510-fetch.sh, which hangs if http-backend
is patched to treat a missing CONTENT_LENGTH as zero.

So we are in a bind: to support HTTP servers that don't produce EOF,
http-backend should respect an unset or empty CONTENT_LENGTH that
represents zero, and to support chunked encoding, http-backend should
respect an unset CONTENT_LENGTH that represents "read until EOF".

Fortunately, there's a way out.  Use the HTTP_TRANSFER_ENCODING
environment variable to distinguish the two cases.

Reported-by: Jeff King <peff@peff.net>
Helped-by: Max Kirillov <max@max630.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
How about this?

 http-backend.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 458642ef72..7902eeb0b3 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char **o
 
 static ssize_t get_content_length(void)
 {
-	ssize_t val = -1;
+	ssize_t val;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && *str && !git_parse_ssize_t(str, &val))
+	if (!str || !*str) {
+		/*
+		 * According to RFC 3875, an empty or missing
+		 * CONTENT_LENGTH means "no body", but RFC 3875
+		 * precedes HTTP/1.1 and chunked encoding. Apache and
+		 * its imitators leave CONTENT_LENGTH unset for
+		 * chunked requests, for which we should use EOF to
+		 * detect the end of the request.
+		 */
+		str = getenv("HTTP_TRANSFER_ENCODING");
+		if (str && !strcmp(str, "chunked"))
+			return -1;
+
+		return 0;
+	}
+	if (!git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
-- 
2.19.0.397.gdd90340f6a


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11  3:42               ` [PATCH] http-backend: treat " Jonathan Nieder
@ 2018-09-11  4:03                 ` Jonathan Nieder
  2018-09-11 18:15                   ` Junio C Hamano
  2018-09-11  4:18                 ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-11  4:03 UTC (permalink / raw)
  To: Max Kirillov; +Cc: git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Kicking off the reviews: ;-)

Jonathan Nieder wrote:

> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char **o
>  
>  static ssize_t get_content_length(void)
[...]
> +		/*
> +		 * According to RFC 3875, an empty or missing
> +		 * CONTENT_LENGTH means "no body", but RFC 3875
> +		 * precedes HTTP/1.1 and chunked encoding. Apache and
> +		 * its imitators leave CONTENT_LENGTH unset for

Which imitators?  Maybe this should just say "Apache leaves [...]".

> +		 * chunked requests, for which we should use EOF to
> +		 * detect the end of the request.
> +		 */
> +		str = getenv("HTTP_TRANSFER_ENCODING");
> +		if (str && !strcmp(str, "chunked"))

RFC 2616 says Transfer-Encoding is a list of transfer-codings applied,
in the order that they were applied, and that "chunked" is always
applied last.  That means a transfer-encoding like

	Transfer-Encoding: identity chunked

would be permitted, or e.g.

	Transfer-Encoding: gzip chunked

Does that means we should be using a check like

	str && (!strcmp(str, "chunked") || ends_with(str, " chunked"))

?

That said, a quick search of codesearch.debian.net mostly finds
examples using straight comparison, so maybe the patch is fine as-is.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3] http-backend: allow empty CONTENT_LENGTH
  2018-09-10 20:36               ` Max Kirillov
@ 2018-09-11  4:06                 ` Jonathan Nieder
  2018-09-11 20:33                   ` [PATCH v2] http-backend test: make empty CONTENT_LENGTH test more realistic Max Kirillov
  0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-11  4:06 UTC (permalink / raw)
  To: Max Kirillov; +Cc: Jelmer Vernooij, git, Jeff King

Max Kirillov wrote:
> On Sun, Sep 09, 2018 at 10:17:48PM -0700, Jonathan Nieder wrote:

>> From: Max Kirillov <max@max630.net>
>> Subject: http-backend test: make empty CONTENT_LENGTH test more realistic
>
> Thank you, yes, this is what should have left

Oh, tying up this loose end: do you know why the test passes without
574c513e8d (http-backend: allow empty CONTENT_LENGTH, 2018-09-10)?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11  3:42               ` [PATCH] http-backend: treat " Jonathan Nieder
  2018-09-11  4:03                 ` Jonathan Nieder
@ 2018-09-11  4:18                 ` Junio C Hamano
  2018-09-11  4:29                   ` Jonathan Nieder
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11  4:18 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Jonathan Nieder <jrnieder@gmail.com> writes:

> RFC 3875 (the CGI specification) explains:
>
>    The CONTENT_LENGTH variable contains the size of the message-body
>    attached to the request, if any, in decimal number of octets.  If no
>    data is attached, then NULL (or unset).
>
>       CONTENT_LENGTH = "" | 1*digit
>
> And:
>
>    This specification does not distinguish between zero-length (NULL)
>    values and missing values.
>
> But that specification was written before HTTP/1.1 and chunked
> encoding.

RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from
June 1999; I presume that 3875 only writes down what has already
been an established practice and that is where the date discrepancy
comes from.

This is a bit old but some of those who participated in the
discussion archived at https://lists.gt.net/apache/users/373042
seemed to know what they were talking about.  In short, lack of
content-length for CGI scripts driven by Apache seems to mean that
the content length is unknown and we are expected to read through to
the eof, and we are expected to ignore the CGI spec, which says
missing CONTENT_LENGTH and CONTENT_LENGTH set to an empty string
both must mean there is no message body.  Instead, we need to take
the former as a sign to read through to the end.

> Fortunately, there's a way out.  Use the HTTP_TRANSFER_ENCODING
> environment variable to distinguish the two cases.

Cute.

I'm anxious to learn how well this works in practice. Or is this a
trick you know somebody else's system already uses (in which case,
that's wonderful)?

> +	if (!str || !*str) {
> +		/*
> +		 * According to RFC 3875, an empty or missing
> +		 * CONTENT_LENGTH means "no body", but RFC 3875
> +		 * precedes HTTP/1.1 and chunked encoding. Apache and
> +		 * its imitators leave CONTENT_LENGTH unset for
> +		 * chunked requests, for which we should use EOF to
> +		 * detect the end of the request.
> +		 */
> +		str = getenv("HTTP_TRANSFER_ENCODING");
> +		if (str && !strcmp(str, "chunked"))
> +			return -1;
> +
> +		return 0;
> +	}
> +	if (!git_parse_ssize_t(str, &val))
>  		die("failed to parse CONTENT_LENGTH: %s", str);
>  	return val;
>  }

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11  4:18                 ` Junio C Hamano
@ 2018-09-11  4:29                   ` Jonathan Nieder
  0 siblings, 0 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-11  4:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Max Kirillov, git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> RFC 3875 (the CGI specification) explains:
>>
>>    The CONTENT_LENGTH variable contains the size of the message-body
>>    attached to the request, if any, in decimal number of octets.  If no
>>    data is attached, then NULL (or unset).
[...]
>> But that specification was written before HTTP/1.1 and chunked
>> encoding.
>
> RFC 3875 is from October 2004, while RFC 2616 (HTTP/1.1) is from
> June 1999; I presume that 3875 only writes down what has already
> been an established practice and that is where the date discrepancy
> comes from.

Yes, CGI 1.1 is from 1995.  More details are at
https://www.w3.org/CGI/.

[...]
>> Fortunately, there's a way out.  Use the HTTP_TRANSFER_ENCODING
>> environment variable to distinguish the two cases.
>
> Cute.
>
> I'm anxious to learn how well this works in practice. Or is this a
> trick you know somebody else's system already uses (in which case,
> that's wonderful)?

Alas, I came up with it today so I don't know yet how well it will
work in practice.

I can poke around a little tomorrow in Apache to sanity-check the
approach.  Results from anyone able to test using various HTTP servers
(lighttpd, etc) would also be very welcome.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11  4:03                 ` Jonathan Nieder
@ 2018-09-11 18:15                   ` Junio C Hamano
  2018-09-11 18:27                     ` Junio C Hamano
  2018-09-12  5:56                     ` Jeff King
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 18:15 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Jonathan Nieder <jrnieder@gmail.com> writes:

> Kicking off the reviews: ;-)
>
> Jonathan Nieder wrote:
>
>> --- a/http-backend.c
>> +++ b/http-backend.c
>> @@ -350,10 +350,25 @@ static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char **o
>>  
>>  static ssize_t get_content_length(void)
> [...]
>> +		/*
>> +		 * According to RFC 3875, an empty or missing
>> +		 * CONTENT_LENGTH means "no body", but RFC 3875
>> +		 * precedes HTTP/1.1 and chunked encoding. Apache and
>> +		 * its imitators leave CONTENT_LENGTH unset for
>
> Which imitators?  Maybe this should just say "Apache leaves [...]".

I tend to agree; I do not mind amending the text while queuing.

>> +		 * chunked requests, for which we should use EOF to
>> +		 * detect the end of the request.
>> +		 */
>> +		str = getenv("HTTP_TRANSFER_ENCODING");
>> +		if (str && !strcmp(str, "chunked"))
>
> RFC 2616 says Transfer-Encoding is a list of transfer-codings applied,
> in the order that they were applied, and that "chunked" is always
> applied last.  That means a transfer-encoding like
>
> 	Transfer-Encoding: identity chunked
>
> would be permitted, or e.g.
>
> 	Transfer-Encoding: gzip chunked
>
> Does that means we should be using a check like
>
> 	str && (!strcmp(str, "chunked") || ends_with(str, " chunked"))
>
> ?

Hmph, that's 

	"Transfer-Encoding" ":" 1#transfer-coding

where #rule is

   #rule
      A construct "#" is defined, similar to "*", for defining lists of
      elements. The full form is "<n>#<m>element" indicating at least
      <n> and at most <m> elements, each separated by one or more commas
      (",") and OPTIONAL linear white space (LWS). This makes the usual
      form of lists very easy; a rule such as
         ( *LWS element *( *LWS "," *LWS element ))
      can be shown as
         1#element

So

 - you need to account for comma
 - your LWS may not be a SP

if you want to handle gzipped stream coming in a chunked form, I
think.

Unless I am missing the rule in CGI spec that is used to transform
the value on the Transfer-Encoding header to HTTP_TRANSFER_ENCODING
environment variable, that is.

> That said, a quick search of codesearch.debian.net mostly finds
> examples using straight comparison, so maybe the patch is fine as-is.

I do not think we would mind terribly if we do not support
combinations like gzipped-and-then-chunked from day one.  An in-code
NEEDSWORK comment that refers to the production in RFC 2616 Page 143
may not hurt, though.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11 18:15                   ` Junio C Hamano
@ 2018-09-11 18:27                     ` Junio C Hamano
  2018-09-12  5:56                     ` Jeff King
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-11 18:27 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Max Kirillov, git, Jeff King, Jelmer Vernooij, Florian Manschwetus

Junio C Hamano <gitster@pobox.com> writes:

>>> +		/*
>>> +		 * According to RFC 3875, an empty or missing
>>> +		 * CONTENT_LENGTH means "no body", but RFC 3875
>>> +		 * precedes HTTP/1.1 and chunked encoding. Apache and
>>> +		 * its imitators leave CONTENT_LENGTH unset for
>>
>> Which imitators?  Maybe this should just say "Apache leaves [...]".
>
> I tend to agree; I do not mind amending the text while queuing.
> ...
> I do not think we would mind terribly if we do not support
> combinations like gzipped-and-then-chunked from day one.  An in-code
> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
> may not hurt, though.

I refrained from reflowing the first paragraph of the comment in
this message, but will probably reflow it before committing, if the
updated text is acceptable.


 http-backend.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index 8f515a6def..b997eafb00 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -357,10 +357,17 @@ static ssize_t get_content_length(void)
 		/*
 		 * According to RFC 3875, an empty or missing
 		 * CONTENT_LENGTH means "no body", but RFC 3875
-		 * precedes HTTP/1.1 and chunked encoding. Apache and
-		 * its imitators leave CONTENT_LENGTH unset for
+		 * precedes HTTP/1.1 and chunked encoding. Apache
+		 * leaves CONTENT_LENGTH unset for
 		 * chunked requests, for which we should use EOF to
 		 * detect the end of the request.
+		 *
+		 * NEEDSWORK: Transfer-Encoding header is defined to
+		 * be a list of elements where "chunked", if exists,
+		 * must be at the end.  The current code only deals
+		 * with the case where "chunked" is the only element.
+		 * See RFC 2616 (14.41 Transfer-Encoding) when
+		 * extending this code.
 		 */
 		str = getenv("HTTP_TRANSFER_ENCODING");
 		if (str && !strcmp(str, "chunked"))

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2] http-backend test: make empty CONTENT_LENGTH test more realistic
  2018-09-11  4:06                 ` Jonathan Nieder
@ 2018-09-11 20:33                   ` Max Kirillov
  0 siblings, 0 replies; 39+ messages in thread
From: Max Kirillov @ 2018-09-11 20:33 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Max Kirillov, Jelmer Vernooij, git, Jeff King

This is a test of smart HTTP, so it should use the smart HTTP endpoints
(e.g. /info/refs?service=git-receive-pack), not dumb HTTP (HEAD).

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Max Kirillov <max@max630.net>
---
>  do you know why the test passes without 574c513e8d (http-backend: allow empty CONTENT_LENGTH, 2018-09-10)?

Because I did not know what is QUERY_STRING (it is the part after "?"). Fixed now
(Somehow I did see the failure during development. Maybe there was another parameter before?)

I suspect it is not OK in other places of the test, but I hope it should not affect the result
 t/t5562-http-backend-content-length.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index f94d01f69e..b24d8b05a4 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -155,8 +155,8 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 
 test_expect_success 'empty CONTENT_LENGTH' '
 	env \
-		QUERY_STRING=/repo.git/HEAD \
-		PATH_TRANSLATED="$PWD"/.git/HEAD \
+		QUERY_STRING="service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
 		GIT_HTTP_EXPORT_ALL=TRUE \
 		REQUEST_METHOD=GET \
 		CONTENT_LENGTH="" \
-- 
2.17.0.1185.g782057d875


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-11 18:15                   ` Junio C Hamano
  2018-09-11 18:27                     ` Junio C Hamano
@ 2018-09-12  5:56                     ` Jeff King
  2018-09-12  6:26                       ` Jonathan Nieder
  2018-09-12 16:10                       ` Junio C Hamano
  1 sibling, 2 replies; 39+ messages in thread
From: Jeff King @ 2018-09-12  5:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Max Kirillov, git, Jelmer Vernooij,
	Florian Manschwetus

On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:

> > That said, a quick search of codesearch.debian.net mostly finds
> > examples using straight comparison, so maybe the patch is fine as-is.
> 
> I do not think we would mind terribly if we do not support
> combinations like gzipped-and-then-chunked from day one.  An in-code
> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
> may not hurt, though.

It's pretty common for Git to send gzip'd contents, so this might
actually be necessary on day one. However, it looks like we do so by
setting the content-encoding header.

I really wonder if this topic is worth pursuing further without finding
a real-world case that actually fails with the v2.19 code. I.e., is
there actually a server that doesn't set CONTENT_LENGTH and really can't
handle read-to-eof? It's plausible to me, but it's also equally
plausible that we'd be breaking some other case.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-12  5:56                     ` Jeff King
@ 2018-09-12  6:26                       ` Jonathan Nieder
  2018-09-12 16:10                       ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Jonathan Nieder @ 2018-09-12  6:26 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Max Kirillov, git, Jelmer Vernooij,
	Florian Manschwetus

Jeff King wrote:
> On Tue, Sep 11, 2018 at 11:15:13AM -0700, Junio C Hamano wrote:

>> I do not think we would mind terribly if we do not support
>> combinations like gzipped-and-then-chunked from day one.  An in-code
>> NEEDSWORK comment that refers to the production in RFC 2616 Page 143
>> may not hurt, though.
>
> It's pretty common for Git to send gzip'd contents, so this might
> actually be necessary on day one. However, it looks like we do so by
> setting the content-encoding header.

Correct, we haven't been using Transfer-Encoding for that.

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

I wonder about the motivating IIS case.  The CGI spec says that
CONTENT_LENGTH is set if and only if the message has a message-body.
When discussing message-body, it says

      Request-Data   = [ request-body ] [ extension-data ]
[...]
   A request-body is supplied with the request if the CONTENT_LENGTH is
   not NULL.  The server MUST make at least that many bytes available
   for the script to read.  The server MAY signal an end-of-file
   condition after CONTENT_LENGTH bytes have been read or it MAY supply
   extension data.  Therefore, the script MUST NOT attempt to read more
   than CONTENT_LENGTH bytes, even if more data is available.

Does that mean that if CONTENT_LENGTH is not set, then we are
guaranteed to see EOF, because extension-data cannot be present?  If
so, then what we have in v2.19 (plus Max's test improvement that is in
"next") is already enough.

So I agree.

 1. Junio, please eject this patch from "pu", since we don't have any
    need for it.

 2. IIS users, please test v2.19 and let us know how it goes.

Do we have any scenarios that would use an empty POST (or other
non-GET) request?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] http-backend: treat empty CONTENT_LENGTH as zero
  2018-09-12  5:56                     ` Jeff King
  2018-09-12  6:26                       ` Jonathan Nieder
@ 2018-09-12 16:10                       ` Junio C Hamano
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-09-12 16:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Max Kirillov, git, Jelmer Vernooij,
	Florian Manschwetus

Jeff King <peff@peff.net> writes:

> I really wonder if this topic is worth pursuing further without finding
> a real-world case that actually fails with the v2.19 code. I.e., is
> there actually a server that doesn't set CONTENT_LENGTH and really can't
> handle read-to-eof? It's plausible to me, but it's also equally
> plausible that we'd be breaking some other case.

OK.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-09-12 16:10 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <f12bc1d7-6acb-6ad9-2917-fbb09105f87a@debian.org>
     [not found] ` <20180905202613.GA20473@blodeuwedd>
2018-09-06  6:10   ` CONTENT_LENGTH can no longer be empty Jonathan Nieder
2018-09-06 19:35     ` [PATCH] http-backend: allow empty CONTENT_LENGTH Max Kirillov
2018-09-06 21:54       ` Junio C Hamano
2018-09-07  3:27         ` Max Kirillov
2018-09-07  3:38           ` Jeff King
2018-09-07  4:20             ` Max Kirillov
2018-09-07  4:59             ` Max Kirillov
2018-09-07  9:49               ` Junio C Hamano
2018-09-08  5:41                 ` Max Kirillov
2018-09-09  4:40                 ` Max Kirillov
2018-09-06 22:45       ` Jonathan Nieder
2018-09-07  3:36       ` [PATCH v2] " Max Kirillov
2018-09-08  0:19         ` Jonathan Nieder
2018-09-08  5:35           ` Max Kirillov
2018-09-08  5:42           ` [PATCH v3] " Max Kirillov
2018-09-10  5:17             ` Jonathan Nieder
2018-09-10 20:36               ` Max Kirillov
2018-09-11  4:06                 ` Jonathan Nieder
2018-09-11 20:33                   ` [PATCH v2] http-backend test: make empty CONTENT_LENGTH test more realistic Max Kirillov
2018-09-09  4:10         ` [PATCH v4] http-backend: allow empty CONTENT_LENGTH Max Kirillov
2018-09-10  5:25           ` Jonathan Nieder
2018-09-10 13:17             ` Jeff King
2018-09-10 16:37               ` Junio C Hamano
2018-09-10 18:46                 ` Jeff King
2018-09-10 20:53             ` [PATCH] http-backend: Treat empty CONTENT_LENGTH as zero Max Kirillov
2018-09-10 21:22               ` Jonathan Nieder
2018-09-11  1:55                 ` Jeff King
2018-09-11  2:20                   ` Jonathan Nieder
2018-09-11  2:30                     ` Jeff King
2018-09-11  1:58               ` Jeff King
2018-09-11  3:42               ` [PATCH] http-backend: treat " Jonathan Nieder
2018-09-11  4:03                 ` Jonathan Nieder
2018-09-11 18:15                   ` Junio C Hamano
2018-09-11 18:27                     ` Junio C Hamano
2018-09-12  5:56                     ` Jeff King
2018-09-12  6:26                       ` Jonathan Nieder
2018-09-12 16:10                       ` Junio C Hamano
2018-09-11  4:18                 ` Junio C Hamano
2018-09-11  4:29                   ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).