All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] http.c: use CURLOPT_RANGE for range requests
@ 2015-10-30 22:54 David Turner
  2015-10-30 23:13 ` Junio C Hamano
  2015-10-31  0:08 ` Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: David Turner @ 2015-10-30 22:54 UTC (permalink / raw)
  To: git; +Cc: David Turner

A HTTP server is permitted to return a non-range response to a HTTP
range request (and Apache httpd in fact does this in some cases).
While libcurl knows how to correctly handle this (by skipping bytes
before and after the requested range), it only turns on this handling
if it is aware that a range request is being made.  By manually
setting the range header instead of using CURLOPT_RANGE, we were
hiding the fact that this was a range request from libcurl.  This
could cause corruption.

Signed-off-by: David Turner <dturner@twopensource.com>
---

This version applies on top of pu.  It also catches all of the range
requests instead of just the one that I happened to notice.

---
 http.c | 24 ++++++++----------------
 http.h |  1 -
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/http.c b/http.c
index 6b89dea..16610b9 100644
--- a/http.c
+++ b/http.c
@@ -30,7 +30,7 @@ static CURL *curl_default;
 #endif
 
 #define PREV_BUF_SIZE 4096
-#define RANGE_HEADER_SIZE 30
+#define RANGE_HEADER_SIZE 17
 
 char curl_errorstr[CURL_ERROR_SIZE];
 
@@ -1213,8 +1213,9 @@ static int http_request(const char *url,
 			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
 					 fwrite);
 			if (posn > 0) {
-				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
-				headers = curl_slist_append(headers, buf.buf);
+				strbuf_addf(&buf, "%ld-", posn);
+				curl_easy_setopt(slot->curl, CURLOPT_RANGE,
+						 &buf.buf);
 				strbuf_reset(&buf);
 			}
 		} else
@@ -1526,10 +1527,6 @@ void release_http_pack_request(struct http_pack_request *preq)
 		fclose(preq->packfile);
 		preq->packfile = NULL;
 	}
-	if (preq->range_header != NULL) {
-		curl_slist_free_all(preq->range_header);
-		preq->range_header = NULL;
-	}
 	preq->slot = NULL;
 	free(preq->url);
 	free(preq);
@@ -1631,10 +1628,8 @@ struct http_pack_request *new_http_pack_request(
 			fprintf(stderr,
 				"Resuming fetch of pack %s at byte %ld\n",
 				sha1_to_hex(target->sha1), prev_posn);
-		xsnprintf(range, sizeof(range), "Range: bytes=%ld-", prev_posn);
-		preq->range_header = curl_slist_append(NULL, range);
-		curl_easy_setopt(preq->slot->curl, CURLOPT_HTTPHEADER,
-			preq->range_header);
+		xsnprintf(range, sizeof(range), "%ld-", prev_posn);
+		curl_easy_setopt(preq->slot->curl, CURLOPT_RANGE, range);
 	}
 
 	return preq;
@@ -1685,7 +1680,6 @@ struct http_object_request *new_http_object_request(const char *base_url,
 	ssize_t prev_read = 0;
 	long prev_posn = 0;
 	char range[RANGE_HEADER_SIZE];
-	struct curl_slist *range_header = NULL;
 	struct http_object_request *freq;
 
 	freq = xcalloc(1, sizeof(*freq));
@@ -1791,10 +1785,8 @@ struct http_object_request *new_http_object_request(const char *base_url,
 			fprintf(stderr,
 				"Resuming fetch of object %s at byte %ld\n",
 				hex, prev_posn);
-		xsnprintf(range, sizeof(range), "Range: bytes=%ld-", prev_posn);
-		range_header = curl_slist_append(range_header, range);
-		curl_easy_setopt(freq->slot->curl,
-				 CURLOPT_HTTPHEADER, range_header);
+		xsnprintf(range, sizeof(range), "%ld-", prev_posn);
+		curl_easy_setopt(freq->slot->curl, CURLOPT_RANGE, range);
 	}
 
 	return freq;
diff --git a/http.h b/http.h
index 49afe39..4f97b60 100644
--- a/http.h
+++ b/http.h
@@ -190,7 +190,6 @@ struct http_pack_request {
 	struct packed_git **lst;
 	FILE *packfile;
 	char tmpfile[PATH_MAX];
-	struct curl_slist *range_header;
 	struct active_request_slot *slot;
 };
 
-- 
2.4.2.691.g714732c-twtrsrc

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-10-30 22:54 [PATCH v2] http.c: use CURLOPT_RANGE for range requests David Turner
@ 2015-10-30 23:13 ` Junio C Hamano
  2015-10-31  0:08 ` Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-10-30 23:13 UTC (permalink / raw)
  To: David Turner; +Cc: git

David Turner <dturner@twopensource.com> writes:

> A HTTP server is permitted to return a non-range response to a HTTP
> range request (and Apache httpd in fact does this in some cases).
> While libcurl knows how to correctly handle this (by skipping bytes
> before and after the requested range), it only turns on this handling
> if it is aware that a range request is being made.  By manually
> setting the range header instead of using CURLOPT_RANGE, we were
> hiding the fact that this was a range request from libcurl.  This
> could cause corruption.

Wow, that looks really bad.

> Signed-off-by: David Turner <dturner@twopensource.com>
> ---
>
> This version applies on top of pu.  It also catches all of the range
> requests instead of just the one that I happened to notice.
>
> ---
>  http.c | 24 ++++++++----------------
>  http.h |  1 -
>  2 files changed, 8 insertions(+), 17 deletions(-)
>
> diff --git a/http.c b/http.c
> index 6b89dea..16610b9 100644
> --- a/http.c
> +++ b/http.c
> @@ -30,7 +30,7 @@ static CURL *curl_default;
>  #endif
>  
>  #define PREV_BUF_SIZE 4096
> -#define RANGE_HEADER_SIZE 30
> +#define RANGE_HEADER_SIZE 17

Was this change necessary and is 17-byte string sufficient for
64-bit longs?

> @@ -1213,8 +1213,9 @@ static int http_request(const char *url,
>  			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
>  					 fwrite);
>  			if (posn > 0) {
> -				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
> -				headers = curl_slist_append(headers, buf.buf);
> +				strbuf_addf(&buf, "%ld-", posn);
> +				curl_easy_setopt(slot->curl, CURLOPT_RANGE,
> +						 &buf.buf);
>  				strbuf_reset(&buf);

Makes me wonder if all the callers have a CURL* and a long so that a
new helper range_opt_ask_remainder(CURL *, long) can make the
resulting code even simpler; we only say "we know what comes before
this position, give us everything from there" in all callers, right?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-10-30 22:54 [PATCH v2] http.c: use CURLOPT_RANGE for range requests David Turner
  2015-10-30 23:13 ` Junio C Hamano
@ 2015-10-31  0:08 ` Jeff King
  2015-10-31 17:40   ` Junio C Hamano
  2015-11-01 23:00   ` Daniel Stenberg
  1 sibling, 2 replies; 8+ messages in thread
From: Jeff King @ 2015-10-31  0:08 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Fri, Oct 30, 2015 at 06:54:42PM -0400, David Turner wrote:

> A HTTP server is permitted to return a non-range response to a HTTP
> range request (and Apache httpd in fact does this in some cases).
> While libcurl knows how to correctly handle this (by skipping bytes
> before and after the requested range), it only turns on this handling
> if it is aware that a range request is being made.  By manually
> setting the range header instead of using CURLOPT_RANGE, we were
> hiding the fact that this was a range request from libcurl.  This
> could cause corruption.

The goal makes sense. Why weren't we using CURLOPT_RANGE before? Did it
not exist (or otherwise have limitations) in 2005, and if so, when did
it become usable? Do we need to protect this with an #ifdef for the curl
version?

> @@ -1213,8 +1213,9 @@ static int http_request(const char *url,
>  			curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION,
>  					 fwrite);
>  			if (posn > 0) {
> -				strbuf_addf(&buf, "Range: bytes=%ld-", posn);
> -				headers = curl_slist_append(headers, buf.buf);
> +				strbuf_addf(&buf, "%ld-", posn);
> +				curl_easy_setopt(slot->curl, CURLOPT_RANGE,
> +						 &buf.buf);
>  				strbuf_reset(&buf);
>  			}
>  		} else

We reuse curl slots from request to request, and curl options may
persist.  The old code appended to the header list, which then gets
added to the curl object with CURLOPT_HTTPHEADER. Subsequent requests,
even if they are not using ranges, would also set CURLOPT_HTTPHEADER,
possibly to NULL, so the range header would not accidentally creep into
the next request.

But with CURLOPT_RANGE, I think the value would persist to the next
request (unless curl automagically forgets it after making the request,
but I don't think it generally does so). I think the cleanest thing
would be to reset it back to NULL in get_active_slot (there are several
other fields that we reset there, too).

> @@ -1685,7 +1680,6 @@ struct http_object_request *new_http_object_request(const char *base_url,
>  	ssize_t prev_read = 0;
>  	long prev_posn = 0;
>  	char range[RANGE_HEADER_SIZE];
> -	struct curl_slist *range_header = NULL;
>  	struct http_object_request *freq;

Junio asked elsewhere if we really need to tweak RANGE_HEADER_SIZE. But
I think we should go even further, and bump the definition much closer
to the point of use[1], and not worry about an arbitrary constant. After
all, we already use sizeof(), so we do not even end up repeating the
constant.

We could even hide the whole thing away with something like:

  void http_set_range(CURL *curl, long lo, long hi)
  {
	char buf[128];
	int len = 0;

	if (lo >= 0)
		len += xsnprintf(buf + len, "%ld", lo);
	len += xsnprintf(buf + len, "-");
	if (hi >= 0)
		len += xsnprintf(buf + len, "%ld", hi);

	curl_easy_setopt(curl, CURLOPT_RANGE, buf);
  }

That would also make it easier to replace if we do need to keep an
#ifdef for older versions of curl. But maybe it is just
over-engineering.

-Peff

[1] Antique versions of curl needed the buffer to remain valid through
    the whole request, but these days it makes its own copy (and even
    under the old regime, I am not sure if the existing code would work
    anyway, as we return the request from this function, and the buffer
    is function-local).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-10-31  0:08 ` Jeff King
@ 2015-10-31 17:40   ` Junio C Hamano
  2015-10-31 17:43     ` Jeff King
  2015-11-01 23:00   ` Daniel Stenberg
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-10-31 17:40 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git

Jeff King <peff@peff.net> writes:

> We could even hide the whole thing away with something like:
>
>   void http_set_range(CURL *curl, long lo, long hi)
>   {
> 	char buf[128];
> 	int len = 0;
>
> 	if (lo >= 0)
> 		len += xsnprintf(buf + len, "%ld", lo);
> 	len += xsnprintf(buf + len, "-");
> 	if (hi >= 0)
> 		len += xsnprintf(buf + len, "%ld", hi);
>
> 	curl_easy_setopt(curl, CURLOPT_RANGE, buf);
>   }
>
> That would also make it easier to replace if we do need to keep an
> #ifdef for older versions of curl. But maybe it is just
> over-engineering.

I personally do not think this is an over-engineered version.  This
is exactly what I had in mind when I alluded to a small helper ;-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-10-31 17:40   ` Junio C Hamano
@ 2015-10-31 17:43     ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2015-10-31 17:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git

On Sat, Oct 31, 2015 at 10:40:13AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We could even hide the whole thing away with something like:
> >
> >   void http_set_range(CURL *curl, long lo, long hi)
> >   {
> > 	char buf[128];
> > 	int len = 0;
> >
> > 	if (lo >= 0)
> > 		len += xsnprintf(buf + len, "%ld", lo);
> > 	len += xsnprintf(buf + len, "-");
> > 	if (hi >= 0)
> > 		len += xsnprintf(buf + len, "%ld", hi);
> >
> > 	curl_easy_setopt(curl, CURLOPT_RANGE, buf);
> >   }
> >
> > That would also make it easier to replace if we do need to keep an
> > #ifdef for older versions of curl. But maybe it is just
> > over-engineering.
> 
> I personally do not think this is an over-engineered version.  This
> is exactly what I had in mind when I alluded to a small helper ;-)

Yeah, I somehow missed your suggestion before writing that. I think the
important thing your example noted is that we would always pass -1 for
"hi" in all the current callers, so we can simplify this quite a bit,
and use only one snprintf.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-10-31  0:08 ` Jeff King
  2015-10-31 17:40   ` Junio C Hamano
@ 2015-11-01 23:00   ` Daniel Stenberg
  2015-11-02  1:51     ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Stenberg @ 2015-11-01 23:00 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git

On Fri, 30 Oct 2015, Jeff King wrote:

> The goal makes sense. Why weren't we using CURLOPT_RANGE before? Did it not 
> exist (or otherwise have limitations) in 2005, and if so, when did it become 
> usable? Do we need to protect this with an #ifdef for the curl version?

CURLOPT_RANGE existed already in the first libcurl release: version 7.1, 
relased in August 2000.

-- 

  / daniel.haxx.se

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-11-01 23:00   ` Daniel Stenberg
@ 2015-11-02  1:51     ` Jeff King
  2015-11-02  7:05       ` Daniel Stenberg
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2015-11-02  1:51 UTC (permalink / raw)
  To: Daniel Stenberg; +Cc: David Turner, git

On Mon, Nov 02, 2015 at 12:00:42AM +0100, Daniel Stenberg wrote:

> On Fri, 30 Oct 2015, Jeff King wrote:
> 
> >The goal makes sense. Why weren't we using CURLOPT_RANGE before? Did it
> >not exist (or otherwise have limitations) in 2005, and if so, when did it
> >become usable? Do we need to protect this with an #ifdef for the curl
> >version?
> 
> CURLOPT_RANGE existed already in the first libcurl release: version 7.1,
> relased in August 2000.

Ah, thanks. I guess we don't have to worry about that, then.

While I have your attention, Daniel, am I correct in assuming that
performing a second unrelated request with the same CURL object will
need an explicit:

  curl_easy_setopt(curl, CURLOPT_RANGE, NULL);

to avoid using the range twice?

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] http.c: use CURLOPT_RANGE for range requests
  2015-11-02  1:51     ` Jeff King
@ 2015-11-02  7:05       ` Daniel Stenberg
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Stenberg @ 2015-11-02  7:05 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git

On Sun, 1 Nov 2015, Jeff King wrote:

> While I have your attention, Daniel, am I correct in assuming that 
> performing a second unrelated request with the same CURL object will need an 
> explicit:
>
>  curl_easy_setopt(curl, CURLOPT_RANGE, NULL);
>
> to avoid using the range twice?

Correct. As the options stick until modified.

-- 

  / daniel.haxx.se

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-11-02  7:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-30 22:54 [PATCH v2] http.c: use CURLOPT_RANGE for range requests David Turner
2015-10-30 23:13 ` Junio C Hamano
2015-10-31  0:08 ` Jeff King
2015-10-31 17:40   ` Junio C Hamano
2015-10-31 17:43     ` Jeff King
2015-11-01 23:00   ` Daniel Stenberg
2015-11-02  1:51     ` Jeff King
2015-11-02  7:05       ` Daniel Stenberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.