* URL decoding changed semantics of + in URLs @ 2010-07-23 13:18 Thomas Rast 2010-07-23 13:21 ` Thomas Rast ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Thomas Rast @ 2010-07-23 13:18 UTC (permalink / raw) To: Jeff King; +Cc: git, jstpierre Hi Jeff As pointed out by Jasper St. Pierre on #git, it is no longer possible to clone git://git.gnome.org/gtk+ because your 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) decodes + characters in URLs to spaces in the http style. It was later fixed by ce83eda (url.c: "<scheme>://" part at the beginning should not be URL decoded, 2010-06-23) but the later part of the url still decodes + as space. The tests that go along with the commit make it clear that it was an intended change. But the interesting thing is, I cannot find any reference in any RFC that + must have this meaning. In particular, http://www.ietf.org/rfc/rfc2396.txt doesn't say much about + and the only escaping defined is the usual %xx style. So is there a standard that mandates this, or was it just a well-meaning but unnecessary backwards incompatible change? -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-23 13:18 URL decoding changed semantics of + in URLs Thomas Rast @ 2010-07-23 13:21 ` Thomas Rast 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason 2010-07-26 15:40 ` URL decoding changed semantics of + " Jeff King 2 siblings, 0 replies; 18+ messages in thread From: Thomas Rast @ 2010-07-23 13:21 UTC (permalink / raw) To: Jeff King; +Cc: git, jstpierre Thomas Rast wrote: > Hi Jeff > > As pointed out by Jasper St. Pierre on #git, it is no longer possible > to clone > > git://git.gnome.org/gtk+ Well, that reads wrong now that I'm seeing it again. I of course meant to say that the above URL no longer works and must be spelled git://git.gnome.org/gtk%2B instead. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-23 13:18 URL decoding changed semantics of + in URLs Thomas Rast 2010-07-23 13:21 ` Thomas Rast @ 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason 2010-07-23 14:25 ` Jasper St. Pierre 2010-07-23 21:23 ` [PATCH] Do not unquote + into ' ' " Thomas Rast 2010-07-26 15:40 ` URL decoding changed semantics of + " Jeff King 2 siblings, 2 replies; 18+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-23 14:10 UTC (permalink / raw) To: Thomas Rast; +Cc: Jeff King, git, jstpierre On Fri, Jul 23, 2010 at 13:18, Thomas Rast <trast@student.ethz.ch> wrote: > doesn't say much about + and the only escaping defined is the usual > %xx style. So is there a standard that mandates this, or was it just > a well-meaning but unnecessary backwards incompatible change? + and %20 are as far as I know only interchangable in *query strings*, so having to clone 'git://git.gnome.org/gtk%2B' where you could previously clone 'git://git.gnome.org/gtk+' is a bug. Git shouldn't be changing that + to a %20. I haven't followed why we need to escape + to %20 at all, even in the query string. E.g. curl(1) doesn't do that before sending requests to Apache, which can handle either one. The + v.s. %20 duality is always handled at the server AFAIK. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason @ 2010-07-23 14:25 ` Jasper St. Pierre 2010-07-23 21:23 ` [PATCH] Do not unquote + into ' ' " Thomas Rast 1 sibling, 0 replies; 18+ messages in thread From: Jasper St. Pierre @ 2010-07-23 14:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Thomas Rast, Jeff King, git Yep. http://www.ietf.org/rfc/rfc2396.txt defines '+' as a reserved character, but doesn't give a purpose for it. www-form-encoded replaces space with '+' but in a URL it can mean anything it wants. On Fri, Jul 23, 2010 at 10:10 AM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Fri, Jul 23, 2010 at 13:18, Thomas Rast <trast@student.ethz.ch> wrote: > >> doesn't say much about + and the only escaping defined is the usual >> %xx style. So is there a standard that mandates this, or was it just >> a well-meaning but unnecessary backwards incompatible change? > > + and %20 are as far as I know only interchangable in *query strings*, > so having to clone 'git://git.gnome.org/gtk%2B' where you could > previously clone 'git://git.gnome.org/gtk+' is a bug. Git shouldn't be > changing that + to a %20. > > I haven't followed why we need to escape + to %20 at all, even in the > query string. E.g. curl(1) doesn't do that before sending requests to > Apache, which can handle either one. The + v.s. %20 duality is always > handled at the server AFAIK. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] Do not unquote + into ' ' in URLs 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason 2010-07-23 14:25 ` Jasper St. Pierre @ 2010-07-23 21:23 ` Thomas Rast 2010-07-23 22:20 ` Ævar Arnfjörð Bjarmason 2010-07-23 22:26 ` Junio C Hamano 1 sibling, 2 replies; 18+ messages in thread From: Thomas Rast @ 2010-07-23 21:23 UTC (permalink / raw) To: Jeff King; +Cc: git, avarab, jstpierre Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL logic unquotes escaped URLs. For the %2B type of escape, this is conformant with RFC 2396. However, it also unquotes + into a space character, which is only appropriate for the query strings in HTTP. This notably broke fetching from the gtk+ repository. Remove the corresponding bit of code. Reported-by: Jasper St. Pierre <jstpierre@mecheye.net> Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- Jasper St. Pierre wrote: > Yep. http://www.ietf.org/rfc/rfc2396.txt defines '+' as a reserved character, > but doesn't give a purpose for it. www-form-encoded replaces space with '+' > but in a URL it can mean anything it wants. So let's do this then, instead? Based on the discussion, I would consider this a bugfix that should go in 1.7.2.1. t/t5601-clone.sh | 10 ++++++++-- url.c | 5 +---- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 8abb71a..4431dfd 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -178,8 +178,14 @@ test_expect_success 'clone respects global branch.autosetuprebase' ' test_expect_success 'respect url-encoding of file://' ' git init x+y && - test_must_fail git clone "file://$PWD/x+y" xy-url && - git clone "file://$PWD/x%2By" xy-url + git clone "file://$PWD/x+y" xy-url-1 && + git clone "file://$PWD/x%2By" xy-url-2 +' + +test_expect_success 'do not query-string-decode + in URLs' ' + rm -rf x+y && + git init "x y" && + test_must_fail git clone "file://$PWD/x+y" xy-no-plus ' test_expect_success 'do not respect url-encoding of non-url path' ' diff --git a/url.c b/url.c index 2306236..fa4b8d4 100644 --- a/url.c +++ b/url.c @@ -90,10 +90,7 @@ static char *url_decode_internal(const char **query, const char *stop_at, struct } } - if (c == '+') - strbuf_addch(out, ' '); - else - strbuf_addch(out, c); + strbuf_addch(out, c); q++; } while (1); *query = q; -- 1.7.2.rc3.335.g26d7d ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] Do not unquote + into ' ' in URLs 2010-07-23 21:23 ` [PATCH] Do not unquote + into ' ' " Thomas Rast @ 2010-07-23 22:20 ` Ævar Arnfjörð Bjarmason 2010-07-23 22:26 ` Junio C Hamano 1 sibling, 0 replies; 18+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-23 22:20 UTC (permalink / raw) To: Thomas Rast; +Cc: Jeff King, git, jstpierre On Fri, Jul 23, 2010 at 21:23, Thomas Rast <trast@student.ethz.ch> wrote: > Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL > logic unquotes escaped URLs. For the %2B type of escape, this is > conformant with RFC 2396. However, it also unquotes + into a space > character, which is only appropriate for the query strings in HTTP. > This notably broke fetching from the gtk+ repository. > > Remove the corresponding bit of code. > > Reported-by: Jasper St. Pierre <jstpierre@mecheye.net> > Signed-off-by: Thomas Rast <trast@student.ethz.ch> > --- > > Jasper St. Pierre wrote: >> Yep. http://www.ietf.org/rfc/rfc2396.txt defines '+' as a reserved character, >> but doesn't give a purpose for it. www-form-encoded replaces space with '+' >> but in a URL it can mean anything it wants. > > So let's do this then, instead? > > Based on the discussion, I would consider this a bugfix that should go > in 1.7.2.1. > > > t/t5601-clone.sh | 10 ++++++++-- > url.c | 5 +---- > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 8abb71a..4431dfd 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -178,8 +178,14 @@ test_expect_success 'clone respects global branch.autosetuprebase' ' > > test_expect_success 'respect url-encoding of file://' ' > git init x+y && > - test_must_fail git clone "file://$PWD/x+y" xy-url && > - git clone "file://$PWD/x%2By" xy-url > + git clone "file://$PWD/x+y" xy-url-1 && > + git clone "file://$PWD/x%2By" xy-url-2 > +' > + > +test_expect_success 'do not query-string-decode + in URLs' ' > + rm -rf x+y && > + git init "x y" && > + test_must_fail git clone "file://$PWD/x+y" xy-no-plus > ' > > test_expect_success 'do not respect url-encoding of non-url path' ' > diff --git a/url.c b/url.c > index 2306236..fa4b8d4 100644 > --- a/url.c > +++ b/url.c > @@ -90,10 +90,7 @@ static char *url_decode_internal(const char **query, const char *stop_at, struct > } > } > > - if (c == '+') > - strbuf_addch(out, ' '); > - else > - strbuf_addch(out, c); > + strbuf_addch(out, c); > q++; > } while (1); > *query = q; > -- > 1.7.2.rc3.335.g26d7d > This looks good, Ack. But as icing, it'd be nice to extend these tests to create files / clone repositories with the rest of the reserved characters: http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters Of course that would have to be wrapped in something that skips the tests if those paths can't be created. E.g. "/" is a no-no on Unix, and some of the others will probably cause troubles on other systems. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Do not unquote + into ' ' in URLs 2010-07-23 21:23 ` [PATCH] Do not unquote + into ' ' " Thomas Rast 2010-07-23 22:20 ` Ævar Arnfjörð Bjarmason @ 2010-07-23 22:26 ` Junio C Hamano 2010-07-23 23:04 ` Thomas Rast 1 sibling, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2010-07-23 22:26 UTC (permalink / raw) To: Thomas Rast; +Cc: Jeff King, git, avarab, jstpierre Thomas Rast <trast@student.ethz.ch> writes: > Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL > logic unquotes escaped URLs. For the %2B type of escape, this is > conformant with RFC 2396. However, it also unquotes + into a space > character, which is only appropriate for the query strings in HTTP. > This notably broke fetching from the gtk+ repository. Wait a minute. > Based on the discussion, I would consider this a bugfix that should go > in 1.7.2.1. Some form of this may need to be applied to help the client side, but what will happen to http-backend.c::get_info_refs() -> http-backend.c::get_parameter() -> http-backend.c::get_parameters() -> url.c::url_decode_parameter_value() -> url.c::url_decode_internal() codepath, which is the server-side handing of query strings? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Do not unquote + into ' ' in URLs 2010-07-23 22:26 ` Junio C Hamano @ 2010-07-23 23:04 ` Thomas Rast 2010-07-24 14:49 ` [PATCH v2] " Thomas Rast 0 siblings, 1 reply; 18+ messages in thread From: Thomas Rast @ 2010-07-23 23:04 UTC (permalink / raw) To: Junio C Hamano, Shawn O. Pearce; +Cc: Jeff King, git, avarab, jstpierre Junio C Hamano wrote: > Thomas Rast <trast@student.ethz.ch> writes: > > > Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL > > logic unquotes escaped URLs. For the %2B type of escape, this is > > conformant with RFC 2396. However, it also unquotes + into a space > > character, which is only appropriate for the query strings in HTTP. > > This notably broke fetching from the gtk+ repository. > > Wait a minute. > > > Based on the discussion, I would consider this a bugfix that should go > > in 1.7.2.1. > > Some form of this may need to be applied to help the client side, but what > will happen to > > http-backend.c::get_info_refs() > -> http-backend.c::get_parameter() > -> http-backend.c::get_parameters() > -> url.c::url_decode_parameter_value() > -> url.c::url_decode_internal() > > codepath, which is the server-side handing of query strings? You're right, I forgot about those. I imagine it would be one of two cases: * It should never have dequoted + before the ? query-delimiter, so we need to fix that too; * It should never have dequoted + before the ? query-delimiter, but it's too late to change that now. Shawn, can you help with this? -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] Do not unquote + into ' ' in URLs 2010-07-23 23:04 ` Thomas Rast @ 2010-07-24 14:49 ` Thomas Rast 2010-07-31 21:18 ` Jasper St. Pierre 0 siblings, 1 reply; 18+ messages in thread From: Thomas Rast @ 2010-07-24 14:49 UTC (permalink / raw) To: git; +Cc: Jeff King, avarab, jstpierre, Shawn O. Pearce, Junio C Hamano Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL logic unquotes escaped URLs. For the %2B type of escape, this is conformant with RFC 2396. However, it also unquotes + into a space character, which is only appropriate for the query strings in HTTP. This notably broke fetching from the gtk+ repository. We cannot just remove the corresponding code since the same url_decode_internal() is also used by the HTTP backend to decode query parameters. Introduce a new argument that controls whether the + decoding happens, and use it only in the (client-side) url_decode(). Reported-by: Jasper St. Pierre <jstpierre@mecheye.net> Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- I wrote: > Junio C Hamano wrote: > > > > http-backend.c::get_info_refs() > > -> http-backend.c::get_parameter() > > -> http-backend.c::get_parameters() > > -> url.c::url_decode_parameter_value() > > -> url.c::url_decode_internal() > > You're right, I forgot about those. I imagine it would be one of two > cases: [...] > Shawn, can you help with this? The third case, of course, is: * It only uses these functions for parameter decoding, which of course was correct to begin with. So after hopefully drinking enough coffee, I made this one. The catch is that I'm not entirely clear whether *not* decoding the + client-side anywhere in the URL is correct for http:// URLs? If the client decodes and re-encodes the URL, then the + would be turned into a %2B on the re-encoding. Then again maybe UI-facing URLs should never have a query part at all? t/t5601-clone.sh | 10 ++++++++-- url.c | 11 ++++++----- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 8abb71a..4431dfd 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -178,8 +178,14 @@ test_expect_success 'clone respects global branch.autosetuprebase' ' test_expect_success 'respect url-encoding of file://' ' git init x+y && - test_must_fail git clone "file://$PWD/x+y" xy-url && - git clone "file://$PWD/x%2By" xy-url + git clone "file://$PWD/x+y" xy-url-1 && + git clone "file://$PWD/x%2By" xy-url-2 +' + +test_expect_success 'do not query-string-decode + in URLs' ' + rm -rf x+y && + git init "x y" && + test_must_fail git clone "file://$PWD/x+y" xy-no-plus ' test_expect_success 'do not respect url-encoding of non-url path' ' diff --git a/url.c b/url.c index 2306236..cd8f74f 100644 --- a/url.c +++ b/url.c @@ -67,7 +67,8 @@ static int url_decode_char(const char *q) return val; } -static char *url_decode_internal(const char **query, const char *stop_at, struct strbuf *out) +static char *url_decode_internal(const char **query, const char *stop_at, + struct strbuf *out, int decode_plus) { const char *q = *query; @@ -90,7 +91,7 @@ static char *url_decode_internal(const char **query, const char *stop_at, struct } } - if (c == '+') + if (decode_plus && c == '+') strbuf_addch(out, ' '); else strbuf_addch(out, c); @@ -110,17 +111,17 @@ char *url_decode(const char *url) strbuf_add(&out, url, colon - url); url = colon; } - return url_decode_internal(&url, NULL, &out); + return url_decode_internal(&url, NULL, &out, 0); } char *url_decode_parameter_name(const char **query) { struct strbuf out = STRBUF_INIT; - return url_decode_internal(query, "&=", &out); + return url_decode_internal(query, "&=", &out, 1); } char *url_decode_parameter_value(const char **query) { struct strbuf out = STRBUF_INIT; - return url_decode_internal(query, "&", &out); + return url_decode_internal(query, "&", &out, 1); } -- 1.7.2.278.g76edd.dirty ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2] Do not unquote + into ' ' in URLs 2010-07-24 14:49 ` [PATCH v2] " Thomas Rast @ 2010-07-31 21:18 ` Jasper St. Pierre 2010-07-31 21:33 ` Thomas Rast 0 siblings, 1 reply; 18+ messages in thread From: Jasper St. Pierre @ 2010-07-31 21:18 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Jeff King, avarab, Shawn O. Pearce, Junio C Hamano Hmm.. could we get a follow-up on this patch? It's annoying because it's both backwards and forwards incompatible: the %2b workaround doesn't work in older versions. On Sat, Jul 24, 2010 at 10:49 AM, Thomas Rast <trast@student.ethz.ch> wrote: > Since 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) the URL > logic unquotes escaped URLs. For the %2B type of escape, this is > conformant with RFC 2396. However, it also unquotes + into a space > character, which is only appropriate for the query strings in HTTP. > This notably broke fetching from the gtk+ repository. > > We cannot just remove the corresponding code since the same > url_decode_internal() is also used by the HTTP backend to decode query > parameters. Introduce a new argument that controls whether the + > decoding happens, and use it only in the (client-side) url_decode(). > > Reported-by: Jasper St. Pierre <jstpierre@mecheye.net> > Signed-off-by: Thomas Rast <trast@student.ethz.ch> > --- > > I wrote: >> Junio C Hamano wrote: >> > >> > http-backend.c::get_info_refs() >> > -> http-backend.c::get_parameter() >> > -> http-backend.c::get_parameters() >> > -> url.c::url_decode_parameter_value() >> > -> url.c::url_decode_internal() >> >> You're right, I forgot about those. I imagine it would be one of two >> cases: > [...] >> Shawn, can you help with this? > > The third case, of course, is: > > * It only uses these functions for parameter decoding, which of course > was correct to begin with. > > So after hopefully drinking enough coffee, I made this one. The catch > is that I'm not entirely clear whether *not* decoding the + > client-side anywhere in the URL is correct for http:// URLs? If the > client decodes and re-encodes the URL, then the + would be turned into > a %2B on the re-encoding. Then again maybe UI-facing URLs should > never have a query part at all? > > > t/t5601-clone.sh | 10 ++++++++-- > url.c | 11 ++++++----- > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 8abb71a..4431dfd 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -178,8 +178,14 @@ test_expect_success 'clone respects global branch.autosetuprebase' ' > > test_expect_success 'respect url-encoding of file://' ' > git init x+y && > - test_must_fail git clone "file://$PWD/x+y" xy-url && > - git clone "file://$PWD/x%2By" xy-url > + git clone "file://$PWD/x+y" xy-url-1 && > + git clone "file://$PWD/x%2By" xy-url-2 > +' > + > +test_expect_success 'do not query-string-decode + in URLs' ' > + rm -rf x+y && > + git init "x y" && > + test_must_fail git clone "file://$PWD/x+y" xy-no-plus > ' > > test_expect_success 'do not respect url-encoding of non-url path' ' > diff --git a/url.c b/url.c > index 2306236..cd8f74f 100644 > --- a/url.c > +++ b/url.c > @@ -67,7 +67,8 @@ static int url_decode_char(const char *q) > return val; > } > > -static char *url_decode_internal(const char **query, const char *stop_at, struct strbuf *out) > +static char *url_decode_internal(const char **query, const char *stop_at, > + struct strbuf *out, int decode_plus) > { > const char *q = *query; > > @@ -90,7 +91,7 @@ static char *url_decode_internal(const char **query, const char *stop_at, struct > } > } > > - if (c == '+') > + if (decode_plus && c == '+') > strbuf_addch(out, ' '); > else > strbuf_addch(out, c); > @@ -110,17 +111,17 @@ char *url_decode(const char *url) > strbuf_add(&out, url, colon - url); > url = colon; > } > - return url_decode_internal(&url, NULL, &out); > + return url_decode_internal(&url, NULL, &out, 0); > } > > char *url_decode_parameter_name(const char **query) > { > struct strbuf out = STRBUF_INIT; > - return url_decode_internal(query, "&=", &out); > + return url_decode_internal(query, "&=", &out, 1); > } > > char *url_decode_parameter_value(const char **query) > { > struct strbuf out = STRBUF_INIT; > - return url_decode_internal(query, "&", &out); > + return url_decode_internal(query, "&", &out, 1); > } > -- > 1.7.2.278.g76edd.dirty > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] Do not unquote + into ' ' in URLs 2010-07-31 21:18 ` Jasper St. Pierre @ 2010-07-31 21:33 ` Thomas Rast 2010-08-06 10:46 ` Ralf Ebert 0 siblings, 1 reply; 18+ messages in thread From: Thomas Rast @ 2010-07-31 21:33 UTC (permalink / raw) To: Jasper St. Pierre; +Cc: git, Jeff King, avarab, Shawn O. Pearce, Junio C Hamano Jasper St. Pierre wrote: > Hmm.. could we get a follow-up on this patch? It's annoying because it's both > backwards and forwards incompatible: the %2b workaround doesn't work in > older versions. I was hoping for an Ack from someone who knows the whole HTTP business, because I don't feel confident enough to say it won't break anything there. To be precise, if the client ever attempts to decode URLs with query strings (for whatever reasons) then it would break. Probably there are no such URLs, but I don't know for sure. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] Do not unquote + into ' ' in URLs 2010-07-31 21:33 ` Thomas Rast @ 2010-08-06 10:46 ` Ralf Ebert 0 siblings, 0 replies; 18+ messages in thread From: Ralf Ebert @ 2010-08-06 10:46 UTC (permalink / raw) To: git Hi, > To be precise, if the client ever attempts to decode URLs with query > strings (for whatever reasons) then it would break. Probably there > are no such URLs, but I don't know for sure. url_decode seems to be called only from connect.c to support percent-encoding for git/ssh-URLs, so at least as of now, there should be no problem here. Since applying the patch, git clone 'git://git.gnome.org/gtk+' works ok again for me. Greetings, Ralf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-23 13:18 URL decoding changed semantics of + in URLs Thomas Rast 2010-07-23 13:21 ` Thomas Rast 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason @ 2010-07-26 15:40 ` Jeff King 2010-07-26 17:57 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 18+ messages in thread From: Jeff King @ 2010-07-26 15:40 UTC (permalink / raw) To: Thomas Rast; +Cc: git, jstpierre On Fri, Jul 23, 2010 at 03:18:30PM +0200, Thomas Rast wrote: > As pointed out by Jasper St. Pierre on #git, it is no longer possible > to clone > > git://git.gnome.org/gtk+ > > because your 9d2e942 (decode file:// and ssh:// URLs, 2010-05-23) > decodes + characters in URLs to spaces in the http style. It was > later fixed by ce83eda (url.c: "<scheme>://" part at the beginning > should not be URL decoded, 2010-06-23) but the later part of the url > still decodes + as space. > > The tests that go along with the commit make it clear that it was an > intended change. But the interesting thing is, I cannot find any > reference in any RFC that + must have this meaning. In particular, > > http://www.ietf.org/rfc/rfc2396.txt > > doesn't say much about + and the only escaping defined is the usual > %xx style. So is there a standard that mandates this, or was it just > a well-meaning but unnecessary backwards incompatible change? Sorry for the slow reply. I am in the middle of moving and just got a usable machine running. I think it is well-meaning but unnecessary. That is, I think %-decoding is probably still a good idea, and I just dragged '+'-decoding along out of cluelessness. I thought I cross-checked with what curl was doing to http URLs, but now I can't even get it to do anything but pass the path component literally to the webserver. So I'm really not sure what I tested before. As Jasper noted, "+" is a reserved character, which means "gtk+" probably _should_ be escaped. But clearly it doesn't happen in practice, and I am more interested in not breaking current use than in nitpicking with a standard. So I agree with the spirit of your patches, and reading over your v2, it looks fine to me, but I didn't do any extensive testing. -Peff ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-26 15:40 ` URL decoding changed semantics of + " Jeff King @ 2010-07-26 17:57 ` Ævar Arnfjörð Bjarmason 2010-07-26 18:22 ` Jasper St. Pierre 0 siblings, 1 reply; 18+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-26 17:57 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, git, jstpierre On Mon, Jul 26, 2010 at 15:40, Jeff King <peff@peff.net> wrote: > As Jasper noted, "+" is a reserved character, which means "gtk+" > probably _should_ be escaped. But clearly it doesn't happen in practice, > and I am more interested in not breaking current use than in nitpicking > with a standard. Reserved characters only have to be escaped in certain contexts, from RFC 2396: Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI. reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," E.g. @ is special in the hostname part (http://user@example.org), but writing http://example.org/Git@Big%20companies:%20A%20Study is just fine. Which is why curl passes it along literally, it *can* escape them, and real webservers like Apache handle reserved characters equivalently (in their unreserved contexts) whether they're escaped or not, but the git-daemon at git.gnome.org evidently doesn't implement RFC 2396 carefully enough. So we shouldn't escape + for backwards compatibility and because it's not necessary, but we should probably also fix git-daemon to accept both forms if that hasn't been done already. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-26 17:57 ` Ævar Arnfjörð Bjarmason @ 2010-07-26 18:22 ` Jasper St. Pierre 2010-07-26 18:30 ` Matthieu Moy 2010-07-26 18:35 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 18+ messages in thread From: Jasper St. Pierre @ 2010-07-26 18:22 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, Thomas Rast, git Where is '+' used in the URL context? I don't see that it's a replacement for '+' aside from formencoded, which is from CGI, not HTTP or the URI spec. I also can't access something called "test 2.txt" from Apache with "http://localhost/test+2.txt", so I don't think it's unescaping the '+'. I don't think we should do anything about the '+' case, except where used in formencoded parameters (aka the "query string"), where it is used. On Mon, Jul 26, 2010 at 1:57 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Jul 26, 2010 at 15:40, Jeff King <peff@peff.net> wrote: > >> As Jasper noted, "+" is a reserved character, which means "gtk+" >> probably _should_ be escaped. But clearly it doesn't happen in practice, >> and I am more interested in not breaking current use than in nitpicking >> with a standard. > > Reserved characters only have to be escaped in certain contexts, from > RFC 2396: > > Many URI include components consisting of or delimited by, certain > special characters. These characters are called "reserved", since > their usage within the URI component is limited to their reserved > purpose. If the data for a URI component would conflict with the > reserved purpose, then the conflicting data must be escaped before > forming the URI. > > reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | > "$" | "," > > E.g. @ is special in the hostname part (http://user@example.org), but > writing http://example.org/Git@Big%20companies:%20A%20Study is just > fine. > > Which is why curl passes it along literally, it *can* escape them, and > real webservers like Apache handle reserved characters equivalently > (in their unreserved contexts) whether they're escaped or not, but the > git-daemon at git.gnome.org evidently doesn't implement RFC 2396 > carefully enough. > > So we shouldn't escape + for backwards compatibility and because it's > not necessary, but we should probably also fix git-daemon to accept > both forms if that hasn't been done already. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-26 18:22 ` Jasper St. Pierre @ 2010-07-26 18:30 ` Matthieu Moy 2010-07-26 18:35 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 18+ messages in thread From: Matthieu Moy @ 2010-07-26 18:30 UTC (permalink / raw) To: Jasper St. Pierre Cc: Ævar Arnfjörð Bjarmason, Jeff King, Thomas Rast, git "Jasper St. Pierre" <jstpierre@mecheye.net> writes: > I don't think we should do anything about the '+' case, except where used in > formencoded parameters (aka the "query string"), where it is used. This is what the latest version of the patch does, AIUI, yes. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-26 18:22 ` Jasper St. Pierre 2010-07-26 18:30 ` Matthieu Moy @ 2010-07-26 18:35 ` Ævar Arnfjörð Bjarmason 2010-07-26 18:44 ` Jasper St. Pierre 1 sibling, 1 reply; 18+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-07-26 18:35 UTC (permalink / raw) To: Jasper St. Pierre; +Cc: Jeff King, Thomas Rast, git On Mon, Jul 26, 2010 at 18:22, Jasper St. Pierre <jstpierre@mecheye.net> wrote: > Where is '+' used in the URL context? I don't see that it's a > replacement for '+' > aside from formencoded, which is from CGI, not HTTP or the URI spec. In my example? Nowhere, but I used : and @, which are also reserved characters. > I also can't access something called "test 2.txt" from Apache with > "http://localhost/test+2.txt", so I don't think it's unescaping the '+'. Yes, that's not supposed to work. But you should be able to access "test+2.txt" using /test+2.txt and /test%2B.txt and /%74%65%73%74%2B2%2E%74%78%74 for that matter. git-daemon only seems to handle the first form. Which is probably a bug, maybe it doesn't *have to* URI unescape its arguments, but it's probably a good idea anyway. E.g. some systems that handle URIs will convert : to %3A automatically when passing them through. That would break a git URL. > I don't think we should do anything about the '+' case, except where used in > formencoded parameters (aka the "query string"), where it is used. I don't really have an opinion on what we should do. It's not a problem for me, I'm just noting how it could break, and that maybe we should try harder and support URI escaping where we handle URLs. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: URL decoding changed semantics of + in URLs 2010-07-26 18:35 ` Ævar Arnfjörð Bjarmason @ 2010-07-26 18:44 ` Jasper St. Pierre 0 siblings, 0 replies; 18+ messages in thread From: Jasper St. Pierre @ 2010-07-26 18:44 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, Thomas Rast, git Hm. I've always wondered about the percent double-encoding problem, where the "%" in "%2B" double-encodes to "%252B", which might require the mandatory server implementation then. I do think that because "git://git.gnome.org/git+" was working, something on the server was going right. On Mon, Jul 26, 2010 at 2:35 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Jul 26, 2010 at 18:22, Jasper St. Pierre <jstpierre@mecheye.net> wrote: >> Where is '+' used in the URL context? I don't see that it's a >> replacement for '+' >> aside from formencoded, which is from CGI, not HTTP or the URI spec. > > In my example? Nowhere, but I used : and @, which are also reserved > characters. > >> I also can't access something called "test 2.txt" from Apache with >> "http://localhost/test+2.txt", so I don't think it's unescaping the '+'. > > Yes, that's not supposed to work. But you should be able to access > "test+2.txt" using /test+2.txt and /test%2B.txt and > /%74%65%73%74%2B2%2E%74%78%74 for that matter. > > git-daemon only seems to handle the first form. Which is probably a > bug, maybe it doesn't *have to* URI unescape its arguments, but it's > probably a good idea anyway. E.g. some systems that handle URIs will > convert : to %3A automatically when passing them through. That would > break a git URL. > >> I don't think we should do anything about the '+' case, except where used in >> formencoded parameters (aka the "query string"), where it is used. > > I don't really have an opinion on what we should do. It's not a > problem for me, I'm just noting how it could break, and that maybe we > should try harder and support URI escaping where we handle URLs. > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-08-06 10:50 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-07-23 13:18 URL decoding changed semantics of + in URLs Thomas Rast 2010-07-23 13:21 ` Thomas Rast 2010-07-23 14:10 ` Ævar Arnfjörð Bjarmason 2010-07-23 14:25 ` Jasper St. Pierre 2010-07-23 21:23 ` [PATCH] Do not unquote + into ' ' " Thomas Rast 2010-07-23 22:20 ` Ævar Arnfjörð Bjarmason 2010-07-23 22:26 ` Junio C Hamano 2010-07-23 23:04 ` Thomas Rast 2010-07-24 14:49 ` [PATCH v2] " Thomas Rast 2010-07-31 21:18 ` Jasper St. Pierre 2010-07-31 21:33 ` Thomas Rast 2010-08-06 10:46 ` Ralf Ebert 2010-07-26 15:40 ` URL decoding changed semantics of + " Jeff King 2010-07-26 17:57 ` Ævar Arnfjörð Bjarmason 2010-07-26 18:22 ` Jasper St. Pierre 2010-07-26 18:30 ` Matthieu Moy 2010-07-26 18:35 ` Ævar Arnfjörð Bjarmason 2010-07-26 18:44 ` Jasper St. Pierre
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.