All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] handle alternate charsets for remote http errors
@ 2014-05-21 10:25 Jeff King
  2014-05-21 10:27 ` [PATCH 1/9] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
                   ` (9 more replies)
  0 siblings, 10 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:25 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

As of commit 426e70d (remote-curl: show server content on http errors,
2013-04-05), we relay any text/plain errors shown by the remote http
server to the user. However, we were lazy back then and left this TODO
in place:

       /*
        * We only show text/plain parts, as other types are likely
        * to be ugly to look at on the user's terminal.
        *
        * TODO should handle "; charset=XXX", and re-encode into
        * logoutputencoding
        */

This series actually implements that, along with a few other cleanups.

  [1/9]: test-lib: preserve GIT_CURL_VERBOSE from the environment
  [2/9]: strbuf: add strbuf_tolower function
  [3/9]: daemon/config: factor out duplicate xstrdup_tolower
  [4/9]: http: normalize case of returned content-type
  [5/9]: t/lib-httpd: use write_script to copy CGI scripts
  [6/9]: t5550: test display of remote http error messages
  [7/9]: remote-curl: recognize text/plain with a charset parameter
  [8/9]: strbuf: add strbuf_reencode helper
  [9/9]: remote-curl: reencode http error messages

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/9] test-lib: preserve GIT_CURL_VERBOSE from the environment
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
@ 2014-05-21 10:27 ` Jeff King
  2014-05-21 10:27 ` [PATCH 2/9] strbuf: add strbuf_tolower function Jeff King
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:27 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

Turning on this variable can be useful when debugging http
tests. It does break a few tests in t5541, but it is not
a variable that the user is likely to have enabled
accidentally.

Signed-off-by: Jeff King <peff@peff.net>
---
Not necessary for this series, but I found it helpful.

I took a look at making the tests in t5541 work with it, but it's a
little nasty (we have to separate the curl output from the regular
stderr). Unsetting GIT_CURL_VERBOSE for those tests would also be an
option. Or just dropping this patch is OK with me, too; it's really only
for debugging.

 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index c081668..f7e55b1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -91,6 +91,7 @@ unset VISUAL EMAIL LANGUAGE COLUMNS $("$PERL_PATH" -e '
 		VALGRIND
 		UNZIP
 		PERF_
+		CURL_VERBOSE
 	));
 	my @vars = grep(/^GIT_/ && !/^GIT_($ok)/o, @env);
 	print join("\n", @vars);
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
  2014-05-21 10:27 ` [PATCH 1/9] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
@ 2014-05-21 10:27 ` Jeff King
  2014-05-22  0:07   ` Kyle J. McKay
  2014-05-21 10:28 ` [PATCH 3/9] daemon/config: factor out duplicate xstrdup_tolower Jeff King
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:27 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

This makes config's lowercase() function public.

Note that we could continue to offer a pure-string
lowercase, but there would be no callers (in most
pure-string cases, we actually duplicate and lowercase the
duplicate).

Signed-off-by: Jeff King <peff@peff.net>
---
This ones gets used later on...

 Documentation/technical/api-strbuf.txt | 4 ++++
 config.c                               | 8 +-------
 strbuf.c                               | 7 +++++++
 strbuf.h                               | 1 +
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index 3350d97..8480f89 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -125,6 +125,10 @@ Functions
 
 	Strip whitespace from the end of a string.
 
+`strbuf_tolower`::
+
+	Lowercase each character in the buffer using `tolower`.
+
 `strbuf_cmp`::
 
 	Compare two buffers. Returns an integer less than, equal to, or greater
diff --git a/config.c b/config.c
index a30cb5c..03ce5c6 100644
--- a/config.c
+++ b/config.c
@@ -147,12 +147,6 @@ int git_config_include(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static void lowercase(char *p)
-{
-	for (; *p; p++)
-		*p = tolower(*p);
-}
-
 void git_config_push_parameter(const char *text)
 {
 	struct strbuf env = STRBUF_INIT;
@@ -180,7 +174,7 @@ int git_config_parse_parameter(const char *text,
 		strbuf_list_free(pair);
 		return error("bogus config parameter: %s", text);
 	}
-	lowercase(pair[0]->buf);
+	strbuf_tolower(pair[0]);
 	if (fn(pair[0]->buf, pair[1] ? pair[1]->buf : NULL, data) < 0) {
 		strbuf_list_free(pair);
 		return -1;
diff --git a/strbuf.c b/strbuf.c
index ee96dcf..a1b8a47 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -106,6 +106,13 @@ void strbuf_ltrim(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+void strbuf_tolower(struct strbuf *sb)
+{
+	size_t i;
+	for (i = 0; i < sb->len; i++)
+		sb->buf[i] = tolower(sb->buf[i]);
+}
+
 struct strbuf **strbuf_split_buf(const char *str, size_t slen,
 				 int terminator, int max)
 {
diff --git a/strbuf.h b/strbuf.h
index 39c14cf..6b6f745 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -45,6 +45,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len)
 extern void strbuf_trim(struct strbuf *);
 extern void strbuf_rtrim(struct strbuf *);
 extern void strbuf_ltrim(struct strbuf *);
+extern void strbuf_tolower(struct strbuf *sb);
 extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
 
 /*
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/9] daemon/config: factor out duplicate xstrdup_tolower
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
  2014-05-21 10:27 ` [PATCH 1/9] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
  2014-05-21 10:27 ` [PATCH 2/9] strbuf: add strbuf_tolower function Jeff King
@ 2014-05-21 10:28 ` Jeff King
  2014-05-21 10:29 ` [PATCH 4/9] http: normalize case of returned content-type Jeff King
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:28 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

We have two implementations of the same function; let's drop
that to one. We take the name from daemon.c, but the
implementation (which is just slightly more efficient) from
the config code.

Signed-off-by: Jeff King <peff@peff.net>
---
Unlike the last patch, this one does _not_ get used elsewhere in this
series. It's just a cleanup I happened to notice while writing the other
one. It could be omitted, or applied separately.

 builtin/config.c | 15 +--------------
 daemon.c         |  8 --------
 strbuf.c         | 13 +++++++++++++
 strbuf.h         |  2 ++
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 5677c94..fcd8474 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -395,19 +395,6 @@ static int urlmatch_collect_fn(const char *var, const char *value, void *cb)
 	return 0;
 }
 
-static char *dup_downcase(const char *string)
-{
-	char *result;
-	size_t len, i;
-
-	len = strlen(string);
-	result = xmalloc(len + 1);
-	for (i = 0; i < len; i++)
-		result[i] = tolower(string[i]);
-	result[i] = '\0';
-	return result;
-}
-
 static int get_urlmatch(const char *var, const char *url)
 {
 	char *section_tail;
@@ -422,7 +409,7 @@ static int get_urlmatch(const char *var, const char *url)
 	if (!url_normalize(url, &config.url))
 		die("%s", config.url.err);
 
-	config.section = dup_downcase(var);
+	config.section = xstrdup_tolower(var);
 	section_tail = strchr(config.section, '.');
 	if (section_tail) {
 		*section_tail = '\0';
diff --git a/daemon.c b/daemon.c
index eba1255..f9c63e9 100644
--- a/daemon.c
+++ b/daemon.c
@@ -475,14 +475,6 @@ static void make_service_overridable(const char *name, int ena)
 	die("No such service %s", name);
 }
 
-static char *xstrdup_tolower(const char *str)
-{
-	char *p, *dup = xstrdup(str);
-	for (p = dup; *p; p++)
-		*p = tolower(*p);
-	return dup;
-}
-
 static void parse_host_and_port(char *hostport, char **host,
 	char **port)
 {
diff --git a/strbuf.c b/strbuf.c
index a1b8a47..d289d1a 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -577,3 +577,16 @@ int fprintf_ln(FILE *fp, const char *fmt, ...)
 		return -1;
 	return ret + 1;
 }
+
+char *xstrdup_tolower(const char *string)
+{
+	char *result;
+	size_t len, i;
+
+	len = strlen(string);
+	result = xmalloc(len + 1);
+	for (i = 0; i < len; i++)
+		result[i] = tolower(string[i]);
+	result[i] = '\0';
+	return result;
+}
diff --git a/strbuf.h b/strbuf.h
index 6b6f745..25328b9 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -184,4 +184,6 @@ extern int printf_ln(const char *fmt, ...);
 __attribute__((format (printf,2,3)))
 extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
+char *xstrdup_tolower(const char *);
+
 #endif /* STRBUF_H */
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/9] http: normalize case of returned content-type
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (2 preceding siblings ...)
  2014-05-21 10:28 ` [PATCH 3/9] daemon/config: factor out duplicate xstrdup_tolower Jeff King
@ 2014-05-21 10:29 ` Jeff King
  2014-05-21 10:29 ` [PATCH 5/9] t/lib-httpd: use write_script to copy CGI scripts Jeff King
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:29 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

The content-type string curl hands us comes straight from
the server, and may have odd capitalization. RFC 2616 states
that content-types are case-insensitive. We already handle
this when checking for text/plain (by using strcasecmp), but
do not when checking for a smart-http content-type.

We could simply convert the latter to use strcasecmp, but as
we add more parsing of the header, having it normalized will
simplify our parsing code.

Note that there is one caveat. RFC 2616 notes that the type
itself is case insensitive, as are parameter names. However,
parameter valuse may be case-sensitive, depending on the
individual parameter. In practice, we are OK, though. We
currently only look at the type itself. In the future we
will start looking at charset parameters, but those are also
case-insensitive. And it doesn't seem likely that we would
look at any other parameters.

Signed-off-by: Jeff King <peff@peff.net>
---
I think this is fine. If not, we can either:

  1. Use strcasecmp and friends more consistently when
     parsing/comparing (later bits of the series will need to be
     adjusted).

  2. Downcase here in a more context-aware way.

 http.c        | 4 +++-
 remote-curl.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index 94e1afd..cd6c328 100644
--- a/http.c
+++ b/http.c
@@ -957,9 +957,11 @@ static int http_request(const char *url,
 
 	ret = run_one_slot(slot, &results);
 
-	if (options && options->content_type)
+	if (options && options->content_type) {
 		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE,
 				options->content_type);
+		strbuf_tolower(options->content_type);
+	}
 
 	if (options && options->effective_url)
 		curlinfo_strbuf(slot->curl, CURLINFO_EFFECTIVE_URL,
diff --git a/remote-curl.c b/remote-curl.c
index 52c2d96..a5ab977 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -205,7 +205,7 @@ static int show_http_message(struct strbuf *type, struct strbuf *msg)
 	 * TODO should handle "; charset=XXX", and re-encode into
 	 * logoutputencoding
 	 */
-	if (strcasecmp(type->buf, "text/plain"))
+	if (strcmp(type->buf, "text/plain"))
 		return -1;
 
 	strbuf_trim(msg);
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 5/9] t/lib-httpd: use write_script to copy CGI scripts
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (3 preceding siblings ...)
  2014-05-21 10:29 ` [PATCH 4/9] http: normalize case of returned content-type Jeff King
@ 2014-05-21 10:29 ` Jeff King
  2014-05-21 10:29 ` [PATCH 6/9] t5550: test display of remote http error messages Jeff King
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:29 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

Using write_script will set our shebang line appropriately
with $SHELL_PATH. The script that is there now is quite
simple and likely to succeed even with a non-POSIX /bin/sh,
but it does not hurt to be defensive.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh                   | 6 +++++-
 t/lib-httpd/broken-smart-http.sh | 1 -
 2 files changed, 5 insertions(+), 2 deletions(-)
 mode change 100755 => 100644 t/lib-httpd/broken-smart-http.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 252cbf1..8e680ef 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -105,10 +105,14 @@ else
 		"Could not identify web server at '$LIB_HTTPD_PATH'"
 fi
 
+install_script () {
+	write_script "$HTTPD_ROOT_PATH/$1" <"$TEST_PATH/$1"
+}
+
 prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
-	cp "$TEST_PATH"/broken-smart-http.sh "$HTTPD_ROOT_PATH"
+	install_script broken-smart-http.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/broken-smart-http.sh b/t/lib-httpd/broken-smart-http.sh
old mode 100755
new mode 100644
index f7ebfff..82cc610
--- a/t/lib-httpd/broken-smart-http.sh
+++ b/t/lib-httpd/broken-smart-http.sh
@@ -1,4 +1,3 @@
-#!/bin/sh
 printf "Content-Type: text/%s\n" "html"
 echo
 printf "%s\n" "001e# service=git-upload-pack"
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/9] t5550: test display of remote http error messages
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (4 preceding siblings ...)
  2014-05-21 10:29 ` [PATCH 5/9] t/lib-httpd: use write_script to copy CGI scripts Jeff King
@ 2014-05-21 10:29 ` Jeff King
  2014-05-21 10:33 ` [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter Jeff King
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:29 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

Since commit 426e70d (remote-curl: show server content on
http errors, 2013-04-05), we relay any text/plain error
messages from the remote server to the user. However, we
never tested it.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh             |  1 +
 t/lib-httpd/apache.conf    |  4 ++++
 t/lib-httpd/error.sh       | 17 +++++++++++++++++
 t/t5550-http-fetch-dumb.sh | 10 ++++++++++
 4 files changed, 32 insertions(+)
 create mode 100755 t/lib-httpd/error.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 8e680ef..f7640be 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -113,6 +113,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 3a03e82..b384d79 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -97,12 +97,16 @@ Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error/ error.sh/
 <Directory ${GIT_EXEC_PATH}>
 	Options FollowSymlinks
 </Directory>
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error.sh>
+  Options ExecCGI
+</Files>
 <Files ${GIT_EXEC_PATH}/git-http-backend>
 	Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
new file mode 100755
index 0000000..786f281
--- /dev/null
+++ b/t/lib-httpd/error.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+printf "Status: 500 Intentional Breakage\n"
+
+printf "Content-Type: "
+case "$PATH_INFO" in
+*html*)
+	printf "text/html"
+	;;
+*text*)
+	printf "text/plain"
+	;;
+esac
+printf "\n"
+
+printf "\n"
+printf "this is the error message\n"
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 1a3a2b6..13defd3 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -171,5 +171,15 @@ test_expect_success 'did not use upload-pack service' '
 	test_cmp exp act
 '
 
+test_expect_success 'git client shows text/plain errors' '
+	test_must_fail git clone "$HTTPD_URL/error/text" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
+test_expect_success 'git client does not show html errors' '
+	test_must_fail git clone "$HTTPD_URL/error/html" 2>stderr &&
+	! grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (5 preceding siblings ...)
  2014-05-21 10:29 ` [PATCH 6/9] t5550: test display of remote http error messages Jeff King
@ 2014-05-21 10:33 ` Jeff King
  2014-05-22  0:07   ` Kyle J. McKay
  2014-05-21 10:33 ` [PATCH 8/9] strbuf: add strbuf_reencode helper Jeff King
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:33 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

Commit 426e70d (remote-curl: show server content on http
errors, 2013-04-05) tried to recognize text/plain content
types, but fails to do so if they have any parameters.

This patches makes our parsing more liberal, though we still
do not do anything useful with a charset parameter.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c              | 26 ++++++++++++++++++--------
 t/lib-httpd/error.sh       |  8 +++++++-
 t/t5550-http-fetch-dumb.sh |  5 +++++
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index a5ab977..6d1b206 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -194,20 +194,30 @@ static void free_discovery(struct discovery *d)
 	}
 }
 
+/*
+ * We only show text/plain parts, as other types are likely
+ * to be ugly to look at on the user's terminal.
+ */
+static int content_type_is_terminal_friendly(struct strbuf *type)
+{
+	const char *p;
+
+	p = skip_prefix(type->buf, "text/plain");
+	if (!p || (*p && *p != ';'))
+		return 0;
+
+	return 1;
+}
+
 static int show_http_message(struct strbuf *type, struct strbuf *msg)
 {
 	const char *p, *eol;
 
-	/*
-	 * We only show text/plain parts, as other types are likely
-	 * to be ugly to look at on the user's terminal.
-	 *
-	 * TODO should handle "; charset=XXX", and re-encode into
-	 * logoutputencoding
-	 */
-	if (strcmp(type->buf, "text/plain"))
+	if (!content_type_is_terminal_friendly(type))
 		return -1;
 
+	/* TODO should record charset and reencode msg to logOutputEncoding */
+
 	strbuf_trim(msg);
 	if (!msg->len)
 		return -1;
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 786f281..02e80b3 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -3,6 +3,7 @@
 printf "Status: 500 Intentional Breakage\n"
 
 printf "Content-Type: "
+charset=iso8859-1
 case "$PATH_INFO" in
 *html*)
 	printf "text/html"
@@ -10,8 +11,13 @@ case "$PATH_INFO" in
 *text*)
 	printf "text/plain"
 	;;
+*charset*)
+	printf "text/plain; charset=utf-8"
+	charset=utf-8
+	;;
 esac
 printf "\n"
 
 printf "\n"
-printf "this is the error message\n"
+printf "this is the error message\n" |
+iconv -f us-ascii -t $charset
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 13defd3..b35b261 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -181,5 +181,10 @@ test_expect_success 'git client does not show html errors' '
 	! grep "this is the error message" stderr
 '
 
+test_expect_success 'git client shows text/plain with a charset' '
+	test_must_fail git clone "$HTTPD_URL/error/charset" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 8/9] strbuf: add strbuf_reencode helper
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (6 preceding siblings ...)
  2014-05-21 10:33 ` [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter Jeff King
@ 2014-05-21 10:33 ` Jeff King
  2014-05-21 10:33 ` [PATCH 9/9] remote-curl: reencode http error messages Jeff King
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
  9 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:33 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

This is a convenience wrapper around `reencode_string_len`
and `strbuf_attach`.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-strbuf.txt |  5 +++++
 strbuf.c                               | 17 +++++++++++++++++
 strbuf.h                               |  1 +
 3 files changed, 23 insertions(+)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index 8480f89..a468816 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -129,6 +129,11 @@ Functions
 
 	Lowercase each character in the buffer using `tolower`.
 
+`strbuf_reencode`::
+
+	Replace the contents of the strbuf with a reencoded form.  Returns -1
+	on error, 0 on success.
+
 `strbuf_cmp`::
 
 	Compare two buffers. Returns an integer less than, equal to, or greater
diff --git a/strbuf.c b/strbuf.c
index d289d1a..a717d9b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "refs.h"
+#include "utf8.h"
 
 int starts_with(const char *str, const char *prefix)
 {
@@ -113,6 +114,22 @@ void strbuf_tolower(struct strbuf *sb)
 		sb->buf[i] = tolower(sb->buf[i]);
 }
 
+int strbuf_reencode(struct strbuf *sb, const char *from, const char *to)
+{
+	char *out;
+	int len;
+
+	if (same_encoding(from, to))
+		return 0;
+
+	out = reencode_string_len(sb->buf, sb->len, to, from, &len);
+	if (!out)
+		return -1;
+
+	strbuf_attach(sb, out, len, len);
+	return 0;
+}
+
 struct strbuf **strbuf_split_buf(const char *str, size_t slen,
 				 int terminator, int max)
 {
diff --git a/strbuf.h b/strbuf.h
index 25328b9..a068dd6 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -46,6 +46,7 @@ extern void strbuf_trim(struct strbuf *);
 extern void strbuf_rtrim(struct strbuf *);
 extern void strbuf_ltrim(struct strbuf *);
 extern void strbuf_tolower(struct strbuf *sb);
+extern int strbuf_reencode(struct strbuf *sb, const char *from, const char *to);
 extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
 
 /*
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 9/9] remote-curl: reencode http error messages
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (7 preceding siblings ...)
  2014-05-21 10:33 ` [PATCH 8/9] strbuf: add strbuf_reencode helper Jeff King
@ 2014-05-21 10:33 ` Jeff King
  2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
  9 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-21 10:33 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay

As of the last commit, we now recognize an error message
with a content-type "text/plain; charset=utf-16" as text,
but we ignore the charset parameter entirely. Let's encode
it to log_output_encoding, which is presumably something the
user's terminal can handle.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c              | 37 +++++++++++++++++++++++++++++++++----
 t/lib-httpd/error.sh       |  4 ++++
 t/t5550-http-fetch-dumb.sh |  5 +++++
 3 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 6d1b206..1dc90d7 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -194,11 +194,34 @@ static void free_discovery(struct discovery *d)
 	}
 }
 
+static char *find_param(const char *str, const char *name)
+{
+	int len = strlen(name);
+
+	for (; *str; str++) {
+		const char *p;
+
+		if (*p++ != ' ')
+			continue;
+
+		if (strncmp(p, name, len))
+			continue;
+		p += len;
+
+		if (*p++ != '=')
+			continue;
+
+		return xstrndup(p, strchrnul(p, ' ') - p);
+	}
+
+	return NULL;
+}
+
 /*
  * We only show text/plain parts, as other types are likely
  * to be ugly to look at on the user's terminal.
  */
-static int content_type_is_terminal_friendly(struct strbuf *type)
+static int content_type_is_terminal_friendly(struct strbuf *type, char **charset)
 {
 	const char *p;
 
@@ -206,17 +229,23 @@ static int content_type_is_terminal_friendly(struct strbuf *type)
 	if (!p || (*p && *p != ';'))
 		return 0;
 
+	*charset = find_param(p, "charset");
+	/* default charset from rfc2616 */
+	if (!*charset)
+		*charset = xstrdup("iso8859-1");
+
 	return 1;
 }
 
 static int show_http_message(struct strbuf *type, struct strbuf *msg)
 {
 	const char *p, *eol;
+	char *charset;
 
-	if (!content_type_is_terminal_friendly(type))
+	if (!content_type_is_terminal_friendly(type, &charset))
 		return -1;
-
-	/* TODO should record charset and reencode msg to logOutputEncoding */
+	strbuf_reencode(msg, charset, get_log_output_encoding());
+	free(charset);
 
 	strbuf_trim(msg);
 	if (!msg->len)
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 02e80b3..4efbce7 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -15,6 +15,10 @@ case "$PATH_INFO" in
 	printf "text/plain; charset=utf-8"
 	charset=utf-8
 	;;
+*utf16*)
+	printf "text/plain; charset=utf-16"
+	charset=utf-16
+	;;
 esac
 printf "\n"
 
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index b35b261..01b8aae 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -186,5 +186,10 @@ test_expect_success 'git client shows text/plain with a charset' '
 	grep "this is the error message" stderr
 '
 
+test_expect_success 'http error messages are reencoded' '
+	test_must_fail git clone "$HTTPD_URL/error/utf16" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-21 10:27 ` [PATCH 2/9] strbuf: add strbuf_tolower function Jeff King
@ 2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  5:58     ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22  0:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On May 21, 2014, at 03:27, Jeff King wrote:

> This makes config's lowercase() function public.
>
> Note that we could continue to offer a pure-string
> lowercase, but there would be no callers (in most
> pure-string cases, we actually duplicate and lowercase the
> duplicate).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This ones gets used later on...
>
> Documentation/technical/api-strbuf.txt | 4 ++++
> config.c                               | 8 +-------
> strbuf.c                               | 7 +++++++
> strbuf.h                               | 1 +
> 4 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/ 
> technical/api-strbuf.txt
> index 3350d97..8480f89 100644
> --- a/Documentation/technical/api-strbuf.txt
> +++ b/Documentation/technical/api-strbuf.txt
> @@ -125,6 +125,10 @@ Functions
>
> 	Strip whitespace from the end of a string.
>
> +`strbuf_tolower`::
> +
> +	Lowercase each character in the buffer using `tolower`.
> +
> `strbuf_cmp`::
>
> 	Compare two buffers. Returns an integer less than, equal to, or  
> greater
> diff --git a/config.c b/config.c
> index a30cb5c..03ce5c6 100644
> --- a/config.c
> +++ b/config.c
> @@ -147,12 +147,6 @@ int git_config_include(const char *var, const  
> char *value, void *data)
> 	return ret;
> }
>
> -static void lowercase(char *p)
> -{
> -	for (; *p; p++)
> -		*p = tolower(*p);
> -}
> -
> void git_config_push_parameter(const char *text)
> {
> 	struct strbuf env = STRBUF_INIT;
> @@ -180,7 +174,7 @@ int git_config_parse_parameter(const char *text,
> 		strbuf_list_free(pair);
> 		return error("bogus config parameter: %s", text);
> 	}
> -	lowercase(pair[0]->buf);
> +	strbuf_tolower(pair[0]);
> 	if (fn(pair[0]->buf, pair[1] ? pair[1]->buf : NULL, data) < 0) {
> 		strbuf_list_free(pair);
> 		return -1;
> diff --git a/strbuf.c b/strbuf.c
> index ee96dcf..a1b8a47 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -106,6 +106,13 @@ void strbuf_ltrim(struct strbuf *sb)
> 	sb->buf[sb->len] = '\0';
> }
>
> +void strbuf_tolower(struct strbuf *sb)
> +{
> +	size_t i;
> +	for (i = 0; i < sb->len; i++)
> +		sb->buf[i] = tolower(sb->buf[i]);
> +}
> +

Wouldn't a direct transfer of the lowercase function be something more  
like:


void strbuf_tolower(struct strbuf *sb)
{
	char *p = sb->buf;
	for (; *p; p++)
		*p = tolower(*p);
}

That seems to me to be a bit more efficient.  According to the  
comments in strbuf.c, "people can always assume buf is non NULL and - 
 >buf is NUL terminated even for a freshly initialized strbuf."

> struct strbuf **strbuf_split_buf(const char *str, size_t slen,
> 				 int terminator, int max)
> {
> diff --git a/strbuf.h b/strbuf.h
> index 39c14cf..6b6f745 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -45,6 +45,7 @@ static inline void strbuf_setlen(struct strbuf  
> *sb, size_t len)
> extern void strbuf_trim(struct strbuf *);
> extern void strbuf_rtrim(struct strbuf *);
> extern void strbuf_ltrim(struct strbuf *);
> +extern void strbuf_tolower(struct strbuf *sb);
> extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
>
> /*
> -- 
> 2.0.0.rc1.436.g03cb729
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-21 10:33 ` [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter Jeff King
@ 2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  6:05     ` Jeff King
  2014-05-22  7:12     ` Peter Krefting
  0 siblings, 2 replies; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22  0:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On May 21, 2014, at 03:33, Jeff King wrote:

> Commit 426e70d (remote-curl: show server content on http
> errors, 2013-04-05) tried to recognize text/plain content
> types, but fails to do so if they have any parameters.
>
> This patches makes our parsing more liberal, though we still
> do not do anything useful with a charset parameter.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> remote-curl.c              | 26 ++++++++++++++++++--------
> t/lib-httpd/error.sh       |  8 +++++++-
> t/t5550-http-fetch-dumb.sh |  5 +++++
> 3 files changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/remote-curl.c b/remote-curl.c
> index a5ab977..6d1b206 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -194,20 +194,30 @@ static void free_discovery(struct discovery *d)
> 	}
> }
>
> +/*
> + * We only show text/plain parts, as other types are likely
> + * to be ugly to look at on the user's terminal.
> + */
> +static int content_type_is_terminal_friendly(struct strbuf *type)
> +{
> +	const char *p;
> +
> +	p = skip_prefix(type->buf, "text/plain");
> +	if (!p || (*p && *p != ';'))
> +		return 0;
> +
> +	return 1;
> +}
> +

I think that a strict reading of RFC 2616 allows "text/plain ;  
charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;  
charset=utf-8".  So perhaps this if line instead:

+	if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))

See RFC 2616 section 2.2 for the definition of linear white space  
(LWS) and the end of section 2.1 for the "implied *LWS" rule that  
allows it.

> static int show_http_message(struct strbuf *type, struct strbuf *msg)
> {
> 	const char *p, *eol;
>
> -	/*
> -	 * We only show text/plain parts, as other types are likely
> -	 * to be ugly to look at on the user's terminal.
> -	 *
> -	 * TODO should handle "; charset=XXX", and re-encode into
> -	 * logoutputencoding
> -	 */
> -	if (strcmp(type->buf, "text/plain"))
> +	if (!content_type_is_terminal_friendly(type))
> 		return -1;
>
> +	/* TODO should record charset and reencode msg to  
> logOutputEncoding */
> +
> 	strbuf_trim(msg);
> 	if (!msg->len)
> 		return -1;
> diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
> index 786f281..02e80b3 100755
> --- a/t/lib-httpd/error.sh
> +++ b/t/lib-httpd/error.sh
> @@ -3,6 +3,7 @@
> printf "Status: 500 Intentional Breakage\n"
>
> printf "Content-Type: "
> +charset=iso8859-1
> case "$PATH_INFO" in
> *html*)
> 	printf "text/html"
> @@ -10,8 +11,13 @@ case "$PATH_INFO" in
> *text*)
> 	printf "text/plain"
> 	;;
> +*charset*)
> +	printf "text/plain; charset=utf-8"
> +	charset=utf-8
> +	;;
> esac
> printf "\n"
>
> printf "\n"
> -printf "this is the error message\n"
> +printf "this is the error message\n" |
> +iconv -f us-ascii -t $charset
> diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
> index 13defd3..b35b261 100755
> --- a/t/t5550-http-fetch-dumb.sh
> +++ b/t/t5550-http-fetch-dumb.sh
> @@ -181,5 +181,10 @@ test_expect_success 'git client does not show  
> html errors' '
> 	! grep "this is the error message" stderr
> '
>
> +test_expect_success 'git client shows text/plain with a charset' '
> +	test_must_fail git clone "$HTTPD_URL/error/charset" 2>stderr &&
> +	grep "this is the error message" stderr
> +'
> +
> stop_httpd
> test_done
> -- 
> 2.0.0.rc1.436.g03cb729
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] remote-curl: reencode http error messages
  2014-05-21 10:33 ` [PATCH 9/9] remote-curl: reencode http error messages Jeff King
@ 2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  6:05     ` Jeff King
  2014-05-22  7:26     ` Peter Krefting
  0 siblings, 2 replies; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22  0:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On May 21, 2014, at 03:33, Jeff King wrote:

> As of the last commit, we now recognize an error message
> with a content-type "text/plain; charset=utf-16" as text,
> but we ignore the charset parameter entirely. Let's encode
> it to log_output_encoding, which is presumably something the
> user's terminal can handle.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> remote-curl.c              | 37 +++++++++++++++++++++++++++++++++----
> t/lib-httpd/error.sh       |  4 ++++
> t/t5550-http-fetch-dumb.sh |  5 +++++
> 3 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/remote-curl.c b/remote-curl.c
> index 6d1b206..1dc90d7 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -194,11 +194,34 @@ static void free_discovery(struct discovery *d)
> 	}
> }
>
> +static char *find_param(const char *str, const char *name)
> +{
> +	int len = strlen(name);
> +
> +	for (; *str; str++) {
> +		const char *p;
> +
> +		if (*p++ != ' ')
> +			continue;
> +
> +		if (strncmp(p, name, len))
> +			continue;
> +		p += len;
> +
> +		if (*p++ != '=')
> +			continue;
> +
> +		return xstrndup(p, strchrnul(p, ' ') - p);
> +	}
> +
> +	return NULL;
> +}
> +
> /*
>  * We only show text/plain parts, as other types are likely
>  * to be ugly to look at on the user's terminal.
>  */
> -static int content_type_is_terminal_friendly(struct strbuf *type)
> +static int content_type_is_terminal_friendly(struct strbuf *type,  
> char **charset)
> {
> 	const char *p;
>
> @@ -206,17 +229,23 @@ static int  
> content_type_is_terminal_friendly(struct strbuf *type)
> 	if (!p || (*p && *p != ';'))
> 		return 0;
>
> +	*charset = find_param(p, "charset");
> +	/* default charset from rfc2616 */
> +	if (!*charset)
> +		*charset = xstrdup("iso8859-1");

Actually the name should be "ISO-8859-1".  See RFC 2616 section  
3.7.1.  Since it's case insensitive "iso-8859-1" would be fine too.

> +
> 	return 1;
> }
>
> static int show_http_message(struct strbuf *type, struct strbuf *msg)
> {
> 	const char *p, *eol;
> +	char *charset;
>
> -	if (!content_type_is_terminal_friendly(type))
> +	if (!content_type_is_terminal_friendly(type, &charset))
> 		return -1;
> -
> -	/* TODO should record charset and reencode msg to  
> logOutputEncoding */
> +	strbuf_reencode(msg, charset, get_log_output_encoding());
> +	free(charset);
>
> 	strbuf_trim(msg);
> 	if (!msg->len)
> diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
> index 02e80b3..4efbce7 100755
> --- a/t/lib-httpd/error.sh
> +++ b/t/lib-httpd/error.sh
> @@ -15,6 +15,10 @@ case "$PATH_INFO" in
> 	printf "text/plain; charset=utf-8"
> 	charset=utf-8
> 	;;
> +*utf16*)
> +	printf "text/plain; charset=utf-16"
> +	charset=utf-16
> +	;;
> esac
> printf "\n"
>
> diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
> index b35b261..01b8aae 100755
> --- a/t/t5550-http-fetch-dumb.sh
> +++ b/t/t5550-http-fetch-dumb.sh
> @@ -186,5 +186,10 @@ test_expect_success 'git client shows text/ 
> plain with a charset' '
> 	grep "this is the error message" stderr
> '
>
> +test_expect_success 'http error messages are reencoded' '
> +	test_must_fail git clone "$HTTPD_URL/error/utf16" 2>stderr &&
> +	grep "this is the error message" stderr
> +'
> +
> stop_httpd
> test_done
> -- 
> 2.0.0.rc1.436.g03cb729

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22  0:07   ` Kyle J. McKay
@ 2014-05-22  5:58     ` Jeff King
  2014-05-22 18:36       ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-22  5:58 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git

On Wed, May 21, 2014 at 05:07:36PM -0700, Kyle J. McKay wrote:

> >+void strbuf_tolower(struct strbuf *sb)
> >+{
> >+	size_t i;
> >+	for (i = 0; i < sb->len; i++)
> >+		sb->buf[i] = tolower(sb->buf[i]);
> >+}
> >+
> 
> Wouldn't a direct transfer of the lowercase function be something more like:
> 
> 
> void strbuf_tolower(struct strbuf *sb)
> {
> 	char *p = sb->buf;
> 	for (; *p; p++)
> 		*p = tolower(*p);
> }
> 
> That seems to me to be a bit more efficient.  According to the comments in
> strbuf.c, "people can always assume buf is non NULL and ->buf is NUL
> terminated even for a freshly initialized strbuf."

Yes, and that would be fine with me (I actually wrote strbuf_tolower for
my own use, and _then_ realized that we already had such a thing that
could be replaced).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  0:07   ` Kyle J. McKay
@ 2014-05-22  6:05     ` Jeff King
  2014-05-22  7:27       ` Kyle J. McKay
  2014-05-22  7:12     ` Peter Krefting
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-22  6:05 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git

On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote:

> >+	p = skip_prefix(type->buf, "text/plain");
> >+	if (!p || (*p && *p != ';'))
> >+		return 0;
> >+
> >+	return 1;
> >+}
> >+
> 
> I think that a strict reading of RFC 2616 allows "text/plain ;
> charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
> charset=utf-8".  So perhaps this if line instead:
> 
> +	if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))
> 
> See RFC 2616 section 2.2 for the definition of linear white space (LWS) and
> the end of section 2.1 for the "implied *LWS" rule that allows it.

You're right. There are a few exceptions in 3.7:

  Linear white space (LWS) MUST NOT be used between the type and
  subtype, nor between an attribute and its value.

but it does not include between the subtype and parameter. So the
"find_parameter" also needs to accept the collapsed whitespace, too, I
guess.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] remote-curl: reencode http error messages
  2014-05-22  0:07   ` Kyle J. McKay
@ 2014-05-22  6:05     ` Jeff King
  2014-05-22  7:26     ` Peter Krefting
  1 sibling, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  6:05 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git

On Wed, May 21, 2014 at 05:07:40PM -0700, Kyle J. McKay wrote:

> >+	/* default charset from rfc2616 */
> >+	if (!*charset)
> >+		*charset = xstrdup("iso8859-1");
> 
> Actually the name should be "ISO-8859-1".  See RFC 2616 section 3.7.1.
> Since it's case insensitive "iso-8859-1" would be fine too.

Thanks, will fix.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  6:05     ` Jeff King
@ 2014-05-22  7:12     ` Peter Krefting
  2014-05-22  9:05       ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Krefting @ 2014-05-22  7:12 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Jeff King, git

Kyle J. McKay:

> I think that a strict reading of RFC 2616 allows "text/plain ; charset=utf-8" 
> as well as "text/plain;charset=utf-8" and "text/plain; charset=utf-8".

It does indeed, and I have seen servers send both variants, so they do 
need to be catered for.

The number of servers that would actually send the charset attribute 
here (for error messages) are probably not that many. It is probably a 
good idea to make the default user-configurable (I know the specs 
state that anything undeclared is iso-8859-1, but the real world 
doesn't agree to that).

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] remote-curl: reencode http error messages
  2014-05-22  0:07   ` Kyle J. McKay
  2014-05-22  6:05     ` Jeff King
@ 2014-05-22  7:26     ` Peter Krefting
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Krefting @ 2014-05-22  7:26 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Jeff King, git

Kyle J. McKay:

>> +	if (!*charset)
>> +		*charset = xstrdup("iso8859-1");
>
> Actually the name should be "ISO-8859-1".  See RFC 2616 section 3.7.1.  Since 
> it's case insensitive "iso-8859-1" would be fine too.

You'd be amazed at what you see in the wild... I'd recommend going 
with the recommended algorithm from WHATWG's Encoding Standard, if you 
want to make this robust: 
<http://encoding.spec.whatwg.org/#names-and-labels>.

The spec is partly based on a lot of research I made in my previous 
$DAYJOB, with a lot of research added by the spec writer.

There is also Unicode's attempt at it, but it does unfortunately 
produce too many false positives: <http://www.unicode.org/reports/tr22/tr22-7.html#Charset_Alias_Matching>

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  6:05     ` Jeff King
@ 2014-05-22  7:27       ` Kyle J. McKay
  2014-05-22  9:02         ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22  7:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On May 21, 2014, at 23:05, Jeff King wrote:

> On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote:
>
>>> +	p = skip_prefix(type->buf, "text/plain");
>>> +	if (!p || (*p && *p != ';'))
>>> +		return 0;
>>> +
>>> +	return 1;
>>> +}
>>> +
>>
>> I think that a strict reading of RFC 2616 allows "text/plain ;
>> charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
>> charset=utf-8".  So perhaps this if line instead:
>>
>> +	if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))
>>
>> See RFC 2616 section 2.2 for the definition of linear white space  
>> (LWS) and
>> the end of section 2.1 for the "implied *LWS" rule that allows it.
>
> You're right. There are a few exceptions in 3.7:
>
>  Linear white space (LWS) MUST NOT be used between the type and
>  subtype, nor between an attribute and its value.
>
> but it does not include between the subtype and parameter. So the
> "find_parameter" also needs to accept the collapsed whitespace, too, I
> guess.

Yeah I think so too.  It's probably enough though just to just strip  
all " " and "\t" characters at the same time the content type is  
lowercased.  While that would cause invalid content types such as  
"text / plain" to be recognized it would keep the rest of the code  
simpler.  Since a producer of an invalid content type shouldn't really  
be depending on any particular behavior by a receiver of an invalid  
content type it's probably not an issue.

--Kyle

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  7:27       ` Kyle J. McKay
@ 2014-05-22  9:02         ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:02 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git

On Thu, May 22, 2014 at 12:27:38AM -0700, Kyle J. McKay wrote:

> Yeah I think so too.  It's probably enough though just to just strip all " "
> and "\t" characters at the same time the content type is lowercased.  While
> that would cause invalid content types such as "text / plain" to be
> recognized it would keep the rest of the code simpler.  Since a producer of
> an invalid content type shouldn't really be depending on any particular
> behavior by a receiver of an invalid content type it's probably not an
> issue.

I think you have to retain whitespace between parameters. Not that I
expect there to be multiple parameters in a text/plain, but it's
allowed.

I started doing this path of trying to produce a normalized version, but
found that it was just as much code as parsing at all (and it still
leaves the calling code to do its own parse). Instead, what I ended up
with was just doing the parsing in http.c into nice, normalized chunks,
and then the calling code can compare them with simple strcmps.

I'll post the patches in a few minutes.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  7:12     ` Peter Krefting
@ 2014-05-22  9:05       ` Jeff King
  2014-05-22 10:19         ` Peter Krefting
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:05 UTC (permalink / raw)
  To: Peter Krefting; +Cc: Kyle J. McKay, git

On Thu, May 22, 2014 at 08:12:58AM +0100, Peter Krefting wrote:

> Kyle J. McKay:
> 
> >I think that a strict reading of RFC 2616 allows "text/plain ;
> >charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
> >charset=utf-8".
> 
> It does indeed, and I have seen servers send both variants, so they do need
> to be catered for.
> 
> The number of servers that would actually send the charset attribute here
> (for error messages) are probably not that many. It is probably a good idea
> to make the default user-configurable (I know the specs state that anything
> undeclared is iso-8859-1, but the real world doesn't agree to that).

I was really hoping to avoid getting into all of the real-world
messiness that a real http client needs to deal with (as opposed to just
following the standards). This is only used for an optional relay of
error messages from the server. Most servers will send back text/html by
default, and only those which specifically configure text/plain will
have their messages shown. IOW, I expect this to be configured
specifically for git messages, and the server admin can make sure they
are following the standard and that it works with git.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 0/9] handle alternate charsets for remote http errors
  2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
                   ` (8 preceding siblings ...)
  2014-05-21 10:33 ` [PATCH 9/9] remote-curl: reencode http error messages Jeff King
@ 2014-05-22  9:28 ` Jeff King
  2014-05-22  9:28   ` [PATCH v2 1/8] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
                     ` (7 more replies)
  9 siblings, 8 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:28 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

On Wed, May 21, 2014 at 06:25:24AM -0400, Jeff King wrote:

> As of commit 426e70d (remote-curl: show server content on http errors,
> 2013-04-05), we relay any text/plain errors shown by the remote http
> server to the user. However, we were lazy back then and left this TODO
> in place:
> 
>        /*
>         * We only show text/plain parts, as other types are likely
>         * to be ugly to look at on the user's terminal.
>         *
>         * TODO should handle "; charset=XXX", and re-encode into
>         * logoutputencoding
>         */
> 
> This series actually implements that, along with a few other cleanups.

Here's a second version based on feedback from Kyle and Peter. Thanks
both for your comments.

It drops the "tolower" patches, which are not used anymore, and makes
the parsing of content-types and their parameters a bit more robust.

  [1/8]: test-lib: preserve GIT_CURL_VERBOSE from the environment
  [2/8]: t/lib-httpd: use write_script to copy CGI scripts
  [3/8]: t5550: test display of remote http error messages

    These three are the same as before.

  [4/8]: http: extract type/subtype portion of content-type

    Make our content-type matching more robust, both for the errors and
    for matching smart-http types.

  [5/8]: http: optionally extract charset parameter from content-type

    Feature work to support 7/8.

  [6/8]: strbuf: add strbuf_reencode helper

    Same as before (feature work to support 7/8).

  [7/8]: remote-curl: reencode http error messages

    The actual fix.

  [8/8]: http: default text charset to iso-8859-1

    This could be part of 5/8, but I floated it to the top of the heap
    to make it easier to discuss/adjust it.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/8] test-lib: preserve GIT_CURL_VERBOSE from the environment
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
@ 2014-05-22  9:28   ` Jeff King
  2014-05-22  9:28   ` [PATCH v2 2/8] t/lib-httpd: use write_script to copy CGI scripts Jeff King
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:28 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

Turning on this variable can be useful when debugging http
tests. It does break a few tests in t5541, but it is not
a variable that the user is likely to have enabled
accidentally.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index c081668..f7e55b1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -91,6 +91,7 @@ unset VISUAL EMAIL LANGUAGE COLUMNS $("$PERL_PATH" -e '
 		VALGRIND
 		UNZIP
 		PERF_
+		CURL_VERBOSE
 	));
 	my @vars = grep(/^GIT_/ && !/^GIT_($ok)/o, @env);
 	print join("\n", @vars);
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/8] t/lib-httpd: use write_script to copy CGI scripts
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
  2014-05-22  9:28   ` [PATCH v2 1/8] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
@ 2014-05-22  9:28   ` Jeff King
  2014-05-22  9:29   ` [PATCH v2 3/8] t5550: test display of remote http error messages Jeff King
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:28 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

Using write_script will set our shebang line appropriately
with $SHELL_PATH. The script that is there now is quite
simple and likely to succeed even with a non-POSIX /bin/sh,
but it does not hurt to be defensive.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh                   | 6 +++++-
 t/lib-httpd/broken-smart-http.sh | 1 -
 2 files changed, 5 insertions(+), 2 deletions(-)
 mode change 100755 => 100644 t/lib-httpd/broken-smart-http.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 252cbf1..8e680ef 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -105,10 +105,14 @@ else
 		"Could not identify web server at '$LIB_HTTPD_PATH'"
 fi
 
+install_script () {
+	write_script "$HTTPD_ROOT_PATH/$1" <"$TEST_PATH/$1"
+}
+
 prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
-	cp "$TEST_PATH"/broken-smart-http.sh "$HTTPD_ROOT_PATH"
+	install_script broken-smart-http.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/broken-smart-http.sh b/t/lib-httpd/broken-smart-http.sh
old mode 100755
new mode 100644
index f7ebfff..82cc610
--- a/t/lib-httpd/broken-smart-http.sh
+++ b/t/lib-httpd/broken-smart-http.sh
@@ -1,4 +1,3 @@
-#!/bin/sh
 printf "Content-Type: text/%s\n" "html"
 echo
 printf "%s\n" "001e# service=git-upload-pack"
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/8] t5550: test display of remote http error messages
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
  2014-05-22  9:28   ` [PATCH v2 1/8] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
  2014-05-22  9:28   ` [PATCH v2 2/8] t/lib-httpd: use write_script to copy CGI scripts Jeff King
@ 2014-05-22  9:29   ` Jeff King
  2014-05-22  9:29   ` [PATCH v2 4/8] http: extract type/subtype portion of content-type Jeff King
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:29 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

Since commit 426e70d (remote-curl: show server content on
http errors, 2013-04-05), we relay any text/plain error
messages from the remote server to the user. However, we
never tested it.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh             |  1 +
 t/lib-httpd/apache.conf    |  4 ++++
 t/lib-httpd/error.sh       | 17 +++++++++++++++++
 t/t5550-http-fetch-dumb.sh | 10 ++++++++++
 4 files changed, 32 insertions(+)
 create mode 100755 t/lib-httpd/error.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 8e680ef..f7640be 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -113,6 +113,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 3a03e82..b384d79 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -97,12 +97,16 @@ Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error/ error.sh/
 <Directory ${GIT_EXEC_PATH}>
 	Options FollowSymlinks
 </Directory>
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error.sh>
+  Options ExecCGI
+</Files>
 <Files ${GIT_EXEC_PATH}/git-http-backend>
 	Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
new file mode 100755
index 0000000..786f281
--- /dev/null
+++ b/t/lib-httpd/error.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+printf "Status: 500 Intentional Breakage\n"
+
+printf "Content-Type: "
+case "$PATH_INFO" in
+*html*)
+	printf "text/html"
+	;;
+*text*)
+	printf "text/plain"
+	;;
+esac
+printf "\n"
+
+printf "\n"
+printf "this is the error message\n"
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 1a3a2b6..13defd3 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -171,5 +171,15 @@ test_expect_success 'did not use upload-pack service' '
 	test_cmp exp act
 '
 
+test_expect_success 'git client shows text/plain errors' '
+	test_must_fail git clone "$HTTPD_URL/error/text" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
+test_expect_success 'git client does not show html errors' '
+	test_must_fail git clone "$HTTPD_URL/error/html" 2>stderr &&
+	! grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/8] http: extract type/subtype portion of content-type
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
                     ` (2 preceding siblings ...)
  2014-05-22  9:29   ` [PATCH v2 3/8] t5550: test display of remote http error messages Jeff King
@ 2014-05-22  9:29   ` Jeff King
  2014-05-22 22:52     ` Kyle J. McKay
  2014-05-22  9:30   ` [PATCH v2 5/8] http: optionally extract charset parameter from content-type Jeff King
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:29 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

When we get a content-type from curl, we get the whole
header line, including any parameters, and without any
normalization (like downcasing or whitespace) applied.
If we later try to match it with strcmp() or even
strcasecmp(), we may get false negatives.

This could cause two visible behaviors:

  1. We might fail to recognize a smart-http server by its
     content-type.

  2. We might fail to relay text/plain error messages to
     users (especially if they contain a charset parameter).

This patch teaches the http code to extract and normalize
just the type/subtype portion of the string. This is
technically passing out less information to the callers, who
can no longer see the parameters. But none of the current
callers cares, and a future patch will add back an
easier-to-use method for accessing those parameters.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c                     | 32 +++++++++++++++++++++++++++++---
 remote-curl.c              |  2 +-
 t/lib-httpd/error.sh       |  8 +++++++-
 t/t5550-http-fetch-dumb.sh |  5 +++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/http.c b/http.c
index 94e1afd..4edf5b9 100644
--- a/http.c
+++ b/http.c
@@ -906,6 +906,29 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, struct strbuf *buf)
 	return ret;
 }
 
+/*
+ * Extract a normalized version of the content type, with any
+ * spaces suppressed, all letters lowercased, and no trailing ";"
+ * or parameters.
+ *
+ * Example:
+ *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
+ */
+static void extract_content_type(struct strbuf *raw, struct strbuf *type)
+{
+	const char *p;
+
+	strbuf_reset(type);
+	strbuf_grow(type, raw->len);
+	for (p = raw->buf; *p; p++) {
+		if (isspace(*p))
+			continue;
+		if (*p == ';')
+			break;
+		strbuf_addch(type, tolower(*p));
+	}
+}
+
 /* http_request() targets */
 #define HTTP_REQUEST_STRBUF	0
 #define HTTP_REQUEST_FILE	1
@@ -957,9 +980,12 @@ static int http_request(const char *url,
 
 	ret = run_one_slot(slot, &results);
 
-	if (options && options->content_type)
-		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE,
-				options->content_type);
+	if (options && options->content_type) {
+		struct strbuf raw = STRBUF_INIT;
+		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE, &raw);
+		extract_content_type(&raw, options->content_type);
+		strbuf_release(&raw);
+	}
 
 	if (options && options->effective_url)
 		curlinfo_strbuf(slot->curl, CURLINFO_EFFECTIVE_URL,
diff --git a/remote-curl.c b/remote-curl.c
index 52c2d96..a5ab977 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -205,7 +205,7 @@ static int show_http_message(struct strbuf *type, struct strbuf *msg)
 	 * TODO should handle "; charset=XXX", and re-encode into
 	 * logoutputencoding
 	 */
-	if (strcasecmp(type->buf, "text/plain"))
+	if (strcmp(type->buf, "text/plain"))
 		return -1;
 
 	strbuf_trim(msg);
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 786f281..23cec97 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -3,6 +3,7 @@
 printf "Status: 500 Intentional Breakage\n"
 
 printf "Content-Type: "
+charset=iso-8859-1
 case "$PATH_INFO" in
 *html*)
 	printf "text/html"
@@ -10,8 +11,13 @@ case "$PATH_INFO" in
 *text*)
 	printf "text/plain"
 	;;
+*charset*)
+	printf "text/plain; charset=utf-8"
+	charset=utf-8
+	;;
 esac
 printf "\n"
 
 printf "\n"
-printf "this is the error message\n"
+printf "this is the error message\n" |
+iconv -f us-ascii -t $charset
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 13defd3..b35b261 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -181,5 +181,10 @@ test_expect_success 'git client does not show html errors' '
 	! grep "this is the error message" stderr
 '
 
+test_expect_success 'git client shows text/plain with a charset' '
+	test_must_fail git clone "$HTTPD_URL/error/charset" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/8] http: optionally extract charset parameter from content-type
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
                     ` (3 preceding siblings ...)
  2014-05-22  9:29   ` [PATCH v2 4/8] http: extract type/subtype portion of content-type Jeff King
@ 2014-05-22  9:30   ` Jeff King
  2014-05-22  9:30   ` [PATCH v2 6/8] strbuf: add strbuf_reencode helper Jeff King
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:30 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

Since the previous commit, we now give a sanitized,
shortened version of the content-type header to any callers
who ask for it.

This patch adds back a way for them to cleanly access
specific parameters to the type. We could easily extract all
parameters and make them available via a string_list, but:

  1. That complicates the interface and memory management.

  2. In practice, no planned callers care about anything
     except the charset.

This patch therefore goes with the simplest thing, and we
can expand or change the interface later if it becomes
necessary.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 http.h |  7 +++++++
 2 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/http.c b/http.c
index 4edf5b9..e26ee8b 100644
--- a/http.c
+++ b/http.c
@@ -907,14 +907,44 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, struct strbuf *buf)
 }
 
 /*
+ * Check for and extract a content-type parameter. "raw"
+ * should be positioned at the start of the potential
+ * parameter, with any whitespace already removed.
+ *
+ * "name" is the name of the parameter. The value is appended
+ * to "out".
+ */
+static int extract_param(const char *raw, const char *name,
+			 struct strbuf *out)
+{
+	size_t len = strlen(name);
+
+	if (strncasecmp(raw, name, len))
+		return -1;
+	raw += len;
+
+	if (*raw != '=')
+		return -1;
+	raw++;
+
+	while (*raw && !isspace(*raw))
+		strbuf_addch(out, *raw++);
+	return 0;
+}
+
+/*
  * Extract a normalized version of the content type, with any
  * spaces suppressed, all letters lowercased, and no trailing ";"
  * or parameters.
  *
+ * If the "charset" argument is not NULL, store the value of any
+ * charset parameter there.
+ *
  * Example:
- *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
+ *   "TEXT/PLAIN; charset=utf-8" -> "text/plain", "utf-8"
  */
-static void extract_content_type(struct strbuf *raw, struct strbuf *type)
+static void extract_content_type(struct strbuf *raw, struct strbuf *type,
+				 struct strbuf *charset)
 {
 	const char *p;
 
@@ -923,10 +953,25 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type)
 	for (p = raw->buf; *p; p++) {
 		if (isspace(*p))
 			continue;
-		if (*p == ';')
+		if (*p == ';') {
+			p++;
 			break;
+		}
 		strbuf_addch(type, tolower(*p));
 	}
+
+	if (!charset)
+		return;
+
+	strbuf_reset(charset);
+	while (*p) {
+		while (isspace(*p))
+			p++;
+		if (!extract_param(p, "charset", charset))
+			return;
+		while (*p && !isspace(*p))
+			p++;
+	}
 }
 
 /* http_request() targets */
@@ -983,7 +1028,8 @@ static int http_request(const char *url,
 	if (options && options->content_type) {
 		struct strbuf raw = STRBUF_INIT;
 		curlinfo_strbuf(slot->curl, CURLINFO_CONTENT_TYPE, &raw);
-		extract_content_type(&raw, options->content_type);
+		extract_content_type(&raw, options->content_type,
+				     options->charset);
 		strbuf_release(&raw);
 	}
 
diff --git a/http.h b/http.h
index e64084f..473179b 100644
--- a/http.h
+++ b/http.h
@@ -144,6 +144,13 @@ struct http_get_options {
 	struct strbuf *content_type;
 
 	/*
+	 * If non-NULL, and content_type above is non-NULL, returns
+	 * the charset parameter from the content-type. If none is
+	 * present, returns an empty string.
+	 */
+	struct strbuf *charset;
+
+	/*
 	 * If non-NULL, returns the URL we ended up at, including any
 	 * redirects we followed.
 	 */
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 6/8] strbuf: add strbuf_reencode helper
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
                     ` (4 preceding siblings ...)
  2014-05-22  9:30   ` [PATCH v2 5/8] http: optionally extract charset parameter from content-type Jeff King
@ 2014-05-22  9:30   ` Jeff King
  2014-05-22  9:30   ` [PATCH v2 7/8] remote-curl: reencode http error messages Jeff King
  2014-05-22  9:36   ` [PATCH v2 8/8] http: default text charset to iso-8859-1 Jeff King
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:30 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

This is a convenience wrapper around `reencode_string_len`
and `strbuf_attach`.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-strbuf.txt |  5 +++++
 strbuf.c                               | 17 +++++++++++++++++
 strbuf.h                               |  1 +
 3 files changed, 23 insertions(+)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index 3350d97..9d28b03 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -125,6 +125,11 @@ Functions
 
 	Strip whitespace from the end of a string.
 
+`strbuf_reencode`::
+
+	Replace the contents of the strbuf with a reencoded form.  Returns -1
+	on error, 0 on success.
+
 `strbuf_cmp`::
 
 	Compare two buffers. Returns an integer less than, equal to, or greater
diff --git a/strbuf.c b/strbuf.c
index ee96dcf..fc7290f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "refs.h"
+#include "utf8.h"
 
 int starts_with(const char *str, const char *prefix)
 {
@@ -106,6 +107,22 @@ void strbuf_ltrim(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+int strbuf_reencode(struct strbuf *sb, const char *from, const char *to)
+{
+	char *out;
+	int len;
+
+	if (same_encoding(from, to))
+		return 0;
+
+	out = reencode_string_len(sb->buf, sb->len, to, from, &len);
+	if (!out)
+		return -1;
+
+	strbuf_attach(sb, out, len, len);
+	return 0;
+}
+
 struct strbuf **strbuf_split_buf(const char *str, size_t slen,
 				 int terminator, int max)
 {
diff --git a/strbuf.h b/strbuf.h
index 39c14cf..4e9a2f8 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -45,6 +45,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len)
 extern void strbuf_trim(struct strbuf *);
 extern void strbuf_rtrim(struct strbuf *);
 extern void strbuf_ltrim(struct strbuf *);
+extern int strbuf_reencode(struct strbuf *sb, const char *from, const char *to);
 extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
 
 /*
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 7/8] remote-curl: reencode http error messages
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
                     ` (5 preceding siblings ...)
  2014-05-22  9:30   ` [PATCH v2 6/8] strbuf: add strbuf_reencode helper Jeff King
@ 2014-05-22  9:30   ` Jeff King
  2014-05-22  9:36   ` [PATCH v2 8/8] http: default text charset to iso-8859-1 Jeff King
  7 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:30 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

We currently recognize an error message with a content-type
"text/plain; charset=utf-16" as text, but we ignore the
charset parameter entirely. Let's encode it to
log_output_encoding, which is presumably something the
user's terminal can handle.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c              | 17 ++++++++++-------
 t/lib-httpd/error.sh       |  4 ++++
 t/t5550-http-fetch-dumb.sh |  5 +++++
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index a5ab977..4493b38 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -194,19 +194,19 @@ static void free_discovery(struct discovery *d)
 	}
 }
 
-static int show_http_message(struct strbuf *type, struct strbuf *msg)
+static int show_http_message(struct strbuf *type, struct strbuf *charset,
+			     struct strbuf *msg)
 {
 	const char *p, *eol;
 
 	/*
 	 * We only show text/plain parts, as other types are likely
 	 * to be ugly to look at on the user's terminal.
-	 *
-	 * TODO should handle "; charset=XXX", and re-encode into
-	 * logoutputencoding
 	 */
 	if (strcmp(type->buf, "text/plain"))
 		return -1;
+	if (charset->len)
+		strbuf_reencode(msg, charset->buf, get_log_output_encoding());
 
 	strbuf_trim(msg);
 	if (!msg->len)
@@ -225,6 +225,7 @@ static struct discovery* discover_refs(const char *service, int for_push)
 {
 	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
+	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
 	struct strbuf refs_url = STRBUF_INIT;
 	struct strbuf effective_url = STRBUF_INIT;
@@ -249,6 +250,7 @@ static struct discovery* discover_refs(const char *service, int for_push)
 
 	memset(&options, 0, sizeof(options));
 	options.content_type = &type;
+	options.charset = &charset;
 	options.effective_url = &effective_url;
 	options.base_url = &url;
 	options.no_cache = 1;
@@ -259,13 +261,13 @@ static struct discovery* discover_refs(const char *service, int for_push)
 	case HTTP_OK:
 		break;
 	case HTTP_MISSING_TARGET:
-		show_http_message(&type, &buffer);
+		show_http_message(&type, &charset, &buffer);
 		die("repository '%s' not found", url.buf);
 	case HTTP_NOAUTH:
-		show_http_message(&type, &buffer);
+		show_http_message(&type, &charset, &buffer);
 		die("Authentication failed for '%s'", url.buf);
 	default:
-		show_http_message(&type, &buffer);
+		show_http_message(&type, &charset, &buffer);
 		die("unable to access '%s': %s", url.buf, curl_errorstr);
 	}
 
@@ -310,6 +312,7 @@ static struct discovery* discover_refs(const char *service, int for_push)
 	strbuf_release(&refs_url);
 	strbuf_release(&exp);
 	strbuf_release(&type);
+	strbuf_release(&charset);
 	strbuf_release(&effective_url);
 	strbuf_release(&buffer);
 	last_discovery = last;
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 23cec97..eafc9d2 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -15,6 +15,10 @@ case "$PATH_INFO" in
 	printf "text/plain; charset=utf-8"
 	charset=utf-8
 	;;
+*utf16*)
+	printf "text/plain; charset=utf-16"
+	charset=utf-16
+	;;
 esac
 printf "\n"
 
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index b35b261..01b8aae 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -186,5 +186,10 @@ test_expect_success 'git client shows text/plain with a charset' '
 	grep "this is the error message" stderr
 '
 
+test_expect_success 'http error messages are reencoded' '
+	test_must_fail git clone "$HTTPD_URL/error/utf16" 2>stderr &&
+	grep "this is the error message" stderr
+'
+
 stop_httpd
 test_done
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 8/8] http: default text charset to iso-8859-1
  2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
                     ` (6 preceding siblings ...)
  2014-05-22  9:30   ` [PATCH v2 7/8] remote-curl: reencode http error messages Jeff King
@ 2014-05-22  9:36   ` Jeff King
  2014-05-23  2:02     ` brian m. carlson
  7 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-22  9:36 UTC (permalink / raw)
  To: git; +Cc: Kyle J. McKay, Peter Krefting

This is specified by RFC 2616 as the default if no "charset"
parameter is given.

Signed-off-by: Jeff King <peff@peff.net>
---
I'd prefer to do this simple, standard thing, and see how it works in
the real world. We'll hand whatever we get off to iconv, and if it
chokes, we'll pass through the data as-is. That should be enough for
most ascii messages to make it through readable, even if we get the
encoding wrong.

If we do want to do magic like "latin1 is really iso-8859-1", that seems
like the domain of iconv to me. If iconv doesn't handle it itself, I'd
rather have a wrapper there. Putting it at that layer keeps the code
cleaner, and it means the wrapper would benefit the regular commit-log
reencoding code.

If anybody wants to go further in that direction, be my guest, but please
make your suggestions in the form of patches which apply on top. :)

 http.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/http.c b/http.c
index e26ee8b..a37e84e 100644
--- a/http.c
+++ b/http.c
@@ -972,6 +972,9 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type,
 		while (*p && !isspace(*p))
 			p++;
 	}
+
+	if (!charset->len && starts_with(type->buf, "text/"))
+		strbuf_addstr(charset, "ISO-8859-1");
 }
 
 /* http_request() targets */
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
  2014-05-22  9:05       ` Jeff King
@ 2014-05-22 10:19         ` Peter Krefting
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Krefting @ 2014-05-22 10:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Kyle J. McKay, git

Jeff King:

> I was really hoping to avoid getting into all of the real-world
> messiness that a real http client needs to deal with (as opposed to just
> following the standards).

Yeah, I agree, you're probably fine without all this detail in over 
99% of the cases where this code would ever be exposed. I'm a bit 
damaged after 10+ years as a web browser developer, responsible for 
exactly this kind of stuff... :-)

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22  5:58     ` Jeff King
@ 2014-05-22 18:36       ` Junio C Hamano
  2014-05-22 18:41         ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2014-05-22 18:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Kyle J. McKay, git

Jeff King <peff@peff.net> writes:

> On Wed, May 21, 2014 at 05:07:36PM -0700, Kyle J. McKay wrote:
>
>> >+void strbuf_tolower(struct strbuf *sb)
>> >+{
>> >+	size_t i;
>> >+	for (i = 0; i < sb->len; i++)
>> >+		sb->buf[i] = tolower(sb->buf[i]);
>> >+}
>> >+
>> 
>> Wouldn't a direct transfer of the lowercase function be something more like:
>> 
>> 
>> void strbuf_tolower(struct strbuf *sb)
>> {
>> 	char *p = sb->buf;
>> 	for (; *p; p++)
>> 		*p = tolower(*p);
>> }
>> 
>> That seems to me to be a bit more efficient.  According to the comments in
>> strbuf.c, "people can always assume buf is non NULL and ->buf is NUL
>> terminated even for a freshly initialized strbuf."
>
> Yes, and that would be fine with me (I actually wrote strbuf_tolower for
> my own use, and _then_ realized that we already had such a thing that
> could be replaced).

Do we forbid that sb->buf[x] for some x < sb->len to be NUL, and if
there is such a byte we stop running tolower() on the remainder?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22 18:36       ` Junio C Hamano
@ 2014-05-22 18:41         ` Jeff King
  2014-05-22 21:04           ` Junio C Hamano
  2014-05-22 22:52           ` Kyle J. McKay
  0 siblings, 2 replies; 42+ messages in thread
From: Jeff King @ 2014-05-22 18:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, git

On Thu, May 22, 2014 at 11:36:37AM -0700, Junio C Hamano wrote:

> > Yes, and that would be fine with me (I actually wrote strbuf_tolower for
> > my own use, and _then_ realized that we already had such a thing that
> > could be replaced).
> 
> Do we forbid that sb->buf[x] for some x < sb->len to be NUL, and if
> there is such a byte we stop running tolower() on the remainder?

Christian brought this up elsewhere, and I agree it's probably better to
work over the whole buffer, NULs included. I'm happy to re-roll (or you
can just pick up the version of the patch in this thread), but I think
the bigger question is: is this refactor worth doing, since there is
only one caller?

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22 18:41         ` Jeff King
@ 2014-05-22 21:04           ` Junio C Hamano
  2014-05-23 20:03             ` Jeff King
  2014-05-22 22:52           ` Kyle J. McKay
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2014-05-22 21:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Kyle J. McKay, git

Jeff King <peff@peff.net> writes:

>> > Yes, and that would be fine with me (I actually wrote strbuf_tolower for
>> > my own use, and _then_ realized that we already had such a thing that
>> > could be replaced).
>> ...
> ... I think
> the bigger question is: is this refactor worth doing, since there is
> only one caller?

If you wrote it for your own use and then realized that it is
applicable to this codepath, wouldn't that say that there are
multiple potential callers that would benefit from having it?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22 18:41         ` Jeff King
  2014-05-22 21:04           ` Junio C Hamano
@ 2014-05-22 22:52           ` Kyle J. McKay
  2014-05-23 20:05             ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22 22:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On May 22, 2014, at 11:41, Jeff King wrote:

> On Thu, May 22, 2014 at 11:36:37AM -0700, Junio C Hamano wrote:
>
>>> Yes, and that would be fine with me (I actually wrote  
>>> strbuf_tolower for
>>> my own use, and _then_ realized that we already had such a thing  
>>> that
>>> could be replaced).
>>
>> Do we forbid that sb->buf[x] for some x < sb->len to be NUL, and if
>> there is such a byte we stop running tolower() on the remainder?
>
> Christian brought this up elsewhere, and I agree it's probably  
> better to
> work over the whole buffer, NULs included. I'm happy to re-roll (or  
> you
> can just pick up the version of the patch in this thread),

The only reason I brought up the code difference in the first place  
was that the comment was "This makes config's lowercase() function  
public" which made me expect to see basically the equivalent of  
replacing a "static" with an "extern", but actually the function being  
made public was a different implementation than config's lowercase()  
function.  So it looks like the original PATCH 2/9 version of the code  
is best, but perhaps the comment can be tweaked a bit to not convey  
the same impression I got.  Maybe something like "Replace config's  
lowercase() function with a public one".

--Kyle

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type
  2014-05-22  9:29   ` [PATCH v2 4/8] http: extract type/subtype portion of content-type Jeff King
@ 2014-05-22 22:52     ` Kyle J. McKay
  2014-05-23 20:12       ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-22 22:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Peter Krefting


On May 22, 2014, at 02:29, Jeff King wrote:

> When we get a content-type from curl, we get the whole
> header line, including any parameters, and without any
> normalization (like downcasing or whitespace) applied.
> If we later try to match it with strcmp() or even
> strcasecmp(), we may get false negatives.
>
> This could cause two visible behaviors:
>
>  1. We might fail to recognize a smart-http server by its
>     content-type.
>
>  2. We might fail to relay text/plain error messages to
>     users (especially if they contain a charset parameter).
>
> This patch teaches the http code to extract and normalize
> just the type/subtype portion of the string. This is
> technically passing out less information to the callers, who
> can no longer see the parameters. But none of the current
> callers cares, and a future patch will add back an
> easier-to-use method for accessing those parameters.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> http.c                     | 32 +++++++++++++++++++++++++++++---
> remote-curl.c              |  2 +-
> t/lib-httpd/error.sh       |  8 +++++++-
> t/t5550-http-fetch-dumb.sh |  5 +++++
> 4 files changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/http.c b/http.c
> index 94e1afd..4edf5b9 100644
> --- a/http.c
> +++ b/http.c
> @@ -906,6 +906,29 @@ static CURLcode curlinfo_strbuf(CURL *curl,  
> CURLINFO info, struct strbuf *buf)
> 	return ret;
> }
>
> +/*
> + * Extract a normalized version of the content type, with any
> + * spaces suppressed, all letters lowercased, and no trailing ";"
> + * or parameters.
> + *
> + * Example:
> + *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
> + */
> +static void extract_content_type(struct strbuf *raw, struct strbuf  
> *type)
> +{
> +	const char *p;
> +
> +	strbuf_reset(type);
> +	strbuf_grow(type, raw->len);
> +	for (p = raw->buf; *p; p++) {
> +		if (isspace(*p))
> +			continue;
> +		if (*p == ';')
> +			break;
> +		strbuf_addch(type, tolower(*p));
> +	}
> +}
> +

This will parse invalid content types as valid.  Probably not  
important since the producer of an invalid content type shouldn't be  
depending on any particular behavior by the consumer of such a type,  
but I think it warrants a note in the comment block, perhaps something  
like:

   * Note that an invalid content-type may be converted to a valid one

or some such.

--Kyle

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 8/8] http: default text charset to iso-8859-1
  2014-05-22  9:36   ` [PATCH v2 8/8] http: default text charset to iso-8859-1 Jeff King
@ 2014-05-23  2:02     ` brian m. carlson
  0 siblings, 0 replies; 42+ messages in thread
From: brian m. carlson @ 2014-05-23  2:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Kyle J. McKay, Peter Krefting

[-- Attachment #1: Type: text/plain, Size: 1043 bytes --]

On Thu, May 22, 2014 at 05:36:12AM -0400, Jeff King wrote:
> If we do want to do magic like "latin1 is really iso-8859-1", that seems
> like the domain of iconv to me. If iconv doesn't handle it itself, I'd
> rather have a wrapper there. Putting it at that layer keeps the code
> cleaner, and it means the wrapper would benefit the regular commit-log
> reencoding code.

I think being a little stricter in our character encoding actually
benefits users.  If someone claims that all their commit messages are in
US-ASCII or ISO-8859-1, and then stuffs Windows-1252 in there, that's
going to break a lot of stuff, especially if someone assumes US-ASCII
means it's okay to use it where UTF-8 is required.

It's much better to let people not insert broken stuff in the first
place rather than deal with it afterwards.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22 21:04           ` Junio C Hamano
@ 2014-05-23 20:03             ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2014-05-23 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, git

On Thu, May 22, 2014 at 02:04:20PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> > Yes, and that would be fine with me (I actually wrote strbuf_tolower for
> >> > my own use, and _then_ realized that we already had such a thing that
> >> > could be replaced).
> >> ...
> > ... I think
> > the bigger question is: is this refactor worth doing, since there is
> > only one caller?
> 
> If you wrote it for your own use and then realized that it is
> applicable to this codepath, wouldn't that say that there are
> multiple potential callers that would benefit from having it?

Sure, and that's why I posted it. But I know our standard for adding
code is often a bit higher. We don't generally add code that has no
callers.  In this case, there is one caller, but it is still only
theoretical that there will be another. I.e., for the same reason that I
did not end up using it in my patch (namely, that we often want to
downcase _and_ do other things while we traverse the string), other
places may not.

That's why I asked: it is a judgement call on "does this seem like it is
likely to be a useful thing in the future?". Probably, but I wouldn't be
sad if people disagreed and we dropped it.

Anyway, here is an updated patch. It uses pointer arithmetic (though I
am curious whether hand-optimizing "*p" is actually any faster than
"sb->buf[i]" with modern compilers), but traverses past NULs. I also
updated the commit message to reflect the discussion.

-- >8 --
Subject: strbuf: add strbuf_tolower function

This is a convenience wrapper to call tolower on each
character of the string.

This makes config's lowercase() function obsolete, though
note that because we have a strbuf, we are careful to
operate over the whole strbuf, rather than assuming that a
NUL is the end-of-string.

We could continue to offer a pure-string lowercase, but
there would be no callers (in most pure-string cases, we
actually duplicate and lowercase the duplicate, for which we
have the xstrdup_tolower wrapper).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-strbuf.txt | 4 ++++
 config.c                               | 8 +-------
 strbuf.c                               | 7 +++++++
 strbuf.h                               | 1 +
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index 3350d97..8480f89 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -125,6 +125,10 @@ Functions
 
 	Strip whitespace from the end of a string.
 
+`strbuf_tolower`::
+
+	Lowercase each character in the buffer using `tolower`.
+
 `strbuf_cmp`::
 
 	Compare two buffers. Returns an integer less than, equal to, or greater
diff --git a/config.c b/config.c
index a30cb5c..03ce5c6 100644
--- a/config.c
+++ b/config.c
@@ -147,12 +147,6 @@ int git_config_include(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static void lowercase(char *p)
-{
-	for (; *p; p++)
-		*p = tolower(*p);
-}
-
 void git_config_push_parameter(const char *text)
 {
 	struct strbuf env = STRBUF_INIT;
@@ -180,7 +174,7 @@ int git_config_parse_parameter(const char *text,
 		strbuf_list_free(pair);
 		return error("bogus config parameter: %s", text);
 	}
-	lowercase(pair[0]->buf);
+	strbuf_tolower(pair[0]);
 	if (fn(pair[0]->buf, pair[1] ? pair[1]->buf : NULL, data) < 0) {
 		strbuf_list_free(pair);
 		return -1;
diff --git a/strbuf.c b/strbuf.c
index 854c725..2d059b9 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -106,6 +106,13 @@ void strbuf_ltrim(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+void strbuf_tolower(struct strbuf *sb)
+{
+	char *p = sb->buf, *end = sb->buf + sb->len;
+	for (; p < end; p++)
+		*p = tolower(*p);
+}
+
 struct strbuf **strbuf_split_buf(const char *str, size_t slen,
 				 int terminator, int max)
 {
diff --git a/strbuf.h b/strbuf.h
index 4de7531..25328b9 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -45,6 +45,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len)
 extern void strbuf_trim(struct strbuf *);
 extern void strbuf_rtrim(struct strbuf *);
 extern void strbuf_ltrim(struct strbuf *);
+extern void strbuf_tolower(struct strbuf *sb);
 extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
 
 /*
-- 
2.0.0.rc1.436.g03cb729

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-22 22:52           ` Kyle J. McKay
@ 2014-05-23 20:05             ` Jeff King
  2014-05-23 22:34               ` Kyle J. McKay
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-23 20:05 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Junio C Hamano, git

On Thu, May 22, 2014 at 03:52:08PM -0700, Kyle J. McKay wrote:

> >Christian brought this up elsewhere, and I agree it's probably better to
> >work over the whole buffer, NULs included. I'm happy to re-roll (or you
> >can just pick up the version of the patch in this thread),
> 
> The only reason I brought up the code difference in the first place was that
> the comment was "This makes config's lowercase() function public" which made
> me expect to see basically the equivalent of replacing a "static" with an
> "extern", but actually the function being made public was a different
> implementation than config's lowercase() function.  So it looks like the
> original PATCH 2/9 version of the code is best, but perhaps the comment can
> be tweaked a bit to not convey the same impression I got.  Maybe something
> like "Replace config's lowercase() function with a public one".

Yeah, sorry if it sounded like I was complaining about your review
elsewhere. I mostly found it amusing that I got two opposite-direction
reviews.

I agree that clarifying the situation in the commit message is best, and
I've done that in the version I just posted.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type
  2014-05-22 22:52     ` Kyle J. McKay
@ 2014-05-23 20:12       ` Jeff King
  2014-05-23 22:00         ` Kyle J. McKay
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2014-05-23 20:12 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git, Peter Krefting

On Thu, May 22, 2014 at 03:52:21PM -0700, Kyle J. McKay wrote:

> >+static void extract_content_type(struct strbuf *raw, struct strbuf *type)
> >+{
> >+	const char *p;
> >+
> >+	strbuf_reset(type);
> >+	strbuf_grow(type, raw->len);
> >+	for (p = raw->buf; *p; p++) {
> >+		if (isspace(*p))
> >+			continue;
> >+		if (*p == ';')
> >+			break;
> >+		strbuf_addch(type, tolower(*p));
> >+	}
> >+}
> >+
> 
> This will parse invalid content types as valid.  Probably not important
> since the producer of an invalid content type shouldn't be depending on any
> particular behavior by the consumer of such a type, but I think it warrants
> a note in the comment block, perhaps something like:
> 
>   * Note that an invalid content-type may be converted to a valid one
> 
> or some such.

Yeah, that is intentional based on our earlier discussion (this function
started as "normalize_content_type" :) ). I think it's not a big deal,
but agree it's worth a comment. Like:

diff --git a/http.c b/http.c
index 4edf5b9..6bfd093 100644
--- a/http.c
+++ b/http.c
@@ -911,8 +911,14 @@ static CURLcode curlinfo_strbuf(CURL *curl, CURLINFO info, struct strbuf *buf)
  * spaces suppressed, all letters lowercased, and no trailing ";"
  * or parameters.
  *
+ * Note that we will silently remove even invalid whitespace. For
+ * example, "text / plain" is specifically forbidden by RFC 2616,
+ * but "text/plain" is the only reasonable output, and this keeps
+ * our code simple.
+ *
  * Example:
  *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
+ *   "text / plain" -> "text/plain"
  */
 static void extract_content_type(struct strbuf *raw, struct strbuf *type)
 {

-Peff

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] http: extract type/subtype portion of content-type
  2014-05-23 20:12       ` Jeff King
@ 2014-05-23 22:00         ` Kyle J. McKay
  0 siblings, 0 replies; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-23 22:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Peter Krefting

On May 23, 2014, at 13:12, Jeff King wrote:
> On Thu, May 22, 2014 at 03:52:21PM -0700, Kyle J. McKay wrote:
>
>>> +static void extract_content_type(struct strbuf *raw, struct  
>>> strbuf *type)
>>> +{
>>> +	const char *p;
>>> +
>>> +	strbuf_reset(type);
>>> +	strbuf_grow(type, raw->len);
>>> +	for (p = raw->buf; *p; p++) {
>>> +		if (isspace(*p))
>>> +			continue;
>>> +		if (*p == ';')
>>> +			break;
>>> +		strbuf_addch(type, tolower(*p));
>>> +	}
>>> +}
>>> +
>>
>> This will parse invalid content types as valid.  Probably not  
>> important
>> since the producer of an invalid content type shouldn't be  
>> depending on any
>> particular behavior by the consumer of such a type, but I think it  
>> warrants
>> a note in the comment block, perhaps something like:
>>
>>  * Note that an invalid content-type may be converted to a valid one
>>
>> or some such.
>
> Yeah, that is intentional based on our earlier discussion (this  
> function
> started as "normalize_content_type" :) ). I think it's not a big deal,
> but agree it's worth a comment. Like:
>
> diff --git a/http.c b/http.c
> index 4edf5b9..6bfd093 100644
> --- a/http.c
> +++ b/http.c
> @@ -911,8 +911,14 @@ static CURLcode curlinfo_strbuf(CURL *curl,  
> CURLINFO info, struct strbuf *buf)
>  * spaces suppressed, all letters lowercased, and no trailing ";"
>  * or parameters.
>  *
> + * Note that we will silently remove even invalid whitespace. For
> + * example, "text / plain" is specifically forbidden by RFC 2616,
> + * but "text/plain" is the only reasonable output, and this keeps
> + * our code simple.

Very nice.  :)

> + *
>  * Example:
>  *   "TEXT/PLAIN; charset=utf-8" -> "text/plain"
> + *   "text / plain" -> "text/plain"
>  */
> static void extract_content_type(struct strbuf *raw, struct strbuf  
> *type)
> {
>
> -Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/9] strbuf: add strbuf_tolower function
  2014-05-23 20:05             ` Jeff King
@ 2014-05-23 22:34               ` Kyle J. McKay
  0 siblings, 0 replies; 42+ messages in thread
From: Kyle J. McKay @ 2014-05-23 22:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On May 23, 2014, at 13:05, Jeff King wrote:
> On Thu, May 22, 2014 at 03:52:08PM -0700, Kyle J. McKay wrote:
>
>>> Christian brought this up elsewhere, and I agree it's probably  
>>> better to
>>> work over the whole buffer, NULs included. I'm happy to re-roll  
>>> (or you
>>> can just pick up the version of the patch in this thread),
>>
>> The only reason I brought up the code difference in the first place  
>> was that
>> the comment was "This makes config's lowercase() function public"  
>> which made
>> me expect to see basically the equivalent of replacing a "static"  
>> with an
>> "extern", but actually the function being made public was a different
>> implementation than config's lowercase() function.
>
> Yeah, sorry if it sounded like I was complaining about your review
> elsewhere. I mostly found it amusing that I got two opposite-direction
> reviews.

I didn't seem like complaining to me.  I also was amused.  :)

> I agree that clarifying the situation in the commit message is best,  
> and
> I've done that in the version I just posted.

It looks great.  And I suspect you're correct that a modern compiler  
would optimize the index-based version to be as efficient as the  
pointer arithmetic version.

--Kyle

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-05-23 22:34 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21 10:25 [PATCH 0/9] handle alternate charsets for remote http errors Jeff King
2014-05-21 10:27 ` [PATCH 1/9] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
2014-05-21 10:27 ` [PATCH 2/9] strbuf: add strbuf_tolower function Jeff King
2014-05-22  0:07   ` Kyle J. McKay
2014-05-22  5:58     ` Jeff King
2014-05-22 18:36       ` Junio C Hamano
2014-05-22 18:41         ` Jeff King
2014-05-22 21:04           ` Junio C Hamano
2014-05-23 20:03             ` Jeff King
2014-05-22 22:52           ` Kyle J. McKay
2014-05-23 20:05             ` Jeff King
2014-05-23 22:34               ` Kyle J. McKay
2014-05-21 10:28 ` [PATCH 3/9] daemon/config: factor out duplicate xstrdup_tolower Jeff King
2014-05-21 10:29 ` [PATCH 4/9] http: normalize case of returned content-type Jeff King
2014-05-21 10:29 ` [PATCH 5/9] t/lib-httpd: use write_script to copy CGI scripts Jeff King
2014-05-21 10:29 ` [PATCH 6/9] t5550: test display of remote http error messages Jeff King
2014-05-21 10:33 ` [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter Jeff King
2014-05-22  0:07   ` Kyle J. McKay
2014-05-22  6:05     ` Jeff King
2014-05-22  7:27       ` Kyle J. McKay
2014-05-22  9:02         ` Jeff King
2014-05-22  7:12     ` Peter Krefting
2014-05-22  9:05       ` Jeff King
2014-05-22 10:19         ` Peter Krefting
2014-05-21 10:33 ` [PATCH 8/9] strbuf: add strbuf_reencode helper Jeff King
2014-05-21 10:33 ` [PATCH 9/9] remote-curl: reencode http error messages Jeff King
2014-05-22  0:07   ` Kyle J. McKay
2014-05-22  6:05     ` Jeff King
2014-05-22  7:26     ` Peter Krefting
2014-05-22  9:28 ` [PATCH v2 0/9] handle alternate charsets for remote http errors Jeff King
2014-05-22  9:28   ` [PATCH v2 1/8] test-lib: preserve GIT_CURL_VERBOSE from the environment Jeff King
2014-05-22  9:28   ` [PATCH v2 2/8] t/lib-httpd: use write_script to copy CGI scripts Jeff King
2014-05-22  9:29   ` [PATCH v2 3/8] t5550: test display of remote http error messages Jeff King
2014-05-22  9:29   ` [PATCH v2 4/8] http: extract type/subtype portion of content-type Jeff King
2014-05-22 22:52     ` Kyle J. McKay
2014-05-23 20:12       ` Jeff King
2014-05-23 22:00         ` Kyle J. McKay
2014-05-22  9:30   ` [PATCH v2 5/8] http: optionally extract charset parameter from content-type Jeff King
2014-05-22  9:30   ` [PATCH v2 6/8] strbuf: add strbuf_reencode helper Jeff King
2014-05-22  9:30   ` [PATCH v2 7/8] remote-curl: reencode http error messages Jeff King
2014-05-22  9:36   ` [PATCH v2 8/8] http: default text charset to iso-8859-1 Jeff King
2014-05-23  2:02     ` brian m. carlson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.