From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Sunshine Subject: Re: [PATCH v6 1/1] http: Add Accept-Language header if possible Date: Wed, 24 Dec 2014 15:35:09 -0500 Message-ID: References: <1419266658-1180-1-git-send-email-eungjun.yi@navercorp.com> <1419266658-1180-2-git-send-email-eungjun.yi@navercorp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Git List , Yi EungJun , Jeff King , Peter Krefting , Junio C Hamano , Michael Blume To: Yi EungJun X-From: git-owner@vger.kernel.org Wed Dec 24 21:35:21 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Y3seO-000697-J6 for gcvg-git-2@plane.gmane.org; Wed, 24 Dec 2014 21:35:21 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751531AbaLXUfL (ORCPT ); Wed, 24 Dec 2014 15:35:11 -0500 Received: from mail-yk0-f169.google.com ([209.85.160.169]:49420 "EHLO mail-yk0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751484AbaLXUfK (ORCPT ); Wed, 24 Dec 2014 15:35:10 -0500 Received: by mail-yk0-f169.google.com with SMTP id 79so4144782ykr.14 for ; Wed, 24 Dec 2014 12:35:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=RXkqBxzswbBtYQj+jwYdD8UvxgsTKRq8R/RtKsD77Mk=; b=XReHtO1q3NQeYqbd0f9sn5AZ1++rLP55IYturaMZasGHU6gsN6oSST5+1gowf/movQ e4pi6BtT1kkGE0mgeAUD2jK528QEc/nvL8UgT9FHL/vYypALjfQyBgqZE6dONz9TfyZq uQlnL8v4k++0sVqNGciHB/8L0MPbY7dUcPPFHu+V2lV5NlSmj5qP9pIF0oGasPzjzj30 NWmtuj0XtCfkrS6JvOJQ6VpwQj6HGQ7OEnkGuPcN1TfpH1sOoQy8+UPHvcPjlcb+Ijqm pAEkPXZGBRD4DKIUCMkaVFVTqv+vSoLP2dgoGUPJB6qdmEdmHT7v4GcqBRoTRwwxY98k 6gcQ== X-Received: by 10.170.78.130 with SMTP id u124mr31076434yku.91.1419453309431; Wed, 24 Dec 2014 12:35:09 -0800 (PST) Received: by 10.170.73.7 with HTTP; Wed, 24 Dec 2014 12:35:09 -0800 (PST) In-Reply-To: <1419266658-1180-2-git-send-email-eungjun.yi@navercorp.com> X-Google-Sender-Auth: 7kkWCYwCFgHXKck3hJtolAFXT3U Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, Dec 22, 2014 at 11:44 AM, Yi EungJun wrote: > From: Yi EungJun > > Add an Accept-Language header which indicates the user's preferred > languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG. > > Examples: > LANGUAGE= -> "" > LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1" > LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1" > LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1" > > This gives git servers a chance to display remote error messages in > the user's preferred language. > > Limit the number of languages to 1,000 because q-value must not be > smaller than 0.001, and limit the length of Accept-Language header to > 4,000 bytes for some HTTP servers which cannot accept such long header. Just a few comments and observations below. Alone, they are not necessarily worth a re-roll, but if you happen to re-roll for some other reason, perhaps take them into consideration. > Signed-off-by: Yi EungJun > --- > diff --git a/http.c b/http.c > index 040f362..7a77708 100644 > --- a/http.c > +++ b/http.c > @@ -986,6 +993,166 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type, > strbuf_addstr(charset, "ISO-8859-1"); > } > > +static void write_accept_language(struct strbuf *buf) > +{ > + [...] > + /* > + * MAX_LANGS must not be larger than 1,000. If it is larger than that, > + * q-value will be smaller than 0.001, the minimum q-value the HTTP > + * specification allows [1]. > + * > + * [1]: http://tools.ietf.org/html/rfc7231#section-5.3.1 > + */ > + const int MAX_LANGS = 1000; > + const int MAX_SIZE_OF_HEADER = 4000; > + const int MAX_SIZE_OF_ASTERISK_ELEMENT = 11; /* for ", *;q=0.001" */ These two MAX_SIZE_* constants are never used individually, but rather only as (MAX_SIZE_OF_HEADER - MAX_SIZE_OF_ASTERISK_ELEMENT). It might be a bit more readable to compute the final value here, with a suitable comment, rather than at point-of-use. Perhaps something like: /* limit of some HTTP servers is 4000 - strlen(", *;q=0.001") */ const int MAX_HEADER_SIZE = 4000 - 11; More below. > + [...] > + /* > + * Convert a list of colon-separated locale values [1][2] to a list of > + * comma-separated language tags [3] which can be used as a value of > + * Accept-Language header. > + [...] > + */ > + for (pos = lang_begin; ; pos++) { > + if (!*pos || *pos == ':') { > + if (is_q_factor_required) { > + /* Put a q-factor only if it is less than 1.0. */ > + if (q < max_q) > + strbuf_addf(buf, q_format, q); > + > + if (q > 1) > + q--; > + > + last_size = buf->len; > + > + is_q_factor_required = 0; > + } > + parse_state = SEPARATOR; > + } else if (parse_state == CODESET_OR_MODIFIER) > + continue; > + else if (*pos == ' ') /* Ignore whitespace character */ > + continue; > + else if (*pos == '.' || *pos == '@') /* Remove .codeset and @modifier. */ > + parse_state = CODESET_OR_MODIFIER; > + else { > + if (parse_state != LANGUAGE_TAG && q < max_q) > + strbuf_addstr(buf, ", "); > + strbuf_addch(buf, *pos == '_' ? '-' : *pos); > + is_q_factor_required = 1; > + parse_state = LANGUAGE_TAG; > + } > + > + if (buf->len > MAX_SIZE_OF_HEADER - MAX_SIZE_OF_ASTERISK_ELEMENT) { > + strbuf_remove(buf, last_size, buf->len - last_size); > + break; > + } > + > + if (!*pos) > + break; > + } Although often suitable when parsing complex inputs, state machines demand high cognitive load. The input you're parsing, on the other hand, is straightforward and can easily be processed with a simple sequential parser, which is easier to reason about and review for correctness. For instance, something like this: while (*s) { /* collect language tag */ for (; *s && *s != '.' && *s != '@' && *s != ':'; s++) strbuf_addch(buf, *s == '_' ? '-' : *s); /* skip .codeset and @modifier */ while (*s && *s != ':') s++; strbuf_addf(buf, q_format, q); ... other bookkeeping ... if (*s == ':') s++; } This example is intentionally simplified but illustrates the general idea. It lacks comma insertion (left as an exercise for the reader) and empty language tag handling (":en", "en::ko"); and doesn't take whitespace into consideration since it wasn't clear why your v6 parser pays attention to embedded spaces, whereas your earlier versions did not. > + /* Don't add Accept-Language header if no language is preferred. */ > + if (q >= max_q) { > + strbuf_reset(buf); > + return; > + } > + > + /* Add '*' with minimum q-factor greater than 0.0. */ > + strbuf_addstr(buf, ", *"); > + strbuf_addf(buf, q_format, 1); > +} > + > +/* > + * Get an Accept-Language header which indicates user's preferred languages. > + * > + * This function always return non-NULL string as strbuf_detach() does. A couple comments: It's not necessary to explain the public API in terms of an implementation detail. Callers of this function don't care and don't need to know that the value was constructed via strbuf, nor that it is somehow dependent upon the behavior of the underlying implementation of strbuf_detach(). This is a somewhat unusual contract. It's much more common and idiomatic in C to return NULL as an indication of "no preference" (or "failure") than to return an empty string. > + * > + * Examples: > + * LANGUAGE= -> "" > + * LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1" > + * LANGUAGE=ko_KR.UTF-8:sr@latin -> "Accept-Language: ko-KR, sr; q=0.9, *; q=0.1" > + * LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1" > + * LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1" > + * LANGUAGE= LANG=C -> "" > + */ > +static const char *get_accept_language(void) > +{ > + if (!cached_accept_language) { > + struct strbuf buf = STRBUF_INIT; > + write_accept_language(&buf); > + cached_accept_language = strbuf_detach(&buf, NULL); > + strbuf_release(&buf); Junio already mentioned that strbuf_release() is unnecessary following strbuf_detach(). > + } > + > + return cached_accept_language; > +} > + > /* http_request() targets */ > #define HTTP_REQUEST_STRBUF 0 > #define HTTP_REQUEST_FILE 1