* [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" @ 2022-03-07 3:37 Jason Yundt 2022-03-07 3:37 ` [PATCH 1/2] comment: fix typo Jason Yundt ` (4 more replies) 0 siblings, 5 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-07 3:37 UTC (permalink / raw) To: git; +Cc: Jeff King, Jason Yundt See the second commit's message for more details. Jason Yundt (2): comment: fix typo gitweb: remove invalid http-equiv="content-type" gitweb/gitweb.perl | 4 +--- t/t9502-gitweb-standalone-parse-output.sh | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 4 deletions(-) -- 2.35.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] comment: fix typo 2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt @ 2022-03-07 3:37 ` Jason Yundt 2022-03-07 3:37 ` [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt ` (3 subsequent siblings) 4 siblings, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-07 3:37 UTC (permalink / raw) To: git; +Cc: Jeff King, Jason Yundt Signed-off-by: Jason Yundt <jason@jasonyundt.email> --- t/t9502-gitweb-standalone-parse-output.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index 3167473b30..e7363511dd 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -34,7 +34,7 @@ EOF # # This will check that gitweb HTTP header contains proposed filename # as <basename> with '.tar' suffix added, and that generated tarfile -# (gitweb message body) has <prefix> as prefix for al files in tarfile +# (gitweb message body) has <prefix> as prefix for all files in tarfile # # <prefix> default to <basename> check_snapshot () { -- 2.35.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 2022-03-07 3:37 ` [PATCH 1/2] comment: fix typo Jason Yundt @ 2022-03-07 3:37 ` Jason Yundt 2022-03-07 12:23 ` Ævar Arnfjörð Bjarmason 2022-03-07 23:24 ` brian m. carlson 2022-03-08 1:07 ` [PATCH v2 0/2] " Jason Yundt ` (2 subsequent siblings) 4 siblings, 2 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-07 3:37 UTC (permalink / raw) To: git; +Cc: Jeff King, Jason Yundt Before this change, gitweb would generate pages which included: <meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/> A meta element with http-equiv="content-type" is said to be in the "Encoding declaration state". According to the HTML Standard, The Encoding declaration state may be used in HTML documents, but elements with an http-equiv attribute in that state must not be used in XML documents. Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type> This change removes that meta element since gitweb always generates XML documents. Signed-off-by: Jason Yundt <jason@jasonyundt.email> --- gitweb/gitweb.perl | 4 +--- t/t9502-gitweb-standalone-parse-output.sh | 13 +++++++++++++ 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index fbd1c20a23..606b50104c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -4213,8 +4213,7 @@ sub git_header_html { my %opts = @_; my $title = get_page_title(); - my $content_type = get_content_type_html(); - print $cgi->header(-type=>$content_type, -charset => 'utf-8', + print $cgi->header(-type=>get_content_type_html(), -charset => 'utf-8', -status=> $status, -expires => $expires) unless ($opts{'-no_http_header'}); my $mod_perl_version = $ENV{'MOD_PERL'} ? 
" $ENV{'MOD_PERL'}" : ''; @@ -4225,7 +4224,6 @@ sub git_header_html { <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke --> <!-- git core binaries version $git_version --> <head> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/> <meta name="robots" content="index, nofollow"/> <title>$title</title> diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index e7363511dd..25165edacc 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -207,4 +207,17 @@ test_expect_success 'xss checks' ' xss "" "$TAG+" ' +no_http_equiv_content_type() { + gitweb_run "$@" && + ! grep -Ei "http-equiv=['\"]?content-type" gitweb.body +} + +# See: <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-http-equiv-content-type> +test_expect_success 'no http-equiv="content-type" in XHTML' ' + no_http_equiv_content_type && + no_http_equiv_content_type "p=.git" && + no_http_equiv_content_type "p=.git;a=log" && + no_http_equiv_content_type "p=.git;a=tree" +' + test_done -- 2.35.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] gitweb: remove invalid http-equiv="content-type"
  2022-03-07  3:37 ` [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt
@ 2022-03-07 12:23   ` Ævar Arnfjörð Bjarmason
  2022-03-07 22:49     ` Jason Yundt
  2022-03-07 23:24   ` brian m. carlson
  1 sibling, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 12:23 UTC (permalink / raw)
  To: Jason Yundt; +Cc: git, Jeff King


On Sun, Mar 06 2022, Jason Yundt wrote:

> Before this change, gitweb would generate pages which included:
>
> 	<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/>
>
> A meta element with http-equiv="content-type" is said to be in the
> "Encoding declaration state". According to the HTML Standard,
>
> 	The Encoding declaration state may be used in HTML documents,
> 	but elements with an http-equiv attribute in that state must not
> 	be used in XML documents.
>
> Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type>
>
> This change removes that meta element since gitweb always generates XML
> documents.
>
> Signed-off-by: Jason Yundt <jason@jasonyundt.email>
> ---
>  gitweb/gitweb.perl                        |  4 +---
>  t/t9502-gitweb-standalone-parse-output.sh | 13 +++++++++++++
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index fbd1c20a23..606b50104c 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4213,8 +4213,7 @@ sub git_header_html {
>  	my %opts = @_;
>
>  	my $title = get_page_title();
> -	my $content_type = get_content_type_html();
> -	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
> +	print $cgi->header(-type=>get_content_type_html(), -charset => 'utf-8',

I think it would be better to just skip this hunk, no behavior will
change if it's left in.

>  	                   -status=> $status, -expires => $expires)
>  		unless ($opts{'-no_http_header'});
>  	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
> @@ -4225,7 +4224,6 @@ sub git_header_html {
>  <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
>  <!-- git core binaries version $git_version -->
>  <head>
> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>

..with this being the only behavior change (yeah the variable will now
be used only in one place, but that's fine)

I'm not sure I understand this change really. The result is always XML,
so application/xhtml+xml is redundant, text/html, or both?

But aside from that: I have seen browsers get the lack of encoding=""
"wrong" with data at rest, don't some still default to ISO-8859-1?

So won't this result in badly decoded data if you save the web page &
view it locally?

>  <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
>  <meta name="robots" content="index, nofollow"/>
>  <title>$title</title>
> diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
> index e7363511dd..25165edacc 100755
> --- a/t/t9502-gitweb-standalone-parse-output.sh
> +++ b/t/t9502-gitweb-standalone-parse-output.sh
> @@ -207,4 +207,17 @@ test_expect_success 'xss checks' '
>  	xss "" "$TAG+"
>  '
>
> +no_http_equiv_content_type() {
> +	gitweb_run "$@" &&
> +	! grep -Ei "http-equiv=['\"]?content-type" gitweb.body

Nit: Should we skip the "-i" here since we're testing our own output,
and not http standards in general (i.e. we don't have to worry about the
case of http-equiv?)

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-07 12:23 ` Ævar Arnfjörð Bjarmason @ 2022-03-07 22:49 ` Jason Yundt 0 siblings, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-07 22:49 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King On Monday, March 7, 2022 7:23:49 AM EST Ævar Arnfjörð Bjarmason wrote: > I'm not sure I understand this change really. The result in always XML, > so application/xhtml+xml is redundant, text/html, or both? To be honest, using an http-equiv="content-type" in XHTML is confusing. When you do use one, your goal shouldn’t really be to specify the document’s MIME type. After all, the first three lines of each page say <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"> Those lines are more than enough to determine that something is using XHTML and UTF-8. Instead, the idea is to help out a parser that is incorrectly parsing the document as HTML (instead of as XHTML). Historical W3C documents (that were applicable when http-equiv="content-type" was allowed in XHTML) [1] [2][3] indicate that http-equiv="content-type" should be used like this: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> In other words, to use http-equiv="content-type" properly in XHTML, you had to lie about the document’s type. The fact that this is confusing is probably part of why WHATWG disallowed it in the HTML Standard. > But aside from that: I have seen browsers get the lack of encoding="" > "wrong" with data at rest, don't some still default to ISO-8859-1? > > So won't this result in badly decoded data if you save the web page & > view it locally? I tested this idea in ungoogled-chromium, Firefox and Pale Moon. Other than Pale Moon in one specific circumstance, they all used UTF-8 as the encoding. 
Pale Moon used windows-1252, but only when the file ended with .html. When the file ended with .xhtml, Pale Moon used UTF-8. That being said, we don’t have to use an http-equiv="content-type" to fix the problem. Instead, we can use a <meta charset="utf-8"> which is allowed by the HTML Standard [4]. [1]: <https://www.w3.org/TR/xhtml1/#C_9> [2]: <https://www.w3.org/TR/html-polyglot/#character-encoding> [3]: <https://www.w3.org/Bugs/Public/show_bug.cgi?id=21818> [4]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-charset> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-07 3:37 ` [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 2022-03-07 12:23 ` Ævar Arnfjörð Bjarmason @ 2022-03-07 23:24 ` brian m. carlson 1 sibling, 0 replies; 17+ messages in thread From: brian m. carlson @ 2022-03-07 23:24 UTC (permalink / raw) To: Jason Yundt; +Cc: git, Jeff King [-- Attachment #1: Type: text/plain, Size: 1772 bytes --] On 2022-03-07 at 03:37:23, Jason Yundt wrote: > Before this change, gitweb would generate pages which included: > > <meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/> > > A meta element with http-equiv="content-type" is said to be in the > "Encoding declaration state". According to the HTML Standard, > > The Encoding declaration state may be used in HTML documents, > but elements with an http-equiv attribute in that state must not > be used in XML documents. > > Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type> > > This change removes that meta element since gitweb always generates XML > documents. This change seems fine. We do specify this in the HTTP header, including the character set, which is what matters, so this should work in every browser, and the http-equiv is unneeded. I also don't think we need a meta header here, since we have an XML declaration, and that's controlling in this situation. This isn't regular HTML and we don't declare it as such, so using a meta header to control this isn't correct: the XML declaration should be used instead in the event a user downloads this to a local disk and processes it outside the context of an HTTP request. 
Since we control the HTTP headers, I'd actually argue that your test might well reject all http-equiv headers since they could be done much better with actual HTTP headers (and would therefore work with non-browser clients), but I don't think that's worth a reroll, nor do I think a test is even needed here (but bonus points for adding one). So I think this looks good as is. Thanks for the patch. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
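The point above, that the XML declaration alone names the encoding for a copy saved to disk, can be checked with a short shell sketch. The helper name xml_declared_encoding and the temporary sample file are hypothetical and are not part of gitweb or its test suite; the three sample lines are the ones quoted earlier in this thread. The sketch assumes the declaration sits on the first line and quotes the encoding with double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the encoding named in the XML declaration,
# assuming the declaration is on line 1 and uses double quotes.
xml_declared_encoding () {
	sed -n '1s/^<?xml[^>]*encoding="\([^"]*\)".*/\1/p' "$1"
}

# Sample page head, as quoted earlier in the thread.
sample=$(mktemp)
cat >"$sample" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
EOF

declared=$(xml_declared_encoding "$sample")
rm -f "$sample"
echo "declared encoding: $declared"
```

No meta element is involved here; an XML-aware consumer has the encoding before it reaches <head>.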
* [PATCH v2 0/2] gitweb: remove invalid http-equiv="content-type" 2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 2022-03-07 3:37 ` [PATCH 1/2] comment: fix typo Jason Yundt 2022-03-07 3:37 ` [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt @ 2022-03-08 1:07 ` Jason Yundt 2022-03-08 2:13 ` Junio C Hamano 2022-03-08 15:56 ` [PATCH v3 " Jason Yundt 2022-03-08 1:07 ` [PATCH v2 1/2] comment: fix typo Jason Yundt 2022-03-08 1:07 ` [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 4 siblings, 2 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 1:07 UTC (permalink / raw) To: git; +Cc: Jason Yundt, Jeff King, Ævar Arnfjörð Bjarmason See the second commit's message for more details. Compared to the first version of this patch, this one - keeps an extra variable, - replaces the http-equiv="content-type" tag with a charset= one, and - removes the -i flag from grep. Jason Yundt (2): comment: fix typo gitweb: remove invalid http-equiv="content-type" gitweb/gitweb.perl | 2 +- t/t9502-gitweb-standalone-parse-output.sh | 18 +++++++++++++++++- 2 files changed, 18 insertions(+), 2 deletions(-) -- 2.35.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
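What dropping the -i flag changes can be seen in isolation. This is a minimal sketch, not part of the series: the two sample lines are made up rather than taken from real gitweb output, and count_matches is a hypothetical helper. Only the pattern is copied from the test helper in the patch.

```shell
#!/bin/sh
# Pattern copied from the test helper in this series.
pattern="http-equiv=['\"]?content-type"

# Made-up sample lines, not real gitweb output.
lower='<meta http-equiv="content-type" content="text/html; charset=utf-8"/>'
mixed='<meta HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8"/>'

# Hypothetical helper.
# $1: extra grep option ("-i" or empty); prints the number of matching lines.
count_matches () {
	printf '%s\n%s\n' "$lower" "$mixed" | grep -E -c $1 "$pattern"
}

strict=$(count_matches "")
loose=$(count_matches "-i")
echo "without -i: $strict, with -i: $loose"
```

Without -i only the lowercase spelling is caught, which is enough when the test only has to guard against gitweb's own output.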
* Re: [PATCH v2 0/2] gitweb: remove invalid http-equiv="content-type" 2022-03-08 1:07 ` [PATCH v2 0/2] " Jason Yundt @ 2022-03-08 2:13 ` Junio C Hamano 2022-03-08 12:26 ` Jason Yundt 2022-03-08 15:56 ` [PATCH v3 " Jason Yundt 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2022-03-08 2:13 UTC (permalink / raw) To: Jason Yundt; +Cc: git, Jeff King, Ævar Arnfjörð Bjarmason Jason Yundt <jason@jasonyundt.email> writes: > - keeps an extra variable, I am not sure if this is an improvement. The original had two places that used $content_type, but after getting rid of one, there is only one place that needed the value, which can be used in place; and it was quite clear that was what was going on in the previous iteration. About the <meta> thing, it seems that brian already commented on it. Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 0/2] gitweb: remove invalid http-equiv="content-type" 2022-03-08 2:13 ` Junio C Hamano @ 2022-03-08 12:26 ` Jason Yundt 0 siblings, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 12:26 UTC (permalink / raw) To: Junio C Hamano Cc: git, Jeff King, Ævar Arnfjörð Bjarmason, brian m. carlson On Monday, March 7, 2022 9:13:52 PM EST Junio C Hamano wrote: > About the <meta> thing, it seems that brian already commented on it. Thanks for mentioning that. I now see Brian’s comments on the archive. My mail server was blocking him (via zen.spamhaus.org), but I’ve added his server to the allowlist. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 0/2] gitweb: remove invalid http-equiv="content-type" 2022-03-08 1:07 ` [PATCH v2 0/2] " Jason Yundt 2022-03-08 2:13 ` Junio C Hamano @ 2022-03-08 15:56 ` Jason Yundt 2022-03-08 15:56 ` [PATCH v3 1/2] comment: fix typo Jason Yundt 2022-03-08 15:56 ` [PATCH v3 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 1 sibling, 2 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 15:56 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, brian m. carlson, Junio C Hamano, Jeff King, Jason Yundt See the second commit's message for more details. Compared to the second version of this patch, this one - removes the extra variable again, - doesn't include a <meta charset="utf-8"/> and - corrects a technical error in the second commit’s message. Jason Yundt (2): comment: fix typo gitweb: remove invalid http-equiv="content-type" gitweb/gitweb.perl | 4 +--- t/t9502-gitweb-standalone-parse-output.sh | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 4 deletions(-) -- 2.35.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/2] comment: fix typo 2022-03-08 15:56 ` [PATCH v3 " Jason Yundt @ 2022-03-08 15:56 ` Jason Yundt 2022-03-08 15:56 ` [PATCH v3 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 1 sibling, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 15:56 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, brian m. carlson, Junio C Hamano, Jeff King, Jason Yundt Signed-off-by: Jason Yundt <jason@jasonyundt.email> --- t/t9502-gitweb-standalone-parse-output.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index 3167473b30..e7363511dd 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -34,7 +34,7 @@ EOF # # This will check that gitweb HTTP header contains proposed filename # as <basename> with '.tar' suffix added, and that generated tarfile -# (gitweb message body) has <prefix> as prefix for al files in tarfile +# (gitweb message body) has <prefix> as prefix for all files in tarfile # # <prefix> default to <basename> check_snapshot () { -- 2.35.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 2/2] gitweb: remove invalid http-equiv="content-type"
  2022-03-08 15:56   ` [PATCH v3 " Jason Yundt
  2022-03-08 15:56     ` [PATCH v3 1/2] comment: fix typo Jason Yundt
@ 2022-03-08 15:56     ` Jason Yundt
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Yundt @ 2022-03-08 15:56 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson, Junio C Hamano,
	Jeff King, Jason Yundt

Before this change, gitweb would generate pages which included:

	<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/>

When a meta's http-equiv equals "content-type", the http-equiv is said
to be in the "Encoding declaration state". According to the HTML
Standard,

	The Encoding declaration state may be used in HTML documents,
	but elements with an http-equiv attribute in that state must not
	be used in XML documents.

Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type>

This change removes that meta element since gitweb always generates XML
documents.

Signed-off-by: Jason Yundt <jason@jasonyundt.email>
---
 gitweb/gitweb.perl                        |  4 +---
 t/t9502-gitweb-standalone-parse-output.sh | 13 +++++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fbd1c20a23..606b50104c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4213,8 +4213,7 @@ sub git_header_html {
 	my %opts = @_;
 
 	my $title = get_page_title();
-	my $content_type = get_content_type_html();
-	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
+	print $cgi->header(-type=>get_content_type_html(), -charset => 'utf-8',
 	                   -status=> $status, -expires => $expires)
 		unless ($opts{'-no_http_header'});
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
@@ -4225,7 +4224,6 @@ sub git_header_html {
 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
 <!-- git core binaries version $git_version -->
 <head>
-<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
 <meta name="robots" content="index, nofollow"/>
 <title>$title</title>
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index e7363511dd..8cb582f0e6 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -207,4 +207,17 @@ test_expect_success 'xss checks' '
 	xss "" "$TAG+"
 '
 
+no_http_equiv_content_type() {
+	gitweb_run "$@" &&
+	! grep -E "http-equiv=['\"]?content-type" gitweb.body
+}
+
+# See: <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-http-equiv-content-type>
+test_expect_success 'no http-equiv="content-type" in XHTML' '
+	no_http_equiv_content_type &&
+	no_http_equiv_content_type "p=.git" &&
+	no_http_equiv_content_type "p=.git;a=log" &&
+	no_http_equiv_content_type "p=.git;a=tree"
+'
+
 test_done
-- 
2.35.1

^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 1/2] comment: fix typo 2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt ` (2 preceding siblings ...) 2022-03-08 1:07 ` [PATCH v2 0/2] " Jason Yundt @ 2022-03-08 1:07 ` Jason Yundt 2022-03-08 1:07 ` [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt 4 siblings, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 1:07 UTC (permalink / raw) To: git; +Cc: Jason Yundt, Jeff King, Ævar Arnfjörð Bjarmason Signed-off-by: Jason Yundt <jason@jasonyundt.email> --- t/t9502-gitweb-standalone-parse-output.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index 3167473b30..e7363511dd 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -34,7 +34,7 @@ EOF # # This will check that gitweb HTTP header contains proposed filename # as <basename> with '.tar' suffix added, and that generated tarfile -# (gitweb message body) has <prefix> as prefix for al files in tarfile +# (gitweb message body) has <prefix> as prefix for all files in tarfile # # <prefix> default to <basename> check_snapshot () { -- 2.35.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt ` (3 preceding siblings ...) 2022-03-08 1:07 ` [PATCH v2 1/2] comment: fix typo Jason Yundt @ 2022-03-08 1:07 ` Jason Yundt 2022-03-08 1:50 ` brian m. carlson 4 siblings, 1 reply; 17+ messages in thread From: Jason Yundt @ 2022-03-08 1:07 UTC (permalink / raw) To: git; +Cc: Jason Yundt, Jeff King, Ævar Arnfjörð Bjarmason Before this change, gitweb would generate pages which included: <meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/> A meta element with http-equiv="content-type" is said to be in the "Encoding declaration state". According to the HTML Standard, The Encoding declaration state may be used in HTML documents, but elements with an http-equiv attribute in that state must not be used in XML documents. Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type> Gitweb always generates XML documents, so its use of http-equiv="content-type" was invalid. This change replaces that tag with <meta charset="utf-8"/> which is equivalent [1] and allowed in XML documents [2]. 
[1]: <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-http-equiv> [2]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-charset> Signed-off-by: Jason Yundt <jason@jasonyundt.email> --- gitweb/gitweb.perl | 2 +- t/t9502-gitweb-standalone-parse-output.sh | 16 ++++++++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index fbd1c20a23..59457c1004 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -4225,7 +4225,7 @@ sub git_header_html { <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke --> <!-- git core binaries version $git_version --> <head> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> +<meta charset="utf-8"/> <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/> <meta name="robots" content="index, nofollow"/> <title>$title</title> diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh index e7363511dd..0b06e2d6b0 100755 --- a/t/t9502-gitweb-standalone-parse-output.sh +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -207,4 +207,20 @@ test_expect_success 'xss checks' ' xss "" "$TAG+" ' +check_encoding_meta_element() { + gitweb_run "$@" && + ! grep -E "http-equiv=['\"]?content-type" gitweb.body && + grep -F '<meta charset="utf-8"/>' gitweb.body +} + +# One of those can be used in XHTML, the other one can't. See: +# <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-charset> +# <https://html.spec.whatwg.org/dev/semantics.html#attr-meta-http-equiv-content-type> +test_expect_success 'no http-equiv="content-type", yes charset="utf-8"' ' + check_encoding_meta_element && + check_encoding_meta_element "p=.git" && + check_encoding_meta_element "p=.git;a=log" && + check_encoding_meta_element "p=.git;a=tree" +' + test_done -- 2.35.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-08 1:07 ` [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt @ 2022-03-08 1:50 ` brian m. carlson 2022-03-08 12:44 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 17+ messages in thread From: brian m. carlson @ 2022-03-08 1:50 UTC (permalink / raw) To: Jason Yundt; +Cc: git, Jeff King, Ævar Arnfjörð Bjarmason [-- Attachment #1: Type: text/plain, Size: 814 bytes --] On 2022-03-08 at 01:07:11, Jason Yundt wrote: > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index fbd1c20a23..59457c1004 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -4225,7 +4225,7 @@ sub git_header_html { > <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke --> > <!-- git core binaries version $git_version --> > <head> > -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> > +<meta charset="utf-8"/> I don't actually think this is an improvement. I don't think it's necessary, considering we have an XML declaration and the HTTP header, both of which already say it's UTF-8 and will take precedence over this. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type"
  2022-03-08  1:50   ` brian m. carlson
@ 2022-03-08 12:44     ` Ævar Arnfjörð Bjarmason
  2022-03-08 14:54       ` Jason Yundt
  0 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08 12:44 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jason Yundt, git, Jeff King


On Tue, Mar 08 2022, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2022-03-08 at 01:07:11, Jason Yundt wrote:
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index fbd1c20a23..59457c1004 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -4225,7 +4225,7 @@ sub git_header_html {
>>  <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
>>  <!-- git core binaries version $git_version -->
>>  <head>
>> -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
>> +<meta charset="utf-8"/>
>
> I don't actually think this is an improvement. I don't think it's
> necessary, considering we have an XML declaration and the HTTP header,
> both of which already say it's UTF-8 and will take precedence over this.

Agreed. I was a bit surprised per Jason's
https://lore.kernel.org/git/109813056.nniJfEyVGO@jason-desktop-linux/
that the removal wasn't kept.

I.e. he was replying to a question of mine asking whether we didn't need
this data at rest, e.g. if you save the page.

I didn't notice the "<?xml version..." we emit, which seems to be
enough. I.e. this seems to have always been redundant going back to
c994d620cc8 (v220, 2005-08-07), or rather, the character set part of it.

Maybe I still don't understand this, but the commit message seems to me
to be conflating whether we send the *right* http-equiv with whether we
send it at all, i.e. if the problem is that XML documents shouldn't be
text/html isn't this correct?:

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fbd1c20a232..c1c5af0b197 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4049,7 +4049,13 @@ sub get_page_title {
 	return $title;
 }
 
+sub get_content_type_xml {
+	return 'application/xhtml+xml';
+}
+
 sub get_content_type_html {
+	my ($want_xml) = @_;
+
 	# require explicit support from the UA if we are to send the page as
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
 	# we have to do this because MSIE sometimes globs '*/*', pretending to
@@ -4057,7 +4063,7 @@ sub get_content_type_html {
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0) {
-		return 'application/xhtml+xml';
+		return get_content_type_html();
 	} else {
 		return 'text/html';
 	}
@@ -4214,6 +4220,7 @@ sub git_header_html {
 
 	my $title = get_page_title();
 	my $content_type = get_content_type_html();
+	my $content_type_xml = get_content_type_html();
 	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
 	                   -status=> $status, -expires => $expires)
 		unless ($opts{'-no_http_header'});
@@ -4225,7 +4232,7 @@ sub git_header_html {
 <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke -->
 <!-- git core binaries version $git_version -->
 <head>
-<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
+<meta http-equiv="content-type" content="$content_type_xml; charset=utf-8"/>
 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/>
 <meta name="robots" content="index, nofollow"/>
 <title>$title</title>

Of course we might then *also* decide that <meta http-equiv> in this
case isn't needed at all, but isn't that a separate change?

And won't conforming browsers treat application/xhtml+xml differently
when the page is saved? A long time ago (I did some web development)
using it would enable pedantic strictness in browsers, i.e. unclosed
tags etc. would be a hard error, but I can't reproduce that locally in
either Firefox or Chrome now (with just the gitweb output as-is with
that http-equiv tweaked).

So maybe it does nothing, or maybe it's just those browsers...

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" 2022-03-08 12:44 ` Ævar Arnfjörð Bjarmason @ 2022-03-08 14:54 ` Jason Yundt 0 siblings, 0 replies; 17+ messages in thread From: Jason Yundt @ 2022-03-08 14:54 UTC (permalink / raw) To: brian m. carlson, Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King On Tuesday, March 8, 2022 7:44:35 AM EST Ævar Arnfjörð Bjarmason wrote: > Maybe I still don't understand this, but the commit message seems to me > be conflating whether we send the *right* http-equiv with whether we > send it at all, The intent behind the commit message is to say that <meta http-equiv="content-type" …> is never correct in XHTML. > i.e. if the problem is that XML documents shouldn't be > text/html isn't this correct?: > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index fbd1c20a232..c1c5af0b197 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -4049,7 +4049,13 @@ sub get_page_title { > return $title; > } > > +sub get_content_type_xml { > + return 'application/xhtml+xml'; > +} > + > sub get_content_type_html { > + my ($want_xml) = @_; > + > # require explicit support from the UA if we are to send the page as > # 'application/xhtml+xml', otherwise send it as plain old 'text/html'. > # we have to do this because MSIE sometimes globs '*/*', pretending to > @@ -4057,7 +4063,7 @@ sub get_content_type_html { > if (defined $cgi->http('HTTP_ACCEPT') && > $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ && > $cgi->Accept('application/xhtml+xml') != 0) { > - return 'application/xhtml+xml'; > + return get_content_type_html(); I’m guessing that you meant to call get_content_type_xml() here. > } else { > return 'text/html'; > } > @@ -4214,6 +4220,7 @@ sub git_header_html { > > my $title = get_page_title(); > my $content_type = get_content_type_html(); > + my $content_type_xml = get_content_type_html(); I’m also guessing that you meant to call get_content_type_xml() here. 
> print $cgi->header(-type=>$content_type, -charset => 'utf-8', > -status=> $status, -expires => $expires) > unless ($opts{'-no_http_header'}); > @@ -4225,7 +4232,7 @@ sub git_header_html { > <!-- git web interface version $version, (C) 2005-2006, Kay Sievers <kay.sievers\@vrfy.org>, Christian Gierke --> > <!-- git core binaries version $git_version --> > <head> > -<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> > +<meta http-equiv="content-type" content="$content_type_xml; charset=utf-8"/> > <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version"/> > <meta name="robots" content="index, nofollow"/> > <title>$title</title> With those assumptions in mind, I don’t think that your code is correct if the problem is that XML documents shouldn't be text/html. Here’s why: 1. XML documents shouldn’t contain http-equiv="content-type" [1]. 2. When a meta’s http-equiv attribute equals content-type, then its content attribute should equal “the literal string "text/html;", optionally followed by any number of ASCII whitespace, followed by the literal string "charset=utf-8".” [1] [1]: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type> ^ permalink raw reply [flat|nested] 17+ messages in thread
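The second rule quoted above can be illustrated with a small shell check. This is only a sketch: valid_encoding_declaration is a hypothetical helper, not part of gitweb or its test suite. It assumes the match is case-sensitive (following the "literal string" wording quoted above) and approximates "ASCII whitespace" with the POSIX [[:space:]] class.

```shell
#!/bin/sh
# Hypothetical helper, not part of gitweb or t/t9502.
# Succeeds only if the given content="..." value has the form quoted
# from the HTML Standard: the literal string "text/html;", optionally
# followed by whitespace, followed by the literal string "charset=utf-8".
# Assumption: [[:space:]] stands in for "ASCII whitespace"; values that
# contain newlines are not handled, since grep matches line by line.
valid_encoding_declaration () {
	printf '%s' "$1" | grep -Eq '^text/html;[[:space:]]*charset=utf-8$'
}

# The value proposed in the quoted diff does not have that form.
if valid_encoding_declaration 'application/xhtml+xml; charset=utf-8'
then
	echo "valid"
else
	echo "invalid"
fi
```

Under these assumptions, 'text/html; charset=utf-8' passes and the 'application/xhtml+xml; charset=utf-8' value from the quoted diff does not. That matches the point made above: changing the media type does not make the element valid.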
end of thread, other threads:[~2022-03-08 15:59 UTC | newest]
Thread overview: 17+ messages
2022-03-07 3:37 [PATCH 0/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt
2022-03-07 3:37 ` [PATCH 1/2] comment: fix typo Jason Yundt
2022-03-07 3:37 ` [PATCH 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt
2022-03-07 12:23 ` Ævar Arnfjörð Bjarmason
2022-03-07 22:49 ` Jason Yundt
2022-03-07 23:24 ` brian m. carlson
2022-03-08 1:07 ` [PATCH v2 0/2] " Jason Yundt
2022-03-08 2:13 ` Junio C Hamano
2022-03-08 12:26 ` Jason Yundt
2022-03-08 15:56 ` [PATCH v3 " Jason Yundt
2022-03-08 15:56 ` [PATCH v3 1/2] comment: fix typo Jason Yundt
2022-03-08 15:56 ` [PATCH v3 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt
2022-03-08 1:07 ` [PATCH v2 1/2] comment: fix typo Jason Yundt
2022-03-08 1:07 ` [PATCH v2 2/2] gitweb: remove invalid http-equiv="content-type" Jason Yundt
2022-03-08 1:50 ` brian m. carlson
2022-03-08 12:44 ` Ævar Arnfjörð Bjarmason
2022-03-08 14:54 ` Jason Yundt