git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] gitweb: Improve diffs when filenames contain problem characters
       [not found] <A60B35D8-0981-45C8-B7D9-B89C729A837D@kellerfarm.com>
@ 2014-04-03  8:18 ` Jakub Narębski
  0 siblings, 0 replies; 2+ messages in thread
From: Jakub Narębski @ 2014-04-03  8:18 UTC (permalink / raw)
  To: Andrew Keller
  Cc: Git List, Krzesimir Nowak, Ævar Arnfjörð Bjarmason

[Forgot to hit Reply-to-all instead of Reply. Andrew, I'm sorry for
duplicate email]

On Sat, Mar 29, 2014 at 2:53 PM, Andrew Keller <andrew@kellerfarm.com> wrote:
>
> When formatting a diff header line, be sure to escape the raw output from git
> for use as HTML.  This ensures that when "problem characters" (&, <, >, ?, etc.)
> exist in filenames, gitweb displays them as text, rather than letting the
> browser interpret them as HTML.

Actually gitweb tries to do the right thing, and in most cases it does
HTML escaping correctly. The problem is only with *binary* files (this fact
should IMHO have been mentioned in commit message).

This issue is caused by two problems / errors. First, gitweb misclassify
"Binary files a/foo and b/bar differ" as diff header, while it is untypical
but it is diff contents. Second, gitweb doesn't HTML-escape unknown
diff headers, assuming that it knows about all possible types.

I have had those changes in my git repository, but I do not know if
I have pushed it before the PC went down, and if I have it in backup.

>
> Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
> Signed-off-by: Andrew Keller <andrew@kellerfarm.com>

Thank you for your work.

> ---
> Steps to reproduce:
>
> 1)  Create a repository that contains a commit that adds a file:
>     * with an ampersand in the filename
>     * with binary contents
> 2)  Make the repository visible to gitweb
> 3)  In gitweb, navigate to the page that shows the diff for the commit
>     that added the file
>
> Behavior without patch:

Page contains unescaped '&' instead of '&amp;'

>                                   Page stops rendering when it gets to one of
> the instances of the filename (in the diff header, to be specific), and
> a light-red box appears a the top of the page, saying something like
> this:
>
> This page contains the following errors:
>
> error on line 67 at column 22: xmlParseEntityRef: no name
>
> Below is a rendering of the page up to the first error.
>
>
> (This particular error is what you get with an unescaped ampersand)

This is caused by the fact that some browsers use strict XML validation
mode when receiving 'application/xml+xhtml' contents, and not 'text/html'.
So one might not notice it...

> Behavior with patch: You see the ampersand in the file name, and the
> page renders as one would expect.

A question: it does not escapes HTML twice, i.e. you don't see
'&amp;' in rendered output ('&amp;amp;' in page source)?

> Other notes:
>
> Several helper methods exist for escaping HTML, and I was unsure
> about which one to use.  Although this patch fixes the problem,
> it is possible that it may be more correct to use, for example, the
> 'esc_html' helper method instead of interacting with $cgi directly.

It is preferred to use esc_html instead of $cgi->escapeHTML.
Even better use esc_path to escape pathnames.

> The first hunk in the diff seems to be all that's required to experience
> the good behavior, however upon inspecting the code, it seems
> prudent to also include the second one.  It's a nearby section of code,
> doing similar work, and is called from the same function as the
> former, and with similar arguments.

I wonder if it would not interfere with processing of diff contents
by gitweb, for example adding links to pre-image and post-image
version of a file.

>  gitweb/gitweb.perl |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 79057b7..6c559f8 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2223,6 +2223,8 @@ sub format_git_diff_header_line {
>   my $diffinfo = shift;
>   my ($from, $to) = @_;
>
> + $line = $cgi->escapeHTML($line);
> +
>   if ($diffinfo->{'nparents'}) {
>   # combined diff
>   $line =~ s!^(diff (.*?) )"?.*$!$1!;
> @@ -2259,6 +2261,8 @@ sub format_extended_diff_header_line {
>   my $diffinfo = shift;
>   my ($from, $to) = @_;
>
> + $line = $cgi->escapeHTML($line);
> +
>   # match <path>
>   if ($line =~ s!^((copy|rename) from ).*$!$1! && $from->{'href'}) {
>   $line .= $cgi->a({-href=>$from->{'href'}, -class=>"path"},

I'd have to examine code flow to check where it lands...

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [PATCH] gitweb: Improve diffs when filenames contain problem characters
@ 2014-03-31  2:05 Andrew Keller
  0 siblings, 0 replies; 2+ messages in thread
From: Andrew Keller @ 2014-03-31  2:05 UTC (permalink / raw)
  To: Git List
  Cc: Jakub Narebski, Krzesimir Nowak, Ævar Arnfjörð Bjarmason

When formatting a diff header line, be sure to escape the raw output from git
for use as HTML.  This ensures that when "problem characters" (&, <, >, ?, etc.)
exist in filenames, gitweb displays them as text, rather than letting the
browser interpret them as HTML.

Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
Signed-off-by: Andrew Keller <andrew@kellerfarm.com>
---
Steps to reproduce:

1)  Create a repository that contains a commit that adds a file:
    * with an ampersand in the filename
    * with binary contents
2)  Make the repository visible to gitweb
3)  In gitweb, navigate to the page that shows the diff for the commit
    that added the file

Behavior without patch: Page stops rendering when it gets to one of
the instances of the filename (in the diff header, to be specific), and
a light-red box appears a the top of the page, saying something like
this:

    This page contains the following errors:

    error on line 67 at column 22: xmlParseEntityRef: no name

    Below is a rendering of the page up to the first error.


(This particular error is what you get with an unescaped ampersand)

Behavior with patch: You see the ampersand in the file name, and the
page renders as one would expect.


Other notes:

Several helper methods exist for escaping HTML, and I was unsure
about which one to use.  Although this patch fixes the problem,
it is possible that it may be more correct to use, for example, the
'esc_html' helper method instead of interacting with $cgi directly.

The first hunk in the diff seems to be all that's required to experience
the good behavior, however upon inspecting the code, it seems
prudent to also include the second one.  It's a nearby section of code,
doing similar work, and is called from the same function as the
former, and with similar arguments.


Thanks,
Andrew Keller


 gitweb/gitweb.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 79057b7..6c559f8 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2223,6 +2223,8 @@ sub format_git_diff_header_line {
 	my $diffinfo = shift;
 	my ($from, $to) = @_;
 
+	$line = $cgi->escapeHTML($line);
+
 	if ($diffinfo->{'nparents'}) {
 		# combined diff
 		$line =~ s!^(diff (.*?) )"?.*$!$1!;
@@ -2259,6 +2261,8 @@ sub format_extended_diff_header_line {
 	my $diffinfo = shift;
 	my ($from, $to) = @_;
 
+	$line = $cgi->escapeHTML($line);
+
 	# match <path>
 	if ($line =~ s!^((copy|rename) from ).*$!$1! && $from->{'href'}) {
 		$line .= $cgi->a({-href=>$from->{'href'}, -class=>"path"},
-- 
1.7.7.1

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2014-04-03 10:38 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <A60B35D8-0981-45C8-B7D9-B89C729A837D@kellerfarm.com>
2014-04-03  8:18 ` [PATCH] gitweb: Improve diffs when filenames contain problem characters Jakub Narębski
2014-03-31  2:05 Andrew Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).