git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gitweb: fix esc_param
@ 2009-10-13 19:51 Giuseppe Bilotta
  2009-10-14  1:13 ` Stephen Boyd
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Giuseppe Bilotta @ 2009-10-13 19:51 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski, Stephen Boyd, Junio C Hamano, Giuseppe Bilotta

The custom CGI escaping done in esc_param failed to escape UTF-8
properly. Fix by using CGI::escape on each sequence of matched
characters instead of sprintf()ing a custom escaping for each byte.

Additionally, the space -> + escape was being escaped due to greedy
matching on the first substitution. Fix by adding space to the
list of characters not handled on the first substitution.

Finally, remove an unnecessary escaping of the + sign.
---
 gitweb/gitweb.perl |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

The issues with this routine were exposed by Stephen's
"author as search link" patch. This should fix them.

Since the idea of esc_param is to replicate CGI::escape except for the /
character (if I read the comment correclty), a possible alternative
would be to just use CGI::escape on the whole string and then undo the
escaping for the / character.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6237865..6593e5c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1115,8 +1115,7 @@ sub to_utf8 {
 # correct, but quoted slashes look too horrible in bookmarks
 sub esc_param {
 	my $str = shift;
-	$str =~ s/([^A-Za-z0-9\-_.~()\/:@])/sprintf("%%%02X", ord($1))/eg;
-	$str =~ s/\+/%2B/g;
+	$str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
 	$str =~ s/ /\+/g;
 	return $str;
 }
-- 
1.6.3.rc1.192.gdbfcb

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-13 19:51 [PATCH] gitweb: fix esc_param Giuseppe Bilotta
@ 2009-10-14  1:13 ` Stephen Boyd
  2009-10-14  6:19   ` Giuseppe Bilotta
  2009-10-14  8:23 ` Jakub Narebski
  2009-10-14  9:13 ` Junio C Hamano
  2 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2009-10-14  1:13 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Junio C Hamano

Giuseppe Bilotta wrote:
> The custom CGI escaping done in esc_param failed to escape UTF-8
> properly. Fix by using CGI::escape on each sequence of matched
> characters instead of sprintf()ing a custom escaping for each byte.
>
> Additionally, the space -> + escape was being escaped due to greedy
> matching on the first substitution. Fix by adding space to the
> list of characters not handled on the first substitution.
>
> Finally, remove an unnecessary escaping of the + sign.
>

Signoff?

This works great for my purposes. Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-14  1:13 ` Stephen Boyd
@ 2009-10-14  6:19   ` Giuseppe Bilotta
  2009-10-14  6:29     ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: Giuseppe Bilotta @ 2009-10-14  6:19 UTC (permalink / raw)
  To: Stephen Boyd; +Cc: git, Jakub Narebski, Junio C Hamano

On Wed, Oct 14, 2009 at 3:13 AM, Stephen Boyd <bebarino@gmail.com> wrote:
> Giuseppe Bilotta wrote:
>> The custom CGI escaping done in esc_param failed to escape UTF-8
>> properly. Fix by using CGI::escape on each sequence of matched
>> characters instead of sprintf()ing a custom escaping for each byte.
>>
>> Additionally, the space -> + escape was being escaped due to greedy
>> matching on the first substitution. Fix by adding space to the
>> list of characters not handled on the first substitution.
>>
>> Finally, remove an unnecessary escaping of the + sign.
>>
>
> Signoff?

Doh.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

(If this isn't enough, let me know)

> This works great for my purposes. Thanks.

Can you also check if this fixes the branch name issue you mentioned
in the other branch? (And/or do you have a repository exposing the
problem if not?)



-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-14  6:19   ` Giuseppe Bilotta
@ 2009-10-14  6:29     ` Stephen Boyd
  2009-10-14  9:03       ` Jakub Narebski
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2009-10-14  6:29 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Junio C Hamano

Giuseppe Bilotta wrote:
> On Wed, Oct 14, 2009 at 3:13 AM, Stephen Boyd <bebarino@gmail.com> wrote:
>> This works great for my purposes. Thanks.
>
> Can you also check if this fixes the branch name issue you mentioned
> in the other branch? (And/or do you have a repository exposing the
> problem if not?)

(We're jumping back and forth between threads haha)

Sorry, it doesn't. But this diff fixes the first part of the problem.
There are still problems with the RSS/Atom feed names being mangled. The
branch name I'm using is gitwéb, but I imagine any utf8 character will fail.

I see the title and the actual text being mangled without this patch.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4b21ad2..910c370 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1500,9 +1500,10 @@ sub format_ref_marker {
                                -href => href(
                                        action=>$dest_action,
                                        hash=>$dest
-                               )}, $name);
+                               )}, esc_html($name));
 
-                       $markers .= " <span class=\"$class\" title=\"$ref\">" .
+                       my $title_ref = esc_html($ref);
+                       $markers .= " <span class=\"$class\" title=\"$title_ref\">" .
                                $link . "</span>";
                }
        }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-13 19:51 [PATCH] gitweb: fix esc_param Giuseppe Bilotta
  2009-10-14  1:13 ` Stephen Boyd
@ 2009-10-14  8:23 ` Jakub Narebski
  2009-10-14  9:13 ` Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2009-10-14  8:23 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Stephen Boyd, Junio C Hamano

On Tue, 13 Oct 2009, Giuseppe Bilotta wrote:

> The custom CGI escaping done in esc_param failed to escape UTF-8
> properly. Fix by using CGI::escape on each sequence of matched
> characters instead of sprintf()ing a custom escaping for each byte.

Hmmm... I wonder if this bug isn't caused by failing to mark some
input as utf8 using to_utf8() subroutine... or by using 
binmode $fd, ':utf8' on $fd from opening git-rev-list, after ensuring
that it outputs utf8 by --encoding=utf8 (or is it only git-log that
accepts that option?).

> 
> Additionally, the space -> + escape was being escaped due to greedy
> matching on the first substitution. Fix by adding space to the
> list of characters not handled on the first substitution.

Thanks.

> 
> Finally, remove an unnecessary escaping of the + sign.

Signoff?

> ---
>  gitweb/gitweb.perl |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> The issues with this routine were exposed by Stephen's
> "author as search link" patch. This should fix them.
> 
> Since the idea of esc_param is to replicate CGI::escape except for the /
> character (if I read the comment correclty), a possible alternative
> would be to just use CGI::escape on the whole string and then undo the
> escaping for the / character.

Well, that and widely used but non-standard (well, not using percent
encoding) escaping of space with '+'; CGI::escape(" ") is %20, not '+'.
Se we would have to turn '%2F' into '/', and '%20' into '+'.

> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 6237865..6593e5c 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1115,8 +1115,7 @@ sub to_utf8 {
>  # correct, but quoted slashes look too horrible in bookmarks
>  sub esc_param {
>  	my $str = shift;
> -	$str =~ s/([^A-Za-z0-9\-_.~()\/:@])/sprintf("%%%02X", ord($1))/eg;
> -	$str =~ s/\+/%2B/g;
> +	$str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
>  	$str =~ s/ /\+/g;
>  	return $str;
>  }
> -- 
> 1.6.3.rc1.192.gdbfcb
> 
> 

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-14  6:29     ` Stephen Boyd
@ 2009-10-14  9:03       ` Jakub Narebski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2009-10-14  9:03 UTC (permalink / raw)
  To: Stephen Boyd; +Cc: Giuseppe Bilotta, git, Junio C Hamano

On Wed, 14 Oct 2009, Stephen Boyd wrote:
> Giuseppe Bilotta wrote:
>> On Wed, Oct 14, 2009 at 3:13 AM, Stephen Boyd <bebarino@gmail.com> wrote:
>>
>>> This works great for my purposes. Thanks.
>>>
>> Can you also check if this fixes the branch name issue you mentioned
>> in the other branch? (And/or do you have a repository exposing the
>> problem if not?)
> 
> (We're jumping back and forth between threads haha)
> 
> Sorry, it doesn't. But this diff fixes the first part of the problem.
> There are still problems with the RSS/Atom feed names being mangled. The
> branch name I'm using is gitwéb, but I imagine any utf8 character will fail.

That it is (probably) not because of lack of esc_html, but because
of lack of to_utf8.  Grrr... we really should convert to utf8 (usually
just mark as utf8) on reading input, to avoid such kind of errors.

> 
> I see the title and the actual text being mangled without this patch.
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4b21ad2..910c370 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1500,9 +1500,10 @@ sub format_ref_marker {
>                                 -href => href(
>                                         action=>$dest_action,
>                                         hash=>$dest
> -                               )}, $name);
> +                               )}, esc_html($name));

Hmmm... to_utf8 would be enough here, but for the fact that 
git-check-ref-format doesn't prohibit '<', '>' in ref names, 
so it is possible that somebody somewhere is using such strange
ref names (e.g. branch named '<b>' is valid branch name).

>  
> -                       $markers .= " <span class=\"$class\" title=\"$ref\">" .
> +                       my $title_ref = esc_html($ref);
> +                       $markers .= " <span class=\"$class\" title=\"$title_ref\">" .

Here it would be really useful to have esc_attr (which would do to_utf8
plus escaping of '"' and '%', quote and escape characters), although
I guess that full HTML escaping done with esc_html wouldn't do anything
bad.

>                                 $link . "</span>";
>                 }
>         }

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] gitweb: fix esc_param
  2009-10-13 19:51 [PATCH] gitweb: fix esc_param Giuseppe Bilotta
  2009-10-14  1:13 ` Stephen Boyd
  2009-10-14  8:23 ` Jakub Narebski
@ 2009-10-14  9:13 ` Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2009-10-14  9:13 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Stephen Boyd, Junio C Hamano

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:

> ...
> Finally, remove an unnecessary escaping of the + sign.
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-10-14  9:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-13 19:51 [PATCH] gitweb: fix esc_param Giuseppe Bilotta
2009-10-14  1:13 ` Stephen Boyd
2009-10-14  6:19   ` Giuseppe Bilotta
2009-10-14  6:29     ` Stephen Boyd
2009-10-14  9:03       ` Jakub Narebski
2009-10-14  8:23 ` Jakub Narebski
2009-10-14  9:13 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).