All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] gitweb: use decode_utf8 directly
@ 2007-04-24 14:05 Ismail Dönmez
  2007-04-27  8:55 ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-04-24 14:05 UTC (permalink / raw)
  To: git


[-- Attachment #1.1: Type: text/plain, Size: 1074 bytes --]

Hi,

gitweb currently uses Encode::decode function with a wrapper like this :

# very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
sub to_utf8 {
       my $str = shift;
       return decode("utf8", $str, Encode::FB_DEFAULT);
}

But for me this gives the following error when I try to view RSS feed for 
Linux kernel GIT repo (local checkout) :

Cannot decode string with wide characters 
at /usr/lib/perl5/vendor_perl/5.8.8/i686-linux/Encode.pm line 162.

I Google'd a bit but the relevant information seems to be missing about this 
error. Anyhow there is no need for a wrapper at all as Encode class has a 
decode_utf8 function which fixes the problem I am experiencing too and chops 
off the unneeded wrapper.

Patch against git 1.5.1.2 is attached. Comments welcome.

P.S: I am using Encode 2.20 from CPAN which is the latest stable version 
available.

Regards,
ismail

-- 
Life is a game, and if you aren't in it to win,
what the heck are you still doing here?

-- Linus Torvalds (talking about open source development)

[-- Attachment #1.2: decode-utf8.patch --]
[-- Type: text/x-diff, Size: 3063 bytes --]

--- gitweb/gitweb.perl	2007-04-24 16:53:00.000000000 +0300
+++ gitweb/gitweb.perl	2007-04-24 16:54:22.000000000 +0300
@@ -566,12 +566,6 @@
 	return $input;
 }
 
-# very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
-sub to_utf8 {
-	my $str = shift;
-	return decode("utf8", $str, Encode::FB_DEFAULT);
-}
-
 # quote unsafe chars, but keep the slash, even when it's not
 # correct, but quoted slashes look too horrible in bookmarks
 sub esc_param {
@@ -596,7 +590,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ / /g;
@@ -610,7 +604,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ / /g;
@@ -893,7 +887,7 @@
 
 	if (length($short) < length($long)) {
 		return $cgi->a({-href => $href, -class => "list subject",
-		                -title => to_utf8($long)},
+		                -title => decode_utf8($long)},
 		       esc_html($short) . $extra);
 	} else {
 		return $cgi->a({-href => $href, -class => "list subject"},
@@ -1110,7 +1104,7 @@
 			if (check_export_ok("$projectroot/$path")) {
 				my $pr = {
 					path => $path,
-					owner => to_utf8($owner),
+					owner => decode_utf8($owner),
 				};
 				push @list, $pr
 			}
@@ -1139,7 +1133,7 @@
 			$pr = unescape($pr);
 			$ow = unescape($ow);
 			if ($pr eq $project) {
-				$owner = to_utf8($ow);
+				$owner = decode_utf8($ow);
 				last;
 			}
 		}
@@ -1613,7 +1607,7 @@
 	}
 	my $owner = $gcos;
 	$owner =~ s/[,;].*$//;
-	return to_utf8($owner);
+	return decode_utf8($owner);
 }
 
 ## ......................................................................
@@ -1696,7 +1690,7 @@
 
 	my $title = "$site_name";
 	if (defined $project) {
-		$title .= " - " . to_utf8($project);
+		$title .= " - " . decode_utf8($project);
 		if (defined $action) {
 			$title .= "/$action";
 			if (defined $file_name) {
@@ -1969,7 +1963,7 @@
 
 	print "<div class=\"page_path\">";
 	print $cgi->a({-href => href(action=>"tree", hash_base=>$hb),
-	              -title => 'tree root'}, to_utf8("[$project]"));
+	              -title => 'tree root'}, decode_utf8("[$project]"));
 	print " / ";
 	if (defined $name) {
 		my @dirname = split '/', $name;
@@ -2584,7 +2578,7 @@
 		($pr->{'age'}, $pr->{'age_string'}) = @aa;
 		if (!defined $pr->{'descr'}) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
-			$pr->{'descr_long'} = to_utf8($descr);
+			$pr->{'descr_long'} = decode_utf8($descr);
 			$pr->{'descr'} = chop_str($descr, 25, 5);
 		}
 		if (!defined $pr->{'owner'}) {
@@ -3616,7 +3610,7 @@
 		$hash = git_get_head_hash($project);
 	}
 
-	my $filename = to_utf8(basename($project)) . "-$hash.tar.$suffix";
+	my $filename = decode_utf8(basename($project)) . "-$hash.tar.$suffix";
 
 	print $cgi->header(
 		-type => "application/$ctype",

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-04-24 14:05 [PATCH] gitweb: use decode_utf8 directly Ismail Dönmez
@ 2007-04-27  8:55 ` Ismail Dönmez
  2007-04-27  9:07   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-04-27  8:55 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Tuesday 24 April 2007 17:05:15 you wrote:
> Hi,
>
> gitweb currently uses Encode::decode function with a wrapper like this :
>
> # very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
> sub to_utf8 {
>        my $str = shift;
>        return decode("utf8", $str, Encode::FB_DEFAULT);
> }
>
> But for me this gives the following error when I try to view RSS feed for
> Linux kernel GIT repo (local checkout) :
>
> Cannot decode string with wide characters
> at /usr/lib/perl5/vendor_perl/5.8.8/i686-linux/Encode.pm line 162.
>
> I Google'd a bit but the relevant information seems to be missing about
> this error. Anyhow there is no need for a wrapper at all as Encode class
> has a decode_utf8 function which fixes the problem I am experiencing too
> and chops off the unneeded wrapper.
>
> Patch against git 1.5.1.2 is attached. Comments welcome.
>
> P.S: I am using Encode 2.20 from CPAN which is the latest stable version
> available.

Ping? This patch should be harmless and it fixes a real error, can it be 
applied please?

Regards,
ismail


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-04-27  8:55 ` Ismail Dönmez
@ 2007-04-27  9:07   ` Junio C Hamano
  2007-04-27  9:22     ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-04-27  9:07 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: git

Ismail Dönmez <ismail@pardus.org.tr> writes:

>> I Google'd a bit but the relevant information seems to be missing about
>> this error. Anyhow there is no need for a wrapper at all as Encode class
>> has a decode_utf8 function which fixes the problem I am experiencing too
>> and chops off the unneeded wrapper.
>>
>> Patch against git 1.5.1.2 is attached. Comments welcome.
>>
>> P.S: I am using Encode 2.20 from CPAN which is the latest stable version
>> available.
>
> Ping? This patch should be harmless and it fixes a real error, can it be 
> applied please?

I cannot tell if it is harmless.  The original used

	decode("utf8", $str, Encode::FB_DEFAULT);

and you made them to:

	decode_utf8($str);

According to the documentation, decode_utf8($octets [,CHECK])
should be equivalent to decode("utf8", $octets [,CHECK]), and
the documentation further says that without CHECK, these
functions assume Encode::FB_DEFAULT; in other words, these two
should be equivalent.

Which means that there is something else going on.  Your change
may fix what you observed (I do not doubt that it fixed what you
observed for you), but without understanding what really is
going on (iow, why it is a fix, when the documentation clearly
indicates they should be equivalent and it should not fix
anything), we cannot tell what *ELSE* we are breaking with this
change.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-04-27  9:07   ` Junio C Hamano
@ 2007-04-27  9:22     ` Ismail Dönmez
  2007-04-27 19:29       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-04-27  9:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1603 bytes --]

On Friday 27 April 2007 12:07:58 you wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> >> I Google'd a bit but the relevant information seems to be missing about
> >> this error. Anyhow there is no need for a wrapper at all as Encode class
> >> has a decode_utf8 function which fixes the problem I am experiencing too
> >> and chops off the unneeded wrapper.
> >>
> >> Patch against git 1.5.1.2 is attached. Comments welcome.
> >>
> >> P.S: I am using Encode 2.20 from CPAN which is the latest stable version
> >> available.
> >
> > Ping? This patch should be harmless and it fixes a real error, can it be
> > applied please?
>
> I cannot tell if it is harmless.  The original used
>
> 	decode("utf8", $str, Encode::FB_DEFAULT);
>
> and you made them to:
>
> 	decode_utf8($str);
>
> According to the documentation, decode_utf8($octets [,CHECK])
> should be equivalent to decode("utf8", $octets [,CHECK]), and
> the documentation further says that without CHECK, these
> functions assume Encode::FB_DEFAULT; in other words, these two
> should be equivalent.
>
> Which means that there is something else going on.  Your change
> may fix what you observed (I do not doubt that it fixed what you
> observed for you), but without understanding what really is
> going on (iow, why it is a fix, when the documentation clearly
> indicates they should be equivalent and it should not fix
> anything), we cannot tell what *ELSE* we are breaking with this
> change.

That might be a bug in Encode itself indeed, I will dig a bit more. Thanks.

Regards,
ismail




[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-04-27  9:22     ` Ismail Dönmez
@ 2007-04-27 19:29       ` Junio C Hamano
  2007-05-01 21:12         ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-04-27 19:29 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: git

Ismail Dönmez <ismail@pardus.org.tr> writes:

>> Which means that there is something else going on.  Your change
>> may fix what you observed (I do not doubt that it fixed what you
>> observed for you), but without understanding what really is
>> going on (iow, why it is a fix, when the documentation clearly
>> indicates they should be equivalent and it should not fix
>> anything), we cannot tell what *ELSE* we are breaking with this
>> change.
>
> That might be a bug in Encode itself indeed, I will dig a bit more. Thanks.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-04-27 19:29       ` Junio C Hamano
@ 2007-05-01 21:12         ` Ismail Dönmez
  2007-05-01 21:39           ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-05-01 21:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Friday 27 April 2007 22:29:03 you wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> >> Which means that there is something else going on.  Your change
> >> may fix what you observed (I do not doubt that it fixed what you
> >> observed for you), but without understanding what really is
> >> going on (iow, why it is a fix, when the documentation clearly
> >> indicates they should be equivalent and it should not fix
> >> anything), we cannot tell what *ELSE* we are breaking with this
> >> change.
> >
> > That might be a bug in Encode itself indeed, I will dig a bit more.
> > Thanks.
>
> Thanks.

Ok found out the reason. decode() tries to decode data that is already UTF-8 
and borks.

This is from Encode.pm :

sub decode_utf8($;$) {
    my ( $str, $check ) = @_;
    return $str if is_utf8($str); <--- Checks if the $str is already UTF-8
    if ($check) {
        return decode( "utf8", $str, $check ); <--- Else do what gitweb does
    [...]

So my patch is indeed correct. I attach it again for reference. Can it be 
please applied?

Regards,
ismail

[-- Attachment #2: decode-utf8.patch --]
[-- Type: text/x-diff, Size: 3063 bytes --]

--- gitweb/gitweb.perl	2007-04-24 16:53:00.000000000 +0300
+++ gitweb/gitweb.perl	2007-04-24 16:54:22.000000000 +0300
@@ -566,12 +566,6 @@
 	return $input;
 }
 
-# very thin wrapper for decode("utf8", $str, Encode::FB_DEFAULT);
-sub to_utf8 {
-	my $str = shift;
-	return decode("utf8", $str, Encode::FB_DEFAULT);
-}
-
 # quote unsafe chars, but keep the slash, even when it's not
 # correct, but quoted slashes look too horrible in bookmarks
 sub esc_param {
@@ -596,7 +590,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ /&nbsp;/g;
@@ -610,7 +604,7 @@
 	my $str = shift;
 	my %opts = @_;
 
-	$str = to_utf8($str);
+	$str = decode_utf8($str);
 	$str = $cgi->escapeHTML($str);
 	if ($opts{'-nbsp'}) {
 		$str =~ s/ /&nbsp;/g;
@@ -893,7 +887,7 @@
 
 	if (length($short) < length($long)) {
 		return $cgi->a({-href => $href, -class => "list subject",
-		                -title => to_utf8($long)},
+		                -title => decode_utf8($long)},
 		       esc_html($short) . $extra);
 	} else {
 		return $cgi->a({-href => $href, -class => "list subject"},
@@ -1110,7 +1104,7 @@
 			if (check_export_ok("$projectroot/$path")) {
 				my $pr = {
 					path => $path,
-					owner => to_utf8($owner),
+					owner => decode_utf8($owner),
 				};
 				push @list, $pr
 			}
@@ -1139,7 +1133,7 @@
 			$pr = unescape($pr);
 			$ow = unescape($ow);
 			if ($pr eq $project) {
-				$owner = to_utf8($ow);
+				$owner = decode_utf8($ow);
 				last;
 			}
 		}
@@ -1613,7 +1607,7 @@
 	}
 	my $owner = $gcos;
 	$owner =~ s/[,;].*$//;
-	return to_utf8($owner);
+	return decode_utf8($owner);
 }
 
 ## ......................................................................
@@ -1696,7 +1690,7 @@
 
 	my $title = "$site_name";
 	if (defined $project) {
-		$title .= " - " . to_utf8($project);
+		$title .= " - " . decode_utf8($project);
 		if (defined $action) {
 			$title .= "/$action";
 			if (defined $file_name) {
@@ -1969,7 +1963,7 @@
 
 	print "<div class=\"page_path\">";
 	print $cgi->a({-href => href(action=>"tree", hash_base=>$hb),
-	              -title => 'tree root'}, to_utf8("[$project]"));
+	              -title => 'tree root'}, decode_utf8("[$project]"));
 	print " / ";
 	if (defined $name) {
 		my @dirname = split '/', $name;
@@ -2584,7 +2578,7 @@
 		($pr->{'age'}, $pr->{'age_string'}) = @aa;
 		if (!defined $pr->{'descr'}) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
-			$pr->{'descr_long'} = to_utf8($descr);
+			$pr->{'descr_long'} = decode_utf8($descr);
 			$pr->{'descr'} = chop_str($descr, 25, 5);
 		}
 		if (!defined $pr->{'owner'}) {
@@ -3616,7 +3610,7 @@
 		$hash = git_get_head_hash($project);
 	}
 
-	my $filename = to_utf8(basename($project)) . "-$hash.tar.$suffix";
+	my $filename = decode_utf8(basename($project)) . "-$hash.tar.$suffix";
 
 	print $cgi->header(
 		-type => "application/$ctype",

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-01 21:12         ` Ismail Dönmez
@ 2007-05-01 21:39           ` Junio C Hamano
  2007-05-01 21:44             ` Ismail Dönmez
  2007-05-03 19:22             ` Ismail Dönmez
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2007-05-01 21:39 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: git, Jakub Narebski

Ismail Dönmez <ismail@pardus.org.tr> writes:

> Ok found out the reason. decode() tries to decode data that is already UTF-8 
> and borks.
>
> This is from Encode.pm :
>
> sub decode_utf8($;$) {
>     my ( $str, $check ) = @_;
>     return $str if is_utf8($str); <--- Checks if the $str is already UTF-8
>     if ($check) {
>         return decode( "utf8", $str, $check ); <--- Else do what gitweb does
>     [...]
>
> So my patch is indeed correct.

Ok, I think that makes it an improvement from the current code,
so I'd apply.

But at the same time I wonder why should the callers be feeding
an already decoded string to to_utf8().  It might be that some
callers needs fixing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-01 21:39           ` Junio C Hamano
@ 2007-05-01 21:44             ` Ismail Dönmez
  2007-05-01 21:48               ` Ismail Dönmez
  2007-05-03 19:22             ` Ismail Dönmez
  1 sibling, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-05-01 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jakub Narebski

On Wednesday 02 May 2007 00:39:34 you wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > Ok found out the reason. decode() tries to decode data that is already
> > UTF-8 and borks.
> >
> > This is from Encode.pm :
> >
> > sub decode_utf8($;$) {
> >     my ( $str, $check ) = @_;
> >     return $str if is_utf8($str); <--- Checks if the $str is already
> > UTF-8 if ($check) {
> >         return decode( "utf8", $str, $check ); <--- Else do what gitweb
> > does [...]
> >
> > So my patch is indeed correct.
>
> Ok, I think that makes it an improvement from the current code,
> so I'd apply.
>
> But at the same time I wonder why should the callers be feeding
> an already decoded string to to_utf8().  It might be that some
> callers needs fixing.

FWIW it was passing my name "İsmail Dönmez" based on user info I guess.

Regards,
ismail

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-01 21:44             ` Ismail Dönmez
@ 2007-05-01 21:48               ` Ismail Dönmez
  0 siblings, 0 replies; 22+ messages in thread
From: Ismail Dönmez @ 2007-05-01 21:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jakub Narebski

On Wednesday 02 May 2007 00:44:46 you wrote:
> On Wednesday 02 May 2007 00:39:34 you wrote:
> > Ismail Dönmez <ismail@pardus.org.tr> writes:
> > > Ok found out the reason. decode() tries to decode data that is already
> > > UTF-8 and borks.
> > >
> > > This is from Encode.pm :
> > >
> > > sub decode_utf8($;$) {
> > >     my ( $str, $check ) = @_;
> > >     return $str if is_utf8($str); <--- Checks if the $str is already
> > > UTF-8 if ($check) {
> > >         return decode( "utf8", $str, $check ); <--- Else do what gitweb
> > > does [...]
> > >
> > > So my patch is indeed correct.
> >
> > Ok, I think that makes it an improvement from the current code,
> > so I'd apply.
> >
> > But at the same time I wonder why should the callers be feeding
> > an already decoded string to to_utf8().  It might be that some
> > callers needs fixing.
>
> FWIW it was passing my name "İsmail Dönmez" based on user info I guess.

I guess its line 1116:

 if (check_export_ok("$projectroot/$path")) {
	my $pr = {
	path => $path,
	owner => to_utf8($owner), <---- Here
};

My system is configured for UTF-8 so $owner will be UTF-8 but in some systems 
it might not be so I don't think there is anything to fix here.

Regards,
ismail

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-01 21:39           ` Junio C Hamano
  2007-05-01 21:44             ` Ismail Dönmez
@ 2007-05-03 19:22             ` Ismail Dönmez
  2007-05-03 19:26               ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-05-03 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]

Hi,
On Wednesday 02 May 2007 00:39:34 Junio C Hamano wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > Ok found out the reason. decode() tries to decode data that is already
> > UTF-8 and borks.
> >
> > This is from Encode.pm :
> >
> > sub decode_utf8($;$) {
> >     my ( $str, $check ) = @_;
> >     return $str if is_utf8($str); <--- Checks if the $str is already
> > UTF-8 if ($check) {
> >         return decode( "utf8", $str, $check ); <--- Else do what gitweb
> > does [...]
> >
> > So my patch is indeed correct.
>
> Ok, I think that makes it an improvement from the current code,
> so I'd apply.
>
> But at the same time I wonder why should the callers be feeding
> an already decoded string to to_utf8().  It might be that some
> callers needs fixing.

Is the patch OK do you want more investigation? Asking because its still not 
in git.git.

Regards,
ismail

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-03 19:22             ` Ismail Dönmez
@ 2007-05-03 19:26               ` Junio C Hamano
  2007-06-01 13:45                 ` Alexandre Julliard
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-05-03 19:26 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: git, Jakub Narebski

Ismail Dönmez <ismail@pardus.org.tr> writes:

>> But at the same time I wonder why should the callers be feeding
>> an already decoded string to to_utf8().  It might be that some
>> callers needs fixing.
>
> Is the patch OK do you want more investigation? Asking because its still not 
> in git.git.

I would say that the patch is an improvement from the current
code so it should hit 'master'; I was a bit busy lately and then
am sick, and also we are post -rc1 freeze now and I was being
cautious, just in case some nacks from more informed parties
arrive late.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-05-03 19:26               ` Junio C Hamano
@ 2007-06-01 13:45                 ` Alexandre Julliard
  2007-06-01 13:50                   ` Ismail Dönmez
                                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Alexandre Julliard @ 2007-06-01 13:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ismail Dönmez, git, Jakub Narebski

Junio C Hamano <junkio@cox.net> writes:

> I would say that the patch is an improvement from the current
> code so it should hit 'master'; I was a bit busy lately and then
> am sick, and also we are post -rc1 freeze now and I was being
> cautious, just in case some nacks from more informed parties
> arrive late.

Sorry for the late nack, but it turns out that this patch breaks diff
output on the Wine server for files that are not utf-8.

The cause is apparently that decode_utf8() returns undef for invalid
sequences instead of substituting a replacement char like
decode("utf8") does.

That may be considered an Encode bug since we are running a fairly old
version (1.99, coming with Debian 3.1), but I'd rather not upgrade
perl on the server. Could the patch be reverted, or done differently?

-- 
Alexandre Julliard
julliard@winehq.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 13:45                 ` Alexandre Julliard
@ 2007-06-01 13:50                   ` Ismail Dönmez
  2007-06-01 16:51                     ` Alexandre Julliard
  2007-06-01 19:44                   ` Junio C Hamano
  2007-06-02  8:22                   ` Jakub Narebski
  2 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-06-01 13:50 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: Junio C Hamano, git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Friday 01 June 2007 16:45:31 Alexandre Julliard wrote:
> Junio C Hamano <junkio@cox.net> writes:
> > I would say that the patch is an improvement from the current
> > code so it should hit 'master'; I was a bit busy lately and then
> > am sick, and also we are post -rc1 freeze now and I was being
> > cautious, just in case some nacks from more informed parties
> > arrive late.
>
> Sorry for the late nack, but it turns out that this patch breaks diff
> output on the Wine server for files that are not utf-8.

Isn't UTF-8 default even for Linux kernel now?

> The cause is apparently that decode_utf8() returns undef for invalid
> sequences instead of substituting a replacement char like
> decode("utf8") does.
>
> That may be considered an Encode bug since we are running a fairly old
> version (1.99, coming with Debian 3.1), but I'd rather not upgrade
> perl on the server. Could the patch be reverted, or done differently?

Sorry but thats too old. Of course I am not the maintainer of GIT so its not 
for me to decide but well as David Woodhouse puts it, please join us in 21st 
century and start using UTF-8.

/ismail

-- 
Perfect is the enemy of good

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 13:50                   ` Ismail Dönmez
@ 2007-06-01 16:51                     ` Alexandre Julliard
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Julliard @ 2007-06-01 16:51 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: Junio C Hamano, git, Jakub Narebski

Ismail Dönmez <ismail@pardus.org.tr> writes:

> Sorry but thats too old. Of course I am not the maintainer of GIT so its not 
> for me to decide but well as David Woodhouse puts it, please join us in 21st 
> century and start using UTF-8.

That's not very helpful. There can be many valid reasons for not using
utf-8, in our case compatibility with Windows tools is the main
reason. And even if we were to convert all our files today, it
wouldn't help when browsing older versions.

I'm not asking gitweb to magically guess the encoding of the files,
I'm happy with it replacing invalid sequences with some substitution
char, like it did before 1.5.2. But now it is deleting whole lines
from the diff, without any indication that something went wrong.
That's not an improvement IMNSHO.

-- 
Alexandre Julliard
julliard@winehq.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 13:45                 ` Alexandre Julliard
  2007-06-01 13:50                   ` Ismail Dönmez
@ 2007-06-01 19:44                   ` Junio C Hamano
  2007-06-01 19:47                     ` Ismail Dönmez
  2007-06-02  8:22                   ` Jakub Narebski
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-06-01 19:44 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: Ismail Dönmez, git, Jakub Narebski

Alexandre Julliard <julliard@winehq.org> writes:

> Sorry for the late nack, but it turns out that this patch breaks diff
> output on the Wine server for files that are not utf-8.
>
> The cause is apparently that decode_utf8() returns undef for invalid
> sequences instead of substituting a replacement char like
> decode("utf8") does.

Thanks for noticing.  Will revert.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 19:44                   ` Junio C Hamano
@ 2007-06-01 19:47                     ` Ismail Dönmez
  2007-06-01 20:00                       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-06-01 19:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexandre Julliard, git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On Friday 01 June 2007 22:44:36 Junio C Hamano wrote:
> Alexandre Julliard <julliard@winehq.org> writes:
> > Sorry for the late nack, but it turns out that this patch breaks diff
> > output on the Wine server for files that are not utf-8.
> >
> > The cause is apparently that decode_utf8() returns undef for invalid
> > sequences instead of substituting a replacement char like
> > decode("utf8") does.
>
> Thanks for noticing.  Will revert.

Why are reverting a correct bugfix? :( He's at most using outdated software. 
*sigh*

/ismail

-- 
Perfect is the enemy of good

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 19:47                     ` Ismail Dönmez
@ 2007-06-01 20:00                       ` Junio C Hamano
  2007-06-01 20:08                         ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-06-01 20:00 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: Alexandre Julliard, git, Jakub Narebski

Ismail Dönmez <ismail@pardus.org.tr> writes:

> On Friday 01 June 2007 22:44:36 Junio C Hamano wrote:
>> Alexandre Julliard <julliard@winehq.org> writes:
>> > Sorry for the late nack, but it turns out that this patch breaks diff
>> > output on the Wine server for files that are not utf-8.
>> >
>> > The cause is apparently that decode_utf8() returns undef for invalid
>> > sequences instead of substituting a replacement char like
>> > decode("utf8") does.
>>
>> Thanks for noticing.  Will revert.
>
> Why are reverting a correct bugfix? :( He's at most using outdated software. 
> *sigh*

I would assume that on top of a revert, with an additional

	return $str if is_utf8($str);

to to_utf8() you should be able to fix both installations that
has old or new Encode.pm?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 20:00                       ` Junio C Hamano
@ 2007-06-01 20:08                         ` Ismail Dönmez
  2007-06-03 22:06                           ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Ismail Dönmez @ 2007-06-01 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexandre Julliard, git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

On Friday 01 June 2007 23:00:31 you wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > On Friday 01 June 2007 22:44:36 Junio C Hamano wrote:
> >> Alexandre Julliard <julliard@winehq.org> writes:
> >> > Sorry for the late nack, but it turns out that this patch breaks diff
> >> > output on the Wine server for files that are not utf-8.
> >> >
> >> > The cause is apparently that decode_utf8() returns undef for invalid
> >> > sequences instead of substituting a replacement char like
> >> > decode("utf8") does.
> >>
> >> Thanks for noticing.  Will revert.
> >
> > Why are reverting a correct bugfix? :( He's at most using outdated
> > software. *sigh*
>
> I would assume that on top of a revert, with an additional
>
> 	return $str if is_utf8($str);
>
> to to_utf8() you should be able to fix both installations that
> has old or new Encode.pm?

I can try the patch if you can send me what you propose. 

/ismail

-- 
Perfect is the enemy of good

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 13:45                 ` Alexandre Julliard
  2007-06-01 13:50                   ` Ismail Dönmez
  2007-06-01 19:44                   ` Junio C Hamano
@ 2007-06-02  8:22                   ` Jakub Narebski
  2 siblings, 0 replies; 22+ messages in thread
From: Jakub Narebski @ 2007-06-02  8:22 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: Junio C Hamano, Ismail Dönmez, git

On Fri, 1 Jun 2007, Alexandre Julliard wrote:
 
> The cause is apparently that decode_utf8() returns undef for invalid
> sequences instead of substituting a replacement char like
> decode("utf8") does.
> 
> That may be considered an Encode bug since we are running a fairly old
> version (1.99, coming with Debian 3.1), but I'd rather not upgrade
> perl on the server. Could the patch be reverted, or done differently?

Could you put modern (without this decode_utf8 bug) version of Encode.pm
in the directory with gitweb.cgi, so gitweb uses new local version and
not the one that is installed system-wide?
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-01 20:08                         ` Ismail Dönmez
@ 2007-06-03 22:06                           ` Junio C Hamano
  2007-06-03 22:13                             ` Ismail Dönmez
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2007-06-03 22:06 UTC (permalink / raw)
  To: Ismail Dönmez; +Cc: Alexandre Julliard, git, Jakub Narebski

Ismail Dönmez <ismail@pardus.org.tr> writes:

> I can try the patch if you can send me what you propose. 

Does the recent one from Jakub work for you?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
  2007-06-03 22:06                           ` Junio C Hamano
@ 2007-06-03 22:13                             ` Ismail Dönmez
  0 siblings, 0 replies; 22+ messages in thread
From: Ismail Dönmez @ 2007-06-03 22:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexandre Julliard, git, Jakub Narebski

[-- Attachment #1: Type: text/plain, Size: 297 bytes --]

On Monday 04 June 2007 01:06:55 Junio C Hamano wrote:
> Ismail Dönmez <ismail@pardus.org.tr> writes:
> > I can try the patch if you can send me what you propose.
>
> Does the recent one from Jakub work for you?

It works fine here.

Regards,
ismail

-- 
Perfect is the enemy of good

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] gitweb: use decode_utf8 directly
@ 2007-06-01 16:13 Martin Koegler
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Koegler @ 2007-06-01 16:13 UTC (permalink / raw)
  To: julliard; +Cc: git

Alexandre Julliard wrote:
>Junio C Hamano <junkio@cox.net> writes:
> > I would say that the patch is an improvement from the current
> > code so it should hit 'master'; I was a bit busy lately and then
> > am sick, and also we are post -rc1 freeze now and I was being
> > cautious, just in case some nacks from more informed parties
> > arrive late.
> 
> Sorry for the late nack, but it turns out that this patch breaks diff
> output on the Wine server for files that are not utf-8.
> 
> The cause is apparently that decode_utf8() returns undef for invalid
> sequences instead of substituting a replacement char like
> decode("utf8") does.
> 
> That may be considered an Encode bug since we are running a fairly old
> version (1.99, coming with Debian 3.1), but I'd rather not upgrade
> perl on the server. Could the patch be reverted, or done differently?

I hit the same problem:
http://marc.info/?l=git&m=117978122420441&w=2

On my system, I use this patch as workaround:
http://marc.info/?l=git&m=118038526531694&w=2

mfg Martin Kögler

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2007-06-03 22:14 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-24 14:05 [PATCH] gitweb: use decode_utf8 directly Ismail Dönmez
2007-04-27  8:55 ` Ismail Dönmez
2007-04-27  9:07   ` Junio C Hamano
2007-04-27  9:22     ` Ismail Dönmez
2007-04-27 19:29       ` Junio C Hamano
2007-05-01 21:12         ` Ismail Dönmez
2007-05-01 21:39           ` Junio C Hamano
2007-05-01 21:44             ` Ismail Dönmez
2007-05-01 21:48               ` Ismail Dönmez
2007-05-03 19:22             ` Ismail Dönmez
2007-05-03 19:26               ` Junio C Hamano
2007-06-01 13:45                 ` Alexandre Julliard
2007-06-01 13:50                   ` Ismail Dönmez
2007-06-01 16:51                     ` Alexandre Julliard
2007-06-01 19:44                   ` Junio C Hamano
2007-06-01 19:47                     ` Ismail Dönmez
2007-06-01 20:00                       ` Junio C Hamano
2007-06-01 20:08                         ` Ismail Dönmez
2007-06-03 22:06                           ` Junio C Hamano
2007-06-03 22:13                             ` Ismail Dönmez
2007-06-02  8:22                   ` Jakub Narebski
2007-06-01 16:13 Martin Koegler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.