All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] gitweb: Highlight interesting parts of diff
@ 2012-02-10  9:18 Michał Kiedrowicz
  2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
                   ` (8 more replies)
  0 siblings, 9 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Reading diff output is sometimes very hard, even if it's colored,
especially if lines differ only in few characters. This is often true when
a commit fixes a typo or renames some variables or functions.

This patch series teaches gitweb to highlight fragments that are different
between old and new line. This should mimic the same feature in Trac or
GitHub.

To achieve that, added and removed lines must be accumulated and compared.
The current code in print_sidebyside_diff_chunk() already does this, so
this patch series reuses it in commits:

  gitweb: Extract print_sidebyside_diff_lines()
  gitweb: Use print_diff_chunk() for both side-by-side and inline diffs

Next, HTML-formatting diff lines is pushed down to the place where they
are about to be printed. This is required because comparision must be
performened on raw git-diff output and not on HTML-formatted lines. This
is done in commits:

  gitweb: Move HTML-formatting diff line back to process_diff_line()
  gitweb: Push formatting diff lines to print_diff_chunk()
  gitweb: Format diff lines just before printing

The last three commits implement the advertised feature. They could be
squashed together but that would make them harder to review (I think).

  gitweb: Highlight interesting parts of diff
  gitweb: Use different colors to present marked changes
  gitweb: Highlight combined diffs

This series is based on v1.7.9.

Michał Kiedrowicz (8):
  gitweb: Extract print_sidebyside_diff_lines()
  gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
  gitweb: Move HTML-formatting diff line back to process_diff_line()
  gitweb: Push formatting diff lines to print_diff_chunk()
  gitweb: Format diff lines just before printing
  gitweb: Highlight interesting parts of diff
  gitweb: Use different colors to present marked changes
  gitweb: Highlight combined diffs

 gitweb/gitweb.perl       |  304 +++++++++++++++++++++++++++++++++++-----------
 gitweb/static/gitweb.css |    8 ++
 2 files changed, 238 insertions(+), 74 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines()
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-11 15:20   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Currently, print_sidebyside_diff_chunk() does two things: it
accumulates diff lines and prints them. Accumulation may be used to
perform additional operations on diff lines, so it makes sense to split
these two things. Thus, the code that prints diff lines in a side-by-side
manner is moved out of print_sidebyside_diff_chunk() to a separate
subroutine.

The outcome of this patch is that print_sidebyside_diff_chunk() is now
much shorter and easier to read.

This change is meant to be a simple code movement. No behavior change is
intended.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   97 ++++++++++++++++++++++++++++------------------------
 1 files changed, 52 insertions(+), 45 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index abb5a79..1247607 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4861,6 +4861,53 @@ sub git_difftree_body {
 	print "</table>\n";
 }
 
+# Print context lines and then rem/add lines in a side-by-side manner.
+sub print_sidebyside_diff_lines {
+	my ($ctx, $rem, $add) = @_;
+
+	# print context block before add/rem block
+	if (@$ctx) {
+		print join '',
+			'<div class="chunk_block ctx">',
+				'<div class="old">',
+				@$ctx,
+				'</div>',
+				'<div class="new">',
+				@$ctx,
+				'</div>',
+			'</div>';
+	}
+
+	if (!@$add) {
+		# pure removal
+		print join '',
+			'<div class="chunk_block rem">',
+				'<div class="old">',
+				@$rem,
+				'</div>',
+			'</div>';
+	} elsif (!@$rem) {
+		# pure addition
+		print join '',
+			'<div class="chunk_block add">',
+				'<div class="new">',
+				@$add,
+				'</div>',
+			'</div>';
+	} else {
+		# assume that it is change
+		print join '',
+			'<div class="chunk_block chg">',
+				'<div class="old">',
+				@$rem,
+				'</div>',
+				'<div class="new">',
+				@$add,
+				'</div>',
+			'</div>';
+	}
+}
+
 sub print_sidebyside_diff_chunk {
 	my @chunk = @_;
 	my (@ctx, @rem, @add);
@@ -4887,51 +4934,11 @@ sub print_sidebyside_diff_chunk {
 			next;
 		}
 
-		## print from accumulator when type of class of lines change
-		# empty contents block on start rem/add block, or end of chunk
-		if (@ctx && (!$class || $class eq 'rem' || $class eq 'add')) {
-			print join '',
-				'<div class="chunk_block ctx">',
-					'<div class="old">',
-					@ctx,
-					'</div>',
-					'<div class="new">',
-					@ctx,
-					'</div>',
-				'</div>';
-			@ctx = ();
-		}
-		# empty add/rem block on start context block, or end of chunk
-		if ((@rem || @add) && (!$class || $class eq 'ctx')) {
-			if (!@add) {
-				# pure removal
-				print join '',
-					'<div class="chunk_block rem">',
-						'<div class="old">',
-						@rem,
-						'</div>',
-					'</div>';
-			} elsif (!@rem) {
-				# pure addition
-				print join '',
-					'<div class="chunk_block add">',
-						'<div class="new">',
-						@add,
-						'</div>',
-					'</div>';
-			} else {
-				# assume that it is change
-				print join '',
-					'<div class="chunk_block chg">',
-						'<div class="old">',
-						@rem,
-						'</div>',
-						'<div class="new">',
-						@add,
-						'</div>',
-					'</div>';
-			}
-			@rem = @add = ();
+		## print from accumulator when have some add/rem lines or end
+		# of chunk (flush context lines)
+		if (((@rem || @add) && $class eq 'ctx') || !$class) {
+			print_sidebyside_diff_lines(\@ctx, \@rem, \@add);
+			@ctx = @rem = @add = ();
 		}
 
 		## adding lines to accumulator
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
  2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-11 15:53   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

This renames print_sidebyside_diff_chunk() to print_diff_chunk() and
makes use of it for both side-by-side and inline diffs. Now diff lines
are always accumulated before they are printed. This opens the
possibility to preprocess diff output before it's printed.

The new function print_inline_diff_lines() could reorder diff lines.  It
first prints all context lines, then all removed lines and finally all
added lines. If the diff output consisted of mixed added and removed
lines, gitweb would reorder these lines. This is especially true for
combined diff output, for example:

	 - removed line for first parent
	 + added line for first parent
	  -removed line for second parent
	 ++added line for both parents

would be rendered as:

	- removed line for first parent
	 -removed line for second parent
	+ added line for first parent
	++added line for both parents

To prevent gitweb from reordering lines, print_diff_chunk() calls
print_diff_lines() as soon as it detects that both added and removed
lines are present and there was a class change.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   53 +++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1247607..ed63261 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4908,9 +4908,31 @@ sub print_sidebyside_diff_lines {
 	}
 }
 
-sub print_sidebyside_diff_chunk {
-	my @chunk = @_;
+# Print context lines and then rem/add lines in inline manner.
+sub print_inline_diff_lines {
+	my ($ctx, $rem, $add) = @_;
+
+	print foreach (@$ctx);
+	print foreach (@$rem);
+	print foreach (@$add);
+}
+
+# Print context lines and then rem/add lines.
+sub print_diff_lines {
+	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
+
+	if ($diff_style eq 'sidebyside' && !$is_combined) {
+		print_sidebyside_diff_lines($ctx, $rem, $add);
+	} else {
+		# default 'inline' style and unknown styles
+		print_inline_diff_lines($ctx, $rem, $add);
+	}
+}
+
+sub print_diff_chunk {
+	my ($diff_style, $is_combined, @chunk) = @_;
 	my (@ctx, @rem, @add);
+	my $prev_class = '';
 
 	return unless @chunk;
 
@@ -4935,9 +4957,13 @@ sub print_sidebyside_diff_chunk {
 		}
 
 		## print from accumulator when have some add/rem lines or end
-		# of chunk (flush context lines)
-		if (((@rem || @add) && $class eq 'ctx') || !$class) {
-			print_sidebyside_diff_lines(\@ctx, \@rem, \@add);
+		# of chunk (flush context lines), or when have add and rem
+		# lines and new block is reached (otherwise add/rem lines could
+		# be reordered)
+		if (((@rem || @add) && $class eq 'ctx') || !$class ||
+				(@rem && @add && $class ne $prev_class)) {
+			print_diff_lines(\@ctx, \@rem, \@add, $diff_style,
+					$is_combined);
 			@ctx = @rem = @add = ();
 		}
 
@@ -4954,6 +4980,8 @@ sub print_sidebyside_diff_chunk {
 		if ($class eq 'ctx') {
 			push @ctx, $line;
 		}
+
+		$prev_class = $class;
 	}
 }
 
@@ -5080,22 +5108,17 @@ sub git_patchset_body {
 			$diff_classes .= " $class" if ($class);
 			$line = "<div class=\"$diff_classes\">$line</div>\n";
 
-			if ($diff_style eq 'sidebyside' && !$is_combined) {
-				if ($class eq 'chunk_header') {
-					print_sidebyside_diff_chunk(@chunk);
-					@chunk = ( [ $class, $line ] );
-				} else {
-					push @chunk, [ $class, $line ];
-				}
+			if ($class eq 'chunk_header') {
+				print_diff_chunk($diff_style, $is_combined, @chunk);
+				@chunk = ( [ $class, $line ] );
 			} else {
-				# default 'inline' style and unknown styles
-				print $line;
+				push @chunk, [ $class, $line ];
 			}
 		}
 
 	} continue {
 		if (@chunk) {
-			print_sidebyside_diff_chunk(@chunk);
+			print_diff_chunk($diff_style, $is_combined, @chunk);
 			@chunk = ();
 		}
 		print "</div>\n"; # class="patch"
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line()
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
  2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
  2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-11 16:02   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Commit 6ba1eb51b (gitweb: Add a feature to show side-by-side diff,
2011-10-31) for no special reason moved wrapping diff line in <div> out
of format_diff_line(). Bring back old behavior.

This simplifies code in git_patchset_body() and keeps formatting of a
diff line in one place.

The more long-term purpose of this patch is to move formatting diff
lines down to print_diff_chunk(), to allow processing lines without
HTML-formatting.

This is just a refactoring patch. It's not meant to change gitweb
output.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ed63261..d2f75c4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2332,14 +2332,18 @@ sub process_diff_line {
 
 	if ($from && $to && $line =~ m/^\@{2} /) {
 		$line = format_unidiff_chunk_header($line, $from, $to);
-		return $diff_class, $line;
 
 	} elsif ($from && $to && $line =~ m/^\@{3}/) {
 		$line = format_cc_diff_chunk_header($line, $from, $to);
-		return $diff_class, $line;
-
+	} else {
+		$line = esc_html($line, -nbsp=>1);
 	}
-	return $diff_class, esc_html($line, -nbsp=>1);
+
+	my $diff_classes = "diff";
+	$diff_classes .= " $diff_class" if ($diff_class);
+	$line = "<div class=\"$diff_classes\">$line</div>\n";
+
+	return $diff_class, $line;
 }
 
 # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
@@ -5104,9 +5108,6 @@ sub git_patchset_body {
 			next PATCH if ($patch_line =~ m/^diff /);
 
 			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
-			my $diff_classes = "diff";
-			$diff_classes .= " $class" if ($class);
-			$line = "<div class=\"$diff_classes\">$line</div>\n";
 
 			if ($class eq 'chunk_header') {
 				print_diff_chunk($diff_style, $is_combined, @chunk);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk()
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (2 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-11 16:29   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Now git_patchset_body() only calls diff_line_class() (removed from
process_diff_line()). The latter function is renamed to
format_diff_line() and is called from print_diff_chunk().

This is a pure code movement, needed for processing raw diff lines in
the accumulator in print_diff_chunk(). No behavior change is intended by
this change.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d2f75c4..cae9dfa 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2320,12 +2320,9 @@ sub format_cc_diff_chunk_header {
 }
 
 # process patch (diff) line (not to be used for diff headers),
-# returning class and HTML-formatted (but not wrapped) line
-sub process_diff_line {
-	my $line = shift;
-	my ($from, $to) = @_;
-
-	my $diff_class = diff_line_class($line, $from, $to);
+# returning HTML-formatted (but not wrapped) line
+sub format_diff_line {
+	my ($line, $diff_class, $from, $to) = @_;
 
 	chomp $line;
 	$line = untabify($line);
@@ -2343,7 +2340,7 @@ sub process_diff_line {
 	$diff_classes .= " $diff_class" if ($diff_class);
 	$line = "<div class=\"$diff_classes\">$line</div>\n";
 
-	return $diff_class, $line;
+	return $line;
 }
 
 # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
@@ -4934,7 +4931,7 @@ sub print_diff_lines {
 }
 
 sub print_diff_chunk {
-	my ($diff_style, $is_combined, @chunk) = @_;
+	my ($diff_style, $is_combined, $from, $to, @chunk) = @_;
 	my (@ctx, @rem, @add);
 	my $prev_class = '';
 
@@ -4954,6 +4951,8 @@ sub print_diff_chunk {
 	foreach my $line_info (@chunk) {
 		my ($class, $line) = @$line_info;
 
+		$line = format_diff_line($line, $class, $from, $to);
+
 		# print chunk headers
 		if ($class && $class eq 'chunk_header') {
 			print $line;
@@ -5107,19 +5106,19 @@ sub git_patchset_body {
 
 			next PATCH if ($patch_line =~ m/^diff /);
 
-			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
+			my $class = diff_line_class($patch_line, \%from, \%to);
 
 			if ($class eq 'chunk_header') {
-				print_diff_chunk($diff_style, $is_combined, @chunk);
-				@chunk = ( [ $class, $line ] );
+				print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
+				@chunk = ( [ $class, $patch_line ] );
 			} else {
-				push @chunk, [ $class, $line ];
+				push @chunk, [ $class, $patch_line ];
 			}
 		}
 
 	} continue {
 		if (@chunk) {
-			print_diff_chunk($diff_style, $is_combined, @chunk);
+			print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
 			@chunk = ();
 		}
 		print "</div>\n"; # class="patch"
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 5/8] gitweb: Format diff lines just before printing
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (3 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-11 17:14   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Now we're ready to insert highlightning to diff output.

The call to untabify() remains in the main loop in print_diff_chunk().
The motivation is that it must be called before any call to esc_html()
(because that converts spaces to &nbsp;) and to call it only once.

This is a refactoring patch.  It's not meant to change gitweb output.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   42 +++++++++++++++++++++++++++++++-----------
 1 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cae9dfa..db61553 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2319,13 +2319,22 @@ sub format_cc_diff_chunk_header {
 	return $line;
 }
 
-# process patch (diff) line (not to be used for diff headers),
-# returning HTML-formatted (but not wrapped) line
+# wrap patch (diff) line into a <div> (not to be used for diff headers),
+# the line must be esc_html()'ed
 sub format_diff_line {
 	my ($line, $diff_class, $from, $to) = @_;
 
-	chomp $line;
-	$line = untabify($line);
+	my $diff_classes = "diff";
+	$diff_classes .= " $diff_class" if ($diff_class);
+	$line = "<div class=\"$diff_classes\">$line</div>\n";
+
+	return $line;
+}
+
+# format diff chunk header line (not to be used for diff headers),
+# returning HTML-formatted line
+sub format_diff_chunk_header {
+	my ($line, $diff_class, $from, $to) = @_;
 
 	if ($from && $to && $line =~ m/^\@{2} /) {
 		$line = format_unidiff_chunk_header($line, $from, $to);
@@ -2336,11 +2345,7 @@ sub format_diff_line {
 		$line = esc_html($line, -nbsp=>1);
 	}
 
-	my $diff_classes = "diff";
-	$diff_classes .= " $diff_class" if ($diff_class);
-	$line = "<div class=\"$diff_classes\">$line</div>\n";
-
-	return $line;
+	return format_diff_line($line, $diff_class);
 }
 
 # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
@@ -4918,10 +4923,24 @@ sub print_inline_diff_lines {
 	print foreach (@$add);
 }
 
+# HTML-format diff context, removed and added lines.
+sub format_ctx_rem_add_lines {
+	my ($ctx, $rem, $add) = @_;
+	my (@new_ctx, @new_rem, @new_add);
+
+	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
+	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
+	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
+
+	return (\@new_ctx, \@new_rem, \@new_add);
+}
+
 # Print context lines and then rem/add lines.
 sub print_diff_lines {
 	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
 
+	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
+
 	if ($diff_style eq 'sidebyside' && !$is_combined) {
 		print_sidebyside_diff_lines($ctx, $rem, $add);
 	} else {
@@ -4951,11 +4970,12 @@ sub print_diff_chunk {
 	foreach my $line_info (@chunk) {
 		my ($class, $line) = @$line_info;
 
-		$line = format_diff_line($line, $class, $from, $to);
+		$line = untabify($line);
 
 		# print chunk headers
 		if ($class && $class eq 'chunk_header') {
-			print $line;
+			print format_diff_chunk_header($line, $class, $from,
+				$to);
 			next;
 		}
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (4 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-10 13:23   ` Jakub Narebski
                     ` (2 more replies)
  2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

Reading diff output is sometimes very hard, even if it's colored,
especially if lines differ only in few characters.  This is often true
when a commit fixes a typo or renames some variables or functions.

This commit teaches gitweb to highlight characters that are different
between old and new line.  This should work in the similar manner as in
Trac or GitHub.

The code that comares lines is based on
contrib/diff-highlight/diff-highlight, except that it works with
multiline changes too.  It also won't highlight lines that are
completely different because that would only make the output unreadable.
Combined diffs are not supported but a following commit will change it.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index db61553..1a5b454 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header {
 # wrap patch (diff) line into a <div> (not to be used for diff headers),
 # the line must be esc_html()'ed
 sub format_diff_line {
-	my ($line, $diff_class, $from, $to) = @_;
+	my ($line, $diff_class) = @_;
 
 	my $diff_classes = "diff";
 	$diff_classes .= " $diff_class" if ($diff_class);
@@ -4923,14 +4923,85 @@ sub print_inline_diff_lines {
 	print foreach (@$add);
 }
 
+# Highlight characters from $prefix to $suffix and escape HTML.
+# $str is a reference to the array of characters.
+sub esc_html_mark_range {
+	my ($str, $prefix, $suffix) = @_;
+
+	# Don't generate empty <span> element.
+	if ($prefix == $suffix + 1) {
+		return esc_html(join('', @$str), -nbsp=>1);
+	}
+
+	my $before = join('', @{$str}[0..($prefix - 1)]);
+	my $marked = join('', @{$str}[$prefix..$suffix]);
+	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);
+
+	return esc_html($before, -nbsp=>1) .
+		$cgi->span({-class=>'marked'}, esc_html($marked, -nbsp=>1)) .
+		esc_html($after,-nbsp=>1);
+}
+
+# Format removed and added line, mark changed part and HTML-format them.
+sub format_rem_add_line {
+	my ($rem, $add) = @_;
+	my @r = split(//, $rem);
+	my @a = split(//, $add);
+	my ($esc_rem, $esc_add);
+	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
+	my ($prefix_is_space, $suffix_is_space) = (1, 1);
+
+	while ($prefix < @r && $prefix < @a) {
+		last if ($r[$prefix] ne $a[$prefix]);
+
+		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
+		$prefix++;
+	}
+
+	while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
+		last if ($r[$suffix_rem] ne $a[$suffix_add]);
+
+		$suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
+		$suffix_rem--;
+		$suffix_add--;
+	}
+
+	# Mark lines that are different from each other, but have some common
+	# part that isn't whitespace.  If lines are completely different, don't
+	# mark them because that would make output unreadable, especially if
+	# diff consists of multiple lines.
+	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
+		|| ($prefix_is_space && $suffix_is_space)) {
+		$esc_rem = esc_html($rem);
+		$esc_add = esc_html($add);
+	} else {
+		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
+		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
+	}
+
+	return format_diff_line($esc_rem, 'rem'),
+		format_diff_line($esc_add, 'add');
+}
+
 # HTML-format diff context, removed and added lines.
 sub format_ctx_rem_add_lines {
-	my ($ctx, $rem, $add) = @_;
+	my ($ctx, $rem, $add, $is_combined) = @_;
 	my (@new_ctx, @new_rem, @new_add);
+	my $num_add_lines = @$add;
+
+	if (!$is_combined && $num_add_lines > 0 && $num_add_lines == @$rem) {
+		for (my $i = 0; $i < $num_add_lines; $i++) {
+			my ($line_rem, $line_add) = format_rem_add_line(
+				$rem->[$i], $add->[$i]);
+			push @new_rem, $line_rem;
+			push @new_add, $line_add;
+		}
+	} else {
+		@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
+		@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
+	}
 
 	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
-	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
-	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
 
 	return (\@new_ctx, \@new_rem, \@new_add);
 }
@@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines {
 sub print_diff_lines {
 	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
 
-	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
+	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add,
+		$is_combined);
 
 	if ($diff_style eq 'sidebyside' && !$is_combined) {
 		print_sidebyside_diff_lines($ctx, $rem, $add);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 7/8] gitweb: Use different colors to present marked changes
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (5 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-12  0:11   ` Jakub Narebski
  2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
  2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

This makes use of the highlight diff feature.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
I decided to split mechanism (generate HTML page with <span> elements that
mark interesting fragments of diff output) from politics (use these
particular colors for this <span> elements), but otherwise this commit
may be squashed with the previous one. These colors work for me but if
someone comes out with better ones, I'd be happy.

 gitweb/static/gitweb.css |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index c7827e8..4f87d16 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -438,6 +438,10 @@ div.diff.add {
 	color: #008800;
 }
 
+div.diff.add span.marked {
+	background-color: #77ff77;
+}
+
 div.diff.from_file a.path,
 div.diff.from_file {
 	color: #aa0000;
@@ -447,6 +451,10 @@ div.diff.rem {
 	color: #cc0000;
 }
 
+div.diff.rem span.marked {
+	background-color: #ff7777;
+}
+
 div.diff.chunk_header a,
 div.diff.chunk_header {
 	color: #990099;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 8/8] gitweb: Highlight combined diffs
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (6 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
@ 2012-02-10  9:18 ` Michał Kiedrowicz
  2012-02-12  0:03   ` Jakub Narebski
  2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
  8 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10  9:18 UTC (permalink / raw)
  To: git; +Cc: Michał Kiedrowicz

The highlightning of combined diffs is currently disabled.  This is
because output from a combined diff is much harder to highlight because
it's not obvious which removed and added lines should be compared.

Moreover, code that compares added and removed lines doesn't care about
combined diffs. It only skips first +/- character, treating second +/-
as a line content.

Let's start with a simple case: only highlight changes that come from
one parent, i.e. when every removed line has a corresponding added line
for the same parent.  This way the highlightning cannot get wrong. For
example, following diffs would be highlighted:

	- removed line for first parent
	+ added line for first parent
	  context line
	 -removed line for second parent
	 +added line for second parent

or

	- removed line for first parent
	 -removed line for second parent
	+ added line for first parent
	 +added line for second parent

but following output will not:

	- removed line for first parent
	 -removed line for second parent
	 +added line for second parent
	++added line for both parents

Further changes may introduce more intelligent approach that better
handles combined diffs.

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---
 gitweb/gitweb.perl |   40 +++++++++++++++++++++++++++++++++++++---
 1 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1a5b454..2b6cb9e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4944,13 +4944,16 @@ sub esc_html_mark_range {
 
 # Format removed and added line, mark changed part and HTML-format them.
 sub format_rem_add_line {
-	my ($rem, $add) = @_;
+	my ($rem, $add, $is_combined) = @_;
 	my @r = split(//, $rem);
 	my @a = split(//, $add);
 	my ($esc_rem, $esc_add);
 	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
 	my ($prefix_is_space, $suffix_is_space) = (1, 1);
 
+	# In combined diff we must ignore two +/- characters.
+	$prefix = 2 if ($is_combined);
+
 	while ($prefix < @r && $prefix < @a) {
 		last if ($r[$prefix] ne $a[$prefix]);
 
@@ -4988,11 +4991,42 @@ sub format_ctx_rem_add_lines {
 	my ($ctx, $rem, $add, $is_combined) = @_;
 	my (@new_ctx, @new_rem, @new_add);
 	my $num_add_lines = @$add;
+	my $can_highlight;
+
+	# Highlight if every removed line has a corresponding added line.
+	if ($num_add_lines > 0 && $num_add_lines == @$rem) {
+		$can_highlight = 1;
+
+		# Highlight lines in combined diff only if the chunk contains
+		# diff between the same version, e.g.
+		#
+		#    - a
+		#   -  b
+		#    + c
+		#   +  d
+		#
+		# Otherwise the highlightling would be confusing.
+		if ($is_combined) {
+			for (my $i = 0; $i < $num_add_lines; $i++) {
+				my $prefix_rem = substr($rem->[$i], 0, 2);
+				my $prefix_add = substr($add->[$i], 0, 2);
+
+				$prefix_rem =~ s/-/+/g;
+
+				if ($prefix_rem ne $prefix_add) {
+					$can_highlight = 0;
+					last;
+				}
+			}
+		}
+	} else {
+		$can_highlight = 0;
+	}
 
-	if (!$is_combined && $num_add_lines > 0 && $num_add_lines == @$rem) {
+	if ($can_highlight) {
 		for (my $i = 0; $i < $num_add_lines; $i++) {
 			my ($line_rem, $line_add) = format_rem_add_line(
-				$rem->[$i], $add->[$i]);
+				$rem->[$i], $add->[$i], $is_combined);
 			push @new_rem, $line_rem;
 			push @new_add, $line_add;
 		}
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
@ 2012-02-10 13:23   ` Jakub Narebski
  2012-02-10 14:15     ` Michał Kiedrowicz
  2012-02-14  6:54     ` Michal Kiedrowicz
  2012-02-10 20:20   ` Jeff King
  2012-02-11 23:45   ` Jakub Narebski
  2 siblings, 2 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-10 13:23 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> The code that comares lines is based on
> contrib/diff-highlight/diff-highlight, except that it works with
> multiline changes too.  It also won't highlight lines that are
> completely different because that would only make the output unreadable.
> Combined diffs are not supported but a following commit will change it.

I was thinking that if gitweb were to support "diff refinement
highlighting", it would either use one of *Diff* packages from CPAN,
or "git diff --word-diff" output.
 
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 13:23   ` Jakub Narebski
@ 2012-02-10 14:15     ` Michał Kiedrowicz
  2012-02-10 14:55       ` Jakub Narebski
  2012-02-14  6:54     ` Michal Kiedrowicz
  1 sibling, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10 14:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > The code that comares lines is based on
> > contrib/diff-highlight/diff-highlight, except that it works with
> > multiline changes too.  It also won't highlight lines that are
> > completely different because that would only make the output unreadable.
> > Combined diffs are not supported but a following commit will change it.
> 
> I was thinking that if gitweb were to support "diff refinement
> highlighting", it would either use one of *Diff* packages from CPAN,
> or "git diff --word-diff" output.
>  

I think highlighting inline and side-by-side diff outputs is
something different from "git diff --word-diff". I find it useful for
people who are used to these diff formats (i.e. me :).

OTOH I'm not against using a dedicated package from CPAN. But I think
my approach is proven to work (I use contrib/diff-highlight as a
filter) and more lightweight (doesn't add another dependency to
gitweb). Moreover, adding support for some Diff package may be done
later, at any moment. It's just a matter of replacing one function
(format_rem_add_line()) with the one that uses Diff. 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 14:15     ` Michał Kiedrowicz
@ 2012-02-10 14:55       ` Jakub Narebski
  2012-02-10 17:33         ` Michał Kiedrowicz
  2012-02-10 20:24         ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
  0 siblings, 2 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-10 14:55 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

On Fri, 10 Feb 2012, Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > 
> > > The code that comares lines is based on
> > > contrib/diff-highlight/diff-highlight, except that it works with
> > > multiline changes too.  It also won't highlight lines that are
> > > completely different because that would only make the output unreadable.
> > > Combined diffs are not supported but a following commit will change it.
> > 
> > I was thinking that if gitweb were to support "diff refinement
> > highlighting", it would either use one of *Diff* packages from CPAN,
> > or "git diff --word-diff" output.
> 
> I think highlighting inline and side-by-side diff outputs is
> something different from "git diff --word-diff". I find it useful for
> people who are used to these diff formats (i.e. me :).

I was thinking about *using* "git diff --word-diff" for diff refinement
highlighting of inline (unified) and side-by-side diff... 

though having an option of showing word-diff would be I think a good
idea in some cases, like e.g. documentation changes.

> OTOH I'm not against using a dedicated package from CPAN. But I think
> my approach is proven to work (I use contrib/diff-highlight as a
> filter) and more lightweight (doesn't add another dependency to
> gitweb). Moreover, adding support for some Diff package may be done
> later, at any moment. It's just a matter of replacing one function
> (format_rem_add_line()) with the one that uses Diff. 

O.K., if it is tested code, then all is good.  Well, except the fact
that I'm rather wary about adding more code to gitweb when it is still
single monolithic script, rather than split into packages.

Anyway, I'll try to review those patches soon.  I like the refactoring
work (that is from what I had chance to examine).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 14:55       ` Jakub Narebski
@ 2012-02-10 17:33         ` Michał Kiedrowicz
  2012-02-10 22:52           ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
  2012-02-10 20:24         ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
  1 sibling, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10 17:33 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> On Fri, 10 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > > 
> > > > The code that comares lines is based on
> > > > contrib/diff-highlight/diff-highlight, except that it works with
> > > > multiline changes too.  It also won't highlight lines that are
> > > > completely different because that would only make the output unreadable.
> > > > Combined diffs are not supported but a following commit will change it.
> > > 
> > > I was thinking that if gitweb were to support "diff refinement
> > > highlighting", it would either use one of *Diff* packages from CPAN,
> > > or "git diff --word-diff" output.
> > 
> > I think highlighting inline and side-by-side diff outputs is
> > something different from "git diff --word-diff". I find it useful for
> > people who are used to these diff formats (i.e. me :).
> 
> I was thinking about *using* "git diff --word-diff" for diff refinement
> highlighting of inline (unified) and side-by-side diff... 
> 

Then I must have misunderstood you.  

> though having an option of showing word-diff would be I think a good
> idea in some cases, like e.g. documentation changes.
> 
> > OTOH I'm not against using a dedicated package from CPAN. But I think
> > my approach is proven to work (I use contrib/diff-highlight as a
> > filter) and more lightweight (doesn't add another dependency to
> > gitweb). Moreover, adding support for some Diff package may be done
> > later, at any moment. It's just a matter of replacing one function
> > (format_rem_add_line()) with the one that uses Diff. 
> 
> O.K., if it is tested code, then all is good.  

As I wrote, I haven't taken the code as-is (for example, original
code only works for oneline changes). But the general approach is the
same.

> Well, except the fact
> that I'm rather wary about adding more code to gitweb when it is still
> single monolithic script, rather than split into packages.
> 

Yeah, jumping between 2k'th and 5k'th line isn't a great fun. Do you
have any roadmap how to split gitweb?

> Anyway, I'll try to review those patches soon.  I like the refactoring
> work (that is from what I had chance to examine).

Thanks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
  2012-02-10 13:23   ` Jakub Narebski
@ 2012-02-10 20:20   ` Jeff King
  2012-02-10 21:29     ` Michał Kiedrowicz
  2012-02-10 21:56     ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
  2012-02-11 23:45   ` Jakub Narebski
  2 siblings, 2 replies; 67+ messages in thread
From: Jeff King @ 2012-02-10 20:20 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

On Fri, Feb 10, 2012 at 10:18:12AM +0100, Michał Kiedrowicz wrote:

> The code that comares lines is based on
> contrib/diff-highlight/diff-highlight, except that it works with
> multiline changes too.  It also won't highlight lines that are
> completely different because that would only make the output unreadable.
> Combined diffs are not supported but a following commit will change it.

Have you considered contributing back the enhancements to
contrib/diff-highlight? I took a look at handling multi-line changes
when I originally wrote it, but I was worried too much about failing to
match up lines properly, and ending up with too much noise in the diff.
Maybe your "don't highlight lines that are completely different" rule
helps that, though.

Do you have any examples handy? (I was hoping not to need to get a
running gitweb installation in order to see the output).

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 14:55       ` Jakub Narebski
  2012-02-10 17:33         ` Michał Kiedrowicz
@ 2012-02-10 20:24         ` Jeff King
  1 sibling, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-10 20:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Michał Kiedrowicz, git

On Fri, Feb 10, 2012 at 03:55:19PM +0100, Jakub Narebski wrote:

> > I think highlighting inline and side-by-side diff outputs is
> > something different from "git diff --word-diff". I find it useful for
> > people who are used to these diff formats (i.e. me :).
> 
> I was thinking about *using* "git diff --word-diff" for diff refinement
> highlighting of inline (unified) and side-by-side diff... 

Yeah, you might be able to get better results out of a real diff engine
(i.e., having "git diff" do the color by doing a real LCS-match) than
out of diff-highlight (which is really just a couple of simple
heuristics). OTOH, the heuristics sometimes make the result "less noisy"
because they don't find little bits of commonality, and instead
highlight only a single chunk per line.

So if somebody wanted to work on that, I'd be really curious to see if
they could get better results. It's not high enough priority for me
personally, as I find diff-highlight is "good enough".

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 20:20   ` Jeff King
@ 2012-02-10 21:29     ` Michał Kiedrowicz
  2012-02-10 21:32       ` Jeff King
  2012-02-10 21:56     ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
  1 sibling, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10 21:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:

> On Fri, Feb 10, 2012 at 10:18:12AM +0100, Michał Kiedrowicz wrote:
> 
> > The code that comares lines is based on
> > contrib/diff-highlight/diff-highlight, except that it works with
> > multiline changes too.  It also won't highlight lines that are
> > completely different because that would only make the output unreadable.
> > Combined diffs are not supported but a following commit will change it.
> 
> Have you considered contributing back the enhancements to
> contrib/diff-highlight? 

Yeah, I did. In fact, at work I have a hacked version of your
diff-highlight that supports multiline changes and I use it every day.
But I just couldn't make myself fix your long README and send a
patch :).

Maybe I'll cook something in my spare time.

> I took a look at handling multi-line changes
> when I originally wrote it, but I was worried too much about failing to
> match up lines properly, and ending up with too much noise in the diff.
> Maybe your "don't highlight lines that are completely different" rule
> helps that, though.

I must say that it works great for me. Most often it's very helping.
Like every heuristics it sometimes goes the wrong way, but it's so rare
that I don't find it disturbing.

> 
> Do you have any examples handy? (I was hoping not to need to get a
> running gitweb installation in order to see the output).
> 
> -Peff

Nope. Except for comparing diffs in various commits in gitweb-1.7.9 and
from my branch, I just created a dummy commit with different kinds of
changes to check if they are properly colorized. 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 21:29     ` Michał Kiedrowicz
@ 2012-02-10 21:32       ` Jeff King
  2012-02-10 21:36         ` Michał Kiedrowicz
  2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
  0 siblings, 2 replies; 67+ messages in thread
From: Jeff King @ 2012-02-10 21:32 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

On Fri, Feb 10, 2012 at 10:29:16PM +0100, Michał Kiedrowicz wrote:

> > Have you considered contributing back the enhancements to
> > contrib/diff-highlight? 
> 
> Yeah, I did. In fact, at work I have a hacked version of your
> diff-highlight that supports multiline changes and I use it every day.
> But I just couldn't make myself fix your long README and send a
> patch :).

Heh. Why don't you show your hack in the meantime. Even if the code is
ugly or the documentation is missing, I'd like to see the output, and
maybe we can fix those other things together.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 21:32       ` Jeff King
@ 2012-02-10 21:36         ` Michał Kiedrowicz
  2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
  1 sibling, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10 21:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:

> On Fri, Feb 10, 2012 at 10:29:16PM +0100, Michał Kiedrowicz wrote:
> 
> > > Have you considered contributing back the enhancements to
> > > contrib/diff-highlight? 
> > 
> > Yeah, I did. In fact, at work I have a hacked version of your
> > diff-highlight that supports multiline changes and I use it every day.
> > But I just couldn't make myself fix your long README and send a
> > patch :).
> 
> Heh. Why don't you show your hack in the meantime. Even if the code is
> ugly or the documentation is missing, I'd like to see the output, and
> maybe we can fix those other things together.
> 
> -Peff

Then just give must few minutes.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH] diff-highlight: Work for multiline changes too
  2012-02-10 21:32       ` Jeff King
  2012-02-10 21:36         ` Michał Kiedrowicz
@ 2012-02-10 21:47         ` Michał Kiedrowicz
  2012-02-13 22:27           ` Jeff King
  1 sibling, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-10 21:47 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Michał Kiedrowicz

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
---

After looking at outputs I noticed that it can also ignore lines with
prefixes/suffixes that consist only of punctuation (asterisk, semicolon, dot,
etc), because otherwise whole line is highlighted except for terminating
punctuation.

 contrib/diff-highlight/diff-highlight |   96 ++++++++++++++++++++++-----------
 1 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index d893898..4811550 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -1,28 +1,40 @@
 #!/usr/bin/perl
 
+use warnings;
+use strict;
+
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
 my $HIGHLIGHT   = "\x1b[7m";
 my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 
-my @window;
+my $context;
+my @removed = ();
+my @added = ();
+my $started = 0;
 
 while (<>) {
-	# We highlight only single-line changes, so we need
-	# a 4-line window to make a decision on whether
-	# to highlight.
-	push @window, $_;
-	next if @window < 4;
-	if ($window[0] =~ /^$COLOR*(\@| )/ &&
-	    $window[1] =~ /^$COLOR*-/ &&
-	    $window[2] =~ /^$COLOR*\+/ &&
-	    $window[3] !~ /^$COLOR*\+/) {
-		print shift @window;
-		show_pair(shift @window, shift @window);
-	}
-	else {
-		print shift @window;
+	if (/^$COLOR*-/) {
+		push @removed, $_;
+	} elsif (/^$COLOR*\+/) {
+		push @added, $_;
+	} else {
+		if ($started == 1 ) {
+			show_pairs(\@removed, \@added);
+		} else {
+			print @removed;
+			print @added;
+		}
+		@removed = ();
+		@added = ();
+		print $_;
+
+		if (/^$COLOR*(\@| )/) {
+			$started = 1;
+		} else {
+			$started = 0;
+		}
 	}
 
 	# Most of the time there is enough output to keep things streaming,
@@ -38,23 +50,33 @@ while (<>) {
 	}
 }
 
-# Special case a single-line hunk at the end of file.
-if (@window == 3 &&
-    $window[0] =~ /^$COLOR*(\@| )/ &&
-    $window[1] =~ /^$COLOR*-/ &&
-    $window[2] =~ /^$COLOR*\+/) {
-	print shift @window;
-	show_pair(shift @window, shift @window);
-}
-
-# And then flush any remaining lines.
-while (@window) {
-	print shift @window;
-}
+show_pairs(\@removed, \@added);
 
 exit 0;
 
-sub show_pair {
+sub show_pairs {
+	my $a = shift;
+	my $b = shift;
+
+	if (scalar(@{$a}) == scalar(@{$b}) && scalar(@${a}) > 0) {
+		my @removed;
+		my @added;
+
+		for(my $i = 0; $i < scalar(@{$a}); $i++) {
+			my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
+			push @removed, $rm;
+			push @added, $add;
+		}
+
+		print @removed;
+		print @added;
+	} else {
+		print @{$a};
+		print @{$b};
+	}
+}
+
+sub highlight_pair {
 	my @a = split_line(shift);
 	my @b = split_line(shift);
 
@@ -101,8 +123,20 @@ sub show_pair {
 		}
 	}
 
-	print highlight(\@a, $pa, $sa);
-	print highlight(\@b, $pb, $sb);
+	my $prefa = join('', @a[0..($pa-1)]);
+	my $prefb = join('', @b[0..($pb-1)]);
+	my $sufa = join('', @a[($sa+1)..$#a]);
+	my $sufb = join('', @b[($sb+1)..$#b]);
+
+	# Highlight only if prefix or suffix is interesting (i.e. not consisting
+	# of color and (for prefix) +/-). Otherwise we would highlight whole
+	# lines.
+	if ($prefa =~ /^($COLOR)*-(\s|$COLOR)*$/ && $sufa =~ /^(\s|$COLOR)*$/
+		&& $prefb =~ /^($COLOR)*\+(\s|$COLOR)*$/ && $sufb =~ /^(\s|$COLOR)*$/) {
+		return join('', @a), join('', @b);
+	} else {
+		return highlight(\@a, $pa, $sa), highlight(\@b, $pb, $sb);
+	}
 }
 
 sub split_line {
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 20:20   ` Jeff King
  2012-02-10 21:29     ` Michał Kiedrowicz
@ 2012-02-10 21:56     ` Jakub Narebski
  1 sibling, 0 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-10 21:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Michał Kiedrowicz, git

Jeff King <peff@peff.net> writes:

> (I was hoping not to need to get a running gitweb installation in
> order to see the output).

Well, there is always git-instaweb ;-)

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff)
  2012-02-10 17:33         ` Michał Kiedrowicz
@ 2012-02-10 22:52           ` Jakub Narebski
  0 siblings, 0 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-10 22:52 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:

> > Well, except the fact
> > that I'm rather wary about adding more code to gitweb when it is still
> > single monolithic script, rather than split into packages.
> > 
> 
> Yeah, jumping between 2k'th and 5k'th line isn't a great fun. Do you

There is an easy part, an almost easy part, and a hard part.

The easy part of splitting gitweb is creating infrastructure for it,
at least in the basic case.  The 'gitweb/split' branch in my git forks:

  http://repo.or.cz/w/git/jnareb-git.git
  https://github.com/jnareb/git

contains changes to gitweb and gitweb/Makefile, and splitting off
Gitweb::Util as an example; I'd have to update this branch to current
state of gitweb.

The almost easy part is to come up with a way to split gitweb.  Do we
follow SVN::Web (Subversion web interface in Perl), or maybe Gitalist
(git web interface in Perl, using Catalyst MVC framework)?  Do we use
MVC paradigm?  Or do we split on functionality: diffs, blobs, trees,
logs, etc.?


The hard part is about splitting main parts of gitweb.  It is easy to
put generic subroutines that are not specific to git or gitweb in
Gitweb::Util (or Gitweb::Util::* submodules).  It would be almost as
easy to put parsing of git command output in Gitweb::Parse (or 
Gitweb::Parse::* submodules).

The problem is with putting actual actions in separate submodules.
For that we would need to replace our hacky "longjmp"-based error handling
(nonlocal goto in Perl is roughly equivalent to longjmp() in C) to
exception-based one, as I don't think going back to exit-based error
ahndling is a good idea.  We would need exception-based error handling
if we want to implement HTTP output caching anyway, I think.

Not to not reimplement the wheel, badly, we will do better to use some
non-core Perl modules, namely Try::Tiny for capturing exceptions, and
HTTP::Exception (based on Exception::Class) for throwing exceptions.
So we would have to add a way to handle such non-core modules, perhaps
bundling them with gitweb, like Error is bundled with Git.pm and used
if it is not present on install.

Maybe Module::Install would help us there with bundling of such
dependencies for install; maybe "push @INC, __DIR__.'/inc' would be
a good idea.

It needs thinking about.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines()
  2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
@ 2012-02-11 15:20   ` Jakub Narebski
  2012-02-11 23:03     ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 15:20 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Currently, print_sidebyside_diff_chunk() does two things: it
> accumulates diff lines and prints them. Accumulation may be used to
> perform additional operations on diff lines, so it makes sense to split
> these two things. Thus, the code that prints diff lines in a side-by-side
> manner is moved out of print_sidebyside_diff_chunk() to a separate
> subroutine.
> 
> The outcome of this patch is that print_sidebyside_diff_chunk() is now
> much shorter and easier to read.
> 
> This change is meant to be a simple code movement. No behavior change is
> intended.
> 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>

As it is pure refactoring, and improves readibility of code quite a
bit, I am all for it, but...

You replace the following set of conditions

> -		## print from accumulator when type of class of lines change
> -		# empty contents block on start rem/add block, or end of chunk
> -		if (@ctx && (!$class || $class eq 'rem' || $class eq 'add')) {

and

> -		# empty add/rem block on start context block, or end of chunk
> -		if ((@rem || @add) && (!$class || $class eq 'ctx')) {
> -			if (!@add) {
[...]
> -			} elsif (!@rem) {
[...]
> -			} else {


with these (I have reindented the code to match)

> +		## print from accumulator when have some add/rem lines or end
> +		# of chunk (flush context lines)
> +		if (((@rem || @add) && $class eq 'ctx') || !$class) {

> +                     if (@$ctx) {
[...]
> +                     }
> +                     if (!@$add) {
[...]
> +                     } elsif (!@$rem) {
[...]
> +                     } else {

It is not obvious that the final result is the same.

BTW doesn't new code print context when printing added and removed
lines, and not as soon as class changes?  This doesn't change the
output, but it slightly changes behavior.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
  2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
@ 2012-02-11 15:53   ` Jakub Narebski
  2012-02-11 23:16     ` Michał Kiedrowicz
  2012-02-25  9:00     ` Michał Kiedrowicz
  0 siblings, 2 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 15:53 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> This renames print_sidebyside_diff_chunk() to print_diff_chunk() and
> makes use of it for both side-by-side and inline diffs. Now diff lines
> are always accumulated before they are printed. This opens the
> possibility to preprocess diff output before it's printed.
> 
> The new function print_inline_diff_lines() could reorder diff lines.

I think you meant here "if left as is", isn't it?

> It first prints all context lines, then all removed lines and
> finally all added lines. If the diff output consisted of mixed added
> and removed lines, gitweb would reorder these lines. This is
> especially true for combined diff output, for example:
> 
> 	 - removed line for first parent
> 	 + added line for first parent
> 	  -removed line for second parent
> 	 ++added line for both parents
> 
> would be rendered as:
> 
> 	- removed line for first parent
> 	 -removed line for second parent
> 	+ added line for first parent
> 	++added line for both parents

This is not "is especially true"; it can happen (though: can it
really?) in combined diff output; in ordinary diff you would always
have context or end/beginning of chunk between added and removed
lines.

> 
> To prevent gitweb from reordering lines, print_diff_chunk() calls
> print_diff_lines() as soon as it detects that both added and removed
> lines are present and there was a class change.

You really should mention, in my opinion, that this change is
preparation for diff refinement highlighting (higlighting the
differing segments).

Otherwise I don't see the reason for it: ordinary diff didn't need the
fancy stuff with accumulating then printing, and could be send
straightly to output.  This adds complexity for non-sidebyside diff,
unnecessary if not for diff refinement highlighting.
 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> ---
>  gitweb/gitweb.perl |   53 +++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 38 insertions(+), 15 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1247607..ed63261 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4908,9 +4908,31 @@ sub print_sidebyside_diff_lines {
>  	}
>  }
>  
> -sub print_sidebyside_diff_chunk {
> -	my @chunk = @_;
> +# Print context lines and then rem/add lines in inline manner.
> +sub print_inline_diff_lines {
> +	my ($ctx, $rem, $add) = @_;
> +
> +	print foreach (@$ctx);
> +	print foreach (@$rem);
> +	print foreach (@$add);
> +}
> +
> +# Print context lines and then rem/add lines.
> +sub print_diff_lines {
> +	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> +
> +	if ($diff_style eq 'sidebyside' && !$is_combined) {
> +		print_sidebyside_diff_lines($ctx, $rem, $add);
> +	} else {
> +		# default 'inline' style and unknown styles
> +		print_inline_diff_lines($ctx, $rem, $add);
> +	}
> +}

That looks nice.

BTW. I didn't examine the final code, but what happens for binary
diffs that git supports?  Is it handled outside print_diff_chunk()?

> +
> +sub print_diff_chunk {
> +	my ($diff_style, $is_combined, @chunk) = @_;
>  	my (@ctx, @rem, @add);
> +	my $prev_class = '';

Is $prev_class here class of previous chunk fragment or _previous class_
(i.e. always different from $class), or is it class of previous line?
It is not obvious here, from variable name.  Better name or at least
a comment would be appreciated.

>  
>  	return unless @chunk;
>  
> @@ -4935,9 +4957,13 @@ sub print_sidebyside_diff_chunk {
>  		}
>  
>  		## print from accumulator when have some add/rem lines or end
> -		# of chunk (flush context lines)
> -		if (((@rem || @add) && $class eq 'ctx') || !$class) {
> -			print_sidebyside_diff_lines(\@ctx, \@rem, \@add);
> +		# of chunk (flush context lines), or when have add and rem
> +		# lines and new block is reached (otherwise add/rem lines could
> +		# be reordered)
> +		if (((@rem || @add) && $class eq 'ctx') || !$class ||
> +				(@rem && @add && $class ne $prev_class)) {

We usually use tabs to align, but spaces to indent in gitweb, so it
would be

> +		if (((@rem || @add) && $class eq 'ctx') || !$class ||
> +		    (@rem && @add && $class ne $prev_class)) {

Anyway, this conditional gets ridicously complicated.  I wonder if we
could simplify it.

If we allow printing context together widh added/removed lines, and
not as soon as possible, perhaps good conditional would be: class
changed, and accumulator for current class is full:

> +		if ($class ne $prev_class && @{ $acc{$class} }) {

Or something like that, where

        my %acc = (ctx => \@ctx, add => \@add, rem => \@rem);

If we want to print context as soon as possible, like it was before
1st patch in this series, we could uuse the following conditional

> +		if ($class ne $prev_class && 
> +                 ($prev_class eq 'ctx' || @{ $acc{$class} })) {

Perhaps with "scalar @{ $acc{$class} }" if it would make things more
clear.

Though probably to avoid "Use of uninitialized value in string ne"
we would probably have to use empty string instead of undefined
value for "no class" case, i.e. beginning or end of chunk.

> +			print_diff_lines(\@ctx, \@rem, \@add, $diff_style,
> +					$is_combined);
>  			@ctx = @rem = @add = ();
>  		}
>  
> @@ -4954,6 +4980,8 @@ sub print_sidebyside_diff_chunk {
>  		if ($class eq 'ctx') {
>  			push @ctx, $line;
>  		}
> +
> +		$prev_class = $class;
>  	}
>  }
[...]  

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line()
  2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
@ 2012-02-11 16:02   ` Jakub Narebski
  0 siblings, 0 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 16:02 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Commit 6ba1eb51b (gitweb: Add a feature to show side-by-side diff,
> 2011-10-31) for no special reason moved wrapping diff line in <div> out
> of format_diff_line(). Bring back old behavior.
> 
> This simplifies code in git_patchset_body() and keeps formatting of a
> diff line in one place.
> 
> The more long-term purpose of this patch is to move formatting diff
> lines down to print_diff_chunk(), to allow processing lines without
> HTML-formatting.
> 
> This is just a refactoring patch. It's not meant to change gitweb
> output.

Gah, I don't remember why we ended with change in behavior when adding
side-by-side diff support to gitweb.
 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>

  Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
>  gitweb/gitweb.perl |   15 ++++++++-------
>  1 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index ed63261..d2f75c4 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2332,14 +2332,18 @@ sub process_diff_line {
>  
>  	if ($from && $to && $line =~ m/^\@{2} /) {
>  		$line = format_unidiff_chunk_header($line, $from, $to);
> -		return $diff_class, $line;
>  
>  	} elsif ($from && $to && $line =~ m/^\@{3}/) {
>  		$line = format_cc_diff_chunk_header($line, $from, $to);
> -		return $diff_class, $line;
> -
> +	} else {
> +		$line = esc_html($line, -nbsp=>1);
>  	}
> -	return $diff_class, esc_html($line, -nbsp=>1);
> +
> +	my $diff_classes = "diff";
> +	$diff_classes .= " $diff_class" if ($diff_class);
> +	$line = "<div class=\"$diff_classes\">$line</div>\n";
> +
> +	return $diff_class, $line;
>  }
>  
>  # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
> @@ -5104,9 +5108,6 @@ sub git_patchset_body {
>  			next PATCH if ($patch_line =~ m/^diff /);
>  
>  			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
> -			my $diff_classes = "diff";
> -			$diff_classes .= " $class" if ($class);
> -			$line = "<div class=\"$diff_classes\">$line</div>\n";
>  
>  			if ($class eq 'chunk_header') {
>  				print_diff_chunk($diff_style, $is_combined, @chunk);
> -- 
> 1.7.3.4
> 

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk()
  2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
@ 2012-02-11 16:29   ` Jakub Narebski
  2012-02-11 23:20     ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 16:29 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Now git_patchset_body() only calls diff_line_class() (removed from
> process_diff_line()). The latter function is renamed to
> format_diff_line() and is called from print_diff_chunk().
> 
> This is a pure code movement, needed for processing raw diff lines in
> the accumulator in print_diff_chunk(). No behavior change is intended by
> this change.

Well, this is not "pure code movement" per se; it is meant to be
refactoring that doesn't change gitweb output nor behavior.

If I understand correctly the change is from

  read
  format
  accumulate
  print
 
to

  read
  accumulate
  format
  print

Isn't it?

As a note I would add also that process_diff_line got renamed to
format_diff_line, and its output changed to returning only
HTML-formatted line, which bringg it in line with other format_*
subroutines.

> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>

I think it is a good change even without subsequent patches.

  Acked-by: Jakub Narębski <jnareb@gmail.com>

> ---
>  gitweb/gitweb.perl |   25 ++++++++++++-------------
>  1 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index d2f75c4..cae9dfa 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2320,12 +2320,9 @@ sub format_cc_diff_chunk_header {
>  }
>  
>  # process patch (diff) line (not to be used for diff headers),
> -# returning class and HTML-formatted (but not wrapped) line
> -sub process_diff_line {
> -	my $line = shift;
> -	my ($from, $to) = @_;
> -
> -	my $diff_class = diff_line_class($line, $from, $to);
> +# returning HTML-formatted (but not wrapped) line
> +sub format_diff_line {
> +	my ($line, $diff_class, $from, $to) = @_;
>  
>  	chomp $line;
>  	$line = untabify($line);
> @@ -2343,7 +2340,7 @@ sub process_diff_line {
>  	$diff_classes .= " $diff_class" if ($diff_class);
>  	$line = "<div class=\"$diff_classes\">$line</div>\n";
>  
> -	return $diff_class, $line;
> +	return $line;
>  }
>  
>  # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
> @@ -4934,7 +4931,7 @@ sub print_diff_lines {
>  }
>  
>  sub print_diff_chunk {
> -	my ($diff_style, $is_combined, @chunk) = @_;
> +	my ($diff_style, $is_combined, $from, $to, @chunk) = @_;
>  	my (@ctx, @rem, @add);
>  	my $prev_class = '';
>  
> @@ -4954,6 +4951,8 @@ sub print_diff_chunk {
>  	foreach my $line_info (@chunk) {
>  		my ($class, $line) = @$line_info;
>  
> +		$line = format_diff_line($line, $class, $from, $to);
> +
>  		# print chunk headers
>  		if ($class && $class eq 'chunk_header') {
>  			print $line;
> @@ -5107,19 +5106,19 @@ sub git_patchset_body {
>  
>  			next PATCH if ($patch_line =~ m/^diff /);
>  
> -			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
> +			my $class = diff_line_class($patch_line, \%from, \%to);
>  
>  			if ($class eq 'chunk_header') {
> -				print_diff_chunk($diff_style, $is_combined, @chunk);
> -				@chunk = ( [ $class, $line ] );
> +				print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
> +				@chunk = ( [ $class, $patch_line ] );
>  			} else {
> -				push @chunk, [ $class, $line ];
> +				push @chunk, [ $class, $patch_line ];
>  			}
>  		}
>  
>  	} continue {
>  		if (@chunk) {
> -			print_diff_chunk($diff_style, $is_combined, @chunk);
> +			print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
>  			@chunk = ();
>  		}
>  		print "</div>\n"; # class="patch"
> -- 
> 1.7.3.4
> 

Nice!

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/8] gitweb: Format diff lines just before printing
  2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
@ 2012-02-11 17:14   ` Jakub Narebski
  2012-02-11 23:38     ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 17:14 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Now we're ready to insert highlightning to diff output.
> 
> The call to untabify() remains in the main loop in print_diff_chunk().
> The motivation is that it must be called before any call to esc_html()
> (because that converts spaces to &nbsp;) and to call it only once.
> 
> This is a refactoring patch.  It's not meant to change gitweb output.

I'm not sure about this patch.

First, in my opinion it doesn't make much sense standalone, and not
squashed with subsequent patch.

Second, it makes format_diff_line an odd duck among all other format_*
subroutines in that it post-processes HTML output, rather than
generating HTML from data.
 
Why "diff refinement highlighting" cannot be part of format_diff_line()?
If it does need additional data, just pass it as additional parameters
to this subroutine.

Another solution could be to borrow idea from stalled and inactive
committags feature, namely that parts that are HTML are to be passed
as scalar reference (\$str), while plain text (unescaped yet) is to be
passed as string ($str).

> -# process patch (diff) line (not to be used for diff headers),
> -# returning HTML-formatted (but not wrapped) line
> +# wrap patch (diff) line into a <div> (not to be used for diff headers),
> +# the line must be esc_html()'ed
>  sub format_diff_line {

I just don't like this error-prone "the line must be esc_html()'ed".

> +# HTML-format diff context, removed and added lines.
> +sub format_ctx_rem_add_lines {
> +	my ($ctx, $rem, $add) = @_;
> +	my (@new_ctx, @new_rem, @new_add);
> +
> +	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
> +	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> +	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
> +
> +	return (\@new_ctx, \@new_rem, \@new_add);
> +}
> +
>  # Print context lines and then rem/add lines.
>  sub print_diff_lines {
>  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
>  
> +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
> +

Nice trick.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/8] gitweb: Highlight interesting parts of diff
  2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
                   ` (7 preceding siblings ...)
  2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
@ 2012-02-11 18:32 ` Jakub Narebski
  2012-02-11 22:56   ` Michał Kiedrowicz
  8 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 18:32 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:


>   gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
>   gitweb: Move HTML-formatting diff line back to process_diff_line()
>   gitweb: Push formatting diff lines to print_diff_chunk()
>   gitweb: Format diff lines just before printing

Those patshes are expected to not change gitweb output.  Did you check
that gitweb after those changes handles incomplete line marker
correctly?  print_sidebyside_diff_chunk() ealt with 'incomplete' class
in slightly hacky way, assuming side-by-side output.

Namely, does gitweb after those patches print incomplete line marker
correctly, i.e. only once for "inline" output while it prints it for
both sides in "side-by-side" output, and using 'incomplete' class?

You have to treat correctly all the following situations:

* removing end of line at end of file (eol-at-eof):

  diff --git a/a b/b
  index 257cc56..1910281 100644
  --- a/a
  +++ b/b
  @@ -1 +1 @@
  -foo
  +foo
  \ No newline at end of file

* adding eol-at-eof:

  diff --git b/b a/a
  index 1910281..257cc56 100644
  --- b/b
  +++ a/a
  @@ -1 +1 @@
  -foo
  \ No newline at end of file
  +foo

* removing from end of file, preserving lack of eol-at-eof:

  diff --git b/c a/a
  index a5d4a2f..257cc56 100644
  --- b/c
  +++ a/a
  @@ -1,2 +1 @@
   foo
  -foo
  \ No newline at end of file

* change at the end of file, preserving lack of eol-at-eof:

  diff --git a/b b/d
  index 1910281..ba0e162 100644
  --- a/b
  +++ b/d
  @@ -1 +1 @@
  -foo
  \ No newline at end of file
  +bar
  \ No newline at end of file

* change near the end of file, with incomplete line marker in context

  diff --git b/c a/b
  index a5d4a2f..1910281 100644
  --- b/c
  +++ a/b
  @@ -1,2 +1 @@
  -foo
   foo
  \ No newline at end of file


-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/8] gitweb: Highlight interesting parts of diff
  2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
@ 2012-02-11 22:56   ` Michał Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 22:56 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> 
> >   gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
> >   gitweb: Move HTML-formatting diff line back to process_diff_line()
> >   gitweb: Push formatting diff lines to print_diff_chunk()
> >   gitweb: Format diff lines just before printing
> 
> Those patshes are expected to not change gitweb output.  

Yes.

> Did you check
> that gitweb after those changes handles incomplete line marker
> correctly?  print_sidebyside_diff_chunk() ealt with 'incomplete' class
> in slightly hacky way, assuming side-by-side output.
> 

Yeah, I saw the 'for' loop at the beginning of
print_sidebyside_diff_chunk() but I didn't look at it very closely.
I'll make sure 'incomplete' lines are properly printed.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines()
  2012-02-11 15:20   ` Jakub Narebski
@ 2012-02-11 23:03     ` Michał Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 23:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > Currently, print_sidebyside_diff_chunk() does two things: it
> > accumulates diff lines and prints them. Accumulation may be used to
> > perform additional operations on diff lines, so it makes sense to split
> > these two things. Thus, the code that prints diff lines in a side-by-side
> > manner is moved out of print_sidebyside_diff_chunk() to a separate
> > subroutine.
> > 
> > The outcome of this patch is that print_sidebyside_diff_chunk() is now
> > much shorter and easier to read.
> > 
> > This change is meant to be a simple code movement. No behavior change is
> > intended.
> > 
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> 
> As it is pure refactoring, and improves readibility of code quite a
> bit, I am all for it, but...
> 
> You replace the following set of conditions
> 
> > -		## print from accumulator when type of class of lines change
> > -		# empty contents block on start rem/add block, or end of chunk
> > -		if (@ctx && (!$class || $class eq 'rem' || $class eq 'add')) {
> 
> and
> 
> > -		# empty add/rem block on start context block, or end of chunk
> > -		if ((@rem || @add) && (!$class || $class eq 'ctx')) {
> > -			if (!@add) {
> [...]
> > -			} elsif (!@rem) {
> [...]
> > -			} else {
> 
> 
> with these (I have reindented the code to match)
> 
> > +		## print from accumulator when have some add/rem lines or end
> > +		# of chunk (flush context lines)
> > +		if (((@rem || @add) && $class eq 'ctx') || !$class) {
> 
> > +                     if (@$ctx) {
> [...]
> > +                     }
> > +                     if (!@$add) {
> [...]
> > +                     } elsif (!@$rem) {
> [...]
> > +                     } else {
> 
> It is not obvious that the final result is the same.

You are right. I should have described it in the commit message.

> 
> BTW doesn't new code print context when printing added and removed
> lines, and not as soon as class changes?  This doesn't change the
> output, but it slightly changes behavior.
> 

This is also true. I'll describe it in the commit message. I could also
create two functions: one that prints @ctx and second one that prints
@rem and @add but then I would have to create two other functions for
inline diff.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
  2012-02-11 15:53   ` Jakub Narebski
@ 2012-02-11 23:16     ` Michał Kiedrowicz
  2012-02-25  9:00     ` Michał Kiedrowicz
  1 sibling, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 23:16 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > This renames print_sidebyside_diff_chunk() to print_diff_chunk() and
> > makes use of it for both side-by-side and inline diffs. Now diff lines
> > are always accumulated before they are printed. This opens the
> > possibility to preprocess diff output before it's printed.
> > 
> > The new function print_inline_diff_lines() could reorder diff lines.
> 
> I think you meant here "if left as is", isn't it?

Yes.

> > It first prints all context lines, then all removed lines and
> > finally all added lines. If the diff output consisted of mixed added
> > and removed lines, gitweb would reorder these lines. This is
> > especially true for combined diff output, for example:
> > 
> > 	 - removed line for first parent
> > 	 + added line for first parent
> > 	  -removed line for second parent
> > 	 ++added line for both parents
> > 
> > would be rendered as:
> > 
> > 	- removed line for first parent
> > 	 -removed line for second parent
> > 	+ added line for first parent
> > 	++added line for both parents
> 
> This is not "is especially true"; it can happen (though: can it
> really?) 

Yeah. See
http://git.kernel.org/?p=git/git.git;a=commitdiff;h=5de89d3abfca98b0dfd0280d28576940c913d60d 


> in combined diff output; in ordinary diff you would always
> have context or end/beginning of chunk between added and removed
> lines.

OK, I'll reword this part of the commit message.
> 
> > 
> > To prevent gitweb from reordering lines, print_diff_chunk() calls
> > print_diff_lines() as soon as it detects that both added and removed
> > lines are present and there was a class change.
> 
> You really should mention, in my opinion, that this change is
> preparation for diff refinement highlighting (higlighting the
> differing segments).
> 
> Otherwise I don't see the reason for it: ordinary diff didn't need the
> fancy stuff with accumulating then printing, and could be send
> straightly to output.  This adds complexity for non-sidebyside diff,
> unnecessary if not for diff refinement highlighting.

I completely agree with that. This patch doesn't makes sense without
later patches.

>  
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > ---
> >  gitweb/gitweb.perl |   53 +++++++++++++++++++++++++++++++++++++--------------
> >  1 files changed, 38 insertions(+), 15 deletions(-)
> > 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 1247607..ed63261 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -4908,9 +4908,31 @@ sub print_sidebyside_diff_lines {
> >  	}
> >  }
> >  
> > -sub print_sidebyside_diff_chunk {
> > -	my @chunk = @_;
> > +# Print context lines and then rem/add lines in inline manner.
> > +sub print_inline_diff_lines {
> > +	my ($ctx, $rem, $add) = @_;
> > +
> > +	print foreach (@$ctx);
> > +	print foreach (@$rem);
> > +	print foreach (@$add);
> > +}
> > +
> > +# Print context lines and then rem/add lines.
> > +sub print_diff_lines {
> > +	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> > +
> > +	if ($diff_style eq 'sidebyside' && !$is_combined) {
> > +		print_sidebyside_diff_lines($ctx, $rem, $add);
> > +	} else {
> > +		# default 'inline' style and unknown styles
> > +		print_inline_diff_lines($ctx, $rem, $add);
> > +	}
> > +}
> 
> That looks nice.
> 
> BTW. I didn't examine the final code, but what happens for binary
> diffs that git supports?  Is it handled outside print_diff_chunk()?

Good question. Will check.

> 
> > +
> > +sub print_diff_chunk {
> > +	my ($diff_style, $is_combined, @chunk) = @_;
> >  	my (@ctx, @rem, @add);
> > +	my $prev_class = '';
> 
> Is $prev_class here class of previous chunk fragment or _previous class_
> (i.e. always different from $class), or is it class of previous line?
> It is not obvious here, from variable name.  Better name or at least
> a comment would be appreciated.
> 

It's a class of previous line.

> >  
> >  	return unless @chunk;
> >  
> > @@ -4935,9 +4957,13 @@ sub print_sidebyside_diff_chunk {
> >  		}
> >  
> >  		## print from accumulator when have some add/rem lines or end
> > -		# of chunk (flush context lines)
> > -		if (((@rem || @add) && $class eq 'ctx') || !$class) {
> > -			print_sidebyside_diff_lines(\@ctx, \@rem, \@add);
> > +		# of chunk (flush context lines), or when have add and rem
> > +		# lines and new block is reached (otherwise add/rem lines could
> > +		# be reordered)
> > +		if (((@rem || @add) && $class eq 'ctx') || !$class ||
> > +				(@rem && @add && $class ne $prev_class)) {
> 
> We usually use tabs to align, but spaces to indent in gitweb, so it
> would be
> 
> > +		if (((@rem || @add) && $class eq 'ctx') || !$class ||
> > +		    (@rem && @add && $class ne $prev_class)) {
> 

OK.

> Anyway, this conditional gets ridicously complicated.  I wonder if we
> could simplify it.
> 
> If we allow printing context together widh added/removed lines, and
> not as soon as possible, perhaps good conditional would be: class
> changed, and accumulator for current class is full:
> 
> > +		if ($class ne $prev_class && @{ $acc{$class} }) {
> 
> Or something like that, where
> 
>         my %acc = (ctx => \@ctx, add => \@add, rem => \@rem);
> 
> If we want to print context as soon as possible, like it was before
> 1st patch in this series, we could uuse the following conditional
> 
> > +		if ($class ne $prev_class && 
> > +                 ($prev_class eq 'ctx' || @{ $acc{$class} })) {
> 
> Perhaps with "scalar @{ $acc{$class} }" if it would make things more
> clear.
> 
> Though probably to avoid "Use of uninitialized value in string ne"
> we would probably have to use empty string instead of undefined
> value for "no class" case, i.e. beginning or end of chunk.
>

OK, I'll try to work out something.
 
> > +			print_diff_lines(\@ctx, \@rem, \@add, $diff_style,
> > +					$is_combined);
> >  			@ctx = @rem = @add = ();
> >  		}
> >  
> > @@ -4954,6 +4980,8 @@ sub print_sidebyside_diff_chunk {
> >  		if ($class eq 'ctx') {
> >  			push @ctx, $line;
> >  		}
> > +
> > +		$prev_class = $class;
> >  	}
> >  }
> [...]  
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk()
  2012-02-11 16:29   ` Jakub Narebski
@ 2012-02-11 23:20     ` Michał Kiedrowicz
  2012-02-11 23:30       ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 23:20 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > Now git_patchset_body() only calls diff_line_class() (removed from
> > process_diff_line()). The latter function is renamed to
> > format_diff_line() and is called from print_diff_chunk().
> > 
> > This is a pure code movement, needed for processing raw diff lines in
> > the accumulator in print_diff_chunk(). No behavior change is intended by
> > this change.
> 
> Well, this is not "pure code movement" per se; it is meant to be
> refactoring that doesn't change gitweb output nor behavior.
> 
> If I understand correctly the change is from
> 
>   read
>   format
>   accumulate
>   print
>  
> to
> 
>   read
>   accumulate
>   format
>   print
> 
> Isn't it?

Yeah, this is what I meant :).

> 
> As a note I would add also that process_diff_line got renamed to
> format_diff_line, 

I thought I wrote that ("The latter function is renamed...")?

> and its output changed to returning only
> HTML-formatted line, which bringg it in line with other format_*
> subroutines.
> 

OK.

> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> 
> I think it is a good change even without subsequent patches.
> 
>   Acked-by: Jakub Narębski <jnareb@gmail.com>
> 

Thanks.

> > ---
> >  gitweb/gitweb.perl |   25 ++++++++++++-------------
> >  1 files changed, 12 insertions(+), 13 deletions(-)
> > 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index d2f75c4..cae9dfa 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -2320,12 +2320,9 @@ sub format_cc_diff_chunk_header {
> >  }
> >  
> >  # process patch (diff) line (not to be used for diff headers),
> > -# returning class and HTML-formatted (but not wrapped) line
> > -sub process_diff_line {
> > -	my $line = shift;
> > -	my ($from, $to) = @_;
> > -
> > -	my $diff_class = diff_line_class($line, $from, $to);
> > +# returning HTML-formatted (but not wrapped) line
> > +sub format_diff_line {
> > +	my ($line, $diff_class, $from, $to) = @_;
> >  
> >  	chomp $line;
> >  	$line = untabify($line);
> > @@ -2343,7 +2340,7 @@ sub process_diff_line {
> >  	$diff_classes .= " $diff_class" if ($diff_class);
> >  	$line = "<div class=\"$diff_classes\">$line</div>\n";
> >  
> > -	return $diff_class, $line;
> > +	return $line;
> >  }
> >  
> >  # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
> > @@ -4934,7 +4931,7 @@ sub print_diff_lines {
> >  }
> >  
> >  sub print_diff_chunk {
> > -	my ($diff_style, $is_combined, @chunk) = @_;
> > +	my ($diff_style, $is_combined, $from, $to, @chunk) = @_;
> >  	my (@ctx, @rem, @add);
> >  	my $prev_class = '';
> >  
> > @@ -4954,6 +4951,8 @@ sub print_diff_chunk {
> >  	foreach my $line_info (@chunk) {
> >  		my ($class, $line) = @$line_info;
> >  
> > +		$line = format_diff_line($line, $class, $from, $to);
> > +
> >  		# print chunk headers
> >  		if ($class && $class eq 'chunk_header') {
> >  			print $line;
> > @@ -5107,19 +5106,19 @@ sub git_patchset_body {
> >  
> >  			next PATCH if ($patch_line =~ m/^diff /);
> >  
> > -			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
> > +			my $class = diff_line_class($patch_line, \%from, \%to);
> >  
> >  			if ($class eq 'chunk_header') {
> > -				print_diff_chunk($diff_style, $is_combined, @chunk);
> > -				@chunk = ( [ $class, $line ] );
> > +				print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
> > +				@chunk = ( [ $class, $patch_line ] );
> >  			} else {
> > -				push @chunk, [ $class, $line ];
> > +				push @chunk, [ $class, $patch_line ];
> >  			}
> >  		}
> >  
> >  	} continue {
> >  		if (@chunk) {
> > -			print_diff_chunk($diff_style, $is_combined, @chunk);
> > +			print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
> >  			@chunk = ();
> >  		}
> >  		print "</div>\n"; # class="patch"
> > -- 
> > 1.7.3.4
> > 
> 
> Nice!
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk()
  2012-02-11 23:20     ` Michał Kiedrowicz
@ 2012-02-11 23:30       ` Michał Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 23:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> wrote:

> Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > 
> > > Now git_patchset_body() only calls diff_line_class() (removed from
> > > process_diff_line()). The latter function is renamed to
> > > format_diff_line() and is called from print_diff_chunk().
> > > 
> > > This is a pure code movement, needed for processing raw diff lines in
> > > the accumulator in print_diff_chunk(). No behavior change is intended by
> > > this change.
> > 
> > Well, this is not "pure code movement" per se; it is meant to be
> > refactoring that doesn't change gitweb output nor behavior.
> > 
> > If I understand correctly the change is from
> > 
> >   read
> >   format
> >   accumulate
> >   print
> >  
> > to
> > 
> >   read
> >   accumulate
> >   format
> >   print
> > 
> > Isn't it?
> 
> Yeah, this is what I meant :).

Minor correction: The order *does* change, in the sense that first a
chunk is read, then (in print_diff_chunk) formatted and printed,
but in print_diff_chunk() the order remains:

	read line
	format
	put into accumulator
	print
> 
> > 
> > As a note I would add also that process_diff_line got renamed to
> > format_diff_line, 
> 
> I thought I wrote that ("The latter function is renamed...")?
> 
> > and its output changed to returning only
> > HTML-formatted line, which bringg it in line with other format_*
> > subroutines.
> > 
> 
> OK.
> 
> > > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > 
> > I think it is a good change even without subsequent patches.
> > 
> >   Acked-by: Jakub Narębski <jnareb@gmail.com>
> > 
> 
> Thanks.
> 
> > > ---
> > >  gitweb/gitweb.perl |   25 ++++++++++++-------------
> > >  1 files changed, 12 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > > index d2f75c4..cae9dfa 100755
> > > --- a/gitweb/gitweb.perl
> > > +++ b/gitweb/gitweb.perl
> > > @@ -2320,12 +2320,9 @@ sub format_cc_diff_chunk_header {
> > >  }
> > >  
> > >  # process patch (diff) line (not to be used for diff headers),
> > > -# returning class and HTML-formatted (but not wrapped) line
> > > -sub process_diff_line {
> > > -	my $line = shift;
> > > -	my ($from, $to) = @_;
> > > -
> > > -	my $diff_class = diff_line_class($line, $from, $to);
> > > +# returning HTML-formatted (but not wrapped) line
> > > +sub format_diff_line {
> > > +	my ($line, $diff_class, $from, $to) = @_;
> > >  
> > >  	chomp $line;
> > >  	$line = untabify($line);
> > > @@ -2343,7 +2340,7 @@ sub process_diff_line {
> > >  	$diff_classes .= " $diff_class" if ($diff_class);
> > >  	$line = "<div class=\"$diff_classes\">$line</div>\n";
> > >  
> > > -	return $diff_class, $line;
> > > +	return $line;
> > >  }
> > >  
> > >  # Generates undef or something like "_snapshot_" or "snapshot (_tbz2_ _zip_)",
> > > @@ -4934,7 +4931,7 @@ sub print_diff_lines {
> > >  }
> > >  
> > >  sub print_diff_chunk {
> > > -	my ($diff_style, $is_combined, @chunk) = @_;
> > > +	my ($diff_style, $is_combined, $from, $to, @chunk) = @_;
> > >  	my (@ctx, @rem, @add);
> > >  	my $prev_class = '';
> > >  
> > > @@ -4954,6 +4951,8 @@ sub print_diff_chunk {
> > >  	foreach my $line_info (@chunk) {
> > >  		my ($class, $line) = @$line_info;
> > >  
> > > +		$line = format_diff_line($line, $class, $from, $to);
> > > +
> > >  		# print chunk headers
> > >  		if ($class && $class eq 'chunk_header') {
> > >  			print $line;
> > > @@ -5107,19 +5106,19 @@ sub git_patchset_body {
> > >  
> > >  			next PATCH if ($patch_line =~ m/^diff /);
> > >  
> > > -			my ($class, $line) = process_diff_line($patch_line, \%from, \%to);
> > > +			my $class = diff_line_class($patch_line, \%from, \%to);
> > >  
> > >  			if ($class eq 'chunk_header') {
> > > -				print_diff_chunk($diff_style, $is_combined, @chunk);
> > > -				@chunk = ( [ $class, $line ] );
> > > +				print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
> > > +				@chunk = ( [ $class, $patch_line ] );
> > >  			} else {
> > > -				push @chunk, [ $class, $line ];
> > > +				push @chunk, [ $class, $patch_line ];
> > >  			}
> > >  		}
> > >  
> > >  	} continue {
> > >  		if (@chunk) {
> > > -			print_diff_chunk($diff_style, $is_combined, @chunk);
> > > +			print_diff_chunk($diff_style, $is_combined, \%from, \%to, @chunk);
> > >  			@chunk = ();
> > >  		}
> > >  		print "</div>\n"; # class="patch"
> > > -- 
> > > 1.7.3.4
> > > 
> > 
> > Nice!
> > 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/8] gitweb: Format diff lines just before printing
  2012-02-11 17:14   ` Jakub Narebski
@ 2012-02-11 23:38     ` Michał Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-11 23:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > Now we're ready to insert highlightning to diff output.
> > 
> > The call to untabify() remains in the main loop in print_diff_chunk().
> > The motivation is that it must be called before any call to esc_html()
> > (because that converts spaces to &nbsp;) and to call it only once.
> > 
> > This is a refactoring patch.  It's not meant to change gitweb output.
> 
> I'm not sure about this patch.
> 
> First, in my opinion it doesn't make much sense standalone, and not
> squashed with subsequent patch.

True. I wanted to separate "preparation" from "adding new feature" but
maybe went few steps too far.

> 
> Second, it makes format_diff_line an odd duck among all other format_*
> subroutines in that it post-processes HTML output, rather than
> generating HTML from data.
>  
> Why "diff refinement highlighting" cannot be part of format_diff_line()?
> If it does need additional data, just pass it as additional parameters
> to this subroutine.
> 
> Another solution could be to borrow idea from stalled and inactive
> committags feature, namely that parts that are HTML are to be passed
> as scalar reference (\$str), while plain text (unescaped yet) is to be
> passed as string ($str).

I'll look into it.

> 
> > -# process patch (diff) line (not to be used for diff headers),
> > -# returning HTML-formatted (but not wrapped) line
> > +# wrap patch (diff) line into a <div> (not to be used for diff headers),
> > +# the line must be esc_html()'ed
> >  sub format_diff_line {
> 
> I just don't like this error-prone "the line must be esc_html()'ed".
> 
> > +# HTML-format diff context, removed and added lines.
> > +sub format_ctx_rem_add_lines {
> > +	my ($ctx, $rem, $add) = @_;
> > +	my (@new_ctx, @new_rem, @new_add);
> > +
> > +	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
> > +	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> > +	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
> > +
> > +	return (\@new_ctx, \@new_rem, \@new_add);
> > +}
> > +
> >  # Print context lines and then rem/add lines.
> >  sub print_diff_lines {
> >  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> >  
> > +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
> > +
> 
> Nice trick.
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
  2012-02-10 13:23   ` Jakub Narebski
  2012-02-10 20:20   ` Jeff King
@ 2012-02-11 23:45   ` Jakub Narebski
  2012-02-12 10:42     ` Jakub Narebski
  2012-02-13  6:41     ` Michal Kiedrowicz
  2 siblings, 2 replies; 67+ messages in thread
From: Jakub Narebski @ 2012-02-11 23:45 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> Reading diff output is sometimes very hard, even if it's colored,
> especially if lines differ only in few characters.  This is often true
> when a commit fixes a typo or renames some variables or functions.
>
This is certainly nice feature to have.  I think most tools that
implement side-by-side diff implement this too.

> This commit teaches gitweb to highlight characters that are different
> between old and new line.  This should work in the similar manner as in
> Trac or GitHub.
>
Doe you have URLs with good examples of such diff refinement
highlighting (GNU Emacs ediff/emerge terminology) / highlighting
differing segments (diff-highlight terminology)?
 
> The code that compares lines is based on
> contrib/diff-highlight/diff-highlight, except that it works with
> multiline changes too.  It also won't highlight lines that are
> completely different because that would only make the output unreadable.
>
So what does it look like?  From the contrib/diff-highlight/README I
guess that it finds common prefix and common suffix, and highlights
the rest, which includes change.  It doesn't implement LCS / diff
algorithm like e.g. tkdiff does for its diff refinement highlighting,
isn't it?

By completly different you mean that they do not have common prefix or
common suffix (at least one of them), isn't it?

> Combined diffs are not supported but a following commit will change it.
> 
O.K.

> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> ---
>  gitweb/gitweb.perl |   82 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 77 insertions(+), 5 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index db61553..1a5b454 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header {
>  # wrap patch (diff) line into a <div> (not to be used for diff headers),
>  # the line must be esc_html()'ed
>  sub format_diff_line {
> -	my ($line, $diff_class, $from, $to) = @_;
> +	my ($line, $diff_class) = @_;

Why that change?

>  
>  	my $diff_classes = "diff";
>  	$diff_classes .= " $diff_class" if ($diff_class);
> @@ -4923,14 +4923,85 @@ sub print_inline_diff_lines {
>  	print foreach (@$add);
>  }
>  
> +# Highlight characters from $prefix to $suffix and escape HTML.
> +# $str is a reference to the array of characters.
> +sub esc_html_mark_range {
> +	my ($str, $prefix, $suffix) = @_;
> +
> +	# Don't generate empty <span> element.
> +	if ($prefix == $suffix + 1) {
> +		return esc_html(join('', @$str), -nbsp=>1);
> +	}
> +
> +	my $before = join('', @{$str}[0..($prefix - 1)]);
> +	my $marked = join('', @{$str}[$prefix..$suffix]);
> +	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);

Eeeeeek!  First you split into letters, in caller at that, then join?
Why not pass striung ($str suggests string not array of characters),
and use substr instead?

[Please disregard this and the next paragraph at first reading]

> +
> +	return esc_html($before, -nbsp=>1) .
> +		$cgi->span({-class=>'marked'}, esc_html($marked, -nbsp=>1)) .
> +		esc_html($after,-nbsp=>1);
> +}

Anyway I have send to git mailing list a patch series, which in one of
patches adds esc_html_match_hl($str, $regexp) to highlight matches in
a string.  Your esc_html_mark_range(), after a generalization, could
be used as underlying "engine".

Something like this, perhaps (untested):

   # Highlight selected fragments of string, using given CSS class,
   # and escape HTML.  It is assumed that fragments do not overlap.
   # Regions are passed as list of pairs (array references).
   sub esc_html_hl {
        my ($str, $css_class, @sel) = @_;
        return esc_html($str) unless @sel;
   
        my $out = '';
        my $pos = 0;
   
        for my $s (@sel) {
                $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
                        if ($s->[0] - $pos > 0);
                $out .= $cgi->span({-class => $css_class},
                                   esc_html(substr($str, $s->[0], $s->[1] - $s->[0])));

                $pos = $m->[1];
        }
        $out .= esc_html(substr($str, $pos))
                if ($pos < length($str));
   
        return $out;
   }

> +
> +# Format removed and added line, mark changed part and HTML-format them.

You should probably ad here that this code is taken from
diff-highlight in contrib area, isn't it?

> +sub format_rem_add_line {
> +	my ($rem, $add) = @_;
> +	my @r = split(//, $rem);
> +	my @a = split(//, $add);
> +	my ($esc_rem, $esc_add);
> +	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);

It is not as much $prefix, as $prefix_len, isn't it?  Shouldn't
$prefix / $prefix_len start from 0, not from 1?

> +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> +
> +	while ($prefix < @r && $prefix < @a) {
> +		last if ($r[$prefix] ne $a[$prefix]);
> +
> +		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> +		$prefix++;
> +	}

Ah, I see that it is easier to find common prefix by treating string
as array of characters.

Though I wonder if it wouldn't be easier to use trick of XOR-ing two
strings (see "Bitwise String Operators" in perlop(1)):

        my $xor = "$rem" ^ "$add";

and finding starting sequence of "\0", which denote common prefix.


Though this and the following is a nice implementation of
algorithm... as it would be implemented in C.  Nevermind, it might be
good enough...

> +
> +	while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
> +		last if ($r[$suffix_rem] ne $a[$suffix_add]);
> +
> +		$suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
> +		$suffix_rem--;
> +		$suffix_add--;
> +	}

BTW., perhaps using single negative $suffix_len instead of separate
$suffix_rem_pos and $suffix_add_pos would make code simpler and easier
to understand?

> +
> +	# Mark lines that are different from each other, but have some common
> +	# part that isn't whitespace.  If lines are completely different, don't
> +	# mark them because that would make output unreadable, especially if
> +	# diff consists of multiple lines.
> +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
> +		|| ($prefix_is_space && $suffix_is_space)) {

Micronit about style: in gitweb we put boolean operator at the end of
continued line, not at beginning of next one.

So this would be:

  +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a) ||
  +	    ($prefix_is_space && $suffix_is_space)) {

> +		$esc_rem = esc_html($rem);
> +		$esc_add = esc_html($add);
> +	} else {
> +		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
> +		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
> +	}
> +
> +	return format_diff_line($esc_rem, 'rem'),
> +		format_diff_line($esc_add, 'add');

Please use spaces to align.

> +}
> +
>  # HTML-format diff context, removed and added lines.
>  sub format_ctx_rem_add_lines {
> -	my ($ctx, $rem, $add) = @_;
> +	my ($ctx, $rem, $add, $is_combined) = @_;
>  	my (@new_ctx, @new_rem, @new_add);
> +	my $num_add_lines = @$add;

Why is this temporary variable needed?  If you are not sure that ==
operator enforces scalar context on both arguments you can always
write

  scalar @$add == scalar @$rem

> +
> +	if (!$is_combined && $num_add_lines > 0 && $num_add_lines == @$rem) {
> +		for (my $i = 0; $i < $num_add_lines; $i++) {
> +			my ($line_rem, $line_add) = format_rem_add_line(
> +				$rem->[$i], $add->[$i]);
> +			push @new_rem, $line_rem;
> +			push @new_add, $line_add;

The original contrib/diff-highlight works only for single changed line
(single removed and single added).  You make this code work also for
the case where number of aded lines is equal to the number of removed
lines, and assume that subsequent changed lines in preimage
correcponds to subsequent changed lines in postimage, which is not
always true:

   -foo
   -bar
   +baz
   +fooo

This is not described in commit message, I think.

> +		}
> +	} else {
> +		@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> +		@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
> +	}
>  
>  	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1), 'ctx') } @$ctx;
> -	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1), 'rem') } @$rem;
> -	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1), 'add') } @$add;
>  
>  	return (\@new_ctx, \@new_rem, \@new_add);
>  }
> @@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines {
>  sub print_diff_lines {
>  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
>  
> -	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add);
> +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem, $add,
> +		$is_combined);

O.K., now the code depends on $is_combined

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 8/8] gitweb: Highlight combined diffs
  2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
@ 2012-02-12  0:03   ` Jakub Narebski
  2012-02-13  6:48     ` Michal Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-12  0:03 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> The highlightning of combined diffs is currently disabled.  This is
> because output from a combined diff is much harder to highlight because
> it's not obvious which removed and added lines should be compared.
> 
> Moreover, code that compares added and removed lines doesn't care about
> combined diffs. It only skips first +/- character, treating second +/-
> as a line content.
> 
> Let's start with a simple case: only highlight changes that come from
> one parent, i.e. when every removed line has a corresponding added line
> for the same parent.  This way the highlightning cannot get wrong. For
> example, following diffs would be highlighted:
> 
> 	- removed line for first parent
> 	+ added line for first parent
> 	  context line
> 	 -removed line for second parent
> 	 +added line for second parent
> 
> or
> 
> 	- removed line for first parent
> 	 -removed line for second parent
> 	+ added line for first parent
> 	 +added line for second parent
> 
> but following output will not:
> 
> 	- removed line for first parent
> 	 -removed line for second parent
> 	 +added line for second parent
> 	++added line for both parents
> 
That is a very reasonable approach.

> Further changes may introduce more intelligent approach that better
> handles combined diffs.
> 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> ---
>  gitweb/gitweb.perl |   40 +++++++++++++++++++++++++++++++++++++---
>  1 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1a5b454..2b6cb9e 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -4944,13 +4944,16 @@ sub esc_html_mark_range {
>  
>  # Format removed and added line, mark changed part and HTML-format them.
>  sub format_rem_add_line {
> -	my ($rem, $add) = @_;
> +	my ($rem, $add, $is_combined) = @_;
>  	my @r = split(//, $rem);
>  	my @a = split(//, $add);
>  	my ($esc_rem, $esc_add);
>  	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
>  	my ($prefix_is_space, $suffix_is_space) = (1, 1);
>  
> +	# In combined diff we must ignore two +/- characters.
> +	$prefix = 2 if ($is_combined);
> +

Errr... actually number of prefix is equalt to number of parents, so
it might be in case of octopus merge more than 2.

>  	while ($prefix < @r && $prefix < @a) {
>  		last if ($r[$prefix] ne $a[$prefix]);
>  
> @@ -4988,11 +4991,42 @@ sub format_ctx_rem_add_lines {
>  	my ($ctx, $rem, $add, $is_combined) = @_;
>  	my (@new_ctx, @new_rem, @new_add);
>  	my $num_add_lines = @$add;
> +	my $can_highlight;
> +
> +	# Highlight if every removed line has a corresponding added line.
> +	if ($num_add_lines > 0 && $num_add_lines == @$rem) {
> +		$can_highlight = 1;
> +
> +		# Highlight lines in combined diff only if the chunk contains
> +		# diff between the same version, e.g.
> +		#
> +		#    - a
> +		#   -  b
> +		#    + c
> +		#   +  d
> +		#
> +		# Otherwise the highlightling would be confusing.
> +		if ($is_combined) {
> +			for (my $i = 0; $i < $num_add_lines; $i++) {
> +				my $prefix_rem = substr($rem->[$i], 0, 2);
> +				my $prefix_add = substr($add->[$i], 0, 2);
> +
> +				$prefix_rem =~ s/-/+/g;
> +
> +				if ($prefix_rem ne $prefix_add) {
> +					$can_highlight = 0;
> +					last;

Nb. this assumes that we cannot refine and highlight something like
this:

   		#    - a
   		#   -  b
   		#    + c
   		#   ++ d

But perhaps this is better left for future improvemnt.

> +				}
> +			}
> +		}
> +	} else {
> +		$can_highlight = 0;
> +	}

This 'else' would be not necessary if $can_highlight was initialized
to 0.

>  
> -	if (!$is_combined && $num_add_lines > 0 && $num_add_lines == @$rem) {
> +	if ($can_highlight) {
>  		for (my $i = 0; $i < $num_add_lines; $i++) {
>  			my ($line_rem, $line_add) = format_rem_add_line(
> -				$rem->[$i], $add->[$i]);
> +				$rem->[$i], $add->[$i], $is_combined);
>  			push @new_rem, $line_rem;
>  			push @new_add, $line_add;

O.K.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 7/8] gitweb: Use different colors to present marked changes
  2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
@ 2012-02-12  0:11   ` Jakub Narebski
  2012-02-13  6:46     ` Michal Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-12  0:11 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> This makes use of the highlight diff feature.
> 
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> ---
> I decided to split mechanism (generate HTML page with <span> elements that
> mark interesting fragments of diff output) from politics (use these
> particular colors for this <span> elements), but otherwise this commit
> may be squashed with the previous one. These colors work for me but if
> someone comes out with better ones, I'd be happy.

I think it would be better squashed with previous patch, otherwise it
is a bit not visible change...
 
>  gitweb/static/gitweb.css |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
> index c7827e8..4f87d16 100644
> --- a/gitweb/static/gitweb.css
> +++ b/gitweb/static/gitweb.css
> @@ -438,6 +438,10 @@ div.diff.add {
>  	color: #008800;
>  }
>  
> +div.diff.add span.marked {
> +	background-color: #77ff77;
> +}
> +
>  div.diff.from_file a.path,
>  div.diff.from_file {
>  	color: #aa0000;
> @@ -447,6 +451,10 @@ div.diff.rem {
>  	color: #cc0000;
>  }
>  
> +div.diff.rem span.marked {
> +	background-color: #ff7777;
> +}
> +
>  div.diff.chunk_header a,
>  div.diff.chunk_header {
>  	color: #990099;
> -- 

I'd have to see those colors in use.  BTW what colors other
highlighting diff GUIs use?

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-11 23:45   ` Jakub Narebski
@ 2012-02-12 10:42     ` Jakub Narebski
  2012-02-13  6:54       ` Michal Kiedrowicz
  2012-02-13  6:41     ` Michal Kiedrowicz
  1 sibling, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-12 10:42 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:
> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
 
> > +# Highlight characters from $prefix to $suffix and escape HTML.
> > +# $str is a reference to the array of characters.
> > +sub esc_html_mark_range {
> > +	my ($str, $prefix, $suffix) = @_;
> > +
> > +	# Don't generate empty <span> element.
> > +	if ($prefix == $suffix + 1) {
> > +		return esc_html(join('', @$str), -nbsp=>1);
> > +	}
> > +
> > +	my $before = join('', @{$str}[0..($prefix - 1)]);
> > +	my $marked = join('', @{$str}[$prefix..$suffix]);
> > +	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);
> 
> Eeeeeek!  First you split into letters, in caller at that, then join?
> Why not pass striung ($str suggests string not array of characters),
> and use substr instead?
> 
> [Please disregard this and the next paragraph at first reading]
> 
> > +
> > +	return esc_html($before, -nbsp=>1) .
> > +		$cgi->span({-class=>'marked'}, esc_html($marked, -nbsp=>1)) .
> > +		esc_html($after,-nbsp=>1);
> > +}
> 
> Anyway I have send to git mailing list a patch series, which in one of
> patches adds esc_html_match_hl($str, $regexp) to highlight matches in
> a string.  Your esc_html_mark_range(), after a generalization, could
> be used as underlying "engine".
> 
> Something like this, perhaps (untested):
> 
>    # Highlight selected fragments of string, using given CSS class,
>    # and escape HTML.  It is assumed that fragments do not overlap.
>    # Regions are passed as list of pairs (array references).
>    sub esc_html_hl {
>         my ($str, $css_class, @sel) = @_;
>         return esc_html($str) unless @sel;
>    
>         my $out = '';
>         my $pos = 0;
>    
>         for my $s (@sel) {
>                 $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
>                         if ($s->[0] - $pos > 0);
>                 $out .= $cgi->span({-class => $css_class},
>                                    esc_html(substr($str, $s->[0], $s->[1] - $s->[0])));
> 
>                 $pos = $m->[1];
>         }
>         $out .= esc_html(substr($str, $pos))
>                 if ($pos < length($str));
>    
>         return $out;
>    }

Actually we can accomodate both operating on string and operating on
array of characters in a single subroutine.  Though it can be left for
later commit, anyway...

     # Highlight selected fragments of string, using given CSS class,
     # and escape HTML.  It is assumed that fragments do not overlap.
     # Regions are passed as list of pairs (array references).
     sub esc_html_hl {
          my ($sth, $css_class, @sel) = @_;

          if (!@sel) {
                if (ref($sth) eq "ARRAY") {
                        return esc_html(join('', @$sth), -nbsp=>1);
                } else {
                        return esc_html($sth, -nbsp=>1);
          }

          if (ref($sth) eq "ARRAY") {
                return esc_html_hl_gen($sth,
                        sub { 
                                my ($arr, $from, $to) = @_;
                                return join('', @{$arr}[$from..$to]);
                        },
                        scalar @{$arr}, $css_class, @sel);
           } else {
                return esc_html_hl_gen($sth,
                        sub {
                                my ($str, $from, $to) = @_;
                                if ($to < 0) { $to += lenght($str); };
                                return substr($str, $from, $to - $from);
                        },
                        length($sth), $css_class, @sel);
           }
     }

     # Highlight selected fragments of string or array of characters
     # with given length, using provided $extr subroutine to extract
     # fragment (substring)
     sub esc_html_hl_gen {
          my ($sth, $extr, $len, $css_class, @sel) = @_;
     
          my $out = '';
          my $pos = 0;
     
          for my $s (@sel) {
                $out .= esc_html($extr->($str, $pos, $s->[0]))
                        if ($s->[0] - $pos > 0);
                $out .= $cgi->span({-class => $css_class},
                                   esc_html($extr->($str, $s->[0], $s->[1])));
  
                $pos = $s->[1];
          }
          $out .= esc_html($extr->($str, $pos, $len))
                  if ($pos < $len);
     
          return $out;
     }

Or maybe I have read "Higher-Order Perl" one time too many ;-))))

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-11 23:45   ` Jakub Narebski
  2012-02-12 10:42     ` Jakub Narebski
@ 2012-02-13  6:41     ` Michal Kiedrowicz
  2012-02-13 18:44       ` Jakub Narebski
  1 sibling, 1 reply; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-13  6:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > Reading diff output is sometimes very hard, even if it's colored,
> > especially if lines differ only in few characters.  This is often
> > true when a commit fixes a typo or renames some variables or
> > functions.
> >
> This is certainly nice feature to have.  I think most tools that
> implement side-by-side diff implement this too.
> 
> > This commit teaches gitweb to highlight characters that are
> > different between old and new line.  This should work in the
> > similar manner as in Trac or GitHub.
> >
> Doe you have URLs with good examples of such diff refinement
> highlighting (GNU Emacs ediff/emerge terminology) / highlighting
> differing segments (diff-highlight terminology)?

I haven't found *examples* on GitHub and Trac sites, but what about
these ones:

https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
http://trac.edgewall.org/changeset/10973

>  
> > The code that compares lines is based on
> > contrib/diff-highlight/diff-highlight, except that it works with
> > multiline changes too.  It also won't highlight lines that are
> > completely different because that would only make the output
> > unreadable.
> >
> So what does it look like?  From the contrib/diff-highlight/README I
> guess that it finds common prefix and common suffix, and highlights
> the rest, which includes change.  

Yes.

> It doesn't implement LCS / diff
> algorithm like e.g. tkdiff does for its diff refinement highlighting,
> isn't it?
> 

I doesn't. I share the Jeff's opinion that:
a) Jeff's approach is "good enough"
b) LCS on bytes could be very confusing if it marked few sets of
characters.

> By completly different you mean that they do not have common prefix or
> common suffix (at least one of them), isn't it?

Yes, but I also don't highlight lines which prefix/suffix consists only
of whitespace (This is quite common). I would also consider ignoring
prefixes/suffixes with punctuation, like:

	- * I like you.
	+ * Alice had a little lamb.

> 
> > Combined diffs are not supported but a following commit will change
> > it.
> > 
> O.K.
> 
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > ---
> >  gitweb/gitweb.perl |   82
> > ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files
> > changed, 77 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index db61553..1a5b454 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -2322,7 +2322,7 @@ sub format_cc_diff_chunk_header {
> >  # wrap patch (diff) line into a <div> (not to be used for diff
> > headers), # the line must be esc_html()'ed
> >  sub format_diff_line {
> > -	my ($line, $diff_class, $from, $to) = @_;
> > +	my ($line, $diff_class) = @_;
> 
> Why that change?

I think it fallout from previous patch where format_diff_line() stopped
using $from and $to. Will fix that.

> 
> >  
> >  	my $diff_classes = "diff";
> >  	$diff_classes .= " $diff_class" if ($diff_class);
> > @@ -4923,14 +4923,85 @@ sub print_inline_diff_lines {
> >  	print foreach (@$add);
> >  }
> >  
> > +# Highlight characters from $prefix to $suffix and escape HTML.
> > +# $str is a reference to the array of characters.
> > +sub esc_html_mark_range {
> > +	my ($str, $prefix, $suffix) = @_;
> > +
> > +	# Don't generate empty <span> element.
> > +	if ($prefix == $suffix + 1) {
> > +		return esc_html(join('', @$str), -nbsp=>1);
> > +	}
> > +
> > +	my $before = join('', @{$str}[0..($prefix - 1)]);
> > +	my $marked = join('', @{$str}[$prefix..$suffix]);
> > +	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);
> 
> Eeeeeek!  First you split into letters, in caller at that, then join?
> Why not pass striung ($str suggests string not array of characters),
> and use substr instead?
> 
> [Please disregard this and the next paragraph at first reading]

I will rename $str to something more self describing.

> 
> > +
> > +	return esc_html($before, -nbsp=>1) .
> > +		$cgi->span({-class=>'marked'}, esc_html($marked,
> > -nbsp=>1)) .
> > +		esc_html($after,-nbsp=>1);
> > +}
> 
> Anyway I have send to git mailing list a patch series, which in one of
> patches adds esc_html_match_hl($str, $regexp) to highlight matches in
> a string.  Your esc_html_mark_range(), after a generalization, could
> be used as underlying "engine".
> 
> Something like this, perhaps (untested):
> 
>    # Highlight selected fragments of string, using given CSS class,
>    # and escape HTML.  It is assumed that fragments do not overlap.
>    # Regions are passed as list of pairs (array references).
>    sub esc_html_hl {
>         my ($str, $css_class, @sel) = @_;
>         return esc_html($str) unless @sel;
>    
>         my $out = '';
>         my $pos = 0;
>    
>         for my $s (@sel) {
>                 $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
>                         if ($s->[0] - $pos > 0);
>                 $out .= $cgi->span({-class => $css_class},
>                                    esc_html(substr($str, $s->[0],
> $s->[1] - $s->[0])));
> 
>                 $pos = $m->[1];
>         }
>         $out .= esc_html(substr($str, $pos))
>                 if ($pos < length($str));
>    
>         return $out;
>    }
> 
> > +
> > +# Format removed and added line, mark changed part and HTML-format
> > them.
> 
> You should probably ad here that this code is taken from
> diff-highlight in contrib area, isn't it?

True.

> 
> > +sub format_rem_add_line {
> > +	my ($rem, $add) = @_;
> > +	my @r = split(//, $rem);
> > +	my @a = split(//, $add);
> > +	my ($esc_rem, $esc_add);
> > +	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
> 
> It is not as much $prefix, as $prefix_len, isn't it?  

Yes.

> Shouldn't
> $prefix / $prefix_len start from 0, not from 1?

It starts from 1 because it skips first +/-. It should become obvious
after reading the comment from last patch :).

+	# In combined diff we must ignore two +/- characters.
+	$prefix = 2 if ($is_combined);
+

> 
> > +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > +
> > +	while ($prefix < @r && $prefix < @a) {
> > +		last if ($r[$prefix] ne $a[$prefix]);
> > +
> > +		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > +		$prefix++;
> > +	}
> 
> Ah, I see that it is easier to find common prefix by treating string
> as array of characters.
> 
> Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> strings (see "Bitwise String Operators" in perlop(1)):
> 
>         my $xor = "$rem" ^ "$add";
> 
> and finding starting sequence of "\0", which denote common prefix.
> 
> 
> Though this and the following is a nice implementation of
> algorithm... as it would be implemented in C.  Nevermind, it might be
> good enough...

The splitting and comparing by characters is taken from diff-highlight.
I don't think it's worth changing here.

> 
> > +
> > +	while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
> > +		last if ($r[$suffix_rem] ne $a[$suffix_add]);
> > +
> > +		$suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
> > +		$suffix_rem--;
> > +		$suffix_add--;
> > +	}
> 
> BTW., perhaps using single negative $suffix_len instead of separate
> $suffix_rem_pos and $suffix_add_pos would make code simpler and easier
> to understand?

I'll try that.

> 
> > +
> > +	# Mark lines that are different from each other, but have
> > some common
> > +	# part that isn't whitespace.  If lines are completely
> > different, don't
> > +	# mark them because that would make output unreadable,
> > especially if
> > +	# diff consists of multiple lines.
> > +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add ==
> > $#a)
> > +		|| ($prefix_is_space && $suffix_is_space)) {
> 
> Micronit about style: in gitweb we put boolean operator at the end of
> continued line, not at beginning of next one.
> 
> So this would be:
> 
>   +	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add ==
> $#a) ||
>   +	    ($prefix_is_space && $suffix_is_space)) {
> 

OK

> > +		$esc_rem = esc_html($rem);
> > +		$esc_add = esc_html($add);
> > +	} else {
> > +		$esc_rem = esc_html_mark_range(\@r, $prefix,
> > $suffix_rem);
> > +		$esc_add = esc_html_mark_range(\@a, $prefix,
> > $suffix_add);
> > +	}
> > +
> > +	return format_diff_line($esc_rem, 'rem'),
> > +		format_diff_line($esc_add, 'add');
> 
> Please use spaces to align.

OK

> 
> > +}
> > +
> >  # HTML-format diff context, removed and added lines.
> >  sub format_ctx_rem_add_lines {
> > -	my ($ctx, $rem, $add) = @_;
> > +	my ($ctx, $rem, $add, $is_combined) = @_;
> >  	my (@new_ctx, @new_rem, @new_add);
> > +	my $num_add_lines = @$add;
> 
> Why is this temporary variable needed?  If you are not sure that ==
> operator enforces scalar context on both arguments you can always
> write
> 
>   scalar @$add == scalar @$rem
> 

You read my mind.

> > +
> > +	if (!$is_combined && $num_add_lines > 0 && $num_add_lines
> > == @$rem) {
> > +		for (my $i = 0; $i < $num_add_lines; $i++) {
> > +			my ($line_rem, $line_add) =
> > format_rem_add_line(
> > +				$rem->[$i], $add->[$i]);
> > +			push @new_rem, $line_rem;
> > +			push @new_add, $line_add;
> 
> The original contrib/diff-highlight works only for single changed line
> (single removed and single added).  You make this code work also for
> the case where number of aded lines is equal to the number of removed
> lines, and assume that subsequent changed lines in preimage
> correcponds to subsequent changed lines in postimage, which is not
> always true:
> 
>    -foo
>    -bar
>    +baz
>    +fooo
> 
> This is not described in commit message, I think.

True. Note that in this particular case nothing should be highlighted
because corresponding lines don't have common prefix/suffix.

> 
> > +		}
> > +	} else {
> > +		@new_rem = map { format_diff_line(esc_html($_,
> > -nbsp=>1), 'rem') } @$rem;
> > +		@new_add = map { format_diff_line(esc_html($_,
> > -nbsp=>1), 'add') } @$add;
> > +	}
> >  
> >  	@new_ctx = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'ctx') } @$ctx;
> > -	@new_rem = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'rem') } @$rem;
> > -	@new_add = map { format_diff_line(esc_html($_, -nbsp=>1),
> > 'add') } @$add; 
> >  	return (\@new_ctx, \@new_rem, \@new_add);
> >  }
> > @@ -4939,7 +5010,8 @@ sub format_ctx_rem_add_lines {
> >  sub print_diff_lines {
> >  	my ($ctx, $rem, $add, $diff_style, $is_combined) = @_;
> >  
> > -	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem,
> > $add);
> > +	($ctx, $rem, $add) = format_ctx_rem_add_lines($ctx, $rem,
> > $add,
> > +		$is_combined);
> 
> O.K., now the code depends on $is_combined
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 7/8] gitweb: Use different colors to present marked changes
  2012-02-12  0:11   ` Jakub Narebski
@ 2012-02-13  6:46     ` Michal Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-13  6:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > This makes use of the highlight diff feature.
> > 
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > ---
> > I decided to split mechanism (generate HTML page with <span>
> > elements that mark interesting fragments of diff output) from
> > politics (use these particular colors for this <span> elements),
> > but otherwise this commit may be squashed with the previous one.
> > These colors work for me but if someone comes out with better ones,
> > I'd be happy.
> 
> I think it would be better squashed with previous patch, otherwise it
> is a bit not visible change...

OK, but please note that since previous patch HTML contains <span>
elements around differing segments of diff so the change exists. It
just isn't reflected by CSS.

>  
> >  gitweb/static/gitweb.css |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
> > index c7827e8..4f87d16 100644
> > --- a/gitweb/static/gitweb.css
> > +++ b/gitweb/static/gitweb.css
> > @@ -438,6 +438,10 @@ div.diff.add {
> >  	color: #008800;
> >  }
> >  
> > +div.diff.add span.marked {
> > +	background-color: #77ff77;
> > +}
> > +
> >  div.diff.from_file a.path,
> >  div.diff.from_file {
> >  	color: #aa0000;
> > @@ -447,6 +451,10 @@ div.diff.rem {
> >  	color: #cc0000;
> >  }
> >  
> > +div.diff.rem span.marked {
> > +	background-color: #ff7777;
> > +}
> > +
> >  div.diff.chunk_header a,
> >  div.diff.chunk_header {
> >  	color: #990099;
> > -- 
> 
> I'd have to see those colors in use.  

Then why don't you try it?

> BTW what colors other
> highlighting diff GUIs use?
> 

AFAIK they just use darker red/green background than for the rest of
line.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 8/8] gitweb: Highlight combined diffs
  2012-02-12  0:03   ` Jakub Narebski
@ 2012-02-13  6:48     ` Michal Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-13  6:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > The highlightning of combined diffs is currently disabled.  This is
> > because output from a combined diff is much harder to highlight
> > because it's not obvious which removed and added lines should be
> > compared.
> > 
> > Moreover, code that compares added and removed lines doesn't care
> > about combined diffs. It only skips first +/- character, treating
> > second +/- as a line content.
> > 
> > Let's start with a simple case: only highlight changes that come
> > from one parent, i.e. when every removed line has a corresponding
> > added line for the same parent.  This way the highlightning cannot
> > get wrong. For example, following diffs would be highlighted:
> > 
> > 	- removed line for first parent
> > 	+ added line for first parent
> > 	  context line
> > 	 -removed line for second parent
> > 	 +added line for second parent
> > 
> > or
> > 
> > 	- removed line for first parent
> > 	 -removed line for second parent
> > 	+ added line for first parent
> > 	 +added line for second parent
> > 
> > but following output will not:
> > 
> > 	- removed line for first parent
> > 	 -removed line for second parent
> > 	 +added line for second parent
> > 	++added line for both parents
> > 
> That is a very reasonable approach.

Thanks.

> 
> > Further changes may introduce more intelligent approach that better
> > handles combined diffs.
> > 
> > Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com>
> > ---
> >  gitweb/gitweb.perl |   40 +++++++++++++++++++++++++++++++++++++---
> >  1 files changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 1a5b454..2b6cb9e 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -4944,13 +4944,16 @@ sub esc_html_mark_range {
> >  
> >  # Format removed and added line, mark changed part and HTML-format
> > them. sub format_rem_add_line {
> > -	my ($rem, $add) = @_;
> > +	my ($rem, $add, $is_combined) = @_;
> >  	my @r = split(//, $rem);
> >  	my @a = split(//, $add);
> >  	my ($esc_rem, $esc_add);
> >  	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
> >  	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> >  
> > +	# In combined diff we must ignore two +/- characters.
> > +	$prefix = 2 if ($is_combined);
> > +
> 
> Errr... actually number of prefix is equalt to number of parents, so
> it might be in case of octopus merge more than 2.

OK

> 
> >  	while ($prefix < @r && $prefix < @a) {
> >  		last if ($r[$prefix] ne $a[$prefix]);
> >  
> > @@ -4988,11 +4991,42 @@ sub format_ctx_rem_add_lines {
> >  	my ($ctx, $rem, $add, $is_combined) = @_;
> >  	my (@new_ctx, @new_rem, @new_add);
> >  	my $num_add_lines = @$add;
> > +	my $can_highlight;
> > +
> > +	# Highlight if every removed line has a corresponding
> > added line.
> > +	if ($num_add_lines > 0 && $num_add_lines == @$rem) {
> > +		$can_highlight = 1;
> > +
> > +		# Highlight lines in combined diff only if the
> > chunk contains
> > +		# diff between the same version, e.g.
> > +		#
> > +		#    - a
> > +		#   -  b
> > +		#    + c
> > +		#   +  d
> > +		#
> > +		# Otherwise the highlightling would be confusing.
> > +		if ($is_combined) {
> > +			for (my $i = 0; $i < $num_add_lines; $i++)
> > {
> > +				my $prefix_rem =
> > substr($rem->[$i], 0, 2);
> > +				my $prefix_add =
> > substr($add->[$i], 0, 2); +
> > +				$prefix_rem =~ s/-/+/g;
> > +
> > +				if ($prefix_rem ne $prefix_add) {
> > +					$can_highlight = 0;
> > +					last;
> 
> Nb. this assumes that we cannot refine and highlight something like
> this:
> 
>    		#    - a
>    		#   -  b
>    		#    + c
>    		#   ++ d
> 
> But perhaps this is better left for future improvemnt.

I can add a patch for that at the end of this series.

> 
> > +				}
> > +			}
> > +		}
> > +	} else {
> > +		$can_highlight = 0;
> > +	}
> 
> This 'else' would be not necessary if $can_highlight was initialized
> to 0.
> 

OK.

> >  
> > -	if (!$is_combined && $num_add_lines > 0 && $num_add_lines
> > == @$rem) {
> > +	if ($can_highlight) {
> >  		for (my $i = 0; $i < $num_add_lines; $i++) {
> >  			my ($line_rem, $line_add) =
> > format_rem_add_line(
> > -				$rem->[$i], $add->[$i]);
> > +				$rem->[$i], $add->[$i],
> > $is_combined); push @new_rem, $line_rem;
> >  			push @new_add, $line_add;
> 
> O.K.
> 

Thanks for your thorough review.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-12 10:42     ` Jakub Narebski
@ 2012-02-13  6:54       ` Michal Kiedrowicz
  2012-02-13 19:58         ` Jakub Narebski
  0 siblings, 1 reply; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-13  6:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
>  
> > > +# Highlight characters from $prefix to $suffix and escape HTML.
> > > +# $str is a reference to the array of characters.
> > > +sub esc_html_mark_range {
> > > +	my ($str, $prefix, $suffix) = @_;
> > > +
> > > +	# Don't generate empty <span> element.
> > > +	if ($prefix == $suffix + 1) {
> > > +		return esc_html(join('', @$str), -nbsp=>1);
> > > +	}
> > > +
> > > +	my $before = join('', @{$str}[0..($prefix - 1)]);
> > > +	my $marked = join('', @{$str}[$prefix..$suffix]);
> > > +	my $after = join('', @{$str}[($suffix + 1)..$#{$str}]);
> > 
> > Eeeeeek!  First you split into letters, in caller at that, then
> > join? Why not pass striung ($str suggests string not array of
> > characters), and use substr instead?
> > 
> > [Please disregard this and the next paragraph at first reading]
> > 
> > > +
> > > +	return esc_html($before, -nbsp=>1) .
> > > +		$cgi->span({-class=>'marked'}, esc_html($marked,
> > > -nbsp=>1)) .
> > > +		esc_html($after,-nbsp=>1);
> > > +}
> > 
> > Anyway I have send to git mailing list a patch series, which in one
> > of patches adds esc_html_match_hl($str, $regexp) to highlight
> > matches in a string.  

Yeah, I saw that but after seeing that they accept different arguments
I decided to leave them alone.

> Your esc_html_mark_range(), after a
> > generalization, could be used as underlying "engine".
> > 
> > Something like this, perhaps (untested):

I think I'll leave it to you after merging both these series to
master :)

> > 
> >    # Highlight selected fragments of string, using given CSS class,
> >    # and escape HTML.  It is assumed that fragments do not overlap.
> >    # Regions are passed as list of pairs (array references).
> >    sub esc_html_hl {
> >         my ($str, $css_class, @sel) = @_;
> >         return esc_html($str) unless @sel;
> >    
> >         my $out = '';
> >         my $pos = 0;
> >    
> >         for my $s (@sel) {
> >                 $out .= esc_html(substr($str, $pos, $s->[0] - $pos))
> >                         if ($s->[0] - $pos > 0);
> >                 $out .= $cgi->span({-class => $css_class},
> >                                    esc_html(substr($str, $s->[0],
> > $s->[1] - $s->[0])));
> > 
> >                 $pos = $m->[1];
> >         }
> >         $out .= esc_html(substr($str, $pos))
> >                 if ($pos < length($str));
> >    
> >         return $out;
> >    }
> 
> Actually we can accomodate both operating on string and operating on
> array of characters in a single subroutine.  Though it can be left for
> later commit, anyway...

> 
>      # Highlight selected fragments of string, using given CSS class,
>      # and escape HTML.  It is assumed that fragments do not overlap.
>      # Regions are passed as list of pairs (array references).
>      sub esc_html_hl {
>           my ($sth, $css_class, @sel) = @_;
> 
>           if (!@sel) {
>                 if (ref($sth) eq "ARRAY") {
>                         return esc_html(join('', @$sth), -nbsp=>1);
>                 } else {
>                         return esc_html($sth, -nbsp=>1);
>           }
> 
>           if (ref($sth) eq "ARRAY") {
>                 return esc_html_hl_gen($sth,
>                         sub { 
>                                 my ($arr, $from, $to) = @_;
>                                 return join('', @{$arr}[$from..$to]);
>                         },
>                         scalar @{$arr}, $css_class, @sel);
>            } else {
>                 return esc_html_hl_gen($sth,
>                         sub {
>                                 my ($str, $from, $to) = @_;
>                                 if ($to < 0) { $to += lenght($str); };
>                                 return substr($str, $from, $to -
> $from); },
>                         length($sth), $css_class, @sel);
>            }
>      }
> 
>      # Highlight selected fragments of string or array of characters
>      # with given length, using provided $extr subroutine to extract
>      # fragment (substring)
>      sub esc_html_hl_gen {
>           my ($sth, $extr, $len, $css_class, @sel) = @_;
>      
>           my $out = '';
>           my $pos = 0;
>      
>           for my $s (@sel) {
>                 $out .= esc_html($extr->($str, $pos, $s->[0]))
>                         if ($s->[0] - $pos > 0);
>                 $out .= $cgi->span({-class => $css_class},
>                                    esc_html($extr->($str, $s->[0],
> $s->[1]))); 
>                 $pos = $s->[1];
>           }
>           $out .= esc_html($extr->($str, $pos, $len))
>                   if ($pos < $len);
>      
>           return $out;
>      }
> 
> Or maybe I have read "Higher-Order Perl" one time too many ;-))))
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-13  6:41     ` Michal Kiedrowicz
@ 2012-02-13 18:44       ` Jakub Narebski
  2012-02-13 21:09         ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-13 18:44 UTC (permalink / raw)
  To: Michal Kiedrowicz; +Cc: git

On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> 
> > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > 
> > > Reading diff output is sometimes very hard, even if it's colored,
> > > especially if lines differ only in few characters.  This is often
> > > true when a commit fixes a typo or renames some variables or
> > > functions.
> > >
> > This is certainly nice feature to have.  I think most tools that
> > implement side-by-side diff implement this too.
> > 
> > > This commit teaches gitweb to highlight characters that are
> > > different between old and new line.  This should work in the
> > > similar manner as in Trac or GitHub.
> > >
> > Doe you have URLs with good examples of such diff refinement
> > highlighting (GNU Emacs ediff/emerge terminology) / highlighting
> > differing segments (diff-highlight terminology)?
> 
> I haven't found *examples* on GitHub and Trac sites, but what about
> these ones:
> 
> https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> http://trac.edgewall.org/changeset/10973

Thanks.  That is what I meant by "good examples".  Perhaps they should
be put in the commit message?

BTW GitHub is closed source, but we can check what algorithm does Trac
use for diff refinement highlighting (highlighting changed portions of
diff).

> > > The code that compares lines is based on
> > > contrib/diff-highlight/diff-highlight, except that it works with
> > > multiline changes too.  It also won't highlight lines that are
> > > completely different because that would only make the output
> > > unreadable.
> > >
> > So what does it look like?  From the contrib/diff-highlight/README I
> > guess that it finds common prefix and common suffix, and highlights
> > the rest, which includes change.  
> 
> Yes.
> 
> > It doesn't implement LCS / diff
> > algorithm like e.g. tkdiff does for its diff refinement highlighting,
> > isn't it?
> 
> I doesn't. I share the Jeff's opinion that:
> a) Jeff's approach is "good enough"
> b) LCS on bytes could be very confusing if it marked few sets of
> characters.

I wonder if we can use --diff-words for diff refinement highlighting,
i.e. LCS on words.

Anyway Jeff's approach is a bit limited, in that it would work only for
change that does not involve adding newlines, for example splitting
overly long line when changing something.

See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
 
> > By completly different you mean that they do not have common prefix or
> > common suffix (at least one of them), isn't it?

BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
and suffix are non-empty"?

> Yes, but I also don't highlight lines which prefix/suffix consists only
> of whitespace (This is quite common). 

O.K., that is quite sensible.

> I would also consider ignoring prefixes/suffixes with punctuation, like:
> 
> 	- * I like you.
> 	+ * Alice had a little lamb.

But this patch doesn't implement this feature yet, isn't it?

Well, here is another idea: do not highlight if sum of prefix and suffix
lengths are less than some threshold, e.g. 2 characters not including 
whitespace, or some percentage with respect to total line length.

> > > +# Highlight characters from $prefix to $suffix and escape HTML.
> > > +# $str is a reference to the array of characters.
> > > +sub esc_html_mark_range {
> > > +	my ($str, $prefix, $suffix) = @_;
> > > +
> > > +	# Don't generate empty <span> element.
> > > +	if ($prefix == $suffix + 1) {
> > > +		return esc_html(join('', @$str), -nbsp=>1);
> > > +	}
> > > +
> > > +	my $before = join('', @{$str}[0..($prefix - 1)]);
> > > +	my $marked = join('', @{$str}[$prefix..$suffix]);
> > > +	my $after  = join('', @{$str}[($suffix + 1)..$#{$str}]);

It would be nicer with 'part' from List::MoreUtils... but that module
is unfortunately not in core.

> > Eeeeeek!  First you split into letters, in caller at that, then join?
> > Why not pass striung ($str suggests string not array of characters),
> > and use substr instead?
> > 
> > [Please disregard this and the next paragraph at first reading]
> 
> I will rename $str to something more self describing.

Please do.

BTW. don't you assume here that both common prefix and common suffix
are non-empty?
 
[...]
> > > +
> > > +# Format removed and added line, mark changed part and HTML-format
> > > them.
> > 
> > You should probably add here that this code is taken from
> > diff-highlight in contrib area, isn't it?
> 
> True.

And perhaps not the changes from diff-highlight, unless you meant to
send them upstream for inclusion.

> > > +sub format_rem_add_line {
> > > +	my ($rem, $add) = @_;
> > > +	my @r = split(//, $rem);
> > > +	my @a = split(//, $add);

BTW the name of variable can be just @add and @rem.

> > > +	my ($esc_rem, $esc_add);
> > > +	my ($prefix, $suffix_rem, $suffix_add) = (1, $#r, $#a);
> > 
> > It is not as much $prefix, as $prefix_len, isn't it?  
> 
> Yes.
> 
> > Shouldn't
> > $prefix / $prefix_len start from 0, not from 1?
> 
> It starts from 1 because it skips first +/-. It should become obvious
> after reading the comment from last patch :).
> 
> +	# In combined diff we must ignore two +/- characters.
> +	$prefix = 2 if ($is_combined);

Anyway comment about that fact would be nice.
 
> > > +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > > +
> > > +	while ($prefix < @r && $prefix < @a) {
> > > +		last if ($r[$prefix] ne $a[$prefix]);
> > > +
> > > +		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > > +		$prefix++;
> > > +	}
> > 
> > Ah, I see that it is easier to find common prefix by treating string
> > as array of characters.
> > 
> > Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> > strings (see "Bitwise String Operators" in perlop(1)):
> > 
> >         my $xor = "$rem" ^ "$add";
> > 
> > and finding starting sequence of "\0", which denote common prefix.
> > 
> > 
> > Though this and the following is a nice implementation of
> > algorithm... as it would be implemented in C.  Nevermind, it might be
> > good enough...
> 
> The splitting and comparing by characters is taken from diff-highlight.
> I don't think it's worth changing here.

You are right.

I'll try to come with hacky algorithm using string bitwise xor and regexp,
and benchmark it comparing to your C-like solution, but it can be left for
later (simple is better than clever, usually).
 
> > > +	while ($suffix_rem >= $prefix && $suffix_add >= $prefix) {
> > > +		last if ($r[$suffix_rem] ne $a[$suffix_add]);
> > > +
> > > +		$suffix_is_space = 0 if ($r[$suffix_rem] !~ /\s/);
> > > +		$suffix_rem--;
> > > +		$suffix_add--;
> > > +	}
> > 
> > BTW., perhaps using single negative $suffix_len instead of separate
> > $suffix_rem_pos and $suffix_add_pos would make code simpler and easier
> > to understand?
> 
> I'll try that.

I think this should just work, and it would be one variable less to
track.
 
> > >  # HTML-format diff context, removed and added lines.
> > >  sub format_ctx_rem_add_lines {
> > > -	my ($ctx, $rem, $add) = @_;
> > > +	my ($ctx, $rem, $add, $is_combined) = @_;
> > >  	my (@new_ctx, @new_rem, @new_add);
> > > +	my $num_add_lines = @$add;
> > 
> > Why is this temporary variable needed?  If you are not sure that ==
> > operator enforces scalar context on both arguments you can always
> > write
> > 
> >   scalar @$add == scalar @$rem
> 
> You read my mind.

BTW. the same comment applies to patch adding support for highlighting
changed part in combined diff.
 
> > The original contrib/diff-highlight works only for single changed line
> > (single removed and single added).  You make this code work also for
> > the case where number of aded lines is equal to the number of removed
> > lines, and assume that subsequent changed lines in preimage
> > correcponds to subsequent changed lines in postimage, [...]
[...]
> > This is not described in commit message, I think.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-13  6:54       ` Michal Kiedrowicz
@ 2012-02-13 19:58         ` Jakub Narebski
  2012-02-13 21:10           ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-13 19:58 UTC (permalink / raw)
  To: Michal Kiedrowicz; +Cc: git

Michal Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> > Jakub Narebski <jnareb@gmail.com> writes:
[...]
> > > Anyway I have send to git mailing list a patch series, which in one
> > > of patches adds esc_html_match_hl($str, $regexp) to highlight
> > > matches in a string.  
> 
> Yeah, I saw that but after seeing that they accept different arguments
> I decided to leave them alone.
> 
> > > Your esc_html_mark_range(), after a
> > > generalization, could be used as underlying "engine".
> > > 
> > > Something like this, perhaps (untested):
> 
> I think I'll leave it to you after merging both these series to
> master :)
 
Yes, you are right.  Let's do it when we actually need it.

[...]
> > >    # Highlight selected fragments of string, using given CSS class,
> > >    # and escape HTML.  It is assumed that fragments do not overlap.
> > >    # Regions are passed as list of pairs (array references).
> > >    sub esc_html_hl {
[...]
> > Actually we can accomodate both operating on string and operating on
> > array of characters in a single subroutine.  Though it can be left for
> > later commit, anyway...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-13 18:44       ` Jakub Narebski
@ 2012-02-13 21:09         ` Michał Kiedrowicz
  2012-02-14 17:31           ` Jakub Narebski
  0 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-13 21:09 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> > I haven't found *examples* on GitHub and Trac sites, but what about
> > these ones:
> > 
> > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > http://trac.edgewall.org/changeset/10973
> 
> Thanks.  That is what I meant by "good examples".  Perhaps they should
> be put in the commit message?

OK

> 
> BTW GitHub is closed source, but we can check what algorithm does Trac
> use for diff refinement highlighting (highlighting changed portions of
> diff).
> 

I think it's
http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
(see markup intraline_changes()).


> > > It doesn't implement LCS / diff
> > > algorithm like e.g. tkdiff does for its diff refinement highlighting,
> > > isn't it?
> > 
> > I doesn't. I share the Jeff's opinion that:
> > a) Jeff's approach is "good enough"
> > b) LCS on bytes could be very confusing if it marked few sets of
> > characters.
> 
> I wonder if we can use --diff-words for diff refinement highlighting,
> i.e. LCS on words.

I think we can try it, but I worry about performance of running `git
diff` on every diff chunk.


> 
> Anyway Jeff's approach is a bit limited, in that it would work only for
> change that does not involve adding newlines, for example splitting
> overly long line when changing something.
> 
> See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
>  

Yes, I'm aware of that. I was thinking about improving it later ("Let's
start with a simple refinment highlightning and maybe later add more
sophisticated algorithms").

> > > By completly different you mean that they do not have common prefix or
> > > common suffix (at least one of them), isn't it?
> 
> BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> and suffix are non-empty"?
> 

At least one. See:

	-a = 42;
	+b = 42;

Here prefix is empty but suffix is not.

> > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > 
> > 	- * I like you.
> > 	+ * Alice had a little lamb.
> 
> But this patch doesn't implement this feature yet, isn't it?

No, but is a matter of adding

	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);

(and the same for suffix)

> Well, here is another idea: do not highlight if sum of prefix and suffix
> lengths are less than some threshold, e.g. 2 characters not including 
> whitespace, or some percentage with respect to total line length.
> 

That might be a good idea.

> > > Eeeeeek!  First you split into letters, in caller at that, then join?
> > > Why not pass striung ($str suggests string not array of characters),
> > > and use substr instead?
> > > 
> > > [Please disregard this and the next paragraph at first reading]
> > 
> > I will rename $str to something more self describing.
> 
> Please do.
> 
> BTW. don't you assume here that both common prefix and common suffix
> are non-empty?

Yes. The caller makes sure they are correct (at least 

> > > > +sub format_rem_add_line {
> > > > +	my ($rem, $add) = @_;
> > > > +	my @r = split(//, $rem);
> > > > +	my @a = split(//, $add);
> 
> BTW the name of variable can be just @add and @rem.
>

I know they are different scopes but I don't like it. It makes the code
more confusing IMO. But I won't insist.

> > > Shouldn't
> > > $prefix / $prefix_len start from 0, not from 1?
> > 
> > It starts from 1 because it skips first +/-. It should become obvious
> > after reading the comment from last patch :).
> > 
> > +	# In combined diff we must ignore two +/- characters.
> > +	$prefix = 2 if ($is_combined);
> 
> Anyway comment about that fact would be nice.

Will do.

>  
> > > > +	my ($prefix_is_space, $suffix_is_space) = (1, 1);
> > > > +
> > > > +	while ($prefix < @r && $prefix < @a) {
> > > > +		last if ($r[$prefix] ne $a[$prefix]);
> > > > +
> > > > +		$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > > > +		$prefix++;
> > > > +	}
> > > 
> > > Ah, I see that it is easier to find common prefix by treating string
> > > as array of characters.
> > > 
> > > Though I wonder if it wouldn't be easier to use trick of XOR-ing two
> > > strings (see "Bitwise String Operators" in perlop(1)):
> > > 
> > >         my $xor = "$rem" ^ "$add";
> > > 
> > > and finding starting sequence of "\0", which denote common prefix.
> > > 
> > > 
> > > Though this and the following is a nice implementation of
> > > algorithm... as it would be implemented in C.  Nevermind, it might be
> > > good enough...
> > 
> > The splitting and comparing by characters is taken from diff-highlight.
> > I don't think it's worth changing here.
> 
> You are right.
> 
> I'll try to come with hacky algorithm using string bitwise xor and regexp,
> and benchmark it comparing to your C-like solution, but it can be left for
> later (simple is better than clever, usually).

If you have time :).

> > > >  # HTML-format diff context, removed and added lines.
> > > >  sub format_ctx_rem_add_lines {
> > > > -	my ($ctx, $rem, $add) = @_;
> > > > +	my ($ctx, $rem, $add, $is_combined) = @_;
> > > >  	my (@new_ctx, @new_rem, @new_add);
> > > > +	my $num_add_lines = @$add;
> > > 
> > > Why is this temporary variable needed?  If you are not sure that ==
> > > operator enforces scalar context on both arguments you can always
> > > write
> > > 
> > >   scalar @$add == scalar @$rem
> > 
> > You read my mind.
> 
> BTW. the same comment applies to patch adding support for highlighting
> changed part in combined diff.
>

OK
  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-13 19:58         ` Jakub Narebski
@ 2012-02-13 21:10           ` Michał Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-13 21:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> Michal Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > Jakub Narebski <jnareb@gmail.com> writes:
> [...]
> > > > Anyway I have send to git mailing list a patch series, which in one
> > > > of patches adds esc_html_match_hl($str, $regexp) to highlight
> > > > matches in a string.  
> > 
> > Yeah, I saw that but after seeing that they accept different arguments
> > I decided to leave them alone.
> > 
> > > > Your esc_html_mark_range(), after a
> > > > generalization, could be used as underlying "engine".
> > > > 
> > > > Something like this, perhaps (untested):
> > 
> > I think I'll leave it to you after merging both these series to
> > master :)
>  
> Yes, you are right.  Let's do it when we actually need it.
> 

Good :)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
@ 2012-02-13 22:27           ` Jeff King
  2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
                               ` (6 more replies)
  0 siblings, 7 replies; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:27 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

On Fri, Feb 10, 2012 at 10:47:13PM +0100, Michał Kiedrowicz wrote:

>  contrib/diff-highlight/diff-highlight |   96 ++++++++++++++++++++++-----------
>  1 files changed, 65 insertions(+), 31 deletions(-)

Thanks for sending. I looked at a whole bunch of patches, and I was
pleasantly surprised to find how infrequently we hit false positives in
practice. In fact, the only things that looked worse with your patch
were places where your patch happened to turn on highlighting for lines
where the existing heuristics already were a little ugly (i.e., the
problem was not your patch, but that the existing heuristic is sometimes
non-optimal).

I ended up pulling your changes out into a few distinct commits. That
made it easier for me to review and understand what was going on (and
hopefully ditto for other reviewers, or people who end up bisecting or
reading the log later). I'll post that series in a moment.

> After looking at outputs I noticed that it can also ignore lines with
> prefixes/suffixes that consist only of punctuation (asterisk, semicolon, dot,
> etc), because otherwise whole line is highlighted except for terminating
> punctuation.

I missed this note when I applied the patch and started looking at the
outputs, and ended up having a similar thought. However, I don't know
that it buys much in practice, and it's nice to be fairly agnostic about
content. I did leave that open to easy tweaking in my series, though.

  [1/5]: diff-highlight: make perl strict and warnings fatal
  [2/5]: diff-highlight: don't highlight whole lines
  [3/5]: diff-highlight: refactor to prepare for multi-line hunks
  [4/5]: diff-highlight: match multi-line hunks
  [5/5]: diff-highlight: document some non-optimal cases

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 1/5] diff-highlight: make perl strict and warnings fatal
  2012-02-13 22:27           ` Jeff King
@ 2012-02-13 22:28             ` Jeff King
  2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
                               ` (5 subsequent siblings)
  6 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:28 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

These perl features can catch bugs, and we shouldn't be
violating any of the strict rules or creating any warnings,
so let's turn them on.

Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/diff-highlight |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index d893898..c3302dd 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -1,5 +1,8 @@
 #!/usr/bin/perl
 
+use warnings FATAL => 'all';
+use strict;
+
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
 my $HIGHLIGHT   = "\x1b[7m";
-- 
1.7.8.4.17.g2df81

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 2/5] diff-highlight: don't highlight whole lines
  2012-02-13 22:27           ` Jeff King
  2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
@ 2012-02-13 22:32             ` Jeff King
  2012-02-14  6:35               ` Michal Kiedrowicz
  2012-02-13 22:33             ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
                               ` (4 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:32 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

If you have a change like:

  -foo
  +bar

we end up highlighting the entirety of both lines (since the
whole thing is changed). But the point of diff highlighting
is to pinpoint the specific change in a pair of lines that
are mostly identical. In this case, the highlighting is just
noise, since there is nothing to pinpoint, and we are better
off doing nothing.

The implementation looks for "interesting" pairs by checking
to see whether they actually have a matching prefix or
suffix that does not simply consist of colorization and
whitespace.  However, the implementation makes it easy to
plug in other heuristics, too, like:

  1. Depending on the source material, the set of "boring"
     characters could be tweaked to include language-specific
     stuff (like braces or semicolons for C).

  2. Instead of saying "an interesting line has at least one
     character of prefix or suffix", we could require that
     less than N percent of the line be highlighted.

The simple "ignore whitespace, and highlight if there are
any matched characters" implemented by this patch seems to
give good results on git.git. I'll leave experimentation
with other heuristics to somebody who has a dataset that
does not look good with the current code.

Based on an original idea and implementation by Michał
Kiedrowicz.

Signed-off-by: Jeff King <peff@peff.net>
---
Regarding attribution: I kept myself as author, because I rewrote it a
bit and figured that I would take primary responsibility for bugs in
this patch, and in the long run would be responsible for maintaining it.
But the idea and the substance of the patch are yours, and I would be
happy to list you as author if you prefer getting the credit that way
(after all, it bumps your shortlog numbers :) ).

The implementation is similar to yours, but pulls some of the decisions
out into a separate function to make tweaking the above heuristics
easier.

 contrib/diff-highlight/diff-highlight |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index c3302dd..0d8df84 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -8,6 +8,7 @@ use strict;
 my $HIGHLIGHT   = "\x1b[7m";
 my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
+my $BORING = qr/$COLOR|\s/;
 
 my @window;
 
@@ -104,8 +105,14 @@ sub show_pair {
 		}
 	}
 
-	print highlight(\@a, $pa, $sa);
-	print highlight(\@b, $pb, $sb);
+	if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
+		print highlight(\@a, $pa, $sa);
+		print highlight(\@b, $pb, $sb);
+	}
+	else {
+		print join('', @a);
+		print join('', @b);
+	}
 }
 
 sub split_line {
@@ -125,3 +132,20 @@ sub highlight {
 		@{$line}[($suffix+1)..$#$line]
 	);
 }
+
+# Pairs are interesting to highlight only if we are going to end up
+# highlighting a subset (i.e., not the whole line). Otherwise, the highlighting
+# is just useless noise. We can detect this by finding either a matching prefix
+# or suffix (disregarding boring bits like whitespace and colorization).
+sub is_pair_interesting {
+	my ($a, $pa, $sa, $b, $pb, $sb) = @_;
+	my $prefix_a = join('', @$a[0..($pa-1)]);
+	my $prefix_b = join('', @$b[0..($pb-1)]);
+	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
+	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
+
+	return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
+	       $prefix_b !~ /^$COLOR*\+$BORING*$/ ||
+	       $suffix_a !~ /^$BORING*$/ ||
+	       $suffix_b !~ /^$BORING*$/;
+}
-- 
1.7.8.4.17.g2df81

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks
  2012-02-13 22:27           ` Jeff King
  2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
  2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
@ 2012-02-13 22:33             ` Jeff King
  2012-02-13 22:36             ` [PATCH 4/5] diff-highlight: match " Jeff King
                               ` (3 subsequent siblings)
  6 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:33 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

The current code structure assumes that we will only look at
a pair of lines at any given time, and that the end result
should always be to output that pair. However, we want to
eventually handle multi-line hunks, which will involve
collating pairs of removed/added lines. Let's refactor the
code to return highlighted pairs instead of printing them.

Signed-off-by: Jeff King <peff@peff.net>
---
You did a similar refactoring in your patch, but I found pulling it out
made the next patch a lot more readable.

 contrib/diff-highlight/diff-highlight |   22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 0d8df84..279d211 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -23,7 +23,7 @@ while (<>) {
 	    $window[2] =~ /^$COLOR*\+/ &&
 	    $window[3] !~ /^$COLOR*\+/) {
 		print shift @window;
-		show_pair(shift @window, shift @window);
+		show_hunk(shift @window, shift @window);
 	}
 	else {
 		print shift @window;
@@ -48,7 +48,7 @@ if (@window == 3 &&
     $window[1] =~ /^$COLOR*-/ &&
     $window[2] =~ /^$COLOR*\+/) {
 	print shift @window;
-	show_pair(shift @window, shift @window);
+	show_hunk(shift @window, shift @window);
 }
 
 # And then flush any remaining lines.
@@ -58,7 +58,13 @@ while (@window) {
 
 exit 0;
 
-sub show_pair {
+sub show_hunk {
+	my ($a, $b) = @_;
+
+	print highlight_pair($a, $b);
+}
+
+sub highlight_pair {
 	my @a = split_line(shift);
 	my @b = split_line(shift);
 
@@ -106,12 +112,12 @@ sub show_pair {
 	}
 
 	if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
-		print highlight(\@a, $pa, $sa);
-		print highlight(\@b, $pb, $sb);
+		return highlight_line(\@a, $pa, $sa),
+		       highlight_line(\@b, $pb, $sb);
 	}
 	else {
-		print join('', @a);
-		print join('', @b);
+		return join('', @a),
+		       join('', @b);
 	}
 }
 
@@ -121,7 +127,7 @@ sub split_line {
 	       split /($COLOR*)/;
 }
 
-sub highlight {
+sub highlight_line {
 	my ($line, $prefix, $suffix) = @_;
 
 	return join('',
-- 
1.7.8.4.17.g2df81

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 4/5] diff-highlight: match multi-line hunks
  2012-02-13 22:27           ` Jeff King
                               ` (2 preceding siblings ...)
  2012-02-13 22:33             ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
@ 2012-02-13 22:36             ` Jeff King
  2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
                               ` (2 subsequent siblings)
  6 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:36 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

Currently we only bother highlighting single-line hunks. The
rationale was that the purpose of highlighting is to point
out small changes between two similar lines that are
otherwise hard to see. However, that meant we missed similar
cases where two lines were changed together, like:

   -foo(buf);
   -bar(buf);
   +foo(obj->buf);
   +bar(obj->buf);

Each of those changes is simple, and would benefit from
highlighting (the "obj->" parts in this case).

This patch considers whole hunks at a time. For now, we
consider only the case where the hunk has the same number of
removed and added lines, and assume that the lines from each
segment correspond one-to-one. While this is just a
heuristic, in practice it seems to generate sensible
results (especially because we now omit highlighting on
completely-changed lines, so when our heuristic is wrong, we
tend to avoid highlighting at all).

Based on an original idea and implementation by Michał
Kiedrowicz.

Signed-off-by: Jeff King <peff@peff.net>
---
Same attribution statement applies as to patch 2 (in fact, patches 1 and
3 could be attributed to you, too).

This version has the missing documentation fixes.  The implementation is
a little different than yours. I rearranged the parsing in a manner that
was a little more obvious to me, and I pulled out the "don't highlight
if the number of lines don't match" case into its own conditional, which
makes it more obvious where to work if somebody wants to try doing
something fancier.

 contrib/diff-highlight/README         |   16 ++++---
 contrib/diff-highlight/diff-highlight |   70 ++++++++++++++++++++-------------
 2 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 1b7b6df..4a58579 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -14,13 +14,15 @@ Instead, this script post-processes the line-oriented diff, finds pairs
 of lines, and highlights the differing segments.  It's currently very
 simple and stupid about doing these tasks. In particular:
 
-  1. It will only highlight a pair of lines if they are the only two
-     lines in a hunk.  It could instead try to match up "before" and
-     "after" lines for a given hunk into pairs of similar lines.
-     However, this may end up visually distracting, as the paired
-     lines would have other highlighted lines in between them. And in
-     practice, the lines which most need attention called to their
-     small, hard-to-see changes are touching only a single line.
+  1. It will only highlight hunks in which the number of removed and
+     added lines is the same, and it will pair lines within the hunk by
+     position (so the first removed line is compared to the first added
+     line, and so forth). This is simple and tends to work well in
+     practice. More complex changes don't highlight well, so we tend to
+     exclude them due to the "same number of removed and added lines"
+     restriction. Or even if we do try to highlight them, they end up
+     not highlighting because of our "don't highlight if the whole line
+     would be highlighted" rule.
 
   2. It will find the common prefix and suffix of two lines, and
      consider everything in the middle to be "different". It could
diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 279d211..c4404d4 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -10,23 +10,28 @@ my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
-my @window;
+my @removed;
+my @added;
+my $in_hunk;
 
 while (<>) {
-	# We highlight only single-line changes, so we need
-	# a 4-line window to make a decision on whether
-	# to highlight.
-	push @window, $_;
-	next if @window < 4;
-	if ($window[0] =~ /^$COLOR*(\@| )/ &&
-	    $window[1] =~ /^$COLOR*-/ &&
-	    $window[2] =~ /^$COLOR*\+/ &&
-	    $window[3] !~ /^$COLOR*\+/) {
-		print shift @window;
-		show_hunk(shift @window, shift @window);
+	if (!$in_hunk) {
+		print;
+		$in_hunk = /^$COLOR*\@/;
+	}
+	elsif (/^$COLOR*-/) {
+		push @removed, $_;
+	}
+	elsif (/^$COLOR*\+/) {
+		push @added, $_;
 	}
 	else {
-		print shift @window;
+		show_hunk(\@removed, \@added);
+		@removed = ();
+		@added = ();
+
+		print;
+		$in_hunk = /^$COLOR*[\@ ]/;
 	}
 
 	# Most of the time there is enough output to keep things streaming,
@@ -42,26 +47,37 @@ while (<>) {
 	}
 }
 
-# Special case a single-line hunk at the end of file.
-if (@window == 3 &&
-    $window[0] =~ /^$COLOR*(\@| )/ &&
-    $window[1] =~ /^$COLOR*-/ &&
-    $window[2] =~ /^$COLOR*\+/) {
-	print shift @window;
-	show_hunk(shift @window, shift @window);
-}
-
-# And then flush any remaining lines.
-while (@window) {
-	print shift @window;
-}
+# Flush any queued hunk (this can happen when there is no trailing context in
+# the final diff of the input).
+show_hunk(\@removed, \@added);
 
 exit 0;
 
 sub show_hunk {
 	my ($a, $b) = @_;
 
-	print highlight_pair($a, $b);
+	# If one side is empty, then there is nothing to compare or highlight.
+	if (!@$a || !@$b) {
+		print @$a, @$b;
+		return;
+	}
+
+	# If we have mismatched numbers of lines on each side, we could try to
+	# be clever and match up similar lines. But for now we are simple and
+	# stupid, and only handle multi-line hunks that remove and add the same
+	# number of lines.
+	if (@$a != @$b) {
+		print @$a, @$b;
+		return;
+	}
+
+	my @queue;
+	for (my $i = 0; $i < @$a; $i++) {
+		my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
+		print $rm;
+		push @queue, $add;
+	}
+	print @queue;
 }
 
 sub highlight_pair {
-- 
1.7.8.4.17.g2df81

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 5/5] diff-highlight: document some non-optimal cases
  2012-02-13 22:27           ` Jeff King
                               ` (3 preceding siblings ...)
  2012-02-13 22:36             ` [PATCH 4/5] diff-highlight: match " Jeff King
@ 2012-02-13 22:37             ` Jeff King
  2012-02-14  6:48               ` Michal Kiedrowicz
  2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
  2012-02-14  6:28             ` Michal Kiedrowicz
  6 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2012-02-13 22:37 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

The diff-highlight script works on heuristics, so it can be
wrong. Let's document some of the wrong-ness in case
somebody feels like working on it.

Signed-off-by: Jeff King <peff@peff.net>
---
These were just some that I considered while looking at the output of
the original and the current code. Suggestions are welcome for more.

 contrib/diff-highlight/README |   93 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/contrib/diff-highlight/README b/contrib/diff-highlight/README
index 4a58579..502e03b 100644
--- a/contrib/diff-highlight/README
+++ b/contrib/diff-highlight/README
@@ -57,3 +57,96 @@ following in your git configuration:
 	show = diff-highlight | less
 	diff = diff-highlight | less
 ---------------------------------------------
+
+Bugs
+----
+
+Because diff-highlight relies on heuristics to guess which parts of
+changes are important, there are some cases where the highlighting is
+more distracting than useful. Fortunately, these cases are rare in
+practice, and when they do occur, the worst case is simply a little
+extra highlighting. This section documents some cases known to be
+sub-optimal, in case somebody feels like working on improving the
+heuristics.
+
+1. Two changes on the same line get highlighted in a blob. For example,
+   highlighting:
+
+----------------------------------------------
+-foo(buf, size);
++foo(obj->buf, obj->size);
+----------------------------------------------
+
+   yields (where the inside of "+{}" would be highlighted):
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->buf, obj->}size);
+----------------------------------------------
+
+   whereas a more semantically meaningful output would be:
+
+----------------------------------------------
+-foo(buf, size);
++foo(+{obj->}buf, +{obj->}size);
+----------------------------------------------
+
+   Note that doing this right would probably involve a set of
+   content-specific boundary patterns, similar to word-diff. Otherwise
+   you get junk like:
+
+-----------------------------------------------------
+-this line has some -{i}nt-{ere}sti-{ng} text on it
++this line has some +{fa}nt+{a}sti+{c} text on it
+-----------------------------------------------------
+
+   which is less readable than the current output.
+
+2. The multi-line matching assumes that lines in the pre- and post-image
+   match by position. This is often the case, but can be fooled when a
+   line is removed from the top and a new one added at the bottom (or
+   vice versa). Unless the lines in the middle are also changed, diffs
+   will show this as two hunks, and it will not get highlighted at all
+   (which is good). But if the lines in the middle are changed, the
+   highlighting can be misleading. Here's a pathological case:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two 2
++three 3
++four 4
++five 5
+-----------------------------------------------------
+
+   which gets highlighted as:
+
+-----------------------------------------------------
+-one
+-t-{wo}
+-three
+-f-{our}
++two 2
++t+{hree 3}
++four 4
++f+{ive 5}
+-----------------------------------------------------
+
+   because it matches "two" to "three 3", and so forth. It would be
+   nicer as:
+
+-----------------------------------------------------
+-one
+-two
+-three
+-four
++two +{2}
++three +{3}
++four +{4}
++five 5
+-----------------------------------------------------
+
+   which would probably involve pre-matching the lines into pairs
+   according to some heuristic.
-- 
1.7.8.4.17.g2df81

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-13 22:27           ` Jeff King
                               ` (4 preceding siblings ...)
  2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
@ 2012-02-14  0:05             ` Junio C Hamano
  2012-02-14  0:22               ` Jeff King
  2012-02-14  6:28             ` Michal Kiedrowicz
  6 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2012-02-14  0:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Michał Kiedrowicz, git

Jeff King <peff@peff.net> writes:

> I ended up pulling your changes out into a few distinct commits. That
> made it easier for me to review and understand what was going on (and
> hopefully ditto for other reviewers, or people who end up bisecting or
> reading the log later). I'll post that series in a moment.

This is all nice.  As I am lazy and this is a long neglected contrib/
material I didn't even know it existed ;-), I am tempted to apply them
directly on top of 'master'.

This shows the first hunk of your "diff-highlight: refactor to prepare for
multi-line hunks" like this to me, by the way.

@@ -23,7 +23,7 @@ while (<>) {
            $window[2] =~ /^$COLOR*\+/ &&
            $window[3] !~ /^$COLOR*\+/) {
                print shift @window;
{-               show_pair}(shift @window, shift @window);
+               show_{hunk}(shift @window, shift @window);
        }
        else {
                print shift @window;

Is this intended, or is setting "diff.color.old = red reverse" not
supported (without the custom configuration, the leading blank on the old
line is not highlighted)?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
@ 2012-02-14  0:22               ` Jeff King
  2012-02-14  1:19                 ` Junio C Hamano
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2012-02-14  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michał Kiedrowicz, git

On Mon, Feb 13, 2012 at 04:05:05PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I ended up pulling your changes out into a few distinct commits. That
> > made it easier for me to review and understand what was going on (and
> > hopefully ditto for other reviewers, or people who end up bisecting or
> > reading the log later). I'll post that series in a moment.
> 
> This is all nice.  As I am lazy and this is a long neglected contrib/
> material I didn't even know it existed ;-), I am tempted to apply them
> directly on top of 'master'.

Yeah, given the contrib nature, I'm comfortable just applying. I wanted
to wait on the attribution question from Michał before sending a final
version, though.

> This shows the first hunk of your "diff-highlight: refactor to prepare for
> multi-line hunks" like this to me, by the way.
> 
> @@ -23,7 +23,7 @@ while (<>) {
>             $window[2] =~ /^$COLOR*\+/ &&
>             $window[3] !~ /^$COLOR*\+/) {
>                 print shift @window;
> {-               show_pair}(shift @window, shift @window);
> +               show_{hunk}(shift @window, shift @window);
>         }
>         else {
>                 print shift @window;
> 
> Is this intended, or is setting "diff.color.old = red reverse" not
> supported (without the custom configuration, the leading blank on the old
> line is not highlighted)?

No, it's not intended, and should be:

   -                show-{pair}...
   +                show+{hunk}...

(and appears that way with my config).  I suspect it is because the
default highlight color is a subset of your "old" configured color. It
is ANSI "reverse video", but sadly that does not work as a toggle (i.e.,
it does not reverse your reverse, but simply is a no-op).

I chose reverse because I like the way it looks, and because it should
Just Work if people have selected alternate colors (I never dreamed
somebody would use "reverse" all the time, as I find it horribly ugly.
But to each his own).  We could read the highlight color from
diff.highlight{Old,New}. That would also let people highlight in purple
or something if they care.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-14  0:22               ` Jeff King
@ 2012-02-14  1:19                 ` Junio C Hamano
  2012-02-14  6:04                   ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2012-02-14  1:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Michał Kiedrowicz, git

Jeff King <peff@peff.net> writes:

> I chose reverse because I like the way it looks, and because it should
> Just Work if people have selected alternate colors (I never dreamed
> somebody would use "reverse" all the time, as I find it horribly ugly.
> But to each his own).

I also find it ugly, but I am on black-letters on white background window,
and I do not see my terminal's red very well, so it is hard to tell the
old from the context if I use the "diff.color.old=red" default; that is
the primary reason for my setting.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-14  1:19                 ` Junio C Hamano
@ 2012-02-14  6:04                   ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-14  6:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michał Kiedrowicz, git

On Mon, Feb 13, 2012 at 05:19:33PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I chose reverse because I like the way it looks, and because it should
> > Just Work if people have selected alternate colors (I never dreamed
> > somebody would use "reverse" all the time, as I find it horribly ugly.
> > But to each his own).
> 
> I also find it ugly, but I am on black-letters on white background window,
> and I do not see my terminal's red very well, so it is hard to tell the
> old from the context if I use the "diff.color.old=red" default; that is
> the primary reason for my setting.

I find black-on-white ugly, too, but I have heard some people find it
more readable. You might try setting color.diff.old to "bold red" to
make it more readable. Depending on your terminal, that may end up as a
brighter shade of red.

Configurable colors for diff-highlight would look like the patch below:

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index c4404d4..f43832b 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -2,11 +2,13 @@
 
 use warnings FATAL => 'all';
 use strict;
+use Git;
+
+my $repo = Git->repository;
+my $color_old = $repo->get_color('color.diff.highlightold', 'reverse');
+my $color_new = $repo->get_color('color.diff.highlightnew', 'reverse');
+my $color_end = $repo->get_color('color.diff.highlightend', 'noreverse');
 
-# Highlight by reversing foreground and background. You could do
-# other things like bold or underline if you prefer.
-my $HIGHLIGHT   = "\x1b[7m";
-my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
@@ -128,8 +130,8 @@ sub highlight_pair {
 	}
 
 	if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
-		return highlight_line(\@a, $pa, $sa),
-		       highlight_line(\@b, $pb, $sb);
+		return highlight_line(\@a, $pa, $sa, $color_old, $color_end),
+		       highlight_line(\@b, $pb, $sb, $color_new, $color_end);
 	}
 	else {
 		return join('', @a),
@@ -144,13 +146,13 @@ sub split_line {
 }
 
 sub highlight_line {
-	my ($line, $prefix, $suffix) = @_;
+	my ($line, $prefix, $suffix, $highlight, $unhighlight) = @_;
 
 	return join('',
 		@{$line}[0..($prefix-1)],
-		$HIGHLIGHT,
+		$highlight,
 		@{$line}[$prefix..$suffix],
-		$UNHIGHLIGHT,
+		$unhighlight,
 		@{$line}[($suffix+1)..$#$line]
 	);
 }

However, there are two problems:

  1. Git's color-parsing support does not understand the "noreverse"
     attribute. There is no way to turn off attributes short of doing a
     whole "reset". But we don't want to do that here, because we want
     whatever other colors were in effect to continue after the
     highlight ends. The patch to teach "noreverse" is below (though it
     should probably also teach "normal" and "noblink", too).

  2. The Git.pm:get_color code requires that we have a repository
     object, which means we will die() if we are not in a git
     repository. Yet "git config" will do the right thing whether we are
     in a repository or not. I would have thought all of the _maybe_self
     stuff in Git.pm would handle "Git->get_color" properly, but it
     doesn't. That's probably a bug that should be fixed.

I don't especially care about this feature, as I won't use it. But if
you are interested in using diff-highlight and the lack of configurable
colors is blocking you, then I at least know there will be one user and
I don't mind putting a little bit of time into it.

Here's the patch for (1) above.

diff --git a/color.c b/color.c
index e8e2681..b0f53a7 100644
--- a/color.c
+++ b/color.c
@@ -47,9 +47,9 @@ static int parse_color(const char *name, int len)
 
 static int parse_attr(const char *name, int len)
 {
-	static const int attr_values[] = { 1, 2, 4, 5, 7 };
+	static const int attr_values[] = { 1, 2, 4, 5, 7, 27 };
 	static const char * const attr_names[] = {
-		"bold", "dim", "ul", "blink", "reverse"
+		"bold", "dim", "ul", "blink", "reverse", "noreverse"
 	};
 	int i;
 	for (i = 0; i < ARRAY_SIZE(attr_names); i++) {
@@ -128,7 +128,7 @@ void color_parse_mem(const char *value, int value_len, const char *var,
 			attr &= ~bit;
 			if (sep++)
 				*dst++ = ';';
-			*dst++ = '0' + i;
+			dst += sprintf(dst, "%d", i);
 		}
 		if (fg >= 0) {
 			if (sep++)

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH] diff-highlight: Work for multiline changes too
  2012-02-13 22:27           ` Jeff King
                               ` (5 preceding siblings ...)
  2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
@ 2012-02-14  6:28             ` Michal Kiedrowicz
  6 siblings, 0 replies; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-14  6:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:

> On Fri, Feb 10, 2012 at 10:47:13PM +0100, Michał Kiedrowicz wrote:
> 
> >  contrib/diff-highlight/diff-highlight |   96
> > ++++++++++++++++++++++----------- 1 files changed, 65
> > insertions(+), 31 deletions(-)
> 
> Thanks for sending. I looked at a whole bunch of patches, and I was
> pleasantly surprised to find how infrequently we hit false positives
> in practice. 

Yeah, I completely agree with that.

> In fact, the only things that looked worse with your
> patch were places where your patch happened to turn on highlighting
> for lines where the existing heuristics already were a little ugly
> (i.e., the problem was not your patch, but that the existing
> heuristic is sometimes non-optimal).
> 
> I ended up pulling your changes out into a few distinct commits. That
> made it easier for me to review and understand what was going on (and
> hopefully ditto for other reviewers, or people who end up bisecting or
> reading the log later). I'll post that series in a moment.

Very nicely done.

> 
> > After looking at outputs I noticed that it can also ignore lines
> > with prefixes/suffixes that consist only of punctuation (asterisk,
> > semicolon, dot, etc), because otherwise whole line is highlighted
> > except for terminating punctuation.
> 
> I missed this note when I applied the patch and started looking at the
> outputs, and ended up having a similar thought. However, I don't know
> that it buys much in practice, and it's nice to be fairly agnostic
> about content. I did leave that open to easy tweaking in my series,
> though.
> 
>   [1/5]: diff-highlight: make perl strict and warnings fatal
>   [2/5]: diff-highlight: don't highlight whole lines
>   [3/5]: diff-highlight: refactor to prepare for multi-line hunks
>   [4/5]: diff-highlight: match multi-line hunks
>   [5/5]: diff-highlight: document some non-optimal cases
> 
> -Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/5] diff-highlight: don't highlight whole lines
  2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
@ 2012-02-14  6:35               ` Michal Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-14  6:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:

> Regarding attribution: I kept myself as author, because I rewrote it a
> bit and figured that I would take primary responsibility for bugs in
> this patch, and in the long run would be responsible for maintaining
> it. But the idea and the substance of the patch are yours, and I
> would be happy to list you as author if you prefer getting the credit
> that way (after all, it bumps your shortlog numbers :) ).

I'm completely OK with that. I don't care much about shortlog :). It's
nice enough to see that you liked my idea :) 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 5/5] diff-highlight: document some non-optimal cases
  2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
@ 2012-02-14  6:48               ` Michal Kiedrowicz
  0 siblings, 0 replies; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-14  6:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:

> The diff-highlight script works on heuristics, so it can be
> wrong. Let's document some of the wrong-ness in case
> somebody feels like working on it.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> These were just some that I considered while looking at the output of
> the original and the current code. Suggestions are welcome for more.

I would also add (feel free to reword if my English is not very clear):

3. Sometimes the prefix or suffix is very small but because it exists,
almost whole line is highlighted. For example:

----------------------------------------------
--{This is a very long line}.
++{I like apples}.
----------------------------------------------

or:

----------------------------------------------
-Th-{is is an apple}
+Th+{at was a car}
----------------------------------------------


> 
>  contrib/diff-highlight/README |   93
> +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93
> insertions(+)
> 
> diff --git a/contrib/diff-highlight/README
> b/contrib/diff-highlight/README index 4a58579..502e03b 100644
> --- a/contrib/diff-highlight/README
> +++ b/contrib/diff-highlight/README
> @@ -57,3 +57,96 @@ following in your git configuration:
>  	show = diff-highlight | less
>  	diff = diff-highlight | less
>  ---------------------------------------------
> +
> +Bugs
> +----
> +
> +Because diff-highlight relies on heuristics to guess which parts of
> +changes are important, there are some cases where the highlighting is
> +more distracting than useful. Fortunately, these cases are rare in
> +practice, and when they do occur, the worst case is simply a little
> +extra highlighting. This section documents some cases known to be
> +sub-optimal, in case somebody feels like working on improving the
> +heuristics.
> +
> +1. Two changes on the same line get highlighted in a blob. For
> example,
> +   highlighting:
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(obj->buf, obj->size);
> +----------------------------------------------
> +
> +   yields (where the inside of "+{}" would be highlighted):
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(+{obj->buf, obj->}size);
> +----------------------------------------------
> +
> +   whereas a more semantically meaningful output would be:
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(+{obj->}buf, +{obj->}size);
> +----------------------------------------------
> +
> +   Note that doing this right would probably involve a set of
> +   content-specific boundary patterns, similar to word-diff.
> Otherwise
> +   you get junk like:
> +
> +-----------------------------------------------------
> +-this line has some -{i}nt-{ere}sti-{ng} text on it
> ++this line has some +{fa}nt+{a}sti+{c} text on it
> +-----------------------------------------------------
> +
> +   which is less readable than the current output.
> +
> +2. The multi-line matching assumes that lines in the pre- and
> post-image
> +   match by position. This is often the case, but can be fooled when
> a
> +   line is removed from the top and a new one added at the bottom (or
> +   vice versa). Unless the lines in the middle are also changed,
> diffs
> +   will show this as two hunks, and it will not get highlighted at
> all
> +   (which is good). But if the lines in the middle are changed, the
> +   highlighting can be misleading. Here's a pathological case:
> +
> +-----------------------------------------------------
> +-one
> +-two
> +-three
> +-four
> ++two 2
> ++three 3
> ++four 4
> ++five 5
> +-----------------------------------------------------
> +
> +   which gets highlighted as:
> +
> +-----------------------------------------------------
> +-one
> +-t-{wo}
> +-three
> +-f-{our}
> ++two 2
> ++t+{hree 3}
> ++four 4
> ++f+{ive 5}
> +-----------------------------------------------------
> +
> +   because it matches "two" to "three 3", and so forth. It would be
> +   nicer as:
> +
> +-----------------------------------------------------
> +-one
> +-two
> +-three
> +-four
> ++two +{2}
> ++three +{3}
> ++four +{4}
> ++five 5
> +-----------------------------------------------------
> +
> +   which would probably involve pre-matching the lines into pairs
> +   according to some heuristic.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-10 13:23   ` Jakub Narebski
  2012-02-10 14:15     ` Michał Kiedrowicz
@ 2012-02-14  6:54     ` Michal Kiedrowicz
  2012-02-14  7:14       ` Junio C Hamano
  1 sibling, 1 reply; 67+ messages in thread
From: Michal Kiedrowicz @ 2012-02-14  6:54 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski, Jeff King

Jakub Narebski <jnareb@gmail.com> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@gmail.com> writes:
> 
> > The code that comares lines is based on
> > contrib/diff-highlight/diff-highlight, except that it works with
> > multiline changes too.  It also won't highlight lines that are
> > completely different because that would only make the output
> > unreadable. Combined diffs are not supported but a following commit
> > will change it.
> 
> I was thinking that if gitweb were to support "diff refinement
> highlighting", it would either use one of *Diff* packages from CPAN,
> or "git diff --word-diff" output.
>  

I just started to wonder if we couldn't use output from Jeff's
diff-highlight for gitweb. We could tech diff-highlight to produce diffs
marked with -{} and +{} (this is the notation used by Jeff in one of his
recent patches) or something like this and then just convert that into
HTML markup. Changes in gitweb would be minimal, we would reduce
redundancy and could focus on improving matching algorithm in one place.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14  6:54     ` Michal Kiedrowicz
@ 2012-02-14  7:14       ` Junio C Hamano
  2012-02-14  8:20         ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2012-02-14  7:14 UTC (permalink / raw)
  To: Michal Kiedrowicz; +Cc: git, Jakub Narebski, Jeff King

Michal Kiedrowicz <michal.kiedrowicz@gmail.com> writes:

> I just started to wonder if we couldn't use output from Jeff's
> diff-highlight for gitweb.

That could be a sensible approach, but

> We could tech diff-highlight to produce diffs
> marked with -{} and +{} (this is the notation used by Jeff in one of his
> recent patches) or something like this and then just convert that into
> HTML markup.

this implementation strategy would not work well, given that the payload
can contain arbitrary letter sequence (e.g. a Perl script that wants be
explicit when writing a hashref literal write +{...}  to disambiguate it
from a block).  If you are going to modularize diff-highlight and reuse
it, it needs to learn how to talk HTML to properly escape the payload.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14  7:14       ` Junio C Hamano
@ 2012-02-14  8:20         ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-14  8:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michal Kiedrowicz, git, Jakub Narebski

On Mon, Feb 13, 2012 at 11:14:22PM -0800, Junio C Hamano wrote:

> > We could tech diff-highlight to produce diffs
> > marked with -{} and +{} (this is the notation used by Jeff in one of his
> > recent patches) or something like this and then just convert that into
> > HTML markup.
> 
> this implementation strategy would not work well, given that the payload
> can contain arbitrary letter sequence (e.g. a Perl script that wants be
> explicit when writing a hashref literal write +{...}  to disambiguate it
> from a block).  If you are going to modularize diff-highlight and reuse
> it, it needs to learn how to talk HTML to properly escape the payload.

They're both written in perl; perhaps a more sensible solution would be
to lib-ify diff-highlight and use it directly inside gitweb.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-13 21:09         ` Michał Kiedrowicz
@ 2012-02-14 17:31           ` Jakub Narebski
  2012-02-14 18:23             ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jakub Narebski @ 2012-02-14 17:31 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: git

A few issues I have just noticed about this series.


First, about naming.  "Highlighting interesting parts of diff" is
acceptable name, but "syntax highlighting for diff" is not: gitweb
already use syntax highlighting in diff views.  Call it "diff refinement
highlighting", "highlighting changes / changed sections", "intraline
highlighting".


Second, I think (but I am not sure) that there is a bug in code that
finds common suffix and prefix.

If I understand correctly the idea is to highlight changed part if
there is at least one of common non-whitespace suffix or prefix.
So syntax highlighting should look like this:

1. Both prefix and suffix are non empty and non whitespace only

  -foo -{bar} baz
  +foo +{quux} baz

2. Non empty and non whitespace only prefix

  -foo -{bar}
  +foo +{quux}

2. Non empty and non whitespace only suffix

  --{bar} baz
  ++{quux} baz

But in your code $prefix is not the length of common prefix, but
the position of end of prefix in the original line of diff.  So
you start with $prefix = 1... even though the prefix is empty.

How is that supposed to work?


On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:

> > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > these ones:
> > > 
> > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > http://trac.edgewall.org/changeset/10973
[...]
> > BTW GitHub is closed source, but we can check what algorithm does Trac
> > use for diff refinement highlighting (highlighting changed portions of
> > diff).
> > 
> 
> I think it's
> http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> (see markup intraline_changes()).
 
It is get_change_extent() that finds extent of changes, as a pair
containing the offset at which the changes start, and the negative offset
at which the changes end.  So it is the same solution you use, only
without ignoring whitespace-only prefixes and suffixes...  This code can
be easily ported to Perl, BTW.

The markup_intraline_changes() function compares lines from preimage and
from postimage pairwise, requiring that number of lines matches, the same
like in your algorithm.

[...]
> > I wonder if we can use --diff-words for diff refinement highlighting,
> > i.e. LCS on words.
> 
> I think we can try it, but I worry about performance of running `git
> diff` on every diff chunk.

I was thinking about one single additional run of git-diff-tree with
`--diff-words`, not one per chunk.

Or perhaps even put it together in one git-diff-tree invocation, just like
'commitdiff' action / git_commitdiff() subroutine uses single git-diff-tree
invocation, with the option "--patch-with-raw", to generate both raw diff
for difftree and patchset.
 
> > Anyway Jeff's approach is a bit limited, in that it would work only for
> > change that does not involve adding newlines, for example splitting
> > overly long line when changing something.
> > 
> > See for example line 1786 (in pre-image) in http://trac.edgewall.org/changeset/10973
> 
> Yes, I'm aware of that. I was thinking about improving it later ("Let's
> start with a simple refinment highlightning and maybe later add more
> sophisticated algorithms").

Right.
 
[...]
> > BTW. is it "at least one of prefix or suffix are non-empty" or "both prefix
> > and suffix are non-empty"?
> > 
> 
> At least one. See:
> 
> 	-a = 42;
> 	+b = 42;
> 
> Here prefix is empty but suffix is not.

Nb. prefix is empty but $prefix == 1, and is boolean true.

> > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > 
> > > 	- * I like you.
> > > 	+ * Alice had a little lamb.
> > 
> > But this patch doesn't implement this feature yet, isn't it?
> 
> No, but is a matter of adding
> 
> 	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> 	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> 
> (and the same for suffix)

All right.  But it is better added as separate patch.  Perhaps even
requiring that not only there is at least one of common prefix or common
suffix, but at least one of them is not whitespace only could be put
in a separate commit...

> > > > > +sub format_rem_add_line {
> > > > > +	my ($rem, $add) = @_;
> > > > > +	my @r = split(//, $rem);
> > > > > +	my @a = split(//, $add);
> > 
> > BTW the name of variable can be just @add and @rem.
> >
> 
> I know they are different scopes but I don't like it. It makes the code
> more confusing IMO. But I won't insist.

In my opinion if the variable refers to the same entity in different
forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
used here) is all right, and even better than trying to come up with
different name for the same thing because of sigil.
 
> > > > Shouldn't
> > > > $prefix / $prefix_len start from 0, not from 1?
> > > 
> > > It starts from 1 because it skips first +/-. It should become obvious
> > > after reading the comment from last patch :).

This means that $prefix is true even if prefix is empty ($prefix == 1).
Wouldn't it be better for $prefix_len to count length of true prefix,
without diff adornment?  Or make @r / @rem skip initial characters...

> > > +	# In combined diff we must ignore two +/- characters.
> > > +	$prefix = 2 if ($is_combined);
> > 
> > Anyway comment about that fact would be nice.
> 
> Will do.

BTW. it is not "2" but "scalar @{$co{'parents'}}".


> > > The splitting and comparing by characters is taken from diff-highlight.
> > > I don't think it's worth changing here.
> > 
> > You are right.
> > 
> > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > and benchmark it comparing to your C-like solution, but it can be left for
> > later (simple is better than clever, usually).
> 
> If you have time :).

Anyway it would be separate commit.  Better to just copy tested code
from contrib/diff-highlight

BTW. would "git blame -C -C -C -w" detect this correctly as code
movement^W copying?
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14 17:31           ` Jakub Narebski
@ 2012-02-14 18:23             ` Michał Kiedrowicz
  2012-02-14 18:52               ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-14 18:23 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> A few issues I have just noticed about this series.
> 
> 
> First, about naming.  "Highlighting interesting parts of diff" is
> acceptable name, but "syntax highlighting for diff" is not: gitweb
> already use syntax highlighting in diff views.  Call it "diff refinement
> highlighting", "highlighting changes / changed sections", "intraline
> highlighting".

OK, I'll make sure the naming is correct and consistent.
> 
> 
> Second, I think (but I am not sure) that there is a bug in code that
> finds common suffix and prefix.
> 
> If I understand correctly the idea is to highlight changed part if
> there is at least one of common non-whitespace suffix or prefix.
> So syntax highlighting should look like this:
> 
> 1. Both prefix and suffix are non empty and non whitespace only
> 
>   -foo -{bar} baz
>   +foo +{quux} baz
> 
> 2. Non empty and non whitespace only prefix
> 
>   -foo -{bar}
>   +foo +{quux}
> 
> 2. Non empty and non whitespace only suffix
> 
>   --{bar} baz
>   ++{quux} baz
> 
> But in your code $prefix is not the length of common prefix, but
> the position of end of prefix in the original line of diff.  So
> you start with $prefix = 1... even though the prefix is empty.
> 
> How is that supposed to work?

But see the check:

+	# Mark lines that are different from each other, but have some common
+	# part that isn't whitespace.  If lines are completely different, don't
+	# mark them because that would make output unreadable, especially if
+	# diff consists of multiple lines.
+	if (($prefix == 1 && $suffix_rem == $#r && $suffix_add == $#a)
                        ^
                        |
                        This part.

+		|| ($prefix_is_space && $suffix_is_space)) {
+		$esc_rem = esc_html($rem);
+		$esc_add = esc_html($add);
+	} else {
+		$esc_rem = esc_html_mark_range(\@r, $prefix, $suffix_rem);
+		$esc_add = esc_html_mark_range(\@a, $prefix, $suffix_add);
+	}

I guess it's still not correct because it should be equal to number of
parents or $prefix_length should start from 0 like you wrote later.

> 
> 
> On Mon, 13 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@gmail.com> wrote:
> > > On Mon, 13 Feb 2012, Michal Kiedrowicz wrote:
> 
> > > > I haven't found *examples* on GitHub and Trac sites, but what about
> > > > these ones:
> > > > 
> > > > https://github.com/gitster/git/commit/8cad4744ee37ebec1d9491a1381ec1771a1ba795
> > > > http://trac.edgewall.org/changeset/10973
> [...]
> > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > use for diff refinement highlighting (highlighting changed portions of
> > > diff).
> > > 
> > 
> > I think it's
> > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > (see markup intraline_changes()).
>  
> It is get_change_extent() that finds extent of changes, as a pair
> containing the offset at which the changes start, and the negative offset
> at which the changes end.  So it is the same solution you use, only
> without ignoring whitespace-only prefixes and suffixes...  This code can
> be easily ported to Perl, BTW.
> 
> The markup_intraline_changes() function compares lines from preimage and
> from postimage pairwise, requiring that number of lines matches, the same
> like in your algorithm.
> 

So using Jeff's diff-highlight we remain quite consistent with Trac
output. There's nothing we can "steal" from it. 

> > > > I would also consider ignoring prefixes/suffixes with punctuation, like:
> > > > 
> > > > 	- * I like you.
> > > > 	+ * Alice had a little lamb.
> > > 
> > > But this patch doesn't implement this feature yet, isn't it?
> > 
> > No, but is a matter of adding
> > 
> > 	-$prefix_is_space = 0 if ($r[$prefix] !~ /\s/);
> > 	+$prefix_is_space = 0 if ($r[$prefix] !~ /\s|[[:punct:]]/);
> > 
> > (and the same for suffix)
> 
> All right.  But it is better added as separate patch.

Sure 

> Perhaps even
> requiring that not only there is at least one of common prefix or common
> suffix, but at least one of them is not whitespace only could be put
> in a separate commit...
> 
> > > > > > +sub format_rem_add_line {
> > > > > > +	my ($rem, $add) = @_;
> > > > > > +	my @r = split(//, $rem);
> > > > > > +	my @a = split(//, $add);
> > > 
> > > BTW the name of variable can be just @add and @rem.
> > >
> > 
> > I know they are different scopes but I don't like it. It makes the code
> > more confusing IMO. But I won't insist.
> 
> In my opinion if the variable refers to the same entity in different
> forms, using @foo and %foo (used in gitweb), or $foo and @foo (could be
> used here) is all right, and even better than trying to come up with
> different name for the same thing because of sigil.

OK. I'll follow that convention then.

>  
> > > > > Shouldn't
> > > > > $prefix / $prefix_len start from 0, not from 1?
> > > > 
> > > > It starts from 1 because it skips first +/-. It should become obvious
> > > > after reading the comment from last patch :).
> 
> This means that $prefix is true even if prefix is empty ($prefix == 1).
> Wouldn't it be better for $prefix_len to count length of true prefix,
> without diff adornment?  Or make @r / @rem skip initial characters...
> 
> > > > +	# In combined diff we must ignore two +/- characters.
> > > > +	$prefix = 2 if ($is_combined);
> > > 
> > > Anyway comment about that fact would be nice.
> > 
> > Will do.
> 
> BTW. it is not "2" but "scalar @{$co{'parents'}}".
> 

OK.

> 
> > > > The splitting and comparing by characters is taken from diff-highlight.
> > > > I don't think it's worth changing here.
> > > 
> > > You are right.
> > > 
> > > I'll try to come with hacky algorithm using string bitwise xor and regexp,
> > > and benchmark it comparing to your C-like solution, but it can be left for
> > > later (simple is better than clever, usually).
> > 
> > If you have time :).
> 
> Anyway it would be separate commit.  Better to just copy tested code
> from contrib/diff-highlight
> 
> BTW. would "git blame -C -C -C -w" detect this correctly as code
> movement^W copying?
>  

Cannot say. Have you considered what I wrote in a separate e-mail,
about using diff-highlight output directly / as a library? 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14 18:23             ` Michał Kiedrowicz
@ 2012-02-14 18:52               ` Jeff King
  2012-02-14 20:04                 ` Michał Kiedrowicz
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2012-02-14 18:52 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: Jakub Narebski, git

On Tue, Feb 14, 2012 at 07:23:40PM +0100, Michał Kiedrowicz wrote:

> > > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > > use for diff refinement highlighting (highlighting changed portions of
> > > > diff).
> > > > 
> > > I think it's
> > > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > > (see markup intraline_changes()).
> [...]
> [...]
> > The markup_intraline_changes() function compares lines from preimage and
> > from postimage pairwise, requiring that number of lines matches, the same
> > like in your algorithm.
> 
> So using Jeff's diff-highlight we remain quite consistent with Trac
> output. There's nothing we can "steal" from it.

Neat. When I originally wrote diff-highlight, I was inspired by seeing
what Trac and GitHub did, but came up with the algorithm on my own.
Learning that Trac does the multiline thing (as we started doing with
recent patches) makes me feel even better that it's a good strategy.

As an aside, I looked at what GitHub does. It turns on highlighting only
for the single-line case, and does the same prefix/suffix thing.

It's really not very complex code; most of the hassle in diff-highlight
is ignoring (but preserving) embedded colors, so that we build on top of
existing colorization. A web tool will be doing the line coloring itself
anyway, so it can be much simpler. Elsewhere, I suggested lib-ifying
diff-highlight for gitweb to use. But once you remove the embedded color
handling, there really is not that much code, and it's probably simpler
to just rewrite it.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14 18:52               ` Jeff King
@ 2012-02-14 20:04                 ` Michał Kiedrowicz
  2012-02-14 20:38                   ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-14 20:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Jakub Narebski, git

Jeff King <peff@peff.net> wrote:

> On Tue, Feb 14, 2012 at 07:23:40PM +0100, Michał Kiedrowicz wrote:
> 
> > > > > BTW GitHub is closed source, but we can check what algorithm does Trac
> > > > > use for diff refinement highlighting (highlighting changed portions of
> > > > > diff).
> > > > > 
> > > > I think it's
> > > > http://trac.edgewall.org/browser/trunk/trac/versioncontrol/diff.py
> > > > (see markup intraline_changes()).
> > [...]
> > [...]
> > > The markup_intraline_changes() function compares lines from preimage and
> > > from postimage pairwise, requiring that number of lines matches, the same
> > > like in your algorithm.
> > 
> > So using Jeff's diff-highlight we remain quite consistent with Trac
> > output. There's nothing we can "steal" from it.
> 
> Neat. When I originally wrote diff-highlight, I was inspired by seeing
> what Trac and GitHub did, but came up with the algorithm on my own.
> Learning that Trac does the multiline thing (as we started doing with
> recent patches) makes me feel even better that it's a good strategy.
> 
> As an aside, I looked at what GitHub does. It turns on highlighting only
> for the single-line case, and does the same prefix/suffix thing.
> 
> It's really not very complex code; most of the hassle in diff-highlight
> is ignoring (but preserving) embedded colors, so that we build on top of
> existing colorization. A web tool will be doing the line coloring itself
> anyway, so it can be much simpler. Elsewhere, I suggested lib-ifying
> diff-highlight for gitweb to use. But once you remove the embedded color
> handling, there really is not that much code, and it's probably simpler
> to just rewrite it.
> 
> -Peff

Sure, now it's a simple algorithm, but if we add more code, we will
have problems with making it consistent in both gitweb and
diff-highlight (which is nice-to-have IMO). Note that my patches to
gitweb already support combined diffs (in obvious cases) while
diff-highlight will fail on them badly (see for example 09bb4eb4f14c).
I haven't done it in diff-highlight because I noticed that problem
while working on patches for gitweb.

Anyway, thanks for looking at Trac/GitHub code and sharing your opinion.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff
  2012-02-14 20:04                 ` Michał Kiedrowicz
@ 2012-02-14 20:38                   ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2012-02-14 20:38 UTC (permalink / raw)
  To: Michał Kiedrowicz; +Cc: Jakub Narebski, git

On Tue, Feb 14, 2012 at 09:04:53PM +0100, Michał Kiedrowicz wrote:

> Sure, now it's a simple algorithm, but if we add more code, we will
> have problems with making it consistent in both gitweb and
> diff-highlight (which is nice-to-have IMO).

True. I was thinking they would stay simple enough that porting features
wouldn't be too painful. But that might be overly optimistic.

> Note that my patches to gitweb already support combined diffs (in
> obvious cases) while diff-highlight will fail on them badly (see for
> example 09bb4eb4f14c).  I haven't done it in diff-highlight because I
> noticed that problem while working on patches for gitweb.

Hmm. 09bb4eb4f14c looks fine to me, because all of the combined lines
are from the left-hand side. A better example is 4802997, where the
first hunk properly highlights, but the second does not (because we
interpret the lines as context lines).

Even worse is "git show 5de89d3 -- notes-cache.c", where we match a "- "
line with a "++" line, and erroneously think the changes begin in the
second character (even though it is simply a combined-diff marker).

These are reasonably rare, so I don't consider them critical bugs, but
yeah, it would be nice to fix.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
  2012-02-11 15:53   ` Jakub Narebski
  2012-02-11 23:16     ` Michał Kiedrowicz
@ 2012-02-25  9:00     ` Michał Kiedrowicz
  1 sibling, 0 replies; 67+ messages in thread
From: Michał Kiedrowicz @ 2012-02-25  9:00 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:

> BTW. I didn't examine the final code, but what happens for binary
> diffs that git supports?  Is it handled outside print_diff_chunk()?

gitweb doesn't produce binary diffs at all except for "?a=patch", but
it isn't HTML'ed. So I don't think there anything I can do with it.

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2012-02-25  9:00 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-10  9:18 [PATCH 0/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 1/8] gitweb: Extract print_sidebyside_diff_lines() Michał Kiedrowicz
2012-02-11 15:20   ` Jakub Narebski
2012-02-11 23:03     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 2/8] gitweb: Use print_diff_chunk() for both side-by-side and inline diffs Michał Kiedrowicz
2012-02-11 15:53   ` Jakub Narebski
2012-02-11 23:16     ` Michał Kiedrowicz
2012-02-25  9:00     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 3/8] gitweb: Move HTML-formatting diff line back to process_diff_line() Michał Kiedrowicz
2012-02-11 16:02   ` Jakub Narebski
2012-02-10  9:18 ` [PATCH 4/8] gitweb: Push formatting diff lines to print_diff_chunk() Michał Kiedrowicz
2012-02-11 16:29   ` Jakub Narebski
2012-02-11 23:20     ` Michał Kiedrowicz
2012-02-11 23:30       ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 5/8] gitweb: Format diff lines just before printing Michał Kiedrowicz
2012-02-11 17:14   ` Jakub Narebski
2012-02-11 23:38     ` Michał Kiedrowicz
2012-02-10  9:18 ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Michał Kiedrowicz
2012-02-10 13:23   ` Jakub Narebski
2012-02-10 14:15     ` Michał Kiedrowicz
2012-02-10 14:55       ` Jakub Narebski
2012-02-10 17:33         ` Michał Kiedrowicz
2012-02-10 22:52           ` Splitting gitweb (was: Re: [PATCH 6/8] gitweb: Highlight interesting parts of diff) Jakub Narebski
2012-02-10 20:24         ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jeff King
2012-02-14  6:54     ` Michal Kiedrowicz
2012-02-14  7:14       ` Junio C Hamano
2012-02-14  8:20         ` Jeff King
2012-02-10 20:20   ` Jeff King
2012-02-10 21:29     ` Michał Kiedrowicz
2012-02-10 21:32       ` Jeff King
2012-02-10 21:36         ` Michał Kiedrowicz
2012-02-10 21:47         ` [PATCH] diff-highlight: Work for multiline changes too Michał Kiedrowicz
2012-02-13 22:27           ` Jeff King
2012-02-13 22:28             ` [PATCH 1/5] diff-highlight: make perl strict and warnings fatal Jeff King
2012-02-13 22:32             ` [PATCH 2/5] diff-highlight: don't highlight whole lines Jeff King
2012-02-14  6:35               ` Michal Kiedrowicz
2012-02-13 22:33             ` [PATCH 3/5] diff-highlight: refactor to prepare for multi-line hunks Jeff King
2012-02-13 22:36             ` [PATCH 4/5] diff-highlight: match " Jeff King
2012-02-13 22:37             ` [PATCH 5/5] diff-highlight: document some non-optimal cases Jeff King
2012-02-14  6:48               ` Michal Kiedrowicz
2012-02-14  0:05             ` [PATCH] diff-highlight: Work for multiline changes too Junio C Hamano
2012-02-14  0:22               ` Jeff King
2012-02-14  1:19                 ` Junio C Hamano
2012-02-14  6:04                   ` Jeff King
2012-02-14  6:28             ` Michal Kiedrowicz
2012-02-10 21:56     ` [PATCH 6/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 23:45   ` Jakub Narebski
2012-02-12 10:42     ` Jakub Narebski
2012-02-13  6:54       ` Michal Kiedrowicz
2012-02-13 19:58         ` Jakub Narebski
2012-02-13 21:10           ` Michał Kiedrowicz
2012-02-13  6:41     ` Michal Kiedrowicz
2012-02-13 18:44       ` Jakub Narebski
2012-02-13 21:09         ` Michał Kiedrowicz
2012-02-14 17:31           ` Jakub Narebski
2012-02-14 18:23             ` Michał Kiedrowicz
2012-02-14 18:52               ` Jeff King
2012-02-14 20:04                 ` Michał Kiedrowicz
2012-02-14 20:38                   ` Jeff King
2012-02-10  9:18 ` [PATCH 7/8] gitweb: Use different colors to present marked changes Michał Kiedrowicz
2012-02-12  0:11   ` Jakub Narebski
2012-02-13  6:46     ` Michal Kiedrowicz
2012-02-10  9:18 ` [PATCH 8/8] gitweb: Highlight combined diffs Michał Kiedrowicz
2012-02-12  0:03   ` Jakub Narebski
2012-02-13  6:48     ` Michal Kiedrowicz
2012-02-11 18:32 ` [PATCH 0/8] gitweb: Highlight interesting parts of diff Jakub Narebski
2012-02-11 22:56   ` Michał Kiedrowicz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.