All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] chainlint: improve annotated output
@ 2022-11-08 19:08 Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 1/4] chainlint: add explanatory comments Eric Sunshine via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Eric Sunshine via GitGitGadget @ 2022-11-08 19:08 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine

When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition, it
instead dumps out a parsed token representation of the test. Since it lacks
comments, indentation, here-doc bodies, and so forth, this tokenized
representation can be difficult for the test author to digest and relate
back to the original test definition.

An earlier patch series[1] improved the output somewhat by colorizing the
"?!FOO?!" annotations and the "# chainlint:" lines, but the output can still
be difficult to digest.

This patch series further improves the output by instead making chainlint.pl
annotate the original test definition rather than the parsed token stream,
thus preserving indentation (and whitespace, in general), here-doc bodies,
etc., which should make it easier for a test author to relate each problem
back to the source.

This series was inspired by usability comments from Peff[2] and Ævar[3] and
a bit of discussion which followed[4][5].

(Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)

FOOTNOTES

[1]
https://lore.kernel.org/git/pull.1324.v2.git.git.1663041707260.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/Yx1x5lme2SGBjfia@coredump.intra.peff.net/
[3] https://lore.kernel.org/git/221024.865yg9ecsx.gmgdl@evledraar.gmail.com/
[4]
https://lore.kernel.org/git/CAPig+cRJVn-mbA6-jOmNfDJtK_nX4ZTw+OcNShvvz8zcQYbCHQ@mail.gmail.com/
[5]
https://lore.kernel.org/git/CAPig+cT=cWYT6kicNWT+6RxfiKKMyVz72H3_9kwkF-f4Vuoe1w@mail.gmail.com/

Eric Sunshine (4):
  chainlint: add explanatory comments
  chainlint: tighten accuracy when consuming input stream
  chainlint: latch start/end position of each token
  chainlint: annotate original test definition rather than token stream

 t/chainlint.pl                                | 107 +++++++++++-------
 t/chainlint/block-comment.expect              |   2 +
 t/chainlint/case-comment.expect               |   3 +
 t/chainlint/close-subshell.expect             |   3 +-
 t/chainlint/comment.expect                    |   4 +
 t/chainlint/double-here-doc.expect            |  14 ++-
 t/chainlint/empty-here-doc.expect             |   3 +-
 t/chainlint/for-loop.expect                   |   4 +-
 t/chainlint/here-doc-close-subshell.expect    |   4 +-
 t/chainlint/here-doc-indent-operator.expect   |  10 +-
 .../here-doc-multi-line-command-subst.expect  |   5 +-
 t/chainlint/here-doc-multi-line-string.expect |   4 +-
 t/chainlint/here-doc.expect                   |  24 +++-
 t/chainlint/if-then-else.expect               |   4 +-
 t/chainlint/incomplete-line.expect            |  10 +-
 t/chainlint/inline-comment.expect             |   4 +-
 t/chainlint/loop-detect-status.expect         |   2 +-
 t/chainlint/nested-here-doc.expect            |  27 ++++-
 t/chainlint/nested-subshell-comment.expect    |   2 +
 t/chainlint/subshell-here-doc.expect          |  28 ++++-
 t/chainlint/t7900-subtree.expect              |   4 +
 t/chainlint/while-loop.expect                 |   4 +-
 22 files changed, 206 insertions(+), 66 deletions(-)


base-commit: 63bba4fdd86d80ef061c449daa97a981a9be0792
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1375%2Fsunshineco%2Fchainlintpreserve-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1375/sunshineco/chainlintpreserve-v1
Pull-Request: https://github.com/git/git/pull/1375
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] chainlint: add explanatory comments
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
@ 2022-11-08 19:08 ` Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 2/4] chainlint: tighten accuracy when consuming input stream Eric Sunshine via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Eric Sunshine via GitGitGadget @ 2022-11-08 19:08 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Eric Sunshine

From: Eric Sunshine <sunshine@sunshineco.com>

The logic in TestParser::accumulate() for detecting broken &&-chains is
mostly well-commented, but a couple branches which were deemed obvious
and straightforward lack comments. In retrospect, though, these cases
may give future readers pause, so comment them, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 976db4b8a01..9908de6c758 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -505,7 +505,11 @@ my @safe_endings = (
 
 sub accumulate {
 	my ($self, $tokens, $cmd) = @_;
+
+	# no previous command to check for missing "&&"
 	goto DONE unless @$tokens;
+
+	# new command is empty line; can't yet check if previous is missing "&&"
 	goto DONE if @$cmd == 1 && $$cmd[0] eq "\n";
 
 	# did previous command end with "&&", "|", "|| return" or similar?
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] chainlint: tighten accuracy when consuming input stream
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 1/4] chainlint: add explanatory comments Eric Sunshine via GitGitGadget
@ 2022-11-08 19:08 ` Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 3/4] chainlint: latch start/end position of each token Eric Sunshine via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Eric Sunshine via GitGitGadget @ 2022-11-08 19:08 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Eric Sunshine

From: Eric Sunshine <sunshine@sunshineco.com>

To extract the next token in the input stream, Lexer::scan_token() finds
the start of the token by skipping whitespace, then consumes characters
belonging to the token until it encounters a non-token character, such
as an operator, punctuation, or whitespace. In the case of an operator
or punctuation which ends a token, before returning the just-scanned
token, it pushes that operator or punctuation character back onto the
input stream to ensure that it will be the first character consumed by
the next call to scan_token().

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the token
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 9908de6c758..1f66c03c593 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -179,7 +179,7 @@ RESTART:
 		# handle special characters
 		last unless $$b =~ /\G(.)/sgc;
 		my $c = $1;
-		last if $c =~ /^[ \t]$/; # whitespace ends token
+		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
 		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
 		$token .= $self->scan_sqstring(), next if $c eq "'";
 		$token .= $self->scan_dqstring(), next if $c eq '"';
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] chainlint: latch start/end position of each token
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 1/4] chainlint: add explanatory comments Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 2/4] chainlint: tighten accuracy when consuming input stream Eric Sunshine via GitGitGadget
@ 2022-11-08 19:08 ` Eric Sunshine via GitGitGadget
  2022-11-08 19:08 ` [PATCH 4/4] chainlint: annotate original test definition rather than token stream Eric Sunshine via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Eric Sunshine via GitGitGadget @ 2022-11-08 19:08 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Eric Sunshine

From: Eric Sunshine <sunshine@sunshineco.com>

When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

To address this shortcoming, an upcoming change will make it print out
an annotated copy of the original test definition rather than the
tokenized representation. In order to do so, it will need to know the
start and end positions of each token in the original test definition.
As preparation, upgrade TestParser::scan_token() to latch the start and
end position of the token being scanned, and return that information
along with the token itself. A subsequent change will take advantage of
this positional information.

In terms of implementation, TestParser::scan_token() is retrofitted to
return a tuple consisting of the token's lexeme and its start and end
positions, rather than returning just the lexeme. However, an
alternative would be to define a class which represents a token:

    package Token;

    sub new {
        my ($class, $lexeme, $start, $end) = @_;
        bless [$lexeme, $start, $end] => $class;
    }

    sub as_string {
        my $self = shift @_;
        return $self->[0];
    }

    sub compare {
        my ($x, $y) = @_;
        if (UNIVERSAL::isa($y, 'Token')) {
            return $x->[0] cmp $y->[0];
        }
        return $x->[0] cmp $y;
    }

    use overload (
        '""' => 'as_string',
        'cmp' => 'compare'
    );

The major benefit of the class-based approach is that it is entirely
non-invasive; it requires no additional changes to the rest of the
script since a Token converts automatically to a string, which is what
scan_token() historically returned.

The big downside to the Token approach, however, is that it is _slow_;
on this developer's (old) machine, it increases user-time by an
unacceptable seven seconds when scanning all test scripts in the
project. Hence, the simple tuple approach is employed instead since it
adds only a fraction of a second user-time.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 80 +++++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 37 deletions(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 1f66c03c593..59aa79babc2 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -75,7 +75,9 @@ sub scan_heredoc_tag {
 	my $self = shift @_;
 	${$self->{buff}} =~ /\G(-?)/gc;
 	my $indented = $1;
-	my $tag = $self->scan_token();
+	my $token = $self->scan_token();
+	return "<<$indented" unless $token;
+	my $tag = $token->[0];
 	$tag =~ s/['"\\]//g;
 	push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
 	return "<<$indented$tag";
@@ -149,7 +151,7 @@ sub scan_dollar {
 	my $self = shift @_;
 	my $b = $self->{buff};
 	return $self->scan_balanced('(', ')') if $$b =~ /\G\((?=\()/gc; # $((...))
-	return '(' . join(' ', $self->scan_subst()) . ')' if $$b =~ /\G\(/gc; # $(...)
+	return '(' . join(' ', map {$_->[0]} $self->scan_subst()) . ')' if $$b =~ /\G\(/gc; # $(...)
 	return $self->scan_balanced('{', '}') if $$b =~ /\G\{/gc; # ${...}
 	return $1 if $$b =~ /\G(\w+)/gc; # $var
 	return $1 if $$b =~ /\G([@*#?$!0-9-])/gc; # $*, $1, $$, etc.
@@ -170,9 +172,11 @@ sub scan_token {
 	my $self = shift @_;
 	my $b = $self->{buff};
 	my $token = '';
+	my $start;
 RESTART:
 	$$b =~ /\G[ \t]+/gc; # skip whitespace (but not newline)
-	return "\n" if $$b =~ /\G#[^\n]*(?:\n|\z)/gc; # comment
+	$start = pos($$b) || 0;
+	return ["\n", $start, pos($$b)] if $$b =~ /\G#[^\n]*(?:\n|\z)/gc; # comment
 	while (1) {
 		# slurp up non-special characters
 		$token .= $1 if $$b =~ /\G([^\\;&|<>(){}'"\$\s]+)/gc;
@@ -197,7 +201,7 @@ RESTART:
 		}
 		die("internal error scanning character '$c'\n");
 	}
-	return length($token) ? $token : undef;
+	return length($token) ? [$token, $start, pos($$b)] : undef;
 }
 
 # ShellParser parses POSIX shell scripts (with minor extensions for Bash). It
@@ -239,14 +243,14 @@ sub stop_at {
 	my ($self, $token) = @_;
 	return 1 unless defined($token);
 	my $stop = ${$self->{stop}}[-1] if @{$self->{stop}};
-	return defined($stop) && $token =~ $stop;
+	return defined($stop) && $token->[0] =~ $stop;
 }
 
 sub expect {
 	my ($self, $expect) = @_;
 	my $token = $self->next_token();
-	return $token if defined($token) && $token eq $expect;
-	push(@{$self->{output}}, "?!ERR?! expected '$expect' but found '" . (defined($token) ? $token : "<end-of-input>") . "'\n");
+	return $token if defined($token) && $token->[0] eq $expect;
+	push(@{$self->{output}}, "?!ERR?! expected '$expect' but found '" . (defined($token) ? $token->[0] : "<end-of-input>") . "'\n");
 	$self->untoken($token) if defined($token);
 	return ();
 }
@@ -255,7 +259,7 @@ sub optional_newlines {
 	my $self = shift @_;
 	my @tokens;
 	while (my $token = $self->peek()) {
-		last unless $token eq "\n";
+		last unless $token->[0] eq "\n";
 		push(@tokens, $self->next_token());
 	}
 	return @tokens;
@@ -278,7 +282,7 @@ sub parse_case_pattern {
 	my @tokens;
 	while (defined(my $token = $self->next_token())) {
 		push(@tokens, $token);
-		last if $token eq ')';
+		last if $token->[0] eq ')';
 	}
 	return @tokens;
 }
@@ -293,13 +297,13 @@ sub parse_case {
 	     $self->optional_newlines());
 	while (1) {
 		my $token = $self->peek();
-		last unless defined($token) && $token ne 'esac';
+		last unless defined($token) && $token->[0] ne 'esac';
 		push(@tokens,
 		     $self->parse_case_pattern(),
 		     $self->optional_newlines(),
 		     $self->parse(qr/^(?:;;|esac)$/)); # item body
 		$token = $self->peek();
-		last unless defined($token) && $token ne 'esac';
+		last unless defined($token) && $token->[0] ne 'esac';
 		push(@tokens,
 		     $self->expect(';;'),
 		     $self->optional_newlines());
@@ -315,7 +319,7 @@ sub parse_for {
 	     $self->next_token(), # variable
 	     $self->optional_newlines());
 	my $token = $self->peek();
-	if (defined($token) && $token eq 'in') {
+	if (defined($token) && $token->[0] eq 'in') {
 		push(@tokens,
 		     $self->expect('in'),
 		     $self->optional_newlines());
@@ -339,11 +343,11 @@ sub parse_if {
 		     $self->optional_newlines(),
 		     $self->parse(qr/^(?:elif|else|fi)$/)); # if/elif body
 		my $token = $self->peek();
-		last unless defined($token) && $token eq 'elif';
+		last unless defined($token) && $token->[0] eq 'elif';
 		push(@tokens, $self->expect('elif'));
 	}
 	my $token = $self->peek();
-	if (defined($token) && $token eq 'else') {
+	if (defined($token) && $token->[0] eq 'else') {
 		push(@tokens,
 		     $self->expect('else'),
 		     $self->optional_newlines(),
@@ -380,7 +384,7 @@ sub parse_bash_array_assignment {
 	my @tokens = $self->expect('(');
 	while (defined(my $token = $self->next_token())) {
 		push(@tokens, $token);
-		last if $token eq ')';
+		last if $token->[0] eq ')';
 	}
 	return @tokens;
 }
@@ -398,29 +402,31 @@ sub parse_cmd {
 	my $self = shift @_;
 	my $cmd = $self->next_token();
 	return () unless defined($cmd);
-	return $cmd if $cmd eq "\n";
+	return $cmd if $cmd->[0] eq "\n";
 
 	my $token;
 	my @tokens = $cmd;
-	if ($cmd eq '!') {
+	if ($cmd->[0] eq '!') {
 		push(@tokens, $self->parse_cmd());
 		return @tokens;
-	} elsif (my $f = $compound{$cmd}) {
+	} elsif (my $f = $compound{$cmd->[0]}) {
 		push(@tokens, $self->$f());
-	} elsif (defined($token = $self->peek()) && $token eq '(') {
-		if ($cmd !~ /\w=$/) {
+	} elsif (defined($token = $self->peek()) && $token->[0] eq '(') {
+		if ($cmd->[0] !~ /\w=$/) {
 			push(@tokens, $self->parse_func());
 			return @tokens;
 		}
-		$tokens[-1] .= join(' ', $self->parse_bash_array_assignment());
+		my @array = $self->parse_bash_array_assignment();
+		$tokens[-1]->[0] .= join(' ', map {$_->[0]} @array);
+		$tokens[-1]->[2] = $array[$#array][2] if @array;
 	}
 
 	while (defined(my $token = $self->next_token())) {
 		$self->untoken($token), last if $self->stop_at($token);
 		push(@tokens, $token);
-		last if $token =~ /^(?:[;&\n|]|&&|\|\|)$/;
+		last if $token->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
 	}
-	push(@tokens, $self->next_token()) if $tokens[-1] ne "\n" && defined($token = $self->peek()) && $token eq "\n";
+	push(@tokens, $self->next_token()) if $tokens[-1]->[0] ne "\n" && defined($token = $self->peek()) && $token->[0] eq "\n";
 	return @tokens;
 }
 
@@ -457,7 +463,7 @@ sub find_non_nl {
 	my $tokens = shift @_;
 	my $n = shift @_;
 	$n = $#$tokens if !defined($n);
-	$n-- while $n >= 0 && $$tokens[$n] eq "\n";
+	$n-- while $n >= 0 && $$tokens[$n]->[0] eq "\n";
 	return $n;
 }
 
@@ -467,7 +473,7 @@ sub ends_with {
 	for my $needle (reverse(@$needles)) {
 		return undef if $n < 0;
 		$n = find_non_nl($tokens, $n), next if $needle eq "\n";
-		return undef if $$tokens[$n] !~ $needle;
+		return undef if $$tokens[$n]->[0] !~ $needle;
 		$n--;
 	}
 	return 1;
@@ -486,13 +492,13 @@ sub parse_loop_body {
 	my $self = shift @_;
 	my @tokens = $self->SUPER::parse_loop_body(@_);
 	# did loop signal failure via "|| return" or "|| exit"?
-	return @tokens if !@tokens || grep(/^(?:return|exit|\$\?)$/, @tokens);
+	return @tokens if !@tokens || grep {$_->[0] =~ /^(?:return|exit|\$\?)$/} @tokens;
 	# did loop upstream of a pipe signal failure via "|| echo 'impossible
 	# text'" as the final command in the loop body?
 	return @tokens if ends_with(\@tokens, [qr/^\|\|$/, "\n", qr/^echo$/, qr/^.+$/]);
 	# flag missing "return/exit" handling explicit failure in loop body
 	my $n = find_non_nl(\@tokens);
-	splice(@tokens, $n + 1, 0, '?!LOOP?!');
+	splice(@tokens, $n + 1, 0, ['?!LOOP?!', $tokens[$n]->[1], $tokens[$n]->[2]]);
 	return @tokens;
 }
 
@@ -510,7 +516,7 @@ sub accumulate {
 	goto DONE unless @$tokens;
 
 	# new command is empty line; can't yet check if previous is missing "&&"
-	goto DONE if @$cmd == 1 && $$cmd[0] eq "\n";
+	goto DONE if @$cmd == 1 && $$cmd[0]->[0] eq "\n";
 
 	# did previous command end with "&&", "|", "|| return" or similar?
 	goto DONE if match_ending($tokens, \@safe_endings);
@@ -518,20 +524,20 @@ sub accumulate {
 	# if this command handles "$?" specially, then okay for previous
 	# command to be missing "&&"
 	for my $token (@$cmd) {
-		goto DONE if $token =~ /\$\?/;
+		goto DONE if $token->[0] =~ /\$\?/;
 	}
 
 	# if this command is "false", "return 1", or "exit 1" (which signal
 	# failure explicitly), then okay for all preceding commands to be
 	# missing "&&"
-	if ($$cmd[0] =~ /^(?:false|return|exit)$/) {
-		@$tokens = grep(!/^\?!AMP\?!$/, @$tokens);
+	if ($$cmd[0]->[0] =~ /^(?:false|return|exit)$/) {
+		@$tokens = grep {$_->[0] !~ /^\?!AMP\?!$/} @$tokens;
 		goto DONE;
 	}
 
 	# flag missing "&&" at end of previous command
 	my $n = find_non_nl($tokens);
-	splice(@$tokens, $n + 1, 0, '?!AMP?!') unless $n < 0;
+	splice(@$tokens, $n + 1, 0, ['?!AMP?!', $$tokens[$n]->[1], $$tokens[$n]->[2]]) unless $n < 0;
 
 DONE:
 	$self->SUPER::accumulate($tokens, $cmd);
@@ -557,7 +563,7 @@ sub new {
 # composition of multiple strings and non-string character runs; for instance,
 # `"test body"` unwraps to `test body`; `word"a b"42'c d'` to `worda b42c d`
 sub unwrap {
-	my $token = @_ ? shift @_ : $_;
+	my $token = (@_ ? shift @_ : $_)->[0];
 	# simple case: 'sqstring' or "dqstring"
 	return $token if $token =~ s/^'([^']*)'$/$1/;
 	return $token if $token =~ s/^"([^"]*)"$/$1/;
@@ -588,9 +594,9 @@ sub check_test {
 	$self->{ntests}++;
 	my $parser = TestParser->new(\$body);
 	my @tokens = $parser->parse();
-	return unless $emit_all || grep(/\?![^?]+\?!/, @tokens);
+	return unless $emit_all || grep {$_->[0] =~ /\?![^?]+\?!/} @tokens;
 	my $c = main::fd_colors(1);
-	my $checked = join(' ', @tokens);
+	my $checked = join(' ', map {$_->[0]} @tokens);
 	$checked =~ s/^\n//;
 	$checked =~ s/^ //mg;
 	$checked =~ s/ $//mg;
@@ -602,9 +608,9 @@ sub check_test {
 sub parse_cmd {
 	my $self = shift @_;
 	my @tokens = $self->SUPER::parse_cmd();
-	return @tokens unless @tokens && $tokens[0] =~ /^test_expect_(?:success|failure)$/;
+	return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/;
 	my $n = $#tokens;
-	$n-- while $n >= 0 && $tokens[$n] =~ /^(?:[;&\n|]|&&|\|\|)$/;
+	$n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
 	$self->check_test($tokens[1], $tokens[2]) if $n == 2; # title body
 	$self->check_test($tokens[2], $tokens[3]) if $n > 2;  # prereq title body
 	return @tokens;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] chainlint: annotate original test definition rather than token stream
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-11-08 19:08 ` [PATCH 3/4] chainlint: latch start/end position of each token Eric Sunshine via GitGitGadget
@ 2022-11-08 19:08 ` Eric Sunshine via GitGitGadget
  2022-11-08 20:28 ` [PATCH 0/4] chainlint: improve annotated output Taylor Blau
  2022-11-08 22:17 ` Ævar Arnfjörð Bjarmason
  5 siblings, 0 replies; 11+ messages in thread
From: Eric Sunshine via GitGitGadget @ 2022-11-08 19:08 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Eric Sunshine

From: Eric Sunshine <sunshine@sunshineco.com>

When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.

However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.

Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:

    msg="total: $(cd data; wc -w *.txt) words"

requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:

    cd data ; ?!AMP?! wc -w *.txt

However, the parent parse context will see everything inside the
double-quotes as a single string token:

    "total: $(cd data ; ?!AMP?! wc -w *.txt) words"

losing whatever positional information was attached to the ";" token
where the problem was detected.

One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:

    "total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"

and then extract the positional information when annotating the original
test definition.

However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.

Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:

    * indentation (and whitespace, in general)

    * comments

    * here-doc bodies

    * here-doc tag quoting (i.e. "\EOF")

    * line-splices (i.e. "\" at the end of a line)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl                                | 31 ++++++++++++++-----
 t/chainlint/block-comment.expect              |  2 ++
 t/chainlint/case-comment.expect               |  3 ++
 t/chainlint/close-subshell.expect             |  3 +-
 t/chainlint/comment.expect                    |  4 +++
 t/chainlint/double-here-doc.expect            | 14 +++++++--
 t/chainlint/empty-here-doc.expect             |  3 +-
 t/chainlint/for-loop.expect                   |  4 ++-
 t/chainlint/here-doc-close-subshell.expect    |  4 ++-
 t/chainlint/here-doc-indent-operator.expect   | 10 ++++--
 .../here-doc-multi-line-command-subst.expect  |  5 ++-
 t/chainlint/here-doc-multi-line-string.expect |  4 ++-
 t/chainlint/here-doc.expect                   | 24 ++++++++++++--
 t/chainlint/if-then-else.expect               |  4 ++-
 t/chainlint/incomplete-line.expect            | 10 ++++--
 t/chainlint/inline-comment.expect             |  4 +--
 t/chainlint/loop-detect-status.expect         |  2 +-
 t/chainlint/nested-here-doc.expect            | 27 ++++++++++++++--
 t/chainlint/nested-subshell-comment.expect    |  2 ++
 t/chainlint/subshell-here-doc.expect          | 28 ++++++++++++++---
 t/chainlint/t7900-subtree.expect              |  4 +++
 t/chainlint/while-loop.expect                 |  4 ++-
 22 files changed, 163 insertions(+), 33 deletions(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 59aa79babc2..7972c5bbe6f 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -459,6 +459,13 @@ package TestParser;
 
 use base 'ShellParser';
 
+sub new {
+	my $class = shift @_;
+	my $self = $class->SUPER::new(@_);
+	$self->{problems} = [];
+	return $self;
+}
+
 sub find_non_nl {
 	my $tokens = shift @_;
 	my $n = shift @_;
@@ -498,7 +505,7 @@ sub parse_loop_body {
 	return @tokens if ends_with(\@tokens, [qr/^\|\|$/, "\n", qr/^echo$/, qr/^.+$/]);
 	# flag missing "return/exit" handling explicit failure in loop body
 	my $n = find_non_nl(\@tokens);
-	splice(@tokens, $n + 1, 0, ['?!LOOP?!', $tokens[$n]->[1], $tokens[$n]->[2]]);
+	push(@{$self->{problems}}, ['LOOP', $tokens[$n]]);
 	return @tokens;
 }
 
@@ -511,6 +518,7 @@ my @safe_endings = (
 
 sub accumulate {
 	my ($self, $tokens, $cmd) = @_;
+	my $problems = $self->{problems};
 
 	# no previous command to check for missing "&&"
 	goto DONE unless @$tokens;
@@ -531,13 +539,13 @@ sub accumulate {
 	# failure explicitly), then okay for all preceding commands to be
 	# missing "&&"
 	if ($$cmd[0]->[0] =~ /^(?:false|return|exit)$/) {
-		@$tokens = grep {$_->[0] !~ /^\?!AMP\?!$/} @$tokens;
+		@$problems = grep {$_->[0] ne 'AMP'} @$problems;
 		goto DONE;
 	}
 
 	# flag missing "&&" at end of previous command
 	my $n = find_non_nl($tokens);
-	splice(@$tokens, $n + 1, 0, ['?!AMP?!', $$tokens[$n]->[1], $$tokens[$n]->[2]]) unless $n < 0;
+	push(@$problems, ['AMP', $tokens->[$n]]) unless $n < 0;
 
 DONE:
 	$self->SUPER::accumulate($tokens, $cmd);
@@ -594,12 +602,21 @@ sub check_test {
 	$self->{ntests}++;
 	my $parser = TestParser->new(\$body);
 	my @tokens = $parser->parse();
-	return unless $emit_all || grep {$_->[0] =~ /\?![^?]+\?!/} @tokens;
+	my $problems = $parser->{problems};
+	return unless $emit_all || @$problems;
 	my $c = main::fd_colors(1);
-	my $checked = join(' ', map {$_->[0]} @tokens);
+	my $start = 0;
+	my $checked = '';
+	for (sort {$a->[1]->[2] <=> $b->[1]->[2]} @$problems) {
+		my ($label, $token) = @$_;
+		my $pos = $token->[2];
+		$checked .= substr($body, $start, $pos - $start) . " ?!$label?! ";
+		$start = $pos;
+	}
+	$checked .= substr($body, $start);
 	$checked =~ s/^\n//;
-	$checked =~ s/^ //mg;
-	$checked =~ s/ $//mg;
+	$checked =~ s/(\s) \?!/$1?!/mg;
+	$checked =~ s/\?! (\s)/?!$1/mg;
 	$checked =~ s/(\?![^?]+\?!)/$c->{rev}$c->{red}$1$c->{reset}/mg;
 	$checked .= "\n" unless $checked =~ /\n$/;
 	push(@{$self->{output}}, "$c->{blue}# chainlint: $title$c->{reset}\n$checked");
diff --git a/t/chainlint/block-comment.expect b/t/chainlint/block-comment.expect
index d10b2eeaf27..df2beea8887 100644
--- a/t/chainlint/block-comment.expect
+++ b/t/chainlint/block-comment.expect
@@ -1,6 +1,8 @@
 (
 	{
+		# show a
 		echo a &&
+		# show b
 		echo b
 	}
 )
diff --git a/t/chainlint/case-comment.expect b/t/chainlint/case-comment.expect
index 1e4b054bda0..641c157b98c 100644
--- a/t/chainlint/case-comment.expect
+++ b/t/chainlint/case-comment.expect
@@ -1,7 +1,10 @@
 (
 	case "$x" in
+	# found foo
 	x) foo ;;
+	# found other
 	*)
+		# treat it as bar
 		bar
 		;;
 	esac
diff --git a/t/chainlint/close-subshell.expect b/t/chainlint/close-subshell.expect
index 0f87db9ae68..2192a2870a1 100644
--- a/t/chainlint/close-subshell.expect
+++ b/t/chainlint/close-subshell.expect
@@ -15,7 +15,8 @@
 ) | wuzzle &&
 (
 	bop
-) | fazz 	fozz &&
+) | fazz \
+	fozz &&
 (
 	bup
 ) |
diff --git a/t/chainlint/comment.expect b/t/chainlint/comment.expect
index f76fde1ffba..a68f1f9d7c2 100644
--- a/t/chainlint/comment.expect
+++ b/t/chainlint/comment.expect
@@ -1,4 +1,8 @@
 (
+	# comment 1
 	nothing &&
+	# comment 2
 	something
+	# comment 3
+	# comment 4
 )
diff --git a/t/chainlint/double-here-doc.expect b/t/chainlint/double-here-doc.expect
index 75477bb1add..cd584a43573 100644
--- a/t/chainlint/double-here-doc.expect
+++ b/t/chainlint/double-here-doc.expect
@@ -1,2 +1,12 @@
-run_sub_test_lib_test_err run-inv-range-start "--run invalid range start" --run="a-5" <<-EOF &&
-check_sub_test_lib_test_err run-inv-range-start <<-EOF_OUT 3 <<-EOF_ERR
+run_sub_test_lib_test_err run-inv-range-start \
+	"--run invalid range start" \
+	--run="a-5" <<-\EOF &&
+test_expect_success "passing test #1" "true"
+test_done
+EOF
+check_sub_test_lib_test_err run-inv-range-start \
+	<<-\EOF_OUT 3<<-EOF_ERR
+> FATAL: Unexpected exit with code 1
+EOF_OUT
+> error: --run: invalid non-numeric in range start: ${SQ}a-5${SQ}
+EOF_ERR
diff --git a/t/chainlint/empty-here-doc.expect b/t/chainlint/empty-here-doc.expect
index f42f2d41ba8..e8733c97c64 100644
--- a/t/chainlint/empty-here-doc.expect
+++ b/t/chainlint/empty-here-doc.expect
@@ -1,3 +1,4 @@
 git ls-tree $tree path > current &&
-cat > expected <<EOF &&
+cat > expected <<\EOF &&
+EOF
 test_output
diff --git a/t/chainlint/for-loop.expect b/t/chainlint/for-loop.expect
index a5810c9bddd..d65c82129a6 100644
--- a/t/chainlint/for-loop.expect
+++ b/t/chainlint/for-loop.expect
@@ -2,7 +2,9 @@
 	for i in a b c
 	do
 		echo $i ?!AMP?!
-		cat <<-EOF ?!LOOP?!
+		cat <<-\EOF ?!LOOP?!
+		bar
+		EOF
 	done ?!AMP?!
 	for i in a b c; do
 		echo $i &&
diff --git a/t/chainlint/here-doc-close-subshell.expect b/t/chainlint/here-doc-close-subshell.expect
index 2af9ced71cc..7d9c2b56070 100644
--- a/t/chainlint/here-doc-close-subshell.expect
+++ b/t/chainlint/here-doc-close-subshell.expect
@@ -1,2 +1,4 @@
 (
-	cat <<-INPUT)
+	cat <<-\INPUT)
+	fizz
+	INPUT
diff --git a/t/chainlint/here-doc-indent-operator.expect b/t/chainlint/here-doc-indent-operator.expect
index fb6cf7285d0..f92a7ce9992 100644
--- a/t/chainlint/here-doc-indent-operator.expect
+++ b/t/chainlint/here-doc-indent-operator.expect
@@ -1,5 +1,11 @@
-cat > expect <<-EOF &&
+cat >expect <<- EOF &&
+header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
+num_commits: $1
+chunks: oid_fanout oid_lookup commit_metadata generation_data bloom_indexes bloom_data
+EOF
 
-cat > expect <<-EOF ?!AMP?!
+cat >expect << -EOF ?!AMP?!
+this is not indented
+-EOF
 
 cleanup
diff --git a/t/chainlint/here-doc-multi-line-command-subst.expect b/t/chainlint/here-doc-multi-line-command-subst.expect
index f8b3aa73c4f..b7364c82c89 100644
--- a/t/chainlint/here-doc-multi-line-command-subst.expect
+++ b/t/chainlint/here-doc-multi-line-command-subst.expect
@@ -1,5 +1,8 @@
 (
-	x=$(bobble <<-END &&
+	x=$(bobble <<-\END &&
+		fossil
+		vegetable
+		END
 		wiffle) ?!AMP?!
 	echo $x
 )
diff --git a/t/chainlint/here-doc-multi-line-string.expect b/t/chainlint/here-doc-multi-line-string.expect
index be64b26869a..6c13bdcbfb5 100644
--- a/t/chainlint/here-doc-multi-line-string.expect
+++ b/t/chainlint/here-doc-multi-line-string.expect
@@ -1,5 +1,7 @@
 (
-	cat <<-TXT && echo "multi-line
+	cat <<-\TXT && echo "multi-line
 	string" ?!AMP?!
+	fizzle
+	TXT
 	bap
 )
diff --git a/t/chainlint/here-doc.expect b/t/chainlint/here-doc.expect
index 110059ba584..1df3f782821 100644
--- a/t/chainlint/here-doc.expect
+++ b/t/chainlint/here-doc.expect
@@ -1,7 +1,25 @@
-boodle wobba        gorgo snoot        wafta snurb <<EOF &&
+boodle wobba \
+	gorgo snoot \
+	wafta snurb <<EOF &&
+quoth the raven,
+nevermore...
+EOF
 
 cat <<-Arbitrary_Tag_42 >foo &&
+snoz
+boz
+woz
+Arbitrary_Tag_42
 
-cat <<zump >boo &&
+cat <<"zump" >boo &&
+snoz
+boz
+woz
+zump
 
-horticulture <<EOF
+horticulture <<\EOF
+gomez
+morticia
+wednesday
+pugsly
+EOF
diff --git a/t/chainlint/if-then-else.expect b/t/chainlint/if-then-else.expect
index 44d86c35976..cbaaf857d47 100644
--- a/t/chainlint/if-then-else.expect
+++ b/t/chainlint/if-then-else.expect
@@ -8,7 +8,9 @@
 		echo foo
 	else
 		echo foo &&
-		cat <<-EOF
+		cat <<-\EOF
+		bar
+		EOF
 	fi ?!AMP?!
 	echo poodle
 ) &&
diff --git a/t/chainlint/incomplete-line.expect b/t/chainlint/incomplete-line.expect
index ffac8f90185..134d3a14f5c 100644
--- a/t/chainlint/incomplete-line.expect
+++ b/t/chainlint/incomplete-line.expect
@@ -1,4 +1,10 @@
-line 1 line 2 line 3 line 4 &&
+line 1 \
+line 2 \
+line 3 \
+line 4 &&
 (
-	line 5 	line 6 	line 7 	line 8
+	line 5 \
+	line 6 \
+	line 7 \
+	line 8
 )
diff --git a/t/chainlint/inline-comment.expect b/t/chainlint/inline-comment.expect
index dd0dace077f..6bad2185300 100644
--- a/t/chainlint/inline-comment.expect
+++ b/t/chainlint/inline-comment.expect
@@ -1,6 +1,6 @@
 (
-	foobar &&
-	barfoo ?!AMP?!
+	foobar && # comment 1
+	barfoo ?!AMP?! # wrong position for &&
 	flibble "not a # comment"
 ) &&
 
diff --git a/t/chainlint/loop-detect-status.expect b/t/chainlint/loop-detect-status.expect
index 0ad23bb35e4..24da9e86d59 100644
--- a/t/chainlint/loop-detect-status.expect
+++ b/t/chainlint/loop-detect-status.expect
@@ -2,7 +2,7 @@
 do
 	printf "Generating blob $i/$blobcount\r" >& 2 &&
 	printf "blob\nmark :$i\ndata $blobsize\n" &&
-
+	#test-tool genrandom $i $blobsize &&
 	printf "%-${blobsize}s" $i &&
 	echo "M 100644 :$i $i" >> commit &&
 	i=$(($i+1)) ||
diff --git a/t/chainlint/nested-here-doc.expect b/t/chainlint/nested-here-doc.expect
index e3bef63f754..29b3832a986 100644
--- a/t/chainlint/nested-here-doc.expect
+++ b/t/chainlint/nested-here-doc.expect
@@ -1,7 +1,30 @@
 cat <<ARBITRARY >foop &&
+naddle
+fub <<EOF
+	nozzle
+	noodle
+EOF
+formp
+ARBITRARY
 
 (
-	cat <<-INPUT_END &&
-	cat <<-EOT ?!AMP?!
+	cat <<-\INPUT_END &&
+	fish are mice
+	but geese go slow
+	data <<EOF
+		perl is lerp
+		and nothing else
+	EOF
+	toink
+	INPUT_END
+
+	cat <<-\EOT ?!AMP?!
+	text goes here
+	data <<EOF
+		data goes here
+	EOF
+	more test here
+	EOT
+
 	foobar
 )
diff --git a/t/chainlint/nested-subshell-comment.expect b/t/chainlint/nested-subshell-comment.expect
index be4b27a305b..9138cf386d3 100644
--- a/t/chainlint/nested-subshell-comment.expect
+++ b/t/chainlint/nested-subshell-comment.expect
@@ -2,6 +2,8 @@
 	foo &&
 	(
 		bar &&
+		# bottles wobble while fiddles gobble
+		# minor numbers of cows (or do they?)
 		baz &&
 		snaff
 	) ?!AMP?!
diff --git a/t/chainlint/subshell-here-doc.expect b/t/chainlint/subshell-here-doc.expect
index 029d129299a..52789278d13 100644
--- a/t/chainlint/subshell-here-doc.expect
+++ b/t/chainlint/subshell-here-doc.expect
@@ -1,10 +1,30 @@
 (
-	echo wobba 	       gorgo snoot 	       wafta snurb <<-EOF &&
+	echo wobba \
+		gorgo snoot \
+		wafta snurb <<-EOF &&
+	quoth the raven,
+	nevermore...
+	EOF
+
 	cat <<EOF >bip ?!AMP?!
-	echo <<-EOF >bop
+	fish fly high
+EOF
+
+	echo <<-\EOF >bop
+	gomez
+	morticia
+	wednesday
+	pugsly
+	EOF
 ) &&
 (
-	cat <<-ARBITRARY >bup &&
-	cat <<-ARBITRARY3 >bup3 &&
+	cat <<-\ARBITRARY >bup &&
+	glink
+	FIZZ
+	ARBITRARY
+	cat <<-"ARBITRARY3" >bup3 &&
+	glink
+	FIZZ
+	ARBITRARY3
 	meep
 )
diff --git a/t/chainlint/t7900-subtree.expect b/t/chainlint/t7900-subtree.expect
index 69167da2f27..71b3b3bc20e 100644
--- a/t/chainlint/t7900-subtree.expect
+++ b/t/chainlint/t7900-subtree.expect
@@ -4,12 +4,16 @@ sub2
 sub3
 sub4" &&
 	chks_sub=$(cat <<TXT | sed "s,^,sub dir/,"
+$chks
+TXT
 ) &&
 	chkms="main-sub1
 main-sub2
 main-sub3
 main-sub4" &&
 	chkms_sub=$(cat <<TXT | sed "s,^,sub dir/,"
+$chkms
+TXT
 ) &&
 	subfiles=$(git ls-files) &&
 	check_equal "$subfiles" "$chkms
diff --git a/t/chainlint/while-loop.expect b/t/chainlint/while-loop.expect
index f272aa21fee..1f5eaea0fd5 100644
--- a/t/chainlint/while-loop.expect
+++ b/t/chainlint/while-loop.expect
@@ -2,7 +2,9 @@
 	while true
 	do
 		echo foo ?!AMP?!
-		cat <<-EOF ?!LOOP?!
+		cat <<-\EOF ?!LOOP?!
+		bar
+		EOF
 	done ?!AMP?!
 	while true; do
 		echo foo &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-11-08 19:08 ` [PATCH 4/4] chainlint: annotate original test definition rather than token stream Eric Sunshine via GitGitGadget
@ 2022-11-08 20:28 ` Taylor Blau
  2022-11-09 13:11   ` Jeff King
  2022-11-08 22:17 ` Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2022-11-08 20:28 UTC (permalink / raw)
  To: Eric Sunshine via GitGitGadget
  Cc: git, Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine

On Tue, Nov 08, 2022 at 07:08:26PM +0000, Eric Sunshine via GitGitGadget wrote:
> This patch series further improves the output by instead making chainlint.pl
> annotate the original test definition rather than the parsed token stream,
> thus preserving indentation (and whitespace, in general), here-doc bodies,
> etc., which should make it easier for a test author to relate each problem
> back to the source.

Very nicely done. The changes all seemed reasonable to me (and, in fact,
the approach is pretty straightforward -- the diffstat is misleading
since many of changes are to chainlint's expected output).

So I'm happy with it, but let's hear from some other folks who are more
familiar with this area before we start merging it down.


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-11-08 20:28 ` [PATCH 0/4] chainlint: improve annotated output Taylor Blau
@ 2022-11-08 22:17 ` Ævar Arnfjörð Bjarmason
  2022-11-08 22:43   ` Eric Sunshine
  5 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-08 22:17 UTC (permalink / raw)
  To: Eric Sunshine via GitGitGadget; +Cc: git, Jeff King, Eric Sunshine


On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:

> When chainlint detects problems in a test, such as a broken &&-chain, it
> prints out the test with "?!FOO?!" annotations inserted at each problem
> location. However, rather than annotating the original test definition, it
> instead dumps out a parsed token representation of the test. Since it lacks
> comments, indentation, here-doc bodies, and so forth, this tokenized
> representation can be difficult for the test author to digest and relate
> back to the original test definition.
>
> An earlier patch series[1] improved the output somewhat by colorizing the
> "?!FOO?!" annotations and the "# chainlint:" lines, but the output can still
> be difficult to digest.
>
> This patch series further improves the output by instead making chainlint.pl
> annotate the original test definition rather than the parsed token stream,
> thus preserving indentation (and whitespace, in general), here-doc bodies,
> etc., which should make it easier for a test author to relate each problem
> back to the source.
>
> This series was inspired by usability comments from Peff[2] and Ævar[3] and
> a bit of discussion which followed[4][5].
>
> (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)

Heh! It's great to see a follow-up to our discussion the other day, and
having the output verbatim & annotated looks much better, especially for
complex tests.

E.g. (taking one at random, after some grepping/skimming), ruining this one:
	
	diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
	index dcaab7265f5..c27539a773d 100755
	--- a/t/t6300-for-each-ref.sh
	+++ b/t/t6300-for-each-ref.sh
	@@ -1365,8 +1365,7 @@ test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
	                do
	                        GIT_COMMITTER_EMAIL="$email@example.com" \
	                        git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	-                       nr=$((nr+1))||
	-                       return 1
	+                       nr=$((nr+1))
	                done
	        done &&
	        git for-each-ref --ignore-case \

Would, before emit (correct, but a bit of a token-soup):

	
	$ ./t6300-for-each-ref.sh 
	# chainlint: ./t6300-for-each-ref.sh
	# chainlint: for-each-ref --ignore-case works on multiple sort keys
	
	nr=0 &&
	for email in a A b B
	do
	for subject in a A b B
	do
	GIT_COMMITTER_EMAIL="$email@example.com" git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	nr=$((nr+1)) ?!LOOP?!
	done ?!LOOP?!
	done &&
	git for-each-ref --ignore-case --format="%(taggeremail) %(subject) %(refname)" --sort=refname --sort=subject --sort=taggeremail refs/tags/icase-* > actual &&
	cat > expect <<-EOF &&
	test_cmp expect actual
	error: bug in the test script: lint error (see '?!...!? annotations above)

But now it'll instead emit:
	
	$ ./t6300-for-each-ref.sh
	# chainlint: ./t6300-for-each-ref.sh
	# chainlint: for-each-ref --ignore-case works on multiple sort keys
	        # name refs numerically to avoid case-insensitive filesystem conflicts
	        nr=0 &&
	        for email in a A b B
	        do
	                for subject in a A b B
	                do
	                        GIT_COMMITTER_EMAIL="$email@example.com" \
	                        git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	                        nr=$((nr+1)) ?!LOOP?!
	                done ?!LOOP?!
	        done &&
	        git for-each-ref --ignore-case \
	                --format="%(taggeremail) %(subject) %(refname)" \
	                --sort=refname \
	                --sort=subject \
	                --sort=taggeremail \
	                refs/tags/icase-* >actual &&
	        cat >expect <<-\EOF &&
	        <a@example.com> tag a refs/tags/icase-00
	        <a@example.com> tag A refs/tags/icase-01
	        <A@example.com> tag a refs/tags/icase-04
	        <A@example.com> tag A refs/tags/icase-05
	        <a@example.com> tag b refs/tags/icase-02
	        <a@example.com> tag B refs/tags/icase-03
	        <A@example.com> tag b refs/tags/icase-06
	        <A@example.com> tag B refs/tags/icase-07
	        <b@example.com> tag a refs/tags/icase-08
	        <b@example.com> tag A refs/tags/icase-09
	        <B@example.com> tag a refs/tags/icase-12
	        <B@example.com> tag A refs/tags/icase-13
	        <b@example.com> tag b refs/tags/icase-10
	        <b@example.com> tag B refs/tags/icase-11
	        <B@example.com> tag b refs/tags/icase-14
	        <B@example.com> tag B refs/tags/icase-15
	        EOF
	        test_cmp expect actual
	error: bug in the test script: lint error (see '?!...!? annotations above)

Which is so much better, i.e. as you're preserving the whitespace &
comments, and the "?!LOOP?!" is of course much easier to see with the
colored output.

I hadn't noticed before that the contents of here-docs was pruned, but
that made sense in the previous parser, but having the content.

Also, and I guess this is an attempt to evade your blacklist. I *did*
notice when playing around with this that if I now expand the "1 while"
loop here:

	my $s = do { local $/; <$fh> };
	close($fh);
	my $parser = ScriptParser->new(\$s);
	1 while $parser->parse_cmd();

To something that "follows along" with the parser it shouldn't be too
hard in the future to add line number annotations now. E.g. for
"#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
0, 10.

But that's all for some hypothetical future, this is already much better
:)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-08 22:17 ` Ævar Arnfjörð Bjarmason
@ 2022-11-08 22:43   ` Eric Sunshine
  2022-11-08 22:52     ` Eric Sunshine
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sunshine @ 2022-11-08 22:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Eric Sunshine via GitGitGadget, git, Jeff King

On Tue, Nov 8, 2022 at 5:29 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:
> > (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)
>
> Also, and I guess this is an attempt to evade your blacklist. I *did*
> notice when playing around with this that if I now expand the "1 while"
> loop here:
>    [...]
> To something that "follows along" with the parser it shouldn't be too
> hard in the future to add line number annotations now. E.g. for
> "#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
> 0, 10.
>
> But that's all for some hypothetical future, this is already much better

My nerd-snipe blacklist hasn't fully solidified yet, unfortunately (for me).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-08 22:43   ` Eric Sunshine
@ 2022-11-08 22:52     ` Eric Sunshine
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Sunshine @ 2022-11-08 22:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Eric Sunshine via GitGitGadget, git, Jeff King

On Tue, Nov 8, 2022 at 5:43 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, Nov 8, 2022 at 5:29 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:
> > > (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)
> >
> > Also, and I guess this is an attempt to evade your blacklist. I *did*
> > notice when playing around with this that if I now expand the "1 while"
> > loop here:
> >    [...]
> > To something that "follows along" with the parser it shouldn't be too
> > hard in the future to add line number annotations now. E.g. for
> > "#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
> > 0, 10.
> >
> > But that's all for some hypothetical future, this is already much better
>
> My nerd-snipe blacklist hasn't fully solidified yet, unfortunately (for me).

I forgot to add that if you do manage to penetrate my nerd-snipe
blacklist, such a feature would be built atop the current series (i.e.
no reason to hold up this series for the "hypothetical future", as you
say).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-08 20:28 ` [PATCH 0/4] chainlint: improve annotated output Taylor Blau
@ 2022-11-09 13:11   ` Jeff King
  2022-11-10  2:42     ` Taylor Blau
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2022-11-09 13:11 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Eric Sunshine via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Eric Sunshine

On Tue, Nov 08, 2022 at 03:28:34PM -0500, Taylor Blau wrote:

> On Tue, Nov 08, 2022 at 07:08:26PM +0000, Eric Sunshine via GitGitGadget wrote:
> > This patch series further improves the output by instead making chainlint.pl
> > annotate the original test definition rather than the parsed token stream,
> > thus preserving indentation (and whitespace, in general), here-doc bodies,
> > etc., which should make it easier for a test author to relate each problem
> > back to the source.
> 
> Very nicely done. The changes all seemed reasonable to me (and, in fact,
> the approach is pretty straightforward -- the diffstat is misleading
> since many of changes are to chainlint's expected output).
> 
> So I'm happy with it, but let's hear from some other folks who are more
> familiar with this area before we start merging it down.

I don't claim to be _that_ familiar with the code itself, but all of the
patches look reasonable to me. And most importantly, I dug out the state
of my tree from early September (via the reflog) before I fixed all of
the chainlint problems on my local topics. The improvement in the output
with this series is night and day.

I was a little surprised that using a class in patch 3 would cause such
a slowdown. But it's not that hard to believe that the workload is so
heavy on string comparison and manipulation that the overloaded string
and comparison functions introduce significant overhead. It has been a
long time since I've optimized any perl, but I remember the rule of
thumb being to minimize the number of lines of perl (because all of the
builtin stuff is blazingly fast C, and all of the perl is byte-code).

At any rate, the result you came up with doesn't look too bad. The only
risk is that you forgot to s/$token/$token->[0]/ somewhere, and I
suspect we'd have found that in running the tests.

So it all seems like a step forward to me.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] chainlint: improve annotated output
  2022-11-09 13:11   ` Jeff King
@ 2022-11-10  2:42     ` Taylor Blau
  0 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-11-10  2:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Sunshine via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Eric Sunshine

On Wed, Nov 09, 2022 at 08:11:45AM -0500, Jeff King wrote:
> So it all seems like a step forward to me.

Thanks, all. Let's start merging this topic down :-).


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-11-10  2:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08 19:08 [PATCH 0/4] chainlint: improve annotated output Eric Sunshine via GitGitGadget
2022-11-08 19:08 ` [PATCH 1/4] chainlint: add explanatory comments Eric Sunshine via GitGitGadget
2022-11-08 19:08 ` [PATCH 2/4] chainlint: tighten accuracy when consuming input stream Eric Sunshine via GitGitGadget
2022-11-08 19:08 ` [PATCH 3/4] chainlint: latch start/end position of each token Eric Sunshine via GitGitGadget
2022-11-08 19:08 ` [PATCH 4/4] chainlint: annotate original test definition rather than token stream Eric Sunshine via GitGitGadget
2022-11-08 20:28 ` [PATCH 0/4] chainlint: improve annotated output Taylor Blau
2022-11-09 13:11   ` Jeff King
2022-11-10  2:42     ` Taylor Blau
2022-11-08 22:17 ` Ævar Arnfjörð Bjarmason
2022-11-08 22:43   ` Eric Sunshine
2022-11-08 22:52     ` Eric Sunshine

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.