[PATCH 0/4] some chainlint fixes and performance improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] some chainlint fixes and performance improvements
@ 2023-03-28 20:20 Jeff King
  2023-03-28 20:22 ` [PATCH 1/4] tests: run internal chain-linter under "make test" Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Jeff King @ 2023-03-28 20:20 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

Here are a few fixes for chainlint. The first patch should avoid the
confusion we discussed in the subthread at:

  https://lore.kernel.org/git/3714ba2f6528c38eb9c438126316a08b0cefca7c.1679696180.git.git@grubix.eu/

and the relevant folks are cc'd. The rest are some old performance
improvement ideas I had for the internal chain-linter. I doubt they make
a huge difference overall, but they can be measured in certain cases.
The first one to me looks like an obvious win. The second one is
debatable, as it involves some hand-waving. The third one turned out not
to make anything faster, but makes the code a little simpler.

So I'm on the fence for patches 3 and 4 below, but the first two I think
are strict improvements.

  [1/4]: tests: run internal chain-linter under "make test"
  [2/4]: tests: replace chainlint subshell with a function
  [3/4]: tests: drop here-doc check from internal chain-linter
  [4/4]: tests: skip test_eval_ in internal chain-lint

 t/Makefile    |  2 +-
 t/test-lib.sh | 27 ++++++++++++++-------------
 2 files changed, 15 insertions(+), 14 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] tests: run internal chain-linter under "make test"
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
@ 2023-03-28 20:22 ` Jeff King
  2023-03-29 10:20   ` Ævar Arnfjörð Bjarmason
  2023-03-28 20:23 ` [PATCH 2/4] tests: replace chainlint subshell with a function Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-28 20:22 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

Since 69b9924b875 (t/Makefile: teach `make test` and `make prove` to run
chainlint.pl, 2022-09-01), we run a single chainlint.pl process for all
scripts, and then instruct each individual script to run with the
equivalent of --no-chain-lint, which tells them not to redundantly run
the chainlint script themselves.

However, this also disables the internal linter run within the shell by
eval-ing "(exit 117) && $1" and confirming we get code 117. In theory
the external linter produces a superset of complaints, and we don't need
the internal one anymore. However, we know there is at least one case
where they differ. A test like:

	test_expect_success 'should fail linter' '
		false &&
		sleep 2 &
		pid=$! &&
		kill $pid
	'

is buggy (it ignores the failure from "false", because it is
backgrounded along with the sleep). The internal linter catches this,
but the external one doesn't (and teaching it to do so is complicated[1]).
So not only does "make test" miss this problem, but it's doubly
confusing because running the script standalone does complain.

Let's teach the suppression in the Makefile to only turn off the
external linter (which we know is redundant, as it was already run) and
leave the internal one intact.

I've used a new environment variable to do this here, and intentionally
did not add a "--no-ext-chain-lint" option. This is an internal
optimization used by the Makefile, and not something that ordinary users
would need to tweak.

[1] For discussion of chainlint.pl and this case, see:
    https://lore.kernel.org/git/CAPig+cQtLFX4PgXyyK_AAkCvg4Aw2RAC5MmLbib-aHHgTBcDuw@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
---
 t/Makefile    | 2 +-
 t/test-lib.sh | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/Makefile b/t/Makefile
index 88fa5049573..10881affdd0 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -45,7 +45,7 @@ CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
 # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
 # checks all tests in all scripts via a single invocation, so tell individual
 # scripts not to "chainlint" themselves
-CHAINLINTSUPPRESS = GIT_TEST_CHAIN_LINT=0 && export GIT_TEST_CHAIN_LINT &&
+CHAINLINTSUPPRESS = GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT &&

 all: $(DEFAULT_TEST_TARGET)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 62136caee5a..09789566374 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1593,7 +1593,8 @@ then
 	BAIL_OUT_ENV_NEEDS_SANITIZE_LEAK "GIT_TEST_SANITIZE_LEAK_LOG=true"
 fi

-if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 &&
+   test "${GIT_TEST_EXT_CHAIN_LINT:-1}" != 0
 then
 	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
 		BUG "lint error (see '?!...!? annotations above)"
-- 
2.40.0.616.gf524ec75088

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] tests: replace chainlint subshell with a function
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
  2023-03-28 20:22 ` [PATCH 1/4] tests: run internal chain-linter under "make test" Jeff King
@ 2023-03-28 20:23 ` Jeff King
  2023-03-28 20:40   ` Junio C Hamano
  2023-03-28 20:28 ` [PATCH 3/4] tests: drop here-doc check from internal chain-linter Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-28 20:23 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

To test that we don't break the &&-chain, test-lib.sh does something
like:

   (exit 117) && $test_commands

and checks that the result is exit code 117. We don't care what that
initial command is, as long as it exits with a unique code. Using "exit"
works and is simple, but is a bit expensive since it requires a subshell
(to avoid exiting the whole script!). This isn't usually very
noticeable, but it can add up for scripts which have a large number of
tests.

Using "return" naively won't work here, because we'd return from the
function eval-ing the snippet (and it wouldn't find &&-chain breakages).
But if we further push that into its own function, it does exactly what
we want, without extra subshell overhead.

According to hyperfine, this produces a measurable improvement when
running t3070 (which has 1800 tests, all of them quite short):

  'HEAD' ran
    1.09 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 09789566374..cfcbd899c5a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1086,6 +1086,10 @@ test_eval_ () {
 	return $test_eval_ret_
 }

+fail_117 () {
+	return 117
+}
+
 test_run_ () {
 	test_cleanup=:
 	expecting_failure=$2
@@ -1097,7 +1101,7 @@ test_run_ () {
 		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		if test "OK-117" != "$(test_eval_ "(exit 117) && $1${LF}${LF}echo OK-\$?" 3>&1)"
+		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
 		then
 			BUG "broken &&-chain or run-away HERE-DOC: $1"
 		fi
-- 
2.40.0.616.gf524ec75088

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
  2023-03-28 20:22 ` [PATCH 1/4] tests: run internal chain-linter under "make test" Jeff King
  2023-03-28 20:23 ` [PATCH 2/4] tests: replace chainlint subshell with a function Jeff King
@ 2023-03-28 20:28 ` Jeff King
  2023-03-28 21:46   ` Junio C Hamano
  2023-03-28 20:28 ` [PATCH 4/4] tests: skip test_eval_ in internal chain-lint Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-28 20:28 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

Commit 99a64e4b73c (tests: lint for run-away here-doc, 2017-03-22)
tweaked the chain-lint test to catch unclosed here-docs. It works by
adding an extra "echo" command after the test snippet, and checking that
it is run (if it gets swallowed by a here-doc, naturally it is not run).

The downside here is that we introduced an extra $() substitution, which
happens in a subshell. This has a measurable performance impact when
run for many tests.

The tradeoff in safety was undoubtedly worth it when 99a64e4b73c was
written. But these days, the external chainlint.pl does a pretty good
job of finding these (even though it's not something it specifically
tries to flag). For example, if you have a test like:

	test_expect_success 'should fail linter' '
	        some_command >actual &&
	        cat >expect <<-\EOF &&
		ok
		# missing EOF line here
	        test_cmp expect actual
	'

it will see that the here-doc isn't closed, treat it as not-a-here-doc,
and complain that the "ok" line does not have an "&&". So in practice we
should be catching these via that linter, although:

  - the error message is not as good as it could be (the real problem is
    the unclosed here-doc)

  - it can be fooled if there are no lines in the here-doc:

      cat >expect <<-\EOF &&
      # missing EOF line here

    or if every line in the here-doc has &&-chaining (weird, but
    possible)

Those are sufficiently unlikely that they're not worth worrying too much
about. And by switching back to a simpler chain-lint, hyperfine reports
a measurable speedup on t3070 (which has 1800 tests):

  'HEAD' ran
    1.12 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
---
I didn't look at how hard it would be to teach chainlint.pl to complain
about the unclosed here-doc. I think it _might_ actually not be that
bad, just because it does already understand here-docs in general. If it
handled that, then all of the hand-waving in the second half of the
commit message could go away. ;)

I'd also be OK dropping this. 12% is nice, but this one test is an
outlier. Picking t4202 somewhat at random as a more realistic test, any
improvement seems to be mostly lost in the noise.

 t/test-lib.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index cfcbd899c5a..0048ec7b6f6 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1101,9 +1101,10 @@ test_run_ () {
 		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
+		test_eval_ "fail_117 && $1"
+		if test $? != 117
 		then
-			BUG "broken &&-chain or run-away HERE-DOC: $1"
+			BUG "broken &&-chain: $1"
 		fi
 		trace=$trace_tmp
 	fi
-- 
2.40.0.616.gf524ec75088


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] tests: skip test_eval_ in internal chain-lint
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
                   ` (2 preceding siblings ...)
  2023-03-28 20:28 ` [PATCH 3/4] tests: drop here-doc check from internal chain-linter Jeff King
@ 2023-03-28 20:28 ` Jeff King
  2023-03-28 21:08 ` [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-28 20:28 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

To check for broken &&-chains, we run "fail_117 && $1" as a test
snippet, and check the exit code. We use test_eval_ to do so, because
that's the way we run the actual test.

But we don't need any of its niceties, like "set -x" tracing. In fact,
they hinder us, because we have to explicitly disable them. So let's
skip that and use "eval" more directly, which is simpler. I had hoped it
would also be faster, but it doesn't seem to produce a measurable
improvement (probably because it's just running internal shell commands,
with no subshells or forks).

Note that there is one gotcha: even though we don't intend to run any of
the commands if the &&-chain is intact, an error like this:

   test_expect_success 'broken' '
	# this next line breaks the &&-chain
	true
	# and then this one is executed even by the linter
	return 1
   '

means we'll "return 1" from the eval, and thus from test_run_(). We
actually do notice this in test_expect_success, but only by saying "hey,
this test didn't say it was OK, so it must have failed", which is not
right (it should say "broken &&-chain").

We can handle this by calling test_eval_inner_() instead, which is our
trick for wrapping "return" in a test snippet. But to do that, we have
to push the trace code out of that inner function and into test_eval_().
This is arguably where it belonged in the first place, but it never
mattered because the "inner_" function had only one caller.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0048ec7b6f6..293caf0f20e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1041,10 +1041,7 @@ want_trace () {
 # (and we want to make sure we run any cleanup like
 # "set +x").
 test_eval_inner_ () {
-	# Do not add anything extra (including LF) after '$*'
-	eval "
-		want_trace && trace_level_=$(($trace_level_+1)) && set -x
-		$*"
+	eval "$*"
 }

 test_eval_ () {
@@ -1069,7 +1066,10 @@ test_eval_ () {
 	#     be _inside_ the block to avoid polluting the "set -x" output
 	#

-	test_eval_inner_ "$@" </dev/null >&3 2>&4
+	# Do not add anything extra (including LF) after '$*'
+	test_eval_inner_ </dev/null >&3 2>&4 "
+		want_trace && trace_level_=$(($trace_level_+1)) && set -x
+		$*"
 	{
 		test_eval_ret_=$?
 		if want_trace
@@ -1095,18 +1095,13 @@ test_run_ () {
 	expecting_failure=$2

 	if test "${GIT_TEST_CHAIN_LINT:-1}" != 0; then
-		# turn off tracing for this test-eval, as it simply creates
-		# confusing noise in the "-x" output
-		trace_tmp=$trace
-		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		test_eval_ "fail_117 && $1"
+		test_eval_inner_ "fail_117 && $1" </dev/null >&3 2>&4
 		if test $? != 117
 		then
 			BUG "broken &&-chain: $1"
 		fi
-		trace=$trace_tmp
 	fi

 	setup_malloc_check
-- 
2.40.0.616.gf524ec75088

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] tests: replace chainlint subshell with a function
  2023-03-28 20:23 ` [PATCH 2/4] tests: replace chainlint subshell with a function Jeff King
@ 2023-03-28 20:40   ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2023-03-28 20:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

Jeff King <peff@peff.net> writes:

> To test that we don't break the &&-chain, test-lib.sh does something
> like:
>
>    (exit 117) && $test_commands
>
> and checks that the result is exit code 117. We don't care what that
> initial command is, as long as it exits with a unique code. Using "exit"
> works and is simple, but is a bit expensive since it requires a subshell
> (to avoid exiting the whole script!). This isn't usually very
> noticeable, but it can add up for scripts which have a large number of
> tests.
>
> Using "return" naively won't work here, because we'd return from the
> function eval-ing the snippet (and it wouldn't find &&-chain breakages).
> But if we further push that into its own function, it does exactly what
> we want, without extra subshell overhead.

Cute.

> According to hyperfine, this produces a measurable improvement when
> running t3070 (which has 1800 tests, all of them quite short):
>
>   'HEAD' ran
>     1.09 ± 0.01 times faster than 'HEAD~1'

That is certainly a respectable improvement.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/test-lib.sh | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 09789566374..cfcbd899c5a 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1086,6 +1086,10 @@ test_eval_ () {
>  	return $test_eval_ret_
>  }
>  
> +fail_117 () {
> +	return 117
> +}
> +
>  test_run_ () {
>  	test_cleanup=:
>  	expecting_failure=$2
> @@ -1097,7 +1101,7 @@ test_run_ () {
>  		trace=
>  		# 117 is magic because it is unlikely to match the exit
>  		# code of other programs
> -		if test "OK-117" != "$(test_eval_ "(exit 117) && $1${LF}${LF}echo OK-\$?" 3>&1)"
> +		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
>  		then
>  			BUG "broken &&-chain or run-away HERE-DOC: $1"
>  		fi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] some chainlint fixes and performance improvements
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
                   ` (3 preceding siblings ...)
  2023-03-28 20:28 ` [PATCH 4/4] tests: skip test_eval_ in internal chain-lint Jeff King
@ 2023-03-28 21:08 ` Jeff King
  2023-03-30 22:08   ` Jeff King
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-28 21:08 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 04:20:44PM -0400, Jeff King wrote:

> The rest are some old performance improvement ideas I had for the
> internal chain-linter. I doubt they make a huge difference overall,
> but they can be measured in certain cases.  The first one to me looks
> like an obvious win. The second one is debatable, as it involves some
> hand-waving. The third one turned out not to make anything faster, but
> makes the code a little simpler.

BTW, I noticed something really funky when timing t3070 for this series.

  $ time ./t3070-wildmatch.sh
  [a bunch of output]
  real	0m4.750s
  user	0m3.665s
  sys	0m0.955s

  $ time ./t3070-wildmatch.sh >/dev/null
  real	0m18.664s
  user	0m9.185s
  sys	0m9.495s

Er, what? It gets way slower when redirected to /dev/null. I can't
figure out why. Replacing the actual tests with a simple command shows
the same behavior, so it's not a problem in the tested commands. I tried
simplifying what the test script was doing, but it really looks like the
slowdown scales with the number of subshells/forks.

The problem is also independent of shell (bash vs dash). And here's an
even weirder bit. If I pipe the output through cat, it's still fast:

  $ time ./t3070-wildmatch.sh | cat
  [lots of output]
  real	0m4.719s
  user	0m3.636s
  sys	0m0.946s

but when cat goes to /dev/null, it's slow again!

  $ time ./t3070-wildmatch.sh | cat >/dev/null
  real	0m18.383s
  user	0m8.987s
  sys	0m9.450s

So our scripts are seeing the same environment (a pipe), but somehow cat
on the other side is slowing things down. Which seems to me like it
could be a kernel problem, but it's hard to imagine what. I tried a few
versions and couldn't find one that didn't exhibit it.  This is a Debian
unstable machine (so running kernel 6.1.20). Another machine running
stable (so kernel 5.15) did not exhibit the problem, but there are many
different variables beyond kernel version there.

So this doesn't seem like a Git problem at all, but it's an interesting
mystery, and I wondered if anybody else has run into it.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-28 20:28 ` [PATCH 3/4] tests: drop here-doc check from internal chain-linter Jeff King
@ 2023-03-28 21:46   ` Junio C Hamano
  2023-03-29  2:37     ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-03-28 21:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

Jeff King <peff@peff.net> writes:

> I'd also be OK dropping this. 12% is nice, but this one test is an
> outlier. Picking t4202 somewhat at random as a more realistic test, any
> improvement seems to be mostly lost in the noise.
>
>  t/test-lib.sh | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index cfcbd899c5a..0048ec7b6f6 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1101,9 +1101,10 @@ test_run_ () {
>  		trace=
>  		# 117 is magic because it is unlikely to match the exit
>  		# code of other programs
> -		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
> +		test_eval_ "fail_117 && $1"
> +		if test $? != 117
>  		then
> -			BUG "broken &&-chain or run-away HERE-DOC: $1"
> +			BUG "broken &&-chain: $1"
>  		fi

This "here-doc" linter is a cute trick.  With seemingly so little
extra code, it catches a breakage in such an unexpected way.

Even with a very small code footprint, overhead of an extra process
is still there, and it would be very hard to figure out what it does
(once you are told what it does, you can probably figure out how it
works).  These make it a "cute trick".

While it is a bit sad to see it lost, the resulting code certainly
is easier to follow, I would think.  I do not offhand know how
valuable detecting unterminated here-doc is, compared to the
increased clarity of hte code.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-28 21:46   ` Junio C Hamano
@ 2023-03-29  2:37     ` Jeff King
  2023-03-29  3:04       ` Jeff King
  2023-03-29  3:07       ` Eric Sunshine
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff King @ 2023-03-29  2:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 02:46:12PM -0700, Junio C Hamano wrote:

> > -		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
> > +		test_eval_ "fail_117 && $1"
> > +		if test $? != 117
> >  		then
> > -			BUG "broken &&-chain or run-away HERE-DOC: $1"
> > +			BUG "broken &&-chain: $1"
> >  		fi
> 
> This "here-doc" linter is a cute trick.  With seemingly so little
> extra code, it catches a breakage in such an unexpected way.
> 
> Even with a very small code footprint, overhead of an extra process
> is still there, and it would be very hard to figure out what it does
> (once you are told what it does, you can probably figure out how it
> works).  These make it a "cute trick".

Yes, I thought it was quite clever (and it is attributed to you), but I
agree that I did not quite realize what it was doing until after running
git-blame. I only started with "gee, why didn't we write this in the
simpler way?", which is often a good starting point for archaeology. :)

> While it is a bit sad to see it lost, the resulting code certainly
> is easier to follow, I would think.  I do not offhand know how
> valuable detecting unterminated here-doc is, compared to the
> increased clarity of hte code.

I think the complexity is merited _if_ we think it is catching useful
cases. And I do count unterminated here-doc as a useful case, because,
as with broken &&-chains, the failure mode is so nasty (a test will
report success, even though part of the test was simply not run!).

I just think chainlint.pl is doing a good enough job of catching it that
we can rely on it. I'll be curious if Eric has input there on whether it
can do even better, which would remove all of the caveats from the
commit message.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  2:37     ` Jeff King
@ 2023-03-29  3:04       ` Jeff King
  2023-03-29  3:13         ` Eric Sunshine
  2023-03-29  3:07       ` Eric Sunshine
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-29  3:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 10:37:02PM -0400, Jeff King wrote:

> I just think chainlint.pl is doing a good enough job of catching it that
> we can rely on it. I'll be curious if Eric has input there on whether it
> can do even better, which would remove all of the caveats from the
> commit message.

So I _think_ it's something like this:

diff --git a/t/chainlint.pl b/t/chainlint.pl
index e966412999a..6b8c1de5208 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -171,6 +171,9 @@ sub swallow_heredocs {
 		my $start = pos($$b);
 		my $indent = $tag =~ s/^\t// ? '\\s*' : '';
 		$$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
+		if (pos($$b) == $start) {
+			die "oops, we did not find the end of the heredoc";
+		}
 		my $body = substr($$b, $start, pos($$b) - $start);
 		$self->{lineno} += () = $body =~ /\n/sg;
 	}

But I wasn't sure how to surface a clean error from here, since we're in
the Lexer. Maybe we just accumulate a "problems" array here, and then
roll those up via the TestParser? I'm not very familiar with the
arrangement of that part of the script.

And I say "think" because the thing I was worried about is that we'd do
this lexing at too high a level, and accidentally walk past the end of
the test. Which would mean getting fooled by;

  test_expect_success 'this one is broken' '
	cat >foo <<\EOF
	oops, we are missing our here-doc end
  '

  test_expect_success 'this one is ok' '
	cat >foo <<\EOF
	this one is OK, but we would not want to confuse
	its closing tag for the missing one
	EOF
  '

But it looks like Lexer::swallow_heredocs gets to see the individual
test snippets.

-Peff

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  2:37     ` Jeff King
  2023-03-29  3:04       ` Jeff King
@ 2023-03-29  3:07       ` Eric Sunshine
  2023-03-29  6:28         ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Sunshine @ 2023-03-29  3:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 10:37 PM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 28, 2023 at 02:46:12PM -0700, Junio C Hamano wrote:
> > > -           if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
> > > -                   BUG "broken &&-chain or run-away HERE-DOC: $1"
> >
> > This "here-doc" linter is a cute trick.  With seemingly so little
> > extra code, it catches a breakage in such an unexpected way.
> >
> > Even with a very small code footprint, overhead of an extra process
> > is still there, and it would be very hard to figure out what it does
> > (once you are told what it does, you can probably figure out how it
> > works).  These make it a "cute trick".
>
> Yes, I thought it was quite clever (and it is attributed to you), but I
> agree that I did not quite realize what it was doing until after running
> git-blame. I only started with "gee, why didn't we write this in the
> simpler way?", which is often a good starting point for archaeology. :)

I never realized what the "OK-" part of "OK-117" was doing either. I
guess I should have gone spelunking through history to find out,
though it wasn't high-priority for me to know with regards to my work
on chainlint.pl, so I never got around to it. I suspect that the "OK-"
trick was discussed and added during the period I was off-list.

> > While it is a bit sad to see it lost, the resulting code certainly
> > is easier to follow, I would think.  I do not offhand know how
> > valuable detecting unterminated here-doc is, compared to the
> > increased clarity of hte code.
>
> I think the complexity is merited _if_ we think it is catching useful
> cases. And I do count unterminated here-doc as a useful case, because,
> as with broken &&-chains, the failure mode is so nasty (a test will
> report success, even though part of the test was simply not run!).
>
> I just think chainlint.pl is doing a good enough job of catching it that
> we can rely on it. I'll be curious if Eric has input there on whether it
> can do even better, which would remove all of the caveats from the
> commit message.

It was an intentional design choice, for the sake of simplicity, _not_
to make chainlint.pl a shell syntax checker, despite the fact that it
is genuinely parsing shell code. After all, the shell itself -- by
which test code will ultimately be executed -- is more than capable of
diagnosing syntax errors, so why burden chainlint.pl with all that
extra baggage? Instead, chainlint.pl is focussed on detecting semantic
problems -- such as broken &&-chain and missing `return 1` -- which
are of importance to _this_ project.

So, it was very much deliberate that chainlint.pl does not diagnose an
unclosed here-doc. When making that design choice, though, I wasn't
aware of 99a64e4b73c (tests: lint for run-away here-doc, 2017-03-22),
thus didn't know that our test framework had been allowing such
problems to slip through silently[1].

That said, it doesn't look too hard to make chainlint.pl diagnose an
unclosed here-doc.

[1]: Why is that, by the way? Is `eval` not complaining about the
unclosed here-doc? Is that something that can be fixed more generally?
Are there other syntactic problems that go unnoticed?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  3:04       ` Jeff King
@ 2023-03-29  3:13         ` Eric Sunshine
  2023-03-29  3:46           ` Eric Sunshine
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sunshine @ 2023-03-29  3:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 11:04 PM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 28, 2023 at 10:37:02PM -0400, Jeff King wrote:
> > I just think chainlint.pl is doing a good enough job of catching it that
> > we can rely on it. I'll be curious if Eric has input there on whether it
> > can do even better, which would remove all of the caveats from the
> > commit message.
>
> So I _think_ it's something like this:
>
> @@ -171,6 +171,9 @@ sub swallow_heredocs {
>                 my $start = pos($$b);
>                 my $indent = $tag =~ s/^\t// ? '\\s*' : '';
>                 $$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
> +               if (pos($$b) == $start) {
> +                       die "oops, we did not find the end of the heredoc";
> +               }
>                 my $body = substr($$b, $start, pos($$b) - $start);
>                 $self->{lineno} += () = $body =~ /\n/sg;
>         }
>
> But I wasn't sure how to surface a clean error from here, since we're in
> the Lexer. Maybe we just accumulate a "problems" array here, and then
> roll those up via the TestParser? I'm not very familiar with the
> arrangement of that part of the script.

Yes, it would look something like that and you chose the correct spot
to detect the problem, but to get a "pretty" error message properly
positioned in the input, we need to capture the input stream position
of the here-doc tag itself in scan_heredoc_tag(). It doesn't look too
difficult, and I even started writing a bit of code to do it, but I'm
not sure how soon I can get around to finishing the implementation.

> And I say "think" because the thing I was worried about is that we'd do
> this lexing at too high a level, and accidentally walk past the end of
> the test. Which would mean getting fooled by;
>
>   test_expect_success 'this one is broken' '
>         cat >foo <<\EOF
>         oops, we are missing our here-doc end
>   '
>
>   test_expect_success 'this one is ok' '
>         cat >foo <<\EOF
>         this one is OK, but we would not want to confuse
>         its closing tag for the missing one
>         EOF
>   '
>
> But it looks like Lexer::swallow_heredocs gets to see the individual
> test snippets.

Correct. ScriptParser scans the input files for
test_expect_{success,failure} invocations in order to extract the
individual test snippets, which then get passed to TestParser for
semantic analysis.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  3:13         ` Eric Sunshine
@ 2023-03-29  3:46           ` Eric Sunshine
  2023-03-29  4:02             ` Eric Sunshine
  2023-03-29  6:07             ` Jeff King
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Sunshine @ 2023-03-29  3:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

[-- Attachment #1: Type: text/plain, Size: 3386 bytes --]

On Tue, Mar 28, 2023 at 11:13 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, Mar 28, 2023 at 11:04 PM Jeff King <peff@peff.net> wrote:
> > So I _think_ it's something like this:
> >
> > But I wasn't sure how to surface a clean error from here, since we're in
> > the Lexer. Maybe we just accumulate a "problems" array here, and then
> > roll those up via the TestParser? I'm not very familiar with the
> > arrangement of that part of the script.
>
> Yes, it would look something like that and you chose the correct spot
> to detect the problem, but to get a "pretty" error message properly
> positioned in the input, we need to capture the input stream position
> of the here-doc tag itself in scan_heredoc_tag(). It doesn't look too
> difficult, and I even started writing a bit of code to do it, but I'm
> not sure how soon I can get around to finishing the implementation.

The attached patch seems to do the job. Apologies for Gmail messing up
the whitespace. It's also attached to the email.

This would probably make a good preparatory patch to your [3/4]. As
mentioned earlier in the thread, the changes to scan_heredoc_tag ()
capture the input-stream position of the here-doc tag itself, which is
necessary since it would be too late to do so by the time the error is
detected by swallow_heredocs(). I don't now when I'll get time to send
this as a proper patch, so feel free to write a commit message and
incorporate it into your series if you want to use it. And, of course,
you have my sign-off already in the patch. It should be easy to add a
test, as well, in t/chainlint, perhaps as
unclosed-here-doc.{text,expect}.

--- >8 ---
From b7103da900dd843aabb17201bc0f9ef0b7a704ba Mon Sep 17 00:00:00 2001
From: Eric Sunshine <sunshine@sunshineco.com>
Date: Tue, 28 Mar 2023 23:35:33 -0400
Subject: [PATCH] chainlint: diagnose unclosed here-doc

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index e966412999..3dac033ace 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -80,7 +80,8 @@ sub scan_heredoc_tag {
     return "<<$indented" unless $token;
     my $tag = $token->[0];
     $tag =~ s/['"\\]//g;
-    push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
+    $$token[0] = "\t$tag" if $indented;
+    push(@{$self->{heretags}}, $token);
     return "<<$indented$tag";
 }

@@ -169,10 +170,18 @@ sub swallow_heredocs {
     my $tags = $self->{heretags};
     while (my $tag = shift @$tags) {
         my $start = pos($$b);
-        my $indent = $tag =~ s/^\t// ? '\\s*' : '';
-        $$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
+        my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
+        $$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
+        if (pos($$b) > $start) {
+            my $body = substr($$b, $start, pos($$b) - $start);
+            $self->{lineno} += () = $body =~ /\n/sg;
+            next;
+        }
+        push(@{$self->{parser}->{problems}}, ['HERE', $tag]);
+        $$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
         my $body = substr($$b, $start, pos($$b) - $start);
         $self->{lineno} += () = $body =~ /\n/sg;
+        last;
     }
 }

--
2.40.0.460.g7fdda0a984
--- >8 ---

[-- Attachment #2: 0001-chainlint-diagnose-unclosed-here-doc.patch --]
[-- Type: application/octet-stream, Size: 1512 bytes --]

From b7103da900dd843aabb17201bc0f9ef0b7a704ba Mon Sep 17 00:00:00 2001
From: Eric Sunshine <sunshine@sunshineco.com>
Date: Tue, 28 Mar 2023 23:35:33 -0400
Subject: [PATCH] chainlint: diagnose unclosed here-doc

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index e966412999..3dac033ace 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -80,7 +80,8 @@ sub scan_heredoc_tag {
 	return "<<$indented" unless $token;
 	my $tag = $token->[0];
 	$tag =~ s/['"\\]//g;
-	push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
+	$$token[0] = "\t$tag" if $indented;
+	push(@{$self->{heretags}}, $token);
 	return "<<$indented$tag";
 }
 
@@ -169,10 +170,18 @@ sub swallow_heredocs {
 	my $tags = $self->{heretags};
 	while (my $tag = shift @$tags) {
 		my $start = pos($$b);
-		my $indent = $tag =~ s/^\t// ? '\\s*' : '';
-		$$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
+		my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
+		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
+		if (pos($$b) > $start) {
+			my $body = substr($$b, $start, pos($$b) - $start);
+			$self->{lineno} += () = $body =~ /\n/sg;
+			next;
+		}
+		push(@{$self->{parser}->{problems}}, ['HERE', $tag]);
+		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
 		my $body = substr($$b, $start, pos($$b) - $start);
 		$self->{lineno} += () = $body =~ /\n/sg;
+		last;
 	}
 }
 
-- 
2.40.0.460.g7fdda0a984


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  3:46           ` Eric Sunshine
@ 2023-03-29  4:02             ` Eric Sunshine
  2023-03-29  6:07             ` Jeff King
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Sunshine @ 2023-03-29  4:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 11:46 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> The attached patch seems to do the job. Apologies for Gmail messing up
> the whitespace. It's also attached to the email.
>
>      $tag =~ s/['"\\]//g;
> -    push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
> +    $$token[0] = "\t$tag" if $indented;
> +    push(@{$self->{heretags}}, $token);
>      return "<<$indented$tag";

Bah, the `$$token[0] = ...` line is incorrect. It should be:

     $$token[0] = $indented ? "\t$tag" : "$tag";

which more correctly matches the original code. Without this fix,
$$token[0] only gets the cleaned tag name in the `<<-EOF` case but not
in the plain `<<EOF` case.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  3:46           ` Eric Sunshine
  2023-03-29  4:02             ` Eric Sunshine
@ 2023-03-29  6:07             ` Jeff King
  2023-03-29  6:28               ` Eric Sunshine
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-29  6:07 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 11:46:37PM -0400, Eric Sunshine wrote:

> > Yes, it would look something like that and you chose the correct spot
> > to detect the problem, but to get a "pretty" error message properly
> > positioned in the input, we need to capture the input stream position
> > of the here-doc tag itself in scan_heredoc_tag(). It doesn't look too
> > difficult, and I even started writing a bit of code to do it, but I'm
> > not sure how soon I can get around to finishing the implementation.
> 
> The attached patch seems to do the job. Apologies for Gmail messing up
> the whitespace. It's also attached to the email.

Thanks! I was going to say "please don't consider this urgent", but I
see that my nerd-snipe was successful. ;)

> This would probably make a good preparatory patch to your [3/4]. As
> mentioned earlier in the thread, the changes to scan_heredoc_tag ()
> capture the input-stream position of the here-doc tag itself, which is
> necessary since it would be too late to do so by the time the error is
> detected by swallow_heredocs(). I don't now when I'll get time to send
> this as a proper patch, so feel free to write a commit message and
> incorporate it into your series if you want to use it. And, of course,
> you have my sign-off already in the patch. It should be easy to add a
> test, as well, in t/chainlint, perhaps as
> unclosed-here-doc.{text,expect}.

Thanks, I can take it from here (and I agree doing it as prep for 3/4 is
good, as I can then omit a bunch of explanations there). Here are the
tests I'll squash in (along with your $indent fix):

diff --git a/t/chainlint/unclosed-here-doc-indent.expect b/t/chainlint/unclosed-here-doc-indent.expect
new file mode 100644
index 00000000000..6e17bb66336
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc-indent.expect
@@ -0,0 +1,4 @@
+command_which_is_run &&
+cat >expect <<-\EOF ?!HERE?! &&
+we forget to end the here-doc
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc-indent.test b/t/chainlint/unclosed-here-doc-indent.test
new file mode 100644
index 00000000000..5c841a9dfd4
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc-indent.test
@@ -0,0 +1,4 @@
+command_which_is_run &&
+cat >expect <<-\EOF &&
+we forget to end the here-doc
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc.expect b/t/chainlint/unclosed-here-doc.expect
new file mode 100644
index 00000000000..c53b6b794a7
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc.expect
@@ -0,0 +1,7 @@
+command_which_is_run &&
+cat >expect <<\EOF ?!HERE?! &&
+	we try to end the here-doc below,
+	but the indentation throws us off
+	since the operator is not "<<-".
+	EOF
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc.test b/t/chainlint/unclosed-here-doc.test
new file mode 100644
index 00000000000..69d3786c348
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc.test
@@ -0,0 +1,7 @@
+command_which_is_run &&
+cat >expect <<\EOF &&
+	we try to end the here-doc below,
+	but the indentation throws us off
+	since the operator is not "<<-".
+	EOF
+command_which_is_gobbled

-Peff

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  3:07       ` Eric Sunshine
@ 2023-03-29  6:28         ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-29  6:28 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 11:07:17PM -0400, Eric Sunshine wrote:

> > I just think chainlint.pl is doing a good enough job of catching it that
> > we can rely on it. I'll be curious if Eric has input there on whether it
> > can do even better, which would remove all of the caveats from the
> > commit message.
> 
> It was an intentional design choice, for the sake of simplicity, _not_
> to make chainlint.pl a shell syntax checker, despite the fact that it
> is genuinely parsing shell code. After all, the shell itself -- by
> which test code will ultimately be executed -- is more than capable of
> diagnosing syntax errors, so why burden chainlint.pl with all that
> extra baggage? Instead, chainlint.pl is focussed on detecting semantic
> problems -- such as broken &&-chain and missing `return 1` -- which
> are of importance to _this_ project.

Yeah, I'm not too surprised to hear any of that, and I almost wrote off
chainlint.pl for this purpose (hence the hand-waving in my commit
message). But after realizing it has to find here-docs anyway to ignore
them, it seemed reasonable to bend it to my will. ;)

Thanks again for your patch.

> [1]: Why is that, by the way? Is `eval` not complaining about the
> unclosed here-doc? Is that something that can be fixed more generally?
> Are there other syntactic problems that go unnoticed?

The behavior varies from shell to shell. Historically, I suspect it was
a deliberate decision to read until EOF, because it lets you stick
arbitrary data at the end of a script, like:

  $ cat foo.sh
  my_prog <<\EOF
  1 2 3 4
  5 6 7 8
  [imagine kilobytes of data tables here]

without having to worry about terminating it. I think I've seen it with
something like:

  {
    echo "uudecode <<\EOF | tar tf -"
    tar cf - Documentation | uuencode -
  } >foo.sh

which makes foo.sh a sort of self-extracting tarball. (As an aside, I
was disappointed that I did not have uuencode installed by default on my
system. How times have changed. ;) ).

These days bash will warn about it, but dash will not:

  $ bash foo.sh | wc -l
  foo.sh: line 129028: warning: here-document at line 1 delimited by end-of-file (wanted `EOF')
  901

  $ dash foo.sh | wc -l
  901

Bash still has a zero exit code, though, so aside from the stderr cruft
(which is hidden unless you are digging in "-v" output), the tests would
pass. I can't remember when bash started warning, though "git log -S" in
its repo suggests it was bash 4.0 in 2009.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] tests: drop here-doc check from internal chain-linter
  2023-03-29  6:07             ` Jeff King
@ 2023-03-29  6:28               ` Eric Sunshine
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Sunshine @ 2023-03-29  6:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Phillip Wood, Michael J Gruber

On Wed, Mar 29, 2023 at 2:07 AM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 28, 2023 at 11:46:37PM -0400, Eric Sunshine wrote:
> > The attached patch seems to do the job. Apologies for Gmail messing up
> > the whitespace. It's also attached to the email.
>
> Thanks! I was going to say "please don't consider this urgent", but I
> see that my nerd-snipe was successful. ;)

As always. My nerd-snipe armor is in failure mode.

> > This would probably make a good preparatory patch to your [3/4]. As
> > mentioned earlier in the thread, the changes to scan_heredoc_tag ()
> > capture the input-stream position of the here-doc tag itself, which is
> > necessary since it would be too late to do so by the time the error is
> > detected by swallow_heredocs(). I don't now when I'll get time to send
> > this as a proper patch, so feel free to write a commit message and
> > incorporate it into your series if you want to use it. And, of course,
> > you have my sign-off already in the patch. It should be easy to add a
> > test, as well, in t/chainlint, perhaps as
> > unclosed-here-doc.{text,expect}.
>
> Thanks, I can take it from here (and I agree doing it as prep for 3/4 is
> good, as I can then omit a bunch of explanations there). Here are the
> tests I'll squash in (along with your $indent fix):

Thanks for picking this up.

> diff --git a/t/chainlint/unclosed-here-doc-indent.expect b/t/chainlint/unclosed-here-doc-indent.expect
> diff --git a/t/chainlint/unclosed-here-doc-indent.test b/t/chainlint/unclosed-here-doc-indent.test
> diff --git a/t/chainlint/unclosed-here-doc.expect b/t/chainlint/unclosed-here-doc.expect
> diff --git a/t/chainlint/unclosed-here-doc.test b/t/chainlint/unclosed-here-doc.test

The new tests look fine.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] tests: run internal chain-linter under "make test"
  2023-03-28 20:22 ` [PATCH 1/4] tests: run internal chain-linter under "make test" Jeff King
@ 2023-03-29 10:20   ` Ævar Arnfjörð Bjarmason
  2023-03-29 15:49     ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-29 10:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber


On Tue, Mar 28 2023, Jeff King wrote:

> Since 69b9924b875 (t/Makefile: teach `make test` and `make prove` to run
> chainlint.pl, 2022-09-01), we run a single chainlint.pl process for all
> scripts, and then instruct each individual script to run with the
> equivalent of --no-chain-lint, which tells them not to redundantly run
> the chainlint script themselves.
>
> However, this also disables the internal linter run within the shell by
> eval-ing "(exit 117) && $1" and confirming we get code 117. In theory
> the external linter produces a superset of complaints, and we don't need
> the internal one anymore. However, we know there is at least one case
> where they differ. A test like:
>
> 	test_expect_success 'should fail linter' '
> 		false &&
> 		sleep 2 &
> 		pid=$! &&
> 		kill $pid
> 	'
>
> is buggy (it ignores the failure from "false", because it is
> backgrounded along with the sleep). The internal linter catches this,
> but the external one doesn't (and teaching it to do so is complicated[1]).
> So not only does "make test" miss this problem, but it's doubly
> confusing because running the script standalone does complain.
>
> Let's teach the suppression in the Makefile to only turn off the
> external linter (which we know is redundant, as it was already run) and
> leave the internal one intact.
>
> I've used a new environment variable to do this here, and intentionally
> did not add a "--no-ext-chain-lint" option. This is an internal
> optimization used by the Makefile, and not something that ordinary users
> would need to tweak.
>
> [1] For discussion of chainlint.pl and this case, see:
>     https://lore.kernel.org/git/CAPig+cQtLFX4PgXyyK_AAkCvg4Aw2RAC5MmLbib-aHHgTBcDuw@mail.gmail.com/
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/Makefile    | 2 +-
>  t/test-lib.sh | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/t/Makefile b/t/Makefile
> index 88fa5049573..10881affdd0 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -45,7 +45,7 @@ CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
>  # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
>  # checks all tests in all scripts via a single invocation, so tell individual
>  # scripts not to "chainlint" themselves
> -CHAINLINTSUPPRESS = GIT_TEST_CHAIN_LINT=0 && export GIT_TEST_CHAIN_LINT &&
> +CHAINLINTSUPPRESS = GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT &&
>  
>  all: $(DEFAULT_TEST_TARGET)
>  
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 62136caee5a..09789566374 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1593,7 +1593,8 @@ then
>  	BAIL_OUT_ENV_NEEDS_SANITIZE_LEAK "GIT_TEST_SANITIZE_LEAK_LOG=true"
>  fi
>  
> -if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
> +if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 &&
> +   test "${GIT_TEST_EXT_CHAIN_LINT:-1}" != 0
>  then
>  	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
>  		BUG "lint error (see '?!...!? annotations above)"

So, we used to use the "sed" script to run against the *.sh, and do the
"eval 117" check, but since chainlint.pl we've checked all the tests in
one go when invoked from the Makefile, and turned off the "eval 117".

And, as this change points out, there's cases where chainlint.pl won't
catch everything, so we'd like to keep the "eval" bits by default, but
not the invoking of chainlint.pl by the script itself.

All of which makes sense.

But it seems to me that the findings in this change make the docs
outdated, and we should update them, but maybe I'm missing something.

Now the comment in the Makefile still says (as seen in the context):

	# `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
	# checks all tests in all scripts via a single invocation, so tell individual
	# scripts not to "chainlint" themselves

But as the commit message notes that's not accurate anymore. We're *not*
telling them to turn off chainlint in its entirety, we're telling them
to only suppress the chainlint.pl part.

So if you invoke the Makefile with GIT_TEST_CHAIN_LINT=0 we'll still
turn all of it off, but an invokation of chainlint.pl for the scripts as
a batch is no longer thought to make the "eval 117" trick redundant.

I haven't looked too deeply, but it seems that we should at least adjust
that comment in the Makefile, or perhaps we should rename this "eval
117" thing to be "internal lint" or something, and not "chain lint", to
avoid conflating it with chainlint.pl itself.

So maybe something like this, i.e. we'd just "rebrand" this to begin
with, and avoid editing the t/Makefile at all (untested)?

 t/README      | 12 ++++++++++++
 t/test-lib.sh |  6 +++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/README b/t/README
index 29576c37488..83d58f2e680 100644
--- a/t/README
+++ b/t/README
@@ -196,6 +196,18 @@ appropriately before running "make". Short options can be bundled, i.e.
 	this feature by setting the GIT_TEST_CHAIN_LINT environment
 	variable to "1" or "0", respectively.
 
+--internal-lint::
+--no-internal-lint::
+	If --internal-lint is enabled, the test harness will try to
+	detect missing "&&"-chains in edge cases that "--chain-lint"
+	wouldn't notice, unlike "--chain-lint" this check is run from
+	within "test_expect_*" itself, and thus requires the tests to
+	run.
+
+	You may also enable or disable this feature by setting the
+	GIT_TEST_INTERNAL_LINT environment variable to "1" or "0",
+	respectively.
+
 --stress::
 	Run the test script repeatedly in multiple parallel jobs until
 	one of them fails.  Useful for reproducing rare failures in
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 62136caee5a..792c68aa56a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -205,6 +205,10 @@ parse_option () {
 		GIT_TEST_CHAIN_LINT=1 ;;
 	--no-chain-lint)
 		GIT_TEST_CHAIN_LINT=0 ;;
+	--internal-lint)
+		GIT_TEST_INTERNAL_LINT=1 ;;
+	--no-internal-lint)
+		GIT_TEST_INTERNAL_LINT=0 ;;
 	-x)
 		trace=t ;;
 	-V|--verbose-log)
@@ -1090,7 +1094,7 @@ test_run_ () {
 	test_cleanup=:
 	expecting_failure=$2
 
-	if test "${GIT_TEST_CHAIN_LINT:-1}" != 0; then
+	if test "${GIT_TEST_INTERNAL_LINT:-1}" != 0; then
 		# turn off tracing for this test-eval, as it simply creates
 		# confusing noise in the "-x" output
 		trace_tmp=$trace

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] tests: run internal chain-linter under "make test"
  2023-03-29 10:20   ` Ævar Arnfjörð Bjarmason
@ 2023-03-29 15:49     ` Junio C Hamano
  2023-03-29 23:28       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-03-29 15:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, git, Eric Sunshine, Phillip Wood, Michael J Gruber

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Now the comment in the Makefile still says (as seen in the context):
>
> 	# `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
> 	# checks all tests in all scripts via a single invocation, so tell individual
> 	# scripts not to "chainlint" themselves
>
> But as the commit message notes that's not accurate anymore. We're *not*
> telling them to turn off chainlint in its entirety, we're telling them
> to only suppress the chainlint.pl part.

Correct.  To avoid a confusion we saw in the thread that led to this
series, we need an explanation that clearly distinguishes the "exit
117" one and the "script that parses the shell" one.  If we consider
that the name "chainlint" refers to the latter, the script, and
re-read the three lines with that in mind, I think they are OK.  It
would become even clearer if we insert four words, like so:

	`test-chainlint' (which is ...) checks ... via a single
	invocation of the "chainlint" script, so tell ...

> So if you invoke the Makefile with GIT_TEST_CHAIN_LINT=0 we'll still
> turn all of it off, but an invokation of chainlint.pl for the scripts as
> a batch is no longer thought to make the "eval 117" trick redundant.

Yes, I think that is the idea reached by the discussion in the other
thread that led to this series.

> I haven't looked too deeply, but it seems that we should at least adjust
> that comment in the Makefile, or perhaps we should rename this "eval
> 117" thing to be "internal lint" or something, and not "chain lint", to
> avoid conflating it with chainlint.pl itself.

I agree that it would help to clarify that we mean by "chainlint"
the script that parses (or the act of running the script, when it is
used as a verb), and the above may be a way to do so.  I however do
not see the need to come up with a new name for the other one, until
we have a way to toggle it enabled/disabled, which is not something
the discussion in the other thread found necessary.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] tests: run internal chain-linter under "make test"
  2023-03-29 15:49     ` Junio C Hamano
@ 2023-03-29 23:28       ` Jeff King
  2023-03-30 18:45         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-29 23:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Eric Sunshine,
	Phillip Wood, Michael J Gruber

On Wed, Mar 29, 2023 at 08:49:07AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> > Now the comment in the Makefile still says (as seen in the context):
> >
> > 	# `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
> > 	# checks all tests in all scripts via a single invocation, so tell individual
> > 	# scripts not to "chainlint" themselves
> >
> > But as the commit message notes that's not accurate anymore. We're *not*
> > telling them to turn off chainlint in its entirety, we're telling them
> > to only suppress the chainlint.pl part.
> 
> Correct.  To avoid a confusion we saw in the thread that led to this
> series, we need an explanation that clearly distinguishes the "exit
> 117" one and the "script that parses the shell" one.  If we consider
> that the name "chainlint" refers to the latter, the script, and
> re-read the three lines with that in mind, I think they are OK.  It
> would become even clearer if we insert four words, like so:
> 
> 	`test-chainlint' (which is ...) checks ... via a single
> 	invocation of the "chainlint" script, so tell ...

I had hoped that "chainlint" in that comment would remain sufficient, as
the context implies that we're disabling the script. But it's easy
enough to expand. I squashed this in:

diff --git a/t/Makefile b/t/Makefile
index 10881affdd0..3e00cdd801d 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -44,7 +44,7 @@ CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
 
 # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
 # checks all tests in all scripts via a single invocation, so tell individual
-# scripts not to "chainlint" themselves
+# scripts not to run the external "chainlint.pl" script themselves
 CHAINLINTSUPPRESS = GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT &&
 
 all: $(DEFAULT_TEST_TARGET)

> > I haven't looked too deeply, but it seems that we should at least adjust
> > that comment in the Makefile, or perhaps we should rename this "eval
> > 117" thing to be "internal lint" or something, and not "chain lint", to
> > avoid conflating it with chainlint.pl itself.
> 
> I agree that it would help to clarify that we mean by "chainlint"
> the script that parses (or the act of running the script, when it is
> used as a verb), and the above may be a way to do so.  I however do
> not see the need to come up with a new name for the other one, until
> we have a way to toggle it enabled/disabled, which is not something
> the discussion in the other thread found necessary.

Yeah. This distinction is not something anybody who is running or
writing tests should have to worry about (and the fact that it did come
up was a bug that this patch is fixing). So adding an option like
"--no-chain-lint-internal" is just making things more confusing, and
nobody would use it. You'd need it only if you are working on
chainlint.pl itself (and comparing results with the internal linter or
something), and there you are better off running "perl chainlint.pl
t1234-foo.sh" itself.

-Peff

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] tests: run internal chain-linter under "make test"
  2023-03-29 23:28       ` Jeff King
@ 2023-03-30 18:45         ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2023-03-30 18:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Eric Sunshine,
	Phillip Wood, Michael J Gruber

Jeff King <peff@peff.net> writes:

> I had hoped that "chainlint" in that comment would remain sufficient, as
> the context implies that we're disabling the script. But it's easy
> enough to expand. I squashed this in:
>
> diff --git a/t/Makefile b/t/Makefile
> index 10881affdd0..3e00cdd801d 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -44,7 +44,7 @@ CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
>  
>  # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
>  # checks all tests in all scripts via a single invocation, so tell individual
> -# scripts not to "chainlint" themselves
> +# scripts not to run the external "chainlint.pl" script themselves

OK.  I've taken it and did "rebase -i" on this end.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/5] some chainlint fixes and performance improvements
  2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
                   ` (4 preceding siblings ...)
  2023-03-28 21:08 ` [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
@ 2023-03-30 19:27 ` Jeff King
  2023-03-30 19:27   ` [PATCH v2 1/5] tests: run internal chain-linter under "make test" Jeff King
                     ` (5 more replies)
  5 siblings, 6 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 04:20:44PM -0400, Jeff King wrote:

> Here are a few fixes for chainlint.

And here's a re-roll.

As before, I think the first patch is the most important, and the rest
are optimizations. But with Eric's patch to chainlint.pl in the middle,
I think the argument for patch 4 (previously patch 3) is much stronger.

Patch 5 remains mostly a cleanup, with no performance improvement. IMHO
the result is easier to follow, but I'm open to arguments to the
contrary.

  [1/5]: tests: run internal chain-linter under "make test"
  [2/5]: tests: replace chainlint subshell with a function
  [3/5]: tests: diagnose unclosed here-doc in chainlint.pl
  [4/5]: tests: drop here-doc check from internal chain-linter
  [5/5]: tests: skip test_eval_ in internal chain-lint

 t/Makefile                                  |  4 +--
 t/chainlint.pl                              | 15 +++++++++---
 t/chainlint/unclosed-here-doc-indent.expect |  4 +++
 t/chainlint/unclosed-here-doc-indent.test   |  4 +++
 t/chainlint/unclosed-here-doc.expect        |  7 ++++++
 t/chainlint/unclosed-here-doc.test          |  7 ++++++
 t/test-lib.sh                               | 27 +++++++++++----------
 7 files changed, 50 insertions(+), 18 deletions(-)
 create mode 100644 t/chainlint/unclosed-here-doc-indent.expect
 create mode 100644 t/chainlint/unclosed-here-doc-indent.test
 create mode 100644 t/chainlint/unclosed-here-doc.expect
 create mode 100644 t/chainlint/unclosed-here-doc.test

Range-diff:

1:  19deb7195df ! 1:  d536d3b9ec0 tests: run internal chain-linter under "make test"
    @@ Commit message
     
      ## t/Makefile ##
     @@ t/Makefile: CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
    + 
      # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
      # checks all tests in all scripts via a single invocation, so tell individual
    - # scripts not to "chainlint" themselves
    +-# scripts not to "chainlint" themselves
     -CHAINLINTSUPPRESS = GIT_TEST_CHAIN_LINT=0 && export GIT_TEST_CHAIN_LINT &&
    ++# scripts not to run the external "chainlint.pl" script themselves
     +CHAINLINTSUPPRESS = GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT &&
      
      all: $(DEFAULT_TEST_TARGET)
2:  a05c440dde5 = 2:  fa29c781fca tests: replace chainlint subshell with a function
-:  ----------- > 3:  c1a3ec3619e tests: diagnose unclosed here-doc in chainlint.pl
3:  46556678938 ! 4:  b5dc3618c83 tests: drop here-doc check from internal chain-linter
    @@ Commit message
         run for many tests.
     
         The tradeoff in safety was undoubtedly worth it when 99a64e4b73c was
    -    written. But these days, the external chainlint.pl does a pretty good
    -    job of finding these (even though it's not something it specifically
    -    tries to flag). For example, if you have a test like:
    -
    -            test_expect_success 'should fail linter' '
    -                    some_command >actual &&
    -                    cat >expect <<-\EOF &&
    -                    ok
    -                    # missing EOF line here
    -                    test_cmp expect actual
    -            '
    -
    -    it will see that the here-doc isn't closed, treat it as not-a-here-doc,
    -    and complain that the "ok" line does not have an "&&". So in practice we
    -    should be catching these via that linter, although:
    -
    -      - the error message is not as good as it could be (the real problem is
    -        the unclosed here-doc)
    -
    -      - it can be fooled if there are no lines in the here-doc:
    -
    -          cat >expect <<-\EOF &&
    -          # missing EOF line here
    -
    -        or if every line in the here-doc has &&-chaining (weird, but
    -        possible)
    -
    -    Those are sufficiently unlikely that they're not worth worrying too much
    -    about. And by switching back to a simpler chain-lint, hyperfine reports
    -    a measurable speedup on t3070 (which has 1800 tests):
    +    written. But since the external chainlint.pl learned to find these
    +    recently, we can just rely on it. By switching back to a simpler
    +    chain-lint, hyperfine reports a measurable speedup on t3070 (which has
    +    1800 tests):
     
           'HEAD' ran
             1.12 ± 0.01 times faster than 'HEAD~1'
4:  f810780d326 = 5:  0ebf1da8b93 tests: skip test_eval_ in internal chain-lint


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/5] tests: run internal chain-linter under "make test"
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
@ 2023-03-30 19:27   ` Jeff King
  2023-03-30 19:27   ` [PATCH v2 2/5] tests: replace chainlint subshell with a function Jeff King
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

Since 69b9924b875 (t/Makefile: teach `make test` and `make prove` to run
chainlint.pl, 2022-09-01), we run a single chainlint.pl process for all
scripts, and then instruct each individual script to run with the
equivalent of --no-chain-lint, which tells them not to redundantly run
the chainlint script themselves.

However, this also disables the internal linter run within the shell by
eval-ing "(exit 117) && $1" and confirming we get code 117. In theory
the external linter produces a superset of complaints, and we don't need
the internal one anymore. However, we know there is at least one case
where they differ. A test like:

	test_expect_success 'should fail linter' '
		false &&
		sleep 2 &
		pid=$! &&
		kill $pid
	'

is buggy (it ignores the failure from "false", because it is
backgrounded along with the sleep). The internal linter catches this,
but the external one doesn't (and teaching it to do so is complicated[1]).
So not only does "make test" miss this problem, but it's doubly
confusing because running the script standalone does complain.

Let's teach the suppression in the Makefile to only turn off the
external linter (which we know is redundant, as it was already run) and
leave the internal one intact.

I've used a new environment variable to do this here, and intentionally
did not add a "--no-ext-chain-lint" option. This is an internal
optimization used by the Makefile, and not something that ordinary users
would need to tweak.

[1] For discussion of chainlint.pl and this case, see:
    https://lore.kernel.org/git/CAPig+cQtLFX4PgXyyK_AAkCvg4Aw2RAC5MmLbib-aHHgTBcDuw@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
---
Same as before, but tweaking the Makefile comment.

 t/Makefile    | 4 ++--
 t/test-lib.sh | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/Makefile b/t/Makefile
index 88fa5049573..3e00cdd801d 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -44,8 +44,8 @@ CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl

 # `test-chainlint` (which is a dependency of `test-lint`, `test` and `prove`)
 # checks all tests in all scripts via a single invocation, so tell individual
-# scripts not to "chainlint" themselves
-CHAINLINTSUPPRESS = GIT_TEST_CHAIN_LINT=0 && export GIT_TEST_CHAIN_LINT &&
+# scripts not to run the external "chainlint.pl" script themselves
+CHAINLINTSUPPRESS = GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT &&

 all: $(DEFAULT_TEST_TARGET)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 62136caee5a..09789566374 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1593,7 +1593,8 @@ then
 	BAIL_OUT_ENV_NEEDS_SANITIZE_LEAK "GIT_TEST_SANITIZE_LEAK_LOG=true"
 fi

-if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 &&
+   test "${GIT_TEST_EXT_CHAIN_LINT:-1}" != 0
 then
 	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
 		BUG "lint error (see '?!...!? annotations above)"
-- 
2.40.0.692.g7c4c956fc5c

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/5] tests: replace chainlint subshell with a function
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
  2023-03-30 19:27   ` [PATCH v2 1/5] tests: run internal chain-linter under "make test" Jeff King
@ 2023-03-30 19:27   ` Jeff King
  2023-03-30 19:30   ` [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl Jeff King
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

To test that we don't break the &&-chain, test-lib.sh does something
like:

   (exit 117) && $test_commands

and checks that the result is exit code 117. We don't care what that
initial command is, as long as it exits with a unique code. Using "exit"
works and is simple, but is a bit expensive since it requires a subshell
(to avoid exiting the whole script!). This isn't usually very
noticeable, but it can add up for scripts which have a large number of
tests.

Using "return" naively won't work here, because we'd return from the
function eval-ing the snippet (and it wouldn't find &&-chain breakages).
But if we further push that into its own function, it does exactly what
we want, without extra subshell overhead.

According to hyperfine, this produces a measurable improvement when
running t3070 (which has 1800 tests, all of them quite short):

  'HEAD' ran
    1.09 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
---
Same as before.

 t/test-lib.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 09789566374..cfcbd899c5a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1086,6 +1086,10 @@ test_eval_ () {
 	return $test_eval_ret_
 }

+fail_117 () {
+	return 117
+}
+
 test_run_ () {
 	test_cleanup=:
 	expecting_failure=$2
@@ -1097,7 +1101,7 @@ test_run_ () {
 		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		if test "OK-117" != "$(test_eval_ "(exit 117) && $1${LF}${LF}echo OK-\$?" 3>&1)"
+		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
 		then
 			BUG "broken &&-chain or run-away HERE-DOC: $1"
 		fi
-- 
2.40.0.692.g7c4c956fc5c

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
  2023-03-30 19:27   ` [PATCH v2 1/5] tests: run internal chain-linter under "make test" Jeff King
  2023-03-30 19:27   ` [PATCH v2 2/5] tests: replace chainlint subshell with a function Jeff King
@ 2023-03-30 19:30   ` Jeff King
  2023-03-30 21:26     ` Eric Sunshine
  2023-03-30 19:30   ` [PATCH v2 4/5] tests: drop here-doc check from internal chain-linter Jeff King
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

From: Eric Sunshine <sunshine@sunshineco.com>

An unclosed here-doc in a test is a problem, because it silently gobbles
up any remaining commands. Since 99a64e4b73c (tests: lint for run-away
here-doc, 2017-03-22) we detect this by piggy-backing on the internal
chainlint checker in test-lib.sh.

However, it would be nice to detect it in chainlint.pl, for a few
reasons:

  - the output from chainlint.pl is much nicer; it can show the exact
    spot of the error, rather than a vague "somewhere in this test you
    broke the &&-chain or had a bad here-doc" message.

  - the implementation in test-lib.sh runs for each test snippet. And
    since it requires a subshell, the extra cost is small but not zero.
    If chainlint.pl can reliably find the problem, we can optimize the
    test-lib.sh code.

The chainlint.pl code never intended to find here-doc problems. But
since it has to parse them anyway (to avoid reporting problems inside
here-docs), most of what we need is already there. We can detect the
problem when we fail to find the missing end-tag in swallow_heredocs().
The extra change in scan_heredoc_tag() stores the location of the start
of the here-doc, which lets us mark it as the source of the error in the
output (see the new tests for examples).

[jk: added commit message and tests]

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
---
New in this iteration (thanks again, Eric!).

I changed the error-tag from "HERE" to "UNCLOSED-HEREDOC". I don't think
there any syntactic limitations to worry about (it's just shown to the
user). And it needs to be descriptive enough not to confuse users.
"HERE" is especially bad because my brain interprets it as "HERE there
is an error", which makes me say "Right. What error?".

So "HEREDOC" would work, but there is no reason (aside from length) not
to err on the side of being descriptive.

 t/chainlint.pl                              | 15 ++++++++++++---
 t/chainlint/unclosed-here-doc-indent.expect |  4 ++++
 t/chainlint/unclosed-here-doc-indent.test   |  4 ++++
 t/chainlint/unclosed-here-doc.expect        |  7 +++++++
 t/chainlint/unclosed-here-doc.test          |  7 +++++++
 5 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 t/chainlint/unclosed-here-doc-indent.expect
 create mode 100644 t/chainlint/unclosed-here-doc-indent.test
 create mode 100644 t/chainlint/unclosed-here-doc.expect
 create mode 100644 t/chainlint/unclosed-here-doc.test

diff --git a/t/chainlint.pl b/t/chainlint.pl
index e966412999a..556ee91a15b 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -80,7 +80,8 @@ sub scan_heredoc_tag {
 	return "<<$indented" unless $token;
 	my $tag = $token->[0];
 	$tag =~ s/['"\\]//g;
-	push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
+	$$token[0] = $indented ? "\t$tag" : "$tag";
+	push(@{$self->{heretags}}, $token);
 	return "<<$indented$tag";
 }
 
@@ -169,10 +170,18 @@ sub swallow_heredocs {
 	my $tags = $self->{heretags};
 	while (my $tag = shift @$tags) {
 		my $start = pos($$b);
-		my $indent = $tag =~ s/^\t// ? '\\s*' : '';
-		$$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
+		my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
+		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
+		if (pos($$b) > $start) {
+			my $body = substr($$b, $start, pos($$b) - $start);
+			$self->{lineno} += () = $body =~ /\n/sg;
+			next;
+		}
+		push(@{$self->{parser}->{problems}}, ['UNCLOSED-HEREDOC', $tag]);
+		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
 		my $body = substr($$b, $start, pos($$b) - $start);
 		$self->{lineno} += () = $body =~ /\n/sg;
+		last;
 	}
 }
 
diff --git a/t/chainlint/unclosed-here-doc-indent.expect b/t/chainlint/unclosed-here-doc-indent.expect
new file mode 100644
index 00000000000..7c30a1a024e
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc-indent.expect
@@ -0,0 +1,4 @@
+command_which_is_run &&
+cat >expect <<-\EOF ?!UNCLOSED-HEREDOC?! &&
+we forget to end the here-doc
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc-indent.test b/t/chainlint/unclosed-here-doc-indent.test
new file mode 100644
index 00000000000..5c841a9dfd4
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc-indent.test
@@ -0,0 +1,4 @@
+command_which_is_run &&
+cat >expect <<-\EOF &&
+we forget to end the here-doc
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc.expect b/t/chainlint/unclosed-here-doc.expect
new file mode 100644
index 00000000000..d65e50f78d4
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc.expect
@@ -0,0 +1,7 @@
+command_which_is_run &&
+cat >expect <<\EOF ?!UNCLOSED-HEREDOC?! &&
+	we try to end the here-doc below,
+	but the indentation throws us off
+	since the operator is not "<<-".
+	EOF
+command_which_is_gobbled
diff --git a/t/chainlint/unclosed-here-doc.test b/t/chainlint/unclosed-here-doc.test
new file mode 100644
index 00000000000..69d3786c348
--- /dev/null
+++ b/t/chainlint/unclosed-here-doc.test
@@ -0,0 +1,7 @@
+command_which_is_run &&
+cat >expect <<\EOF &&
+	we try to end the here-doc below,
+	but the indentation throws us off
+	since the operator is not "<<-".
+	EOF
+command_which_is_gobbled
-- 
2.40.0.692.g7c4c956fc5c


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 4/5] tests: drop here-doc check from internal chain-linter
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
                     ` (2 preceding siblings ...)
  2023-03-30 19:30   ` [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl Jeff King
@ 2023-03-30 19:30   ` Jeff King
  2023-03-30 19:30   ` [PATCH v2 5/5] tests: skip test_eval_ in internal chain-lint Jeff King
  2023-03-30 20:32   ` [PATCH v2 0/5] some chainlint fixes and performance improvements Junio C Hamano
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

Commit 99a64e4b73c (tests: lint for run-away here-doc, 2017-03-22)
tweaked the chain-lint test to catch unclosed here-docs. It works by
adding an extra "echo" command after the test snippet, and checking that
it is run (if it gets swallowed by a here-doc, naturally it is not run).

The downside here is that we introduced an extra $() substitution, which
happens in a subshell. This has a measurable performance impact when
run for many tests.

The tradeoff in safety was undoubtedly worth it when 99a64e4b73c was
written. But since the external chainlint.pl learned to find these
recently, we can just rely on it. By switching back to a simpler
chain-lint, hyperfine reports a measurable speedup on t3070 (which has
1800 tests):

  'HEAD' ran
    1.12 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
---
Now with 90% less hand-waving.

 t/test-lib.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index cfcbd899c5a..0048ec7b6f6 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1101,9 +1101,10 @@ test_run_ () {
 		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		if test "OK-117" != "$(test_eval_ "fail_117 && $1${LF}${LF}echo OK-\$?" 3>&1)"
+		test_eval_ "fail_117 && $1"
+		if test $? != 117
 		then
-			BUG "broken &&-chain or run-away HERE-DOC: $1"
+			BUG "broken &&-chain: $1"
 		fi
 		trace=$trace_tmp
 	fi
-- 
2.40.0.692.g7c4c956fc5c

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 5/5] tests: skip test_eval_ in internal chain-lint
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
                     ` (3 preceding siblings ...)
  2023-03-30 19:30   ` [PATCH v2 4/5] tests: drop here-doc check from internal chain-linter Jeff King
@ 2023-03-30 19:30   ` Jeff King
  2023-03-30 20:32   ` [PATCH v2 0/5] some chainlint fixes and performance improvements Junio C Hamano
  5 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 19:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Eric Sunshine, Phillip Wood, Michael J Gruber

To check for broken &&-chains, we run "fail_117 && $1" as a test
snippet, and check the exit code. We use test_eval_ to do so, because
that's the way we run the actual test.

But we don't need any of its niceties, like "set -x" tracing. In fact,
they hinder us, because we have to explicitly disable them. So let's
skip that and use "eval" more directly, which is simpler. I had hoped it
would also be faster, but it doesn't seem to produce a measurable
improvement (probably because it's just running internal shell commands,
with no subshells or forks).

Note that there is one gotcha: even though we don't intend to run any of
the commands if the &&-chain is intact, an error like this:

   test_expect_success 'broken' '
	# this next line breaks the &&-chain
	true
	# and then this one is executed even by the linter
	return 1
   '

means we'll "return 1" from the eval, and thus from test_run_(). We
actually do notice this in test_expect_success, but only by saying "hey,
this test didn't say it was OK, so it must have failed", which is not
right (it should say "broken &&-chain").

We can handle this by calling test_eval_inner_() instead, which is our
trick for wrapping "return" in a test snippet. But to do that, we have
to push the trace code out of that inner function and into test_eval_().
This is arguably where it belonged in the first place, but it never
mattered because the "inner_" function had only one caller.

Signed-off-by: Jeff King <peff@peff.net>
---
Same as before.

 t/test-lib.sh | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0048ec7b6f6..293caf0f20e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1041,10 +1041,7 @@ want_trace () {
 # (and we want to make sure we run any cleanup like
 # "set +x").
 test_eval_inner_ () {
-	# Do not add anything extra (including LF) after '$*'
-	eval "
-		want_trace && trace_level_=$(($trace_level_+1)) && set -x
-		$*"
+	eval "$*"
 }

 test_eval_ () {
@@ -1069,7 +1066,10 @@ test_eval_ () {
 	#     be _inside_ the block to avoid polluting the "set -x" output
 	#

-	test_eval_inner_ "$@" </dev/null >&3 2>&4
+	# Do not add anything extra (including LF) after '$*'
+	test_eval_inner_ </dev/null >&3 2>&4 "
+		want_trace && trace_level_=$(($trace_level_+1)) && set -x
+		$*"
 	{
 		test_eval_ret_=$?
 		if want_trace
@@ -1095,18 +1095,13 @@ test_run_ () {
 	expecting_failure=$2

 	if test "${GIT_TEST_CHAIN_LINT:-1}" != 0; then
-		# turn off tracing for this test-eval, as it simply creates
-		# confusing noise in the "-x" output
-		trace_tmp=$trace
-		trace=
 		# 117 is magic because it is unlikely to match the exit
 		# code of other programs
-		test_eval_ "fail_117 && $1"
+		test_eval_inner_ "fail_117 && $1" </dev/null >&3 2>&4
 		if test $? != 117
 		then
 			BUG "broken &&-chain: $1"
 		fi
-		trace=$trace_tmp
 	fi

 	setup_malloc_check
-- 
2.40.0.692.g7c4c956fc5c

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/5] some chainlint fixes and performance improvements
  2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
                     ` (4 preceding siblings ...)
  2023-03-30 19:30   ` [PATCH v2 5/5] tests: skip test_eval_ in internal chain-lint Jeff King
@ 2023-03-30 20:32   ` Junio C Hamano
  2023-03-30 22:09     ` Jeff King
  5 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-03-30 20:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

Jeff King <peff@peff.net> writes:

> As before, I think the first patch is the most important, and the rest
> are optimizations. But with Eric's patch to chainlint.pl in the middle,
> I think the argument for patch 4 (previously patch 3) is much stronger.
>
> Patch 5 remains mostly a cleanup, with no performance improvement. IMHO
> the result is easier to follow, but I'm open to arguments to the
> contrary.
>
>   [1/5]: tests: run internal chain-linter under "make test"
>   [2/5]: tests: replace chainlint subshell with a function
>   [3/5]: tests: diagnose unclosed here-doc in chainlint.pl
>   [4/5]: tests: drop here-doc check from internal chain-linter
>   [5/5]: tests: skip test_eval_ in internal chain-lint

The new step [3/5] makes it easier to justify [4/5], indeed.  Two
primary changes at the beginning are good as before.  The last one
does not make anything particularly easier to read, replacing one
cryptic eval stuff with another, but it does not make it any worse,
and the most importantly, it is clear to see that it does not change
the behaviour.

Will queue.  Thanks.  Let's merge it down to 'next'.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl
  2023-03-30 19:30   ` [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl Jeff King
@ 2023-03-30 21:26     ` Eric Sunshine
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Sunshine @ 2023-03-30 21:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Phillip Wood, Michael J Gruber

On Thu, Mar 30, 2023 at 3:30 PM Jeff King <peff@peff.net> wrote:
> From: Eric Sunshine <sunshine@sunshineco.com>
>
> An unclosed here-doc in a test is a problem, because it silently gobbles
> up any remaining commands. Since 99a64e4b73c (tests: lint for run-away
> here-doc, 2017-03-22) we detect this by piggy-backing on the internal
> chainlint checker in test-lib.sh.
>
> However, it would be nice to detect it in chainlint.pl, for a few
> reasons:
>
>   - the output from chainlint.pl is much nicer; it can show the exact
>     spot of the error, rather than a vague "somewhere in this test you
>     broke the &&-chain or had a bad here-doc" message.
>
>   - the implementation in test-lib.sh runs for each test snippet. And
>     since it requires a subshell, the extra cost is small but not zero.
>     If chainlint.pl can reliably find the problem, we can optimize the
>     test-lib.sh code.
>
> The chainlint.pl code never intended to find here-doc problems. But
> since it has to parse them anyway (to avoid reporting problems inside
> here-docs), most of what we need is already there. We can detect the
> problem when we fail to find the missing end-tag in swallow_heredocs().
> The extra change in scan_heredoc_tag() stores the location of the start
> of the here-doc, which lets us mark it as the source of the error in the
> output (see the new tests for examples).
>
> [jk: added commit message and tests]

Thanks for packaging this up as a proper patch.

> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> New in this iteration (thanks again, Eric!).
>
> I changed the error-tag from "HERE" to "UNCLOSED-HEREDOC". I don't think
> there any syntactic limitations to worry about (it's just shown to the
> user).

The error-tag is plucked by the script at a couple places using a
regex such as /\?![^?]+\?!/ which won't be tripped up by the longer
and hyphenated name, so no problem there.

> And it needs to be descriptive enough not to confuse users.
> "HERE" is especially bad because my brain interprets it as "HERE there
> is an error", which makes me say "Right. What error?".
>
> So "HEREDOC" would work, but there is no reason (aside from length) not
> to err on the side of being descriptive.

I had the same thought about HERE being potentially misleading, thus
wavered between HERE and HEREDOC, but ultimately chose the shorter for
length-similarity with the existing AMP and LOOP. That reminds me
that, for some time now, I've been thinking that t/README ought to
have a section explaining the meaning of AMP and LOOP (and HERE, if we
went with that) since newcomers may not intuitively understand what
such terse complaints mean.

> diff --git a/t/chainlint.pl b/t/chainlint.pl
> @@ -80,7 +80,8 @@ sub scan_heredoc_tag {
> -       push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
> +       $$token[0] = $indented ? "\t$tag" : "$tag";
> +       push(@{$self->{heretags}}, $token);
>         return "<<$indented$tag";
> @@ -169,10 +170,18 @@ sub swallow_heredocs {
> -               my $indent = $tag =~ s/^\t// ? '\\s*' : '';
> -               $$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
> +               my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
> +               $$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
> +               if (pos($$b) > $start) {
> +                       my $body = substr($$b, $start, pos($$b) - $start);
> +                       $self->{lineno} += () = $body =~ /\n/sg;
> +                       next;
> +               }
> +               push(@{$self->{parser}->{problems}}, ['UNCLOSED-HEREDOC', $tag]);
> +               $$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
>                 my $body = substr($$b, $start, pos($$b) - $start);
>                 $self->{lineno} += () = $body =~ /\n/sg;
> +               last;

Oddly enough, these changes look fine to me.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] some chainlint fixes and performance improvements
  2023-03-28 21:08 ` [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
@ 2023-03-30 22:08   ` Jeff King
  2023-03-30 22:16     ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2023-03-30 22:08 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Phillip Wood, Michael J Gruber

On Tue, Mar 28, 2023 at 05:08:15PM -0400, Jeff King wrote:

> BTW, I noticed something really funky when timing t3070 for this series.
> 
>   $ time ./t3070-wildmatch.sh
>   [a bunch of output]
>   real	0m4.750s
>   user	0m3.665s
>   sys	0m0.955s
> 
>   $ time ./t3070-wildmatch.sh >/dev/null
>   real	0m18.664s
>   user	0m9.185s
>   sys	0m9.495s
> 
> Er, what? It gets way slower when redirected to /dev/null. I can't
> figure out why.

In case anyone is curious (and I know you were all on the edge of your
seats), I figured this out. The issue is that with the "powersave" CPU
governor in place, we never ratchet up the CPU frequency. Perhaps
because no process is pegging the CPU, but we just have tons of small
processes that quickly exit (which seems like a blind spot in the
governor, but at least makes some sense).

When the output is going to the terminal, then the terminal is consuming
CPU, and the frequency scales up. So it's faster when we show the
output, even though we're doing more work, because the CPU clock is
faster. Switching to the "performance" governor makes the problem go
away.

I cared for this series, of course, because I wanted to run t3070 under
hyperfine, which behaves like the /dev/null case (unless you pass
--show-output, which mangles the screen, and is why the hyperfine output
I showed earlier was so terse). So with the performance governor in
place, here's the hyperfine output for the whole series (this is on the
5-patch v2):

  $ hyperfine -P parent 0 5 -s 'git checkout jk/chainlint-fixes~{parent}' \
      -n 't3070 on jk/chainlint-fixes~{parent}' ./t3070-wildmatch.sh

  Benchmark 1: t3070 on jk/chainlint-fixes~0
    Time (mean ± σ):      3.677 s ±  0.047 s    [User: 2.893 s, System: 0.677 s]
    Range (min … max):    3.606 s …  3.725 s    10 runs
  
  Benchmark 2: t3070 on jk/chainlint-fixes~1
    Time (mean ± σ):      3.720 s ±  0.013 s    [User: 2.941 s, System: 0.676 s]
    Range (min … max):    3.698 s …  3.738 s    10 runs
  
  Benchmark 3: t3070 on jk/chainlint-fixes~2
    Time (mean ± σ):      4.224 s ±  0.019 s    [User: 3.291 s, System: 0.850 s]
    Range (min … max):    4.191 s …  4.254 s    10 runs
  
  Benchmark 4: t3070 on jk/chainlint-fixes~3
    Time (mean ± σ):      4.227 s ±  0.018 s    [User: 3.293 s, System: 0.856 s]
    Range (min … max):    4.198 s …  4.252 s    10 runs
  
  Benchmark 5: t3070 on jk/chainlint-fixes~4
    Time (mean ± σ):      4.604 s ±  0.014 s    [User: 3.599 s, System: 0.887 s]
    Range (min … max):    4.583 s …  4.629 s    10 runs
  
  Benchmark 6: t3070 on jk/chainlint-fixes~5
    Time (mean ± σ):      4.603 s ±  0.010 s    [User: 3.578 s, System: 0.904 s]
    Range (min … max):    4.583 s …  4.617 s    10 runs
  
  Summary
    't3070 on jk/chainlint-fixes~0' ran
      1.01 ± 0.01 times faster than 't3070 on jk/chainlint-fixes~1'
      1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~2'
      1.15 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~3'
      1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~5'
      1.25 ± 0.02 times faster than 't3070 on jk/chainlint-fixes~4'

Which is what we'd expect. We got about 1.25x faster, in two jumps at ~3
and ~1, which were the patches removing subshells (marking commits by
their parent number is rather confusing; I think it might be worth
making a small hyperfine wrapper that feeds the commit summary to "-n").

So no effect on the series (good), but I didn't want to leave the
mystery unsolved on the list. :)

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/5] some chainlint fixes and performance improvements
  2023-03-30 20:32   ` [PATCH v2 0/5] some chainlint fixes and performance improvements Junio C Hamano
@ 2023-03-30 22:09     ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2023-03-30 22:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

On Thu, Mar 30, 2023 at 01:32:34PM -0700, Junio C Hamano wrote:

> >   [5/5]: tests: skip test_eval_ in internal chain-lint
> 
> The new step [3/5] makes it easier to justify [4/5], indeed.  Two
> primary changes at the beginning are good as before.  The last one
> does not make anything particularly easier to read, replacing one
> cryptic eval stuff with another, but it does not make it any worse,
> and the most importantly, it is clear to see that it does not change
> the behaviour.

Yeah, the eval garbage is horrid no matter where it is. The improvement
is that we no longer have to manually save-and-restore the "trace"
variable.

> Will queue.  Thanks.  Let's merge it down to 'next'.

Great, thanks. :)

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] some chainlint fixes and performance improvements
  2023-03-30 22:08   ` Jeff King
@ 2023-03-30 22:16     ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2023-03-30 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Phillip Wood, Michael J Gruber

Jeff King <peff@peff.net> writes:

> When the output is going to the terminal, then the terminal is consuming
> CPU, and the frequency scales up. So it's faster when we show the
> output, even though we're doing more work, because the CPU clock is
> faster.

That's disgustingly sick and crazy ;-).  But it does explain the symptom
and it sounds like that "powersave" is a poor match for measurement.

> Switching to the "performance" governor makes the problem go
> away.
> ...
> So no effect on the series (good), but I didn't want to leave the
> mystery unsolved on the list. :)

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-03-30 22:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-28 20:20 [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
2023-03-28 20:22 ` [PATCH 1/4] tests: run internal chain-linter under "make test" Jeff King
2023-03-29 10:20   ` Ævar Arnfjörð Bjarmason
2023-03-29 15:49     ` Junio C Hamano
2023-03-29 23:28       ` Jeff King
2023-03-30 18:45         ` Junio C Hamano
2023-03-28 20:23 ` [PATCH 2/4] tests: replace chainlint subshell with a function Jeff King
2023-03-28 20:40   ` Junio C Hamano
2023-03-28 20:28 ` [PATCH 3/4] tests: drop here-doc check from internal chain-linter Jeff King
2023-03-28 21:46   ` Junio C Hamano
2023-03-29  2:37     ` Jeff King
2023-03-29  3:04       ` Jeff King
2023-03-29  3:13         ` Eric Sunshine
2023-03-29  3:46           ` Eric Sunshine
2023-03-29  4:02             ` Eric Sunshine
2023-03-29  6:07             ` Jeff King
2023-03-29  6:28               ` Eric Sunshine
2023-03-29  3:07       ` Eric Sunshine
2023-03-29  6:28         ` Jeff King
2023-03-28 20:28 ` [PATCH 4/4] tests: skip test_eval_ in internal chain-lint Jeff King
2023-03-28 21:08 ` [PATCH 0/4] some chainlint fixes and performance improvements Jeff King
2023-03-30 22:08   ` Jeff King
2023-03-30 22:16     ` Junio C Hamano
2023-03-30 19:27 ` [PATCH v2 0/5] " Jeff King
2023-03-30 19:27   ` [PATCH v2 1/5] tests: run internal chain-linter under "make test" Jeff King
2023-03-30 19:27   ` [PATCH v2 2/5] tests: replace chainlint subshell with a function Jeff King
2023-03-30 19:30   ` [PATCH v2 3/5] tests: diagnose unclosed here-doc in chainlint.pl Jeff King
2023-03-30 21:26     ` Eric Sunshine
2023-03-30 19:30   ` [PATCH v2 4/5] tests: drop here-doc check from internal chain-linter Jeff King
2023-03-30 19:30   ` [PATCH v2 5/5] tests: skip test_eval_ in internal chain-lint Jeff King
2023-03-30 20:32   ` [PATCH v2 0/5] some chainlint fixes and performance improvements Junio C Hamano
2023-03-30 22:09     ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.