All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] question mark in GIT_SKIP_TESTS is broken in master
@ 2021-06-15 20:55 Andrei Rybak
  2021-06-15 23:28 ` [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master) Andrei Rybak
  0 siblings, 1 reply; 17+ messages in thread
From: Andrei Rybak @ 2021-06-15 20:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Hi,

I've taken an example of how to use GIT_SKIP_TESTS directly from
t/README:

     $ GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make

and noticed that t0000 is still being run.  I've written a script to
bisect the issue:


     #/bin/sh

     (
             cd t
             GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make | \
                     head | grep 'SKIP skip all tests in t0000'
     )


and it bisected to edc23840b0 (test-lib: bring $remove_trash out of
retirement, 2021-05-10) which was merged to master in 2019256717 ("Merge
branch 'ab/test-lib-updates'", 2021-06-14).

Note, that spelling out the tests number in full works, i.e.

     $ GIT_SKIP_TESTS=t0000 make

won't run t0000.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master)
  2021-06-15 20:55 [BUG] question mark in GIT_SKIP_TESTS is broken in master Andrei Rybak
@ 2021-06-15 23:28 ` Andrei Rybak
  2021-06-16  3:28   ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Andrei Rybak @ 2021-06-15 23:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On 15/06/2021 22:55, Andrei Rybak wrote:
> Hi,
> 
> I've taken an example of how to use GIT_SKIP_TESTS directly from
> t/README:
> 
>      $ GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make
> 
> and noticed that t0000 is still being run.  I've written a script to
> bisect the issue:
> 
> 
>      #/bin/sh
> 
>      (
>              cd t
>              GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make | \
>                      head | grep 'SKIP skip all tests in t0000'
>      )
> 
> 
> and it bisected to edc23840b0 (test-lib: bring $remove_trash out of
> retirement, 2021-05-10) which was merged to master in 2019256717 ("Merge
> branch 'ab/test-lib-updates'", 2021-06-14).
> 
> Note, that spelling out the tests number in full works, i.e.
> 
>      $ GIT_SKIP_TESTS=t0000 make
> 
> won't run t0000.

I sent this originally with incorrect subject. Question marks, e.g.

     $ GIT_SKIP_TESTS='t000?' make

work.  It is the range expressions, like [0-9] that are broken:

     $ GIT_SKIP_TESTS='t[0-9]000' make

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master)
  2021-06-15 23:28 ` [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master) Andrei Rybak
@ 2021-06-16  3:28   ` Junio C Hamano
  2021-06-16  4:12     ` Junio C Hamano
  2021-06-16  8:24     ` [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-06-16  3:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Andrei Rybak; +Cc: git

Andrei Rybak <rybak.a.v@gmail.com> writes:

> I sent this originally with incorrect subject. Question marks, e.g.
>
>     $ GIT_SKIP_TESTS='t000?' make
>
> work.  It is the range expressions, like [0-9] that are broken:
>
>     $ GIT_SKIP_TESTS='t[0-9]000' make

This reproduces for me, and

      $ GIT_SKIP_TESTS=t?000' sh t0000-basic.sh
      $ GIT_SKIP_TESTS=t0?00' sh t0000-basic.sh
      $ GIT_SKIP_TESTS=t00?0' sh t0000-basic.sh

tells me that "Question marks work" above is not exactly correct.
It is an exception that this

      $ GIT_SKIP_TESTS=t000?' sh t0000-basic.sh

happens to skip.

Interestingly enough, edc23840b0 (test-lib: bring $remove_trash out
of retirement, 2021-05-10) cleanly reverts without being depended on
by anything else in the series.

Ævar?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master)
  2021-06-16  3:28   ` Junio C Hamano
@ 2021-06-16  4:12     ` Junio C Hamano
  2021-06-16  8:29       ` Jeff King
  2021-06-16  8:24     ` [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-06-16  4:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Andrei Rybak, git

Junio C Hamano <gitster@pobox.com> writes:

> Interestingly enough, edc23840b0 (test-lib: bring $remove_trash out
> of retirement, 2021-05-10) cleanly reverts without being depended on
> by anything else in the series.
>
> Ævar?

With the crude debugging aid patch (attached at the end) applied,
running

    $ GIT_SKIP_TESTS='t?000' sh -x t0000-basic.sh -v

will show something interesting in the trace.

    ++ this_test=t0000
    ++ _s_k_i_p_='t?000'
    ++ match_pattern_list t0000 t5000

The variable $GIT_SKIP_TESTS on this line:

    if match_pattern_list "$this_test" $GIT_SKIP_TESTS

globs to t5000.  We don't quote the variable because we want them
separated at $IFS boundaries, but we didn't want the glob specials
in its value to take any effect.  Sigh.

The reason why edc23840b0 appears to break this is probably because
we are still in $TEST_DIRECTORY when this match_pattern_list is
executed; before that change, we've created $TRASH_DIRECTORY and
chdir'd there already, and when we check "do we want to skip all?",
there is nothing for the glob to match.

That also explains why GIT_SKIP_TESTS="t000?" appears to work.
There is no such filesystem entity directly in $TEST_DIRECTORY.

    $ echo t000? t00?0 t0?00 t?000
    t000? t00?0 t0200 t5000



diff --git i/t/test-lib.sh w/t/test-lib.sh
index 54938c6427..8ee0540532 100644
--- i/t/test-lib.sh
+++ w/t/test-lib.sh
@@ -1346,13 +1346,17 @@ fi
 remove_trash=
 this_test=${0##*/}
 this_test=${this_test%%-*}
+_s_k_i_p_=$GIT_SKIP_TESTS
+
 if match_pattern_list "$this_test" $GIT_SKIP_TESTS
 then
 	say_color info >&3 "skipping test $this_test altogether"
 	skip_all="skip all tests in $this_test"
 	test_done
 fi
 
+exit
+
 # Last-minute variable setup
 HOME="$TRASH_DIRECTORY"
 GNUPGHOME="$HOME/gnupg-home-not-used"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  3:28   ` Junio C Hamano
  2021-06-16  4:12     ` Junio C Hamano
@ 2021-06-16  8:24     ` Ævar Arnfjörð Bjarmason
  2021-06-16  8:36       ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-16  8:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Andrei Rybak,
	Đoàn Trần Công Danh,
	Ævar Arnfjörð Bjarmason

My edc23840b0b (test-lib: bring $remove_trash out of retirement,
2021-05-10) caused a regression where we'd fail to handle
GIT_SKIP_TESTS variable content like 't????', since we were calling
match_pattern_list() from the t/ directory not the trash
directory.

We'd thus glob match in the directory containing our tests, whereas
before we'd do the match in our newly setup (and empty, aside from the
.git) trash directory.

This bug reveals a more general problem though, which I'm attempting
to solve here: We have other users of match_pattern_list() that have
always been broken, namely the --verbose-only=* and --valgrind-only=*
options added in ff09af3fb8f (test-lib: verbose mode for only tests
matching a pattern, 2013-06-23) and 5dfc368f5ec (test-lib: valgrind
for only tests matching a pattern, 2013-06-23).

Those options will try to match an argument like --verbose-only=1*
against a test number like "10", but since we run the match in our own
trash directory an earlier test creating a file like "1one.txt" will
break that option.

We cannot simply quote the $GIT_SKIP_TESTS" being passed to
match_pattern_list(), since we are relying on the $IFS semantics.

Let's instead setup a .test-lib-trash subdirectory under the trash
directory, and an "empty-dir" directory under that. Then let's run the
match_pattern_list() in a sub-shell in that directory.

This fixes the regression in my edc23840b0b, as well as the bugs we've
had in the --{verbose,valgrind}-only=* options. I have tests for this,
but they're a pain to support without my in-flight lib-subtest.sh
changes, I'll submit them once those land.

This has the added benefit of providing a new clean work area for
tests in general, so functions in test-lib-functions.sh can use it for
their own scratch files, instead of potentially tripping up the test
logic itself.

The drawback is that any tests that cares about the complete
cleanliness of its test are might need to be adjusted, as one ls-files
test here does. I think it's still worth it because that'll be one
special case (and a dotfile), as opposed to every test potentially
tripping over the likes of "expected.config" on a glob match of "*".

1. https://lore.kernel.org/git/1d003cac-83fa-0b63-f60e-55513ac45cf9@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Wed, Jun 16 2021, Junio C Hamano wrote:

> Interestingly enough, edc23840b0 (test-lib: bring $remove_trash out
> of retirement, 2021-05-10) cleanly reverts without being depended on
> by anything else in the series.

I think reverting edc23840b0 might be the best short-term fix to avoid
dealing with the breakage + get a quick fix down. I can confirm that
nothing else in the series relies on it.

This patch is an attempt at a more general solution to the problem,
which depending on what you think you might want to take instead. It
fixes not only the immediate regression, but as noted other existing
bugs in match_pattern_list() usage.

 t/t3000-ls-files-others.sh |  8 ++++++--
 t/test-lib-functions.sh    |  8 +++++---
 t/test-lib.sh              | 40 +++++++++++++++++++++++---------------
 3 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/t/t3000-ls-files-others.sh b/t/t3000-ls-files-others.sh
index 740ce56eab5..8c62da502df 100755
--- a/t/t3000-ls-files-others.sh
+++ b/t/t3000-ls-files-others.sh
@@ -37,6 +37,7 @@ test_expect_success 'setup: expected output' '
 	cat >expected1 <<-\EOF &&
 	expected1
 	expected2
+	expected2.noempty
 	expected3
 	output
 	path0
@@ -47,7 +48,10 @@ test_expect_success 'setup: expected output' '
 
 	sed -e "s|path2/file2|path2/|" <expected1 >expected2 &&
 	cp expected2 expected3 &&
-	echo path4/ >>expected2
+	echo path4/ >>expected2 &&
+
+	echo .test-lib-trash/ >expected2.noempty &&
+	cat expected2 >>expected2.noempty
 '
 
 test_expect_success 'ls-files --others' '
@@ -57,7 +61,7 @@ test_expect_success 'ls-files --others' '
 
 test_expect_success 'ls-files --others --directory' '
 	git ls-files --others --directory >output &&
-	test_cmp expected2 output
+	test_cmp expected2.noempty output
 '
 
 test_expect_success '--no-empty-directory hides empty directory' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f0448daa74b..33697b0df81 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1040,10 +1040,12 @@ test_cmp_config () {
 		GD="-C $1" &&
 		shift
 	fi &&
-	printf "%s\n" "$1" >expect.config &&
+	printf "%s\n" "$1" >"$TRASH_DIRECTORY_TEST_LIB"/expect.config &&
 	shift &&
-	git $GD config "$@" >actual.config &&
-	test_cmp expect.config actual.config
+	git $GD config "$@" >"$TRASH_DIRECTORY_TEST_LIB"/actual.config &&
+	test_cmp \
+		"$TRASH_DIRECTORY_TEST_LIB"/expect.config \
+		"$TRASH_DIRECTORY_TEST_LIB"/actual.config
 }
 
 # test_cmp_bin - helper to compare binary files
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 54938c64279..7f284f56b1d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -260,6 +260,8 @@ case "$TRASH_DIRECTORY" in
 /*) ;; # absolute path is good
  *) TRASH_DIRECTORY="$TEST_OUTPUT_DIRECTORY/$TRASH_DIRECTORY" ;;
 esac
+TRASH_DIRECTORY_TEST_LIB="$TRASH_DIRECTORY/.test-lib-trash"
+TRASH_DIRECTORY_TEST_LIB_EMPTY="$TRASH_DIRECTORY_TEST_LIB/empty-dir"
 
 # If --stress was passed, run this test repeatedly in several parallel loops.
 if test "$GIT_TEST_STRESS_STARTED" = "done"
@@ -848,7 +850,8 @@ maybe_teardown_verbose () {
 last_verbose=t
 maybe_setup_verbose () {
 	test -z "$verbose_only" && return
-	if match_pattern_list $test_count $verbose_only
+	if (cd "$TRASH_DIRECTORY_TEST_LIB_EMPTY" &&
+	    match_pattern_list $test_count $verbose_only)
 	then
 		exec 4>&2 3>&1
 		# Emit a delimiting blank line when going from
@@ -878,7 +881,8 @@ maybe_setup_valgrind () {
 		return
 	fi
 	GIT_VALGRIND_ENABLED=
-	if match_pattern_list $test_count $valgrind_only
+	if (cd "$TRASH_DIRECTORY_TEST_LIB_EMPTY" &&
+	    match_pattern_list $test_count $valgrind_only)
 	then
 		GIT_VALGRIND_ENABLED=t
 	fi
@@ -1006,7 +1010,8 @@ test_finish_ () {
 test_skip () {
 	to_skip=
 	skipped_reason=
-	if match_pattern_list $this_test.$test_count $GIT_SKIP_TESTS
+	if (cd "$TRASH_DIRECTORY_TEST_LIB_EMPTY" &&
+	    match_pattern_list $this_test.$test_count $GIT_SKIP_TESTS)
 	then
 		to_skip=t
 		skipped_reason="GIT_SKIP_TESTS"
@@ -1177,7 +1182,7 @@ test_done () {
 			esac
 		fi
 
-		if test -z "$debug" && test -n "$remove_trash"
+		if test -z "$debug"
 		then
 			test -d "$TRASH_DIRECTORY" ||
 			error "Tests passed but trash directory already removed before test cleanup; aborting"
@@ -1342,11 +1347,22 @@ then
 	exit 1
 fi
 
+# Test repository
+rm -fr "$TRASH_DIRECTORY" || {
+	GIT_EXIT_OK=t
+	echo >&5 "FATAL: Cannot prepare test area"
+	exit 1
+}
+
+# Set up an early work area for the test code itself
+mkdir -p "$TRASH_DIRECTORY_TEST_LIB_EMPTY" >&3 2>&4 ||
+	error "cannot create test-lib.sh trash directory"
+
 # Are we running this test at all?
-remove_trash=
 this_test=${0##*/}
 this_test=${this_test%%-*}
-if match_pattern_list "$this_test" $GIT_SKIP_TESTS
+if (cd "$TRASH_DIRECTORY_TEST_LIB_EMPTY" &&
+    match_pattern_list "$this_test" $GIT_SKIP_TESTS)
 then
 	say_color info >&3 "skipping test $this_test altogether"
 	skip_all="skip all tests in $this_test"
@@ -1358,20 +1374,12 @@ HOME="$TRASH_DIRECTORY"
 GNUPGHOME="$HOME/gnupg-home-not-used"
 export HOME GNUPGHOME
 
-# Test repository
-rm -fr "$TRASH_DIRECTORY" || {
-	GIT_EXIT_OK=t
-	echo >&5 "FATAL: Cannot prepare test area"
-	exit 1
-}
-
-remove_trash=t
+# "We have the $TRASH_DIRECTORY already, but let's create a
+# $TRASH_DIRECTORY/.git
 if test -z "$TEST_NO_CREATE_REPO"
 then
 	git init "$TRASH_DIRECTORY" >&3 2>&4 ||
 	error "cannot run git init"
-else
-	mkdir -p "$TRASH_DIRECTORY"
 fi
 
 # Use -P to resolve symlinks in our working directory so that the cwd
-- 
2.32.0.555.g0268d380f7b


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master)
  2021-06-16  4:12     ` Junio C Hamano
@ 2021-06-16  8:29       ` Jeff King
  2021-06-16  9:19         ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-06-16  8:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Andrei Rybak, git

On Wed, Jun 16, 2021 at 01:12:09PM +0900, Junio C Hamano wrote:

> The variable $GIT_SKIP_TESTS on this line:
> 
>     if match_pattern_list "$this_test" $GIT_SKIP_TESTS
> 
> globs to t5000.  We don't quote the variable because we want them
> separated at $IFS boundaries, but we didn't want the glob specials
> in its value to take any effect.  Sigh.

Good find.

It's surprisingly hard to do field-splitting without pathname globbing
in pure shell. I couldn't find a way without using "set -f". That's in
POSIX, but it feels funny tweaking a global that can effect how other
code runs. We can at least constraint it to a subshell close to the
point of use:

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 54938c6427..093104d04f 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -732,14 +732,15 @@ match_pattern_list () {
 	arg="$1"
 	shift
 	test -z "$*" && return 1
-	for pattern_
+	(set -f
+	for pattern_ in $*
 	do
 		case "$arg" in
 		$pattern_)
-			return 0
+			exit 0
 		esac
 	done
-	return 1
+	exit 1)
 }
 
 match_test_selector_list () {
@@ -848,7 +849,7 @@ maybe_teardown_verbose () {
 last_verbose=t
 maybe_setup_verbose () {
 	test -z "$verbose_only" && return
-	if match_pattern_list $test_count $verbose_only
+	if match_pattern_list $test_count "$verbose_only"
 	then
 		exec 4>&2 3>&1
 		# Emit a delimiting blank line when going from
@@ -878,7 +879,7 @@ maybe_setup_valgrind () {
 		return
 	fi
 	GIT_VALGRIND_ENABLED=
-	if match_pattern_list $test_count $valgrind_only
+	if match_pattern_list $test_count "$valgrind_only"
 	then
 		GIT_VALGRIND_ENABLED=t
 	fi
@@ -1006,7 +1007,7 @@ test_finish_ () {
 test_skip () {
 	to_skip=
 	skipped_reason=
-	if match_pattern_list $this_test.$test_count $GIT_SKIP_TESTS
+	if match_pattern_list $this_test.$test_count "$GIT_SKIP_TESTS"
 	then
 		to_skip=t
 		skipped_reason="GIT_SKIP_TESTS"
@@ -1346,7 +1347,7 @@ fi
 remove_trash=
 this_test=${0##*/}
 this_test=${this_test%%-*}
-if match_pattern_list "$this_test" $GIT_SKIP_TESTS
+if match_pattern_list "$this_test" "$GIT_SKIP_TESTS"
 then
 	say_color info >&3 "skipping test $this_test altogether"
 	skip_all="skip all tests in $this_test"

If that isn't portable for some reason, I think we could fall back on
splitting with an external tool. You can't feed the result as a function
argument (you run into the same problem!) but you can use "read" to
split on newlines, like:

  echo "$GIT_SKIP_TESTS" |
  tr ' ' '\n' |
  while read pattern
  do
    echo "got $pattern"
  done

That does put the shell code on the right-hand side of a pipe, which
means it's constrained in terms of setting variables, etc. But that
would be acceptable for our use here.

I dunno. Maybe somebody else can come up with something more clever (or
maybe I am just missing something obvious).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  8:24     ` [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs Ævar Arnfjörð Bjarmason
@ 2021-06-16  8:36       ` Jeff King
  2021-06-16  9:22         ` Junio C Hamano
  2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2021-06-16  8:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Andrei Rybak,
	Đoàn Trần Công Danh

On Wed, Jun 16, 2021 at 10:24:13AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Those options will try to match an argument like --verbose-only=1*
> against a test number like "10", but since we run the match in our own
> trash directory an earlier test creating a file like "1one.txt" will
> break that option.
> 
> We cannot simply quote the $GIT_SKIP_TESTS" being passed to
> match_pattern_list(), since we are relying on the $IFS semantics.
> 
> Let's instead setup a .test-lib-trash subdirectory under the trash
> directory, and an "empty-dir" directory under that. Then let's run the
> match_pattern_list() in a sub-shell in that directory.

Looks like my email just crossed with this one. Your "cd to an empty
directory" is a fun version of my "maybe somebody can think of something
clever" statement. :)

As a general solution, it does fail if the globs may contain things that
look like absolute paths, but that is quite unlikely for our use case
here. Still, I kind of like the "set -f" version because it doesn't need
the extra directory which could cause problems with "ls-files -o", etc,
as you mentioned. You could also create the empty directory on the fly,
though if "set -f" works portably, that seems less complicated to me.

Whatever the expansion mechanism, I do think it's worth having callers
quote "$GIT_SKIP_TESTS" and then performing the expansion within
match_pattern_list. Then the nasty mechanics are all in that one place.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master)
  2021-06-16  8:29       ` Jeff King
@ 2021-06-16  9:19         ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-06-16  9:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Andrei Rybak, git

Jeff King <peff@peff.net> writes:

> It's surprisingly hard to do field-splitting without pathname globbing
> in pure shell. I couldn't find a way without using "set -f". That's in
> POSIX, but it feels funny tweaking a global that can effect how other
> code runs. We can at least constraint it to a subshell close to the
> point of use:
> ...
> -	for pattern_
> +	(set -f
> +	for pattern_ in $*
>  	do
>  		case "$arg" in
>  		$pattern_)
> -			return 0
> +			exit 0
>  		esac
>  	done
> -	return 1
> +	exit 1)
>  }

Nice.  "set -f" is what I wanted to find myself but couldn't, when I
wrote the message you are responding to.

> -if match_pattern_list "$this_test" $GIT_SKIP_TESTS
> +if match_pattern_list "$this_test" "$GIT_SKIP_TESTS"

OK.  'for pattern_ in $*' that flattens $* allows us to quote it
here, passing it as a single argument without globbing.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  8:36       ` Jeff King
@ 2021-06-16  9:22         ` Junio C Hamano
  2021-06-16 10:23           ` Jeff King
  2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-06-16  9:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Andrei Rybak,
	Đoàn Trần Công Danh

Jeff King <peff@peff.net> writes:

> ... Still, I kind of like the "set -f" version because it doesn't need
> the extra directory which could cause problems with "ls-files -o", etc,
> as you mentioned. You could also create the empty directory on the fly,
> though if "set -f" works portably, that seems less complicated to me.

FWIW, I share that.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  8:36       ` Jeff King
  2021-06-16  9:22         ` Junio C Hamano
@ 2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
  2021-06-16 11:43           ` Jeff King
  2021-06-17  0:36           ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-16  9:49 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Andrei Rybak,
	Đoàn Trần Công Danh


On Wed, Jun 16 2021, Jeff King wrote:

> On Wed, Jun 16, 2021 at 10:24:13AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Those options will try to match an argument like --verbose-only=1*
>> against a test number like "10", but since we run the match in our own
>> trash directory an earlier test creating a file like "1one.txt" will
>> break that option.
>> 
>> We cannot simply quote the $GIT_SKIP_TESTS" being passed to
>> match_pattern_list(), since we are relying on the $IFS semantics.
>> 
>> Let's instead setup a .test-lib-trash subdirectory under the trash
>> directory, and an "empty-dir" directory under that. Then let's run the
>> match_pattern_list() in a sub-shell in that directory.
>
> Looks like my email just crossed with this one. Your "cd to an empty
> directory" is a fun version of my "maybe somebody can think of something
> clever" statement. :)

Yeah, I sent it before seeing yours.

> As a general solution, it does fail if the globs may contain things that
> look like absolute paths, but that is quite unlikely for our use case
> here.[...]

Not just unlikely, impossible. Yes it's possible in the general case,
but in the match_pattern_list we are always normalizing
e.g. ./t1234-some-test.sh t t1234, and the other match cases are test
numbers etc. It doesn't need to deal with the general case.

> [...] Still, I kind of like the "set -f" version because it doesn't need
> the extra directory which could cause problems with "ls-files -o", etc,
> as you mentioned. You could also create the empty directory on the fly,
> though if "set -f" works portably, that seems less complicated to me.

Yeah, in isolation I'd agree, but given the desire (with existing code
and other in-flight code) to have a scratch are for the test library
code itself simply creating such a directory seems like a good thing to
have in general, and once we have it we can use the subshell trick, or
just "set -f" I suppose and use the scratch dir for other stuff.

I am a bit wary of this being our first ever use of "set -f", but maybe
it's sufficiently portable.

> Whatever the expansion mechanism, I do think it's worth having callers
> quote "$GIT_SKIP_TESTS" and then performing the expansion within
> match_pattern_list. Then the nasty mechanics are all in that one place.

I think it's rather clean to not quote it, i.e. to have the loop get a
list of item and then things to match, it would also make it easier to
e.g. port it to a native C program.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  9:22         ` Junio C Hamano
@ 2021-06-16 10:23           ` Jeff King
  2021-06-16 10:24             ` Jeff King
  2021-06-16 11:38             ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2021-06-16 10:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Andrei Rybak,
	Đoàn Trần Công Danh

On Wed, Jun 16, 2021 at 06:22:10PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... Still, I kind of like the "set -f" version because it doesn't need
> > the extra directory which could cause problems with "ls-files -o", etc,
> > as you mentioned. You could also create the empty directory on the fly,
> > though if "set -f" works portably, that seems less complicated to me.
> 
> FWIW, I share that.

Here it is with some cosmetic cleanups and a commit message. I don't
mean to preempt further discussion if Ævar prefers another route, but I
want to make sure we didn't stall out.

I'm a little curious whether the bug could be triggered before the
recent move to running match_pattern_list outside of the trash directory
(i.e., whether there is some test that creates a sufficiently
confusing-looking file in the filesystem; t0000 would be a likely
candidate). But not curious enough to comb through the test suite
looking for candidates. :)

-- >8 --
Subject: [PATCH] test-lib: avoid accidental globbing in match_pattern_list()

We have a custom match_pattern_list() function which we use for matching
test names (like "t1234") against glob-like patterns (like "t1???") for
$GIT_SKIP_TESTS, --verbose-only, etc.

Those patterns may have multiple whitespace-separated elements (e.g.,
"t0* t1234 t5?78"). The callers of match_pattern_list thus pass the
strings unquoted, so that the shell does the usual field-splitting into
separate arguments.

But this also means the shell will do the usual globbing for each
argument, which can result in us seeing an expansion based on what's in
the filesystem, rather than the real pattern. For example, if I have the
path "t5000" in the filesystem, and you feed the pattern "t?0000", that
_should_ match the string "t0000", but it won't after the shell has
expanded it to "t5000".

This has been a bug ever since that function was introduced. But it
didn't usually trigger since we typically use the function inside the
trash directory, which has a very limited set of files that are unlikely
to match. It became a lot easier to trigger after edc23840b0 (test-lib:
bring $remove_trash out of retirement, 2021-05-10), because now we match
$GIT_SKIP_TESTS before even entering the trash directory. So the t5000
example above can be seen with:

  GIT_SKIP_TESTS=t?000 ./t0000-basic.sh

which should skip all tests but doesn't.

We can fix this by using "set -f" to ask the shell not to glob (which is
in POSIX, so should hopefully be portable enough). We only want to do
this in a subshell (to avoid polluting the rest of the script), which
means we need to get the whole string intact into the match_pattern_list
function by quoting it. Arguably this is a good idea anyway, since it
makes it much more obvious that we intend to split, and it's not simply
sloppy scripting.

Diagnosed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib.sh | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 54938c6427..a7badbf1dd 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -732,14 +732,24 @@ match_pattern_list () {
 	arg="$1"
 	shift
 	test -z "$*" && return 1
-	for pattern_
-	do
-		case "$arg" in
-		$pattern_)
-			return 0
-		esac
-	done
-	return 1
+	# We need to use "$*" to get field-splitting, but we want to
+	# disable globbing, since we are matching against an arbitrary
+	# $arg, not what's in the filesystem. Using "set -f" accomplishes
+	# that, but we must do it in a subshell to avoid impacting the
+	# rest of the script. The exit value of the subshell becomes
+	# the function's return value.
+	(
+		set -f
+		for pattern_ in $*
+		do
+			case "$arg" in
+			$pattern_)
+				exit 0
+				;;
+			esac
+		done
+		exit 1
+	)
 }
 
 match_test_selector_list () {
@@ -848,7 +858,7 @@ maybe_teardown_verbose () {
 last_verbose=t
 maybe_setup_verbose () {
 	test -z "$verbose_only" && return
-	if match_pattern_list $test_count $verbose_only
+	if match_pattern_list $test_count "$verbose_only"
 	then
 		exec 4>&2 3>&1
 		# Emit a delimiting blank line when going from
@@ -878,7 +888,7 @@ maybe_setup_valgrind () {
 		return
 	fi
 	GIT_VALGRIND_ENABLED=
-	if match_pattern_list $test_count $valgrind_only
+	if match_pattern_list $test_count "$valgrind_only"
 	then
 		GIT_VALGRIND_ENABLED=t
 	fi
@@ -1006,7 +1016,7 @@ test_finish_ () {
 test_skip () {
 	to_skip=
 	skipped_reason=
-	if match_pattern_list $this_test.$test_count $GIT_SKIP_TESTS
+	if match_pattern_list $this_test.$test_count "$GIT_SKIP_TESTS"
 	then
 		to_skip=t
 		skipped_reason="GIT_SKIP_TESTS"
@@ -1346,7 +1356,7 @@ fi
 remove_trash=
 this_test=${0##*/}
 this_test=${this_test%%-*}
-if match_pattern_list "$this_test" $GIT_SKIP_TESTS
+if match_pattern_list "$this_test" "$GIT_SKIP_TESTS"
 then
 	say_color info >&3 "skipping test $this_test altogether"
 	skip_all="skip all tests in $this_test"
-- 
2.32.0.559.g998c91e3d7


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16 10:23           ` Jeff King
@ 2021-06-16 10:24             ` Jeff King
  2021-06-16 11:38             ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-06-16 10:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Andrei Rybak,
	Đoàn Trần Công Danh

On Wed, Jun 16, 2021 at 06:23:07AM -0400, Jeff King wrote:

> But this also means the shell will do the usual globbing for each
> argument, which can result in us seeing an expansion based on what's in
> the filesystem, rather than the real pattern. For example, if I have the
> path "t5000" in the filesystem, and you feed the pattern "t?0000", that
> _should_ match the string "t0000", but it won't after the shell has
> expanded it to "t5000".

Whoops, that pattern should be "t?000", of course.

It's correct in the runnable example later on.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16 10:23           ` Jeff King
  2021-06-16 10:24             ` Jeff King
@ 2021-06-16 11:38             ` Ævar Arnfjörð Bjarmason
  2021-06-16 11:50               ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-16 11:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Andrei Rybak,
	Đoàn Trần Công Danh


On Wed, Jun 16 2021, Jeff King wrote:

> On Wed, Jun 16, 2021 at 06:22:10PM +0900, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > ... Still, I kind of like the "set -f" version because it doesn't need
>> > the extra directory which could cause problems with "ls-files -o", etc,
>> > as you mentioned. You could also create the empty directory on the fly,
>> > though if "set -f" works portably, that seems less complicated to me.
>> 
>> FWIW, I share that.
>
> Here it is with some cosmetic cleanups and a commit message. I don't
> mean to preempt further discussion if Ævar prefers another route, but I
> want to make sure we didn't stall out.

I mildly prefer mine as noted in
https://lore.kernel.org/git/87pmwmxd6f.fsf@evledraar.gmail.com/; but I'd
mainly prefer just to fix the immediate breakage on master, so whatever
variant of reverting, taking yours or mine Junio prefers I'm fine with.

Just inline comments on the patch:

> @@ -732,14 +732,24 @@ match_pattern_list () {
>  	arg="$1"
>  	shift
>  	test -z "$*" && return 1
> -	for pattern_
> -	do
> -		case "$arg" in
> -		$pattern_)
> -			return 0
> -		esac
> -	done
> -	return 1
> +	# We need to use "$*" to get field-splitting, but we want to
> +	# disable globbing, since we are matching against an arbitrary
> +	# $arg, not what's in the filesystem. Using "set -f" accomplishes
> +	# that, but we must do it in a subshell to avoid impacting the
> +	# rest of the script. The exit value of the subshell becomes
> +	# the function's return value.
> +	(
> +		set -f
> +		for pattern_ in $*
> +		do
> +			case "$arg" in
> +			$pattern_)
> +				exit 0
> +				;;
> +			esac
> +		done
> +		exit 1
> +	)
>  }

Why not just start with a ret=1, set ret=0 if we have a match and break
from the loop, and then do a "set +f" afterwards? I.e. is there an
actual need for the subshell here.

I'm mildly paranoid about a new "set -<flag>" in the codebase for vague
fears of portability (as noted in my linked message), but whatever shell
supports "set -<flag>" surely supports the inverse with "set +<flag>",
no?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
@ 2021-06-16 11:43           ` Jeff King
  2021-06-17  0:36           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-06-16 11:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Andrei Rybak,
	Đoàn Trần Công Danh

On Wed, Jun 16, 2021 at 11:49:20AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > [...] Still, I kind of like the "set -f" version because it doesn't need
> > the extra directory which could cause problems with "ls-files -o", etc,
> > as you mentioned. You could also create the empty directory on the fly,
> > though if "set -f" works portably, that seems less complicated to me.
> 
> Yeah, in isolation I'd agree, but given the desire (with existing code
> and other in-flight code) to have a scratch are for the test library
> code itself simply creating such a directory seems like a good thing to
> have in general, and once we have it we can use the subshell trick, or
> just "set -f" I suppose and use the scratch dir for other stuff.

I don't have much of an opinion on other uses of the scratch dir, not
having seen them. I agree if we do have one and are paying the cost for
it already, then using it here isn't a big deal.

> I am a bit wary of this being our first ever use of "set -f", but maybe
> it's sufficiently portable.

Yep, me too. The nice thing about it is that we can swap out the
solution pretty easily if it turns out not to be. I don't know how good
our coverage of obscure shells is in CI, though (e.g., on your gcc
build-farm stuff, are we mostly using system shells?).

> > Whatever the expansion mechanism, I do think it's worth having callers
> > quote "$GIT_SKIP_TESTS" and then performing the expansion within
> > match_pattern_list. Then the nasty mechanics are all in that one place.
> 
> I think it's rather clean to not quote it, i.e. to have the loop get a
> list of item and then things to match, it would also make it easier to
> e.g. port it to a native C program.

My main complaint is that it's a real gotcha for the callers. They have
to remember to do this cd-to-scratch (or whatever technique we use).
That's cumbersome, and if they forget, their call is wrong in a really
subtle way that basic testing isn't likely to uncover.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16 11:38             ` Ævar Arnfjörð Bjarmason
@ 2021-06-16 11:50               ` Jeff King
  2021-06-17  0:29                 ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-06-16 11:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Andrei Rybak,
	Đoàn Trần Công Danh

On Wed, Jun 16, 2021 at 01:38:55PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > +	# We need to use "$*" to get field-splitting, but we want to
> > +	# disable globbing, since we are matching against an arbitrary
> > +	# $arg, not what's in the filesystem. Using "set -f" accomplishes
> > +	# that, but we must do it in a subshell to avoid impacting the
> > +	# rest of the script. The exit value of the subshell becomes
> > +	# the function's return value.
> > +	(
> > +		set -f
> > +		for pattern_ in $*
> > +		do
> > +			case "$arg" in
> > +			$pattern_)
> > +				exit 0
> > +				;;
> > +			esac
> > +		done
> > +		exit 1
> > +	)
> >  }
> 
> Why not just start with a ret=1, set ret=0 if we have a match and break
> from the loop, and then do a "set +f" afterwards? I.e. is there an
> actual need for the subshell here.

My thought was that the subshell takes us back to the original state,
regardless of what it was. As opposed to "set +f" which takes us back to
a particular state. But it is unlikely that we'd have done a global "set
-f" before calling this, so maybe that is being overly conservative.

> I'm mildly paranoid about a new "set -<flag>" in the codebase for vague
> fears of portability (as noted in my linked message), but whatever shell
> supports "set -<flag>" surely supports the inverse with "set +<flag>",
> no?

Yes, I think we can assume that if it supports one, it supports the
other.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16 11:50               ` Jeff King
@ 2021-06-17  0:29                 ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-06-17  0:29 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Andrei Rybak,
	Đoàn Trần Công Danh

Jeff King <peff@peff.net> writes:

> My thought was that the subshell takes us back to the original state,
> regardless of what it was. As opposed to "set +f" which takes us back to
> a particular state. But it is unlikely that we'd have done a global "set
> -f" before calling this, so maybe that is being overly conservative.

Overly conservative, yes, but if it is not too much overhead, I
think it is a good practice anyway.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs
  2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
  2021-06-16 11:43           ` Jeff King
@ 2021-06-17  0:36           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-06-17  0:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, git, Andrei Rybak, Đoàn Trần Công Danh

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Whatever the expansion mechanism, I do think it's worth having callers
>> quote "$GIT_SKIP_TESTS" and then performing the expansion within
>> match_pattern_list. Then the nasty mechanics are all in that one place.
>
> I think it's rather clean to not quote it, i.e. to have the loop get a
> list of item and then things to match, it would also make it easier to
> e.g. port it to a native C program.

Sorry, I am not sure how it can work without quoting $GIT_SKIP_TESTS
and exposing the details of how we disable globbing when the caller
calls match_pattern_list.

A list of item in $GIT_SKIP_TESTS would not be able to pass t?000 in
it intact to the loop that processes each of the item on the list.

	for pattern in $GIT_SKIP_TESTS
	do
		attempt to match with $pattern
	done

will see t5000 in $pattern due to globbing.  It's not like "$@"
trick can be applied to any plain variable.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-06-17  0:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-15 20:55 [BUG] question mark in GIT_SKIP_TESTS is broken in master Andrei Rybak
2021-06-15 23:28 ` [BUG] range expressions in GIT_SKIP_TESTS are broken in master (was [BUG] question mark in GIT_SKIP_TESTS is broken in master) Andrei Rybak
2021-06-16  3:28   ` Junio C Hamano
2021-06-16  4:12     ` Junio C Hamano
2021-06-16  8:29       ` Jeff King
2021-06-16  9:19         ` Junio C Hamano
2021-06-16  8:24     ` [PATCH] test-lib: fix "$remove_trash" regression and match_pattern_list() bugs Ævar Arnfjörð Bjarmason
2021-06-16  8:36       ` Jeff King
2021-06-16  9:22         ` Junio C Hamano
2021-06-16 10:23           ` Jeff King
2021-06-16 10:24             ` Jeff King
2021-06-16 11:38             ` Ævar Arnfjörð Bjarmason
2021-06-16 11:50               ` Jeff King
2021-06-17  0:29                 ` Junio C Hamano
2021-06-16  9:49         ` Ævar Arnfjörð Bjarmason
2021-06-16 11:43           ` Jeff King
2021-06-17  0:36           ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.