fstests.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 0/6] fstests: various improvements to the test framework
@ 2021-02-10  2:56 Darrick J. Wong
  2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

Hi all,

This patchset contains assorted changes to the test suite that I use to
make it easier to reproduce regressions reported in nightly test runs.
The first two make it so that we capture metadumps of corrupt xfs
filesystems, and the rest introduce various options to ./check so that I
can exclude tests and run exact sequences of tests.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=testsuite-improvements
---
 README        |    2 ++
 check         |   44 +++++++++++++++++++++++++++++++++++---------
 common/config |    1 +
 common/fuzzy  |    3 +++
 common/rc     |    2 +-
 common/xfs    |   26 ++++++++++++++++++++++++++
 6 files changed, 68 insertions(+), 10 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 13:58   ` Brian Foster
  2021-02-10  2:56 ` [PATCH 2/6] common: capture metadump output if xfs filesystem check fails Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

When we set up a fstests run, preserve the path xfs_metadump binary with
an $XFS_METADUMP_PROG wrapper, like we do for the other xfsprogs tools.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/config |    1 +
 common/rc     |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)


diff --git a/common/config b/common/config
index d83dfb28..d4cf8089 100644
--- a/common/config
+++ b/common/config
@@ -156,6 +156,7 @@ MKSWAP_PROG="$MKSWAP_PROG -f"
 export XFS_LOGPRINT_PROG="$(type -P xfs_logprint)"
 export XFS_REPAIR_PROG="$(type -P xfs_repair)"
 export XFS_DB_PROG="$(type -P xfs_db)"
+export XFS_METADUMP_PROG="$(type -P xfs_metadump)"
 export XFS_ADMIN_PROG="$(type -P xfs_admin)"
 export XFS_GROWFS_PROG=$(type -P xfs_growfs)
 export XFS_SPACEMAN_PROG="$(type -P xfs_spaceman)"
diff --git a/common/rc b/common/rc
index 649b1cfd..ad54b3de 100644
--- a/common/rc
+++ b/common/rc
@@ -509,7 +509,7 @@ _scratch_metadump()
 	[ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
 		options="-l $SCRATCH_LOGDEV"
 
-	xfs_metadump $options "$@" $SCRATCH_DEV $dumpfile
+	$XFS_METADUMP_PROG $options "$@" $SCRATCH_DEV $dumpfile
 }
 
 _setup_large_ext4_fs()


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
  2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 13:59   ` Brian Foster
  2021-02-10  2:56 ` [PATCH 3/6] check: allow '-e testid' to exclude a single test Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@djwong.org>

Capture metadump output when various userspace repair and checker tools
fail or indicate corruption, to aid in debugging.  We don't bother to
annotate xfs_check because it's bitrotting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 README     |    2 ++
 common/xfs |   26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)


diff --git a/README b/README
index 43bb0cee..36f72088 100644
--- a/README
+++ b/README
@@ -109,6 +109,8 @@ Preparing system for tests:
              - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
                it between test invocations.  This assumes that the name of
                the module is the same as FSTYP.
+	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
+	       filesystems if the various stages of _check_xfs_filesystem fail.
 
         - or add a case to the switch in common/config assigning
           these variables based on the hostname of your test
diff --git a/common/xfs b/common/xfs
index 2156749d..ad1eb6ee 100644
--- a/common/xfs
+++ b/common/xfs
@@ -432,6 +432,21 @@ _supports_xfs_scrub()
 	return 0
 }
 
+# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
+_snapshot_xfs() {
+	local metadump="$1"
+	local device="$2"
+	local logdev="$3"
+	local options="-a -o"
+
+	if [ "$logdev" != "none" ]; then
+		options="$options -l $logdev"
+	fi
+
+	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
+	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
+}
+
 # run xfs_check and friends on a FS.
 _check_xfs_filesystem()
 {
@@ -482,6 +497,9 @@ _check_xfs_filesystem()
 		# mounted ...
 		mountpoint=`_umount_or_remount_ro $device`
 	fi
+	if [ "$ok" -ne 1 ] && [ "$SNAPSHOT_CORRUPT_XFS" = "1" ]; then
+		_snapshot_xfs "$seqres.scrub.md" "$device" "$2"
+	fi
 
 	$XFS_LOGPRINT_PROG -t $extra_log_options $device 2>&1 \
 		| tee $tmp.logprint | grep -q "<CLEAN>"
@@ -491,6 +509,8 @@ _check_xfs_filesystem()
 		cat $tmp.logprint			>>$seqres.full
 		echo "*** end xfs_logprint output"	>>$seqres.full
 
+		test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
+			_snapshot_xfs "$seqres.logprint.md" "$device" "$2"
 		ok=0
 	fi
 
@@ -516,6 +536,8 @@ _check_xfs_filesystem()
 		cat $tmp.repair				>>$seqres.full
 		echo "*** end xfs_repair output"	>>$seqres.full
 
+		test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
+			_snapshot_xfs "$seqres.repair.md" "$device" "$2"
 		ok=0
 	fi
 	rm -f $tmp.fs_check $tmp.logprint $tmp.repair
@@ -529,6 +551,8 @@ _check_xfs_filesystem()
 			cat $tmp.repair				>>$seqres.full
 			echo "*** end xfs_repair output"	>>$seqres.full
 
+			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
+				_snapshot_xfs "$seqres.rebuild.md" "$device" "$2"
 			ok=0
 		fi
 		rm -f $tmp.repair
@@ -540,6 +564,8 @@ _check_xfs_filesystem()
 			cat $tmp.repair				>>$seqres.full
 			echo "*** end xfs_repair output"	>>$seqres.full
 
+			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
+				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
 			ok=0
 		fi
 		rm -f $tmp.repair


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] check: allow '-e testid' to exclude a single test
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
  2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
  2021-02-10  2:56 ` [PATCH 2/6] common: capture metadump output if xfs filesystem check fails Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 14:00   ` Brian Foster
  2021-02-10  2:56 ` [PATCH 4/6] check: don't abort on non-existent excluded groups Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

This enables us to mask off specific tests.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/check b/check
index c6ad1d6c..e51cbede 100755
--- a/check
+++ b/check
@@ -79,6 +79,7 @@ testlist options
     -g group[,group...]	include tests from these groups
     -x group[,group...]	exclude tests from these groups
     -X exclude_file	exclude individual tests
+    -e testlist         exclude a specific list of tests
     -E external_file	exclude individual tests
     [testlist]		include tests matching names in testlist
 
@@ -287,6 +288,11 @@ while [ $# -gt 0 ]; do
 
 	-X)	subdir_xfile=$2; shift ;
 		;;
+	-e)
+		xfile=$2; shift ;
+		echo "$xfile" | tr ', ' '\n\n' >> $tmp.xlist
+		;;
+
 	-E)	xfile=$2; shift ;
 		if [ -f $xfile ]; then
 			sed "s/#.*$//" "$xfile" >> $tmp.xlist


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] check: don't abort on non-existent excluded groups
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-02-10  2:56 ` [PATCH 3/6] check: allow '-e testid' to exclude a single test Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 14:00   ` Brian Foster
  2021-02-10  2:56 ` [PATCH 5/6] check: run tests in exactly the order specified Darrick J. Wong
  2021-02-10  2:56 ` [PATCH 6/6] fuzzy: capture core dumps from repair utilities Darrick J. Wong
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Don't abort the whole test run if we asked to exclude groups that aren't
included in the candidate group list, since we actually /are/ satisfying
the user's request.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/check b/check
index e51cbede..6f8db858 100755
--- a/check
+++ b/check
@@ -243,7 +243,7 @@ _prepare_test_list()
 		list=$(get_group_list $xgroup)
 		if [ -z "$list" ]; then
 			echo "Group \"$xgroup\" is empty or not defined?"
-			exit 1
+			continue
 		fi
 
 		trim_test_list $list


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] check: run tests in exactly the order specified
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-02-10  2:56 ` [PATCH 4/6] check: don't abort on non-existent excluded groups Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 14:00   ` Brian Foster
  2021-02-10  2:56 ` [PATCH 6/6] fuzzy: capture core dumps from repair utilities Darrick J. Wong
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Introduce a new --exact-order switch to disable all sorting, filtering
of repeated lines, and shuffling of test order.  The goal of this is to
be able to run tests in a specific order, namely to try to reproduce
test failures that could be the result of a -r(andomize) run getting
lucky.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check |   36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)


diff --git a/check b/check
index 6f8db858..106ec8e1 100755
--- a/check
+++ b/check
@@ -20,6 +20,7 @@ diff="diff -u"
 showme=false
 have_test_arg=false
 randomize=false
+exact_order=false
 export here=`pwd`
 xfile=""
 subdir_xfile=""
@@ -67,6 +68,7 @@ check options
     -n			show me, do not run tests
     -T			output timestamps
     -r			randomize test order
+    --exact-order	run tests in the exact order specified
     -i <n>		iterate the test list <n> times
     -d			dump test output to stdout
     -b			brief test summary
@@ -249,17 +251,22 @@ _prepare_test_list()
 		trim_test_list $list
 	done
 
-	# sort the list of tests into numeric order
-	if $randomize; then
-		if type shuf >& /dev/null; then
-			sorter="shuf"
+	# sort the list of tests into numeric order unless we're running tests
+	# in the exact order specified
+	if ! $exact_order; then
+		if $randomize; then
+			if type shuf >& /dev/null; then
+				sorter="shuf"
+			else
+				sorter="awk -v seed=$RANDOM -f randomize.awk"
+			fi
 		else
-			sorter="awk -v seed=$RANDOM -f randomize.awk"
+			sorter="cat"
 		fi
+		list=`sort -n $tmp.list | uniq | $sorter`
 	else
-		sorter="cat"
+		list=`cat $tmp.list`
 	fi
-	list=`sort -n $tmp.list | uniq | $sorter`
 	rm -f $tmp.list
 }
 
@@ -304,7 +311,20 @@ while [ $# -gt 0 ]; do
 	-udiff)	diff="$diff -u" ;;
 
 	-n)	showme=true ;;
-        -r)	randomize=true ;;
+	-r)
+		if $exact_order; then
+			echo "Cannot specify -r and --exact-order."
+			exit 1
+		fi
+		randomize=true
+		;;
+	--exact-order)
+		if $randomize; then
+			echo "Cannnot specify --exact-order and -r."
+			exit 1
+		fi
+		exact_order=true
+		;;
 	-i)	iterations=$2; shift ;;
 	-T)	timestamp=true ;;
 	-d)	DUMP_OUTPUT=true ;;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] fuzzy: capture core dumps from repair utilities
  2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-02-10  2:56 ` [PATCH 5/6] check: run tests in exactly the order specified Darrick J. Wong
@ 2021-02-10  2:56 ` Darrick J. Wong
  2021-02-11 14:00   ` Brian Foster
  5 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-10  2:56 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Always capture the core dumps when we run repair tools against a fuzzed
filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/common/fuzzy b/common/fuzzy
index bd08af1c..809dee54 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -307,6 +307,9 @@ _scratch_xfs_fuzz_metadata() {
 	echo "Verbs we propose to fuzz with:"
 	echo $(echo "${verbs}")
 
+	# Always capture full core dumps from crashing tools
+	ulimit -c unlimited
+
 	echo "${fields}" | while read field; do
 		echo "${verbs}" | while read fuzzverb; do
 			__scratch_xfs_fuzz_mdrestore


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools
  2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
@ 2021-02-11 13:58   ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2021-02-11 13:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:25PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When we set up a fstests run, preserve the path xfs_metadump binary with
> an $XFS_METADUMP_PROG wrapper, like we do for the other xfsprogs tools.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  common/config |    1 +
>  common/rc     |    2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/common/config b/common/config
> index d83dfb28..d4cf8089 100644
> --- a/common/config
> +++ b/common/config
> @@ -156,6 +156,7 @@ MKSWAP_PROG="$MKSWAP_PROG -f"
>  export XFS_LOGPRINT_PROG="$(type -P xfs_logprint)"
>  export XFS_REPAIR_PROG="$(type -P xfs_repair)"
>  export XFS_DB_PROG="$(type -P xfs_db)"
> +export XFS_METADUMP_PROG="$(type -P xfs_metadump)"
>  export XFS_ADMIN_PROG="$(type -P xfs_admin)"
>  export XFS_GROWFS_PROG=$(type -P xfs_growfs)
>  export XFS_SPACEMAN_PROG="$(type -P xfs_spaceman)"
> diff --git a/common/rc b/common/rc
> index 649b1cfd..ad54b3de 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -509,7 +509,7 @@ _scratch_metadump()
>  	[ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
>  		options="-l $SCRATCH_LOGDEV"
>  
> -	xfs_metadump $options "$@" $SCRATCH_DEV $dumpfile
> +	$XFS_METADUMP_PROG $options "$@" $SCRATCH_DEV $dumpfile
>  }
>  
>  _setup_large_ext4_fs()
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
  2021-02-10  2:56 ` [PATCH 2/6] common: capture metadump output if xfs filesystem check fails Darrick J. Wong
@ 2021-02-11 13:59   ` Brian Foster
  2021-02-11 18:12     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2021-02-11 13:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@djwong.org>
> 
> Capture metadump output when various userspace repair and checker tools
> fail or indicate corruption, to aid in debugging.  We don't bother to
> annotate xfs_check because it's bitrotting.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  README     |    2 ++
>  common/xfs |   26 ++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
> 
> 
> diff --git a/README b/README
> index 43bb0cee..36f72088 100644
> --- a/README
> +++ b/README
> @@ -109,6 +109,8 @@ Preparing system for tests:
>               - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
>                 it between test invocations.  This assumes that the name of
>                 the module is the same as FSTYP.
> +	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
> +	       filesystems if the various stages of _check_xfs_filesystem fail.
>  
>          - or add a case to the switch in common/config assigning
>            these variables based on the hostname of your test
> diff --git a/common/xfs b/common/xfs
> index 2156749d..ad1eb6ee 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -432,6 +432,21 @@ _supports_xfs_scrub()
>  	return 0
>  }
>  
> +# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
> +_snapshot_xfs() {

The term snapshot has a well known meaning. Can we just call this
_metadump_xfs()?

> +	local metadump="$1"
> +	local device="$2"
> +	local logdev="$3"
> +	local options="-a -o"
> +
> +	if [ "$logdev" != "none" ]; then
> +		options="$options -l $logdev"
> +	fi
> +
> +	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
> +	gzip -f "$metadump" >> "$seqres.full" 2>&1 &

Why compress in the background? I wonder if we should just skip the
compression step since this requires an option to enable in the first
place..

> +}
> +
>  # run xfs_check and friends on a FS.
>  _check_xfs_filesystem()
>  {
...
> @@ -540,6 +564,8 @@ _check_xfs_filesystem()
>  			cat $tmp.repair				>>$seqres.full
>  			echo "*** end xfs_repair output"	>>$seqres.full
>  
> +			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
> +				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"

Why do we collect so many metadump images? Shouldn't all but the last
TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
should be able to collect one image (and perhaps just call it
"$seqres.$device.md") if any of the first several checks flag a problem.

Brian

>  			ok=0
>  		fi
>  		rm -f $tmp.repair
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] check: allow '-e testid' to exclude a single test
  2021-02-10  2:56 ` [PATCH 3/6] check: allow '-e testid' to exclude a single test Darrick J. Wong
@ 2021-02-11 14:00   ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2021-02-11 14:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This enables us to mask off specific tests.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Seems reasonable:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  check |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> 
> diff --git a/check b/check
> index c6ad1d6c..e51cbede 100755
> --- a/check
> +++ b/check
> @@ -79,6 +79,7 @@ testlist options
>      -g group[,group...]	include tests from these groups
>      -x group[,group...]	exclude tests from these groups
>      -X exclude_file	exclude individual tests
> +    -e testlist         exclude a specific list of tests
>      -E external_file	exclude individual tests
>      [testlist]		include tests matching names in testlist
>  
> @@ -287,6 +288,11 @@ while [ $# -gt 0 ]; do
>  
>  	-X)	subdir_xfile=$2; shift ;
>  		;;
> +	-e)
> +		xfile=$2; shift ;
> +		echo "$xfile" | tr ', ' '\n\n' >> $tmp.xlist
> +		;;
> +
>  	-E)	xfile=$2; shift ;
>  		if [ -f $xfile ]; then
>  			sed "s/#.*$//" "$xfile" >> $tmp.xlist
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] check: don't abort on non-existent excluded groups
  2021-02-10  2:56 ` [PATCH 4/6] check: don't abort on non-existent excluded groups Darrick J. Wong
@ 2021-02-11 14:00   ` Brian Foster
  2021-02-11 17:27     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2021-02-11 14:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:42PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Don't abort the whole test run if we asked to exclude groups that aren't
> included in the candidate group list, since we actually /are/ satisfying
> the user's request.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  check |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/check b/check
> index e51cbede..6f8db858 100755
> --- a/check
> +++ b/check
> @@ -243,7 +243,7 @@ _prepare_test_list()
>  		list=$(get_group_list $xgroup)
>  		if [ -z "$list" ]; then
>  			echo "Group \"$xgroup\" is empty or not defined?"
> -			exit 1
> +			continue
>  		fi

Is this only for a nonexistent group? I.e., 'check -x nosuchgroup ...' ?
If so, what's the advantage?

Brian

>  
>  		trim_test_list $list
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] check: run tests in exactly the order specified
  2021-02-10  2:56 ` [PATCH 5/6] check: run tests in exactly the order specified Darrick J. Wong
@ 2021-02-11 14:00   ` Brian Foster
  2021-02-11 17:28     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2021-02-11 14:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:47PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Introduce a new --exact-order switch to disable all sorting, filtering
> of repeated lines, and shuffling of test order.  The goal of this is to
> be able to run tests in a specific order, namely to try to reproduce
> test failures that could be the result of a -r(andomize) run getting
> lucky.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  check |   36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/check b/check
> index 6f8db858..106ec8e1 100755
> --- a/check
> +++ b/check
...
> @@ -249,17 +251,22 @@ _prepare_test_list()
>  		trim_test_list $list
>  	done
>  
> -	# sort the list of tests into numeric order
> -	if $randomize; then
> -		if type shuf >& /dev/null; then
> -			sorter="shuf"
> +	# sort the list of tests into numeric order unless we're running tests
> +	# in the exact order specified
> +	if ! $exact_order; then
> +		if $randomize; then
> +			if type shuf >& /dev/null; then
> +				sorter="shuf"
> +			else
> +				sorter="awk -v seed=$RANDOM -f randomize.awk"
> +			fi
>  		else
> -			sorter="awk -v seed=$RANDOM -f randomize.awk"
> +			sorter="cat"
>  		fi
> +		list=`sort -n $tmp.list | uniq | $sorter`
>  	else
> -		sorter="cat"
> +		list=`cat $tmp.list`

Do we want to still filter out duplicates (i.e. uniq) in exact order
mode? LGTM either way:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	fi
> -	list=`sort -n $tmp.list | uniq | $sorter`
>  	rm -f $tmp.list
>  }
>  
> @@ -304,7 +311,20 @@ while [ $# -gt 0 ]; do
>  	-udiff)	diff="$diff -u" ;;
>  
>  	-n)	showme=true ;;
> -        -r)	randomize=true ;;
> +	-r)
> +		if $exact_order; then
> +			echo "Cannot specify -r and --exact-order."
> +			exit 1
> +		fi
> +		randomize=true
> +		;;
> +	--exact-order)
> +		if $randomize; then
> +			echo "Cannnot specify --exact-order and -r."
> +			exit 1
> +		fi
> +		exact_order=true
> +		;;
>  	-i)	iterations=$2; shift ;;
>  	-T)	timestamp=true ;;
>  	-d)	DUMP_OUTPUT=true ;;
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] fuzzy: capture core dumps from repair utilities
  2021-02-10  2:56 ` [PATCH 6/6] fuzzy: capture core dumps from repair utilities Darrick J. Wong
@ 2021-02-11 14:00   ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2021-02-11 14:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Feb 09, 2021 at 06:56:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Always capture the core dumps when we run repair tools against a fuzzed
> filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  common/fuzzy |    3 +++
>  1 file changed, 3 insertions(+)
> 
> 
> diff --git a/common/fuzzy b/common/fuzzy
> index bd08af1c..809dee54 100644
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -307,6 +307,9 @@ _scratch_xfs_fuzz_metadata() {
>  	echo "Verbs we propose to fuzz with:"
>  	echo $(echo "${verbs}")
>  
> +	# Always capture full core dumps from crashing tools
> +	ulimit -c unlimited
> +
>  	echo "${fields}" | while read field; do
>  		echo "${verbs}" | while read fuzzverb; do
>  			__scratch_xfs_fuzz_mdrestore
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] check: don't abort on non-existent excluded groups
  2021-02-11 14:00   ` Brian Foster
@ 2021-02-11 17:27     ` Darrick J. Wong
  2021-02-11 18:01       ` Brian Foster
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-11 17:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 09:00:19AM -0500, Brian Foster wrote:
> On Tue, Feb 09, 2021 at 06:56:42PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Don't abort the whole test run if we asked to exclude groups that aren't
> > included in the candidate group list, since we actually /are/ satisfying
> > the user's request.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  check |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> > diff --git a/check b/check
> > index e51cbede..6f8db858 100755
> > --- a/check
> > +++ b/check
> > @@ -243,7 +243,7 @@ _prepare_test_list()
> >  		list=$(get_group_list $xgroup)
> >  		if [ -z "$list" ]; then
> >  			echo "Group \"$xgroup\" is empty or not defined?"
> > -			exit 1
> > +			continue
> >  		fi
> 
> Is this only for a nonexistent group? I.e., 'check -x nosuchgroup ...' ?
> If so, what's the advantage?

I wrote this for groups that exist somewhere but would never have been
selected for this filesystem type in the first place.  For example,
'dangerous_scrub' (aka fuzz testing for xfs_scrub) is only found in
tests/xfs/group, so running:

# FSTYP=ext4 ./check -x dangerous_scrub

fails because ./check cannot select any of the dangerous_scrub tests for
an ext4 run so it doesn't recognize the group name.  IOWs, it's too
stupid to realize that excluding a group that can't be selected should
be a no-op.

--D

> Brian
> 
> >  
> >  		trim_test_list $list
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] check: run tests in exactly the order specified
  2021-02-11 14:00   ` Brian Foster
@ 2021-02-11 17:28     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-11 17:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 09:00:45AM -0500, Brian Foster wrote:
> On Tue, Feb 09, 2021 at 06:56:47PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Introduce a new --exact-order switch to disable all sorting, filtering
> > of repeated lines, and shuffling of test order.  The goal of this is to
> > be able to run tests in a specific order, namely to try to reproduce
> > test failures that could be the result of a -r(andomize) run getting
> > lucky.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  check |   36 ++++++++++++++++++++++++++++--------
> >  1 file changed, 28 insertions(+), 8 deletions(-)
> > 
> > 
> > diff --git a/check b/check
> > index 6f8db858..106ec8e1 100755
> > --- a/check
> > +++ b/check
> ...
> > @@ -249,17 +251,22 @@ _prepare_test_list()
> >  		trim_test_list $list
> >  	done
> >  
> > -	# sort the list of tests into numeric order
> > -	if $randomize; then
> > -		if type shuf >& /dev/null; then
> > -			sorter="shuf"
> > +	# sort the list of tests into numeric order unless we're running tests
> > +	# in the exact order specified
> > +	if ! $exact_order; then
> > +		if $randomize; then
> > +			if type shuf >& /dev/null; then
> > +				sorter="shuf"
> > +			else
> > +				sorter="awk -v seed=$RANDOM -f randomize.awk"
> > +			fi
> >  		else
> > -			sorter="awk -v seed=$RANDOM -f randomize.awk"
> > +			sorter="cat"
> >  		fi
> > +		list=`sort -n $tmp.list | uniq | $sorter`
> >  	else
> > -		sorter="cat"
> > +		list=`cat $tmp.list`
> 
> Do we want to still filter out duplicates (i.e. uniq) in exact order
> mode? LGTM either way:

I figure --exact-order means to run exactly what the user specified,
duplicates and all.

--D

> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  	fi
> > -	list=`sort -n $tmp.list | uniq | $sorter`
> >  	rm -f $tmp.list
> >  }
> >  
> > @@ -304,7 +311,20 @@ while [ $# -gt 0 ]; do
> >  	-udiff)	diff="$diff -u" ;;
> >  
> >  	-n)	showme=true ;;
> > -        -r)	randomize=true ;;
> > +	-r)
> > +		if $exact_order; then
> > +			echo "Cannot specify -r and --exact-order."
> > +			exit 1
> > +		fi
> > +		randomize=true
> > +		;;
> > +	--exact-order)
> > +		if $randomize; then
> > +			echo "Cannnot specify --exact-order and -r."
> > +			exit 1
> > +		fi
> > +		exact_order=true
> > +		;;
> >  	-i)	iterations=$2; shift ;;
> >  	-T)	timestamp=true ;;
> >  	-d)	DUMP_OUTPUT=true ;;
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] check: don't abort on non-existent excluded groups
  2021-02-11 17:27     ` Darrick J. Wong
@ 2021-02-11 18:01       ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2021-02-11 18:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 09:27:05AM -0800, Darrick J. Wong wrote:
> On Thu, Feb 11, 2021 at 09:00:19AM -0500, Brian Foster wrote:
> > On Tue, Feb 09, 2021 at 06:56:42PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Don't abort the whole test run if we asked to exclude groups that aren't
> > > included in the candidate group list, since we actually /are/ satisfying
> > > the user's request.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  check |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > 
> > > diff --git a/check b/check
> > > index e51cbede..6f8db858 100755
> > > --- a/check
> > > +++ b/check
> > > @@ -243,7 +243,7 @@ _prepare_test_list()
> > >  		list=$(get_group_list $xgroup)
> > >  		if [ -z "$list" ]; then
> > >  			echo "Group \"$xgroup\" is empty or not defined?"
> > > -			exit 1
> > > +			continue
> > >  		fi
> > 
> > Is this only for a nonexistent group? I.e., 'check -x nosuchgroup ...' ?
> > If so, what's the advantage?
> 
> I wrote this for groups that exist somewhere but would never have been
> selected for this filesystem type in the first place.  For example,
> 'dangerous_scrub' (aka fuzz testing for xfs_scrub) is only found in
> tests/xfs/group, so running:
> 
> # FSTYP=ext4 ./check -x dangerous_scrub
> 
> fails because ./check cannot select any of the dangerous_scrub tests for
> an ext4 run so it doesn't recognize the group name.  IOWs, it's too
> stupid to realize that excluding a group that can't be selected should
> be a no-op.
> 

Ah, I see. Seems reasonable enough:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> --D
> 
> > Brian
> > 
> > >  
> > >  		trim_test_list $list
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
  2021-02-11 13:59   ` Brian Foster
@ 2021-02-11 18:12     ` Darrick J. Wong
  2021-02-11 18:35       ` Brian Foster
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-11 18:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 08:59:58AM -0500, Brian Foster wrote:
> On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@djwong.org>
> > 
> > Capture metadump output when various userspace repair and checker tools
> > fail or indicate corruption, to aid in debugging.  We don't bother to
> > annotate xfs_check because it's bitrotting.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  README     |    2 ++
> >  common/xfs |   26 ++++++++++++++++++++++++++
> >  2 files changed, 28 insertions(+)
> > 
> > 
> > diff --git a/README b/README
> > index 43bb0cee..36f72088 100644
> > --- a/README
> > +++ b/README
> > @@ -109,6 +109,8 @@ Preparing system for tests:
> >               - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
> >                 it between test invocations.  This assumes that the name of
> >                 the module is the same as FSTYP.
> > +	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
> > +	       filesystems if the various stages of _check_xfs_filesystem fail.
> >  
> >          - or add a case to the switch in common/config assigning
> >            these variables based on the hostname of your test
> > diff --git a/common/xfs b/common/xfs
> > index 2156749d..ad1eb6ee 100644
> > --- a/common/xfs
> > +++ b/common/xfs
> > @@ -432,6 +432,21 @@ _supports_xfs_scrub()
> >  	return 0
> >  }
> >  
> > +# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
> > +_snapshot_xfs() {
> 
> The term snapshot has a well known meaning. Can we just call this
> _metadump_xfs()?

Ok.

> 
> > +	local metadump="$1"
> > +	local device="$2"
> > +	local logdev="$3"
> > +	local options="-a -o"
> > +
> > +	if [ "$logdev" != "none" ]; then
> > +		options="$options -l $logdev"
> > +	fi
> > +
> > +	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
> > +	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
> 
> Why compress in the background?

Sometimes the metadumps can become very large and I don't tend to have a
lot of space on the test appliances for storing blobs.

Also, I was under the impression that it was customary for people to
share compressed metadumps of crashes, so why not save everyone a step?

I do this in the background to avoid holding up the next fstest.

> I wonder if we should just skip the
> compression step since this requires an option to enable in the first
> place..

Seeing as it's optional, I think that's all the more reason to compress.

> 
> > +}
> > +
> >  # run xfs_check and friends on a FS.
> >  _check_xfs_filesystem()
> >  {
> ...
> > @@ -540,6 +564,8 @@ _check_xfs_filesystem()
> >  			cat $tmp.repair				>>$seqres.full
> >  			echo "*** end xfs_repair output"	>>$seqres.full
> >  
> > +			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
> > +				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
> 
> Why do we collect so many metadump images? Shouldn't all but the last
> TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
> should be able to collect one image (and perhaps just call it
> "$seqres.$device.md") if any of the first several checks flag a problem.

Yes, the number of metadumps collected can be reduced to two.  One if
scrub or logprint or repair -n fail, and a second one if the user set
TEST_XFS_REPAIR_REBUILD=1 and either the repair or the repair -n fail.

Will change that.

--D

> 
> Brian
> 
> >  			ok=0
> >  		fi
> >  		rm -f $tmp.repair
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
  2021-02-11 18:12     ` Darrick J. Wong
@ 2021-02-11 18:35       ` Brian Foster
  2021-02-11 19:05         ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2021-02-11 18:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 10:12:34AM -0800, Darrick J. Wong wrote:
> On Thu, Feb 11, 2021 at 08:59:58AM -0500, Brian Foster wrote:
> > On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@djwong.org>
> > > 
> > > Capture metadump output when various userspace repair and checker tools
> > > fail or indicate corruption, to aid in debugging.  We don't bother to
> > > annotate xfs_check because it's bitrotting.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  README     |    2 ++
> > >  common/xfs |   26 ++++++++++++++++++++++++++
> > >  2 files changed, 28 insertions(+)
> > > 
> > > 
> > > diff --git a/README b/README
> > > index 43bb0cee..36f72088 100644
> > > --- a/README
> > > +++ b/README
> > > @@ -109,6 +109,8 @@ Preparing system for tests:
> > >               - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
> > >                 it between test invocations.  This assumes that the name of
> > >                 the module is the same as FSTYP.
> > > +	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
> > > +	       filesystems if the various stages of _check_xfs_filesystem fail.
> > >  
> > >          - or add a case to the switch in common/config assigning
> > >            these variables based on the hostname of your test
> > > diff --git a/common/xfs b/common/xfs
> > > index 2156749d..ad1eb6ee 100644
> > > --- a/common/xfs
> > > +++ b/common/xfs
> > > @@ -432,6 +432,21 @@ _supports_xfs_scrub()
> > >  	return 0
> > >  }
> > >  
> > > +# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
> > > +_snapshot_xfs() {
> > 
> > The term snapshot has a well known meaning. Can we just call this
> > _metadump_xfs()?
> 
> Ok.
> 
> > 
> > > +	local metadump="$1"
> > > +	local device="$2"
> > > +	local logdev="$3"
> > > +	local options="-a -o"
> > > +
> > > +	if [ "$logdev" != "none" ]; then
> > > +		options="$options -l $logdev"
> > > +	fi
> > > +
> > > +	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
> > > +	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
> > 
> > Why compress in the background?
> 
> Sometimes the metadumps can become very large and I don't tend to have a
> lot of space on the test appliances for storing blobs.
> 
> Also, I was under the impression that it was customary for people to
> share compressed metadumps of crashes, so why not save everyone a step?
> 
> I do this in the background to avoid holding up the next fstest.
> 
> > I wonder if we should just skip the
> > compression step since this requires an option to enable in the first
> > place..
> 
> Seeing as it's optional, I think that's all the more reason to compress.
> 

That's fair. It was more the background task that I was concerned about.
If the issue is that the compression takes too long, ISTM there's a
similar risk of the background compression conflicting with ongoing
tests. E.g., we have various tests that scale out I/O threads to extreme
levels and could delay the compression even longer (or vice versa), we
have no way to prevent multiple background compression tasks from
starting/competing as tests continue to run, etc.

What about allowing the user to specify an optional env var in the
config file to provide a compression command to use? If set, compress
the file in the foreground. Then the user can determine whether
compression is necessary at all, and if so, which compression tool might
provide a suitable enough time/space tradeoff for the test environment
(i.e., something like lz4 might be faster than gzip or bzip2 at the cost
of space).

Brian

> > 
> > > +}
> > > +
> > >  # run xfs_check and friends on a FS.
> > >  _check_xfs_filesystem()
> > >  {
> > ...
> > > @@ -540,6 +564,8 @@ _check_xfs_filesystem()
> > >  			cat $tmp.repair				>>$seqres.full
> > >  			echo "*** end xfs_repair output"	>>$seqres.full
> > >  
> > > +			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
> > > +				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
> > 
> > Why do we collect so many metadump images? Shouldn't all but the last
> > TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
> > should be able to collect one image (and perhaps just call it
> > "$seqres.$device.md") if any of the first several checks flag a problem.
> 
> Yes, the number of metadumps collected can be reduced to two.  One if
> scrub or logprint or repair -n fail, and a second one if the user set
> TEST_XFS_REPAIR_REBUILD=1 and either the repair or the repair -n fail.
> 
> Will change that.
> 
> --D
> 
> > 
> > Brian
> > 
> > >  			ok=0
> > >  		fi
> > >  		rm -f $tmp.repair
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails
  2021-02-11 18:35       ` Brian Foster
@ 2021-02-11 19:05         ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-02-11 19:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Feb 11, 2021 at 01:35:24PM -0500, Brian Foster wrote:
> On Thu, Feb 11, 2021 at 10:12:34AM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 11, 2021 at 08:59:58AM -0500, Brian Foster wrote:
> > > On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@djwong.org>
> > > > 
> > > > Capture metadump output when various userspace repair and checker tools
> > > > fail or indicate corruption, to aid in debugging.  We don't bother to
> > > > annotate xfs_check because it's bitrotting.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  README     |    2 ++
> > > >  common/xfs |   26 ++++++++++++++++++++++++++
> > > >  2 files changed, 28 insertions(+)
> > > > 
> > > > 
> > > > diff --git a/README b/README
> > > > index 43bb0cee..36f72088 100644
> > > > --- a/README
> > > > +++ b/README
> > > > @@ -109,6 +109,8 @@ Preparing system for tests:
> > > >               - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
> > > >                 it between test invocations.  This assumes that the name of
> > > >                 the module is the same as FSTYP.
> > > > +	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
> > > > +	       filesystems if the various stages of _check_xfs_filesystem fail.
> > > >  
> > > >          - or add a case to the switch in common/config assigning
> > > >            these variables based on the hostname of your test
> > > > diff --git a/common/xfs b/common/xfs
> > > > index 2156749d..ad1eb6ee 100644
> > > > --- a/common/xfs
> > > > +++ b/common/xfs
> > > > @@ -432,6 +432,21 @@ _supports_xfs_scrub()
> > > >  	return 0
> > > >  }
> > > >  
> > > > +# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
> > > > +_snapshot_xfs() {
> > > 
> > > The term snapshot has a well known meaning. Can we just call this
> > > _metadump_xfs()?
> > 
> > Ok.
> > 
> > > 
> > > > +	local metadump="$1"
> > > > +	local device="$2"
> > > > +	local logdev="$3"
> > > > +	local options="-a -o"
> > > > +
> > > > +	if [ "$logdev" != "none" ]; then
> > > > +		options="$options -l $logdev"
> > > > +	fi
> > > > +
> > > > +	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
> > > > +	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
> > > 
> > > Why compress in the background?
> > 
> > Sometimes the metadumps can become very large and I don't tend to have a
> > lot of space on the test appliances for storing blobs.
> > 
> > Also, I was under the impression that it was customary for people to
> > share compressed metadumps of crashes, so why not save everyone a step?
> > 
> > I do this in the background to avoid holding up the next fstest.
> > 
> > > I wonder if we should just skip the
> > > compression step since this requires an option to enable in the first
> > > place..
> > 
> > Seeing as it's optional, I think that's all the more reason to compress.
> > 
> 
> That's fair. It was more the background task that I was concerned about.
> If the issue is that the compression takes too long, ISTM there's a
> similar risk of the background compression conflicting with ongoing
> tests. E.g., we have various tests that scale out I/O threads to extreme
> levels and could delay the compression even longer (or vice versa), we
> have no way to prevent multiple background compression tasks from
> starting/competing as tests continue to run, etc.

<nod> Admittedly I chose gzip because it's decent in both the speed and
compression ratio traits; someone else might want xz --extreme.

> What about allowing the user to specify an optional env var in the
> config file to provide a compression command to use? If set, compress
> the file in the foreground. Then the user can determine whether
> compression is necessary at all, and if so, which compression tool might
> provide a suitable enough time/space tradeoff for the test environment
> (i.e., something like lz4 might be faster than gzip or bzip2 at the cost
> of space).

Good idea!  If the user sets SNAPSHOT_XFS_COMPRESSOR to the compressor
program of their choice (e.g. 'gzip -9') then we'll use that to compress
the metadump.

It also occurred to me that I could refactor _scratch_metadump to use
this new helper, so I think I'll implement some means for letting actual
tests disable compression unconditionally.

--D

> Brian
> 
> > > 
> > > > +}
> > > > +
> > > >  # run xfs_check and friends on a FS.
> > > >  _check_xfs_filesystem()
> > > >  {
> > > ...
> > > > @@ -540,6 +564,8 @@ _check_xfs_filesystem()
> > > >  			cat $tmp.repair				>>$seqres.full
> > > >  			echo "*** end xfs_repair output"	>>$seqres.full
> > > >  
> > > > +			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
> > > > +				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
> > > 
> > > Why do we collect so many metadump images? Shouldn't all but the last
> > > TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
> > > should be able to collect one image (and perhaps just call it
> > > "$seqres.$device.md") if any of the first several checks flag a problem.
> > 
> > Yes, the number of metadumps collected can be reduced to two.  One if
> > scrub or logprint or repair -n fail, and a second one if the user set
> > TEST_XFS_REPAIR_REBUILD=1 and either the repair or the repair -n fail.
> > 
> > Will change that.
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > >  			ok=0
> > > >  		fi
> > > >  		rm -f $tmp.repair
> > > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-02-11 19:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-10  2:56 [PATCHSET 0/6] fstests: various improvements to the test framework Darrick J. Wong
2021-02-10  2:56 ` [PATCH 1/6] config: wrap xfs_metadump as $XFS_METADUMP_PROG like the other tools Darrick J. Wong
2021-02-11 13:58   ` Brian Foster
2021-02-10  2:56 ` [PATCH 2/6] common: capture metadump output if xfs filesystem check fails Darrick J. Wong
2021-02-11 13:59   ` Brian Foster
2021-02-11 18:12     ` Darrick J. Wong
2021-02-11 18:35       ` Brian Foster
2021-02-11 19:05         ` Darrick J. Wong
2021-02-10  2:56 ` [PATCH 3/6] check: allow '-e testid' to exclude a single test Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-10  2:56 ` [PATCH 4/6] check: don't abort on non-existent excluded groups Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-11 17:27     ` Darrick J. Wong
2021-02-11 18:01       ` Brian Foster
2021-02-10  2:56 ` [PATCH 5/6] check: run tests in exactly the order specified Darrick J. Wong
2021-02-11 14:00   ` Brian Foster
2021-02-11 17:28     ` Darrick J. Wong
2021-02-10  2:56 ` [PATCH 6/6] fuzzy: capture core dumps from repair utilities Darrick J. Wong
2021-02-11 14:00   ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).