* [PATCHSET v2 0/2] fstests: exercise code refactored in 5.14
@ 2021-08-17 23:53 Darrick J. Wong
  2021-08-17 23:53 ` [PATCH 1/2] generic: fsstress with cpu offlining Darrick J. Wong
  2021-08-17 23:53 ` [PATCH 2/2] generic: test shutdowns of a nested filesystem Darrick J. Wong
  0 siblings, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-17 23:53 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

Hi all,

Add two new tests to exercise code that got refactored in 5.14.  The
nested shutdown test simulates a VM host filesystem going down, after
which the guests have to recover.

v2: fix some bugs pointed out by the maintainer, add cpu offlining stress test

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=new-tests-for-5.14
---
 common/rc             |   20 +++++++
 tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/725.out |    2 +
 tests/generic/726     |   71 ++++++++++++++++++++++++++
 tests/generic/726.out |    2 +
 5 files changed, 231 insertions(+)
 create mode 100755 tests/generic/725
 create mode 100644 tests/generic/725.out
 create mode 100755 tests/generic/726
 create mode 100644 tests/generic/726.out



* [PATCH 1/2] generic: fsstress with cpu offlining
  2021-08-17 23:53 [PATCHSET v2 0/2] fstests: exercise code refactored in 5.14 Darrick J. Wong
@ 2021-08-17 23:53 ` Darrick J. Wong
  2021-08-18  6:07   ` Zorro Lang
  2021-08-17 23:53 ` [PATCH 2/2] generic: test shutdowns of a nested filesystem Darrick J. Wong
  1 sibling, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-17 23:53 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Exercise filesystem operations when we're taking CPUs online and offline
throughout the test.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/726     |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/726.out |    2 +
 2 files changed, 73 insertions(+)
 create mode 100755 tests/generic/726
 create mode 100644 tests/generic/726.out


diff --git a/tests/generic/726 b/tests/generic/726
new file mode 100755
index 00000000..4b072b7f
--- /dev/null
+++ b/tests/generic/726
@@ -0,0 +1,71 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
+#
+# FS QA Test No. 726
+#
+# Run an all-writes fsstress run with multiple threads while exercising CPU
+# hotplugging to shake out bugs in the write path.
+#
+. ./common/preamble
+_begin_fstest auto rw
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+	for i in "$sysfs_cpu_dir/"cpu*/online; do
+		echo 1 > "$i" 2>/dev/null
+	done
+}
+
+exercise_cpu_hotplug()
+{
+	while [ -e $sentinel_file ]; do
+		local idx=$(( RANDOM % nr_hotplug_cpus ))
+		local cpu="${hotplug_cpus[idx]}"
+		local action=$(( RANDOM % 2 ))
+
+		echo "$action" > "$sysfs_cpu_dir/cpu$cpu/online" 2>/dev/null
+		sleep 0.5
+	done
+}
+
+# Import common functions.
+
+# Modify as appropriate.
+_supported_fs generic
+
+sysfs_cpu_dir="/sys/devices/system/cpu"
+
+# Figure out which CPU(s) support hotplug.
+nrcpus=$(getconf _NPROCESSORS_CONF)
+hotplug_cpus=()
+for ((i = 0; i < nrcpus; i++ )); do
+	test -e "$sysfs_cpu_dir/cpu$i/online" && hotplug_cpus+=("$i")
+done
+nr_hotplug_cpus="${#hotplug_cpus[@]}"
+test "$nr_hotplug_cpus" -gt 0 || _notrun "CPU hotplugging not supported"
+
+_require_scratch
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden."
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+sentinel_file=$tmp.hotplug
+touch $sentinel_file
+exercise_cpu_hotplug &
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((10000 * nr_cpus * TIME_FACTOR))
+$FSSTRESS_PROG $FSSTRESS_AVOID -w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
+rm -f $sentinel_file
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/726.out b/tests/generic/726.out
new file mode 100644
index 00000000..6839f8ce
--- /dev/null
+++ b/tests/generic/726.out
@@ -0,0 +1,2 @@
+QA output created by 726
+Silence is golden.



* [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-17 23:53 [PATCHSET v2 0/2] fstests: exercise code refactored in 5.14 Darrick J. Wong
  2021-08-17 23:53 ` [PATCH 1/2] generic: fsstress with cpu offlining Darrick J. Wong
@ 2021-08-17 23:53 ` Darrick J. Wong
  2021-08-18  7:06   ` Zorro Lang
  2021-08-22 11:18   ` Eryu Guan
  1 sibling, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-17 23:53 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

generic/475, but we're running fsstress on a disk image inside the
scratch filesystem

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc             |   20 +++++++
 tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/725.out |    2 +
 3 files changed, 158 insertions(+)
 create mode 100755 tests/generic/725
 create mode 100644 tests/generic/725.out


diff --git a/common/rc b/common/rc
index 84757fc1..473bfb0a 100644
--- a/common/rc
+++ b/common/rc
@@ -631,6 +631,26 @@ _ext4_metadump()
 		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
 }
 
+# Capture the metadata of a filesystem in a dump file for offline analysis
+_metadump_dev() {
+	local device="$1"
+	local dumpfile="$2"
+	local compressopt="$3"
+
+	case "$FSTYP" in
+	ext*)
+		_ext4_metadump $device $dumpfile $compressopt
+		;;
+	xfs)
+		_xfs_metadump $dumpfile $device none $compressopt
+		;;
+	*)
+		echo "Don't know how to metadump $FSTYP"
+		return 1
+		;;
+	esac
+}
+
 _test_mkfs()
 {
     case $FSTYP in
diff --git a/tests/generic/725 b/tests/generic/725
new file mode 100755
index 00000000..ac008fdb
--- /dev/null
+++ b/tests/generic/725
@@ -0,0 +1,136 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
+#
+# FS QA Test No. 725
+#
+# Test nested log recovery with repeated (simulated) disk failures.  We kick
+# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
+# out the underlying scratch device with dm-error to see what happens when the
+# disk goes down.  Having taken down both fses in this manner, remount them and
+# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
+# in writeback on the host that cause VM guests to fail to recover.
+#
+. ./common/preamble
+_begin_fstest shutdown auto log metadata eio recoveryloop
+
+_cleanup()
+{
+	cd /
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+	wait
+	if [ -n "$loopmnt" ]; then
+		$UMOUNT_PROG $loopmnt 2>/dev/null
+		rm -r -f $loopmnt
+	fi
+	rm -f $tmp.*
+	_dmerror_unmount
+	_dmerror_cleanup
+}
+
+# Import common functions.
+. ./common/dmerror
+. ./common/reflink
+
+# Modify as appropriate.
+_supported_fs generic
+
+_require_scratch_reflink
+_require_cp_reflink
+_require_dm_target error
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden."
+
+_scratch_mkfs >> $seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_dmerror_init
+_dmerror_mount
+
+# Create a fs image consuming 1/3 of the scratch fs
+scratch_freesp_bytes=$(_get_available_space $SCRATCH_MNT)
+loopimg_bytes=$((scratch_freesp_bytes / 3))
+
+loopimg=$SCRATCH_MNT/testfs
+truncate -s $loopimg_bytes $loopimg
+_mkfs_dev $loopimg
+
+loopmnt=$tmp.mount
+mkdir -p $loopmnt
+
+scratch_aliveflag=$tmp.runsnap
+snap_aliveflag=$tmp.snapping
+
+snap_loop_fs() {
+	touch "$snap_aliveflag"
+	while [ -e "$scratch_aliveflag" ]; do
+		rm -f $loopimg.a
+		_cp_reflink $loopimg $loopimg.a
+		sleep 1
+	done
+	rm -f "$snap_aliveflag"
+}
+
+fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
+
+for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
+	touch $scratch_aliveflag
+	snap_loop_fs >> $seqres.full 2>&1 &
+
+	if ! _mount $loopimg $loopmnt -o loop; then
+		rm -f $scratch_aliveflag
+		_metadump_dev $loopimg $seqres.loop.$i.md
+		_fail "iteration $i loopimg mount failed"
+		break
+	fi
+
+	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
+
+	# purposely include 0 second sleeps to test shutdown immediately after
+	# recovery
+	sleep $((RANDOM % (3 * TIME_FACTOR) ))
+	rm -f $scratch_aliveflag
+
+	# This test aims to simulate sudden disk failure, which means that we
+	# do not want to quiesce the filesystem or otherwise give it a chance
+	# to flush its logs.  Therefore we want to call dmsetup with the
+	# --nolockfs parameter; to make this happen we must call the load
+	# error table helper *without* 'lockfs'.
+	_dmerror_load_error_table
+
+	ps -e | grep fsstress > /dev/null 2>&1
+	while [ $? -eq 0 ]; do
+		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+		wait > /dev/null 2>&1
+		ps -e | grep fsstress > /dev/null 2>&1
+	done
+	for ((j = 0; j < 10; j++)); do
+		test -e "$snap_aliveflag" || break
+		sleep 1
+	done
+
+	# Mount again to replay log after loading working table, so we have a
+	# consistent fs after test.
+	$UMOUNT_PROG $loopmnt
+	_dmerror_unmount || _fail "iteration $i scratch unmount failed"
+	_dmerror_load_working_table
+	if ! _dmerror_mount; then
+		_metadump_dev $DMERROR_DEV $seqres.scratch.$i.md
+		_fail "iteration $i scratch mount failed"
+	fi
+done
+
+# Make sure the fs image file is ok
+if [ -f "$loopimg" ]; then
+	if _mount $loopimg $loopmnt -o loop; then
+		$UMOUNT_PROG $loopmnt &> /dev/null
+	else
+		_metadump_dev $DMERROR_DEV $seqres.scratch.final.md
+		echo "final scratch mount failed"
+	fi
+	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg
+fi
+
+# success, all done; let the test harness check the scratch fs
+status=0
+exit
diff --git a/tests/generic/725.out b/tests/generic/725.out
new file mode 100644
index 00000000..ed73a9fc
--- /dev/null
+++ b/tests/generic/725.out
@@ -0,0 +1,2 @@
+QA output created by 725
+Silence is golden.



* Re: [PATCH 1/2] generic: fsstress with cpu offlining
  2021-08-17 23:53 ` [PATCH 1/2] generic: fsstress with cpu offlining Darrick J. Wong
@ 2021-08-18  6:07   ` Zorro Lang
  2021-08-18  6:32     ` Zorro Lang
  0 siblings, 1 reply; 11+ messages in thread
From: Zorro Lang @ 2021-08-18  6:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Aug 17, 2021 at 04:53:19PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Exercise filesystem operations when we're taking CPUs online and offline
> throughout the test.

Just asking: does this test cover something specific (e.g. particular commits)?

> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/generic/726     |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/726.out |    2 +
>  2 files changed, 73 insertions(+)
>  create mode 100755 tests/generic/726
>  create mode 100644 tests/generic/726.out
> 
> 
> diff --git a/tests/generic/726 b/tests/generic/726
> new file mode 100755
> index 00000000..4b072b7f
> --- /dev/null
> +++ b/tests/generic/726
> @@ -0,0 +1,71 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> +#
> +# FS QA Test No. 726
> +#
> +# Run an all-writes fsstress run with multiple threads while exercising CPU
> +# hotplugging to shake out bugs in the write path.
> +#
> +. ./common/preamble
> +_begin_fstest auto rw
> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1

There's at least "exercise_cpu_hotplug &" running in the background; should we wait
here?  Even after removing $tmp.hotplug, we can't be sure that the process has exited.
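
A minimal sketch of the wait being suggested, using a stand-in worker
rather than the real hotplug loop.  The worker and cleanup names and the
mktemp sentinel are assumptions for illustration, not fstests code:

```shell
#!/bin/bash
# Hypothetical stand-in for the test's sentinel file ($tmp.hotplug).
sentinel_file=$(mktemp)

# Stand-in for exercise_cpu_hotplug: loop until the sentinel disappears.
worker()
{
	while [ -e "$sentinel_file" ]; do
		sleep 0.1
	done
}

cleanup()
{
	# Remove the sentinel so the loop can exit on its own...
	rm -f "$sentinel_file"
	# ...then block until every background child has really exited.
	wait
}

worker &
worker_pid=$!
cleanup
```

Without the wait, cleanup could return while the loop is still in its
sleep, leaving one more pass through the loop body possible.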

> +	for i in "$sysfs_cpu_dir/"cpu*/online; do
> +		echo 1 > "$i" 2>/dev/null
> +	done
> +}
> +
> +exercise_cpu_hotplug()
> +{
> +	while [ -e $sentinel_file ]; do
> +		local idx=$(( RANDOM % nr_hotplug_cpus ))
> +		local cpu="${hotplug_cpus[idx]}"
> +		local action=$(( RANDOM % 2 ))
> +
> +		echo "$action" > "$sysfs_cpu_dir/cpu$cpu/online" 2>/dev/null
> +		sleep 0.5
> +	done
> +}
> +
> +# Import common functions.
> +
> +# Modify as appropriate.

Two useless comments here?

> +_supported_fs generic
> +
> +sysfs_cpu_dir="/sys/devices/system/cpu"
> +
> +# Figure out which CPU(s) support hotplug.
> +nrcpus=$(getconf _NPROCESSORS_CONF)
> +hotplug_cpus=()
> +for ((i = 0; i < nrcpus; i++ )); do
> +	test -e "$sysfs_cpu_dir/cpu$i/online" && hotplug_cpus+=("$i")
> +done
> +nr_hotplug_cpus="${#hotplug_cpus[@]}"
> +test "$nr_hotplug_cpus" -gt 0 || _notrun "CPU hotplugging not supported"

Is that worth being a helper?
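
One possible shape for such a helper, as a sketch only.  The name
list_hotplug_cpus is hypothetical, and the demo runs against a mock
directory tree so that it does not touch the real sysfs:

```shell
#!/bin/bash
# Hypothetical helper: print the index of every CPU that exposes an
# 'online' knob under the given sysfs CPU directory.
list_hotplug_cpus()
{
	local sysfs_cpu_dir="${1:-/sys/devices/system/cpu}"
	local f cpu

	for f in "$sysfs_cpu_dir"/cpu[0-9]*/online; do
		# An unmatched glob stays literal, so skip non-existent paths.
		test -e "$f" || continue
		cpu="${f%/online}"
		echo "${cpu##*/cpu}"
	done
}

# Demo on a mock tree: cpu0 has no knob, cpu1 and cpu2 do.
mock=$(mktemp -d)
mkdir -p "$mock/cpu0" "$mock/cpu1" "$mock/cpu2"
touch "$mock/cpu1/online" "$mock/cpu2/online"

hotplug_cpus=($(list_hotplug_cpus "$mock"))
nr_hotplug_cpus="${#hotplug_cpus[@]}"
echo "hotpluggable: ${hotplug_cpus[*]} (count $nr_hotplug_cpus)"

rm -rf "$mock"
```

A test could then bail out with _notrun when the resulting list is
empty, as the open-coded version in the patch does.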

> +
> +_require_scratch
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +sentinel_file=$tmp.hotplug
> +touch $sentinel_file
> +exercise_cpu_hotplug &
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((10000 * nr_cpus * TIME_FACTOR))
> +$FSSTRESS_PROG $FSSTRESS_AVOID -w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> +rm -f $sentinel_file
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/726.out b/tests/generic/726.out
> new file mode 100644
> index 00000000..6839f8ce
> --- /dev/null
> +++ b/tests/generic/726.out
> @@ -0,0 +1,2 @@
> +QA output created by 726
> +Silence is golden.
> 



* Re: [PATCH 1/2] generic: fsstress with cpu offlining
  2021-08-18  6:07   ` Zorro Lang
@ 2021-08-18  6:32     ` Zorro Lang
  2021-08-18 16:01       ` Darrick J. Wong
  0 siblings, 1 reply; 11+ messages in thread
From: Zorro Lang @ 2021-08-18  6:32 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu, linux-xfs, fstests, guan

On Wed, Aug 18, 2021 at 02:07:37PM +0800, Zorro Lang wrote:
> On Tue, Aug 17, 2021 at 04:53:19PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Exercise filesystem operations when we're taking CPUs online and offline
> > throughout the test.
> 
> Just asking: does this test cover something specific (e.g. particular commits)?
> 
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/generic/726     |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/726.out |    2 +
> >  2 files changed, 73 insertions(+)
> >  create mode 100755 tests/generic/726
> >  create mode 100644 tests/generic/726.out
> > 
> > 
> > diff --git a/tests/generic/726 b/tests/generic/726
> > new file mode 100755
> > index 00000000..4b072b7f
> > --- /dev/null
> > +++ b/tests/generic/726
> > @@ -0,0 +1,71 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 726
> > +#
> > +# Run an all-writes fsstress run with multiple threads while exercising CPU
> > +# hotplugging to shake out bugs in the write path.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto rw

Oh, I think this can go in the 'stress' group, since it's a random fsstress test and
it takes a really long time on my system (with 24 CPUs):

generic/726      1041s

And might take more time :)
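
For a sense of scale, here is the workload arithmetic from the test as a
sketch, assuming both factors are 1 (the harness normally supplies them)
and that fsstress -n counts operations per process; total_ops is a name
added for illustration:

```shell
#!/bin/bash
# Assumed values for this sketch.
LOAD_FACTOR=1
TIME_FACTOR=1

# Same expressions as in the test.
nr_cpus=$((LOAD_FACTOR * 4))			# passed to fsstress -p
nr_ops=$((10000 * nr_cpus * TIME_FACTOR))	# passed to fsstress -n

# Rough total across all fsstress processes.
total_ops=$((nr_ops * nr_cpus))

echo "procs=$nr_cpus ops_per_proc=$nr_ops total=$total_ops"
```

Note that the workload is sized by LOAD_FACTOR and TIME_FACTOR, not by
the number of CPUs in the machine.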

Thanks,
Zorro

> > +
> > +# Override the default cleanup function.
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $tmp.*
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> 
> There's at least "exercise_cpu_hotplug &" running in the background; should we wait
> here?  Even after removing $tmp.hotplug, we can't be sure that the process has exited.
> 
> > +	for i in "$sysfs_cpu_dir/"cpu*/online; do
> > +		echo 1 > "$i" 2>/dev/null
> > +	done
> > +}
> > +
> > +exercise_cpu_hotplug()
> > +{
> > +	while [ -e $sentinel_file ]; do
> > +		local idx=$(( RANDOM % nr_hotplug_cpus ))
> > +		local cpu="${hotplug_cpus[idx]}"
> > +		local action=$(( RANDOM % 2 ))
> > +
> > +		echo "$action" > "$sysfs_cpu_dir/cpu$cpu/online" 2>/dev/null
> > +		sleep 0.5
> > +	done
> > +}
> > +
> > +# Import common functions.
> > +
> > +# Modify as appropriate.
> 
> Two useless comments here?
> 
> > +_supported_fs generic
> > +
> > +sysfs_cpu_dir="/sys/devices/system/cpu"
> > +
> > +# Figure out which CPU(s) support hotplug.
> > +nrcpus=$(getconf _NPROCESSORS_CONF)
> > +hotplug_cpus=()
> > +for ((i = 0; i < nrcpus; i++ )); do
> > +	test -e "$sysfs_cpu_dir/cpu$i/online" && hotplug_cpus+=("$i")
> > +done
> > +nr_hotplug_cpus="${#hotplug_cpus[@]}"
> > +test "$nr_hotplug_cpus" -gt 0 || _notrun "CPU hotplugging not supported"
> 
> Is that worth being a helper?
> 
> > +
> > +_require_scratch
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +echo "Silence is golden."
> > +
> > +_scratch_mkfs > $seqres.full 2>&1
> > +_scratch_mount >> $seqres.full 2>&1
> > +
> > +sentinel_file=$tmp.hotplug
> > +touch $sentinel_file
> > +exercise_cpu_hotplug &
> > +
> > +nr_cpus=$((LOAD_FACTOR * 4))
> > +nr_ops=$((10000 * nr_cpus * TIME_FACTOR))
> > +$FSSTRESS_PROG $FSSTRESS_AVOID -w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> > +rm -f $sentinel_file
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/726.out b/tests/generic/726.out
> > new file mode 100644
> > index 00000000..6839f8ce
> > --- /dev/null
> > +++ b/tests/generic/726.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 726
> > +Silence is golden.
> > 



* Re: [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-17 23:53 ` [PATCH 2/2] generic: test shutdowns of a nested filesystem Darrick J. Wong
@ 2021-08-18  7:06   ` Zorro Lang
  2021-08-18 15:55     ` Darrick J. Wong
  2021-08-22 11:18   ` Eryu Guan
  1 sibling, 1 reply; 11+ messages in thread
From: Zorro Lang @ 2021-08-18  7:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Aug 17, 2021 at 04:53:25PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> generic/475, but we're running fsstress on a disk image inside the
> scratch filesystem
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Looks good to me, thanks for this helpful test case. Just one question:
would it be better to use xfs_metadump with the "-o" option by default?

Reviewed-by: Zorro Lang <zlang@redhat.com>

>  common/rc             |   20 +++++++
>  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/725.out |    2 +
>  3 files changed, 158 insertions(+)
>  create mode 100755 tests/generic/725
>  create mode 100644 tests/generic/725.out
> 
> 
> diff --git a/common/rc b/common/rc
> index 84757fc1..473bfb0a 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -631,6 +631,26 @@ _ext4_metadump()
>  		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
>  }
>  
> +# Capture the metadata of a filesystem in a dump file for offline analysis
> +_metadump_dev() {
> +	local device="$1"
> +	local dumpfile="$2"
> +	local compressopt="$3"
> +
> +	case "$FSTYP" in
> +	ext*)
> +		_ext4_metadump $device $dumpfile $compressopt
> +		;;
> +	xfs)
> +		_xfs_metadump $dumpfile $device none $compressopt
> +		;;
> +	*)
> +		echo "Don't know how to metadump $FSTYP"
> +		return 1
> +		;;
> +	esac
> +}
> +
>  _test_mkfs()
>  {
>      case $FSTYP in
> diff --git a/tests/generic/725 b/tests/generic/725
> new file mode 100755
> index 00000000..ac008fdb
> --- /dev/null
> +++ b/tests/generic/725
> @@ -0,0 +1,136 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> +#
> +# FS QA Test No. 725
> +#
> +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> +# out the underlying scratch device with dm-error to see what happens when the
> +# disk goes down.  Having taken down both fses in this manner, remount them and
> +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> +# in writeback on the host that cause VM guests to fail to recover.
> +#
> +. ./common/preamble
> +_begin_fstest shutdown auto log metadata eio recoveryloop
> +
> +_cleanup()
> +{
> +	cd /
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +	wait
> +	if [ -n "$loopmnt" ]; then
> +		$UMOUNT_PROG $loopmnt 2>/dev/null
> +		rm -r -f $loopmnt
> +	fi
> +	rm -f $tmp.*
> +	_dmerror_unmount
> +	_dmerror_cleanup
> +}
> +
> +# Import common functions.
> +. ./common/dmerror
> +. ./common/reflink
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch_reflink
> +_require_cp_reflink
> +_require_dm_target error
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_require_metadata_journaling $SCRATCH_DEV
> +_dmerror_init
> +_dmerror_mount
> +
> +# Create a fs image consuming 1/3 of the scratch fs
> +scratch_freesp_bytes=$(_get_available_space $SCRATCH_MNT)
> +loopimg_bytes=$((scratch_freesp_bytes / 3))
> +
> +loopimg=$SCRATCH_MNT/testfs
> +truncate -s $loopimg_bytes $loopimg
> +_mkfs_dev $loopimg
> +
> +loopmnt=$tmp.mount
> +mkdir -p $loopmnt
> +
> +scratch_aliveflag=$tmp.runsnap
> +snap_aliveflag=$tmp.snapping
> +
> +snap_loop_fs() {
> +	touch "$snap_aliveflag"
> +	while [ -e "$scratch_aliveflag" ]; do
> +		rm -f $loopimg.a
> +		_cp_reflink $loopimg $loopimg.a
> +		sleep 1
> +	done
> +	rm -f "$snap_aliveflag"
> +}
> +
> +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> +
> +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> +	touch $scratch_aliveflag
> +	snap_loop_fs >> $seqres.full 2>&1 &
> +
> +	if ! _mount $loopimg $loopmnt -o loop; then
> +		rm -f $scratch_aliveflag
> +		_metadump_dev $loopimg $seqres.loop.$i.md
> +		_fail "iteration $i loopimg mount failed"
> +		break
> +	fi
> +
> +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> +
> +	# purposely include 0 second sleeps to test shutdown immediately after
> +	# recovery
> +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> +	rm -f $scratch_aliveflag
> +
> +	# This test aims to simulate sudden disk failure, which means that we
> +	# do not want to quiesce the filesystem or otherwise give it a chance
> +	# to flush its logs.  Therefore we want to call dmsetup with the
> +	# --nolockfs parameter; to make this happen we must call the load
> +	# error table helper *without* 'lockfs'.
> +	_dmerror_load_error_table
> +
> +	ps -e | grep fsstress > /dev/null 2>&1
> +	while [ $? -eq 0 ]; do
> +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +		wait > /dev/null 2>&1
> +		ps -e | grep fsstress > /dev/null 2>&1
> +	done
> +	for ((j = 0; j < 10; j++)); do
> +		test -e "$snap_aliveflag" || break
> +		sleep 1
> +	done
> +
> +	# Mount again to replay log after loading working table, so we have a
> +	# consistent fs after test.
> +	$UMOUNT_PROG $loopmnt
> +	_dmerror_unmount || _fail "iteration $i scratch unmount failed"
> +	_dmerror_load_working_table
> +	if ! _dmerror_mount; then
> +		_metadump_dev $DMERROR_DEV $seqres.scratch.$i.md
> +		_fail "iteration $i scratch mount failed"
> +	fi
> +done
> +
> +# Make sure the fs image file is ok
> +if [ -f "$loopimg" ]; then
> +	if _mount $loopimg $loopmnt -o loop; then
> +		$UMOUNT_PROG $loopmnt &> /dev/null
> +	else
> +		_metadump_dev $DMERROR_DEV $seqres.scratch.final.md
> +		echo "final scratch mount failed"
> +	fi
> +	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg
> +fi
> +
> +# success, all done; let the test harness check the scratch fs
> +status=0
> +exit
> diff --git a/tests/generic/725.out b/tests/generic/725.out
> new file mode 100644
> index 00000000..ed73a9fc
> --- /dev/null
> +++ b/tests/generic/725.out
> @@ -0,0 +1,2 @@
> +QA output created by 725
> +Silence is golden.
> 



* Re: [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-18  7:06   ` Zorro Lang
@ 2021-08-18 15:55     ` Darrick J. Wong
  2021-08-18 17:18       ` Zorro Lang
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-18 15:55 UTC (permalink / raw)
  To: guaneryu, linux-xfs, fstests, guan

On Wed, Aug 18, 2021 at 03:06:54PM +0800, Zorro Lang wrote:
> On Tue, Aug 17, 2021 at 04:53:25PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > generic/475, but we're running fsstress on a disk image inside the
> > scratch filesystem
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> 
> Looks good to me, thanks for this helpful test case. Just one question:
> would it be better to use xfs_metadump with the "-o" option by default?

_xfs_metadump already passes -a and -o.

--D

> Reviewed-by: Zorro Lang <zlang@redhat.com>
> 
> >  common/rc             |   20 +++++++
> >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/725.out |    2 +
> >  3 files changed, 158 insertions(+)
> >  create mode 100755 tests/generic/725
> >  create mode 100644 tests/generic/725.out
> > 
> > 
> > diff --git a/common/rc b/common/rc
> > index 84757fc1..473bfb0a 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -631,6 +631,26 @@ _ext4_metadump()
> >  		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
> >  }
> >  
> > +# Capture the metadata of a filesystem in a dump file for offline analysis
> > +_metadump_dev() {
> > +	local device="$1"
> > +	local dumpfile="$2"
> > +	local compressopt="$3"
> > +
> > +	case "$FSTYP" in
> > +	ext*)
> > +		_ext4_metadump $device $dumpfile $compressopt
> > +		;;
> > +	xfs)
> > +		_xfs_metadump $dumpfile $device none $compressopt
> > +		;;
> > +	*)
> > +		echo "Don't know how to metadump $FSTYP"
> > +		return 1
> > +		;;
> > +	esac
> > +}
> > +
> >  _test_mkfs()
> >  {
> >      case $FSTYP in
> > diff --git a/tests/generic/725 b/tests/generic/725
> > new file mode 100755
> > index 00000000..ac008fdb
> > --- /dev/null
> > +++ b/tests/generic/725
> > @@ -0,0 +1,136 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 725
> > +#
> > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > +# out the underlying scratch device with dm-error to see what happens when the
> > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > +# in writeback on the host that cause VM guests to fail to recover.
> > +#
> > +. ./common/preamble
> > +_begin_fstest shutdown auto log metadata eio recoveryloop
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +	wait
> > +	if [ -n "$loopmnt" ]; then
> > +		$UMOUNT_PROG $loopmnt 2>/dev/null
> > +		rm -r -f $loopmnt
> > +	fi
> > +	rm -f $tmp.*
> > +	_dmerror_unmount
> > +	_dmerror_cleanup
> > +}
> > +
> > +# Import common functions.
> > +. ./common/dmerror
> > +. ./common/reflink
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +
> > +_require_scratch_reflink
> > +_require_cp_reflink
> > +_require_dm_target error
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +echo "Silence is golden."
> > +
> > +_scratch_mkfs >> $seqres.full 2>&1
> > +_require_metadata_journaling $SCRATCH_DEV
> > +_dmerror_init
> > +_dmerror_mount
> > +
> > +# Create a fs image consuming 1/3 of the scratch fs
> > +scratch_freesp_bytes=$(_get_available_space $SCRATCH_MNT)
> > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > +
> > +loopimg=$SCRATCH_MNT/testfs
> > +truncate -s $loopimg_bytes $loopimg
> > +_mkfs_dev $loopimg
> > +
> > +loopmnt=$tmp.mount
> > +mkdir -p $loopmnt
> > +
> > +scratch_aliveflag=$tmp.runsnap
> > +snap_aliveflag=$tmp.snapping
> > +
> > +snap_loop_fs() {
> > +	touch "$snap_aliveflag"
> > +	while [ -e "$scratch_aliveflag" ]; do
> > +		rm -f $loopimg.a
> > +		_cp_reflink $loopimg $loopimg.a
> > +		sleep 1
> > +	done
> > +	rm -f "$snap_aliveflag"
> > +}
> > +
> > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > +
> > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > +	touch $scratch_aliveflag
> > +	snap_loop_fs >> $seqres.full 2>&1 &
> > +
> > +	if ! _mount $loopimg $loopmnt -o loop; then
> > +		rm -f $scratch_aliveflag
> > +		_metadump_dev $loopimg $seqres.loop.$i.md
> > +		_fail "iteration $i loopimg mount failed"
> > +		break
> > +	fi
> > +
> > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > +
> > +	# purposely include 0 second sleeps to test shutdown immediately after
> > +	# recovery
> > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > +	rm -f $scratch_aliveflag
> > +
> > +	# This test aims to simulate sudden disk failure, which means that we
> > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > +	# --nolockfs parameter; to make this happen we must call the load
> > +	# error table helper *without* 'lockfs'.
> > +	_dmerror_load_error_table
> > +
> > +	ps -e | grep fsstress > /dev/null 2>&1
> > +	while [ $? -eq 0 ]; do
> > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +		wait > /dev/null 2>&1
> > +		ps -e | grep fsstress > /dev/null 2>&1
> > +	done
> > > +	for ((j = 0; j < 10; j++)); do
> > +		test -e "$snap_aliveflag" || break
> > +		sleep 1
> > +	done
> > +
> > +	# Mount again to replay log after loading working table, so we have a
> > +	# consistent fs after test.
> > +	$UMOUNT_PROG $loopmnt
> > +	_dmerror_unmount || _fail "iteration $i scratch unmount failed"
> > +	_dmerror_load_working_table
> > +	if ! _dmerror_mount; then
> > +		_metadump_dev $DMERROR_DEV $seqres.scratch.$i.md
> > +		_fail "iteration $i scratch mount failed"
> > +	fi
> > +done
> > +
> > +# Make sure the fs image file is ok
> > +if [ -f "$loopimg" ]; then
> > +	if _mount $loopimg $loopmnt -o loop; then
> > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > +	else
> > +		_metadump_dev $DMERROR_DEV $seqres.scratch.final.md
> > +		echo "final scratch mount failed"
> > +	fi
> > +	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg
> > +fi
> > +
> > +# success, all done; let the test harness check the scratch fs
> > +status=0
> > +exit
> > diff --git a/tests/generic/725.out b/tests/generic/725.out
> > new file mode 100644
> > index 00000000..ed73a9fc
> > --- /dev/null
> > +++ b/tests/generic/725.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 725
> > +Silence is golden.
> > 
> 


* Re: [PATCH 1/2] generic: fsstress with cpu offlining
  2021-08-18  6:32     ` Zorro Lang
@ 2021-08-18 16:01       ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-18 16:01 UTC (permalink / raw)
  To: guaneryu, linux-xfs, fstests, guan

On Wed, Aug 18, 2021 at 02:32:49PM +0800, Zorro Lang wrote:
> On Wed, Aug 18, 2021 at 02:07:37PM +0800, Zorro Lang wrote:
> > On Tue, Aug 17, 2021 at 04:53:19PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Exercise filesystem operations when we're taking CPUs online and offline
> > > throughout the test.
> > 
> > > Just asking: does this test cover something specific (e.g. particular commits)?

This test started its life as a simple exerciser to try to spot problems
with the percpu data structure handling in Dave's log scalability
patchset.  Now that inode inactivation also uses percpu lists, it covers
both of those things.

However, filesystems were already supposed to keep running even with
CPUs going off and online, so I didn't list any specific commits.

> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/generic/726     |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/726.out |    2 +
> > >  2 files changed, 73 insertions(+)
> > >  create mode 100755 tests/generic/726
> > >  create mode 100644 tests/generic/726.out
> > > 
> > > 
> > > diff --git a/tests/generic/726 b/tests/generic/726
> > > new file mode 100755
> > > index 00000000..4b072b7f
> > > --- /dev/null
> > > +++ b/tests/generic/726
> > > @@ -0,0 +1,71 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 726
> > > +#
> > > +# Run an all-writes fsstress run with multiple threads while exercising CPU
> > > +# hotplugging to shake out bugs in the write path.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto rw
> 
> Oh, I think it could go in the 'stress' group, since it's a random fsstress
> test, and it takes a really long time on my system (with 24 cpus):
> 
> generic/726      1041s
> 
> And might take more time :)

Hmm.  Ok, back to the drawing board on this one then...

> Thanks,
> Zorro
> 
> > > +
> > > +# Override the default cleanup function.
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > 
> > There's at least "exercise_cpu_hotplug &" running in the background; should
> > we wait for it here? Even after removing $tmp.hotplug, we can't be sure the
> > process has exited.

Yes.
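
A minimal sketch of the shape I have in mind, for illustration only.  The
stand-in worker and the mktemp directory are assumptions that replace the
real hotplug loop and $tmp so that the snippet runs without root:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
sentinel_file="$tmpdir/hotplug"

# Stand-in for exercise_cpu_hotplug: spin until the sentinel disappears.
worker()
{
	while [ -e "$sentinel_file" ]; do
		sleep 0.1
	done
}

_cleanup()
{
	# Remove the sentinel first so the worker's loop can terminate...
	rm -f "$sentinel_file"
	# ...then block until every background child has actually exited.
	wait
	rm -rf "$tmpdir"
}

touch "$sentinel_file"
worker &
worker_pid=$!
_cleanup
```

The order matters: waiting before removing the sentinel would hang, because
the loop would never see its exit condition.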

> > 
> > > +	for i in "$sysfs_cpu_dir/"cpu*/online; do
> > > +		echo 1 > "$i" 2>/dev/null
> > > +	done
> > > +}
> > > +
> > > +exercise_cpu_hotplug()
> > > +{
> > > +	while [ -e $sentinel_file ]; do
> > > +		local idx=$(( RANDOM % nr_hotplug_cpus ))
> > > +		local cpu="${hotplug_cpus[idx]}"
> > > +		local action=$(( RANDOM % 2 ))
> > > +
> > > +		echo "$action" > "$sysfs_cpu_dir/cpu$cpu/online" 2>/dev/null
> > > +		sleep 0.5
> > > +	done
> > > +}
> > > +
> > > +# Import common functions.
> > > +
> > > +# Modify as appropriate.
> > 
> > Two useless comments here?

Removed.

> > > +_supported_fs generic
> > > +
> > > +sysfs_cpu_dir="/sys/devices/system/cpu"
> > > +
> > > +# Figure out which CPU(s) support hotplug.
> > > +nrcpus=$(getconf _NPROCESSORS_CONF)
> > > +hotplug_cpus=()
> > > +for ((i = 0; i < nrcpus; i++ )); do
> > > +	test -e "$sysfs_cpu_dir/cpu$i/online" && hotplug_cpus+=("$i")
> > > +done
> > > +nr_hotplug_cpus="${#hotplug_cpus[@]}"
> > > +test "$nr_hotplug_cpus" -gt 0 || _notrun "CPU hotplugging not supported"
> > 
> > Is that worth being a helper?

I would defer that to the second time someone wants to write a cpu
hotplug test.

> > > +
> > > +_require_scratch
> > > +_require_command "$KILLALL_PROG" "killall"
> > > +
> > > +echo "Silence is golden."
> > > +
> > > +_scratch_mkfs > $seqres.full 2>&1
> > > +_scratch_mount >> $seqres.full 2>&1
> > > +
> > > +sentinel_file=$tmp.hotplug
> > > +touch $sentinel_file
> > > +exercise_cpu_hotplug &
> > > +
> > > +nr_cpus=$((LOAD_FACTOR * 4))

Hm, this probably ought to be nr_cpus=$((LOAD_FACTOR * nr_hotplug_cpus))

> > > +nr_ops=$((10000 * nr_cpus * TIME_FACTOR))

And this nr_ops=$((25000 * TIME_FACTOR))
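
To put numbers on that, here is a quick worked example.  The 24 hotpluggable
CPUs are an assumption borrowed from Zorro's machine, both factors are pinned
to 1, and the totals assume that fsstress -n counts operations per process:

```shell
#!/bin/bash
LOAD_FACTOR=1
TIME_FACTOR=1
nr_hotplug_cpus=24	# assumed; depends on the machine

# Values as posted in the patch
old_nr_cpus=$((LOAD_FACTOR * 4))
old_nr_ops=$((10000 * old_nr_cpus * TIME_FACTOR))

# Values as proposed in this reply
new_nr_cpus=$((LOAD_FACTOR * nr_hotplug_cpus))
new_nr_ops=$((25000 * TIME_FACTOR))

echo "old: -p $old_nr_cpus -n $old_nr_ops total $((old_nr_cpus * old_nr_ops))"
echo "new: -p $new_nr_cpus -n $new_nr_ops total $((new_nr_cpus * new_nr_ops))"
```

With the proposed values the per-process operation count no longer depends on
the process count; only -p scales with the machine.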

--D

> > > +$FSSTRESS_PROG $FSSTRESS_AVOID -w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> > > +rm -f $sentinel_file
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/726.out b/tests/generic/726.out
> > > new file mode 100644
> > > index 00000000..6839f8ce
> > > --- /dev/null
> > > +++ b/tests/generic/726.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 726
> > > +Silence is golden.
> > > 
> 


* Re: [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-18 15:55     ` Darrick J. Wong
@ 2021-08-18 17:18       ` Zorro Lang
  0 siblings, 0 replies; 11+ messages in thread
From: Zorro Lang @ 2021-08-18 17:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Wed, Aug 18, 2021 at 08:55:26AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 18, 2021 at 03:06:54PM +0800, Zorro Lang wrote:
> > On Tue, Aug 17, 2021 at 04:53:25PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > generic/475, but we're running fsstress on a disk image inside the
> > > scratch filesystem
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > 
> > Looks good to me, thanks for this helpful test case. Just one question:
> > would it be better to use xfs_metadump with the "-o" option by default?
> 
> _xfs_metadump already passes -a and -o.

Oh, sorry, I didn't notice this line:

test -z "$options" && options="-a -o"
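
For anyone else reading along, the idiom just fills in a default when the
caller passes nothing.  A tiny sketch, where dump_opts is a hypothetical
function of mine and only the defaulting line mirrors the one quoted above:

```shell
#!/bin/bash
# Hypothetical helper used only to demonstrate the defaulting idiom.
dump_opts()
{
	local options="$1"

	# An empty argument falls back to the default option string.
	test -z "$options" && options="-a -o"
	echo "$options"
}

dump_opts		# prints "-a -o"
dump_opts "-o"		# prints "-o"
```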

> 
> --D
> 
> > Reviewed-by: Zorro Lang <zlang@redhat.com>
> > 
> > >  common/rc             |   20 +++++++
> > >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/725.out |    2 +
> > >  3 files changed, 158 insertions(+)
> > >  create mode 100755 tests/generic/725
> > >  create mode 100644 tests/generic/725.out
> > > 
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 84757fc1..473bfb0a 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -631,6 +631,26 @@ _ext4_metadump()
> > >  		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
> > >  }
> > >  
> > > +# Capture the metadata of a filesystem in a dump file for offline analysis
> > > +_metadump_dev() {
> > > +	local device="$1"
> > > +	local dumpfile="$2"
> > > +	local compressopt="$3"
> > > +
> > > +	case "$FSTYP" in
> > > +	ext*)
> > > +		_ext4_metadump $device $dumpfile $compressopt
> > > +		;;
> > > +	xfs)
> > > +		_xfs_metadump $dumpfile $device none $compressopt
> > > +		;;
> > > +	*)
> > > +		echo "Don't know how to metadump $FSTYP"
> > > +		return 1
> > > +		;;
> > > +	esac
> > > +}
> > > +
> > >  _test_mkfs()
> > >  {
> > >      case $FSTYP in
> > > diff --git a/tests/generic/725 b/tests/generic/725
> > > new file mode 100755
> > > index 00000000..ac008fdb
> > > --- /dev/null
> > > +++ b/tests/generic/725
> > > @@ -0,0 +1,136 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 725
> > > +#
> > > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > > +# out the underlying scratch device with dm-error to see what happens when the
> > > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > > +# in writeback on the host that cause VM guests to fail to recover.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest shutdown auto log metadata eio recoveryloop
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +	wait
> > > +	if [ -n "$loopmnt" ]; then
> > > +		$UMOUNT_PROG $loopmnt 2>/dev/null
> > > +		rm -r -f $loopmnt
> > > +	fi
> > > +	rm -f $tmp.*
> > > +	_dmerror_unmount
> > > +	_dmerror_cleanup
> > > +}
> > > +
> > > +# Import common functions.
> > > +. ./common/dmerror
> > > +. ./common/reflink
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +
> > > +_require_scratch_reflink
> > > +_require_cp_reflink
> > > +_require_dm_target error
> > > +_require_command "$KILLALL_PROG" "killall"
> > > +
> > > +echo "Silence is golden."
> > > +
> > > +_scratch_mkfs >> $seqres.full 2>&1
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_dmerror_init
> > > +_dmerror_mount
> > > +
> > > +# Create a fs image consuming 1/3 of the scratch fs
> > > +scratch_freesp_bytes=$(_get_available_space $SCRATCH_MNT)
> > > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > > +
> > > +loopimg=$SCRATCH_MNT/testfs
> > > +truncate -s $loopimg_bytes $loopimg
> > > +_mkfs_dev $loopimg
> > > +
> > > +loopmnt=$tmp.mount
> > > +mkdir -p $loopmnt
> > > +
> > > +scratch_aliveflag=$tmp.runsnap
> > > +snap_aliveflag=$tmp.snapping
> > > +
> > > +snap_loop_fs() {
> > > +	touch "$snap_aliveflag"
> > > +	while [ -e "$scratch_aliveflag" ]; do
> > > +		rm -f $loopimg.a
> > > +		_cp_reflink $loopimg $loopimg.a
> > > +		sleep 1
> > > +	done
> > > +	rm -f "$snap_aliveflag"
> > > +}
> > > +
> > > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > > +
> > > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > > +	touch $scratch_aliveflag
> > > +	snap_loop_fs >> $seqres.full 2>&1 &
> > > +
> > > +	if ! _mount $loopimg $loopmnt -o loop; then
> > > +		rm -f $scratch_aliveflag
> > > +		_metadump_dev $loopimg $seqres.loop.$i.md
> > > +		_fail "iteration $i loopimg mount failed"
> > > +		break
> > > +	fi
> > > +
> > > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > > +
> > > +	# purposely include 0 second sleeps to test shutdown immediately after
> > > +	# recovery
> > > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > > +	rm -f $scratch_aliveflag
> > > +
> > > +	# This test aims to simulate sudden disk failure, which means that we
> > > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > > +	# --nolockfs parameter; to make this happen we must call the load
> > > +	# error table helper *without* 'lockfs'.
> > > +	_dmerror_load_error_table
> > > +
> > > +	ps -e | grep fsstress > /dev/null 2>&1
> > > +	while [ $? -eq 0 ]; do
> > > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +		wait > /dev/null 2>&1
> > > +		ps -e | grep fsstress > /dev/null 2>&1
> > > +	done
> > > +	for ((i = 0; i < 10; i++)); do
> > > +		test -e "$snap_aliveflag" || break
> > > +		sleep 1
> > > +	done
> > > +
> > > +	# Mount again to replay log after loading working table, so we have a
> > > +	# consistent fs after test.
> > > +	$UMOUNT_PROG $loopmnt
> > > +	_dmerror_unmount || _fail "iteration $i scratch unmount failed"
> > > +	_dmerror_load_working_table
> > > +	if ! _dmerror_mount; then
> > > +		_metadump_dev $DMERROR_DEV $seqres.scratch.$i.md
> > > +		_fail "iteration $i scratch mount failed"
> > > +	fi
> > > +done
> > > +
> > > +# Make sure the fs image file is ok
> > > +if [ -f "$loopimg" ]; then
> > > +	if _mount $loopimg $loopmnt -o loop; then
> > > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > > +	else
> > > +		_metadump_dev $DMERROR_DEV $seqres.scratch.final.md
> > > +		echo "final scratch mount failed"
> > > +	fi
> > > +	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg
> > > +fi
> > > +
> > > +# success, all done; let the test harness check the scratch fs
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/725.out b/tests/generic/725.out
> > > new file mode 100644
> > > index 00000000..ed73a9fc
> > > --- /dev/null
> > > +++ b/tests/generic/725.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 725
> > > +Silence is golden.
> > > 
> > 
> 



* Re: [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-17 23:53 ` [PATCH 2/2] generic: test shutdowns of a nested filesystem Darrick J. Wong
  2021-08-18  7:06   ` Zorro Lang
@ 2021-08-22 11:18   ` Eryu Guan
  2021-08-22 17:23     ` Darrick J. Wong
  1 sibling, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2021-08-22 11:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests

On Tue, Aug 17, 2021 at 04:53:25PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> generic/475, but we're running fsstress on a disk image inside the
> scratch filesystem
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/rc             |   20 +++++++
>  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/725.out |    2 +
>  3 files changed, 158 insertions(+)
>  create mode 100755 tests/generic/725
>  create mode 100644 tests/generic/725.out
> 
> 
> diff --git a/common/rc b/common/rc
> index 84757fc1..473bfb0a 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -631,6 +631,26 @@ _ext4_metadump()
>  		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
>  }
>  
> +# Capture the metadata of a filesystem in a dump file for offline analysis
> +_metadump_dev() {
> +	local device="$1"
> +	local dumpfile="$2"
> +	local compressopt="$3"
> +
> +	case "$FSTYP" in
> +	ext*)
> +		_ext4_metadump $device $dumpfile $compressopt
> +		;;
> +	xfs)
> +		_xfs_metadump $dumpfile $device none $compressopt
> +		;;
> +	*)
> +		echo "Don't know how to metadump $FSTYP"

This breaks tests on filesystems other than ext* and xfs. I think it's
OK if we only want to use it in failure path, but it's better to
describe the use case in comments.

And I'm wondering if it should honor DUMP_CORRUPT_FS, and only do the dump
when it's set.

Thanks,
Eryu


* Re: [PATCH 2/2] generic: test shutdowns of a nested filesystem
  2021-08-22 11:18   ` Eryu Guan
@ 2021-08-22 17:23     ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-08-22 17:23 UTC (permalink / raw)
  To: Eryu Guan; +Cc: guaneryu, linux-xfs, fstests

On Sun, Aug 22, 2021 at 07:18:49PM +0800, Eryu Guan wrote:
> On Tue, Aug 17, 2021 at 04:53:25PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > generic/475, but we're running fsstress on a disk image inside the
> > scratch filesystem
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  common/rc             |   20 +++++++
> >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/725.out |    2 +
> >  3 files changed, 158 insertions(+)
> >  create mode 100755 tests/generic/725
> >  create mode 100644 tests/generic/725.out
> > 
> > 
> > diff --git a/common/rc b/common/rc
> > index 84757fc1..473bfb0a 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -631,6 +631,26 @@ _ext4_metadump()
> >  		$DUMP_COMPRESSOR -f "$dumpfile" &>> "$seqres.full"
> >  }
> >  
> > +# Capture the metadata of a filesystem in a dump file for offline analysis
> > +_metadump_dev() {
> > +	local device="$1"
> > +	local dumpfile="$2"
> > +	local compressopt="$3"
> > +
> > +	case "$FSTYP" in
> > +	ext*)
> > +		_ext4_metadump $device $dumpfile $compressopt
> > +		;;
> > +	xfs)
> > +		_xfs_metadump $dumpfile $device none $compressopt
> > +		;;
> > +	*)
> > +		echo "Don't know how to metadump $FSTYP"
> 
> This breaks tests on filesystems other than ext* and xfs. I think it's
> OK if we only want to use it in failure path, but it's better to
> describe the use case in comments.

Ok, I'll make a note of that in the comment.

"Capture the metadata of a filesystem in a dump file for offline
analysis.  Not all filesystems support this, so this function should
only be used to capture information about a previous test failure."

> And I'm wondering if it should honor DUMP_CORRUPT_FS, and only do the dump
> when it's set.

Yes.  Will fix that in the next release.
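
Roughly like this, as an untested sketch.  The two stubs only stand in for
the real _ext4_metadump and _xfs_metadump so that the snippet runs on its
own, and the exact form of the DUMP_CORRUPT_FS check is my assumption about
what the next version will look like:

```shell
#!/bin/bash
# Stubs that record their first two arguments instead of dumping anything.
calls=""
_ext4_metadump() { calls="$calls ext4:$1:$2"; }
_xfs_metadump() { calls="$calls xfs:$1:$2"; }

# Capture the metadata of a filesystem in a dump file for offline analysis.
# Not all filesystems support this, so this function should only be used to
# capture information about a previous test failure.
_metadump_dev() {
	# Assumed check: do nothing unless the user asked for dumps.
	[ "$DUMP_CORRUPT_FS" = 1 ] || return 0

	local device="$1"
	local dumpfile="$2"
	local compressopt="$3"

	case "$FSTYP" in
	ext*)
		_ext4_metadump $device $dumpfile $compressopt
		;;
	xfs)
		_xfs_metadump $dumpfile $device none $compressopt
		;;
	*)
		echo "Don't know how to metadump $FSTYP"
		return 1
		;;
	esac
}

FSTYP=xfs
DUMP_CORRUPT_FS=
_metadump_dev /dev/sdX /tmp/scratch.md	# dumps not requested; no-op
DUMP_CORRUPT_FS=1
_metadump_dev /dev/sdX /tmp/scratch.md	# reaches the xfs stub
echo "recorded calls:$calls"
```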

--D

> Thanks,
> Eryu



Thread overview: 11+ messages
2021-08-17 23:53 [PATCHSET v2 0/2] fstests: exercise code refactored in 5.14 Darrick J. Wong
2021-08-17 23:53 ` [PATCH 1/2] generic: fsstress with cpu offlining Darrick J. Wong
2021-08-18  6:07   ` Zorro Lang
2021-08-18  6:32     ` Zorro Lang
2021-08-18 16:01       ` Darrick J. Wong
2021-08-17 23:53 ` [PATCH 2/2] generic: test shutdowns of a nested filesystem Darrick J. Wong
2021-08-18  7:06   ` Zorro Lang
2021-08-18 15:55     ` Darrick J. Wong
2021-08-18 17:18       ` Zorro Lang
2021-08-22 11:18   ` Eryu Guan
2021-08-22 17:23     ` Darrick J. Wong
