linux-xfs.vger.kernel.org archive mirror
* [PATCHSET 0/3] fstests: exercise code refactored in 5.14
@ 2021-07-28  0:10 Darrick J. Wong
  2021-07-28  0:10 ` [PATCH 1/3] generic: test xattr operations only Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Darrick J. Wong @ 2021-07-28  0:10 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

Hi all,

Add a few tests to exercise code that got refactored in 5.14.  The xattr
tests shook out some bugs in the big extended attributes refactoring,
and the nested shutdown test simulates what happens when a VM host
filesystem goes down and the guests have to recover.
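
If the nesting isn't obvious, the shutdown test builds a "guest" fs on a
file inside the "host" fs, roughly like this sketch (illustrative only;
the real test uses the fstests helpers and dm-error rather than a bare
loop mount):

  mkfs.xfs -f $SCRATCH_DEV                   # "host" filesystem
  mount $SCRATCH_DEV /mnt/host
  truncate -s 1g /mnt/host/testfs            # backing file for the guest
  mkfs.xfs -f /mnt/host/testfs               # "guest" filesystem
  mount -o loop /mnt/host/testfs /mnt/guest
  # run fsstress in /mnt/guest, yank the host's disk out from under
  # both filesystems, then see if log recovery brings them both back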

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=new-tests-for-5.14
---
 tests/generic/724     |   57 +++++++++++++++
 tests/generic/724.out |    2 +
 tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++
 tests/generic/725.out |    2 +
 tests/xfs/778         |  190 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/778.out     |    2 +
 6 files changed, 389 insertions(+)
 create mode 100755 tests/generic/724
 create mode 100644 tests/generic/724.out
 create mode 100755 tests/generic/725
 create mode 100644 tests/generic/725.out
 create mode 100755 tests/xfs/778
 create mode 100644 tests/xfs/778.out



* [PATCH 1/3] generic: test xattr operations only
  2021-07-28  0:10 [PATCHSET 0/3] fstests: exercise code refactored in 5.14 Darrick J. Wong
@ 2021-07-28  0:10 ` Darrick J. Wong
  2021-08-12  5:34   ` Zorro Lang
  2021-07-28  0:10 ` [PATCH 2/3] generic: test shutdowns of a nested filesystem Darrick J. Wong
  2021-07-28  0:10 ` [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster Darrick J. Wong
  2 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-07-28  0:10 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Exercise extended attribute operations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/724     |   57 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/724.out |    2 ++
 2 files changed, 59 insertions(+)
 create mode 100755 tests/generic/724
 create mode 100644 tests/generic/724.out


diff --git a/tests/generic/724 b/tests/generic/724
new file mode 100755
index 00000000..b19f8f73
--- /dev/null
+++ b/tests/generic/724
@@ -0,0 +1,57 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
+#
+# FS QA Test No. 724
+#
+# Run an extended attributes fsstress run with multiple threads to shake out
+# bugs in the xattr code.
+#
+. ./common/preamble
+_begin_fstest soak attr long_rw stress
+
+_cleanup()
+{
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+	cd /
+	rm -f $tmp.*
+}
+
+# Modify as appropriate.
+_supported_fs generic
+
+_require_scratch
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden."
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((70000 * nr_cpus * TIME_FACTOR))
+
+args=('-z' '-S' 'c')
+
+# Do some directory tree modifications, but the bulk of this is geared towards
+# exercising the xattr code, especially attr_set which can do up to 10k values.
+for verb in unlink rmdir; do
+	args+=('-f' "${verb}=1")
+done
+for verb in creat mkdir; do
+	args+=('-f' "${verb}=2")
+done
+for verb in getfattr listfattr; do
+	args+=('-f' "${verb}=3")
+done
+for verb in attr_remove removefattr; do
+	args+=('-f' "${verb}=4")
+done
+args+=('-f' "setfattr=20")
+args+=('-f' "attr_set=60")	# sets larger xattrs
+
+$FSSTRESS_PROG "${args[@]}" $FSSTRESS_AVOID -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/724.out b/tests/generic/724.out
new file mode 100644
index 00000000..164cfffb
--- /dev/null
+++ b/tests/generic/724.out
@@ -0,0 +1,2 @@
+QA output created by 724
+Silence is golden.
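
For reference, with LOAD_FACTOR=1 and TIME_FACTOR=1 the setup above
boils down to roughly this invocation (an illustrative expansion, not
output from the test):

  ltp/fsstress -z -S c \
      -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 \
      -f getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 \
      -f setfattr=20 -f attr_set=60 \
      -d $SCRATCH_MNT -n 280000 -p 4

If I'm reading the flags right, -z zeroes every operation's frequency so
only the ops selected with -f actually run, and -S c echoes the chosen
op table (which ends up in $seqres.full here).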



* [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-07-28  0:10 [PATCHSET 0/3] fstests: exercise code refactored in 5.14 Darrick J. Wong
  2021-07-28  0:10 ` [PATCH 1/3] generic: test xattr operations only Darrick J. Wong
@ 2021-07-28  0:10 ` Darrick J. Wong
  2021-08-12  5:44   ` Zorro Lang
  2021-08-15 16:28   ` Eryu Guan
  2021-07-28  0:10 ` [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster Darrick J. Wong
  2 siblings, 2 replies; 16+ messages in thread
From: Darrick J. Wong @ 2021-07-28  0:10 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

This is generic/475, but with fsstress running on a disk image inside
the scratch filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/725.out |    2 +
 2 files changed, 138 insertions(+)
 create mode 100755 tests/generic/725
 create mode 100644 tests/generic/725.out


diff --git a/tests/generic/725 b/tests/generic/725
new file mode 100755
index 00000000..f43bcb37
--- /dev/null
+++ b/tests/generic/725
@@ -0,0 +1,136 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
+#
+# FS QA Test No. 725
+#
+# Test nested log recovery with repeated (simulated) disk failures.  We kick
+# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
+# out the underlying scratch device with dm-error to see what happens when the
+# disk goes down.  Having taken down both fses in this manner, remount them and
+# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
+# in writeback on the host that cause VM guests to fail to recover.
+#
+. ./common/preamble
+_begin_fstest shutdown auto log metadata eio
+
+_cleanup()
+{
+	cd /
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+	wait
+	if [ -n "$loopmnt" ]; then
+		umount $loopmnt 2>/dev/null
+		rm -r -f $loopmnt
+	fi
+	rm -f $tmp.*
+	_dmerror_unmount
+	_dmerror_cleanup
+}
+
+# Import common functions.
+. ./common/dmerror
+. ./common/reflink
+
+# Modify as appropriate.
+_supported_fs generic
+
+_require_scratch_reflink
+_require_cp_reflink
+_require_dm_target error
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden."
+
+_scratch_mkfs >> $seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_dmerror_init
+_dmerror_mount
+
+# Create a fs image consuming 1/3 of the scratch fs
+scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
+loopimg_bytes=$((scratch_freesp_bytes / 3))
+
+loopimg=$SCRATCH_MNT/testfs
+truncate -s $loopimg_bytes $loopimg
+_mkfs_dev $loopimg
+
+loopmnt=$tmp.mount
+mkdir -p $loopmnt
+
+scratch_aliveflag=$tmp.runsnap
+snap_aliveflag=$tmp.snapping
+
+snap_loop_fs() {
+	touch "$snap_aliveflag"
+	while [ -e "$scratch_aliveflag" ]; do
+		rm -f $loopimg.a
+		_cp_reflink $loopimg $loopimg.a
+		sleep 1
+	done
+	rm -f "$snap_aliveflag"
+}
+
+fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
+
+for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
+	touch $scratch_aliveflag
+	snap_loop_fs >> $seqres.full 2>&1 &
+
+	if ! _mount $loopimg $loopmnt -o loop; then
+		rm -f $scratch_aliveflag
+		_fail "loop mount failed"
+		break
+	fi
+
+	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
+
+	# purposely include 0 second sleeps to test shutdown immediately after
+	# recovery
+	sleep $((RANDOM % (3 * TIME_FACTOR) ))
+	rm -f $scratch_aliveflag
+
+	# This test aims to simulate sudden disk failure, which means that we
+	# do not want to quiesce the filesystem or otherwise give it a chance
+	# to flush its logs.  Therefore we want to call dmsetup with the
+	# --nolockfs parameter; to make this happen we must call the load
+	# error table helper *without* 'lockfs'.
+	_dmerror_load_error_table
+
+	ps -e | grep fsstress > /dev/null 2>&1
+	while [ $? -eq 0 ]; do
+		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+		wait > /dev/null 2>&1
+		ps -e | grep fsstress > /dev/null 2>&1
+	done
+	for ((i = 0; i < 10; i++)); do
+		test -e "$snap_aliveflag" || break
+		sleep 1
+	done
+
+	# Mount again to replay log after loading working table, so we have a
+	# consistent XFS after test.
+	$UMOUNT_PROG $loopmnt
+	_dmerror_unmount || _fail "unmount failed"
+	_dmerror_load_working_table
+	if ! _dmerror_mount; then
+		dmsetup table | tee -a /dev/ttyprintk
+		lsblk | tee -a /dev/ttyprintk
+		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
+		_fail "mount failed"
+	fi
+done
+
+# Make sure the fs image file is ok
+if [ -f "$loopimg" ]; then
+	if _mount $loopimg $loopmnt -o loop; then
+		$UMOUNT_PROG $loopmnt &> /dev/null
+	else
+		echo "final loop mount failed"
+	fi
+	_check_xfs_filesystem $loopimg none none
+fi
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/725.out b/tests/generic/725.out
new file mode 100644
index 00000000..ed73a9fc
--- /dev/null
+++ b/tests/generic/725.out
@@ -0,0 +1,2 @@
+QA output created by 725
+Silence is golden.
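
For anyone who hasn't used the dmerror helpers: the table swap in the
loop above amounts to something like this sketch (a paraphrase of
common/dmerror, not the literal code):

  size=$(blockdev --getsz $SCRATCH_DEV)
  # normal operation: a linear target passing I/O through to the disk
  dmsetup create error-test --table "0 $size linear $SCRATCH_DEV 0"

  # _dmerror_load_error_table without 'lockfs': suspend with --nolockfs
  # so the fs is not frozen or flushed, then fail every I/O with EIO
  dmsetup suspend --nolockfs error-test
  dmsetup load error-test --table "0 $size error"
  dmsetup resume error-test

  # _dmerror_load_working_table: put the linear target back
  dmsetup suspend --nolockfs error-test
  dmsetup load error-test --table "0 $size linear $SCRATCH_DEV 0"
  dmsetup resume error-test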



* [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster
  2021-07-28  0:10 [PATCHSET 0/3] fstests: exercise code refactored in 5.14 Darrick J. Wong
  2021-07-28  0:10 ` [PATCH 1/3] generic: test xattr operations only Darrick J. Wong
  2021-07-28  0:10 ` [PATCH 2/3] generic: test shutdowns of a nested filesystem Darrick J. Wong
@ 2021-07-28  0:10 ` Darrick J. Wong
  2021-08-12  6:01   ` Zorro Lang
  2 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-07-28  0:10 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

This is a targeted regression test for the patch "xfs: check for sparse
inode clusters that cross new EOAG when shrinking", which was found by
running the random-loopy shrink stresser xfs/168.

The original shrink implementation assumed that if we could allocate the
last free extent in the filesystem, it was ok to proceed with the fs
shrink.  Unfortunately, this isn't quite the case -- if there's a sparse
inode cluster such that the blocks at the end of the cluster are free,
it is not ok to shrink the fs to the point that part of the cluster
hangs off the end of the filesystem.  Doing so results in repair and
scrub marking the filesystem corrupt, so we must not.

(EOFS == "end of filesystem"; EOAG == "end of allocation group")
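
A crude picture may help.  Suppose the last inode cluster in the fs is
sparse, with only its front half backed by real blocks:

   |<----------- inode cluster ----------->|
   [ inodes (allocated) ][ hole (free)     ]
   ^                     ^                 ^
   cluster start         proposed EOFS     old EOFS

Because the back half is free space, the old code could allocate it and
proceed, leaving an inode cluster whose recorded range sticks out past
the new end of the filesystem.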

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/778     |  190 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/778.out |    2 +
 2 files changed, 192 insertions(+)
 create mode 100755 tests/xfs/778
 create mode 100644 tests/xfs/778.out


diff --git a/tests/xfs/778 b/tests/xfs/778
new file mode 100755
index 00000000..73cebaf1
--- /dev/null
+++ b/tests/xfs/778
@@ -0,0 +1,190 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2021 Oracle.  All Rights Reserved.
+#
+# FS QA Test 778
+#
+# Ensure that online shrink does not let us shrink the fs such that the end
+# of the filesystem is now in the middle of a sparse inode cluster.
+#
+. ./common/preamble
+_begin_fstest auto quick shrink
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs xfs
+_require_scratch
+_require_xfs_sparse_inodes
+_require_scratch_xfs_shrink
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fpunch"
+
+_scratch_mkfs "-d size=50m -m crc=1 -i sparse" |
+	_filter_mkfs > /dev/null 2> $tmp.mkfs
+. $tmp.mkfs	# for isize
+cat $tmp.mkfs >> $seqres.full
+
+daddr_to_fsblocks=$((dbsize / 512))
+
+convert_units() {
+	_scratch_xfs_db -f -c "$@" | sed -e 's/^.*(\([0-9]*\)).*$/\1/g'
+}
+
+# Figure out the next possible inode number after the log, since we can't
+# shrink or relocate the log
+logstart=$(_scratch_xfs_get_metadata_field 'logstart' 'sb')
+if [ $logstart -gt 0 ]; then
+	logblocks=$(_scratch_xfs_get_metadata_field 'logblocks' 'sb')
+	logend=$((logstart + logblocks))
+	logend_agno=$(convert_units "convert fsb $logend agno")
+	logend_agino=$(convert_units "convert fsb $logend agino")
+else
+	logend_agno=0
+	logend_agino=0
+fi
+
+_scratch_mount
+_xfs_force_bdev data $SCRATCH_MNT
+old_dblocks=$($XFS_IO_PROG -c 'statfs' $SCRATCH_MNT | grep geom.datablocks)
+
+mkdir $SCRATCH_MNT/save/
+sino=$(stat -c '%i' $SCRATCH_MNT/save)
+
+_consume_freesp()
+{
+	file=$1
+
+	# consume nearly all available space (leave ~1MB)
+	avail=`_get_available_space $SCRATCH_MNT`
+	filesizemb=$((avail / 1024 / 1024 - 1))
+	$XFS_IO_PROG -fc "falloc 0 ${filesizemb}m" $file
+}
+
+# Allocate inodes in a directory until failure.
+_alloc_inodes()
+{
+	dir=$1
+
+	i=0
+	while [ true ]; do
+		touch $dir/$i 2>> $seqres.full || break
+		i=$((i + 1))
+	done
+}
+
+# Find a sparse inode cluster after logend_agno/logend_agino.
+find_sparse_clusters()
+{
+	for ((agno = agcount - 1; agno >= logend_agno; agno--)); do
+		_scratch_xfs_db -c "agi $agno" -c "addr root" -c "btdump" | \
+			tr ':[,]' '    ' | \
+			awk -v "agno=$agno" \
+			    -v "agino=$logend_agino" \
+'{if ($2 >= agino && and(strtonum($3), 0x8000)) {printf("%s %s %s\n", agno, $2, $3);}}' | \
+			tac
+	done
+}
+
+# Calculate the fs inode chunk size based on the inode size and fixed 64-inode
+# record. This value is used as the target level of free space fragmentation
+# induced by the test (i.e., max size of free extents). We don't need to go
+# smaller than a full chunk because the XFS block allocator tacks on alignment
+# requirements to the size of the requested allocation. In other words, a
+# chunk-sized free extent is not enough to guarantee a successful chunk-sized
+# allocation.
+XFS_INODES_PER_CHUNK=64
+CHUNK_SIZE=$((isize * XFS_INODES_PER_CHUNK))
+
+_consume_freesp $SCRATCH_MNT/spc
+
+# Now that the fs is nearly full, punch holes in every other $CHUNK_SIZE range
+# of the space consumer file.  The goal here is to end up with a sparse cluster
+# at the end of the fs (and past any internal log), where the chunks at the end
+# of the cluster are sparse.
+
+offset=`_get_filesize $SCRATCH_MNT/spc`
+offset=$((offset - $CHUNK_SIZE * 2))
+nr=0
+while [ $offset -ge 0 ]; do
+	$XFS_IO_PROG -c "fpunch $offset $CHUNK_SIZE" $SCRATCH_MNT/spc \
+		2>> $seqres.full || _fail "fpunch failed"
+
+	# allocate as many inodes as possible
+	mkdir -p $SCRATCH_MNT/urk/offset.$offset > /dev/null 2>&1
+	_alloc_inodes $SCRATCH_MNT/urk/offset.$offset
+
+	offset=$((offset - $CHUNK_SIZE * 2))
+
+	# Every five times through the loop, see if we got a sparse cluster
+	nr=$((nr + 1))
+	if [ $((nr % 5)) -eq 4 ]; then
+		_scratch_unmount
+		find_sparse_clusters > $tmp.clusters
+		if [ -s $tmp.clusters ]; then
+			break;
+		fi
+		_scratch_mount
+	fi
+done
+
+test -s $tmp.clusters || _notrun "Could not create a sparse inode cluster"
+
+echo clusters >> $seqres.full
+cat $tmp.clusters >> $seqres.full
+
+# Figure out which inode numbers are in that last cluster.  We need to preserve
+# that cluster but delete everything else ahead of shrinking.
+icluster_agno=$(head -n 1 $tmp.clusters | cut -d ' ' -f 1)
+icluster_agino=$(head -n 1 $tmp.clusters | cut -d ' ' -f 2)
+icluster_ino=$(convert_units "convert agno $icluster_agno agino $icluster_agino ino")
+
+# Check that the save directory isn't going to prevent us from shrinking
+test $sino -lt $icluster_ino || \
+	echo "/save inode comes after target cluster, test may fail"
+
+# Save the inodes in the last cluster and delete everything else
+_scratch_mount
+rm -r $SCRATCH_MNT/spc
+for ((ino = icluster_ino; ino < icluster_ino + XFS_INODES_PER_CHUNK; ino++)); do
+	find $SCRATCH_MNT/urk/ -inum "$ino" -print0 | xargs -r -0 mv -t $SCRATCH_MNT/save/
+done
+rm -rf $SCRATCH_MNT/urk/ $SCRATCH_MNT/save/*/*
+sync
+$XFS_IO_PROG -c 'fsmap -vvvvv' $SCRATCH_MNT &>> $seqres.full
+
+# Propose shrinking the filesystem such that the end of the fs ends up in the
+# sparse part of our sparse cluster.  Remember, the last block of that cluster
+# ought to be free.
+target_ino=$((icluster_ino + XFS_INODES_PER_CHUNK - 1))
+for ((ino = target_ino; ino >= icluster_ino; ino--)); do
+	found=$(find $SCRATCH_MNT/save/ -inum "$ino" | wc -l)
+	test $found -gt 0 && break
+
+	ino_daddr=$(convert_units "convert ino $ino daddr")
+	new_size=$((ino_daddr / daddr_to_fsblocks))
+
+	echo "Hope to fail at shrinking to $new_size" >> $seqres.full
+	$XFS_GROWFS_PROG -D $new_size $SCRATCH_MNT &>> $seqres.full
+	res=$?
+
+	# Make sure shrink did not work
+	new_dblocks=$($XFS_IO_PROG -c 'statfs' $SCRATCH_MNT | grep geom.datablocks)
+	if [ "$new_dblocks" != "$old_dblocks" ]; then
+		echo "should not have shrunk $old_dblocks -> $new_dblocks"
+		break
+	fi
+
+	if [ $res -eq 0 ]; then
+		echo "shrink to $new_size (ino $ino) should have failed"
+		break
+	fi
+done
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/778.out b/tests/xfs/778.out
new file mode 100644
index 00000000..e80f72a3
--- /dev/null
+++ b/tests/xfs/778.out
@@ -0,0 +1,2 @@
+QA output created by 778
+Silence is golden
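
An aside for anyone adapting convert_units: it leans on xfs_db's
convert command printing both hex and a parenthesized decimal, so the
sed expression just plucks out the decimal.  A sample session (values
invented for illustration):

  $ xfs_db -f -c 'convert fsb 2048 agno' $SCRATCH_DEV
  0x1 (1)
  $ xfs_db -f -c 'convert ino 2112 daddr' $SCRATCH_DEV
  0x1080 (4224)

find_sparse_clusters works the same way on btdump output: after the tr
pass, field 3 is the record's holemask, and the awk keeps records with
bit 0x8000 set, i.e. clusters that are sparse at the end.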



* Re: [PATCH 1/3] generic: test xattr operations only
  2021-07-28  0:10 ` [PATCH 1/3] generic: test xattr operations only Darrick J. Wong
@ 2021-08-12  5:34   ` Zorro Lang
  2021-08-12 17:04     ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Zorro Lang @ 2021-08-12  5:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Jul 27, 2021 at 05:10:24PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Exercise extended attribute operations.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/generic/724     |   57 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/724.out |    2 ++
>  2 files changed, 59 insertions(+)
>  create mode 100755 tests/generic/724
>  create mode 100644 tests/generic/724.out
> 
> 
> diff --git a/tests/generic/724 b/tests/generic/724
> new file mode 100755
> index 00000000..b19f8f73
> --- /dev/null
> +++ b/tests/generic/724
> @@ -0,0 +1,57 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> +#
> +# FS QA Test No. 724
> +#
> +# Run an extended attributes fsstress run with multiple threads to shake out
> +# bugs in the xattr code.
> +#
> +. ./common/preamble
> +_begin_fstest soak attr long_rw stress

Should we add this test to the 'auto' group too?

> +
> +_cleanup()
> +{
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1

Can a "wait" command help more at here?

Others looks good to me.

Thanks,
Zorro

> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((70000 * nr_cpus * TIME_FACTOR))
> +
> +args=('-z' '-S' 'c')
> +
> +# Do some directory tree modifications, but the bulk of this is geared towards
> +# exercising the xattr code, especially attr_set which can do up to 10k values.
> +for verb in unlink rmdir; do
> +	args+=('-f' "${verb}=1")
> +done
> +for verb in creat mkdir; do
> +	args+=('-f' "${verb}=2")
> +done
> +for verb in getfattr listfattr; do
> +	args+=('-f' "${verb}=3")
> +done
> +for verb in attr_remove removefattr; do
> +	args+=('-f' "${verb}=4")
> +done
> +args+=('-f' "setfattr=20")
> +args+=('-f' "attr_set=60")	# sets larger xattrs
> +
> +$FSSTRESS_PROG "${args[@]}" $FSSTRESS_AVOID -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/724.out b/tests/generic/724.out
> new file mode 100644
> index 00000000..164cfffb
> --- /dev/null
> +++ b/tests/generic/724.out
> @@ -0,0 +1,2 @@
> +QA output created by 724
> +Silence is golden.
> 



* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-07-28  0:10 ` [PATCH 2/3] generic: test shutdowns of a nested filesystem Darrick J. Wong
@ 2021-08-12  5:44   ` Zorro Lang
  2021-08-12 17:07     ` Darrick J. Wong
  2021-08-15 16:28   ` Eryu Guan
  1 sibling, 1 reply; 16+ messages in thread
From: Zorro Lang @ 2021-08-12  5:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This is generic/475, but with fsstress running on a disk image
> inside the scratch filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/725.out |    2 +
>  2 files changed, 138 insertions(+)
>  create mode 100755 tests/generic/725
>  create mode 100644 tests/generic/725.out
> 
> 
> diff --git a/tests/generic/725 b/tests/generic/725
> new file mode 100755
> index 00000000..f43bcb37
> --- /dev/null
> +++ b/tests/generic/725
> @@ -0,0 +1,136 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> +#
> +# FS QA Test No. 725
> +#
> +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> +# out the underlying scratch device with dm-error to see what happens when the
> +# disk goes down.  Having taken down both fses in this manner, remount them and
> +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> +# in writeback on the host that cause VM guests to fail to recover.
> +#
> +. ./common/preamble
> +_begin_fstest shutdown auto log metadata eio
> +
> +_cleanup()
> +{
> +	cd /
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +	wait
> +	if [ -n "$loopmnt" ]; then
> +		umount $loopmnt 2>/dev/null
> +		rm -r -f $loopmnt
> +	fi
> +	rm -f $tmp.*
> +	_dmerror_unmount
> +	_dmerror_cleanup
> +}
> +
> +# Import common functions.
> +. ./common/dmerror
> +. ./common/reflink
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch_reflink
> +_require_cp_reflink
> +_require_dm_target error
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_require_metadata_journaling $SCRATCH_DEV
> +_dmerror_init
> +_dmerror_mount
> +
> +# Create a fs image consuming 1/3 of the scratch fs
> +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> +loopimg_bytes=$((scratch_freesp_bytes / 3))
> +
> +loopimg=$SCRATCH_MNT/testfs
> +truncate -s $loopimg_bytes $loopimg
> +_mkfs_dev $loopimg

I must say this is a nice test, like generic/475; I'd like to have it ASAP :)
Just one question: if the FSTYP is nfs, cifs or virtiofs and so on ... [see below]

> +
> +loopmnt=$tmp.mount
> +mkdir -p $loopmnt
> +
> +scratch_aliveflag=$tmp.runsnap
> +snap_aliveflag=$tmp.snapping
> +
> +snap_loop_fs() {
> +	touch "$snap_aliveflag"
> +	while [ -e "$scratch_aliveflag" ]; do
> +		rm -f $loopimg.a
> +		_cp_reflink $loopimg $loopimg.a
> +		sleep 1
> +	done
> +	rm -f "$snap_aliveflag"
> +}
> +
> +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> +
> +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> +	touch $scratch_aliveflag
> +	snap_loop_fs >> $seqres.full 2>&1 &
> +
> +	if ! _mount $loopimg $loopmnt -o loop; then

... This test will fail directly here

Thanks,
Zorro

> +		rm -f $scratch_aliveflag
> +		_fail "loop mount failed"
> +		break
> +	fi
> +
> +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> +
> +	# purposely include 0 second sleeps to test shutdown immediately after
> +	# recovery
> +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> +	rm -f $scratch_aliveflag
> +
> +	# This test aims to simulate sudden disk failure, which means that we
> +	# do not want to quiesce the filesystem or otherwise give it a chance
> +	# to flush its logs.  Therefore we want to call dmsetup with the
> +	# --nolockfs parameter; to make this happen we must call the load
> +	# error table helper *without* 'lockfs'.
> +	_dmerror_load_error_table
> +
> +	ps -e | grep fsstress > /dev/null 2>&1
> +	while [ $? -eq 0 ]; do
> +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +		wait > /dev/null 2>&1
> +		ps -e | grep fsstress > /dev/null 2>&1
> +	done
> +	for ((i = 0; i < 10; i++)); do
> +		test -e "$snap_aliveflag" || break
> +		sleep 1
> +	done
> +
> +	# Mount again to replay log after loading working table, so we have a
> +	# consistent XFS after test.
> +	$UMOUNT_PROG $loopmnt
> +	_dmerror_unmount || _fail "unmount failed"
> +	_dmerror_load_working_table
> +	if ! _dmerror_mount; then
> +		dmsetup table | tee -a /dev/ttyprintk
> +		lsblk | tee -a /dev/ttyprintk
> +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> +		_fail "mount failed"
> +	fi
> +done
> +
> +# Make sure the fs image file is ok
> +if [ -f "$loopimg" ]; then
> +	if _mount $loopimg $loopmnt -o loop; then
> +		$UMOUNT_PROG $loopmnt &> /dev/null
> +	else
> +		echo "final loop mount failed"
> +	fi
> +	_check_xfs_filesystem $loopimg none none
> +fi
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/725.out b/tests/generic/725.out
> new file mode 100644
> index 00000000..ed73a9fc
> --- /dev/null
> +++ b/tests/generic/725.out
> @@ -0,0 +1,2 @@
> +QA output created by 725
> +Silence is golden.
> 



* Re: [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster
  2021-07-28  0:10 ` [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster Darrick J. Wong
@ 2021-08-12  6:01   ` Zorro Lang
  0 siblings, 0 replies; 16+ messages in thread
From: Zorro Lang @ 2021-08-12  6:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Tue, Jul 27, 2021 at 05:10:35PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This is a targeted regression test for the patch "xfs: check for sparse
> inode clusters that cross new EOAG when shrinking", which was found by
> running the random-loopy shrink stresser xfs/168.
> 
> The original shrink implementation assumed that if we could allocate the
> last free extent in the filesystem, it was ok to proceed with the fs
> shrink.  Unfortunately, this isn't quite the case -- if there's a sparse
> inode cluster such that the blocks at the end of the cluster are free,
> it is not ok to shrink the fs to the point that part of the cluster
> hangs off the end of the filesystem.  Doing so results in repair and
> scrub marking the filesystem corrupt, so we must not.
> 
> (EOFS == "end of filesystem"; EOAG == "end of allocation group")
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

Good to me, and test passed on my system.

Reviewed-by: Zorro Lang <zlang@redhat.com>

>  tests/xfs/778     |  190 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/778.out |    2 +
>  2 files changed, 192 insertions(+)
>  create mode 100755 tests/xfs/778
>  create mode 100644 tests/xfs/778.out
> 
> 
> diff --git a/tests/xfs/778 b/tests/xfs/778
> new file mode 100755
> index 00000000..73cebaf1
> --- /dev/null
> +++ b/tests/xfs/778
> @@ -0,0 +1,190 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 778
> +#
> +# Ensure that online shrink does not let us shrink the fs such that the end
> +# of the filesystem is now in the middle of a sparse inode cluster.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick shrink
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs xfs
> +_require_scratch
> +_require_xfs_sparse_inodes
> +_require_scratch_xfs_shrink
> +_require_xfs_io_command "falloc"
> +_require_xfs_io_command "fpunch"
> +
> +_scratch_mkfs "-d size=50m -m crc=1 -i sparse" |
> +	_filter_mkfs > /dev/null 2> $tmp.mkfs
> +. $tmp.mkfs	# for isize
> +cat $tmp.mkfs >> $seqres.full
> +
> +daddr_to_fsblocks=$((dbsize / 512))
> +
> +convert_units() {
> +	_scratch_xfs_db -f -c "$@" | sed -e 's/^.*(\([0-9]*\)).*$/\1/g'
> +}
> +
> +# Figure out the next possible inode number after the log, since we can't
> +# shrink or relocate the log
> +logstart=$(_scratch_xfs_get_metadata_field 'logstart' 'sb')
> +if [ $logstart -gt 0 ]; then
> +	logblocks=$(_scratch_xfs_get_metadata_field 'logblocks' 'sb')
> +	logend=$((logstart + logblocks))
> +	logend_agno=$(convert_units "convert fsb $logend agno")
> +	logend_agino=$(convert_units "convert fsb $logend agino")
> +else
> +	logend_agno=0
> +	logend_agino=0
> +fi
> +
> +_scratch_mount
> +_xfs_force_bdev data $SCRATCH_MNT
> +old_dblocks=$($XFS_IO_PROG -c 'statfs' $SCRATCH_MNT | grep geom.datablocks)
> +
> +mkdir $SCRATCH_MNT/save/
> +sino=$(stat -c '%i' $SCRATCH_MNT/save)
> +
> +_consume_freesp()
> +{
> +	file=$1
> +
> +	# consume nearly all available space (leave ~1MB)
> +	avail=`_get_available_space $SCRATCH_MNT`
> +	filesizemb=$((avail / 1024 / 1024 - 1))
> +	$XFS_IO_PROG -fc "falloc 0 ${filesizemb}m" $file
> +}
> +
> +# Allocate inodes in a directory until failure.
> +_alloc_inodes()
> +{
> +	dir=$1
> +
> +	i=0
> +	while [ true ]; do
> +		touch $dir/$i 2>> $seqres.full || break
> +		i=$((i + 1))
> +	done
> +}
> +
> +# Find a sparse inode cluster after logend_agno/logend_agino.
> +find_sparse_clusters()
> +{
> +	for ((agno = agcount - 1; agno >= logend_agno; agno--)); do
> +		_scratch_xfs_db -c "agi $agno" -c "addr root" -c "btdump" | \
> +			tr ':[,]' '    ' | \
> +			awk -v "agno=$agno" \
> +			    -v "agino=$logend_agino" \
> +'{if ($2 >= agino && and(strtonum($3), 0x8000)) {printf("%s %s %s\n", agno, $2, $3);}}' | \
> +			tac
> +	done
> +}
> +
> +# Calculate the fs inode chunk size based on the inode size and fixed 64-inode
> +# record. This value is used as the target level of free space fragmentation
> +# induced by the test (i.e., max size of free extents). We don't need to go
> +# smaller than a full chunk because the XFS block allocator tacks on alignment
> +# requirements to the size of the requested allocation. In other words, a
> +# chunk-sized free extent is not enough to guarantee a successful chunk-sized
> +# allocation.
> +XFS_INODES_PER_CHUNK=64
> +CHUNK_SIZE=$((isize * XFS_INODES_PER_CHUNK))
> +
> +_consume_freesp $SCRATCH_MNT/spc
> +
> +# Now that the fs is nearly full, punch holes in every other $CHUNK_SIZE range
> +# of the space consumer file.  The goal here is to end up with a sparse cluster
> +# at the end of the fs (and past any internal log), where the chunks at the end
> +# of the cluster are sparse.
> +
> +offset=`_get_filesize $SCRATCH_MNT/spc`
> +offset=$((offset - $CHUNK_SIZE * 2))
> +nr=0
> +while [ $offset -ge 0 ]; do
> +	$XFS_IO_PROG -c "fpunch $offset $CHUNK_SIZE" $SCRATCH_MNT/spc \
> +		2>> $seqres.full || _fail "fpunch failed"
> +
> +	# allocate as many inodes as possible
> +	mkdir -p $SCRATCH_MNT/urk/offset.$offset > /dev/null 2>&1
> +	_alloc_inodes $SCRATCH_MNT/urk/offset.$offset
> +
> +	offset=$((offset - $CHUNK_SIZE * 2))
> +
> +	# Every five times through the loop, see if we got a sparse cluster
> +	nr=$((nr + 1))
> +	if [ $((nr % 5)) -eq 4 ]; then
> +		_scratch_unmount
> +		find_sparse_clusters > $tmp.clusters
> +		if [ -s $tmp.clusters ]; then
> +			break;
> +		fi
> +		_scratch_mount
> +	fi
> +done
> +
> +test -s $tmp.clusters || _notrun "Could not create a sparse inode cluster"
> +
> +echo clusters >> $seqres.full
> +cat $tmp.clusters >> $seqres.full
> +
> +# Figure out which inode numbers are in that last cluster.  We need to preserve
> +# that cluster but delete everything else ahead of shrinking.
> +icluster_agno=$(head -n 1 $tmp.clusters | cut -d ' ' -f 1)
> +icluster_agino=$(head -n 1 $tmp.clusters | cut -d ' ' -f 2)
> +icluster_ino=$(convert_units "convert agno $icluster_agno agino $icluster_agino ino")
> +
> +# Check that the save directory isn't going to prevent us from shrinking
> +test $sino -lt $icluster_ino || \
> +	echo "/save inode comes after target cluster, test may fail"
> +
> +# Save the inodes in the last cluster and delete everything else
> +_scratch_mount
> +rm -r $SCRATCH_MNT/spc
> +for ((ino = icluster_ino; ino < icluster_ino + XFS_INODES_PER_CHUNK; ino++)); do
> +	find $SCRATCH_MNT/urk/ -inum "$ino" -print0 | xargs -r -0 mv -t $SCRATCH_MNT/save/
> +done
> +rm -rf $SCRATCH_MNT/urk/ $SCRATCH_MNT/save/*/*
> +sync
> +$XFS_IO_PROG -c 'fsmap -vvvvv' $SCRATCH_MNT &>> $seqres.full
> +
> +# Propose shrinking the filesystem such that the end of the fs ends up in the
> +# sparse part of our sparse cluster.  Remember, the last block of that cluster
> +# ought to be free.
> +target_ino=$((icluster_ino + XFS_INODES_PER_CHUNK - 1))
> +for ((ino = target_ino; ino >= icluster_ino; ino--)); do
> +	found=$(find $SCRATCH_MNT/save/ -inum "$ino" | wc -l)
> +	test $found -gt 0 && break
> +
> +	ino_daddr=$(convert_units "convert ino $ino daddr")
> +	new_size=$((ino_daddr / daddr_to_fsblocks))
> +
> +	echo "Hope to fail at shrinking to $new_size" >> $seqres.full
> +	$XFS_GROWFS_PROG -D $new_size $SCRATCH_MNT &>> $seqres.full
> +	res=$?
> +
> +	# Make sure shrink did not work
> +	new_dblocks=$($XFS_IO_PROG -c 'statfs' $SCRATCH_MNT | grep geom.datablocks)
> +	if [ "$new_dblocks" != "$old_dblocks" ]; then
> +		echo "should not have shrunk $old_dblocks -> $new_dblocks"
> +		break
> +	fi
> +
> +	if [ $res -eq 0 ]; then
> +		echo "shrink to $new_size (ino $ino) should have failed"
> +		break
> +	fi
> +done
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/xfs/778.out b/tests/xfs/778.out
> new file mode 100644
> index 00000000..e80f72a3
> --- /dev/null
> +++ b/tests/xfs/778.out
> @@ -0,0 +1,2 @@
> +QA output created by 778
> +Silence is golden
> 



* Re: [PATCH 1/3] generic: test xattr operations only
  2021-08-12  5:34   ` Zorro Lang
@ 2021-08-12 17:04     ` Darrick J. Wong
  2021-08-15 15:46       ` Eryu Guan
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-08-12 17:04 UTC (permalink / raw)
  To: guaneryu, linux-xfs, fstests, guan

On Thu, Aug 12, 2021 at 01:34:52PM +0800, Zorro Lang wrote:
> On Tue, Jul 27, 2021 at 05:10:24PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Exercise extended attribute operations.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/generic/724     |   57 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/724.out |    2 ++
> >  2 files changed, 59 insertions(+)
> >  create mode 100755 tests/generic/724
> >  create mode 100644 tests/generic/724.out
> > 
> > 
> > diff --git a/tests/generic/724 b/tests/generic/724
> > new file mode 100755
> > index 00000000..b19f8f73
> > --- /dev/null
> > +++ b/tests/generic/724
> > @@ -0,0 +1,57 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 724
> > +#
> > +# Run an extended attributes fsstress run with multiple threads to shake out
> > +# bugs in the xattr code.
> > +#
> > +. ./common/preamble
> > +_begin_fstest soak attr long_rw stress
> 
> Should we add this test to the 'auto' group too?

Yes, fixed.

> > +
> > +_cleanup()
> > +{
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> 
> Can a "wait" command help more here?

Ok, I"ll add that.

--D

> Others looks good to me.
> 
> Thanks,
> Zorro
> 
> > +	cd /
> > +	rm -f $tmp.*
> > +}
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +
> > +_require_scratch
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +echo "Silence is golden."
> > +
> > +_scratch_mkfs > $seqres.full 2>&1
> > +_scratch_mount >> $seqres.full 2>&1
> > +
> > +nr_cpus=$((LOAD_FACTOR * 4))
> > +nr_ops=$((70000 * nr_cpus * TIME_FACTOR))
> > +
> > +args=('-z' '-S' 'c')
> > +
> > +# Do some directory tree modifications, but the bulk of this is geared towards
> > +# exercising the xattr code, especially attr_set which can do up to 10k values.
> > +for verb in unlink rmdir; do
> > +	args+=('-f' "${verb}=1")
> > +done
> > +for verb in creat mkdir; do
> > +	args+=('-f' "${verb}=2")
> > +done
> > +for verb in getfattr listfattr; do
> > +	args+=('-f' "${verb}=3")
> > +done
> > +for verb in attr_remove removefattr; do
> > +	args+=('-f' "${verb}=4")
> > +done
> > +args+=('-f' "setfattr=20")
> > +args+=('-f' "attr_set=60")	# sets larger xattrs
> > +
> > +$FSSTRESS_PROG "${args[@]}" $FSSTRESS_AVOID -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/724.out b/tests/generic/724.out
> > new file mode 100644
> > index 00000000..164cfffb
> > --- /dev/null
> > +++ b/tests/generic/724.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 724
> > +Silence is golden.
> > 
> 


* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-12  5:44   ` Zorro Lang
@ 2021-08-12 17:07     ` Darrick J. Wong
  2021-08-13 14:52       ` Zorro Lang
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-08-12 17:07 UTC (permalink / raw)
  To: guaneryu, linux-xfs, fstests, guan

On Thu, Aug 12, 2021 at 01:44:21PM +0800, Zorro Lang wrote:
> On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This is generic/475, but with fsstress running on a disk image
> > inside the scratch filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/725.out |    2 +
> >  2 files changed, 138 insertions(+)
> >  create mode 100755 tests/generic/725
> >  create mode 100644 tests/generic/725.out
> > 
> > 
> > diff --git a/tests/generic/725 b/tests/generic/725
> > new file mode 100755
> > index 00000000..f43bcb37
> > --- /dev/null
> > +++ b/tests/generic/725
> > @@ -0,0 +1,136 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 725
> > +#
> > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > +# out the underlying scratch device with dm-error to see what happens when the
> > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > +# in writeback on the host that cause VM guests to fail to recover.
> > +#
> > +. ./common/preamble
> > +_begin_fstest shutdown auto log metadata eio
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +	wait
> > +	if [ -n "$loopmnt" ]; then
> > +		umount $loopmnt 2>/dev/null
> > +		rm -r -f $loopmnt
> > +	fi
> > +	rm -f $tmp.*
> > +	_dmerror_unmount
> > +	_dmerror_cleanup
> > +}
> > +
> > +# Import common functions.
> > +. ./common/dmerror
> > +. ./common/reflink
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +
> > +_require_scratch_reflink
> > +_require_cp_reflink
> > +_require_dm_target error
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +echo "Silence is golden."
> > +
> > +_scratch_mkfs >> $seqres.full 2>&1
> > +_require_metadata_journaling $SCRATCH_DEV
> > +_dmerror_init
> > +_dmerror_mount
> > +
> > +# Create a fs image consuming 1/3 of the scratch fs
> > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > +
> > +loopimg=$SCRATCH_MNT/testfs
> > +truncate -s $loopimg_bytes $loopimg
> > +_mkfs_dev $loopimg
> 
> > I must say this is a nice test, like generic/475; I'd like to have it ASAP :)
> Just one question: if the FSTYP is nfs, cifs or virtiofs and so on ... [see below]
> 
> > +
> > +loopmnt=$tmp.mount
> > +mkdir -p $loopmnt
> > +
> > +scratch_aliveflag=$tmp.runsnap
> > +snap_aliveflag=$tmp.snapping
> > +
> > +snap_loop_fs() {
> > +	touch "$snap_aliveflag"
> > +	while [ -e "$scratch_aliveflag" ]; do
> > +		rm -f $loopimg.a
> > +		_cp_reflink $loopimg $loopimg.a
> > +		sleep 1
> > +	done
> > +	rm -f "$snap_aliveflag"
> > +}
> > +
> > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > +
> > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > +	touch $scratch_aliveflag
> > +	snap_loop_fs >> $seqres.full 2>&1 &
> > +
> > +	if ! _mount $loopimg $loopmnt -o loop; then
> 
> > ... This test will fail directly here

It won't, because this test doesn't run if SCRATCH_DEV isn't a block
device.  _require_dm_target calls _require_block_device, which should
prevent that, right?
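
From memory, the relevant chain in common/rc is roughly this
(paraphrased, not the literal source):

	_require_dm_target()
	{
		local target=$1

		# trips _notrun on nfs/cifs/virtiofs scratch devices
		_require_block_device $SCRATCH_DEV
		_require_sane_bdev_flush $SCRATCH_DEV
		_require_command "$DMSETUP_PROG" dmsetup
		...
	}

	_require_block_device()
	{
		[ -b "$1" ] || _notrun "require $1 to be valid block disk"
	}

so a non-blockdev scratch device gets a _notrun before the mkfs ever
runs.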

--D

> 
> Thanks,
> Zorro
> 
> > +		rm -f $scratch_aliveflag
> > +		_fail "loop mount failed"
> > +		break
> > +	fi
> > +
> > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > +
> > +	# purposely include 0 second sleeps to test shutdown immediately after
> > +	# recovery
> > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > +	rm -f $scratch_aliveflag
> > +
> > +	# This test aims to simulate sudden disk failure, which means that we
> > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > +	# --nolockfs parameter; to make this happen we must call the load
> > +	# error table helper *without* 'lockfs'.
> > +	_dmerror_load_error_table
> > +
> > +	ps -e | grep fsstress > /dev/null 2>&1
> > +	while [ $? -eq 0 ]; do
> > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +		wait > /dev/null 2>&1
> > +		ps -e | grep fsstress > /dev/null 2>&1
> > +	done
> > +	for ((i = 0; i < 10; i++)); do
> > +		test -e "$snap_aliveflag" || break
> > +		sleep 1
> > +	done
> > +
> > +	# Mount again to replay log after loading working table, so we have a
> > +	# consistent XFS after test.
> > +	$UMOUNT_PROG $loopmnt
> > +	_dmerror_unmount || _fail "unmount failed"
> > +	_dmerror_load_working_table
> > +	if ! _dmerror_mount; then
> > +		dmsetup table | tee -a /dev/ttyprintk
> > +		lsblk | tee -a /dev/ttyprintk
> > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> > +		_fail "mount failed"
> > +	fi
> > +done
> > +
> > +# Make sure the fs image file is ok
> > +if [ -f "$loopimg" ]; then
> > +	if _mount $loopimg $loopmnt -o loop; then
> > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > +	else
> > +		echo "final loop mount failed"
> > +	fi
> > +	_check_xfs_filesystem $loopimg none none
> > +fi
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/725.out b/tests/generic/725.out
> > new file mode 100644
> > index 00000000..ed73a9fc
> > --- /dev/null
> > +++ b/tests/generic/725.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 725
> > +Silence is golden.
> > 
> 


* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-12 17:07     ` Darrick J. Wong
@ 2021-08-13 14:52       ` Zorro Lang
  0 siblings, 0 replies; 16+ messages in thread
From: Zorro Lang @ 2021-08-13 14:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

On Thu, Aug 12, 2021 at 10:07:46AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 12, 2021 at 01:44:21PM +0800, Zorro Lang wrote:
> > On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > This is generic/475, but with fsstress running on a disk image
> > > inside the scratch filesystem.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/725.out |    2 +
> > >  2 files changed, 138 insertions(+)
> > >  create mode 100755 tests/generic/725
> > >  create mode 100644 tests/generic/725.out
> > > 
> > > 
> > > diff --git a/tests/generic/725 b/tests/generic/725
> > > new file mode 100755
> > > index 00000000..f43bcb37
> > > --- /dev/null
> > > +++ b/tests/generic/725
> > > @@ -0,0 +1,136 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 725
> > > +#
> > > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > > +# out the underlying scratch device with dm-error to see what happens when the
> > > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > > +# in writeback on the host that cause VM guests to fail to recover.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest shutdown auto log metadata eio
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +	wait
> > > +	if [ -n "$loopmnt" ]; then
> > > +		umount $loopmnt 2>/dev/null
> > > +		rm -r -f $loopmnt
> > > +	fi
> > > +	rm -f $tmp.*
> > > +	_dmerror_unmount
> > > +	_dmerror_cleanup
> > > +}
> > > +
> > > +# Import common functions.
> > > +. ./common/dmerror
> > > +. ./common/reflink
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +
> > > +_require_scratch_reflink
> > > +_require_cp_reflink
> > > +_require_dm_target error
> > > +_require_command "$KILLALL_PROG" "killall"
> > > +
> > > +echo "Silence is golden."
> > > +
> > > +_scratch_mkfs >> $seqres.full 2>&1
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_dmerror_init
> > > +_dmerror_mount
> > > +
> > > +# Create a fs image consuming 1/3 of the scratch fs
> > > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> > > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > > +
> > > +loopimg=$SCRATCH_MNT/testfs
> > > +truncate -s $loopimg_bytes $loopimg
> > > +_mkfs_dev $loopimg
> > 
> > I must say this is a nice test, like generic/475; I'd like to have it ASAP :)
> > Just one question: if the FSTYP is nfs, cifs or virtiofs and so on ... [see below]
> > 
> > > +
> > > +loopmnt=$tmp.mount
> > > +mkdir -p $loopmnt
> > > +
> > > +scratch_aliveflag=$tmp.runsnap
> > > +snap_aliveflag=$tmp.snapping
> > > +
> > > +snap_loop_fs() {
> > > +	touch "$snap_aliveflag"
> > > +	while [ -e "$scratch_aliveflag" ]; do
> > > +		rm -f $loopimg.a
> > > +		_cp_reflink $loopimg $loopimg.a
> > > +		sleep 1
> > > +	done
> > > +	rm -f "$snap_aliveflag"
> > > +}
> > > +
> > > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > > +
> > > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > > +	touch $scratch_aliveflag
> > > +	snap_loop_fs >> $seqres.full 2>&1 &
> > > +
> > > +	if ! _mount $loopimg $loopmnt -o loop; then
> > 
> > ... This test will fail directly here
> 
> It won't, because this test doesn't run if SCRATCH_DEV isn't a block
> device.  _require_dm_target calls _require_block_device, which should
> prevent that, right?

Oh, you're right[1], I forgot that. If so, this case is good to me.
Hope it gets merged soon :)
Reviewed-by: Zorro Lang <zlang@redhat.com>

Thanks,
Zorro

[1]
# ./check generic/725
FSTYP         -- nfs
PLATFORM      -- Linux/x86_64 xx-xxx-xx 4.18.0-xxx.el8.x86_64+debug #1 SMP Wed Jul 14 12:35:49 EDT 2021
MKFS_OPTIONS  -- xxx-xxx-xxx-xxxxxxx:/mnt/scratch/nfs-server
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 xx-xxxx-xxxx.xxxxx.xx:/mnt/scratch/nfs-server /mnt/nfs-scratch

generic/725     [not run] require xx-xxxx-xxxx.xxxxx.xx:/mnt/scratch/nfs-server to be valid block disk
Ran: generic/725
Not run: generic/725
Passed all 1 tests

# ./check generic/725
FSTYP         -- glusterfs
PLATFORM      -- Linux/x86_64 xx-xxx-xx 4.18.0-xxx.el8.x86_64+debug #1 SMP Wed Jul 14 12:35:49 EDT 2021
MKFS_OPTIONS  -- xxx-xxx-xxx-xxxxxxx:/SCRATCH_VOL
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 xx-xxxx-xxxx.xxxxx.xx:/SCRATCH_VOL /mnt/gluster-scratch

generic/725     [not run] Reflink not supported by scratch filesystem type: glusterfs
Ran: generic/725
Not run: generic/725
Passed all 1 tests

> 
> --D
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +		rm -f $scratch_aliveflag
> > > +		_fail "loop mount failed"
> > > +		break
> > > +	fi
> > > +
> > > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > > +
> > > +	# purposely include 0 second sleeps to test shutdown immediately after
> > > +	# recovery
> > > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > > +	rm -f $scratch_aliveflag
> > > +
> > > +	# This test aims to simulate sudden disk failure, which means that we
> > > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > > +	# --nolockfs parameter; to make this happen we must call the load
> > > +	# error table helper *without* 'lockfs'.
> > > +	_dmerror_load_error_table
> > > +
> > > +	ps -e | grep fsstress > /dev/null 2>&1
> > > +	while [ $? -eq 0 ]; do
> > > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +		wait > /dev/null 2>&1
> > > +		ps -e | grep fsstress > /dev/null 2>&1
> > > +	done
> > > +	for ((i = 0; i < 10; i++)); do
> > > +		test -e "$snap_aliveflag" || break
> > > +		sleep 1
> > > +	done
> > > +
> > > +	# Mount again to replay log after loading working table, so we have a
> > > +	# consistent XFS after test.
> > > +	$UMOUNT_PROG $loopmnt
> > > +	_dmerror_unmount || _fail "unmount failed"
> > > +	_dmerror_load_working_table
> > > +	if ! _dmerror_mount; then
> > > +		dmsetup table | tee -a /dev/ttyprintk
> > > +		lsblk | tee -a /dev/ttyprintk
> > > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> > > +		_fail "mount failed"
> > > +	fi
> > > +done
> > > +
> > > +# Make sure the fs image file is ok
> > > +if [ -f "$loopimg" ]; then
> > > +	if _mount $loopimg $loopmnt -o loop; then
> > > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > > +	else
> > > +		echo "final loop mount failed"
> > > +	fi
> > > +	_check_xfs_filesystem $loopimg none none
> > > +fi
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/725.out b/tests/generic/725.out
> > > new file mode 100644
> > > index 00000000..ed73a9fc
> > > --- /dev/null
> > > +++ b/tests/generic/725.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 725
> > > +Silence is golden.
> > > 
> > 
> 



* Re: [PATCH 1/3] generic: test xattr operations only
  2021-08-12 17:04     ` Darrick J. Wong
@ 2021-08-15 15:46       ` Eryu Guan
  0 siblings, 0 replies; 16+ messages in thread
From: Eryu Guan @ 2021-08-15 15:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, zlang

On Thu, Aug 12, 2021 at 10:04:53AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 12, 2021 at 01:34:52PM +0800, Zorro Lang wrote:
> > On Tue, Jul 27, 2021 at 05:10:24PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Exercise extended attribute operations.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/generic/724     |   57 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/724.out |    2 ++
> > >  2 files changed, 59 insertions(+)
> > >  create mode 100755 tests/generic/724
> > >  create mode 100644 tests/generic/724.out
> > > 
> > > 
> > > diff --git a/tests/generic/724 b/tests/generic/724
> > > new file mode 100755
> > > index 00000000..b19f8f73
> > > --- /dev/null
> > > +++ b/tests/generic/724
> > > @@ -0,0 +1,57 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 724
> > > +#
> > > +# Run an extended attributes fsstress run with multiple threads to shake out
> > > +# bugs in the xattr code.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest soak attr long_rw stress
> > 
> > Should we add this test to the 'auto' group too?
> 
> Yes, fixed.

I can fix that on commit.

> 
> > > +
> > > +_cleanup()
> > > +{
> > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > 
> > Can a "wait" command help more here?

There's no background process in this test, so it seems 'wait' won't
do anything.

Thanks,
Eryu

> 
> Ok, I"ll add that.
> 
> --D
> 
> > Others looks good to me.
> > 
> > Thanks,
> > Zorro
> > 
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +
> > > +_require_scratch
> > > +_require_command "$KILLALL_PROG" "killall"
> > > +
> > > +echo "Silence is golden."
> > > +
> > > +_scratch_mkfs > $seqres.full 2>&1
> > > +_scratch_mount >> $seqres.full 2>&1
> > > +
> > > +nr_cpus=$((LOAD_FACTOR * 4))
> > > +nr_ops=$((70000 * nr_cpus * TIME_FACTOR))
> > > +
> > > +args=('-z' '-S' 'c')
> > > +
> > > +# Do some directory tree modifications, but the bulk of this is geared towards
> > > +# exercising the xattr code, especially attr_set which can do up to 10k values.
> > > +for verb in unlink rmdir; do
> > > +	args+=('-f' "${verb}=1")
> > > +done
> > > +for verb in creat mkdir; do
> > > +	args+=('-f' "${verb}=2")
> > > +done
> > > +for verb in getfattr listfattr; do
> > > +	args+=('-f' "${verb}=3")
> > > +done
> > > +for verb in attr_remove removefattr; do
> > > +	args+=('-f' "${verb}=4")
> > > +done
> > > +args+=('-f' "setfattr=20")
> > > +args+=('-f' "attr_set=60")	# sets larger xattrs
> > > +
> > > +$FSSTRESS_PROG "${args[@]}" $FSSTRESS_AVOID -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus >> $seqres.full
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/724.out b/tests/generic/724.out
> > > new file mode 100644
> > > index 00000000..164cfffb
> > > --- /dev/null
> > > +++ b/tests/generic/724.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 724
> > > +Silence is golden.
> > > 
> > 


* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-07-28  0:10 ` [PATCH 2/3] generic: test shutdowns of a nested filesystem Darrick J. Wong
  2021-08-12  5:44   ` Zorro Lang
@ 2021-08-15 16:28   ` Eryu Guan
  2021-08-16 16:35     ` Darrick J. Wong
  1 sibling, 1 reply; 16+ messages in thread
From: Eryu Guan @ 2021-08-15 16:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests

On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This is generic/475, but with fsstress running on a disk image
> inside the scratch filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/725.out |    2 +
>  2 files changed, 138 insertions(+)
>  create mode 100755 tests/generic/725
>  create mode 100644 tests/generic/725.out
> 
> 
> diff --git a/tests/generic/725 b/tests/generic/725
> new file mode 100755
> index 00000000..f43bcb37
> --- /dev/null
> +++ b/tests/generic/725
> @@ -0,0 +1,136 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> +#
> +# FS QA Test No. 725
> +#
> +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> +# out the underlying scratch device with dm-error to see what happens when the
> +# disk goes down.  Having taken down both fses in this manner, remount them and
> +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> +# in writeback on the host that cause VM guests to fail to recover.

It currently fails for me on btrfs: the loop mount fails in the 2nd
iteration, which seems like a bug in btrfs.

> +#
> +. ./common/preamble
> +_begin_fstest shutdown auto log metadata eio
> +
> +_cleanup()
> +{
> +	cd /
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +	wait
> +	if [ -n "$loopmnt" ]; then
> +		umount $loopmnt 2>/dev/null

$UMOUNT_PROG

> +		rm -r -f $loopmnt
> +	fi
> +	rm -f $tmp.*
> +	_dmerror_unmount
> +	_dmerror_cleanup
> +}
> +
> +# Import common functions.
> +. ./common/dmerror
> +. ./common/reflink
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch_reflink
> +_require_cp_reflink
> +_require_dm_target error
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_require_metadata_journaling $SCRATCH_DEV
> +_dmerror_init
> +_dmerror_mount
> +
> +# Create a fs image consuming 1/3 of the scratch fs
> +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)

_get_available_space $SCRATCH_MNT ?
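
E.g., assuming _get_available_space reports the available bytes on the
mounted fs:

	scratch_freesp_bytes=$(_get_available_space $SCRATCH_MNT)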

> +loopimg_bytes=$((scratch_freesp_bytes / 3))
> +
> +loopimg=$SCRATCH_MNT/testfs
> +truncate -s $loopimg_bytes $loopimg
> +_mkfs_dev $loopimg
> +
> +loopmnt=$tmp.mount
> +mkdir -p $loopmnt
> +
> +scratch_aliveflag=$tmp.runsnap
> +snap_aliveflag=$tmp.snapping
> +
> +snap_loop_fs() {
> +	touch "$snap_aliveflag"
> +	while [ -e "$scratch_aliveflag" ]; do
> +		rm -f $loopimg.a
> +		_cp_reflink $loopimg $loopimg.a
> +		sleep 1
> +	done
> +	rm -f "$snap_aliveflag"
> +}
> +
> +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> +
> +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> +	touch $scratch_aliveflag
> +	snap_loop_fs >> $seqres.full 2>&1 &
> +
> +	if ! _mount $loopimg $loopmnt -o loop; then
> +		rm -f $scratch_aliveflag
> +		_fail "loop mount failed"

I found it a bit easier to debug if you print $i here.

> +		break
> +	fi
> +
> +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> +
> +	# purposely include 0 second sleeps to test shutdown immediately after
> +	# recovery
> +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> +	rm -f $scratch_aliveflag
> +
> +	# This test aims to simulate sudden disk failure, which means that we
> +	# do not want to quiesce the filesystem or otherwise give it a chance
> +	# to flush its logs.  Therefore we want to call dmsetup with the
> +	# --nolockfs parameter; to make this happen we must call the load
> +	# error table helper *without* 'lockfs'.
> +	_dmerror_load_error_table
> +
> +	ps -e | grep fsstress > /dev/null 2>&1
> +	while [ $? -eq 0 ]; do
> +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +		wait > /dev/null 2>&1
> +		ps -e | grep fsstress > /dev/null 2>&1
> +	done
> +	for ((i = 0; i < 10; i++)); do
> +		test -e "$snap_aliveflag" || break
> +		sleep 1
> +	done
> +
> +	# Mount again to replay log after loading working table, so we have a
> +	# consistent XFS after test.

This is a generic test; fix the XFS-specific comments?

> +	$UMOUNT_PROG $loopmnt
> +	_dmerror_unmount || _fail "unmount failed"
> +	_dmerror_load_working_table
> +	if ! _dmerror_mount; then
> +		dmsetup table | tee -a /dev/ttyprintk
> +		lsblk | tee -a /dev/ttyprintk
> +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md

The above logs should all go to $seqres.full?
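
E.g.:

	dmsetup table >> $seqres.full
	lsblk >> $seqres.full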

And $XFS_METADUMP_PROG is not suitable for a generic test.

> +		_fail "mount failed"
> +	fi
> +done
> +
> +# Make sure the fs image file is ok
> +if [ -f "$loopimg" ]; then
> +	if _mount $loopimg $loopmnt -o loop; then
> +		$UMOUNT_PROG $loopmnt &> /dev/null
> +	else
> +		echo "final loop mount failed"
> +	fi
> +	_check_xfs_filesystem $loopimg none none

Same here; use _check_scratch_fs?

Thanks,
Eryu

> +fi
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/725.out b/tests/generic/725.out
> new file mode 100644
> index 00000000..ed73a9fc
> --- /dev/null
> +++ b/tests/generic/725.out
> @@ -0,0 +1,2 @@
> +QA output created by 725
> +Silence is golden.

* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-15 16:28   ` Eryu Guan
@ 2021-08-16 16:35     ` Darrick J. Wong
  2021-08-17  3:16       ` Eryu Guan
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-08-16 16:35 UTC (permalink / raw)
  To: Eryu Guan; +Cc: guaneryu, linux-xfs, fstests

On Mon, Aug 16, 2021 at 12:28:20AM +0800, Eryu Guan wrote:
> On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > generic/475, but we're running fsstress on a disk image inside the
> > scratch filesystem
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/725.out |    2 +
> >  2 files changed, 138 insertions(+)
> >  create mode 100755 tests/generic/725
> >  create mode 100644 tests/generic/725.out
> > 
> > 
> > diff --git a/tests/generic/725 b/tests/generic/725
> > new file mode 100755
> > index 00000000..f43bcb37
> > --- /dev/null
> > +++ b/tests/generic/725
> > @@ -0,0 +1,136 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test No. 725
> > +#
> > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > +# out the underlying scratch device with dm-error to see what happens when the
> > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > +# in writeback on the host that cause VM guests to fail to recover.
> 
> It currently fails for me on btrfs, the loop mount failed in 2nd
> iteration, seems like a bug in btrfs.

Yep.  Until recently (aka the Big Xfs Log Recovery Bughunt of 2021) it
wouldn't pass on xfs either. :/

> > +#
> > +. ./common/preamble
> > +_begin_fstest shutdown auto log metadata eio
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +	wait
> > +	if [ -n "$loopmnt" ]; then
> > +		umount $loopmnt 2>/dev/null
> 
> $UMOUNT_PROG
> 
> > +		rm -r -f $loopmnt
> > +	fi
> > +	rm -f $tmp.*
> > +	_dmerror_unmount
> > +	_dmerror_cleanup
> > +}
> > +
> > +# Import common functions.
> > +. ./common/dmerror
> > +. ./common/reflink
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +
> > +_require_scratch_reflink
> > +_require_cp_reflink
> > +_require_dm_target error
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +echo "Silence is golden."
> > +
> > +_scratch_mkfs >> $seqres.full 2>&1
> > +_require_metadata_journaling $SCRATCH_DEV
> > +_dmerror_init
> > +_dmerror_mount
> > +
> > +# Create a fs image consuming 1/3 of the scratch fs
> > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> 
> _get_available_space $SCRATCH_MNT ?
> 
> > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > +
> > +loopimg=$SCRATCH_MNT/testfs
> > +truncate -s $loopimg_bytes $loopimg
> > +_mkfs_dev $loopimg
> > +
> > +loopmnt=$tmp.mount
> > +mkdir -p $loopmnt
> > +
> > +scratch_aliveflag=$tmp.runsnap
> > +snap_aliveflag=$tmp.snapping
> > +
> > +snap_loop_fs() {
> > +	touch "$snap_aliveflag"
> > +	while [ -e "$scratch_aliveflag" ]; do
> > +		rm -f $loopimg.a
> > +		_cp_reflink $loopimg $loopimg.a
> > +		sleep 1
> > +	done
> > +	rm -f "$snap_aliveflag"
> > +}
> > +
> > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > +
> > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > +	touch $scratch_aliveflag
> > +	snap_loop_fs >> $seqres.full 2>&1 &
> > +
> > +	if ! _mount $loopimg $loopmnt -o loop; then
> > +		rm -f $scratch_aliveflag
> > +		_fail "loop mount failed"
> 
> I found it a bit easier to debug if print $i here.

Ok, I'll change it to "loop $i mount failed".

> > +		break
> > +	fi
> > +
> > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > +
> > +	# purposely include 0 second sleeps to test shutdown immediately after
> > +	# recovery
> > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > +	rm -f $scratch_aliveflag
> > +
> > +	# This test aims to simulate sudden disk failure, which means that we
> > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > +	# --nolockfs parameter; to make this happen we must call the load
> > +	# error table helper *without* 'lockfs'.
> > +	_dmerror_load_error_table
> > +
> > +	ps -e | grep fsstress > /dev/null 2>&1
> > +	while [ $? -eq 0 ]; do
> > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +		wait > /dev/null 2>&1
> > +		ps -e | grep fsstress > /dev/null 2>&1
> > +	done
> > +	for ((i = 0; i < 10; i++)); do
> > +		test -e "$snap_aliveflag" || break
> > +		sleep 1
> > +	done
> > +
> > +	# Mount again to replay log after loading working table, so we have a
> > +	# consistent XFS after test.
> 
> This is a generic test, fix the XFS specific comments?

Oops.  "...a consistent fs after test."

> > +	$UMOUNT_PROG $loopmnt
> > +	_dmerror_unmount || _fail "unmount failed"
> > +	_dmerror_load_working_table
> > +	if ! _dmerror_mount; then
> > +		dmsetup table | tee -a /dev/ttyprintk
> > +		lsblk | tee -a /dev/ttyprintk
> > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> 
> Above logs all should go to $seqres.full ?

Oops, yeah.  I'll remove them since I was only using them to check the
system state.

> And $XFS_METADUMP_PROG is not suitable for a generic test.

I'll create _metadump_dev so that this at least works for the two
filesystems for which we have dump creation helpers (ext* and xfs).
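
Roughly this shape, I think (a sketch only; the helper name is
provisional, and the ext* branch assumes an e2image-based dumper such
as E2IMAGE_PROG):

	# Capture a metadata dump of a (possibly dead) device for
	# later analysis.
	_metadump_dev() {
		local device="$1"
		local dumpfile="$2"

		case "$FSTYP" in
		ext2|ext3|ext4)
			$E2IMAGE_PROG -Q "$device" "$dumpfile"
			;;
		xfs)
			$XFS_METADUMP_PROG -a -g -o "$device" "$dumpfile"
			;;
		esac
	}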

> > +		_fail "mount failed"
> > +	fi
> > +done
> > +
> > +# Make sure the fs image file is ok
> > +if [ -f "$loopimg" ]; then
> > +	if _mount $loopimg $loopmnt -o loop; then
> > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > +	else
> > +		echo "final loop mount failed"
> > +	fi
> > +	_check_xfs_filesystem $loopimg none none
> 
> Same here, use _check_scratch_fs?

$loopimg is a file within the scratch fs.

--D

> Thanks,
> Eryu
> 
> > +fi
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/725.out b/tests/generic/725.out
> > new file mode 100644
> > index 00000000..ed73a9fc
> > --- /dev/null
> > +++ b/tests/generic/725.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 725
> > +Silence is golden.

* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-16 16:35     ` Darrick J. Wong
@ 2021-08-17  3:16       ` Eryu Guan
  2021-08-17  4:16         ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Eryu Guan @ 2021-08-17  3:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Eryu Guan, guaneryu, linux-xfs, fstests

On Mon, Aug 16, 2021 at 09:35:24AM -0700, Darrick J. Wong wrote:
> On Mon, Aug 16, 2021 at 12:28:20AM +0800, Eryu Guan wrote:
> > On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > generic/475, but we're running fsstress on a disk image inside the
> > > scratch filesystem
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/725.out |    2 +
> > >  2 files changed, 138 insertions(+)
> > >  create mode 100755 tests/generic/725
> > >  create mode 100644 tests/generic/725.out
> > > 
> > > 
> > > diff --git a/tests/generic/725 b/tests/generic/725
> > > new file mode 100755
> > > index 00000000..f43bcb37
> > > --- /dev/null
> > > +++ b/tests/generic/725
> > > @@ -0,0 +1,136 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 725
> > > +#
> > > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > > +# out the underlying scratch device with dm-error to see what happens when the
> > > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > > +# in writeback on the host that cause VM guests to fail to recover.
> > 
> > It currently fails for me on btrfs, the loop mount failed in 2nd
> > iteration, seems like a bug in btrfs.
> 
> Yep.  Until recently (aka the Big Xfs Log Recovery Bughunt of 2021) it
> wouldn't pass xfs either. :/
> 
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest shutdown auto log metadata eio
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +	wait
> > > +	if [ -n "$loopmnt" ]; then
> > > +		umount $loopmnt 2>/dev/null
> > 
> > $UMOUNT_PROG
> > 
> > > +		rm -r -f $loopmnt
> > > +	fi
> > > +	rm -f $tmp.*
> > > +	_dmerror_unmount
> > > +	_dmerror_cleanup
> > > +}
> > > +
> > > +# Import common functions.
> > > +. ./common/dmerror
> > > +. ./common/reflink
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +
> > > +_require_scratch_reflink
> > > +_require_cp_reflink
> > > +_require_dm_target error
> > > +_require_command "$KILLALL_PROG" "killall"
> > > +
> > > +echo "Silence is golden."
> > > +
> > > +_scratch_mkfs >> $seqres.full 2>&1
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_dmerror_init
> > > +_dmerror_mount
> > > +
> > > +# Create a fs image consuming 1/3 of the scratch fs
> > > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> > 
> > _get_available_space $SCRATCH_MNT ?
> > 
> > > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > > +
> > > +loopimg=$SCRATCH_MNT/testfs
> > > +truncate -s $loopimg_bytes $loopimg
> > > +_mkfs_dev $loopimg
> > > +
> > > +loopmnt=$tmp.mount
> > > +mkdir -p $loopmnt
> > > +
> > > +scratch_aliveflag=$tmp.runsnap
> > > +snap_aliveflag=$tmp.snapping
> > > +
> > > +snap_loop_fs() {
> > > +	touch "$snap_aliveflag"
> > > +	while [ -e "$scratch_aliveflag" ]; do
> > > +		rm -f $loopimg.a
> > > +		_cp_reflink $loopimg $loopimg.a
> > > +		sleep 1
> > > +	done
> > > +	rm -f "$snap_aliveflag"
> > > +}
> > > +
> > > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > > +
> > > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > > +	touch $scratch_aliveflag
> > > +	snap_loop_fs >> $seqres.full 2>&1 &
> > > +
> > > +	if ! _mount $loopimg $loopmnt -o loop; then
> > > +		rm -f $scratch_aliveflag
> > > +		_fail "loop mount failed"
> > 
> > I found it a bit easier to debug if print $i here.
> 
> Ok, I'll change it to "loop $i mount failed".
> 
> > > +		break
> > > +	fi
> > > +
> > > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > > +
> > > +	# purposely include 0 second sleeps to test shutdown immediately after
> > > +	# recovery
> > > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > > +	rm -f $scratch_aliveflag
> > > +
> > > +	# This test aims to simulate sudden disk failure, which means that we
> > > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > > +	# --nolockfs parameter; to make this happen we must call the load
> > > +	# error table helper *without* 'lockfs'.
> > > +	_dmerror_load_error_table
> > > +
> > > +	ps -e | grep fsstress > /dev/null 2>&1
> > > +	while [ $? -eq 0 ]; do
> > > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > +		wait > /dev/null 2>&1
> > > +		ps -e | grep fsstress > /dev/null 2>&1
> > > +	done
> > > +	for ((i = 0; i < 10; i++)); do
> > > +		test -e "$snap_aliveflag" || break
> > > +		sleep 1
> > > +	done
> > > +
> > > +	# Mount again to replay log after loading working table, so we have a
> > > +	# consistent XFS after test.
> > 
> > This is a generic test, fix the XFS specific comments?
> 
> Oops.  "...a consistent fs after test."
> 
> > > +	$UMOUNT_PROG $loopmnt
> > > +	_dmerror_unmount || _fail "unmount failed"
> > > +	_dmerror_load_working_table
> > > +	if ! _dmerror_mount; then
> > > +		dmsetup table | tee -a /dev/ttyprintk
> > > +		lsblk | tee -a /dev/ttyprintk
> > > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> > 
> > Above logs all should go to $seqres.full ?
> 
> Oops, yeah.  I'll remove them since I was only using them to check the
> system state.
> 
> > And $XFS_METADUMP_PROG is not suitable for a generic test.
> 
> I'll create _metadump_dev so that this at least works for the two
> filesystems for which we have dump creation helpers (ext* and xfs).

Sounds great!

> 
> > > +		_fail "mount failed"
> > > +	fi
> > > +done
> > > +
> > > +# Make sure the fs image file is ok
> > > +if [ -f "$loopimg" ]; then
> > > +	if _mount $loopimg $loopmnt -o loop; then
> > > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > > +	else
> > > +		echo "final loop mount failed"
> > > +	fi
> > > +	_check_xfs_filesystem $loopimg none none
> > 
> > Same here, use _check_scratch_fs?
> 
> $loopimg is a file within the scratch fs.

_check_scratch_fs can take a device as its argument (it defaults to
$SCRATCH_DEV); I think that works in this case?
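
I.e. just:

	_check_scratch_fs $loopimg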

Thanks,
Eryu

* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-17  3:16       ` Eryu Guan
@ 2021-08-17  4:16         ` Darrick J. Wong
  2021-08-17 15:54           ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2021-08-17  4:16 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Eryu Guan, guaneryu, linux-xfs, fstests

On Tue, Aug 17, 2021 at 11:16:49AM +0800, Eryu Guan wrote:
> On Mon, Aug 16, 2021 at 09:35:24AM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 16, 2021 at 12:28:20AM +0800, Eryu Guan wrote:
> > > On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > generic/475, but we're running fsstress on a disk image inside the
> > > > scratch filesystem
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > > >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  tests/generic/725.out |    2 +
> > > >  2 files changed, 138 insertions(+)
> > > >  create mode 100755 tests/generic/725
> > > >  create mode 100644 tests/generic/725.out
> > > > 
> > > > 
> > > > diff --git a/tests/generic/725 b/tests/generic/725
> > > > new file mode 100755
> > > > index 00000000..f43bcb37
> > > > --- /dev/null
> > > > +++ b/tests/generic/725
> > > > @@ -0,0 +1,136 @@
> > > > +#! /bin/bash
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > > +#
> > > > +# FS QA Test No. 725
> > > > +#
> > > > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > > > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > > > +# out the underlying scratch device with dm-error to see what happens when the
> > > > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > > > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > > > +# in writeback on the host that cause VM guests to fail to recover.
> > > 
> > > It currently fails for me on btrfs, the loop mount failed in 2nd
> > > iteration, seems like a bug in btrfs.
> > 
> > Yep.  Until recently (aka the Big Xfs Log Recovery Bughunt of 2021) it
> > wouldn't pass xfs either. :/
> > 
> > > > +#
> > > > +. ./common/preamble
> > > > +_begin_fstest shutdown auto log metadata eio
> > > > +
> > > > +_cleanup()
> > > > +{
> > > > +	cd /
> > > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > > +	wait
> > > > +	if [ -n "$loopmnt" ]; then
> > > > +		umount $loopmnt 2>/dev/null
> > > 
> > > $UMOUNT_PROG
> > > 
> > > > +		rm -r -f $loopmnt
> > > > +	fi
> > > > +	rm -f $tmp.*
> > > > +	_dmerror_unmount
> > > > +	_dmerror_cleanup
> > > > +}
> > > > +
> > > > +# Import common functions.
> > > > +. ./common/dmerror
> > > > +. ./common/reflink
> > > > +
> > > > +# Modify as appropriate.
> > > > +_supported_fs generic
> > > > +
> > > > +_require_scratch_reflink
> > > > +_require_cp_reflink
> > > > +_require_dm_target error
> > > > +_require_command "$KILLALL_PROG" "killall"
> > > > +
> > > > +echo "Silence is golden."
> > > > +
> > > > +_scratch_mkfs >> $seqres.full 2>&1
> > > > +_require_metadata_journaling $SCRATCH_DEV
> > > > +_dmerror_init
> > > > +_dmerror_mount
> > > > +
> > > > +# Create a fs image consuming 1/3 of the scratch fs
> > > > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> > > 
> > > _get_available_space $SCRATCH_MNT ?
> > > 
> > > > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > > > +
> > > > +loopimg=$SCRATCH_MNT/testfs
> > > > +truncate -s $loopimg_bytes $loopimg
> > > > +_mkfs_dev $loopimg
> > > > +
> > > > +loopmnt=$tmp.mount
> > > > +mkdir -p $loopmnt
> > > > +
> > > > +scratch_aliveflag=$tmp.runsnap
> > > > +snap_aliveflag=$tmp.snapping
> > > > +
> > > > +snap_loop_fs() {
> > > > +	touch "$snap_aliveflag"
> > > > +	while [ -e "$scratch_aliveflag" ]; do
> > > > +		rm -f $loopimg.a
> > > > +		_cp_reflink $loopimg $loopimg.a
> > > > +		sleep 1
> > > > +	done
> > > > +	rm -f "$snap_aliveflag"
> > > > +}
> > > > +
> > > > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > > > +
> > > > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > > > +	touch $scratch_aliveflag
> > > > +	snap_loop_fs >> $seqres.full 2>&1 &
> > > > +
> > > > +	if ! _mount $loopimg $loopmnt -o loop; then
> > > > +		rm -f $scratch_aliveflag
> > > > +		_fail "loop mount failed"
> > > 
> > > I found it a bit easier to debug if print $i here.
> > 
> > Ok, I'll change it to "loop $i mount failed".
> > 
> > > > +		break
> > > > +	fi
> > > > +
> > > > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > > > +
> > > > +	# purposely include 0 second sleeps to test shutdown immediately after
> > > > +	# recovery
> > > > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > > > +	rm -f $scratch_aliveflag
> > > > +
> > > > +	# This test aims to simulate sudden disk failure, which means that we
> > > > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > > > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > > > +	# --nolockfs parameter; to make this happen we must call the load
> > > > +	# error table helper *without* 'lockfs'.
> > > > +	_dmerror_load_error_table
> > > > +
> > > > +	ps -e | grep fsstress > /dev/null 2>&1
> > > > +	while [ $? -eq 0 ]; do
> > > > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > > +		wait > /dev/null 2>&1
> > > > +		ps -e | grep fsstress > /dev/null 2>&1
> > > > +	done
> > > > +	for ((i = 0; i < 10; i++)); do
> > > > +		test -e "$snap_aliveflag" || break
> > > > +		sleep 1
> > > > +	done
> > > > +
> > > > +	# Mount again to replay log after loading working table, so we have a
> > > > +	# consistent XFS after test.
> > > 
> > > This is a generic test, fix the XFS specific comments?
> > 
> > Oops.  "...a consistent fs after test."
> > 
> > > > +	$UMOUNT_PROG $loopmnt
> > > > +	_dmerror_unmount || _fail "unmount failed"
> > > > +	_dmerror_load_working_table
> > > > +	if ! _dmerror_mount; then
> > > > +		dmsetup table | tee -a /dev/ttyprintk
> > > > +		lsblk | tee -a /dev/ttyprintk
> > > > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> > > 
> > > Above logs all should go to $seqres.full ?
> > 
> > Oops, yeah.  I'll remove them since I was only using them to check the
> > system state.
> > 
> > > And $XFS_METADUMP_PROG is not suitable for a generic test.
> > 
> > I'll create _metadump_dev so that this at least works for the two
> > filesystems for which we have dump creation helpers (ext* and xfs).
> 
> Sounds great!
> 
> > 
> > > > +		_fail "mount failed"
> > > > +	fi
> > > > +done
> > > > +
> > > > +# Make sure the fs image file is ok
> > > > +if [ -f "$loopimg" ]; then
> > > > +	if _mount $loopimg $loopmnt -o loop; then
> > > > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > > > +	else
> > > > +		echo "final loop mount failed"
> > > > +	fi
> > > > +	_check_xfs_filesystem $loopimg none none
> > > 
> > > Same here, use _check_scratch_fs?
> > 
> > $loopimg is a file within the scratch fs.
> 
> _check_scratch_fs can take dev as argument, and default to $SCRATCH_DEV,
> I think that works in this case?

It could be made to work with a large enough crowbar, but that's
seriously overkill because $loopimg is a file *within* the scratch
filesystem.  The $loopimg fs gets formatted without the
SCRATCH_LOGDEV/SCRATCH_RTDEV options (because it is not itself the
scratch filesystem), which means that in order to (ab)use
_check_scratch_fs to do the same thing as _check_xfs_filesystem, you
have to exclude those options.  So yes, this:

	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg

is the equivalent of this:

	_check_xfs_filesystem $loopimg none none

But the first is longer and pointless.

--D

> Thanks,
> Eryu

* Re: [PATCH 2/3] generic: test shutdowns of a nested filesystem
  2021-08-17  4:16         ` Darrick J. Wong
@ 2021-08-17 15:54           ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2021-08-17 15:54 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Eryu Guan, guaneryu, linux-xfs, fstests

On Mon, Aug 16, 2021 at 09:16:16PM -0700, Darrick J. Wong wrote:
> On Tue, Aug 17, 2021 at 11:16:49AM +0800, Eryu Guan wrote:
> > On Mon, Aug 16, 2021 at 09:35:24AM -0700, Darrick J. Wong wrote:
> > > On Mon, Aug 16, 2021 at 12:28:20AM +0800, Eryu Guan wrote:
> > > > On Tue, Jul 27, 2021 at 05:10:30PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > generic/475, but we're running fsstress on a disk image inside the
> > > > > scratch filesystem
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > > ---
> > > > >  tests/generic/725     |  136 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  tests/generic/725.out |    2 +
> > > > >  2 files changed, 138 insertions(+)
> > > > >  create mode 100755 tests/generic/725
> > > > >  create mode 100644 tests/generic/725.out
> > > > > 
> > > > > 
> > > > > diff --git a/tests/generic/725 b/tests/generic/725
> > > > > new file mode 100755
> > > > > index 00000000..f43bcb37
> > > > > --- /dev/null
> > > > > +++ b/tests/generic/725
> > > > > @@ -0,0 +1,136 @@
> > > > > +#! /bin/bash
> > > > > +# SPDX-License-Identifier: GPL-2.0
> > > > > +# Copyright (c) 2021 Oracle, Inc.  All Rights Reserved.
> > > > > +#
> > > > > +# FS QA Test No. 725
> > > > > +#
> > > > > +# Test nested log recovery with repeated (simulated) disk failures.  We kick
> > > > > +# off fsstress on a loopback filesystem mounted on the scratch fs, then switch
> > > > > +# out the underlying scratch device with dm-error to see what happens when the
> > > > > +# disk goes down.  Having taken down both fses in this manner, remount them and
> > > > > +# repeat.  This test simulates VM hosts crashing to try to shake out CoW bugs
> > > > > +# in writeback on the host that cause VM guests to fail to recover.
> > > > 
> > > > It currently fails for me on btrfs, the loop mount failed in 2nd
> > > > iteration, seems like a bug in btrfs.
> > > 
> > > Yep.  Until recently (aka the Big Xfs Log Recovery Bughunt of 2021) it
> > > wouldn't pass xfs either. :/
> > > 
> > > > > +#
> > > > > +. ./common/preamble
> > > > > +_begin_fstest shutdown auto log metadata eio
> > > > > +
> > > > > +_cleanup()
> > > > > +{
> > > > > +	cd /
> > > > > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > > > +	wait
> > > > > +	if [ -n "$loopmnt" ]; then
> > > > > +		umount $loopmnt 2>/dev/null
> > > > 
> > > > $UMOUNT_PROG
> > > > 
> > > > > +		rm -r -f $loopmnt
> > > > > +	fi
> > > > > +	rm -f $tmp.*
> > > > > +	_dmerror_unmount
> > > > > +	_dmerror_cleanup
> > > > > +}
> > > > > +
> > > > > +# Import common functions.
> > > > > +. ./common/dmerror
> > > > > +. ./common/reflink
> > > > > +
> > > > > +# Modify as appropriate.
> > > > > +_supported_fs generic
> > > > > +
> > > > > +_require_scratch_reflink
> > > > > +_require_cp_reflink
> > > > > +_require_dm_target error
> > > > > +_require_command "$KILLALL_PROG" "killall"
> > > > > +
> > > > > +echo "Silence is golden."
> > > > > +
> > > > > +_scratch_mkfs >> $seqres.full 2>&1
> > > > > +_require_metadata_journaling $SCRATCH_DEV
> > > > > +_dmerror_init
> > > > > +_dmerror_mount
> > > > > +
> > > > > +# Create a fs image consuming 1/3 of the scratch fs
> > > > > +scratch_freesp_bytes=$(stat -f -c '%a * %S' $SCRATCH_MNT | bc)
> > > > 
> > > > _get_available_space $SCRATCH_MNT ?
> > > > 
> > > > > +loopimg_bytes=$((scratch_freesp_bytes / 3))
> > > > > +
> > > > > +loopimg=$SCRATCH_MNT/testfs
> > > > > +truncate -s $loopimg_bytes $loopimg
> > > > > +_mkfs_dev $loopimg
> > > > > +
> > > > > +loopmnt=$tmp.mount
> > > > > +mkdir -p $loopmnt
> > > > > +
> > > > > +scratch_aliveflag=$tmp.runsnap
> > > > > +snap_aliveflag=$tmp.snapping
> > > > > +
> > > > > +snap_loop_fs() {
> > > > > +	touch "$snap_aliveflag"
> > > > > +	while [ -e "$scratch_aliveflag" ]; do
> > > > > +		rm -f $loopimg.a
> > > > > +		_cp_reflink $loopimg $loopimg.a
> > > > > +		sleep 1
> > > > > +	done
> > > > > +	rm -f "$snap_aliveflag"
> > > > > +}
> > > > > +
> > > > > +fsstress=($FSSTRESS_PROG $FSSTRESS_AVOID -d "$loopmnt" -n 999999 -p "$((LOAD_FACTOR * 4))")
> > > > > +
> > > > > +for i in $(seq 1 $((25 * TIME_FACTOR)) ); do
> > > > > +	touch $scratch_aliveflag
> > > > > +	snap_loop_fs >> $seqres.full 2>&1 &
> > > > > +
> > > > > +	if ! _mount $loopimg $loopmnt -o loop; then
> > > > > +		rm -f $scratch_aliveflag
> > > > > +		_fail "loop mount failed"
> > > > 
> > > > I found it a bit easier to debug if print $i here.
> > > 
> > > Ok, I'll change it to "loop $i mount failed".
> > > 
> > > > > +		break
> > > > > +	fi
> > > > > +
> > > > > +	("${fsstress[@]}" >> $seqres.full &) > /dev/null 2>&1
> > > > > +
> > > > > +	# purposely include 0 second sleeps to test shutdown immediately after
> > > > > +	# recovery
> > > > > +	sleep $((RANDOM % (3 * TIME_FACTOR) ))
> > > > > +	rm -f $scratch_aliveflag
> > > > > +
> > > > > +	# This test aims to simulate sudden disk failure, which means that we
> > > > > +	# do not want to quiesce the filesystem or otherwise give it a chance
> > > > > +	# to flush its logs.  Therefore we want to call dmsetup with the
> > > > > +	# --nolockfs parameter; to make this happen we must call the load
> > > > > +	# error table helper *without* 'lockfs'.
> > > > > +	_dmerror_load_error_table
> > > > > +
> > > > > +	ps -e | grep fsstress > /dev/null 2>&1
> > > > > +	while [ $? -eq 0 ]; do
> > > > > +		$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > > > > +		wait > /dev/null 2>&1
> > > > > +		ps -e | grep fsstress > /dev/null 2>&1
> > > > > +	done
> > > > > +	for ((i = 0; i < 10; i++)); do
> > > > > +		test -e "$snap_aliveflag" || break
> > > > > +		sleep 1
> > > > > +	done
> > > > > +
> > > > > +	# Mount again to replay log after loading working table, so we have a
> > > > > +	# consistent XFS after test.
> > > > 
> > > > This is a generic test, fix the XFS specific comments?
> > > 
> > > Oops.  "...a consistent fs after test."
> > > 
> > > > > +	$UMOUNT_PROG $loopmnt
> > > > > +	_dmerror_unmount || _fail "unmount failed"
> > > > > +	_dmerror_load_working_table
> > > > > +	if ! _dmerror_mount; then
> > > > > +		dmsetup table | tee -a /dev/ttyprintk
> > > > > +		lsblk | tee -a /dev/ttyprintk
> > > > > +		$XFS_METADUMP_PROG -a -g -o $DMERROR_DEV $seqres.dmfail.md
> > > > 
> > > > Above logs all should go to $seqres.full ?
> > > 
> > > Oops, yeah.  I'll remove them since I was only using them to check the
> > > system state.
> > > 
> > > > And $XFS_METADUMP_PROG is not suitable for a generic test.
> > > 
> > > I'll create _metadump_dev so that this at least works for the two
> > > filesystems for which we have dump creation helpers (ext* and xfs).
> > 
> > Sounds great!
> > 
> > > 
> > > > > +		_fail "mount failed"
> > > > > +	fi
> > > > > +done
> > > > > +
> > > > > +# Make sure the fs image file is ok
> > > > > +if [ -f "$loopimg" ]; then
> > > > > +	if _mount $loopimg $loopmnt -o loop; then
> > > > > +		$UMOUNT_PROG $loopmnt &> /dev/null
> > > > > +	else
> > > > > +		echo "final loop mount failed"
> > > > > +	fi
> > > > > +	_check_xfs_filesystem $loopimg none none
> > > > 
> > > > Same here, use _check_scratch_fs?
> > > 
> > > $loopimg is a file within the scratch fs.
> > 
> > _check_scratch_fs can take dev as argument, and default to $SCRATCH_DEV,
> > I think that works in this case?
> 
> It could be made to work with a large enough crowbar, but that's
> seriously overkill because $loopimg is a file *within* the scratch
> filesystem.  The $loopimg fs gets formatted without the
> SCRATCH_LOGDEV/SCRATCH_RTDEV options (because it is not itself the
> scratch filesystem), which means that in order to (ab)use
> _check_scratch_fs to do the same thing as _check_xfs_filesystem, you
> have to exclude those options.  So yes, this:
> 
> 	SCRATCH_RTDEV= SCRATCH_LOGDEV= _check_scratch_fs $loopimg
> 
> is the equivalent of this:
> 
> 	_check_xfs_filesystem $loopimg none none
> 
> But the first is longer and pointless.

...and now that it's morning and I've had coffee again, I understand
what you're actually asking, which is "Don't use _foo_xfs* functions in
a generic test!", not "rototill in this helper for stylistic reasons".

Judging from my immediate defensive reaction, I've clearly been worn
down by all the bikeshedding the past year.  Have they resumed flights
to Nobikeshed Island?

Anyway, I'll go fix that.  Thank you for catching the mistake.

--D

> --D
> 
> > Thanks,
> > Eryu

Thread overview: 16+ messages
2021-07-28  0:10 [PATCHSET 0/3] fstests: exercise code refactored in 5.14 Darrick J. Wong
2021-07-28  0:10 ` [PATCH 1/3] generic: test xattr operations only Darrick J. Wong
2021-08-12  5:34   ` Zorro Lang
2021-08-12 17:04     ` Darrick J. Wong
2021-08-15 15:46       ` Eryu Guan
2021-07-28  0:10 ` [PATCH 2/3] generic: test shutdowns of a nested filesystem Darrick J. Wong
2021-08-12  5:44   ` Zorro Lang
2021-08-12 17:07     ` Darrick J. Wong
2021-08-13 14:52       ` Zorro Lang
2021-08-15 16:28   ` Eryu Guan
2021-08-16 16:35     ` Darrick J. Wong
2021-08-17  3:16       ` Eryu Guan
2021-08-17  4:16         ` Darrick J. Wong
2021-08-17 15:54           ` Darrick J. Wong
2021-07-28  0:10 ` [PATCH 3/3] xfs: test regression in shrink when the new EOFS splits a sparse inode cluster Darrick J. Wong
2021-08-12  6:01   ` Zorro Lang
