linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 0/8] fstests: random fixes
@ 2021-07-07  0:21 Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation Darrick J. Wong
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

Hi all,

This series fixes a bunch of small problems in the test suite that were
causing intermittent test failures.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
 check             |   24 +++++++++++++++++-------
 common/dmthin     |    8 ++++++--
 tests/generic/019 |    3 +++
 tests/generic/371 |    8 ++++++++
 tests/generic/561 |    9 +++++++--
 tests/shared/298  |    2 +-
 tests/xfs/084     |    8 ++++++--
 tests/xfs/172     |   30 +++++++++++++++++++++++++++++-
 8 files changed, 77 insertions(+), 15 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:38   ` Allison Henderson
  2021-07-07  0:21 ` [PATCH 2/8] generic/561: hide assertions when duperemove is killed Darrick J. Wong
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

This test tries to exploit an interaction between delayed allocation and
writeback on full filesystems to see if it can trip up the filestreams
allocator.  The behaviors do not present if the filesystem allocates
space at write time, so disable it under these scenarios.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/172 |   30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)


diff --git a/tests/xfs/172 b/tests/xfs/172
index 0d1b441e..c0495305 100755
--- a/tests/xfs/172
+++ b/tests/xfs/172
@@ -16,9 +16,37 @@ _begin_fstest rw filestreams
 
 # real QA test starts here
 _supported_fs xfs
-
+_require_command "$FILEFRAG_PROG" filefrag
 _require_scratch
 
+# The first _test_streams call sets up the filestreams allocator to fail and
+# then checks that it actually failed.  It does this by creating a very small
+# filesystem, writing a lot of data in parallel to separate streams, and then
+# flushes the dirty data, also in parallel.  To trip the allocator, the test
+# relies on writeback combining adjacent dirty ranges into large allocation
+# requests which eventually bleed across AGs.  This happens either because the
+# big writes are slow enough that filestreams contexts expire between
+# allocation requests, or because the AGs are so full at allocation time that
+# the bmapi allocator decides to scan for a less full AG.  Either way, stream
+# directories share AGs, which is what the test considers a success.
+#
+# However, this only happens if writes use the delayed allocation code paths.
+# If the kernel allocates small amounts of space at the time of each write()
+# call, the successive small allocations never trip the bmapi allocator's
+# rescan thresholds and will keep pushing out the expiration time, with the
+# result that the filestreams allocator succeeds in maintaining the streams.
+# The test considers this a failure.
+#
+# Make sure that a regular buffered write produces delalloc reservations.
+# This effectively disables the test for files with extent size hints or DAX
+# mode set.
+_scratch_mkfs > $seqres.full
+_scratch_mount
+$XFS_IO_PROG -f -c 'pwrite 0 64k' $SCRATCH_MNT/testy &> /dev/null
+$FILEFRAG_PROG -v $SCRATCH_MNT/testy 2>&1 | grep -q delalloc || \
+	_notrun "test requires delayed allocation buffered writes"
+_scratch_unmount
+
 _check_filestreams_support || _notrun "filestreams not available"
 
 # test reaper works by setting timeout low. Expected to fail


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/8] generic/561: hide assertions when duperemove is killed
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:38   ` Allison Henderson
  2021-07-07  0:21 ` [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces Darrick J. Wong
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Use some bash redirection trickery to capture in $seqres.full all of
bash's warnings about duperemove being killed due to assertions
triggering.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/561 |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/tests/generic/561 b/tests/generic/561
index bfd4443d..85037e50 100755
--- a/tests/generic/561
+++ b/tests/generic/561
@@ -62,8 +62,13 @@ dupe_run=$TEST_DIR/${seq}-running
 touch $dupe_run
 for ((i = 0; i < $((2 * LOAD_FACTOR)); i++)); do
 	while [ -e $dupe_run ]; do
-		$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir \
-			>>$seqres.full 2>&1
+		# Employ shell trickery here so that the golden output does not
+		# capture assertions that trigger when killall shoots down
+		# dupremove processes in an arbitrary order, which leaves the
+		# memory in an inconsistent state long enough for the assert
+		# to trip.
+		cmd="$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir"
+		bash -c "$cmd" >> $seqres.full 2>&1
 	done 2>&1 | sed -e '/Terminated/d' &
 	dedup_pids="$! $dedup_pids"
 done


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 2/8] generic/561: hide assertions when duperemove is killed Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:39   ` Allison Henderson
  2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Correct the deletion loop in this test to work properly when there are
files in $here that have spaces in their name.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/shared/298 |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/tests/shared/298 b/tests/shared/298
index 981a4dfc..bd52b6a0 100755
--- a/tests/shared/298
+++ b/tests/shared/298
@@ -163,7 +163,7 @@ get_holes $img_file > $fiemap_ref
 
 # Delete some files
 find $loop_mnt -type f -print | $AWK_PROG \
-	'BEGIN {srand()}; {if(rand() > 0.7) print $1;}' | xargs rm
+	'BEGIN {srand()}; {if(rand() > 0.7) printf("%s\0", $0);}' | xargs -0 rm
 echo "done."
 
 echo -n "Running fstrim..."


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/8] dmthin: erase the metadata device properly before starting
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:39   ` Allison Henderson
                     ` (2 more replies)
  2021-07-07  0:21 ` [PATCH 5/8] check: run _check_filesystems in an OOM-happy subshell Darrick J. Wong
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Every now and then I see the following failure when running generic/347:

--- generic/347.out
+++ generic/347.out.bad
@@ -1,2 +1,2 @@
 QA output created by 347
-=== completed
+failed to create dm thin pool device

Accompanied by the following dmesg spew:

device-mapper: thin metadata: sb_check failed: blocknr 7016996765293437281: wanted 0
device-mapper: block manager: superblock validator check failed for block 0
device-mapper: thin metadata: couldn't read superblock
device-mapper: table: 253:2: thin-pool: Error creating metadata object
device-mapper: ioctl: error adding target to table

7016996765293437281 is of course the magic number 0x6161616161616161,
which are stale ondisk contents left behind by previous tests that wrote
known tests patterns to files on the scratch device.  This is a bit
surprising, since _dmthin_init supposedly zeroes the first 4k of the
thin pool metadata device before initializing the pool.  Or does it?

dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null

Herein lies the problem: the dd process writes zeroes into the page
cache and exits.  Normally the block layer will flush the page cache
after the last file descriptor is closed, but once in a while the
terminating dd process won't be the only process in the system with an
open file descriptor!

That process is of course udev.  The write() call from dd triggers a
kernel uevent, which starts udev.  If udev is running particularly
slowly, it'll still be running an instant later when dd terminates,
thereby preventing the page cache flush.  If udev is still running a
moment later when we call dmsetup to set up the thin pool, the pool
creation will issue a bio to read the ondisk superblock.  This read
isn't coherent with the page cache, so it sees old disk contents and the
test fails even though we supposedly formatted the metadata device.

Fix this by explicitly flushing the page cache after writing the zeroes.

Fixes: 4b52fffb ("dm-thinp helpers in common/dmthin")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/dmthin |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/common/dmthin b/common/dmthin
index 3b1c7d45..91147e47 100644
--- a/common/dmthin
+++ b/common/dmthin
@@ -113,8 +113,12 @@ _dmthin_init()
 	_dmsetup_create $DMTHIN_DATA_NAME --table "$DMTHIN_DATA_TABLE" || \
 		_fatal "failed to create dm thin data device"
 
-	# Zap the pool metadata dev
-	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
+	# Zap the pool metadata dev.  Explicitly fsync the zeroes to disk
+	# because a slow-running udev running concurrently with dd can maintain
+	# an open file descriptor.  The block layer only flushes the page cache
+	# on last close, which means that the thin pool creation below will
+	# see the (stale) ondisk contents and fail.
+	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 conv=fsync &>/dev/null
 
 	# Thin pool
 	# "start length thin-pool metadata_dev data_dev data_block_size low_water_mark"


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/8] check: run _check_filesystems in an OOM-happy subshell
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 6/8] xfs/084: fix test program status collection and processing Darrick J. Wong
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

While running fstests one night, I observed that fstests stopped
abruptly because ./check ran _check_filesystems to run xfs_repair.  In
turn, repair (which inherited oom_score_adj=-1000 from ./check) consumed
so much memory that the OOM killer ran around killing other daemons,
rendering the system nonfunctional.

This is silly -- we set an OOM score adjustment of -1000 on the ./check
process so that the test framework itself wouldn't get OOM-killed,
because that aborts the entire run.  Everything else is fair game for
that, including subprocesses started by _check_filesystems.

Therefore, adapt _check_filesystems (and its children) to run in a
subshell with a much higher oom score adjustment.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check |   24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)


diff --git a/check b/check
index de8104d0..bb7e030c 100755
--- a/check
+++ b/check
@@ -525,17 +525,20 @@ _summary()
 
 _check_filesystems()
 {
+	local ret=0
+
 	if [ -f ${RESULT_DIR}/require_test ]; then
-		_check_test_fs || err=true
+		_check_test_fs || ret=1
 		rm -f ${RESULT_DIR}/require_test*
 	else
 		_test_unmount 2> /dev/null
 	fi
 	if [ -f ${RESULT_DIR}/require_scratch ]; then
-		_check_scratch_fs || err=true
+		_check_scratch_fs || ret=1
 		rm -f ${RESULT_DIR}/require_scratch*
 	fi
 	_scratch_unmount 2> /dev/null
+	return $ret
 }
 
 _expunge_test()
@@ -558,11 +561,15 @@ test $? -eq 77 && HAVE_SYSTEMD_SCOPES=yes
 
 # Make the check script unattractive to the OOM killer...
 OOM_SCORE_ADJ="/proc/self/oom_score_adj"
-test -w ${OOM_SCORE_ADJ} && echo -1000 > ${OOM_SCORE_ADJ}
+function _adjust_oom_score() {
+	test -w "${OOM_SCORE_ADJ}" && echo "$1" > "${OOM_SCORE_ADJ}"
+}
+_adjust_oom_score -1000
 
 # ...and make the tests themselves somewhat more attractive to it, so that if
 # the system runs out of memory it'll be the test that gets killed and not the
-# test framework.
+# test framework.  The test is run in a separate process without any of our
+# functions, so we open-code adjusting the OOM score.
 #
 # If systemd is available, run the entire test script in a scope so that we can
 # kill all subprocesses of the test if it fails to clean up after itself.  This
@@ -875,9 +882,12 @@ function run_section()
 			rm -f ${RESULT_DIR}/require_scratch*
 			err=true
 		else
-			# the test apparently passed, so check for corruption
-			# and log messages that shouldn't be there.
-			_check_filesystems
+			# The test apparently passed, so check for corruption
+			# and log messages that shouldn't be there.  Run the
+			# checking tools from a subshell with adjusted OOM
+			# score so that the OOM killer will target them instead
+			# of the check script itself.
+			(_adjust_oom_score 250; _check_filesystems) || err=true
 			_check_dmesg || err=true
 		fi
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/8] xfs/084: fix test program status collection and processing
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 5/8] check: run _check_filesystems in an OOM-happy subshell Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-07  0:21 ` [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS Darrick J. Wong
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

On a test VM with 1.2GB memory, I noticed that the test will sometimes
fail because resvtest leaks too much memory and gets OOM killed.  It
would be useful to _notrun the test when this happens so that it doesn't
appear as an intermittent regression.

The exit code processing in this test is incorrect, since "$?" will get
us the exit status of _filter_resv, not $here/src/resvtest.  Fix that
as part of learning to detect a SIGKILL and skip the test.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/084 |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/tests/xfs/084 b/tests/xfs/084
index 5967fe12..e796fec4 100755
--- a/tests/xfs/084
+++ b/tests/xfs/084
@@ -33,13 +33,17 @@ _require_test
 echo
 echo "*** First case - I/O blocksize same as pagesize"
 $here/src/resvtest -i 20 -b $pgsize "$TEST_DIR/resv" | _filter_resv
-[ $? -eq 0 ] && echo done
+res=${PIPESTATUS[0]}
+[ $res -eq 137 ] && _notrun "resvtest -i 20 -b $pgsize was SIGKILLed (OOM?)"
+[ $res -eq 0 ] && echo done
 rm -f "$TEST_DIR/mumble"
 
 echo
 echo "*** Second case - 512 byte I/O blocksize"
 $here/src/resvtest -i 40 -b 512 "$TEST_DIR/resv" | _filter_resv
-[ $? -eq 0 ] && echo done
+res=${PIPESTATUS[0]}
+[ $res -eq 137 ] && _notrun "resvtest -i 40 -b 512 was SIGKILLed (OOM?)"
+[ $res -eq 0 ] && echo done
 rm -f "$TEST_DIR/grumble"
 
 # success, all done


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 6/8] xfs/084: fix test program status collection and processing Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:50   ` Allison Henderson
  2021-07-07  0:21 ` [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors Darrick J. Wong
  2021-07-18 14:35 ` [PATCHSET 0/8] fstests: random fixes Eryu Guan
  8 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Once in a very long while, the fallocate calls in this test will fail
due to ENOSPC conditions.  While in theory this test is careful only to
allocate at most 160M of space from a 256M filesystem, there's a twist
on XFS: speculative preallocation.

The first loop in this test is an 80M appending write done in units of
4k.  Once the file size hits 64k, XFS will begin speculatively
preallocating blocks past the end of the file; as the file grows larger,
so will the speculative preallocation.

Since the pwrite/rm loop races with the fallocate/rm loop, it's possible
that the fallocate loop will free that file just before the buffered
write extends the speculative preallocation out to 160MB.  With fs and
log overhead, that doesn't leave enough free space to start the 80MB
fallocate request, which tries to avoid disappointing the caller by
freeing all speculative preallocations.  That fails if the pwriter
thread owns the IOLOCK on $testfile1, so fallocate returns ENOSPC and
the test fails.

The simple solution here is to disable speculative preallocation by
setting an extent size hint if the fs is XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/371 |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/tests/generic/371 b/tests/generic/371
index c94fa85e..a2fdaf7b 100755
--- a/tests/generic/371
+++ b/tests/generic/371
@@ -18,10 +18,18 @@ _begin_fstest auto quick enospc prealloc
 _supported_fs generic
 _require_scratch
 _require_xfs_io_command "falloc"
+test "$FSTYP" = "xfs" && _require_xfs_io_command "extsize"
 
 _scratch_mkfs_sized $((256 * 1024 * 1024)) >> $seqres.full 2>&1
 _scratch_mount
 
+# Disable speculative post-EOF preallocation on XFS, which can grow fast enough
+# that a racing fallocate then fails.
+if [ "$FSTYP" = "xfs" ]; then
+	alloc_sz="$(_get_file_block_size $SCRATCH_MNT)"
+	$XFS_IO_PROG -c "extsize $alloc_sz" $SCRATCH_MNT >> $seqres.full
+fi
+
 testfile1=$SCRATCH_MNT/testfile1
 testfile2=$SCRATCH_MNT/testfile2
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (6 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS Darrick J. Wong
@ 2021-07-07  0:21 ` Darrick J. Wong
  2021-07-09 23:39   ` Allison Henderson
  2021-07-18 14:35 ` [PATCHSET 0/8] fstests: random fixes Eryu Guan
  8 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-07  0:21 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Disable coredumps so that fstests won't mark the test failed when the
EIO injector causes an mmap write to abort with SIGBUS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/019 |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/tests/generic/019 b/tests/generic/019
index bd234815..b8d025d6 100755
--- a/tests/generic/019
+++ b/tests/generic/019
@@ -62,6 +62,9 @@ NUM_JOBS=$((4*LOAD_FACTOR))
 BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV`
 FILE_SIZE=$((BLK_DEV_SIZE * 512))
 
+# Don't fail the test just because fio or fsstress dump cores
+ulimit -c 0
+
 cat >$fio_config <<EOF
 ###########
 # $seq test's fio activity


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation
  2021-07-07  0:21 ` [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation Darrick J. Wong
@ 2021-07-09 23:38   ` Allison Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:38 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test tries to exploit an interaction between delayed allocation and
> writeback on full filesystems to see if it can trip up the filestreams
> allocator.  The behaviors do not present if the filesystem allocates
> space at write time, so disable it under these scenarios.
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   tests/xfs/172 |   30 +++++++++++++++++++++++++++++-
>   1 file changed, 29 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/tests/xfs/172 b/tests/xfs/172
> index 0d1b441e..c0495305 100755
> --- a/tests/xfs/172
> +++ b/tests/xfs/172
> @@ -16,9 +16,37 @@ _begin_fstest rw filestreams
>   
>   # real QA test starts here
>   _supported_fs xfs
> -
> +_require_command "$FILEFRAG_PROG" filefrag
>   _require_scratch
>   
> +# The first _test_streams call sets up the filestreams allocator to fail and
> +# then checks that it actually failed.  It does this by creating a very small
> +# filesystem, writing a lot of data in parallel to separate streams, and then
> +# flushes the dirty data, also in parallel.  To trip the allocator, the test
> +# relies on writeback combining adjacent dirty ranges into large allocation
> +# requests which eventually bleed across AGs.  This happens either because the
> +# big writes are slow enough that filestreams contexts expire between
> +# allocation requests, or because the AGs are so full at allocation time that
> +# the bmapi allocator decides to scan for a less full AG.  Either way, stream
> +# directories share AGs, which is what the test considers a success.
> +#
> +# However, this only happens if writes use the delayed allocation code paths.
> +# If the kernel allocates small amounts of space at the time of each write()
> +# call, the successive small allocations never trip the bmapi allocator's
> +# rescan thresholds and will keep pushing out the expiration time, with the
> +# result that the filestreams allocator succeeds in maintaining the streams.
> +# The test considers this a failure.
> +#
> +# Make sure that a regular buffered write produces delalloc reservations.
> +# This effectively disables the test for files with extent size hints or DAX
> +# mode set.
> +_scratch_mkfs > $seqres.full
> +_scratch_mount
> +$XFS_IO_PROG -f -c 'pwrite 0 64k' $SCRATCH_MNT/testy &> /dev/null
> +$FILEFRAG_PROG -v $SCRATCH_MNT/testy 2>&1 | grep -q delalloc || \
> +	_notrun "test requires delayed allocation buffered writes"
> +_scratch_unmount
> +
>   _check_filestreams_support || _notrun "filestreams not available"
>   
>   # test reaper works by setting timeout low. Expected to fail
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] generic/561: hide assertions when duperemove is killed
  2021-07-07  0:21 ` [PATCH 2/8] generic/561: hide assertions when duperemove is killed Darrick J. Wong
@ 2021-07-09 23:38   ` Allison Henderson
  2021-07-10  1:25     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:38 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Use some bash redirection trickery to capture in $seqres.full all of
> bash's warnings about duperemove being killed due to assertions
> triggering.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   tests/generic/561 |    9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/tests/generic/561 b/tests/generic/561
> index bfd4443d..85037e50 100755
> --- a/tests/generic/561
> +++ b/tests/generic/561
> @@ -62,8 +62,13 @@ dupe_run=$TEST_DIR/${seq}-running
>   touch $dupe_run
>   for ((i = 0; i < $((2 * LOAD_FACTOR)); i++)); do
>   	while [ -e $dupe_run ]; do
> -		$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir \
> -			>>$seqres.full 2>&1
> +		# Employ shell trickery here so that the golden output does not
nit:
I think I'd be more more specific with the commentary:

                 # We run cmd in a bash shell so that the golden output ...
> +		# capture assertions that trigger when killall shoots down
> +		# dupremove processes in an arbitrary order, which leaves the
> +		# memory in an inconsistent state long enough for the assert
> +		# to trip.
> +		cmd="$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir"
> +		bash -c "$cmd" >> $seqres.full 2>&1
>   	done 2>&1 | sed -e '/Terminated/d' &
>   	dedup_pids="$! $dedup_pids"
>   done
> 
Otherwise looks fine to me
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces
  2021-07-07  0:21 ` [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces Darrick J. Wong
@ 2021-07-09 23:39   ` Allison Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:39 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Correct the deletion loop in this test to work properly when there are
> files in $here that have spaces in their name.
> 
Looks ok
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   tests/shared/298 |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/tests/shared/298 b/tests/shared/298
> index 981a4dfc..bd52b6a0 100755
> --- a/tests/shared/298
> +++ b/tests/shared/298
> @@ -163,7 +163,7 @@ get_holes $img_file > $fiemap_ref
>   
>   # Delete some files
>   find $loop_mnt -type f -print | $AWK_PROG \
> -	'BEGIN {srand()}; {if(rand() > 0.7) print $1;}' | xargs rm
> +	'BEGIN {srand()}; {if(rand() > 0.7) printf("%s\0", $0);}' | xargs -0 rm
>   echo "done."
>   
>   echo -n "Running fstrim..."
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/8] dmthin: erase the metadata device properly before starting
  2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
@ 2021-07-09 23:39   ` Allison Henderson
  2021-07-18 14:20   ` Eryu Guan
  2021-07-18 14:32   ` Eryu Guan
  2 siblings, 0 replies; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:39 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Every now and then I see the following failure when running generic/347:
> 
> --- generic/347.out
> +++ generic/347.out.bad
> @@ -1,2 +1,2 @@
>   QA output created by 347
> -=== completed
> +failed to create dm thin pool device
> 
> Accompanied by the following dmesg spew:
> 
> device-mapper: thin metadata: sb_check failed: blocknr 7016996765293437281: wanted 0
> device-mapper: block manager: superblock validator check failed for block 0
> device-mapper: thin metadata: couldn't read superblock
> device-mapper: table: 253:2: thin-pool: Error creating metadata object
> device-mapper: ioctl: error adding target to table
> 
> 7016996765293437281 is of course the magic number 0x6161616161616161,
> which are stale ondisk contents left behind by previous tests that wrote
> known tests patterns to files on the scratch device.  This is a bit
> surprising, since _dmthin_init supposedly zeroes the first 4k of the
> thin pool metadata device before initializing the pool.  Or does it?
> 
> dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> 
> Herein lies the problem: the dd process writes zeroes into the page
> cache and exits.  Normally the block layer will flush the page cache
> after the last file descriptor is closed, but once in a while the
> terminating dd process won't be the only process in the system with an
> open file descriptor!
> 
> That process is of course udev.  The write() call from dd triggers a
> kernel uevent, which starts udev.  If udev is running particularly
> slowly, it'll still be running an instant later when dd terminates,
> thereby preventing the page cache flush.  If udev is still running a
> moment later when we call dmsetup to set up the thin pool, the pool
> creation will issue a bio to read the ondisk superblock.  This read
> isn't coherent with the page cache, so it sees old disk contents and the
> test fails even though we supposedly formatted the metadata device.
> 
> Fix this by explicitly flushing the page cache after writing the zeroes.
> 
> Fixes: 4b52fffb ("dm-thinp helpers in common/dmthin")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Ok, makes sense.  Good catch!
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>   common/dmthin |    8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/common/dmthin b/common/dmthin
> index 3b1c7d45..91147e47 100644
> --- a/common/dmthin
> +++ b/common/dmthin
> @@ -113,8 +113,12 @@ _dmthin_init()
>   	_dmsetup_create $DMTHIN_DATA_NAME --table "$DMTHIN_DATA_TABLE" || \
>   		_fatal "failed to create dm thin data device"
>   
> -	# Zap the pool metadata dev
> -	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> +	# Zap the pool metadata dev.  Explicitly fsync the zeroes to disk
> +	# because a slow-running udev running concurrently with dd can maintain
> +	# an open file descriptor.  The block layer only flushes the page cache
> +	# on last close, which means that the thin pool creation below will
> +	# see the (stale) ondisk contents and fail.
> +	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 conv=fsync &>/dev/null
>   
>   	# Thin pool
>   	# "start length thin-pool metadata_dev data_dev data_block_size low_water_mark"
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors
  2021-07-07  0:21 ` [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors Darrick J. Wong
@ 2021-07-09 23:39   ` Allison Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:39 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Disable coredumps so that fstests won't mark the test failed when the
> EIO injector causes an mmap write to abort with SIGBUS.
> 
Looks ok
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   tests/generic/019 |    3 +++
>   1 file changed, 3 insertions(+)
> 
> 
> diff --git a/tests/generic/019 b/tests/generic/019
> index bd234815..b8d025d6 100755
> --- a/tests/generic/019
> +++ b/tests/generic/019
> @@ -62,6 +62,9 @@ NUM_JOBS=$((4*LOAD_FACTOR))
>   BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV`
>   FILE_SIZE=$((BLK_DEV_SIZE * 512))
>   
> +# Don't fail the test just because fio or fsstress dump cores
> +ulimit -c 0
> +
>   cat >$fio_config <<EOF
>   ###########
>   # $seq test's fio activity
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS
  2021-07-07  0:21 ` [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS Darrick J. Wong
@ 2021-07-09 23:50   ` Allison Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Allison Henderson @ 2021-07-09 23:50 UTC (permalink / raw)
  To: Darrick J. Wong, guaneryu; +Cc: linux-xfs, fstests, guan



On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Once in a very long while, the fallocate calls in this test will fail
> due to ENOSPC conditions.  While in theory this test is careful only to
> allocate at most 160M of space from a 256M filesystem, there's a twist
> on XFS: speculative preallocation.
> 
> The first loop in this test is an 80M appending write done in units of
> 4k.  Once the file size hits 64k, XFS will begin speculatively
> preallocating blocks past the end of the file; as the file grows larger,
> so will the speculative preallocation.
> 
> Since the pwrite/rm loop races with the fallocate/rm loop, it's possible
> that the fallocate loop will free that file just before the buffered
> write extends the speculative preallocation out to 160MB.  With fs and
> log overhead, that doesn't leave enough free space to start the 80MB
> fallocate request, which tries to avoid disappointing the caller by
> freeing all speculative preallocations.  That fails if the pwriter
> thread owns the IOLOCK on $testfile1, so fallocate returns ENOSPC and
> the test fails.
> 
> The simple solution here is to disable speculative preallocation by
> setting an extent size hint if the fs is XFS.
> 
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   tests/generic/371 |    8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> 
> diff --git a/tests/generic/371 b/tests/generic/371
> index c94fa85e..a2fdaf7b 100755
> --- a/tests/generic/371
> +++ b/tests/generic/371
> @@ -18,10 +18,18 @@ _begin_fstest auto quick enospc prealloc
>   _supported_fs generic
>   _require_scratch
>   _require_xfs_io_command "falloc"
> +test "$FSTYP" = "xfs" && _require_xfs_io_command "extsize"
>   
>   _scratch_mkfs_sized $((256 * 1024 * 1024)) >> $seqres.full 2>&1
>   _scratch_mount
>   
> +# Disable speculative post-EOF preallocation on XFS, which can grow fast enough
> +# that a racing fallocate then fails.
> +if [ "$FSTYP" = "xfs" ]; then
> +	alloc_sz="$(_get_file_block_size $SCRATCH_MNT)"
> +	$XFS_IO_PROG -c "extsize $alloc_sz" $SCRATCH_MNT >> $seqres.full
> +fi
> +
>   testfile1=$SCRATCH_MNT/testfile1
>   testfile2=$SCRATCH_MNT/testfile2
>   
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] generic/561: hide assertions when duperemove is killed
  2021-07-09 23:38   ` Allison Henderson
@ 2021-07-10  1:25     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-07-10  1:25 UTC (permalink / raw)
  To: Allison Henderson; +Cc: guaneryu, linux-xfs, fstests, guan

On Fri, Jul 09, 2021 at 04:38:46PM -0700, Allison Henderson wrote:
> 
> 
> On 7/6/21 5:21 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Use some bash redirection trickery to capture in $seqres.full all of
> > bash's warnings about duperemove being killed due to assertions
> > triggering.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >   tests/generic/561 |    9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/tests/generic/561 b/tests/generic/561
> > index bfd4443d..85037e50 100755
> > --- a/tests/generic/561
> > +++ b/tests/generic/561
> > @@ -62,8 +62,13 @@ dupe_run=$TEST_DIR/${seq}-running
> >   touch $dupe_run
> >   for ((i = 0; i < $((2 * LOAD_FACTOR)); i++)); do
> >   	while [ -e $dupe_run ]; do
> > -		$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir \
> > -			>>$seqres.full 2>&1
> > +		# Employ shell trickery here so that the golden output does not
> nit:
> I think I'd be more more specific with the commentary:
> 
>                 # We run cmd in a bash shell so that the golden output ...

Ok, fixed.

--D

> > +		# capture assertions that trigger when killall shoots down
> > +		# dupremove processes in an arbitrary order, which leaves the
> > +		# memory in an inconsistent state long enough for the assert
> > +		# to trip.
> > +		cmd="$DUPEREMOVE_PROG -dr --dedupe-options=same $testdir"
> > +		bash -c "$cmd" >> $seqres.full 2>&1
> >   	done 2>&1 | sed -e '/Terminated/d' &
> >   	dedup_pids="$! $dedup_pids"
> >   done
> > 
> Otherwise looks fine to me
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/8] dmthin: erase the metadata device properly before starting
  2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
  2021-07-09 23:39   ` Allison Henderson
@ 2021-07-18 14:20   ` Eryu Guan
  2021-07-18 14:32   ` Eryu Guan
  2 siblings, 0 replies; 19+ messages in thread
From: Eryu Guan @ 2021-07-18 14:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests

On Tue, Jul 06, 2021 at 05:21:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Every now and then I see the following failure when running generic/347:
> 
> --- generic/347.out
> +++ generic/347.out.bad
> @@ -1,2 +1,2 @@
>  QA output created by 347
> -=== completed
> +failed to create dm thin pool device
> 
> Accompanied by the following dmesg spew:
> 
> device-mapper: thin metadata: sb_check failed: blocknr 7016996765293437281: wanted 0
> device-mapper: block manager: superblock validator check failed for block 0
> device-mapper: thin metadata: couldn't read superblock
> device-mapper: table: 253:2: thin-pool: Error creating metadata object
> device-mapper: ioctl: error adding target to table
> 
> 7016996765293437281 is of course the magic number 0x6161616161616161,
> which are stale ondisk contents left behind by previous tests that wrote
> known tests patterns to files on the scratch device.  This is a bit
> surprising, since _dmthin_init supposedly zeroes the first 4k of the
> thin pool metadata device before initializing the pool.  Or does it?
> 
> dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> 
> Herein lies the problem: the dd process writes zeroes into the page
> cache and exits.  Normally the block layer will flush the page cache
> after the last file descriptor is closed, but once in a while the
> terminating dd process won't be the only process in the system with an
> open file descriptor!
> 
> That process is of course udev.  The write() call from dd triggers a
> kernel uevent, which starts udev.  If udev is running particularly
> slowly, it'll still be running an instant later when dd terminates,
> thereby preventing the page cache flush.  If udev is still running a
> moment later when we call dmsetup to set up the thin pool, the pool
> creation will issue a bio to read the ondisk superblock.  This read
> isn't coherent with the page cache, so it sees old disk contents and the
> test fails even though we supposedly formatted the metadata device.
> 
> Fix this by explicitly flushing the page cache after writing the zeroes.
> 
> Fixes: 4b52fffb ("dm-thinp helpers in common/dmthin")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

This is a hard one to catch, thanks for the patch! :)

Eryu

> ---
>  common/dmthin |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/common/dmthin b/common/dmthin
> index 3b1c7d45..91147e47 100644
> --- a/common/dmthin
> +++ b/common/dmthin
> @@ -113,8 +113,12 @@ _dmthin_init()
>  	_dmsetup_create $DMTHIN_DATA_NAME --table "$DMTHIN_DATA_TABLE" || \
>  		_fatal "failed to create dm thin data device"
>  
> -	# Zap the pool metadata dev
> -	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> +	# Zap the pool metadata dev.  Explicitly fsync the zeroes to disk
> +	# because a slow-running udev running concurrently with dd can maintain
> +	# an open file descriptor.  The block layer only flushes the page cache
> +	# on last close, which means that the thin pool creation below will
> +	# see the (stale) ondisk contents and fail.
> +	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 conv=fsync &>/dev/null
>  
>  	# Thin pool
>  	# "start length thin-pool metadata_dev data_dev data_block_size low_water_mark"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/8] dmthin: erase the metadata device properly before starting
  2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
  2021-07-09 23:39   ` Allison Henderson
  2021-07-18 14:20   ` Eryu Guan
@ 2021-07-18 14:32   ` Eryu Guan
  2 siblings, 0 replies; 19+ messages in thread
From: Eryu Guan @ 2021-07-18 14:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests

On Tue, Jul 06, 2021 at 05:21:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Every now and then I see the following failure when running generic/347:
> 
> --- generic/347.out
> +++ generic/347.out.bad
> @@ -1,2 +1,2 @@
>  QA output created by 347
> -=== completed
> +failed to create dm thin pool device

Note that I have to add space indention to above diff lines, otherwise
it seems git am will treat lines after "---" as the beginning of patch,
and failed to apply.

Thanks,
Eryu

> 
> Accompanied by the following dmesg spew:
> 
> device-mapper: thin metadata: sb_check failed: blocknr 7016996765293437281: wanted 0
> device-mapper: block manager: superblock validator check failed for block 0
> device-mapper: thin metadata: couldn't read superblock
> device-mapper: table: 253:2: thin-pool: Error creating metadata object
> device-mapper: ioctl: error adding target to table
> 
> 7016996765293437281 is of course the magic number 0x6161616161616161,
> which are stale ondisk contents left behind by previous tests that wrote
> known tests patterns to files on the scratch device.  This is a bit
> surprising, since _dmthin_init supposedly zeroes the first 4k of the
> thin pool metadata device before initializing the pool.  Or does it?
> 
> dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> 
> Herein lies the problem: the dd process writes zeroes into the page
> cache and exits.  Normally the block layer will flush the page cache
> after the last file descriptor is closed, but once in a while the
> terminating dd process won't be the only process in the system with an
> open file descriptor!
> 
> That process is of course udev.  The write() call from dd triggers a
> kernel uevent, which starts udev.  If udev is running particularly
> slowly, it'll still be running an instant later when dd terminates,
> thereby preventing the page cache flush.  If udev is still running a
> moment later when we call dmsetup to set up the thin pool, the pool
> creation will issue a bio to read the ondisk superblock.  This read
> isn't coherent with the page cache, so it sees old disk contents and the
> test fails even though we supposedly formatted the metadata device.
> 
> Fix this by explicitly flushing the page cache after writing the zeroes.
> 
> Fixes: 4b52fffb ("dm-thinp helpers in common/dmthin")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/dmthin |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/common/dmthin b/common/dmthin
> index 3b1c7d45..91147e47 100644
> --- a/common/dmthin
> +++ b/common/dmthin
> @@ -113,8 +113,12 @@ _dmthin_init()
>  	_dmsetup_create $DMTHIN_DATA_NAME --table "$DMTHIN_DATA_TABLE" || \
>  		_fatal "failed to create dm thin data device"
>  
> -	# Zap the pool metadata dev
> -	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> +	# Zap the pool metadata dev.  Explicitly fsync the zeroes to disk
> +	# because a slow-running udev running concurrently with dd can maintain
> +	# an open file descriptor.  The block layer only flushes the page cache
> +	# on last close, which means that the thin pool creation below will
> +	# see the (stale) ondisk contents and fail.
> +	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 conv=fsync &>/dev/null
>  
>  	# Thin pool
>  	# "start length thin-pool metadata_dev data_dev data_block_size low_water_mark"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/8] fstests: random fixes
  2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
                   ` (7 preceding siblings ...)
  2021-07-07  0:21 ` [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors Darrick J. Wong
@ 2021-07-18 14:35 ` Eryu Guan
  8 siblings, 0 replies; 19+ messages in thread
From: Eryu Guan @ 2021-07-18 14:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests

On Tue, Jul 06, 2021 at 05:21:07PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> This series fixes a bunch of small problems in the test suite that were
> causing intermittent test failures.

I've applied all fixes except patch 2/8, apparently it got a new version
in your local tree.

Thanks for all the fixes!

Eryu

> 
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes
> 
> xfsprogs git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes
> 
> fstests git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
> ---
>  check             |   24 +++++++++++++++++-------
>  common/dmthin     |    8 ++++++--
>  tests/generic/019 |    3 +++
>  tests/generic/371 |    8 ++++++++
>  tests/generic/561 |    9 +++++++--
>  tests/shared/298  |    2 +-
>  tests/xfs/084     |    8 ++++++--
>  tests/xfs/172     |   30 +++++++++++++++++++++++++++++-
>  8 files changed, 77 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-07-18 14:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07  0:21 [PATCHSET 0/8] fstests: random fixes Darrick J. Wong
2021-07-07  0:21 ` [PATCH 1/8] xfs/172: disable test when file writes don't use delayed allocation Darrick J. Wong
2021-07-09 23:38   ` Allison Henderson
2021-07-07  0:21 ` [PATCH 2/8] generic/561: hide assertions when duperemove is killed Darrick J. Wong
2021-07-09 23:38   ` Allison Henderson
2021-07-10  1:25     ` Darrick J. Wong
2021-07-07  0:21 ` [PATCH 3/8] shared/298: fix random deletion when filenames contain spaces Darrick J. Wong
2021-07-09 23:39   ` Allison Henderson
2021-07-07  0:21 ` [PATCH 4/8] dmthin: erase the metadata device properly before starting Darrick J. Wong
2021-07-09 23:39   ` Allison Henderson
2021-07-18 14:20   ` Eryu Guan
2021-07-18 14:32   ` Eryu Guan
2021-07-07  0:21 ` [PATCH 5/8] check: run _check_filesystems in an OOM-happy subshell Darrick J. Wong
2021-07-07  0:21 ` [PATCH 6/8] xfs/084: fix test program status collection and processing Darrick J. Wong
2021-07-07  0:21 ` [PATCH 7/8] generic/371: disable speculative preallocation regressions on XFS Darrick J. Wong
2021-07-09 23:50   ` Allison Henderson
2021-07-07  0:21 ` [PATCH 8/8] generic/019: don't dump cores when fio/fsstress hit io errors Darrick J. Wong
2021-07-09 23:39   ` Allison Henderson
2021-07-18 14:35 ` [PATCHSET 0/8] fstests: random fixes Eryu Guan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).