[PATCH 0/4] fix up generic dmlogwrites tests to work with XFS

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS
@ 2020-08-26 14:38 Brian Foster
  2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw)
  To: fstests; +Cc: linux-xfs

Hi all,

We've still had lingering false positive failure reports on some of
these generic dmlogwrites tests on XFS due to metadata ordering issues.
The logwrites mechanism relies on discard to provide zeroing behavior to
avoid this, but if that is not available, this can result in subtle
failures that take time to diagnose.

This series updates the remaining generic dmlogwrites tests to use the
same scheme we used in generic/482 to address this problem, which is to
explicitly use a thin volume for predictable discard support. It also
adds a discard zeroing behavior check as a backstop against future
tests. The thought crossed my mind of pushing much of this code down
into the common dmlogwrites code to reduce duplication, but I didn't
want to get too deep into the weeds of reworking the common code to
address this problem in a handful of tests.

Thoughts, reviews, flames appreciated.

Brian

Brian Foster (4):
  generic: require discard zero behavior for dmlogwrites on XFS
  generic/455: use thin volume for dmlogwrites target device
  generic/457: use thin volume for dmlogwrites target device
  generic/470: use thin volume for dmlogwrites target device

 common/dmlogwrites | 10 ++++++++--
 common/rc          | 14 ++++++++++++++
 tests/generic/455  | 36 ++++++++++++++++++++++--------------
 tests/generic/457  | 33 +++++++++++++++++++++------------
 tests/generic/470  | 24 ++++++++++++++++++------
 5 files changed, 83 insertions(+), 34 deletions(-)

-- 
2.25.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster
@ 2020-08-26 14:38 ` Brian Foster
  2020-08-27  6:58   ` Amir Goldstein
  2020-08-27  7:38   ` Christoph Hellwig
  2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw)
  To: fstests; +Cc: linux-xfs

Several generic fstests use dm-log-writes to test the filesystem for
consistency at various crash recovery points. dm-log-writes and the
associated replay mechanism rely on zeroing via discard to clear
stale blocks when moving to various points in time of the fs. If the
storage doesn't provide zeroing or the discard requests exceed the
hardcoded maximum (128MB) of the fallback solution to physically
write zeroes, stale blocks are left around in the target fs. This
scheme is known to cause issues on XFS v5 superblocks if recovery
observes metadata from a future variant of an fs that has been
replayed to an older point in time. This corrupts the filesystem and
leads to false test failures.

generic/482 already works around this problem by using a thin volume
as the target device, which provides consistent and efficient
discard zeroing behavior, but other tests have seen similar issues
on XFS. Add an XFS specific check to the dmlogwrites init time code
that requires discard zeroing support and otherwise skips the test
to avoid false positive failures.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 common/dmlogwrites | 10 ++++++++--
 common/rc          | 14 ++++++++++++++
 tests/generic/470  |  2 +-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/common/dmlogwrites b/common/dmlogwrites
index 573f4b8a..92cc6ce2 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt()
 	_require_test_program "log-writes/replay-log"
 
 	local ret=0
-	local mountopt=$1
+	local dev=$1
+	local mountopt=$2
 
-	_log_writes_init $SCRATCH_DEV
+	_log_writes_init $dev
 	_log_writes_mkfs > /dev/null 2>&1
 	_log_writes_mount "-o $mountopt" > /dev/null 2>&1
 	# Check options to be sure.
@@ -66,6 +67,11 @@ _log_writes_init()
 	[ -z "$blkdev" ] && _fail \
 	"block dev must be specified for _log_writes_init"
 
+	# XFS requires discard zeroing support on the target device to work
+	# reliably with dm-log-writes. Use dm-thin devices in tests that want
+	# to provide reliable discard zeroing support.
+	[ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev
+
 	local BLK_DEV_SIZE=`blockdev --getsz $blkdev`
 	LOGWRITES_NAME=logwrites-test
 	LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
diff --git a/common/rc b/common/rc
index aa5a7409..fedb5221 100644
--- a/common/rc
+++ b/common/rc
@@ -4313,6 +4313,20 @@ _require_mknod()
 	rm -f $TEST_DIR/$seq.null
 }
 
+# check that discard is supported and subsequent reads return zeroes
+_require_discard_zeroes()
+{
+	local dev=$1
+
+	_require_command "$BLKDISCARD_PROG" blkdiscard
+
+	$XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 ||
+		_fail "write error"
+	$BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support"
+	hexdump -n 4096 $dev | head -n 1 | grep cdcd &&
+		_notrun "no discard zeroing support"
+}
+
 init_rc
 
 ################################################################################
diff --git a/tests/generic/470 b/tests/generic/470
index fd6da563..707b6237 100755
--- a/tests/generic/470
+++ b/tests/generic/470
@@ -35,7 +35,7 @@ rm -f $seqres.full
 _supported_fs generic
 _supported_os Linux
 _require_scratch
-_require_log_writes_dax_mountopt "dax"
+_require_log_writes_dax_mountopt $SCRATCH_DEV "dax"
 _require_xfs_io_command "mmap" "-S"
 _require_xfs_io_command "log_writes"
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device
  2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster
  2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
@ 2020-08-26 14:38 ` Brian Foster
  2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster
  2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster
  3 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw)
  To: fstests; +Cc: linux-xfs

dmlogwrites support for XFS depends on discard zeroing support of
the intended target device. Update the test to use a thin volume and
allow it to run consistently and reliably on XFS.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 tests/generic/455 | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/tests/generic/455 b/tests/generic/455
index 05621220..72a44fda 100755
--- a/tests/generic/455
+++ b/tests/generic/455
@@ -16,12 +16,14 @@ status=1	# failure is the default!
 _cleanup()
 {
 	_log_writes_cleanup
+	_dmthin_cleanup
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # get standard environment, filters and checks
 . ./common/rc
 . ./common/filter
+. ./common/dmthin
 . ./common/dmlogwrites
 
 # real QA test starts here
@@ -30,6 +32,7 @@ _supported_os Linux
 _require_test
 _require_scratch_nocheck
 _require_log_writes
+_require_dm_target thin-pool
 
 rm -f $seqres.full
 
@@ -42,13 +45,12 @@ check_files()
 		local filename=$(basename $i)
 		local mark="${filename##*.}"
 		echo "checking $filename" >> $seqres.full
-		_log_writes_replay_log $filename $SCRATCH_DEV
-		_scratch_mount
+		_log_writes_replay_log $filename $DMTHIN_VOL_DEV
+		_dmthin_mount
 		local expected_md5=$(_md5_checksum $i)
 		local md5=$(_md5_checksum $SCRATCH_MNT/$name)
 		[ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
-		_scratch_unmount
-		_check_scratch_fs
+		_dmthin_check_fs
 	done
 }
 
@@ -56,8 +58,16 @@ SANITY_DIR=$TEST_DIR/fsxtests
 rm -rf $SANITY_DIR
 mkdir $SANITY_DIR
 
+devsize=$((1024*1024*200 / 512))        # 200m phys/virt size
+csize=$((1024*64 / 512))                # 64k cluster size
+lowspace=$((1024*1024 / 512))           # 1m low space threshold
+
+# Use a thin device to provide deterministic discard behavior. Discards are used
+# by the log replay tool for fast zeroing to prevent out-of-order replay issues.
+_dmthin_init $devsize $devsize $csize $lowspace
+
 # Create the log
-_log_writes_init $SCRATCH_DEV
+_log_writes_init $DMTHIN_VOL_DEV
 
 _log_writes_mkfs >> $seqres.full 2>&1
 
@@ -88,14 +98,13 @@ _log_writes_mark last
 _log_writes_unmount
 _log_writes_mark end
 _log_writes_remove
-_check_scratch_fs
+_dmthin_check_fs
 
 # check pre umount
 echo "checking pre umount" >> $seqres.full
-_log_writes_replay_log last $SCRATCH_DEV
-_scratch_mount
-_scratch_unmount
-_check_scratch_fs
+_log_writes_replay_log last $DMTHIN_VOL_DEV
+_dmthin_mount
+_dmthin_check_fs
 
 for j in `seq 0 $((NUM_FILES-1))`; do
 	check_files testfile$j
@@ -103,14 +112,13 @@ done
 
 # Check the end
 echo "checking post umount" >> $seqres.full
-_log_writes_replay_log end $SCRATCH_DEV
-_scratch_mount
+_log_writes_replay_log end $DMTHIN_VOL_DEV
+_dmthin_mount
 for j in `seq 0 $((NUM_FILES-1))`; do
 	md5=$(_md5_checksum $SCRATCH_MNT/testfile$j)
 	[ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched"
 done
-_scratch_unmount
-_check_scratch_fs
+_dmthin_check_fs
 
 echo "Silence is golden"
 status=0
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] generic/457: use thin volume for dmlogwrites target device
  2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster
  2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
  2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster
@ 2020-08-26 14:38 ` Brian Foster
  2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster
  3 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw)
  To: fstests; +Cc: linux-xfs

dmlogwrites support for XFS depends on discard zeroing support of
the intended target device. Update the test to use a thin volume and
allow it to run consistently and reliably on XFS.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 tests/generic/457 | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/tests/generic/457 b/tests/generic/457
index 82367304..42a064d8 100755
--- a/tests/generic/457
+++ b/tests/generic/457
@@ -16,6 +16,7 @@ status=1	# failure is the default!
 _cleanup()
 {
 	_log_writes_cleanup
+	_dmthin_cleanup
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -23,6 +24,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common/rc
 . ./common/filter
 . ./common/reflink
+. ./common/dmthin
 . ./common/dmlogwrites
 
 # real QA test starts here
@@ -32,6 +34,7 @@ _require_test
 _require_scratch_reflink
 _require_cp_reflink
 _require_log_writes
+_require_dm_target thin-pool
 
 rm -f $seqres.full
 
@@ -44,13 +47,12 @@ check_files()
 		local filename=$(basename $i)
 		local mark="${filename##*.}"
 		echo "checking $filename" >> $seqres.full
-		_log_writes_replay_log $filename $SCRATCH_DEV
-		_scratch_mount
+		_log_writes_replay_log $filename $DMTHIN_VOL_DEV
+		_dmthin_mount
 		local expected_md5=$(_md5_checksum $i)
 		local md5=$(_md5_checksum $SCRATCH_MNT/$name)
 		[ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched"
-		_scratch_unmount
-		_check_scratch_fs
+		_dmthin_check_fs
 	done
 }
 
@@ -58,8 +60,16 @@ SANITY_DIR=$TEST_DIR/fsxtests
 rm -rf $SANITY_DIR
 mkdir $SANITY_DIR
 
+devsize=$((1024*1024*200 / 512))        # 200m phys/virt size
+csize=$((1024*64 / 512))                # 64k cluster size
+lowspace=$((1024*1024 / 512))           # 1m low space threshold
+
+# Use a thin device to provide deterministic discard behavior. Discards are used
+# by the log replay tool for fast zeroing to prevent out-of-order replay issues.
+_dmthin_init $devsize $devsize $csize $lowspace
+
 # Create the log
-_log_writes_init $SCRATCH_DEV
+_log_writes_init $DMTHIN_VOL_DEV
 
 _log_writes_mkfs >> $seqres.full 2>&1
 
@@ -92,14 +102,13 @@ _log_writes_mark last
 _log_writes_unmount
 _log_writes_mark end
 _log_writes_remove
-_check_scratch_fs
+_dmthin_check_fs
 
 # check pre umount
 echo "checking pre umount" >> $seqres.full
-_log_writes_replay_log last $SCRATCH_DEV
-_scratch_mount
-_scratch_unmount
-_check_scratch_fs
+_log_writes_replay_log last $DMTHIN_VOL_DEV
+_dmthin_mount
+_dmthin_check_fs
 
 for j in `seq 0 $((NUM_FILES-1))`; do
 	check_files testfile$j
@@ -107,8 +116,8 @@ done
 
 # Check the end
 echo "checking post umount" >> $seqres.full
-_log_writes_replay_log end $SCRATCH_DEV
-_scratch_mount
+_log_writes_replay_log end $DMTHIN_VOL_DEV
+_dmthin_mount
 for j in `seq 0 $((NUM_FILES-1))`; do
 	md5=$(_md5_checksum $SCRATCH_MNT/testfile$j)
 	[ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched"
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] generic/470: use thin volume for dmlogwrites target device
  2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster
                   ` (2 preceding siblings ...)
  2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster
@ 2020-08-26 14:38 ` Brian Foster
  3 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-26 14:38 UTC (permalink / raw)
  To: fstests; +Cc: linux-xfs

dmlogwrites support for XFS depends on discard zeroing support of
the intended target device. Update the test to use a thin volume and
allow it to run consistently and reliably on XFS.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 tests/generic/470 | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/tests/generic/470 b/tests/generic/470
index 707b6237..cb7ab484 100755
--- a/tests/generic/470
+++ b/tests/generic/470
@@ -20,12 +20,14 @@ _cleanup()
 {
 	cd /
 	_log_writes_cleanup
+	_dmthin_cleanup
 	rm -f $tmp.*
 }
 
 # get standard environment, filters and checks
 . ./common/rc
 . ./common/filter
+. ./common/dmthin
 . ./common/dmlogwrites
 
 # remove previous $seqres.full before test
@@ -35,11 +37,21 @@ rm -f $seqres.full
 _supported_fs generic
 _supported_os Linux
 _require_scratch
-_require_log_writes_dax_mountopt $SCRATCH_DEV "dax"
+_require_dm_target thin-pool
 _require_xfs_io_command "mmap" "-S"
 _require_xfs_io_command "log_writes"
 
-_log_writes_init $SCRATCH_DEV
+devsize=$((1024*1024*200 / 512))        # 200m phys/virt size
+csize=$((1024*64 / 512))                # 64k cluster size
+lowspace=$((1024*1024 / 512))           # 1m low space threshold
+
+# Use a thin device to provide deterministic discard behavior. Discards are used
+# by the log replay tool for fast zeroing to prevent out-of-order replay issues.
+_dmthin_init $devsize $devsize $csize $lowspace
+
+_require_log_writes_dax_mountopt $DMTHIN_VOL_DEV "dax"
+
+_log_writes_init $DMTHIN_VOL_DEV
 _log_writes_mkfs >> $seqres.full 2>&1
 _log_writes_mount -o dax
 
@@ -52,14 +64,14 @@ $XFS_IO_PROG -t -c "truncate $LEN" -c "mmap -S 0 $LEN" -c "mwrite 0 $LEN" \
 # Unmount the scratch dir and tear down the log writes target
 _log_writes_unmount
 _log_writes_remove
-_check_scratch_fs
+_dmthin_check_fs
 
 # destroy previous filesystem so we can be sure our rebuild works
-_scratch_mkfs >> $seqres.full 2>&1
+_mkfs_dev $DMTHIN_VOL_DEV >> $seqres.full 2>&1
 
 # check pre-unmap state
-_log_writes_replay_log preunmap $SCRATCH_DEV
-_scratch_mount
+_log_writes_replay_log preunmap $DMTHIN_VOL_DEV
+_dmthin_mount
 
 # We should see $SCRATCH_MNT/test as having 1 MiB in block allocations
 du -sh $SCRATCH_MNT/test | _filter_scratch | _filter_spaces
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
@ 2020-08-27  6:58   ` Amir Goldstein
  2020-08-27  7:02     ` Christoph Hellwig
  2020-08-27  7:38   ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Amir Goldstein @ 2020-08-27  6:58 UTC (permalink / raw)
  To: Brian Foster; +Cc: fstests, linux-xfs, Christoph Hellwig

On Wed, Aug 26, 2020 at 5:38 PM Brian Foster <bfoster@redhat.com> wrote:
>
> Several generic fstests use dm-log-writes to test the filesystem for
> consistency at various crash recovery points. dm-log-writes and the
> associated replay mechanism rely on zeroing via discard to clear
> stale blocks when moving to various points in time of the fs. If the
> storage doesn't provide zeroing or the discard requests exceed the
> hardcoded maximum (128MB) of the fallback solution to physically
> write zeroes, stale blocks are left around in the target fs. This
> scheme is known to cause issues on XFS v5 superblocks if recovery
> observes metadata from a future variant of an fs that has been
> replayed to an older point in time. This corrupts the filesystem and
> leads to false test failures.
>
> generic/482 already works around this problem by using a thin volume
> as the target device, which provides consistent and efficient
> discard zeroing behavior, but other tests have seen similar issues
> on XFS. Add an XFS specific check to the dmlogwrites init time code
> that requires discard zeroing support and otherwise skips the test
> to avoid false positive failures.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  common/dmlogwrites | 10 ++++++++--
>  common/rc          | 14 ++++++++++++++
>  tests/generic/470  |  2 +-
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index 573f4b8a..92cc6ce2 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt()
>         _require_test_program "log-writes/replay-log"
>
>         local ret=0
> -       local mountopt=$1
> +       local dev=$1
> +       local mountopt=$2
>
> -       _log_writes_init $SCRATCH_DEV
> +       _log_writes_init $dev
>         _log_writes_mkfs > /dev/null 2>&1
>         _log_writes_mount "-o $mountopt" > /dev/null 2>&1
>         # Check options to be sure.
> @@ -66,6 +67,11 @@ _log_writes_init()
>         [ -z "$blkdev" ] && _fail \
>         "block dev must be specified for _log_writes_init"
>
> +       # XFS requires discard zeroing support on the target device to work
> +       # reliably with dm-log-writes. Use dm-thin devices in tests that want
> +       # to provide reliable discard zeroing support.
> +       [ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev
> +

I imagine that ext4 could also be burned by this.
Do we have a reason to limit this requirement to xfs?
I prefer to make it generic.

>         local BLK_DEV_SIZE=`blockdev --getsz $blkdev`
>         LOGWRITES_NAME=logwrites-test
>         LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
> diff --git a/common/rc b/common/rc
> index aa5a7409..fedb5221 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4313,6 +4313,20 @@ _require_mknod()
>         rm -f $TEST_DIR/$seq.null
>  }
>
> +# check that discard is supported and subsequent reads return zeroes
> +_require_discard_zeroes()
> +{
> +       local dev=$1
> +
> +       _require_command "$BLKDISCARD_PROG" blkdiscard
> +
> +       $XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 ||
> +               _fail "write error"
> +       $BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support"
> +       hexdump -n 4096 $dev | head -n 1 | grep cdcd &&
> +               _notrun "no discard zeroing support"
> +}
> +

I am fine with your solution, but if there was a discussion on the best way to
solve the problem, I missed it, so would like to hear what Chritoph has to say.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27  6:58   ` Amir Goldstein
@ 2020-08-27  7:02     ` Christoph Hellwig
  2020-08-27  7:29       ` Amir Goldstein
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-27  7:02 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Brian Foster, fstests, linux-xfs, Christoph Hellwig

On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote:
> I imagine that ext4 could also be burned by this.
> Do we have a reason to limit this requirement to xfs?
> I prefer to make it generic.

The whole idea that discard zeroes data is just completely broken.
Discard is a hint that is explicitly allowed to do nothing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27  7:02     ` Christoph Hellwig
@ 2020-08-27  7:29       ` Amir Goldstein
  2020-08-27  7:37         ` Christoph Hellwig
  2020-08-27 14:11         ` Brian Foster
  0 siblings, 2 replies; 19+ messages in thread
From: Amir Goldstein @ 2020-08-27  7:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, fstests, linux-xfs, Josef Bacik

On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote:
> > I imagine that ext4 could also be burned by this.
> > Do we have a reason to limit this requirement to xfs?
> > I prefer to make it generic.
>
> The whole idea that discard zeroes data is just completely broken.
> Discard is a hint that is explicitly allowed to do nothing.

I figured you'd say something like that :)
but since we are talking about dm-thin as a solution for predictable
behavior at the moment and this sanity check helps avoiding adding
new tests that can fail to some extent, is the proposed bandaid good enough
to keep those tests alive until a better solution is proposed?

Another concern that I have is that perhaps adding dm-thin to the fsx tests
would change timing of io in a subtle way and hide the original bugs that they
caught:

47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync")
3af423b03435 ("xfs: evict CoW fork extents when performing finsert/fcollapse")
51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and
delalloc after a crash")

Because at the time I (re)wrote Josef's fsx tests, they flushed out the bugs
on spinning rust much more frequently than on SSD.

So Brian, if you could verify that the fsx tests still catch the original bugs
by reverting the fixes or running with an old kernel and probably running
on a slow disk that would be great.

I know it is not a simple request, but I just don't know how else to trust
these changes. I guess a quick way for you to avoid the hassle is to add
_require_discard_zeroes (patch #1) and drop the rest.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27  7:29       ` Amir Goldstein
@ 2020-08-27  7:37         ` Christoph Hellwig
  2020-08-27 15:57           ` Josef Bacik
  2020-08-27 14:11         ` Brian Foster
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-27  7:37 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christoph Hellwig, Brian Foster, fstests, linux-xfs, Josef Bacik

On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote:
> I figured you'd say something like that :)
> but since we are talking about dm-thin as a solution for predictable
> behavior at the moment and this sanity check helps avoiding adding
> new tests that can fail to some extent, is the proposed bandaid good enough
> to keep those tests alive until a better solution is proposed?

Well, the problem is that a test that wants to reliable nuke data needs
to... *drumroll* reliably nuke data.  Which means zeroing or at least
a known pattern.  discard doesn't give you that.

I don't see how a plain discard is going to work for any file system
for that particular case.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
  2020-08-27  6:58   ` Amir Goldstein
@ 2020-08-27  7:38   ` Christoph Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-27  7:38 UTC (permalink / raw)
  To: Brian Foster; +Cc: fstests, linux-xfs

On Wed, Aug 26, 2020 at 10:38:12AM -0400, Brian Foster wrote:
> Several generic fstests use dm-log-writes to test the filesystem for
> consistency at various crash recovery points. dm-log-writes and the
> associated replay mechanism rely on zeroing via discard to clear
> stale blocks when moving to various points in time of the fs. If the
> storage doesn't provide zeroing or the discard requests exceed the
> hardcoded maximum (128MB) of the fallback solution to physically
> write zeroes, stale blocks are left around in the target fs. This
> scheme is known to cause issues on XFS v5 superblocks if recovery
> observes metadata from a future variant of an fs that has been
> replayed to an older point in time. This corrupts the filesystem and
> leads to false test failures.
> 
> generic/482 already works around this problem by using a thin volume
> as the target device, which provides consistent and efficient
> discard zeroing behavior, but other tests have seen similar issues
> on XFS. Add an XFS specific check to the dmlogwrites init time code
> that requires discard zeroing support and otherwise skips the test
> to avoid false positive failures.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  common/dmlogwrites | 10 ++++++++--
>  common/rc          | 14 ++++++++++++++
>  tests/generic/470  |  2 +-
>  3 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index 573f4b8a..92cc6ce2 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt()
>  	_require_test_program "log-writes/replay-log"
>  
>  	local ret=0
> -	local mountopt=$1
> +	local dev=$1
> +	local mountopt=$2
>  
> -	_log_writes_init $SCRATCH_DEV
> +	_log_writes_init $dev
>  	_log_writes_mkfs > /dev/null 2>&1
>  	_log_writes_mount "-o $mountopt" > /dev/null 2>&1
>  	# Check options to be sure.
> @@ -66,6 +67,11 @@ _log_writes_init()
>  	[ -z "$blkdev" ] && _fail \
>  	"block dev must be specified for _log_writes_init"
>  
> +	# XFS requires discard zeroing support on the target device to work
> +	# reliably with dm-log-writes. Use dm-thin devices in tests that want
> +	# to provide reliable discard zeroing support.
> +	[ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev
> +
>  	local BLK_DEV_SIZE=`blockdev --getsz $blkdev`
>  	LOGWRITES_NAME=logwrites-test
>  	LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
> diff --git a/common/rc b/common/rc
> index aa5a7409..fedb5221 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4313,6 +4313,20 @@ _require_mknod()
>  	rm -f $TEST_DIR/$seq.null
>  }
>  
> +# check that discard is supported and subsequent reads return zeroes
> +_require_discard_zeroes()
> +{
> +	local dev=$1
> +
> +	_require_command "$BLKDISCARD_PROG" blkdiscard
> +
> +	$XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 ||
> +		_fail "write error"
> +	$BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support"
> +	hexdump -n 4096 $dev | head -n 1 | grep cdcd &&
> +		_notrun "no discard zeroing support"
> +}

This test is bogus.  a discard may zero parts of the request or all
of it.  It may decided to zero based on the LBA, the physical block
number in the SSD, the phase of the moon or a random number generator.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27  7:29       ` Amir Goldstein
  2020-08-27  7:37         ` Christoph Hellwig
@ 2020-08-27 14:11         ` Brian Foster
       [not found]           ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Brian Foster @ 2020-08-27 14:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Christoph Hellwig, fstests, linux-xfs, Josef Bacik

On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote:
> On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote:
> > > I imagine that ext4 could also be burned by this.
> > > Do we have a reason to limit this requirement to xfs?
> > > I prefer to make it generic.
> >
> > The whole idea that discard zeroes data is just completely broken.
> > Discard is a hint that is explicitly allowed to do nothing.
> 
> I figured you'd say something like that :)
> but since we are talking about dm-thin as a solution for predictable
> behavior at the moment and this sanity check helps avoiding adding
> new tests that can fail to some extent, is the proposed bandaid good enough
> to keep those tests alive until a better solution is proposed?
> 
> Another concern that I have is that perhaps adding dm-thin to the fsx tests
> would change timing of io in a subtle way and hide the original bugs that they
> caught:
> 
> 47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync")
> 3af423b03435 ("xfs: evict CoW fork extents when performing finsert/fcollapse")
> 51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and
> delalloc after a crash")
> 
> Because at the time I (re)wrote Josef's fsx tests, they flushed out the bugs
> on spinning rust much more frequently than on SSD.
> 
> So Brian, if you could verify that the fsx tests still catch the original bugs
> by reverting the fixes or running with an old kernel and probably running
> on a slow disk that would be great.
> 

I made these particular changes because it's how we previously addressed
generic/482 and they were a pretty clear and quick way to address the
problem. I'm pretty sure I've seen these tests reproduce legitimate
issues with or without thin, but that's from memory so I can't confirm
that with certainty or suggest that applies for every regression these
have discovered in the past.

If we want to go down that path, I'd say lets just assume those no
longer reproduce. That doesn't make these tests any less broken on XFS
(v5, at least) today, so I guess I'd drop the thin changes (note again
that generic/482 is already using dmthinp) and just disable these tests
on XFS until we can come up with an acceptable solution to make them
more reliable. That's slightly unfortunate because I think the tests are
quite effective, but we're continuing to see spurious failures that are
not trivial to diagnose.

IIRC, I did at one point experiment with removing the (128M?) physical
zeroing limit from the associated logwrites tool, but that somewhat
predictably caused the test to take an extremely long time. I'm not sure
I even allowed it to finish, tbh. Anyways, perhaps some options are to
allow a larger physical zeroing limit and remove the tests from the auto
group, use smaller target devices to reduce the amount of zeroing
required, require the user to have a thinp or some such volume as a
scratch dev already or otherwise find some other more efficient zeroing
mechanism.

> I know it is not a simple request, but I just don't know how else to trust
> these changes. I guess a quick way for you to avoid the hassle is to add
> _require_discard_zeroes (patch #1) and drop the rest.
> 

That was going to be my fallback suggestion if the changes to the tests
were problematic for whatever reason. Christoph points out that the
discard zeroing logic is not predictable in general as it might be on
dm-thinp, so I think for now we should probably just notrun these on
XFS. I'll send something like that as a v2 of patch 1.

Brian

> Thanks,
> Amir.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27  7:37         ` Christoph Hellwig
@ 2020-08-27 15:57           ` Josef Bacik
  2020-08-27 17:02             ` Christoph Hellwig
  2020-08-29  6:44             ` Christoph Hellwig
  0 siblings, 2 replies; 19+ messages in thread
From: Josef Bacik @ 2020-08-27 15:57 UTC (permalink / raw)
  To: Christoph Hellwig, Amir Goldstein; +Cc: Brian Foster, fstests, linux-xfs

On 8/27/20 3:37 AM, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote:
>> I figured you'd say something like that :)
>> but since we are talking about dm-thin as a solution for predictable
>> behavior at the moment and this sanity check helps avoiding adding
>> new tests that can fail to some extent, is the proposed bandaid good enough
>> to keep those tests alive until a better solution is proposed?
> 
> Well, the problem is that a test that wants to reliable nuke data needs
> to... *drumroll* reliably nuke data.  Which means zeroing or at least
> a known pattern.  discard doesn't give you that.
> 
> I don't see how a plain discard is going to work for any file system
> for that particular case.
> 

This sort of brings up a good point, the whole point of DISCARD support 
in log-writes was to expose problems where we may have been discarding 
real data we cared about, hence adding the forced zero'ing stuff for 
devices that didn't support discard.  But that made the incorrect 
assumption that a drive with actual discard support would actually 
return 0's for discarded data.  That assumption was based on hardware 
that did actually do that, but now we live in the brave new world of 
significantly shittier drives.  Does dm-thinp reliably unmap the ranges 
we discard, and thus give us this zero'ing behavior?  Because we might 
as well just use that for everything so log-writes doesn't have to 
resort to pwrite()'ing zeros everywhere.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27 15:57           ` Josef Bacik
@ 2020-08-27 17:02             ` Christoph Hellwig
  2020-08-27 18:35               ` Brian Foster
  2020-08-29  6:44             ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-27 17:02 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, Amir Goldstein, Brian Foster, fstests, linux-xfs

On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote:
> This sort of brings up a good point, the whole point of DISCARD support in
> log-writes was to expose problems where we may have been discarding real
> data we cared about, hence adding the forced zero'ing stuff for devices that
> didn't support discard.  But that made the incorrect assumption that a drive
> with actual discard support would actually return 0's for discarded data.
> That assumption was based on hardware that did actually do that, but now we
> live in the brave new world of significantly shittier drives.  Does dm-thinp
> reliably unmap the ranges we discard, and thus give us this zero'ing
> behavior?  Because we might as well just use that for everything so
> log-writes doesn't have to resort to pwrite()'ing zeros everywhere.  Thanks,

We have a write zeroes operation in the block layer.  For some devices
this is as efficient as discard, and that should (I think) dm.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27 17:02             ` Christoph Hellwig
@ 2020-08-27 18:35               ` Brian Foster
  2020-08-29  6:46                 ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2020-08-27 18:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Josef Bacik, Amir Goldstein, fstests, linux-xfs

On Thu, Aug 27, 2020 at 06:02:42PM +0100, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote:
> > This sort of brings up a good point, the whole point of DISCARD support in
> > log-writes was to expose problems where we may have been discarding real
> > data we cared about, hence adding the forced zero'ing stuff for devices that
> > didn't support discard.  But that made the incorrect assumption that a drive
> > with actual discard support would actually return 0's for discarded data.
> > That assumption was based on hardware that did actually do that, but now we
> > live in the brave new world of significantly shittier drives.  Does dm-thinp
> > reliably unmap the ranges we discard, and thus give us this zero'ing
> > behavior?  Because we might as well just use that for everything so
> > log-writes doesn't have to resort to pwrite()'ing zeros everywhere.  Thanks,
> 

That's pretty much what this series does. It only modifies the generic
tests because I didn't want to mess with the others (all btrfs, I think)
that might not have any issues, but I wouldn't be opposed to burying the
logic into the dmlogwrites bits so it just always creates a thin volume
behind the scenes. If we were going to do that, I'd prefer to do it as a
follow up to these patches (dropping patch 1, most likely) so at least
they can remain enabled on XFS for the time being.

OTOH, perhaps the thinp behavior could be internal, but conditional
based on XFS. It's not really clear to me if this problem is more of an
XFS phenomenon or just that XFS happens to have some unique recovery
checking logic that explicitly detects it. It seems more like the
latter, but I don't know enough about ext4 or btrfs to say..

> We have a write zeroes operation in the block layer.  For some devices
> this is as efficient as discard, and that should (I think) dm.
> 

Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes
from userspace, but I don't think it's efficient enough to solve this
problem. It takes about 3m to manually zero my 15GB lvm (dm-linear)
scratch device on my test vm via dd using sync writes. A 'blkdiscard -z'
saves me about half that time, but IIRC this is an operation that would
occur every time the logwrites device is replayed to a particular
recovery point (which can happen many times per test).

Brian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
       [not found]           ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com>
@ 2020-08-28 14:10             ` Brian Foster
  0 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-28 14:10 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Christoph Hellwig, Josef Bacik, fstests, linux-xfs

Re-add lists to CC.

On Fri, Aug 28, 2020 at 06:47:06AM +0300, Amir Goldstein wrote:
> On Thu, Aug 27, 2020, 5:11 PM Brian Foster <bfoster@redhat.com> wrote:
> 
> > On Thu, Aug 27, 2020 at 10:29:05AM +0300, Amir Goldstein wrote:
> > > On Thu, Aug 27, 2020 at 10:02 AM Christoph Hellwig <hch@infradead.org>
> > wrote:
> > > >
> > > > On Thu, Aug 27, 2020 at 09:58:09AM +0300, Amir Goldstein wrote:
> > > > > I imagine that ext4 could also be burned by this.
> > > > > Do we have a reason to limit this requirement to xfs?
> > > > > I prefer to make it generic.
> > > >
> > > > The whole idea that discard zeroes data is just completely broken.
> > > > Discard is a hint that is explicitly allowed to do nothing.
> > >
> > > I figured you'd say something like that :)
> > > but since we are talking about dm-thin as a solution for predictable
> > > behavior at the moment and this sanity check helps avoiding adding
> > > new tests that can fail to some extent, is the proposed bandaid good
> > enough
> > > to keep those tests alive until a better solution is proposed?
> > >
> > > Another concern that I have is that perhaps adding dm-thin to the fsx
> > tests
> > > would change timing of io in a subtle way and hide the original bugs
> > that they
> > > caught:
> > >
> > > 47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync")
> > > 3af423b03435 ("xfs: evict CoW fork extents when performing
> > finsert/fcollapse")
> > > 51e3ae81ec58 ("ext4: fix interaction between i_size, fallocate, and
> > > delalloc after a crash")
> > >
> > > Because at the time I (re)wrote Josef's fsx tests, they flushed out the
> > bugs
> > > on spinning rust much more frequently than on SSD.
> > >
> > > So Brian, if you could verify that the fsx tests still catch the
> > original bugs
> > > by reverting the fixes or running with an old kernel and probably running
> > > on a slow disk that would be great.
> > >
> >
> > I made these particular changes because it's how we previously addressed
> > generic/482 and they were a pretty clear and quick way to address the
> > problem. I'm pretty sure I've seen these tests reproduce legitimate
> > issues with or without thin, but that's from memory so I can't confirm
> > that with certainty or suggest that applies for every regression these
> > have discovered in the past.
> >
> > If we want to go down that path, I'd say lets just assume those no
> > longer reproduce. That doesn't make these tests any less broken on XFS
> > (v5, at least) today, so I guess I'd drop the thin changes (note again
> > that generic/482 is already using dmthinp) and just disable these tests
> > on XFS until we can come up with an acceptable solution to make them
> > more reliable. That's slightly unfortunate because I think the tests are
> > quite effective, but we're continuing to see spurious failures that are
> > not trivial to diagnose.
> 
> 
> 
> > IIRC, I did at one point experiment with removing the (128M?) physical
> > zeroing limit from the associated logwrites tool, but that somewhat
> > predictably caused the test to take an extremely long time. I'm not sure
> > I even allowed it to finish, tbh. Anyways, perhaps some options are to
> > allow a larger physical zeroing limit and remove the tests from the auto
> > group, use smaller target devices to reduce the amount of zeroing
> > required, require the user to have a thinp or some such volume as a
> > scratch dev already or otherwise find some other more efficient zeroing
> > mechanism.
> >
> > > I know it is not a simple request, but I just don't know how else to
> > trust
> > > these changes. I guess a quick way for you to avoid the hassle is to add
> > > _require_discard_zeroes (patch #1) and drop the rest.
> > >
> >
> > That was going to be my fallback suggestion if the changes to the tests
> > were problematic for whatever reason. Christoph points out that the
> > discard zeroing logic is not predictable in general as it might be on
> > dm-thinp, so I think for now we should probably just notrun these on
> > XFS. I'll send something like that as a v2 of patch 1.
> >
> 
> Maybe there is a better way to stay safe and not sorry.
> 
> Can we use dm thinp only for replay but not for fsx?
> I suppose reliable zeroing is only needed in replay?
> 

I think that would work, but might clutter or complicate the tests by
using multiple devices for I/O vs. replay. That kind of strikes me as
overkill given that two of these run multiple instances of fsx (which
has a fixed size file) and the other looks like it runs a simple xfs_io
command with a fixed amount of I/O. Ironically, generic/482 seems like
the test with the most I/O overhead (via fsstress), but it's been using
dm-thin for a while now.

I think if we're really that concerned with dm-thin allocation overhead,
we should either configure the cluster size to amortize the cost of it
or just look into using smaller block devices so replay-log falls back
to manual zeroing when discard doesn't work. A quick test of the latter
doesn't show a huge increase in test runtime for 455/457, but I'd also
have to confirm that this works as expected and eliminates the spurious
corruption issue. Of course, a tradeoff could be that if discard does
work but doesn't provide zeroing (which Christoph already explained is
unpredictable), then I think we're still susceptible to the original
problem. Perhaps we could create a mode that simply forces manual
zeroing over discards if there isn't one already...

Brian

> Thanks,
> Amir.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27 15:57           ` Josef Bacik
  2020-08-27 17:02             ` Christoph Hellwig
@ 2020-08-29  6:44             ` Christoph Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-29  6:44 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, Amir Goldstein, Brian Foster, fstests, linux-xfs

On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote:
> > 
> 
> This sort of brings up a good point, the whole point of DISCARD support in
> log-writes was to expose problems where we may have been discarding real
> data we cared about, hence adding the forced zero'ing stuff for devices that
> didn't support discard.  But that made the incorrect assumption that a drive
> with actual discard support would actually return 0's for discarded data.
> That assumption was based on hardware that did actually do that, but now we
> live in the brave new world of significantly shittier drives.  Does dm-thinp
> reliably unmap the ranges we discard, and thus give us this zero'ing
> behavior?  Because we might as well just use that for everything so
> log-writes doesn't have to resort to pwrite()'ing zeros everywhere.  Thanks,

So the right fix is to replace issuing REQ_OP_DISCARD bios with
REQ_OP_WRITE_ZEROES (plus the check for that).  That'll give you the
effective zeroing part if the device supports it, and you can still
fall back to manual zeroing (and blk-lib.c actually has helpers for
that as well).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-27 18:35               ` Brian Foster
@ 2020-08-29  6:46                 ` Christoph Hellwig
  2020-08-30 13:30                   ` Amir Goldstein
  2020-08-31 13:37                   ` Brian Foster
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2020-08-29  6:46 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, Josef Bacik, Amir Goldstein, fstests, linux-xfs

On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote:
> OTOH, perhaps the thinp behavior could be internal, but conditional
> based on XFS. It's not really clear to me if this problem is more of an
> XFS phenomenon or just that XFS happens to have some unique recovery
> checking logic that explicitly detects it. It seems more like the
> latter, but I don't know enough about ext4 or btrfs to say..

The way I understand the tests (and Josefs mail seems to confirm that)
is that it uses discards to ensure data disappears.  Unfortunately
that's only how discard sometimes work, but not all the time.

> > We have a write zeroes operation in the block layer.  For some devices
> > this is as efficient as discard, and that should (I think) dm.
> > 
> 
> Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes
> from userspace, but I don't think it's efficient enough to solve this
> problem. It takes about 3m to manually zero my 15GB lvm (dm-linear)
> scratch device on my test vm via dd using sync writes. A 'blkdiscard -z'
> saves me about half that time, but IIRC this is an operation that would
> occur every time the logwrites device is replayed to a particular
> recovery point (which can happen many times per test).

Are we talking about the block layer interface or the userspace syscall
one?  I though it was the former, in which case REQ_OP_WRITE_ZEROES
is the interface.  User interface is harder - you need to use fallocate
on the block device, but the flags are mapped kinda weird:

FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE guarantees you a
REQ_OP_WRITE_ZEROES, but there is a bunch of other variants that include
fallbacks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-29  6:46                 ` Christoph Hellwig
@ 2020-08-30 13:30                   ` Amir Goldstein
  2020-08-31 13:37                   ` Brian Foster
  1 sibling, 0 replies; 19+ messages in thread
From: Amir Goldstein @ 2020-08-30 13:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, Josef Bacik, fstests, linux-xfs

On Sat, Aug 29, 2020 at 9:47 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote:
> > OTOH, perhaps the thinp behavior could be internal, but conditional
> > based on XFS. It's not really clear to me if this problem is more of an
> > XFS phenomenon or just that XFS happens to have some unique recovery
> > checking logic that explicitly detects it. It seems more like the
> > latter, but I don't know enough about ext4 or btrfs to say..
>
> The way I understand the tests (and Josefs mail seems to confirm that)
> is that it uses discards to ensure data disappears.  Unfortunately
> that's only how discard sometimes work, but not all the time.
>

I think we are confusing two slightly different uses of discard in those tests.
One use case is that dm-logwrites records discards in test workloads and then
needs to replay them to simulate the sequence of IO event up to a crash point.

I think that use case is less interesting, because as Christoph points out,
discard is not reliable, so I think we should get rid of " -o discard"
in the tests -
it did not catch any issues that I know of.

But there is another discard in those tests issued by _log_writes_mkfs
(at least it does for xfs and ext4). This discard has the by product of
making sure that replay from the start to point in time, first wipes all
stale data from the replay block device.

Of course the problems we encounter is that it does not wipe all stale
data when not running on dm-thinp, which is why I suggested to always
use dm-thinp for replay device, but I can live perfectly well with Brian's
v1 patches where both workload and replay are done on dm-thinp.

Josef had two variants for those tests, one doing "replay from start"
for every checkpoint and utilizing this discard-as-wipe behavior
and one flavor that used dm-thinp to take snapshots and replay
from snapshot T to the next mark.

I remember someone tried converting some of the tests to the snapshot
flavor, but it turned out to be slower, so we left it as is (always replay from
the start).

In conclusion, I *think* the correct fix for the failing tests is:
1. Use dm-thinp for all those tests (as v1 does)
2. In _log_writes_replay_log{,_range}() start by explicitly
    wiping dm-thinp, either with with hole punch command or by
    re-creating the new thinp volume, whichever is faster.
    instead of relying on the replay of discard operation recorded
    from mkfs that sort of kind of worked by mistake.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS
  2020-08-29  6:46                 ` Christoph Hellwig
  2020-08-30 13:30                   ` Amir Goldstein
@ 2020-08-31 13:37                   ` Brian Foster
  1 sibling, 0 replies; 19+ messages in thread
From: Brian Foster @ 2020-08-31 13:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Josef Bacik, Amir Goldstein, fstests, linux-xfs

On Sat, Aug 29, 2020 at 07:46:59AM +0100, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 02:35:07PM -0400, Brian Foster wrote:
> > OTOH, perhaps the thinp behavior could be internal, but conditional
> > based on XFS. It's not really clear to me if this problem is more of an
> > XFS phenomenon or just that XFS happens to have some unique recovery
> > checking logic that explicitly detects it. It seems more like the
> > latter, but I don't know enough about ext4 or btrfs to say..
> 
> The way I understand the tests (and Josefs mail seems to confirm that)
> is that it uses discards to ensure data disappears.  Unfortunately
> that's only how discard sometimes work, but not all the time.
> 

I think Amir's followup describes how the infrastructure uses discard
better than I could. I'm not intimately familiar with how it works, so
my goal was to take the same approach as generic/482 and update the
tests to provide the predictable behavior expected by the
infrastructure. If folks want to revisit all of that to improve the
tests to not rely on discard and break that dependency, that seems like
a fine direction, but it also seems that can come later as improvements
to the broader infrastructure.

> > > We have a write zeroes operation in the block layer.  For some devices
> > > this is as efficient as discard, and that should (I think) dm.
> > > 
> > 
> > Do you mean BLKZEROOUT? I see that is more efficient than writing zeroes
> > from userspace, but I don't think it's efficient enough to solve this
> > problem. It takes about 3m to manually zero my 15GB lvm (dm-linear)
> > scratch device on my test vm via dd using sync writes. A 'blkdiscard -z'
> > saves me about half that time, but IIRC this is an operation that would
> > occur every time the logwrites device is replayed to a particular
> > recovery point (which can happen many times per test).
> 
> Are we talking about the block layer interface or the userspace syscall
> one?  I though it was the former, in which case REQ_OP_WRITE_ZEROES
> is the interface.  User interface is harder - you need to use fallocate
> on the block device, but the flags are mapped kinda weird:
> 
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE guarantees you a
> REQ_OP_WRITE_ZEROES, but there is a bunch of other variants that include
> fallbacks.
> 

I was using the BLKZEROOUT ioctl in my previous test because
fallocate(PUNCH_HOLE|KEEP_SIZE) (zeroing offload) isn't supported on
this device. I see similar results as above with
fallocate(PUNCH_HOLE|KEEP_SIZE) though, which seems to fall back to
__blkdev_issue_zero_pages() in that case.

Brian


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-08-31 13:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-26 14:38 [PATCH 0/4] fix up generic dmlogwrites tests to work with XFS Brian Foster
2020-08-26 14:38 ` [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS Brian Foster
2020-08-27  6:58   ` Amir Goldstein
2020-08-27  7:02     ` Christoph Hellwig
2020-08-27  7:29       ` Amir Goldstein
2020-08-27  7:37         ` Christoph Hellwig
2020-08-27 15:57           ` Josef Bacik
2020-08-27 17:02             ` Christoph Hellwig
2020-08-27 18:35               ` Brian Foster
2020-08-29  6:46                 ` Christoph Hellwig
2020-08-30 13:30                   ` Amir Goldstein
2020-08-31 13:37                   ` Brian Foster
2020-08-29  6:44             ` Christoph Hellwig
2020-08-27 14:11         ` Brian Foster
     [not found]           ` <CAOQ4uxj6RKX01kKKc_SGZJegWEKaF+D8ZNJGALvh4o0c5bBcBg@mail.gmail.com>
2020-08-28 14:10             ` Brian Foster
2020-08-27  7:38   ` Christoph Hellwig
2020-08-26 14:38 ` [PATCH 2/4] generic/455: use thin volume for dmlogwrites target device Brian Foster
2020-08-26 14:38 ` [PATCH 3/4] generic/457: " Brian Foster
2020-08-26 14:38 ` [PATCH 4/4] generic/470: " Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).