[PATCH] btrfs/011: handle finished replace properly

From: Qu Wenruo @ 2022-01-10 11:28 UTC
  To: fstests; +Cc: linux-btrfs

[BUG]
When running btrfs/011 inside a VM whose devices use the unsafe cache
mode, and the host has enough memory to cache all the IO, the test
fails:

btrfs/011 98s ... [failed, exit status 1]- output mismatch
    --- tests/btrfs/011.out	2019-10-22 15:18:13.962298674 +0800
    +++ /xfstests-dev/results//btrfs/011.out.bad	2022-01-10 19:12:14.683333251 +0800
    @@ -1,3 +1,4 @@
     QA output created by 011
     *** test btrfs replace
    -*** done
    +failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
    +(see /xfstests-dev/results//btrfs/011.full for details)
    ...
Ran: btrfs/011
Failures: btrfs/011
Failed 1 of 1 tests

[CAUSE]
Although commit fa85aa64 ("btrfs/011: Fill the fs to ensure we
have enough data for dev-replace") tries to address the problem by
filling the fs with extra content, there is still no guarantee that the
replace needs at least 2 seconds to finish.

Thus, even though we tried our best to make sure the replace takes at
least 2 seconds, it can still finish sooner, and the subsequent
"replace cancel" then fails.

To show how fast the test can run in such a setup: with the fix
applied, the test takes around 90~100 seconds there, while on real
hardware it can take over 1000 seconds.

[FIX]
Instead of further enlarging the IO, just accept the fact that the
replace can finish faster than we expect, and continue the test.

Note that when the replace has finished, we need to replace the device
back. Otherwise the later fsck will run on a blank device and raise a
false alert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/011 | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
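
The branch added below can be read as a small classifier over the text
saved from 'replace status'. The following is only a sketch of that
decision, not code from the patch: the helper name
classify_replace_status and its return convention are made up for
illustration, and it only looks for the same two keywords the test
greps for.

```shell
#!/bin/bash
# Hypothetical helper (not part of tests/btrfs/011): classify saved
# 'btrfs replace status' output the same way the patched test does.
classify_replace_status()
{
	local status_file="$1"

	if grep -q 'finished' "$status_file"; then
		# The replace completed before the cancel took effect;
		# the caller has to replace the device back.
		echo "finished"
	elif grep -q 'canceled' "$status_file"; then
		# The expected outcome of the cancel test.
		echo "canceled"
	else
		# Neither keyword was found; the test would _fail here.
		echo "unknown"
		return 1
	fi
}
```

In the test this would be called on $tmp.tmp, and only the "finished"
result needs the extra replace in the reverse direction.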

diff --git a/tests/btrfs/011 b/tests/btrfs/011
index b4673341..aae89696 100755
--- a/tests/btrfs/011
+++ b/tests/btrfs/011
@@ -171,13 +171,24 @@ btrfs_replace_test()
 		# background the replace operation (no '-B' option given)
 		_run_btrfs_util_prog replace start -f $replace_options $source_dev $target_dev $SCRATCH_MNT
 		sleep $wait_time
-		_run_btrfs_util_prog replace cancel $SCRATCH_MNT
+		$BTRFS_UTIL_PROG replace cancel $SCRATCH_MNT >> $seqres.full 2>&1
 
 		# 'replace status' waits for the replace operation to finish
 		# before the status is printed
 		$BTRFS_UTIL_PROG replace status $SCRATCH_MNT > $tmp.tmp 2>&1
 		cat $tmp.tmp >> $seqres.full
-		grep -q canceled $tmp.tmp || _fail "btrfs replace status (canceled) failed"
+
+		# The replace may have finished before we could cancel it
+		if grep -q 'finished' $tmp.tmp ; then
+			# The replace finished, we need to replace it back or
+			# later fsck will report error as $SCRATCH_DEV is now
+			# blank
+			$BTRFS_UTIL_PROG replace start -Bf $target_dev \
+				$source_dev $SCRATCH_MNT > /dev/null
+		else
+			grep -q 'canceled' $tmp.tmp || _fail \
+				"btrfs replace status (canceled) failed"
+		fi
 	else
 		if [ "${quick}Q" = "thoroughQ" ]; then
 			# The thorough test runs around 2 * $wait_time seconds.
-- 
2.34.1

