On 2019/9/1 上午2:10, Eryu Guan wrote: > On Mon, Aug 26, 2019 at 02:20:45PM +0800, Qu Wenruo wrote: >> We have generic dm-logwrites with fsstress test case (generic/482), but >> it doesn't cover fs specific operations like btrfs snapshot creation and >> deletion. >> >> Furthermore, that test is not heavy enough to bump btrfs tree height by >> its short runtime. >> >> And finally, btrfs check doesn't consider dirty log as an error, unlike >> ext*/xfs, that's to say we don't need to mount the fs to replay the log, >> but just run btrfs check on the fs is enough. >> >> So introduce a similar test case but for btrfs only. >> >> The test case will stress btrfs by: >> - Use small nodesize to bump tree height >> - Create a base tree which is already high enough >> - Trim tree blocks to find possible trim bugs >> - Call snapshot creation and deletion along with fsstress >> >> To utilize replay-log --check and --fsck command, we fix one bug in >> replay-log first: >> - Return 1 when fsck failed >> Original when fsck failed, run_fsck() returns -1, but to make >> replay_log prog to return 1, we need to return a minus value, so >> fix it by setting @ret to -EUCLEAN when run_fsck() failed, so that >> we can detect the fsck failure by simply checking the return value >> of replay-log. > > Sorry, I didn't quite get this. run_fsck() already returns a negative > value (-1) on fsck failure (thus @ret is -1 in this case), and > replay_log exits with 1 if @ret < 0. All seem fine to me, setting > -EUCLEAN doesn't seem necessary to me. Did I miss anything? My bad, it looks like so. > > Anyway, I think this bugfix could be in a separate patch. > >> >> Also it includes certain workaround for btrfs: >> - Use no-holes feature >> To avoid missing hole file extents. >> Although that behavior doesn't follow the on-disk format spec, it >> doesn't cause data loss. And will follow the new on-disk format spec >> of no-holes feature, so it's better to workaround it. >> >> And an optimization for btrfs only: >> - Use replay-log --fsck/--check command >> Since dm-log-writes records bios sequentially, there is no way to >> locate certain entry unless we iterate all entries. >> This is becoming a big performance penalty if we replay certain a >> range, check the fs, then re-execute replay-log to replay another >> range. >> >> We need to records the previous entry location, or we need to >> re-iterate all previous entries. >> >> Thankfully, replay-log has already address it by providing --fsck and >> --check command, thus we don't need to break replay-log command. >> >> Please note, for fast storage (e.g. fast NVME or unsafe cache mode), >> it's recommended to use log devices larger than 15G, or we can't record >> the full log of just 30s fsstress run. >> >> Signed-off-by: Qu Wenruo >> --- >> For the log devices size problem, I have submitted dm-logwrites bio flag >> filter support, to filter out data bios. >> >> But that is not yet merged into kernel, thus we need a large log device >> for short run. >> >> For reference, if using unsafe cache mode for all test devices, on a >> system with 32G dual-channel DDR4 3200 RAM, 5G log device will be >> filled up in less than 15 seconds. >> So to ensure dm-log-writes covers all the operations, one needs at least >> 15G log device, and even more if using RAM with more channels. >> >> Changelog: >> v2 >> - Better expression/words for comment >> - Add requirement for no-holes features >> - Use xattr to bump up tree height >> So no need for max_inline mount option >> - Coding style fixes for function definition >> - Add -f for rm to avoid user alias setting >> - Add new workload (update time stamp and create new files) for snapshot >> workload >> - Remove an unnecessary sync call >> - Get rid of wrong 2>&1 redirection >> - Add to group "snapshot" and "stress" >> >> v3: >> - Add '_require_attrs' and source common/attr >> - Introduce '_require_fsck_not_report_dirty_logs_as_error' >> - Add comment for the replay-log code fix >> - Wait after killing all background fsstress >> - Use $BLKDISCARD_PROG instead of plain 'blkdiscard' >> - Add trap for snapshot and delete workload >> --- >> common/config | 1 + >> common/dmlogwrites | 44 ++++++++++ >> src/log-writes/replay-log.c | 2 + >> tests/btrfs/192 | 156 ++++++++++++++++++++++++++++++++++++ >> tests/btrfs/192.out | 2 + >> tests/btrfs/group | 1 + >> 6 files changed, 206 insertions(+) >> create mode 100755 tests/btrfs/192 >> create mode 100644 tests/btrfs/192.out >> >> diff --git a/common/config b/common/config >> index bd64be62..4c86a492 100644 >> --- a/common/config >> +++ b/common/config >> @@ -183,6 +183,7 @@ export LOGGER_PROG="$(type -P logger)" >> export DBENCH_PROG="$(type -P dbench)" >> export DMSETUP_PROG="$(type -P dmsetup)" >> export WIPEFS_PROG="$(type -P wipefs)" >> +export BLKDISCARD_PROG="$(type -P blkdiscard)" >> export DUMP_PROG="$(type -P dump)" >> export RESTORE_PROG="$(type -P restore)" >> export LVM_PROG="$(type -P lvm)" >> diff --git a/common/dmlogwrites b/common/dmlogwrites >> index ae2cbc6a..474ec570 100644 >> --- a/common/dmlogwrites >> +++ b/common/dmlogwrites >> @@ -175,3 +175,47 @@ _log_writes_replay_log_range() >> >> $seqres.full 2>&1 >> [ $? -ne 0 ] && _fail "replay failed" >> } >> + >> +# Require fsck not to report dirty logs as error >> +# >> +# This is a special requirement to use _log_writes_fast_replay_check >> +# The reasons are: >> +# - To avoid unnecessary seek when there are a lot of entries >> +# replay-log doesn't have a tree-like structure to do fast index, >> +# thus it iterate all entries one by one, this can be very slow >> +# - No way to revert the log replay for next check >> +# A lot of fsck will replay the log, which will pollute the replay device >> +# for next entry >> +_require_fsck_not_report_dirty_logs_as_error() >> +{ >> + if [ $FSTYP != "btrfs" ]; then >> + _notrun "fsck of $FSTYP reports dirty jounal/log as error, skipping test" >> + fi >> +} > > The rule name seems ugly :) > >> + >> +# Replay and check each fua/flush (specified by $2) point. >> +# >> +# Since dm-log-writes records bio sequentially, even just replaying a range >> +# still needs to iterate all records before the end point. >> +# When number of records grows, it will be unacceptably slow, thus we need >> +# to use relay-log itself to trigger fsck, avoid unnecessary seek. >> +_log_writes_fast_replay_check() >> +{ >> + local check_point=$1 >> + local blkdev=$2 >> + local fsck_command >> + >> + _require_fsck_not_report_dirty_logs_as_error >> + >> + [ -z "$check_point" -o -z "$blkdev" ] && _fail \ >> + "check_point and blkdev must be specified for _log_writes_fast_replay_check" >> + case $FSTYP in >> + btrfs) >> + fsck_command="$BTRFS_UTIL_PROG check $blkdev" >> + ;; >> + esac >> + $here/src/log-writes/replay-log --log $LOGWRITES_DEV \ >> + --replay $blkdev --check $check_point --fsck "$fsck_command" \ >> + 2>&1 | tail -n 128 >> $seqres.full >> + [ $? -ne 0 ] && _fail "fsck failed during replay" >> +} > > And I think we could make _log_writes_fast_replay_check(), which seems > only useful to btrfs, a local function in the test, so we avoid all > these $FSTYP == btrfs checks. Yep, that's much better. Thanks, Qu > > Thanks, > Eryu > >> diff --git a/src/log-writes/replay-log.c b/src/log-writes/replay-log.c >> index 829b18e2..1e1cd524 100644 >> --- a/src/log-writes/replay-log.c >> +++ b/src/log-writes/replay-log.c >> @@ -1,5 +1,6 @@ >> // SPDX-License-Identifier: GPL-2.0 >> #include >> +#include >> #include >> #include >> #include >> @@ -375,6 +376,7 @@ int main(int argc, char **argv) >> fprintf(stderr, "Fsck errored out on entry " >> "%llu\n", >> (unsigned long long)log->cur_entry - 1); >> + ret = -EUCLEAN; >> break; >> } >> } >> diff --git a/tests/btrfs/192 b/tests/btrfs/192 >> new file mode 100755 >> index 00000000..db9bc40e >> --- /dev/null >> +++ b/tests/btrfs/192 >> @@ -0,0 +1,156 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved. >> +# >> +# FS QA Test 192 >> +# >> +# Test btrfs consistency after each FUA for a workload with snapshot creation >> +# and removal >> +# >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + kill -q $pid1 &> /dev/null >> + kill -q $pid2 &> /dev/null >> + "$KILLALL_PROG" -q $FSSTRESS_PROG &> /dev/null >> + wait >> + _log_writes_cleanup &> /dev/null >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/attr >> +. ./common/dmlogwrites >> + >> +# remove previous $seqres.full before test >> +rm -f $seqres.full >> + >> +# real QA test starts here >> + >> +# Modify as appropriate. >> +_supported_fs btrfs >> +_supported_os Linux >> + >> +_require_command "$KILLALL_PROG" killall >> +_require_command "$BLKDISCARD_PROG" blkdiscard >> +_require_btrfs_fs_feature "no_holes" >> +_require_btrfs_mkfs_feature "no-holes" >> +_require_fsck_not_report_dirty_logs_as_error >> +_require_log_writes >> +_require_scratch >> +_require_attrs >> + >> +# To generate 3 level fs tree for 64K nodesize, we need 32768 xattr items. >> +# That will cause too many transactions, bumping replay check time >> +# from ~60s to ~300s. (VM alreayd using unsafe cache for the test devices) >> +# So here we skip non-4K page size system, in favor of a shorter default >> +# test time >> +if [ $(get_page_size) -ne 4096 ]; then >> + _notrun "This test doesn't support non-4K page size yet" >> +fi >> + >> +runtime=30 >> +nr_cpus=$("$here/src/feature" -o) >> +# cap nr_cpus to 8 to avoid spending too much time on hosts with many cpus >> +if [ $nr_cpus -gt 8 ]; then >> + nr_cpus=8 >> +fi >> +fsstress_args=$(_scale_fsstress_args -w -d $SCRATCH_MNT -n 99999 -p $nr_cpus \ >> + $FSSTRESS_AVOID) >> +_log_writes_init $SCRATCH_DEV >> + >> +# Discard the whole devices so when some tree pointer is wrong, it won't point >> +# to some older valid tree blocks, so we can detect it. >> +$BLKDISCARD_PROG $LOGWRITES_DMDEV > /dev/null 2>&1 >> + >> +# Workaround minor file extent discountinous. >> +# And use 4K nodesize to bump tree height. >> +_log_writes_mkfs -O no-holes -n 4k >> $seqres.full >> +_log_writes_mount >> + >> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null >> +mkdir -p $SCRATCH_MNT/snapshots >> +mkdir -p $SCRATCH_MNT/src/padding >> + >> +random_file() >> +{ >> + local basedir=$1 >> + echo "$basedir/$(ls $basedir | sort -R | tail -1)" >> +} >> + >> +snapshot_workload() >> +{ >> + trap "wait; exit" SIGTERM >> + >> + local i=0 >> + while true; do >> + $BTRFS_UTIL_PROG subvolume snapshot \ >> + $SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \ >> + > /dev/null >> + # Do something small to make snapshots different >> + rm -f "$(random_file $SCRATCH_MNT/src/padding)" >> + rm -f "$(random_file $SCRATCH_MNT/src/padding)" >> + touch "$(random_file $SCRATCH_MNT/src/padding)" >> + touch "$SCRATCH_MNT/src/padding/random_$RANDOM" >> + >> + i=$(($i + 1)) >> + sleep 1 >> + done >> +} >> + >> +delete_workload() >> +{ >> + trap "wait; exit" SIGTERM >> + >> + while true; do >> + sleep 2 >> + $BTRFS_UTIL_PROG subvolume delete \ >> + "$(random_file $SCRATCH_MNT/snapshots)" \ >> + > /dev/null 2>&1 >> + done >> +} >> + >> +xattr_value=$(printf '%0.sX' $(seq 1 3800)) >> + >> +# Bumping tree height to level 2. >> +for ((i = 0; i < 64; i++)); do >> + touch "$SCRATCH_MNT/src/padding/$i" >> + $SETFATTR_PROG -n 'user.x1' -v $xattr_value \ >> + "$SCRATCH_MNT/src/padding/$i" >> +done >> + >> +_log_writes_mark prepare >> + >> +snapshot_workload & >> +pid1=$! >> +delete_workload & >> +pid2=$! >> + >> +"$FSSTRESS_PROG" $fsstress_args > /dev/null & >> +sleep $runtime >> + >> +"$KILLALL_PROG" -q "$FSSTRESS_PROG" &> /dev/null >> +kill $pid1 &> /dev/null >> +kill $pid2 &> /dev/null >> +wait >> +_log_writes_unmount >> +_log_writes_remove >> + >> +_log_writes_fast_replay_check fua "$SCRATCH_DEV" >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/192.out b/tests/btrfs/192.out >> new file mode 100644 >> index 00000000..6779aa77 >> --- /dev/null >> +++ b/tests/btrfs/192.out >> @@ -0,0 +1,2 @@ >> +QA output created by 192 >> +Silence is golden >> diff --git a/tests/btrfs/group b/tests/btrfs/group >> index 2474d43e..cab10d19 100644 >> --- a/tests/btrfs/group >> +++ b/tests/btrfs/group >> @@ -194,3 +194,4 @@ >> 189 auto quick send clone >> 190 auto quick replay balance qgroup >> 191 auto quick send dedupe >> +192 auto replay snapshot stress >> -- >> 2.22.0 >>