* [PATCH 0/4] aborted fstests may leave frozen fs behind @ 2022-06-19 13:46 Amir Goldstein 2022-06-19 13:46 ` [PATCH 1/4] fstests: add missing _require_freeze() to tests Amir Goldstein ` (3 more replies) 0 siblings, 4 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-19 13:46 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests Zorro, I was analyzing a flakey test failure of xfs/517 and fell down a rabbit hole of frzon fs cleanups. Patch 4 fixes the false positive I was running into occasionnaly in cases where xfs/517 was completed successfully. While testing, I found out that if I interrupt the run of xfs/517 this often leaves the fs frozen. So I did an audit of the tests that freeze the fs and found that very few of them unfreeze fs on interrupt. This is an attempt to provide a robust solution that minimizes the risks of that happenning. I tested that all the tests in the 'freeze' group that I modified run to completion. I did not try to interrupt all the rest of the tests, because interrupting a test at exact time that it is frozen is quite hard. I did test that interrupting xfs/517 several times did not leave the fs frozen. Before the changes, it was rather easy to get it to leave a frozen fs behind, because the freeze_loop() keeps the fs frozon around half of the time that the test is running. As far as I can tell, this series is orthogonal to Dave's _cleanup() work [1]. It adds missing cleanups to tests (e.g. xfs/422) that could later make use of the common cleanup helpers (i.e. _fsstress_cleanup). However, the common _fsstress_cleanup() helper will need to kill without waiting. To my understand, Dave was also against waiting in the cleanup helper. Thanks, Amir. [1] https://lore.kernel.org/fstests/20220524073411.1943480-1-david@fromorbit.com/ Amir Goldstein (4): fstests: add missing _require_freeze() to tests fstests: make sure to unfreeze test and scratch mounts xfs/{422,517}: add missing killall to _cleanup() xfs/{422,517}: fix false positive failure check | 14 ++++++++------ common/rc | 5 +++-- tests/generic/390 | 2 -- tests/xfs/011 | 2 -- tests/xfs/119 | 1 + tests/xfs/318 | 3 ++- tests/xfs/325 | 3 ++- tests/xfs/422 | 16 ++++++++++++++-- tests/xfs/438 | 2 +- tests/xfs/517 | 8 ++++++-- 10 files changed, 37 insertions(+), 19 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/4] fstests: add missing _require_freeze() to tests 2022-06-19 13:46 [PATCH 0/4] aborted fstests may leave frozen fs behind Amir Goldstein @ 2022-06-19 13:46 ` Amir Goldstein 2022-06-23 18:00 ` Darrick J. Wong 2022-06-19 13:46 ` [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts Amir Goldstein ` (2 subsequent siblings) 3 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-19 13:46 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests And add a few tests that use freeze to the freeze group Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- tests/xfs/119 | 1 + tests/xfs/318 | 3 ++- tests/xfs/325 | 3 ++- tests/xfs/422 | 3 ++- tests/xfs/438 | 2 +- tests/xfs/517 | 1 + 6 files changed, 9 insertions(+), 4 deletions(-) diff --git a/tests/xfs/119 b/tests/xfs/119 index a1180371..b6f96601 100755 --- a/tests/xfs/119 +++ b/tests/xfs/119 @@ -20,6 +20,7 @@ _begin_fstest log v2log auto freeze _supported_fs xfs _require_scratch +_require_freeze # this may hang sync diff --git a/tests/xfs/318 b/tests/xfs/318 index 38c7aa60..be93f9ab 100755 --- a/tests/xfs/318 +++ b/tests/xfs/318 @@ -7,7 +7,7 @@ # Simulate free extent errors with a file write and a file remove. # . ./common/preamble -_begin_fstest auto quick rw +_begin_fstest auto quick rw freeze # Override the default cleanup function. _cleanup() @@ -26,6 +26,7 @@ _supported_fs xfs _require_scratch _require_error_injection _require_xfs_io_error_injection "rmap_finish_one" +_require_freeze blksz=65536 blks=64 diff --git a/tests/xfs/325 b/tests/xfs/325 index 5b26b2b3..c6861fbc 100755 --- a/tests/xfs/325 +++ b/tests/xfs/325 @@ -8,7 +8,7 @@ # Inject an error during extent freeing to test log recovery. # . ./common/preamble -_begin_fstest auto quick clone +_begin_fstest auto quick clone freeze # Override the default cleanup function. _cleanup() @@ -29,6 +29,7 @@ _require_cp_reflink _require_scratch_reflink _require_error_injection _require_xfs_io_error_injection "free_extent" +_require_freeze blksz=65536 blks=30 diff --git a/tests/xfs/422 b/tests/xfs/422 index 175253aa..a83a66df 100755 --- a/tests/xfs/422 +++ b/tests/xfs/422 @@ -9,7 +9,7 @@ # activity, so we can't have userspace wandering in and thawing it. # . ./common/preamble -_begin_fstest dangerous_scrub dangerous_online_repair +_begin_fstest dangerous_scrub dangerous_online_repair freeze _register_cleanup "_cleanup" BUS @@ -24,6 +24,7 @@ _require_xfs_scratch_rmapbt _require_xfs_io_command "scrub" _require_xfs_io_error_injection "force_repair" _require_command "$KILLALL_PROG" killall +_require_freeze echo "Format and populate" _scratch_mkfs > "$seqres.full" 2>&1 diff --git a/tests/xfs/438 b/tests/xfs/438 index c3008b1c..cfe75bd8 100755 --- a/tests/xfs/438 +++ b/tests/xfs/438 @@ -21,7 +21,7 @@ # Fixed by upstream commit 373b058 ("xfs: Properly retry failed dquot # items in case of error during buffer writeback") . ./common/preamble -_begin_fstest auto quick quota +_begin_fstest auto quick quota freeze # Override the default cleanup function. _cleanup() diff --git a/tests/xfs/517 b/tests/xfs/517 index 88c4f43b..f7f9a8a2 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -29,6 +29,7 @@ _supported_fs xfs _require_xfs_scratch_rmapbt _require_xfs_io_command "fsmap" _require_command "$KILLALL_PROG" killall +_require_freeze echo "Format and populate" _scratch_mkfs > "$seqres.full" 2>&1 -- 2.25.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 1/4] fstests: add missing _require_freeze() to tests 2022-06-19 13:46 ` [PATCH 1/4] fstests: add missing _require_freeze() to tests Amir Goldstein @ 2022-06-23 18:00 ` Darrick J. Wong 2022-06-24 4:05 ` Amir Goldstein 0 siblings, 1 reply; 24+ messages in thread From: Darrick J. Wong @ 2022-06-23 18:00 UTC (permalink / raw) To: Amir Goldstein; +Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests On Sun, Jun 19, 2022 at 04:46:54PM +0300, Amir Goldstein wrote: > And add a few tests that use freeze to the freeze group > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > tests/xfs/119 | 1 + > tests/xfs/318 | 3 ++- > tests/xfs/325 | 3 ++- > tests/xfs/422 | 3 ++- > tests/xfs/438 | 2 +- > tests/xfs/517 | 1 + > 6 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/tests/xfs/119 b/tests/xfs/119 > index a1180371..b6f96601 100755 > --- a/tests/xfs/119 > +++ b/tests/xfs/119 > @@ -20,6 +20,7 @@ _begin_fstest log v2log auto freeze > _supported_fs xfs > > _require_scratch > +_require_freeze > > # this may hang > sync > diff --git a/tests/xfs/318 b/tests/xfs/318 > index 38c7aa60..be93f9ab 100755 > --- a/tests/xfs/318 > +++ b/tests/xfs/318 > @@ -7,7 +7,7 @@ > # Simulate free extent errors with a file write and a file remove. > # > . ./common/preamble > -_begin_fstest auto quick rw > +_begin_fstest auto quick rw freeze Not sure why these tests (318, 325, 422, 438) are being added to the freeze group -- they use freeze to force xfs to do background tasks (inodegc, initializing quota, etc), but they are not themselves functionality testing for filesystem freezing. The _require_freeze additions look fine though. --D > > # Override the default cleanup function. > _cleanup() > @@ -26,6 +26,7 @@ _supported_fs xfs > _require_scratch > _require_error_injection > _require_xfs_io_error_injection "rmap_finish_one" > +_require_freeze > > blksz=65536 > blks=64 > diff --git a/tests/xfs/325 b/tests/xfs/325 > index 5b26b2b3..c6861fbc 100755 > --- a/tests/xfs/325 > +++ b/tests/xfs/325 > @@ -8,7 +8,7 @@ > # Inject an error during extent freeing to test log recovery. > # > . ./common/preamble > -_begin_fstest auto quick clone > +_begin_fstest auto quick clone freeze > > # Override the default cleanup function. > _cleanup() > @@ -29,6 +29,7 @@ _require_cp_reflink > _require_scratch_reflink > _require_error_injection > _require_xfs_io_error_injection "free_extent" > +_require_freeze > > blksz=65536 > blks=30 > diff --git a/tests/xfs/422 b/tests/xfs/422 > index 175253aa..a83a66df 100755 > --- a/tests/xfs/422 > +++ b/tests/xfs/422 > @@ -9,7 +9,7 @@ > # activity, so we can't have userspace wandering in and thawing it. > # > . ./common/preamble > -_begin_fstest dangerous_scrub dangerous_online_repair > +_begin_fstest dangerous_scrub dangerous_online_repair freeze > > _register_cleanup "_cleanup" BUS > > @@ -24,6 +24,7 @@ _require_xfs_scratch_rmapbt > _require_xfs_io_command "scrub" > _require_xfs_io_error_injection "force_repair" > _require_command "$KILLALL_PROG" killall > +_require_freeze > > echo "Format and populate" > _scratch_mkfs > "$seqres.full" 2>&1 > diff --git a/tests/xfs/438 b/tests/xfs/438 > index c3008b1c..cfe75bd8 100755 > --- a/tests/xfs/438 > +++ b/tests/xfs/438 > @@ -21,7 +21,7 @@ > # Fixed by upstream commit 373b058 ("xfs: Properly retry failed dquot > # items in case of error during buffer writeback") > . ./common/preamble > -_begin_fstest auto quick quota > +_begin_fstest auto quick quota freeze > > # Override the default cleanup function. > _cleanup() > diff --git a/tests/xfs/517 b/tests/xfs/517 > index 88c4f43b..f7f9a8a2 100755 > --- a/tests/xfs/517 > +++ b/tests/xfs/517 > @@ -29,6 +29,7 @@ _supported_fs xfs > _require_xfs_scratch_rmapbt > _require_xfs_io_command "fsmap" > _require_command "$KILLALL_PROG" killall > +_require_freeze > > echo "Format and populate" > _scratch_mkfs > "$seqres.full" 2>&1 > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/4] fstests: add missing _require_freeze() to tests 2022-06-23 18:00 ` Darrick J. Wong @ 2022-06-24 4:05 ` Amir Goldstein 2022-06-24 6:27 ` Zorro Lang 0 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-24 4:05 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests On Thu, Jun 23, 2022 at 9:00 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Sun, Jun 19, 2022 at 04:46:54PM +0300, Amir Goldstein wrote: > > And add a few tests that use freeze to the freeze group > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > tests/xfs/119 | 1 + > > tests/xfs/318 | 3 ++- > > tests/xfs/325 | 3 ++- > > tests/xfs/422 | 3 ++- > > tests/xfs/438 | 2 +- > > tests/xfs/517 | 1 + > > 6 files changed, 9 insertions(+), 4 deletions(-) > > > > diff --git a/tests/xfs/119 b/tests/xfs/119 > > index a1180371..b6f96601 100755 > > --- a/tests/xfs/119 > > +++ b/tests/xfs/119 > > @@ -20,6 +20,7 @@ _begin_fstest log v2log auto freeze > > _supported_fs xfs > > > > _require_scratch > > +_require_freeze > > > > # this may hang > > sync > > diff --git a/tests/xfs/318 b/tests/xfs/318 > > index 38c7aa60..be93f9ab 100755 > > --- a/tests/xfs/318 > > +++ b/tests/xfs/318 > > @@ -7,7 +7,7 @@ > > # Simulate free extent errors with a file write and a file remove. > > # > > . ./common/preamble > > -_begin_fstest auto quick rw > > +_begin_fstest auto quick rw freeze > > Not sure why these tests (318, 325, 422, 438) are being added to the > freeze group -- they use freeze to force xfs to do background tasks > (inodegc, initializing quota, etc), but they are not themselves > functionality testing for filesystem freezing. True. I was hesitant about that part myself. But think about it this way - One of the reasons is that when developers change some code in the vicinity of freeze they want to run a fast smoke test with tests the exercise freeze. In this perspective, it seems useful that the freeze smoke test will also exercise freeze when used as a trigger for internal xfs tasks. That was my thinking, but I have no strong feelings either way, so if others don't like it, I can drop this part. > > The _require_freeze additions look fine though. > Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/4] fstests: add missing _require_freeze() to tests 2022-06-24 4:05 ` Amir Goldstein @ 2022-06-24 6:27 ` Zorro Lang 0 siblings, 0 replies; 24+ messages in thread From: Zorro Lang @ 2022-06-24 6:27 UTC (permalink / raw) To: Amir Goldstein; +Cc: Darrick J. Wong, Darrick J . Wong, Dave Chinner, fstests On Fri, Jun 24, 2022 at 07:05:46AM +0300, Amir Goldstein wrote: > On Thu, Jun 23, 2022 at 9:00 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > On Sun, Jun 19, 2022 at 04:46:54PM +0300, Amir Goldstein wrote: > > > And add a few tests that use freeze to the freeze group > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > tests/xfs/119 | 1 + > > > tests/xfs/318 | 3 ++- > > > tests/xfs/325 | 3 ++- > > > tests/xfs/422 | 3 ++- > > > tests/xfs/438 | 2 +- > > > tests/xfs/517 | 1 + > > > 6 files changed, 9 insertions(+), 4 deletions(-) > > > > > > diff --git a/tests/xfs/119 b/tests/xfs/119 > > > index a1180371..b6f96601 100755 > > > --- a/tests/xfs/119 > > > +++ b/tests/xfs/119 > > > @@ -20,6 +20,7 @@ _begin_fstest log v2log auto freeze > > > _supported_fs xfs > > > > > > _require_scratch > > > +_require_freeze > > > > > > # this may hang > > > sync > > > diff --git a/tests/xfs/318 b/tests/xfs/318 > > > index 38c7aa60..be93f9ab 100755 > > > --- a/tests/xfs/318 > > > +++ b/tests/xfs/318 > > > @@ -7,7 +7,7 @@ > > > # Simulate free extent errors with a file write and a file remove. > > > # > > > . ./common/preamble > > > -_begin_fstest auto quick rw > > > +_begin_fstest auto quick rw freeze > > > > Not sure why these tests (318, 325, 422, 438) are being added to the > > freeze group -- they use freeze to force xfs to do background tasks > > (inodegc, initializing quota, etc), but they are not themselves > > functionality testing for filesystem freezing. > > True. I was hesitant about that part myself. > But think about it this way - > > One of the reasons is that when developers change > some code in the vicinity of freeze they want to run a fast > smoke test with tests the exercise freeze. > In this perspective, it seems useful that the freeze smoke test > will also exercise freeze when used as a trigger for internal > xfs tasks. > > That was my thinking, but I have no strong feelings > either way, so if others don't like it, I can drop this part. I'm good to with/without (318, 325, 422, 438) in freeze group. If this change doesn't bring in troubles to Darrick and others, I think you can keep current version. Thanks, Zorro > > > > > The _require_freeze additions look fine though. > > > > Thanks, > Amir. > ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-19 13:46 [PATCH 0/4] aborted fstests may leave frozen fs behind Amir Goldstein 2022-06-19 13:46 ` [PATCH 1/4] fstests: add missing _require_freeze() to tests Amir Goldstein @ 2022-06-19 13:46 ` Amir Goldstein 2022-06-20 11:17 ` Zorro Lang 2022-06-20 22:06 ` Dave Chinner 2022-06-19 13:46 ` [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() Amir Goldstein 2022-06-19 13:46 ` [PATCH 4/4] xfs/{422,517}: fix false positive failure Amir Goldstein 3 siblings, 2 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-19 13:46 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests Almost all of the tests that _require_freeze() fail to unfreeze scratch mount in case the test is interrupted while fs is frozen. Move the handling of unfreeze to generic check code. For now, tests only freeze scratch fs, but to be more robust, unfreeze both test and scratch fs following a call to _require_freeze(). Tests could still hang if thier private _cleanup() routine tries to modify the frozen fs or wait for a blocked process. Fix the _cleanup() routine of xfs/011 to avoid that. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- check | 14 ++++++++------ common/rc | 5 +++-- tests/generic/390 | 2 -- tests/xfs/011 | 2 -- tests/xfs/517 | 1 - 5 files changed, 11 insertions(+), 13 deletions(-) diff --git a/check b/check index de11b37e..d6ee71aa 100755 --- a/check +++ b/check @@ -527,17 +527,21 @@ _check_filesystems() { local ret=0 + # Make sure both test and scratch are unfrozen post _require_freeze() + if [ -f ${RESULT_DIR}/require_freeze ]; then + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 + fi if [ -f ${RESULT_DIR}/require_test ]; then _check_test_fs || ret=1 - rm -f ${RESULT_DIR}/require_test* else _test_unmount 2> /dev/null fi if [ -f ${RESULT_DIR}/require_scratch ]; then _check_scratch_fs || ret=1 - rm -f ${RESULT_DIR}/require_scratch* fi _scratch_unmount 2> /dev/null + rm -f ${RESULT_DIR}/require_* return $ret } @@ -783,8 +787,7 @@ function run_section() seqres="$REPORT_DIR/$seqnum" mkdir -p $RESULT_DIR - rm -f ${RESULT_DIR}/require_scratch* - rm -f ${RESULT_DIR}/require_test* + rm -f ${RESULT_DIR}/require_* echo -n "$seqnum" if $showme; then @@ -882,8 +885,7 @@ function run_section() _dump_err_cont "[failed, exit status $sts]" _test_unmount 2> /dev/null _scratch_unmount 2> /dev/null - rm -f ${RESULT_DIR}/require_test* - rm -f ${RESULT_DIR}/require_scratch* + rm -f ${RESULT_DIR}/require_* err=true else # The test apparently passed, so check for corruption diff --git a/common/rc b/common/rc index 3c072c16..b87dfe05 100644 --- a/common/rc +++ b/common/rc @@ -1540,8 +1540,7 @@ _notrun() { echo "$*" > $seqres.notrun echo "$seq not run: $*" - rm -f ${RESULT_DIR}/require_test* - rm -f ${RESULT_DIR}/require_scratch* + rm -f ${RESULT_DIR}/require_* status=0 exit @@ -3648,6 +3647,8 @@ _require_freeze() local result=$? xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 [ $result -eq 0 ] || _notrun "$FSTYP does not support freezing" + # Make sure both test and scratch are unfrozen at the end of the test + touch ${RESULT_DIR}/require_freeze } # Does NFS export work on this fs? diff --git a/tests/generic/390 b/tests/generic/390 index 20c66e22..0f2b86fa 100755 --- a/tests/generic/390 +++ b/tests/generic/390 @@ -14,8 +14,6 @@ _begin_fstest auto freeze stress _cleanup() { cd / - # Make sure $SCRATCH_MNT is unfreezed - xfs_freeze -u $SCRATCH_MNT 2>/dev/null rm -f $tmp.* } diff --git a/tests/xfs/011 b/tests/xfs/011 index d6e9099e..351a574e 100755 --- a/tests/xfs/011 +++ b/tests/xfs/011 @@ -17,9 +17,7 @@ _begin_fstest auto freeze log metadata quick _cleanup() { $KILLALL_PROG -9 fsstress 2>/dev/null - wait cd / - _scratch_unmount 2>/dev/null rm -f $tmp.* } diff --git a/tests/xfs/517 b/tests/xfs/517 index f7f9a8a2..961668e3 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -15,7 +15,6 @@ _register_cleanup "_cleanup" BUS _cleanup() { cd / - $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT > /dev/null 2>&1 rm -rf $tmp.* } -- 2.25.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-19 13:46 ` [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts Amir Goldstein @ 2022-06-20 11:17 ` Zorro Lang 2022-06-20 12:13 ` Amir Goldstein 2022-06-20 22:06 ` Dave Chinner 1 sibling, 1 reply; 24+ messages in thread From: Zorro Lang @ 2022-06-20 11:17 UTC (permalink / raw) To: Amir Goldstein; +Cc: Darrick J . Wong, Dave Chinner, fstests On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > Almost all of the tests that _require_freeze() fail to unfreeze > scratch mount in case the test is interrupted while fs is frozen. > > Move the handling of unfreeze to generic check code. > For now, tests only freeze scratch fs, but to be more robust, unfreeze > both test and scratch fs following a call to _require_freeze(). > > Tests could still hang if thier private _cleanup() routine tries > to modify the frozen fs or wait for a blocked process. Fix the > _cleanup() routine of xfs/011 to avoid that. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > check | 14 ++++++++------ > common/rc | 5 +++-- > tests/generic/390 | 2 -- > tests/xfs/011 | 2 -- > tests/xfs/517 | 1 - > 5 files changed, 11 insertions(+), 13 deletions(-) > > diff --git a/check b/check > index de11b37e..d6ee71aa 100755 > --- a/check > +++ b/check > @@ -527,17 +527,21 @@ _check_filesystems() > { > local ret=0 > > + # Make sure both test and scratch are unfrozen post _require_freeze() > + if [ -f ${RESULT_DIR}/require_freeze ]; then > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > + fi Hi Amir, I'm wondering why not let each freeze related test cases take care of "unfreeze" by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't help. Thanks, Zorro > if [ -f ${RESULT_DIR}/require_test ]; then > _check_test_fs || ret=1 > - rm -f ${RESULT_DIR}/require_test* > else > _test_unmount 2> /dev/null > fi > if [ -f ${RESULT_DIR}/require_scratch ]; then > _check_scratch_fs || ret=1 > - rm -f ${RESULT_DIR}/require_scratch* > fi > _scratch_unmount 2> /dev/null > + rm -f ${RESULT_DIR}/require_* > return $ret > } > > @@ -783,8 +787,7 @@ function run_section() > seqres="$REPORT_DIR/$seqnum" > > mkdir -p $RESULT_DIR > - rm -f ${RESULT_DIR}/require_scratch* > - rm -f ${RESULT_DIR}/require_test* > + rm -f ${RESULT_DIR}/require_* > echo -n "$seqnum" > > if $showme; then > @@ -882,8 +885,7 @@ function run_section() > _dump_err_cont "[failed, exit status $sts]" > _test_unmount 2> /dev/null > _scratch_unmount 2> /dev/null > - rm -f ${RESULT_DIR}/require_test* > - rm -f ${RESULT_DIR}/require_scratch* > + rm -f ${RESULT_DIR}/require_* > err=true > else > # The test apparently passed, so check for corruption > diff --git a/common/rc b/common/rc > index 3c072c16..b87dfe05 100644 > --- a/common/rc > +++ b/common/rc > @@ -1540,8 +1540,7 @@ _notrun() > { > echo "$*" > $seqres.notrun > echo "$seq not run: $*" > - rm -f ${RESULT_DIR}/require_test* > - rm -f ${RESULT_DIR}/require_scratch* > + rm -f ${RESULT_DIR}/require_* > > status=0 > exit > @@ -3648,6 +3647,8 @@ _require_freeze() > local result=$? > xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > [ $result -eq 0 ] || _notrun "$FSTYP does not support freezing" > + # Make sure both test and scratch are unfrozen at the end of the test > + touch ${RESULT_DIR}/require_freeze > } > > # Does NFS export work on this fs? > diff --git a/tests/generic/390 b/tests/generic/390 > index 20c66e22..0f2b86fa 100755 > --- a/tests/generic/390 > +++ b/tests/generic/390 > @@ -14,8 +14,6 @@ _begin_fstest auto freeze stress > _cleanup() > { > cd / > - # Make sure $SCRATCH_MNT is unfreezed > - xfs_freeze -u $SCRATCH_MNT 2>/dev/null > rm -f $tmp.* > } > > diff --git a/tests/xfs/011 b/tests/xfs/011 > index d6e9099e..351a574e 100755 > --- a/tests/xfs/011 > +++ b/tests/xfs/011 > @@ -17,9 +17,7 @@ _begin_fstest auto freeze log metadata quick > _cleanup() > { > $KILLALL_PROG -9 fsstress 2>/dev/null > - wait > cd / > - _scratch_unmount 2>/dev/null > rm -f $tmp.* > } > > diff --git a/tests/xfs/517 b/tests/xfs/517 > index f7f9a8a2..961668e3 100755 > --- a/tests/xfs/517 > +++ b/tests/xfs/517 > @@ -15,7 +15,6 @@ _register_cleanup "_cleanup" BUS > _cleanup() > { > cd / > - $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT > /dev/null 2>&1 > rm -rf $tmp.* > } > > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 11:17 ` Zorro Lang @ 2022-06-20 12:13 ` Amir Goldstein 2022-06-20 14:21 ` Zorro Lang 0 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-20 12:13 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, Brian Foster, fstests On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@redhat.com> wrote: > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > Almost all of the tests that _require_freeze() fail to unfreeze > > scratch mount in case the test is interrupted while fs is frozen. > > > > Move the handling of unfreeze to generic check code. > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > both test and scratch fs following a call to _require_freeze(). > > > > Tests could still hang if thier private _cleanup() routine tries > > to modify the frozen fs or wait for a blocked process. Fix the > > _cleanup() routine of xfs/011 to avoid that. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > check | 14 ++++++++------ > > common/rc | 5 +++-- > > tests/generic/390 | 2 -- > > tests/xfs/011 | 2 -- > > tests/xfs/517 | 1 - > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > diff --git a/check b/check > > index de11b37e..d6ee71aa 100755 > > --- a/check > > +++ b/check > > @@ -527,17 +527,21 @@ _check_filesystems() > > { > > local ret=0 > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > + fi > > Hi Amir, > > I'm wondering why not let each freeze related test cases take care of "unfreeze" > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't > help. That's also an option, but: 1. It is less robust, leaving room for human mistakes. Why is that better? 2. Leftover frozen fs is quite harmful to subsequent test runs, so it is important to avoid it 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup themselves. I will add that too, but in any case... 4. unfreeze of tests and scratch fs is harmless even when it is not needed - we may even want to do that at the start of tests run(?) [*] I noticed that generic/085 which _require_freeze() does not even use xfs_freeze (intentionally) - it uses dmsetup suspend/resume, so I guess _require_freeze() should be removed from that test. Anyway, if you and others insist against this auto-unfreeze approach, I will post the patch to unfreeze fs in individual tests, but I won't be happy about it. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 12:13 ` Amir Goldstein @ 2022-06-20 14:21 ` Zorro Lang 2022-06-20 15:08 ` Amir Goldstein 2022-06-20 22:34 ` Dave Chinner 0 siblings, 2 replies; 24+ messages in thread From: Zorro Lang @ 2022-06-20 14:21 UTC (permalink / raw) To: Amir Goldstein; +Cc: Darrick J . Wong, Dave Chinner, Brian Foster, fstests On Mon, Jun 20, 2022 at 03:13:33PM +0300, Amir Goldstein wrote: > On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@redhat.com> wrote: > > > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > > Almost all of the tests that _require_freeze() fail to unfreeze > > > scratch mount in case the test is interrupted while fs is frozen. > > > > > > Move the handling of unfreeze to generic check code. > > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > > both test and scratch fs following a call to _require_freeze(). > > > > > > Tests could still hang if thier private _cleanup() routine tries > > > to modify the frozen fs or wait for a blocked process. Fix the > > > _cleanup() routine of xfs/011 to avoid that. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > check | 14 ++++++++------ > > > common/rc | 5 +++-- > > > tests/generic/390 | 2 -- > > > tests/xfs/011 | 2 -- > > > tests/xfs/517 | 1 - > > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > > > diff --git a/check b/check > > > index de11b37e..d6ee71aa 100755 > > > --- a/check > > > +++ b/check > > > @@ -527,17 +527,21 @@ _check_filesystems() > > > { > > > local ret=0 > > > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > > + fi > > > > Hi Amir, > > > > I'm wondering why not let each freeze related test cases take care of "unfreeze" > > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV > > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the > > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't > > help. > > That's also an option, but: > 1. It is less robust, leaving room for human mistakes. Why is that better? > 2. Leftover frozen fs is quite harmful to subsequent test runs, so it > is important > to avoid it > 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup > themselves. I will add that too, but in any case... > 4. unfreeze of tests and scratch fs is harmless even when it is not needed - > we may even want to do that at the start of tests run(?) I think Dave doesn't like to add common steps to thousands of cases, if without a critical reason. So I hope to get more review points for this kind of changes. > > [*] I noticed that generic/085 which _require_freeze() does not even > use xfs_freeze (intentionally) - it uses dmsetup suspend/resume, > so I guess _require_freeze() should be removed from that test. Agree, I think dm suspend through different userspace and kernel interface with fsfreeze, so that require doesn't make sense. > > Anyway, if you and others insist against this auto-unfreeze approach, > I will post the patch to unfreeze fs in individual tests, but I won't be > happy about it. From the functional side, I think this change makes sense. But if think about the performance, let's get more opinions at first. If there's not objection, we can have it. Thanks, Zorro > > Thanks, > Amir. > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 14:21 ` Zorro Lang @ 2022-06-20 15:08 ` Amir Goldstein 2022-06-20 22:34 ` Dave Chinner 1 sibling, 0 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-20 15:08 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, Brian Foster, fstests On Mon, Jun 20, 2022 at 5:21 PM Zorro Lang <zlang@redhat.com> wrote: > > On Mon, Jun 20, 2022 at 03:13:33PM +0300, Amir Goldstein wrote: > > On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@redhat.com> wrote: > > > > > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > > > Almost all of the tests that _require_freeze() fail to unfreeze > > > > scratch mount in case the test is interrupted while fs is frozen. > > > > > > > > Move the handling of unfreeze to generic check code. > > > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > > > both test and scratch fs following a call to _require_freeze(). > > > > > > > > Tests could still hang if thier private _cleanup() routine tries > > > > to modify the frozen fs or wait for a blocked process. Fix the > > > > _cleanup() routine of xfs/011 to avoid that. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > check | 14 ++++++++------ > > > > common/rc | 5 +++-- > > > > tests/generic/390 | 2 -- > > > > tests/xfs/011 | 2 -- > > > > tests/xfs/517 | 1 - > > > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/check b/check > > > > index de11b37e..d6ee71aa 100755 > > > > --- a/check > > > > +++ b/check > > > > @@ -527,17 +527,21 @@ _check_filesystems() > > > > { > > > > local ret=0 > > > > > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > > > + fi > > > > > > Hi Amir, > > > > > > I'm wondering why not let each freeze related test cases take care of "unfreeze" > > > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV > > > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the > > > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't > > > help. > > > > That's also an option, but: > > 1. It is less robust, leaving room for human mistakes. Why is that better? > > 2. Leftover frozen fs is quite harmful to subsequent test runs, so it > > is important > > to avoid it > > 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup > > themselves. I will add that too, but in any case... > > 4. unfreeze of tests and scratch fs is harmless even when it is not needed - > > we may even want to do that at the start of tests run(?) > > I think Dave doesn't like to add common steps to thousands of cases, if without > a critical reason. So I hope to get more review points for this kind of changes. > > > > > [*] I noticed that generic/085 which _require_freeze() does not even > > use xfs_freeze (intentionally) - it uses dmsetup suspend/resume, > > so I guess _require_freeze() should be removed from that test. > > Agree, I think dm suspend through different userspace and kernel interface with > fsfreeze, so that require doesn't make sense. > > > > > Anyway, if you and others insist against this auto-unfreeze approach, > > I will post the patch to unfreeze fs in individual tests, but I won't be > > happy about it. > > From the functional side, I think this change makes sense. But if think about > the performance, let's get more opinions at first. If there's not objection, > we can have it. > Maybe the change was not clear. Only the tests that declare _require_freeze() in the beginning of the test (14 tests) are going to be affected by this change. The rest of the tests have no impact at all. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 14:21 ` Zorro Lang 2022-06-20 15:08 ` Amir Goldstein @ 2022-06-20 22:34 ` Dave Chinner 2022-06-21 2:53 ` Amir Goldstein 1 sibling, 1 reply; 24+ messages in thread From: Dave Chinner @ 2022-06-20 22:34 UTC (permalink / raw) To: Zorro Lang; +Cc: Amir Goldstein, Darrick J . Wong, Brian Foster, fstests On Mon, Jun 20, 2022 at 10:21:39PM +0800, Zorro Lang wrote: > On Mon, Jun 20, 2022 at 03:13:33PM +0300, Amir Goldstein wrote: > > On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@redhat.com> wrote: > > > > > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > > > Almost all of the tests that _require_freeze() fail to unfreeze > > > > scratch mount in case the test is interrupted while fs is frozen. > > > > > > > > Move the handling of unfreeze to generic check code. > > > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > > > both test and scratch fs following a call to _require_freeze(). > > > > > > > > Tests could still hang if thier private _cleanup() routine tries > > > > to modify the frozen fs or wait for a blocked process. Fix the > > > > _cleanup() routine of xfs/011 to avoid that. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > check | 14 ++++++++------ > > > > common/rc | 5 +++-- > > > > tests/generic/390 | 2 -- > > > > tests/xfs/011 | 2 -- > > > > tests/xfs/517 | 1 - > > > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/check b/check > > > > index de11b37e..d6ee71aa 100755 > > > > --- a/check > > > > +++ b/check > > > > @@ -527,17 +527,21 @@ _check_filesystems() > > > > { > > > > local ret=0 > > > > > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > > > + fi > > > > > > Hi Amir, > > > > > > I'm wondering why not let each freeze related test cases take care of "unfreeze" > > > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV > > > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the > > > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't > > > help. > > > > That's also an option, but: > > 1. It is less robust, leaving room for human mistakes. Why is that better? It's not, but I'm addressing exactly this problem with the common _cleanup() infrastructure that I'm working on. Several of the issues that you are trying to address here are already dealt with in that series, and it doesn't add extra overhead to the test infrastructure for tests that don't use freeze/thaw. > > 2. Leftover frozen fs is quite harmful to subsequent test runs, so it > > is important > > to avoid it Well, yes. It is a test bug, and the test shouldn't have bugs. > > 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup > > themselves. I will add that too, but in any case... Except now we have two methods of thawing the filesystem depending on where it is mounted and what device is in use. That's not an improvement. > > 4. unfreeze of tests and scratch fs is harmless even when it is not needed - > > we may even want to do that at the start of tests run(?) > > I think Dave doesn't like to add common steps to thousands of cases, if without > a critical reason. So I hope to get more review points for this kind of changes. Doing work that in every test execution that is only actually necessary if a test is interrupted to work around a potential bug in 1 or 2 tests is really poor design and implementation. Just fix the test bugs, don't burden everything else with the additional overhead of checking whether the test might have a bug in it or not... > > [*] I noticed that generic/085 which _require_freeze() does not even > > use xfs_freeze (intentionally) - it uses dmsetup suspend/resume, > > so I guess _require_freeze() should be removed from that test. > > Agree, I think dm suspend through different userspace and kernel interface with > fsfreeze, so that require doesn't make sense. dmsuspend uses the internal kernel freeze API to freeze the filesytsems. Failures in those tests can leave filesytsems frozen, too, but these days we cannot thaw them with fs level freeze.... > > Anyway, if you and others insist against this auto-unfreeze approach, > > I will post the patch to unfreeze fs in individual tests, but I won't be > > happy about it. > > From the functional side, I think this change makes sense. But if think about > the performance, let's get more opinions at first. If there's not objection, > we can have it. I think it's wrong from a functional side, too. The test harness is not responsible for individual test state cleanup - the tests themselves are responsible for that. The whole point of the _cleanup() consolidation I've started doing is that it will centralise all this common cleanup functionality and allow us to end up connecting it to _requires rules that indicate what functionality the test uses. i.e. the end goal is that most tests do not need any cleanup - everything that needs cleaning up is defined by the _requires rules the test runs first and nothing else is needed. Only when tests do custom stuff will test specific cleanup functions be needed, just like they are required to do now to clean up after themselves. And that means, right now, that tests that freeze the filesystem still need to do the right thing in the _cleanup() functions up until the consolidation of cleanup means they don't need it anymore.... AFAIC, you can go move all this stuff to check if you really want, but I'm just going to rip it all back out in short order because it just doesn't fit the cleanup model I'm trying to move the infrastructure towards. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 22:34 ` Dave Chinner @ 2022-06-21 2:53 ` Amir Goldstein 0 siblings, 0 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-21 2:53 UTC (permalink / raw) To: Dave Chinner; +Cc: Zorro Lang, Darrick J . Wong, Brian Foster, fstests On Tue, Jun 21, 2022 at 1:34 AM Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Jun 20, 2022 at 10:21:39PM +0800, Zorro Lang wrote: > > On Mon, Jun 20, 2022 at 03:13:33PM +0300, Amir Goldstein wrote: > > > On Mon, Jun 20, 2022 at 2:17 PM Zorro Lang <zlang@redhat.com> wrote: > > > > > > > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > > > > Almost all of the tests that _require_freeze() fail to unfreeze > > > > > scratch mount in case the test is interrupted while fs is frozen. > > > > > > > > > > Move the handling of unfreeze to generic check code. > > > > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > > > > both test and scratch fs following a call to _require_freeze(). > > > > > > > > > > Tests could still hang if thier private _cleanup() routine tries > > > > > to modify the frozen fs or wait for a blocked process. Fix the > > > > > _cleanup() routine of xfs/011 to avoid that. > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > check | 14 ++++++++------ > > > > > common/rc | 5 +++-- > > > > > tests/generic/390 | 2 -- > > > > > tests/xfs/011 | 2 -- > > > > > tests/xfs/517 | 1 - > > > > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > > > > > > > diff --git a/check b/check > > > > > index de11b37e..d6ee71aa 100755 > > > > > --- a/check > > > > > +++ b/check > > > > > @@ -527,17 +527,21 @@ _check_filesystems() > > > > > { > > > > > local ret=0 > > > > > > > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > > > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > > > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > > > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > > > > + fi > > > > > > > > Hi Amir, > > > > > > > > I'm wondering why not let each freeze related test cases take care of "unfreeze" > > > > by itself? If a case freeze a filesystem which isn't on TEST_DEV or SCRATCH_DEV > > > > (e.g. a loop device base on a file in $TEST_DIR), or other situations, then the > > > > case still can call _require_freeze, but "unfreeze TEST_DIR or SCRATCH_MNT" can't > > > > help. > > > > > > That's also an option, but: > > > 1. It is less robust, leaving room for human mistakes. Why is that better? > > It's not, but I'm addressing exactly this problem with the common > _cleanup() infrastructure that I'm working on. Several of the issues > that you are trying to address here are already dealt with in that > series, and it doesn't add extra overhead to the test > infrastructure for tests that don't use freeze/thaw. > > > > 2. Leftover frozen fs is quite harmful to subsequent test runs, so it > > > is important > > > to avoid it > > Well, yes. It is a test bug, and the test shouldn't have bugs. > > > > 3. Tests that freeze loop/dm (generic/459, xfs/428, [*]) can and should cleanup > > > themselves. I will add that too, but in any case... > > Except now we have two methods of thawing the filesystem depending > on where it is mounted and what device is in use. That's not an > improvement. > > > > 4. unfreeze of tests and scratch fs is harmless even when it is not needed - > > > we may even want to do that at the start of tests run(?) > > > > I think Dave doesn't like to add common steps to thousands of cases, if without > > a critical reason. So I hope to get more review points for this kind of changes. > > Doing work that in every test execution that is only actually > necessary if a test is interrupted to work around a potential bug in > 1 or 2 tests is really poor design and implementation. Just fix the > test bugs, don't burden everything else with the additional overhead > of checking whether the test might have a bug in it or not... > > > > [*] I noticed that generic/085 which _require_freeze() does not even > > > use xfs_freeze (intentionally) - it uses dmsetup suspend/resume, > > > so I guess _require_freeze() should be removed from that test. > > > > Agree, I think dm suspend through different userspace and kernel interface with > > fsfreeze, so that require doesn't make sense. > > dmsuspend uses the internal kernel freeze API to freeze the > filesytsems. Failures in those tests can leave filesytsems frozen, > too, but these days we cannot thaw them with fs level freeze.... > > > > Anyway, if you and others insist against this auto-unfreeze approach, > > > I will post the patch to unfreeze fs in individual tests, but I won't be > > > happy about it. > > > > From the functional side, I think this change makes sense. But if think about > > the performance, let's get more opinions at first. If there's not objection, > > we can have it. > > I think it's wrong from a functional side, too. The test harness is > not responsible for individual test state cleanup - the tests > themselves are responsible for that. The whole point of the > _cleanup() consolidation I've started doing is that it will > centralise all this common cleanup functionality and allow us to end > up connecting it to _requires rules that indicate what functionality > the test uses. i.e. the end goal is that most tests do not need any > cleanup - everything that needs cleaning up is defined by the > _requires rules the test runs first and nothing else is needed. > > Only when tests do custom stuff will test specific cleanup functions > be needed, just like they are required to do now to clean up after > themselves. And that means, right now, that tests that freeze the > filesystem still need to do the right thing in the _cleanup() > functions up until the consolidation of cleanup means they don't > need it anymore.... > > AFAIC, you can go move all this stuff to check if you really > want, but I'm just going to rip it all back out in short order > because it just doesn't fit the cleanup model I'm trying to move the > infrastructure towards. > As long as your cleanup is going to consolidate all this I will just go ahead and fix the buggy tests now. Thanks! Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-19 13:46 ` [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts Amir Goldstein 2022-06-20 11:17 ` Zorro Lang @ 2022-06-20 22:06 ` Dave Chinner 2022-06-21 2:57 ` Amir Goldstein 1 sibling, 1 reply; 24+ messages in thread From: Dave Chinner @ 2022-06-20 22:06 UTC (permalink / raw) To: Amir Goldstein; +Cc: Zorro Lang, Darrick J . Wong, fstests On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > Almost all of the tests that _require_freeze() fail to unfreeze > scratch mount in case the test is interrupted while fs is frozen. > > Move the handling of unfreeze to generic check code. > For now, tests only freeze scratch fs, but to be more robust, unfreeze > both test and scratch fs following a call to _require_freeze(). > > Tests could still hang if thier private _cleanup() routine tries > to modify the frozen fs or wait for a blocked process. Fix the > _cleanup() routine of xfs/011 to avoid that. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > check | 14 ++++++++------ > common/rc | 5 +++-- > tests/generic/390 | 2 -- > tests/xfs/011 | 2 -- > tests/xfs/517 | 1 - > 5 files changed, 11 insertions(+), 13 deletions(-) > > diff --git a/check b/check > index de11b37e..d6ee71aa 100755 > --- a/check > +++ b/check > @@ -527,17 +527,21 @@ _check_filesystems() > { > local ret=0 > > + # Make sure both test and scratch are unfrozen post _require_freeze() > + if [ -f ${RESULT_DIR}/require_freeze ]; then > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > + fi A test leaving a filesystem frozen on exit is a test bug. There can still be background test processes sitting blocked on a frozen filesystem when the test exits with a frozen filesystem, and that has the potential to cause problems in the next few operations because of "busy filesystem" errors trying to unmount the fs... IOWs, think this is the wrong way to address this problem. tests that freeze filesystems need to ensure that everything is cleaned up properly in the test _cleanup() function where the right thing can be done and blocked processes can be waited on once the fs has been thawed. > diff --git a/tests/generic/390 b/tests/generic/390 > index 20c66e22..0f2b86fa 100755 > --- a/tests/generic/390 > +++ b/tests/generic/390 > @@ -14,8 +14,6 @@ _begin_fstest auto freeze stress > _cleanup() > { > cd / > - # Make sure $SCRATCH_MNT is unfreezed > - xfs_freeze -u $SCRATCH_MNT 2>/dev/null > rm -f $tmp.* > } This test is already doing the right thing. > diff --git a/tests/xfs/011 b/tests/xfs/011 > index d6e9099e..351a574e 100755 > --- a/tests/xfs/011 > +++ b/tests/xfs/011 > @@ -17,9 +17,7 @@ _begin_fstest auto freeze log metadata quick > _cleanup() > { > $KILLALL_PROG -9 fsstress 2>/dev/null > - wait > cd / > - _scratch_unmount 2>/dev/null > rm -f $tmp.* > } This is wrong. We have to wait for background fsstress processes to exit, otherwise unmount can fail randomly. What it is missing is the thaw before killing the fsstress processes and waiting for them to complete. > diff --git a/tests/xfs/517 b/tests/xfs/517 > index f7f9a8a2..961668e3 100755 > --- a/tests/xfs/517 > +++ b/tests/xfs/517 > @@ -15,7 +15,6 @@ _register_cleanup "_cleanup" BUS > _cleanup() > { > cd / > - $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT > /dev/null 2>&1 > rm -rf $tmp.* > } This is doing the right thing, too. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts 2022-06-20 22:06 ` Dave Chinner @ 2022-06-21 2:57 ` Amir Goldstein 0 siblings, 0 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-21 2:57 UTC (permalink / raw) To: Dave Chinner; +Cc: Zorro Lang, Darrick J . Wong, fstests On Tue, Jun 21, 2022 at 1:06 AM Dave Chinner <david@fromorbit.com> wrote: > > On Sun, Jun 19, 2022 at 04:46:55PM +0300, Amir Goldstein wrote: > > Almost all of the tests that _require_freeze() fail to unfreeze > > scratch mount in case the test is interrupted while fs is frozen. > > > > Move the handling of unfreeze to generic check code. > > For now, tests only freeze scratch fs, but to be more robust, unfreeze > > both test and scratch fs following a call to _require_freeze(). > > > > Tests could still hang if thier private _cleanup() routine tries > > to modify the frozen fs or wait for a blocked process. Fix the > > _cleanup() routine of xfs/011 to avoid that. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > check | 14 ++++++++------ > > common/rc | 5 +++-- > > tests/generic/390 | 2 -- > > tests/xfs/011 | 2 -- > > tests/xfs/517 | 1 - > > 5 files changed, 11 insertions(+), 13 deletions(-) > > > > diff --git a/check b/check > > index de11b37e..d6ee71aa 100755 > > --- a/check > > +++ b/check > > @@ -527,17 +527,21 @@ _check_filesystems() > > { > > local ret=0 > > > > + # Make sure both test and scratch are unfrozen post _require_freeze() > > + if [ -f ${RESULT_DIR}/require_freeze ]; then > > + xfs_freeze -u "$TEST_DIR" >/dev/null 2>&1 > > + xfs_freeze -u "$SCRATCH_MNT" >/dev/null 2>&1 > > + fi > > A test leaving a filesystem frozen on exit is a test bug. There can > still be background test processes sitting blocked on a frozen > filesystem when the test exits with a frozen filesystem, and that > has the potential to cause problems in the next few operations > because of "busy filesystem" errors trying to unmount the fs... > > IOWs, think this is the wrong way to address this problem. tests > that freeze filesystems need to ensure that everything is cleaned up > properly in the test _cleanup() function where the right thing can > be done and blocked processes can be waited on once the fs has been > thawed. > ok. > > diff --git a/tests/generic/390 b/tests/generic/390 > > index 20c66e22..0f2b86fa 100755 > > --- a/tests/generic/390 > > +++ b/tests/generic/390 > > @@ -14,8 +14,6 @@ _begin_fstest auto freeze stress > > _cleanup() > > { > > cd / > > - # Make sure $SCRATCH_MNT is unfreezed > > - xfs_freeze -u $SCRATCH_MNT 2>/dev/null > > rm -f $tmp.* > > } > > This test is already doing the right thing. > > > diff --git a/tests/xfs/011 b/tests/xfs/011 > > index d6e9099e..351a574e 100755 > > --- a/tests/xfs/011 > > +++ b/tests/xfs/011 > > @@ -17,9 +17,7 @@ _begin_fstest auto freeze log metadata quick > > _cleanup() > > { > > $KILLALL_PROG -9 fsstress 2>/dev/null > > - wait > > cd / > > - _scratch_unmount 2>/dev/null > > rm -f $tmp.* > > } > > This is wrong. We have to wait for background fsstress processes to > exit, otherwise unmount can fail randomly. What it is missing is the > thaw before killing the fsstress processes and waiting for them to > complete. > I didn't remember if your position was that _cleanup() should wait or not . I guess from your answer that you think that it should and it makes sense to me, so I will also fix xfs/517 to wait. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-19 13:46 [PATCH 0/4] aborted fstests may leave frozen fs behind Amir Goldstein 2022-06-19 13:46 ` [PATCH 1/4] fstests: add missing _require_freeze() to tests Amir Goldstein 2022-06-19 13:46 ` [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts Amir Goldstein @ 2022-06-19 13:46 ` Amir Goldstein 2022-06-21 4:02 ` Amir Goldstein 2022-06-19 13:46 ` [PATCH 4/4] xfs/{422,517}: fix false positive failure Amir Goldstein 3 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-19 13:46 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests Those tests failed to cleanup background jobs after test is interrupted. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- tests/xfs/422 | 8 ++++++++ tests/xfs/517 | 1 + 2 files changed, 9 insertions(+) diff --git a/tests/xfs/422 b/tests/xfs/422 index a83a66df..8e9a3576 100755 --- a/tests/xfs/422 +++ b/tests/xfs/422 @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze _register_cleanup "_cleanup" BUS +# Override the default cleanup function. +_cleanup() +{ + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 + cd / + rm -rf $tmp.* +} + # Import common functions. . ./common/filter . ./common/fuzzy diff --git a/tests/xfs/517 b/tests/xfs/517 index 961668e3..18404248 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS # Override the default cleanup function. _cleanup() { + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 cd / rm -rf $tmp.* } -- 2.25.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-19 13:46 ` [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() Amir Goldstein @ 2022-06-21 4:02 ` Amir Goldstein 2022-06-23 18:04 ` Darrick J. Wong 0 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-21 4:02 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Those tests failed to cleanup background jobs after test > is interrupted. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > tests/xfs/422 | 8 ++++++++ > tests/xfs/517 | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/tests/xfs/422 b/tests/xfs/422 > index a83a66df..8e9a3576 100755 > --- a/tests/xfs/422 > +++ b/tests/xfs/422 > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > _register_cleanup "_cleanup" BUS > > +# Override the default cleanup function. > +_cleanup() > +{ > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > + cd / > + rm -rf $tmp.* > +} > + > # Import common functions. > . ./common/filter > . ./common/fuzzy > diff --git a/tests/xfs/517 b/tests/xfs/517 > index 961668e3..18404248 100755 > --- a/tests/xfs/517 > +++ b/tests/xfs/517 > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > # Override the default cleanup function. > _cleanup() > { > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 Besides the missing wait, this cleanup is still racy, because stress_loop can spawn a new fstress process after killall /proc iteration. Worst, freeze_loop can freeze after killall /proc iteration. The correct solution (I think) is to record the pid of the XXX_loop sub-shells and kill those, which should also solve the "Terminated" false positive error. Will give that a shot. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-21 4:02 ` Amir Goldstein @ 2022-06-23 18:04 ` Darrick J. Wong 2022-06-24 4:36 ` Amir Goldstein 2022-06-24 4:49 ` Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) Amir Goldstein 0 siblings, 2 replies; 24+ messages in thread From: Darrick J. Wong @ 2022-06-23 18:04 UTC (permalink / raw) To: Amir Goldstein; +Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > Those tests failed to cleanup background jobs after test > > is interrupted. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > tests/xfs/422 | 8 ++++++++ > > tests/xfs/517 | 1 + > > 2 files changed, 9 insertions(+) > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > index a83a66df..8e9a3576 100755 > > --- a/tests/xfs/422 > > +++ b/tests/xfs/422 > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > _register_cleanup "_cleanup" BUS > > > > +# Override the default cleanup function. > > +_cleanup() > > +{ > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > + cd / > > + rm -rf $tmp.* > > +} > > + > > # Import common functions. > > . ./common/filter > > . ./common/fuzzy > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > index 961668e3..18404248 100755 > > --- a/tests/xfs/517 > > +++ b/tests/xfs/517 > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > # Override the default cleanup function. > > _cleanup() > > { > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > Besides the missing wait, this cleanup is still racy, because stress_loop > can spawn a new fstress process after killall /proc iteration. > > Worst, freeze_loop can freeze after killall /proc iteration. > > The correct solution (I think) is to record the pid of the XXX_loop > sub-shells and kill those, which should also solve the "Terminated" > false positive error. > > Will give that a shot. FWIW I already have patches reworking 422 and all the other scrub-related stress tests: https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes I swear I'll send all these some day, if I ever get caught up... Delegating LTS maintenance is a big help, but as it is I still can't stay abreast of all the mainline patchsets /and/ send my own stuff. :( --D > Thanks, > Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-23 18:04 ` Darrick J. Wong @ 2022-06-24 4:36 ` Amir Goldstein 2022-06-24 6:31 ` Zorro Lang 2022-06-24 4:49 ` Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) Amir Goldstein 1 sibling, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-24 4:36 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > Those tests failed to cleanup background jobs after test > > > is interrupted. > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > tests/xfs/422 | 8 ++++++++ > > > tests/xfs/517 | 1 + > > > 2 files changed, 9 insertions(+) > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > index a83a66df..8e9a3576 100755 > > > --- a/tests/xfs/422 > > > +++ b/tests/xfs/422 > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > _register_cleanup "_cleanup" BUS > > > > > > +# Override the default cleanup function. > > > +_cleanup() > > > +{ > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > + cd / > > > + rm -rf $tmp.* > > > +} > > > + > > > # Import common functions. > > > . ./common/filter > > > . ./common/fuzzy > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > index 961668e3..18404248 100755 > > > --- a/tests/xfs/517 > > > +++ b/tests/xfs/517 > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > # Override the default cleanup function. > > > _cleanup() > > > { > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > can spawn a new fstress process after killall /proc iteration. > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > The correct solution (I think) is to record the pid of the XXX_loop > > sub-shells and kill those, which should also solve the "Terminated" > > false positive error. > > > > Will give that a shot. > > FWIW I already have patches reworking 422 and all the other > scrub-related stress tests: > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > That looks very nice. The cleanup looks very similar to the one I posted for v2, but instead of "killall xfs_io fsstress" you would be better off with "kill $__SCRUB_STRESS_FREEZE_PID" killing the child process and waiting for its parent bash process has the side effect of the bash parent emitting "Terminated" or "Killed" message with breaks golden output if the inner loop did not already exit, which is the behavior that got me started on doing this cleanup. > I swear I'll send all these some day, if I ever get caught up... > Delegating LTS maintenance is a big help, but as it is I still can't > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > I will respond to that in another email. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-24 4:36 ` Amir Goldstein @ 2022-06-24 6:31 ` Zorro Lang 2022-06-24 6:41 ` Amir Goldstein 0 siblings, 1 reply; 24+ messages in thread From: Zorro Lang @ 2022-06-24 6:31 UTC (permalink / raw) To: Amir Goldstein; +Cc: fstests On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > Those tests failed to cleanup background jobs after test > > > > is interrupted. > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > tests/xfs/422 | 8 ++++++++ > > > > tests/xfs/517 | 1 + > > > > 2 files changed, 9 insertions(+) > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > index a83a66df..8e9a3576 100755 > > > > --- a/tests/xfs/422 > > > > +++ b/tests/xfs/422 > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > +# Override the default cleanup function. > > > > +_cleanup() > > > > +{ > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > + cd / > > > > + rm -rf $tmp.* > > > > +} > > > > + > > > > # Import common functions. > > > > . ./common/filter > > > > . ./common/fuzzy > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > index 961668e3..18404248 100755 > > > > --- a/tests/xfs/517 > > > > +++ b/tests/xfs/517 > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > # Override the default cleanup function. > > > > _cleanup() > > > > { > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > can spawn a new fstress process after killall /proc iteration. > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > sub-shells and kill those, which should also solve the "Terminated" > > > false positive error. > > > > > > Will give that a shot. > > > > FWIW I already have patches reworking 422 and all the other > > scrub-related stress tests: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > That looks very nice. > The cleanup looks very similar to the one I posted for v2, > but instead of "killall xfs_io fsstress" you would be better off with > "kill $__SCRUB_STRESS_FREEZE_PID" > > killing the child process and waiting for its parent bash > process has the side effect of the bash parent emitting > "Terminated" or "Killed" message with breaks golden output > if the inner loop did not already exit, which is the behavior > that got me started on doing this cleanup. > > > I swear I'll send all these some day, if I ever get caught up... > > Delegating LTS maintenance is a big help, but as it is I still can't > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > I will respond to that in another email. Hi Amir, So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? Thanks, Zorro > > Thanks, > Amir. > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-24 6:31 ` Zorro Lang @ 2022-06-24 6:41 ` Amir Goldstein 2022-06-24 15:09 ` Zorro Lang 0 siblings, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-24 6:41 UTC (permalink / raw) To: Zorro Lang; +Cc: fstests On Fri, Jun 24, 2022 at 9:31 AM Zorro Lang <zlang@redhat.com> wrote: > > On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > Those tests failed to cleanup background jobs after test > > > > > is interrupted. > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > tests/xfs/422 | 8 ++++++++ > > > > > tests/xfs/517 | 1 + > > > > > 2 files changed, 9 insertions(+) > > > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > > index a83a66df..8e9a3576 100755 > > > > > --- a/tests/xfs/422 > > > > > +++ b/tests/xfs/422 > > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > > > +# Override the default cleanup function. > > > > > +_cleanup() > > > > > +{ > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > + cd / > > > > > + rm -rf $tmp.* > > > > > +} > > > > > + > > > > > # Import common functions. > > > > > . ./common/filter > > > > > . ./common/fuzzy > > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > > index 961668e3..18404248 100755 > > > > > --- a/tests/xfs/517 > > > > > +++ b/tests/xfs/517 > > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > > # Override the default cleanup function. > > > > > _cleanup() > > > > > { > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > > can spawn a new fstress process after killall /proc iteration. > > > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > > sub-shells and kill those, which should also solve the "Terminated" > > > > false positive error. > > > > > > > > Will give that a shot. > > > > > > FWIW I already have patches reworking 422 and all the other > > > scrub-related stress tests: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > > > > That looks very nice. > > The cleanup looks very similar to the one I posted for v2, > > but instead of "killall xfs_io fsstress" you would be better off with > > "kill $__SCRUB_STRESS_FREEZE_PID" > > > > killing the child process and waiting for its parent bash > > process has the side effect of the bash parent emitting > > "Terminated" or "Killed" message with breaks golden output > > if the inner loop did not already exit, which is the behavior > > that got me started on doing this cleanup. > > > > > I swear I'll send all these some day, if I ever get caught up... > > > Delegating LTS maintenance is a big help, but as it is I still can't > > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > > > > I will respond to that in another email. > > Hi Amir, > > So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? > I don't understand. V2 patch should be reviewed https://lore.kernel.org/fstests/20220621173729.2135249-1-amir73il@gmail.com/ The fact that Darrick has future work pending to cleanup xfs/422 should not stop a cleanup patch to be applied now. Am I missing something? Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() 2022-06-24 6:41 ` Amir Goldstein @ 2022-06-24 15:09 ` Zorro Lang 0 siblings, 0 replies; 24+ messages in thread From: Zorro Lang @ 2022-06-24 15:09 UTC (permalink / raw) To: Amir Goldstein; +Cc: fstests On Fri, Jun 24, 2022 at 09:41:34AM +0300, Amir Goldstein wrote: > On Fri, Jun 24, 2022 at 9:31 AM Zorro Lang <zlang@redhat.com> wrote: > > > > On Fri, Jun 24, 2022 at 07:36:51AM +0300, Amir Goldstein wrote: > > > On Thu, Jun 23, 2022 at 9:04 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > > > On Tue, Jun 21, 2022 at 07:02:02AM +0300, Amir Goldstein wrote: > > > > > On Sun, Jun 19, 2022 at 4:47 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > > > Those tests failed to cleanup background jobs after test > > > > > > is interrupted. > > > > > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > > --- > > > > > > tests/xfs/422 | 8 ++++++++ > > > > > > tests/xfs/517 | 1 + > > > > > > 2 files changed, 9 insertions(+) > > > > > > > > > > > > diff --git a/tests/xfs/422 b/tests/xfs/422 > > > > > > index a83a66df..8e9a3576 100755 > > > > > > --- a/tests/xfs/422 > > > > > > +++ b/tests/xfs/422 > > > > > > @@ -13,6 +13,14 @@ _begin_fstest dangerous_scrub dangerous_online_repair freeze > > > > > > > > > > > > _register_cleanup "_cleanup" BUS > > > > > > > > > > > > +# Override the default cleanup function. > > > > > > +_cleanup() > > > > > > +{ > > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > + cd / > > > > > > + rm -rf $tmp.* > > > > > > +} > > > > > > + > > > > > > # Import common functions. > > > > > > . ./common/filter > > > > > > . ./common/fuzzy > > > > > > diff --git a/tests/xfs/517 b/tests/xfs/517 > > > > > > index 961668e3..18404248 100755 > > > > > > --- a/tests/xfs/517 > > > > > > +++ b/tests/xfs/517 > > > > > > @@ -14,6 +14,7 @@ _register_cleanup "_cleanup" BUS > > > > > > # Override the default cleanup function. > > > > > > _cleanup() > > > > > > { > > > > > > + $KILLALL_PROG -9 $XFS_IO_PROG $FSSTRESS_PROG > /dev/null 2>&1 > > > > > > > > > > Besides the missing wait, this cleanup is still racy, because stress_loop > > > > > can spawn a new fstress process after killall /proc iteration. > > > > > > > > > > Worst, freeze_loop can freeze after killall /proc iteration. > > > > > > > > > > The correct solution (I think) is to record the pid of the XXX_loop > > > > > sub-shells and kill those, which should also solve the "Terminated" > > > > > false positive error. > > > > > > > > > > Will give that a shot. > > > > > > > > FWIW I already have patches reworking 422 and all the other > > > > scrub-related stress tests: > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes > > > > > > > > > > That looks very nice. > > > The cleanup looks very similar to the one I posted for v2, > > > but instead of "killall xfs_io fsstress" you would be better off with > > > "kill $__SCRUB_STRESS_FREEZE_PID" > > > > > > killing the child process and waiting for its parent bash > > > process has the side effect of the bash parent emitting > > > "Terminated" or "Killed" message with breaks golden output > > > if the inner loop did not already exit, which is the behavior > > > that got me started on doing this cleanup. > > > > > > > I swear I'll send all these some day, if I ever get caught up... > > > > Delegating LTS maintenance is a big help, but as it is I still can't > > > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > > > > > > > > > I will respond to that in another email. > > > > Hi Amir, > > > > So ... will you singly resend a V3 patch to remove the xfs/422 changes in this patch? > > > > I don't understand. > > V2 patch should be reviewed > > https://lore.kernel.org/fstests/20220621173729.2135249-1-amir73il@gmail.com/ > > The fact that Darrick has future work pending to cleanup xfs/422 > should not stop a cleanup patch to be applied now. Oh, maybe I misunderstood. I thought Darrick meant his patchset (which haven't been sent out) contains what you did on xfs/422. If that's misunderstanding, I'll review and merge your V2 :) Thanks, Zorro > > Am I missing something? > > Thanks, > Amir. > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) 2022-06-23 18:04 ` Darrick J. Wong 2022-06-24 4:36 ` Amir Goldstein @ 2022-06-24 4:49 ` Amir Goldstein 2022-06-25 3:11 ` Darrick J. Wong 1 sibling, 1 reply; 24+ messages in thread From: Amir Goldstein @ 2022-06-24 4:49 UTC (permalink / raw) To: Darrick J. Wong Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests, Josef Bacik, Theodore Tso, Chris Mason, linux-fsdevel [+fsdevel] > > I swear I'll send all these some day, if I ever get caught up... > Delegating LTS maintenance is a big help, but as it is I still can't > stay abreast of all the mainline patchsets /and/ send my own stuff. :( I have to repeat what I said in LSFMM about the LTP project and what fstests could be like. Companies put dedicated engineers to work as proactive LTP maintainers. There is absolutely no reason that companies won't do the same for fstests. If only every large corp. that employs >10 fs developers will assign a halftime engineer for fstests maintenance, their fs developer's work would become so much more productive and their fs product will become more reliable. I think the fact that this is not happening is a failure on our part to communicate that to our managers. From my experience, if you had sent stuff like your fstests cleanups to the LTP maintainers and ask for their help to land it, they would thank you for the work you did and take care of all the testing on all platforms and fixing all the code style and framework issues. LTP maintainers constantly work on improving the framework and providing more features to test writers as well as on converting old tests to use new infrastructure. Stuff like Dave's work on sorting up the cleanup mess or the groups cleanup and groups speedup - all of those do not have to add load to busy maintainers - life can be different! Taking responsibility away from developers to deliver their own tests is a slippery slope, but getting help and working together is essential for offloading work from busy maintainers. Thanks, Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) 2022-06-24 4:49 ` Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) Amir Goldstein @ 2022-06-25 3:11 ` Darrick J. Wong 0 siblings, 0 replies; 24+ messages in thread From: Darrick J. Wong @ 2022-06-25 3:11 UTC (permalink / raw) To: Amir Goldstein Cc: Zorro Lang, Darrick J . Wong, Dave Chinner, fstests, Josef Bacik, Theodore Tso, Chris Mason, linux-fsdevel On Fri, Jun 24, 2022 at 07:49:29AM +0300, Amir Goldstein wrote: > [+fsdevel] > > > > > I swear I'll send all these some day, if I ever get caught up... > > Delegating LTS maintenance is a big help, but as it is I still can't > > stay abreast of all the mainline patchsets /and/ send my own stuff. :( > > I have to repeat what I said in LSFMM about the LTP project and > what fstests could be like. > > Companies put dedicated engineers to work as proactive LTP > maintainers. There is absolutely no reason that companies won't > do the same for fstests. > > If only every large corp. that employs >10 fs developers will assign > a halftime engineer for fstests maintenance, their fs developer's > work would become so much more productive and their fs product > will become more reliable. > > I think the fact that this is not happening is a failure on our part to > communicate that to our managers. I'd enjoy that too; I'll bring it up the next time they start asking about budgeting here (which means in 2 weeks). > From my experience, if you had sent stuff like your fstests cleanups > to the LTP maintainers and ask for their help to land it, they would > thank you for the work you did and take care of all the testing > on all platforms and fixing all the code style and framework issues. Though to be fair -- a lot of the fstests changes backing up in djwong-dev exist to enable testing of the online fsck feature. This whole year I've deprioritized sending any of those patches to concentrate on writing the design documentation for online fsck[1]. Now that I've submitted *that*, I'm hoping to start code review once I convince a few people to grok the design doc. So perhaps next week I'll resume the patchbombing that I've become infamous for doing. In the mean time, no objections to merging /this/ series. The group labelling is a little odd and I think that should be separate fix from adding _require_freeze), but if zorro's ok with its present form then so be it. --D [1] https://lore.kernel.org/linux-xfs/165456652256.167418.912764930038710353.stgit@magnolia/ > LTP maintainers constantly work on improving the framework and > providing more features to test writers as well as on converting old > tests to use new infrastructure. > > Stuff like Dave's work on sorting up the cleanup mess or the groups > cleanup and groups speedup - all of those do not have to add load > to busy maintainers - life can be different! > > Taking responsibility away from developers to deliver their own tests > is a slippery slope, but getting help and working together is essential > for offloading work from busy maintainers. > > Thanks, > Amir. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 4/4] xfs/{422,517}: fix false positive failure 2022-06-19 13:46 [PATCH 0/4] aborted fstests may leave frozen fs behind Amir Goldstein ` (2 preceding siblings ...) 2022-06-19 13:46 ` [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() Amir Goldstein @ 2022-06-19 13:46 ` Amir Goldstein 3 siblings, 0 replies; 24+ messages in thread From: Amir Goldstein @ 2022-06-19 13:46 UTC (permalink / raw) To: Zorro Lang; +Cc: Darrick J . Wong, Dave Chinner, fstests xfs/517 fails randomally with this error: QA output created by 517 Format and populate Concurrent fsmap and freeze +Terminated Test done These two test run fsstress inside the sub-shell stress_loop(), which is run as a background job. The sub-shell has an inner loop that exist after at least 30s. The outer shell, sleeps for 32s and then kills fsstress using killall. If the inner sub-shell loop does not exit before fsstress is killed, bash sub-shell prints "Terminated" to stderr and breaks golden output. There are two easy solutions to this issue: 1. The sub-shell stderr could be redirected to /dev/null or $seq.full 2. killall can use SIGINT which suppresses the "Terminated" print The tests generic/270, generic/388, generic/475, generic/648 use the first method, but that looses any other errors that fsstress may report during the inner loop. overlay/058 uses the second method (but with SIGPIPE). Use this method to preserve other reported errors. Alas, this is not enough to fix the false positive failure - the main test thread needs to also wait for the background jobs to exit. Otherwise, killall -9 in _cleanup() will cause a similar "Killed" message in stderr. Adding -w to killall requires to move it to after unfreeze, otherwise, fsstress process may be left blocked on a frozen fs and wait will not return. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- tests/xfs/422 | 5 ++++- tests/xfs/517 | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/tests/xfs/422 b/tests/xfs/422 index 8e9a3576..63d133f8 100755 --- a/tests/xfs/422 +++ b/tests/xfs/422 @@ -95,8 +95,11 @@ while [ "$(date +%s)" -lt $((end + 2)) ]; do done # ...and clean up after the loops in case they didn't do it themselves. -$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1 +# First thaw fs, so fsstress can exit, then kill and wait for fsstress. +# Use of SIGINT instead of SIGTERM suppresses the "Terminated" print +# from the XXX_loop() bash sub-shells $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1 +$KILLALL_PROG -w -SIGINT $XFS_IO_PROG $FSSTRESS_PROG >> $seqres.full 2>&1 echo "Loop finished at $(date)" >> $seqres.full echo "Test done" diff --git a/tests/xfs/517 b/tests/xfs/517 index 18404248..2f52d634 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -92,8 +92,11 @@ while [ "$(date +%s)" -lt $((end + 2)) ]; do done # ...and clean up after the loops in case they didn't do it themselves. -$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1 +# First thaw fs, so fsstress can exit, then kill and wait for fsstress. +# Use of SIGINT instead of SIGTERM suppresses the "Terminated" print +# from the XXX_loop() bash sub-shells $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1 +$KILLALL_PROG -w -SIGINT $XFS_IO_PROG $FSSTRESS_PROG >> $seqres.full 2>&1 echo "Loop finished at $(date)" >> $seqres.full echo "Test done" -- 2.25.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
end of thread, other threads:[~2022-06-25 3:11 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-19 13:46 [PATCH 0/4] aborted fstests may leave frozen fs behind Amir Goldstein 2022-06-19 13:46 ` [PATCH 1/4] fstests: add missing _require_freeze() to tests Amir Goldstein 2022-06-23 18:00 ` Darrick J. Wong 2022-06-24 4:05 ` Amir Goldstein 2022-06-24 6:27 ` Zorro Lang 2022-06-19 13:46 ` [PATCH 2/4] fstests: make sure to unfreeze test and scratch mounts Amir Goldstein 2022-06-20 11:17 ` Zorro Lang 2022-06-20 12:13 ` Amir Goldstein 2022-06-20 14:21 ` Zorro Lang 2022-06-20 15:08 ` Amir Goldstein 2022-06-20 22:34 ` Dave Chinner 2022-06-21 2:53 ` Amir Goldstein 2022-06-20 22:06 ` Dave Chinner 2022-06-21 2:57 ` Amir Goldstein 2022-06-19 13:46 ` [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup() Amir Goldstein 2022-06-21 4:02 ` Amir Goldstein 2022-06-23 18:04 ` Darrick J. Wong 2022-06-24 4:36 ` Amir Goldstein 2022-06-24 6:31 ` Zorro Lang 2022-06-24 6:41 ` Amir Goldstein 2022-06-24 15:09 ` Zorro Lang 2022-06-24 4:49 ` Delegating fstests maintenance work (Was: Re: [PATCH 3/4] xfs/{422,517}: add missing killall to _cleanup()) Amir Goldstein 2022-06-25 3:11 ` Darrick J. Wong 2022-06-19 13:46 ` [PATCH 4/4] xfs/{422,517}: fix false positive failure Amir Goldstein
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.