[PATCH 1/2] check: try to fix the test device if it gets corrupted

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] check: try to fix the test device if it gets corrupted
@ 2023-04-14 22:11 Leah Rumancik
  2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik
  2023-04-15  0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Leah Rumancik @ 2023-04-14 22:11 UTC (permalink / raw)
  To: fstests; +Cc: tytso, djwong

From: Theodore Ts'o <tytso@mit.edu>

If the test device gets corrupted all subsequent tests will fail.  To
prevent this from causing all subsequent tests to be useless, try
repair the file system on TEST_DEV if possible.  We don't need to do
this with the scratch device since that file system gets recreated
each time anyway.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 check      |  7 ++++++-
 common/rc  | 41 +++++++++++++++++++++++++++++++++++++++++
 common/xfs | 12 ++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/check b/check
index 1a58a2b2..befbf465 100755
--- a/check
+++ b/check
@@ -536,7 +536,12 @@ _check_filesystems()
 	local ret=0
 
 	if [ -f ${RESULT_DIR}/require_test ]; then
-		_check_test_fs || ret=1
+		if ! _check_test_fs ; then
+			ret=1
+			echo "Trying to repair broken TEST_DEV file system"
+			_repair_test_fs
+			_test_mount
+		fi
 		rm -f ${RESULT_DIR}/require_test*
 	else
 		_test_unmount 2> /dev/null
diff --git a/common/rc b/common/rc
index 90749343..e8fc7e86 100644
--- a/common/rc
+++ b/common/rc
@@ -1199,6 +1199,47 @@ _repair_scratch_fs()
     esac
 }
 
+_repair_test_fs()
+{
+	case $FSTYP in
+	xfs)
+		_repair_xfs_test_fs "$@" >$tmp.repair 2>&1
+		res=$?
+		if [ "$res" -ne 0 ]; then
+			echo "xfs_repair returns $res; replay log?" >>$tmp.repair
+			_test_mount
+			res=$?
+			if [ $res -gt 0 ]; then
+				echo "mount returns $res; zap log?" >>$tmp.repair
+				_xfs_repair_test_fs -L >>$tmp.repair 2>&1
+				echo "log zap returns $?" >> $tmp.repair
+			else
+				umount "$TEST_DEV"
+			fi
+			_xfs_repair_test_fs "$@" >>$tmp.repair 2>&1
+			res=$?
+		fi
+		;;
+	*)
+		# Let's hope fsck -y suffices...
+		fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1
+		res=$?
+		if test "$res" -lt 4 ; then
+			res=0
+		fi
+		;;
+	esac
+	if [ $res -ne 0 ]; then
+		_log_err "_repair_test_fs: failed, err=$res"
+		echo "*** fsck.$FSTYP output ***"	>>$seqres.full
+		cat $tmp.repair				>>$seqres.full
+		echo "*** end fsck.$FSTYP output"	>>$seqres.full
+
+	fi
+	rm -f $tmp.repair
+	return $res
+}
+
 _get_pids_by_name()
 {
     if [ $# -ne 1 ]
diff --git a/common/xfs b/common/xfs
index e8e4832c..4a130493 100644
--- a/common/xfs
+++ b/common/xfs
@@ -988,6 +988,18 @@ _check_xfs_test_fs()
 	return $?
 }
 
+# modeled after _scratch_xfs_repair
+_repair_xfs_test_fs()
+{
+	TEST_OPTIONS=""
+	[ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \
+		TEST_OPTIONS="-l$TEST_LOGDEV"
+	[ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \
+		TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV"
+	[ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t"
+	$XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV
+}
+
 _require_xfs_test_rmapbt()
 {
 	_require_test
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] check: _check_filesystems for errors even if test failed
  2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik
@ 2023-04-14 22:11 ` Leah Rumancik
  2023-04-15  0:41   ` Darrick J. Wong
  2023-04-15  0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Leah Rumancik @ 2023-04-14 22:11 UTC (permalink / raw)
  To: fstests; +Cc: tytso, djwong, Leah Rumancik

Previously, we would only run _check_filesystems to ensure that a test
that appeared to pass did not have any filesystem corruption. However,
in _check_filesystems, we also repair any errors found in the filesystem.
Let's do this even if we already know the test failed so that subsequent
tests aren't affected.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 check | 1 +
 1 file changed, 1 insertion(+)

diff --git a/check b/check
index befbf465..c18f02ca 100755
--- a/check
+++ b/check
@@ -972,6 +972,7 @@ function run_section()
 			# Even though we failed, there may be something interesting in
 			# dmesg which can help debugging.
 			_check_dmesg
+			(_adjust_oom_score 250; _check_filesystems)
 			tc_status="fail"
 		else
 			# The test apparently passed, so check for corruption
-- 
2.40.0.634.g4ca3ef3211-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] check: try to fix the test device if it gets corrupted
  2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik
  2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik
@ 2023-04-15  0:35 ` Darrick J. Wong
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2023-04-15  0:35 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: fstests, tytso

On Fri, Apr 14, 2023 at 03:11:32PM -0700, Leah Rumancik wrote:
> From: Theodore Ts'o <tytso@mit.edu>
> 
> If the test device gets corrupted all subsequent tests will fail.  To
> prevent this from causing all subsequent tests to be useless, try
> repair the file system on TEST_DEV if possible.  We don't need to do
> this with the scratch device since that file system gets recreated
> each time anyway.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  check      |  7 ++++++-
>  common/rc  | 41 +++++++++++++++++++++++++++++++++++++++++
>  common/xfs | 12 ++++++++++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/check b/check
> index 1a58a2b2..befbf465 100755
> --- a/check
> +++ b/check
> @@ -536,7 +536,12 @@ _check_filesystems()
>  	local ret=0
>  
>  	if [ -f ${RESULT_DIR}/require_test ]; then
> -		_check_test_fs || ret=1
> +		if ! _check_test_fs ; then
> +			ret=1
> +			echo "Trying to repair broken TEST_DEV file system"
> +			_repair_test_fs
> +			_test_mount
> +		fi
>  		rm -f ${RESULT_DIR}/require_test*
>  	else
>  		_test_unmount 2> /dev/null
> diff --git a/common/rc b/common/rc
> index 90749343..e8fc7e86 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1199,6 +1199,47 @@ _repair_scratch_fs()
>      esac
>  }
>  
> +_repair_test_fs()
> +{
> +	case $FSTYP in
> +	xfs)
> +		_repair_xfs_test_fs "$@" >$tmp.repair 2>&1
> +		res=$?
> +		if [ "$res" -ne 0 ]; then
> +			echo "xfs_repair returns $res; replay log?" >>$tmp.repair
> +			_test_mount
> +			res=$?
> +			if [ $res -gt 0 ]; then
> +				echo "mount returns $res; zap log?" >>$tmp.repair
> +				_xfs_repair_test_fs -L >>$tmp.repair 2>&1
> +				echo "log zap returns $?" >> $tmp.repair
> +			else
> +				umount "$TEST_DEV"
> +			fi
> +			_xfs_repair_test_fs "$@" >>$tmp.repair 2>&1
> +			res=$?
> +		fi
> +		;;
> +	*)
> +		# Let's hope fsck -y suffices...
> +		fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1
> +		res=$?
> +		if test "$res" -lt 4 ; then
> +			res=0
> +		fi
> +		;;
> +	esac
> +	if [ $res -ne 0 ]; then
> +		_log_err "_repair_test_fs: failed, err=$res"
> +		echo "*** fsck.$FSTYP output ***"	>>$seqres.full
> +		cat $tmp.repair				>>$seqres.full
> +		echo "*** end fsck.$FSTYP output"	>>$seqres.full
> +
> +	fi
> +	rm -f $tmp.repair
> +	return $res
> +}
> +
>  _get_pids_by_name()
>  {
>      if [ $# -ne 1 ]
> diff --git a/common/xfs b/common/xfs
> index e8e4832c..4a130493 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -988,6 +988,18 @@ _check_xfs_test_fs()
>  	return $?
>  }
>  
> +# modeled after _scratch_xfs_repair
> +_repair_xfs_test_fs()

Dumb bikeshed: Can this be named _test_xfs_repair? :D

With that changed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +{
> +	TEST_OPTIONS=""
> +	[ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \
> +		TEST_OPTIONS="-l$TEST_LOGDEV"
> +	[ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \
> +		TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV"
> +	[ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t"
> +	$XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV
> +}
> +
>  _require_xfs_test_rmapbt()
>  {
>  	_require_test
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed
  2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik
@ 2023-04-15  0:41   ` Darrick J. Wong
  2023-04-17 17:19     ` Leah Rumancik
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2023-04-15  0:41 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: fstests, tytso

On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> Previously, we would only run _check_filesystems to ensure that a test
> that appeared to pass did not have any filesystem corruption. However,
> in _check_filesystems, we also repair any errors found in the filesystem.
> Let's do this even if we already know the test failed so that subsequent
> tests aren't affected.
> 
> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> ---
>  check | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/check b/check
> index befbf465..c18f02ca 100755
> --- a/check
> +++ b/check
> @@ -972,6 +972,7 @@ function run_section()
>  			# Even though we failed, there may be something interesting in
>  			# dmesg which can help debugging.
>  			_check_dmesg
> +			(_adjust_oom_score 250; _check_filesystems)

Seeing as the test failed, do we care about the state of the scratch fs?
Would it be sufficient only to clean up the test fs to avoid cascading
damage?

(Asking as someone who knows how impactful slow filesystem checking can
be on fstests runtimes... ;))

--D

>  			tc_status="fail"
>  		else
>  			# The test apparently passed, so check for corruption
> -- 
> 2.40.0.634.g4ca3ef3211-goog
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed
  2023-04-15  0:41   ` Darrick J. Wong
@ 2023-04-17 17:19     ` Leah Rumancik
  2023-04-17 20:08       ` Leah Rumancik
  0 siblings, 1 reply; 7+ messages in thread
From: Leah Rumancik @ 2023-04-17 17:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Leah Rumancik, fstests, tytso

Sure, will send out another set. Thanks!

On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > Previously, we would only run _check_filesystems to ensure that a test
> > that appeared to pass did not have any filesystem corruption. However,
> > in _check_filesystems, we also repair any errors found in the filesystem.
> > Let's do this even if we already know the test failed so that subsequent
> > tests aren't affected.
> >
> > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > ---
> >  check | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/check b/check
> > index befbf465..c18f02ca 100755
> > --- a/check
> > +++ b/check
> > @@ -972,6 +972,7 @@ function run_section()
> >                       # Even though we failed, there may be something interesting in
> >                       # dmesg which can help debugging.
> >                       _check_dmesg
> > +                     (_adjust_oom_score 250; _check_filesystems)
>
> Seeing as the test failed, do we care about the state of the scratch fs?
> Would it be sufficient only to clean up the test fs to avoid cascading
> damage?
>
> (Asking as someone who knows how impactful slow filesystem checking can
> be on fstests runtimes... ;))
>
> --D
>
> >                       tc_status="fail"
> >               else
> >                       # The test apparently passed, so check for corruption
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed
  2023-04-17 17:19     ` Leah Rumancik
@ 2023-04-17 20:08       ` Leah Rumancik
  2023-04-18  4:49         ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Leah Rumancik @ 2023-04-17 20:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Leah Rumancik, fstests, tytso

Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want
_check_filesystems to run on the scratch fs as well. This shouldn't
add that much time if most of your tests are passing :) but I could
make this only run for scratch fs on failure only if DUMP_CORRUPT_FS
is set?


On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote:
>
> Sure, will send out another set. Thanks!
>
> On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > > Previously, we would only run _check_filesystems to ensure that a test
> > > that appeared to pass did not have any filesystem corruption. However,
> > > in _check_filesystems, we also repair any errors found in the filesystem.
> > > Let's do this even if we already know the test failed so that subsequent
> > > tests aren't affected.
> > >
> > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > > ---
> > >  check | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/check b/check
> > > index befbf465..c18f02ca 100755
> > > --- a/check
> > > +++ b/check
> > > @@ -972,6 +972,7 @@ function run_section()
> > >                       # Even though we failed, there may be something interesting in
> > >                       # dmesg which can help debugging.
> > >                       _check_dmesg
> > > +                     (_adjust_oom_score 250; _check_filesystems)
> >
> > Seeing as the test failed, do we care about the state of the scratch fs?
> > Would it be sufficient only to clean up the test fs to avoid cascading
> > damage?
> >
> > (Asking as someone who knows how impactful slow filesystem checking can
> > be on fstests runtimes... ;))
> >
> > --D
> >
> > >                       tc_status="fail"
> > >               else
> > >                       # The test apparently passed, so check for corruption
> > > --
> > > 2.40.0.634.g4ca3ef3211-goog
> > >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed
  2023-04-17 20:08       ` Leah Rumancik
@ 2023-04-18  4:49         ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2023-04-18  4:49 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: Leah Rumancik, fstests, tytso

On Mon, Apr 17, 2023 at 01:08:20PM -0700, Leah Rumancik wrote:
> Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want
> _check_filesystems to run on the scratch fs as well.

ooh, good point

> This shouldn't
> add that much time if most of your tests are passing :) but I could
> make this only run for scratch fs on failure only if DUMP_CORRUPT_FS
> is set?

Hmm.  On second thought, you've convinced me to go along with this.
The test failed, let's see if corruption happened --> HAPPY DANCE.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote:
> >
> > Sure, will send out another set. Thanks!
> >
> > On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > > > Previously, we would only run _check_filesystems to ensure that a test
> > > > that appeared to pass did not have any filesystem corruption. However,
> > > > in _check_filesystems, we also repair any errors found in the filesystem.
> > > > Let's do this even if we already know the test failed so that subsequent
> > > > tests aren't affected.
> > > >
> > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > > > ---
> > > >  check | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/check b/check
> > > > index befbf465..c18f02ca 100755
> > > > --- a/check
> > > > +++ b/check
> > > > @@ -972,6 +972,7 @@ function run_section()
> > > >                       # Even though we failed, there may be something interesting in
> > > >                       # dmesg which can help debugging.
> > > >                       _check_dmesg
> > > > +                     (_adjust_oom_score 250; _check_filesystems)
> > >
> > > Seeing as the test failed, do we care about the state of the scratch fs?
> > > Would it be sufficient only to clean up the test fs to avoid cascading
> > > damage?
> > >
> > > (Asking as someone who knows how impactful slow filesystem checking can
> > > be on fstests runtimes... ;))
> > >
> > > --D
> > >
> > > >                       tc_status="fail"
> > > >               else
> > > >                       # The test apparently passed, so check for corruption
> > > > --
> > > > 2.40.0.634.g4ca3ef3211-goog
> > > >

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-04-18  4:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik
2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik
2023-04-15  0:41   ` Darrick J. Wong
2023-04-17 17:19     ` Leah Rumancik
2023-04-17 20:08       ` Leah Rumancik
2023-04-18  4:49         ` Darrick J. Wong
2023-04-15  0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.