linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] fstests: add patient module remover
@ 2021-08-11 15:45 Luis Chamberlain
  2021-08-11 15:45 ` [PATCH v2 1/3] fstests: use udevadm settle after pvremove Luis Chamberlain
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-11 15:45 UTC (permalink / raw)
  To: fstests
  Cc: hare, dgilbert, jeyu, lucas.demarchi, linux-kernel, Luis Chamberlain

This is v2 of my series of enhancements to fstests to deal with
false positives with meta block drivers we use for tests such as
scsi_debug which are caused by races by not being able to remove
a driver.

Changes in v2:

  - Now that I have confirmed the issue with the refcnt being bumped
    after it becomes 0 is also present on linux-next, and *is* a generic
    "this is life with modules" matter, I went ahead and implemented
    a patient module remover support into kmod and posted patches.
    What this means for this series of patches is that we get a real
    patient module remover support in modprobe, and so modprobe -p
    will soon be an option, if merged. This series then now checks for
    that and if its present uses it, otherwise it open codes a similar
    solution.

  - The patient module remover now also re-tries to remove the module,
    as *any* race can easily bump a module refcnt up. We just then need
    an upper limit threshold on timeout or to decide if we run forever.

  - adds udevadm settle after pvremove

  - I confirm now I don't get any stupid module false positives on older
    or newer kernels, and life can move on.

Luis Chamberlain (3):
  fstests: use udevadm settle after pvremove
  common/module: add patient module rmmod support
  common/scsi_debug: use the patient module remover

 common/config     |  31 ++++++++++++++
 common/module     | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 common/scsi_debug |   6 ++-
 tests/generic/081 |   5 ++-
 tests/generic/108 |   1 +
 tests/generic/459 |   1 +
 6 files changed, 148 insertions(+), 3 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] fstests: use udevadm settle after pvremove
  2021-08-11 15:45 [PATCH v2 0/3] fstests: add patient module remover Luis Chamberlain
@ 2021-08-11 15:45 ` Luis Chamberlain
  2021-08-15 12:36   ` Eryu Guan
  2021-08-11 15:45 ` [PATCH v2 2/3] common/module: add patient module rmmod support Luis Chamberlain
  2021-08-11 15:45 ` [PATCH v2 3/3] common/scsi_debug: use the patient module remover Luis Chamberlain
  2 siblings, 1 reply; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-11 15:45 UTC (permalink / raw)
  To: fstests
  Cc: hare, dgilbert, jeyu, lucas.demarchi, linux-kernel, Luis Chamberlain

As with creation, we also need to use udevadm settle
when removing a pv, otherwise we can trip on races with
module removals for the block devices in use.

This reduces the amount of time in which a block device
module refcnt for test modules such as scsi_debug spends
outside of 0.

Races with the refcnt being greater than 0 means module
removal can fail causing false positives. This helps
ensure that the pv is really long gone. These issues
are tracked for scsi_debug [0] and later found to be a
generic issue regardless of filesystem with pvremove [1].

Using udevadm settle *helps*, it does not address all
possible races with the refcnt as noted in the generic
bug entry [1].

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214015
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 tests/generic/081 | 5 ++++-
 tests/generic/108 | 1 +
 tests/generic/459 | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/generic/081 b/tests/generic/081
index f795b2c1..8e552074 100755
--- a/tests/generic/081
+++ b/tests/generic/081
@@ -12,6 +12,7 @@ _begin_fstest auto quick
 # Override the default cleanup function.
 _cleanup()
 {
+	local pv_ret
 	cd /
 	rm -f $tmp.*
 
@@ -34,7 +35,9 @@ _cleanup()
 		$UMOUNT_PROG $mnt >> $seqres.full 2>&1
 		$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 		$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
-		test $? -eq 0 && break
+		pv_ret=$?
+		$UDEV_SETTLE_PROG
+		test $pv_ret -eq 0 && break
 		sleep 2
 	done
 }
diff --git a/tests/generic/108 b/tests/generic/108
index 7dd426c1..b7797e8f 100755
--- a/tests/generic/108
+++ b/tests/generic/108
@@ -21,6 +21,7 @@ _cleanup()
 	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
 	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -f $SCRATCH_DEV $SCSI_DEBUG_DEV >>$seqres.full 2>&1
+	$UDEV_SETTLE_PROG
 	_put_scsi_debug_dev
 	rm -f $tmp.*
 }
diff --git a/tests/generic/459 b/tests/generic/459
index e5e5e9ab..5b44e245 100755
--- a/tests/generic/459
+++ b/tests/generic/459
@@ -29,6 +29,7 @@ _cleanup()
 	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
 	$LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
+	$UDEV_SETTLE_PROG
 }
 
 # Import common functions.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] common/module: add patient module rmmod support
  2021-08-11 15:45 [PATCH v2 0/3] fstests: add patient module remover Luis Chamberlain
  2021-08-11 15:45 ` [PATCH v2 1/3] fstests: use udevadm settle after pvremove Luis Chamberlain
@ 2021-08-11 15:45 ` Luis Chamberlain
  2021-08-15 12:29   ` Eryu Guan
  2021-08-11 15:45 ` [PATCH v2 3/3] common/scsi_debug: use the patient module remover Luis Chamberlain
  2 siblings, 1 reply; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-11 15:45 UTC (permalink / raw)
  To: fstests
  Cc: hare, dgilbert, jeyu, lucas.demarchi, linux-kernel, Luis Chamberlain

When we call rmmod it will fail if the refcnt is greater than 0.
This is expected, however, if using test modules such as scsi_debug,
userspace tests may expect that once userspace is done issuing out
commands it can safely remove the module, and the module will be
removed.

This is not true for few reasons. First, a module might take a while
to quiesce after its used. This varies module by module. For example,
at least for scsi_debug there is one patch to help with this but
that is not sufficient to address all the removal issues, it just helps
quiesce the module faster. If something like LVM pvremove is used, as in
the case of generic/108, it may take time before the module's refcnt goes
to 0 even if DM_DEFERRED_REMOVE is *not* used and even if udevadm settle
is used. Even *after* all this... the module refcnt is still very
fickle. For example, any blkdev_open() against a block device will bump
a module refcnt up and we have little control over stopping these
sporadic userspace calls after a test. A failure on module removal then
just becomes an inconvenience on false positives.

This was first observed on scsi_debug [0]. Doug worked on a patch to
help the driver quiesce [1]. Later the issue has been determined to be
generic [2]. The only way to properly resolve these issues is with a
patient module remover. The kernel used to support a wait for the
delete_module() system call, however this was later deprecated into
kmod with a 10 second userspace sleep. That 10 second sleep is long gone
from kmod now though. I've posted patches now for a kmod patient module
remover then [3], in light of the fact that this issue is generic and
the only way to then properly deal with this is implementing a userspace
patient module remover.

Use the kmod patient module remover when supported, otherwise we open
code our own solution inside fstests. We default to a timeout of 100
seconds. Each test can override the timeout by setting the variable
MODPROBE_PATIENT_RM_TIMEOUT_SECONDS or setting it to "forever" if they
wish for the patience to be infinite.

This uses kmod's patient module remover if you have that feature,
otherwise we open code a solution in fstests which is a simplified
version of what has been proposed for kmod.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lore.kernel.org/linux-scsi/20210508230745.27923-1-dgilbert@interlog.com/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=214015
[3] https://lkml.kernel.org/r/20210810051602.3067384-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/config |  31 +++++++++++++++
 common/module | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)

diff --git a/common/config b/common/config
index 005fd50a..9b8a2bc4 100644
--- a/common/config
+++ b/common/config
@@ -252,6 +252,37 @@ if [[ "$UDEV_SETTLE_PROG" == "" || ! -d /proc/net ]]; then
 fi
 export UDEV_SETTLE_PROG
 
+# Set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS to "forever" if you want the patient
+# modprobe removal to run forever trying to remove a module.
+MODPROBE_REMOVE_PATIENT=""
+modprobe --help | grep -q -1 "remove-patiently"
+if [[ $? -ne 0 ]]; then
+	if [[ -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
+		# We will open code our own implementation of patien module
+		# remover in fstests. Use 100 second default.
+		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"
+	fi
+else
+	MODPROBE_RM_PATIENT_TIMEOUT_ARGS=""
+	if [[ ! -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
+		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_MS" != "forever" ]]; then
+			MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
+			MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
+		fi
+	else
+		# We export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS here for parity
+		# with environments without support for modprobe -p, but we
+		# only really need it exported right now for environments which
+		# don't have support for modprobe -p to implement our own
+		# patient module removal support within fstests.
+		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"
+		MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
+		MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
+	fi
+	MODPROBE_REMOVE_PATIENT="modprobe -p $MODPROBE_RM_TIMEOUT_ARGS"
+fi
+export MODPROBE_REMOVE_PATIENT
+
 export MKFS_XFS_PROG=$(type -P mkfs.xfs)
 export MKFS_EXT4_PROG=$(type -P mkfs.ext4)
 export MKFS_UDF_PROG=$(type -P mkudffs)
diff --git a/common/module b/common/module
index 39e4e793..03953fa1 100644
--- a/common/module
+++ b/common/module
@@ -4,6 +4,8 @@
 #
 # Routines for messing around with loadable kernel modules
 
+source common/config
+
 # Return the module name for this fs.
 _module_for_fs()
 {
@@ -81,3 +83,108 @@ _get_fs_module_param()
 {
 	cat /sys/module/${FSTYP}/parameters/${1} 2>/dev/null
 }
+
+# checks the refcount and returns 0 if we can safely remove the module. rmmod
+# does this check for us, but we can use this to also iterate checking for this
+# refcount before we even try to remove the module. This is useful when using
+# debug test modules which take a while to quiesce.
+_patient_rmmod_check_refcnt()
+{
+	local module=$1
+	local refcnt=0
+
+	if [[ -f /sys/module/$module/refcnt ]]; then
+		refcnt=$(cat /sys/module/$module/refcnt 2>/dev/null)
+		if [[ $? -ne 0 || $refcnt -eq 0 ]]; then
+			return 0
+		fi
+		return 1
+	fi
+	return 0
+}
+
+# Patiently tries to wait to remove a module by ensuring first
+# the refcnt is 0 and then trying to persistently remove the module within
+# the time allowed. The timeout is configurable per test, just set
+# MODPROBE_PATIENT_RM_TIMEOUT_SECONDS prior to including this file.
+# If you want this to try forever just set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+# to the special value of "forever". This applies to both cases where kmod
+# supports the patient module remover (modrobe -p) and where it does not.
+#
+# If your version of kmod supports modprobe -p, we instead use that
+# instead. Otherwise we have to implement a patient module remover
+# ourselves.
+_patient_rmmod()
+{
+	local module=$1
+	local max_tries_max=$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+	local max_tries=0
+	local mod_ret=0
+	local refcnt_is_zero=0
+
+	if [[ ! -z $MODPROBE_REMOVE_PATIENT ]]; then
+		$MODPROBE_REMOVE_PATIENT $module
+		mod_ret=$?
+		if [[ $mod_ret -ne 0 ]]; then
+			echo "kmod patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max returned $mod_ret"
+		fi
+		return $mod_ret
+	fi
+
+	max_tries=$max_tries_max
+
+	while [[ "$max_tries" != "0" ]]; do
+		_patient_rmmod_check_refcnt $module
+		if [[ $? -eq 0 ]]; then
+			refcnt_is_zero=1
+			break
+		fi
+		sleep 1
+		if [[ "$max_tries" == "forever" ]]; then
+			continue
+		fi
+		let max_tries=$max_tries-1
+	done
+
+	if [[ $refcnt_is_zero -ne 1 ]]; then
+		echo "custom patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max"
+		return -1
+	fi
+
+	# If we ran out of time but our refcnt check confirms we had
+	# a refcnt of 0, just try to remove the module once.
+	if [[ "$max_tries" == "0" ]]; then
+		modprobe -r $module
+		return $?
+	fi
+
+	# If we have extra time left. Use the time left to now try to
+	# persistently remove the module. We do this because although through
+	# the above we found refcnt to be 0, removal can still fail since
+	# userspace can always race to bump the refcnt. An example is any
+	# blkdev_open() calls against a block device. These issues have been
+	# tracked and documented in the following bug reports, which justifies
+	# our need to do this in userspace:
+	# https://bugzilla.kernel.org/show_bug.cgi?id=212337
+	# https://bugzilla.kernel.org/show_bug.cgi?id=214015
+	while [[ $max_tries != 0 ]]; do
+		if [[ -d /sys/module/$module ]]; then
+			modprobe -r $module 2> /dev/null
+			mod_ret=$?
+			if [[ $mod_ret == 0 ]]; then
+				break;
+			fi
+			sleep 1
+			if [[ "$max_tries" == "forever" ]]; then
+				continue
+			fi
+			let max_tries=$max_tries-1
+		fi
+	done
+
+	if [[ $mod_ret -ne 0 ]]; then
+		echo "custom patient module removal for $module timed out trying to remove $module using timeout of $max_tries_max last try returned $mod_ret"
+	fi
+
+	return $mod_ret
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] common/scsi_debug: use the patient module remover
  2021-08-11 15:45 [PATCH v2 0/3] fstests: add patient module remover Luis Chamberlain
  2021-08-11 15:45 ` [PATCH v2 1/3] fstests: use udevadm settle after pvremove Luis Chamberlain
  2021-08-11 15:45 ` [PATCH v2 2/3] common/module: add patient module rmmod support Luis Chamberlain
@ 2021-08-11 15:45 ` Luis Chamberlain
  2021-08-15 12:35   ` Eryu Guan
  2 siblings, 1 reply; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-11 15:45 UTC (permalink / raw)
  To: fstests
  Cc: hare, dgilbert, jeyu, lucas.demarchi, linux-kernel, Luis Chamberlain

If you try to run tests such as generic/108 in a loop
you'll eventually see a failure, but the failure can
be a false positive and the test was just unable to remove
the scsi_debug module.

We need to give some time for the refcnt to become 0. For
instance for the test generic/108 the refcnt lingers between
2 and 1. It should be 0 when we're done but a bit of time
seems to be required. The chance of us trying to run rmmod
when the refcnt is 2 or 1 is low, about 1/30 times if you
run the test in a loop on linux-next today.

Likewise, even when its 0 we just need a tiny breather before
we can remove the module (sleep 10 suffices) but this is
only required on older kernels. Otherwise removing the module
will just fail.

Some of these races are documented on the korg#212337, and
Doug Gilbert has posted at least one patch attempt to try
to help with this [1]. The patch does not resolve all the
issues though, it helps though.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@interlog.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/scsi_debug | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/common/scsi_debug b/common/scsi_debug
index e7988469..3c9cd820 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -4,11 +4,13 @@
 #
 # Functions useful for tests on unique block devices
 
+. common/module
+
 _require_scsi_debug()
 {
 	# make sure we have the module and it's not already used
 	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
-	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
+	lsmod | grep -wq scsi_debug && (_patient_rmmod scsi_debug || _notrun "scsi_debug module in use")
 	# make sure it has the features we need
 	# logical/physical sectors plus unmap support all went in together
 	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
@@ -53,5 +55,5 @@ _put_scsi_debug_dev()
 		$UDEV_SETTLE_PROG
 		n=$((n-1))
 	done
-	rmmod scsi_debug || _fail "Could not remove scsi_debug module"
+	_patient_rmmod scsi_debug || _fail "Could not remove scsi_debug module"
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] common/module: add patient module rmmod support
  2021-08-11 15:45 ` [PATCH v2 2/3] common/module: add patient module rmmod support Luis Chamberlain
@ 2021-08-15 12:29   ` Eryu Guan
  2021-08-18 14:02     ` Luis Chamberlain
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2021-08-15 12:29 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Wed, Aug 11, 2021 at 08:45:11AM -0700, Luis Chamberlain wrote:
> When we call rmmod it will fail if the refcnt is greater than 0.
> This is expected, however, if using test modules such as scsi_debug,
> userspace tests may expect that once userspace is done issuing out
> commands it can safely remove the module, and the module will be
> removed.
> 
> This is not true for few reasons. First, a module might take a while
> to quiesce after its used. This varies module by module. For example,
> at least for scsi_debug there is one patch to help with this but
> that is not sufficient to address all the removal issues, it just helps
> quiesce the module faster. If something like LVM pvremove is used, as in
> the case of generic/108, it may take time before the module's refcnt goes
> to 0 even if DM_DEFERRED_REMOVE is *not* used and even if udevadm settle
> is used. Even *after* all this... the module refcnt is still very
> fickle. For example, any blkdev_open() against a block device will bump
> a module refcnt up and we have little control over stopping these
> sporadic userspace calls after a test. A failure on module removal then
> just becomes an inconvenience on false positives.
> 
> This was first observed on scsi_debug [0]. Doug worked on a patch to
> help the driver quiesce [1]. Later the issue has been determined to be
> generic [2]. The only way to properly resolve these issues is with a
> patient module remover. The kernel used to support a wait for the
> delete_module() system call, however this was later deprecated into
> kmod with a 10 second userspace sleep. That 10 second sleep is long gone
> from kmod now though. I've posted patches now for a kmod patient module
> remover then [3], in light of the fact that this issue is generic and
> the only way to then properly deal with this is implementing a userspace
> patient module remover.
> 
> Use the kmod patient module remover when supported, otherwise we open
> code our own solution inside fstests. We default to a timeout of 100
> seconds. Each test can override the timeout by setting the variable
> MODPROBE_PATIENT_RM_TIMEOUT_SECONDS or setting it to "forever" if they
> wish for the patience to be infinite.
> 
> This uses kmod's patient module remover if you have that feature,
> otherwise we open code a solution in fstests which is a simplified
> version of what has been proposed for kmod.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
> [1] https://lore.kernel.org/linux-scsi/20210508230745.27923-1-dgilbert@interlog.com/
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=214015
> [3] https://lkml.kernel.org/r/20210810051602.3067384-1-mcgrof@kernel.org
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/config |  31 +++++++++++++++
>  common/module | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++

Please also update README to document the new configurable variables.

>  2 files changed, 138 insertions(+)
> 
> diff --git a/common/config b/common/config
> index 005fd50a..9b8a2bc4 100644
> --- a/common/config
> +++ b/common/config
> @@ -252,6 +252,37 @@ if [[ "$UDEV_SETTLE_PROG" == "" || ! -d /proc/net ]]; then
>  fi
>  export UDEV_SETTLE_PROG
>  
> +# Set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS to "forever" if you want the patient
> +# modprobe removal to run forever trying to remove a module.
> +MODPROBE_REMOVE_PATIENT=""
> +modprobe --help | grep -q -1 "remove-patiently"
> +if [[ $? -ne 0 ]]; then
> +	if [[ -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
> +		# We will open code our own implementation of patien module
> +		# remover in fstests. Use 100 second default.
> +		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"

100s as default seems a bit long to me, use 10s as in v1 patch?

> +	fi
> +else
> +	MODPROBE_RM_PATIENT_TIMEOUT_ARGS=""
> +	if [[ ! -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
> +		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_MS" != "forever" ]]; then

Should check MODPROBE_PATIENT_RM_TIMEOUT_SECONDS instead?

> +			MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
> +			MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
> +		fi
> +	else
> +		# We export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS here for parity
> +		# with environments without support for modprobe -p, but we
> +		# only really need it exported right now for environments which
> +		# don't have support for modprobe -p to implement our own
> +		# patient module removal support within fstests.
> +		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"
> +		MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
> +		MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
> +	fi
> +	MODPROBE_REMOVE_PATIENT="modprobe -p $MODPROBE_RM_TIMEOUT_ARGS"
> +fi
> +export MODPROBE_REMOVE_PATIENT
> +
>  export MKFS_XFS_PROG=$(type -P mkfs.xfs)
>  export MKFS_EXT4_PROG=$(type -P mkfs.ext4)
>  export MKFS_UDF_PROG=$(type -P mkudffs)
> diff --git a/common/module b/common/module
> index 39e4e793..03953fa1 100644
> --- a/common/module
> +++ b/common/module
> @@ -4,6 +4,8 @@
>  #
>  # Routines for messing around with loadable kernel modules
>  
> +source common/config
> +

Seems there's no need to source common/config here, as it's sourced in
common/rc, which is sourced by every test.

>  # Return the module name for this fs.
>  _module_for_fs()
>  {
> @@ -81,3 +83,108 @@ _get_fs_module_param()
>  {
>  	cat /sys/module/${FSTYP}/parameters/${1} 2>/dev/null
>  }
> +
> +# checks the refcount and returns 0 if we can safely remove the module. rmmod
> +# does this check for us, but we can use this to also iterate checking for this
> +# refcount before we even try to remove the module. This is useful when using
> +# debug test modules which take a while to quiesce.
> +_patient_rmmod_check_refcnt()
> +{
> +	local module=$1
> +	local refcnt=0
> +
> +	if [[ -f /sys/module/$module/refcnt ]]; then
> +		refcnt=$(cat /sys/module/$module/refcnt 2>/dev/null)
> +		if [[ $? -ne 0 || $refcnt -eq 0 ]]; then
> +			return 0
> +		fi
> +		return 1
> +	fi
> +	return 0
> +}
> +
> +# Patiently tries to wait to remove a module by ensuring first
> +# the refcnt is 0 and then trying to persistently remove the module within
> +# the time allowed. The timeout is configurable per test, just set
> +# MODPROBE_PATIENT_RM_TIMEOUT_SECONDS prior to including this file.
> +# If you want this to try forever just set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
> +# to the special value of "forever". This applies to both cases where kmod
> +# supports the patient module remover (modrobe -p) and where it does not.
> +#
> +# If your version of kmod supports modprobe -p, we instead use that
> +# instead. Otherwise we have to implement a patient module remover
> +# ourselves.
> +_patient_rmmod()
> +{
> +	local module=$1
> +	local max_tries_max=$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
> +	local max_tries=0
> +	local mod_ret=0
> +	local refcnt_is_zero=0
> +
> +	if [[ ! -z $MODPROBE_REMOVE_PATIENT ]]; then
> +		$MODPROBE_REMOVE_PATIENT $module
> +		mod_ret=$?
> +		if [[ $mod_ret -ne 0 ]]; then
> +			echo "kmod patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max returned $mod_ret"
> +		fi
> +		return $mod_ret
> +	fi
> +
> +	max_tries=$max_tries_max
> +
> +	while [[ "$max_tries" != "0" ]]; do

Use "$max_tries -ne 0" to check inters seems better.

> +		_patient_rmmod_check_refcnt $module
> +		if [[ $? -eq 0 ]]; then

Like above

> +			refcnt_is_zero=1
> +			break
> +		fi
> +		sleep 1
> +		if [[ "$max_tries" == "forever" ]]; then
> +			continue
> +		fi
> +		let max_tries=$max_tries-1
> +	done
> +
> +	if [[ $refcnt_is_zero -ne 1 ]]; then
> +		echo "custom patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max"
> +		return -1
> +	fi
> +
> +	# If we ran out of time but our refcnt check confirms we had
> +	# a refcnt of 0, just try to remove the module once.
> +	if [[ "$max_tries" == "0" ]]; then

$max_tries -eq 0

Thanks,
Eryu

> +		modprobe -r $module
> +		return $?
> +	fi
> +
> +	# If we have extra time left. Use the time left to now try to
> +	# persistently remove the module. We do this because although through
> +	# the above we found refcnt to be 0, removal can still fail since
> +	# userspace can always race to bump the refcnt. An example is any
> +	# blkdev_open() calls against a block device. These issues have been
> +	# tracked and documented in the following bug reports, which justifies
> +	# our need to do this in userspace:
> +	# https://bugzilla.kernel.org/show_bug.cgi?id=212337
> +	# https://bugzilla.kernel.org/show_bug.cgi?id=214015
> +	while [[ $max_tries != 0 ]]; do
> +		if [[ -d /sys/module/$module ]]; then
> +			modprobe -r $module 2> /dev/null
> +			mod_ret=$?
> +			if [[ $mod_ret == 0 ]]; then
> +				break;
> +			fi
> +			sleep 1
> +			if [[ "$max_tries" == "forever" ]]; then
> +				continue
> +			fi
> +			let max_tries=$max_tries-1
> +		fi
> +	done
> +
> +	if [[ $mod_ret -ne 0 ]]; then
> +		echo "custom patient module removal for $module timed out trying to remove $module using timeout of $max_tries_max last try returned $mod_ret"
> +	fi
> +
> +	return $mod_ret
> +}
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] common/scsi_debug: use the patient module remover
  2021-08-11 15:45 ` [PATCH v2 3/3] common/scsi_debug: use the patient module remover Luis Chamberlain
@ 2021-08-15 12:35   ` Eryu Guan
  2021-08-20  0:33     ` Luis Chamberlain
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2021-08-15 12:35 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Wed, Aug 11, 2021 at 08:45:12AM -0700, Luis Chamberlain wrote:
> If you try to run tests such as generic/108 in a loop
> you'll eventually see a failure, but the failure can
> be a false positive and the test was just unable to remove
> the scsi_debug module.
> 
> We need to give some time for the refcnt to become 0. For
> instance for the test generic/108 the refcnt lingers between
> 2 and 1. It should be 0 when we're done but a bit of time
> seems to be required. The chance of us trying to run rmmod
> when the refcnt is 2 or 1 is low, about 1/30 times if you
> run the test in a loop on linux-next today.
> 
> Likewise, even when its 0 we just need a tiny breather before
> we can remove the module (sleep 10 suffices) but this is
> only required on older kernels. Otherwise removing the module
> will just fail.
> 
> Some of these races are documented on the korg#212337, and
> Doug Gilbert has posted at least one patch attempt to try
> to help with this [1]. The patch does not resolve all the
> issues though, it helps though.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
> [1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@interlog.com
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/scsi_debug | 6 ++++--

There're also some "modprobe -r" calls in common/module, should the be
replaced with _patient_rmmod as well?

>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/common/scsi_debug b/common/scsi_debug
> index e7988469..3c9cd820 100644
> --- a/common/scsi_debug
> +++ b/common/scsi_debug
> @@ -4,11 +4,13 @@
>  #
>  # Functions useful for tests on unique block devices
>  
> +. common/module
> +
>  _require_scsi_debug()
>  {
>  	# make sure we have the module and it's not already used
>  	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
> -	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
> +	lsmod | grep -wq scsi_debug && (_patient_rmmod scsi_debug || _notrun "scsi_debug module in use")

I don't think we should use _patient_rmmod here, if we set timeout to
forever and there's actually some other process using scsi_debug module,
we'll loop forever here. And a failure to remove scsi_debug only results
in _notrun, and won't cause false test failure.

>  	# make sure it has the features we need
>  	# logical/physical sectors plus unmap support all went in together
>  	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
> @@ -53,5 +55,5 @@ _put_scsi_debug_dev()
>  		$UDEV_SETTLE_PROG
>  		n=$((n-1))
>  	done

I think the above while loop could be removed as well?

Thanks,
Eryu

> -	rmmod scsi_debug || _fail "Could not remove scsi_debug module"
> +	_patient_rmmod scsi_debug || _fail "Could not remove scsi_debug module"
>  }
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] fstests: use udevadm settle after pvremove
  2021-08-11 15:45 ` [PATCH v2 1/3] fstests: use udevadm settle after pvremove Luis Chamberlain
@ 2021-08-15 12:36   ` Eryu Guan
  0 siblings, 0 replies; 11+ messages in thread
From: Eryu Guan @ 2021-08-15 12:36 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Wed, Aug 11, 2021 at 08:45:10AM -0700, Luis Chamberlain wrote:
> As with creation, we also need to use udevadm settle
> when removing a pv, otherwise we can trip on races with
> module removals for the block devices in use.
> 
> This reduces the amount of time in which a block device
> module refcnt for test modules such as scsi_debug spends
> outside of 0.
> 
> Races with the refcnt being greater than 0 means module
> removal can fail causing false positives. This helps
> ensure that the pv is really long gone. These issues
> are tracked for scsi_debug [0] and later found to be a
> generic issue regardless of filesystem with pvremove [1].
> 
> Using udevadm settle *helps*, it does not address all
> possible races with the refcnt as noted in the generic
> bug entry [1].
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=214015
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

I applied this patch for this week's update.

Thanks,
Eryu

> ---
>  tests/generic/081 | 5 ++++-
>  tests/generic/108 | 1 +
>  tests/generic/459 | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/generic/081 b/tests/generic/081
> index f795b2c1..8e552074 100755
> --- a/tests/generic/081
> +++ b/tests/generic/081
> @@ -12,6 +12,7 @@ _begin_fstest auto quick
>  # Override the default cleanup function.
>  _cleanup()
>  {
> +	local pv_ret
>  	cd /
>  	rm -f $tmp.*
>  
> @@ -34,7 +35,9 @@ _cleanup()
>  		$UMOUNT_PROG $mnt >> $seqres.full 2>&1
>  		$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
>  		$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
> -		test $? -eq 0 && break
> +		pv_ret=$?
> +		$UDEV_SETTLE_PROG
> +		test $pv_ret -eq 0 && break
>  		sleep 2
>  	done
>  }
> diff --git a/tests/generic/108 b/tests/generic/108
> index 7dd426c1..b7797e8f 100755
> --- a/tests/generic/108
> +++ b/tests/generic/108
> @@ -21,6 +21,7 @@ _cleanup()
>  	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
>  	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
>  	$LVM_PROG pvremove -f $SCRATCH_DEV $SCSI_DEBUG_DEV >>$seqres.full 2>&1
> +	$UDEV_SETTLE_PROG
>  	_put_scsi_debug_dev
>  	rm -f $tmp.*
>  }
> diff --git a/tests/generic/459 b/tests/generic/459
> index e5e5e9ab..5b44e245 100755
> --- a/tests/generic/459
> +++ b/tests/generic/459
> @@ -29,6 +29,7 @@ _cleanup()
>  	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
>  	$LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
>  	$LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
> +	$UDEV_SETTLE_PROG
>  }
>  
>  # Import common functions.
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] common/module: add patient module rmmod support
  2021-08-15 12:29   ` Eryu Guan
@ 2021-08-18 14:02     ` Luis Chamberlain
  2021-08-19  2:26       ` Eryu Guan
  0 siblings, 1 reply; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-18 14:02 UTC (permalink / raw)
  To: Eryu Guan; +Cc: fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Sun, Aug 15, 2021 at 08:29:42PM +0800, Eryu Guan wrote:
> On Wed, Aug 11, 2021 at 08:45:11AM -0700, Luis Chamberlain wrote:
> >  common/config |  31 +++++++++++++++
> >  common/module | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Please also update README to document the new configurable variables.

Got it.

> >  2 files changed, 138 insertions(+)
> > 
> > diff --git a/common/config b/common/config
> > index 005fd50a..9b8a2bc4 100644
> > --- a/common/config
> > +++ b/common/config
> 
> 100s as default seems a bit long to me, use 10s as in v1 patch?

In practice I tried using 10s and from my observations we *still* ran
into false positives. So from my own testing peace of mind I'd prefer at
least something higher, and if its going to be higher might as well go
with something which at least makes painfully safe. I'll go with 50s
for my next submission.

> > +	fi
> > +else
> > +	MODPROBE_RM_PATIENT_TIMEOUT_ARGS=""
> > +	if [[ ! -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
> > +		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_MS" != "forever" ]]; then
> 
> Should check MODPROBE_PATIENT_RM_TIMEOUT_SECONDS instead?

Indeed will fix.

> > diff --git a/common/module b/common/module
> > index 39e4e793..03953fa1 100644
> > --- a/common/module
> > +++ b/common/module
> > @@ -4,6 +4,8 @@
> >  #
> >  # Routines for messing around with loadable kernel modules
> >  
> > +source common/config
> > +
> 
> Seems there's no need to source common/config here, as it's sourced in
> common/rc, which is sourced by every test.

OK.

> > +	local max_tries_max=$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
<-- snip -->

> > +	local max_tries=0
<-- snip -->

> > +	max_tries=$max_tries_max
> > +
> > +	while [[ "$max_tries" != "0" ]]; do
> 
> Use "$max_tries -ne 0" to check inters seems better.

max_tries can be "forever", in which case this is -eq 0:

$ foo="forever"; if [[ $foo -eq 0 ]]; then echo buggy; else echo ok; fi
buggy

> > +			refcnt_is_zero=1
> > +			break
> > +		fi
> > +		sleep 1
> > +		if [[ "$max_tries" == "forever" ]]; then
> > +			continue
> > +		fi
> > +		let max_tries=$max_tries-1
> > +	done
> > +
> > +	if [[ $refcnt_is_zero -ne 1 ]]; then
> > +		echo "custom patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max"
> > +		return -1
> > +	fi
> > +
> > +	# If we ran out of time but our refcnt check confirms we had
> > +	# a refcnt of 0, just try to remove the module once.
> > +	if [[ "$max_tries" == "0" ]]; then
> 
> $max_tries -eq 0

Same issue.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] common/module: add patient module rmmod support
  2021-08-18 14:02     ` Luis Chamberlain
@ 2021-08-19  2:26       ` Eryu Guan
  2021-08-19 23:58         ` Luis Chamberlain
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2021-08-19  2:26 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Eryu Guan, fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Wed, Aug 18, 2021 at 07:02:56AM -0700, Luis Chamberlain wrote:
> On Sun, Aug 15, 2021 at 08:29:42PM +0800, Eryu Guan wrote:
> > On Wed, Aug 11, 2021 at 08:45:11AM -0700, Luis Chamberlain wrote:
> > >  common/config |  31 +++++++++++++++
> > >  common/module | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > 
> > Please also update README to document the new configurable variables.
> 
> Got it.
> 
> > >  2 files changed, 138 insertions(+)
> > > 
> > > diff --git a/common/config b/common/config
> > > index 005fd50a..9b8a2bc4 100644
> > > --- a/common/config
> > > +++ b/common/config
> > 
> > 100s as default seems a bit long to me, use 10s as in v1 patch?
> 
> In practice I tried using 10s and from my observations we *still* ran
> into false positives. So from my own testing peace of mind I'd prefer at
> least something higher, and if its going to be higher might as well go
> with something which at least makes painfully safe. I'll go with 50s
> for my next submission.
> 
> > > +	fi
> > > +else
> > > +	MODPROBE_RM_PATIENT_TIMEOUT_ARGS=""
> > > +	if [[ ! -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
> > > +		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_MS" != "forever" ]]; then
> > 
> > Should check MODPROBE_PATIENT_RM_TIMEOUT_SECONDS instead?
> 
> Indeed will fix.
> 
> > > diff --git a/common/module b/common/module
> > > index 39e4e793..03953fa1 100644
> > > --- a/common/module
> > > +++ b/common/module
> > > @@ -4,6 +4,8 @@
> > >  #
> > >  # Routines for messing around with loadable kernel modules
> > >  
> > > +source common/config
> > > +
> > 
> > Seems there's no need to source common/config here, as it's sourced in
> > common/rc, which is sourced by every test.
> 
> OK.
> 
> > > +	local max_tries_max=$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
> <-- snip -->
> 
> > > +	local max_tries=0
> <-- snip -->
> 
> > > +	max_tries=$max_tries_max
> > > +
> > > +	while [[ "$max_tries" != "0" ]]; do
> > 
> > Use "$max_tries -ne 0" to check inters seems better.
> 
> max_tries can be "forever", in which case this is -eq 0:
> 
> $ foo="forever"; if [[ $foo -eq 0 ]]; then echo buggy; else echo ok; fi
> buggy

I see, that makes sense. Then perhaps some comments would help.

Thanks,
Eryu

> 
> > > +			refcnt_is_zero=1
> > > +			break
> > > +		fi
> > > +		sleep 1
> > > +		if [[ "$max_tries" == "forever" ]]; then
> > > +			continue
> > > +		fi
> > > +		let max_tries=$max_tries-1
> > > +	done
> > > +
> > > +	if [[ $refcnt_is_zero -ne 1 ]]; then
> > > +		echo "custom patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max"
> > > +		return -1
> > > +	fi
> > > +
> > > +	# If we ran out of time but our refcnt check confirms we had
> > > +	# a refcnt of 0, just try to remove the module once.
> > > +	if [[ "$max_tries" == "0" ]]; then
> > 
> > $max_tries -eq 0
> 
> Same issue.
> 
>   Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] common/module: add patient module rmmod support
  2021-08-19  2:26       ` Eryu Guan
@ 2021-08-19 23:58         ` Luis Chamberlain
  0 siblings, 0 replies; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-19 23:58 UTC (permalink / raw)
  To: Eryu Guan
  Cc: Eryu Guan, fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

On Thu, Aug 19, 2021 at 10:26:44AM +0800, Eryu Guan wrote:
> On Wed, Aug 18, 2021 at 07:02:56AM -0700, Luis Chamberlain wrote:
> > On Sun, Aug 15, 2021 at 08:29:42PM +0800, Eryu Guan wrote:
> > > 
> > > Use "$max_tries -ne 0" to check inters seems better.
> > 
> > max_tries can be "forever", in which case this is -eq 0:
> > 
> > $ foo="forever"; if [[ $foo -eq 0 ]]; then echo buggy; else echo ok; fi
> > buggy
> 
> I see, that makes sense. Then perhaps some comments would help.

I had sent out a v3 before seeing this request. I'll send a v4 with 
a comment clarifying this.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] common/scsi_debug: use the patient module remover
  2021-08-15 12:35   ` Eryu Guan
@ 2021-08-20  0:33     ` Luis Chamberlain
  0 siblings, 0 replies; 11+ messages in thread
From: Luis Chamberlain @ 2021-08-20  0:33 UTC (permalink / raw)
  To: Eryu Guan; +Cc: fstests, hare, dgilbert, jeyu, lucas.demarchi, linux-kernel

Sorry for some reason I missed this email. Replies are below.

On Sun, Aug 15, 2021 at 08:35:02PM +0800, Eryu Guan wrote:
> On Wed, Aug 11, 2021 at 08:45:12AM -0700, Luis Chamberlain wrote:
> > If you try to run tests such as generic/108 in a loop
> > you'll eventually see a failure, but the failure can
> > be a false positive and the test was just unable to remove
> > the scsi_debug module.
> > 
> > We need to give some time for the refcnt to become 0. For
> > instance for the test generic/108 the refcnt lingers between
> > 2 and 1. It should be 0 when we're done but a bit of time
> > seems to be required. The chance of us trying to run rmmod
> > when the refcnt is 2 or 1 is low, about 1/30 times if you
> > run the test in a loop on linux-next today.
> > 
> > Likewise, even when its 0 we just need a tiny breather before
> > we can remove the module (sleep 10 suffices) but this is
> > only required on older kernels. Otherwise removing the module
> > will just fail.
> > 
> > Some of these races are documented on the korg#212337, and
> > Doug Gilbert has posted at least one patch attempt to try
> > to help with this [1]. The patch does not resolve all the
> > issues though, it helps though.
> > 
> > [0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
> > [1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@interlog.com
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  common/scsi_debug | 6 ++++--
> 
> There're also some "modprobe -r" calls in common/module, should the be
> replaced with _patient_rmmod as well?

Ah, yes well... indeed, that could very well be the case. But that
is another change. It can surely help fix quite a bit of other odd
false positives.

I would prefer if such changes would go in separately.

> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/common/scsi_debug b/common/scsi_debug
> > index e7988469..3c9cd820 100644
> > --- a/common/scsi_debug
> > +++ b/common/scsi_debug
> > @@ -4,11 +4,13 @@
> >  #
> >  # Functions useful for tests on unique block devices
> >  
> > +. common/module
> > +
> >  _require_scsi_debug()
> >  {
> >  	# make sure we have the module and it's not already used
> >  	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
> > -	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
> > +	lsmod | grep -wq scsi_debug && (_patient_rmmod scsi_debug || _notrun "scsi_debug module in use")
> 
> I don't think we should use _patient_rmmod here, if we set timeout to
> forever and there's actually some other process using scsi_debug module,
> we'll loop forever here. And a failure to remove scsi_debug only results
> in _notrun, and won't cause false test failure.

If one sets the timeout to forever and if the module is already loaded
and we cannot remove it, I'd say its working as intended. But indeed,
fstests does not leave modules lingering around... it would be a user
error. I'll add a special case for this.

> >  	# make sure it has the features we need
> >  	# logical/physical sectors plus unmap support all went in together
> >  	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
> > @@ -53,5 +55,5 @@ _put_scsi_debug_dev()
> >  		$UDEV_SETTLE_PROG
> >  		n=$((n-1))
> >  	done
> 
> I think the above while loop could be removed as well?

Good call, yeah that crap is not needed anymore. I'd still leave in
place the UDEV_SETTLE_PROG though, as I think that's good practice
and we can't loose anything. It can for instance salvage buggy
tests which forgot to use it.

 Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-08-20  0:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 15:45 [PATCH v2 0/3] fstests: add patient module remover Luis Chamberlain
2021-08-11 15:45 ` [PATCH v2 1/3] fstests: use udevadm settle after pvremove Luis Chamberlain
2021-08-15 12:36   ` Eryu Guan
2021-08-11 15:45 ` [PATCH v2 2/3] common/module: add patient module rmmod support Luis Chamberlain
2021-08-15 12:29   ` Eryu Guan
2021-08-18 14:02     ` Luis Chamberlain
2021-08-19  2:26       ` Eryu Guan
2021-08-19 23:58         ` Luis Chamberlain
2021-08-11 15:45 ` [PATCH v2 3/3] common/scsi_debug: use the patient module remover Luis Chamberlain
2021-08-15 12:35   ` Eryu Guan
2021-08-20  0:33     ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).