All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: do not use interruptible wait anywhere
@ 2018-04-12 16:23 Alan Jenkins
  2018-04-12 17:51   ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-12 16:23 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: linux-kernel, Alan Jenkins, stable

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
 block/blk-core.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index abcb8684ba67..5a6d20069364 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -915,7 +915,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
+		wait_event(q->mq_freeze_wq,
 				(atomic_read(&q->mq_freeze_depth) == 0 &&
 				 (preempt || !blk_queue_preempt_only(q))) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] block: do not use interruptible wait anywhere
  2018-04-12 16:23 [PATCH] block: do not use interruptible wait anywhere Alan Jenkins
@ 2018-04-12 17:51   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2018-04-12 17:51 UTC (permalink / raw)
  To: alan.christopher.jenkins, linux-block, axboe; +Cc: linux-kernel, stable

T24gVGh1LCAyMDE4LTA0LTEyIGF0IDE3OjIzICswMTAwLCBBbGFuIEplbmtpbnMgd3JvdGU6DQo+
IEBAIC05NDcsMTQgKzk0NiwxMiBAQCBpbnQgYmxrX3F1ZXVlX2VudGVyKHN0cnVjdCByZXF1ZXN0
X3F1ZXVlICpxLCBibGtfbXFfcmVxX2ZsYWdzX3QgZmxhZ3MpDQo+ICAJCSAqLw0KPiAgCQlzbXBf
cm1iKCk7DQo+ICANCj4gLQkJcmV0ID0gd2FpdF9ldmVudF9pbnRlcnJ1cHRpYmxlKHEtPm1xX2Zy
ZWV6ZV93cSwNCj4gKwkJd2FpdF9ldmVudChxLT5tcV9mcmVlemVfd3EsDQo+ICAJCQkJKGF0b21p
Y19yZWFkKCZxLT5tcV9mcmVlemVfZGVwdGgpID09IDAgJiYNCj4gIAkJCQkgKHByZWVtcHQgfHwg
IWJsa19xdWV1ZV9wcmVlbXB0X29ubHkocSkpKSB8fA0KPiAgCQkJCWJsa19xdWV1ZV9keWluZyhx
KSk7DQo+ICAJCWlmIChibGtfcXVldWVfZHlpbmcocSkpDQo+ICAJCQlyZXR1cm4gLUVOT0RFVjsN
Cj4gLQkJaWYgKHJldCkNCj4gLQkJCXJldHVybiByZXQ7DQo+ICAJfQ0KPiAgfQ0KDQpIZWxsbyBB
bGFuLA0KDQpQbGVhc2UgcmVpbmRlbnQgdGhlIHdhaXRfZXZlbnQoKSBhcmd1bWVudHMgc3VjaCB0
aGF0IHRoZXNlIHJlbWFpbiBhbGlnbmVkLg0KDQpBbnl3YXk6DQoNClJldmlld2VkLWJ5OiBCYXJ0
IFZhbiBBc3NjaGUgPGJhcnQudmFuYXNzY2hlQHdkYy5jb20+DQoNCg0K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] block: do not use interruptible wait anywhere
@ 2018-04-12 17:51   ` Bart Van Assche
  0 siblings, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2018-04-12 17:51 UTC (permalink / raw)
  To: alan.christopher.jenkins, linux-block, axboe; +Cc: linux-kernel, stable

On Thu, 2018-04-12 at 17:23 +0100, Alan Jenkins wrote:
> @@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		 */
>  		smp_rmb();
>  
> -		ret = wait_event_interruptible(q->mq_freeze_wq,
> +		wait_event(q->mq_freeze_wq,
>  				(atomic_read(&q->mq_freeze_depth) == 0 &&
>  				 (preempt || !blk_queue_preempt_only(q))) ||
>  				blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> -		if (ret)
> -			return ret;
>  	}
>  }

Hello Alan,

Please reindent the wait_event() arguments such that these remain aligned.

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-12 17:51   ` Bart Van Assche
  (?)
@ 2018-04-12 18:11   ` Alan Jenkins
  2018-04-13  8:31     ` Johannes Thumshirn
  2018-04-14 19:54     ` [PATCH v2] block: do not use interruptible wait anywhere Jens Axboe
  -1 siblings, 2 replies; 23+ messages in thread
From: Alan Jenkins @ 2018-04-12 18:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Bart Van Assche, linux-kernel, Alan Jenkins, stable

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
v2: fix indentation

 block/blk-core.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index abcb8684ba67..1a762f3980f2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -915,7 +915,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-12 18:11   ` [PATCH v2] " Alan Jenkins
@ 2018-04-13  8:31     ` Johannes Thumshirn
  2018-04-14 19:46       ` blktest for " Alan Jenkins
  2018-04-14 19:54     ` [PATCH v2] block: do not use interruptible wait anywhere Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2018-04-13  8:31 UTC (permalink / raw)
  To: Alan Jenkins, Jens Axboe, linux-block
  Cc: Bart Van Assche, linux-kernel, stable

Hi Alan,

On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

Can you please also add a regression test to blktests[1] for this?

[1] https://github.com/osandov/blktests

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-13  8:31     ` Johannes Thumshirn
@ 2018-04-14 19:46       ` Alan Jenkins
  2018-04-14 19:52         ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-14 19:46 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-block
  Cc: Jens Axboe, Bart Van Assche, linux-kernel, stable

On 13/04/18 09:31, Johannes Thumshirn wrote:
> Hi Alan,
>
> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>    while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>    echo mem > /sys/power/state ; \
>>    sleep 5; killall dd  # stop after 5 seconds
> Can you please also add a regression test to blktests[1] for this?
>
> [1] https://github.com/osandov/blktests
>
> Thanks,
> 	Johannes

Good question. It would be nice to promote this test.

Template looks like I need the commit (sha1) first.

I had some ideas about automating it, so I wrote a standalone (see 
end).  I can automate the wakeup by using pm_test, but this is still a 
system suspend test.  Unfortunately I don't think there's any 
alternative. To give the most dire example

     # This test is non-destructive, but it exercises suspend in all drivers.
     # If your system has a problem with suspend, it might not wake up again.


So I'm not sure if it would be acceptable for the default set?

How useful is this going to be? Is there an expanded/full set of tests 
that gets run somewhere?

If you can't guarantee it's going to be run somewhere, I'd worry the 
cost/benefit  feels a little narrow :-(. There were one or two further 
"interesting" details, and it might theoretically bitrot if it's not run 
periodically.

If you look at the diff and title for the fix, I don't think it's at 
high risk of being reversed unintentionally.

And I think you can trust users will notice if the fix gets merged away 
accidentally, before it hits -stable releases :-). The issue kills the 
entire GUI session on resume from suspend, say once every three days, on 
gnome-shell (due to Xwayland). One unfortunate user switched to Xorg 
only to find that was also affected.  I honestly assume the issue 
applies generally to laptop systems.  The only mitigating factor is if 
you have RAM to spare, so you don't hit the major pagefaults during resume.

#!/bin/bash

# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.

# TEST_DEV must be SCSI (inc. libata).
#
# Additionally, this test will abort if $TEST_DEV is too tiny
# and we finish reading it within 3 seconds.  Sorry.
TEST_DEV=sda

# RATIONALE
#
# The original root cause issue was the behaviour around blk_queue_freeze().
# It put tasks into an interruptible wait, which is wrong for block devices.
#
# XXX Insert reference to fix commit XXX
#
# The freeze feature is not directly exposed to userspace, so I can not test
# it directly :(.  (It's used to "guarantee no request is in use, so we can
# change any data structure of the queue afterward".  I.e. freeze, modify the
# queue structure, unfreeze).
#
# However, this lead to a regression with a decent reproducer.  In v4.15 the
# same interruptible wait was also used for SCSI suspend/resume.  SCSI resume
# can take a second or so... hence we like to do it asynchronously.  This
# means we can observe the wait at resume time, and we can test if it is
# interruptible.
#
# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
# trigger the specific wait in the block layer.  That code path only
# sets the SCSI device state; it does not set any block device state.
# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).

set -o nounset

abort() {
     echo "$*"
     echo "=== Test ERROR ==="
     exit 2
}

SYSFS_PM_TEST_DELAY=/sys/module/suspend/parameters/pm_test_delay
SAVED_PM_TEST_DELAY=

# Child process IDs
DD=
SUBSHELL=

cleanup() {
     # In many cases the subshell will already have exited...
     # and semantics for `wait` are crappy in shell.
     # Failure will be harmless in most cases.
     # Just try to provide enough context for the user to guess.

     echo "Cleaning up"
     
     if [ -n "$SUBSHELL" ]; then
         echo "Killing sub-shell PID $SUBSHELL..."
         kill $SUBSHELL
         wait $SUBSHELL
     fi
     if [ -n "$DD" ]; then
         echo "Killing 'dd' PID $DD..."
         kill $DD
         wait $DD
     fi

     echo "Resetting pm_test"
     echo none > /sys/power/pm_test
     
     echo "Resetting pm_test_delay"
     if [ -n "$SAVED_PM_TEST_DELAY" ]; then
         echo "$SAVED_PM_TEST_DELAY" > "$SYSFS_PM_TEST_DELAY"
     fi
}
trap cleanup EXIT

# "If a user has disabled async probing a likely reason
#  is due to a storage enclosure that does not inject
#  staggered spin-ups. For safety, make resume
#  synchronous as well in that case."
if ! SCAN="$(cat /sys/module/scsi_mod/parameters/scan)"; then
     abort "error reading '/sys/module/scsi_mod/parameters/scan' ?"
fi
if [ "$SCAN" != "async" ]; then
     abort "This test does not work if you have set 'scsi_mod.scan=sync'"
fi

# Ignore USR1, in the hope that this applies to child processes.
# This allows us to safely `kill -USR1 $DD`, when we don't know
# whether the child process has fully started yet.
#
# I think this is the only place I relied on the specific
# shell (bash) behaviour.
trap "" USR1

# Check dd can work
if ! dd iflag=direct if="/dev/$TEST_DEV" of=/dev/null count=1 status=none; then
     abort "'dd'"
fi

# Start dd, as a background process which submits IOs and yells when one fails.
# We want to hit the block layer, so use direct IO to avoid being served from
# page cache.
dd iflag=direct if="/dev/$TEST_DEV" of=/dev/null status=none &
DD=$!

if ! echo devices > /sys/power/pm_test; then
     abort "Setting pm_test failed, does your kernel lack CONFIG_PM_TEST?"
fi

if ! SAVED_PM_TEST_DELAY="$(cat "$SYSFS_PM_TEST_DELAY")"; then
     abort "error reading pm_test_delay"
fi
if ! echo 0 > "$SYSFS_PM_TEST_DELAY"; then
     abort "error setting pm_test_delay"
fi

# Just keep sending signals to 'dd' as long as it's alive.
# dd accepts USR1 signal to print status.  It doesn't seem to be a problem
# that we told dd not to actually *print* anything ('status=none').
#
# In theory this script is probably subject to various pid re-use races.
# But I started in sh... so far blktests does not depend on python...
# also direct IO is best to reproduce this, which is not built in to python.
#
( while kill -USR1 $DD 2>/dev/null; do true; done ) &

SUBSHELL=$!

# Wait a second without suspending, it might pick up typos
# or other unexpected errors.
sleep 1
if ! kill -0 $DD; then
     DD=
     wait $DD || echo "'dd' exited with error"
     abort "'dd' exited early?"
fi
if ! kill -0 $SUBSHELL; then
     SUBSHELL=
     abort "subshell exited early?"
fi

# Log that we're suspending.  User might not have guessed,
# or maybe suspend (or pm_test suspend) is broken on this system.
echo "Now simulating suspend/resume"

if ! echo mem > /sys/power/state; then
     abort "system suspend failed or not supported?"
fi

# Now wait for TEST_DEV to resume asynchronously
if ! dd iflag=direct if="/dev/$TEST_DEV" of=/dev/null count=1 status=none; then
     abort "'dd'"
fi

# Wait another second.  This might be useful in the case dd got blocked on a
# page fault during the suspend; it will have a second to get sorted out,
# while potentially receiving an IO error and exiting.
sleep 1

if ! kill -0 $DD 2>/dev/null; then
     if wait $DD; then
         DD=
         abort "'dd' exited early, without error.  Device too tiny?"
     fi
     
     echo "'dd' exited with error"
     echo "=== Test FAIL ==="
     DD=
     exit 1
fi
echo "=== Test PASS ==="

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-14 19:46       ` blktest for " Alan Jenkins
@ 2018-04-14 19:52         ` Jens Axboe
  2018-04-15 12:15           ` Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2018-04-14 19:52 UTC (permalink / raw)
  To: Alan Jenkins, Johannes Thumshirn, linux-block
  Cc: Bart Van Assche, linux-kernel, stable

On 4/14/18 1:46 PM, Alan Jenkins wrote:
> On 13/04/18 09:31, Johannes Thumshirn wrote:
>> Hi Alan,
>>
>> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>    while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>    echo mem > /sys/power/state ; \
>>>    sleep 5; killall dd  # stop after 5 seconds
>> Can you please also add a regression test to blktests[1] for this?
>>
>> [1] https://github.com/osandov/blktests
>>
>> Thanks,
>> 	Johannes
> 
> Good question. It would be nice to promote this test.
> 
> Template looks like I need the commit (sha1) first.
> 
> I had some ideas about automating it, so I wrote a standalone (see 
> end).  I can automate the wakeup by using pm_test, but this is still a 
> system suspend test.  Unfortunately I don't think there's any 
> alternative. To give the most dire example
> 
>      # This test is non-destructive, but it exercises suspend in all drivers.
>      # If your system has a problem with suspend, it might not wake up again.
> 
> 
> So I'm not sure if it would be acceptable for the default set?
> 
> How useful is this going to be? Is there an expanded/full set of tests 
> that gets run somewhere?
> 
> If you can't guarantee it's going to be run somewhere, I'd worry the 
> cost/benefit  feels a little narrow :-(. There were one or two further 
> "interesting" details, and it might theoretically bitrot if it's not run 
> periodically.

I run it, just last week we found two new bugs with it. I'm requiring
anyone that submits block patches to run the test suite, and also
working towards having it be part of the 0-day runs so it gets run
on posted patches automatically.

So yes, it's useful and it won't bitrot. Please do turn it into a blktests
test.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-12 18:11   ` [PATCH v2] " Alan Jenkins
  2018-04-13  8:31     ` Johannes Thumshirn
@ 2018-04-14 19:54     ` Jens Axboe
  1 sibling, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2018-04-14 19:54 UTC (permalink / raw)
  To: Alan Jenkins, linux-block; +Cc: Bart Van Assche, linux-kernel, stable

On 4/12/18 12:11 PM, Alan Jenkins wrote:
> When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
> 
> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
> 
> So that commit exposed the bug as a regression in v4.15.  A mysterious
> SIGBUS (or -EIO) sometimes happened during the time the device was being
> resumed.  Most frequently, there was no kernel log message, and we saw Xorg
> or Xwayland killed by SIGBUS.[1]
> 
> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
> 
> Without this fix, I get an IO error in this test:
> 
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds
> 
> The interruptible wait was added to blk_queue_enter in
> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> Before then, the interruptible wait was only in blk-mq, but I don't think
> it could ever have been correct.

Applied, thanks.

Still want that test in blktests, though!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere
  2018-04-14 19:52         ` Jens Axboe
@ 2018-04-15 12:15           ` Alan Jenkins
  2018-04-16 21:23             ` [PATCH] blktests: regression test "block: do not use interruptible wait anywhere" Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-15 12:15 UTC (permalink / raw)
  To: Jens Axboe, Johannes Thumshirn, linux-block
  Cc: Bart Van Assche, linux-kernel, stable

On 14/04/18 20:52, Jens Axboe wrote:
> On 4/14/18 1:46 PM, Alan Jenkins wrote:
>> On 13/04/18 09:31, Johannes Thumshirn wrote:
>>> Hi Alan,
>>>
>>> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>>>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>>     while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>>     echo mem > /sys/power/state ; \
>>>>     sleep 5; killall dd  # stop after 5 seconds
>>> Can you please also add a regression test to blktests[1] for this?
>>>
>>> [1] https://github.com/osandov/blktests
>>>
>>> Thanks,
>>> 	Johannes
>> Good question. It would be nice to promote this test.
>>
>> Template looks like I need the commit (sha1) first.
>>
>> I had some ideas about automating it, so I wrote a standalone (see
>> end).  I can automate the wakeup by using pm_test, but this is still a
>> system suspend test.  Unfortunately I don't think there's any
>> alternative. To give the most dire example
>>
>>       # This test is non-destructive, but it exercises suspend in all drivers.
>>       # If your system has a problem with suspend, it might not wake up again.
>>
>>
>> So I'm not sure if it would be acceptable for the default set?
>>
>> How useful is this going to be? Is there an expanded/full set of tests
>> that gets run somewhere?
>>
>> If you can't guarantee it's going to be run somewhere, I'd worry the
>> cost/benefit  feels a little narrow :-(. There were one or two further
>> "interesting" details, and it might theoretically bitrot if it's not run
>> periodically.
> I run it, just last week we found two new bugs with it. I'm requiring
> anyone that submits block patches to run the test suite, and also
> working towards having it be part of the 0-day runs so it gets run
> on posted patches automatically.
>
> So yes, it's useful and it won't bitrot. Please do turn it into a blktests
> test.

Thanks, it's really great to have a test suite. I was specifically 
checking in on how we can include a system suspend test.

I've been thinking the suspend test could be opt-in test (e.g. 
ALLOW_PM_TEST=1), and then we have some infrastructure (you or 0-day 
runs) that does the opt-in.  Without knowing anything about the 
infrastructure, I didn't want to assume that would work.

I'm aware of one particular suspend issue; inside virt-manager VMs I see 
Linux crashing with a null pointer in qxl_drm_freeze.  A regression soon 
after I learned how to use VMs for suspend tests :-( , and it's been 
long enough that I suspect few people use it.

Partly what you saw me fishing for in the comments, is the idea of some 
kernel code allowing more direct testing of the queue freeze / 
preempt_only flag.  That might be better engineering, but I don't know 
where I could put it.

Alan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-15 12:15           ` Alan Jenkins
@ 2018-04-16 21:23             ` Alan Jenkins
  2018-04-16 21:52               ` Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-16 21:23 UTC (permalink / raw)
  To: Jens Axboe, Johannes Thumshirn, linux-block; +Cc: Bart Van Assche, Alan Jenkins

> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

linux-block insisted they wanted this, based on my reproducer above.
If you start wondering why you wouldn't base it on scsi_debug with a new
"quiesce" option, that makes two of us.

Thread: http://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com

Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
 tests/scsi/004     | 235 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |   7 ++
 2 files changed, 242 insertions(+)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..791c76a
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# > # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >   echo mem > /sys/power/state ; \
+# >   sleep 5; killall dd  # stop after 5 seconds
+#
+# AJ: linux-block insisted they wanted this, based on my reproducer above.
+# If you start wondering why you wouldn't base it on scsi_debug with a new
+# "quiesce" option, that makes two of us.
+# Thread: http://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com
+#
+#
+# RATIONALE for a suspend test:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I can not test
+# it directly :(.  (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward".  I.e. freeze, modify the
+# queue structure, unfreeze).
+#
+# However, this lead to a regression with a decent reproducer.  In v4.15 the
+# same interruptible wait was also used for SCSI suspend/resume.  SCSI resume
+# can take a second or so... hence we like to do it asynchronously.  This
+# means we can observe the wait at resume time, and we can test if it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer.  That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   ! [[ -e "$HAVE_BARE_METAL_SCSI" ]]; then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	#  is due to a storage enclosure that does not inject
+	#  staggered spin-ups. For safety, make resume
+	#  synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test.  Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	dd_pid=
+	subshell_pid=
+
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages, about killed process
+		exec 3>&1 4>&2
+		exec >>"$FULL" 2>&1
+
+		kill "$pid" >&3 2>&4
+
+		# Don't try to re-redirect output from `wait` just in case,
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [ -n "$subshell_pid" ]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [ -n "$dd_pid" ]; then
+			echo "Killing dd..."
+			cleanup_pid "$dd_pid"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+
+		echo "Resetting pm_test_delay"
+		if [ -n "$saved_pm_test_delay" ]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+	}
+	trap cleanup EXIT
+
+	# Have not tested on devices which require larger IO,
+	# but let's not die if we see one.
+	bs="$(_test_dev_queue_get hw_sector_size)"
+
+	# Start dd, as a background process which submits IOs
+	# and yells when one fails.  We want to test the block
+	# layer, so we use direct IO to avoid being served
+	# from page cache.
+	#
+	# I tried using fio with the exact same IO pattern,
+	# sadly it was not 100% reliable at reproducing this
+	# issue.
+	#
+	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
+	dd_pid=$!
+
+# 	fio --output="$FULL" --loops=65535 \
+# 	    --thread --exitall_on_error \
+# 	    --direct=1 \
+# 	    --bs=$bs --rw=read --name=reads \
+# 	    --filename="$TEST_DEV" &
+# 	fio_pid=$!
+
+	# Keep sending signals to 'dd'.  Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races.  But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $dd_pid 2>/dev/null &&
+		      kill -CONT $dd_pid 2>/dev/null; do
+
+			sleep 0.001
+		done
+
+		# Wait to be killed. Simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending.  User might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait another second.  This might be useful in the case dd got blocked on a
+	# page fault during the suspend; it will have a second to get sorted out,
+	# so it can potentially receive an IO error and exit.
+	sleep 1
+
+	if ! kill -0 $dd_pid 2>/dev/null; then
+		# dd exited before we entered cleanup.
+		# Read its exit status
+		wait $dd_pid
+		ret=$?
+		dd_pid=
+
+		if [[ $ret == 0 ]]; then
+			echo "'dd' exited early, without error."
+			echo "Is your scsi device implausibly fast or small?"
+		else
+			# Test should already fail at this point due to
+			# error messages, but let's log it while we're here.
+			echo "'dd' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+	do_test_device
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..4085f62
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,7 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing dd...
+Resetting pm_test
+Resetting pm_test_delay
+Test complete
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-16 21:23             ` [PATCH] blktests: regression test "block: do not use interruptible wait anywhere" Alan Jenkins
@ 2018-04-16 21:52               ` Alan Jenkins
  2018-04-17  7:21                 ` Johannes Thumshirn
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-16 21:52 UTC (permalink / raw)
  To: Jens Axboe, Johannes Thumshirn, linux-block; +Cc: Bart Van Assche, Alan Jenkins

> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

linux-block insisted they wanted this, based on my reproducer above.
If you start wondering why you wouldn't base it on scsi_debug with a new
"quiesce" option, that makes two of us.

Thread: http://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com

Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
v2: fix sense of conditional test for HAVE_BARE_METAL_SCSI.
    Also I had a few single-bracket tests, which wanted to be replaced
    with double-brackets to match the coding style.

 tests/scsi/004     | 235 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |   7 ++
 2 files changed, 242 insertions(+)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..ef42033
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# > # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >   echo mem > /sys/power/state ; \
+# >   sleep 5; killall dd  # stop after 5 seconds
+#
+# AJ: linux-block insisted they wanted this, based on my reproducer above.
+# If you start wondering why you wouldn't base it on scsi_debug with a new
+# "quiesce" option, that makes two of us.
+# Thread: http://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com
+#
+#
+# RATIONALE for a suspend test:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I can not test
+# it directly :(.  (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward".  I.e. freeze, modify the
+# queue structure, unfreeze).
+#
+# However, this lead to a regression with a decent reproducer.  In v4.15 the
+# same interruptible wait was also used for SCSI suspend/resume.  SCSI resume
+# can take a second or so... hence we like to do it asynchronously.  This
+# means we can observe the wait at resume time, and we can test if it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer.  That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   (( !HAVE_BARE_METAL_SCSI )); then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	#  is due to a storage enclosure that does not inject
+	#  staggered spin-ups. For safety, make resume
+	#  synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test.  Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	dd_pid=
+	subshell_pid=
+
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages, about killed process
+		exec 3>&1 4>&2
+		exec >>"$FULL" 2>&1
+
+		kill "$pid" >&3 2>&4
+
+		# Don't try to re-redirect output from `wait` just in case,
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [[ -n "$subshell_pid" ]]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [[ -n "$dd_pid" ]]; then
+			echo "Killing dd..."
+			cleanup_pid "$dd_pid"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+
+		echo "Resetting pm_test_delay"
+		if [[ -n "$saved_pm_test_delay" ]]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+	}
+	trap cleanup EXIT
+
+	# Have not tested on devices which require larger IO,
+	# but let's not die if we see one.
+	bs="$(_test_dev_queue_get hw_sector_size)"
+
+	# Start dd, as a background process which submits IOs
+	# and yells when one fails.  We want to test the block
+	# layer, so we use direct IO to avoid being served
+	# from page cache.
+	#
+	# I tried using fio with the exact same IO pattern,
+	# sadly it was not 100% reliable at reproducing this
+	# issue.
+	#
+	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
+	dd_pid=$!
+
+# 	fio --output="$FULL" --loops=65535 \
+# 	    --thread --exitall_on_error \
+# 	    --direct=1 \
+# 	    --bs=$bs --rw=read --name=reads \
+# 	    --filename="$TEST_DEV" &
+# 	fio_pid=$!
+
+	# Keep sending signals to 'dd'.  Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races.  But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $dd_pid 2>/dev/null &&
+		      kill -CONT $dd_pid 2>/dev/null; do
+
+			sleep 0.001
+		done
+
+		# Wait to be killed. Simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending.  User might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait another second.  This might be useful in the case dd got blocked on a
+	# page fault during the suspend; it will have a second to get sorted out,
+	# so it can potentially receive an IO error and exit.
+	sleep 1
+
+	if ! kill -0 $dd_pid 2>/dev/null; then
+		# dd exited before we entered cleanup.
+		# Read its exit status
+		wait $dd_pid
+		ret=$?
+		dd_pid=
+
+		if [[ $ret == 0 ]]; then
+			echo "'dd' exited early, without error."
+			echo "Is your scsi device implausibly fast or small?"
+		else
+			# Test should already fail at this point due to
+			# error messages, but let's log it while we're here.
+			echo "'dd' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+	do_test_device
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..4085f62
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,7 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing dd...
+Resetting pm_test
+Resetting pm_test_delay
+Test complete
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-16 21:52               ` Alan Jenkins
@ 2018-04-17  7:21                 ` Johannes Thumshirn
  2018-04-17 15:55                   ` Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2018-04-17  7:21 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Jens Axboe, linux-block, Bart Van Assche

On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
> > Without this fix, I get an IO error in this test:
> >
> > # dd if=/dev/sda of=/dev/null iflag=direct & \
> >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
> >   echo mem > /sys/power/state ; \
> >   sleep 5; killall dd  # stop after 5 seconds
> 
> linux-block insisted they wanted this, based on my reproducer above.
> If you start wondering why you wouldn't base it on scsi_debug with a new
> "quiesce" option, that makes two of us.

Thanks for doing this:

> +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
> +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
> +	   (( !HAVE_BARE_METAL_SCSI )); then
> +		SKIP_REASON=\
> +"Hypervisor detected, but this test wants bare-metal SCSI timings.
> +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
> +		return 1
> +	fi

I don't think we need this, if people want to shoot themselves in the
foot by runnning a possibly destructive test suite in a HV we
shouldn't stop them.

> +
> +	_have_fio

Not needed anymore as per below comment?


> +	# I tried using fio with the exact same IO pattern,
> +	# sadly it was not 100% reliable at reproducing this
> +	# issue.
> +	#
> +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
> +	dd_pid=$!
> +
> +# 	fio --output="$FULL" --loops=65535 \
> +# 	    --thread --exitall_on_error \
> +# 	    --direct=1 \
> +# 	    --bs=$bs --rw=read --name=reads \
> +# 	    --filename="$TEST_DEV" &
> +# 	fio_pid=$!

I think we can just zap the commented out fio part and the comment
about it.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N�rnberg
GF: Felix Imend�rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N�rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-17 15:55                   ` Alan Jenkins
@ 2018-04-17 15:10                     ` Alan Jenkins
  2018-04-17 15:21                       ` [PATCH v4] " Alan Jenkins
  2018-04-18  7:25                     ` [PATCH] " Johannes Thumshirn
  1 sibling, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-17 15:10 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe, linux-block; +Cc: Bart Van Assche, Alan Jenkins

> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

linux-block specifically asked for a test derived from this reproducer.
They didn't come up with any suggestion for testing the code more directly
(and robustly).  So this test uses system suspend, automated with pm_test.

Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
v3: Switch from dd to fio, clarify some comment.
    The HAVE_BARE_METAL_SCSI check is left unchanged,
    waiting for further discussion.

 tests/scsi/004     | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |  12 +++
 2 files changed, 267 insertions(+)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..2a7f794
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,255 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# > # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >   echo mem > /sys/power/state ; \
+# >   sleep 5; killall dd  # stop after 5 seconds
+#
+# AJ: linux-block specifically asked for a test derived from this reproducer.
+# They didn't come up with any suggestion for testing the code more directly
+# (and robustly).  So this test uses system suspend, automated with pm_test.
+#
+#
+# Rationale for the test needing system suspend:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I can not test
+# it directly :(.  (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward".  I.e. freeze, modify the
+# queue structure, unfreeze).
+#
+# However, this lead to a kernel regression with a decent reproducer.  In
+# v4.15 the same interruptible wait was also used for SCSI suspend/resume.
+# SCSI resume can take a second or so, hence we like to do it asynchronously.
+# This means we can observe the wait at resume time, and we can test if it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer.  That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying to see this message on Xen dom0,
+	# but I'm guessing that's not common.
+	#
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   (( !HAVE_BARE_METAL_SCSI )); then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	#  is due to a storage enclosure that does not inject
+	#  staggered spin-ups. For safety, make resume
+	#  synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test.  Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	fio_pid=
+	subshell_pid=
+
+	# Fail the test early, in cases where it should not continue.
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	# Terminate child process
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages about killed process.
+		# The messages would vary, causing the test to fail.
+		exec 3>&1 4>&2
+		exec >>"$FULL" 2>&1
+
+		# Send terminate signal.  Also send the continue signal,
+		# in case the process was currently stopped.
+		(kill "$pid" && kill -CONT "$pid") >&3 2>&4
+
+		# Don't try to re-redirect output from `wait` just in case,
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [[ -n "$subshell_pid" ]]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [[ -n "$fio_pid" ]]; then
+			echo "Killing fio..."
+			cleanup_pid "$fio_pid"
+		fi
+
+		echo "Resetting pm_test_delay"
+		if [[ -n "$saved_pm_test_delay" ]]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+	}
+	trap cleanup EXIT
+
+	# Start fio, as a background process which submits IOs and stops
+	# with an error when one fails.  Use threads instead of separate
+	# processes, so it's easier to send signals to the IO thread.
+	#
+	# This is the same behaviour as dd, except that we loop in case the
+	# device is tiny. (Strictly speaking, the block size is different too).
+	#
+	fio --output="${FULL}.fio" --filename="$TEST_DEV" \
+	    --thread --exitall_on_error --loops=1G \
+	    --direct=1 --rw=read --name=reads &
+	fio_pid=$!
+
+	# Keep sending signals to 'fio`.  Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races.  But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $fio_pid 2>>"$FULL" &&
+		      kill -CONT $fio_pid 2>>"$FULL"; do
+
+			sleep 0.001
+		done
+
+		# dd exited.  Wait to be killed, it simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	# Here's the real race condition.
+	#
+	# We only want to suspend once both child processes have reached their
+	# main loops.  Otherwise we get a false pass.  We use the following
+	# mitigations:
+	#
+	# 1. Wait 1 second first.
+	#
+	# 2. Make sure to call this function twice, so hopefully the second
+	#    time will not have to wait to page anything in.
+	#
+	# 3. Wait for any pending writes first.  I think that this redundant in
+	#    principle, but will make for more consistent timings.
+	#
+	# (You can actually solve this precisely using strace or the like...
+	#  but it still looks weird, and adds another depedency)
+	#
+	sync
+	sleep 1
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending.  User might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait again.  This will be useful in the case fio got blocked on a
+	# page fault during the suspend; it will have a second to get sorted out,
+	# so it can potentially receive an IO error and exit.
+	sleep 1
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	if ! kill -0 $fio_pid 2>/dev/null; then
+		# dd exited before we entered cleanup.
+		# Read its exit status
+		wait $fio_pid
+		ret=$?
+		fio_pid=
+
+		if [[ $ret == 0 ]]; then
+			fail "'fio' exited early, without error. Please report this as a bug."
+		else
+			# Test should already fail at this point due to
+			# error messages.  But let's log it while we're here,
+			# and also not run the second iteration of the test.
+			fail "'fio' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+
+	# Run the test twice.  Hopefully the second iteration will
+	# have everything in page cache for consistent timings.
+	do_test_device && do_test_device
+
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..7211b4d
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,12 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Test complete
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-17 15:10                     ` [PATCH v3] " Alan Jenkins
@ 2018-04-17 15:21                       ` Alan Jenkins
  2018-04-24 22:46                         ` Omar Sandoval
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-04-17 15:21 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe, linux-block; +Cc: Bart Van Assche, Alan Jenkins

> Without this fix, I get an IO error in this test:
>
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

linux-block specifically asked for a test derived from this reproducer.
They didn't come up with any suggestion for testing the code more directly
(and robustly).  So this test uses system suspend, automated with pm_test.

Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
v4: Fix use of $FULL introduced in v3

v3: Switch from dd to fio, clarify some comment.
    The HAVE_BARE_METAL_SCSI check is left unchanged,
    waiting for further discussion.

 tests/scsi/004     | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/scsi/004.out |  12 +++
 2 files changed, 267 insertions(+)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 0000000..cea1475
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,255 @@
+#!/bin/bash
+#
+# Regression test for patch "block: do not use interruptible wait anywhere".
+#
+# > Without this fix, I get an IO error in this test:
+# >
+# > # dd if=/dev/sda of=/dev/null iflag=direct & \
+# >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
+# >   echo mem > /sys/power/state ; \
+# >   sleep 5; killall dd  # stop after 5 seconds
+#
+# AJ: linux-block specifically asked for a test derived from this reproducer.
+# They didn't come up with any suggestion for testing the code more directly
+# (and robustly).  So this test uses system suspend, automated with pm_test.
+#
+#
+# Rationale for the test needing system suspend:
+#
+# The original root cause issue was the behaviour around blk_queue_freeze().
+# It put tasks into an interruptible wait, which is wrong for block devices.
+#
+# The freeze feature is not directly exposed to userspace, so I can not test
+# it directly :(.  (It's used to "guarantee no request is in use, so we can
+# change any data structure of the queue afterward".  I.e. freeze, modify the
+# queue structure, unfreeze).
+#
+# However, this lead to a kernel regression with a decent reproducer.  In
+# v4.15 the same interruptible wait was also used for SCSI suspend/resume.
+# SCSI resume can take a second or so, hence we like to do it asynchronously.
+# This means we can observe the wait at resume time, and we can test if it is
+# interruptible.
+#
+# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
+# trigger the specific wait in the block layer.  That code path only
+# sets the SCSI device state; it does not set any block device state.
+# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
+#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).
+#
+#
+# Copyright (C) 2018 Alan Jenkins
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+DESCRIPTION="check SCSI blockdev suspend is not interruptible"
+
+QUICK=1
+
+requires() {
+	# I can't expect to hit the window using bash, if the device is
+	# emulated by cpu.
+	#
+	# Maybe annoying to see this message on Xen dom0,
+	# but I'm guessing that's not common.
+	#
+	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
+	   (( !HAVE_BARE_METAL_SCSI )); then
+		SKIP_REASON=\
+"Hypervisor detected, but this test wants bare-metal SCSI timings.
+If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
+		return 1
+	fi
+
+	# "If a user has disabled async probing a likely reason
+	#  is due to a storage enclosure that does not inject
+	#  staggered spin-ups. For safety, make resume
+	#  synchronous as well in that case."
+	if ! scan="$(cat /sys/module/scsi_mod/parameters/scan)"; then
+		SKIP_REASON="Could not read '/sys/module/scsi_mod/parameters/scan'"
+		return 1
+	fi
+	if [[ "$scan" != async ]]; then
+		SKIP_REASON="This test does not work if you have set 'scsi_mod.scan=sync'"
+		return 1
+	fi
+
+	if ! cat /sys/power/pm_test > /dev/null; then
+		SKIP_REASON="Error reading pm_test.  Maybe kernel lacks CONFIG_PM_TEST?"
+		return 1
+	fi
+
+	_have_fio
+}
+
+do_test_device() ( # run whole function in a subshell
+
+	sysfs_pm_test_delay=/sys/module/suspend/parameters/pm_test_delay
+	saved_pm_test_delay=
+	fio_pid=
+	subshell_pid=
+
+	# Fail the test early, in cases where it should not continue.
+	fail() {
+		echo "$*"
+		exit 1
+	}
+
+	# Terminate child process
+	cleanup_pid() {
+		local pid="$1"
+
+		# Suppress shell messages about killed process.
+		# The messages would vary, causing the test to fail.
+		exec 3>&1 4>&2
+		exec >/dev/null 2>&1
+
+		# Send terminate signal.  Also send the continue signal,
+		# in case the process was currently stopped.
+		(kill "$pid" && kill -CONT "$pid") >&3 2>&4
+
+		# Don't try to re-redirect output from `wait` just in case,
+		# if `wait` is executed in a subshell then it cannot work.
+		wait "$pid"
+
+		# Restore stdout/stderr
+		exec >&3 2>&4
+		exec 3>&- 4>&-
+	}
+
+	cleanup() {
+		if [[ -n "$subshell_pid" ]]; then
+			echo "Killing sub-shell..."
+			cleanup_pid "$subshell_pid"
+		fi
+		if [[ -n "$fio_pid" ]]; then
+			echo "Killing fio..."
+			cleanup_pid "$fio_pid"
+		fi
+
+		echo "Resetting pm_test_delay"
+		if [[ -n "$saved_pm_test_delay" ]]; then
+			echo "$saved_pm_test_delay" > "$sysfs_pm_test_delay"
+		fi
+
+		echo "Resetting pm_test"
+		echo none > /sys/power/pm_test
+	}
+	trap cleanup EXIT
+
+	# Start fio, as a background process which submits IOs and stops
+	# with an error when one fails.  Use threads instead of separate
+	# processes, so it's easier to send signals to the IO thread.
+	#
+	# This is the same behaviour as dd, except that we loop in case the
+	# device is tiny. (Strictly speaking, the block size is different too).
+	#
+	fio --output="${FULL}" --filename="$TEST_DEV" \
+	    --thread --exitall_on_error --loops=1G \
+	    --direct=1 --rw=read --name=reads &
+	fio_pid=$!
+
+	# Keep sending signals to 'fio`.  Give it 1ms between
+	# signals so it gets a chance to actually submit IOs.
+	#
+	# In theory this script is probably subject to various
+	# pid re-use races.  But I started in sh... so far
+	# blktests does not depend on python... also direct IO
+	# is best to reproduce this, which is not built in to
+	# python.
+	(
+		while kill -STOP $fio_pid 2>/dev/null &&
+		      kill -CONT $fio_pid 2>/dev/null; do
+
+			sleep 0.001
+		done
+
+		# dd exited.  Wait to be killed, it simplifies cleanup.
+		while true; do
+			sleep 1
+		done
+	) &
+	subshell_pid=$!
+
+	# Here's the real race condition.
+	#
+	# We only want to suspend once both child processes have reached their
+	# main loops.  Otherwise we get a false pass.  We use the following
+	# mitigations:
+	#
+	# 1. Wait 1 second first.
+	#
+	# 2. Make sure to call this function twice, so hopefully the second
+	#    time will not have to wait to page anything in.
+	#
+	# 3. Wait for any pending writes first.  I think that this redundant in
+	#    principle, but will make for more consistent timings.
+	#
+	# (You can actually solve this precisely using strace or the like...
+	#  but it still looks weird, and adds another depedency)
+	#
+	sync
+	sleep 1
+
+	if ! echo devices > /sys/power/pm_test; then
+		fail "error setting pm_test"
+	fi
+
+	if ! saved_pm_test_delay="$(cat "$sysfs_pm_test_delay")"; then
+		fail "error reading pm_test_delay"
+	fi
+	if ! echo 0 > "$sysfs_pm_test_delay"; then
+		fail "error setting pm_test_delay"
+	fi
+
+	# Log that we're suspending.  User might not have guessed,
+	# or maybe suspend (or pm_test suspend) is broken on this system.
+	echo "Simulating suspend/resume now"
+	echo mem > /sys/power/state
+
+	# Now wait for TEST_DEV to resume asynchronously
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	# Wait again.  This will be useful in the case fio got blocked on a
+	# page fault during the suspend; it will have a second to get sorted out,
+	# so it can potentially receive an IO error and exit.
+	sleep 1
+	dd iflag=direct if="$TEST_DEV" of=/dev/null count=1 status=none
+
+	if ! kill -0 $fio_pid 2>/dev/null; then
+		# fio exited before we entered cleanup.
+		# Read its exit status
+		wait $fio_pid
+		ret=$?
+		fio_pid=
+
+		if [[ $ret == 0 ]]; then
+			fail "'fio' exited early, without error. Please report this as a bug."
+		else
+			# Test should already fail at this point due to
+			# error messages.  But let's log it while we're here,
+			# and also not run the second iteration of the test.
+			fail "'fio' exited with error $ret"
+		fi
+	fi
+) # end subshell function
+
+test_device() {
+	echo "Running ${TEST_NAME}"
+
+	# Run the test twice.  Hopefully the second iteration will
+	# have everything in page cache for consistent timings.
+	do_test_device && do_test_device
+
+	echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 0000000..7211b4d
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,12 @@
+Running scsi/004
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Simulating suspend/resume now
+Killing sub-shell...
+Killing fio...
+Resetting pm_test_delay
+Resetting pm_test
+Test complete
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-17  7:21                 ` Johannes Thumshirn
@ 2018-04-17 15:55                   ` Alan Jenkins
  2018-04-17 15:10                     ` [PATCH v3] " Alan Jenkins
  2018-04-18  7:25                     ` [PATCH] " Johannes Thumshirn
  0 siblings, 2 replies; 23+ messages in thread
From: Alan Jenkins @ 2018-04-17 15:55 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Jens Axboe, linux-block, Bart Van Assche

On 17/04/18 08:21, Johannes Thumshirn wrote:
> On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
>>> Without this fix, I get an IO error in this test:
>>>
>>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>    while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>    echo mem > /sys/power/state ; \
>>>    sleep 5; killall dd  # stop after 5 seconds
>> linux-block insisted they wanted this, based on my reproducer above.
>> If you start wondering why you wouldn't base it on scsi_debug with a new
>> "quiesce" option, that makes two of us.
> Thanks for doing this:
>> +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
>> +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
>> +	   (( !HAVE_BARE_METAL_SCSI )); then
>> +		SKIP_REASON=\
>> +"Hypervisor detected, but this test wants bare-metal SCSI timings.
>> +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
>> +		return 1
>> +	fi
> I don't think we need this, if people want to shoot themselves in the
> foot by runnning a possibly destructive test suite in a HV we
> shouldn't stop them.

Thanks, this is what I was hoping to get discussion on :).

What is meant by HV?

This is phrased to solve a problem I hadn't mentioned previously:

+       # I can't expect to hit the window using bash, if the device is
+       # emulated by cpu.

I haven't tested this reproducer against devices emulated in software.  
It's written against real hardware which takes a whole second to bring 
the SATA link up.  (And maybe spin up the hdd as well?).

I'm not familiar with linux-block's testing, to know how prevalent that 
bare-metal access is.  I would like to avoid putting out a lot of "pass" 
where it's effectively being skipped.

(The comment you did quote just refers to this check being annoying in 
dom0, because I assume the check detects dom0 as being virtualized, 
despite it having access to bare-metal scsi devices. It's not a great 
comment; I will clarify it a bit).

Yes, if this reason turns out to be marginal, I would be back to the 
concern about reliability of VM suspend, and wanting it to be opt-in 
with DO_PM_TEST=1 in the config file or something.  As a newb to 
blk-tests I would be very frustrated to spin up virt-manager with a 
virtual test device, because I would run into what turns out to be an 
unfixed kernel bug.

I'm happy to have the simpler check for DO_PM_TEST (with more verbose 
SKIP_REASON) if it works; maybe the autodetection just made extra 
complexity.

>> +
>> +	_have_fio
> Not needed anymore as per below comment?

Good point, yes (but see below).

>> +	# I tried using fio with the exact same IO pattern,
>> +	# sadly it was not 100% reliable at reproducing this
>> +	# issue.
>> +	#
>> +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
>> +	dd_pid=$!
>> +
>> +# 	fio --output="$FULL" --loops=65535 \
>> +# 	    --thread --exitall_on_error \
>> +# 	    --direct=1 \
>> +# 	    --bs=$bs --rw=read --name=reads \
>> +# 	    --filename="$TEST_DEV" &
>> +# 	fio_pid=$!
> I think we can just zap the commented out fio part and the comment
> about it.

fio was attractive as a way to avoid the failure case for ludicrously 
small/fast devices.  Later I actually hit that case, from running this 
test on a small unused partition :).

I think I've worked out the reliability now.  So I can start using fio, 
and have a solid answer to both issues.

Alan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-17 15:55                   ` Alan Jenkins
  2018-04-17 15:10                     ` [PATCH v3] " Alan Jenkins
@ 2018-04-18  7:25                     ` Johannes Thumshirn
  1 sibling, 0 replies; 23+ messages in thread
From: Johannes Thumshirn @ 2018-04-18  7:25 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Jens Axboe, linux-block, Bart Van Assche

On Tue, Apr 17, 2018 at 04:55:51PM +0100, Alan Jenkins wrote:
> On 17/04/18 08:21, Johannes Thumshirn wrote:
> > On Mon, Apr 16, 2018 at 10:52:47PM +0100, Alan Jenkins wrote:
> > > > Without this fix, I get an IO error in this test:
> > > > 
> > > > # dd if=/dev/sda of=/dev/null iflag=direct & \
> > > >    while killall -SIGUSR1 dd; do sleep 0.1; done & \
> > > >    echo mem > /sys/power/state ; \
> > > >    sleep 5; killall dd  # stop after 5 seconds
> > > linux-block insisted they wanted this, based on my reproducer above.
> > > If you start wondering why you wouldn't base it on scsi_debug with a new
> > > "quiesce" option, that makes two of us.
> > Thanks for doing this:
> > > +	# Maybe annoying for Xen dom0, but I'm guessing it's not common.
> > > +	if grep -q ^flags.*\ hypervisor /proc/cpuinfo &&
> > > +	   (( !HAVE_BARE_METAL_SCSI )); then
> > > +		SKIP_REASON=\
> > > +"Hypervisor detected, but this test wants bare-metal SCSI timings.
> > > +If you have a pass-through device, you may set HAVE_BARE_METAL_SCSI=1."
> > > +		return 1
> > > +	fi
> > I don't think we need this, if people want to shoot themselves in the
> > foot by runnning a possibly destructive test suite in a HV we
> > shouldn't stop them.
> 
> Thanks, this is what I was hoping to get discussion on :).
> 
> What is meant by HV?

Hypervisor, sorry.

> 
> This is phrased to solve a problem I hadn't mentioned previously:
> 
> +������ # I can't expect to hit the window using bash, if the device is
> +������ # emulated by cpu.
> 
> I haven't tested this reproducer against devices emulated in software.� It's
> written against real hardware which takes a whole second to bring the SATA
> link up.� (And maybe spin up the hdd as well?).
> 
> I'm not familiar with linux-block's testing, to know how prevalent that
> bare-metal access is.� I would like to avoid putting out a lot of "pass"
> where it's effectively being skipped.
> 
> (The comment you did quote just refers to this check being annoying in dom0,
> because I assume the check detects dom0 as being virtualized, despite it
> having access to bare-metal scsi devices. It's not a great comment; I will
> clarify it a bit).
> 
> Yes, if this reason turns out to be marginal, I would be back to the concern
> about reliability of VM suspend, and wanting it to be opt-in with
> DO_PM_TEST=1 in the config file or something.� As a newb to blk-tests I
> would be very frustrated to spin up virt-manager with a virtual test device,
> because I would run into what turns out to be an unfixed kernel bug.

Well it's a test result as well, isn't it ;-). I personally run all my
testing in VMs (with optional PCI passthrough if I need special
Hardware). But yes this is just my personal preference.

> I'm happy to have the simpler check for DO_PM_TEST (with more verbose
> SKIP_REASON) if it works; maybe the autodetection just made extra
> complexity.
> > > +
> > > +	_have_fio
> > Not needed anymore as per below comment?
> 
> Good point, yes (but see below).
> 
> > > +	# I tried using fio with the exact same IO pattern,
> > > +	# sadly it was not 100% reliable at reproducing this
> > > +	# issue.
> > > +	#
> > > +	dd iflag=direct if="$TEST_DEV" of=/dev/null bs=$bs status=none &
> > > +	dd_pid=$!
> > > +
> > > +# 	fio --output="$FULL" --loops=65535 \
> > > +# 	    --thread --exitall_on_error \
> > > +# 	    --direct=1 \
> > > +# 	    --bs=$bs --rw=read --name=reads \
> > > +# 	    --filename="$TEST_DEV" &
> > > +# 	fio_pid=$!
> > I think we can just zap the commented out fio part and the comment
> > about it.
> 
> fio was attractive as a way to avoid the failure case for ludicrously
> small/fast devices.� Later I actually hit that case, from running this test
> on a small unused partition :).
> 
> I think I've worked out the reliability now.� So I can start using fio, and
> have a solid answer to both issues.

OK

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N�rnberg
GF: Felix Imend�rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N�rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-17 15:21                       ` [PATCH v4] " Alan Jenkins
@ 2018-04-24 22:46                         ` Omar Sandoval
  2018-04-25  8:46                           ` Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Omar Sandoval @ 2018-04-24 22:46 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Johannes Thumshirn, Jens Axboe, linux-block, Bart Van Assche

On Tue, Apr 17, 2018 at 04:21:36PM +0100, Alan Jenkins wrote:
> > Without this fix, I get an IO error in this test:
> >
> > # dd if=/dev/sda of=/dev/null iflag=direct & \
> >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
> >   echo mem > /sys/power/state ; \
> >   sleep 5; killall dd  # stop after 5 seconds
> 
> linux-block specifically asked for a test derived from this reproducer.
> They didn't come up with any suggestion for testing the code more directly
> (and robustly).  So this test uses system suspend, automated with pm_test.
> 
> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>

Hi, Alan, thanks for the test. I was able to come up with a
deterministic reproducer which runs in a few seconds, added here:
https://github.com/osandov/blktests/blob/master/tests/block/016.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4] blktests: regression test "block: do not use interruptible wait anywhere"
  2018-04-24 22:46                         ` Omar Sandoval
@ 2018-04-25  8:46                           ` Alan Jenkins
  0 siblings, 0 replies; 23+ messages in thread
From: Alan Jenkins @ 2018-04-25  8:46 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Johannes Thumshirn, Jens Axboe, linux-block, Bart Van Assche

On 24/04/2018, Omar Sandoval <osandov@osandov.com> wrote:
> On Tue, Apr 17, 2018 at 04:21:36PM +0100, Alan Jenkins wrote:
>> > Without this fix, I get an IO error in this test:
>> >
>> > # dd if=/dev/sda of=/dev/null iflag=direct & \
>> >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>> >   echo mem > /sys/power/state ; \
>> >   sleep 5; killall dd  # stop after 5 seconds
>>
>> linux-block specifically asked for a test derived from this reproducer.
>> They didn't come up with any suggestion for testing the code more
>> directly
>> (and robustly).  So this test uses system suspend, automated with
>> pm_test.
>>
>> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
>
> Hi, Alan, thanks for the test. I was able to come up with a
> deterministic reproducer which runs in a few seconds, added here:
> https://github.com/osandov/blktests/blob/master/tests/block/016.
On 24/04/2018, Omar Sandoval <osandov@osandov.com> wrote:
> On Tue, Apr 17, 2018 at 04:21:36PM +0100, Alan Jenkins wrote:
>> > Without this fix, I get an IO error in this test:
>> >
>> > # dd if=/dev/sda of=/dev/null iflag=direct & \
>> >   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>> >   echo mem > /sys/power/state ; \
>> >   sleep 5; killall dd  # stop after 5 seconds
>>
>> linux-block specifically asked for a test derived from this reproducer.
>> They didn't come up with any suggestion for testing the code more
>> directly
>> (and robustly).  So this test uses system suspend, automated with
>> pm_test.
>>
>> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
>
> Hi, Alan, thanks for the test. I was able to come up with a
> deterministic reproducer which runs in a few seconds, added here:
> https://github.com/osandov/blktests/blob/master/tests/block/016.

Thanks for the update, you may consider this reviewed.

It answers my concern and makes it look so easy.  I will console
myself that at least I had the right question :-).

Regards
Alan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] block: do not use interruptible wait anywhere
  2018-06-14 15:36   ` Alan Jenkins
@ 2018-06-15  5:49     ` Khalid Elmously
  0 siblings, 0 replies; 23+ messages in thread
From: Khalid Elmously @ 2018-06-15  5:49 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: kernel-team, stable

I screwed up with git-send-email (I think there was a change in its behaviour recently). I shouldn't have sent to -stable - sorry.



On 2018-06-14 16:36:46 , Alan Jenkins wrote:
> Hi Khaled
> 
> As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team, this
> patch has already been accepted in -stable kernel 4.16.7.� AFAIK there is no
> need to resend to -stable, on the basis that they are not currently
> maintaining a 4.15.x kernel.
> 
> I'm not deeply familiar so it is possible I am mis-understanding. (Hence I
> will restrain myself from writing further commentary :-P).
> 
> Regards
> 
> Alan
> 
> 
> On 14/06/18 16:06, Khaled Elmously wrote:
> > stable@vger.kernel.org : Please disregard this whole email thread, and sorry for the spam. Not sure why git-send-email is doing this to me (again).
> > 
> > 
> > 
> > On 2018-06-14 10:53:18 , Khalid Elmously wrote:
> > > From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> > > 
> > > BugLink: http://bugs.launchpad.net/bugs/1776887
> > > 
> > > When blk_queue_enter() waits for a queue to unfreeze, or unset the
> > > PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
> > > 
> > > The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> > > ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
> > > device is resumed asynchronously, i.e. after un-freezing userspace tasks.
> > > 
> > > So that commit exposed the bug as a regression in v4.15.  A mysterious
> > > SIGBUS (or -EIO) sometimes happened during the time the device was being
> > > resumed.  Most frequently, there was no kernel log message, and we saw Xorg
> > > or Xwayland killed by SIGBUS.[1]
> > > 
> > > [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
> > > 
> > > Without this fix, I get an IO error in this test:
> > > 
> > > # dd if=/dev/sda of=/dev/null iflag=direct & \
> > >    while killall -SIGUSR1 dd; do sleep 0.1; done & \
> > >    echo mem > /sys/power/state ; \
> > >    sleep 5; killall dd  # stop after 5 seconds
> > > 
> > > The interruptible wait was added to blk_queue_enter in
> > > commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> > > Before then, the interruptible wait was only in blk-mq, but I don't think
> > > it could ever have been correct.
> > > 
> > > Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > > (cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
> > > Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
> > > 
> > > ---
> > >   block/blk-core.c | 11 ++++-------
> > >   1 file changed, 4 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index fc0666354af3..59c91e345eea 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > >   	while (true) {
> > >   		bool success = false;
> > > -		int ret;
> > >   		rcu_read_lock();
> > >   		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> > > @@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> > >   		 */
> > >   		smp_rmb();
> > > -		ret = wait_event_interruptible(q->mq_freeze_wq,
> > > -				(atomic_read(&q->mq_freeze_depth) == 0 &&
> > > -				 (preempt || !blk_queue_preempt_only(q))) ||
> > > -				blk_queue_dying(q));
> > > +		wait_event(q->mq_freeze_wq,
> > > +			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> > > +			    (preempt || !blk_queue_preempt_only(q))) ||
> > > +			   blk_queue_dying(q));
> > >   		if (blk_queue_dying(q))
> > >   			return -ENODEV;
> > > -		if (ret)
> > > -			return ret;
> > >   	}
> > >   }
> > > -- 
> > > 2.17.1
> > > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] block: do not use interruptible wait anywhere
  2018-06-14 15:06 ` Khaled Elmously
@ 2018-06-14 15:36   ` Alan Jenkins
  2018-06-15  5:49     ` Khalid Elmously
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Jenkins @ 2018-06-14 15:36 UTC (permalink / raw)
  To: Khaled Elmously, kernel-team; +Cc: stable

Hi Khaled

As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team, 
this patch has already been accepted in -stable kernel 4.16.7.  AFAIK 
there is no need to resend to -stable, on the basis that they are not 
currently maintaining a 4.15.x kernel.

I'm not deeply familiar so it is possible I am mis-understanding. (Hence 
I will restrain myself from writing further commentary :-P).

Regards

Alan


On 14/06/18 16:06, Khaled Elmously wrote:
> stable@vger.kernel.org : Please disregard this whole email thread, and sorry for the spam. Not sure why git-send-email is doing this to me (again).
>
>
>
> On 2018-06-14 10:53:18 , Khalid Elmously wrote:
>> From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
>>
>> BugLink: http://bugs.launchpad.net/bugs/1776887
>>
>> When blk_queue_enter() waits for a queue to unfreeze, or unset the
>> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
>>
>> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
>> ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
>> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
>>
>> So that commit exposed the bug as a regression in v4.15.  A mysterious
>> SIGBUS (or -EIO) sometimes happened during the time the device was being
>> resumed.  Most frequently, there was no kernel log message, and we saw Xorg
>> or Xwayland killed by SIGBUS.[1]
>>
>> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
>>
>> Without this fix, I get an IO error in this test:
>>
>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>    while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>    echo mem > /sys/power/state ; \
>>    sleep 5; killall dd  # stop after 5 seconds
>>
>> The interruptible wait was added to blk_queue_enter in
>> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
>> Before then, the interruptible wait was only in blk-mq, but I don't think
>> it could ever have been correct.
>>
>> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> (cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
>> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
>>
>> ---
>>   block/blk-core.c | 11 ++++-------
>>   1 file changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index fc0666354af3..59c91e345eea 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>>   
>>   	while (true) {
>>   		bool success = false;
>> -		int ret;
>>   
>>   		rcu_read_lock();
>>   		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
>> @@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>>   		 */
>>   		smp_rmb();
>>   
>> -		ret = wait_event_interruptible(q->mq_freeze_wq,
>> -				(atomic_read(&q->mq_freeze_depth) == 0 &&
>> -				 (preempt || !blk_queue_preempt_only(q))) ||
>> -				blk_queue_dying(q));
>> +		wait_event(q->mq_freeze_wq,
>> +			   (atomic_read(&q->mq_freeze_depth) == 0 &&
>> +			    (preempt || !blk_queue_preempt_only(q))) ||
>> +			   blk_queue_dying(q));
>>   		if (blk_queue_dying(q))
>>   			return -ENODEV;
>> -		if (ret)
>> -			return ret;
>>   	}
>>   }
>>   
>> -- 
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] block: do not use interruptible wait anywhere
  2018-06-14 14:53 Khalid Elmously
@ 2018-06-14 15:06 ` Khaled Elmously
  2018-06-14 15:36   ` Alan Jenkins
  0 siblings, 1 reply; 23+ messages in thread
From: Khaled Elmously @ 2018-06-14 15:06 UTC (permalink / raw)
  To: kernel-team; +Cc: Alan Jenkins, stable

stable@vger.kernel.org : Please disregard this whole email thread, and sorry for the spam. Not sure why git-send-email is doing this to me (again).



On 2018-06-14 10:53:18 , Khalid Elmously wrote:
> From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1776887
> 
> When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
> 
> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
> 
> So that commit exposed the bug as a regression in v4.15.  A mysterious
> SIGBUS (or -EIO) sometimes happened during the time the device was being
> resumed.  Most frequently, there was no kernel log message, and we saw Xorg
> or Xwayland killed by SIGBUS.[1]
> 
> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
> 
> Without this fix, I get an IO error in this test:
> 
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds
> 
> The interruptible wait was added to blk_queue_enter in
> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> Before then, the interruptible wait was only in blk-mq, but I don't think
> it could ever have been correct.
> 
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> (cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
> 
> ---
>  block/blk-core.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fc0666354af3..59c91e345eea 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  
>  	while (true) {
>  		bool success = false;
> -		int ret;
>  
>  		rcu_read_lock();
>  		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> @@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		 */
>  		smp_rmb();
>  
> -		ret = wait_event_interruptible(q->mq_freeze_wq,
> -				(atomic_read(&q->mq_freeze_depth) == 0 &&
> -				 (preempt || !blk_queue_preempt_only(q))) ||
> -				blk_queue_dying(q));
> +		wait_event(q->mq_freeze_wq,
> +			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> +			    (preempt || !blk_queue_preempt_only(q))) ||
> +			   blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> -		if (ret)
> -			return ret;
>  	}
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] block: do not use interruptible wait anywhere
@ 2018-06-14 14:53 Khalid Elmously
  2018-06-14 15:06 ` Khaled Elmously
  0 siblings, 1 reply; 23+ messages in thread
From: Khalid Elmously @ 2018-06-14 14:53 UTC (permalink / raw)
  To: kernel-team; +Cc: khalid.elmously, Alan Jenkins, stable

From: Alan Jenkins <alan.christopher.jenkins@gmail.com>

BugLink: http://bugs.launchpad.net/bugs/1776887

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

---
 block/blk-core.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] block: do not use interruptible wait anywhere
@ 2018-06-14 14:50 Khalid Elmously
  0 siblings, 0 replies; 23+ messages in thread
From: Khalid Elmously @ 2018-06-14 14:50 UTC (permalink / raw)
  To: 999; +Cc: khalid.elmously, Alan Jenkins, stable

From: Alan Jenkins <alan.christopher.jenkins@gmail.com>

BugLink: http://bugs.launchpad.net/bugs/1776887

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>

---
 block/blk-core.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-06-15  5:49 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 16:23 [PATCH] block: do not use interruptible wait anywhere Alan Jenkins
2018-04-12 17:51 ` Bart Van Assche
2018-04-12 17:51   ` Bart Van Assche
2018-04-12 18:11   ` [PATCH v2] " Alan Jenkins
2018-04-13  8:31     ` Johannes Thumshirn
2018-04-14 19:46       ` blktest for " Alan Jenkins
2018-04-14 19:52         ` Jens Axboe
2018-04-15 12:15           ` Alan Jenkins
2018-04-16 21:23             ` [PATCH] blktests: regression test "block: do not use interruptible wait anywhere" Alan Jenkins
2018-04-16 21:52               ` Alan Jenkins
2018-04-17  7:21                 ` Johannes Thumshirn
2018-04-17 15:55                   ` Alan Jenkins
2018-04-17 15:10                     ` [PATCH v3] " Alan Jenkins
2018-04-17 15:21                       ` [PATCH v4] " Alan Jenkins
2018-04-24 22:46                         ` Omar Sandoval
2018-04-25  8:46                           ` Alan Jenkins
2018-04-18  7:25                     ` [PATCH] " Johannes Thumshirn
2018-04-14 19:54     ` [PATCH v2] block: do not use interruptible wait anywhere Jens Axboe
2018-06-14 14:50 [PATCH] " Khalid Elmously
2018-06-14 14:53 Khalid Elmously
2018-06-14 15:06 ` Khaled Elmously
2018-06-14 15:36   ` Alan Jenkins
2018-06-15  5:49     ` Khalid Elmously

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.