* [PATCH] block: do not use interruptible wait anywhere
@ 2018-06-14 14:53 Khalid Elmously
2018-06-14 14:56 ` NACK: " Khaled Elmously
2018-06-14 15:06 ` Khaled Elmously
0 siblings, 2 replies; 9+ messages in thread
From: Khalid Elmously @ 2018-06-14 14:53 UTC (permalink / raw)
To: kernel-team; +Cc: khalid.elmously, Alan Jenkins, stable
From: Alan Jenkins <alan.christopher.jenkins@gmail.com>

BugLink: http://bugs.launchpad.net/bugs/1776887

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
---
block/blk-core.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
--
2.17.1
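
For readers who want to see the behavioural difference without booting a
kernel, here is a rough userspace model of the wait in the patch above. It is
only a sketch under stated assumptions: every model_* name is hypothetical,
the PREEMPT_ONLY check and the percpu_ref retry loop are left out, and a
"signal" is just a boolean. The one value taken from the kernel is
ERESTARTSYS (512). The old variant gives up as soon as a signal is pending
while the queue is still frozen, which is the kind of early return the commit
message associates with the -EIO/SIGBUS reports; the new variant keeps
waiting until the queue thaws or is dying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal "restart the syscall" code (512); not exported to userspace. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for the request_queue state the wait condition reads. */
struct model_queue {
	int freeze_depth;	/* models q->mq_freeze_depth */
	bool dying;		/* models blk_queue_dying(q) */
	int wakeups_until_thaw;	/* simulated time until the queue unfreezes */
};

/* Wait condition from the patch, with the PREEMPT_ONLY part omitted. */
static bool model_wait_cond(const struct model_queue *q)
{
	return q->freeze_depth == 0 || q->dying;
}

/* One simulated sleep/wakeup cycle; the last one unfreezes the queue. */
static void model_schedule(struct model_queue *q)
{
	if (q->wakeups_until_thaw > 0 && --q->wakeups_until_thaw == 0)
		q->freeze_depth = 0;
}

/* Old behaviour: a pending signal ends the wait before the queue thaws. */
static int model_enter_interruptible(struct model_queue *q, bool signal_pending)
{
	while (!model_wait_cond(q)) {
		if (signal_pending)
			return -MODEL_ERESTARTSYS;
		model_schedule(q);
	}
	return q->dying ? -ENODEV : 0;
}

/*
 * New behaviour: only a thawed or dying queue ends the wait. Like the real
 * wait, this never returns if the queue neither thaws nor dies, so callers
 * of the model must set wakeups_until_thaw > 0 or dying.
 */
static int model_enter_uninterruptible(struct model_queue *q)
{
	while (!model_wait_cond(q))
		model_schedule(q);
	return q->dying ? -ENODEV : 0;
}

/* Scenario from the commit message: a signal arrives while the queue is frozen. */
static void model_selfcheck(void)
{
	struct model_queue a = { .freeze_depth = 1, .wakeups_until_thaw = 3 };
	struct model_queue b = a;

	assert(model_enter_interruptible(&a, true) == -MODEL_ERESTARTSYS);
	assert(model_enter_uninterruptible(&b) == 0);
}
```

Calling model_selfcheck() from a small main() exercises both variants on the
same frozen queue. The model deliberately ignores how the error propagates
further up the block layer; it only shows where the early return came from.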
* NACK: [PATCH] block: do not use interruptible wait anywhere
2018-06-14 14:53 [PATCH] block: do not use interruptible wait anywhere Khalid Elmously
@ 2018-06-14 14:56 ` Khaled Elmously
2018-06-14 15:06 ` Khaled Elmously
1 sibling, 0 replies; 9+ messages in thread
From: Khaled Elmously @ 2018-06-14 14:56 UTC (permalink / raw)
To: kernel-team; +Cc: Alan Jenkins, stable
Sent a v2 with the correct subject tags.
* Re: [PATCH] block: do not use interruptible wait anywhere
2018-06-14 14:53 [PATCH] block: do not use interruptible wait anywhere Khalid Elmously
2018-06-14 14:56 ` NACK: " Khaled Elmously
@ 2018-06-14 15:06 ` Khaled Elmously
2018-06-14 15:36 ` Alan Jenkins
1 sibling, 1 reply; 9+ messages in thread
From: Khaled Elmously @ 2018-06-14 15:06 UTC (permalink / raw)
To: kernel-team; +Cc: Alan Jenkins, stable
stable@vger.kernel.org: Please disregard this whole email thread, and sorry for the spam. I'm not sure why git-send-email is doing this to me (again).
* Re: [PATCH] block: do not use interruptible wait anywhere
2018-06-14 15:06 ` Khaled Elmously
@ 2018-06-14 15:36 ` Alan Jenkins
2018-06-15 5:49 ` Khalid Elmously
0 siblings, 1 reply; 9+ messages in thread
From: Alan Jenkins @ 2018-06-14 15:36 UTC (permalink / raw)
To: Khaled Elmously, kernel-team; +Cc: stable
Hi Khaled
As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team,
this patch has already been accepted in -stable kernel 4.16.7. AFAIK
there is no need to resend to -stable, on the basis that they are not
currently maintaining a 4.15.x kernel.
I'm not deeply familiar with the process, so it is possible I am
misunderstanding. (Hence I will restrain myself from writing further
commentary :-P).
Regards
Alan
* Re: [PATCH] block: do not use interruptible wait anywhere
2018-06-14 15:36 ` Alan Jenkins
@ 2018-06-15 5:49 ` Khalid Elmously
0 siblings, 0 replies; 9+ messages in thread
From: Khalid Elmously @ 2018-06-15 5:49 UTC (permalink / raw)
To: Alan Jenkins; +Cc: kernel-team, stable
I screwed up with git-send-email (I think there was a change in its behaviour recently). I shouldn't have sent to -stable - sorry.
On 2018-06-14 16:36:46 , Alan Jenkins wrote:
> Hi Khaled
>
> As per the Ubuntu bug I quoted in my message to the Ubuntu kernel team, this
> patch has already been accepted in -stable kernel 4.16.7. AFAIK there is no
> need to resend to -stable, on the basis that they are not currently
> maintaining a 4.15.x kernel.
>
> I'm not deeply familiar so it is possible I am mis-understanding. (Hence I
> will restrain myself from writing further commentary :-P).
>
> Regards
>
> Alan
* [PATCH] block: do not use interruptible wait anywhere
@ 2018-04-12 16:23 Alan Jenkins
2018-04-12 17:51 ` Bart Van Assche
0 siblings, 1 reply; 9+ messages in thread
From: Alan Jenkins @ 2018-04-12 16:23 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: linux-kernel, Alan Jenkins, stable
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
---
block/blk-core.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index abcb8684ba67..5a6d20069364 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -915,7 +915,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
+		wait_event(q->mq_freeze_wq,
 				(atomic_read(&q->mq_freeze_depth) == 0 &&
 				 (preempt || !blk_queue_preempt_only(q))) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
--
2.14.3
* Re: [PATCH] block: do not use interruptible wait anywhere
@ 2018-04-12 17:51 ` Bart Van Assche
0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2018-04-12 17:51 UTC (permalink / raw)
To: alan.christopher.jenkins, linux-block, axboe; +Cc: linux-kernel, stable
On Thu, 2018-04-12 at 17:23 +0100, Alan Jenkins wrote:
> @@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		 */
>  		smp_rmb();
>  
> -		ret = wait_event_interruptible(q->mq_freeze_wq,
> +		wait_event(q->mq_freeze_wq,
>  				(atomic_read(&q->mq_freeze_depth) == 0 &&
>  				 (preempt || !blk_queue_preempt_only(q))) ||
>  				blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> -		if (ret)
> -			return ret;
>  	}
>  }

Hello Alan,
Please reindent the wait_event() arguments such that these remain aligned.
Anyway:
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>