* [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-23 14:57 UTC (permalink / raw)
To: Jens Axboe, Martin Kepplinger
Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
Runtime power-management support in the block layer has somehow gotten
messed up in the past few years. This area of code has been through a
lot of churn and it's not easy to track exactly when the bug addressed
here was introduced; it was present for a while, then fixed for a
while, and then it returned.
At any rate, the problem is that the block layer code thinks that it's
okay to issue a request with the BLK_MQ_REQ_PREEMPT flag set at any
time, even when the queue and underlying device are runtime suspended.
This belief is wrong; the flag merely indicates that it's okay to
issue the request while the queue and device are in the process of
suspending or resuming. When they are actually suspended, no requests
may be issued.
The symptom of this bug is that a runtime-suspended block device
doesn't resume as it should. The request which should cause a runtime
resume instead gets issued directly, without resuming the device
first. Of course the device can't handle it properly, the I/O
fails, and the device remains suspended.
The problem is fixed by checking that the queue's runtime-PM status
isn't RPM_SUSPENDED before allowing a request to be issued, and
queuing a runtime-resume request if it is. In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and
the code is unified by merging the surrounding checks into the
routine. If the queue isn't set up for runtime PM, or there currently
is no restriction on allowed requests, the request is allowed.
Likewise if the BLK_MQ_REQ_PREEMPT flag is set and the status isn't
RPM_SUSPENDED. Otherwise a runtime resume is queued and the request
is blocked until conditions are more suitable.
Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Bart Van Assche <bvanassche@acm.org>
CC: <stable@vger.kernel.org> # 5.2
---
The bug goes back way before 5.2, but other changes will prevent the
patch from applying directly to earlier kernels, so I'm limiting the
@stable updates to 5.2 and after.
[as1941]
block/blk-core.c | 6 +++---
block/blk-pm.h | 14 +++++++++-----
2 files changed, 12 insertions(+), 8 deletions(-)
Index: usb-devel/block/blk-core.c
===================================================================
--- usb-devel.orig/block/blk-core.c
+++ usb-devel/block/blk-core.c
@@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue
* responsible for ensuring that that counter is
* globally visible before the queue is unfrozen.
*/
- if (pm || !blk_queue_pm_only(q)) {
+ if ((pm && q->rpm_status != RPM_SUSPENDED) ||
+ !blk_queue_pm_only(q)) {
success = true;
} else {
percpu_ref_put(&q->q_usage_counter);
@@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue
wait_event(q->mq_freeze_wq,
(!q->mq_freeze_depth &&
- (pm || (blk_pm_request_resume(q),
- !blk_queue_pm_only(q)))) ||
+ blk_pm_resume_queue(pm, q)) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
Index: usb-devel/block/blk-pm.h
===================================================================
--- usb-devel.orig/block/blk-pm.h
+++ usb-devel/block/blk-pm.h
@@ -6,11 +6,14 @@
#include <linux/pm_runtime.h>
#ifdef CONFIG_PM
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
{
- if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
- q->rpm_status == RPM_SUSPENDING))
- pm_request_resume(q->dev);
+ if (!q->dev || !blk_queue_pm_only(q))
+ return 1; /* Nothing to do */
+ if (pm && q->rpm_status != RPM_SUSPENDED)
+ return 1; /* Request allowed */
+ pm_request_resume(q->dev);
+ return 0;
}
static inline void blk_pm_mark_last_busy(struct request *rq)
@@ -44,8 +47,9 @@ static inline void blk_pm_put_request(st
--rq->q->nr_pending;
}
#else
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
{
+ return 1;
}
static inline void blk_pm_mark_last_busy(struct request *rq)
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Bart Van Assche @ 2020-08-24 17:48 UTC (permalink / raw)
To: Alan Stern, Jens Axboe, Martin Kepplinger
Cc: Can Guo, linux-scsi, linux-block, kernel
On 2020-08-23 07:57, Alan Stern wrote:
> The problem is fixed by checking that the queue's runtime-PM status
> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Can, can you help with testing this patch?
Thanks,
Bart.
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-24 20:13 UTC (permalink / raw)
To: Martin Kepplinger
Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
> On 2020-08-23 07:57, Alan Stern wrote:
> > The problem is fixed by checking that the queue's runtime-PM status
> > isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>
> Can, can you help with testing this patch?
>
> Thanks,
>
> Bart.
Martin:
(I forgot to ask this question several weeks ago, while you were running
your tests. Better ask it now before I forget again...)
I suspect the old runtime-PM code in the block layer would have worked
okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not
been set. Do you know why the flag was set, or what line of code caused
it to be set?
As I recall, the request that failed was a read-ahead. It's not clear
to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.
Although there may be other reasons for having that flag, at this point
we're concerned with requests that are part of the runtime-resume
procedure. A read-ahead does not fall into that category.
Do you think you can find the answer? Perhaps we should add a separate
BLK_MQ_REQ_PM flag.
Alan Stern
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Martin Kepplinger @ 2020-08-26 7:48 UTC (permalink / raw)
To: Alan Stern; +Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On 24.08.20 22:13, Alan Stern wrote:
> On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
>> On 2020-08-23 07:57, Alan Stern wrote:
>>> The problem is fixed by checking that the queue's runtime-PM status
>>> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
>>
>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>>
>> Can, can you help with testing this patch?
>>
>> Thanks,
>>
>> Bart.
>
> Martin:
>
> (I forgot to ask this question several weeks ago, while you were running
> your tests. Better ask it now before I forget again...)
>
> I suspect the old runtime-PM code in the block layer would have worked
> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not
> been set. Do you know why the flag was set, or what line of code caused
> it to be set?
Correct. If not set, I could handle all I need in the SCSI error path.
>
> As I recall, the request that failed was a read-ahead. It's not clear
> to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.
> Although there may be other reasons for having that flag, at this point
> we're concerned with requests that are part of the runtime-resume
> procedure. A read-ahead does not fall into that category.
>
> Do you think you can find the answer? Perhaps we should add a separate
> BLK_MQ_REQ_PM flag.
I guess I can, but give me some time as I'm currently on vacation.
>
> Alan Stern
>
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Martin Kepplinger @ 2020-08-27 17:42 UTC (permalink / raw)
To: Alan Stern; +Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On 26.08.20 09:48, Martin Kepplinger wrote:
> On 24.08.20 22:13, Alan Stern wrote:
>> On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
>>> On 2020-08-23 07:57, Alan Stern wrote:
>>>> The problem is fixed by checking that the queue's runtime-PM status
>>>> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
>>>
>>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>>>
>>> Can, can you help with testing this patch?
>>>
>>> Thanks,
>>>
>>> Bart.
>>
>> Martin:
>>
>> (I forgot to ask this question several weeks ago, while you were running
>> your tests. Better ask it now before I forget again...)
>>
>> I suspect the old runtime-PM code in the block layer would have worked
>> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not
>> been set. Do you know why the flag was set, or what line of code caused
>> it to be set?
>
> Correct. if not set, I could handle all I need in the scsi error path.
This thread is becoming a bit confusing. I was thinking of REQ_FAILFAST_DEV,
but you're talking about something different.
The only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
__scsi_execute(), which is what happens when mounting/unmounting. At least,
that's about the only place I can find.
I remember *only* your block pm fix would let me mount/unmount, but not
use files yet (REQ_FAILFAST_DEV and so on).
When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
on to blk_get_request() in __scsi_execute(), that line gets executed
exactly once during startup, and the /dev/sda device from the cardreader
then goes missing.
Is this what you're asking?
>
>>
>> As I recall, the request that failed was a read-ahead. It's not clear
>> to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.
>> Although there may be other reasons for having that flag, at this point
>> we're concerned with requests that are part of the runtime-resume
>> procedure. A read-ahead does not fall into that category.
>>
>> Do you think you can find the answer? Perhaps we should add a separate
>> BLK_MQ_REQ_PM flag.
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-27 20:29 UTC (permalink / raw)
To: Martin Kepplinger
Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On Thu, Aug 27, 2020 at 07:42:43PM +0200, Martin Kepplinger wrote:
> On 26.08.20 09:48, Martin Kepplinger wrote:
> > On 24.08.20 22:13, Alan Stern wrote:
> >> Martin:
> >>
> >> (I forgot to ask this question several weeks ago, while you were running
> >> your tests. Better ask it now before I forget again...)
> >>
> >> I suspect the old runtime-PM code in the block layer would have worked
> >> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not
> >> been set. Do you know why the flag was set, or what line of code caused
> >> it to be set?
> >
> > Correct. if not set, I could handle all I need in the scsi error path.
>
> this thread becomes a bit confusing. I thought about REQ_FAILFAST_DEV
> but you're talking about something different.
>
> the only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
> __scsi_execute() which is the case when mounting/unmounting. At least
> that about the only place I can find.
Ah yes, I see what you mean.
> I remember *only* your block pm fix would let me mount/unmount, but not
> use files yet (REQ_FAILFAST_DEV and so on).
>
> When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
> on to blk_get_request() in __scsi_execute(), that line gets executed
> exactly once during startup and I'm missing the /dev/sda device from the
> cardreader then.
>
> Is this what you're asking?
Not quite sure, but it doesn't matter. Removing BLK_MQ_REQ_PREEMPT in
__scsi_execute() is probably not a safe thing to do.
Instead, look at sd_resume(). That routine calls __scsi_execute()
indirectly through sd_start_stop_device(), and the only reason it does
this is because the sdkp->device->manage_start_stop flag is set. You
ought to be able to clear this flag in sysfs, by writing to
/sys/block/sda/device/scsi_disk/*/manage_start_stop. If you do this
before allowing the card reader to go into runtime suspend, does it then
resume okay?
(Yes, I know you still won't be able to read it because of the FAILFAST
flag. I just want to know if the runtime resume actually takes place.)
Alan Stern
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Martin Kepplinger @ 2020-08-29 7:24 UTC (permalink / raw)
To: Alan Stern; +Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On 27.08.20 22:29, Alan Stern wrote:
> On Thu, Aug 27, 2020 at 07:42:43PM +0200, Martin Kepplinger wrote:
>> On 26.08.20 09:48, Martin Kepplinger wrote:
>>> On 24.08.20 22:13, Alan Stern wrote:
>
>>>> Martin:
>>>>
>>>> (I forgot to ask this question several weeks ago, while you were running
>>>> your tests. Better ask it now before I forget again...)
>>>>
>>>> I suspect the old runtime-PM code in the block layer would have worked
>>>> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not
>>>> been set. Do you know why the flag was set, or what line of code caused
>>>> it to be set?
>>>
>>> Correct. if not set, I could handle all I need in the scsi error path.
>>
>> this thread becomes a bit confusing. I thought about REQ_FAILFAST_DEV
>> but you're talking about something different.
>>
>> the only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
>> __scsi_execute() which is the case when mounting/unmounting. At least
>> that about the only place I can find.
>
> Ah yes, I see what you mean.
>
>> I remember *only* your block pm fix would let me mount/unmount, but not
>> use files yet (REQ_FAILFAST_DEV and so on).
>>
>> When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
>> on to blk_get_request() in __scsi_execute(), that line gets executed
>> exactly once during startup and I'm missing the /dev/sda device from the
>> cardreader then.
>>
>> Is this what you're asking?
>
> Not quite sure, but it doesn't matter. Removing BLK_MQ_REQ_PREEMPT in
> __scsi_execute() is probably not a safe thing to do.
>
> Instead, look at sd_resume(). That routine calls __scsi_execute()
> indirectly through sd_start_stop_device(), and the only reason it does
> this is because the sdkp->device->manage_start_stop flag is set. You
> ought to be able to clear this flag in sysfs, by writing to
> /sys/block/sda/device/scsi_disk/*/manage_start_stop. If you do this
> before allowing the card reader to go into runtime suspend, does it then
> resume okay?
manage_start_stop in sysfs is 0 here.
>
> (Yes, I know you still won't be able to read it because of the FAILFAST
> flag. I just want to know if the runtime resume actually takes place.)
>
> Alan Stern
>
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-29 15:26 UTC (permalink / raw)
To: Martin Kepplinger
Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On Sat, Aug 29, 2020 at 09:24:30AM +0200, Martin Kepplinger wrote:
> On 27.08.20 22:29, Alan Stern wrote:
> > Instead, look at sd_resume(). That routine calls __scsi_execute()
> > indirectly through sd_start_stop_device(), and the only reason it does
> > this is because the sdkp->device->manage_start_stop flag is set. You
> > ought to be able to clear this flag in sysfs, by writing to
> > /sys/block/sda/device/scsi_disk/*/manage_start_stop. If you do this
> > before allowing the card reader to go into runtime suspend, does it then
> > resume okay?
>
> manage_start_stop in sysfs is 0 here.
Hmmm. I'm wondering about something you wrote back in June
(https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
blk_queue_enter() always - especially when sd is runtime
suspended and I try to mount as above - sets success to be true
for me, so never continues down to blk_pm_request_resume(). All
I see is "PM: Removing info for No Bus:sda1".
blk_queue_enter() would always set success to be true because pm
(derived from the BLK_MQ_REQ_PREEMPT flag) is true. But why was the
BLK_MQ_REQ_PREEMPT flag set? In other words, where was
blk_queue_enter() called from?
Can you get a stack trace (i.e., call dump_stack()) at exactly this
point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED? Or
do you already know the answer?
Alan Stern
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Martin Kepplinger @ 2020-08-29 16:33 UTC (permalink / raw)
To: Alan Stern; +Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On 29.08.20 17:26, Alan Stern wrote:
> On Sat, Aug 29, 2020 at 09:24:30AM +0200, Martin Kepplinger wrote:
>> On 27.08.20 22:29, Alan Stern wrote:
>>> Instead, look at sd_resume(). That routine calls __scsi_execute()
>>> indirectly through sd_start_stop_device(), and the only reason it does
>>> this is because the sdkp->device->manage_start_stop flag is set. You
>>> ought to be able to clear this flag in sysfs, by writing to
>>> /sys/block/sda/device/scsi_disk/*/manage_start_stop. If you do this
>>> before allowing the card reader to go into runtime suspend, does it then
>>> resume okay?
>>
>> manage_start_stop in sysfs is 0 here.
>
> Hmmm. I'm wondering about something you wrote back in June
> (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
>
> blk_queue_enter() always - especially when sd is runtime
> suspended and I try to mount as above - sets success to be true
> for me, so never continues down to blk_pm_request_resume(). All
> I see is "PM: Removing info for No Bus:sda1".
>
> blk_queue_enter() would always set success to be true because pm
> (derived from the BLK_MQ_REQ_PREEMPT flag) is true. But why was the
> BLK_MQ_REQ_PREEMPT flag set? In other words, where was
> blk_queue_enter() called from?
>
> Can you get a stack trace (i.e., call dump_stack()) at exactly this
> point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED? Or
> do you already know the answer?
>
>
I reverted all scsi/block out-of-tree fixes for this.
When I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMPT set), and this is the
first stack trace I get in this condition, inside blk_queue_enter().
There is more, but I don't know if that's interesting:
[ 38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
[ 38.642207] Hardware name: Purism Librem 5r3 (DT)
[ 38.642213] Call trace:
[ 38.642233] dump_backtrace+0x0/0x210
[ 38.642242] show_stack+0x20/0x30
[ 38.642252] dump_stack+0xc8/0x128
[ 38.642262] blk_queue_enter+0x1b8/0x2d8
[ 38.642271] blk_mq_alloc_request+0x54/0xb0
[ 38.642277] blk_get_request+0x34/0x78
[ 38.642286] __scsi_execute+0x60/0x1c8
[ 38.642291] scsi_test_unit_ready+0x88/0x118
[ 38.642298] sd_check_events+0x110/0x158
[ 38.642306] disk_check_events+0x68/0x188
[ 38.642312] disk_clear_events+0x84/0x198
[ 38.642320] check_disk_change+0x38/0x90
[ 38.642325] sd_open+0x60/0x148
[ 38.642330] __blkdev_get+0xcc/0x4c8
[ 38.642335] __blkdev_get+0x278/0x4c8
[ 38.642339] blkdev_get+0x128/0x1a8
[ 38.642345] blkdev_open+0x98/0xb0
[ 38.642354] do_dentry_open+0x130/0x3c8
[ 38.642359] vfs_open+0x34/0x40
[ 38.642366] path_openat+0xa30/0xe40
[ 38.642372] do_filp_open+0x84/0x100
[ 38.642377] do_sys_openat2+0x1f4/0x2b0
[ 38.642382] do_sys_open+0x60/0xa8
(...)
And of course it doesn't work and /dev/sda1 disappears; see the initial
discussion that led to your fix.
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-29 18:56 UTC (permalink / raw)
To: Martin Kepplinger
Cc: Bart Van Assche, Can Guo, linux-scsi, linux-block, kernel
On Sat, Aug 29, 2020 at 06:33:26PM +0200, Martin Kepplinger wrote:
> On 29.08.20 17:26, Alan Stern wrote:
> > Hmmm. I'm wondering about something you wrote back in June
> > (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> >
> > blk_queue_enter() always - especially when sd is runtime
> > suspended and I try to mount as above - sets success to be true
> > for me, so never continues down to blk_pm_request_resume(). All
> > I see is "PM: Removing info for No Bus:sda1".
> >
> > blk_queue_enter() would always set success to be true because pm
> > (derived from the BLK_MQ_REQ_PREEMPT flag) is true. But why was the
> > BLK_MQ_REQ_PREEMPT flag set? In other words, where was
> > blk_queue_enter() called from?
> >
> > Can you get a stack trace (i.e., call dump_stack()) at exactly this
> > point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED? Or
> > do you already know the answer?
> >
> >
>
> I reverted any scsi/block out-of-tree fixes for this.
>
> when I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMPT set) and that's the
> first stack trace I get in this condition, inside of blk_queue_enter():
>
> There is more, but I don't know if that's interesting.
>
> [ 38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
> [ 38.642207] Hardware name: Purism Librem 5r3 (DT)
> [ 38.642213] Call trace:
> [ 38.642233] dump_backtrace+0x0/0x210
> [ 38.642242] show_stack+0x20/0x30
> [ 38.642252] dump_stack+0xc8/0x128
> [ 38.642262] blk_queue_enter+0x1b8/0x2d8
> [ 38.642271] blk_mq_alloc_request+0x54/0xb0
> [ 38.642277] blk_get_request+0x34/0x78
> [ 38.642286] __scsi_execute+0x60/0x1c8
> [ 38.642291] scsi_test_unit_ready+0x88/0x118
> [ 38.642298] sd_check_events+0x110/0x158
> [ 38.642306] disk_check_events+0x68/0x188
> [ 38.642312] disk_clear_events+0x84/0x198
> [ 38.642320] check_disk_change+0x38/0x90
> [ 38.642325] sd_open+0x60/0x148
> [ 38.642330] __blkdev_get+0xcc/0x4c8
> [ 38.642335] __blkdev_get+0x278/0x4c8
> [ 38.642339] blkdev_get+0x128/0x1a8
> [ 38.642345] blkdev_open+0x98/0xb0
> [ 38.642354] do_dentry_open+0x130/0x3c8
> [ 38.642359] vfs_open+0x34/0x40
> [ 38.642366] path_openat+0xa30/0xe40
> [ 38.642372] do_filp_open+0x84/0x100
> [ 38.642377] do_sys_openat2+0x1f4/0x2b0
> [ 38.642382] do_sys_open+0x60/0xa8
> (...)
>
> and of course it doesn't work and /dev/sda1 disappears, see the initial
> discussion that led to your fix.
Great! That's exactly what I was looking for, thank you.
Bart, this is a perfect example of the potential race I've been talking
about in the other email thread. Suppose thread 0 is carrying out a
runtime suspend of a SCSI disk and at the same time, thread 1 is opening
the disk's block device (as we see in the stack trace here). Then we
could have the following:
Thread 0                                Thread 1
--------                                --------
Start runtime suspend
blk_pre_runtime_suspend calls
blk_set_pm_only and sets
q->rpm_status to RPM_SUSPENDING
                                        Call sd_open -> ... ->
                                        scsi_test_unit_ready ->
                                        __scsi_execute -> ... ->
                                        blk_queue_enter
                                        Sees BLK_MQ_REQ_PREEMPT set and
                                        RPM_SUSPENDING queue status, so
                                        does not postpone the request
blk_post_runtime_suspend sets
q->rpm_status to RPM_SUSPENDED
The drive goes into runtime suspend
                                        Issues the TEST UNIT READY request
                                        Request fails because the drive
                                        is suspended
One way to avoid this race is mutual exclusion: We could make sd_open
prevent the drive from being runtime suspended until it returns.
However I don't like this approach; it would mean tracking down every
possible pathway to __scsi_execute and making sure that runtime suspend
is blocked.
A more fine-grained approach would be to have __scsi_execute itself call
scsi_autopm_get/put_device whenever the rq_flags argument doesn't
contain RQF_PM. This way we wouldn't have to worry about missing any
possible pathways. But it relies on an implicit assumption that
__scsi_execute is the only place where the PREEMPT flag gets set.
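To make the second approach concrete, here is a toy standalone model, not
kernel code: the names echo RQF_PM and scsi_autopm_get_device() /
scsi_autopm_put_device(), but the usage counter and helper functions below
are simplified stand-ins I invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define RQF_PM (1u << 0)   /* stand-in for the kernel's RQF_PM flag */

static int pm_usage;       /* toy runtime-PM usage count; > 0 keeps the
                              device resumed and blocks re-suspend */

static void autopm_get(void) { pm_usage++; }  /* models scsi_autopm_get_device */
static void autopm_put(void) { pm_usage--; }  /* models scsi_autopm_put_device */

/* Toy __scsi_execute: every request that is not itself part of runtime
 * PM first resumes and pins the device, so it can never be issued to a
 * suspended device.  Returns nonzero if the device was guaranteed awake
 * when the request was issued (pinned, or a PM request handled by the
 * PM core itself). */
static int scsi_execute_model(unsigned int rq_flags)
{
	bool pin = !(rq_flags & RQF_PM);
	int issued_awake;

	if (pin)
		autopm_get();              /* resume + hold the device */

	issued_awake = (pm_usage > 0) || (rq_flags & RQF_PM);
	/* ... the request itself would be issued here ... */

	if (pin)
		autopm_put();              /* allow suspend again */
	return issued_awake;
}
```

The point of the model is that the pin/unpin pair brackets the request, so
no path into the toy __scsi_execute has to remember to resume the device
itself.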
A third possibility is the approach I outlined before, adding a
BLK_MQ_REQ_PM flag. But to avoid the deadlock you pointed out, I would
make blk_queue_enter smarter about whether to postpone a request. The
logic would go like this:
If !blk_queue_pm_only:
        Allow
If !BLK_MQ_REQ_PREEMPT:
        Postpone
If q->rpm_status is RPM_ACTIVE:
        Allow
If !BLK_MQ_REQ_PM:
        Postpone
If q->rpm_status is RPM_SUSPENDED:
        Postpone
Else:
        Allow
The assumption here is that the PREEMPT flag is set whenever the PM flag
is.
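The decision logic above can be sketched as a standalone model, not kernel
code: REQ_PREEMPT and REQ_PM below stand in for BLK_MQ_REQ_PREEMPT and the
proposed BLK_MQ_REQ_PM, and pm_only stands in for blk_queue_pm_only(q).

```c
#include <assert.h>
#include <stdbool.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDING, RPM_SUSPENDED };

#define REQ_PREEMPT (1u << 0)   /* models BLK_MQ_REQ_PREEMPT */
#define REQ_PM      (1u << 1)   /* models the proposed BLK_MQ_REQ_PM */

/* Returns true if the request may be issued now, false if it must be
 * postponed until the queue's runtime-PM state changes. */
static bool may_enter_queue(bool pm_only, unsigned int flags,
                            enum rpm_status status)
{
	if (!pm_only)
		return true;            /* no PM restriction in effect */
	if (!(flags & REQ_PREEMPT))
		return false;           /* ordinary request: postpone */
	if (status == RPM_ACTIVE)
		return true;            /* device fully awake */
	if (!(flags & REQ_PM))
		return false;           /* PREEMPT without PM: postpone */
	return status != RPM_SUSPENDED; /* PM request: ok unless the
	                                   device is fully suspended */
}
```

Note how this closes the race from the timeline above: a PREEMPT-only
request (like the TEST UNIT READY from sd_open) is postponed as soon as the
queue is RPM_SUSPENDING, while genuine PM requests still get through.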
I believe either the second or third possibility would work. The second
looks to be the simplest.
What do you think?
Alan Stern
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Bart Van Assche @ 2020-08-30 0:38 UTC (permalink / raw)
To: Alan Stern, Martin Kepplinger; +Cc: Can Guo, linux-scsi, linux-block, kernel
On 2020-08-29 11:56, Alan Stern wrote:
> A third possibility is the approach I outlined before, adding a
> BLK_MQ_REQ_PM flag. But to avoid the deadlock you pointed out, I would
> make blk_queue_enter smarter about whether to postpone a request. The
> logic would go like this:
>
> If !blk_queue_pm_only:
>         Allow
> If !BLK_MQ_REQ_PREEMPT:
>         Postpone
> If q->rpm_status is RPM_ACTIVE:
>         Allow
> If !BLK_MQ_REQ_PM:
>         Postpone
> If q->rpm_status is RPM_SUSPENDED:
>         Postpone
> Else:
>         Allow
>
> The assumption here is that the PREEMPT flag is set whenever the PM flag
> is.
Hi Alan,
Although interesting, these solutions sound like workarounds to me. How
about fixing the root cause by modifying the SCSI DV implementation such
that it doesn't use scsi_device_quiesce() anymore? That change would
allow us to remove BLK_MQ_REQ_PREEMPT / RQF_PREEMPT from the block layer and
move these flags into the SCSI and IDE code.
Thanks,
Bart.
* Re: [PATCH] block: Fix bug in runtime-resume handling
From: Alan Stern @ 2020-08-30 1:06 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin Kepplinger, Can Guo, linux-scsi, linux-block, kernel
On Sat, Aug 29, 2020 at 05:38:50PM -0700, Bart Van Assche wrote:
> On 2020-08-29 11:56, Alan Stern wrote:
> > A third possibility is the approach I outlined before, adding a
> > BLK_MQ_REQ_PM flag. But to avoid the deadlock you pointed out, I would
> > make blk_queue_enter smarter about whether to postpone a request. The
> > logic would go like this:
> >
> > If !blk_queue_pm_only:
> >         Allow
> > If !BLK_MQ_REQ_PREEMPT:
> >         Postpone
> > If q->rpm_status is RPM_ACTIVE:
> >         Allow
> > If !BLK_MQ_REQ_PM:
> >         Postpone
> > If q->rpm_status is RPM_SUSPENDED:
> >         Postpone
> > Else:
> >         Allow
> >
> > The assumption here is that the PREEMPT flag is set whenever the PM flag
> > is.
>
> Hi Alan,
>
> Although interesting, these solutions sound like workarounds to me. How
> about fixing the root cause by modifying the SCSI DV implementation such
> that it doesn't use scsi_device_quiesce() anymore? That change would
> allow us to remove BLK_MQ_REQ_PREEMPT / RQF_PREEMPT from the block layer and
> move these flags into the SCSI and IDE code.
That's a perfectly reasonable approach, but I have no idea how to do it.
Any suggestions?
Alan Stern