* requeue failure with blk-mq-sched
From: Hannes Reinecke @ 2017-01-19 12:27 UTC
To: Jens Axboe; +Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig
Hi Jens,
Upon further testing with your blk-mq-sched branch, I hit a queue stall
during requeuing:
[ 202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
[ 202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
[ 202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
[ 202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
[ 202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus device reset occurred
[ 202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
[ 202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440 into mlqueue
... and that is the last we ever hear of that device.
The 'device_busy' count remains at 1, and no further commands are
sent to the device.
Debugging continues.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* Re: requeue failure with blk-mq-sched
From: Jens Axboe @ 2017-01-19 14:09 UTC
To: Hannes Reinecke
Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig
On 01/19/2017 04:27 AM, Hannes Reinecke wrote:
> Hi Jens,
>
> Upon further testing with your blk-mq-sched branch, I hit a queue stall
> during requeuing:
>
> [ 202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
> [ 202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
> [ 202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [ 202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
> [ 202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
> [ 202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus device reset occurred
> [ 202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
> [ 202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440 into mlqueue
>
> ... and that is the last we ever hear of that device.
> The 'device_busy' count remains at 1, and no further commands are
> sent to the device.
If device_busy is at 1, then it should have a command pending. Where did
you log this? It would be handy if you attached whatever debug patch
you put in, so we can see where the printks are coming from. If we get a
BUSY with nothing pending, the driver should be ensuring that the queue
gets run again later through blk_mq_delay_queue(), for instance.
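The rule of thumb above can be written down as a tiny decision function: after ->queue_rq() returns BUSY, something must run the queue again. This is a hypothetical userspace model, not kernel code; `enum restart_source` and `busy_restart_source()` are invented names, and the model assumes (as the reply implies) that when a command is still pending, its completion is what gets the queue going again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (NOT kernel code): who reruns the hardware queue
 * after ->queue_rq() has returned BUSY? */
enum restart_source {
	RESTART_BY_COMPLETION, /* assumed: a pending command completes and reruns the queue */
	RESTART_BY_DELAY,      /* the driver armed a delayed rerun */
	RESTART_NEVER          /* nothing will ever run the queue again: a stall */
};

static enum restart_source busy_restart_source(int in_flight, bool delay_armed)
{
	assert(in_flight >= 0); /* models a busy counter such as device_busy */

	if (in_flight > 0)
		return RESTART_BY_COMPLETION;
	if (delay_armed)
		return RESTART_BY_DELAY;
	return RESTART_NEVER;
}
```

The interesting case is `busy_restart_source(0, false)`: BUSY with nothing pending and no delayed rerun armed is exactly the situation in which the driver has to step in.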
When the device is stuck, does it restart if you send it IO?
--
Jens Axboe
* Re: requeue failure with blk-mq-sched
From: Hannes Reinecke @ 2017-01-19 14:24 UTC
To: Jens Axboe; +Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig
On 01/19/2017 03:09 PM, Jens Axboe wrote:
> If device_busy is at 1, then it should have a command pending. Where did
> you log this? It would be handy if you attached whatever debug patch
> you put in, so we can see where the printks are coming from. If we get a
> BUSY with nothing pending, the driver should be ensuring that the queue
> gets run again later through blk_mq_delay_queue(), for instance.
>
> When the device is stuck, does it restart if you send it IO?
>
Meanwhile I've found it.
Problem is that scsi_queue_rq() will not stop the queue when hitting a
busy condition before sending commands down to the driver, but still
calls blk_mq_delay_queue():

	switch (ret) {
	case BLK_MQ_RQ_QUEUE_BUSY:
		if (atomic_read(&sdev->device_busy) == 0 &&
		    !scsi_device_blocked(sdev))
			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
		break;

As the queue isn't stopped, the delayed work scheduled by
blk_mq_delay_queue() won't do anything, so queue_rq() will never be
called again.
I've sent a patch to linux-scsi.
BTW: Is it a hard requirement that the queue has to be stopped when
returning BLK_MQ_RQ_QUEUE_BUSY?
The comments indicate as such, but none of the drivers do so...
Also, blk_mq_delay_queue() is a bit odd, in that it'll only start
stopped hardware queues. I would at least document that the queue has to
be stopped when calling that.
Better still, can't we have blk_mq_delay_queue start the queues
unconditionally?
Cheers,
Hannes
* Re: requeue failure with blk-mq-sched
From: Jens Axboe @ 2017-01-19 14:35 UTC
To: Hannes Reinecke
Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig
On 01/19/2017 06:24 AM, Hannes Reinecke wrote:
> Meanwhile I've found it.
>
> Problem is that scsi_queue_rq() will not stop the queue when hitting a
> busy condition before sending commands down to the driver, but still
> calls blk_mq_delay_queue():
>
> 	switch (ret) {
> 	case BLK_MQ_RQ_QUEUE_BUSY:
> 		if (atomic_read(&sdev->device_busy) == 0 &&
> 		    !scsi_device_blocked(sdev))
> 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
> 		break;
>
> As the queue isn't stopped, the delayed work scheduled by
> blk_mq_delay_queue() won't do anything, so queue_rq() will never be
> called again.
> I've sent a patch to linux-scsi.
>
> BTW: Is it a hard requirement that the queue has to be stopped when
> returning BLK_MQ_RQ_QUEUE_BUSY?
No, but currently it does appear to be a hard requirement that the queue
be stopped when you call blk_mq_delay_queue(). That does make sense, since
there's little point in delaying if the queue gets run anyway.
> The comments indicate as such, but none of the drivers do so...
> Also, blk_mq_delay_queue() is a bit odd, in that it'll only start
> stopped hardware queues. I would at least document that the queue has to
> be stopped when calling that.
> Better still, can't we have blk_mq_delay_queue start the queues
> unconditionally?
I think so. It doesn't make sense for blk_mq_delay_queue() not to stop
the queue while its own work handler only runs the queue if the STOPPED
bit is set.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index fa1f8619bfe7..739a29208a63 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1161,8 +1161,8 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
 
 	hctx = container_of(work, struct blk_mq_hw_ctx, delay_work.work);
 
-	if (test_and_clear_bit(BLK_MQ_S_STOPPED, &hctx->state))
-		__blk_mq_run_hw_queue(hctx);
+	clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+	__blk_mq_run_hw_queue(hctx);
 }
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
--
Jens Axboe