All of lore.kernel.org
 help / color / mirror / Atom feed
* requeue failure with blk-mq-sched
@ 2017-01-19 12:27 Hannes Reinecke
  2017-01-19 14:09 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2017-01-19 12:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig

Hi Jens,

upon further testing with your blk-mq-sched branch I hit a queue stall
during requeing:

[  202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
[  202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
[  202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result:
hostbyte=DID_OK driverbyte=DRIVER_OK
[  202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
[  202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
[  202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus
device reset occurred
[  202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
[  202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440
into mlqueue

... and that is the last ever heard of that device.
The 'device_busy' count remains at '1' and no further commands will be
sent to the device.

Debugging continues.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: requeue failure with blk-mq-sched
  2017-01-19 12:27 requeue failure with blk-mq-sched Hannes Reinecke
@ 2017-01-19 14:09 ` Jens Axboe
  2017-01-19 14:24   ` Hannes Reinecke
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2017-01-19 14:09 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig

On 01/19/2017 04:27 AM, Hannes Reinecke wrote:
> Hi Jens,
> 
> upon further testing with your blk-mq-sched branch I hit a queue stall
> during requeing:
> 
> [  202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
> [  202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
> [  202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result:
> hostbyte=DID_OK driverbyte=DRIVER_OK
> [  202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
> [  202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
> [  202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus
> device reset occurred
> [  202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
> [  202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440
> into mlqueue
> 
> ... and that is the last ever heard of that device.
> The 'device_busy' count remains at '1' and no further commands will be
> sent to the device.

If device_busy is at 1, then it should have a command pending. Where did
you log this - it would be bandy if you attached whatever debug patch
you put in, so we can see where the printks are coming from. If we get a
BUSY with nothing pending, the driver should be ensuring that the queue
gets run again later through blk_mq_delay_queue(), for instance.

When the device is stuck, does it restart if you send it IO?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: requeue failure with blk-mq-sched
  2017-01-19 14:09 ` Jens Axboe
@ 2017-01-19 14:24   ` Hannes Reinecke
  2017-01-19 14:35     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2017-01-19 14:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig

On 01/19/2017 03:09 PM, Jens Axboe wrote:
> On 01/19/2017 04:27 AM, Hannes Reinecke wrote:
>> Hi Jens,
>>
>> upon further testing with your blk-mq-sched branch I hit a queue stall
>> during requeing:
>>
>> [  202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
>> [  202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
>> [  202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result:
>> hostbyte=DID_OK driverbyte=DRIVER_OK
>> [  202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
>> [  202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
>> [  202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus
>> device reset occurred
>> [  202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
>> [  202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440
>> into mlqueue
>>
>> ... and that is the last ever heard of that device.
>> The 'device_busy' count remains at '1' and no further commands will be
>> sent to the device.
> 
> If device_busy is at 1, then it should have a command pending. Where did
> you log this - it would be bandy if you attached whatever debug patch
> you put in, so we can see where the printks are coming from. If we get a
> BUSY with nothing pending, the driver should be ensuring that the queue
> gets run again later through blk_mq_delay_queue(), for instance.
> 
> When the device is stuck, does it restart if you send it IO?
> 
Meanwhile I've found it.

Problem is that scsi_queue_rq() will not stop the queue when hitting a
busy condition before sending commands down to the driver, but still
calls blk_mq_delay_queue():

	switch (ret) {
	case BLK_MQ_RQ_QUEUE_BUSY:
		if (atomic_read(&sdev->device_busy) == 0 &&
		    !scsi_device_blocked(sdev))
			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
		break;

As the queue isn't stopped blk_mq_delay_queue() won't do anything,
so queue_rq() will never be called.
I've send a patch to linux-scsi.

BTW: Is it a hard requirement that the queue has to be stopped when
returning BLK_MQ_RQ_QUEUE_BUSY?
The comments indicate as such, but none of the drivers do so...
Also, blk_mq_delay_queue() is a bit odd, in that it'll only start
stopped hardware queues. I would at least document that the queue has to
be stopped when calling that.
Better still, can't we have blk_mq_delay_queue start the queues
unconditionally?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: requeue failure with blk-mq-sched
  2017-01-19 14:24   ` Hannes Reinecke
@ 2017-01-19 14:35     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2017-01-19 14:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-block, linux-scsi@vger.kernel.org, Christoph Hellwig

On 01/19/2017 06:24 AM, Hannes Reinecke wrote:
> On 01/19/2017 03:09 PM, Jens Axboe wrote:
>> On 01/19/2017 04:27 AM, Hannes Reinecke wrote:
>>> Hi Jens,
>>>
>>> upon further testing with your blk-mq-sched branch I hit a queue stall
>>> during requeing:
>>>
>>> [  202.340959] sd 3:0:4:1: tag#473 Send: scmd 0xffff880422e7a440
>>> [  202.340962] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
>>> [  202.341161] sd 3:0:4:1: tag#473 Done: ADD_TO_MLQUEUE Result:
>>> hostbyte=DID_OK driverbyte=DRIVER_OK
>>> [  202.341164] sd 3:0:4:1: tag#473 CDB: Test Unit Ready 00 00 00 00 00 00
>>> [  202.341167] sd 3:0:4:1: tag#473 Sense Key : Unit Attention [current]
>>> [  202.341171] sd 3:0:4:1: tag#473 Add. Sense: Power on, reset, or bus
>>> device reset occurred
>>> [  202.341173] sd 3:0:4:1: tag#473 scsi host busy 1 failed 0
>>> [  202.341176] sd 3:0:4:1: tag#473 Inserting command ffff880422e7a440
>>> into mlqueue
>>>
>>> ... and that is the last ever heard of that device.
>>> The 'device_busy' count remains at '1' and no further commands will be
>>> sent to the device.
>>
>> If device_busy is at 1, then it should have a command pending. Where did
>> you log this - it would be bandy if you attached whatever debug patch
>> you put in, so we can see where the printks are coming from. If we get a
>> BUSY with nothing pending, the driver should be ensuring that the queue
>> gets run again later through blk_mq_delay_queue(), for instance.
>>
>> When the device is stuck, does it restart if you send it IO?
>>
> Meanwhile I've found it.
> 
> Problem is that scsi_queue_rq() will not stop the queue when hitting a
> busy condition before sending commands down to the driver, but still
> calls blk_mq_delay_queue():
> 
> 	switch (ret) {
> 	case BLK_MQ_RQ_QUEUE_BUSY:
> 		if (atomic_read(&sdev->device_busy) == 0 &&
> 		    !scsi_device_blocked(sdev))
> 			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
> 		break;
> 
> As the queue isn't stopped blk_mq_delay_queue() won't do anything,
> so queue_rq() will never be called.
> I've send a patch to linux-scsi.
> 
> BTW: Is it a hard requirement that the queue has to be stopped when
> returning BLK_MQ_RQ_QUEUE_BUSY?

No, but currently it is apparently a hard requirement that the queue be
stopped when you call delay. Which does make sense, since there's little
point in doing the delay if the queue is run anyway.

> The comments indicate as such, but none of the drivers do so...
> Also, blk_mq_delay_queue() is a bit odd, in that it'll only start
> stopped hardware queues. I would at least document that the queue has to
> be stopped when calling that.
> Better still, can't we have blk_mq_delay_queue start the queues
> unconditionally?

I think so, it doesn't make sense to have blk_mq_delay_queue() NOT stop
the queue, yet its own work handler requires it to be set to actually
run.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index fa1f8619bfe7..739a29208a63 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1161,8 +1161,8 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
 
 	hctx = container_of(work, struct blk_mq_hw_ctx, delay_work.work);
 
-	if (test_and_clear_bit(BLK_MQ_S_STOPPED, &hctx->state))
-		__blk_mq_run_hw_queue(hctx);
+	clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+	__blk_mq_run_hw_queue(hctx);
 }
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-01-19 14:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-19 12:27 requeue failure with blk-mq-sched Hannes Reinecke
2017-01-19 14:09 ` Jens Axboe
2017-01-19 14:24   ` Hannes Reinecke
2017-01-19 14:35     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.