* [PATCH 0/2] blk-mq: fix I/O hang during system resume
@ 2017-08-30 15:19 Ming Lei
  2017-08-30 15:19 ` Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ming Lei @ 2017-08-30 15:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Oleksandr Natalenko, Ming Lei

Hi,

These two patches fix a SCSI I/O hang during system resume.

The cause is that once a SCSI device is put into the SCSI
quiesce state, normal I/O requests can no longer be dispatched
to the LLD (low-level driver); only requests marked RQF_PREEMPT
are allowed to be sent to the drive.

In the current blk-mq implementation, if there are requests on
->dispatch, no new request can be dispatched to the driver.
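
For reference, the quiesce gate in the SCSI prep path looks roughly
like the following (a simplified sketch of the SDEV_QUIESCE handling
in scsi_prep_state_check(); the exact lines may differ in the tree
this series is against):

	case SDEV_QUIESCE:
		/* defer normal commands while the device is quiesced... */
		if (!(req->rq_flags & RQF_PREEMPT))
			ret = BLKPREP_DEFER;
		/* ...so only RQF_PREEMPT requests may reach the drive */
		break;

Requests deferred this way end up parked on hctx->dispatch, and blk-mq
keeps retrying them before pulling anything new, so the RQF_PREEMPT
request that would let resume make progress is never issued.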

These two patches fix the issue reported by Oleksandr.

Thanks,
Ming

Ming Lei (2):
  blk-mq: add requests in the tail of hctx->dispatch
  blk-mq: align to legacy's implementation of blk_execute_rq

 block/blk-core.c     |  2 +-
 block/blk-exec.c     |  2 +-
 block/blk-flush.c    |  2 +-
 block/blk-mq-sched.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/blk-mq-sched.h |  2 ++
 block/blk-mq.c       |  2 +-
 6 files changed, 65 insertions(+), 5 deletions(-)

-- 
2.9.5


* [PATCH 0/2] blk-mq: fix I/O hang during system resume
  2017-08-30 15:19 [PATCH 0/2] blk-mq: fix I/O hang during system resume Ming Lei
@ 2017-08-30 15:19 ` Ming Lei
  2017-08-30 15:19 ` [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch Ming Lei
  2017-08-30 15:19 ` [PATCH 2/2] blk-mq: align to legacy's implementation of blk_execute_rq Ming Lei
  2 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-08-30 15:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Oleksandr Natalenko, Ming Lei

Hi,

These two patches fix an I/O hang of SCSI-MQ during system resume.

The cause is that once a SCSI device is put into the SCSI
quiesce state, normal I/O requests can no longer be dispatched
to the LLD; only requests marked RQF_PREEMPT are allowed to be
sent to the drive.

In the current blk-mq implementation, if there are requests on
->dispatch, no new request can be dispatched to the driver.

These two patches fix the issue reported by Oleksandr.

Thanks,
Ming

Ming Lei (2):
  blk-mq: add requests in the tail of hctx->dispatch
  blk-mq: align to legacy's implementation of blk_execute_rq

 block/blk-core.c     |  2 +-
 block/blk-exec.c     |  2 +-
 block/blk-flush.c    |  2 +-
 block/blk-mq-sched.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/blk-mq-sched.h |  2 ++
 block/blk-mq.c       |  2 +-
 6 files changed, 65 insertions(+), 5 deletions(-)

-- 
2.9.5


* [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
  2017-08-30 15:19 [PATCH 0/2] blk-mq: fix I/O hang during system resume Ming Lei
  2017-08-30 15:19 ` Ming Lei
@ 2017-08-30 15:19 ` Ming Lei
  2017-08-30 15:22   ` Jens Axboe
  2017-08-30 15:19 ` [PATCH 2/2] blk-mq: align to legacy's implementation of blk_execute_rq Ming Lei
  2 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2017-08-30 15:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Oleksandr Natalenko, Ming Lei

It is more reasonable to add requests to ->dispatch in FIFO
order instead of LIFO order.

This also allows a request to be inserted at the front of the
hw queue, which is needed to fix a bug in blk-mq's
implementation of blk_execute_rq().
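
In list.h terms the change only affects which end of hctx->dispatch a
new entry lands on; since the dispatch path consumes the list from its
head, the two calls give LIFO vs. FIFO behaviour (illustrative, not a
new hunk):

	list_add(&rq->queuelist, &hctx->dispatch);	/* head: newest rq is retried first (LIFO) */
	list_add_tail(&rq->queuelist, &hctx->dispatch);	/* tail: oldest rq is retried first (FIFO) */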

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 2 +-
 block/blk-mq.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..8d97df40fc28 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	 * the dispatch list.
 	 */
 	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
+	list_add_tail(&rq->queuelist, &hctx->dispatch);
 	spin_unlock(&hctx->lock);
 	return true;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4603b115e234..fed3d0c16266 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 		blk_mq_put_driver_tag(rq);
 
 		spin_lock(&hctx->lock);
-		list_splice_init(list, &hctx->dispatch);
+		list_splice_tail_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
 
 		/*
-- 
2.9.5


* [PATCH 2/2] blk-mq: align to legacy's implementation of blk_execute_rq
  2017-08-30 15:19 [PATCH 0/2] blk-mq: fix I/O hang during system resume Ming Lei
  2017-08-30 15:19 ` Ming Lei
  2017-08-30 15:19 ` [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch Ming Lei
@ 2017-08-30 15:19 ` Ming Lei
  2 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-08-30 15:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Oleksandr Natalenko, Ming Lei

In the legacy path, when a request is executed via blk_execute_rq(),
it is added directly to the front of q->queue_head, and the I/O
scheduler's queue is bypassed because neither merging nor sorting
is needed.

When a SCSI device is put into the quiesce state, such as during
system suspend, we need to add the RQF_PM request to the front
of the queue.

This patch fixes the I/O hang after system resume by making
blk-mq follow the legacy path's behaviour.
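
For reference, the legacy branch of blk_execute_rq_nowait() does
roughly the following (an abridged sketch; exact lines may differ in
the tree this series is against):

	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
	...
	spin_lock_irq(q->queue_lock);
	__elv_add_request(q, rq, where);	/* INSERT_FRONT ends up as a
						   list_add() onto q->queue_head */
	__blk_run_queue(q);
	spin_unlock_irq(q->queue_lock);

blk_mq_sched_insert_request_bypass() below plays the same role for
blk-mq, using hctx->dispatch in place of q->queue_head.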

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c     |  2 +-
 block/blk-exec.c     |  2 +-
 block/blk-flush.c    |  2 +-
 block/blk-mq-sched.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-mq-sched.h |  2 ++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..fb75bc646ebc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2330,7 +2330,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	if (q->mq_ops) {
 		if (blk_queue_io_stat(q))
 			blk_account_io_start(rq, true);
-		blk_mq_sched_insert_request(rq, false, true, false, false);
+		blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 5c0f3dc446dc..4565aa6bb624 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -61,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	 * be reused after dying flag is set
 	 */
 	if (q->mq_ops) {
-		blk_mq_sched_insert_request(rq, at_head, true, false, false);
+		blk_mq_sched_insert_request_bypass(rq, at_head, true, false, false);
 		return;
 	}
 
diff --git a/block/blk-flush.c b/block/blk-flush.c
index ed5fe322abba..51e89e5c525a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, true, false, false);
+			blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8d97df40fc28..b40dd063d61f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -354,6 +354,64 @@ static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
 		blk_mq_add_to_requeue_list(rq, false, true);
 }
 
+static void blk_mq_flush_hctx(struct blk_mq_hw_ctx *hctx,
+			      struct elevator_queue *e,
+			      const bool has_sched_dispatch,
+			      struct list_head *rqs)
+{
+	LIST_HEAD(list);
+
+	if (!has_sched_dispatch)
+		blk_mq_flush_busy_ctxs(hctx, &list);
+	else {
+		while (true) {
+			struct request *rq;
+
+			rq = e->type->ops.mq.dispatch_request(hctx);
+			if (!rq)
+				break;
+			list_add_tail(&rq->queuelist, &list);
+		}
+	}
+
+	list_splice_tail(&list, rqs);
+}
+
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+					bool run_queue, bool async,
+					bool can_block)
+{
+	struct request_queue *q = rq->q;
+	struct elevator_queue *e = q->elevator;
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+	LIST_HEAD(list);
+	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
+
+	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+		blk_mq_sched_insert_flush(hctx, rq, can_block);
+		return;
+	}
+
+	if (at_head)
+		list_add_tail(&rq->queuelist, &list);
+	else {
+		blk_mq_flush_hctx(hctx, e, has_sched_dispatch, &list);
+		list_add_tail(&rq->queuelist, &list);
+		run_queue = true;
+	}
+
+	spin_lock(&hctx->lock);
+	if (at_head)
+		list_splice(&list, &hctx->dispatch);
+	else
+		list_splice_tail(&list, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, async);
+}
+
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..4d01697a627f 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -18,6 +18,8 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block);
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+					bool run_queue, bool async, bool can_block);
 void blk_mq_sched_insert_requests(struct request_queue *q,
 				  struct blk_mq_ctx *ctx,
 				  struct list_head *list, bool run_queue_async);
-- 
2.9.5


* Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
  2017-08-30 15:19 ` [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch Ming Lei
@ 2017-08-30 15:22   ` Jens Axboe
  2017-08-30 15:39     ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-08-30 15:22 UTC (permalink / raw)
  To: Ming Lei, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Oleksandr Natalenko

On 08/30/2017 09:19 AM, Ming Lei wrote:
> It is more reasonable to add requests to ->dispatch in FIFO
> order instead of LIFO order.
> 
> This also allows a request to be inserted at the front of the
> hw queue, which is needed to fix a bug in blk-mq's
> implementation of blk_execute_rq().
> 
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c | 2 +-
>  block/blk-mq.c       | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 4ab69435708c..8d97df40fc28 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>  	 * the dispatch list.
>  	 */
>  	spin_lock(&hctx->lock);
> -	list_add(&rq->queuelist, &hctx->dispatch);
> +	list_add_tail(&rq->queuelist, &hctx->dispatch);
>  	spin_unlock(&hctx->lock);
>  	return true;
>  }
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4603b115e234..fed3d0c16266 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
>  		blk_mq_put_driver_tag(rq);
>  
>  		spin_lock(&hctx->lock);
> -		list_splice_init(list, &hctx->dispatch);
> +		list_splice_tail_init(list, &hctx->dispatch);
>  		spin_unlock(&hctx->lock);

I'm not convinced this is safe, there's actually a reason why the
request is added to the front and not the back. We do have
reorder_tags_to_front() as a safe guard, but I'd much rather get rid of
that than make this change.

What's your reasoning here? Your changelog doesn't really explain why
this fixes anything, it's very vague.

-- 
Jens Axboe


* Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
  2017-08-30 15:22   ` Jens Axboe
@ 2017-08-30 15:39     ` Ming Lei
  2017-08-30 15:51       ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2017-08-30 15:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Oleksandr Natalenko

On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote:
> On 08/30/2017 09:19 AM, Ming Lei wrote:
> > It is more reasonable to add requests to ->dispatch in FIFO
> > order instead of LIFO order.
> > 
> > This also allows a request to be inserted at the front of the
> > hw queue, which is needed to fix a bug in blk-mq's
> > implementation of blk_execute_rq().
> > 
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-sched.c | 2 +-
> >  block/blk-mq.c       | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 4ab69435708c..8d97df40fc28 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >  	 * the dispatch list.
> >  	 */
> >  	spin_lock(&hctx->lock);
> > -	list_add(&rq->queuelist, &hctx->dispatch);
> > +	list_add_tail(&rq->queuelist, &hctx->dispatch);
> >  	spin_unlock(&hctx->lock);
> >  	return true;
> >  }
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 4603b115e234..fed3d0c16266 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
> >  		blk_mq_put_driver_tag(rq);
> >  
> >  		spin_lock(&hctx->lock);
> > -		list_splice_init(list, &hctx->dispatch);
> > +		list_splice_tail_init(list, &hctx->dispatch);
> >  		spin_unlock(&hctx->lock);
> 
> I'm not convinced this is safe, there's actually a reason why the
> request is added to the front and not the back. We do have
> reorder_tags_to_front() as a safe guard, but I'd much rather get rid of

reorder_tags_to_front() reorders the requests in the current list,
while this patch splices the list into hctx->dispatch, so I can't see
why it isn't safe. Could you explain a bit?

> that than make this change.
> 
> What's your reasoning here? Your changelog doesn't really explain why

Firstly, the 2nd patch needs to add one rq (such as an RQF_PM
request) to the front of the hw queue, and the simple way is to
add it to the front of hctx->dispatch. Without this change, the
2nd patch can't work at all.

Secondly, this ordering is still reasonable:

	- one rq is added to hctx->dispatch because the queue is busy
	- another rq is added to hctx->dispatch for the same reason

so it is reasonable to add the list to hctx->dispatch in FIFO order.

Finally, my patchset for 'improving SCSI-MQ perf' will change blk-mq
to not dequeue requests from the sw/scheduler queues while ->dispatch
isn't flushed. I believe that is the reasonable and correct thing to
do; with that change, there won't be any difference between the two
orderings.
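
A rough sketch of that idea (not part of this series, and the helper
name below is only illustrative):

	LIST_HEAD(rq_list);

	if (!list_empty_careful(&hctx->dispatch)) {
		spin_lock(&hctx->lock);
		list_splice_init(&hctx->dispatch, &rq_list);
		spin_unlock(&hctx->lock);
	}

	if (!list_empty(&rq_list)) {
		/* retry the parked requests first; leftovers go back to ->dispatch */
		blk_mq_dispatch_rq_list(q, &rq_list);
	} else {
		/* only an empty ->dispatch lets sw/scheduler requests be dequeued */
		dispatch_from_sw_or_scheduler(hctx);	/* illustrative name */
	}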


-- 
Ming


* Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
  2017-08-30 15:39     ` Ming Lei
@ 2017-08-30 15:51       ` Jens Axboe
  2017-08-30 16:58         ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-08-30 15:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Oleksandr Natalenko

On 08/30/2017 09:39 AM, Ming Lei wrote:
> On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote:
>> On 08/30/2017 09:19 AM, Ming Lei wrote:
>>> It is more reasonable to add requests to ->dispatch in FIFO
>>> order instead of LIFO order.
>>>
>>> This also allows a request to be inserted at the front of the
>>> hw queue, which is needed to fix a bug in blk-mq's
>>> implementation of blk_execute_rq().
>>>
>>> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>>> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>  block/blk-mq-sched.c | 2 +-
>>>  block/blk-mq.c       | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index 4ab69435708c..8d97df40fc28 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>>>  	 * the dispatch list.
>>>  	 */
>>>  	spin_lock(&hctx->lock);
>>> -	list_add(&rq->queuelist, &hctx->dispatch);
>>> +	list_add_tail(&rq->queuelist, &hctx->dispatch);
>>>  	spin_unlock(&hctx->lock);
>>>  	return true;
>>>  }
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 4603b115e234..fed3d0c16266 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
>>>  		blk_mq_put_driver_tag(rq);
>>>  
>>>  		spin_lock(&hctx->lock);
>>> -		list_splice_init(list, &hctx->dispatch);
>>> +		list_splice_tail_init(list, &hctx->dispatch);
>>>  		spin_unlock(&hctx->lock);
>>
>> I'm not convinced this is safe, there's actually a reason why the
>> request is added to the front and not the back. We do have
>> reorder_tags_to_front() as a safe guard, but I'd much rather get rid of
> 
> reorder_tags_to_front() reorders the requests in the current list,
> while this patch splices the list into hctx->dispatch, so I can't see
> why it isn't safe. Could you explain a bit?

If we can get the ordering right, then down the line we won't need to
have the tags reordering at all. It's an ugly hack that I'd love to see
go away.

>> that than make this change.
>>
>> What's your reasoning here? Your changelog doesn't really explain why
> 
> Firstly, the 2nd patch needs to add one rq (such as an RQF_PM
> request) to the front of the hw queue, and the simple way is to
> add it to the front of hctx->dispatch. Without this change, the
> 2nd patch can't work at all.
> 
> Secondly, this ordering is still reasonable:
> 
> 	- one rq is added to hctx->dispatch because the queue is busy
> 	- another rq is added to hctx->dispatch for the same reason
>
> so it is reasonable to add the list to hctx->dispatch in FIFO order.

Not disagreeing with the logic. But it also begs the question of why we
don't apply the same treatment to when we splice leftovers to the
dispatch list, currently we front splice that.

All I'm saying is that you need to tread very carefully with this, and
throw it through some careful testing to ensure that we don't introduce
conditions that now livelock. NVMe is the easy test case, that will
generally always work since we never run out of tags. The problematic
test case is usually things like SATA with 31 tags, and especially SATA
with flushes that don't queue. One good test case is the one where you
end up having all tags (or almost all) consumed by flushes, and still
ensuring that we're making forward progress.

-- 
Jens Axboe


* Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
  2017-08-30 15:51       ` Jens Axboe
@ 2017-08-30 16:58         ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2017-08-30 16:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Oleksandr Natalenko

On Wed, Aug 30, 2017 at 09:51:31AM -0600, Jens Axboe wrote:
> On 08/30/2017 09:39 AM, Ming Lei wrote:
> > On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote:
> >> On 08/30/2017 09:19 AM, Ming Lei wrote:
> >>> It is more reasonable to add requests to ->dispatch in FIFO
> >>> order instead of LIFO order.
> >>>
> >>> This also allows a request to be inserted at the front of the
> >>> hw queue, which is needed to fix a bug in blk-mq's
> >>> implementation of blk_execute_rq().
> >>>
> >>> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> >>> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>> ---
> >>>  block/blk-mq-sched.c | 2 +-
> >>>  block/blk-mq.c       | 2 +-
> >>>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> >>> index 4ab69435708c..8d97df40fc28 100644
> >>> --- a/block/blk-mq-sched.c
> >>> +++ b/block/blk-mq-sched.c
> >>> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >>>  	 * the dispatch list.
> >>>  	 */
> >>>  	spin_lock(&hctx->lock);
> >>> -	list_add(&rq->queuelist, &hctx->dispatch);
> >>> +	list_add_tail(&rq->queuelist, &hctx->dispatch);
> >>>  	spin_unlock(&hctx->lock);
> >>>  	return true;
> >>>  }
> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >>> index 4603b115e234..fed3d0c16266 100644
> >>> --- a/block/blk-mq.c
> >>> +++ b/block/blk-mq.c
> >>> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
> >>>  		blk_mq_put_driver_tag(rq);
> >>>  
> >>>  		spin_lock(&hctx->lock);
> >>> -		list_splice_init(list, &hctx->dispatch);
> >>> +		list_splice_tail_init(list, &hctx->dispatch);
> >>>  		spin_unlock(&hctx->lock);
> >>
> >> I'm not convinced this is safe, there's actually a reason why the
> >> request is added to the front and not the back. We do have
> >> reorder_tags_to_front() as a safe guard, but I'd much rather get rid of
> > 
> > reorder_tags_to_front() reorders the requests in the current list,
> > while this patch splices the list into hctx->dispatch, so I can't see
> > why it isn't safe. Could you explain a bit?
> 
> If we can get the ordering right, then down the line we won't need to
> have the tags reordering at all. It's an ugly hack that I'd love to see
> go away.

As long as reorder_tags_to_front() isn't removed, this patch is
still safe.

But blk_execute_rq_nowait() needs to add one request to the front
of the hw queue, and that can conflict with maintaining a perfect
order so that reorder_tags_to_front() can be removed.

So could you share your opinion on the 2nd patch, which fixes
blk_execute_rq_nowait()?

> 
> >> that than make this change.
> >>
> >> What's your reasoning here? Your changelog doesn't really explain why
> > 
> > Firstly, the 2nd patch needs to add one rq (such as an RQF_PM
> > request) to the front of the hw queue, and the simple way is to
> > add it to the front of hctx->dispatch. Without this change, the
> > 2nd patch can't work at all.
> > 
> > Secondly, this ordering is still reasonable:
> > 
> > 	- one rq is added to hctx->dispatch because the queue is busy
> > 	- another rq is added to hctx->dispatch for the same reason
> >
> > so it is reasonable to add the list to hctx->dispatch in FIFO order.
> 
> Not disagreeing with the logic. But it also begs the question of why we
> don't apply the same treatment to when we splice leftovers to the
> dispatch list, currently we front splice that.
> 
> All I'm saying is that you need to tread very carefully with this, and
> throw it through some careful testing to ensure that we don't introduce
> conditions that now livelock. NVMe is the easy test case, that will

Yes, ->dispatch is rarely involved for NVMe, but it is a close friend
of SCSI-MQ.

> generally always work since we never run out of tags. The problematic
> test case is usually things like SATA with 31 tags, and especially SATA
> with flushes that don't queue. One good test case is the one where you
> end up having all tags (or almost all) consumed by flushes, and still
> ensuring that we're making forward progress.

Understood!

We can even make the test more aggressive.

I just set up one virtio-scsi device with both 'can_queue' and
'cmd_per_lun' changed to 1, which means the hw queue depth is 1
and there is a single hw queue.

Then I ran 'dbench -t 30 -s -F 64' on an ext4 filesystem on top
of this virtio-scsi device.

The dbench run (64 jobs, sync writes, fsync) works just fine with
this patch applied.


-- 
Ming


end of thread, other threads:[~2017-08-30 16:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2017-08-30 15:19 [PATCH 0/2] blk-mq: fix I/O hang during system resume Ming Lei
2017-08-30 15:19 ` Ming Lei
2017-08-30 15:19 ` [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch Ming Lei
2017-08-30 15:22   ` Jens Axboe
2017-08-30 15:39     ` Ming Lei
2017-08-30 15:51       ` Jens Axboe
2017-08-30 16:58         ` Ming Lei
2017-08-30 15:19 ` [PATCH 2/2] blk-mq: align to legacy's implementation of blk_execute_rq Ming Lei
