linux-block.vger.kernel.org archive mirror
* [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
@ 2019-02-12  1:56 Jianchao Wang
  2019-02-12  2:51 ` Jens Axboe
  2019-02-15  2:00 ` Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Jianchao Wang @ 2019-02-12  1:56 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel

When a request is requeued with RQF_DONTPREP set, it already contains
driver-specific data, so insert it into the hctx dispatch list to avoid
any merge. Taking SCSI as an example, here is the trace event log (captured
with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):

   kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
   kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
   kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
   kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
the sdb covered only the (32768 + 8) part, so only that part was
completed. Luckily, scsi_io_completion detected this and requeued the
remaining part, so no data was corrupted. However, the requeue of
(32776 + 8) is unexpected.
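
The sequence in the trace can be replayed with a small user-space model.
This is only a sketch under stated assumptions, not kernel code: struct
model_rq and every model_* function are hypothetical names, and the prep
state is reduced to a single captured length.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_rq {
	unsigned int sector;          /* start sector of the request */
	unsigned int nr_sectors;      /* current length; grows on a back merge */
	unsigned int prepped_sectors; /* length the driver captured at prep time */
	int dontprep;                 /* stands in for RQF_DONTPREP */
};

/* Driver prep: capture the length once, skip if already prepared. */
static void model_prep(struct model_rq *rq)
{
	if (rq->dontprep)
		return;
	rq->prepped_sectors = rq->nr_sectors;
	rq->dontprep = 1;
}

/* Back merge a contiguous bio; returns 1 on success, 0 otherwise. */
static int model_back_merge(struct model_rq *rq, unsigned int sector,
			    unsigned int nr)
{
	if (sector != rq->sector + rq->nr_sectors)
		return 0;
	rq->nr_sectors += nr;
	return 1;
}

/*
 * Issue and complete: only the prepped part is transferred. The remainder
 * is treated as a fresh request (prep state dropped). Returns the number
 * of sectors still outstanding.
 */
static unsigned int model_issue_and_complete(struct model_rq *rq)
{
	unsigned int done;

	model_prep(rq);
	done = rq->prepped_sectors;
	assert(done <= rq->nr_sectors);
	rq->sector += done;
	rq->nr_sectors -= done;
	rq->prepped_sectors = 0;
	rq->dontprep = 0;
	return rq->nr_sectors;
}

/* Replays the sequence from the trace: prep, merge, partial completion. */
static void model_replay_trace(void)
{
	struct model_rq rq = { .sector = 32768, .nr_sectors = 8 };
	unsigned int left;
	int merged;

	model_prep(&rq);                         /* first dispatch preps 8 sectors */
	merged = model_back_merge(&rq, 32776, 8); /* merged while requeued */
	assert(merged == 1);
	assert(rq.nr_sectors == 16);             /* issued as 32768 + 16 */
	left = model_issue_and_complete(&rq);
	assert(left == 8);                       /* only 32768 + 8 completed */
	assert(rq.sector == 32776);              /* leftover is 32776 + 8 */
	left = model_issue_and_complete(&rq);
	assert(left == 0);
}
```

Calling model_replay_trace() walks through the same steps as the log: the
merged request is issued with 16 sectors, 8 complete, and the remaining 8
have to go through a second issue.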

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
V2:
 - refactor the code based on Jens' suggestion

 block/blk-mq.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f5b533..9437a5e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	spin_unlock_irq(&q->requeue_lock);
 
 	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
-		if (!(rq->rq_flags & RQF_SOFTBARRIER))
+		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
 			continue;
 
 		rq->rq_flags &= ~RQF_SOFTBARRIER;
 		list_del_init(&rq->queuelist);
-		blk_mq_sched_insert_request(rq, true, false, false);
+		/*
+		 * If RQF_DONTPREP, rq has contained some driver specific
+		 * data, so insert it to hctx dispatch list to avoid any
+		 * merge.
+		 */
+		if (rq->rq_flags & RQF_DONTPREP)
+			blk_mq_request_bypass_insert(rq, false);
+		else
+			blk_mq_sched_insert_request(rq, true, false, false);
 	}
 
 	while (!list_empty(&rq_list)) {
-- 
2.7.4



* Re: [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  2019-02-12  1:56 [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue Jianchao Wang
@ 2019-02-12  2:51 ` Jens Axboe
  2019-02-15  2:00 ` Ming Lei
  1 sibling, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2019-02-12  2:51 UTC (permalink / raw)
  To: Jianchao Wang; +Cc: linux-block, linux-kernel

On 2/11/19 6:56 PM, Jianchao Wang wrote:
> When a request is requeued with RQF_DONTPREP set, it already contains
> driver-specific data, so insert it into the hctx dispatch list to avoid
> any merge. Taking SCSI as an example, here is the trace event log (captured
> with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):
> 
>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> 
> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
> the sdb covered only the (32768 + 8) part, so only that part was
> completed. Luckily, scsi_io_completion detected this and requeued the
> remaining part, so no data was corrupted. However, the requeue of
> (32776 + 8) is unexpected.

Looks good to me, I'll add this for 5.0.

-- 
Jens Axboe



* Re: [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  2019-02-12  1:56 [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue Jianchao Wang
  2019-02-12  2:51 ` Jens Axboe
@ 2019-02-15  2:00 ` Ming Lei
  2019-02-15  2:34   ` jianchao.wang
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-02-15  2:00 UTC (permalink / raw)
  To: Jianchao Wang; +Cc: axboe, linux-block, linux-kernel, Damien Le Moal

On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
> When a request is requeued with RQF_DONTPREP set, it already contains
> driver-specific data, so insert it into the hctx dispatch list to avoid
> any merge. Taking SCSI as an example, here is the trace event log (captured
> with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):
> 
>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> 
> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.

scsi_mq_requeue_cmd() does uninit the request before requeuing, but
__scsi_queue_insert doesn't do that.


> Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
> the sdb covered only the (32768 + 8) part, so only that part was
> completed. Luckily, scsi_io_completion detected this and requeued the
> remaining part, so no data was corrupted. However, the requeue of
> (32776 + 8) is unexpected.
> 
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
> ---
> V2:
>  - refactor the code based on Jens' suggestion
> 
>  block/blk-mq.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8f5b533..9437a5e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	spin_unlock_irq(&q->requeue_lock);
>  
>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>  			continue;
>  
>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>  		list_del_init(&rq->queuelist);
> -		blk_mq_sched_insert_request(rq, true, false, false);
> +		/*
> +		 * If RQF_DONTPREP, rq has contained some driver specific
> +		 * data, so insert it to hctx dispatch list to avoid any
> +		 * merge.
> +		 */
> +		if (rq->rq_flags & RQF_DONTPREP)
> +			blk_mq_request_bypass_insert(rq, false);
> +		else
> +			blk_mq_sched_insert_request(rq, true, false, false);
>  	}

Suppose it is a WRITE request to a zoned device; this approach might
break the ordering.


Thanks,
Ming


* Re: [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  2019-02-15  2:00 ` Ming Lei
@ 2019-02-15  2:34   ` jianchao.wang
  2019-02-15  3:14     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: jianchao.wang @ 2019-02-15  2:34 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, Damien Le Moal

Hi Ming

Thanks for your kind response.

On 2/15/19 10:00 AM, Ming Lei wrote:
> On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
>> When a request is requeued with RQF_DONTPREP set, it already contains
>> driver-specific data, so insert it into the hctx dispatch list to avoid
>> any merge. Taking SCSI as an example, here is the trace event log (captured
>> with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):
>>
>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>
>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> 
> scsi_mq_requeue_cmd() does uninit the request before requeuing, but
> __scsi_queue_insert doesn't do that.

Yes.
The SCSI layer uses both of them.

> 
> 
>> Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
>> the sdb covered only the (32768 + 8) part, so only that part was
>> completed. Luckily, scsi_io_completion detected this and requeued the
>> remaining part, so no data was corrupted. However, the requeue of
>> (32776 + 8) is unexpected.
>>
>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
>> ---
>> V2:
>>  - refactor the code based on Jens' suggestion
>>
>>  block/blk-mq.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 8f5b533..9437a5e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>>  	spin_unlock_irq(&q->requeue_lock);
>>  
>>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
>> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
>> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>>  			continue;
>>  
>>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>>  		list_del_init(&rq->queuelist);
>> -		blk_mq_sched_insert_request(rq, true, false, false);
>> +		/*
>> +		 * If RQF_DONTPREP, rq has contained some driver specific
>> +		 * data, so insert it to hctx dispatch list to avoid any
>> +		 * merge.
>> +		 */
>> +		if (rq->rq_flags & RQF_DONTPREP)
>> +			blk_mq_request_bypass_insert(rq, false);
>> +		else
>> +			blk_mq_sched_insert_request(rq, true, false, false);
>>  	}
> 
> Suppose it is a WRITE request to a zoned device; this approach might
> break the ordering.

I'm not sure about this.
Since the request has been dispatched, it should hold the zone write lock.
Also, mq-deadline doesn't have a .requeue_request, so the zone write lock
wouldn't be released during requeue.

IMO, this requeue action is similar to what blk_mq_dispatch_rq_list does.
The latter also issues requests to the underlying driver and requeues them
on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.

In addition, RQF_STARTED is set by the I/O scheduler's .dispatch_request, and
it also stops merging, since RQF_NOMERGE_FLAGS contains it.
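
To make the last point concrete, here is a minimal user-space sketch of a
mergeability check. It is an illustration only: the MODEL_* names and bit
values are hypothetical, and the mask models just the RQF_STARTED member
mentioned above (the real RQF_NOMERGE_FLAGS has more members).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the bit values do not match the kernel's. */
#define MODEL_RQF_STARTED  (1u << 0)
#define MODEL_RQF_DONTPREP (1u << 1)

/* Only the member named in this thread is modelled; the real mask has more. */
#define MODEL_RQF_NOMERGE_FLAGS (MODEL_RQF_STARTED)

/* A request may be merged only if none of the no-merge flags is set. */
static bool model_rq_mergeable(unsigned int rq_flags)
{
	return !(rq_flags & MODEL_RQF_NOMERGE_FLAGS);
}

/* Stands in for a scheduler's .dispatch_request marking the request. */
static unsigned int model_sched_dispatch(unsigned int rq_flags)
{
	return rq_flags | MODEL_RQF_STARTED;
}

/* Compares the no-scheduler case with the scheduler case. */
static void model_check_merge_rules(void)
{
	unsigned int no_sched = MODEL_RQF_DONTPREP;
	unsigned int with_sched = model_sched_dispatch(MODEL_RQF_DONTPREP);

	assert(model_rq_mergeable(no_sched));    /* the case seen in the trace */
	assert(!model_rq_mergeable(with_sched)); /* protected by RQF_STARTED */
}
```

In this model a request carrying only the DONTPREP bit is still mergeable,
which is the no-scheduler case from the trace; once the scheduler path has
set the STARTED bit, the merge is refused.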

Thanks
Jianchao


* Re: [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  2019-02-15  2:34   ` jianchao.wang
@ 2019-02-15  3:14     ` Ming Lei
  2019-02-15  3:41       ` jianchao.wang
  0 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-02-15  3:14 UTC (permalink / raw)
  To: jianchao.wang; +Cc: axboe, linux-block, linux-kernel, Damien Le Moal

On Fri, Feb 15, 2019 at 10:34:39AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> Thanks for your kind response.
> 
> On 2/15/19 10:00 AM, Ming Lei wrote:
> > On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
> >> When a request is requeued with RQF_DONTPREP set, it already contains
> >> driver-specific data, so insert it into the hctx dispatch list to avoid
> >> any merge. Taking SCSI as an example, here is the trace event log (captured
> >> with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):
> >>
> >>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
> >> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
> >> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
> >>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
> >> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
> >> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
> >>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> >>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
> >> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
> >>
> >> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
> > 
> > scsi_mq_requeue_cmd() does uninit the request before requeuing, but
> > __scsi_queue_insert doesn't do that.
> 
> Yes.
> The SCSI layer uses both of them.
> 
> > 
> > 
> >> Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
> >> the sdb covered only the (32768 + 8) part, so only that part was
> >> completed. Luckily, scsi_io_completion detected this and requeued the
> >> remaining part, so no data was corrupted. However, the requeue of
> >> (32776 + 8) is unexpected.
> >>
> >> Suggested-by: Jens Axboe <axboe@kernel.dk>
> >> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
> >> ---
> >> V2:
> >>  - refactor the code based on Jens' suggestion
> >>
> >>  block/blk-mq.c | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 8f5b533..9437a5e 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
> >>  	spin_unlock_irq(&q->requeue_lock);
> >>  
> >>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> >> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
> >> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
> >>  			continue;
> >>  
> >>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
> >>  		list_del_init(&rq->queuelist);
> >> -		blk_mq_sched_insert_request(rq, true, false, false);
> >> +		/*
> >> +		 * If RQF_DONTPREP, rq has contained some driver specific
> >> +		 * data, so insert it to hctx dispatch list to avoid any
> >> +		 * merge.
> >> +		 */
> >> +		if (rq->rq_flags & RQF_DONTPREP)
> >> +			blk_mq_request_bypass_insert(rq, false);
> >> +		else
> >> +			blk_mq_sched_insert_request(rq, true, false, false);
> >>  	}
> > 
> > Suppose it is a WRITE request to a zoned device; this approach might
> > break the ordering.
> 
> I'm not sure about this.
> Since the request has been dispatched, it should hold the zone write lock.
> Also, mq-deadline doesn't have a .requeue_request, so the zone write lock
> wouldn't be released during requeue.

You are right; it looks like I misunderstood the zone write lock. Sorry for
the noise.

> 
> IMO, this requeue action is similar to what blk_mq_dispatch_rq_list does.
> The latter also issues requests to the underlying driver and requeues them
> on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
> 
> In addition, RQF_STARTED is set by the I/O scheduler's .dispatch_request, and
> it also stops merging, since RQF_NOMERGE_FLAGS contains it.

Yes, that is correct.

Then another question is:

Why not always requeue requests this way, so that it can be simplified
into one code path?

1) In the legacy block code, blk_requeue_request() doesn't insert the
request into the scheduler queue; it simply puts the request on
q->queue_head.

2) blk_mq_requeue_request() is basically run from completion context to
handle very unusual cases (partial completion, error, timeout, ...),
and there should be no benefit in scheduling/merging a requeued request.

3) RQF_DONTPREP is like a driver-private flag, and before this patch it
was read and written only by drivers.
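
As a rough illustration of the question, the two policies can be put side by
side in a user-space sketch. This is not a proposed patch: the MODEL_* names,
bit values, and the enum are hypothetical, and only the first loop of
blk_mq_requeue_work() from the diff above is modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins; the bit values do not match the kernel's. */
#define MODEL_RQF_SOFTBARRIER (1u << 0)
#define MODEL_RQF_DONTPREP    (1u << 1)

enum model_target {
	MODEL_LEFT_ON_LIST,  /* skipped by the first loop, stays on rq_list */
	MODEL_SCHED_INSERT,  /* the blk_mq_sched_insert_request() branch */
	MODEL_BYPASS_INSERT, /* the blk_mq_request_bypass_insert() branch */
};

/* Decision taken by the first loop of blk_mq_requeue_work() with the patch. */
static enum model_target model_patched_target(unsigned int rq_flags)
{
	if (!(rq_flags & (MODEL_RQF_SOFTBARRIER | MODEL_RQF_DONTPREP)))
		return MODEL_LEFT_ON_LIST;
	if (rq_flags & MODEL_RQF_DONTPREP)
		return MODEL_BYPASS_INSERT;
	return MODEL_SCHED_INSERT;
}

/* The single code path asked about: every requeued request bypasses. */
static enum model_target model_unified_target(unsigned int rq_flags)
{
	(void)rq_flags; /* the flags no longer influence the decision */
	return MODEL_BYPASS_INSERT;
}

/* Shows where the two policies agree and where they differ. */
static void model_compare_targets(void)
{
	unsigned int both = MODEL_RQF_SOFTBARRIER | MODEL_RQF_DONTPREP;

	assert(model_patched_target(0u) == MODEL_LEFT_ON_LIST);
	assert(model_patched_target(MODEL_RQF_SOFTBARRIER) == MODEL_SCHED_INSERT);
	assert(model_patched_target(both) == MODEL_BYPASS_INSERT);
	assert(model_unified_target(0u) == MODEL_BYPASS_INSERT);
}
```

The patched policy has three outcomes depending on the flags, while the
unified one has a single outcome and no longer needs to look at
RQF_DONTPREP at all.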

Thanks,
Ming


* Re: [PATCH V2] blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  2019-02-15  3:14     ` Ming Lei
@ 2019-02-15  3:41       ` jianchao.wang
  0 siblings, 0 replies; 6+ messages in thread
From: jianchao.wang @ 2019-02-15  3:41 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, Damien Le Moal



On 2/15/19 11:14 AM, Ming Lei wrote:
> On Fri, Feb 15, 2019 at 10:34:39AM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> Thanks for your kind response.
>>
>> On 2/15/19 10:00 AM, Ming Lei wrote:
>>> On Tue, Feb 12, 2019 at 09:56:25AM +0800, Jianchao Wang wrote:
>>>> When a request is requeued with RQF_DONTPREP set, it already contains
>>>> driver-specific data, so insert it into the hctx dispatch list to avoid
>>>> any merge. Taking SCSI as an example, here is the trace event log (captured
>>>> with no I/O scheduler, because RQF_STARTED would otherwise prevent merging):
>>>>
>>>>    kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
>>>> scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
>>>>    kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
>>>> scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
>>>> scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
>>>>    kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>>    kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
>>>> scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
>>>>
>>>> (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
>>>
>>> scsi_mq_requeue_cmd() does uninit the request before requeuing, but
>>> __scsi_queue_insert doesn't do that.
>>
>> Yes.
>> The SCSI layer uses both of them.
>>
>>>
>>>
>>>> Then it was merged with (32776 + 8) and issued. Because of RQF_DONTPREP,
>>>> the sdb covered only the (32768 + 8) part, so only that part was
>>>> completed. Luckily, scsi_io_completion detected this and requeued the
>>>> remaining part, so no data was corrupted. However, the requeue of
>>>> (32776 + 8) is unexpected.
>>>>
>>>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>>>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
>>>> ---
>>>> V2:
>>>>  - refactor the code based on Jens' suggestion
>>>>
>>>>  block/blk-mq.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 8f5b533..9437a5e 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -737,12 +737,20 @@ static void blk_mq_requeue_work(struct work_struct *work)
>>>>  	spin_unlock_irq(&q->requeue_lock);
>>>>  
>>>>  	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
>>>> -		if (!(rq->rq_flags & RQF_SOFTBARRIER))
>>>> +		if (!(rq->rq_flags & (RQF_SOFTBARRIER | RQF_DONTPREP)))
>>>>  			continue;
>>>>  
>>>>  		rq->rq_flags &= ~RQF_SOFTBARRIER;
>>>>  		list_del_init(&rq->queuelist);
>>>> -		blk_mq_sched_insert_request(rq, true, false, false);
>>>> +		/*
>>>> +		 * If RQF_DONTPREP, rq has contained some driver specific
>>>> +		 * data, so insert it to hctx dispatch list to avoid any
>>>> +		 * merge.
>>>> +		 */
>>>> +		if (rq->rq_flags & RQF_DONTPREP)
>>>> +			blk_mq_request_bypass_insert(rq, false);
>>>> +		else
>>>> +			blk_mq_sched_insert_request(rq, true, false, false);
>>>>  	}
>>>
>>> Suppose it is a WRITE request to a zoned device; this approach might
>>> break the ordering.
>>
>> I'm not sure about this.
>> Since the request has been dispatched, it should hold the zone write lock.
>> Also, mq-deadline doesn't have a .requeue_request, so the zone write lock
>> wouldn't be released during requeue.
> 
> You are right; it looks like I misunderstood the zone write lock. Sorry for
> the noise.
> 
>>
>> IMO, this requeue action is similar to what blk_mq_dispatch_rq_list does.
>> The latter also issues requests to the underlying driver and requeues them
>> on the dispatch list if it gets BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
>>
>> In addition, RQF_STARTED is set by the I/O scheduler's .dispatch_request, and
>> it also stops merging, since RQF_NOMERGE_FLAGS contains it.
> 
> Yes, that is correct.
> 
> Then another question is:
> 
> Why not always requeue requests this way, so that it can be simplified
> into one code path?
> 
> 1) In the legacy block code, blk_requeue_request() doesn't insert the
> request into the scheduler queue; it simply puts the request on
> q->queue_head.
> 
> 2) blk_mq_requeue_request() is basically run from completion context to
> handle very unusual cases (partial completion, error, timeout, ...),
> and there should be no benefit in scheduling/merging a requeued request.

Actually, I was also confused by the questions above when I looked into the code before :)

> 
> 3) RQF_DONTPREP is like a driver-private flag, and before this patch it
> was read and written only by drivers.

Yes, indeed.
And it tells us there is driver specific data in the request.

Thanks
Jianchao
