* [PATCH] blk-mq: run queue after issuing the last request of the plug list
@ 2022-07-18 12:35 Yufen Yu
  2022-07-19  9:26 ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yufen Yu @ 2022-07-18 12:35 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, ming.lei, hch, Yufen Yu

We ran a test on a virtio scsi device (/dev/sda) with the default mq
scheduler 'none' and hit an IO hang, as follows:

blk_finish_plug
  blk_mq_plug_issue_direct
      scsi_mq_get_budget
      //get budget_token fail and sdev->restarts=1

                                 scsi_end_request
                                   scsi_run_queue_async
                                   //sdev->restarts=0 and run queue

     blk_mq_request_bypass_insert
        //add request to hctx->dispatch list

  //continue to dispatch the plug list
  blk_mq_dispatch_plug_list
      blk_mq_try_issue_list_directly
        //successfully issue all requests from the plug list

When .get_budget fails, scsi_mq_get_budget increases 'restarts'.
Normally, the hw queue is run on IO completion and 'restarts' is reset
to 0. But if that queue run happens before the request is added to the
dispatch list, and blk_mq_dispatch_plug_list then successfully issues
all remaining requests, no one will run the queue again. The request is
left stalled on the dispatch list and can never complete.
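
For reference, the 'restarts' handshake looks roughly like this (a
simplified sketch paraphrasing scsi_mq_get_budget()/scsi_run_queue_async()
from drivers/scsi/scsi_lib.c; not verbatim, details differ across kernel
versions):

	/* budget side: on failure, record that a queue re-run is owed */
	static int scsi_mq_get_budget(struct request_queue *q)
	{
		struct scsi_device *sdev = q->queuedata;
		int token = scsi_dev_queue_ready(q, sdev);

		if (token >= 0)
			return token;

		atomic_inc(&sdev->restarts);
		smp_mb__after_atomic();
		/* (delayed re-run fallback omitted from this sketch) */
		return -1;
	}

	/* completion side: pay the owed re-run and clear 'restarts' */
	static void scsi_run_queue_async(struct scsi_device *sdev)
	{
		int old = atomic_read(&sdev->restarts);

		if (old && atomic_cmpxchg(&sdev->restarts, old, 0) == old)
			blk_mq_run_hw_queues(sdev->request_queue, true);
	}

The race is that the completion side can pay this re-run before the
failed request has actually been put on the dispatch list.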

To fix the bug, run the queue after issuing the last request in
blk_mq_sched_insert_requests().

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/blk-mq-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a4f7c101b53b..c3ad97ca2753 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -490,8 +490,8 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 		blk_mq_insert_requests(hctx, ctx, list);
 	}
 
-	blk_mq_run_hw_queue(hctx, run_queue_async);
  out:
+	blk_mq_run_hw_queue(hctx, run_queue_async);
 	percpu_ref_put(&q->q_usage_counter);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-18 12:35 [PATCH] blk-mq: run queue after issuing the last request of the plug list Yufen Yu
@ 2022-07-19  9:26 ` Ming Lei
  2022-07-19 11:00   ` Yufen Yu
  2022-07-23  2:50   ` Yu Kuai
  0 siblings, 2 replies; 19+ messages in thread
From: Ming Lei @ 2022-07-19  9:26 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, hch, ming.lei

On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> We ran a test on a virtio scsi device (/dev/sda) with the default mq
> scheduler 'none' and hit an IO hang, as follows:
> 
> blk_finish_plug
>   blk_mq_plug_issue_direct
>       scsi_mq_get_budget
>       //get budget_token fail and sdev->restarts=1
> 
>                                  scsi_end_request
>                                    scsi_run_queue_async
>                                    //sdev->restarts=0 and run queue
> 
>      blk_mq_request_bypass_insert
>         //add request to hctx->dispatch list

Here the issue shouldn't be related with scsi's get budget or
scsi_run_queue_async.

If blk-mq adds request into ->dispatch_list, it is blk-mq core's
responsibility to re-run queue for moving on. Can you investigate a
bit more why blk-mq doesn't run queue after adding request to
hctx dispatch list?



Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-19  9:26 ` Ming Lei
@ 2022-07-19 11:00   ` Yufen Yu
  2022-07-23  2:50   ` Yu Kuai
  1 sibling, 0 replies; 19+ messages in thread
From: Yufen Yu @ 2022-07-19 11:00 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, hch



On 2022/7/19 17:26, Ming Lei wrote:
> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>> scheduler 'none' and hit an IO hang, as follows:
>>
>> blk_finish_plug
>>   blk_mq_plug_issue_direct
>>       scsi_mq_get_budget
>>       //get budget_token fail and sdev->restarts=1
>>
>>                                  scsi_end_request
>>                                    scsi_run_queue_async
>>                                    //sdev->restarts=0 and run queue
>>
>>      blk_mq_request_bypass_insert
>>         //add request to hctx->dispatch list
> 
> Here the issue shouldn't be related with scsi's get budget or
> scsi_run_queue_async.
> 
> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> responsibility to re-run queue for moving on. Can you investigate a
> bit more why blk-mq doesn't run queue after adding request to
> hctx dispatch list?
> 

In my IO hang scenario, no one issues any more IO after calling
blk_finish_plug(), so there is no chance to run the queue.

Thanks,
Yufen


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-19  9:26 ` Ming Lei
  2022-07-19 11:00   ` Yufen Yu
@ 2022-07-23  2:50   ` Yu Kuai
  2022-07-25 15:43     ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-23  2:50 UTC (permalink / raw)
  To: Ming Lei, Yufen Yu; +Cc: axboe, linux-block, hch, yukuai3, zhangyi (F)

Hi, Ming!

On 2022/07/19 17:26, Ming Lei wrote:
> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>> scheduler 'none' and hit an IO hang, as follows:
>>
>> blk_finish_plug
>>   blk_mq_plug_issue_direct
>>       scsi_mq_get_budget
>>       //get budget_token fail and sdev->restarts=1
>>
>>                                  scsi_end_request
>>                                    scsi_run_queue_async
>>                                    //sdev->restarts=0 and run queue
>>
>>      blk_mq_request_bypass_insert
>>         //add request to hctx->dispatch list
> 
> Here the issue shouldn't be related with scsi's get budget or
> scsi_run_queue_async.
> 
> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> responsibility to re-run queue for moving on. Can you investigate a
> bit more why blk-mq doesn't run queue after adding request to
> hctx dispatch list?

I think Yufen is probably thinking about the following concurrent
scenario:

blk_mq_flush_plug_list
# assume there are three rq
  blk_mq_plug_issue_direct
   blk_mq_request_issue_directly
   # dispatch rq1, succeed
   blk_mq_request_issue_directly
   # dispatch rq2
    __blk_mq_try_issue_directly
     blk_mq_get_dispatch_budget
      scsi_mq_get_budget
       atomic_inc(&sdev->restarts);
       # rq2 failed to get budget
       # restarts is 1 now
                                         scsi_end_request
                                         # rq1 is completed
                                         ┊scsi_run_queue_async
                                         ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
                                         ┊ # set restarts to 0
                                         ┊ blk_mq_run_hw_queues
                                         ┊ # hctx->dispatch list is empty
   blk_mq_request_bypass_insert
   # insert rq2 to hctx->dispatch list

  blk_mq_dispatch_plug_list
  # continue to dispatch rq3
   blk_mq_sched_insert_requests
    blk_mq_try_issue_list_directly
    # blk_mq_run_hw_queue() won't be called
    # because dispatching succeeded
                                         scsi_end_request
                                         ┊# rq3 is complete
                                         ┊ scsi_run_queue_async
                                         ┊ # restarts is 0, won't run queue

The root cause is that the rq that failed to dispatch is not the last rq,
and the last rq is dispatched successfully.
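
Note that blk_mq_request_bypass_insert() only re-runs the queue when its
run_queue argument is true; a paraphrased sketch from block/blk-mq.c (not
verbatim):

	void blk_mq_request_bypass_insert(struct request *rq, bool at_head,
					  bool run_queue)
	{
		struct blk_mq_hw_ctx *hctx = rq->mq_hctx;

		spin_lock(&hctx->lock);
		if (at_head)
			list_add(&rq->queuelist, &hctx->dispatch);
		else
			list_add_tail(&rq->queuelist, &hctx->dispatch);
		spin_unlock(&hctx->lock);

		/* rq is now parked on hctx->dispatch; the queue is only
		 * re-run from here when run_queue is true */
		if (run_queue)
			blk_mq_run_hw_queue(hctx, false);
	}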

Thanks,
Kuai
> 
> 
> 
> Thanks,
> Ming
> 
> .
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-23  2:50   ` Yu Kuai
@ 2022-07-25 15:43     ` Ming Lei
  2022-07-26  1:08       ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-25 15:43 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yufen Yu, axboe, linux-block, hch, yukuai3, zhangyi (F), ming.lei

On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> Hi, Ming!
> 
> On 2022/07/19 17:26, Ming Lei wrote:
> > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > We ran a test on a virtio scsi device (/dev/sda) with the default mq
> > > scheduler 'none' and hit an IO hang, as follows:
> > >
> > > blk_finish_plug
> > >   blk_mq_plug_issue_direct
> > >       scsi_mq_get_budget
> > >       //get budget_token fail and sdev->restarts=1
> > >
> > >                                  scsi_end_request
> > >                                    scsi_run_queue_async
> > >                                    //sdev->restarts=0 and run queue
> > >
> > >      blk_mq_request_bypass_insert
> > >         //add request to hctx->dispatch list
> > 
> > Here the issue shouldn't be related with scsi's get budget or
> > scsi_run_queue_async.
> > 
> > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > responsibility to re-run queue for moving on. Can you investigate a
> > bit more why blk-mq doesn't run queue after adding request to
> > hctx dispatch list?
> 
> I think Yufen is probably thinking about the following concurrent
> scenario:
> 
> blk_mq_flush_plug_list
> # assume there are three rq
>  blk_mq_plug_issue_direct
>   blk_mq_request_issue_directly
>   # dispatch rq1, succeed
>   blk_mq_request_issue_directly
>   # dispatch rq2
>    __blk_mq_try_issue_directly
>     blk_mq_get_dispatch_budget
>      scsi_mq_get_budget
>       atomic_inc(&sdev->restarts);
>       # rq2 failed to get budget
>       # restarts is 1 now
>                                         scsi_end_request
>                                         # rq1 is completed
>                                         ┊scsi_run_queue_async
>                                         ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>                                         ┊ # set restarts to 0
>                                         ┊ blk_mq_run_hw_queues
>                                         ┊ # hctx->dispatch list is empty
>   blk_mq_request_bypass_insert
>   # insert rq2 to hctx->dispatch list

After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
no matter whether the list is empty or not, the queue will be run either from
blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().

And rq2 should be visible to that queue run, so I am wondering why rq2 is
never issued in the end?


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-25 15:43     ` Ming Lei
@ 2022-07-26  1:08       ` Yu Kuai
  2022-07-26  1:46         ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-26  1:08 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai; +Cc: Yufen Yu, axboe, linux-block, hch, zhangyi (F)

Hi, Ming!

On 2022/07/25 23:43, Ming Lei wrote:
> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>> Hi, Ming!
>>
>> On 2022/07/19 17:26, Ming Lei wrote:
>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>>>> scheduler 'none' and hit an IO hang, as follows:
>>>>
>>>> blk_finish_plug
>>>>   blk_mq_plug_issue_direct
>>>>       scsi_mq_get_budget
>>>>       //get budget_token fail and sdev->restarts=1
>>>>
>>>>                                  scsi_end_request
>>>>                                    scsi_run_queue_async
>>>>                                    //sdev->restarts=0 and run queue
>>>>
>>>>      blk_mq_request_bypass_insert
>>>>         //add request to hctx->dispatch list
>>>
>>> Here the issue shouldn't be related with scsi's get budget or
>>> scsi_run_queue_async.
>>>
>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>> responsibility to re-run queue for moving on. Can you investigate a
>>> bit more why blk-mq doesn't run queue after adding request to
>>> hctx dispatch list?
>>
>> I think Yufen is probably thinking about the following concurrent
>> scenario:
>>
>> blk_mq_flush_plug_list
>> # assume there are three rq
>>   blk_mq_plug_issue_direct
>>    blk_mq_request_issue_directly
>>    # dispatch rq1, succeed
>>    blk_mq_request_issue_directly
>>    # dispatch rq2
>>     __blk_mq_try_issue_directly
>>      blk_mq_get_dispatch_budget
>>       scsi_mq_get_budget
>>        atomic_inc(&sdev->restarts);
>>        # rq2 failed to get budget
>>        # restarts is 1 now
>>                                          scsi_end_request
>>                                          # rq1 is completed
>>                                          ┊scsi_run_queue_async
>>                                        ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>>                                          ┊ # set restarts to 0
>>                                          ┊ blk_mq_run_hw_queues
>>                                          ┊ # hctx->dispatch list is empty
>>    blk_mq_request_bypass_insert
>>    # insert rq2 to hctx->dispatch list
> 
> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> no matter whether the list is empty or not, the queue will be run either from
> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().

1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
pass, thus blk_mq_request_bypass_insert() won't run the queue.

2) After blk_mq_sched_insert_requests() dispatches rq3, the list_empty() check
will pass, thus blk_mq_sched_insert_requests() won't run the queue. (That's
what Yufen tries to fix.)
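
For point 1), the call site passes list_empty(list) as the run_queue flag;
a paraphrased sketch of the error path in blk_mq_try_issue_list_directly()
(not verbatim):

	ret = blk_mq_request_issue_directly(rq, list_empty(list));
	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
		/* run_queue == list_empty(list): with rq3 still on the
		 * list this is false, so no queue run happens here */
		blk_mq_request_bypass_insert(rq, false, list_empty(list));
		break;
	}
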
> 
> And rq2 should be visible to that queue run, so I am wondering why rq2 is
> never issued in the end?
> 
> 
> Thanks,
> Ming
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  1:08       ` Yu Kuai
@ 2022-07-26  1:46         ` Ming Lei
  2022-07-26  2:08           ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-26  1:46 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> Hi, Ming!
> 
> On 2022/07/25 23:43, Ming Lei wrote:
> > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > Hi, Ming!
> > > 
> > > On 2022/07/19 17:26, Ming Lei wrote:
> > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > We ran a test on a virtio scsi device (/dev/sda) with the default mq
> > > > > scheduler 'none' and hit an IO hang, as follows:
> > > > >
> > > > > blk_finish_plug
> > > > >   blk_mq_plug_issue_direct
> > > > >       scsi_mq_get_budget
> > > > >       //get budget_token fail and sdev->restarts=1
> > > > >
> > > > >                                  scsi_end_request
> > > > >                                    scsi_run_queue_async
> > > > >                                    //sdev->restarts=0 and run queue
> > > > >
> > > > >      blk_mq_request_bypass_insert
> > > > >         //add request to hctx->dispatch list
> > > > 
> > > > Here the issue shouldn't be related with scsi's get budget or
> > > > scsi_run_queue_async.
> > > > 
> > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > bit more why blk-mq doesn't run queue after adding request to
> > > > hctx dispatch list?
> > > 
> > > I think Yufen is probably thinking about the following concurrent
> > > scenario:
> > > 
> > > blk_mq_flush_plug_list
> > > # assume there are three rq
> > >   blk_mq_plug_issue_direct
> > >    blk_mq_request_issue_directly
> > >    # dispatch rq1, succeed
> > >    blk_mq_request_issue_directly
> > >    # dispatch rq2
> > >     __blk_mq_try_issue_directly
> > >      blk_mq_get_dispatch_budget
> > >       scsi_mq_get_budget
> > >        atomic_inc(&sdev->restarts);
> > >        # rq2 failed to get budget
> > >        # restarts is 1 now
> > >                                          scsi_end_request
> > >                                          # rq1 is completed
> > >                                          ┊scsi_run_queue_async
> > >                                      ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
> > >                                          ┊ # set restarts to 0
> > >                                          ┊ blk_mq_run_hw_queues
> > >                                          ┊ # hctx->dispatch list is empty
> > >    blk_mq_request_bypass_insert
> > >    # insert rq2 to hctx->dispatch list
> > 
> > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > no matter whether the list is empty or not, the queue will be run either from
> > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> 
> 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
> is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
> pass, thus blk_mq_request_bypass_insert() won't run the queue.

Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
blk_mq_sched_insert_requests(), in which list_empty() is false, so
blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
is still run.

Also I am not sure why you bring rq3 into it, since the list is a local list
on stack and it can't be operated on concurrently.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  1:46         ` Ming Lei
@ 2022-07-26  2:08           ` Yu Kuai
  2022-07-26  2:32             ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-26  2:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On 2022/07/26 9:46, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>> Hi, Ming!
>>
>> On 2022/07/25 23:43, Ming Lei wrote:
>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>> Hi, Ming!
>>>>
>>>> On 2022/07/19 17:26, Ming Lei wrote:
>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>>>>>> scheduler 'none' and hit an IO hang, as follows:
>>>>>>
>>>>>> blk_finish_plug
>>>>>>   blk_mq_plug_issue_direct
>>>>>>       scsi_mq_get_budget
>>>>>>       //get budget_token fail and sdev->restarts=1
>>>>>>
>>>>>>                                  scsi_end_request
>>>>>>                                    scsi_run_queue_async
>>>>>>                                    //sdev->restarts=0 and run queue
>>>>>>
>>>>>>      blk_mq_request_bypass_insert
>>>>>>         //add request to hctx->dispatch list
>>>>>
>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>> scsi_run_queue_async.
>>>>>
>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>> hctx dispatch list?
>>>>
>>>> I think Yufen is probably thinking about the following concurrent
>>>> scenario:
>>>>
>>>> blk_mq_flush_plug_list
>>>> # assume there are three rq
>>>>    blk_mq_plug_issue_direct
>>>>     blk_mq_request_issue_directly
>>>>     # dispatch rq1, succeed
>>>>     blk_mq_request_issue_directly
>>>>     # dispatch rq2
>>>>      __blk_mq_try_issue_directly
>>>>       blk_mq_get_dispatch_budget
>>>>        scsi_mq_get_budget
>>>>         atomic_inc(&sdev->restarts);
>>>>         # rq2 failed to get budget
>>>>         # restarts is 1 now
>>>>                                           scsi_end_request
>>>>                                           # rq1 is completed
>>>>                                           ┊scsi_run_queue_async
>>>>                                      ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>>>>                                           ┊ # set restarts to 0
>>>>                                           ┊ blk_mq_run_hw_queues
>>>>                                           ┊ # hctx->dispatch list is empty
>>>>     blk_mq_request_bypass_insert
>>>>     # insert rq2 to hctx->dispatch list
>>>
>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>> no matter whether the list is empty or not, the queue will be run either from
>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>
>> 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
>> is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
>> pass, thus blk_mq_request_bypass_insert() won't run the queue.
> 
> Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
> dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
> blk_mq_sched_insert_requests(), in which list_empty() is false, so
> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
> is still run.
> 
> Also I am not sure why you bring rq3 into it, since the list is a local list
> on stack and it can't be operated on concurrently.

I brought rq3 into it because there are conditions under which
blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
blk_mq_sched_insert_requests():

This is the case I'm thinking of: blk_mq_try_issue_list_directly()
succeeds when called from blk_mq_sched_insert_requests(), and the list
becomes empty.
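
That is, this branch (a paraphrased sketch of the relevant lines in
blk_mq_sched_insert_requests(); not verbatim):

	if (!hctx->dispatch_busy && !run_queue_async) {
		blk_mq_run_dispatch_ops(hctx->queue,
			blk_mq_try_issue_list_directly(hctx, list));
		if (list_empty(list))
			goto out;	/* skips the final blk_mq_run_hw_queue() */
	}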

Thanks,
Kuai
> 
> Thanks,
> Ming
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  2:08           ` Yu Kuai
@ 2022-07-26  2:32             ` Ming Lei
  2022-07-26  2:52               ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-26  2:32 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> On 2022/07/26 9:46, Ming Lei wrote:
> > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> > > Hi, Ming!
> > > 
> > > On 2022/07/25 23:43, Ming Lei wrote:
> > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > > > Hi, Ming!
> > > > > 
> > > > > On 2022/07/19 17:26, Ming Lei wrote:
> > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > > > We ran a test on a virtio scsi device (/dev/sda) with the default mq
> > > > > > > scheduler 'none' and hit an IO hang, as follows:
> > > > > > >
> > > > > > > blk_finish_plug
> > > > > > >   blk_mq_plug_issue_direct
> > > > > > >       scsi_mq_get_budget
> > > > > > >       //get budget_token fail and sdev->restarts=1
> > > > > > >
> > > > > > >                                  scsi_end_request
> > > > > > >                                    scsi_run_queue_async
> > > > > > >                                    //sdev->restarts=0 and run queue
> > > > > > >
> > > > > > >      blk_mq_request_bypass_insert
> > > > > > >         //add request to hctx->dispatch list
> > > > > > 
> > > > > > Here the issue shouldn't be related with scsi's get budget or
> > > > > > scsi_run_queue_async.
> > > > > > 
> > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > > > bit more why blk-mq doesn't run queue after adding request to
> > > > > > hctx dispatch list?
> > > > > 
> > > > > I think Yufen is probably thinking about the following concurrent
> > > > > scenario:
> > > > > 
> > > > > blk_mq_flush_plug_list
> > > > > # assume there are three rq
> > > > >    blk_mq_plug_issue_direct
> > > > >     blk_mq_request_issue_directly
> > > > >     # dispatch rq1, succeed
> > > > >     blk_mq_request_issue_directly
> > > > >     # dispatch rq2
> > > > >      __blk_mq_try_issue_directly
> > > > >       blk_mq_get_dispatch_budget
> > > > >        scsi_mq_get_budget
> > > > >         atomic_inc(&sdev->restarts);
> > > > >         # rq2 failed to get budget
> > > > >         # restarts is 1 now
> > > > >                                           scsi_end_request
> > > > >                                           # rq1 is completed
> > > > >                                           ┊scsi_run_queue_async
> > > > >                                    ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
> > > > >                                           ┊ # set restarts to 0
> > > > >                                           ┊ blk_mq_run_hw_queues
> > > > >                                           ┊ # hctx->dispatch list is empty
> > > > >     blk_mq_request_bypass_insert
> > > > >     # insert rq2 to hctx->dispatch list
> > > > 
> > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > > > no matter whether the list is empty or not, the queue will be run either from
> > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> > > 
> > > 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
> > > is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
> > > pass, thus blk_mq_request_bypass_insert() won't run the queue.
> > 
> > Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
> > dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
> > blk_mq_sched_insert_requests(), in which list_empty() is false, so
> > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
> > is still run.
> > 
> > Also I am not sure why you bring rq3 into it, since the list is a local list
> > on stack and it can't be operated on concurrently.
> 
> I brought rq3 into it because there are conditions under which
> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> blk_mq_sched_insert_requests():

The two won't be called if list_empty() is true, and will be called if
!list_empty().

That is why I mentioned that the queue run has already been done after rq2
is added to ->dispatch_list.

Can you show the debugfs log after the hang is caused?



thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  2:32             ` Ming Lei
@ 2022-07-26  2:52               ` Yu Kuai
  2022-07-26  3:02                 ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-26  2:52 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

Hi, Ming
On 2022/07/26 10:32, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>> On 2022/07/26 9:46, Ming Lei wrote:
>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>>>> Hi, Ming!
>>>>
>>>> On 2022/07/25 23:43, Ming Lei wrote:
>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>>>> Hi, Ming!
>>>>>>
>>>>>> On 2022/07/19 17:26, Ming Lei wrote:
>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>>>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>>>>>>>> scheduler 'none' and hit an IO hang, as follows:
>>>>>>>>
>>>>>>>> blk_finish_plug
>>>>>>>>   blk_mq_plug_issue_direct
>>>>>>>>       scsi_mq_get_budget
>>>>>>>>       //get budget_token fail and sdev->restarts=1
>>>>>>>>
>>>>>>>>                                  scsi_end_request
>>>>>>>>                                    scsi_run_queue_async
>>>>>>>>                                    //sdev->restarts=0 and run queue
>>>>>>>>
>>>>>>>>      blk_mq_request_bypass_insert
>>>>>>>>         //add request to hctx->dispatch list
>>>>>>>
>>>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>>>> scsi_run_queue_async.
>>>>>>>
>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>>>> hctx dispatch list?
>>>>>>
>>>>>> I think Yufen is probably thinking about the following concurrent
>>>>>> scenario:
>>>>>>
>>>>>> blk_mq_flush_plug_list
>>>>>> # assume there are three rq
>>>>>>     blk_mq_plug_issue_direct
>>>>>>      blk_mq_request_issue_directly
>>>>>>      # dispatch rq1, succeed
>>>>>>      blk_mq_request_issue_directly
>>>>>>      # dispatch rq2
>>>>>>       __blk_mq_try_issue_directly
>>>>>>        blk_mq_get_dispatch_budget
>>>>>>         scsi_mq_get_budget
>>>>>>          atomic_inc(&sdev->restarts);
>>>>>>          # rq2 failed to get budget
>>>>>>          # restarts is 1 now
>>>>>>                                            scsi_end_request
>>>>>>                                            # rq1 is completed
>>>>>>                                            ┊scsi_run_queue_async
>>>>>>                                    ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>>>>>>                                            ┊ # set restarts to 0
>>>>>>                                            ┊ blk_mq_run_hw_queues
>>>>>>                                            ┊ # hctx->dispatch list is empty
>>>>>>      blk_mq_request_bypass_insert
>>>>>>      # insert rq2 to hctx->dispatch list
>>>>>
>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>>>> no matter whether the list is empty or not, the queue will be run either from
>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>>>
>>>> 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
>>>> is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
>>>> pass, thus blk_mq_request_bypass_insert() won't run the queue.
>>>
>>> Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
>>> dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
>>> blk_mq_sched_insert_requests(), in which list_empty() is false, so
>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
>>> is still run.
>>>
>>> Also I am not sure why you bring rq3 into it, since the list is a local list
>>> on stack and it can't be operated on concurrently.
>>
>> I brought rq3 into it because there are conditions under which
>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>> blk_mq_sched_insert_requests():
> 
> The two won't be called if list_empty() is true, and will be called if
> !list_empty().
> 
> That is why I mentioned that the queue run has already been done after rq2
> is added to ->dispatch_list.

I don't follow here. It's right that after rq2 is inserted into the dispatch
list, the list is not empty and blk_mq_sched_insert_requests() will be called.
However, do you think it's impossible that blk_mq_sched_insert_requests()
dispatches the rq in the list, so that the list becomes empty?

> 
> Can you show the debugfs log after the hang is caused?

I didn't reproduce the problem myself; perhaps Yufen can show the log.

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  2:52               ` Yu Kuai
@ 2022-07-26  3:02                 ` Ming Lei
  2022-07-26  3:14                   ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-26  3:02 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
> Hi, Ming
> On 2022/07/26 10:32, Ming Lei wrote:
> > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> > > On 2022/07/26 9:46, Ming Lei wrote:
> > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> > > > > Hi, Ming!
> > > > > 
> > > > > On 2022/07/25 23:43, Ming Lei wrote:
> > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > > > > > Hi, Ming!
> > > > > > > 
> > > > > > > On 2022/07/19 17:26, Ming Lei wrote:
> > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > > > > > We ran a test on a virtio scsi device (/dev/sda) with the default mq
> > > > > > > > > scheduler 'none' and hit an IO hang, as follows:
> > > > > > > > >
> > > > > > > > > blk_finish_plug
> > > > > > > > >   blk_mq_plug_issue_direct
> > > > > > > > >       scsi_mq_get_budget
> > > > > > > > >       //get budget_token fail and sdev->restarts=1
> > > > > > > > >
> > > > > > > > >                                  scsi_end_request
> > > > > > > > >                                    scsi_run_queue_async
> > > > > > > > >                                    //sdev->restarts=0 and run queue
> > > > > > > > >
> > > > > > > > >      blk_mq_request_bypass_insert
> > > > > > > > >         //add request to hctx->dispatch list
> > > > > > > > 
> > > > > > > > Here the issue shouldn't be related with scsi's get budget or
> > > > > > > > scsi_run_queue_async.
> > > > > > > > 
> > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > > > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > > > > > bit more why blk-mq doesn't run queue after adding request to
> > > > > > > > hctx dispatch list?
> > > > > > > 
> > > > > > > I think Yufen is probably thinking about the following concurrent
> > > > > > > scenario:
> > > > > > > 
> > > > > > > blk_mq_flush_plug_list
> > > > > > > # assume there are three rq
> > > > > > >     blk_mq_plug_issue_direct
> > > > > > >      blk_mq_request_issue_directly
> > > > > > >      # dispatch rq1, succeed
> > > > > > >      blk_mq_request_issue_directly
> > > > > > >      # dispatch rq2
> > > > > > >       __blk_mq_try_issue_directly
> > > > > > >        blk_mq_get_dispatch_budget
> > > > > > >         scsi_mq_get_budget
> > > > > > >          atomic_inc(&sdev->restarts);
> > > > > > >          # rq2 failed to get budget
> > > > > > >          # restarts is 1 now
> > > > > > >                                            scsi_end_request
> > > > > > >                                            # rq1 is completed
> > > > > > >                                            ┊scsi_run_queue_async
> > > > > > >                                  ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
> > > > > > >                                            ┊ # set restarts to 0
> > > > > > >                                            ┊ blk_mq_run_hw_queues
> > > > > > >                                            ┊ # hctx->dispatch list is empty
> > > > > > >      blk_mq_request_bypass_insert
> > > > > > >      # insert rq2 to hctx->dispatch list
> > > > > > 
> > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > > > > > no matter whether the list is empty or not, the queue will be run either from
> > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> > > > > 
> > > > > 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
> > > > > is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
> > > > > pass, thus blk_mq_request_bypass_insert() won't run the queue.
> > > > 
> > > > Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
> > > > dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
> > > > blk_mq_sched_insert_requests(), in which list_empty() is false, so
> > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
> > > > is still run.
> > > > 
> > > > Also I am not sure why you bring rq3 into it, since the list is a local list
> > > > on stack and it can't be operated on concurrently.
> > > 
> > > I brought rq3 into it because there are conditions under which
> > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> > > blk_mq_sched_insert_requests():
> > 
> > The two won't be called if list_empty() is true, and will be called if
> > !list_empty().
> > 
> > That is why I mentioned that the queue run has already been done after rq2
> > is added to ->dispatch_list.
> 
> I don't follow here. It's right that after rq2 is inserted into the dispatch
> list, the list is not empty and blk_mq_sched_insert_requests() will be called.
> However, do you think it's impossible that blk_mq_sched_insert_requests()
> dispatches the rq in the list, so that the list becomes empty?

Please take a look at blk_mq_sched_insert_requests().

When code runs into blk_mq_sched_insert_requests(), the following
blk_mq_run_hw_queue() will always be run; how does the list being empty
or not make a difference there?

In short, after rq2 is added into ->dispatch, the queue is guaranteed
to run, so the handling logic isn't wrong. That said, the reported
hang isn't root-caused yet, is it?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  3:02                 ` Ming Lei
@ 2022-07-26  3:14                   ` Yu Kuai
  2022-07-26  3:21                     ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-26  3:14 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

Hi, Ming

On 2022/07/26 11:02, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
>> Hi, Ming
>> On 2022/07/26 10:32, Ming Lei wrote:
>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>>>> On 2022/07/26 9:46, Ming Lei wrote:
>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>>>>>> Hi, Ming!
>>>>>>
>>>>>> On 2022/07/25 23:43, Ming Lei wrote:
>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>>>>>> Hi, Ming!
>>>>>>>>
>>>>>>>> On 2022/07/19 17:26, Ming Lei wrote:
>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>>>>>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>>>>>>>>>> scheduler 'none' and hit an IO hang, as follows:
>>>>>>>>>>
>>>>>>>>>> blk_finish_plug
>>>>>>>>>>   blk_mq_plug_issue_direct
>>>>>>>>>>       scsi_mq_get_budget
>>>>>>>>>>       //get budget_token fail and sdev->restarts=1
>>>>>>>>>>
>>>>>>>>>>                                  scsi_end_request
>>>>>>>>>>                                    scsi_run_queue_async
>>>>>>>>>>                                    //sdev->restarts=0 and run queue
>>>>>>>>>>
>>>>>>>>>>      blk_mq_request_bypass_insert
>>>>>>>>>>         //add request to hctx->dispatch list
>>>>>>>>>
>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>>>>>> scsi_run_queue_async.
>>>>>>>>>
>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>>>>>> hctx dispatch list?
>>>>>>>>
>>>>>>>> I think Yufen is probably thinking about the following concurrent
>>>>>>>> scenario:
>>>>>>>>
>>>>>>>> blk_mq_flush_plug_list
>>>>>>>> # assume there are three rq
>>>>>>>>      blk_mq_plug_issue_direct
>>>>>>>>       blk_mq_request_issue_directly
>>>>>>>>       # dispatch rq1, succeed
>>>>>>>>       blk_mq_request_issue_directly
>>>>>>>>       # dispatch rq2
>>>>>>>>        __blk_mq_try_issue_directly
>>>>>>>>         blk_mq_get_dispatch_budget
>>>>>>>>          scsi_mq_get_budget
>>>>>>>>           atomic_inc(&sdev->restarts);
>>>>>>>>           # rq2 failed to get budget
>>>>>>>>           # restarts is 1 now
>>>>>>>>                                             scsi_end_request
>>>>>>>>                                             # rq1 is completed
>>>>>>>>                                             ┊scsi_run_queue_async
>>>>>>>>                                  ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>>>>>>>>                                             ┊ # set restarts to 0
>>>>>>>>                                             ┊ blk_mq_run_hw_queues
>>>>>>>>                                             ┊ # hctx->dispatch list is empty
>>>>>>>>       blk_mq_request_bypass_insert
>>>>>>>>       # insert rq2 to hctx->dispatch list
>>>>>>>
>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>>>>>> no matter whether the list is empty or not, the queue will be run either from
>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>>>>>
>>>>>> 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
>>>>>> is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
>>>>>> pass, thus blk_mq_request_bypass_insert() won't run the queue.
>>>>>
>>>>> Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
>>>>> dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
>>>>> blk_mq_sched_insert_requests(), in which list_empty() is false, so
>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
>>>>> is still run.
>>>>>
>>>>> Also I am not sure why you bring rq3 into it, since the list is a local list
>>>>> on stack and it can't be operated on concurrently.
>>>>
>>>> I brought rq3 into it because there are conditions under which
>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>>>> blk_mq_sched_insert_requests():
>>>
>>> The two won't be called if list_empty() is true, and will be called if
>>> !list_empty().
>>>
>>> That is why I mentioned that the queue run has already been done after rq2
>>> is added to ->dispatch_list.
>>
>> I don't follow here. It's right that after rq2 is inserted into the dispatch
>> list, the list is not empty and blk_mq_sched_insert_requests() will be called.
>> However, do you think it's impossible that blk_mq_sched_insert_requests()
>> dispatches the rq in the list, so that the list becomes empty?
> 
> Please take a look at blk_mq_sched_insert_requests().
> 
> When code runs into blk_mq_sched_insert_requests(), the following
> blk_mq_run_hw_queue() will always be run; how does the list being empty
> or not make a difference there?

This is strange; always calling blk_mq_run_hw_queue() is exactly what Yufen
tries to do in this patch. Are we looking at different code?

I'm copying blk_mq_sched_insert_requests() here, the code is from
latest linux-next:

461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
462                                   struct blk_mq_ctx *ctx,
463                                   struct list_head *list, bool run_queue_async)
464 {
465         struct elevator_queue *e;
466         struct request_queue *q = hctx->queue;
467
468         /*
469          * blk_mq_sched_insert_requests() is called from flush plug
470          * context only, and hold one usage counter to prevent queue
471          * from being released.
472          */
473         percpu_ref_get(&q->q_usage_counter);
474
475         e = hctx->queue->elevator;
476         if (e) {
477                 e->type->ops.insert_requests(hctx, list, false);
478         } else {
479                 /*
480                  * try to issue requests directly if the hw queue isn't
481                  * busy in case of 'none' scheduler, and this way may save
482                  * us one extra enqueue & dequeue to sw queue.
483                  */
484                 if (!hctx->dispatch_busy && !run_queue_async) {
485                         blk_mq_run_dispatch_ops(hctx->queue,
486                                 blk_mq_try_issue_list_directly(hctx, list));
487                         if (list_empty(list))
488                                 goto out;
489                 }
490                 blk_mq_insert_requests(hctx, ctx, list);
491         }
492
493         blk_mq_run_hw_queue(hctx, run_queue_async);
494  out:
495         percpu_ref_put(&q->q_usage_counter);
496 }

Here at line 487, if list_empty() is true, the out label skips the
blk_mq_run_hw_queue() call at line 493.
> 
> In short, after rq2 is added into ->dispatch, the queue is guaranteed
> to run, so the handling logic isn't wrong. That said, the reported
> hang isn't root-caused yet, is it?
> 
> 
> Thanks,
> Ming
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  3:14                   ` Yu Kuai
@ 2022-07-26  3:21                     ` Ming Lei
  2022-07-26  3:31                       ` Yufen Yu
  2022-07-26  3:31                       ` Yu Kuai
  0 siblings, 2 replies; 19+ messages in thread
From: Ming Lei @ 2022-07-26  3:21 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
> Hi, Ming
> 
> On 2022/07/26 11:02, Ming Lei wrote:
> > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
> > > Hi, Ming
> > > On 2022/07/26 10:32, Ming Lei wrote:
> > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> > > > > On 2022/07/26 9:46, Ming Lei wrote:
> > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> > > > > > > Hi, Ming!
> > > > > > > 
> > > > > > > On 2022/07/25 23:43, Ming Lei wrote:
> > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > > > > > > > Hi, Ming!
> > > > > > > > > 
> > > > > > > > > On 2022/07/19 17:26, Ming Lei wrote:
> > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > > > > > > > We ran a test on a virtio scsi device (/dev/sda) with the default mq
> > > > > > > > > > > scheduler 'none' and hit an IO hang, as follows:
> > > > > > > > > > >
> > > > > > > > > > > blk_finish_plug
> > > > > > > > > > >   blk_mq_plug_issue_direct
> > > > > > > > > > >       scsi_mq_get_budget
> > > > > > > > > > >       //get budget_token fail and sdev->restarts=1
> > > > > > > > > > >
> > > > > > > > > > >                                  scsi_end_request
> > > > > > > > > > >                                    scsi_run_queue_async
> > > > > > > > > > >                                    //sdev->restarts=0 and run queue
> > > > > > > > > > >
> > > > > > > > > > >      blk_mq_request_bypass_insert
> > > > > > > > > > >         //add request to hctx->dispatch list
> > > > > > > > > > 
> > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or
> > > > > > > > > > scsi_run_queue_async.
> > > > > > > > > > 
> > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > > > > > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to
> > > > > > > > > > hctx dispatch list?
> > > > > > > > > 
> > > > > > > > > I think Yufen is probably thinking about the following concurrent
> > > > > > > > > scenario:
> > > > > > > > > 
> > > > > > > > > blk_mq_flush_plug_list
> > > > > > > > > # assume there are three rq
> > > > > > > > >      blk_mq_plug_issue_direct
> > > > > > > > >       blk_mq_request_issue_directly
> > > > > > > > >       # dispatch rq1, succeed
> > > > > > > > >       blk_mq_request_issue_directly
> > > > > > > > >       # dispatch rq2
> > > > > > > > >        __blk_mq_try_issue_directly
> > > > > > > > >         blk_mq_get_dispatch_budget
> > > > > > > > >          scsi_mq_get_budget
> > > > > > > > >           atomic_inc(&sdev->restarts);
> > > > > > > > >           # rq2 failed to get budget
> > > > > > > > >           # restarts is 1 now
> > > > > > > > >                                             scsi_end_request
> > > > > > > > >                                             # rq1 is completed
> > > > > > > > >                                             ┊scsi_run_queue_async
> > > > > > > > >                              ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
> > > > > > > > >                                             ┊ # set restarts to 0
> > > > > > > > >                                             ┊ blk_mq_run_hw_queues
> > > > > > > > >                                             ┊ # hctx->dispatch list is empty
> > > > > > > > >       blk_mq_request_bypass_insert
> > > > > > > > >       # insert rq2 to hctx->dispatch list
> > > > > > > > 
> > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > > > > > > > no matter whether the list is empty or not, the queue will be run either from
> > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> > > > > > > 
> > > > > > > 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
> > > > > > > is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
> > > > > > > pass, thus blk_mq_request_bypass_insert() won't run the queue.
> > > > > > 
> > > > > > Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
> > > > > > dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
> > > > > > blk_mq_sched_insert_requests(), in which list_empty() is false, so
> > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
> > > > > > is still run.
> > > > > > 
> > > > > > Also I am not sure why you bring rq3 into it, since the list is a local list
> > > > > > on stack and it can't be operated on concurrently.
> > > > > 
> > > > > I brought rq3 into it because there are conditions under which
> > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> > > > > blk_mq_sched_insert_requests():
> > > > 
> > > > The two won't be called if list_empty() is true, and will be called if
> > > > !list_empty().
> > > > 
> > > > That is why I mentioned that the queue run has already been done after rq2
> > > > is added to ->dispatch_list.
> > > 
> > > I don't follow here. It's right that after rq2 is inserted into the dispatch
> > > list, the list is not empty and blk_mq_sched_insert_requests() will be called.
> > > However, do you think it's impossible that blk_mq_sched_insert_requests()
> > > dispatches the rq in the list, so that the list becomes empty?
> > 
> > Please take a look at blk_mq_sched_insert_requests().
> > 
> > When code runs into blk_mq_sched_insert_requests(), the following
> > blk_mq_run_hw_queue() will always be run; how does the list being empty
> > or not make a difference there?
> 
> This is strange; always calling blk_mq_run_hw_queue() is exactly what Yufen
> tries to do in this patch. Are we looking at different code?

No.

> 
> I'm copying blk_mq_sched_insert_requests() here, the code is from
> latest linux-next:
> 
> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
> 462                                   struct blk_mq_ctx *ctx,
> 463                                   struct list_head *list, bool run_queue_async)
> 464 {
> 465         struct elevator_queue *e;
> 466         struct request_queue *q = hctx->queue;
> 467
> 468         /*
> 469          * blk_mq_sched_insert_requests() is called from flush plug
> 470          * context only, and hold one usage counter to prevent queue
> 471          * from being released.
> 472          */
> 473         percpu_ref_get(&q->q_usage_counter);
> 474
> 475         e = hctx->queue->elevator;
> 476         if (e) {
> 477                 e->type->ops.insert_requests(hctx, list, false);
> 478         } else {
> 479                 /*
> 480                  * try to issue requests directly if the hw queue isn't
> 481                  * busy in case of 'none' scheduler, and this way may save
> 482                  * us one extra enqueue & dequeue to sw queue.
> 483                  */
> 484                 if (!hctx->dispatch_busy && !run_queue_async) {
> 485                         blk_mq_run_dispatch_ops(hctx->queue,
> 486                                 blk_mq_try_issue_list_directly(hctx, list));
> 487                         if (list_empty(list))
> 488                                 goto out;
> 489                 }
> 490                 blk_mq_insert_requests(hctx, ctx, list);
> 491         }
> 492
> 493         blk_mq_run_hw_queue(hctx, run_queue_async);
> 494  out:
> 495         percpu_ref_put(&q->q_usage_counter);
> 496 }
> 
> Here at line 487, if list_empty() is true, the out label skips the
> blk_mq_run_hw_queue() call at line 493.

If list_empty() is true, the queue run is guaranteed to happen inside
blk_mq_try_issue_list_directly(), in case BLK_STS_*RESOURCE is returned
from blk_mq_request_issue_directly().

		ret = blk_mq_request_issue_directly(rq, list_empty(list));
		if (ret != BLK_STS_OK) {
			if (ret == BLK_STS_RESOURCE ||
					ret == BLK_STS_DEV_RESOURCE) {
				blk_mq_request_bypass_insert(rq, false,
							list_empty(list));	//run queue
				break;
			}
			blk_mq_end_request(rq, ret);
			errors++;
		} else
			queued++;

So why do you try to add one extra run queue?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  3:21                     ` Ming Lei
@ 2022-07-26  3:31                       ` Yufen Yu
  2022-07-26  3:31                       ` Yu Kuai
  1 sibling, 0 replies; 19+ messages in thread
From: Yufen Yu @ 2022-07-26  3:31 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai; +Cc: Yu Kuai, axboe, linux-block, hch, zhangyi (F)



On 2022/7/26 11:21, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
>> Hi, Ming
>>
>> On 2022/07/26 11:02, Ming Lei wrote:
>>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
>>>> Hi, Ming
>>>> On 2022/07/26 10:32, Ming Lei wrote:
>>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>>>>>> On 2022/07/26 9:46, Ming Lei wrote:
>>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>>>>>>>> Hi, Ming!
>>>>>>>>
>>>>>>>> On 2022/07/25 23:43, Ming Lei wrote:
>>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>>>>>>>> Hi, Ming!
>>>>>>>>>>
>>>>>>>>>> On 2022/07/19 17:26, Ming Lei wrote:
>>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>>>>>>>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>>>>>>>>>>>> scheduler 'none' and hit an IO hang, as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> blk_finish_plug
>>>>>>>>>>>>   blk_mq_plug_issue_direct
>>>>>>>>>>>>       scsi_mq_get_budget
>>>>>>>>>>>>       //get budget_token fail and sdev->restarts=1
>>>>>>>>>>>>
>>>>>>>>>>>>                                  scsi_end_request
>>>>>>>>>>>>                                    scsi_run_queue_async
>>>>>>>>>>>>                                    //sdev->restarts=0 and run queue
>>>>>>>>>>>>
>>>>>>>>>>>>      blk_mq_request_bypass_insert
>>>>>>>>>>>>         //add request to hctx->dispatch list
>>>>>>>>>>>
>>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>>>>>>>> scsi_run_queue_async.
>>>>>>>>>>>
>>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>>>>>>>> hctx dispatch list?
>>>>>>>>>>
>>>>>>>>>> I think Yufen is probably thinking about the following concurrent
>>>>>>>>>> scenario:
>>>>>>>>>>
>>>>>>>>>> blk_mq_flush_plug_list
>>>>>>>>>> # assume there are three rq
>>>>>>>>>>       blk_mq_plug_issue_direct
>>>>>>>>>>        blk_mq_request_issue_directly
>>>>>>>>>>        # dispatch rq1, succeed
>>>>>>>>>>        blk_mq_request_issue_directly
>>>>>>>>>>        # dispatch rq2
>>>>>>>>>>         __blk_mq_try_issue_directly
>>>>>>>>>>          blk_mq_get_dispatch_budget
>>>>>>>>>>           scsi_mq_get_budget
>>>>>>>>>>            atomic_inc(&sdev->restarts);
>>>>>>>>>>            # rq2 failed to get budget
>>>>>>>>>>            # restarts is 1 now
>>>>>>>>>>                                              scsi_end_request
>>>>>>>>>>                                              # rq1 is completed
>>>>>>>>>>                                              ┊scsi_run_queue_async
>>>>>>>>>>                                 ┊ atomic_cmpxchg(&sdev->restarts, old, 0) == old
>>>>>>>>>>                                              ┊ # set restarts to 0
>>>>>>>>>>                                              ┊ blk_mq_run_hw_queues
>>>>>>>>>>                                              ┊ # hctx->dispatch list is empty
>>>>>>>>>>        blk_mq_request_bypass_insert
>>>>>>>>>>        # insert rq2 to hctx->dispatch list
>>>>>>>>>
>>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>>>>>>>> no matter whether the list is empty or not, the queue will be run either from
>>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>>>>>>>
>>>>>>>> 1) While inserting rq2 into the dispatch list, blk_mq_request_bypass_insert()
>>>>>>>> is called from blk_mq_try_issue_list_directly(); the list_empty() check won't
>>>>>>>> pass, thus blk_mq_request_bypass_insert() won't run the queue.
>>>>>>>
>>>>>>> Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted into the
>>>>>>> dispatch list the loop breaks and blk_mq_try_issue_list_directly() returns to
>>>>>>> blk_mq_sched_insert_requests(), in which list_empty() is false, so
>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called and the queue
>>>>>>> is still run.
>>>>>>>
>>>>>>> Also I am not sure why you bring rq3 into it, since the list is a local list
>>>>>>> on stack and it can't be operated on concurrently.
>>>>>>
>>>>>> I make rq3 involved because there are some conditions that
>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>>>>>> blk_mq_sched_insert_requests():
>>>>>
>>>>> The two won't be called if list_empty() is true, and will be called if
>>>>> !list_empty().
>>>>>
>>>>> That is why I mentioned run queue has been done after rq2 is added to
>>>>> ->dispatch_list.
>>>>
>>>> I don't follow here, it's right after rq2 is inserted to dispatch list,
>>>> list is not empty, and blk_mq_sched_insert_requests() will be called.
>>>> However, do you think that it's impossible that
>>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list
>>>> will become empty?
>>>
>>> Please take a look at blk_mq_sched_insert_requests().
>>>
>>> When codes runs into blk_mq_sched_insert_requests(), the following
>>> blk_mq_run_hw_queue() will be run always, how does list empty or not
>>> make a difference there?
>>
>> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen
>> tries to do in this patch, are we look at different code?
> 
> No.
> 
>>
>> I'm copying blk_mq_sched_insert_requests() here, the code is from
>> latest linux-next:
>>
>> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
>> 462                                 ┊ struct blk_mq_ctx *ctx,
>> 463                                 ┊ struct list_head *list, bool
>> run_queue_async)
>> 464 {
>> 465         struct elevator_queue *e;
>> 466         struct request_queue *q = hctx->queue;
>> 467
>> 468         /*
>> 469         ┊* blk_mq_sched_insert_requests() is called from flush plug
>> 470         ┊* context only, and hold one usage counter to prevent queue
>> 471         ┊* from being released.
>> 472         ┊*/
>> 473         percpu_ref_get(&q->q_usage_counter);
>> 474
>> 475         e = hctx->queue->elevator;
>> 476         if (e) {
>> 477                 e->type->ops.insert_requests(hctx, list, false);
>> 478         } else {
>> 479                 /*
>> 480                 ┊* try to issue requests directly if the hw queue isn't
>> 481                 ┊* busy in case of 'none' scheduler, and this way may
>> save
>> 482                 ┊* us one extra enqueue & dequeue to sw queue.
>> 483                 ┊*/
>> 484                 if (!hctx->dispatch_busy && !run_queue_async) {
>> 485                         blk_mq_run_dispatch_ops(hctx->queue,
>> 486                                 blk_mq_try_issue_list_directly(hctx,
>> list));
>> 487                         if (list_empty(list))
>> 488                                 goto out;
>> 489                 }
>> 490                 blk_mq_insert_requests(hctx, ctx, list);
>> 491         }
>> 492
>> 493         blk_mq_run_hw_queue(hctx, run_queue_async);
>> 494  out:
>> 495         percpu_ref_put(&q->q_usage_counter);
>> 496 }
>>
>> Here in line 487, if list_empty() is true, out label will skip
>> run_queue().
> 
> If list_empty() is true, run queue is guaranteed to run
> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> is returned from blk_mq_request_issue_directly().

If the requests all issue successfully here, that is exactly the case I
describe in my patch: then no one runs the queue.

> 
> 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
> 		if (ret != BLK_STS_OK) {
> 			if (ret == BLK_STS_RESOURCE ||
> 					ret == BLK_STS_DEV_RESOURCE) {
> 				blk_mq_request_bypass_insert(rq, false,
> 							list_empty(list));	//run queue
> 				break;
> 			}
> 			blk_mq_end_request(rq, ret);
> 			errors++;
> 		} else
> 			queued++;
> 
> So why do you try to add one extra run queue?
> 
> 
> Thanks,
> Ming
> 
> .

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  3:21                     ` Ming Lei
  2022-07-26  3:31                       ` Yufen Yu
@ 2022-07-26  3:31                       ` Yu Kuai
  2022-07-26  4:16                         ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2022-07-26  3:31 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

在 2022/07/26 11:21, Ming Lei 写道:
> On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
>> Hi, Ming
>>
>> 在 2022/07/26 11:02, Ming Lei 写道:
>>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
>>>> Hi, Ming
>>>> 在 2022/07/26 10:32, Ming Lei 写道:
>>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>>>>>> 在 2022/07/26 9:46, Ming Lei 写道:
>>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>>>>>>>> Hi, Ming!
>>>>>>>>
>>>>>>>> 在 2022/07/25 23:43, Ming Lei 写道:
>>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>>>>>>>> Hi, Ming!
>>>>>>>>>>
>>>>>>>>>> 在 2022/07/19 17:26, Ming Lei 写道:
>>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>>>>>>>> We do test on a virtio scsi device (/dev/sda) and the default mq
>>>>>>>>>>>> scheduler is 'none'. We found a IO hung as following:
>>>>>>>>>>>>
>>>>>>>>>>>> blk_finish_plug
>>>>>>>>>>>>         blk_mq_plug_issue_direct
>>>>>>>>>>>>             scsi_mq_get_budget
>>>>>>>>>>>>             //get budget_token fail and sdev->restarts=1
>>>>>>>>>>>>
>>>>>>>>>>>> 			     	 scsi_end_request
>>>>>>>>>>>> 				   scsi_run_queue_async
>>>>>>>>>>>>                                          //sdev->restart=0 and run queue
>>>>>>>>>>>>
>>>>>>>>>>>>            blk_mq_request_bypass_insert
>>>>>>>>>>>>               //add request to hctx->dispatch list
>>>>>>>>>>>
>>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>>>>>>>> scsi_run_queue_async.
>>>>>>>>>>>
>>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>>>>>>>> hctx dispatch list?
>>>>>>>>>>
>>>>>>>>>> I think Yufen is probably thinking about the following Concurrent
>>>>>>>>>> scenario:
>>>>>>>>>>
>>>>>>>>>> blk_mq_flush_plug_list
>>>>>>>>>> # assume there are three rq
>>>>>>>>>>       blk_mq_plug_issue_direct
>>>>>>>>>>        blk_mq_request_issue_directly
>>>>>>>>>>        # dispatch rq1, succeed
>>>>>>>>>>        blk_mq_request_issue_directly
>>>>>>>>>>        # dispatch rq2
>>>>>>>>>>         __blk_mq_try_issue_directly
>>>>>>>>>>          blk_mq_get_dispatch_budget
>>>>>>>>>>           scsi_mq_get_budget
>>>>>>>>>>            atomic_inc(&sdev->restarts);
>>>>>>>>>>            # rq2 failed to get budget
>>>>>>>>>>            # restarts is 1 now
>>>>>>>>>>                                              scsi_end_request
>>>>>>>>>>                                              # rq1 is completed
>>>>>>>>>>                                              ┊scsi_run_queue_async
>>>>>>>>>>                                              ┊ atomic_cmpxchg(&sdev->restarts,
>>>>>>>>>> old, 0) == old
>>>>>>>>>>                                              ┊ # set restarts to 0
>>>>>>>>>>                                              ┊ blk_mq_run_hw_queues
>>>>>>>>>>                                              ┊ # hctx->dispatch list is empty
>>>>>>>>>>        blk_mq_request_bypass_insert
>>>>>>>>>>        # insert rq2 to hctx->dispatch list
>>>>>>>>>
>>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>>>>>>>> no matter if list_empty(list) is empty or not, queue will be run either from
>>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>>>>>>>
>>>>>>>> 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert()
>>>>>>>> is called from blk_mq_try_issue_list_directly(), list_empty() won't
>>>>>>>> pass, thus thus blk_mq_request_bypass_insert() won't run queue.
>>>>>>>
>>>>>>> Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch
>>>>>>> list, the loop is broken and blk_mq_try_issue_list_directly() returns to
>>>>>>> blk_mq_sched_insert_requests() in which list_empty() is false, so
>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue
>>>>>>> is still run.
>>>>>>>
>>>>>>> Also not sure why you make rq3 involved, since the list is local list on
>>>>>>> stack, and it can be operated concurrently.
>>>>>>
>>>>>> I make rq3 involved because there are some conditions that
>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>>>>>> blk_mq_sched_insert_requests():
>>>>>
>>>>> The two won't be called if list_empty() is true, and will be called if
>>>>> !list_empty().
>>>>>
>>>>> That is why I mentioned run queue has been done after rq2 is added to
>>>>> ->dispatch_list.
>>>>
>>>> I don't follow here, it's right after rq2 is inserted to dispatch list,
>>>> list is not empty, and blk_mq_sched_insert_requests() will be called.
>>>> However, do you think that it's impossible that
>>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list
>>>> will become empty?
>>>
>>> Please take a look at blk_mq_sched_insert_requests().
>>>
>>> When codes runs into blk_mq_sched_insert_requests(), the following
>>> blk_mq_run_hw_queue() will be run always, how does list empty or not
>>> make a difference there?
>>
>> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen
>> tries to do in this patch, are we look at different code?
> 
> No.
> 
>>
>> I'm copying blk_mq_sched_insert_requests() here, the code is from
>> latest linux-next:
>>
>> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
>> 462                                 ┊ struct blk_mq_ctx *ctx,
>> 463                                 ┊ struct list_head *list, bool
>> run_queue_async)
>> 464 {
>> 465         struct elevator_queue *e;
>> 466         struct request_queue *q = hctx->queue;
>> 467
>> 468         /*
>> 469         ┊* blk_mq_sched_insert_requests() is called from flush plug
>> 470         ┊* context only, and hold one usage counter to prevent queue
>> 471         ┊* from being released.
>> 472         ┊*/
>> 473         percpu_ref_get(&q->q_usage_counter);
>> 474
>> 475         e = hctx->queue->elevator;
>> 476         if (e) {
>> 477                 e->type->ops.insert_requests(hctx, list, false);
>> 478         } else {
>> 479                 /*
>> 480                 ┊* try to issue requests directly if the hw queue isn't
>> 481                 ┊* busy in case of 'none' scheduler, and this way may
>> save
>> 482                 ┊* us one extra enqueue & dequeue to sw queue.
>> 483                 ┊*/
>> 484                 if (!hctx->dispatch_busy && !run_queue_async) {
>> 485                         blk_mq_run_dispatch_ops(hctx->queue,
>> 486                                 blk_mq_try_issue_list_directly(hctx,
>> list));
>> 487                         if (list_empty(list))
>> 488                                 goto out;
>> 489                 }
>> 490                 blk_mq_insert_requests(hctx, ctx, list);
>> 491         }
>> 492
>> 493         blk_mq_run_hw_queue(hctx, run_queue_async);
>> 494  out:
>> 495         percpu_ref_put(&q->q_usage_counter);
>> 496 }
>>
>> Here in line 487, if list_empty() is true, out label will skip
>> run_queue().
> 
> If list_empty() is true, run queue is guaranteed to run
> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> is returned from blk_mq_request_issue_directly().
> 
> 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
> 		if (ret != BLK_STS_OK) {
> 			if (ret == BLK_STS_RESOURCE ||
> 					ret == BLK_STS_DEV_RESOURCE) {
> 				blk_mq_request_bypass_insert(rq, false,
> 							list_empty(list));	//run queue
> 				break;
> 			}
> 			blk_mq_end_request(rq, ret);
> 			errors++;
> 		} else
> 			queued++;
> 
> So why do you try to add one extra run queue?

Hi, Ming

Perhaps I didn't explain the scenario clearly; please note that the list
must contain three requests:

1) rq1 is dispatched successfully.
2) rq2 fails to dispatch because no budget is available; in this case
    - rq2 is inserted into the dispatch list,
    - the list is not empty yet, so run queue is not called.
3) Finally, blk_mq_sched_insert_requests() dispatches rq3 successfully and
the list becomes empty, so run queue is still not called (see the sketch
below).
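
Putting the calls above on one timeline (a hand-drawn sketch; the function
names follow the linux-next code quoted earlier, and the flow is
abbreviated):

/*
 * plug list: rq1 -> rq2 -> rq3, 'none' scheduler
 *
 * blk_mq_plug_issue_direct()
 *   rq1: blk_mq_request_issue_directly() == BLK_STS_OK
 *   rq2: no budget -> BLK_STS_RESOURCE
 *        (meanwhile rq1 completes; scsi_run_queue_async() clears
 *         sdev->restarts and runs the queues, but ->dispatch is
 *         still empty at that point)
 *        blk_mq_request_bypass_insert(rq2, false, false)
 *        // run_queue == false: rq2 is not the last plug request
 *
 * blk_mq_dispatch_plug_list()
 *   blk_mq_sched_insert_requests()
 *     blk_mq_try_issue_list_directly()
 *       rq3: BLK_STS_OK -> list becomes empty
 *     list_empty(list) -> goto out   // skips blk_mq_run_hw_queue()
 *
 * rq2 is now stranded on hctx->dispatch and nobody is left to run
 * the queue.
 */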

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  3:31                       ` Yu Kuai
@ 2022-07-26  4:16                         ` Ming Lei
  2022-07-26  5:01                           ` Yufen Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-26  4:16 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Yu Kuai, Yufen Yu, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote:
> 在 2022/07/26 11:21, Ming Lei 写道:
> > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
> > > Hi, Ming
> > > 
> > > 在 2022/07/26 11:02, Ming Lei 写道:
> > > > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
> > > > > Hi, Ming
> > > > > 在 2022/07/26 10:32, Ming Lei 写道:
> > > > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> > > > > > > 在 2022/07/26 9:46, Ming Lei 写道:
> > > > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> > > > > > > > > Hi, Ming!
> > > > > > > > > 
> > > > > > > > > 在 2022/07/25 23:43, Ming Lei 写道:
> > > > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > > > > > > > > > Hi, Ming!
> > > > > > > > > > > 
> > > > > > > > > > > 在 2022/07/19 17:26, Ming Lei 写道:
> > > > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > > > > > > > > > We do test on a virtio scsi device (/dev/sda) and the default mq
> > > > > > > > > > > > > scheduler is 'none'. We found a IO hung as following:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > blk_finish_plug
> > > > > > > > > > > > >         blk_mq_plug_issue_direct
> > > > > > > > > > > > >             scsi_mq_get_budget
> > > > > > > > > > > > >             //get budget_token fail and sdev->restarts=1
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 			     	 scsi_end_request
> > > > > > > > > > > > > 				   scsi_run_queue_async
> > > > > > > > > > > > >                                          //sdev->restart=0 and run queue
> > > > > > > > > > > > > 
> > > > > > > > > > > > >            blk_mq_request_bypass_insert
> > > > > > > > > > > > >               //add request to hctx->dispatch list
> > > > > > > > > > > > 
> > > > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or
> > > > > > > > > > > > scsi_run_queue_async.
> > > > > > > > > > > > 
> > > > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > > > > > > > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to
> > > > > > > > > > > > hctx dispatch list?
> > > > > > > > > > > 
> > > > > > > > > > > I think Yufen is probably thinking about the following Concurrent
> > > > > > > > > > > scenario:
> > > > > > > > > > > 
> > > > > > > > > > > blk_mq_flush_plug_list
> > > > > > > > > > > # assume there are three rq
> > > > > > > > > > >       blk_mq_plug_issue_direct
> > > > > > > > > > >        blk_mq_request_issue_directly
> > > > > > > > > > >        # dispatch rq1, succeed
> > > > > > > > > > >        blk_mq_request_issue_directly
> > > > > > > > > > >        # dispatch rq2
> > > > > > > > > > >         __blk_mq_try_issue_directly
> > > > > > > > > > >          blk_mq_get_dispatch_budget
> > > > > > > > > > >           scsi_mq_get_budget
> > > > > > > > > > >            atomic_inc(&sdev->restarts);
> > > > > > > > > > >            # rq2 failed to get budget
> > > > > > > > > > >            # restarts is 1 now
> > > > > > > > > > >                                              scsi_end_request
> > > > > > > > > > >                                              # rq1 is completed
> > > > > > > > > > >                                              ┊scsi_run_queue_async
> > > > > > > > > > >                                              ┊ atomic_cmpxchg(&sdev->restarts,
> > > > > > > > > > > old, 0) == old
> > > > > > > > > > >                                              ┊ # set restarts to 0
> > > > > > > > > > >                                              ┊ blk_mq_run_hw_queues
> > > > > > > > > > >                                              ┊ # hctx->dispatch list is empty
> > > > > > > > > > >        blk_mq_request_bypass_insert
> > > > > > > > > > >        # insert rq2 to hctx->dispatch list
> > > > > > > > > > 
> > > > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > > > > > > > > > no matter if list_empty(list) is empty or not, queue will be run either from
> > > > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> > > > > > > > > 
> > > > > > > > > 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert()
> > > > > > > > > is called from blk_mq_try_issue_list_directly(), list_empty() won't
> > > > > > > > > pass, thus thus blk_mq_request_bypass_insert() won't run queue.
> > > > > > > > 
> > > > > > > > Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch
> > > > > > > > list, the loop is broken and blk_mq_try_issue_list_directly() returns to
> > > > > > > > blk_mq_sched_insert_requests() in which list_empty() is false, so
> > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue
> > > > > > > > is still run.
> > > > > > > > 
> > > > > > > > Also not sure why you make rq3 involved, since the list is local list on
> > > > > > > > stack, and it can be operated concurrently.
> > > > > > > 
> > > > > > > I make rq3 involved because there are some conditions that
> > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> > > > > > > blk_mq_sched_insert_requests():
> > > > > > 
> > > > > > The two won't be called if list_empty() is true, and will be called if
> > > > > > !list_empty().
> > > > > > 
> > > > > > That is why I mentioned run queue has been done after rq2 is added to
> > > > > > ->dispatch_list.
> > > > > 
> > > > > I don't follow here, it's right after rq2 is inserted to dispatch list,
> > > > > list is not empty, and blk_mq_sched_insert_requests() will be called.
> > > > > However, do you think that it's impossible that
> > > > > blk_mq_sched_insert_requests() can dispatch rq in the list and list
> > > > > will become empty?
> > > > 
> > > > Please take a look at blk_mq_sched_insert_requests().
> > > > 
> > > > When codes runs into blk_mq_sched_insert_requests(), the following
> > > > blk_mq_run_hw_queue() will be run always, how does list empty or not
> > > > make a difference there?
> > > 
> > > This is strange, always blk_mq_run_hw_queue() is exactly what Yufen
> > > tries to do in this patch, are we look at different code?
> > 
> > No.
> > 
> > > 
> > > I'm copying blk_mq_sched_insert_requests() here, the code is from
> > > latest linux-next:
> > > 
> > > 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
> > > 462                                 ┊ struct blk_mq_ctx *ctx,
> > > 463                                 ┊ struct list_head *list, bool
> > > run_queue_async)
> > > 464 {
> > > 465         struct elevator_queue *e;
> > > 466         struct request_queue *q = hctx->queue;
> > > 467
> > > 468         /*
> > > 469         ┊* blk_mq_sched_insert_requests() is called from flush plug
> > > 470         ┊* context only, and hold one usage counter to prevent queue
> > > 471         ┊* from being released.
> > > 472         ┊*/
> > > 473         percpu_ref_get(&q->q_usage_counter);
> > > 474
> > > 475         e = hctx->queue->elevator;
> > > 476         if (e) {
> > > 477                 e->type->ops.insert_requests(hctx, list, false);
> > > 478         } else {
> > > 479                 /*
> > > 480                 ┊* try to issue requests directly if the hw queue isn't
> > > 481                 ┊* busy in case of 'none' scheduler, and this way may
> > > save
> > > 482                 ┊* us one extra enqueue & dequeue to sw queue.
> > > 483                 ┊*/
> > > 484                 if (!hctx->dispatch_busy && !run_queue_async) {
> > > 485                         blk_mq_run_dispatch_ops(hctx->queue,
> > > 486                                 blk_mq_try_issue_list_directly(hctx,
> > > list));
> > > 487                         if (list_empty(list))
> > > 488                                 goto out;
> > > 489                 }
> > > 490                 blk_mq_insert_requests(hctx, ctx, list);
> > > 491         }
> > > 492
> > > 493         blk_mq_run_hw_queue(hctx, run_queue_async);
> > > 494  out:
> > > 495         percpu_ref_put(&q->q_usage_counter);
> > > 496 }
> > > 
> > > Here in line 487, if list_empty() is true, out label will skip
> > > run_queue().
> > 
> > If list_empty() is true, run queue is guaranteed to run
> > in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> > is returned from blk_mq_request_issue_directly().
> > 
> > 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
> > 		if (ret != BLK_STS_OK) {
> > 			if (ret == BLK_STS_RESOURCE ||
> > 					ret == BLK_STS_DEV_RESOURCE) {
> > 				blk_mq_request_bypass_insert(rq, false,
> > 							list_empty(list));	//run queue
> > 				break;
> > 			}
> > 			blk_mq_end_request(rq, ret);
> > 			errors++;
> > 		} else
> > 			queued++;
> > 
> > So why do you try to add one extra run queue?
> 
> Hi, Ming
> 
> Perhaps I didn't explain the scenario clearly, please notice that list
> contain three rq is required.
> 
> 1) rq1 is dispatched successfuly
> 2) rq2 failed to dispatch due to no budget, in this case
>    - rq2 will insert to dispatch list
>    - list is not emply yet, run queue won't called

In that case, blk_mq_try_issue_list_directly() returns to
blk_mq_sched_insert_requests() immediately, and then blk_mq_insert_requests()
and blk_mq_run_hw_queue() are run from blk_mq_sched_insert_requests()
because the list isn't empty.

Right?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  4:16                         ` Ming Lei
@ 2022-07-26  5:01                           ` Yufen Yu
  2022-07-26  7:39                             ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Yufen Yu @ 2022-07-26  5:01 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai; +Cc: Yu Kuai, axboe, linux-block, hch, zhangyi (F)



On 2022/7/26 12:16, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote:
>> 在 2022/07/26 11:21, Ming Lei 写道:
>>> On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
>>>> Hi, Ming
>>>>
>>>> 在 2022/07/26 11:02, Ming Lei 写道:
>>>>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
>>>>>> Hi, Ming
>>>>>> 在 2022/07/26 10:32, Ming Lei 写道:
>>>>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>>>>>>>> 在 2022/07/26 9:46, Ming Lei 写道:
>>>>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>>>>>>>>>> Hi, Ming!
>>>>>>>>>>
>>>>>>>>>> 在 2022/07/25 23:43, Ming Lei 写道:
>>>>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>>>>>>>>>>>> Hi, Ming!
>>>>>>>>>>>>
>>>>>>>>>>>> 在 2022/07/19 17:26, Ming Lei 写道:
>>>>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>>>>>>>>>>>>>> We do test on a virtio scsi device (/dev/sda) and the default mq
>>>>>>>>>>>>>> scheduler is 'none'. We found a IO hung as following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> blk_finish_plug
>>>>>>>>>>>>>>          blk_mq_plug_issue_direct
>>>>>>>>>>>>>>              scsi_mq_get_budget
>>>>>>>>>>>>>>              //get budget_token fail and sdev->restarts=1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 			     	 scsi_end_request
>>>>>>>>>>>>>> 				   scsi_run_queue_async
>>>>>>>>>>>>>>                                           //sdev->restart=0 and run queue
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             blk_mq_request_bypass_insert
>>>>>>>>>>>>>>                //add request to hctx->dispatch list
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or
>>>>>>>>>>>>> scsi_run_queue_async.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's
>>>>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a
>>>>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to
>>>>>>>>>>>>> hctx dispatch list?
>>>>>>>>>>>>
>>>>>>>>>>>> I think Yufen is probably thinking about the following Concurrent
>>>>>>>>>>>> scenario:
>>>>>>>>>>>>
>>>>>>>>>>>> blk_mq_flush_plug_list
>>>>>>>>>>>> # assume there are three rq
>>>>>>>>>>>>        blk_mq_plug_issue_direct
>>>>>>>>>>>>         blk_mq_request_issue_directly
>>>>>>>>>>>>         # dispatch rq1, succeed
>>>>>>>>>>>>         blk_mq_request_issue_directly
>>>>>>>>>>>>         # dispatch rq2
>>>>>>>>>>>>          __blk_mq_try_issue_directly
>>>>>>>>>>>>           blk_mq_get_dispatch_budget
>>>>>>>>>>>>            scsi_mq_get_budget
>>>>>>>>>>>>             atomic_inc(&sdev->restarts);
>>>>>>>>>>>>             # rq2 failed to get budget
>>>>>>>>>>>>             # restarts is 1 now
>>>>>>>>>>>>                                               scsi_end_request
>>>>>>>>>>>>                                               # rq1 is completed
>>>>>>>>>>>>                                               ┊scsi_run_queue_async
>>>>>>>>>>>>                                               ┊ atomic_cmpxchg(&sdev->restarts,
>>>>>>>>>>>> old, 0) == old
>>>>>>>>>>>>                                               ┊ # set restarts to 0
>>>>>>>>>>>>                                               ┊ blk_mq_run_hw_queues
>>>>>>>>>>>>                                               ┊ # hctx->dispatch list is empty
>>>>>>>>>>>>         blk_mq_request_bypass_insert
>>>>>>>>>>>>         # insert rq2 to hctx->dispatch list
>>>>>>>>>>>
>>>>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>>>>>>>>>>> no matter if list_empty(list) is empty or not, queue will be run either from
>>>>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
>>>>>>>>>>
>>>>>>>>>> 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert()
>>>>>>>>>> is called from blk_mq_try_issue_list_directly(), list_empty() won't
>>>>>>>>>> pass, thus thus blk_mq_request_bypass_insert() won't run queue.
>>>>>>>>>
>>>>>>>>> Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch
>>>>>>>>> list, the loop is broken and blk_mq_try_issue_list_directly() returns to
>>>>>>>>> blk_mq_sched_insert_requests() in which list_empty() is false, so
>>>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue
>>>>>>>>> is still run.
>>>>>>>>>
>>>>>>>>> Also not sure why you make rq3 involved, since the list is local list on
>>>>>>>>> stack, and it can be operated concurrently.
>>>>>>>>
>>>>>>>> I make rq3 involved because there are some conditions that
>>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>>>>>>>> blk_mq_sched_insert_requests():
>>>>>>>
>>>>>>> The two won't be called if list_empty() is true, and will be called if
>>>>>>> !list_empty().
>>>>>>>
>>>>>>> That is why I mentioned run queue has been done after rq2 is added to
>>>>>>> ->dispatch_list.
>>>>>>
>>>>>> I don't follow here, it's right after rq2 is inserted to dispatch list,
>>>>>> list is not empty, and blk_mq_sched_insert_requests() will be called.
>>>>>> However, do you think that it's impossible that
>>>>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list
>>>>>> will become empty?
>>>>>
>>>>> Please take a look at blk_mq_sched_insert_requests().
>>>>>
>>>>> When codes runs into blk_mq_sched_insert_requests(), the following
>>>>> blk_mq_run_hw_queue() will be run always, how does list empty or not
>>>>> make a difference there?
>>>>
>>>> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen
>>>> tries to do in this patch, are we look at different code?
>>>
>>> No.
>>>
>>>>
>>>> I'm copying blk_mq_sched_insert_requests() here, the code is from
>>>> latest linux-next:
>>>>
>>>> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
>>>> 462                                 ┊ struct blk_mq_ctx *ctx,
>>>> 463                                 ┊ struct list_head *list, bool
>>>> run_queue_async)
>>>> 464 {
>>>> 465         struct elevator_queue *e;
>>>> 466         struct request_queue *q = hctx->queue;
>>>> 467
>>>> 468         /*
>>>> 469         ┊* blk_mq_sched_insert_requests() is called from flush plug
>>>> 470         ┊* context only, and hold one usage counter to prevent queue
>>>> 471         ┊* from being released.
>>>> 472         ┊*/
>>>> 473         percpu_ref_get(&q->q_usage_counter);
>>>> 474
>>>> 475         e = hctx->queue->elevator;
>>>> 476         if (e) {
>>>> 477                 e->type->ops.insert_requests(hctx, list, false);
>>>> 478         } else {
>>>> 479                 /*
>>>> 480                 ┊* try to issue requests directly if the hw queue isn't
>>>> 481                 ┊* busy in case of 'none' scheduler, and this way may
>>>> save
>>>> 482                 ┊* us one extra enqueue & dequeue to sw queue.
>>>> 483                 ┊*/
>>>> 484                 if (!hctx->dispatch_busy && !run_queue_async) {
>>>> 485                         blk_mq_run_dispatch_ops(hctx->queue,
>>>> 486                                 blk_mq_try_issue_list_directly(hctx,
>>>> list));
>>>> 487                         if (list_empty(list))
>>>> 488                                 goto out;
>>>> 489                 }
>>>> 490                 blk_mq_insert_requests(hctx, ctx, list);
>>>> 491         }
>>>> 492
>>>> 493         blk_mq_run_hw_queue(hctx, run_queue_async);
>>>> 494  out:
>>>> 495         percpu_ref_put(&q->q_usage_counter);
>>>> 496 }
>>>>
>>>> Here in line 487, if list_empty() is true, out label will skip
>>>> run_queue().
>>>
>>> If list_empty() is true, run queue is guaranteed to run
>>> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
>>> is returned from blk_mq_request_issue_directly().
>>>
>>> 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
>>> 		if (ret != BLK_STS_OK) {
>>> 			if (ret == BLK_STS_RESOURCE ||
>>> 					ret == BLK_STS_DEV_RESOURCE) {
>>> 				blk_mq_request_bypass_insert(rq, false,
>>> 							list_empty(list));	//run queue
>>> 				break;
>>> 			}
>>> 			blk_mq_end_request(rq, ret);
>>> 			errors++;
>>> 		} else
>>> 			queued++;
>>>
>>> So why do you try to add one extra run queue?
>>
>> Hi, Ming
>>
>> Perhaps I didn't explain the scenario clearly, please notice that list
>> contain three rq is required.
>>
>> 1) rq1 is dispatched successfuly
>> 2) rq2 failed to dispatch due to no budget, in this case
>>     - rq2 will insert to dispatch list
>>     - list is not emply yet, run queue won't called
> 
> In the case, blk_mq_try_issue_list_directly() returns to
> blk_mq_sched_insert_requests() immediately, then blk_mq_insert_requests()
> and blk_mq_run_hw_queue() will be run from blk_mq_sched_insert_requests()
> because the list isn't empty.
> 
> Right?
> 

Hi Ming,

Here rq2 fails in blk_mq_plug_issue_direct(), called from
blk_mq_flush_plug_list(), not in blk_mq_sched_insert_requests():

blk_mq_flush_plug_list

     if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
         struct request_queue *q;

         rq = rq_list_peek(&plug->mq_list);
         q = rq->q;

         /*
          * Peek first request and see if we have a ->queue_rqs() hook.
          * If we do, we can dispatch the whole plug list in one go. We
          * already know at this point that all requests belong to the
          * same queue, caller must ensure that's the case.
          *
          * Since we pass off the full list to the driver at this point,
          * we do not increment the active request count for the queue.
          * Bypass shared tags for now because of that.
          */
         if (q->mq_ops->queue_rqs &&
             !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
             blk_mq_run_dispatch_ops(q,
                 __blk_mq_flush_plug_list(q, plug));
             if (rq_list_empty(plug->mq_list))
                 return;
         }

         blk_mq_run_dispatch_ops(q,
                blk_mq_plug_issue_direct(plug, false));  // rq2 is inserted into the dispatch list here
         if (rq_list_empty(plug->mq_list))
             return;
     }

     do {
         blk_mq_dispatch_plug_list(plug, from_schedule);  // goes on to issue rq3, which succeeds
     } while (!rq_list_empty(plug->mq_list));
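
For reference, the failure branch inside blk_mq_plug_issue_direct() looks
roughly like this (copied from memory of the same tree and abbreviated, so
details may differ slightly):

    while ((rq = rq_list_pop(&plug->mq_list))) {
        bool last = rq_list_empty(plug->mq_list);
        blk_status_t ret;
        ...
        ret = blk_mq_request_issue_directly(rq, last);
        switch (ret) {
        case BLK_STS_OK:
            queued++;
            break;
        case BLK_STS_RESOURCE:
        case BLK_STS_DEV_RESOURCE:
            /* run_queue is 'last', so rq2 (not the last request of
             * the plug list) is parked on ->dispatch without a
             * queue run */
            blk_mq_request_bypass_insert(rq, false, last);
            blk_mq_commit_rqs(hctx, &queued, from_schedule);
            return;
        ...
        }
    }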

> 
> Thanks,
> Ming
> 
> .

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  5:01                           ` Yufen Yu
@ 2022-07-26  7:39                             ` Ming Lei
  2022-07-26  9:20                               ` Yufen Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2022-07-26  7:39 UTC (permalink / raw)
  To: Yufen Yu; +Cc: Yu Kuai, Yu Kuai, axboe, linux-block, hch, zhangyi (F)

On Tue, Jul 26, 2022 at 01:01:41PM +0800, Yufen Yu wrote:
> 
> 
> On 2022/7/26 12:16, Ming Lei wrote:
> > On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote:
> > > 在 2022/07/26 11:21, Ming Lei 写道:
> > > > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote:
> > > > > Hi, Ming
> > > > > 
> > > > > 在 2022/07/26 11:02, Ming Lei 写道:
> > > > > > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
> > > > > > > Hi, Ming
> > > > > > > 在 2022/07/26 10:32, Ming Lei 写道:
> > > > > > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> > > > > > > > > 在 2022/07/26 9:46, Ming Lei 写道:
> > > > > > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> > > > > > > > > > > Hi, Ming!
> > > > > > > > > > > 
> > > > > > > > > > > 在 2022/07/25 23:43, Ming Lei 写道:
> > > > > > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> > > > > > > > > > > > > Hi, Ming!
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 在 2022/07/19 17:26, Ming Lei 写道:
> > > > > > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> > > > > > > > > > > > > > > We do test on a virtio scsi device (/dev/sda) and the default mq
> > > > > > > > > > > > > > > scheduler is 'none'. We found a IO hung as following:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > blk_finish_plug
> > > > > > > > > > > > > > >          blk_mq_plug_issue_direct
> > > > > > > > > > > > > > >              scsi_mq_get_budget
> > > > > > > > > > > > > > >              //get budget_token fail and sdev->restarts=1
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			     	 scsi_end_request
> > > > > > > > > > > > > > > 				   scsi_run_queue_async
> > > > > > > > > > > > > > >                                           //sdev->restart=0 and run queue
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >             blk_mq_request_bypass_insert
> > > > > > > > > > > > > > >                //add request to hctx->dispatch list
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or
> > > > > > > > > > > > > > scsi_run_queue_async.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's
> > > > > > > > > > > > > > responsibility to re-run queue for moving on. Can you investigate a
> > > > > > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to
> > > > > > > > > > > > > > hctx dispatch list?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think Yufen is probably thinking about the following Concurrent
> > > > > > > > > > > > > scenario:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > blk_mq_flush_plug_list
> > > > > > > > > > > > > # assume there are three rq
> > > > > > > > > > > > >        blk_mq_plug_issue_direct
> > > > > > > > > > > > >         blk_mq_request_issue_directly
> > > > > > > > > > > > >         # dispatch rq1, succeed
> > > > > > > > > > > > >         blk_mq_request_issue_directly
> > > > > > > > > > > > >         # dispatch rq2
> > > > > > > > > > > > >          __blk_mq_try_issue_directly
> > > > > > > > > > > > >           blk_mq_get_dispatch_budget
> > > > > > > > > > > > >            scsi_mq_get_budget
> > > > > > > > > > > > >             atomic_inc(&sdev->restarts);
> > > > > > > > > > > > >             # rq2 failed to get budget
> > > > > > > > > > > > >             # restarts is 1 now
> > > > > > > > > > > > >                                               scsi_end_request
> > > > > > > > > > > > >                                               # rq1 is completed
> > > > > > > > > > > > >                                               ┊scsi_run_queue_async
> > > > > > > > > > > > >                                               ┊ atomic_cmpxchg(&sdev->restarts,
> > > > > > > > > > > > > old, 0) == old
> > > > > > > > > > > > >                                               ┊ # set restarts to 0
> > > > > > > > > > > > >                                               ┊ blk_mq_run_hw_queues
> > > > > > > > > > > > >                                               ┊ # hctx->dispatch list is empty
> > > > > > > > > > > > >         blk_mq_request_bypass_insert
> > > > > > > > > > > > >         # insert rq2 to hctx->dispatch list
> > > > > > > > > > > > 
> > > > > > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> > > > > > > > > > > > no matter if list_empty(list) is empty or not, queue will be run either from
> > > > > > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests().
> > > > > > > > > > > 
> > > > > > > > > > > 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert()
> > > > > > > > > > > is called from blk_mq_try_issue_list_directly(), list_empty() won't
> > > > > > > > > > > pass, thus thus blk_mq_request_bypass_insert() won't run queue.
> > > > > > > > > > 
> > > > > > > > > > Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch
> > > > > > > > > > list, the loop is broken and blk_mq_try_issue_list_directly() returns to
> > > > > > > > > > blk_mq_sched_insert_requests() in which list_empty() is false, so
> > > > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue
> > > > > > > > > > is still run.
> > > > > > > > > > 
> > > > > > > > > > Also not sure why you make rq3 involved, since the list is local list on
> > > > > > > > > > stack, and it can be operated concurrently.
> > > > > > > > > 
> > > > > > > > > I make rq3 involved because there are some conditions that
> > > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> > > > > > > > > blk_mq_sched_insert_requests():
> > > > > > > > 
> > > > > > > > The two won't be called if list_empty() is true, and will be called if
> > > > > > > > !list_empty().
> > > > > > > > 
> > > > > > > > That is why I mentioned run queue has been done after rq2 is added to
> > > > > > > > ->dispatch_list.
> > > > > > > 
> > > > > > > I don't follow here, it's right after rq2 is inserted to dispatch list,
> > > > > > > list is not empty, and blk_mq_sched_insert_requests() will be called.
> > > > > > > However, do you think that it's impossible that
> > > > > > > blk_mq_sched_insert_requests() can dispatch rq in the list and list
> > > > > > > will become empty?
> > > > > > 
> > > > > > Please take a look at blk_mq_sched_insert_requests().
> > > > > > 
> > > > > > When codes runs into blk_mq_sched_insert_requests(), the following
> > > > > > blk_mq_run_hw_queue() will be run always, how does list empty or not
> > > > > > make a difference there?
> > > > > 
> > > > > This is strange, always blk_mq_run_hw_queue() is exactly what Yufen
> > > > > tries to do in this patch, are we look at different code?
> > > > 
> > > > No.
> > > > 
> > > > > 
> > > > > I'm copying blk_mq_sched_insert_requests() here, the code is from
> > > > > latest linux-next:
> > > > > 
> > > > > 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
> > > > > 462                                 ┊ struct blk_mq_ctx *ctx,
> > > > > 463                                 ┊ struct list_head *list, bool
> > > > > run_queue_async)
> > > > > 464 {
> > > > > 465         struct elevator_queue *e;
> > > > > 466         struct request_queue *q = hctx->queue;
> > > > > 467
> > > > > 468         /*
> > > > > 469         ┊* blk_mq_sched_insert_requests() is called from flush plug
> > > > > 470         ┊* context only, and hold one usage counter to prevent queue
> > > > > 471         ┊* from being released.
> > > > > 472         ┊*/
> > > > > 473         percpu_ref_get(&q->q_usage_counter);
> > > > > 474
> > > > > 475         e = hctx->queue->elevator;
> > > > > 476         if (e) {
> > > > > 477                 e->type->ops.insert_requests(hctx, list, false);
> > > > > 478         } else {
> > > > > 479                 /*
> > > > > 480                 ┊* try to issue requests directly if the hw queue isn't
> > > > > 481                 ┊* busy in case of 'none' scheduler, and this way may
> > > > > save
> > > > > 482                 ┊* us one extra enqueue & dequeue to sw queue.
> > > > > 483                 ┊*/
> > > > > 484                 if (!hctx->dispatch_busy && !run_queue_async) {
> > > > > 485                         blk_mq_run_dispatch_ops(hctx->queue,
> > > > > 486                                 blk_mq_try_issue_list_directly(hctx,
> > > > > list));
> > > > > 487                         if (list_empty(list))
> > > > > 488                                 goto out;
> > > > > 489                 }
> > > > > 490                 blk_mq_insert_requests(hctx, ctx, list);
> > > > > 491         }
> > > > > 492
> > > > > 493         blk_mq_run_hw_queue(hctx, run_queue_async);
> > > > > 494  out:
> > > > > 495         percpu_ref_put(&q->q_usage_counter);
> > > > > 496 }
> > > > > 
> > > > > Here in line 487, if list_empty() is true, out label will skip
> > > > > run_queue().
> > > > 
> > > > If list_empty() is true, run queue is guaranteed to run
> > > > in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> > > > is returned from blk_mq_request_issue_directly().
> > > > 
> > > > 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
> > > > 		if (ret != BLK_STS_OK) {
> > > > 			if (ret == BLK_STS_RESOURCE ||
> > > > 					ret == BLK_STS_DEV_RESOURCE) {
> > > > 				blk_mq_request_bypass_insert(rq, false,
> > > > 							list_empty(list));	//run queue
> > > > 				break;
> > > > 			}
> > > > 			blk_mq_end_request(rq, ret);
> > > > 			errors++;
> > > > 		} else
> > > > 			queued++;
> > > > 
> > > > So why do you try to add one extra run queue?
> > > 
> > > Hi, Ming
> > > 
> > > Perhaps I didn't explain the scenario clearly, please notice that list
> > > contain three rq is required.
> > > 
> > > 1) rq1 is dispatched successfuly
> > > 2) rq2 failed to dispatch due to no budget, in this case
> > >     - rq2 will insert to dispatch list
> > >     - list is not emply yet, run queue won't called
> > 
> > In the case, blk_mq_try_issue_list_directly() returns to
> > blk_mq_sched_insert_requests() immediately, then blk_mq_insert_requests()
> > and blk_mq_run_hw_queue() will be run from blk_mq_sched_insert_requests()
> > because the list isn't empty.
> > 
> > Right?
> > 
> 
> hi Ming,
> 
> Here rq2 fail from blk_mq_plug_issue_direct() in blk_mq_flush_plug_list(),
> not blk_mq_sched_insert_requests

OK, just wondering why Yufen's patch touches
blk_mq_sched_insert_requests().

Here the issue is in blk_mq_plug_issue_direct() itself: it is wrong to use the
last request of the plug list to decide whether a queue run is needed, since
the remaining requests in the plug list may be from other hctxs. The simplest
fix could be to always pass run_queue as true to blk_mq_request_bypass_insert().
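
For illustration, that change would look something like the following (a
sketch against the branch quoted earlier in the thread, not a tested patch):

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 		case BLK_STS_RESOURCE:
 		case BLK_STS_DEV_RESOURCE:
-			blk_mq_request_bypass_insert(rq, false, last);
+			/*
+			 * Always run the queue: the remaining requests in the
+			 * plug list may belong to other hctxs, and none of
+			 * them is guaranteed to rerun this one.
+			 */
+			blk_mq_request_bypass_insert(rq, false, true);
 			blk_mq_commit_rqs(hctx, &queued, from_schedule);
 			return;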


Thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] blk-mq: run queue after issuing the last request of the plug list
  2022-07-26  7:39                             ` Ming Lei
@ 2022-07-26  9:20                               ` Yufen Yu
  0 siblings, 0 replies; 19+ messages in thread
From: Yufen Yu @ 2022-07-26  9:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yu Kuai, Yu Kuai, axboe, linux-block, hch, zhangyi (F)



On 2022/7/26 15:39, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 01:01:41PM +0800, Yufen Yu wrote:
>>
>>>
>>
>> hi Ming,
>>
>> Here rq2 fail from blk_mq_plug_issue_direct() in blk_mq_flush_plug_list(),
>> not blk_mq_sched_insert_requests
> 
> OK, just wondering why Yufen's patch touches
> blk_mq_sched_insert_requests().
> 
> Here the issue is in blk_mq_plug_issue_direct() itself, it is wrong to use last
> request of plug list to decide if run queue is needed since all the remained
> requests in plug list may be from other hctxs, and the simplest fix could be pass
> run_queue as true always to blk_mq_request_bypass_insert().
> 


OK, thanks for your suggestion and I will send v2.

Thanks,
Yufen

> 
> Thanks,
> Ming
> 
> .

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-07-26  9:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-18 12:35 [PATCH] blk-mq: run queue after issuing the last request of the plug list Yufen Yu
2022-07-19  9:26 ` Ming Lei
2022-07-19 11:00   ` Yufen Yu
2022-07-23  2:50   ` Yu Kuai
2022-07-25 15:43     ` Ming Lei
2022-07-26  1:08       ` Yu Kuai
2022-07-26  1:46         ` Ming Lei
2022-07-26  2:08           ` Yu Kuai
2022-07-26  2:32             ` Ming Lei
2022-07-26  2:52               ` Yu Kuai
2022-07-26  3:02                 ` Ming Lei
2022-07-26  3:14                   ` Yu Kuai
2022-07-26  3:21                     ` Ming Lei
2022-07-26  3:31                       ` Yufen Yu
2022-07-26  3:31                       ` Yu Kuai
2022-07-26  4:16                         ` Ming Lei
2022-07-26  5:01                           ` Yufen Yu
2022-07-26  7:39                             ` Ming Lei
2022-07-26  9:20                               ` Yufen Yu
