* [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Hannes Reinecke @ 2019-03-26 12:07 UTC
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke,
	Hannes Reinecke

When a queue is dying or dead there is no point in calling
blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
so might crash the machine as the queue structures are in the
process of being deleted.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a9c181603cbd..b1eeba38bc79 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (!blk_queue_dying(q) && !blk_queue_dead(q))
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
-- 
2.16.4
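
For reference, the two predicates used in the patch are plain queue-flag
tests; a simplified sketch of their definitions (the real ones live in
include/linux/blkdev.h):

	#define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
	#define blk_queue_dead(q)	test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)

QUEUE_FLAG_DYING is set when queue teardown begins and QUEUE_FLAG_DEAD
once draining has finished, so the patch skips running the hardware
queues for any queue that has entered teardown.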



* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Bart Van Assche @ 2019-03-26 13:43 UTC
  To: Hannes Reinecke, Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke

On 3/26/19 5:07 AM, Hannes Reinecke wrote:
> When a queue is dying or dead there is no point in calling
> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
> so might crash the machine as the queue structures are in the
> process of being deleted.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>   block/blk-mq.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a9c181603cbd..b1eeba38bc79 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>   	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>   
>   	/* dispatch requests which are inserted during quiescing */
> -	blk_mq_run_hw_queues(q, true);
> +	if (!blk_queue_dying(q) && !blk_queue_dead(q))
> +		blk_mq_run_hw_queues(q, true);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

Hi Hannes,

Please provide more context information. In the "dead" state the queue 
must be run to make sure that all requests that were queued before the 
"dead" state get processed. The blk_cleanup_queue() function is 
responsible for stopping all code that can run the queue after all 
requests have finished and before destruction of the data structures 
needed for request processing starts.
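
Schematically, that ordering (a simplified sketch of blk_cleanup_queue(),
not the literal block/blk-core.c code):

	void blk_cleanup_queue(struct request_queue *q)
	{
		blk_set_queue_dying(q);		/* refuse new requests */
		blk_freeze_queue(q);		/* drain in-flight requests */
		blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
		...
		blk_mq_free_queue(q);		/* free hctx structures */
	}

The free step comes only after everything that can run the queue has
been stopped, which is the invariant appealed to above.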

Bart.


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Hannes Reinecke @ 2019-03-26 14:08 UTC
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke

On 3/26/19 2:43 PM, Bart Van Assche wrote:
> On 3/26/19 5:07 AM, Hannes Reinecke wrote:
>> When a queue is dying or dead there is no point in calling
>> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
>> so might crash the machine as the queue structures are in the
>> process of being deleted.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>   block/blk-mq.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a9c181603cbd..b1eeba38bc79 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>>       blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>>       /* dispatch requests which are inserted during quiescing */
>> -    blk_mq_run_hw_queues(q, true);
>> +    if (!blk_queue_dying(q) && !blk_queue_dead(q))
>> +        blk_mq_run_hw_queues(q, true);
>>   }
>>   EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> 
> Hi Hannes,
> 
> Please provide more context information. In the "dead" state the queue 
> must be run to make sure that all requests that were queued before the 
> "dead" state get processed. The blk_cleanup_queue() function is 
> responsible for stopping all code that can run the queue after all 
> requests have finished and before destruction of the data structures 
> needed for request processing starts.
> 
I have a crash with two processes competing for the same controller:

#0  0xffffffff983d3bcb in sbitmap_any_bit_set (sb=0xffff8a1b874ba0d8)
     at ../lib/sbitmap.c:181
#1  0xffffffff98366c05 in blk_mq_hctx_has_pending (hctx=0xffff8a1b874ba000)
     at ../block/blk-mq.c:66
#2  0xffffffff98366c85 in blk_mq_run_hw_queues (q=0xffff8a1b874ba0d8, 
async=true) at ../block/blk-mq.c:1292
#3  0xffffffff98366d3a in blk_mq_unquiesce_queue (q=<optimized out>)
     at ../block/blk-mq.c:265
#4  0xffffffffc01f3e0e in nvme_start_queues (ctrl=<optimized out>)
     at ../drivers/nvme/host/core.c:3658
#5  0xffffffffc01e843c in nvme_fc_delete_association 
(ctrl=0xffff8a1f9be5a000)
     at ../drivers/nvme/host/fc.c:2843
#6  0xffffffffc01e8544 in nvme_fc_delete_association (ctrl=<optimized out>)
     at ../drivers/nvme/host/fc.c:2918
#7  0xffffffffc01e8544 in __nvme_fc_terminate_io (ctrl=0xffff8a1f9be5a000)
     at ../drivers/nvme/host/fc.c:2911
#8  0xffffffffc01e8f09 in nvme_fc_reset_ctrl_work (work=0xffff8a1f9be5a6d0)
     at ../drivers/nvme/host/fc.c:2927
#9  0xffffffff980a224a in process_one_work (worker=0xffff8a1b73934f00, 
work=0xffff8a1f9be5a6d0) at ../kernel/workqueue.c:2092
#10 0xffffffff980a249b in worker_thread (__worker=0xffff8a1b73934f00)
     at ../kernel/workqueue.c:2226

#7  0xffffffff986d2e9a in wait_for_completion (x=0xffffa48eca88bc40)
     at ../kernel/sched/completion.c:125
#8  0xffffffff980f25ae in __synchronize_srcu (sp=0xffffffff9914fc20 
<debugfs_srcu>, do_norm=<optimized out>) at ../kernel/rcu/srcutree.c:851
#9  0xffffffff982d18b1 in debugfs_remove_recursive (dentry=<optimized out>)
     at ../fs/debugfs/inode.c:741
#10 0xffffffff98398ac5 in blk_mq_debugfs_unregister_hctx 
(hctx=0xffff8a1b7cccc000) at ../block/blk-mq-debugfs.c:897
#11 0xffffffff983661cf in blk_mq_exit_hctx (q=0xffff8a1f825e4040, 
set=0xffff8a1f9be5a0c0, hctx=0xffff8a1b7cccc000, hctx_idx=2) at 
../block/blk-mq.c:1987
#12 0xffffffff9836946a in blk_mq_exit_hw_queues (nr_queue=<optimized 
out>, set=<optimized out>, q=<optimized out>) at ../block/blk-mq.c:2017
#13 0xffffffff9836946a in blk_mq_free_queue (q=0xffff8a1f825e4040)
     at ../block/blk-mq.c:2506
#14 0xffffffff9835aac5 in blk_cleanup_queue (q=0xffff8a1f825e4040)
     at ../block/blk-core.c:691
#15 0xffffffffc01f5bc8 in nvme_ns_remove (ns=0xffff8a1f819e8f80)
     at ../drivers/nvme/host/core.c:3138
#16 0xffffffffc01f6fea in nvme_validate_ns (ctrl=0xffff8a1f9be5a308, nsid=5)
     at ../drivers/nvme/host/core.c:3164
#17 0xffffffffc01f9053 in nvme_scan_ns_list (nn=<optimized out>, 
ctrl=<optimized out>) at ../drivers/nvme/host/core.c:3202
#18 0xffffffffc01f9053 in nvme_scan_work (work=<optimized out>)
     at ../drivers/nvme/host/core.c:3280
#19 0xffffffff980a224a in process_one_work (worker=0xffff8a1b7349f6c0, 
work=0xffff8a1f9be5aba0) at ../kernel/workqueue.c:2092

Point is that the queue is already dead by the time nvme_start_queues() 
tries to flush existing requests (of which there are none, of course).
I had been looking into synchronizing scan_work and reset_work, but then 
I wasn't sure if that wouldn't deadlock somewhere.
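
Schematically, the race between the two workers (reconstructed from the
backtraces above):

	nvme_scan_work                     nvme_fc_reset_ctrl_work
	--------------                     -----------------------
	nvme_ns_remove()
	  blk_cleanup_queue(ns->queue)
	    queue marked DYING/DEAD
	    blk_mq_exit_hctx() starts
	    freeing hctx structures
	                                   nvme_start_queues()
	                                     blk_mq_unquiesce_queue(ns->queue)
	                                       blk_mq_run_hw_queues()
	                                         touches the hctx being freed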

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Bart Van Assche @ 2019-03-26 15:12 UTC
  To: Hannes Reinecke, Jens Axboe, James Smart
  Cc: Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke

On Tue, 2019-03-26 at 15:08 +0100, Hannes Reinecke wrote:
> On 3/26/19 2:43 PM, Bart Van Assche wrote:
> > On 3/26/19 5:07 AM, Hannes Reinecke wrote:
> > > When a queue is dying or dead there is no point in calling
> > > blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
> > > so might crash the machine as the queue structures are in the
> > > process of being deleted.
> > > 
> > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > ---
> > >   block/blk-mq.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index a9c181603cbd..b1eeba38bc79 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > >       blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > >       /* dispatch requests which are inserted during quiescing */
> > > -    blk_mq_run_hw_queues(q, true);
> > > +    if (!blk_queue_dying(q) && !blk_queue_dead(q))
> > > +        blk_mq_run_hw_queues(q, true);
> > >   }
> > >   EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> > 
> > Hi Hannes,
> > 
> > Please provide more context information. In the "dead" state the queue 
> > must be run to make sure that all requests that were queued before the 
> > "dead" state get processed. The blk_cleanup_queue() function is 
> > responsible for stopping all code that can run the queue after all 
> > requests have finished and before destruction of the data structures 
> > needed for request processing starts.
> > 
> 
> I have a crash with two processes competing for the same controller:
> 
> #0  0xffffffff983d3bcb in sbitmap_any_bit_set (sb=0xffff8a1b874ba0d8)
>      at ../lib/sbitmap.c:181
> #1  0xffffffff98366c05 in blk_mq_hctx_has_pending (hctx=0xffff8a1b874ba000)
>      at ../block/blk-mq.c:66
> #2  0xffffffff98366c85 in blk_mq_run_hw_queues (q=0xffff8a1b874ba0d8, 
> async=true) at ../block/blk-mq.c:1292
> #3  0xffffffff98366d3a in blk_mq_unquiesce_queue (q=<optimized out>)
>      at ../block/blk-mq.c:265
> #4  0xffffffffc01f3e0e in nvme_start_queues (ctrl=<optimized out>)
>      at ../drivers/nvme/host/core.c:3658
> #5  0xffffffffc01e843c in nvme_fc_delete_association 
> (ctrl=0xffff8a1f9be5a000)
>      at ../drivers/nvme/host/fc.c:2843
> #6  0xffffffffc01e8544 in nvme_fc_delete_association (ctrl=<optimized out>)
>      at ../drivers/nvme/host/fc.c:2918
> #7  0xffffffffc01e8544 in __nvme_fc_terminate_io (ctrl=0xffff8a1f9be5a000)
>      at ../drivers/nvme/host/fc.c:2911
> #8  0xffffffffc01e8f09 in nvme_fc_reset_ctrl_work (work=0xffff8a1f9be5a6d0)
>      at ../drivers/nvme/host/fc.c:2927
> #9  0xffffffff980a224a in process_one_work (worker=0xffff8a1b73934f00, 
> work=0xffff8a1f9be5a6d0) at ../kernel/workqueue.c:2092
> #10 0xffffffff980a249b in worker_thread (__worker=0xffff8a1b73934f00)
>      at ../kernel/workqueue.c:2226
> 
> #7  0xffffffff986d2e9a in wait_for_completion (x=0xffffa48eca88bc40)
>      at ../kernel/sched/completion.c:125
> #8  0xffffffff980f25ae in __synchronize_srcu (sp=0xffffffff9914fc20 
> <debugfs_srcu>, do_norm=<optimized out>) at ../kernel/rcu/srcutree.c:851
> #9  0xffffffff982d18b1 in debugfs_remove_recursive (dentry=<optimized out>)
>      at ../fs/debugfs/inode.c:741
> #10 0xffffffff98398ac5 in blk_mq_debugfs_unregister_hctx 
> (hctx=0xffff8a1b7cccc000) at ../block/blk-mq-debugfs.c:897
> #11 0xffffffff983661cf in blk_mq_exit_hctx (q=0xffff8a1f825e4040, 
> set=0xffff8a1f9be5a0c0, hctx=0xffff8a1b7cccc000, hctx_idx=2) at 
> ../block/blk-mq.c:1987
> #12 0xffffffff9836946a in blk_mq_exit_hw_queues (nr_queue=<optimized 
> out>, set=<optimized out>, q=<optimized out>) at ../block/blk-mq.c:2017
> #13 0xffffffff9836946a in blk_mq_free_queue (q=0xffff8a1f825e4040)
>      at ../block/blk-mq.c:2506
> #14 0xffffffff9835aac5 in blk_cleanup_queue (q=0xffff8a1f825e4040)
>      at ../block/blk-core.c:691
> #15 0xffffffffc01f5bc8 in nvme_ns_remove (ns=0xffff8a1f819e8f80)
>      at ../drivers/nvme/host/core.c:3138
> #16 0xffffffffc01f6fea in nvme_validate_ns (ctrl=0xffff8a1f9be5a308, nsid=5)
>      at ../drivers/nvme/host/core.c:3164
> #17 0xffffffffc01f9053 in nvme_scan_ns_list (nn=<optimized out>, 
> ctrl=<optimized out>) at ../drivers/nvme/host/core.c:3202
> #18 0xffffffffc01f9053 in nvme_scan_work (work=<optimized out>)
>      at ../drivers/nvme/host/core.c:3280
> #19 0xffffffff980a224a in process_one_work (worker=0xffff8a1b7349f6c0, 
> work=0xffff8a1f9be5aba0) at ../kernel/workqueue.c:2092
> 
> Point is that the queue is already dead by the time nvme_start_queues() 
> tries to flush existing requests (of which there are none, of course).
> I had been looking into synchronizing scan_work and reset_work, but then 
> I wasn't sure if that wouldn't deadlock somewhere.

James, do you agree that nvme_fc_reset_ctrl_work should be canceled before
nvme_ns_remove() is allowed to call blk_cleanup_queue()?

Thanks,

Bart.


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Hannes Reinecke @ 2019-03-26 15:33 UTC
  To: Bart Van Assche, Hannes Reinecke, Jens Axboe, James Smart
  Cc: Christoph Hellwig, Ming Lei, linux-block

On 3/26/19 4:12 PM, Bart Van Assche wrote:
> On Tue, 2019-03-26 at 15:08 +0100, Hannes Reinecke wrote:
>> On 3/26/19 2:43 PM, Bart Van Assche wrote:
>>> On 3/26/19 5:07 AM, Hannes Reinecke wrote:
>>>> When a queue is dying or dead there is no point in calling
>>>> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
>>>> so might crash the machine as the queue structures are in the
>>>> process of being deleted.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>>> ---
>>>>    block/blk-mq.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index a9c181603cbd..b1eeba38bc79 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>>>>        blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>>>>        /* dispatch requests which are inserted during quiescing */
>>>> -    blk_mq_run_hw_queues(q, true);
>>>> +    if (!blk_queue_dying(q) && !blk_queue_dead(q))
>>>> +        blk_mq_run_hw_queues(q, true);
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
>>>
>>> Hi Hannes,
>>>
>>> Please provide more context information. In the "dead" state the queue
>>> must be run to make sure that all requests that were queued before the
>>> "dead" state get processed. The blk_cleanup_queue() function is
>>> responsible for stopping all code that can run the queue after all
>>> requests have finished and before destruction of the data structures
>>> needed for request processing starts.
>>>
>>
>> I have a crash with two processes competing for the same controller:
>>
>> #0  0xffffffff983d3bcb in sbitmap_any_bit_set (sb=0xffff8a1b874ba0d8)
>>       at ../lib/sbitmap.c:181
>> #1  0xffffffff98366c05 in blk_mq_hctx_has_pending (hctx=0xffff8a1b874ba000)
>>       at ../block/blk-mq.c:66
>> #2  0xffffffff98366c85 in blk_mq_run_hw_queues (q=0xffff8a1b874ba0d8,
>> async=true) at ../block/blk-mq.c:1292
>> #3  0xffffffff98366d3a in blk_mq_unquiesce_queue (q=<optimized out>)
>>       at ../block/blk-mq.c:265
>> #4  0xffffffffc01f3e0e in nvme_start_queues (ctrl=<optimized out>)
>>       at ../drivers/nvme/host/core.c:3658
>> #5  0xffffffffc01e843c in nvme_fc_delete_association
>> (ctrl=0xffff8a1f9be5a000)
>>       at ../drivers/nvme/host/fc.c:2843
>> #6  0xffffffffc01e8544 in nvme_fc_delete_association (ctrl=<optimized out>)
>>       at ../drivers/nvme/host/fc.c:2918
>> #7  0xffffffffc01e8544 in __nvme_fc_terminate_io (ctrl=0xffff8a1f9be5a000)
>>       at ../drivers/nvme/host/fc.c:2911
>> #8  0xffffffffc01e8f09 in nvme_fc_reset_ctrl_work (work=0xffff8a1f9be5a6d0)
>>       at ../drivers/nvme/host/fc.c:2927
>> #9  0xffffffff980a224a in process_one_work (worker=0xffff8a1b73934f00,
>> work=0xffff8a1f9be5a6d0) at ../kernel/workqueue.c:2092
>> #10 0xffffffff980a249b in worker_thread (__worker=0xffff8a1b73934f00)
>>       at ../kernel/workqueue.c:2226
>>
>> #7  0xffffffff986d2e9a in wait_for_completion (x=0xffffa48eca88bc40)
>>       at ../kernel/sched/completion.c:125
>> #8  0xffffffff980f25ae in __synchronize_srcu (sp=0xffffffff9914fc20
>> <debugfs_srcu>, do_norm=<optimized out>) at ../kernel/rcu/srcutree.c:851
>> #9  0xffffffff982d18b1 in debugfs_remove_recursive (dentry=<optimized out>)
>>       at ../fs/debugfs/inode.c:741
>> #10 0xffffffff98398ac5 in blk_mq_debugfs_unregister_hctx
>> (hctx=0xffff8a1b7cccc000) at ../block/blk-mq-debugfs.c:897
>> #11 0xffffffff983661cf in blk_mq_exit_hctx (q=0xffff8a1f825e4040,
>> set=0xffff8a1f9be5a0c0, hctx=0xffff8a1b7cccc000, hctx_idx=2) at
>> ../block/blk-mq.c:1987
>> #12 0xffffffff9836946a in blk_mq_exit_hw_queues (nr_queue=<optimized
>> out>, set=<optimized out>, q=<optimized out>) at ../block/blk-mq.c:2017
>> #13 0xffffffff9836946a in blk_mq_free_queue (q=0xffff8a1f825e4040)
>>       at ../block/blk-mq.c:2506
>> #14 0xffffffff9835aac5 in blk_cleanup_queue (q=0xffff8a1f825e4040)
>>       at ../block/blk-core.c:691
>> #15 0xffffffffc01f5bc8 in nvme_ns_remove (ns=0xffff8a1f819e8f80)
>>       at ../drivers/nvme/host/core.c:3138
>> #16 0xffffffffc01f6fea in nvme_validate_ns (ctrl=0xffff8a1f9be5a308, nsid=5)
>>       at ../drivers/nvme/host/core.c:3164
>> #17 0xffffffffc01f9053 in nvme_scan_ns_list (nn=<optimized out>,
>> ctrl=<optimized out>) at ../drivers/nvme/host/core.c:3202
>> #18 0xffffffffc01f9053 in nvme_scan_work (work=<optimized out>)
>>       at ../drivers/nvme/host/core.c:3280
>> #19 0xffffffff980a224a in process_one_work (worker=0xffff8a1b7349f6c0,
>> work=0xffff8a1f9be5aba0) at ../kernel/workqueue.c:2092
>>
>> Point is that the queue is already dead by the time nvme_start_queues()
>> tries to flush existing requests (of which there are none, of course).
>> I had been looking into synchronizing scan_work and reset_work, but then
>> I wasn't sure if that wouldn't deadlock somewhere.
> 
> James, do you agree that nvme_fc_reset_ctrl_work should be canceled before
> nvme_ns_remove() is allowed to call blk_cleanup_queue()?
> 
Hmm. I would've thought exactly the other way round, namely 
cancelling/flushing the scan_work element when calling reset; after all, 
reset definitely takes precedence over scanning the controller (which 
won't work anyway as the controller is about to be reset...)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: James Smart @ 2019-03-26 17:49 UTC
  To: Hannes Reinecke, Bart Van Assche, Hannes Reinecke, Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block, James Smart

On 3/26/2019 8:33 AM, Hannes Reinecke wrote:
> On 3/26/19 4:12 PM, Bart Van Assche wrote:
>> On Tue, 2019-03-26 at 15:08 +0100, Hannes Reinecke wrote:
>>> On 3/26/19 2:43 PM, Bart Van Assche wrote:
>>>> On 3/26/19 5:07 AM, Hannes Reinecke wrote:
>>>>> When a queue is dying or dead there is no point in calling
>>>>> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
>>>>> so might crash the machine as the queue structures are in the
>>>>> process of being deleted.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>>>> ---
>>>>>    block/blk-mq.c | 3 ++-
>>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index a9c181603cbd..b1eeba38bc79 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct 
>>>>> request_queue *q)
>>>>>        blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>>>>>        /* dispatch requests which are inserted during quiescing */
>>>>> -    blk_mq_run_hw_queues(q, true);
>>>>> +    if (!blk_queue_dying(q) && !blk_queue_dead(q))
>>>>> +        blk_mq_run_hw_queues(q, true);
>>>>>    }
>>>>>    EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
>>>>
>>>> Hi Hannes,
>>>>
>>>> Please provide more context information. In the "dead" state the queue
>>>> must be run to make sure that all requests that were queued before the
>>>> "dead" state get processed. The blk_cleanup_queue() function is
>>>> responsible for stopping all code that can run the queue after all
>>>> requests have finished and before destruction of the data structures
>>>> needed for request processing starts.
>>>>
>>>
>>> I have a crash with two processes competing for the same controller:
>>>
>>> #0  0xffffffff983d3bcb in sbitmap_any_bit_set (sb=0xffff8a1b874ba0d8)
>>>       at ../lib/sbitmap.c:181
>>> #1  0xffffffff98366c05 in blk_mq_hctx_has_pending 
>>> (hctx=0xffff8a1b874ba000)
>>>       at ../block/blk-mq.c:66
>>> #2  0xffffffff98366c85 in blk_mq_run_hw_queues (q=0xffff8a1b874ba0d8,
>>> async=true) at ../block/blk-mq.c:1292
>>> #3  0xffffffff98366d3a in blk_mq_unquiesce_queue (q=<optimized out>)
>>>       at ../block/blk-mq.c:265
>>> #4  0xffffffffc01f3e0e in nvme_start_queues (ctrl=<optimized out>)
>>>       at ../drivers/nvme/host/core.c:3658
>>> #5  0xffffffffc01e843c in nvme_fc_delete_association
>>> (ctrl=0xffff8a1f9be5a000)
>>>       at ../drivers/nvme/host/fc.c:2843
>>> #6  0xffffffffc01e8544 in nvme_fc_delete_association 
>>> (ctrl=<optimized out>)
>>>       at ../drivers/nvme/host/fc.c:2918
>>> #7  0xffffffffc01e8544 in __nvme_fc_terminate_io 
>>> (ctrl=0xffff8a1f9be5a000)
>>>       at ../drivers/nvme/host/fc.c:2911
>>> #8  0xffffffffc01e8f09 in nvme_fc_reset_ctrl_work 
>>> (work=0xffff8a1f9be5a6d0)
>>>       at ../drivers/nvme/host/fc.c:2927
>>> #9  0xffffffff980a224a in process_one_work (worker=0xffff8a1b73934f00,
>>> work=0xffff8a1f9be5a6d0) at ../kernel/workqueue.c:2092
>>> #10 0xffffffff980a249b in worker_thread (__worker=0xffff8a1b73934f00)
>>>       at ../kernel/workqueue.c:2226
>>>
>>> #7  0xffffffff986d2e9a in wait_for_completion (x=0xffffa48eca88bc40)
>>>       at ../kernel/sched/completion.c:125
>>> #8  0xffffffff980f25ae in __synchronize_srcu (sp=0xffffffff9914fc20
>>> <debugfs_srcu>, do_norm=<optimized out>) at 
>>> ../kernel/rcu/srcutree.c:851
>>> #9  0xffffffff982d18b1 in debugfs_remove_recursive 
>>> (dentry=<optimized out>)
>>>       at ../fs/debugfs/inode.c:741
>>> #10 0xffffffff98398ac5 in blk_mq_debugfs_unregister_hctx
>>> (hctx=0xffff8a1b7cccc000) at ../block/blk-mq-debugfs.c:897
>>> #11 0xffffffff983661cf in blk_mq_exit_hctx (q=0xffff8a1f825e4040,
>>> set=0xffff8a1f9be5a0c0, hctx=0xffff8a1b7cccc000, hctx_idx=2) at
>>> ../block/blk-mq.c:1987
>>> #12 0xffffffff9836946a in blk_mq_exit_hw_queues (nr_queue=<optimized
>>> out>, set=<optimized out>, q=<optimized out>) at ../block/blk-mq.c:2017
>>> #13 0xffffffff9836946a in blk_mq_free_queue (q=0xffff8a1f825e4040)
>>>       at ../block/blk-mq.c:2506
>>> #14 0xffffffff9835aac5 in blk_cleanup_queue (q=0xffff8a1f825e4040)
>>>       at ../block/blk-core.c:691
>>> #15 0xffffffffc01f5bc8 in nvme_ns_remove (ns=0xffff8a1f819e8f80)
>>>       at ../drivers/nvme/host/core.c:3138
>>> #16 0xffffffffc01f6fea in nvme_validate_ns (ctrl=0xffff8a1f9be5a308, 
>>> nsid=5)
>>>       at ../drivers/nvme/host/core.c:3164
>>> #17 0xffffffffc01f9053 in nvme_scan_ns_list (nn=<optimized out>,
>>> ctrl=<optimized out>) at ../drivers/nvme/host/core.c:3202
>>> #18 0xffffffffc01f9053 in nvme_scan_work (work=<optimized out>)
>>>       at ../drivers/nvme/host/core.c:3280
>>> #19 0xffffffff980a224a in process_one_work (worker=0xffff8a1b7349f6c0,
>>> work=0xffff8a1f9be5aba0) at ../kernel/workqueue.c:2092
>>>
>>> Point is that the queue is already dead by the time nvme_start_queues()
>>> tries to flush existing requests (of which there are none, of course).
>>> I had been looking into synchronizing scan_work and reset_work, but 
>>> then
>>> I wasn't sure if that wouldn't deadlock somewhere.
>>
>> James, do you agree that nvme_fc_reset_ctrl_work should be canceled 
>> before
>> nvme_ns_remove() is allowed to call blk_cleanup_queue()?
>>
> Hmm. I would've thought exactly the other way round, namely 
> cancelling/flushing the scan_work element when calling reset; after 
> all, reset definitely takes precedence over scanning the controller 
> (which won't work anyway as the controller is about to be reset...)
>
> Cheers,
>
> Hannes

Cancelling/flushing scan_work before starting reset won't work. Hannes 
is correct; reset must take precedence.

The scenario is a controller that was recently connected, thus scan work 
started, and while the scan thread is running, there's an error or 
connectivity is lost.  The errors force the reset - which must run to 
terminate any outstanding requests, including those issued by the scan 
thread.  This has to happen or we're going to hang the scan thread as 
it's synchronous io.  The 
nvme_fc_delete_association->nvme_start_queues() sequence says the reset 
path froze the queues, cancelled all outstanding io and is unfreezing 
the queues to allow io to pass again, so that any new io on the request 
queue can be rejected while the controller is not connected (ioctls or 
subsequent io from the scan thread while the controller is unconnected 
as well as anything that was pending on the io queues from the fs that 
should be bounced back for multipath).  Meanwhile, the scan thread got 
the error on the io request, and since it's the identify command that 
failed, it's deleting the namespace at the same time the transport is 
unfreezing the queues.

Given the transport nvme_start_queues() routine has no idea what 
ns-queues exist, or what state the ns queues are in, it seems like a 
synchronization issue on the ns queue itself that can't be solved by 
scheduling transport work elements.

I wonder if it should be something like this:
--- a    2019-03-26 10:46:08.576056741 -0700
+++ b    2019-03-26 10:47:55.081252967 -0700
@@ -3870,8 +3870,10 @@ void nvme_start_queues(struct nvme_ctrl
      struct nvme_ns *ns;

      down_read(&ctrl->namespaces_rwsem);
-    list_for_each_entry(ns, &ctrl->namespaces, list)
-        blk_mq_unquiesce_queue(ns->queue);
+    list_for_each_entry(ns, &ctrl->namespaces, list) {
+        if (!test_bit(NVME_NS_REMOVING, &ns->flags))
+            blk_mq_unquiesce_queue(ns->queue);
+    }
      up_read(&ctrl->namespaces_rwsem);
  }
  EXPORT_SYMBOL_GPL(nvme_start_queues);

which, if valid, would also mean the same check should be in 
nvme_stop_queues() and perhaps elsewhere.
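
For context, NVME_NS_REMOVING is set at the very start of namespace
removal, before the queue is torn down; roughly (a simplified sketch of
nvme_ns_remove() in drivers/nvme/host/core.c):

	static void nvme_ns_remove(struct nvme_ns *ns)
	{
		if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
			return;
		...
		blk_cleanup_queue(ns->queue);
		...
		/* the ns is unlinked from ctrl->namespaces only here */
		down_write(&ns->ctrl->namespaces_rwsem);
		list_del_init(&ns->list);
		up_write(&ns->ctrl->namespaces_rwsem);
	}

So between blk_cleanup_queue() and the list_del_init(), the namespace is
still visible to nvme_start_queues(), and only the flag test above
closes that window.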

-- james



* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Hannes Reinecke @ 2019-03-26 17:59 UTC
  To: James Smart, Hannes Reinecke, Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block

On 3/26/19 6:49 PM, James Smart wrote:
> 
> Cancelling/flushing scan_work before starting reset won't work. Hannes 
> is correct; reset must take precedence.
> 
> The scenario is a controller that was recently connected, thus scan work 
> started, and while the scan thread is running, there's an error or 
> connectivity is lost.  The errors force the reset - which must run to 
> terminate any outstanding requests, including those issued by the scan 
> thread.  This has to happen or we're going to hang the scan thread as 
> it's synchronous io.  The 
> nvme_fc_delete_association->nvme_start_queues() sequence says the reset 
> path froze the queues, cancelled all outstanding io and is unfreezing 
> the queues to allow io to pass again, so that any new io on the request 
> queue can be rejected while the controller is not connected (ioctls or 
> subsequent io from the scan thread while the controller is unconnected 
> as well as anything that was pending on the io queues from the fs that 
> should be bounced back for multipath).  Meanwhile, the scan thread got 
> the error on the io request, and since it's the identify command that 
> failed, it's deleting the namespace at the same time the transport is 
> unfreezing the queues.
> 
> Given the transport nvme_start_queues() routine has no idea what 
> ns-queues exist, or what state the ns queues are in, it seems like a 
> synchronization issue on the ns queue itself that can't be solved by 
> scheduling transport work elements.
> 
> I wonder if it should be something like this:
> --- a    2019-03-26 10:46:08.576056741 -0700
> +++ b    2019-03-26 10:47:55.081252967 -0700
> @@ -3870,8 +3870,10 @@ void nvme_start_queues(struct nvme_ctrl
>       struct nvme_ns *ns;
> 
>       down_read(&ctrl->namespaces_rwsem);
> -    list_for_each_entry(ns, &ctrl->namespaces, list)
> -        blk_mq_unquiesce_queue(ns->queue);
> +    list_for_each_entry(ns, &ctrl->namespaces, list) {
> +        if (!test_bit(NVME_NS_REMOVING, &ns->flags))
> +            blk_mq_unquiesce_queue(ns->queue);
> +    }
>       up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_queues);
> 
> which, if valid, would also mean the same check should be in 
> nvme_stop_queues() and perhaps elsewhere.
> 
Well, if the namespace weren't part of the list in the first place 
once it's been set to 'REMOVING'... lemme check.
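
A hypothetical sketch of that idea (illustrative only, not a tested
patch): unlink the namespace before tearing down the queue, so that
nvme_start_queues() can no longer reach it:

	static void nvme_ns_remove(struct nvme_ns *ns)
	{
		if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
			return;

		/* unlink first: nvme_{start,stop}_queues() walk this list */
		down_write(&ns->ctrl->namespaces_rwsem);
		list_del_init(&ns->list);
		up_write(&ns->ctrl->namespaces_rwsem);
		...
		blk_cleanup_queue(ns->queue);
	}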

Cheers,

Hannes


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Keith Busch @ 2019-03-27  0:05 UTC
  To: Hannes Reinecke
  Cc: Jens Axboe, Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke

On Tue, Mar 26, 2019 at 01:07:12PM +0100, Hannes Reinecke wrote:
> When a queue is dying or dead there is no point in calling
> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
> so might crash the machine as the queue structures are in the
> process of being deleted.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-mq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a9c181603cbd..b1eeba38bc79 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>  	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>  
>  	/* dispatch requests which are inserted during quiescing */
> -	blk_mq_run_hw_queues(q, true);
> +	if (!blk_queue_dying(q) && !blk_queue_dead(q))
> +		blk_mq_run_hw_queues(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

We actually still have to unquiesce the queue to flush entered requests to
a failed completion after killing that queue, like in nvme_set_queue_dying().
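
A simplified sketch of the sequence referred to here (based on
nvme_set_queue_dying() in drivers/nvme/host/core.c):

	static void nvme_set_queue_dying(struct nvme_ns *ns)
	{
		if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
			return;
		blk_set_queue_dying(ns->queue);
		/* forcibly unquiesce so already-queued requests fail fast */
		blk_mq_unquiesce_queue(ns->queue);
	}

With the proposed dying/dead check, that final unquiesce would no longer
run the hardware queues, and requests queued before the kill could be
left stranded.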


* Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues
From: Ming Lei @ 2019-03-27  1:25 UTC
  To: Hannes Reinecke
  Cc: Jens Axboe, Christoph Hellwig, Ming Lei, linux-block, Hannes Reinecke

On Tue, Mar 26, 2019 at 01:07:12PM +0100, Hannes Reinecke wrote:
> When a queue is dying or dead there is no point in calling
> blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing
> so might crash the machine as the queue structures are in the
> process of being deleted.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-mq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a9c181603cbd..b1eeba38bc79 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -258,7 +258,8 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>  	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>  
>  	/* dispatch requests which are inserted during quiescing */
> -	blk_mq_run_hw_queues(q, true);
> +	if (!blk_queue_dying(q) && !blk_queue_dead(q))
> +		blk_mq_run_hw_queues(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

Per the previous protocol in the legacy I/O path, the LLD should handle
requests even when the queue is dying.

Also, nowadays the request queue becomes frozen before DEAD is set, so
it isn't necessary to add the DEAD check before running the queue.

Thanks,
Ming


Thread overview: 9 messages
2019-03-26 12:07 [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues Hannes Reinecke
2019-03-26 13:43 ` Bart Van Assche
2019-03-26 14:08   ` Hannes Reinecke
2019-03-26 15:12     ` Bart Van Assche
2019-03-26 15:33       ` Hannes Reinecke
2019-03-26 17:49         ` James Smart
2019-03-26 17:59           ` Hannes Reinecke
2019-03-27  0:05 ` Keith Busch
2019-03-27  1:25 ` Ming Lei
