linux-block.vger.kernel.org archive mirror
* [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
@ 2019-10-23 17:57 James Smart
  2019-10-24  9:28 ` Ming Lei
  2019-10-25 20:51 ` Sagi Grimberg
  0 siblings, 2 replies; 10+ messages in thread
From: James Smart @ 2019-10-23 17:57 UTC (permalink / raw)
  To: linux-block; +Cc: James Smart, Shagun Agrawal, Christoph Hellwig

During the following test scenario:
- Offline a cpu
- Load the lpfc driver, which auto-discovers NVMe devices. For a new
  nvme device, the lpfc/nvme_fc transport can request up to
  num_online_cpus() worth of nr_hw_queues. The target in
  this case allowed at least that many nvme queues.
The system encountered the following crash:

 BUG: unable to handle kernel paging request at 00003659d33953a8
 ...
 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
 RIP: 0010:blk_mq_get_request+0x21d/0x3c0
 ...
 blk_mq_alloc_request_hctx+0xef/0x140
 nvme_alloc_request+0x32/0x80 [nvme_core]
 __nvme_submit_sync_cmd+0x4a/0x1c0 [nvme_core]
 nvmf_connect_io_queue+0x130/0x1a0 [nvme_fabrics]
 nvme_fc_connect_io_queues+0x285/0x2b0 [nvme_fc]
 nvme_fc_create_association+0x8ea/0x9c0 [nvme_fc]
 nvme_fc_connect_ctrl_work+0x19/0x50 [nvme_fc]
 ...

A while ago, a commit that simplified queue mapping replaced the use of
cpumask_first() with cpumask_first_and(). The issue is that if
cpumask_first_and() does not find any _intersecting_ cpus, it returns
nr_cpu_ids, which is not a valid index for the per_cpu_ptr lookup done
in __blk_mq_get_ctx().

Rather than reverting back to cpumask_first(), follow the logic in
blk_mq_first_mapped_cpu() and check for nr_cpu_ids before falling back
to cpumask_first().
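
For reference, blk_mq_first_mapped_cpu() handles the same corner case
roughly like this (paraphrased from block/blk-mq.h, shown only to
illustrate the pattern being followed):

	static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
	{
		int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);

		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(hctx->cpumask);
		return cpu;
	}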

Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Shagun Agrawal <shagun.agrawal@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8538dc415499..0b06b4ea57f1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		return ERR_PTR(-EXDEV);
 	}
 	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(alloc_data.hctx->cpumask);
 	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
 
 	rq = blk_mq_get_request(q, NULL, &alloc_data);
-- 
2.13.7


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-23 17:57 [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx() James Smart
@ 2019-10-24  9:28 ` Ming Lei
  2019-10-24 13:02   ` Jens Axboe
  2019-10-25 20:51 ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2019-10-24  9:28 UTC (permalink / raw)
  To: James Smart; +Cc: linux-block, Shagun Agrawal, Christoph Hellwig

On Thu, Oct 24, 2019 at 4:53 PM James Smart <jsmart2021@gmail.com> wrote:
>
> During the following test scenario:
> - Offline a cpu
> - Load the lpfc driver, which auto-discovers NVMe devices. For a new
>   nvme device, the lpfc/nvme_fc transport can request up to
>   num_online_cpus() worth of nr_hw_queues. The target in
>   this case allowed at least that many nvme queues.
> The system encountered the following crash:
>
>  BUG: unable to handle kernel paging request at 00003659d33953a8
>  ...
>  Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
>  RIP: 0010:blk_mq_get_request+0x21d/0x3c0
>  ...
>  blk_mq_alloc_request_hctx+0xef/0x140
>  nvme_alloc_request+0x32/0x80 [nvme_core]
>  __nvme_submit_sync_cmd+0x4a/0x1c0 [nvme_core]
>  nvmf_connect_io_queue+0x130/0x1a0 [nvme_fabrics]
>  nvme_fc_connect_io_queues+0x285/0x2b0 [nvme_fc]
>  nvme_fc_create_association+0x8ea/0x9c0 [nvme_fc]
>  nvme_fc_connect_ctrl_work+0x19/0x50 [nvme_fc]
>  ...
>
> A while ago, a commit that simplified queue mapping replaced the use of
> cpumask_first() with cpumask_first_and(). The issue is that if
> cpumask_first_and() does not find any _intersecting_ cpus, it returns
> nr_cpu_ids, which is not a valid index for the per_cpu_ptr lookup done
> in __blk_mq_get_ctx().
>
> Rather than reverting back to cpumask_first(), follow the logic in
> blk_mq_first_mapped_cpu() and check for nr_cpu_ids before falling back
> to cpumask_first().
>
> Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
> Signed-off-by: Shagun Agrawal <shagun.agrawal@broadcom.com>
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> CC: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8538dc415499..0b06b4ea57f1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>                 return ERR_PTR(-EXDEV);
>         }
>         cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> +       if (cpu >= nr_cpu_ids)
> +               cpu = cpumask_first(alloc_data.hctx->cpumask);

The first cpu may be offline too; then a kernel warning or timeout may
be triggered later when this allocated request is dispatched.

To be honest, blk_mq_alloc_request_hctx() is really a weird interface,
given that the hctx may become dead just when calling into
blk_mq_alloc_request_hctx().

Given that only NVMe FC/RDMA uses this interface, could you provide
some background about this kind of usage?

The normal usage is that the user doesn't specify the hctx to allocate
the request from, since blk-mq can figure out which hctx is used for
allocation via the queue mapping. Just wondering why NVMe FC/RDMA
can't do it that way?
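
For context, in the normal path the ctx/hctx are derived from whatever
cpu the submitter happens to be running on, roughly like the following
(paraphrased from block/blk-mq.h, not the exact source):

	static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
							  unsigned int cpu)
	{
		/* only valid for cpu < nr_cpu_ids, hence the crash above */
		return per_cpu_ptr(q->queue_ctx, cpu);
	}

	static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
	{
		return __blk_mq_get_ctx(q, raw_smp_processor_id());
	}

blk_mq_alloc_request_hctx() is the one caller that inverts this: it
picks the hctx first and then has to come up with a cpu for the ctx
lookup.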


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-24  9:28 ` Ming Lei
@ 2019-10-24 13:02   ` Jens Axboe
  2019-10-24 18:53     ` James Smart
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2019-10-24 13:02 UTC (permalink / raw)
  To: Ming Lei, James Smart; +Cc: linux-block, Shagun Agrawal, Christoph Hellwig

On 10/24/19 3:28 AM, Ming Lei wrote:
> On Thu, Oct 24, 2019 at 4:53 PM James Smart <jsmart2021@gmail.com> wrote:
>>
>> During the following test scenario:
>> - Offline a cpu
>> - Load the lpfc driver, which auto-discovers NVMe devices. For a new
>>   nvme device, the lpfc/nvme_fc transport can request up to
>>   num_online_cpus() worth of nr_hw_queues. The target in
>>   this case allowed at least that many nvme queues.
>> The system encountered the following crash:
>>
>>   BUG: unable to handle kernel paging request at 00003659d33953a8
>>   ...
>>   Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
>>   RIP: 0010:blk_mq_get_request+0x21d/0x3c0
>>   ...
>>   blk_mq_alloc_request_hctx+0xef/0x140
>>   nvme_alloc_request+0x32/0x80 [nvme_core]
>>   __nvme_submit_sync_cmd+0x4a/0x1c0 [nvme_core]
>>   nvmf_connect_io_queue+0x130/0x1a0 [nvme_fabrics]
>>   nvme_fc_connect_io_queues+0x285/0x2b0 [nvme_fc]
>>   nvme_fc_create_association+0x8ea/0x9c0 [nvme_fc]
>>   nvme_fc_connect_ctrl_work+0x19/0x50 [nvme_fc]
>>   ...
>>
>> A while ago, a commit that simplified queue mapping replaced the use of
>> cpumask_first() with cpumask_first_and(). The issue is that if
>> cpumask_first_and() does not find any _intersecting_ cpus, it returns
>> nr_cpu_ids, which is not a valid index for the per_cpu_ptr lookup done
>> in __blk_mq_get_ctx().
>>
>> Rather than reverting back to cpumask_first(), follow the logic in
>> blk_mq_first_mapped_cpu() and check for nr_cpu_ids before falling back
>> to cpumask_first().
>>
>> Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
>> Signed-off-by: Shagun Agrawal <shagun.agrawal@broadcom.com>
>> Signed-off-by: James Smart <jsmart2021@gmail.com>
>> CC: Christoph Hellwig <hch@lst.de>
>> ---
>>   block/blk-mq.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 8538dc415499..0b06b4ea57f1 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>>                  return ERR_PTR(-EXDEV);
>>          }
>>          cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
>> +       if (cpu >= nr_cpu_ids)
>> +               cpu = cpumask_first(alloc_data.hctx->cpumask);
> 
> The first cpu may be offline too; then a kernel warning or timeout may
> be triggered later when this allocated request is dispatched.
> 
> To be honest, blk_mq_alloc_request_hctx() is really a weird interface,
> given that the hctx may become dead just when calling into
> blk_mq_alloc_request_hctx().
> 
> Given that only NVMe FC/RDMA uses this interface, could you provide
> some background about this kind of usage?
> 
> The normal usage is that the user doesn't specify the hctx to allocate
> the request from, since blk-mq can figure out which hctx is used for
> allocation via the queue mapping. Just wondering why NVMe FC/RDMA
> can't do it that way?

Fully agree, it'd be much nicer if that weird interface could just
die.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-24 13:02   ` Jens Axboe
@ 2019-10-24 18:53     ` James Smart
  2019-10-25  7:22       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: James Smart @ 2019-10-24 18:53 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Shagun Agrawal, Christoph Hellwig, James Smart

On 10/24/2019 6:02 AM, Jens Axboe wrote:
> On 10/24/19 3:28 AM, Ming Lei wrote:
>> The normal usage is that the user doesn't specify the hctx to allocate
>> the request from, since blk-mq can figure out which hctx is used for
>> allocation via the queue mapping. Just wondering why NVMe FC/RDMA
>> can't do it that way?
> 
> Fully agree, it'd be much nicer if that weird interface could just
> die.
> 

Well.  All depends on what you think a hctx corresponds to, which 
relates to why the caller originally set nr_hw_queues to what it did. I 
think it is reasonable for the caller to say - this must be via this 
specific context.

To the nvme fabric transports (rdma, fc, tcp) the hctx corresponds to 
the nvme controller queue to issue a request on. In the single case 
where the hctx is specified specifically, it is the 1st command on a new 
nvme controller queue. The command *must* be issued on the queue it is 
to initialize (this is different from pci nvme).  The hctx is specified 
so the correct nvme queue is selected when the command comes down the 
request path.  Saying "don't do that" means one of the following: a) 
snooping every rq on the request path to spot initialization ios and 
move them to the right queue; or b) creating a duplicate non-blk-mq 
request path for this 1 initialization io. Both of those are ugly.

-- james


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-24 18:53     ` James Smart
@ 2019-10-25  7:22       ` Ming Lei
  2019-10-25 20:26         ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2019-10-25  7:22 UTC (permalink / raw)
  To: James Smart
  Cc: Jens Axboe, Ming Lei, linux-block, Shagun Agrawal, Christoph Hellwig

On Thu, Oct 24, 2019 at 11:53:26AM -0700, James Smart wrote:
> On 10/24/2019 6:02 AM, Jens Axboe wrote:
> > On 10/24/19 3:28 AM, Ming Lei wrote:
> > > The normal usage is that the user doesn't specify the hctx to allocate
> > > the request from, since blk-mq can figure out which hctx is used for
> > > allocation via the queue mapping. Just wondering why NVMe FC/RDMA
> > > can't do it that way?
> > 
> > Fully agree, it'd be much nicer if that weird interface could just
> > die.
> > 
> 
> Well.  All depends on what you think a hctx corresponds to, which relates to
> why the caller originally set nr_hw_queues to what it did. I think it is
> reasonable for the caller to say - this must be via this specific context.

Usually MQ is only for handling IO efficiently, and we seldom use MQ
for management purposes; for example, so far every admin queue is an SQ.

> 
> To the nvme fabric transports (rdma, fc, tcp) the hctx corresponds to the
> nvme controller queue to issue a request on. In the single case where the

Could you explain a bit about what the purpose of the nvme controller is?

> hctx is specified specifically, it is the 1st command on a new nvme
> controller queue. The command *must* be issued on the queue it is to
> initialize (this is different from pci nvme).  The hctx is specified so the
> correct nvme queue is selected when the command comes down the request path.
> Saying "don't do that" means one of the following: a) snooping every rq on
> the request path to spot initialization ios and move them to the right
> queue; or b) creating a duplicate non-blk-mq request path for this 1
> initialization io. Both of those are ugly.

In nvmf_connect_io_queue(), 'qid' has been encoded into the instance of
'struct nvme_command', which means the 'nvme controller' should be able
to identify the specified queue by parsing the command. So I still don't
understand why you have to submit the command via the specified queue.
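
For reference, the encoding I mean is roughly (paraphrased from
nvmf_connect_io_queue(), argument lists trimmed):

	cmd.connect.opcode = nvme_fabrics_command;
	cmd.connect.fctype = nvme_fabrics_type_connect;
	cmd.connect.qid    = cpu_to_le16(qid);
	/* the same qid is also passed down the submit path, which is
	 * what forces the allocation onto that queue's hctx */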

Thanks,
Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-25  7:22       ` Ming Lei
@ 2019-10-25 20:26         ` Sagi Grimberg
  2019-10-25 22:20           ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-10-25 20:26 UTC (permalink / raw)
  To: Ming Lei, James Smart
  Cc: Jens Axboe, Ming Lei, linux-block, Shagun Agrawal, Christoph Hellwig


>> hctx is specified specifically, it is the 1st command on a new nvme
>> controller queue. The command *must* be issued on the queue it is to
>> initialize (this is different from pci nvme).  The hctx is specified so the
>> correct nvme queue is selected when the command comes down the request path.
>> Saying "don't do that" means one of the following: a) snooping every rq on
>> the request path to spot initialization ios and move them to the right
>> queue; or b) creating a duplicate non-blk-mq request path for this 1
>> initialization io. Both of those are ugly.
> 
> In nvmf_connect_io_queue(), 'qid' has been encoded into the instance of
> 'struct nvme_command', which means the 'nvme controller' should be able
> to identify the specified queue by parsing the command. So I still don't
> understand why you have to submit the command via the specified queue.

The connect command must be sent on the queue that it is connecting; the
qid tells the controller the id of the queue, but the controller still
expects the connect to be issued on the queue it is designed to connect
(or rather initialize).

In queue_rq we take the queue from hctx->driver_data and use it to issue
the command. The connect is different in that it is invoked on a context
that is not necessarily running on a cpu that maps to this specific
hctx. So in essence what is needed is a tag from the specific queue's
tags, without regard to the running cpu.
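
To illustrate (a made-up transport, not code from any real driver; the
example_* names are invented):

	static blk_status_t example_fabrics_queue_rq(struct blk_mq_hw_ctx *hctx,
			const struct blk_mq_queue_data *bd)
	{
		/* the transport queue was bound to this hctx at init time */
		struct example_fabrics_queue *queue = hctx->driver_data;

		/* nothing in the request itself selects a queue; whichever
		 * hctx the request was allocated from decides where it goes */
		return example_fabrics_post(queue, bd->rq);
	}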

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-23 17:57 [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx() James Smart
  2019-10-24  9:28 ` Ming Lei
@ 2019-10-25 20:51 ` Sagi Grimberg
  1 sibling, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-10-25 20:51 UTC (permalink / raw)
  To: James Smart, linux-block; +Cc: Shagun Agrawal, Christoph Hellwig


> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8538dc415499..0b06b4ea57f1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -461,6 +461,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>   		return ERR_PTR(-EXDEV);
>   	}
>   	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> +	if (cpu >= nr_cpu_ids)
> +		cpu = cpumask_first(alloc_data.hctx->cpumask);

I'd just drop the cpu_online_mask test; we don't care about the cpu
being online, as this request is running already.
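
i.e. something roughly like (untested):

	cpu = cpumask_first(alloc_data.hctx->cpumask);
	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);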

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-25 20:26         ` Sagi Grimberg
@ 2019-10-25 22:20           ` Ming Lei
  2019-10-25 22:33             ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2019-10-25 22:20 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: James Smart, Jens Axboe, Ming Lei, linux-block, Shagun Agrawal,
	Christoph Hellwig

On Fri, Oct 25, 2019 at 01:26:46PM -0700, Sagi Grimberg wrote:
> 
> > > hctx is specified specifically, it is the 1st command on a new nvme
> > > controller queue. The command *must* be issued on the queue it is to
> > > initialize (this is different from pci nvme).  The hctx is specified so the
> > > correct nvme queue is selected when the command comes down the request path.
> > > Saying "don't do that" means one of the following: a) snooping every rq on
> > > the request path to spot initialization ios and move them to the right
> > > queue; or b) creating a duplicate non-blk-mq request path for this 1
> > > initialization io. Both of those are ugly.
> > 
> > In nvmf_connect_io_queue(), 'qid' has been encoded into the instance of
> > 'struct nvme_command', which means the 'nvme controller' should be able
> > to identify the specified queue by parsing the command. So I still don't
> > understand why you have to submit the command via the specified queue.
> 
> The connect command must be sent on the queue that it is connecting; the
> qid tells the controller the id of the queue, but the controller still
> expects the connect to be issued on the queue it is designed to connect
> (or rather initialize).
> 
> In queue_rq we take the queue from hctx->driver_data and use it to issue
> the command. The connect is different in that it is invoked on a context
> that is not necessarily running on a cpu that maps to this specific
> hctx. So in essence what is needed is a tag from the specific queue's
> tags, without regard to the running cpu.

OK, got it.

If nvmf_connect_io_queue() is only run before setting up the IO queues,
the shared tag problem could be solved easily, for example by using a
standalone tagset?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-25 22:20           ` Ming Lei
@ 2019-10-25 22:33             ` Sagi Grimberg
  2019-10-27  7:23               ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-10-25 22:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Smart, Jens Axboe, Ming Lei, linux-block, Shagun Agrawal,
	Christoph Hellwig


>>>> hctx is specified specifically, it is the 1st command on a new nvme
>>>> controller queue. The command *must* be issued on the queue it is to
>>>> initialize (this is different from pci nvme).  The hctx is specified so the
>>>> correct nvme queue is selected when the command comes down the request path.
>>>> Saying "don't do that" means one of the following: a) snooping every rq on
>>>> the request path to spot initialization ios and move them to the right
>>>> queue; or b) creating a duplicate non-blk-mq request path for this 1
>>>> initialization io. Both of those are ugly.
>>>
>>> In nvmf_connect_io_queue(), 'qid' has been encoded into the instance of
>>> 'struct nvme_command', which means the 'nvme controller' should be able
>>> to identify the specified queue by parsing the command. So I still don't
>>> understand why you have to submit the command via the specified queue.
>>
>> The connect command must be sent on the queue that it is connecting; the
>> qid tells the controller the id of the queue, but the controller still
>> expects the connect to be issued on the queue it is designed to connect
>> (or rather initialize).
>>
>> In queue_rq we take the queue from hctx->driver_data and use it to issue
>> the command. The connect is different in that it is invoked on a context
>> that is not necessarily running on a cpu that maps to this specific
>> hctx. So in essence what is needed is a tag from the specific queue's
>> tags, without regard to the running cpu.
> 
> OK, got it.
> 
> If nvmf_connect_io_queue() is only run before setting up the IO queues,
> the shared tag problem could be solved easily, for example by using a
> standalone tagset?

Not sure what you mean exactly...

Also, keep in mind that this sequence also goes into reconnects, where
we already have our tagset allocated (with pending requests
potentially).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()
  2019-10-25 22:33             ` Sagi Grimberg
@ 2019-10-27  7:23               ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2019-10-27  7:23 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: James Smart, Jens Axboe, Ming Lei, linux-block, Shagun Agrawal,
	Christoph Hellwig

On Fri, Oct 25, 2019 at 03:33:15PM -0700, Sagi Grimberg wrote:
> 
> > > > > hctx is specified specifically, it is the 1st command on a new nvme
> > > > > controller queue. The command *must* be issued on the queue it is to
> > > > > initialize (this is different from pci nvme).  The hctx is specified so the
> > > > > correct nvme queue is selected when the command comes down the request path.
> > > > > Saying "don't do that" means one of the following: a) snooping every rq on
> > > > > the request path to spot initialization ios and move them to the right
> > > > > queue; or b) creating a duplicate non-blk-mq request path for this 1
> > > > > initialization io. Both of those are ugly.
> > > > 
> > > > In nvmf_connect_io_queue(), 'qid' has been encoded into the instance of
> > > > 'struct nvme_command', which means the 'nvme controller' should be able
> > > > to identify the specified queue by parsing the command. So I still don't
> > > > understand why you have to submit the command via the specified queue.
> > > 
> > > The connect command must be sent on the queue that it is connecting; the
> > > qid tells the controller the id of the queue, but the controller still
> > > expects the connect to be issued on the queue it is designed to connect
> > > (or rather initialize).
> > > 
> > > In queue_rq we take the queue from hctx->driver_data and use it to issue
> > > the command. The connect is different in that it is invoked on a context
> > > that is not necessarily running on a cpu that maps to this specific
> > > hctx. So in essence what is needed is a tag from the specific queue's
> > > tags, without regard to the running cpu.
> > 
> > OK, got it.
> > 
> > If nvmf_connect_io_queue() is only run before setting up the IO queues,
> > the shared tag problem could be solved easily, for example by using a
> > standalone tagset?
> 
> Not sure what you mean exactly...
> 
> Also, keep in mind that this sequence also goes into reconnects, where
> we already have our tagset allocated (with pending requests
> potentially).

I just found that the connect command is always submitted via the unique
reserved tag 0, so it has nothing to do with IO requests any more.

You can use the reserved tag 0 for the connect command as before, just
by not sharing the tagset between the connect queue and the IO queues.

The detailed idea follows (a rough sketch comes after the list):

1) still reserve 1 tag for the connect command in the IO tagset.

2) in each driver, create a conn_tagset for .connect_q only, create
the .connect_q from this 'conn_tagset', set conn_tagset.nr_hw_queues
to 1, and set conn_tagset.queue_depth to 'ctrl.queue_count - 1'.

3) in .queue_rq of conn_tagset.ops:

- parse the index of the queue to be connected from nvme_command.conn.qid
- set the connect command's tag as 0
- then do everything else just like before
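
A very rough sketch of 2), purely illustrative (the example_* names are
invented, error handling trimmed):

	conn_tagset->ops = &example_conn_mq_ops;  /* its .queue_rq does 3) */
	conn_tagset->nr_hw_queues = 1;
	conn_tagset->queue_depth = ctrl->queue_count - 1;
	conn_tagset->cmd_size = sizeof(struct example_conn_request);
	ret = blk_mq_alloc_tag_set(conn_tagset);
	if (!ret)
		ctrl->connect_q = blk_mq_init_queue(conn_tagset);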


Thanks,
Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-27  7:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23 17:57 [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx() James Smart
2019-10-24  9:28 ` Ming Lei
2019-10-24 13:02   ` Jens Axboe
2019-10-24 18:53     ` James Smart
2019-10-25  7:22       ` Ming Lei
2019-10-25 20:26         ` Sagi Grimberg
2019-10-25 22:20           ` Ming Lei
2019-10-25 22:33             ` Sagi Grimberg
2019-10-27  7:23               ` Ming Lei
2019-10-25 20:51 ` Sagi Grimberg
