* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
@ 2019-05-23  4:51 Nirranjan Kirubaharan
  2019-05-23  7:21 ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Nirranjan Kirubaharan @ 2019-05-23  4:51 UTC


Return -ENOMEM when the nvmf target allocates fewer I/O queues
than the number of I/O queues requested by the nvmf initiator.

Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
---
 drivers/nvme/host/rdma.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f383146e7d0f..187007d136cc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 	struct ib_device *ibdev = ctrl->device->dev;
-	unsigned int nr_io_queues;
+	unsigned int nr_io_queues, nr_req_queues;
 	int i, ret;
 
 	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
@@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
 	}
 
+	nr_req_queues = nr_io_queues;
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
 		return ret;
+	if (nr_io_queues < nr_req_queues) {
+		dev_err(ctrl->ctrl.device,
+			"alloc queues %u < req no of queues %u",
+			nr_io_queues, nr_req_queues);
+		return -ENOMEM;
+	}
 
 	ctrl->ctrl.queue_count = nr_io_queues + 1;
 	if (ctrl->ctrl.queue_count < 2)
-- 
1.8.3.1


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23  4:51 [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated Nirranjan Kirubaharan
@ 2019-05-23  7:21 ` Max Gurtovoy
  2019-05-23  7:55   ` Nirranjan Kirubaharan
  0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2019-05-23  7:21 UTC



On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> Return error -ENOMEM when nvmf target allocates lesser
> io queues than the number of io queues requested by nvmf
> initiator.

Why can't we live with fewer queues?

I can demand 64K queues and the target might return 4, and that's fine
for functionality.

Where is the NULL deref that you see?


>
> Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
> Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
> ---
>   drivers/nvme/host/rdma.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index f383146e7d0f..187007d136cc 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>   {
>   	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>   	struct ib_device *ibdev = ctrl->device->dev;
> -	unsigned int nr_io_queues;
> +	unsigned int nr_io_queues, nr_req_queues;
>   	int i, ret;
>   
>   	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> @@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>   		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
>   	}
>   
> +	nr_req_queues = nr_io_queues;
>   	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>   	if (ret)
>   		return ret;
> +	if (nr_io_queues < nr_req_queues) {
> +		dev_err(ctrl->ctrl.device,
> +			"alloc queues %u < req no of queues %u",
> +			nr_io_queues, nr_req_queues);
> +		return -ENOMEM;
> +	}
>   
>   	ctrl->ctrl.queue_count = nr_io_queues + 1;
>   	if (ctrl->ctrl.queue_count < 2)


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23  7:21 ` Max Gurtovoy
@ 2019-05-23  7:55   ` Nirranjan Kirubaharan
  2019-05-23 11:14     ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Nirranjan Kirubaharan @ 2019-05-23  7:55 UTC


On Thursday, May 23, 2019 at 10:21:46 +0300, Max Gurtovoy wrote:
> 
> On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> >Return error -ENOMEM when nvmf target allocates lesser
> >io queues than the number of io queues requested by nvmf
> >initiator.
> 
> why can't we live with lesser queues ?

In nvme_rdma_alloc_io_queues(), the ctrl->io_queues[] counts are already
filled in on the assumption that the target will allocate all of the
requested queues.

> 
> I can demand 64K queues and the target might return 4 and it's fine
> for functionality.
> 
> where is the NULL that you see ?

In nvme_rdma_init_request(), accessing an unallocated queue_idx of
ctrl->io_queues[] causes the NULL deref.
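
To illustrate the failure mode, here is a standalone toy model (not the
driver code; all names are made up): blk-mq is sized for the *requested*
number of hw queues, so request initialization can reach a queue the
host never actually set up.

#include <stdio.h>
#include <stdlib.h>

struct device { int id; };
struct queue { struct device *device; };	/* NULL until the queue is set up */

int main(void)
{
	unsigned int requested = 8, granted = 4;	/* target grants fewer queues */
	struct queue *queues = calloc(requested + 1, sizeof(*queues));

	/* only 'granted' io queues are actually set up */
	for (unsigned int i = 1; i <= granted; i++)
		queues[i].device = malloc(sizeof(struct device));

	/* blk-mq still initializes requests for all 'requested' hw queues */
	unsigned int hctx_idx = requested - 1;		/* beyond the grant */
	struct device *dev = queues[hctx_idx + 1].device;
	printf("queue %u device = %p\n", hctx_idx + 1, (void *)dev);
	/* the real driver goes on to dereference dev here -> NULL deref */
	return 0;
}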

[  703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
[  703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
[  703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
[  703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
[  703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
[  703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
[  703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
[  703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
[  703.192195] FS:  00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
[  703.192196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
[  703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  703.192199] Call Trace:
[  703.192206]  blk_mq_alloc_rqs+0x1f0/0x290
[  703.192207]  __blk_mq_alloc_rq_map+0x46/0x80
[  703.192209]  blk_mq_map_swqueue+0x1dd/0x2e0
[  703.192210]  blk_mq_init_allocated_queue+0x3c8/0x430
[  703.192211]  blk_mq_init_queue+0x35/0x60
[  703.192213]  ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
[  703.192214]  nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
[  703.192215]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
[  703.192218]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
[  703.192222]  vfs_write+0xad/0x1b0
[  703.192224]  ksys_write+0x5a/0xd0
[  703.192228]  do_syscall_64+0x5b/0x180
[  703.192231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  703.192232] RIP: 0033:0x7fd660cddc60
[  703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[  703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
[  703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
[  703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
[  703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
[  703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0

> 
> 
> >
> >Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
> >Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
> >---
> >  drivers/nvme/host/rdma.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> >index f383146e7d0f..187007d136cc 100644
> >--- a/drivers/nvme/host/rdma.c
> >+++ b/drivers/nvme/host/rdma.c
> >@@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >  {
> >  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> >  	struct ib_device *ibdev = ctrl->device->dev;
> >-	unsigned int nr_io_queues;
> >+	unsigned int nr_io_queues, nr_req_queues;
> >  	int i, ret;
> >  	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> >@@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >  		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> >  	}
> >+	nr_req_queues = nr_io_queues;
> >  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
> >  	if (ret)
> >  		return ret;
> >+	if (nr_io_queues < nr_req_queues) {
> >+		dev_err(ctrl->ctrl.device,
> >+			"alloc queues %u < req no of queues %u",
> >+			nr_io_queues, nr_req_queues);
> >+		return -ENOMEM;
> >+	}
> >  	ctrl->ctrl.queue_count = nr_io_queues + 1;
> >  	if (ctrl->ctrl.queue_count < 2)


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23  7:55   ` Nirranjan Kirubaharan
@ 2019-05-23 11:14     ` Max Gurtovoy
  2019-05-23 11:41       ` Nirranjan Kirubaharan
  0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2019-05-23 11:14 UTC


I see.

We probably need to review the read/write/poll queue patches again.

Can you try the attached untested patch?

On 5/23/2019 10:55 AM, Nirranjan Kirubaharan wrote:
> On Thursday, May 05/23/19, 2019@10:21:46 +0300, Max Gurtovoy wrote:
>> On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
>>> Return error -ENOMEM when nvmf target allocates lesser
>>> io queues than the number of io queues requested by nvmf
>>> initiator.
>> why can't we live with lesser queues ?
> In nvme_rdma_alloc_io_queues() ctrl->io_queues[] are already filled
> assuming all the requested no of queues will be allocated by the target.
>
>> I can demand 64K queues and the target might return 4 and it's fine
>> for functionality.
>>
>> where is the NULL that you see ?
> In nvme_rdma_init_request() accessing unallocated queue_idx of
> ctrl->io_queues[] causes NULL deref.
>
> [  703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
> [  703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
> [  703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
> [  703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
> [  703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
> [  703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
> [  703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
> [  703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
> [  703.192195] FS:  00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
> [  703.192196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
> [  703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  703.192199] Call Trace:
> [  703.192206]  blk_mq_alloc_rqs+0x1f0/0x290
> [  703.192207]  __blk_mq_alloc_rq_map+0x46/0x80
> [  703.192209]  blk_mq_map_swqueue+0x1dd/0x2e0
> [  703.192210]  blk_mq_init_allocated_queue+0x3c8/0x430
> [  703.192211]  blk_mq_init_queue+0x35/0x60
> [  703.192213]  ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
> [  703.192214]  nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
> [  703.192215]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> [  703.192218]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> [  703.192222]  vfs_write+0xad/0x1b0
> [  703.192224]  ksys_write+0x5a/0xd0
> [  703.192228]  do_syscall_64+0x5b/0x180
> [  703.192231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  703.192232] RIP: 0033:0x7fd660cddc60
> [  703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> [  703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
> [  703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
> [  703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
> [  703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
> [  703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
>
>>
>>> Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
>>> Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
>>> ---
>>>   drivers/nvme/host/rdma.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index f383146e7d0f..187007d136cc 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>>   {
>>>   	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>>>   	struct ib_device *ibdev = ctrl->device->dev;
>>> -	unsigned int nr_io_queues;
>>> +	unsigned int nr_io_queues, nr_req_queues;
>>>   	int i, ret;
>>>   	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
>>> @@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>>   		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
>>>   	}
>>> +	nr_req_queues = nr_io_queues;
>>>   	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>>>   	if (ret)
>>>   		return ret;
>>> +	if (nr_io_queues < nr_req_queues) {
>>> +		dev_err(ctrl->ctrl.device,
>>> +			"alloc queues %u < req no of queues %u",
>>> +			nr_io_queues, nr_req_queues);
>>> +		return -ENOMEM;
>>> +	}
>>>   	ctrl->ctrl.queue_count = nr_io_queues + 1;
>>>   	if (ctrl->ctrl.queue_count < 2)
-------------- next part --------------
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a4ad25..d0cc981 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,7 +641,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 	struct ib_device *ibdev = ctrl->device->dev;
-	unsigned int nr_io_queues;
+	unsigned int nr_io_queues, nr_req_queues;
+	unsigned int default_queues, poll_queues = 0, write_queues = 0;
 	int i, ret;
 
 	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
@@ -651,29 +652,38 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 	 * optimal locality so we don't need more queues than
 	 * completion vectors.
 	 */
-	nr_io_queues = min_t(unsigned int, nr_io_queues,
-				ibdev->num_comp_vectors);
+	default_queues = nr_io_queues = min_t(unsigned int, nr_io_queues,
+					      ibdev->num_comp_vectors);
 
-	if (opts->nr_write_queues) {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-				min(opts->nr_write_queues, nr_io_queues);
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
-	} else {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
-	}
-
-	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
-
-	if (opts->nr_poll_queues) {
-		ctrl->io_queues[HCTX_TYPE_POLL] =
-			min(opts->nr_poll_queues, num_online_cpus());
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
-	}
+	if (opts->nr_write_queues)
+		write_queues = min(opts->nr_write_queues, default_queues);
+	if (opts->nr_poll_queues)
+		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
 
+	nr_io_queues += write_queues + poll_queues;
+	nr_req_queues = nr_io_queues;
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
 		return ret;
 
+	if (nr_req_queues <= nr_io_queues) {
+		/* set the queues according to host demand */
+		ctrl->io_queues[HCTX_TYPE_READ] = nr_req_queues - poll_queues;
+		if (write_queues)
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
+		else
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+					nr_req_queues - poll_queues;
+		if (poll_queues)
+			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
+
+	} else {
+		/* set the queues according to controller capability */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
+		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
+		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
+	}
+
 	ctrl->ctrl.queue_count = nr_io_queues + 1;
 	if (ctrl->ctrl.queue_count < 2)
 		return 0;


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23 11:14     ` Max Gurtovoy
@ 2019-05-23 11:41       ` Nirranjan Kirubaharan
  2019-05-23 14:35         ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Nirranjan Kirubaharan @ 2019-05-23 11:41 UTC


On Thursday, May 23, 2019 at 14:14:25 +0300, Max Gurtovoy wrote:
> I see.
> 
> probably we need to review again read/write/poll queues patches.
> 
> can you try the attached untested patch ?

With the attached patch, it works if I don't use the write/poll queues,
even when the target allocates fewer queues.

I see the panic below if I use poll queues with the target allocating
fewer than the requested queues.

[161557.300219] RIP: 0010:blk_mq_map_queues+0x92/0xa0
[161557.312476] Code: 39 05 26 76 02 01 0f 46 c3 39 d8 74 19 89 c0 41 8b 04 84 43 89 04 bc eb a3 5b 5d 41 5c 41 5d 41 5e 31 c0 41 5f c3 89 d8 31 d2 <f7> f5 41 03 55 0c 43 89 14 bc eb 86 66 90 66 66 66 66 90 55 b8 ff
[161557.347031] RSP: 0018:ffffa04fc3d57ce8 EFLAGS: 00010246
[161557.360280] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[161557.375517] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff924d47c11810
[161557.390701] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff924f143451e0
[161557.405828] R10: ffff924d47c03980 R11: 0000000000000000 R12: ffff924f143451e0
[161557.420882] R13: ffff924f14af4028 R14: 0000000000010120 R15: 0000000000000000
[161557.435919] FS:  00007f4566a31740(0000) GS:ffff92506fa00000(0000) knlGS:0000000000000000
[161557.451938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[161557.465622] CR2: 00007ffdf1ef99d8 CR3: 000000042213a000 CR4: 00000000000006f0
[161557.480765] Call Trace:
[161557.491147]  nvme_rdma_map_queues+0x9e/0xc0 [nvme_rdma]
[161557.504360]  blk_mq_alloc_tag_set+0x1bd/0x2d0
[161557.516666]  nvme_rdma_alloc_tagset+0xd6/0x2a0 [nvme_rdma]
[161557.530108]  nvme_rdma_setup_ctrl+0x362/0x7a0 [nvme_rdma]
[161557.543410]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
[161557.556754]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
[161557.569687]  vfs_write+0xad/0x1b0
[161557.580718]  ksys_write+0x55/0xd0
[161557.591689]  do_syscall_64+0x5b/0x1b0
[161557.602941]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[161557.615575] RIP: 0033:0x7f4566536c60
[161557.626678] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[161557.661257] RSP: 002b:00007ffdf1ef99d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[161557.676897] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4566536c60
[161557.692124] RDX: 000000000000005c RSI: 00007ffdf1efae70 RDI: 0000000000000003
[161557.707378] RBP: 00007ffdf1efae70 R08: 0000000000000000 R09: 00007f45664940fd
[161557.722648] R10: 00007ffdf1ef95a0 R11: 0000000000000246 R12: 000000000000005c
[161557.737848] R13: 0000000000000037 R14: 000000000000000b R15: 000000000064f5e0
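
For what it's worth, the faulting instruction above is a div, which would
fit blk_mq_map_queues() computing cpu % nr_queues on a poll map whose
nr_queues ended up 0 (I haven't verified this; it's just my reading of the
trace). A minimal standalone illustration of that pattern:

#include <stdio.h>

struct queue_map { unsigned int nr_queues, queue_offset; };

/* roughly the shape of the cpu -> hw queue assignment */
static unsigned int map_cpu(const struct queue_map *m, unsigned int cpu)
{
	return m->queue_offset + cpu % m->nr_queues;	/* div by zero if nr_queues == 0 */
}

int main(void)
{
	/* a poll map that was left with no queues */
	struct queue_map poll = { .nr_queues = 0, .queue_offset = 8 };

	printf("cpu 3 -> queue %u\n", map_cpu(&poll, 3));	/* dies with SIGFPE */
	return 0;
}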

> 
> On 5/23/2019 10:55 AM, Nirranjan Kirubaharan wrote:
> >On Thursday, May 05/23/19, 2019@10:21:46 +0300, Max Gurtovoy wrote:
> >>On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> >>>Return error -ENOMEM when nvmf target allocates lesser
> >>>io queues than the number of io queues requested by nvmf
> >>>initiator.
> >>why can't we live with lesser queues ?
> >In nvme_rdma_alloc_io_queues() ctrl->io_queues[] are already filled
> >assuming all the requested no of queues will be allocated by the target.
> >
> >>I can demand 64K queues and the target might return 4 and it's fine
> >>for functionality.
> >>
> >>where is the NULL that you see ?
> >In nvme_rdma_init_request() accessing unallocated queue_idx of
> >ctrl->io_queues[] causes NULL deref.
> >
> >[  703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
> >[  703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
> >[  703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
> >[  703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
> >[  703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
> >[  703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
> >[  703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
> >[  703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
> >[  703.192195] FS:  00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
> >[  703.192196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[  703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
> >[  703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >[  703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >[  703.192199] Call Trace:
> >[  703.192206]  blk_mq_alloc_rqs+0x1f0/0x290
> >[  703.192207]  __blk_mq_alloc_rq_map+0x46/0x80
> >[  703.192209]  blk_mq_map_swqueue+0x1dd/0x2e0
> >[  703.192210]  blk_mq_init_allocated_queue+0x3c8/0x430
> >[  703.192211]  blk_mq_init_queue+0x35/0x60
> >[  703.192213]  ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
> >[  703.192214]  nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
> >[  703.192215]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> >[  703.192218]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> >[  703.192222]  vfs_write+0xad/0x1b0
> >[  703.192224]  ksys_write+0x5a/0xd0
> >[  703.192228]  do_syscall_64+0x5b/0x180
> >[  703.192231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >[  703.192232] RIP: 0033:0x7fd660cddc60
> >[  703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> >[  703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >[  703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
> >[  703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
> >[  703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
> >[  703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
> >[  703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
> >
> >>
> >>>Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
> >>>Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
> >>>---
> >>>  drivers/nvme/host/rdma.c | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> >>>index f383146e7d0f..187007d136cc 100644
> >>>--- a/drivers/nvme/host/rdma.c
> >>>+++ b/drivers/nvme/host/rdma.c
> >>>@@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>>  {
> >>>  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> >>>  	struct ib_device *ibdev = ctrl->device->dev;
> >>>-	unsigned int nr_io_queues;
> >>>+	unsigned int nr_io_queues, nr_req_queues;
> >>>  	int i, ret;
> >>>  	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> >>>@@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>>  		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> >>>  	}
> >>>+	nr_req_queues = nr_io_queues;
> >>>  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
> >>>  	if (ret)
> >>>  		return ret;
> >>>+	if (nr_io_queues < nr_req_queues) {
> >>>+		dev_err(ctrl->ctrl.device,
> >>>+			"alloc queues %u < req no of queues %u",
> >>>+			nr_io_queues, nr_req_queues);
> >>>+		return -ENOMEM;
> >>>+	}
> >>>  	ctrl->ctrl.queue_count = nr_io_queues + 1;
> >>>  	if (ctrl->ctrl.queue_count < 2)

> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 5a4ad25..d0cc981 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -641,7 +641,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>  {
>  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>  	struct ib_device *ibdev = ctrl->device->dev;
> -	unsigned int nr_io_queues;
> +	unsigned int nr_io_queues, nr_req_queues;
> +	unsigned int default_queues, poll_queues = 0, write_queues = 0;
>  	int i, ret;
>  
>  	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> @@ -651,29 +652,38 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>  	 * optimal locality so we don't need more queues than
>  	 * completion vectors.
>  	 */
> -	nr_io_queues = min_t(unsigned int, nr_io_queues,
> -				ibdev->num_comp_vectors);
> +	default_queues = nr_io_queues = min_t(unsigned int, nr_io_queues,
> +					      ibdev->num_comp_vectors);
>  
> -	if (opts->nr_write_queues) {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> -				min(opts->nr_write_queues, nr_io_queues);
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -	} else {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> -	}
> -
> -	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> -
> -	if (opts->nr_poll_queues) {
> -		ctrl->io_queues[HCTX_TYPE_POLL] =
> -			min(opts->nr_poll_queues, num_online_cpus());
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> -	}
> +	if (opts->nr_write_queues)
> +		write_queues = min(opts->nr_write_queues, default_queues);
> +	if (opts->nr_poll_queues)
> +		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
>  
> +	nr_io_queues += write_queues + poll_queues;
> +	nr_req_queues = nr_io_queues;
>  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>  	if (ret)
>  		return ret;
>  
> +	if (nr_req_queues <= nr_io_queues) {
> +		/* set the queues according to host demand */
> +		ctrl->io_queues[HCTX_TYPE_READ] = nr_req_queues - poll_queues;
> +		if (write_queues)
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
> +		else
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +					nr_req_queues - poll_queues;
> +		if (poll_queues)
> +			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
> +
> +	} else {
> +		/* set the queues according to controller capability */
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> +		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> +		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
> +	}
> +
>  	ctrl->ctrl.queue_count = nr_io_queues + 1;
>  	if (ctrl->ctrl.queue_count < 2)
>  		return 0;


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23 11:41       ` Nirranjan Kirubaharan
@ 2019-05-23 14:35         ` Max Gurtovoy
  2019-05-24  4:43           ` Nirranjan Kirubaharan
  0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2019-05-23 14:35 UTC


I'll take a deeper look at it next week, but please try the new attached patch.

On 5/23/2019 2:41 PM, Nirranjan Kirubaharan wrote:
> On Thursday, May 05/23/19, 2019@14:14:25 +0300, Max Gurtovoy wrote:
>> I see.
>>
>> probably we need to review again read/write/poll queues patches.
>>
>> can you try the attached untested patch ?
> Using the attached patch, it works if I dont use the write/poll queues,
> even when target allocates lesser queues.
>
> I see the below panic, if I use poll queues with target allocating
> less than the requested queues.
>
> [161557.300219] RIP: 0010:blk_mq_map_queues+0x92/0xa0
> [161557.312476] Code: 39 05 26 76 02 01 0f 46 c3 39 d8 74 19 89 c0 41 8b 04 84 43 89 04 bc eb a3 5b 5d 41 5c 41 5d 41 5e 31 c0 41 5f c3 89 d8 31 d2 <f7> f5 41 03 55 0c 43 89 14 bc eb 86 66 90 66 66 66 66 90 55 b8 ff
> [161557.347031] RSP: 0018:ffffa04fc3d57ce8 EFLAGS: 00010246
> [161557.360280] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [161557.375517] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff924d47c11810
> [161557.390701] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff924f143451e0
> [161557.405828] R10: ffff924d47c03980 R11: 0000000000000000 R12: ffff924f143451e0
> [161557.420882] R13: ffff924f14af4028 R14: 0000000000010120 R15: 0000000000000000
> [161557.435919] FS:  00007f4566a31740(0000) GS:ffff92506fa00000(0000) knlGS:0000000000000000
> [161557.451938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [161557.465622] CR2: 00007ffdf1ef99d8 CR3: 000000042213a000 CR4: 00000000000006f0
> [161557.480765] Call Trace:
> [161557.491147]  nvme_rdma_map_queues+0x9e/0xc0 [nvme_rdma]
> [161557.504360]  blk_mq_alloc_tag_set+0x1bd/0x2d0
> [161557.516666]  nvme_rdma_alloc_tagset+0xd6/0x2a0 [nvme_rdma]
> [161557.530108]  nvme_rdma_setup_ctrl+0x362/0x7a0 [nvme_rdma]
> [161557.543410]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> [161557.556754]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> [161557.569687]  vfs_write+0xad/0x1b0
> [161557.580718]  ksys_write+0x55/0xd0
> [161557.591689]  do_syscall_64+0x5b/0x1b0
> [161557.602941]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [161557.615575] RIP: 0033:0x7f4566536c60
> [161557.626678] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> [161557.661257] RSP: 002b:00007ffdf1ef99d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [161557.676897] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4566536c60
> [161557.692124] RDX: 000000000000005c RSI: 00007ffdf1efae70 RDI: 0000000000000003
> [161557.707378] RBP: 00007ffdf1efae70 R08: 0000000000000000 R09: 00007f45664940fd
> [161557.722648] R10: 00007ffdf1ef95a0 R11: 0000000000000246 R12: 000000000000005c
> [161557.737848] R13: 0000000000000037 R14: 000000000000000b R15: 000000000064f5e0
>
>> On 5/23/2019 10:55 AM, Nirranjan Kirubaharan wrote:
>>> On Thursday, May 05/23/19, 2019@10:21:46 +0300, Max Gurtovoy wrote:
>>>> On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
>>>>> Return error -ENOMEM when nvmf target allocates lesser
>>>>> io queues than the number of io queues requested by nvmf
>>>>> initiator.
>>>> why can't we live with lesser queues ?
>>> In nvme_rdma_alloc_io_queues() ctrl->io_queues[] are already filled
>>> assuming all the requested no of queues will be allocated by the target.
>>>
>>>> I can demand 64K queues and the target might return 4 and it's fine
>>>> for functionality.
>>>>
>>>> where is the NULL that you see ?
>>> In nvme_rdma_init_request() accessing unallocated queue_idx of
>>> ctrl->io_queues[] causes NULL deref.
>>>
>>> [  703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
>>> [  703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
>>> [  703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
>>> [  703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
>>> [  703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
>>> [  703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
>>> [  703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
>>> [  703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
>>> [  703.192195] FS:  00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
>>> [  703.192196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
>>> [  703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [  703.192199] Call Trace:
>>> [  703.192206]  blk_mq_alloc_rqs+0x1f0/0x290
>>> [  703.192207]  __blk_mq_alloc_rq_map+0x46/0x80
>>> [  703.192209]  blk_mq_map_swqueue+0x1dd/0x2e0
>>> [  703.192210]  blk_mq_init_allocated_queue+0x3c8/0x430
>>> [  703.192211]  blk_mq_init_queue+0x35/0x60
>>> [  703.192213]  ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
>>> [  703.192214]  nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
>>> [  703.192215]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
>>> [  703.192218]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
>>> [  703.192222]  vfs_write+0xad/0x1b0
>>> [  703.192224]  ksys_write+0x5a/0xd0
>>> [  703.192228]  do_syscall_64+0x5b/0x180
>>> [  703.192231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [  703.192232] RIP: 0033:0x7fd660cddc60
>>> [  703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
>>> [  703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>> [  703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
>>> [  703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
>>> [  703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
>>> [  703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
>>> [  703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
>>>
>>>>> Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
>>>>> Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
>>>>> ---
>>>>>   drivers/nvme/host/rdma.c | 9 ++++++++-
>>>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>>> index f383146e7d0f..187007d136cc 100644
>>>>> --- a/drivers/nvme/host/rdma.c
>>>>> +++ b/drivers/nvme/host/rdma.c
>>>>> @@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>>>>   {
>>>>>   	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>>>>>   	struct ib_device *ibdev = ctrl->device->dev;
>>>>> -	unsigned int nr_io_queues;
>>>>> +	unsigned int nr_io_queues, nr_req_queues;
>>>>>   	int i, ret;
>>>>>   	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
>>>>> @@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>>>>   		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
>>>>>   	}
>>>>> +	nr_req_queues = nr_io_queues;
>>>>>   	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>>>>>   	if (ret)
>>>>>   		return ret;
>>>>> +	if (nr_io_queues < nr_req_queues) {
>>>>> +		dev_err(ctrl->ctrl.device,
>>>>> +			"alloc queues %u < req no of queues %u",
>>>>> +			nr_io_queues, nr_req_queues);
>>>>> +		return -ENOMEM;
>>>>> +	}
>>>>>   	ctrl->ctrl.queue_count = nr_io_queues + 1;
>>>>>   	if (ctrl->ctrl.queue_count < 2)
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 5a4ad25..d0cc981 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -641,7 +641,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>   {
>>   	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>>   	struct ib_device *ibdev = ctrl->device->dev;
>> -	unsigned int nr_io_queues;
>> +	unsigned int nr_io_queues, nr_req_queues;
>> +	unsigned int default_queues, poll_queues = 0, write_queues = 0;
>>   	int i, ret;
>>   
>>   	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
>> @@ -651,29 +652,38 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>   	 * optimal locality so we don't need more queues than
>>   	 * completion vectors.
>>   	 */
>> -	nr_io_queues = min_t(unsigned int, nr_io_queues,
>> -				ibdev->num_comp_vectors);
>> +	default_queues = nr_io_queues = min_t(unsigned int, nr_io_queues,
>> +					      ibdev->num_comp_vectors);
>>   
>> -	if (opts->nr_write_queues) {
>> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
>> -				min(opts->nr_write_queues, nr_io_queues);
>> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> -	} else {
>> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
>> -	}
>> -
>> -	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
>> -
>> -	if (opts->nr_poll_queues) {
>> -		ctrl->io_queues[HCTX_TYPE_POLL] =
>> -			min(opts->nr_poll_queues, num_online_cpus());
>> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
>> -	}
>> +	if (opts->nr_write_queues)
>> +		write_queues = min(opts->nr_write_queues, default_queues);
>> +	if (opts->nr_poll_queues)
>> +		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
>>   
>> +	nr_io_queues += write_queues + poll_queues;
>> +	nr_req_queues = nr_io_queues;
>>   	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>>   	if (ret)
>>   		return ret;
>>   
>> +	if (nr_req_queues <= nr_io_queues) {
>> +		/* set the queues according to host demand */
>> +		ctrl->io_queues[HCTX_TYPE_READ] = nr_req_queues - poll_queues;
>> +		if (write_queues)
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
>> +		else
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
>> +					nr_req_queues - poll_queues;
>> +		if (poll_queues)
>> +			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
>> +
>> +	} else {
>> +		/* set the queues according to controller capability */
>> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
>> +		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
>> +		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
>> +	}
>> +
>>   	ctrl->ctrl.queue_count = nr_io_queues + 1;
>>   	if (ctrl->ctrl.queue_count < 2)
>>   		return 0;
-------------- next part --------------
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5..f924451 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -247,6 +247,9 @@ struct nvme_ctrl {
 
 	struct page *discard_page;
 	unsigned long discard_page_busy;
+
+	unsigned int nr_write_queues;
+	unsigned int nr_poll_queues;
 };
 
 enum nvme_iopolicy {
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a4ad25..5e90e92 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,39 +641,56 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 	struct ib_device *ibdev = ctrl->device->dev;
-	unsigned int nr_io_queues;
+	unsigned int nr_io_queues, nr_req_queues;
+	unsigned int default_queues, poll_queues = 0, write_queues = 0;
 	int i, ret;
 
-	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
+	default_queues = min(opts->nr_io_queues, num_online_cpus());
 
 	/*
 	 * we map queues according to the device irq vectors for
 	 * optimal locality so we don't need more queues than
 	 * completion vectors.
 	 */
-	nr_io_queues = min_t(unsigned int, nr_io_queues,
-				ibdev->num_comp_vectors);
+	default_queues = min_t(unsigned int, default_queues,
+			       ibdev->num_comp_vectors);
 
-	if (opts->nr_write_queues) {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-				min(opts->nr_write_queues, nr_io_queues);
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
-	} else {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
-	}
-
-	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
-
-	if (opts->nr_poll_queues) {
-		ctrl->io_queues[HCTX_TYPE_POLL] =
-			min(opts->nr_poll_queues, num_online_cpus());
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
-	}
+	if (opts->nr_write_queues)
+		write_queues = min(opts->nr_write_queues, default_queues);
+	if (opts->nr_poll_queues)
+		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
 
+	nr_io_queues = default_queues + write_queues + poll_queues;
+	nr_req_queues = nr_io_queues;
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
 		return ret;
 
+	if (nr_req_queues <= nr_io_queues) {
+		/* set the queues according to host demand */
+		ctrl->io_queues[HCTX_TYPE_READ] =
+				nr_req_queues - poll_queues - write_queues;
+		if (write_queues) {
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
+			ctrl->ctrl.nr_write_queues = write_queues;
+		} else {
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+					ctrl->io_queues[HCTX_TYPE_READ];
+		}
+		if (poll_queues) {
+			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
+			ctrl->ctrl.nr_poll_queues = poll_queues;
+		}
+		nr_io_queues = nr_req_queues;
+	} else {
+		/* set the queues according to controller capability */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
+		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
+		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
+		ctrl->ctrl.nr_write_queues = 0;
+		ctrl->ctrl.nr_poll_queues = 0;
+	}
+
 	ctrl->ctrl.queue_count = nr_io_queues + 1;
 	if (ctrl->ctrl.queue_count < 2)
 		return 0;
@@ -739,7 +756,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
-		set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
+		set->nr_maps = nctrl->nr_poll_queues ? HCTX_MAX_TYPES : 2;
 	}
 
 	ret = blk_mq_alloc_tag_set(set);
@@ -1791,8 +1808,7 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
 	set->map[HCTX_TYPE_DEFAULT].nr_queues =
 			ctrl->io_queues[HCTX_TYPE_DEFAULT];
-	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
-	if (ctrl->ctrl.opts->nr_write_queues) {
+	if (ctrl->ctrl.nr_write_queues) {
 		/* separate read/write queues */
 		set->map[HCTX_TYPE_READ].queue_offset =
 				ctrl->io_queues[HCTX_TYPE_DEFAULT];
@@ -1800,17 +1816,19 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 		/* mixed read/write queues */
 		set->map[HCTX_TYPE_READ].queue_offset = 0;
 	}
+	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
+
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
 			ctrl->device->dev, 0);
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
 			ctrl->device->dev, 0);
 
-	if (ctrl->ctrl.opts->nr_poll_queues) {
+	if (ctrl->ctrl.nr_poll_queues) {
 		set->map[HCTX_TYPE_POLL].nr_queues =
 				ctrl->io_queues[HCTX_TYPE_POLL];
 		set->map[HCTX_TYPE_POLL].queue_offset =
 				ctrl->io_queues[HCTX_TYPE_DEFAULT];
-		if (ctrl->ctrl.opts->nr_write_queues)
+		if (ctrl->ctrl.nr_write_queues)
 			set->map[HCTX_TYPE_POLL].queue_offset +=
 				ctrl->io_queues[HCTX_TYPE_READ];
 		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);


* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-23 14:35         ` Max Gurtovoy
@ 2019-05-24  4:43           ` Nirranjan Kirubaharan
  2019-05-24  6:24             ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Nirranjan Kirubaharan @ 2019-05-24  4:43 UTC


On Thursday, May 23, 2019 at 17:35:09 +0300, Max Gurtovoy wrote:
> I'll take deeper look on it next week but please try the new attached patch

This new patch holds up in the minimal testing I did, with the target
allocating fewer queues.

> 
> On 5/23/2019 2:41 PM, Nirranjan Kirubaharan wrote:
> >On Thursday, May 05/23/19, 2019@14:14:25 +0300, Max Gurtovoy wrote:
> >>I see.
> >>
> >>probably we need to review again read/write/poll queues patches.
> >>
> >>can you try the attached untested patch ?
> >Using the attached patch, it works if I dont use the write/poll queues,
> >even when target allocates lesser queues.
> >
> >I see the below panic, if I use poll queues with target allocating
> >less than the requested queues.
> >
> >[161557.300219] RIP: 0010:blk_mq_map_queues+0x92/0xa0
> >[161557.312476] Code: 39 05 26 76 02 01 0f 46 c3 39 d8 74 19 89 c0 41 8b 04 84 43 89 04 bc eb a3 5b 5d 41 5c 41 5d 41 5e 31 c0 41 5f c3 89 d8 31 d2 <f7> f5 41 03 55 0c 43 89 14 bc eb 86 66 90 66 66 66 66 90 55 b8 ff
> >[161557.347031] RSP: 0018:ffffa04fc3d57ce8 EFLAGS: 00010246
> >[161557.360280] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >[161557.375517] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff924d47c11810
> >[161557.390701] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff924f143451e0
> >[161557.405828] R10: ffff924d47c03980 R11: 0000000000000000 R12: ffff924f143451e0
> >[161557.420882] R13: ffff924f14af4028 R14: 0000000000010120 R15: 0000000000000000
> >[161557.435919] FS:  00007f4566a31740(0000) GS:ffff92506fa00000(0000) knlGS:0000000000000000
> >[161557.451938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[161557.465622] CR2: 00007ffdf1ef99d8 CR3: 000000042213a000 CR4: 00000000000006f0
> >[161557.480765] Call Trace:
> >[161557.491147]  nvme_rdma_map_queues+0x9e/0xc0 [nvme_rdma]
> >[161557.504360]  blk_mq_alloc_tag_set+0x1bd/0x2d0
> >[161557.516666]  nvme_rdma_alloc_tagset+0xd6/0x2a0 [nvme_rdma]
> >[161557.530108]  nvme_rdma_setup_ctrl+0x362/0x7a0 [nvme_rdma]
> >[161557.543410]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> >[161557.556754]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> >[161557.569687]  vfs_write+0xad/0x1b0
> >[161557.580718]  ksys_write+0x55/0xd0
> >[161557.591689]  do_syscall_64+0x5b/0x1b0
> >[161557.602941]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >[161557.615575] RIP: 0033:0x7f4566536c60
> >[161557.626678] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> >[161557.661257] RSP: 002b:00007ffdf1ef99d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >[161557.676897] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4566536c60
> >[161557.692124] RDX: 000000000000005c RSI: 00007ffdf1efae70 RDI: 0000000000000003
> >[161557.707378] RBP: 00007ffdf1efae70 R08: 0000000000000000 R09: 00007f45664940fd
> >[161557.722648] R10: 00007ffdf1ef95a0 R11: 0000000000000246 R12: 000000000000005c
> >[161557.737848] R13: 0000000000000037 R14: 000000000000000b R15: 000000000064f5e0
> >
> >>On 5/23/2019 10:55 AM, Nirranjan Kirubaharan wrote:
> >>>On Thursday, May 05/23/19, 2019@10:21:46 +0300, Max Gurtovoy wrote:
> >>>>On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> >>>>>Return error -ENOMEM when nvmf target allocates lesser
> >>>>>io queues than the number of io queues requested by nvmf
> >>>>>initiator.
> >>>>why can't we live with lesser queues ?
> >>>In nvme_rdma_alloc_io_queues() ctrl->io_queues[] are already filled
> >>>assuming all the requested no of queues will be allocated by the target.
> >>>
> >>>>I can demand 64K queues and the target might return 4 and it's fine
> >>>>for functionality.
> >>>>
> >>>>where is the NULL that you see ?
> >>>In nvme_rdma_init_request() accessing unallocated queue_idx of
> >>>ctrl->io_queues[] causes NULL deref.
> >>>
> >>>[  703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
> >>>[  703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
> >>>[  703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
> >>>[  703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
> >>>[  703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
> >>>[  703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
> >>>[  703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
> >>>[  703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
> >>>[  703.192195] FS:  00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
> >>>[  703.192196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>[  703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
> >>>[  703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>[  703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>[  703.192199] Call Trace:
> >>>[  703.192206]  blk_mq_alloc_rqs+0x1f0/0x290
> >>>[  703.192207]  __blk_mq_alloc_rq_map+0x46/0x80
> >>>[  703.192209]  blk_mq_map_swqueue+0x1dd/0x2e0
> >>>[  703.192210]  blk_mq_init_allocated_queue+0x3c8/0x430
> >>>[  703.192211]  blk_mq_init_queue+0x35/0x60
> >>>[  703.192213]  ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
> >>>[  703.192214]  nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
> >>>[  703.192215]  nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> >>>[  703.192218]  nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> >>>[  703.192222]  vfs_write+0xad/0x1b0
> >>>[  703.192224]  ksys_write+0x5a/0xd0
> >>>[  703.192228]  do_syscall_64+0x5b/0x180
> >>>[  703.192231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>[  703.192232] RIP: 0033:0x7fd660cddc60
> >>>[  703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> >>>[  703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >>>[  703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
> >>>[  703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
> >>>[  703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
> >>>[  703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
> >>>[  703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
> >>>
> >>>>>Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
> >>>>>Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
> >>>>>---
> >>>>>  drivers/nvme/host/rdma.c | 9 ++++++++-
> >>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>
> >>>>>diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> >>>>>index f383146e7d0f..187007d136cc 100644
> >>>>>--- a/drivers/nvme/host/rdma.c
> >>>>>+++ b/drivers/nvme/host/rdma.c
> >>>>>@@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>>>>  {
> >>>>>  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> >>>>>  	struct ib_device *ibdev = ctrl->device->dev;
> >>>>>-	unsigned int nr_io_queues;
> >>>>>+	unsigned int nr_io_queues, nr_req_queues;
> >>>>>  	int i, ret;
> >>>>>  	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> >>>>>@@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>>>>  		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> >>>>>  	}
> >>>>>+	nr_req_queues = nr_io_queues;
> >>>>>  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
> >>>>>  	if (ret)
> >>>>>  		return ret;
> >>>>>+	if (nr_io_queues < nr_req_queues) {
> >>>>>+		dev_err(ctrl->ctrl.device,
> >>>>>+			"alloc queues %u < req no of queues %u",
> >>>>>+			nr_io_queues, nr_req_queues);
> >>>>>+		return -ENOMEM;
> >>>>>+	}
> >>>>>  	ctrl->ctrl.queue_count = nr_io_queues + 1;
> >>>>>  	if (ctrl->ctrl.queue_count < 2)
> >>diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> >>index 5a4ad25..d0cc981 100644
> >>--- a/drivers/nvme/host/rdma.c
> >>+++ b/drivers/nvme/host/rdma.c
> >>@@ -641,7 +641,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>  {
> >>  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> >>  	struct ib_device *ibdev = ctrl->device->dev;
> >>-	unsigned int nr_io_queues;
> >>+	unsigned int nr_io_queues, nr_req_queues;
> >>+	unsigned int default_queues, poll_queues = 0, write_queues = 0;
> >>  	int i, ret;
> >>  	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> >>@@ -651,29 +652,38 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> >>  	 * optimal locality so we don't need more queues than
> >>  	 * completion vectors.
> >>  	 */
> >>-	nr_io_queues = min_t(unsigned int, nr_io_queues,
> >>-				ibdev->num_comp_vectors);
> >>+	default_queues = nr_io_queues = min_t(unsigned int, nr_io_queues,
> >>+					      ibdev->num_comp_vectors);
> >>-	if (opts->nr_write_queues) {
> >>-		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> >>-				min(opts->nr_write_queues, nr_io_queues);
> >>-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
> >>-	} else {
> >>-		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> >>-	}
> >>-
> >>-	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> >>-
> >>-	if (opts->nr_poll_queues) {
> >>-		ctrl->io_queues[HCTX_TYPE_POLL] =
> >>-			min(opts->nr_poll_queues, num_online_cpus());
> >>-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> >>-	}
> >>+	if (opts->nr_write_queues)
> >>+		write_queues = min(opts->nr_write_queues, default_queues);
> >>+	if (opts->nr_poll_queues)
> >>+		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
> >>+	nr_io_queues += write_queues + poll_queues;
> >>+	nr_req_queues = nr_io_queues;
> >>  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
> >>  	if (ret)
> >>  		return ret;
> >>+	if (nr_req_queues <= nr_io_queues) {
> >>+		/* set the queues according to host demand */
> >>+		ctrl->io_queues[HCTX_TYPE_READ] = nr_req_queues - poll_queues;
> >>+		if (write_queues)
> >>+			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
> >>+		else
> >>+			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> >>+					nr_req_queues - poll_queues;
> >>+		if (poll_queues)
> >>+			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
> >>+
> >>+	} else {
> >>+		/* set the queues according to controller capability */
> >>+		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> >>+		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> >>+		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
> >>+	}
> >>+
> >>  	ctrl->ctrl.queue_count = nr_io_queues + 1;
> >>  	if (ctrl->ctrl.queue_count < 2)
> >>  		return 0;

> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 5ee75b5..f924451 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -247,6 +247,9 @@ struct nvme_ctrl {
>  
>  	struct page *discard_page;
>  	unsigned long discard_page_busy;
> +
> +	unsigned int nr_write_queues;
> +	unsigned int nr_poll_queues;
>  };
>  
>  enum nvme_iopolicy {
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 5a4ad25..5e90e92 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -641,39 +641,56 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>  {
>  	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>  	struct ib_device *ibdev = ctrl->device->dev;
> -	unsigned int nr_io_queues;
> +	unsigned int nr_io_queues, nr_req_queues;
> +	unsigned int default_queues, poll_queues = 0, write_queues = 0;
>  	int i, ret;
>  
> -	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> +	default_queues = min(opts->nr_io_queues, num_online_cpus());
>  
>  	/*
>  	 * we map queues according to the device irq vectors for
>  	 * optimal locality so we don't need more queues than
>  	 * completion vectors.
>  	 */
> -	nr_io_queues = min_t(unsigned int, nr_io_queues,
> -				ibdev->num_comp_vectors);
> +	default_queues = min_t(unsigned int, default_queues,
> +			       ibdev->num_comp_vectors);
>  
> -	if (opts->nr_write_queues) {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> -				min(opts->nr_write_queues, nr_io_queues);
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -	} else {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> -	}
> -
> -	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> -
> -	if (opts->nr_poll_queues) {
> -		ctrl->io_queues[HCTX_TYPE_POLL] =
> -			min(opts->nr_poll_queues, num_online_cpus());
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> -	}
> +	if (opts->nr_write_queues)
> +		write_queues = min(opts->nr_write_queues, default_queues);
> +	if (opts->nr_poll_queues)
> +		poll_queues = min(opts->nr_poll_queues, num_online_cpus());
>  
> +	nr_io_queues = default_queues + write_queues + poll_queues;
> +	nr_req_queues = nr_io_queues;
>  	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>  	if (ret)
>  		return ret;
>  
> +	if (nr_req_queues <= nr_io_queues) {
> +		/* set the queues according to host demand */
> +		ctrl->io_queues[HCTX_TYPE_READ] =
> +				nr_req_queues - poll_queues - write_queues;
> +		if (write_queues) {
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
> +			ctrl->ctrl.nr_write_queues = write_queues;
> +		} else {
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +					ctrl->io_queues[HCTX_TYPE_READ];
> +		}
> +		if (poll_queues) {
> +			ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
> +			ctrl->ctrl.nr_poll_queues = poll_queues;
> +		}
> +		nr_io_queues = nr_req_queues;
> +	} else {
> +		/* set the queues according to controller capability */
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> +		ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> +		ctrl->io_queues[HCTX_TYPE_POLL] = 0;
> +		ctrl->ctrl.nr_write_queues = 0;
> +		ctrl->ctrl.nr_poll_queues = 0;
> +	}
> +
>  	ctrl->ctrl.queue_count = nr_io_queues + 1;
>  	if (ctrl->ctrl.queue_count < 2)
>  		return 0;
> @@ -739,7 +756,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
>  		set->driver_data = ctrl;
>  		set->nr_hw_queues = nctrl->queue_count - 1;
>  		set->timeout = NVME_IO_TIMEOUT;
> -		set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
> +		set->nr_maps = nctrl->nr_poll_queues ? HCTX_MAX_TYPES : 2;
>  	}
>  
>  	ret = blk_mq_alloc_tag_set(set);
> @@ -1791,8 +1808,7 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>  	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
>  	set->map[HCTX_TYPE_DEFAULT].nr_queues =
>  			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
> -	if (ctrl->ctrl.opts->nr_write_queues) {
> +	if (ctrl->ctrl.nr_write_queues) {
>  		/* separate read/write queues */
>  		set->map[HCTX_TYPE_READ].queue_offset =
>  				ctrl->io_queues[HCTX_TYPE_DEFAULT];
> @@ -1800,17 +1816,19 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>  		/* mixed read/write queues */
>  		set->map[HCTX_TYPE_READ].queue_offset = 0;
>  	}
> +	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
> +
>  	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
>  			ctrl->device->dev, 0);
>  	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
>  			ctrl->device->dev, 0);
>  
> -	if (ctrl->ctrl.opts->nr_poll_queues) {
> +	if (ctrl->ctrl.nr_poll_queues) {
>  		set->map[HCTX_TYPE_POLL].nr_queues =
>  				ctrl->io_queues[HCTX_TYPE_POLL];
>  		set->map[HCTX_TYPE_POLL].queue_offset =
>  				ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -		if (ctrl->ctrl.opts->nr_write_queues)
> +		if (ctrl->ctrl.nr_write_queues)
>  			set->map[HCTX_TYPE_POLL].queue_offset +=
>  				ctrl->io_queues[HCTX_TYPE_READ];
>  		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
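
For readers following the diff, here is how its two branches play out with
illustrative numbers (not taken from the thread):

	/*
	 * Worked example: opts request default=8, write=4, poll=2,
	 * so nr_req_queues = 8 + 4 + 2 = 14.
	 *
	 * Target grants 16 (>= 14): host-demand branch
	 *   io_queues[HCTX_TYPE_READ]    = 14 - 2 - 4 = 8
	 *   io_queues[HCTX_TYPE_DEFAULT] = 4
	 *   io_queues[HCTX_TYPE_POLL]    = 2
	 *
	 * Target grants 10 (< 14): fallback branch
	 *   io_queues[HCTX_TYPE_DEFAULT] = 10
	 *   io_queues[HCTX_TYPE_READ]    = 10
	 *   io_queues[HCTX_TYPE_POLL]    = 0
	 *
	 * Note the fallback also sets the READ map to the full grant;
	 * that is the (28/28/0) vs (28/0/0) point raised below.
	 */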

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-24  4:43           ` Nirranjan Kirubaharan
@ 2019-05-24  6:24             ` Sagi Grimberg
  2019-05-24 12:07               ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-05-24  6:24 UTC (permalink / raw)



>> I'll take a deeper look at it next week, but please try the new
>> attached patch.
> This new patch holds up in the minimal testing I did, with the target
> allocating fewer queues.

Hi,

Just got to this one (after ramping up on activity post-vacation...)

Max, I think your patch misses the case where the default+read+poll
queue count exceeds the controller's queue limit but a queue separation
can still fit.

For example:
- requested 10/10/10 (default/read/poll)
- supported 28

If I read your patch correctly, this will fall back to shared read/write
queue maps with no poll queues (10/0/0), whereas in this case we'd want
to keep the separation and simply have fewer poll queues (10/10/8).
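
A rough sketch of the clamping described above (a hypothetical helper,
not code from either posted patch): take the shortfall out of the poll
queues first, then the read queues, so the read/write separation
survives whenever it can.

static void clamp_io_queues(unsigned int granted, unsigned int *def_q,
			    unsigned int *read_q, unsigned int *poll_q)
{
	unsigned int want = *def_q + *read_q + *poll_q;
	unsigned int shortfall;

	if (granted >= want)
		return;

	shortfall = want - granted;

	/* Poll queues are an optimization; shrink them first. */
	if (*poll_q >= shortfall) {
		*poll_q -= shortfall;
		return;
	}
	shortfall -= *poll_q;
	*poll_q = 0;

	/* Then give up the separate read set. */
	if (*read_q >= shortfall) {
		*read_q -= shortfall;
		return;
	}
	shortfall -= *read_q;
	*read_q = 0;

	/* Whatever remains caps the default (read/write) set. */
	*def_q -= shortfall;
}

With def/read/poll = 10/10/10 and a grant of 28, this yields 10/10/8.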

I posted an alternative patch for review.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-24  6:24             ` Sagi Grimberg
@ 2019-05-24 12:07               ` Max Gurtovoy
  2019-05-24 16:11                 ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2019-05-24 12:07 UTC (permalink / raw)



On 5/24/2019 9:24 AM, Sagi Grimberg wrote:
>
>>> I'll take a deeper look at it next week, but please try the new
>>> attached patch.
>> This new patch holds up in the minimal testing I did, with the target
>> allocating fewer queues.
>
> Hi,
>
> Just got to this one (after ramping up on activity post-vacation...)
>
> Max, I think your patch misses the case where the default+read+poll
> queue count exceeds the controller's queue limit but a queue separation
> can still fit.

My patch was just a draft to see if it fixes the NULL deref.



>
> For example:
> - requested 10/10/10 (default/read/poll)
> - supported 28
>
> If I read your patch correctly, this will fall back to shared read/write
> queue maps with no poll queues (10/0/0), whereas in this case we'd want
> to keep the separation and simply have fewer poll queues (10/10/8).
>
It will fall back to shared read/write with 28 queues, same as it was
without this feature (actually, I think I need to zero
ctrl->io_queues[HCTX_TYPE_READ] for that, to get (28/0/0) instead of
(28/28/0)).
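
In code, that zeroing would look roughly like this (a sketch reusing
the identifiers from the patch above; untested):

	/* Shared read/write fallback: route all granted queues through
	 * the default map and leave the read and poll maps empty, so
	 * the result is (28/0/0) rather than (28/28/0).
	 */
	ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
	ctrl->io_queues[HCTX_TYPE_READ] = 0;
	ctrl->io_queues[HCTX_TYPE_POLL] = 0;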

So the default queues can be read/write queues, and also write-only
queues (in case we have read_queues != 0)?

I will review this feature again next week...


> I posted an alternative patch for review.

I'll review it early next week; I have a few comments to add there.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated.
  2019-05-24 12:07               ` Max Gurtovoy
@ 2019-05-24 16:11                 ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-05-24 16:11 UTC (permalink / raw)



>> Hi,
>>
>> Just got to this one (after ramping up on activity post-vacation...)
>>
>> Max, I think your patch misses the case where the default+read+poll
>> queue count exceeds the controller's queue limit but a queue separation
>> can still fit.
> 
> My patch was just a draft to see if it fixes the NULL deref.

Yeah, I know...

>> For example:
>> - requested 10/10/10 (default/read/poll)
>> - supported 28
>>
>> If I read your patch correctly, this will fall back to shared read/write
>> queue maps with no poll queues (10/0/0), whereas in this case we'd want
>> to keep the separation and simply have fewer poll queues (10/10/8).
>>
> It will fall back to shared read/write with 28 queues, same as it was
> without this feature

Oh... this can leave unused queues (if, say, we have 10 CPU cores), right?

> (actually, I think I need to zero
> ctrl->io_queues[HCTX_TYPE_READ] for that, to get (28/0/0) instead of
> (28/28/0)).
>
> So the default queues can be read/write queues, and also write-only
> queues (in case we have read_queues != 0)?
> 
> I will review this feature again next week...
> 
> 
>> I posted an alternative patch for review.
> 
> I'll review it early next week; I have a few comments to add there.

Cool, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-05-24 16:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23  4:51 [PATCH] nvme-rdma: Fix a NULL deref when lesser io queues are allocated Nirranjan Kirubaharan
2019-05-23  7:21 ` Max Gurtovoy
2019-05-23  7:55   ` Nirranjan Kirubaharan
2019-05-23 11:14     ` Max Gurtovoy
2019-05-23 11:41       ` Nirranjan Kirubaharan
2019-05-23 14:35         ` Max Gurtovoy
2019-05-24  4:43           ` Nirranjan Kirubaharan
2019-05-24  6:24             ` Sagi Grimberg
2019-05-24 12:07               ` Max Gurtovoy
2019-05-24 16:11                 ` Sagi Grimberg
