* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Nirranjan Kirubaharan @ 2019-05-23 4:51 UTC
Return -ENOMEM when the NVMe-oF target allocates fewer I/O queues
than the initiator requested.
Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
---
drivers/nvme/host/rdma.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f383146e7d0f..187007d136cc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
struct ib_device *ibdev = ctrl->device->dev;
- unsigned int nr_io_queues;
+ unsigned int nr_io_queues, nr_req_queues;
int i, ret;
nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
@@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
}
+ nr_req_queues = nr_io_queues;
ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
+ if (nr_io_queues < nr_req_queues) {
+ dev_err(ctrl->ctrl.device,
+ "alloc queues %u < req no of queues %u",
+ nr_io_queues, nr_req_queues);
+ return -ENOMEM;
+ }
ctrl->ctrl.queue_count = nr_io_queues + 1;
if (ctrl->ctrl.queue_count < 2)
--
1.8.3.1
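For context: nvme_set_queue_count() writes the number of queues the
controller actually granted back through its count argument, and that can
be lower than what was asked for. The check above relies on that; roughly
(a sketch of the assumed call pattern):

	nr_req_queues = nr_io_queues;	/* what the host wants */
	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
	if (ret)
		return ret;
	/* nr_io_queues now holds the granted count, possibly lower */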
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Max Gurtovoy @ 2019-05-23 7:21 UTC
On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> Return -ENOMEM when the NVMe-oF target allocates fewer I/O queues
> than the initiator requested.
Why can't we live with fewer queues?
I can demand 64K queues and the target might return 4, and that's fine
for functionality.
Where is the NULL deref that you see?
> [...]
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Nirranjan Kirubaharan @ 2019-05-23 7:55 UTC
On Thursday, May 23, 2019 at 10:21:46 +0300, Max Gurtovoy wrote:
>
> On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
> >Return -ENOMEM when the NVMe-oF target allocates fewer I/O queues
> >than the initiator requested.
>
> Why can't we live with fewer queues?
In nvme_rdma_alloc_io_queues(), ctrl->io_queues[] is already filled in
on the assumption that the target will allocate all of the requested
queues.
>
> I can demand 64K queues and the target might return 4, and that's fine
> for functionality.
>
> Where is the NULL deref that you see?
In nvme_rdma_init_request(), accessing an unallocated queue_idx of
ctrl->io_queues[] causes the NULL deref.
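The failing access is essentially this pattern (a simplified paraphrase
of the driver code, not the exact upstream source):

	/* blk-mq passes an hctx_idx derived from the map sizes that were
	 * stored in ctrl->io_queues[] before the target trimmed the count */
	struct nvme_rdma_queue *queue = &ctrl->queues[hctx_idx + 1];

	/* this queue was never allocated, so queue->device is still NULL */
	req->sqe.dma = ib_dma_map_single(queue->device->dev, req->sqe.data,
					 sizeof(struct nvme_command),
					 DMA_TO_DEVICE);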
[ 703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
[ 703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
[ 703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
[ 703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
[ 703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
[ 703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
[ 703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
[ 703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
[ 703.192195] FS: 00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
[ 703.192196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
[ 703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 703.192199] Call Trace:
[ 703.192206] blk_mq_alloc_rqs+0x1f0/0x290
[ 703.192207] __blk_mq_alloc_rq_map+0x46/0x80
[ 703.192209] blk_mq_map_swqueue+0x1dd/0x2e0
[ 703.192210] blk_mq_init_allocated_queue+0x3c8/0x430
[ 703.192211] blk_mq_init_queue+0x35/0x60
[ 703.192213] ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
[ 703.192214] nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
[ 703.192215] nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
[ 703.192218] nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
[ 703.192222] vfs_write+0xad/0x1b0
[ 703.192224] ksys_write+0x5a/0xd0
[ 703.192228] do_syscall_64+0x5b/0x180
[ 703.192231] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 703.192232] RIP: 0033:0x7fd660cddc60
[ 703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[ 703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
[ 703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
[ 703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
[ 703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
[ 703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
> [...]
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Max Gurtovoy @ 2019-05-23 11:14 UTC
I see.
We probably need to review the read/write/poll queue patches again.
Can you try the attached (untested) patch?
On 5/23/2019 10:55 AM, Nirranjan Kirubaharan wrote:
> On Thursday, May 23, 2019 at 10:21:46 +0300, Max Gurtovoy wrote:
>> On 5/23/2019 7:51 AM, Nirranjan Kirubaharan wrote:
>>> Return -ENOMEM when the NVMe-oF target allocates fewer I/O queues
>>> than the initiator requested.
>> Why can't we live with fewer queues?
> In nvme_rdma_alloc_io_queues(), ctrl->io_queues[] is already filled in
> on the assumption that the target will allocate all of the requested
> queues.
>
>> I can demand 64K queues and the target might return 4, and that's fine
>> for functionality.
>>
>> Where is the NULL deref that you see?
> In nvme_rdma_init_request(), accessing an unallocated queue_idx of
> ctrl->io_queues[] causes the NULL deref.
>
> [ 703.192172] RIP: 0010:nvme_rdma_init_request+0x31/0x140 [nvme_rdma]
> [ 703.192173] Code: 55 31 ed 53 48 8b 47 60 48 89 f3 48 8d 48 08 48 39 cf 0f 84 fb 00 00 00 48 03 28 48 05 f8 02 00 00 be c0 0d 00 00 48 8b 55 20 <4c> 8b 22 48 89 83 28 01 00 00 ba 40 00 00 00 48 8b 3d a9 7b 42 f4
> [ 703.192174] RSP: 0018:ffff9c36835bfc38 EFLAGS: 00010282
> [ 703.192192] RAX: ffff8eb49c8b92f8 RBX: ffff8eb5a6e50000 RCX: ffff8eb49c8b9008
> [ 703.192192] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff8eb49c8b9008
> [ 703.192193] RBP: ffff8eb5ad3c50e0 R08: 00000000119b9400 R09: ffff8eb5831d9520
> [ 703.192194] R10: ffffc83e119b9400 R11: ffffc83e119b9800 R12: ffff8eb49c8b9008
> [ 703.192194] R13: ffff8eb5831d9480 R14: 0000000000000000 R15: ffff8eb5a6e50000
> [ 703.192195] FS: 00007fd6613bb780(0000) GS:ffff8eb5afbc0000(0000) knlGS:0000000000000000
> [ 703.192196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 703.192197] CR2: 0000000000000000 CR3: 00000004646a4005 CR4: 00000000003606e0
> [ 703.192197] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 703.192198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 703.192199] Call Trace:
> [ 703.192206] blk_mq_alloc_rqs+0x1f0/0x290
> [ 703.192207] __blk_mq_alloc_rq_map+0x46/0x80
> [ 703.192209] blk_mq_map_swqueue+0x1dd/0x2e0
> [ 703.192210] blk_mq_init_allocated_queue+0x3c8/0x430
> [ 703.192211] blk_mq_init_queue+0x35/0x60
> [ 703.192213] ? nvme_rdma_alloc_tagset+0x1bb/0x330 [nvme_rdma]
> [ 703.192214] nvme_rdma_setup_ctrl+0x420/0x7b0 [nvme_rdma]
> [ 703.192215] nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
> [ 703.192218] nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
> [ 703.192222] vfs_write+0xad/0x1b0
> [ 703.192224] ksys_write+0x5a/0xd0
> [ 703.192228] do_syscall_64+0x5b/0x180
> [ 703.192231] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 703.192232] RIP: 0033:0x7fd660cddc60
> [ 703.192233] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
> [ 703.192234] RSP: 002b:00007ffe8f58d928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 703.192235] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd660cddc60
> [ 703.192236] RDX: 000000000000004d RSI: 00007ffe8f58e9a0 RDI: 0000000000000003
> [ 703.192236] RBP: 00007ffe8f58e9a0 R08: 00007ffe8f58e9ed R09: 00007fd660c3b0fd
> [ 703.192237] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000004d
> [ 703.192237] R13: 000000000151a500 R14: 000000000151a600 R15: 00007ffe8f58e9e0
>
>>
>>> Signed-off-by: Nirranjan Kirubaharan <nirranjan at chelsio.com>
>>> Reviewed-by: Potnuri Bharat Teja <bharat at chelsio.com>
>>> ---
>>> drivers/nvme/host/rdma.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index f383146e7d0f..187007d136cc 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -641,7 +641,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>> {
>>> struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>>> struct ib_device *ibdev = ctrl->device->dev;
>>> - unsigned int nr_io_queues;
>>> + unsigned int nr_io_queues, nr_req_queues;
>>> int i, ret;
>>> nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
>>> @@ -670,9 +670,16 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>> nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
>>> }
>>> + nr_req_queues = nr_io_queues;
>>> ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>>> if (ret)
>>> return ret;
>>> + if (nr_io_queues < nr_req_queues) {
>>> + dev_err(ctrl->ctrl.device,
>>> + "alloc queues %u < req no of queues %u",
>>> + nr_io_queues, nr_req_queues);
>>> + return -ENOMEM;
>>> + }
>>> ctrl->ctrl.queue_count = nr_io_queues + 1;
>>> if (ctrl->ctrl.queue_count < 2)
-------------- next part --------------
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a4ad25..d0cc981 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,7 +641,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
struct ib_device *ibdev = ctrl->device->dev;
- unsigned int nr_io_queues;
+ unsigned int nr_io_queues, nr_req_queues;
+ unsigned int default_queues, poll_queues = 0, write_queues = 0;
int i, ret;
nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
@@ -651,29 +652,38 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
* optimal locality so we don't need more queues than
* completion vectors.
*/
- nr_io_queues = min_t(unsigned int, nr_io_queues,
- ibdev->num_comp_vectors);
+ default_queues = nr_io_queues = min_t(unsigned int, nr_io_queues,
+ ibdev->num_comp_vectors);
- if (opts->nr_write_queues) {
- ctrl->io_queues[HCTX_TYPE_DEFAULT] =
- min(opts->nr_write_queues, nr_io_queues);
- nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
- } else {
- ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
- }
-
- ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
-
- if (opts->nr_poll_queues) {
- ctrl->io_queues[HCTX_TYPE_POLL] =
- min(opts->nr_poll_queues, num_online_cpus());
- nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
- }
+ if (opts->nr_write_queues)
+ write_queues = min(opts->nr_write_queues, default_queues);
+ if (opts->nr_poll_queues)
+ poll_queues = min(opts->nr_poll_queues, num_online_cpus());
+ nr_io_queues += write_queues + poll_queues;
+ nr_req_queues = nr_io_queues;
ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
+ if (nr_req_queues <= nr_io_queues) {
+ /* set the queues according to host demand */
+ ctrl->io_queues[HCTX_TYPE_READ] = nr_req_queues - poll_queues;
+ if (write_queues)
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
+ else
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+ nr_req_queues - poll_queues;
+ if (poll_queues)
+ ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
+
+ } else {
+ /* set the queues according to controller capability */
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
+ ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
+ ctrl->io_queues[HCTX_TYPE_POLL] = 0;
+ }
+
ctrl->ctrl.queue_count = nr_io_queues + 1;
if (ctrl->ctrl.queue_count < 2)
return 0;
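To make the fallback concrete (illustrative numbers only): if the host
computes default=8, write=4, poll=2, then nr_req_queues is 14; if the
target grants only 10, the else branch above collapses the maps to a
shared mapping:

	ctrl->io_queues[HCTX_TYPE_DEFAULT] = 10;
	ctrl->io_queues[HCTX_TYPE_READ] = 10;
	ctrl->io_queues[HCTX_TYPE_POLL] = 0;	/* poll queues dropped */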
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Nirranjan Kirubaharan @ 2019-05-23 11:41 UTC
On Thursday, May 23, 2019 at 14:14:25 +0300, Max Gurtovoy wrote:
> I see.
>
> We probably need to review the read/write/poll queue patches again.
>
> Can you try the attached (untested) patch?
Using the attached patch, it works if I don't use the write/poll queues,
even when the target allocates fewer queues.
I see the panic below if I use poll queues and the target allocates
fewer than the requested queues.
[161557.300219] RIP: 0010:blk_mq_map_queues+0x92/0xa0
[161557.312476] Code: 39 05 26 76 02 01 0f 46 c3 39 d8 74 19 89 c0 41 8b 04 84 43 89 04 bc eb a3 5b 5d 41 5c 41 5d 41 5e 31 c0 41 5f c3 89 d8 31 d2 <f7> f5 41 03 55 0c 43 89 14 bc eb 86 66 90 66 66 66 66 90 55 b8 ff
[161557.347031] RSP: 0018:ffffa04fc3d57ce8 EFLAGS: 00010246
[161557.360280] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[161557.375517] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff924d47c11810
[161557.390701] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff924f143451e0
[161557.405828] R10: ffff924d47c03980 R11: 0000000000000000 R12: ffff924f143451e0
[161557.420882] R13: ffff924f14af4028 R14: 0000000000010120 R15: 0000000000000000
[161557.435919] FS: 00007f4566a31740(0000) GS:ffff92506fa00000(0000) knlGS:0000000000000000
[161557.451938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[161557.465622] CR2: 00007ffdf1ef99d8 CR3: 000000042213a000 CR4: 00000000000006f0
[161557.480765] Call Trace:
[161557.491147] nvme_rdma_map_queues+0x9e/0xc0 [nvme_rdma]
[161557.504360] blk_mq_alloc_tag_set+0x1bd/0x2d0
[161557.516666] nvme_rdma_alloc_tagset+0xd6/0x2a0 [nvme_rdma]
[161557.530108] nvme_rdma_setup_ctrl+0x362/0x7a0 [nvme_rdma]
[161557.543410] nvme_rdma_create_ctrl+0x29a/0x3d8 [nvme_rdma]
[161557.556754] nvmf_dev_write+0xa18/0xbff [nvme_fabrics]
[161557.569687] vfs_write+0xad/0x1b0
[161557.580718] ksys_write+0x55/0xd0
[161557.591689] do_syscall_64+0x5b/0x1b0
[161557.602941] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[161557.615575] RIP: 0033:0x7f4566536c60
[161557.626678] Code: 73 01 c3 48 8b 0d 30 62 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 3d c3 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[161557.661257] RSP: 002b:00007ffdf1ef99d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[161557.676897] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4566536c60
[161557.692124] RDX: 000000000000005c RSI: 00007ffdf1efae70 RDI: 0000000000000003
[161557.707378] RBP: 00007ffdf1efae70 R08: 0000000000000000 R09: 00007f45664940fd
[161557.722648] R10: 00007ffdf1ef95a0 R11: 0000000000000246 R12: 000000000000005c
[161557.737848] R13: 0000000000000037 R14: 000000000000000b R15: 000000000064f5e0
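For what it's worth, the faulting instruction looks like a divide by zero
rather than a load: the fallback path zeroes ctrl->io_queues[HCTX_TYPE_POLL],
but nvme_rdma_alloc_tagset() still sizes nr_maps from opts->nr_poll_queues,
so blk_mq_map_queues() appears to run on an empty poll map. Roughly (a
paraphrase of blk_mq_map_queues(), not the exact source):

	/* with qmap->nr_queues == 0 for the poll map, the modulo below
	 * is a division by zero */
	for_each_possible_cpu(cpu)
		qmap->mq_map[cpu] = qmap->queue_offset + cpu % qmap->nr_queues;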
> [...]
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Max Gurtovoy @ 2019-05-23 14:35 UTC
I'll take a deeper look at it next week, but please try the new attached patch.
On 5/23/2019 2:41 PM, Nirranjan Kirubaharan wrote:
> On Thursday, May 23, 2019 at 14:14:25 +0300, Max Gurtovoy wrote:
>> I see.
>>
>> We probably need to review the read/write/poll queue patches again.
>>
>> Can you try the attached (untested) patch?
> Using the attached patch, it works if I don't use the write/poll queues,
> even when the target allocates fewer queues.
>
> I see the panic below if I use poll queues and the target allocates
> fewer than the requested queues.
> [...]
-------------- next part --------------
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5..f924451 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -247,6 +247,9 @@ struct nvme_ctrl {
struct page *discard_page;
unsigned long discard_page_busy;
+
+ unsigned int nr_write_queues;
+ unsigned int nr_poll_queues;
};
enum nvme_iopolicy {
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5a4ad25..5e90e92 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -641,39 +641,56 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
struct ib_device *ibdev = ctrl->device->dev;
- unsigned int nr_io_queues;
+ unsigned int nr_io_queues, nr_req_queues;
+ unsigned int default_queues, poll_queues = 0, write_queues = 0;
int i, ret;
- nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
+ default_queues = min(opts->nr_io_queues, num_online_cpus());
/*
* we map queues according to the device irq vectors for
* optimal locality so we don't need more queues than
* completion vectors.
*/
- nr_io_queues = min_t(unsigned int, nr_io_queues,
- ibdev->num_comp_vectors);
+ default_queues = min_t(unsigned int, default_queues,
+ ibdev->num_comp_vectors);
- if (opts->nr_write_queues) {
- ctrl->io_queues[HCTX_TYPE_DEFAULT] =
- min(opts->nr_write_queues, nr_io_queues);
- nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
- } else {
- ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
- }
-
- ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
-
- if (opts->nr_poll_queues) {
- ctrl->io_queues[HCTX_TYPE_POLL] =
- min(opts->nr_poll_queues, num_online_cpus());
- nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
- }
+ if (opts->nr_write_queues)
+ write_queues = min(opts->nr_write_queues, default_queues);
+ if (opts->nr_poll_queues)
+ poll_queues = min(opts->nr_poll_queues, num_online_cpus());
+ nr_io_queues = default_queues + write_queues + poll_queues;
+ nr_req_queues = nr_io_queues;
ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
+ if (nr_req_queues <= nr_io_queues) {
+ /* set the queues according to host demand */
+ ctrl->io_queues[HCTX_TYPE_READ] =
+ nr_req_queues - poll_queues - write_queues;
+ if (write_queues) {
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] = write_queues;
+ ctrl->ctrl.nr_write_queues = write_queues;
+ } else {
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+ ctrl->io_queues[HCTX_TYPE_READ];
+ }
+ if (poll_queues) {
+ ctrl->io_queues[HCTX_TYPE_POLL] = poll_queues;
+ ctrl->ctrl.nr_poll_queues = poll_queues;
+ }
+ nr_io_queues = nr_req_queues;
+ } else {
+ /* set the queues according to controller capability */
+ ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
+ ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
+ ctrl->io_queues[HCTX_TYPE_POLL] = 0;
+ ctrl->ctrl.nr_write_queues = 0;
+ ctrl->ctrl.nr_poll_queues = 0;
+ }
+
ctrl->ctrl.queue_count = nr_io_queues + 1;
if (ctrl->ctrl.queue_count < 2)
return 0;
@@ -739,7 +756,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
set->driver_data = ctrl;
set->nr_hw_queues = nctrl->queue_count - 1;
set->timeout = NVME_IO_TIMEOUT;
- set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
+ set->nr_maps = nctrl->nr_poll_queues ? HCTX_MAX_TYPES : 2;
}
ret = blk_mq_alloc_tag_set(set);
@@ -1791,8 +1808,7 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
set->map[HCTX_TYPE_DEFAULT].nr_queues =
ctrl->io_queues[HCTX_TYPE_DEFAULT];
- set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
- if (ctrl->ctrl.opts->nr_write_queues) {
+ if (ctrl->ctrl.nr_write_queues) {
/* separate read/write queues */
set->map[HCTX_TYPE_READ].queue_offset =
ctrl->io_queues[HCTX_TYPE_DEFAULT];
@@ -1800,17 +1816,19 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
/* mixed read/write queues */
set->map[HCTX_TYPE_READ].queue_offset = 0;
}
+ set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
+
blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
ctrl->device->dev, 0);
blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
ctrl->device->dev, 0);
- if (ctrl->ctrl.opts->nr_poll_queues) {
+ if (ctrl->ctrl.nr_poll_queues) {
set->map[HCTX_TYPE_POLL].nr_queues =
ctrl->io_queues[HCTX_TYPE_POLL];
set->map[HCTX_TYPE_POLL].queue_offset =
ctrl->io_queues[HCTX_TYPE_DEFAULT];
- if (ctrl->ctrl.opts->nr_write_queues)
+ if (ctrl->ctrl.nr_write_queues)
set->map[HCTX_TYPE_POLL].queue_offset +=
ctrl->io_queues[HCTX_TYPE_READ];
blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
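As I read this draft, the idea is to store the *effective* write/poll
counts on the nvme_ctrl, so that nvme_rdma_alloc_tagset() and
nvme_rdma_map_queues() consult what was actually configured rather than
the opts->nr_*_queues the user asked for. With the same illustrative
numbers as before (8/4/2 requested, 10 granted), the fallback now also
zeroes the effective counts:

	ctrl->ctrl.nr_write_queues = 0;	/* map_queues: shared r/w map */
	ctrl->ctrl.nr_poll_queues = 0;	/* alloc_tagset: nr_maps = 2 */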
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Nirranjan Kirubaharan @ 2019-05-24 4:43 UTC
On Thursday, May 23, 2019 at 17:35:09 +0300, Max Gurtovoy wrote:
> I'll take a deeper look at it next week, but please try the new
> attached patch.
This new patch holds up in the minimal testing I did, with the target
allocating fewer queues.
> [...]
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Sagi Grimberg @ 2019-05-24 6:24 UTC
>> I'll take a deeper look at it next week, but please try the new
>> attached patch.
> This new patch holds up in the minimal testing I did, with the target
> allocating fewer queues.
Hi,
Just got to this one (after ramping up on activity post-vacation...)
Max, I think that your patch is missing the case where default+read+poll
queues are higher than the controller queue limit but we can still fit
a queue separation.
For example:
- requested 10/10/10 (default/read/poll)
- supported 28
If I read your patch correctly, this will fall back to shared read/write
queue maps with no poll queues (10/0/0), while in this case we'd want
to keep the separation but simply have fewer poll queues (10/10/8).
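Something along these lines, i.e. clamp default, then read, then poll
against what is left, so the poll queues shrink first (a rough sketch
with assumed variable names, not the patch I posted):

	if (nr_io_queues < default_queues + read_queues + poll_queues) {
		unsigned int left = nr_io_queues;

		default_queues = min(default_queues, left);
		left -= default_queues;
		read_queues = min(read_queues, left);
		left -= read_queues;
		poll_queues = min(poll_queues, left);	/* 10/10/8 here */
	}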
I posted an alternative patch for review.
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Max Gurtovoy @ 2019-05-24 12:07 UTC
On 5/24/2019 9:24 AM, Sagi Grimberg wrote:
>
>>> I'll take a deeper look at it next week, but please try the new
>>> attached patch.
>> This new patch holds up in the minimal testing I did, with the target
>> allocating fewer queues.
>
> Hi,
>
> Just got to this one (after ramping up on activity post-vacation...)
>
> Max, I think that your patch is missing the case where default+read+poll
> queues are higher than the controller queue limit but we can still fit
> a queue separation.
My patch was just a draft to see if it fixes the NULL deref.
>
> For example:
> - requested 10/10/10 (default/read/poll)
> - supported 28
>
> If I read your patch correctly, this will fall back to shared read/write
> queue maps with no poll queues (10/0/0), while in this case we'd want
> to keep the separation but simply have fewer poll queues (10/10/8).
>
It will fall back to shared read/write with 28 queues, same as it was
without this feature (actually I think I need to zero
"ctrl->io_queues[HCTX_TYPE_READ]" for that, to get (28/0/0) instead of
(28/28/0)).
So the default queues can be read/write queues, or write-only queues
(in case we have read_queues != 0)?
I will review this feature again next week...
> I posted an alternative patch for review.
I'll review it early next week. I have a few comments to add there.
* [PATCH] nvme-rdma: Fix a NULL deref when fewer I/O queues are allocated
From: Sagi Grimberg @ 2019-05-24 16:11 UTC
>> Hi,
>>
>> Just got to this one (after ramping up on activity post-vacation...)
>>
>> Max, I think that your patch is missing the case where default+read+poll
>> queues are higher than the controller queue limit but we can still fit
>> a queue separation.
>
> My patch was just a draft to see if it fixes the NULL deref.
Yea, I know..
>> For example:
>> - requested 10/10/10 (default/read/poll)
>> - supported 28
>>
>> If I read your patch correctly, this will fall back to shared read/write
>> queue maps with no poll queues (10/0/0), while in this case we'd want
>> to keep the separation but simply have fewer poll queues (10/10/8).
>>
> It will fall back to shared read/write with 28 queues, same as it was
> without this feature
Oh... this can leave unused queues (if, say, we have 10 CPU cores), right?
> (actually I think I need to zero
> "ctrl->io_queues[HCTX_TYPE_READ]" for that, to get (28/0/0) instead of
> (28/28/0)).
>
> So the default queues can be read/write queues, or write-only queues
> (in case we have read_queues != 0)?
>
> I will review this feature again next week...
>
>
>> I posted an alternative patch for review.
>
> I'll review it early next week. I have a few comments to add there.
Cool, thanks.