* [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
From: Jitendra Bhivare @ 2017-03-20 9:40 UTC (permalink / raw)
To: linux-block; +Cc: Jitendra Bhivare, Somnath Kotur
In blk_mq_realloc_hw_ctx(), if the init_hctx() op fails in the
underlying transport, the hctx is freed and its pointer is set to
NULL.
However, functions further down the line dereference this hctx pointer
without a NULL check, which can lead to a kernel crash.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
---
block/blk-mq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a4546f0..9cb2d2e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2048,6 +2048,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 			continue;
 		hctx = blk_mq_map_queue(q, i);
+		if (!hctx)
+			continue;
 		/*
 		 * Set local node, IFF we have more than one hw queue. If
@@ -2128,6 +2130,8 @@ static void blk_mq_map_swqueue(struct request_queue *q,
 		ctx = per_cpu_ptr(q->queue_ctx, i);
 		hctx = blk_mq_map_queue(q, i);
+		if (!hctx)
+			continue;
 		cpumask_set_cpu(i, hctx->cpumask);
 		ctx->index_hw = hctx->nr_ctx;
--
1.8.3.1
* Re: [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
From: Christoph Hellwig @ 2017-03-27 9:14 UTC (permalink / raw)
To: Jitendra Bhivare; +Cc: linux-block, Somnath Kotur
On Mon, Mar 20, 2017 at 03:10:01PM +0530, Jitendra Bhivare wrote:
> In blk_mq_realloc_hw_ctx(), if the init_hctx() op fails in the
> underlying transport, the hctx is freed and its pointer is set to
> NULL.
> However, functions further down the line dereference this hctx pointer
> without a NULL check, which can lead to a kernel crash.
Shouldn't we fail initializing the queue if any of the hctx allocations
fail?
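
For illustration, the "fail the whole initialization" approach asked about here can be sketched in plain user-space C. This is not blk-mq code: struct sketch_hctx, sketch_init_fn, sketch_alloc_hctxs() and sketch_init_fail_13() are hypothetical names made up for the sketch. It only shows the pattern of unwinding every already-initialized context and returning an error as soon as one init callback fails, so that no NULL slot is ever left behind for later loops to trip over.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware context; not struct blk_mq_hw_ctx. */
struct sketch_hctx {
	unsigned int index;
};

/* Hypothetical per-context init callback: 0 on success, negative errno on failure. */
typedef int (*sketch_init_fn)(struct sketch_hctx *hctx, unsigned int index);

/*
 * Allocate and initialize nr contexts. If any allocation or init call fails,
 * free everything set up so far and return the error, leaving all touched
 * slots NULL. Returns 0 only if every context was set up.
 */
static int sketch_alloc_hctxs(struct sketch_hctx **hctxs, unsigned int nr,
			      sketch_init_fn init)
{
	unsigned int i;
	int ret;

	assert(hctxs != NULL && init != NULL);

	for (i = 0; i < nr; i++) {
		hctxs[i] = calloc(1, sizeof(*hctxs[i]));
		if (!hctxs[i]) {
			ret = -ENOMEM;
			goto unwind;
		}
		hctxs[i]->index = i;

		ret = init(hctxs[i], i);
		if (ret) {
			free(hctxs[i]);
			hctxs[i] = NULL;
			goto unwind;
		}
	}
	return 0;

unwind:
	/* Release the contexts that were fully set up before the failure. */
	while (i-- > 0) {
		free(hctxs[i]);
		hctxs[i] = NULL;
	}
	return ret;
}

/* Example callback for the sketch: arbitrarily fails for index 13. */
static int sketch_init_fail_13(struct sketch_hctx *hctx, unsigned int index)
{
	(void)hctx;
	return index == 13 ? -ENOMEM : 0;
}
```

A caller would treat a non-zero return from sketch_alloc_hctxs() as "queue setup failed" and not go on to map CPUs at all, which is the alternative to adding NULL checks in each later loop.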
* Re: [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
From: Somnath Kotur @ 2017-03-28 8:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jitendra Bhivare, linux-block
On Mon, Mar 27, 2017 at 2:44 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Mar 20, 2017 at 03:10:01PM +0530, Jitendra Bhivare wrote:
> > In blk_mq_realloc_hw_ctx(), if the init_hctx() op fails in the
> > underlying transport, the hctx is freed and its pointer is set to
> > NULL.
> > However, functions further down the line dereference this hctx pointer
> > without a NULL check, which can lead to a kernel crash.
>
> Shouldn't we fail initializing the queue if any of the hctx allocations
> fail?
To give some more background on the issue, here is the dump_stack()
output from the point where the failure happens:
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa05d42d6>] ib_alloc_mr+0x26/0x50 [ib_core]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37691>] __nvme_rdma_init_request+0xc1/0x230 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37831>] nvme_rdma_init_request+0x11/0x20 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff813429bb>] blk_mq_init_rq_map+0x23b/0x2b0
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff81342e25>] blk_mq_alloc_tag_set+0x135/0x2c0
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37cc3>] nvme_rdma_create_ctrl+0x483/0x710 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a2c127>] nvmf_dev_write+0x727/0x93c [nvme_fabrics]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff812320e7>] __vfs_write+0x37/0x160
ctrl->queue_count in nvme_rdma_create_ctrl() is initialized like so:
ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
where opts->nr_io_queues is typically set to num_online_cpus(), which
in my case turned out to be 16. The failure I encountered was for the
14th CPU: alloc_mr() failed because we had reached the limit on the
number of MRs in our chip.
The point is that after this failure, functions like
blk_mq_init_cpu_queues() and blk_mq_map_swqueue() use code like the
snippet below to access the hctxs:
for_each_possible_cpu(i) {
	....
	hctx = blk_mq_map_queue(q, i);
	hctx->.... // crash if ptr is NULL
	..
}
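
To make the pattern concrete outside the kernel, here is a small user-space model of that loop with the NULL check from the patch added. It is only a sketch: struct model_hctx, struct model_queue, model_map_queue() and model_map_swqueue() are hypothetical stand-ins, not the real blk-mq types or functions, and the fixed CPU count of 16 is an assumption taken from my setup.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 16	/* assumption: the 16 CPUs mentioned above */

/* Hypothetical stand-in for struct blk_mq_hw_ctx; only a counter is kept. */
struct model_hctx {
	unsigned int nr_ctx;
};

/* Hypothetical stand-in for struct request_queue: one hctx slot per CPU. */
struct model_queue {
	struct model_hctx *hctxs[MODEL_NR_CPUS];
};

/* Plays the role of blk_mq_map_queue(): returns NULL for a slot that failed init. */
static struct model_hctx *model_map_queue(struct model_queue *q, unsigned int cpu)
{
	assert(q != NULL && cpu < MODEL_NR_CPUS);
	return q->hctxs[cpu];
}

/*
 * Walk every CPU as the loop above does, but skip slots whose hctx is NULL
 * instead of dereferencing them. Returns the number of CPUs actually mapped.
 */
static unsigned int model_map_swqueue(struct model_queue *q)
{
	unsigned int cpu, mapped = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_hctx *hctx = model_map_queue(q, cpu);

		if (!hctx)
			continue;	/* without this check the next line would crash */
		hctx->nr_ctx++;
		mapped++;
	}
	return mapped;
}
```

With one slot set to NULL, the guarded loop simply maps the remaining CPUs. Whether silently skipping a CPU is acceptable in the real block layer is a separate question; the model only shows that the dereference is avoided.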
I'm not that familiar with the blk code itself, so perhaps there is a
better way of fixing it, but I have pointed out the problem and a
possible fix. Isn't this more of a bug in the error-handling path?
Thanks
Som