linux-block.vger.kernel.org archive mirror
* [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
@ 2017-03-20  9:40 Jitendra Bhivare
  2017-03-27  9:14 ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Jitendra Bhivare @ 2017-03-20  9:40 UTC (permalink / raw)
  To: linux-block; +Cc: Jitendra Bhivare, Somnath Kotur

As part of blk_mq_realloc_hw_ctx(), if the init_hctx() op fails in
the underlying transport, the hctx pointer is freed and set to NULL.
However, functions further down the line access this hctx pointer
without a NULL check, which can lead to a kernel crash.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
---
 block/blk-mq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a4546f0..9cb2d2e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2048,6 +2048,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
+		if (!hctx)
+			continue;
 
 		/*
 		 * Set local node, IFF we have more than one hw queue. If
@@ -2128,6 +2130,8 @@ static void blk_mq_map_swqueue(struct request_queue *q,
 
 		ctx = per_cpu_ptr(q->queue_ctx, i);
 		hctx = blk_mq_map_queue(q, i);
+		if (!hctx)
+			continue;
 
 		cpumask_set_cpu(i, hctx->cpumask);
 		ctx->index_hw = hctx->nr_ctx;
-- 
1.8.3.1


* Re: [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
  2017-03-20  9:40 [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue Jitendra Bhivare
@ 2017-03-27  9:14 ` Christoph Hellwig
  2017-03-28  8:35   ` Somnath Kotur
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2017-03-27  9:14 UTC (permalink / raw)
  To: Jitendra Bhivare; +Cc: linux-block, Somnath Kotur

On Mon, Mar 20, 2017 at 03:10:01PM +0530, Jitendra Bhivare wrote:
> As part of blk_mq_realloc_hw_ctx(), if the init_hctx() ops is
> failed by the underlying transport, the hctx pointer is freed and
> initialized to NULL.
> However, functions down the line, access this hwctx pointer without
> a NULL pointer check, which could lead to a kernel crash.

Shouldn't we fail initializing the queue if any of the hctx allocations
fail?
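
One way to read this suggestion, as a minimal userspace sketch only: set up every hardware context, and if any single one fails, unwind the ones already set up and return an error so the caller fails the whole queue. Everything named mock_* below (the struct, the init callback, the MR limit) is a hypothetical stand-in chosen for illustration, not blk-mq code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware context; not the kernel struct. */
struct mock_hctx {
	int index;
};

typedef int (*mock_init_fn)(struct mock_hctx *hctx, int index);

/* Hypothetical resource limit, loosely mirroring MR exhaustion. */
#define MOCK_MR_LIMIT 2

/* Hypothetical init callback: fails once the mock limit is reached. */
static int mock_init_limited(struct mock_hctx *hctx, int index)
{
	if (index >= MOCK_MR_LIMIT)
		return -ENOMEM;
	hctx->index = index;
	return 0;
}

/*
 * All-or-nothing setup. The caller passes an array of nr NULL slots.
 * On success every slot is populated and 0 is returned. On failure
 * every slot populated so far is freed and reset to NULL, and a
 * negative errno is returned so the caller can fail queue setup.
 */
static int mock_alloc_all_hctxs(struct mock_hctx **hctxs, int nr,
				mock_init_fn init)
{
	int i, ret = 0;

	assert(hctxs != NULL && init != NULL);

	for (i = 0; i < nr; i++) {
		hctxs[i] = calloc(1, sizeof(*hctxs[i]));
		if (!hctxs[i]) {
			ret = -ENOMEM;
			break;
		}
		ret = init(hctxs[i], i);
		if (ret) {
			free(hctxs[i]);
			hctxs[i] = NULL;
			break;
		}
	}
	if (!ret)
		return 0;

	/* Unwind: release everything set up before the failing slot. */
	while (--i >= 0) {
		free(hctxs[i]);
		hctxs[i] = NULL;
	}
	return ret;
}
```

With this shape no later loop ever sees a NULL slot, because a partially initialized set never survives; the trade-off is that one failed context takes the whole queue down.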


* Re: [PATCH] blk-mq: Add NULL pointer check for HW dispatch queue
  2017-03-27  9:14 ` Christoph Hellwig
@ 2017-03-28  8:35   ` Somnath Kotur
  0 siblings, 0 replies; 3+ messages in thread
From: Somnath Kotur @ 2017-03-28  8:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jitendra Bhivare, linux-block

On Mon, Mar 27, 2017 at 2:44 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Mar 20, 2017 at 03:10:01PM +0530, Jitendra Bhivare wrote:
> > As part of blk_mq_realloc_hw_ctx(), if the init_hctx() ops is
> > failed by the underlying transport, the hctx pointer is freed and
> > initialized to NULL.
> > However, functions down the line, access this hwctx pointer without
> > a NULL pointer check, which could lead to a kernel crash.
>
> Shouldn't we fail initializing the queue if any of the hctx allocations
> fail?

Well, to give some more background on the issue, here is the
dump_stack() output from where the failure happens:

Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa05d42d6>] ib_alloc_mr+0x26/0x50 [ib_core]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37691>] __nvme_rdma_init_request+0xc1/0x230 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37831>] nvme_rdma_init_request+0x11/0x20 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff813429bb>] blk_mq_init_rq_map+0x23b/0x2b0
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff81342e25>] blk_mq_alloc_tag_set+0x135/0x2c0
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a37cc3>] nvme_rdma_create_ctrl+0x483/0x710 [nvme_rdma]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffffa0a2c127>] nvmf_dev_write+0x727/0x93c [nvme_fabrics]
Mar 18 08:27:31 dhcp-10-192-204-6 kernel: [<ffffffff812320e7>] __vfs_write+0x37/0x160

the ctrl->queue_count in nvme_rdma_create_ctrl() is initialized like so:

ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */

where opts->nr_io_queues is typically set to num_online_cpus(), which
in my case turned out to be 16. The failure I encountered was for the
14th CPU: alloc_mr() failed because we had reached the limit on MRs in
our chip.

The point is that after this failure, functions like
blk_mq_init_cpu_queues() and blk_mq_map_swqueue() use code like the
snippet below to access the hctxs:

for_each_possible_cpu(i) {
	...
	hctx = blk_mq_map_queue(q, i);
	hctx->...		/* crashes if hctx is NULL */
	...
}
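
The loop pattern above, with the NULL check from the patch applied, can be sketched in userspace roughly as follows. This is only an illustration: the mock_* names are hypothetical stand-ins for struct blk_mq_hw_ctx and blk_mq_map_queue(), and a plain array replaces the per-CPU mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_mq_hw_ctx; not the kernel type. */
struct mock_hctx {
	unsigned int nr_ctx;	/* software queues mapped to this hw queue */
};

/* Hypothetical stand-in for blk_mq_map_queue(): may return NULL. */
static struct mock_hctx *mock_map_queue(struct mock_hctx **hctxs, int cpu)
{
	return hctxs[cpu];
}

/*
 * Visit every CPU slot, skipping slots whose hctx is NULL instead of
 * dereferencing them. Returns the number of CPUs actually mapped.
 */
static int mock_map_swqueue(struct mock_hctx **hctxs, int nr_cpus)
{
	int mapped = 0;

	assert(hctxs != NULL);

	for (int i = 0; i < nr_cpus; i++) {
		struct mock_hctx *hctx = mock_map_queue(hctxs, i);

		if (!hctx)
			continue;	/* the check the patch adds */

		hctx->nr_ctx++;		/* safe: hctx is known non-NULL */
		mapped++;
	}
	return mapped;
}
```

Note that skipping a slot leaves that CPU without a hardware queue, so the return value lets a caller notice that fewer CPUs were mapped than requested.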

I'm not that familiar with the blk code itself, so perhaps there is a
better way of fixing it. I have pointed out the problem and a possible
fix; isn't this more of a bug in the error-handling path?

Thanks
Som
