linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme: clear stale nvmeq->tags after tagset free
@ 2020-01-14 18:17 Edmund Nadolski
  2020-01-16 16:06 ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Edmund Nadolski @ 2020-01-14 18:17 UTC (permalink / raw)
  To: edmund.nadolski, linux-nvme, kbusch

This patch introduces nvme_exit_hctx() to NULL the nvmeq->tags pointer
after a tagset free, so that a subsequent nvme_init_hctx() can
correctly re-initialize it.

This patch fixes a failure case which occurs during cascaded controller
resets. The tagset is freed after the controller fails commands which
try to create IO SQs. A second reset restores the controller, but the
following nvme_validate_ns/device_add_disk sequence uses the old/stale
nvmeq->tags value, leading to an invalid tag value in nvme_handle_cqe and
kernel panic from nvme_irq.

Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
---
 drivers/nvme/host/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 365a2ddbeaa7..7daf3888ba8e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -404,6 +404,13 @@ static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 	return 0;
 }
 
+static void nvme_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
+{
+	struct nvme_queue *nvmeq = hctx->driver_data;
+
+	nvmeq->tags = NULL;
+}
+
 static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req,
 		unsigned int hctx_idx, unsigned int numa_node)
 {
@@ -1582,6 +1589,7 @@ static const struct blk_mq_ops nvme_mq_ops = {
 	.complete	= nvme_pci_complete_rq,
 	.commit_rqs	= nvme_commit_rqs,
 	.init_hctx	= nvme_init_hctx,
+	.exit_hctx      = nvme_exit_hctx,
 	.init_request	= nvme_init_request,
 	.map_queues	= nvme_pci_map_queues,
 	.timeout	= nvme_timeout,
-- 
2.20.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-14 18:17 [PATCH] nvme: clear stale nvmeq->tags after tagset free Edmund Nadolski
@ 2020-01-16 16:06 ` Keith Busch
  2020-01-16 22:06   ` Nadolski, Edmund
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2020-01-16 16:06 UTC (permalink / raw)
  To: Edmund Nadolski; +Cc: linux-nvme

On Tue, Jan 14, 2020 at 11:17:45AM -0700, Edmund Nadolski wrote:
> +static void nvme_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
> +{
> +	struct nvme_queue *nvmeq = hctx->driver_data;
> +
> +	nvmeq->tags = NULL;
> +}

If you've multiple namespaces, disconnecting one of them will cause the
shared nvmeq to have NULL tags, which will crash the kernel on the very
next CQE read.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-16 16:06 ` Keith Busch
@ 2020-01-16 22:06   ` Nadolski, Edmund
  2020-01-16 23:26     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nadolski, Edmund @ 2020-01-16 22:06 UTC (permalink / raw)
  To: Keith Busch, Edmund Nadolski; +Cc: linux-nvme

On 1/16/2020 9:06 AM, Keith Busch wrote:
> On Tue, Jan 14, 2020 at 11:17:45AM -0700, Edmund Nadolski wrote:
>> +static void nvme_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>> +{
>> +	struct nvme_queue *nvmeq = hctx->driver_data;
>> +
>> +	nvmeq->tags = NULL;
>> +}
> 
> If you've multiple namespaces, disconnecting one of them will cause the
> shared nvmeq to have NULL tags, which will crash the kernel on the very
> next CQE read.

Currently it crashes with just a single namespace, because the nvmeq->tags is 
still pointing into a tagset that was previously released by calling 
nvme_free_tagset() from nvme_reset_work().

Perhaps the driver should not free the tagset unless & until there are no more 
namespaces?  Is nvme_remove_namespaces() supposed to ensure that?

Thanks,
Ed

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-16 22:06   ` Nadolski, Edmund
@ 2020-01-16 23:26     ` Keith Busch
  2020-01-21 17:32       ` Nadolski, Edmund
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2020-01-16 23:26 UTC (permalink / raw)
  To: Nadolski, Edmund; +Cc: linux-nvme

On Thu, Jan 16, 2020 at 03:06:40PM -0700, Nadolski, Edmund wrote:
> On 1/16/2020 9:06 AM, Keith Busch wrote:
> > On Tue, Jan 14, 2020 at 11:17:45AM -0700, Edmund Nadolski wrote:
> > > +static void nvme_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
> > > +{
> > > +	struct nvme_queue *nvmeq = hctx->driver_data;
> > > +
> > > +	nvmeq->tags = NULL;
> > > +}
> > 
> > If you've multiple namespaces, disconnecting one of them will cause the
> > shared nvmeq to have NULL tags, which will crash the kernel on the very
> > next CQE read.
> 
> Currently it crashes with just a single namespace, because the nvmeq->tags
> is still pointing into a tagset that was previously released by calling
> nvme_free_tagset() from nvme_reset_work().
> 
> Perhaps the driver should not free the tagset unless & until there are no
> more namespaces?  Is nvme_remove_namespaces() supposed to ensure that?

Looks like we'd need to clear all the nvmeq tagset pointers in
nvme_free_tagset() we're called from the reset work.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-16 23:26     ` Keith Busch
@ 2020-01-21 17:32       ` Nadolski, Edmund
  2020-01-21 17:42         ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nadolski, Edmund @ 2020-01-21 17:32 UTC (permalink / raw)
  To: Keith Busch, Nadolski, Edmund; +Cc: Christoph Hellwig, linux-nvme

On 1/16/2020 4:26 PM, Keith Busch wrote:
> On Thu, Jan 16, 2020 at 03:06:40PM -0700, Nadolski, Edmund wrote:
>> On 1/16/2020 9:06 AM, Keith Busch wrote:
>> > On Tue, Jan 14, 2020 at 11:17:45AM -0700, Edmund Nadolski wrote:
>> > > +static void nvme_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>> > > +{
>> > > +	struct nvme_queue *nvmeq = hctx->driver_data;
>> > > +
>> > > +	nvmeq->tags = NULL;
>> > > +}
>> > 
>> > If you've multiple namespaces, disconnecting one of them will cause the
>> > shared nvmeq to have NULL tags, which will crash the kernel on the very
>> > next CQE read.
>> 
>> Currently it crashes with just a single namespace, because the nvmeq->tags
>> is still pointing into a tagset that was previously released by calling
>> nvme_free_tagset() from nvme_reset_work().
>> 
>> Perhaps the driver should not free the tagset unless & until there are no
>> more namespaces?  Is nvme_remove_namespaces() supposed to ensure that?
> 
> Looks like we'd need to clear all the nvmeq tagset pointers in
> nvme_free_tagset() we're called from the reset work.

The tagset pointer gets invalidated when the driver calls the following:

nvme_reset_work()
    +-> nvme_setup_io_queues()  <<< cmd fails, no io queues created
    +-> nvme_kill_queues()
    +-> nvme_remove_namespaces()
    |   +-> nvme_ns_remove()  <<< for each ctrl->namespaces
    |       +-> del_gendisk()
    |       +-> blk_cleanup_queue()
    |       |   +-> blk_mq_exit_queue()  <<< invalidates the tags
    |       |       +-> blk_mq_del_queue_tag_set()
    |       |       +-> blk_mq_exit_hw_queues()
    |       |           +-> blk_mq_exit_hctx()
    |       |               +-> set->ops->exit_hctx(...) <<< driver callback
    |       +-> nvme_put_ns()
    +-> nvme_free_tagset()
    |   +-> blk_mq_free_tagset()
    +-> nvme_start_ctrl()  <<< allow operation w/o namespaces

So the nvmeq->tags is already dead and stale by the time we get to 
nvme_free_tagset. The exit_hctx callback seems like the architected place to 
clear it (also noting that we init it in nmve_init_hctx).

If we don't clear the stale nvmeq->tags from the exit_hctx callback, then we 
will never re-init with a valid one if/when the device does recover (i.e., in 
a future call into nvme_init_hctx).

Seems we should not need to worry about crashing in the presence of multiple 
namespaces, as this is the sequence that removes them all (notwithstanding 
some existing race window bug).

Thanks,
Ed

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-21 17:32       ` Nadolski, Edmund
@ 2020-01-21 17:42         ` Keith Busch
  2020-01-21 18:49           ` Nadolski, Edmund
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2020-01-21 17:42 UTC (permalink / raw)
  To: Nadolski, Edmund; +Cc: Christoph Hellwig, linux-nvme

On Tue, Jan 21, 2020 at 10:32:22AM -0700, Nadolski, Edmund wrote:
> The tagset pointer gets invalidated when the driver calls the following:
> 
> nvme_reset_work()
>    +-> nvme_setup_io_queues()  <<< cmd fails, no io queues created
>    +-> nvme_kill_queues()
>    +-> nvme_remove_namespaces()
>    |   +-> nvme_ns_remove()  <<< for each ctrl->namespaces
>    |       +-> del_gendisk()
>    |       +-> blk_cleanup_queue()
>    |       |   +-> blk_mq_exit_queue()  <<< invalidates the tags
>    |       |       +-> blk_mq_del_queue_tag_set()
>    |       |       +-> blk_mq_exit_hw_queues()
>    |       |           +-> blk_mq_exit_hctx()
>    |       |               +-> set->ops->exit_hctx(...) <<< driver callback
>    |       +-> nvme_put_ns()
>    +-> nvme_free_tagset()
>    |   +-> blk_mq_free_tagset()
>    +-> nvme_start_ctrl()  <<< allow operation w/o namespaces
> 
> So the nvmeq->tags is already dead and stale by the time we get to
> nvme_free_tagset.

No, nvmeq->tags is not stale until we call nvme_free_tagset(). The
exit_hctx() callback is invoked per-namespace, but the nvmeq is shared
among all of them. You can't modify the shared resource until all users
have exited.

If you really wanted to do this in exit_hctx(), you'd have to introduce
reference counting on 'struct nvme_queue'.

> The exit_hctx callback seems like the architected place to
> clear it (also noting that we init it in nmve_init_hctx).
> 
> If we don't clear the stale nvmeq->tags from the exit_hctx callback, then we
> will never re-init with a valid one if/when the device does recover (i.e.,
> in a future call into nvme_init_hctx).
> 
> Seems we should not need to worry about crashing in the presence of multiple
> namespaces, as this is the sequence that removes them all (notwithstanding
> some existing race window bug).

If you remove just 1 of many namespaces, if that namespace's exit_hctx()
clears the shared nvmeq's tagset, kernel panic will happen for all the
other operational namespaces.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-21 17:42         ` Keith Busch
@ 2020-01-21 18:49           ` Nadolski, Edmund
  2020-01-21 19:12             ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Nadolski, Edmund @ 2020-01-21 18:49 UTC (permalink / raw)
  To: Keith Busch, Nadolski, Edmund; +Cc: Christoph Hellwig, linux-nvme

On 1/21/2020 10:42 AM, Keith Busch wrote:
> On Tue, Jan 21, 2020 at 10:32:22AM -0700, Nadolski, Edmund wrote:
>> The tagset pointer gets invalidated when the driver calls the following:
>> 
>> nvme_reset_work()
>>    +-> nvme_setup_io_queues()  <<< cmd fails, no io queues created
>>    +-> nvme_kill_queues()
>>    +-> nvme_remove_namespaces()
>>    |   +-> nvme_ns_remove()  <<< for each ctrl->namespaces
>>    |       +-> del_gendisk()
>>    |       +-> blk_cleanup_queue()
>>    |       |   +-> blk_mq_exit_queue()  <<< invalidates the tags
>>    |       |       +-> blk_mq_del_queue_tag_set()
>>    |       |       +-> blk_mq_exit_hw_queues()
>>    |       |           +-> blk_mq_exit_hctx()
>>    |       |               +-> set->ops->exit_hctx(...) <<< driver callback
>>    |       +-> nvme_put_ns()
>>    +-> nvme_free_tagset()
>>    |   +-> blk_mq_free_tagset()
>>    +-> nvme_start_ctrl()  <<< allow operation w/o namespaces
>> 
>> So the nvmeq->tags is already dead and stale by the time we get to
>> nvme_free_tagset.
> 
> No, nvmeq->tags is not stale until we call nvme_free_tagset().

I see this in blk/blk-mq.c:

2943 /* tags can _not_ be used after returning from blk_mq_exit_queue */
2944 void blk_mq_exit_queue(struct request_queue *q)

which ends up being called on every individual ctrl->namespaces entry.  What 
have I overlooked?

Thanks for clarifying,
Ed



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-21 18:49           ` Nadolski, Edmund
@ 2020-01-21 19:12             ` Keith Busch
  2020-01-21 19:26               ` Nadolski, Edmund
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2020-01-21 19:12 UTC (permalink / raw)
  To: Nadolski, Edmund; +Cc: Christoph Hellwig, linux-nvme

On Tue, Jan 21, 2020 at 11:49:53AM -0700, Nadolski, Edmund wrote:
> On 1/21/2020 10:42 AM, Keith Busch wrote:
> > On Tue, Jan 21, 2020 at 10:32:22AM -0700, Nadolski, Edmund wrote:
> > > The tagset pointer gets invalidated when the driver calls the following:
> > > 
> > > nvme_reset_work()
> > >    +-> nvme_setup_io_queues()  <<< cmd fails, no io queues created
> > >    +-> nvme_kill_queues()
> > >    +-> nvme_remove_namespaces()
> > >    |   +-> nvme_ns_remove()  <<< for each ctrl->namespaces
> > >    |       +-> del_gendisk()
> > >    |       +-> blk_cleanup_queue()
> > >    |       |   +-> blk_mq_exit_queue()  <<< invalidates the tags
> > >    |       |       +-> blk_mq_del_queue_tag_set()
> > >    |       |       +-> blk_mq_exit_hw_queues()
> > >    |       |           +-> blk_mq_exit_hctx()
> > >    |       |               +-> set->ops->exit_hctx(...) <<< driver callback
> > >    |       +-> nvme_put_ns()
> > >    +-> nvme_free_tagset()
> > >    |   +-> blk_mq_free_tagset()
> > >    +-> nvme_start_ctrl()  <<< allow operation w/o namespaces
> > > 
> > > So the nvmeq->tags is already dead and stale by the time we get to
> > > nvme_free_tagset.
> > 
> > No, nvmeq->tags is not stale until we call nvme_free_tagset().
> 
> I see this in blk/blk-mq.c:
> 
> 2943 /* tags can _not_ be used after returning from blk_mq_exit_queue */
> 2944 void blk_mq_exit_queue(struct request_queue *q)

You can't access it through q->tag_set after that. The tagset still
exists beyond any individual request_queue using it until you call
blk_mq_free_tag_set().
 
> which ends up being called on every individual ctrl->namespaces entry.  What
> have I overlooked?

There are various paths that can have driver can remove a subset of
namespaces rather than all of them. Is that what you're overlooking?

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: clear stale nvmeq->tags after tagset free
  2020-01-21 19:12             ` Keith Busch
@ 2020-01-21 19:26               ` Nadolski, Edmund
  0 siblings, 0 replies; 9+ messages in thread
From: Nadolski, Edmund @ 2020-01-21 19:26 UTC (permalink / raw)
  To: Keith Busch, Nadolski, Edmund; +Cc: Christoph Hellwig, linux-nvme

On 1/21/2020 12:12 PM, Keith Busch wrote:
> On Tue, Jan 21, 2020 at 11:49:53AM -0700, Nadolski, Edmund wrote:
>> On 1/21/2020 10:42 AM, Keith Busch wrote:
>> > On Tue, Jan 21, 2020 at 10:32:22AM -0700, Nadolski, Edmund wrote:
>> > > The tagset pointer gets invalidated when the driver calls the following:
>> > > 
>> > > nvme_reset_work()
>> > >    +-> nvme_setup_io_queues()  <<< cmd fails, no io queues created
>> > >    +-> nvme_kill_queues()
>> > >    +-> nvme_remove_namespaces()
>> > >    |   +-> nvme_ns_remove()  <<< for each ctrl->namespaces
>> > >    |       +-> del_gendisk()
>> > >    |       +-> blk_cleanup_queue()
>> > >    |       |   +-> blk_mq_exit_queue()  <<< invalidates the tags
>> > >    |       |       +-> blk_mq_del_queue_tag_set()
>> > >    |       |       +-> blk_mq_exit_hw_queues()
>> > >    |       |           +-> blk_mq_exit_hctx()
>> > >    |       |               +-> set->ops->exit_hctx(...) <<< driver callback
>> > >    |       +-> nvme_put_ns()
>> > >    +-> nvme_free_tagset()
>> > >    |   +-> blk_mq_free_tagset()
>> > >    +-> nvme_start_ctrl()  <<< allow operation w/o namespaces
>> > > 
>> > > So the nvmeq->tags is already dead and stale by the time we get to
>> > > nvme_free_tagset.
>> > 
>> > No, nvmeq->tags is not stale until we call nvme_free_tagset().
>> 
>> I see this in blk/blk-mq.c:
>> 
>> 2943 /* tags can _not_ be used after returning from blk_mq_exit_queue */
>> 2944 void blk_mq_exit_queue(struct request_queue *q)
> 
> You can't access it through q->tag_set after that. The tagset still
> exists beyond any individual request_queue using it until you call
> blk_mq_free_tag_set().

Ah ok, well that's subtle ;)  (I guess it works as long as blk-mq guarantees 
the tagset lifetime.)

I'll rework the patch to clear the pointers from nvme_free_tagset, as you 
mentioned previously.


>> which ends up being called on every individual ctrl->namespaces entry.  What
>> have I overlooked?
> 
> There are various paths that can have driver can remove a subset of
> namespaces rather than all of them. Is that what you're overlooking?

Yes, I should have looked into all the callers of nvme_ns_remove.

Thanks,
Ed


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-01-21 19:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14 18:17 [PATCH] nvme: clear stale nvmeq->tags after tagset free Edmund Nadolski
2020-01-16 16:06 ` Keith Busch
2020-01-16 22:06   ` Nadolski, Edmund
2020-01-16 23:26     ` Keith Busch
2020-01-21 17:32       ` Nadolski, Edmund
2020-01-21 17:42         ` Keith Busch
2020-01-21 18:49           ` Nadolski, Edmund
2020-01-21 19:12             ` Keith Busch
2020-01-21 19:26               ` Nadolski, Edmund

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).