* [PATCH v2] nvme: Add sibling to list after full initialization
@ 2021-11-11 13:21 Daniel Wagner
2021-11-19 17:10 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Wagner @ 2021-11-11 13:21 UTC
To: linux-nvme; +Cc: Daniel Wagner
Adding the newly created namespace to the sibling list before the
object is fully initialized opens a race with
nvme_mpath_revalidate_paths(), which dereferences ns->disk. ns->disk
can still be NULL while that function iterates over the sibling list.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
v2: use list_add_tail_rcu instead of list_add_tail
I got a few bug reports from a customer who hits this
quite often:
RIP: 0010:nvme_mpath_revalidate_paths+0x27/0xb0 [nvme_core]
Code: 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 50 48 8b 55 00 48 8b 85 10 c5 00 00 48 39 d5 48 8b 48 40 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 40 74 05 f0 80 60 78 ef 48 8b 50 30 48 39 d5 48 8d 42 d0
RSP: 0018:ffffaf1303fffcc0 EFLAGS: 00010283
RAX: ffff95f75ef5c400 RBX: ffff95f71f63aa00 RCX: 0000000000200000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff95f71f63aa00
RBP: ffff95f71b1d0000 R08: 0000000800000000 R09: 00000008ffffffff
R10: 00000000000007be R11: 0000000000000384 R12: 0000000000000000
R13: ffff95f71b1d0000 R14: ffff95f721d79338 R15: ffff95f71f63aa00
FS: 0000000000000000(0000) GS:ffff95f77f9c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 000000033940a005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
nvme_update_ns_info+0x15b/0x2f0 [nvme_core]
nvme_alloc_ns+0x27f/0x810 [nvme_core]
nvme_validate_or_alloc_ns+0xbb/0x190 [nvme_core]
nvme_scan_work+0x155/0x2d0 [nvme_core]
process_one_work+0x1f4/0x3e0
worker_thread+0x24c/0x3e0
? process_one_work+0x3e0/0x3e0
kthread+0x10d/0x130
? kthread_park+0xa0/0xa0
ret_from_fork+0x35/0x40
This patch fixes the problem reported. I am not totally sure why this
suddenly happens but I guess it is related to 041bd1a1fc73 ("nvme:
only call synchronize_srcu when clearing current path").
drivers/nvme/host/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9a2610e147ce..7e43cb31d41e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3776,7 +3776,6 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 		}
 	}
 
-	list_add_tail_rcu(&ns->siblings, &head->list);
 	ns->head = head;
 	mutex_unlock(&ctrl->subsys->lock);
 	return 0;
@@ -3873,6 +3872,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
 	if (nvme_update_ns_info(ns, id))
 		goto out_unlink_ns;
 
+	mutex_lock(&ctrl->subsys->lock);
+	list_add_tail_rcu(&ns->siblings, &ns->head->list);
+	mutex_unlock(&ctrl->subsys->lock);
+
 	down_write(&ctrl->namespaces_rwsem);
 	nvme_ns_add_to_ctrl_list(ns);
 	up_write(&ctrl->namespaces_rwsem);
--
2.29.2
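
The race described in the commit message can be illustrated outside the
kernel. What follows is a minimal userspace sketch, not the driver code:
C11 release/acquire atomics stand in for list_add_tail_rcu() and the RCU
read side, and struct ns, struct head, publish_sibling(),
count_capacity_mismatches() and alloc_ns_sketch() are hypothetical names
made up for the example. It assumes a single writer, which the kernel
gets by holding ctrl->subsys->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct disk {
	unsigned long capacity;
};

struct ns {
	struct disk *disk;             /* must be valid before readers see the ns */
	_Atomic(struct ns *) next;     /* sibling link */
};

struct head {
	_Atomic(struct ns *) first;    /* start of the sibling list */
};

/*
 * Make ns reachable from the list. The release store orders every
 * earlier write to *ns before the pointer becomes visible to readers.
 * Assumes a single writer (hypothetical stand-in for the subsys lock).
 */
static void publish_sibling(struct head *h, struct ns *ns)
{
	struct ns *old = atomic_load_explicit(&h->first, memory_order_relaxed);

	atomic_store_explicit(&ns->next, old, memory_order_relaxed);
	atomic_store_explicit(&h->first, ns, memory_order_release);
}

/*
 * Reader loosely modeled on a path revalidation loop: it dereferences
 * ns->disk for every sibling it can reach, so a published entry with a
 * NULL disk would fault here.
 */
static int count_capacity_mismatches(struct head *h, unsigned long capacity)
{
	int mismatches = 0;
	struct ns *ns = atomic_load_explicit(&h->first, memory_order_acquire);

	while (ns) {
		if (ns->disk->capacity != capacity)
			mismatches++;
		ns = atomic_load_explicit(&ns->next, memory_order_acquire);
	}
	return mismatches;
}

/* Initialize first, publish last. */
static void alloc_ns_sketch(struct head *h, struct ns *ns, struct disk *disk)
{
	assert(disk != NULL);
	ns->disk = disk;               /* 1. finish initialization */
	publish_sibling(h, ns);        /* 2. only then make it reachable */
}
```

If publish_sibling() ran before ns->disk was assigned, a concurrent
reader could reach the new entry and dereference a NULL disk pointer.
That matches the shape of the oops above, where RDX is 0 and the
faulting address in CR2 is 0x40.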
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-11 13:21 [PATCH v2] nvme: Add sibling to list after full initialization Daniel Wagner
@ 2021-11-19 17:10 ` Christoph Hellwig
2021-11-21 10:22 ` Sagi Grimberg
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2021-11-19 17:10 UTC
To: Daniel Wagner; +Cc: linux-nvme
Thanks,
applied to nvme-5.16.
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-19 17:10 ` Christoph Hellwig
@ 2021-11-21 10:22 ` Sagi Grimberg
2021-11-23 5:58 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2021-11-21 10:22 UTC
To: Christoph Hellwig, Daniel Wagner; +Cc: linux-nvme
> Thanks,
>
> applied to nvme-5.16.
I thought this is not needed upstream. Daniel?
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-21 10:22 ` Sagi Grimberg
@ 2021-11-23 5:58 ` Christoph Hellwig
2021-11-23 9:05 ` Sagi Grimberg
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2021-11-23 5:58 UTC
To: Sagi Grimberg; +Cc: Christoph Hellwig, Daniel Wagner, linux-nvme
On Sun, Nov 21, 2021 at 12:22:28PM +0200, Sagi Grimberg wrote:
>
> > Thanks,
> >
> > applied to nvme-5.16.
>
> I thought this is not needed upstream. Daniel?
Why?
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-23 5:58 ` Christoph Hellwig
@ 2021-11-23 9:05 ` Sagi Grimberg
2021-11-23 16:17 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2021-11-23 9:05 UTC
To: Christoph Hellwig; +Cc: Daniel Wagner, linux-nvme
>>> Thanks,
>>>
>>> applied to nvme-5.16.
>>
>> I thought this is not needed upstream. Daniel?
>
> Why?
The correspondence on v1 of this patch concluded that it
is not needed for upstream.
--
On Sun, Nov 14, 2021 at 12:56:55PM +0200, Sagi Grimberg wrote:
>> Adding the newly created namespace before the object is fully
>> initialized is opening a race with nvme_mpath_revalidate_paths() which
>> tries to access ns->disk. ns->disk can still be NULL when iterating
>> over the sibling list.
>
> But ns->disk is assigned before nvme_init_ns_head is even called.
ns->disk is NULL, but we are missing 5f432cceb3e9 ("nvme: use
blk_mq_alloc_disk") in our kernels. The commit moves the ns->disk
assignment in front of nvme_init_ns_head.
I didn't spot the code change when I forward-ported the patch.
Sorry for the noise and thanks for the feedback!
Daniel
--
However I do think we need the following:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3725b6a3791c..516eedeca044 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3885,8 +3885,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 		}
 	}
 
-	list_add_tail_rcu(&ns->siblings, &head->list);
 	ns->head = head;
+	list_add_tail_rcu(&ns->siblings, &head->list);
 	mutex_unlock(&ctrl->subsys->lock);
 	return 0;
--
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-23 9:05 ` Sagi Grimberg
@ 2021-11-23 16:17 ` Christoph Hellwig
2021-12-06 13:42 ` Daniel Wagner
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2021-11-23 16:17 UTC
To: Sagi Grimberg; +Cc: Christoph Hellwig, Daniel Wagner, linux-nvme
On Tue, Nov 23, 2021 at 11:05:37AM +0200, Sagi Grimberg wrote:
>
> > > > Thanks,
> > > >
> > > > applied to nvme-5.16.
> > >
> > > I thought this is not needed upstream. Daniel?
> >
> > Why?
>
> The correspondence on v1 of this patch concluded that it
> is not needed for upstream.
I'll drop it for now then.
* Re: [PATCH v2] nvme: Add sibling to list after full initialization
2021-11-23 16:17 ` Christoph Hellwig
@ 2021-12-06 13:42 ` Daniel Wagner
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Wagner @ 2021-12-06 13:42 UTC
To: Christoph Hellwig; +Cc: Sagi Grimberg, linux-nvme
> I'll drop it for now then.
Thanks, the patch is not necessary due to the changes from
5f432cceb3e9 ("nvme: use blk_mq_alloc_disk").
Next time I'll make sure I mark the newest version of the patch as
invalid if such a discussion happens on a previous version.