* [PATCH] nvme: Add sibling to list after full initialization
@ 2021-11-11 13:06 Daniel Wagner
From: Daniel Wagner @ 2021-11-11 13:06 UTC
  To: linux-nvme; +Cc: Daniel Wagner

Adding the newly created namespace to the sibling list before the object
is fully initialized opens a race with nvme_mpath_revalidate_paths(),
which accesses ns->disk while iterating over the sibling list. At that
point ns->disk can still be NULL.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

I got a few bug reports from a customer who hits this
quite often:

 RIP: 0010:nvme_mpath_revalidate_paths+0x27/0xb0 [nvme_core]
 Code: 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 50 48 8b 55 00 48 8b 85 10 c5 00 00 48 39 d5 48 8b 48 40 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 40 74 05 f0 80 60 78 ef 48 8b 50 30 48 39 d5 48 8d 42 d0
 RSP: 0018:ffffaf1303fffcc0 EFLAGS: 00010283
 RAX: ffff95f75ef5c400 RBX: ffff95f71f63aa00 RCX: 0000000000200000
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff95f71f63aa00
 RBP: ffff95f71b1d0000 R08: 0000000800000000 R09: 00000008ffffffff
 R10: 00000000000007be R11: 0000000000000384 R12: 0000000000000000
 R13: ffff95f71b1d0000 R14: ffff95f721d79338 R15: ffff95f71f63aa00
 FS:  0000000000000000(0000) GS:ffff95f77f9c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 000000033940a005 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  nvme_update_ns_info+0x15b/0x2f0 [nvme_core]
  nvme_alloc_ns+0x27f/0x810 [nvme_core]
  nvme_validate_or_alloc_ns+0xbb/0x190 [nvme_core]
  nvme_scan_work+0x155/0x2d0 [nvme_core]
  process_one_work+0x1f4/0x3e0
  worker_thread+0x24c/0x3e0
  ? process_one_work+0x3e0/0x3e0
  kthread+0x10d/0x130
  ? kthread_park+0xa0/0xa0
  ret_from_fork+0x35/0x40

This patch fixes the reported problem. I am not entirely sure why this
started happening suddenly, but I suspect it is related to 041bd1a1fc73
("nvme: only call synchronize_srcu when clearing current path").

 drivers/nvme/host/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9a2610e147ce..84f52e6c1a02 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3776,7 +3776,6 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 		}
 	}
 
-	list_add_tail_rcu(&ns->siblings, &head->list);
 	ns->head = head;
 	mutex_unlock(&ctrl->subsys->lock);
 	return 0;
@@ -3873,6 +3872,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
 	if (nvme_update_ns_info(ns, id))
 		goto out_unlink_ns;
 
+	mutex_lock(&ctrl->subsys->lock);
+	list_add_tail(&ns->siblings, &ns->head->list);
+	mutex_unlock(&ctrl->subsys->lock);
+
 	down_write(&ctrl->namespaces_rwsem);
 	nvme_ns_add_to_ctrl_list(ns);
 	up_write(&ctrl->namespaces_rwsem);
-- 
2.29.2




* Re: [PATCH] nvme: Add sibling to list after full initialization
@ 2021-11-14 10:56 ` Sagi Grimberg
From: Sagi Grimberg @ 2021-11-14 10:56 UTC
  To: Daniel Wagner, linux-nvme

> Adding the newly created namespace before the object is fully
> initialized is opening a race with nvme_mpath_revalidate_paths() which
> tries to access ns->disk. ns->disk can still be NULL when iterating
> over the sibling list.

But ns->disk is assigned before nvme_init_ns_head is even called.
Maybe you mean that ns->head is NULL?

In that case, wouldn't you just need this?
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8642cf2160c4..19f7cde5bbc6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3878,8 +3878,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 		}
 	}
 
-	list_add_tail_rcu(&ns->siblings, &head->list);
 	ns->head = head;
+	list_add_tail_rcu(&ns->siblings, &head->list);
 	mutex_unlock(&ctrl->subsys->lock);
 	return 0;
--
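Why the order matters in this version: list_add_tail_rcu() links the entry in through rcu_assign_pointer(), which has release semantics, so stores made before it (here ns->head = head) are visible to a reader that reaches the entry through the RCU list iterators. The following is a user-space sketch under stated assumptions, not kernel code: it models the publish step as a C11 release store and the reader side as an acquire load (the kernel's rcu_dereference() relies on dependency ordering, which is weaker than acquire). struct fake_ns, writer(), reader() and run_publication_demo() are hypothetical names, and the list is push-front rather than tail-append to keep it short. It uses POSIX threads (older toolchains need -pthread).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NODES 64

/* Hypothetical stand-in for struct nvme_ns: only the fields a reader
 * dereferences while walking the sibling list. */
struct fake_ns {
	int *disk;              /* must be valid before the node is published */
	int *head;              /* models ns->head, same requirement */
	struct fake_ns *next;   /* written once, before publication */
};

static struct fake_ns *_Atomic list_head;   /* models head->list */
static struct fake_ns nodes[NODES];
static int disk_obj, head_obj;
static int null_seen;   /* written by the reader, read after join */

/* Writer: initialize every field first, then publish with a release
 * store (the role rcu_assign_pointer() plays in list_add_tail_rcu()). */
static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < NODES; i++) {
		struct fake_ns *ns = &nodes[i];

		ns->disk = &disk_obj;
		ns->head = &head_obj;
		ns->next = atomic_load_explicit(&list_head, memory_order_relaxed);
		atomic_store_explicit(&list_head, ns, memory_order_release);
	}
	return NULL;
}

/* Reader: acquire-load the list head and walk the list without a lock,
 * loosely like an RCU list traversal. */
static void *reader(void *arg)
{
	(void)arg;
	for (int iter = 0; iter < 100000; iter++) {
		struct fake_ns *p =
			atomic_load_explicit(&list_head, memory_order_acquire);

		for (; p != NULL; p = p->next)
			if (p->disk == NULL || p->head == NULL)
				null_seen++;
	}
	return NULL;
}

/* Returns how many half-initialized nodes the reader saw (expected 0),
 * or -1 if a thread could not be created. */
int run_publication_demo(void)
{
	pthread_t w, r;

	atomic_store_explicit(&list_head, NULL, memory_order_relaxed);
	null_seen = 0;
	if (pthread_create(&r, NULL, reader, NULL) != 0)
		return -1;
	if (pthread_create(&w, NULL, writer, NULL) != 0) {
		pthread_join(r, NULL);
		return -1;
	}
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return null_seen;
}
```

run_publication_demo() should return 0: every field is written before the release store, so the reader never observes a NULL field. Moving the two field assignments after the store would turn them into data races with the reader, which is undefined behaviour under the C11 model and the user-space analogue of publishing ns before it is ready.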

> [...]
>
> @@ -3873,6 +3872,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
>   	if (nvme_update_ns_info(ns, id))
>   		goto out_unlink_ns;
>   
> +	mutex_lock(&ctrl->subsys->lock);
> +	list_add_tail(&ns->siblings, &ns->head->list);
> +	mutex_unlock(&ctrl->subsys->lock);

What is the subsys->lock protecting here? I don't see
nvme_mpath_revalidate_paths acquiring it.
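One way to read this question: a mutex only orders the code paths that take it. In the user-space sketch below (hypothetical names: struct fake_head, writer(), run_two_writers()), the mutex stands in for subsys->lock and keeps two concurrent writers (for example, scan work on two controllers that share a namespace head) from corrupting the tail pointer. A reader that walks the list without taking the lock, as nvme_mpath_revalidate_paths() does per the question above, gets no protection from it; its safety depends only on entries being fully initialized before they become reachable.

```c
#include <pthread.h>
#include <stddef.h>

#define PER_WRITER 10000

struct node {
	struct node *next;
};

/* Hypothetical tail-linked list; lock plays the role of subsys->lock. */
struct fake_head {
	pthread_mutex_t lock;
	struct node *first;
	struct node **tail;     /* address of the last node's next field */
};

static struct fake_head h = { .lock = PTHREAD_MUTEX_INITIALIZER };
static struct node pool[2][PER_WRITER];

/* Append PER_WRITER nodes. The mutex makes each append atomic with
 * respect to the other writer; it does nothing for lockless readers. */
static void *writer(void *arg)
{
	struct node *mine = arg;

	for (int i = 0; i < PER_WRITER; i++) {
		mine[i].next = NULL;
		pthread_mutex_lock(&h.lock);
		*h.tail = &mine[i];
		h.tail = &mine[i].next;
		pthread_mutex_unlock(&h.lock);
	}
	return NULL;
}

/* Run two writers concurrently, then count the nodes reachable from the
 * head (expected 2 * PER_WRITER), or return -1 on thread failure. */
int run_two_writers(void)
{
	pthread_t a, b;
	int reachable = 0;

	h.first = NULL;
	h.tail = &h.first;
	if (pthread_create(&a, NULL, writer, pool[0]) != 0)
		return -1;
	if (pthread_create(&b, NULL, writer, pool[1]) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	for (struct node *p = h.first; p != NULL; p = p->next)
		reachable++;
	return reachable;
}
```

run_two_writers() should return 20000. Dropping the lock/unlock pair would let the two tail updates race, which can lose entries; adding the lock to a path that no reader takes would not change what that reader can observe.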



* Re: [PATCH] nvme: Add sibling to list after full initialization
@ 2021-11-15  9:10   ` Daniel Wagner
From: Daniel Wagner @ 2021-11-15  9:10 UTC
  To: Sagi Grimberg; +Cc: linux-nvme

Hi Sagi,

On Sun, Nov 14, 2021 at 12:56:55PM +0200, Sagi Grimberg wrote:
> > Adding the newly created namespace before the object is fully
> > initialized is opening a race with nvme_mpath_revalidate_paths() which
> > tries to access ns->disk. ns->disk can still be NULL when iterating
> > over the sibling list.
>
> But ns->disk is assigned before nvme_init_ns_head is even called.

ns->disk is NULL in our kernels because they are missing 5f432cceb3e9
("nvme: use blk_mq_alloc_disk"). That commit moves the ns->disk
assignment in front of nvme_init_ns_head().

I didn't spot that change when I forward-ported the patch.
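The ordering problem described here can be reduced to a deterministic, single-threaded sketch. It is not kernel code: "publish" stands in for nvme_init_ns_head() adding the namespace to head->list, and peer_revalidate() stands in for another path walking the siblings the way nvme_mpath_revalidate_paths() does, scheduled at the worst possible moment. struct fake_ns, enum order, peer_revalidate() and simulate() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical model: disk stands in for ns->disk. */
struct fake_ns {
	const int *disk;
	struct fake_ns *next;
};

enum order {
	DISK_AFTER_PUBLISH,     /* kernels without 5f432cceb3e9 */
	DISK_BEFORE_PUBLISH,    /* kernels with it */
};

/* Walk the sibling list and count entries whose disk is still NULL;
 * real code would dereference the pointer and oops instead. */
static int peer_revalidate(const struct fake_ns *list)
{
	int null_disks = 0;

	for (const struct fake_ns *p = list; p != NULL; p = p->next)
		if (p->disk == NULL)
			null_disks++;
	return null_disks;
}

/* Link a new namespace in front of an existing sibling using the given
 * ordering, with the peer walk scheduled right after publication.
 * Returns the number of NULL disks the peer observed. */
int simulate(enum order o)
{
	static const int disk_obj = 0;
	struct fake_ns existing = { .disk = &disk_obj, .next = NULL };
	struct fake_ns new_ns = { .disk = NULL, .next = NULL };
	struct fake_ns *list = &existing;
	int seen;

	if (o == DISK_BEFORE_PUBLISH)
		new_ns.disk = &disk_obj;        /* assign disk before publishing */

	new_ns.next = list;                 /* publish: new_ns is now reachable */
	list = &new_ns;
	seen = peer_revalidate(list);       /* worst-case interleaving */

	if (o == DISK_AFTER_PUBLISH)
		new_ns.disk = &disk_obj;        /* too late for the peer */
	return seen;
}
```

simulate(DISK_AFTER_PUBLISH) returns 1 because the peer meets the new entry while its disk is still NULL, while simulate(DISK_BEFORE_PUBLISH) returns 0, which corresponds to the order that 5f432cceb3e9 establishes according to this message.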

Sorry for the noise and thanks for the feedback!
Daniel


