All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Meneghini <jmeneghi@redhat.com>
To: Chris Leech <cleech@redhat.com>
Cc: linux-nvme@lists.infradead.org, lengchao@huawei.com,
	dwagner@suse.de, hare@suse.de, mlombard@redhat.com,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [RFC PATCH] nvme: fix RCU hole that allowed for endless looping in multipath round robin
Date: Wed, 23 Mar 2022 15:07:04 -0400	[thread overview]
Message-ID: <98469a28-a2da-65f9-2719-c568eea94b26@redhat.com> (raw)
In-Reply-To: <20220323153414.GA440@lst.de>

This is good. I'm glad everyone agrees.

Chris, please ask our partner to test this patch in their testbed and verify that
this fixes the soft-lockup problem they are seeing. I believe they have the ability
to reproduce this problem on their 8.6 testbed.

If that works then please re-post a patch so people can consider this for v5.18.

/John


On 3/23/22 11:34, Christoph Hellwig wrote:
> On Wed, Mar 23, 2022 at 04:54:26PM +0200, Sagi Grimberg wrote:
>>
>>
>> On 3/22/22 00:43, Chris Leech wrote:
>>> Make nvme_ns_remove match the assumptions elsewhere.
>>>
>>> 1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
>>>      running in __nvme_find_path or nvme_round_robin_path that will
>>>      re-assign this ns to current_path.
>>>
>>> 2) Any matching current_path entries need to be cleared before removing
>>>      from the siblings list, to prevent calling nvme_round_robin_path with
>>>      an "old" ns that's off list.
>>>
>>> 3) Finally the list_del_rcu can happen, and then synchronize again
>>>      before releasing any reference counts.
>>> ---
>>>    drivers/nvme/host/core.c | 13 +++++++++----
>>>    1 file changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index fd4720d37cc0..20778dc9224c 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -3917,6 +3917,15 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>>    	set_capacity(ns->disk, 0);
>>>    	nvme_fault_inject_fini(&ns->fault_inject);
>>>    +	/* ensure that !NVME_NS_READY is seen
>>> +	 * to prevent this ns going back in current_path
>>> +	 */
>>> +	synchronize_srcu(&ns->head->srcu);
>>> +
>>> +	/* wait for concurrent submissions */
>>> +	if (nvme_mpath_clear_current_path(ns))
>>> +		synchronize_srcu(&ns->head->srcu);
>>
>> Nothing prevents it from being reselected again.
>> This is what drove the placement of this after the
>> ns is removed from the head->list. But that was before
>> the selection looked into NVME_NS_READY flag...
>>
>> This looks legit to me...
> 
> Yes, this looks pretty sensible.  I'm tempted to just queue it up
> for 5.18.
> 



  reply	other threads:[~2022-03-23 19:07 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21 22:43 nvme-multipath: round-robin infinite looping Chris Leech
2022-03-21 22:43 ` kdump details (Re: nvme-multipath: round-robin infinite looping) Chris Leech
2022-03-22 11:16   ` Christoph Hellwig
2022-03-22 15:36     ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme-multipath: break endless loop in nvme_round_robin_path Chris Leech
2022-03-22 11:17   ` Christoph Hellwig
2022-03-22 12:07     ` Daniel Wagner
2022-03-22 15:42     ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme: fix RCU hole that allowed for endless looping in multipath round robin Chris Leech
2022-03-23 14:54   ` Sagi Grimberg
2022-03-23 15:34     ` Christoph Hellwig
2022-03-23 19:07       ` John Meneghini [this message]
2022-04-05 13:14         ` John Meneghini
2022-03-25  6:36   ` Christoph Hellwig
2022-03-25 12:42   ` Sagi Grimberg
2022-04-05 17:25     ` Chris Leech

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98469a28-a2da-65f9-2719-c568eea94b26@redhat.com \
    --to=jmeneghi@redhat.com \
    --cc=cleech@redhat.com \
    --cc=dwagner@suse.de \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=lengchao@huawei.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=mlombard@redhat.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.