linux-nvme.lists.infradead.org archive mirror
From: James Smart <james.smart@broadcom.com>
To: Israel Rukshin <israelr@mellanox.com>,
	Linux-nvme <linux-nvme@lists.infradead.org>,
	Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Shlomi Nimrodi <shlomin@mellanox.com>, Max Gurtovoy <maxg@mellanox.com>
Subject: Re: [PATCH 5/7] nvme: Fix controller creation races with teardown flow
Date: Wed, 19 Aug 2020 16:16:26 -0700
Message-ID: <fad9c553-28cd-a40c-1a92-6e36fb7ad383@broadcom.com>
In-Reply-To: <1585063785-14268-6-git-send-email-israelr@mellanox.com>

On 3/24/2020 8:29 AM, Israel Rukshin wrote:
> Calling nvme_sysfs_delete() when the controller is in the middle of
> creation may cause several bugs. If the controller is in NEW state we
> remove delete_controller file and don't delete the controller. The user
> will not be able to use nvme disconnect command on that controller again,
> although the controller may be active. Other bugs may happen if the
> controller is in the middle of create_ctrl callback and
> nvme_do_delete_ctrl() starts. For example, freeing I/O tagset at
> nvme_do_delete_ctrl() before it was allocated at create_ctrl callback.
>
> To fix all those races don't allow the user to delete the controller
> before it was fully created.
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 5 +++++
>   drivers/nvme/host/nvme.h | 1 +
>   2 files changed, 6 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ba064fd..9961d0e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3228,6 +3228,10 @@ static ssize_t nvme_sysfs_delete(struct device *dev,
>   {
>   	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
>   
> +	/* Can't delete non-created controllers */
> +	if (!ctrl->created)
> +		return -EBUSY;
> +
>   	if (device_remove_file_self(dev, attr))
>   		nvme_delete_ctrl_sync(ctrl);
>   	return count;
> @@ -4039,6 +4043,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
>   		nvme_queue_scan(ctrl);
>   		nvme_start_queues(ctrl);
>   	}
> +	ctrl->created = true;
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_ctrl);
>

FYI - I've hit a scenario with this patch: if the device starts
rejecting the initial connections, or they continuously hit a failure,
we're forced to wait ctrl_loss_tmo before the controller goes away,
because we can't forcibly delete it via sysfs. A controller shouldn't
be able to get stuck like this.
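
A minimal user-space sketch of the scenario above, not kernel code. It
assumes, based on the quoted diff, that ->created is set only on the
nvme_start_ctrl() path and that a failed connect never reaches it. All
model_* names are hypothetical and exist only for illustration.

```c
/* User-space model only; every model_* name is hypothetical. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ctrl {
	bool created;	/* mirrors ctrl->created from the patch */
	bool deleted;	/* set once the delete path has run */
};

/* Models the patched nvme_sysfs_delete(): refuse until ->created is set. */
static long model_sysfs_delete(struct model_ctrl *ctrl, size_t count)
{
	/* Can't delete non-created controllers */
	if (!ctrl->created)
		return -EBUSY;

	ctrl->deleted = true;
	return (long)count;
}

/*
 * Models one connect attempt. Assumption: only a successful connect
 * reaches the nvme_start_ctrl()-like step that sets ->created.
 */
static bool model_connect(struct model_ctrl *ctrl, bool target_accepts)
{
	if (!target_accepts)
		return false;

	ctrl->created = true;
	return true;
}

/*
 * Retries a connect that always fails and tries a sysfs delete after
 * each attempt. Returns how many delete requests were rejected.
 */
static int model_failed_connect_loop(struct model_ctrl *ctrl, int attempts)
{
	int rejected = 0;

	for (int i = 0; i < attempts; i++) {
		model_connect(ctrl, false);
		if (model_sysfs_delete(ctrl, 1) == -EBUSY)
			rejected++;
	}
	return rejected;
}
```

In this model every delete request made while the connects keep
failing is rejected with -EBUSY, so the only way out is whatever
timeout ends the retry loop (ctrl_loss_tmo in the real driver).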

I understand the race conditions with delete and am looking at the same
thing on FC. Looking at what this patch was trying to achieve, it seems
to overlap with some of the teardown that Sagi is synchronizing with
the teardown_lock.

We may want to revisit this change.

-- james


Thread overview: 16+ messages
2020-03-24 15:29 [PATCH 0/7 V3] nvme: Fixes for deleting a ctrl before it was created Israel Rukshin
2020-03-24 15:29 ` [PATCH 1/7] nvme: Remove unused return code from nvme_delete_ctrl_sync Israel Rukshin
2020-03-24 15:29 ` [PATCH 2/7] nvme-pci: Re-order nvme_pci_free_ctrl Israel Rukshin
2020-03-24 15:42   ` Christoph Hellwig
2020-03-24 15:29 ` [PATCH 3/7] nvme: Fix ctrl use-after-free during sysfs deletion Israel Rukshin
2020-03-24 15:42   ` Christoph Hellwig
2020-03-24 15:29 ` [PATCH 4/7] nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl Israel Rukshin
2020-03-24 15:42   ` Christoph Hellwig
2020-03-24 15:29 ` [PATCH 5/7] nvme: Fix controller creation races with teardown flow Israel Rukshin
2020-08-19 23:16   ` James Smart [this message]
2020-08-21  0:28     ` Sagi Grimberg
2020-03-24 15:29 ` [PATCH 6/7] nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl Israel Rukshin
2020-03-25  0:19   ` Sagi Grimberg
2020-03-25 10:07     ` Israel Rukshin
2020-03-24 15:29 ` [PATCH 7/7] nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl Israel Rukshin
2020-03-24 16:17 ` [PATCH 0/7 V3] nvme: Fixes for deleting a ctrl before it was created Keith Busch
