From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Edmund Nadolski <edmund.nadolski@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH 2/5] nvme: Prevent resets during paused states
Date: Thu, 5 Sep 2019 14:35:46 -0600
Message-ID: <20190905203546.GB25467@localhost.localdomain>
In-Reply-To: <5f36518c-7cf0-9fe1-49d7-2b24b3d229fe@grimberg.me>

On Thu, Sep 05, 2019 at 01:23:53PM -0700, Sagi Grimberg wrote:
> 
> > A paused controller is doing critical internal activation work. Don't
> > allow a reset to occur by setting it to the resetting state, preventing
> > any future reset from occurring during this time.
> 
> Is there a reproducible bug actually being addressed here?

Yes, IO timeouts happen during CSTS.PP, which is normal, and escalating
such errors to reset the controller while it is activating firmware is
not a good idea.

Further, we do not want a user to manually trigger a reset (via sysfs
or other means), so this properly blocks such actions.
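
To illustrate the blocking described above, here is a minimal userspace
sketch, not the kernel code. Every model_* name is hypothetical, and the
transition table is a simplified assumption covering only three states.
The point it shows: once the fw activation work has moved the controller
to the resetting state, a later reset request (from a timeout or sysfs)
fails its own state change and is refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model; these are not kernel symbols. */
enum model_state { MODEL_LIVE, MODEL_RESETTING, MODEL_DELETING };

struct model_ctrl {
	enum model_state state;
};

/* Apply a transition only if this simplified table allows it. */
static bool model_change_state(struct model_ctrl *c, enum model_state next)
{
	bool ok = false;

	switch (next) {
	case MODEL_LIVE:
		ok = (c->state == MODEL_RESETTING);
		break;
	case MODEL_RESETTING:
		ok = (c->state == MODEL_LIVE);
		break;
	case MODEL_DELETING:
		/* Deletion cannot be prevented from either state. */
		ok = (c->state == MODEL_LIVE || c->state == MODEL_RESETTING);
		break;
	}
	if (ok)
		c->state = next;
	return ok;
}

/* A reset request is refused unless it can enter the resetting state. */
static int model_reset_ctrl(struct model_ctrl *c)
{
	if (!model_change_state(c, MODEL_RESETTING))
		return -EBUSY;
	return 0;	/* the real driver would queue reset work here */
}

static void model_demo(void)
{
	struct model_ctrl c = { .state = MODEL_LIVE };

	/* Fw activation begins: claim the resetting state. */
	assert(model_change_state(&c, MODEL_RESETTING));
	/* A timeout handler or sysfs write asking for a reset is refused. */
	assert(model_reset_ctrl(&c) == -EBUSY);
	/* Processing paused cleared: back to live, resets work again. */
	assert(model_change_state(&c, MODEL_LIVE));
	assert(model_reset_ctrl(&c) == 0);
}
```

No new flag or lock is needed in this model; the existing state check in
the reset path does the blocking.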
 
> Also, seems a bit "acrobatic" to set the state to RESETTING without
> really resetting it (and then change it back to LIVE before you do
> actually resetting it).

We can think of a CSTS.PP as the device internally resetting itself to
activate firmware.

> Would it make sense to look at nvme_ctrl_pp_status when
> scheduling a reset in nvme_reset_ctrl? Just a thought..

We have to be able to reset if we decide CSTS.PP is stuck, i.e. when the
fw activation timeout expires.

> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> >   drivers/nvme/host/core.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 91b1f0e57715..d42167d7594b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3705,20 +3705,23 @@ static void nvme_fw_act_work(struct work_struct *work)
> >   		fw_act_timeout = jiffies +
> >   				msecs_to_jiffies(admin_timeout * 1000);
> > +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
> > +		return;
> > +
> >   	nvme_stop_queues(ctrl);
> >   	while (nvme_ctrl_pp_status(ctrl)) {
> >   		if (time_after(jiffies, fw_act_timeout)) {
> >   			dev_warn(ctrl->device,
> >   				"Fw activation timeout, reset controller\n");
> 
> Would be good if the print reflected whether it is resetting or not..
>
> > -			nvme_reset_ctrl(ctrl);
> > +			if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> > +				nvme_reset_ctrl(ctrl);
> 
> How can this state change not succeed? ctrl removal?

Right, we can't prevent a transition to a deleting state.

> >   			break;
> >   		}
> >   		msleep(100);
> >   	}
> > -	if (ctrl->state != NVME_CTRL_LIVE)
> > +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> >   		return;
> 
> In what scenario this will not succeed? if the reset did it?

Controller deletion should be the only reason here.

I see now the "break" for a failed activation ought to be a return,
so I can fix that if you're okay with the rest.
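
For reference, a userspace sketch of the flow with the "break" turned
into a return. This is an assumption-laden model, not the patch: the
sim_* names are hypothetical, only two states are modelled, and a poll
count stands in for the jiffies timeout. One reading of why the return
matters: after a timeout the reset path has put the controller back
into the resetting state, so falling through to the post-loop
transition would flip it to live and restart queues while the reset is
still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state model; these are not kernel symbols. */
enum sim_state { SIM_LIVE, SIM_RESETTING };

struct sim_ctrl {
	enum sim_state state;
	int pp_polls_left;	/* polls until "CSTS.PP" clears; negative = stuck */
	bool reset_scheduled;
	bool queues_started;
};

/* With only two states, any change to a different state is allowed. */
static bool sim_change_state(struct sim_ctrl *c, enum sim_state next)
{
	if (c->state == next)
		return false;
	c->state = next;
	return true;
}

/* Stand-in for reading the processing-paused bit. */
static bool sim_pp_status(struct sim_ctrl *c)
{
	if (c->pp_polls_left < 0)
		return true;
	if (c->pp_polls_left == 0)
		return false;
	c->pp_polls_left--;
	return true;
}

static void sim_reset_ctrl(struct sim_ctrl *c)
{
	if (sim_change_state(c, SIM_RESETTING))
		c->reset_scheduled = true;
}

static void sim_fw_act_work(struct sim_ctrl *c, int max_polls)
{
	int polls = 0;

	if (!sim_change_state(c, SIM_RESETTING))
		return;		/* a reset is already in progress */

	while (sim_pp_status(c)) {
		if (polls++ >= max_polls) {
			/* Stuck: drop back to live so the reset is accepted. */
			if (sim_change_state(c, SIM_LIVE))
				sim_reset_ctrl(c);
			return;	/* not break: leave the state to the reset */
		}
	}

	if (!sim_change_state(c, SIM_LIVE))
		return;
	c->queues_started = true;
}

static void sim_demo(void)
{
	struct sim_ctrl ok = { .state = SIM_LIVE, .pp_polls_left = 2 };
	struct sim_ctrl stuck = { .state = SIM_LIVE, .pp_polls_left = -1 };

	sim_fw_act_work(&ok, 5);
	assert(ok.state == SIM_LIVE && ok.queues_started);

	sim_fw_act_work(&stuck, 3);
	assert(stuck.state == SIM_RESETTING && stuck.reset_scheduled);
	assert(!stuck.queues_started);
}
```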
