linux-nvme.lists.infradead.org archive mirror
From: James Puthukattukaran <james.puthukattukaran@oracle.com>
To: Keith Busch <kbusch@kernel.org>
Cc: linux-nvme@lists.infradead.org
Subject: Re: [External] : Re: way to unbind a bad nvme device/controller without powering off system
Date: Mon, 24 Oct 2022 20:02:33 -0400
Message-ID: <13888912-24a4-870a-cc93-4192a69ce9ca@oracle.com>
In-Reply-To: <Y1cTbOinkQZvpzY0@kbusch-mbp.dhcp.thefacebook.com>



On 10/24/22 18:36, Keith Busch wrote:
> On Mon, Oct 24, 2022 at 05:40:30PM -0400, James Puthukattukaran wrote:
>> Hi -
>>
>> I'm seeing what seems to be a non-functioning nvme controller/drive: IO transactions are timing out and the controller is not responding to any controller commands. The controller seems to be disabled (nvme_dev_disable() called via nvme_timeout()), but the nvme_reset_work thread is still blocked and not making progress. I tried to remove the controller via the hotplug (HP) sysfs interface, and that also hangs behind the reset thread, waiting for it to complete.
> 
> If it's in a hotplug slot, then just pull it out.

I'm looking for a programmatic (remote) way to do it. Also, pulling the drive causes a surprise removal; won't that leave the nvme controller data structures in a bad state, or the device still bound to the driver?
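
To make "programmatic" concrete, here is a minimal Python sketch of the sysfs writes I have in mind, assuming the standard PCI sysfs layout (the driver's `unbind` file and the hotplug slot's `power` file). The BDF, the slot name, and the helper names are hypothetical placeholders. These writes go through the normal driver remove path, so they can still block behind the reset work as described in this thread.

```python
from pathlib import Path

SYSFS_ROOT = Path("/sys")  # parameterized so the path logic can be checked off-box


def unbind_path(root: Path = SYSFS_ROOT) -> Path:
    # Writing a PCI address here asks the nvme driver to release that device.
    return root / "bus/pci/drivers/nvme/unbind"


def slot_power_path(slot: str, root: Path = SYSFS_ROOT) -> Path:
    # Writing "0" here asks the hotplug controller to power the slot off.
    return root / "bus/pci/slots" / slot / "power"


def planned_writes(bdf: str, slot: str, root: Path = SYSFS_ROOT):
    """Return (path, value) pairs: unbind the driver first, then power off the slot."""
    return [
        (unbind_path(root), bdf),
        (slot_power_path(slot, root), "0"),
    ]


def apply_writes(writes, dry_run: bool = True) -> None:
    """Print the writes by default; perform them only when dry_run is False (needs root)."""
    for path, value in writes:
        if dry_run:
            print(f"would write {value!r} to {path}")
        else:
            path.write_text(value)
```

For example, `apply_writes(planned_writes("0000:3b:00.0", "7"))` only prints the two writes; both the address and the slot number there are made up.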
>  
>> I thought the disable-controller path does not talk to the controller and simply unblocks the queues and cleans them out before unbinding the controller from the device. Not sure why the reset thread is still stuck, then. Does the reset thread have to run its course even though the controller has been disabled? I'm trying to understand the flow here.
>>
>> I guess what I'm really looking for is a way to simply unbind the device from the driver, kill any threads, and allow the device to be powered off via the hotplug interface (trying to avoid rebooting the system to remove the device).
> 
> What kernel are you using?

A 5.14-based kernel.

> 
> Generally, the default timeout is really long. If you have a broken
> controller, it could take several minutes before the driver unblocks
> forward progress to unbind.

One concern is that the reset-controller flow attempts to reinitialize the controller, which will cause problems if the controller is bad. Would it make sense to have a sysfs "remove_controller" interface that simply does an nvme_dev_disable() on the assumption that the controller is dead? Will the nvme_kill_queues() call in nvme_dev_disable() unwedge any blocked nvme reset thread and thus allow the nvme_remove() flow to complete?
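
For anyone reproducing this, a small Python sketch that reports the controller state and the driver timeouts in play may help when judging how long the reset path can block. It assumes the `state` attribute under `/sys/class/nvme/<ctrl>/` and the `io_timeout` and `admin_timeout` parameters of the `nvme_core` module; the controller name `nvme0` and the helper names are placeholders.

```python
from pathlib import Path


def read_attr(path: Path):
    """Return the stripped contents of a sysfs attribute, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def controller_snapshot(ctrl: str = "nvme0", root: Path = Path("/sys")) -> dict:
    """Collect the controller state and the nvme_core timeouts (in seconds, as strings)."""
    params = root / "module/nvme_core/parameters"
    return {
        "state": read_attr(root / "class/nvme" / ctrl / "state"),
        "io_timeout_s": read_attr(params / "io_timeout"),
        "admin_timeout_s": read_attr(params / "admin_timeout"),
    }
```

Calling `controller_snapshot("nvme0")` returns a dict with whatever could be read; attributes that are missing or unreadable come back as `None` rather than raising.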
thanks




