* sysfs interface to force power off
From: James Puthukattukaran @ 2022-11-04 23:08 UTC
To: linux-pci, helgaas

Looking to solve a problem where we have nvme drives that are hung in
the field and we are not sure of the root cause but the working theory
is that the controller is "bad" and not responding properly to
commands. The nvme driver times out on outstanding IO requests and as
part of recovery, attempts to reset the controller and reinitialize the
device. The reset controller also hangs like here --

kernel:info: [10419813.132341] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
kernel:warning: [10419813.132342] Call Trace:
kernel:warning: [10419813.132345] __schedule+0x2bc/0x89b
kernel:warning: [10419813.132348] schedule+0x36/0x7c
kernel:warning: [10419813.132351] blk_mq_freeze_queue_wait+0x4b/0xaa
kernel:warning: [10419813.132353] ? remove_wait_queue+0x60/0x60
kernel:warning: [10419813.132359] nvme_wait_freeze+0x33/0x50 [nvme_core]
kernel:warning: [10419813.132362] nvme_reset_work+0x802/0xd84 [nvme]
kernel:warning: [10419813.132364] ? __switch_to_asm+0x40/0x62
kernel:warning: [10419813.132365] ? __switch_to_asm+0x34/0x62
kernel:warning: [10419813.132367] ? __switch_to+0x9b/0x505
kernel:warning: [10419813.132368] ? __switch_to_asm+0x40/0x62
kernel:warning: [10419813.132370] ? __switch_to_asm+0x40/0x62
kernel:warning: [10419813.132372] process_one_work+0x169/0x399
kernel:warning: [10419813.132374] worker_thread+0x4d/0x3e5
kernel:warning: [10419813.132377] kthread+0x105/0x138
kernel:warning: [10419813.132379] ? rescuer_thread+0x380/0x375
kernel:warning: [10419813.132380] ? kthread_bind+0x20/0x15
kernel:warning: [10419813.132382] ret_from_fork+0x24/0x49
...

So, I tried to hot power off the device via
"echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs waiting
for the nvme reset thread to finish (like so) --

kernel:warning: [10419813.158116] __schedule+0x2bc/0x89b
kernel:warning: [10419813.158119] schedule+0x36/0x7c
kernel:warning: [10419813.158122] schedule_timeout+0x1f6/0x31f
kernel:warning: [10419813.158124] ? sched_clock_cpu+0x11/0xa5
kernel:warning: [10419813.158126] ? try_to_wake_up+0x59/0x505
kernel:warning: [10419813.158130] wait_for_completion+0x12b/0x18a
kernel:warning: [10419813.158132] ? wake_up_q+0x80/0x73
kernel:warning: [10419813.158134] flush_work+0x122/0x1a7
kernel:warning: [10419813.158137] ? wake_up_worker+0x30/0x2b
kernel:warning: [10419813.158141] nvme_remove+0x71/0x100 [nvme]
kernel:warning: [10419813.158146] pci_device_remove+0x3e/0xb6
kernel:warning: [10419813.158149] device_release_driver_internal+0x134/0x1eb
kernel:warning: [10419813.158151] device_release_driver+0x12/0x14
kernel:warning: [10419813.158155] pci_stop_bus_device+0x7c/0x96
kernel:warning: [10419813.158158] pci_stop_bus_device+0x39/0x96
kernel:warning: [10419813.158164] pci_stop_and_remove_bus_device+0x12/0x1d
kernel:warning: [10419813.158167] pciehp_unconfigure_device+0x7a/0x1d7
kernel:warning: [10419813.158169] pciehp_disable_slot+0x52/0xca
kernel:warning: [10419813.158171] pciehp_sysfs_disable_slot+0x67/0x112
kernel:warning: [10419813.158174] disable_slot+0x12/0x14
kernel:warning: [10419813.158175] power_write_file+0x6e/0xf8
kernel:warning: [10419813.158179] pci_slot_attr_store+0x24/0x2e
kernel:warning: [10419813.158180] sysfs_kf_write+0x3f/0x46
kernel:warning: [10419813.158182] kernfs_fop_write+0x124/0x1a3
kernel:warning: [10419813.158184] __vfs_write+0x3a/0x16d
kernel:warning: [10419813.158187] ? audit_filter_syscall+0x33/0xce
kernel:warning: [10419813.158189] vfs_write+0xb2/0x1a1

Is there a way to force power off the device instead of the "graceful"
approach? Obviously, we don't want to reset the system and don't have
physical access to the device.

Would it make sense to create a "force power off" in
/sys/bus/pci/slots/X which basically

a) Sets completion timeout mask (CTO) (for outstanding IO requests not
   causing a fatal error due to CTOs; not an issue for DPCs I would
   think)
b) power off the slot
c) enable CTO mask
d) unconfigure the device via pciehp_unconfigure_device

Any help here appreciated!

thanks
James
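
For readers who want to script the existing interface rather than use
a shell, the "echo 0 > /sys/bus/pci/slots/X/power" step above can be
sketched in userspace C. This is only an illustration: the helper name
slot_power_write() is hypothetical, and the slot name "X" is left
elided exactly as in the message. The write goes through the same
power_write_file() path shown in the second trace, so it would be
expected to block in the same way while nvme_remove() is stuck.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "0" (off) or "1" (on) to a slot power
 * attribute such as /sys/bus/pci/slots/X/power.
 * Returns 0 on success or a negative errno value on failure.
 */
int slot_power_write(const char *attr_path, int on)
{
	const char *val = on ? "1\n" : "0\n";
	ssize_t n;
	int fd, rc = 0;

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* This call is where a caller would hang if driver removal never finishes. */
	n = write(fd, val, 2);
	if (n < 0)
		rc = -errno;
	else if (n != 2)
		rc = -EIO;	/* short write: treat as an I/O error */

	close(fd);
	return rc;
}
```

A caller would pass the full attribute path and check for a negative
return value; the helper adds no timeout of its own, which is the
limitation the rest of the thread discusses.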
* Re: sysfs interface to force power off
From: Bjorn Helgaas @ 2022-11-07 20:41 UTC
To: James Puthukattukaran; +Cc: Lukas Wunner, Hans de Goede, linux-pci

[+cc Lukas, Hans]

On Fri, Nov 04, 2022 at 07:08:34PM -0400, James Puthukattukaran wrote:
> Looking to solve a problem where we have nvme drives that are hung
> in the field and we are not sure of the root cause but the working
> theory is that the controller is "bad" and not responding properly
> to commands. The nvme driver times out on outstanding IO requests
> and as part of recovery, attempts to reset the controller and
> reinitialize the device. The reset controller also hangs like here
> --
>
> [...]
>
> So, I tried to hot power off the device via
> "echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs
> waiting for the nvme reset thread to finish (like so) --

Looks like this "power" sysfs file could use some documentation. I
couldn't find anything in Documentation/ABI/testing/ that seems to
cover it.

> [...]
>
> Is there a way to force power off the device instead of the
> "graceful" approach? Obviously, we don't want to reset the system
> and don't have physical access to the device.
>
> Would it make sense to create a "force power off" in
> /sys/bus/pci/slots/X which basically
> a) Sets completion timeout mask (CTO) (for outstanding IO requests
>    not causing a fatal error due to CTOs; not an issue for DPCs I
>    would think)
> b) power off the slot
> c) enable CTO mask
> d) unconfigure the device via pciehp_unconfigure_device

So I assume the existing sysfs slot "power" interface would do what
you want except that nvme_remove() hangs?

There might be some improvement to make in nvme_remove(); maybe it
doesn't correctly detect I/O errors or something.

But maybe there's *also* a case to be made for an interface like you
suggest. Lukas, Hans, any reaction to this?

Bjorn
* Re: [External] : Re: sysfs interface to force power off
From: James Puthukattukaran @ 2022-11-07 21:14 UTC
To: Bjorn Helgaas; +Cc: Lukas Wunner, Hans de Goede, linux-pci

On 11/7/22 15:41, Bjorn Helgaas wrote:
> [+cc Lukas, Hans]
>
> On Fri, Nov 04, 2022 at 07:08:34PM -0400, James Puthukattukaran wrote:
>> Looking to solve a problem where we have nvme drives that are hung
>> in the field and we are not sure of the root cause but the working
>> theory is that the controller is "bad" and not responding properly
>> to commands. The nvme driver times out on outstanding IO requests
>> and as part of recovery, attempts to reset the controller and
>> reinitialize the device. The reset controller also hangs like here
>> --
>>
>> [...]
>>
>> So, I tried to hot power off the device via
>> "echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs
>> waiting for the nvme reset thread to finish (like so) --
>
> Looks like this "power" sysfs file could use some documentation. I
> couldn't find anything in Documentation/ABI/testing/ that seems to
> cover it.
>
>> [...]
>>
>> Is there a way to force power off the device instead of the
>> "graceful" approach? Obviously, we don't want to reset the system
>> and don't have physical access to the device.
>>
>> Would it make sense to create a "force power off" in
>> /sys/bus/pci/slots/X which basically
>> a) Sets completion timeout mask (CTO) (for outstanding IO requests
>>    not causing a fatal error due to CTOs; not an issue for DPCs I
>>    would think)
>> b) power off the slot
>> c) enable CTO mask
>> d) unconfigure the device via pciehp_unconfigure_device
>
> So I assume the existing sysfs slot "power" interface would do what
> you want except that nvme_remove() hangs?
>
> There might be some improvement to make in nvme_remove(); maybe it
> doesn't correctly detect I/O errors or something.

There is a path to disable the controller and that code ran but did
not help. I checked with the nvme folks and Keith mentioned that there
might be an issue with the nvme queue management. Unfortunately, we
can't try newer kernels in the field. So, looking for a way to just
"shut off the device" when we have scenarios like this where we can't
untangle the mess.

> But maybe there's *also* a case to be made for an interface like you
> suggest. Lukas, Hans, any reaction to this?
>
> Bjorn

I have a patch that I've tested out, assuming this approach makes
sense.

thanks
James
* Re: [External] : Re: sysfs interface to force power off
From: Bjorn Helgaas @ 2022-11-07 21:29 UTC
To: James Puthukattukaran; +Cc: Lukas Wunner, Hans de Goede, linux-pci

On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> ...
> I have a patch that I've tested out, assuming this approach makes
> sense.

Don't hesitate to post the patch. It's always easier to talk about
things when we can see the concrete details.
* Re: [External] : Re: sysfs interface to force power off
From: Keith Busch @ 2022-11-08 16:12 UTC
To: James Puthukattukaran
Cc: Bjorn Helgaas, Lukas Wunner, Hans de Goede, linux-pci

On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
>
> There is a path to disable the controller and that code ran but did
> not help. I checked with the nvme folks and Keith mentioned that there
> might be an issue with the nvme queue management. Unfortunately, we
> can't try newer kernels in the field. So, looking for a way to just
> "shut off the device" when we have scenarios like this where we can't
> untangle the mess.

Well, I didn't request you try new kernels in the field. I asked if you
could experiment with a newer one on a development machine to confirm if
the bug was fixed by some of the significant changes in this path so
that we could confirm a reason to port to stable. You're going to have
to change your kernel to fix this observation, so it would be worth the
effort to know if the changes being considered actually address the
problem.

If you're just looking for a work-around for this specific scenario,
sorry, I don't think we'll find one. You should just avoid this
scenario if you can't change your kernel.
* Re: [External] : Re: sysfs interface to force power off
From: Lukas Wunner @ 2022-11-08 20:16 UTC
To: Keith Busch
Cc: James Puthukattukaran, Bjorn Helgaas, Hans de Goede, linux-pci

On Tue, Nov 08, 2022 at 09:12:44AM -0700, Keith Busch wrote:
> On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> >
> > There is a path to disable the controller and that code ran but did
> > not help. I checked with the nvme folks and Keith mentioned that there
> > might be an issue with the nvme queue management. Unfortunately, we
> > can't try newer kernels in the field. So, looking for a way to just
> > "shut off the device" when we have scenarios like this where we can't
> > untangle the mess.
>
> Well, I didn't request you try new kernels in the field. I asked if you
> could experiment with a newer one on a development machine to confirm if
> the bug was fixed by some of the significant changes in this path so
> that we could confirm a reason to port to stable. You're going to have
> to change your kernel to fix this observation, so it would be worth the
> effort to know if the changes being considered actually address the
> problem.

Current mainline still contains this problematic sequence:

    nvme_reset_work()
      nvme_wait_freeze()
        blk_mq_freeze_queue_wait()

So I'm inclined to believe that the issue still persists, but I agree
that validating that hypothesis with a contemporary kernel should be
the first step.

I think nvme_reset_work() is overly optimistic that resetting the drive
succeeded. It just freezes and unfreezes the I/O queue without checking
for errors.

In particular, nvme_wait_freeze() should call the _timeout variant of
blk_mq_freeze_queue_wait() and cope with failure of freezing.

Thanks,

Lukas
* Re: [External] : Re: sysfs interface to force power off
From: Keith Busch @ 2022-11-08 20:37 UTC
To: Lukas Wunner
Cc: James Puthukattukaran, Bjorn Helgaas, Hans de Goede, linux-pci

On Tue, Nov 08, 2022 at 09:16:53PM +0100, Lukas Wunner wrote:
> On Tue, Nov 08, 2022 at 09:12:44AM -0700, Keith Busch wrote:
> > On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> > >
> > > There is a path to disable the controller and that code ran but did
> > > not help. I checked with the nvme folks and Keith mentioned that there
> > > might be an issue with the nvme queue management. Unfortunately, we
> > > can't try newer kernels in the field. So, looking for a way to just
> > > "shut off the device" when we have scenarios like this where we can't
> > > untangle the mess.
> >
> > Well, I didn't request you try new kernels in the field. I asked if you
> > could experiment with a newer one on a development machine to confirm if
> > the bug was fixed by some of the significant changes in this path so
> > that we could confirm a reason to port to stable. You're going to have
> > to change your kernel to fix this observation, so it would be worth the
> > effort to know if the changes being considered actually address the
> > problem.
>
> Current mainline still contains this problematic sequence:
>
>     nvme_reset_work()
>       nvme_wait_freeze()
>         blk_mq_freeze_queue_wait()
>
> So I'm inclined to believe that the issue still persists, but I agree

Yeah, that sequence exists, but there are some subtle changes with how
the workqueues account for unquiescing hardware queues that can affect
how a freeze can make forward progress.

> I think nvme_reset_work() is overly optimistic that resetting the drive
> succeeded. It just freezes and unfreezes the I/O queue without checking
> for errors.

I'm not sure what you mean. An nvme reset is a CC.EN 0->1 transition,
and we definitely confirm that succeeds. If you're referring to the
1->0 transition, that has to happen after the initial freeze/quiesce
steps, but whether or not that succeeds shouldn't be relevant to the
rest of the sequence: we're about to disable the device at the PCI
level.

> In particular, nvme_wait_freeze() should call the _timeout variant of
> blk_mq_freeze_queue_wait() and cope with failure of freezing.

That would indicate we have a mismatched freeze depth or an unbalanced
quiesce problem, so the timeout freeze would just mask the underlying
issue.
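
The "unbalanced quiesce" concern can be pictured with a toy model. The
sketch below is a deliberately simplified assumption about how
quiescing and freezing interact, not the kernel's implementation:
struct toy_q and the toy_*() helpers are hypothetical, and the model
only captures the idea that requests which have entered a quiesced
queue cannot be dispatched, so a freeze that waits for them cannot
finish until every quiesce has been matched by an unquiesce.

```c
#include <stdbool.h>

/* Toy model only: not the kernel's struct request_queue. */
struct toy_q {
	int quiesce_depth;	/* > 0 means dispatch is stopped */
	int entered;		/* requests entered but not yet completed */
};

void toy_quiesce(struct toy_q *q)
{
	q->quiesce_depth++;
}

void toy_unquiesce(struct toy_q *q)
{
	if (q->quiesce_depth > 0)
		q->quiesce_depth--;
}

void toy_submit(struct toy_q *q)
{
	q->entered++;
}

/*
 * Dispatch and complete everything that has entered the queue, unless
 * the queue is still quiesced. Returns the number of requests completed.
 */
int toy_run_queue(struct toy_q *q)
{
	int done;

	if (q->quiesce_depth > 0)
		return 0;

	done = q->entered;
	q->entered = 0;
	return done;
}

/* In this model a freeze finishes only when nothing entered remains. */
bool toy_freeze_done(const struct toy_q *q)
{
	return q->entered == 0;
}
```

With two toy_quiesce() calls and only one toy_unquiesce(), requests
stay stuck and toy_freeze_done() remains false no matter how long the
waiter sleeps. A timeout on the wait would end the sleep without
correcting the missing unquiesce, which is the sense in which it could
mask the underlying problem.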
* Re: sysfs interface to force power off
From: Lukas Wunner @ 2022-11-08 9:53 UTC
To: Bjorn Helgaas; +Cc: James Puthukattukaran, Hans de Goede, linux-pci

On Mon, Nov 07, 2022 at 02:41:29PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 04, 2022 at 07:08:34PM -0400, James Puthukattukaran wrote:
> > Looking to solve a problem where we have nvme drives that are hung
> > in the field and we are not sure of the root cause but the working
> > theory is that the controller is "bad" and not responding properly
> > to commands. The nvme driver times out on outstanding IO requests
> > and as part of recovery, attempts to reset the controller and
> > reinitialize the device. The reset controller also hangs like here
> > --
> >
> > [...]
> >
> > So, I tried to hot power off the device via
> > "echo 0 > /sys/bus/pci/slots/X/power" -- the thread also hangs
> > waiting for the nvme reset thread to finish (like so) --
>
> Looks like this "power" sysfs file could use some documentation. I
> couldn't find anything in Documentation/ABI/testing/ that seems to
> cover it.

That sysfs attribute was introduced in early 2002; I guess we were less
diligent with documentation back then:

http://git.kernel.org/tglx/history/c/a8a2069f432c
(search for power_write_file() in the commit)

The problem here is in the NVMe / block layer, not the PCI layer.

nvme_wait_freeze() calls blk_mq_freeze_queue_wait(), but obviously it
should call blk_mq_freeze_queue_wait_timeout() instead and handle a
timeout by retiring any outstanding I/O requests to the drive and
marking it as dead.

> > [...]
> >
> > Is there a way to force power off the device instead of the
> > "graceful" approach? Obviously, we don't want to reset the system
> > and don't have physical access to the device.
> >
> > Would it make sense to create a "force power off" in
> > /sys/bus/pci/slots/X which basically

The power attribute in sysfs already does what you want, but when
unbinding the nvme driver from the device, the flush_work() call waits
for nvme_reset_work() to finish. And because that's stuck, unbinding
also gets stuck.

Again, the solution is a code fix in the NVMe / block layer, so the
proper mailing lists to ask would be linux-nvme and linux-block.

Thanks,

Lukas