linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Kernel hangs when powering up/down drive using sysfs
@ 2020-03-14 14:19 Hoyer, David
  2020-03-16 16:15 ` Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Hoyer, David @ 2020-03-14 14:19 UTC (permalink / raw)
  To: linux-pci

We have come across what we believe to be a kernel bug (using 4.19.98-1 kernel in Debian Buster).

We are seeing an issue with the latest kernel when it comes to powering up/down drives.    If you use /sys/bus/pci/slots/<N>/power to power up/down a drive, the command is hanging with this latest version of the kernel.   We have found that applying the following change appears to fix it.   We are not finding any bug reports upstream yet.

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index c3e3f53..c4d230a 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
        events = atomic_xchg(&ctrl->pending_events, 0);
        if (!events) {
                pci_config_pm_runtime_put(pdev);
+               ctrl->ist_running = false;
+               wake_up(&ctrl->requester);
                return IRQ_NONE;
       }

We believe the reason we see this (and not the others who introduced it) are because of our timings on our PLX-based switch.  Since we fan things out over I2C transactions instead of direct GPIOs, there are different timings than most folks would see.

We've instrumented the code and we do see that pciehp_ist() runs twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.  We believe that is due to the timing differences.  Adding debug in here changes the timings enough that the hang goes away, so we are having troubles proving this 100% at the moment.  But just based on code inspection, if pciehp_ist() exits with the IRQ_NONE case, then nothing will ever set ist_running=false until a subsequent hotplug event happens that causes the IRQ_HANDLED case to run.  (We were able to prove that will cause things to "unhang" and progress at that point - if you're hung and you remove a drive, the slot status change will then unstick things.)

So we currently believe the problem is:
pciehp_sysfs_enable/disable_slot() each have a wait_event() that checks for ctrl->pending_events and ctrl->ist_running to both be false.  We run through pciehp_ist() the first time, return with IRQ_HANDLED, pending_events==0, ist_running==0.  While pciehp_sysfs_enable/disable_slot() are waking up, pciehp_ist() runs a 2nd time, setting ist_running==1, (pending_events is still 0) and returning IRQ_NONE.  Now that ist_running==1, the original wait_event() goes back to waiting and will not be woken up until something causes pciehp_ist() to run again with a pending_events!=0 that will result in returning IRQ_HANDLED.

Setting ist_running=0 and issuing a wake_up() for the IRQ_NONE return case in pciehp_ist() seems to fix this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-14 14:19 Kernel hangs when powering up/down drive using sysfs Hoyer, David
@ 2020-03-16 16:15 ` Keith Busch
  2020-03-16 18:10   ` Lukas Wunner
  2020-03-16 18:19 ` Lukas Wunner
  2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
  2 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2020-03-16 16:15 UTC (permalink / raw)
  To: Hoyer, David; +Cc: linux-pci

On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote:
> We have come across what we believe to be a kernel bug (using 4.19.98-1 kernel in Debian Buster).
> 
> We are seeing an issue with the latest kernel when it comes to powering up/down drives.    If you use /sys/bus/pci/slots/<N>/power to power up/down a drive, the command is hanging with this latest version of the kernel.   We have found that applying the following change appears to fix it.   We are not finding any bug reports upstream yet.
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index c3e3f53..c4d230a 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>         events = atomic_xchg(&ctrl->pending_events, 0);
>         if (!events) {
>                 pci_config_pm_runtime_put(pdev);
> +               ctrl->ist_running = false;
> +               wake_up(&ctrl->requester);
>                 return IRQ_NONE;
>        }
> 
> We believe the reason we see this (and not the others who introduced it) are because of our timings on our PLX-based switch.  Since we fan things out over I2C transactions instead of direct GPIOs, there are different timings than most folks would see.
> 
> We've instrumented the code and we do see that pciehp_ist() runs twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.  We believe that is due to the timing differences.  Adding debug in here changes the timings enough that the hang goes away, so we are having troubles proving this 100% at the moment.  But just based on code inspection, if pciehp_ist() exits with the IRQ_NONE case, then nothing will ever set ist_running=false until a subsequent hotplug event happens that causes the IRQ_HANDLED case to run.  (We were able to prove that will cause things to "unhang" and progress at that point - if you're hung and you remove a drive, the slot status change will then unstick things.)
> 
> So we currently believe the problem is:
> pciehp_sysfs_enable/disable_slot() each have a wait_event() that checks for ctrl->pending_events and ctrl->ist_running to both be false.  We run through pciehp_ist() the first time, return with IRQ_HANDLED, pending_events==0, ist_running==0.  While pciehp_sysfs_enable/disable_slot() are waking up, pciehp_ist() runs a 2nd time, setting ist_running==1, (pending_events is still 0) and returning IRQ_NONE.  Now that ist_running==1, the original wait_event() goes back to waiting and will not be woken up until something causes pciehp_ist() to run again with a pending_events!=0 that will result in returning IRQ_HANDLED.
> 
> Setting ist_running=0 and issuing a wake_up() for the IRQ_NONE return case in pciehp_ist() seems to fix this.

I follow what you're saying.

I'm not sure why the hard-irq context is even setting the thread running
flag while it can still exit without handling anything. Shouldn't it leave
the flag cleared until knows it's actually going to do something?

---
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8a2cb1764386..adf13657b3d2 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -618,7 +618,6 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	irqreturn_t ret;
 	u32 events;
 
-	ctrl->ist_running = true;
 	pci_config_pm_runtime_get(pdev);
 
 	/* rerun pciehp_isr() if the port was inaccessible on interrupt */
@@ -638,6 +637,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		return IRQ_NONE;
 	}
 
+	ctrl->ist_running = true;
 	/* Check Attention Button Pressed */
 	if (events & PCI_EXP_SLTSTA_ABP) {
 		ctrl_info(ctrl, "Slot(%s): Attention button pressed\n",
--

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 16:15 ` Keith Busch
@ 2020-03-16 18:10   ` Lukas Wunner
  2020-03-16 18:42     ` Keith Busch
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Wunner @ 2020-03-16 18:10 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hoyer, David, linux-pci

On Mon, Mar 16, 2020 at 09:15:43AM -0700, Keith Busch wrote:
> I'm not sure why the hard-irq context is even setting the thread running
> flag while it can still exit without handling anything. Shouldn't it leave
> the flag cleared until knows it's actually going to do something?

No, ist_running must be set to true before the invocation of
atomic_xchg(&ctrl->pending_events, 0).

There's a time window between the atomic_xchg() and actually
turning off the slot when pending_events is 0.  Previously we
only checked in the sysfs functions that pending_events is 0.
That was insufficient as we risked returning prematurely from
the sysfs functions.  The point of ist_running is to prevent
that.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-14 14:19 Kernel hangs when powering up/down drive using sysfs Hoyer, David
  2020-03-16 16:15 ` Keith Busch
@ 2020-03-16 18:19 ` Lukas Wunner
  2020-03-16 18:25   ` Hoyer, David
  2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
  2 siblings, 1 reply; 13+ messages in thread
From: Lukas Wunner @ 2020-03-16 18:19 UTC (permalink / raw)
  To: Hoyer, David; +Cc: linux-pci, Keith Busch

On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote:
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>         events = atomic_xchg(&ctrl->pending_events, 0);
>         if (!events) {
>                 pci_config_pm_runtime_put(pdev);
> +               ctrl->ist_running = false;
> +               wake_up(&ctrl->requester);
>                 return IRQ_NONE;
>        }

Thanks David for the report and sorry for the breakage.

The above LGTM, please submit it as a proper patch and
feel free to add my Reviewed-by.  Please add the same
two lines before the "return ret" a little further up
in the function.

If it's too cumbersome for you to submit a proper patch
I can do it for you.


> We've instrumented the code and we do see that pciehp_ist() runs
> twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.
> We believe that is due to the timing differences.  Adding debug in
> here changes the timings enough that the hang goes away, so we are
> having troubles proving this 100% at the moment.  But just based on
> code inspection, if pciehp_ist() exits with the IRQ_NONE case, then
> nothing will ever set ist_running=false until a subsequent hotplug
> event happens that causes the IRQ_HANDLED case to run.  (We were
> able to prove that will cause things to "unhang" and progress at
> that point - if you're hung and you remove a drive, the slot status
> change will then unstick things.)

The question is, why is pciehp_ist() run once more.  Most likely
because another event is signaled from the slot.  Try adding a
printk() at the top of pciehp_ist() which emits ctrl->pending_events
to understand what's going on.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 18:19 ` Lukas Wunner
@ 2020-03-16 18:25   ` Hoyer, David
  2020-03-16 21:35     ` Hoyer, David
  2020-03-18 11:49     ` Lukas Wunner
  0 siblings, 2 replies; 13+ messages in thread
From: Hoyer, David @ 2020-03-16 18:25 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, Keith Busch

We were not sure about the return just a few lines up so we did not add the 2 lines.
I will try what you suggested to better understand why we are getting the extra interrupt.

I am not as familiar with submitting a "proper patch" and ask that you do it if you would be so kind.

-----Original Message-----
From: Lukas Wunner <lukas@wunner.de> 
Sent: Monday, March 16, 2020 1:20 PM
To: Hoyer, David <David.Hoyer@netapp.com>
Cc: linux-pci@vger.kernel.org; Keith Busch <kbusch@kernel.org>
Subject: Re: Kernel hangs when powering up/down drive using sysfs

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote:
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>         events = atomic_xchg(&ctrl->pending_events, 0);
>         if (!events) {
>                 pci_config_pm_runtime_put(pdev);
> +               ctrl->ist_running = false;
> +               wake_up(&ctrl->requester);
>                 return IRQ_NONE;
>        }

Thanks David for the report and sorry for the breakage.

The above LGTM, please submit it as a proper patch and feel free to add my Reviewed-by.  Please add the same two lines before the "return ret" a little further up in the function.

If it's too cumbersome for you to submit a proper patch I can do it for you.


> We've instrumented the code and we do see that pciehp_ist() runs 
> twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.
> We believe that is due to the timing differences.  Adding debug in 
> here changes the timings enough that the hang goes away, so we are 
> having troubles proving this 100% at the moment.  But just based on 
> code inspection, if pciehp_ist() exits with the IRQ_NONE case, then 
> nothing will ever set ist_running=false until a subsequent hotplug 
> event happens that causes the IRQ_HANDLED case to run.  (We were able 
> to prove that will cause things to "unhang" and progress at that point 
> - if you're hung and you remove a drive, the slot status change will 
> then unstick things.)

The question is, why is pciehp_ist() run once more.  Most likely because another event is signaled from the slot.  Try adding a
printk() at the top of pciehp_ist() which emits ctrl->pending_events to understand what's going on.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 18:10   ` Lukas Wunner
@ 2020-03-16 18:42     ` Keith Busch
  2020-03-18 11:53       ` Lukas Wunner
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2020-03-16 18:42 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Hoyer, David, linux-pci

On Mon, Mar 16, 2020 at 07:10:58PM +0100, Lukas Wunner wrote:
> On Mon, Mar 16, 2020 at 09:15:43AM -0700, Keith Busch wrote:
> > I'm not sure why the hard-irq context is even setting the thread running
> > flag while it can still exit without handling anything. Shouldn't it leave
> > the flag cleared until knows it's actually going to do something?
> 
> No, ist_running must be set to true before the invocation of
> atomic_xchg(&ctrl->pending_events, 0).

Okay, I see what you mean.

Even with David's patch, there's still another condition that could exit
with ist_running set. It may make sense to move the setting just above
the atomic_xchg() so that clearing it doesn't need to be duplicated for
the remaining uncommon exit case.
 
> There's a time window between the atomic_xchg() and actually
> turning off the slot when pending_events is 0.  Previously we
> only checked in the sysfs functions that pending_events is 0.
> That was insufficient as we risked returning prematurely from
> the sysfs functions.  The point of ist_running is to prevent
> that.

Oh, right, I'm remembering now. And since we've a polled option for
pciehp, using synchronize_irq() from the sysfs path isn't possible.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 18:25   ` Hoyer, David
@ 2020-03-16 21:35     ` Hoyer, David
  2020-03-18 11:49     ` Lukas Wunner
  1 sibling, 0 replies; 13+ messages in thread
From: Hoyer, David @ 2020-03-16 21:35 UTC (permalink / raw)
  To: Hoyer, David, Lukas Wunner; +Cc: linux-pci, Keith Busch

I ran the suggested experiment.   The first interrupt is reporting non-zero pending event (either 0x08 or 0x1000 depending on power up or power down).   The second interrupt is always zero.   So it sounds like we are getting an interrupt indicating work complete.

Power up:
Mar 16 21:10:15 eos-a kernel: pending events x8
Mar 16 21:10:15 eos-a kernel: pciehp 0000:66:09.0:pcie204: Slot(9): Card present
Mar 16 21:10:15 eos-a kernel: pci 0000:6f:00.0: Max Payload Size set to 256 (was 128, max 256)
Mar 16 21:10:15 eos-a kernel: iommu: Adding device 0000:6f:00.0 to group 70
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0: BAR 13: no space for [io  size 0x1000]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0: BAR 13: failed to assign [io  size 0x1000]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0: BAR 13: no space for [io  size 0x1000]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0: BAR 13: failed to assign [io  size 0x1000]
Mar 16 21:10:15 eos-a kernel: pci 0000:6f:00.0: BAR 0: assigned [mem 0xe0b00000-0xe0b03fff 64bit]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0: PCI bridge to [bus 6f]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0:   bridge window [mem 0xe0b00000-0xe0bfffff]
Mar 16 21:10:15 eos-a kernel: pcieport 0000:66:09.0:   bridge window [mem 0x3ac00c000000-0x3ac00dffffff 64bit pref]
Mar 16 21:10:15 eos-a kernel: pending events x0
Mar 16 21:10:16 eos-a kernel: vfio-pci 0000:6f:00.0: enabling device (0100 -> 0102)
Mar 16 21:10:16 eos-a kernel: pcieport 0000:64:00.0: can't derive routing for PCI INT A
Mar 16 21:10:16 eos-a kernel: vfio-pci 0000:6f:00.0: PCI INT A: not connected
Mar 16 21:10:16 eos-a kernel: vfio_ecap_init: 0000:6f:00.0 hiding ecap 0x19@0x178

Power down:
Mar 16 21:10:47 eos-a kernel: pending events x10000
Mar 16 21:10:47 eos-a kernel: iommu: Removing device 0000:6f:00.0 from group 70
Mar 16 21:10:48 eos-a kernel: pending events x0

-----Original Message-----
From: linux-pci-owner@vger.kernel.org <linux-pci-owner@vger.kernel.org> On Behalf Of Hoyer, David
Sent: Monday, March 16, 2020 1:26 PM
To: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org; Keith Busch <kbusch@kernel.org>
Subject: RE: Kernel hangs when powering up/down drive using sysfs

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




We were not sure about the return just a few lines up so we did not add the 2 lines.
I will try what you suggested to better understand why we are getting the extra interrupt.

I am not as familiar with submitting a "proper patch" and ask that you do it if you would be so kind.

-----Original Message-----
From: Lukas Wunner <lukas@wunner.de>
Sent: Monday, March 16, 2020 1:20 PM
To: Hoyer, David <David.Hoyer@netapp.com>
Cc: linux-pci@vger.kernel.org; Keith Busch <kbusch@kernel.org>
Subject: Re: Kernel hangs when powering up/down drive using sysfs

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




On Sat, Mar 14, 2020 at 02:19:44PM +0000, Hoyer, David wrote:
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -637,6 +637,8 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>         events = atomic_xchg(&ctrl->pending_events, 0);
>         if (!events) {
>                 pci_config_pm_runtime_put(pdev);
> +               ctrl->ist_running = false;
> +               wake_up(&ctrl->requester);
>                 return IRQ_NONE;
>        }

Thanks David for the report and sorry for the breakage.

The above LGTM, please submit it as a proper patch and feel free to add my Reviewed-by.  Please add the same two lines before the "return ret" a little further up in the function.

If it's too cumbersome for you to submit a proper patch I can do it for you.


> We've instrumented the code and we do see that pciehp_ist() runs 
> twice, once exiting with IRQ_HANDLED and then again with IRQ_NONE.
> We believe that is due to the timing differences.  Adding debug in 
> here changes the timings enough that the hang goes away, so we are 
> having troubles proving this 100% at the moment.  But just based on 
> code inspection, if pciehp_ist() exits with the IRQ_NONE case, then 
> nothing will ever set ist_running=false until a subsequent hotplug 
> event happens that causes the IRQ_HANDLED case to run.  (We were able 
> to prove that will cause things to "unhang" and progress at that point
> - if you're hung and you remove a drive, the slot status change will 
> then unstick things.)

The question is, why is pciehp_ist() run once more.  Most likely because another event is signaled from the slot.  Try adding a
printk() at the top of pciehp_ist() which emits ctrl->pending_events to understand what's going on.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests
  2020-03-14 14:19 Kernel hangs when powering up/down drive using sysfs Hoyer, David
  2020-03-16 16:15 ` Keith Busch
  2020-03-16 18:19 ` Lukas Wunner
@ 2020-03-18 11:33 ` Lukas Wunner
  2020-03-18 16:43   ` Keith Busch
  2020-03-28 20:25   ` Bjorn Helgaas
  2 siblings, 2 replies; 13+ messages in thread
From: Lukas Wunner @ 2020-03-18 11:33 UTC (permalink / raw)
  To: Bjorn Helgaas, David Hoyer
  Cc: Keith Busch, linux-pci, Xiongfeng Wang, Sergey Miroshnichenko

David Hoyer reports that powering pciehp slots up or down via sysfs may
hang:  The call to wait_event() in pciehp_sysfs_enable_slot() and
_disable_slot() does not return because ctrl->ist_running remains true.

This flag, which was introduced by commit 54ecb8f7028c ("PCI: pciehp:
Avoid returning prematurely from sysfs requests"), signifies that the
IRQ thread pciehp_ist() is running.  It is set to true at the top of
pciehp_ist() and reset to false at the end.  However there are two
additional return statements in pciehp_ist() before which the commit
neglected to reset the flag to false and wake up waiters for the flag.

That omission opens up the following race when powering up the slot:

* pciehp_ist() runs because a PCI_EXP_SLTSTA_PDC event was requested
  by pciehp_sysfs_enable_slot()

* pciehp_ist() turns on slot power via the following call stack:
  pciehp_handle_presence_or_link_change() -> pciehp_enable_slot() ->
  __pciehp_enable_slot() -> board_added() -> pciehp_power_on_slot()

* after slot power is turned on, the link comes up, resulting in a
  PCI_EXP_SLTSTA_DLLSC event

* the IRQ handler pciehp_isr() stores the event in ctrl->pending_events
  and returns IRQ_WAKE_THREAD

* the IRQ thread is already woken (it's bringing up the slot), but the
  genirq code remembers to re-run the IRQ thread after it has finished
  (such that it can deal with the new event) by setting IRQTF_RUNTHREAD
  via __handle_irq_event_percpu() -> __irq_wake_thread()

* the IRQ thread removes PCI_EXP_SLTSTA_DLLSC from ctrl->pending_events
  via board_added() -> pciehp_check_link_status() in order to deal with
  presence and link flaps per commit 6c35a1ac3da6 ("PCI: pciehp:
  Tolerate initially unstable link")

* after pciehp_ist() has successfully brought up the slot, it resets
  ctrl->ist_running to false and wakes up the sysfs requester

* the genirq code re-runs pciehp_ist(), which sets ctrl->ist_running
  to true but then returns with IRQ_NONE because ctrl->pending_events
  is empty

* pciehp_sysfs_enable_slot() is finally woken but notices that
  ctrl->ist_running is true, hence continues waiting

The only way to get the hung task going again is to trigger a hotplug
event which brings down the slot, e.g. by yanking out the card.

The same race exists when powering down the slot because remove_board()
likewise clears link or presence changes in ctrl->pending_events per
commit 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes after
powering off a slot") and thereby may cause a re-run of pciehp_ist()
which returns with IRQ_NONE without resetting ctrl->ist_running to false.

Fix by adding a goto label before the teardown steps at the end of
pciehp_ist() and jumping to that label from the two return statements
which currently neglect to reset the ctrl->ist_running flag.

Fixes: 54ecb8f7028c ("PCI: pciehp: Avoid returning prematurely from sysfs requests")
Reported-by: David Hoyer <David.Hoyer@netapp.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.19+
Cc: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/hotplug/pciehp_hpc.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index e4627c68b30f..5f1a27bfcb19 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -663,17 +663,15 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	if (atomic_fetch_and(~RERUN_ISR, &ctrl->pending_events) & RERUN_ISR) {
 		ret = pciehp_isr(irq, dev_id);
 		enable_irq(irq);
-		if (ret != IRQ_WAKE_THREAD) {
-			pci_config_pm_runtime_put(pdev);
-			return ret;
-		}
+		if (ret != IRQ_WAKE_THREAD)
+			goto out;
 	}
 
 	synchronize_hardirq(irq);
 	events = atomic_xchg(&ctrl->pending_events, 0);
 	if (!events) {
-		pci_config_pm_runtime_put(pdev);
-		return IRQ_NONE;
+		ret = IRQ_NONE;
+		goto out;
 	}
 
 	/* Check Attention Button Pressed */
@@ -702,10 +700,12 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_handle_presence_or_link_change(ctrl, events);
 	up_read(&ctrl->reset_lock);
 
+	ret = IRQ_HANDLED;
+out:
 	pci_config_pm_runtime_put(pdev);
 	ctrl->ist_running = false;
 	wake_up(&ctrl->requester);
-	return IRQ_HANDLED;
+	return ret;
 }
 
 static int pciehp_poll(void *data)
-- 
2.25.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 18:25   ` Hoyer, David
  2020-03-16 21:35     ` Hoyer, David
@ 2020-03-18 11:49     ` Lukas Wunner
  2020-03-18 14:06       ` Hoyer, David
  1 sibling, 1 reply; 13+ messages in thread
From: Lukas Wunner @ 2020-03-18 11:49 UTC (permalink / raw)
  To: Hoyer, David; +Cc: linux-pci, Keith Busch

On Mon, Mar 16, 2020 at 06:25:53PM +0000, Hoyer, David wrote:
> I am not as familiar with submitting a "proper patch" and ask that you
> do it if you would be so kind.

I've just submitted a patch to address this issue.  Does it work for you?

The term "proper patch" is just kernel slang for a patch which can be
applied with "git am" by the PCI maintainer, Bjorn Helgaas.

Such a patch can be generated with "git format-patch" and sent out
with "msmtp" or "git send-email".

There are some formal requirements which the patch needs to satisfy,
they are listed here:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html
https://www.kernel.org/doc/html/latest/process/development-process.html
https://marc.info/?l=linux-pci&m=150905742808166

Never mind all these details if you do not intend to submit patches
yourself.  In this case, I introduced this embarrassing mistake, so
it's my job to provide a solution if you don't want to.

Thanks for all the debugging work you have done and once again sorry
for the breakage.

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Kernel hangs when powering up/down drive using sysfs
  2020-03-16 18:42     ` Keith Busch
@ 2020-03-18 11:53       ` Lukas Wunner
  0 siblings, 0 replies; 13+ messages in thread
From: Lukas Wunner @ 2020-03-18 11:53 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hoyer, David, linux-pci

On Mon, Mar 16, 2020 at 11:42:00AM -0700, Keith Busch wrote:
> On Mon, Mar 16, 2020 at 07:10:58PM +0100, Lukas Wunner wrote:
> > On Mon, Mar 16, 2020 at 09:15:43AM -0700, Keith Busch wrote:
> > > I'm not sure why the hard-irq context is even setting the thread running
> > > flag while it can still exit without handling anything. Shouldn't it
> > > leave the flag cleared until knows it's actually going to do something?
> > 
> > No, ist_running must be set to true before the invocation of
> > atomic_xchg(&ctrl->pending_events, 0).
> 
> Okay, I see what you mean.
> 
> Even with David's patch, there's still another condition that could exit
> with ist_running set. It may make sense to move the setting just above
> the atomic_xchg() so that clearing it doesn't need to be duplicated for
> the remaining uncommon exit case.

Right, or introduce a goto label before the teardown code at the end of
pciehp_ist() and replace the return statements with gotos.  I've compared
both approaches and the goto solution required the least lines of code
and subjectively looked the nicest, so I went with that.


> > There's a time window between the atomic_xchg() and actually
> > turning off the slot when pending_events is 0.  Previously we
> > only checked in the sysfs functions that pending_events is 0.
> > That was insufficient as we risked returning prematurely from
> > the sysfs functions.  The point of ist_running is to prevent
> > that.
> 
> Oh, right, I'm remembering now. And since we've a polled option for
> pciehp, using synchronize_irq() from the sysfs path isn't possible.

Precisely. :-)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Kernel hangs when powering up/down drive using sysfs
  2020-03-18 11:49     ` Lukas Wunner
@ 2020-03-18 14:06       ` Hoyer, David
  0 siblings, 0 replies; 13+ messages in thread
From: Hoyer, David @ 2020-03-18 14:06 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, Keith Busch

Thank you for the information and for submitting the patch.   I will review this so that I can do this better in the future.   I appreciate all the help!

-----Original Message-----
From: Lukas Wunner <lukas@wunner.de> 
Sent: Wednesday, March 18, 2020 6:50 AM
To: Hoyer, David <David.Hoyer@netapp.com>
Cc: linux-pci@vger.kernel.org; Keith Busch <kbusch@kernel.org>
Subject: Re: Kernel hangs when powering up/down drive using sysfs

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




On Mon, Mar 16, 2020 at 06:25:53PM +0000, Hoyer, David wrote:
> I am not as familiar with submitting a "proper patch" and ask that you 
> do it if you would be so kind.

I've just submitted a patch to address this issue.  Does it work for you?

The term "proper patch" is just kernel slang for a patch which can be applied with "git am" by the PCI maintainer, Bjorn Helgaas.

Such a patch can be generated with "git format-patch" and sent out with "msmtp" or "git send-email".

There are some formal requirements which the patch needs to satisfy, they are listed here:

https://www.kernel.org/doc/html/latest/process/submitting-patches.html
https://www.kernel.org/doc/html/latest/process/development-process.html
https://marc.info/?l=linux-pci&m=150905742808166

Never mind all these details if you do not intend to submit patches yourself.  In this case, I introduced this embarrassing mistake, so it's my job to provide a solution if you don't want to.

Thanks for all the debugging work you have done and once again sorry for the breakage.

Lukas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests
  2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
@ 2020-03-18 16:43   ` Keith Busch
  2020-03-28 20:25   ` Bjorn Helgaas
  1 sibling, 0 replies; 13+ messages in thread
From: Keith Busch @ 2020-03-18 16:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, David Hoyer, linux-pci, Xiongfeng Wang,
	Sergey Miroshnichenko

On Wed, Mar 18, 2020 at 12:33:12PM +0100, Lukas Wunner wrote:
> Fixes: 54ecb8f7028c ("PCI: pciehp: Avoid returning prematurely from sysfs requests")
> Reported-by: David Hoyer <David.Hoyer@netapp.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v4.19+
> Cc: Keith Busch <kbusch@kernel.org>

Looks good to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>

>  drivers/pci/hotplug/pciehp_hpc.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index e4627c68b30f..5f1a27bfcb19 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -663,17 +663,15 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  	if (atomic_fetch_and(~RERUN_ISR, &ctrl->pending_events) & RERUN_ISR) {
>  		ret = pciehp_isr(irq, dev_id);
>  		enable_irq(irq);
> -		if (ret != IRQ_WAKE_THREAD) {
> -			pci_config_pm_runtime_put(pdev);
> -			return ret;
> -		}
> +		if (ret != IRQ_WAKE_THREAD)
> +			goto out;
>  	}
>  
>  	synchronize_hardirq(irq);
>  	events = atomic_xchg(&ctrl->pending_events, 0);
>  	if (!events) {
> -		pci_config_pm_runtime_put(pdev);
> -		return IRQ_NONE;
> +		ret = IRQ_NONE;
> +		goto out;
>  	}
>  
>  	/* Check Attention Button Pressed */
> @@ -702,10 +700,12 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  		pciehp_handle_presence_or_link_change(ctrl, events);
>  	up_read(&ctrl->reset_lock);
>  
> +	ret = IRQ_HANDLED;
> +out:
>  	pci_config_pm_runtime_put(pdev);
>  	ctrl->ist_running = false;
>  	wake_up(&ctrl->requester);
> -	return IRQ_HANDLED;
> +	return ret;
>  }
>  
>  static int pciehp_poll(void *data)
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests
  2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
  2020-03-18 16:43   ` Keith Busch
@ 2020-03-28 20:25   ` Bjorn Helgaas
  1 sibling, 0 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2020-03-28 20:25 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David Hoyer, Keith Busch, linux-pci, Xiongfeng Wang,
	Sergey Miroshnichenko

On Wed, Mar 18, 2020 at 12:33:12PM +0100, Lukas Wunner wrote:
> David Hoyer reports that powering pciehp slots up or down via sysfs may
> hang:  The call to wait_event() in pciehp_sysfs_enable_slot() and
> _disable_slot() does not return because ctrl->ist_running remains true.
> 
> This flag, which was introduced by commit 54ecb8f7028c ("PCI: pciehp:
> Avoid returning prematurely from sysfs requests"), signifies that the
> IRQ thread pciehp_ist() is running.  It is set to true at the top of
> pciehp_ist() and reset to false at the end.  However there are two
> additional return statements in pciehp_ist() before which the commit
> neglected to reset the flag to false and wake up waiters for the flag.
> 
> That omission opens up the following race when powering up the slot:
> 
> * pciehp_ist() runs because a PCI_EXP_SLTSTA_PDC event was requested
>   by pciehp_sysfs_enable_slot()
> 
> * pciehp_ist() turns on slot power via the following call stack:
>   pciehp_handle_presence_or_link_change() -> pciehp_enable_slot() ->
>   __pciehp_enable_slot() -> board_added() -> pciehp_power_on_slot()
> 
> * after slot power is turned on, the link comes up, resulting in a
>   PCI_EXP_SLTSTA_DLLSC event
> 
> * the IRQ handler pciehp_isr() stores the event in ctrl->pending_events
>   and returns IRQ_WAKE_THREAD
> 
> * the IRQ thread is already woken (it's bringing up the slot), but the
>   genirq code remembers to re-run the IRQ thread after it has finished
>   (such that it can deal with the new event) by setting IRQTF_RUNTHREAD
>   via __handle_irq_event_percpu() -> __irq_wake_thread()
> 
> * the IRQ thread removes PCI_EXP_SLTSTA_DLLSC from ctrl->pending_events
>   via board_added() -> pciehp_check_link_status() in order to deal with
>   presence and link flaps per commit 6c35a1ac3da6 ("PCI: pciehp:
>   Tolerate initially unstable link")
> 
> * after pciehp_ist() has successfully brought up the slot, it resets
>   ctrl->ist_running to false and wakes up the sysfs requester
> 
> * the genirq code re-runs pciehp_ist(), which sets ctrl->ist_running
>   to true but then returns with IRQ_NONE because ctrl->pending_events
>   is empty
> 
> * pciehp_sysfs_enable_slot() is finally woken but notices that
>   ctrl->ist_running is true, hence continues waiting
> 
> The only way to get the hung task going again is to trigger a hotplug
> event which brings down the slot, e.g. by yanking out the card.
> 
> The same race exists when powering down the slot because remove_board()
> likewise clears link or presence changes in ctrl->pending_events per
> commit 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes after
> powering off a slot") and thereby may cause a re-run of pciehp_ist()
> which returns with IRQ_NONE without resetting ctrl->ist_running to false.
> 
> Fix by adding a goto label before the teardown steps at the end of
> pciehp_ist() and jumping to that label from the two return statements
> which currently neglect to reset the ctrl->ist_running flag.
> 
> Fixes: 54ecb8f7028c ("PCI: pciehp: Avoid returning prematurely from sysfs requests")
> Reported-by: David Hoyer <David.Hoyer@netapp.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v4.19+
> Cc: Keith Busch <kbusch@kernel.org>

Applied to pci/hotplug for v5.7, thanks!

> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index e4627c68b30f..5f1a27bfcb19 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -663,17 +663,15 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  	if (atomic_fetch_and(~RERUN_ISR, &ctrl->pending_events) & RERUN_ISR) {
>  		ret = pciehp_isr(irq, dev_id);
>  		enable_irq(irq);
> -		if (ret != IRQ_WAKE_THREAD) {
> -			pci_config_pm_runtime_put(pdev);
> -			return ret;
> -		}
> +		if (ret != IRQ_WAKE_THREAD)
> +			goto out;
>  	}
>  
>  	synchronize_hardirq(irq);
>  	events = atomic_xchg(&ctrl->pending_events, 0);
>  	if (!events) {
> -		pci_config_pm_runtime_put(pdev);
> -		return IRQ_NONE;
> +		ret = IRQ_NONE;
> +		goto out;
>  	}
>  
>  	/* Check Attention Button Pressed */
> @@ -702,10 +700,12 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  		pciehp_handle_presence_or_link_change(ctrl, events);
>  	up_read(&ctrl->reset_lock);
>  
> +	ret = IRQ_HANDLED;
> +out:
>  	pci_config_pm_runtime_put(pdev);
>  	ctrl->ist_running = false;
>  	wake_up(&ctrl->requester);
> -	return IRQ_HANDLED;
> +	return ret;
>  }
>  
>  static int pciehp_poll(void *data)
> -- 
> 2.25.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-03-28 20:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-14 14:19 Kernel hangs when powering up/down drive using sysfs Hoyer, David
2020-03-16 16:15 ` Keith Busch
2020-03-16 18:10   ` Lukas Wunner
2020-03-16 18:42     ` Keith Busch
2020-03-18 11:53       ` Lukas Wunner
2020-03-16 18:19 ` Lukas Wunner
2020-03-16 18:25   ` Hoyer, David
2020-03-16 21:35     ` Hoyer, David
2020-03-18 11:49     ` Lukas Wunner
2020-03-18 14:06       ` Hoyer, David
2020-03-18 11:33 ` [PATCH] PCI: pciehp: Fix indefinite wait on sysfs requests Lukas Wunner
2020-03-18 16:43   ` Keith Busch
2020-03-28 20:25   ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).