Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
@ 2020-02-07 19:54 Stuart Hayes
  2020-02-09 15:03 ` Lukas Wunner
  2020-02-19 14:31 ` [PATCH v4] PCI: pciehp: Fix MSI interrupt race Lukas Wunner
  0 siblings, 2 replies; 8+ messages in thread
From: Stuart Hayes @ 2020-02-07 19:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Austin Bolen, Keith Busch, Alexandru Gagniuc, Rafael J . Wysocki,
	Mika Westerberg, Andy Shevchenko, Gustavo A . R . Silva,
	Sinan Kaya, Oza Pawandeep, linux-pci, linux-kernel, lukas,
	narendra_k, Stuart Hayes

Without this patch, a pciehp hotplug port can stop generating interrupts
on hotplug events, so device adds and removals will not be seen.

The pciehp interrupt handler pciehp_isr() will read the slot status
register and then write back to it to clear the bits that caused the
interrupt. If a different interrupt event bit gets set between the read and
the write, pciehp_isr will exit without having cleared all of the interrupt
event bits. If this happens when the MSI isn't masked (it will never be,
for example, when MSR per-vector masking is not supported), we won't get
any more hotplug interrupts from that device.

That is expected behavior, according to the PCI Express Base Specification
Revision 5.0 Version 1.0, section 6.7.3.4, "Software Notification of Hot-
Plug Events".

Because the "presence detect changed" and "data link layer state changed"
event bits can both get set at nearly the same time when a device is added
or removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).

This patch fixes this issue by modifying pciehp_isr() by looping back and
re-reading the slot status register immediately after writing to it, until
it sees that all of the event status bits have been cleared.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
v3:
  * removed pvm_capable flag (from v2) since MSI may not be masked
    regardless of whether per-vector masking is supported
  * tweaked comments

v2:
  * fixed ctrl_warn() call
  * improved comments
  * added pvm_capable flag and changed pciehp_isr() to loop back only when
    pvm_capable flag not set (suggested by Lukas Wunner)

 drivers/pci/hotplug/pciehp_hpc.c | 41 +++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8a2cb1764386..0f99a150115e 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -522,12 +522,22 @@ void pciehp_power_off_slot(struct controller *ctrl)
 		 PCI_EXP_SLTCTL_PWR_OFF);
 }
 
+/*
+ * Set a limit to how many times the ISR will loop reading and writing the
+ * slot status register trying to clear the event bits.  These bits should
+ * not toggle rapidly, and there are only six possible events that could
+ * generate this interrupt.  If we still see events after this many reads,
+ * there is likely a bit stuck.
+ */
+#define MAX_ISR_STATUS_READS 6
+
 static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	struct device *parent = pdev->dev.parent;
-	u16 status, events;
+	u16 status, events = 0;
+	int status_reads = 0;
 
 	/*
 	 * Interrupts only occur in D3hot or shallower and only if enabled
@@ -552,6 +562,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		}
 	}
 
+read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
@@ -564,24 +575,42 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	 * Slot Status contains plain status bits as well as event
 	 * notification bits; right now we only want the event bits.
 	 */
-	events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
-			   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
-			   PCI_EXP_SLTSTA_DLLSC);
+	status &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
+		   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
+		   PCI_EXP_SLTSTA_DLLSC);
 
 	/*
 	 * If we've already reported a power fault, don't report it again
 	 * until we've done something to handle it.
 	 */
 	if (ctrl->power_fault_detected)
-		events &= ~PCI_EXP_SLTSTA_PFD;
+		status &= ~PCI_EXP_SLTSTA_PFD;
 
+	events |= status;
 	if (!events) {
 		if (parent)
 			pm_runtime_put(parent);
 		return IRQ_NONE;
 	}
 
-	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+	if (status) {
+		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);
+
+		/*
+		 * Unless the MSI happens to be masked, all of the event
+		 * bits must be zero before the port will send a new
+		 * interrupt (see PCI Express Base Specification Rev 5.0
+		 * Version 1.0, section 6.7.3.4, "Software Notification of
+		 * Hot-Plug Events"). So, if an event bit gets set between
+		 * the read and the write of PCI_EXP_SLTSTA, we need to
+		 * loop back and try again.
+		 */
+		if (status_reads++ < MAX_ISR_STATUS_READS)
+			goto read_status;
+		ctrl_warn(ctrl, "Hot plug event bit stuck (%x)\n",
+			  status);
+	}
+
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
 	if (parent)
 		pm_runtime_put(parent);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-07 19:54 [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events Stuart Hayes
@ 2020-02-09 15:03 ` Lukas Wunner
  2020-02-09 18:07   ` Lukas Wunner
  2020-02-19 14:31 ` [PATCH v4] PCI: pciehp: Fix MSI interrupt race Lukas Wunner
  1 sibling, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2020-02-09 15:03 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k

On Fri, Feb 07, 2020 at 02:54:50PM -0500, Stuart Hayes wrote:
> +/*
> + * Set a limit to how many times the ISR will loop reading and writing the
> + * slot status register trying to clear the event bits.  These bits should
> + * not toggle rapidly, and there are only six possible events that could
> + * generate this interrupt.  If we still see events after this many reads,
> + * there is likely a bit stuck.
> + */
> +#define MAX_ISR_STATUS_READS 6

Actually only *three* possible events could generate this interrupt
because pcie_enable_notification() only enables DLLSC, CCIE and
either of ABP or PDC.


> -	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> +	if (status) {
> +		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);

Writing "events" instead of "status" would seem to be more advantageous
because it reduces the number of loops.  Say you read PDC in the first
loop iteration, then DLLSC in the second loop iteration and shortly
before writing the register, PDC transitions to 1.  If you write
"events", you can make do with 2 loop iterations, if you write "status"
you'll need 3.


> +
> +		/*
> +		 * Unless the MSI happens to be masked, all of the event
> +		 * bits must be zero before the port will send a new
> +		 * interrupt (see PCI Express Base Specification Rev 5.0
> +		 * Version 1.0, section 6.7.3.4, "Software Notification of
> +		 * Hot-Plug Events"). So, if an event bit gets set between
> +		 * the read and the write of PCI_EXP_SLTSTA, we need to
> +		 * loop back and try again.
> +		 */
> +		if (status_reads++ < MAX_ISR_STATUS_READS)
> +			goto read_status;

Please use "pci_dev_msi_enabled(pdev)" as conditional for the if-clause,
we don't need this with INTx.


Using a for (;;) or do/while loop that you jump out of if
(!status || !pci_dev_msi_enabled(pdev)) might be more readable
than a goto, but I'm not sure.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-09 15:03 ` Lukas Wunner
@ 2020-02-09 18:07   ` Lukas Wunner
  2020-02-09 20:25     ` Lukas Wunner
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2020-02-09 18:07 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k

On Sun, Feb 09, 2020 at 04:03:28PM +0100, Lukas Wunner wrote:
> Using a for (;;) or do/while loop that you jump out of if
> (!status || !pci_dev_msi_enabled(pdev)) might be more readable
> than a goto, but I'm not sure.

Actually, scratch that.  After thinking about this problem for a day
I've come up with a much simpler and more elegant solution.  Could you
test if the below works for you?

This solution has the added benefit that the IRQ thread is started up
once the first event bits have been collected.  If more event bits are
found in the additional loop iterations, they're added to the collected
event bits and the IRQ thread will pick them up asynchronously.  Once no
more bits are found, the hardirq handler exits with IRQ_NONE.  This means
that the genirq code won't wake the IRQ thread but that doesn't matter
because the ISR has already done that itself.  Should also work correctly
in poll mode and the behavior in INTx mode should be as before.

-- >8 --
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index c3e3f53..707324d 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -553,6 +553,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		}
 	}
 
+read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
@@ -609,6 +610,17 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 
 	/* Save pending events for consumption by IRQ thread. */
 	atomic_or(events, &ctrl->pending_events);
+
+	/*
+	 * In MSI mode, all event bits must be zero before the port will send
+	 * a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).  So re-read the
+	 * Slot Status register in case a bit was set between read and write.
+	 */
+	if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode) {
+		irq_wake_thread(irq, ctrl);
+		goto read_status;
+	}
+
 	return IRQ_WAKE_THREAD;
 }
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-09 18:07   ` Lukas Wunner
@ 2020-02-09 20:25     ` Lukas Wunner
  2020-02-10 21:40       ` Stuart Hayes
  2020-02-19 15:57       ` Lukas Wunner
  0 siblings, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-02-09 20:25 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k

On Sun, Feb 09, 2020 at 07:07:22PM +0100, Lukas Wunner wrote:
> Actually, scratch that.  After thinking about this problem for a day
> I've come up with a much simpler and more elegant solution.  Could you
> test if the below works for you?

Sorry, I missed a few things:

* pm_runtime_put() is called too often in the MSI case.
* If only the CC bit is set or if ignore_hotplug is set, the function
  may return prematurely without re-reading the Slot Status register.
* Returning IRQ_NONE in the MSI case even though the IRQ thread was woken
  may incorrectly signal a spurious interrupt to the genirq code.
  It's better to return IRQ_HANDLED instead.

Below is another attempt.  I'll have to take a look at this with a
fresh pair of eyeballs though to verify I haven't overlooked anything
else and also to determine if this is actually simpler than Stuart's
approach.  Again, the advantage here is that processing of the events
by the IRQ thread is sped up by not delaying it until the Slot Status
register has settled.

Thanks.

-- >8 --
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index c3e3f53..db5baa5 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -530,6 +530,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	struct device *parent = pdev->dev.parent;
+	irqreturn_t ret = IRQ_NONE;
 	u16 status, events;
 
 	/*
@@ -553,6 +554,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		}
 	}
 
+read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
@@ -579,13 +581,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	if (!events) {
 		if (parent)
 			pm_runtime_put(parent);
-		return IRQ_NONE;
+		return ret;
 	}
 
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
-	if (parent)
-		pm_runtime_put(parent);
 
 	/*
 	 * Command Completed notifications are not deferred to the
@@ -595,21 +595,33 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		ctrl->cmd_busy = 0;
 		smp_mb();
 		wake_up(&ctrl->queue);
-
-		if (events == PCI_EXP_SLTSTA_CC)
-			return IRQ_HANDLED;
-
 		events &= ~PCI_EXP_SLTSTA_CC;
 	}
 
 	if (pdev->ignore_hotplug) {
 		ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events);
-		return IRQ_HANDLED;
+		events = 0;
 	}
 
 	/* Save pending events for consumption by IRQ thread. */
 	atomic_or(events, &ctrl->pending_events);
-	return IRQ_WAKE_THREAD;
+
+	/*
+	 * In MSI mode, all event bits must be zero before the port will send
+	 * a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).  So re-read the
+	 * Slot Status register in case a bit was set between read and write.
+	 */
+	if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode) {
+		irq_wake_thread(irq, ctrl);
+		ret = IRQ_HANDLED;
+		goto read_status;
+	}
+
+	if (parent)
+		pm_runtime_put(parent);
+	if (events)
+		return IRQ_WAKE_THREAD;
+	return IRQ_HANDLED;
 }
 
 static irqreturn_t pciehp_ist(int irq, void *dev_id)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-09 20:25     ` Lukas Wunner
@ 2020-02-10 21:40       ` Stuart Hayes
  2020-02-13 20:19         ` Stuart Hayes
  2020-02-19 15:57       ` Lukas Wunner
  1 sibling, 1 reply; 8+ messages in thread
From: Stuart Hayes @ 2020-02-10 21:40 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k



On 2/9/20 2:25 PM, Lukas Wunner wrote:
> On Sun, Feb 09, 2020 at 07:07:22PM +0100, Lukas Wunner wrote:
>> Actually, scratch that.  After thinking about this problem for a day
>> I've come up with a much simpler and more elegant solution.  Could you
>> test if the below works for you?
> 
> Sorry, I missed a few things:
> 
> * pm_runtime_put() is called too often in the MSI case.
> * If only the CC bit is set or if ignore_hotplug is set, the function
>   may return prematurely without re-reading the Slot Status register.
> * Returning IRQ_NONE in the MSI case even though the IRQ thread was woken
>   may incorrectly signal a spurious interrupt to the genirq code.
>   It's better to return IRQ_HANDLED instead.
> 
> Below is another attempt.  I'll have to take a look at this with a
> fresh pair of eyeballs though to verify I haven't overlooked anything
> else and also to determine if this is actually simpler than Stuart's
> approach.  Again, the advantage here is that processing of the events
> by the IRQ thread is sped up by not delaying it until the Slot Status
> register has settled.
> 
> Thanks.
> 
> -- >8 --
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index c3e3f53..db5baa5 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -530,6 +530,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  	struct controller *ctrl = (struct controller *)dev_id;
>  	struct pci_dev *pdev = ctrl_dev(ctrl);
>  	struct device *parent = pdev->dev.parent;
> +	irqreturn_t ret = IRQ_NONE;
>  	u16 status, events;
>  
>  	/*
> @@ -553,6 +554,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  		}
>  	}
>  
> +read_status:
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
>  	if (status == (u16) ~0) {
>  		ctrl_info(ctrl, "%s: no response from device\n", __func__);
> @@ -579,13 +581,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  	if (!events) {
>  		if (parent)
>  			pm_runtime_put(parent);
> -		return IRQ_NONE;
> +		return ret;
>  	}
>  
>  	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
>  	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
> -	if (parent)
> -		pm_runtime_put(parent);
>  
>  	/*
>  	 * Command Completed notifications are not deferred to the
> @@ -595,21 +595,33 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>  		ctrl->cmd_busy = 0;
>  		smp_mb();
>  		wake_up(&ctrl->queue);
> -
> -		if (events == PCI_EXP_SLTSTA_CC)
> -			return IRQ_HANDLED;
> -
>  		events &= ~PCI_EXP_SLTSTA_CC;
>  	}
>  
>  	if (pdev->ignore_hotplug) {
>  		ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events);
> -		return IRQ_HANDLED;
> +		events = 0;
>  	}
>  
>  	/* Save pending events for consumption by IRQ thread. */
>  	atomic_or(events, &ctrl->pending_events);
> -	return IRQ_WAKE_THREAD;
> +
> +	/*
> +	 * In MSI mode, all event bits must be zero before the port will send
> +	 * a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).  So re-read the
> +	 * Slot Status register in case a bit was set between read and write.
> +	 */
> +	if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode) {
> +		irq_wake_thread(irq, ctrl);
> +		ret = IRQ_HANDLED;
> +		goto read_status;
> +	}
> +
> +	if (parent)
> +		pm_runtime_put(parent);
> +	if (events)
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_HANDLED;
>  }
>  
>  static irqreturn_t pciehp_ist(int irq, void *dev_id)
> 

I tested this patch, and it fixes the issue on my system.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-10 21:40       ` Stuart Hayes
@ 2020-02-13 20:19         ` Stuart Hayes
  0 siblings, 0 replies; 8+ messages in thread
From: Stuart Hayes @ 2020-02-13 20:19 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k, Enzo Matsumiya



On 2/10/20 3:40 PM, Stuart Hayes wrote:
> 
> 
> On 2/9/20 2:25 PM, Lukas Wunner wrote:
>> On Sun, Feb 09, 2020 at 07:07:22PM +0100, Lukas Wunner wrote:
>>> Actually, scratch that.  After thinking about this problem for a day
>>> I've come up with a much simpler and more elegant solution.  Could you
>>> test if the below works for you?
>>
>> Sorry, I missed a few things:
>>
>> * pm_runtime_put() is called too often in the MSI case.
>> * If only the CC bit is set or if ignore_hotplug is set, the function
>>   may return prematurely without re-reading the Slot Status register.
>> * Returning IRQ_NONE in the MSI case even though the IRQ thread was woken
>>   may incorrectly signal a spurious interrupt to the genirq code.
>>   It's better to return IRQ_HANDLED instead.
>>
>> Below is another attempt.  I'll have to take a look at this with a
>> fresh pair of eyeballs though to verify I haven't overlooked anything
>> else and also to determine if this is actually simpler than Stuart's
>> approach.  Again, the advantage here is that processing of the events
>> by the IRQ thread is sped up by not delaying it until the Slot Status
>> register has settled.
>>
>> Thanks.
>>
>> -- >8 --
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index c3e3f53..db5baa5 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -530,6 +530,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>  	struct controller *ctrl = (struct controller *)dev_id;
>>  	struct pci_dev *pdev = ctrl_dev(ctrl);
>>  	struct device *parent = pdev->dev.parent;
>> +	irqreturn_t ret = IRQ_NONE;
>>  	u16 status, events;
>>  
>>  	/*
>> @@ -553,6 +554,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>  		}
>>  	}
>>  
>> +read_status:
>>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
>>  	if (status == (u16) ~0) {
>>  		ctrl_info(ctrl, "%s: no response from device\n", __func__);
>> @@ -579,13 +581,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>  	if (!events) {
>>  		if (parent)
>>  			pm_runtime_put(parent);
>> -		return IRQ_NONE;
>> +		return ret;
>>  	}
>>  
>>  	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
>>  	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
>> -	if (parent)
>> -		pm_runtime_put(parent);
>>  
>>  	/*
>>  	 * Command Completed notifications are not deferred to the
>> @@ -595,21 +595,33 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>  		ctrl->cmd_busy = 0;
>>  		smp_mb();
>>  		wake_up(&ctrl->queue);
>> -
>> -		if (events == PCI_EXP_SLTSTA_CC)
>> -			return IRQ_HANDLED;
>> -
>>  		events &= ~PCI_EXP_SLTSTA_CC;
>>  	}
>>  
>>  	if (pdev->ignore_hotplug) {
>>  		ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events);
>> -		return IRQ_HANDLED;
>> +		events = 0;
>>  	}
>>  
>>  	/* Save pending events for consumption by IRQ thread. */
>>  	atomic_or(events, &ctrl->pending_events);
>> -	return IRQ_WAKE_THREAD;
>> +
>> +	/*
>> +	 * In MSI mode, all event bits must be zero before the port will send
>> +	 * a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).  So re-read the
>> +	 * Slot Status register in case a bit was set between read and write.
>> +	 */
>> +	if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode) {
>> +		irq_wake_thread(irq, ctrl);
>> +		ret = IRQ_HANDLED;
>> +		goto read_status;
>> +	}
>> +
>> +	if (parent)
>> +		pm_runtime_put(parent);
>> +	if (events)
>> +		return IRQ_WAKE_THREAD;
>> +	return IRQ_HANDLED;
>>  }
>>  
>>  static irqreturn_t pciehp_ist(int irq, void *dev_id)
>>
> 
> I tested this patch, and it fixes the issue on my system.
> 

CCing Enzo Matsumiya.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v4] PCI: pciehp: Fix MSI interrupt race
  2020-02-07 19:54 [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events Stuart Hayes
  2020-02-09 15:03 ` Lukas Wunner
@ 2020-02-19 14:31 ` Lukas Wunner
  1 sibling, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-02-19 14:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Stuart Hayes
  Cc: Austin Bolen, Keith Busch, Alexandru Gagniuc, Rafael J . Wysocki,
	Mika Westerberg, Andy Shevchenko, Sinan Kaya, Oza Pawandeep,
	linux-pci, linux-kernel, narendra_k, Enzo Matsumiya

From: Stuart Hayes <stuart.w.hayes@gmail.com>

Without this commit, a PCIe hotplug port can stop generating interrupts
on hotplug events, so device adds and removals will not be seen:

The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt.
If a different interrupt event bit gets set between the read and the
write, pciehp_isr() returns without having cleared all of the interrupt
event bits.  If this happens when the MSI isn't masked (which by default
it isn't in handle_edge_irq(), and which it will never be when MSI
per-vector masking is not supported), we won't get any more hotplug
interrupts from that device.

That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".

Because the Presence Detect Changed and Data Link Layer State Changed
event bits can both get set at nearly the same time when a device is
added or removed, this is more likely to happen than it might seem.
The issue was found (and can be reproduced rather easily) by connecting
and disconnecting an NVMe storage device on at least one system model
where the NVMe devices were being connected to an AMD PCIe port (PCI
device 0x1022/0x1483).

Fix the issue by modifying pciehp_isr() to loop back and re-read the
Slot Status register immediately after writing to it, until it sees that
all of the event status bits have been cleared.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
v4 (lukas):
  * drop "MAX_ISR_STATUS_READS" loop count limitation
  * drop unnecessary braces around PCI_EXP_SLTSTA_* flags
  * write "events" instead of "status" variable to Slot Status register
    to avoid unnecessary loop iterations if the same bit gets set
    repeatedly
  * don't loop back in INTx and poll modes
  * shorten and tweak code comment & commit message

v3:
  * removed pvm_capable flag (from v2) since MSI may not be masked
    regardless of whether per-vector masking is supported
  * tweaked comments

v2:
  * fixed ctrl_warn() call
  * improved comments
  * added pvm_capable flag and changed pciehp_isr() to loop back only when
    pvm_capable flag not set (suggested by Lukas Wunner)

 drivers/pci/hotplug/pciehp_hpc.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8a2cb1764386..f64d10df9eb5 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -527,7 +527,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	struct device *parent = pdev->dev.parent;
-	u16 status, events;
+	u16 status, events = 0;
 
 	/*
 	 * Interrupts only occur in D3hot or shallower and only if enabled
@@ -552,6 +552,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		}
 	}
 
+read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
@@ -564,24 +565,37 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	 * Slot Status contains plain status bits as well as event
 	 * notification bits; right now we only want the event bits.
 	 */
-	events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
-			   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
-			   PCI_EXP_SLTSTA_DLLSC);
+	status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
+		  PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
+		  PCI_EXP_SLTSTA_DLLSC;
 
 	/*
 	 * If we've already reported a power fault, don't report it again
 	 * until we've done something to handle it.
 	 */
 	if (ctrl->power_fault_detected)
-		events &= ~PCI_EXP_SLTSTA_PFD;
+		status &= ~PCI_EXP_SLTSTA_PFD;
 
+	events |= status;
 	if (!events) {
 		if (parent)
 			pm_runtime_put(parent);
 		return IRQ_NONE;
 	}
 
-	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+	if (status) {
+		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+
+		/*
+		 * In MSI mode, all event bits must be zero before the port
+		 * will send a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).
+		 * So re-read the Slot Status register in case a bit was set
+		 * between read and write.
+		 */
+		if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
+			goto read_status;
+	}
+
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
 	if (parent)
 		pm_runtime_put(parent);
-- 
2.24.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events
  2020-02-09 20:25     ` Lukas Wunner
  2020-02-10 21:40       ` Stuart Hayes
@ 2020-02-19 15:57       ` Lukas Wunner
  1 sibling, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-02-19 15:57 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: Bjorn Helgaas, Austin Bolen, Keith Busch, Alexandru Gagniuc,
	Rafael J . Wysocki, Mika Westerberg, Andy Shevchenko,
	Gustavo A . R . Silva, Sinan Kaya, Oza Pawandeep, linux-pci,
	linux-kernel, narendra_k, Enzo Matsumiya

On Sun, Feb 09, 2020 at 09:25:12PM +0100, Lukas Wunner wrote:
> Below is another attempt.  I'll have to take a look at this with a
> fresh pair of eyeballs though to verify I haven't overlooked anything
> else and also to determine if this is actually simpler than Stuart's
> approach.  Again, the advantage here is that processing of the events
> by the IRQ thread is sped up by not delaying it until the Slot Status
> register has settled.

After some deliberation I've come full circle and think that Stuart's
approach is actually better than mine:

I thought that my approach would speed up processing of events by
waking the IRQ thread immediately after the first loop iteration.
But I've realized that right at the beginning of the IRQ thread,
synchronize_hardirq() is called, so the IRQ thread will wait for
the hardirq handler to finish before actually processing the events.

The rationale for the call to synchronize_hardirq() is that the
IRQ thread was woken, but now sees that the hardirq handler is
running (again) to collect more events.  In that situation it makes
sense to wait for them to be collected before starting to process
events.

Is the synchronize_hardirq() absolutely necessary?  Not really,
but I still think that it makes sense.  In reality, the latency
for additional loop iterations is likely small, so it's probably
not worth to optimize for immediate processing after the first
loop iteration.

Stuart's approach is also less intrusive and doesn't change the
logic as much as my approach does.  His patch therefore lends
itself better for backporting to stable.

So I've just respun Stuart's v3 patch, taking into account the
review comments I had sent for it.  I've taken the liberty to make
some editorial changes to the commit message and code comment.
Stuart & Bjorn, if you don't like these, please feel free to roll
back my changes to them as you see fit.

I realize now that I forgot to add the following tags,
Bjorn, could you add them if/when applying?

Fixes: 7b4ce26bcf69 ("PCI: pciehp: Convert to threaded IRQ")
Cc: stable@vger.kernel.org # v4.19+

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07 19:54 [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events Stuart Hayes
2020-02-09 15:03 ` Lukas Wunner
2020-02-09 18:07   ` Lukas Wunner
2020-02-09 20:25     ` Lukas Wunner
2020-02-10 21:40       ` Stuart Hayes
2020-02-13 20:19         ` Stuart Hayes
2020-02-19 15:57       ` Lukas Wunner
2020-02-19 14:31 ` [PATCH v4] PCI: pciehp: Fix MSI interrupt race Lukas Wunner

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git