[PATCH] PCI: pciehp: Make sure pciehp_isr clears interrupt events

From: Stuart Hayes <stuart.w.hayes@gmail.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Austin Bolen <austin_bolen@dell.com>,
	keith.busch@intel.com, Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Andy Shevchenko <andy.shevchenko@gmail.com>,
	"Gustavo A . R . Silva" <gustavo@embeddedor.com>,
	Sinan Kaya <okaya@kernel.org>,
	Oza Pawandeep <poza@codeaurora.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	lukas@wunner.de, Stuart Hayes <stuart.w.hayes@gmail.com>
Subject: [PATCH] PCI: pciehp: Make sure pciehp_isr clears interrupt events
Date: Tue, 12 Nov 2019 16:59:38 -0500	[thread overview]
Message-ID: <20191112215938.1145-1-stuart.w.hayes@gmail.com> (raw)

The pciehp interrupt handler pciehp_isr() will read the slot status
register and then write back to it to clear just the bits that caused the
interrupt. If a different interrupt event bit gets set between the read and
the write, pciehp_isr() will exit without having cleared all of the
interrupt event bits, so we will never get another hotplug interrupt from
that device.

That is expected behavior according to the PCI Express spec (v.5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events").

Because the "presence detect changed" and "data link layer state changed"
event bits are both getting set at nearly the same time when a device is
added or removed, this is more likely to happen than it might seem. The
issue can be reproduced rather easily by connecting and disconnecting an
NVMe device on at least one system model.

This patch fixes the issue by modifying pciehp_isr() to loop back and
re-read the slot status register immediately after writing to it, until
it sees that all of the event status bits have been cleared.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 1a522c1c4177..8ec22a872b28 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -487,12 +487,21 @@ void pciehp_power_off_slot(struct controller *ctrl)
 		 PCI_EXP_SLTCTL_PWR_OFF);
 }
 
+/*
+ * We should never need to re-read the slot status register this many times,
+ * because there are only six possible events that could generate this
+ * interrupt.  If we still see events after this many reads, there's a stuck
+ * bit.
+ */
+#define MAX_ISR_STATUS_READS 6
+
 static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	struct device *parent = pdev->dev.parent;
-	u16 status, events;
+	u16 status, events = 0;
+	int status_reads = 0;
 
 	/*
 	 * Interrupts only occur in D3hot or shallower and only if enabled
@@ -517,6 +526,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		}
 	}
 
+read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
@@ -529,16 +539,34 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	 * Slot Status contains plain status bits as well as event
 	 * notification bits; right now we only want the event bits.
 	 */
-	events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
-			   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
-			   PCI_EXP_SLTSTA_DLLSC);
+	status &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
+		   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
+		   PCI_EXP_SLTSTA_DLLSC);
 
 	/*
 	 * If we've already reported a power fault, don't report it again
 	 * until we've done something to handle it.
 	 */
 	if (ctrl->power_fault_detected)
-		events &= ~PCI_EXP_SLTSTA_PFD;
+		status &= ~PCI_EXP_SLTSTA_PFD;
+
+	if (status) {
+		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);
+		events |= status;
+	}
+
+	/*
+	 * All of the event bits must be zero before the port will send
+	 * a new interrupt.  If an event bit gets set between the read
+	 * and the write, we'll never get another interrupt, so loop until
+	 * we see no event bits set.
+	 */
+	if (status && status_reads++ < MAX_ISR_STATUS_READS)
+		goto read_status;
+
+	if (status_reads == MAX_ISR_STATUS_READS)
+		ctrl_warn(ctrl, "Slot(%s): Hot plug event bit stuck, reading %x\n",
+			  status, slot_name(ctrl));
 
 	if (!events) {
 		if (parent)
@@ -546,7 +574,6 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		return IRQ_NONE;
 	}
 
-	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
 	if (parent)
 		pm_runtime_put(parent);
-- 
2.18.1