linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 0/7] DPC updates
@ 2018-04-02 16:21 Keith Busch
  2018-04-02 16:21 ` [PATCHv2 1/7] PCI/DPC: Enable ERR_COR Keith Busch
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:21 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

I am terribly sorry about the delay getting this back to you. I had some
scheduling issues with test equipment and other unrelated conflicts.

I added Oza's legacy IRQ fix into this series, as I didn't see this
staged yet, and it really does fix some issues as well as provide the
means to clean up other parts.

I'm sending this later than I intended, but hoping you might see a way
to including this to 4.17. I can rebase to 'next' where the file is
renamed to 'dpc.c'.

Keith Busch (6):
  PCI/DPC: Enable ERR_COR
  PCI/DPC: Leave interrupts enabled while handling event
  PCI/DPC: Defer event handling to work queue
  PCI/DPC: Remove rp_pio_status from dpc struct
  PCI/AER: API for obtaining AER information
  PCI/DPC: Print AER status in DPC event handling

Oza Pawandeep (1):
  PCI/DPC: Fix PCI legacy interrupt acknowledgement

 drivers/pci/pcie/aer/aerdrv.h      |   1 +
 drivers/pci/pcie/aer/aerdrv_core.c |   8 +--
 drivers/pci/pcie/pcie-dpc.c        | 106 +++++++++++++++++--------------------
 include/uapi/linux/pci_regs.h      |   1 +
 4 files changed, 55 insertions(+), 61 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
@ 2018-04-02 16:21 ` Keith Busch
  2018-04-02 21:23   ` Bjorn Helgaas
  2018-04-02 16:21 ` [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement Keith Busch
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:21 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

A DPC port may be configured to send ERR_COR message when the
triggered. This patch enables this feature so additional notification
of the event is possible.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c   | 5 ++++-
 include/uapi/linux/pci_regs.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 38e40c6c576f..a9be6938417f 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -269,7 +269,10 @@ static int dpc_probe(struct pcie_device *dev)
 		}
 	}
 
-	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
+	ctl = (ctl & 0xfff4) |
+		PCI_EXP_DPC_CTL_EN_NONFATAL |
+		PCI_EXP_DPC_CTL_INT_EN |
+		PCI_EXP_DPC_CTL_ERR_COR;
 	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
 
 	dev_info(device, "DPC error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 0c79eac5e9b8..9cfcecdc3ec7 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -980,6 +980,7 @@
 #define PCI_EXP_DPC_CTL			6	/* DPC control */
 #define  PCI_EXP_DPC_CTL_EN_NONFATAL 	0x0002	/* Enable trigger on ERR_NONFATAL message */
 #define  PCI_EXP_DPC_CTL_INT_EN 	0x0008	/* DPC Interrupt Enable */
+#define  PCI_EXP_DPC_CTL_ERR_COR 	0x0010	/* DPC ERR_COR Enable */
 
 #define PCI_EXP_DPC_STATUS		8	/* DPC Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
  2018-04-02 16:21 ` [PATCHv2 1/7] PCI/DPC: Enable ERR_COR Keith Busch
@ 2018-04-02 16:21 ` Keith Busch
  2018-04-03 20:38   ` Bjorn Helgaas
  2018-04-02 16:21 ` [PATCHv2 3/7] PCI/DPC: Leave interrupts enabled while handling event Keith Busch
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:21 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya

From: Oza Pawandeep <poza@codeaurora.org>

The DPC driver was acknowledging the DPC interrupt status in deferred
work. That works for edge triggered interrupts, but causes an interrupt
storm with level triggered legacy interrupts.

This patch fixes that by clearing the interrupt status in interrupt
handler.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[changelog]
Reviewed-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index a9be6938417f..82644245cb4d 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -112,7 +112,7 @@ static void dpc_work(struct work_struct *work)
 	}
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
-		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
+			      PCI_EXP_DPC_STATUS_TRIGGER);
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
@@ -222,6 +222,9 @@ static irqreturn_t dpc_irq(int irq, void *context)
 	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
 		dpc_process_rp_pio_error(dpc);
 
+	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
+			      PCI_EXP_DPC_STATUS_INTERRUPT);
+
 	schedule_work(&dpc->work);
 
 	return IRQ_HANDLED;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 3/7] PCI/DPC: Leave interrupts enabled while handling event
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
  2018-04-02 16:21 ` [PATCHv2 1/7] PCI/DPC: Enable ERR_COR Keith Busch
  2018-04-02 16:21 ` [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement Keith Busch
@ 2018-04-02 16:21 ` Keith Busch
  2018-04-02 16:22 ` [PATCHv2 4/7] PCI/DPC: Defer event handling to work queue Keith Busch
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:21 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

Now that the DPC driver clears the interrupt status before exiting the
irq handler, we don't need to abuse the DPC control register to know if
a shared interrupt is for a new DPC event: a DPC port can not trigger
a second interrupt until the host clears the trigger status later in the
work queue handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 82644245cb4d..38b2d0586c72 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -87,7 +87,7 @@ static void dpc_work(struct work_struct *work)
 	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
 	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
 	struct pci_bus *parent = pdev->subordinate;
-	u16 cap = dpc->cap_pos, ctl;
+	u16 cap = dpc->cap_pos;
 
 	pci_lock_rescan_remove();
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
@@ -113,10 +113,6 @@ static void dpc_work(struct work_struct *work)
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
-
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
-	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
-			      ctl | PCI_EXP_DPC_CTL_INT_EN);
 }
 
 static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
@@ -181,16 +177,10 @@ static irqreturn_t dpc_irq(int irq, void *context)
 	struct dpc_dev *dpc = (struct dpc_dev *)context;
 	struct pci_dev *pdev = dpc->dev->port;
 	struct device *dev = &dpc->dev->device;
-	u16 cap = dpc->cap_pos, ctl, status, source, reason, ext_reason;
-
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
-
-	if (!(ctl & PCI_EXP_DPC_CTL_INT_EN) || ctl == (u16)(~0))
-		return IRQ_NONE;
+	u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
-
-	if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT))
+	if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT) || status == (u16)(~0))
 		return IRQ_NONE;
 
 	if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) {
@@ -199,9 +189,6 @@ static irqreturn_t dpc_irq(int irq, void *context)
 		return IRQ_HANDLED;
 	}
 
-	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
-			      ctl & ~PCI_EXP_DPC_CTL_INT_EN);
-
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID,
 			     &source);
 
@@ -224,9 +211,8 @@ static irqreturn_t dpc_irq(int irq, void *context)
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_INTERRUPT);
-
-	schedule_work(&dpc->work);
-
+	if (status & PCI_EXP_DPC_STATUS_TRIGGER)
+		schedule_work(&dpc->work);
 	return IRQ_HANDLED;
 }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 4/7] PCI/DPC: Defer event handling to work queue
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
                   ` (2 preceding siblings ...)
  2018-04-02 16:21 ` [PATCHv2 3/7] PCI/DPC: Leave interrupts enabled while handling event Keith Busch
@ 2018-04-02 16:22 ` Keith Busch
  2018-04-02 16:22 ` [PATCHv2 5/7] PCI/DPC: Remove rp_pio_status from dpc struct Keith Busch
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:22 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

This moves all event handling to the existing work queue, which will
make passing event information simpler to the handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 53 ++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 38b2d0586c72..a28901e88f7b 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -45,6 +45,8 @@ static const char * const rp_pio_error_string[] = {
 	"Memory Request Completion Timeout",		 /* Bit Position 18 */
 };
 
+static void dpc_process_rp_pio_error(struct dpc_dev *dpc);
+
 static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 {
 	unsigned long timeout = jiffies + HZ;
@@ -87,7 +89,27 @@ static void dpc_work(struct work_struct *work)
 	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
 	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
 	struct pci_bus *parent = pdev->subordinate;
-	u16 cap = dpc->cap_pos;
+	u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
+
+	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
+	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
+
+	dev_info(&pdev->dev, "DPC containment event, status:%#06x source:%#06x\n",
+		status, source);
+
+	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
+	ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
+
+	dev_warn(&pdev->dev, "DPC %s detected, remove downstream devices\n",
+		 (reason == 0) ? "unmasked uncorrectable error" :
+		 (reason == 1) ? "ERR_NONFATAL" :
+		 (reason == 2) ? "ERR_FATAL" :
+		 (ext_reason == 0) ? "RP PIO error" :
+		 (ext_reason == 1) ? "software trigger" :
+				     "reserved error");
+
+	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
+		dpc_process_rp_pio_error(dpc);
 
 	pci_lock_rescan_remove();
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
@@ -176,39 +198,12 @@ static irqreturn_t dpc_irq(int irq, void *context)
 {
 	struct dpc_dev *dpc = (struct dpc_dev *)context;
 	struct pci_dev *pdev = dpc->dev->port;
-	struct device *dev = &dpc->dev->device;
-	u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
+	u16 cap = dpc->cap_pos, status;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 	if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT) || status == (u16)(~0))
 		return IRQ_NONE;
 
-	if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) {
-		pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
-				      PCI_EXP_DPC_STATUS_INTERRUPT);
-		return IRQ_HANDLED;
-	}
-
-	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID,
-			     &source);
-
-	dev_info(dev, "DPC containment event, status:%#06x source:%#06x\n",
-		status, source);
-
-	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
-	ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
-
-	dev_warn(dev, "DPC %s detected, remove downstream devices\n",
-		 (reason == 0) ? "unmasked uncorrectable error" :
-		 (reason == 1) ? "ERR_NONFATAL" :
-		 (reason == 2) ? "ERR_FATAL" :
-		 (ext_reason == 0) ? "RP PIO error" :
-		 (ext_reason == 1) ? "software trigger" :
-				     "reserved error");
-	/* show RP PIO error detail information */
-	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
-		dpc_process_rp_pio_error(dpc);
-
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_INTERRUPT);
 	if (status & PCI_EXP_DPC_STATUS_TRIGGER)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 5/7] PCI/DPC: Remove rp_pio_status from dpc struct
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
                   ` (3 preceding siblings ...)
  2018-04-02 16:22 ` [PATCHv2 4/7] PCI/DPC: Defer event handling to work queue Keith Busch
@ 2018-04-02 16:22 ` Keith Busch
  2018-04-02 16:22 ` [PATCHv2 6/7] PCI/AER: API for obtaining AER information Keith Busch
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:22 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

We don't need to save the rp pio status across multiple contexts anymore
as all DPC event handling occurs in a single work queue context.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index a28901e88f7b..e12837ee4f1c 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -19,7 +19,6 @@ struct dpc_dev {
 	struct work_struct	work;
 	u16			cap_pos;
 	bool			rp_extensions;
-	u32			rp_pio_status;
 	u8			rp_log_size;
 };
 
@@ -45,7 +44,7 @@ static const char * const rp_pio_error_string[] = {
 	"Memory Request Completion Timeout",		 /* Bit Position 18 */
 };
 
-static void dpc_process_rp_pio_error(struct dpc_dev *dpc);
+static u32 dpc_process_rp_pio_error(struct dpc_dev *dpc);
 
 static int dpc_wait_rp_inactive(struct dpc_dev *dpc)
 {
@@ -90,6 +89,7 @@ static void dpc_work(struct work_struct *work)
 	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
 	struct pci_bus *parent = pdev->subordinate;
 	u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
+	u32 pio_status = 0;
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
@@ -109,7 +109,7 @@ static void dpc_work(struct work_struct *work)
 				     "reserved error");
 
 	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
-		dpc_process_rp_pio_error(dpc);
+		pio_status = dpc_process_rp_pio_error(dpc);
 
 	pci_lock_rescan_remove();
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
@@ -127,22 +127,21 @@ static void dpc_work(struct work_struct *work)
 	dpc_wait_link_inactive(dpc);
 	if (dpc->rp_extensions && dpc_wait_rp_inactive(dpc))
 		return;
-	if (dpc->rp_extensions && dpc->rp_pio_status) {
+	if (pio_status)
 		pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS,
-				       dpc->rp_pio_status);
-		dpc->rp_pio_status = 0;
-	}
+				       pio_status);
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
 }
 
-static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
+static u32 dpc_process_rp_pio_error(struct dpc_dev *dpc)
 {
 	struct device *dev = &dpc->dev->device;
 	struct pci_dev *pdev = dpc->dev->port;
 	u16 cap = dpc->cap_pos, dpc_status, first_error;
-	u32 status, mask, sev, syserr, exc, dw0, dw1, dw2, dw3, log, prefix;
+	u32 status, mask, sev, syserr, exc, dw0, dw1, dw2, dw3, log, prefix,
+		pio_status;
 	int i;
 
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS, &status);
@@ -150,7 +149,7 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 	dev_err(dev, "rp_pio_status: %#010x, rp_pio_mask: %#010x\n",
 		status, mask);
 
-	dpc->rp_pio_status = status;
+	pio_status = status;
 
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_SEVERITY, &sev);
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_SYSERROR, &syserr);
@@ -170,7 +169,7 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 	}
 
 	if (dpc->rp_log_size < 4)
-		return;
+		return pio_status;
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG,
 			      &dw0);
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG + 4,
@@ -183,7 +182,7 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 		dw0, dw1, dw2, dw3);
 
 	if (dpc->rp_log_size < 5)
-		return;
+		return pio_status;
 	pci_read_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_IMPSPEC_LOG, &log);
 	dev_err(dev, "RP PIO ImpSpec Log %#010x\n", log);
 
@@ -192,6 +191,7 @@ static void dpc_process_rp_pio_error(struct dpc_dev *dpc)
 			cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, &prefix);
 		dev_err(dev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix);
 	}
+	return pio_status;
 }
 
 static irqreturn_t dpc_irq(int irq, void *context)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 6/7] PCI/AER: API for obtaining AER information
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
                   ` (4 preceding siblings ...)
  2018-04-02 16:22 ` [PATCHv2 5/7] PCI/DPC: Remove rp_pio_status from dpc struct Keith Busch
@ 2018-04-02 16:22 ` Keith Busch
  2018-04-09 10:03   ` David Laight
  2018-04-02 16:22 ` [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling Keith Busch
  2018-06-19 21:31 ` [PATCHv2 0/7] DPC updates Bjorn Helgaas
  7 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:22 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

Since this is making the function externally visible, appending "aer_"
prefix to the function name.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer/aerdrv.h      | 1 +
 drivers/pci/pcie/aer/aerdrv_core.c | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 5449e5ce139d..7c833a1d897e 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -107,6 +107,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
 }
 
 extern struct bus_type pcie_port_bus_type;
+int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_isr(struct work_struct *work);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index a4bfea52e7d4..4fb24003cac3 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -638,7 +638,7 @@ static void aer_recover_work_func(struct work_struct *work)
 #endif
 
 /**
- * get_device_error_info - read error status from dev and store it to info
+ * aer_get_device_error_info - read error status from dev and store it to info
  * @dev: pointer to the device expected to have a error record
  * @info: pointer to structure to store the error record
  *
@@ -646,7 +646,7 @@ static void aer_recover_work_func(struct work_struct *work)
  *
  * Note that @info is reused among all error devices. Clear fields properly.
  */
-static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
+int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 {
 	int pos, temp;
 
@@ -705,11 +705,11 @@ static inline void aer_process_err_devices(struct pcie_device *p_device,
 
 	/* Report all before handle them, not to lost records by reset etc. */
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (get_device_error_info(e_info->dev[i], e_info))
+		if (aer_get_device_error_info(e_info->dev[i], e_info))
 			aer_print_error(e_info->dev[i], e_info);
 	}
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (get_device_error_info(e_info->dev[i], e_info))
+		if (aer_get_device_error_info(e_info->dev[i], e_info))
 			handle_error_source(p_device, e_info->dev[i], e_info);
 	}
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
                   ` (5 preceding siblings ...)
  2018-04-02 16:22 ` [PATCHv2 6/7] PCI/AER: API for obtaining AER information Keith Busch
@ 2018-04-02 16:22 ` Keith Busch
  2018-05-17 15:02   ` poza
  2018-06-19 21:31 ` [PATCHv2 0/7] DPC updates Bjorn Helgaas
  7 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2018-04-02 16:22 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya, Keith Busch

A DPC enabled device suppresses ERR_(NON)FATAL messages, preventing the
AER handler from reporting error details. If the DPC trigger reason says
the downstream port detected the error, this patch has the DPC driver
collect the AER uncorrectable status for logging, then clears the status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index e12837ee4f1c..76f963a5089e 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -85,6 +85,7 @@ static void dpc_wait_link_inactive(struct dpc_dev *dpc)
 
 static void dpc_work(struct work_struct *work)
 {
+	struct aer_err_info info;
 	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
 	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
 	struct pci_bus *parent = pdev->subordinate;
@@ -108,8 +109,12 @@ static void dpc_work(struct work_struct *work)
 		 (ext_reason == 1) ? "software trigger" :
 				     "reserved error");
 
-	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
+	if (dpc->rp_extensions && reason == 3 && ext_reason == 0) {
 		pio_status = dpc_process_rp_pio_error(dpc);
+	} else if (reason == 0 && aer_get_device_error_info(pdev, &info)) {
+		aer_print_error(pdev, &info);
+		pci_cleanup_aer_uncorrect_error_status(pdev);
+	}
 
 	pci_lock_rescan_remove();
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-02 16:21 ` [PATCHv2 1/7] PCI/DPC: Enable ERR_COR Keith Busch
@ 2018-04-02 21:23   ` Bjorn Helgaas
  2018-04-02 23:09     ` Keith Busch
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2018-04-02 21:23 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> A DPC port may be configured to send ERR_COR message when the
> triggered. This patch enables this feature so additional notification
> of the event is possible.

Who is the intended consumer of the ERR_COR message?  I guess if the
root port supports AER, we'll see AER logging when we didn't before?
Is that what you mean by "additional notification"?

Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
that ERR_COR is primarily intended for use by platform firmware, does
this change enable notification via the firmware?  If so, how does
that work?  Does it require the platform to retain control of AER so
it can see these events?  But if the platform retains AER control, I
don't think we'd be enabling DPC in Linux.  I'm confused.

The similar implementation note in 6.2.10.5 suggests that
DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
be setting that for the same reason we're setting
PCI_EXP_DPC_CTL_ERR_COR?

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/pcie-dpc.c   | 5 ++++-
>  include/uapi/linux/pci_regs.h | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index 38e40c6c576f..a9be6938417f 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -269,7 +269,10 @@ static int dpc_probe(struct pcie_device *dev)
>  		}
>  	}
>  
> -	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
> +	ctl = (ctl & 0xfff4) |
> +		PCI_EXP_DPC_CTL_EN_NONFATAL |
> +		PCI_EXP_DPC_CTL_INT_EN |
> +		PCI_EXP_DPC_CTL_ERR_COR;

The 0xfff4 is a little confusing because it:

  - Clears DPC Trigger Enable (0x0003), then sets
    PCI_EXP_DPC_CTL_EN_NONFATAL.  That makes good sense.  Maybe we
    should have a mask for the 0x0003 value.

  - Preserves whatever the firmware set for DPC Completion Control
    (0x0004).  Seems like maybe we should decide and set this
    explicitly, unless there's a reason we should respect the
    platform's choice.  But that's a separate issue.

  - Clears DPC Interrupt Enable (0x0008), then sets it again.
    Clearing seems pointless.

  - Preserves DPC ERR_COR Enable (0x0010), then sets it.

I think the most readable thing would be:

  ctl = (ctl & 0xfffc) | PCI_EXP_DPC_CTL_EN_NONFATAL;
  ctl |= PCI_EXP_DPC_CTL_INT_EN | PCI_EXP_DPC_CTL_ERR_COR;

or maybe:

  ctl = (ctl & 0xfffc) | PCI_EXP_DPC_CTL_EN_NONFATAL;
  ctl |= PCI_EXP_DPC_CTL_INT_EN;

  /* Platform firmware may rely on these ERR_COR messages */
  ctl |= PCI_EXP_DPC_CTL_ERR_COR | PCI_EXP_DPC_CTL_DL_ACT_ERR_COR;

(If that comment is accurate.)

>  	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
>  
>  	dev_info(device, "DPC error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 0c79eac5e9b8..9cfcecdc3ec7 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -980,6 +980,7 @@
>  #define PCI_EXP_DPC_CTL			6	/* DPC control */
>  #define  PCI_EXP_DPC_CTL_EN_NONFATAL 	0x0002	/* Enable trigger on ERR_NONFATAL message */
>  #define  PCI_EXP_DPC_CTL_INT_EN 	0x0008	/* DPC Interrupt Enable */
> +#define  PCI_EXP_DPC_CTL_ERR_COR 	0x0010	/* DPC ERR_COR Enable */
>  
>  #define PCI_EXP_DPC_STATUS		8	/* DPC Status */
>  #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */
> -- 
> 2.14.3
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-02 21:23   ` Bjorn Helgaas
@ 2018-04-02 23:09     ` Keith Busch
  2018-04-03 14:16       ` Bjorn Helgaas
  0 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2018-04-02 23:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> > A DPC port may be configured to send ERR_COR message when the
> > triggered. This patch enables this feature so additional notification
> > of the event is possible.
> 
> Who is the intended consumer of the ERR_COR message?  I guess if the
> root port supports AER, we'll see AER logging when we didn't before?
> Is that what you mean by "additional notification"?
> 
> Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
> that ERR_COR is primarily intended for use by platform firmware, does
> this change enable notification via the firmware?  If so, how does
> that work?  Does it require the platform to retain control of AER so
> it can see these events?  But if the platform retains AER control, I
> don't think we'd be enabling DPC in Linux.  I'm confused.
> 
> The similar implementation note in 6.2.10.5 suggests that
> DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
> be setting that for the same reason we're setting
> PCI_EXP_DPC_CTL_ERR_COR?

I think there are various ways this could play out. I added this to
the series per a request for future use, but I actually don't have an
environment where I could test as intended. As I understand, it was
for platform firmware that may own the root port, but not switches in
an external enclosure, so the OS would own the DPC control. The ERR_COR
is to notify firmware so it may note an event in a platform log.

I think you bring up a good point on DL_Active.

This should be harmless even if there are no consumers for the message,
or if the OS owns the root port AER control. But I'm totally fine
with dropping this one in the series, and it's not related to the rest
anyway. I think I at least need to circle back with platform makers and
really test the feature on capable hardware.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-02 23:09     ` Keith Busch
@ 2018-04-03 14:16       ` Bjorn Helgaas
  2018-04-03 14:31         ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2018-04-03 14:16 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya,
	Rafael J. Wysocki, linux-acpi

[+cc Rafael, linux-acpi]

On Mon, Apr 02, 2018 at 05:09:49PM -0600, Keith Busch wrote:
> On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> > > A DPC port may be configured to send ERR_COR message when the
> > > triggered. This patch enables this feature so additional notification
> > > of the event is possible.
> > 
> > Who is the intended consumer of the ERR_COR message?  I guess if the
> > root port supports AER, we'll see AER logging when we didn't before?
> > Is that what you mean by "additional notification"?
> > 
> > Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
> > that ERR_COR is primarily intended for use by platform firmware, does
> > this change enable notification via the firmware?  If so, how does
> > that work?  Does it require the platform to retain control of AER so
> > it can see these events?  But if the platform retains AER control, I
> > don't think we'd be enabling DPC in Linux.  I'm confused.
> > 
> > The similar implementation note in 6.2.10.5 suggests that
> > DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
> > be setting that for the same reason we're setting
> > PCI_EXP_DPC_CTL_ERR_COR?
> 
> I think there are various ways this could play out. I added this to
> the series per a request for future use, but I actually don't have an
> environment where I could test as intended. As I understand, it was
> for platform firmware that may own the root port, but not switches in
> an external enclosure, so the OS would own the DPC control. The ERR_COR
> is to notify firmware so it may note an event in a platform log.

This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
DPC).  I think _OSC is only allowed directly under a host bridge
device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
under the host bridge.

I don't know how firmware could claim to own AER on a root port, but
not on switches (external or otherwise) below that root port.

If _OSC says firmware owns AER, we won't touch AER on the root port,
but we also won't touch DPC on switches below the root port.  So I
don't see how this change will help.

> I think you bring up a good point on DL_Active.
> 
> This should be harmless even if there are no consumers for the message,
> or if the OS owns the root port AER control. But I'm totally fine
> with dropping this one in the series, and it's not related to the rest
> anyway. I think I at least need to circle back with platform makers and
> really test the feature on capable hardware.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-03 14:16       ` Bjorn Helgaas
@ 2018-04-03 14:31         ` Rafael J. Wysocki
  2018-04-03 14:37           ` Keith Busch
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2018-04-03 14:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya,
	Rafael J. Wysocki, ACPI Devel Maling List

On Tue, Apr 3, 2018 at 4:16 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc Rafael, linux-acpi]

Thanks.

> On Mon, Apr 02, 2018 at 05:09:49PM -0600, Keith Busch wrote:
>> On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
>> > On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
>> > > A DPC port may be configured to send ERR_COR message when the
>> > > triggered. This patch enables this feature so additional notification
>> > > of the event is possible.
>> >
>> > Who is the intended consumer of the ERR_COR message?  I guess if the
>> > root port supports AER, we'll see AER logging when we didn't before?
>> > Is that what you mean by "additional notification"?
>> >
>> > Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
>> > that ERR_COR is primarily intended for use by platform firmware, does
>> > this change enable notification via the firmware?  If so, how does
>> > that work?  Does it require the platform to retain control of AER so
>> > it can see these events?  But if the platform retains AER control, I
>> > don't think we'd be enabling DPC in Linux.  I'm confused.
>> >
>> > The similar implementation note in 6.2.10.5 suggests that
>> > DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
>> > be setting that for the same reason we're setting
>> > PCI_EXP_DPC_CTL_ERR_COR?
>>
>> I think there are various ways this could play out. I added this to
>> the series per a request for future use, but I actually don't have an
>> environment where I could test as intended. As I understand, it was
>> for platform firmware that may own the root port, but not switches in
>> an external enclosure, so the OS would own the DPC control. The ERR_COR
>> is to notify firmware so it may note an event in a platform log.
>
> This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
> r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
> DPC).  I think _OSC is only allowed directly under a host bridge
> device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
> under the host bridge.
>
> I don't know how firmware could claim to own AER on a root port, but
> not on switches (external or otherwise) below that root port.
>
> If _OSC says firmware owns AER, we won't touch AER on the root port,
> but we also won't touch DPC on switches below the root port.  So I
> don't see how this change will help.

This matches my understanding of that part of the spec.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/7] PCI/DPC: Enable ERR_COR
  2018-04-03 14:31         ` Rafael J. Wysocki
@ 2018-04-03 14:37           ` Keith Busch
  0 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-03 14:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Oza Pawandeep,
	Sinan Kaya, Rafael J. Wysocki, ACPI Devel Maling List

On Tue, Apr 03, 2018 at 04:31:15PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 3, 2018 at 4:16 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
> > r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
> > DPC).  I think _OSC is only allowed directly under a host bridge
> > device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
> > under the host bridge.
> >
> > I don't know how firmware could claim to own AER on a root port, but
> > not on switches (external or otherwise) below that root port.
> >
> > If _OSC says firmware owns AER, we won't touch AER on the root port,
> > but we also won't touch DPC on switches below the root port.  So I
> > don't see how this change will help.
> 
> This matches my understanding of that part of the spec.

Thanks, guys. It seems this patch doesn't accomplish anything.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-02 16:21 ` [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement Keith Busch
@ 2018-04-03 20:38   ` Bjorn Helgaas
  2018-04-03 23:04     ` Bjorn Helgaas
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2018-04-03 20:38 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

[+cc Thomas, Rafael]

On Mon, Apr 02, 2018 at 10:21:58AM -0600, Keith Busch wrote:
> From: Oza Pawandeep <poza@codeaurora.org>
> 
> The DPC driver was acknowledging the DPC interrupt status in deferred
> work. That works for edge triggered interrupts, but causes an interrupt
> storm with level triggered legacy interrupts.
> 
> This patch fixes that by clearing the interrupt status in interrupt
> handler.

I'm fuzzy on this question of edge vs. level and where the interrupt
should be cleared.  I'd like to understand this better and include the
general rule in the changelog in case I'm not the only one who is
unclear on this.

Here's my understanding, gleaned from kernel/irq/chip.c and
https://notes.shichao.io/lkd/ch7/ :

  The generic IRQ handling code ensures that an interrupt handler runs
  with its interrupt masked or disabled.  If the interrupt is
  level-triggered, the interrupt handler must tell its device to stop
  asserting the interrupt before returning.  If it doesn't, we will
  immediately take the interrupt again when the handler returns and
  the generic code unmasks the interrupt.

  The driver doesn't know whether its interrupt is edge- or
  level-triggered, so it must clear its interrupt source directly in
  its interrupt handler.

I'd also like to convince myself that we don't have similar issues
with other services, e.g., AER, PME, pciehp.  Here are my notes about
those:

  1) pciehp:
    request_irq(irq, pcie_isr, ...)
    pcie_isr
      pciehp_isr
        # clear Slot Status event bits
        pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
	events = status & (...)
	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);

  2) AER:
    request_irq(dev->irq, aer_irq, ...)
    aer_irq
      # clear AER Root Error Status bits
      pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
      pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);

  3) DPC:
    devm_request_irq(..., dev->irq, dpc_irq, ...)
    dpc_irq
      schedule_work(<dpc_work>)
    ...
    dpc_work
      # clear DPC Interrupt Status
      pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_INTERRUPT);

  4) PME:
    request_irq(srv->irq, pcie_pme_irq, ...)
    pcie_pme_irq
      pcie_pme_interrupt_enable(port, false);
        # clear PME Interrupt Enable
	pcie_capability_clear_word(dev, PCI_EXP_RTCTL, PCI_EXP_RTCTL_PMEIE);
      schedule_work(<pcie_pme_work_fn>)
    ...
    pcie_pme_work_fn
      # clear PME Status
      pcie_capability_set_dword(dev, PCI_EXP_RTSTA, PCI_EXP_RTSTA_PME);
      pcie_pme_interrupt_enable(port, true);
        # set PME Interrupt Enable
        pcie_capability_set_word(dev, PCI_EXP_RTCTL, PCI_EXP_RTCTL_PMEIE);

The pciehp and AER cases look OK to me.  DPC looks definitely wrong,
and this patch should fix it.

I *guess* PME looks OK, although I would feel better about it if it
used the same strategy as the others.  All of these things have both
"Interrupt Enable" and "Interrupt Status" bits.  PME is the only one
that twiddles the Interrupt Enable in the interrupt path.

Bottom line, I think this patch is fine and I applied it with the
following changelog:

  PCI/DPC: Clear interrupt status in interrupt handler top half

  The generic IRQ handling code ensures that an interrupt handler runs with
  its interrupt masked or disabled.  If the interrupt is level-triggered, the
  interrupt handler must tell its device to stop asserting the interrupt
  before returning.  If it doesn't, we will immediately take the interrupt
  again when the handler returns and the generic code unmasks the interrupt.

  The driver doesn't know whether its interrupt is edge- or level-triggered,
  so it must clear its interrupt source directly in its interrupt handler.

  Previously we cleared the DPC interrupt status in the bottom half, i.e., in
  deferred work, which can cause an interrupt storm if the DPC interrupt
  happens to be level-triggered, e.g., if we're using INTx instead of MSI.

  Clear the DPC interrupt status bit in the interrupt handler, not in the
  deferred work.

  Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
  Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
  [bhelgaas: changelog]
  Reviewed-by: Keith Busch <keith.busch@intel.com>


> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> [changelog]
> Reviewed-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/pcie-dpc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index a9be6938417f..82644245cb4d 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -112,7 +112,7 @@ static void dpc_work(struct work_struct *work)
>  	}
>  
>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> -		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
> +			      PCI_EXP_DPC_STATUS_TRIGGER);
>  
>  	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> @@ -222,6 +222,9 @@ static irqreturn_t dpc_irq(int irq, void *context)
>  	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
>  		dpc_process_rp_pio_error(dpc);
>  
> +	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> +			      PCI_EXP_DPC_STATUS_INTERRUPT);
> +
>  	schedule_work(&dpc->work);
>  
>  	return IRQ_HANDLED;
> -- 
> 2.14.3
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-03 20:38   ` Bjorn Helgaas
@ 2018-04-03 23:04     ` Bjorn Helgaas
  2018-04-04  8:13       ` Thomas Gleixner
  2018-04-26  5:33     ` poza
  2018-05-16 15:16     ` Timur Tabi
  2 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2018-04-03 23:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya,
	Thomas Gleixner, Rafael J. Wysocki

[+cc Thomas, Rafael for real]
On Tue, Apr 03, 2018 at 03:38:47PM -0500, Bjorn Helgaas wrote:
> [+cc Thomas, Rafael]
> 
> On Mon, Apr 02, 2018 at 10:21:58AM -0600, Keith Busch wrote:
> > From: Oza Pawandeep <poza@codeaurora.org>
> > 
> > The DPC driver was acknowledging the DPC interrupt status in deferred
> > work. That works for edge triggered interrupts, but causes an interrupt
> > storm with level triggered legacy interrupts.
> > 
> > This patch fixes that by clearing the interrupt status in interrupt
> > handler.
> 
> I'm fuzzy on this question of edge vs. level and where the interrupt
> should be cleared.  I'd like to understand this better and include the
> general rule in the changelog in case I'm not the only one who is
> unclear on this.
> 
> Here's my understanding, gleaned from kernel/irq/chip.c and
> https://notes.shichao.io/lkd/ch7/ :
> 
>   The generic IRQ handling code ensures that an interrupt handler runs
>   with its interrupt masked or disabled.  If the interrupt is
>   level-triggered, the interrupt handler must tell its device to stop
>   asserting the interrupt before returning.  If it doesn't, we will
>   immediately take the interrupt again when the handler returns and
>   the generic code unmasks the interrupt.
> 
>   The driver doesn't know whether its interrupt is edge- or
>   level-triggered, so it must clear its interrupt source directly in
>   its interrupt handler.
> 
> I'd also like to convince myself that we don't have similar issues
> with other services, e.g., AER, PME, pciehp.  Here are my notes about
> those:
> 
>   1) pciehp:
>     request_irq(irq, pcie_isr, ...)
>     pcie_isr
>       pciehp_isr
>         # clear Slot Status event bits
>         pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
> 	events = status & (...)
> 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> 
>   2) AER:
>     request_irq(dev->irq, aer_irq, ...)
>     aer_irq
>       # clear AER Root Error Status bits
>       pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
>       pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);
> 
>   3) DPC:
>     devm_request_irq(..., dev->irq, dpc_irq, ...)
>     dpc_irq
>       schedule_work(<dpc_work>)
>     ...
>     dpc_work
>       # clear DPC Interrupt Status
>       pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_INTERRUPT);
> 
>   4) PME:
>     request_irq(srv->irq, pcie_pme_irq, ...)
>     pcie_pme_irq
>       pcie_pme_interrupt_enable(port, false);
>         # clear PME Interrupt Enable
> 	pcie_capability_clear_word(dev, PCI_EXP_RTCTL, PCI_EXP_RTCTL_PMEIE);
>       schedule_work(<pcie_pme_work_fn>)
>     ...
>     pcie_pme_work_fn
>       # clear PME Status
>       pcie_capability_set_dword(dev, PCI_EXP_RTSTA, PCI_EXP_RTSTA_PME);
>       pcie_pme_interrupt_enable(port, true);
>         # set PME Interrupt Enable
>         pcie_capability_set_word(dev, PCI_EXP_RTCTL, PCI_EXP_RTCTL_PMEIE);
> 
> The pciehp and AER cases look OK to me.  DPC looks definitely wrong,
> and this patch should fix it.
> 
> I *guess* PME looks OK, although I would feel better about it if it
> used the same strategy as the others.  All of these things have both
> "Interrupt Enable" and "Interrupt Status" bits.  PME is the only one
> that twiddles the Interrupt Enable in the interrupt path.
> 
> Bottom line, I think this patch is fine and I applied it with the
> following changelog:
> 
>   PCI/DPC: Clear interrupt status in interrupt handler top half
> 
>   The generic IRQ handling code ensures that an interrupt handler runs with
>   its interrupt masked or disabled.  If the interrupt is level-triggered, the
>   interrupt handler must tell its device to stop asserting the interrupt
>   before returning.  If it doesn't, we will immediately take the interrupt
>   again when the handler returns and the generic code unmasks the interrupt.
> 
>   The driver doesn't know whether its interrupt is edge- or level-triggered,
>   so it must clear its interrupt source directly in its interrupt handler.
> 
>   Previously we cleared the DPC interrupt status in the bottom half, i.e., in
>   deferred work, which can cause an interrupt storm if the DPC interrupt
>   happens to be level-triggered, e.g., if we're using INTx instead of MSI.
> 
>   Clear the DPC interrupt status bit in the interrupt handler, not in the
>   deferred work.
> 
>   Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>   Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
>   [bhelgaas: changelog]
>   Reviewed-by: Keith Busch <keith.busch@intel.com>
> 
> 
> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> > [changelog]
> > Reviewed-by: Keith Busch <keith.busch@intel.com>
> > ---
> >  drivers/pci/pcie/pcie-dpc.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> > index a9be6938417f..82644245cb4d 100644
> > --- a/drivers/pci/pcie/pcie-dpc.c
> > +++ b/drivers/pci/pcie/pcie-dpc.c
> > @@ -112,7 +112,7 @@ static void dpc_work(struct work_struct *work)
> >  	}
> >  
> >  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> > -		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
> > +			      PCI_EXP_DPC_STATUS_TRIGGER);
> >  
> >  	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
> >  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
> > @@ -222,6 +222,9 @@ static irqreturn_t dpc_irq(int irq, void *context)
> >  	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
> >  		dpc_process_rp_pio_error(dpc);
> >  
> > +	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> > +			      PCI_EXP_DPC_STATUS_INTERRUPT);
> > +
> >  	schedule_work(&dpc->work);
> >  
> >  	return IRQ_HANDLED;
> > -- 
> > 2.14.3
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-03 23:04     ` Bjorn Helgaas
@ 2018-04-04  8:13       ` Thomas Gleixner
  2018-04-04  8:38         ` Lukas Wunner
  0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2018-04-04  8:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya,
	Rafael J. Wysocki

On Tue, 3 Apr 2018, Bjorn Helgaas wrote:

> [+cc Thomas, Rafael for real]
> On Tue, Apr 03, 2018 at 03:38:47PM -0500, Bjorn Helgaas wrote:
> > [+cc Thomas, Rafael]
> > 
> > On Mon, Apr 02, 2018 at 10:21:58AM -0600, Keith Busch wrote:
> > > From: Oza Pawandeep <poza@codeaurora.org>
> > > 
> > > The DPC driver was acknowledging the DPC interrupt status in deferred
> > > work. That works for edge triggered interrupts, but causes an interrupt
> > > storm with level triggered legacy interrupts.

The problem is homebrewn in the driver. So, yes it needs to mask the
interrupt before returning from the irq handler if the rest of the magic is
done in a worker.

Though this could have been avoided if the driver simply would have used a
threaded interrupt. The core handles that correctly for any type of
interrupts, edge/level. The logic there is:

      low_level_handler()
	   if (level)
	   	irq_mask();
	   primary_handler();
	   wake_thread();
	   if (level && !thread_pending)
	   	irq_unmask();

and the thread handler does

      thread_handler()
      	   thread_handler_function();
	   if (!thread_pending && masked && !disabled)
	   	irq_unmask();

There is a reason why threaded interrupt handlers exist....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-04  8:13       ` Thomas Gleixner
@ 2018-04-04  8:38         ` Lukas Wunner
  0 siblings, 0 replies; 25+ messages in thread
From: Lukas Wunner @ 2018-04-04  8:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya,
	Rafael J. Wysocki

On Wed, Apr 04, 2018 at 10:13:32AM +0200, Thomas Gleixner wrote:
> On Tue, 3 Apr 2018, Bjorn Helgaas wrote:
> > On Tue, Apr 03, 2018 at 03:38:47PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 10:21:58AM -0600, Keith Busch wrote:
> > > > From: Oza Pawandeep <poza@codeaurora.org>
> > > > 
> > > > The DPC driver was acknowledging the DPC interrupt status in deferred
> > > > work. That works for edge triggered interrupts, but causes an interrupt
> > > > storm with level triggered legacy interrupts.
> 
> The problem is homebrewn in the driver. So, yes it needs to mask the
> interrupt before returning from the irq handler if the rest of the magic is
> done in a worker.

If the IRQ is shared, the other handlers are starved for a brief
period of time between the IRQ being masked in the handler and
unmasked in the worker.

What is the recommended way to handle this?

I'm asking because I'm working on patches to runtime suspend
pciehp hotplug ports to D3hot.  The first few Thunderbolt
controllers that came to market had broken MSI signalling
and are thus using INTx.  If a hotplug port is signaling an
interrupt while its parent(s) are in D3hot, its config space
is inaccessible for the IRQ handler, so the parent(s) have to
be resumed to D0 first.  This takes too long to be done in an
IRQ handler and I believe can also sleep, so it's done by a worker.

Now the problem is, INTx interrupts may be shared by multiple
devices, and they *are* on my machine.  The conundrum I'm
facing is to mask the IRQ and starve all the other handlers
while the device is woken, or not mask it but risk getting
lots of spurious interrupts.

For now I've gone with the latter approach, so I leave the
IRQ unmasked and return IRQ_NONE because I don't know yet if
the IRQ originated from this particular hotplug port or from
something else.  Now if I check /proc/interrupts, I can see
that about 5000 spurious interrupts were accumulated until
the hotplug port's parents were woken and the interrupt could
finally be handled.

Is there a better way to deal with this?

Just so you get an idea what I'm talking about, this is
/proc/interrupts on a MacBookPro9,1 with a daisy-chain of two
Light Ridge Thunderbolt controllers and one Port Ridge (all
with broken MSI signaling):

  16:  IO-APIC  16-fasteoi  pciehp
  17:  IO-APIC  17-fasteoi  pciehp, pciehp, pciehp, mmc0, snd_hda_intel, b43
  18:  IO-APIC  18-fasteoi  pciehp, pciehp, i801_smbus
  19:  IO-APIC  19-fasteoi  pciehp

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCHv2 6/7] PCI/AER: API for obtaining AER information
  2018-04-02 16:22 ` [PATCHv2 6/7] PCI/AER: API for obtaining AER information Keith Busch
@ 2018-04-09 10:03   ` David Laight
  2018-04-09 14:37     ` Keith Busch
  0 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2018-04-09 10:03 UTC (permalink / raw)
  To: 'Keith Busch', Linux PCI, Bjorn Helgaas; +Cc: Oza Pawandeep, Sinan Kaya

From: Keith Busch
> Sent: 02 April 2018 17:22
> To: Linux PCI; Bjorn Helgaas
> 
> Since this is making the function externally visible, appending "aer_"
> prefix to the function name.
> 
...
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index a4bfea52e7d4..4fb24003cac3 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -638,7 +638,7 @@ static void aer_recover_work_func(struct work_struct *work)
>  #endif
> 
>  /**
> - * get_device_error_info - read error status from dev and store it to info
> + * aer_get_device_error_info - read error status from dev and store it to info
>   * @dev: pointer to the device expected to have a error record
>   * @info: pointer to structure to store the error record
>   *
> @@ -646,7 +646,7 @@ static void aer_recover_work_func(struct work_struct *work)
>   *
>   * Note that @info is reused among all error devices. Clear fields properly.
>   */
> -static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
> +int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>  {
>  	int pos, temp;
> 
> @@ -705,11 +705,11 @@ static inline void aer_process_err_devices(struct pcie_device *p_device,
> 
>  	/* Report all before handle them, not to lost records by reset etc. */
>  	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
> -		if (get_device_error_info(e_info->dev[i], e_info))
> +		if (aer_get_device_error_info(e_info->dev[i], e_info))
>  			aer_print_error(e_info->dev[i], e_info);
>  	}
>  	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
> -		if (get_device_error_info(e_info->dev[i], e_info))
> +		if (aer_get_device_error_info(e_info->dev[i], e_info))
>  			handle_error_source(p_device, e_info->dev[i], e_info);
>  	}
>  }

Should there be an EXPORT_SYMBOL(aer_get_device_error_info) here?
(preferable without _GPL).

	David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 6/7] PCI/AER: API for obtaining AER information
  2018-04-09 10:03   ` David Laight
@ 2018-04-09 14:37     ` Keith Busch
  0 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2018-04-09 14:37 UTC (permalink / raw)
  To: David Laight; +Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Mon, Apr 09, 2018 at 10:03:14AM +0000, David Laight wrote:
> 
> Should there be an EXPORT_SYMBOL(aer_get_device_error_info) here?
> (preferable without _GPL).
> 
> 	David

That's necessary only if a module is using the symbol. All existing and
proposed users are built-in, so no need to export it at this time.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-03 20:38   ` Bjorn Helgaas
  2018-04-03 23:04     ` Bjorn Helgaas
@ 2018-04-26  5:33     ` poza
  2018-05-16 15:16     ` Timur Tabi
  2 siblings, 0 replies; 25+ messages in thread
From: poza @ 2018-04-26  5:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Sinan Kaya, linux-pci-owner

On 2018-04-04 02:08, Bjorn Helgaas wrote:
> [+cc Thomas, Rafael]
> 
> On Mon, Apr 02, 2018 at 10:21:58AM -0600, Keith Busch wrote:
>> From: Oza Pawandeep <poza@codeaurora.org>
>> 
>> The DPC driver was acknowledging the DPC interrupt status in deferred
>> work. That works for edge triggered interrupts, but causes an 
>> interrupt
>> storm with level triggered legacy interrupts.
>> 
>> This patch fixes that by clearing the interrupt status in interrupt
>> handler.
> 
> I'm fuzzy on this question of edge vs. level and where the interrupt
> should be cleared.  I'd like to understand this better and include the
> general rule in the changelog in case I'm not the only one who is
> unclear on this.
> 
> Here's my understanding, gleaned from kernel/irq/chip.c and
> https://notes.shichao.io/lkd/ch7/ :
> 
>   The generic IRQ handling code ensures that an interrupt handler runs
>   with its interrupt masked or disabled.  If the interrupt is
>   level-triggered, the interrupt handler must tell its device to stop
>   asserting the interrupt before returning.  If it doesn't, we will
>   immediately take the interrupt again when the handler returns and
>   the generic code unmasks the interrupt.
> 
>   The driver doesn't know whether its interrupt is edge- or
>   level-triggered, so it must clear its interrupt source directly in
>   its interrupt handler.
> 
> I'd also like to convince myself that we don't have similar issues
> with other services, e.g., AER, PME, pciehp.  Here are my notes about
> those:
> 
>   1) pciehp:
>     request_irq(irq, pcie_isr, ...)
>     pcie_isr
>       pciehp_isr
>         # clear Slot Status event bits
>         pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
> 	events = status & (...)
> 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> 
>   2) AER:
>     request_irq(dev->irq, aer_irq, ...)
>     aer_irq
>       # clear AER Root Error Status bits
>       pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, 
> &status);
>       pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, 
> status);
> 
>   3) DPC:
>     devm_request_irq(..., dev->irq, dpc_irq, ...)
>     dpc_irq
>       schedule_work(<dpc_work>)
>     ...
>     dpc_work
>       # clear DPC Interrupt Status
>       pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> PCI_EXP_DPC_STATUS_INTERRUPT);
> 
>   4) PME:
>     request_irq(srv->irq, pcie_pme_irq, ...)
>     pcie_pme_irq
>       pcie_pme_interrupt_enable(port, false);
>         # clear PME Interrupt Enable
> 	pcie_capability_clear_word(dev, PCI_EXP_RTCTL, PCI_EXP_RTCTL_PMEIE);
>       schedule_work(<pcie_pme_work_fn>)
>     ...
>     pcie_pme_work_fn
>       # clear PME Status
>       pcie_capability_set_dword(dev, PCI_EXP_RTSTA, PCI_EXP_RTSTA_PME);
>       pcie_pme_interrupt_enable(port, true);
>         # set PME Interrupt Enable
>         pcie_capability_set_word(dev, PCI_EXP_RTCTL, 
> PCI_EXP_RTCTL_PMEIE);
> 
> The pciehp and AER cases look OK to me.  DPC looks definitely wrong,
> and this patch should fix it.
> 
> I *guess* PME looks OK, although I would feel better about it if it
> used the same strategy as the others.  All of these things have both
> "Interrupt Enable" and "Interrupt Status" bits.  PME is the only one
> that twiddles the Interrupt Enable in the interrupt path.
> 
> Bottom line, I think this patch is fine and I applied it with the
> following changelog:
> 
>   PCI/DPC: Clear interrupt status in interrupt handler top half
> 
>   The generic IRQ handling code ensures that an interrupt handler runs 
> with
>   its interrupt masked or disabled.  If the interrupt is 
> level-triggered, the
>   interrupt handler must tell its device to stop asserting the 
> interrupt
>   before returning.  If it doesn't, we will immediately take the 
> interrupt
>   again when the handler returns and the generic code unmasks the 
> interrupt.
> 
>   The driver doesn't know whether its interrupt is edge- or 
> level-triggered,
>   so it must clear its interrupt source directly in its interrupt 
> handler.
> 
>   Previously we cleared the DPC interrupt status in the bottom half, 
> i.e., in
>   deferred work, which can cause an interrupt storm if the DPC 
> interrupt
>   happens to be level-triggered, e.g., if we're using INTx instead of 
> MSI.
> 
>   Clear the DPC interrupt status bit in the interrupt handler, not in 
> the
>   deferred work.
> 
>   Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>   Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
>   [bhelgaas: changelog]
>   Reviewed-by: Keith Busch <keith.busch@intel.com>
> 
> 
>> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> [changelog]
>> Reviewed-by: Keith Busch <keith.busch@intel.com>
>> ---
>>  drivers/pci/pcie/pcie-dpc.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
>> index a9be6938417f..82644245cb4d 100644
>> --- a/drivers/pci/pcie/pcie-dpc.c
>> +++ b/drivers/pci/pcie/pcie-dpc.c
>> @@ -112,7 +112,7 @@ static void dpc_work(struct work_struct *work)
>>  	}
>> 
>>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>> -		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
>> +			      PCI_EXP_DPC_STATUS_TRIGGER);
>> 
>>  	pci_read_config_word(pdev, cap + PCI_EXP_DPC_CTL, &ctl);
>>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_CTL,
>> @@ -222,6 +222,9 @@ static irqreturn_t dpc_irq(int irq, void *context)
>>  	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
>>  		dpc_process_rp_pio_error(dpc);
>> 
>> +	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>> +			      PCI_EXP_DPC_STATUS_INTERRUPT);
>> +
>>  	schedule_work(&dpc->work);
>> 
>>  	return IRQ_HANDLED;
>> --
>> 2.14.3
>> 

Hi Bjorn,

Is this change left out in 4.17 for some reason ?


Regards,
Oza.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-04-03 20:38   ` Bjorn Helgaas
  2018-04-03 23:04     ` Bjorn Helgaas
  2018-04-26  5:33     ` poza
@ 2018-05-16 15:16     ` Timur Tabi
  2018-05-16 21:04       ` Bjorn Helgaas
  2 siblings, 1 reply; 25+ messages in thread
From: Timur Tabi @ 2018-05-16 15:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Tue, Apr 3, 2018 at 3:38 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Bottom line, I think this patch is fine and I applied it with the
> following changelog:
>
>   PCI/DPC: Clear interrupt status in interrupt handler top half

Bjorn,

I can't find this patch (with either subject line) anywhere on
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/ or
linux-next.  I see you pulled in Oza's other patch set, but this patch
is still MIA.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement
  2018-05-16 15:16     ` Timur Tabi
@ 2018-05-16 21:04       ` Bjorn Helgaas
  0 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2018-05-16 21:04 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Wed, May 16, 2018 at 10:16:15AM -0500, Timur Tabi wrote:
> On Tue, Apr 3, 2018 at 3:38 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > Bottom line, I think this patch is fine and I applied it with the
> > following changelog:
> >
> >   PCI/DPC: Clear interrupt status in interrupt handler top half
> 
> Bjorn,
> 
> I can't find this patch (with either subject line) anywhere on
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/ or
> linux-next.  I see you pulled in Oza's other patch set, but this patch
> is still MIA.

I blew it, sorry.  I think I was hoping to clarify the question of
using threaded interrupts, but that never happened.

I applied this to pci/dpc for v4.18 and pushed it.

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling
  2018-04-02 16:22 ` [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling Keith Busch
@ 2018-05-17 15:02   ` poza
  2018-05-17 15:04     ` poza
  0 siblings, 1 reply; 25+ messages in thread
From: poza @ 2018-05-17 15:02 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Sinan Kaya

On 2018-04-02 21:52, Keith Busch wrote:
> A DPC enabled device suppresses ERR_(NON)FATAL messages, preventing the
> AER handler from reporting error details. If the DPC trigger reason 
> says
> the downstream port detected the error, this patch has the DPC driver
> collect the AER uncorrectable status for logging, then clears the 
> status.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/pcie-dpc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index e12837ee4f1c..76f963a5089e 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -85,6 +85,7 @@ static void dpc_wait_link_inactive(struct dpc_dev 
> *dpc)
> 
>  static void dpc_work(struct work_struct *work)
>  {
> +	struct aer_err_info info;
>  	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
>  	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
>  	struct pci_bus *parent = pdev->subordinate;
> @@ -108,8 +109,12 @@ static void dpc_work(struct work_struct *work)
>  		 (ext_reason == 1) ? "software trigger" :
>  				     "reserved error");
> 
> -	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
> +	if (dpc->rp_extensions && reason == 3 && ext_reason == 0) {
>  		pio_status = dpc_process_rp_pio_error(dpc);
> +	} else if (reason == 0 && aer_get_device_error_info(pdev, &info)) {
> +		aer_print_error(pdev, &info);
> +		pci_cleanup_aer_uncorrect_error_status(pdev);
> +	}
> 
>  	pci_lock_rescan_remove();
>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,


I am not sure if this series is in pursuit. but was wondering why do we 
need to clear aer status in DPC (when DPC is triggered)?
because

6.2.10 for Downstream Port Containment:

   When DPC is triggered due to receipt of an uncorrectable error 
Message,
   the Requester ID from the Message is recorded in the DPC Error
   Source ID register and that Message is discarded and not forwarded
   Upstream. When DPC is triggered by an unmasked uncorrectable error,
   that error will not be signaled with an uncorrectable error Message,
   even if otherwise enabled.

so when DPC is active, the msg is discarded and not forwarded to 
upstream.
which means that we should find AER status set in RP or Switch.

Regards,
Oza.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling
  2018-05-17 15:02   ` poza
@ 2018-05-17 15:04     ` poza
  0 siblings, 0 replies; 25+ messages in thread
From: poza @ 2018-05-17 15:04 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Sinan Kaya, linux-pci-owner

On 2018-05-17 20:32, poza@codeaurora.org wrote:
> On 2018-04-02 21:52, Keith Busch wrote:
>> A DPC enabled device suppresses ERR_(NON)FATAL messages, preventing 
>> the
>> AER handler from reporting error details. If the DPC trigger reason 
>> says
>> the downstream port detected the error, this patch has the DPC driver
>> collect the AER uncorrectable status for logging, then clears the 
>> status.
>> 
>> Signed-off-by: Keith Busch <keith.busch@intel.com>
>> ---
>>  drivers/pci/pcie/pcie-dpc.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
>> index e12837ee4f1c..76f963a5089e 100644
>> --- a/drivers/pci/pcie/pcie-dpc.c
>> +++ b/drivers/pci/pcie/pcie-dpc.c
>> @@ -85,6 +85,7 @@ static void dpc_wait_link_inactive(struct dpc_dev 
>> *dpc)
>> 
>>  static void dpc_work(struct work_struct *work)
>>  {
>> +	struct aer_err_info info;
>>  	struct dpc_dev *dpc = container_of(work, struct dpc_dev, work);
>>  	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
>>  	struct pci_bus *parent = pdev->subordinate;
>> @@ -108,8 +109,12 @@ static void dpc_work(struct work_struct *work)
>>  		 (ext_reason == 1) ? "software trigger" :
>>  				     "reserved error");
>> 
>> -	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
>> +	if (dpc->rp_extensions && reason == 3 && ext_reason == 0) {
>>  		pio_status = dpc_process_rp_pio_error(dpc);
>> +	} else if (reason == 0 && aer_get_device_error_info(pdev, &info)) {
>> +		aer_print_error(pdev, &info);
>> +		pci_cleanup_aer_uncorrect_error_status(pdev);
>> +	}
>> 
>>  	pci_lock_rescan_remove();
>>  	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> 
> 
> I am not sure if this series is in pursuit. but was wondering why do
> we need to clear aer status in DPC (when DPC is triggered)?
> because
> 
> 6.2.10 for Downstream Port Containment:
> 
>   When DPC is triggered due to receipt of an uncorrectable error 
> Message,
>   the Requester ID from the Message is recorded in the DPC Error
>   Source ID register and that Message is discarded and not forwarded
>   Upstream. When DPC is triggered by an unmasked uncorrectable error,
>   that error will not be signaled with an uncorrectable error Message,
>   even if otherwise enabled.
> 
> so when DPC is active, the msg is discarded and not forwarded to 
> upstream.
> which means that we should find AER status set in RP or Switch.
> 
> Regards,
> Oza.

Correction: which means that we should *not* find AER status set in RP 
or Switch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 0/7] DPC updates
  2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
                   ` (6 preceding siblings ...)
  2018-04-02 16:22 ` [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling Keith Busch
@ 2018-06-19 21:31 ` Bjorn Helgaas
  7 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2018-06-19 21:31 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Oza Pawandeep, Sinan Kaya

On Mon, Apr 02, 2018 at 10:21:56AM -0600, Keith Busch wrote:
> I am terribly sorry about the delay getting this back to you. I had some
> scheduling issues with test equipment and other unrelated conflicts.
> 
> I added Oza's legacy IRQ fix into this series, as I didn't see this
> staged yet, and it really does fix some issues as well as provide the
> means to clean up other parts.
> 
> I'm sending this later than I intended, but hoping you might see a way
> to including this to 4.17. I can rebase to 'next' where the file is
> renamed to 'dpc.c'.
> 
> Keith Busch (6):
>   PCI/DPC: Enable ERR_COR
>   PCI/DPC: Leave interrupts enabled while handling event
>   PCI/DPC: Defer event handling to work queue
>   PCI/DPC: Remove rp_pio_status from dpc struct
>   PCI/AER: API for obtaining AER information
>   PCI/DPC: Print AER status in DPC event handling
> 
> Oza Pawandeep (1):
>   PCI/DPC: Fix PCI legacy interrupt acknowledgement
> 
>  drivers/pci/pcie/aer/aerdrv.h      |   1 +
>  drivers/pci/pcie/aer/aerdrv_core.c |   8 +--
>  drivers/pci/pcie/pcie-dpc.c        | 106 +++++++++++++++++--------------------
>  include/uapi/linux/pci_regs.h      |   1 +
>  4 files changed, 55 insertions(+), 61 deletions(-)

Hi Keith, I'm sorry, with the exception of Oza's patch, I didn't get
these applied, and now they don't apply cleanly.  Would you mind
rebasing them to v4.18-rc1?

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-06-19 21:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02 16:21 [PATCHv2 0/7] DPC updates Keith Busch
2018-04-02 16:21 ` [PATCHv2 1/7] PCI/DPC: Enable ERR_COR Keith Busch
2018-04-02 21:23   ` Bjorn Helgaas
2018-04-02 23:09     ` Keith Busch
2018-04-03 14:16       ` Bjorn Helgaas
2018-04-03 14:31         ` Rafael J. Wysocki
2018-04-03 14:37           ` Keith Busch
2018-04-02 16:21 ` [PATCHv2 2/7] PCI/DPC: Fix PCI legacy interrupt acknowledgement Keith Busch
2018-04-03 20:38   ` Bjorn Helgaas
2018-04-03 23:04     ` Bjorn Helgaas
2018-04-04  8:13       ` Thomas Gleixner
2018-04-04  8:38         ` Lukas Wunner
2018-04-26  5:33     ` poza
2018-05-16 15:16     ` Timur Tabi
2018-05-16 21:04       ` Bjorn Helgaas
2018-04-02 16:21 ` [PATCHv2 3/7] PCI/DPC: Leave interrupts enabled while handling event Keith Busch
2018-04-02 16:22 ` [PATCHv2 4/7] PCI/DPC: Defer event handling to work queue Keith Busch
2018-04-02 16:22 ` [PATCHv2 5/7] PCI/DPC: Remove rp_pio_status from dpc struct Keith Busch
2018-04-02 16:22 ` [PATCHv2 6/7] PCI/AER: API for obtaining AER information Keith Busch
2018-04-09 10:03   ` David Laight
2018-04-09 14:37     ` Keith Busch
2018-04-02 16:22 ` [PATCHv2 7/7] PCI/DPC: Print AER status in DPC event handling Keith Busch
2018-05-17 15:02   ` poza
2018-05-17 15:04     ` poza
2018-06-19 21:31 ` [PATCHv2 0/7] DPC updates Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).