ntb.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api
@ 2022-09-28 10:59 Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core Zhuo Chen
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

Hello.

Here comes patch v3, which contains some fixes and optimizations of
aer api usage. The v1 and v2 can be found on the mailing list.

v3:
- Modifications to comments proposed by Sathyanarayanan. Remove
  pci_aer_clear_nonfatal_status() call in NTB and improve commit log. 

v2:
- Modifications to comments proposed by Bjorn. Split patch into more
  obvious parts.

Zhuo Chen (9):
  PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
    uncorrectable error status
  NTB: Remove pci_aer_clear_nonfatal_status() call
  scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  PCI/AER: Unexport pci_aer_clear_nonfatal_status()
  PCI/AER: Move check inside pcie_clear_device_status().
  PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
  PCI/ERR: Clear fatal error status when pci_channel_io_frozen
  PCI/AER: Refine status clearing process with api

 drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
 drivers/pci/pci.c               |  7 +++--
 drivers/pci/pci.h               |  2 ++
 drivers/pci/pcie/aer.c          | 45 +++++++++++++++++++--------------
 drivers/pci/pcie/dpc.c          |  3 +--
 drivers/pci/pcie/err.c          | 15 ++++-------
 drivers/pci/pcie/portdrv_core.c |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
 include/linux/aer.h             |  4 +--
 9 files changed, 44 insertions(+), 41 deletions(-)

-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2023-03-15 21:29   ` Bjorn Helgaas
  2022-09-28 10:59 ` [PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
cleared, which can be done by calling pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
one function (the same in dpc_process_error). So add
pci_aer_clear_uncorrect_error_status() function to PCI core and
export symbol to other modules which wants to use it.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/aer.c | 16 ++++++++++++++++
 include/linux/aer.h    |  5 +++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..4e637121be23 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
 		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+	int aer = dev->aer_cap;
+	u32 status;
+
+	if (!pcie_aer_is_native(dev))
+		return -EIO;
+
+	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
+	if (status)
+		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..154690c278cb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+	return -EINVAL;
+}
 static inline void pci_save_aer_state(struct pci_dev *dev) {}
 static inline void pci_restore_aer_state(struct pci_dev *dev) {}
 #endif
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call Zhuo Chen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

pci_aer_clear_uncorrect_error_status() clears both fatal and non-fatal
errors. So use it in place of pci_aer_clear_nonfatal_status()
and pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/dpc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
-		pci_aer_clear_nonfatal_status(pdev);
-		pci_aer_clear_fatal_status(pdev);
+		pci_aer_clear_uncorrect_error_status(pdev);
 	}
 }
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 11:03   ` Serge Semin
  2023-03-15 21:31   ` Bjorn Helgaas
  2022-09-28 10:59 ` [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status() Zhuo Chen
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

There is no need to clear error status during init code, so remove it.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..fed03217289d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
 	ret = pci_enable_pcie_error_reporting(pdev);
 	if (ret != 0)
 		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
-	else /* Cleanup nonfatal error status before getting to init */
-		pci_aer_clear_nonfatal_status(pdev);
 
 	/* First enable the PCI device */
 	ret = pcim_enable_device(pdev);
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (2 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-12-06 22:13   ` Bjorn Helgaas
  2022-09-28 10:59 ` [PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status() Zhuo Chen
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
uncorrectable error status. But using pci_aer_clear_nonfatal_status()
will only clear non-fatal error status. To clear both fatal and
non-fatal error status, use pci_aer_clear_uncorrect_error_status().

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 09cf2cd0ae60..d835cc0ba153 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
+ * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 
 	if (phba->hba_flag & HBA_AER_ENABLED)
-		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
+		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
 
 	if (rc == 0)
 		return strlen(buf);
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status()
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (3 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status() Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status() Zhuo Chen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pci.h      | 2 ++
 drivers/pci/pcie/aer.c | 7 ++-----
 include/linux/aer.h    | 5 -----
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 785f31086313..a114175d08e4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -684,6 +684,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -691,6 +692,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4e637121be23..e2ebd108339d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
 	u32 status, sev;
 
 	if (!pcie_aer_is_native(dev))
-		return -EIO;
+		return;
 
 	/* Clear status bits for ERR_NONFATAL errors only */
 	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 	status &= ~sev;
 	if (status)
 		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 154690c278cb..f638ad955deb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,7 +44,6 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
@@ -57,10 +56,6 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
-static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
-{
-	return -EINVAL;
-}
 static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
 {
 	return -EINVAL;
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status().
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (4 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status() Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER Zhuo Chen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after commit 068c29a248b6 ("PCI/ERR: Clear PCIe Device
Status errors only if OS owns AER") and commit aa344bc8b727 ("PCI/ERR:
Clear AER status only when we control AER"), both callers check before
calling it. So move the check inside pcie_clear_device_status().

pcie_clear_device_status() and pci_aer_clear_nonfatal_status() both
have check internally, so remove check when callers calling them.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pci.c      |  7 +++++--
 drivers/pci/pcie/aer.c |  4 ++--
 drivers/pci/pcie/err.c | 14 +++-----------
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 95bc329e74c0..8caf4a5529a1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2282,9 +2282,12 @@ EXPORT_SYMBOL_GPL(pci_set_pcie_reset_state);
 void pcie_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
+	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
-	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
-	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+	if (host->native_aer || pcie_ports_native) {
+		pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
+		pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+	}
 }
 #endif
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2ebd108339d..e2320ab27a31 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -971,11 +971,11 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
 		 * Correctable error does not need software intervention.
 		 * No need to go through error recovery process.
 		 */
-		if (aer)
+		if (aer) {
 			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
 					info->status);
-		if (pcie_aer_is_native(dev))
 			pcie_clear_device_status(dev);
+		}
 	} else if (info->severity == AER_NONFATAL)
 		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
 	else if (info->severity == AER_FATAL)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 59c90d04a609..f80b21244ef1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -188,7 +188,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	int type = pci_pcie_type(dev);
 	struct pci_dev *bridge;
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 
 	/*
 	 * If the error was detected by a Root Port, Downstream Port, RCEC,
@@ -241,16 +240,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(bridge, "broadcast resume message\n");
 	pci_walk_bridge(bridge, report_resume, &status);
 
-	/*
-	 * If we have native control of AER, clear error status in the device
-	 * that detected the error.  If the platform retained control of AER,
-	 * it is responsible for clearing this status.  In that case, the
-	 * signaling device may not even be visible to the OS.
-	 */
-	if (host->native_aer || pcie_ports_native) {
-		pcie_clear_device_status(dev);
-		pci_aer_clear_nonfatal_status(dev);
-	}
+	pcie_clear_device_status(dev);
+	pci_aer_clear_nonfatal_status(dev);
+
 	pci_info(bridge, "device recovery successful\n");
 	return status;
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (5 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status() Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 10:59 ` [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen Zhuo Chen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

Use pcie_aer_is_native() in place of "host->native_aer ||
pcie_ports_native" to judge whether OS owns AER in aer_root_reset().

Replace "dev->aer_cap && (pcie_ports_native || host->native_aer)" in
get_port_device_capability() with pcie_aer_is_native(), which has no
functional changes.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/aer.c          | 5 ++---
 drivers/pci/pcie/portdrv_core.c | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2320ab27a31..a6d29269ccf2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1403,7 +1403,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	int type = pci_pcie_type(dev);
 	struct pci_dev *root;
 	int aer;
-	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 	u32 reg32;
 	int rc;
 
@@ -1424,7 +1423,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	 */
 	aer = root ? root->aer_cap : 0;
 
-	if ((host->native_aer || pcie_ports_native) && aer) {
+	if (aer && pcie_aer_is_native(root)) {
 		/* Disable Root's interrupt in response to error messages */
 		pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
 		reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
@@ -1443,7 +1442,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 			pci_is_root_bus(dev->bus) ? "Root" : "Downstream", rc);
 	}
 
-	if ((host->native_aer || pcie_ports_native) && aer) {
+	if (aer && pcie_aer_is_native(root)) {
 		/* Clear Root Error Status */
 		pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
 		pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 1ac7fec47d6f..844297c0c85e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -221,8 +221,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 	}
 
 #ifdef CONFIG_PCIEAER
-	if (dev->aer_cap && pci_aer_available() &&
-	    (pcie_ports_native || host->native_aer))
+	if (pcie_aer_is_native(dev) && pci_aer_available())
 		services |= PCIE_PORT_SERVICE_AER;
 #endif
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (6 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-12-06 21:42   ` Bjorn Helgaas
  2022-09-28 10:59 ` [PATCH v3 9/9] PCI/AER: Refine status clearing process with api Zhuo Chen
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

When state is pci_channel_io_frozen in pcie_do_recovery(), the
severity is fatal and fatal error status should be cleared.
So add pci_aer_clear_fatal_status().

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/err.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f80b21244ef1..b46f1d36c090 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bridge(bridge, report_resume, &status);
 
 	pcie_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
+	if (state == pci_channel_io_frozen)
+		pci_aer_clear_fatal_status(dev);
+	else
+		pci_aer_clear_nonfatal_status(dev);
 
 	pci_info(bridge, "device recovery successful\n");
 	return status;
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 9/9] PCI/AER: Refine status clearing process with api
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (7 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen Zhuo Chen
@ 2022-09-28 10:59 ` Zhuo Chen
  2022-09-28 11:06 ` [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Serge Semin
  2022-11-04 17:20 ` Zhuo Chen
  10 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 10:59 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

Statements clearing status in aer_enable_rootport() is functionally
equivalent with pcie_clear_device_status() and pci_aer_clear_status().
So replace them, which has no functional changes.

After commit 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status()
to unconditionally clear Error Status"), pci_aer_raw_clear_status()
is only used by the EDR path, so we add note in function comment.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/aer.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a6d29269ccf2..bd5ecfa4860f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -306,6 +306,8 @@ EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
  * Clearing AER error status registers unconditionally, regardless of
  * whether they're owned by firmware or the OS.
  *
+ * Used only by the EDR path. Other paths should use pci_aer_clear_status().
+ *
  * Returns 0 on success, or negative on failure.
  */
 int pci_aer_raw_clear_status(struct pci_dev *dev)
@@ -1277,24 +1279,17 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 {
 	struct pci_dev *pdev = rpc->rpd;
 	int aer = pdev->aer_cap;
-	u16 reg16;
 	u32 reg32;
 
 	/* Clear PCIe Capability's Device Status */
-	pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
-	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+	pcie_clear_device_status(pdev);
 
 	/* Disable system error generation in response to error messages */
 	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
 				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
 	/* Clear error status */
-	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+	pci_aer_clear_status(pdev);
 
 	/*
 	 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call
  2022-09-28 10:59 ` [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call Zhuo Chen
@ 2022-09-28 11:03   ` Serge Semin
  2022-12-06 18:09     ` Bjorn Helgaas
  2023-03-15 21:31   ` Bjorn Helgaas
  1 sibling, 1 reply; 23+ messages in thread
From: Serge Semin @ 2022-09-28 11:03 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall, jdmason,
	dave.jiang, allenbh, james.smart, dick.kennedy, jejb,
	martin.petersen, linuxppc-dev, linux-pci, linux-kernel, ntb,
	linux-scsi

On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.

Why do you think there isn't? Justify in more details.

-Sergey

> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>  	ret = pci_enable_pcie_error_reporting(pdev);
>  	if (ret != 0)
>  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> -	else /* Cleanup nonfatal error status before getting to init */
> -		pci_aer_clear_nonfatal_status(pdev);
>  
>  	/* First enable the PCI device */
>  	ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (8 preceding siblings ...)
  2022-09-28 10:59 ` [PATCH v3 9/9] PCI/AER: Refine status clearing process with api Zhuo Chen
@ 2022-09-28 11:06 ` Serge Semin
  2022-09-28 15:54   ` [External] " Zhuo Chen
  2022-11-04 17:20 ` Zhuo Chen
  10 siblings, 1 reply; 23+ messages in thread
From: Serge Semin @ 2022-09-28 11:06 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall, jdmason,
	dave.jiang, allenbh, james.smart, dick.kennedy, jejb,
	martin.petersen, linuxppc-dev, linux-pci, linux-kernel, ntb,
	linux-scsi

On Wed, Sep 28, 2022 at 06:59:37PM +0800, Zhuo Chen wrote:
> Hello.
> 
> Here comes patch v3, which contains some fixes and optimizations of
> aer api usage. The v1 and v2 can be found on the mailing list.
> 
> v3:
> - Modifications to comments proposed by Sathyanarayanan.

> Remove
>   pci_aer_clear_nonfatal_status() call in NTB and improve commit log. 

Failed to see who has requested that...

-Sergey

> 
> v2:
> - Modifications to comments proposed by Bjorn. Split patch into more
>   obvious parts.
> 
> Zhuo Chen (9):
>   PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
>   PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
>     uncorrectable error status
>   NTB: Remove pci_aer_clear_nonfatal_status() call
>   scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
>   PCI/AER: Unexport pci_aer_clear_nonfatal_status()
>   PCI/AER: Move check inside pcie_clear_device_status().
>   PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
>   PCI/ERR: Clear fatal error status when pci_channel_io_frozen
>   PCI/AER: Refine status clearing process with api
> 
>  drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
>  drivers/pci/pci.c               |  7 +++--
>  drivers/pci/pci.h               |  2 ++
>  drivers/pci/pcie/aer.c          | 45 +++++++++++++++++++--------------
>  drivers/pci/pcie/dpc.c          |  3 +--
>  drivers/pci/pcie/err.c          | 15 ++++-------
>  drivers/pci/pcie/portdrv_core.c |  3 +--
>  drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
>  include/linux/aer.h             |  4 +--
>  9 files changed, 44 insertions(+), 41 deletions(-)
> 
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [External] Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api
  2022-09-28 11:06 ` [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Serge Semin
@ 2022-09-28 15:54   ` Zhuo Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-09-28 15:54 UTC (permalink / raw)
  To: Serge Semin, bhelgaas, Sathyanarayanan Kuppuswamy
  Cc: ruscur, oohall, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen, linuxppc-dev, linux-pci,
	linux-kernel, ntb, linux-scsi



On 9/28/22 7:06 PM, Serge Semin wrote:
> On Wed, Sep 28, 2022 at 06:59:37PM +0800, Zhuo Chen wrote:
>> Hello.
>>
>> Here comes patch v3, which contains some fixes and optimizations of
>> aer api usage. The v1 and v2 can be found on the mailing list.
>>
>> v3:
>> - Modifications to comments proposed by Sathyanarayanan.
> 
>> Remove
>>    pci_aer_clear_nonfatal_status() call in NTB and improve commit log.
> 
> Failed to see who has requested that...
> 
> -Sergey
> 
Hi, Sergey

Currently other vendor drivers do not clear error status in their own 
init code, I don't exactly know what is special reason for clearing 
error status during init code in ntb driver.

An evidence is in pci_aer_init(), PCI core driver has do 
pci_aer_clear_status() and pci_enable_pcie_error_reporting() in common 
process. So vendor drivers don't need to do again.

But I don't know the reason why many vendor drivers reserve 
pci_enable_pcie_error_reporting() after commit f26e58bf6f54 ("PCI/AER: 
Enable error reporting when AER is native"). Do they need to be removed?
Could Bjorn and Sathyanarayanan help look into it, thanks a lot.

Thanks.
>>
>> v2:
>> - Modifications to comments proposed by Bjorn. Split patch into more
>>    obvious parts.
>>
>> Zhuo Chen (9):
>>    PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
>>    PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
>>      uncorrectable error status
>>    NTB: Remove pci_aer_clear_nonfatal_status() call
>>    scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
>>    PCI/AER: Unexport pci_aer_clear_nonfatal_status()
>>    PCI/AER: Move check inside pcie_clear_device_status().
>>    PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
>>    PCI/ERR: Clear fatal error status when pci_channel_io_frozen
>>    PCI/AER: Refine status clearing process with api
>>
>>   drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
>>   drivers/pci/pci.c               |  7 +++--
>>   drivers/pci/pci.h               |  2 ++
>>   drivers/pci/pcie/aer.c          | 45 +++++++++++++++++++--------------
>>   drivers/pci/pcie/dpc.c          |  3 +--
>>   drivers/pci/pcie/err.c          | 15 ++++-------
>>   drivers/pci/pcie/portdrv_core.c |  3 +--
>>   drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
>>   include/linux/aer.h             |  4 +--
>>   9 files changed, 44 insertions(+), 41 deletions(-)
>>
>> -- 
>> 2.30.1 (Apple Git-130)
>>

-- 
Zhuo Chen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api
  2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
                   ` (9 preceding siblings ...)
  2022-09-28 11:06 ` [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Serge Semin
@ 2022-11-04 17:20 ` Zhuo Chen
  2022-11-24 11:55   ` Zhuo Chen
  10 siblings, 1 reply; 23+ messages in thread
From: Zhuo Chen @ 2022-11-04 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Sathyanarayanan Kuppuswamy
  Cc: ruscur, oohall, fancer.lancer, jdmason, dave.jiang, allenbh,
	james.smart, dick.kennedy, jejb, martin.petersen, bhelgaas,
	linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi

Hi Bjorn, a gentle reminder.

Thanks and regards.

On 9/28/22 6:59 PM, Zhuo Chen wrote:
> Hello.
> 
> Here comes patch v3, which contains some fixes and optimizations of
> aer api usage. The v1 and v2 can be found on the mailing list.
> 
> v3:
> - Modifications to comments proposed by Sathyanarayanan. Remove
>    pci_aer_clear_nonfatal_status() call in NTB and improve commit log.
> 
> v2:
> - Modifications to comments proposed by Bjorn. Split patch into more
>    obvious parts.
> 
> Zhuo Chen (9):
>    PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
>    PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
>      uncorrectable error status
>    NTB: Remove pci_aer_clear_nonfatal_status() call
>    scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
>    PCI/AER: Unexport pci_aer_clear_nonfatal_status()
>    PCI/AER: Move check inside pcie_clear_device_status().
>    PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
>    PCI/ERR: Clear fatal error status when pci_channel_io_frozen
>    PCI/AER: Refine status clearing process with api
> 
>   drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
>   drivers/pci/pci.c               |  7 +++--
>   drivers/pci/pci.h               |  2 ++
>   drivers/pci/pcie/aer.c          | 45 +++++++++++++++++++--------------
>   drivers/pci/pcie/dpc.c          |  3 +--
>   drivers/pci/pcie/err.c          | 15 ++++-------
>   drivers/pci/pcie/portdrv_core.c |  3 +--
>   drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
>   include/linux/aer.h             |  4 +--
>   9 files changed, 44 insertions(+), 41 deletions(-)
> 

-- 
Zhuo Chen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api
  2022-11-04 17:20 ` Zhuo Chen
@ 2022-11-24 11:55   ` Zhuo Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Zhuo Chen @ 2022-11-24 11:55 UTC (permalink / raw)
  To: Bjorn Helgaas, Sathyanarayanan Kuppuswamy
  Cc: ruscur, oohall, fancer.lancer, jdmason, dave.jiang, allenbh,
	james.smart, dick.kennedy, jejb, martin.petersen, bhelgaas,
	linuxppc-dev, linux-pci, linux-kernel, ntb, linux-scsi


Ping. Gentle reminder


On 11/5/22 1:20 AM, Zhuo Chen wrote:
> Hi Bjorn, a gentle reminder.
> 
> Thanks and regards.
> 
> On 9/28/22 6:59 PM, Zhuo Chen wrote:
>> Hello.
>>
>> Here comes patch v3, which contains some fixes and optimizations of
>> aer api usage. The v1 and v2 can be found on the mailing list.
>>
>> v3:
>> - Modifications to comments proposed by Sathyanarayanan. Remove
>>    pci_aer_clear_nonfatal_status() call in NTB and improve commit log.
>>
>> v2:
>> - Modifications to comments proposed by Bjorn. Split patch into more
>>    obvious parts.
>>
>> Zhuo Chen (9):
>>    PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
>>    PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear
>>      uncorrectable error status
>>    NTB: Remove pci_aer_clear_nonfatal_status() call
>>    scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
>>    PCI/AER: Unexport pci_aer_clear_nonfatal_status()
>>    PCI/AER: Move check inside pcie_clear_device_status().
>>    PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER
>>    PCI/ERR: Clear fatal error status when pci_channel_io_frozen
>>    PCI/AER: Refine status clearing process with api
>>
>>   drivers/ntb/hw/idt/ntb_hw_idt.c |  2 --
>>   drivers/pci/pci.c               |  7 +++--
>>   drivers/pci/pci.h               |  2 ++
>>   drivers/pci/pcie/aer.c          | 45 +++++++++++++++++++--------------
>>   drivers/pci/pcie/dpc.c          |  3 +--
>>   drivers/pci/pcie/err.c          | 15 ++++-------
>>   drivers/pci/pcie/portdrv_core.c |  3 +--
>>   drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
>>   include/linux/aer.h             |  4 +--
>>   9 files changed, 44 insertions(+), 41 deletions(-)
>>
> 

-- 
Zhuo Chen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call
  2022-09-28 11:03   ` Serge Semin
@ 2022-12-06 18:09     ` Bjorn Helgaas
  2022-12-06 21:41       ` Serge Semin
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2022-12-06 18:09 UTC (permalink / raw)
  To: Serge Semin
  Cc: Zhuo Chen, sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	jdmason, dave.jiang, allenbh, james.smart, dick.kennedy, jejb,
	martin.petersen, linuxppc-dev, linux-pci, linux-kernel, ntb,
	linux-scsi

On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > There is no need to clear error status during init code, so remove it.
> 
> Why do you think there isn't? Justify in more details.

Thanks for taking a look, Sergey!  I agree we should leave it or add
the rationale here.

> > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > ---
> >  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > index 0ed6f809ff2e..fed03217289d 100644
> > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> >  	ret = pci_enable_pcie_error_reporting(pdev);
> >  	if (ret != 0)
> >  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > -	else /* Cleanup nonfatal error status before getting to init */
> > -		pci_aer_clear_nonfatal_status(pdev);

I do think drivers should not need to clear errors; I think the PCI
core should be responsible for that.

And I think the core *does* do that in this path:

  pci_init_capabilities
    pci_aer_init
      pci_aer_clear_status
        pci_aer_raw_clear_status
          pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
          pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)

pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
errors, while pci_aer_init() clears all correctable and all
uncorrectable errors, so the PCI core is already doing more than
idt_init_pci() does.

So I think this change is good because it removes some work from the
driver, but let me know if you think otherwise.

> >  
> >  	/* First enable the PCI device */
> >  	ret = pcim_enable_device(pdev);
> > -- 
> > 2.30.1 (Apple Git-130)
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call
  2022-12-06 18:09     ` Bjorn Helgaas
@ 2022-12-06 21:41       ` Serge Semin
  0 siblings, 0 replies; 23+ messages in thread
From: Serge Semin @ 2022-12-06 21:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo Chen, sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	jdmason, dave.jiang, allenbh, james.smart, dick.kennedy, jejb,
	martin.petersen, linuxppc-dev, linux-pci, linux-kernel, ntb,
	linux-scsi

Hi Bjorn

On Tue, Dec 06, 2022 at 12:09:56PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> > On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > > There is no need to clear error status during init code, so remove it.
> > 
> > Why do you think there isn't? Justify in more details.
> 
> Thanks for taking a look, Sergey!  I agree we should leave it or add
> the rationale here.
> 
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 0ed6f809ff2e..fed03217289d 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >  	ret = pci_enable_pcie_error_reporting(pdev);
> > >  	if (ret != 0)
> > >  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > > -	else /* Cleanup nonfatal error status before getting to init */
> > > -		pci_aer_clear_nonfatal_status(pdev);
> 
> I do think drivers should not need to clear errors; I think the PCI
> core should be responsible for that.
> 
> And I think the core *does* do that in this path:
> 
>   pci_init_capabilities
>     pci_aer_init
>       pci_aer_clear_status
>         pci_aer_raw_clear_status
>           pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
>           pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)
> 
> pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
> errors, while pci_aer_init() clears all correctable and all
> uncorrectable errors, so the PCI core is already doing more than
> idt_init_pci() does.
> 
> So I think this change is good because it removes some work from the
> driver, but let me know if you think otherwise.

It's hard to remember now all the details but IIRC back when this
driver was developed the "Unsupported Request" flag was left uncleared
on our platform even after the probe completion. Most likely an
erroneous TLP was generated by some action performed on the device
probe stage. The forced cleanup of the AER status solved that problem.
On the other hand the problem of having the UnsupReq+ flag set was
solved some time after the driver was merged in into the kernel (it
was caused by a vendor-specific behavior of the IDT PCIe switch placed
on the path between a RP and PCIe NTB). So since the original reason
of having the pci_aer_clear_nonfatal_status() method called here was
platform specific and fixed now anyway, and the AER flags cleanup is
done by the core, then I have no reason to be against the patch. It
would be good to add your clarification to the commit message though.

Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

-Serge(y)

> 
> > >  
> > >  	/* First enable the PCI device */
> > >  	ret = pcim_enable_device(pdev);
> > > -- 
> > > 2.30.1 (Apple Git-130)
> > > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen
  2022-09-28 10:59 ` [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen Zhuo Chen
@ 2022-12-06 21:42   ` Bjorn Helgaas
  0 siblings, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2022-12-06 21:42 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen, linuxppc-dev, linux-pci,
	linux-kernel, ntb, linux-scsi

Hi Zhuo,

On Wed, Sep 28, 2022 at 06:59:45PM +0800, Zhuo Chen wrote:
> When state is pci_channel_io_frozen in pcie_do_recovery(), the
> severity is fatal and fatal error status should be cleared.
> So add pci_aer_clear_fatal_status().
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/pci/pcie/err.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index f80b21244ef1..b46f1d36c090 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -241,7 +241,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	pci_walk_bridge(bridge, report_resume, &status);
>  
>  	pcie_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
> +	if (state == pci_channel_io_frozen)
> +		pci_aer_clear_fatal_status(dev);
> +	else
> +		pci_aer_clear_nonfatal_status(dev);

I'm confused.  It seems like we certainly need to clear fatal errors
after they occur *somewhere*, and if we don't, surely this would be a
very obvious issue.  But you didn't mention this being a bug fix, so I
assume it's more of a cleanup.

If it *is* a bug fix, please say that and give a hint about what the
bug looks like, e.g., what sort of messages a user might see.

If it's not a bug fix, I don't understand how AER fatal errors get
cleared today.  The PCI_ERR_UNCOR_STATUS bits are sticky, so they're
not cleared by a reset.  In the current tree, these are the only
places I see that clear AER fatal errors:

  pci_init_capabilities
    pci_aer_init         # once at device enumeration
      pci_aer_clear_status
        pci_aer_raw_clear_status
          pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  aer_probe
    aer_enable_rootport  # once at Root Port enumeration
      pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32)

  dpc_process_error      # after DPC triggered
    pci_aer_clear_fatal_status
      pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  edr_handle_event       # after EDR event
    pci_aer_raw_clear_status
      pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

  pci_restore_state      # after reset or PM sleep/resume
    pci_aer_clear_status
      pci_aer_raw_clear_status
        pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status)

The only one that could clear errors after an AER error (not DPC or
EDR), would be the pci_restore_state() in the reset path.  If the
current code relies on that, I'd say that's a pretty non-obvious
dependency.

>  	pci_info(bridge, "device recovery successful\n");
>  	return status;
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  2022-09-28 10:59 ` [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status() Zhuo Chen
@ 2022-12-06 22:13   ` Bjorn Helgaas
  2023-03-15 21:35     ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2022-12-06 22:13 UTC (permalink / raw)
  To: Zhuo Chen, James Smart, Dick Kennedy
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, jejb,
	martin.petersen, linux-scsi, linux-pci, linux-kernel, ntb,
	linuxppc-dev

[moved James, Dick, LPFC supporters to "to"]

On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> uncorrectable error status.

I don't know what the point of lpfc_aer_cleanup_state() is.  AER
errors should be handled and cleared by the PCI core, not by
individual drivers.  Only lpfc, liquidio, and sky2 touch
PCI_ERR_UNCOR_STATUS.

But lpfc_aer_cleanup_state() is visible in the
"lpfc_aer_state_cleanup" sysfs file, so removing it would break any
userspace that uses it.

If we can rely on the PCI core to clean up AER errors itself
(admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
could just become a no-op?

Any comment from the LPFC folks?

Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
pci_aer_clear_uncorrect_error_status() outside the PCI core at all.

> But using pci_aer_clear_nonfatal_status()
> will only clear non-fatal error status. To clear both fatal and
> non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> index 09cf2cd0ae60..d835cc0ba153 100644
> --- a/drivers/scsi/lpfc/lpfc_attr.c
> +++ b/drivers/scsi/lpfc/lpfc_attr.c
> @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
>   * Description:
>   * If the @buf contains 1 and the device currently has the AER support
>   * enabled, then invokes the kernel AER helper routine
> - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
>   * error status register.
>   *
>   * Notes:
> @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
>  		return -EINVAL;
>  
>  	if (phba->hba_flag & HBA_AER_ENABLED)
> -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
>  
>  	if (rc == 0)
>  		return strlen(buf);
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core
  2022-09-28 10:59 ` [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core Zhuo Chen
@ 2023-03-15 21:29   ` Bjorn Helgaas
  0 siblings, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2023-03-15 21:29 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen, linux-scsi, linux-pci,
	linux-kernel, ntb, linuxppc-dev

On Wed, Sep 28, 2022 at 06:59:38PM +0800, Zhuo Chen wrote:
> In lpfc_aer_cleanup_state(), uncorrectable error status needs to be
> cleared, which can be done by calling pci_aer_clear_nonfatal_status()
> and pci_aer_clear_fatal_status(). Meanwhile they can be combined in
> one function (the same in dpc_process_error). So add
> pci_aer_clear_uncorrect_error_status() function to PCI core and
> export symbol to other modules which wants to use it.

Sorry for getting back to this so late.

Why does lpfc need this?  I think AER error status should be cleared
by the PCI core, not by individual drivers, so I really would rather
not add a new interface for drivers to use.

> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/pci/pcie/aer.c | 16 ++++++++++++++++
>  include/linux/aer.h    |  5 +++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..4e637121be23 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -286,6 +286,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>  		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>  }
>  
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> +	int aer = dev->aer_cap;
> +	u32 status;
> +
> +	if (!pcie_aer_is_native(dev))
> +		return -EIO;
> +
> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> +	if (status)
> +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> +
>  /**
>   * pci_aer_raw_clear_status - Clear AER error registers.
>   * @dev: the PCI device
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 97f64ba1b34a..154690c278cb 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>  void pci_save_aer_state(struct pci_dev *dev);
>  void pci_restore_aer_state(struct pci_dev *dev);
>  #else
> @@ -60,6 +61,10 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> +	return -EINVAL;
> +}
>  static inline void pci_save_aer_state(struct pci_dev *dev) {}
>  static inline void pci_restore_aer_state(struct pci_dev *dev) {}
>  #endif
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call
  2022-09-28 10:59 ` [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call Zhuo Chen
  2022-09-28 11:03   ` Serge Semin
@ 2023-03-15 21:31   ` Bjorn Helgaas
  1 sibling, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2023-03-15 21:31 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, james.smart,
	dick.kennedy, jejb, martin.petersen, linux-scsi, linux-pci,
	linux-kernel, ntb, linuxppc-dev

On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>

Can you send this to the NTB folks?  It doesn't depend on anything, so
no real reason to merge via the PCI tree.

To help reviewers, ideally the commit log would mention where the PCI
core clears the non-fatal errors so the driver doesn't have to.

> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>  	ret = pci_enable_pcie_error_reporting(pdev);
>  	if (ret != 0)
>  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> -	else /* Cleanup nonfatal error status before getting to init */
> -		pci_aer_clear_nonfatal_status(pdev);
>  
>  	/* First enable the PCI device */
>  	ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  2022-12-06 22:13   ` Bjorn Helgaas
@ 2023-03-15 21:35     ` Bjorn Helgaas
  2023-03-30 23:43       ` Justin Tee
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2023-03-15 21:35 UTC (permalink / raw)
  To: Zhuo Chen, James Smart, Dick Kennedy
  Cc: sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, jejb,
	martin.petersen, linux-scsi, linux-pci, linux-kernel, ntb,
	linuxppc-dev

On Tue, Dec 06, 2022 at 04:13:35PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> > lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> > uncorrectable error status.
> 
> I don't know what the point of lpfc_aer_cleanup_state() is.  AER
> errors should be handled and cleared by the PCI core, not by
> individual drivers.  Only lpfc, liquidio, and sky2 touch
> PCI_ERR_UNCOR_STATUS.
> 
> But lpfc_aer_cleanup_state() is visible in the
> "lpfc_aer_state_cleanup" sysfs file, so removing it would break any
> userspace that uses it.
> 
> If we can rely on the PCI core to clean up AER errors itself
> (admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
> could just become a no-op?
> 
> Any comment from the LPFC folks?
> 
> Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
> pci_aer_clear_uncorrect_error_status() outside the PCI core at all.

Resurrecting this old thread.  Zhuo, can you figure out where the PCI
core clears these errors, include that in the commit log, and propose
a patch that makes lpfc_aer_cleanup_state() a no-op, by removing the
pci_aer_clear_nonfatal_status() call completely?

Such a patch could be sent to the SCSI maintainers since it doesn't
involve the PCI core.

If it turns out that the PCI core *doesn't* clear these errors, we
should figure out *why* it doesn't and try to change the PCI core so
it does.

> > But using pci_aer_clear_nonfatal_status()
> > will only clear non-fatal error status. To clear both fatal and
> > non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> > 
> > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > ---
> >  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > index 09cf2cd0ae60..d835cc0ba153 100644
> > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> >   * Description:
> >   * If the @buf contains 1 and the device currently has the AER support
> >   * enabled, then invokes the kernel AER helper routine
> > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> >   * error status register.
> >   *
> >   * Notes:
> > @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
> >  		return -EINVAL;
> >  
> >  	if (phba->hba_flag & HBA_AER_ENABLED)
> > -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> >  
> >  	if (rc == 0)
> >  		return strlen(buf);
> > -- 
> > 2.30.1 (Apple Git-130)
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status()
  2023-03-15 21:35     ` Bjorn Helgaas
@ 2023-03-30 23:43       ` Justin Tee
  0 siblings, 0 replies; 23+ messages in thread
From: Justin Tee @ 2023-03-30 23:43 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo Chen, James Smart, Dick Kennedy, Justin Tee,
	sathyanarayanan.kuppuswamy, bhelgaas, ruscur, oohall,
	fancer.lancer, jdmason, dave.jiang, allenbh, jejb,
	martin.petersen, linux-scsi, linux-pci, linux-kernel, ntb,
	linuxppc-dev

Hi Bjorn,

> But lpfc_aer_cleanup_state() is visible in the
> "lpfc_aer_state_cleanup" sysfs file, so removing it would break any
> userspace that uses it.
>
> If we can rely on the PCI core to clean up AER errors itself
> (admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
> could just become a no-op?
>
> Any comment from the LPFC folks?

We have notified all users of the lpfc_aer_cleanup_state sysfs entry,
and Broadcom LPFC is okay to no-op.

Regards,
Justin

On Wed, Mar 15, 2023 at 2:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Dec 06, 2022 at 04:13:35PM -0600, Bjorn Helgaas wrote:
> > On Wed, Sep 28, 2022 at 06:59:41PM +0800, Zhuo Chen wrote:
> > > lpfc_aer_cleanup_state() requires clearing both fatal and non-fatal
> > > uncorrectable error status.
> >
> > I don't know what the point of lpfc_aer_cleanup_state() is.  AER
> > errors should be handled and cleared by the PCI core, not by
> > individual drivers.  Only lpfc, liquidio, and sky2 touch
> > PCI_ERR_UNCOR_STATUS.
> >
> > But lpfc_aer_cleanup_state() is visible in the
> > "lpfc_aer_state_cleanup" sysfs file, so removing it would break any
> > userspace that uses it.
> >
> > If we can rely on the PCI core to clean up AER errors itself
> > (admittedly, that might be a big "if"), maybe lpfc_aer_cleanup_state()
> > could just become a no-op?
> >
> > Any comment from the LPFC folks?
> >
> > Ideally, I would rather not export pci_aer_clear_nonfatal_status() or
> > pci_aer_clear_uncorrect_error_status() outside the PCI core at all.
>
> Resurrecting this old thread.  Zhuo, can you figure out where the PCI
> core clears these errors, include that in the commit log, and propose
> a patch that makes lpfc_aer_cleanup_state() a no-op, by removing the
> pci_aer_clear_nonfatal_status() call completely?
>
> Such a patch could be sent to the SCSI maintainers since it doesn't
> involve the PCI core.
>
> If it turns out that the PCI core *doesn't* clear these errors, we
> should figure out *why* it doesn't and try to change the PCI core so
> it does.
>
> > > But using pci_aer_clear_nonfatal_status()
> > > will only clear non-fatal error status. To clear both fatal and
> > > non-fatal error status, use pci_aer_clear_uncorrect_error_status().
> > >
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >  drivers/scsi/lpfc/lpfc_attr.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > > index 09cf2cd0ae60..d835cc0ba153 100644
> > > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > > @@ -4689,7 +4689,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> > >   * Description:
> > >   * If the @buf contains 1 and the device currently has the AER support
> > >   * enabled, then invokes the kernel AER helper routine
> > > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> > >   * error status register.
> > >   *
> > >   * Notes:
> > > @@ -4715,7 +4715,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
> > >             return -EINVAL;
> > >
> > >     if (phba->hba_flag & HBA_AER_ENABLED)
> > > -           rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > > +           rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> > >
> > >     if (rc == 0)
> > >             return strlen(buf);
> > > --
> > > 2.30.1 (Apple Git-130)
> > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-03-30 23:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-28 10:59 [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Zhuo Chen
2022-09-28 10:59 ` [PATCH v3 1/9] PCI/AER: Add pci_aer_clear_uncorrect_error_status() to PCI core Zhuo Chen
2023-03-15 21:29   ` Bjorn Helgaas
2022-09-28 10:59 ` [PATCH v3 2/9] PCI/DPC: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
2022-09-28 10:59 ` [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call Zhuo Chen
2022-09-28 11:03   ` Serge Semin
2022-12-06 18:09     ` Bjorn Helgaas
2022-12-06 21:41       ` Serge Semin
2023-03-15 21:31   ` Bjorn Helgaas
2022-09-28 10:59 ` [PATCH v3 4/9] scsi: lpfc: Change to use pci_aer_clear_uncorrect_error_status() Zhuo Chen
2022-12-06 22:13   ` Bjorn Helgaas
2023-03-15 21:35     ` Bjorn Helgaas
2023-03-30 23:43       ` Justin Tee
2022-09-28 10:59 ` [PATCH v3 5/9] PCI/AER: Unexport pci_aer_clear_nonfatal_status() Zhuo Chen
2022-09-28 10:59 ` [PATCH v3 6/9] PCI/AER: Move check inside pcie_clear_device_status() Zhuo Chen
2022-09-28 10:59 ` [PATCH v3 7/9] PCI/AER: Use pcie_aer_is_native() to judge whether OS owns AER Zhuo Chen
2022-09-28 10:59 ` [PATCH v3 8/9] PCI/ERR: Clear fatal error status when pci_channel_io_frozen Zhuo Chen
2022-12-06 21:42   ` Bjorn Helgaas
2022-09-28 10:59 ` [PATCH v3 9/9] PCI/AER: Refine status clearing process with api Zhuo Chen
2022-09-28 11:06 ` [PATCH v3 0/9] PCI/AER: Fix and optimize usage of status clearing api Serge Semin
2022-09-28 15:54   ` [External] " Zhuo Chen
2022-11-04 17:20 ` Zhuo Chen
2022-11-24 11:55   ` Zhuo Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).