linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api
@ 2022-09-01 18:16 Zhuo Chen
  2022-09-01 18:16 ` [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Zhuo Chen @ 2022-09-01 18:16 UTC (permalink / raw)
  To: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, ntb, linux-kernel, linux-pci, linuxppc-dev, linux-scsi

Hello,

This series contains some fixes and optimizations of aer api usage.
We add some process to clear uncorrectable error status, then add
distinction between fatal and nonfatal situations in pcie_do_recovery()
and reduce some redundant code. The series involves pci driver and
vendor driver.

Thanks,
Zhuo Chen

Zhuo Chen (3):
  PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear
    uncorrectable error status
  PCI/ERR: Clear fatal status in pcie_do_recovery()
  PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error
    status

 drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
 drivers/pci/pci.h               |  2 ++
 drivers/pci/pcie/aer.c          | 30 +++++++++++++++++++-----------
 drivers/pci/pcie/dpc.c          |  3 +--
 drivers/pci/pcie/err.c          |  8 ++++++--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
 include/linux/aer.h             |  4 ++--
 7 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-01 18:16 [PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api Zhuo Chen
@ 2022-09-01 18:16 ` Zhuo Chen
  2022-09-11 16:22   ` Serge Semin
  2022-09-01 18:16 ` [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery() Zhuo Chen
  2022-09-01 18:16 ` [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status Zhuo Chen
  2 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-01 18:16 UTC (permalink / raw)
  To: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, ntb, linux-kernel, linux-pci, linuxppc-dev, linux-scsi

Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in ntb_hw_idt.c and lpfc_attr.c. So we add
pci_aer_clear_uncorrect_error_status() and change to use it.

Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
 drivers/pci/pci.h               |  2 ++
 drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
 drivers/pci/pcie/dpc.c          |  3 +--
 drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
 include/linux/aer.h             |  4 ++--
 6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
 	ret = pci_enable_pcie_error_reporting(pdev);
 	if (ret != 0)
 		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
-	else /* Cleanup nonfatal error status before getting to init */
-		pci_aer_clear_nonfatal_status(pdev);
+	else /* Cleanup uncorrectable error status before getting to init */
+		pci_aer_clear_uncorrect_error_status(pdev);
 
 	/* First enable the PCI device */
 	ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
 extern const struct attribute_group aer_stats_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
 int pci_aer_raw_clear_status(struct pci_dev *dev);
 #else
@@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
 static inline void pci_aer_init(struct pci_dev *d) { }
 static inline void pci_aer_exit(struct pci_dev *d) { }
 static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
 static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
 static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
 #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..d2996afa80f6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
 	u32 status, sev;
 
 	if (!pcie_aer_is_native(dev))
-		return -EIO;
+		return;
 
 	/* Clear status bits for ERR_NONFATAL errors only */
 	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 	status &= ~sev;
 	if (status)
 		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
-
-	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
 
 void pci_aer_clear_fatal_status(struct pci_dev *dev)
 {
@@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
 		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
 }
 
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
+{
+	int aer = dev->aer_cap;
+	u32 status;
+
+	if (!pcie_aer_is_native(dev))
+		return -EIO;
+
+	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
+	if (status)
+		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
+
 /**
  * pci_aer_raw_clear_status - Clear AER error registers.
  * @dev: the PCI device
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d..7942073fbb34 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
-		pci_aer_clear_nonfatal_status(pdev);
-		pci_aer_clear_fatal_status(pdev);
+		pci_aer_clear_uncorrect_error_status(pdev);
 	}
 }
 
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 3caaa7c4af48..1ed8d1640325 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
  * Description:
  * If the @buf contains 1 and the device currently has the AER support
  * enabled, then invokes the kernel AER helper routine
- * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
+ * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
  * error status register.
  *
  * Notes:
@@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 
 	if (phba->hba_flag & HBA_AER_ENABLED)
-		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
+		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
 
 	if (rc == 0)
 		return strlen(buf);
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 97f64ba1b34a..f638ad955deb 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,7 +44,7 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
 void pci_save_aer_state(struct pci_dev *dev);
 void pci_restore_aer_state(struct pci_dev *dev);
 #else
@@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
-static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
+static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
  2022-09-01 18:16 [PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api Zhuo Chen
  2022-09-01 18:16 ` [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
@ 2022-09-01 18:16 ` Zhuo Chen
  2022-09-22 21:08   ` Bjorn Helgaas
  2022-09-01 18:16 ` [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status Zhuo Chen
  2 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-01 18:16 UTC (permalink / raw)
  To: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, ntb, linux-kernel, linux-pci, linuxppc-dev, linux-scsi

When state is pci_channel_io_frozen in pcie_do_recovery(),
the severity is fatal and fatal status should be cleared.
So we add pci_aer_clear_fatal_status().

Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
and pci_aer_clear_nonfatal_status() contains the function of
'if (host->native_aer || pcie_ports_native)', so we move them
out of it.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/err.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..e0a8ade4c3fe 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	 * it is responsible for clearing this status.  In that case, the
 	 * signaling device may not even be visible to the OS.
 	 */
-	if (host->native_aer || pcie_ports_native) {
+	if (host->native_aer || pcie_ports_native)
 		pcie_clear_device_status(dev);
+
+	if (state == pci_channel_io_frozen)
+		pci_aer_clear_fatal_status(dev);
+	else
 		pci_aer_clear_nonfatal_status(dev);
-	}
+
 	pci_info(bridge, "device recovery successful\n");
 	return status;
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
  2022-09-01 18:16 [PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api Zhuo Chen
  2022-09-01 18:16 ` [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
  2022-09-01 18:16 ` [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery() Zhuo Chen
@ 2022-09-01 18:16 ` Zhuo Chen
  2022-09-22 21:50   ` Bjorn Helgaas
  2 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-01 18:16 UTC (permalink / raw)
  To: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen
  Cc: chenzhuo.1, ntb, linux-kernel, linux-pci, linuxppc-dev, linux-scsi

Statements clearing AER error status in aer_enable_rootport() has the
same function as pci_aer_raw_clear_status(). So we replace them, which
has no functional changes.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/pci/pcie/aer.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d2996afa80f6..eb0193f279f2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
 	/* Clear error status */
-	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+	pci_aer_raw_clear_status(pdev);
 
 	/*
 	 * Enable error reporting for the root port device and downstream port
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-01 18:16 ` [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
@ 2022-09-11 16:22   ` Serge Semin
  2022-09-11 17:09     ` [External] " Zhuo Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Serge Semin @ 2022-09-11 16:22 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: jdmason, dave.jiang, allenbh, bhelgaas, ruscur, oohall,
	james.smart, dick.kennedy, jejb, martin.petersen, ntb,
	linux-kernel, linux-pci, linuxppc-dev, linux-scsi

Hi Zhuo

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> Status bits for ERR_NONFATAL errors only are cleared in
> pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> pci_aer_clear_uncorrect_error_status() and change to use it.

What about the next drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?

> 
> Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> no functional changes.
> 
> Since pci_aer_clear_nonfatal_status() is used only internally, move
> its declaration to the PCI internal header file. Also, no one cares
> about return value of pci_aer_clear_nonfatal_status(), so make it void.
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
>  drivers/pci/pci.h               |  2 ++
>  drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
>  drivers/pci/pcie/dpc.c          |  3 +--
>  drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
>  include/linux/aer.h             |  4 ++--
>  6 files changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 733557231ed0..de1dbbc5b9de 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>  	ret = pci_enable_pcie_error_reporting(pdev);
>  	if (ret != 0)
>  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");

> -	else /* Cleanup nonfatal error status before getting to init */
> -		pci_aer_clear_nonfatal_status(pdev);
> +	else /* Cleanup uncorrectable error status before getting to init */
> +		pci_aer_clear_uncorrect_error_status(pdev);

From the IDT NTB PCIe initialization procedure point of view both of
these methods are equivalent. So for the IDT NTB part:

Acked-by: Serge Semin <fancer.lancer@gmail.com>

-Sergey

>  
>  	/* First enable the PCI device */
>  	ret = pcim_enable_device(pdev);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index e10cdec6c56e..574176f43025 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
>  void pci_aer_exit(struct pci_dev *dev);
>  extern const struct attribute_group aer_stats_attr_group;
>  void pci_aer_clear_fatal_status(struct pci_dev *dev);
> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>  int pci_aer_clear_status(struct pci_dev *dev);
>  int pci_aer_raw_clear_status(struct pci_dev *dev);
>  #else
> @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
>  static inline void pci_aer_init(struct pci_dev *d) { }
>  static inline void pci_aer_exit(struct pci_dev *d) { }
>  static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
>  static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
>  static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
>  #endif
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7952e5efd6cf..d2996afa80f6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
>  
> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>  	int aer = dev->aer_cap;
>  	u32 status, sev;
>  
>  	if (!pcie_aer_is_native(dev))
> -		return -EIO;
> +		return;
>  
>  	/* Clear status bits for ERR_NONFATAL errors only */
>  	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  	status &= ~sev;
>  	if (status)
>  		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> -
> -	return 0;
>  }
> -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
>  
>  void pci_aer_clear_fatal_status(struct pci_dev *dev)
>  {
> @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>  		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>  }
>  
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> +	int aer = dev->aer_cap;
> +	u32 status;
> +
> +	if (!pcie_aer_is_native(dev))
> +		return -EIO;
> +
> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> +	if (status)
> +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> +
>  /**
>   * pci_aer_raw_clear_status - Clear AER error registers.
>   * @dev: the PCI device
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 3e9afee02e8d..7942073fbb34 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
>  		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
>  		 aer_get_device_error_info(pdev, &info)) {
>  		aer_print_error(pdev, &info);
> -		pci_aer_clear_nonfatal_status(pdev);
> -		pci_aer_clear_fatal_status(pdev);
> +		pci_aer_clear_uncorrect_error_status(pdev);
>  	}
>  }
>  
> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> index 3caaa7c4af48..1ed8d1640325 100644
> --- a/drivers/scsi/lpfc/lpfc_attr.c
> +++ b/drivers/scsi/lpfc/lpfc_attr.c
> @@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
>   * Description:
>   * If the @buf contains 1 and the device currently has the AER support
>   * enabled, then invokes the kernel AER helper routine
> - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
>   * error status register.
>   *
>   * Notes:
> @@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
>  		return -EINVAL;
>  
>  	if (phba->hba_flag & HBA_AER_ENABLED)
> -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
>  
>  	if (rc == 0)
>  		return strlen(buf);
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 97f64ba1b34a..f638ad955deb 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -44,7 +44,7 @@ struct aer_capability_regs {
>  /* PCIe port driver needs this function to enable AER */
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>  void pci_save_aer_state(struct pci_dev *dev);
>  void pci_restore_aer_state(struct pci_dev *dev);
>  #else
> @@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
> -static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-11 16:22   ` Serge Semin
@ 2022-09-11 17:09     ` Zhuo Chen
  2022-09-11 17:55       ` Serge Semin
  2022-09-22 20:02       ` Bjorn Helgaas
  0 siblings, 2 replies; 17+ messages in thread
From: Zhuo Chen @ 2022-09-11 17:09 UTC (permalink / raw)
  To: Serge Semin
  Cc: jdmason, dave.jiang, allenbh, bhelgaas, ruscur, oohall,
	james.smart, dick.kennedy, jejb, martin.petersen, ntb,
	linux-kernel, linux-pci, linuxppc-dev, linux-scsi



On 9/12/22 12:22 AM, Serge Semin wrote:
> Hi Zhuo
> 
> On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
>> Status bits for ERR_NONFATAL errors only are cleared in
>> pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
>> error status in ntb_hw_idt.c and lpfc_attr.c. So we add
>> pci_aer_clear_uncorrect_error_status() and change to use it.
> 
> What about the next drivers
> 
> drivers/scsi/lpfc/lpfc_attr.c
> drivers/crypto/hisilicon/qm.c
> drivers/net/ethernet/intel/ice/ice_main.c
> 
> which call the pci_aer_clear_nonfatal_status() method too?

‘pci_aer_clear_nonfatal_status()’ in 
drivers/net/ethernet/intel/ice/ice_main.c has already been removed and 
merged in kernel in: 
https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776

‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will 
be removed in the next kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406

Uncorrectable error status register was intended to be cleared in 
drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
and
https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f
> 
>>
>> Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
>> no functional changes.
>>
>> Since pci_aer_clear_nonfatal_status() is used only internally, move
>> its declaration to the PCI internal header file. Also, no one cares
>> about return value of pci_aer_clear_nonfatal_status(), so make it void.
>>
>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>> ---
>>   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
>>   drivers/pci/pci.h               |  2 ++
>>   drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
>>   drivers/pci/pcie/dpc.c          |  3 +--
>>   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
>>   include/linux/aer.h             |  4 ++--
>>   6 files changed, 27 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
>> index 733557231ed0..de1dbbc5b9de 100644
>> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
>> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
>> @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>>   	ret = pci_enable_pcie_error_reporting(pdev);
>>   	if (ret != 0)
>>   		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> 
>> -	else /* Cleanup nonfatal error status before getting to init */
>> -		pci_aer_clear_nonfatal_status(pdev);
>> +	else /* Cleanup uncorrectable error status before getting to init */
>> +		pci_aer_clear_uncorrect_error_status(pdev);
> 
>  From the IDT NTB PCIe initialization procedure point of view both of
> these methods are equivalent. So for the IDT NTB part:
> 
IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original 
function is clear uncorrectable error status register including fatal 
and non-fatal error status bits.

> Acked-by: Serge Semin <fancer.lancer@gmail.com>
> 
> -Sergey
> 
>>   
>>   	/* First enable the PCI device */
>>   	ret = pcim_enable_device(pdev);
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index e10cdec6c56e..574176f43025 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
>>   void pci_aer_exit(struct pci_dev *dev);
>>   extern const struct attribute_group aer_stats_attr_group;
>>   void pci_aer_clear_fatal_status(struct pci_dev *dev);
>> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>   int pci_aer_clear_status(struct pci_dev *dev);
>>   int pci_aer_raw_clear_status(struct pci_dev *dev);
>>   #else
>> @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
>>   static inline void pci_aer_init(struct pci_dev *d) { }
>>   static inline void pci_aer_exit(struct pci_dev *d) { }
>>   static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
>> +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
>>   static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
>>   static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
>>   #endif
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 7952e5efd6cf..d2996afa80f6 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>>   }
>>   EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
>>   
>> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>   {
>>   	int aer = dev->aer_cap;
>>   	u32 status, sev;
>>   
>>   	if (!pcie_aer_is_native(dev))
>> -		return -EIO;
>> +		return;
>>   
>>   	/* Clear status bits for ERR_NONFATAL errors only */
>>   	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
>> @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>   	status &= ~sev;
>>   	if (status)
>>   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>> -
>> -	return 0;
>>   }
>> -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
>>   
>>   void pci_aer_clear_fatal_status(struct pci_dev *dev)
>>   {
>> @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>>   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>>   }
>>   
>> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
>> +{
>> +	int aer = dev->aer_cap;
>> +	u32 status;
>> +
>> +	if (!pcie_aer_is_native(dev))
>> +		return -EIO;
>> +
>> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
>> +	if (status)
>> +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
>> +
>>   /**
>>    * pci_aer_raw_clear_status - Clear AER error registers.
>>    * @dev: the PCI device
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index 3e9afee02e8d..7942073fbb34 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
>>   		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
>>   		 aer_get_device_error_info(pdev, &info)) {
>>   		aer_print_error(pdev, &info);
>> -		pci_aer_clear_nonfatal_status(pdev);
>> -		pci_aer_clear_fatal_status(pdev);
>> +		pci_aer_clear_uncorrect_error_status(pdev);
>>   	}
>>   }
>>   
>> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
>> index 3caaa7c4af48..1ed8d1640325 100644
>> --- a/drivers/scsi/lpfc/lpfc_attr.c
>> +++ b/drivers/scsi/lpfc/lpfc_attr.c
>> @@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
>>    * Description:
>>    * If the @buf contains 1 and the device currently has the AER support
>>    * enabled, then invokes the kernel AER helper routine
>> - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
>> + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
>>    * error status register.
>>    *
>>    * Notes:
>> @@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
>>   		return -EINVAL;
>>   
>>   	if (phba->hba_flag & HBA_AER_ENABLED)
>> -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
>> +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
>>   
>>   	if (rc == 0)
>>   		return strlen(buf);
>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>> index 97f64ba1b34a..f638ad955deb 100644
>> --- a/include/linux/aer.h
>> +++ b/include/linux/aer.h
>> @@ -44,7 +44,7 @@ struct aer_capability_regs {
>>   /* PCIe port driver needs this function to enable AER */
>>   int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>>   int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>>   void pci_save_aer_state(struct pci_dev *dev);
>>   void pci_restore_aer_state(struct pci_dev *dev);
>>   #else
>> @@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>>   {
>>   	return -EINVAL;
>>   }
>> -static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
>>   {
>>   	return -EINVAL;
>>   }
>> -- 
>> 2.30.1 (Apple Git-130)
>>

-- 
Thanks,
Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-11 17:09     ` [External] " Zhuo Chen
@ 2022-09-11 17:55       ` Serge Semin
  2022-09-22 20:02       ` Bjorn Helgaas
  1 sibling, 0 replies; 17+ messages in thread
From: Serge Semin @ 2022-09-11 17:55 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: jdmason, dave.jiang, allenbh, bhelgaas, ruscur, oohall,
	james.smart, dick.kennedy, jejb, martin.petersen, ntb,
	linux-kernel, linux-pci, linuxppc-dev, linux-scsi

On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> 
> 
> On 9/12/22 12:22 AM, Serge Semin wrote:
> > Hi Zhuo
> > 
> > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> > > Status bits for ERR_NONFATAL errors only are cleared in
> > > pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> > > error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> > > pci_aer_clear_uncorrect_error_status() and change to use it.
> > 
> > What about the next drivers
> > 
> > drivers/scsi/lpfc/lpfc_attr.c
> > drivers/crypto/hisilicon/qm.c
> > drivers/net/ethernet/intel/ice/ice_main.c
> > 
> > which call the pci_aer_clear_nonfatal_status() method too?
> 
> ‘pci_aer_clear_nonfatal_status()’ in
> drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
> merged in kernel in: https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776
> 
> ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> removed in the next kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406
> 
> Uncorrectable error status register was intended to be cleared in
> drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
> and
> https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f

Got it. Thanks for clarification.

-Sergey

> > 
> > > 
> > > Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> > > no functional changes.
> > > 
> > > Since pci_aer_clear_nonfatal_status() is used only internally, move
> > > its declaration to the PCI internal header file. Also, no one cares
> > > about return value of pci_aer_clear_nonfatal_status(), so make it void.
> > > 
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
> > >   drivers/pci/pci.h               |  2 ++
> > >   drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
> > >   drivers/pci/pcie/dpc.c          |  3 +--
> > >   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
> > >   include/linux/aer.h             |  4 ++--
> > >   6 files changed, 27 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 733557231ed0..de1dbbc5b9de 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >   	ret = pci_enable_pcie_error_reporting(pdev);
> > >   	if (ret != 0)
> > >   		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > 
> > > -	else /* Cleanup nonfatal error status before getting to init */
> > > -		pci_aer_clear_nonfatal_status(pdev);
> > > +	else /* Cleanup uncorrectable error status before getting to init */
> > > +		pci_aer_clear_uncorrect_error_status(pdev);
> > 
> >  From the IDT NTB PCIe initialization procedure point of view both of
> > these methods are equivalent. So for the IDT NTB part:
> > 
> IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
> function is clear uncorrectable error status register including fatal and
> non-fatal error status bits.
> 
> > Acked-by: Serge Semin <fancer.lancer@gmail.com>
> > 
> > -Sergey
> > 
> > >   	/* First enable the PCI device */
> > >   	ret = pcim_enable_device(pdev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index e10cdec6c56e..574176f43025 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
> > >   void pci_aer_exit(struct pci_dev *dev);
> > >   extern const struct attribute_group aer_stats_attr_group;
> > >   void pci_aer_clear_fatal_status(struct pci_dev *dev);
> > > +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > >   int pci_aer_clear_status(struct pci_dev *dev);
> > >   int pci_aer_raw_clear_status(struct pci_dev *dev);
> > >   #else
> > > @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
> > >   static inline void pci_aer_init(struct pci_dev *d) { }
> > >   static inline void pci_aer_exit(struct pci_dev *d) { }
> > >   static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> > > +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
> > >   static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
> > >   static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
> > >   #endif
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 7952e5efd6cf..d2996afa80f6 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
> > >   }
> > >   EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
> > > -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > > +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > >   {
> > >   	int aer = dev->aer_cap;
> > >   	u32 status, sev;
> > >   	if (!pcie_aer_is_native(dev))
> > > -		return -EIO;
> > > +		return;
> > >   	/* Clear status bits for ERR_NONFATAL errors only */
> > >   	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> > > @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > >   	status &= ~sev;
> > >   	if (status)
> > >   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > > -
> > > -	return 0;
> > >   }
> > > -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
> > >   void pci_aer_clear_fatal_status(struct pci_dev *dev)
> > >   {
> > > @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
> > >   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > >   }
> > > +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> > > +{
> > > +	int aer = dev->aer_cap;
> > > +	u32 status;
> > > +
> > > +	if (!pcie_aer_is_native(dev))
> > > +		return -EIO;
> > > +
> > > +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> > > +	if (status)
> > > +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> > > +
> > >   /**
> > >    * pci_aer_raw_clear_status - Clear AER error registers.
> > >    * @dev: the PCI device
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 3e9afee02e8d..7942073fbb34 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
> > >   		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > >   		 aer_get_device_error_info(pdev, &info)) {
> > >   		aer_print_error(pdev, &info);
> > > -		pci_aer_clear_nonfatal_status(pdev);
> > > -		pci_aer_clear_fatal_status(pdev);
> > > +		pci_aer_clear_uncorrect_error_status(pdev);
> > >   	}
> > >   }
> > > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > > index 3caaa7c4af48..1ed8d1640325 100644
> > > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > > @@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> > >    * Description:
> > >    * If the @buf contains 1 and the device currently has the AER support
> > >    * enabled, then invokes the kernel AER helper routine
> > > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> > >    * error status register.
> > >    *
> > >    * Notes:
> > > @@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
> > >   		return -EINVAL;
> > >   	if (phba->hba_flag & HBA_AER_ENABLED)
> > > -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > > +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> > >   	if (rc == 0)
> > >   		return strlen(buf);
> > > diff --git a/include/linux/aer.h b/include/linux/aer.h
> > > index 97f64ba1b34a..f638ad955deb 100644
> > > --- a/include/linux/aer.h
> > > +++ b/include/linux/aer.h
> > > @@ -44,7 +44,7 @@ struct aer_capability_regs {
> > >   /* PCIe port driver needs this function to enable AER */
> > >   int pci_enable_pcie_error_reporting(struct pci_dev *dev);
> > >   int pci_disable_pcie_error_reporting(struct pci_dev *dev);
> > > -int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > > +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
> > >   void pci_save_aer_state(struct pci_dev *dev);
> > >   void pci_restore_aer_state(struct pci_dev *dev);
> > >   #else
> > > @@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
> > >   {
> > >   	return -EINVAL;
> > >   }
> > > -static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > > +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> > >   {
> > >   	return -EINVAL;
> > >   }
> > > -- 
> > > 2.30.1 (Apple Git-130)
> > > 
> 
> -- 
> Thanks,
> Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-11 17:09     ` [External] " Zhuo Chen
  2022-09-11 17:55       ` Serge Semin
@ 2022-09-22 20:02       ` Bjorn Helgaas
  2022-09-26 13:30         ` Zhuo Chen
  1 sibling, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-22 20:02 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: Serge Semin, allenbh, dave.jiang, linux-scsi, martin.petersen,
	linux-pci, jejb, jdmason, james.smart, linux-kernel, ntb, oohall,
	bhelgaas, dick.kennedy, linuxppc-dev

On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> On 9/12/22 12:22 AM, Serge Semin wrote:
> > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> > > Status bits for ERR_NONFATAL errors only are cleared in
> > > pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> > > error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> > > pci_aer_clear_uncorrect_error_status() and change to use it.
> > 
> > What about the next drivers
> > 
> > drivers/scsi/lpfc/lpfc_attr.c
> > drivers/crypto/hisilicon/qm.c
> > drivers/net/ethernet/intel/ice/ice_main.c
> > 
> > which call the pci_aer_clear_nonfatal_status() method too?
> 
> ‘pci_aer_clear_nonfatal_status()’ in
> drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
> merged in kernel in: https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776

It's better if you can use kernel.org URLs that don't depend on
third parties like github, e.g.,

  https://git.kernel.org/linus/ca415ea1f03a

> ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> removed in the next kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406

This is a problem because 00278564a60e ("crypto: hisilicon - Remove
pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
and if I apply this series to the PCI tree and Linus merges it before
Herbert's cryptodev changes, it will break the build.

I think we need to split this patch up like this:

  - Add pci_aer_clear_uncorrect_error_status() to PCI core
  - Convert dpc to use pci_aer_clear_uncorrect_error_status()
    (I might end up squashing with above)
  - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
  - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
  - Unexport pci_aer_clear_nonfatal_status()

Then I can apply all but the last patch safely.  If the crypto changes
are merged first, we can add the last one; otherwise we can do it for
the next cycle.

> Uncorrectable error status register was intended to be cleared in
> drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
> and
> https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f

This will be a behavior change for lpfc and ntb_hw_idt.  It looks like
it changes the behavior back to what it was before e7b0b847de6d
("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"),
so it might be OK, but splitting these out to their own patches will
make the change more obvious and we can make sure that's what we want.

Bjorn

> > > Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> > > no functional changes.
> > > 
> > > Since pci_aer_clear_nonfatal_status() is used only internally, move
> > > its declaration to the PCI internal header file. Also, no one cares
> > > about return value of pci_aer_clear_nonfatal_status(), so make it void.
> > > 
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
> > >   drivers/pci/pci.h               |  2 ++
> > >   drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
> > >   drivers/pci/pcie/dpc.c          |  3 +--
> > >   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
> > >   include/linux/aer.h             |  4 ++--
> > >   6 files changed, 27 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 733557231ed0..de1dbbc5b9de 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >   	ret = pci_enable_pcie_error_reporting(pdev);
> > >   	if (ret != 0)
> > >   		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > 
> > > -	else /* Cleanup nonfatal error status before getting to init */
> > > -		pci_aer_clear_nonfatal_status(pdev);
> > > +	else /* Cleanup uncorrectable error status before getting to init */
> > > +		pci_aer_clear_uncorrect_error_status(pdev);
> > 
> >  From the IDT NTB PCIe initialization procedure point of view both of
> > these methods are equivalent. So for the IDT NTB part:
> > 
> IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
> function is clear uncorrectable error status register including fatal and
> non-fatal error status bits.
> 
> > Acked-by: Serge Semin <fancer.lancer@gmail.com>
> > 
> > -Sergey
> > 
> > >   	/* First enable the PCI device */
> > >   	ret = pcim_enable_device(pdev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index e10cdec6c56e..574176f43025 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
> > >   void pci_aer_exit(struct pci_dev *dev);
> > >   extern const struct attribute_group aer_stats_attr_group;
> > >   void pci_aer_clear_fatal_status(struct pci_dev *dev);
> > > +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > >   int pci_aer_clear_status(struct pci_dev *dev);
> > >   int pci_aer_raw_clear_status(struct pci_dev *dev);
> > >   #else
> > > @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
> > >   static inline void pci_aer_init(struct pci_dev *d) { }
> > >   static inline void pci_aer_exit(struct pci_dev *d) { }
> > >   static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> > > +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
> > >   static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
> > >   static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
> > >   #endif
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 7952e5efd6cf..d2996afa80f6 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
> > >   }
> > >   EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
> > > -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > > +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > >   {
> > >   	int aer = dev->aer_cap;
> > >   	u32 status, sev;
> > >   	if (!pcie_aer_is_native(dev))
> > > -		return -EIO;
> > > +		return;
> > >   	/* Clear status bits for ERR_NONFATAL errors only */
> > >   	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> > > @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > >   	status &= ~sev;
> > >   	if (status)
> > >   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > > -
> > > -	return 0;
> > >   }
> > > -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
> > >   void pci_aer_clear_fatal_status(struct pci_dev *dev)
> > >   {
> > > @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
> > >   		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > >   }
> > > +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> > > +{
> > > +	int aer = dev->aer_cap;
> > > +	u32 status;
> > > +
> > > +	if (!pcie_aer_is_native(dev))
> > > +		return -EIO;
> > > +
> > > +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
> > > +	if (status)
> > > +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> > > +
> > >   /**
> > >    * pci_aer_raw_clear_status - Clear AER error registers.
> > >    * @dev: the PCI device
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index 3e9afee02e8d..7942073fbb34 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
> > >   		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > >   		 aer_get_device_error_info(pdev, &info)) {
> > >   		aer_print_error(pdev, &info);
> > > -		pci_aer_clear_nonfatal_status(pdev);
> > > -		pci_aer_clear_fatal_status(pdev);
> > > +		pci_aer_clear_uncorrect_error_status(pdev);
> > >   	}
> > >   }
> > > diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
> > > index 3caaa7c4af48..1ed8d1640325 100644
> > > --- a/drivers/scsi/lpfc/lpfc_attr.c
> > > +++ b/drivers/scsi/lpfc/lpfc_attr.c
> > > @@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
> > >    * Description:
> > >    * If the @buf contains 1 and the device currently has the AER support
> > >    * enabled, then invokes the kernel AER helper routine
> > > - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
> > > + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
> > >    * error status register.
> > >    *
> > >    * Notes:
> > > @@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
> > >   		return -EINVAL;
> > >   	if (phba->hba_flag & HBA_AER_ENABLED)
> > > -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
> > > +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
> > >   	if (rc == 0)
> > >   		return strlen(buf);
> > > diff --git a/include/linux/aer.h b/include/linux/aer.h
> > > index 97f64ba1b34a..f638ad955deb 100644
> > > --- a/include/linux/aer.h
> > > +++ b/include/linux/aer.h
> > > @@ -44,7 +44,7 @@ struct aer_capability_regs {
> > >   /* PCIe port driver needs this function to enable AER */
> > >   int pci_enable_pcie_error_reporting(struct pci_dev *dev);
> > >   int pci_disable_pcie_error_reporting(struct pci_dev *dev);
> > > -int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > > +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
> > >   void pci_save_aer_state(struct pci_dev *dev);
> > >   void pci_restore_aer_state(struct pci_dev *dev);
> > >   #else
> > > @@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
> > >   {
> > >   	return -EINVAL;
> > >   }
> > > -static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> > > +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> > >   {
> > >   	return -EINVAL;
> > >   }
> > > -- 
> > > 2.30.1 (Apple Git-130)
> > > 
> 
> -- 
> Thanks,
> Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
  2022-09-01 18:16 ` [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery() Zhuo Chen
@ 2022-09-22 21:08   ` Bjorn Helgaas
  2022-09-26 14:01     ` Zhuo Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-22 21:08 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen,
	linux-scsi, linux-pci, linux-kernel, ntb, linuxppc-dev

On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
> When state is pci_channel_io_frozen in pcie_do_recovery(),
> the severity is fatal and fatal status should be cleared.
> So we add pci_aer_clear_fatal_status().

Seems sensible to me.  Did you find this by code inspection or by
debugging a problem?  If the latter, it would be nice to mention the
symptoms of the problem in the commit log.

> Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
> and pci_aer_clear_nonfatal_status() contains the function of
> 'if (host->native_aer || pcie_ports_native)', so we move them
> out of it.

Wrap commit log to fill 75 columns.

> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/pci/pcie/err.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 0c5a143025af..e0a8ade4c3fe 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	 * it is responsible for clearing this status.  In that case, the
>  	 * signaling device may not even be visible to the OS.
>  	 */
> -	if (host->native_aer || pcie_ports_native) {
> +	if (host->native_aer || pcie_ports_native)
>  		pcie_clear_device_status(dev);

pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER"), both callers check before calling
it.

I think we should move the check inside pcie_clear_device_status().
That could be a separate preliminary patch.

There are a couple other places (aer_root_reset() and
get_port_device_capability()) that do the same check and could be
changed to use pcie_aer_is_native() instead.  That could be another
preliminary patch.


> +	if (state == pci_channel_io_frozen)
> +		pci_aer_clear_fatal_status(dev);
> +	else
>  		pci_aer_clear_nonfatal_status(dev);
> -	}
> +
>  	pci_info(bridge, "device recovery successful\n");
>  	return status;
>  
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
  2022-09-01 18:16 ` [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status Zhuo Chen
@ 2022-09-22 21:50   ` Bjorn Helgaas
  2022-09-26 14:16     ` Zhuo Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-22 21:50 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen,
	linux-scsi, linux-pci, linux-kernel, ntb, linuxppc-dev

On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
> Statements clearing AER error status in aer_enable_rootport() has the
> same function as pci_aer_raw_clear_status(). So we replace them, which
> has no functional changes.
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/pci/pcie/aer.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index d2996afa80f6..eb0193f279f2 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>  				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
>  
>  	/* Clear error status */
> -	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> -	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
> -	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
> +	pci_aer_raw_clear_status(pdev);

It's true that this is functionally equivalent.

But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
unconditionally clear Error Status") says pci_aer_raw_clear_status()
is only for use in the EDR path (this should have been included in the
function comment), so I think we should preserve that property and use
pci_aer_clear_status() here.

pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
because get_port_device_capability() checks the same thing, so they
should be equivalent here.

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-22 20:02       ` Bjorn Helgaas
@ 2022-09-26 13:30         ` Zhuo Chen
  2022-09-26 17:21           ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-26 13:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Serge Semin, allenbh, dave.jiang, linux-scsi, martin.petersen,
	linux-pci, jejb, jdmason, james.smart, linux-kernel, ntb, oohall,
	bhelgaas, dick.kennedy, linuxppc-dev



On 9/23/22 4:02 AM, Bjorn Helgaas wrote:
> On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
>> On 9/12/22 12:22 AM, Serge Semin wrote:
>>> On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
>>>> Status bits for ERR_NONFATAL errors only are cleared in
>>>> pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
>>>> error status in ntb_hw_idt.c and lpfc_attr.c. So we add
>>>> pci_aer_clear_uncorrect_error_status() and change to use it.
>>>
>>> What about the next drivers
>>>
>>> drivers/scsi/lpfc/lpfc_attr.c
>>> drivers/crypto/hisilicon/qm.c
>>> drivers/net/ethernet/intel/ice/ice_main.c
>>>
>>> which call the pci_aer_clear_nonfatal_status() method too?
>>
>> ‘pci_aer_clear_nonfatal_status()’ in
>> drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
>> merged in kernel in: https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776
> 
> It's better if you can use kernel.org URLs that don't depend on
> third parties like github, e.g.,
> 
>    https://git.kernel.org/linus/ca415ea1f03a
> 
Good reminder, I'll pay attention next time.

>> ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
>> removed in the next kernel:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406
> 
> This is a problem because 00278564a60e ("crypto: hisilicon - Remove
> pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
> and if I apply this series to the PCI tree and Linus merges it before
> Herbert's cryptodev changes, it will break the build.
> 
> I think we need to split this patch up like this:
> 
>    - Add pci_aer_clear_uncorrect_error_status() to PCI core
>    - Convert dpc to use pci_aer_clear_uncorrect_error_status()
>      (I might end up squashing with above)
>    - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
>    - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
>    - Unexport pci_aer_clear_nonfatal_status()
> 
> Then I can apply all but the last patch safely.  If the crypto changes
> are merged first, we can add the last one; otherwise we can do it for
> the next cycle.
> 
Good proposal. I will implement these in the next version.

Do I need to put pci related modifications (include patch 2/3 and 3/3) 
in a patch set or just single patches?

>> Uncorrectable error status register was intended to be cleared in
>> drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
>> and
>> https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f
> 
> This will be a behavior change for lpfc and ntb_hw_idt.  It looks like
> it changes the behavior back to what it was before e7b0b847de6d
> ("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"),
> so it might be OK, but splitting these out to their own patches will
> make the change more obvious and we can make sure that's what we want.
> 
> Bjorn

Thanks Bjorn, I will put lpfc and ntb_hw_idt changes in single patchs.
> 
>>>> Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
>>>> no functional changes.
>>>>
>>>> Since pci_aer_clear_nonfatal_status() is used only internally, move
>>>> its declaration to the PCI internal header file. Also, no one cares
>>>> about return value of pci_aer_clear_nonfatal_status(), so make it void.
>>>>
>>>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>>>> ---
>>>>    drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
>>>>    drivers/pci/pci.h               |  2 ++
>>>>    drivers/pci/pcie/aer.c          | 23 ++++++++++++++++++-----
>>>>    drivers/pci/pcie/dpc.c          |  3 +--
>>>>    drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
>>>>    include/linux/aer.h             |  4 ++--
>>>>    6 files changed, 27 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
>>>> index 733557231ed0..de1dbbc5b9de 100644
>>>> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
>>>> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
>>>> @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>>>>    	ret = pci_enable_pcie_error_reporting(pdev);
>>>>    	if (ret != 0)
>>>>    		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
>>>
>>>> -	else /* Cleanup nonfatal error status before getting to init */
>>>> -		pci_aer_clear_nonfatal_status(pdev);
>>>> +	else /* Cleanup uncorrectable error status before getting to init */
>>>> +		pci_aer_clear_uncorrect_error_status(pdev);
>>>
>>>   From the IDT NTB PCIe initialization procedure point of view both of
>>> these methods are equivalent. So for the IDT NTB part:
>>>
>> IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
>> function is clear uncorrectable error status register including fatal and
>> non-fatal error status bits.
>>
>>> Acked-by: Serge Semin <fancer.lancer@gmail.com>
>>>
>>> -Sergey
>>>
>>>>    	/* First enable the PCI device */
>>>>    	ret = pcim_enable_device(pdev);
>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>> index e10cdec6c56e..574176f43025 100644
>>>> --- a/drivers/pci/pci.h
>>>> +++ b/drivers/pci/pci.h
>>>> @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
>>>>    void pci_aer_exit(struct pci_dev *dev);
>>>>    extern const struct attribute_group aer_stats_attr_group;
>>>>    void pci_aer_clear_fatal_status(struct pci_dev *dev);
>>>> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>>    int pci_aer_clear_status(struct pci_dev *dev);
>>>>    int pci_aer_raw_clear_status(struct pci_dev *dev);
>>>>    #else
>>>> @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
>>>>    static inline void pci_aer_init(struct pci_dev *d) { }
>>>>    static inline void pci_aer_exit(struct pci_dev *d) { }
>>>>    static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
>>>> +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
>>>>    static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; }
>>>>    static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; }
>>>>    #endif
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index 7952e5efd6cf..d2996afa80f6 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
>>>> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>    {
>>>>    	int aer = dev->aer_cap;
>>>>    	u32 status, sev;
>>>>    	if (!pcie_aer_is_native(dev))
>>>> -		return -EIO;
>>>> +		return;
>>>>    	/* Clear status bits for ERR_NONFATAL errors only */
>>>>    	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
>>>> @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>>    	status &= ~sev;
>>>>    	if (status)
>>>>    		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>>>> -
>>>> -	return 0;
>>>>    }
>>>> -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
>>>>    void pci_aer_clear_fatal_status(struct pci_dev *dev)
>>>>    {
>>>> @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>>>>    		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>>>>    }
>>>> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
>>>> +{
>>>> +	int aer = dev->aer_cap;
>>>> +	u32 status;
>>>> +
>>>> +	if (!pcie_aer_is_native(dev))
>>>> +		return -EIO;
>>>> +
>>>> +	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &status);
>>>> +	if (status)
>>>> +		pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
>>>> +
>>>>    /**
>>>>     * pci_aer_raw_clear_status - Clear AER error registers.
>>>>     * @dev: the PCI device
>>>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>>>> index 3e9afee02e8d..7942073fbb34 100644
>>>> --- a/drivers/pci/pcie/dpc.c
>>>> +++ b/drivers/pci/pcie/dpc.c
>>>> @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
>>>>    		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
>>>>    		 aer_get_device_error_info(pdev, &info)) {
>>>>    		aer_print_error(pdev, &info);
>>>> -		pci_aer_clear_nonfatal_status(pdev);
>>>> -		pci_aer_clear_fatal_status(pdev);
>>>> +		pci_aer_clear_uncorrect_error_status(pdev);
>>>>    	}
>>>>    }
>>>> diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
>>>> index 3caaa7c4af48..1ed8d1640325 100644
>>>> --- a/drivers/scsi/lpfc/lpfc_attr.c
>>>> +++ b/drivers/scsi/lpfc/lpfc_attr.c
>>>> @@ -4712,7 +4712,7 @@ static DEVICE_ATTR_RW(lpfc_aer_support);
>>>>     * Description:
>>>>     * If the @buf contains 1 and the device currently has the AER support
>>>>     * enabled, then invokes the kernel AER helper routine
>>>> - * pci_aer_clear_nonfatal_status() to clean up the uncorrectable
>>>> + * pci_aer_clear_uncorrect_error_status() to clean up the uncorrectable
>>>>     * error status register.
>>>>     *
>>>>     * Notes:
>>>> @@ -4738,7 +4738,7 @@ lpfc_aer_cleanup_state(struct device *dev, struct device_attribute *attr,
>>>>    		return -EINVAL;
>>>>    	if (phba->hba_flag & HBA_AER_ENABLED)
>>>> -		rc = pci_aer_clear_nonfatal_status(phba->pcidev);
>>>> +		rc = pci_aer_clear_uncorrect_error_status(phba->pcidev);
>>>>    	if (rc == 0)
>>>>    		return strlen(buf);
>>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>>> index 97f64ba1b34a..f638ad955deb 100644
>>>> --- a/include/linux/aer.h
>>>> +++ b/include/linux/aer.h
>>>> @@ -44,7 +44,7 @@ struct aer_capability_regs {
>>>>    /* PCIe port driver needs this function to enable AER */
>>>>    int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>>>>    int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>>>> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>>> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev);
>>>>    void pci_save_aer_state(struct pci_dev *dev);
>>>>    void pci_restore_aer_state(struct pci_dev *dev);
>>>>    #else
>>>> @@ -56,7 +56,7 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>>>>    {
>>>>    	return -EINVAL;
>>>>    }
>>>> -static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>>> +static inline int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
>>>>    {
>>>>    	return -EINVAL;
>>>>    }
>>>> -- 
>>>> 2.30.1 (Apple Git-130)
>>>>
>>
>> -- 
>> Thanks,
>> Zhuo Chen

-- 
Thanks,
Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
  2022-09-22 21:08   ` Bjorn Helgaas
@ 2022-09-26 14:01     ` Zhuo Chen
  2022-09-26 18:09       ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-26 14:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen,
	linux-scsi, linux-pci, linux-kernel, ntb, linuxppc-dev



On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
> On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
>> When state is pci_channel_io_frozen in pcie_do_recovery(),
>> the severity is fatal and fatal status should be cleared.
>> So we add pci_aer_clear_fatal_status().
> 
> Seems sensible to me.  Did you find this by code inspection or by
> debugging a problem?  If the latter, it would be nice to mention the
> symptoms of the problem in the commit log.

I found this by code inspection so I may not enumerate what kind of 
problems this code will cause.
> 
>> Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
>> and pci_aer_clear_nonfatal_status() contains the function of
>> 'if (host->native_aer || pcie_ports_native)', so we move them
>> out of it.
> 
> Wrap commit log to fill 75 columns.
> 
>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>> ---
>>   drivers/pci/pcie/err.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 0c5a143025af..e0a8ade4c3fe 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>   	 * it is responsible for clearing this status.  In that case, the
>>   	 * signaling device may not even be visible to the OS.
>>   	 */
>> -	if (host->native_aer || pcie_ports_native) {
>> +	if (host->native_aer || pcie_ports_native)
>>   		pcie_clear_device_status(dev);
> 
> pcie_clear_device_status() doesn't check for pcie_aer_is_native()
> internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
> errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
> status only when we control AER"), both callers check before calling
> it.
> 
> I think we should move the check inside pcie_clear_device_status().
> That could be a separate preliminary patch.
> 
> There are a couple other places (aer_root_reset() and
> get_port_device_capability()) that do the same check and could be
> changed to use pcie_aer_is_native() instead.  That could be another
> preliminary patch.
> 
Good suggestion. But I have only one doubt. In aer_root_reset(), if we 
use "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return 
false. It's different from just using "(host->native_aer ||
pcie_ports_native)".
Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL 
pointer check should be added in pcie_aer_is_native() because root may 
be NULL.

> 
>> +	if (state == pci_channel_io_frozen)
>> +		pci_aer_clear_fatal_status(dev);
>> +	else
>>   		pci_aer_clear_nonfatal_status(dev);
>> -	}
>> +
>>   	pci_info(bridge, "device recovery successful\n");
>>   	return status;
>>   
>> -- 
>> 2.30.1 (Apple Git-130)
>>

-- 
Thanks,
Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
  2022-09-22 21:50   ` Bjorn Helgaas
@ 2022-09-26 14:16     ` Zhuo Chen
  2022-09-26 17:22       ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Zhuo Chen @ 2022-09-26 14:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: fancer.lancer, jdmason, dave.jiang, allenbh, bhelgaas, ruscur,
	oohall, james.smart, dick.kennedy, jejb, martin.petersen,
	linux-scsi, linux-pci, linux-kernel, ntb, linuxppc-dev



On 9/23/22 5:50 AM, Bjorn Helgaas wrote:
> On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
>> Statements clearing AER error status in aer_enable_rootport() has the
>> same function as pci_aer_raw_clear_status(). So we replace them, which
>> has no functional changes.
>>
>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>> ---
>>   drivers/pci/pcie/aer.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index d2996afa80f6..eb0193f279f2 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>>   				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
>>   
>>   	/* Clear error status */
>> -	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> -	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
>> -	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
>> -	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
>> -	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
>> -	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
>> +	pci_aer_raw_clear_status(pdev);
> 
> It's true that this is functionally equivalent.
> 
> But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
> unconditionally clear Error Status") says pci_aer_raw_clear_status()
> is only for use in the EDR path (this should have been included in the
> function comment), so I think we should preserve that property and use
> pci_aer_clear_status() here.
> 
> pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
> except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
> can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
> because get_port_device_capability() checks the same thing, so they
> should be equivalent here.
> 
> Bjorn
Thanks Bjorn, this very detailed correction is helpful. By the way, 
'only for use in the EDR path' obviously written in the function 
comments may be better. So far only commit log has included these.

I will change to use pci_aer_clear_status() in next patch.

-- 
Thanks,
Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
  2022-09-26 13:30         ` Zhuo Chen
@ 2022-09-26 17:21           ` Bjorn Helgaas
  0 siblings, 0 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-26 17:21 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: allenbh, dave.jiang, martin.petersen, linux-scsi, linux-pci,
	jejb, james.smart, Serge Semin, linux-kernel, ntb, oohall,
	jdmason, bhelgaas, dick.kennedy, linuxppc-dev

On Mon, Sep 26, 2022 at 09:30:48PM +0800, Zhuo Chen wrote:
> On 9/23/22 4:02 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> > > On 9/12/22 12:22 AM, Serge Semin wrote:
> > > > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

> > > ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> > > removed in the next kernel:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406
> > 
> > This is a problem because 00278564a60e ("crypto: hisilicon - Remove
> > pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
> > and if I apply this series to the PCI tree and Linus merges it before
> > Herbert's cryptodev changes, it will break the build.
> > 
> > I think we need to split this patch up like this:
> > 
> >    - Add pci_aer_clear_uncorrect_error_status() to PCI core
> >    - Convert dpc to use pci_aer_clear_uncorrect_error_status()
> >      (I might end up squashing with above)
> >    - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
> >    - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
> >    - Unexport pci_aer_clear_nonfatal_status()
> > 
> > Then I can apply all but the last patch safely.  If the crypto changes
> > are merged first, we can add the last one; otherwise we can do it for
> > the next cycle.
> > 
> Good proposal. I will implement these in the next version.
> 
> Do I need to put pci related modifications (include patch 2/3 and 3/3) in a
> patch set or just single patches?

When in doubt, put them in separate patches.  It's trivial for me to
squash them together if that makes more sense, but much more difficult
for me to split them apart.

Thanks for helping clean up this area!

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
  2022-09-26 14:16     ` Zhuo Chen
@ 2022-09-26 17:22       ` Bjorn Helgaas
  0 siblings, 0 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-26 17:22 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: allenbh, dave.jiang, linux-scsi, martin.petersen, linux-pci,
	jejb, jdmason, james.smart, fancer.lancer, linux-kernel, ntb,
	oohall, bhelgaas, dick.kennedy, linuxppc-dev

On Mon, Sep 26, 2022 at 10:16:23PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:50 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
> > > Statements clearing AER error status in aer_enable_rootport() has the
> > > same function as pci_aer_raw_clear_status(). So we replace them, which
> > > has no functional changes.

> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
> > > +	pci_aer_raw_clear_status(pdev);
> > 
> > It's true that this is functionally equivalent.
> > 
> > But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
> > unconditionally clear Error Status") says pci_aer_raw_clear_status()
> > is only for use in the EDR path (this should have been included in the
> > function comment), so I think we should preserve that property and use
> > pci_aer_clear_status() here.
> > 
> > pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
> > except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
> > can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
> > because get_port_device_capability() checks the same thing, so they
> > should be equivalent here.
> > 
> Thanks Bjorn, this very detailed correction is helpful. By the way, 'only
> for use in the EDR path' obviously written in the function comments may be
> better. So far only commit log has included these.

Yes, definitely!  I goofed when I applied that patch without making
sure there was something in the function comment.

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
  2022-09-26 14:01     ` Zhuo Chen
@ 2022-09-26 18:09       ` Bjorn Helgaas
  2022-09-27 13:41         ` [External] " Zhuo Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2022-09-26 18:09 UTC (permalink / raw)
  To: Zhuo Chen
  Cc: allenbh, dave.jiang, linux-scsi, martin.petersen, linux-pci,
	jejb, jdmason, james.smart, fancer.lancer, linux-kernel, ntb,
	oohall, bhelgaas, dick.kennedy, linuxppc-dev

On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
> > > When state is pci_channel_io_frozen in pcie_do_recovery(),
> > > the severity is fatal and fatal status should be cleared.
> > > So we add pci_aer_clear_fatal_status().
> > 
> > Seems sensible to me.  Did you find this by code inspection or by
> > debugging a problem?  If the latter, it would be nice to mention the
> > symptoms of the problem in the commit log.
> 
> I found this by code inspection so I may not enumerate what kind of problems
> this code will cause.
> > 
> > > Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
> > > and pci_aer_clear_nonfatal_status() contains the function of
> > > 'if (host->native_aer || pcie_ports_native)', so we move them
> > > out of it.
> > 
> > Wrap commit log to fill 75 columns.
> > 
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >   drivers/pci/pcie/err.c | 8 ++++++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index 0c5a143025af..e0a8ade4c3fe 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > >   	 * it is responsible for clearing this status.  In that case, the
> > >   	 * signaling device may not even be visible to the OS.
> > >   	 */
> > > -	if (host->native_aer || pcie_ports_native) {
> > > +	if (host->native_aer || pcie_ports_native)
> > >   		pcie_clear_device_status(dev);
> > 
> > pcie_clear_device_status() doesn't check for pcie_aer_is_native()
> > internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
> > errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
> > status only when we control AER"), both callers check before calling
> > it.
> > 
> > I think we should move the check inside pcie_clear_device_status().
> > That could be a separate preliminary patch.
> > 
> > There are a couple other places (aer_root_reset() and
> > get_port_device_capability()) that do the same check and could be
> > changed to use pcie_aer_is_native() instead.  That could be another
> > preliminary patch.
> > 
> Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
> "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
> is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
> false. It's different from just using "(host->native_aer ||
> pcie_ports_native)".
> Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
> pointer check should be added in pcie_aer_is_native() because root may be
> NULL.

Good point.  In aer_root_reset(), we're updating Root Port registers,
so I think they should look like:

  if (pcie_aer_is_native(root) && aer) {
    ...
  }

Does that seem safe and equivalent to you?

Bjorn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [External] Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
  2022-09-26 18:09       ` Bjorn Helgaas
@ 2022-09-27 13:41         ` Zhuo Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Zhuo Chen @ 2022-09-27 13:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: allenbh, dave.jiang, linux-scsi, martin.petersen, linux-pci,
	jejb, jdmason, james.smart, fancer.lancer, linux-kernel, ntb,
	oohall, bhelgaas, dick.kennedy, linuxppc-dev



On 9/27/22 2:09 AM, Bjorn Helgaas wrote:
> On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:
>> On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
>>> On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
>>>> When state is pci_channel_io_frozen in pcie_do_recovery(),
>>>> the severity is fatal and fatal status should be cleared.
>>>> So we add pci_aer_clear_fatal_status().
>>>
>>> Seems sensible to me.  Did you find this by code inspection or by
>>> debugging a problem?  If the latter, it would be nice to mention the
>>> symptoms of the problem in the commit log.
>>
>> I found this by code inspection so I may not enumerate what kind of problems
>> this code will cause.
>>>
>>>> Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
>>>> and pci_aer_clear_nonfatal_status() contains the function of
>>>> 'if (host->native_aer || pcie_ports_native)', so we move them
>>>> out of it.
>>>
>>> Wrap commit log to fill 75 columns.
>>>
>>>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>>>> ---
>>>>    drivers/pci/pcie/err.c | 8 ++++++--
>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index 0c5a143025af..e0a8ade4c3fe 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>>    	 * it is responsible for clearing this status.  In that case, the
>>>>    	 * signaling device may not even be visible to the OS.
>>>>    	 */
>>>> -	if (host->native_aer || pcie_ports_native) {
>>>> +	if (host->native_aer || pcie_ports_native)
>>>>    		pcie_clear_device_status(dev);
>>>
>>> pcie_clear_device_status() doesn't check for pcie_aer_is_native()
>>> internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
>>> errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
>>> status only when we control AER"), both callers check before calling
>>> it.
>>>
>>> I think we should move the check inside pcie_clear_device_status().
>>> That could be a separate preliminary patch.
>>>
>>> There are a couple other places (aer_root_reset() and
>>> get_port_device_capability()) that do the same check and could be
>>> changed to use pcie_aer_is_native() instead.  That could be another
>>> preliminary patch.
>>>
>> Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
>> "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
>> is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
>> false. It's different from just using "(host->native_aer ||
>> pcie_ports_native)".
>> Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
>> pointer check should be added in pcie_aer_is_native() because root may be
>> NULL.
> 
> Good point.  In aer_root_reset(), we're updating Root Port registers,
> so I think they should look like:
> 
>    if (pcie_aer_is_native(root) && aer) {
>      ...
>    }
> 
> Does that seem safe and equivalent to you?
> 
> Bjorn

I think ‘if (aer && pcie_aer_is_native(root))’ might be safer,
because when root is NULL, 'aer' will be NUll as well, and the
predicate will return false without entering pcie_aer_is_native(root).


-- 
Thanks,
Zhuo Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-09-27 13:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-01 18:16 [PATCH 0/3] PCI/AER: Fix and optimize usage of status clear api Zhuo Chen
2022-09-01 18:16 ` [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status Zhuo Chen
2022-09-11 16:22   ` Serge Semin
2022-09-11 17:09     ` [External] " Zhuo Chen
2022-09-11 17:55       ` Serge Semin
2022-09-22 20:02       ` Bjorn Helgaas
2022-09-26 13:30         ` Zhuo Chen
2022-09-26 17:21           ` Bjorn Helgaas
2022-09-01 18:16 ` [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery() Zhuo Chen
2022-09-22 21:08   ` Bjorn Helgaas
2022-09-26 14:01     ` Zhuo Chen
2022-09-26 18:09       ` Bjorn Helgaas
2022-09-27 13:41         ` [External] " Zhuo Chen
2022-09-01 18:16 ` [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status Zhuo Chen
2022-09-22 21:50   ` Bjorn Helgaas
2022-09-26 14:16     ` Zhuo Chen
2022-09-26 17:22       ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).