linux-kernel.vger.kernel.org archive mirror
* [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits
@ 2018-06-07  6:00 Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 2/6] PCI/AER: Clear uncorrectable fatal error status bits Oza Pawandeep
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

PCIe ERR_NONFATAL and ERR_FATAL are both uncorrectable errors, so clearing
uncorrectable error status bits should take the severity mask read from
PCI_ERR_UNCOR_SEVER into account.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 377e576..8cbc62b 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -341,8 +341,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  */
 static void aer_error_resume(struct pci_dev *dev)
 {
-	int pos;
-	u32 status, mask;
 	u16 reg16;
 
 	/* Clean up Root device status */
@@ -350,11 +348,7 @@ static void aer_error_resume(struct pci_dev *dev)
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16);
 
 	/* Clean AER Root Error Status */
-	pos = dev->aer_cap;
-	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
-	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &mask);
-	status &= ~mask; /* Clear corresponding nonfatal bits */
-	pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
+	pci_cleanup_aer_uncorrect_error_status(dev);
 }
 
 /**
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 946f3f6..309f3f5 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -50,13 +50,17 @@ EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 {
 	int pos;
-	u32 status;
+	u32 status, mask;
 
 	pos = dev->aer_cap;
 	if (!pos)
 		return -EIO;
 
+	/* Clean AER Root Error Status */
+	pos = dev->aer_cap;
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
+	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &mask);
+	status &= ~mask; /* Clear corresponding nonfatal bits */
 	if (status)
 		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
 
-- 
2.7.4


* [PATCH NEXT 2/6] PCI/AER: Clear uncorrectable fatal error status bits
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
@ 2018-06-07  6:00 ` Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 3/6] PCI/ERR: Cleanup ERR_FATAL of error broadcast Oza Pawandeep
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

During ERR_FATAL handling, AER calls pci_cleanup_aer_uncorrect_error_status(),
which should check for the pci_channel_io_frozen state to determine whether
it has to clear the fatal bits or the nonfatal bits.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 309f3f5..6745e37 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -60,7 +60,12 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 	pos = dev->aer_cap;
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &mask);
-	status &= ~mask; /* Clear corresponding nonfatal bits */
+
+	if (dev->error_state == pci_channel_io_normal)
+		status &= ~mask; /* Clear corresponding nonfatal bits */
+	else
+		status &= mask; /* Clear corresponding fatal bits */
+
 	if (status)
 		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
 
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f7ce0cb..00d2875 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -288,6 +288,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 	struct pci_dev *pdev, *temp;
 	pci_ers_result_t result;
 
+	dev->error_state = pci_channel_io_frozen;
 	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
 		udev = dev;
 	else
@@ -323,6 +324,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		if (pcie_wait_for_link(udev, true))
 			pci_rescan_bus(udev->bus);
 		pci_info(dev, "Device recovery from fatal error successful\n");
+		dev->error_state = pci_channel_io_normal;
 	} else {
 		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
 		pci_info(dev, "Device recovery from fatal error failed\n");
-- 
2.7.4
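
For readers following the series, the bit selection introduced by patches
1/6 and 2/6 can be modelled in user space. This is only a sketch, not kernel
code: the enum chan_state and the helper uncor_bits_to_clear() are
hypothetical names, the model has only two states where the kernel has more,
and the register values are made up. It assumes only what the diff shows: a
bit set in the value read from PCI_ERR_UNCOR_SEVER marks that error as fatal,
and the value written back to PCI_ERR_UNCOR_STATUS selects the bits to clear.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's channel state values. */
enum chan_state { CHAN_IO_NORMAL, CHAN_IO_FROZEN };

/*
 * Return the uncorrectable status bits that would be written back
 * (and thereby cleared), mirroring the if/else added in patch 2/6:
 * in the normal state only non-fatal bits are selected, otherwise
 * only fatal bits are selected.
 */
uint32_t uncor_bits_to_clear(uint32_t status, uint32_t sever,
			     enum chan_state state)
{
	if (state == CHAN_IO_NORMAL)
		return status & ~sever;	/* non-fatal errors only */

	return status & sever;		/* fatal errors only */
}
```

With a made-up status of 0x00041010 and severity of 0x00040010, the normal
case selects 0x00001000 and the frozen case selects 0x00040010.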


* [PATCH NEXT 3/6] PCI/ERR: Cleanup ERR_FATAL of error broadcast
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 2/6] PCI/AER: Clear uncorrectable fatal error status bits Oza Pawandeep
@ 2018-06-07  6:00 ` Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 4/6] PCI/AER: Clear device status error bits during ERR_FATAL and ERR_NONFATAL Oza Pawandeep
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

ERR_FATAL is handled by resetting the Link in software, skipping the
driver pci_error_handlers callbacks, removing the devices from the PCI
subsystem, and re-enumerating, so ERR_FATAL handling is no longer
required inside broadcast_error_message().

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 00d2875..3998ed7 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -259,15 +259,10 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		/*
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
+		 * The error is non fatal so the bus is ok, just invoke
+		 * the callback for the function that logged the error.
 		 */
-		if (state == pci_channel_io_normal)
-			/*
-			 * the error is non fatal so the bus is ok, just invoke
-			 * the callback for the function that logged the error.
-			 */
-			cb(dev, &result_data);
-		else
-			pci_walk_bus(dev->bus, cb, &result_data);
+		cb(dev, &result_data);
 	}
 
 	return result_data.result;
-- 
2.7.4


* [PATCH NEXT 4/6] PCI/AER: Clear device status error bits during ERR_FATAL and ERR_NONFATAL
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 2/6] PCI/AER: Clear uncorrectable fatal error status bits Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 3/6] PCI/ERR: Cleanup ERR_FATAL of error broadcast Oza Pawandeep
@ 2018-06-07  6:00 ` Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 5/6] PCI/AER: Clear correctable status bits in device register Oza Pawandeep
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

ERR_FATAL is now handled by resetting the Link in software, skipping the
driver pci_error_handlers callbacks, removing the devices from the PCI
subsystem, and re-enumerating, so the device status also has to be cleared.
This fixes a bug that existed before.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 8cbc62b..0d9eaba 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -341,12 +341,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  */
 static void aer_error_resume(struct pci_dev *dev)
 {
-	u16 reg16;
-
 	/* Clean up Root device status */
-	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &reg16);
-	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16);
-
+	pci_cleanup_aer_error_device_status(dev);
 	/* Clean AER Root Error Status */
 	pci_cleanup_aer_uncorrect_error_status(dev);
 }
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 6745e37..95e9828 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -47,6 +47,17 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
 
+int pci_cleanup_aer_error_device_status(struct pci_dev *dev)
+{
+	u16 reg16;
+
+	/* Clean up Root device status */
+	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &reg16);
+	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16);
+
+	return 0;
+}
+
 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 {
 	int pos;
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 3998ed7..e1e642c 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -252,6 +252,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 			dev->error_state = state;
 		pci_walk_bus(dev->subordinate, cb, &result_data);
 		if (cb == report_resume) {
+			pci_cleanup_aer_error_device_status(dev);
 			pci_cleanup_aer_uncorrect_error_status(dev);
 			dev->error_state = pci_channel_io_normal;
 		}
@@ -312,6 +313,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		 * do error recovery on all subordinates of the bridge instead
 		 * of the bridge and clear the error status of the bridge.
 		 */
+		pci_cleanup_aer_error_device_status(dev);
 		pci_cleanup_aer_uncorrect_error_status(dev);
 	}
 
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 514bffa..165a147 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -44,6 +44,7 @@ struct aer_capability_regs {
 /* PCIe port driver needs this function to enable AER */
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+int pci_cleanup_aer_error_device_status(struct pci_dev *dev);
 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
 int pci_cleanup_aer_error_status_regs(struct pci_dev *dev);
 #else
@@ -55,6 +56,10 @@ static inline int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
+static inline int pci_cleanup_aer_error_device_status(struct pci_dev *dev)
+{
+	return -EINVAL;
+}
 static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 {
 	return -EINVAL;
-- 
2.7.4
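
The read-then-write-back pattern in pci_cleanup_aer_error_device_status()
works because the error bits in the Device Status register are
write-1-to-clear. The user-space model below is a sketch under that
assumption, not kernel code: DEVSTA_RW1C_BITS, devsta_after_write() and
devsta_cleanup() are hypothetical names, while the PCI_EXP_DEVSTA_* values
follow the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Device Status bits, as defined in the kernel's pci_regs.h. */
#define PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected (read-only) */
#define PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending (read-only) */

/* Hypothetical mask of the write-1-to-clear bits in this model. */
#define DEVSTA_RW1C_BITS (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
			  PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)

/* Model a config write: each 1 written to an RW1C bit clears that bit. */
uint16_t devsta_after_write(uint16_t current, uint16_t written)
{
	return (uint16_t)(current & ~(written & DEVSTA_RW1C_BITS));
}

/* Model the cleanup helper: read the register, write the same value back. */
uint16_t devsta_cleanup(uint16_t current)
{
	uint16_t reg16 = current;	/* models pcie_capability_read_word() */

	/* models pcie_capability_write_word() with the value just read */
	return devsta_after_write(current, reg16);
}
```

In this model, a register value of 0x0025 (Transactions Pending, Fatal and
Correctable Error Detected) becomes 0x0020 after cleanup: the error bits
are cleared and the read-only bit is left alone.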


* [PATCH NEXT 5/6] PCI/AER: Clear correctable status bits in device register
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
                   ` (2 preceding siblings ...)
  2018-06-07  6:00 ` [PATCH NEXT 4/6] PCI/AER: Clear device status error bits during ERR_FATAL and ERR_NONFATAL Oza Pawandeep
@ 2018-06-07  6:00 ` Oza Pawandeep
  2018-06-07  6:00 ` [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset() Oza Pawandeep
  2018-06-07 13:21 ` [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Bjorn Helgaas
  5 siblings, 0 replies; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

In case of a correctable error, the Device Status register sets
Correctable Error Detected, which should be cleared after handling
the error.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 95e9828..0e4e99a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -271,6 +271,7 @@ static void handle_error_source(struct pcie_device *aerdev,
 		if (pos)
 			pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
 					info->status);
+		pci_cleanup_aer_error_device_status(dev);
 	} else if (info->severity == AER_NONFATAL)
 		pcie_do_nonfatal_recovery(dev);
 	else if (info->severity == AER_FATAL)
-- 
2.7.4


* [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
                   ` (3 preceding siblings ...)
  2018-06-07  6:00 ` [PATCH NEXT 5/6] PCI/AER: Clear correctable status bits in device register Oza Pawandeep
@ 2018-06-07  6:00 ` Oza Pawandeep
  2018-06-07 13:48   ` poza
  2018-06-07 13:21 ` [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Bjorn Helgaas
  5 siblings, 1 reply; 16+ messages in thread
From: Oza Pawandeep @ 2018-06-07  6:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi
  Cc: Oza Pawandeep

ERR_FATAL is now handled by resetting the Link in software, skipping the
driver pci_error_handlers callbacks, removing the devices from the PCI
subsystem, and re-enumerating. As a result, pcie_portdrv_slot_reset() is
no longer called in the ERR_FATAL case.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 973f1b8..92f5d330 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
 
 /* global data */
 
-static int pcie_portdrv_restore_config(struct pci_dev *dev)
-{
-	int retval;
-
-	retval = pci_enable_device(dev);
-	if (retval)
-		return retval;
-	pci_set_master(dev);
-	return 0;
-}
-
 #ifdef CONFIG_PM
 static int pcie_port_runtime_suspend(struct device *dev)
 {
@@ -162,14 +151,6 @@ static pci_ers_result_t pcie_portdrv_mmio_enabled(struct pci_dev *dev)
 
 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
-	/* If fatal, restore cfg space for possible link reset at upstream */
-	if (dev->error_state == pci_channel_io_frozen) {
-		dev->state_saved = true;
-		pci_restore_state(dev);
-		pcie_portdrv_restore_config(dev);
-		pci_enable_pcie_error_reporting(dev);
-	}
-
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-- 
2.7.4


* Re: [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits
  2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
                   ` (4 preceding siblings ...)
  2018-06-07  6:00 ` [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset() Oza Pawandeep
@ 2018-06-07 13:21 ` Bjorn Helgaas
  2018-06-07 13:44   ` poza
  5 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2018-06-07 13:21 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Jun 07, 2018 at 02:00:29AM -0400, Oza Pawandeep wrote:
> PCIe ERR_NONFATAL and ERR_FATAL are uncorrectable errors, and clearing
> uncorrectable error bits should take error mask into account.
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

If/when you repost these, please include a [0/6] cover letter with an
overview of the purpose of the series.

I assume these are for v4.19, so I'll look at them after the merge
window.

If they fix issues introduced during the v4.18 merge window, we may be
able to merge them during the v4.18 -rc cycle.  In this case, I would
need specifics about what exactly the problems are.



* Re: [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits
  2018-06-07 13:21 ` [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Bjorn Helgaas
@ 2018-06-07 13:44   ` poza
  0 siblings, 0 replies; 16+ messages in thread
From: poza @ 2018-06-07 13:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-06-07 18:51, Bjorn Helgaas wrote:
> On Thu, Jun 07, 2018 at 02:00:29AM -0400, Oza Pawandeep wrote:
>> PCIe ERR_NONFATAL and ERR_FATAL are uncorrectable errors, and clearing
>> uncorrectable error bits should take error mask into account.
>> 
>> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> If/when you repost these, please include a [0/6] cover letter with an
> overview of the purpose of the series.
> 
> I assume these are for v4.19, so I'll look at them after the merge
> window.
> 
> If they fix issues introduced during the v4.18 merge window, we may be
> able to merge them during the v4.18 -rc cycle.  In this case, I would
> need specifics about what exactly the problems are.

Sure Bjorn, I will include a cover letter.
Mostly these fix things that existed before 4.18 as well.
I have a question; please clarify when you get a chance.

I am posting the question on top of PATCH 6.

Regards,
Oza.



* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07  6:00 ` [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset() Oza Pawandeep
@ 2018-06-07 13:48   ` poza
  2018-06-07 21:34     ` Bjorn Helgaas
  0 siblings, 1 reply; 16+ messages in thread
From: poza @ 2018-06-07 13:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-06-07 11:30, Oza Pawandeep wrote:
> We are handling ERR_FATAL by resetting the Link in software,skipping 
> the
> driver pci_error_handlers callbacks, removing the devices from the PCI
> subsystem, and re-enumerating, as a result of that, no more calling
> pcie_portdrv_slot_reset in ERR_FATAL case.
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c 
> b/drivers/pci/pcie/portdrv_pci.c
> index 973f1b8..92f5d330 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
> 
>  /* global data */
> 
> -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> -{
> -	int retval;
> -
> -	retval = pci_enable_device(dev);
> -	if (retval)
> -		return retval;
> -	pci_set_master(dev);
> -	return 0;
> -}
> -
>  #ifdef CONFIG_PM
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
> @@ -162,14 +151,6 @@ static pci_ers_result_t
> pcie_portdrv_mmio_enabled(struct pci_dev *dev)
> 
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> -	/* If fatal, restore cfg space for possible link reset at upstream */
> -	if (dev->error_state == pci_channel_io_frozen) {
> -		dev->state_saved = true;
> -		pci_restore_state(dev);
> -		pcie_portdrv_restore_config(dev);
> -		pci_enable_pcie_error_reporting(dev);
> -	}
> -
>  	return PCI_ERS_RESULT_RECOVERED;
>  }


Hi Bjorn,

The above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
because we now handle ERR_FATAL differently than before.

I tried to dig into the pcie_portdrv_slot_reset() handling of the ERR_FATAL
case, where it restores the config space, enables the device, sets bus
mastering, and enables error reporting. As far as I understand, this is
done for the upstream link (bridges etc.).

Why was it done in the first place? (I checked the commit description, but
could not really understand it.) And do we need to handle the same thing
in the ERR_FATAL case now?

Regards,
Oza.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07 13:48   ` poza
@ 2018-06-07 21:34     ` Bjorn Helgaas
  2018-06-08  4:47       ` poza
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2018-06-07 21:34 UTC (permalink / raw)
  To: poza
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
> On 2018-06-07 11:30, Oza Pawandeep wrote:
> > We are handling ERR_FATAL by resetting the Link in software,skipping the
> > driver pci_error_handlers callbacks, removing the devices from the PCI
> > subsystem, and re-enumerating, as a result of that, no more calling
> > pcie_portdrv_slot_reset in ERR_FATAL case.
> > 
> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> > 
> > diff --git a/drivers/pci/pcie/portdrv_pci.c
> > b/drivers/pci/pcie/portdrv_pci.c
> > index 973f1b8..92f5d330 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
> > 
> >  /* global data */
> > 
> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
> > -{
> > -	int retval;
> > -
> > -	retval = pci_enable_device(dev);
> > -	if (retval)
> > -		return retval;
> > -	pci_set_master(dev);
> > -	return 0;
> > -}
> > -
> >  #ifdef CONFIG_PM
> >  static int pcie_port_runtime_suspend(struct device *dev)
> >  {
> > @@ -162,14 +151,6 @@ static pci_ers_result_t
> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
> > 
> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
> >  {
> > -	/* If fatal, restore cfg space for possible link reset at upstream */
> > -	if (dev->error_state == pci_channel_io_frozen) {
> > -		dev->state_saved = true;
> > -		pci_restore_state(dev);
> > -		pcie_portdrv_restore_config(dev);
> > -		pci_enable_pcie_error_reporting(dev);
> > -	}
> > -
> >  	return PCI_ERS_RESULT_RECOVERED;
> >  }
> 
> 
> Hi Bjorn,
> 
> the above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
> because now we are handling ERR_FATAL differently than before.
> 
> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL case
> where it
> restores the config space, enable device, set master and enable error
> reporting....
> and as far as I understand this is being done for upstream link (bridges
> etc..)
> 
> why was it done at the first point (I checked the commit description, but
> could not really get it)
> and do we need to handle the same thing in ERR_FATAL now ?

You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
commit log has no useful information.  I don't know any of the history
behind it.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07 21:34     ` Bjorn Helgaas
@ 2018-06-08  4:47       ` poza
  2018-06-08 22:43         ` Keith Busch
  2018-06-08  4:57       ` poza
  2018-06-11 10:01       ` poza
  2 siblings, 1 reply; 16+ messages in thread
From: poza @ 2018-06-08  4:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-06-08 03:04, Bjorn Helgaas wrote:
> On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
>> On 2018-06-07 11:30, Oza Pawandeep wrote:
>> > We are handling ERR_FATAL by resetting the Link in software,skipping the
>> > driver pci_error_handlers callbacks, removing the devices from the PCI
>> > subsystem, and re-enumerating, as a result of that, no more calling
>> > pcie_portdrv_slot_reset in ERR_FATAL case.
>> >
>> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> >
>> > diff --git a/drivers/pci/pcie/portdrv_pci.c
>> > b/drivers/pci/pcie/portdrv_pci.c
>> > index 973f1b8..92f5d330 100644
>> > --- a/drivers/pci/pcie/portdrv_pci.c
>> > +++ b/drivers/pci/pcie/portdrv_pci.c
>> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>> >
>> >  /* global data */
>> >
>> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
>> > -{
>> > -	int retval;
>> > -
>> > -	retval = pci_enable_device(dev);
>> > -	if (retval)
>> > -		return retval;
>> > -	pci_set_master(dev);
>> > -	return 0;
>> > -}
>> > -
>> >  #ifdef CONFIG_PM
>> >  static int pcie_port_runtime_suspend(struct device *dev)
>> >  {
>> > @@ -162,14 +151,6 @@ static pci_ers_result_t
>> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>> >
>> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>> >  {
>> > -	/* If fatal, restore cfg space for possible link reset at upstream */
>> > -	if (dev->error_state == pci_channel_io_frozen) {
>> > -		dev->state_saved = true;
>> > -		pci_restore_state(dev);
>> > -		pcie_portdrv_restore_config(dev);
>> > -		pci_enable_pcie_error_reporting(dev);
>> > -	}
>> > -
>> >  	return PCI_ERS_RESULT_RECOVERED;
>> >  }
>> 
>> 
>> Hi Bjorn,
>> 
>> the above patch removes ERR_FATAL handling from 
>> pcie_portdrv_slot_reset()
>> because now we are handling ERR_FATAL differently than before.
>> 
>> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL 
>> case
>> where it
>> restores the config space, enable device, set master and enable error
>> reporting....
>> and as far as I understand this is being done for upstream link 
>> (bridges
>> etc..)
>> 
>> why was it done at the first point (I checked the commit description, 
>> but
>> could not really get it)
>> and do we need to handle the same thing in ERR_FATAL now ?
> 
> You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
> error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
> commit log has no useful information.  I don't know any of the history
> behind it.

Keith,

Do you know why the following was done in the ERR_FATAL case?
Please have a look at the pcie_portdrv_slot_reset() handling (for bridges,
switches etc.).

Regards,
Oza.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07 21:34     ` Bjorn Helgaas
  2018-06-08  4:47       ` poza
@ 2018-06-08  4:57       ` poza
  2018-06-08 10:41         ` okaya
  2018-06-11 10:01       ` poza
  2 siblings, 1 reply; 16+ messages in thread
From: poza @ 2018-06-08  4:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-06-08 03:04, Bjorn Helgaas wrote:
> On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
>> On 2018-06-07 11:30, Oza Pawandeep wrote:
>> > We are handling ERR_FATAL by resetting the Link in software,skipping the
>> > driver pci_error_handlers callbacks, removing the devices from the PCI
>> > subsystem, and re-enumerating, as a result of that, no more calling
>> > pcie_portdrv_slot_reset in ERR_FATAL case.
>> >
>> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> >
>> > diff --git a/drivers/pci/pcie/portdrv_pci.c
>> > b/drivers/pci/pcie/portdrv_pci.c
>> > index 973f1b8..92f5d330 100644
>> > --- a/drivers/pci/pcie/portdrv_pci.c
>> > +++ b/drivers/pci/pcie/portdrv_pci.c
>> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>> >
>> >  /* global data */
>> >
>> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
>> > -{
>> > -	int retval;
>> > -
>> > -	retval = pci_enable_device(dev);
>> > -	if (retval)
>> > -		return retval;
>> > -	pci_set_master(dev);
>> > -	return 0;
>> > -}
>> > -
>> >  #ifdef CONFIG_PM
>> >  static int pcie_port_runtime_suspend(struct device *dev)
>> >  {
>> > @@ -162,14 +151,6 @@ static pci_ers_result_t
>> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>> >
>> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>> >  {
>> > -	/* If fatal, restore cfg space for possible link reset at upstream */
>> > -	if (dev->error_state == pci_channel_io_frozen) {
>> > -		dev->state_saved = true;
>> > -		pci_restore_state(dev);
>> > -		pcie_portdrv_restore_config(dev);
>> > -		pci_enable_pcie_error_reporting(dev);
>> > -	}
>> > -
>> >  	return PCI_ERS_RESULT_RECOVERED;
>> >  }
>> 
>> 
>> Hi Bjorn,
>> 
>> the above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
>> because now we are handling ERR_FATAL differently than before.
>> 
>> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL
>> case where it restores the config space, enable device, set master and
>> enable error reporting....
>> and as far as I understand this is being done for upstream link
>> (bridges etc..)
>> 
>> why was it done at the first point (I checked the commit description,
>> but could not really get it)
>> and do we need to handle the same thing in ERR_FATAL now ?
> 
> You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
> error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
> commit log has no useful information.  I don't know any of the history
> behind it.


Yes Bjorn, that's right.
I am trying to understand it, but I have no clue so far.
Since it restores state in the ERR_FATAL case, why would a PCIe bridge
lose all its settings (config space, AER bits, bus master, device
enable, etc.)?
The most we do in the ERR_FATAL case is a link reset, and a Secondary
Bus Reset should affect only downstream components, not upstream ones.
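
To illustrate the point about reset direction, here is a minimal
userspace sketch (not kernel code). It assumes a simplified model in
which a Secondary Bus Reset returns the Command register of every
function below the bridge to 0 and leaves the bridge's own register
untouched. All model_* names are hypothetical; only the two Command
register bit values match the real PCI_COMMAND_MEMORY and
PCI_COMMAND_MASTER definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_COMMAND_MEMORY 0x2	/* same value as PCI_COMMAND_MEMORY */
#define MODEL_COMMAND_MASTER 0x4	/* same value as PCI_COMMAND_MASTER */

/* Hypothetical model of one PCI function and the functions below it. */
struct model_fn {
	uint16_t command;		/* modeled Command register */
	struct model_fn **children;	/* functions on the secondary bus, if any */
	size_t nchildren;
};

/* A reset arriving from upstream clears this function and everything below. */
void model_reset_from_upstream(struct model_fn *fn)
{
	assert(fn != NULL);
	fn->command = 0;
	for (size_t i = 0; i < fn->nchildren; i++)
		model_reset_from_upstream(fn->children[i]);
}

/* Secondary Bus Reset on 'bridge': only the secondary side is reset. */
void model_secondary_bus_reset(struct model_fn *bridge)
{
	assert(bridge != NULL);
	for (size_t i = 0; i < bridge->nchildren; i++)
		model_reset_from_upstream(bridge->children[i]);
}
```

In this model the bridge that issues the reset keeps its Command bits,
so a restore on that bridge would only matter if the reset had been
issued further upstream, which is what the removed comment ("possible
link reset at upstream") seems to have had in mind.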


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-08  4:57       ` poza
@ 2018-06-08 10:41         ` okaya
  0 siblings, 0 replies; 16+ messages in thread
From: okaya @ 2018-06-08 10:41 UTC (permalink / raw)
  To: poza
  Cc: Bjorn Helgaas, Bjorn Helgaas, Philippe Ombredanne,
	Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci,
	linux-kernel, Dongdong Liu, Keith Busch, Wei Zhang, Timur Tabi,
	linux-pci-owner

On 2018-06-08 00:57, poza@codeaurora.org wrote:
> On 2018-06-08 03:04, Bjorn Helgaas wrote:
>> On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
>>> On 2018-06-07 11:30, Oza Pawandeep wrote:
>>> > We are handling ERR_FATAL by resetting the Link in software,skipping the
>>> > driver pci_error_handlers callbacks, removing the devices from the PCI
>>> > subsystem, and re-enumerating, as a result of that, no more calling
>>> > pcie_portdrv_slot_reset in ERR_FATAL case.
>>> >
>>> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>>> >
>>> > diff --git a/drivers/pci/pcie/portdrv_pci.c
>>> > b/drivers/pci/pcie/portdrv_pci.c
>>> > index 973f1b8..92f5d330 100644
>>> > --- a/drivers/pci/pcie/portdrv_pci.c
>>> > +++ b/drivers/pci/pcie/portdrv_pci.c
>>> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>>> >
>>> >  /* global data */
>>> >
>>> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
>>> > -{
>>> > -	int retval;
>>> > -
>>> > -	retval = pci_enable_device(dev);
>>> > -	if (retval)
>>> > -		return retval;
>>> > -	pci_set_master(dev);
>>> > -	return 0;
>>> > -}
>>> > -
>>> >  #ifdef CONFIG_PM
>>> >  static int pcie_port_runtime_suspend(struct device *dev)
>>> >  {
>>> > @@ -162,14 +151,6 @@ static pci_ers_result_t
>>> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>>> >
>>> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>>> >  {
>>> > -	/* If fatal, restore cfg space for possible link reset at upstream */
>>> > -	if (dev->error_state == pci_channel_io_frozen) {
>>> > -		dev->state_saved = true;
>>> > -		pci_restore_state(dev);
>>> > -		pcie_portdrv_restore_config(dev);
>>> > -		pci_enable_pcie_error_reporting(dev);
>>> > -	}
>>> > -
>>> >  	return PCI_ERS_RESULT_RECOVERED;
>>> >  }
>>> 
>>> 
>>> Hi Bjorn,
>>> 
>>> the above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
>>> because now we are handling ERR_FATAL differently than before.
>>> 
>>> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL
>>> case where it restores the config space, enable device, set master and
>>> enable error reporting....
>>> and as far as I understand this is being done for upstream link
>>> (bridges etc..)
>>> 
>>> why was it done at the first point (I checked the commit description,
>>> but could not really get it)
>>> and do we need to handle the same thing in ERR_FATAL now ?
>> 
>> You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
>> error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
>> commit log has no useful information.  I don't know any of the history
>> behind it.
> 
> 
> Yes Bjorn, that's right.
> I am trying to understand it, but I have no clue so far.
> Since it restores state in the ERR_FATAL case, why would a PCIe bridge
> lose all its settings (config space, AER bits, bus master, device
> enable, etc.)?
> The most we do in the ERR_FATAL case is a link reset, and a Secondary
> Bus Reset should affect only downstream components, not upstream ones.

Our first generation controller had this problem. There could be others 
too.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-08  4:47       ` poza
@ 2018-06-08 22:43         ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2018-06-08 22:43 UTC (permalink / raw)
  To: poza
  Cc: Bjorn Helgaas, Bjorn Helgaas, Philippe Ombredanne,
	Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci,
	linux-kernel, Dongdong Liu, Wei Zhang, Sinan Kaya, Timur Tabi

On Thu, Jun 07, 2018 at 09:47:42PM -0700, poza@codeaurora.org wrote:
> Keith,
> 
> Do you know why the following was done in the ERR_FATAL case?
> Please have a look at the pcie_portdrv_slot_reset() handling (for
> bridges, switches, etc.).

Not sure, but I was looking into some issues in this area anyway.

I'm finding that non-hotpluggable bridges that support D3 are getting
put into that low-power mode, and that pretty much breaks the
re-enumeration.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-07 21:34     ` Bjorn Helgaas
  2018-06-08  4:47       ` poza
  2018-06-08  4:57       ` poza
@ 2018-06-11 10:01       ` poza
  2018-06-11 12:50         ` poza
  2 siblings, 1 reply; 16+ messages in thread
From: poza @ 2018-06-11 10:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi

On 2018-06-08 03:04, Bjorn Helgaas wrote:
> On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
>> On 2018-06-07 11:30, Oza Pawandeep wrote:
>> > We are handling ERR_FATAL by resetting the Link in software,skipping the
>> > driver pci_error_handlers callbacks, removing the devices from the PCI
>> > subsystem, and re-enumerating, as a result of that, no more calling
>> > pcie_portdrv_slot_reset in ERR_FATAL case.
>> >
>> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>> >
>> > diff --git a/drivers/pci/pcie/portdrv_pci.c
>> > b/drivers/pci/pcie/portdrv_pci.c
>> > index 973f1b8..92f5d330 100644
>> > --- a/drivers/pci/pcie/portdrv_pci.c
>> > +++ b/drivers/pci/pcie/portdrv_pci.c
>> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>> >
>> >  /* global data */
>> >
>> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
>> > -{
>> > -	int retval;
>> > -
>> > -	retval = pci_enable_device(dev);
>> > -	if (retval)
>> > -		return retval;
>> > -	pci_set_master(dev);
>> > -	return 0;
>> > -}
>> > -
>> >  #ifdef CONFIG_PM
>> >  static int pcie_port_runtime_suspend(struct device *dev)
>> >  {
>> > @@ -162,14 +151,6 @@ static pci_ers_result_t
>> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>> >
>> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>> >  {
>> > -	/* If fatal, restore cfg space for possible link reset at upstream */
>> > -	if (dev->error_state == pci_channel_io_frozen) {
>> > -		dev->state_saved = true;
>> > -		pci_restore_state(dev);
>> > -		pcie_portdrv_restore_config(dev);
>> > -		pci_enable_pcie_error_reporting(dev);
>> > -	}
>> > -
>> >  	return PCI_ERS_RESULT_RECOVERED;
>> >  }
>> 
>> 
>> Hi Bjorn,
>> 
>> the above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
>> because now we are handling ERR_FATAL differently than before.
>> 
>> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL
>> case where it restores the config space, enable device, set master and
>> enable error reporting....
>> and as far as I understand this is being done for upstream link
>> (bridges etc..)
>> 
>> why was it done at the first point (I checked the commit description,
>> but could not really get it)
>> and do we need to handle the same thing in ERR_FATAL now ?
> 
> You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
> error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
> commit log has no useful information.  I don't know any of the history
> behind it.

Hi Bjorn and Keith,

broadcast_error_message()
if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
         .....
         pci_walk_bus(dev->subordinate, cb, &result_data);


So in the ERR_FATAL case, the bus walk happens on the subordinate bus,
and if I understand the walk correctly, pcie_portdrv_slot_reset() is
then called only on bridges/switches.

It is never called on Root Ports.

Having said that, since we are now removing the devices (compared to
the previous error callback handling for ERR_FATAL), I don't see the
need for the above code anymore.

Because we are initiating re-enumeration, there is nothing left to
restore.
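
The walk described above can be sketched as a small userspace model
(not kernel code). It assumes the reporting device is a Root Port and
that the walk visits every device on the subordinate bus and on the
buses below it. All model_* names are hypothetical, and the callback
signature is simplified compared to the one pci_walk_bus() really
takes.

```c
#include <assert.h>
#include <stddef.h>

struct model_bus;

/* Hypothetical stand-in for a PCI device in this model. */
struct model_dev {
	const char *name;
	struct model_bus *subordinate;	/* bus below a bridge, NULL otherwise */
	int slot_reset_calls;		/* how often the callback visited us */
};

struct model_bus {
	struct model_dev **devices;
	size_t ndevices;
};

/* Depth-first walk over every device on 'bus' and on all buses below it. */
void model_walk_bus(struct model_bus *bus, void (*cb)(struct model_dev *dev))
{
	assert(bus != NULL);
	for (size_t i = 0; i < bus->ndevices; i++) {
		struct model_dev *dev = bus->devices[i];

		cb(dev);
		if (dev->subordinate)
			model_walk_bus(dev->subordinate, cb);
	}
}

/* Stand-in for a slot_reset callback: only counts the visit. */
void model_slot_reset(struct model_dev *dev)
{
	dev->slot_reset_calls++;
}

/* Mirrors the bridge branch quoted above: the walk starts below 'reporter'. */
void model_broadcast(struct model_dev *reporter)
{
	if (reporter->subordinate)
		model_walk_bus(reporter->subordinate, model_slot_reset);
}
```

With a Root Port above a switch port above an endpoint, calling
model_broadcast() on the Root Port visits the switch port and the
endpoint, but never the Root Port itself, because the walk starts on
its subordinate bus.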

Regards,
Oza.


* Re: [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset()
  2018-06-11 10:01       ` poza
@ 2018-06-11 12:50         ` poza
  0 siblings, 0 replies; 16+ messages in thread
From: poza @ 2018-06-11 12:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi,
	linux-pci-owner

On 2018-06-11 15:31, poza@codeaurora.org wrote:
> On 2018-06-08 03:04, Bjorn Helgaas wrote:
>> On Thu, Jun 07, 2018 at 07:18:03PM +0530, poza@codeaurora.org wrote:
>>> On 2018-06-07 11:30, Oza Pawandeep wrote:
>>> > We are handling ERR_FATAL by resetting the Link in software,skipping the
>>> > driver pci_error_handlers callbacks, removing the devices from the PCI
>>> > subsystem, and re-enumerating, as a result of that, no more calling
>>> > pcie_portdrv_slot_reset in ERR_FATAL case.
>>> >
>>> > Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>>> >
>>> > diff --git a/drivers/pci/pcie/portdrv_pci.c
>>> > b/drivers/pci/pcie/portdrv_pci.c
>>> > index 973f1b8..92f5d330 100644
>>> > --- a/drivers/pci/pcie/portdrv_pci.c
>>> > +++ b/drivers/pci/pcie/portdrv_pci.c
>>> > @@ -42,17 +42,6 @@ __setup("pcie_ports=", pcie_port_setup);
>>> >
>>> >  /* global data */
>>> >
>>> > -static int pcie_portdrv_restore_config(struct pci_dev *dev)
>>> > -{
>>> > -	int retval;
>>> > -
>>> > -	retval = pci_enable_device(dev);
>>> > -	if (retval)
>>> > -		return retval;
>>> > -	pci_set_master(dev);
>>> > -	return 0;
>>> > -}
>>> > -
>>> >  #ifdef CONFIG_PM
>>> >  static int pcie_port_runtime_suspend(struct device *dev)
>>> >  {
>>> > @@ -162,14 +151,6 @@ static pci_ers_result_t
>>> > pcie_portdrv_mmio_enabled(struct pci_dev *dev)
>>> >
>>> >  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>>> >  {
>>> > -	/* If fatal, restore cfg space for possible link reset at upstream */
>>> > -	if (dev->error_state == pci_channel_io_frozen) {
>>> > -		dev->state_saved = true;
>>> > -		pci_restore_state(dev);
>>> > -		pcie_portdrv_restore_config(dev);
>>> > -		pci_enable_pcie_error_reporting(dev);
>>> > -	}
>>> > -
>>> >  	return PCI_ERS_RESULT_RECOVERED;
>>> >  }
>>> 
>>> 
>>> Hi Bjorn,
>>> 
>>> the above patch removes ERR_FATAL handling from pcie_portdrv_slot_reset()
>>> because now we are handling ERR_FATAL differently than before.
>>> 
>>> I tried to dig into pcie_portdrv_slot_reset() handling for ERR_FATAL
>>> case where it restores the config space, enable device, set master and
>>> enable error reporting....
>>> and as far as I understand this is being done for upstream link
>>> (bridges etc..)
>>> 
>>> why was it done at the first point (I checked the commit description,
>>> but could not really get it)
>>> and do we need to handle the same thing in ERR_FATAL now ?
>> 
>> You mean 4bf3392e0bf5 ("PCI-Express AER implemetation: pcie_portdrv
>> error handler"), which added pcie_portdrv_slot_reset()?  I agree, that
>> commit log has no useful information.  I don't know any of the history
>> behind it.
> 
> Hi Bjorn and Keith,
> 
> broadcast_error_message()
> if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>         .....
>         pci_walk_bus(dev->subordinate, cb, &result_data);
> 
> 
> So in the ERR_FATAL case, the bus walk happens on the subordinate bus,
> and if I understand the walk correctly, pcie_portdrv_slot_reset() is
> then called only on bridges/switches.
> 
> It is never called on Root Ports.
> 
> Having said that, since we are now removing the devices (compared to
> the previous error callback handling for ERR_FATAL), I don't see the
> need for the above code anymore.
> 

By "the above code" I meant the code touched by this patch itself, i.e.
the ERR_FATAL handling that it removes from pcie_portdrv_slot_reset().

> Because we are initiating re-enumeration, there is nothing left to
> restore.
> 
> Regards,
> Oza.


Thread overview: 16+ messages
2018-06-07  6:00 [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Oza Pawandeep
2018-06-07  6:00 ` [PATCH NEXT 2/6] PCI/AER: Clear uncorrectable fatal error status bits Oza Pawandeep
2018-06-07  6:00 ` [PATCH NEXT 3/6] PCI/ERR: Cleanup ERR_FATAL of error broadcast Oza Pawandeep
2018-06-07  6:00 ` [PATCH NEXT 4/6] PCI/AER: Clear device status error bits during ERR_FATAL and ERR_NONFATAL Oza Pawandeep
2018-06-07  6:00 ` [PATCH NEXT 5/6] PCI/AER: Clear correctable status bits in device register Oza Pawandeep
2018-06-07  6:00 ` [PATCH NEXT 6/6] PCI/PORTDRV: Remove ERR_FATAL handling from pcie_portdrv_slot_reset() Oza Pawandeep
2018-06-07 13:48   ` poza
2018-06-07 21:34     ` Bjorn Helgaas
2018-06-08  4:47       ` poza
2018-06-08 22:43         ` Keith Busch
2018-06-08  4:57       ` poza
2018-06-08 10:41         ` okaya
2018-06-11 10:01       ` poza
2018-06-11 12:50         ` poza
2018-06-07 13:21 ` [PATCH NEXT 1/6] PCI/AER: Take mask into account while clearing error bits Bjorn Helgaas
2018-06-07 13:44   ` poza
