linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] PCI/ERR: Clear status of the reporting device
@ 2020-12-17 17:14 Keith Busch
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch

Error handling operates on the first downstream port above the detected
error, but the error may have been reported by a downstream device.
Clear the AER status of the device that reported the error rather than
the first downstream port.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pcie/err.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 510f31f0ef6d..a84f0bf4c1e2 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -231,15 +231,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bridge(bridge, report_resume, &status);
 
 	/*
-	 * If we have native control of AER, clear error status in the Root
-	 * Port or Downstream Port that signaled the error.  If the
-	 * platform retained control of AER, it is responsible for clearing
-	 * this status.  In that case, the signaling device may not even be
-	 * visible to the OS.
+	 * If we have native control of AER, clear error status in the device
+	 * that detected the error.  If the platform retained control of AER,
+	 * it is responsible for clearing this status.  In that case, the
+	 * signaling device may not even be visible to the OS.
 	 */
 	if (host->native_aer || pcie_ports_native) {
-		pcie_clear_device_status(bridge);
-		pci_aer_clear_nonfatal_status(bridge);
+		pcie_clear_device_status(dev);
+		pci_aer_clear_nonfatal_status(dev);
 	}
 	pci_info(bridge, "device recovery successful\n");
 	return status;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] PCI/AER: Actually get the root port
  2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
@ 2020-12-17 17:14 ` Keith Busch
  2020-12-17 23:15   ` kernel test robot
                     ` (2 more replies)
  2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
  2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V
  2 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch

The pci_dev parameter given to aer_root_reset() may be a downstream port
rather than the root port. Get the root port from the provided device in
order to clear the root's aer status, and update the message to indicate
which type of port was actually reset.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pcie/aer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 77b0f2c45bc0..b2b0e9eb5cfb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1388,7 +1388,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	if (type == PCI_EXP_TYPE_RC_END)
 		root = dev->rcec;
 	else
-		root = dev;
+		root = pcie_find_root_port(dev);
 
 	/*
 	 * If the platform retained control of AER, an RCiEP may not have
@@ -1414,7 +1414,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 		}
 	} else {
 		rc = pci_bus_error_reset(dev);
-		pci_info(dev, "Root Port link has been reset (%d)\n", rc);
+		pci_info(dev, "%s Port link has been reset (%d)\n", rc,
+			pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
 	}
 
 	if ((host->native_aer || pcie_ports_native) && aer) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] PCI/ERR: Retain status from error notification
  2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
@ 2020-12-17 17:14 ` Keith Busch
  2021-01-04 18:43   ` Kelley, Sean V
  2021-01-05 14:12   ` Hinko Kocevar
  2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V
  2 siblings, 2 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch, Hinko Kocevar

Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.

Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pcie/err.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index a84f0bf4c1e2..b576aa890c76 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(bridge, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		status = reset_subordinates(bridge);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
+		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(bridge, "subordinate device reset failed\n");
 			goto failed;
 		}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Actually get the root port
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
@ 2020-12-17 23:15   ` kernel test robot
  2020-12-17 23:34   ` kernel test robot
  2021-01-04 18:42   ` Kelley, Sean V
  2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-17 23:15 UTC (permalink / raw)
  To: Keith Busch, linux-pci, Bjorn Helgaas; +Cc: kbuild-all, Keith Busch

[-- Attachment #1: Type: text/plain, Size: 4295 bytes --]

Hi Keith,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on next-20201217]
[cannot apply to v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: parisc-randconfig-r014-20201217 (attached as .config)
compiler: hppa64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
        git checkout 8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/pci/pcie/aer.c: In function 'aer_root_reset':
>> drivers/pci/pcie/aer.c:15:21: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat=]
      15 | #define pr_fmt(fmt) "AER: " fmt
         |                     ^~~~~~~
   drivers/pci/pcie/aer.c:16:17: note: in expansion of macro 'pr_fmt'
      16 | #define dev_fmt pr_fmt
         |                 ^~~~~~
   include/linux/dev_printk.h:118:17: note: in expansion of macro 'dev_fmt'
     118 |  _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
         |                 ^~~~~~~
   include/linux/pci.h:2417:37: note: in expansion of macro 'dev_info'
    2417 | #define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
         |                                     ^~~~~~~~
   drivers/pci/pcie/aer.c:1417:3: note: in expansion of macro 'pci_info'
    1417 |   pci_info(dev, "%s Port link has been reset (%d)\n", rc,
         |   ^~~~~~~~
   drivers/pci/pcie/aer.c:1417:19: note: format string is defined here
    1417 |   pci_info(dev, "%s Port link has been reset (%d)\n", rc,
         |                  ~^
         |                   |
         |                   char *
         |                  %d
>> drivers/pci/pcie/aer.c:15:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'char *' [-Wformat=]
      15 | #define pr_fmt(fmt) "AER: " fmt
         |                     ^~~~~~~
   drivers/pci/pcie/aer.c:16:17: note: in expansion of macro 'pr_fmt'
      16 | #define dev_fmt pr_fmt
         |                 ^~~~~~
   include/linux/dev_printk.h:118:17: note: in expansion of macro 'dev_fmt'
     118 |  _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
         |                 ^~~~~~~
   include/linux/pci.h:2417:37: note: in expansion of macro 'dev_info'
    2417 | #define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
         |                                     ^~~~~~~~
   drivers/pci/pcie/aer.c:1417:3: note: in expansion of macro 'pci_info'
    1417 |   pci_info(dev, "%s Port link has been reset (%d)\n", rc,
         |   ^~~~~~~~
   drivers/pci/pcie/aer.c:1417:48: note: format string is defined here
    1417 |   pci_info(dev, "%s Port link has been reset (%d)\n", rc,
         |                                               ~^
         |                                                |
         |                                                int
         |                                               %s


vim +15 drivers/pci/pcie/aer.c

9cc6f75b27e76d3 Frederick Lawler 2019-05-07 @15  #define pr_fmt(fmt) "AER: " fmt
9cc6f75b27e76d3 Frederick Lawler 2019-05-07  16  #define dev_fmt pr_fmt
9cc6f75b27e76d3 Frederick Lawler 2019-05-07  17  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31159 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Actually get the root port
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
  2020-12-17 23:15   ` kernel test robot
@ 2020-12-17 23:34   ` kernel test robot
  2021-01-04 18:42   ` Kelley, Sean V
  2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-17 23:34 UTC (permalink / raw)
  To: Keith Busch, linux-pci, Bjorn Helgaas
  Cc: kbuild-all, clang-built-linux, Keith Busch

[-- Attachment #1: Type: text/plain, Size: 5806 bytes --]

Hi Keith,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on next-20201217]
[cannot apply to v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: powerpc64-randconfig-r026-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
        git checkout 8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/pci/pcie/aer.c:1417:55: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]
                   pci_info(dev, "%s Port link has been reset (%d)\n", rc,
                                  ~~                                   ^~
                                  %d
   include/linux/pci.h:2417:67: note: expanded from macro 'pci_info'
   #define pci_info(pdev, fmt, arg...)     dev_info(&(pdev)->dev, fmt, ##arg)
                                                                  ~~~    ^~~
   include/linux/dev_printk.h:118:33: note: expanded from macro 'dev_info'
           _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
                                  ~~~     ^~~~~~~~~~~
>> drivers/pci/pcie/aer.c:1418:4: warning: format specifies type 'int' but the argument has type 'char *' [-Wformat]
                           pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2417:67: note: expanded from macro 'pci_info'
   #define pci_info(pdev, fmt, arg...)     dev_info(&(pdev)->dev, fmt, ##arg)
                                                                  ~~~    ^~~
   include/linux/dev_printk.h:118:33: note: expanded from macro 'dev_info'
           _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
                                  ~~~     ^~~~~~~~~~~
   2 warnings generated.


vim +1417 drivers/pci/pcie/aer.c

  1367	
  1368	/**
  1369	 * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  1370	 * @dev: pointer to Root Port, RCEC, or RCiEP
  1371	 *
  1372	 * Invoked by Port Bus driver when performing reset.
  1373	 */
  1374	static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  1375	{
  1376		int type = pci_pcie_type(dev);
  1377		struct pci_dev *root;
  1378		int aer;
  1379		struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
  1380		u32 reg32;
  1381		int rc;
  1382	
  1383		/*
  1384		 * Only Root Ports and RCECs have AER Root Command and Root Status
  1385		 * registers.  If "dev" is an RCiEP, the relevant registers are in
  1386		 * the RCEC.
  1387		 */
  1388		if (type == PCI_EXP_TYPE_RC_END)
  1389			root = dev->rcec;
  1390		else
  1391			root = pcie_find_root_port(dev);
  1392	
  1393		/*
  1394		 * If the platform retained control of AER, an RCiEP may not have
  1395		 * an RCEC visible to us, so dev->rcec ("root") may be NULL.  In
  1396		 * that case, firmware is responsible for these registers.
  1397		 */
  1398		aer = root ? root->aer_cap : 0;
  1399	
  1400		if ((host->native_aer || pcie_ports_native) && aer) {
  1401			/* Disable Root's interrupt in response to error messages */
  1402			pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
  1403			reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
  1404			pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
  1405		}
  1406	
  1407		if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
  1408			if (pcie_has_flr(dev)) {
  1409				rc = pcie_flr(dev);
  1410				pci_info(dev, "has been reset (%d)\n", rc);
  1411			} else {
  1412				pci_info(dev, "not reset (no FLR support)\n");
  1413				rc = -ENOTTY;
  1414			}
  1415		} else {
  1416			rc = pci_bus_error_reset(dev);
> 1417			pci_info(dev, "%s Port link has been reset (%d)\n", rc,
> 1418				pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
  1419		}
  1420	
  1421		if ((host->native_aer || pcie_ports_native) && aer) {
  1422			/* Clear Root Error Status */
  1423			pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
  1424			pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
  1425	
  1426			/* Enable Root Port's interrupt in response to error messages */
  1427			pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
  1428			reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
  1429			pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
  1430		}
  1431	
  1432		return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
  1433	}
  1434	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39078 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] PCI/ERR: Clear status of the reporting device
  2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
  2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
@ 2021-01-04 18:42 ` Kelley, Sean V
  2 siblings, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas


Hi Keith,

> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> 
> Error handling operates on the first downstream port above the detected
> error, but the error may have been reported by a downstream device.
> Clear the AER status of the device that reported the error rather than
> the first downstream port.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 510f31f0ef6d..a84f0bf4c1e2 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -231,15 +231,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> 	pci_walk_bridge(bridge, report_resume, &status);
> 
> 	/*
> -	 * If we have native control of AER, clear error status in the Root
> -	 * Port or Downstream Port that signaled the error.  If the
> -	 * platform retained control of AER, it is responsible for clearing
> -	 * this status.  In that case, the signaling device may not even be
> -	 * visible to the OS.
> +	 * If we have native control of AER, clear error status in the device
> +	 * that detected the error.  If the platform retained control of AER,
> +	 * it is responsible for clearing this status.  In that case, the
> +	 * signaling device may not even be visible to the OS.
> 	 */
> 	if (host->native_aer || pcie_ports_native) {
> -		pcie_clear_device_status(bridge);
> -		pci_aer_clear_nonfatal_status(bridge);
> +		pcie_clear_device_status(dev);
> +		pci_aer_clear_nonfatal_status(dev);


This looks good to me.

Acked-by: Sean V Kelley <sean.v.kelley@intel.com>



> 	}
> 	pci_info(bridge, "device recovery successful\n");
> 	return status;
> -- 
> 2.24.1
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Actually get the root port
  2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
  2020-12-17 23:15   ` kernel test robot
  2020-12-17 23:34   ` kernel test robot
@ 2021-01-04 18:42   ` Kelley, Sean V
  2021-01-04 22:05     ` Keith Busch
  2 siblings, 1 reply; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas

Hi Keith,

> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> 
> The pci_dev parameter given to aer_root_reset() may be a downstream port
> rather than the root port. Get the root port from the provided device in
> order to clear the root's aer status, and update the message to indicate
> which type of port was actually reset.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/aer.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 77b0f2c45bc0..b2b0e9eb5cfb 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1388,7 +1388,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> 	if (type == PCI_EXP_TYPE_RC_END)
> 		root = dev->rcec;
> 	else
> -		root = dev;
> +		root = pcie_find_root_port(dev);

This is a good sanity check on the dev being passed.  This also reminds me to take a look at pcie_do_recovery() in terms of clarity with the two devices of interest being handled.

Acked-by: Sean V Kelley <sean.v.kelley@intel.com>

> 
> 	/*
> 	 * If the platform retained control of AER, an RCiEP may not have
> @@ -1414,7 +1414,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> 		}
> 	} else {
> 		rc = pci_bus_error_reset(dev);
> -		pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> +		pci_info(dev, "%s Port link has been reset (%d)\n", rc,

Perhaps … "%s%s Port


> +			pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
> 	}
> 
> 	if ((host->native_aer || pcie_ports_native) && aer) {
> -- 
> 2.24.1
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] PCI/ERR: Retain status from error notification
  2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
@ 2021-01-04 18:43   ` Kelley, Sean V
  2021-01-05 14:12   ` Hinko Kocevar
  1 sibling, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:43 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Hinko Kocevar

Hi Keith,

> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> 
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
> 
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> 	pci_dbg(bridge, "broadcast error_detected message\n");
> 	if (state == pci_channel_io_frozen) {
> 		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		status = reset_subordinates(bridge);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> 			pci_warn(bridge, "subordinate device reset failed\n");
> 			goto failed;
> 		}

This was an interesting scenario for Hinko not getting the NEED_RESET.

Reviewed-by: Sean V Kelley <sean.v.kelley@intel.com>

> -- 
> 2.24.1
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Actually get the root port
  2021-01-04 18:42   ` Kelley, Sean V
@ 2021-01-04 22:05     ` Keith Busch
  2021-01-04 22:10       ` Kelley, Sean V
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2021-01-04 22:05 UTC (permalink / raw)
  To: Kelley, Sean V; +Cc: Linux PCI, Bjorn Helgaas

On Mon, Jan 04, 2021 at 06:42:58PM +0000, Kelley, Sean V wrote:
> > On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> > 		rc = pci_bus_error_reset(dev);
> > -		pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> > +		pci_info(dev, "%s Port link has been reset (%d)\n", rc,
> 
> Perhaps … "%s%s Port

I am not sure I understand. What should the second string in this format
say?

I have to resend this patch anyway because I messed up the parmeter
order and build bot caught me. I'll split it out from this patch since
it is really a separate issue.
 
 
> > +			pci_is_root_bus(dev->bus) ? "Root" : "Downstream");

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Actually get the root port
  2021-01-04 22:05     ` Keith Busch
@ 2021-01-04 22:10       ` Kelley, Sean V
  0 siblings, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 22:10 UTC (permalink / raw)
  To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas



> On Jan 4, 2021, at 2:05 PM, Keith Busch <kbusch@kernel.org> wrote:
> 
> On Mon, Jan 04, 2021 at 06:42:58PM +0000, Kelley, Sean V wrote:
>>> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
>>> 		rc = pci_bus_error_reset(dev);
>>> -		pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>>> +		pci_info(dev, "%s Port link has been reset (%d)\n", rc,
>> 
>> Perhaps … "%s%s Port
> 
> I am not sure I understand. What should the second string in this format
> say?
> 
> I have to resend this patch anyway because I messed up the parmeter
> order and build bot caught me. I'll split it out from this patch since
> it is really a separate issue.

Never mind you only had one string, it was the order issue that the builedbot caught.

Sean


> 
> 
>>> +			pci_is_root_bus(dev->bus) ? "Root" : "Downstream");


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] PCI/ERR: Retain status from error notification
  2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
  2021-01-04 18:43   ` Kelley, Sean V
@ 2021-01-05 14:12   ` Hinko Kocevar
  1 sibling, 0 replies; 11+ messages in thread
From: Hinko Kocevar @ 2021-01-05 14:12 UTC (permalink / raw)
  To: Keith Busch, linux-pci, Bjorn Helgaas



On 12/17/20 6:14 PM, Keith Busch wrote:
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
> 
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/pci/pcie/err.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	pci_dbg(bridge, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
> -		status = reset_subordinates(bridge);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>   			pci_warn(bridge, "subordinate device reset failed\n");
>   			goto failed;
>   		}
> 

Dear Keith,

I can confirm that with this series of patches, applied to a linux-pci 
GIT tree kernel, the problem is solved for me. The reset is carried out, 
the secondary bus and the PCI device beind the bus all recover after 
injecting an error into the root port.

Tested-by: Hinko Kocevar <hinko.kocevar@ess.eu>

Thank you!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-05 14:13 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
2020-12-17 23:15   ` kernel test robot
2020-12-17 23:34   ` kernel test robot
2021-01-04 18:42   ` Kelley, Sean V
2021-01-04 22:05     ` Keith Busch
2021-01-04 22:10       ` Kelley, Sean V
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
2021-01-04 18:43   ` Kelley, Sean V
2021-01-05 14:12   ` Hinko Kocevar
2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).