* [PATCH 1/3] PCI/ERR: Clear status of the reporting device
@ 2020-12-17 17:14 Keith Busch
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch
Error handling operates on the first downstream port above the detected
error, but the error may have been reported by a downstream device.
Clear the AER status of the device that reported the error rather than
the first downstream port.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pcie/err.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 510f31f0ef6d..a84f0bf4c1e2 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -231,15 +231,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_walk_bridge(bridge, report_resume, &status);
/*
- * If we have native control of AER, clear error status in the Root
- * Port or Downstream Port that signaled the error. If the
- * platform retained control of AER, it is responsible for clearing
- * this status. In that case, the signaling device may not even be
- * visible to the OS.
+ * If we have native control of AER, clear error status in the device
+ * that detected the error. If the platform retained control of AER,
+ * it is responsible for clearing this status. In that case, the
+ * signaling device may not even be visible to the OS.
*/
if (host->native_aer || pcie_ports_native) {
- pcie_clear_device_status(bridge);
- pci_aer_clear_nonfatal_status(bridge);
+ pcie_clear_device_status(dev);
+ pci_aer_clear_nonfatal_status(dev);
}
pci_info(bridge, "device recovery successful\n");
return status;
--
2.24.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] PCI/AER: Actually get the root port
2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
@ 2020-12-17 17:14 ` Keith Busch
2020-12-17 23:15 ` kernel test robot
` (2 more replies)
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V
2 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch
The pci_dev parameter given to aer_root_reset() may be a downstream port
rather than the root port. Get the root port from the provided device in
order to clear the root's aer status, and update the message to indicate
which type of port was actually reset.
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pcie/aer.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 77b0f2c45bc0..b2b0e9eb5cfb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1388,7 +1388,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
if (type == PCI_EXP_TYPE_RC_END)
root = dev->rcec;
else
- root = dev;
+ root = pcie_find_root_port(dev);
/*
* If the platform retained control of AER, an RCiEP may not have
@@ -1414,7 +1414,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
}
} else {
rc = pci_bus_error_reset(dev);
- pci_info(dev, "Root Port link has been reset (%d)\n", rc);
+ pci_info(dev, "%s Port link has been reset (%d)\n", rc,
+ pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
}
if ((host->native_aer || pcie_ports_native) && aer) {
--
2.24.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] PCI/ERR: Retain status from error notification
2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
@ 2020-12-17 17:14 ` Keith Busch
2021-01-04 18:43 ` Kelley, Sean V
2021-01-05 14:12 ` Hinko Kocevar
2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V
2 siblings, 2 replies; 11+ messages in thread
From: Keith Busch @ 2020-12-17 17:14 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas; +Cc: Keith Busch, Hinko Kocevar
Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.
Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/pci/pcie/err.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index a84f0bf4c1e2..b576aa890c76 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
- status = reset_subordinates(bridge);
- if (status != PCI_ERS_RESULT_RECOVERED) {
+ if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
goto failed;
}
--
2.24.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] PCI/AER: Actually get the root port
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
@ 2020-12-17 23:15 ` kernel test robot
2020-12-17 23:34 ` kernel test robot
2021-01-04 18:42 ` Kelley, Sean V
2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-17 23:15 UTC (permalink / raw)
To: Keith Busch, linux-pci, Bjorn Helgaas; +Cc: kbuild-all, Keith Busch
[-- Attachment #1: Type: text/plain, Size: 4295 bytes --]
Hi Keith,
I love your patch! Perhaps something to improve:
[auto build test WARNING on pci/next]
[also build test WARNING on next-20201217]
[cannot apply to v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: parisc-randconfig-r014-20201217 (attached as .config)
compiler: hppa64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
git checkout 8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/pci/pcie/aer.c: In function 'aer_root_reset':
>> drivers/pci/pcie/aer.c:15:21: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat=]
15 | #define pr_fmt(fmt) "AER: " fmt
| ^~~~~~~
drivers/pci/pcie/aer.c:16:17: note: in expansion of macro 'pr_fmt'
16 | #define dev_fmt pr_fmt
| ^~~~~~
include/linux/dev_printk.h:118:17: note: in expansion of macro 'dev_fmt'
118 | _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~
include/linux/pci.h:2417:37: note: in expansion of macro 'dev_info'
2417 | #define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
| ^~~~~~~~
drivers/pci/pcie/aer.c:1417:3: note: in expansion of macro 'pci_info'
1417 | pci_info(dev, "%s Port link has been reset (%d)\n", rc,
| ^~~~~~~~
drivers/pci/pcie/aer.c:1417:19: note: format string is defined here
1417 | pci_info(dev, "%s Port link has been reset (%d)\n", rc,
| ~^
| |
| char *
| %d
>> drivers/pci/pcie/aer.c:15:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'char *' [-Wformat=]
15 | #define pr_fmt(fmt) "AER: " fmt
| ^~~~~~~
drivers/pci/pcie/aer.c:16:17: note: in expansion of macro 'pr_fmt'
16 | #define dev_fmt pr_fmt
| ^~~~~~
include/linux/dev_printk.h:118:17: note: in expansion of macro 'dev_fmt'
118 | _dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~
include/linux/pci.h:2417:37: note: in expansion of macro 'dev_info'
2417 | #define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
| ^~~~~~~~
drivers/pci/pcie/aer.c:1417:3: note: in expansion of macro 'pci_info'
1417 | pci_info(dev, "%s Port link has been reset (%d)\n", rc,
| ^~~~~~~~
drivers/pci/pcie/aer.c:1417:48: note: format string is defined here
1417 | pci_info(dev, "%s Port link has been reset (%d)\n", rc,
| ~^
| |
| int
| %s
vim +15 drivers/pci/pcie/aer.c
9cc6f75b27e76d3 Frederick Lawler 2019-05-07 @15 #define pr_fmt(fmt) "AER: " fmt
9cc6f75b27e76d3 Frederick Lawler 2019-05-07 16 #define dev_fmt pr_fmt
9cc6f75b27e76d3 Frederick Lawler 2019-05-07 17
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31159 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] PCI/AER: Actually get the root port
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
2020-12-17 23:15 ` kernel test robot
@ 2020-12-17 23:34 ` kernel test robot
2021-01-04 18:42 ` Kelley, Sean V
2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2020-12-17 23:34 UTC (permalink / raw)
To: Keith Busch, linux-pci, Bjorn Helgaas
Cc: kbuild-all, clang-built-linux, Keith Busch
[-- Attachment #1: Type: text/plain, Size: 5806 bytes --]
Hi Keith,
I love your patch! Perhaps something to improve:
[auto build test WARNING on pci/next]
[also build test WARNING on next-20201217]
[cannot apply to v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
base: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: powerpc64-randconfig-r026-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# https://github.com/0day-ci/linux/commit/8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Keith-Busch/PCI-ERR-Clear-status-of-the-reporting-device/20201218-011952
git checkout 8f1a820daa9ec2e25dfbdd5b4752df01e849b3aa
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> drivers/pci/pcie/aer.c:1417:55: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]
pci_info(dev, "%s Port link has been reset (%d)\n", rc,
~~ ^~
%d
include/linux/pci.h:2417:67: note: expanded from macro 'pci_info'
#define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
~~~ ^~~
include/linux/dev_printk.h:118:33: note: expanded from macro 'dev_info'
_dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
>> drivers/pci/pcie/aer.c:1418:4: warning: format specifies type 'int' but the argument has type 'char *' [-Wformat]
pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/pci.h:2417:67: note: expanded from macro 'pci_info'
#define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg)
~~~ ^~~
include/linux/dev_printk.h:118:33: note: expanded from macro 'dev_info'
_dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
2 warnings generated.
vim +1417 drivers/pci/pcie/aer.c
1367
1368 /**
1369 * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
1370 * @dev: pointer to Root Port, RCEC, or RCiEP
1371 *
1372 * Invoked by Port Bus driver when performing reset.
1373 */
1374 static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
1375 {
1376 int type = pci_pcie_type(dev);
1377 struct pci_dev *root;
1378 int aer;
1379 struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
1380 u32 reg32;
1381 int rc;
1382
1383 /*
1384 * Only Root Ports and RCECs have AER Root Command and Root Status
1385 * registers. If "dev" is an RCiEP, the relevant registers are in
1386 * the RCEC.
1387 */
1388 if (type == PCI_EXP_TYPE_RC_END)
1389 root = dev->rcec;
1390 else
1391 root = pcie_find_root_port(dev);
1392
1393 /*
1394 * If the platform retained control of AER, an RCiEP may not have
1395 * an RCEC visible to us, so dev->rcec ("root") may be NULL. In
1396 * that case, firmware is responsible for these registers.
1397 */
1398 aer = root ? root->aer_cap : 0;
1399
1400 if ((host->native_aer || pcie_ports_native) && aer) {
1401 /* Disable Root's interrupt in response to error messages */
1402 pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32);
1403 reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
1404 pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
1405 }
1406
1407 if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
1408 if (pcie_has_flr(dev)) {
1409 rc = pcie_flr(dev);
1410 pci_info(dev, "has been reset (%d)\n", rc);
1411 } else {
1412 pci_info(dev, "not reset (no FLR support)\n");
1413 rc = -ENOTTY;
1414 }
1415 } else {
1416 rc = pci_bus_error_reset(dev);
> 1417 pci_info(dev, "%s Port link has been reset (%d)\n", rc,
> 1418 pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
1419 }
1420
1421 if ((host->native_aer || pcie_ports_native) && aer) {
1422 /* Clear Root Error Status */
1423 pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32);
1424 pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
1425
1426 /* Enable Root Port's interrupt in response to error messages */
1427 pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32);
1428 reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
1429 pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
1430 }
1431
1432 return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
1433 }
1434
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39078 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] PCI/ERR: Clear status of the reporting device
2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
@ 2021-01-04 18:42 ` Kelley, Sean V
2 siblings, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:42 UTC (permalink / raw)
To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas
Hi Keith,
> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
>
> Error handling operates on the first downstream port above the detected
> error, but the error may have been reported by a downstream device.
> Clear the AER status of the device that reported the error rather than
> the first downstream port.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 510f31f0ef6d..a84f0bf4c1e2 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -231,15 +231,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_walk_bridge(bridge, report_resume, &status);
>
> /*
> - * If we have native control of AER, clear error status in the Root
> - * Port or Downstream Port that signaled the error. If the
> - * platform retained control of AER, it is responsible for clearing
> - * this status. In that case, the signaling device may not even be
> - * visible to the OS.
> + * If we have native control of AER, clear error status in the device
> + * that detected the error. If the platform retained control of AER,
> + * it is responsible for clearing this status. In that case, the
> + * signaling device may not even be visible to the OS.
> */
> if (host->native_aer || pcie_ports_native) {
> - pcie_clear_device_status(bridge);
> - pci_aer_clear_nonfatal_status(bridge);
> + pcie_clear_device_status(dev);
> + pci_aer_clear_nonfatal_status(dev);
This looks good to me.
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
> }
> pci_info(bridge, "device recovery successful\n");
> return status;
> --
> 2.24.1
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] PCI/AER: Actually get the root port
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
2020-12-17 23:15 ` kernel test robot
2020-12-17 23:34 ` kernel test robot
@ 2021-01-04 18:42 ` Kelley, Sean V
2021-01-04 22:05 ` Keith Busch
2 siblings, 1 reply; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:42 UTC (permalink / raw)
To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas
Hi Keith,
> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
>
> The pci_dev parameter given to aer_root_reset() may be a downstream port
> rather than the root port. Get the root port from the provided device in
> order to clear the root's aer status, and update the message to indicate
> which type of port was actually reset.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/aer.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 77b0f2c45bc0..b2b0e9eb5cfb 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1388,7 +1388,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> if (type == PCI_EXP_TYPE_RC_END)
> root = dev->rcec;
> else
> - root = dev;
> + root = pcie_find_root_port(dev);
This is a good sanity check on the dev being passed. This also reminds me to take a look at pcie_do_recovery() in terms of clarity with the two devices of interest being handled.
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
>
> /*
> * If the platform retained control of AER, an RCiEP may not have
> @@ -1414,7 +1414,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> }
> } else {
> rc = pci_bus_error_reset(dev);
> - pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> + pci_info(dev, "%s Port link has been reset (%d)\n", rc,
Perhaps … "%s%s Port
> + pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
> }
>
> if ((host->native_aer || pcie_ports_native) && aer) {
> --
> 2.24.1
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] PCI/ERR: Retain status from error notification
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
@ 2021-01-04 18:43 ` Kelley, Sean V
2021-01-05 14:12 ` Hinko Kocevar
1 sibling, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 18:43 UTC (permalink / raw)
To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas, Hinko Kocevar
Hi Keith,
> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
>
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
>
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bridge(bridge, report_frozen_detected, &status);
> - status = reset_subordinates(bridge);
> - if (status != PCI_ERS_RESULT_RECOVERED) {
> + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> pci_warn(bridge, "subordinate device reset failed\n");
> goto failed;
> }
This was an interesting scenario for Hinko not getting the NEED_RESET.
Reviewed-by: Sean V Kelley <sean.v.kelley@intel.com>
> --
> 2.24.1
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] PCI/AER: Actually get the root port
2021-01-04 18:42 ` Kelley, Sean V
@ 2021-01-04 22:05 ` Keith Busch
2021-01-04 22:10 ` Kelley, Sean V
0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2021-01-04 22:05 UTC (permalink / raw)
To: Kelley, Sean V; +Cc: Linux PCI, Bjorn Helgaas
On Mon, Jan 04, 2021 at 06:42:58PM +0000, Kelley, Sean V wrote:
> > On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
> > rc = pci_bus_error_reset(dev);
> > - pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> > + pci_info(dev, "%s Port link has been reset (%d)\n", rc,
>
> Perhaps … "%s%s Port
I am not sure I understand. What should the second string in this format
say?
I have to resend this patch anyway because I messed up the parmeter
order and build bot caught me. I'll split it out from this patch since
it is really a separate issue.
> > + pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/3] PCI/AER: Actually get the root port
2021-01-04 22:05 ` Keith Busch
@ 2021-01-04 22:10 ` Kelley, Sean V
0 siblings, 0 replies; 11+ messages in thread
From: Kelley, Sean V @ 2021-01-04 22:10 UTC (permalink / raw)
To: Keith Busch; +Cc: Linux PCI, Bjorn Helgaas
> On Jan 4, 2021, at 2:05 PM, Keith Busch <kbusch@kernel.org> wrote:
>
> On Mon, Jan 04, 2021 at 06:42:58PM +0000, Kelley, Sean V wrote:
>>> On Dec 17, 2020, at 9:14 AM, Keith Busch <kbusch@kernel.org> wrote:
>>> rc = pci_bus_error_reset(dev);
>>> - pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>>> + pci_info(dev, "%s Port link has been reset (%d)\n", rc,
>>
>> Perhaps … "%s%s Port
>
> I am not sure I understand. What should the second string in this format
> say?
>
> I have to resend this patch anyway because I messed up the parmeter
> order and build bot caught me. I'll split it out from this patch since
> it is really a separate issue.
Never mind you only had one string, it was the order issue that the builedbot caught.
Sean
>
>
>>> + pci_is_root_bus(dev->bus) ? "Root" : "Downstream");
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] PCI/ERR: Retain status from error notification
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
2021-01-04 18:43 ` Kelley, Sean V
@ 2021-01-05 14:12 ` Hinko Kocevar
1 sibling, 0 replies; 11+ messages in thread
From: Hinko Kocevar @ 2021-01-05 14:12 UTC (permalink / raw)
To: Keith Busch, linux-pci, Bjorn Helgaas
On 12/17/20 6:14 PM, Keith Busch wrote:
> Overwriting the frozen detected status with the result of the link reset
> loses the NEED_RESET result that drivers are depending on for error
> handling to report the .slot_reset() callback. Retain this status so
> that subsequent error handling has the correct flow.
>
> Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> drivers/pci/pcie/err.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index a84f0bf4c1e2..b576aa890c76 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -198,8 +198,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bridge(bridge, report_frozen_detected, &status);
> - status = reset_subordinates(bridge);
> - if (status != PCI_ERS_RESULT_RECOVERED) {
> + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> pci_warn(bridge, "subordinate device reset failed\n");
> goto failed;
> }
>
Dear Keith,
I can confirm that with this series of patches, applied to a linux-pci
GIT tree kernel, the problem is solved for me. The reset is carried out,
the secondary bus and the PCI device beind the bus all recover after
injecting an error into the root port.
Tested-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Thank you!
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-01-05 14:13 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17 17:14 [PATCH 1/3] PCI/ERR: Clear status of the reporting device Keith Busch
2020-12-17 17:14 ` [PATCH 2/3] PCI/AER: Actually get the root port Keith Busch
2020-12-17 23:15 ` kernel test robot
2020-12-17 23:34 ` kernel test robot
2021-01-04 18:42 ` Kelley, Sean V
2021-01-04 22:05 ` Keith Busch
2021-01-04 22:10 ` Kelley, Sean V
2020-12-17 17:14 ` [PATCH 3/3] PCI/ERR: Retain status from error notification Keith Busch
2021-01-04 18:43 ` Kelley, Sean V
2021-01-05 14:12 ` Hinko Kocevar
2021-01-04 18:42 ` [PATCH 1/3] PCI/ERR: Clear status of the reporting device Kelley, Sean V
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).