linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kuppuswamy, Sathyanarayanan"  <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Zhiqiang Hou <Zhiqiang.Hou@nxp.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	ruscur@russell.cc, sbobroff@linux.ibm.com, oohall@gmail.com,
	bhelgaas@google.com
Subject: Re: [PATCH] PCI: ERR: Don't override the status returned by error_detect()
Date: Thu, 28 May 2020 14:18:41 -0700	[thread overview]
Message-ID: <84a2bc7e-7556-96ff-6cd5-988d432ad8e3@linux.intel.com> (raw)
In-Reply-To: <20200527083130.4137-1-Zhiqiang.Hou@nxp.com>

Hi,

On 5/27/20 1:31 AM, Zhiqiang Hou wrote:
> From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> 
> The commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
> overrode the 'status' returned by the error_detect() call back function,
> which is depended on by the next step. This overriding makes the Endpoint
> driver's required info (kept in the var status) lost, so it results in the
> fatal errors' recovery failed and then kernel panic.
Can you explain why updating status affects the recovery ?
> 
> In the e1000e case, the error logs:
> pcieport 0002:00:00.0: AER: Uncorrected (Fatal) error received: 0002:01:00.0
> e1000e 0002:01:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
> pcieport 0002:00:00.0: AER: Root Port link has been reset
As per above commit log, it looks like link is reset correctly.
> SError Interrupt on CPU0, code 0xbf000002 -- SError
> CPU: 0 PID: 111 Comm: irq/76-aerdrv Not tainted 5.7.0-rc7-next-20200526 #8
> Hardware name: LS1046A RDB Board (DT)
> pstate: 80000005 (Nzcv daif -PAN -UAO BTYPE=--)
> pc : __pci_enable_msix_range+0x4c8/0x5b8
> lr : __pci_enable_msix_range+0x480/0x5b8
> sp : ffff80001116bb30
> x29: ffff80001116bb30 x28: 0000000000000003
> x27: 0000000000000003 x26: 0000000000000000
> x25: ffff00097243e0a8 x24: 0000000000000001
> x23: ffff00097243e2d8 x22: 0000000000000000
> x21: 0000000000000003 x20: ffff00095bd46080
> x19: ffff00097243e000 x18: ffffffffffffffff
> x17: 0000000000000000 x16: 0000000000000000
> x15: ffffb958fa0e9948 x14: ffff00095bd46303
> x13: ffff00095bd46302 x12: 0000000000000038
> x11: 0000000000000040 x10: ffffb958fa101e68
> x9 : ffffb958fa101e60 x8 : 0000000000000908
> x7 : 0000000000000908 x6 : ffff800011600000
> x5 : ffff00095bd46800 x4 : ffff00096e7f6080
> x3 : 0000000000000000 x2 : 0000000000000000
> x1 : 0000000000000000 x0 : 0000000000000000
> Kernel panic - not syncing: Asynchronous SError Interrupt
> CPU: 0 PID: 111 Comm: irq/76-aerdrv Not tainted 5.7.0-rc7-next-20200526 #8
> 
> I think it's the expected result that "if the initial value of error
> status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER
> then even after successful recovery (using reset_link()) pcie_do_recovery()
> will report the recovery result as failure" which is described in
> commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()").
> 
> Refer to the Documentation/PCI/pci-error-recovery.rst.
> As the error_detect() is mandatory callback if the pci_err_handlers is
> implemented, if it return the PCI_ERS_RESULT_DISCONNECT, it means the
> driver doesn't want to recover at all;
> For the case PCI_ERS_RESULT_NO_AER_DRIVER, if the pci_err_handlers is not
> implemented, the failure is more expected.
> 
> Fixes: commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
>   drivers/pci/pcie/err.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 14bb8f54723e..84f72342259c 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -165,8 +165,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   	pci_dbg(dev, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
>   		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> +		if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
>   			pci_warn(dev, "link reset failed\n");
>   			goto failed;
>   		}
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

  reply	other threads:[~2020-05-28 21:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-27  8:31 [PATCH] PCI: ERR: Don't override the status returned by error_detect() Zhiqiang Hou
2020-05-28 21:18 ` Kuppuswamy, Sathyanarayanan [this message]
2020-05-29  4:04   ` Z.q. Hou
2020-05-29  4:24     ` Kuppuswamy, Sathyanarayanan
2020-06-10  4:07       ` Z.q. Hou
2020-07-10 21:07         ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84a2bc7e-7556-96ff-6cd5-988d432ad8e3@linux.intel.com \
    --to=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=Zhiqiang.Hou@nxp.com \
    --cc=bhelgaas@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=oohall@gmail.com \
    --cc=ruscur@russell.cc \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).