From: James Puthukattukaran <james.puthukattukaran@oracle.com>
To: Keith Busch <kbusch@kernel.org>
Cc: "Kelley, Sean V" <sean.v.kelley@intel.com>,
"Kuppuswamy,
Sathyanarayanan" <sathyanarayanan.kuppuswamy@intel.com>,
Linux PCI <linux-pci@vger.kernel.org>,
"bhelgaas@google.com" <bhelgaas@google.com>
Subject: RE: [External] : Re: pci_do_recovery not handling fata errors
Date: Tue, 16 Mar 2021 21:13:56 +0000
Message-ID: <MN2PR10MB4093780C86ABFCAB54B1427C996B9@MN2PR10MB4093.namprd10.prod.outlook.com>
In-Reply-To: <20210313171135.GA8648@redsun51.ssa.fujisawa.hgst.com>
Keith -
I understand that the RP did not detect the error, so there is nothing to clear in its AER register. My question is: where is the fatal error status cleared in the AER register of the device that caused the fatal error? It does not seem to be done in pci_do_recovery while walking the hierarchy (unless I'm missing it).
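
To make the question concrete, here is a minimal user-space sketch of the status/severity masking that, as I read them, pci_aer_clear_fatal_status() and pci_aer_clear_nonfatal_status() apply to a device's uncorrectable error status. The struct and the fake_* helpers are hypothetical stand-ins I made up for illustration, not kernel APIs, and the register model is simplified to two 32-bit fields with write-1-to-clear status bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's AER capability; not a kernel type. */
struct fake_aer {
	uint32_t uncor_status;   /* models PCI_ERR_UNCOR_STATUS (RW1C bits) */
	uint32_t uncor_severity; /* models PCI_ERR_UNCOR_SEVER (1 = fatal) */
};

/* RW1C write: every 1 bit in val clears the matching status bit. */
static void fake_write_status(struct fake_aer *aer, uint32_t val)
{
	assert(aer);
	aer->uncor_status &= ~val;
}

/* Clear only the logged errors whose severity bit marks them fatal. */
static void fake_clear_fatal_status(struct fake_aer *aer)
{
	uint32_t fatal = aer->uncor_status & aer->uncor_severity;

	if (fatal)
		fake_write_status(aer, fatal);
}

/* Clear only the logged errors whose severity bit marks them non-fatal. */
static void fake_clear_nonfatal_status(struct fake_aer *aer)
{
	uint32_t nonfatal = aer->uncor_status & ~aer->uncor_severity;

	if (nonfatal)
		fake_write_status(aer, nonfatal);
}
```

With one fatal and one non-fatal bit logged on the endpoint, calling only the non-fatal helper leaves the fatal bit set, which is the situation I am asking about for the device below the DPC port.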
> -----Original Message-----
> From: Keith Busch <kbusch@kernel.org>
> Sent: Saturday, March 13, 2021 12:12 PM
> To: James Puthukattukaran <james.puthukattukaran@oracle.com>
> Cc: Kelley, Sean V <sean.v.kelley@intel.com>; Kuppuswamy,
> Sathyanarayanan <sathyanarayanan.kuppuswamy@intel.com>; Linux PCI
> <linux-pci@vger.kernel.org>; bhelgaas@google.com
> Subject: [External] : Re: pci_do_recovery not handling fata errors
>
> On Fri, Mar 12, 2021 at 10:57:18PM +0000, James Puthukattukaran wrote:
> > But the clearing of fatal error in the dpc_process_error is only for DPC
> > trigger due to "unmaskable uncorrectable".
> > If the trigger reason is ERR_FATAL, then it does not hit the else clause and
> > neither is it cleared in the pci_do_recovery code.
>
> If the reason is ERR_FATAL, then the port didn't detect the error; it is just the
> first DPC capable downstream port to receive the message from some device
> downstream, so there's nothing to clear in its AER register.
>
> > From dpc_process_error with more context --
> >
> > else if (reason == 0 && <<<<<<< only for "unmaskable uncorrectable". What about for ERR_FATAL?
> > dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > aer_get_device_error_info(pdev, &info)) {
> > aer_print_error(pdev, &info);
> > pci_aer_clear_nonfatal_status(pdev);
> > pci_aer_clear_fatal_status(pdev);
> > }
> >
> >
> > > -----Original Message-----
> > > From: Kelley, Sean V <sean.v.kelley@intel.com>
> > > Sent: Friday, March 12, 2021 5:25 PM
> > > To: James Puthukattukaran <james.puthukattukaran@oracle.com>;
> > > Kuppuswamy, Sathyanarayanan
> > > <sathyanarayanan.kuppuswamy@intel.com>
> > > Cc: Linux PCI <linux-pci@vger.kernel.org>; bhelgaas@google.com
> > > Subject: [External] : Re: pci_do_recovery not handling fata errors
> > >
> > >
> > >
> > > > > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran
> > > > > <james.puthukattukaran@oracle.com> wrote:
> > > >
> > > > Hi -
> > > > > I’m trying to understand why pci_do_recovery() only clears
> > > > > non-fatal but not fatal errors? My immediate concern is the call
> > > > > from dpc_handler. If a device sends an ERR_FATAL to the root port,
> > > > > I would think that as part of recovery the fatal status in the AER
> > > > > registers of the endpoint device would be cleared?
> > > >
> > >
> > >
> > > Adding Sathya who mentioned to me that:
> > >
> > > > Fatal errors are cleared in
> > >
> > > void dpc_process_error(struct pci_dev *pdev)
> > >
> > > > dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > > > aer_get_device_error_info(pdev, &info)) {
> > > > aer_print_error(pdev, &info);
> > > > pci_aer_clear_nonfatal_status(pdev);
> > > > pci_aer_clear_fatal_status(pdev);
> > >
> > > Thanks,
> > >
> > > Sean
> > >
> > > > Snippet of concern in pci_do_recovery –
> > > >
> > > > /*
> > > > * If we have native control of AER, clear error status in the Root
> > > > * Port or Downstream Port that signaled the error. If the
> > > > * platform retained control of AER, it is responsible for clearing
> > > > * this status. In that case, the signaling device may not even be
> > > > * visible to the OS.
> > > > */
> > > > if (host->native_aer || pcie_ports_native) {
> > > > pcie_clear_device_status(bridge);
> > > > > pci_aer_clear_nonfatal_status(bridge); <<<< Just clearing nonfatal. What about fatal?
> > > > }
> > > >
> > > > Thanks
> > > > James
> >