From: James Puthukattukaran <james.puthukattukaran@oracle.com>
To: Keith Busch <kbusch@kernel.org>
Cc: "Kelley, Sean V" <sean.v.kelley@intel.com>,
	"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@intel.com>,
	Linux PCI <linux-pci@vger.kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>
Subject: RE: [External] : Re: pci_do_recovery not handling fata errors
Date: Tue, 16 Mar 2021 21:13:56 +0000
Message-ID: <MN2PR10MB4093780C86ABFCAB54B1427C996B9@MN2PR10MB4093.namprd10.prod.outlook.com>
In-Reply-To: <20210313171135.GA8648@redsun51.ssa.fujisawa.hgst.com>

Keith -
I understand that the RP did not detect the error, so there is nothing to clear in its AER register. My question is: where is the fatal error status cleared in the AER register of the device that caused the fatal error? It does not seem to be done in pci_do_recovery walking the hierarchy (unless I'm missing it).
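
To make the question concrete, here is a minimal user-space sketch of what "clearing fatal status" versus "clearing nonfatal status" means for a device's AER Uncorrectable Error Status register. It assumes the usual AER semantics (status bits are write-1-to-clear, and a set bit in the Uncorrectable Error Severity register marks that error as fatal). The struct and the `model_*` names are hypothetical and only model the register behavior; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device's AER uncorrectable error registers. */
struct aer_model {
    uint32_t uncor_status; /* Uncorrectable Error Status (write-1-to-clear) */
    uint32_t uncor_sever;  /* Uncorrectable Error Severity: bit set = fatal */
};

/* Model a write-1-to-clear access: each 1 bit written clears that bit. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
    *reg &= ~val;
}

/* Clear only the logged errors whose severity bit is 0 (nonfatal). */
static void model_clear_nonfatal_status(struct aer_model *m)
{
    uint32_t status = m->uncor_status & ~m->uncor_sever;

    if (status)
        rw1c_write(&m->uncor_status, status);
}

/* Clear only the logged errors whose severity bit is 1 (fatal). */
static void model_clear_fatal_status(struct aer_model *m)
{
    uint32_t status = m->uncor_status & m->uncor_sever;

    if (status)
        rw1c_write(&m->uncor_status, status);
}

/*
 * Example: one fatal bit (bit 4) and one nonfatal bit (bit 20) are logged.
 * Returns the status left behind when only the nonfatal clear is applied.
 */
static uint32_t model_example(void)
{
    struct aer_model m = { .uncor_status = 0x00100010u,
                           .uncor_sever  = 0x00000010u };

    model_clear_nonfatal_status(&m);
    assert((m.uncor_status & ~m.uncor_sever) == 0); /* nonfatal bits gone */
    return m.uncor_status;
}
```

In this model, applying only the nonfatal clear leaves the fatal bit logged in the device, which is the situation I am asking about.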


> -----Original Message-----
> From: Keith Busch <kbusch@kernel.org>
> Sent: Saturday, March 13, 2021 12:12 PM
> To: James Puthukattukaran <james.puthukattukaran@oracle.com>
> Cc: Kelley, Sean V <sean.v.kelley@intel.com>; Kuppuswamy,
> Sathyanarayanan <sathyanarayanan.kuppuswamy@intel.com>; Linux PCI
> <linux-pci@vger.kernel.org>; bhelgaas@google.com
> Subject: [External] : Re: pci_do_recovery not handling fata errors
> 
> On Fri, Mar 12, 2021 at 10:57:18PM +0000, James Puthukattukaran wrote:
> > But the clearing of fatal error in the dpc_process_error is only for DPC
> > trigger due to "unmaskable uncorrectable".
> > If the trigger reason is ERR_FATAL, then it does not hit the else clause and
> > neither is it cleared in the pci_do_recovery code.
> 
> If the reason is ERR_FATAL, then the port didn't detect the error; it is just the
> first DPC capable downstream port to receive the message from some device
> downstream, so there's nothing to clear in its AER register.
> 
> > From dpc_process_error with more context --
> >
> >        else if (reason == 0 &&  <<<<<<< only for "unmaskable uncorrectable". What about for ERR_FATAL?
> >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >                  aer_get_device_error_info(pdev, &info)) {
> >                 aer_print_error(pdev, &info);
> >                 pci_aer_clear_nonfatal_status(pdev);
> >                 pci_aer_clear_fatal_status(pdev);
> >         }
> >
> >
> > > -----Original Message-----
> > > From: Kelley, Sean V <sean.v.kelley@intel.com>
> > > Sent: Friday, March 12, 2021 5:25 PM
> > > To: James Puthukattukaran <james.puthukattukaran@oracle.com>;
> > > Kuppuswamy, Sathyanarayanan
> > > <sathyanarayanan.kuppuswamy@intel.com>
> > > Cc: Linux PCI <linux-pci@vger.kernel.org>; bhelgaas@google.com
> > > Subject: [External] : Re: pci_do_recovery not handling fata errors
> > >
> > >
> > >
> > > > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran <james.puthukattukaran@oracle.com> wrote:
> > > >
> > > > Hi -
> > > > I’m trying to understand why pci_do_recovery() only clears
> > > > non-fatal but not fatal errors? My immediate concern is the call
> > > > from dpc_handler. If a device sends an ERR_FATAL to the root port,
> > > > I would think that as part of recovery the fatal status in the AER
> > > > registers of the endpoint device would be cleared?
> > > >
> > >
> > >
> > > Adding Sathya who mentioned to me that:
> > >
> > > Fatal errors are cleared in
> > >
> > > void dpc_process_error(struct pci_dev *pdev)
> > >
> > >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > >                  aer_get_device_error_info(pdev, &info)) {
> > >                 aer_print_error(pdev, &info);
> > >                 pci_aer_clear_nonfatal_status(pdev);
> > >                 pci_aer_clear_fatal_status(pdev);
> > >
> > > Thanks,
> > >
> > > Sean
> > >
> > > > Snippet of concern in pci_do_recovery –
> > > >
> > > >         /*
> > > >          * If we have native control of AER, clear error status in the Root
> > > >          * Port or Downstream Port that signaled the error.  If the
> > > >          * platform retained control of AER, it is responsible for clearing
> > > >          * this status.  In that case, the signaling device may not even be
> > > >          * visible to the OS.
> > > >          */
> > > >         if (host->native_aer || pcie_ports_native) {
> > > >                 pcie_clear_device_status(bridge);
> > > >                 pci_aer_clear_nonfatal_status(bridge);   <<<< Just clearing nonfatal. What about fatal?
> > > >         }
> > > >
> > > > Thanks
> > > > James
> >
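
For reference, the distinction Keith draws above (reason 0 means the DPC port itself detected the error; ERR_FATAL means it only received a message from a device below it) can be sketched as a decode of the DPC Status trigger-reason field. This is a minimal user-space sketch: the mask value and the bit layout (trigger reason in bits 2:1) are stated here as assumptions about the DPC Status register, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: DPC Status bits 2:1 hold the trigger reason. */
#define DPC_STATUS_TRIGGER_RSN_MASK 0x0006u

/* Hypothetical names for the four trigger-reason encodings. */
enum dpc_reason {
    DPC_REASON_UNMASKED_UNCORRECTABLE = 0, /* port detected the error itself */
    DPC_REASON_ERR_NONFATAL           = 1, /* ERR_NONFATAL message received  */
    DPC_REASON_ERR_FATAL              = 2, /* ERR_FATAL message received     */
    DPC_REASON_EXTENDED               = 3  /* see the extended reason field  */
};

/* Extract the trigger reason from a raw DPC Status value. */
static enum dpc_reason dpc_trigger_reason(uint16_t dpc_status)
{
    return (enum dpc_reason)((dpc_status & DPC_STATUS_TRIGGER_RSN_MASK) >> 1);
}

/*
 * Returns 1 when the DPC port's own AER status describes the error
 * (reason 0), and 0 when the trigger came from a message sent by a
 * downstream device, in which case the error is logged in that device's
 * AER registers rather than the port's.
 */
static int dpc_port_logged_error(uint16_t dpc_status)
{
    return dpc_trigger_reason(dpc_status) == DPC_REASON_UNMASKED_UNCORRECTABLE;
}
```

With this decode, a status of 0x0005 (triggered, reason 2) reports ERR_FATAL and `dpc_port_logged_error()` returns 0, which is the case where the port has nothing to clear and the open question is who clears the originating device's fatal status.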

Thread overview: 7+ messages
     [not found] <MN2PR10MB4093188B8CDC659AE68E5640996F9@MN2PR10MB4093.namprd10.prod.outlook.com>
2021-03-12 22:25 ` pci_do_recovery not handling fata errors Kelley, Sean V
2021-03-12 22:57   ` James Puthukattukaran
2021-03-13 17:11     ` Keith Busch
2021-03-16 21:13       ` James Puthukattukaran [this message]
2021-03-16 21:51         ` [External] : " Keith Busch
2021-04-01  2:15           ` James Puthukattukaran
2021-04-01  2:22             ` Keith Busch
