linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: poza@codeaurora.org
Cc: Thomas Tai <thomas.tai@oracle.com>,
	bhelgaas@google.com, keith.busch@intel.com,
	linux-pci@vger.kernel.org, linux-pci-owner@vger.kernel.org
Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed
Date: Sat, 18 Aug 2018 17:38:58 +1000	[thread overview]
Message-ID: <9e654e989967e029815fc3b1c7e5f0cbc5d6d6e9.camel@kernel.crashing.org> (raw)
In-Reply-To: <65ec0c5b4682c0f0215114ac09d46cf5@codeaurora.org>

On Fri, 2018-08-17 at 15:59 +0530, poza@codeaurora.org wrote:
> 
> The existing design is; framework dictating drivers what it should do 
> rather than driver deciding.
> let me explain.

The AER code design might be but it's not what was documented in
Documentation/PCI/pci-error-recovery.txt before you changed it !

And it's not what about 20 years of doing PCI error recovery before
there even was such a thing as PCIe or AER taught us on powerpc.

The basic idea is that the "action" to be taken to recover is the
"deepest" of all the ones requested by the framework and all the
involved drivers (more than one device could be part of an error
handling "domain" under some circumstances, for example when the
failure originates from a bridge).

What the PCIe spec says is rather uninteresting and can be trusted as
far as policy is concern about as much as you can trust HW to be
compliant with it, that is about zilch.

The only interesting thing to take from the spec is that in the case of
a FATAL error, you *must* at least reset the link. That's it. It
doesn't mean that you shouldn't reset more (PERST for example) and it
doesn't mean that you shouldn't reset the device or link for a
NON_FATAL error either. And even if it did we should ignore it.

There are many reasons why a driver might request a reset. For example,
the error, while non-fatal to the link itself, might still have
triggered a non-recoverable event in the device, either in HW or SW,
and the only safe way out might well be a reset.

On POWER systems, with EEH HW support, we have additional rules that
say that on *any* non-corrected error (and under some circumstances if
corrected errors cross a certain threshold) will isolate the device
completely and the typical recovery from that is a reset (it doesn't
*have* to be but that's what most drivers chose).

This is rules design to prevent propagation of potentially corrupted
data which is considering way more important than QoS of the device.

We should see if we can make some of the AER and EEH code more common,
the first thing would be to take things out of pcie/ since EEH isn't
PCIe specific, and abstract the recovery process from the underlying
architecture hooks to perform things like reset etc...
 
> aer_isr
>      aer_isr_one_error
>           get_device_error_info   >> this checks
>                {
>                  /* Link is still healthy for IO reads */    >> Bjorn 
> this looks like a very strange thing to me, if link is not healthy we 
> are not even going for pcie_do_fatal_recovery()
> 		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS,
> 			&info->status);
> 		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK,
> 			&info->mask);
> 		if (!(info->status & ~info->mask))
> 			return 0;
> 
>                {
>          handle_error_source
>          pcie_do_nonfatal_recovery or pcie_do_fatal_recovery
> 
> now it does not seem to me that driver has some way of knowing if it has 
> to return PCI_ERS_RESULT_CAN_RECOVER ? or PCI_ERS_RESULT_NEED_RESET?

You don't know that. A driver can look at the device state at the
enable_mmio phase and decide that the device is sufficiently sane that
it can continue... or not. A driver could have a policy of always
resetting. Etc...

> let us take an example of storage driver here e.g. NVME
> nvme_error_detected()  relies heavily on pci_channel_state_t which is 
> passed by error framework  (err.c)

So what ? This is one example... the IPR SCSI driver is much more
complex in its error recovery afaik.

> I understand your point of view and original intention of design that 
> let driver dictate whether it wants slot_reset ?

It's both the driver *and* the framework, which ever wants the
"harshest" solution wins. If either wants a reset we do a reset.

> but driver relies on framework to pass that information in order to take 
> decision.

No. Some drivers might. This is by no means the absolute rule. There
are also all other type of errors that aren't covered by the AER
categories that fit in that framework such as iommu errors (at least
for us on powerpc).

> In case of pci_channel_io_frozen  (which is ERR_FATAL), most of the 
> drivers demand link reset.

No, they demand a *device* reset. ie, hot reset or PERST. It's not
about the link. It's about the device.

> but in case of ERR_NONFATAL, the link is supposed to be functional and 
> there is no need for link reset.

It depends whether the error was corrected or not. If it was not
corrected, then something bad did happen and the framework has no way
to know what is the appropriate remediation for a given device.

> and exactly all the PCI based drivers returns 
> PCI_ERS_RESULT_CAN_RECOVER, which is valid.

You are mixing the AER terminology of FATAL/NON_FATAL which is AER
specific and the framework defined channel states, which are
"functional", "frozen" and "permanently dead". These don't necessarily
match 1:1.

> so I think to conclude, you are referring more of your views towards
> pcie_do_fatal_recovery()

Both. But yes, pcie_do_fatal_recovery() should definitely NOT do an
unplug/replug blindly. The whole point of the error handlers was to
allow driver to recover from these things without losing the links to
their respective consumers (such as filesystems for block devices).

> which does
> 
> > remove_devices from the downstream link
> > reset_link()   (Secondary bus reset)
> > re-enumerate.
> 
> and this is what DPC defined, and AER is just following that.

And is completely and utterly broken and useless as a policy.

Again, somebody cared too much about the spec rather than what makes
actual sense in practice.

You should *only* remove devices if the driver doesn't have the
necessary callbacks.

In fact, remove and re-plug device is probably the worst thing you can
do, I don't remember all the details but it's been causing us endless
issues with EEH, which is why as I mentioned, we are looking more
toward an unbind and rebind of the driver, but that's a details.

If the driver has the error callbacks it should participate in the
recovery and NOT be unbound or the device unplugged. That's completely
useless for Linux to do that since it means you cannot recover from an
error on a storage driver.

Ben.

> Regards,
> Oza.
> 
> > 
> > Also because multiple devices, at least on power, can share a domain,
> > we can get into a situation where one device requires a reset, so all
> > will get reset, and their respective drivers need to be notified.
> > 
> > Note: it's not so much about resetting the *link* than about resetting
> > the *device*.
> > 
> > > although
> > > I see following note in the code as well.
> > >                  /*
> > > 		 * TODO: Should call platform-specific
> > > 		 * functions to reset slot before calling
> > > 		 * drivers' slot_reset callbacks?
> > > 		 */
> > 
> > Yup :-)
> > 
> > Cheers,
> > Ben.
> > 
> > > Regards,
> > > Oza.
> > > 
> > > > 
> > > > Also we should do a hot reset at least, not just a link reset.
> > > > 
> > > > >      report_resume()
> > > > > 
> > > > > If you suggest how it is broken, it will help me to understand.
> > > > > probably you might want to point out what are the calls need to be
> > > > > added
> > > > > or removed or differently handled, specially storage point of view.
> > > > 
> > > > 
> > > > > Regards,
> > > > > Oza.
> > > > > 
> > > > > > 
> > > > > > Keep in mind that those callbacks were designed originally for EEH
> > > > > > (which predates AER), and so was the spec written.
> > > > > > 
> > > > > > We don't actually use the AER code on POWER today, so we didn't notice
> > > > > > how broken the implementation was :-)
> > > > > > 
> > > > > > We should fix that.
> > > > > > 
> > > > > > Either we can sort all that out by email, or we should plan some kind
> > > > > > of catch-up, either at Plumbers (provided I can go there) or maybe a
> > > > > > webex call.
> > > > > > 
> > > > > > Cheers,
> > > > > > Ben.

  parent reply	other threads:[~2018-08-18 10:46 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-13 16:51 [PATCH 0/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed Thomas Tai
2018-08-13 16:51 ` [PATCH 1/1] " Thomas Tai
2018-08-14  9:16   ` poza
2018-08-14  9:22     ` poza
2018-08-14 13:51       ` Thomas Tai
2018-08-15 14:57         ` poza
2018-08-15 15:02           ` Thomas Tai
2018-08-15 15:26             ` poza
2018-08-15 15:43               ` Thomas Tai
2018-08-15 15:59                 ` poza
2018-08-15 16:04                   ` Thomas Tai
2018-08-15 21:55       ` Benjamin Herrenschmidt
2018-08-15 21:56       ` Benjamin Herrenschmidt
2018-08-16  6:36         ` poza
2018-08-16  6:51           ` Benjamin Herrenschmidt
2018-08-16  6:59             ` Benjamin Herrenschmidt
2018-08-16  8:07               ` poza
2018-08-16  8:12                 ` Benjamin Herrenschmidt
2018-08-16  9:03                   ` poza
2018-08-16 10:07                     ` Benjamin Herrenschmidt
2018-08-16 14:11                       ` poza
2018-08-16 23:30                         ` Benjamin Herrenschmidt
2018-08-17 10:29                           ` poza
2018-08-17 10:44                             ` poza
2018-08-18  7:38                             ` Benjamin Herrenschmidt [this message]
2018-08-16  7:05             ` Benjamin Herrenschmidt
2018-08-16  7:15               ` Benjamin Herrenschmidt
2018-08-16  7:56                 ` poza
2018-08-16  8:10                   ` Benjamin Herrenschmidt
2018-08-16  8:05               ` poza
2018-08-16  8:15                 ` Benjamin Herrenschmidt
2018-08-16  8:22                   ` poza
2018-08-16  8:28                     ` Benjamin Herrenschmidt
2018-08-16 13:30                       ` Thomas Tai
2018-08-16 13:46                     ` Sinan Kaya
2018-08-16 23:27                       ` Benjamin Herrenschmidt
2018-08-17  6:35                         ` poza
2018-08-19  2:24                         ` Bjorn Helgaas
2018-08-20  5:09                           ` poza
2018-08-20  5:15                             ` Benjamin Herrenschmidt
2018-08-20 13:02                               ` Thomas Tai
2018-08-20 13:27                                 ` Benjamin Herrenschmidt
2018-08-19  2:19               ` Bjorn Helgaas
2018-08-19 21:41                 ` Sinan Kaya
2018-08-20  2:03                   ` Benjamin Herrenschmidt
2018-08-20  5:19                     ` poza
2018-08-20  5:33                       ` Benjamin Herrenschmidt
2018-08-20  7:56                         ` poza
2018-08-20 11:22                           ` Benjamin Herrenschmidt
2018-08-20 13:26                             ` poza
2018-08-20 21:02                               ` Benjamin Herrenschmidt
2018-08-21  5:14                                 ` poza
2018-08-21  6:06                                   ` Benjamin Herrenschmidt
2018-08-21 14:37                                     ` Keith Busch
2018-08-21 15:07                                       ` Sinan Kaya
2018-08-21 15:29                                         ` Keith Busch
2018-08-21 15:50                                           ` Sinan Kaya
2018-08-21 15:55                                             ` Sinan Kaya
2018-08-21 16:44                                             ` Keith Busch
2018-08-21 15:30                                       ` poza
2018-08-21 21:14                                       ` Benjamin Herrenschmidt
2018-08-21 22:04                                         ` Keith Busch
2018-08-21 22:33                                           ` Keith Busch
2018-08-21 23:06                                           ` Benjamin Herrenschmidt
2018-08-21 23:13                                             ` Keith Busch
2018-08-22  0:36                                               ` Benjamin Herrenschmidt
2018-08-30  0:01                                             ` Keith Busch
2018-08-30  0:10                                               ` Sinan Kaya
2018-08-30  0:46                                                 ` Keith Busch
2018-08-30  4:26                                               ` Benjamin Herrenschmidt
2018-08-20 15:53                             ` Keith Busch
2018-08-20 16:13                               ` poza
2018-08-20 16:32                                 ` Keith Busch
2018-08-20 21:05                               ` Benjamin Herrenschmidt
2018-08-20 21:21                                 ` Sinan Kaya
2018-08-20 21:35                                   ` Keith Busch
2018-08-20 21:53                                     ` Benjamin Herrenschmidt
2018-08-20 22:02                                       ` Sinan Kaya
2018-08-20 22:04                                         ` Benjamin Herrenschmidt
2018-08-20 22:13                                           ` Sinan Kaya
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-22  9:13                                             ` Lukas Wunner
2018-08-22 14:38                                               ` Keith Busch
2018-08-22 14:51                                                 ` Sinan Kaya
2018-08-20 22:13                                           ` Keith Busch
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-21  1:30                                               ` Keith Busch
2018-08-20  4:37                 ` Benjamin Herrenschmidt
2018-08-20  4:39                 ` PATCH] Partial revert of "PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices" Benjamin Herrenschmidt
2018-08-21 19:50                   ` Bjorn Helgaas
2018-08-22  4:35                     ` poza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e654e989967e029815fc3b1c7e5f0cbc5d6d6e9.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=keith.busch@intel.com \
    --cc=linux-pci-owner@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=poza@codeaurora.org \
    --cc=thomas.tai@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).