linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: poza@codeaurora.org
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Tai <thomas.tai@oracle.com>,
	bhelgaas@google.com, keith.busch@intel.com,
	linux-pci@vger.kernel.org, linux-pci-owner@vger.kernel.org
Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed
Date: Thu, 16 Aug 2018 12:06:36 +0530	[thread overview]
Message-ID: <6cb069038530757f31f3dd60328c7e30@codeaurora.org> (raw)
In-Reply-To: <903394c04d6ad468ed06dc0a779200e7555345a7.camel@kernel.crashing.org>

On 2018-08-16 03:26, Benjamin Herrenschmidt wrote:
> (resent using my proper email address)
> 
> On Tue, 2018-08-14 at 14:52 +0530, poza@codeaurora.org wrote:
>> > >       if (result == PCI_ERS_RESULT_RECOVERED) {
>> > >               if (pcie_wait_for_link(udev, true))
>> > >                       pci_rescan_bus(udev->bus);
>> > > -            pci_info(dev, "Device recovery from fatal error successful\n");
>> > > +            /* find the pci_dev after rescanning the bus */
>> > > +            dev = pci_get_domain_bus_and_slot(domain, bus, devfn);
>> >
>> > one of the motivations was to remove and re-enumerate rather then
>> > going thorugh driver's recovery sequence
>> > was; it might be possible that hotplug capable bridge, the device
>> > might have changed.
>> > hence this check will fail
> 
> Under what circumstances do you actually "unplug" the device ? We are
> trying to cleanup/fix some of the PowerPC EEH code which is in a way
> similar to AER, and we found that this unplug/replug, which we do if
> the driver doesn't have recovery callbacks only, is causing more
> problems than it solves.
> 
> We are moving toward instead unbinding the driver, resetting the
> device, then re-binding the driver instead of unplug/replug.
> 
> Also why would you ever bypass the driver callbacks if the driver has
> some ? The whole point is to keep the driver bound while resetting the
> device (provided it has the right callbacks) so we don't lose the
> linkage between stroage devices and mounted filesystems.
> 

there are two different paths we are taking, one is for ERR_FATAL and 
the other is for ERR_NONFATAL.
while AER and DPC converge into ERR_FATAL path; ERR_NONFATAL path is the 
one where AER and driver callbacks would come into picture.

since DPC does hw recovery of the link, and handles only ERR_FATAL, we 
do remove devices and re-enumerate them.
but if the error is no fatal, we will fall back to driver callbacks to 
initiate link error handling and recovery process.

under normal circumstances I do not see the some downstream devices 
would have been replaced in case of FATAL errors, but code might not 
want to assume that
same device is there, since we are removing downstream devices 
completely. so taking reference of old BDF and keeping it for later 
reference is not good idea.
although I think there are some parts of pcie_do_fatal_recovery need a 
fix, which Thomas is fixing is anyway.

so yes, I agree with you and we are talking about same thing e.g.
"
  We are moving toward instead unbinding the driver, resetting the
  device, then re-binding the driver instead of unplug/replug.
"

driver callbacks are very much there, please have a look at 
pcie_do_nonfatal_recovery().

refering Sepc: 6.2.2.2.1 Fatal Errors, where link is unreliable and it 
might need AER style reset of link or DPC style HW recovery
In both cases, the shutdown callbacks are expected to be called,
e.g.  some driver handle errors ERR_NONFATAL or FATAL in similar ways
e.g.
ioat_pcie_error_detected();  calls  ioat_shutdown(); in case of 
ERR_NONFATAL
otherwise ioat_shutdown() in case of ERR_FATAL.


> Cheers,
> Ben.

  reply	other threads:[~2018-08-16  9:32 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-13 16:51 [PATCH 0/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed Thomas Tai
2018-08-13 16:51 ` [PATCH 1/1] " Thomas Tai
2018-08-14  9:16   ` poza
2018-08-14  9:22     ` poza
2018-08-14 13:51       ` Thomas Tai
2018-08-15 14:57         ` poza
2018-08-15 15:02           ` Thomas Tai
2018-08-15 15:26             ` poza
2018-08-15 15:43               ` Thomas Tai
2018-08-15 15:59                 ` poza
2018-08-15 16:04                   ` Thomas Tai
2018-08-15 21:55       ` Benjamin Herrenschmidt
2018-08-15 21:56       ` Benjamin Herrenschmidt
2018-08-16  6:36         ` poza [this message]
2018-08-16  6:51           ` Benjamin Herrenschmidt
2018-08-16  6:59             ` Benjamin Herrenschmidt
2018-08-16  8:07               ` poza
2018-08-16  8:12                 ` Benjamin Herrenschmidt
2018-08-16  9:03                   ` poza
2018-08-16 10:07                     ` Benjamin Herrenschmidt
2018-08-16 14:11                       ` poza
2018-08-16 23:30                         ` Benjamin Herrenschmidt
2018-08-17 10:29                           ` poza
2018-08-17 10:44                             ` poza
2018-08-18  7:38                             ` Benjamin Herrenschmidt
2018-08-16  7:05             ` Benjamin Herrenschmidt
2018-08-16  7:15               ` Benjamin Herrenschmidt
2018-08-16  7:56                 ` poza
2018-08-16  8:10                   ` Benjamin Herrenschmidt
2018-08-16  8:05               ` poza
2018-08-16  8:15                 ` Benjamin Herrenschmidt
2018-08-16  8:22                   ` poza
2018-08-16  8:28                     ` Benjamin Herrenschmidt
2018-08-16 13:30                       ` Thomas Tai
2018-08-16 13:46                     ` Sinan Kaya
2018-08-16 23:27                       ` Benjamin Herrenschmidt
2018-08-17  6:35                         ` poza
2018-08-19  2:24                         ` Bjorn Helgaas
2018-08-20  5:09                           ` poza
2018-08-20  5:15                             ` Benjamin Herrenschmidt
2018-08-20 13:02                               ` Thomas Tai
2018-08-20 13:27                                 ` Benjamin Herrenschmidt
2018-08-19  2:19               ` Bjorn Helgaas
2018-08-19 21:41                 ` Sinan Kaya
2018-08-20  2:03                   ` Benjamin Herrenschmidt
2018-08-20  5:19                     ` poza
2018-08-20  5:33                       ` Benjamin Herrenschmidt
2018-08-20  7:56                         ` poza
2018-08-20 11:22                           ` Benjamin Herrenschmidt
2018-08-20 13:26                             ` poza
2018-08-20 21:02                               ` Benjamin Herrenschmidt
2018-08-21  5:14                                 ` poza
2018-08-21  6:06                                   ` Benjamin Herrenschmidt
2018-08-21 14:37                                     ` Keith Busch
2018-08-21 15:07                                       ` Sinan Kaya
2018-08-21 15:29                                         ` Keith Busch
2018-08-21 15:50                                           ` Sinan Kaya
2018-08-21 15:55                                             ` Sinan Kaya
2018-08-21 16:44                                             ` Keith Busch
2018-08-21 15:30                                       ` poza
2018-08-21 21:14                                       ` Benjamin Herrenschmidt
2018-08-21 22:04                                         ` Keith Busch
2018-08-21 22:33                                           ` Keith Busch
2018-08-21 23:06                                           ` Benjamin Herrenschmidt
2018-08-21 23:13                                             ` Keith Busch
2018-08-22  0:36                                               ` Benjamin Herrenschmidt
2018-08-30  0:01                                             ` Keith Busch
2018-08-30  0:10                                               ` Sinan Kaya
2018-08-30  0:46                                                 ` Keith Busch
2018-08-30  4:26                                               ` Benjamin Herrenschmidt
2018-08-20 15:53                             ` Keith Busch
2018-08-20 16:13                               ` poza
2018-08-20 16:32                                 ` Keith Busch
2018-08-20 21:05                               ` Benjamin Herrenschmidt
2018-08-20 21:21                                 ` Sinan Kaya
2018-08-20 21:35                                   ` Keith Busch
2018-08-20 21:53                                     ` Benjamin Herrenschmidt
2018-08-20 22:02                                       ` Sinan Kaya
2018-08-20 22:04                                         ` Benjamin Herrenschmidt
2018-08-20 22:13                                           ` Sinan Kaya
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-22  9:13                                             ` Lukas Wunner
2018-08-22 14:38                                               ` Keith Busch
2018-08-22 14:51                                                 ` Sinan Kaya
2018-08-20 22:13                                           ` Keith Busch
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-21  1:30                                               ` Keith Busch
2018-08-20  4:37                 ` Benjamin Herrenschmidt
2018-08-20  4:39                 ` PATCH] Partial revert of "PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices" Benjamin Herrenschmidt
2018-08-21 19:50                   ` Bjorn Helgaas
2018-08-22  4:35                     ` poza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6cb069038530757f31f3dd60328c7e30@codeaurora.org \
    --to=poza@codeaurora.org \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=keith.busch@intel.com \
    --cc=linux-pci-owner@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=thomas.tai@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).