All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Oliver O'Halloran" <oohall@gmail.com>
To: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH 07/14] powernv/eeh: Use generic code to handle hot resets
Date: Tue, 17 Sep 2019 17:30:18 +1000	[thread overview]
Message-ID: <CAOSf1CE9AuG_nS0F+pmejx8eX5XDJ06AGxKBMPs3xweZ-XWnrQ@mail.gmail.com> (raw)
In-Reply-To: <20190917011542.GG21303@tungsten.ozlabs.ibm.com>

On Tue, Sep 17, 2019 at 11:15 AM Sam Bobroff <sbobroff@linux.ibm.com> wrote:
>
> On Tue, Sep 03, 2019 at 08:15:58PM +1000, Oliver O'Halloran wrote:
> > When we reset PCI devices managed by a hotplug driver the reset may
> > generate spurious hotplug events that cause the PCI device we're resetting
> > to be torn down accidently. This is a problem for EEH (when the driver is
> > EEH aware) since we want to leave the OS PCI device state intact so that
> > the device can be re-set without losing any resources (network, disks,
> > etc) provided by the driver.
> >
> > Generic PCI code provides the pci_bus_error_reset() function to handle
> > resetting a PCI Device (or bus) by using the reset method provided by the
> > hotplug slot driver. We can use this function if the EEH core has
> > requested a hot reset (common case) without tripping over the hotplug
> > driver.

> Could you explain a bit more about how this change solves the problem?
> Is it that the hotplug driver's reset method doesn't cause spurious
> hotplug events?

Yes, see the comment below.

> > -     if (pci_is_root_bus(bus) ||
> > -         pci_is_root_bus(bus->parent))
> > +     if (pci_is_root_bus(bus))
> >               return pnv_eeh_root_reset(hose, option);
> >
> > +     /*
> > +      * For hot resets try use the generic PCI error recovery reset
> > +      * functions. These correctly handles the case where the secondary
> > +      * bus is behind a hotplug slot and it will use the slot provided
> > +      * reset methods to prevent spurious hotplug events during the reset.
> > +      *
> > +      * Fundemental resets need to be handled internally to EEH since the
> > +      * PCI core doesn't really have a concept of a fundemental reset,
> > +      * mainly because there's no standard way to generate one. Only a
> > +      * few devices require an FRESET so it should be fine.
> > +      */
> > +     if (option != EEH_RESET_FUNDAMENTAL) {
> > +             /*
> > +              * NB: Skiboot and pnv_eeh_bridge_reset() also no-op the
> > +              *     de-assert step. It's like the OPAL reset API was
> > +              *     poorly designed or something...
> > +              */
> > +             if (option == EEH_RESET_DEACTIVATE)
> > +                     return 0;
>
> It looks like this will prevent pnv_eeh_root_reset(bus->parent) (below)
> from being called for EEH_RESET_DEACTIVATE, when it was before the
> patch. Is that right?

I agree it's a little awkward, but it works fine. OPAL has always
treated the resets defined by opal_pci_reset() as being edge-triggered
rather than level triggered since the de-assert step has always been
implemented as a no-op. This behaviour is effectively part of the ABI
between OPAL and the kernel since the kernel skips the de-assert step
in pnv_eeh_bridge_reset(). Although pnv_eeh_reset() uses
pnv_eeh_reset_root() to reset the secondary bus of the root port
pnv_pci_reset_secondary_bus() still uses the bridge reset.

I should probably update the OPAL API docs to mention that. Oh well.

> > +             rc = pci_bus_error_reset(bus->self);
> > +             if (!rc)
> > +                     return 0;
>
> Is it correct to fall through and try a different reset if this fails?

The only reason I can see for the generic code failing is when config
space to the bridge is blocked by the EEH core. The internal
pnv_eeh_bridge_reset() function has the option of calling
opal_pci_reset() or using the internal EEH config accessors (which
aren't filtered) so falling back makes sense to me.

  reply	other threads:[~2019-09-17  7:32 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 10:15 EEH + hotplug fixes Oliver O'Halloran
2019-09-03 10:15 ` [PATCH 01/14] powerpc/eeh: Clean up EEH PEs after recovery finishes Oliver O'Halloran
2019-09-17  0:43   ` Sam Bobroff
2019-09-19 10:25   ` Michael Ellerman
2019-09-03 10:15 ` [PATCH 02/14] powerpc/eeh: Fix race when freeing PDNs Oliver O'Halloran
2019-09-17  0:50   ` Sam Bobroff
2019-09-03 10:15 ` [PATCH 03/14] powerpc/eeh: Make permanently failed devices non-actionable Oliver O'Halloran
2019-09-17  0:51   ` Sam Bobroff
2019-09-03 10:15 ` [PATCH 04/14] powerpc/eeh: Check slot presence state in eeh_handle_normal_event() Oliver O'Halloran
2019-09-17  1:00   ` Sam Bobroff
2019-09-17  4:20     ` Oliver O'Halloran
2019-09-03 10:15 ` [PATCH 05/14] powerpc/eeh: Defer printing stack trace Oliver O'Halloran
2019-09-17  1:04   ` Sam Bobroff
2019-09-17  1:45     ` Oliver O'Halloran
2019-09-17  3:35       ` Sam Bobroff
2019-09-17  3:38         ` Oliver O'Halloran
2019-09-03 10:15 ` [PATCH 06/14] powerpc/eeh: Remove stale CAPI comment Oliver O'Halloran
2019-09-03 14:09   ` Andrew Donnellan
2019-09-17  1:04   ` Sam Bobroff
2019-09-03 10:15 ` [PATCH 07/14] powernv/eeh: Use generic code to handle hot resets Oliver O'Halloran
2019-09-17  1:15   ` Sam Bobroff
2019-09-17  7:30     ` Oliver O'Halloran [this message]
2019-09-03 10:15 ` [PATCH 08/14] pci-hotplug/pnv_php: Add a reset_slot() callback Oliver O'Halloran
2019-09-03 10:16 ` [PATCH 09/14] pci-hotplug/pnv_php: Add support for IODA3 Power9 PHBs Oliver O'Halloran
2019-09-03 10:16 ` [PATCH 10/14] pci-hotplug/pnv_php: Add attention indicator support Oliver O'Halloran
2019-09-03 10:16 ` [PATCH 11/14] powerpc/eeh: Set attention indicator while recovering Oliver O'Halloran
2019-09-17  1:23   ` Sam Bobroff
2019-09-03 10:16 ` [PATCH 12/14] powerpc/eeh: Add debugfs interface to run an EEH check Oliver O'Halloran
2019-09-17  3:15   ` Sam Bobroff
2019-09-17  3:36     ` Oliver O'Halloran
2019-09-17  4:23       ` Oliver O'Halloran
2019-09-03 10:16 ` [PATCH 13/14] powerpc/eeh: Add a eeh_dev_break debugfs interface Oliver O'Halloran
2019-09-17  3:19   ` Sam Bobroff
2019-09-03 10:16 ` [PATCH 14/14] selftests/powerpc: Add basic EEH selftest Oliver O'Halloran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOSf1CE9AuG_nS0F+pmejx8eX5XDJ06AGxKBMPs3xweZ-XWnrQ@mail.gmail.com \
    --to=oohall@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.