All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Keith Busch <keith.busch@intel.com>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>,
	poza@codeaurora.org, Lukas Wunner <lukas@wunner.de>,
	Christoph Hellwig <hch@lst.de>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCHv4 08/12] PCI: ERR: Always use the first downstream port
Date: Thu, 27 Sep 2018 17:56:25 -0500	[thread overview]
Message-ID: <20180927225625.GB18434@bhelgaas-glaptop.roam.corp.google.com> (raw)
In-Reply-To: <20180926221924.GA17934@localhost.localdomain>

On Wed, Sep 26, 2018 at 04:19:25PM -0600, Keith Busch wrote:
> On Wed, Sep 26, 2018 at 05:01:16PM -0500, Bjorn Helgaas wrote:
> > On Thu, Sep 20, 2018 at 10:27:13AM -0600, Keith Busch wrote:
> > > The link reset always used the first bridge device, but AER broadcast
> > > error handling may have reported an end device. This means the reset may
> > > hit devices that were never notified of the impending error recovery.
> > > 
> > > This patch uses the first downstream port in the hierarchy considered
> > > reliable. An error detected by a switch upstream port should mean it
> > > occurred on its upstream link, so the patch selects the parent device
> > > if the error is not a root or downstream port.
> > 
> > I'm not really clear on what "Always use the first downstream port"
> > means.  Always use it for *what*?

And I forgot to ask what "first downstream port" means.

> I'll see if I can better rephrase.
> 
> Error handling should notify all affected pci functions. If an end device
> detects and emits ERR_FATAL, the old way would have only notified that
> end-device driver, but other functions may be on or below the same bus.
> 
> Using the downstream port that connects to that bus where the error was
> detectedas the anchor point to broadcast error handling progression,
> we can notify all functions so they have a chance to prepare for the
> link reset.

So do I understand correctly that if the ERR_FATAL source is:

  - a Switch Upstream Port, you assume the problem is with the Link
    upstream from the Port, and that Link may need to be reset, so you
    notify everything below that Link, including the Upstream Port,
    everything below it (the Downstream Ports and anything below
    them), and potentially even any peers of the Upstream Port (is it
    even possible for a Upstream Port to have peer multi-function
    devices?)

  - a Switch Downstream Port, you assume the Port (and the Link going
    downstream) may need to be reset, so you notify the Port and
    anything below it

  - an Endpoint, you assume the Link leading to the Endpoint may need
    to be reset, so you notify the Endpoint, any peer multi-function
    devices, any related SR-IOV devices, and any devices below a peer
    that happens to be a bridge

And this is different from before because it notifies more devices in
some cases?  There was a pci_walk_bus() in broadcast_error_message(),
so we should have notified several devices in *some* cases, at least.

Bjorn

  reply	other threads:[~2018-09-27 22:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-20 16:27 [PATCHv4 00/12] pci error handling fixes Keith Busch
2018-09-20 16:27 ` [PATCHv4 01/12] PCI: portdrv: Initialize service drivers directly Keith Busch
2018-09-20 16:27 ` [PATCHv4 02/12] PCI: portdrv: Restore pci state on slot reset Keith Busch
2018-09-20 16:27 ` [PATCHv4 03/12] PCI: DPC: Save and restore control state Keith Busch
2018-09-20 19:46   ` Sinan Kaya
2018-09-20 19:47     ` Sinan Kaya
2018-09-20 19:54       ` Keith Busch
2018-09-20 16:27 ` [PATCHv4 04/12] PCI: AER: Take reference on error devices Keith Busch
2018-09-20 16:27 ` [PATCHv4 05/12] PCI: AER: Don't read upstream ports below fatal errors Keith Busch
2018-09-20 16:27 ` [PATCHv4 06/12] PCI: ERR: Use slot reset if available Keith Busch
2018-09-20 16:27 ` [PATCHv4 07/12] PCI: ERR: Handle fatal error recovery Keith Busch
2018-09-20 16:27 ` [PATCHv4 08/12] PCI: ERR: Always use the first downstream port Keith Busch
2018-09-26 22:01   ` Bjorn Helgaas
2018-09-26 22:19     ` Keith Busch
2018-09-27 22:56       ` Bjorn Helgaas [this message]
2018-09-28 15:42         ` Keith Busch
2018-09-28 20:50           ` Bjorn Helgaas
2018-09-28 21:35             ` Keith Busch
2018-09-28 23:28               ` Bjorn Helgaas
2018-10-01 15:14                 ` Keith Busch
2018-10-02 19:35                   ` Bjorn Helgaas
2018-10-02 19:55                     ` Keith Busch
2018-09-20 16:27 ` [PATCHv4 09/12] PCI: ERR: Simplify broadcast callouts Keith Busch
2018-09-20 16:27 ` [PATCHv4 10/12] PCI: ERR: Report current recovery status for udev Keith Busch
2018-09-20 16:27 ` [PATCHv4 11/12] PCI: Unify device inaccessible Keith Busch
2018-09-20 16:27 ` [PATCHv4 12/12] PCI: Make link active reporting detection generic Keith Busch
2018-09-20 20:00 ` [PATCHv4 00/12] pci error handling fixes Sinan Kaya
2018-09-20 21:17 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180927225625.GB18434@bhelgaas-glaptop.roam.corp.google.com \
    --to=helgaas@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=hch@lst.de \
    --cc=keith.busch@intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=okaya@kernel.org \
    --cc=poza@codeaurora.org \
    --cc=thomas.tai@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.