linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Keith Busch <keith.busch@intel.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: poza@codeaurora.org, Sinan Kaya <okaya@kernel.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Thomas Tai <thomas.tai@oracle.com>,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-pci-owner@vger.kernel.org,
	Sam Bobroff <sam.bobroff@au1.ibm.com>
Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed
Date: Tue, 21 Aug 2018 16:33:03 -0600	[thread overview]
Message-ID: <20180821223303.GA19320@localhost.localdomain> (raw)
In-Reply-To: <20180821220456.GC18612@localhost.localdomain>

On Tue, Aug 21, 2018 at 04:04:56PM -0600, Keith Busch wrote:
> > For example, if the error happens in a parent bridge and that gets
> > reset, the entire hierarchy underneath does too.
> > 
> > We need to save/restore the state of all bridges and devices (BARs
> > etc...) below that.
> 
> That's not good enough. You'd at least need to check SLTSTS.PDC on all
> bridge DSP's beneath the parent *before* you try to restore the state
> of devices beneath them. Those races are primarily why DPC currently
> removes and reenumerates instead, because that's not something that can
> be readily coordinated today.

A picture of a real scenario to go with the above comments:

                    ----------------
  --------          |              |
  |      |       -------        -------       --------------
  |  RP  | <---> | USP | SWITCH | DSP | <---> | END DEVICE |
  |      |       ------         -------       --------------
  --------          |              |
                    ----------------

The downstream port (DSP) is hotplug capable, and the root port (RP)
has DPC enabled.

Lets say you hot swap END DEVICE at a time with an inflight posted
transaction such that the RP triggers a CTO, starting a DPC event.
If you treat the RP's DPC event as just "error handling" and restore
the state after containment is released, we are in undefined behavior
because of the mistaken idententy of the device below the SWITCH DSP.

Re-enumerating the topology is easy since it has no races like that. I
understand re-enumeration is problematic for some scenarios, so if we
really need to avoid re-enumeration except when absolutely necessary,
it should be possible if we can detangle the necessary coordination
between the port service drivers.

  reply	other threads:[~2018-08-21 22:33 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-13 16:51 [PATCH 0/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed Thomas Tai
2018-08-13 16:51 ` [PATCH 1/1] " Thomas Tai
2018-08-14  9:16   ` poza
2018-08-14  9:22     ` poza
2018-08-14 13:51       ` Thomas Tai
2018-08-15 14:57         ` poza
2018-08-15 15:02           ` Thomas Tai
2018-08-15 15:26             ` poza
2018-08-15 15:43               ` Thomas Tai
2018-08-15 15:59                 ` poza
2018-08-15 16:04                   ` Thomas Tai
2018-08-15 21:55       ` Benjamin Herrenschmidt
2018-08-15 21:56       ` Benjamin Herrenschmidt
2018-08-16  6:36         ` poza
2018-08-16  6:51           ` Benjamin Herrenschmidt
2018-08-16  6:59             ` Benjamin Herrenschmidt
2018-08-16  8:07               ` poza
2018-08-16  8:12                 ` Benjamin Herrenschmidt
2018-08-16  9:03                   ` poza
2018-08-16 10:07                     ` Benjamin Herrenschmidt
2018-08-16 14:11                       ` poza
2018-08-16 23:30                         ` Benjamin Herrenschmidt
2018-08-17 10:29                           ` poza
2018-08-17 10:44                             ` poza
2018-08-18  7:38                             ` Benjamin Herrenschmidt
2018-08-16  7:05             ` Benjamin Herrenschmidt
2018-08-16  7:15               ` Benjamin Herrenschmidt
2018-08-16  7:56                 ` poza
2018-08-16  8:10                   ` Benjamin Herrenschmidt
2018-08-16  8:05               ` poza
2018-08-16  8:15                 ` Benjamin Herrenschmidt
2018-08-16  8:22                   ` poza
2018-08-16  8:28                     ` Benjamin Herrenschmidt
2018-08-16 13:30                       ` Thomas Tai
2018-08-16 13:46                     ` Sinan Kaya
2018-08-16 23:27                       ` Benjamin Herrenschmidt
2018-08-17  6:35                         ` poza
2018-08-19  2:24                         ` Bjorn Helgaas
2018-08-20  5:09                           ` poza
2018-08-20  5:15                             ` Benjamin Herrenschmidt
2018-08-20 13:02                               ` Thomas Tai
2018-08-20 13:27                                 ` Benjamin Herrenschmidt
2018-08-19  2:19               ` Bjorn Helgaas
2018-08-19 21:41                 ` Sinan Kaya
2018-08-20  2:03                   ` Benjamin Herrenschmidt
2018-08-20  5:19                     ` poza
2018-08-20  5:33                       ` Benjamin Herrenschmidt
2018-08-20  7:56                         ` poza
2018-08-20 11:22                           ` Benjamin Herrenschmidt
2018-08-20 13:26                             ` poza
2018-08-20 21:02                               ` Benjamin Herrenschmidt
2018-08-21  5:14                                 ` poza
2018-08-21  6:06                                   ` Benjamin Herrenschmidt
2018-08-21 14:37                                     ` Keith Busch
2018-08-21 15:07                                       ` Sinan Kaya
2018-08-21 15:29                                         ` Keith Busch
2018-08-21 15:50                                           ` Sinan Kaya
2018-08-21 15:55                                             ` Sinan Kaya
2018-08-21 16:44                                             ` Keith Busch
2018-08-21 15:30                                       ` poza
2018-08-21 21:14                                       ` Benjamin Herrenschmidt
2018-08-21 22:04                                         ` Keith Busch
2018-08-21 22:33                                           ` Keith Busch [this message]
2018-08-21 23:06                                           ` Benjamin Herrenschmidt
2018-08-21 23:13                                             ` Keith Busch
2018-08-22  0:36                                               ` Benjamin Herrenschmidt
2018-08-30  0:01                                             ` Keith Busch
2018-08-30  0:10                                               ` Sinan Kaya
2018-08-30  0:46                                                 ` Keith Busch
2018-08-30  4:26                                               ` Benjamin Herrenschmidt
2018-08-20 15:53                             ` Keith Busch
2018-08-20 16:13                               ` poza
2018-08-20 16:32                                 ` Keith Busch
2018-08-20 21:05                               ` Benjamin Herrenschmidt
2018-08-20 21:21                                 ` Sinan Kaya
2018-08-20 21:35                                   ` Keith Busch
2018-08-20 21:53                                     ` Benjamin Herrenschmidt
2018-08-20 22:02                                       ` Sinan Kaya
2018-08-20 22:04                                         ` Benjamin Herrenschmidt
2018-08-20 22:13                                           ` Sinan Kaya
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-22  9:13                                             ` Lukas Wunner
2018-08-22 14:38                                               ` Keith Busch
2018-08-22 14:51                                                 ` Sinan Kaya
2018-08-20 22:13                                           ` Keith Busch
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-21  1:30                                               ` Keith Busch
2018-08-20  4:37                 ` Benjamin Herrenschmidt
2018-08-20  4:39                 ` PATCH] Partial revert of "PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices" Benjamin Herrenschmidt
2018-08-21 19:50                   ` Bjorn Helgaas
2018-08-22  4:35                     ` poza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180821223303.GA19320@localhost.localdomain \
    --to=keith.busch@intel.com \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-pci-owner@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=okaya@kernel.org \
    --cc=poza@codeaurora.org \
    --cc=sam.bobroff@au1.ibm.com \
    --cc=thomas.tai@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).