linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	linux-pci@vger.kernel.org, keith.busch@intel.com,
	alex_gagniuc@dellteam.com, austin_bolen@dell.com,
	shyam_iyer@dell.com, linux-kernel@vger.kernel.org,
	Jonathan Derrick <jonathan.derrick@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Russell Currey <ruscur@russell.cc>,
	Sam Bobroff <sbobroff@linux.ibm.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Fri, 9 Nov 2018 08:11:39 +0100	[thread overview]
Message-ID: <20181109071139.uxa6gu7jwsvr7ve6@wunner.de> (raw)
In-Reply-To: <20181108200855.GE41183@google.com>

On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> +	/*
> +	 * If an MMIO read from the device returns ~0 data, that data may
> +	 * be valid, or it may indicate a bus error.  If config space is
> +	 * readable, assume it's valid data; otherwise, assume a bus error.
> +	 */
> +	if (val == ~0) {
> +		pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> +		if (id == ~0)
> +			pci_dev_set_disconnected(dev, NULL);
> +	}

This isn't safe unfortunately because "all ones" may occur for other
reasons besides disconnectedness.  E.g. on an Uncorrectable Error,
the device may likewise respond with all ones, but revert to valid
responses if the error can be recovered through a Secondary Bus Reset.
In such a case, marking the device disconnected would be inappropriate.

Accessing a device in D3cold would be another example where all ones
is returned both from mmio and config space despite the device still
being present and future accesses having a chance to succeed.

In fact, in v2 of Keith's patches adding pci_dev_set_disconnected()
he attempted the same as what you're doing here and caused issues
for me with devices in D3cold:

https://spinics.net/lists/linux-pci/msg54337.html


> One thing I'm uncomfortable with is that [...].  Another is that the
> only place we call pci_dev_set_disconnected() is in pciehp and acpiphp,
> so the only "disconnected" case we catch is if hotplug happens to be
> involved.

Yes, that's because the hotplug drivers are the only ones who can
identify removal authoritatively and unambiguously.  They *know*
when the device is gone and don't have to resort to heuristics
such as all ones.  (ISTR that dpc also marks devices disconnected.)


> sprinkling pci_dev_is_disconnected() around feels ad hoc
> instead of systematic, in the sense that I don't know how we convince
> ourselves that this (and only this) is the correct place to put it.

We need to add documentation for driver authors how to deal with
surprise removal.  Briefly:

* If (pdev->error_state == pci_channel_io_perm_failure), the device
  is definitely gone and any further device access can be skipped.
  Otherwise presence of the device is likely, but not guaranteed.

* If a device access can significantly delay device removal due to
  Completion Timeouts, or can cause an infinite loop, MCE or crash,
  do check pdev->error_state before carrying out the device access.

* Always be prepared that a device access may fail due to surprise
  removal, do not blindly trust mmio or config space reads or
  assume success of writes.

I'm sure this can be extended quite a bit.  There's more information
in this LWN article in the "Surprise removal" section:

https://lwn.net/Articles/767885/

Thanks,

Lukas

  parent reply	other threads:[~2018-11-09  7:11 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-18 22:15 [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected Alexandru Gagniuc
2018-11-06  0:32 ` Alex G.
2018-11-07 17:04   ` Derrick, Jonathan
2018-11-07 23:42 ` Bjorn Helgaas
2018-11-08 20:09   ` Bjorn Helgaas
2018-11-08 21:49     ` Keith Busch
2018-11-08 22:01     ` Greg Kroah-Hartman
2018-11-08 22:32       ` Keith Busch
2018-11-08 22:42         ` Greg Kroah-Hartman
2018-11-08 22:49           ` Alex_Gagniuc
2018-11-08 22:51             ` Greg KH
2018-11-08 23:06               ` Alex_Gagniuc
2018-11-12  5:49                 ` Oliver O'Halloran
2018-11-12 20:05                   ` Alex_Gagniuc
2018-11-13  5:02                     ` Bjorn Helgaas
2018-11-13 22:39                       ` Alex_Gagniuc
2018-11-13 22:52                         ` Keith Busch
2018-11-14  0:31                           ` Alex_Gagniuc
2018-11-14  5:59                         ` Bjorn Helgaas
2018-11-14 19:22                           ` Alex_Gagniuc
2018-11-14 19:41                             ` Derrick, Jonathan
2018-11-14 20:23                             ` Keith Busch
2018-11-14 20:52                               ` Alex_Gagniuc
2018-11-14 20:58                                 ` Keith Busch
2018-11-15  6:24                             ` Bjorn Helgaas
2018-11-16  0:19                               ` Alex_Gagniuc
2018-11-08 23:03           ` Keith Busch
2018-11-09  7:29       ` Lukas Wunner
2018-11-09 11:32         ` Greg Kroah-Hartman
2018-11-09 16:36           ` Keith Busch
2018-11-08 22:20     ` Alex_Gagniuc
2018-11-09  7:11     ` Lukas Wunner [this message]
2018-11-12  5:48       ` Oliver O'Halloran
2018-12-27 19:28     ` Alex_Gagniuc

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181109071139.uxa6gu7jwsvr7ve6@wunner.de \
    --to=lukas@wunner.de \
    --cc=alex_gagniuc@dellteam.com \
    --cc=austin_bolen@dell.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=jonathan.derrick@intel.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mr.nuke.me@gmail.com \
    --cc=oohall@gmail.com \
    --cc=ruscur@russell.cc \
    --cc=sbobroff@linux.ibm.com \
    --cc=shyam_iyer@dell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).