All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>,
	Mario.Limonciello@dell.com,
	Michael Jamet <michael.jamet@intel.com>,
	Yehezkel Bernat <yehezkel.bernat@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v3 5/5] ACPI / hotplug / PCI: Mark stale PCI devices disconnected
Date: Tue, 27 Mar 2018 17:13:17 -0500	[thread overview]
Message-ID: <20180327221317.GH7759@bhelgaas-glaptop.roam.corp.google.com> (raw)
In-Reply-To: <20180327205236.GA15613@wunner.de>

[+cc Greg]

On Tue, Mar 27, 2018 at 10:52:36PM +0200, Lukas Wunner wrote:
> On Tue, Mar 27, 2018 at 02:45:12PM -0500, Bjorn Helgaas wrote:
> > I hate this pci_dev_set_disconnected() thing and hope to remove it
> > someday.  It's racy and adds additional states that are hard to
> > manage, e.g., "device has been removed and will not respond to PCI
> > accesses, but pci_dev_set_disconnected() hasn't been called yet".  If
> > we can handle that case (and we should), we shouldn't need
> > pci_dev_set_disconnected().
> 
> Absent pci_dev_is_disconnected(), how can the PCI core detect if
> a device is still there (versus has been hot-removed)?  The only
> valid method I know of is reading the vendor ID from config space.
> However reading config spaces takes time.  To properly handle
> surprise removal, we ought to check presence of the device frequently
> lest we clutter the logs with errors, unnecessarily occupy the CPU
> and bus or in some cases hang the machine.  Ideally, we should check
> presence of the device every time we access it.

I'm not proposing a change in pci_dev_is_disconnected() for *this*
patch, so we can have a discussion here, but I don't think resolving
this is a precondition for this patch.

I was just expressing remorse, which I agree is not constructive, so I
should have kept quiet.  I know you understand the problem, but to try
to make amends, I'll recap it here for any onlookers:

  - driver calls pci_dev_is_disconnected(), learns device is present
  - device is removed
  - driver reads register, believing device is still present

Obviously the register read fails, because the device is gone.  So
the driver has to be prepared for that failure regardless of whether
we have pci_dev_is_disconnected().

> Now, imagine every time we access config or mmio space, we read the
> vendor ID from config space, so we essentially double the number of
> accesses to the device.  Does that make sense, performance-wise?

No, but I wouldn't advocate that strategy.

> If pci_dev_is_disconnected() returns true, the device is either there
> or was until very recently, so in general checking that instead of
> reading the vendor ID should be sufficient and should result in
> acceptable performance.
> 
> I keep hearing these complaints about the PCI_DEV_DISCONNECTED flag,
> from Greg but now also from you, but I'm not seeing constructive
> criticism how checking presence of a device should be handled instead,
> in a way that doesn't negatively impact performance.
> 
> IMO it was a mistake to constrain visibility of the flag to the PCI core
> (at Greg's behest), it should have been made available to drivers so
> they're afforded a better method to detect surprise removal than just
> reading the vendor ID every time they access the device.

To avoid the race, drivers have to check for a valid value anyway.  If
they *also* check pci_dev_is_disconnected(), that clutters the code
and gives a false sense of security.

When a read fails, it normally returns ~0 as the data to the driver. 
The driver knows the semantics of the registers it reads, and often it
can tell immediately that a successful read of the register could
never return that value, so it doesn't need to try any more accesses
to Vendor ID or anything else.

If ~0 *is* a possible valid value, the driver can read some other
register (maybe Vendor ID, or maybe an MMIO register that can be read
faster).

Bjorn

  reply	other threads:[~2018-03-27 22:13 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-26 13:21 [PATCH v3 0/5] PCI: Fixes for native PCIe and ACPI hotplug Mika Westerberg
2018-02-26 13:21 ` [PATCH v3 1/5] PCI: Make sure all bridges reserve at least one bus number Mika Westerberg
2018-03-27 18:57   ` Bjorn Helgaas
2018-03-28 11:43     ` Mika Westerberg
2018-03-28 18:09       ` Bjorn Helgaas
2018-03-29 11:59         ` Mika Westerberg
2018-03-31  8:29           ` Lukas Wunner
2018-03-31  8:58             ` Mika Westerberg
2018-03-31  9:12               ` Lukas Wunner
2018-03-31  9:19                 ` Rafael J. Wysocki
2018-03-31  9:20                 ` Mika Westerberg
2018-03-31  9:30                   ` Lukas Wunner
2018-03-31  9:58                     ` Mika Westerberg
2018-03-31 10:18                       ` Lukas Wunner
2018-03-31 10:37                         ` Mika Westerberg
2018-03-31  9:30                   ` Rafael J. Wysocki
2018-03-31  9:56                     ` Mika Westerberg
2018-03-31 10:05                       ` Rafael J. Wysocki
2018-03-31 10:12                         ` Mika Westerberg
2018-02-26 13:21 ` [PATCH v3 2/5] PCI: Take bridge window alignment into account when distributing resources Mika Westerberg
2018-03-27 19:23   ` Bjorn Helgaas
2018-03-28 12:01     ` Mika Westerberg
2018-03-29 21:29       ` Bjorn Helgaas
2018-03-31  9:18         ` Mika Westerberg
2018-02-26 13:21 ` [PATCH v3 3/5] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume Mika Westerberg
2018-02-26 13:21 ` [PATCH v3 4/5] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used Mika Westerberg
2018-02-26 13:21 ` [PATCH v3 5/5] ACPI / hotplug / PCI: Mark stale PCI devices disconnected Mika Westerberg
2018-03-27 19:45   ` Bjorn Helgaas
2018-03-27 20:52     ` Lukas Wunner
2018-03-27 22:13       ` Bjorn Helgaas [this message]
2018-03-28  6:25         ` Greg Kroah-Hartman
2018-02-26 15:16 ` [PATCH v3 0/5] PCI: Fixes for native PCIe and ACPI hotplug Andy Shevchenko
2018-03-15  6:00 ` Mika Westerberg
2018-03-26 10:01   ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180327221317.GH7759@bhelgaas-glaptop.roam.corp.google.com \
    --to=helgaas@kernel.org \
    --cc=Mario.Limonciello@dell.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=michael.jamet@intel.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=yehezkel.bernat@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.