linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Hari Vyas <hari.vyas@broadcom.com>,
	linux-pci@vger.kernel.org, ray.jui@broadcom.com,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Jens Axboe <axboe@kernel.dk>,
	mr.nuke.me@gmail.com, keith.busch@intel.com
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Sat, 18 Aug 2018 11:22:14 +0200	[thread overview]
Message-ID: <20180818092214.jjodm4mtxea53b2p@wunner.de> (raw)
In-Reply-To: <535f823d185b6c17b90bab326df268a56db0af36.camel@kernel.crashing.org>

On Sat, Aug 18, 2018 at 01:37:35PM +1000, Benjamin Herrenschmidt wrote:
> As-is, what you have is a bit that is private to drivers/pci (why ?
> devices might be interested in knowing the device has been
> disconnected...)

It is private to drivers/pci because Greg wanted it so:

   "> The flag is not intended for a device specific driver to use.
    > The intention is for the pci bus driver to use it, so no additional
    > work for other drivers.
    But the driver can see this in the structure, so it will get used..."
    https://spinics.net/lists/linux-pci/msg57744.html

   "I applaud your attempt to make this "private" to the pci core, just
    don't know if it really will work, or if it is worth it entirely..."
    https://spinics.net/lists/linux-pci/msg58017.html

Greg is of the opinion that drivers should check for themselves whether
a device has been removed and he was happy that they are barred from
using PCI_DEV_DISCONNECTED.  He believes that drivers should verify
for every read of mmio and config space that that the read value is not
0xffffffff (if that is an invalid value) and consider the device removed
if so:

   "If you are worried about your device going away (and you have to),
    then just check all reads and be fine with it.  If you have values
    that can be all 0xff, then just accept that as a valid value and
    move to the next read where it can't be valid."
    https://spinics.net/lists/linux-pci/msg70793.html

However Alex Gagniuc recently countered:

   "The discussion is based on the wrong assumptions that reads are 
    symmetrical wrt to a device being present or not. However, completion 
    timeouts and unsupported requests blow that assumption right out of the 
    water. Best case scenario, you just waste a little more time waiting for 
    hardware IO. More common is to end up crashing the machine.

    Greg's ideas work in a perfect world where PCI and PCIe are equivalent 
    at every level. And in that case, PCI_DEV_DISCONNECTED would have been 
    pure, 100% genuine Redmond bloatware. We wouldn't have gotten complaints 
    from Facebook and other industry players that it takes too damn long to 
    remove a device. We probably also wouldn't be seeing machines crash on 
    PCIe removal."
    https://spinics.net/lists/linux-pci/msg74864.html

The reason I'm interested in PCI_DEV_DISCONNECTED is because hot-removing
an Apple Thunderbolt Ethernet adapter locks up the machine a due to the tg3
driver trying to access the removed device.

Now, tg3 does call pci_channel_offline() and refrains from accessing the
device if that returns true.  If I make PCI_DEV_DISCONNECTED public and
check its value in pci_channel_offline(), I can hot-remove the Thunderbolt
Ethernet adapter without problems.  I posted a patch for that 2 years ago:
https://spinics.net/lists/linux-pci/msg55601.html

Thus, I'd be more than happy if the PCI_DEV_DISCONNECTED state could be
folded into enum pci_channel_state as it would immediately fix Thunderbolt
hot-removal.


> Fundamentally both mean, from a driver perspective, two things.
> 
>  - One very important: break out of a loop that waits for a HW state to
> change because it won't
> 
>  - One an optimisation: don't bother with all those register updates
> bcs they're never going to reach your HW.

Right.  PCI_DEV_DISCONNECTED was introduced by Intel on behalf of
Facebook.  See slides 13 to 16 of this slide deck for the details:
http://files.opencompute.org/oc/public.php?service=files&t=4ff50715e3c1e273e02b694757b80d25&download

There's a graph on slide 16 showing the speedup achieved by
PCI_DEV_DISCONNECTED.

There's also a recording of that talk, the relevant segment is just
10 minutes:
https://youtu.be/GJ6B0xzgvlM?t=926

Thanks,

Lukas

  reply	other threads:[~2018-08-18  9:22 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-03  9:05 [PATCH v3] PCI: Data corruption happening due to race condition Hari Vyas
2018-07-03  9:05 ` Hari Vyas
2018-07-03  9:13   ` Lukas Wunner
2018-07-18 23:29   ` Bjorn Helgaas
2018-07-19  4:18     ` Benjamin Herrenschmidt
2018-07-19 14:04       ` Hari Vyas
2018-07-19 18:55         ` Lukas Wunner
2018-07-20  4:27           ` Benjamin Herrenschmidt
2018-07-27 22:25       ` Bjorn Helgaas
2018-07-28  0:45         ` Benjamin Herrenschmidt
2018-07-31 11:21         ` Michael Ellerman
2018-07-19 17:41   ` Bjorn Helgaas
2018-07-20  9:16     ` Hari Vyas
2018-07-20 12:20       ` Bjorn Helgaas
2018-07-31 16:37 ` Bjorn Helgaas
2018-08-15  3:35   ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Benjamin Herrenschmidt
2018-08-15  4:16     ` Benjamin Herrenschmidt
2018-08-15  4:44       ` Benjamin Herrenschmidt
2018-08-15  5:21         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 19:09         ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:50         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 22:40           ` Guenter Roeck
2018-08-15 23:38             ` Benjamin Herrenschmidt
2018-08-20  1:31               ` Guenter Roeck
2018-08-17  3:07           ` Bjorn Helgaas
2018-08-17  3:42             ` Benjamin Herrenschmidt
2018-08-15 18:50     ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:52       ` Benjamin Herrenschmidt
2018-08-15 23:23         ` Benjamin Herrenschmidt
2018-08-16  7:58         ` Konstantin Khlebnikov
2018-08-16  8:02           ` Benjamin Herrenschmidt
2018-08-16  9:22             ` Hari Vyas
2018-08-16 10:10               ` Benjamin Herrenschmidt
2018-08-16 10:11                 ` Benjamin Herrenschmidt
2018-08-16 10:26                 ` Lukas Wunner
2018-08-16 10:47                   ` Hari Vyas
2018-08-16 23:20                     ` Benjamin Herrenschmidt
2018-08-16 23:17                   ` Benjamin Herrenschmidt
2018-08-17  0:43                     ` Benjamin Herrenschmidt
2018-08-16 19:43             ` Jens Axboe
2018-08-16 21:37               ` Benjamin Herrenschmidt
2018-08-16 21:56                 ` Jens Axboe
2018-08-16 23:09                   ` Benjamin Herrenschmidt
2018-08-17  0:14                     ` Jens Axboe
2018-08-16 12:28         ` Lukas Wunner
2018-08-16 23:25           ` Benjamin Herrenschmidt
2018-08-17  1:12             ` Benjamin Herrenschmidt
2018-08-17 16:39               ` Lukas Wunner
2018-08-18  3:37                 ` Benjamin Herrenschmidt
2018-08-18  9:22                   ` Lukas Wunner [this message]
2018-08-18 13:11                     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180818092214.jjodm4mtxea53b2p@wunner.de \
    --to=lukas@wunner.de \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=hari.vyas@broadcom.com \
    --cc=helgaas@kernel.org \
    --cc=keith.busch@intel.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-pci@vger.kernel.org \
    --cc=mr.nuke.me@gmail.com \
    --cc=ray.jui@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).