linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Hari Vyas <hari.vyas@broadcom.com>,
	linux-pci@vger.kernel.org, ray.jui@broadcom.com,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Fri, 17 Aug 2018 11:12:50 +1000	[thread overview]
Message-ID: <6ce65522aee9a2edbc6c116624b1b0b60a7b79d8.camel@kernel.crashing.org> (raw)
In-Reply-To: <6b610ee94bcef718db97600ae0ee931de3501e40.camel@kernel.crashing.org>

On Fri, 2018-08-17 at 09:25 +1000, Benjamin Herrenschmidt wrote:
> > That said, the race *you're* dealing with appears to be distinct from
> > Hari's in that a lock is unavoidable because (AFAIUI) atomic_inc_return()
> > is called before enablement actually happens, yet both needs to happen
> > atomically.
> 
> Every single time one of these flags represent more than the flag
> itself, but some other state that is being modified or updates, then
> you need a lock to cover both the flag and the other state.
> 
> So if we are going to create a sane locking mechanism to protect the
> various bits of pci_dev (not just enable), let's use it for everything
> shall we ? At least everything that doesn't need to be manipulated in
> atomic context.
> 
> That also possibly include removing some of the ad-hoc locks in various
> parts of pcie/* etc...

Allright, looking at those atomic flags, we have two today:

 - PCI_DEV_DISCONNECTED

Now that's a complete dup of pci_channel_state_t error_state, yuck.

Also the atomic bit is completely pointless. It only protects the
actual field from RMW access, it doesn't synchronize with any of the
users.

For example, it's being tested in the config ops wrappers (instead of
the channel state) but that's completely racy. In that specific case it
probably doesn't matter either way, but still.

It's also tested in __pci_write_msi_msg, why ? What for ? If MMIO is
blocked it's handled by the channel state. Again, you notice the
complete absence of synchronization between the producer and the
consumer of that bit.

 - PCI_DEV_ADDED

Now the only reason that was moved was to avoid the RMW races on the
bit itself. There is, here too, 0 synchronization with the callers.

Now I forgot the specific details of the race Hari found, but this is
definitely not the right way to fix things. Plus it forced powerpc to
do a relative path include which sucks.

The latter would be much more cleanly handled using the mutex I
proposed.

The former should go a way, that's what error_state is already meant to
be. As for the locking, this needs to be looked at more closely since
this is inherently a racy op, though testing it in the MSI writing code
looks more like a band-aid than a feature to me. The original commit
lokos like it's meant to just be some kind of optimisation. One has to
be careful however of the possible ordering issues when the bit is
cleared.

So at this stage, I don't see any reasonable justification for those
private atomic bits. They should should both go away along with the
whole priv_flags.

Ben.

  reply	other threads:[~2018-08-17  1:12 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-03  9:05 [PATCH v3] PCI: Data corruption happening due to race condition Hari Vyas
2018-07-03  9:05 ` Hari Vyas
2018-07-03  9:13   ` Lukas Wunner
2018-07-18 23:29   ` Bjorn Helgaas
2018-07-19  4:18     ` Benjamin Herrenschmidt
2018-07-19 14:04       ` Hari Vyas
2018-07-19 18:55         ` Lukas Wunner
2018-07-20  4:27           ` Benjamin Herrenschmidt
2018-07-27 22:25       ` Bjorn Helgaas
2018-07-28  0:45         ` Benjamin Herrenschmidt
2018-07-31 11:21         ` Michael Ellerman
2018-07-19 17:41   ` Bjorn Helgaas
2018-07-20  9:16     ` Hari Vyas
2018-07-20 12:20       ` Bjorn Helgaas
2018-07-31 16:37 ` Bjorn Helgaas
2018-08-15  3:35   ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Benjamin Herrenschmidt
2018-08-15  4:16     ` Benjamin Herrenschmidt
2018-08-15  4:44       ` Benjamin Herrenschmidt
2018-08-15  5:21         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 19:09         ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:50         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 22:40           ` Guenter Roeck
2018-08-15 23:38             ` Benjamin Herrenschmidt
2018-08-20  1:31               ` Guenter Roeck
2018-08-17  3:07           ` Bjorn Helgaas
2018-08-17  3:42             ` Benjamin Herrenschmidt
2018-08-15 18:50     ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:52       ` Benjamin Herrenschmidt
2018-08-15 23:23         ` Benjamin Herrenschmidt
2018-08-16  7:58         ` Konstantin Khlebnikov
2018-08-16  8:02           ` Benjamin Herrenschmidt
2018-08-16  9:22             ` Hari Vyas
2018-08-16 10:10               ` Benjamin Herrenschmidt
2018-08-16 10:11                 ` Benjamin Herrenschmidt
2018-08-16 10:26                 ` Lukas Wunner
2018-08-16 10:47                   ` Hari Vyas
2018-08-16 23:20                     ` Benjamin Herrenschmidt
2018-08-16 23:17                   ` Benjamin Herrenschmidt
2018-08-17  0:43                     ` Benjamin Herrenschmidt
2018-08-16 19:43             ` Jens Axboe
2018-08-16 21:37               ` Benjamin Herrenschmidt
2018-08-16 21:56                 ` Jens Axboe
2018-08-16 23:09                   ` Benjamin Herrenschmidt
2018-08-17  0:14                     ` Jens Axboe
2018-08-16 12:28         ` Lukas Wunner
2018-08-16 23:25           ` Benjamin Herrenschmidt
2018-08-17  1:12             ` Benjamin Herrenschmidt [this message]
2018-08-17 16:39               ` Lukas Wunner
2018-08-18  3:37                 ` Benjamin Herrenschmidt
2018-08-18  9:22                   ` Lukas Wunner
2018-08-18 13:11                     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ce65522aee9a2edbc6c116624b1b0b60a7b79d8.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=axboe@kernel.dk \
    --cc=hari.vyas@broadcom.com \
    --cc=helgaas@kernel.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=ray.jui@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).