linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Hari Vyas <hari.vyas@broadcom.com>,
	linux-pci@vger.kernel.org, ray.jui@broadcom.com,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Fri, 17 Aug 2018 09:25:59 +1000	[thread overview]
Message-ID: <6b610ee94bcef718db97600ae0ee931de3501e40.camel@kernel.crashing.org> (raw)
In-Reply-To: <20180816122807.6xof2u3hbhv57ua5@wunner.de>


On Thu, 2018-08-16 at 14:28 +0200, Lukas Wunner wrote:
> On Thu, Aug 16, 2018 at 07:52:06AM +1000, Benjamin Herrenschmidt wrote:
> > I would also like to use this rather than the bitmap atomics for is_added
> > etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
> > and imho makes thing even messier.
> 
> PCI device enablement isn't a hotpath, so the performance saving of
> acquire/release memory barriers vs. full memory barriers is hardly
> a strong argument.

My argument is not about performance. Fine grained atomics are a mess,
and make you lose the big picture. The problem isn't so much that
indiviual bits need to avoid low level RMW races, but overall higher
level bnehaviours need to be synchronized, such as the enable race
above.

> Atomic bitops have the advantage of being usable in atomic context,
> unlike a struct mutex.  And they don't require any spinning if the
> variable is accessed concurrently.

And why on earth would you need to use pci_enable_device or manipulate
is_added in an atomic context ?

Atomic bitops are a giant race-o-matic and experience shows that 99% of
the time, people using them insteaf of standard locking are getting it
wrong.

The PCI code is already messy enough, let's not "fix" the broken
synchronization accross the board in an even messier way.

> Last not least an atomic bitop needs just 1 line of code versus 3 lines
> of code to acquire a lock, update the variable and release the lock.
> "Elegance is not optional."

Sorry, atomics isn't elegance, it's broken crap. The flags dont' live
in isolation. They relate to an overall function. For example
is_busmaster relates to the actual setting of the bus master bit in the
command register. Those two needs to be updated together atomically,
you need a lock or a mutex. Not an atomic.

This is true of about everything else.

> That said, the race *you're* dealing with appears to be distinct from
> Hari's in that a lock is unavoidable because (AFAIUI) atomic_inc_return()
> is called before enablement actually happens, yet both needs to happen
> atomically.

Every single time one of these flags represent more than the flag
itself, but some other state that is being modified or updates, then
you need a lock to cover both the flag and the other state.

So if we are going to create a sane locking mechanism to protect the
various bits of pci_dev (not just enable), let's use it for everything
shall we ? At least everything that doesn't need to be manipulated in
atomic context.

That also possibly include removing some of the ad-hoc locks in various
parts of pcie/* etc...

Ben.

> Thanks,
> 
> Lukas

  reply	other threads:[~2018-08-16 23:25 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-03  9:05 [PATCH v3] PCI: Data corruption happening due to race condition Hari Vyas
2018-07-03  9:05 ` Hari Vyas
2018-07-03  9:13   ` Lukas Wunner
2018-07-18 23:29   ` Bjorn Helgaas
2018-07-19  4:18     ` Benjamin Herrenschmidt
2018-07-19 14:04       ` Hari Vyas
2018-07-19 18:55         ` Lukas Wunner
2018-07-20  4:27           ` Benjamin Herrenschmidt
2018-07-27 22:25       ` Bjorn Helgaas
2018-07-28  0:45         ` Benjamin Herrenschmidt
2018-07-31 11:21         ` Michael Ellerman
2018-07-19 17:41   ` Bjorn Helgaas
2018-07-20  9:16     ` Hari Vyas
2018-07-20 12:20       ` Bjorn Helgaas
2018-07-31 16:37 ` Bjorn Helgaas
2018-08-15  3:35   ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Benjamin Herrenschmidt
2018-08-15  4:16     ` Benjamin Herrenschmidt
2018-08-15  4:44       ` Benjamin Herrenschmidt
2018-08-15  5:21         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 19:09         ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:50         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 22:40           ` Guenter Roeck
2018-08-15 23:38             ` Benjamin Herrenschmidt
2018-08-20  1:31               ` Guenter Roeck
2018-08-17  3:07           ` Bjorn Helgaas
2018-08-17  3:42             ` Benjamin Herrenschmidt
2018-08-15 18:50     ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:52       ` Benjamin Herrenschmidt
2018-08-15 23:23         ` Benjamin Herrenschmidt
2018-08-16  7:58         ` Konstantin Khlebnikov
2018-08-16  8:02           ` Benjamin Herrenschmidt
2018-08-16  9:22             ` Hari Vyas
2018-08-16 10:10               ` Benjamin Herrenschmidt
2018-08-16 10:11                 ` Benjamin Herrenschmidt
2018-08-16 10:26                 ` Lukas Wunner
2018-08-16 10:47                   ` Hari Vyas
2018-08-16 23:20                     ` Benjamin Herrenschmidt
2018-08-16 23:17                   ` Benjamin Herrenschmidt
2018-08-17  0:43                     ` Benjamin Herrenschmidt
2018-08-16 19:43             ` Jens Axboe
2018-08-16 21:37               ` Benjamin Herrenschmidt
2018-08-16 21:56                 ` Jens Axboe
2018-08-16 23:09                   ` Benjamin Herrenschmidt
2018-08-17  0:14                     ` Jens Axboe
2018-08-16 12:28         ` Lukas Wunner
2018-08-16 23:25           ` Benjamin Herrenschmidt [this message]
2018-08-17  1:12             ` Benjamin Herrenschmidt
2018-08-17 16:39               ` Lukas Wunner
2018-08-18  3:37                 ` Benjamin Herrenschmidt
2018-08-18  9:22                   ` Lukas Wunner
2018-08-18 13:11                     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6b610ee94bcef718db97600ae0ee931de3501e40.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=axboe@kernel.dk \
    --cc=hari.vyas@broadcom.com \
    --cc=helgaas@kernel.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=ray.jui@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).