From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Bjorn Helgaas <helgaas@kernel.org>, Hari Vyas <hari.vyas@broadcom.com>
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, ray.jui@broadcom.com
Subject: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Wed, 15 Aug 2018 13:35:16 +1000
Message-ID: <ecaed7664b46d73888d2494065905f8e108fc0f4.camel@kernel.crashing.org>
In-Reply-To: <20180731163727.GK45322@bhelgaas-glaptop.roam.corp.google.com>

On Tue, 2018-07-31 at 11:37 -0500, Bjorn Helgaas wrote:
> On Tue, Jul 03, 2018 at 02:35:40PM +0530, Hari Vyas wrote:
> > Changes in v3:
> > 	As per review comments from Lukas Wunner <lukas@wunner.de>,
> > 	squashed 3 commits to single commit. Without this build breaks.
> > 	Also clubbed set and clear function for is_added bits to a
> > 	single assign function. This optimizes code and reduce LoC.
> > 	Removed one wrongly added blank line in pci.c
> > 	 
> > Changes in v2:
> >         To avoid race condition while updating is_added and is_busmaster
> >         bits, is_added is moved to a private flag variable.
> >         is_added updation is handled in atomic manner also.
> > 
> > Hari Vyas (1):
> >   PCI: Data corruption happening due to race condition

Sooo .... I was chasing a different problem which makes me think we
have a deeper problem here.

In my case, I have a system with >70 nvme devices behind layers of
switches.

What I think is happening is all the nvme devices are probed in
parallel (the machine has about 40 CPU cores).

They all call pci_enable_device() around the same time.

Each call walks up the bridge/switch chain and tries to enable every
switch along the way. However, there is *no* locking at the switch level
at all that I can see. Or am I missing something subtle?

So here's an example simplified scenario:

	Bridge
	/    \
     dev A   dev B

Both dev A and B hit pci_enable_device() simultaneously, and thus both
call pci_enable_bridge() on the bridge at the same time. That function
does (simplified):

	if (pci_is_enabled(dev)) {
		if (!dev->is_busmaster)
			pci_set_master(dev);
		return;
	}

	retval = pci_enable_device(dev);
	if (retval)
		pci_err(dev, "Error enabling bridge (%d), continuing\n",
			retval);
	pci_set_master(dev);

Now the pci_is_enabled() just checks dev->enable_cnt and pci_enable_device()
increments it *before* enabling the device.

So it's possible that pci_is_enabled() returns true for the bridge for dev B
because dev A just did the atomic_inc_return(), but hasn't actually enabled
the bridge yet (hasn't yet hit the config space).

At that point, the driver for dev B issues an MMIO access and gets a UR
(Unsupported Request) response from the bridge.
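
To make the window concrete, here is a minimal user-space sketch that replays
the interleaving above in a single thread. It is only a model under stated
assumptions: struct model_dev and all the model_* helpers are hypothetical
names invented for illustration, not the kernel's struct pci_dev or its PCI
API, and hw_enabled merely stands in for the bridge's config-space enable bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a bridge; not the kernel's struct pci_dev. */
struct model_dev {
	atomic_int enable_cnt;	/* plays the role of dev->enable_cnt */
	bool hw_enabled;	/* stands in for the config-space enable bits */
};

/* Like the check described above: only looks at the counter. */
static bool model_is_enabled(struct model_dev *dev)
{
	return atomic_load(&dev->enable_cnt) > 0;
}

/*
 * First half of the enable path: bump the counter *before* touching the
 * hardware. Returns true if the caller is the one who must program it.
 */
static bool model_enable_begin(struct model_dev *dev)
{
	return atomic_fetch_add(&dev->enable_cnt, 1) == 0;
}

/* Second half of the enable path: the actual "config space" write. */
static void model_enable_finish(struct model_dev *dev)
{
	dev->hw_enabled = true;
}

/* An MMIO through the bridge only works once the hardware is enabled. */
static bool model_mmio_ok(const struct model_dev *bridge)
{
	return bridge->hw_enabled;
}

/*
 * Replay the suspected interleaving step by step. Returns true if path B
 * saw the bridge as "enabled" while an MMIO through it would still fail.
 */
static bool replay_race(void)
{
	struct model_dev bridge = { .hw_enabled = false };
	bool a_must_program, b_sees_enabled, b_mmio_ok;

	atomic_init(&bridge.enable_cnt, 0);

	/* Path A (dev A): counter goes 0 -> 1, hardware not yet touched. */
	a_must_program = model_enable_begin(&bridge);
	assert(a_must_program);

	/* Path B (dev B): the counter check passes, so B skips the enable. */
	b_sees_enabled = model_is_enabled(&bridge);

	/* Path B's driver now does MMIO through the bridge. */
	b_mmio_ok = model_mmio_ok(&bridge);

	/* Path A finally reaches the config-space write. */
	model_enable_finish(&bridge);

	return b_sees_enabled && !b_mmio_ok;
}
```

Calling replay_race() returns true in this model: B trusts the counter while
the hardware step has not happened yet. Whether real hardware hits the same
ordering is exactly what still needs verifying on a rig.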

I need to set up a rig to verify my theory but I think this is racy. The same
race is also present with dev->is_busmaster. Using bitmaps won't help.

What's really needed is a per-device mutex covering all those operations
on a given device. (This would also let us get rid of those games with
atomics).
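
As a rough illustration of the per-device mutex idea, here is a user-space
sketch using a pthread mutex. Everything in it is an assumption made for the
example: struct locked_dev, the locked_* helpers and the hw_writes counter
are hypothetical and are not kernel code or a proposed patch. The point is
only that the counter update and the hardware step happen under one lock, so
nobody can see "enabled" before the hardware write is done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a device protected by its own mutex. */
struct locked_dev {
	pthread_mutex_t lock;	/* covers every field below */
	int enable_cnt;		/* plain int: the lock replaces the atomics */
	bool hw_enabled;	/* stands in for the config-space enable bits */
	int hw_writes;		/* how many times the "hardware" was programmed */
};

static void locked_dev_init(struct locked_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->enable_cnt = 0;
	dev->hw_enabled = false;
	dev->hw_writes = 0;
}

/* Counter bump and hardware enable form one critical section. */
static void locked_enable(struct locked_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	if (dev->enable_cnt++ == 0) {
		dev->hw_enabled = true;	/* the "config space" write */
		dev->hw_writes++;
	}
	pthread_mutex_unlock(&dev->lock);
}

/* Drop one reference; the last one turns the "hardware" back off. */
static void locked_disable(struct locked_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	assert(dev->enable_cnt > 0);
	if (--dev->enable_cnt == 0)
		dev->hw_enabled = false;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Taking the lock here means a caller can never observe a non-zero
 * counter while another thread is still in the middle of locked_enable().
 */
static bool locked_is_enabled(struct locked_dev *dev)
{
	bool enabled;

	pthread_mutex_lock(&dev->lock);
	enabled = dev->enable_cnt > 0;
	pthread_mutex_unlock(&dev->lock);
	return enabled;
}
```

With this shape, a second caller of locked_enable() blocks until the first
one has finished the hardware step, and the hardware is programmed once no
matter how many callers pile up. A real fix would also have to cover the
is_busmaster handling and the walk up the bridge chain, which this sketch
does not attempt.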

Any comments?

Cheers,
Ben.
