From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Hari Vyas <hari.vyas@broadcom.com>, Lukas Wunner <lukas@wunner.de>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, Ray Jui <ray.jui@broadcom.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Fri, 17 Aug 2018 09:20:40 +1000
Message-ID: <7c423d2aa74cbc04533caf9fa30f8cbca286007e.camel@kernel.crashing.org>
In-Reply-To: <CAM5rFu_YX8eQxRkQ5_YGdDvZ3M+v6ciry31VSnHF36J2MqOfQw@mail.gmail.com>

On Thu, 2018-08-16 at 16:17 +0530, Hari Vyas wrote:
> On Thu, Aug 16, 2018 at 3:56 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Thu, Aug 16, 2018 at 08:10:28PM +1000, Benjamin Herrenschmidt wrote:
> > > On Thu, 2018-08-16 at 14:52 +0530, Hari Vyas wrote:
> > > > There was an issue reported by my colleague Srinath while enabling a
> > > > PCI bridge: a race condition occurred while setting the memory and
> > > > master bits, i.e. bits were overwritten.
> > > > As I understand it, the is_busmaster and is_added bit race was an
> > > > internal data management issue and is quite different from the PCI
> > > > bridge enabling issue.
> > > > Am I missing something? I would be interested to know what exactly
> > > > was affected by the is_busmaster fix.
> > > 
> > > I don't think the is_busmaster fix is affecting anything; however, I
> > > don't like the use of atomics for these things. It's a band-aid. If we
> > > grow a proper pci_dev mutex, which is what I'm introducing here, it
> > > should also be able to handle the is_added race etc.
> > 
> > What is your rationale for introducing an additional mutex instead of
> > utilizing the existing mutex in struct device via device_lock() /
> > device_unlock() or alternatively pci_dev_lock() / pci_dev_unlock()?
> > 
> > This is also what Bjorn had suggested here:
> > https://lore.kernel.org/lkml/20170816134354.GV32525@bhelgaas-glaptop.roam.corp.google.com/
> > 
> > Thanks,
> > 
> > Lukas
> 
> Agreed. My proposed simple fix for the "pci bridge enabling" issue (the
> issue is not easy to reproduce in our environment, so the fix is not
> tested yet, but I believe it should work) also uses only the existing
> locking mechanism.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=200793
> 
> It currently addresses only PCI_COMMAND but can easily be extended to
> other PCI config registers that have bit fields.
> Good that we are all heading in the same direction. The issue should
> get fixed, even if it is addressed in a different way.

This is straight in line with your is_added fix: more way-too-fine-grained
locking that fixes the details of accessing a specific field or
pair of fields but completely ignores the higher-level interactions.

I'm not a fan of this approach at all.

Most of the manipulations done in all those code paths are NOT
scalability critical, and that sort of extra fine-grained locking is not
only very fragile, but also wasteful.

It's like playing whack-a-mole with micro-races. The overall picture
quickly becomes a mess; it more or less already is one, with all the
random global mutexes here and there.

It's a lot cleaner to have a mutex in the device itself that covers its
general state.
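
To make the idea concrete, here is a minimal userspace sketch, not kernel
code. It assumes pthreads in place of the kernel mutex API, and every name
in it (struct fake_pci_dev, fake_enable_device(), fake_set_master(),
run_concurrent_enable()) is hypothetical and exists only for illustration.
The two bit values mirror PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER. The
point is that one per-device mutex covers the enable count, the cached
flags and the read-modify-write of the command word together, instead of
protecting each field separately.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_MEMORY 0x2	/* same value as PCI_COMMAND_MEMORY */
#define CMD_MASTER 0x4	/* same value as PCI_COMMAND_MASTER */

/* Hypothetical stand-in for struct pci_dev; NOT the kernel structure. */
struct fake_pci_dev {
	pthread_mutex_t lock;	/* one mutex covering the device's general state */
	uint16_t command;	/* stands in for the PCI_COMMAND config register */
	int enable_cnt;		/* number of outstanding enable requests */
	bool is_busmaster;	/* cached state, updated under the same lock */
};

static void fake_dev_init(struct fake_pci_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_init(&dev->lock, NULL);
	dev->command = 0;
	dev->enable_cnt = 0;
	dev->is_busmaster = false;
}

/* The refcount check and the command update form one critical section. */
static void fake_enable_device(struct fake_pci_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	if (dev->enable_cnt++ == 0)
		dev->command |= CMD_MEMORY;
	pthread_mutex_unlock(&dev->lock);
}

/* The command word and the cached flag cannot be observed out of sync. */
static void fake_set_master(struct fake_pci_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->command |= CMD_MASTER;
	dev->is_busmaster = true;
	pthread_mutex_unlock(&dev->lock);
}

static void *enable_thread(void *arg)
{
	fake_enable_device(arg);
	return NULL;
}

static void *master_thread(void *arg)
{
	fake_set_master(arg);
	return NULL;
}

/* Run both operations concurrently and return the resulting command word. */
static uint16_t run_concurrent_enable(struct fake_pci_dev *dev)
{
	pthread_t a, b;

	pthread_create(&a, NULL, enable_thread, dev);
	pthread_create(&b, NULL, master_thread, dev);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return dev->command;
}
```

Whichever thread wins the lock first, neither read-modify-write of the
command word can overwrite the other's bit. An atomic on a single field
cannot give that guarantee for the field plus the register update taken
together.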

Ben.
