linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hari Vyas <hari.vyas@broadcom.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, Ray Jui <ray.jui@broadcom.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Thu, 16 Aug 2018 14:52:04 +0530	[thread overview]
Message-ID: <CAM5rFu-QCBq9TdoiFtvfNENUaS5Tpay5f0H9_BenEqK8nDgoNg@mail.gmail.com> (raw)
In-Reply-To: <6f76eb5f54fdc780dac744773d836214d7844346.camel@kernel.crashing.org>

On Thu, Aug 16, 2018 at 1:32 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2018-08-16 at 10:58 +0300, Konstantin Khlebnikov wrote:
>> On 16.08.2018 00:52, Benjamin Herrenschmidt wrote:
>> > On Wed, 2018-08-15 at 13:50 -0500, Bjorn Helgaas wrote:
>> > > Yes, this is definitely broken.  Some folks have tried to fix it in
>> > > the past, but it hasn't quite happened yet.  We actually merged one
>> > > patch, 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
>> > > bridges"), but had to revert it after we found issues:
>> > >
>> > > https://lkml.kernel.org/r/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com
>> > > https://lkml.kernel.org/r/20170915072352.10453.31977.stgit@bhelgaas-glaptop.roam.corp.google.com
>> >
>> > Ok so I had a look at this previous patch and it adds yet anothe use of
>> > some global mutex to protect part of the operation which makes me
>> > cringe a bit, we have too many of these.
>> >
>> > What do you think of the one I sent yesterday ? (I can't find it in the
>> > archives yet)
>> >
>> > [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races
>> >
>> > The patch itself needs splitting etc... but the basic idea is to move away
>> > from those global mutexes in a number of places and have one in the pci_dev
>> > struct itself to protect its state.
>> >
>> > I would also like to use this rather than the bitmap atomics for is_added
>> > etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
>> > and imho makes thing even messier.
>> >
>> > Jens, Konstantin, any chance you can test if the above also breaks iwlwifi
>> > (I don't see why it would but ...)
>> >
>>
>> I suppose original race was discovered between enabling bridge and device as described here
>>
>> https://lore.kernel.org/lkml/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>>
>> I barely can remember what I ever posted this, so I couldn't reproduce for sure.
>
> Ok. Well, my patch fixes it for my repro-case at least and seems to not
> break anyhting on my thinkpad so ...
>
> Bjorn, are you ok with the approach ? If yes, I'll start breaking up
> that patch into a few smaller bits in case something goes wrong and we
> want to bisect (such as the changes I did to tracking is_busmaster
> etc...)
>
> Cheers,
> ben.
>
>
There was an issue reported by my colleague srinath while enabling pci
bridge and a race
condition was happening while setting memory and master bits i.e. bits
were over-written.
As per my understanding is_busmaster and is_added bit race issue was
at internal data
management and is quite different from pci bridge enabling issue.
Am I missing some thing ? Would be interested to know what exactly was
affected due to
is_busmaster fix.

In any case, one bug is already filed and may propose a patch soon
about pci bridge enabling
scenario. In summary, bit manipulation is not working fine due to race
conditions in SMP
environment.

Regards,
hari

  reply	other threads:[~2018-08-16  9:22 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-03  9:05 [PATCH v3] PCI: Data corruption happening due to race condition Hari Vyas
2018-07-03  9:05 ` Hari Vyas
2018-07-03  9:13   ` Lukas Wunner
2018-07-18 23:29   ` Bjorn Helgaas
2018-07-19  4:18     ` Benjamin Herrenschmidt
2018-07-19 14:04       ` Hari Vyas
2018-07-19 18:55         ` Lukas Wunner
2018-07-20  4:27           ` Benjamin Herrenschmidt
2018-07-27 22:25       ` Bjorn Helgaas
2018-07-28  0:45         ` Benjamin Herrenschmidt
2018-07-31 11:21         ` Michael Ellerman
2018-07-19 17:41   ` Bjorn Helgaas
2018-07-20  9:16     ` Hari Vyas
2018-07-20 12:20       ` Bjorn Helgaas
2018-07-31 16:37 ` Bjorn Helgaas
2018-08-15  3:35   ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Benjamin Herrenschmidt
2018-08-15  4:16     ` Benjamin Herrenschmidt
2018-08-15  4:44       ` Benjamin Herrenschmidt
2018-08-15  5:21         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 19:09         ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:50         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 22:40           ` Guenter Roeck
2018-08-15 23:38             ` Benjamin Herrenschmidt
2018-08-20  1:31               ` Guenter Roeck
2018-08-17  3:07           ` Bjorn Helgaas
2018-08-17  3:42             ` Benjamin Herrenschmidt
2018-08-15 18:50     ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:52       ` Benjamin Herrenschmidt
2018-08-15 23:23         ` Benjamin Herrenschmidt
2018-08-16  7:58         ` Konstantin Khlebnikov
2018-08-16  8:02           ` Benjamin Herrenschmidt
2018-08-16  9:22             ` Hari Vyas [this message]
2018-08-16 10:10               ` Benjamin Herrenschmidt
2018-08-16 10:11                 ` Benjamin Herrenschmidt
2018-08-16 10:26                 ` Lukas Wunner
2018-08-16 10:47                   ` Hari Vyas
2018-08-16 23:20                     ` Benjamin Herrenschmidt
2018-08-16 23:17                   ` Benjamin Herrenschmidt
2018-08-17  0:43                     ` Benjamin Herrenschmidt
2018-08-16 19:43             ` Jens Axboe
2018-08-16 21:37               ` Benjamin Herrenschmidt
2018-08-16 21:56                 ` Jens Axboe
2018-08-16 23:09                   ` Benjamin Herrenschmidt
2018-08-17  0:14                     ` Jens Axboe
2018-08-16 12:28         ` Lukas Wunner
2018-08-16 23:25           ` Benjamin Herrenschmidt
2018-08-17  1:12             ` Benjamin Herrenschmidt
2018-08-17 16:39               ` Lukas Wunner
2018-08-18  3:37                 ` Benjamin Herrenschmidt
2018-08-18  9:22                   ` Lukas Wunner
2018-08-18 13:11                     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM5rFu-QCBq9TdoiFtvfNENUaS5Tpay5f0H9_BenEqK8nDgoNg@mail.gmail.com \
    --to=hari.vyas@broadcom.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-pci@vger.kernel.org \
    --cc=ray.jui@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).