All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hari Vyas <hari.vyas@broadcom.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, Ray Jui <ray.jui@broadcom.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)
Date: Thu, 16 Aug 2018 14:52:04 +0530	[thread overview]
Message-ID: <CAM5rFu-QCBq9TdoiFtvfNENUaS5Tpay5f0H9_BenEqK8nDgoNg@mail.gmail.com> (raw)
In-Reply-To: <6f76eb5f54fdc780dac744773d836214d7844346.camel@kernel.crashing.org>

On Thu, Aug 16, 2018 at 1:32 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2018-08-16 at 10:58 +0300, Konstantin Khlebnikov wrote:
>> On 16.08.2018 00:52, Benjamin Herrenschmidt wrote:
>> > On Wed, 2018-08-15 at 13:50 -0500, Bjorn Helgaas wrote:
>> > > Yes, this is definitely broken.  Some folks have tried to fix it in
>> > > the past, but it hasn't quite happened yet.  We actually merged one
>> > > patch, 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
>> > > bridges"), but had to revert it after we found issues:
>> > >
>> > > https://lkml.kernel.org/r/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com
>> > > https://lkml.kernel.org/r/20170915072352.10453.31977.stgit@bhelgaas-glaptop.roam.corp.google.com
>> >
>> > Ok so I had a look at this previous patch and it adds yet anothe use of
>> > some global mutex to protect part of the operation which makes me
>> > cringe a bit, we have too many of these.
>> >
>> > What do you think of the one I sent yesterday ? (I can't find it in the
>> > archives yet)
>> >
>> > [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races
>> >
>> > The patch itself needs splitting etc... but the basic idea is to move away
>> > from those global mutexes in a number of places and have one in the pci_dev
>> > struct itself to protect its state.
>> >
>> > I would also like to use this rather than the bitmap atomics for is_added
>> > etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
>> > and imho makes thing even messier.
>> >
>> > Jens, Konstantin, any chance you can test if the above also breaks iwlwifi
>> > (I don't see why it would but ...)
>> >
>>
>> I suppose original race was discovered between enabling bridge and device as described here
>>
>> https://lore.kernel.org/lkml/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>>
>> I barely can remember what I ever posted this, so I couldn't reproduce for sure.
>
> Ok. Well, my patch fixes it for my repro-case at least and seems to not
> break anyhting on my thinkpad so ...
>
> Bjorn, are you ok with the approach ? If yes, I'll start breaking up
> that patch into a few smaller bits in case something goes wrong and we
> want to bisect (such as the changes I did to tracking is_busmaster
> etc...)
>
> Cheers,
> ben.
>
>
There was an issue reported by my colleague srinath while enabling pci
bridge and a race
condition was happening while setting memory and master bits i.e. bits
were over-written.
As per my understanding is_busmaster and is_added bit race issue was
at internal data
management and is quite different from pci bridge enabling issue.
Am I missing some thing ? Would be interested to know what exactly was
affected due to
is_busmaster fix.

In any case, one bug is already filed and may propose a patch soon
about pci bridge enabling
scenario. In summary, bit manipulation is not working fine due to race
conditions in SMP
environment.

Regards,
hari

  reply	other threads:[~2018-08-16  9:22 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-03  9:05 [PATCH v3] PCI: Data corruption happening due to race condition Hari Vyas
2018-07-03  9:05 ` Hari Vyas
2018-07-03  9:13   ` Lukas Wunner
2018-07-18 23:29   ` Bjorn Helgaas
2018-07-19  4:18     ` Benjamin Herrenschmidt
2018-07-19 14:04       ` Hari Vyas
2018-07-19 18:55         ` Lukas Wunner
2018-07-20  4:27           ` Benjamin Herrenschmidt
2018-07-27 22:25       ` Bjorn Helgaas
2018-07-28  0:45         ` Benjamin Herrenschmidt
2018-07-31 11:21         ` Michael Ellerman
2018-07-19 17:41   ` Bjorn Helgaas
2018-07-20  9:16     ` Hari Vyas
2018-07-20 12:20       ` Bjorn Helgaas
2018-07-31 16:37 ` Bjorn Helgaas
2018-08-15  3:35   ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Benjamin Herrenschmidt
2018-08-15  4:16     ` Benjamin Herrenschmidt
2018-08-15  4:44       ` Benjamin Herrenschmidt
2018-08-15  5:21         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 19:09         ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:50         ` [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races Benjamin Herrenschmidt
2018-08-15 22:40           ` Guenter Roeck
2018-08-15 23:38             ` Benjamin Herrenschmidt
2018-08-20  1:31               ` Guenter Roeck
2018-08-17  3:07           ` Bjorn Helgaas
2018-08-17  3:42             ` Benjamin Herrenschmidt
2018-08-15 18:50     ` PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition) Bjorn Helgaas
2018-08-15 21:52       ` Benjamin Herrenschmidt
2018-08-15 23:23         ` Benjamin Herrenschmidt
2018-08-16  7:58         ` Konstantin Khlebnikov
2018-08-16  8:02           ` Benjamin Herrenschmidt
2018-08-16  9:22             ` Hari Vyas [this message]
2018-08-16 10:10               ` Benjamin Herrenschmidt
2018-08-16 10:11                 ` Benjamin Herrenschmidt
2018-08-16 10:26                 ` Lukas Wunner
2018-08-16 10:47                   ` Hari Vyas
2018-08-16 23:20                     ` Benjamin Herrenschmidt
2018-08-16 23:17                   ` Benjamin Herrenschmidt
2018-08-17  0:43                     ` Benjamin Herrenschmidt
2018-08-16 19:43             ` Jens Axboe
2018-08-16 21:37               ` Benjamin Herrenschmidt
2018-08-16 21:56                 ` Jens Axboe
2018-08-16 23:09                   ` Benjamin Herrenschmidt
2018-08-17  0:14                     ` Jens Axboe
2018-08-16 12:28         ` Lukas Wunner
2018-08-16 23:25           ` Benjamin Herrenschmidt
2018-08-17  1:12             ` Benjamin Herrenschmidt
2018-08-17 16:39               ` Lukas Wunner
2018-08-18  3:37                 ` Benjamin Herrenschmidt
2018-08-18  9:22                   ` Lukas Wunner
2018-08-18 13:11                     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM5rFu-QCBq9TdoiFtvfNENUaS5Tpay5f0H9_BenEqK8nDgoNg@mail.gmail.com \
    --to=hari.vyas@broadcom.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-pci@vger.kernel.org \
    --cc=ray.jui@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.