From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Return-Path: MIME-Version: 1.0 In-Reply-To: <20170819025551.GP28977@bhelgaas-glaptop.roam.corp.google.com> References: <1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com> <20170816134354.GV32525@bhelgaas-glaptop.roam.corp.google.com> <20170819025551.GP28977@bhelgaas-glaptop.roam.corp.google.com> From: Srinath Mannam Date: Sat, 19 Aug 2017 08:55:16 +0530 Message-ID: Subject: Re: [RFC PATCH v3] pci: Concurrency issue during pci enable bridge To: Bjorn Helgaas Cc: BCM Kernel Feedback , Bjorn Helgaas , linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org Content-Type: multipart/alternative; boundary="001a113cfa3e684c1d055712cb66" List-ID: --001a113cfa3e684c1d055712cb66 Content-Type: text/plain; charset="UTF-8" Hi Bjorn, Thank you for your guidance. Regards, Srinath. On 19-Aug-2017 8:25 AM, "Bjorn Helgaas" wrote: > On Wed, Aug 16, 2017 at 10:33:12PM +0530, Srinath Mannam wrote: > > Hi Bjorn, > > > > Thank you for the feedback. > > > > My comments are in lined. > > > > On Wed, Aug 16, 2017 at 7:13 PM, Bjorn Helgaas > wrote: > > > On Fri, Aug 04, 2017 at 08:27:28PM +0530, Srinath Mannam wrote: > > >> Concurrency issue is observed during pci enable bridge called > > >> for multiple pci devices initialization in SMP system. > > >> > > >> Setup details: > > >> - SMP system with 8 ARMv8 cores running Linux kernel(4.11). > > >> - Two EPs are connected to PCIe RC through bridge as shown > > >> in the below figure. > > >> > > >> [RC] > > >> | > > >> [BRIDGE] > > >> | > > >> ----------- > > >> | | > > >> [EP] [EP] > > >> > > >> Issue description: > > >> After PCIe enumeration completed EP driver probe function called > > >> for both the devices from two CPUS simultaneously. > > >> From EP probe function, pci_enable_device_mem called for both the EPs. > > >> This function called pci_enable_bridge enable for all the bridges > > >> recursively in the path of EP to RC. > > >> > > >> Inside pci_enable_bridge function, at two places concurrency issue is > > >> observed. > > >> > > >> Place 1: > > >> CPU 0: > > >> 1. Done Atomic increment dev->enable_cnt > > >> in pci_enable_device_flags > > >> 2. Inside pci_enable_resources > > >> 3. Completed pci_read_config_word(dev, PCI_COMMAND, &cmd) > > >> 4. Ready to set PCI_COMMAND_MEMORY (0x2) in > > >> pci_write_config_word(dev, PCI_COMMAND, cmd) > > >> CPU 1: > > >> 1. Check pci_is_enabled in function pci_enable_bridge > > >> and it is true > > >> 2. Check (!dev->is_busmaster) also true > > >> 3. Gone into pci_set_master > > >> 4. Completed pci_read_config_word(dev, PCI_COMMAND, &old_cmd) > > >> 5. Ready to set PCI_COMMAND_MASTER (0x4) in > > >> pci_write_config_word(dev, PCI_COMMAND, cmd) > > >> > > >> By the time of last point for both the CPUs are read value 0 and > > >> ready to write 2 and 4. > > >> After last point final value in PCI_COMMAND register is 4 instead of > 6. > > >> > > >> Place 2: > > >> CPU 0: > > >> 1. Done Atomic increment dev->enable_cnt in > > >> pci_enable_device_flags > > >> > > >> Signed-off-by: Srinath Mannam > > >> --- > > >> drivers/pci/pci.c | 8 ++++++-- > > >> 1 file changed, 6 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > >> index af0cc34..12721df 100644 > > >> --- a/drivers/pci/pci.c > > >> +++ b/drivers/pci/pci.c > > >> @@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_struct > *work); > > >> static LIST_HEAD(pci_pme_list); > > >> static DEFINE_MUTEX(pci_pme_list_mutex); > > >> static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan); > > >> +static DEFINE_MUTEX(pci_bridge_mutex); > > >> > > >> struct pci_pme_device { > > >> struct list_head list; > > >> @@ -1348,10 +1349,11 @@ static void pci_enable_bridge(struct pci_dev > *dev) > > >> if (bridge) > > >> pci_enable_bridge(bridge); > > >> > > >> + mutex_lock(&pci_bridge_mutex); > > >> if (pci_is_enabled(dev)) { > > >> if (!dev->is_busmaster) > > >> pci_set_master(dev); > > >> - return; > > >> + goto end; > > >> } > > >> > > >> retval = pci_enable_device(dev); > > >> @@ -1359,6 +1361,8 @@ static void pci_enable_bridge(struct pci_dev > *dev) > > >> dev_err(&dev->dev, "Error enabling bridge (%d), > continuing\n", > > >> retval); > > >> pci_set_master(dev); > > >> +end: > > >> + mutex_unlock(&pci_bridge_mutex); > > > > > > I think this will deadlock because we're holding pci_bridge_mutex > > > while we call pci_enable_device(), which may recursively call > > > pci_enable_bridge(), which would try to acquire pci_bridge_mutex > > > again. My original suggestion of a mutex in the host bridge would > > > have the same problem. > > > > This extra check "if (bridge && !pci_is_enabled(bridge))" will resolve > > the deadlock in the present patch. > > Oh, I see. In pci_enable_bridge(X), the recursive call to enable > upstream bridges happens *before* acquiring pci_bridge_mutex, so when > we call pci_enable_device(X) while holding the mutex, any upstream > bridges will have been already enabled. > > We recurse all the way to the top of the hierarchy, acquire the mutex, > enable the bridge, release the mutex, then unwind and repeat for each > bridge on the way down to the endpoint. > > > > We talked about using device_lock() earlier. You found some problems > > > with that, and I'd like to understand them better. You said: > > > > > >> But the pci_enable_bridge is called in the context of the driver > > >> probe function, we will have nexted lock problem. > > > > > > The driver core does hold device_lock() while calling the driver probe > > > function, in this path: > > > > > > device_initial_probe > > > __device_attach > > > device_lock(dev) # <-- lock > > > __device_attach_driver > > > ... > > > pci_device_probe > > > ... > > > ->probe # driver probe function > > > device_unlock(dev) # <-- unlock > > > > > > I didn't see your patch using device_lock(), but what I had in mind > > > was something like the patch below, where pci_enable_bridge() acquires > > > the device_lock() of the bridge. > > > > > > For the sake of argument, assume a hierarchy: > > > > > > bridge A -> bridge B -> endpoint C > > > > > > Here's what I think will happen: > > > > > > device_lock(C) # driver core > > > ... > > > ->probe(C) # driver probe function > > > pci_enable_device_flags(C) > > > pci_enable_bridge(B) # enable C's upstream bridge > > > device_lock(B) > > > pci_enable_bridge(A) # enable B's upstream bridge > > > device_lock(A) # A has no upstream bridge > > > pci_enable_device(A) > > > do_pci_enable_device(A) # update A PCI_COMMAND > > > pci_set_master(A) # update A PCI_COMMAND > > > device_unlock(A) > > > pci_enable_device(B) # update B PCI_COMMAND > > > pci_set_master(B) # update B PCI_COMMAND > > > device_unlock(B) > > > do_pci_enable_device(C) # update C PCI_COMMAND > > > device_unlock(C) > > > > > > I don't see a nested lock problem here. What am I missing? > > From the probe call device_lock will taken to that endpoint and also > > for the bus. In this order > > > > pci register driver(C) #(driver_register()) > > device_lock(B); # lock for parent > (__driver_attach()) > > device_lock(C) # lock for endpoint > (__driver_attach()) > > driver probe(C) > > pci_enable_bridge() > > device_lock(B); # here we see the deadlock.because of parent device lock > > Oh, it's the "needed for USB" thing in __driver_attach(); > is that right? > > if (dev->parent) /* Needed for USB */ > device_lock(dev->parent); > device_lock(dev); > > Looks like that came from bf74ad5bc417 ("[PATCH] Hold the device's > parent's lock during probe and remove"), and it's probably not > practical to change it, even if it's not strictly needed for PCI. > > So I think your patch is correct and I applied it to pci/enumeration > for v4.14. Thanks for your patience. > > I didn't quite understand the "Place 2" concurrency issue in your > changelog. I *think* I understand "Place 1", and I updated the > changelog with an ordering that I think illustrates the problem and > added a comment to that effect in the code. > > > commit 40f11adc7cd9281227f0a6a627d966dd0a5f0cd9 > Author: Srinath Mannam > Date: Fri Aug 18 21:50:48 2017 -0500 > > PCI: Avoid race while enabling upstream bridges > > When we enable a device, we first enable any upstream bridges. If a > bridge > has multiple downstream devices and we enable them simultaneously, the > race > to enable the upstream bridge may cause problems. Consider this > hierarchy: > > bridge A --+-- device B > +-- device C > > If drivers for B and C call pci_enable_device() simultaneously, both > will > attempt to enable A, which involves setting PCI_COMMAND_MASTER via > pci_set_master() and PCI_COMMAND_MEMORY via pci_enable_resources(). > > In the following sequence, B's update to set A's PCI_COMMAND_MEMORY is > lost, and neither B nor C will work correctly: > > B C > pci_set_master(A) > cmd = read(A, PCI_COMMAND) > cmd |= PCI_COMMAND_MASTER > pci_set_master(A) > cmd = read(A, PCI_COMMAND) > cmd |= PCI_COMMAND_MASTER > write(A, PCI_COMMAND, cmd) > pci_enable_device(A) > pci_enable_resources(A) > cmd = read(A, PCI_COMMAND) > cmd |= PCI_COMMAND_MEMORY > write(A, PCI_COMMAND, cmd) > write(A, PCI_COMMAND, cmd) > > Avoid this race by holding a new pci_bridge_mutex while enabling a > bridge. > This ensures that both PCI_COMMAND_MASTER and PCI_COMMAND_MEMORY will > be > updated before another thread can start enabling the bridge. > > Note that although pci_enable_bridge() is recursive, it enables any > upstream bridges *before* acquiring the mutex. When it acquires the > mutex > and calls pci_set_master() and pci_enable_device(), any upstream > bridges > have already been enabled so pci_enable_device() will not deadlock by > calling pci_enable_bridge() again. > > Signed-off-by: Srinath Mannam > [bhelgaas: changelog, comment] > Signed-off-by: Bjorn Helgaas > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index af0cc3456dc1..7cb29a223b73 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_struct *work); > static LIST_HEAD(pci_pme_list); > static DEFINE_MUTEX(pci_pme_list_mutex); > static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan); > +static DEFINE_MUTEX(pci_bridge_mutex); > > struct pci_pme_device { > struct list_head list; > @@ -1348,10 +1349,16 @@ static void pci_enable_bridge(struct pci_dev *dev) > if (bridge) > pci_enable_bridge(bridge); > > + /* > + * Hold pci_bridge_mutex to prevent a race when enabling two > + * devices below the bridge simultaneously. The race may cause a > + * PCI_COMMAND_MEMORY update to be lost (see changelog). > + */ > + mutex_lock(&pci_bridge_mutex); > if (pci_is_enabled(dev)) { > if (!dev->is_busmaster) > pci_set_master(dev); > - return; > + goto end; > } > > retval = pci_enable_device(dev); > @@ -1359,6 +1366,8 @@ static void pci_enable_bridge(struct pci_dev *dev) > dev_err(&dev->dev, "Error enabling bridge (%d), > continuing\n", > retval); > pci_set_master(dev); > +end: > + mutex_unlock(&pci_bridge_mutex); > } > > static int pci_enable_device_flags(struct pci_dev *dev, unsigned long > flags) > @@ -1383,7 +1392,7 @@ static int pci_enable_device_flags(struct pci_dev > *dev, unsigned long flags) > return 0; /* already enabled */ > > bridge = pci_upstream_bridge(dev); > - if (bridge) > + if (bridge && !pci_is_enabled(bridge)) > pci_enable_bridge(bridge); > > /* only skip sriov related */ > --001a113cfa3e684c1d055712cb66 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Bjorn,

Th= ank you for your guidance.

Regards,
Srinath.

On 19-Aug-2017 8:25 AM, "Bjorn H= elgaas" <helgaas@kernel.org> wrote:
On Wed= , Aug 16, 2017 at 10:33:12PM +0530, Srinath Mannam wrote:
> Hi Bjorn,
>
> Thank you for the feedback.
>
> My comments are in lined.
>
> On Wed, Aug 16, 2017 at 7:13 PM, Bjorn Helgaas <
helgaas@kernel.org> wrote:
> > On Fri, Aug 04, 2017 at 08:27:28PM +0530, Srinath Mannam wrote: > >> Concurrency issue is observed during pci enable bridge called=
> >> for multiple pci devices initialization in SMP system.
> >>
> >> Setup details:
> >>=C2=A0 - SMP system with 8 ARMv8 cores running Linux kernel(4.= 11).
> >>=C2=A0 - Two EPs are connected to PCIe RC through bridge as sh= own
> >>=C2=A0 =C2=A0 in the below figure.
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 [RC]
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0|
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= [BRIDGE]
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0|
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ------= -----
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [EP]=C2=A0 = =C2=A0 =C2=A0 =C2=A0 [EP]
> >>
> >> Issue description:
> >> After PCIe enumeration completed EP driver probe function cal= led
> >> for both the devices from two CPUS simultaneously.
> >> From EP probe function, pci_enable_device_mem called for both= the EPs.
> >> This function called pci_enable_bridge enable for all the bri= dges
> >> recursively in the path of EP to RC.
> >>
> >> Inside pci_enable_bridge function, at two places concurrency = issue is
> >> observed.
> >>
> >> Place 1:
> >>=C2=A0 =C2=A0CPU 0:
> >>=C2=A0 =C2=A0 =C2=A01. Done Atomic increment dev->enable_cn= t
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 in pci_enable_device_flags
> >>=C2=A0 =C2=A0 =C2=A02. Inside pci_enable_resources
> >>=C2=A0 =C2=A0 =C2=A03. Completed pci_read_config_word(dev, PCI= _COMMAND, &cmd)
> >>=C2=A0 =C2=A0 =C2=A04. Ready to set PCI_COMMAND_MEMORY (0x2) i= n
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_write_config_word(dev, PCI_COM= MAND, cmd)
> >>=C2=A0 =C2=A0CPU 1:
> >>=C2=A0 =C2=A0 =C2=A01. Check pci_is_enabled in function pci_en= able_bridge
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 and it is true
> >>=C2=A0 =C2=A0 =C2=A02. Check (!dev->is_busmaster) also true=
> >>=C2=A0 =C2=A0 =C2=A03. Gone into pci_set_master
> >>=C2=A0 =C2=A0 =C2=A04. Completed pci_read_config_word(dev, PCI= _COMMAND, &old_cmd)
> >>=C2=A0 =C2=A0 =C2=A05. Ready to set PCI_COMMAND_MASTER (0x4) i= n
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_write_config_word(dev, PCI_COM= MAND, cmd)
> >>
> >> By the time of last point for both the CPUs are read value 0 = and
> >> ready to write 2 and 4.
> >> After last point final value in PCI_COMMAND register is 4 ins= tead of 6.
> >>
> >> Place 2:
> >>=C2=A0 =C2=A0CPU 0:
> >>=C2=A0 =C2=A0 =C2=A01. Done Atomic increment dev->enable_cn= t in
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_enable_device_flags
> >>
> >> Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
> >> ---
> >>=C2=A0 drivers/pci/pci.c | 8 ++++++--
> >>=C2=A0 1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index af0cc34..12721df 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_s= truct *work);
> >>=C2=A0 static LIST_HEAD(pci_pme_list);
> >>=C2=A0 static DEFINE_MUTEX(pci_pme_list_mutex);
> >>=C2=A0 static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_= list_scan);
> >> +static DEFINE_MUTEX(pci_bridge_mutex);
> >>
> >>=C2=A0 struct pci_pme_device {
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0struct list_head list;
> >> @@ -1348,10 +1349,11 @@ static void pci_enable_bridge(struct = pci_dev *dev)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (bridge)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_ena= ble_bridge(bridge);
> >>
> >> +=C2=A0 =C2=A0 =C2=A0mutex_lock(&pci_bridge_mutex);
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (pci_is_enabled(dev)) {
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!de= v->is_busmaster)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0pci_set_master(dev);
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return;
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto end; > >>=C2=A0 =C2=A0 =C2=A0 =C2=A0}
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0retval =3D pci_enable_device(dev);<= br> > >> @@ -1359,6 +1361,8 @@ static void pci_enable_bridge(struct pc= i_dev *dev)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dev_err= (&dev->dev, "Error enabling bridge (%d), continuing\n", > >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0retval);
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0pci_set_master(dev);
> >> +end:
> >> +=C2=A0 =C2=A0 =C2=A0mutex_unlock(&pci_bridge_mutex)= ;
> >
> > I think this will deadlock because we're holding pci_bridge_m= utex
> > while we call pci_enable_device(), which may recursively call
> > pci_enable_bridge(), which would try to acquire pci_bridge_mutex<= br> > > again.=C2=A0 My original suggestion of a mutex in the host bridge= would
> > have the same problem.
>
> This extra check "if (bridge && !pci_is_enabled(bridge))&= quot; will resolve
> the deadlock in the present patch.

Oh, I see.=C2=A0 In pci_enable_bridge(X), the recursive call to enable
upstream bridges happens *before* acquiring pci_bridge_mutex, so when
we call pci_enable_device(X) while holding the mutex, any upstream
bridges will have been already enabled.

We recurse all the way to the top of the hierarchy, acquire the mutex,
enable the bridge, release the mutex, then unwind and repeat for each
bridge on the way down to the endpoint.

> > We talked about using device_lock() earlier.=C2=A0 You found some= problems
> > with that, and I'd like to understand them better.=C2=A0 You = said:
> >
> >> But the pci_enable_bridge is called in the context of the dri= ver
> >> probe function, we will have nexted lock problem.
> >
> > The driver core does hold device_lock() while calling the driver = probe
> > function, in this path:
> >
> >=C2=A0 =C2=A0device_initial_probe
> >=C2=A0 =C2=A0 =C2=A0__device_attach
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0device_lock(dev)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 # <-- lock
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0__device_attach_driver
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0...
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_device_probe
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0...
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->probe= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# driver prob= e function
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0device_unlock(dev)=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 # <-- unlock
> >
> > I didn't see your patch using device_lock(), but what I had i= n mind
> > was something like the patch below, where pci_enable_bridge() acq= uires
> > the device_lock() of the bridge.
> >
> > For the sake of argument, assume a hierarchy:
> >
> >=C2=A0 =C2=A0bridge A -> bridge B -> endpoint C
> >
> > Here's what I think will happen:
> >
> >=C2=A0 =C2=A0device_lock(C)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# driver core
> >=C2=A0 =C2=A0 =C2=A0...
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0->probe(C)=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# driver = probe function
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_enable_device_flags(C)
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_enable_bridge(B)=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# enable C's upstream bridge
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0device_lock(B)
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_enable_bridge(= A)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# enable B's upstream bridge
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0device_lock= (A)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# A has no upstream brid= ge
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_enable_= device(A)
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0do_p= ci_enable_device(A)=C2=A0 # update A PCI_COMMAND
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_set_mas= ter(A)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 # update A PCI_COMMAND
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0device_unlo= ck(A)
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_enable_device(= B)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# update B PCI_COMMAND
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_set_master(B)= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 # update B PCI_COMMAND
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0device_unlock(B) > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0do_pci_enable_device(C)= =C2=A0 =C2=A0 =C2=A0 =C2=A0 # update C PCI_COMMAND
> >=C2=A0 =C2=A0device_unlock(C)
> >
> > I don't see a nested lock problem here.=C2=A0 What am I missi= ng?
> From the probe call device_lock will taken to that endpoint and also > for the bus. In this order
>
> pci register driver(C)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= #(driver_register())
> device_lock(B);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0# lock for parent (__driver_attach())
> device_lock(C)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0# lock for endpoint (__driver_attach())
> driver probe(C)
> pci_enable_bridge()
> device_lock(B);=C2=A0 # here we see the deadlock.because of parent dev= ice lock

Oh, it's the "needed for USB" thing in __driver_attach();
is that right?

=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (dev->parent)=C2=A0 =C2=A0 =C2=A0 =C2=A0 = /* Needed for USB */
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 device_lock(dev->= ;parent);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 device_lock(dev);

Looks like that came from bf74ad5bc417 ("[PATCH] Hold the device's=
parent's lock during probe and remove"), and it's probably not=
practical to change it, even if it's not strictly needed for PCI.

So I think your patch is correct and I applied it to pci/enumeration
for v4.14.=C2=A0 Thanks for your patience.

I didn't quite understand the "Place 2" concurrency issue in = your
changelog.=C2=A0 I *think* I understand "Place 1", and I updated = the
changelog with an ordering that I think illustrates the problem and
added a comment to that effect in the code.


commit 40f11adc7cd9281227f0a6a627d966dd0a5f0cd9
Author: Srinath Mannam <s= rinath.mannam@broadcom.com>
Date:=C2=A0 =C2=A0Fri Aug 18 21:50:48 2017 -0500

=C2=A0 =C2=A0 PCI: Avoid race while enabling upstream bridges

=C2=A0 =C2=A0 When we enable a device, we first enable any upstream bridges= .=C2=A0 If a bridge
=C2=A0 =C2=A0 has multiple downstream devices and we enable them simultaneo= usly, the race
=C2=A0 =C2=A0 to enable the upstream bridge may cause problems.=C2=A0 Consi= der this hierarchy:

=C2=A0 =C2=A0 =C2=A0 bridge A --+-- device B
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0+-- device C<= br>
=C2=A0 =C2=A0 If drivers for B and C call pci_enable_device() simultaneousl= y, both will
=C2=A0 =C2=A0 attempt to enable A, which involves setting PCI_COMMAND_MASTE= R via
=C2=A0 =C2=A0 pci_set_master() and PCI_COMMAND_MEMORY via pci_enable_resour= ces().

=C2=A0 =C2=A0 In the following sequence, B's update to set A's PCI_= COMMAND_MEMORY is
=C2=A0 =C2=A0 lost, and neither B nor C will work correctly:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 B=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 C=
=C2=A0 =C2=A0 =C2=A0 pci_set_master(A)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd =3D read(A, PCI_COMMAND)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd |=3D PCI_COMMAND_MASTER
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pci_set_m= aster(A)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cm= d =3D read(A, PCI_COMMAND)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cm= d |=3D PCI_COMMAND_MASTER
=C2=A0 =C2=A0 =C2=A0 =C2=A0 write(A, PCI_COMMAND, cmd)
=C2=A0 =C2=A0 =C2=A0 pci_enable_device(A)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_enable_resources(A)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd =3D read(A, PCI_COMMAND)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd |=3D PCI_COMMAND_MEMORY
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 write(A, PCI_COMMAND, cmd)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0wr= ite(A, PCI_COMMAND, cmd)

=C2=A0 =C2=A0 Avoid this race by holding a new pci_bridge_mutex while enabl= ing a bridge.
=C2=A0 =C2=A0 This ensures that both PCI_COMMAND_MASTER and PCI_COMMAND_MEM= ORY will be
=C2=A0 =C2=A0 updated before another thread can start enabling the bridge.<= br>
=C2=A0 =C2=A0 Note that although pci_enable_bridge() is recursive, it enabl= es any
=C2=A0 =C2=A0 upstream bridges *before* acquiring the mutex.=C2=A0 When it = acquires the mutex
=C2=A0 =C2=A0 and calls pci_set_master() and pci_enable_device(), any upstr= eam bridges
=C2=A0 =C2=A0 have already been enabled so pci_enable_device() will not dea= dlock by
=C2=A0 =C2=A0 calling pci_enable_bridge() again.

=C2=A0 =C2=A0 Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
=C2=A0 =C2=A0 [bhelgaas: changelog, comment]
=C2=A0 =C2=A0 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc3456dc1..7cb29a223b73 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_struct *work);<= br> =C2=A0static LIST_HEAD(pci_pme_list);
=C2=A0static DEFINE_MUTEX(pci_pme_list_mutex);
=C2=A0static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan); +static DEFINE_MUTEX(pci_bridge_mutex);

=C2=A0struct pci_pme_device {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 struct list_head list;
@@ -1348,10 +1349,16 @@ static void pci_enable_bridge(struct pci_dev *dev)<= br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (bridge)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_enable_bridge(b= ridge);

+=C2=A0 =C2=A0 =C2=A0 =C2=A0/*
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * Hold pci_bridge_mutex to prevent a race when= enabling two
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * devices below the bridge simultaneously.=C2= =A0 The race may cause a
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * PCI_COMMAND_MEMORY update to be lost (see ch= angelog).
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mutex_lock(&pci_bridge_mutex);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (pci_is_enabled(dev)) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!dev->is_bus= master)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 pci_set_master(dev);
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto end;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }

=C2=A0 =C2=A0 =C2=A0 =C2=A0 retval =3D pci_enable_device(dev);
@@ -1359,6 +1366,8 @@ static void pci_enable_bridge(struct pci_dev *dev) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_err(&dev-&g= t;dev, "Error enabling bridge (%d), continuing\n",
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 retval);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_set_master(dev);
+end:
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mutex_unlock(&pci_bridge_mutex);
=C2=A0}

=C2=A0static int pci_enable_device_flags(struct pci_dev *dev, unsigned long= flags)
@@ -1383,7 +1392,7 @@ static int pci_enable_device_flags(struct pci_dev *de= v, unsigned long flags)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return 0;=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* already enabled */

=C2=A0 =C2=A0 =C2=A0 =C2=A0 bridge =3D pci_upstream_bridge(dev);
-=C2=A0 =C2=A0 =C2=A0 =C2=A0if (bridge)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (bridge && !pci_is_enabled(bridge))<= br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pci_enable_bridge(b= ridge);

=C2=A0 =C2=A0 =C2=A0 =C2=A0 /* only skip sriov related */
--001a113cfa3e684c1d055712cb66--