* Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
@ 2014-01-30 13:12 Mika Westerberg
  2014-01-30 16:48 ` Yinghai Lu
  2014-01-31 23:34 ` [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()" Rafael J. Wysocki
  0 siblings, 2 replies; 27+ messages in thread
From: Mika Westerberg @ 2014-01-30 13:12 UTC (permalink / raw)
  To: linux-pci; +Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki

Hi,

The latest mainline kernel "hangs" when Thunderbolt devices are
hot-unplugged from the system. I can't see any oops, but after hot-unplug I'm
getting huge amounts of messages like:

[  352.717001] pci 0000:02:00.0: PME# disabled
[  352.717011] pci 0000:02:00.0: PME# disabled
[  352.717021] pci 0000:02:00.0: PME# disabled
[  352.717032] pci 0000:02:00.0: PME# disabled
[  352.717041] pci 0000:02:00.0: PME# disabled
[  352.717051] pci 0000:02:00.0: PME# disabled
[  352.717061] pci 0000:02:00.0: PME# disabled
[  352.717070] pci 0000:02:00.0: PME# disabled
[  352.717083] pci 0000:02:00.0: PME# disabled
[  352.717094] pci 0000:02:00.0: PME# disabled
[  352.717104] pci 0000:02:00.0: PME# disabled
[  352.717113] pci 0000:02:00.0: PME# disabled
[  352.717124] pci 0000:02:00.0: PME# disabled
[  352.717133] pci 0000:02:00.0: PME# disabled
[  352.717143] pci 0000:02:00.0: PME# disabled
[  352.717153] pci 0000:02:00.0: PME# disabled
[  352.717162] pci 0000:02:00.0: PME# disabled

and then the system becomes really unresponsive.

Reverting the commit in $subject makes TBT work again.

Please let me know if you need any additional information. The system I'm
testing on is Intel NUC.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 13:12 Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug Mika Westerberg
@ 2014-01-30 16:48 ` Yinghai Lu
  2014-01-30 16:56   ` Yinghai Lu
  2014-01-31 23:34 ` [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()" Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Yinghai Lu @ 2014-01-30 16:48 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thu, Jan 30, 2014 at 5:12 AM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> Hi,
>
> The latest mainline kernel "hangs" when Thunderbolt devices are
> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> getting huge amounts of messages like:
>
> [  352.717001] pci 0000:02:00.0: PME# disabled
> [  352.717011] pci 0000:02:00.0: PME# disabled
> [  352.717021] pci 0000:02:00.0: PME# disabled
> [  352.717032] pci 0000:02:00.0: PME# disabled
> [  352.717041] pci 0000:02:00.0: PME# disabled
> [  352.717051] pci 0000:02:00.0: PME# disabled
> [  352.717061] pci 0000:02:00.0: PME# disabled
> [  352.717070] pci 0000:02:00.0: PME# disabled
> [  352.717083] pci 0000:02:00.0: PME# disabled
> [  352.717094] pci 0000:02:00.0: PME# disabled
> [  352.717104] pci 0000:02:00.0: PME# disabled
> [  352.717113] pci 0000:02:00.0: PME# disabled
> [  352.717124] pci 0000:02:00.0: PME# disabled
> [  352.717133] pci 0000:02:00.0: PME# disabled
> [  352.717143] pci 0000:02:00.0: PME# disabled
> [  352.717153] pci 0000:02:00.0: PME# disabled
> [  352.717162] pci 0000:02:00.0: PME# disabled

Does that mean pci_stop_dev() gets called again and again?

>
> and then the system becomes really unresponsive.
>
> Reverting the commit in $subject makes TBT work again.

Did you bisect to that commit?

Can you test the tree just before this commit?
| 9d16947b75831acd317ab9a53e0e94d160731d33
| Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| Date:   Fri Jan 10 15:22:18 2014 +0100
|
|    PCI: Add global pci_lock_rescan_remove()

That patch and the ones following it could have changed some calling sequences.

>
> Please let me know if you need any additional information. The system I'm
> testing on is Intel NUC.

I am surprised, because

| commit ef83b0781a73f9efcb1228256bfdfb97fc9533a8
| Author: Yinghai Lu <yinghai@kernel.org>
| Date:   Sat Nov 30 14:40:29 2013 -0800
|
|    PCI: Remove from bus_list and release resources in pci_release_dev()

has been in pci/next for a while.
So you did not test pci/next before?

Please post the boot log with "debug ignore_loglevel initcall_debug".

Thanks

Yinghai

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 16:48 ` Yinghai Lu
@ 2014-01-30 16:56   ` Yinghai Lu
  2014-01-30 23:39     ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Yinghai Lu @ 2014-01-30 16:56 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: linux-pci, Bjorn Helgaas, Rafael J. Wysocki

>> The latest mainline kernel "hangs" when Thunderbolt devices are
>> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
>> getting huge amounts of messages like:
>>
>> [  352.717001] pci 0000:02:00.0: PME# disabled
>> [  352.717011] pci 0000:02:00.0: PME# disabled
>> [  352.717021] pci 0000:02:00.0: PME# disabled
>> [  352.717032] pci 0000:02:00.0: PME# disabled
>> [  352.717041] pci 0000:02:00.0: PME# disabled
>> [  352.717051] pci 0000:02:00.0: PME# disabled
>> [  352.717061] pci 0000:02:00.0: PME# disabled
>> [  352.717070] pci 0000:02:00.0: PME# disabled
>> [  352.717083] pci 0000:02:00.0: PME# disabled
>> [  352.717094] pci 0000:02:00.0: PME# disabled
>> [  352.717104] pci 0000:02:00.0: PME# disabled
>> [  352.717113] pci 0000:02:00.0: PME# disabled
>> [  352.717124] pci 0000:02:00.0: PME# disabled
>> [  352.717133] pci 0000:02:00.0: PME# disabled
>> [  352.717143] pci 0000:02:00.0: PME# disabled
>> [  352.717153] pci 0000:02:00.0: PME# disabled
>> [  352.717162] pci 0000:02:00.0: PME# disabled
>
> that mean pci_stop_dev() get called again and again ?

Please check whether the patch below helps.

It should prevent the driver from being reattached.

---
 drivers/pci/remove.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -11,6 +11,7 @@ static void pci_stop_dev(struct pci_dev
         pci_proc_detach_device(dev);
         pci_remove_sysfs_dev_files(dev);
         device_release_driver(&dev->dev);
+        dev->match_driver = false;
         dev->is_added = 0;
     }

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 23:39     ` Rafael J. Wysocki
@ 2014-01-30 23:39       ` Yinghai Lu
  2014-01-30 23:59         ` Rafael J. Wysocki
  2014-01-31  0:39         ` Rafael J. Wysocki
  2014-01-30 23:58       ` Rafael J. Wysocki
  1 sibling, 2 replies; 27+ messages in thread
From: Yinghai Lu @ 2014-01-30 23:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
>>
>> >> The latest mainline kernel "hangs" when Thunderbolt devices are
>> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
>> >> getting huge amounts of messages like:
>> >>
>> >> [  352.717001] pci 0000:02:00.0: PME# disabled
>> >> [  352.717011] pci 0000:02:00.0: PME# disabled
>> >> [  352.717021] pci 0000:02:00.0: PME# disabled
>> >> [  352.717032] pci 0000:02:00.0: PME# disabled
>> >> [  352.717041] pci 0000:02:00.0: PME# disabled
>> >> [  352.717051] pci 0000:02:00.0: PME# disabled
>> >> [  352.717061] pci 0000:02:00.0: PME# disabled
>> >> [  352.717070] pci 0000:02:00.0: PME# disabled
>> >> [  352.717083] pci 0000:02:00.0: PME# disabled
>> >> [  352.717094] pci 0000:02:00.0: PME# disabled
>> >> [  352.717104] pci 0000:02:00.0: PME# disabled
>> >> [  352.717113] pci 0000:02:00.0: PME# disabled
>> >> [  352.717124] pci 0000:02:00.0: PME# disabled
>> >> [  352.717133] pci 0000:02:00.0: PME# disabled
>> >> [  352.717143] pci 0000:02:00.0: PME# disabled
>> >> [  352.717153] pci 0000:02:00.0: PME# disabled
>> >> [  352.717162] pci 0000:02:00.0: PME# disabled
>> >
>> > that mean pci_stop_dev() get called again and again ?
>>
>> please check if attached patch could help.
>
> Well, it looks like what happens is an endless loop in
> acpiphp_glue.c:disable_slot().
>
> dev_in_slot() returns the first device in the list, so
> pci_stop_and_remove_bus_device() is called for it, but it
> doesn't remove the device from bus->devices any more, so
> dev_in_slot() will return the same device next time and
> so on forever.
>
...
>
> So the above won't help in my opinion.
>
> I wonder, however, if this patch helps instead:
>
> https://patchwork.kernel.org/patch/3540701/
>
> I thought it would be 3.15 material, but it very well can go in earlier if
> it happens to address this particular problem.

Agreed, that should fix the problem.

But please use list_for_each_entry_safe_reverse()
instead.

Please refer to the pciehp changelog in

commit 29ed1f29b68a8395d5679b3c4e38352b617b3236
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Jul 19 12:14:16 2013 -0700

    PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device

    Hot-removing a device with SR-IOV enabled causes a null pointer dereference
    in v3.9 and v3.10.

    This is a regression caused by ba518e3c17 ("PCI: pciehp: Iterate over all
    devices in slot, not functions 0-7").  When we iterate over the
    bus->devices list, we first remove the PF, which also removes all the VFs
    from the list.  Then the list iterator blows up because more than just the
    current entry was removed from the list.

    ac205b7bb7 ("PCI: make sriov work with hotplug remove") works around a
    similar problem in pci_stop_bus_devices() by iterating over the list in
    reverse, so the VFs are stopped and removed from the list first, before the
    PF.

    This patch changes pciehp_unconfigure_device() to iterate over the list in
    reverse, too.

    [bhelgaas: bugzilla, changelog]
    Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60604
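
The changelog's point about iteration order can be sketched in userspace as follows. This is an illustrative model, not kernel code: struct node and all helper names are hypothetical, the two walks are hand-written analogues of list_for_each_entry_safe() and list_for_each_entry_safe_reverse(), and the model assumes the VFs sit after their PF on the list. Removed nodes are only flagged, not freed, so the model can count stale visits instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on bus->devices. */
struct node {
	struct node *prev, *next;
	struct node *pf;      /* for a VF: its parent PF; NULL otherwise */
	bool removed;         /* set instead of freeing the node */
};

static void node_list_init(struct node *head)
{
	head->prev = head->next = head;
	head->pf = NULL;
	head->removed = false;
}

static void node_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink @n but leave its own pointers stale, like a freed entry. */
static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->removed = true;
}

/* Removing a PF also removes every VF that belongs to it. */
static void node_remove(struct node *head, struct node *n)
{
	struct node *v, *tmp;

	for (v = head->next; v != head; v = tmp) {
		tmp = v->next;
		if (v->pf == n)
			node_unlink(v);
	}
	node_unlink(n);
}

/* Forward "safe" walk: returns how many already-removed nodes it visited. */
static int walk_forward(struct node *head)
{
	struct node *n, *next;
	int stale = 0;

	for (n = head->next, next = n->next; n != head; n = next, next = n->next) {
		if (n->removed)
			stale++;        /* would be a use-after-free in real code */
		else
			node_remove(head, n);
	}
	return stale;
}

/* Reverse "safe" walk: VFs are visited, and removed, before their PF. */
static int walk_reverse(struct node *head)
{
	struct node *n, *prev;
	int stale = 0;

	for (n = head->prev, prev = n->prev; n != head; n = prev, prev = n->prev) {
		if (n->removed)
			stale++;
		else
			node_remove(head, n);
	}
	return stale;
}
```

With one PF followed by two VFs, the forward walk saves a pointer to the first VF before removing the PF, so it goes on to visit entries that are already gone. The reverse walk only ever saves a pointer to an entry in front of the current one, which the removal of a VF or PF does not touch in this model.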

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 16:56   ` Yinghai Lu
@ 2014-01-30 23:39     ` Rafael J. Wysocki
  2014-01-30 23:39       ` Yinghai Lu
  2014-01-30 23:58       ` Rafael J. Wysocki
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-30 23:39 UTC (permalink / raw)
  To: Yinghai Lu, Mika Westerberg; +Cc: linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> 
> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> >> getting huge amounts of messages like:
> >>
> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> >
> > that mean pci_stop_dev() get called again and again ?
> 
> please check if attached patch could help.

Well, it looks like what happens is an endless loop in
acpiphp_glue.c:disable_slot().

dev_in_slot() returns the first device in the list, so
pci_stop_and_remove_bus_device() is called for it, but it
doesn't remove the device from bus->devices any more, so
dev_in_slot() will return the same device next time and
so on forever.
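
To make the loop described above concrete, here is a minimal userspace sketch, not kernel code. The names model_dev, model_dev_in_slot(), model_unlink() and model_disable_slot() are hypothetical stand-ins, and the max_iters cap exists only so that the model terminates; the loop in disable_slot() has no such cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only what the model needs. */
struct model_dev {
	int slot;                 /* slot number, like PCI_SLOT(dev->devfn) */
	int stop_calls;           /* how many times the device was "stopped" */
	struct model_dev *next;   /* next device on the model bus list */
};

/* Return the first device on the list that sits in @slot, or NULL. */
static struct model_dev *model_dev_in_slot(struct model_dev *head, int slot)
{
	for (struct model_dev *d = head; d; d = d->next)
		if (d->slot == slot)
			return d;
	return NULL;
}

/* Unlink @dev from the list headed by *@head. */
static void model_unlink(struct model_dev **head, struct model_dev *dev)
{
	for (struct model_dev **pp = head; *pp; pp = &(*pp)->next)
		if (*pp == dev) {
			*pp = dev->next;
			dev->next = NULL;
			return;
		}
}

/*
 * Model of the while (dev_in_slot()) loop.  When @unlink_on_remove is
 * false, "removal" leaves the device on the list, which is the behavior
 * the analysis above attributes to the current code.
 * Returns the number of passes, or -1 if @max_iters was exceeded.
 */
static int model_disable_slot(struct model_dev **head, int slot,
			      bool unlink_on_remove, int max_iters)
{
	struct model_dev *dev;
	int iters = 0;

	while ((dev = model_dev_in_slot(*head, slot))) {
		if (iters++ >= max_iters)
			return -1;      /* the real loop would keep spinning */
		dev->stop_calls++;      /* one repeated stop per pass */
		if (unlink_on_remove)
			model_unlink(head, dev);
	}
	return iters;
}
```

With unlink_on_remove set, a slot holding two devices is emptied in two passes. With it cleared, the same first device is returned on every pass and the model gives up at the cap, which matches the single device address repeated in the reported log.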

> it should prevent possible reattaching driver.

If my analysis above is correct, this isn't related to attaching drivers
and rather to the way dev_in_slot() is designed.

> ---
>  drivers/pci/remove.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -11,6 +11,7 @@ static void pci_stop_dev(struct pci_dev
>          pci_proc_detach_device(dev);
>          pci_remove_sysfs_dev_files(dev);
>          device_release_driver(&dev->dev);
> +        dev->match_driver = false;
>          dev->is_added = 0;
>      }

So the above won't help in my opinion.

I wonder, however, if this patch helps instead:

https://patchwork.kernel.org/patch/3540701/

I thought it would be 3.15 material, but it very well can go in earlier if
it happens to address this particular problem.

Mika, can you please try that?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 23:39     ` Rafael J. Wysocki
  2014-01-30 23:39       ` Yinghai Lu
@ 2014-01-30 23:58       ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-30 23:58 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 12:39:07 AM Rafael J. Wysocki wrote:
> On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > 
> > >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> > >> getting huge amounts of messages like:
> > >>
> > >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > >
> > > that mean pci_stop_dev() get called again and again ?
> > 
> > please check if attached patch could help.
> 
> Well, it looks like what happens is an endless loop in
> acpiphp_glue.c:disable_slot().
> 
> dev_in_slot() returns the first device in the list, so
> pci_stop_and_remove_bus_device() is called for it, but it
> doesn't remove the device from bus->devices any more, so
> dev_in_slot() will return the same device next time and
> so on forever.
> 
> > it should prevent possible reattaching driver.
> 
> If my analysis above is correct, this isn't related to attaching drivers
> and rather to the way dev_in_slot() is designed.
> 
> > ---
> >  drivers/pci/remove.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Index: linux-2.6/drivers/pci/remove.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/pci/remove.c
> > +++ linux-2.6/drivers/pci/remove.c
> > @@ -11,6 +11,7 @@ static void pci_stop_dev(struct pci_dev
> >          pci_proc_detach_device(dev);
> >          pci_remove_sysfs_dev_files(dev);
> >          device_release_driver(&dev->dev);
> > +        dev->match_driver = false;
> >          dev->is_added = 0;
> >      }
> 
> So the above won't help in my opinion.
> 
> I wonder, however, if this patch helps instead:
> 
> https://patchwork.kernel.org/patch/3540701/

And even if it does, I wonder what happens if someone walks bus->devices of
that bus after we've run pci_stop_and_remove_bus_device() for one of its
devices and before that device is released?  The device is pretty much dead
at this point, so won't stepping on it during the walk cause any problems to
happen?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 23:39       ` Yinghai Lu
@ 2014-01-30 23:59         ` Rafael J. Wysocki
  2014-01-31  0:38           ` Rafael J. Wysocki
  2014-01-31  0:39         ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-30 23:59 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> >>
> >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> >> >> getting huge amounts of messages like:
> >> >>
> >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> >> >
> >> > that mean pci_stop_dev() get called again and again ?
> >>
> >> please check if attached patch could help.
> >
> > Well, it looks like what happens is an endless loop in
> > acpiphp_glue.c:disable_slot().
> >
> > dev_in_slot() returns the first device in the list, so
> > pci_stop_and_remove_bus_device() is called for it, but it
> > doesn't remove the device from bus->devices any more, so
> > dev_in_slot() will return the same device next time and
> > so on forever.
> >
> ...
> >
> > So the above won't help in my opinion.
> >
> > I wonder, however, if this patch helps instead:
> >
> > https://patchwork.kernel.org/patch/3540701/
> >
> > I thought it would be 3.15 material, but it very well can go in earlier if
> > it happens to address this particular problem.
> 
> Agree, that should fix the problem.
> 
> but please use list_for_each_entry_safe_reverse
> instead.

OK, I will.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 23:59         ` Rafael J. Wysocki
@ 2014-01-31  0:38           ` Rafael J. Wysocki
  2014-01-31  1:39             ` Yinghai Lu
  2014-01-31 10:53             ` Mika Westerberg
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31  0:38 UTC (permalink / raw)
  To: Yinghai Lu, Mika Westerberg; +Cc: linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 12:59:06 AM Rafael J. Wysocki wrote:
> On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > >>
> > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> > >> >> getting huge amounts of messages like:
> > >> >>
> > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > >> >
> > >> > that mean pci_stop_dev() get called again and again ?
> > >>
> > >> please check if attached patch could help.
> > >
> > > Well, it looks like what happens is an endless loop in
> > > acpiphp_glue.c:disable_slot().
> > >
> > > dev_in_slot() returns the first device in the list, so
> > > pci_stop_and_remove_bus_device() is called for it, but it
> > > doesn't remove the device from bus->devices any more, so
> > > dev_in_slot() will return the same device next time and
> > > so on forever.
> > >
> > ...
> > >
> > > So the above won't help in my opinion.
> > >
> > > I wonder, however, if this patch helps instead:
> > >
> > > https://patchwork.kernel.org/patch/3540701/
> > >
> > > I thought it would be 3.15 material, but it very well can go in earlier if
> > > it happens to address this particular problem.
> > 
> > Agree, that should fix the problem.
> > 
> > but please use list_for_each_entry_safe_reverse
> > instead.
> 
> OK, I will.

Mika, below is an updated patch to try.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: ACPI / hotplug / PCI: Simplify disable_slot()

After recent PCI core changes related to the rescan/remove locking,
the ACPIPHP's disable_slot() function is only called under the
general PCI rescan/remove lock, so it doesn't have to use
dev_in_slot() any more to avoid race conditions.  Make it simply
walk the devices on the bus and drop the ones in the slot being
disabled and drop dev_in_slot() which has no more users.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/hotplug/acpiphp_glue.c |   28 +++++-----------------------
 1 file changed, 5 insertions(+), 23 deletions(-)

Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
===================================================================
--- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
+++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
@@ -604,32 +604,15 @@ static void __ref enable_slot(struct acp
 	}
 }
 
-/* return first device in slot, acquiring a reference on it */
-static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
-{
-	struct pci_bus *bus = slot->bus;
-	struct pci_dev *dev;
-	struct pci_dev *ret = NULL;
-
-	down_read(&pci_bus_sem);
-	list_for_each_entry(dev, &bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->device) {
-			ret = pci_dev_get(dev);
-			break;
-		}
-	up_read(&pci_bus_sem);
-
-	return ret;
-}
-
 /**
  * disable_slot - disable a slot
  * @slot: ACPI PHP slot
  */
 static void disable_slot(struct acpiphp_slot *slot)
 {
+	struct pci_bus *bus = slot->bus;
+	struct pci_dev *dev, *prev;
 	struct acpiphp_func *func;
-	struct pci_dev *pdev;
 
 	/*
 	 * enable_slot() enumerates all functions in this device via
@@ -637,10 +620,9 @@ static void disable_slot(struct acpiphp_
 	 * methods (_EJ0, etc.) or not.  Therefore, we remove all functions
 	 * here.
 	 */
-	while ((pdev = dev_in_slot(slot))) {
-		pci_stop_and_remove_bus_device(pdev);
-		pci_dev_put(pdev);
-	}
+	list_for_each_entry_safe_reverse(dev, prev, &bus->devices, bus_list)
+		if (PCI_SLOT(dev->devfn) == slot->device)
+			pci_stop_and_remove_bus_device(dev);
 
 	list_for_each_entry(func, &slot->funcs, sibling)
 		acpiphp_bus_trim(func_to_handle(func));


* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-30 23:39       ` Yinghai Lu
  2014-01-30 23:59         ` Rafael J. Wysocki
@ 2014-01-31  0:39         ` Rafael J. Wysocki
  2014-01-31  1:04           ` Rafael J. Wysocki
  2014-01-31  1:38           ` Yinghai Lu
  1 sibling, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31  0:39 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> >>
> >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> >> >> getting huge amounts of messages like:
> >> >>
> >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> >> >
> >> > that mean pci_stop_dev() get called again and again ?
> >>
> >> please check if attached patch could help.
> >
> > Well, it looks like what happens is an endless loop in
> > acpiphp_glue.c:disable_slot().
> >
> > dev_in_slot() returns the first device in the list, so
> > pci_stop_and_remove_bus_device() is called for it, but it
> > doesn't remove the device from bus->devices any more, so
> > dev_in_slot() will return the same device next time and
> > so on forever.
> >
> ...
> >
> > So the above won't help in my opinion.
> >
> > I wonder, however, if this patch helps instead:
> >
> > https://patchwork.kernel.org/patch/3540701/
> >
> > I thought it would be 3.15 material, but it very well can go in earlier if
> > it happens to address this particular problem.
> 
> Agree, that should fix the problem.
> 
> but please use list_for_each_entry_safe_reverse
> instead.
> 
> please refer to pciehp changelog in
> 
> commit 29ed1f29b68a8395d5679b3c4e38352b617b3236
> Author: Yinghai Lu <yinghai@kernel.org>
> Date:   Fri Jul 19 12:14:16 2013 -0700
> 
>     PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device
> 
>     Hot-removing a device with SR-IOV enabled causes a null pointer dereference
>     in v3.9 and v3.10.
> 
>     This is a regression caused by ba518e3c17 ("PCI: pciehp: Iterate over all
>     devices in slot, not functions 0-7").  When we iterate over the
>     bus->devices list, we first remove the PF, which also removes all the VFs
>     from the list.  Then the list iterator blows up because more than just the
>     current entry was removed from the list.
> 
>     ac205b7bb7 ("PCI: make sriov work with hotplug remove") works around a
>     similar problem in pci_stop_bus_devices() by iterating over the list in
>     reverse, so the VFs are stopped and removed from the list first, before the
>     PF.
> 
>     This patch changes pciehp_unconfigure_device() to iterate over the list in
>     reverse, too.
> 
>     [bhelgaas: bugzilla, changelog]
>     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60604

So I gather all of bus->devices list walks during which devices may be removed
should be done in the reverse order, right?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31  0:39         ` Rafael J. Wysocki
@ 2014-01-31  1:04           ` Rafael J. Wysocki
  2014-01-31  1:38           ` Yinghai Lu
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31  1:04 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 01:39:38 AM Rafael J. Wysocki wrote:
> On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > >>
> > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> > >> >> getting huge amounts of messages like:
> > >> >>
> > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > >> >
> > >> > that mean pci_stop_dev() get called again and again ?
> > >>
> > >> please check if attached patch could help.
> > >
> > > Well, it looks like what happens is an endless loop in
> > > acpiphp_glue.c:disable_slot().
> > >
> > > dev_in_slot() returns the first device in the list, so
> > > pci_stop_and_remove_bus_device() is called for it, but it
> > > doesn't remove the device from bus->devices any more, so
> > > dev_in_slot() will return the same device next time and
> > > so on forever.
> > >
> > ...
> > >
> > > So the above won't help in my opinion.
> > >
> > > I wonder, however, if this patch helps instead:
> > >
> > > https://patchwork.kernel.org/patch/3540701/
> > >
> > > I thought it would be 3.15 material, but it very well can go in earlier if
> > > it happens to address this particular problem.
> > 
> > Agree, that should fix the problem.
> > 
> > but please use list_for_each_entry_safe_reverse
> > instead.
> > 
> > please refer to pciehp changelog in
> > 
> > commit 29ed1f29b68a8395d5679b3c4e38352b617b3236
> > Author: Yinghai Lu <yinghai@kernel.org>
> > Date:   Fri Jul 19 12:14:16 2013 -0700
> > 
> >     PCI: pciehp: Fix null pointer deref when hot-removing SR-IOV device
> > 
> >     Hot-removing a device with SR-IOV enabled causes a null pointer dereference
> >     in v3.9 and v3.10.
> > 
> >     This is a regression caused by ba518e3c17 ("PCI: pciehp: Iterate over all
> >     devices in slot, not functions 0-7").  When we iterate over the
> >     bus->devices list, we first remove the PF, which also removes all the VFs
> >     from the list.  Then the list iterator blows up because more than just the
> >     current entry was removed from the list.
> > 
> >     ac205b7bb7 ("PCI: make sriov work with hotplug remove") works around a
> >     similar problem in pci_stop_bus_devices() by iterating over the list in
> >     reverse, so the VFs are stopped and removed from the list first, before the
> >     PF.
> > 
> >     This patch changes pciehp_unconfigure_device() to iterate over the list in
> >     reverse, too.
> > 
> >     [bhelgaas: bugzilla, changelog]
> >     Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60604
> 
> So I gather all of bus->devices list walks during which devices may be removed
> should be done in the reverse order, right?

So it looks like we need the patch below (on top of https://patchwork.kernel.org/patch/3559831/),
right?

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order

According to the changelog of commit 29ed1f29b68a (PCI: pciehp: Fix null
pointer deref when hot-removing SR-IOV device) it is unsafe to walk the
bus->devices list of a PCI bus and remove devices from it in direct order,
because that may lead to NULL pointer dereferences related to virtual
functions.

For this reason, change all of the bus->devices list walks in
acpiphp_glue.c during which devices may be removed to be carried out in
reverse order.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/hotplug/acpiphp_glue.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
===================================================================
--- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
+++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
@@ -724,7 +724,7 @@ static void trim_stale_devices(struct pc
 
 		/* The device is a bridge. so check the bus below it. */
 		pm_runtime_get_sync(&dev->dev);
-		list_for_each_entry_safe(child, tmp, &bus->devices, bus_list)
+		list_for_each_entry_safe_reverse(child, tmp, &bus->devices, bus_list)
 			trim_stale_devices(child);
 
 		pm_runtime_put(&dev->dev);
@@ -755,8 +755,8 @@ static void acpiphp_check_bridge(struct
 			; /* do nothing */
 		} else if (get_slot_status(slot) == ACPI_STA_ALL) {
 			/* remove stale devices if any */
-			list_for_each_entry_safe(dev, tmp, &bus->devices,
-						 bus_list)
+			list_for_each_entry_safe_reverse(dev, tmp,
+							 &bus->devices, bus_list)
 				if (PCI_SLOT(dev->devfn) == slot->device)
 					trim_stale_devices(dev);
 
@@ -787,7 +787,7 @@ static void acpiphp_sanitize_bus(struct
 	int i;
 	unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM;
 
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+	list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
 		for (i=0; i<PCI_BRIDGE_RESOURCES; i++) {
 			struct resource *res = &dev->resource[i];
 			if ((res->flags & type_mask) && !res->start &&


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31  0:39         ` Rafael J. Wysocki
  2014-01-31  1:04           ` Rafael J. Wysocki
@ 2014-01-31  1:38           ` Yinghai Lu
  1 sibling, 0 replies; 27+ messages in thread
From: Yinghai Lu @ 2014-01-31  1:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thu, Jan 30, 2014 at 4:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
>
> So I gather all of bus->devices list walks during which devices may be removed
> should be done in the reverse order, right?

Yes.

Yinghai
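
The reverse-order rule confirmed above can be illustrated with a small
user-space sketch. It is not kernel code: struct dev, remove_dev() and
walk_goes_stale() are hypothetical stand-ins, and the only behaviour taken
from the thread is that removing a PF also drops its VFs from the list and
that VFs sit after their PF. The walk saves its next cursor before each
removal, as the _safe iterators do, and reports whether that saved cursor
was unlinked behind its back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a device on bus->devices. */
struct dev {
	struct dev *prev, *next;
	struct dev *pf;		/* owning PF for a VF, NULL otherwise */
	int on_list;
};

static void add_tail(struct dev *d, struct dev *head)
{
	d->prev = head->prev;
	d->next = head;
	head->prev->next = d;
	head->prev = d;
	d->on_list = 1;
}

static void unlink_dev(struct dev *d)
{
	d->prev->next = d->next;
	d->next->prev = d->prev;
	d->prev = d->next = NULL;
	d->on_list = 0;
}

/* Removing a PF also removes every VF that belongs to it. */
static void remove_dev(struct dev *d, struct dev *head)
{
	struct dev *v = head->next;

	while (v != head) {
		struct dev *next = v->next;

		if (v->pf == d)
			unlink_dev(v);
		v = next;
	}
	unlink_dev(d);
}

/* Build: head -> PF -> VF1 -> VF2 (VFs are added after their PF). */
static void build_bus(struct dev n[4])
{
	n[0].prev = n[0].next = &n[0];
	n[0].pf = NULL;
	n[0].on_list = 0;
	n[1].pf = NULL;
	n[2].pf = &n[1];
	n[3].pf = &n[1];
	add_tail(&n[1], &n[0]);
	add_tail(&n[2], &n[0]);
	add_tail(&n[3], &n[0]);
}

/*
 * Remove every device, saving the next cursor before each removal.
 * Returns 1 if the saved cursor was unlinked by the removal, else 0.
 */
int walk_goes_stale(int reverse)
{
	struct dev n[4];
	struct dev *head = &n[0], *cur, *tmp;

	build_bus(n);
	cur = reverse ? head->prev : head->next;
	while (cur != head) {
		tmp = reverse ? cur->prev : cur->next;
		remove_dev(cur, head);
		if (tmp != head && !tmp->on_list)
			return 1;
		cur = tmp;
	}
	return 0;
}
```

In this model walk_goes_stale(0) (forward) reports a stale cursor because
removing the PF unlinks the VF the iterator saved, while walk_goes_stale(1)
(reverse) removes the VFs first and finishes cleanly.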

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31  0:38           ` Rafael J. Wysocki
@ 2014-01-31  1:39             ` Yinghai Lu
  2014-01-31 10:53             ` Mika Westerberg
  1 sibling, 0 replies; 27+ messages in thread
From: Yinghai Lu @ 2014-01-31  1:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Thu, Jan 30, 2014 at 4:38 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, January 31, 2014 12:59:06 AM Rafael J. Wysocki wrote:
>> On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
>> > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
>> > >>
>> > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
>> > >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
>> > >> >> getting huge amounts of messages like:
>> > >> >>
>> > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
>> > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
>> > >> >
>> > >> > that mean pci_stop_dev() get called again and again ?
>> > >>
>> > >> please check if attached patch could help.
>> > >
>> > > Well, it looks like what happens is an endless loop in
>> > > acpiphp_glue.c:disable_slot().
>> > >
>> > > dev_in_slot() returns the first device in the list, so
>> > > pci_stop_and_remove_bus_device() is called for it, but it
>> > > doesn't remove the device from bus->devices any more, so
>> > > dev_in_slot() will return the same device next time and
>> > > so on forever.
>> > >
>> > ...
>> > >
>> > > So the above won't help in my opinion.
>> > >
>> > > I wonder, however, if this patch helps instead:
>> > >
>> > > https://patchwork.kernel.org/patch/3540701/
>> > >
>> > > I thought it would be 3.15 material, but it very well can go in earlier if
>> > > it happens to address this particular problem.
>> >
>> > Agree, that should fix the problem.
>> >
>> > but please use list_for_each_entry_safe_reverse
>> > instead.
>>
>> OK, I will.
>
> Mika, below is an updated patch to try.
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: ACPI / hotplug / PCI: Simplify disable_slot()
>
> After recent PCI core changes related to the rescan/remove locking,
> the ACPIPHP's disable_slot() function is only called under the
> general PCI rescan/remove lock, so it doesn't have to use
> dev_in_slot() any more to avoid race conditions.  Make it simply
> walk the devices on the bus and drop the ones in the slot being
> disabled and drop dev_in_slot() which has no more users.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/pci/hotplug/acpiphp_glue.c |   28 +++++-----------------------
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> @@ -604,32 +604,15 @@ static void __ref enable_slot(struct acp
>         }
>  }
>
> -/* return first device in slot, acquiring a reference on it */
> -static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
> -{
> -       struct pci_bus *bus = slot->bus;
> -       struct pci_dev *dev;
> -       struct pci_dev *ret = NULL;
> -
> -       down_read(&pci_bus_sem);
> -       list_for_each_entry(dev, &bus->devices, bus_list)
> -               if (PCI_SLOT(dev->devfn) == slot->device) {
> -                       ret = pci_dev_get(dev);
> -                       break;
> -               }
> -       up_read(&pci_bus_sem);
> -
> -       return ret;
> -}
> -
>  /**
>   * disable_slot - disable a slot
>   * @slot: ACPI PHP slot
>   */
>  static void disable_slot(struct acpiphp_slot *slot)
>  {
> +       struct pci_bus *bus = slot->bus;
> +       struct pci_dev *dev, *prev;
>         struct acpiphp_func *func;
> -       struct pci_dev *pdev;
>
>         /*
>          * enable_slot() enumerates all functions in this device via
> @@ -637,10 +620,9 @@ static void disable_slot(struct acpiphp_
>          * methods (_EJ0, etc.) or not.  Therefore, we remove all functions
>          * here.
>          */
> -       while ((pdev = dev_in_slot(slot))) {
> -               pci_stop_and_remove_bus_device(pdev);
> -               pci_dev_put(pdev);
> -       }
> +       list_for_each_entry_safe_reverse(dev, prev, &bus->devices, bus_list)
> +               if (PCI_SLOT(dev->devfn) == slot->device)
> +                       pci_stop_and_remove_bus_device(dev);
>
>         list_for_each_entry(func, &slot->funcs, sibling)
>                 acpiphp_bus_trim(func_to_handle(func));
>

Acked-by: Yinghai Lu <yinghai@kernel.org>
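
For readers following the analysis quoted above, here is a rough user-space
model of why the old disable_slot() loop could spin and why a single list
walk cannot. Everything in it is an assumption made for illustration:
struct bus, first_in_slot(), stop_and_remove() and the unlink_now flag are
hypothetical names, and the only behaviour taken from the thread is that
the removal call no longer takes the device off bus->devices right away.

```c
#include <assert.h>
#include <stddef.h>

#define NDEV 3

/* Hypothetical model: one flag per device on the bus. */
struct bus {
	int slot[NDEV];		/* slot number of each device */
	int on_list[NDEV];	/* still linked into bus->devices? */
	int stopped[NDEV];	/* how many times the device was stopped */
};

/* Like dev_in_slot(): first listed device in the slot, or -1. */
static int first_in_slot(const struct bus *b, int slot)
{
	for (int i = 0; i < NDEV; i++)
		if (b->on_list[i] && b->slot[i] == slot)
			return i;
	return -1;
}

/* Stop a device; unlink it immediately only if unlink_now is set. */
static void stop_and_remove(struct bus *b, int i, int unlink_now)
{
	b->stopped[i]++;
	if (unlink_now)
		b->on_list[i] = 0;
}

/*
 * Old disable_slot() shape: loop until no device is found in the slot.
 * Returns the number of iterations, or -1 if max_iters was reached.
 */
int disable_slot_lookup(int unlink_now, int max_iters)
{
	struct bus b = { {0, 0, 1}, {1, 1, 1}, {0, 0, 0} };
	int n = 0, i;

	while ((i = first_in_slot(&b, 0)) >= 0) {
		if (n == max_iters)
			return -1;
		stop_and_remove(&b, i, unlink_now);
		n++;
	}
	return n;
}

/* New shape: a single reverse pass, one visit per device in the slot. */
int disable_slot_walk(int unlink_now)
{
	struct bus b = { {0, 0, 1}, {1, 1, 1}, {0, 0, 0} };
	int n = 0;

	for (int i = NDEV - 1; i >= 0; i--)
		if (b.on_list[i] && b.slot[i] == 0) {
			stop_and_remove(&b, i, unlink_now);
			n++;
		}
	return n;
}
```

With unlink_now set to 0 the lookup loop keeps finding device 0 and only
stops at the iteration cap, which matches the repeated "PME# disabled"
lines in the report; the single walk visits the two slot-0 devices once
each whether or not the unlink is deferred.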

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31  0:38           ` Rafael J. Wysocki
  2014-01-31  1:39             ` Yinghai Lu
@ 2014-01-31 10:53             ` Mika Westerberg
  2014-01-31 11:52               ` Rafael J. Wysocki
  1 sibling, 1 reply; 27+ messages in thread
From: Mika Westerberg @ 2014-01-31 10:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 01:38:42AM +0100, Rafael J. Wysocki wrote:
> On Friday, January 31, 2014 12:59:06 AM Rafael J. Wysocki wrote:
> > On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> > > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > > >>
> > > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > > >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> > > >> >> getting huge amounts of messages like:
> > > >> >>
> > > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > > >> >
> > > >> > that mean pci_stop_dev() get called again and again ?
> > > >>
> > > >> please check if attached patch could help.
> > > >
> > > > Well, it looks like what happens is an endless loop in
> > > > acpiphp_glue.c:disable_slot().
> > > >
> > > > dev_in_slot() returns the first device in the list, so
> > > > pci_stop_and_remove_bus_device() is called for it, but it
> > > > doesn't remove the device from bus->devices any more, so
> > > > dev_in_slot() will return the same device next time and
> > > > so on forever.
> > > >
> > > ...
> > > >
> > > > So the above won't help in my opinion.
> > > >
> > > > I wonder, however, if this patch helps instead:
> > > >
> > > > https://patchwork.kernel.org/patch/3540701/
> > > >
> > > > I thought it would be 3.15 material, but it very well can go in earlier if
> > > > it happens to address this particular problem.
> > > 
> > > Agree, that should fix the problem.
> > > 
> > > but please use list_for_each_entry_safe_reverse
> > > instead.
> > 
> > OK, I will.
> 
> Mika, below is an updated patch to try.
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: ACPI / hotplug / PCI: Simplify disable_slot()
> 
> After recent PCI core changes related to the rescan/remove locking,
> the ACPIPHP's disable_slot() function is only called under the
> general PCI rescan/remove lock, so it doesn't have to use
> dev_in_slot() any more to avoid race conditions.  Make it simply
> walk the devices on the bus and drop the ones in the slot being
> disabled and drop dev_in_slot() which has no more users.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks for the fix.

Unfortunately, it now crashes here after I re-plug the TBT chain (I have
both of your patches applied):

int sysfs_create_bin_file(struct kobject *kobj,
                          const struct bin_attribute *attr)
{
	BUG_ON(!kobj || !kobj->sd || !attr); <--

Since I don't have a proper serial console to that machine, all I see is the
end of the backtrace :-(

Here is a hand-copied backtrace from the screen:

pci_create_sysfs_dev_files()
pci_bus_add_device()
pci_bus_add_devices()
enable_slot()
acpiphp_check_bridge()
hotplug_event()
...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 10:53             ` Mika Westerberg
@ 2014-01-31 11:52               ` Rafael J. Wysocki
  2014-01-31 12:36                 ` Mika Westerberg
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31 11:52 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 12:53:01 PM Mika Westerberg wrote:
> On Fri, Jan 31, 2014 at 01:38:42AM +0100, Rafael J. Wysocki wrote:
> > On Friday, January 31, 2014 12:59:06 AM Rafael J. Wysocki wrote:
> > > On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> > > > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > > > >>
> > > > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > > > >> >> hot-unplugged to the system. I can't see any oops but after hot-unplug I'm
> > > > >> >> getting huge amounts of messages like:
> > > > >> >>
> > > > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > > > >> >
> > > > >> > that mean pci_stop_dev() get called again and again ?
> > > > >>
> > > > >> please check if attached patch could help.
> > > > >
> > > > > Well, it looks like what happens is an endless loop in
> > > > > acpiphp_glue.c:disable_slot().
> > > > >
> > > > > dev_in_slot() returns the first device in the list, so
> > > > > pci_stop_and_remove_bus_device() is called for it, but it
> > > > > doesn't remove the device from bus->devices any more, so
> > > > > dev_in_slot() will return the same device next time and
> > > > > so on forever.
> > > > >
> > > > ...
> > > > >
> > > > > So the above won't help in my opinion.
> > > > >
> > > > > I wonder, however, if this patch helps instead:
> > > > >
> > > > > https://patchwork.kernel.org/patch/3540701/
> > > > >
> > > > > I thought it would be 3.15 material, but it very well can go in earlier if
> > > > > it happens to address this particular problem.
> > > > 
> > > > Agree, that should fix the problem.
> > > > 
> > > > but please use list_for_each_entry_safe_reverse
> > > > instead.
> > > 
> > > OK, I will.
> > 
> > Mika, below is an updated patch to try.
> > 
> > ---
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: ACPI / hotplug / PCI: Simplify disable_slot()
> > 
> > After recent PCI core changes related to the rescan/remove locking,
> > the ACPIPHP's disable_slot() function is only called under the
> > general PCI rescan/remove lock, so it doesn't have to use
> > dev_in_slot() any more to avoid race conditions.  Make it simply
> > walk the devices on the bus and drop the ones in the slot being
> > disabled and drop dev_in_slot() which has no more users.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Thanks for the fix.
> 
> Unfortunately, it now crashes here after I re-plug the TBT chain (I have
> both of your patches applied):
> 
> int sysfs_create_bin_file(struct kobject *kobj,
>                           const struct bin_attribute *attr)
> {
> 	BUG_ON(!kobj || !kobj->sd || !attr); <--
> 
> Since I don't have a proper serial console to that machine, all I see is the
> end of the backtrace :-(
> 
> Here is a hand-copied backtrace from the screen:
> 
> pci_create_sysfs_dev_files()
> pci_bus_add_device()
> pci_bus_add_devices()
> enable_slot()
> acpiphp_check_bridge()
> hotplug_event()
> ...

So I think what happens is that we leak the struct pci_dev during removal and
the proper cleanup is never done.

Can you please add a debug printk into pci_release_dev() and see if that's
ever called after TBT unplug?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 11:52               ` Rafael J. Wysocki
@ 2014-01-31 12:36                 ` Mika Westerberg
  2014-01-31 13:49                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Mika Westerberg @ 2014-01-31 12:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 12:52:43PM +0100, Rafael J. Wysocki wrote:
> So I think what happens is that we leak the struct pci_dev during removal and
> the proper cleanup is never done.
> 
> Can you please add a debug printk into pci_release_dev() and see if that's
> ever called after TBT unplug?

OK, I added the debug print (still on top of your two patches) and was able
to capture a bit more from /var/log/messages before it crashes. Here's the
log. I added dev_info(dev, "RELEASE\n") to pci_release_dev().

Unplug:

Jan 31 20:05:57 buildroot kern.debug kernel: [  439.557920] pcieport 0000:06:03.0: PME# disabled
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.559483] pcieport 0000:05:00.0: PME# disabled
Jan 31 20:05:57 buildroot kern.info kernel: [  439.561074] pci 0000:07:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.562536] pci_bus 0000:07: busn_res: [bus 07] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.563993] pci 0000:06:03.0: RELEASE
Jan 31 20:05:57 buildroot kern.info kernel: [  439.570345] pci 0000:0a:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.571734] pci_bus 0000:0a: busn_res: [bus 0a] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.573154] pci 0000:09:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.574528] pci_bus 0000:09: busn_res: [bus 09-2e] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.575939] pci 0000:08:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.577316] pci_bus 0000:08: busn_res: [bus 08-2e] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.578721] pci 0000:06:04.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.580081] pci_bus 0000:2f: busn_res: [bus 2f] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.581487] pci 0000:06:05.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.582873] pci_bus 0000:06: busn_res: [bus 06-2f] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.584322] pci 0000:05:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.585727] pcieport 0000:03:00.0: PME# disabled
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.587225] pci_bus 0000:04: busn_res: [bus 04] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.588723] pci 0000:03:00.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.660389] pci_bus 0000:05: busn_res: [bus 05-2f] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.661993] pci 0000:03:03.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.663527] pci_bus 0000:30: busn_res: [bus 30-38] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.665103] pci 0000:03:04.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.666641] pci_bus 0000:39: busn_res: [bus 39] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.668210] pci 0000:03:05.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.669764] pci_bus 0000:3a: busn_res: [bus 3a] is released
Jan 31 20:05:57 buildroot kern.info kernel: [  439.671350] pci 0000:03:06.0: RELEASE
Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released

Plug:

Jan 31 20:06:11 buildroot kern.debug kernel: [  453.609684] acpiphp_glue: hotplug_event: Bus check notify on \_SB_.PCI0.RP05
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.611339] acpiphp_glue: hotplug_event: re-enumerating slots under \_SB_.PCI0.RP05
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.614625] pci 0000:02:00.0: scanning [bus 03-3a] behind bridge, pass 0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.619797] kobject_add_internal failed for pci_bus (error: -2 parent: 0000:02:00.0)
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.621491] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.623191] CPU: 1 PID: 956 Comm: kworker/u8:5 Not tainted 3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.624912] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.626649] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.628395]  0000000000000009 ffff88006de4d9f8 ffffffff818129e3 ffff88006de4da40
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.630164]  ffff88006de4da30 ffffffff81047228 ffff88006dfd1000 00000000fffffffe
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.631933]  ffff88006de140a8 ffff88006d582918 ffff88006d582918 ffff88006de4da90
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.633691] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.635428]  [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.637138]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.638879]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.640579]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.642297]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.643986]  [<ffffffff81141852>] ? kmem_cache_alloc_trace+0xe2/0x130
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.645694]  [<ffffffff81455584>] get_device_parent+0x174/0x1e0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.647377]  [<ffffffff81455a33>] device_add+0xe3/0x610
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.649062]  [<ffffffff81460ac4>] ? device_pm_sleep_init+0x44/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.650729]  [<ffffffff81455f75>] device_register+0x15/0x20
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.652409]  [<ffffffff8180c1a7>] pci_add_new_bus+0x167/0x3e0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.654064]  [<ffffffff81303057>] ? pci_find_next_bus+0x47/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.655724]  [<ffffffff812fc692>] pci_scan_bridge+0x5c2/0x630
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.657372]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.659057]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.660703]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.662387]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.664049]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.665746]  [<ffffffff81316685>] hotplug_event+0x105/0x260
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.667417]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.669118]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.670816]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.672537]  [<ffffffff81061e89>] worker_thread+0x119/0x390
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.674239]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.675976]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.677689]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.679446]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.681174]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.682942] ---[ end trace 84e80bde4d2086ef ]---
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.684679] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.686450] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:711 pci_add_new_bus+0x3db/0x3e0()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.688245] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.690032] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.691883] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.693703] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.695531]  0000000000000009 ffff88006de4db88 ffffffff818129e3 0000000000000000
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.697377]  ffff88006de4dbc0 ffffffff81047228 ffff88006d582800 ffff88006eac9000
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.699233]  ffff88006de14000 ffff88006de14000 ffff88006d582918 ffff88006de4dbd0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.701114] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.702989]  [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.704871]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.706767]  [<ffffffff81047305>] warn_slowpath_null+0x15/0x20
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.708637]  [<ffffffff8180c41b>] pci_add_new_bus+0x3db/0x3e0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.710518]  [<ffffffff81303057>] ? pci_find_next_bus+0x47/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.712381]  [<ffffffff812fc692>] pci_scan_bridge+0x5c2/0x630
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.714258]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.716109]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.717973]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.719824]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.721685]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.723527]  [<ffffffff81316685>] hotplug_event+0x105/0x260
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.725378]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.727195]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.729016]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.730815]  [<ffffffff81061e89>] worker_thread+0x119/0x390
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.732622]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.734403]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.736193]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.737962]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.739730]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.741471] ---[ end trace 84e80bde4d2086f0 ]---
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.743215] pci_bus 0000:03: scanning bus
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.744993] pci 0000:03:00.0: [8086:1548] type 01 class 0x060400
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.746859] pci 0000:03:00.0: calling pci_fixup_transparent_bridge+0x0/0x30
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.748767] pci 0000:03:00.0: supports D1 D2
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.750433] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.752141] pci 0000:03:00.0: PME# disabled
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.753848] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.755500] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.757195] kobject_add_internal failed for 0000:03:00.0 (error: -2 parent: 0000:02:00.0)
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.758885] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.760589] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.762328] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.764082] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.765862]  0000000000000009 ffff88006de4d9c0 ffffffff818129e3 ffff88006de4da08
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.767661]  ffff88006de4d9f8 ffffffff81047228 ffff88006de170a8 00000000fffffffe
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.769476]  ffff88006de140a8 ffff88006de17098 ffff88006eac9000 ffff88006de4da58
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.771289] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.773091]  [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.774886]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.776712]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.778510]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.780332]  [<ffffffff8163ea05>] ? pci_conf1_read+0xb5/0x110
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.782134]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.783959]  [<ffffffff814558fe>] ? device_private_init+0x1e/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.785760]  [<ffffffff81455a61>] device_add+0x111/0x610
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.787573]  [<ffffffff812fb89d>] pci_device_add+0x10d/0x140
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.789362]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.791171]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.792958]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.794759]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.796535]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.798329]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.800097]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.801896]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.803671]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.805471]  [<ffffffff81316685>] hotplug_event+0x105/0x260
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.807246]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.809033]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.810801]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.812573]  [<ffffffff81061e89>] worker_thread+0x119/0x390
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.814307]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.816053]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.817755]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.819467]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.821144]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.822829] ---[ end trace 84e80bde4d2086f1 ]---
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.824500] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.826182] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:1397 pci_device_add+0x13c/0x140()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.827872] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.829572] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.831314] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.833072] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.834845]  0000000000000009 ffff88006de4db10 ffffffff818129e3 0000000000000000
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.836648]  ffff88006de4db48 ffffffff81047228 ffff88006de17000 ffff88006d582828
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.838466]  ffff88006de17098 0000000000000000 ffff88006eac9000 ffff88006de4db58
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.840280] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.842081]  [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.843877]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.845694]  [<ffffffff81047305>] warn_slowpath_null+0x15/0x20
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.847482]  [<ffffffff812fb8cc>] pci_device_add+0x13c/0x140
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.849290]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.851090]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.852897]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.854674]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.856470]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.858251]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.860044]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.861819]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.863612]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.865389]  [<ffffffff81316685>] hotplug_event+0x105/0x260
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.867187]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.868956]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.870739]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.872493]  [<ffffffff81061e89>] worker_thread+0x119/0x390
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.874268]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.876024]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.877794]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.879532]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.881273]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.882984] ---[ end trace 84e80bde4d2086f2 ]---
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.884782] pci 0000:03:03.0: [8086:1548] type 01 class 0x060400
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.886627] pci 0000:03:03.0: calling pci_fixup_transparent_bridge+0x0/0x30
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.888494] pci 0000:03:03.0: supports D1 D2
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.890141] pci 0000:03:03.0: PME# supported from D0 D1 D2 D3hot D3cold
Jan 31 20:06:11 buildroot kern.debug kernel: [  453.891805] pci 0000:03:03.0: PME# disabled
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.893490] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.895118] WARNING: CPU: 3 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.896778] kobject_add_internal failed for 0000:03:03.0 (error: -2 parent: 0000:02:00.0)
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.898453] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.900135] CPU: 3 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.901841] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.903569] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.905310]  0000000000000009 ffff88006de4d9c0 ffffffff818129e3 ffff88006de4da08
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.907080]  ffff88006de4d9f8 ffffffff81047228 ffff88006d6260a8 00000000fffffffe
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.908866]  ffff88006de140a8 ffff88006d626098 ffff88006eac9000 ffff88006de4da58
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.910654] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.912412]  [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.914189]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.915980]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.917764]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.919557]  [<ffffffff8164189e>] ? raw_pci_read+0x1e/0x40
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.921344]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.923127]  [<ffffffff814558fe>] ? device_private_init+0x1e/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.924910]  [<ffffffff81455a61>] device_add+0x111/0x610
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.926683]  [<ffffffff812fb89d>] pci_device_add+0x10d/0x140
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.928453]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.930234]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.932004]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.933773]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.935636]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.937399]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.939148]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.940901]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.942659]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.944419]  [<ffffffff81316685>] hotplug_event+0x105/0x260
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.946166]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.947917]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.949648]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.951366]  [<ffffffff81061e89>] worker_thread+0x119/0x390
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.953070]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.954764]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.956425]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.958084]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.959727]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.961378] ---[ end trace 84e80bde4d2086f3 ]---
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.963016] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.964646] WARNING: CPU: 3 PID: 956 at drivers/pci/probe.c:1397 pci_device_add+0x13c/0x140()

and then it crashes.

The PCI tree looks like:

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QS77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
02:00.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
03:00.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
03:03.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
03:04.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
03:05.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
03:06.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
05:00.0 PCI bridge: Intel Corporation Device 1513
06:03.0 PCI bridge: Intel Corporation Device 1513
06:04.0 PCI bridge: Intel Corporation Device 1513
06:05.0 PCI bridge: Intel Corporation Device 1513
07:00.0 SATA controller: Marvell Technology Group Ltd. Device 9182 (rev 11)
08:00.0 PCI bridge: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge]
09:00.0 PCI bridge: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge]
0a:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM57762 Gigabit Ethernet PCIe

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 12:36                 ` Mika Westerberg
@ 2014-01-31 13:49                   ` Rafael J. Wysocki
  2014-01-31 13:49                     ` Mika Westerberg
  2014-01-31 14:05                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31 13:49 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 02:36:07 PM Mika Westerberg wrote:
> On Fri, Jan 31, 2014 at 12:52:43PM +0100, Rafael J. Wysocki wrote:
> > So I think what happens is that we leak the struct pci_dev during removal and
> > the proper cleanup is never done.
> > 
> > Can you please add a debug printk into pci_release_dev() and see if that's
> > ever called after TBT unplug?
> 
> OK, I added the debug print (still on top of your two patches) and was able
> to capture a bit more from /var/log/messages before it crashes. Here's the
> log. I added dev_info(dev, "RELEASE\n") to pci_release_dev().
> 
> Unplug:
> 
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.557920] pcieport 0000:06:03.0: PME# disabled
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.559483] pcieport 0000:05:00.0: PME# disabled
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.561074] pci 0000:07:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.562536] pci_bus 0000:07: busn_res: [bus 07] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.563993] pci 0000:06:03.0: RELEASE
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.570345] pci 0000:0a:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.571734] pci_bus 0000:0a: busn_res: [bus 0a] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.573154] pci 0000:09:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.574528] pci_bus 0000:09: busn_res: [bus 09-2e] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.575939] pci 0000:08:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.577316] pci_bus 0000:08: busn_res: [bus 08-2e] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.578721] pci 0000:06:04.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.580081] pci_bus 0000:2f: busn_res: [bus 2f] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.581487] pci 0000:06:05.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.582873] pci_bus 0000:06: busn_res: [bus 06-2f] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.584322] pci 0000:05:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.585727] pcieport 0000:03:00.0: PME# disabled
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.587225] pci_bus 0000:04: busn_res: [bus 04] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.588723] pci 0000:03:00.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.660389] pci_bus 0000:05: busn_res: [bus 05-2f] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.661993] pci 0000:03:03.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.663527] pci_bus 0000:30: busn_res: [bus 30-38] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.665103] pci 0000:03:04.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.666641] pci_bus 0000:39: busn_res: [bus 39] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.668210] pci 0000:03:05.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.669764] pci_bus 0000:3a: busn_res: [bus 3a] is released
> Jan 31 20:05:57 buildroot kern.info kernel: [  439.671350] pci 0000:03:06.0: RELEASE
> Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released

OK, so my guess wasn't right.  We seem to call pci_release_dev() for all of the
devices that go away after unplug.
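
One way to cross-check that against a captured log is to collect every address
that printed RELEASE and compare it with the devices expected to disappear.
The following is only a minimal sketch: it assumes the log format quoted above,
and the helper name released_devices() and the small sample are made up for
illustration; they are not part of any kernel tooling.

```python
import re

# Matches the debug line added for this test, e.g. "pci 0000:07:00.0: RELEASE".
# "pcieport ..." and "pci_bus ..." lines do not match because of the "pci " prefix.
RELEASE_RE = re.compile(
    r"pci ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): RELEASE"
)


def released_devices(log_lines):
    """Return the set of PCI addresses that logged RELEASE (hypothetical helper)."""
    found = set()
    for line in log_lines:
        match = RELEASE_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


# A few lines copied from the unplug log quoted above (not the full log).
sample_log = [
    "kern.debug kernel: [  439.557920] pcieport 0000:06:03.0: PME# disabled",
    "kern.info kernel: [  439.561074] pci 0000:07:00.0: RELEASE",
    "kern.debug kernel: [  439.562536] pci_bus 0000:07: busn_res: [bus 07] is released",
    "kern.info kernel: [  439.563993] pci 0000:06:03.0: RELEASE",
    "kern.info kernel: [  439.588723] pci 0000:03:00.0: RELEASE",
]

# Subset of the devices below 02:00.0 that should go away (illustration only).
expected = {"0000:07:00.0", "0000:06:03.0", "0000:03:00.0"}

released = released_devices(sample_log)
missing = sorted(expected - released)
print("missing RELEASE for:", missing)
```

Feeding it the complete unplug log and the full list of devices below 02:00.0
from lspci would show whether any struct pci_dev is left without a release.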

Am I right in thinking that the below doesn't happen with Yinghai's commit
reverted?

> Plug:
> 
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.609684] acpiphp_glue: hotplug_event: Bus check notify on \_SB_.PCI0.RP05
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.611339] acpiphp_glue: hotplug_event: re-enumerating slots under \_SB_.PCI0.RP05
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.614625] pci 0000:02:00.0: scanning [bus 03-3a] behind bridge, pass 0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.619797] kobject_add_internal failed for pci_bus (error: -2 parent: 0000:02:00.0)

create_dir() fails here with -2 (-ENOENT), so that's not because the directory
already exists (that would be -EEXIST).  Interesting.
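
For reference, the numeric error in the warning can be decoded from user space.
This is just a small sketch, assuming it runs on a Linux host where Python's
errno table matches the kernel's numbering; describe_errno() is a name made up
for this example.

```python
import errno
import os


def describe_errno(kernel_error):
    """Map a negative kernel-style error code to (symbolic name, message)."""
    code = -kernel_error
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# "error: -2" is what kobject_add_internal() printed in the log above.
name, message = describe_errno(-2)
print(name, "-", message)

# An "already exists" failure would have shown up as -EEXIST instead.
print("EEXIST would be", -errno.EEXIST)
```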

> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.621491] Modules linked in:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.623191] CPU: 1 PID: 956 Comm: kworker/u8:5 Not tainted 3.13.0+ #156
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.624912] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.626649] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.628395]  0000000000000009 ffff88006de4d9f8 ffffffff818129e3 ffff88006de4da40
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.630164]  ffff88006de4da30 ffffffff81047228 ffff88006dfd1000 00000000fffffffe
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.631933]  ffff88006de140a8 ffff88006d582918 ffff88006d582918 ffff88006de4da90
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.633691] Call Trace:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.635428]  [<ffffffff818129e3>] dump_stack+0x45/0x56
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.637138]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.638879]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.640579]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.642297]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.643986]  [<ffffffff81141852>] ? kmem_cache_alloc_trace+0xe2/0x130
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.645694]  [<ffffffff81455584>] get_device_parent+0x174/0x1e0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.647377]  [<ffffffff81455a33>] device_add+0xe3/0x610
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.649062]  [<ffffffff81460ac4>] ? device_pm_sleep_init+0x44/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.650729]  [<ffffffff81455f75>] device_register+0x15/0x20
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.652409]  [<ffffffff8180c1a7>] pci_add_new_bus+0x167/0x3e0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.654064]  [<ffffffff81303057>] ? pci_find_next_bus+0x47/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.655724]  [<ffffffff812fc692>] pci_scan_bridge+0x5c2/0x630
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.657372]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.659057]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.660703]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.662387]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.664049]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.665746]  [<ffffffff81316685>] hotplug_event+0x105/0x260
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.667417]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.669118]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.670816]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.672537]  [<ffffffff81061e89>] worker_thread+0x119/0x390
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.674239]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.675976]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.677689]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.679446]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.681174]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.682942] ---[ end trace 84e80bde4d2086ef ]---
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.684679] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.686450] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:711 pci_add_new_bus+0x3db/0x3e0()

That's failing device_register(), probably because of the earlier sysfs issue.

> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.688245] Modules linked in:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.690032] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.691883] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.693703] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.695531]  0000000000000009 ffff88006de4db88 ffffffff818129e3 0000000000000000
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.697377]  ffff88006de4dbc0 ffffffff81047228 ffff88006d582800 ffff88006eac9000
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.699233]  ffff88006de14000 ffff88006de14000 ffff88006d582918 ffff88006de4dbd0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.701114] Call Trace:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.702989]  [<ffffffff818129e3>] dump_stack+0x45/0x56
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.704871]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.706767]  [<ffffffff81047305>] warn_slowpath_null+0x15/0x20
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.708637]  [<ffffffff8180c41b>] pci_add_new_bus+0x3db/0x3e0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.710518]  [<ffffffff81303057>] ? pci_find_next_bus+0x47/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.712381]  [<ffffffff812fc692>] pci_scan_bridge+0x5c2/0x630
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.714258]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.716109]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.717973]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.719824]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.721685]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.723527]  [<ffffffff81316685>] hotplug_event+0x105/0x260
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.725378]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.727195]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.729016]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.730815]  [<ffffffff81061e89>] worker_thread+0x119/0x390
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.732622]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.734403]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.736193]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.737962]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.739730]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.741471] ---[ end trace 84e80bde4d2086f0 ]---
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.743215] pci_bus 0000:03: scanning bus
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.744993] pci 0000:03:00.0: [8086:1548] type 01 class 0x060400
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.746859] pci 0000:03:00.0: calling pci_fixup_transparent_bridge+0x0/0x30
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.748767] pci 0000:03:00.0: supports D1 D2
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.750433] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.752141] pci 0000:03:00.0: PME# disabled
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.753848] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.755500] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.757195] kobject_add_internal failed for 0000:03:00.0 (error: -2 parent: 0000:02:00.0)

And here it repeats for the next device and so on.
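
To see the repetition at a glance, the warnings can be grouped by source
location.  A rough sketch, assuming the "WARNING: CPU: ... at file:line func+..."
format shown in this log; count_warnings() is a hypothetical helper and the
sample holds only four of the warning lines quoted in this message.

```python
import re
from collections import Counter

# Example of a matching line:
# "WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()"
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+) ([\w.]+)\+")


def count_warnings(log_lines):
    """Count warnings per (file:line, function) pair (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Warning lines copied from the plug log quoted in this message.
sample_log = [
    "[  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()",
    "[  453.686450] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:711 pci_add_new_bus+0x3db/0x3e0()",
    "[  453.755500] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()",
    "[  453.826182] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:1397 pci_device_add+0x13c/0x140()",
]

counts = count_warnings(sample_log)
for (location, function), n in counts.most_common():
    print(n, location, function)
```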

> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.758885] Modules linked in:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.760589] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.762328] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.764082] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.765862]  0000000000000009 ffff88006de4d9c0 ffffffff818129e3 ffff88006de4da08
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.767661]  ffff88006de4d9f8 ffffffff81047228 ffff88006de170a8 00000000fffffffe
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.769476]  ffff88006de140a8 ffff88006de17098 ffff88006eac9000 ffff88006de4da58
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.771289] Call Trace:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.773091]  [<ffffffff818129e3>] dump_stack+0x45/0x56
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.774886]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.776712]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.778510]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.780332]  [<ffffffff8163ea05>] ? pci_conf1_read+0xb5/0x110
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.782134]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.783959]  [<ffffffff814558fe>] ? device_private_init+0x1e/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.785760]  [<ffffffff81455a61>] device_add+0x111/0x610
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.787573]  [<ffffffff812fb89d>] pci_device_add+0x10d/0x140
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.789362]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.791171]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.792958]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.794759]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.796535]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.798329]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.800097]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.801896]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.803671]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.805471]  [<ffffffff81316685>] hotplug_event+0x105/0x260
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.807246]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.809033]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.810801]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.812573]  [<ffffffff81061e89>] worker_thread+0x119/0x390
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.814307]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.816053]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.817755]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.819467]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.821144]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.822829] ---[ end trace 84e80bde4d2086f1 ]---
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.824500] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.826182] WARNING: CPU: 1 PID: 956 at drivers/pci/probe.c:1397 pci_device_add+0x13c/0x140()
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.827872] Modules linked in:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.829572] CPU: 1 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.831314] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.833072] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.834845]  0000000000000009 ffff88006de4db10 ffffffff818129e3 0000000000000000
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.836648]  ffff88006de4db48 ffffffff81047228 ffff88006de17000 ffff88006d582828
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.838466]  ffff88006de17098 0000000000000000 ffff88006eac9000 ffff88006de4db58
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.840280] Call Trace:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.842081]  [<ffffffff818129e3>] dump_stack+0x45/0x56
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.843877]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.845694]  [<ffffffff81047305>] warn_slowpath_null+0x15/0x20
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.847482]  [<ffffffff812fb8cc>] pci_device_add+0x13c/0x140
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.849290]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.851090]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.852897]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.854674]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.856470]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.858251]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.860044]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.861819]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.863612]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.865389]  [<ffffffff81316685>] hotplug_event+0x105/0x260
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.867187]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.868956]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.870739]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.872493]  [<ffffffff81061e89>] worker_thread+0x119/0x390
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.874268]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.876024]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.877794]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.879532]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.881273]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.882984] ---[ end trace 84e80bde4d2086f2 ]---
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.884782] pci 0000:03:03.0: [8086:1548] type 01 class 0x060400
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.886627] pci 0000:03:03.0: calling pci_fixup_transparent_bridge+0x0/0x30
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.888494] pci 0000:03:03.0: supports D1 D2
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.890141] pci 0000:03:03.0: PME# supported from D0 D1 D2 D3hot D3cold
> Jan 31 20:06:11 buildroot kern.debug kernel: [  453.891805] pci 0000:03:03.0: PME# disabled
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.893490] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.895118] WARNING: CPU: 3 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.896778] kobject_add_internal failed for 0000:03:03.0 (error: -2 parent: 0000:02:00.0)
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.898453] Modules linked in:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.900135] CPU: 3 PID: 956 Comm: kworker/u8:5 Tainted: G        W    3.13.0+ #156
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.901841] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.903569] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.905310]  0000000000000009 ffff88006de4d9c0 ffffffff818129e3 ffff88006de4da08
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.907080]  ffff88006de4d9f8 ffffffff81047228 ffff88006d6260a8 00000000fffffffe
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.908866]  ffff88006de140a8 ffff88006d626098 ffff88006eac9000 ffff88006de4da58
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.910654] Call Trace:
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.912412]  [<ffffffff818129e3>] dump_stack+0x45/0x56
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.914189]  [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.915980]  [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.917764]  [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.919557]  [<ffffffff8164189e>] ? raw_pci_read+0x1e/0x40
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.921344]  [<ffffffff812d88b5>] kobject_add+0x65/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.923127]  [<ffffffff814558fe>] ? device_private_init+0x1e/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.924910]  [<ffffffff81455a61>] device_add+0x111/0x610
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.926683]  [<ffffffff812fb89d>] pci_device_add+0x10d/0x140
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.928453]  [<ffffffff8180c011>] pci_scan_single_device+0x91/0xc0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.930234]  [<ffffffff812fb919>] pci_scan_slot+0x49/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.932004]  [<ffffffff812fc73d>] pci_scan_child_bus+0x3d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.933773]  [<ffffffff812fc53b>] pci_scan_bridge+0x46b/0x630
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.935636]  [<ffffffff812fb9dd>] ? pci_scan_slot+0x10d/0x150
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.937399]  [<ffffffff8180d116>] enable_slot+0xb6/0x320
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.939148]  [<ffffffff812fa273>] ? pci_bus_read_dev_vendor_id+0x23/0xe0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.940901]  [<ffffffff81315814>] ? trim_stale_devices+0xc4/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.942659]  [<ffffffff81315cf8>] acpiphp_check_bridge.part.9+0xe8/0x100
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.944419]  [<ffffffff81316685>] hotplug_event+0x105/0x260
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.946166]  [<ffffffff8131680a>] hotplug_event_work+0x2a/0x70
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.947917]  [<ffffffff8132fc09>] acpi_hotplug_work_fn+0x17/0x22
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.949648]  [<ffffffff8106128a>] process_one_work+0x17a/0x440
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.951366]  [<ffffffff81061e89>] worker_thread+0x119/0x390
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.953070]  [<ffffffff81061d70>] ? manage_workers.isra.25+0x2a0/0x2a0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.954764]  [<ffffffff81067dfd>] kthread+0xcd/0xf0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.956425]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.958084]  [<ffffffff81823a3c>] ret_from_fork+0x7c/0xb0
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.959727]  [<ffffffff81067d30>] ? kthread_create_on_node+0x180/0x180
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.961378] ---[ end trace 84e80bde4d2086f3 ]---
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.963016] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.964646] WARNING: CPU: 3 PID: 956 at drivers/pci/probe.c:1397 pci_device_add+0x13c/0x140()
> 
> and then it crashes.
> 
> The PCI tree looks like:
> 
> 00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
> 00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
> 00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
> 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
> 00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
> 00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
> 00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
> 00:1f.0 ISA bridge: Intel Corporation QS77 Express Chipset LPC Controller (rev 04)
> 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
> 00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
> 02:00.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 03:00.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 03:03.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 03:04.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 03:05.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 03:06.0 PCI bridge: Intel Corporation Device 1548 (rev 03)
> 05:00.0 PCI bridge: Intel Corporation Device 1513
> 06:03.0 PCI bridge: Intel Corporation Device 1513
> 06:04.0 PCI bridge: Intel Corporation Device 1513
> 06:05.0 PCI bridge: Intel Corporation Device 1513
> 07:00.0 SATA controller: Marvell Technology Group Ltd. Device 9182 (rev 11)
> 08:00.0 PCI bridge: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge]
> 09:00.0 PCI bridge: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge]
> 0a:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM57762 Gigabit Ethernet PCIe

Can you please check how the PCI sysfs directory structure changes after unplug
with Yinghai's commit present and with it reverted, and what the difference is?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 13:49                   ` Rafael J. Wysocki
@ 2014-01-31 13:49                     ` Mika Westerberg
  2014-01-31 16:41                       ` Mika Westerberg
  2014-02-01  3:44                       ` Yinghai Lu
  2014-01-31 14:05                     ` Rafael J. Wysocki
  1 sibling, 2 replies; 27+ messages in thread
From: Mika Westerberg @ 2014-01-31 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 02:49:21PM +0100, Rafael J. Wysocki wrote:
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
> 
> OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
> devices that go away after unplug.
> 
> Do I think correctly that the below doesn't happen with the Yinghai's commit
> reverted?

Yes, with that commit reverted everything works fine.

> Can you please check how the PCI sysfs directory structure changes after unplug
> with Yinghai's commit present and with it reverted, and what the difference is?

OK, I'll check that and report back.


* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 13:49                   ` Rafael J. Wysocki
  2014-01-31 13:49                     ` Mika Westerberg
@ 2014-01-31 14:05                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31 14:05 UTC (permalink / raw)
  To: Mika Westerberg, Yinghai Lu; +Cc: linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 02:49:21 PM Rafael J. Wysocki wrote:
> On Friday, January 31, 2014 02:36:07 PM Mika Westerberg wrote:
> > On Fri, Jan 31, 2014 at 12:52:43PM +0100, Rafael J. Wysocki wrote:
> > > So I think what happens is that we leak the struct pci_dev during removal and
> > > the proper cleanup is never done.
> > > 
> > > Can you please add a debug printk into pci_release_dev() and see if that's
> > > ever called after TBT unplug?
> > 
> > OK, I added the debug print (still on top of your two patches) and was able
> > to capture a bit more from /var/log/messages before it crashes. Here's the
> > log. I added dev_info(dev, "RELEASE\n") to pci_release_dev().
> > 
> > Unplug:
> > 
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.557920] pcieport 0000:06:03.0: PME# disabled
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.559483] pcieport 0000:05:00.0: PME# disabled
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.561074] pci 0000:07:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.562536] pci_bus 0000:07: busn_res: [bus 07] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.563993] pci 0000:06:03.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.570345] pci 0000:0a:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.571734] pci_bus 0000:0a: busn_res: [bus 0a] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.573154] pci 0000:09:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.574528] pci_bus 0000:09: busn_res: [bus 09-2e] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.575939] pci 0000:08:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.577316] pci_bus 0000:08: busn_res: [bus 08-2e] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.578721] pci 0000:06:04.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.580081] pci_bus 0000:2f: busn_res: [bus 2f] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.581487] pci 0000:06:05.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.582873] pci_bus 0000:06: busn_res: [bus 06-2f] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.584322] pci 0000:05:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.585727] pcieport 0000:03:00.0: PME# disabled
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.587225] pci_bus 0000:04: busn_res: [bus 04] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.588723] pci 0000:03:00.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.660389] pci_bus 0000:05: busn_res: [bus 05-2f] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.661993] pci 0000:03:03.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.663527] pci_bus 0000:30: busn_res: [bus 30-38] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.665103] pci 0000:03:04.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.666641] pci_bus 0000:39: busn_res: [bus 39] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.668210] pci 0000:03:05.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.669764] pci_bus 0000:3a: busn_res: [bus 3a] is released
> > Jan 31 20:05:57 buildroot kern.info kernel: [  439.671350] pci 0000:03:06.0: RELEASE
> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
> 
> OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
> devices that go away after unplug.
> 
> Do I think correctly that the below doesn't happen with the Yinghai's commit
> reverted?
> 
> > Plug:
> > 
> > Jan 31 20:06:11 buildroot kern.debug kernel: [  453.609684] acpiphp_glue: hotplug_event: Bus check notify on \_SB_.PCI0.RP05
> > Jan 31 20:06:11 buildroot kern.debug kernel: [  453.611339] acpiphp_glue: hotplug_event: re-enumerating slots under \_SB_.PCI0.RP05
> > Jan 31 20:06:11 buildroot kern.debug kernel: [  453.614625] pci 0000:02:00.0: scanning [bus 03-3a] behind bridge, pass 0
> > Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
> > Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
> > Jan 31 20:06:11 buildroot kern.warn kernel: [  453.619797] kobject_add_internal failed for pci_bus (error: -2 parent: 0000:02:00.0)
> 
> create_dir() fails here and that's not because it already exists.
> Interesting.

That's more interesting than I thought.

So the error is -2, which is -ENOENT.  Let's see when create_dir() returns -ENOENT,
then.

Evidently, it calls sysfs_create_dir_ns() and, if that returns a nonzero error
code, passes it straight back; otherwise it returns the return value of
populate_dir().

sysfs_create_dir_ns() tries to use kobj->parent->sd and returns -ENOENT when
that is NULL.  There you go.  So it looks like the sysfs dir of PCI device
0000:02:00.0 doesn't exist at this point.
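
To make that control flow concrete, here is a minimal userspace sketch of it.
It is not the kernel code: every sketch_* name and type below is a hypothetical
stand-in, and only the parent-directory check described above is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs directory entry. */
struct sketch_dirent {
	int unused;
};

/* Hypothetical stand-in for a kobject; only the relevant fields are kept. */
struct sketch_kobject {
	const char *name;
	struct sketch_kobject *parent;
	struct sketch_dirent *sd;	/* NULL when there is no sysfs directory */
	struct sketch_dirent own_sd;	/* backing storage for sd in this model */
};

static struct sketch_dirent sketch_root;

/* Models the check: a parent without a directory yields -ENOENT. */
static int sketch_sysfs_create_dir(struct sketch_kobject *kobj)
{
	struct sketch_dirent *parent_sd;

	parent_sd = kobj->parent ? kobj->parent->sd : &sketch_root;
	if (!parent_sd)
		return -ENOENT;

	kobj->sd = &kobj->own_sd;
	return 0;
}

/* Attribute population always succeeds in this model. */
static int sketch_populate_dir(struct sketch_kobject *kobj)
{
	(void)kobj;
	return 0;
}

/* Error from directory creation wins; otherwise report populate's result. */
static int sketch_create_dir(struct sketch_kobject *kobj)
{
	int error = sketch_sysfs_create_dir(kobj);

	if (error)
		return error;
	return sketch_populate_dir(kobj);
}

/* Scenario assumed from the log: the bridge loses its directory, then a
 * child is added under it. */
static int sketch_add_child_under_removed_parent(void)
{
	struct sketch_kobject bridge = { .name = "0000:02:00.0", .parent = NULL };
	struct sketch_kobject child = { .name = "pci_bus", .parent = &bridge };

	assert(sketch_create_dir(&bridge) == 0);	/* bridge added normally */
	bridge.sd = NULL;				/* directory gone after unplug */
	return sketch_create_dir(&child);		/* fails with -ENOENT */
}
```

In this model the child add fails purely because the parent's sd pointer is
NULL, independent of whether the child name already exists, which would match
the "error: -2" in the warning if the real parent is in that state.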

Yinghai, any ideas?

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 13:49                     ` Mika Westerberg
@ 2014-01-31 16:41                       ` Mika Westerberg
  2014-02-01  3:44                       ` Yinghai Lu
  1 sibling, 0 replies; 27+ messages in thread
From: Mika Westerberg @ 2014-01-31 16:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Yinghai Lu, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 03:49:25PM +0200, Mika Westerberg wrote:
> On Fri, Jan 31, 2014 at 02:49:21PM +0100, Rafael J. Wysocki wrote:
> > > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
> > 
> > OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
> > devices that go away after unplug.
> > 
> > Do I think correctly that the below doesn't happen with the Yinghai's commit
> > reverted?
> 
> Yes, with that commit reverted everything works fine.
> 
> > Can you please check how the PCI sysfs directory structure changes after unplug
> > with Yinghai's commit present and with it reverted, and what the difference is?
> 
> OK, I'll check that and report back.

I have now tried this with your two patches applied: I did hotplug/unplug
with and without Yinghai's patch and compared the resulting PCI sysfs
structure. Unfortunately, I didn't find any difference; the layout in both
cases is this:

/sys/bus/pci/devices:
0000:00:00.0@
0000:00:02.0@
0000:00:16.0@
0000:00:1a.0@
0000:00:1b.0@
0000:00:1c.0@
0000:00:1c.4@
0000:00:1d.0@
0000:00:1f.0@
0000:00:1f.2@
0000:00:1f.3@

Then I checked 0000:00:1c.4 as that's the root port that hosts the TBT
stuff:

0000:00:1c.4/
0000:00:1c.4/irq
0000:00:1c.4/subsystem_vendor
0000:00:1c.4/broken_parity_status
0000:00:1c.4/class
0000:00:1c.4/power
0000:00:1c.4/power/wakeup_abort_count
0000:00:1c.4/power/wakeup_active
0000:00:1c.4/power/wakeup_total_time_ms
0000:00:1c.4/power/wakeup_active_count
0000:00:1c.4/power/wakeup_max_time_ms
0000:00:1c.4/power/wakeup_count
0000:00:1c.4/power/wakeup_last_time_ms
0000:00:1c.4/power/wakeup
0000:00:1c.4/power/wakeup_expire_count
0000:00:1c.4/reset
0000:00:1c.4/resource
0000:00:1c.4/enabled
0000:00:1c.4/consistent_dma_mask_bits
0000:00:1c.4/modalias
0000:00:1c.4/dma_mask_bits
0000:00:1c.4/local_cpus
0000:00:1c.4/config
0000:00:1c.4/device
0000:00:1c.4/driver
0000:00:1c.4/subsystem
0000:00:1c.4/msi_bus
0000:00:1c.4/local_cpulist
0000:00:1c.4/remove
0000:00:1c.4/rescan
0000:00:1c.4/uevent
0000:00:1c.4/vendor
0000:00:1c.4/pci_bus
0000:00:1c.4/pci_bus/0000:02
0000:00:1c.4/pci_bus/0000:02/power
0000:00:1c.4/pci_bus/0000:02/device
0000:00:1c.4/pci_bus/0000:02/subsystem
0000:00:1c.4/pci_bus/0000:02/cpulistaffinity
0000:00:1c.4/pci_bus/0000:02/cpuaffinity
0000:00:1c.4/pci_bus/0000:02/rescan
0000:00:1c.4/pci_bus/0000:02/uevent
0000:00:1c.4/subsystem_device
0000:00:1c.4/numa_node
0000:00:1c.4/firmware_node

In both cases the structure is the same.


* [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
  2014-01-30 13:12 Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug Mika Westerberg
  2014-01-30 16:48 ` Yinghai Lu
@ 2014-01-31 23:34 ` Rafael J. Wysocki
  2014-02-01  1:56   ` Yinghai Lu
  1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-01-31 23:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, linux-pci, Yinghai Lu, Rafael J. Wysocki,
	Linus Torvalds, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    
Revert commit ef83b0781a73 "PCI: Remove from bus_list and release
resources in pci_release_dev()" that made some nasty race conditions
become possible.  For example, if a Thunderbolt link is unplugged
and then replugged immediately, the pci_release_dev() resulting from
the hot-remove code path may be racing with the hot-add code path
which after that commit causes various kinds of breakage to happen
(up to and including a hard crash of the whole system).

Moreover, the problem that commit ef83b0781a73 attempted to address
cannot happen any more after commit 8a4c5c329de7 "PCI: Check parent
kobject in pci_destroy_dev()", because pci_destroy_dev() will now
return immediately if it has already been executed for the given
device.

Fixes: ef83b0781a73 (PCI: Remove from bus_list and release resources in pci_release_dev())
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 04796c056d12..6e34498ec9f0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1208,18 +1208,6 @@ static void pci_release_capabilities(struct pci_dev *dev)
 	pci_free_cap_save_buffers(dev);
 }
 
-static void pci_free_resources(struct pci_dev *dev)
-{
-	int i;
-
-	pci_cleanup_rom(dev);
-	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-		struct resource *res = dev->resource + i;
-		if (res->parent)
-			release_resource(res);
-	}
-}
-
 /**
  * pci_release_dev - free a pci device structure when all users of it are finished.
  * @dev: device that's been disconnected
@@ -1229,14 +1217,9 @@ static void pci_free_resources(struct pci_dev *dev)
  */
 static void pci_release_dev(struct device *dev)
 {
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-
-	down_write(&pci_bus_sem);
-	list_del(&pci_dev->bus_list);
-	up_write(&pci_bus_sem);
-
-	pci_free_resources(pci_dev);
+	struct pci_dev *pci_dev;
 
+	pci_dev = to_pci_dev(dev);
 	pci_release_capabilities(pci_dev);
 	pci_release_of_node(pci_dev);
 	pcibios_release_device(pci_dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 4ff36bfa785e..b8c93c90daf5 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -3,6 +3,20 @@
 #include <linux/pci-aspm.h>
 #include "pci.h"
 
+static void pci_free_resources(struct pci_dev *dev)
+{
+	int i;
+
+	msi_remove_pci_irq_vectors(dev);
+
+	pci_cleanup_rom(dev);
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *res = dev->resource + i;
+		if (res->parent)
+			release_resource(res);
+	}
+}
+
 static void pci_stop_dev(struct pci_dev *dev)
 {
 	pci_pme_active(dev, false);
@@ -25,6 +39,11 @@ static void pci_destroy_dev(struct pci_dev *dev)
 
 	device_del(&dev->dev);
 
+	down_write(&pci_bus_sem);
+	list_del(&dev->bus_list);
+	up_write(&pci_bus_sem);
+
+	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
 



* Re: [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
  2014-01-31 23:34 ` [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()" Rafael J. Wysocki
@ 2014-02-01  1:56   ` Yinghai Lu
  2014-02-01 14:38     ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Yinghai Lu @ 2014-02-01  1:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Mika Westerberg, linux-pci, Rafael J. Wysocki,
	Linus Torvalds, Linux Kernel Mailing List

On Fri, Jan 31, 2014 at 3:34 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Revert commit ef83b0781a73 "PCI: Remove from bus_list and release
> resources in pci_release_dev()" that made some nasty race conditions
> become possible.  For example, if a Thunderbolt link is unplugged
> and then replugged immediately, the pci_release_dev() resulting from
> the hot-remove code path may be racing with the hot-add code path
> which after that commit causes various kinds of breakage to happen
> (up to and including a hard crash of the whole system).
>
> Moreover, the problem that commit ef83b0781a73 attempted to address
> cannot happen any more after commit 8a4c5c329de7 "PCI: Check parent
> kobject in pci_destroy_dev()", because pci_destroy_dev() will now
> return immediately if it has already been executed for the given
> device.
>
> Fixes: ef83b0781a73 (PCI: Remove from bus_list and release resources in pci_release_dev())
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 04796c056d12..6e34498ec9f0 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1208,18 +1208,6 @@ static void pci_release_capabilities(struct pci_dev *dev)
>         pci_free_cap_save_buffers(dev);
>  }
>
> -static void pci_free_resources(struct pci_dev *dev)
> -{
> -       int i;
> -
> -       pci_cleanup_rom(dev);
> -       for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> -               struct resource *res = dev->resource + i;
> -               if (res->parent)
> -                       release_resource(res);
> -       }
> -}
> -
>  /**
>   * pci_release_dev - free a pci device structure when all users of it are finished.
>   * @dev: device that's been disconnected
> @@ -1229,14 +1217,9 @@ static void pci_free_resources(struct pci_dev *dev)
>   */
>  static void pci_release_dev(struct device *dev)
>  {
> -       struct pci_dev *pci_dev = to_pci_dev(dev);
> -
> -       down_write(&pci_bus_sem);
> -       list_del(&pci_dev->bus_list);
> -       up_write(&pci_bus_sem);
> -
> -       pci_free_resources(pci_dev);
> +       struct pci_dev *pci_dev;
>
> +       pci_dev = to_pci_dev(dev);
>         pci_release_capabilities(pci_dev);
>         pci_release_of_node(pci_dev);
>         pcibios_release_device(pci_dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 4ff36bfa785e..b8c93c90daf5 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -3,6 +3,20 @@
>  #include <linux/pci-aspm.h>
>  #include "pci.h"
>
> +static void pci_free_resources(struct pci_dev *dev)
> +{
> +       int i;
> +
> +       msi_remove_pci_irq_vectors(dev);

Looks like you are in a rush. Why did you put back msi_remove_pci_irq_vectors()?

> +
> +       pci_cleanup_rom(dev);
> +       for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +               struct resource *res = dev->resource + i;
> +               if (res->parent)
> +                       release_resource(res);
> +       }
> +}
> +
>  static void pci_stop_dev(struct pci_dev *dev)
>  {
>         pci_pme_active(dev, false);
> @@ -25,6 +39,11 @@ static void pci_destroy_dev(struct pci_dev *dev)
>
>         device_del(&dev->dev);
>
> +       down_write(&pci_bus_sem);
> +       list_del(&dev->bus_list);
> +       up_write(&pci_bus_sem);
> +
> +       pci_free_resources(dev);
>         put_device(&dev->dev);
>  }
>
>

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-01-31 13:49                     ` Mika Westerberg
  2014-01-31 16:41                       ` Mika Westerberg
@ 2014-02-01  3:44                       ` Yinghai Lu
  2014-02-01  3:51                         ` Yinghai Lu
  1 sibling, 1 reply; 27+ messages in thread
From: Yinghai Lu @ 2014-02-01  3:44 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 5:49 AM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> On Fri, Jan 31, 2014 at 02:49:21PM +0100, Rafael J. Wysocki wrote:
>> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
>>
>> OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
>> devices that go away after unplug.
>>
>> Do I think correctly that the below doesn't happen with the Yinghai's commit
>> reverted?
>
> Yes, with that commit reverted everything works fine.

Can you clarify?

After my commit is reverted, does the warning below no longer happen?

Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.619797] kobject_add_internal failed for pci_bus (error: -2 parent: 0000:02:00.0)
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.621491] Modules linked in:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.623191] CPU: 1 PID: 956 Comm: kworker/u8:5 Not tainted 3.13.0+ #156
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.624912] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.626649] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.628395] 0000000000000009 ffff88006de4d9f8 ffffffff818129e3 ffff88006de4da40
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.630164] ffff88006de4da30 ffffffff81047228 ffff88006dfd1000 00000000fffffffe
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.631933] ffff88006de140a8 ffff88006d582918 ffff88006d582918 ffff88006de4da90
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.633691] Call Trace:
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.635428] [<ffffffff818129e3>] dump_stack+0x45/0x56
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.637138] [<ffffffff81047228>] warn_slowpath_common+0x78/0xa0
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.638879] [<ffffffff81047297>] warn_slowpath_fmt+0x47/0x50
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.640579] [<ffffffff812d81ad>] kobject_add_internal+0x12d/0x400
Jan 31 20:06:11 buildroot kern.warn kernel: [  453.642297] [<ffffffff812d88b5>] kobject_add+0x65/0xb0

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-02-01  3:44                       ` Yinghai Lu
@ 2014-02-01  3:51                         ` Yinghai Lu
  2014-02-01 14:35                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Yinghai Lu @ 2014-02-01  3:51 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Fri, Jan 31, 2014 at 7:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 31, 2014 at 5:49 AM, Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
>> On Fri, Jan 31, 2014 at 02:49:21PM +0100, Rafael J. Wysocki wrote:
>>> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
>>>
>>> OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
>>> devices that go away after unplug.
>>>
>>> Do I think correctly that the below doesn't happen with the Yinghai's commit
>>> reverted?
>>
>> Yes, with that commit reverted everything works fine.
>
> Can you clarify?
>
> After my commit is reverted, does the warning below no longer happen?
>
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
> Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()

Hi, Mika,

Can you try the attached partial revert?

Thanks

Yinghai

[-- Attachment #2: partial_revert.patch --]
[-- Type: text/x-patch, Size: 768 bytes --]

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 04796c0..dd91116 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1231,10 +1231,6 @@ static void pci_release_dev(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 
-	down_write(&pci_bus_sem);
-	list_del(&pci_dev->bus_list);
-	up_write(&pci_bus_sem);
-
 	pci_free_resources(pci_dev);
 
 	pci_release_capabilities(pci_dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 4ff36bf..c8264c4 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -25,6 +25,10 @@ static void pci_destroy_dev(struct pci_dev *dev)
 
 	device_del(&dev->dev);
 
+	down_write(&pci_bus_sem);
+	list_del(&dev->bus_list);
+	up_write(&pci_bus_sem);
+
 	put_device(&dev->dev);
 }
 

* Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug
  2014-02-01  3:51                         ` Yinghai Lu
@ 2014-02-01 14:35                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-02-01 14:35 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Mika Westerberg, linux-pci, Bjorn Helgaas, Rafael J. Wysocki

On Friday, January 31, 2014 07:51:49 PM Yinghai Lu wrote:
> 
> On Fri, Jan 31, 2014 at 7:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Jan 31, 2014 at 5:49 AM, Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> >> On Fri, Jan 31, 2014 at 02:49:21PM +0100, Rafael J. Wysocki wrote:
> >>> > Jan 31 20:05:57 buildroot kern.debug kernel: [  439.672933] pci_bus 0000:03: busn_res: [bus 03-3a] is released
> >>>
> >>> OK, so my guess wasn't right.  We seem to call pci_release_dev for all of the
> >>> devices that go away after unplug.
> >>>
> >>> Do I think correctly that the below doesn't happen with the Yinghai's commit
> >>> reverted?
> >>
> >> Yes, with that commit reverted everything works fine.
> >
> > Can you clarify?
> >
> > After my commit is reverted, does the warning below no longer happen?
> >
> > Jan 31 20:06:11 buildroot kern.warn kernel: [  453.616434] ------------[ cut here ]------------
> > Jan 31 20:06:11 buildroot kern.warn kernel: [  453.618102] WARNING: CPU: 1 PID: 956 at lib/kobject.c:244 kobject_add_internal+0x12d/0x400()
> 
> Hi, Mika,
> 
> Can you try the attached partial revert?

My mailer has mangled the patch, so the below has been copied by hand
from patchwork:

> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 04796c0..dd91116 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1231,10 +1231,6 @@ static void pci_release_dev(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  
> -	down_write(&pci_bus_sem);
> -	list_del(&pci_dev->bus_list);
> -	up_write(&pci_bus_sem);
> -
>  	pci_free_resources(pci_dev);
>  
>  	pci_release_capabilities(pci_dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 4ff36bf..c8264c4 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -25,6 +25,10 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  
>  	device_del(&dev->dev);
>  
> +	down_write(&pci_bus_sem);
> +	list_del(&dev->bus_list);
> +	up_write(&pci_bus_sem);
> +
>  	put_device(&dev->dev);
>  }

I've tried that, but it only helps partly.  The box doesn't crash any more,
but we get resource conflicts on replug (if that happens sufficiently quickly
after the preceding unplug).

That's why I sent a full revert.
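
The difference between the partial and the full revert can be pictured with a small user-space model in C. This is only a sketch under one assumption: that the replug conflict comes from the removed device's ranges staying claimed until its last reference is dropped. Every toy_* name is hypothetical and is not a kernel API; the real code paths are pci_destroy_dev() and pci_release_dev().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one address range inside a parent bridge window. */
struct toy_window {
	bool claimed;
};

/* Toy stand-in for a hot-pluggable device (hypothetical, not struct pci_dev). */
struct toy_dev {
	struct toy_window *win;	/* window the device's range lives in */
	int refcount;		/* models the struct device reference count */
	bool holds_range;	/* true while the range is still claimed */
	bool free_in_destroy;	/* true: free at destroy time; false: at release time */
};

/* Models hot-add: fails if the range is still held by someone else. */
static bool toy_claim(struct toy_dev *d, struct toy_window *w, bool free_in_destroy)
{
	if (w->claimed)
		return false;	/* resource conflict */
	w->claimed = true;
	d->win = w;
	d->refcount = 1;
	d->holds_range = true;
	d->free_in_destroy = free_in_destroy;
	return true;
}

/* Gives the range back to the window; safe to call more than once. */
static void toy_free_range(struct toy_dev *d)
{
	if (d->holds_range) {
		d->win->claimed = false;
		d->holds_range = false;
	}
}

/* Models put_device(): the release step runs only when the count hits zero. */
static void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		toy_free_range(d);
}

/* Models get_device(): some other code path keeps the object alive. */
static void toy_get(struct toy_dev *d)
{
	d->refcount++;
}

/* Models hot-remove: optionally frees the range before dropping its reference. */
static void toy_destroy(struct toy_dev *d)
{
	if (d->free_in_destroy)
		toy_free_range(d);
	toy_put(d);
}
```

With free_in_destroy set to false, a second toy_claim() on the same window fails for as long as anything still holds a reference to the removed device. With it set to true, the window is free again as soon as toy_destroy() returns, regardless of when the last reference goes away.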

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
  2014-02-01  1:56   ` Yinghai Lu
@ 2014-02-01 14:38     ` Rafael J. Wysocki
  2014-02-01 18:23       ` Linus Torvalds
  2014-02-01 18:48       ` Mika Westerberg
  0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2014-02-01 14:38 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas
  Cc: Mika Westerberg, linux-pci, Rafael J. Wysocki, Linus Torvalds,
	Linux Kernel Mailing List

On Friday, January 31, 2014 05:56:30 PM Yinghai Lu wrote:
> On Fri, Jan 31, 2014 at 3:34 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

[...]

> > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> > index 4ff36bfa785e..b8c93c90daf5 100644
> > --- a/drivers/pci/remove.c
> > +++ b/drivers/pci/remove.c
> > @@ -3,6 +3,20 @@
> >  #include <linux/pci-aspm.h>
> >  #include "pci.h"
> >
> > +static void pci_free_resources(struct pci_dev *dev)
> > +{
> > +       int i;
> > +
> > +       msi_remove_pci_irq_vectors(dev);
> 
> Looks like you are in a rush. Why did you put back msi_remove_pci_irq_vectors()?

I simply did "git revert" and that's the result.

Sorry about overlooking that, but your commit's changelog didn't mention
removing it either.

Updated revert follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"

Revert commit ef83b0781a73 "PCI: Remove from bus_list and release
resources in pci_release_dev()" that made some nasty race conditions
become possible.  For example, if a Thunderbolt link is unplugged
and then replugged immediately, the pci_release_dev() resulting from
the hot-remove code path may be racing with the hot-add code path
which after that commit causes various kinds of breakage to happen
(up to and including a hard crash of the whole system).

Moreover, the problem that commit ef83b0781a73 attempted to address
cannot happen any more after commit 8a4c5c329de7 "PCI: Check parent
kobject in pci_destroy_dev()", because pci_destroy_dev() will now
return immediately if it has already been executed for the given
device.

Note, however, that the invocation of msi_remove_pci_irq_vectors()
removed by commit ef83b0781a73 from pci_free_resources() along with
the other changes made by it is not added back because of subsequent
code changes depending on that modification.

Fixes: ef83b0781a73 (PCI: Remove from bus_list and release resources in pci_release_dev())
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

---
 drivers/pci/probe.c  |   21 ++-------------------
 drivers/pci/remove.c |   17 +++++++++++++++++
 2 files changed, 19 insertions(+), 19 deletions(-)

Index: linux-pm/drivers/pci/probe.c
===================================================================
--- linux-pm.orig/drivers/pci/probe.c
+++ linux-pm/drivers/pci/probe.c
@@ -1208,18 +1208,6 @@ static void pci_release_capabilities(str
 	pci_free_cap_save_buffers(dev);
 }
 
-static void pci_free_resources(struct pci_dev *dev)
-{
-	int i;
-
-	pci_cleanup_rom(dev);
-	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-		struct resource *res = dev->resource + i;
-		if (res->parent)
-			release_resource(res);
-	}
-}
-
 /**
  * pci_release_dev - free a pci device structure when all users of it are finished.
  * @dev: device that's been disconnected
@@ -1229,14 +1217,9 @@ static void pci_free_resources(struct pc
  */
 static void pci_release_dev(struct device *dev)
 {
-	struct pci_dev *pci_dev = to_pci_dev(dev);
-
-	down_write(&pci_bus_sem);
-	list_del(&pci_dev->bus_list);
-	up_write(&pci_bus_sem);
-
-	pci_free_resources(pci_dev);
+	struct pci_dev *pci_dev;
 
+	pci_dev = to_pci_dev(dev);
 	pci_release_capabilities(pci_dev);
 	pci_release_of_node(pci_dev);
 	pcibios_release_device(pci_dev);
Index: linux-pm/drivers/pci/remove.c
===================================================================
--- linux-pm.orig/drivers/pci/remove.c
+++ linux-pm/drivers/pci/remove.c
@@ -3,6 +3,18 @@
 #include <linux/pci-aspm.h>
 #include "pci.h"
 
+static void pci_free_resources(struct pci_dev *dev)
+{
+	int i;
+
+	pci_cleanup_rom(dev);
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *res = dev->resource + i;
+		if (res->parent)
+			release_resource(res);
+	}
+}
+
 static void pci_stop_dev(struct pci_dev *dev)
 {
 	pci_pme_active(dev, false);
@@ -25,6 +37,11 @@ static void pci_destroy_dev(struct pci_d
 
 	device_del(&dev->dev);
 
+	down_write(&pci_bus_sem);
+	list_del(&dev->bus_list);
+	up_write(&pci_bus_sem);
+
+	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
 


* Re: [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
  2014-02-01 14:38     ` Rafael J. Wysocki
@ 2014-02-01 18:23       ` Linus Torvalds
  2014-02-01 18:48       ` Mika Westerberg
  1 sibling, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2014-02-01 18:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Yinghai Lu, Bjorn Helgaas, Mika Westerberg, linux-pci,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Sat, Feb 1, 2014 at 6:38 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Updated revert follows.

I'm taking this directly, since I'll cut rc1 tomorrow (or maybe later today).

           Linus

* Re: [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
  2014-02-01 14:38     ` Rafael J. Wysocki
  2014-02-01 18:23       ` Linus Torvalds
@ 2014-02-01 18:48       ` Mika Westerberg
  1 sibling, 0 replies; 27+ messages in thread
From: Mika Westerberg @ 2014-02-01 18:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Yinghai Lu, Bjorn Helgaas, linux-pci, Rafael J. Wysocki,
	Linus Torvalds, Linux Kernel Mailing List

On Sat, Feb 01, 2014 at 03:38:29PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"
> 
> Revert commit ef83b0781a73 "PCI: Remove from bus_list and release
> resources in pci_release_dev()" that made some nasty race conditions
> become possible.  For example, if a Thunderbolt link is unplugged
> and then replugged immediately, the pci_release_dev() resulting from
> the hot-remove code path may be racing with the hot-add code path
> which after that commit causes various kinds of breakage to happen
> (up to and including a hard crash of the whole system).
> 
> Moreover, the problem that commit ef83b0781a73 attempted to address
> cannot happen any more after commit 8a4c5c329de7 "PCI: Check parent
> kobject in pci_destroy_dev()", because pci_destroy_dev() will now
> return immediately if it has already been executed for the given
> device.
> 
> Note, however, that the invocation of msi_remove_pci_irq_vectors()
> removed by commit ef83b0781a73 from pci_free_resources() along with
> the other changes made by it is not added back because of subsequent
> code changes depending on that modification.
> 
> Fixes: ef83b0781a73 (PCI: Remove from bus_list and release resources in pci_release_dev())
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks, that fixes the problem I'm seeing.
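
The changelog's point that pci_destroy_dev() now returns immediately on a second call can be illustrated with a toy guard in C. This is a sketch only: toy_node and toy_destroy_once() are hypothetical names, and using a cleared parent pointer as the "already destroyed" marker is an assumption drawn from the title of commit 8a4c5c329de7, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an object that is registered under a parent. */
struct toy_node {
	struct toy_node *parent;	/* non-NULL only while registered */
	int teardown_runs;		/* how many times teardown really ran */
};

/*
 * Tears the node down at most once. Returns true if this call did the
 * work, false if the node had already been destroyed earlier.
 */
static bool toy_destroy_once(struct toy_node *n)
{
	if (n->parent == NULL)
		return false;	/* already destroyed: nothing left to do */

	n->parent = NULL;	/* models unregistering from the parent */
	n->teardown_runs++;	/* list removal, resource freeing, etc. */
	return true;
}
```

Because the second call is a no-op, list removal and resource freeing can sit in the destroy path without being run twice for the same device.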

Thread overview: 27+ messages
2014-01-30 13:12 Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug Mika Westerberg
2014-01-30 16:48 ` Yinghai Lu
2014-01-30 16:56   ` Yinghai Lu
2014-01-30 23:39     ` Rafael J. Wysocki
2014-01-30 23:39       ` Yinghai Lu
2014-01-30 23:59         ` Rafael J. Wysocki
2014-01-31  0:38           ` Rafael J. Wysocki
2014-01-31  1:39             ` Yinghai Lu
2014-01-31 10:53             ` Mika Westerberg
2014-01-31 11:52               ` Rafael J. Wysocki
2014-01-31 12:36                 ` Mika Westerberg
2014-01-31 13:49                   ` Rafael J. Wysocki
2014-01-31 13:49                     ` Mika Westerberg
2014-01-31 16:41                       ` Mika Westerberg
2014-02-01  3:44                       ` Yinghai Lu
2014-02-01  3:51                         ` Yinghai Lu
2014-02-01 14:35                           ` Rafael J. Wysocki
2014-01-31 14:05                     ` Rafael J. Wysocki
2014-01-31  0:39         ` Rafael J. Wysocki
2014-01-31  1:04           ` Rafael J. Wysocki
2014-01-31  1:38           ` Yinghai Lu
2014-01-30 23:58       ` Rafael J. Wysocki
2014-01-31 23:34 ` [PATCH] Revert "PCI: Remove from bus_list and release resources in pci_release_dev()" Rafael J. Wysocki
2014-02-01  1:56   ` Yinghai Lu
2014-02-01 14:38     ` Rafael J. Wysocki
2014-02-01 18:23       ` Linus Torvalds
2014-02-01 18:48       ` Mika Westerberg
