linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <bhelgaas@google.com>
To: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Andreas Noever <andreas.noever@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
Date: Mon, 28 Oct 2013 21:30:28 -0600	[thread overview]
Message-ID: <CAErSpo6g5vR0NMs18YZBz+6RGZd0zNeDu_=1Pk4c5a+OqHfS1g@mail.gmail.com> (raw)
In-Reply-To: <20131016202123.GA17866@google.com>

On Wed, Oct 16, 2013 at 2:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
>> > [+cc Rafael, Mika, Kirill, linux-pci]
>> >
>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever
>> > <andreas.noever@gmail.com> wrote:
>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
>> > > crashes a few seconds later. Using
>> > > echo 1 > /sys/bus/pci/devices/0000:08:00.0/remove
>> > > to remove a bridge two levels above the device triggers the fault immediately:
>> >
>> > There have been significant changes in acpiphp related to Thunderbolt
>> > since v3.11.
>>
>> Apple don't expose Thunderbolt via ACPI, so it appears as native PCIe.
>> I'd be surprised if acpiphp makes a difference here.
>
> Yeah, you're right; I wasn't paying attention.
>
> We save a pci_dev pointer in the pci_pme_list, which of course has a
> longer lifetime than the pci_dev itself, but we don't acquire a reference
> on it, so I suspect the pci_dev got released before we got around to
> doing the pci_pme_list_scan().
>
> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
> should apply to v3.11, too.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ad7fc72..8b0a2f3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1580,6 +1580,7 @@ static void pci_pme_list_scan(struct work_struct *work)
>                                 pci_pme_wakeup(pme_dev->dev, NULL);
>                         } else {
>                                 list_del(&pme_dev->list);
> +                               pci_dev_put(pme_dev->dev);
>                                 kfree(pme_dev);
>                         }
>                 }
> @@ -1640,7 +1641,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
>                                           GFP_KERNEL);
>                         if (!pme_dev)
>                                 goto out;
> -                       pme_dev->dev = dev;
> +                       pme_dev->dev = pci_dev_get(dev);
>                         mutex_lock(&pci_pme_list_mutex);
>                         list_add(&pme_dev->list, &pci_pme_list);
>                         if (list_is_singular(&pci_pme_list))
> @@ -1652,6 +1653,7 @@ void pci_pme_active(struct pci_dev *dev, bool enable)
>                         list_for_each_entry(pme_dev, &pci_pme_list, list) {
>                                 if (pme_dev->dev == dev) {
>                                         list_del(&pme_dev->list);
> +                                       pci_dev_put(pme_dev->dev);
>                                         kfree(pme_dev);
>                                         break;
>                                 }

The patch above covered up the problem, but is incorrect.  The topology is:

  08:00.0 PCI bridge to [bus 09-0a] Thunderbolt Upstream Port
  09:00.0 PCI bridge to [bus 0a]    Thunderbolt Downstream Port
  0a:00.0 tg3 NIC

and the sequence leading to the crash is (edited for brevity):

  remove_store(08:00.0)
    pci_stop_and_remove_bus_device(08:00.0)     # Upstream Port
      pci_stop_bus_device(08:00.0)
        pci_stop_bus_device(09:00.0)            # Downstream Port
          pci_stop_bus_device(0a:00.0)          # tg3 device
            pci_stop_dev(0a:00.0)
              pci_pme_active(0a:00.0, false)    # remove from pci_pme_list
              device_del(0a:00.0)
                device_release_driver
                  tg3_remove_one
                    unregister_netdev
                      dev->netdev_ops->ndo_stop # tg3_close
                      tg3_close
                        pci_wake_from_d3
                          pci_pme_active(dev, true)     # add to pci_pme_list
      pci_remove_bus_device(08:00.0)
        pci_remove_bus_device(09:00.0)
          pci_remove_bus_device(0a:00.0)
            pci_destroy_dev(0a:00.0)
              put_device(0a:00.0)               # drop last tg3
pci_dev reference
              ...
              pci_release_dev                   # pci_dev release function
                kfree(0a:00.0)
  ...
  pci_pme_list_scan
    0a:00.0 still on list => use-after-free

The patch above avoids the crash by acquiring a reference when adding
to pci_pme_list, so when pci_destroy_dev() drops the reference, it's
not the *last* reference, so the pci_dev is not released.

The problem is that the reference acquired when we add to pci_pme_list
will *never* be dropped, so we leaked the pci_dev.

Bjorn

      parent reply	other threads:[~2013-10-29  3:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-14 22:47 [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan Andreas Noever
2013-10-14 23:50 ` Bjorn Helgaas
2013-10-15  2:44   ` Matthew Garrett
2013-10-16 20:21     ` Bjorn Helgaas
2013-10-17 13:59       ` Andreas Noever
2013-10-23  3:32         ` Bjorn Helgaas
2013-10-24  5:53           ` Yinghai Lu
2013-10-25  3:33             ` Bjorn Helgaas
2013-10-25  5:13               ` Yinghai Lu
2013-10-25  5:28                 ` Yinghai Lu
2013-10-25 23:01                 ` Bjorn Helgaas
2013-10-27  0:39                   ` Andreas Noever
2013-11-15 11:52               ` Mika Westerberg
2013-11-19  1:33                 ` Bjorn Helgaas
2013-11-19  1:54                   ` Yijing Wang
2013-11-19 17:18                     ` Bjorn Helgaas
2013-11-20  1:14                       ` Yijing Wang
2013-11-20  1:20                         ` Bjorn Helgaas
2013-11-20  1:39                           ` Yijing Wang
2013-11-19 10:06                   ` Mika Westerberg
2013-10-30  7:57             ` Yijing Wang
2013-10-31  6:48               ` Yinghai Lu
2013-10-23 23:53         ` Bjorn Helgaas
2013-10-29  3:30       ` Bjorn Helgaas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAErSpo6g5vR0NMs18YZBz+6RGZd0zNeDu_=1Pk4c5a+OqHfS1g@mail.gmail.com' \
    --to=bhelgaas@google.com \
    --cc=andreas.noever@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mjg59@srcf.ucam.org \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).