All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Lukas Wunner <lukas@wunner.de>,
	Keith Busch <keith.busch@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Frederick Lawler <fred@fredlawl.com>,
	"Gustavo A . R . Silva" <gustavo@embeddedor.com>,
	Sinan Kaya <okaya@kernel.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/2] PCI: pciehp: Prevent deadlock on disconnect
Date: Wed, 13 Nov 2019 13:15:19 +0200	[thread overview]
Message-ID: <20191113111519.GI34425@lahna.fi.intel.com> (raw)
In-Reply-To: <20191113031752.GA227753@google.com>

On Tue, Nov 12, 2019 at 09:17:52PM -0600, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2019 at 08:00:22PM +0300, Mika Westerberg wrote:
> > If there are more than one PCIe switch with hotplug downstream ports
> > hot-removing them leads to a following deadlock:
> 
> Does this happen if two sibling switches are removed simultaneously,
> or does this happen when switch A leads to switch B and you remove
> switch A (with obviously also removes switch B)?  I'm guessing the
> latter because it would be easier to reproduce.  And I guess that's
> obvious from the text below.

Yes, that's right.

> >  INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >  irq/126-pciehp  D    0   198      2 0x80000000
> >  Call Trace:
> >   __schedule+0x2a2/0x880
> >   schedule+0x2c/0x80
> >   schedule_timeout+0x246/0x350
> >   ? ttwu_do_activate+0x67/0x90
> >   wait_for_completion+0xb7/0x140
> >   ? wake_up_q+0x80/0x80
> >   kthread_stop+0x49/0x110
> >   __free_irq+0x15c/0x2a0
> >   free_irq+0x32/0x70
> >   pcie_shutdown_notification+0x2f/0x50
> >   pciehp_remove+0x27/0x50
> >   pcie_port_remove_service+0x36/0x50
> >   device_release_driver_internal+0x18c/0x250
> >   device_release_driver+0x12/0x20
> >   bus_remove_device+0xec/0x160
> >   device_del+0x13b/0x350
> >   ? pcie_port_find_device+0x60/0x60
> >   device_unregister+0x1a/0x60
> >   remove_iter+0x1e/0x30
> >   device_for_each_child+0x56/0x90
> >   pcie_port_device_remove+0x22/0x40
> >   pcie_portdrv_remove+0x20/0x60
> >   pci_device_remove+0x3e/0xc0
> >   device_release_driver_internal+0x18c/0x250
> >   device_release_driver+0x12/0x20
> >   pci_stop_bus_device+0x6f/0x90
> >   pci_stop_bus_device+0x31/0x90
> >   pci_stop_and_remove_bus_device+0x12/0x20
> >   pciehp_unconfigure_device+0x88/0x140
> >   pciehp_disable_slot+0x6a/0x110
> >   pciehp_handle_presence_or_link_change+0x263/0x400
> >   pciehp_ist+0x1c9/0x1d0
> >   ? irq_forced_thread_fn+0x80/0x80
> >   irq_thread_fn+0x24/0x60
> >   irq_thread+0xeb/0x190
> >   ? irq_thread_fn+0x60/0x60
> >   kthread+0x120/0x140
> >   ? irq_thread_check_affinity+0xf0/0xf0
> >   ? kthread_park+0x90/0x90
> >   ret_from_fork+0x35/0x40
> >  INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
> >  irq/190-pciehp  D    0  2288      2 0x80000000
> >  Call Trace:
> >   __schedule+0x2a2/0x880
> >   schedule+0x2c/0x80
> >   schedule_preempt_disabled+0xe/0x10
> >   __mutex_lock.isra.9+0x2e0/0x4d0
> >   ? __mutex_lock_slowpath+0x13/0x20
> >   __mutex_lock_slowpath+0x13/0x20
> >   mutex_lock+0x2c/0x30
> >   pci_lock_rescan_remove+0x15/0x20
> >   pciehp_unconfigure_device+0x4d/0x140
> >   pciehp_disable_slot+0x6a/0x110
> >   pciehp_handle_presence_or_link_change+0x263/0x400
> >   pciehp_ist+0x1c9/0x1d0
> >   ? irq_forced_thread_fn+0x80/0x80
> >   irq_thread_fn+0x24/0x60
> >   irq_thread+0xeb/0x190
> >   ? irq_thread_fn+0x60/0x60
> >   kthread+0x120/0x140
> >   ? irq_thread_check_affinity+0xf0/0xf0
> >   ? kthread_park+0x90/0x90
> >   ret_from_fork+0x35/0x40
> > 
> > What happens here is that the whole hierarchy is runtime resumed and the
> 
> What is the runtime resume connection here?  I do see that
> pcie_portdrv_remove() may call pm_runtime_forbid(), which looks like
> it resumes devices, but I don't see whether that's actually relevant
> to the deadlock.

When the parent port removes its children the driver core 
(__device_release_driver) calls pm_runtime_get_sync() for the device
which ends up calling the resume hook of the child pciehp.

> > parent PCIe downstream port, who got the hot-remove event, starts
> > removing devices below it taking pci_lock_rescan_remove() lock. When the
> > child PCIe port is runtime resumed it calls pciehp_check_presence()
> > which ends up calling pciehp_card_present() and pciehp_check_link_active().
> 
> Oh, I see, pciehp_resume() calls pciehp_check_presence(), which
> schedules the IRQ thread via pciehp_request().  So does this deadlock
> only happen if the port(s) have been runtime suspended?

I'm aware of two cases where this happens with a real hardware.

First one is that all involved ports are runtime suspended and you
unplug them. This can happen easily if the drivers involved
automatically enable runtime PM (xHCI for example does that).

The other more common case is that you suspend your laptop (close the
lid with the dock + something else connected. Then you unplug the dock
and after some time open the lid. Basically system suspend/resume cycle
so that you unplug the devices when it is suspended.

This patch helps for both cases above.

As Lukas commented there is also a bit more syntetic case where the
removal is done through sysfs in paraller to hot-removing the device. I
believe this one is still unfixed.

> > Both of these read their parts of PCIe config space by calling helper
> > function pcie_capability_read_word(). Now, this function notices that
> > the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
> > with the capability value set to 0.  When pciehp gets this value it
> > thinks that its child device is also hot-removed and schedules its IRQ
> > thread to handle the event.
> 
> The child device actually *has* been hot-removed, right? :)

Yes.

> In addition to checking for PCIBIOS_DEVICE_NOT_FOUND, you added checks
> for ~0:
> 
> > +	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> > +	if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
> > +		return -ENODEV;
> 
> That makes sense to me and I think it's the right thing to do.

So do I :)

> But I do wonder whether pcie_capability_read_word() is doing the wrong
> thing when it sets "*val = 0" in the error case.  I suspect that just
> complicates the callers for no good reason.  The callers could
> otherwise simply check for ~0 as a "this may be an error response"
> value.

I agree.

We discussed about that in v2 and you said that you would like to
explore removal of "*val = 0" but not in the context of this issue [1].

[1] https://patchwork.kernel.org/patch/11089973/#22961207

  parent reply	other threads:[~2019-11-13 11:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-29 17:00 [PATCH v3 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
2019-10-29 17:00 ` [PATCH v3 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
2019-11-13  3:17   ` Bjorn Helgaas
2019-11-13  6:35     ` Lukas Wunner
2019-11-13 11:15     ` Mika Westerberg [this message]
2019-11-12  9:32 ` [PATCH v3 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
2019-11-13 23:06 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191113111519.GI34425@lahna.fi.intel.com \
    --to=mika.westerberg@linux.intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=fred@fredlawl.com \
    --cc=gustavo@embeddedor.com \
    --cc=helgaas@kernel.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=okaya@kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.