linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Question about Hotplug and PME deadlock issue
@ 2019-02-27 11:52 Dongdong Liu
  2019-02-27 12:31 ` Lukas Wunner
  0 siblings, 1 reply; 3+ messages in thread
From: Dongdong Liu @ 2019-02-27 11:52 UTC (permalink / raw)
  To: Bjorn Helgaas, mika.westerberg, lukas; +Cc: Linux PCI, Linuxarm

Hi

We do some test to trigger hotplug while do sysfs "remove" operation,
then met a deadlock issue.

pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
echo 1 > 0000\:00\:0c.0/remove

PME and hotplug share an MSI/MSI-X vector.
The sysfs "remove" operation will call the below function.
remove_store()
   pci_stop_and_remove_bus_device_locked()
     pci_lock_rescan_remove()
     pci_stop_and_remove_bus_device()
       ...
       pcie_pme_remove()
         synchronize_irq()   // Here will wait for hotplug IRQ handler
     pci_unlock_rescan_remove();

Hotplug
pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_configure_device()
       pci_lock_rescan_remove() // Here will wait the pci_unlock_rescan_remove()

So met the deadlock issue. Any idea to solve such issue ?

2180.490703] INFO: task bash:10913 blocked for more than 120 seconds.
[ 2180.503428]       Tainted: P           OE     4.19.5-1.1.29.aarch64 #1
[ 2180.516496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2180.532176] bash            D    0 10913   6565 0x00000200

  # ps -ax |grep D
   PID TTY      STAT   TIME COMMAND
10913 ttyAMA0  Ds+    0:00 -bash
12945 ?        Ss     0:00 /usr/sbin/sshd -D
14022 ?        D      0:00 [irq/745-pciehp]
15972 pts/1    S+     0:00 grep --color=auto D
# cat /proc/14022/stack
[<0>] __switch_to+0x94/0xd8
[<0>] pci_lock_rescan_remove+0x20/0x28
[<0>] pciehp_configure_device+0x30/0x140
[<0>] pciehp_handle_presence_or_link_change+0x324/0x458
[<0>] pciehp_ist+0x1dc/0x1e0
[<0>] irq_thread_fn+0x30/0x90
[<0>] irq_thread+0x140/0x210
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff
  # cat /proc/10913/stack
[<0>] __switch_to+0x94/0xd8
[<0>] synchronize_irq+0x8c/0xc0
[<0>] pcie_pme_suspend+0xa4/0x118
[<0>] pcie_pme_remove+0x20/0x40
[<0>] pcie_port_remove_service+0x3c/0x58
[<0>] device_release_driver_internal+0x1b4/0x250
[<0>] device_release_driver+0x28/0x38
[<0>] bus_remove_device+0xd4/0x160
[<0>] device_del+0x128/0x348
[<0>] device_unregister+0x24/0x78
[<0>] remove_iter+0x48/0x58
[<0>] device_for_each_child+0x6c/0xb8
[<0>] pcie_port_device_remove+0x2c/0x48
[<0>] pcie_portdrv_remove+0x68/0x78
[<0>] pci_device_remove+0x48/0x120
[<0>] device_release_driver_internal+0x1b4/0x250
[<0>] device_release_driver+0x28/0x38
[<0>] pci_stop_bus_device+0x84/0xc0
[<0>] pci_stop_and_remove_bus_device_locked+0x24/0x40
[<0>] remove_store+0xa4/0xb8
[<0>] dev_attr_store+0x44/0x60
[<0>] sysfs_kf_write+0x58/0x80
[<0>] kernfs_fop_write+0xd8/0x1e0
[<0>] __vfs_write+0x60/0x190
[<0>] vfs_write+0xac/0x1c0
[<0>] ksys_write+0x6c/0xd8
[<0>] __arm64_sys_write+0x24/0x30
[<0>] el0_svc_common+0x78/0x100
[<0>] el0_svc_handler+0x38/0x88
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff

Thanks,
Dongdong



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Question about Hotplug and PME deadlock issue
  2019-02-27 11:52 Question about Hotplug and PME deadlock issue Dongdong Liu
@ 2019-02-27 12:31 ` Lukas Wunner
  2019-02-27 13:27   ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Lukas Wunner @ 2019-02-27 12:31 UTC (permalink / raw)
  To: Dongdong Liu
  Cc: Bjorn Helgaas, mika.westerberg, Linux PCI, Linuxarm, Keith Busch,
	Rafael J. Wysocki

On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote:
> We do some test to trigger hotplug while do sysfs "remove" operation,
> then met a deadlock issue.
> 
> pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> echo 1 > 0000\:00\:0c.0/remove
> 
> PME and hotplug share an MSI/MSI-X vector.
> The sysfs "remove" operation will call the below function.
> remove_store()
>   pci_stop_and_remove_bus_device_locked()
>     pci_lock_rescan_remove()
>     pci_stop_and_remove_bus_device()
>       ...
>       pcie_pme_remove()
>         synchronize_irq()   // Here will wait for hotplug IRQ handler
>     pci_unlock_rescan_remove();

This is (at least) a bug in the PME driver.  I'm not familiar with
that driver, so adding Keith Busch to cc.

The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which
calls synchronize_irq(), which waits for the hardirq handlers and
IRQ threads of all drivers sharing the IRQ to finish.

pcie_pme_remove() then calls free_irq() which *again* waits for a
running hardirq handler and thread to finish, however it does not
wait for those of drivers sharing the IRQ to finish since 4.19,
see commit 519cc8652b3a ("genirq: Synchronize only with single thread
on free_irq()").

I think the invocation of synchronize_irq() in pcie_pme_suspend() is
unnecessary when called from pcie_pme_remove(), it's been there since
the driver was added by Rafael with commit c7f486567c1d ("PCI PM: PCIe
PME root port service driver"), adding him to cc as well.

Thanks,
Lukas

> 
> Hotplug
> pciehp_ist()
>   pciehp_handle_presence_or_link_change()
>     pciehp_configure_device()
>       pci_lock_rescan_remove() // Here will wait the pci_unlock_rescan_remove()
> 
> So met the deadlock issue. Any idea to solve such issue ?
> 
> 2180.490703] INFO: task bash:10913 blocked for more than 120 seconds.
> [ 2180.503428]       Tainted: P           OE     4.19.5-1.1.29.aarch64 #1
> [ 2180.516496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2180.532176] bash            D    0 10913   6565 0x00000200
> 
>  # ps -ax |grep D
>   PID TTY      STAT   TIME COMMAND
> 10913 ttyAMA0  Ds+    0:00 -bash
> 12945 ?        Ss     0:00 /usr/sbin/sshd -D
> 14022 ?        D      0:00 [irq/745-pciehp]
> 15972 pts/1    S+     0:00 grep --color=auto D
> # cat /proc/14022/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] pci_lock_rescan_remove+0x20/0x28
> [<0>] pciehp_configure_device+0x30/0x140
> [<0>] pciehp_handle_presence_or_link_change+0x324/0x458
> [<0>] pciehp_ist+0x1dc/0x1e0
> [<0>] irq_thread_fn+0x30/0x90
> [<0>] irq_thread+0x140/0x210
> [<0>] kthread+0x134/0x138
> [<0>] ret_from_fork+0x10/0x1c
> [<0>] 0xffffffffffffffff
>  # cat /proc/10913/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] synchronize_irq+0x8c/0xc0
> [<0>] pcie_pme_suspend+0xa4/0x118
> [<0>] pcie_pme_remove+0x20/0x40
> [<0>] pcie_port_remove_service+0x3c/0x58
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] bus_remove_device+0xd4/0x160
> [<0>] device_del+0x128/0x348
> [<0>] device_unregister+0x24/0x78
> [<0>] remove_iter+0x48/0x58
> [<0>] device_for_each_child+0x6c/0xb8
> [<0>] pcie_port_device_remove+0x2c/0x48
> [<0>] pcie_portdrv_remove+0x68/0x78
> [<0>] pci_device_remove+0x48/0x120
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] pci_stop_bus_device+0x84/0xc0
> [<0>] pci_stop_and_remove_bus_device_locked+0x24/0x40
> [<0>] remove_store+0xa4/0xb8
> [<0>] dev_attr_store+0x44/0x60
> [<0>] sysfs_kf_write+0x58/0x80
> [<0>] kernfs_fop_write+0xd8/0x1e0
> [<0>] __vfs_write+0x60/0x190
> [<0>] vfs_write+0xac/0x1c0
> [<0>] ksys_write+0x6c/0xd8
> [<0>] __arm64_sys_write+0x24/0x30
> [<0>] el0_svc_common+0x78/0x100
> [<0>] el0_svc_handler+0x38/0x88
> [<0>] el0_svc+0x8/0xc
> [<0>] 0xffffffffffffffff
> 
> Thanks,
> Dongdong

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Question about Hotplug and PME deadlock issue
  2019-02-27 12:31 ` Lukas Wunner
@ 2019-02-27 13:27   ` Rafael J. Wysocki
  0 siblings, 0 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2019-02-27 13:27 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dongdong Liu, Bjorn Helgaas, mika.westerberg, Linux PCI,
	Linuxarm, Keith Busch

On Wednesday, February 27, 2019 1:31:59 PM CET Lukas Wunner wrote:
> On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote:
> > We do some test to trigger hotplug while do sysfs "remove" operation,
> > then met a deadlock issue.
> > 
> > pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> > echo 1 > 0000\:00\:0c.0/remove
> > 
> > PME and hotplug share an MSI/MSI-X vector.
> > The sysfs "remove" operation will call the below function.
> > remove_store()
> >   pci_stop_and_remove_bus_device_locked()
> >     pci_lock_rescan_remove()
> >     pci_stop_and_remove_bus_device()
> >       ...
> >       pcie_pme_remove()
> >         synchronize_irq()   // Here will wait for hotplug IRQ handler
> >     pci_unlock_rescan_remove();
> 
> This is (at least) a bug in the PME driver.  I'm not familiar with
> that driver, so adding Keith Busch to cc.
> 
> The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which
> calls synchronize_irq(), which waits for the hardirq handlers and
> IRQ threads of all drivers sharing the IRQ to finish.
> 
> pcie_pme_remove() then calls free_irq() which *again* waits for a
> running hardirq handler and thread to finish, however it does not
> wait for those of drivers sharing the IRQ to finish since 4.19,
> see commit 519cc8652b3a ("genirq: Synchronize only with single thread
> on free_irq()").
> 
> I think the invocation of synchronize_irq() in pcie_pme_suspend() is
> unnecessary when called from pcie_pme_remove(), it's been there since
> the driver was added by Rafael with commit c7f486567c1d ("PCI PM: PCIe
> PME root port service driver"), adding him to cc as well.

Thanks!

AFAICS, the only part of pcie_pme_suspend() that needs to be done on
remove is the one under data->lock.  I'll cut a patch to make that
change.

Cheers,
Rafael


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-02-27 13:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-27 11:52 Question about Hotplug and PME deadlock issue Dongdong Liu
2019-02-27 12:31 ` Lukas Wunner
2019-02-27 13:27   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).