* Question about Hotplug and PME deadlock issue
@ 2019-02-27 11:52 Dongdong Liu
2019-02-27 12:31 ` Lukas Wunner
0 siblings, 1 reply; 3+ messages in thread
From: Dongdong Liu @ 2019-02-27 11:52 UTC (permalink / raw)
To: Bjorn Helgaas, mika.westerberg, lukas; +Cc: Linux PCI, Linuxarm
Hi
We do some test to trigger hotplug while do sysfs "remove" operation,
then met a deadlock issue.
pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
echo 1 > 0000\:00\:0c.0/remove
PME and hotplug share an MSI/MSI-X vector.
The sysfs "remove" operation will call the below function.
remove_store()
pci_stop_and_remove_bus_device_locked()
pci_lock_rescan_remove()
pci_stop_and_remove_bus_device()
...
pcie_pme_remove()
synchronize_irq() // Here will wait for hotplug IRQ handler
pci_unlock_rescan_remove();
Hotplug
pciehp_ist()
pciehp_handle_presence_or_link_change()
pciehp_configure_device()
pci_lock_rescan_remove() // Here will wait the pci_unlock_rescan_remove()
So met the deadlock issue. Any idea to solve such issue ?
2180.490703] INFO: task bash:10913 blocked for more than 120 seconds.
[ 2180.503428] Tainted: P OE 4.19.5-1.1.29.aarch64 #1
[ 2180.516496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2180.532176] bash D 0 10913 6565 0x00000200
# ps -ax |grep D
PID TTY STAT TIME COMMAND
10913 ttyAMA0 Ds+ 0:00 -bash
12945 ? Ss 0:00 /usr/sbin/sshd -D
14022 ? D 0:00 [irq/745-pciehp]
15972 pts/1 S+ 0:00 grep --color=auto D
# cat /proc/14022/stack
[<0>] __switch_to+0x94/0xd8
[<0>] pci_lock_rescan_remove+0x20/0x28
[<0>] pciehp_configure_device+0x30/0x140
[<0>] pciehp_handle_presence_or_link_change+0x324/0x458
[<0>] pciehp_ist+0x1dc/0x1e0
[<0>] irq_thread_fn+0x30/0x90
[<0>] irq_thread+0x140/0x210
[<0>] kthread+0x134/0x138
[<0>] ret_from_fork+0x10/0x1c
[<0>] 0xffffffffffffffff
# cat /proc/10913/stack
[<0>] __switch_to+0x94/0xd8
[<0>] synchronize_irq+0x8c/0xc0
[<0>] pcie_pme_suspend+0xa4/0x118
[<0>] pcie_pme_remove+0x20/0x40
[<0>] pcie_port_remove_service+0x3c/0x58
[<0>] device_release_driver_internal+0x1b4/0x250
[<0>] device_release_driver+0x28/0x38
[<0>] bus_remove_device+0xd4/0x160
[<0>] device_del+0x128/0x348
[<0>] device_unregister+0x24/0x78
[<0>] remove_iter+0x48/0x58
[<0>] device_for_each_child+0x6c/0xb8
[<0>] pcie_port_device_remove+0x2c/0x48
[<0>] pcie_portdrv_remove+0x68/0x78
[<0>] pci_device_remove+0x48/0x120
[<0>] device_release_driver_internal+0x1b4/0x250
[<0>] device_release_driver+0x28/0x38
[<0>] pci_stop_bus_device+0x84/0xc0
[<0>] pci_stop_and_remove_bus_device_locked+0x24/0x40
[<0>] remove_store+0xa4/0xb8
[<0>] dev_attr_store+0x44/0x60
[<0>] sysfs_kf_write+0x58/0x80
[<0>] kernfs_fop_write+0xd8/0x1e0
[<0>] __vfs_write+0x60/0x190
[<0>] vfs_write+0xac/0x1c0
[<0>] ksys_write+0x6c/0xd8
[<0>] __arm64_sys_write+0x24/0x30
[<0>] el0_svc_common+0x78/0x100
[<0>] el0_svc_handler+0x38/0x88
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff
Thanks,
Dongdong
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Question about Hotplug and PME deadlock issue
2019-02-27 11:52 Question about Hotplug and PME deadlock issue Dongdong Liu
@ 2019-02-27 12:31 ` Lukas Wunner
2019-02-27 13:27 ` Rafael J. Wysocki
0 siblings, 1 reply; 3+ messages in thread
From: Lukas Wunner @ 2019-02-27 12:31 UTC (permalink / raw)
To: Dongdong Liu
Cc: Bjorn Helgaas, mika.westerberg, Linux PCI, Linuxarm, Keith Busch,
Rafael J. Wysocki
On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote:
> We do some test to trigger hotplug while do sysfs "remove" operation,
> then met a deadlock issue.
>
> pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> echo 1 > 0000\:00\:0c.0/remove
>
> PME and hotplug share an MSI/MSI-X vector.
> The sysfs "remove" operation will call the below function.
> remove_store()
> pci_stop_and_remove_bus_device_locked()
> pci_lock_rescan_remove()
> pci_stop_and_remove_bus_device()
> ...
> pcie_pme_remove()
> synchronize_irq() // Here will wait for hotplug IRQ handler
> pci_unlock_rescan_remove();
This is (at least) a bug in the PME driver. I'm not familiar with
that driver, so adding Keith Busch to cc.
The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which
calls synchronize_irq(), which waits for the hardirq handlers and
IRQ threads of all drivers sharing the IRQ to finish.
pcie_pme_remove() then calls free_irq() which *again* waits for a
running hardirq handler and thread to finish, however it does not
wait for those of drivers sharing the IRQ to finish since 4.19,
see commit 519cc8652b3a ("genirq: Synchronize only with single thread
on free_irq()").
I think the invocation of synchronize_irq() in pcie_pme_suspend() is
unnecessary when called from pcie_pme_remove(), it's been there since
the driver was added by Rafael with commit c7f486567c1d ("PCI PM: PCIe
PME root port service driver"), adding him to cc as well.
Thanks,
Lukas
>
> Hotplug
> pciehp_ist()
> pciehp_handle_presence_or_link_change()
> pciehp_configure_device()
> pci_lock_rescan_remove() // Here will wait the pci_unlock_rescan_remove()
>
> So met the deadlock issue. Any idea to solve such issue ?
>
> 2180.490703] INFO: task bash:10913 blocked for more than 120 seconds.
> [ 2180.503428] Tainted: P OE 4.19.5-1.1.29.aarch64 #1
> [ 2180.516496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2180.532176] bash D 0 10913 6565 0x00000200
>
> # ps -ax |grep D
> PID TTY STAT TIME COMMAND
> 10913 ttyAMA0 Ds+ 0:00 -bash
> 12945 ? Ss 0:00 /usr/sbin/sshd -D
> 14022 ? D 0:00 [irq/745-pciehp]
> 15972 pts/1 S+ 0:00 grep --color=auto D
> # cat /proc/14022/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] pci_lock_rescan_remove+0x20/0x28
> [<0>] pciehp_configure_device+0x30/0x140
> [<0>] pciehp_handle_presence_or_link_change+0x324/0x458
> [<0>] pciehp_ist+0x1dc/0x1e0
> [<0>] irq_thread_fn+0x30/0x90
> [<0>] irq_thread+0x140/0x210
> [<0>] kthread+0x134/0x138
> [<0>] ret_from_fork+0x10/0x1c
> [<0>] 0xffffffffffffffff
> # cat /proc/10913/stack
> [<0>] __switch_to+0x94/0xd8
> [<0>] synchronize_irq+0x8c/0xc0
> [<0>] pcie_pme_suspend+0xa4/0x118
> [<0>] pcie_pme_remove+0x20/0x40
> [<0>] pcie_port_remove_service+0x3c/0x58
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] bus_remove_device+0xd4/0x160
> [<0>] device_del+0x128/0x348
> [<0>] device_unregister+0x24/0x78
> [<0>] remove_iter+0x48/0x58
> [<0>] device_for_each_child+0x6c/0xb8
> [<0>] pcie_port_device_remove+0x2c/0x48
> [<0>] pcie_portdrv_remove+0x68/0x78
> [<0>] pci_device_remove+0x48/0x120
> [<0>] device_release_driver_internal+0x1b4/0x250
> [<0>] device_release_driver+0x28/0x38
> [<0>] pci_stop_bus_device+0x84/0xc0
> [<0>] pci_stop_and_remove_bus_device_locked+0x24/0x40
> [<0>] remove_store+0xa4/0xb8
> [<0>] dev_attr_store+0x44/0x60
> [<0>] sysfs_kf_write+0x58/0x80
> [<0>] kernfs_fop_write+0xd8/0x1e0
> [<0>] __vfs_write+0x60/0x190
> [<0>] vfs_write+0xac/0x1c0
> [<0>] ksys_write+0x6c/0xd8
> [<0>] __arm64_sys_write+0x24/0x30
> [<0>] el0_svc_common+0x78/0x100
> [<0>] el0_svc_handler+0x38/0x88
> [<0>] el0_svc+0x8/0xc
> [<0>] 0xffffffffffffffff
>
> Thanks,
> Dongdong
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Question about Hotplug and PME deadlock issue
2019-02-27 12:31 ` Lukas Wunner
@ 2019-02-27 13:27 ` Rafael J. Wysocki
0 siblings, 0 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2019-02-27 13:27 UTC (permalink / raw)
To: Lukas Wunner
Cc: Dongdong Liu, Bjorn Helgaas, mika.westerberg, Linux PCI,
Linuxarm, Keith Busch
On Wednesday, February 27, 2019 1:31:59 PM CET Lukas Wunner wrote:
> On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote:
> > We do some test to trigger hotplug while do sysfs "remove" operation,
> > then met a deadlock issue.
> >
> > pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
> > echo 1 > 0000\:00\:0c.0/remove
> >
> > PME and hotplug share an MSI/MSI-X vector.
> > The sysfs "remove" operation will call the below function.
> > remove_store()
> > pci_stop_and_remove_bus_device_locked()
> > pci_lock_rescan_remove()
> > pci_stop_and_remove_bus_device()
> > ...
> > pcie_pme_remove()
> > synchronize_irq() // Here will wait for hotplug IRQ handler
> > pci_unlock_rescan_remove();
>
> This is (at least) a bug in the PME driver. I'm not familiar with
> that driver, so adding Keith Busch to cc.
>
> The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which
> calls synchronize_irq(), which waits for the hardirq handlers and
> IRQ threads of all drivers sharing the IRQ to finish.
>
> pcie_pme_remove() then calls free_irq() which *again* waits for a
> running hardirq handler and thread to finish, however it does not
> wait for those of drivers sharing the IRQ to finish since 4.19,
> see commit 519cc8652b3a ("genirq: Synchronize only with single thread
> on free_irq()").
>
> I think the invocation of synchronize_irq() in pcie_pme_suspend() is
> unnecessary when called from pcie_pme_remove(), it's been there since
> the driver was added by Rafael with commit c7f486567c1d ("PCI PM: PCIe
> PME root port service driver"), adding him to cc as well.
Thanks!
AFAICS, the only part of pcie_pme_suspend() that needs to be done on
remove is the one under data->lock. I'll cut a patch to make that
change.
Cheers,
Rafael
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-02-27 13:28 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-27 11:52 Question about Hotplug and PME deadlock issue Dongdong Liu
2019-02-27 12:31 ` Lukas Wunner
2019-02-27 13:27 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).