* [PATCH 0/2] Hotplug interrupt and AER recovery deadlocks
@ 2020-06-15 14:32 Ian May
2020-06-15 14:32 ` [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock Ian May
2020-06-15 14:32 ` [PATCH 2/2] PCIe hotplug interrupt and AER pci_slot_mutex deadlock Ian May
0 siblings, 2 replies; 8+ messages in thread
From: Ian May @ 2020-06-15 14:32 UTC (permalink / raw)
To: linux-pci
We've encountered two deadlocks when the hotplug interrupt and AER
recovery trigger simultaneously on the same device. The first
deadlock is with:
Hotplug AER
---------------------------- ---------------------------
device_lock(&dev->dev) down_write(&ctrl->reset_lock)
down_read(&ctrl->reset_lock) device_lock(&dev->dev)
//Hotplug
[ 637.084657] Call Trace:
[ 637.084669] __schedule+0x2a8/0x670
[ 637.084670] schedule+0x33/0xa0
[ 637.084672] schedule_preempt_disabled+0xe/0x10
[ 637.084673] __mutex_lock.isra.9+0x26d/0x4e0
[ 637.084674] __mutex_lock_slowpath+0x13/0x20
[ 637.084675] ? __mutex_lock_slowpath+0x13/0x20
[ 637.084675] mutex_lock+0x2f/0x40 <---mutex_lock(&dev->mutex);
[ 637.084678] __device_driver_lock+0x2b/0x50
[ 637.084679] device_release_driver_internal+0x1f/0x1b0
[ 637.084680] device_release_driver+0x12/0x20
[ 637.084680] bus_remove_device+0xe1/0x150
[ 637.084682] device_del+0x16a/0x3a0
[ 637.084687] pci_remove_bus_device+0x7e/0x100
[ 637.084690] pci_stop_and_remove_bus_device+0x1a/0x20
[ 637.084691] pciehp_unconfigure_device+0x80/0x130
[ 637.084692] pciehp_disable_slot+0x6e/0x120
[ 637.084694] pciehp_handle_presence_or_link_change+0x76/0x470
[ 637.084696] pciehp_ist+0x14a/0x1e0 <---down_read(&ctrl->reset_lock);
//AER
[ 637.110208] Call Trace:
[ 637.110209] __schedule+0x2a8/0x670
[ 637.110213] ? sched_clock+0x9/0x10
[ 637.110222] schedule+0x33/0xa0
[ 637.110230] rwsem_down_write_slowpath+0x233/0x4a0
[ 637.110237] ? __schedule+0x2b0/0x670
[ 637.110239] down_write+0x3d/0x40
[ 637.110241] ? down_write+0x3d/0x40 <---down_write(&ctrl->reset_lock)
[ 637.110245] pciehp_reset_slot+0x58/0x150
[ 637.110250] pci_reset_hotplug_slot+0x43/0x70
[ 637.110256] pci_slot_reset+0x12f/0x170 <---mutex_lock(&dev->mutex);
[ 637.110264] pci_bus_error_reset+0xac/0xd0
[ 637.110266] pcie_do_recovery+0x255/0x2c0
[ 637.110267] aer_recover_work_func+0xd2/0x100
[ 637.110270] process_one_work+0x1fd/0x3f0
[ 637.110279] worker_thread+0x34/0x410
[ 637.110287] kthread+0x121/0x140
[ 637.110288] ? process_one_work+0x3f0/0x3f0
[ 637.110289] ? kthread_park+0xb0/0xb0
[ 637.110290] ret_from_fork+0x22/0x40
Applying the first patch then exposes the following 2nd deadlock:
Hotplug AER
---------------------------- ---------------------------
down_read(&ctrl->reset_lock) mutex_lock(&pci_slot_mutex)
mutex_lock(&pci_slot_mutex) down_write(&ctrl->reset_lock)
//Hotplug
[ 391.250990] INFO: task irq/42-pciehp:1960 blocked for more than 120 seconds.
[ 391.259086] Tainted: P OE 5.3.0-51-generic #44~18.04.2
[ 391.266982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 391.275977] irq/42-pciehp D 0 1960 2 0x80004000
[ 391.275979] Call Trace:
[ 391.275986] __schedule+0x2b9/0x6c0
[ 391.275987] ? __switch_to_asm+0x34/0x70
[ 391.275989] schedule+0x42/0xb0
[ 391.275992] schedule_preempt_disabled+0xe/0x10
[ 391.275997] __mutex_lock.isra.0+0x182/0x4f0
[ 391.276003] ? raw_pci_read+0x29/0x40
[ 391.276005] ? pci_read+0x2c/0x30
[ 391.276010] ? irq_finalize_oneshot.part.0+0xf0/0xf0
[ 391.276015] __mutex_lock_slowpath+0x13/0x20
[ 391.276017] mutex_lock+0x2e/0x40 <---pci_slot_mutex
[ 391.276021] pci_dev_assign_slot+0x19/0x5f
[ 391.276025] pci_setup_device+0xc1/0x150
[ 391.276029] ? online_show+0x20/0x70
[ 391.276034] ? irq_finalize_oneshot.part.0+0xf0/0xf0
[ 391.276035] pci_scan_single_device+0x95/0xd0
[ 391.276041] pci_scan_slot+0x4a/0x110
[ 391.276042] pciehp_configure_device+0x40/0x120
[ 391.276044] pciehp_handle_presence_or_link_change+0x184/0x290
[ 391.276049] pciehp_ist+0x141/0x150 <---ctrl->reset_lock
[ 391.276055] irq_thread_fn+0x28/0x60
[ 391.276056] irq_thread+0xda/0x170
[ 391.276057] ? irq_forced_thread_fn+0x80/0x80
[ 391.276059] kthread+0x104/0x140
[ 391.276060] ? irq_thread_check_affinity+0xf0/0xf0
[ 391.276061] ? kthread_park+0x80/0x80
[ 391.276062] ret_from_fork+0x22/0x40
//AER
[ 391.225984] INFO: task kworker/39:1:1593 blocked for more than 120 seconds.
[ 391.233989] Tainted: P OE 5.3.0-51-generic #44~18.04.2
[ 391.241887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 391.250882] kworker/39:1 D 0 1593 2 0x80004000
[ 391.250890] Workqueue: events aer_recover_work_func
[ 391.250893] Call Trace:
[ 391.250899] __schedule+0x2b9/0x6c0
[ 391.250902] ? sched_clock+0x9/0x10
[ 391.250903] schedule+0x42/0xb0
[ 391.250906] rwsem_down_write_slowpath+0x23a/0x4c0
[ 391.250908] down_write+0x41/0x50
[ 391.250909] pciehp_reset_lock+0x12/0x20
[ 391.250911] pci_slot_reset+0xa1/0x170 <---ctrl->reset_lock
[ 391.250912] pci_bus_error_reset+0xfb/0x130 <---pci_slot_mutex
[ 391.250914] pcie_do_recovery+0x238/0x264
[ 391.250917] aer_recover_work_func.cold+0x9a/0xa0
[ 391.250919] process_one_work+0x1da/0x390
[ 391.250920] worker_thread+0x4d/0x400
[ 391.250923] kthread+0x104/0x140
[ 391.250924] ? process_one_work+0x390/0x390
[ 391.250925] ? kthread_park+0x80/0x80
[ 391.250927] ret_from_fork+0x22/0x40
The 2nd patch is unlocking and locking the pci_slot mutex in the AER path around
the controller reset call. This allows for both the Hotplug and AER execution
to complete.
Ian May (2):
PCIe hotplug interrupt and AER deadlock with reset_lock and
device_lock
PCIe hotplug interrupt and AER pci_slot_mutex deadlock
drivers/pci/hotplug/pciehp.h | 2 ++
drivers/pci/hotplug/pciehp_core.c | 2 ++
drivers/pci/hotplug/pciehp_hpc.c | 21 ++++++++++++++++++---
drivers/pci/pci.c | 8 ++++++++
include/linux/pci_hotplug.h | 2 ++
5 files changed, 32 insertions(+), 3 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
2020-06-15 14:32 [PATCH 0/2] Hotplug interrupt and AER recovery deadlocks Ian May
@ 2020-06-15 14:32 ` Ian May
2020-06-15 18:56 ` Lukas Wunner
2020-06-15 14:32 ` [PATCH 2/2] PCIe hotplug interrupt and AER pci_slot_mutex deadlock Ian May
1 sibling, 1 reply; 8+ messages in thread
From: Ian May @ 2020-06-15 14:32 UTC (permalink / raw)
To: linux-pci
Currently when a hotplug interrupt and AER recovery triggers simultaneously
the following deadlock can occur.
Hotplug AER
---------------------------- ---------------------------
device_lock(&dev->dev) down_write(&ctrl->reset_lock)
down_read(&ctrl->reset_lock) device_lock(&dev->dev)
This patch adds a reset_lock and reset_unlock hotplug_slot_op. This would allow
the controller reset_lock/reset_unlock to be moved from pciehp_reset_slot to
pci_slot_reset function allowing the controller reset_lock to be acquired before
the device lock allowing for both hotplug and AER to grab the reset_lock
and device lock in proper order.
Signed-off-by: Ian May <ian.may@canonical.com>
---
drivers/pci/hotplug/pciehp.h | 2 ++
drivers/pci/hotplug/pciehp_core.c | 2 ++
drivers/pci/hotplug/pciehp_hpc.c | 21 ++++++++++++++++++---
drivers/pci/pci.c | 6 ++++++
include/linux/pci_hotplug.h | 2 ++
5 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index e28b4fffd84d..1e98604cec83 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -183,6 +183,8 @@ void pciehp_release_ctrl(struct controller *ctrl);
int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot);
+int pciehp_reset_lock(struct hotplug_slot *hotplug_slot);
+int pciehp_reset_unlock(struct hotplug_slot *hotplug_slot);
int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe);
int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 4c032d75c874..a8da3e6a59b8 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -63,6 +63,8 @@ static int init_slot(struct controller *ctrl)
ops->get_power_status = get_power_status;
ops->get_adapter_status = get_adapter_status;
ops->reset_slot = pciehp_reset_slot;
+ ops->reset_lock = pciehp_reset_lock;
+ ops->reset_unlock = pciehp_reset_unlock;
if (MRL_SENS(ctrl))
ops->get_latch_status = get_latch_status;
if (ATTN_LED(ctrl)) {
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 81b23f07719e..185ec9d1b0d0 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -798,6 +798,24 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+int pciehp_reset_lock(struct hotplug_slot *hotplug_slot)
+{
+ struct controller *ctrl = to_ctrl(hotplug_slot);
+
+ down_write(&ctrl->reset_lock);
+
+ return 0;
+}
+
+int pciehp_reset_unlock(struct hotplug_slot *hotplug_slot)
+{
+ struct controller *ctrl = to_ctrl(hotplug_slot);
+
+ up_write(&ctrl->reset_lock);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
@@ -816,8 +834,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
if (probe)
return 0;
- down_write(&ctrl->reset_lock);
-
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
stat_mask |= PCI_EXP_SLTSTA_PDC;
@@ -836,7 +852,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
- up_write(&ctrl->reset_lock);
return rc;
}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index da21293f1111..e32c5a1a706e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5228,14 +5228,20 @@ static int pci_slot_reset(struct pci_slot *slot, int probe)
return -ENOTTY;
if (!probe)
+ {
+ slot->hotplug->ops->reset_lock(slot->hotplug);
pci_slot_lock(slot);
+ }
might_sleep();
rc = pci_reset_hotplug_slot(slot->hotplug, probe);
if (!probe)
+ {
pci_slot_unlock(slot);
+ slot->hotplug->ops->reset_unlock(slot->hotplug);
+ }
return rc;
}
diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
index f694eb2ca978..fce5ad979346 100644
--- a/include/linux/pci_hotplug.h
+++ b/include/linux/pci_hotplug.h
@@ -45,6 +45,8 @@ struct hotplug_slot_ops {
int (*get_latch_status) (struct hotplug_slot *slot, u8 *value);
int (*get_adapter_status) (struct hotplug_slot *slot, u8 *value);
int (*reset_slot) (struct hotplug_slot *slot, int probe);
+ int (*reset_lock) (struct hotplug_slot *slot);
+ int (*reset_unlock) (struct hotplug_slot *slot);
};
/**
--
2.25.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] PCIe hotplug interrupt and AER pci_slot_mutex deadlock
2020-06-15 14:32 [PATCH 0/2] Hotplug interrupt and AER recovery deadlocks Ian May
2020-06-15 14:32 ` [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock Ian May
@ 2020-06-15 14:32 ` Ian May
1 sibling, 0 replies; 8+ messages in thread
From: Ian May @ 2020-06-15 14:32 UTC (permalink / raw)
To: linux-pci
In the AER recovery code path, pci_bus_error_reset acquires the pci_slot_mutex,
unlock and relock in call to pci_slot_reset around controller reset to lock in
same order as the hotplug code path.
Hotplug AER
---------------------------- ---------------------------
down_read(&ctrl->reset_lock) mutex_lock(&pci_slot_mutex)
mutex_lock(&pci_slot_mutex) down_write(&ctrl->reset_lock)
Signed-off-by: Ian May <ian.may@canonical.com>
---
drivers/pci/pci.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e32c5a1a706e..4eeca8b18664 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5229,6 +5229,7 @@ static int pci_slot_reset(struct pci_slot *slot, int probe)
if (!probe)
{
+ mutex_unlock(&pci_slot_mutex);
slot->hotplug->ops->reset_lock(slot->hotplug);
pci_slot_lock(slot);
}
@@ -5241,6 +5242,7 @@ static int pci_slot_reset(struct pci_slot *slot, int probe)
{
pci_slot_unlock(slot);
slot->hotplug->ops->reset_unlock(slot->hotplug);
+ mutex_lock(&pci_slot_mutex);
}
return rc;
--
2.25.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
2020-06-15 14:32 ` [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock Ian May
@ 2020-06-15 18:56 ` Lukas Wunner
2020-06-16 17:13 ` Ian May
2020-07-14 23:58 ` Bjorn Helgaas
0 siblings, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-06-15 18:56 UTC (permalink / raw)
To: Ian May; +Cc: linux-pci
On Mon, Jun 15, 2020 at 09:32:49AM -0500, Ian May wrote:
> Currently when a hotplug interrupt and AER recovery triggers simultaneously
> the following deadlock can occur.
>
> Hotplug AER
> ---------------------------- ---------------------------
> device_lock(&dev->dev) down_write(&ctrl->reset_lock)
> down_read(&ctrl->reset_lock) device_lock(&dev->dev)
>
> This patch adds a reset_lock and reset_unlock hotplug_slot_op.
> This would allow the controller reset_lock/reset_unlock to be moved
> from pciehp_reset_slot to pci_slot_reset function allowing the controller
> reset_lock to be acquired before the device lock allowing for both hotplug
> and AER to grab the reset_lock and device lock in proper order.
I've posted a patch to address such issues on Mar 31, just haven't
gotten around to respin it with a proper commit message:
https://lore.kernel.org/linux-pci/20200331130139.46oxbade6rcbaicb@wunner.de/
I've solved it by moving the reset lock into struct pci_slot.
I think that's simpler than adding two callbacks.
Do you think the AER deadlock could be fixed based on my approach?
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
2020-06-15 18:56 ` Lukas Wunner
@ 2020-06-16 17:13 ` Ian May
2020-07-17 5:20 ` Lukas Wunner
2020-07-14 23:58 ` Bjorn Helgaas
1 sibling, 1 reply; 8+ messages in thread
From: Ian May @ 2020-06-16 17:13 UTC (permalink / raw)
To: Lukas Wunner; +Cc: linux-pci
Hi Lukas,
Thanks for the quick reply! I like your solution and have confirmed it
solves the first deadlock we see between the Hotplug interrupt and AER
recovery.
Thanks,
Ian
On 6/15/20 1:56 PM, Lukas Wunner wrote:
> On Mon, Jun 15, 2020 at 09:32:49AM -0500, Ian May wrote:
>> Currently when a hotplug interrupt and AER recovery triggers simultaneously
>> the following deadlock can occur.
>>
>> Hotplug AER
>> ---------------------------- ---------------------------
>> device_lock(&dev->dev) down_write(&ctrl->reset_lock)
>> down_read(&ctrl->reset_lock) device_lock(&dev->dev)
>>
>> This patch adds a reset_lock and reset_unlock hotplug_slot_op.
>> This would allow the controller reset_lock/reset_unlock to be moved
>> from pciehp_reset_slot to pci_slot_reset function allowing the controller
>> reset_lock to be acquired before the device lock allowing for both hotplug
>> and AER to grab the reset_lock and device lock in proper order.
> I've posted a patch to address such issues on Mar 31, just haven't
> gotten around to respin it with a proper commit message:
>
> https://lore.kernel.org/linux-pci/20200331130139.46oxbade6rcbaicb@wunner.de/
>
> I've solved it by moving the reset lock into struct pci_slot.
> I think that's simpler than adding two callbacks.
>
> Do you think the AER deadlock could be fixed based on my approach?
>
> Thanks,
>
> Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
2020-06-15 18:56 ` Lukas Wunner
2020-06-16 17:13 ` Ian May
@ 2020-07-14 23:58 ` Bjorn Helgaas
1 sibling, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2020-07-14 23:58 UTC (permalink / raw)
To: Lukas Wunner; +Cc: Ian May, linux-pci
On Mon, Jun 15, 2020 at 08:56:50PM +0200, Lukas Wunner wrote:
> On Mon, Jun 15, 2020 at 09:32:49AM -0500, Ian May wrote:
> > Currently when a hotplug interrupt and AER recovery triggers simultaneously
> > the following deadlock can occur.
> >
> > Hotplug AER
> > ---------------------------- ---------------------------
> > device_lock(&dev->dev) down_write(&ctrl->reset_lock)
> > down_read(&ctrl->reset_lock) device_lock(&dev->dev)
> >
> > This patch adds a reset_lock and reset_unlock hotplug_slot_op.
> > This would allow the controller reset_lock/reset_unlock to be moved
> > from pciehp_reset_slot to pci_slot_reset function allowing the controller
> > reset_lock to be acquired before the device lock allowing for both hotplug
> > and AER to grab the reset_lock and device lock in proper order.
>
> I've posted a patch to address such issues on Mar 31, just haven't
> gotten around to respin it with a proper commit message:
>
> https://lore.kernel.org/linux-pci/20200331130139.46oxbade6rcbaicb@wunner.de/
>
> I've solved it by moving the reset lock into struct pci_slot.
> I think that's simpler than adding two callbacks.
>
> Do you think the AER deadlock could be fixed based on my approach?
How should we proceed on this?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
2020-06-16 17:13 ` Ian May
@ 2020-07-17 5:20 ` Lukas Wunner
[not found] ` <CAE1ug=a8PJpKh0Jx2ZZxo5kwQvvK883xs+mzyMWbW8T1oqyKDg@mail.gmail.com>
0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2020-07-17 5:20 UTC (permalink / raw)
To: Ian May; +Cc: linux-pci
On Tue, Jun 16, 2020 at 12:13:23PM -0500, Ian May wrote:
> Thanks for the quick reply! I like your solution and have confirmed it
> solves the first deadlock we see between the Hotplug interrupt and AER
> recovery.
Thank you for the confirmation (and sorry for the delay). I'm cooking
up a proper patch right now.
One question regarding your patch [2/2]: If, instead of this patch,
you change pci_bus_error_reset() to call "device_lock(bridge)" rather
than "mutex_lock(&pci_slot_mutex)", do you still see deadlocks?
Taking the pci_slot_mutex in pci_bus_error_reset() was actually the
right thing to do because it holds the driver of the hotplug port
in place. (The hotplug port above the bus being reset.) Without
that, dereferencing slot->hotplug in pci_slot_reset() wouldn't be
safe. My fear is that acquiring the device_lock() of the bridge
leading to the bus being reset may cause other deadlocks, in particular
in cascaded topologies such as Thunderbolt, which I suspect may be
what you're dealing with.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock
[not found] ` <CAE1ug=a8PJpKh0Jx2ZZxo5kwQvvK883xs+mzyMWbW8T1oqyKDg@mail.gmail.com>
@ 2020-08-01 16:14 ` Lukas Wunner
0 siblings, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-08-01 16:14 UTC (permalink / raw)
To: Ian May; +Cc: linux-pci, Keith Busch
[cc += Keith]
On Fri, Jul 17, 2020 at 09:02:22AM -0500, Ian May wrote:
> I do now have a "better" patch that I was going to submit to the list
> where I converted the pci_slot_mutex to a rw_semaphore. Do you see
> any potential problems with changing the lock type? I attached the
> patch if you are interested in checking it over.
The question is, if pci_slot_mutex is an rw_semaphore, can it happen
that pciehp acquires it for writing, provoking a deadlock like this:
Hotplug AER
---------------------------- ---------------------------
1 down_read(&ctrl->reset_lock)
2 down_read(&pci_slot_mutex)
3 down_write(&pci_slot_mutex)
4 down_write(&ctrl->reset_lock)
** DEADLOCK **
I think this can happen if the device inserted into the hotplug slot
contains a PCIe switch which itself has hotplug ports. That's the
case with Thunderbolt: Every Thunderbolt device contains a PCIe
switch with hotplug ports to extend the Thunderbolt chain. E.g.
the PCIe hierarchy looks like this for a Thunderbolt host controller
with a chain of two devices:
Root - Upstream - Downstream - Upstream - Downstream - Upstream - Downstream
(host ...) (1st device ...) (2nd device ...)
When a Thunderbolt device is attached, pci_slot_mutex would be taken
for writing in pci_create_slot():
pciehp_configure_device()
pci_scan_slot()
pci_scan_single_device()
pci_scan_device()
pci_setup_device()
pci_dev_assign_slot() # acquire pci_slot_mutex for reading
pci_device_add() # match_driver = false; device_add()
pci_bus_add_devices()
pci_bus_add_device() # match_driver = true; device_attach()
device_attach()
__device_attach()
__device_attach_driver()
driver_probe_device()
pcie_portdrv_probe()
pcie_port_device_register()
pcie_device_init()
device_register()
device_add()
bus_probe_device()
device_initial_probe()
__device_attach()
__device_attach_driver()
driver_probe_device()
pciehp_probe()
init_slot()
pci_hp_initialize()
pci_create_slot()
down_write(pci_slot_mutex)
(You may want to double-check that I got this right.)
In principle, Keith did the right thing to acquire pci_slot_mutex in
pci_bus_error_reset() for accessing the bus->slots list.
I need to think some more to come up with a solution for this particular
deadlock. Maybe using a klist and traversing it with klist_iter_init()
(holds a ref on each slot, allowing concurrent list access) or something
along those lines...
Thanks,
Lukas
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-08-01 16:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15 14:32 [PATCH 0/2] Hotplug interrupt and AER recovery deadlocks Ian May
2020-06-15 14:32 ` [PATCH 1/2] PCIe hotplug interrupt and AER deadlock with reset_lock and device_lock Ian May
2020-06-15 18:56 ` Lukas Wunner
2020-06-16 17:13 ` Ian May
2020-07-17 5:20 ` Lukas Wunner
[not found] ` <CAE1ug=a8PJpKh0Jx2ZZxo5kwQvvK883xs+mzyMWbW8T1oqyKDg@mail.gmail.com>
2020-08-01 16:14 ` Lukas Wunner
2020-07-14 23:58 ` Bjorn Helgaas
2020-06-15 14:32 ` [PATCH 2/2] PCIe hotplug interrupt and AER pci_slot_mutex deadlock Ian May
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).