Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend
@ 2019-08-12 14:31 Mika Westerberg
  2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
  2019-09-19  6:42 ` [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
  0 siblings, 2 replies; 15+ messages in thread
From: Mika Westerberg @ 2019-08-12 14:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Keith Busch, Andy Shevchenko, Frederick Lawler,
	Gustavo A . R . Silva, Sinan Kaya, Kai-Heng Feng,
	Mika Westerberg, linux-pci, linux-kernel

We try to keep PCIe hotplug ports runtime suspended when entering system
suspend. Due to the fact that the PCIe portdrv sets NEVER_SKIP driver PM
flag the PM core always calls system suspend/resume hooks even if the
device is left runtime suspended. Since PCIe hotplug driver re-uses the
same function for both it ends up disabling hotplug interrupt twice and
the second time following is printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already
runtime suspended when system suspend hook is called.

Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
Changes from previous version:

  * Move .resume inside CONFIG_PM_SLEEP
  * Added tags from Kai-Heng and Rafael

 drivers/pci/hotplug/pciehp_core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 6ad0d86762cb..e9f82afa3773 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -248,7 +248,7 @@ static bool pme_is_native(struct pcie_device *dev)
 	return pcie_ports_native || host->native_pme;
 }
 
-static int pciehp_suspend(struct pcie_device *dev)
+static void pciehp_disable_interrupt(struct pcie_device *dev)
 {
 	/*
 	 * Disable hotplug interrupt so that it does not trigger
@@ -256,7 +256,19 @@ static int pciehp_suspend(struct pcie_device *dev)
 	 */
 	if (pme_is_native(dev))
 		pcie_disable_interrupt(get_service_data(dev));
+}
 
+#ifdef CONFIG_PM_SLEEP
+static int pciehp_suspend(struct pcie_device *dev)
+{
+	/*
+	 * If the port is already runtime suspended we can keep it that
+	 * way.
+	 */
+	if (dev_pm_smart_suspend_and_suspended(&dev->port->dev))
+		return 0;
+
+	pciehp_disable_interrupt(dev);
 	return 0;
 }
 
@@ -274,6 +286,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 
 	return 0;
 }
+#endif
 
 static int pciehp_resume(struct pcie_device *dev)
 {
@@ -287,6 +300,12 @@ static int pciehp_resume(struct pcie_device *dev)
 	return 0;
 }
 
+static int pciehp_runtime_suspend(struct pcie_device *dev)
+{
+	pciehp_disable_interrupt(dev);
+	return 0;
+}
+
 static int pciehp_runtime_resume(struct pcie_device *dev)
 {
 	struct controller *ctrl = get_service_data(dev);
@@ -313,10 +332,12 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
 	.remove		= pciehp_remove,
 
 #ifdef	CONFIG_PM
+#ifdef	CONFIG_PM_SLEEP
 	.suspend	= pciehp_suspend,
 	.resume_noirq	= pciehp_resume_noirq,
 	.resume		= pciehp_resume,
-	.runtime_suspend = pciehp_suspend,
+#endif
+	.runtime_suspend = pciehp_runtime_suspend,
 	.runtime_resume	= pciehp_runtime_resume,
 #endif	/* PM */
 };
-- 
2.20.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-12 14:31 [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
@ 2019-08-12 14:31 ` Mika Westerberg
  2019-08-19  2:28   ` Sinan Kaya
                     ` (2 more replies)
  2019-09-19  6:42 ` [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
  1 sibling, 3 replies; 15+ messages in thread
From: Mika Westerberg @ 2019-08-12 14:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Keith Busch, Andy Shevchenko, Frederick Lawler,
	Gustavo A . R . Silva, Sinan Kaya, Kai-Heng Feng,
	Mika Westerberg, linux-pci, linux-kernel

If there are more than one PCIe switch with hotplug downstream ports
hot-removing them leads to a following deadlock:

 INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 irq/126-pciehp  D    0   198      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_timeout+0x246/0x350
  ? ttwu_do_activate+0x67/0x90
  wait_for_completion+0xb7/0x140
  ? wake_up_q+0x80/0x80
  kthread_stop+0x49/0x110
  __free_irq+0x15c/0x2a0
  free_irq+0x32/0x70
  pcie_shutdown_notification+0x2f/0x50
  pciehp_remove+0x27/0x50
  pcie_port_remove_service+0x36/0x50
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  bus_remove_device+0xec/0x160
  device_del+0x13b/0x350
  ? pcie_port_find_device+0x60/0x60
  device_unregister+0x1a/0x60
  remove_iter+0x1e/0x30
  device_for_each_child+0x56/0x90
  pcie_port_device_remove+0x22/0x40
  pcie_portdrv_remove+0x20/0x60
  pci_device_remove+0x3e/0xc0
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  pci_stop_bus_device+0x6f/0x90
  pci_stop_bus_device+0x31/0x90
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x88/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
 irq/190-pciehp  D    0  2288      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_preempt_disabled+0xe/0x10
  __mutex_lock.isra.9+0x2e0/0x4d0
  ? __mutex_lock_slowpath+0x13/0x20
  __mutex_lock_slowpath+0x13/0x20
  mutex_lock+0x2c/0x30
  pci_lock_rescan_remove+0x15/0x20
  pciehp_unconfigure_device+0x4d/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, who got the hot-remove event, starts
removing devices below it taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence()
which ends up calling pciehp_card_present() and pciehp_check_link_active().
Both of these read their parts of PCIe config space by calling helper
function pcie_capability_read_word(). Now, this function notices that
the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
with the capability value set to 0. When pciehp gets this value it
thinks that its child device is also hot-removed and schedules its IRQ
thread to handle the event.

The deadlock happens when the child's IRQ thread runs and tries to
acquire pci_lock_rescan_remove() which is already taken by the parent
and the parent waits for the child's IRQ thread to finish.

We can prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
No changes from the previous version.

 drivers/pci/hotplug/pciehp.h      |  6 +++---
 drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
 drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 8c51a04b8083..81c514ab9518 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -173,10 +173,10 @@ int pciehp_query_power_fault(struct controller *ctrl);
 void pciehp_green_led_on(struct controller *ctrl);
 void pciehp_green_led_off(struct controller *ctrl);
 void pciehp_green_led_blink(struct controller *ctrl);
-bool pciehp_card_present(struct controller *ctrl);
-bool pciehp_card_present_or_link_active(struct controller *ctrl);
+int pciehp_card_present(struct controller *ctrl);
+int pciehp_card_present_or_link_active(struct controller *ctrl);
 int pciehp_check_link_status(struct controller *ctrl);
-bool pciehp_check_link_active(struct controller *ctrl);
+int pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 
 int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index e9f82afa3773..4c032d75c874 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -134,10 +134,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
 	struct controller *ctrl = to_ctrl(hotplug_slot);
 	struct pci_dev *pdev = ctrl->pcie->port;
+	int ret;
 
 	pci_config_pm_runtime_get(pdev);
-	*value = pciehp_card_present_or_link_active(ctrl);
+	ret = pciehp_card_present_or_link_active(ctrl);
 	pci_config_pm_runtime_put(pdev);
+	if (ret < 0)
+		return ret;
+
+	*value = ret;
 	return 0;
 }
 
@@ -153,13 +158,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
  */
 static void pciehp_check_presence(struct controller *ctrl)
 {
-	bool occupied;
+	int occupied;
 
 	down_read(&ctrl->reset_lock);
 	mutex_lock(&ctrl->state_lock);
 
 	occupied = pciehp_card_present_or_link_active(ctrl);
-	if ((occupied && (ctrl->state == OFF_STATE ||
+	if ((occupied > 0 && (ctrl->state == OFF_STATE ||
 			  ctrl->state == BLINKINGON_STATE)) ||
 	    (!occupied && (ctrl->state == ON_STATE ||
 			   ctrl->state == BLINKINGOFF_STATE)))
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 631ced0ab28a..5a433cc8621f 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -221,7 +221,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
-	bool present, link_active;
+	int present, link_active;
 
 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
@@ -252,7 +252,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	mutex_lock(&ctrl->state_lock);
 	present = pciehp_card_present(ctrl);
 	link_active = pciehp_check_link_active(ctrl);
-	if (!present && !link_active) {
+	if (present <= 0 && link_active <= 0) {
 		mutex_unlock(&ctrl->state_lock);
 		return;
 	}
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bd990e3371e3..1f918b043adb 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -201,13 +201,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
 	pcie_do_write_cmd(ctrl, cmd, mask, false);
 }
 
-bool pciehp_check_link_active(struct controller *ctrl)
+int pciehp_check_link_active(struct controller *ctrl)
 {
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	u16 lnk_status;
-	bool ret;
+	int ret;
+
+	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+		return -ENODEV;
 
-	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
 	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
 
 	if (ret)
@@ -373,13 +376,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
 	*status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
 }
 
-bool pciehp_card_present(struct controller *ctrl)
+int pciehp_card_present(struct controller *ctrl)
 {
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	u16 slot_status;
+	int ret;
 
-	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-	return slot_status & PCI_EXP_SLTSTA_PDS;
+	ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+		return -ENODEV;
+
+	return !!(slot_status & PCI_EXP_SLTSTA_PDS);
 }
 
 /**
@@ -390,10 +397,19 @@ bool pciehp_card_present(struct controller *ctrl)
  * Presence Detect State bit, this helper also returns true if the Link Active
  * bit is set.  This is a concession to broken hotplug ports which hardwire
  * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
+ *
+ * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
+ *	    port is not present anymore returns %-ENODEV.
  */
-bool pciehp_card_present_or_link_active(struct controller *ctrl)
+int pciehp_card_present_or_link_active(struct controller *ctrl)
 {
-	return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
+	int ret;
+
+	ret = pciehp_card_present(ctrl);
+	if (ret)
+		return ret;
+
+	return pciehp_check_link_active(ctrl);
 }
 
 int pciehp_query_power_fault(struct controller *ctrl)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
@ 2019-08-19  2:28   ` Sinan Kaya
  2019-08-19  8:56     ` Mika Westerberg
  2019-09-23  5:34   ` Lukas Wunner
  2019-10-22 23:00   ` Bjorn Helgaas
  2 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2019-08-19  2:28 UTC (permalink / raw)
  To: Mika Westerberg, Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Keith Busch, Andy Shevchenko, Frederick Lawler,
	Gustavo A . R . Silva, Kai-Heng Feng, linux-pci, linux-kernel

On 8/12/2019 10:31 AM, Mika Westerberg wrote:
> +int pciehp_card_present_or_link_active(struct controller *ctrl)
>  {
> -	return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
> +	int ret;
> +
> +	ret = pciehp_card_present(ctrl);
> +	if (ret)
> +		return ret;
> +
> +	return pciehp_check_link_active(ctrl);

The semantics of this function changed here. Before it was checking for
either presence detect bit or link active bit. Now, it is looking to
have both set.

There are PCI controllers that won't report presence detect correctly,
but still report link active.

I think you want

if (ret < 0)
	return ret;

here to match the previous behavior while still handling device removal.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-19  2:28   ` Sinan Kaya
@ 2019-08-19  8:56     ` Mika Westerberg
  2019-08-19 12:28       ` Sinan Kaya
  0 siblings, 1 reply; 15+ messages in thread
From: Mika Westerberg @ 2019-08-19  8:56 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Lukas Wunner, Keith Busch,
	Andy Shevchenko, Frederick Lawler, Gustavo A . R . Silva,
	Kai-Heng Feng, linux-pci, linux-kernel

On Sun, Aug 18, 2019 at 10:28:13PM -0400, Sinan Kaya wrote:
> On 8/12/2019 10:31 AM, Mika Westerberg wrote:
> > +int pciehp_card_present_or_link_active(struct controller *ctrl)
> >  {
> > -	return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
> > +	int ret;
> > +
> > +	ret = pciehp_card_present(ctrl);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return pciehp_check_link_active(ctrl);
> 
> The semantics of this function changed here. Before it was checking for
> either presence detect bit or link active bit. Now, it is looking to
> have both set.

Hmm, maybe I haven't got enough coffee yet but I'm not sure I understand :)
The intention was that the above two are equivalent with the exception
of handling the possible error.

> There are PCI controllers that won't report presence detect correctly,
> but still report link active.

If that's the case then pciehp_card_present() returns false so we call
pciehp_check_link_active() which should work with those controllers.

What I'm missing here?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-19  8:56     ` Mika Westerberg
@ 2019-08-19 12:28       ` Sinan Kaya
  0 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2019-08-19 12:28 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Lukas Wunner, Keith Busch,
	Andy Shevchenko, Frederick Lawler, Gustavo A . R . Silva,
	Kai-Heng Feng, linux-pci, linux-kernel

On 8/19/2019 4:56 AM, Mika Westerberg wrote:
>> There are PCI controllers that won't report presence detect correctly,
>> but still report link active.
> If that's the case then pciehp_card_present() returns false so we call
> pciehp_check_link_active() which should work with those controllers.
> 
> What I'm missing here?
> 

You are right. I thought we'd somehow prematurely leave the function.
That's not the case.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend
  2019-08-12 14:31 [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
  2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
@ 2019-09-19  6:42 ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Mika Westerberg @ 2019-09-19  6:42 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Keith Busch, Andy Shevchenko, Frederick Lawler,
	Gustavo A . R . Silva, Sinan Kaya, Kai-Heng Feng, linux-pci,
	linux-kernel

On Mon, Aug 12, 2019 at 05:31:32PM +0300, Mika Westerberg wrote:
> We try to keep PCIe hotplug ports runtime suspended when entering system
> suspend. Due to the fact that the PCIe portdrv sets NEVER_SKIP driver PM
> flag the PM core always calls system suspend/resume hooks even if the
> device is left runtime suspended. Since PCIe hotplug driver re-uses the
> same function for both it ends up disabling hotplug interrupt twice and
> the second time following is printed:
> 
>   pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device
> 
> Prevent this from happening by checking whether the device is already
> runtime suspended when system suspend hook is called.
> 
> Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Hi Bjorn,

Any comments on these two?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
  2019-08-19  2:28   ` Sinan Kaya
@ 2019-09-23  5:34   ` Lukas Wunner
  2019-09-23  8:12     ` Mika Westerberg
  2019-10-22 23:00   ` Bjorn Helgaas
  2 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2019-09-23  5:34 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:

For the record, I think my comments on v1 of this patch still apply:

https://patchwork.ozlabs.org/patch/1117870/#2230798

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-09-23  5:34   ` Lukas Wunner
@ 2019-09-23  8:12     ` Mika Westerberg
  2019-09-23  8:28       ` Lukas Wunner
  2019-09-23  8:28       ` Mika Westerberg
  0 siblings, 2 replies; 15+ messages in thread
From: Mika Westerberg @ 2019-09-23  8:12 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

Hi Lukas,

On Mon, Sep 23, 2019 at 07:34:03AM +0200, Lukas Wunner wrote:
> On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> > If there are more than one PCIe switch with hotplug downstream ports
> > hot-removing them leads to a following deadlock:
> 
> For the record, I think my comments on v1 of this patch still apply:
> 
> https://patchwork.ozlabs.org/patch/1117870/#2230798

Well, so do I ;-)

As I tried to explain in v1 discussion, I think what we do here in this
patch is correct thing to do regardless. I mean once the hardware is
gone the driver should not do any decisions based on what it thinks it
reads from the now missing hardware. This also makes the deadlock
problem go away on all the system I've been testing. Where previously I
was able to reproduce the deadlock 100% reliably I have not seen it
happen once with this one applied (and haven't got reports from our
internal testing either).

Regarding suggestion of unbinding PCI drivers without
pci_lock_rescan_remove() hold, I haven't looked it too closely but I
think we need to take that lock anyway because when we are unbinding a
hotplug driver it is supposed to remove the hierarchy below touching the
shared structures, possibly concurrently. Unfortunately there is no
documentation what data pci_lock_rescan_remove() actually protects so
first one needs to understand that. I think one way to clean up this is
to use finer grained locking (with documented lock ordering) for PCI bus
structures that can be accessed simultaneusly by different threads. But
that is not a simple task.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-09-23  8:12     ` Mika Westerberg
@ 2019-09-23  8:28       ` Lukas Wunner
  2019-09-23  8:28       ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2019-09-23  8:28 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Mon, Sep 23, 2019 at 11:12:37AM +0300, Mika Westerberg wrote:
> Regarding suggestion of unbinding PCI drivers without
> pci_lock_rescan_remove() hold, I haven't looked it too closely but I
> think we need to take that lock anyway because when we are unbinding a
> hotplug driver it is supposed to remove the hierarchy below touching the
> shared structures, possibly concurrently. Unfortunately there is no
> documentation what data pci_lock_rescan_remove() actually protects so
> first one needs to understand that. I think one way to clean up this is
> to use finer grained locking (with documented lock ordering) for PCI bus
> structures that can be accessed simultaneusly by different threads. But
> that is not a simple task.

Right.  I'm (still) working on that, albeit slowly as I'm caught up with
other stuff.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-09-23  8:12     ` Mika Westerberg
  2019-09-23  8:28       ` Lukas Wunner
@ 2019-09-23  8:28       ` Mika Westerberg
  2019-10-18  7:10         ` Kai-Heng Feng
  1 sibling, 1 reply; 15+ messages in thread
From: Mika Westerberg @ 2019-09-23  8:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Mon, Sep 23, 2019 at 11:12:42AM +0300, Mika Westerberg wrote:
> Regarding suggestion of unbinding PCI drivers without
> pci_lock_rescan_remove() hold, I haven't looked it too closely but I
> think we need to take that lock anyway because when we are unbinding a
> hotplug driver it is supposed to remove the hierarchy below touching the
> shared structures, possibly concurrently. Unfortunately there is no
> documentation what data pci_lock_rescan_remove() actually protects so
> first one needs to understand that. I think one way to clean up this is
> to use finer grained locking (with documented lock ordering) for PCI bus
> structures that can be accessed simultaneusly by different threads. But
> that is not a simple task.

Now that I looked more closely, I realized it actually is not supposed
to remove the hierarchy below so indeed it might be possible to do that
without taking pci_lock_rescan_remove().

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-09-23  8:28       ` Mika Westerberg
@ 2019-10-18  7:10         ` Kai-Heng Feng
  0 siblings, 0 replies; 15+ messages in thread
From: Kai-Heng Feng @ 2019-10-18  7:10 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: AceLan Kao, Lukas Wunner, Bjorn Helgaas, Rafael J. Wysocki,
	Keith Busch, Andy Shevchenko, Frederick Lawler,
	Gustavo A . R . Silva, Sinan Kaya, Linux PCI, lkml

Hi Mika,

> On Sep 23, 2019, at 16:28, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
> 
> On Mon, Sep 23, 2019 at 11:12:42AM +0300, Mika Westerberg wrote:
>> Regarding suggestion of unbinding PCI drivers without
>> pci_lock_rescan_remove() hold, I haven't looked it too closely but I
>> think we need to take that lock anyway because when we are unbinding a
>> hotplug driver it is supposed to remove the hierarchy below touching the
>> shared structures, possibly concurrently. Unfortunately there is no
>> documentation what data pci_lock_rescan_remove() actually protects so
>> first one needs to understand that. I think one way to clean up this is
>> to use finer grained locking (with documented lock ordering) for PCI bus
>> structures that can be accessed simultaneusly by different threads. But
>> that is not a simple task.
> 
> Now that I looked more closely, I realized it actually is not supposed
> to remove the hierarchy below so indeed it might be possible to do that
> without taking pci_lock_rescan_remove().

This series fixes S3 resume hang when native PCIe TBT is connected to TBT dock.
Is there going to be a v3 of this series?

Anyway, please collect my tested-by tag,
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Kai-Heng

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
  2019-08-19  2:28   ` Sinan Kaya
  2019-09-23  5:34   ` Lukas Wunner
@ 2019-10-22 23:00   ` Bjorn Helgaas
  2019-10-23  7:52     ` Mika Westerberg
  2 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-10-22 23:00 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Lukas Wunner, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:
> 
>  INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  irq/126-pciehp  D    0   198      2 0x80000000
>  Call Trace:
>   __schedule+0x2a2/0x880
>   schedule+0x2c/0x80
>   schedule_timeout+0x246/0x350
>   ? ttwu_do_activate+0x67/0x90
>   wait_for_completion+0xb7/0x140
>   ? wake_up_q+0x80/0x80
>   kthread_stop+0x49/0x110
>   __free_irq+0x15c/0x2a0
>   free_irq+0x32/0x70
>   pcie_shutdown_notification+0x2f/0x50
>   pciehp_remove+0x27/0x50
>   pcie_port_remove_service+0x36/0x50
>   device_release_driver_internal+0x18c/0x250
>   device_release_driver+0x12/0x20
>   bus_remove_device+0xec/0x160
>   device_del+0x13b/0x350
>   ? pcie_port_find_device+0x60/0x60
>   device_unregister+0x1a/0x60
>   remove_iter+0x1e/0x30
>   device_for_each_child+0x56/0x90
>   pcie_port_device_remove+0x22/0x40
>   pcie_portdrv_remove+0x20/0x60
>   pci_device_remove+0x3e/0xc0
>   device_release_driver_internal+0x18c/0x250
>   device_release_driver+0x12/0x20
>   pci_stop_bus_device+0x6f/0x90
>   pci_stop_bus_device+0x31/0x90
>   pci_stop_and_remove_bus_device+0x12/0x20
>   pciehp_unconfigure_device+0x88/0x140
>   pciehp_disable_slot+0x6a/0x110
>   pciehp_handle_presence_or_link_change+0x263/0x400
>   pciehp_ist+0x1c9/0x1d0
>   ? irq_forced_thread_fn+0x80/0x80
>   irq_thread_fn+0x24/0x60
>   irq_thread+0xeb/0x190
>   ? irq_thread_fn+0x60/0x60
>   kthread+0x120/0x140
>   ? irq_thread_check_affinity+0xf0/0xf0
>   ? kthread_park+0x90/0x90
>   ret_from_fork+0x35/0x40
>  INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
>  irq/190-pciehp  D    0  2288      2 0x80000000
>  Call Trace:
>   __schedule+0x2a2/0x880
>   schedule+0x2c/0x80
>   schedule_preempt_disabled+0xe/0x10
>   __mutex_lock.isra.9+0x2e0/0x4d0
>   ? __mutex_lock_slowpath+0x13/0x20
>   __mutex_lock_slowpath+0x13/0x20
>   mutex_lock+0x2c/0x30
>   pci_lock_rescan_remove+0x15/0x20
>   pciehp_unconfigure_device+0x4d/0x140
>   pciehp_disable_slot+0x6a/0x110
>   pciehp_handle_presence_or_link_change+0x263/0x400
>   pciehp_ist+0x1c9/0x1d0
>   ? irq_forced_thread_fn+0x80/0x80
>   irq_thread_fn+0x24/0x60
>   irq_thread+0xeb/0x190
>   ? irq_thread_fn+0x60/0x60
>   kthread+0x120/0x140
>   ? irq_thread_check_affinity+0xf0/0xf0
>   ? kthread_park+0x90/0x90
>   ret_from_fork+0x35/0x40
> 
> What happens here is that the whole hierarchy is runtime resumed and the
> parent PCIe downstream port, who got the hot-remove event, starts
> removing devices below it taking pci_lock_rescan_remove() lock. When the
> child PCIe port is runtime resumed it calls pciehp_check_presence()
> which ends up calling pciehp_card_present() and pciehp_check_link_active().
> Both of these read their parts of PCIe config space by calling helper
> function pcie_capability_read_word(). Now, this function notices that
> the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
> with the capability value set to 0. When pciehp gets this value it
> thinks that its child device is also hot-removed and schedules its IRQ
> thread to handle the event.

I can't remember if there was a reason why 8c0d3a02c130 ("PCI: Add
accessors for PCI Express Capability") reset *val to 0 if
pci_read_config_word() fails.  It doesn't seem like the right thing;
it seems like it would be better for it to be consistent with a plain
pci_read_config_word().

> The deadlock happens when the child's IRQ thread runs and tries to
> acquire pci_lock_rescan_remove() which is already taken by the parent
> and the parent waits for the child's IRQ thread to finish.
> 
> We can prevent this from happening by checking the return value of
> pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
> performing any hot-removal activities.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> No changes from the previous version.
> 
>  drivers/pci/hotplug/pciehp.h      |  6 +++---
>  drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
>  drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
>  drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
>  4 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 8c51a04b8083..81c514ab9518 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -173,10 +173,10 @@ int pciehp_query_power_fault(struct controller *ctrl);
>  void pciehp_green_led_on(struct controller *ctrl);
>  void pciehp_green_led_off(struct controller *ctrl);
>  void pciehp_green_led_blink(struct controller *ctrl);
> -bool pciehp_card_present(struct controller *ctrl);
> -bool pciehp_card_present_or_link_active(struct controller *ctrl);
> +int pciehp_card_present(struct controller *ctrl);
> +int pciehp_card_present_or_link_active(struct controller *ctrl);
>  int pciehp_check_link_status(struct controller *ctrl);
> -bool pciehp_check_link_active(struct controller *ctrl);
> +int pciehp_check_link_active(struct controller *ctrl);
>  void pciehp_release_ctrl(struct controller *ctrl);
>  
>  int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index e9f82afa3773..4c032d75c874 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -134,10 +134,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
>  {
>  	struct controller *ctrl = to_ctrl(hotplug_slot);
>  	struct pci_dev *pdev = ctrl->pcie->port;
> +	int ret;
>  
>  	pci_config_pm_runtime_get(pdev);
> -	*value = pciehp_card_present_or_link_active(ctrl);
> +	ret = pciehp_card_present_or_link_active(ctrl);
>  	pci_config_pm_runtime_put(pdev);
> +	if (ret < 0)
> +		return ret;
> +
> +	*value = ret;
>  	return 0;
>  }
>  
> @@ -153,13 +158,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
>   */
>  static void pciehp_check_presence(struct controller *ctrl)
>  {
> -	bool occupied;
> +	int occupied;
>  
>  	down_read(&ctrl->reset_lock);
>  	mutex_lock(&ctrl->state_lock);
>  
>  	occupied = pciehp_card_present_or_link_active(ctrl);
> -	if ((occupied && (ctrl->state == OFF_STATE ||
> +	if ((occupied > 0 && (ctrl->state == OFF_STATE ||
>  			  ctrl->state == BLINKINGON_STATE)) ||
>  	    (!occupied && (ctrl->state == ON_STATE ||
>  			   ctrl->state == BLINKINGOFF_STATE)))
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 631ced0ab28a..5a433cc8621f 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -221,7 +221,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
>  
>  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  {
> -	bool present, link_active;
> +	int present, link_active;
>  
>  	/*
>  	 * If the slot is on and presence or link has changed, turn it off.
> @@ -252,7 +252,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	mutex_lock(&ctrl->state_lock);
>  	present = pciehp_card_present(ctrl);
>  	link_active = pciehp_check_link_active(ctrl);
> -	if (!present && !link_active) {
> +	if (present <= 0 && link_active <= 0) {
>  		mutex_unlock(&ctrl->state_lock);
>  		return;
>  	}
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index bd990e3371e3..1f918b043adb 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -201,13 +201,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
>  	pcie_do_write_cmd(ctrl, cmd, mask, false);
>  }
>  
> -bool pciehp_check_link_active(struct controller *ctrl)
> +int pciehp_check_link_active(struct controller *ctrl)
>  {
>  	struct pci_dev *pdev = ctrl_dev(ctrl);
>  	u16 lnk_status;
> -	bool ret;
> +	int ret;
> +
> +	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> +	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
> +		return -ENODEV;
>  
> -	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
>  	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
>  
>  	if (ret)
> @@ -373,13 +376,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
>  	*status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
>  }
>  
> -bool pciehp_card_present(struct controller *ctrl)
> +int pciehp_card_present(struct controller *ctrl)
>  {
>  	struct pci_dev *pdev = ctrl_dev(ctrl);
>  	u16 slot_status;
> +	int ret;
>  
> -	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -	return slot_status & PCI_EXP_SLTSTA_PDS;
> +	ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> +	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
> +		return -ENODEV;

Isn't this racy?

                                        # pdev is present
  pci_read_config_word
    if (pci_dev_is_disconnected(pdev))  # currently false
                                        # pdev is removed
    pci_bus_read_config_word            # fails, returns ~0
  slot_status = ~0

I think pci_read_config_word() checks pci_dev_is_disconnected() merely
as an optimization.  Obviously it can't guarantee that the subsequent
config access will succeed.

If pci_dev_is_disconnected() was false but the config read fails, I
think we'll get ~0 data and return 1, i.e., "PDS was set".

Shouldn't we check for slot_status being an error response (~0)
instead of looking for PCIBIOS_DEVICE_NOT_FOUND?  There are 7 RsvdP
bits in Slot Status, so ~0 is not a valid value for the register.

All 16 bits of Link Status are defined, but ~0 is still an invalid
value because the Current Link Speed and Negotiated Link Width fields
only define a few valid encodings.

> +	return !!(slot_status & PCI_EXP_SLTSTA_PDS);
>  }
>  
>  /**
> @@ -390,10 +397,19 @@ bool pciehp_card_present(struct controller *ctrl)
>   * Presence Detect State bit, this helper also returns true if the Link Active
>   * bit is set.  This is a concession to broken hotplug ports which hardwire
>   * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
> + *
> + * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
> + *	    port is not present anymore returns %-ENODEV.
>   */
> -bool pciehp_card_present_or_link_active(struct controller *ctrl)
> +int pciehp_card_present_or_link_active(struct controller *ctrl)
>  {
> -	return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
> +	int ret;
> +
> +	ret = pciehp_card_present(ctrl);
> +	if (ret)
> +		return ret;
> +
> +	return pciehp_check_link_active(ctrl);

The names of these functions seem misleading to me: all they can
really tell us is "the card *was* present" or "the link *was* active"
at some time in the past.  But the names make it so tempting to
pretend that "the card *is* present" or "the link *is* active", and
that may no longer be true.

I think names like "pciehp_card_absent()" and "pciehp_link_down()"
would make it easier to think about these situations.

>  }
>  
>  int pciehp_query_power_fault(struct controller *ctrl)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-10-22 23:00   ` Bjorn Helgaas
@ 2019-10-23  7:52     ` Mika Westerberg
  2019-10-24  9:38       ` Mika Westerberg
  0 siblings, 1 reply; 15+ messages in thread
From: Mika Westerberg @ 2019-10-23  7:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Lukas Wunner, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Tue, Oct 22, 2019 at 06:00:06PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> > If there are more than one PCIe switch with hotplug downstream ports
> > hot-removing them leads to a following deadlock:
> > 
> >  INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >  irq/126-pciehp  D    0   198      2 0x80000000
> >  Call Trace:
> >   __schedule+0x2a2/0x880
> >   schedule+0x2c/0x80
> >   schedule_timeout+0x246/0x350
> >   ? ttwu_do_activate+0x67/0x90
> >   wait_for_completion+0xb7/0x140
> >   ? wake_up_q+0x80/0x80
> >   kthread_stop+0x49/0x110
> >   __free_irq+0x15c/0x2a0
> >   free_irq+0x32/0x70
> >   pcie_shutdown_notification+0x2f/0x50
> >   pciehp_remove+0x27/0x50
> >   pcie_port_remove_service+0x36/0x50
> >   device_release_driver_internal+0x18c/0x250
> >   device_release_driver+0x12/0x20
> >   bus_remove_device+0xec/0x160
> >   device_del+0x13b/0x350
> >   ? pcie_port_find_device+0x60/0x60
> >   device_unregister+0x1a/0x60
> >   remove_iter+0x1e/0x30
> >   device_for_each_child+0x56/0x90
> >   pcie_port_device_remove+0x22/0x40
> >   pcie_portdrv_remove+0x20/0x60
> >   pci_device_remove+0x3e/0xc0
> >   device_release_driver_internal+0x18c/0x250
> >   device_release_driver+0x12/0x20
> >   pci_stop_bus_device+0x6f/0x90
> >   pci_stop_bus_device+0x31/0x90
> >   pci_stop_and_remove_bus_device+0x12/0x20
> >   pciehp_unconfigure_device+0x88/0x140
> >   pciehp_disable_slot+0x6a/0x110
> >   pciehp_handle_presence_or_link_change+0x263/0x400
> >   pciehp_ist+0x1c9/0x1d0
> >   ? irq_forced_thread_fn+0x80/0x80
> >   irq_thread_fn+0x24/0x60
> >   irq_thread+0xeb/0x190
> >   ? irq_thread_fn+0x60/0x60
> >   kthread+0x120/0x140
> >   ? irq_thread_check_affinity+0xf0/0xf0
> >   ? kthread_park+0x90/0x90
> >   ret_from_fork+0x35/0x40
> >  INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
> >  irq/190-pciehp  D    0  2288      2 0x80000000
> >  Call Trace:
> >   __schedule+0x2a2/0x880
> >   schedule+0x2c/0x80
> >   schedule_preempt_disabled+0xe/0x10
> >   __mutex_lock.isra.9+0x2e0/0x4d0
> >   ? __mutex_lock_slowpath+0x13/0x20
> >   __mutex_lock_slowpath+0x13/0x20
> >   mutex_lock+0x2c/0x30
> >   pci_lock_rescan_remove+0x15/0x20
> >   pciehp_unconfigure_device+0x4d/0x140
> >   pciehp_disable_slot+0x6a/0x110
> >   pciehp_handle_presence_or_link_change+0x263/0x400
> >   pciehp_ist+0x1c9/0x1d0
> >   ? irq_forced_thread_fn+0x80/0x80
> >   irq_thread_fn+0x24/0x60
> >   irq_thread+0xeb/0x190
> >   ? irq_thread_fn+0x60/0x60
> >   kthread+0x120/0x140
> >   ? irq_thread_check_affinity+0xf0/0xf0
> >   ? kthread_park+0x90/0x90
> >   ret_from_fork+0x35/0x40
> > 
> > What happens here is that the whole hierarchy is runtime resumed and the
> > parent PCIe downstream port, who got the hot-remove event, starts
> > removing devices below it taking pci_lock_rescan_remove() lock. When the
> > child PCIe port is runtime resumed it calls pciehp_check_presence()
> > which ends up calling pciehp_card_present() and pciehp_check_link_active().
> > Both of these read their parts of PCIe config space by calling helper
> > function pcie_capability_read_word(). Now, this function notices that
> > the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
> > with the capability value set to 0. When pciehp gets this value it
> > thinks that its child device is also hot-removed and schedules its IRQ
> > thread to handle the event.
> 
> I can't remember if there was a reason why 8c0d3a02c130 ("PCI: Add
> accessors for PCI Express Capability") reset *val to 0 if
> pci_read_config_word() fails.  It doesn't seem like the right thing;
> it seems like it would be better for it to be consistent with a plain
> pci_read_config_word().
> 
> > The deadlock happens when the child's IRQ thread runs and tries to
> > acquire pci_lock_rescan_remove() which is already taken by the parent
> > and the parent waits for the child's IRQ thread to finish.
> > 
> > We can prevent this from happening by checking the return value of
> > pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
> > performing any hot-removal activities.
> > 
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> > No changes from the previous version.
> > 
> >  drivers/pci/hotplug/pciehp.h      |  6 +++---
> >  drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
> >  drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
> >  drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
> >  4 files changed, 37 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> > index 8c51a04b8083..81c514ab9518 100644
> > --- a/drivers/pci/hotplug/pciehp.h
> > +++ b/drivers/pci/hotplug/pciehp.h
> > @@ -173,10 +173,10 @@ int pciehp_query_power_fault(struct controller *ctrl);
> >  void pciehp_green_led_on(struct controller *ctrl);
> >  void pciehp_green_led_off(struct controller *ctrl);
> >  void pciehp_green_led_blink(struct controller *ctrl);
> > -bool pciehp_card_present(struct controller *ctrl);
> > -bool pciehp_card_present_or_link_active(struct controller *ctrl);
> > +int pciehp_card_present(struct controller *ctrl);
> > +int pciehp_card_present_or_link_active(struct controller *ctrl);
> >  int pciehp_check_link_status(struct controller *ctrl);
> > -bool pciehp_check_link_active(struct controller *ctrl);
> > +int pciehp_check_link_active(struct controller *ctrl);
> >  void pciehp_release_ctrl(struct controller *ctrl);
> >  
> >  int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
> > diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> > index e9f82afa3773..4c032d75c874 100644
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -134,10 +134,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
> >  {
> >  	struct controller *ctrl = to_ctrl(hotplug_slot);
> >  	struct pci_dev *pdev = ctrl->pcie->port;
> > +	int ret;
> >  
> >  	pci_config_pm_runtime_get(pdev);
> > -	*value = pciehp_card_present_or_link_active(ctrl);
> > +	ret = pciehp_card_present_or_link_active(ctrl);
> >  	pci_config_pm_runtime_put(pdev);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	*value = ret;
> >  	return 0;
> >  }
> >  
> > @@ -153,13 +158,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
> >   */
> >  static void pciehp_check_presence(struct controller *ctrl)
> >  {
> > -	bool occupied;
> > +	int occupied;
> >  
> >  	down_read(&ctrl->reset_lock);
> >  	mutex_lock(&ctrl->state_lock);
> >  
> >  	occupied = pciehp_card_present_or_link_active(ctrl);
> > -	if ((occupied && (ctrl->state == OFF_STATE ||
> > +	if ((occupied > 0 && (ctrl->state == OFF_STATE ||
> >  			  ctrl->state == BLINKINGON_STATE)) ||
> >  	    (!occupied && (ctrl->state == ON_STATE ||
> >  			   ctrl->state == BLINKINGOFF_STATE)))
> > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > index 631ced0ab28a..5a433cc8621f 100644
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -221,7 +221,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
> >  
> >  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> >  {
> > -	bool present, link_active;
> > +	int present, link_active;
> >  
> >  	/*
> >  	 * If the slot is on and presence or link has changed, turn it off.
> > @@ -252,7 +252,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> >  	mutex_lock(&ctrl->state_lock);
> >  	present = pciehp_card_present(ctrl);
> >  	link_active = pciehp_check_link_active(ctrl);
> > -	if (!present && !link_active) {
> > +	if (present <= 0 && link_active <= 0) {
> >  		mutex_unlock(&ctrl->state_lock);
> >  		return;
> >  	}
> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> > index bd990e3371e3..1f918b043adb 100644
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -201,13 +201,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
> >  	pcie_do_write_cmd(ctrl, cmd, mask, false);
> >  }
> >  
> > -bool pciehp_check_link_active(struct controller *ctrl)
> > +int pciehp_check_link_active(struct controller *ctrl)
> >  {
> >  	struct pci_dev *pdev = ctrl_dev(ctrl);
> >  	u16 lnk_status;
> > -	bool ret;
> > +	int ret;
> > +
> > +	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> > +	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
> > +		return -ENODEV;
> >  
> > -	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> >  	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
> >  
> >  	if (ret)
> > @@ -373,13 +376,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
> >  	*status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
> >  }
> >  
> > -bool pciehp_card_present(struct controller *ctrl)
> > +int pciehp_card_present(struct controller *ctrl)
> >  {
> >  	struct pci_dev *pdev = ctrl_dev(ctrl);
> >  	u16 slot_status;
> > +	int ret;
> >  
> > -	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> > -	return slot_status & PCI_EXP_SLTSTA_PDS;
> > +	ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> > +	if (ret == PCIBIOS_DEVICE_NOT_FOUND)
> > +		return -ENODEV;
> 
> Isn't this racy?
> 
>                                         # pdev is present
>   pci_read_config_word
>     if (pci_dev_is_disconnected(pdev))  # currently false
>                                         # pdev is removed
>     pci_bus_read_config_word            # fails, returns ~0
>   slot_status = ~0
> 
> I think pci_read_config_word() checks pci_dev_is_disconnected() merely
> as an optimization.  Obviously it can't guarantee that the subsequent
> config access will succeed.
> 
> If pci_dev_is_disconnected() was false but the config read fails, I
> think we'll get ~0 data and return 1, i.e., "PDS was set".

Yes, it is racy. If a device can be hot-removed by user anytime we would
need to check each and every access and based on that stop touching the
hardware. That of course affects performance.

However, in practise this situation happens almost always when the
chained child hotplug port is resumed after the parent hotplug port
notices the downstream link is down. So first access to the now missing
hardware returns ~0 for the child in its resume hook.

> Shouldn't we check for slot_status being an error response (~0)
> instead of looking for PCIBIOS_DEVICE_NOT_FOUND?  There are 7 RsvdP
> bits in Slot Status, so ~0 is not a valid value for the register.
> 
> All 16 bits of Link Status are defined, but ~0 is still an invalid
> value because the Current Link Speed and Negotiated Link Width fields
> only define a few valid encodings.

Indeed that's a good point. I'll try that.

> > +	return !!(slot_status & PCI_EXP_SLTSTA_PDS);
> >  }
> >  
> >  /**
> > @@ -390,10 +397,19 @@ bool pciehp_card_present(struct controller *ctrl)
> >   * Presence Detect State bit, this helper also returns true if the Link Active
> >   * bit is set.  This is a concession to broken hotplug ports which hardwire
> >   * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
> > + *
> > + * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
> > + *	    port is not present anymore returns %-ENODEV.
> >   */
> > -bool pciehp_card_present_or_link_active(struct controller *ctrl)
> > +int pciehp_card_present_or_link_active(struct controller *ctrl)
> >  {
> > -	return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
> > +	int ret;
> > +
> > +	ret = pciehp_card_present(ctrl);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return pciehp_check_link_active(ctrl);
> 
> The names of these functions seem misleading to me: all they can
> really tell us is "the card *was* present" or "the link *was* active"
> at some time in the past.  But the names make it so tempting to
> pretend that "the card *is* present" or "the link *is* active", and
> that may no longer be true.
> 
> I think names like "pciehp_card_absent()" and "pciehp_link_down()"
> would make it easier to think about these situations.

Fair enough. I'll create a separate patch for renaming them.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-10-23  7:52     ` Mika Westerberg
@ 2019-10-24  9:38       ` Mika Westerberg
  2019-10-24 17:11         ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Mika Westerberg @ 2019-10-24  9:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Lukas Wunner, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Wed, Oct 23, 2019 at 10:52:53AM +0300, Mika Westerberg wrote:
> > Shouldn't we check for slot_status being an error response (~0)
> > instead of looking for PCIBIOS_DEVICE_NOT_FOUND?  There are 7 RsvdP
> > bits in Slot Status, so ~0 is not a valid value for the register.
> > 
> > All 16 bits of Link Status are defined, but ~0 is still an invalid
> > value because the Current Link Speed and Negotiated Link Width fields
> > only define a few valid encodings.
> 
> Indeed that's a good point. I'll try that.

Just checking if I understand correctly what you are suggesting.

Currently we use pcie_capability_read_word() and check the return value.
If the device is gone it returns an error and resets *val to 0. That
only works if pci_dev_is_disconnected() is true so we would need to do
something like below.

pciehp_check_link_active():

	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
	if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
		return -ENODEV;

Or you mean that we check only for ~0 in which case we either need to
use pci_read_config_word() directly here or modify pcie_capability_read_word()
return ~0 instead of clearing it?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
  2019-10-24  9:38       ` Mika Westerberg
@ 2019-10-24 17:11         ` Bjorn Helgaas
  0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2019-10-24 17:11 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Lukas Wunner, Keith Busch, Andy Shevchenko,
	Frederick Lawler, Gustavo A . R . Silva, Sinan Kaya,
	Kai-Heng Feng, linux-pci, linux-kernel

On Thu, Oct 24, 2019 at 12:38:03PM +0300, Mika Westerberg wrote:
> On Wed, Oct 23, 2019 at 10:52:53AM +0300, Mika Westerberg wrote:
> > > Shouldn't we check for slot_status being an error response (~0)
> > > instead of looking for PCIBIOS_DEVICE_NOT_FOUND?  There are 7 RsvdP
> > > bits in Slot Status, so ~0 is not a valid value for the register.
> > > 
> > > All 16 bits of Link Status are defined, but ~0 is still an invalid
> > > value because the Current Link Speed and Negotiated Link Width fields
> > > only define a few valid encodings.
> > 
> > Indeed that's a good point. I'll try that.
> 
> Just checking if I understand correctly what you are suggesting.
> 
> Currently we use pcie_capability_read_word() and check the return value.
> If the device is gone it returns an error and resets *val to 0. That
> only works if pci_dev_is_disconnected() is true so we would need to do
> something like below.
> 
> pciehp_check_link_active():
> 
> 	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
> 	if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
> 		return -ENODEV;

Yes, I guess this is what you'd have to do.

> Or you mean that we check only for ~0 in which case we either need to
> use pci_read_config_word() directly here or modify pcie_capability_read_word()
> return ~0 instead of clearing it?

I *would* like to explore removing the "*val = 0" code from
pci_capability_read*(), but not in the context of this issue.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-12 14:31 [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg
2019-08-12 14:31 ` [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect Mika Westerberg
2019-08-19  2:28   ` Sinan Kaya
2019-08-19  8:56     ` Mika Westerberg
2019-08-19 12:28       ` Sinan Kaya
2019-09-23  5:34   ` Lukas Wunner
2019-09-23  8:12     ` Mika Westerberg
2019-09-23  8:28       ` Lukas Wunner
2019-09-23  8:28       ` Mika Westerberg
2019-10-18  7:10         ` Kai-Heng Feng
2019-10-22 23:00   ` Bjorn Helgaas
2019-10-23  7:52     ` Mika Westerberg
2019-10-24  9:38       ` Mika Westerberg
2019-10-24 17:11         ` Bjorn Helgaas
2019-09-19  6:42 ` [PATCH v2 1/2] PCI: pciehp: Do not disable interrupt twice on suspend Mika Westerberg

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git