All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order
@ 2023-01-05 23:03 Matthias Kaehlcke
  2023-01-05 23:03 ` [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver Matthias Kaehlcke
  2023-01-06 15:32 ` [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Greg Kroah-Hartman
  0 siblings, 2 replies; 9+ messages in thread
From: Matthias Kaehlcke @ 2023-01-05 23:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Johan Hovold, linux-usb, Stefan Wahren, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Matthias Kaehlcke, Ravi Chandra Sadineni

The onboard_hub 'driver' consists of two drivers, a platform
driver and a USB driver. Currently when the onboard hub driver
is initialized it first registers the platform driver, then the
USB driver. This results in a race condition when the 'attach'
work is executed, which is scheduled when the platform device
is probed. The purpose of fhe 'attach' work is to bind elegible
USB hub devices to the onboard_hub USB driver. This fails if
the work runs before the USB driver has been registered.

Register the USB driver first, then the platform driver. This
increases the chances that the onboard_hub USB devices are probed
before their corresponding platform device, which the USB driver
tries to locate in _probe(). The driver already handles this
situation and defers probing if the onboard hub platform device
doesn't exist yet.

Cc: stable@vger.kernel.org
Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
Link: https://lore.kernel.org/lkml/Y6W00vQm3jfLflUJ@hovoldconsulting.com/T/#m0d64295f017942fd988f7c53425db302d61952b4
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---

 drivers/usb/misc/onboard_usb_hub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/misc/onboard_usb_hub.c b/drivers/usb/misc/onboard_usb_hub.c
index 94e7966e199d..db0844b30bbd 100644
--- a/drivers/usb/misc/onboard_usb_hub.c
+++ b/drivers/usb/misc/onboard_usb_hub.c
@@ -433,13 +433,13 @@ static int __init onboard_hub_init(void)
 {
 	int ret;
 
-	ret = platform_driver_register(&onboard_hub_driver);
+	ret = usb_register_device_driver(&onboard_hub_usbdev_driver, THIS_MODULE);
 	if (ret)
 		return ret;
 
-	ret = usb_register_device_driver(&onboard_hub_usbdev_driver, THIS_MODULE);
+	ret = platform_driver_register(&onboard_hub_driver);
 	if (ret)
-		platform_driver_unregister(&onboard_hub_driver);
+		usb_deregister_device_driver(&onboard_hub_usbdev_driver);
 
 	return ret;
 }
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-05 23:03 [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Matthias Kaehlcke
@ 2023-01-05 23:03 ` Matthias Kaehlcke
  2023-01-07 17:23   ` Stefan Wahren
  2023-01-06 15:32 ` [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Greg Kroah-Hartman
  1 sibling, 1 reply; 9+ messages in thread
From: Matthias Kaehlcke @ 2023-01-05 23:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Johan Hovold, linux-usb, Stefan Wahren, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Matthias Kaehlcke, Ravi Chandra Sadineni

Currently each onboard_hub platform device owns an 'attach' work,
which is scheduled when the device probes. With this deadlocks
have been reported on a Raspberry Pi 3 B+ [1], which has nested
onboard hubs.

The flow of the deadlock is something like this (with the onboard_hub
driver built as a module) [2]:

- USB root hub is instantiated
- core hub driver calls onboard_hub_create_pdevs(), which creates the
  'raw' platform device for the 1st level hub
- 1st level hub is probed by the core hub driver
- core hub driver calls onboard_hub_create_pdevs(), which creates
  the 'raw' platform device for the 2nd level hub

- onboard_hub platform driver is registered
- platform device for 1st level hub is probed
  - schedules 'attach' work
- platform device for 2nd level hub is probed
  - schedules 'attach' work

- onboard_hub USB driver is registered
- device (and parent) lock of hub is held while the device is
  re-probed with the onboard_hub driver

- 'attach' work (running in another thread) calls driver_attach(), which
   blocks on one of the hub device locks

- onboard_hub_destroy_pdevs() is called by the core hub driver when one
  of the hubs is detached
- destroying the pdevs invokes onboard_hub_remove(), which waits for the
  'attach' work to complete
  - waits forever, since the 'attach' work can't acquire the device lock

Use a single work struct for the driver instead of having a work struct
per onboard hub platform driver instance. With that it isn't necessary
to cancel the work in onboard_hub_remove(), which fixes the deadlock.
The work is only cancelled when the driver is unloaded.

[1] https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
[2] https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/

Cc: stable@vger.kernel.org
Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
Link: https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
Link: https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---

 drivers/usb/misc/onboard_usb_hub.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/misc/onboard_usb_hub.c b/drivers/usb/misc/onboard_usb_hub.c
index db0844b30bbd..8bc4deb465f0 100644
--- a/drivers/usb/misc/onboard_usb_hub.c
+++ b/drivers/usb/misc/onboard_usb_hub.c
@@ -27,7 +27,10 @@
 
 #include "onboard_usb_hub.h"
 
+static void onboard_hub_attach_usb_driver(struct work_struct *work);
+
 static struct usb_device_driver onboard_hub_usbdev_driver;
+static DECLARE_WORK(attach_usb_driver_work, onboard_hub_attach_usb_driver);
 
 /************************** Platform driver **************************/
 
@@ -45,7 +48,6 @@ struct onboard_hub {
 	bool is_powered_on;
 	bool going_away;
 	struct list_head udev_list;
-	struct work_struct attach_usb_driver_work;
 	struct mutex lock;
 };
 
@@ -270,9 +272,15 @@ static int onboard_hub_probe(struct platform_device *pdev)
 	 *
 	 * This needs to be done deferred to avoid self-deadlocks on systems
 	 * with nested onboard hubs.
+	 *
+	 * If the work is already running wait for it to complete, then
+	 * schedule it again to ensure that the USB devices of this onboard
+	 * hub instance are bound to the USB driver.
 	 */
-	INIT_WORK(&hub->attach_usb_driver_work, onboard_hub_attach_usb_driver);
-	schedule_work(&hub->attach_usb_driver_work);
+	while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
+		msleep(10);
+
+	schedule_work(&attach_usb_driver_work);
 
 	return 0;
 }
@@ -285,9 +293,6 @@ static int onboard_hub_remove(struct platform_device *pdev)
 
 	hub->going_away = true;
 
-	if (&hub->attach_usb_driver_work != current_work())
-		cancel_work_sync(&hub->attach_usb_driver_work);
-
 	mutex_lock(&hub->lock);
 
 	/* unbind the USB devices to avoid dangling references to this device */
@@ -449,6 +454,8 @@ static void __exit onboard_hub_exit(void)
 {
 	usb_deregister_device_driver(&onboard_hub_usbdev_driver);
 	platform_driver_unregister(&onboard_hub_driver);
+
+	cancel_work_sync(&attach_usb_driver_work);
 }
 module_exit(onboard_hub_exit);
 
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order
  2023-01-05 23:03 [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Matthias Kaehlcke
  2023-01-05 23:03 ` [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver Matthias Kaehlcke
@ 2023-01-06 15:32 ` Greg Kroah-Hartman
  2023-01-06 16:13   ` Matthias Kaehlcke
  1 sibling, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2023-01-06 15:32 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Johan Hovold, linux-usb, Stefan Wahren, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Ravi Chandra Sadineni

On Thu, Jan 05, 2023 at 11:03:28PM +0000, Matthias Kaehlcke wrote:
> The onboard_hub 'driver' consists of two drivers, a platform
> driver and a USB driver. Currently when the onboard hub driver
> is initialized it first registers the platform driver, then the
> USB driver. This results in a race condition when the 'attach'
> work is executed, which is scheduled when the platform device
> is probed. The purpose of fhe 'attach' work is to bind elegible
> USB hub devices to the onboard_hub USB driver. This fails if
> the work runs before the USB driver has been registered.
> 
> Register the USB driver first, then the platform driver. This
> increases the chances that the onboard_hub USB devices are probed
> before their corresponding platform device, which the USB driver
> tries to locate in _probe(). The driver already handles this
> situation and defers probing if the onboard hub platform device
> doesn't exist yet.
> 
> Cc: stable@vger.kernel.org
> Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
> Link: https://lore.kernel.org/lkml/Y6W00vQm3jfLflUJ@hovoldconsulting.com/T/#m0d64295f017942fd988f7c53425db302d61952b4
> Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> 
>  drivers/usb/misc/onboard_usb_hub.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Does this superseed this thread:
	Link: https://lore.kernel.org/r/20221222022605.v2.1.If5e7ec83b1782e4dffa6ea759416a27326c8231d@changeid

or is that also needed?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order
  2023-01-06 15:32 ` [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Greg Kroah-Hartman
@ 2023-01-06 16:13   ` Matthias Kaehlcke
  0 siblings, 0 replies; 9+ messages in thread
From: Matthias Kaehlcke @ 2023-01-06 16:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Johan Hovold, linux-usb, Stefan Wahren, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Ravi Chandra Sadineni

On Fri, Jan 06, 2023 at 04:32:52PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 05, 2023 at 11:03:28PM +0000, Matthias Kaehlcke wrote:
> > The onboard_hub 'driver' consists of two drivers, a platform
> > driver and a USB driver. Currently when the onboard hub driver
> > is initialized it first registers the platform driver, then the
> > USB driver. This results in a race condition when the 'attach'
> > work is executed, which is scheduled when the platform device
> > is probed. The purpose of fhe 'attach' work is to bind elegible
> > USB hub devices to the onboard_hub USB driver. This fails if
> > the work runs before the USB driver has been registered.
> > 
> > Register the USB driver first, then the platform driver. This
> > increases the chances that the onboard_hub USB devices are probed
> > before their corresponding platform device, which the USB driver
> > tries to locate in _probe(). The driver already handles this
> > situation and defers probing if the onboard hub platform device
> > doesn't exist yet.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
> > Link: https://lore.kernel.org/lkml/Y6W00vQm3jfLflUJ@hovoldconsulting.com/T/#m0d64295f017942fd988f7c53425db302d61952b4
> > Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > 
> >  drivers/usb/misc/onboard_usb_hub.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Does this superseed this thread:
> 	Link: https://lore.kernel.org/r/20221222022605.v2.1.If5e7ec83b1782e4dffa6ea759416a27326c8231d@changeid

This series ("usb: misc: onboard_hub: Invert driver registration order"
et al) fixes the race condition mentioned in the commit message
of the other series ("usb: misc: onboard_usb_hub: Don't create platform
devices for DT nodes without 'vdd-supply'" et al), plus another race
that was reported later (this patch).

> or is that also needed?

This series is (mostly) independent from the other one, it should
fix the issue that was reported for the RPi 3 B+. It can be landed
even if the other one is abandonded.

With this series the other one doesn't fix or mitigate any actual
issue (AFAIK), it would only be an optimization (don't instantiate
the onboard_hub drivers if they'd do nothing).

> confused

Sorry, hope this clarifies things a bit.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-05 23:03 ` [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver Matthias Kaehlcke
@ 2023-01-07 17:23   ` Stefan Wahren
  2023-01-08 10:47     ` Stefan Wahren
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Wahren @ 2023-01-07 17:23 UTC (permalink / raw)
  To: Matthias Kaehlcke, Greg Kroah-Hartman
  Cc: Johan Hovold, linux-usb, Alexander Stein, Icenowy Zheng,
	Douglas Anderson, stable, linux-kernel, Ravi Chandra Sadineni

Hi Matthias,

Am 06.01.23 um 00:03 schrieb Matthias Kaehlcke:
> Currently each onboard_hub platform device owns an 'attach' work,
> which is scheduled when the device probes. With this deadlocks
> have been reported on a Raspberry Pi 3 B+ [1], which has nested
> onboard hubs.
>
> The flow of the deadlock is something like this (with the onboard_hub
> driver built as a module) [2]:
>
> - USB root hub is instantiated
> - core hub driver calls onboard_hub_create_pdevs(), which creates the
>    'raw' platform device for the 1st level hub
> - 1st level hub is probed by the core hub driver
> - core hub driver calls onboard_hub_create_pdevs(), which creates
>    the 'raw' platform device for the 2nd level hub
>
> - onboard_hub platform driver is registered
> - platform device for 1st level hub is probed
>    - schedules 'attach' work
> - platform device for 2nd level hub is probed
>    - schedules 'attach' work
>
> - onboard_hub USB driver is registered
> - device (and parent) lock of hub is held while the device is
>    re-probed with the onboard_hub driver
>
> - 'attach' work (running in another thread) calls driver_attach(), which
>     blocks on one of the hub device locks
>
> - onboard_hub_destroy_pdevs() is called by the core hub driver when one
>    of the hubs is detached
> - destroying the pdevs invokes onboard_hub_remove(), which waits for the
>    'attach' work to complete
>    - waits forever, since the 'attach' work can't acquire the device lock
>
> Use a single work struct for the driver instead of having a work struct
> per onboard hub platform driver instance. With that it isn't necessary
> to cancel the work in onboard_hub_remove(), which fixes the deadlock.
> The work is only cancelled when the driver is unloaded.

i applied both patches for this series on top of v6.1 
(multi_v7_defconfig), but usb is still broken on Raspberry Pi 3 B+

Best regards

>
> [1] https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
> [2] https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
>
> Cc: stable@vger.kernel.org
> Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
> Link: https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
> Link: https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
> Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
>
>   drivers/usb/misc/onboard_usb_hub.c | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/usb/misc/onboard_usb_hub.c b/drivers/usb/misc/onboard_usb_hub.c
> index db0844b30bbd..8bc4deb465f0 100644
> --- a/drivers/usb/misc/onboard_usb_hub.c
> +++ b/drivers/usb/misc/onboard_usb_hub.c
> @@ -27,7 +27,10 @@
>   
>   #include "onboard_usb_hub.h"
>   
> +static void onboard_hub_attach_usb_driver(struct work_struct *work);
> +
>   static struct usb_device_driver onboard_hub_usbdev_driver;
> +static DECLARE_WORK(attach_usb_driver_work, onboard_hub_attach_usb_driver);
>   
>   /************************** Platform driver **************************/
>   
> @@ -45,7 +48,6 @@ struct onboard_hub {
>   	bool is_powered_on;
>   	bool going_away;
>   	struct list_head udev_list;
> -	struct work_struct attach_usb_driver_work;
>   	struct mutex lock;
>   };
>   
> @@ -270,9 +272,15 @@ static int onboard_hub_probe(struct platform_device *pdev)
>   	 *
>   	 * This needs to be done deferred to avoid self-deadlocks on systems
>   	 * with nested onboard hubs.
> +	 *
> +	 * If the work is already running wait for it to complete, then
> +	 * schedule it again to ensure that the USB devices of this onboard
> +	 * hub instance are bound to the USB driver.
>   	 */
> -	INIT_WORK(&hub->attach_usb_driver_work, onboard_hub_attach_usb_driver);
> -	schedule_work(&hub->attach_usb_driver_work);
> +	while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
> +		msleep(10);
> +
> +	schedule_work(&attach_usb_driver_work);
>   
>   	return 0;
>   }
> @@ -285,9 +293,6 @@ static int onboard_hub_remove(struct platform_device *pdev)
>   
>   	hub->going_away = true;
>   
> -	if (&hub->attach_usb_driver_work != current_work())
> -		cancel_work_sync(&hub->attach_usb_driver_work);
> -
>   	mutex_lock(&hub->lock);
>   
>   	/* unbind the USB devices to avoid dangling references to this device */
> @@ -449,6 +454,8 @@ static void __exit onboard_hub_exit(void)
>   {
>   	usb_deregister_device_driver(&onboard_hub_usbdev_driver);
>   	platform_driver_unregister(&onboard_hub_driver);
> +
> +	cancel_work_sync(&attach_usb_driver_work);
>   }
>   module_exit(onboard_hub_exit);
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-07 17:23   ` Stefan Wahren
@ 2023-01-08 10:47     ` Stefan Wahren
  2023-01-09 17:40       ` Matthias Kaehlcke
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Wahren @ 2023-01-08 10:47 UTC (permalink / raw)
  To: Matthias Kaehlcke, Greg Kroah-Hartman
  Cc: Johan Hovold, linux-usb, Alexander Stein, Icenowy Zheng,
	Douglas Anderson, stable, linux-kernel, Ravi Chandra Sadineni

Am 07.01.23 um 18:23 schrieb Stefan Wahren:
> Hi Matthias,
>
> Am 06.01.23 um 00:03 schrieb Matthias Kaehlcke:
>> Currently each onboard_hub platform device owns an 'attach' work,
>> which is scheduled when the device probes. With this deadlocks
>> have been reported on a Raspberry Pi 3 B+ [1], which has nested
>> onboard hubs.
>>
>> The flow of the deadlock is something like this (with the onboard_hub
>> driver built as a module) [2]:
>>
>> - USB root hub is instantiated
>> - core hub driver calls onboard_hub_create_pdevs(), which creates the
>>    'raw' platform device for the 1st level hub
>> - 1st level hub is probed by the core hub driver
>> - core hub driver calls onboard_hub_create_pdevs(), which creates
>>    the 'raw' platform device for the 2nd level hub
>>
>> - onboard_hub platform driver is registered
>> - platform device for 1st level hub is probed
>>    - schedules 'attach' work
>> - platform device for 2nd level hub is probed
>>    - schedules 'attach' work
>>
>> - onboard_hub USB driver is registered
>> - device (and parent) lock of hub is held while the device is
>>    re-probed with the onboard_hub driver
>>
>> - 'attach' work (running in another thread) calls driver_attach(), which
>>     blocks on one of the hub device locks
>>
>> - onboard_hub_destroy_pdevs() is called by the core hub driver when one
>>    of the hubs is detached
>> - destroying the pdevs invokes onboard_hub_remove(), which waits for the
>>    'attach' work to complete
>>    - waits forever, since the 'attach' work can't acquire the device 
>> lock
>>
>> Use a single work struct for the driver instead of having a work struct
>> per onboard hub platform driver instance. With that it isn't necessary
>> to cancel the work in onboard_hub_remove(), which fixes the deadlock.
>> The work is only cancelled when the driver is unloaded.
>
> i applied both patches for this series on top of v6.1 
> (multi_v7_defconfig), but usb is still broken on Raspberry Pi 3 B+

here is the hung task output:

[  243.682193] INFO: task kworker/1:0:18 blocked for more than 122 seconds.
[  243.682222]       Not tainted 6.1.0-00002-gaa61d98d165b #2
[  243.682233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  243.682242] task:kworker/1:0     state:D stack:0     pid:18 
ppid:2      flags:0x00000000
[  243.682267] Workqueue: events onboard_hub_attach_usb_driver 
[onboard_usb_hub]
[  243.682317]  __schedule from schedule+0x4c/0xe0
[  243.682345]  schedule from schedule_preempt_disabled+0xc/0x10
[  243.682367]  schedule_preempt_disabled from 
__mutex_lock.constprop.0+0x244/0x804
[  243.682394]  __mutex_lock.constprop.0 from __driver_attach+0x7c/0x188
[  243.682421]  __driver_attach from bus_for_each_dev+0x70/0xb0
[  243.682449]  bus_for_each_dev from 
onboard_hub_attach_usb_driver+0xc/0x28 [onboard_usb_hub]
[  243.682494]  onboard_hub_attach_usb_driver [onboard_usb_hub] from 
process_one_work+0x1ec/0x4d0
[  243.682534]  process_one_work from worker_thread+0x50/0x540
[  243.682559]  worker_thread from kthread+0xd0/0xec
[  243.682582]  kthread from ret_from_fork+0x14/0x2c
[  243.682600] Exception stack(0xf086dfb0 to 0xf086dff8)
[  243.682615] dfa0:                                     00000000 
00000000 00000000 00000000
[  243.682631] dfc0: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[  243.682646] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  243.682692] INFO: task kworker/1:2:82 blocked for more than 122 seconds.
[  243.682703]       Not tainted 6.1.0-00002-gaa61d98d165b #2
[  243.682713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  243.682721] task:kworker/1:2     state:D stack:0     pid:82 
ppid:2      flags:0x00000000
[  243.682741] Workqueue: events_power_efficient hub_init_func2
[  243.682764]  __schedule from schedule+0x4c/0xe0
[  243.682785]  schedule from schedule_preempt_disabled+0xc/0x10
[  243.682808]  schedule_preempt_disabled from 
__mutex_lock.constprop.0+0x244/0x804
[  243.682833]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
[  243.682859]  hub_activate from process_one_work+0x1ec/0x4d0
[  243.682883]  process_one_work from worker_thread+0x50/0x540
[  243.682907]  worker_thread from kthread+0xd0/0xec
[  243.682927]  kthread from ret_from_fork+0x14/0x2c
[  243.682944] Exception stack(0xf1509fb0 to 0xf1509ff8)
[  243.682958] 9fa0:                                     00000000 
00000000 00000000 00000000
[  243.682974] 9fc0: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[  243.682988] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  243.683023] INFO: task kworker/1:4:257 blocked for more than 122 seconds.
[  243.683034]       Not tainted 6.1.0-00002-gaa61d98d165b #2
[  243.683043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  243.683051] task:kworker/1:4     state:D stack:0     pid:257 
ppid:2      flags:0x00000000
[  243.683071] Workqueue: events_power_efficient hub_init_func2
[  243.683092]  __schedule from schedule+0x4c/0xe0
[  243.683113]  schedule from schedule_preempt_disabled+0xc/0x10
[  243.683135]  schedule_preempt_disabled from 
__mutex_lock.constprop.0+0x244/0x804
[  243.683160]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
[  243.683184]  hub_activate from process_one_work+0x1ec/0x4d0
[  243.683209]  process_one_work from worker_thread+0x50/0x540
[  243.683233]  worker_thread from kthread+0xd0/0xec
[  243.683253]  kthread from ret_from_fork+0x14/0x2c
[  243.683270] Exception stack(0xf09d9fb0 to 0xf09d9ff8)
[  243.683283] 9fa0:                                     00000000 
00000000 00000000 00000000
[  243.683299] 9fc0: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[  243.683313] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000

>
> Best regards
>
>>
>> [1] 
>> https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
>> [2] https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
>> Link: 
>> https://lore.kernel.org/r/d04bcc45-3471-4417-b30b-5cf9880d785d@i2se.com/
>> Link: https://lore.kernel.org/all/Y6OrGbqaMy2iVDWB@google.com/
>> Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
>> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
>> ---
>>
>>   drivers/usb/misc/onboard_usb_hub.c | 19 +++++++++++++------
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/usb/misc/onboard_usb_hub.c 
>> b/drivers/usb/misc/onboard_usb_hub.c
>> index db0844b30bbd..8bc4deb465f0 100644
>> --- a/drivers/usb/misc/onboard_usb_hub.c
>> +++ b/drivers/usb/misc/onboard_usb_hub.c
>> @@ -27,7 +27,10 @@
>>     #include "onboard_usb_hub.h"
>>   +static void onboard_hub_attach_usb_driver(struct work_struct *work);
>> +
>>   static struct usb_device_driver onboard_hub_usbdev_driver;
>> +static DECLARE_WORK(attach_usb_driver_work, 
>> onboard_hub_attach_usb_driver);
>>     /************************** Platform driver 
>> **************************/
>>   @@ -45,7 +48,6 @@ struct onboard_hub {
>>       bool is_powered_on;
>>       bool going_away;
>>       struct list_head udev_list;
>> -    struct work_struct attach_usb_driver_work;
>>       struct mutex lock;
>>   };
>>   @@ -270,9 +272,15 @@ static int onboard_hub_probe(struct 
>> platform_device *pdev)
>>        *
>>        * This needs to be done deferred to avoid self-deadlocks on 
>> systems
>>        * with nested onboard hubs.
>> +     *
>> +     * If the work is already running wait for it to complete, then
>> +     * schedule it again to ensure that the USB devices of this onboard
>> +     * hub instance are bound to the USB driver.
>>        */
>> -    INIT_WORK(&hub->attach_usb_driver_work, 
>> onboard_hub_attach_usb_driver);
>> -    schedule_work(&hub->attach_usb_driver_work);
>> +    while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
>> +        msleep(10);
>> +
>> +    schedule_work(&attach_usb_driver_work);
>>         return 0;
>>   }
>> @@ -285,9 +293,6 @@ static int onboard_hub_remove(struct 
>> platform_device *pdev)
>>         hub->going_away = true;
>>   -    if (&hub->attach_usb_driver_work != current_work())
>> -        cancel_work_sync(&hub->attach_usb_driver_work);
>> -
>>       mutex_lock(&hub->lock);
>>         /* unbind the USB devices to avoid dangling references to 
>> this device */
>> @@ -449,6 +454,8 @@ static void __exit onboard_hub_exit(void)
>>   {
>> usb_deregister_device_driver(&onboard_hub_usbdev_driver);
>>       platform_driver_unregister(&onboard_hub_driver);
>> +
>> +    cancel_work_sync(&attach_usb_driver_work);
>>   }
>>   module_exit(onboard_hub_exit);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-08 10:47     ` Stefan Wahren
@ 2023-01-09 17:40       ` Matthias Kaehlcke
  2023-01-09 20:19         ` Stefan Wahren
  0 siblings, 1 reply; 9+ messages in thread
From: Matthias Kaehlcke @ 2023-01-09 17:40 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Greg Kroah-Hartman, Johan Hovold, linux-usb, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Ravi Chandra Sadineni

On Sun, Jan 08, 2023 at 11:47:13AM +0100, Stefan Wahren wrote:
> Am 07.01.23 um 18:23 schrieb Stefan Wahren:
> > Hi Matthias,
> > 
> > Am 06.01.23 um 00:03 schrieb Matthias Kaehlcke:
> > > Currently each onboard_hub platform device owns an 'attach' work,
> > > which is scheduled when the device probes. With this deadlocks
> > > have been reported on a Raspberry Pi 3 B+ [1], which has nested
> > > onboard hubs.
> > > 
> > > The flow of the deadlock is something like this (with the onboard_hub
> > > driver built as a module) [2]:
> > > 
> > > - USB root hub is instantiated
> > > - core hub driver calls onboard_hub_create_pdevs(), which creates the
> > >    'raw' platform device for the 1st level hub
> > > - 1st level hub is probed by the core hub driver
> > > - core hub driver calls onboard_hub_create_pdevs(), which creates
> > >    the 'raw' platform device for the 2nd level hub
> > > 
> > > - onboard_hub platform driver is registered
> > > - platform device for 1st level hub is probed
> > >    - schedules 'attach' work
> > > - platform device for 2nd level hub is probed
> > >    - schedules 'attach' work
> > > 
> > > - onboard_hub USB driver is registered
> > > - device (and parent) lock of hub is held while the device is
> > >    re-probed with the onboard_hub driver
> > > 
> > > - 'attach' work (running in another thread) calls driver_attach(), which
> > >     blocks on one of the hub device locks
> > > 
> > > - onboard_hub_destroy_pdevs() is called by the core hub driver when one
> > >    of the hubs is detached
> > > - destroying the pdevs invokes onboard_hub_remove(), which waits for the
> > >    'attach' work to complete
> > >    - waits forever, since the 'attach' work can't acquire the device
> > > lock
> > > 
> > > Use a single work struct for the driver instead of having a work struct
> > > per onboard hub platform driver instance. With that it isn't necessary
> > > to cancel the work in onboard_hub_remove(), which fixes the deadlock.
> > > The work is only cancelled when the driver is unloaded.
> > 
> > i applied both patches for this series on top of v6.1
> > (multi_v7_defconfig), but usb is still broken on Raspberry Pi 3 B+

Thanks for testing.

> here is the hung task output:
> 
> [  243.682193] INFO: task kworker/1:0:18 blocked for more than 122 seconds.
> [  243.682222]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> [  243.682233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.682242] task:kworker/1:0     state:D stack:0     pid:18 ppid:2     
> flags:0x00000000
> [  243.682267] Workqueue: events onboard_hub_attach_usb_driver
> [onboard_usb_hub]
> [  243.682317]  __schedule from schedule+0x4c/0xe0
> [  243.682345]  schedule from schedule_preempt_disabled+0xc/0x10
> [  243.682367]  schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x244/0x804
> [  243.682394]  __mutex_lock.constprop.0 from __driver_attach+0x7c/0x188
> [  243.682421]  __driver_attach from bus_for_each_dev+0x70/0xb0
> [  243.682449]  bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
> [onboard_usb_hub]
> [  243.682494]  onboard_hub_attach_usb_driver [onboard_usb_hub] from
> process_one_work+0x1ec/0x4d0
> [  243.682534]  process_one_work from worker_thread+0x50/0x540
> [  243.682559]  worker_thread from kthread+0xd0/0xec
> [  243.682582]  kthread from ret_from_fork+0x14/0x2c
> [  243.682600] Exception stack(0xf086dfb0 to 0xf086dff8)
> [  243.682615] dfa0:                                     00000000 00000000
> 00000000 00000000
> [  243.682631] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [  243.682646] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  243.682692] INFO: task kworker/1:2:82 blocked for more than 122 seconds.
> [  243.682703]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> [  243.682713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.682721] task:kworker/1:2     state:D stack:0     pid:82 ppid:2     
> flags:0x00000000
> [  243.682741] Workqueue: events_power_efficient hub_init_func2
> [  243.682764]  __schedule from schedule+0x4c/0xe0
> [  243.682785]  schedule from schedule_preempt_disabled+0xc/0x10
> [  243.682808]  schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x244/0x804
> [  243.682833]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
> [  243.682859]  hub_activate from process_one_work+0x1ec/0x4d0
> [  243.682883]  process_one_work from worker_thread+0x50/0x540
> [  243.682907]  worker_thread from kthread+0xd0/0xec
> [  243.682927]  kthread from ret_from_fork+0x14/0x2c
> [  243.682944] Exception stack(0xf1509fb0 to 0xf1509ff8)
> [  243.682958] 9fa0:                                     00000000 00000000
> 00000000 00000000
> [  243.682974] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [  243.682988] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  243.683023] INFO: task kworker/1:4:257 blocked for more than 122 seconds.
> [  243.683034]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> [  243.683043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.683051] task:kworker/1:4     state:D stack:0     pid:257 ppid:2     
> flags:0x00000000
> [  243.683071] Workqueue: events_power_efficient hub_init_func2
> [  243.683092]  __schedule from schedule+0x4c/0xe0
> [  243.683113]  schedule from schedule_preempt_disabled+0xc/0x10
> [  243.683135]  schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x244/0x804
> [  243.683160]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
> [  243.683184]  hub_activate from process_one_work+0x1ec/0x4d0
> [  243.683209]  process_one_work from worker_thread+0x50/0x540
> [  243.683233]  worker_thread from kthread+0xd0/0xec
> [  243.683253]  kthread from ret_from_fork+0x14/0x2c
> [  243.683270] Exception stack(0xf09d9fb0 to 0xf09d9ff8)
> [  243.683283] 9fa0:                                     00000000 00000000
> 00000000 00000000
> [  243.683299] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [  243.683313] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000

Does commenting the following help:

  while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
      msleep(10);

?

I'm wondering if the loop is actually needed. The idea behind it was:

The currently running work might not take into account the USB devices
of the hub that is currently probed, which should probe shortly after
the hub was powered on.

The 'attach' work is only needed for USB devices that were previously
detached through device_release_driver() in onboard_hub_remove(). These
USB device objects only persist in the kernel if the hub is not powered
off by onboard_hub_remove() (powering the hub off should be the usual
case).

If onboard_hub_probe() is invoked and the USB device objects persisted,
then an already running 'attach' work should take them into account. If
they didn't persist the running work might miss them, but that wouldn't
be a problem since the newly created USB devices don't need to be
explicitly attached (since they weren't detached previously).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-09 17:40       ` Matthias Kaehlcke
@ 2023-01-09 20:19         ` Stefan Wahren
  2023-01-10  1:07           ` Matthias Kaehlcke
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Wahren @ 2023-01-09 20:19 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Greg Kroah-Hartman, Johan Hovold, linux-usb, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Ravi Chandra Sadineni

Hi Matthias,

Am 09.01.23 um 18:40 schrieb Matthias Kaehlcke:
> On Sun, Jan 08, 2023 at 11:47:13AM +0100, Stefan Wahren wrote:
>> Am 07.01.23 um 18:23 schrieb Stefan Wahren:
>>> Hi Matthias,
>>>
>>> Am 06.01.23 um 00:03 schrieb Matthias Kaehlcke:
>>>> Currently each onboard_hub platform device owns an 'attach' work,
>>>> which is scheduled when the device probes. With this deadlocks
>>>> have been reported on a Raspberry Pi 3 B+ [1], which has nested
>>>> onboard hubs.
>>>>
>>>> The flow of the deadlock is something like this (with the onboard_hub
>>>> driver built as a module) [2]:
>>>>
>>>> - USB root hub is instantiated
>>>> - core hub driver calls onboard_hub_create_pdevs(), which creates the
>>>>     'raw' platform device for the 1st level hub
>>>> - 1st level hub is probed by the core hub driver
>>>> - core hub driver calls onboard_hub_create_pdevs(), which creates
>>>>     the 'raw' platform device for the 2nd level hub
>>>>
>>>> - onboard_hub platform driver is registered
>>>> - platform device for 1st level hub is probed
>>>>     - schedules 'attach' work
>>>> - platform device for 2nd level hub is probed
>>>>     - schedules 'attach' work
>>>>
>>>> - onboard_hub USB driver is registered
>>>> - device (and parent) lock of hub is held while the device is
>>>>     re-probed with the onboard_hub driver
>>>>
>>>> - 'attach' work (running in another thread) calls driver_attach(), which
>>>>      blocks on one of the hub device locks
>>>>
>>>> - onboard_hub_destroy_pdevs() is called by the core hub driver when one
>>>>     of the hubs is detached
>>>> - destroying the pdevs invokes onboard_hub_remove(), which waits for the
>>>>     'attach' work to complete
>>>>     - waits forever, since the 'attach' work can't acquire the device
>>>> lock
>>>>
>>>> Use a single work struct for the driver instead of having a work struct
>>>> per onboard hub platform driver instance. With that it isn't necessary
>>>> to cancel the work in onboard_hub_remove(), which fixes the deadlock.
>>>> The work is only cancelled when the driver is unloaded.
>>> i applied both patches for this series on top of v6.1
>>> (multi_v7_defconfig), but usb is still broken on Raspberry Pi 3 B+
> Thanks for testing.
>
>> here is the hung task output:
>>
>> [  243.682193] INFO: task kworker/1:0:18 blocked for more than 122 seconds.
>> [  243.682222]       Not tainted 6.1.0-00002-gaa61d98d165b #2
>> [  243.682233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  243.682242] task:kworker/1:0     state:D stack:0     pid:18 ppid:2
>> flags:0x00000000
>> [  243.682267] Workqueue: events onboard_hub_attach_usb_driver
>> [onboard_usb_hub]
>> [  243.682317]  __schedule from schedule+0x4c/0xe0
>> [  243.682345]  schedule from schedule_preempt_disabled+0xc/0x10
>> [  243.682367]  schedule_preempt_disabled from
>> __mutex_lock.constprop.0+0x244/0x804
>> [  243.682394]  __mutex_lock.constprop.0 from __driver_attach+0x7c/0x188
>> [  243.682421]  __driver_attach from bus_for_each_dev+0x70/0xb0
>> [  243.682449]  bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
>> [onboard_usb_hub]
>> [  243.682494]  onboard_hub_attach_usb_driver [onboard_usb_hub] from
>> process_one_work+0x1ec/0x4d0
>> [  243.682534]  process_one_work from worker_thread+0x50/0x540
>> [  243.682559]  worker_thread from kthread+0xd0/0xec
>> [  243.682582]  kthread from ret_from_fork+0x14/0x2c
>> [  243.682600] Exception stack(0xf086dfb0 to 0xf086dff8)
>> [  243.682615] dfa0:                                     00000000 00000000
>> 00000000 00000000
>> [  243.682631] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000
>> [  243.682646] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> [  243.682692] INFO: task kworker/1:2:82 blocked for more than 122 seconds.
>> [  243.682703]       Not tainted 6.1.0-00002-gaa61d98d165b #2
>> [  243.682713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  243.682721] task:kworker/1:2     state:D stack:0     pid:82 ppid:2
>> flags:0x00000000
>> [  243.682741] Workqueue: events_power_efficient hub_init_func2
>> [  243.682764]  __schedule from schedule+0x4c/0xe0
>> [  243.682785]  schedule from schedule_preempt_disabled+0xc/0x10
>> [  243.682808]  schedule_preempt_disabled from
>> __mutex_lock.constprop.0+0x244/0x804
>> [  243.682833]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
>> [  243.682859]  hub_activate from process_one_work+0x1ec/0x4d0
>> [  243.682883]  process_one_work from worker_thread+0x50/0x540
>> [  243.682907]  worker_thread from kthread+0xd0/0xec
>> [  243.682927]  kthread from ret_from_fork+0x14/0x2c
>> [  243.682944] Exception stack(0xf1509fb0 to 0xf1509ff8)
>> [  243.682958] 9fa0:                                     00000000 00000000
>> 00000000 00000000
>> [  243.682974] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000
>> [  243.682988] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> [  243.683023] INFO: task kworker/1:4:257 blocked for more than 122 seconds.
>> [  243.683034]       Not tainted 6.1.0-00002-gaa61d98d165b #2
>> [  243.683043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  243.683051] task:kworker/1:4     state:D stack:0     pid:257 ppid:2
>> flags:0x00000000
>> [  243.683071] Workqueue: events_power_efficient hub_init_func2
>> [  243.683092]  __schedule from schedule+0x4c/0xe0
>> [  243.683113]  schedule from schedule_preempt_disabled+0xc/0x10
>> [  243.683135]  schedule_preempt_disabled from
>> __mutex_lock.constprop.0+0x244/0x804
>> [  243.683160]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
>> [  243.683184]  hub_activate from process_one_work+0x1ec/0x4d0
>> [  243.683209]  process_one_work from worker_thread+0x50/0x540
>> [  243.683233]  worker_thread from kthread+0xd0/0xec
>> [  243.683253]  kthread from ret_from_fork+0x14/0x2c
>> [  243.683270] Exception stack(0xf09d9fb0 to 0xf09d9ff8)
>> [  243.683283] 9fa0:                                     00000000 00000000
>> 00000000 00000000
>> [  243.683299] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000
>> [  243.683313] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Does commenting the following help:
>
>    while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
>        msleep(10);
>
> ?
Yes, it does. I restarted the board multiple times and it never hang :-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver
  2023-01-09 20:19         ` Stefan Wahren
@ 2023-01-10  1:07           ` Matthias Kaehlcke
  0 siblings, 0 replies; 9+ messages in thread
From: Matthias Kaehlcke @ 2023-01-10  1:07 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Greg Kroah-Hartman, Johan Hovold, linux-usb, Alexander Stein,
	Icenowy Zheng, Douglas Anderson, stable, linux-kernel,
	Ravi Chandra Sadineni

On Mon, Jan 09, 2023 at 09:19:02PM +0100, Stefan Wahren wrote:
> Hi Matthias,
> 
> Am 09.01.23 um 18:40 schrieb Matthias Kaehlcke:
> > On Sun, Jan 08, 2023 at 11:47:13AM +0100, Stefan Wahren wrote:
> > > Am 07.01.23 um 18:23 schrieb Stefan Wahren:
> > > > Hi Matthias,
> > > > 
> > > > Am 06.01.23 um 00:03 schrieb Matthias Kaehlcke:
> > > > > Currently each onboard_hub platform device owns an 'attach' work,
> > > > > which is scheduled when the device probes. With this deadlocks
> > > > > have been reported on a Raspberry Pi 3 B+ [1], which has nested
> > > > > onboard hubs.
> > > > > 
> > > > > The flow of the deadlock is something like this (with the onboard_hub
> > > > > driver built as a module) [2]:
> > > > > 
> > > > > - USB root hub is instantiated
> > > > > - core hub driver calls onboard_hub_create_pdevs(), which creates the
> > > > >     'raw' platform device for the 1st level hub
> > > > > - 1st level hub is probed by the core hub driver
> > > > > - core hub driver calls onboard_hub_create_pdevs(), which creates
> > > > >     the 'raw' platform device for the 2nd level hub
> > > > > 
> > > > > - onboard_hub platform driver is registered
> > > > > - platform device for 1st level hub is probed
> > > > >     - schedules 'attach' work
> > > > > - platform device for 2nd level hub is probed
> > > > >     - schedules 'attach' work
> > > > > 
> > > > > - onboard_hub USB driver is registered
> > > > > - device (and parent) lock of hub is held while the device is
> > > > >     re-probed with the onboard_hub driver
> > > > > 
> > > > > - 'attach' work (running in another thread) calls driver_attach(), which
> > > > >      blocks on one of the hub device locks
> > > > > 
> > > > > - onboard_hub_destroy_pdevs() is called by the core hub driver when one
> > > > >     of the hubs is detached
> > > > > - destroying the pdevs invokes onboard_hub_remove(), which waits for the
> > > > >     'attach' work to complete
> > > > >     - waits forever, since the 'attach' work can't acquire the device
> > > > > lock
> > > > > 
> > > > > Use a single work struct for the driver instead of having a work struct
> > > > > per onboard hub platform driver instance. With that it isn't necessary
> > > > > to cancel the work in onboard_hub_remove(), which fixes the deadlock.
> > > > > The work is only cancelled when the driver is unloaded.
> > > > i applied both patches for this series on top of v6.1
> > > > (multi_v7_defconfig), but usb is still broken on Raspberry Pi 3 B+
> > Thanks for testing.
> > 
> > > here is the hung task output:
> > > 
> > > [  243.682193] INFO: task kworker/1:0:18 blocked for more than 122 seconds.
> > > [  243.682222]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> > > [  243.682233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > this message.
> > > [  243.682242] task:kworker/1:0     state:D stack:0     pid:18 ppid:2
> > > flags:0x00000000
> > > [  243.682267] Workqueue: events onboard_hub_attach_usb_driver
> > > [onboard_usb_hub]
> > > [  243.682317]  __schedule from schedule+0x4c/0xe0
> > > [  243.682345]  schedule from schedule_preempt_disabled+0xc/0x10
> > > [  243.682367]  schedule_preempt_disabled from
> > > __mutex_lock.constprop.0+0x244/0x804
> > > [  243.682394]  __mutex_lock.constprop.0 from __driver_attach+0x7c/0x188
> > > [  243.682421]  __driver_attach from bus_for_each_dev+0x70/0xb0
> > > [  243.682449]  bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
> > > [onboard_usb_hub]
> > > [  243.682494]  onboard_hub_attach_usb_driver [onboard_usb_hub] from
> > > process_one_work+0x1ec/0x4d0
> > > [  243.682534]  process_one_work from worker_thread+0x50/0x540
> > > [  243.682559]  worker_thread from kthread+0xd0/0xec
> > > [  243.682582]  kthread from ret_from_fork+0x14/0x2c
> > > [  243.682600] Exception stack(0xf086dfb0 to 0xf086dff8)
> > > [  243.682615] dfa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [  243.682631] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [  243.682646] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [  243.682692] INFO: task kworker/1:2:82 blocked for more than 122 seconds.
> > > [  243.682703]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> > > [  243.682713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > this message.
> > > [  243.682721] task:kworker/1:2     state:D stack:0     pid:82 ppid:2
> > > flags:0x00000000
> > > [  243.682741] Workqueue: events_power_efficient hub_init_func2
> > > [  243.682764]  __schedule from schedule+0x4c/0xe0
> > > [  243.682785]  schedule from schedule_preempt_disabled+0xc/0x10
> > > [  243.682808]  schedule_preempt_disabled from
> > > __mutex_lock.constprop.0+0x244/0x804
> > > [  243.682833]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
> > > [  243.682859]  hub_activate from process_one_work+0x1ec/0x4d0
> > > [  243.682883]  process_one_work from worker_thread+0x50/0x540
> > > [  243.682907]  worker_thread from kthread+0xd0/0xec
> > > [  243.682927]  kthread from ret_from_fork+0x14/0x2c
> > > [  243.682944] Exception stack(0xf1509fb0 to 0xf1509ff8)
> > > [  243.682958] 9fa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [  243.682974] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [  243.682988] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [  243.683023] INFO: task kworker/1:4:257 blocked for more than 122 seconds.
> > > [  243.683034]       Not tainted 6.1.0-00002-gaa61d98d165b #2
> > > [  243.683043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > this message.
> > > [  243.683051] task:kworker/1:4     state:D stack:0     pid:257 ppid:2
> > > flags:0x00000000
> > > [  243.683071] Workqueue: events_power_efficient hub_init_func2
> > > [  243.683092]  __schedule from schedule+0x4c/0xe0
> > > [  243.683113]  schedule from schedule_preempt_disabled+0xc/0x10
> > > [  243.683135]  schedule_preempt_disabled from
> > > __mutex_lock.constprop.0+0x244/0x804
> > > [  243.683160]  __mutex_lock.constprop.0 from hub_activate+0x584/0x8b0
> > > [  243.683184]  hub_activate from process_one_work+0x1ec/0x4d0
> > > [  243.683209]  process_one_work from worker_thread+0x50/0x540
> > > [  243.683233]  worker_thread from kthread+0xd0/0xec
> > > [  243.683253]  kthread from ret_from_fork+0x14/0x2c
> > > [  243.683270] Exception stack(0xf09d9fb0 to 0xf09d9ff8)
> > > [  243.683283] 9fa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [  243.683299] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [  243.683313] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > Does commenting the following help:
> > 
> >    while (work_busy(&attach_usb_driver_work) & WORK_BUSY_RUNNING)
> >        msleep(10);
> > 
> > ?
> Yes, it does. I restarted the board multiple times and it never hang :-)

Thanks again for testing!

I'll post a version without that loop, which shouldn't be needed as per the
rationale in my previous mail.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-01-10  1:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-05 23:03 [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Matthias Kaehlcke
2023-01-05 23:03 ` [PATCH 2/2] usb: misc: onboard_hub: Move 'attach' work to the driver Matthias Kaehlcke
2023-01-07 17:23   ` Stefan Wahren
2023-01-08 10:47     ` Stefan Wahren
2023-01-09 17:40       ` Matthias Kaehlcke
2023-01-09 20:19         ` Stefan Wahren
2023-01-10  1:07           ` Matthias Kaehlcke
2023-01-06 15:32 ` [PATCH 1/2] usb: misc: onboard_hub: Invert driver registration order Greg Kroah-Hartman
2023-01-06 16:13   ` Matthias Kaehlcke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.