All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthias Kaehlcke <mka@chromium.org>
To: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Fabrice Gasnier <fabrice.gasnier@foss.st.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, regressions@lists.linux.dev,
	Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Regression: onboard-usb-hub breaks USB on RPi 3
Date: Thu, 22 Dec 2022 00:55:53 +0000	[thread overview]
Message-ID: <Y6OrGbqaMy2iVDWB@google.com> (raw)
In-Reply-To: <8503cdb5-6089-b9e4-25ff-f3ac294b7a79@i2se.com>

Hi Stefan,

On Wed, Dec 21, 2022 at 10:31:09PM +0100, Stefan Wahren wrote:
> Hi Matthias,
> 
> Am 21.12.22 um 20:02 schrieb Matthias Kaehlcke:
> > Hi Stefan,
> > 
> > On Wed, Dec 21, 2022 at 07:00:41PM +0100, Stefan Wahren wrote:
> > > I will try to play with lock debugging.
> > Thanks, hopefully that can provide some hint.
> > 
> DETECT_HUNG_TASK reveals this in error case:
> 
> [  243.676253] INFO: task kworker/2:1:44 blocked for more than 122 seconds.
> [  243.676284]       Not tainted 6.1.0-00007-g22fada783b9f #32
> [  243.676294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.676303] task:kworker/2:1     state:D stack:0     pid:44 ppid:2     
> flags:0x00000000
> [  243.676329] Workqueue: events onboard_hub_attach_usb_driver
> [onboard_usb_hub]
> [  243.676388]  __schedule from schedule+0x58/0xf8
> [  243.676419]  schedule from schedule_preempt_disabled+0x1c/0x2c
> [  243.676445]  schedule_preempt_disabled from
> __mutex_lock.constprop.0+0x29c/0x948
> [  243.676474]  __mutex_lock.constprop.0 from __driver_attach+0x74/0x188
> [  243.676503]  __driver_attach from bus_for_each_dev+0x70/0xb0
> [  243.676532]  bus_for_each_dev from onboard_hub_attach_usb_driver+0xc/0x28
> [onboard_usb_hub]
> [  243.676587]  onboard_hub_attach_usb_driver [onboard_usb_hub] from
> process_one_work+0x1f8/0x520
> [  243.676637]  process_one_work from worker_thread+0x40/0x55c
> [  243.676663]  worker_thread from kthread+0xf0/0x110
> [  243.676685]  kthread from ret_from_fork+0x14/0x2c
> [  243.676705] Exception stack(0xf091dfb0 to 0xf091dff8)
> [  243.676720] dfa0:                                     00000000 00000000
> 00000000 00000000
> [  243.676737] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000
> [  243.676752] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  243.676788] INFO: task systemd-udevd:148 blocked for more than 122
> seconds.
> [  243.676800]       Not tainted 6.1.0-00007-g22fada783b9f #32
> [  243.676809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  243.676817] task:systemd-udevd   state:D stack:0     pid:148 ppid:144   
> flags:0x00000081
> [  243.676839]  __schedule from schedule+0x58/0xf8
> [  243.676864]  schedule from schedule_timeout+0xb4/0x15c
> [  243.676893]  schedule_timeout from __wait_for_common+0xc4/0x228
> [  243.676922]  __wait_for_common from __flush_work+0x1a8/0x360
> [  243.676949]  __flush_work from __cancel_work_timer+0x10c/0x1e4
> [  243.676975]  __cancel_work_timer from onboard_hub_remove+0x28/0xbc
> [onboard_usb_hub]
> [  243.677021]  onboard_hub_remove [onboard_usb_hub] from
> platform_remove+0x20/0x4c
> [  243.677067]  platform_remove from
> device_release_driver_internal+0x194/0x21c
> [  243.677092]  device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [  243.677124]  bus_remove_device from device_del+0x16c/0x468
> [  243.677159]  device_del from platform_device_del.part.0+0x10/0x74
> [  243.677187]  platform_device_del.part.0 from
> platform_device_unregister+0x18/0x24
> [  243.677216]  platform_device_unregister from
> of_platform_device_destroy+0x98/0xa8
> [  243.677249]  of_platform_device_destroy from
> onboard_hub_destroy_pdevs+0x48/0x6c
> [  243.677287]  onboard_hub_destroy_pdevs from hub_disconnect+0x104/0x174
> [  243.677321]  hub_disconnect from usb_unbind_interface+0x78/0x26c
> [  243.677356]  usb_unbind_interface from
> device_release_driver_internal+0x194/0x21c
> [  243.677388]  device_release_driver_internal from
> bus_remove_device+0xcc/0xf8
> [  243.677419]  bus_remove_device from device_del+0x16c/0x468
> [  243.677452]  device_del from usb_disable_device+0xcc/0x178
> [  243.677486]  usb_disable_device from usb_set_configuration+0x4ec/0x8d0
> [  243.677523]  usb_set_configuration from usb_unbind_device+0x24/0x7c
> [  243.677560]  usb_unbind_device from
> device_release_driver_internal+0x194/0x21c
> [  243.677590]  device_release_driver_internal from device_reprobe+0x18/0x90
> [  243.677620]  device_reprobe from __usb_bus_reprobe_drivers+0x40/0x6c
> [  243.677648]  __usb_bus_reprobe_drivers from bus_for_each_dev+0x70/0xb0
> [  243.677676]  bus_for_each_dev from usb_register_device_driver+0x9c/0xd0
> [  243.677713]  usb_register_device_driver from onboard_hub_init+0x30/0x1000
> [onboard_usb_hub]
> [  243.677765]  onboard_hub_init [onboard_usb_hub] from
> do_one_initcall+0x40/0x204
> [  243.677811]  do_one_initcall from do_init_module+0x44/0x1d4
> [  243.677840]  do_init_module from sys_finit_module+0xbc/0xf8
> [  243.677865]  sys_finit_module from __sys_trace_return+0x0/0x10
> [  243.677887] Exception stack(0xf4659fa8 to 0xf4659ff0)
> [  243.677904] 9fa0:                   bf369800 0051dba8 00000006 b6e438e0
> 00000000 b6e443f4
> [  243.677921] 9fc0: bf369800 0051dba8 00000000 0000017b 00531658 0051a1dc
> 00526398 00000000
> [  243.677935] 9fe0: befbb160 befbb150 b6e3a9d8 b6f2aae0

Thanks, that's useful!

The flow is something like this:

- USB root hub is instantiated
- core hub driver calls onboard_hub_create_pdevs(), which creates the
  platform device for the 1st level hub
  - the platform device is created even though the onboard hub driver
    hasn't been loaded yet, because onboard_hub_create/destroy_pdevs()
    is linked into the USB core
- 1st level Microchip hubs is probed by the core hub driver
- core hub driver calls onboard_hub_create_pdevs(), which creates
  the platform device for the 2nd level hub

- onboard_hub platform driver is registered
- platform device for 1st level hub is probed
  - schedules 'attach' work
- platform device for 2nd level hub is probed
  - schedules 'attach' work

- onboard_hub USB driver is registered
- device (and parent) lock of Microchip hub is held while the device is
  re-probing

- 'attach' work (running in another thread) calls driver_attach(), which
  blocks on one of the hub device locks

- onboard_hub_destroy_pdevs() is called by the core hub driver when one
  of the Microchip hubs is detached
- destroying the pdevs invokes onboard_hub_remove(), which waits for the
  'attach' work to complete
  - waits forever, since the 'attach' work can't acquire the device lock

For the Rpi 3 B Plus and boards with similar configurations it should be
enough with not creating the onboard_hub platform devices, which anyway
is the right thing do. I'll send patches for this.

The above race condition could also impact boards which actually should
use the onboard_hub driver, so not creating the pdevs for some boards
won't be enough.

Out of my head I can't think of a clean solution. The onboard hub driver
doesn't control the locks involved. Detaching the driver is necessary
to make sure the onboard_hub USB devices don't hold stale pointers to
their platform device. Re-attaching is needed because of the detach.

One option could be to change the 'attach' work from being a member of
struct onboard_hub to a static variable owned by the driver. With that
onboard_hub_remove() wouldn't have to wait for the work to finish. An
inconvenient is that only one instance of the work can run at any time,
which could result in a race condition when multiple onboard hubs are
probed closely together. It could happen that the USB device of the 2nd
(or 3rd, ...) hub isn't re-attached if it wasn't on the system wide USB
'bus' yet when the 'attach' work of the 1st hub runs. Probably a rare
condition within the (as of now) rare scenario of multiple onboard hubs,
but it could happen. A mitigation could be to enter a sleepy loop if
schedule_work() returns false (work is already running) and schedule it
again until it is actually scheduled on behalf of the platform device
in question. I might go for that if I don't get a better idea.

Happy holidays!

m.

  reply	other threads:[~2022-12-22  0:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-18 13:35 Regression: onboard-usb-hub breaks USB on RPi 3 Stefan Wahren
2022-12-19 10:41 ` Regression: onboard-usb-hub breaks USB on RPi 3 #forregzbot Thorsten Leemhuis
2022-12-19 17:44 ` Regression: onboard-usb-hub breaks USB on RPi 3 Matthias Kaehlcke
2022-12-19 22:32   ` Stefan Wahren
2022-12-20  0:30     ` Matthias Kaehlcke
2022-12-20 16:19       ` Stefan Wahren
2022-12-20 22:50         ` Matthias Kaehlcke
2022-12-21 12:29           ` Stefan Wahren
2022-12-21 16:50             ` Matthias Kaehlcke
2022-12-21 18:00               ` Stefan Wahren
2022-12-21 19:02                 ` Matthias Kaehlcke
2022-12-21 21:31                   ` Stefan Wahren
2022-12-22  0:55                     ` Matthias Kaehlcke [this message]
2022-12-22 11:19                       ` Stefan Wahren
2022-12-27 13:15                     ` Stefan Wahren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y6OrGbqaMy2iVDWB@google.com \
    --to=mka@chromium.org \
    --cc=f.fainelli@gmail.com \
    --cc=fabrice.gasnier@foss.st.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=stefan.wahren@i2se.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.