From: Alan Stern <stern@rowland.harvard.edu>
To: Sam Sun <samsun1006219@gmail.com>,
	Greg KH <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
	"xrivendell7@gmail.com" <xrivendell7@gmail.com>,
	hgajjar@de.adit-jv.com, quic_ugoswami@quicinc.com,
	stanley_chang@realtek.com, heikki.krogerus@linux.intel.com
Subject: Re: [Bug] INFO: task hung in hub_activate
Date: Mon, 4 Mar 2024 11:15:24 -0500
Message-ID: <e9d710fc-eace-44de-b3cc-1117c3575ef7@rowland.harvard.edu>
In-Reply-To: <CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNOOHocDkPoAA@mail.gmail.com>

On Mon, Mar 04, 2024 at 08:10:02PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
> 
> We encountered a task hung in function hub_activate(). It was reported
> before by Syzbot several years ago
> (https://groups.google.com/g/syzkaller-lts-bugs/c/_komEgHj03Y/m/rbcVKyLXBwAJ),
> but no repro at that time. We have a C repro this time and kernel
> config is attached to this email. The bug report is listed below.

Never mind the rest of the kernel log; I figured out what's going on.
Here are the important parts:

> ppid:8106   flags:0x00000006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5376 [inline]
>  __schedule+0xcea/0x59e0 kernel/sched/core.c:6688
>  __schedule_loop kernel/sched/core.c:6763 [inline]
>  schedule+0xe9/0x270 kernel/sched/core.c:6778
>  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6835
>  __mutex_lock_common kernel/locking/mutex.c:679 [inline]
>  __mutex_lock+0x509/0x940 kernel/locking/mutex.c:747
>  device_lock include/linux/device.h:992 [inline]
>  usb_deauthorize_interface+0x4d/0x130 drivers/usb/core/message.c:1789
>  interface_authorized_store+0xaf/0x110 drivers/usb/core/sysfs.c:1178
>  dev_attr_store+0x54/0x80 drivers/base/core.c:2366

usb_deauthorize_interface() starts by calling device_lock() on the
usb_interface's parent usb_device.

> ppid:8109   flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5376 [inline]
>  __schedule+0xcea/0x59e0 kernel/sched/core.c:6688
>  __schedule_loop kernel/sched/core.c:6763 [inline]
>  schedule+0xe9/0x270 kernel/sched/core.c:6778
>  kernfs_drain+0x36c/0x550 fs/kernfs/dir.c:505
>  __kernfs_remove+0x280/0x650 fs/kernfs/dir.c:1465
>  kernfs_remove_by_name_ns+0xb4/0x130 fs/kernfs/dir.c:1673
>  kernfs_remove_by_name include/linux/kernfs.h:623 [inline]
>  remove_files+0x96/0x1c0 fs/sysfs/group.c:28
>  sysfs_remove_group+0x8b/0x180 fs/sysfs/group.c:292
>  sysfs_remove_groups fs/sysfs/group.c:316 [inline]
>  sysfs_remove_groups+0x60/0xa0 fs/sysfs/group.c:308
>  device_remove_groups drivers/base/core.c:2734 [inline]
>  device_remove_attrs+0x192/0x290 drivers/base/core.c:2909
>  device_del+0x391/0xa30 drivers/base/core.c:3813
>  usb_disable_device+0x360/0x7b0 drivers/usb/core/message.c:1416
>  usb_set_configuration+0x1243/0x1c40 drivers/usb/core/message.c:2063
>  usb_deauthorize_device+0xe4/0x110 drivers/usb/core/hub.c:2638
>  authorized_store+0x122/0x140 drivers/usb/core/sysfs.c:747
>  dev_attr_store+0x54/0x80 drivers/base/core.c:2366

Among other things, usb_disable_device() calls device_del() for the
usb_device's child interfaces.

For brevity, let A be the parent usb_device and let B be the child
usb_interface.  Then in broad terms, we have:

CPU 0					CPU 1
-----------------------------		----------------------------
usb_deauthorize_device(A)
  device_lock(A)			usb_deauthorize_interface(B)
  usb_set_configuration(A, -1)		  device_lock(A)
    usb_disable_device(A)
      device_del(B)
        sysfs_remove_group(B, intf_attrs)

The problem now is:

1.	The kernfs core (kernfs_drain) on CPU 0 can't remove the
	intf_attrs sysfs attribute group while CPU 1 is in the middle
	of running a callback routine for one of the attribute files
	in that group.

2.	The callback routine on CPU 1 can't grab A's lock while CPU 0
	is holding it.

Result: deadlock.
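The cycle above can be reproduced in a small userspace model. This is only a sketch under stated assumptions, not kernel code: a pthread mutex (the hypothetical dev_lock) stands in for device_lock(A), a counter of running callbacks (active) stands in for the kernfs active references that kernfs_drain() waits on, and two threads play CPU 0 and CPU 1. Both waits are bounded at 200 ms purely so the demonstration terminates; the real kernel paths have no such bound and wait forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Userspace model, not kernel code; all names are hypothetical. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* device_lock(A) */
static pthread_mutex_t kn_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects active */
static pthread_cond_t kn_cond = PTHREAD_COND_INITIALIZER;
static int active;                 /* callbacks running on B's attribute files */
static pthread_barrier_t step1, step2;
static bool cpu0_timed_out, cpu1_timed_out;

/* Absolute CLOCK_REALTIME deadline 200 ms from now (demo-only bound). */
static struct timespec deadline(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_nsec += 200000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* CPU 0: device_lock(A), then kernfs_drain() on B's attributes. */
static void *cpu0(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	pthread_barrier_wait(&step1);          /* CPU 1's callback is now running */
	struct timespec ts = deadline();
	int rc = 0;
	pthread_mutex_lock(&kn_lock);
	while (active > 0 && rc != ETIMEDOUT)  /* wait for callbacks to finish */
		rc = pthread_cond_timedwait(&kn_cond, &kn_lock, &ts);
	cpu0_timed_out = (active > 0);
	pthread_mutex_unlock(&kn_lock);
	pthread_barrier_wait(&step2);          /* back off only after CPU 1 has */
	pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* CPU 1: interface_authorized_store() callback on B, then device_lock(A). */
static void *cpu1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&kn_lock);
	active++;                              /* callback has started */
	pthread_mutex_unlock(&kn_lock);
	pthread_barrier_wait(&step1);          /* CPU 0 now holds A */
	struct timespec ts = deadline();
	int rc = pthread_mutex_timedlock(&dev_lock, &ts);
	cpu1_timed_out = (rc == ETIMEDOUT);
	if (rc == 0)
		pthread_mutex_unlock(&dev_lock);
	pthread_barrier_wait(&step2);
	pthread_mutex_lock(&kn_lock);
	active--;                              /* callback returns */
	pthread_cond_broadcast(&kn_cond);
	pthread_mutex_unlock(&kn_lock);
	return NULL;
}

/* True if each side timed out waiting for the other, i.e. a deadlock
 * once the demo-only timeouts are removed. */
static bool deadlock_detected(void)
{
	pthread_t t0, t1;
	active = 0;
	cpu0_timed_out = cpu1_timed_out = false;
	if (pthread_barrier_init(&step1, NULL, 2) != 0 ||
	    pthread_barrier_init(&step2, NULL, 2) != 0)
		abort();
	if (pthread_create(&t0, NULL, cpu0, NULL) != 0 ||
	    pthread_create(&t1, NULL, cpu1, NULL) != 0)
		abort();
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	pthread_barrier_destroy(&step1);
	pthread_barrier_destroy(&step2);
	assert(active == 0);
	return cpu0_timed_out && cpu1_timed_out;
}
```

The barriers make the interleaving deterministic: neither thread releases what it holds until both have given up, so deadlock_detected() reports that each side was blocked on the other.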

This seems to be the only case where an interface sysfs callback
routine tries to acquire the parent device's lock.  That lock is
needed here because when an interface is deauthorized, the kernel has
to unbind the driver for that interface -- and binding or unbinding a
USB interface driver requires that the parent device's lock be held.

Three ideas stand out.  First, the device_lock() call should be
interruptible, because it is called when a user process writes to the
"authorized" attribute file.  But that alone won't fix the problem.

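In the kernel, an interruptible variant would presumably be device_lock_interruptible(), which wraps mutex_lock_interruptible() and returns -EINTR if a signal arrives while waiting. The userspace sketch below only models that behaviour: the hypothetical lock_interruptible() polls the lock and a "signal pending" flag instead of sleeping on a wait queue. It also shows why this alone is not a fix: the hang persists until someone signals the writer, and the deauthorize request then fails with -EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical helper modelling mutex_lock_interruptible(): wait for
 * the lock, but give up with -EINTR once *signal_pending is set.
 * Polling every 1 ms is a simplification of a real sleeping wait. */
static int lock_interruptible(pthread_mutex_t *m, atomic_bool *signal_pending)
{
	const struct timespec tick = { 0, 1000000L };
	assert(m != NULL && signal_pending != NULL);
	for (;;) {
		if (pthread_mutex_trylock(m) == 0)
			return 0;              /* got the lock */
		if (atomic_load(signal_pending))
			return -EINTR;         /* interrupted while waiting */
		nanosleep(&tick, NULL);
	}
}
```

With the lock already held and a signal pending, the call returns -EINTR at once; with the lock free, it succeeds.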
Second, we could avoid the deadlock by adding a timeout to this
device_lock() call.  But we probably don't want a deauthorize
operation to fail because of a timeout from a contested lock.
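A timeout would behave roughly like the following userspace sketch, built on POSIX pthread_mutex_timedlock(). The names lock_with_timeout(), holder(), parent_lock and contended_attempt() are hypothetical and exist only for illustration. The sketch shows the objection above: the request fails with -ETIMEDOUT whenever the lock is held longer than the timeout, whether or not a real deadlock is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER; /* models A's lock */
static pthread_barrier_t held;

/* Returns 0 on success or a negative errno such as -ETIMEDOUT. */
static int lock_with_timeout(pthread_mutex_t *m, long timeout_ms)
{
	struct timespec ts;
	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	int rc = pthread_mutex_timedlock(m, &ts);
	return rc == 0 ? 0 : -rc;
}

/* Another task holds the parent's lock for 500 ms: contended, not deadlocked. */
static void *holder(void *arg)
{
	const struct timespec hold = { 0, 500000000L };
	(void)arg;
	pthread_mutex_lock(&parent_lock);
	pthread_barrier_wait(&held);
	nanosleep(&hold, NULL);
	pthread_mutex_unlock(&parent_lock);
	return NULL;
}

/* One "deauthorize" attempt against the busy lock. */
static int contended_attempt(long timeout_ms)
{
	pthread_t t;
	if (pthread_barrier_init(&held, NULL, 2) != 0 ||
	    pthread_create(&t, NULL, holder, NULL) != 0)
		abort();
	pthread_barrier_wait(&held);           /* lock is now held by holder() */
	int rc = lock_with_timeout(&parent_lock, timeout_ms);
	if (rc == 0)
		pthread_mutex_unlock(&parent_lock);
	pthread_join(t, NULL);
	pthread_barrier_destroy(&held);
	return rc;
}
```

A 50 ms timeout gives up while the lock is merely busy; a 2000 ms timeout succeeds once the holder releases it. Picking a value that never fails spuriously is the hard part.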

Third, this must be a generic problem.  It will occur any time a sysfs
attribute callback tries to lock its device while another process is
trying to unregister that device.

We faced this sort of problem some years ago when we were worrying
about "suicidal" attributes -- ones which would unregister their own
devices.  I don't remember what the fix was or how it worked.  But we
need something like it here.
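For reference, the kernel has helpers in this area. kernfs_remove_self() (with wrappers such as device_remove_file_self()) appears to be what was added for self-removing attributes, and sysfs_break_active_protection() / sysfs_unbreak_active_protection() let an attribute callback drop its own active reference so that a concurrent kernfs_drain() does not wait for it. The userspace model below, with hypothetical names, sketches the second idea under the same assumptions as a plain pthread model: the callback on CPU 1 releases its active reference before taking the parent's lock, so CPU 0 can finish draining, and the callback then completes after the attribute group is already gone. In this model the attribute is indeed removed while its callback is still running, so a callback that breaks protection must not rely on state that the removal tears down.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model, not kernel code; all names are hypothetical. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* device_lock(A) */
static pthread_mutex_t kn_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects active */
static pthread_cond_t kn_cond = PTHREAD_COND_INITIALIZER;
static int active;                 /* callbacks running on B's attribute files */
static bool removed;               /* B's attribute group has been removed */
static bool callback_saw_removed;
static pthread_barrier_t started;

static void put_active(void)
{
	pthread_mutex_lock(&kn_lock);
	active--;
	pthread_cond_broadcast(&kn_cond);
	pthread_mutex_unlock(&kn_lock);
}

/* CPU 0: holds A's lock while draining and removing B's attributes. */
static void *cpu0(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	pthread_barrier_wait(&started);
	pthread_mutex_lock(&kn_lock);
	while (active > 0)                     /* models kernfs_drain() */
		pthread_cond_wait(&kn_cond, &kn_lock);
	removed = true;
	pthread_mutex_unlock(&kn_lock);
	pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* CPU 1: callback drops its own active reference before locking A,
 * which is the idea behind sysfs_break_active_protection(). */
static void *cpu1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&kn_lock);
	active++;                              /* callback starts */
	pthread_mutex_unlock(&kn_lock);
	pthread_barrier_wait(&started);        /* CPU 0 now holds A */
	put_active();                          /* "break" active protection */
	pthread_mutex_lock(&dev_lock);         /* no longer blocks the drain */
	callback_saw_removed = removed;
	pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* True if the callback finished after its attribute group was removed. */
static bool run_with_broken_protection(void)
{
	pthread_t t0, t1;
	active = 0;
	removed = callback_saw_removed = false;
	if (pthread_barrier_init(&started, NULL, 2) != 0)
		abort();
	if (pthread_create(&t0, NULL, cpu0, NULL) != 0 ||
	    pthread_create(&t1, NULL, cpu1, NULL) != 0)
		abort();
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	pthread_barrier_destroy(&started);
	assert(active == 0);
	return callback_saw_removed;
}
```

Unlike the deadlocking ordering, this needs no timeouts: CPU 0 can always finish draining because the callback is no longer counted as active while it waits for A's lock.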

Greg and Tejun, any ideas?  Is it possible somehow for an attribute file 
to be removed while its callback is still running?

Alan Stern
