* Re: USB: add check to detect host controller hardware removal
[not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com>
@ 2023-10-13 17:17 ` Alan Stern
2023-10-15 13:37 ` Li, Meng
2023-10-16 16:56 ` Steven Rostedt
0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-13 17:17 UTC (permalink / raw)
To: Li, Meng; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list
Messages like this should always be sent to the mailing list as well as
to me. And in this case, since it involves an RT kernel, it should be
CC-ed to some of the people involved in developing the RT patches.
On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote:
> Hi Alan Stern,
>
> Sorry for disrupting you very abruptly. I encounter a calltrace when trying to remove a PCIe-to-USB card device with below command. Only occurred in rt kernel.
>
> # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
>
> Call trace:
> dump_backtrace.part.0+0xc8/0xd4
> show_stack+0x20/0x30
> dump_stack_lvl+0x6c/0x88
> dump_stack+0x18/0x34
> __might_resched+0x160/0x1c0
> rt_spin_lock+0x38/0xb0
> xhci_irq+0x44/0x16d0
> usb_hcd_irq+0x38/0x5c
> usb_hcd_pci_remove+0x84/0x14c
> xhci_pci_remove+0x78/0xc0
> pci_device_remove+0x44/0xcc
> device_remove+0x54/0x8c
> device_release_driver_internal+0x1ec/0x260
> device_release_driver+0x20/0x30
> pci_stop_bus_device+0x8c/0xcc
> pci_stop_and_remove_bus_device_locked+0x28/0x44
> remove_store+0xa0/0xb0
> dev_attr_store+0x20/0x34
> sysfs_kf_write+0x4c/0x5c
> kernfs_fop_write_iter+0x128/0x1f0
> vfs_write+0x1c0/0x2f0
> ksys_write+0x78/0x110
> __arm64_sys_write+0x24/0x30
> invoke_syscall+0x5c/0x130
> el0_svc_common.constprop.0+0x4c/0xf4
> do_el0_svc+0x34/0xc0
> el0_svc+0x2c/0x84
> el0t_64_sync_handler+0xf4/0x120
> el0t_64_sync+0x18c/0x190
>
> this trace is caused by below patch that is used to fix a usb hang issue.
>
> commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> Author: Alan Stern stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu>
> Date: Wed Jun 9 17:34:27 2010 -0400
>
> USB: add check to detect host controller hardware removal
>
>
> I know it is too too long ago, but could you please try to recall how to reproduce the issue that you fixed, how can I produce the usb hang issue.
I think the issue could be reproduced just by hot-removing a USB host
controller. Maybe the controller was on a PCMCIA card; I don't
remember. And I don't remember what kind of USB host controller it was.
You might be able to find the original bug report in the linux-usb or
linux-usb-devel mailing list archives for 2010.
> I want to try whether I can get another method to fix the USB hang issue without disable irq, so that avoid the calltrace.
For new people, the contents of that commit are:
This patch (as1391) fixes a problem that can occur when USB host
controller hardware is hot-unplugged. If no interrupts are generated
by the unplug then the HCD may not realize that the controller is
gone, and the subsequent unbind may hang waiting for interrupts that
never arrive.
The solution (for PCI-based controllers) is to call the HCD's
interrupt handler at the start of usb_hcd_pci_remove(). If the
hardware is gone, the handler will realize this when it tries to read
the controller's status register.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 1cf2d1e79a5c..7e2d5271b0c9 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
if (!hcd)
return;
+ /* Fake an interrupt request in order to give the driver a chance
+ * to test whether the controller hardware has been removed (e.g.,
+ * cardbus physical eject).
+ */
+ local_irq_disable();
+ usb_hcd_irq(0, hcd);
+ local_irq_enable();
+
usb_remove_hcd(hcd);
if (hcd->driver->flags & HCD_MEMORY) {
iounmap(hcd->regs);
The local_irq_disable() is there so that the irq handler will be invoked
in the state that it expects (i.e., with interrupts disabled).
Apparently Meng's RT kernel doesn't like it when the handler then
calls spin_lock(); I don't know why.
Alan Stern
^ permalink raw reply related [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal
2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
@ 2023-10-15 13:37 ` Li, Meng
2023-10-16 16:56 ` Steven Rostedt
1 sibling, 0 replies; 25+ messages in thread
From: Li, Meng @ 2023-10-15 13:37 UTC (permalink / raw)
To: Alan Stern; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list
> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Saturday, October 14, 2023 1:18 AM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar
> <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> Messages like this should always be sent to the mailing list as well as to me.
> And in this case, since it involves an RT kernel, it should be CC-ed to some of
> the people involved in developing the RT patches.
>
> On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote:
> > Hi Alan Stern,
> >
> > Sorry for disrupting you very abruptly. I encounter a calltrace when trying to
> remove a PCIe-to-USB card device with below command. Only occurred in rt
> kernel.
> >
> > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> >
> > Call trace:
> > dump_backtrace.part.0+0xc8/0xd4
> > show_stack+0x20/0x30
> > dump_stack_lvl+0x6c/0x88
> > dump_stack+0x18/0x34
> > __might_resched+0x160/0x1c0
> > rt_spin_lock+0x38/0xb0
> > xhci_irq+0x44/0x16d0
> > usb_hcd_irq+0x38/0x5c
> > usb_hcd_pci_remove+0x84/0x14c
> > xhci_pci_remove+0x78/0xc0
> > pci_device_remove+0x44/0xcc
> > device_remove+0x54/0x8c
> > device_release_driver_internal+0x1ec/0x260
> > device_release_driver+0x20/0x30
> > pci_stop_bus_device+0x8c/0xcc
> > pci_stop_and_remove_bus_device_locked+0x28/0x44
> > remove_store+0xa0/0xb0
> > dev_attr_store+0x20/0x34
> > sysfs_kf_write+0x4c/0x5c
> > kernfs_fop_write_iter+0x128/0x1f0
> > vfs_write+0x1c0/0x2f0
> > ksys_write+0x78/0x110
> > __arm64_sys_write+0x24/0x30
> > invoke_syscall+0x5c/0x130
> > el0_svc_common.constprop.0+0x4c/0xf4
> > do_el0_svc+0x34/0xc0
> > el0_svc+0x2c/0x84
> > el0t_64_sync_handler+0xf4/0x120
> > el0t_64_sync+0x18c/0x190
> >
> > this trace is caused by below patch that is used to fix a usb hang issue.
> >
> > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > Author: Alan Stern
> stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu>
> > Date: Wed Jun 9 17:34:27 2010 -0400
> >
> > USB: add check to detect host controller hardware removal
> >
> >
> > I know it is too too long ago, but could you please try to recall how to
> reproduce the issue that you fixed, how can I produce the usb hang issue.
>
> I think the issue could be reproduced just by hot-removing a USB host
> controller. Maybe the controller was on a PCMCIA card; I don't remember.
> And I don't remember what kind of USB host controller it was.
>
> You might be able to find the original bug report in the linux-usb or linux-usb-
> devel mailing list archives for 2010.
>
Thanks for offering this clue about how to reproduce this issue, I will try it on my side.
Thanks,
Limeng
> > I want to try whether I can get another method to fix the USB hang issue
> without disable irq, so that avoid the calltrace.
>
> For new people, the contents of that commit are:
>
> This patch (as1391) fixes a problem that can occur when USB host
> controller hardware is hot-unplugged. If no interrupts are generated
> by the unplug then the HCD may not realize that the controller is
> gone, and the subsequent unbind may hang waiting for interrupts that
> never arrive.
>
> The solution (for PCI-based controllers) is to call the HCD's
> interrupt handler at the start of usb_hcd_pci_remove(). If the
> hardware is gone, the handler will realize this when it tries to read
> the controller's status register.
>
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index
> 1cf2d1e79a5c..7e2d5271b0c9 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> if (!hcd)
> return;
>
> + /* Fake an interrupt request in order to give the driver a chance
> + * to test whether the controller hardware has been removed (e.g.,
> + * cardbus physical eject).
> + */
> + local_irq_disable();
> + usb_hcd_irq(0, hcd);
> + local_irq_enable();
> +
> usb_remove_hcd(hcd);
> if (hcd->driver->flags & HCD_MEMORY) {
> iounmap(hcd->regs);
>
> The local_irq_disable() is there so that the irq handler will be invoked in the
> state that it expects (i.e., with interrupts disabled).
> Apparently Meng's RT kernel doesn't like it when the handler then calls
> spin_lock(); I don't know why.
>
> Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
2023-10-15 13:37 ` Li, Meng
@ 2023-10-16 16:56 ` Steven Rostedt
2023-10-16 19:23 ` Alan Stern
2023-10-19 12:49 ` Sebastian Andrzej Siewior
1 sibling, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2023-10-16 16:56 UTC (permalink / raw)
To: Alan Stern
Cc: Li, Meng, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
On Fri, 13 Oct 2023 13:17:52 -0400
Alan Stern <stern@rowland.harvard.edu> wrote:
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> if (!hcd)
> return;
>
> + /* Fake an interrupt request in order to give the driver a chance
> + * to test whether the controller hardware has been removed (e.g.,
> + * cardbus physical eject).
> + */
> + local_irq_disable();
> + usb_hcd_irq(0, hcd);
> + local_irq_enable();
> +
> usb_remove_hcd(hcd);
> if (hcd->driver->flags & HCD_MEMORY) {
> iounmap(hcd->regs);
>
> The local_irq_disable() is there so that the irq handler will be invoked
> in the state that it expects (i.e., with interrupts disabled).
> Apparently Meng's RT kernel doesn't like it when the handler then
> calls spin_lock(); I don't know why.
Because in RT, spin_lock()s are actually mutexes.
In RT, interrupt handlers do not even run with interrupts disabled (they
run as threads), so the assumption that they run with interrupts disabled
on RT is incorrect. One hack would simply be:
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
local_irq_disable();
usb_hcd_irq(0, hcd);
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
local_irq_enable();
But that's rather ugly. We use to have that as a wrapper of just:
local_irq_disable_nort();
but I don't know if we removed that or not.
Sebastian?
-- Steve
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-16 16:56 ` Steven Rostedt
@ 2023-10-16 19:23 ` Alan Stern
2023-10-17 2:23 ` Li, Meng
2023-10-19 12:49 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-16 19:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: Li, Meng, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote:
> On Fri, 13 Oct 2023 13:17:52 -0400
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > --- a/drivers/usb/core/hcd-pci.c
> > +++ b/drivers/usb/core/hcd-pci.c
> > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> > if (!hcd)
> > return;
> >
> > + /* Fake an interrupt request in order to give the driver a chance
> > + * to test whether the controller hardware has been removed (e.g.,
> > + * cardbus physical eject).
> > + */
> > + local_irq_disable();
> > + usb_hcd_irq(0, hcd);
> > + local_irq_enable();
> > +
> > usb_remove_hcd(hcd);
> > if (hcd->driver->flags & HCD_MEMORY) {
> > iounmap(hcd->regs);
> >
> > The local_irq_disable() is there so that the irq handler will be invoked
> > in the state that it expects (i.e., with interrupts disabled).
> > Apparently Meng's RT kernel doesn't like it when the handler then
> > calls spin_lock(); I don't know why.
>
> Because in RT, spin_lock()s are actually mutexes.
>
> In RT, interrupt handlers do not even run with interrupts disabled (they
> run as threads), so the assumption that they run with interrupts disabled
> on RT is incorrect. One hack would simply be:
>
> if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> local_irq_disable();
> usb_hcd_irq(0, hcd);
> if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> local_irq_enable();
>
> But that's rather ugly. We use to have that as a wrapper of just:
>
> local_irq_disable_nort();
>
> but I don't know if we removed that or not.
>
> Sebastian?
Thanks for the information. I guess a simple approach would be to add
the wrapper back in, since it's not present in the current kernel.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal
2023-10-16 19:23 ` Alan Stern
@ 2023-10-17 2:23 ` Li, Meng
2023-10-17 14:06 ` Alan Stern
0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-10-17 2:23 UTC (permalink / raw)
To: Alan Stern, Steven Rostedt
Cc: Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users
> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Tuesday, October 17, 2023 3:24 AM
> To: Steven Rostedt <rostedt@goodmis.org>
> Cc: Li, Meng <Meng.Li@windriver.com>; Ingo Molnar <mingo@redhat.com>;
> USB mailing list <linux-usb@vger.kernel.org>; Sebastian Andrzej Siewior
> <bigeasy@linutronix.de>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote:
> > On Fri, 13 Oct 2023 13:17:52 -0400
> > Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > --- a/drivers/usb/core/hcd-pci.c
> > > +++ b/drivers/usb/core/hcd-pci.c
> > > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> > > if (!hcd)
> > > return;
> > >
> > > + /* Fake an interrupt request in order to give the driver a chance
> > > + * to test whether the controller hardware has been removed (e.g.,
> > > + * cardbus physical eject).
> > > + */
> > > + local_irq_disable();
> > > + usb_hcd_irq(0, hcd);
> > > + local_irq_enable();
> > > +
> > > usb_remove_hcd(hcd);
> > > if (hcd->driver->flags & HCD_MEMORY) {
> > > iounmap(hcd->regs);
> > >
> > > The local_irq_disable() is there so that the irq handler will be
> > > invoked in the state that it expects (i.e., with interrupts disabled).
> > > Apparently Meng's RT kernel doesn't like it when the handler then
> > > calls spin_lock(); I don't know why.
> >
> > Because in RT, spin_lock()s are actually mutexes.
> >
> > In RT, interrupt handlers do not even run with interrupts disabled
> > (they run as threads), so the assumption that they run with interrupts
> > disabled on RT is incorrect. One hack would simply be:
> >
> > if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > local_irq_disable();
> > usb_hcd_irq(0, hcd);
> > if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > local_irq_enable();
> >
> > But that's rather ugly. We use to have that as a wrapper of just:
> >
> > local_irq_disable_nort();
> >
> > but I don't know if we removed that or not.
> >
> > Sebastian?
>
> Thanks for the information. I guess a simple approach would be to add the
> wrapper back in, since it's not present in the current kernel.
>
I did some debugs on my side.
Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel.
Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful.
Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance.
I am trying to fix this issue with getting register status directly without local_irq_disable().
Thanks,
Limeng
> Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-17 2:23 ` Li, Meng
@ 2023-10-17 14:06 ` Alan Stern
2023-10-18 5:00 ` Li, Meng
0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-17 14:06 UTC (permalink / raw)
To: Li, Meng
Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote:
> I did some debugs on my side.
> Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel.
What's in the RT kernel doesn't matter here, because the code you're
patching belongs to the vanilla kernel.
> Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful.
> Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance.
What about for non-xHCI controllers?
> I am trying to fix this issue with getting register status directly without local_irq_disable().
Were you able to locate the original bug report?
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal
2023-10-17 14:06 ` Alan Stern
@ 2023-10-18 5:00 ` Li, Meng
2023-10-18 15:20 ` Alan Stern
0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-10-18 5:00 UTC (permalink / raw)
To: Alan Stern
Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Tuesday, October 17, 2023 10:06 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar
> <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org>;
> Sebastian Andrzej Siewior <bigeasy@linutronix.de>; linux-rt-users <linux-rt-
> users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote:
> > I did some debugs on my side.
> > Firstly, the local_irq_disable_nort() function had been removed from latest
> RT kernel.
>
> What's in the RT kernel doesn't matter here, because the code you're patching
> belongs to the vanilla kernel.
>
> > Second, because of creating xhci-pci.c, the commit c548795abe0d("USB:
> add check to detect host controller hardware removal") is no longer useful.
> > Because the function usb_remove_hcd() is invoked from xhci_pci_remove()
> of file xhci-pci.c in advance.
>
> What about for non-xHCI controllers?
>
I will try non-xHCI controllers in later if I can find out one on my side.
> > I am trying to fix this issue with getting register status directly without
> local_irq_disable().
>
> Were you able to locate the original bug report?
>
This is original bug report
https://bugzilla.redhat.com/show_bug.cgi?id=579093
my latest debug information as below:
when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register.
Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
Modules linked in:
CPU: 3 PID: 329 Comm: usb-storage Tainted: G W 6.1.53-rt10-yocto-preempt-rt #1
Hardware name: LS1043A RDB Board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : xhci_ring_ep_doorbell+0x78/0x120
lr : xhci_queue_bulk_tx+0x3b0/0x8a4
sp : ffff80000b0e3960
x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100
x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f
x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100
x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000
x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50
x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000
x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290
Call trace:
xhci_ring_ep_doorbell+0x78/0x120
xhci_queue_bulk_tx+0x3b0/0x8a4
xhci_urb_enqueue+0x274/0x510
usb_hcd_submit_urb+0xc0/0x8b0
usb_submit_urb+0x29c/0x5c0
usb_stor_msg_common+0x9c/0x190
usb_stor_bulk_transfer_buf+0x58/0x110
usb_stor_Bulk_transport+0xdc/0x380
usb_stor_invoke_transport+0x40/0x530
usb_stor_transparent_scsi_command+0x18/0x24
usb_stor_control_thread+0x20c/0x2a0
kthread+0x12c/0x130
ret_from_fork+0x10/0x20
Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021)
---[ end trace 0000000000000000 ]---
Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally.
In this situation, if remove the pcie device with below command
# echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove
The code will hang at the xhci->lock in xhci_urb_dequeue() function.
Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(),
there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status);
although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed.
Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed.
In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware.
so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
Thanks,
Limeng
> Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-18 5:00 ` Li, Meng
@ 2023-10-18 15:20 ` Alan Stern
2023-10-18 15:34 ` Alan Stern
2023-10-19 12:38 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-18 15:20 UTC (permalink / raw)
To: Li, Meng
Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote:
> > Were you able to locate the original bug report?
> >
> This is original bug report
> https://bugzilla.redhat.com/show_bug.cgi?id=579093
The Red Hat Bugzilla says:
You are not authorized to access bug #579093.
So I can't tell exactly what happened back then. :-(
But I do vaguely remember the discussion with Stratus Technologies.
They had special hardware in their systems, which allowed them to do
hot-swapping of PCI components.
> my latest debug information as below:
> when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register.
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
> Modules linked in:
> CPU: 3 PID: 329 Comm: usb-storage Tainted: G W 6.1.53-rt10-yocto-preempt-rt #1
> Hardware name: LS1043A RDB Board (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : xhci_ring_ep_doorbell+0x78/0x120
> lr : xhci_queue_bulk_tx+0x3b0/0x8a4
> sp : ffff80000b0e3960
> x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100
> x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f
> x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100
> x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000
> x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50
> x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000
> x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000
> x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290
> Call trace:
> xhci_ring_ep_doorbell+0x78/0x120
> xhci_queue_bulk_tx+0x3b0/0x8a4
> xhci_urb_enqueue+0x274/0x510
> usb_hcd_submit_urb+0xc0/0x8b0
> usb_submit_urb+0x29c/0x5c0
> usb_stor_msg_common+0x9c/0x190
> usb_stor_bulk_transfer_buf+0x58/0x110
> usb_stor_Bulk_transport+0xdc/0x380
> usb_stor_invoke_transport+0x40/0x530
> usb_stor_transparent_scsi_command+0x18/0x24
> usb_stor_control_thread+0x20c/0x2a0
> kthread+0x12c/0x130
> ret_from_fork+0x10/0x20
> Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021)
> ---[ end trace 0000000000000000 ]---
> Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally.
> In this situation, if remove the pcie device with below command
> # echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove
> The code will hang at the xhci->lock in xhci_urb_dequeue() function.
> Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(),
> there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status);
> although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed.
> Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed.
> In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware.
If you hadn't removed the card suddenly, the exception would not have
occurred. So the logical conclusion isn't that we should get rid of the
usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't
remove PCIe cards while the system is running. Not unless your computer
uses the special hardware from Stratus Technologies.
> so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
What about cardbus or PCMCIA cards? Removing one of those cards
suddenly, while the system is running, is a perfectly reasonable thing
to do and it will not cause any hardware damage. So I think we should
keep the usb_hcd_irq(0, hcd) call.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-18 15:20 ` Alan Stern
@ 2023-10-18 15:34 ` Alan Stern
2023-10-19 12:38 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-18 15:34 UTC (permalink / raw)
To: Li, Meng
Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
Sebastian Andrzej Siewior, linux-rt-users
On Wed, Oct 18, 2023 at 11:20:46AM -0400, Alan Stern wrote:
> On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote:
> > > Were you able to locate the original bug report?
> > >
> > This is original bug report
> > https://bugzilla.redhat.com/show_bug.cgi?id=579093
>
> The Red Hat Bugzilla says:
>
> You are not authorized to access bug #579093.
>
> So I can't tell exactly what happened back then. :-(
>
> But I do vaguely remember the discussion with Stratus Technologies.
> They had special hardware in their systems, which allowed them to do
> hot-swapping of PCI components.
Incidentally, for anyone who's interested, some early discussion about
these problems can be found on an open mailing list here:
https://marc.info/?l=linux-usb&m=127559364101206&w=2
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-18 15:20 ` Alan Stern
2023-10-18 15:34 ` Alan Stern
@ 2023-10-19 12:38 ` Sebastian Andrzej Siewior
2023-10-19 15:09 ` Alan Stern
1 sibling, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-19 12:38 UTC (permalink / raw)
To: Alan Stern
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
>
> If you hadn't removed the card suddenly, the exception would not have
> occurred. So the logical conclusion isn't that we should get rid of the
> usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't
> remove PCIe cards while the system is running. Not unless your computer
> uses the special hardware from Stratus Technologies.
So the card was removed and the kernel complained that it can't access
the memory behind the PCI-bar?
How odd…
> > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
>
> What about cardbus or PCMCIA cards? Removing one of those cards
> suddenly, while the system is running, is a perfectly reasonable thing
> to do and it will not cause any hardware damage. So I think we should
> keep the usb_hcd_irq(0, hcd) call.
Don't you invoke pci_driver::remove in such case to properly let the
physical device go? This can also be tested via unbind from sysfs.
> Alan Stern
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-16 16:56 ` Steven Rostedt
2023-10-16 19:23 ` Alan Stern
@ 2023-10-19 12:49 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-19 12:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: Alan Stern, Li, Meng, Ingo Molnar, USB mailing list, linux-rt-users
On 2023-10-16 12:56:24 [-0400], Steven Rostedt wrote:
> On Fri, 13 Oct 2023 13:17:52 -0400
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Because in RT, spin_lock()s are actually mutexes.
>
> In RT, interrupt handlers do not even run with interrupts disabled (they
> run as threads), so the assumption that they run with interrupts disabled
> on RT is incorrect. One hack would simply be:
>
> if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> local_irq_disable();
> usb_hcd_irq(0, hcd);
> if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> local_irq_enable();
>
> But that's rather ugly. We use to have that as a wrapper of just:
>
> local_irq_disable_nort();
>
> but I don't know if we removed that or not.
>
> Sebastian?
Got removed once it had zero users. It was hated because it was used
usually used as a quick solution to hide problems underneath.
> -- Steve
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-19 12:38 ` Sebastian Andrzej Siewior
@ 2023-10-19 15:09 ` Alan Stern
2023-10-19 15:27 ` Alan Stern
0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-19 15:09 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
> >
> > If you hadn't removed the card suddenly, the exception would not have
> > occurred. So the logical conclusion isn't that we should get rid of the
> > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't
> > remove PCIe cards while the system is running. Not unless your computer
> > uses the special hardware from Stratus Technologies.
>
> So the card was removed and the kernel complained that it can't access
> the memory behind the PCI-bar?
>
> How odd…
>
> > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
> >
> > What about cardbus or PCMCIA cards? Removing one of those cards
> > suddenly, while the system is running, is a perfectly reasonable thing
> > to do and it will not cause any hardware damage. So I think we should
> > keep the usb_hcd_irq(0, hcd) call.
>
> Don't you invoke pci_driver::remove in such case to properly let the
> physical device go? This can also be tested via unbind from sysfs.
I don't know any more. The code in question was added in 2010 in order
to handle a specially designed high-availability hot-swap-capable
system, and it may not be relevant now.
The original problem: When a particular PCI device was disabled (to
simulate a failure), pci_driver::remove did not get called. Maybe
they wanted to have it fail over to a backup device and appear to the
kernel as though nothing had happened? It's hard to tell exactly what
was going on thirteen years ago.
So while this function call may not be needed any more, I prefer to keep
it around if that's not too much trouble.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-19 15:09 ` Alan Stern
@ 2023-10-19 15:27 ` Alan Stern
2023-10-20 9:52 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-19 15:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On Thu, Oct 19, 2023 at 11:09:57AM -0400, Alan Stern wrote:
> On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
> > >
> > > If you hadn't removed the card suddenly, the exception would not have
> > > occurred. So the logical conclusion isn't that we should get rid of the
> > > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't
> > > remove PCIe cards while the system is running. Not unless your computer
> > > uses the special hardware from Stratus Technologies.
> >
> > So the card was removed and the kernel complained that it can't access
> > the memory behind the PCI-bar?
> >
> > How odd…
> >
> > > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
> > >
> > > What about cardbus or PCMCIA cards? Removing one of those cards
> > > suddenly, while the system is running, is a perfectly reasonable thing
> > > to do and it will not cause any hardware damage. So I think we should
> > > keep the usb_hcd_irq(0, hcd) call.
> >
> > Don't you invoke pci_driver::remove in such case to properly let the
> > physical device go? This can also be tested via unbind from sysfs.
>
> I don't know any more. The code in question was added in 2010 in order
> to handle a specially designed high-availability hot-swap-capable
> system, and it may not be relevant now.
>
> The original problem: When a particular PCI device was disabled (to
> simulate a failure), pci_driver::remove did not get called.
Oops... Correction: pci_driver::remove _did_ get called, and the code
in question is part of the remove handler.
Perhaps putting it there was a mistake. At the time, I probably thought
the problem was a general one which might affect all the PCI USB
controller drivers, but looking back now, it might have been better to
put in the individual device driver.
Perhaps that what we should do.
Alan Stern
> Maybe
> they wanted to have it fail over to a backup device and appear to the
> kernel as though nothing had happened? It's hard to tell exactly what
> was going on thirteen years ago.
>
> So while this function call may not be needed any more, I prefer to keep
> it around if that's not too much trouble.
>
> Alan Stern
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-19 15:27 ` Alan Stern
@ 2023-10-20 9:52 ` Sebastian Andrzej Siewior
2023-10-20 15:19 ` Alan Stern
0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-20 9:52 UTC (permalink / raw)
To: Alan Stern
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On 2023-10-19 11:27:54 [-0400], Alan Stern wrote:
>
> Perhaps that what we should do.
Perfect.
> Alan Stern
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-20 9:52 ` Sebastian Andrzej Siewior
@ 2023-10-20 15:19 ` Alan Stern
2023-11-03 15:46 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-20 15:19 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On Fri, Oct 20, 2023 at 11:52:38AM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-10-19 11:27:54 [-0400], Alan Stern wrote:
> >
> > Perhaps that what we should do.
>
> Perfect.
Hmmm... This turns out not to be as easy as one might think.
Sebastian, if you can instead suggest a way to call drivers' interrupt
handlers (i.e., simulate an interrupt) without causing problems for RT
kernels, I think that would be a better approach.
The fundamental problem here is that the uhci-hcd driver was not written
with unexpected hardware removal in mind. It doesn't have timeouts to
handle situations where the device doesn't generate an IRQ to indicate
completion of an I/O operation. And since it's been ten years since
I've done any significant work on the driver, I'd really like to avoid
the need for such a far-reaching change (not least because I don't have
any way to test it).
I suppose an alternative approach would be to add a new callback pointer
to the hc_driver structure -- something that could tell the driver to
check for hardware removal. I'll do that if there's no other choice.
But simulating an interrupt seems easier, provided it can be done at
all.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-10-20 15:19 ` Alan Stern
@ 2023-11-03 15:46 ` Sebastian Andrzej Siewior
2023-11-03 20:42 ` Alan Stern
0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-03 15:46 UTC (permalink / raw)
To: Alan Stern
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> Hmmm... This turns out not to be as easy as one might think.
>
> Sebastian, if you can instead suggest a way to call drivers' interrupt
> handlers (i.e., simulate an interrupt) without causing problems for RT
> kernels, I think that would be a better approach.
So there is generic_handle_irq_safe(). It should get all the details
right like incrementing the counter in /proc/interrupts, doing nothing
if the interrupt has been masked or waking the interrupt thread if the
interrupt has happen to be threaded.
It triggers the interrupt so for a shared handler it will invoke _all_
registered interrupt handler and for threaded interrupts it will return
before the thread had a chance to run (free_irq() will handle it
properly and wait for the interrupt thread/handler to complete).
> The fundamental problem here is that the uhci-hcd driver was not written
> with unexpected hardware removal in mind. It doesn't have timeouts to
> handle situations where the device doesn't generate an IRQ to indicate
> completion of an I/O operation. And since it's been ten years since
> I've done any significant work on the driver, I'd really like to avoid
> the need for such a far-reaching change (not least because I don't have
> any way to test it).
I see. Don't over complicate or "correct" things here. What should work
is that the removal callback can be called at any time and things
continue work. That means it will purge all queues, cancel all requests,
timers, whatever and free all resources associated with the driver/
device.
If it comes to PCI-hotplug you have to have a so called PCI-hotplug
slot. This "slot" will let the OS know if the hardware has been removed
or added. If you don't have such a thing you have to maintain the state
yourself by using the "remove" and "rescan" sysfs files of the PCI slot.
I'm not aware of any requirement for a PCI-driver to check if its device
has been removed.
> I suppose an alternative approach would be to add a new callback pointer
> to the hc_driver structure -- something that could tell the driver to
> check for hardware removal. I'll do that if there's no other choice.
> But simulating an interrupt seems easier, provided it can be done at
> all.
>
> Alan Stern
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-11-03 15:46 ` Sebastian Andrzej Siewior
@ 2023-11-03 20:42 ` Alan Stern
2023-11-06 3:02 ` Li, Meng
2023-11-06 8:27 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-11-03 20:42 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> > Hmmm... This turns out not to be as easy as one might think.
> >
> > Sebastian, if you can instead suggest a way to call drivers' interrupt
> > handlers (i.e., simulate an interrupt) without causing problems for RT
> > kernels, I think that would be a better approach.
>
> So there is generic_handle_irq_safe(). It should get all the details
> right like incrementing the counter in /proc/interrupts, doing nothing
> if the interrupt has been masked or waking the interrupt thread if the
> interrupt has happen to be threaded.
> It triggers the interrupt so for a shared handler it will invoke _all_
> registered interrupt handler and for threaded interrupts it will return
> before the thread had a chance to run (free_irq() will handle it
> properly and wait for the interrupt thread/handler to complete).
Good. Meng Li, can you test a patch that replaces the
local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a
single call to generic_handle_irq_safe()?
> > The fundamental problem here is that the uhci-hcd driver was not written
> > with unexpected hardware removal in mind. It doesn't have timeouts to
> > handle situations where the device doesn't generate an IRQ to indicate
> > completion of an I/O operation. And since it's been ten years since
> > I've done any significant work on the driver, I'd really like to avoid
> > the need for such a far-reaching change (not least because I don't have
> > any way to test it).
>
> I see. Don't over complicate or "correct" things here. What should work
> is that the removal callback can be called at any time and things
> continue work. That means it will purge all queues, cancel all requests,
> timers, whatever and free all resources associated with the driver/
> device.
The driver _does_ work under those circumstances -- provided the
hardware is still present and accessible.
> If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> slot. This "slot" will let the OS know if the hardware has been removed
> or added. If you don't have such a thing you have to maintain the state
> yourself by using the "remove" and "rescan" sysfs files of the PCI slot.
>
> I'm not aware of any requirement for a PCI-driver to check if its device
> has been removed.
That's the problem: The driver doesn't really support PCI-hotplug.
The code that Meng Li wants to change was sort of a half-baked way to
add such support.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: USB: add check to detect host controller hardware removal
2023-11-03 20:42 ` Alan Stern
@ 2023-11-06 3:02 ` Li, Meng
2023-11-06 8:28 ` Sebastian Andrzej Siewior
2023-11-06 8:27 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-06 3:02 UTC (permalink / raw)
To: Alan Stern, Sebastian Andrzej Siewior
Cc: Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Saturday, November 4, 2023 4:42 AM
> To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Li, Meng <Meng.Li@windriver.com>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> > > Hmmm... This turns out not to be as easy as one might think.
> > >
> > > Sebastian, if you can instead suggest a way to call drivers'
> > > interrupt handlers (i.e., simulate an interrupt) without causing
> > > problems for RT kernels, I think that would be a better approach.
> >
> > So there is generic_handle_irq_safe(). It should get all the details
> > right like incrementing the counter in /proc/interrupts, doing nothing
> > if the interrupt has been masked or waking the interrupt thread if the
> > interrupt has happen to be threaded.
> > It triggers the interrupt so for a shared handler it will invoke _all_
> > registered interrupt handler and for threaded interrupts it will
> > return before the thread had a chance to run (free_irq() will handle
> > it properly and wait for the interrupt thread/handler to complete).
>
> Good. Meng Li, can you test a patch that replaces the
> local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> generic_handle_irq_safe()?
It needs an irq number as parameter, what I should pass to this function, 0 or dev->irq or other value?
Thanks,
Limeng
>
> > > The fundamental problem here is that the uhci-hcd driver was not
> > > written with unexpected hardware removal in mind. It doesn't have
> > > timeouts to handle situations where the device doesn't generate an
> > > IRQ to indicate completion of an I/O operation. And since it's been
> > > ten years since I've done any significant work on the driver, I'd
> > > really like to avoid the need for such a far-reaching change (not
> > > least because I don't have any way to test it).
> >
> > I see. Don't over complicate or "correct" things here. What should
> > work is that the removal callback can be called at any time and things
> > continue work. That means it will purge all queues, cancel all
> > requests, timers, whatever and free all resources associated with the
> > driver/ device.
>
> The driver _does_ work under those circumstances -- provided the hardware is
> still present and accessible.
>
> > If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> > slot. This "slot" will let the OS know if the hardware has been
> > removed or added. If you don't have such a thing you have to maintain
> > the state yourself by using the "remove" and "rescan" sysfs files of the PCI
> slot.
> >
> > I'm not aware of any requirement for a PCI-driver to check if its
> > device has been removed.
>
> That's the problem: The driver doesn't really support PCI-hotplug.
> The code that Meng Li wants to change was sort of a half-baked way to add
> such support.
>
> Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: USB: add check to detect host controller hardware removal
2023-11-03 20:42 ` Alan Stern
2023-11-06 3:02 ` Li, Meng
@ 2023-11-06 8:27 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06 8:27 UTC (permalink / raw)
To: Alan Stern
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On 2023-11-03 16:42:18 [-0400], Alan Stern wrote:
> > I see. Don't over complicate or "correct" things here. What should work
> > is that the removal callback can be called at any time and things
> > continue work. That means it will purge all queues, cancel all requests,
> > timers, whatever and free all resources associated with the driver/
> > device.
>
> The driver _does_ work under those circumstances -- provided the
> hardware is still present and accessible.
In that case I don't see a problem.
> > If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> > slot. This "slot" will let the OS know if the hardware has been removed
> > or added. If you don't have such a thing you have to maintain the state
> > yourself by using the "remove" and "rescan" sysfs files of the PCI slot.
> >
> > I'm not aware of any requirement for a PCI-driver to check if its device
> > has been removed.
>
> That's the problem: The driver doesn't really support PCI-hotplug.
> The code that Meng Li wants to change was sort of a half-baked way to
> add such support.
This sounds half-baked. Just remove the device from sysfs and then
physically plug the card. It is going to end in mess if every driver
gets some "hotplug" support.
> Alan Stern
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: USB: add check to detect host controller hardware removal
2023-11-06 3:02 ` Li, Meng
@ 2023-11-06 8:28 ` Sebastian Andrzej Siewior
2023-11-06 8:54 ` Li, Meng
2023-11-06 15:25 ` Alan Stern
0 siblings, 2 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06 8:28 UTC (permalink / raw)
To: Li, Meng
Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
linux-rt-users
On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > Good. Meng Li, can you test a patch that replaces the
> > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> > generic_handle_irq_safe()?
>
> It needs an irq number as parameter, what I should pass to this
> function, 0 or dev->irq or other value?
dev->irq is what it asks for. I would really appreciate if you would
instead use the sysfs interface to remove the device prior physically
removing it. This should solve your trouble.
> Thanks,
> Limeng
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: RE: USB: add check to detect host controller hardware removal
2023-11-06 8:28 ` Sebastian Andrzej Siewior
@ 2023-11-06 8:54 ` Li, Meng
2023-11-06 9:09 ` Sebastian Andrzej Siewior
2023-11-06 15:25 ` Alan Stern
1 sibling, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-06 8:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
linux-rt-users
> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Sent: Monday, November 6, 2023 4:28 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: RE: USB: add check to detect host controller hardware removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > > Good. Meng Li, can you test a patch that replaces the
> > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with
> > > a single call to generic_handle_irq_safe()?
> >
> > It needs an irq number as parameter, what I should pass to this
> > function, 0 or dev->irq or other value?
>
> dev->irq is what it asks for. I would really appreciate if you would
> instead use the sysfs interface to remove the device prior physically removing
> it. This should solve your trouble.
>
This is not my original issue that I encountered.
I agree that we should remove the device from sys interface firstly, and then do hot-plug action.
My original issue was the calltrace on RT kernel if I remove the device from sys interface.
# echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
xhci_hcd 0001:01:00.0: remove, state 1
usb usb2: USB disconnect, device number 1
usb 2-4: USB disconnect, device number 2
xhci_hcd 0001:01:00.0: USB bus 2 deregistered
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
root@nxp-ls1043:preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
~# CPU: 0 PID: 765 Comm: sh Tainted: G W 6.1.57-rt10-yocto-preempt-rt #1
Hardware name: LS1043A RDB Board (DT)
Call trace:
dump_backtrace.part.0+0xc8/0xd4
show_stack+0x20/0x30
dump_stack_lvl+0x6c/0x88
dump_stack+0x18/0x34
__might_resched+0x160/0x1c0
rt_spin_lock+0x38/0xb0
xhci_irq+0x44/0x16d0
usb_hcd_irq+0x38/0x5c
usb_hcd_pci_remove+0x84/0x14c
xhci_pci_remove+0x78/0xc0
pci_device_remove+0x44/0xcc
device_remove+0x54/0x8c
device_release_driver_internal+0x1ec/0x260
device_release_driver+0x20/0x30
pci_stop_bus_device+0x8c/0xcc
pci_stop_and_remove_bus_device_locked+0x28/0x44
remove_store+0xa0/0xb0
dev_attr_store+0x20/0x34
sysfs_kf_write+0x4c/0x5c
kernfs_fop_write_iter+0x128/0x1f0
vfs_write+0x1c0/0x2f0
ksys_write+0x78/0x110
__arm64_sys_write+0x24/0x30
invoke_syscall+0x5c/0x130
el0_svc_common.constprop.0+0x4c/0xf4
do_el0_svc+0x34/0xc0
el0_svc+0x2c/0x84
el0t_64_sync_handler+0xf4/0x120
el0t_64_sync+0x18c/0x190
and then I checked the commit log to get which commit introduced this issue.
I found out the commit is
commit c548795abe0d3520b74e18f23ca0a0d72deddab9
Author: Alan Stern <stern@rowland.harvard.edu>
Date: Wed Jun 9 17:34:27 2010 -0400
USB: add check to detect host controller hardware removal
And then, Alan Stern told me the background of this issue. so, I started to do hotplug operation on my board to see what symptom on my nxp-ls1043/6 board.
And then there were lots of discussion followed.
Thanks,
Limeng
> > Thanks,
> > Limeng
>
> Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: RE: USB: add check to detect host controller hardware removal
2023-11-06 8:54 ` Li, Meng
@ 2023-11-06 9:09 ` Sebastian Andrzej Siewior
2023-11-07 3:08 ` Li, Meng
0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06 9:09 UTC (permalink / raw)
To: Li, Meng
Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
linux-rt-users
On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:
> This is not my original issue that I encountered.
> I agree that we should remove the device from sys interface firstly,
> and then do hot-plug action.
> My original issue was the calltrace on RT kernel if I remove the
> device from sys interface.
> # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> xhci_hcd 0001:01:00.0: remove, state 1
> usb usb2: USB disconnect, device number 1
> usb 2-4: USB disconnect, device number 2
> xhci_hcd 0001:01:00.0: USB bus 2 deregistered
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
Right.
> and then I checked the commit log to get which commit introduced this issue.
> I found out the commit is
> commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> Author: Alan Stern <stern@rowland.harvard.edu>
> Date: Wed Jun 9 17:34:27 2010 -0400
>
> USB: add check to detect host controller hardware removal
>
> And then, Alan Stern told me the background of this issue. so, I
> started to do hotplug operation on my board to see what symptom on my
> nxp-ls1043/6 board.
> And then there were lots of discussion followed.
Okay. I somehow mapped it that you try to add this to xhci.
The suggested replacement should cover it. Better if we could get rid of
it ;)
> Thanks,
> Limeng
Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: USB: add check to detect host controller hardware removal
2023-11-06 8:28 ` Sebastian Andrzej Siewior
2023-11-06 8:54 ` Li, Meng
@ 2023-11-06 15:25 ` Alan Stern
1 sibling, 0 replies; 25+ messages in thread
From: Alan Stern @ 2023-11-06 15:25 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users
On Mon, Nov 06, 2023 at 09:28:29AM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > > Good. Meng Li, can you test a patch that replaces the
> > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> > > generic_handle_irq_safe()?
> >
> > It needs an irq number as parameter, what I should pass to this
> > function, 0 or dev->irq or other value?
>
> dev->irq is what it asks for.
In fact it should be hcd->irq -- see usb_hcd_request_irqs(). Maybe this
is the same value as dev->irq; I don't know.
> I would really appreciate if you would
> instead use the sysfs interface to remove the device prior physically
> removing it. This should solve your trouble.
The whole reason for having this code in the first place -- the specific
use case -- was to handle device removal caused by sudden hardware
failure. Obviously there's no way to use the sysfs interface before
this occurs.
Alan Stern
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: RE: RE: USB: add check to detect host controller hardware removal
2023-11-06 9:09 ` Sebastian Andrzej Siewior
@ 2023-11-07 3:08 ` Li, Meng
2023-11-07 6:17 ` Greg KH
0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-07 3:08 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
linux-rt-users
> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Sent: Monday, November 6, 2023 5:09 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: RE: RE: USB: add check to detect host controller hardware
> removal
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:
>
> > This is not my original issue that I encountered.
> > I agree that we should remove the device from sys interface firstly,
> > and then do hot-plug action.
> > My original issue was the calltrace on RT kernel if I remove the
> > device from sys interface.
> > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect,
> > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd
> > 0001:01:00.0: USB bus 2 deregistered
> > BUG: sleeping function called from invalid context at
> > kernel/locking/spinlock_rt.c:46
> > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
>
> Right.
>
> > and then I checked the commit log to get which commit introduced this
> issue.
> > I found out the commit is
> > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > Author: Alan Stern <stern@rowland.harvard.edu>
> > Date: Wed Jun 9 17:34:27 2010 -0400
> >
> > USB: add check to detect host controller hardware removal
> >
> > And then, Alan Stern told me the background of this issue. so, I
> > started to do hotplug operation on my board to see what symptom on my
> > nxp-ls1043/6 board.
> > And then there were lots of discussion followed.
>
> Okay. I somehow mapped it that you try to add this to xhci.
> The suggested replacement should cover it. Better if we could get rid of it ;)
>
generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore.
Will you create a patch in later?
Thanks,
Limeng
> > Thanks,
> > Limeng
>
> Sebastian
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: RE: USB: add check to detect host controller hardware removal
2023-11-07 3:08 ` Li, Meng
@ 2023-11-07 6:17 ` Greg KH
0 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2023-11-07 6:17 UTC (permalink / raw)
To: Li, Meng
Cc: Sebastian Andrzej Siewior, Alan Stern, Steven Rostedt,
Ingo Molnar, USB mailing list, linux-rt-users
On Tue, Nov 07, 2023 at 03:08:13AM +0000, Li, Meng wrote:
>
>
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Sent: Monday, November 6, 2023 5:09 PM
> > To: Li, Meng <Meng.Li@windriver.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> > Subject: Re: RE: RE: USB: add check to detect host controller hardware
> > removal
> >
> > CAUTION: This email comes from a non Wind River email account!
> > Do not click links or open attachments unless you recognize the sender and
> > know the content is safe.
> >
> > On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:
> >
> > > This is not my original issue that I encountered.
> > > I agree that we should remove the device from sys interface firstly,
> > > and then do hot-plug action.
> > > My original issue was the calltrace on RT kernel if I remove the
> > > device from sys interface.
> > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> > > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect,
> > > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd
> > > 0001:01:00.0: USB bus 2 deregistered
> > > BUG: sleeping function called from invalid context at
> > > kernel/locking/spinlock_rt.c:46
> > > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
> >
> > Right.
> >
> > > and then I checked the commit log to get which commit introduced this
> > issue.
> > > I found out the commit is
> > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > > Author: Alan Stern <stern@rowland.harvard.edu>
> > > Date: Wed Jun 9 17:34:27 2010 -0400
> > >
> > > USB: add check to detect host controller hardware removal
> > >
> > > And then, Alan Stern told me the background of this issue. so, I
> > > started to do hotplug operation on my board to see what symptom on my
> > > nxp-ls1043/6 board.
> > > And then there were lots of discussion followed.
> >
> > Okay. I somehow mapped it that you try to add this to xhci.
> > The suggested replacement should cover it. Better if we could get rid of it ;)
> >
>
> generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore.
> Will you create a patch in later?
As you have tested this, please create it and submit it so that we know
exactly what changes are required and you get the proper credit for
doing this work.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2023-11-07 6:17 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com>
2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
2023-10-15 13:37 ` Li, Meng
2023-10-16 16:56 ` Steven Rostedt
2023-10-16 19:23 ` Alan Stern
2023-10-17 2:23 ` Li, Meng
2023-10-17 14:06 ` Alan Stern
2023-10-18 5:00 ` Li, Meng
2023-10-18 15:20 ` Alan Stern
2023-10-18 15:34 ` Alan Stern
2023-10-19 12:38 ` Sebastian Andrzej Siewior
2023-10-19 15:09 ` Alan Stern
2023-10-19 15:27 ` Alan Stern
2023-10-20 9:52 ` Sebastian Andrzej Siewior
2023-10-20 15:19 ` Alan Stern
2023-11-03 15:46 ` Sebastian Andrzej Siewior
2023-11-03 20:42 ` Alan Stern
2023-11-06 3:02 ` Li, Meng
2023-11-06 8:28 ` Sebastian Andrzej Siewior
2023-11-06 8:54 ` Li, Meng
2023-11-06 9:09 ` Sebastian Andrzej Siewior
2023-11-07 3:08 ` Li, Meng
2023-11-07 6:17 ` Greg KH
2023-11-06 15:25 ` Alan Stern
2023-11-06 8:27 ` Sebastian Andrzej Siewior
2023-10-19 12:49 ` Sebastian Andrzej Siewior
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.