All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: USB: add check to detect host controller hardware removal
       [not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com>
@ 2023-10-13 17:17 ` Alan Stern
  2023-10-15 13:37   ` Li, Meng
  2023-10-16 16:56   ` Steven Rostedt
  0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-13 17:17 UTC (permalink / raw)
  To: Li, Meng; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list

Messages like this should always be sent to the mailing list as well as 
to me.  And in this case, since it involves an RT kernel, it should be 
CC-ed to some of the people involved in developing the RT patches.

On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote:
> Hi Alan Stern,
> 
> Sorry for disrupting you very abruptly. I encounter a calltrace when trying to remove a PCIe-to-USB card device with below command. Only occurred in rt kernel.
> 
> # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> 
> Call trace:
> dump_backtrace.part.0+0xc8/0xd4
> show_stack+0x20/0x30
> dump_stack_lvl+0x6c/0x88
> dump_stack+0x18/0x34
> __might_resched+0x160/0x1c0
> rt_spin_lock+0x38/0xb0
> xhci_irq+0x44/0x16d0
> usb_hcd_irq+0x38/0x5c
> usb_hcd_pci_remove+0x84/0x14c
> xhci_pci_remove+0x78/0xc0
> pci_device_remove+0x44/0xcc
> device_remove+0x54/0x8c
> device_release_driver_internal+0x1ec/0x260
> device_release_driver+0x20/0x30
> pci_stop_bus_device+0x8c/0xcc
> pci_stop_and_remove_bus_device_locked+0x28/0x44
> remove_store+0xa0/0xb0
> dev_attr_store+0x20/0x34
> sysfs_kf_write+0x4c/0x5c
> kernfs_fop_write_iter+0x128/0x1f0
> vfs_write+0x1c0/0x2f0
> ksys_write+0x78/0x110
> __arm64_sys_write+0x24/0x30
> invoke_syscall+0x5c/0x130
> el0_svc_common.constprop.0+0x4c/0xf4
> do_el0_svc+0x34/0xc0
> el0_svc+0x2c/0x84
> el0t_64_sync_handler+0xf4/0x120
> el0t_64_sync+0x18c/0x190
> 
> this trace is caused by below patch that is used to fix a usb hang issue.
> 
> commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> Author: Alan Stern stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu>
> Date:   Wed Jun 9 17:34:27 2010 -0400
> 
> USB: add check to detect host controller hardware removal
> 
> 
> I know it is too too long ago, but could you please try to recall how to reproduce the issue that you fixed, how can I produce the usb hang issue.

I think the issue could be reproduced just by hot-removing a USB host 
controller.  Maybe the controller was on a PCMCIA card; I don't 
remember.  And I don't remember what kind of USB host controller it was.  

You might be able to find the original bug report in the linux-usb or 
linux-usb-devel mailing list archives for 2010.

> I want to try whether I can get another method to fix the USB hang issue without disable irq, so that avoid the calltrace.

For new people, the contents of that commit are:

    This patch (as1391) fixes a problem that can occur when USB host
    controller hardware is hot-unplugged.  If no interrupts are generated
    by the unplug then the HCD may not realize that the controller is
    gone, and the subsequent unbind may hang waiting for interrupts that
    never arrive.
    
    The solution (for PCI-based controllers) is to call the HCD's
    interrupt handler at the start of usb_hcd_pci_remove().  If the
    hardware is gone, the handler will realize this when it tries to read
    the controller's status register.
    
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 1cf2d1e79a5c..7e2d5271b0c9 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
        if (!hcd)
                return;
 
+       /* Fake an interrupt request in order to give the driver a chance
+        * to test whether the controller hardware has been removed (e.g.,
+        * cardbus physical eject).
+        */
+       local_irq_disable();
+       usb_hcd_irq(0, hcd);
+       local_irq_enable();
+
        usb_remove_hcd(hcd);
        if (hcd->driver->flags & HCD_MEMORY) {
                iounmap(hcd->regs);

The local_irq_disable() is there so that the irq handler will be invoked 
in the state that it expects (i.e., with interrupts disabled).  
Apparently Meng's RT kernel doesn't like it when the handler then 
calls spin_lock(); I don't know why.

Alan Stern

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: USB: add check to detect host controller hardware removal
  2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
@ 2023-10-15 13:37   ` Li, Meng
  2023-10-16 16:56   ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Li, Meng @ 2023-10-15 13:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: Steven Rostedt, Ingo Molnar, USB mailing list



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Saturday, October 14, 2023 1:18 AM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar
> <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> Messages like this should always be sent to the mailing list as well as to me.
> And in this case, since it involves an RT kernel, it should be CC-ed to some of
> the people involved in developing the RT patches.
> 
> On Fri, Oct 13, 2023 at 04:25:43AM +0000, Li, Meng wrote:
> > Hi Alan Stern,
> >
> > Sorry for disrupting you very abruptly. I encounter a calltrace when trying to
> remove a PCIe-to-USB card device with below command. Only occurred in rt
> kernel.
> >
> > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> >
> > Call trace:
> > dump_backtrace.part.0+0xc8/0xd4
> > show_stack+0x20/0x30
> > dump_stack_lvl+0x6c/0x88
> > dump_stack+0x18/0x34
> > __might_resched+0x160/0x1c0
> > rt_spin_lock+0x38/0xb0
> > xhci_irq+0x44/0x16d0
> > usb_hcd_irq+0x38/0x5c
> > usb_hcd_pci_remove+0x84/0x14c
> > xhci_pci_remove+0x78/0xc0
> > pci_device_remove+0x44/0xcc
> > device_remove+0x54/0x8c
> > device_release_driver_internal+0x1ec/0x260
> > device_release_driver+0x20/0x30
> > pci_stop_bus_device+0x8c/0xcc
> > pci_stop_and_remove_bus_device_locked+0x28/0x44
> > remove_store+0xa0/0xb0
> > dev_attr_store+0x20/0x34
> > sysfs_kf_write+0x4c/0x5c
> > kernfs_fop_write_iter+0x128/0x1f0
> > vfs_write+0x1c0/0x2f0
> > ksys_write+0x78/0x110
> > __arm64_sys_write+0x24/0x30
> > invoke_syscall+0x5c/0x130
> > el0_svc_common.constprop.0+0x4c/0xf4
> > do_el0_svc+0x34/0xc0
> > el0_svc+0x2c/0x84
> > el0t_64_sync_handler+0xf4/0x120
> > el0t_64_sync+0x18c/0x190
> >
> > this trace is caused by below patch that is used to fix a usb hang issue.
> >
> > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > Author: Alan Stern
> stern@rowland.harvard.edu<mailto:stern@rowland.harvard.edu>
> > Date:   Wed Jun 9 17:34:27 2010 -0400
> >
> > USB: add check to detect host controller hardware removal
> >
> >
> > I know it is too too long ago, but could you please try to recall how to
> reproduce the issue that you fixed, how can I produce the usb hang issue.
> 
> I think the issue could be reproduced just by hot-removing a USB host
> controller.  Maybe the controller was on a PCMCIA card; I don't remember.
> And I don't remember what kind of USB host controller it was.
> 
> You might be able to find the original bug report in the linux-usb or linux-usb-
> devel mailing list archives for 2010.
> 

Thanks for offering this clue about how to reproduce this issue, I will try it on my side.

Thanks,
Limeng 

> > I want to try whether I can get another method to fix the USB hang issue
> without disable irq, so that avoid the calltrace.
> 
> For new people, the contents of that commit are:
> 
>     This patch (as1391) fixes a problem that can occur when USB host
>     controller hardware is hot-unplugged.  If no interrupts are generated
>     by the unplug then the HCD may not realize that the controller is
>     gone, and the subsequent unbind may hang waiting for interrupts that
>     never arrive.
> 
>     The solution (for PCI-based controllers) is to call the HCD's
>     interrupt handler at the start of usb_hcd_pci_remove().  If the
>     hardware is gone, the handler will realize this when it tries to read
>     the controller's status register.
> 
>     Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index
> 1cf2d1e79a5c..7e2d5271b0c9 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
>         if (!hcd)
>                 return;
> 
> +       /* Fake an interrupt request in order to give the driver a chance
> +        * to test whether the controller hardware has been removed (e.g.,
> +        * cardbus physical eject).
> +        */
> +       local_irq_disable();
> +       usb_hcd_irq(0, hcd);
> +       local_irq_enable();
> +
>         usb_remove_hcd(hcd);
>         if (hcd->driver->flags & HCD_MEMORY) {
>                 iounmap(hcd->regs);
> 
> The local_irq_disable() is there so that the irq handler will be invoked in the
> state that it expects (i.e., with interrupts disabled).
> Apparently Meng's RT kernel doesn't like it when the handler then calls
> spin_lock(); I don't know why.
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
  2023-10-15 13:37   ` Li, Meng
@ 2023-10-16 16:56   ` Steven Rostedt
  2023-10-16 19:23     ` Alan Stern
  2023-10-19 12:49     ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2023-10-16 16:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: Li, Meng, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users

On Fri, 13 Oct 2023 13:17:52 -0400
Alan Stern <stern@rowland.harvard.edu> wrote:

> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
>         if (!hcd)
>                 return;
>  
> +       /* Fake an interrupt request in order to give the driver a chance
> +        * to test whether the controller hardware has been removed (e.g.,
> +        * cardbus physical eject).
> +        */
> +       local_irq_disable();
> +       usb_hcd_irq(0, hcd);
> +       local_irq_enable();
> +
>         usb_remove_hcd(hcd);
>         if (hcd->driver->flags & HCD_MEMORY) {
>                 iounmap(hcd->regs);
> 
> The local_irq_disable() is there so that the irq handler will be invoked 
> in the state that it expects (i.e., with interrupts disabled).  
> Apparently Meng's RT kernel doesn't like it when the handler then 
> calls spin_lock(); I don't know why.

Because in RT, spin_lock()s are actually mutexes.

In RT, interrupt handlers do not even run with interrupts disabled (they
run as threads), so the assumption that they run with interrupts disabled
on RT is incorrect. One hack would simply be:

	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		local_irq_disable();
	usb_hcd_irq(0, hcd);
	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		local_irq_enable();

But that's rather ugly. We use to have that as a wrapper of just:

	local_irq_disable_nort();

but I don't know if we removed that or not.

Sebastian?

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-16 16:56   ` Steven Rostedt
@ 2023-10-16 19:23     ` Alan Stern
  2023-10-17  2:23       ` Li, Meng
  2023-10-19 12:49     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-16 19:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li, Meng, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users

On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote:
> On Fri, 13 Oct 2023 13:17:52 -0400
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > --- a/drivers/usb/core/hcd-pci.c
> > +++ b/drivers/usb/core/hcd-pci.c
> > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> >         if (!hcd)
> >                 return;
> >  
> > +       /* Fake an interrupt request in order to give the driver a chance
> > +        * to test whether the controller hardware has been removed (e.g.,
> > +        * cardbus physical eject).
> > +        */
> > +       local_irq_disable();
> > +       usb_hcd_irq(0, hcd);
> > +       local_irq_enable();
> > +
> >         usb_remove_hcd(hcd);
> >         if (hcd->driver->flags & HCD_MEMORY) {
> >                 iounmap(hcd->regs);
> > 
> > The local_irq_disable() is there so that the irq handler will be invoked 
> > in the state that it expects (i.e., with interrupts disabled).  
> > Apparently Meng's RT kernel doesn't like it when the handler then 
> > calls spin_lock(); I don't know why.
> 
> Because in RT, spin_lock()s are actually mutexes.
> 
> In RT, interrupt handlers do not even run with interrupts disabled (they
> run as threads), so the assumption that they run with interrupts disabled
> on RT is incorrect. One hack would simply be:
> 
> 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> 		local_irq_disable();
> 	usb_hcd_irq(0, hcd);
> 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> 		local_irq_enable();
> 
> But that's rather ugly. We use to have that as a wrapper of just:
> 
> 	local_irq_disable_nort();
> 
> but I don't know if we removed that or not.
> 
> Sebastian?

Thanks for the information.  I guess a simple approach would be to add 
the wrapper back in, since it's not present in the current kernel.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: USB: add check to detect host controller hardware removal
  2023-10-16 19:23     ` Alan Stern
@ 2023-10-17  2:23       ` Li, Meng
  2023-10-17 14:06         ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-10-17  2:23 UTC (permalink / raw)
  To: Alan Stern, Steven Rostedt
  Cc: Ingo Molnar, USB mailing list, Sebastian Andrzej Siewior, linux-rt-users



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Tuesday, October 17, 2023 3:24 AM
> To: Steven Rostedt <rostedt@goodmis.org>
> Cc: Li, Meng <Meng.Li@windriver.com>; Ingo Molnar <mingo@redhat.com>;
> USB mailing list <linux-usb@vger.kernel.org>; Sebastian Andrzej Siewior
> <bigeasy@linutronix.de>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> On Mon, Oct 16, 2023 at 12:56:24PM -0400, Steven Rostedt wrote:
> > On Fri, 13 Oct 2023 13:17:52 -0400
> > Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > --- a/drivers/usb/core/hcd-pci.c
> > > +++ b/drivers/usb/core/hcd-pci.c
> > > @@ -292,6 +292,14 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
> > >         if (!hcd)
> > >                 return;
> > >
> > > +       /* Fake an interrupt request in order to give the driver a chance
> > > +        * to test whether the controller hardware has been removed (e.g.,
> > > +        * cardbus physical eject).
> > > +        */
> > > +       local_irq_disable();
> > > +       usb_hcd_irq(0, hcd);
> > > +       local_irq_enable();
> > > +
> > >         usb_remove_hcd(hcd);
> > >         if (hcd->driver->flags & HCD_MEMORY) {
> > >                 iounmap(hcd->regs);
> > >
> > > The local_irq_disable() is there so that the irq handler will be
> > > invoked in the state that it expects (i.e., with interrupts disabled).
> > > Apparently Meng's RT kernel doesn't like it when the handler then
> > > calls spin_lock(); I don't know why.
> >
> > Because in RT, spin_lock()s are actually mutexes.
> >
> > In RT, interrupt handlers do not even run with interrupts disabled
> > (they run as threads), so the assumption that they run with interrupts
> > disabled on RT is incorrect. One hack would simply be:
> >
> >       if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> >               local_irq_disable();
> >       usb_hcd_irq(0, hcd);
> >       if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> >               local_irq_enable();
> >
> > But that's rather ugly. We use to have that as a wrapper of just:
> >
> >       local_irq_disable_nort();
> >
> > but I don't know if we removed that or not.
> >
> > Sebastian?
> 
> Thanks for the information.  I guess a simple approach would be to add the
> wrapper back in, since it's not present in the current kernel.
>

I did some debugs on my side.
Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel.
Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful.
Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance.
I am trying to fix this issue with getting register status directly without local_irq_disable(). 

Thanks,
Limeng 

 
> Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-17  2:23       ` Li, Meng
@ 2023-10-17 14:06         ` Alan Stern
  2023-10-18  5:00           ` Li, Meng
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-17 14:06 UTC (permalink / raw)
  To: Li, Meng
  Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users

On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote:
> I did some debugs on my side.
> Firstly, the local_irq_disable_nort() function had been removed from latest RT kernel.

What's in the RT kernel doesn't matter here, because the code you're 
patching belongs to the vanilla kernel.

> Second, because of creating xhci-pci.c, the commit c548795abe0d("USB: add check to detect host controller hardware removal") is no longer useful.
> Because the function usb_remove_hcd() is invoked from xhci_pci_remove() of file xhci-pci.c in advance.

What about for non-xHCI controllers?

> I am trying to fix this issue with getting register status directly without local_irq_disable(). 

Were you able to locate the original bug report?

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: USB: add check to detect host controller hardware removal
  2023-10-17 14:06         ` Alan Stern
@ 2023-10-18  5:00           ` Li, Meng
  2023-10-18 15:20             ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-10-18  5:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Tuesday, October 17, 2023 10:06 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>; Ingo Molnar
> <mingo@redhat.com>; USB mailing list <linux-usb@vger.kernel.org>;
> Sebastian Andrzej Siewior <bigeasy@linutronix.de>; linux-rt-users <linux-rt-
> users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> On Tue, Oct 17, 2023 at 02:23:05AM +0000, Li, Meng wrote:
> > I did some debugs on my side.
> > Firstly, the local_irq_disable_nort() function had been removed from latest
> RT kernel.
> 
> What's in the RT kernel doesn't matter here, because the code you're patching
> belongs to the vanilla kernel.
> 
> > Second, because of creating xhci-pci.c, the commit c548795abe0d("USB:
> add check to detect host controller hardware removal") is no longer useful.
> > Because the function usb_remove_hcd() is invoked from xhci_pci_remove()
> of file xhci-pci.c in advance.
> 
> What about for non-xHCI controllers?
> 

I will try non-xHCI controllers in later if I can find out one on my side.

> > I am trying to fix this issue with getting register status directly without
> local_irq_disable().
> 
> Were you able to locate the original bug report?
> 
This is original bug report
https://bugzilla.redhat.com/show_bug.cgi?id=579093

my latest debug information as below:
when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register.
Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
Modules linked in:
CPU: 3 PID: 329 Comm: usb-storage Tainted: G        W          6.1.53-rt10-yocto-preempt-rt #1
Hardware name: LS1043A RDB Board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : xhci_ring_ep_doorbell+0x78/0x120
lr : xhci_queue_bulk_tx+0x3b0/0x8a4
sp : ffff80000b0e3960
x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100
x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f
x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100
x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000
x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50
x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000
x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290
Call trace:
 xhci_ring_ep_doorbell+0x78/0x120
 xhci_queue_bulk_tx+0x3b0/0x8a4
 xhci_urb_enqueue+0x274/0x510
 usb_hcd_submit_urb+0xc0/0x8b0
 usb_submit_urb+0x29c/0x5c0
 usb_stor_msg_common+0x9c/0x190
 usb_stor_bulk_transfer_buf+0x58/0x110
 usb_stor_Bulk_transport+0xdc/0x380
 usb_stor_invoke_transport+0x40/0x530
 usb_stor_transparent_scsi_command+0x18/0x24
 usb_stor_control_thread+0x20c/0x2a0
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x20
Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021) 
---[ end trace 0000000000000000 ]---
Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally.
In this situation, if remove the pcie device with below command
# echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove
The code will hang at the xhci->lock in xhci_urb_dequeue() function.
Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(),
there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status);
although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed.
Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed. 
In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware.
so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.

Thanks,
Limeng

> Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-18  5:00           ` Li, Meng
@ 2023-10-18 15:20             ` Alan Stern
  2023-10-18 15:34               ` Alan Stern
  2023-10-19 12:38               ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-18 15:20 UTC (permalink / raw)
  To: Li, Meng
  Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users

On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote:
> > Were you able to locate the original bug report?
> > 
> This is original bug report
> https://bugzilla.redhat.com/show_bug.cgi?id=579093

The Red Hat Bugzilla says:

	You are not authorized to access bug #579093.

So I can't tell exactly what happened back then.  :-(

But I do vaguely remember the discussion with Stratus Technologies.  
They had special hardware in their systems, which allowed them to do 
hot-swapping of PCI components.

> my latest debug information as below:
> when I removed the PCIe-USB card, there is below exception calltrace when operating host controller register.
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
> Modules linked in:
> CPU: 3 PID: 329 Comm: usb-storage Tainted: G        W          6.1.53-rt10-yocto-preempt-rt #1
> Hardware name: LS1043A RDB Board (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : xhci_ring_ep_doorbell+0x78/0x120
> lr : xhci_queue_bulk_tx+0x3b0/0x8a4
> sp : ffff80000b0e3960
> x29: ffff80000b0e3960 x28: ffff000004ce2290 x27: ffff000008e71100
> x26: ffff000005718a80 x25: 0000000000000421 x24: 000000000000001f
> x23: ffff000008e71108 x22: 0000000000000000 x21: ffff8000099e5100
> x20: 0000000000000002 x19: 0000000000000004 x18: 0000000000000000
> x17: 0000000000000008 x16: ffff00007b5cfb00 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> x11: 0000000000000001 x10: 0000000000000a40 x9 : ffff8000089b0b50
> x8 : ffff0000057c9a00 x7 : 000000000000001f x6 : ffff0000056c8000
> x5 : ffff800009714ca0 x4 : 0000000000000004 x3 : 0000000000000000
> x2 : 0000000000000000 x1 : ffff8000099e5108 x0 : ffff000004ce2290
> Call trace:
>  xhci_ring_ep_doorbell+0x78/0x120
>  xhci_queue_bulk_tx+0x3b0/0x8a4
>  xhci_urb_enqueue+0x274/0x510
>  usb_hcd_submit_urb+0xc0/0x8b0
>  usb_submit_urb+0x29c/0x5c0
>  usb_stor_msg_common+0x9c/0x190
>  usb_stor_bulk_transfer_buf+0x58/0x110
>  usb_stor_Bulk_transport+0xdc/0x380
>  usb_stor_invoke_transport+0x40/0x530
>  usb_stor_transparent_scsi_command+0x18/0x24
>  usb_stor_control_thread+0x20c/0x2a0
>  kthread+0x12c/0x130
>  ret_from_fork+0x10/0x20
> Code: 540001cc 8b140aa1 d5033ebf b9000033 (b9400021) 
> ---[ end trace 0000000000000000 ]---
> Because of the exception, the xhci->lock in xhci_urb_enqueue is released normally.
> In this situation, if remove the pcie device with below command
> # echo 1 > /sys/bus/pci/devices/<PCIe ID>/remove
> The code will hang at the xhci->lock in xhci_urb_dequeue() function.
> Even if I refer to commit c548795abe0d, move usb_hcd_irq(0, hcd) to function xhci_pci_remove(),
> there is also an exception calltrace("Internal error: synchronous external abort") when executing readl(&xhci->op_regs->status);
> although the code doesn't hang, the exception causes that code is broken from xhci_pci_remove(), the flowing code is not executed.
> Because usb_hcd_irq(0, hcd) causes exception, I think it should be removed. 
> In additional, removing PCIe card suddenly is not a reasonable operation, it may destroy the hardware.

If you hadn't removed the card suddenly, the exception would not have 
occurred.  So the logical conclusion isn't that we should get rid of the 
usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't 
remove PCIe cards while the system is running.  Not unless your computer 
uses the special hardware from Stratus Technologies.

> so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.

What about cardbus or PCMCIA cards?  Removing one of those cards 
suddenly, while the system is running, is a perfectly reasonable thing 
to do and it will not cause any hardware damage.  So I think we should 
keep the usb_hcd_irq(0, hcd) call.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-18 15:20             ` Alan Stern
@ 2023-10-18 15:34               ` Alan Stern
  2023-10-19 12:38               ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Stern @ 2023-10-18 15:34 UTC (permalink / raw)
  To: Li, Meng
  Cc: Steven Rostedt, Ingo Molnar, USB mailing list,
	Sebastian Andrzej Siewior, linux-rt-users

On Wed, Oct 18, 2023 at 11:20:46AM -0400, Alan Stern wrote:
> On Wed, Oct 18, 2023 at 05:00:58AM +0000, Li, Meng wrote:
> > > Were you able to locate the original bug report?
> > > 
> > This is original bug report
> > https://bugzilla.redhat.com/show_bug.cgi?id=579093
> 
> The Red Hat Bugzilla says:
> 
> 	You are not authorized to access bug #579093.
> 
> So I can't tell exactly what happened back then.  :-(
> 
> But I do vaguely remember the discussion with Stratus Technologies.  
> They had special hardware in their systems, which allowed them to do 
> hot-swapping of PCI components.

Incidentally, for anyone who's interested, some early discussion about 
these problems can be found on an open mailing list here:

	https://marc.info/?l=linux-usb&m=127559364101206&w=2

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-18 15:20             ` Alan Stern
  2023-10-18 15:34               ` Alan Stern
@ 2023-10-19 12:38               ` Sebastian Andrzej Siewior
  2023-10-19 15:09                 ` Alan Stern
  1 sibling, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-19 12:38 UTC (permalink / raw)
  To: Alan Stern
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
> 
> If you hadn't removed the card suddenly, the exception would not have 
> occurred.  So the logical conclusion isn't that we should get rid of the 
> usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't 
> remove PCIe cards while the system is running.  Not unless your computer 
> uses the special hardware from Stratus Technologies.

So the card was removed and the kernel complained that it can't access
the memory behind the PCI-bar?

How odd…

> > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
> 
> What about cardbus or PCMCIA cards?  Removing one of those cards 
> suddenly, while the system is running, is a perfectly reasonable thing 
> to do and it will not cause any hardware damage.  So I think we should 
> keep the usb_hcd_irq(0, hcd) call.

Don't you invoke pci_driver::remove in such case to properly let the
physical device go? This can also be tested via unbind from sysfs.

> Alan Stern

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-16 16:56   ` Steven Rostedt
  2023-10-16 19:23     ` Alan Stern
@ 2023-10-19 12:49     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-19 12:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alan Stern, Li, Meng, Ingo Molnar, USB mailing list, linux-rt-users

On 2023-10-16 12:56:24 [-0400], Steven Rostedt wrote:
> On Fri, 13 Oct 2023 13:17:52 -0400
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> Because in RT, spin_lock()s are actually mutexes.
> 
> In RT, interrupt handlers do not even run with interrupts disabled (they
> run as threads), so the assumption that they run with interrupts disabled
> on RT is incorrect. One hack would simply be:
> 
> 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> 		local_irq_disable();
> 	usb_hcd_irq(0, hcd);
> 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> 		local_irq_enable();
> 
> But that's rather ugly. We use to have that as a wrapper of just:
> 
> 	local_irq_disable_nort();
> 
> but I don't know if we removed that or not.
> 
> Sebastian?

Got removed once it had zero users. It was hated because it was used
usually used as a quick solution to hide problems underneath.

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-19 12:38               ` Sebastian Andrzej Siewior
@ 2023-10-19 15:09                 ` Alan Stern
  2023-10-19 15:27                   ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-19 15:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
> > 
> > If you hadn't removed the card suddenly, the exception would not have 
> > occurred.  So the logical conclusion isn't that we should get rid of the 
> > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't 
> > remove PCIe cards while the system is running.  Not unless your computer 
> > uses the special hardware from Stratus Technologies.
> 
> So the card was removed and the kernel complained that it can't access
> the memory behind the PCI-bar?
> 
> How odd…
> 
> > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
> > 
> > What about cardbus or PCMCIA cards?  Removing one of those cards 
> > suddenly, while the system is running, is a perfectly reasonable thing 
> > to do and it will not cause any hardware damage.  So I think we should 
> > keep the usb_hcd_irq(0, hcd) call.
> 
> Don't you invoke pci_driver::remove in such case to properly let the
> physical device go? This can also be tested via unbind from sysfs.

I don't know any more.  The code in question was added in 2010 in order 
to handle a specially designed high-availability hot-swap-capable 
system, and it may not be relevant now.

The original problem: When a particular PCI device was disabled (to 
simulate a failure), pci_driver::remove did not get called.  Maybe 
they wanted to have it fail over to a backup device and appear to the 
kernel as though nothing had happened?  It's hard to tell exactly what 
was going on thirteen years ago.

So while this function call may not be needed any more, I prefer to keep 
it around if that's not too much trouble.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-19 15:09                 ` Alan Stern
@ 2023-10-19 15:27                   ` Alan Stern
  2023-10-20  9:52                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-19 15:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On Thu, Oct 19, 2023 at 11:09:57AM -0400, Alan Stern wrote:
> On Thu, Oct 19, 2023 at 02:38:23PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2023-10-18 11:20:46 [-0400], Alan Stern wrote:
> > > 
> > > If you hadn't removed the card suddenly, the exception would not have 
> > > occurred.  So the logical conclusion isn't that we should get rid of the 
> > > usb_hcd_irq(0, hcd) call -- the logical conclusion is that you shouldn't 
> > > remove PCIe cards while the system is running.  Not unless your computer 
> > > uses the special hardware from Stratus Technologies.
> > 
> > So the card was removed and the kernel complained that it can't access
> > the memory behind the PCI-bar?
> > 
> > How odd…
> > 
> > > > so I think we don't need to add usb_hcd_irq(0, hcd) on the logical path of unbinding pcie driver.
> > > 
> > > What about cardbus or PCMCIA cards?  Removing one of those cards 
> > > suddenly, while the system is running, is a perfectly reasonable thing 
> > > to do and it will not cause any hardware damage.  So I think we should 
> > > keep the usb_hcd_irq(0, hcd) call.
> > 
> > Don't you invoke pci_driver::remove in such case to properly let the
> > physical device go? This can also be tested via unbind from sysfs.
> 
> I don't know any more.  The code in question was added in 2010 in order 
> to handle a specially designed high-availability hot-swap-capable 
> system, and it may not be relevant now.
> 
> The original problem: When a particular PCI device was disabled (to 
> simulate a failure), pci_driver::remove did not get called.

Oops...  Correction: pci_driver::remove _did_ get called, and the code 
in question is part of the remove handler.

Perhaps putting it there was a mistake.  At the time, I probably thought 
the problem was a general one which might affect all the PCI USB 
controller drivers, but looking back now, it might have been better to 
put in the individual device driver.

Perhaps that what we should do.

Alan Stern

>  Maybe 
> they wanted to have it fail over to a backup device and appear to the 
> kernel as though nothing had happened?  It's hard to tell exactly what 
> was going on thirteen years ago.
> 
> So while this function call may not be needed any more, I prefer to keep 
> it around if that's not too much trouble.
> 
> Alan Stern
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-19 15:27                   ` Alan Stern
@ 2023-10-20  9:52                     ` Sebastian Andrzej Siewior
  2023-10-20 15:19                       ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-20  9:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On 2023-10-19 11:27:54 [-0400], Alan Stern wrote:
> 
> Perhaps that what we should do.

Perfect.

> Alan Stern

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-20  9:52                     ` Sebastian Andrzej Siewior
@ 2023-10-20 15:19                       ` Alan Stern
  2023-11-03 15:46                         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2023-10-20 15:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On Fri, Oct 20, 2023 at 11:52:38AM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-10-19 11:27:54 [-0400], Alan Stern wrote:
> > 
> > Perhaps that what we should do.
> 
> Perfect.

Hmmm...  This turns out not to be as easy as one might think.

Sebastian, if you can instead suggest a way to call drivers' interrupt 
handlers (i.e., simulate an interrupt) without causing problems for RT 
kernels, I think that would be a better approach.

The fundamental problem here is that the uhci-hcd driver was not written 
with unexpected hardware removal in mind.  It doesn't have timeouts to 
handle situations where the device doesn't generate an IRQ to indicate 
completion of an I/O operation.  And since it's been ten years since 
I've done any significant work on the driver, I'd really like to avoid 
the need for such a far-reaching change (not least because I don't have 
any way to test it).

I suppose an alternative approach would be to add a new callback pointer 
to the hc_driver structure -- something that could tell the driver to 
check for hardware removal.  I'll do that if there's no other choice.  
But simulating an interrupt seems easier, provided it can be done at 
all.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-10-20 15:19                       ` Alan Stern
@ 2023-11-03 15:46                         ` Sebastian Andrzej Siewior
  2023-11-03 20:42                           ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-03 15:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> Hmmm...  This turns out not to be as easy as one might think.
> 
> Sebastian, if you can instead suggest a way to call drivers' interrupt 
> handlers (i.e., simulate an interrupt) without causing problems for RT 
> kernels, I think that would be a better approach.

So there is generic_handle_irq_safe(). It should get all the details
right like incrementing the counter in /proc/interrupts, doing nothing
if the interrupt has been masked or waking the interrupt thread if the
interrupt has happen to be threaded.
It triggers the interrupt so for a shared handler it will invoke _all_
registered interrupt handler and for threaded interrupts it will return
before the thread had a chance to run (free_irq() will handle it
properly and wait for the interrupt thread/handler to complete).

> The fundamental problem here is that the uhci-hcd driver was not written 
> with unexpected hardware removal in mind.  It doesn't have timeouts to 
> handle situations where the device doesn't generate an IRQ to indicate 
> completion of an I/O operation.  And since it's been ten years since 
> I've done any significant work on the driver, I'd really like to avoid 
> the need for such a far-reaching change (not least because I don't have 
> any way to test it).

I see. Don't over complicate or "correct" things here. What should work
is that the removal callback can be called at any time and things
continue work. That means it will purge all queues, cancel all requests,
timers, whatever and free all resources associated with the driver/
device.

If it comes to PCI-hotplug you have to have a so called PCI-hotplug
slot. This "slot" will let the OS know if the hardware has been removed
or added. If you don't have such a thing you have to maintain the state
yourself by using the "remove" and "rescan" sysfs files of the PCI slot.

I'm not aware of any requirement for a PCI-driver to check if its device
has been removed.

> I suppose an alternative approach would be to add a new callback pointer 
> to the hc_driver structure -- something that could tell the driver to 
> check for hardware removal.  I'll do that if there's no other choice.  
> But simulating an interrupt seems easier, provided it can be done at 
> all.
> 
> Alan Stern

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-11-03 15:46                         ` Sebastian Andrzej Siewior
@ 2023-11-03 20:42                           ` Alan Stern
  2023-11-06  3:02                             ` Li, Meng
  2023-11-06  8:27                             ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 25+ messages in thread
From: Alan Stern @ 2023-11-03 20:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> > Hmmm...  This turns out not to be as easy as one might think.
> > 
> > Sebastian, if you can instead suggest a way to call drivers' interrupt 
> > handlers (i.e., simulate an interrupt) without causing problems for RT 
> > kernels, I think that would be a better approach.
> 
> So there is generic_handle_irq_safe(). It should get all the details
> right like incrementing the counter in /proc/interrupts, doing nothing
> if the interrupt has been masked or waking the interrupt thread if the
> interrupt has happen to be threaded.
> It triggers the interrupt so for a shared handler it will invoke _all_
> registered interrupt handler and for threaded interrupts it will return
> before the thread had a chance to run (free_irq() will handle it
> properly and wait for the interrupt thread/handler to complete).

Good.  Meng Li, can you test a patch that replaces the
local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a
single call to generic_handle_irq_safe()?

> > The fundamental problem here is that the uhci-hcd driver was not written 
> > with unexpected hardware removal in mind.  It doesn't have timeouts to 
> > handle situations where the device doesn't generate an IRQ to indicate 
> > completion of an I/O operation.  And since it's been ten years since 
> > I've done any significant work on the driver, I'd really like to avoid 
> > the need for such a far-reaching change (not least because I don't have 
> > any way to test it).
> 
> I see. Don't over complicate or "correct" things here. What should work
> is that the removal callback can be called at any time and things
> continue work. That means it will purge all queues, cancel all requests,
> timers, whatever and free all resources associated with the driver/
> device.

The driver _does_ work under those circumstances -- provided the
hardware is still present and accessible.

> If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> slot. This "slot" will let the OS know if the hardware has been removed
> or added. If you don't have such a thing you have to maintain the state
> yourself by using the "remove" and "rescan" sysfs files of the PCI slot.
> 
> I'm not aware of any requirement for a PCI-driver to check if its device
> has been removed.

That's the problem: The driver doesn't really support PCI-hotplug.
The code that Meng Li wants to change was sort of a half-baked way to
add such support.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: USB: add check to detect host controller hardware removal
  2023-11-03 20:42                           ` Alan Stern
@ 2023-11-06  3:02                             ` Li, Meng
  2023-11-06  8:28                               ` Sebastian Andrzej Siewior
  2023-11-06  8:27                             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-06  3:02 UTC (permalink / raw)
  To: Alan Stern, Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: Saturday, November 4, 2023 4:42 AM
> To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Li, Meng <Meng.Li@windriver.com>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: USB: add check to detect host controller hardware removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> On Fri, Nov 03, 2023 at 04:46:24PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2023-10-20 11:19:49 [-0400], Alan Stern wrote:
> > > Hmmm...  This turns out not to be as easy as one might think.
> > >
> > > Sebastian, if you can instead suggest a way to call drivers'
> > > interrupt handlers (i.e., simulate an interrupt) without causing
> > > problems for RT kernels, I think that would be a better approach.
> >
> > So there is generic_handle_irq_safe(). It should get all the details
> > right like incrementing the counter in /proc/interrupts, doing nothing
> > if the interrupt has been masked or waking the interrupt thread if the
> > interrupt has happen to be threaded.
> > It triggers the interrupt so for a shared handler it will invoke _all_
> > registered interrupt handler and for threaded interrupts it will
> > return before the thread had a chance to run (free_irq() will handle
> > it properly and wait for the interrupt thread/handler to complete).
> 
> Good.  Meng Li, can you test a patch that replaces the
> local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> generic_handle_irq_safe()?

It needs an irq number as parameter, what I should pass to this function, 0 or dev->irq or other value?

Thanks,
Limeng

> 
> > > The fundamental problem here is that the uhci-hcd driver was not
> > > written with unexpected hardware removal in mind.  It doesn't have
> > > timeouts to handle situations where the device doesn't generate an
> > > IRQ to indicate completion of an I/O operation.  And since it's been
> > > ten years since I've done any significant work on the driver, I'd
> > > really like to avoid the need for such a far-reaching change (not
> > > least because I don't have any way to test it).
> >
> > I see. Don't over complicate or "correct" things here. What should
> > work is that the removal callback can be called at any time and things
> > continue work. That means it will purge all queues, cancel all
> > requests, timers, whatever and free all resources associated with the
> > driver/ device.
> 
> The driver _does_ work under those circumstances -- provided the hardware is
> still present and accessible.
> 
> > If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> > slot. This "slot" will let the OS know if the hardware has been
> > removed or added. If you don't have such a thing you have to maintain
> > the state yourself by using the "remove" and "rescan" sysfs files of the PCI
> slot.
> >
> > I'm not aware of any requirement for a PCI-driver to check if its
> > device has been removed.
> 
> That's the problem: The driver doesn't really support PCI-hotplug.
> The code that Meng Li wants to change was sort of a half-baked way to add
> such support.
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: USB: add check to detect host controller hardware removal
  2023-11-03 20:42                           ` Alan Stern
  2023-11-06  3:02                             ` Li, Meng
@ 2023-11-06  8:27                             ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06  8:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On 2023-11-03 16:42:18 [-0400], Alan Stern wrote:
> > I see. Don't over complicate or "correct" things here. What should work
> > is that the removal callback can be called at any time and things
> > continue work. That means it will purge all queues, cancel all requests,
> > timers, whatever and free all resources associated with the driver/
> > device.
> 
> The driver _does_ work under those circumstances -- provided the
> hardware is still present and accessible.

In that case I don't see a problem.

> > If it comes to PCI-hotplug you have to have a so called PCI-hotplug
> > slot. This "slot" will let the OS know if the hardware has been removed
> > or added. If you don't have such a thing you have to maintain the state
> > yourself by using the "remove" and "rescan" sysfs files of the PCI slot.
> > 
> > I'm not aware of any requirement for a PCI-driver to check if its device
> > has been removed.
> 
> That's the problem: The driver doesn't really support PCI-hotplug.
> The code that Meng Li wants to change was sort of a half-baked way to
> add such support.

This sounds half-baked. Just remove the device from sysfs and then
physically plug the card. It is going to end in mess if every driver
gets some "hotplug" support.

> Alan Stern

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: USB: add check to detect host controller hardware removal
  2023-11-06  3:02                             ` Li, Meng
@ 2023-11-06  8:28                               ` Sebastian Andrzej Siewior
  2023-11-06  8:54                                 ` Li, Meng
  2023-11-06 15:25                                 ` Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06  8:28 UTC (permalink / raw)
  To: Li, Meng
  Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
	linux-rt-users

On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > Good.  Meng Li, can you test a patch that replaces the
> > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> > generic_handle_irq_safe()?
> 
> It needs an irq number as parameter, what I should pass to this
> function, 0 or dev->irq or other value?

dev->irq is what it asks for. I would really appreciate if you would
instead use the sysfs interface to remove the device prior physically
removing it. This should solve your trouble.

> Thanks,
> Limeng

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: USB: add check to detect host controller hardware removal
  2023-11-06  8:28                               ` Sebastian Andrzej Siewior
@ 2023-11-06  8:54                                 ` Li, Meng
  2023-11-06  9:09                                   ` Sebastian Andrzej Siewior
  2023-11-06 15:25                                 ` Alan Stern
  1 sibling, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-06  8:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
	linux-rt-users



> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Sent: Monday, November 6, 2023 4:28 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: RE: USB: add check to detect host controller hardware removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > > Good.  Meng Li, can you test a patch that replaces the
> > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with
> > > a single call to generic_handle_irq_safe()?
> >
> > It needs an irq number as parameter, what I should pass to this
> > function, 0 or dev->irq or other value?
> 
> dev->irq is what it asks for. I would really appreciate if you would
> instead use the sysfs interface to remove the device prior physically removing
> it. This should solve your trouble.
> 

This is not my original issue that I encountered.
I agree that we should remove the device from sys interface firstly, and then do hot-plug action.
My original issue was the calltrace on RT kernel if I remove the device from sys interface.
# echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
xhci_hcd 0001:01:00.0: remove, state 1
usb usb2: USB disconnect, device number 1
usb 2-4: USB disconnect, device number 2
xhci_hcd 0001:01:00.0: USB bus 2 deregistered
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
root@nxp-ls1043:preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
~# CPU: 0 PID: 765 Comm: sh Tainted: G        W          6.1.57-rt10-yocto-preempt-rt #1
Hardware name: LS1043A RDB Board (DT)
Call trace:
 dump_backtrace.part.0+0xc8/0xd4
 show_stack+0x20/0x30
 dump_stack_lvl+0x6c/0x88
 dump_stack+0x18/0x34
 __might_resched+0x160/0x1c0
 rt_spin_lock+0x38/0xb0
 xhci_irq+0x44/0x16d0
 usb_hcd_irq+0x38/0x5c
 usb_hcd_pci_remove+0x84/0x14c
 xhci_pci_remove+0x78/0xc0
 pci_device_remove+0x44/0xcc
 device_remove+0x54/0x8c
 device_release_driver_internal+0x1ec/0x260
 device_release_driver+0x20/0x30
 pci_stop_bus_device+0x8c/0xcc
 pci_stop_and_remove_bus_device_locked+0x28/0x44
 remove_store+0xa0/0xb0
 dev_attr_store+0x20/0x34
 sysfs_kf_write+0x4c/0x5c
 kernfs_fop_write_iter+0x128/0x1f0
 vfs_write+0x1c0/0x2f0
 ksys_write+0x78/0x110
 __arm64_sys_write+0x24/0x30
 invoke_syscall+0x5c/0x130
 el0_svc_common.constprop.0+0x4c/0xf4
 do_el0_svc+0x34/0xc0
 el0_svc+0x2c/0x84
 el0t_64_sync_handler+0xf4/0x120
 el0t_64_sync+0x18c/0x190

and then I checked the commit log to get which commit introduced this issue.
I found out the commit is 
commit c548795abe0d3520b74e18f23ca0a0d72deddab9
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Wed Jun 9 17:34:27 2010 -0400

    USB: add check to detect host controller hardware removal

And then, Alan Stern told me the background of this issue. so, I started to do hotplug operation on my board to see what symptom on my nxp-ls1043/6 board.
And then there were lots of discussion followed.

Thanks,
Limeng

> > Thanks,
> > Limeng
> 
> Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: RE: USB: add check to detect host controller hardware removal
  2023-11-06  8:54                                 ` Li, Meng
@ 2023-11-06  9:09                                   ` Sebastian Andrzej Siewior
  2023-11-07  3:08                                     ` Li, Meng
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-11-06  9:09 UTC (permalink / raw)
  To: Li, Meng
  Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
	linux-rt-users

On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:

> This is not my original issue that I encountered.
> I agree that we should remove the device from sys interface firstly,
> and then do hot-plug action.
> My original issue was the calltrace on RT kernel if I remove the
> device from sys interface.
> # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> xhci_hcd 0001:01:00.0: remove, state 1
> usb usb2: USB disconnect, device number 1
> usb 2-4: USB disconnect, device number 2
> xhci_hcd 0001:01:00.0: USB bus 2 deregistered
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh

Right.

> and then I checked the commit log to get which commit introduced this issue.
> I found out the commit is 
> commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> Author: Alan Stern <stern@rowland.harvard.edu>
> Date:   Wed Jun 9 17:34:27 2010 -0400
> 
>     USB: add check to detect host controller hardware removal
> 
> And then, Alan Stern told me the background of this issue. so, I
> started to do hotplug operation on my board to see what symptom on my
> nxp-ls1043/6 board.
> And then there were lots of discussion followed.

Okay. I somehow mapped it that you try to add this to xhci.
The suggested replacement should cover it. Better if we could get rid of
it ;)

> Thanks,
> Limeng

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: USB: add check to detect host controller hardware removal
  2023-11-06  8:28                               ` Sebastian Andrzej Siewior
  2023-11-06  8:54                                 ` Li, Meng
@ 2023-11-06 15:25                                 ` Alan Stern
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Stern @ 2023-11-06 15:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Li, Meng, Steven Rostedt, Ingo Molnar, USB mailing list, linux-rt-users

On Mon, Nov 06, 2023 at 09:28:29AM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-11-06 03:02:42 [+0000], Li, Meng wrote:
> > > Good.  Meng Li, can you test a patch that replaces the
> > > local_irq_disable() - usb_hcd_irq() - local_irq_enable() lines with a single call to
> > > generic_handle_irq_safe()?
> > 
> > It needs an irq number as parameter, what I should pass to this
> > function, 0 or dev->irq or other value?
> 
> dev->irq is what it asks for.

In fact it should be hcd->irq -- see usb_hcd_request_irqs().  Maybe this 
is the same value as dev->irq; I don't know.

> I would really appreciate if you would
> instead use the sysfs interface to remove the device prior physically
> removing it. This should solve your trouble.

The whole reason for having this code in the first place -- the specific 
use case -- was to handle device removal caused by sudden hardware 
failure.  Obviously there's no way to use the sysfs interface before 
this occurs.

Alan Stern

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: RE: RE: USB: add check to detect host controller hardware removal
  2023-11-06  9:09                                   ` Sebastian Andrzej Siewior
@ 2023-11-07  3:08                                     ` Li, Meng
  2023-11-07  6:17                                       ` Greg KH
  0 siblings, 1 reply; 25+ messages in thread
From: Li, Meng @ 2023-11-07  3:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Alan Stern, Steven Rostedt, Ingo Molnar, USB mailing list,
	linux-rt-users



> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Sent: Monday, November 6, 2023 5:09 PM
> To: Li, Meng <Meng.Li@windriver.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> Subject: Re: RE: RE: USB: add check to detect host controller hardware
> removal
> 
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
> 
> On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:
> 
> > This is not my original issue that I encountered.
> > I agree that we should remove the device from sys interface firstly,
> > and then do hot-plug action.
> > My original issue was the calltrace on RT kernel if I remove the
> > device from sys interface.
> > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect,
> > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd
> > 0001:01:00.0: USB bus 2 deregistered
> > BUG: sleeping function called from invalid context at
> > kernel/locking/spinlock_rt.c:46
> > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
> 
> Right.
> 
> > and then I checked the commit log to get which commit introduced this
> issue.
> > I found out the commit is
> > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > Author: Alan Stern <stern@rowland.harvard.edu>
> > Date:   Wed Jun 9 17:34:27 2010 -0400
> >
> >     USB: add check to detect host controller hardware removal
> >
> > And then, Alan Stern told me the background of this issue. so, I
> > started to do hotplug operation on my board to see what symptom on my
> > nxp-ls1043/6 board.
> > And then there were lots of discussion followed.
> 
> Okay. I somehow mapped it that you try to add this to xhci.
> The suggested replacement should cover it. Better if we could get rid of it ;)
> 

generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore.
Will you create a patch in later?

Thanks,
Limeng

> > Thanks,
> > Limeng
> 
> Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: RE: RE: USB: add check to detect host controller hardware removal
  2023-11-07  3:08                                     ` Li, Meng
@ 2023-11-07  6:17                                       ` Greg KH
  0 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2023-11-07  6:17 UTC (permalink / raw)
  To: Li, Meng
  Cc: Sebastian Andrzej Siewior, Alan Stern, Steven Rostedt,
	Ingo Molnar, USB mailing list, linux-rt-users

On Tue, Nov 07, 2023 at 03:08:13AM +0000, Li, Meng wrote:
> 
> 
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Sent: Monday, November 6, 2023 5:09 PM
> > To: Li, Meng <Meng.Li@windriver.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>; Steven Rostedt
> > <rostedt@goodmis.org>; Ingo Molnar <mingo@redhat.com>; USB mailing list
> > <linux-usb@vger.kernel.org>; linux-rt-users <linux-rt-users@vger.kernel.org>
> > Subject: Re: RE: RE: USB: add check to detect host controller hardware
> > removal
> > 
> > CAUTION: This email comes from a non Wind River email account!
> > Do not click links or open attachments unless you recognize the sender and
> > know the content is safe.
> > 
> > On 2023-11-06 08:54:50 [+0000], Li, Meng wrote:
> > 
> > > This is not my original issue that I encountered.
> > > I agree that we should remove the device from sys interface firstly,
> > > and then do hot-plug action.
> > > My original issue was the calltrace on RT kernel if I remove the
> > > device from sys interface.
> > > # echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove
> > > xhci_hcd 0001:01:00.0: remove, state 1 usb usb2: USB disconnect,
> > > device number 1 usb 2-4: USB disconnect, device number 2 xhci_hcd
> > > 0001:01:00.0: USB bus 2 deregistered
> > > BUG: sleeping function called from invalid context at
> > > kernel/locking/spinlock_rt.c:46
> > > in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 765, name: sh
> > 
> > Right.
> > 
> > > and then I checked the commit log to get which commit introduced this
> > issue.
> > > I found out the commit is
> > > commit c548795abe0d3520b74e18f23ca0a0d72deddab9
> > > Author: Alan Stern <stern@rowland.harvard.edu>
> > > Date:   Wed Jun 9 17:34:27 2010 -0400
> > >
> > >     USB: add check to detect host controller hardware removal
> > >
> > > And then, Alan Stern told me the background of this issue. so, I
> > > started to do hotplug operation on my board to see what symptom on my
> > > nxp-ls1043/6 board.
> > > And then there were lots of discussion followed.
> > 
> > Okay. I somehow mapped it that you try to add this to xhci.
> > The suggested replacement should cover it. Better if we could get rid of it ;)
> > 
> 
> generic_handle_irq_safe(dev->irq) works find on my board, there is no calltrace anymore.
> Will you create a patch in later?

As you have tested this, please create it and submit it so that we know
exactly what changes are required and you get the proper credit for
doing this work.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-11-07  6:17 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <PH0PR11MB5191464B2F01511D2ADECB3BF1D2A@PH0PR11MB5191.namprd11.prod.outlook.com>
2023-10-13 17:17 ` USB: add check to detect host controller hardware removal Alan Stern
2023-10-15 13:37   ` Li, Meng
2023-10-16 16:56   ` Steven Rostedt
2023-10-16 19:23     ` Alan Stern
2023-10-17  2:23       ` Li, Meng
2023-10-17 14:06         ` Alan Stern
2023-10-18  5:00           ` Li, Meng
2023-10-18 15:20             ` Alan Stern
2023-10-18 15:34               ` Alan Stern
2023-10-19 12:38               ` Sebastian Andrzej Siewior
2023-10-19 15:09                 ` Alan Stern
2023-10-19 15:27                   ` Alan Stern
2023-10-20  9:52                     ` Sebastian Andrzej Siewior
2023-10-20 15:19                       ` Alan Stern
2023-11-03 15:46                         ` Sebastian Andrzej Siewior
2023-11-03 20:42                           ` Alan Stern
2023-11-06  3:02                             ` Li, Meng
2023-11-06  8:28                               ` Sebastian Andrzej Siewior
2023-11-06  8:54                                 ` Li, Meng
2023-11-06  9:09                                   ` Sebastian Andrzej Siewior
2023-11-07  3:08                                     ` Li, Meng
2023-11-07  6:17                                       ` Greg KH
2023-11-06 15:25                                 ` Alan Stern
2023-11-06  8:27                             ` Sebastian Andrzej Siewior
2023-10-19 12:49     ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.