linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* usb: host: Reduce xhci_handshake timeout in xhci_reset
       [not found] <CGME20210622113915epcas2p284c61291fc9d83487f6dfebb65fd4e9b@epcas2p2.samsung.com>
@ 2021-06-22 11:24 ` Daehwan Jung
  2021-06-22 19:56   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 11+ messages in thread
From: Daehwan Jung @ 2021-06-22 11:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Daehwan Jung, Mathias Nyman, linux-usb, linux-kernel

It seems 10 secs timeout is too long in general case. A core would wait for
10 secs without doing other task and it can be happended on every device.
It's better to reduce timeout for general case and use new quirk if needed.

Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
---
 drivers/usb/host/xhci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9248ce8..0a1b6be 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -196,7 +196,7 @@ int xhci_reset(struct xhci_hcd *xhci)
 		udelay(1000);
 
 	ret = xhci_handshake(&xhci->op_regs->command,
-			CMD_RESET, 0, 10 * 1000 * 1000);
+			CMD_RESET, 0, 1 * 1000 * 1000);
 	if (ret)
 		return ret;
 
@@ -210,7 +210,7 @@ int xhci_reset(struct xhci_hcd *xhci)
 	 * than status until the "Controller Not Ready" flag is cleared.
 	 */
 	ret = xhci_handshake(&xhci->op_regs->status,
-			STS_CNR, 0, 10 * 1000 * 1000);
+			STS_CNR, 0, 1 * 1000 * 1000);
 
 	xhci->usb2_rhub.bus_state.port_c_suspend = 0;
 	xhci->usb2_rhub.bus_state.suspended_ports = 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-22 11:24 ` usb: host: Reduce xhci_handshake timeout in xhci_reset Daehwan Jung
@ 2021-06-22 19:56   ` Greg Kroah-Hartman
  2021-06-28  2:25     ` Jung Daehwan
  0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2021-06-22 19:56 UTC (permalink / raw)
  To: Daehwan Jung; +Cc: Mathias Nyman, linux-usb, linux-kernel

On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> It seems 10 secs timeout is too long in general case. A core would wait for
> 10 secs without doing other task and it can be happended on every device.

Only if the handshake does not come back sooner, right?

What is causing your device to timeout here?

> It's better to reduce timeout for general case and use new quirk if needed.

What new quirk?

And why 1 second, where did that number come from?

> 
> Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
> ---
>  drivers/usb/host/xhci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 9248ce8..0a1b6be 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -196,7 +196,7 @@ int xhci_reset(struct xhci_hcd *xhci)
>  		udelay(1000);
>  
>  	ret = xhci_handshake(&xhci->op_regs->command,
> -			CMD_RESET, 0, 10 * 1000 * 1000);
> +			CMD_RESET, 0, 1 * 1000 * 1000);
>  	if (ret)
>  		return ret;
>  
> @@ -210,7 +210,7 @@ int xhci_reset(struct xhci_hcd *xhci)
>  	 * than status until the "Controller Not Ready" flag is cleared.
>  	 */
>  	ret = xhci_handshake(&xhci->op_regs->status,
> -			STS_CNR, 0, 10 * 1000 * 1000);
> +			STS_CNR, 0, 1 * 1000 * 1000);

With this change, what "goes faster"?  What is currently causing
problems with your host controller that this timeout value actually
matters?  Why is it failing?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-22 19:56   ` Greg Kroah-Hartman
@ 2021-06-28  2:25     ` Jung Daehwan
  2021-06-28  6:53       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 11+ messages in thread
From: Jung Daehwan @ 2021-06-28  2:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Mathias Nyman, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3137 bytes --]

On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > It seems 10 secs timeout is too long in general case. A core would wait for
> > 10 secs without doing other task and it can be happended on every device.
> 
> Only if the handshake does not come back sooner, right?

Yes, right.

> What is causing your device to timeout here?

Host Controller doesn't respond handshake. I don't know why and I ask HW team
to debug it.

> > It's better to reduce timeout for general case and use new quirk if needed.
>
> What new quirk?

I mean someone can add new quirk if one still needs long timeout. I guess 1 sec
seems enough but there're many kinds of devices.

> And why 1 second, where did that number come from?

It was 250 msecs before changed to 10 secs. There's no required minimum time in
xhci specification. 1 second is estimated number and it works well on my device.

> >
> > Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
> > ---
> >  drivers/usb/host/xhci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index 9248ce8..0a1b6be 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -196,7 +196,7 @@ int xhci_reset(struct xhci_hcd *xhci)
> >  		udelay(1000);
> >
> >  	ret = xhci_handshake(&xhci->op_regs->command,
> > -			CMD_RESET, 0, 10 * 1000 * 1000);
> > +			CMD_RESET, 0, 1 * 1000 * 1000);
> >  	if (ret)
> >  		return ret;
> >
> > @@ -210,7 +210,7 @@ int xhci_reset(struct xhci_hcd *xhci)
> >  	 * than status until the "Controller Not Ready" flag is cleared.
> >  	 */
> >  	ret = xhci_handshake(&xhci->op_regs->status,
> > -			STS_CNR, 0, 10 * 1000 * 1000);
> > +			STS_CNR, 0, 1 * 1000 * 1000);
>
> With this change, what "goes faster"?  What is currently causing
> problems with your host controller that this timeout value actually
> matters?  Why is it failing?

I guess the root cause of it is from host controller, which it is HW.
Our HW engineer has been debugging it, but I haven't get any clue till now.
However, I think 10 secs timeout is too long and it can cause system problem.
That's why I want to change timeout value. A CPU core would not do anything but
waiting xhci reset for 10 secs with disabling irq like below.

usb_remove_hcd -> xhci_stop -> xhci_reset -> xhci_handshake

static void xhci_stop(struct usb_hcd *hcd)
{
        u32 temp;
        struct xhci_hcd *xhci = hcd_to_xhci(hcd);

        mutex_lock(&xhci->mutex);

        /* Only halt host and free memory after both hcds are removed */
        if (!usb_hcd_is_primary_hcd(hcd)) {
                mutex_unlock(&xhci->mutex);
                return;
        }

        xhci_dbc_exit(xhci);

        spin_lock_irq(&xhci->lock);           -> disable IRQ
        xhci->xhc_state |= XHCI_STATE_HALTED;
        xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
        xhci_halt(xhci);
        xhci_reset(xhci);                     -> 10 seconds timeout!
        spin_unlock_irq(&xhci->lock);

Best Regards,
Jung Daehwan

> thanks,
> 
> greg k-h
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-28  2:25     ` Jung Daehwan
@ 2021-06-28  6:53       ` Greg Kroah-Hartman
  2021-06-28  6:55         ` Jung Daehwan
  0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2021-06-28  6:53 UTC (permalink / raw)
  To: Jung Daehwan; +Cc: Mathias Nyman, linux-usb, linux-kernel

On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > > It seems 10 secs timeout is too long in general case. A core would wait for
> > > 10 secs without doing other task and it can be happended on every device.
> > 
> > Only if the handshake does not come back sooner, right?
> 
> Yes, right.
> 
> > What is causing your device to timeout here?
> 
> Host Controller doesn't respond handshake. I don't know why and I ask HW team
> to debug it.

Please work to fix your hardware, that feels like the root of the
problem here.  If you require the timeout for xhci_reset() to happen,
then how do you know that the hardware really did reset properly in the
reduced amount of time you just provided?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-28  6:53       ` Greg Kroah-Hartman
@ 2021-06-28  6:55         ` Jung Daehwan
  2021-06-28  7:49           ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Jung Daehwan @ 2021-06-28  6:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Mathias Nyman, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1799 bytes --]

On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> > On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > > > It seems 10 secs timeout is too long in general case. A core would wait for
> > > > 10 secs without doing other task and it can be happended on every device.
> > >
> > > Only if the handshake does not come back sooner, right?
> >
> > Yes, right.
> >
> > > What is causing your device to timeout here?
> >
> > Host Controller doesn't respond handshake. I don't know why and I ask HW team
> > to debug it.
>
> Please work to fix your hardware, that feels like the root of the
> problem here.  If you require the timeout for xhci_reset() to happen,
> then how do you know that the hardware really did reset properly in the
> reduced amount of time you just provided?
>

I continue fixing this issue with hardware engineer, but currently just
host controller can crash whole system and that's why I want to fix it.
How about adding some error logs in this situation for recognizing this issue?
We can add error log in xhci_stop as xhci_reset can returns error like below.

static void xhci_stop(struct usb_hcd *hcd)
{
        u32 temp;
        struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+       int ret;

        mutex_lock(&xhci->mutex);

@@ -733,6 +734,9 @@ static void xhci_stop(struct usb_hcd *hcd)
        xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
        xhci_halt(xhci);
        xhci_reset(xhci);
+       if (ret)
+               xhci_err(xhci, "%s: Error while reset xhci Host controller - ret = %d\n"
+                       , __func__, ret);
        spin_unlock_irq(&xhci->lock);


> thanks,
>
> greg k-h
>



[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-28  6:55         ` Jung Daehwan
@ 2021-06-28  7:49           ` Mathias Nyman
  2022-02-11  6:46             ` Pavan Kondeti
  0 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2021-06-28  7:49 UTC (permalink / raw)
  To: Jung Daehwan, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

On 28.6.2021 9.55, Jung Daehwan wrote:
> On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
>> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
>>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
>>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
>>>>> It seems 10 secs timeout is too long in general case. A core would wait for
>>>>> 10 secs without doing other task and it can be happended on every device.
>>>>
>>>> Only if the handshake does not come back sooner, right?
>>>
>>> Yes, right.
>>>
>>>> What is causing your device to timeout here?
>>>
>>> Host Controller doesn't respond handshake. I don't know why and I ask HW team
>>> to debug it.
>>
>> Please work to fix your hardware, that feels like the root of the
>> problem here.  If you require the timeout for xhci_reset() to happen,
>> then how do you know that the hardware really did reset properly in the
>> reduced amount of time you just provided?
>>
> 
> I continue fixing this issue with hardware engineer, but currently just
> host controller can crash whole system and that's why I want to fix it.
> How about adding some error logs in this situation for recognizing this issue?
> We can add error log in xhci_stop as xhci_reset can returns error like below.
> 
> static void xhci_stop(struct usb_hcd *hcd)
> {
>         u32 temp;
>         struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> +       int ret;
> 
>         mutex_lock(&xhci->mutex);
> 
> @@ -733,6 +734,9 @@ static void xhci_stop(struct usb_hcd *hcd)
>         xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
>         xhci_halt(xhci);
>         xhci_reset(xhci);
> +       if (ret)
> +               xhci_err(xhci, "%s: Error while reset xhci Host controller - ret = %d\n"
> +                       , __func__, ret);
>         spin_unlock_irq(&xhci->lock);
> 

We can check the xhci_reset() return value here and print a message, makes sense.

The original reason for the 10 second timeout was because a host actually took 9 seconds:

commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b

    xhci: Increase reset timeout for Renesas 720201 host.
    
    The NEC/Renesas 720201 xHCI host controller does not complete its reset
    within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
    host controller, and 1 second for the host to be ready for doorbell
    rings.  Extend the reset and CNR polling timeout to 10 seconds each.

-Mathias

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2021-06-28  7:49           ` Mathias Nyman
@ 2022-02-11  6:46             ` Pavan Kondeti
  2022-02-11  7:43               ` Pavan Kondeti
  0 siblings, 1 reply; 11+ messages in thread
From: Pavan Kondeti @ 2022-02-11  6:46 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Jung Daehwan, Greg Kroah-Hartman, linux-usb, linux-kernel, quic_udipto

On Mon, Jun 28, 2021 at 10:49:00AM +0300, Mathias Nyman wrote:
> On 28.6.2021 9.55, Jung Daehwan wrote:
> > On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> >> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> >>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> >>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> >>>>> It seems 10 secs timeout is too long in general case. A core would wait for
> >>>>> 10 secs without doing other task and it can be happended on every device.
> >>>>
> >>>> Only if the handshake does not come back sooner, right?
> >>>
> >>> Yes, right.
> >>>
> >>>> What is causing your device to timeout here?
> >>>
> >>> Host Controller doesn't respond handshake. I don't know why and I ask HW team
> >>> to debug it.
> >>
> >> Please work to fix your hardware, that feels like the root of the
> >> problem here.  If you require the timeout for xhci_reset() to happen,
> >> then how do you know that the hardware really did reset properly in the
> >> reduced amount of time you just provided?
> >>
> > 
> > I continue fixing this issue with hardware engineer, but currently just
> > host controller can crash whole system and that's why I want to fix it.
> > How about adding some error logs in this situation for recognizing this issue?
> > We can add error log in xhci_stop as xhci_reset can returns error like below.
> > 
> > static void xhci_stop(struct usb_hcd *hcd)
> > {
> >         u32 temp;
> >         struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> > +       int ret;
> > 
> >         mutex_lock(&xhci->mutex);
> > 
> > @@ -733,6 +734,9 @@ static void xhci_stop(struct usb_hcd *hcd)
> >         xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
> >         xhci_halt(xhci);
> >         xhci_reset(xhci);
> > +       if (ret)
> > +               xhci_err(xhci, "%s: Error while reset xhci Host controller - ret = %d\n"
> > +                       , __func__, ret);
> >         spin_unlock_irq(&xhci->lock);
> > 
> 
> We can check the xhci_reset() return value here and print a message, makes sense.
> 
> The original reason for the 10 second timeout was because a host actually took 9 seconds:
> 
> commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b
> 
>     xhci: Increase reset timeout for Renesas 720201 host.
>     
>     The NEC/Renesas 720201 xHCI host controller does not complete its reset
>     within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
>     host controller, and 1 second for the host to be ready for doorbell
>     rings.  Extend the reset and CNR polling timeout to 10 seconds each.
> 
Agreed.

We also run into the similar issue (very very rarely reproduced) on
our platforms like SM8450. The issue happens when host mode is de-activated
(type-c cable disconnected). Since xhci_reset() is called with interrupts
disabled, a timeout of 10 seconds is fatal to the system.

Thanks,
Pavan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2022-02-11  6:46             ` Pavan Kondeti
@ 2022-02-11  7:43               ` Pavan Kondeti
  2022-02-14  4:08                 ` Pavan Kondeti
  0 siblings, 1 reply; 11+ messages in thread
From: Pavan Kondeti @ 2022-02-11  7:43 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Mathias Nyman, Jung Daehwan, Greg Kroah-Hartman, linux-usb,
	linux-kernel, quic_ugoswami

Sorry for the spam. I have added an incorrect email address in my previous
email.

On Fri, Feb 11, 2022 at 12:16:30PM +0530, Pavan Kondeti wrote:
> On Mon, Jun 28, 2021 at 10:49:00AM +0300, Mathias Nyman wrote:
> > On 28.6.2021 9.55, Jung Daehwan wrote:
> > > On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> > >> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> > >>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > >>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > >>>>> It seems 10 secs timeout is too long in general case. A core would wait for
> > >>>>> 10 secs without doing other task and it can be happended on every device.
> > >>>>
> > >>>> Only if the handshake does not come back sooner, right?
> > >>>
> > >>> Yes, right.
> > >>>
> > >>>> What is causing your device to timeout here?
> > >>>
> > >>> Host Controller doesn't respond handshake. I don't know why and I ask HW team
> > >>> to debug it.
> > >>
> > >> Please work to fix your hardware, that feels like the root of the
> > >> problem here.  If you require the timeout for xhci_reset() to happen,
> > >> then how do you know that the hardware really did reset properly in the
> > >> reduced amount of time you just provided?
> > >>
> > > 
> > > I continue fixing this issue with hardware engineer, but currently just
> > > host controller can crash whole system and that's why I want to fix it.
> > > How about adding some error logs in this situation for recognizing this issue?
> > > We can add error log in xhci_stop as xhci_reset can returns error like below.
> > > 
> > > static void xhci_stop(struct usb_hcd *hcd)
> > > {
> > >         u32 temp;
> > >         struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> > > +       int ret;
> > > 
> > >         mutex_lock(&xhci->mutex);
> > > 
> > > @@ -733,6 +734,9 @@ static void xhci_stop(struct usb_hcd *hcd)
> > >         xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
> > >         xhci_halt(xhci);
> > >         xhci_reset(xhci);
> > > +       if (ret)
> > > +               xhci_err(xhci, "%s: Error while reset xhci Host controller - ret = %d\n"
> > > +                       , __func__, ret);
> > >         spin_unlock_irq(&xhci->lock);
> > > 
> > 
> > We can check the xhci_reset() return value here and print a message, makes sense.
> > 
> > The original reason for the 10 second timeout was because a host actually took 9 seconds:
> > 
> > commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b
> > 
> >     xhci: Increase reset timeout for Renesas 720201 host.
> >     
> >     The NEC/Renesas 720201 xHCI host controller does not complete its reset
> >     within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
> >     host controller, and 1 second for the host to be ready for doorbell
> >     rings.  Extend the reset and CNR polling timeout to 10 seconds each.
> > 
> Agreed.
> 
> We also run into the similar issue (very very rarely reproduced) on
> our platforms like SM8450. The issue happens when host mode is de-activated
> (type-c cable disconnected). Since xhci_reset() is called with interrupts
> disabled, a timeout of 10 seconds is fatal to the system.
> 
> Thanks,
> Pavan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2022-02-11  7:43               ` Pavan Kondeti
@ 2022-02-14  4:08                 ` Pavan Kondeti
  2022-02-14  5:22                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 11+ messages in thread
From: Pavan Kondeti @ 2022-02-14  4:08 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Mathias Nyman, Jung Daehwan, Greg Kroah-Hartman, linux-usb,
	linux-kernel, quic_ugoswami

Hi Greg,

On Fri, Feb 11, 2022 at 01:13:31PM +0530, Pavan Kondeti wrote:
> Sorry for the spam. I have added an incorrect email address in my previous
> email.
> 
> On Fri, Feb 11, 2022 at 12:16:30PM +0530, Pavan Kondeti wrote:
> > On Mon, Jun 28, 2021 at 10:49:00AM +0300, Mathias Nyman wrote:
> > > On 28.6.2021 9.55, Jung Daehwan wrote:
> > > > On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> > > >> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> > > >>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > > >>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > > >>>>> It seems 10 secs timeout is too long in general case. A core would wait for
> > > >>>>> 10 secs without doing other task and it can be happended on every device.
> > > >>>>
> > > >>>> Only if the handshake does not come back sooner, right?
> > > >>>
> > > >>> Yes, right.
> > > >>>
> > > >>>> What is causing your device to timeout here?
> > > >>>
> > > >>> Host Controller doesn't respond handshake. I don't know why and I ask HW team
> > > >>> to debug it.
> > > >>
> > > >> Please work to fix your hardware, that feels like the root of the
> > > >> problem here.  If you require the timeout for xhci_reset() to happen,
> > > >> then how do you know that the hardware really did reset properly in the
> > > >> reduced amount of time you just provided?
> > > >>
> > > > 
> > > > I continue fixing this issue with hardware engineer, but currently just
> > > > host controller can crash whole system and that's why I want to fix it.
> > > > How about adding some error logs in this situation for recognizing this issue?
> > > > We can add error log in xhci_stop as xhci_reset can returns error like below.
> > > > 
> > > > static void xhci_stop(struct usb_hcd *hcd)
> > > > {
> > > >         u32 temp;
> > > >         struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> > > > +       int ret;
> > > > 
> > > >         mutex_lock(&xhci->mutex);
> > > > 
> > > > @@ -733,6 +734,9 @@ static void xhci_stop(struct usb_hcd *hcd)
> > > >         xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
> > > >         xhci_halt(xhci);
> > > >         xhci_reset(xhci);
> > > > +       if (ret)
> > > > +               xhci_err(xhci, "%s: Error while reset xhci Host controller - ret = %d\n"
> > > > +                       , __func__, ret);
> > > >         spin_unlock_irq(&xhci->lock);
> > > > 
> > > 
> > > We can check the xhci_reset() return value here and print a message, makes sense.
> > > 
> > > The original reason for the 10 second timeout was because a host actually took 9 seconds:
> > > 
> > > commit 22ceac191211cf6688b1bf6ecd93c8b6bf80ed9b
> > > 
> > >     xhci: Increase reset timeout for Renesas 720201 host.
> > >     
> > >     The NEC/Renesas 720201 xHCI host controller does not complete its reset
> > >     within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
> > >     host controller, and 1 second for the host to be ready for doorbell
> > >     rings.  Extend the reset and CNR polling timeout to 10 seconds each.
> > > 
> > Agreed.
> > 
> > We also run into the similar issue (very very rarely reproduced) on
> > our platforms like SM8450. The issue happens when host mode is de-activated
> > (type-c cable disconnected). Since xhci_reset() is called with interrupts
> > disabled, a timeout of 10 seconds is fatal to the system.

Can you please consider including this change? Let us know if you want this
patch to be resent again with error message and Fixes tag included.

Thanks,
Pavan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2022-02-14  4:08                 ` Pavan Kondeti
@ 2022-02-14  5:22                   ` Greg Kroah-Hartman
  2022-02-14  5:52                     ` Pavan Kondeti
  0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-14  5:22 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Mathias Nyman, Jung Daehwan, linux-usb, linux-kernel, quic_ugoswami

On Mon, Feb 14, 2022 at 09:38:38AM +0530, Pavan Kondeti wrote:
> Hi Greg,
> 
> On Fri, Feb 11, 2022 at 01:13:31PM +0530, Pavan Kondeti wrote:
> > Sorry for the spam. I have added an incorrect email address in my previous
> > email.
> > 
> > On Fri, Feb 11, 2022 at 12:16:30PM +0530, Pavan Kondeti wrote:
> > > On Mon, Jun 28, 2021 at 10:49:00AM +0300, Mathias Nyman wrote:
> > > > On 28.6.2021 9.55, Jung Daehwan wrote:
> > > > > On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> > > > >> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> > > > >>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > > > >>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:

<snip>

> Can you please consider including this change? Let us know if you want this
> patch to be resent again with error message and Fixes tag included.

You are responding to an email thread from 6 months ago, without any
change in it at all, so I have no idea what you are referring to here,
sorry.

Please resend any patch you wish to have reviewed, as obviously it is no
longer in our queue and might not even be relevant anymore (you have
tested 5.17-rc4, right?)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
  2022-02-14  5:22                   ` Greg Kroah-Hartman
@ 2022-02-14  5:52                     ` Pavan Kondeti
  0 siblings, 0 replies; 11+ messages in thread
From: Pavan Kondeti @ 2022-02-14  5:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pavan Kondeti, Mathias Nyman, Jung Daehwan, linux-usb,
	linux-kernel, quic_ugoswami

Hi Greg,

On Mon, Feb 14, 2022 at 06:22:16AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Feb 14, 2022 at 09:38:38AM +0530, Pavan Kondeti wrote:
> > Hi Greg,
> > 
> > On Fri, Feb 11, 2022 at 01:13:31PM +0530, Pavan Kondeti wrote:
> > > Sorry for the spam. I have added an incorrect email address in my previous
> > > email.
> > > 
> > > On Fri, Feb 11, 2022 at 12:16:30PM +0530, Pavan Kondeti wrote:
> > > > On Mon, Jun 28, 2021 at 10:49:00AM +0300, Mathias Nyman wrote:
> > > > > On 28.6.2021 9.55, Jung Daehwan wrote:
> > > > > > On Mon, Jun 28, 2021 at 08:53:02AM +0200, Greg Kroah-Hartman wrote:
> > > > > >> On Mon, Jun 28, 2021 at 11:25:48AM +0900, Jung Daehwan wrote:
> > > > > >>> On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> > > > > >>>> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> 
> <snip>
> 
> > Can you please consider including this change? Let us know if you want this
> > patch to be resent again with error message and Fixes tag included.
> 
> You are responding to an email thread from 6 months ago, without any
> change in it at all, so I have no idea what you are referring to here,
> sorry.
> 
> Please resend any patch you wish to have reviewed, as obviously it is no
> longer in our queue and might not even be relevant anymore (you have
> tested 5.17-rc4, right?)

Thanks Greg for the reply. We will test the patch on the latest tree and
resend it.

Thanks,
Pavan

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-02-14  5:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20210622113915epcas2p284c61291fc9d83487f6dfebb65fd4e9b@epcas2p2.samsung.com>
2021-06-22 11:24 ` usb: host: Reduce xhci_handshake timeout in xhci_reset Daehwan Jung
2021-06-22 19:56   ` Greg Kroah-Hartman
2021-06-28  2:25     ` Jung Daehwan
2021-06-28  6:53       ` Greg Kroah-Hartman
2021-06-28  6:55         ` Jung Daehwan
2021-06-28  7:49           ` Mathias Nyman
2022-02-11  6:46             ` Pavan Kondeti
2022-02-11  7:43               ` Pavan Kondeti
2022-02-14  4:08                 ` Pavan Kondeti
2022-02-14  5:22                   ` Greg Kroah-Hartman
2022-02-14  5:52                     ` Pavan Kondeti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).