stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
@ 2021-12-29 11:25 Puma Hsu
  2022-01-11 11:43 ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Puma Hsu @ 2021-12-29 11:25 UTC (permalink / raw)
  To: mathias.nyman, gregkh
  Cc: s.shtylyov, albertccwang, linux-usb, linux-kernel, Puma Hsu, stable

When HCE(Host Controller Error) is set, it means an internal
error condition has been detected. It needs to re-initialize
the HC too.

Cc: stable@vger.kernel.org
Signed-off-by: Puma Hsu <pumahsu@google.com>
---
v2: Follow Sergey Shtylyov <s.shtylyov@omp.ru>'s comment.
v3: Add stable@vger.kernel.org for stable release.

 drivers/usb/host/xhci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index dc357cabb265..ab440ce8420f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1146,8 +1146,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
 		temp = readl(&xhci->op_regs->status);
 	}
 
-	/* If restore operation fails, re-initialize the HC during resume */
-	if ((temp & STS_SRE) || hibernated) {
+	/* If restore operation fails or HC error is detected, re-initialize the HC during resume */
+	if ((temp & (STS_SRE | STS_HCE)) || hibernated) {
 
 		if ((xhci->quirks & XHCI_COMP_MODE_QUIRK) &&
 				!(xhci_all_ports_seen_u0(xhci))) {
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2021-12-29 11:25 [PATCH v3] xhci: re-initialize the HC during resume if HCE was set Puma Hsu
@ 2022-01-11 11:43 ` Greg KH
  2022-01-13  7:54   ` Puma Hsu
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2022-01-11 11:43 UTC (permalink / raw)
  To: Puma Hsu
  Cc: mathias.nyman, s.shtylyov, albertccwang, linux-usb, linux-kernel, stable

On Wed, Dec 29, 2021 at 07:25:51PM +0800, Puma Hsu wrote:
> When HCE(Host Controller Error) is set, it means an internal
> error condition has been detected. It needs to re-initialize
> the HC too.

What is "It" in the last sentence?

> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Puma Hsu <pumahsu@google.com>

What commit id does this fix?

> ---
> v2: Follow Sergey Shtylyov <s.shtylyov@omp.ru>'s comment.
> v3: Add stable@vger.kernel.org for stable release.
> 
>  drivers/usb/host/xhci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index dc357cabb265..ab440ce8420f 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1146,8 +1146,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
>  		temp = readl(&xhci->op_regs->status);
>  	}
>  
> -	/* If restore operation fails, re-initialize the HC during resume */
> -	if ((temp & STS_SRE) || hibernated) {
> +	/* If restore operation fails or HC error is detected, re-initialize the HC during resume */
> +	if ((temp & (STS_SRE | STS_HCE)) || hibernated) {

But if STS_HCE is set on suspend, that means the suspend was broken so
you wouldn't get here, right?

Or can the error happen between suspend and resume?

This seems like a big hammer for when the host controller throws an
error.  Why is this the only place that it should be checked for?  What
caused the error that can now allow it to be fixed?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2022-01-11 11:43 ` Greg KH
@ 2022-01-13  7:54   ` Puma Hsu
  2022-01-13  8:21     ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Puma Hsu @ 2022-01-13  7:54 UTC (permalink / raw)
  To: Greg KH
  Cc: mathias.nyman, Sergey Shtylyov, Albert Wang, linux-usb,
	linux-kernel, stable

On Tue, Jan 11, 2022 at 7:43 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Dec 29, 2021 at 07:25:51PM +0800, Puma Hsu wrote:
> > When HCE(Host Controller Error) is set, it means an internal
> > error condition has been detected. It needs to re-initialize
> > the HC too.
>
> What is "It" in the last sentence?

Maybe I can change "It" to "Software", xHCI specification uses
"Software" when describing this.

>
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Puma Hsu <pumahsu@google.com>
>
> What commit id does this fix?

This commit is not used to fix a specific commit. We find a condition
that when XHCI runs the resume process but the HCE flag is set, then
the Run/Stop bit of USBCMD cannot be set so that HC would not be
enabled. In fact, HC may already meet a problem at this moment.
Besides, in xHCI requirements specification revision 1.2, Table 5-21
BIT(12) claims that Software should re-initialize the xHC when HCE is
set. Therefore, I think this commit could be the error handling for
HCE.

>
> > ---
> > v2: Follow Sergey Shtylyov <s.shtylyov@omp.ru>'s comment.
> > v3: Add stable@vger.kernel.org for stable release.
> >
> >  drivers/usb/host/xhci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index dc357cabb265..ab440ce8420f 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -1146,8 +1146,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
> >               temp = readl(&xhci->op_regs->status);
> >       }
> >
> > -     /* If restore operation fails, re-initialize the HC during resume */
> > -     if ((temp & STS_SRE) || hibernated) {
> > +     /* If restore operation fails or HC error is detected, re-initialize the HC during resume */
> > +     if ((temp & (STS_SRE | STS_HCE)) || hibernated) {
>
> But if STS_HCE is set on suspend, that means the suspend was broken so
> you wouldn't get here, right?

In xhci_suspend(), it seems doesn't really check whether STS_HCE is
set and then break the suspend(The only case for checking HCE is when
STS_SAVE setting failed). So suspend function may be still able to
finish even if HCE is set? Then xhci_resume will still be called.

> Or can the error happen between suspend and resume?
>
> This seems like a big hammer for when the host controller throws an
> error.  Why is this the only place that it should be checked for?  What
> caused the error that can now allow it to be fixed?

I believe this is not the only place that the host controller may set
HCE, the host controller may set HCE anytime it sees an error in my
opinion, not only in suspend or resume.
I think this could be a recovery if xhci finds HCE during the resume process.
If someone finds HCE in other functions, it may also need to do the
recovery too.


> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2022-01-13  7:54   ` Puma Hsu
@ 2022-01-13  8:21     ` Greg KH
  2022-01-17  5:24       ` Puma Hsu
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2022-01-13  8:21 UTC (permalink / raw)
  To: Puma Hsu
  Cc: mathias.nyman, Sergey Shtylyov, Albert Wang, linux-usb,
	linux-kernel, stable

On Thu, Jan 13, 2022 at 03:54:27PM +0800, Puma Hsu wrote:
> On Tue, Jan 11, 2022 at 7:43 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Dec 29, 2021 at 07:25:51PM +0800, Puma Hsu wrote:
> > > When HCE(Host Controller Error) is set, it means an internal
> > > error condition has been detected. It needs to re-initialize
> > > the HC too.
> >
> > What is "It" in the last sentence?
> 
> Maybe I can change "It" to "Software", xHCI specification uses
> "Software" when describing this.

Please change it to something better :)

> > >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Puma Hsu <pumahsu@google.com>
> >
> > What commit id does this fix?
> 
> This commit is not used to fix a specific commit. We find a condition
> that when XHCI runs the resume process but the HCE flag is set, then
> the Run/Stop bit of USBCMD cannot be set so that HC would not be
> enabled. In fact, HC may already meet a problem at this moment.
> Besides, in xHCI requirements specification revision 1.2, Table 5-21
> BIT(12) claims that Software should re-initialize the xHC when HCE is
> set. Therefore, I think this commit could be the error handling for
> HCE.

So this problem has been there since the driver was first added to the
kernel?  Should it go to stable kernels as well?  If so, how far back in
time?

> > > ---
> > > v2: Follow Sergey Shtylyov <s.shtylyov@omp.ru>'s comment.
> > > v3: Add stable@vger.kernel.org for stable release.
> > >
> > >  drivers/usb/host/xhci.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > > index dc357cabb265..ab440ce8420f 100644
> > > --- a/drivers/usb/host/xhci.c
> > > +++ b/drivers/usb/host/xhci.c
> > > @@ -1146,8 +1146,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
> > >               temp = readl(&xhci->op_regs->status);
> > >       }
> > >
> > > -     /* If restore operation fails, re-initialize the HC during resume */
> > > -     if ((temp & STS_SRE) || hibernated) {
> > > +     /* If restore operation fails or HC error is detected, re-initialize the HC during resume */
> > > +     if ((temp & (STS_SRE | STS_HCE)) || hibernated) {
> >
> > But if STS_HCE is set on suspend, that means the suspend was broken so
> > you wouldn't get here, right?
> 
> In xhci_suspend(), it seems doesn't really check whether STS_HCE is
> set and then break the suspend(The only case for checking HCE is when
> STS_SAVE setting failed). So suspend function may be still able to
> finish even if HCE is set? Then xhci_resume will still be called.

Is this a problem?

> > Or can the error happen between suspend and resume?
> >
> > This seems like a big hammer for when the host controller throws an
> > error.  Why is this the only place that it should be checked for?  What
> > caused the error that can now allow it to be fixed?
> 
> I believe this is not the only place that the host controller may set
> HCE, the host controller may set HCE anytime it sees an error in my
> opinion, not only in suspend or resume.

Then where else should it be checked?  Where else will your silicon set
this bit as part of the normal operating process?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2022-01-13  8:21     ` Greg KH
@ 2022-01-17  5:24       ` Puma Hsu
  2022-01-17 12:22         ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: Puma Hsu @ 2022-01-17  5:24 UTC (permalink / raw)
  To: Greg KH
  Cc: mathias.nyman, Sergey Shtylyov, Albert Wang, linux-usb,
	linux-kernel, stable

On Thu, Jan 13, 2022 at 4:21 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Jan 13, 2022 at 03:54:27PM +0800, Puma Hsu wrote:
> > On Tue, Jan 11, 2022 at 7:43 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Dec 29, 2021 at 07:25:51PM +0800, Puma Hsu wrote:
> > > > When HCE(Host Controller Error) is set, it means an internal
> > > > error condition has been detected. It needs to re-initialize
> > > > the HC too.
> > >
> > > What is "It" in the last sentence?
> >
> > Maybe I can change "It" to "Software", xHCI specification uses
> > "Software" when describing this.
>
> Please change it to something better :)

I will fix it in next patch version.

> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Puma Hsu <pumahsu@google.com>
> > >
> > > What commit id does this fix?
> >
> > This commit is not used to fix a specific commit. We find a condition
> > that when XHCI runs the resume process but the HCE flag is set, then
> > the Run/Stop bit of USBCMD cannot be set so that HC would not be
> > enabled. In fact, HC may already meet a problem at this moment.
> > Besides, in xHCI requirements specification revision 1.2, Table 5-21
> > BIT(12) claims that Software should re-initialize the xHC when HCE is
> > set. Therefore, I think this commit could be the error handling for
> > HCE.
>
> So this problem has been there since the driver was first added to the
> kernel?  Should it go to stable kernels as well?  If so, how far back in
> time?

I think XHCI hasn’t handled HCE, so yes this may be a long problem.
I have cced stable@vger.kernel.org for stable backporting, but I’m not sure
how far it should backport since it seems this might be a rare case if no one
reported this issue?

> > > > ---
> > > > v2: Follow Sergey Shtylyov <s.shtylyov@omp.ru>'s comment.
> > > > v3: Add stable@vger.kernel.org for stable release.
> > > >
> > > >  drivers/usb/host/xhci.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > > > index dc357cabb265..ab440ce8420f 100644
> > > > --- a/drivers/usb/host/xhci.c
> > > > +++ b/drivers/usb/host/xhci.c
> > > > @@ -1146,8 +1146,8 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
> > > >               temp = readl(&xhci->op_regs->status);
> > > >       }
> > > >
> > > > -     /* If restore operation fails, re-initialize the HC during resume */
> > > > -     if ((temp & STS_SRE) || hibernated) {
> > > > +     /* If restore operation fails or HC error is detected, re-initialize the HC during resume */
> > > > +     if ((temp & (STS_SRE | STS_HCE)) || hibernated) {
> > >
> > > But if STS_HCE is set on suspend, that means the suspend was broken so
> > > you wouldn't get here, right?
> >
> > In xhci_suspend(), it seems doesn't really check whether STS_HCE is
> > set and then break the suspend(The only case for checking HCE is when
> > STS_SAVE setting failed). So suspend function may be still able to
> > finish even if HCE is set? Then xhci_resume will still be called.
>
> Is this a problem?

It could be, but I'm not sure and I think it may be not so serious if
HCE was raised
while suspend, because host controller doesn’t have job while suspend.
And we are
trying to recover it while resume.

> > > Or can the error happen between suspend and resume?
> > >
> > > This seems like a big hammer for when the host controller throws an
> > > error.  Why is this the only place that it should be checked for?  What
> > > caused the error that can now allow it to be fixed?
> >
> > I believe this is not the only place that the host controller may set
> > HCE, the host controller may set HCE anytime it sees an error in my
> > opinion, not only in suspend or resume.
>
> Then where else should it be checked?  Where else will your silicon set
> this bit as part of the normal operating process?

We observed this flag while resume in our silicon so far. According to the XHCI
specification 4.24.1, “Software should implement an algorithm for checking the
HCE flag if the xHC is not responding.”, so maybe it would be better
to implement
a new API to recover host controller whenever the driver side finds no response
from host controller in the future.

> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2022-01-17  5:24       ` Puma Hsu
@ 2022-01-17 12:22         ` Mathias Nyman
  2022-01-18  6:50           ` Puma Hsu
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2022-01-17 12:22 UTC (permalink / raw)
  To: Puma Hsu, Greg KH
  Cc: mathias.nyman, Sergey Shtylyov, Albert Wang, linux-usb,
	linux-kernel, stable


>>> This seems like a big hammer for when the host controller throws an
>>>> error.  Why is this the only place that it should be checked for?  What
>>>> caused the error that can now allow it to be fixed?
>>>
>>> I believe this is not the only place that the host controller may set
>>> HCE, the host controller may set HCE anytime it sees an error in my
>>> opinion, not only in suspend or resume.
>>
>> Then where else should it be checked?  Where else will your silicon set
>> this bit as part of the normal operating process?
> 
> We observed this flag while resume in our silicon so far. According to the XHCI
> specification 4.24.1, “Software should implement an algorithm for checking the
> HCE flag if the xHC is not responding.”, so maybe it would be better
> to implement
> a new API to recover host controller whenever the driver side finds no response
> from host controller in the future.
> 

As all the code to reset the host during resume already exists, and is well tried
due to issues in resume being so common, I think it makes sense to add the HCE case 
here as well. It's a simple fix that makes the life of users better.

That said we shouldn't hide the reason for reset like this.
Print a debug message telling about the HCE so that everybody working with xHCI
can see it and start fixing the rootcause.

Another HCE check could be added to command timeout code, but just to show a
warning for now.
Reset might not always clear HCE, and we don't want to be stuck in a reset loop.
This check  could be a separate patch

-Mathias

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] xhci: re-initialize the HC during resume if HCE was set
  2022-01-17 12:22         ` Mathias Nyman
@ 2022-01-18  6:50           ` Puma Hsu
  0 siblings, 0 replies; 7+ messages in thread
From: Puma Hsu @ 2022-01-18  6:50 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Greg KH, mathias.nyman, Sergey Shtylyov, Albert Wang, linux-usb,
	linux-kernel, stable

On Mon, Jan 17, 2022 at 8:20 PM Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
>
>
> >>> This seems like a big hammer for when the host controller throws an
> >>>> error.  Why is this the only place that it should be checked for?  What
> >>>> caused the error that can now allow it to be fixed?
> >>>
> >>> I believe this is not the only place that the host controller may set
> >>> HCE, the host controller may set HCE anytime it sees an error in my
> >>> opinion, not only in suspend or resume.
> >>
> >> Then where else should it be checked?  Where else will your silicon set
> >> this bit as part of the normal operating process?
> >
> > We observed this flag while resume in our silicon so far. According to the XHCI
> > specification 4.24.1, “Software should implement an algorithm for checking the
> > HCE flag if the xHC is not responding.”, so maybe it would be better
> > to implement
> > a new API to recover host controller whenever the driver side finds no response
> > from host controller in the future.
> >
>
> As all the code to reset the host during resume already exists, and is well tried
> due to issues in resume being so common, I think it makes sense to add the HCE case
> here as well. It's a simple fix that makes the life of users better.
>
> That said we shouldn't hide the reason for reset like this.
> Print a debug message telling about the HCE so that everybody working with xHCI
> can see it and start fixing the rootcause.

I will print a debug message in next patch version.

> Another HCE check could be added to command timeout code, but just to show a
> warning for now.
> Reset might not always clear HCE, and we don't want to be stuck in a reset loop.
> This check  could be a separate patch

Maybe I can try to make this in the future. Thank you for advising.

> -Mathias

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-01-18  6:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-29 11:25 [PATCH v3] xhci: re-initialize the HC during resume if HCE was set Puma Hsu
2022-01-11 11:43 ` Greg KH
2022-01-13  7:54   ` Puma Hsu
2022-01-13  8:21     ` Greg KH
2022-01-17  5:24       ` Puma Hsu
2022-01-17 12:22         ` Mathias Nyman
2022-01-18  6:50           ` Puma Hsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).