* [PATCH] xhci: print warning when HCE was set
@ 2022-09-15  1:11 Longfang Liu
  2022-09-22 13:01 ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: Longfang Liu @ 2022-09-15  1:11 UTC (permalink / raw)
  To: gregkh, mathias.nyman; +Cc: linux-usb, linux-kernel, yisen.zhuang, liulongfang

When HCE (Host Controller Error) is set, the xHCI host controller has
encountered an error, but the xhci driver currently does not log this
event.

Add an HCE check to the xhci interrupt handler and print a warning, so
that the error shows up in the system log and the device status is
easier to track.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
---
 drivers/usb/host/xhci-ring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index ad81e9a508b1..f6af479188e8 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3031,6 +3031,11 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
 	if (!(status & STS_EINT))
 		goto out;
 
+	if (status & STS_HCE) {
+		xhci_warn(xhci, "WARNING: Host Controller Error\n");
+		goto out;
+	}
+
 	if (status & STS_FATAL) {
 		xhci_warn(xhci, "WARNING: Host System Error\n");
 		xhci_halt(xhci);
-- 
2.33.0
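
The order of the checks in the hunk above can be modelled in userspace. This is only a sketch: the bit positions follow the USBSTS layout in the xHCI specification, while `model_xhci_irq()` and the `irq_outcome` codes are hypothetical names invented for illustration and are not part of the driver.

```c
#include <stdint.h>

/* USBSTS bit positions as laid out in the xHCI specification. */
#define STS_FATAL (1u << 2)   /* HSE: Host System Error */
#define STS_EINT  (1u << 3)   /* Event Interrupt */
#define STS_HCE   (1u << 12)  /* Host Controller Error */

/* Hypothetical outcome codes, used only by this model. */
enum irq_outcome {
	IRQ_NOT_OURS,      /* EINT clear: handler bails out early */
	IRQ_WARN_HCE,      /* HCE warning printed, handler exits */
	IRQ_WARN_FATAL,    /* HSE warning printed, controller halted */
	IRQ_HANDLE_EVENTS, /* normal event processing continues */
};

/* Mirrors the order of the status checks in the patched hunk. */
enum irq_outcome model_xhci_irq(uint32_t status)
{
	if (!(status & STS_EINT))
		return IRQ_NOT_OURS;
	if (status & STS_HCE)
		return IRQ_WARN_HCE;
	if (status & STS_FATAL)
		return IRQ_WARN_FATAL;
	return IRQ_HANDLE_EVENTS;
}
```

In this model, `model_xhci_irq(STS_EINT | STS_HCE)` yields `IRQ_WARN_HCE`, while `model_xhci_irq(STS_HCE)` yields `IRQ_NOT_OURS`, because the HCE check sits after the EINT check.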



* Re: [PATCH] xhci: print warning when HCE was set
  2022-09-15  1:11 [PATCH] xhci: print warning when HCE was set Longfang Liu
@ 2022-09-22 13:01 ` Mathias Nyman
  2022-09-24  2:35   ` liulongfang
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2022-09-22 13:01 UTC (permalink / raw)
  To: Longfang Liu, gregkh, mathias.nyman; +Cc: linux-usb, linux-kernel, yisen.zhuang

Hi

On 15.9.2022 4.11, Longfang Liu wrote:
> When HCE(Host Controller Error) is set, it means that the xhci hardware
> controller has an error at this time, but the current xhci driver
> software does not log this event.
> 
> By adding an HCE event detection in the xhci interrupt processing
> interface, a warning log is output to the system, which is convenient
> for system device status tracking.
> 

xHC should cease all activity when it sets HCE, and is probably not
generating interrupts anymore.

Would probably be more useful to check for HCE at timeouts than in the
interrupt handler.

If this is something seen on actual hardware then it makes sense to add it.

Thanks
-Mathias


* Re: [PATCH] xhci: print warning when HCE was set
  2022-09-22 13:01 ` Mathias Nyman
@ 2022-09-24  2:35   ` liulongfang
  2022-09-26  7:58     ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: liulongfang @ 2022-09-24  2:35 UTC (permalink / raw)
  To: Mathias Nyman, gregkh, mathias.nyman
  Cc: linux-usb, linux-kernel, yisen.zhuang

On 2022/9/22 21:01, Mathias Nyman Wrote:
> Hi
> 
> On 15.9.2022 4.11, Longfang Liu wrote:
>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>> controller has an error at this time, but the current xhci driver
>> software does not log this event.
>>
>> By adding an HCE event detection in the xhci interrupt processing
>> interface, a warning log is output to the system, which is convenient
>> for system device status tracking.
>>
> 
> xHC should cease all activity when it sets HCE, and is probably not
> generating interrupts anymore.
> 
> Would probably be more useful to check for HCE at timeouts than in the
> interrupt handler.
> 

Which function of the driver code is this timeout in?

> If this is something seen on actual hardware then it makes sense to add it.
> 

On the chip we are using, an HCE error always raises an interrupt.

> Thanks
> -Mathias
> .
> 
Thanks,
Longfang.


* Re: [PATCH] xhci: print warning when HCE was set
  2022-09-24  2:35   ` liulongfang
@ 2022-09-26  7:58     ` Mathias Nyman
  2022-10-14  3:12       ` liulongfang
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2022-09-26  7:58 UTC (permalink / raw)
  To: liulongfang, gregkh, mathias.nyman; +Cc: linux-usb, linux-kernel, yisen.zhuang

On 24.9.2022 5.35, liulongfang wrote:
> On 2022/9/22 21:01, Mathias Nyman Wrote:
>> Hi
>>
>> On 15.9.2022 4.11, Longfang Liu wrote:
>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>> controller has an error at this time, but the current xhci driver
>>> software does not log this event.
>>>
>>> By adding an HCE event detection in the xhci interrupt processing
>>> interface, a warning log is output to the system, which is convenient
>>> for system device status tracking.
>>>
>>
>> xHC should cease all activity when it sets HCE, and is probably not
>> generating interrupts anymore.
>>
>> Would probably be more useful to check for HCE at timeouts than in the
>> interrupt handler.
>>
> 
> Which function of the driver code is this timeout in?

xhci_handle_command_timeout() will usually trigger at some point.

> 
>> If this is something seen on actual hardware then it makes sense to add it.
>>
> 
> This HCE error is sure to report an interrupt on the chip we are using.

OK, then it makes sense to add this patch.

Thanks
-Mathias



* Re: [PATCH] xhci: print warning when HCE was set
  2022-09-26  7:58     ` Mathias Nyman
@ 2022-10-14  3:12       ` liulongfang
  2022-10-14  7:56         ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: liulongfang @ 2022-10-14  3:12 UTC (permalink / raw)
  To: Mathias Nyman, gregkh, mathias.nyman
  Cc: linux-usb, linux-kernel, yisen.zhuang

On 2022/9/26 15:58, Mathias Nyman wrote:
> On 24.9.2022 5.35, liulongfang wrote:
>> On 2022/9/22 21:01, Mathias Nyman Wrote:
>>> Hi
>>>
>>> On 15.9.2022 4.11, Longfang Liu wrote:
>>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>>> controller has an error at this time, but the current xhci driver
>>>> software does not log this event.
>>>>
>>>> By adding an HCE event detection in the xhci interrupt processing
>>>> interface, a warning log is output to the system, which is convenient
>>>> for system device status tracking.
>>>>
>>>
>>> xHC should cease all activity when it sets HCE, and is probably not
>>> generating interrupts anymore.
>>>
>>> Would probably be more useful to check for HCE at timeouts than in the
>>> interrupt handler.
>>>
>>
>> Which function of the driver code is this timeout in?
> 
> xhci_handle_command_timeout() will usually trigger at some point,
> 

Because this HCE error is reported through an interrupt, it is simpler to
handle it in xhci_irq() than in xhci_handle_command_timeout().

>>
>>> If this is something seen on actual hardware then it makes sense to add it.
>>>
>>
>> This HCE error is sure to report an interrupt on the chip we are using.
> 
> Ok, then makes sense to add this patch.
> 
> Thanks
> -Mathias
>

Thanks,
Longfang.


* Re: [PATCH] xhci: print warning when HCE was set
  2022-10-14  3:12       ` liulongfang
@ 2022-10-14  7:56         ` Mathias Nyman
  2022-12-09  6:13           ` liulongfang
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2022-10-14  7:56 UTC (permalink / raw)
  To: liulongfang, Mathias Nyman, gregkh; +Cc: linux-usb, linux-kernel, yisen.zhuang

On 14.10.2022 6.12, liulongfang wrote:
> On 2022/9/26 15:58, Mathias Nyman wrote:
>> On 24.9.2022 5.35, liulongfang wrote:
>>> On 2022/9/22 21:01, Mathias Nyman Wrote:
>>>> Hi
>>>>
>>>> On 15.9.2022 4.11, Longfang Liu wrote:
>>>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>>>> controller has an error at this time, but the current xhci driver
>>>>> software does not log this event.
>>>>>
>>>>> By adding an HCE event detection in the xhci interrupt processing
>>>>> interface, a warning log is output to the system, which is convenient
>>>>> for system device status tracking.
>>>>>
>>>>
>>>> xHC should cease all activity when it sets HCE, and is probably not
>>>> generating interrupts anymore.
>>>>
>>>> Would probably be more useful to check for HCE at timeouts than in the
>>>> interrupt handler.
>>>>
>>>
>>> Which function of the driver code is this timeout in?
>>
>> xhci_handle_command_timeout() will usually trigger at some point,
>>
> 
> Because this HCE error is reported in the form of an interrupt signal, it is more
> concise to put it in xhci_irq() than in xhci_handle_command_timeout().
> 

The patch was added to the queue after you reported that your xHC hardware
triggers interrupts when HCE is set. I'll send it forward after 6.1-rc1.

The xHCI specification still indicates that HCE might not trigger interrupts:

Section 4.24.1 - Internal Errors
...
"Software should implement an algorithm for checking the HCE flag if the xHC is
not responding."
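
A minimal userspace sketch of such a check, under stated assumptions: `struct fake_xhc` and `timeout_saw_hce()` are hypothetical names used only for illustration, and a real driver would read USBSTS with readl() from a timeout path rather than from a plain struct field.

```c
#include <stdbool.h>
#include <stdint.h>

#define STS_HCE (1u << 12) /* Host Controller Error bit in USBSTS */

/* Hypothetical stand-in for the controller's operational registers. */
struct fake_xhc {
	volatile uint32_t usbsts;
};

/*
 * Meant to be called from a (hypothetical) timeout path when a command
 * got no response. Returns true if the controller reports HCE.
 */
bool timeout_saw_hce(const struct fake_xhc *xhc)
{
	uint32_t status = xhc->usbsts; /* a driver would use readl() here */

	/* An all-ones read usually means the device is gone, not HCE. */
	if (status == UINT32_MAX)
		return false;

	return (status & STS_HCE) != 0;
}
```

Because this polls the register directly, it does not depend on the controller still being able to raise an interrupt.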

Thanks
-Mathias


* Re: [PATCH] xhci: print warning when HCE was set
  2022-10-14  7:56         ` Mathias Nyman
@ 2022-12-09  6:13           ` liulongfang
  0 siblings, 0 replies; 7+ messages in thread
From: liulongfang @ 2022-12-09  6:13 UTC (permalink / raw)
  To: Mathias Nyman, Mathias Nyman, gregkh
  Cc: linux-usb, linux-kernel, yisen.zhuang

On 2022/10/14 15:56, Mathias Nyman Wrote:
> On 14.10.2022 6.12, liulongfang wrote:
>> On 2022/9/26 15:58, Mathias Nyman wrote:
>>> On 24.9.2022 5.35, liulongfang wrote:
>>>> On 2022/9/22 21:01, Mathias Nyman Wrote:
>>>>> Hi
>>>>>
>>>>> On 15.9.2022 4.11, Longfang Liu wrote:
>>>>>> When HCE(Host Controller Error) is set, it means that the xhci hardware
>>>>>> controller has an error at this time, but the current xhci driver
>>>>>> software does not log this event.
>>>>>>
>>>>>> By adding an HCE event detection in the xhci interrupt processing
>>>>>> interface, a warning log is output to the system, which is convenient
>>>>>> for system device status tracking.
>>>>>>
>>>>>
>>>>> xHC should cease all activity when it sets HCE, and is probably not
>>>>> generating interrupts anymore.
>>>>>
>>>>> Would probably be more useful to check for HCE at timeouts than in the
>>>>> interrupt handler.
>>>>>
>>>>
>>>> Which function of the driver code is this timeout in?
>>>
>>> xhci_handle_command_timeout() will usually trigger at some point,
>>>
>>
>> Because this HCE error is reported in the form of an interrupt signal, it is more
>> concise to put it in xhci_irq() than in xhci_handle_command_timeout().
>>
> 
> Patch was added to queue after you reported your xHC hardware triggers interrupts when HCE is set.
> I'll send it forward after 6.1-rc1
> 

In our test build, we added a log message to xhci_irq(). In the test case that
triggers HCE, the HCE interrupt is reported and recorded in the log:

{53}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
{53}[Hardware Error]: event severity: recoverable
{53}[Hardware Error]:  Error 0, type: recoverable
{53}[Hardware Error]:   section type: unknown, c8b328a8-9917-4af6-9a13-2e08ab2e7586
{53}[Hardware Error]:   section length: 0x48
{53}[Hardware Error]:   00000000: 0000186b 00000201 001a0001 00000000  k...............
{53}[Hardware Error]:   00000010: 00000000 00000000 00000000 00000028  ............(...
{53}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000  ................
{53}[Hardware Error]:   00000040: 00000001 00000000                    ........
 xhci_hcd 0000:30:01.0: xHCI host not responding to stop endpoint command.
 xhci_hcd 0000:30:01.0: USBSTS: PCD HCE
 xhci_hcd 0000:30:01.0: xHCI host controller not responding, assume dead
 xhci_hcd 0000:30:01.0: HC died; cleaning up
 usb usb1-port1: couldn't allocate usb_device
rmmod xhci-pci
 xhci_hcd 0000:30:01.0: remove, state 4
 usb usb2: USB disconnect, device number 1
 xhci_hcd 0000:30:01.0: USB bus 2 deregistered
 xhci_hcd 0000:30:01.0: remove, state 1
 usb usb1: USB disconnect, device number 1
 xhci_hcd 0000:30:01.0: USB bus 1 deregistered
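
For reading the "USBSTS: PCD HCE" line above, the following userspace sketch decodes a raw USBSTS value into flag names. The bit positions follow the xHCI specification; the table covers only a subset of the register, and `decode_usbsts()` is a hypothetical helper written for illustration, not the driver's own decoder.

```c
#include <stdint.h>
#include <stdio.h>

struct sts_flag {
	uint32_t mask;
	const char *name;
};

/* Subset of USBSTS bits, named as they appear in the log above. */
static const struct sts_flag sts_flags[] = {
	{ 1u << 0,  "HCHalted" }, /* controller halted */
	{ 1u << 2,  "HSE" },      /* Host System Error */
	{ 1u << 3,  "EINT" },     /* Event Interrupt */
	{ 1u << 4,  "PCD" },      /* Port Change Detect */
	{ 1u << 12, "HCE" },      /* Host Controller Error */
};

/* Writes space-separated names of the set flags into buf; returns buf. */
char *decode_usbsts(uint32_t usbsts, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(sts_flags) / sizeof(sts_flags[0]); i++) {
		if (!(usbsts & sts_flags[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", sts_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated; stop appending */
		used += (size_t)n;
	}
	return buf;
}
```

With a 64-byte buffer, decoding 0x1010 (bits 4 and 12) gives "PCD HCE", the same flag pair as in the log.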

Thanks,
Longfang.

> xHCI specification still indicate HCE might not trigger interrupts:
>  
> Section 4.24.1 -Internal Errors
> ...
> "Software should implement an algorithm for checking the HCE flag if the xHC is
> not responding."
> 
> Thanks
> -Mathias
> .
> 
