linux-usb.vger.kernel.org archive mirror
* Regression: USB/xhci issues on some systems with newer kernel versions
@ 2019-10-02 12:28 Bernhard Gebetsberger
  2019-10-03 10:23 ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Bernhard Gebetsberger @ 2019-10-02 12:28 UTC
  To: linux-usb

Hi,

There has been a regression in the xhci driver since kernel version 4.20: on some systems, some USB devices won't work until the system is rebooted.
The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state", although for some reason some affected USB devices don't produce the message (including the device I'm using, though I did get the error with previous kernel versions).
This bug also seems to cause system instability: one user reported in the bug tracker (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that his system froze because of it on kernel 5.3.1.
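
A quick way to check whether a given kernel log contains the warning is to count its occurrences. The following is only a minimal sketch: the helper name count_deq_warnings is hypothetical, and it assumes the message text appears exactly as quoted above.

```shell
# Count occurrences of the xhci warning in kernel log text read from stdin.
# Hypothetical helper name; the message text is the one quoted in this report.
count_deq_warnings() {
    # grep -c exits non-zero when nothing matches, so keep the status at 0.
    grep -c 'WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state' || true
}
```

It could be used as `dmesg | count_deq_warnings`. A result of 0 does not rule out the problem, since some affected devices do not log the warning.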

Judging by the responses in the bug tracker, it mostly affects Ryzen-based systems with 300-series motherboards, although some other systems are affected as well. It doesn't only affect Wi-Fi/Bluetooth sticks; some users even hit this issue when connecting their smartphone or an external hard drive to their PC.

After enabling kernel debugging/tracing for xhci_hcd, I got the following messages in dmesg (short version; link to the whole file below):
[  231.185635] xhci_hcd 0000:38:00.4: xhci_get_isoc_frame_id: index 0, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x372
[  231.185642] xhci_hcd 0000:38:00.4: xhci_get_isoc_frame_id: index 1, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x373
[  231.185646] xhci_hcd 0000:38:00.4: xhci_get_isoc_frame_id: index 2, reg 0x1b29 start_frame_id 0x366, end_frame_id 0x6e4, start_frame 0x374
......
[  231.887681] xhci_hcd 0000:38:00.4: xhci_get_isoc_frame_id: index 4, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x633
[  231.887687] xhci_hcd 0000:38:00.4: xhci_get_isoc_frame_id: index 5, reg 0x3119 start_frame_id 0x624, end_frame_id 0x1a2, start_frame 0x634
[  231.892346] xhci_hcd 0000:38:00.4: Cancel URB 000000008599ca58, dev 1, ep 0x1, starting at offset 0xff388ea0
[  231.892355] xhci_hcd 0000:38:00.4: // Ding dong!
[  231.892363] xhci_hcd 0000:38:00.4: Cancel URB 000000000d35fd5d, dev 1, ep 0x1, starting at offset 0xff388ef0
[  231.892368] xhci_hcd 0000:38:00.4: Cancel URB 0000000074e3ee88, dev 1, ep 0x1, starting at offset 0xff388e40
[  231.892640] xhci_hcd 0000:38:00.4: Stopped on Transfer TRB for slot 1 ep 1
[  231.892647] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388ea0 (dma).
[  231.892651] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388eb0 (dma).
[  231.892653] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388ec0 (dma).
[  231.892656] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388ed0 (dma).
[  231.892658] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388ee0 (dma).
[  231.892661] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388ef0 (dma).
[  231.892663] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388f00 (dma).
[  231.892666] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388f10 (dma).
[  231.892668] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388f20 (dma).
[  231.892670] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388f30 (dma).
[  231.892672] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388f40 (dma).
[  231.892675] xhci_hcd 0000:38:00.4: Removing canceled TD starting at 0xff388e90 (dma).
[  231.892677] xhci_hcd 0000:38:00.4: Finding endpoint context
[  231.892679] xhci_hcd 0000:38:00.4: Cycle state = 0x1
[  231.892682] xhci_hcd 0000:38:00.4: New dequeue segment = 000000005d174923 (virtual)
[  231.892685] xhci_hcd 0000:38:00.4: New dequeue pointer = 0xff388ea0 (DMA)
[  231.892688] xhci_hcd 0000:38:00.4: Set TR Deq Ptr cmd, new deq seg = 000000005d174923 (0xff388000 dma), new deq ptr = 00000000d5c5ed2a (0xff388ea0 dma), new cycle = 1
[  231.892693] xhci_hcd 0000:38:00.4: // Ding dong!
[  231.892728] xhci_hcd 0000:38:00.4: Successful Set TR Deq Ptr cmd, deq = @ff388ea0
[  231.897107] xhci_hcd 0000:38:00.4: xhci_drop_endpoint called for udev 0000000043fc1c1f
[  231.897126] xhci_hcd 0000:38:00.4: drop ep 0x1, slot id 1, new drop flags = 0x4, new add flags = 0x0
[  231.897129] xhci_hcd 0000:38:00.4: xhci_check_bandwidth called for udev 0000000043fc1c1f
[  231.897137] xhci_hcd 0000:38:00.4: // Ding dong!
[  231.898523] xhci_hcd 0000:38:00.4: Successful Endpoint Configure command
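
To compare cancel sequences like the one above across logs, the relevant debug lines can be tallied. This is a rough sketch only: the function name summarize_cancel is hypothetical, and the matched strings are simply the ones visible in the excerpt, so other kernel versions may word them differently.

```shell
# Summarize a cancel sequence from xhci_hcd dynamic-debug output on stdin.
# Prints three counts: Cancel URB requests, removed TDs, and successful
# Set TR Deq Ptr commands. The function name is hypothetical; the matched
# strings are the ones visible in the excerpt above.
summarize_cancel() {
    awk '
        /Cancel URB/                    { urbs++ }
        /Removing canceled TD/          { tds++ }
        /Successful Set TR Deq Ptr cmd/ { ok++ }
        END { printf "cancel_urb=%d removed_td=%d set_deq_ok=%d\n", urbs, tds, ok }
    '
}
```

Fed the excerpt above, it should report 3 canceled URBs, 12 removed TDs and 1 successful Set TR Deq Ptr command. In a failing instance the last count would presumably stay at 0.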

I have uploaded the whole dmesg file and the tracing file to transfer.sh: https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace

The issues have occurred since commit f8f80be501aa2f10669585c3e328fad079d8cb3a "xhci: Use soft retry to recover faster from transaction errors". I think this commit should be reverted, at least until a workaround has been found, especially since the next two kernel versions will be used by a lot of distributions (5.4 because it's an LTS kernel, and 5.5 will probably be used in Ubuntu 20.04), so more users would be affected.

- Bernhard



* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-10-02 12:28 Regression: USB/xhci issues on some systems with newer kernel versions Bernhard Gebetsberger
@ 2019-10-03 10:23 ` Mathias Nyman
  2019-10-03 15:13   ` Bernhard Gebetsberger
  0 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-10-03 10:23 UTC
  To: Bernhard Gebetsberger, linux-usb

On 2.10.2019 15.28, Bernhard Gebetsberger wrote:
> Hi,
> 
> There has been a regression in the xhci driver since kernel version 4.20: on some systems, some USB devices won't work until the system is rebooted.
> The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state", although for some reason some affected USB devices don't produce the message (including the device I'm using, though I did get the error with previous kernel versions).
> This bug also seems to cause system instability: one user reported in the bug tracker (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that his system froze because of it on kernel 5.3.1.
> 

OK, let's take a look at this.
Some of the symptoms vary a bit in the report, so let's focus on the ones that
show: "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"

> Judging by the responses in the bug tracker, it mostly affects Ryzen-based systems with 300-series motherboards, although some other systems are affected as well. It doesn't only affect Wi-Fi/Bluetooth sticks; some users even hit this issue when connecting their smartphone or an external hard drive to their PC.

> 
> I have uploaded the whole dmesg file and the tracing file to transfer.sh: https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace

Hmm, trying to download these just shows "Not Found"

Could someone with an affected system enable tracing and dynamic debug on a
recent kernel, and take logs and traces of one failing instance where the message
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is seen?

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

< Trigger the issue >

Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
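
The steps above could be wrapped in two small shell functions, sketched below. The function names, the optional debugfs root argument and the output file names (dmesg.log, xhci-trace.log) are hypothetical; the files written and the values are the ones listed in the steps. The sketch assumes debugfs is already mounted and that it runs as root.

```shell
# Enable xhci/usbcore dynamic debug and xhci tracing under a debugfs root.
# Usage: enable_xhci_debug [DEBUGFS_ROOT]   (default: /sys/kernel/debug)
# The function names and the optional root argument are hypothetical; the
# files written and the values are the ones listed in the steps above.
enable_xhci_debug() {
    root=${1:-/sys/kernel/debug}
    echo 'module xhci_hcd =p' > "$root/dynamic_debug/control" &&
    echo 'module usbcore =p' > "$root/dynamic_debug/control" &&
    echo 81920 > "$root/tracing/buffer_size_kb" &&
    echo 1 > "$root/tracing/events/xhci-hcd/enable"
}

# Save dmesg output and the trace buffer into OUTDIR after reproducing the issue.
# Usage: collect_xhci_logs OUTDIR [DEBUGFS_ROOT]
# The output file names are illustrative only.
collect_xhci_logs() {
    out=$1
    root=${2:-/sys/kernel/debug}
    mkdir -p "$out" &&
    dmesg > "$out/dmesg.log" &&
    cat "$root/tracing/trace" > "$out/xhci-trace.log"
}
```

A typical session would be: call enable_xhci_debug, trigger the issue, then call collect_xhci_logs with a directory of your choice and send the two resulting files.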

> 
> The issues have occurred since commit f8f80be501aa2f10669585c3e328fad079d8cb3a "xhci: Use soft retry to recover faster from transaction errors". I think this commit should be reverted, at least until a workaround has been found, especially since the next two kernel versions will be used by a lot of distributions (5.4 because it's an LTS kernel, and 5.5 will probably be used in Ubuntu 20.04), so more users would be affected.
> 

There is some time left before 5.4 is out, so let's see if we can find the root cause first.

-Mathias



* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-10-03 10:23 ` Mathias Nyman
@ 2019-10-03 15:13   ` Bernhard Gebetsberger
  2019-10-11  1:55     ` Bernhard Gebetsberger
  2019-10-14 13:03     ` Mathias Nyman
  0 siblings, 2 replies; 11+ messages in thread
From: Bernhard Gebetsberger @ 2019-10-03 15:13 UTC
  To: Mathias Nyman, linux-usb

I sent the instructions to one of the users in the bug tracker.
Here is the download link for his logs: https://www.sendspace.com/file/413hlj

- Bernhard


* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-10-03 15:13   ` Bernhard Gebetsberger
@ 2019-10-11  1:55     ` Bernhard Gebetsberger
  2019-10-14 13:03     ` Mathias Nyman
  1 sibling, 0 replies; 11+ messages in thread
From: Bernhard Gebetsberger @ 2019-10-11  1:55 UTC
  To: Mathias Nyman, linux-usb

I've just noticed that this problem also occurs when unplugging an affected device.
When I unplug the device, the error
    "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
is shown, even though I don't get this error when plugging the device in.

Here is a link to the dmesg and trace logs:
https://gist.github.com/Brn9hrd7/011405276fdf7a699dcc5cb83c67d276
Maybe there is something useful in there that was missing from the previous logs.

- Bernhard



* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-10-03 15:13   ` Bernhard Gebetsberger
  2019-10-11  1:55     ` Bernhard Gebetsberger
@ 2019-10-14 13:03     ` Mathias Nyman
  2019-12-05 20:34       ` Prashant Malani
  1 sibling, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-10-14 13:03 UTC
  To: Bernhard Gebetsberger, linux-usb

On 3.10.2019 18.13, Bernhard Gebetsberger wrote:
> I sent the instructions to one of the users in the bug tracker.
> Here is the download link for his logs: https://www.sendspace.com/file/413hlj
> 

Thanks.
The traces show that the driver handles the Transaction Error normally by issuing an endpoint reset,
which is successful, but after that there is no activity on that endpoint even though there
are over 120 transfer requests (TRBs) pending.
After over 40 seconds, the class driver starts canceling the pending transfers.

After a soft retry, the xhci driver should ring the doorbell of the endpoint and the hardware
should start processing the pending TRBs, but the ring is not handling them.
It looks like either the driver or the hardware fails to start the endpoint ring again.

I'll add some more tracing to check that the driver correctly rings the endpoint doorbell.
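
The "over 120" figure can be cross-checked from the ring addresses reported in the trace: the enqueue pointer moves from 0xff2d1000 to 0xff2d1800 at 0x10 bytes per TRB. A minimal sketch of that arithmetic follows; the helper name trb_count is hypothetical, and it assumes both addresses lie in the same ring segment.

```shell
# Number of TRBs between two ring addresses, given the size of one TRB.
# Usage: trb_count START END TRB_SIZE   (hex or decimal)
# Hypothetical helper; assumes both addresses lie in the same ring segment.
trb_count() {
    echo $(( ($2 - $1) / $3 ))
}
```

For example, `trb_count 0xff2d1000 0xff2d1800 0x10` gives 128, which matches "over 120" pending TRBs.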


Details of trace:

-Several TRBs (over 120) queued for slot 4 ep 3 (ep1in-bulk), starting at 0xff2d1000, up to 0xff2d1800 (0x10 per TRB)

   164.884097: xhci_urb_enqueue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
   164.884099: xhci_queue_trb: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
   164.884101: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1010(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)
   ...
   164.884304: xhci_urb_enqueue: ep1in-bulk: urb 00000000fee4e260 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
   164.884304: xhci_queue_trb: BULK: Buffer 00000000ff3a304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
   164.884304: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1800(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)

-Transaction error 3 seconds later for TRB at 0xff2d1000

   167.578273: xhci_handle_event: EVENT: TRB 00000000ff2d1000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:c
   167.578288: xhci_handle_transfer: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C

-Soft retry by issuing a host side reset endpoint command:

   167.578297: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
   167.578416: xhci_handle_event: EVENT: TRB 00000000ffefe440 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c

-Host side of endpoint reset successful, endpoint is in stopped state as it should be

   167.578417: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
   167.578419: xhci_handle_cmd_reset_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst

-Driver should ring endpoint doorbell, and hardware should continue processing TRBs.
No activity at all on slot 4 ep 3; other endpoints continue running normally.
Need to check that the driver really rang the ep doorbell.

-A lot later, the class driver asks to cancel the pending transfer:

   214.132531: xhci_urb_dequeue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
   214.132548: xhci_dbg_cancel_urb: Cancel URB 000000005ebe7973, dev 2, ep 0x81, starting at offset 0xff2d1000

-xhci driver tries to stop endpoint to cancel transfer:

   214.132555: xhci_queue_trb: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C

-but it fails as the slot is not in a proper state to be stopped; ep is in halted state after the failed stop attempt.

   214.132679: xhci_handle_event: EVENT: TRB 00000000ffefe450 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:C
   214.132680: xhci_handle_command: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
   214.132682: xhci_handle_cmd_stop_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512

-After this the endpoint stays in halted state; the xhci driver fails to recover from this while canceling the rest of the TRBs
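
The stall described in the steps above shows up as a long gap between consecutive events for slot 4 ep 3. The sketch below finds the largest such gap for one endpoint. The function name max_idle_gap is hypothetical, and it assumes trimmed trace lines whose first field is the timestamp, as in the excerpts here; raw trace files carry extra leading columns that would need to be stripped first.

```shell
# Largest gap, in seconds, between consecutive trace events for one endpoint.
# Reads trimmed trace lines on stdin whose first field is "<seconds>:",
# as in the excerpts above (an assumption; raw trace files carry extra
# leading columns). Matches both "slot S ep E" and "slot S sp N ep E".
# Usage: max_idle_gap SLOT EP
max_idle_gap() {
    awk -v s="$1" -v e="$2" '
        BEGIN { pat = "slot " s " (sp [0-9]+ )?ep " e "( |$)"; gap = 0 }
        $0 ~ pat {
            t = $1
            sub(/:$/, "", t)        # drop the trailing colon of the timestamp
            t += 0
            if (seen && t - prev > gap) gap = t - prev
            prev = t
            seen = 1
        }
        END { printf "%.6f\n", gap }
    '
}
```

Run as `max_idle_gap 4 3 < trimmed-trace`, where trimmed-trace is a placeholder file name. With the lines quoted in this message, the largest gap falls between the Reset Endpoint completion at 167.578417 and the Stop Ring command at 214.132555, about 46.5 seconds.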
  
-Mathias


* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-10-14 13:03     ` Mathias Nyman
@ 2019-12-05 20:34       ` Prashant Malani
  2019-12-05 22:49         ` Prashant Malani
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Prashant Malani @ 2019-12-05 20:34 UTC
  To: Mathias Nyman
  Cc: Bernhard Gebetsberger, linux-usb, Hayes Wang, Grant Grundler

Hi Mathias and Bernhard,

I was interested in knowing whether this issue has been resolved (it sounded like
this was deemed to be a hardware error, but I'm not sure).
The reason I ask is that we've recently noticed a similar error
popping up while using Realtek rtl8153a-based Ethernet USB dongles
(these use the r8152 driver) on kernel 4.19:
"xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to
incorrect slot or ep state."
This is generally followed by the dongle getting reset, and the
process repeats itself continuously.

I can share more detailed logs if required. The specific dongle I used
was the LinkSys USB3GIGV1 (I think the official link is:
https://www.linksys.com/us/support-product?pid=01t80000003fwbWAAQ)

Some interesting data points:
- This issue doesn't manifest itself on kernel 4.4 or 4.14 but does
show up on 4.19
- This issue didn't manifest itself on 4.19 either before recent
changes were incorporated to patch the dongle firmware (commit
9370f2d05a2a150b0aa719a3070b26d478180df3 on the linux mainline
branch). After the firmware patching changes went in, 4.19 started
exhibiting this issue (4.4 and 4.14 still don't exhibit it).

Thanks and Best regards!



* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-12-05 20:34       ` Prashant Malani
@ 2019-12-05 22:49         ` Prashant Malani
  2019-12-05 23:19         ` Bernhard Gebetsberger
  2019-12-16 13:29         ` Mathias Nyman
  2 siblings, 0 replies; 11+ messages in thread
From: Prashant Malani @ 2019-12-05 22:49 UTC
  To: Mathias Nyman
  Cc: Bernhard Gebetsberger, linux-usb, Hayes Wang, Grant Grundler

BTW here are the trace files for the above error (the driver is r8152):

dmesg: https://transfer.sh/uenDh/dmesg_ethernet_xhci_hcd.log
trace: https://transfer.sh/Nhhat/trace_ethernet_xhci_hcd.log

Thanks again,

On Thu, Dec 5, 2019 at 12:34 PM Prashant Malani <pmalani@chromium.org> wrote:
>
> Hi Mathias and Bernhard,
>
> I was interested in knowing if this issue was resolved (sounded like
> this was deemed to be a hardware error, but I'm not sure).
> The reason I ask is that we've recently noticed a similar error
> popping up while using Realtek rtl8153a-based ethernet USB dongles
> (these use the r8152 driver) on kernel 4.19 :
> " hci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to
> incorrect slot or ep state."
> This is generally followed by the dongle getting reset, and the
> process repeats itself continuously.
>
> I can share more detailed logs if required. The specific dongle I used
> was LinkSys USB3GIGV1 (I think the official link is :
> https://www.linksys.com/us/support-product?pid=01t80000003fwbWAAQ)
>
> Some interesting data points:
> - This issue doesn't manifest itself on kernel 4.4 or 4.14 but does
> show up on 4.19
> - This issue didn't manifest itself on 4.19 either before recent
> changes were incorporated to patch the dongle firmware (commit
> 9370f2d05a2a150b0aa719a3070b26d478180df3 on the linux mainline
> branch). After the firmware patching changes went in, 4.19 started
> exhibiting this issue (4.4 and 4.14 still don't exhibit it).
>
> Thanks and Best regards!
>
>
> On Mon, Oct 14, 2019 at 6:01 AM Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
> >
> > On 3.10.2019 18.13, Bernhard Gebetsberger wrote:
> > > I sent the instructions to one of the users in the bug tracker.
> > > Here is the download link for his logs: https://www.sendspace.com/file/413hlj
> > >
> >
> > Thanks.
> > Traces show driver handles the Transaction error normally by issuing a endpoint reset,
> > which is successful, but after that there is no activity on that endpoint even if there
> > are over 120 transfers requests (TRB) pending.
> > After over 40 seconds the class driver starts canceling the pending transfers.
> >
> > after soft retry the xhci driver should ring the doorbell of the endpoint, and hardware
> > should start processing pending TRBs, but ring is not handling pending TRBs
> > Looks like either driver or hardware fails to start the endpoint ring again
> >
> > I'll add some more tracing to check driver correctly rings the endpoint doorbell.
> >
> >
> > Details of trace:
> >
> > -Several TRBs (over 120) queued for slot 4 ep 3 (ep1in-bulk), starting at 0xff2d1000, up to 0xff2d1800 (0x10 per TRB)
> >
> >    164.884097: xhci_urb_enqueue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
> >    164.884099: xhci_queue_trb: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
> >    164.884101: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1010(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)
> >    ...
> >    164.884304: xhci_urb_enqueue: ep1in-bulk: urb 00000000fee4e260 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
> >    164.884304: xhci_queue_trb: BULK: Buffer 00000000ff3a304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
> >    164.884304: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1800(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)
> >
> > -Transaction error 3 seconds later for TRB at 0xff2d1000
> >
> >    167.578273: xhci_handle_event: EVENT: TRB 00000000ff2d1000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:c
> >    167.578288: xhci_handle_transfer: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
> >
> > -Soft retry by issuing a host side reset endpoint command,
> >
> >    167.578297: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
> >    167.578416: xhci_handle_event: EVENT: TRB 00000000ffefe440 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
> >
> > -Host side of endpoint reset successful, endpoint is in stopped state as it should
> >
> >    167.578417: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
> >    167.578419: xhci_handle_cmd_reset_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst
> >
> > -Driver should ring endpoint doorbell, and hardware should continue procressing TRBs
> > No activity at all on slot 4 ep 3, other endpoints continue running normally.
> > Check driver really rang ep doorbell
> >
> > A lot later class driver asks to cancel pending tranfer:
> >
> >    214.132531: xhci_urb_dequeue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
> >    214.132548: xhci_dbg_cancel_urb: Cancel URB 000000005ebe7973, dev 2, ep 0x81, starting at offset 0xff2d1000
> >
> > -xhci driver tries to stop endpoint to cancel transfer:
> >
> >    214.132555: xhci_queue_trb: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
> >
> > -but it fails as slot is not in a proper state to be stopped, ep is in halted state after failed stop attempt.
> >
> >    214.132679: xhci_handle_event: EVENT: TRB 00000000ffefe450 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:C
> >    214.132680: xhci_handle_command: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
> >    214.132682: xhci_handle_cmd_stop_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512
> >
> > -After this the endpoint stays in halted state; the xhci driver fails to recover from this while canceling the rest of the TRBs
> >
> > -Mathias


* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-12-05 20:34       ` Prashant Malani
  2019-12-05 22:49         ` Prashant Malani
@ 2019-12-05 23:19         ` Bernhard Gebetsberger
  2019-12-05 23:23           ` Prashant Malani
  2019-12-16 13:29         ` Mathias Nyman
  2 siblings, 1 reply; 11+ messages in thread
From: Bernhard Gebetsberger @ 2019-12-05 23:19 UTC (permalink / raw)
  To: Prashant Malani, Mathias Nyman; +Cc: linux-usb, Hayes Wang, Grant Grundler

Hi,

The issue I have hasn't been resolved. I'm currently running a custom kernel with
commit f8f80be501aa2f10669585c3e328fad079d8cb3a reverted, which works fine for me. I'm not
sure if the issue you have is related to mine, because I don't have any issues with 4.19,
and I'm also using a different driver (rt2800usb).

- Bernhard

On 05.12.19 at 21:34, Prashant Malani wrote:
> Hi Mathias and Bernhard,
>
> I was interested in knowing if this issue was resolved (sounded like
> this was deemed to be a hardware error, but I'm not sure).
> The reason I ask is that we've recently noticed a similar error
> popping up while using Realtek rtl8153a-based ethernet USB dongles
> (these use the r8152 driver) on kernel 4.19:
> "xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to
> incorrect slot or ep state."
> This is generally followed by the dongle getting reset, and the
> process repeats itself continuously.
>
> I can share more detailed logs if required. The specific dongle I used
> was LinkSys USB3GIGV1 (I think the official link is :
> https://www.linksys.com/us/support-product?pid=01t80000003fwbWAAQ)
>
> Some interesting data points:
> - This issue doesn't manifest itself on kernel 4.4 or 4.14 but does
> show up on 4.19
> - This issue didn't manifest itself on 4.19 either before recent
> changes were incorporated to patch the dongle firmware (commit
> 9370f2d05a2a150b0aa719a3070b26d478180df3 on the linux mainline
> branch). After the firmware patching changes went in, 4.19 started
> exhibiting this issue (4.4 and 4.14 still don't exhibit it).
>
> Thanks and Best regards!
>
>
> On Mon, Oct 14, 2019 at 6:01 AM Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
>> On 3.10.2019 18.13, Bernhard Gebetsberger wrote:
>>> I sent the instructions to one of the users in the bug tracker.
>>> Here is the download link for his logs: https://www.sendspace.com/file/413hlj
>>>
>> Thanks.
>> Traces show the driver handles the Transaction error normally by issuing an endpoint reset,
>> which is successful, but after that there is no activity on that endpoint even though there
>> are over 120 transfer requests (TRBs) pending.
>> After more than 40 seconds the class driver starts canceling the pending transfers.
>>
>> After the soft retry the xhci driver should ring the doorbell of the endpoint, and hardware
>> should start processing pending TRBs, but the ring is not handling pending TRBs.
>> It looks like either the driver or the hardware fails to start the endpoint ring again.
>>
>> I'll add some more tracing to check that the driver correctly rings the endpoint doorbell.
>>
>>
>> Details of trace:
>>
>> -Several TRBs (over 120) queued for slot 4 ep 3 (ep1in-bulk), starting at 0xff2d1000, up to 0xff2d1800 (0x10 per TRB)
>>
>>    164.884097: xhci_urb_enqueue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
>>    164.884099: xhci_queue_trb: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
>>    164.884101: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1010(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)
>>    ...
>>    164.884304: xhci_urb_enqueue: ep1in-bulk: urb 00000000fee4e260 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
>>    164.884304: xhci_queue_trb: BULK: Buffer 00000000ff3a304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
>>    164.884304: xhci_inc_enq: BULK 00000000be510b60: enq 0x00000000ff2d1800(0x00000000ff2d1000) deq 0x00000000ff2d1000(0x00000000ff2d1000)
>>
>> -Transaction error 3 seconds later for TRB at 0xff2d1000
>>
>>    167.578273: xhci_handle_event: EVENT: TRB 00000000ff2d1000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:c
>>    167.578288: xhci_handle_transfer: BULK: Buffer 00000000f9e2304c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C
>>
>> -Soft retry by issuing a host side reset endpoint command,
>>
>>    167.578297: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
>>    167.578416: xhci_handle_event: EVENT: TRB 00000000ffefe440 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
>>
>> -Host side of endpoint reset successful, endpoint is in stopped state as it should be
>>
>>    167.578417: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
>>    167.578419: xhci_handle_cmd_reset_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst
>>
>> -Driver should ring the endpoint doorbell, and hardware should continue processing TRBs.
>> No activity at all on slot 4 ep 3; other endpoints continue running normally.
>> Need to check that the driver really rang the ep doorbell.
>>
>> Much later, the class driver asks to cancel the pending transfer:
>>
>>    214.132531: xhci_urb_dequeue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
>>    214.132548: xhci_dbg_cancel_urb: Cancel URB 000000005ebe7973, dev 2, ep 0x81, starting at offset 0xff2d1000
>>
>> -xhci driver tries to stop endpoint to cancel transfer:
>>
>>    214.132555: xhci_queue_trb: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
>>
>> -but it fails as slot is not in a proper state to be stopped, ep is in halted state after failed stop attempt.
>>
>>    214.132679: xhci_handle_event: EVENT: TRB 00000000ffefe450 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:C
>>    214.132680: xhci_handle_command: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
>>    214.132682: xhci_handle_cmd_stop_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512
>>
>> -After this the endpoint stays in halted state; the xhci driver fails to recover from this while canceling the rest of the TRBs
>>
>> -Mathias
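
The check described in the quoted trace walk-through (a Reset Endpoint command is handled, then nothing happens on that endpoint) could be automated roughly as follows. This is only a minimal sketch, assuming the trace text format shown in this thread; the function name find_silent_endpoints and the heuristic itself are illustrative assumptions, not part of the xhci driver or of any existing tool.

```python
import re

# "<timestamp>: <trace event>: <details>", as in the trace lines quoted above.
LINE_RE = re.compile(r"(\d+\.\d+): (\w+): (.*)")
SLOT_EP_RE = re.compile(r"slot (\d+) ep (\d+)")


def find_silent_endpoints(lines):
    """Return {(slot, ep): (reset_time, last_time)} for every endpoint that
    shows no Transfer Event after a handled Reset Endpoint command.
    last_time is the timestamp of the last trace line seen."""
    pending = {}
    last_time = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, event, rest = float(m.group(1)), m.group(2), m.group(3)
        last_time = ts
        se = SLOT_EP_RE.search(rest)
        if not se:
            continue
        key = (int(se.group(1)), int(se.group(2)))
        if event == "xhci_handle_command" and "Reset Endpoint Command" in rest:
            pending[key] = ts          # reset handled, now expect activity
        elif event == "xhci_handle_event" and "'Transfer Event'" in rest:
            pending.pop(key, None)     # endpoint is alive again
    return {key: (ts, last_time) for key, ts in pending.items()}


# Lines copied from the trace quoted in this thread.
SAMPLE_TRACE = """\
167.578273: xhci_handle_event: EVENT: TRB 00000000ff2d1000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:c
167.578297: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
167.578416: xhci_handle_event: EVENT: TRB 00000000ffefe440 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
167.578417: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
214.132531: xhci_urb_dequeue: ep1in-bulk: urb 000000005ebe7973 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
""".splitlines()

for (slot, ep), (reset_time, last_time) in find_silent_endpoints(SAMPLE_TRACE).items():
    gap = last_time - reset_time
    print(f"slot {slot} ep {ep}: reset handled at {reset_time}, no transfer event for {gap:.1f} s")
```

On the sample lines this flags slot 4 ep 3, the endpoint discussed above. It cannot tell whether the doorbell was rung; it only shows that no transfer event followed the reset.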


* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-12-05 23:19         ` Bernhard Gebetsberger
@ 2019-12-05 23:23           ` Prashant Malani
  0 siblings, 0 replies; 11+ messages in thread
From: Prashant Malani @ 2019-12-05 23:23 UTC (permalink / raw)
  To: Bernhard Gebetsberger
  Cc: Mathias Nyman, linux-usb, Hayes Wang, Grant Grundler

Hi Bernhard,

Thanks for the response. I just checked, and it looks like we backported
that patch onto our 4.19 kernel, which would explain why you aren't
seeing the issue on 4.19.

On Thu, Dec 5, 2019 at 3:19 PM Bernhard Gebetsberger
<bernhard.gebetsberger@gmx.at> wrote:
>
> Hi,
>
> The issue I have hasn't been resolved. I'm currently running a custom kernel with
> commit f8f80be501aa2f10669585c3e328fad079d8cb3a reverted, which works fine for me. I'm not
> sure if the issue you have is related to mine, because I don't have any issues with 4.19,
> and I'm also using a different driver (rt2800usb).
>
> - Bernhard


* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-12-05 20:34       ` Prashant Malani
  2019-12-05 22:49         ` Prashant Malani
  2019-12-05 23:19         ` Bernhard Gebetsberger
@ 2019-12-16 13:29         ` Mathias Nyman
  2020-01-04  0:58           ` Prashant Malani
  2 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-12-16 13:29 UTC (permalink / raw)
  To: Prashant Malani
  Cc: Bernhard Gebetsberger, linux-usb, Hayes Wang, Grant Grundler

Hi Prashant

On 5.12.2019 22.34, Prashant Malani wrote:
> Hi Mathias and Bernhard,
> 
> I was interested in knowing if this issue was resolved (sounded like
> this was deemed to be a hardware error, but I'm not sure).
> The reason I ask is that we've recently noticed a similar error
> popping up while using Realtek rtl8153a-based ethernet USB dongles
> (these use the r8152 driver) on kernel 4.19:
> "xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to
> incorrect slot or ep state."
> This is generally followed by the dongle getting reset, and the
> process repeats itself continuously.

Sorry about the delay. Your traces show a transaction error and the port link
going to the inactive error state.

The xhci driver tries to recover from the transaction error with a soft retry
(endpoint reset), while the hub thread will need to reset the whole device to recover
from the inactive link error state.

Can you try reverting commit:
"f8f80be501aa xhci: Use soft retry to recover faster from transaction errors"

If you still see "Transfer error for slot x ep y on endpoint" in dmesg,
but device is not reset and works normally, then it's possible that the soft retry
makes things worse.

If not, then the transaction error and the link inactive error are most likely symptoms
of some other cause.

The "xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
message is, as in Bernhard's case, due to the xhci driver trying to issue a command for a slot in
context error state. This part needs to be fixed in the driver, but should not matter much: the
device must be reset anyway to recover from the link inactive error state.

-Mathias
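
The revert test proposed above boils down to a two-input decision, which can be written out as a small sketch. The function name, the result strings and the example dmesg line are hypothetical; only the "Transfer error for slot x ep y on endpoint" message template comes from this thread, and whether the device was reset is left as an input to be judged by hand.

```python
import re

# Message template as quoted in this thread; slot and ep numbers vary.
TRANSFER_ERR_RE = re.compile(r"Transfer error for slot \d+ ep \d+ on endpoint")

SOFT_RETRY_SUSPECT = "soft retry possibly makes things worse"
OTHER_CAUSE = "errors are most likely symptoms of some other cause"


def interpret_revert_test(dmesg_text: str, device_was_reset: bool) -> str:
    """Classify a test run made with commit f8f80be501aa reverted."""
    saw_transfer_error = TRANSFER_ERR_RE.search(dmesg_text) is not None
    if saw_transfer_error and not device_was_reset:
        # Error still occurs, yet the device keeps working without soft retry.
        return SOFT_RETRY_SUSPECT
    return OTHER_CAUSE


# Hypothetical dmesg excerpt, built from the message template quoted above.
EXAMPLE_DMESG = "xhci_hcd 0000:00:14.0: Transfer error for slot 2 ep 3 on endpoint\n"

print(interpret_revert_test(EXAMPLE_DMESG, device_was_reset=False))
print(interpret_revert_test("", device_was_reset=True))
```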



* Re: Regression: USB/xhci issues on some systems with newer kernel versions
  2019-12-16 13:29         ` Mathias Nyman
@ 2020-01-04  0:58           ` Prashant Malani
  0 siblings, 0 replies; 11+ messages in thread
From: Prashant Malani @ 2020-01-04  0:58 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Bernhard Gebetsberger, linux-usb, Hayes Wang, Grant Grundler

Hi Mathias

On Mon, Dec 16, 2019 at 5:27 AM Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
>
> Hi Prashant
>
> On 5.12.2019 22.34, Prashant Malani wrote:
> > Hi Mathias and Bernhard,
> >
> > I was interested in knowing if this issue was resolved (sounded like
> > this was deemed to be a hardware error, but I'm not sure).
> > The reason I ask is that we've recently noticed a similar error
> > popping up while using Realtek rtl8153a-based ethernet USB dongles
> > (these use the r8152 driver) on kernel 4.19:
> > "xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to
> > incorrect slot or ep state."
> > This is generally followed by the dongle getting reset, and the
> > process repeats itself continuously.
>
> Sorry about the delay. Your traces show a transaction error and the port link
> going to the inactive error state.
>
> The xhci driver tries to recover from the transaction error with a soft retry
> (endpoint reset), while the hub thread will need to reset the whole device to recover
> from the inactive link error state.
>
> Can you try reverting commit:
> "f8f80be501aa xhci: Use soft retry to recover faster from transaction errors"
>
> If you still see "Transfer error for slot x ep y on endpoint" in dmesg,
> but device is not reset and works normally, then it's possible that the soft retry
> makes things worse.

Thanks for your analysis, and sorry for the delayed response. I
reverted the aforementioned commit. While the transfer error no longer
appears, I still see the repeated resets, so there is likely an issue
either in the host controller or in the device firmware itself.
I'll continue digging, but it seems safe to rule out soft retry as a culprit.

Best regards,
>
> If not, then the transaction error and the link inactive error are most likely symptoms
> of some other cause.
>
> The "xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
> message is, as in Bernhard's case, due to the xhci driver trying to issue a command for a slot in
> context error state. This part needs to be fixed in the driver, but should not matter much: the
> device must be reset anyway to recover from the link inactive error state.
>
> -Mathias
>



Thread overview: 11+ messages
2019-10-02 12:28 Regression: USB/xhci issues on some systems with newer kernel versions Bernhard Gebetsberger
2019-10-03 10:23 ` Mathias Nyman
2019-10-03 15:13   ` Bernhard Gebetsberger
2019-10-11  1:55     ` Bernhard Gebetsberger
2019-10-14 13:03     ` Mathias Nyman
2019-12-05 20:34       ` Prashant Malani
2019-12-05 22:49         ` Prashant Malani
2019-12-05 23:19         ` Bernhard Gebetsberger
2019-12-05 23:23           ` Prashant Malani
2019-12-16 13:29         ` Mathias Nyman
2020-01-04  0:58           ` Prashant Malani
