linux-usb.vger.kernel.org archive mirror
* BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
@ 2019-09-22  8:34 alex zheng
  2019-09-23  5:36 ` Felipe Balbi
  0 siblings, 1 reply; 14+ messages in thread
From: alex zheng @ 2019-09-22  8:34 UTC
  To: linux-usb

Hi all,

I am a user of the dwc3 USB host controller, and I found some
confusing TRB event behavior on this controller.
When I run a raw USB data transfer (bulk IN and OUT transfers with
libusb) and iperf3 (over RNDIS) at the same time,
some strange interrupts occur and make the driver report an
error (ERROR DMA transfer).
And:
1. This problem only happens in USB SS mode.
2. This problem does not seem to happen when I run the same test case with
other xHCI controllers (such as ASMedia/Intel PCIe xHCI controllers) on a PC.
3. The kernel version is 4.9.130.

I think this may be a hardware bug in the DWC3 USB controller. Could anyone
please give me some help debugging this problem?

The detailed log is below:
[  131.074102] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: xHCI handle event, 8000
[  131.074109] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: process trans event : ep_index = 11, event_dma = 1eb13e90
[  131.074117] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: inc_deq, start trb dma = 1eb13e90, dequeue_p = e482ce90, trb_free num = 1871, ring type = 2
[  131.074123] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: inc_deq 111, start trb dma = 1eb13ea0, dequeue_p = e482cea0, trb_free num = 1872, ring type = 2
[  131.074130] c0 3 (ksoftirqd/0) xhci-hcd xh
[  133.057617] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.059312] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb0fff0 trb-start 000000001eb10000 trb-end 000000001eb10000 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.066215] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.067908] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb10000 trb-start 000000001eb10230 trb-end 000000001eb10230 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.070572] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.072260] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb10010 trb-start 000000001eb10230 trb-end 000000001eb10230 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.075052] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.076739] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb10020 trb-start 000000001eb10230 trb-end 000000001eb10230 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.079472] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.081159] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb10030 trb-start 000000001eb10230 trb-end 000000001eb10230 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.083896] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
[  133.085584] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001eb10040 trb-start 000000001eb10230 trb-end 000000001eb10230 seg-start 000000001eb10000 seg-end 000000001eb10ff0
[  133.088328] c0 3 (ksoftirqd/0) xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1

1. According to the logs above, the link TRB at address 0x1eb0fff0
generated a transfer event, but this DMA address is not in the TRB ring,
so the driver reports an error (followed by a few more error logs with
invalid DMA addresses).
2. I dumped the data at that address (0x1eb0fff0) and found that the IOC bit is
not set; see below:
# dump_reg.sh 0x1eb0fff0 4
0x1eb0fff0:0x1EB10000 0x00000000 0x00000000 0x00001800


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-22  8:34 BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted alex zheng
@ 2019-09-23  5:36 ` Felipe Balbi
  2019-09-23  7:08   ` alex zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2019-09-23  5:36 UTC
  To: alex zheng, linux-usb, Mathias Nyman


Hi,

(it helps when you Cc the correct maintainers ;-)

alex zheng <tc0721@gmail.com> writes:

> Hi all,
>
> I am a user of the dwc3 USB host controller, and I found some
> confusing TRB event behavior on this controller.
> When I run a raw USB data transfer (bulk IN and OUT transfers with
> libusb) and iperf3 (over RNDIS) at the same time,
> some strange interrupts occur and make the driver report an
> error (ERROR DMA transfer).
> And:

So dwc3 is working in host mode. Which platform is this?

> 1. This problem only happens in USB SS mode.
> 2. This problem does not seem to happen when I run the same test case with
> other xHCI controllers (such as ASMedia/Intel PCIe xHCI controllers) on a PC.
> 3. The kernel version is 4.9.130.

Have you tried a more recent kernel? 4.9 is really ancient. Please try
v5.3.

-- 
balbi


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-23  5:36 ` Felipe Balbi
@ 2019-09-23  7:08   ` alex zheng
  2019-09-23 10:15     ` Mathias Nyman
  2019-09-23 10:45     ` Felipe Balbi
  0 siblings, 2 replies; 14+ messages in thread
From: alex zheng @ 2019-09-23  7:08 UTC
  To: Felipe Balbi; +Cc: linux-usb, Mathias Nyman, xiaowei.zheng

Hi, balbi,

Thank you for your reply~

On Mon, Sep 23, 2019 at 1:36 PM Felipe Balbi <felipe.balbi@linux.intel.com> wrote:
>
>
> Hi,
>
> (it helps when you Cc the correct maintainers ;-)
>
> alex zheng <tc0721@gmail.com> writes:
>
> > Hi all,
> >
> > I am a user of the dwc3 USB host controller, and I found some
> > confusing TRB event behavior on this controller.
> > When I run a raw USB data transfer (bulk IN and OUT transfers with
> > libusb) and iperf3 (over RNDIS) at the same time,
> > some strange interrupts occur and make the driver report an
> > error (ERROR DMA transfer).
> > And:
>
> So dwc3 is working in host mode. Which platform is this?

This is our own in-house platform (an ARMv7 CPU core with a Synopsys DWC
USB 3.0 controller).
version info: Linux localhost 4.9.130-645692-g6ecde01-dirty #394 SMP
PREEMPT Sun Sep 22 15:10:51 CST 2019 armv7l

>
> > 1. This problem only happens in USB SS mode.
> > 2. This problem does not seem to happen when I run the same test case with
> > other xHCI controllers (such as ASMedia/Intel PCIe xHCI controllers) on a PC.
> > 3. The kernel version is 4.9.130.
>
> Have you tried a more recent kernel? 4.9 is really ancient. Please try
> v5.3.

Our platform only supports the 4.9 kernel at the moment, and supporting
a recent kernel may take a lot of work.
Is there anything that could cause a link TRB to trigger an interrupt when
the IOC bit is not set?


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-23  7:08   ` alex zheng
@ 2019-09-23 10:15     ` Mathias Nyman
       [not found]       ` <CADGPSwi87a5+3mCGAgptHgpBsQk9STQrEKs-kC6Nw55nPdRtOw@mail.gmail.com>
  2019-09-23 10:45     ` Felipe Balbi
  1 sibling, 1 reply; 14+ messages in thread
From: Mathias Nyman @ 2019-09-23 10:15 UTC
  To: alex zheng, Felipe Balbi; +Cc: linux-usb, xiaowei.zheng

On 23.9.2019 10.08, alex zheng wrote:
>>> 1. According to the logs above, the link TRB at address 0x1eb0fff0
>>> generated a transfer event, but this DMA address is not in the TRB ring,
>>> so the driver reports an error (followed by a few more error logs with
>>> invalid DMA addresses).

To me it looks like the controller creates an extra successful transfer event
for the Link TRB.

The link TRB DMA address that the event points to is part of the ring, but not
part of the next Transfer Descriptor (TD) the xhci driver expects to handle.
The link TRB is the last TRB of the previous ring segment; the TD the xhci driver
expects is on the next segment.

>>> 2. I dumped the data at that address (0x1eb0fff0) and found that the IOC bit is
>>> not set; see below:
>>> # dump_reg.sh 0x1eb0fff0 4
>>> 0x1eb0fff0:0x1EB10000 0x00000000 0x00000000 0x00001800

The link TRB looks fine. The TRB type is Link, and its next segment pointer is 0x1eb10000,
which is also where the driver was expecting the next TD to be found.
No other bits are set.
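
A minimal sketch in C of decoding that dump by hand, assuming the usual xHCI
TRB layout (cycle bit 0, toggle cycle bit 1, chain bit 4, IOC bit 5, TRB type
in bits 15:10 of the fourth dword, Link being type 6). The struct and helper
names are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the fourth (control) dword of an xHCI TRB. */
#define TRB_CYCLE_BIT   (1u << 0)
#define TRB_TOGGLE_BIT  (1u << 1)   /* Toggle Cycle, Link TRBs only */
#define TRB_CHAIN_BIT   (1u << 4)
#define TRB_IOC_BIT     (1u << 5)   /* Interrupt On Completion */
#define TRB_TYPE_SHIFT  10
#define TRB_TYPE_MASK   0x3fu
#define TRB_TYPE_LINK   6u

/* Hypothetical container for one 16-byte TRB, in the order the dump prints it. */
struct trb_dump {
    uint32_t dw[4];
};

/* Extract the 6-bit TRB type field from the control dword. */
static inline uint32_t trb_type(const struct trb_dump *t)
{
    return (t->dw[3] >> TRB_TYPE_SHIFT) & TRB_TYPE_MASK;
}

/* True if the Interrupt On Completion bit is set. */
static inline bool trb_ioc(const struct trb_dump *t)
{
    return (t->dw[3] & TRB_IOC_BIT) != 0;
}

/* Segment pointer of a Link TRB: dwords 0 and 1, low 4 bits reserved. */
static inline uint64_t link_target(const struct trb_dump *t)
{
    uint64_t ptr = ((uint64_t)t->dw[1] << 32) | t->dw[0];
    return ptr & ~(uint64_t)0xf;
}
```

Feeding it the dumped values 0x1EB10000 0x00000000 0x00000000 0x00001800 gives
TRB type 6 (Link), IOC clear, and a link target of 0x1eb10000, which matches
the reading above.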

Does everything work normally if you just ignore that error?
Can be done with a hack like this (untested):

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 40fa25c4d041..d5f4c416d0ef 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2531,7 +2531,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
                                 trb_in_td(xhci, ep_ring->deq_seg,
                                           ep_ring->dequeue, td->last_trb,
                                           ep_trb_dma, true);
-                               return -ESHUTDOWN;
+                               xhci_err(xhci, "Ignoring error\n");
+                               goto cleanup;
                         }
  
                         skip_isoc_td(xhci, td, event, ep, &status);

-Mathias


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-23  7:08   ` alex zheng
  2019-09-23 10:15     ` Mathias Nyman
@ 2019-09-23 10:45     ` Felipe Balbi
  2019-09-24 14:19       ` alex zheng
  1 sibling, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2019-09-23 10:45 UTC
  To: alex zheng; +Cc: linux-usb, Mathias Nyman, xiaowei.zheng


Hi Alex,

alex zheng <tc0721@gmail.com> writes:
>> > I am a user of the dwc3 USB host controller, and I found some
>> > confusing TRB event behavior on this controller.
>> > When I run a raw USB data transfer (bulk IN and OUT transfers with
>> > libusb) and iperf3 (over RNDIS) at the same time,
>> > some strange interrupts occur and make the driver report an
>> > error (ERROR DMA transfer).
>> > And:
>>
>> So dwc3 is working in host mode. Which platform is this?
>
> This is our own in-house platform (an ARMv7 CPU core with a Synopsys DWC
> USB 3.0 controller).
> version info: Linux localhost 4.9.130-645692-g6ecde01-dirty #394 SMP
> PREEMPT Sun Sep 22 15:10:51 CST 2019 armv7l

This is a brand new design and you're waking it up on v4.9? Could've
tracked upstream more closely, IMHO.

>> > 1. This problem only happens in USB SS mode.
>> > 2. This problem does not seem to happen when I run the same test case with
>> > other xHCI controllers (such as ASMedia/Intel PCIe xHCI controllers) on a PC.
>> > 3. The kernel version is 4.9.130.
>>
>> Have you tried a more recent kernel? 4.9 is really ancient. Please try
>> v5.3.
>
> Our platform only supports the 4.9 kernel at the moment, and supporting
> a recent kernel may take a lot of work.

In that case, I'm afraid you're on your own. Have a look at known
synopsys errata.

On a side note, getting a Cortex-A7 to boot with an upstream kernel should
be only about adding a DeviceTree nowadays. Remember that for Linux to
boot, all you need is a system timer and a UART. If you're using ARM IP
for interrupts, timers, etc., it should be really straightforward to
boot on v5.3.

> Is there anything that could cause a link TRB to trigger an interrupt when
> the IOC bit is not set?

No idea, perhaps you should have a deeper look at both Synopsys databook
and xHCI specification.

In any case, v4.9 is really old.

Good luck

-- 
balbi


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-23 10:45     ` Felipe Balbi
@ 2019-09-24 14:19       ` alex zheng
  0 siblings, 0 replies; 14+ messages in thread
From: alex zheng @ 2019-09-24 14:19 UTC
  To: Felipe Balbi; +Cc: linux-usb, Mathias Nyman, xiaowei.zheng

Hi Balbi,

This SoC was released last year, and it was brought up on kernel v4.9.

After several days of debugging, I think this looks more like a hardware-related issue.
We will check the Synopsys databook again for anything that may help,
and may try these test cases on a more recent kernel later.

Thank you for your advice~


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
       [not found]       ` <CADGPSwi87a5+3mCGAgptHgpBsQk9STQrEKs-kC6Nw55nPdRtOw@mail.gmail.com>
@ 2019-09-25 14:48         ` Mathias Nyman
  2019-09-25 16:22           ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Mathias Nyman @ 2019-09-25 14:48 UTC
  To: alex zheng; +Cc: Felipe Balbi, linux-usb, xiaowei.zheng

On 24.9.2019 17.45, alex zheng wrote:
> Hi Mathias,
> 
> I tried ignoring the DMA errors. The transfer then continues, but it
> completes with data loss, so it seems these ERROR Transfer events
> are genuine and must not be ignored.
> 
> The test app shows:
> "did not get enough data, received size:14410176/15000000"
> 
> The kernel log shows: (you can see more detailed info in the attached log files)

Logs show your transfer ring has four segments, but the hardware fails to
jump from the last segment back to the first.

The last TRB (Link TRB) of each segment points to the next segment;
the last segment's Link TRB points back to the first segment.

In your case:
0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)

For some reason your hardware doesn't treat the last TRB of the last segment
as a Link TRB; instead it just issues a transfer event for it and continues to
the next address instead of jumping back to the first segment:

Transfer event for the last TRB of the last segment, 0x1dbda000 (TRB: 0x1dbdaff0).
This is a Link TRB and should not generate a transfer event:

xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001dbdaff0 trb-start 000000001d117000 trb-end 000000001d117000 seg-start 000000001d117000 seg-end 000000001d10
xhci-hcd xhci-hcd.0.auto: Ignoring error

The next transfer event should be for a TRB in the first segment (0x1d117000),
but the event shows it's trying to handle an event from the TRB at 000000001dbdb000, which isn't even part of the ring.

xhci-hcd xhci-hcd.0.auto: process trans event : ep_index = 16, event_dma = 1dbdb000
xhci-hcd xhci-hcd.0.auto: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 16 comp_code 1
xhci-hcd xhci-hcd.0.auto: Looking for event-dma 000000001dbdb000 trb-start 000000001d117000 trb-end 000000001d117000 seg-start 000000001d117000 seg-end 000000001d10
xhci-hcd xhci-hcd.0.auto: Ignoring error
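
The ring walk described above can be sketched in C as follows. This is only an
illustration: it assumes 4 KiB segments of 256 16-byte TRBs with the Link TRB
in the last slot (offset 0xff0), which is what the seg-start/seg-end values in
the first report's logs suggest, and the array and function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TRB_SIZE          16u
#define TRBS_PER_SEGMENT  256u  /* assumption: 4 KiB segments */
#define SEGMENT_BYTES     (TRB_SIZE * TRBS_PER_SEGMENT)
#define LINK_TRB_OFFSET   (SEGMENT_BYTES - TRB_SIZE)  /* 0xff0 */

/* Segment base addresses in ring order, copied from the analysis above. */
static const uint64_t ring_segments[] = {
    0x1d117000, 0x1eb09000, 0x1eb0a000, 0x1dbda000,
};
#define NUM_SEGMENTS (sizeof(ring_segments) / sizeof(ring_segments[0]))

/* Return the index of the segment that contains dma, or -1 if none does. */
static int find_segment(uint64_t dma)
{
    for (size_t i = 0; i < NUM_SEGMENTS; i++) {
        if (dma >= ring_segments[i] &&
            dma < ring_segments[i] + SEGMENT_BYTES)
            return (int)i;
    }
    return -1;
}

/*
 * Address a controller that honours Link TRBs should fetch after dma.
 * Returns 0 if dma is not on the ring at all.
 */
static uint64_t next_trb_expected(uint64_t dma)
{
    int seg = find_segment(dma);

    if (seg < 0)
        return 0;
    if (dma - ring_segments[seg] == LINK_TRB_OFFSET)
        return ring_segments[((size_t)seg + 1) % NUM_SEGMENTS];
    return dma + TRB_SIZE;
}
```

With these addresses, 0x1dbdaff0 falls in the link slot of the last segment, so
the expected next fetch is 0x1d117000, whereas 0x1dbdb000 (simply
0x1dbdaff0 + 16) lies outside every segment.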

-Mathias


* RE: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-25 14:48         ` Mathias Nyman
@ 2019-09-25 16:22           ` David Laight
  2019-09-26  5:45             ` Felipe Balbi
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2019-09-25 16:22 UTC
  To: 'Mathias Nyman', alex zheng
  Cc: Felipe Balbi, linux-usb, xiaowei.zheng

From: Mathias Nyman
> Sent: 25 September 2019 15:48
> 
> On 24.9.2019 17.45, alex zheng wrote:
> > Hi Mathias,
...
> Logs show your transfer ring has four segments, but the hardware fails to
> jump from the last segment back to the first.
> 
> The last TRB (Link TRB) of each segment points to the next segment;
> the last segment's Link TRB points back to the first segment.
> 
> In your case:
> 0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)
> 
> For some reason your hardware doesn't treat the last TRB of the last segment
> as a Link TRB; instead it just issues a transfer event for it and continues to
> the next address instead of jumping back to the first segment:

That could be a cache coherency (or flushing (etc)) issue.

>> This is our own in-house platform (an ARMv7 CPU core with a Synopsys DWC USB 3.0 controller).
Or maybe your hardware is just getting some of the memory accesses wrong?

	David


* RE: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-25 16:22           ` David Laight
@ 2019-09-26  5:45             ` Felipe Balbi
  2019-09-26  8:21               ` Mathias Nyman
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2019-09-26  5:45 UTC
  To: David Laight, 'Mathias Nyman', alex zheng
  Cc: linux-usb, xiaowei.zheng


Hi,

David Laight <David.Laight@ACULAB.COM> writes:
> From: Mathias Nyman
>> Sent: 25 September 2019 15:48
>> 
>> On 24.9.2019 17.45, alex zheng wrote:
>> > Hi Mathias,
> ...
>> Logs show your transfer ring has four segments, but the hardware fails to
>> jump from the last segment back to the first.
>> 
>> The last TRB (Link TRB) of each segment points to the next segment;
>> the last segment's Link TRB points back to the first segment.
>> 
>> In your case:
>> 0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)
>> 
>> For some reason your hardware doesn't treat the last TRB of the last segment
>> as a Link TRB; instead it just issues a transfer event for it and continues to
>> the next address instead of jumping back to the first segment:
>
> That could be a cache coherency (or flushing (etc)) issue.

XHCI has a HW-configurable maximum number of segments in a ring. AFAICT,
xhci driver doesn't take that into consideration today. Perhaps the HW
in question doesn't like more than 3 segments.

Mathias, what was the register to check this? Do you remember?

-- 
balbi


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-26  5:45             ` Felipe Balbi
@ 2019-09-26  8:21               ` Mathias Nyman
  2019-09-26 10:38                 ` alex zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Mathias Nyman @ 2019-09-26  8:21 UTC (permalink / raw)
  To: Felipe Balbi, David Laight, alex zheng; +Cc: linux-usb, xiaowei.zheng

On 26.9.2019 8.45, Felipe Balbi wrote:
> 
> Hi,
> 
> David Laight <David.Laight@ACULAB.COM> writes:
>> From: Mathias Nyman
>>> Sent: 25 September 2019 15:48
>>>
>>> On 24.9.2019 17.45, alex zheng wrote:
>>>> Hi Mathias,
>> ...
>>> Logs show your transfer ring has four segments, but hardware fails to
>>> jump from the last segment back to first)
>>>
>>> Last TRB (LINK TRB) of each segment points to the next segment,
>>> last segments link trb points back to first segment.
>>>
>>> In your case:
>>> 0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)
>>>
>>> For some reason your hardware doesn't treat the last TRB at the last segment
>>> as a LINK TRB, instead it just issues a transfer event for it, and continues to
>>> the next address instead of jumping back to first segment:
>>
>> That could be a cache coherency (or flushing (etc)) issue.

The Link TRB is written very early, right after the ring segment is allocated,
and before any other TRBs. 255 other TRBs were written and handled by hw
on this segment after this, so not very likely a flushing/cache coherency issue.
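
For reference, the ring layout discussed here can be modelled in a few lines of C. This is only an illustrative sketch, not the xhci driver's code: the struct and function names are made up for the example, plain pointers stand in for DMA addresses, and the bit positions (cycle bit 0, Toggle Cycle bit 1, TRB type in bits 15:10, Link type ID 6) follow the xHCI TRB layout. Each segment holds 255 transfer TRBs plus one Link TRB, and only the last segment's Link TRB carries Toggle Cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRBS_PER_SEGMENT 256          /* 255 transfer TRBs + 1 Link TRB */
#define TRB_TYPE_SHIFT   10
#define TRB_TYPE_MASK    (0x3fu << TRB_TYPE_SHIFT)
#define TRB_TYPE_LINK    6u           /* xHCI TRB type ID for a Link TRB */
#define TRB_LINK_TOGGLE  (1u << 1)    /* Toggle Cycle, meaningful on Link TRBs */

struct trb { uint64_t ptr; uint32_t status; uint32_t control; };
struct segment { struct trb trbs[TRBS_PER_SEGMENT]; };

/* Chain nsegs segments into a ring; only the last Link TRB toggles the cycle. */
static void ring_link(struct segment *segs, size_t nsegs)
{
    assert(nsegs > 0);
    for (size_t i = 0; i < nsegs; i++) {
        struct trb *link = &segs[i].trbs[TRBS_PER_SEGMENT - 1];

        /* Simplification: a pointer value stands in for the DMA address. */
        link->ptr = (uint64_t)(uintptr_t)&segs[(i + 1) % nsegs];
        link->status = 0;
        link->control = TRB_TYPE_LINK << TRB_TYPE_SHIFT;
        if (i == nsegs - 1)
            link->control |= TRB_LINK_TOGGLE;
    }
}

static int trb_is_link(const struct trb *t)
{
    return ((t->control & TRB_TYPE_MASK) >> TRB_TYPE_SHIFT) == TRB_TYPE_LINK;
}

/* Advance the way a well-behaved controller should: follow a Link TRB to
 * its target instead of stepping to the next address. */
static struct trb *ring_next(struct trb *cur, int *cycle)
{
    if (trb_is_link(cur)) {
        if (cur->control & TRB_LINK_TOGGLE)
            *cycle ^= 1;
        return (struct trb *)(uintptr_t)cur->ptr;
    }
    return cur + 1;
}
```

The failure reported in this thread corresponds to the hardware taking the `cur + 1` path on a TRB for which `trb_is_link()` should have been true.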

> 
> XHCI has a HW-configurable maximum number of segments in a ring. AFAICT,
> xhci driver doesn't take that into consideration today. Perhaps the HW
> in question doesn't like more than 3 segments.
> 
> Mathias, what was the register to check this? Do you remember?
> 

I only recall a limit for the event ring in the HCSPARAMS2 register (ERST Max),
not for transfer rings.
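
As a hedged illustration of that event-ring limit: in the xHCI capability registers, ERST Max occupies bits 7:4 of HCSPARAMS2 and is an exponent, so the controller supports at most 2^(ERST Max) event ring segment table entries. The helper names below are invented for this sketch; reading the register itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* ERST Max lives in bits 7:4 of HCSPARAMS2. */
static unsigned int hcs_erst_max(uint32_t hcsparams2)
{
    return (hcsparams2 >> 4) & 0xfu;
}

/* The field is an exponent: up to 2^ERST_Max event ring segment table
 * entries are supported. It says nothing about transfer rings. */
static unsigned int max_erst_entries(uint32_t hcsparams2)
{
    unsigned int exp = hcs_erst_max(hcsparams2);

    assert(exp <= 15);
    return 1u << exp;
}
```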

Other things to look at would be

- check that the Toggle Cycle bit is correct for the last segment's Link TRB (incomplete logs)
- some old xHCI hardware needed the Chain bit set in the Link TRB for some isoc rings
- was the ring recently expanded? Rings usually start with only two segments

Mathias


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-09-26  8:21               ` Mathias Nyman
@ 2019-09-26 10:38                 ` alex zheng
       [not found]                   ` <CADGPSwhCPvdu=KmQP6RHMJnh292UO0uBAt+KyJqqOWY5DWDc3w@mail.gmail.com>
  0 siblings, 1 reply; 14+ messages in thread
From: alex zheng @ 2019-09-26 10:38 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Felipe Balbi, David Laight, linux-usb, xiaowei.zheng

Hi,

Mathias Nyman <mathias.nyman@linux.intel.com> wrote on Thu, Sep 26, 2019 at 4:19 PM:
>
> On 26.9.2019 8.45, Felipe Balbi wrote:
> >
> > Hi,
> >
> > David Laight <David.Laight@ACULAB.COM> writes:
> >> From: Mathias Nyman
> >>> Sent: 25 September 2019 15:48
> >>>
> >>> On 24.9.2019 17.45, alex zheng wrote:
> >>>> Hi Mathias,
> >> ...
> >>> Logs show your transfer ring has four segments, but hardware fails to
> >>> jump from the last segment back to first)
> >>>
> >>> Last TRB (LINK TRB) of each segment points to the next segment,
> >>> last segments link trb points back to first segment.
> >>>
> >>> In your case:
> >>> 0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)
> >>>
> >>> For some reason your hardware doesn't treat the last TRB at the last segment
> >>> as a LINK TRB, instead it just issues a transfer event for it, and continues to
> >>> the next address instead of jumping back to first segment:
> >>
> >> That could be a cache coherency (or flushing (etc)) issue.
>
> The Link TRB is written very early, right after the ring segment is allocated,
> and before any other TRBs. 255 other TRBs were written and handled by hw
> on this segment after this, so not very likely a flushing/cache coherency issue.
>
I added a flush_cache_all() call after every queue_trb(), but it made no
difference, so this does not look like a flushing/cache coherency issue.

The flush was added like this:
     inc_enq(xhci, ring, more_trbs_coming);
   + flush_cache_all();

> >
> > XHCI has a HW-configurable maximum number of segments in a ring. AFAICT,
> > xhci driver doesn't take that into consideration today. Perhaps the HW
> > in question doesn't like more than 3 segments.
> >
> > Mathias, what was the register to check this? Do you remember?
> >
>
> I only recall a limit for the event ring in the HCSPARAMS2 register (ERST Max),
> not for transfer rings.
>
> Other things to look at would be
>
> - check that Toggle Cycle bit is correct for last segments link TRB (incomplete logs)

I captured another error log; the more complete log is in the attached
file (transfer_error_0926.cap). In that log, the Link TRB at which the
error occurred is:
0x1d00dff0: TRB 000000001d068000 status 'Invalid' len 0 slot 0 ep 0 type 'Link' flags e:c
and the last segment's Link TRB is:
0x1eb0aff0: TRB 000000001d00d000 status 'Invalid' len 0 slot 0 ep 0 type 'Link' flags e:C
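
A quick way to sanity-check addresses like these is to split each DMA address into a segment base and a TRB index. The sketch below assumes 16-byte TRBs and 4 KiB-aligned segments of 256 TRBs, which matches the "255 TRBs plus one Link TRB" layout mentioned earlier in the thread; the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TRB_SIZE          16u    /* bytes per TRB */
#define TRBS_PER_SEGMENT  256u   /* assumed: 255 transfer TRBs + 1 Link TRB */
#define SEGMENT_BYTES     (TRB_SIZE * TRBS_PER_SEGMENT)   /* 4096 */

/* Base address of the (assumed 4 KiB-aligned) segment containing dma. */
static uint64_t trb_segment_base(uint64_t dma)
{
    return dma & ~(uint64_t)(SEGMENT_BYTES - 1);
}

/* Index of the TRB within its segment, 0..255. */
static unsigned int trb_index(uint64_t dma)
{
    return (unsigned int)((dma & (SEGMENT_BYTES - 1)) / TRB_SIZE);
}

/* True if dma is the last slot of a segment, where the Link TRB sits. */
static int is_link_slot(uint64_t dma)
{
    assert(dma % TRB_SIZE == 0);   /* TRBs are 16-byte aligned */
    return trb_index(dma) == TRBS_PER_SEGMENT - 1;
}
```

Under those assumptions, both 0x1d00dff0 and 0x1eb0aff0 fall at index 255 of their segments (0x1d00d000 and 0x1eb0a000), which is the slot where a Link TRB is expected, and both link targets (0x1d068000 and 0x1d00d000) are at index 0 of a segment.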

> - some old xHCI hardware needed the Chain bit set in link TRB for some isoc rings
The xHCI version is 1.1:
[    6.888570] c1 46 (kworker/u8:1) xhci-hcd xhci-hcd.0.auto: HCIVERSION: 0x110

> - was ring recently expanded?, usually rings start with only two segments
The extra segments were added after the raw data test had been running for a
while, especially once the RNDIS test (iperf3) started.

Other info:
1. The issue seems to happen only when the raw bulk data test and the
RNDIS test (on a different pair of endpoints) run at the same time, and it
happens more often when TRBs are queued faster.
2. The raw bulk data test is a libusb test that uses ep4 (in) and ep3 (out)
to transfer raw bulk data; I use iperf3 (TCP) to test USB RNDIS.
3. For readability, the attached log file only shows the ep4 (in)
enqueue/dequeue log.
4. More test results:
           1)  one raw bulk data test alone  -->  (always fine)
           2)  raw bulk data test + RNDIS test at the same time
               --> (transfer error within 10 minutes)
           3)  two raw bulk data tests at the same time (on two endpoint
               pairs) --> (transfer error within 10 minutes)
5. I tried modifying DWC3 hardware registers such as the TX/RX FIFO sizes
and GTXTHRCFG/GRXTHRCFG, but that did not help either.
6. Related interface info:
     I:* If#= 0 Alt= 0 #EPs= 1 Cls=e0(wlcon) Sub=01 Prot=03 Driver=rndis_host
     E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
     I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host   -----> used in rndis test
     E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
     E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
     I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=43 Prot=01 Driver=(none)   -----> used in raw bulk test
     E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
     E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
     I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=43 Prot=01 Driver=(none)   -----> used in double raw bulk test
     E:  Ad=06(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
     E:  Ad=88(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms

It seems there is some conflict when multiple endpoints are active at
the same time on our SoC. Is there anything else we can try?

>

> Mathias


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
       [not found]                   ` <CADGPSwhCPvdu=KmQP6RHMJnh292UO0uBAt+KyJqqOWY5DWDc3w@mail.gmail.com>
@ 2019-10-23  9:52                     ` alex zheng
  2019-10-23 10:01                       ` Felipe Balbi
  0 siblings, 1 reply; 14+ messages in thread
From: alex zheng @ 2019-10-23  9:52 UTC (permalink / raw)
  To: Mathias Nyman, Felipe Balbi, David Laight; +Cc: linux-usb, xiaowei.zheng

Hi, all

We found that this is a known issue with the Synopsys DWC3 USB controller:
when SuperSpeed park mode (PARKMODE_SS) is enabled, the controller may hang or
schedule TRBs incorrectly under some heavy-load conditions.

Setting DISABLE_PARKMODE_SS to 1 works around this bug.
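
A minimal sketch of that workaround as a read-modify-write, for readers who need to carry it on an older kernel such as 4.9. The register offset and bit position below are my understanding of the mainline `drivers/usb/dwc3/core.h` definitions (GUCTL1 at 0xc11c, PARKMODE_DISABLE_SS at bit 17) and should be checked against the Synopsys databook for the core in use. The function name and the plain pointer standing in for the mapped register are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match mainline drivers/usb/dwc3/core.h; verify before use. */
#define DWC3_GUCTL1                      0xc11c
#define DWC3_GUCTL1_PARKMODE_DISABLE_SS  (1u << 17)

/* guctl1 points at the mapped GUCTL1 register (hypothetical accessor style;
 * a real driver would go through its own MMIO read/write helpers). */
static void dwc3_disable_ss_parkmode(volatile uint32_t *guctl1)
{
    uint32_t reg;

    assert(guctl1 != NULL);
    reg = *guctl1;                            /* keep the other fields intact */
    reg |= DWC3_GUCTL1_PARKMODE_DISABLE_SS;   /* DISABLE_PARKMODE_SS = 1 */
    *guctl1 = reg;
}
```

As far as I know, later mainline kernels expose the same setting through the `snps,parkmode-disable-ss-quirk` device-tree property, so a backport of that support may be preferable to an ad hoc register write.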

Thank you for your help.

alex zheng <tc0721@gmail.com> wrote on Thu, Sep 26, 2019 at 7:34 PM:
>
> add log file.


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-10-23  9:52                     ` alex zheng
@ 2019-10-23 10:01                       ` Felipe Balbi
  2019-10-25  8:44                         ` alex zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Felipe Balbi @ 2019-10-23 10:01 UTC (permalink / raw)
  To: alex zheng, Mathias Nyman, David Laight
  Cc: linux-usb, xiaowei.zheng, Thinh Nguyen

Hi,

(please don't top-post)

alex zheng <tc0721@gmail.com> writes:
> Hi, all
>
> We found that this is a known issue of synopsys DWC3 USB controller,
> when the PARKMODE_SS of DWC3 is enable, the controller may hang or do
> wrong TRB schedule in some heavy load conditions.
>
> Setting DISABLE_PARKMODE_SS to 1 can work around this bug.

Is this something that affects some versions but not others? If that is the
case, we should teach the driver to handle this based on a revision
check.
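
A revision-gated quirk of the kind suggested here could look roughly like the following. This is only a sketch of the decision logic: the thread does not establish which core revisions are affected, so the bounds are caller-supplied placeholders, and the struct and function names are hypothetical rather than taken from the dwc3 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: the affected revision range is NOT known from this
 * thread and must come from the vendor's errata. */
struct parkmode_policy {
    uint32_t first_affected;   /* placeholder: lowest affected core revision */
    uint32_t last_affected;    /* placeholder: highest affected core revision */
};

/* Decide whether SS park mode should be disabled for this core.
 * force_quirk models a platform-level override (for example from firmware). */
static bool needs_parkmode_ss_disable(uint32_t revision,
                                      const struct parkmode_policy *p,
                                      bool force_quirk)
{
    assert(p->first_affected <= p->last_affected);
    if (force_quirk)
        return true;
    return revision >= p->first_affected && revision <= p->last_affected;
}
```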

cheers

-- 
balbi


* Re: BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted
  2019-10-23 10:01                       ` Felipe Balbi
@ 2019-10-25  8:44                         ` alex zheng
  0 siblings, 0 replies; 14+ messages in thread
From: alex zheng @ 2019-10-25  8:44 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Mathias Nyman, David Laight, linux-usb, xiaowei.zheng, Thinh Nguyen

Hi,

Felipe Balbi <felipe.balbi@linux.intel.com> wrote on Wed, Oct 23, 2019 at 6:02 PM:
>
>
> Hi,
>
> (please don't top-post)
>
> alex zheng <tc0721@gmail.com> writes:
> > Hi, all
> >
> > We found that this is a known issue of synopsys DWC3 USB controller,
> > when the PARKMODE_SS of DWC3 is enable, the controller may hang or do
> > wrong TRB schedule in some heavy load conditions.
> >
> > Setting DISABLE_PARKMODE_SS to 1 can work around this bug.
>
> Is this something that affects some versions but not others? If the
> case, we should teach the driver to handle this based on a revision
> check.

It seems that every DWC3 USB 3.0 controller which has parkmode_ss may
run into this issue, but we did not test further. I also found that a
fix was already posted to the mailing list yesterday; see the patch titled
"usb: dwc3: Update entries for disabling SS instances in park mode".

>
> cheers


>
> --
> balbi


end of thread, other threads:[~2019-10-25  8:45 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-22  8:34 BUG report: usb: dwc3: Link TRB triggered an intterupt without IOC being setted alex zheng
2019-09-23  5:36 ` Felipe Balbi
2019-09-23  7:08   ` alex zheng
2019-09-23 10:15     ` Mathias Nyman
     [not found]       ` <CADGPSwi87a5+3mCGAgptHgpBsQk9STQrEKs-kC6Nw55nPdRtOw@mail.gmail.com>
2019-09-25 14:48         ` Mathias Nyman
2019-09-25 16:22           ` David Laight
2019-09-26  5:45             ` Felipe Balbi
2019-09-26  8:21               ` Mathias Nyman
2019-09-26 10:38                 ` alex zheng
     [not found]                   ` <CADGPSwhCPvdu=KmQP6RHMJnh292UO0uBAt+KyJqqOWY5DWDc3w@mail.gmail.com>
2019-10-23  9:52                     ` alex zheng
2019-10-23 10:01                       ` Felipe Balbi
2019-10-25  8:44                         ` alex zheng
2019-09-23 10:45     ` Felipe Balbi
2019-09-24 14:19       ` alex zheng
