Linux-USB Archive on lore.kernel.org
* USB network gadget / DWC3 issue
@ 2021-03-30 12:37 Andy Shevchenko
  2021-03-30 16:17 ` Felipe Balbi
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Shevchenko @ 2021-03-30 12:37 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Alan Stern, USB, Ferry Toth, Felipe Balbi

Hi!

I have a platform with DWC3 in Dual Role mode. Currently I'm
experimenting on v5.12-rc5 with a few patches (mostly configuration)
applied [1]. I'm using Debian Unstable on the host machine and
BuildRoot with the above mentioned kernel on the target.

So, scenario 0:
1. Run iperf3 -s on target
2. Run iperf3 -c ... -t 0 on the host
3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                  receiver

Scenario 1:
1. Now, detach USB cable, wait for several seconds, attach it back,
repeat above:
0.00-9.94   sec   209 MBytes   176 Mbits/sec                  receiver

Note the bandwidth drop (176 vs. 192).

(Repeating scenario 1 now gives the same result)

Scenario 2:
1. Detach USB cable, attach a device, for example a USB stick.
2. See it being enumerated and detach it.
3. Attach cable from host
4. 0.00-19.36  sec   315 MBytes   136 Mbits/sec                  receiver

Note the even larger bandwidth drop!

(Repeating scenario 1 keeps the same lower bandwidth)

NOTE, sometimes in this scenario the target simply reboots after
several seconds (w/o any logs [from kernel] printed)!
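
As a rough cross-check of the figures above, the receiver rates can be
recomputed from the transferred size and the interval, and the drop
relative to scenario 0 expressed as a percentage. This is only a sketch:
it assumes iperf3 reports MBytes in units of 2^20 bytes and Mbits/sec in
units of 10^6 bits, and the helper names are made up for illustration.

```python
# Recompute the iperf3 receiver rates quoted above and the relative drops.
# Assumption: "MBytes" means 2**20 bytes, "Mbits/sec" means 10**6 bits/s.

def mbits_per_sec(mbytes: float, seconds: float) -> float:
    """Average rate in Mbits/sec for `mbytes` MiB moved in `seconds`."""
    return mbytes * 2**20 * 8 / seconds / 1e6

def drop_percent(baseline: float, measured: float) -> float:
    """Relative loss of `measured` against `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0

# (MBytes, seconds) taken from the three receiver lines above.
runs = {
    "scenario 0": (237, 10.36),
    "scenario 1": (209, 9.94),
    "scenario 2": (315, 19.36),
}

rates = {name: mbits_per_sec(*sample) for name, sample in runs.items()}
baseline = rates["scenario 0"]

for name, rate in rates.items():
    print(f"{name}: {rate:6.1f} Mbits/sec, "
          f"{drop_percent(baseline, rate):4.1f}% below scenario 0")
```

Under those unit assumptions the recomputed rates round to the values
iperf3 printed, which suggests the drop is in the sustained rate and is
not caused by the differing test durations.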

So, any pointers on how to debug and what can be a smoking gun here?

Ferry reported this in [2], using a different kernel version and
different tools to establish the connection (connman vs. none in my case).

[1]: https://github.com/andy-shev/linux/
[2]: https://github.com/andy-shev/linux/issues/31


-- 
With Best Regards,
Andy Shevchenko


* Re: USB network gadget / DWC3 issue
  2021-03-30 12:37 USB network gadget / DWC3 issue Andy Shevchenko
@ 2021-03-30 16:17 ` Felipe Balbi
  2021-03-30 20:26   ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Balbi @ 2021-03-30 16:17 UTC (permalink / raw)
  To: Andy Shevchenko, Thinh Nguyen; +Cc: Alan Stern, USB, Ferry Toth


Hi,

Andy Shevchenko <andy.shevchenko@gmail.com> writes:
> Hi!
>
> I have a platform with DWC3 in Dual Role mode. Currently I'm
> experimenting on v5.12-rc5 with a few patches (mostly configuration)
> applied [1]. I'm using Debian Unstable on the host machine and
> BuildRoot with the above mentioned kernel on the target.
>
> **So, scenario 0:
> 1. Run iperf3 -s on target
> 2. Run iperf3 -c ... -t 0 on the host
> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                  receiver
>
> **Scenario 1:
> 1. Now, detach USB cable, wait for several seconds, attach it back,
> repeat above:
> 0.00-9.94   sec   209 MBytes   176 Mbits/sec                  receiver
>
> Note the bandwidth drop (177 vs. 192).
>
> (Repeating scenario 1 will give now the same result)
>
> **Scenario 2.
> 1. Detach USB cable, attach a device, for example USB stick,
> 2. See it being enumerated and detach it.
> 3. Attach cable from host
> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec                  receiver
>
> Note even more bandwidth drop!
>
> (Repeating scenario 1 keeps the same lower bandwidth)
>
> NOTE, sometimes on this scenario after several seconds the target
> simply reboots (w/o any logs [from kernel] printed)!
>
> So, any pointers on how to debug and what can be a smoking gun here?
>
> Ferry reported this in [2]. There are different kernel versions and
> tools to establish the connection (like connman vs. none in my case).
>
> [1]: https://github.com/andy-shev/linux/
> [2]: https://github.com/andy-shev/linux/issues/31

dwc3 tracepoints should give some initial hints. Look at packet sizes
and the period of transmission. From the dwc3 side, I can't think of
anything we would do to throttle the transmission, but tracepoints
should tell a clearer story.
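
For readers who have not used them, a minimal sketch of switching the
dwc3 trace events on through tracefs is shown here. It assumes tracefs is
mounted at /sys/kernel/tracing, that the target kernel was built with the
dwc3 trace events, and that the script runs as root on the target. The
function names and the default buffer size are illustrative choices, not
something prescribed by the driver.

```python
# Sketch: switch on every dwc3 trace event through tracefs.
# Assumptions: tracefs is mounted at /sys/kernel/tracing, the kernel has
# the dwc3 trace events built in, and this runs as root on the target.
from pathlib import Path

TRACEFS = Path("/sys/kernel/tracing")  # assumed mount point

def start_dwc3_trace(tracefs: Path = TRACEFS, buffer_kb: int = 16384) -> None:
    """Enlarge the ring buffer, clear it, and enable the dwc3 event group."""
    # buffer_kb is an arbitrary example size (per CPU), tune it to the board.
    (tracefs / "buffer_size_kb").write_text(f"{buffer_kb}\n")
    # Opening "trace" for writing truncates it, which drops old events.
    (tracefs / "trace").write_text("\n")
    (tracefs / "events" / "dwc3" / "enable").write_text("1\n")
    (tracefs / "tracing_on").write_text("1\n")

def stop_and_save(dest: Path, tracefs: Path = TRACEFS) -> None:
    """Stop tracing and copy the text trace to `dest` for later comparison."""
    (tracefs / "tracing_on").write_text("0\n")
    dest.write_text((tracefs / "trace").read_text())
```

One would call `start_dwc3_trace()`, reproduce the slow transfer, and then
call `stop_and_save()` with a file name per scenario, so that the normal
and the throttled captures can be compared side by side.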

-- 
balbi


* Re: USB network gadget / DWC3 issue
  2021-03-30 16:17 ` Felipe Balbi
@ 2021-03-30 20:26   ` Ferry Toth
  2021-03-30 21:57     ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-03-30 20:26 UTC (permalink / raw)
  To: Felipe Balbi, Andy Shevchenko, Thinh Nguyen; +Cc: Alan Stern, USB

Hi,

On 30-03-2021 at 18:17, Felipe Balbi wrote:
> Hi,
>
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>> Hi!
>>
>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>> applied [1]. I'm using Debian Unstable on the host machine and
>> BuildRoot with the above mentioned kernel on the target.
>>
>> **So, scenario 0:
>> 1. Run iperf3 -s on target
>> 2. Run iperf3 -c ... -t 0 on the host
>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                  receiver
>>
>> **Scenario 1:
>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>> repeat above:
>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec                  receiver
>>
>> Note the bandwidth drop (177 vs. 192).
>>
>> (Repeating scenario 1 will give now the same result)
>>
>> **Scenario 2.
>> 1. Detach USB cable, attach a device, for example USB stick,
>> 2. See it being enumerated and detach it.
>> 3. Attach cable from host
>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec                  receiver
>>
>> Note even more bandwidth drop!
>>
>> (Repeating scenario 1 keeps the same lower bandwidth)
>>
>> NOTE, sometimes on this scenario after several seconds the target
>> simply reboots (w/o any logs [from kernel] printed)!
>>
>> So, any pointers on how to debug and what can be a smoking gun here?
>>
>> Ferry reported this in [2]. There are different kernel versions and
>> tools to establish the connection (like connman vs. none in my case).
>>
>> [1]: https://github.com/andy-shev/linux/
>> [2]: https://github.com/andy-shev/linux/issues/31
> dwc3 tracepoints should give some initial hints. Look at packets sizes
> and period of transmission. From dwc3 side, I can't think of anything we
> would do to throttle the transmission, but tracepoints should tell a
> clearer story.
>
My testing (admittedly with a different kernel and with the network
managed by connman) shows:

1) on cold boot the eem network gadget works fine

2) after unplug or warm reboot (which is also an unplug) it's broken:
speed is lost (12.0 Mbits/sec, down from 200 Mbits/sec normally),
packets are lost, no configuration is received from dhcp, and there is
an occasional reboot; the only way to fix it is a cold boot

3) if I run `connmanctl disable gadget` before unplug, then on
replugging and enabling it works fine

My theory is that some HW register is disturbed on a surprise unplug,
but not reset on plug or warm boot, whereas it is cleared on cold boot.
Maybe that can help to narrow down tracepoints?
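
One way to probe this theory would be to dump the controller registers
after a cold boot and again after a replug, and then diff the two dumps.
The sketch below assumes the dumps come from the dwc3 debugfs `regdump`
file and contain one `NAME = 0xVALUE` pair per line. The register values
in the example are invented purely to show the output shape.

```python
import re

# One register per line, e.g. "GCTL = 0x00112004" (format assumed).
REG_LINE = re.compile(r"^\s*(\S+)\s*=\s*(0x[0-9a-fA-F]+)\s*$")

def parse_regdump(text: str) -> dict:
    """Map register name to integer value for every parsable line."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs

def diff_regdumps(before: str, after: str) -> dict:
    """Return {name: (old, new)} for registers whose value changed."""
    old, new = parse_regdump(before), parse_regdump(after)
    return {name: (old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]}

# Invented sample data, not taken from real hardware.
cold = "GCTL = 0x00112004\nDCTL = 0x80f00000\n"
replug = "GCTL = 0x00112004\nDCTL = 0x00f00000\n"

for name, (a, b) in sorted(diff_regdumps(cold, replug).items()):
    print(f"{name}: {a:#010x} -> {b:#010x}")
```

Registers that legitimately change at runtime would show up as well, so
any difference is only a starting point for reading the traces.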



* Re: USB network gadget / DWC3 issue
  2021-03-30 20:26   ` Ferry Toth
@ 2021-03-30 21:57     ` Ferry Toth
  2021-04-02 19:12       ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-03-30 21:57 UTC (permalink / raw)
  To: Felipe Balbi, Andy Shevchenko, Thinh Nguyen; +Cc: Alan Stern, USB

Hi

On 30-03-2021 at 22:26, Ferry Toth wrote:
> Hi,
>
> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>> Hi,
>>
>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>> Hi!
>>>
>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>> applied [1]. I'm using Debian Unstable on the host machine and
>>> BuildRoot with the above mentioned kernel on the target.
>>>
>>> **So, scenario 0:
>>> 1. Run iperf3 -s on target
>>> 2. Run iperf3 -c ... -t 0 on the host
>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                  
>>> receiver
>>>
>>> **Scenario 1:
>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>> repeat above:
>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>
>>> Note the bandwidth drop (177 vs. 192).
>>>
>>> (Repeating scenario 1 will give now the same result)
>>>
>>> **Scenario 2.
>>> 1. Detach USB cable, attach a device, for example USB stick,
>>> 2. See it being enumerated and detach it.
>>> 3. Attach cable from host
>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec                  
>>> receiver
>>>
>>> Note even more bandwidth drop!
>>>
>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>
>>> NOTE, sometimes on this scenario after several seconds the target
>>> simply reboots (w/o any logs [from kernel] printed)!
>>>
>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>
>>> Ferry reported this in [2]. There are different kernel versions and
>>> tools to establish the connection (like connman vs. none in my case).
>>>
>>> [1]: https://github.com/andy-shev/linux/
>>> [2]: https://github.com/andy-shev/linux/issues/31
>> dwc3 tracepoints should give some initial hints. Look at packets sizes
>> and period of transmission. From dwc3 side, I can't think of anything we
>> would do to throttle the transmission, but tracepoints should tell a
>> clearer story.
>>
> My testing (but yes, with difference kernel and network managed by 
> connman) shows:
>
> 1) on cold boot eem network gadget works fine
>
> 2) after unplug or warm reboot (which is also an unplug) it's broken, 
> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost, 
> no configuration received from dhcp, occasional reboot, only way to 
> fix is cold boot
>
> 3) if before unplug `connmanctl disable gadget`, on replugging and 
> enabling it works fine
>
> My theory is that some HW register is disturbed on a surprise unplug, 
> but not reset on plug or warm boot. But on cold boot is cleared. Maybe 
> that can help to narrow down tracepoints?
>
I captured traces of a plug after a warm boot and after a cold boot.
This includes network setup (dhcp). You can find them in [2], or
directly here:
https://github.com/andy-shev/linux/files/6232410/boot.zip


* Re: USB network gadget / DWC3 issue
  2021-03-30 21:57     ` Ferry Toth
@ 2021-04-02 19:12       ` Ferry Toth
  2021-04-02 20:16         ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-02 19:12 UTC (permalink / raw)
  To: Felipe Balbi, Andy Shevchenko, Thinh Nguyen; +Cc: Alan Stern, USB

Hi

On 30-03-2021 at 23:57, Ferry Toth wrote:
> Hi
>
> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>> Hi,
>>
>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>> Hi,
>>>
>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>> Hi!
>>>>
>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>> BuildRoot with the above mentioned kernel on the target.
>>>>
>>>> **So, scenario 0:
>>>> 1. Run iperf3 -s on target
>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                  
>>>> receiver
>>>>
>>>> **Scenario 1:
>>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>>> repeat above:
>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>
>>>> Note the bandwidth drop (177 vs. 192).
>>>>
>>>> (Repeating scenario 1 will give now the same result)
>>>>
>>>> **Scenario 2.
>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>> 2. See it being enumerated and detach it.
>>>> 3. Attach cable from host
>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec                  
>>>> receiver
>>>>
>>>> Note even more bandwidth drop!
>>>>
>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>
>>>> NOTE, sometimes on this scenario after several seconds the target
>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>
>>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>>
>>>> Ferry reported this in [2]. There are different kernel versions and
>>>> tools to establish the connection (like connman vs. none in my case).
>>>>
>>>> [1]: https://github.com/andy-shev/linux/
>>>> [2]: https://github.com/andy-shev/linux/issues/31
>>> dwc3 tracepoints should give some initial hints. Look at packets sizes
>>> and period of transmission. From dwc3 side, I can't think of 
>>> anything we
>>> would do to throttle the transmission, but tracepoints should tell a
>>> clearer story.
>>>
>> My testing (but yes, with difference kernel and network managed by 
>> connman) shows:
>>
>> 1) on cold boot eem network gadget works fine
>>
>> 2) after unplug or warm reboot (which is also an unplug) it's broken, 
>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost, 
>> no configuration received from dhcp, occasional reboot, only way to 
>> fix is cold boot
>>
>> 3) if before unplug `connmanctl disable gadget`, on replugging and 
>> enabling it works fine
>>
>> My theory is that some HW register is disturbed on a surprise unplug, 
>> but not reset on plug or warm boot. But on cold boot is cleared. 
>> Maybe that can help to narrow down tracepoints?
>>
> I captured a plug after warm and after cold boot. This includes 
> network setup (dhcp). You can find it in [2] or directly link here: 
> https://github.com/andy-shev/linux/files/6232410/boot.zip

While the above traces in boot.zip allow comparing which regs are not
correctly initialized on warm boot, I have now captured traces of
unplug/plug.

Here the kernel is 5.10.27 (LTS), cold booted with the USB cable plugged
in and the eem gadget network set up (dhcp). Then I traced an unplug,
then a plug.

After the plug the eem connection is again broken.

This might allow figuring out what goes wrong on unplug. Traces here:
https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip



* Re: USB network gadget / DWC3 issue
  2021-04-02 19:12       ` Ferry Toth
@ 2021-04-02 20:16         ` Thinh Nguyen
  2021-04-02 22:40           ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-02 20:16 UTC (permalink / raw)
  To: Ferry Toth, Felipe Balbi, Andy Shevchenko, Thinh Nguyen; +Cc: Alan Stern, USB

Ferry Toth wrote:
> Hi
> 
> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>> Hi
>>
>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>> Hi,
>>>>
>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>> Hi!
>>>>>
>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>
>>>>> **So, scenario 0:
>>>>> 1. Run iperf3 -s on target
>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec                 
>>>>> receiver
>>>>>
>>>>> **Scenario 1:
>>>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>>>> repeat above:
>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>
>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>
>>>>> (Repeating scenario 1 will give now the same result)
>>>>>
>>>>> **Scenario 2.
>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>> 2. See it being enumerated and detach it.
>>>>> 3. Attach cable from host
>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec                 
>>>>> receiver
>>>>>
>>>>> Note even more bandwidth drop!
>>>>>
>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>
>>>>> NOTE, sometimes on this scenario after several seconds the target
>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>
>>>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>>>
>>>>> Ferry reported this in [2]. There are different kernel versions and
>>>>> tools to establish the connection (like connman vs. none in my case).
>>>>>
>>>>> [1]: https://github.com/andy-shev/linux/
>>>>> [2]: https://github.com/andy-shev/linux/issues/31
>>>>
>>>> dwc3 tracepoints should give some initial hints. Look at packets sizes
>>>> and period of transmission. From dwc3 side, I can't think of
>>>> anything we
>>>> would do to throttle the transmission, but tracepoints should tell a
>>>> clearer story.
>>>>
>>> My testing (but yes, with difference kernel and network managed by
>>> connman) shows:
>>>
>>> 1) on cold boot eem network gadget works fine
>>>
>>> 2) after unplug or warm reboot (which is also an unplug) it's broken,
>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost,
>>> no configuration received from dhcp, occasional reboot, only way to
>>> fix is cold boot
>>>
>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>> enabling it works fine
>>>
>>> My theory is that some HW register is disturbed on a surprise unplug,
>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>> Maybe that can help to narrow down tracepoints?
>>>
>> I captured a plug after warm and after cold boot. This includes
>> network setup (dhcp). You can find it in [2] or directly link here:
>> https://github.com/andy-shev/linux/files/6232410/boot.zip
> 
> 
> While the above traces in boot.zip allow compare which regs not
> correctly initialized on warm boot, I have now captured traces of
> unplug/plug.
> 
> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged and the
> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
> 
> After plug the eem connection is again broken.
> 
> This might allow figuring out what goes wrong on unplug. Traces here:
> https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
> 

Hi,

Were you able to narrow down the issue to only the DWC3 device? (i.e.
have you tested with different hosts and different device controllers to
confirm this?)

Did you see this issue previously? If not, is it possible to do a git
bisect?

BR,
Thinh


* Re: USB network gadget / DWC3 issue
  2021-04-02 20:16         ` Thinh Nguyen
@ 2021-04-02 22:40           ` Ferry Toth
  2021-04-03  2:02             ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-02 22:40 UTC (permalink / raw)
  To: Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Hi,

On 02-04-2021 at 22:16, Thinh Nguyen wrote:
> Ferry Toth wrote:
>> Hi
>>
>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>> Hi
>>>
>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>> Hi,
>>>>>
>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>> Hi!
>>>>>>
>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>
>>>>>> **So, scenario 0:
>>>>>> 1. Run iperf3 -s on target
>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec
>>>>>> receiver
>>>>>>
>>>>>> **Scenario 1:
>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>>>>> repeat above:
>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>
>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>
>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>
>>>>>> **Scenario 2.
>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>> 2. See it being enumerated and detach it.
>>>>>> 3. Attach cable from host
>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>> receiver
>>>>>>
>>>>>> Note even more bandwidth drop!
>>>>>>
>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>
>>>>>> NOTE, sometimes on this scenario after several seconds the target
>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>
>>>>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>>>>
>>>>>> Ferry reported this in [2]. There are different kernel versions and
>>>>>> tools to establish the connection (like connman vs. none in my case).
>>>>>>
>>>>>> [1]: https://github.com/andy-shev/linux/
>>>>>> [2]: https://github.com/andy-shev/linux/issues/31
>>>>>
>>>>> dwc3 tracepoints should give some initial hints. Look at packets sizes
>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>> anything we
>>>>> would do to throttle the transmission, but tracepoints should tell a
>>>>> clearer story.
>>>>>
>>>> My testing (but yes, with difference kernel and network managed by
>>>> connman) shows:
>>>>
>>>> 1) on cold boot eem network gadget works fine
>>>>
>>>> 2) after unplug or warm reboot (which is also an unplug) it's broken,
>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost,
>>>> no configuration received from dhcp, occasional reboot, only way to
>>>> fix is cold boot
>>>>
>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>> enabling it works fine
>>>>
>>>> My theory is that some HW register is disturbed on a surprise unplug,
>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>> Maybe that can help to narrow down tracepoints?
>>>>
>>> I captured a plug after warm and after cold boot. This includes
>>> network setup (dhcp). You can find it in [2] or directly link here:
>>> https://github.com/andy-shev/linux/files/6232410/boot.zip
>>
>>
>> While the above traces in boot.zip allow compare which regs not
>> correctly initialized on warm boot, I have now captured traces of
>> unplug/plug.
>>
>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged and the
>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>
>> After plug the eem connection is again broken.
>>
>> This might allow figuring out what goes wrong on unplug. Traces here:
>> https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>
> 
> Hi,
> 
> Were you able to narrow down the issue to only DWC3 device? (i.e. you
> tested with different hosts and different device controllers to confirm
> this)

I haven't tried with other devices. I was forced to replace my host
motherboard and nothing changed, but I didn't pay attention to the
particular host controller.

> Did you see this issue previously? If not, is it possible to do git
> bisection?

This is with Intel Edison, where mainline USB gadget support appeared
around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
and tried to bisect but failed.

I realize only now that I failed because:
1) 5.4 already has this issue, as I recently retested
2) I didn't use a reproducible criterion. After warm reboot the eem
gadget fails, but you can flip the host/gadget switch back and forth and
have the illusion that the connection is restored.

The scenario described here is reproducible: with the switch left in
gadget mode, eem works after cold boot only. And it likely breaks on
unplug.
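
A criterion like this could in principle be turned into a pass/fail
check, for instance for `git bisect run`, which treats exit status 0 as
good, 125 as "skip this commit", and other codes from 1 to 127 as bad.
The sketch below assumes the throughput is measured with `iperf3 --json`
after the replug and that the report carries the usual
`end.sum_received.bits_per_second` field. The 150 Mbits/sec threshold is
an arbitrary value between the good and bad rates reported in this
thread, and the unplug/replug step itself still has to be done by hand.

```python
import json

GOOD, BAD, SKIP = 0, 1, 125   # exit codes understood by `git bisect run`

def classify(iperf_json: str, min_mbits: float = 150.0) -> int:
    """Turn one iperf3 JSON report into a bisect verdict."""
    try:
        report = json.loads(iperf_json)
        bps = report["end"]["sum_received"]["bits_per_second"]
    except (ValueError, KeyError, TypeError):
        # No usable measurement. Treated as "skip" here, which may be the
        # wrong call if the link never comes up on a broken kernel.
        return SKIP
    return GOOD if bps / 1e6 >= min_mbits else BAD
```

A small wrapper could read the report from stdin and pass the result of
`classify()` to `sys.exit()`, so the same script serves both manual
checks and a bisect run.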

A 2nd hint is that disabling the gadget (I used `connmanctl disable
gadget`, but I believe that has the same effect as
`ip link set dev usb0 down`) before unplug prevents messing up the
driver, so you can replug and enable again.

> BR,
> Thinh
> 



* Re: USB network gadget / DWC3 issue
  2021-04-02 22:40           ` Ferry Toth
@ 2021-04-03  2:02             ` Thinh Nguyen
  2021-04-03 11:25               ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-03  2:02 UTC (permalink / raw)
  To: Ferry Toth, Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Ferry Toth wrote:
> Hi,
> 
> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>> Ferry Toth wrote:
>>> Hi
>>>
>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>> Hi
>>>>
>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>> Hi,
>>>>>
>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>> Hi,
>>>>>>
>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>
>>>>>>> **So, scenario 0:
>>>>>>> 1. Run iperf3 -s on target
>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec
>>>>>>> receiver
>>>>>>>
>>>>>>> **Scenario 1:
>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>>>>>> repeat above:
>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>>
>>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>>
>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>
>>>>>>> **Scenario 2.
>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>> 2. See it being enumerated and detach it.
>>>>>>> 3. Attach cable from host
>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>> receiver
>>>>>>>
>>>>>>> Note even more bandwidth drop!
>>>>>>>
>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>
>>>>>>> NOTE, sometimes on this scenario after several seconds the target
>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>
>>>>>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>>>>>
>>>>>>> Ferry reported this in [2]. There are different kernel versions and
>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>> case).
>>>>>>>
>>>>>>> [1]: https://github.com/andy-shev/linux/
>>>>>>> [2]: https://github.com/andy-shev/linux/issues/31
>>>>>>>
>>>>>>
>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>> sizes
>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>> anything we
>>>>>> would do to throttle the transmission, but tracepoints should tell a
>>>>>> clearer story.
>>>>>>
>>>>> My testing (but yes, with difference kernel and network managed by
>>>>> connman) shows:
>>>>>
>>>>> 1) on cold boot eem network gadget works fine
>>>>>
>>>>> 2) after unplug or warm reboot (which is also an unplug) it's broken,
>>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost,
>>>>> no configuration received from dhcp, occasional reboot, only way to
>>>>> fix is cold boot
>>>>>
>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>> enabling it works fine
>>>>>
>>>>> My theory is that some HW register is disturbed on a surprise unplug,
>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>> Maybe that can help to narrow down tracepoints?
>>>>>
>>>> I captured a plug after warm and after cold boot. This includes
>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>> https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>
>>>
>>>
>>> While the above traces in boot.zip allow compare which regs not
>>> correctly initialized on warm boot, I have now captured traces of
>>> unplug/plug.
>>>
>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged and the
>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>
>>> After plug the eem connection is again broken.
>>>
>>> This might allow figuring out what goes wrong on unplug. Traces here:
>>> https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>
>>
>> Hi,
>>
>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>> tested with different hosts and different device controllers to confirm
>> this)
> 
> I haven't tried with other devices. I have been forced to replace my
> host mobo and nothing changed. But I didn't pay attention to the
> particular host controller.
> 

It'd be better if we can narrow down the culprit as this seems to me
like a synchronization issue at the upper layer between the host and device.

>> Did you see this issue previously? If not, is it possible to do git
>> bisection?
> 
> This is with Intel Edison where main line usb gadget support appeared
> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
> and tried to bisect but failed.
> 
> I realize only now that I failed because:
> 1) 5.4 already has this issue as I recently retested

I'm confused, why do you believe the problem is between 5.4 and 5.7 if
5.4 already has this issue? So when did you start seeing this problem?

Also, these kernel versions are really old; there have been a lot of
updates/fixes to dwc3 since then. Can we run tests on the latest kernel?

> 2) I didn't use a reproducible criterion. After warm reboot the eem
> gadget fails, but you can flip the host/gadget switch back and forth and
> have the illusion that the connection restored.
> 
> The scenario described here is reproducible: leaving the switch in
> gadget mode eem works after cold boot only. And it likely breaks on unplug.
> 
> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
> but I believe that has the same effect as `iw link set dev usb0 down`)
> before unplug prevents messing up the driver, so you can replug and
> enable again.

These data points are good. However, we'd need to know where to look
first. The issue isn't obvious from the DWC3 controller or the DWC3 driver.

Can you check a few things:
1) Any error/timeout messages in the host's dmesg? Or on the device side?
2) What kernel version is your host using? Can you use the latest for
both host and device?
3) Snapshots of dwc3 tracepoints of active transfers, normal vs.
throttled, on the latest kernel

BR,
Thinh


* Re: USB network gadget / DWC3 issue
  2021-04-03  2:02             ` Thinh Nguyen
@ 2021-04-03 11:25               ` Ferry Toth
  2021-04-03 21:15                 ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-03 11:25 UTC (permalink / raw)
  To: Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Hi,

On 03-04-2021 at 04:02, Thinh Nguyen wrote:
> Ferry Toth wrote:
>> Hi,
>>
>> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>>> Ferry Toth wrote:
>>>> Hi
>>>>
>>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>>> Hi
>>>>>
>>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly configuration)
>>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>>
>>>>>>>> **So, scenario 0:
>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec
>>>>>>>> receiver
>>>>>>>>
>>>>>>>> **Scenario 1:
>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it back,
>>>>>>>> repeat above:
>>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>>>
>>>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>>>
>>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>>
>>>>>>>> **Scenario 2.
>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>> 3. Attach cable from host
>>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>>> receiver
>>>>>>>>
>>>>>>>> Note even more bandwidth drop!
>>>>>>>>
>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>
>>>>>>>> NOTE, sometimes on this scenario after several seconds the target
>>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>
>>>>>>>> So, any pointers on how to debug and what can be a smoking gun here?
>>>>>>>>
>>>>>>>> Ferry reported this in [2]. There are different kernel versions and
>>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>>> case).
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>
>>>>>>>> [2]:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>
>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>> sizes
>>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>>> anything we
>>>>>>> would do to throttle the transmission, but tracepoints should tell a
>>>>>>> clearer story.
>>>>>>>
>>>>>> My testing (but yes, with difference kernel and network managed by
>>>>>> connman) shows:
>>>>>>
>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>
>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's broken,
>>>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets lost,
>>>>>> no configuration received from dhcp, occasional reboot, only way to
>>>>>> fix is cold boot
>>>>>>
>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>> enabling it works fine
>>>>>>
>>>>>> My theory is that some HW register is disturbed on a surprise unplug,
>>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>>> Maybe that can help to narrow down tracepoints?
>>>>>>
>>>>> I captured a plug after warm and after cold boot. This includes
>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>
>>>>
>>>> While the above traces in boot.zip allow compare which regs not
>>>> correctly initialized on warm boot, I have now captured traces of
>>>> unplug/plug.
>>>>
>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged and the
>>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>>
>>>> After plug the eem connection is again broken.
>>>>
>>>> This might allow figuring out what goes wrong on unplug. Traces here:
>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>
>>>> **
>>>>
>>> Hi,
>>>
>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>> tested with different hosts and different device controllers to confirm
>>> this)
>> I haven't tried with other devices. I have been forced to replace my
>> host mobo and nothing changed. But I didn't pay attention to the
>> particular host controller.
>>
> It'd be better if we can narrow down the culprit as this seems to me
> like a synchronization issue at the upper layer between the host and device.
>
>>> Did you see this issue previously? If not, is it possible to do git
>>> bisection?
>> This is with Intel Edison where main line usb gadget support appeared
>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>> and tried to bisect but failed.
>>
>> I realize only now that I failed because:
>> 1) 5.4 already has this issue as I recently retested
> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
> 5.4 already has this issue? So when did you start seeing this problem?

Because at the time of 5.4 I didn't notice the issue as I normally did 
cold boots due to other problems on warm boot (i.e. sdhc inaccessible).

I never knew that on a cold boot it works. Even during bisecting I didn't
know until the end, and then I found 5.4 has the same problem as all the
later kernels (tested up to 5.11).

> Also, these kernel versions are really old, there's been a lot of
> updates/fixes to dwc3 since then. Can we run tests on the latest kernel?

I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.

But of course I am completely prepared to run Andy's latest (v5.12-rc5) 
on the device.

>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>> gadget fails, but you can flip the host/gadget switch back and forth and
>> have the illusion that the connection restored.
>>
>> The scenario described here is reproducible: leaving the switch in
>> gadget mode eem works after cold boot only. And it likely breaks on unplug.
>>
>> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
>> but I believe that has the same effect as `ip link set dev usb0 down`)
>> before unplug prevents messing up the driver, so you can replug and
>> enable again.
> These data points are good. However, we'd need to know where to look
> first. The issue isn't obvious from the DWC3 controller or the DWC3 driver.
>
> Can you check a few things:
> 1) Any error/timeout messages from the host's dmesg? Or device side?

I'll add log from the host side.

For now I only see (on a warm plug):

kernel: usb 1-11: can't set config #1, error -110
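
For reference, -110 is -ETIMEDOUT on Linux, so the host's request to
select configuration 1 presumably timed out waiting for the device. A
minimal sketch for decoding such host log lines follows. The errno table
is a hand-copied subset of Linux errno values, and the function name is
hypothetical, not part of any existing tool.

```python
import re

# Subset of Linux errno values that tend to show up in USB host logs.
# Assumption: the log comes from a Linux host (values differ on other OSes).
LINUX_ERRNO = {19: "ENODEV", 32: "EPIPE", 71: "EPROTO", 110: "ETIMEDOUT"}

# Matches lines of the form "usb <port>: <message>, error -<errno>".
_PATTERN = re.compile(
    r"usb (?P<port>[\d.-]+): (?P<msg>.*), error -(?P<errno>\d+)"
)


def decode_usb_error(line):
    """Return (port, message, errno name) or None if the line does not match."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    code = int(m.group("errno"))
    return m.group("port"), m.group("msg"), LINUX_ERRNO.get(code, "errno %d" % code)


print(decode_usb_error("kernel: usb 1-11: can't set config #1, error -110"))
```

Running this over the host journal for a good and a bad plug would show
whether the timeout appears only on the warm plug.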

> 2) What kernel version is your host using? Can you use the latest for
> both host and device?

The host is ubuntu's amd64 5.8.0-48-generic.

I will test with v5.12-rc5 from the Ubuntu kernel PPA on the host, and
Andy's latest (v5.12-rc5) on the device.

I am expecting results this evening.

> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
> vs throttled of the latest kernel

I don't know if the problem I see is really throttling.

I can trace an active transfer, but that does actually throttle from 
200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of iperf3).
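
To put a number on the drop, the iperf3 receiver summary lines quoted
earlier in this thread can be compared directly. This is only a sketch:
it assumes the summary reports MBytes and Mbits/sec (iperf3 switches
units for larger transfers), and the helper names are made up for
illustration.

```python
import re

# Matches an iperf3 summary such as:
#   0.00-10.36  sec   237 MBytes   192 Mbits/sec   receiver
_SUMMARY = re.compile(
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"(?P<size>[\d.]+)\s+MBytes\s+(?P<rate>[\d.]+)\s+Mbits/sec"
)


def parse_rate_mbits(line):
    """Extract the bitrate in Mbits/sec from one iperf3 summary line."""
    m = _SUMMARY.search(line)
    if m is None:
        raise ValueError("not an iperf3 summary line: %r" % line)
    return float(m.group("rate"))


def drop_percent(baseline, degraded):
    """Relative throughput loss of `degraded` versus `baseline`, in percent."""
    return 100.0 * (baseline - degraded) / baseline


# Scenario 0 and scenario 2 figures from the first message in the thread.
good = parse_rate_mbits("0.00-10.36  sec   237 MBytes   192 Mbits/sec   receiver")
bad = parse_rate_mbits("0.00-19.36  sec   315 MBytes   136 Mbits/sec   receiver")
print("%.1f%% lower" % drop_percent(good, bad))
```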

> BR,
> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-03 11:25               ` Ferry Toth
@ 2021-04-03 21:15                 ` Ferry Toth
  2021-04-05 20:59                   ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-03 21:15 UTC (permalink / raw)
  To: Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Hi,

Op 03-04-2021 om 13:25 schreef Ferry Toth:
> Hi,
> 
> Op 03-04-2021 om 04:02 schreef Thinh Nguyen:
>> Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>>>> Ferry Toth wrote:
>>>>> Hi
>>>>>
>>>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>>>> Hi
>>>>>>
>>>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly 
>>>>>>>>> configuration)
>>>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>>>
>>>>>>>>> **So, scenario 0:
>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec
>>>>>>>>> receiver
>>>>>>>>>
>>>>>>>>> **Scenario 1:
>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it 
>>>>>>>>> back,
>>>>>>>>> repeat above:
>>>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>>>>
>>>>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>>>>
>>>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>>>
>>>>>>>>> **Scenario 2.
>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>> 3. Attach cable from host
>>>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>>>> receiver
>>>>>>>>>
>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>
>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>
>>>>>>>>> NOTE, sometimes on this scenario after several seconds the target
>>>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>
>>>>>>>>> So, any pointers on how to debug and what can be a smoking gun 
>>>>>>>>> here?
>>>>>>>>>
>>>>>>>>> Ferry reported this in [2]. There are different kernel versions 
>>>>>>>>> and
>>>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>>>> case).
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [2]:
>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>>> sizes
>>>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>>>> anything we
>>>>>>>> would do to throttle the transmission, but tracepoints should 
>>>>>>>> tell a
>>>>>>>> clearer story.
>>>>>>>>
>>>>>>> My testing (but yes, with difference kernel and network managed by
>>>>>>> connman) shows:
>>>>>>>
>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>
>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's 
>>>>>>> broken,
>>>>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets 
>>>>>>> lost,
>>>>>>> no configuration received from dhcp, occasional reboot, only way to
>>>>>>> fix is cold boot
>>>>>>>
>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>>> enabling it works fine
>>>>>>>
>>>>>>> My theory is that some HW register is disturbed on a surprise 
>>>>>>> unplug,
>>>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>>>> Maybe that can help to narrow down tracepoints?
>>>>>>>
>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip 
>>>>>>
>>>>>>
>>>>>
>>>>> While the above traces in boot.zip allow compare which regs not
>>>>> correctly initialized on warm boot, I have now captured traces of
>>>>> unplug/plug.
>>>>>
>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged 
>>>>> and the
>>>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>>>
>>>>> After plug the eem connection is again broken.
>>>>>
>>>>> This might allow figuring out what goes wrong on unplug. Traces here:
>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip 
>>>>>
>>>>>
>>>>> **
>>>>>
>>>> Hi,
>>>>
>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>> tested with different hosts and different device controllers to confirm
>>>> this)
>>> I haven't tried with other devices. I have been forced to replace my
>>> host mobo and nothing changed. But I didn't pay attention to the
>>> particular host controller.
>>>
>> It'd be better if we can narrow down the culprit as this seems to me
>> like a synchronization issue at the upper layer between the host and 
>> device.
>>
>>>> Did you see this issue previously? If not, is it possible to do git
>>>> bisection?
>>> This is with Intel Edison where main line usb gadget support appeared
>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>> and tried to bisect but failed.
>>>
>>> I realize only now that I failed because:
>>> 1) 5.4 already has this issue as I recently retested
>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>> 5.4 already has this issue? So when did you start seeing this problem?
> 
> Because at the time of 5.4 I didn't notice the issue as I normally did 
> cold boots due to other problems on warm boot (i.e. sdhc inaccessible).
> 
> I never knew that on a cold boot it works. Even during bisecting I didn't
> know until the end, and then I found 5.4 has the same problem as all the 
> later kernels (tested up to 5.11)
> 
>> Also, these kernel versions are really old, there's been a lot of
>> updates/fixes to dwc3 since then. Can we run tests on the latest kernel?
> 
> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
> 
> But of course I am completely prepared to run Andy's latest (v5.12-rc5) 
> on the device.
> 
>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>> gadget fails, but you can flip the host/gadget switch back and forth and
>>> have the illusion that the connection restored.
>>>
>>> The scenario described here is reproducible: leaving the switch in
>>> gadget mode eem works after cold boot only. And it likely breaks on 
>>> unplug.
>>>
>>> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
>>> but I believe that has the same effect as `ip link set dev usb0 down`)
>>> before unplug prevents messing up the driver, so you can replug and
>>> enable again.
>> These data points are good. However, we'd need to know where to look
>> first. The issue isn't obvious from the DWC3 controller or the DWC3 
>> driver.
>>
>> Can you check a few things:
>> 1) Any error/timeout messages from the host's dmesg? Or device side?
> 
> I'll add log from the host side.
> 
> For now I only see (on a warm plug):
> 
> kernel: usb 1-11: can't set config #1, error -110
> 
>> 2) What kernel version is your host using? Can you use the latest for
>> both host and device?
> 
> The host is ubuntu's amd64 5.8.0-48-generic.
> 
> I will test with v5.12-rc5  from ubuntu kernel ppa on the host. And 
> Andy's latest (v5.12-rc5) on the device.

I upgraded the host kernel, but not yet the device, and captured the
relevant host journal messages and device traces. Something did change:
after cold boot I don't get an eem connection until after I
unplug/replug. I then traced an iperf transfer. After another
unplug/replug I get the throttled connection, which I also traced.

See https://github.com/andy-shev/linux/files/6253414/transfer.zip
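
For anyone wanting to capture comparable traces, a minimal sketch of
switching the dwc3 trace events on and saving the buffer is below. It
assumes tracefs is mounted at /sys/kernel/tracing and that the kernel
was built with tracing support; the function names are hypothetical and
this is not necessarily how the traces above were taken. It needs root
on the device.

```python
from pathlib import Path

DEFAULT_TRACEFS = "/sys/kernel/tracing"  # assumption: standard tracefs mount


def set_dwc3_tracing(enabled, tracefs_root=DEFAULT_TRACEFS, buffer_kb=None):
    """Enable or disable all dwc3 trace events, optionally resizing the buffer."""
    root = Path(tracefs_root)
    if buffer_kb is not None:
        # Per-CPU ring buffer size; busy transfers fill the default quickly.
        (root / "buffer_size_kb").write_text("%d\n" % buffer_kb)
    (root / "events" / "dwc3" / "enable").write_text("1\n" if enabled else "0\n")


def save_trace(dest, tracefs_root=DEFAULT_TRACEFS):
    """Copy the current trace buffer to `dest`; returns the number of characters."""
    data = (Path(tracefs_root) / "trace").read_text()
    Path(dest).write_text(data)
    return len(data)


# Typical use on the device (as root):
#   set_dwc3_tracing(True, buffer_kb=81920)
#   ... unplug / replug / run iperf3 ...
#   save_trace("/root/trace.txt")
#   set_dwc3_tracing(False)
```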


> I am expecting results this evening.
> 
>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>> vs throttled of the latest kernel
> 
> I don't know if the problem I see is really throttling.
> 
> I can trace an active transfer, but that does actually throttle from 
> 200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of iperf3).
> 
>> BR,
>> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-03 21:15                 ` Ferry Toth
@ 2021-04-05 20:59                   ` Ferry Toth
  2021-04-07  0:10                     ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-05 20:59 UTC (permalink / raw)
  To: Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Hi,

Op 03-04-2021 om 23:15 schreef Ferry Toth:
> Hi,
>
> Op 03-04-2021 om 13:25 schreef Ferry Toth:
>> Hi,
>>
>> Op 03-04-2021 om 04:02 schreef Thinh Nguyen:
>>> Ferry Toth wrote:
>>>> Hi,
>>>>
>>>> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>>>>> Ferry Toth wrote:
>>>>>> Hi
>>>>>>
>>>>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>>>>> Hi
>>>>>>>
>>>>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly 
>>>>>>>>>> configuration)
>>>>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>>>>
>>>>>>>>>> **So, scenario 0:
>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>> 3.  0.00-10.36  sec   237 MBytes   192 Mbits/sec
>>>>>>>>>> receiver
>>>>>>>>>>
>>>>>>>>>> **Scenario 1:
>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it 
>>>>>>>>>> back,
>>>>>>>>>> repeat above:
>>>>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>>>>>
>>>>>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>>>>>
>>>>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>>>>
>>>>>>>>>> **Scenario 2.
>>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>>>>> receiver
>>>>>>>>>>
>>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>>
>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>
>>>>>>>>>> NOTE, sometimes on this scenario after several seconds the 
>>>>>>>>>> target
>>>>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>>
>>>>>>>>>> So, any pointers on how to debug and what can be a smoking 
>>>>>>>>>> gun here?
>>>>>>>>>>
>>>>>>>>>> Ferry reported this in [2]. There are different kernel 
>>>>>>>>>> versions and
>>>>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>>>>> case).
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [2]:
>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>>>> sizes
>>>>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>>>>> anything we
>>>>>>>>> would do to throttle the transmission, but tracepoints should 
>>>>>>>>> tell a
>>>>>>>>> clearer story.
>>>>>>>>>
>>>>>>>> My testing (but yes, with difference kernel and network managed by
>>>>>>>> connman) shows:
>>>>>>>>
>>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>>
>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's 
>>>>>>>> broken,
>>>>>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets 
>>>>>>>> lost,
>>>>>>>> no configuration received from dhcp, occasional reboot, only 
>>>>>>>> way to
>>>>>>>> fix is cold boot
>>>>>>>>
>>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>>>> enabling it works fine
>>>>>>>>
>>>>>>>> My theory is that some HW register is disturbed on a surprise 
>>>>>>>> unplug,
>>>>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>>>>> Maybe that can help to narrow down tracepoints?
>>>>>>>>
>>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip 
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> While the above traces in boot.zip allow compare which regs not
>>>>>> correctly initialized on warm boot, I have now captured traces of
>>>>>> unplug/plug.
>>>>>>
>>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged 
>>>>>> and the
>>>>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>>>>
>>>>>> After plug the eem connection is again broken.
>>>>>>
>>>>>> This might allow figuring out what goes wrong on unplug. Traces 
>>>>>> here:
>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip 
>>>>>>
>>>>>>
>>>>>> **
>>>>>>
>>>>> Hi,
>>>>>
>>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>>> tested with different hosts and different device controllers to 
>>>>> confirm
>>>>> this)
>>>> I haven't tried with other devices. I have been forced to replace my
>>>> host mobo and nothing changed. But I didn't pay attention to the
>>>> particular host controller.
>>>>
>>> It'd be better if we can narrow down the culprit as this seems to me
>>> like a synchronization issue at the upper layer between the host and 
>>> device.
>>>
>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>> bisection?
>>>> This is with Intel Edison where main line usb gadget support appeared
>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>> and tried to bisect but failed.
>>>>
>>>> I realize only now that I failed because:
>>>> 1) 5.4 already has this issue as I recently retested
>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>> 5.4 already has this issue? So when did you start seeing this problem?
>>
>> Because at the time of 5.4 I didn't notice the issue as I normally 
>> did cold boots due to other problems on warm boot (i.e. sdhc 
>> inaccessible).
>>
>> I never knew that on a cold boot it works. Even during bisecting I
>> didn't know until the end, and then I found 5.4 has the same problem 
>> as all the later kernels (tested up to 5.11)
>>
>>> Also, these kernel versions are really old, there's been a lot of
>>> updates/fixes to dwc3 since then. Can we run tests on the latest 
>>> kernel?
>>
>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>
>> But of course I am completely prepared to run Andy's latest 
>> (v5.12-rc5) on the device.
>>
>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>> gadget fails, but you can flip the host/gadget switch back and 
>>>> forth and
>>>> have the illusion that the connection restored.
>>>>
>>>> The scenario described here is reproducible: leaving the switch in
>>>> gadget mode eem works after cold boot only. And it likely breaks on 
>>>> unplug.
>>>>
>>>> A 2nd hint is that disabling gadget (I used `connmanctl disable 
>>>> gadget`
>>>> but I believe that has the same effect as `ip link set dev usb0 down`)
>>>> before unplug prevents messing up the driver, so you can replug and
>>>> enable again.
>>> These data points are good. However, we'd need to know where to look
>>> first. The issue isn't obvious from the DWC3 controller or the DWC3 
>>> driver.
>>>
>>> Can you check a few things:
>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>
>> I'll add log from the host side.
>>
>> For now I only see (on a warm plug):
>>
>> kernel: usb 1-11: can't set config #1, error -110
>>
>>> 2) What kernel version is your host using? Can you use the latest for
>>> both host and device?
>>
>> The host is ubuntu's amd64 5.8.0-48-generic.
>>
>> I will test with v5.12-rc5  from ubuntu kernel ppa on the host. And 
>> Andy's latest (v5.12-rc5) on the device.
>
> I upgraded the host kernel, but not yet the device, and captured the
> relevant host journal messages and device traces. Something did change:
> after cold boot I don't get an eem connection until after I
> unplug/replug. I then traced an iperf transfer. After another
> unplug/replug I get the throttled connection, which I also traced.
>
> See https://github.com/andy-shev/linux/files/6253414/transfer.zip
>

Now, with the host updated to Ubuntu kernel PPA 5.12.0-051200rc5-generic
and the Edison to 5.12.0-rc5-edison-acpi-standard (vanilla + 2 patches
appearing in rc6):

* "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
* "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"

plus one from https://github.com/andy-shev/linux/commits/eds-acpi

* "TODO: driver core: Break infinite loop when deferred probe can't be 
satisfied"

I captured one good and one bad connection, plus logs on the host side;
see journalctl-plus-comments.txt in
https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip
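
One quick way to compare a good and a bad capture is to count dwc3 trace
events by name in each file and look at the differences. The sketch
below assumes the usual ftrace text layout, where the event name follows
the timestamp as ": dwc3_<name>:"; the function and file names are
hypothetical.

```python
import re
from collections import Counter

# In ftrace text output the event name follows the timestamp, e.g.
# "... 123.456789: dwc3_complete_trb: ..." (assumed layout).
_EVENT = re.compile(r":\s+(dwc3_\w+):")


def count_events(lines):
    """Count occurrences of each dwc3 trace event in an iterable of lines."""
    counts = Counter()
    for line in lines:
        m = _EVENT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def diff_counts(good, bad):
    """Return {event: bad - good} for every event whose count differs."""
    names = sorted(set(good) | set(bad))
    return {n: bad[n] - good[n] for n in names if bad[n] != good[n]}


# Typical use (file names are placeholders):
#   with open("good.txt") as g, open("bad.txt") as b:
#       print(diff_counts(count_events(g), count_events(b)))
```

Counts alone will not explain the slowdown, but a large shift in one
event type can suggest where to start reading the full trace.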

>
>> I am expecting results this evening.
>>
>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>> vs throttled of the latest kernel
>>
>> I don't know if the problem I see is really throttling.
>>
>> I can trace an active transfer, but that does actually throttle from 
>> 200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of
>> iperf3).
>>
>>> BR,
>>> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-05 20:59                   ` Ferry Toth
@ 2021-04-07  0:10                     ` Thinh Nguyen
  2021-04-07  0:24                       ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-07  0:10 UTC (permalink / raw)
  To: Ferry Toth, Thinh Nguyen, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Ferry Toth wrote:
> Hi,
> 
> Op 03-04-2021 om 23:15 schreef Ferry Toth:
>> Hi,
>>
>> Op 03-04-2021 om 13:25 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 03-04-2021 om 04:02 schreef Thinh Nguyen:
>>>> Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>>>>>> Ferry Toth wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly
>>>>>>>>>>> configuration)
>>>>>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>>>>>
>>>>>>>>>>> **So, scenario 0:
>>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>>> 3.  0.00-10.36  sec   237 MBytes  192 Mbits/sec
>>>>>>>>>>> receiver
>>>>>>>>>>>
>>>>>>>>>>> **Scenario 1:
>>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it
>>>>>>>>>>> back,
>>>>>>>>>>> repeat above:
>>>>>>>>>>> 0.00-9.94   sec   209 MBytes   176Mbits/sec receiver
>>>>>>>>>>>
>>>>>>>>>>> Note the bandwidth drop (177 vs. 192).
>>>>>>>>>>>
>>>>>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>>>>>
>>>>>>>>>>> **Scenario 2.
>>>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>>>>>> receiver
>>>>>>>>>>>
>>>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>>>
>>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>>
>>>>>>>>>>> NOTE, sometimes on this scenario after several seconds the
>>>>>>>>>>> target
>>>>>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>>>
>>>>>>>>>>> So, any pointers on how to debug and what can be a smoking
>>>>>>>>>>> gun here?
>>>>>>>>>>>
>>>>>>>>>>> Ferry reported this in [2]. There are different kernel
>>>>>>>>>>> versions and
>>>>>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>>>>>> case).
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [2]:
>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>>>>> sizes
>>>>>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>>>>>> anything we
>>>>>>>>>> would do to throttle the transmission, but tracepoints should
>>>>>>>>>> tell a
>>>>>>>>>> clearer story.
>>>>>>>>>>
>>>>>>>>> My testing (but yes, with difference kernel and network managed by
>>>>>>>>> connman) shows:
>>>>>>>>>
>>>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>>>
>>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's
>>>>>>>>> broken,
>>>>>>>>> speed is lost (|12.0 Mbits/sec from 200Mb/s normally)|, packets
>>>>>>>>> lost,
>>>>>>>>> no configuration received from dhcp, occasional reboot, only
>>>>>>>>> way to
>>>>>>>>> fix is cold boot
>>>>>>>>>
>>>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>>>>> enabling it works fine
>>>>>>>>>
>>>>>>>>> My theory is that some HW register is disturbed on a surprise
>>>>>>>>> unplug,
>>>>>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>>>>>> Maybe that can help to narrow down tracepoints?
>>>>>>>>>
>>>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> While the above traces in boot.zip allow compare which regs not
>>>>>>> correctly initialized on warm boot, I have now captured traces of
>>>>>>> unplug/plug.
>>>>>>>
>>>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged
>>>>>>> and the
>>>>>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>>>>>
>>>>>>> After plug the eem connection is again broken.
>>>>>>>
>>>>>>> This might allow figuring out what goes wrong on unplug. Traces
>>>>>>> here:
>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>>>>
>>>>>>>
>>>>>>> **
>>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>>>> tested with different hosts and different device controllers to
>>>>>> confirm
>>>>>> this)
>>>>> I haven't tried with other devices. I have been forced to replace my
>>>>> host mobo and nothing changed. But I didn't pay attention to the
>>>>> particular host controller.
>>>>>
>>>> It'd be better if we can narrow down the culprit as this seems to me
>>>> like a synchronization issue at the upper layer between the host and
>>>> device.
>>>>
>>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>>> bisection?
>>>>> This is with Intel Edison where main line usb gadget support appeared
>>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>>> and tried to bisect but failed.
>>>>>
>>>>> I realize only now that I failed because:
>>>>> 1) 5.4 already has this issue as I recently retested
>>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>>> 5.4 already has this issue? So when did you start seeing this problem?
>>>
>>> Because at the time of 5.4 I didn't notice the issue as I normally
>>> did cold boots due to other problems on warm boot (i.e. sdhc
>>> inaccessible).
>>>
>>> I never knew that on a cold boot it works. Even during bisecting I
>>> didn't know until the end, and then I found 5.4 has the same problem
>>> as all the later kernels (tested up to 5.11).
>>>
>>>> Also, these kernel versions are really old, there's been a lot of
>>>> updates/fixes to dwc3 since then. Can we run tests on the latest
>>>> kernel?
>>>
>>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>>
>>> But of course I am completely prepared to run Andy's latest
>>> (v5.12-rc5) on the device.
>>>
>>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>>> gadget fails, but you can flip the host/gadget switch back and
>>>>> forth and
>>>>> have the illusion that the connection restored.
>>>>>
>>>>> The scenario described here is reproducible: leaving the switch in
>>>>> gadget mode eem works after cold boot only. And it likely breaks on
>>>>> unplug.
>>>>>
>>>>> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
>>>>> but I believe that has the same effect as `ip link set dev usb0 down`)
>>>>> before unplug prevents messing up the driver, so you can replug and
>>>>> enable again.
>>>> These data points are good. However, we'd need to know where to look
>>>> first. The issue isn't obvious from the DWC3 controller or the DWC3
>>>> driver.
>>>>
>>>> Can you check a few things:
>>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>>
>>> I'll add log from the host side.
>>>
>>> For now I only see (on a warm plug):
>>>
>>> kernel: usb 1-11: can't set config #1, error -110
>>>
>>>> 2) What kernel version is your host using? Can you use the latest for
>>>> both host and device?
>>>
>>> The host is ubuntu's amd64 5.8.0-48-generic.
>>>
>>> I will test with v5.12-rc5  from ubuntu kernel ppa on the host. And
>>> Andy's latest (v5.12-rc5) on the device.
>>
>> I upgraded the host kernel, but not yet the device, and captured relevant
>> host journal messages and device traces. Something did change: after cold
>> boot I don't get an eem connection until after I unplug/replug. I then
>> traced an iperf transfer. Then after again unplug/replug I get the
>> throttled connection, which I also traced.
>>
>> See
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6253414/transfer.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_101A7wlQD$
>>
> 
> Now, with host updated to ubuntu kernel ppa 5.12.0-051200rc5-generic and
> edison to 5.12.0-rc5-edison-acpi-standard vanilla + 2 patches appearing
> in rc6:
> 
> * "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
> * "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"
> 
> plus one from
> https://urldefense.com/v3/__https://github.com/andy-shev/linux/commits/eds-acpi__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_105iEe4TE$
> 
> * "TODO: driver core: Break infinite loop when deferred probe can't be
> satisfied"
> 
> I captured one good and one bad connection, plus logs on the host side
> see journalctl-plus-comments.txt in
> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_10w5OhpD1$
> 
>>
>>> I am expecting results this evening.
>>>
>>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>>> vs throttled of the latest kernel
>>>
>>> I don't know if the problem I see is really throttling.
>>>
>>> I can trace an active transfer, but that does actually throttle from
>>> 200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of
>>> iperf3).
>>>


I took a look at the "bad" and "normal" tracepoints. There are a few
1-second delays where the host tried to bring the device back and
resume from low power:

     ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
     ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
     ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
          <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
          <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
          <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
          <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
     irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
     irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
     irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
     irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
     irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
     irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
     irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
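
Gaps like the one between 231.501810 and 232.499418 can also be located mechanically in a large trace. The following is a minimal Python sketch, assuming the trace is in the ftrace text format shown above (a timestamp in seconds, a colon, then the event name). The function name `find_gaps` and the 0.5 s threshold are illustrative choices and not part of any existing tool:

```python
import re

# Matches the timestamp and event name in an ftrace text line, e.g.
# "  <idle>-0  [000] d.h.   232.499418: dwc3_readl: addr ..."
LINE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def find_gaps(lines, threshold=0.5):
    """Return (prev_ts, next_ts, next_event) for each pair of consecutive
    trace lines whose timestamps differ by more than `threshold` seconds."""
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip headers, blank lines and wrapped fragments
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, m.group(2)))
        prev = ts
    return gaps


# Three lines taken from the excerpt above.
sample = [
    "     ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710",
    "          <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034",
    "     irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]",
]

for before, after, event in find_gaps(sample):
    print(f"{after - before:.3f}s idle before {event} at {after:.6f}")
```

Running the same function over the "normal" and "bad" trace files and comparing the number of reported gaps would give a rough measure of how often the link stalls in each case.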


Your device is operating in high speed, right? Try to turn off LPM from
the host and see if that helps with the speed throttling issue. (If you're
using an xHCI host, then set XHCI_HW_LPM_DISABLE.) It may also help with
the connection issue you saw.
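
Before changing anything on the host, it may help to check whether hardware LPM is in use for the gadget at all. The sketch below is an assumption-laden illustration: it relies on the `power/usb2_hardware_lpm` sysfs attribute described in the kernel's sysfs-bus-usb ABI documentation, which, as I understand it, only exists when the host and device passed the LPM test. The helper name `usb2_lpm_state` is hypothetical, and `1-11` is simply the port that appears in the host log quoted earlier in this thread:

```python
from pathlib import Path


def usb2_lpm_state(device, sysfs_root="/sys/bus/usb/devices"):
    """Return the contents of power/usb2_hardware_lpm for `device`
    (for example "1-11"), or None if the attribute is not present."""
    attr = Path(sysfs_root) / device / "power" / "usb2_hardware_lpm"
    try:
        return attr.read_text().strip()
    except OSError:
        # Attribute missing (no LPM support, wrong port) or not readable.
        return None


if __name__ == "__main__":
    # "1-11" is an assumption taken from the quoted host log; adjust as needed.
    print(usb2_lpm_state("1-11"))
```

According to the same ABI document, writing `n` to that attribute as root should disable USB2 hardware LPM for the device, which would be a quick way to test the theory without rebuilding a kernel.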

It seems to be an issue on the host side, but I can't tell for sure unless we
have a USB traffic analyzer that shows what's going on. Have you
tried different hosts?

BR,
Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-07  0:10                     ` Thinh Nguyen
@ 2021-04-07  0:24                       ` Thinh Nguyen
  2021-04-07 13:34                         ` Andy Shevchenko
  0 siblings, 1 reply; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-07  0:24 UTC (permalink / raw)
  To: Ferry Toth, Felipe Balbi, Andy Shevchenko; +Cc: Alan Stern, USB

Thinh Nguyen wrote:
> Ferry Toth wrote:
>> Hi,
>>
>> Op 03-04-2021 om 23:15 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 03-04-2021 om 13:25 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 03-04-2021 om 04:02 schreef Thinh Nguyen:
>>>>> Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 02-04-2021 om 22:16 schreef Thinh Nguyen:
>>>>>>> Ferry Toth wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Op 30-03-2021 om 23:57 schreef Ferry Toth:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Op 30-03-2021 om 22:26 schreef Ferry Toth:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Op 30-03-2021 om 18:17 schreef Felipe Balbi:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I have a platform with DWC3 in Dual Role mode. Currently I'm
>>>>>>>>>>>> experimenting on v5.12-rc5 with a few patches (mostly
>>>>>>>>>>>> configuration)
>>>>>>>>>>>> applied [1]. I'm using Debian Unstable on the host machine and
>>>>>>>>>>>> BuildRoot with the above mentioned kernel on the target.
>>>>>>>>>>>>
>>>>>>>>>>>> **So, scenario 0:
>>>>>>>>>>>> 1. Run iperf3 -s on target
>>>>>>>>>>>> 2. Run iperf3 -c ... -t 0 on the host
>>>>>>>>>>>> 3.  0.00-10.36  sec   237 MBytes  192 Mbits/sec
>>>>>>>>>>>> receiver
>>>>>>>>>>>>
>>>>>>>>>>>> **Scenario 1:
>>>>>>>>>>>> 1. Now, detach USB cable, wait for several seconds, attach it
>>>>>>>>>>>> back,
>>>>>>>>>>>> repeat above:
>>>>>>>>>>>> 0.00-9.94   sec   209 MBytes   176 Mbits/sec receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note the bandwidth drop (176 vs. 192).
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 will give now the same result)
>>>>>>>>>>>>
>>>>>>>>>>>> **Scenario 2.
>>>>>>>>>>>> 1. Detach USB cable, attach a device, for example USB stick,
>>>>>>>>>>>> 2. See it being enumerated and detach it.
>>>>>>>>>>>> 3. Attach cable from host
>>>>>>>>>>>> 4 .   0.00-19.36  sec   315 MBytes   136 Mbits/sec
>>>>>>>>>>>> receiver
>>>>>>>>>>>>
>>>>>>>>>>>> Note even more bandwidth drop!
>>>>>>>>>>>>
>>>>>>>>>>>> (Repeating scenario 1 keeps the same lower bandwidth)
>>>>>>>>>>>>
>>>>>>>>>>>> NOTE, sometimes on this scenario after several seconds the
>>>>>>>>>>>> target
>>>>>>>>>>>> simply reboots (w/o any logs [from kernel] printed)!
>>>>>>>>>>>>
>>>>>>>>>>>> So, any pointers on how to debug and what can be a smoking
>>>>>>>>>>>> gun here?
>>>>>>>>>>>>
>>>>>>>>>>>> Ferry reported this in [2]. There are different kernel
>>>>>>>>>>>> versions and
>>>>>>>>>>>> tools to establish the connection (like connman vs. none in my
>>>>>>>>>>>> case).
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEppG6qq-d$
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [2]:
>>>>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/issues/31__;!!A4F2R9G_pg!KpQnudHIK6XgK6HbPaqtbVgipDmkNBWewo-euAIuBlGdtSiaQiJ8jLn9OoMEptMCrp-F$
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> dwc3 tracepoints should give some initial hints. Look at packets
>>>>>>>>>>> sizes
>>>>>>>>>>> and period of transmission. From dwc3 side, I can't think of
>>>>>>>>>>> anything we
>>>>>>>>>>> would do to throttle the transmission, but tracepoints should
>>>>>>>>>>> tell a
>>>>>>>>>>> clearer story.
>>>>>>>>>>>
>>>>>>>>>> My testing (but yes, with a different kernel and network managed by
>>>>>>>>>> connman) shows:
>>>>>>>>>>
>>>>>>>>>> 1) on cold boot eem network gadget works fine
>>>>>>>>>>
>>>>>>>>>> 2) after unplug or warm reboot (which is also an unplug) it's broken:
>>>>>>>>>> speed is lost (12.0 Mbits/sec from 200Mb/s normally), packets are
>>>>>>>>>> lost, no configuration is received from dhcp, there are occasional
>>>>>>>>>> reboots, and the only way to fix it is a cold boot
>>>>>>>>>>
>>>>>>>>>> 3) if before unplug `connmanctl disable gadget`, on replugging and
>>>>>>>>>> enabling it works fine
>>>>>>>>>>
>>>>>>>>>> My theory is that some HW register is disturbed on a surprise
>>>>>>>>>> unplug,
>>>>>>>>>> but not reset on plug or warm boot. But on cold boot is cleared.
>>>>>>>>>> Maybe that can help to narrow down tracepoints?
>>>>>>>>>>
>>>>>>>>> I captured a plug after warm and after cold boot. This includes
>>>>>>>>> network setup (dhcp). You can find it in [2] or directly link here:
>>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6232410/boot.zip
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> While the above traces in boot.zip allow comparing which regs are not
>>>>>>>> correctly initialized on warm boot, I have now captured traces of
>>>>>>>> unplug/plug.
>>>>>>>>
>>>>>>>> Here kernel is 5.10.27 (LTS), cold booted with USB cable plugged
>>>>>>>> and the
>>>>>>>> eem gadget network setup (dhcp). Then trace unplug. Then trace plug.
>>>>>>>>
>>>>>>>> After plug the eem connection is again broken.
>>>>>>>>
>>>>>>>> This might allow figuring out what goes wrong on unplug. Traces
>>>>>>>> here:
>>>>>>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6250924/plug-unplug.zip
>>>>>>>>
>>>>>>>>
>>>>>>>> **
>>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Were you able to narrow down the issue to only DWC3 device? (i.e. you
>>>>>>> tested with different hosts and different device controllers to
>>>>>>> confirm
>>>>>>> this)
>>>>>> I haven't tried with other devices. I have been forced to replace my
>>>>>> host mobo and nothing changed. But I didn't pay attention to the
>>>>>> particular host controller.
>>>>>>
>>>>> It'd be better if we can narrow down the culprit as this seems to me
>>>>> like a synchronization issue at the upper layer between the host and
>>>>> device.
>>>>>
>>>>>>> Did you see this issue previously? If not, is it possible to do git
>>>>>>> bisection?
>>>>>> This is with Intel Edison where main line usb gadget support appeared
>>>>>> around 4.19 iirc. I believed the problem appeared between 5.4 and 5.7
>>>>>> and tried to bisect but failed.
>>>>>>
>>>>>> I realize only now that I failed because:
>>>>>> 1) 5.4 already has this issue as I recently retested
>>>>> I'm confused, why do you believe the problem is between 5.4 and 5.7 if
>>>>> 5.4 already has this issue? So when did you start seeing this problem?
>>>>
>>>> Because at the time of 5.4 I didn't notice the issue as I normally
>>>> did cold boots due to other problems on warm boot (i.e. sdhc
>>>> inaccessible).
>>>>
>>>> I never knew that on a cold boot it works. Even during bisecting I
>>>> didn't know until the end, and then I found 5.4 has the same problem
>>>> as all the later kernels (tested up to 5.11).
>>>>
>>>>> Also, these kernel versions are really old, there's been a lot of
>>>>> updates/fixes to dwc3 since then. Can we run tests on the latest
>>>>> kernel?
>>>>
>>>> I have tested 5.10.27, 5.11.0 and 5.11.4-rt11.
>>>>
>>>> But of course I am completely prepared to run Andy's latest
>>>> (v5.12-rc5) on the device.
>>>>
>>>>>> 2) I didn't use a reproducible criterion. After warm reboot the eem
>>>>>> gadget fails, but you can flip the host/gadget switch back and
>>>>>> forth and
>>>>>> have the illusion that the connection restored.
>>>>>>
>>>>>> The scenario described here is reproducible: leaving the switch in
>>>>>> gadget mode eem works after cold boot only. And it likely breaks on
>>>>>> unplug.
>>>>>>
>>>>>> A 2nd hint is that disabling gadget (I used `connmanctl disable gadget`
>>>>>> but I believe that has the same effect as `ip link set dev usb0 down`)
>>>>>> before unplug prevents messing up the driver, so you can replug and
>>>>>> enable again.
>>>>> These data points are good. However, we'd need to know where to look
>>>>> first. The issue isn't obvious from the DWC3 controller or the DWC3
>>>>> driver.
>>>>>
>>>>> Can you check a few things:
>>>>> 1) Any error/timeout messages from the host's dmesg? Or device side?
>>>>
>>>> I'll add log from the host side.
>>>>
>>>> For now I only see (on a warm plug):
>>>>
>>>> kernel: usb 1-11: can't set config #1, error -110
>>>>
>>>>> 2) What kernel version is your host using? Can you use the latest for
>>>>> both host and device?
>>>>
>>>> The host is ubuntu's amd64 5.8.0-48-generic.
>>>>
>>>> I will test with v5.12-rc5  from ubuntu kernel ppa on the host. And
>>>> Andy's latest (v5.12-rc5) on the device.
>>>
>>> I upgraded the host kernel, but not yet the device, and captured relevant
>>> host journal messages and device traces. Something did change: after cold
>>> boot I don't get an eem connection until after I unplug/replug. I then
>>> traced an iperf transfer. Then after again unplug/replug I get the
>>> throttled connection, which I also traced.
>>>
>>> See
>>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6253414/transfer.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_101A7wlQD$
>>>
>>
>> Now, with host updated to ubuntu kernel ppa 5.12.0-051200rc5-generic and
>> edison to 5.12.0-rc5-edison-acpi-standard vanilla + 2 patches appearing
>> in rc6:
>>
>> * "usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable"
>> * "usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield"
>>
>> plus one from
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/commits/eds-acpi__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_105iEe4TE$
>>
>> * "TODO: driver core: Break infinite loop when deferred probe can't be
>> satisfied"
>>
>> I captured one good and one bad connection, plus logs on the host side
>> see journalctl-plus-comments.txt in
>> https://urldefense.com/v3/__https://github.com/andy-shev/linux/files/6260614/5.12-rc5.zip__;!!A4F2R9G_pg!IYAMgA0GMmo4BNpeXScb5Aix0IrxsxJhdCh9d5-75fAnJtSwcSG5e1az-x_10w5OhpD1$
>>
>>>
>>>> I am expecting results this evening.
>>>>
>>>>> 3) Snapshot of dwc3 tracepoints of active transfers between the normal
>>>>> vs throttled of the latest kernel
>>>>
>>>> I don't know if the problem I see is really throttling.
>>>>
>>>> I can trace an active transfer, but that does actually throttle from
>>>> 200Mb/s down to 139Mb/s and produces a trace of 53MB. (2x1sec of
>>>> iperf3).
>>>>
> 
> 
> I took a look at the "bad" and "normal" tracepoints. There are a few
> 1-second delays where the host tried to bring the device back and
> resume from low power:
> 
>      ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>      ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>      ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>           <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>           <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>           <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>           <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>      irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>      irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>      irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>      irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>      irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>      irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>      irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
> 
> 
> Your device is operating in highspeed right? Try to turn off LPM from
> host and see if that helps with the speed throttling issue. (If you're
> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
> the connection issue you saw.
> 
> It seems to be an issue from host, but I can't tell for sure unless we
> have some USB traffic analyzer that shows what's going on. Have you
> tried different hosts?
> 

You can also disable LPM from the gadget side by setting
dwc->dis_enblslpm_quirk.

BR,
Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-07  0:24                       ` Thinh Nguyen
@ 2021-04-07 13:34                         ` Andy Shevchenko
  2021-04-07 16:08                           ` Ferry Toth
  2021-04-08 20:17                           ` Ferry Toth
  0 siblings, 2 replies; 27+ messages in thread
From: Andy Shevchenko @ 2021-04-07 13:34 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Ferry Toth, Felipe Balbi, Alan Stern, USB

On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> Thinh Nguyen wrote:

...

> > I took a look at the "bad" and "normal" tracepoints. There are a few
> > 1-second delays where the host tried to bring the device back and
> > resume from low power:
> >
> >      ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
> >      ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
> >      ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
> >           <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
> >           <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
> >           <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
> >           <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
> >      irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
> >      irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
> >      irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
> >      irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
> >      irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
> >      irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
> >      irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
> >
> >
> > Your device is operating in highspeed right? Try to turn off LPM from
> > host and see if that helps with the speed throttling issue. (If you're
> > using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
> > the connection issue you saw.
> >
> > It seems to be an issue from host, but I can't tell for sure unless we
> > have some USB traffic analyzer that shows what's going on. Have you
> > tried different hosts?
> >
>
> You can also disable LPM from the gadget side by setting
> dwc->dis_enblslpm_quirk.

Ferry, it can be done by adding a corresponding property to
dwc3-pci.c for the Intel Merrifield platform. I'll also check my case,
and perhaps I can collect some traces later on when I have
more time for that.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-07 13:34                         ` Andy Shevchenko
@ 2021-04-07 16:08                           ` Ferry Toth
  2021-04-08 20:17                           ` Ferry Toth
  1 sibling, 0 replies; 27+ messages in thread
From: Ferry Toth @ 2021-04-07 16:08 UTC (permalink / raw)
  To: Andy Shevchenko, Thinh Nguyen; +Cc: Felipe Balbi, Alan Stern, USB

Hi,

Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>> Thinh Nguyen wrote:
> ...
>
>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>> 1-second delays where the host tried to bring the device back and
>>> resume from low power:
>>>
>>>       ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>>>       ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>>>       ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>>>            <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>>>            <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>>>            <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>>>            <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>>>       irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>>>       irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>>>       irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>       irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>       irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>>>       irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>       irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
>>>
>>>
>>> Your device is operating in highspeed right? Try to turn off LPM from
>>> host and see if that helps with the speed throttling issue. (If you're
>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>> the connection issue you saw.
>>>
>>> It seems to be an issue from host, but I can't tell for sure unless we
>>> have some USB traffic analyzer that shows what's going on. Have you
>>> tried different hosts?
>>>
>> You can also disable LPM from the gadget side by setting
>> dwc->dis_enblslpm_quirk.
> Ferry, it can be done by adding a corresponding property to the
> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
> and perhaps I can collect some traces in my case later on when I have
> more time for that.
>
Thanks guys. Indeed, it is xHCI on the host side. I'll also try from another
machine later.

I'll try adding a property and report back (but not today, it's my wedding
day :-) )


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-07 13:34                         ` Andy Shevchenko
  2021-04-07 16:08                           ` Ferry Toth
@ 2021-04-08 20:17                           ` Ferry Toth
  2021-04-08 21:12                             ` Thinh Nguyen
  1 sibling, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-08 20:17 UTC (permalink / raw)
  To: Andy Shevchenko, Thinh Nguyen; +Cc: Felipe Balbi, Alan Stern, USB

Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>> Thinh Nguyen wrote:
> ...
>
>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>> 1-second delays where the host tried to bring the device back and
>>> resume from low power:
>>>
>>>       ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>>>       ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>>>       ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>>>            <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>>>            <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>>>            <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>>>            <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>>>       irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>>>       irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>>>       irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>       irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>       irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>>>       irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>       irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
>>>
>>>
>>> Your device is operating in highspeed right? Try to turn off LPM from
>>> host and see if that helps with the speed throttling issue. (If you're
>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>> the connection issue you saw.
>>>
>>> It seems to be an issue from host, but I can't tell for sure unless we
>>> have some USB traffic analyzer that shows what's going on. Have you
>>> tried different hosts?
>>>
>> You can also disable LPM from the gadget side by setting
>> dwc->dis_enblslpm_quirk.
> Ferry, it can be done by adding a corresponding property to the
> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
> and perhaps I can collect some traces in my case later on when I have
> more time for that.
>
Ok thanks all. Here is what I tried:

On another computer (Acer 720P brainwashed chromebook), I tried both full
speed and high speed. It still throttles, but less badly.

Then on the desktop, with Edison kernel 5.12-rc5 as above plus this patch:

diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index 4c5c6972124a..a9268c085840 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
 	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
 	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
 	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
+	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
 	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
 	{}
 };

This fixes the throttling but reveals that I actually had at least 2 bugs:

1) throttling due to LPM, this seems solved now, thanks so much!

2) a problem with usb plug detection

When I unplug/replug the gadget cable I need to do that at least another 
time before gadget is detected. So unplug/replug/unplug/replug seems to 
work.

Also this platform has a HW switch to select host/device mode, with 
separate connectors for host and device.

When I flip the switch to host it immediately changes to host.

Flipping to device leaves the LEDs on my connected usb hub on, so it's 
still powered (but not operational).

Flipping fast host/device (within 1/2 sec) turns the hub LEDs off. But I
still need to additionally unplug/replug the gadget cable to get that to
work.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-08 20:17                           ` Ferry Toth
@ 2021-04-08 21:12                             ` Thinh Nguyen
  2021-04-08 21:37                               ` Thinh Nguyen
  2021-04-09 13:26                               ` Ferry Toth
  0 siblings, 2 replies; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-08 21:12 UTC (permalink / raw)
  To: Ferry Toth, Andy Shevchenko, Thinh Nguyen; +Cc: Felipe Balbi, Alan Stern, USB

Ferry Toth wrote:
> Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>> <Thinh.Nguyen@synopsys.com> wrote:
>>> Thinh Nguyen wrote:
>> ...
>>
>>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>>> 1-second delays where the host tried to bring the device back and
>>>> resume from low power:
>>>>
>>>>       ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>>>>       ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>>>>       ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>>>>            <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>>>>            <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>>>>            <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>>>>            <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>>>>       irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>>>>       irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>>>>       irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>       irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>       irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>>>>       irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>       irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>
>>>>
>>>> Your device is operating in highspeed right? Try to turn off LPM from
>>>> host and see if that helps with the speed throttling issue. (If you're
>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>>> the connection issue you saw.
>>>>
>>>> It seems to be an issue from host, but I can't tell for sure unless we
>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>> tried different hosts?
>>>>
>>> You can also disable LPM from the gadget side by setting
>>> dwc->dis_enblslpm_quirk.
>> Ferry, it can be done by adding a corresponding property to the
>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>> and perhaps I can collect some traces in my case later on when I have
>> more time for that.
>>
> Ok thanks all. Here is what I tried:
> 
> Another computer (Acer 720P brainwashed chromebook), I tried both full
> speed and high speed. Still throttling but less bad.
> 
> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
> 
> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
> index 4c5c6972124a..a9268c085840 100644
> --- a/drivers/usb/dwc3/dwc3-pci.c
> +++ b/drivers/usb/dwc3/dwc3-pci.c
> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>  	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>  	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>  	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
> +	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>  	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>  	{}
>  };
> 
> This fixes the throttling but reveals I had actually at least 2 bugs:
> 
> 1) throttling due to LPM, this seems solved now, thanks to much!

Now that we can confirm the speed throttling is related to LPM, we can
try to experiment further. (IMO, LPM is an important feature, and
totally disabling it seems like using a sledgehammer to crack a nut.)

I suspect that your phy/HW has a higher low power exit latency. I don't
think you provided any HIRD threshold property in your setup, right? So
by default, dwc3 sets the baseline BESL value to 1 (or 150us). Unless
you know what your phy/HW is capable of, try to test and increase the
recommended BESL value. The range can be from 0 to 15 where 0 is 150us
and 15 is 10ms. Maybe try 6 (i.e. 1ms).

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 60e850a395a2..423533df8927 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct usb_gadget *g,
                 * recommended BESL baseline to 1 and clamp the BESL deep to be
                 * within 2 to 15.
                 */
-               params->besl_baseline = 1;
+               params->besl_baseline = 6;
                if (dwc->is_utmi_l1_suspend)
                        params->besl_deep =
                                clamp_t(u8, dwc->hird_threshold, 2, 15);
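
For reference, here is a rough userspace sketch (not kernel code) of what the clamp in that hunk does to the HIRD threshold. The helper names clamp_u8() and besl_deep_from_hird() are made up for illustration, and clamp_u8() is only a stand-in for the kernel's clamp_t(u8, ...):

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's clamp_t(u8, val, lo, hi).
 * The name clamp_u8() is hypothetical, it does not exist in the kernel.
 */
static uint8_t clamp_u8(unsigned int val, uint8_t lo, uint8_t hi)
{
	uint8_t v = (uint8_t)val;	/* clamp_t() casts to the given type first */

	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * BESL deep as computed in the hunk above when is_utmi_l1_suspend is set:
 * the HIRD threshold is forced into the range 2..15.
 */
static uint8_t besl_deep_from_hird(unsigned int hird_threshold)
{
	return clamp_u8(hird_threshold, 2, 15);
}
```

So with no HIRD threshold property (value 0) the deep BESL would end up as 2, and a value of 12 would pass through unchanged.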



> 
> 2) a problem with usb plug detection
> 
> When I unplug/replug the gadget cable I need to do that at least another
> time before gadget is detected. So unplug/replug/unplug/replug seems to
> work.
> 
> Also this platform has a HW switch to select host/device mode, with
> separate connectors for host and device.
> 
> When I flip the switch to host it immediately changes to host.
> 
> Flipping to device leaves the LEDs on my connected usb hub on, so it's
> still powered (but not operational).
> 
> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
> still need to additionally unplug/replug the gadget cable to get that to
> work.
> 

The connection issue can come from different things. Please narrow it down
and make sure that you don't use any defective cable or bad hub. Even then,
it's difficult to determine whose fault it is from just the dmesg and driver
logs alone without looking at the USB traffic at the packet level.

Btw, is your setup DRD? If you're switching mode, then I know that dwc3 right
now doesn't implement mode switching correctly.

You can see the discussion we have here:
https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t

BR,
Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-08 21:12                             ` Thinh Nguyen
@ 2021-04-08 21:37                               ` Thinh Nguyen
  2021-04-09 13:26                               ` Ferry Toth
  1 sibling, 0 replies; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-08 21:37 UTC (permalink / raw)
  To: Ferry Toth, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Thinh Nguyen wrote:
> Ferry Toth wrote:
>> On 07-04-2021 at 15:34, Andy Shevchenko wrote:
>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>> Thinh Nguyen wrote:
>>> ...
>>>
>>>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>>>> 1-second delays where the host tried to bring the device back and
>>>>> resume from low power:
>>>>>
>>>>>       ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>       ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr
>>>>> 00000000d68ecd36 value 0000a610
>>>>>       ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr
>>>>> 00000000d68ecd36 value 0000a710
>>>>>            <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr
>>>>> 00000000a15e0e35 value 00000034
>>>>>            <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr
>>>>> 00000000bb67b585 value 00001000
>>>>>            <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr
>>>>> 00000000bb67b585 value 80001000
>>>>>            <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr
>>>>> 00000000a15e0e35 value 00000034
>>>>>       irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event
>>>>> (00000401): WakeUp [U0]
>>>>>       irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event
>>>>> (00000401): WakeUp [U0]
>>>>>       irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event
>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>       irq/15-dwc3-476     [000] d...   232.499501:
>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>       irq/15-dwc3-476     [000] d...   232.499518:
>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>> zsI ==> 0
>>>>>       irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>       irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb:
>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>
>>>>>
>>>>> Your device is operating in highspeed right? Try to turn off LPM from
>>>>> host and see if that helps with the speed throttling issue. (If you're
>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>>>> the connection issue you saw.
>>>>>
>>>>> It seems to be an issue from host, but I can't tell for sure unless we
>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>> tried different hosts?
>>>>>
>>>> You can also disable LPM from the gadget side by setting
>>>> dwc->dis_enblslpm_quirk.
>>> Ferry, it can be done by adding a corresponding property to the
>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>> and perhaps I can collect some traces in my case later on when I have
>>> more time for that.
>>>
>> Ok thanks all. Here is what I tried:
>>
>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>> speed and high speed. Still throttling but less bad.
>>
>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>
>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>> index 4c5c6972124a..a9268c085840 100644
>> --- a/drivers/usb/dwc3/dwc3-pci.c
>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>>  	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>  	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>  	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>> +	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>  	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>  	{}
>>  };
>>
>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>
>> 1) throttling due to LPM, this seems solved now, thanks to much!
> 
> Now that we can confirm the speed throttling is related to LPM. We can
> try to experiment further. (IMO, LPM is an important feature and
> totally disabling LPM seems like using a sledgehammer to crack a nut)
> 
> I suspect that your phy/HW has a higher low power exit latency. I don't
> think you provided any HIRD threshold property in your setup right? So
> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
> you know what your phy/HW is capable of, try to test and increase the
> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
> 

Correction... 0 is 125us, 1 is 150us.
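
For reference, a small sketch of the full BESL encoding as I understand it from the USB 2.0 LPM errata (I believe the xHCI driver carries the same values in its BESL encoding table). The names besl_us and besl_to_us() are made up for this example:

```c
#include <stdint.h>

/*
 * BESL (Best Effort Service Latency) encoding, in microseconds.
 * The index is the 4-bit BESL value. Table name is hypothetical.
 */
static const uint16_t besl_us[16] = {
	125, 150, 200, 300, 400, 500, 1000, 2000,
	3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
};

/* Returns the latency in microseconds, or 0 if besl is out of range. */
static unsigned int besl_to_us(unsigned int besl)
{
	return besl < 16 ? besl_us[besl] : 0;
}
```

With this table, the default baseline of 1 maps to 150us and the suggested value of 6 maps to 1000us (1ms).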

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-08 21:12                             ` Thinh Nguyen
  2021-04-08 21:37                               ` Thinh Nguyen
@ 2021-04-09 13:26                               ` Ferry Toth
  2021-04-10 13:29                                 ` Ferry Toth
  1 sibling, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-09 13:26 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Hi,

On 08-04-2021 at 23:12, Thinh Nguyen wrote:
> Ferry Toth wrote:
>> On 07-04-2021 at 15:34, Andy Shevchenko wrote:
>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>> Thinh Nguyen wrote:
>>> ...
>>>
>>>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>>>> 1-second delays where the host tried to bring the device back and
>>>>> resume from low power:
>>>>>
>>>>>        ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>        ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr
>>>>> 00000000d68ecd36 value 0000a610
>>>>>        ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr
>>>>> 00000000d68ecd36 value 0000a710
>>>>>             <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr
>>>>> 00000000a15e0e35 value 00000034
>>>>>             <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr
>>>>> 00000000bb67b585 value 00001000
>>>>>             <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr
>>>>> 00000000bb67b585 value 80001000
>>>>>             <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr
>>>>> 00000000a15e0e35 value 00000034
>>>>>        irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event
>>>>> (00000401): WakeUp [U0]
>>>>>        irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event
>>>>> (00000401): WakeUp [U0]
>>>>>        irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event
>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>        irq/15-dwc3-476     [000] d...   232.499501:
>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>        irq/15-dwc3-476     [000] d...   232.499518:
>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>> zsI ==> 0
>>>>>        irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>        irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb:
>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>
>>>>>
>>>>> Your device is operating in highspeed right? Try to turn off LPM from
>>>>> host and see if that helps with the speed throttling issue. (If you're
>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>>>> the connection issue you saw.
>>>>>
>>>>> It seems to be an issue from host, but I can't tell for sure unless we
>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>> tried different hosts?
>>>>>
>>>> You can also disable LPM from the gadget side by setting
>>>> dwc->dis_enblslpm_quirk.
>>> Ferry, it can be done by adding a corresponding property to the
>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>> and perhaps I can collect some traces in my case later on when I have
>>> more time for that.
>>>
>> Ok thanks all. Here is what I tried:
>>
>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>> speed and high speed. Still throttling but less bad.
>>
>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>
>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>> index 4c5c6972124a..a9268c085840 100644
>> --- a/drivers/usb/dwc3/dwc3-pci.c
>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>>  	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>  	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>  	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>> +	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>  	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>  	{}
>>  };
>>
>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>
>> 1) throttling due to LPM, this seems solved now, thanks to much!
> Now that we can confirm the speed throttling is related to LPM. We can
> try to experiment further. (IMO, LPM is an important feature and
> totally disabling LPM seems like using a sledgehammer to crack a nut)
>
> I suspect that your phy/HW has a higher low power exit latency. I don't
> think you provided any HIRD threshold property in your setup right? So
> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
> you know what your phy/HW is capable of, try to test and increase the
> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 60e850a395a2..423533df8927 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct usb_gadget *g,
>                   * recommended BESL baseline to 1 and clamp the BESL deep to be
>                   * within 2 to 15.
>                   */
> -               params->besl_baseline = 1;
> +               params->besl_baseline = 6;
>                  if (dwc->is_utmi_l1_suspend)
>                          params->besl_deep =
>                                  clamp_t(u8, dwc->hird_threshold, 2, 15);
>
I will try and report back, hopefully this evening.
>
>> 2) a problem with usb plug detection
>>
>> When I unplug/replug the gadget cable I need to do that at least another
>> time before gadget is detected. So unplug/replug/unplug/replug seems to
>> work.
>>
>> Also this platform has a HW switch to select host/device mode, with
>> separate connectors for host and device.
>>
>> When I flip the switch to host it immediately changes to host.
>>
>> Flipping to device leaves the LEDs on my connected usb hub on, so it's
>> still powered (but not operational).
>>
>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>> still need to additionally unplug/replug the gadget cable to get that to
>> work.
>>
> The connection issue can come from different things. Please narrow it down
> and make sure that you don't use any defective cable or bad hub. Even then,
> it's difficult to determine whose fault it is from just the dmesg and driver
> logs alone without looking at the USB traffic at the packet level.
>
> Btw, is your setup DRD? If you're switching mode, then I know that dwc3 right
> now doesn't implement mode switching correctly.
Yes, we use the Extcon driver to support DRD.
> You can see the discussion we have here:
> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t
I see, that might indeed be related. I will try the patches to see if 
that works and report back.
> BR,
> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-09 13:26                               ` Ferry Toth
@ 2021-04-10 13:29                                 ` Ferry Toth
  2021-04-10 14:08                                   ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-10 13:29 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

On 09-04-2021 at 15:26, Ferry Toth wrote:
> Hi,
>
> On 08-04-2021 at 23:12, Thinh Nguyen wrote:
>> Ferry Toth wrote:
>>> On 07-04-2021 at 15:34, Andy Shevchenko wrote:
>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>> Thinh Nguyen wrote:
>>>> ...
>>>>
>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>>>>> 1-second delays where the host tried to bring the device back and
>>>>>> resume from low power:
>>>>>>
>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr
>>>>>> 00000000d68ecd36 value 0000a610
>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr
>>>>>> 00000000d68ecd36 value 0000a710
>>>>>>             <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr
>>>>>> 00000000a15e0e35 value 00000034
>>>>>>             <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr
>>>>>> 00000000bb67b585 value 00001000
>>>>>>             <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr
>>>>>> 00000000bb67b585 value 80001000
>>>>>>             <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr
>>>>>> 00000000a15e0e35 value 00000034
>>>>>>        irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event
>>>>>> (00000401): WakeUp [U0]
>>>>>>        irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event
>>>>>> (00000401): WakeUp [U0]
>>>>>>        irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event
>>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>        irq/15-dwc3-476     [000] d...   232.499501:
>>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>        irq/15-dwc3-476     [000] d...   232.499518:
>>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>>> zsI ==> 0
>>>>>>        irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>        irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb:
>>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>
>>>>>>
>>>>>> Your device is operating in highspeed right? Try to turn off LPM from
>>>>>> host and see if that helps with the speed throttling issue. (If you're
>>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>>>>> the connection issue you saw.
>>>>>>
>>>>>> It seems to be an issue from host, but I can't tell for sure unless we
>>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>>> tried different hosts?
>>>>>>
>>>>> You can also disable LPM from the gadget side by setting
>>>>> dwc->dis_enblslpm_quirk.
>>>> Ferry, it can be done by adding a corresponding property to the
>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>>> and perhaps I can collect some traces in my case later on when I have
>>>> more time for that.
>>>>
>>> Ok thanks all. Here is what I tried:
>>>
>>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>>> speed and high speed. Still throttling but less bad.
>>>
>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>>
>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>>> index 4c5c6972124a..a9268c085840 100644
>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>>>  	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>  	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>  	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>> +	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>  	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>  	{}
>>>  };
>>>
>>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>>
>>> 1) throttling due to LPM, this seems solved now, thanks to much!
>> Now that we can confirm the speed throttling is related to LPM. We can
>> try to experiment further. (IMO, LPM is an important feature and
>> totally disabling LPM seems like using a sledgehammer to crack a nut)
>>
>> I suspect that your phy/HW has a higher low power exit latency. I don't
>> think you provided any HIRD threshold property in your setup right? So
>> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
>> you know what your phy/HW is capable of, try to test and increase the
>> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
>> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 60e850a395a2..423533df8927 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct usb_gadget *g,
>>                   * recommended BESL baseline to 1 and clamp the BESL deep to be
>>                   * within 2 to 15.
>>                   */
>> -               params->besl_baseline = 1;
>> +               params->besl_baseline = 6;
>>                  if (dwc->is_utmi_l1_suspend)
>>                          params->besl_deep =
>>                                  clamp_t(u8, dwc->hird_threshold, 2, 15);
>>
> I will try and report back, hopefully this evening.
I tried this and it seems to have the same effect as dis_enblslpm_quirk.
>>
>>> 2) a problem with usb plug detection
>>>
>>> When I unplug/replug the gadget cable I need to do that at least another
>>> time before gadget is detected. So unplug/replug/unplug/replug seems to
>>> work.
>>>
>>> Also this platform has a HW switch to select host/device mode, with
>>> separate connectors for host and device.
>>>
>>> When I flip the switch to host it immediately changes to host.
>>>
>>> Flipping to device leaves the LEDs on my connected usb hub on, so it's
>>> still powered (but not operational).
>>>
>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>>> still need to additionally unplug/replug the gadget cable to get that to
>>> work.
>>>
>> The connection issue can come from different things. Please narrow it down
>> and make sure that you don't use any defective cable or bad hub. Even then,
>> it's difficult to determine whose fault it is from just the dmesg and driver
>> logs alone without looking at the USB traffic at the packet level.
>>
>> Btw, is your setup DRD? If you're switching mode, then I know that dwc3 right
>> now doesn't implement mode switching correctly.
> Yes, we use Extcon driver to support DRD.
>> You can see the discussion we have here:
>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t 
>>
> I see, that might indeed be related. I will try the patches to see if 
> that works and report back.

I applied both patches:

usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD

usb: dwc3: Fix DRD mode change sequence following programming guide

It has no effect on the need to unplug/replug, nor on the problems
switching between host and device mode.

But it doesn't hurt either.

>> BR,
>> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-10 13:29                                 ` Ferry Toth
@ 2021-04-10 14:08                                   ` Ferry Toth
  2021-04-11  0:04                                     ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-10 14:08 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Hi, some corrections below.

On 10-04-2021 at 15:29, Ferry Toth wrote:
> On 09-04-2021 at 15:26, Ferry Toth wrote:
>> Hi,
>>
>> On 08-04-2021 at 23:12, Thinh Nguyen wrote:
>>> Ferry Toth wrote:
>>>> On 07-04-2021 at 15:34, Andy Shevchenko wrote:
>>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>>> Thinh Nguyen wrote:
>>>>> ...
>>>>>
>>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a few
>>>>>>> 1-second delays where the host tried to bring the device back and
>>>>>>> resume from low power:
>>>>>>>
>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr
>>>>>>> 00000000d68ecd36 value 0000a610
>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr
>>>>>>> 00000000d68ecd36 value 0000a710
>>>>>>>             <idle>-0       [000] d.h.   232.499418: dwc3_readl: addr
>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>             <idle>-0       [000] d.h.   232.499423: dwc3_readl: addr
>>>>>>> 00000000bb67b585 value 00001000
>>>>>>>             <idle>-0       [000] d.h.   232.499425: dwc3_writel: addr
>>>>>>> 00000000bb67b585 value 80001000
>>>>>>>             <idle>-0       [000] d.h.   232.499427: dwc3_writel: addr
>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event
>>>>>>> (00000401): WakeUp [U0]
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event
>>>>>>> (00000401): WakeUp [U0]
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event
>>>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499501:
>>>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499518:
>>>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>>>> zsI ==> 0
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>>        irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb:
>>>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>>
>>>>>>>
>>>>>>> Your device is operating in highspeed right? Try to turn off LPM from
>>>>>>> host and see if that helps with the speed throttling issue. (If you're
>>>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help with
>>>>>>> the connection issue you saw.
>>>>>>>
>>>>>>> It seems to be an issue from host, but I can't tell for sure unless we
>>>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>>>> tried different hosts?
>>>>>>>
>>>>>> You can also disable LPM from the gadget side by setting
>>>>>> dwc->dis_enblslpm_quirk.
>>>>> Ferry, it can be done by adding a corresponding property to the
>>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>>>> and perhaps I can collect some traces in my case later on when I have
>>>>> more time for that.
>>>>>
>>>> Ok thanks all. Here is what I tried:
>>>>
>>>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>>>> speed and high speed. Still throttling but less bad.
>>>>
>>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>>>
>>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>>>> index 4c5c6972124a..a9268c085840 100644
>>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>>> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>>>>  	PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>>  	PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>>  	PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>>> +	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>>  	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>>  	{}
>>>>  };
>>>>
>>>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>>>
>>>> 1) throttling due to LPM, this seems solved now, thanks to much!
>>> Now that we can confirm the speed throttling is related to LPM. We can
>>> try to experiment further. (IMO, LPM is an important feature and
>>> totally disabling LPM seems like using a sledgehammer to crack a nut)
>>>
>>> I suspect that your phy/HW has a higher low power exit latency. I don't
>>> think you provided any HIRD threshold property in your setup right? So
>>> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
>>> you know what your phy/HW is capable of, try to test and increase the
>>> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
>>> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index 60e850a395a2..423533df8927 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct 
>>> usb_gadget *g,
>>>                   * recommended BESL baseline to 1 and clamp the 
>>> BESL deep to be
>>>                   * within 2 to 15.
>>>                   */
>>> -               params->besl_baseline = 1;
>>> +               params->besl_baseline = 6;
>>>                  if (dwc->is_utmi_l1_suspend)
>>>                          params->besl_deep =
>>>                                  clamp_t(u8, dwc->hird_threshold, 2, 
>>> 15);
>>>
>> I will try and report back, hopefully this evening.
> I tried this and it seems to have the same effect as dis_enblslpm_quirk
>>>
>>>> 2) a problem with usb plug detection
>>>>
>>>> When I unplug/replug the gadget cable I need to do that at least 
>>>> another
>>>> time before gadget is detected. So unplug/replug/unplug/replug 
>>>> seems to
>>>> work.
>>>>
>>>> Also this platform has a HW switch to select host/device mode, with
>>>> separate connectors for host and device.
>>>>
>>>> When I flip the switch to host it immediately changes to host.
>>>>
>>>> Flipping to device leaves the LEDs on my connected usb hub on, so it's
>>>> still powered (but not operational).
>>>>
>>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>>>> still need to additionally unplug/replug the gadget cable to get 
>>>> that to
>>>> work.
>>>>
>>> The connection issue can come from different things. Please narrow 
>>> it down
>>> and make sure that you don't use any defective cable or bad hub. 
>>> Even then,
>>> it's difficult to determine whose fault it is from just the dmesg 
>>> and driver
>>> logs alone without looking at the USB traffic at the packet level.
>>>
>>> Btw, is your setup DRD? If you're switching mode, then I know that 
>>> dwc3 right
>>> now doesn't implement mode switching correctly.
>> Yes, we use Extcon driver to support DRD.
>>> You can see the discussion we have here:
>>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t 
>>>
>> I see, that might indeed be related. I will try the patches to see if 
>> that works and report back.
>
> I applied both patches:
>
> usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD
>
> usb: dwc3: Fix DRD mode change sequence following programming guide
>
> It doesn't have an effect on the need to unplug/replug neither on the 
> problems switch from host/device mode.

When I test the correct kernel it does have an effect :-)

In most cases the need to unplug/replug is removed, but not always. In
the cases where I need to retry, the host journal shows "can't set config
#1, error -110".

The switch from host->device and device->host mode seems to be resolved.

Strangely, iperf3 now reports 130 Mbits/sec (down from 200 Mbits/sec).

>
> But it doesn't hurt either.
>
>>> BR,
>>> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-10 14:08                                   ` Ferry Toth
@ 2021-04-11  0:04                                     ` Thinh Nguyen
  2021-04-11 15:26                                       ` Ferry Toth
  0 siblings, 1 reply; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-11  0:04 UTC (permalink / raw)
  To: Ferry Toth, Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Ferry Toth wrote:
> Hi, some corrections below.
> 
> Op 10-04-2021 om 15:29 schreef Ferry Toth:
>> Op 09-04-2021 om 15:26 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 08-04-2021 om 23:12 schreef Thinh Nguyen:
>>>> Ferry Toth wrote:
>>>>> Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
>>>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>>>> Thinh Nguyen wrote:
>>>>>> ...
>>>>>>
>>>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a
>>>>>>>> few
>>>>>>>> 1-second delays where the host tried to bring the device back and
>>>>>>>> resume from low power:
>>>>>>>>
>>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl:
>>>>>>>> addr
>>>>>>>> 00000000d68ecd36 value 0000a610
>>>>>>>>        ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel:
>>>>>>>> addr
>>>>>>>> 00000000d68ecd36 value 0000a710
>>>>>>>>             <idle>-0       [000] d.h. 232.499418: dwc3_readl: addr
>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>             <idle>-0       [000] d.h. 232.499423: dwc3_readl: addr
>>>>>>>> 00000000bb67b585 value 00001000
>>>>>>>>             <idle>-0       [000] d.h. 232.499425: dwc3_writel: addr
>>>>>>>> 00000000bb67b585 value 80001000
>>>>>>>>             <idle>-0       [000] d.h. 232.499427: dwc3_writel: addr
>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499480: dwc3_event:
>>>>>>>> event
>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499492: dwc3_event:
>>>>>>>> event
>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499496: dwc3_event:
>>>>>>>> event
>>>>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499501:
>>>>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499518:
>>>>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>>>>> zsI ==> 0
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>>>        irq/15-dwc3-476     [000] d...   232.499601:
>>>>>>>> dwc3_prepare_trb:
>>>>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>>>
>>>>>>>>
>>>>>>>> Your device is operating in highspeed right? Try to turn off LPM
>>>>>>>> from
>>>>>>>> host and see if that helps with the speed throttling issue. (If
>>>>>>>> you're
>>>>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help
>>>>>>>> with
>>>>>>>> the connection issue you saw.
>>>>>>>>
>>>>>>>> It seems to be an issue from host, but I can't tell for sure
>>>>>>>> unless we
>>>>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>>>>> tried different hosts?
>>>>>>>>
>>>>>>> You can also disable LPM from the gadget side by setting
>>>>>>> dwc->dis_enblslpm_quirk.
>>>>>> Ferry, it can be done by adding a corresponding property to the
>>>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>>>>> and perhaps I can collect some traces in my case later on when I have
>>>>>> more time for that.
>>>>>>
>>>>> Ok thanks all. Here is what I tried:
>>>>>
>>>>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>>>>> speed and high speed. Still throttling but less bad.
>>>>>
>>>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>>>>
>>>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>>>>>
>>>>> index 4c5c6972124a..a9268c085840 100644
>>>>>
>>>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>>>>
>>>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>>>>
>>>>> @@ -122,6 +122,7 @@ static const struct property_entry
>>>>> dwc3_pci_mrfld_properties[] = {
>>>>>
>>>>> PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>>>
>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>>>
>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>>>>
>>>>> + PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>>>
>>>>> PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>>>
>>>>> {}
>>>>>
>>>>> };
>>>>>
>>>>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>>>>
>>>>> 1) throttling due to LPM, this seems solved now, thanks to much!
>>>> Now that we can confirm the speed throttling is related to LPM. We can
>>>> try to experiment further. (IMO, LPM is an important feature and
>>>> totally disabling LPM seems like using a sledgehammer to crack a nut)
>>>>
>>>> I suspect that your phy/HW has a higher low power exit latency. I don't
>>>> think you provided any HIRD threshold property in your setup right? So
>>>> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
>>>> you know what your phy/HW is capable of, try to test and increase the
>>>> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
>>>> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index 60e850a395a2..423533df8927 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct
>>>> usb_gadget *g,
>>>>                   * recommended BESL baseline to 1 and clamp the
>>>> BESL deep to be
>>>>                   * within 2 to 15.
>>>>                   */
>>>> -               params->besl_baseline = 1;
>>>> +               params->besl_baseline = 6;
>>>>                  if (dwc->is_utmi_l1_suspend)
>>>>                         params->besl_deep =
>>>>                                  clamp_t(u8, dwc->hird_threshold, 2,
>>>> 15);
>>>>
>>> I will try and report back, hopefully this evening.
>> I tried this and it seems to have the same effect as dis_enblslpm_quirk
>>>>
>>>>> 2) a problem with usb plug detection
>>>>>
>>>>> When I unplug/replug the gadget cable I need to do that at least
>>>>> another
>>>>> time before gadget is detected. So unplug/replug/unplug/replug
>>>>> seems to
>>>>> work.
>>>>>
>>>>> Also this platform has a HW switch to select host/device mode, with
>>>>> separate connectors for host and device.
>>>>>
>>>>> When I flip the switch to host it immediately changes to host.
>>>>>
>>>>> Flipping to device leaves the LEDs on my connected usb hub on, so it's
>>>>> still powered (but not operational).
>>>>>
>>>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>>>>> still need to additionally unplug/replug the gadget cable to get
>>>>> that to
>>>>> work.
>>>>>
>>>> The connection issue can come from different things. Please narrow
>>>> it down
>>>> and make sure that you don't use any defective cable or bad hub.
>>>> Even then,
>>>> it's difficult to determine whose fault it is from just the dmesg
>>>> and driver
>>>> logs alone without looking at the USB traffic at the packet level.
>>>>
>>>> Btw, is your setup DRD? If you're switching mode, then I know that
>>>> dwc3 right
>>>> now doesn't implement mode switching correctly.
>>> Yes, we use Extcon driver to support DRD.
>>>> You can see the discussion we have here:
>>>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t
>>>>
>>> I see, that might indeed be related. I will try the patches to see if
>>> that works and report back.
>>
>> I applied both patches:
>>
>> usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD
>>
>> usb: dwc3: Fix DRD mode change sequence following programming guide
>>
>> It doesn't have an effect on the need to unplug/replug neither on the
>> problems switch from host/device mode.
> 
> When I test the correct kernel it does have an effect :-)
> 
> In most cases the need to unplug/replug is removed, but not always. In
> the cases when I need to retry the host journal shows "can't set config
> #1, error -110"

It's most likely because the driver didn't allow time for clock
synchronization before clearing the GCTL soft reset. I noted that issue
in the patch in the discussion thread. I can send out a patch next week.

> 
> The switch from host->device and device->host mode seems to be resolved.
> 
> Strangely, iperf3 now reports 130 Mbits/sec (down from 200 Mbits/sec).
> 

Did this happen with disabling LPM or with increasing BESL baseline?
Note that increasing the recommended BESL is not the same as disabling
LPM. With the recommended BESL provided, the host can decide when it
should put the device in low power so that the device has enough time to
wake up. With LPM enabled, there may be some minor speed degradation but
not that much. Anyway, you can experiment with the BESL value to get an
acceptable speed while still keeping power saving capability (or
completely disable LPM if power saving is not an issue for you).
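
For picking a value to experiment with, here is a minimal sketch of how
a BESL index maps to a latency. It assumes the 16-entry encoding from
the USB 2.0 LPM errata (the xHCI host driver carries the same values);
the helper names are hypothetical and are not dwc3 functions.

```c
#include <stddef.h>

/*
 * BESL index -> latency in microseconds, assuming the USB 2.0 LPM
 * errata encoding. Index 1 (150 us) is the dwc3 default baseline.
 */
static const unsigned int besl_us[16] = {
	125, 150, 200, 300, 400, 500, 1000, 2000,
	3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
};

/* Hypothetical helper: latency for a BESL index, 0 if out of range. */
unsigned int besl_to_us(unsigned int besl)
{
	if (besl >= sizeof(besl_us) / sizeof(besl_us[0]))
		return 0;
	return besl_us[besl];
}

/*
 * Hypothetical helper: smallest BESL index whose latency is at least
 * exit_us, clamped to 15 when exit_us exceeds the largest entry.
 */
unsigned int besl_for_exit_latency(unsigned int exit_us)
{
	size_t i;

	for (i = 0; i < sizeof(besl_us) / sizeof(besl_us[0]); i++)
		if (besl_us[i] >= exit_us)
			return (unsigned int)i;
	return 15;
}
```

So a baseline of 6 advertises 1000 us, and a phy that needs roughly
600 us to exit low power would also round up to index 6.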

BR,
Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-11  0:04                                     ` Thinh Nguyen
@ 2021-04-11 15:26                                       ` Ferry Toth
  2021-04-13  2:17                                         ` Thinh Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-11 15:26 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB


Op 11-04-2021 om 02:04 schreef Thinh Nguyen:
> Ferry Toth wrote:
>> Hi, some corrections below.
>>
>> Op 10-04-2021 om 15:29 schreef Ferry Toth:
>>> Op 09-04-2021 om 15:26 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 08-04-2021 om 23:12 schreef Thinh Nguyen:
>>>>> Ferry Toth wrote:
>>>>>> Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
>>>>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>>>>> Thinh Nguyen wrote:
>>>>>>> ...
>>>>>>>
>>>>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a
>>>>>>>>> few
>>>>>>>>> 1-second delays where the host tried to bring the device back and
>>>>>>>>> resume from low power:
>>>>>>>>>
>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl:
>>>>>>>>> addr
>>>>>>>>> 00000000d68ecd36 value 0000a610
>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel:
>>>>>>>>> addr
>>>>>>>>> 00000000d68ecd36 value 0000a710
>>>>>>>>>              <idle>-0       [000] d.h. 232.499418: dwc3_readl: addr
>>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>>              <idle>-0       [000] d.h. 232.499423: dwc3_readl: addr
>>>>>>>>> 00000000bb67b585 value 00001000
>>>>>>>>>              <idle>-0       [000] d.h. 232.499425: dwc3_writel: addr
>>>>>>>>> 00000000bb67b585 value 80001000
>>>>>>>>>              <idle>-0       [000] d.h. 232.499427: dwc3_writel: addr
>>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499480: dwc3_event:
>>>>>>>>> event
>>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499492: dwc3_event:
>>>>>>>>> event
>>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499496: dwc3_event:
>>>>>>>>> event
>>>>>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499501:
>>>>>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499518:
>>>>>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>>>>>> zsI ==> 0
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue:
>>>>>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499601:
>>>>>>>>> dwc3_prepare_trb:
>>>>>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size
>>>>>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Your device is operating in highspeed right? Try to turn off LPM
>>>>>>>>> from
>>>>>>>>> host and see if that helps with the speed throttling issue. (If
>>>>>>>>> you're
>>>>>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help
>>>>>>>>> with
>>>>>>>>> the connection issue you saw.
>>>>>>>>>
>>>>>>>>> It seems to be an issue from host, but I can't tell for sure
>>>>>>>>> unless we
>>>>>>>>> have some USB traffic analyzer that shows what's going on. Have you
>>>>>>>>> tried different hosts?
>>>>>>>>>
>>>>>>>> You can also disable LPM from the gadget side by setting
>>>>>>>> dwc->dis_enblslpm_quirk.
>>>>>>> Ferry, it can be done by adding a corresponding property to the
>>>>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my case
>>>>>>> and perhaps I can collect some traces in my case later on when I have
>>>>>>> more time for that.
>>>>>>>
>>>>>> Ok thanks all. Here is what I tried:
>>>>>>
>>>>>> Another computer (Acer 720P brainwashed chromebook), I tried both full
>>>>>> speed and high speed. Still throttling but less bad.
>>>>>>
>>>>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>>>>>
>>>>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>
>>>>>> index 4c5c6972124a..a9268c085840 100644
>>>>>>
>>>>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>>>>>
>>>>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>
>>>>>> @@ -122,6 +122,7 @@ static const struct property_entry
>>>>>> dwc3_pci_mrfld_properties[] = {
>>>>>>
>>>>>> PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>>>>
>>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>>>>
>>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>>>>>
>>>>>> + PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>>>>
>>>>>> PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>>>>
>>>>>> {}
>>>>>>
>>>>>> };
>>>>>>
>>>>>> This fixes the throttling but reveals I had actually at least 2 bugs:
>>>>>>
>>>>>> 1) throttling due to LPM, this seems solved now, thanks to much!
>>>>> Now that we can confirm the speed throttling is related to LPM. We can
>>>>> try to experiment further. (IMO, LPM is an important feature and
>>>>> totally disabling LPM seems like using a sledgehammer to crack a nut)
>>>>>
>>>>> I suspect that your phy/HW has a higher low power exit latency. I don't
>>>>> think you provided any HIRD threshold property in your setup right? So
>>>>> by default, dwc3 sets the base line BESL value to 1 (or 150us). Unless
>>>>> you know what your phy/HW is capable of, try to test and increase the
>>>>> recommended BESL value. The range can be from 0 to 15 where 0 is 150us
>>>>> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>>>>>
>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>> index 60e850a395a2..423533df8927 100644
>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct
>>>>> usb_gadget *g,
>>>>>                    * recommended BESL baseline to 1 and clamp the
>>>>> BESL deep to be
>>>>>                    * within 2 to 15.
>>>>>                    */
>>>>> -               params->besl_baseline = 1;
>>>>> +               params->besl_baseline = 6;
>>>>>                   if (dwc->is_utmi_l1_suspend)
>>>>>                          params->besl_deep =
>>>>>                                   clamp_t(u8, dwc->hird_threshold, 2,
>>>>> 15);
>>>>>
>>>> I will try and report back, hopefully this evening.
>>> I tried this and it seems to have the same effect as dis_enblslpm_quirk
>>>>>> 2) a problem with usb plug detection
>>>>>>
>>>>>> When I unplug/replug the gadget cable I need to do that at least
>>>>>> another
>>>>>> time before gadget is detected. So unplug/replug/unplug/replug
>>>>>> seems to
>>>>>> work.
>>>>>>
>>>>>> Also this platform has a HW switch to select host/device mode, with
>>>>>> separate connectors for host and device.
>>>>>>
>>>>>> When I flip the switch to host it immediately changes to host.
>>>>>>
>>>>>> Flipping to device leaves the LEDs on my connected usb hub on, so it's
>>>>>> still powered (but not operational).
>>>>>>
>>>>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>>>>>> still need to additionally unplug/replug the gadget cable to get
>>>>>> that to
>>>>>> work.
>>>>>>
>>>>> The connection issue can come from different things. Please narrow
>>>>> it down
>>>>> and make sure that you don't use any defective cable or bad hub.
>>>>> Even then,
>>>>> it's difficult to determine whose fault it is from just the dmesg
>>>>> and driver
>>>>> logs alone without looking at the USB traffic at the packet level.
>>>>>
>>>>> Btw, is your setup DRD? If you're switching mode, then I know that
>>>>> dwc3 right
>>>>> now doesn't implement mode switching correctly.
>>>> Yes, we use Extcon driver to support DRD.
>>>>> You can see the discussion we have here:
>>>>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t
>>>>>
>>>> I see, that might indeed be related. I will try the patches to see if
>>>> that works and report back.
>>> I applied both patches:
>>>
>>> usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD
>>>
>>> usb: dwc3: Fix DRD mode change sequence following programming guide
>>>
>>> It doesn't have an effect on the need to unplug/replug neither on the
>>> problems switch from host/device mode.
>> When I test the correct kernel it does have an effect :-)
>>
>> In most cases the need to unplug/replug is removed, but not always. In
>> the cases when I need to retry the host journal shows "can't set config
>> #1, error -110"
> It's most likely because the driver didn't provide time for the clocks
> synchronization before clearing the GCTL soft reset. I noted that issue
> in the patch in the discussion thread. I can send out a patch next week.
>
>> The switch from host->device and device->host mode seems to be resolved.
>>
>> Strangely, iperf3 now reports 130 Mbits/sec (down from 200 Mbits/sec).
>>
> Did this happen with disabling LPM or with increasing BESL baseline?
> Note that increasing the recommended BESL is not the same as disabling
> LPM. With the recommended BESL provided, the host can decide when it
> should put the device in low power so that the device has enough time to
> wake up. With LPM enabled, there maybe some minor speed degradation but
> not that much. Anyway, you can experiment with the BESL value to have
> the acceptable speed while still have power saving capability (or
> completely disable LPM if power saving is not an issue for you).
I tried both, the result was exactly the same.
> BR,
> Thinh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: USB network gadget / DWC3 issue
  2021-04-11 15:26                                       ` Ferry Toth
@ 2021-04-13  2:17                                         ` Thinh Nguyen
  2021-04-13  8:45                                           ` Ferry Toth
  2021-04-13 21:06                                           ` Ferry Toth
  0 siblings, 2 replies; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-13  2:17 UTC (permalink / raw)
  To: Ferry Toth, Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Ferry Toth wrote:
> 
> Op 11-04-2021 om 02:04 schreef Thinh Nguyen:
>> Ferry Toth wrote:
>>> Hi, some corrections below.
>>>
>>> Op 10-04-2021 om 15:29 schreef Ferry Toth:
>>>> Op 09-04-2021 om 15:26 schreef Ferry Toth:
>>>>> Hi,
>>>>>
>>>>> Op 08-04-2021 om 23:12 schreef Thinh Nguyen:
>>>>>> Ferry Toth wrote:
>>>>>>> Op 07-04-2021 om 15:34 schreef Andy Shevchenko:
>>>>>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>>>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>>>>>> Thinh Nguyen wrote:
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a
>>>>>>>>>> few
>>>>>>>>>> 1-second delays where the host tried to bring the device back and
>>>>>>>>>> resume from low power:
>>>>>>>>>>
>>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501808:
>>>>>>>>>> dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params
>>>>>>>>>> 00000000 00000000 00000000 --> status: Successful
>>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl:
>>>>>>>>>> addr
>>>>>>>>>> 00000000d68ecd36 value 0000a610
>>>>>>>>>>         ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel:
>>>>>>>>>> addr
>>>>>>>>>> 00000000d68ecd36 value 0000a710
>>>>>>>>>>              <idle>-0       [000] d.h. 232.499418: dwc3_readl:
>>>>>>>>>> addr
>>>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>>>              <idle>-0       [000] d.h. 232.499423: dwc3_readl:
>>>>>>>>>> addr
>>>>>>>>>> 00000000bb67b585 value 00001000
>>>>>>>>>>              <idle>-0       [000] d.h. 232.499425:
>>>>>>>>>> dwc3_writel: addr
>>>>>>>>>> 00000000bb67b585 value 80001000
>>>>>>>>>>              <idle>-0       [000] d.h. 232.499427:
>>>>>>>>>> dwc3_writel: addr
>>>>>>>>>> 00000000a15e0e35 value 00000034
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499480: dwc3_event:
>>>>>>>>>> event
>>>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499492: dwc3_event:
>>>>>>>>>> event
>>>>>>>>>> (00000401): WakeUp [U0]
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499496: dwc3_event:
>>>>>>>>>> event
>>>>>>>>>> (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499501:
>>>>>>>>>> dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf
>>>>>>>>>> 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499518:
>>>>>>>>>> dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536
>>>>>>>>>> zsI ==> 0
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499562:
>>>>>>>>>> dwc3_ep_queue:
>>>>>>>>>> ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>>>>>         irq/15-dwc3-476     [000] d...   232.499601:
>>>>>>>>>> dwc3_prepare_trb:
>>>>>>>>>> ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0
>>>>>>>>>> size
>>>>>>>>>> 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Your device is operating in highspeed right? Try to turn off LPM
>>>>>>>>>> from
>>>>>>>>>> host and see if that helps with the speed throttling issue. (If
>>>>>>>>>> you're
>>>>>>>>>> using xHCI host, then set XHCI_HW_LPM_DISABLE). It may also help
>>>>>>>>>> with
>>>>>>>>>> the connection issue you saw.
>>>>>>>>>>
>>>>>>>>>> It seems to be an issue from host, but I can't tell for sure
>>>>>>>>>> unless we
>>>>>>>>>> have some USB traffic analyzer that shows what's going on.
>>>>>>>>>> Have you
>>>>>>>>>> tried different hosts?
>>>>>>>>>>
>>>>>>>>> You can also disable LPM from the gadget side by setting
>>>>>>>>> dwc->dis_enblslpm_quirk.
>>>>>>>> Ferry, it can be done by adding a corresponding property to the
>>>>>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my
>>>>>>>> case
>>>>>>>> and perhaps I can collect some traces in my case later on when I
>>>>>>>> have
>>>>>>>> more time for that.
>>>>>>>>
>>>>>>> Ok thanks all. Here is what I tried:
>>>>>>>
>>>>>>> Another computer (Acer 720P brainwashed chromebook), I tried both
>>>>>>> full
>>>>>>> speed and high speed. Still throttling but less bad.
>>>>>>>
>>>>>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this patch:
>>>>>>>
>>>>>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c
>>>>>>> b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>
>>>>>>> index 4c5c6972124a..a9268c085840 100644
>>>>>>>
>>>>>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>
>>>>>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>
>>>>>>> @@ -122,6 +122,7 @@ static const struct property_entry
>>>>>>> dwc3_pci_mrfld_properties[] = {
>>>>>>>
>>>>>>> PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>>>>>
>>>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>>>>>
>>>>>>> PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>>>>>>
>>>>>>> + PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>>>>>
>>>>>>> PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>>>>>
>>>>>>> {}
>>>>>>>
>>>>>>> };
>>>>>>>
>>>>>>> This fixes the throttling but reveals I had actually at least 2
>>>>>>> bugs:
>>>>>>>
>>>>>>> 1) throttling due to LPM, this seems solved now, thanks to much!
>>>>>> Now that we can confirm the speed throttling is related to LPM. We
>>>>>> can
>>>>>> try to experiment further. (IMO, LPM is an important feature and
>>>>>> totally disabling LPM seems like using a sledgehammer to crack a nut)
>>>>>>
>>>>>> I suspect that your phy/HW has a higher low power exit latency. I
>>>>>> don't
>>>>>> think you provided any HIRD threshold property in your setup
>>>>>> right? So
>>>>>> by default, dwc3 sets the base line BESL value to 1 (or 150us).
>>>>>> Unless
>>>>>> you know what your phy/HW is capable of, try to test and increase the
>>>>>> recommended BESL value. The range can be from 0 to 15 where 0 is
>>>>>> 150us
>>>>>> and 15 is 10ms. Maybe try 6 (i.e. 1ms).
>>>>>>
>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>> index 60e850a395a2..423533df8927 100644
>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct
>>>>>> usb_gadget *g,
>>>>>>                    * recommended BESL baseline to 1 and clamp the
>>>>>> BESL deep to be
>>>>>>                    * within 2 to 15.
>>>>>>                    */
>>>>>> -               params->besl_baseline = 1;
>>>>>> +               params->besl_baseline = 6;
>>>>>>                   if (dwc->is_utmi_l1_suspend)
>>>>>>                          params->besl_deep =
>>>>>>                                   clamp_t(u8, dwc->hird_threshold, 2,
>>>>>> 15);
>>>>>>
>>>>> I will try and report back, hopefully this evening.
>>>> I tried this and it seems to have the same effect as dis_enblslpm_quirk
>>>>>>> 2) a problem with usb plug detection
>>>>>>>
>>>>>>> When I unplug/replug the gadget cable I need to do that at least
>>>>>>> another
>>>>>>> time before gadget is detected. So unplug/replug/unplug/replug
>>>>>>> seems to
>>>>>>> work.
>>>>>>>
>>>>>>> Also this platform has a HW switch to select host/device mode, with
>>>>>>> separate connectors for host and device.
>>>>>>>
>>>>>>> When I flip the switch to host it immediately changes to host.
>>>>>>>
>>>>>>> Flipping to device leaves the LEDs on my connected usb hub on, so
>>>>>>> it's
>>>>>>> still powered (but not operational).
>>>>>>>
>>>>>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off. But I
>>>>>>> still need to additionally unplug/replug the gadget cable to get
>>>>>>> that to
>>>>>>> work.
>>>>>>>
>>>>>> The connection issue can come from different things. Please narrow
>>>>>> it down
>>>>>> and make sure that you don't use any defective cable or bad hub.
>>>>>> Even then,
>>>>>> it's difficult to determine whose fault it is from just the dmesg
>>>>>> and driver
>>>>>> logs alone without looking at the USB traffic at the packet level.
>>>>>>
>>>>>> Btw, is your setup DRD? If you're switching mode, then I know that
>>>>>> dwc3 right
>>>>>> now doesn't implement mode switching correctly.
>>>>> Yes, we use Extcon driver to support DRD.
>>>>>> You can see the discussion we have here:
>>>>>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t
>>>>>>
>>>>>>
>>>>> I see, that might indeed be related. I will try the patches to see if
>>>>> that works and report back.
>>>> I applied both patches:
>>>>
>>>> usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD
>>>>
>>>> usb: dwc3: Fix DRD mode change sequence following programming guide
>>>>
>>>> It doesn't have an effect on the need to unplug/replug neither on the
>>>> problems switch from host/device mode.
>>> When I test the correct kernel it does have an effect :-)
>>>
>>> In most cases the need to unplug/replug is removed, but not always. In
>>> the cases when I need to retry the host journal shows "can't set config
>>> #1, error -110"
>> It's most likely because the driver didn't provide time for the clocks
>> synchronization before clearing the GCTL soft reset. I noted that issue
>> in the patch in the discussion thread. I can send out a patch next week.
>>
>>> The switch from host->device and device->host mode seems to be resolved.
>>>
>>> Strangely, iperf3 now reports 130 Mbits/sec (down from 200 Mbits/sec).
>>>
>> Did this happen with disabling LPM or with increasing BESL baseline?
>> Note that increasing the recommended BESL is not the same as disabling
>> LPM. With the recommended BESL provided, the host can decide when it
>> should put the device in low power so that the device has enough time to
>> wake up. With LPM enabled, there maybe some minor speed degradation but
>> not that much. Anyway, you can experiment with the BESL value to have
>> the acceptable speed while still have power saving capability (or
>> completely disable LPM if power saving is not an issue for you).
> I tried both, the result was exactly the same.

That's strange... Also, enabling LPM should not impact the performance that
much at all. What has changed in your setup?

Anyway, can you try this patch instead of John Stultz's? There are a couple
of issues with his patches.

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 05e2e54cbbdc..675e861fda1a 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -14,6 +14,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/mutex.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/interrupt.h>
@@ -40,6 +41,8 @@
 
 #define DWC3_DEFAULT_AUTOSUSPEND_DELAY 5000 /* ms */
 
+static DEFINE_MUTEX(mode_switch_lock);
+
 /**
  * dwc3_get_dr_mode - Validates and sets dr_mode
  * @dwc: pointer to our context structure
@@ -114,13 +117,20 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
        dwc->current_dr_role = mode;
 }
 
+static int dwc3_core_soft_reset(struct dwc3 *dwc);
+
 static void __dwc3_set_mode(struct work_struct *work)
 {
        struct dwc3 *dwc = work_to_dwc(work);
        unsigned long flags;
+       unsigned int hw_mode;
        int ret;
        u32 reg;
 
+       mutex_lock(&mode_switch_lock);
+
+       hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
+
        pm_runtime_get_sync(dwc->dev);
 
        if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
@@ -154,6 +164,24 @@ static void __dwc3_set_mode(struct work_struct *work)
                break;
        }
 
+       if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD) {
+               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+               reg |= DWC3_GCTL_CORESOFTRESET;
+               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+
+               /*
+                * Wait for internal clocks to synchronize. DWC_usb31 and
+                * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
+                * keep it consistent across different IPs, let's wait up to
+                * 100ms before clearing GCTL.CORESOFTRESET.
+                */
+               msleep(100);
+
+               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+               reg &= ~DWC3_GCTL_CORESOFTRESET;
+               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+       }
+
        spin_lock_irqsave(&dwc->lock, flags);
 
        dwc3_set_prtcap(dwc, dwc->desired_dr_role);
@@ -178,6 +206,9 @@ static void __dwc3_set_mode(struct work_struct *work)
                }
                break;
        case DWC3_GCTL_PRTCAP_DEVICE:
+               if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD)
+                       dwc3_core_soft_reset(dwc);
+
                dwc3_event_buffers_setup(dwc);
 
                if (dwc->usb2_phy)
@@ -200,6 +231,7 @@ static void __dwc3_set_mode(struct work_struct *work)
 out:
        pm_runtime_mark_last_busy(dwc->dev);
        pm_runtime_put_autosuspend(dwc->dev);
+       mutex_unlock(&mode_switch_lock);
 }
 
 void dwc3_set_mode(struct dwc3 *dwc, u32 mode)


Thanks,
Thinh
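
The reset sequence in the patch above (set GCTL.CORESOFTRESET, wait for the
clocks to settle, clear the bit) can be sketched in userspace as below. This
is only an illustration under stated assumptions, not driver code: the
register is an ordinary variable passed by pointer, nanosleep() stands in for
msleep(), the helper names are hypothetical, and bit 11 for CORESOFTRESET is
assumed here rather than taken from this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed bit position of GCTL.CORESOFTRESET (bit 11). */
#define GCTL_CORESOFTRESET (UINT32_C(1) << 11)

/* Sleep for roughly ms milliseconds; userspace stand-in for msleep(). */
static void sleep_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/* Set the soft-reset bit with a read-modify-write, leaving other bits alone. */
static void gctl_assert_reset(volatile uint32_t *gctl)
{
	uint32_t reg = *gctl;

	reg |= GCTL_CORESOFTRESET;
	*gctl = reg;
}

/* Clear the soft-reset bit, again preserving the rest of the register. */
static void gctl_release_reset(volatile uint32_t *gctl)
{
	uint32_t reg = *gctl;

	reg &= ~GCTL_CORESOFTRESET;
	*gctl = reg;
}

/* Assert reset, give the clocks settle_ms to synchronize, then release. */
static void gctl_soft_reset_pulse(volatile uint32_t *gctl, long settle_ms)
{
	assert(gctl != NULL);

	gctl_assert_reset(gctl);
	sleep_ms(settle_ms);
	gctl_release_reset(gctl);
}
```

Calling gctl_soft_reset_pulse(&reg, 100) on a plain uint32_t mirrors the 100 ms
wait chosen in the patch; the point of the sketch is only that the bit must
stay set for the whole settle time and that the other GCTL bits are preserved.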


* Re: USB network gadget / DWC3 issue
  2021-04-13  2:17                                         ` Thinh Nguyen
@ 2021-04-13  8:45                                           ` Ferry Toth
  2021-04-13 21:06                                           ` Ferry Toth
  1 sibling, 0 replies; 27+ messages in thread
From: Ferry Toth @ 2021-04-13  8:45 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Hi

On 13-04-2021 at 04:17, Thinh Nguyen wrote:
> That's strange... Also, enabling LPM should not impact the performance that
> much at all. What has changed in your setup?

Strange indeed. Only the combination of both patches had that effect.

Stranger still, I backported everything to 5.10.27 and the effect is not
present there. But then I still need to unplug/replug the cable twice.

> Anyway, can you try this patch instead of John Stultz's? There are a couple
> of issues with his patches.
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 05e2e54cbbdc..675e861fda1a 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -14,6 +14,7 @@
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
>   #include <linux/spinlock.h>
> +#include <linux/mutex.h>
>   #include <linux/platform_device.h>
>   #include <linux/pm_runtime.h>
>   #include <linux/interrupt.h>
> @@ -40,6 +41,8 @@
>   
>   #define DWC3_DEFAULT_AUTOSUSPEND_DELAY 5000 /* ms */
>   
> +static DEFINE_MUTEX(mode_switch_lock);
> +
>   /**
>    * dwc3_get_dr_mode - Validates and sets dr_mode
>    * @dwc: pointer to our context structure
> @@ -114,13 +117,20 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>          dwc->current_dr_role = mode;
>   }
>   
> +static int dwc3_core_soft_reset(struct dwc3 *dwc);
> +
>   static void __dwc3_set_mode(struct work_struct *work)
>   {
>          struct dwc3 *dwc = work_to_dwc(work);
>          unsigned long flags;
> +       unsigned int hw_mode;
>          int ret;
>          u32 reg;
>   
> +       mutex_lock(&mode_switch_lock);
> +
> +       hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
> +
>          pm_runtime_get_sync(dwc->dev);
>   
>          if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
> @@ -154,6 +164,24 @@ static void __dwc3_set_mode(struct work_struct *work)
>                  break;
>          }
>   
> +       if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD) {
> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> +               reg |= DWC3_GCTL_CORESOFTRESET;
> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +
> +               /*
> +                * Wait for internal clocks to synchronize. DWC_usb31 and
> +                * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
> +                * keep it consistent across different IPs, let's wait up to
> +                * 100ms before clearing GCTL.CORESOFTRESET.
> +                */
> +               msleep(100);
> +
> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> +               reg &= ~DWC3_GCTL_CORESOFTRESET;
> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +       }
> +
>          spin_lock_irqsave(&dwc->lock, flags);
>   
>          dwc3_set_prtcap(dwc, dwc->desired_dr_role);
> @@ -178,6 +206,9 @@ static void __dwc3_set_mode(struct work_struct *work)
>                  }
>                  break;
>          case DWC3_GCTL_PRTCAP_DEVICE:
> +               if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD)
> +                       dwc3_core_soft_reset(dwc);
> +
>                  dwc3_event_buffers_setup(dwc);
>   
>                  if (dwc->usb2_phy)
> @@ -200,6 +231,7 @@ static void __dwc3_set_mode(struct work_struct *work)
>   out:
>          pm_runtime_mark_last_busy(dwc->dev);
>          pm_runtime_put_autosuspend(dwc->dev);
> +       mutex_unlock(&mode_switch_lock);
>   }
>   
>   void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
>
I will try this.
> Thanks,
> Thinh
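
For the BESL values discussed earlier in this thread (baseline 1 for 150us,
6 for 1ms, 15 for 10ms), a small lookup helper makes the index-to-latency
mapping explicit. This is a sketch under assumptions: the table follows the
USB 2.0 LPM errata encoding as I understand it (125us at index 0, so slightly
different from the "0 is 150us" remark quoted above), and the function names
are hypothetical, not part of dwc3.

```c
#include <assert.h>

#define BESL_COUNT 16u

/* BESL index -> latency in microseconds (assumed USB 2.0 LPM errata encoding). */
static const unsigned int besl_us[BESL_COUNT] = {
	125, 150, 200, 300, 400, 500, 1000, 2000,
	3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000
};

/* Latency in microseconds for a BESL index in the range 0..15. */
static unsigned int besl_to_us(unsigned int besl)
{
	assert(besl < BESL_COUNT); /* indexes outside 0..15 are a caller bug */
	return besl_us[besl];
}

/*
 * Smallest BESL index whose latency is at least exit_latency_us.
 * Clamps to 15 when the requested latency exceeds the table.
 */
static unsigned int besl_for_latency(unsigned int exit_latency_us)
{
	unsigned int i;

	for (i = 0; i < BESL_COUNT; i++)
		if (besl_us[i] >= exit_latency_us)
			return i;

	return BESL_COUNT - 1;
}
```

With this table, a PHY that needs about 1 ms to leave low power maps to index
6, which matches the besl_baseline value suggested for testing.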


* Re: USB network gadget / DWC3 issue
  2021-04-13  2:17                                         ` Thinh Nguyen
  2021-04-13  8:45                                           ` Ferry Toth
@ 2021-04-13 21:06                                           ` Ferry Toth
  2021-04-13 21:21                                             ` Thinh Nguyen
  1 sibling, 1 reply; 27+ messages in thread
From: Ferry Toth @ 2021-04-13 21:06 UTC (permalink / raw)
  To: Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB


On 13-04-2021 at 04:17, Thinh Nguyen wrote:
> That's strange... Also, enabling LPM should not impact the performance that
> much at all. What has changed in your setup?
>
> Anyway, can you try this patch instead of John Stultz's? There are a couple
> of issues with his patches.
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 05e2e54cbbdc..675e861fda1a 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -14,6 +14,7 @@
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
>   #include <linux/spinlock.h>
> +#include <linux/mutex.h>
>   #include <linux/platform_device.h>
>   #include <linux/pm_runtime.h>
>   #include <linux/interrupt.h>
> @@ -40,6 +41,8 @@
>   
>   #define DWC3_DEFAULT_AUTOSUSPEND_DELAY 5000 /* ms */
>   
> +static DEFINE_MUTEX(mode_switch_lock);
> +
>   /**
>    * dwc3_get_dr_mode - Validates and sets dr_mode
>    * @dwc: pointer to our context structure
> @@ -114,13 +117,20 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>          dwc->current_dr_role = mode;
>   }
>   
> +static int dwc3_core_soft_reset(struct dwc3 *dwc);
> +
>   static void __dwc3_set_mode(struct work_struct *work)
>   {
>          struct dwc3 *dwc = work_to_dwc(work);
>          unsigned long flags;
> +       unsigned int hw_mode;
>          int ret;
>          u32 reg;
>   
> +       mutex_lock(&mode_switch_lock);
> +
> +       hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
> +
>          pm_runtime_get_sync(dwc->dev);
>   
>          if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
> @@ -154,6 +164,24 @@ static void __dwc3_set_mode(struct work_struct *work)
>                  break;
>          }
>   
> +       if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD) {
> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> +               reg |= DWC3_GCTL_CORESOFTRESET;
> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +
> +               /*
> +                * Wait for internal clocks to synchronize. DWC_usb31 and
> +                * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
> +                * keep it consistent across different IPs, let's wait up to
> +                * 100ms before clearing GCTL.CORESOFTRESET.
> +                */
> +               msleep(100);
> +
> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
> +               reg &= ~DWC3_GCTL_CORESOFTRESET;
> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +       }
> +
>          spin_lock_irqsave(&dwc->lock, flags);
>   
>          dwc3_set_prtcap(dwc, dwc->desired_dr_role);
> @@ -178,6 +206,9 @@ static void __dwc3_set_mode(struct work_struct *work)
>                  }
>                  break;
>          case DWC3_GCTL_PRTCAP_DEVICE:
> +               if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD)
> +                       dwc3_core_soft_reset(dwc);
> +
>                  dwc3_event_buffers_setup(dwc);
>   
>                  if (dwc->usb2_phy)
> @@ -200,6 +231,7 @@ static void __dwc3_set_mode(struct work_struct *work)
>   out:
>          pm_runtime_mark_last_busy(dwc->dev);
>          pm_runtime_put_autosuspend(dwc->dev);
> +       mutex_unlock(&mode_switch_lock);
>   }
>   
>   void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
This doesn't apply on 5.12-rc5, correct? Which tree would you like me to
test it on?
>
> Thanks,
> Thinh

* Re: USB network gadget / DWC3 issue
  2021-04-13 21:06                                           ` Ferry Toth
@ 2021-04-13 21:21                                             ` Thinh Nguyen
  0 siblings, 0 replies; 27+ messages in thread
From: Thinh Nguyen @ 2021-04-13 21:21 UTC (permalink / raw)
  To: Ferry Toth, Thinh Nguyen, Andy Shevchenko; +Cc: Felipe Balbi, Alan Stern, USB

Ferry Toth wrote:
> 
> On 13-04-2021 at 04:17, Thinh Nguyen wrote:
>> Ferry Toth wrote:
>>> On 11-04-2021 at 02:04, Thinh Nguyen wrote:
>>>> Ferry Toth wrote:
>>>>> Hi, some corrections below.
>>>>>
>>>>> On 10-04-2021 at 15:29, Ferry Toth wrote:
>>>>>> On 09-04-2021 at 15:26, Ferry Toth wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 08-04-2021 at 23:12, Thinh Nguyen wrote:
>>>>>>>> Ferry Toth wrote:
>>>>>>>>> On 07-04-2021 at 15:34, Andy Shevchenko wrote:
>>>>>>>>>> On Wed, Apr 7, 2021 at 3:24 AM Thinh Nguyen
>>>>>>>>>> <Thinh.Nguyen@synopsys.com> wrote:
>>>>>>>>>>> Thinh Nguyen wrote:
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>>>> I took a look at the "bad" and "normal" tracepoints. There are a
>>>>>>>>>>>> few 1-second delays where the host tried to bring the device back
>>>>>>>>>>>> and resume from low power:
>>>>>>>>>>>>
>>>>>>>>>>>>          ksoftirqd/0-10      [000] d.s.   231.501808: dwc3_gadget_ep_cmd: ep3in: cmd 'Update Transfer' [60007] params 00000000 00000000 00000000 --> status: Successful
>>>>>>>>>>>>          ksoftirqd/0-10      [000] d.s.   231.501809: dwc3_readl: addr 00000000d68ecd36 value 0000a610
>>>>>>>>>>>>          ksoftirqd/0-10      [000] d.s.   231.501810: dwc3_writel: addr 00000000d68ecd36 value 0000a710
>>>>>>>>>>>>               <idle>-0       [000] d.h. 232.499418: dwc3_readl: addr 00000000a15e0e35 value 00000034
>>>>>>>>>>>>               <idle>-0       [000] d.h. 232.499423: dwc3_readl: addr 00000000bb67b585 value 00001000
>>>>>>>>>>>>               <idle>-0       [000] d.h. 232.499425: dwc3_writel: addr 00000000bb67b585 value 80001000
>>>>>>>>>>>>               <idle>-0       [000] d.h. 232.499427: dwc3_writel: addr 00000000a15e0e35 value 00000034
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499480: dwc3_event: event (00000401): WakeUp [U0]
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499492: dwc3_event: event (00000401): WakeUp [U0]
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499496: dwc3_event: event (00006088): ep2out: Transfer In Progress [0] (SIm)
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499501: dwc3_complete_trb: ep2out: trb 00000000c7ce524e (E179:D170) buf 0000000008273540 size 1463 ctrl 00000818 (hlcS:sC:normal)
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499518: dwc3_gadget_giveback: ep2out: req 0000000012e296cf length 73/1536 zsI ==> 0
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499562: dwc3_ep_queue: ep2out: req 0000000012e296cf length 0/1536 zsI ==> -115
>>>>>>>>>>>>          irq/15-dwc3-476     [000] d...   232.499601: dwc3_prepare_trb: ep2out: trb 000000008c083777 (E180:D170) buf 0000000002a7e9c0 size 1536 ctrl 00000819 (HlcS:sC:normal)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Your device is operating in highspeed, right? Try to turn off LPM
>>>>>>>>>>>> from the host and see if that helps with the speed throttling
>>>>>>>>>>>> issue. (If you're using an xHCI host, then set
>>>>>>>>>>>> XHCI_HW_LPM_DISABLE.) It may also help with the connection issue
>>>>>>>>>>>> you saw.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems to be an issue from the host, but I can't tell for sure
>>>>>>>>>>>> unless we have some USB traffic analyzer that shows what's going
>>>>>>>>>>>> on. Have you tried different hosts?
>>>>>>>>>>>>
>>>>>>>>>>> You can also disable LPM from the gadget side by setting
>>>>>>>>>>> dwc->dis_enblslpm_quirk.
>>>>>>>>>> Ferry, it can be done by adding a corresponding property to the
>>>>>>>>>> dwc3-pci.c for Intel Merrifield platform. I'll check also for my
>>>>>>>>>> case
>>>>>>>>>> and perhaps I can collect some traces in my case later on when I
>>>>>>>>>> have
>>>>>>>>>> more time for that.
>>>>>>>>>>
>>>>>>>>> Ok thanks all. Here is what I tried:
>>>>>>>>>
>>>>>>>>> Another computer (Acer 720P brainwashed chromebook), I tried both
>>>>>>>>> full
>>>>>>>>> speed and high speed. Still throttling but less bad.
>>>>>>>>>
>>>>>>>>> Then on desktop, with Edison kernel 5.12-rc5 as above + this
>>>>>>>>> patch:
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>>> index 4c5c6972124a..a9268c085840 100644
>>>>>>>>> --- a/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>>> +++ b/drivers/usb/dwc3/dwc3-pci.c
>>>>>>>>> @@ -122,6 +122,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
>>>>>>>>>   PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
>>>>>>>>>   PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>>>>>>>>>   PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>>>>>>>>> + PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
>>>>>>>>>   PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>>>>>>>>>   {}
>>>>>>>>>   };
>>>>>>>>>
>>>>>>>>> This fixes the throttling but reveals I had actually at least 2
>>>>>>>>> bugs:
>>>>>>>>>
>>>>>>>>> 1) throttling due to LPM, this seems solved now, thanks so much!
>>>>>>>> Now that we can confirm the speed throttling is related to LPM, we
>>>>>>>> can experiment further. (IMO, LPM is an important feature and
>>>>>>>> totally disabling LPM seems like using a sledgehammer to crack a
>>>>>>>> nut.)
>>>>>>>>
>>>>>>>> I suspect that your phy/HW has a higher low power exit latency. I
>>>>>>>> don't think you provided any HIRD threshold property in your
>>>>>>>> setup, right? So by default, dwc3 sets the baseline BESL value to
>>>>>>>> 1 (i.e. 150us). Unless you know what your phy/HW is capable of,
>>>>>>>> try increasing the recommended BESL value and test. The range is
>>>>>>>> 0 to 15, where 0 is 125us and 15 is 10ms. Maybe try 6 (i.e. 1ms).
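For reference, the mapping from a BESL value to a latency can be written as a lookup table. This is a minimal sketch, assuming the 16-entry encoding from the USB 2.0 LPM errata (125us for 0 up to 10ms for 15). The names besl_us, besl_to_us and besl_for_latency are hypothetical helpers for illustration, not dwc3 functions.

```c
#include <assert.h>

/* BESL index -> latency in microseconds (assumed USB 2.0 LPM encoding). */
static const unsigned int besl_us[16] = {
	125, 150, 200, 300, 400, 500, 1000, 2000,
	3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
};

/* Latency in microseconds for a BESL value in the range 0..15. */
static unsigned int besl_to_us(unsigned int besl)
{
	assert(besl < 16);
	return besl_us[besl];
}

/*
 * Hypothetical helper: smallest BESL value whose latency is at least
 * exit_us, or -1 if exit_us exceeds the largest encodable latency.
 */
static int besl_for_latency(unsigned int exit_us)
{
	for (int i = 0; i < 16; i++)
		if (besl_us[i] >= exit_us)
			return i;
	return -1;
}
```

With this table, besl_to_us(1) is the 150us default mentioned above and besl_to_us(6) is the suggested 1ms. A phy that needs roughly 700us to wake up would map to 6 through besl_for_latency(700).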
>>>>>>>>
>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>> index 60e850a395a2..423533df8927 100644
>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>> @@ -2895,7 +2895,7 @@ static void dwc3_gadget_config_params(struct usb_gadget *g,
>>>>>>>>                     * recommended BESL baseline to 1 and clamp the BESL deep to be
>>>>>>>>                     * within 2 to 15.
>>>>>>>>                     */
>>>>>>>> -               params->besl_baseline = 1;
>>>>>>>> +               params->besl_baseline = 6;
>>>>>>>>                    if (dwc->is_utmi_l1_suspend)
>>>>>>>>                           params->besl_deep =
>>>>>>>>                                    clamp_t(u8, dwc->hird_threshold, 2, 15);
>>>>>>>>
>>>>>>> I will try and report back, hopefully this evening.
>>>>>> I tried this and it seems to have the same effect as
>>>>>> dis_enblslpm_quirk
>>>>>>>>> 2) a problem with usb plug detection
>>>>>>>>>
>>>>>>>>> When I unplug/replug the gadget cable I need to do that at least
>>>>>>>>> another
>>>>>>>>> time before gadget is detected. So unplug/replug/unplug/replug
>>>>>>>>> seems to
>>>>>>>>> work.
>>>>>>>>>
>>>>>>>>> Also this platform has a HW switch to select host/device mode,
>>>>>>>>> with
>>>>>>>>> separate connectors for host and device.
>>>>>>>>>
>>>>>>>>> When I flip the switch to host it immediately changes to host.
>>>>>>>>>
>>>>>>>>> Flipping to device leaves the LEDs on my connected usb hub on, so
>>>>>>>>> it's
>>>>>>>>> still powered (but not operational).
>>>>>>>>>
>>>>>>>>> Flipping fast host/device (within 1/2 sec) hub LEDs turns off.
>>>>>>>>> But I
>>>>>>>>> still need to additionally unplug/replug the gadget cable to get
>>>>>>>>> that to
>>>>>>>>> work.
>>>>>>>>>
>>>>>>>> The connection issue can come from different things. Please narrow
>>>>>>>> it down
>>>>>>>> and make sure that you don't use any defective cable or bad hub.
>>>>>>>> Even then,
>>>>>>>> it's difficult to determine whose fault it is from just the dmesg
>>>>>>>> and driver
>>>>>>>> logs alone without looking at the USB traffic at the packet level.
>>>>>>>>
>>>>>>>> Btw, is your setup DRD? If you're switching mode, then I know that
>>>>>>>> dwc3 right
>>>>>>>> now doesn't implement mode switching correctly.
>>>>>>> Yes, we use Extcon driver to support DRD.
>>>>>>>> You can see the discussion we have here:
>>>>>>>> https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro.org/T/#t
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I see, that might indeed be related. I will try the patches to
>>>>>>> see if
>>>>>>> that works and report back.
>>>>>> I applied both patches:
>>>>>>
>>>>>> usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD
>>>>>>
>>>>>> usb: dwc3: Fix DRD mode change sequence following programming guide
>>>>>>
>>>>>> It has no effect on the need to unplug/replug, nor on the
>>>>>> problems switching between host and device mode.
>>>>> When I test the correct kernel it does have an effect :-)
>>>>>
>>>>> In most cases the need to unplug/replug is removed, but not always. In
>>>>> the cases where I need to retry, the host journal shows "can't set
>>>>> config #1, error -110".
>>>> It's most likely because the driver didn't allow time for the clocks
>>>> to synchronize before clearing the GCTL soft reset. I noted that issue
>>>> on the patch in the discussion thread. I can send out a patch next
>>>> week.
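The timing issue described here can be illustrated with a small simulation. This is only a sketch, not driver code: struct fake_core, fake_msleep and gctl_soft_reset are hypothetical names, the register is a plain variable, and the bit position (bit 11) is assumed to match the driver's DWC3_GCTL_CORESOFTRESET.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror DWC3_GCTL_CORESOFTRESET (bit 11) in the dwc3 driver. */
#define GCTL_CORESOFTRESET (1u << 11)

/* Hypothetical stand-in for the controller: one register plus a delay log. */
struct fake_core {
	uint32_t gctl;         /* simulated GCTL register */
	unsigned int slept_ms; /* total settle time spent while in reset */
};

/* Records the delay instead of sleeping, so the sketch runs instantly. */
static void fake_msleep(struct fake_core *core, unsigned int ms)
{
	core->slept_ms += ms;
}

/*
 * Assert the soft reset bit, give the internal clocks time to
 * synchronize, then clear the bit again. All other GCTL bits are
 * preserved by the read-modify-write.
 */
static void gctl_soft_reset(struct fake_core *core, unsigned int settle_ms)
{
	core->gctl |= GCTL_CORESOFTRESET;
	fake_msleep(core, settle_ms);
	core->gctl &= ~GCTL_CORESOFTRESET;

	assert(!(core->gctl & GCTL_CORESOFTRESET));
}
```

Calling gctl_soft_reset(&core, 0) would model the problematic case, where the bit is cleared with no settle time at all.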
>>>>
>>>>> The switch from host->device and device->host mode seems to be
>>>>> resolved.
>>>>>
>>>>> Strangely, iperf3 now reports 130 Mbits/sec (down from 200 Mbits/sec).
>>>>>
>>>> Did this happen with disabling LPM or with increasing the BESL
>>>> baseline? Note that increasing the recommended BESL is not the same
>>>> as disabling LPM. With the recommended BESL provided, the host can
>>>> decide when it should put the device in low power so that the device
>>>> has enough time to wake up. With LPM enabled, there may be some minor
>>>> speed degradation but not that much. Anyway, you can experiment with
>>>> the BESL value to get an acceptable speed while still having power
>>>> saving capability (or completely disable LPM if power saving is not
>>>> an issue for you).
>>> I tried both, the result was exactly the same.
>> That's strange... Also, enabling LPM should not impact the performance
>> that much at all. What has changed in your setup?
>>
>> Anyway, can you try this patch instead of John Stultz's? There are a
>> couple of issues with his patches.
>>
>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>> index 05e2e54cbbdc..675e861fda1a 100644
>> --- a/drivers/usb/dwc3/core.c
>> +++ b/drivers/usb/dwc3/core.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/kernel.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>> +#include <linux/mutex.h>
>>   #include <linux/platform_device.h>
>>   #include <linux/pm_runtime.h>
>>   #include <linux/interrupt.h>
>> @@ -40,6 +41,8 @@
>>
>>   #define DWC3_DEFAULT_AUTOSUSPEND_DELAY 5000 /* ms */
>>
>> +static DEFINE_MUTEX(mode_switch_lock);
>> +
>>   /**
>>    * dwc3_get_dr_mode - Validates and sets dr_mode
>>    * @dwc: pointer to our context structure
>> @@ -114,13 +117,20 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>>          dwc->current_dr_role = mode;
>>   }
>>
>> +static int dwc3_core_soft_reset(struct dwc3 *dwc);
>> +
>>   static void __dwc3_set_mode(struct work_struct *work)
>>   {
>>          struct dwc3 *dwc = work_to_dwc(work);
>>          unsigned long flags;
>> +       unsigned int hw_mode;
>>          int ret;
>>          u32 reg;
>>
>> +       mutex_lock(&mode_switch_lock);
>> +
>> +       hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
>> +
>>          pm_runtime_get_sync(dwc->dev);
>>
>>          if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
>> @@ -154,6 +164,24 @@ static void __dwc3_set_mode(struct work_struct *work)
>>                  break;
>>          }
>>
>> +       if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD) {
>> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
>> +               reg |= DWC3_GCTL_CORESOFTRESET;
>> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
>> +
>> +               /*
>> +                * Wait for internal clocks to synchronize. DWC_usb31 and
>> +                * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
>> +                * keep it consistent across different IPs, let's wait up to
>> +                * 100ms before clearing GCTL.CORESOFTRESET.
>> +                */
>> +               msleep(100);
>> +
>> +               reg = dwc3_readl(dwc->regs, DWC3_GCTL);
>> +               reg &= ~DWC3_GCTL_CORESOFTRESET;
>> +               dwc3_writel(dwc->regs, DWC3_GCTL, reg);
>> +       }
>> +
>>          spin_lock_irqsave(&dwc->lock, flags);
>>
>>          dwc3_set_prtcap(dwc, dwc->desired_dr_role);
>> @@ -178,6 +206,9 @@ static void __dwc3_set_mode(struct work_struct *work)
>>                  }
>>                  break;
>>          case DWC3_GCTL_PRTCAP_DEVICE:
>> +               if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD)
>> +                       dwc3_core_soft_reset(dwc);
>> +
>>                  dwc3_event_buffers_setup(dwc);
>>
>>                  if (dwc->usb2_phy)
>> @@ -200,6 +231,7 @@ static void __dwc3_set_mode(struct work_struct *work)
>>   out:
>>          pm_runtime_mark_last_busy(dwc->dev);
>>          pm_runtime_put_autosuspend(dwc->dev);
>> +       mutex_unlock(&mode_switch_lock);
>>   }
>>
>>   void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
> This doesn't apply on 5.12-rc5, correct? Which tree would you like me to
> test it on?

Please test on Greg's "usb-next" branch
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/

Thanks,
Thinh

Thread overview: 27+ messages
2021-03-30 12:37 USB network gadget / DWC3 issue Andy Shevchenko
2021-03-30 16:17 ` Felipe Balbi
2021-03-30 20:26   ` Ferry Toth
2021-03-30 21:57     ` Ferry Toth
2021-04-02 19:12       ` Ferry Toth
2021-04-02 20:16         ` Thinh Nguyen
2021-04-02 22:40           ` Ferry Toth
2021-04-03  2:02             ` Thinh Nguyen
2021-04-03 11:25               ` Ferry Toth
2021-04-03 21:15                 ` Ferry Toth
2021-04-05 20:59                   ` Ferry Toth
2021-04-07  0:10                     ` Thinh Nguyen
2021-04-07  0:24                       ` Thinh Nguyen
2021-04-07 13:34                         ` Andy Shevchenko
2021-04-07 16:08                           ` Ferry Toth
2021-04-08 20:17                           ` Ferry Toth
2021-04-08 21:12                             ` Thinh Nguyen
2021-04-08 21:37                               ` Thinh Nguyen
2021-04-09 13:26                               ` Ferry Toth
2021-04-10 13:29                                 ` Ferry Toth
2021-04-10 14:08                                   ` Ferry Toth
2021-04-11  0:04                                     ` Thinh Nguyen
2021-04-11 15:26                                       ` Ferry Toth
2021-04-13  2:17                                         ` Thinh Nguyen
2021-04-13  8:45                                           ` Ferry Toth
2021-04-13 21:06                                           ` Ferry Toth
2021-04-13 21:21                                             ` Thinh Nguyen
