* dwc3 inconsistent gadget connection state?
@ 2020-07-02 21:44 John Stultz
  2020-07-03  2:55 ` Jun Li
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: John Stultz @ 2020-07-02 21:44 UTC
  To: Felipe Balbi
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

Hey Felipe,

  I've been tripping over an issue on my HiKey960 where with the usb-c
gadget cable connected, the gadget code doesn't consistently seem to
initialize properly. I had rarely seen this behavior previously, but
more recently it has become more frequent and annoying.

Usually, unplugging and replugging the USB-C cable would get things
working again (but that's not helpful in test labs).

I annotated a bunch of code trying to understand what was going on and
narrowed down the difference between the good and bad cases to dwc3
reset interrupts happening after usb_gadget_probe_driver() completes.
In the good case, we see the reset interrupts, and in the failed case
we don't.

[   16.491953] JDB: usb_gadget_probe_driver
[   16.495938] JDB: udc_bind_to_driver
[   16.499555] JDB: dwc3_gadget_start irq: 65 revision: 1429417994
[   16.503803] JDB: __dwc3_gadget_ep_enable
[   16.507791] JDB: __dwc3_gadget_ep_enable
[   16.511715] JDB: dwc3_gadget_enable_irq
[   16.515582] JDB: usb_udc_connect_control
[   16.519510] JDB: usb_gadget_connect
<in the bad case, this is all we see, the gadget device doesn't come up>
[   16.811010] JDB: dwc3_gadget_interrupt
[   16.814783] JDB: dwc3_gadget_reset_interrupt
[   16.819047] JDB: dwc3_reset_gadget
[   16.823935] JDB: dwc3_gadget_interrupt
[   16.827686] JDB: __dwc3_gadget_ep_enable
[   16.831611] JDB: __dwc3_gadget_ep_enable
[   16.994477] JDB: dwc3_gadget_interrupt
[   16.998246] JDB: dwc3_gadget_reset_interrupt
[   17.002519] JDB: dwc3_reset_gadget
[   17.005922] JDB: usb_gadget_udc_reset
[   17.062422] JDB: usb_gadget_set_state  state: 5
[   17.067069] JDB: dwc3_gadget_interrupt
[   17.070823] JDB: __dwc3_gadget_ep_enable
[   17.074745] JDB: __dwc3_gadget_ep_enable
[   17.170898] JDB: usb_gadget_set_state  state: 6
[   17.195605] JDB: usb_gadget_set_state  state: 7
[   17.200179] JDB: __dwc3_gadget_ep_enable
[   17.204118] JDB: __dwc3_gadget_ep_enable
[   17.208057] JDB: usb_gadget_vbus_draw
[   17.211721] JDB: usb_gadget_set_state  state: 7
<in the good case everything is happy here>
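
For anyone reading the log above, the raw numbers can be turned into
names with a few lines of Python. This is only a minimal sketch: it
assumes the state values follow enum usb_device_state from
include/linux/usb/ch9.h and that "revision" is the raw revision register
value printed in decimal. The helper names are made up for illustration.

```python
# Names from enum usb_device_state (include/linux/usb/ch9.h), in enum order.
# Assumption: the "state: N" values in the log are this enum.
USB_DEVICE_STATES = [
    "NOTATTACHED",      # 0
    "ATTACHED",         # 1
    "POWERED",          # 2
    "RECONNECTING",     # 3
    "UNAUTHENTICATED",  # 4
    "DEFAULT",          # 5
    "ADDRESS",          # 6
    "CONFIGURED",       # 7
    "SUSPENDED",        # 8
]


def state_name(value: int) -> str:
    """Return the enum name for a logged usb_gadget_set_state value."""
    if 0 <= value < len(USB_DEVICE_STATES):
        return USB_DEVICE_STATES[value]
    return f"UNKNOWN({value})"


def revision_hex(value: int) -> str:
    """Format the decimal revision from the log as a 32-bit hex value."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    for logged in (5, 6, 7):
        print(logged, state_name(logged))
    print(revision_hex(1429417994))
```

Read this way, the good case walks through DEFAULT, ADDRESS and
CONFIGURED, and the revision value looks like the one dwc3's core.h
calls DWC3_REVISION_300A.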


This sounds a bit like the issue in the comment here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.8-rc3#n3143

However, I've tried calling dwc3_gadget_reset_interrupt() and
dwc3_reset_gadget() at the tail end of dwc3_gadget_start() but that
doesn't seem to help.

I was curious if you or anyone else had any thoughts on how to debug
this further?

thanks
-john


* Re: dwc3 inconsistent gadget connection state?
  2020-07-02 21:44 dwc3 inconsistent gadget connection state? John Stultz
@ 2020-07-03  2:55 ` Jun Li
  2020-07-03  3:08   ` John Stultz
  2020-07-03  6:15 ` John Stultz
  2020-07-03  9:54 ` Felipe Balbi
  2 siblings, 1 reply; 12+ messages in thread
From: Jun Li @ 2020-07-03  2:55 UTC
  To: John Stultz
  Cc: Felipe Balbi, Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha,
	YongQin Liu, Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

John Stultz <john.stultz@linaro.org> wrote on Fri, Jul 3, 2020 at 5:46 AM:
>
> Hey Felipe,
>
>   I've been tripping over an issue on my HiKey960 where with the usb-c
> gadget cable connected, the gadget code doesn't consistently seem to
> initialize properly. I had rarely seen this behavior previously, but
> more recently it has become more frequent and annoying.
>
> Usually, unplugging and replugging the USB-C cable would get things
> working again (but that's not helpful in test labs).
>
> I annotated a bunch of code trying to understand what was going on and
> I narrowed down the difference in the good and bad case to a dwc3
> reset interrupts happening after usb_gadget_probe_driver() completes.
> In the good case, we see the reset interrupts, and in the failed case
> we don't.
>
> [   16.491953] JDB: usb_gadget_probe_driver
> [   16.495938] JDB: udc_bind_to_driver
> [   16.499555] JDB: dwc3_gadget_start irq: 65 revision: 1429417994
> [   16.503803] JDB: __dwc3_gadget_ep_enable
> [   16.507791] JDB: __dwc3_gadget_ep_enable
> [   16.511715] JDB: dwc3_gadget_enable_irq
> [   16.515582] JDB: usb_udc_connect_control
> [   16.519510] JDB: usb_gadget_connect
> <in the bad case, this is all we see, the gadget device doesn't come up>
> [   16.811010] JDB: dwc3_gadget_interrupt
> [   16.814783] JDB: dwc3_gadget_reset_interrupt
> [   16.819047] JDB: dwc3_reset_gadget
> [   16.823935] JDB: dwc3_gadget_interrupt
> [   16.827686] JDB: __dwc3_gadget_ep_enable
> [   16.831611] JDB: __dwc3_gadget_ep_enable
> [   16.994477] JDB: dwc3_gadget_interrupt
> [   16.998246] JDB: dwc3_gadget_reset_interrupt
> [   17.002519] JDB: dwc3_reset_gadget
> [   17.005922] JDB: usb_gadget_udc_reset
> [   17.062422] JDB: usb_gadget_set_state  state: 5
> [   17.067069] JDB: dwc3_gadget_interrupt
> [   17.070823] JDB: __dwc3_gadget_ep_enable
> [   17.074745] JDB: __dwc3_gadget_ep_enable
> [   17.170898] JDB: usb_gadget_set_state  state: 6
> [   17.195605] JDB: usb_gadget_set_state  state: 7
> [   17.200179] JDB: __dwc3_gadget_ep_enable
> [   17.204118] JDB: __dwc3_gadget_ep_enable
> [   17.208057] JDB: usb_gadget_vbus_draw
> [   17.211721] JDB: usb_gadget_set_state  state: 7
> <in the good case everything is happy here>
>
>
> This sounds a bit like the issue in the comment here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.8-rc3#n3143
>
> However, I've tried calling dwc3_gadget_reset_interrupt() and
> dwc3_reset_gadget() at the tail end of dwc3_gadget_start() but that
> doesn't seem to help.
>
> I was curious if you or anyone else had any thoughts on how to debug
> this further?

If you force your gadget to be USB2 (e.g. in dts)

+       maximum-speed = "high-speed";

will you still reproduce this issue?

Does your gadget connect to the host's super speed port directly via a
C-to-A cable in your test labs? Or is there something in between?

Li Jun
>
> thanks
> -john


* Re: dwc3 inconsistent gadget connection state?
  2020-07-03  2:55 ` Jun Li
@ 2020-07-03  3:08   ` John Stultz
  2020-07-03  7:46     ` Jun Li
  0 siblings, 1 reply; 12+ messages in thread
From: John Stultz @ 2020-07-03  3:08 UTC
  To: Jun Li
  Cc: Felipe Balbi, Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha,
	YongQin Liu, Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

On Thu, Jul 2, 2020 at 7:55 PM Jun Li <lijun.kernel@gmail.com> wrote:
> John Stultz <john.stultz@linaro.org> wrote on Fri, Jul 3, 2020 at 5:46 AM:
> > I was curious if you or anyone else had any thoughts on how to debug
> > this further?
>
> If you force your gadget to be USB2(e.g. in dts)
>
> +       maximum-speed = "high-speed";
>
> will you still reproduce this issue?

Thanks for the suggestion! Unfortunately, I gave that a try, but still
reproduced the same issue with this setting.

Curious, what issue were you thinking this would help with?

> Does your gadget connect to host super speed port directly via a C-to-A cable
> in your test labs? or there is something between?

I'm not sure of the details in the lab, however I can reproduce this
on my desk with a Host machine <-> USB hub <-> USB-C port.

Additionally, the board itself is a little complicated, in that the
USB-C port is USB2 only (however, it does have two USB-A USB3 ports
behind an on-board hub and a switch to decide if the USB-C or hub
ports are enabled since there is only one usb controller).

thanks
-john


* Re: dwc3 inconsistent gadget connection state?
  2020-07-02 21:44 dwc3 inconsistent gadget connection state? John Stultz
  2020-07-03  2:55 ` Jun Li
@ 2020-07-03  6:15 ` John Stultz
  2020-07-03  7:57   ` Anurag Kumar Vulisha
  2020-07-03  9:54 ` Felipe Balbi
  2 siblings, 1 reply; 12+ messages in thread
From: John Stultz @ 2020-07-03  6:15 UTC
  To: Felipe Balbi
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List, Jun Li

On Thu, Jul 2, 2020 at 2:44 PM John Stultz <john.stultz@linaro.org> wrote:
>
>   I've been tripping over an issue on my HiKey960 where with the usb-c
> gadget cable connected, the gadget code doesn't consistently seem to
> initialize properly. I had rarely seen this behavior previously, but
> more recently it has become more frequent and annoying.
>
> Usually, unplugging and replugging the USB-C cable would get things
> working again (but that's not helpful in test labs).
>
> I annotated a bunch of code trying to understand what was going on and
> I narrowed down the difference in the good and bad case to a dwc3
> reset interrupts happening after usb_gadget_probe_driver() completes.
> In the good case, we see the reset interrupts, and in the failed case
> we don't.

So I've kept digging around on this, and started dumping registers at
the end of dwc3_gadget_start() and then dwc3_gadget_pullup() as that
still is called shortly after in both cases.

The one consistent difference between the working and not working case
I saw was the DWC3_DSTS_COREIDLE bit in the DWC3_DSTS register.

It seems when we get to gadget_start()/pullup() if the DSTS_COREIDLE
bit isn't on we won't get the reset irq.
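
To make the bit in question concrete, here is a small sketch that splits
a DSTS value into a few of its fields. The bit positions are taken from
the DWC3_DSTS_* macros in drivers/usb/dwc3/core.h as best I can tell,
so treat them as assumptions and check them against your tree. The
function name is made up for illustration.

```python
# Assumed bit layout, mirroring the DWC3_DSTS_* macros in core.h.
DSTS_COREIDLE = 1 << 23
DSTS_DEVCTRLHLT = 1 << 22
DSTS_USBLNKST_SHIFT = 18
DSTS_USBLNKST_MASK = 0xF << DSTS_USBLNKST_SHIFT
DSTS_RXFIFOEMPTY = 1 << 17
DSTS_CONNECTSPD_MASK = 0x7


def decode_dsts(value: int) -> dict:
    """Split a raw DSTS register value into named fields."""
    return {
        "coreidle": bool(value & DSTS_COREIDLE),
        "devctrlhlt": bool(value & DSTS_DEVCTRLHLT),
        "usblnkst": (value & DSTS_USBLNKST_MASK) >> DSTS_USBLNKST_SHIFT,
        "rxfifoempty": bool(value & DSTS_RXFIFOEMPTY),
        "connectspd": value & DSTS_CONNECTSPD_MASK,
    }


if __name__ == "__main__":
    # Example values: all-zero, and one with bits 23 and 17 set.
    for raw in (0x00000000, 0x00820000):
        print(f"0x{raw:08x}", decode_dsts(raw))
```

With this layout, a DSTS of 0x00820000 has COREIDLE and RXFIFOEMPTY
set, while a DSTS of 0x00000000 has neither.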

I added a simple timeout loop to pullup() similar to the
DSTS_DEVCTRLHLT loop, but in the failure mode it always times out with
COREIDLE not being set.
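
The shape of such a timeout loop, modeled outside the kernel, is
roughly the following. This is a user-space sketch rather than the
actual change: read_reg stands in for a DSTS register read, the
COREIDLE bit position is assumed from core.h, and the fake reader
exists only so the loop can be exercised.

```python
import time

COREIDLE = 1 << 23  # assumed position of DWC3_DSTS_COREIDLE


def wait_for_bit(read_reg, mask, timeout_s=0.05, interval_s=0.001):
    """Poll read_reg() until (value & mask) is set or the timeout expires.

    Returns True if the bit was seen, False if the loop timed out.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_reg() & mask:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def make_fake_reader(reads_until_set):
    """Hypothetical register: reports COREIDLE after N reads (None = never)."""
    count = {"reads": 0}

    def read():
        count["reads"] += 1
        if reads_until_set is not None and count["reads"] > reads_until_set:
            return COREIDLE
        return 0

    return read


if __name__ == "__main__":
    print(wait_for_bit(make_fake_reader(3), COREIDLE))     # bit shows up
    print(wait_for_bit(make_fake_reader(None), COREIDLE))  # stuck off
```

The second call models the failure mode described here: the bit never
shows up, so the loop can only report the timeout.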

Searching around hasn't provided any info on what COREIDLE actually
means, so I'm a bit in the dark.  Any clues?

thanks
-john


* Re: dwc3 inconsistent gadget connection state?
  2020-07-03  3:08   ` John Stultz
@ 2020-07-03  7:46     ` Jun Li
  0 siblings, 0 replies; 12+ messages in thread
From: Jun Li @ 2020-07-03  7:46 UTC
  To: John Stultz
  Cc: Felipe Balbi, Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha,
	YongQin Liu, Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

John Stultz <john.stultz@linaro.org> wrote on Fri, Jul 3, 2020 at 11:08 AM:
>
> On Thu, Jul 2, 2020 at 7:55 PM Jun Li <lijun.kernel@gmail.com> wrote:
> > John Stultz <john.stultz@linaro.org> wrote on Fri, Jul 3, 2020 at 5:46 AM:
> > > I was curious if you or anyone else had any thoughts on how to debug
> > > this further?
> >
> > If you force your gadget to be USB2(e.g. in dts)
> >
> > +       maximum-speed = "high-speed";
> >
> > will you still reproduce this issue?
>
> Thanks for the suggestion! Unfortunately, I gave that a try, but still
> reproduced the same issue with this setting.
>
> Curious, what the issue is your were thinking this would help with?

I have seen device mode run into problems on the super speed channel
when there is a switch device between the host and the Type-C port: the
controller then does not fall back to enabling the USB2 termination, so
the host can't detect my board's Type-C port.

>
> > Does your gadget connect to host super speed port directly via a C-to-A cable
> > in your test labs? or there is something between?
>
> I'm not sure of the details in the lab, however I can reproduce this
> on my desk with a Host machine <-> USB hub <-> USB-C port.
>
> Additionally, the board itself is a little complicated, in that the
> USB-C port is USB2 only (however, it does have two USB-A USB3 ports
> behind an on-board hub and a switch to decide if the USB-C or hub
> ports are enabled since there is only one usb controller).

So actually you should limit the gadget speed to high speed for your
USB2-only Type-C port.

Can the host machine detect the connection when you plug in?

Li Jun
>
> thanks
> -john


* RE: dwc3 inconsistent gadget connection state?
  2020-07-03  6:15 ` John Stultz
@ 2020-07-03  7:57   ` Anurag Kumar Vulisha
  2020-08-05  5:32     ` John Stultz
  0 siblings, 1 reply; 12+ messages in thread
From: Anurag Kumar Vulisha @ 2020-07-03  7:57 UTC
  To: John Stultz, Felipe Balbi
  Cc: Tejas Joglekar, Yang Fei, YongQin Liu, Andrzej Pietrasiewicz,
	Thinh Nguyen, Linux USB List, Jun Li

Hi John,

>-----Original Message-----
>From: John Stultz <john.stultz@linaro.org>
>Sent: Friday, July 3, 2020 11:46 AM
>To: Felipe Balbi <balbi@kernel.org>
>Cc: Tejas Joglekar <tejas.joglekar@synopsys.com>; Yang Fei
><fei.yang@intel.com>; Anurag Kumar Vulisha <anuragku@xilinx.com>;
>YongQin Liu <yongqin.liu@linaro.org>; Andrzej Pietrasiewicz
><andrzej.p@collabora.com>; Thinh Nguyen <thinhn@synopsys.com>; Linux
>USB List <linux-usb@vger.kernel.org>; Jun Li <lijun.kernel@gmail.com>
>Subject: Re: dwc3 inconsistent gadget connection state?
>
>On Thu, Jul 2, 2020 at 2:44 PM John Stultz <john.stultz@linaro.org> wrote:
>>
>>   I've been tripping over an issue on my HiKey960 where with the usb-c
>> gadget cable connected, the gadget code doesn't consistently seem to
>> initialize properly. I had rarely seen this behavior previously, but
>> more recently it has become more frequent and annoying.
>>
>> Usually, unplugging and replugging the USB-C cable would get things
>> working again (but that's not helpful in test labs).
>>
>> I annotated a bunch of code trying to understand what was going on and
>> I narrowed down the difference in the good and bad case to a dwc3
>> reset interrupts happening after usb_gadget_probe_driver() completes.
>> In the good case, we see the reset interrupts, and in the failed case
>> we don't.
>
>So I've kept digging around on this, and started dumping registers at the end
>of dwc3_gadget_start() and then dwc3_gadget_pullup() as that still is called
>shortly after in both cases.
>
>The one consistent difference between the working and not working case I
>saw was the DWC3_DSTS_COREIDLE bit in the DWC3_DSTS register.
>
>It seems when we get to gadget_start()/pullup() if the DSTS_COREIDLE bit
>isn't on we won't get the reset irq.
>
>I added a simple timeout loop to pullup() similar to the DSTS_DEVCTRLHLT
>loop, but in the failure mode it always times out with COREIDLE not being set.
>
>Searching around hasn't provided any info on what COREIDLE actually means,
>so I'm a bit in the dark.  Any clues?
>
The DSTS.CoreIdle bit indicates that the core has processed all the RXFIFO data,
updated the descriptors and is in the idle state.
From your previous mail I understood that the USB-C connection is configured for
USB 2.0 only. Since you are facing an issue with reset, can you please try setting
the USB2PHYCFG.XCVRDLY bit? Enabling this bit adds an extra 2.5us delay between
the controller sending the command that configures the ULPI transceiver to HS mode
and the controller driving TxValid to 0 to send the HS chirp signal. Please check
if this workaround works for you.

Thanks,
Anurag Kumar Vulisha



* Re: dwc3 inconsistent gadget connection state?
  2020-07-02 21:44 dwc3 inconsistent gadget connection state? John Stultz
  2020-07-03  2:55 ` Jun Li
  2020-07-03  6:15 ` John Stultz
@ 2020-07-03  9:54 ` Felipe Balbi
  2020-07-04  5:51   ` John Stultz
  2 siblings, 1 reply; 12+ messages in thread
From: Felipe Balbi @ 2020-07-03  9:54 UTC
  To: John Stultz
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List


Hi,

John Stultz <john.stultz@linaro.org> writes:
>   I've been tripping over an issue on my HiKey960 where with the usb-c
> gadget cable connected, the gadget code doesn't consistently seem to
> initialize properly. I had rarely seen this behavior previously, but
> more recently it has become more frequent and annoying.
>
> Usually, unplugging and replugging the USB-C cable would get things
> working again (but that's not helpful in test labs).
>
> I annotated a bunch of code trying to understand what was going on and
> I narrowed down the difference in the good and bad case to a dwc3
> reset interrupts happening after usb_gadget_probe_driver() completes.
> In the good case, we see the reset interrupts, and in the failed case
> we don't.
>
> [   16.491953] JDB: usb_gadget_probe_driver
> [   16.495938] JDB: udc_bind_to_driver
> [   16.499555] JDB: dwc3_gadget_start irq: 65 revision: 1429417994
> [   16.503803] JDB: __dwc3_gadget_ep_enable
> [   16.507791] JDB: __dwc3_gadget_ep_enable
> [   16.511715] JDB: dwc3_gadget_enable_irq
> [   16.515582] JDB: usb_udc_connect_control
> [   16.519510] JDB: usb_gadget_connect
> <in the bad case, this is all we see, the gadget device doesn't come up>
> [   16.811010] JDB: dwc3_gadget_interrupt
> [   16.814783] JDB: dwc3_gadget_reset_interrupt
> [   16.819047] JDB: dwc3_reset_gadget
> [   16.823935] JDB: dwc3_gadget_interrupt
> [   16.827686] JDB: __dwc3_gadget_ep_enable
> [   16.831611] JDB: __dwc3_gadget_ep_enable
> [   16.994477] JDB: dwc3_gadget_interrupt
> [   16.998246] JDB: dwc3_gadget_reset_interrupt
> [   17.002519] JDB: dwc3_reset_gadget
> [   17.005922] JDB: usb_gadget_udc_reset
> [   17.062422] JDB: usb_gadget_set_state  state: 5
> [   17.067069] JDB: dwc3_gadget_interrupt
> [   17.070823] JDB: __dwc3_gadget_ep_enable
> [   17.074745] JDB: __dwc3_gadget_ep_enable
> [   17.170898] JDB: usb_gadget_set_state  state: 6
> [   17.195605] JDB: usb_gadget_set_state  state: 7
> [   17.200179] JDB: __dwc3_gadget_ep_enable
> [   17.204118] JDB: __dwc3_gadget_ep_enable
> [   17.208057] JDB: usb_gadget_vbus_draw
> [   17.211721] JDB: usb_gadget_set_state  state: 7
> <in the good case everything is happy here>
>
>
> This sounds a bit like the issue in the comment here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.8-rc3#n3143
>
> However, I've tried calling dwc3_gadget_reset_interrupt() and
> dwc3_reset_gadget() at the tail end of dwc3_gadget_start() but that
> doesn't seem to help.
>
> I was curious if you or anyone else had any thoughts on how to debug
> this further?

Try enabling dwc3 tracepoints and collecting working and failing
cases. If I were to guess, I would say there's a small race condition
between setting pullup and the transceiver sending the VBUS_VALID signal
to dwc3.

-- 
balbi


* Re: dwc3 inconsistent gadget connection state?
  2020-07-03  9:54 ` Felipe Balbi
@ 2020-07-04  5:51   ` John Stultz
  2020-07-04 14:38     ` Felipe Balbi
  0 siblings, 1 reply; 12+ messages in thread
From: John Stultz @ 2020-07-04  5:51 UTC
  To: Felipe Balbi
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

On Fri, Jul 3, 2020 at 2:54 AM Felipe Balbi <balbi@kernel.org> wrote:
> John Stultz <john.stultz@linaro.org> writes:
> > I was curious if you or anyone else had any thoughts on how to debug
> > this further?
>
> Try enabling dwc3 tracepoints and collecting working and failing
> cases. If I were to guess, I would say there's a small race condition
> between setting pullup and the transceiver sending the VBUS_VALID signal
> to dwc3.

Trace logs attached. Let me know if you have any further ideas!

thanks
-john

[-- Attachment #2: hikey960.tar.xz --]
[-- Type: application/octet-stream, Size: 10284 bytes --]


* Re: dwc3 inconsistent gadget connection state?
  2020-07-04  5:51   ` John Stultz
@ 2020-07-04 14:38     ` Felipe Balbi
  2020-07-07  3:56       ` John Stultz
  0 siblings, 1 reply; 12+ messages in thread
From: Felipe Balbi @ 2020-07-04 14:38 UTC
  To: John Stultz
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List


Hi,

John Stultz <john.stultz@linaro.org> writes:
> On Fri, Jul 3, 2020 at 2:54 AM Felipe Balbi <balbi@kernel.org> wrote:
>> John Stultz <john.stultz@linaro.org> writes:
>> > I was curious if you or anyone else had any thoughts on how to debug
>> > this further?
>>
>> Try enabling dwc3 tracepoints and collecting working and failing
>> cases. If I were to guess, I would say there's a small race condition
>> between setting pullup and the transceiver sending the VBUS_VALID signal
>> to dwc3.
>
> Trace logs attached. Let me know if you have any further ideas!

You can see from the failure case that we never got a Reset event. This
happens, for instance, when dwc3 doesn't know that VBUS is above the
VBUS_VALID threshold (4.4V). When the problem happens, I'm assuming USB
is completely dead, meaning that keeping the cable connected for longer
won't change anything, right?

In that case, could you dump DWC3 registers (there's a debugfs interface
for that)? I'm mostly interested in the PHY registers, both USB2 and
USB3. Check if the PHYs are suspended in the error case.
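
Comparing two such dumps by hand gets tedious, so here is a small
sketch for it. It assumes each dump was saved as plain text with one
"NAME = 0x..." entry per line; the function names and the sample
strings are made up for illustration.

```python
import re

# One register per line, e.g. "GEVNTSIZ(0) = 0x00001000".
REG_LINE = re.compile(r"^\s*(\S+)\s*=\s*(0x[0-9a-fA-F]+)\s*$")


def parse_regdump(text: str) -> dict:
    """Parse regdump text into a {register name: integer value} mapping."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_regdumps(bad: dict, good: dict) -> dict:
    """Return {name: (bad value, good value)} for registers that differ."""
    return {
        name: (bad[name], good[name])
        for name in bad
        if name in good and bad[name] != good[name]
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from real hardware.
    bad_text = "REG_A = 0x00000001\nREG_B(0) = 0x00000000\n"
    good_text = "REG_A = 0x00000001\nREG_B(0) = 0x00000010\n"
    changed = diff_regdumps(parse_regdump(bad_text), parse_regdump(good_text))
    for name, (bad_val, good_val) in changed.items():
        print(f"{name}: bad=0x{bad_val:08x} good=0x{good_val:08x}")
```

Working on parsed integers rather than text lines makes it easy to
mask out fields that are expected to differ between runs.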

If they are, try enabling the quirk flags that disable suspend for the
PHYs (check binding documentation). If that helps, then discuss with
your Silicon Validation guys what the requirements are when it comes to
suspend. Some PHYs are inherently quirky and need some of the quirk
flags dwc3 provides.

Note that disabling suspend completely is a pretty large hammer that
should only be used if nothing else helps. Some PHYs are happy with a
simple delay of U1/U2/U3 entry but, again, check with your Silicon
Validation folks; they have likely already gone through this during
chip characterization.

cheers

-- 
balbi


* Re: dwc3 inconsistent gadget connection state?
  2020-07-04 14:38     ` Felipe Balbi
@ 2020-07-07  3:56       ` John Stultz
  2020-07-07 10:43         ` Felipe Balbi
  0 siblings, 1 reply; 12+ messages in thread
From: John Stultz @ 2020-07-07  3:56 UTC
  To: Felipe Balbi
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List

On Sat, Jul 4, 2020 at 7:38 AM Felipe Balbi <balbi@kernel.org> wrote:
> John Stultz <john.stultz@linaro.org> writes:
> > On Fri, Jul 3, 2020 at 2:54 AM Felipe Balbi <balbi@kernel.org> wrote:
> >> John Stultz <john.stultz@linaro.org> writes:
> >> > I was curious if you or anyone else had any thoughts on how to debug
> >> > this further?
> >>
> >> Try enabling dwc3 tracepoints and collecting working and failing
> >> cases. If I were to guess, I would say there's a small race condition
> >> between setting pullup and the transceiver sending the VBUS_VALID signal
> >> to dwc3.
> >
> > Trace logs attached. Let me know if you have any further ideas!
>
> You can see from failure case that we never got a Reset event. This
> happens, for instance, when dwc3 doesn't know that VBUS is above
> VBUS_VALID threshold (4.4V). When the problem happens, I'm assuming USB
> is completely dead, meaning that keeping the cable connected for longer
> won't change anything, right?

Correct. The only way to get it working is to unplug and replug the
cable (sometimes more than once).

> In that case, could you dump DWC3 registers (there's a debugfs interface
> for that)? I'm mostly interested in the PHY registers, both USB2 and
> USB3. Check if the PHYs are suspended in the error case.

Here's a diff of the regdump in bad and good cases:
--- regdump.bad 2020-07-07 03:44:46.799514793 +0000
+++ regdump.good        2020-07-07 03:44:44.723534198 +0000
@@ -24,7 +24,7 @@
 GHWPARAMS7 = 0x04881e8d
 GDBGFIFOSPACE = 0x00420000
 GDBGLTSSM = 0x41090440
-GDBGBMU = 0xa0b08000
+GDBGBMU = 0x20300000
 GPRTBIMAP_HS0 = 0x00000000
 GPRTBIMAP_HS1 = 0x00000000
 GPRTBIMAP_FS0 = 0x00000000
@@ -162,29 +162,29 @@
 GEVNTSIZ(0) = 0x00001000
 GEVNTCOUNT(0) = 0x00000000
 GHWPARAMS8 = 0x00000fea
-DCFG = 0x00120804
-DCTL = 0x80f00000
+DCFG = 0x0052082c
+DCTL = 0x8cf00a00
 DEVTEN = 0x00001217
-DSTS = 0x00000000
+DSTS = 0x00820000
 DGCMDPAR = 0x00000000
 DGCMD = 0x00000000
-DALEPENA = 0x00000003
+DALEPENA = 0x0000000f
 DEPCMDPAR2(0) = 0x00000000
-DEPCMDPAR1(0) = 0x17a8e000
+DEPCMDPAR1(0) = 0x15935000
 DEPCMDPAR0(0) = 0x00000002
 DEPCMD(0) = 0x00000006
 DEPCMDPAR2(1) = 0x00000000
-DEPCMDPAR1(1) = 0x02000500
-DEPCMDPAR0(1) = 0x00001000
-DEPCMD(1) = 0x00000001
+DEPCMDPAR1(1) = 0x15935000
+DEPCMDPAR0(1) = 0x00000002
+DEPCMD(1) = 0x00010006
 DEPCMDPAR2(2) = 0x00000000
 DEPCMDPAR1(2) = 0x00000000
-DEPCMDPAR0(2) = 0x00000001
-DEPCMD(2) = 0x00030002
+DEPCMDPAR0(2) = 0x00000000
+DEPCMD(2) = 0x00020007
 DEPCMDPAR2(3) = 0x00000000
 DEPCMDPAR1(3) = 0x00000000
-DEPCMDPAR0(3) = 0x00000001
-DEPCMD(3) = 0x00040002
+DEPCMDPAR0(3) = 0x00000000
+DEPCMD(3) = 0x00030007
 DEPCMDPAR2(4) = 0x00000000
 DEPCMDPAR1(4) = 0x00000000
 DEPCMDPAR0(4) = 0x00000001


> If they are, try enabling the quirk flags that disable suspend for the
> PHYs (check binding documentation). If that helps, then discuss with
> your Silicon Validation guys what are the requirements when it comes to
> suspend. Some PHYs are inherently quirky and need some of the quirky
> flags dwc3 provides.
>
> Note that disabling suspend completely is a pretty large hammer that
> should only be used if nothing else helps. Some PHYs are happy with a
> simple delay of U1/U2/U3 entry but, again, check with your Silicon
> Validation folks, likely they have already gone through this during chip
> characterization.

Unfortunately I don't have any access to silicon validation folks.
There are already a number of quirk bindings in use, but I'll
tinker around with them a bit to see if it causes any behavior change.

Thanks so much for the ideas and feedback! Much appreciated!
-john


* Re: dwc3 inconsistent gadget connection state?
  2020-07-07  3:56       ` John Stultz
@ 2020-07-07 10:43         ` Felipe Balbi
  0 siblings, 0 replies; 12+ messages in thread
From: Felipe Balbi @ 2020-07-07 10:43 UTC
  To: John Stultz
  Cc: Tejas Joglekar, Yang Fei, Anurag Kumar Vulisha, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List


Hi,

John Stultz <john.stultz@linaro.org> writes:
> On Sat, Jul 4, 2020 at 7:38 AM Felipe Balbi <balbi@kernel.org> wrote:
>> John Stultz <john.stultz@linaro.org> writes:
>> > On Fri, Jul 3, 2020 at 2:54 AM Felipe Balbi <balbi@kernel.org> wrote:
>> >> John Stultz <john.stultz@linaro.org> writes:
>> >> > I was curious if you or anyone else had any thoughts on how to debug
>> >> > this further?
>> >>
>> >> Try enabling dwc3 tracepoints and collecting working and failing
>> >> cases. If I were to guess, I would say there's a small race condition
>> >> between setting pullup and the transceiver sending the VBUS_VALID signal
>> >> to dwc3.
>> >
>> > Trace logs attached. Let me know if you have any further ideas!
>>
>> You can see from failure case that we never got a Reset event. This
>> happens, for instance, when dwc3 doesn't know that VBUS is above
>> VBUS_VALID threshold (4.4V). When the problem happens, I'm assuming USB
>> is completely dead, meaning that keeping the cable connected for longer
>> won't change anything, right?
>
> Correct. The only way to get it working is to unplug and replug the
> cable (sometimes more than once).
>
>> In that case, could you dump DWC3 registers (there's a debugfs interface
>> for that)? I'm mostly interested in the PHY registers, both USB2 and
>> USB3. Check if the PHYs are suspended in the error case.
>
> Here's a diff of the regdump in bad and good cases:
> --- regdump.bad 2020-07-07 03:44:46.799514793 +0000
> +++ regdump.good        2020-07-07 03:44:44.723534198 +0000
> @@ -162,29 +162,29 @@
>  GEVNTSIZ(0) = 0x00001000
>  GEVNTCOUNT(0) = 0x00000000
>  GHWPARAMS8 = 0x00000fea
> -DCFG = 0x00120804
> -DCTL = 0x80f00000
> +DCFG = 0x0052082c

the only interesting thing here is DCFG. Can you decode it?

> +DCTL = 0x8cf00a00

IIRC, this is only telling you that your controller is in U0 or
something like that. Not interesting.
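
For the two DCFG values in the diff quoted above, a rough decode can be
done along these lines. It is only a sketch: the field positions are
assumed from the DWC3_DCFG_* macros in drivers/usb/dwc3/core.h, only a
subset of fields is handled, and the function name is made up for
illustration.

```python
# Assumed field layout, mirroring the DWC3_DCFG_* macros in core.h.
DCFG_SPEED_MASK = 0x7
DCFG_DEVADDR_SHIFT = 3
DCFG_DEVADDR_MASK = 0x7F << DCFG_DEVADDR_SHIFT
DCFG_NUMP_SHIFT = 17
DCFG_NUMP_MASK = 0x1F << DCFG_NUMP_SHIFT
DCFG_LPM_CAP = 1 << 22

SPEED_NAMES = {
    0: "high-speed",
    1: "full-speed",
    2: "low-speed",
    4: "super-speed",
    5: "super-speed-plus",
}


def decode_dcfg(value: int) -> dict:
    """Split a raw DCFG value into the fields named above."""
    speed = value & DCFG_SPEED_MASK
    return {
        "speed": SPEED_NAMES.get(speed, f"reserved({speed})"),
        "devaddr": (value & DCFG_DEVADDR_MASK) >> DCFG_DEVADDR_SHIFT,
        "nump": (value & DCFG_NUMP_MASK) >> DCFG_NUMP_SHIFT,
        "lpm_cap": bool(value & DCFG_LPM_CAP),
    }


if __name__ == "__main__":
    for label, raw in (("bad", 0x00120804), ("good", 0x0052082C)):
        print(label, f"0x{raw:08x}", decode_dcfg(raw))
```

If that layout is right, both dumps have the speed field programmed for
super speed and differ in two places: the good dump carries a non-zero
device address (5) and has LPM_CAP set, while the bad dump still has
address 0, which would fit a device that was never reset and enumerated.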

>> If they are, try enabling the quirk flags that disable suspend for the
>> PHYs (check binding documentation). If that helps, then discuss with
>> your Silicon Validation guys what are the requirements when it comes to
>> suspend. Some PHYs are inherently quirky and need some of the quirky
>> flags dwc3 provides.
>>
>> Note that disabling suspend completely is a pretty large hammer that
>> should only be used if nothing else helps. Some PHYs are happy with a
>> simple delay of U1/U2/U3 entry but, again, check with your Silicon
>> Validation folks, likely they have already gone through this during chip
>> characterization.
>
> Unfortunately I don't have any access to silicon validation folks.

no publicly available Errata List either? Do you know which PHY IP this
platform uses?

> There is already a number of the quirk bindings in use, but I'll
> tinker around with them a bit to see if it causes any behavior change.

Would be great to review those with people who were involved with the
actual Silicon development, but if you don't have access to them, the
discussion is moot :-s

> Thanks so much for the ideas and feedback! Much appreciated!

no worries ;-)

-- 
balbi


* Re: dwc3 inconsistent gadget connection state?
  2020-07-03  7:57   ` Anurag Kumar Vulisha
@ 2020-08-05  5:32     ` John Stultz
  0 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2020-08-05  5:32 UTC
  To: Anurag Kumar Vulisha
  Cc: Felipe Balbi, Tejas Joglekar, Yang Fei, YongQin Liu,
	Andrzej Pietrasiewicz, Thinh Nguyen, Linux USB List, Jun Li

On Fri, Jul 3, 2020 at 12:57 AM Anurag Kumar Vulisha
<anuragku@xilinx.com> wrote:
> >On Thu, Jul 2, 2020 at 2:44 PM John Stultz <john.stultz@linaro.org> wrote:
> >The one consistent difference between the working and not working case I
> >saw was the DWC3_DSTS_COREIDLE bit in the DWC3_DSTS register.
> >
> >It seems when we get to gadget_start()/pullup() if the DSTS_COREIDLE bit
> >isn't on we won't get the reset irq.
> >
> >I added a simple timeout loop to pullup() similar to the DSTS_DEVCTRLHLT
> >loop, but in the failure mode it always times out with COREIDLE not being set.
> >
> >Searching around hasn't provided any info on what COREIDLE actually means,
> >so I'm a bit in the dark.  Any clues?
> >
> DSTS.CoreIdle bit indicates that the core processed all the RXFIFO data, updated the
> Descriptors and is in idle state.
> From your previous mail I understood that the USB-C connection is configured for
> USB 2.0 only. Since you are facing issue with reset, can u please try setting the
> USB2PHYCFG. XCVRDLY bit. Enabling this bit adds an extra 2.5us delay after the
> controller sending command to configure the ULPI transceiver to HS mode and
> controller driving TxValid to 0,  for sending a HS chirp signal. Please check if this
> workaround works for you.

Hey Anurag!
  Sorry for the slow response! I finally took a bit more time to chase
this issue today, and tried your suggestion above. Unfortunately
adding the XCVRDLY bit to the USB2PHYCFG register doesn't seem to
help. I see the same behavior either way. Thanks for the suggestion
though!

I can consistently detect the problem when the COREIDLE bit isn't set
after the dwc3_ep0_out_start() call in __dwc3_gadget_start():
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.8#n2130

When it gets stuck off, the COREIDLE bit doesn't seem to ever come
back while the cable is plugged in.
Since unplugging and replugging the cable does seem to unstick this,
and since I can consistently detect when the problem has occurred, I
tweaked the code so we would return an error (and that error would be
handled in the calling dwc3_gadget_start() code). However, the device
then tries to initialize over and over, but COREIDLE is still
stuck off. So I tried a few times to see if I could reset via
dwc3_reset_gadget(), but that doesn't seem to actually do anything
that unsticks the core. Then I tried to mimic something similar to the
softreset code, but that just ends up getting the code stuck elsewhere
(I see hard hangs and RCU warnings, but I'm not sure where it goes awry).
So not much luck...

Is there some recommendation for how to best reset the hardware from
the gadget.c code? Or is there a better place to try to detect this
COREIDLE stuck-off state and do something about it?

thanks
-john

