linux-usb.vger.kernel.org archive mirror
* Debugging usb core/xhci issue
From: Bryan Gillespie @ 2019-11-22 15:42 UTC
  To: linux-usb

Hello,

My name is Bryan Gillespie (RPGillespie6 on GitHub). I'm emailing
because I'm completely stumped on how to approach debugging a
USB-related issue on an embedded Linux setup, and I'm hoping someone
here can at least give some high-level ideas on how to approach it.
Also, I've never used a mailing list before, so let me know if this is
completely out of line.

Basically, I have a Marvell a3700 SoC running embedded Linux (version
4.4) connected to a Qualcomm modem (Linux version 3.18) via USB 3.0
traces on a PCB. The Qualcomm modem enumerates as a device on the
a3700 with 6 interfaces and 14 endpoints. Various drivers bind to the
USB interfaces: qcserial, qmi_wwan, adb (userspace), and ipcrtr
(normally not a USB driver, but it has a USB xprt added).

Everything seems to work perfectly fine until I start putting the
system under higher load for longer periods of time. For example, if I
run iperf traffic through the qmi_wwan/usbnet interface (20 MB up, 200
MB down) and send control traffic periodically through ipc router
interface, eventually (~1-3 hours) there is some kind of breakage and
nothing usb-related works anymore for that device. Not even adb works
even though it has its own dedicated interface (adb shell just hangs
indefinitely, for example).

**This leads me to believe something in linux's usbcore or xhci
somehow got foobared by an interface driver since those are the common
layers shared by all usb interfaces?**

I don't understand these layers well enough to know what that could
possibly be. I should also mention that sometimes (not always) there
is a single dmesg trace that happens at the time of breakage in the
a3700:

[ 3771.097658] ipcrtr_read_cb Connection Reset 7 urb status -71

ipcrtr_read_cb is the urb complete callback and -71 is the feared
-EPROTO urb code.
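
For anyone decoding the status value above, here is a minimal sketch
that maps a negative URB status to its errno name. It assumes it runs
on a Linux host whose errno numbering matches the target kernel (71 is
EPROTO in the generic Linux numbering); describe_urb_status is a
hypothetical helper name, not a kernel or library API.

```python
import errno
import os


def describe_urb_status(status: int) -> str:
    """Return a readable label for a negative-errno URB status value."""
    if status == 0:
        return "0 (success)"
    code = abs(status)
    # errno.errorcode maps numbers to names using the host's numbering.
    name = errno.errorcode.get(code, "unknown errno")
    return f"{status} (-{name}: {os.strerror(code)})"


if __name__ == "__main__":
    # The value from the dmesg line quoted above.
    print(describe_urb_status(-71))
```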

If I issue USBDEVFS_RESET to the device with ioctl inside the a3700,
everything starts magically working again (presumably because all the
data structures/buffers/etc. in xhci and above are reset and all the
interfaces are re-probed?). I am pretty sure (but not positive) it is
not the modem's fault since qualcomm's provided reference processor
seems to be able to run iperf traffic indefinitely.
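
For reference, the reset described above can be sketched roughly as
follows. This is an illustration under assumptions, not the exact code
I run: the request number is _IO('U', 20) computed with the generic
ioctl encoding (which I believe applies to arm64), the bus and device
numbers are placeholders, and usbfs_path and reset_usb_device are
hypothetical helper names.

```python
import fcntl
import os

# USBDEVFS_RESET is _IO('U', 20); with the generic ioctl encoding
# (direction and size bits zero) that is (ord('U') << 8) | 20.
USBDEVFS_RESET = (ord("U") << 8) | 20


def usbfs_path(bus: int, dev: int) -> str:
    """Build the usbfs device node path for a bus/device number pair."""
    return f"/dev/bus/usb/{bus:03d}/{dev:03d}"


def reset_usb_device(path: str) -> None:
    """Issue USBDEVFS_RESET on the given usbfs node (needs write access)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Placeholder bus/device numbers; take the real ones from lsusb.
    reset_usb_device(usbfs_path(1, 2))
```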

I should mention that the a3700 processor is very limited on memory;
it only has about 160 MB of total memory (DRAM) available to linux
compared to Qualcomm's reference processor which has 4 GB memory (and
is running linux version 3.10).

If you've made it this far in my email, my question is - how would you
approach debugging this? Are there some key things you would check?
Are there any known gotchas with linux 4.X as host and linux 3.X as
device? It is not easily reproducible (at least not without waiting a
long time; I'm currently exploring whether it is possible to trigger
the issue faster somehow). I have ftrace enabled, but so far I haven't been able
to get a trace that captures the exact window of breakage. I tried
turning on all usb-related debug with dynamic debug as well, but this
appears to cause the kernel to consume 100% cpu as soon as I start
iperf so currently I'm trying to identify some key files to turn on
traces for that hopefully won't overwhelm the cpu with logging.
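
To make the two ideas above concrete, here is a rough sketch of what I
have in mind: enable dynamic debug per source file instead of for all
of USB, and stop the ftrace ring buffer as soon as the -71 line shows
up in the kernel log so the window of breakage is not overwritten. The
assumptions are that debugfs is mounted at /sys/kernel/debug, that the
file names passed in (xhci-ring.c and hub.c here) are only illustrative
choices, and that log lines arrive from some follower on stdin. All
function names are hypothetical.

```python
import re

DEBUGFS = "/sys/kernel/debug"  # assumption: debugfs is mounted here
TRIGGER = re.compile(r"urb status -71")


def dyndbg_queries(files, enable=True):
    """Build one dynamic debug 'file <name> +p' query per source file."""
    flag = "+p" if enable else "-p"
    return [f"file {name} {flag}" for name in files]


def apply_dyndbg(queries, control=DEBUGFS + "/dynamic_debug/control"):
    """Write each query to the dynamic debug control file."""
    for query in queries:
        with open(control, "w") as ctl:
            ctl.write(query + "\n")


def freeze_trace_on_match(lines, tracing_on=DEBUGFS + "/tracing/tracing_on"):
    """Stop ftrace recording at the first log line matching TRIGGER.

    Returns the matching line, or None if the input ends without a match.
    """
    for line in lines:
        if TRIGGER.search(line):
            with open(tracing_on, "w") as ctl:
                ctl.write("0")
            return line
    return None


if __name__ == "__main__":
    import sys

    apply_dyndbg(dyndbg_queries(["xhci-ring.c", "hub.c"]))
    hit = freeze_trace_on_match(sys.stdin)
    print("tracing stopped at:", hit)
```

The trace file can then be read at leisure, since writing 0 to
tracing_on only stops recording and leaves the buffer contents in place.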

Any recommendations/ideas at all would be much appreciated.

Thank you!

Respectfully,
Bryan

-- 
Bryan Gillespie
(801) 664-7527


* Re: Debugging usb core/xhci issue
From: Peter Chen @ 2019-11-25  3:22 UTC
  To: Bryan Gillespie; +Cc: linux-usb

On 19-11-22 10:42:26, Bryan Gillespie wrote:
> Hello,
> 
> My name is Bryan Gillespie (RPGillespie6 on GitHub). I'm emailing
> because I'm completely stumped on how to approach debugging a
> USB-related issue on an embedded Linux setup, and I'm hoping someone
> here can at least give some high-level ideas on how to approach it.
> Also, I've never used a mailing list before, so let me know if this is
> completely out of line.
> 
> Basically, I have a Marvell a3700 SoC running embedded Linux (version
> 4.4) connected to a Qualcomm modem (Linux version 3.18) via USB 3.0
> traces on a PCB. The Qualcomm modem enumerates as a device on the
> a3700 with 6 interfaces and 14 endpoints. Various drivers bind to the
> USB interfaces: qcserial, qmi_wwan, adb (userspace), and ipcrtr
> (normally not a USB driver, but it has a USB xprt added).
> 
> Everything seems to work perfectly fine until I start putting the
> system under higher load for longer periods of time. For example, if I
> run iperf traffic through the qmi_wwan/usbnet interface (20 MB up, 200
> MB down) and send control traffic periodically through ipc router
> interface, eventually (~1-3 hours) there is some kind of breakage and
> nothing usb-related works anymore for that device. Not even adb works
> even though it has its own dedicated interface (adb shell just hangs
> indefinitely, for example).
> 
> **This leads me to believe something in linux's usbcore or xhci
> somehow got foobared by an interface driver since those are the common
> layers shared by all usb interfaces?**
> 
> I don't understand these layers well enough to know what that could
> possibly be. I should also mention that sometimes (not always) there
> is a single dmesg trace that happens at the time of breakage in the
> a3700:
> 
> [ 3771.097658] ipcrtr_read_cb Connection Reset 7 urb status -71
> 
> ipcrtr_read_cb is the urb complete callback and -71 is the feared
> -EPROTO urb code.
> 

This is usually a hardware error.

> If I issue USBDEVFS_RESET to the device with ioctl inside the a3700,
> everything starts magically working again (presumably because all the
> data structures/buffers/etc. in xhci and above are reset and all the
> interfaces are re-probed?). I am pretty sure (but not positive) it is
> not the modem's fault since qualcomm's provided reference processor
> seems to be able to run iperf traffic indefinitely.
> 
> I should mention that the a3700 processor is very limited on memory;
> it only has about 160 MB of total memory (DRAM) available to linux
> compared to Qualcomm's reference processor which has 4 GB memory (and
> is running linux version 3.10).
> 
> If you've made it this far in my email, my question is - how would you
> approach debugging this? Are there some key things you would check?
> Are there any known gotchas with linux 4.X as host and linux 3.X as
> device? It is not easily reproducible (at least not without waiting a
> long time; I'm currently exploring whether it is possible to trigger
> the issue faster somehow). I have ftrace enabled, but so far I haven't been able
> to get a trace that captures the exact window of breakage. I tried
> turning on all usb-related debug with dynamic debug as well, but this
> appears to cause the kernel to consume 100% cpu as soon as I start
> iperf so currently I'm trying to identify some key files to turn on
> traces for that hopefully won't overwhelm the cpu with logging.
> 

Hi Bryan,

The kernels on both your host and your device are too old. The xHCI
driver has improved a lot in recent years. Would you please try the
newest kernel your hardware supports to see if anything changes? It is
easier for the driver maintainers to give hints for a newer kernel.

-- 

Thanks,
Peter Chen
