From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Kimber
Subject: RE: USB lockups on BeagleBone/AM335x
Date: Fri, 21 Feb 2014 00:11:40 +0000
Message-ID:
References: <20140220224902.GB10878@saruman.home>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
Return-path:
Received: from mail.enatel.co.nz ([131.203.63.198]:1912 "EHLO mail.enatel.co.nz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756439AbaBUALt convert rfc822-to-8bit (ORCPT ); Thu, 20 Feb 2014 19:11:49 -0500
In-Reply-To: <20140220224902.GB10878@saruman.home>
Content-Language: en-US
Sender: linux-omap-owner@vger.kernel.org
List-Id: linux-omap@vger.kernel.org
To: "balbi@ti.com"
Cc: "linux-omap@vger.kernel.org"

Hey,

Thanks for the response.

I've disabled the DMA (CONFIG_MUSB_PIO_ONLY=y) but the problem still
persists (for both USB sticks & USB serial ports). Now it looks like
dsps_interrupt() never fires, which causes the hang-up...

[ 94.865635] tty ttyUSB0: serial_write - 11 byte(s)
[ 94.865656] cp210x ttyUSB0: usb_serial_generic_write_start - length = 11, data = 54 45 53 54 49 4e 47 20 34 32 0a
[ 94.865680] musb-hdrc musb-hdrc.1.auto: qh ce461a00 periodic slot 10
[ 94.865700] musb-hdrc musb-hdrc.1.auto: qh ce461a00 urb ce481e80 dev2 ep1out-bulk, hw_ep 10, ce43db00/11
[ 94.865721] musb-hdrc musb-hdrc.1.auto: --> hw10 urb ce481e80 spd2 dev2 ep1out h_addr00 h_port00 bytes 11
[ 94.865740] musb-hdrc musb-hdrc.1.auto: TX ep10 fifo d0832c48 count 11 buf ce43db00
[ 94.865755] musb-hdrc musb-hdrc.1.auto: Start TX10 pio
[ 94.865792] musb-hdrc musb-hdrc.1.auto: usbintr (0) epintr(400)
[ 94.865810] musb-hdrc musb-hdrc.1.auto: ** IRQ host usb0000 tx0400 rx0000
[ 94.865826] musb-hdrc musb-hdrc.1.auto: OUT/TX10 end, csr 2100
[ 94.865866] musb-hdrc musb-hdrc.1.auto: complete ce481e80 usb_serial_generic_write_bulk_callback+0x0/0xd4 [usbserial] (0), dev2 ep1out, 11/11
[ 94.865971] tty ttyUSB0: serial_write - 11 byte(s)
[ 94.865991] cp210x ttyUSB0: usb_serial_generic_write_start - length = 11, data = 54 45 53 54 49 4e 47 20 34 33 0a
[ 94.866015] musb-hdrc musb-hdrc.1.auto: qh ce461a00 periodic slot 10
[ 94.866035] musb-hdrc musb-hdrc.1.auto: qh ce461a00 urb ce481e80 dev2 ep1out-bulk, hw_ep 10, ce43db00/11
[ 94.866055] musb-hdrc musb-hdrc.1.auto: --> hw10 urb ce481e80 spd2 dev2 ep1out h_addr00 h_port00 bytes 11
[ 94.866075] musb-hdrc musb-hdrc.1.auto: TX ep10 fifo d0832c48 count 11 buf ce43db00
[ 94.866089] musb-hdrc musb-hdrc.1.auto: Start TX10 pio

Chris

-----Original Message-----
From: Felipe Balbi [mailto:balbi@ti.com]
Sent: Friday, 21 February 2014 11:49 a.m.
To: Chris Kimber
Cc: linux-omap@vger.kernel.org
Subject: Re: USB lockups on BeagleBone/AM335x

Hi,

On Thu, Feb 20, 2014 at 10:39:00PM +0000, Chris Kimber wrote:
> Hi,
>
> I've been experiencing USB issues with a BeagleBone white rev A5.
> I've not seen any symptoms with the TI 3.2 kernel but I need to get
> access to some of the later drivers and didn't fancy back porting...
>
> So I've tried 3.8, 3.12 & 3.13 kernels with the patches from
> https://github.com/beagleboard/kernel and they seem to be able to talk
> to a USB memory stick but when making use of a cp210x and ftdi_sio
> based USB to UART adaptor the controller hangs.
>
> I've also tried linux-next, linux-usb and now linux-omap3 and they
> seem to be more unstable and even communicating with a USB stick seems
> flaky.
>
> I've got a test app that just writes "TESTING \n" to the tty
> forever.
>
> Here's some dmesg from linux-omap3 (1fbb354). I've added -DDEBUG to
> drivers/usb/{musb, serial}.
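
For anyone wanting to reproduce this, a test app of the kind described could be sketched roughly as below. This is not the original program: the "TESTING <n>\n" format is inferred from the hex payloads in the logs (54 45 53 54 49 4e 47 20 34 32 0a is "TESTING 42\n"), and the helper names format_test_line(), write_all() and blast() are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Format one numbered test line, e.g. "TESTING 42\n" (11 bytes).
 * Returns the length written, or -1 if the buffer is too small. */
static int format_test_line(char *buf, size_t size, unsigned int counter)
{
	int len = snprintf(buf, size, "TESTING %u\n", counter);

	return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/* Write the whole buffer, retrying on short writes and EINTR.
 * On a blocking descriptor this stalls once the layers below stop
 * accepting data. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Send `count` consecutive numbered lines to an already-open descriptor. */
static int blast(int fd, unsigned int start, unsigned int count)
{
	char line[32];
	unsigned int i;

	for (i = 0; i < count; i++) {
		int len = format_test_line(line, sizeof(line), start + i);

		if (len < 0 || write_all(fd, line, (size_t)len) < 0)
			return -1;
	}
	return 0;
}
```

To mimic the report, one would open the port with open("/dev/ttyUSB0", O_WRONLY | O_NOCTTY) and call blast() in an endless loop; the device path here is taken from the logs, the open flags are an assumption.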
>
> OK:
> [ 16.573781] tty ttyUSB0: serial_write - 11 byte(s)
> [ 16.573802] cp210x ttyUSB0: usb_serial_generic_write_start - length = 11, data = 54 45 53 54 49 4e 47 20 34 32 0a
> [ 16.573825] musb-hdrc musb-hdrc.1.auto: qh ce474b00 periodic slot 10
> [ 16.573846] musb-hdrc musb-hdrc.1.auto: qh ce474b00 urb ce489700 dev2 ep1out-bulk, hw_ep 10, ce44f700/11
> [ 16.573866] musb-hdrc musb-hdrc.1.auto: --> hw10 urb ce489700 spd2 dev2 ep1out h_addr00 h_port00 bytes 11
> [ 16.573887] musb-hdrc musb-hdrc.1.auto: configure ep10/a4 packet_sz=64, mode=0, dma_addr=0x8e44f700, len=11 is_tx=1
> [ 16.573905] musb-hdrc musb-hdrc.1.auto: Start TX10 dma
> [ 16.573928] musb-hdrc musb-hdrc.1.auto: DMA transfer done on hw_ep=10 bytes=11/11
> [ 16.573945] musb-hdrc musb-hdrc.1.auto: OUT/TX10 end, csr 3500, dma
> [ 16.573986] musb-hdrc musb-hdrc.1.auto: complete ce489700 usb_serial_generic_write_bulk_callback+0x0/0xd4 [usbserial] (0), dev2 ep1out, 11/11
>
> FAIL:
> [ 16.574085] tty ttyUSB0: serial_write - 11 byte(s)
> [ 16.574106] cp210x ttyUSB0: usb_serial_generic_write_start - length = 11, data = 54 45 53 54 49 4e 47 20 34 33 0a
> [ 16.574129] musb-hdrc musb-hdrc.1.auto: qh ce474b00 periodic slot 10
> [ 16.574149] musb-hdrc musb-hdrc.1.auto: qh ce474b00 urb ce489700 dev2 ep1out-bulk, hw_ep 10, ce44f700/11
> [ 16.574169] musb-hdrc musb-hdrc.1.auto: --> hw10 urb ce489700 spd2 dev2 ep1out h_addr00 h_port00 bytes 11
> [ 16.574191] musb-hdrc musb-hdrc.1.auto: configure ep10/a4 packet_sz=64, mode=0, dma_addr=0x8e44f700, len=11 is_tx=1
> [ 16.574208] musb-hdrc musb-hdrc.1.auto: Start TX10 dma
> [ 16.574231] musb-hdrc musb-hdrc.1.auto: DMA transfer done on hw_ep=10 bytes=11/11
> [ 16.574302] tty ttyUSB0: serial_write - 11 byte(s)
> [ 16.574322] cp210x ttyUSB0: usb_serial_generic_write_start - length = 11, data = 54 45 53 54 49 4e 47 20 34 34 0a
> [ 16.574381] tty ttyUSB0: serial_write - 11 byte(s)
> [ 16.574452] tty ttyUSB0: serial_write - 11 byte(s)
> [ 16.574508] tty ttyUSB0: serial_write - 11 byte(s)
> ...
> [ 16.930271] tty ttyUSB0: serial_write - 1 byte(s)
>
> Then my test app blocks.
>
> It looks like in the first fail case the DMA "succeeds", but the USB
> controller doesn't send the frame and consequently the TXPKTRDY bit in
> the csr register never gets cleared. Thus musb_is_tx_fifo_empty()
> always returns false and consequently falls into
> cppi41_recheck_tx_req() waiting for the queue to clear. Eventually we
> must fill up some buffer and cause my sending app to block.
>
> I've tried to force the FIFO to flush by setting the appropriate bits
> in the csr after a timeout and that doesn't seem to do anything.
>
> If I try and reboot the platform I get a bunch of warnings:
>
> / # reboot
> The system is going down NOW!
> Sent SIGTERM to all processes
> [ 990.007339] ------------[ cut here ]------------
> [ 990.014193] WARNING: CPU: 0 PID: 100 at drivers/dma/cppi41.c:605 cppi41_dma_control+0x230/0x2a8()
> [ 990.023567] Modules linked in: cp210x usbserial
> [ 990.028383] CPU: 0 PID: 100 Comm: blast Not tainted 3.14.0-rc2+ #3
> [ 990.034967] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [ 990.043179] [] (show_stack) from [] (dump_stack+0x68/0x84)
> [ 990.050823] [] (dump_stack) from [] (warn_slowpath_common+0x64/0x88)
> [ 990.059375] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x18/0x1c)
> [ 990.068656] [] (warn_slowpath_null) from [] (cppi41_dma_control+0x230/0x2a8)
> [ 990.077948] [] (cppi41_dma_control) from [] (cppi41_dma_channel_abort+0x108/0x148)
> [ 990.087801] [] (cppi41_dma_channel_abort) from [] (musb_cleanup_urb+0x40/0x100)
> [ 990.097364] [] (musb_cleanup_urb) from [] (musb_urb_dequeue+0x120/0x154)
> [ 990.106293] [] (musb_urb_dequeue) from [] (unlink1+0xb4/0xc4)
> [ 990.114206] [] (unlink1) from [] (usb_hcd_unlink_urb+0x60/0x80)
> [ 990.122304] [] (usb_hcd_unlink_urb) from [] (usb_kill_urb+0x50/0xc8)
> [ 990.130917] [] (usb_kill_urb) from [] (usb_serial_generic_close+0x20/0x64 [usbserial])
> [ 990.141145] [] (usb_serial_generic_close [usbserial]) from [] (cp210x_close+0xc/0x28 [cp210x])
> [ 990.152094] [] (cp210x_close [cp210x]) from [] (serial_port_shutdown+0x24/0x28 [usbserial])
> [ 990.162771] [] (serial_port_shutdown [usbserial]) from [] (tty_port_shutdown+0x6c/0x78)
> [ 990.173071] [] (tty_port_shutdown) from [] (tty_port_close+0x24/0x4c)
> [ 990.181733] [] (tty_port_close) from [] (tty_release+0x118/0x49c)
> [ 990.190029] [] (tty_release) from [] (__fput+0xd4/0x1e4)
> [ 990.197498] [] (__fput) from [] (task_work_run+0xb4/0xc8)
> [ 990.205045] [] (task_work_run) from [] (do_exit+0x3f8/0x948)
> [ 990.212865] [] (do_exit) from [] (do_group_exit+0x98/0xd4)
> [ 990.220512] [] (do_group_exit) from [] (get_signal_to_deliver+0x510/0x58c)
> [ 990.229616] [] (get_signal_to_deliver) from [] (do_signal+0xa8/0x3b8)
> [ 990.238260] [] (do_signal) from [] (do_work_pending+0x54/0x9c)
> [ 990.246264] [] (do_work_pending) from [] (work_pending+0xc/0x20)
> [ 990.254441] ---[ end trace 6bbc95d827ba3e8c ]---
>
> [ 991.506236] ------------[ cut here ]------------
> [ 991.511118] WARNING: CPU: 0 PID: 100 at drivers/usb/musb/musb_host.c:128 musb_h_tx_flush_fifo+0x78/0xc4()
> [ 991.521219] Could not flush host TX10 fifo: csr: 2503
> [ 991.526552] Modules linked in: cp210x usbserial
> [ 991.531352] CPU: 0 PID: 100 Comm: blast Tainted: G W 3.14.0-rc2+ #3
> [ 991.538897] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [ 991.547081] [] (show_stack) from [] (dump_stack+0x68/0x84)
> [ 991.554713] [] (dump_stack) from [] (warn_slowpath_common+0x64/0x88)
> [ 991.563262] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x2c/0x3c)
> [ 991.572456] [] (warn_slowpath_fmt) from [] (musb_h_tx_flush_fifo+0x78/0xc4)
> [ 991.581651] [] (musb_h_tx_flush_fifo) from [] (musb_cleanup_urb+0xa4/0x100)
> [ 991.590844] [] (musb_cleanup_urb) from [] (musb_urb_dequeue+0x120/0x154)
> [ 991.599759] [] (musb_urb_dequeue) from [] (unlink1+0xb4/0xc4)
> [ 991.607669] [] (unlink1) from [] (usb_hcd_unlink_urb+0x60/0x80)
> [ 991.615762] [] (usb_hcd_unlink_urb) from [] (usb_kill_urb+0x50/0xc8)
> [ 991.624329] [] (usb_kill_urb) from [] (usb_serial_generic_close+0x20/0x64 [usbserial])
> [ 991.634544] [] (usb_serial_generic_close [usbserial]) from [] (cp210x_close+0xc/0x28 [cp210x])
> [ 991.645487] [] (cp210x_close [cp210x]) from [] (serial_port_shutdown+0x24/0x28 [usbserial])
> [ 991.656156] [] (serial_port_shutdown [usbserial]) from [] (tty_port_shutdown+0x6c/0x78)
> [ 991.666452] [] (tty_port_shutdown) from [] (tty_port_close+0x24/0x4c)
> [ 991.675095] [] (tty_port_close) from [] (tty_release+0x118/0x49c)
> [ 991.683372] [] (tty_release) from [] (__fput+0xd4/0x1e4)
> [ 991.690824] [] (__fput) from [] (task_work_run+0xb4/0xc8)
> [ 991.698368] [] (task_work_run) from [] (do_exit+0x3f8/0x948)
> [ 991.706184] [] (do_exit) from [] (do_group_exit+0x98/0xd4)
> [ 991.713821] [] (do_group_exit) from [] (get_signal_to_deliver+0x510/0x58c)
> [ 991.722921] [] (get_signal_to_deliver) from [] (do_signal+0xa8/0x3b8)
> [ 991.731564] [] (do_signal) from [] (do_work_pending+0x54/0x9c)
> [ 991.739561] [] (do_work_pending) from [] (work_pending+0xc/0x20)
> [ 991.747736] ---[ end trace 6bbc95d827ba3e8e ]---
>
> Full dmesg: https://gist.github.com/anonymous/9124604
>
> Anyone have any ideas on where else to look?
>
> I've put my defconfig here https://gist.github.com/anonymous/9124565
> (it's based on the 3.13 one from the beagleboard github) just in
> case there is anything stupid going on.
>
> Is the USB in a known state of flux?

The short answer: yes.

The long answer: AM335x ES1.0 silicon (the one you have on your BBW)
has many, many, many known silicon bugs (mostly around CPPI 4.1 - the
DMA controller) and it's *very* difficult to have a stable USB with DMA
on that device.
Surely we shouldn't have such failures, but it takes time and effort to
fix all of that in a way that doesn't regress any of the other numerous
platforms the MUSB driver supports.

Just to make sure this is a DMA problem, can you see if disabling DMA
altogether makes the test work? (Beware, throughput will *suck*.)

cheers

--
balbi
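
As an aid for reading the csr values quoted in the logs in this thread (2100 after the good PIO transfer, 3500 after the good DMA transfer, 2503 in the failed FIFO flush), a small decoder can help. The sketch below assumes the host-mode TXCSR bit layout as recalled from drivers/usb/musb/musb_regs.h; the mask table and the decode_txcsr() helper are illustrative, not kernel code, and the values should be checked against the tree in use.

```c
#include <stdint.h>
#include <stdio.h>

/* Host-mode TXCSR bits. ASSUMPTION: masks recalled from
 * drivers/usb/musb/musb_regs.h; verify against your kernel tree. */
struct txcsr_bit {
	uint16_t mask;
	const char *name;
};

static const struct txcsr_bit txcsr_bits[] = {
	{ 0x8000, "AUTOSET" },
	{ 0x2000, "MODE" },
	{ 0x1000, "DMAENAB" },
	{ 0x0800, "FRCDATATOG" },
	{ 0x0400, "DMAMODE" },
	{ 0x0200, "H_WR_DATATOGGLE" },
	{ 0x0100, "H_DATATOGGLE" },
	{ 0x0080, "H_NAKTIMEOUT" },
	{ 0x0040, "CLRDATATOG" },
	{ 0x0020, "H_RXSTALL" },
	{ 0x0008, "FLUSHFIFO" },
	{ 0x0004, "H_ERROR" },
	{ 0x0002, "FIFONOTEMPTY" },
	{ 0x0001, "TXPKTRDY" },
};

/* Write the names of all set bits, space-separated, into `out`.
 * Returns 0 on success, -1 if `out` is too small. */
static int decode_txcsr(uint16_t csr, char *out, size_t size)
{
	size_t used = 0;
	size_t i;

	if (size == 0)
		return -1;
	out[0] = '\0';
	for (i = 0; i < sizeof(txcsr_bits) / sizeof(txcsr_bits[0]); i++) {
		int n;

		if (!(csr & txcsr_bits[i].mask))
			continue;
		n = snprintf(out + used, size - used, "%s%s",
			     used ? " " : "", txcsr_bits[i].name);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

Under these assumptions, 2503 decodes with both TXPKTRDY and FIFONOTEMPTY still set, which would be consistent with the observation that the packet never left the FIFO, while 2100 and 3500 show neither bit.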