Hi,

Alan Stern writes:
> On Tue, 20 Sep 2016, Felipe Balbi wrote:
>
>> And here's trace output (complete, scroll to bottom). It seems to me
>> like the thread didn't wake up at all. It didn't even try to
>> execute. I'll add some more traces and try to get better information
>> about what's going on.
>>
>>
>>
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 2865/2865   #P:4
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>
> Skipping to the end...
>
>>      irq/17-dwc3-2522  [002] d...    43.504199: bulk_out_complete: 0, 31/31
>>     file-storage-2521  [001] ....    43.504202: fsg_main_thread: get_next_command -> 0
>>     file-storage-2521  [001] ....    43.504288: fsg_main_thread: do_scsi_command -> 0
>>     file-storage-2521  [001] ....    43.504295: fsg_main_thread: finish_reply -> 0
>>     file-storage-2521  [001] ....    43.504298: fsg_main_thread: send_status -> 0
>>      irq/17-dwc3-2522  [002] d...    43.504347: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504351: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504434: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504438: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504535: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504539: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504618: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504703: bulk_in_complete: 0, 16384/16384
>>      irq/17-dwc3-2522  [002] d...    43.504794: bulk_in_complete: 0, 13/13
>>      irq/17-dwc3-2522  [002] d...    43.504797: bulk_out_complete: 0, 31/31
>
> Like you say, it appears that the thread didn't get woken up at all.
> But this is inconsistent with your earlier results. On Sep. 9, you
> posted a message that ended with these lines:
>
>>      irq/17-dwc3-3579  [003] d..1 21167.729666: bulk_out_complete: compl: bh ffff880111e6aac0 state 1
>>    file-storage-3578  [002] .... 21167.729670: fsg_main_thread: next: bh ffff880111e6aac0 state 1
>
> This indicates that in the earlier test, the thread did start running
> and get_next_command should have returned.
>
> The trace you posted after this one doesn't seem to show anything new,
> as far as I can tell.
>
> So I still can't tell what's happening. Maybe the patch below will
> help. It concentrates on the critical area.

Sorry for the long delay, I finally have more information on this.

All this time I was doing something that I never considered to matter:
I've been running host and peripheral on the same machine.

Now that I have tracepoints on xHCI as well, I could see that these 30
seconds of "nothing" are actually full of xHCI activity, and that for
the duration of those 30 seconds the preempt depth on the CPU that
(eventually) queues a request on dwc3 is always >= 1 (sometimes 2, most
of the time 1).

My conclusion from that is that xHCI (or usbcore ?!?) locks the CPU and
g_mass_storage is spinning for over 30 seconds, at which point
storage.ko (the host-side class driver) dequeues the request.

I'll see if I can capture a fresh trace with both xHCI and dwc3 with
this happening, but probably not today (testing stuff for -rc).

--
balbi
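
P.S. In case it helps anyone reproduce the preempt-depth observation: below
is a minimal sketch of a debug helper that dumps the raw preempt_count so it
can be correlated with the preempt-depth column ftrace prints for the same
CPU. The helper name (dbg_preempt_depth) and where you would call it from
(e.g. just before the request gets queued on dwc3, or from fsg_main_thread)
are purely illustrative, not something from this thread or from any posted
patch.

/*
 * Illustrative debug helper (hypothetical, not from this thread): log the
 * raw preempt_count with a tag so it can be matched against the
 * preempt-depth column in the ftrace output for the same CPU.
 */
#include <linux/preempt.h>
#include <linux/kernel.h>

static inline void dbg_preempt_depth(const char *tag)
{
	/* PREEMPT_MASK extracts the preemption-disable depth (low byte) */
	trace_printk("%s: preempt_count=0x%x depth=%d\n",
		     tag, preempt_count(), preempt_count() & PREEMPT_MASK);
}

Sprinkling a call like this at the two ends (host-side interrupt path and
gadget-side queuing path) should show whether the nonzero depth really spans
the whole 30-second window, rather than just the instants the trace happens
to sample.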