From mboxrd@z Thu Jan 1 00:00:00 1970 From: dianders@chromium.org (Doug Anderson) Date: Mon, 17 Apr 2017 15:37:16 -0700 Subject: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s In-Reply-To: <172093673.40121.1492427140661@email.1und1.de> References: <1795308650.27171.9a53158f-312d-40ce-80ce-8bf792d8db34.open-xchange@email.1und1.de> <172093673.40121.1492427140661@email.1und1.de> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi, On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren wrote: > Hi, > >> Stefan Wahren hat am 31. Oktober 2016 um 21:34 geschrieben: >> >> >> I inspired by this issue [1] i build up a slightly modified setup with a >> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 Prolific >> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which >> only tries to open all ttyUSB devices one after the other. >> >> Unfortunately the complete system stuck after opening the first ttyUSB device ( >> heartbeat LED stop blinking, no reaction to debug UART). The only way to >> reanimate the system is to powerdown the USB hub with the USB to serial >> convertors. >> >> [1] - https://github.com/raspberrypi/linux/issues/1692 >> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa > > since this issue still exists with 4.11 (even without or with microframe scheduler enabled), i want to ask some additional questions: > > Is this issue reproducible with other dwc2 platforms than bcm2835? +Edmund Szeto, who I seem to remember emailing me about similar questions in the past. > Does the soft lockup also occurs after opening the second serial convertor or later? I don't have serial converters easily available to me, but back in the day when I was stressing things out on rk3288 I never saw anything this bad. ...of course, on rk3288 we've got 4 A17 cores running really fast, so possibly just being slower is what causes your problems here? I will make the following observations: 1. With dwc2 you often end up in the situation where you need to service an interrupt every 125 uS. If servicing that interrupt takes anywhere near 125 uS in the common case then you'll be in trouble. === 2. When I was testing on rk3288 (on kernel 3.14) I did see occasions where uvc_video_complete() could sometimes take > 125 uS. It's been a long time now, but if I remember correctly this had to do with the fact that the URB buffers were allocated in a way where you had to access them non-cached and this was super duper slow. In my particular case I could "fix" it by adjusting UVC_MAX_PACKETS (crosreview.com/321932). ...and I had some timing code in crosreview.com/321980. Again, it was a long time ago, but elsewhere I have written down: ----- Specifically: * The USB "complete" functions are called with local interrupts disabled. Specifically see __usb_hcd_giveback_urb(). * I see calls to uvc_video_complete() that easily take > 125us. Unfortunately the interrupts disabled while uvc_video_complete() is called are always the interrupts for the same CPU that's dealing with the normal dwc2 USB interrupts. -- Ugh. This may be the memcpy() as others have found: http://www.spinics.net/lists/linux-usb/msg83581.html ...looks like the issue is that the driver is allocating memory that's supposed to be DMA coherent and copying from this memory is slow. ----- You could probably pick my timing patch and then see if you're actually hitting cases like this, I guess? === 3. Are you running CPUFreq by chance? ...back in the day we had a bug on rk3288 where we were temporarily running the CPU as slow as 8 MHz for a short while during a CPUFreq transition. If you happened to get a dwc2 interrupt while at this speed then you were in trouble. -Doug