From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lukasz Majewski Date: Mon, 2 Mar 2020 14:01:07 +0100 Subject: [PATCH] usb: ehci: Fix "EHCI timed out on TD - token=XXXX" error on ehci-hcd In-Reply-To: <19bb8417-8aa1-c963-2ef7-61c82f4f2d9b@denx.de> References: <20200226112955.7930-1-lukma@denx.de> <20200301181909.55276b0b@jawa> <1d47e4de-bf6e-f404-a02e-6209cb9aed45@denx.de> <20200301185953.124f5db8@jawa> <19bb8417-8aa1-c963-2ef7-61c82f4f2d9b@denx.de> Message-ID: <20200302140107.38e34657@jawa> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de Hi Marek, > On 3/1/20 6:59 PM, Lukasz Majewski wrote: > > Hi Marek, > > Hi, > > [...] > > >>> The description of the fix written by the original author: > >>> https://forum.doozan.com/read.php?3,35295,35295#msg-35295 > >>> > >>> states that the reduction of the transfer is done just for "spin > >>> up"/"detection" of the pen drive. The issue is that not all pen > >>> drive disks are able to be properly detected in the first place. > >>> > >> > >> But the commit message claims this patch reduces the transfer rate > >> by 20% always. So that's a NAK right away, sorry, > > > > Marek, now we do have from time to time NOT functional USB (once per > > ten attempts on iMX6Q - tpc70 board). If somebody relies on 'usb > > start' for recovery/normal operation - it is bad. > > > > In such a use case - the 20% performance drop is acceptable. > > I'm sure we can do better. 2020.04 is still quite far out. > > Can you take a look ? There are several issues which are fixed by this patch. I will reply to the actual patch set to point them out in the code, as this may be more readable. > > >> you need to find a > >> better solution which doesn't have such an enormous impact. > >> > >> I'm fine with the spin-up handling, but this auto-detection of > >> transfer size should be pulled into separate patch and is likely > >> wrong anyway. > > > > I cannot agree that is it "likely wrong anyway" - please correct me > > if I'm wrong, but up till now we were fiddling with transfer size > > up till having "good enough" results. > > Correction, read the patches I linked above > (7d6fd7f0ba71cd93d94079132f958d9630f27a89 The above sha1 uses the observation (and heuristics) from other operating systems to deal with usb pendrives, which controllers cannot handle more than 256 sectors (as they have 8 bit counter). > and esp. > 02b0e1a36c5bc20174299312556ec4e266872bd6). This patch introduces the optimization - instead of setup/disable async transmissions - it enables it once and then just adds new tokens to be transmitted. Is the transmission error (USB_ST_XACTERR 0x80) handled at this point? > Those solve the "we were > fiddling around with transfer size" attempts, because I actually > invested the time to understand the issue with those devices that > failed and I was right about never letting any of those "fiddling > around with transfer size" patches into mainline. > > The problem with those devices which needed "fiddling with transfer > size" was that their internal 16bit counters (or 8bit counters) > overflew when a very long transfer was started and thus those devices > failed. The fix there was to transfer less than that amount of data > which triggered the overflow, which on a wast majority of devices is > 240 blocks. However , this led to a massive slowdown, which is > unacceptable. A fix for that slowdown was to avoid turning async > schedule off-on between each transfer, but rather keep it running, > that saves A LOT of wasted transfer time. If you build a long > transfer up front, you would never do that many transfers to notice > the delays incurred by turning the async schedule off/on, but if you > do multiple shorter transfers, these delays add up. > Thanks for the explanation. > Yes, these issues are convoluted and take time to solve properly, yet > the end result might not have such a performance hit. > > > The above link describes what needs to be done in case of the "EHCI > > timed out on TD - token=0x1f8c80" error. > > > > Do we handle it now? In my opinion we don't. > > > >> > >> So try to separate this into two patches, one for handling the spin > >> up, another one for this auto-detection of the transfer size. And I > >> think you will end up needed only the first one. > > > > This is an open question if only the spin up fixes the error. It > > would be great if you could look for the changes introduced in this > > patch and compare it with your experience of ehci driver > > development. > > Surely you can separate those two patches and perform such a test ? I would prefer to first go through the actual patch code, before I spent time on things which may be not necessary. > > >>> Last but not least - the mainline is still broken. The > >>> "token=0x1f8c80" errors pops up from time to time. However, after > >>> applying the approach from this fix - the error is gone (and pass > >>> some acceptance tests). > >> > >> It works for me. > > > > Also works for Tom. > > > >> Do you have a specific device which fails ? > > > > Yes. On my desk - tpc70 with i.MX6Q. And hdc with i.MX53. I can > > confirm that from U-Boot 2018.04 up till now the rate of: > > > > "EHCI timed out on TD - token=0x1f8c80" error has been reduced > > significantly for their use case (reading ~16MiB of binary from > > several pen drives). > > Can you run lsusb (to get IDs) on those pendrives which fail ? > I would like to add them to a list. This was working correctly on the beginning, but after some time it started to show errors in question: Bus 001 Device 040: ID 0930:6544 Toshiba Corp. TransMemory-Mini / Kingston DataTraveler 2.0 Stick This one was causing issues from the very beginning: Bus 001 Device 041: ID 058f:6387 Alcor Micro Corp. Flash Drive This one - Verbatim Bus 001 Device 042: ID 21c4:8005 Never caused issues - worked reliably even with old U-Boot (on which errors were observed very often). > > > Unfortunately the errors are still present from time to time. > > > >> Which > >> device and on which hardware ? > >> > >> Also, what is "some acceptance test" ? > >> > > > > The set of automated tests to assess if "usb start" don't break. > > The usb start only reads out the partition table and does not do any > long transfers (above 240 blocks, which usually triggers the failure > on cheap sticks). So if long transfers fail for you, then this is a > separate issue from the 'usb start' spin-up problem. And so, if this > patch makes both work, then this indeed should be two patches. Best regards, Lukasz Majewski -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma at denx.de -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: