From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marek Vasut Date: Mon, 14 Sep 2015 15:36:17 +0200 Subject: [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds In-Reply-To: <20150914132220.128a988c@amdc2363> References: <1440769821-24005-1-git-send-email-l.majewski@samsung.com> <1442225716.9459.11.camel@synopsys.com> <20150914132220.128a988c@amdc2363> Message-ID: <201509141536.17419.marex@denx.de> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote: > Hi Alexey, > > > Hi Marek, Lukasz, > > > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote: > > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote: > > > > Hi Marek, > > > > > > Hi, > > > > > > [...] > > > > > > > > > > Still we need to fix regression first with virtually > > > > > > > infinite timeout :) I would even thing that simple revert > > > > > > > of Marek's patch may make sense for now. > > > > > > > > > > > > +1 - unfortunately there were some other patches applied to > > > > > > this particular patch. Simple revert might be a bit tricky > > > > > > here. > > > > > > > > > > -1 - In case the card gets removed during the DMA transfer and > > > > > the board doesn't have a watchdog, it will get stuck > > > > > indefinitelly. > > > > > > > > I'm just wondering here - why the indefinite loop was working > > > > previously? Was anybody complaining (on the ML) about the problem > > > > of removing the SD card when some operation is ongoing? > > > > > > It worked for me for all the workloads I used. Noone was > > > complaining. > > > > The same story here - previous code with infinite loop was working for > > my boards. And now I do see a problem with pretty simple scenario that > > we do use in our products. > > > > > > The problem with potential removal of SD card (after booting the > > > > board) is with us for quite long time. Even with indefinite loop > > > > (without your patch) we also could "hang" the board if the SD > > > > card was removed during a transfer. > > > > > > Which is why we should weed out the unbounded loops. > > > > > > > > We > > > > > absolutelly don't want this sort of behavior in U-Boot. I > > > > > understand that this is the easiest way for everyone to achieve > > > > > some sort of "working" solution, but it is definitelly not the > > > > > correct one. While I do agree to increasing the timeout, I do > > > > > not agree to unbounded loops, sorry. > > > > > > > > We have agreed to not agree :-) > > > > > > Yes :-) > > > > The first thing I care is working U-Boot v2015.10 out of the box on my > > boards. And so I may agree on any temporary solution. I see it as > > timeout value either being infinite or obviously very high like 60 > > seconds. > > > > 60 seconds might sound stupid but my thought behind this is to make > > sure even long transfers succeed. Imagine 100 Mb rootfs or update > > file downloaded from slow SD-card. > > Transfer of rootfs to SD-card (downloaded to memory via tftp) is > definitely valid scenario. > > > > > > > > From both points of view for keeping history > > > > > > > clean (compared to stacked fixes/workarounds) and from > > > > > > > removal of regression root cause. > > > > > > > > > > > > As I said before - +1 from me. > > > > > > > > > > As I said before, -1 from me. Btw. did anything regress in > > > > > here? To me, this seems like a newly discovered bug ... > > > > > > > > Yes, this is a bug. We had similar problem with Samsung's SDHCI, > > > > before we switched to dw_mmc. This issue is new at dw_mmc. > > > > > > > > > > > It's not that I like to have infinite loops but given > > > > > > > previous implementation worked fine for people in the > > > > > > > previous U-Boot release. > > > > > > > > > > > > Good justification > > > > > > > > > > It is never a justified to return to a potentially problematic > > > > > version > > > > > > > > IMHO revering the change (before the release) is from the software > > > > development point of view better solution than adding some > > > > heuristic delta to timeout. > > > > > > > > > for the sake of getting some sort of crappy hardware > > > > > operational. > > > > > > > > Unfortunately this "crappy hardware" is pervasive and we cannot do > > > > anything about it. > > > > > > > > To sum up (my point of view): > > > > 1. The best would be to revert the patch - but if simple "git > > > > revert" is not working then, > > > > Well even if clean revert won't work we may do manual tweaks so that > > functionally it is "revert". If of any interest I may come up with > > that sort of patch. > > > > > > 2. We should increase the timeout (with my patch) for v2015.10 > > > > release > > > > If everybody is OK with that let's go do it. Because release is > > around the corner and I don't want to explain each and every user how > > to fix their new problem. > > v2015.10 correctness is crucial here. > > > > Let's do this for the sake of crappy cards. > > > > > > > 3. After release we can devise some kind of solution > > > > 4. It is a good topic for U-boot's minisummit discussion at Dublin > > > > > > > > Marek, Alexey, Tom, Pantelis what do you think? > > > > > > I think yes. > > > > What's important we need to make sure Tom is aware of this situation > > and he won't cut a release until our fix is in place and all involved > > parties reported their happiness. > > I also think that Tom should speak up on this issue. btw. I think I know why SoCFPGA never triggered this issue, it has CONFIG_SYS_MMC_MAX_BLK_COUNT set to 256 so a large transfer was always broken to smaller chunks.