On Wed, Aug 01 2018, Marek Vasut wrote:

> On 07/31/2018 10:12 PM, Boris Brezillon wrote:
>> On Tue, 31 Jul 2018 11:05:11 +1000
>> NeilBrown wrote:
>>
>>> On Fri, Jul 27 2018, Boris Brezillon wrote:
>>>
>>>> On Fri, 27 Jul 2018 11:33:13 -0700
>>>> Brian Norris wrote:
>>>>
>>>>> Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
>>>>> exiting") is the latest from a long history of attempts to add reboot
>>>>> handling to handle stateful addressing modes on SPI flash. Some prior
>>>>> mostly-related discussions:
>>>>>
>>>>> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>>>>> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>>>>>
>>>>> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>>>>> [RFC] MTD m25p80 3-byte addressing and boot problem
>>>>>
>>>>> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>>>>> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>>>>>
>>>>> Previously, attempts to add reboot-time software reset handling were
>>>>> rejected, but the latest attempt was not.
>>>>>
>>>>> Quick summary of the problem:
>>>>> Some systems (e.g., boot ROM or bootloader) assume that they can read
>>>>> initial boot code from their SPI flash using 3-byte addressing. If the
>>>>> flash is left in 4-byte mode after reset, these systems won't boot. The
>>>>> above patch provided a shutdown/remove hook to attempt to reset the
>>>>> addressing mode before we reboot. Notably, this patch misses out on
>>>>> huge classes of unexpected reboots (e.g., crashes, watchdog resets).
>>>>>
>>>>> Unfortunately, it is essentially impossible to solve this problem 100%:
>>>>> if your system doesn't know how to reset the SPI flash to power-on
>>>>> defaults at initialization time, no amount of software can really rescue
>>>>> you -- there will always be a chance of some unexpected reset that
>>>>> leaves your flash in an addressing mode that your boot sequence didn't
>>>>> expect.
>>>>>
>>>>> While it is not directly harmful to perform hacks like the
>>>>> aforementioned commit on all 4-byte addressing flash, a
>>>>> properly-designed system should not need the hack -- and in fact,
>>>>> providing this hack may mask the fact that a given system is indeed
>>>>> broken. So this patch attempts to apply this unsound hack more narrowly,
>>>>> providing a strong suggestion to developers and system designers that
>>>>> this is truly a hack. With luck, system designers can catch their errors
>>>>> early on in their development cycle, rather than applying this hack long
>>>>> term. But apparently enough systems are out in the wild that we still
>>>>> have to provide this hack.
>>>>>
>>>>> Document a new device tree property to denote systems that do not have a
>>>>> proper hardware (or software) reset mechanism, and apply the hack (with
>>>>> a loud warning) only in this case.
>>>>>
>>>>> Signed-off-by: Brian Norris
>>>>> ---
>>>>> Note that I intentionally didn't split the documentation patch. It seems
>>>>> clearer to do these together IMO, but if it's *really* important to
>>>>> someone...I can resend
>>>>
>>>> I'm fine with that.
>>>>
>>>> I'll leave Neil some time to review/test/comment on the patch before
>>>> queuing it, but it looks good to me.
>>>
>>> Thanks.
>>> I can confirm that if I apply this patch, my system won't reboot
>>> properly (as expected), and if I then add
>>>
>>>     broken-flash-reset;
>>>
>>> to the jedec,spi-nor device, it starts functioning correctly again.
>>>
>>> I don't like the pejorative "broken", and it also suggests that a thing
>>> used to work, but something happened to break it - this is not
>>> accurate.
>>> I would prefer something like "reset-not-connected" which is an accurate
>>> description of the state of the hardware.
>>>
>>> I also think that having a WARN_ON is an over-reaction. Certainly a
>>> warning could be appropriate, but just one pr_warn() should be enough.
>>> The "problem" is unlikely in practice, and loudly warning people that an
>>> asteroid might kill them isn't particularly helpful.
>>>
>>> I genuinely think that if the system fails to reboot, then Linux is at
>>> fault. I accept that changing Linux to be completely robust might be
>>> more trouble than it is worth, but I don't accept that it is impossible.
>>>
>>> But I don't intend to fight either of these battles.
>>
>> Does that mean you're accepting this change? Brian, any comment on what
>> Neil said?
>>
>> To be honest, I hate being in the middle of this discussion without
>> having been involved in the first decision to accept such workarounds.
>> I keep thinking that making boards that do not have reset properly
>> wired less likely to fail rebooting is a wise decision, but I also
>> agree with Brian when he says we should inform people that their design
>> is unreliable.
>
> Hiding the issue in most cases only leads to vendors making more such
> crippled boards and never learning.

And you think that printing a loud warning would be likely to get
vendors to make fewer crappy boards?
I think it would just annoy people who aren't in a position to do
anything about it.

NeilBrown

>
>> The main problem I see here, is that adding this prop won't help people
>> figuring out what is wrong with their design, it will just help them
>> work around the problem when they find out, and it might already be too
>> late to fix the HW design. But maybe it's not what we're trying to do
>> here. Maybe we just want to warn users that rebooting such boards is a
>> risky procedure.
>
> The thing is, this is not a workaround, it's just a way of hiding the
> problem because the problem does not go away completely. There are still
> scenarios in which the system will fail.
>
> -- 
> Best regards,
> Marek Vasut