From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout.micron.com ([137.201.242.129])
 by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux))
 id 1YXcIw-0003NZ-Pp for linux-mtd@lists.infradead.org;
 Mon, 16 Mar 2015 21:12:07 +0000
From: "Jeff Lauruhn (jlauruhn)"
To: Boris Brezillon
Subject: RE: RFC: detect and manage power cut on MLC NAND
Date: Mon, 16 Mar 2015 21:11:30 +0000
Message-ID: <0D23F1ECC880A74392D56535BCADD7354973E2B8@NTXBOIMBX03.micron.com>
References: <54FEDC42.2060407@dave-tech.it>
 <1426058414.1567.2.camel@sauron.fi.intel.com>
 <5500037A.9010509@nod.at>
 <1426064733.1567.6.camel@sauron.fi.intel.com>
 <55000637.1030702@nod.at>
 <550074D2.1070406@dave-tech.it>
 <0D23F1ECC880A74392D56535BCADD7354973D072@NTXBOIMBX03.micron.com>
 <55007B79.2090705@nod.at>
 <0D23F1ECC880A74392D56535BCADD7354973D2A1@NTXBOIMBX03.micron.com>
 <55016A43.3000201@nod.at>
 <0D23F1ECC880A74392D56535BCADD7354973DAD6@NTXBOIMBX03.micron.com>
 <20150313213134.1b53430b@bbrezillon>
 <0D23F1ECC880A74392D56535BCADD7354973DF0B@NTXBOIMBX03.micron.com>
 <20150314113214.58d06f3d@bbrezillon>
In-Reply-To: <20150314113214.58d06f3d@bbrezillon>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Cc: Andrea Scian, Richard Weinberger, mtd_mailinglist,
 "dedekind1@gmail.com"
List-Id: Linux MTD discussion mailing list

Good morning Boris;

RR is a new feature and is not available on all parts. I'm not sure
about others, but since these are features, you simply enable or disable
them via SET FEATURE/GET FEATURE. If you already provide that SET/GET
FEATURE functionality, then an end user can determine whether their
device supports a feature and then write the code to enable it when they
need it on their particular design.

Jeff Lauruhn
NAND Application Engineer
Embedded Business Unit

-----Original Message-----
From: Boris Brezillon [mailto:boris.brezillon@free-electrons.com]
Sent: Saturday, March 14, 2015 3:32 AM
To: Jeff Lauruhn (jlauruhn)
Cc: Richard Weinberger; dedekind1@gmail.com; mtd_mailinglist; Andrea Scian
Subject: Re: RFC: detect and manage power cut on MLC NAND

Hi Jeff,

On Fri, 13 Mar 2015 23:51:53 +0000 "Jeff Lauruhn (jlauruhn)" wrote:

>
> Hello Jeff,
>
> I'm joining the discussion to ask more questions about MLC NANDs ;-).
>
> Could you tell us more about how block wear impacts the voltage level
> stored in NAND cells.
>
> 1/ Are all pages in a block impacted the same way ?
> Yes, because of block erase, P/E cycles affect all the pages in a
> block.

Okay, that's what I thought.

> 2/ Is wear more likely to induce voltage increase, voltage decrease
> or is it unpredictable ?
> Wear is a very well known NAND characteristic. During P/E cycling
> there is a potential for electrons to get permanently trapped in the
> oxide. The more P/E cycles, the more electrons get trapped. Over many
> P/E cycles, cells will get to a point where they look permanently
> programmed and can't be erased or programmed. As cells begin to fail,
> ECC can be used to recover the data. If too many bits fail in a page,
> the device will respond with a FAIL status after a P/E cycle.

So voltage thresholds tend to increase with wear, right ?

>
> 3/ Is it possible to have more than one working voltage threshold
> (read-retry mode): I did some testing on my Hynix chip (I know you
> work for Micron but that's the only MLC chip I have :-)), and I
> managed to get less bitflips by trying another read-retry mode even
> if the previous one was allowing me to successfully fix existing
> bitflips.
> Read Retry is available on some newer products. RR was introduced to
> help maintain and improve data retention and P/E cycles as geometry
> shrinks and bits/cell increase.
> If the device supports RR, we have predefined RR Options, based on
> the most likely chance of success. Start with option 1 and step
> through the options until you get a successful read. The datasheet
> usually has pretty good information.

When you say you have "predefined RR Options, based on the most likely
chance of success", does this mean these options are internally evolving
during the NAND block lifetime, or is RR mode 0 always encoding the same
threshold config ?
In the latter case, maybe we should start with a different RR mode
depending on the number of P/E cycles already done on the block, so that
we have more chance to successfully read the page on our first read.

>
> 4/ Do you have any numbers/statistics that could
> help us choose the more appropriate read-retry mode according to the
> number of P/E cycles ?
> I don't have numbers or statistics, but I can tell you that the RR
> steps are generally defined based on known NAND behavior. Go to the
> Micron website and put in this PN MT29F128G08CBCCB and you will find
> good information on RR.

Okay, I'll have a look at the datasheet you pointed out (the Hynix one
was not even talking about read-retry, I had to search in Allwinner code
to understand how to change read-retry mode).

>
> 5/ Any other things you'd like to share regarding read-retry ?
> RR isn't available on all devices. From your perspective I would give
> them the option to use RR if it's available.

Yes, that's already done this way: we use RR on devices providing this
feature. IIRC, only Micron chips are supported so far, but I added
support for one of the Hynix chips.
The whole problem here is that each vendor implements RR in their own
way (using ONFI params for Micron, OTP area and private commands for
Hynix, and probably something else for Samsung chips).
Anyway, that's just a matter of adding a NAND chip database + vendor
specific code to deal with each read retry implementation (even if it
would have helped us a lot if chip vendors had agreed on a standard way
to control RR).

>
> Apart from that, we're currently trying to find the most appropriate
> way to deal with paired pages, and this sounds rather complicated.
> The current idea is to expose paired pages information up to the
> UBIFS layer, and let UBIFS decide when it should stop writing on
> pages paired with already written pages.
> Moreover, we have a few pages we need to protect (UBI metadata: EC
> and VID headers) in order to keep UBI/UBIFS consistent.
> Do you have anything to share on this topic (ideas, solutions we
> should consider, constraints we're not aware of, ...)
>
> This is one of the reasons I came to this site. I have a great deal
> of device knowledge and I need to know more about how end users use
> the device.
>
> Most designs today employ power loss detection and employ elegant
> shutdown to the NAND. In addition, we provide Write Protect, which
> provides an extra layer of protection against power loss. There is
> still a chance that if the power event happens during a program to a
> page, the previously programmed shared page can also be corrupted.
> It's not clear to me how to keep track of shared pages for every
> device out there. It's not like a parameter page that you can read.
> It's an interesting problem.

Of course, preventing page corruption is a good approach, but some board
designers are simply not taking these constraints into account, and
detecting power loss in order to assert the WP pin is not possible in
such designs.
I think we should also find a solution to recover from corruptions
induced by paired page writes, and that's the direction we're currently
investigating.
But if someone has real examples (boards) supporting power loss
detection + WP pin control in such cases, maybe we can start thinking
about a standard way to deal with that in Linux.

Thanks again for your answers.

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com