From: Slava Dubeyko
To: Jeff Moyer
Cc: Jan Kara, "linux-nvdimm@lists.01.org", "linux-block@vger.kernel.org", Viacheslav Dubeyko, Linux FS Devel, "lsf-pc@lists.linux-foundation.org"
Subject: RE: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Thu, 19 Jan 2017 02:56:39 +0000
References: <20170114004910.GA4880@omniknight.lm.intel.com> <20170117143703.GP2517@quack2.suse.cz>

-----Original Message-----
From: Jeff Moyer [mailto:jmoyer@redhat.com]
Sent: Wednesday, January 18, 2017 12:48 PM
To: Slava Dubeyko
Cc: Jan Kara; linux-nvdimm@lists.01.org; linux-block@vger.kernel.org; Viacheslav Dubeyko; Linux FS Devel; lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

>>> Well, the situation with NVM is more like with DRAM AFAIU. It is
>>> quite reliable but given the size the probability *some* cell has degraded is quite high.
>>> And similar to DRAM you'll get MCE (Machine Check Exception) when you
>>> try to read such cell. As Vishal wrote, the hardware does some
>>> background scrubbing and relocates stuff early if needed but nothing is 100%.
>>
>> My understanding is that the hardware remaps the affected address
>> range (64 bytes, for example) but doesn't move/migrate the data stored
>> in that range. That sounds slightly odd, because it means there is
>> no guarantee the stored data can be retrieved.
>> It sounds like the
>> file system should be aware of this and has to be heavily protected
>> by some replication or erasure coding scheme. Otherwise, if the
>> hardware does everything for us (remaps the affected address region and
>> moves the data into a new address region), then why does the file system
>> need to know about the affected address regions at all?
>
> The data is lost, that's why you're getting an ECC. It's tantamount to -EIO for a disk block access.

I see three possible cases here:
(1) a bad block has been discovered (no remap, no recovery) -> data is lost; -EIO on block access, and the block stays bad;
(2) a bad block has been discovered and remapped -> data is lost; -EIO on block access;
(3) a bad block has been discovered, remapped and recovered -> no data is lost.

>> Let's imagine that the affected address range equals 64 bytes.
>> It sounds to me like, in the block device case, it will affect the
>> whole logical block (4 KB).
>
> 512 bytes, and yes, that's the granularity at which we track errors in the block layer, so that's the minimum amount of data you lose.

I think it depends on what granularity the hardware supports. It could be 512 bytes, 4 KB, maybe larger.

>> The situation is more critical for the DAX approach. Correct
>> me if I'm wrong, but my understanding is that the goal of DAX is to
>> provide direct access to a file's memory pages with minimal file system
>> overhead. So it looks like raising a bad block issue at the file
>> system level will affect a user-space application, because, in the end,
>> the user-space application will have to handle the trouble (the bad
>> block) itself. That sounds like a really weird situation to me. What can
>> protect a user-space application from encountering a partially
>> corrupted memory page?
>
> Applications need to deal with -EIO today. This is the same sort of thing.
> If an application trips over a bad block during a load from persistent memory,
> they will get a signal, and they can either handle it or not.
>
> Have a read through this specification and see if it clears anything up for you:
> http://www.snia.org/tech_activities/standards/curr_standards/npm

Thank you for sharing this. So, if a user-space application follows the
NVM Programming Model, it will be able to survive by catching and
processing the resulting exceptions. But such applications have yet to be
implemented, and they will need special recovery techniques. It sounds
like legacy user-space applications are unable to survive load/store
failures in the NVM.PM.FILE mode.

Thanks,
Vyacheslav Dubeyko.