From: Slava Dubeyko
To: Jeff Moyer
Cc: Jan Kara, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
    Viacheslav Dubeyko, Linux FS Devel, lsf-pc@lists.linux-foundation.org
Subject: RE: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Thu, 19 Jan 2017 02:56:39 +0000

-----Original Message-----
From: Jeff Moyer [mailto:jmoyer@redhat.com]
Sent: Wednesday, January 18, 2017 12:48 PM
To: Slava Dubeyko
Cc: Jan Kara; linux-nvdimm@lists.01.org; linux-block@vger.kernel.org;
    Viacheslav Dubeyko; Linux FS Devel; lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

>>> Well, the situation with NVM is more like with DRAM AFAIU. It is
>>> quite reliable, but given the size, the probability that *some* cell
>>> has degraded is quite high. And similar to DRAM, you'll get an MCE
>>> (Machine Check Exception) when you try to read such a cell. As Vishal
>>> wrote, the hardware does some background scrubbing and relocates
>>> stuff early if needed, but nothing is 100%.
>>
>> My understanding is that the hardware remaps the affected address
>> range (64 bytes, for example) but doesn't move/migrate the data stored
>> in that range. That sounds slightly weird, because it means there is
>> no guarantee of retrieving the stored data. It suggests that the file
>> system should be aware of this and has to be heavily protected by some
>> replication or erasure coding scheme. Otherwise, if the hardware does
>> everything for us (remaps the affected address region and moves the
>> data into a new address region), then why does the file system need
>> to know about the affected address regions?
>
> The data is lost, that's why you're getting an ECC. It's tantamount to
> -EIO for a disk block access.

I see three possible cases here:
(1) a bad block has been discovered (no remap, no recovery) -> data is lost; -EIO for a disk block access, and the block stays bad;
(2) a bad block has been discovered and remapped -> data is lost; -EIO for a disk block access;
(3) a bad block has been discovered, remapped and recovered -> no data is lost.
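To make the consequences of each case explicit, here is a rough, purely
illustrative sketch (the enum and the helper are made-up names, not part
of any existing kernel or library API):

#include <errno.h>

/* Illustrative classification of the three cases above. */
enum badblock_outcome {
	BB_BAD_NOT_REMAPPED,	/* (1) data lost, the block stays bad        */
	BB_REMAPPED_DATA_LOST,	/* (2) data lost, address range remapped     */
	BB_REMAPPED_RECOVERED,	/* (3) data intact, nothing for the FS to do */
};

/* What a read of the affected block would return in each case. */
static int read_result(enum badblock_outcome outcome)
{
	switch (outcome) {
	case BB_BAD_NOT_REMAPPED:
	case BB_REMAPPED_DATA_LOST:
		return -EIO;	/* contents are gone; recovery must come from
				 * replication or erasure coding above */
	case BB_REMAPPED_RECOVERED:
		return 0;	/* the data comes back as stored */
	}
	return -EINVAL;
}

Only cases (1) and (2) are interesting for a file system; in case (3)
there is nothing left for it to do.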
>> Let's imagine that the affected address range equals 64 bytes. It
>> sounds to me that for the case of a block device it will affect the
>> whole logical block (4 KB).
>
> 512 bytes, and yes, that's the granularity at which we track errors in
> the block layer, so that's the minimum amount of data you lose.

I think it depends on what granularity the hardware supports. It could
be 512 bytes, 4 KB, maybe greater.
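As a concrete example of how the granularities interact, take a
hypothetical 64-byte media error and round it out to 512-byte sectors
and to a 4 KB file system block (the byte offset below is made up; only
the arithmetic matters):

#include <stdio.h>

#define SECTOR_SIZE	512UL	/* granularity of badblocks tracking in the block layer */
#define FS_BLOCK_SIZE	4096UL	/* a typical file system block size */

int main(void)
{
	unsigned long err_offset = 123456;	/* hypothetical byte offset of the media error */
	unsigned long err_len = 64;		/* hypothetical size of the damaged range */

	/* Round the damaged byte range out to whole 512-byte sectors... */
	unsigned long first_sector = err_offset / SECTOR_SIZE;
	unsigned long last_sector = (err_offset + err_len - 1) / SECTOR_SIZE;

	/* ...and to whole file system blocks. */
	unsigned long first_block = err_offset / FS_BLOCK_SIZE;
	unsigned long last_block = (err_offset + err_len - 1) / FS_BLOCK_SIZE;

	printf("sectors %lu-%lu reported bad: at least %lu bytes unreadable\n",
	       first_sector, last_sector,
	       (last_sector - first_sector + 1) * SECTOR_SIZE);
	printf("file system blocks %lu-%lu fail to read: %lu bytes affected\n",
	       first_block, last_block,
	       (last_block - first_block + 1) * FS_BLOCK_SIZE);
	return 0;
}

So a 64-byte error costs at least one 512-byte sector at the block
layer, and a file system doing 4 KB reads loses access to the whole
4 KB block that contains it.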
>> The situation is more critical for the case of the DAX approach.
>> Correct me if I'm wrong, but my understanding is that the goal of DAX
>> is to provide direct access to a file's memory pages with minimal
>> file system overhead. So it looks like raising the bad block issue at
>> the file system level will affect a user-space application, because,
>> finally, the user-space application will need to handle such trouble
>> (the bad block issue). That sounds like a really weird situation to
>> me. What can protect a user-space application from encountering the
>> issue of a partially incorrect memory page?
>
> Applications need to deal with -EIO today. This is the same sort of
> thing. If an application trips over a bad block during a load from
> persistent memory, they will get a signal, and they can either handle
> it or not.
>
> Have a read through this specification and see if it clears anything
> up for you:
> http://www.snia.org/tech_activities/standards/curr_standards/npm

Thank you for sharing this. So, if a user-space application follows the
NVM Programming Model, it will be able to survive by catching and
processing the exceptions. But these applications have yet to be
implemented. Also, such applications need special recovery technique(s).
It sounds like legacy user-space applications are unable to survive in
NVM.PM.FILE mode in the case of a load/store operation failure.
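For reference, my understanding of the kind of exception handling such
an application would need is roughly the untested sketch below (the
usual SIGBUS-catching pattern around loads from a DAX mapping; the
function names are made up, not taken from any particular library):

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf pmem_err_env;

static void pmem_sigbus_handler(int sig)
{
	/* A load touched a poisoned/bad page in the DAX mapping. */
	siglongjmp(pmem_err_env, 1);
}

/* Copy len bytes out of a DAX-mapped file; return -1 if a bad block is hit. */
static int pmem_read(void *dst, const void *dax_src, size_t len)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = pmem_sigbus_handler;
	sigaction(SIGBUS, &sa, NULL);

	if (sigsetjmp(pmem_err_env, 1))
		return -1;	/* fall back to a replica, log the error, etc. */

	memcpy(dst, dax_src, len);	/* the loads that may fault happen here */
	return 0;
}

A legacy application that installs no such handler is simply killed by
the default SIGBUS action, which is what I mean by "unable to survive"
above.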
Thanks,
Vyacheslav Dubeyko.