From: Jeff Moyer
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Wed, 18 Jan 2017 15:47:37 -0500
To: Slava Dubeyko
Cc: Jan Kara, "linux-nvdimm@lists.01.org", "linux-block@vger.kernel.org",
    Viacheslav Dubeyko, Linux FS Devel, "lsf-pc@lists.linux-foundation.org"
In-Reply-To: (Slava Dubeyko's message of "Tue, 17 Jan 2017 23:15:17 +0000")
References: <20170114004910.GA4880@omniknight.lm.intel.com> <20170117143703.GP2517@quack2.suse.cz>

Slava Dubeyko writes:

>> Well, the situation with NVM is more like with DRAM, AFAIU. It is quite
>> reliable, but given the size, the probability that *some* cell has
>> degraded is quite high. And, similar to DRAM, you'll get an MCE (Machine
>> Check Exception) when you try to read such a cell. As Vishal wrote, the
>> hardware does some background scrubbing and relocates stuff early if
>> needed, but nothing is 100%.
>
> My understanding is that the hardware remaps the affected address range
> (64 bytes, for example) but doesn't move/migrate the data stored in that
> range. That sounds slightly weird, because it means there is no
> guarantee of retrieving the stored data. It sounds like the file system
> should be aware of this and has to be heavily protected by some
> replication or erasure coding scheme. Otherwise, if the hardware does
> everything for us (remaps the affected address region and moves the data
> into a new region), then why does the file system need to know about the
> affected address regions?

The data is lost; that's why you're getting an ECC error. It's
tantamount to -EIO for a disk block access.

>> The reason why we play games with badblocks is to avoid those MCEs
>> (i.e., even trying to read data we know is bad). Even if it is a rare
>> event, an MCE may mean the machine just immediately reboots (although I
>> find such platforms hardly usable with NVM then), and that is no good.
>> And even on hardware platforms that allow for more graceful recovery
>> from an MCE, it is asynchronous in nature, and our error handling
>> around IO is all synchronous, so it is difficult to join these two
>> models together.
>>
>> But I think it is a good question to ask whether we cannot improve on
>> MCE handling instead of trying to avoid it and pushing around
>> responsibility for handling bad blocks. Actually, I thought someone was
>> working on that. Can't we e.g. wrap in-kernel accesses to persistent
>> memory (those are now well identified anyway, so we can consult the
>> badblocks list) so that if an MCE happens during these accesses, we
>> note it somewhere, and at the end of the magic block we just pick up
>> the errors and report them back?
>
> Let's imagine that the affected address range equals 64 bytes. It sounds
> to me like, in the case of a block device, it will affect the whole
> logical block (4 KB).

512 bytes, and yes, that's the granularity at which we track errors in
the block layer, so that's the minimum amount of data you lose.
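To make the "consult the badblocks list" part concrete, here is a rough,
self-contained sketch. The structure and helper names below are made up
for illustration; they are not the kernel's actual interfaces (the real
tracking lives in the block layer's badblocks code and is consumed by the
pmem driver). The point is just the granularity: errors are recorded as
ranges of 512-byte sectors, and an access that overlaps a known-bad
sector fails up front with -EIO instead of touching the poisoned media.

/* Hypothetical illustration only, not kernel code. */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define SECTOR_SHIFT 9          /* errors are tracked in 512-byte sectors */

struct bad_range {
    uint64_t sector;            /* first bad sector */
    uint64_t nr_sectors;        /* length of the bad range in sectors */
};

/* Return nonzero if the byte range [off, off + len) overlaps a bad sector. */
static int range_is_poisoned(const struct bad_range *bb, size_t nr_bb,
                             uint64_t off, size_t len)
{
    uint64_t first = off >> SECTOR_SHIFT;
    uint64_t last  = (off + len - 1) >> SECTOR_SHIFT;

    for (size_t i = 0; i < nr_bb; i++) {
        uint64_t bb_first = bb[i].sector;
        uint64_t bb_last  = bb[i].sector + bb[i].nr_sectors - 1;

        if (first <= bb_last && last >= bb_first)
            return 1;
    }
    return 0;
}

/*
 * Read from a (pretend) persistent memory region, refusing any range the
 * badblocks list says is poisoned.  Even if only 64 bytes are actually
 * bad, the whole 512-byte sector containing them is treated as lost.
 */
static ssize_t pmem_read_checked(void *dst, const void *pmem_base,
                                 uint64_t off, size_t len,
                                 const struct bad_range *bb, size_t nr_bb)
{
    if (len == 0)
        return 0;
    if (range_is_poisoned(bb, nr_bb, off, len))
        return -EIO;            /* same answer a disk would give */

    memcpy(dst, (const char *)pmem_base + off, len);
    return (ssize_t)len;
}

For errors that are not yet in the list, the in-kernel copy would also
have to be machine-check safe (memcpy_mcsafe() on x86 is the current
attempt at that), so that a fresh MCE during the copy turns into a return
code rather than a reboot.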
> If the failure rate of address ranges could be significant, then it
> would affect a lot of logical blocks.

Who would buy hardware like that?

> The situation is more critical in the case of the DAX approach. Correct
> me if I'm wrong, but my understanding is that the goal of DAX is to
> provide direct access to a file's memory pages with minimal file system
> overhead. So it looks like raising a bad block issue at the file system
> level will affect a user-space application, because, in the end, the
> user-space application will need to deal with the trouble (the bad
> block issue). That sounds like a really weird situation to me. What can
> protect a user-space application from encountering the issue of a
> partially incorrect memory page?

Applications need to deal with -EIO today. This is the same sort of
thing. If an application trips over a bad block during a load from
persistent memory, it will get a signal, and it can either handle it or
not.
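In userspace that looks roughly like the sketch below. This is only an
illustration: the path is made up, and a real application would install
the handler with SA_SIGINFO and look at the faulting address and si_code
(BUS_MCEERR_AR for a consumed memory error) instead of blindly jumping
out. But it shows the shape of "handle it or not": a load from a poisoned
page in a DAX mapping turns into SIGBUS, and the application decides
whether to fall back to a replica, rebuild the data, or give up.

#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf poison_jmp;

static void sigbus_handler(int sig)
{
    (void)sig;
    siglongjmp(poison_jmp, 1);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, NULL);

    /* Hypothetical file on a filesystem mounted with -o dax. */
    int fd = open("/mnt/pmem/data", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    if (sigsetjmp(poison_jmp, 1) == 0) {
        char c = p[0];          /* may fault if the backing block is bad */
        printf("read byte: %d\n", c);
    } else {
        /* The load hit a bad block; the data at this offset is gone. */
        fprintf(stderr, "SIGBUS on load, falling back\n");
    }

    munmap(p, len);
    close(fd);
    return 0;
}

The specification linked below covers this programming model in more
detail.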
Have a read through this specification and see if it clears anything up
for you: http://www.snia.org/tech_activities/standards/curr_standards/npm

Cheers,
Jeff