From: Jeff Moyer <jmoyer@redhat.com>
To: Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>
Cc: Jan Kara <jack@suse.cz>, "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
    "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
    Viacheslav Dubeyko <slava@dubeyko.com>,
    Linux FS Devel <linux-fsdevel@vger.kernel.org>,
    "lsf-pc@lists.linux-foundation.org" <lsf-pc@lists.linux-foundation.org>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Wed, 18 Jan 2017 15:47:37 -0500
Message-ID: <x49mveo6qom.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <SN2PR04MB219128E4C8C4FD4AF3452C7C887C0@SN2PR04MB2191.namprd04.prod.outlook.com>
    (Slava Dubeyko's message of "Tue, 17 Jan 2017 23:15:17 +0000")

Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com> writes:

>> Well, the situation with NVM is more like with DRAM, AFAIU. It is quite
>> reliable, but given the size, the probability that *some* cell has
>> degraded is quite high. And, similar to DRAM, you'll get an MCE (Machine
>> Check Exception) when you try to read such a cell. As Vishal wrote, the
>> hardware does some background scrubbing and relocates stuff early if
>> needed, but nothing is 100%.
>
> My understanding is that the hardware remaps the affected address range
> (64 bytes, for example), but it doesn't move/migrate the data stored in
> that range. So this sounds slightly weird, because it means there is no
> guarantee of retrieving the stored data. It sounds like the file system
> should be aware of this and has to be heavily protected by some
> replication or erasure coding scheme. Otherwise, if the hardware does
> everything for us (remaps the affected address region and moves the data
> into a new region), then why does the file system need to know about the
> affected address regions?

The data is lost; that's why you're getting an ECC error. It's tantamount
to -EIO for a disk block access.

>> The reason why we play games with badblocks is to avoid those MCEs
>> (i.e., to avoid even trying to read data we know is bad). Even if it
>> were a rare event, an MCE may mean the machine just immediately reboots
>> (although I find such platforms hardly usable with NVM then), and that
>> is no good. And even on hardware platforms that allow more graceful
>> recovery from an MCE, it is asynchronous in nature, while our error
>> handling around IO is all synchronous, so it is difficult to join these
>> two models together.
>>
>> But I think it is a good question to ask whether we cannot improve MCE
>> handling instead of trying to avoid MCEs and pushing the responsibility
>> for handling bad blocks around. Actually, I thought someone was working
>> on that. Can't we, e.g., wrap in-kernel accesses to persistent memory
>> (those are now well identified anyway, so that we can consult the
>> badblocks list) so that if an MCE happens during such an access, we note
>> it somewhere, and at the end of the magic block we just pick up the
>> errors and report them back?
>
> Let's imagine that the affected address range equals 64 bytes. It sounds
> to me like, in the block device case, this will affect a whole logical
> block (4 KB).

512 bytes, and yes, that's the granularity at which we track errors in
the block layer, so that's the minimum amount of data you lose.

> If the failure rate of address ranges could be significant, then it
> would affect a lot of logical blocks.

Who would buy hardware like that?
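For what it's worth, that per-sector tracking is visible from user space,
so you can see exactly which ranges you'd lose. A minimal sketch in C
(the device name pmem0 is just an example; each line of the badblocks
sysfs attribute is a "start length" pair in 512-byte sector units):

    /* List the bad-block ranges the block layer tracks for a pmem device.
     * "pmem0" is an example name; substitute your namespace's device. */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/block/pmem0/badblocks", "r");
            unsigned long long start, len;

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            while (fscanf(f, "%llu %llu", &start, &len) == 2)
                    printf("sectors %llu-%llu bad (%llu bytes)\n",
                           start, start + len - 1, len * 512);
            fclose(f);
            return 0;
    }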
> The situation is more critical in the case of the DAX approach. Correct
> me if I'm wrong, but my understanding is that the goal of DAX is to
> provide direct access to a file's memory pages with minimal file system
> overhead. So it looks like raising a bad block issue at the file system
> level will affect user-space applications, because, in the end, the
> user-space application will need to deal with the trouble (the bad
> block issue) itself. That sounds like a really weird situation to me.
> What can protect a user-space application from encountering a partially
> corrupted memory page?

Applications need to deal with -EIO today. This is the same sort of
thing. If an application trips over a bad block during a load from
persistent memory, it will get a signal, and it can either handle it or
not.

Have a read through this specification and see if it clears anything up
for you:

http://www.snia.org/tech_activities/standards/curr_standards/npm

Cheers,
Jeff
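P.S. To make the signal semantics concrete: a consumed error during a
load from a DAX mapping is delivered to the process as SIGBUS (on
machine-check-capable platforms with si_code BUS_MCEERR_AR and the
poisoned address in si_addr). Below is a minimal, illustrative sketch of
an application-side handler; the path /mnt/pmem/data is hypothetical,
and this is just one way an application might turn the signal back into
an -EIO-style failure:

    /* Sketch: survive a SIGBUS raised by a load from a poisoned DAX
     * page, instead of crashing, and treat it like -EIO. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <setjmp.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf recover;

    static void bus_handler(int sig, siginfo_t *si, void *uc)
    {
            /* si->si_addr holds the faulting address; si->si_code is
             * BUS_MCEERR_AR when hardware reported a consumed error. */
            (void)sig; (void)si; (void)uc;
            siglongjmp(recover, 1);
    }

    int main(void)
    {
            struct sigaction sa = { 0 };

            sa.sa_sigaction = bus_handler;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGBUS, &sa, NULL);

            int fd = open("/mnt/pmem/data", O_RDONLY); /* hypothetical */
            if (fd < 0) { perror("open"); return 1; }

            char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            if (sigsetjmp(recover, 1) == 0)
                    printf("first byte: %d\n", p[0]);  /* may fault */
            else
                    fprintf(stderr, "bad block in mapping: treat as -EIO\n");

            munmap(p, 4096);
            close(fd);
            return 0;
    }

Whether to recover like this or simply die is the application's choice,
which is exactly the -EIO analogy above.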