From: Vishal Verma <vishal.l.verma@intel.com>
To: Andiry Xu <andiry@gmail.com>
Cc: Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>,
    "Darrick J. Wong" <darrick.wong@oracle.com>,
    "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
    "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
    Viacheslav Dubeyko <slava@dubeyko.com>,
    Linux FS Devel <linux-fsdevel@vger.kernel.org>,
    "lsf-pc@lists.linux-foundation.org" <lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Tue, 17 Jan 2017 16:51:50 -0700
Message-ID: <20170117235150.GE4880@omniknight.lm.intel.com>
In-Reply-To: <CAOvWMLYcP9PN6LT51gwJvmyCTfRRrVeDTrjN-8_zTKhD+UmDiw@mail.gmail.com>

On 01/17, Andiry Xu wrote:

<snip>

> >>
> >> The pmem_do_bvec() read logic is like this:
> >>
> >> pmem_do_bvec()
> >>     if (is_bad_pmem())
> >>         return -EIO;
> >>     else
> >>         memcpy_from_pmem();
> >>
> >> Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this imply
> >> that even if a block is not in the badblock list, it still can be bad
> >> and cause an MCE? Does the badblock list get changed while the file
> >> system is running? If that is the case, should the file system get a
> >> notification when it gets changed? If a block is good when I first
> >> read it, can I still trust it to be good for the second access?
> >
> > Yes, if a block is not in the badblocks list, it can still cause an
> > MCE. This is the latent error case I described above. For a simple read()
> > via the pmem driver, this will get handled by memcpy_mcsafe. For mmap,
> > an MCE is inevitable.
> >
> > Yes the badblocks list may change while a filesystem is running. The RFC
> > patches[1] I linked to add a notification for the filesystem when this
> > happens.
> >
>
> This is really bad and it makes file system implementation much more
> complicated. And badblock notification does not help very much,
> because any block can potentially be bad, no matter whether it is in
> the badblock list or not. And the file system has to perform checking
> for every read, using memcpy_mcsafe. This is a disaster for a file
> system like NOVA, which uses pointer de-reference to access data
> structures on pmem. Now if I want to read a field in an inode on pmem,
> I have to copy it to DRAM first and make sure memcpy_mcsafe() does not
> report anything wrong.

You have a good point, and I don't know if I have an answer for this..
Assuming a system with MCE recovery, maybe NOVA can add an MCE handler
similar to nfit_handle_mce(), and handle errors as they happen, but I'm
being very hand-wavey here and don't know how much/how well that might
work..

>
> > No, if the media, for some reason, 'develops' a bad cell, a second
> > consecutive read does have a chance of being bad. Once a location has
> > been marked as bad, it will stay bad till the ACPI clear error 'DSM' has
> > been called to mark it as clean.
> >
>
> I wonder what happens to a write in this case? Say a block is bad but
> not reported in the badblock list, and I write to it without reading
> first. Do I clear the poison with the write? Or does it still require
> an ACPI DSM?

With writes, my understanding is there is still a possibility that an
internal read-modify-write can happen, and cause an MCE (this is the same
as writing to a bad DRAM cell, which can also cause an MCE). You can't
really use the ACPI DSM preemptively because you don't know whether the
location was bad.

The error flow will be something like: the write causes the MCE, a
badblock gets added (either through the MCE handler or after the next
reboot), and the recovery path is now the same as for a regular badblock.

>
> > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> >
>
> Thank you for the patchset. I will look into it.
>
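The read path and error flow discussed in this message can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions, not kernel code: struct fake_pmem, fake_memcpy_mcsafe(),
fake_pmem_read() and fake_clear_error() are all hypothetical names made up
for illustration, the 512-byte badblocks granularity is an assumption, and
recording the bad block directly in the read path stands in for what the
thread attributes to the MCE handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PMEM_SIZE  4096                 /* bytes in the simulated region */
#define BLOCK_SIZE 512                  /* assumed badblocks granularity */
#define NBLOCKS    (PMEM_SIZE / BLOCK_SIZE)

/* Hypothetical model of a pmem region; not a kernel structure. */
struct fake_pmem {
    unsigned char data[PMEM_SIZE];
    bool poisoned[NBLOCKS];             /* ground truth: media really is bad */
    bool badblock[NBLOCKS];             /* what the driver currently knows */
};

void fake_pmem_init(struct fake_pmem *p)
{
    memset(p, 0, sizeof(*p));
}

/* Stand-in for a machine-check-safe copy: it reports an error instead of
 * taking a fatal exception when the source range touches poisoned media. */
int fake_memcpy_mcsafe(void *dst, const struct fake_pmem *p,
                       size_t off, size_t len)
{
    for (size_t b = off / BLOCK_SIZE; b <= (off + len - 1) / BLOCK_SIZE; b++)
        if (p->poisoned[b])
            return -EFAULT;
    memcpy(dst, p->data + off, len);
    return 0;
}

/* Mirrors the quoted pmem_do_bvec() logic: fail fast on known bad blocks,
 * otherwise copy with the checked routine. A latent error discovered by
 * the copy is added to the list (modelling the MCE handler's job). */
int fake_pmem_read(struct fake_pmem *p, void *dst, size_t off, size_t len)
{
    size_t first = off / BLOCK_SIZE;
    size_t last;

    assert(len > 0 && off + len <= PMEM_SIZE);
    last = (off + len - 1) / BLOCK_SIZE;

    for (size_t b = first; b <= last; b++)
        if (p->badblock[b])
            return -EIO;                /* the is_bad_pmem() case */

    if (fake_memcpy_mcsafe(dst, p, off, len) != 0) {
        for (size_t b = first; b <= last; b++)
            if (p->poisoned[b])
                p->badblock[b] = true;  /* latent error becomes a badblock */
        return -EIO;
    }
    return 0;
}

/* Models the ACPI clear-error DSM followed by dropping the list entry. */
void fake_clear_error(struct fake_pmem *p, size_t block)
{
    assert(block < NBLOCKS);
    memset(p->data + block * BLOCK_SIZE, 0, BLOCK_SIZE);
    p->poisoned[block] = false;
    p->badblock[block] = false;
}
```

In this model a block can be poisoned without being in the list, so the
first read of it fails only inside the checked copy; after that it behaves
like any other known badblock until fake_clear_error() is called. A file
system that dereferences pmem pointers directly would bypass
fake_pmem_read() entirely, which is the concern raised above.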