All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vishal Verma <vishal.l.verma@intel.com>
To: Andiry Xu <andiry@gmail.com>
Cc: Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Viacheslav Dubeyko <slava@dubeyko.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Tue, 17 Jan 2017 16:51:50 -0700	[thread overview]
Message-ID: <20170117235150.GE4880@omniknight.lm.intel.com> (raw)
In-Reply-To: <CAOvWMLYcP9PN6LT51gwJvmyCTfRRrVeDTrjN-8_zTKhD+UmDiw@mail.gmail.com>

On 01/17, Andiry Xu wrote:

<snip>

> >>
> >> The pmem_do_bvec() read logic is like this:
> >>
> >> pmem_do_bvec()
> >>     if (is_bad_pmem())
> >>         return -EIO;
> >>     else
> >>         memcpy_from_pmem();
> >>
> >> Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this imply
> >> that even if a block is not in the badblock list, it still can be bad
> >> and causes MCE? Does the badblock list get changed during file system
> >> running? If that is the case, should the file system get a
> >> notification when it gets changed? If a block is good when I first
> >> read it, can I still trust it to be good for the second access?
> >
> > Yes, if a block is not in the badblocks list, it can still cause an
> > MCE. This is the latent error case I described above. For a simple read()
> > via the pmem driver, this will get handled by memcpy_mcsafe. For mmap,
> > an MCE is inevitable.
> >
> > Yes the badblocks list may change while a filesystem is running. The RFC
> > patches[1] I linked to add a notification for the filesystem when this
> > happens.
> >
> 
> This is really bad and it makes file system implementation much more
> complicated. And badblock notification does not help very much,
> because any block can be bad potentially, no matter it is in badblock
> list or not. And file system has to perform checking for every read,
> using memcpy_mcsafe. This is disaster for file system like NOVA, which
> uses pointer de-reference to access data structures on pmem. Now if I
> want to read a field in an inode on pmem, I have to copy it to DRAM
> first and make sure memcpy_mcsafe() does not report anything wrong.

You have a good point, and I don't know if I have an answer for this..
Assuming a system with MCE recovery, maybe NOVA can add a mce handler
similar to nfit_handle_mce(), and handle errors as they happen, but I'm
being very hand-wavey here and don't know how much/how well that might
work..

> 
> > No, if the media, for some reason, 'dvelops' a bad cell, a second
> > consecutive read does have a chance of being bad. Once a location has
> > been marked as bad, it will stay bad till the ACPI clear error 'DSM' has
> > been called to mark it as clean.
> >
> 
> I wonder what happens to write in this case? If a block is bad but not
> reported in badblock list. Now I write to it without reading first. Do
> I clear the poison with the write? Or still require a ACPI DSM?

With writes, my understanding is there is still a possibility that an
internal read-modify-write can happen, and cause a MCE (this is the same
as writing to a bad DRAM cell, which can also cause an MCE). You can't
really use the ACPI DSM preemptively because you don't know whether the
location was bad. The error flow will be something like write causes the
MCE, a badblock gets added (either through the mce handler or after the
next reboot), and the recovery path is now the same as a regular badblock.

> 
> > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> >
> 
> Thank you for the patchset. I will look into it.
> 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Vishal Verma <vishal.l.verma@intel.com>
To: Andiry Xu <andiry@gmail.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Viacheslav Dubeyko <slava@dubeyko.com>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Tue, 17 Jan 2017 16:51:50 -0700	[thread overview]
Message-ID: <20170117235150.GE4880@omniknight.lm.intel.com> (raw)
In-Reply-To: <CAOvWMLYcP9PN6LT51gwJvmyCTfRRrVeDTrjN-8_zTKhD+UmDiw@mail.gmail.com>

On 01/17, Andiry Xu wrote:

<snip>

> >>
> >> The pmem_do_bvec() read logic is like this:
> >>
> >> pmem_do_bvec()
> >>     if (is_bad_pmem())
> >>         return -EIO;
> >>     else
> >>         memcpy_from_pmem();
> >>
> >> Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this imply
> >> that even if a block is not in the badblock list, it still can be bad
> >> and causes MCE? Does the badblock list get changed during file system
> >> running? If that is the case, should the file system get a
> >> notification when it gets changed? If a block is good when I first
> >> read it, can I still trust it to be good for the second access?
> >
> > Yes, if a block is not in the badblocks list, it can still cause an
> > MCE. This is the latent error case I described above. For a simple read()
> > via the pmem driver, this will get handled by memcpy_mcsafe. For mmap,
> > an MCE is inevitable.
> >
> > Yes the badblocks list may change while a filesystem is running. The RFC
> > patches[1] I linked to add a notification for the filesystem when this
> > happens.
> >
> 
> This is really bad and it makes file system implementation much more
> complicated. And badblock notification does not help very much,
> because any block can be bad potentially, no matter it is in badblock
> list or not. And file system has to perform checking for every read,
> using memcpy_mcsafe. This is disaster for file system like NOVA, which
> uses pointer de-reference to access data structures on pmem. Now if I
> want to read a field in an inode on pmem, I have to copy it to DRAM
> first and make sure memcpy_mcsafe() does not report anything wrong.

You have a good point, and I don't know if I have an answer for this..
Assuming a system with MCE recovery, maybe NOVA can add a mce handler
similar to nfit_handle_mce(), and handle errors as they happen, but I'm
being very hand-wavey here and don't know how much/how well that might
work..

> 
> > No, if the media, for some reason, 'dvelops' a bad cell, a second
> > consecutive read does have a chance of being bad. Once a location has
> > been marked as bad, it will stay bad till the ACPI clear error 'DSM' has
> > been called to mark it as clean.
> >
> 
> I wonder what happens to write in this case? If a block is bad but not
> reported in badblock list. Now I write to it without reading first. Do
> I clear the poison with the write? Or still require a ACPI DSM?

With writes, my understanding is there is still a possibility that an
internal read-modify-write can happen, and cause a MCE (this is the same
as writing to a bad DRAM cell, which can also cause an MCE). You can't
really use the ACPI DSM preemptively because you don't know whether the
location was bad. The error flow will be something like write causes the
MCE, a badblock gets added (either through the mce handler or after the
next reboot), and the recovery path is now the same as a regular badblock.

> 
> > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> >
> 
> Thank you for the patchset. I will look into it.
> 

  reply	other threads:[~2017-01-17 23:51 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <at1mp6pou4lenesjdgh22k4p.1484345585589@email.android.com>
     [not found] ` <b9rbflutjt10mb4ofherta8j.1484345610771@email.android.com>
2017-01-14  0:00   ` [LSF/MM TOPIC] Badblocks checking/representation in filesystems Slava Dubeyko
2017-01-14  0:00     ` Slava Dubeyko
2017-01-14  0:00     ` Slava Dubeyko
2017-01-14  0:49     ` Vishal Verma
2017-01-14  0:49       ` Vishal Verma
2017-01-16  2:27       ` Slava Dubeyko
2017-01-16  2:27         ` Slava Dubeyko
2017-01-16  2:27         ` Slava Dubeyko
2017-01-17 14:37         ` [Lsf-pc] " Jan Kara
2017-01-17 14:37           ` Jan Kara
2017-01-17 15:08           ` Christoph Hellwig
2017-01-17 15:08             ` Christoph Hellwig
2017-01-17 22:14           ` Vishal Verma
2017-01-17 22:14             ` Vishal Verma
2017-01-18 10:16             ` Jan Kara
2017-01-18 10:16               ` Jan Kara
2017-01-18 20:39               ` Jeff Moyer
2017-01-18 20:39                 ` Jeff Moyer
2017-01-18 21:02                 ` Darrick J. Wong
2017-01-18 21:02                   ` Darrick J. Wong
2017-01-18 21:32                   ` Dan Williams
2017-01-18 21:32                     ` Dan Williams
     [not found]                     ` <CAPcyv4hd7bpCa7d9msX0Y8gLz7WsqXT3VExQwwLuAcsmMxVTPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-18 21:56                       ` Verma, Vishal L
2017-01-18 21:56                         ` Verma, Vishal L
2017-01-18 21:56                         ` Verma, Vishal L
     [not found]                         ` <1484776549.4358.33.camel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-01-19  8:10                           ` Jan Kara
2017-01-19  8:10                             ` Jan Kara
     [not found]                             ` <20170119081011.GA2565-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-01-19 18:59                               ` Vishal Verma
2017-01-19 18:59                                 ` Vishal Verma
     [not found]                                 ` <20170119185910.GF4880-PxNA6LsHknajYZd8rzuJLNh3ngVCH38I@public.gmane.org>
2017-01-19 19:03                                   ` Dan Williams
2017-01-19 19:03                                     ` Dan Williams
     [not found]                                     ` <CAPcyv4jZz_iqLutd0gPEL3udqbFxvBH8CZY5oDgUjG5dGbC2gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-20  9:03                                       ` Jan Kara
2017-01-20  9:03                                         ` Jan Kara
2017-01-17 23:15           ` Slava Dubeyko
2017-01-17 23:15             ` Slava Dubeyko
2017-01-17 23:15             ` Slava Dubeyko
2017-01-18 20:47             ` Jeff Moyer
2017-01-18 20:47               ` Jeff Moyer
2017-01-19  2:56               ` Slava Dubeyko
2017-01-19  2:56                 ` Slava Dubeyko
2017-01-19  2:56                 ` Slava Dubeyko
2017-01-19 19:33                 ` Jeff Moyer
2017-01-19 19:33                   ` Jeff Moyer
2017-01-17  6:33       ` Darrick J. Wong
2017-01-17  6:33         ` Darrick J. Wong
2017-01-17 21:35         ` Vishal Verma
2017-01-17 21:35           ` Vishal Verma
2017-01-17 22:15           ` Andiry Xu
2017-01-17 22:15             ` Andiry Xu
2017-01-17 22:37             ` Vishal Verma
2017-01-17 22:37               ` Vishal Verma
2017-01-17 23:20               ` Andiry Xu
2017-01-17 23:20                 ` Andiry Xu
2017-01-17 23:51                 ` Vishal Verma [this message]
2017-01-17 23:51                   ` Vishal Verma
2017-01-18  1:58                   ` Andiry Xu
2017-01-18  1:58                     ` Andiry Xu
     [not found]                     ` <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-20  0:32                       ` Verma, Vishal L
2017-01-20  0:32                         ` Verma, Vishal L
2017-01-20  0:32                         ` Verma, Vishal L
2017-01-18  9:38               ` [Lsf-pc] " Jan Kara
2017-01-18  9:38                 ` Jan Kara
2017-01-19 21:17                 ` Vishal Verma
2017-01-19 21:17                   ` Vishal Verma
2017-01-20  9:47                   ` Jan Kara
2017-01-20  9:47                     ` Jan Kara
2017-01-20 15:42                     ` Dan Williams
2017-01-20 15:42                       ` Dan Williams
2017-01-24  7:46                       ` Jan Kara
2017-01-24  7:46                         ` Jan Kara
2017-01-24 19:59                         ` Vishal Verma
2017-01-24 19:59                           ` Vishal Verma
2017-01-18  0:16             ` Andreas Dilger
2017-01-18  2:01               ` Andiry Xu
2017-01-18  2:01                 ` Andiry Xu
2017-01-18  3:08                 ` Lu Zhang
2017-01-18  3:08                   ` Lu Zhang
2017-01-20  0:46                   ` Vishal Verma
2017-01-20  0:46                     ` Vishal Verma
2017-01-20  9:24                     ` Yasunori Goto
2017-01-20  9:24                       ` Yasunori Goto
     [not found]                       ` <20170120182435.0E12.E1E9C6FF-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2017-01-21  0:23                         ` Kani, Toshimitsu
2017-01-21  0:23                           ` Kani, Toshimitsu
2017-01-21  0:23                           ` Kani, Toshimitsu
2017-01-20  0:55                 ` Verma, Vishal L
2017-01-20  0:55                   ` Verma, Vishal L
2017-01-13 21:40 Verma, Vishal L
2017-01-13 21:40 ` Verma, Vishal L
2017-01-13 21:40 ` Verma, Vishal L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170117235150.GE4880@omniknight.lm.intel.com \
    --to=vishal.l.verma@intel.com \
    --cc=Vyacheslav.Dubeyko@wdc.com \
    --cc=andiry@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=slava@dubeyko.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.