From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "andiry@gmail.com" <andiry@gmail.com>
Cc: "darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"Vyacheslav.Dubeyko@wdc.com" <Vyacheslav.Dubeyko@wdc.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Fri, 20 Jan 2017 00:32:14 +0000
Message-ID: <1484872265.4857.1.camel@intel.com>
In-Reply-To: <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w@mail.gmail.com>

On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:
> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma
> <vishal.l.verma@intel.com> wrote:
> > On 01/17, Andiry Xu wrote:
> > 
> > <snip>
> > 
> > > > > 
> > > > > The pmem_do_bvec() read logic is like this:
> > > > > 
> > > > > pmem_do_bvec()
> > > > >     if (is_bad_pmem())
> > > > >         return -EIO;
> > > > >     else
> > > > >         memcpy_from_pmem();
> > > > > 
> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this
> > > > > imply that even if a block is not in the badblock list, it
> > > > > still can be bad and causes MCE? Does the badblock list get
> > > > > changed during file system running? If that is the case,
> > > > > should the file system get a notification when it gets
> > > > > changed? If a block is good when I first read it, can I still
> > > > > trust it to be good for the second access?
> > > > 
> > > > Yes, if a block is not in the badblocks list, it can still cause
> > > > an MCE. This is the latent error case I described above. For a
> > > > simple read() via the pmem driver, this will get handled by
> > > > memcpy_mcsafe. For mmap, an MCE is inevitable.
> > > > 
> > > > Yes the badblocks list may change while a filesystem is running.
> > > > The RFC patches[1] I linked to add a notification for the
> > > > filesystem when this happens.
> > > > 
> > > 
> > > This is really bad and it makes file system implementation much
> > > more complicated. And badblock notification does not help very
> > > much, because any block can be bad potentially, no matter it is in
> > > badblock list or not. And file system has to perform checking for
> > > every read, using memcpy_mcsafe. This is disaster for file system
> > > like NOVA, which uses pointer de-reference to access data
> > > structures on pmem. Now if I want to read a field in an inode on
> > > pmem, I have to copy it to DRAM first and make sure
> > > memcpy_mcsafe() does not report anything wrong.
> > 
> > You have a good point, and I don't know if I have an answer for
> > this.. Assuming a system with MCE recovery, maybe NOVA can add a mce
> > handler similar to nfit_handle_mce(), and handle errors as they
> > happen, but I'm being very hand-wavey here and don't know how
> > much/how well that might work..
> > 
> > > 
> > > > No, if the media, for some reason, 'develops' a bad cell, a
> > > > second consecutive read does have a chance of being bad. Once a
> > > > location has been marked as bad, it will stay bad till the ACPI
> > > > clear error 'DSM' has been called to mark it as clean.
> > > > 
> > > 
> > > I wonder what happens to write in this case? If a block is bad but
> > > not reported in badblock list. Now I write to it without reading
> > > first. Do I clear the poison with the write? Or still require a
> > > ACPI DSM?
> > 
> > With writes, my understanding is there is still a possibility that
> > an internal read-modify-write can happen, and cause a MCE (this is
> > the same as writing to a bad DRAM cell, which can also cause an
> > MCE). You can't really use the ACPI DSM preemptively because you
> > don't know whether the location was bad. The error flow will be
> > something like write causes the MCE, a badblock gets added (either
> > through the mce handler or after the next reboot), and the recovery
> > path is now the same as a regular badblock.
> > 
> 
> This is different from my understanding. Right now write_pmem() in
> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it
> clears poison and writes to pmem again. Seems to me writing to bad
> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?

You are right, writes don't use memcpy_mcsafe, and will not directly
cause an MCE. However, a write can cause an asynchronous 'CMCI'
(corrected machine check interrupt). This is not critical, and it won't
be a memory error, as the core didn't consume poison. memcpy_mcsafe
cannot protect against this because the write is 'posted' and the CMCI
is not synchronous. Note that this applies only to the latent error or
memmap-store case.
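
To make the read and write flows discussed in this thread concrete, here
is a rough userspace sketch. It is only a simulation, not the pmem
driver: struct sim_pmem, sim_copy_mcsafe(), sim_read() and sim_write()
are hypothetical names, the "poisoned" flag stands in for the real media
state, and a failed copy returning -EIO stands in for what
memcpy_mcsafe() would report. Adding the block to the list right away on
a failed read is also an assumption (it could equally happen after the
next reboot).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSZ   16

/* Simulated device. All names here are hypothetical, not kernel APIs. */
struct sim_pmem {
	unsigned char data[NBLOCKS][BLKSZ];
	bool poisoned[NBLOCKS];	/* media truth: block currently holds poison */
	bool badblock[NBLOCKS];	/* what the badblocks list knows about */
};

/* Stand-in for memcpy_mcsafe(): report an error instead of taking an MCE. */
static int sim_copy_mcsafe(struct sim_pmem *p, void *dst, size_t blk)
{
	if (p->poisoned[blk]) {
		/* Latent error discovered: assume it is recorded immediately. */
		p->badblock[blk] = true;
		return -EIO;
	}
	memcpy(dst, p->data[blk], BLKSZ);
	return 0;
}

/* Read path: known bad blocks fail early, latent ones fail in the copy. */
static int sim_read(struct sim_pmem *p, void *dst, size_t blk)
{
	assert(blk < NBLOCKS);
	if (p->badblock[blk])
		return -EIO;
	return sim_copy_mcsafe(p, dst, blk);
}

/* Write path: a known bad block has its poison cleared, then is written. */
static int sim_write(struct sim_pmem *p, const void *src, size_t blk)
{
	assert(blk < NBLOCKS);
	if (p->badblock[blk]) {
		p->poisoned[blk] = false;
		p->badblock[blk] = false;
	}
	/*
	 * Latent case (poisoned but not listed): the store is posted and
	 * does not fault synchronously. Whether it also clears the poison
	 * is not settled here, so the sketch leaves the flag untouched.
	 */
	memcpy(p->data[blk], src, BLKSZ);
	return 0;
}
```

With this model, a read of a latent bad block fails once in the copy and
fails early from then on, until a write to the now-listed block clears
it. A filesystem that dereferences pmem pointers directly has no
equivalent of sim_copy_mcsafe() in its path, which is the concern raised
for NOVA above.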

> 
> Thanks,
> Andiry
> 
> > > 
> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> > > > 
> > > 
> > > Thank you for the patchset. I will look into it.
> > > 
