All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Verma, Vishal L" <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: "Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org"
	<Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org>,
	"darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org"
	<darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	"linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org"
	<linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>,
	"linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org"
	<slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>,
	"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org"
	<lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Fri, 20 Jan 2017 00:32:14 +0000	[thread overview]
Message-ID: <1484872265.4857.1.camel@intel.com> (raw)
In-Reply-To: <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:
> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co
> m> wrote:
> > On 01/17, Andiry Xu wrote:
> > 
> > <snip>
> > 
> > > > > 
> > > > > The pmem_do_bvec() read logic is like this:
> > > > > 
> > > > > pmem_do_bvec()
> > > > >     if (is_bad_pmem())
> > > > >         return -EIO;
> > > > >     else
> > > > >         memcpy_from_pmem();
> > > > > 
> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this
> > > > > imply
> > > > > that even if a block is not in the badblock list, it still can
> > > > > be bad
> > > > > and causes MCE? Does the badblock list get changed during file
> > > > > system
> > > > > running? If that is the case, should the file system get a
> > > > > notification when it gets changed? If a block is good when I
> > > > > first
> > > > > read it, can I still trust it to be good for the second
> > > > > access?
> > > > 
> > > > Yes, if a block is not in the badblocks list, it can still cause
> > > > an
> > > > MCE. This is the latent error case I described above. For a
> > > > simple read()
> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For
> > > > mmap,
> > > > an MCE is inevitable.
> > > > 
> > > > Yes the badblocks list may change while a filesystem is running.
> > > > The RFC
> > > > patches[1] I linked to add a notification for the filesystem
> > > > when this
> > > > happens.
> > > > 
> > > 
> > > This is really bad and it makes file system implementation much
> > > more
> > > complicated. And badblock notification does not help very much,
> > > because any block can be bad potentially, no matter it is in
> > > badblock
> > > list or not. And file system has to perform checking for every
> > > read,
> > > using memcpy_mcsafe. This is disaster for file system like NOVA,
> > > which
> > > uses pointer de-reference to access data structures on pmem. Now
> > > if I
> > > want to read a field in an inode on pmem, I have to copy it to
> > > DRAM
> > > first and make sure memcpy_mcsafe() does not report anything
> > > wrong.
> > 
> > You have a good point, and I don't know if I have an answer for
> > this..
> > Assuming a system with MCE recovery, maybe NOVA can add a mce
> > handler
> > similar to nfit_handle_mce(), and handle errors as they happen, but
> > I'm
> > being very hand-wavey here and don't know how much/how well that
> > might
> > work..
> > 
> > > 
> > > > No, if the media, for some reason, 'dvelops' a bad cell, a
> > > > second
> > > > consecutive read does have a chance of being bad. Once a
> > > > location has
> > > > been marked as bad, it will stay bad till the ACPI clear error
> > > > 'DSM' has
> > > > been called to mark it as clean.
> > > > 
> > > 
> > > I wonder what happens to write in this case? If a block is bad but
> > > not
> > > reported in badblock list. Now I write to it without reading
> > > first. Do
> > > I clear the poison with the write? Or still require a ACPI DSM?
> > 
> > With writes, my understanding is there is still a possibility that
> > an
> > internal read-modify-write can happen, and cause a MCE (this is the
> > same
> > as writing to a bad DRAM cell, which can also cause an MCE). You
> > can't
> > really use the ACPI DSM preemptively because you don't know whether
> > the
> > location was bad. The error flow will be something like write causes
> > the
> > MCE, a badblock gets added (either through the mce handler or after
> > the
> > next reboot), and the recovery path is now the same as a regular
> > badblock.
> > 
> 
> This is different from my understanding. Right now write_pmem() in
> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it
> clears poison and writes to pmem again. Seems to me writing to bad
> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?

You are right, writes don't use memcpy_mcsafe, and will not directly
cause an MCE. However a write can cause an asynchronous 'CMCI' -
corrected machine check interrupt, but this is not critical, and wont be
a memory error as the core didn't consume poison. memcpy_mcsafe cannot
protect against this because the write is 'posted' and the CMCI is not
synchronous. Note that this is only in the latent error or memmap-store
case.

> 
> Thanks,
> Andiry
> 
> > > 
> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> > > > 
> > > 
> > > Thank you for the patchset. I will look into it.
> > > 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "andiry@gmail.com" <andiry@gmail.com>
Cc: "darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"Vyacheslav.Dubeyko@wdc.com" <Vyacheslav.Dubeyko@wdc.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Fri, 20 Jan 2017 00:32:14 +0000	[thread overview]
Message-ID: <1484872265.4857.1.camel@intel.com> (raw)
In-Reply-To: <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w@mail.gmail.com>

T24gVHVlLCAyMDE3LTAxLTE3IGF0IDE3OjU4IC0wODAwLCBBbmRpcnkgWHUgd3JvdGU6DQo+IE9u
IFR1ZSwgSmFuIDE3LCAyMDE3IGF0IDM6NTEgUE0sIFZpc2hhbCBWZXJtYSA8dmlzaGFsLmwudmVy
bWFAaW50ZWwuY28NCj4gbT4gd3JvdGU6DQo+ID4gT24gMDEvMTcsIEFuZGlyeSBYdSB3cm90ZToN
Cj4gPiANCj4gPiA8c25pcD4NCj4gPiANCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBUaGUgcG1lbV9k
b19idmVjKCkgcmVhZCBsb2dpYyBpcyBsaWtlIHRoaXM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4g
cG1lbV9kb19idmVjKCkNCj4gPiA+ID4gPiDCoMKgwqDCoGlmIChpc19iYWRfcG1lbSgpKQ0KPiA+
ID4gPiA+IMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gLUVJTzsNCj4gPiA+ID4gPiDCoMKgwqDCoGVs
c2UNCj4gPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgbWVtY3B5X2Zyb21fcG1lbSgpOw0KPiA+ID4g
PiA+IA0KPiA+ID4gPiA+IE5vdGUgbWVtY3B5X2Zyb21fcG1lbSgpIGlzIGNhbGxpbmcgbWVtY3B5
X21jc2FmZSgpLiBEb2VzIHRoaXMNCj4gPiA+ID4gPiBpbXBseQ0KPiA+ID4gPiA+IHRoYXQgZXZl
biBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0aGUgYmFkYmxvY2sgbGlzdCwgaXQgc3RpbGwgY2FuDQo+
ID4gPiA+ID4gYmUgYmFkDQo+ID4gPiA+ID4gYW5kIGNhdXNlcyBNQ0U/IERvZXMgdGhlIGJhZGJs
b2NrIGxpc3QgZ2V0IGNoYW5nZWQgZHVyaW5nIGZpbGUNCj4gPiA+ID4gPiBzeXN0ZW0NCj4gPiA+
ID4gPiBydW5uaW5nPyBJZiB0aGF0IGlzIHRoZSBjYXNlLCBzaG91bGQgdGhlIGZpbGUgc3lzdGVt
IGdldCBhDQo+ID4gPiA+ID4gbm90aWZpY2F0aW9uIHdoZW4gaXQgZ2V0cyBjaGFuZ2VkPyBJZiBh
IGJsb2NrIGlzIGdvb2Qgd2hlbiBJDQo+ID4gPiA+ID4gZmlyc3QNCj4gPiA+ID4gPiByZWFkIGl0
LCBjYW4gSSBzdGlsbCB0cnVzdCBpdCB0byBiZSBnb29kIGZvciB0aGUgc2Vjb25kDQo+ID4gPiA+
ID4gYWNjZXNzPw0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLCBpZiBhIGJsb2NrIGlzIG5vdCBpbiB0
aGUgYmFkYmxvY2tzIGxpc3QsIGl0IGNhbiBzdGlsbCBjYXVzZQ0KPiA+ID4gPiBhbg0KPiA+ID4g
PiBNQ0UuIFRoaXMgaXMgdGhlIGxhdGVudCBlcnJvciBjYXNlIEkgZGVzY3JpYmVkIGFib3ZlLiBG
b3IgYQ0KPiA+ID4gPiBzaW1wbGUgcmVhZCgpDQo+ID4gPiA+IHZpYSB0aGUgcG1lbSBkcml2ZXIs
IHRoaXMgd2lsbCBnZXQgaGFuZGxlZCBieSBtZW1jcHlfbWNzYWZlLiBGb3INCj4gPiA+ID4gbW1h
cCwNCj4gPiA+ID4gYW4gTUNFIGlzIGluZXZpdGFibGUuDQo+ID4gPiA+IA0KPiA+ID4gPiBZZXMg
dGhlIGJhZGJsb2NrcyBsaXN0IG1heSBjaGFuZ2Ugd2hpbGUgYSBmaWxlc3lzdGVtIGlzIHJ1bm5p
bmcuDQo+ID4gPiA+IFRoZSBSRkMNCj4gPiA+ID4gcGF0Y2hlc1sxXSBJIGxpbmtlZCB0byBhZGQg
YSBub3RpZmljYXRpb24gZm9yIHRoZSBmaWxlc3lzdGVtDQo+ID4gPiA+IHdoZW4gdGhpcw0KPiA+
ID4gPiBoYXBwZW5zLg0KPiA+ID4gPiANCj4gPiA+IA0KPiA+ID4gVGhpcyBpcyByZWFsbHkgYmFk
IGFuZCBpdCBtYWtlcyBmaWxlIHN5c3RlbSBpbXBsZW1lbnRhdGlvbiBtdWNoDQo+ID4gPiBtb3Jl
DQo+ID4gPiBjb21wbGljYXRlZC4gQW5kIGJhZGJsb2NrIG5vdGlmaWNhdGlvbiBkb2VzIG5vdCBo
ZWxwIHZlcnkgbXVjaCwNCj4gPiA+IGJlY2F1c2UgYW55IGJsb2NrIGNhbiBiZSBiYWQgcG90ZW50
aWFsbHksIG5vIG1hdHRlciBpdCBpcyBpbg0KPiA+ID4gYmFkYmxvY2sNCj4gPiA+IGxpc3Qgb3Ig
bm90LiBBbmQgZmlsZSBzeXN0ZW0gaGFzIHRvIHBlcmZvcm0gY2hlY2tpbmcgZm9yIGV2ZXJ5DQo+
ID4gPiByZWFkLA0KPiA+ID4gdXNpbmcgbWVtY3B5X21jc2FmZS4gVGhpcyBpcyBkaXNhc3RlciBm
b3IgZmlsZSBzeXN0ZW0gbGlrZSBOT1ZBLA0KPiA+ID4gd2hpY2gNCj4gPiA+IHVzZXMgcG9pbnRl
ciBkZS1yZWZlcmVuY2UgdG8gYWNjZXNzIGRhdGEgc3RydWN0dXJlcyBvbiBwbWVtLiBOb3cNCj4g
PiA+IGlmIEkNCj4gPiA+IHdhbnQgdG8gcmVhZCBhIGZpZWxkIGluIGFuIGlub2RlIG9uIHBtZW0s
IEkgaGF2ZSB0byBjb3B5IGl0IHRvDQo+ID4gPiBEUkFNDQo+ID4gPiBmaXJzdCBhbmQgbWFrZSBz
dXJlIG1lbWNweV9tY3NhZmUoKSBkb2VzIG5vdCByZXBvcnQgYW55dGhpbmcNCj4gPiA+IHdyb25n
Lg0KPiA+IA0KPiA+IFlvdSBoYXZlIGEgZ29vZCBwb2ludCwgYW5kIEkgZG9uJ3Qga25vdyBpZiBJ
IGhhdmUgYW4gYW5zd2VyIGZvcg0KPiA+IHRoaXMuLg0KPiA+IEFzc3VtaW5nIGEgc3lzdGVtIHdp
dGggTUNFIHJlY292ZXJ5LCBtYXliZSBOT1ZBIGNhbiBhZGQgYSBtY2UNCj4gPiBoYW5kbGVyDQo+
ID4gc2ltaWxhciB0byBuZml0X2hhbmRsZV9tY2UoKSwgYW5kIGhhbmRsZSBlcnJvcnMgYXMgdGhl
eSBoYXBwZW4sIGJ1dA0KPiA+IEknbQ0KPiA+IGJlaW5nIHZlcnkgaGFuZC13YXZleSBoZXJlIGFu
ZCBkb24ndCBrbm93IGhvdyBtdWNoL2hvdyB3ZWxsIHRoYXQNCj4gPiBtaWdodA0KPiA+IHdvcmsu
Lg0KPiA+IA0KPiA+ID4gDQo+ID4gPiA+IE5vLCBpZiB0aGUgbWVkaWEsIGZvciBzb21lIHJlYXNv
biwgJ2R2ZWxvcHMnIGEgYmFkIGNlbGwsIGENCj4gPiA+ID4gc2Vjb25kDQo+ID4gPiA+IGNvbnNl
Y3V0aXZlIHJlYWQgZG9lcyBoYXZlIGEgY2hhbmNlIG9mIGJlaW5nIGJhZC4gT25jZSBhDQo+ID4g
PiA+IGxvY2F0aW9uIGhhcw0KPiA+ID4gPiBiZWVuIG1hcmtlZCBhcyBiYWQsIGl0IHdpbGwgc3Rh
eSBiYWQgdGlsbCB0aGUgQUNQSSBjbGVhciBlcnJvcg0KPiA+ID4gPiAnRFNNJyBoYXMNCj4gPiA+
ID4gYmVlbiBjYWxsZWQgdG8gbWFyayBpdCBhcyBjbGVhbi4NCj4gPiA+ID4gDQo+ID4gPiANCj4g
PiA+IEkgd29uZGVyIHdoYXQgaGFwcGVucyB0byB3cml0ZSBpbiB0aGlzIGNhc2U/IElmIGEgYmxv
Y2sgaXMgYmFkIGJ1dA0KPiA+ID4gbm90DQo+ID4gPiByZXBvcnRlZCBpbiBiYWRibG9jayBsaXN0
LiBOb3cgSSB3cml0ZSB0byBpdCB3aXRob3V0IHJlYWRpbmcNCj4gPiA+IGZpcnN0LiBEbw0KPiA+
ID4gSSBjbGVhciB0aGUgcG9pc29uIHdpdGggdGhlIHdyaXRlPyBPciBzdGlsbCByZXF1aXJlIGEg
QUNQSSBEU00/DQo+ID4gDQo+ID4gV2l0aCB3cml0ZXMsIG15IHVuZGVyc3RhbmRpbmcgaXMgdGhl
cmUgaXMgc3RpbGwgYSBwb3NzaWJpbGl0eSB0aGF0DQo+ID4gYW4NCj4gPiBpbnRlcm5hbCByZWFk
LW1vZGlmeS13cml0ZSBjYW4gaGFwcGVuLCBhbmQgY2F1c2UgYSBNQ0UgKHRoaXMgaXMgdGhlDQo+
ID4gc2FtZQ0KPiA+IGFzIHdyaXRpbmcgdG8gYSBiYWQgRFJBTSBjZWxsLCB3aGljaCBjYW4gYWxz
byBjYXVzZSBhbiBNQ0UpLiBZb3UNCj4gPiBjYW4ndA0KPiA+IHJlYWxseSB1c2UgdGhlIEFDUEkg
RFNNIHByZWVtcHRpdmVseSBiZWNhdXNlIHlvdSBkb24ndCBrbm93IHdoZXRoZXINCj4gPiB0aGUN
Cj4gPiBsb2NhdGlvbiB3YXMgYmFkLiBUaGUgZXJyb3IgZmxvdyB3aWxsIGJlIHNvbWV0aGluZyBs
aWtlIHdyaXRlIGNhdXNlcw0KPiA+IHRoZQ0KPiA+IE1DRSwgYSBiYWRibG9jayBnZXRzIGFkZGVk
IChlaXRoZXIgdGhyb3VnaCB0aGUgbWNlIGhhbmRsZXIgb3IgYWZ0ZXINCj4gPiB0aGUNCj4gPiBu
ZXh0IHJlYm9vdCksIGFuZCB0aGUgcmVjb3ZlcnkgcGF0aCBpcyBub3cgdGhlIHNhbWUgYXMgYSBy
ZWd1bGFyDQo+ID4gYmFkYmxvY2suDQo+ID4gDQo+IA0KPiBUaGlzIGlzIGRpZmZlcmVudCBmcm9t
IG15IHVuZGVyc3RhbmRpbmcuIFJpZ2h0IG5vdyB3cml0ZV9wbWVtKCkgaW4NCj4gcG1lbV9kb19i
dmVjKCkgZG9lcyBub3QgdXNlIG1lbWNweV9tY3NhZmUoKS4gSWYgdGhlIGJsb2NrIGlzIGJhZCBp
dA0KPiBjbGVhcnMgcG9pc29uIGFuZCB3cml0ZXMgdG8gcG1lbSBhZ2Fpbi4gU2VlbXMgdG8gbWUg
d3JpdGluZyB0byBiYWQNCj4gYmxvY2tzIGRvZXMgbm90IGNhdXNlIE1DRS4gRG8gd2UgbmVlZCBt
ZW1jcHlfbWNzYWZlIGZvciBwbWVtIHN0b3Jlcz8NCg0KWW91IGFyZSByaWdodCwgd3JpdGVzIGRv
bid0IHVzZSBtZW1jcHlfbWNzYWZlLCBhbmQgd2lsbCBub3QgZGlyZWN0bHkNCmNhdXNlIGFuIE1D
RS4gSG93ZXZlciBhIHdyaXRlIGNhbiBjYXVzZSBhbiBhc3luY2hyb25vdXMgJ0NNQ0knIC0NCmNv
cnJlY3RlZCBtYWNoaW5lIGNoZWNrIGludGVycnVwdCwgYnV0IHRoaXMgaXMgbm90IGNyaXRpY2Fs
LCBhbmQgd29udCBiZQ0KYSBtZW1vcnkgZXJyb3IgYXMgdGhlIGNvcmUgZGlkbid0IGNvbnN1bWUg
cG9pc29uLiBtZW1jcHlfbWNzYWZlIGNhbm5vdA0KcHJvdGVjdCBhZ2FpbnN0IHRoaXMgYmVjYXVz
ZSB0aGUgd3JpdGUgaXMgJ3Bvc3RlZCcgYW5kIHRoZSBDTUNJIGlzIG5vdA0Kc3luY2hyb25vdXMu
IE5vdGUgdGhhdCB0aGlzIGlzIG9ubHkgaW4gdGhlIGxhdGVudCBlcnJvciBvciBtZW1tYXAtc3Rv
cmUNCmNhc2UuDQoNCj4gDQo+IFRoYW5rcywNCj4gQW5kaXJ5DQo+IA0KPiA+ID4gDQo+ID4gPiA+
IFsxXTogaHR0cDovL3d3dy5saW51eC5zZ2kuY29tL2FyY2hpdmVzL3hmcy8yMDE2LTA2L21zZzAw
Mjk5Lmh0bWwNCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IFRoYW5rIHlvdSBmb3IgdGhlIHBhdGNo
c2V0LiBJIHdpbGwgbG9vayBpbnRvIGl0Lg0KPiA+ID4g

WARNING: multiple messages have this Message-ID (diff)
From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "andiry@gmail.com" <andiry@gmail.com>
Cc: "darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"Vyacheslav.Dubeyko@wdc.com" <Vyacheslav.Dubeyko@wdc.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Fri, 20 Jan 2017 00:32:14 +0000	[thread overview]
Message-ID: <1484872265.4857.1.camel@intel.com> (raw)
In-Reply-To: <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w@mail.gmail.com>

On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote:
> On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma <vishal.l.verma@intel.co
> m> wrote:
> > On 01/17, Andiry Xu wrote:
> > 
> > <snip>
> > 
> > > > > 
> > > > > The pmem_do_bvec() read logic is like this:
> > > > > 
> > > > > pmem_do_bvec()
> > > > >     if (is_bad_pmem())
> > > > >         return -EIO;
> > > > >     else
> > > > >         memcpy_from_pmem();
> > > > > 
> > > > > Note memcpy_from_pmem() is calling memcpy_mcsafe(). Does this
> > > > > imply
> > > > > that even if a block is not in the badblock list, it still can
> > > > > be bad
> > > > > and causes MCE? Does the badblock list get changed during file
> > > > > system
> > > > > running? If that is the case, should the file system get a
> > > > > notification when it gets changed? If a block is good when I
> > > > > first
> > > > > read it, can I still trust it to be good for the second
> > > > > access?
> > > > 
> > > > Yes, if a block is not in the badblocks list, it can still cause
> > > > an
> > > > MCE. This is the latent error case I described above. For a
> > > > simple read()
> > > > via the pmem driver, this will get handled by memcpy_mcsafe. For
> > > > mmap,
> > > > an MCE is inevitable.
> > > > 
> > > > Yes the badblocks list may change while a filesystem is running.
> > > > The RFC
> > > > patches[1] I linked to add a notification for the filesystem
> > > > when this
> > > > happens.
> > > > 
> > > 
> > > This is really bad and it makes file system implementation much
> > > more
> > > complicated. And badblock notification does not help very much,
> > > because any block can be bad potentially, no matter it is in
> > > badblock
> > > list or not. And file system has to perform checking for every
> > > read,
> > > using memcpy_mcsafe. This is disaster for file system like NOVA,
> > > which
> > > uses pointer de-reference to access data structures on pmem. Now
> > > if I
> > > want to read a field in an inode on pmem, I have to copy it to
> > > DRAM
> > > first and make sure memcpy_mcsafe() does not report anything
> > > wrong.
> > 
> > You have a good point, and I don't know if I have an answer for
> > this..
> > Assuming a system with MCE recovery, maybe NOVA can add a mce
> > handler
> > similar to nfit_handle_mce(), and handle errors as they happen, but
> > I'm
> > being very hand-wavey here and don't know how much/how well that
> > might
> > work..
> > 
> > > 
> > > > No, if the media, for some reason, 'dvelops' a bad cell, a
> > > > second
> > > > consecutive read does have a chance of being bad. Once a
> > > > location has
> > > > been marked as bad, it will stay bad till the ACPI clear error
> > > > 'DSM' has
> > > > been called to mark it as clean.
> > > > 
> > > 
> > > I wonder what happens to write in this case? If a block is bad but
> > > not
> > > reported in badblock list. Now I write to it without reading
> > > first. Do
> > > I clear the poison with the write? Or still require a ACPI DSM?
> > 
> > With writes, my understanding is there is still a possibility that
> > an
> > internal read-modify-write can happen, and cause a MCE (this is the
> > same
> > as writing to a bad DRAM cell, which can also cause an MCE). You
> > can't
> > really use the ACPI DSM preemptively because you don't know whether
> > the
> > location was bad. The error flow will be something like write causes
> > the
> > MCE, a badblock gets added (either through the mce handler or after
> > the
> > next reboot), and the recovery path is now the same as a regular
> > badblock.
> > 
> 
> This is different from my understanding. Right now write_pmem() in
> pmem_do_bvec() does not use memcpy_mcsafe(). If the block is bad it
> clears poison and writes to pmem again. Seems to me writing to bad
> blocks does not cause MCE. Do we need memcpy_mcsafe for pmem stores?

You are right, writes don't use memcpy_mcsafe, and will not directly
cause an MCE. However a write can cause an asynchronous 'CMCI' -
corrected machine check interrupt, but this is not critical, and wont be
a memory error as the core didn't consume poison. memcpy_mcsafe cannot
protect against this because the write is 'posted' and the CMCI is not
synchronous. Note that this is only in the latent error or memmap-store
case.

> 
> Thanks,
> Andiry
> 
> > > 
> > > > [1]: http://www.linux.sgi.com/archives/xfs/2016-06/msg00299.html
> > > > 
> > > 
> > > Thank you for the patchset. I will look into it.
> > > 

  parent reply	other threads:[~2017-01-20  0:32 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <at1mp6pou4lenesjdgh22k4p.1484345585589@email.android.com>
     [not found] ` <b9rbflutjt10mb4ofherta8j.1484345610771@email.android.com>
2017-01-14  0:00   ` [LSF/MM TOPIC] Badblocks checking/representation in filesystems Slava Dubeyko
2017-01-14  0:00     ` Slava Dubeyko
2017-01-14  0:00     ` Slava Dubeyko
2017-01-14  0:49     ` Vishal Verma
2017-01-14  0:49       ` Vishal Verma
2017-01-16  2:27       ` Slava Dubeyko
2017-01-16  2:27         ` Slava Dubeyko
2017-01-16  2:27         ` Slava Dubeyko
2017-01-17 14:37         ` [Lsf-pc] " Jan Kara
2017-01-17 14:37           ` Jan Kara
2017-01-17 15:08           ` Christoph Hellwig
2017-01-17 15:08             ` Christoph Hellwig
2017-01-17 22:14           ` Vishal Verma
2017-01-17 22:14             ` Vishal Verma
2017-01-18 10:16             ` Jan Kara
2017-01-18 10:16               ` Jan Kara
2017-01-18 20:39               ` Jeff Moyer
2017-01-18 20:39                 ` Jeff Moyer
2017-01-18 21:02                 ` Darrick J. Wong
2017-01-18 21:02                   ` Darrick J. Wong
2017-01-18 21:32                   ` Dan Williams
2017-01-18 21:32                     ` Dan Williams
     [not found]                     ` <CAPcyv4hd7bpCa7d9msX0Y8gLz7WsqXT3VExQwwLuAcsmMxVTPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-18 21:56                       ` Verma, Vishal L
2017-01-18 21:56                         ` Verma, Vishal L
2017-01-18 21:56                         ` Verma, Vishal L
     [not found]                         ` <1484776549.4358.33.camel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-01-19  8:10                           ` Jan Kara
2017-01-19  8:10                             ` Jan Kara
     [not found]                             ` <20170119081011.GA2565-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-01-19 18:59                               ` Vishal Verma
2017-01-19 18:59                                 ` Vishal Verma
     [not found]                                 ` <20170119185910.GF4880-PxNA6LsHknajYZd8rzuJLNh3ngVCH38I@public.gmane.org>
2017-01-19 19:03                                   ` Dan Williams
2017-01-19 19:03                                     ` Dan Williams
     [not found]                                     ` <CAPcyv4jZz_iqLutd0gPEL3udqbFxvBH8CZY5oDgUjG5dGbC2gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-20  9:03                                       ` Jan Kara
2017-01-20  9:03                                         ` Jan Kara
2017-01-17 23:15           ` Slava Dubeyko
2017-01-17 23:15             ` Slava Dubeyko
2017-01-17 23:15             ` Slava Dubeyko
2017-01-18 20:47             ` Jeff Moyer
2017-01-18 20:47               ` Jeff Moyer
2017-01-19  2:56               ` Slava Dubeyko
2017-01-19  2:56                 ` Slava Dubeyko
2017-01-19  2:56                 ` Slava Dubeyko
2017-01-19 19:33                 ` Jeff Moyer
2017-01-19 19:33                   ` Jeff Moyer
2017-01-17  6:33       ` Darrick J. Wong
2017-01-17  6:33         ` Darrick J. Wong
2017-01-17 21:35         ` Vishal Verma
2017-01-17 21:35           ` Vishal Verma
2017-01-17 22:15           ` Andiry Xu
2017-01-17 22:15             ` Andiry Xu
2017-01-17 22:37             ` Vishal Verma
2017-01-17 22:37               ` Vishal Verma
2017-01-17 23:20               ` Andiry Xu
2017-01-17 23:20                 ` Andiry Xu
2017-01-17 23:51                 ` Vishal Verma
2017-01-17 23:51                   ` Vishal Verma
2017-01-18  1:58                   ` Andiry Xu
2017-01-18  1:58                     ` Andiry Xu
     [not found]                     ` <CAOvWMLZCt39EDg-1uppVVUeRG40JvOo9sKLY2XMuynZdnc0W9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-20  0:32                       ` Verma, Vishal L [this message]
2017-01-20  0:32                         ` Verma, Vishal L
2017-01-20  0:32                         ` Verma, Vishal L
2017-01-18  9:38               ` [Lsf-pc] " Jan Kara
2017-01-18  9:38                 ` Jan Kara
2017-01-19 21:17                 ` Vishal Verma
2017-01-19 21:17                   ` Vishal Verma
2017-01-20  9:47                   ` Jan Kara
2017-01-20  9:47                     ` Jan Kara
2017-01-20 15:42                     ` Dan Williams
2017-01-20 15:42                       ` Dan Williams
2017-01-24  7:46                       ` Jan Kara
2017-01-24  7:46                         ` Jan Kara
2017-01-24 19:59                         ` Vishal Verma
2017-01-24 19:59                           ` Vishal Verma
2017-01-18  0:16             ` Andreas Dilger
2017-01-18  2:01               ` Andiry Xu
2017-01-18  2:01                 ` Andiry Xu
2017-01-18  3:08                 ` Lu Zhang
2017-01-18  3:08                   ` Lu Zhang
2017-01-20  0:46                   ` Vishal Verma
2017-01-20  0:46                     ` Vishal Verma
2017-01-20  9:24                     ` Yasunori Goto
2017-01-20  9:24                       ` Yasunori Goto
     [not found]                       ` <20170120182435.0E12.E1E9C6FF-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2017-01-21  0:23                         ` Kani, Toshimitsu
2017-01-21  0:23                           ` Kani, Toshimitsu
2017-01-21  0:23                           ` Kani, Toshimitsu
2017-01-20  0:55                 ` Verma, Vishal L
2017-01-20  0:55                   ` Verma, Vishal L
2017-01-13 21:40 Verma, Vishal L
2017-01-13 21:40 ` Verma, Vishal L
2017-01-13 21:40 ` Verma, Vishal L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1484872265.4857.1.camel@intel.com \
    --to=vishal.l.verma-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=Vyacheslav.Dubeyko-Sjgp3cTcYWE@public.gmane.org \
    --cc=andiry-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org \
    --cc=lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.