From: Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Viacheslav Dubeyko <slava@dubeyko.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Subject: RE: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Date: Thu, 19 Jan 2017 02:56:39 +0000	[thread overview]
Message-ID: <SN2PR04MB21916B138434803EA9AF4C18887E0@SN2PR04MB2191.namprd04.prod.outlook.com> (raw)
In-Reply-To: <x49mveo6qom.fsf@segfault.boston.devel.redhat.com>



>>> Well, the situation with NVM is more like with DRAM AFAIU. It is 
>>> quite reliable but given the size the probability *some* cell has degraded is quite high.
>>> And similar to DRAM you'll get MCE (Machine Check Exception) when you 
>>> try to read such cell. As Vishal wrote, the hardware does some 
>>> background scrubbing and relocates stuff early if needed but nothing is 100%.
>>
>> My understanding is that the hardware remaps the affected address
>> range (64 bytes, for example) but doesn't move/migrate the stored
>> data out of that range. That sounds slightly weird, because it means
>> there is no guarantee the stored data can be retrieved. It suggests
>> that the file system should be aware of this and has to be heavily
>> protected by some replication or erasure coding scheme. Otherwise,
>> if the hardware does everything for us (remaps the affected address
>> region and moves the data into a new region), then why does the file
>> system need to know about the affected address regions?
>
> The data is lost; that's why you're getting an ECC error.  It's tantamount to -EIO for a disk block access.

I see three possible cases here (sketched in code below):
(1) a bad block has been discovered (no remap, no recovery) -> the data is lost; -EIO for a disk block access, and the block is permanently bad;
(2) a bad block has been discovered and remapped -> the data is lost; -EIO for a disk block access;
(3) a bad block has been discovered, remapped, and recovered -> no data is lost.
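
To restate these cases in code, here is a rough sketch (the enum and
function names are purely illustrative, not an existing kernel API):

#include <errno.h>

/* Hypothetical classification of a discovered bad block. */
enum bad_block_state {
	BB_BAD_UNMAPPED,	/* case 1: no remap, no recovery; block stays bad */
	BB_BAD_REMAPPED,	/* case 2: remapped, but the old contents are gone */
	BB_RECOVERED,		/* case 3: remapped and contents restored */
};

/* Expected result of reading the logical block in each state. */
static int expected_read_result(enum bad_block_state state)
{
	switch (state) {
	case BB_BAD_UNMAPPED:
	case BB_BAD_REMAPPED:
		return -EIO;	/* the data is lost in both cases */
	case BB_RECOVERED:
		return 0;	/* the read succeeds; nothing is lost */
	}
	return -EINVAL;
}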

>> Let's imagine that the affected address range equals 64 bytes.
>> It seems to me that, in the case of a block device, this would
>> affect the whole logical block (4 KB).
>
> 512 bytes, and yes, that's the granularity at which we track errors in the block layer, so that's the minimum amount of data you lose.

I think it depends on what granularity the hardware supports. It could be 512 bytes, 4 KB, or maybe larger.
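
For reference, the block layer's per-sector error tracking is visible
from user space: a pmem block device exports its known bad ranges
through sysfs as "offset length" pairs counted in 512-byte sectors.
A minimal sketch that reads them (the device name pmem0 is just an
example):

#include <stdio.h>

int main(void)
{
	/* Bad ranges are listed as "sector count" pairs, where count
	 * is in 512-byte sectors. */
	FILE *f = fopen("/sys/block/pmem0/badblocks", "r");
	unsigned long long sector;
	unsigned int count;

	if (!f) {
		perror("open badblocks");
		return 1;
	}
	while (fscanf(f, "%llu %u", &sector, &count) == 2)
		printf("bad range: sector %llu, %u sectors (%u bytes)\n",
		       sector, count, count * 512);
	fclose(f);
	return 0;
}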

>> The situation is more critical for the DAX approach. Correct me if
>> I'm wrong, but my understanding is that the goal of DAX is to provide
>> direct access to a file's memory pages with minimal file system
>> overhead. So it looks like raising a bad block issue at the file
>> system level will affect user-space applications, because, in the
>> end, a user-space application will have to handle such trouble (the
>> bad block issue) itself. That strikes me as a really weird situation.
>> What can protect a user-space application from encountering a
>> partially corrupted memory page?
>
> Applications need to deal with -EIO today.  This is the same sort of thing.
> If an application trips over a bad block during a load from persistent memory,
> they will get a signal, and they can either handle it or not.
>
> Have a read through this specification and see if it clears anything up for you:
>  http://www.snia.org/tech_activities/standards/curr_standards/npm

Thank you for sharing this. So, if a user-space application follows the
NVM Programming Model, it will be able to survive by catching and
processing the resulting exceptions. But such applications have yet to
be implemented, and they will need special recovery techniques. It
sounds like legacy user-space applications will be unable to survive a
load/store failure in the NVM.PM.FILE mode.
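
To make "catching and processing the exceptions" concrete, here is a
minimal sketch of my own (not taken from the SNIA spec): on Linux, a
load from a poisoned page of a DAX mapping is delivered to the process
as SIGBUS, which an NVM.PM.FILE-aware application can intercept and
turn into a recovery path. The file path and the recovery policy below
are assumptions for illustration.

#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	/* info->si_addr identifies the poisoned address/page. */
	(void)sig; (void)info; (void)ctx;
	siglongjmp(recover_point, 1);
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);

	int fd = open("/mnt/pmem/data", O_RDONLY);  /* hypothetical DAX file */
	if (fd < 0) { perror("open"); return 1; }
	char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }

	if (sigsetjmp(recover_point, 1) == 0) {
		char c = p[0];	/* a load from a bad block raises SIGBUS */
		printf("load ok: %d\n", c);
	} else {
		/* Recovery path: e.g. restore the block from a replica. */
		fprintf(stderr, "bad block hit; recovering from replica\n");
	}
	munmap(p, 4096);
	close(fd);
	return 0;
}

A legacy application installs no such handler, so the default SIGBUS
action simply kills it -- which is exactly the "unable to survive"
case above.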

Thanks,
Vyacheslav Dubeyko.
