On Wed, 1 Oct 2014 20:00:45 +0400 Andrey Kuzmin <andrey.v.kuzmin@gmail.com>
wrote:

> On Wed, Oct 1, 2014 at 6:56 AM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 24 Sep 2014 13:02:28 +0200 Heinz Mauelshagen <heinzm@redhat.com>
> > wrote:
> >
> >>
> >> Martin,
> >>
> >> thanks for the good explanation of the state of the discard union.
> >> Do you have an ETA for the 'zeroout, deallocate' ... support you mentioned?
> >>
> >> I was planning to have a followup patch for dm-raid supporting a dm-raid
> >> table
> >> line argument to prohibit discard passdown.
> >>
> >> In lieu of the fuzzy field situation wrt SSD fw and discard_zeroes_data
> >> support
> >> related to RAID4/5/6, we need that in upstream together with the initial
> >> patch.
> >>
> >> That 'no_discard_passdown' table line can be added to dm-raid RAID4/5/6
> >> table
> >> lines to avoid possible data corruption but can be avoided on RAID1/10
> >> table lines,
> >> because the latter are not suffering from any  discard_zeroes_data flaw.
> >>
> >>
> >> Neil,
> >>
> >> are you going to disable discards in RAID4/5/6 shortly
> >> or rather go with your bitmap solution?
> >
> > Can I just close my eyes and hope it goes away?
> >
> > The idea of a bitmap of uninitialised areas is not a short-term solution.
> > But I'm not really keen on simply disabling discard for RAID4/5/6 either. It
> > would  mean that people with good sensible hardware wouldn't be able to use
> > it properly.
> >
> > I would really rather that discard_zeroes_data were only set on devices where
> > it was actually true.  Then it wouldn't be my problem any more.
> >
> > Maybe I could do a loud warning
> >   "Not enabling DISCARD on RAID5 because we cannot trust committees.
> >    Set "md_mod.willing_to_risk_discard=Y" if your devices reads discarded
> >    sectors as zeros"
> >
> > and add an appropriate module parameter......
> >
> > While we are on the topic, maybe I should write down my thoughts about the
> > bitmap thing in case someone wants to contribute.
> >
> >   There are 3 states that a 'region' can be in:
> >     1- known to be in-sync
> >     2- possibly not in sync, but it should  be
> >     3- probably not in sync, contains no valuable data.
> >
> >  A read from '3' should return zeroes.
> >  A write to '3' should change the region to be '2'.  It could either
> >     write zeros before allowing the write to start, or it could just start
> >     a normal resync.
> >
> >  Here is a question:  if a region has been discarded, are we guaranteed that
> >  reads are at least stable.  i.e. if I read twice will I definitely get the
> >  same value?
> 
> Not sure with other specs, but an NVMe-compliant SSD that supports
> discard (Dataset Management command with Deallocate attribute, in NVMe
> parlance) is, per spec, required to be deterministic when deallocated
> range is subsequently read. That's what the spec (1.1) says:
> 
> The value read from a deallocated LBA shall be deterministic;
> specifically, the value returned by subsequent reads of that LBA shall
> be the same until a write occurs to that LBA. The values read from a
> deallocated LBA and its metadata (excluding protection information)
> shall be all zeros, all ones, or the last data written to the
> associated LBA and its metadata. The values read from an unwritten or
> deallocated LBA’s protection information field shall be all ones
> (indicating the protection information shall not be checked).
> 

That's good to know - thanks.

NeilBrown