From: Roger Heflin <rogerheflin@gmail.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: Wols Lists <antlists@youngman.org.uk>,
	Reindl Harald <h.reindl@thelounge.net>,
	Roman Mamedov <rm@romanrm.net>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Filesystem corruption on RAID1
Date: Fri, 18 Aug 2017 07:54:34 -0500
Message-ID: <CAAMCDefNRMuTwyXn_=3v_EWHwkjy3mhod1dLw3RQpjU=9VHNJQ@mail.gmail.com>
In-Reply-To: <784bec391a00b9e074744f31901df636@assyoma.it>

I have noticed that all of the hardware RAID controllers explicitly turn
off the disk's write cache, which would eliminate this issue, but at the
cost of much slower writes.
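
For anyone who wants to check what their own disks are doing, here is a
minimal Python sketch, not a definitive tool. It assumes a Linux kernel
recent enough to expose the queue/write_cache attribute in sysfs; the
function names are made up for illustration. Note that this attribute shows
the kernel's view of the cache ("write back" or "write through"), which may
not match what a RAID controller has set on the drive behind it.

```python
from pathlib import Path


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the kernel's view of a block device's cache mode.

    Reads <sysfs_root>/<device>/queue/write_cache, which holds either
    "write back" (volatile cache, flushes are needed) or "write through".
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    return attr.read_text().strip()


def cache_is_volatile(mode: str) -> bool:
    # "write back" means acknowledged writes may still sit in a volatile cache.
    return mode == "write back"


if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():
        for dev in sorted(p.name for p in root.iterdir()):
            try:
                mode = write_cache_mode(dev)
            except OSError:
                # Attribute missing or unreadable for this device; skip it.
                continue
            flag = "volatile" if cache_is_volatile(mode) else "not volatile"
            print(f"{dev}: {mode} ({flag})")
```

Run as a script, it prints one line per block device whose attribute is
readable.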

It also makes hardware RAID controllers (and disk arrays) uselessly
slow when their battery backup dies, since that disables the write
cache on the RAID card and/or the array.

Remember: safe, fast and cheap, you only get to pick two. We generally
pick fast and cheap. The disk arrays and RAID controllers pick safe and
fast, but not cheap, since a hardware RAID controller with some sort of
write cache backup is quite expensive.

On Fri, Aug 18, 2017 at 7:26 AM, Gionatan Danti <g.danti@assyoma.it> wrote:
> On 18-08-2017 00:51, Wols Lists wrote:
>>
>> Except that that is not what should be happening. I don't know my hard
>> drive details, but I believe drives have an instruction "async write
>> this data and let me know when you have done so".
>>
>> This should NOT return "yes I've flushed it TO cache". Which is how you
>> get your problem - the level above thinks it's been safely flushed to
>> disk (because the disk has said "yes I've got it"), but it then gets
>> lost because of your power fluctuation. It should only acknowledge it
>> *after* it's been flushed *from* cache.
>>
>> And this is apparently exactly what cheap drives do ...
>>
>> If the level above says "tell me when it's safely on disk", and the
>> drive truly does as it's told, your problem won't happen because the disk
>> block layer will time out waiting for the acknowledgement and retry the
>> write.
>
>
> SATA drives generally guarantee persistent storage on physical medium by
> issuing *two* different FLUSH_CACHE commands, which do *not* form an atomic
> operation. In other words, it's not a problem of "cheap drives" or "lying
> hardware", rather, it seems a specific SATA limitation.
>
> This means the problem cannot be solved by simply "buying better disks".
> Traditional flushing/barrier infrastructure simply has *no* method to ensure
> an atomic commit at the hardware level, and if something goes wrong between
> the two flushes, a (small) possibility exists of corrupted writes
> without I/O errors being reported to the upper layer, even in the case of
> sync() writes. It's basically like a failing DRAM cache, but with *no* real
> failures...
>
> Newer drivers should implement FUAs, but I don't know if libata already uses
> them by default. Anyway, the disk's firmware is free to split a single FUA
> into several internal operations, so I am not sure they solve all problems.
>
> I really found the linux-scsi discussion interesting. Give it a look...
>
>
> Regards.
>
> --
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
