From: Gionatan Danti <g.danti@assyoma.it>
To: Wols Lists <antlists@youngman.org.uk>
Cc: Roger Heflin <rogerheflin@gmail.com>,
	Reindl Harald <h.reindl@thelounge.net>,
	Roman Mamedov <rm@romanrm.net>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Filesystem corruption on RAID1
Date: Fri, 18 Aug 2017 14:26:15 +0200
Message-ID: <784bec391a00b9e074744f31901df636@assyoma.it>
In-Reply-To: <59961DD7.3060208@youngman.org.uk>

On 18-08-2017 00:51, Wols Lists wrote:
> Except that that is not what should be happening. I don't know my hard
> drive details, but I believe drives have an instruction "async write
> this data and let me know when you have done so".
> 
> This should NOT return "yes I've flushed it TO cache". Which is how you
> get your problem - the level above thinks it's been safely flushed to
> disk (because the disk has said "yes I've got it"), but it then gets
> lost because of your power fluctuation. It should only acknowledge it
> *after* it's been flushed *from* cache.
> 
> And this is apparently exactly what cheap drives do ...
> 
> If the level above says "tell me when it's safely on disk", and the
> drive truly does as its told, your problem won't happen because the
> disk block layer will time out waiting for the acknowledgement and
> retry the write.

On SATA drives, persistence on the physical medium is generally 
guaranteed by issuing *two* separate FLUSH_CACHE commands, which do 
*not* form an atomic operation. In other words, it's not a problem of 
"cheap drives" or "lying hardware"; rather, it seems to be a specific 
SATA limitation.

This means the problem cannot be solved by simply "buying better 
disks". The traditional flushing/barrier infrastructure simply has *no* 
method to ensure an atomic commit at the hardware level. If something 
goes wrong between the two flushes, there is a (small) possibility of 
corrupted writes without any I/O error being reported to the upper 
layer, even for sync() writes. It basically behaves like a failing DRAM 
cache, but with *no* real failure...
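
To make the window between the two flushes concrete, here is a minimal
sketch of a two-step commit as seen from userspace. It assumes a typical
Linux filesystem where each fsync() ends with a cache flush request to
the drive; the function names, the file layout and the COMMIT marker are
hypothetical and only serve the illustration.

```python
import os
import tempfile


def write_all(fd, data):
    """Write all of data to fd, retrying on partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def two_phase_commit(path, payload, commit_marker=b"COMMIT\n"):
    """Append payload, flush, then append a commit marker and flush again."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        write_all(fd, payload)        # step 1: the data itself
        os.fsync(fd)                  # first flush request
        # A failure at this point leaves the payload without its marker:
        # the two flushes are separate operations, not one atomic commit.
        write_all(fd, commit_marker)  # step 2: the commit record
        os.fsync(fd)                  # second flush request
    finally:
        os.close(fd)


def is_committed(path, commit_marker=b"COMMIT\n"):
    """Return True if the file ends with the commit marker."""
    with open(path, "rb") as f:
        return f.read().endswith(commit_marker)
```

A reader of such a file can only trust records followed by the marker.
Nothing in this sequence lets the application ask the drive to make both
steps durable as a single unit.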

Newer drives should implement FUA (Force Unit Access), but I don't know 
if libata already uses it by default. Anyway, the disk's firmware is 
free to split a single FUA write into multiple internal operations, so I 
am not sure it solves all problems.
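
One way to check the libata side is to read its module parameter. The
following is a minimal sketch, assuming the parameter is exposed at
/sys/module/libata/parameters/fua and reads as "1" or "Y" when enabled;
both the path and the accepted values may depend on the kernel version,
and the function name is hypothetical.

```python
import os


def libata_fua_enabled(param_path="/sys/module/libata/parameters/fua"):
    """Return True/False for the libata FUA setting, or None if unreadable."""
    try:
        with open(param_path, "r") as f:
            value = f.read().strip()
    except OSError:
        # libata not loaded, or the parameter is not exposed on this kernel
        return None
    # Assumption: integer parameters read as "1", boolean ones as "Y"
    return value in ("1", "Y", "y")
```

Calling libata_fua_enabled() with no argument reads the assumed default
path. A True result only says that libata is allowed to use FUA, not how
the drive's firmware handles the command internally.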

I really found the linux-scsi discussion interesting. Give it a look...

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
