From: Gionatan Danti <g.danti@assyoma.it>
To: Roman Mamedov <rm@romanrm.net>
Cc: linux-raid@vger.kernel.org, g.danti@assyoma.it
Subject: Re: Filesystem corruption on RAID1
Date: Thu, 13 Jul 2017 23:28:16 +0200
Message-ID: <592f19bf608e9a959f9445f7f25c5dad@assyoma.it>
In-Reply-To: <20170713214856.4a5c8778@natsu>
On 13-07-2017 18:48, Roman Mamedov wrote:
>
> Failed reads are not as bad, as they are just retried.
>
I agree; I reported them only to give a broad picture of the system
state :)
>> Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
>
> But these WILL cause incorrect data written to disk, in my experience.
> After that, one of your disks will contain some corruption, whether in
> files, or (as you discovered) in the filesystem itself.
This is the "scary" part: if the write was not acknowledged as committed
to disk, why did the block layer not report it to the MD driver? Or, if
the block layer did report it, why did MD not kick the disk out of the
array?
> mdadm may or may not read from that disk, as it chooses the mirror for
> reads pretty much randomly, using the least loaded one. And even though
> the other disk still contains good data, there is no mechanism for the
> user-space to say "hey, this doesn't look right, what's on the other
> mirror?"
I understand and agree with that. I'm fully aware that MD cannot (by
design) detect or correct corrupted data. However, I wonder whether, and
why, a disk with obvious errors was not kicked out of the array.
>
> Check your cables and/or disks themselves.
>
I tried reseating and swapping the cables ;)
Let's see whether the problem disappears or whether it "follows" the
cable/drive/interface...
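To tell whether the errors follow the drive or the port after swapping
cables, the "ataN" name printed in the kernel log has to be mapped back to
a /dev/sdX device. A minimal Python sketch, assuming the usual sysfs
layout where /sys/block/<dev> resolves to a path containing the libata
port directory (ata_port_map is a hypothetical helper name):

```python
import os
import re
from typing import Dict

# A path component such as "ata1" marks the libata port a disk hangs off.
ATA_COMPONENT = re.compile(r"^ata\d+$")


def ata_port_map(sys_block: str = "/sys/block") -> Dict[str, str]:
    """Map block device names (e.g. 'sda') to their libata port (e.g. 'ata1').

    Devices whose resolved sysfs path has no 'ataN' component
    (md, loop, NVMe, ...) are skipped.
    """
    mapping: Dict[str, str] = {}
    for dev in sorted(os.listdir(sys_block)):
        resolved = os.path.realpath(os.path.join(sys_block, dev))
        ports = [p for p in resolved.split(os.sep) if ATA_COMPONENT.match(p)]
        if ports:
            mapping[dev] = ports[-1]
    return mapping
```

Recording this map (plus the drive serial numbers) before and after the
swap shows whether new errors stay with the same sdX or move to another.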
> If you know that only one disk had these write errors all the time, you
> could try disconnecting it from mirror, and checking if you can get a
> more consistent view of the filesystem on the remaining one.
>
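To confirm that only one disk had the failed writes, a saved kernel log
(e.g. from dmesg or journalctl -k) can be tallied per libata device. A
minimal Python sketch, assuming one message per line as in the excerpt
quoted above (count_failed_writes is a hypothetical helper name):

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "kernel: ata1.00: failed command: WRITE FPDMA QUEUED";
# the "ataN.DD" token identifies the libata port (N) and device (DD).
FAILED_WRITE = re.compile(r"\b(ata\d+\.\d+): failed command: WRITE FPDMA QUEUED")


def count_failed_writes(lines: Iterable[str]) -> Counter:
    """Count failed NCQ write commands per libata device."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_WRITE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: `count_failed_writes(open("kern.log")).most_common()`. Failed
READ FPDMA QUEUED lines are deliberately not counted, since those are
retried.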
> P.S: about my case (which I witnessed on a RAID6):
>
> * copy a file to the array, one disk will hit tons of WRITE FPDMA QUEUED
>   errors (due to insufficient power and/or bad data cable).
> * the file that was just copied, turns out to be corrupted when reading
>   back.
> * the problem disk WILL NOT get kicked from the array during this.
Wow, a die-hard data corruption. It seems VERY similar to what happened
to me, and the key problem seems to be the same: a failing drive was not
detached from the array in a timely fashion.
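Until the cause is found, a copy can at least be verified by hashing
source and destination. A minimal Python sketch (the helper names are
hypothetical); note that right after the copy the destination may still be
served from the page cache, so a meaningful check needs the cache dropped
(echo 3 > /proc/sys/vm/drop_caches, as root) or a reboot first, and on
RAID1 a clean pass only covers whichever mirror served the reads:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(src: str, dst: str) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of(src) == sha256_of(dst)
```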
Thanks very much for reporting, Roman.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8