From: Gionatan Danti <g.danti@assyoma.it>
To: Roman Mamedov <rm@romanrm.net>
Cc: linux-raid@vger.kernel.org, g.danti@assyoma.it
Subject: Re: Filesystem corruption on RAID1
Date: Thu, 13 Jul 2017 23:28:16 +0200
Message-ID: <592f19bf608e9a959f9445f7f25c5dad@assyoma.it>
In-Reply-To: <20170713214856.4a5c8778@natsu>

On 13-07-2017 18:48 Roman Mamedov wrote:
> 
> Failed reads are not as bad, as they are just retried.
> 

I agree; I reported them only to give a broad picture of the system
state :)

>> Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
> 
> But these WILL cause incorrect data written to disk, in my experience. After
> that, one of your disks will contain some corruption, whether in files, or (as
> you discovered) in the filesystem itself.

This is the "scary" part: if the write was not acknowledged as committed
to disk, why did the block layer not report it to the MD driver? Or, if
the block layer did report it, why did MD not kick the disk out of the
array?
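
To judge how often a given port is affected, the failed commands in the
kernel log can be tallied per ATA device. The following is only a minimal
Python sketch: it assumes the log lines look like the one quoted above, and
the function name count_failed_commands is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel log lines of the form quoted in this thread, e.g.:
#   Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
# Assumption: other syslog layouts may need a different pattern.
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+?)\s*$")


def count_failed_commands(lines):
    """Return a Counter keyed by (ata_device, command) for failed commands."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # The only sample line is the one reported in the original message.
    sample = [
        "Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    ]
    for (device, command), n in sorted(count_failed_commands(sample).items()):
        print(f"{device}: {n} x {command}")
```

Feeding it the full log (for example an open file object) before and after
swapping cables would show whether the failures stay on the same ataN port
or move with the cable or drive.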

> mdadm may or may not read from that disk, as it chooses the mirror for reads
> pretty much randomly, using the least loaded one. And even though the other
> disk still contains good data, there is no mechanism for the user-space to say
> "hey, this doesn't look right, what's on the other mirror?"

I understand and agree with that. I'm fully aware that MD cannot (by
design) detect or correct corrupted data. However, I still wonder why a
disk with obvious errors was not kicked out of the array.
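
Whether MD still counts every member as active can be read from the member
map in /proc/mdstat, where "U" marks an active member and "_" a missing or
failed one. The following is a rough Python sketch under that assumption;
the helper names member_status and degraded are hypothetical, and the parser
only handles the common "[n/m] [UU]" status line.

```python
import re

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_LINE = re.compile(r"^(md\S+) : ")
# Status fragment, e.g. "[2/2] [UU]" or "[2/1] [_U]"
STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def member_status(mdstat_text):
    """Map each array name to its member map string, e.g. {'md0': 'UU'}."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current is not None:
            result[current] = status.group(3)
            current = None
    return result


def degraded(mdstat_text):
    """Return the sorted names of arrays with at least one inactive member."""
    status = member_status(mdstat_text)
    return sorted(name for name, members in status.items() if "_" in members)


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not a Linux host, or the md module is not loaded
    for name, members in sorted(member_status(text).items()):
        state = "DEGRADED" if "_" in members else "all members active"
        print(f"{name}: [{members}] {state}")
```

In the situation described here the map would still read [UU], which is
exactly the problem: the disk keeps logging failed writes while MD treats
it as healthy.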

> 
> Check your cables and/or disks themselves.
> 

I tried reseating and swapping the cables ;)
Let's see if the problem disappears or if it "follows" the
cable/drive/interface...

> If you know that only one disk had these write errors all the time, you could
> try disconnecting it from mirror, and checking if you can get a more
> consistent view of the filesystem on the remaining one.
> 
> P.S: about my case (which I witnessed on a RAID6):
> 
>   * copy a file to the array, one disk will hit tons of WRITE FPDMA QUEUED
>     errors (due to insufficient power and/or bad data cable).
>   * the file that was just copied, turns out to be corrupted when reading back.
>   * the problem disk WILL NOT get kicked from the array during this.

Wow, a die-hard data corruption. It seems VERY similar to what happened
to me, and the key problem seems to be the same: a failing drive was not
detached from the array in a timely fashion.
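
The copy-then-read-back test described above can be approximated by
comparing checksums of the source file and of the copy on the array. This
is a minimal Python sketch, not an established procedure; the function names
are made up for illustration. One caveat applies: a read-back served from
the page cache never touches the disk, so the comparison only means
something once the cached data is gone (for example after a remount), and
on RAID1 the read may still come from the healthy mirror.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, copy_path):
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(source_path) == sha256_of(copy_path)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        ok = copy_is_intact(sys.argv[1], sys.argv[2])
        print("copy matches source" if ok else "copy DIFFERS from source")
    else:
        print("usage: script.py SOURCE_FILE COPY_ON_ARRAY")
```

A mismatch while the array still reports all members as active would
reproduce the behaviour reported in both cases.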

Thanks very much for reporting, Roman.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
