From: Gionatan Danti <g.danti@assyoma.it>
To: linux-raid@vger.kernel.org
Cc: g.danti@assyoma.it
Subject: Filesystem corruption on RAID1
Date: Thu, 13 Jul 2017 17:35:12 +0200
Message-ID: <c2fe6593-c806-ab9f-fcff-8327c013237b@assyoma.it>

Hi list,
today I had an unexpected filesystem corruption on a RAID1 machine used 
for backup purposes. I would like to reconstruct what happened and why, 
so I am asking for your help.

System specs:
- OS CentOS 7.2 x86_64 with kernel 3.10.0-514.6.1.el7.x86_64
- 2x SEAGATE ST4000VN000-1H4168 (4 TB 5900rpm disks)
- 4 GB DDR3 RAM
- Intel(R) Pentium(R) CPU G3260 @ 3.30GHz

Today, I found the machine crashed with an XFS warning about corrupted 
metadata. The warning stated that in-core (in-memory) data corruption 
had been detected so, suspecting a DRAM-related problem (there is no ECC 
memory on this small box), I simply rebooted the machine. To no avail: 
the same problem happened immediately, preventing the machine from 
booting (the root filesystem did not mount).

After repairing the filesystem (which showed significant signs of 
corruption, partly due to clearing the XFS journal), I looked at dmesg 
and found something interesting: a RAID resync had been *automatically* 
performed, as happens when a previously detached disk is re-attached.

I started investigating /var/log/messages and found plenty of these 
errors, spanning many days:

...
Jul 10 03:24:01 nas kernel: ata1.00: failed command: READ FPDMA QUEUED
Jul 10 14:50:54 nas kernel: ata1.00: failed command: FLUSH CACHE EXT
Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
...
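
To see how often and which commands failed, such lines could be tallied 
per ATA device and command. The following is a minimal, hypothetical 
sketch (not part of the original setup), assuming the syslog line format 
shown in the excerpt above; the names FAILED_CMD and 
tally_failed_commands are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern for libata lines in the syslog format shown above, e.g.
# "Jul 10 03:24:01 nas kernel: ata1.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+?)\s*$")


def tally_failed_commands(lines):
    """Count 'failed command' messages per (ATA device, command) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the excerpt above; on a real system one would
    # pass an open file instead, e.g. open("/var/log/messages").
    sample = [
        "Jul 10 03:24:01 nas kernel: ata1.00: failed command: READ FPDMA QUEUED",
        "Jul 10 14:50:54 nas kernel: ata1.00: failed command: FLUSH CACHE EXT",
        "Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    ]
    for (device, command), n in sorted(tally_failed_commands(sample).items()):
        print(f"{device} {command}: {n}")
```

Note that the log names the ATA port (ata1.00), not the block device; 
that ata1 corresponds to sda is an assumption worth checking separately, 
for example by following the /sys/block/sda symlink, whose target path 
normally includes the ataN port.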

It seems to me that one disk (the first one, sda) had problems executing 
some SATA commands and became out of sync with the second one (sdb). 
However, it was not kicked out of the array: both /var/log/messages *and* 
my custom monitoring script (which keeps an eye on /proc/mdstat) reported 
nothing. Moreover, both the SMART attributes and the SMART error log 
show *no* errors at all.
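
The monitoring script itself is not shown. As a hypothetical illustration 
of what a /proc/mdstat check typically looks at, here is a minimal sketch 
that assumes the usual mdstat layout (a "[2/2] [UU]" member-status field 
and a progress line containing "resync =" or "recovery ="); the function 
names array_health and degraded are made up.

```python
import re

# Hypothetical monitoring helper. Assumes the usual /proc/mdstat layout:
#   md0 : active raid1 sdb1[1] sda1[0]
#         3906887488 blocks super 1.2 [2/2] [UU]
#         [==>..................]  resync = 12.6% (...)
ARRAY_LINE = re.compile(r"^(md\w+) : ")
MEMBER_STATUS = re.compile(r"\[\d+/\d+\] \[([U_]+)\]")
SYNC_LINE = re.compile(r"\b(resync|recovery|check|reshape)\s*=")


def array_health(mdstat_text):
    """Map each md array to (member_status, sync_line_present)."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            current = match.group(1)
            health[current] = ["", False]
            continue
        if current is None:
            continue  # header lines before the first array
        status = MEMBER_STATUS.search(line)
        if status:
            health[current][0] = status.group(1)  # "_" marks a missing member
        if SYNC_LINE.search(line):
            health[current][1] = True  # running, delayed or pending sync
    return {name: tuple(info) for name, info in health.items()}


def degraded(health):
    """Arrays where md itself reports a missing or failed member."""
    return sorted(name for name, (members, _) in health.items() if "_" in members)


if __name__ == "__main__":
    sample = (
        "Personalities : [raid1]\n"
        "md0 : active raid1 sdb1[1] sda1[0]\n"
        "      3906887488 blocks super 1.2 [2/2] [UU]\n"
        "\n"
        "unused devices: <none>\n"
    )
    health = array_health(sample)
    print(health, degraded(health))
```

A check of this kind only sees what md reports: a member that md has not 
marked as failed still shows as "U", so libata errors that did not lead 
md to fail the device would not appear here.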

Question 1: is it possible to have such a situation, where a failed 
command *silently* puts the array in an out-of-sync state?

At a certain point, the machine crashed. I noticed and rebooted it.

Question 2: is it possible that the first disk went offline just before 
the crash and that, on reboot, mdadm re-added it to the array?

Question 3: if so, is it possible that the corruption occurred because 
the first disk was the one being read by the md array and, by extension, 
by the filesystem?

Any thoughts will be greatly appreciated.
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c2fe6593-c806-ab9f-fcff-8327c013237b@assyoma.it \
    --to=g.danti@assyoma.it \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.