Massive RAID-1 desync

From: cau2jeaf1honoq @ 2015-04-24 20:47 UTC
  To: linux-raid

Something is happening here. I don't know what, but I'm having
fun trying to guess.

The root file system (ext3) is on a 4 x 30 GB RAID-1 array. A
couple of hours after boot, the kernel detected something wrong in
the file system and remounted it read-only.

Comparing the component partitions finds many differences with a
very uneven distribution:

- sda1 and sdb1 are identical except for 4 bytes in the last
  70 kB.

- sdd1 is identical to sda1 and sdb1 except for about 67,000
  differences in the last 70 kB.

- sdc1 is grossly out of sync, with about 300 million differences
  from the others, all of them in the first 450 MB or so.
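
For anyone who wants to reproduce this kind of comparison, here is
a minimal Python sketch. It is not the exact tool used for the
figures above. The function name diff_histogram and the 1 MiB
bucket and chunk sizes are assumptions chosen for illustration. It
counts differing bytes between two devices (or image files) and
groups them by offset, which shows where the differences cluster.
Only the overlapping length of the two inputs is compared.

```python
import sys
from collections import Counter

CHUNK = 1 << 20   # bytes read per iteration (assumed value)
BUCKET = 1 << 20  # width of one histogram bucket in bytes (assumed value)


def diff_histogram(path_a, path_b, bucket=BUCKET, chunk=CHUNK):
    """Count differing bytes between two files, grouped by offset bucket.

    Returns a Counter mapping bucket index -> number of differing bytes.
    Bytes past the end of the shorter input are not counted.
    """
    hist = Counter()
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:
                break
            if a != b:
                # Slow path: only walk the bytes when the chunks differ.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        hist[(offset + i) // bucket] += 1
            offset += max(len(a), len(b))
    return hist


if __name__ == "__main__":
    # Example: python3 diffhist.py /dev/sda1 /dev/sdc1
    result = diff_histogram(sys.argv[1], sys.argv[2])
    for index in sorted(result):
        print(f"bucket {index:8d}: {result[index]} differing bytes")
```

The result is only meaningful if nothing writes to the array while
the comparison runs, for example when the components are read from
a rescue system.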

I'm not sure what to make of this. The knee-jerk thought would
be "/dev/sdc1 is the odd man out so sdc must be faulty". But
that disk participates in other arrays without problems, I don't
see anything obviously bad in its SMART data and the kernel
messages just before the remount were actually about sda.

To be honest, I don't have a clear idea of how things got where
they are. Since writing to a RAID-1 array writes the same data
to all devices, how can there be so many differences?
