* RAID-6 mdadm disks out of sync issue (long e-mail)

From: linux-raid.vger.kernel.org @ 2009-06-10  8:52 UTC
  To: linux-raid


Hello Linux-RAID mailing list.

Any help from those with more knowledge than myself would
be greatly appreciated.

I apologise if this e-mail is overly long or if this isn't
the right place to post it.  I feel very brain-dead right
now, as I am quite worried about losing the data and have
been poking away at it for the past 14 hours today and 5
hours last night.

I use Linux software RAID (mdadm) to manage two disk
arrays, a RAID-6 data array with 8x1TB disks (large
partitions on each disk), and a RAID-5 swap array with
the same 8 disks (small partitions at the end of each
disk).  On top of the RAID arrays are a layer of Linux
Device-Mapper encryption, which I don't think is important
to this e-mail, but adding it just in case.
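
(For clarity, the stack is assembled roughly like this --
the map name and mount point are placeholders, not my
exact command history:

   # mdadm --create /dev/md13 --level=6 --raid-devices=8 \
           --chunk=64 /dev/sd[a-h]1
   # mdadm --create /dev/md9 --level=5 --raid-devices=8 \
           /dev/sd[a-h]2
   # cryptsetup create data_crypt /dev/md13
   # mount /dev/mapper/data_crypt /mnt/data

so the layers are: disks -> md -> dm-crypt -> filesystem.)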

I am currently using 64-bit Ubuntu.  Before this problem
happened, I had not rebooted the computer for 4.5 months,
and was using Ubuntu 8.10 with Linux kernel 2.6.17.
I upgraded this to Ubuntu 9.04 while the system was up,
and had not yet rebooted into the newly installed system
(with kernel 2.6.28).  On June 3rd one of the eight disks
disconnected.  I was too busy with work to deal with it,
and didn't think there would be any problem waiting a few
days to get to it.

On the morning of June 7th another disk disconnected, which
I first noticed when I got home from work late last night
(an issue in my mdadm.conf had been preventing me from
receiving mdadm notification e-mails; that has since been
resolved).
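
(For reference, the fix amounted to making sure mdadm.conf
had a valid mail address and testing that alerts actually
go out -- the address below is a placeholder:

   # grep MAILADDR /etc/mdadm/mdadm.conf
   MAILADDR me@example.com
   # mdadm --monitor --scan --oneshot --test

The --test flag generates a test alert for each array,
which is how I confirmed the mail gets through.)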

(You can safely skip to the end of the e-mail if you want,
where I give a current status summary of the array.)

I am not sure what caused the disconnects: either a kernel
issue or loose cables (the more likely culprit, as I moved
the computer a couple of feet the day before the first
disk disappeared).

Main devices:

   /dev/sdi1 is an old 160GB IDE disk with my "/"
   partition, where my distro lives.

   /dev/md13 is the RAID-6 data array, the important one,
   comprised of /dev/sda1 through /dev/sdh1.

   /dev/md9 is the RAID-5 swap array, which my friend and
   I have been playing with today, so it should be ignored,
   comprised of /dev/sda2 through /dev/sdh2.

   /dev/md0 was apparently created as a result of the
   Ubuntu upgrade, as it wasn't there before I rebooted
   last night.  It doesn't show up in /proc/mdstat.

At that point I was substantially worried, with only
6 of 8 disks working.  So, I went to single-user mode
(telinit 1) at 1:20 AM on June 9th.  In single-user mode
I tried unmounting the filesystem on the RAID-6 array,
and was eventually able to do so once I unmounted some
stuff that was mounted inside it.

After unmounting the filesystem, the mdadm still reported
that it was in the state "clean, degraded" with 6 of 8
disks working.  I used "cryptsetup remove" to remove the
hard drive encryption layer, and so the RAID-6 array was
(I thought) cleanly taken care of and safe to shut down
the computer.  I couldn't see how any more changes could
happen to the RAID-6 array, as nothing was using the
disks anymore.
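
In hindsight, a complete manual teardown would have looked
roughly like this (the map name is a placeholder, and the
last step is the one I left to the shutdown scripts):

   # umount /mnt/data
   # cryptsetup remove data_crypt
   # mdadm --stop /dev/md13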

After this I did "swapoff -a" and the 180MB of swap went
away without error.  I didn't think about it at the time,
but I don't know how the swapoff worked -- it was a RAID-5
array with 2 failed disks, so it shouldn't have been
usable.  I didn't care about the swap, so I didn't look
at it too closely then.

A little before 2 AM, perhaps fifteen minutes after
turning the swap off, I did a "shutdown -h now" and Ubuntu
proceeded to do its shutdown process.  At this point I
saw some errors from either the RAID array (mdadm) or
hard disk(s) flash by very briefly before it rebooted --
I think it mentioned I/O problems, but it was gone too
quickly to take note of it.

After the shutdown, I rearranged the drives slightly (4 of
the 8 disks were close together and running hot to the
touch, so I moved one of that group a few inches away; the
other four disks were not close together and were only
slightly warm).  I snugged up all of the power and data
cables, and powered the system up around 2:30 AM on
June 9th.

The BIOS detected that four SATA disks were connected to
the motherboard, and the 32-bit PCI SATA controller card
detected the remaining 4 SATA disks.  All seemed well,
and I booted the upgraded Ubuntu 9.04 with kernel 2.6.28,
which resides on a separate IDE hard disk.

When it booted up, the RAID-6 was not active.  I tried to
make it automatically detect and start up, and it informed
me that it couldn't activate with only 3 of 8 disks.
This was rather surprising, as mdadm had still reported
6 of 8 disks working after I unmounted the filesystem,
roughly 30 minutes before the shutdown.
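
For what it's worth, the commands we used to inspect the
disks without writing anything (mdadm --examine only reads
the superblocks) were along these lines:

   # cat /proc/mdstat
   # mdadm --examine /dev/sd?1 | grep -E 'this|Events|Update Time'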

Since that time I have been trying, with the help of
a friend, all sorts of non-destructive things to try
to figure out more about what is wrong.  I am extremely
hesitant to try anything with the array that could cause
the data to become corrupted.  If I knew of any Linux
software RAID experts in my area, I would be very happy
to pay them to come look at the system, but I don't know
any and have found nothing searching online (Vancouver,
BC, Canada).

One possibility is that Ubuntu updated something
controlling the RAID (such as /etc/init.d/mdadm), and the
array wasn't shut down properly when I powered off.  I have
no idea if this is the case, but I've had similar problems
updating software on Ubuntu, where handling of the running
app breaks because newer support files have been installed
which can't communicate with the older app.

# /var/log/messages content from errors related to the
  RAID-6 array from BEFORE rebooting last night:

   Jun  6 18:16:42 gqq kernel: ata7: EH complete
   Jun  6 18:16:45 gqq kernel: ata7.00: configured for UDMA/100
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Sense Key : Medium Error [current] [descriptor]
   Jun  6 18:16:45 gqq kernel: Descriptor sense data with sense descriptors (in hex):
   Jun  6 18:16:45 gqq kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
   Jun  6 18:16:45 gqq kernel:         73 77 61 9e 
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Add. Sense: Unrecovered read error - auto reallocate failed
   Jun  6 18:16:45 gqq kernel: ata7: EH complete
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Write Protect is off
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Write Protect is off
   Jun  6 18:16:45 gqq kernel: sd 6:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203488 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203496 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203504 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203512 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203520 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203528 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203536 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203544 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203552 on sde1)
   Jun  6 18:16:47 gqq kernel: raid5:md13: read error corrected (8 sectors at 1937203560 on sde1)

   Jun  7 05:34:05 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:05 gqq kernel: ata3: EH complete
   Jun  7 05:34:05 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:05 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:05 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:06 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:06 gqq kernel: ata3: EH complete
   Jun  7 05:34:06 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:06 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:06 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:08 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:08 gqq kernel: ata3: EH complete
   Jun  7 05:34:08 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:08 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:08 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:09 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:09 gqq kernel: ata3: EH complete
   Jun  7 05:34:09 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:09 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:09 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:11 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:11 gqq kernel: ata3: EH complete
   Jun  7 05:34:11 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:11 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:11 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:12 gqq kernel: ata3.00: configured for UDMA/133
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
   Jun  7 05:34:12 gqq kernel: Descriptor sense data with sense descriptors (in hex):
   Jun  7 05:34:12 gqq kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
   Jun  7 05:34:12 gqq kernel:         27 eb 8b 8c 
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
   Jun  7 05:34:12 gqq kernel: __ratelimit: 2 callbacks suppressed
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748040 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748048 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748056 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748064 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748072 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748080 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748088 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748096 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748104 on sdb1).
   Jun  7 05:34:12 gqq kernel: raid5:md13: read error not correctable (sector 669748112 on sdb1).
   Jun  7 05:34:12 gqq kernel: ata3: EH complete
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  7 05:34:12 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   Jun  7 05:34:12 gqq kernel: md: md13: data-check done.
   Jun  7 05:34:12 gqq kernel: md: data-check of RAID array md9
   Jun  7 05:34:12 gqq kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
   Jun  7 05:34:12 gqq kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
   Jun  7 05:34:12 gqq kernel: md: using 128k window, over a total of 1269056 blocks.
   Jun  7 05:34:12 gqq kernel: md: md9: data-check done.
   Jun  7 05:34:12 gqq kernel: RAID5 conf printout:
   Jun  7 05:34:12 gqq kernel:  --- rd:8 wd:6
   Jun  7 05:34:12 gqq kernel:  disk 0, o:0, dev:sdb1
   Jun  7 05:34:12 gqq kernel:  disk 1, o:1, dev:sdf1
   Jun  7 05:34:12 gqq kernel:  disk 2, o:1, dev:sde1
   Jun  7 05:34:12 gqq kernel:  disk 3, o:1, dev:sda1
   Jun  7 05:34:12 gqq kernel:  disk 5, o:1, dev:sdh1
   Jun  7 05:34:12 gqq kernel:  disk 6, o:1, dev:sdc1
   Jun  7 05:34:12 gqq kernel:  disk 7, o:1, dev:sdg1
   Jun  7 05:34:12 gqq kernel: RAID5 conf printout:
   Jun  7 05:34:12 gqq kernel:  --- rd:8 wd:6
   Jun  7 05:34:12 gqq kernel:  disk 1, o:1, dev:sdf1
   Jun  7 05:34:12 gqq kernel:  disk 2, o:1, dev:sde1
   Jun  7 05:34:12 gqq kernel:  disk 3, o:1, dev:sda1
   Jun  7 05:34:12 gqq kernel:  disk 5, o:1, dev:sdh1
   Jun  7 05:34:12 gqq kernel:  disk 6, o:1, dev:sdc1
   Jun  7 05:34:12 gqq kernel:  disk 7, o:1, dev:sdg1

# /var/log/messages content from errors related to the
  RAID-6 array from AFTER rebooting last night (Note:
  a couple of the disk devices changed at this point,
  as I moved a disk and swapped cables):

   Jun  9 02:35:11 gqq kernel: md: md13 still in use.
   Jun  9 02:35:16 gqq kernel: md: md13 stopped.
   Jun  9 02:35:16 gqq kernel: md: unbind<sdf1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdf1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sdg1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdg1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sde1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sde1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sdd1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdd1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sdc1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdc1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sdb1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdb1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sda1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sda1)
   Jun  9 02:35:16 gqq kernel: md: unbind<sdh1>
   Jun  9 02:35:16 gqq kernel: md: export_rdev(sdh1)
   Jun  9 02:35:16 gqq kernel: md: bind<sdb1>
   Jun  9 02:35:16 gqq kernel: md: bind<sda1>
   Jun  9 02:35:16 gqq kernel: md: bind<sdf1>
   Jun  9 02:35:16 gqq kernel: md: bind<sdd1>
   Jun  9 02:35:16 gqq kernel: md: bind<sdh1>
   Jun  9 02:35:16 gqq kernel: md: bind<sdc1>
   Jun  9 02:35:16 gqq kernel: md: bind<sdg1>
   Jun  9 02:35:16 gqq kernel: md: bind<sde1>

# I then went to sleep and continued today at 11 AM

# This was when we tried using the auto-detection of
  the array

   Jun  9 12:30:55 gqq kernel: md: Autodetecting RAID arrays.
   Jun  9 12:30:55 gqq kernel: md: Scanned 0 and added 0 devices.
   Jun  9 12:30:55 gqq kernel: md: autorun ...
   Jun  9 12:30:55 gqq kernel: md: ... autorun DONE.
   Jun  9 12:31:01 gqq kernel: md: Autodetecting RAID arrays.
   Jun  9 12:31:01 gqq kernel: md: Scanned 0 and added 0 devices.
   Jun  9 12:31:01 gqq kernel: md: autorun ...
   Jun  9 12:31:01 gqq kernel: md: ... autorun DONE.

# I don't remember what we were doing when these happened,
  but it happened several times and we didn't know what
  it meant

   Jun  9 13:02:40 gqq kernel: md: md13 stopped.
   Jun  9 13:02:40 gqq kernel: md: unbind<sde1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sde1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdg1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdg1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdc1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdc1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdh1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdh1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdd1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdd1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdf1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdf1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sda1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sda1)
   Jun  9 13:02:40 gqq kernel: md: unbind<sdb1>
   Jun  9 13:02:40 gqq kernel: md: export_rdev(sdb1)
   Jun  9 13:02:40 gqq kernel: md: bind<sdb1>
   Jun  9 13:02:40 gqq kernel: md: bind<sda1>
   Jun  9 13:02:40 gqq kernel: md: bind<sdf1>
   Jun  9 13:02:40 gqq kernel: md: bind<sdd1>
   Jun  9 13:02:40 gqq kernel: md: bind<sdh1>
   Jun  9 13:02:40 gqq kernel: md: bind<sdc1>
   Jun  9 13:02:40 gqq kernel: md: bind<sdg1>
   Jun  9 13:02:40 gqq kernel: md: bind<sde1>

   Repeat at Jun  9 13:02:51

   Repeat at Jun  9 13:03:10

   Repeat at Jun  9 13:03:13

   Repeat at Jun  9 13:41:08

   Jun  9 14:00:30 gqq kernel: md: md13 stopped.
   Jun  9 14:00:30 gqq kernel: md: unbind<sde1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sde1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdg1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdg1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdc1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdc1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdh1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdh1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdd1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdd1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdf1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdf1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sda1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sda1)
   Jun  9 14:00:30 gqq kernel: md: unbind<sdb1>
   Jun  9 14:00:30 gqq kernel: md: export_rdev(sdb1)
   Jun  9 14:00:30 gqq kernel: md: bind<sda1>
   Jun  9 14:00:30 gqq kernel: md: bind<sdf1>
   Jun  9 14:00:30 gqq kernel: md: bind<sdh1>
   Jun  9 14:00:30 gqq kernel: md: md_import_device returned -16
   Jun  9 14:00:30 gqq kernel: md: bind<sdg1>
   Jun  9 14:00:30 gqq kernel: md: md_import_device returned -16
   Jun  9 14:00:30 gqq kernel: md: bind<sde1>
   Jun  9 14:00:30 gqq kernel: md: bind<sdc1>

# Not sure if these are related, but a bunch of these
  messages appear throughout the day, including in the
  middle of some disk errors

   Jun  9 16:42:02 gqq kernel: __ratelimit: 16 callbacks suppressed
   Jun  9 16:42:17 gqq kernel: __ratelimit: 13 callbacks suppressed
   Jun  9 18:58:09 gqq kernel: __ratelimit: 36 callbacks suppressed

# When we tested the disks, either through playing with a
  recreated /dev/md9 or using cat /dev/sd?1 > /dev/null,
  two of the disks (/dev/sdb and /dev/sdh) had a lot of
  errors, and the others have remained error-free

   Jun  9 18:58:08 gqq kernel: ata3: EH complete
   Jun  9 18:58:09 gqq kernel: ata3.00: configured for UDMA/133
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
   Jun  9 18:58:09 gqq kernel: Descriptor sense data with sense descriptors (in hex):
   Jun  9 18:58:09 gqq kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
   Jun  9 18:58:09 gqq kernel:         74 70 55 63
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
   Jun  9 18:58:09 gqq kernel: __ratelimit: 36 callbacks suppressed
   Jun  9 18:58:09 gqq kernel: ata3: EH complete
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] Write Protect is off
   Jun  9 18:58:09 gqq kernel: sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

   Jun  9 18:58:27 gqq kernel: ata10: EH complete
   Jun  9 18:58:29 gqq kernel: ata10.00: configured for UDMA/100
   Jun  9 18:58:29 gqq kernel: sd 9:0:0:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
   Jun  9 18:58:29 gqq kernel: sd 9:0:0:0: [sdh] Sense Key : Medium Error [current] [descriptor]
   Jun  9 18:58:29 gqq kernel: Descriptor sense data with sense descriptors (in hex):
   Jun  9 18:58:29 gqq kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
   Jun  9 18:58:29 gqq kernel:         74 70 55 d9 
   Jun  9 18:58:29 gqq kernel: sd 9:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed
   Jun  9 18:58:29 gqq kernel: ata10: EH complete
   Jun  9 18:58:29 gqq kernel: sd 9:0:0:0: [sdh] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
   Jun  9 18:58:31 gqq kernel: ata10.00: configured for UDMA/100
   Jun  9 18:58:31 gqq kernel: ata10: EH complete

   etc.

Here's all the information I can think to gather about
the system, if I missed anything just let me know:

# cat /etc/lsb-release 

   DISTRIB_ID=Ubuntu
   DISTRIB_RELEASE=9.04
   DISTRIB_CODENAME=jaunty
   DISTRIB_DESCRIPTION="Ubuntu 9.04"

# uname -a

   Linux gqq 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux

# lspci | grep -i sata

   00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2)
   01:08.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

# fdisk -l

   Disk /dev/sda: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sda1               1      121443   975490866   83  Linux
   /dev/sda2          121444      121601     1261102   83  Linux

   Disk /dev/sdb: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdb1               1      121443   975490866   83  Linux
   /dev/sdb2          121444      121601     1261102   83  Linux

   Disk /dev/sdc: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdc1               1      121443   975490866   83  Linux
   /dev/sdc2          121444      121601     1261102   83  Linux

   Disk /dev/sdd: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdd1               1      121443   975490866   83  Linux
   /dev/sdd2          121444      121601     1261102   83  Linux

   Disk /dev/sde: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sde1               1      121443   975490866   83  Linux
   /dev/sde2          121444      121601     1261102   83  Linux

   Disk /dev/sdf: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdf1               1      121443   975490866   83  Linux
   /dev/sdf2          121444      121601     1261102   83  Linux

   Disk /dev/sdg: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdg1               1      121443   975490866   83  Linux
   /dev/sdg2          121444      121601     1261102   83  Linux

   Disk /dev/sdh: 1000 GB, 1000202273280 bytes
   255 heads, 63 sectors/track, 121601 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdh1               1      121443   975490866   83  Linux
   /dev/sdh2          121444      121601     1261102   83  Linux

   Disk /dev/sdi: 163 GB, 163921605120 bytes
   255 heads, 63 sectors/track, 19929 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System 
   /dev/sdi1   *           1       19929   160079661   83  Linux

   Error: /dev/md13: unrecognised disk label
   Error: /dev/md9: unrecognised disk label
   Error: /dev/md0: unrecognised disk label

# ls -l /dev/disk/by-id/scsi-SATA_* | sed 's/.*scsi-SATA_\([^ ]*\) .. ......\(.*\)/\2 = \1/; /part/d' | sort

   sda = ST31000340AS_9QJ1PKKS
   sdb = SAMSUNG_HD103UJS13PJDWQ204841
   sdc = ST31000340AS_9QJ0V24S
   sdd = ST31000340AS_9QJ0TTHZ
   sde = ST31000340AS_9QJ0M5J4
   sdf = ST31000340AS_9QJ0V1F5
   sdg = Hitachi_HDS7210_GTA0L0PAJGGZHF
   sdh = SAMSUNG_HD103UJS13PJDWQ204844
   sdi = Maxtor_6Y160P0_Y44ENMKE

# cat /proc/mdstat

   Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
   md9 : inactive sdd2[8](S) sdf2[3](S) sdg2[7](S) sde2[2](S) sdc2[0](S) sda2[4](S) sdh2[6](S) sdb2[5](S)
         10152448 blocks

   md13 : inactive sdd1[4](S) sdb1[0](S) sdc1[6](S) sde1[2](S) sdg1[7](S) sdh1[5](S) sdf1[3](S) sda1[1](S)
         7803926016 blocks

   unused devices: <none>

# cat /sys/module/md_mod/parameters/start_ro 

   1

# for disk in /dev/sd{a,b,c,d,e,f,g,h}1; do printf "$disk"; mdadm --examine "$disk" | tac | \grep -E '(Up|Ev)' | tr -d \\n; echo; done | sort --key=4

   /dev/sdd1         Events : 1107965    Update Time : Wed Jun  3 03:16:51 2009
   /dev/sdb1         Events : 1847298    Update Time : Sun Jun  7 05:34:03 2009
   /dev/sda1         Events : 2186232    Update Time : Tue Jun  9 01:36:59 2009
   /dev/sdf1         Events : 2186232    Update Time : Tue Jun  9 01:36:59 2009
   /dev/sdg1         Events : 2186232    Update Time : Tue Jun  9 01:36:59 2009
   /dev/sdc1         Events : 2186236    Update Time : Tue Jun  9 02:02:37 2009
   /dev/sde1         Events : 2186236    Update Time : Tue Jun  9 02:02:37 2009
   /dev/sdh1         Events : 2186236    Update Time : Tue Jun  9 02:02:37 2009

# for disk in /dev/sd{a,b,c,d,e,f,g,h}1; do mdadm --examine "$disk"; echo; done

   /dev/sda1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 01:36:59 2009
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 1
     Spare Devices : 0
          Checksum : b57902ef - correct
            Events : 2186232

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     1       8       81        1      active sync   /dev/sdf1

      0     0       0        0        0      removed
      1     1       8       81        1      active sync   /dev/sdf1
      2     2       8       65        2      active sync   /dev/sde1
      3     3       8        1        3      active sync   /dev/sda1
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       8       97        7      active sync   /dev/sdg1

   /dev/sdb1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Sun Jun  7 05:34:03 2009
             State : clean
    Active Devices : 7
   Working Devices : 7
    Failed Devices : 1
     Spare Devices : 0
          Checksum : b56c3f3e - correct
            Events : 1847298

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     0       8       17        0      active sync   /dev/sdb1

      0     0       8       17        0      active sync   /dev/sdb1
      1     1       8       81        1      active sync   /dev/sdf1
      2     2       8       65        2      active sync   /dev/sde1
      3     3       8        1        3      active sync   /dev/sda1
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       8       97        7      active sync   /dev/sdg1

   /dev/sdc1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 02:02:37 2009
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 4
     Spare Devices : 0
          Checksum : b579091e - correct
            Events : 2186236

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     6       8       33        6      active sync   /dev/sdc1

      0     0       0        0        0      removed
      1     1       0        0        1      faulty removed
      2     2       8       65        2      active sync   /dev/sde1
      3     3       0        0        3      faulty removed
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       0        0        7      faulty removed

   /dev/sdd1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Wed Jun  3 03:16:51 2009
             State : active
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0
          Checksum : b53f6123 - correct
            Events : 1107965

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     4       8       49        4      active sync   /dev/sdd1

      0     0       8       17        0      active sync   /dev/sdb1
      1     1       8       81        1      active sync   /dev/sdf1
      2     2       8       65        2      active sync   /dev/sde1
      3     3       8        1        3      active sync   /dev/sda1
      4     4       8       49        4      active sync   /dev/sdd1
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       8       97        7      active sync   /dev/sdg1

   /dev/sde1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 02:02:37 2009
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 4
     Spare Devices : 0
          Checksum : b5790936 - correct
            Events : 2186236

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     2       8       65        2      active sync   /dev/sde1

      0     0       0        0        0      removed
      1     1       0        0        1      faulty removed
      2     2       8       65        2      active sync   /dev/sde1
      3     3       0        0        3      faulty removed
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       0        0        7      faulty removed

   /dev/sdf1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 01:36:59 2009
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 1
     Spare Devices : 0
          Checksum : b57902a3 - correct
            Events : 2186232

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     3       8        1        3      active sync   /dev/sda1

      0     0       0        0        0      removed
      1     1       8       81        1      active sync   /dev/sdf1
      2     2       8       65        2      active sync   /dev/sde1
      3     3       8        1        3      active sync   /dev/sda1
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       8       97        7      active sync   /dev/sdg1

   /dev/sdg1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 01:36:59 2009
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 1
     Spare Devices : 0
          Checksum : b579030b - correct
            Events : 2186232

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     7       8       97        7      active sync   /dev/sdg1

      0     0       0        0        0      removed
      1     1       8       81        1      active sync   /dev/sdf1
      2     2       8       65        2      active sync   /dev/sde1
      3     3       8        1        3      active sync   /dev/sda1
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       8       97        7      active sync   /dev/sdg1

   /dev/sdh1:
             Magic : a92b4efc
           Version : 00.90.00
              UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
     Creation Time : Sun Aug  3 10:21:28 2008
        Raid Level : raid6
     Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
        Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
      Raid Devices : 8
     Total Devices : 8
   Preferred Minor : 13

       Update Time : Tue Jun  9 02:02:37 2009
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 4
     Spare Devices : 0
          Checksum : b579096c - correct
            Events : 2186236

        Chunk Size : 64K

         Number   Major   Minor   RaidDevice State
   this     5       8      113        5      active sync   /dev/sdh1

      0     0       0        0        0      removed
      1     1       0        0        1      faulty removed
      2     2       8       65        2      active sync   /dev/sde1
      3     3       0        0        3      faulty removed
      4     4       0        0        4      faulty removed
      5     5       8      113        5      active sync   /dev/sdh1
      6     6       8       33        6      active sync   /dev/sdc1
      7     7       0        0        7      faulty removed

============================================================

At this point the important parts seem to be:

  a) Two disks are far behind the other six in event count;
     these are the ones that failed during the past week.

  b) Two disks are currently producing errors when I try
     to read from them, but they are not the same pair as
     in (a): one of them is the same, the other is not.

  c) Of the six remaining disks, three are 4 events behind
     the other three.  I don't think there should have been
     any writing to the disks at all, as the filesystem
     wasn't even mounted.  The extra 4 events seem to have
     happened during the system shutdown process.

  d) One of the six disks which are nearly up-to-date with
     each other is producing I/O errors when being read
     from, which I must fix.  I think I can accomplish this
     by shutting down the system, removing the two disks
     which failed days ago, and moving the one problem disk
     to a new SATA controller and power cable.

  e) I am very worried to even shut down to try this, as
     shutting down is what messed things up last time.
     I don't want to do anything that could increase the
     chances of losing the terabytes of data, much of which
     is not backed up elsewhere.

Any information on how to assess what state the disks
are in would be greatly appreciated.  Before today I had
never even looked at the Event numbers, or most of the
other diagnostics and options I have now learned about.

I have set /sys/module/md_mod/parameters/start_ro to 1,
as I read that this will keep md from making changes when
it brings the array back up.  Any other tips?
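
For the record, setting and confirming it is just:

   # echo 1 > /sys/module/md_mod/parameters/start_ro
   # cat /sys/module/md_mod/parameters/start_ro
   1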

Again, apologies for the severely long e-mail, and if
anyone actually looks through it -- thank you kindly for
your time.  I have tried to at least put things into clear
sections so it can be skipped over fairly easily.

I *really* don't want to lose this data.  I wish I knew
more about recovering from mdadm issues; I guess I am
getting practice at it now.

Sigh.

 - S.A.

* Re: RAID-6 mdadm disks out of sync issue (long e-mail)

From: linux-raid.vger.kernel.org @ 2009-06-10  8:58 UTC
  To: linux-raid


Hello again:

I just noticed that the mailing list removed all the spacing
in my last e-mail, which makes it a lot messier to read.  If
you want to read it formatted the way I wrote it, then you
can view it here:

http://pastie.org/506934.txt

 - S.A.

* Re: RAID-6 mdadm disks out of sync issue (long e-mail)

From: NeilBrown @ 2009-06-10 10:55 UTC
  To: linux-raid.vger.kernel.org; +Cc: linux-raid

On Wed, June 10, 2009 6:52 pm, linux-raid.vger.kernel.org@atu.cjb.net wrote:
>
> Hello Linux-RAID mailing list.
>
> Any help from those with more knowledge than myself would
> be greatly appreciated.

I strongly suspect that you can get all your data back.

The arrays are not currently active (/proc/mdstat shows "inactive")
so nothing is going to write to them.  You can reboot without any
concern on that point.

Your priority has to be to sort out the read errors on those drives.
Change cables or controllers or whatever you have to, and reboot
as often as you like: just get the drives into a state where
you can reliably read from them.  Don't perform any further mdadm
commands until you have achieved that.

Once you are sure you have the 6 drives with the highest event counts
working, assemble them with
   mdadm --assemble /dev/md13 --force /dev/sd?1

and you will almost certainly have a working (though degraded) array.
Then, if you are confident that the other drives are working, add
them with
   mdadm /dev/md13 --add /dev/sdwhatever
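
Spelled out as a sequence with sanity checks around it
(device names are illustrative -- substitute your six
good drives):

   # mdadm --examine /dev/sd[a-f]1 | grep Events
     (the six counts should all be the highest ones)
   # mdadm --assemble /dev/md13 --force /dev/sd[a-f]1
   # cat /proc/mdstat
   # mdadm --detail /dev/md13
     (expect "clean, degraded" with 6 of 8 active)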

Good luck.

NeilBrown

* Re: RAID-6 mdadm disks out of sync issue (five questions)

From: linux-raid.vger.kernel.org @ 2009-06-11 18:43 UTC
  To: linux-raid

NeilBrown <neilb@suse.de> wrote:
> Once you are sure you have the 6 drives with the highest
> event counts working, assemble them with
>
> mdadm --assemble /dev/md13 --force /dev/sd?1

I had a few questions before I went ahead with the
reassembly:

1) Does it matter which order the disks are listed in when
reassembling the array (e.g. /dev/sda1 /dev/sdh1 ...)?

2) Is there any risk to the data stored on the disks by
merely reassembling the six working disks with the above
command?

3) Does /sys/module/md_mod/parameters/start_ro being
set to 1 prevent the array from syncing/rebuilding/etc.,
or does it only prevent new user data being written to
the array?  If it only prevents user data being written
to the /dev/md*, is there some way to also prevent mdadm
from doing syncing/rebuilding/etc. so I can be sure the
data is not at risk of further damage while testing?

4) Having checked what "Events" refers to (I previously
thought it counted write-syncing operations), should I be
worried about the Event count being above 1,000,000?  I
have rebuilt two failed disks, and the distro performed a
few data integrity checks on all the disks.  The array is
about nine to ten months old.

5) Any idea why "shutdown -h now" would cause three of
the six working disks to gain 4 events each (happened with
the filesystem unmounted from /dev/md13)?

 - S.A.

* Re: RAID-6 mdadm disks out of sync issue (five questions)

From: Michael Tokarev @ 2009-06-11 23:33 UTC
  To: linux-raid.vger.kernel.org; +Cc: linux-raid

linux-raid.vger.kernel.org@atu.cjb.net wrote:
> NeilBrown <neilb@suse.de> wrote :
>> Once you are sure you have the 6 drives with the highest
>> event counts working, assemble them with
>>
>> mdadm --assemble /dev/md13 --force /dev/sd?1
> 
> I had a few questions before I went ahead with the
> reassembly:
> 
> 1) Does it matter which order the disks are listed in when
> reassembling the array (e.g. /dev/sda1 /dev/sdh1 ...)?

No, the order does not matter.  The superblock of each
device records the device number, so mdadm will figure
it all out automatically.

On the other hand, if you want to RECREATE the array
(with mdadm --create), order DOES matter - it's pretty
much essential to use the same order as the original array.
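
As a sketch only -- do NOT run this now -- the --create
form would look something like the following, where each
position N must be whatever device --examine reports in
slot N, "missing" marks absent slots, and --assume-clean
prevents an initial resync (the devN names are
placeholders):

   # mdadm --create /dev/md13 --level=6 --raid-devices=8 \
           --chunk=64 --assume-clean \
           missing dev1 dev2 dev3 missing dev5 dev6 dev7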

> 2) Is there any risk to the data stored on the disks by
> merely reassembling the six working disks with the above
> command?

If your original set (RAID-6) was 8 disks, nothing will
happen to the data.  I mean, mdadm/the kernel will not
start any sort of reconstruction because there are no
drives to resync data to.  The data will not be
changed.  Superblocks will be updated (event counts)
but that's not data.

> 3) Does /sys/module/md_mod/parameters/start_ro being
> set to 1 prevent the array from syncing/rebuilding/etc.,
> or does it only prevent new user data being written to
> the array?  If it only prevents user data being written
> to the /dev/md*, is there some way to also prevent mdadm
> from doing syncing/rebuilding/etc. so I can be sure the
> data is not at risk of further damage while testing?

See above.  I'm really not sure about start_ro vs
rebuilding - will check ;)
> 
> 4) Having checked what the "Events" refers to (I thought it
> was write-syncing operations before), should I be worried
> at the Event count being above 1,000,000?  I have rebuilt
> two failed disks and the distro performed a few data
> integrity checks on all the disks.  The array is about
> nine to ten months old.

Well, 1,000,000 is a bit too high for that time span.
Mine has 28 - a half-year-old RAID array.  But I don't
reboot the machine often; it has been rebooted about 10
times in that time.  Events are things like array assembly
and disassembly, a drive failing, a drive being added,
and the like.

> 5) Any idea why "shutdown -h now" would cause three of
> the six working disks to gain 4 events each (happened with
> the filesystem unmounted from /dev/md13)?

It shouldn't be that high really.  I think.  *Especially*
on only some of the disks.

/mjt


* Re: RAID-6 mdadm disks out of sync issue (five questions)

From: Neil Brown @ 2009-06-12  1:26 UTC
  To: Michael Tokarev; +Cc: linux-raid.vger.kernel.org, linux-raid

On Friday June 12, mjt@tls.msk.ru wrote:
> linux-raid.vger.kernel.org@atu.cjb.net wrote:
> > NeilBrown <neilb@suse.de> wrote :
> >> Once you are sure you have the 6 drives with the highest
> >> event counts working, assemble them with
> >>
> >> mdadm --assemble /dev/md13 --force /dev/sd?1
> > 
> > I had a few questions before I went ahead with the
> > reassembly:
> > 
> > 1) Does it matter which order the disks are listed in when
> > reassembling the array (e.g. /dev/sda1 /dev/sdh1 ...)?
> 
> No, the order does not matter.  In superblock of each
> device there's the device number so mdadm will figure
> it all out automatically.
> 
> On the other hand, if you want to RECREATE the array
> (with mdadm --create), order DOES matter - it's pretty
> much essential to get the same order as original array.
> 
> > 2) Is there any risk to the data stored on the disks by
> > merely reassembling the six working disks with the above
> > command?
> 
> If you original set (raid6) was 8 disks, there's nothing
> to do with the data.  I mean, mdadm/kernel will not
> start any sort of reconstruction because there's no
> drives to resync data to.  The data will not be
> changed.  Superblocks will be updated (event counts)
> but that's not data.
> 
> > 3) Does /sys/module/md_mod/parameters/start_ro being
> > set to 1 prevent the array from syncing/rebuilding/etc.,
> > or does it only prevent new user data being written to
> > the array?  If it only prevents user data being written
> > to the /dev/md*, is there some way to also prevent mdadm
> > from doing syncing/rebuilding/etc. so I can be sure the
> > data is not at risk of further damage while testing?
> 
> See above.  I really am not sure for start_ro vs
> rebuilding - will check ;)

If an array is started read-only, then no resync/rebuild etc. will
happen until the first write, or until the array is explicitly set
to read-write.
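
You can also pin that state by hand before experimenting:

   # mdadm --readonly /dev/md13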


> > 
> > 4) Having checked what the "Events" refers to (I thought it
> > was write-syncing operations before), should I be worried
> > at the Event count being above 1,000,000?  I have rebuilt
> > two failed disks and the distro performed a few data
> > integrity checks on all the disks.  The array is about
> > nine to ten months old.
> 
> Well, 1.000.000 is a bit too high for that time.
> Mine has 28 - half a year old raid array.  But I don't
> reboot machine often, it has been rebooted about 10
> times in that time.  Events are like - array assembly
> and disassembly, drive failed, drive added and the like.

Events can also increase every time the array switches between
'active' and 'clean'.  It will switch to 'clean' after 200ms without
writes and then switch back to 'active' on the first write.
So you could get nearly 10 changes per second with a workload that
generates 1 write every 201 ms.

If you have spare drives, then md tries to avoid increasing the event
count so much, so that it doesn't have to write to the otherwise-idle
drives.  It does this by decrementing the event count on an
active->clean transition if that seems safe.

So on an array with no spares, 1,000,000 is entirely possible - it is
very workload-dependent.
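
You can watch the transitions yourself if you are curious:

   # cat /sys/block/md13/md/array_state

which will print "clean" or "active" (among other states)
as writes come and go.
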

> 
> > 5) Any idea why "shutdown -h now" would cause three of
> > the six working disks to gain 4 events each (happened with
> > the filesystem unmounted from /dev/md13)?
> 
> It shouldn't be that high really.  I think.  *Especially*
> on only some of the disks.

You would definitely expect them all to be updated by the same amount
unless there were drive failures.
Unmounting the filesystem would cause a write of the filesystem
superblock, which could switch the array to active.  Then it would
switch to clean, and there could be another double-switch when
stopping the array.  So 4 isn't particularly surprising.

NeilBrown



* Re: RAID-6 mdadm disks out of sync issue (no success)

From: linux-raid.vger.kernel.org @ 2009-06-13  9:18 UTC
  To: linux-raid

I was too busy with work to try repairing the RAID-6 array until
tonight.  I turned off the computer and carefully rearranged all disks
and wires so everything was in a good/snug position, removed the two
disks that had failed days before the others, and then tested that the
six remaining disks were all working without errors -- which they
were.

I used the following command to reassemble the array:

# mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1

mdadm: looking for devices for /dev/md13
mdadm: /dev/sda1 is identified as a member of /dev/md13, slot 2.
mdadm: /dev/sdb1 is identified as a member of /dev/md13, slot 5.
mdadm: /dev/sdc1 is identified as a member of /dev/md13, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md13, slot 6.
mdadm: /dev/sde1 is identified as a member of /dev/md13, slot 7.
mdadm: /dev/sdf1 is identified as a member of /dev/md13, slot 3.
mdadm: forcing event count in /dev/sdc1(1) from 2186232 upto 2186236
mdadm: forcing event count in /dev/sdf1(3) from 2186232 upto 2186236
mdadm: forcing event count in /dev/sde1(7) from 2186232 upto 2186236
mdadm: no uptodate device for slot 0 of /dev/md13
mdadm: added /dev/sda1 to /dev/md13 as 2
mdadm: added /dev/sdf1 to /dev/md13 as 3
mdadm: no uptodate device for slot 4 of /dev/md13
mdadm: added /dev/sdb1 to /dev/md13 as 5
mdadm: added /dev/sdd1 to /dev/md13 as 6
mdadm: added /dev/sde1 to /dev/md13 as 7
mdadm: added /dev/sdc1 to /dev/md13 as 1
[ 2727.749972] raid5: raid level 6 set md13 active with 6 out of 8 devices, algorithm 2
mdadm: /dev/md13 has been started with 6 drives (out of 8).

After this I viewed /proc/mdstat, which seemed in order; the only note
is that the array was listed as read-only because
/sys/module/md_mod/parameters/start_ro is set to 1.

At this point I added the Linux Device-Mapper encryption over /dev/md13 as
I always do, and attempted to mount the filesystem from the encrypted
device in read-only mode, but it failed.
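
Roughly what I ran, quoting from memory (the map name and
mount point are just the ones I always use):

   # cryptsetup create md13_crypt /dev/md13
   # mount -o ro /dev/mapper/md13_crypt /mnt/raid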

I rebooted at this point so it wouldn't be in read-only mode anymore,
and to see if it would auto-assemble properly after a reboot -- and it did.

However, I cannot mount my encrypted filesystem no matter what I try.

# mdadm --verbose --verbose --detail --scan /dev/md13

/dev/md13:
        Version : 00.90
  Creation Time : Sun Aug  3 10:21:28 2008
     Raid Level : raid6
     Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
  Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 13
    Persistence : Superblock is persistent

    Update Time : Sat Jun 13 02:03:43 2009
          State : clean, degraded
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
         Events : 0.2186266

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8        1        2      active sync   /dev/sda1
       3       8       81        3      active sync   /dev/sdf1
       4       0        0        4      removed
       5       8       17        5      active sync   /dev/sdb1
       6       8       49        6      active sync   /dev/sdd1
       7       8       65        7      active sync   /dev/sde1

The individual disks still have confusing/conflicting information,
each disk's superblock showing different failed/active states for the
various disks, as it did before I reassembled the array.


I don't suppose the Linux Device-Mapper / cryptsetup have had any
changes between 2.6.17 and 2.6.28 which could account for me being
unable to decrypt my filesystem?  I tried running "strings" on the
/dev/mapper/blah device after it is created, but the output is pure
random data.  I am positive my password is correct; I have tried it
at least a dozen times already.
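
I assume that if the decryption were correct, something
like this would show a filesystem signature rather than
noise (same placeholder map name as above):

   # file -s /dev/mapper/md13_crypt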

Is there anything I can do at this point?

I feel dreadful to lose this data.

 - S.A.

* Re: RAID-6 mdadm disks out of sync issue (no success)

From: linux-raid.vger.kernel.org @ 2009-06-13  9:24 UTC
  To: linux-raid

> The individual disks still have confusing/conflicting information, each
> disk showing different states of failed and active for the various disks,
> as it did before I reassembled it.

Correction: they did have the old data listed for a while, but have now
updated themselves so all disks show the same two missing disks, and the
rest as active.

 - S.A.

* Re: RAID-6 mdadm disks out of sync issue (no success)

From: NeilBrown @ 2009-06-13  9:58 UTC
  To: linux-raid.vger.kernel.org; +Cc: linux-raid

On Sat, June 13, 2009 7:18 pm, linux-raid.vger.kernel.org@atu.cjb.net wrote:
> I was too busy with work to try repairing the RAID-6 array until
> tonight.  I turned off the computer and carefully rearranged all disks
> and wires so everything was in a good/snug position, removed the two of
> six disks that had failed days before the others, and then tested that
> the six remaining disks were all working now without errors -- which
> they were.
>
> I used the following command to reassemble the array:
>
> # mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1
>
> mdadm: looking for devices for /dev/md13
> mdadm: /dev/sda1 is identified as a member of /dev/md13, slot 2.
> mdadm: /dev/sdb1 is identified as a member of /dev/md13, slot 5.
> mdadm: /dev/sdc1 is identified as a member of /dev/md13, slot 1.
> mdadm: /dev/sdd1 is identified as a member of /dev/md13, slot 6.
> mdadm: /dev/sde1 is identified as a member of /dev/md13, slot 7.
> mdadm: /dev/sdf1 is identified as a member of /dev/md13, slot 3.
> mdadm: forcing event count in /dev/sdc1(1) from 2186232 upto 2186236
> mdadm: forcing event count in /dev/sdf1(3) from 2186232 upto 2186236
> mdadm: forcing event count in /dev/sde1(7) from 2186232 upto 2186236
> mdadm: no uptodate device for slot 0 of /dev/md13
> mdadm: added /dev/sda1 to /dev/md13 as 2
> mdadm: added /dev/sdf1 to /dev/md13 as 3
> mdadm: no uptodate device for slot 4 of /dev/md13
> mdadm: added /dev/sdb1 to /dev/md13 as 5
> mdadm: added /dev/sdd1 to /dev/md13 as 6
> mdadm: added /dev/sde1 to /dev/md13 as 7
> mdadm: added /dev/sdc1 to /dev/md13 as 1
> [ 2727.749972] raid5: raid level 6 set md13 active with 6 out of 8
> devices, algorithm 2
> mdadm: /dev/md13 has been started with 6 drives (out of 8).
>
> After this I viewed the /proc/mdstat which seemed in order, the only note
> being that it was listed as read-only due to the
> /sys/module/md_mod/parameters/start_ro being set to read-only mode.
>
> At this point I added the Linux Device-Mapper encryption over /dev/md13 as
> I always do, and attempted to mount the filesystem from the encrypted
> device in read-only mode, but it failed.

More information required: how did it fail?
How about "fsck -n /dev/whatever" ??

The output from mdadm --assemble --force looks encouraging.   It
suggests that it was able to re-assemble the array with only minor
changes to the metadata.

So it really looks like you should be very close to success....

NeilBrown


>
> I rebooted at this point so it wouldn't be in read-only mode anymore, and
> see if it would auto-assemble properly after a reboot -- and it did.
>
> However, I cannot mount my encrypted filesystem no matter what I try.
>
> # mdadm --verbose --verbose --detail --scan /dev/md13
>
> /dev/md13:
>         Version : 00.90
>   Creation Time : Sun Aug  3 10:21:28 2008
>      Raid Level : raid6
>      Array Size : 5852944512 (5581.80 GiB 5993.42 GB)
>   Used Dev Size : 975490752 (930.30 GiB 998.90 GB)
>    Raid Devices : 8
>   Total Devices : 6
> Preferred Minor : 13
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Jun 13 02:03:43 2009
>           State : clean, degraded
>  Active Devices : 6
> Working Devices : 6
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 64K
>
>            UUID : 7f6da4ce:2ddbe010:f7481424:9a8f8874 (local to host gqq)
>          Events : 0.2186266
>
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       33        1      active sync   /dev/sdc1
>        2       8        1        2      active sync   /dev/sda1
>        3       8       81        3      active sync   /dev/sdf1
>        4       0        0        4      removed
>        5       8       17        5      active sync   /dev/sdb1
>        6       8       49        6      active sync   /dev/sdd1
>        7       8       65        7      active sync   /dev/sde1
>
> The individual disks still have confusing/conflicting information, each
> disk showing different states of failed and active for the various disks,
> as they did before I reassembled the array.
>
>
> I don't suppose the Linux Device-Mapper / cryptsetup have had any
> changes between 2.6.17 and 2.6.28 which could account for me being
> unable to decrypt my filesystem?  I tried using "strings" on the
> /dev/mapper/blah after it is created, but it's pure random data.
> I am positive my password is correct, I have tried it at least a
> dozen times already.
>
> Is there anything I can do at this point?
>
> I feel dreadful to lose this data.
>
>  - S.A.
>
>
>
>
>



* Re: RAID-6 mdadm disks out of sync issue (no success)
  2009-06-13  9:58             ` NeilBrown
@ 2009-06-13 18:02               ` linux-raid.vger.kernel.org
  2009-06-13 20:27                 ` RAID-6 mdadm disks out of sync issue (success!) linux-raid.vger.kernel.org
  0 siblings, 1 reply; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-13 18:02 UTC (permalink / raw)
  To: linux-raid


> More information required: how did it fail?
> How about "fsck -n /dev/whatever" ??

The "mount" command never gives useful information, just vague
multiple-choice error messages.  The simple reality is there
is no filesystem (corrupted or otherwise) on the
/dev/mapper/the_encrypted device.

This is confirmed by the fact that I did:

# strings -n 1 /dev/mapper/the_encrypted

There is nothing there for fsck to work with, no filesystem, no
files, just random data indefinitely (I watched it for around 3
minutes).

> The output from mdadm --assemble --force looks encouraging.   It
> suggests that it was able to re-assemble the array with only minor
> changes to the metadata.
> 
> So it really looks like you should be very close to success....

I feel very close to giving up on computers, certainly not success.

I was lax with backing up due to the perceived stability of RAID-6
arrays, and now we have lost most of 10 years of files belonging to
three people.

Re the useless error with mount:

mount: wrong fs type, bad option, bad superblock on
/dev/mapper/the_encrypted, missing codepage or helper
program, or other error.

# fdisk -l > /dev/null

Error: /dev/mapper/the_encrypted: unrecognised disk label
Error: /dev/md13: unrecognised disk label

As far as I remember, when it was working the_encrypted would
show up as 5.5 terabytes, while the md13 would produce that same
error.

Re fsck, there is nothing there to fsck.  It tells me there is no
superblock and exits.

 - S.A.







* Re: RAID-6 mdadm disks out of sync issue (success!)
  2009-06-13 18:02               ` linux-raid.vger.kernel.org
@ 2009-06-13 20:27                 ` linux-raid.vger.kernel.org
  2009-06-14  7:10                   ` RAID-6 mdadm disks out of sync issue (more questions) linux-raid.vger.kernel.org
  0 siblings, 1 reply; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-13 20:27 UTC (permalink / raw)
  To: linux-raid

Problem solved, thank you for your help.

I use a custom program for creating a hardened encryption key,
wherein I need to fill in a bunch of information, including two
repetition values.  These are used to cycle over the binary key
rehashing it N times.

Prior to this Ubuntu upgrade, it would generate the key in about
15 seconds, after the upgrade about 1 minute.  Because of this
slowness, I thought I had set the repetition values too high,
and ended up reverting to the repetitions I used on my previous
computer from a few years ago.
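
The program itself isn't important -- the idea is just ordinary key
stretching.  A minimal sketch of the concept in shell (not my actual
program; PASSPHRASE and REPS are placeholder names, and it assumes
sha256sum from coreutils):

KEY="$PASSPHRASE"
for i in $(seq 1 "$REPS"); do
    # feed the previous digest back in, REPS times
    KEY=$(printf %s "$KEY" | sha256sum | cut -d ' ' -f 1)
done

With a large enough repetition value, the loop completely dominates
the runtime, which is why the program merely looked hung.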

I had been Ctrl+C'ing out of my key generating program because
it had seemed broken.  I actually sat and waited for it today,
and it worked fine.  Phew!

The filesystem had been unmounted when I had the RAID issue, so
the data is intact.

I figured out why three of the disks had higher event counts after
the shutdown as well:

My disks go into standby mode fairly quickly when not in use,
and some of the disks take a long time to wake up (~10 seconds),
while others are much quicker (~4 seconds).  When Ubuntu was
shutting down the system, the 3 slower disks didn't get woken
up before the reboot finished, so they were slightly behind.
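
(As an aside, I believe the power state can be checked directly,
assuming hdparm is installed:

# hdparm -C /dev/sdc

which reports "drive state is: standby" or "active/idle".)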

I also worked out why my event count was in the millions: I was
running this command for realtime monitoring:

# watch -n 1 'mdadm --verbose --verbose --detail --scan /dev/md13|tail -n 23'

This mdadm command would repeat every second, and it would
update the event count by 2 each time.  I hadn't realised that
this way of monitoring the disks would alter the array in any
way.
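
A safer way to watch a rebuild, I assume, is to read /proc/mdstat
directly, since that doesn't open the array device at all:

# watch -n 1 'cat /proc/mdstat'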

Now I'm off to rebuild the 2 missing disks, and then build a
new array on another computer for backups.

 - S.A.







* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-13 20:27                 ` RAID-6 mdadm disks out of sync issue (success!) linux-raid.vger.kernel.org
@ 2009-06-14  7:10                   ` linux-raid.vger.kernel.org
  2009-06-14  8:11                     ` NeilBrown
  0 siblings, 1 reply; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-14  7:10 UTC (permalink / raw)
  To: linux-raid

So here I was thinking everything was fine.  My six disks were working
for hours and the other two disks were loaded as spares and the first
one was rebuilding, up to 30% with an ETA of 5 hours.  I left the house
for a few hours and when I came back, the same disk with read errors
before had spontaneously disconnected and reconnected three times (I
saw in dmesg).  It probably got around 80% of the way through the six
hour rebuild.

The problem is that when the /dev/sdc disk reconnected itself after,
it was marked as a "Spare", and now I can't use the same command any
longer:

# mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1

This time it doesn't work, as it says 5 disks and 1 spare isn't enough
to start the array.  I also tried --re-add, but it already thinks it
is disk 9 out of 8, a Spare.

How can I safely put this disk back into its proper place so I can
again try to rebuild disks 7 and 8?  I'm assuming I probably need to
use mdadm --create, but I'm not sure, and don't want to get it wrong
and have it overwrite this needed disk.

 - S.A.







* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-14  7:10                   ` RAID-6 mdadm disks out of sync issue (more questions) linux-raid.vger.kernel.org
@ 2009-06-14  8:11                     ` NeilBrown
  2009-06-14 21:01                       ` linux-raid.vger.kernel.org
  2009-06-16  3:38                       ` Luca Berra
  0 siblings, 2 replies; 19+ messages in thread
From: NeilBrown @ 2009-06-14  8:11 UTC (permalink / raw)
  To: linux-raid.vger.kernel.org; +Cc: linux-raid

On Sun, June 14, 2009 5:10 pm, linux-raid.vger.kernel.org@atu.cjb.net wrote:
> So here I was thinking everything was fine.  My six disks were working
> for hours and the other two disks were loaded as spares and the first
> one was rebuilding, up to 30% with an ETA of 5 hours.  I left the house
> for a few hours and when I came back, the same disk with read errors
> before had spontaneously disconnected and reconnected three times (I
> saw in dmesg).  It probably got around 80% of the way through the six
> hour rebuild.
>
> The problem is that when the /dev/sdc disk reconnected itself after,
> it was marked as a "Spare", and now I can't use the same command any
> longer:

This doesn't make a lot of sense.  It should not have been marked as
a spare unless someone explicitly tried to "Add" it to the array.

I've been thinking that I need to improve mdadm in this respect
and make it harder to accidentally turn a failed drive into a spare.

However, your description of events suggests that this was automatic,
which is strange.
Can I get the complete kernel logs from when the rebuild started to
when you finally gave up?  It might help me understand.


>
> # mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1
>
> This time it doesn't work, as it says 5 disks and 1 spare isn't enough
> to start the array.  I also tried --re-add, but it already thinks it
> is disk 9 out of 8, a Spare.
>
> How can I safely put this disk back into its proper place so I can
> again try to rebuild disks 7 and 8?  I'm assuming I probably need to
> use mdadm --create, but I'm not sure, and don't want to get it wrong
> and have it overwrite this needed disk.

Yes, I suspect that you need --create, but I cannot be certain without
seeing all the details (e.g. --examine of all devices).
When using --create you need to ensure that the drives are in the
right order with "missing" at the right places.  As long as there
are two missing devices no resync will happen so the data will not be
changed.  So after doing a --create you can fsck and mount etc and ensure
the data is safe before continuing.
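
Roughly like this -- the device order below is illustrative only (take
the real order from your --examine output), and the chunk size must
match (64K in your case, which is also the default):

# mdadm --create /dev/md13 --level=6 --raid-devices=8 --chunk=64 \
        missing disk1 disk2 disk3 missing disk5 disk6 disk7
# (set up your dm-crypt layer as usual)
# fsck -n /dev/mapper/the_encrypted

Only when fsck and a read-only mount look sane should you add the
remaining devices back.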

But if you cannot get through a sequential read of all devices without
any read error, you won't be able to rebuild redundancy.  (There are plans
to make raid6 more robust in this scenario, but they are a long way
from fruition yet).

NeilBrown



* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-14  8:11                     ` NeilBrown
@ 2009-06-14 21:01                       ` linux-raid.vger.kernel.org
  2009-06-15 15:48                         ` Bill Davidsen
  2009-06-16  6:00                         ` Neil Brown
  2009-06-16  3:38                       ` Luca Berra
  1 sibling, 2 replies; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-14 21:01 UTC (permalink / raw)
  To: NeilBrown, linux-raid


> This doesn't make a lot of sense.  It should not have been marked
> as a spare unless someone explicitly tried to "Add" it to the
> array.
> 
> However, your description of events suggests that this was automatic,
> which is strange.

Yes, it was entirely automatic.  The only command I had running on the computer when it happened was:

# watch -n 0.1 'uptime; echo; cat /proc/mdstat|grep md13 -A 2; echo; dmesg|tac'

This gave me a nice, simple display of what was going on with the
rebuild, and a monitor of dmesg in case there were any new kernel
messages.

> Can I get the complete kernel logs from when the rebuild started
> to when you finally gave up?  It might help me understand.

Sure.

Just to confirm, /dev/sd{a,b,c,d,e,f}1 are the partitions which
contain my up-to-date data.  /dev/sd{i,j}1 contain many days old data.

Here is the entire dmesg output during the rebuild:

[ 4245.3] md: md13 switched to read-write mode.
[ 4260.7] md: md13 still in use.
[ 4268.0] md: md13 still in use.
[ 4269.8] md: md13 still in use.
[ 4354.9] md: md13 still in use.
[ 4402.9] md: md13 switched to read-only mode.
[ 4408.1] md: md13 switched to read-write mode.

I had tried to add the two old disks (sdi and sdj) while the array was
in read-only mode for the rebuild, but it didn't allow me.  Is there
any way to mark the six valid disks as read-only so they will not be
modified during the rebuild (and not become spares, have their event
count updated, etc.)?

[ 4418.3] md: bind<sdi1>
[ 4418.4] RAID5 conf printout:
[ 4418.4]  --- rd:8 wd:6
[ 4418.4]  disk 0, o:1, dev:sdi1
[ 4418.4]  disk 1, o:1, dev:sdd1
[ 4418.4]  disk 2, o:1, dev:sda1
[ 4418.4]  disk 3, o:1, dev:sdf1
[ 4418.4]  disk 5, o:1, dev:sdc1
[ 4418.4]  disk 6, o:1, dev:sde1
[ 4418.4]  disk 7, o:1, dev:sdb1
[ 4418.4] md: recovery of RAID array md13
[ 4418.4] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 4418.4] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 4418.4] md: using 128k window, over a total of 975490752 blocks.
[ 4421.8] md: md_do_sync() got signal ... exiting
[ 4421.9] md: md13 switched to read-only mode.
[ 4549.0] md: md13 switched to read-write mode.

I again switched back to read-only mode, hoping it would continue
rebuilding, but it stopped, so I went back to read-write mode and
it resumed the rebuild.

[ 4549.0] RAID5 conf printout:
[ 4549.0]  --- rd:8 wd:6
[ 4549.0]  disk 0, o:1, dev:sdi1
[ 4549.0]  disk 1, o:1, dev:sdd1
[ 4549.0]  disk 2, o:1, dev:sda1
[ 4549.0]  disk 3, o:1, dev:sdf1
[ 4549.0]  disk 5, o:1, dev:sdc1
[ 4549.0]  disk 6, o:1, dev:sde1
[ 4549.0]  disk 7, o:1, dev:sdb1
[ 4549.0] RAID5 conf printout:
[ 4549.0]  --- rd:8 wd:6
[ 4549.0]  disk 0, o:1, dev:sdi1
[ 4549.0]  disk 1, o:1, dev:sdd1
[ 4549.0]  disk 2, o:1, dev:sda1
[ 4549.0]  disk 3, o:1, dev:sdf1
[ 4549.0]  disk 5, o:1, dev:sdc1
[ 4549.0]  disk 6, o:1, dev:sde1
[ 4549.0]  disk 7, o:1, dev:sdb1
[ 4549.0] md: recovery of RAID array md13
[ 4549.0] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 4549.0] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 4549.0] md: using 128k window, over a total of 975490752 blocks.
[ 4549.0] md: resuming recovery of md13 from checkpoint.
[ 4628.7] mdadm[19700]: segfault at 0 ip 000000000041617f sp 00007fff87776290 error 4 in mdadm[400000+2a000]

This new version of mdadm from after my Ubuntu 9.04 upgrade with Linux
2.6.28 seg faults every time a new event happens, such as a disk being
added or removed.  Prior to the upgrade, using Linux 2.6.17 and
whichever older version of mdadm it had, I had never seen it seg fault.

# mdadm --version

mdadm - v2.6.7.1 - 15th October 2008

[ 4647.7] ata1.00: exception Emask 0x0 SAct 0xff SErr 0x0 action 0x6 frozen
[ 4647.7] ata1.00: cmd 61/80:00:87:3c:63/00:00:00:00:00/40 tag 0 ncq 65536 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/40:08:07:3d:63/00:00:00:00:00/40 tag 1 ncq 32768 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/b0:10:47:3d:63/00:00:00:00:00/40 tag 2 ncq 90112 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/b8:18:f7:3d:63/01:00:00:00:00/40 tag 3 ncq 225280 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/60:20:af:3f:63/02:00:00:00:00/40 tag 4 ncq 311296 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/08:28:0f:42:63/01:00:00:00:00/40 tag 5 ncq 135168 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/b0:30:d7:43:63/00:00:00:00:00/40 tag 6 ncq 90112 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1.00: cmd 61/c0:38:17:43:63/00:00:00:00:00/40 tag 7 ncq 98304 out
[ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4647.7] ata1.00: status: { DRDY }
[ 4647.7] ata1: hard resetting link
[ 4648.2] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4648.2] ata1.00: configured for UDMA/133
[ 4648.2] ata1: EH complete

I've noticed that dmesg most often lists disks as "ata1", "ata9" etc.
and I have found no way to convert these into /dev/sdc style format.
Do you know how to translate these disk identifiers?  It's really
quite frustrating not knowing which disk an error/message is from,
especially when 2 or 3 disks have issues at the same time.

[ 4648.2] sd 0:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[ 4648.2] sd 0:0:0:0: [sdi] Write Protect is off
[ 4648.2] sd 0:0:0:0: [sdi] Mode Sense: 00 3a 00 00
[ 4648.2] sd 0:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Following is when I added the last disk back into the set.  I had hoped
that both disks could rebuild simultaneously, but it seems to force it
to only rebuild one at a time.  Is there any way to rebuild both disks
together?  It is frustrating having two idle CPUs during the rebuild,
and low disk throughput.  I'm guessing mdadm is not a threaded app.

I am actually going to keep /dev/sdj as a backup, in case there is no
way to successfully read the data from /dev/sdc.  sdj is a week older
than the rest of the data, but something would be better than nothing.
In that case I would mount the array read-only and use rsync to copy
the data off before attempting anything that could make matters worse.

[ 4648.3] md: bind<sdj1>
[ 4661.8] mdadm[19774]: segfault at 0 ip 000000000041617f sp 00007fff7630ae00 error 4 in mdadm[400000+2a000]
[ 4662.2] mdadm[19854]: segfault at 0 ip 000000000041617f sp 00007fff72062b80 error 4 in mdadm[400000+2a000]
[ 4697.7] mdadm[19913]: segfault at 0 ip 000000000041617f sp 00007fffefb31640 error 4 in mdadm[400000+2a000]
[ 4697.7] mdadm[19912]: segfault at 0 ip 000000000041617f sp 00007fff9b1bacb0 error 4 in mdadm[400000+2a000]
[ 4697.9] mdadm[19997]: segfault at 0 ip 000000000041617f sp 00007fffd001fb10 error 4 in mdadm[400000+2a000]
[ 4697.9] mdadm[20016]: segfault at 0 ip 000000000041617f sp 00007fff4e9d44f0 error 4 in mdadm[400000+2a000]
[ 4916.6] md: unbind<sdj1>
[ 4916.6] md: export_rdev(sdj1)
[ 4935.3] md: export_rdev(sdj1)
[ 4935.4] md: bind<sdj1>

At this point it was rebuilding fine.  It had an ETA of 4.5 hours left,
from the original 6.0 hours.  I left the house.  Following is the disk
error when I was gone:

[13691.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13691.4] ata5.00: irq_stat 0x40000008
[13691.4] ata5.00: cmd 60/98:20:7f:af:fa/00:00:31:00:00/40 tag 4 ncq 77824 in
[13691.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13691.4] ata5.00: status: { DRDY ERR }
[13691.4] ata5.00: error: { UNC }
[13691.4] ata5.00: configured for UDMA/133
[13691.4] ata5: EH complete
[13691.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13691.4] sd 4:0:0:0: [sdc] Write Protect is off
[13691.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13691.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13693.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13693.4] ata5.00: irq_stat 0x40000008
[13693.4] ata5.00: cmd 60/98:28:7f:af:fa/00:00:31:00:00/40 tag 5 ncq 77824 in
[13693.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13693.4] ata5.00: status: { DRDY ERR }
[13693.4] ata5.00: error: { UNC }
[13693.4] ata5.00: configured for UDMA/133
[13693.4] ata5: EH complete

It seems to me like it simply disconnected and then reconnected.  I have
always had this issue on all sorts of hardware on 2.6 kernels, which
makes me think it isn't always a hardware issue, and possibly a Linux
kernel/driver issue.

[13693.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13693.4] sd 4:0:0:0: [sdc] Write Protect is off
[13693.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13693.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13694.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13694.4] ata5.00: irq_stat 0x40000008
[13694.4] ata5.00: cmd 60/98:20:7f:af:fa/00:00:31:00:00/40 tag 4 ncq 77824 in
[13694.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13694.4] ata5.00: status: { DRDY ERR }
[13694.4] ata5.00: error: { UNC }
[13694.4] ata5.00: configured for UDMA/133
[13694.4] ata5: EH complete
[13694.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13694.4] sd 4:0:0:0: [sdc] Write Protect is off
[13694.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13694.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13695.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13695.4] ata5.00: irq_stat 0x40000008
[13695.4] ata5.00: cmd 60/98:28:7f:af:fa/00:00:31:00:00/40 tag 5 ncq 77824 in
[13695.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13695.4] ata5.00: status: { DRDY ERR }
[13695.4] ata5.00: error: { UNC }
[13695.4] ata5.00: configured for UDMA/133
[13695.4] ata5: EH complete
[13695.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13695.4] sd 4:0:0:0: [sdc] Write Protect is off
[13695.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13695.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13696.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13696.4] ata5.00: irq_stat 0x40000008
[13696.4] ata5.00: cmd 60/98:20:7f:af:fa/00:00:31:00:00/40 tag 4 ncq 77824 in
[13696.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13696.4] ata5.00: status: { DRDY ERR }
[13696.4] ata5.00: error: { UNC }
[13696.4] ata5.00: configured for UDMA/133
[13696.4] ata5: EH complete
[13696.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13696.4] sd 4:0:0:0: [sdc] Write Protect is off
[13696.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13696.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13697.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[13697.4] ata5.00: irq_stat 0x40000008
[13697.4] ata5.00: cmd 60/98:28:7f:af:fa/00:00:31:00:00/40 tag 5 ncq 77824 in
[13697.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
[13697.4] ata5.00: status: { DRDY ERR }
[13697.4] ata5.00: error: { UNC }
[13697.4] ata5.00: configured for UDMA/133
[13697.4] sd 4:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[13697.4] sd 4:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
[13697.4] Descriptor sense data with sense descriptors (in hex):
[13697.4]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[13697.4]         31 fa af f7
[13697.4] sd 4:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
[13697.4] end_request: I/O error, dev sdc, sector 838512631
[13697.4] raid5:md13: read error not correctable (sector 838512568 on sdc1).
[13697.4] raid5: Disk failure on sdc1, disabling device.
[13697.4] raid5: Operation continuing on 5 devices.

This last line is something I have been baffled by -- how does a RAID-5
or RAID-6 device continue as "active" when fewer than the minimum number
of disks is present?  This happened with my RAID-5 swap array losing 2
disks, and happened above on a RAID-6 with only 5 of 8 disks.  When I
arrived home, it clearly said the array was still "active".

[13697.4] raid5:md13: read error not correctable (sector 838512576 on sdc1).
[13697.4] raid5:md13: read error not correctable (sector 838512584 on sdc1).
[13697.4] raid5:md13: read error not correctable (sector 838512592 on sdc1).
[13697.4] ata5: EH complete
[13697.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[13697.4] sd 4:0:0:0: [sdc] Write Protect is off
[13697.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[13697.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[13711.0] md: md13: recovery done.

What is this "recovery done" referring to?  No recovery was completed.

[13711.1] RAID5 conf printout:
[13711.1]  --- rd:8 wd:5
[13711.1]  disk 0, o:1, dev:sdi1
[13711.1]  disk 1, o:1, dev:sdd1
[13711.1]  disk 2, o:1, dev:sda1
[13711.1]  disk 3, o:1, dev:sdf1
[13711.1]  disk 5, o:0, dev:sdc1
[13711.1]  disk 6, o:1, dev:sde1
[13711.1]  disk 7, o:1, dev:sdb1
[13711.1] RAID5 conf printout:
[13711.1]  --- rd:8 wd:5
[13711.1]  disk 1, o:1, dev:sdd1
[13711.1]  disk 2, o:1, dev:sda1
[13711.1]  disk 3, o:1, dev:sdf1
[13711.1]  disk 5, o:0, dev:sdc1
[13711.1]  disk 6, o:1, dev:sde1
[13711.1]  disk 7, o:1, dev:sdb1
[13711.1] RAID5 conf printout:
[13711.1]  --- rd:8 wd:5
[13711.1]  disk 1, o:1, dev:sdd1
[13711.1]  disk 2, o:1, dev:sda1
[13711.1]  disk 3, o:1, dev:sdf1
[13711.1]  disk 5, o:0, dev:sdc1
[13711.1]  disk 6, o:1, dev:sde1
[13711.1]  disk 7, o:1, dev:sdb1
[13711.1] RAID5 conf printout:
[13711.1]  --- rd:8 wd:5
[13711.1]  disk 1, o:1, dev:sdd1
[13711.1]  disk 2, o:1, dev:sda1
[13711.1]  disk 3, o:1, dev:sdf1
[13711.1]  disk 6, o:1, dev:sde1
[13711.1]  disk 7, o:1, dev:sdb1

I arrived home and performed the following commands
(I have removed some of the duplicate commands):

# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm /dev/md13 --remove /dev/sdj1 /dev/sdi1
# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm /dev/md13 --remove /dev/sdc1
# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm /dev/md13 --re-add /dev/sdc1
# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm /dev/md13 --remove /dev/sdc1
# mdadm --verbose --verbose --detail --scan /dev/md13
# mdadm --readonly /dev/md13
# cat /proc/mdstat
# man mdadm
# mdadm --stop /dev/md13
# c; for disk in /dev/sd{a,b,c,d,e,f}1; do mdadm --examine "$disk"; read; c; done
# c; for disk in /dev/sd{a,b,c,d,e,f}1; do printf "$disk"; mdadm --examine "$disk" | g events; done
# mdadm --stop /dev/md13
# mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1
# mdadm --stop /dev/md13
# mdadm --verbose --examine /dev/sdc1

I also detached the /dev/sdc disk and reattached it to my other SATA
controller.

[21281.4] md: unbind<sdj1>
[21281.4] md: export_rdev(sdj1)
[21281.4] md: unbind<sdi1>
[21281.4] md: export_rdev(sdi1)
[21281.5] Buffer I/O error on device md13, logical block 1463236112
[21281.5] Buffer I/O error on device md13, logical block 1463236112
[21281.5] Buffer I/O error on device md13, logical block 1463236126
[21281.5] Buffer I/O error on device md13, logical block 1463236126
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21281.5] Buffer I/O error on device md13, logical block 1463236127
[21307.3] md: unbind<sdc1>
[21307.3] md: export_rdev(sdc1)
[21307.4] __ratelimit: 6 callbacks suppressed
[21307.4] Buffer I/O error on device md13, logical block 1463236112
[21307.4] Buffer I/O error on device md13, logical block 1463236112
[21307.4] Buffer I/O error on device md13, logical block 1463236126
[21307.4] Buffer I/O error on device md13, logical block 1463236126
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21307.4] Buffer I/O error on device md13, logical block 1463236127
[21323.4] md: bind<sdc1>
[21323.5] __ratelimit: 6 callbacks suppressed
[21323.5] Buffer I/O error on device md13, logical block 1463236112
[21323.5] Buffer I/O error on device md13, logical block 1463236112
[21323.5] Buffer I/O error on device md13, logical block 1463236126
[21323.5] Buffer I/O error on device md13, logical block 1463236126
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21323.5] Buffer I/O error on device md13, logical block 1463236127
[21350.1] md: unbind<sdc1>
[21350.1] md: export_rdev(sdc1)
[21350.2] __ratelimit: 6 callbacks suppressed
[21350.2] Buffer I/O error on device md13, logical block 1463236112
[21350.2] Buffer I/O error on device md13, logical block 1463236112
[21350.2] Buffer I/O error on device md13, logical block 1463236126
[21350.2] Buffer I/O error on device md13, logical block 1463236126
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21350.2] Buffer I/O error on device md13, logical block 1463236127
[21368.1] md: md13 switched to read-only mode.
[21368.1] __ratelimit: 6 callbacks suppressed
[21368.1] Buffer I/O error on device md13, logical block 1463236112
[21368.1] Buffer I/O error on device md13, logical block 1463236112
[21368.1] Buffer I/O error on device md13, logical block 1463236126
[21368.1] Buffer I/O error on device md13, logical block 1463236126
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21368.1] Buffer I/O error on device md13, logical block 1463236127
[21488.8] md: md13 stopped.
[21488.8] md: unbind<sdf1>
[21488.8] md: export_rdev(sdf1)
[21488.8] md: unbind<sda1>
[21488.8] md: export_rdev(sda1)
[21488.8] md: unbind<sdd1>
[21488.8] md: export_rdev(sdd1)
[21488.8] md: unbind<sde1>
[21488.8] md: export_rdev(sde1)
[21488.8] md: unbind<sdb1>
[21488.8] md: export_rdev(sdb1)
[22603.8] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
[22603.8] ata5: irq_stat 0x00400000, PHY RDY changed
[22603.8] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
[22603.8] ata5: hard resetting link
[22604.5] ata5: SATA link down (SStatus 0 SControl 300)
[22609.5] ata5: hard resetting link
[22609.8] ata5: SATA link down (SStatus 0 SControl 300)
[22609.8] ata5: limiting SATA link speed to 1.5 Gbps
[22614.8] ata5: hard resetting link
[22615.2] ata5: SATA link down (SStatus 0 SControl 310)
[22615.2] ata5.00: disabled
[22615.2] ata5: EH complete
[22615.2] ata5.00: detaching (SCSI 4:0:0:0)
[22615.2] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[22615.2] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[22615.2] sd 4:0:0:0: [sdc] Stopping disk
[22615.2] sd 4:0:0:0: [sdc] START_STOP FAILED
[22615.2] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[22640.1] ata8: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
[22640.1] ata8: SError: { PHYRdyChg CommWake }
[22640.1] ata8: hard resetting link
[22640.8] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[22640.9] ata8.00: ATA-7: SAMSUNG HD103UJ, 1AA01109, max UDMA7
[22640.9] ata8.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 0/32)
[22640.9] ata8.00: configured for UDMA/100
[22640.9] ata8: EH complete
[22640.9] scsi 7:0:0:0: Direct-Access     ATA      SAMSUNG HD103UJ  1AA0 PQ: 0 ANSI: 5
[22640.9] sd 7:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[22640.9] sd 7:0:0:0: [sdc] Write Protect is off
[22640.9] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[22640.9] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[22640.9] sd 7:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[22640.9] sd 7:0:0:0: [sdc] Write Protect is off
[22640.9] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[22640.9] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[22640.9]  sdc: sdc1 sdc2
[22640.9] sd 7:0:0:0: [sdc] Attached SCSI disk
[22640.9] sd 7:0:0:0: Attached scsi generic sg2 type 0
[22641.0] md: bind<sdc1>
[22687.9] md: md13 stopped.
[22687.9] md: unbind<sdc1>
[22687.9] md: export_rdev(sdc1)
[22804.2] md: md13 stopped.
[22804.2] md: bind<sda1>
[22804.2] md: bind<sdf1>
[22804.2] md: bind<sde1>
[22804.2] md: bind<sdb1>
[22804.2] md: bind<sdc1>
[22804.2] md: bind<sdd1>
[22864.5] md: md13 stopped.
[22864.5] md: unbind<sdd1>
[22864.6] md: export_rdev(sdd1)
[22864.6] md: unbind<sdc1>
[22864.6] md: export_rdev(sdc1)
[22864.6] md: unbind<sdb1>
[22864.6] md: export_rdev(sdb1)
[22864.6] md: unbind<sde1>
[22864.6] md: export_rdev(sde1)
[22864.6] md: unbind<sdf1>
[22864.6] md: export_rdev(sdf1)
[22864.6] md: unbind<sda1>
[22864.6] md: export_rdev(sda1)

> As long as there are two missing devices no resync will happen so the
> data will not be changed.  So after doing a --create you can fsck and
> mount etc and ensure the data is safe before continuing.

Thank you, that is useful information.

Do you know if the data on /dev/sdc1 would be altered as a result of
it becoming a Spare after it disconnected and reconnected itself?

> But if you cannot get through a sequential read of all devices without
> any read error, you won't be able to rebuild redundancy.  (There are
> plans to make raid6 more robust in this scenario, but they are a long
> way from fruition yet).

Prior to attempting the rebuild, I did the following:

# dd if=/dev/sda1 of=/dev/null &
# dd if=/dev/sdb1 of=/dev/null &
# dd if=/dev/sdc1 of=/dev/null &
# dd if=/dev/sdd1 of=/dev/null &
# dd if=/dev/sde1 of=/dev/null &
# dd if=/dev/sdf1 of=/dev/null &
# dd if=/dev/sdi1 of=/dev/null &
# dd if=/dev/sdj1 of=/dev/null &

I left it running for about an hour, and none of the disks had any errors.
I really hope it is not a permanent fault 75% of the way through the disk.
Though if it was just bad sectors, why would the disk be disconnecting
from the system?

Thanks again for all your help.

 - S.A.







* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-14 21:01                       ` linux-raid.vger.kernel.org
@ 2009-06-15 15:48                         ` Bill Davidsen
  2009-06-16  6:00                         ` Neil Brown
  1 sibling, 0 replies; 19+ messages in thread
From: Bill Davidsen @ 2009-06-15 15:48 UTC (permalink / raw)
  To: linux-raid.vger.kernel.org; +Cc: NeilBrown, linux-raid

linux-raid.vger.kernel.org@atu.cjb.net wrote:
>> This doesn't make a lot of sense.  It should not have been marked
>> as a spare unless someone explicitly tried to "Add" it to the
>> array.
>>
>> However, your description of events suggests that this was automatic,
>> which is strange.
>>     
>
> Yes, it was entirely automatic.  The only command I had running on the computer when it happened was:
>
> # watch -n 0.1 'uptime; echo; cat /proc/mdstat|grep md13 -A 2; echo; dmesg|tac'
>
> This gave me a nice, simple display of what was going on with the
> rebuild, and a monitor of dmesg in case there were any new kernel
> messages.
>
>   
>> Can I get the complete kernel logs from when the rebuild started
>> to when you finally gave up?  It might help me understand.
>>     
>
> Sure.
>
> Just to confirm, /dev/sd{a,b,c,d,e,f}1 are the partitions which
> contain my up-to-date data.  /dev/sd{i,j}1 contain many days old data.
>
> Here is the entire dmesg output during the rebuild:
>   
> I left it running for about an hour, and none of the disks had any errors.
> I really hope it is not a permanent fault 75% of the way through the disk.
> Though if it was just bad sectors, why would the disk be disconnecting
> from the system?
>
> Thanks again for all your help.
>
>   

I really don't see any indication that this is a kernel issue.  My VM 
host machine has multiple VMs, including this "desktop" system, runs 
raid5 and raid10, and has had no "ata" messages in 15 days of uptime, 
obviously with lots of disk use.  The one thought I do have is that it 
is at least possible you have something marginal in your hardware, 
possibly memory or a controller; two things which might be useful to 
check are the memory (memtest) and heat (monitor it with 'sensors').  
I have seen drives which worked fine until you ran them hard for 20-30 
minutes and then started getting errors (usually seek errors).  Just a 
few things to consider, since you have put this much effort into 
characterizing the problem.
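
For example, assuming lm-sensors and smartmontools are installed
(attribute names vary by drive):

# sensors
# smartctl -a /dev/sdc | grep -i -e temperature -e reallocated

Temperature and Reallocated_Sector_Ct are the first things I would
look at on a drive that throws media errors.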

-- 
Bill Davidsen <davidsen@tmr.com>
  Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.



* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-14  8:11                     ` NeilBrown
  2009-06-14 21:01                       ` linux-raid.vger.kernel.org
@ 2009-06-16  3:38                       ` Luca Berra
  2009-06-16  5:00                         ` linux-raid.vger.kernel.org
  1 sibling, 1 reply; 19+ messages in thread
From: Luca Berra @ 2009-06-16  3:38 UTC (permalink / raw)
  To: linux-raid

On Sun, Jun 14, 2009 at 06:11:44PM +1000, NeilBrown wrote:
>On Sun, June 14, 2009 5:10 pm, linux-raid.vger.kernel.org@atu.cjb.net wrote:
>> So here I was thinking everything was fine.  My six disks were working
>> for hours and the other two disks were loaded as spares and the first
>> one was rebuilding, up to 30% with an ETA of 5 hours.  I left the house
>> for a few hours and when I came back, the same disk with read errors
>> before had spontaneously disconnected and reconnected three times (I
>> saw in dmesg).  It probably got around 80% of the way through the six
>> hour rebuild.
>>
>> The problem is that when the /dev/sdc disk reconnected itself after,
>> it was marked as a "Spare", and now I can't use the same command any
>> longer:
>
>This doesn't make a lot of sense.  It should not have been marked as
>a spare unless someone explicitly tried to "Add" it to the array.
>
>I've been thinking that I need to improve mdadm in this respect
>and make it harder to accidentally turn a failed drive into a spare.
>
>However, your description of events suggests that this was automatic,
>which is strange.

udev?

>Can I get the complete kernel logs from when the rebuild started to
>when you finally gave up?  It might help me understand.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-16  3:38                       ` Luca Berra
@ 2009-06-16  5:00                         ` linux-raid.vger.kernel.org
  0 siblings, 0 replies; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-16  5:00 UTC (permalink / raw)
  To: linux-raid

> udev?

Yes.

Although the disk remained as /dev/sdc even after the disconnect
and reconnect.

> > Can I get the complete kernel logs from when the rebuild started
> > to when you finally gave up?  It might help me understand.

I already posted the dmesg log; was there something else that was
needed?

 - S.A.







* Re: Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-14 21:01                       ` linux-raid.vger.kernel.org
  2009-06-15 15:48                         ` Bill Davidsen
@ 2009-06-16  6:00                         ` Neil Brown
  2009-06-16  8:13                           ` linux-raid.vger.kernel.org
  1 sibling, 1 reply; 19+ messages in thread
From: Neil Brown @ 2009-06-16  6:00 UTC (permalink / raw)
  To: linux-raid.vger.kernel.org; +Cc: linux-raid

On Sunday June 14, linux-raid.vger.kernel.org@atu.cjb.net wrote:
> 
> I had tried to add the two old disks (sdi and sdj) while the array was
> in read-only mode for the rebuild, but it didn't allow me.  Is there
> any way to mark the six valid disks as read-only so they will not be
> modified during the rebuild (and not become spares, have their event
> count updated, etc.)?

No.  They won't become spares unless you tell them to, but you cannot
force them to be 100% read-only.

> [ 4421.9] md: md13 switched to read-only mode.
> [ 4549.0] md: md13 switched to read-write mode.
> 
> I again switched back to read-only mode, hoping it would continue
> rebuilding, but it stopped, so I went back to read-write mode and
> it resumed the rebuild.

Yes.  "readonly" means "no writing", including the writing required to
recover or resync the array.

> [ 4628.7] mdadm[19700]: segfault at 0 ip 000000000041617f sp 00007fff87776290 error 4 in mdadm[400000+2a000]
> 
> This new version of mdadm from after my Ubuntu 9.04 upgrade with Linux
> 2.6.28 seg faults every time a new event happens, such as a disk being
> added or removed.  Prior to the upgrade, using Linux 2.6.17 and
> whichever older version of mdadm it had, I had never seen it seg fault.
> 
> # mdadm --version
> 
> mdadm - v2.6.7.1 - 15th October 2008

It would be great if you could get a stack trace of this.  Is it an
"mdadm --monitor" that is dying, or an mdadm running for some other
reason?


> [ 4647.7] ata1.00: cmd 61/c0:38:17:43:63/00:00:00:00:00/40 tag 7 ncq 98304 out
> [ 4647.7]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 4647.7] ata1.00: status: { DRDY }
> [ 4647.7] ata1: hard resetting link
> [ 4648.2] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 4648.2] ata1.00: configured for UDMA/133
> [ 4648.2] ata1: EH complete
> 
> I've noticed that dmesg most often lists disks as "ata1", "ata9" etc.
> and I have found no way to convert these into /dev/sdc style format.
> Do you know how to translate these disk identifiers?  It's really
> quite frustrating not knowing which disk an error/message is from,
> especially when 2 or 3 disks have issues at the same time.

Sorry, I cannot help you there.
I would probably look in /sys and see if anything looks vaguely similar.

> 
> [ 4648.2] sd 0:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
> [ 4648.2] sd 0:0:0:0: [sdi] Write Protect is off
> [ 4648.2] sd 0:0:0:0: [sdi] Mode Sense: 00 3a 00 00
> [ 4648.2] sd 0:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> 
> Following is when I added the last disk back into the set.  I had hoped
> that both disks could rebuild simultaneously, but it seems to force it
> to only rebuild one at a time.  Is there any way to rebuild both disks
> together?  It is frustrating having two idle CPUs during the rebuild,
> and low disk throughput.  I'm guessing mdadm is not a threaded app.

mdadm doesn't do the resync, the kernel does.
It is quite capable of recovering both drives at once, but it is
difficult to tell it to because as soon as you add a drive, it starts
recovery.
What you could do is add both drives, then abort the recovery with
  echo idle > /sys/block/md13/md/sync_action
The recovery will then start again immediately, but using both drives.
A future release of mdadm will 'freeze' the sync action before adding
any drives, then unfreeze afterwards so this will work better in the
future.
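
For example (a sketch, using the device names from your earlier mail):

# mdadm /dev/md13 --add /dev/sdi1
# mdadm /dev/md13 --add /dev/sdj1
# echo idle > /sys/block/md13/md/sync_action
# cat /proc/mdstat

After the 'idle' write, recovery should restart by itself, this time
rebuilding onto both new devices in a single pass.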

> [13696.4] ata5: EH complete
> [13696.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
> [13696.4] sd 4:0:0:0: [sdc] Write Protect is off
> [13696.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [13696.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [13697.4] ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
> [13697.4] ata5.00: irq_stat 0x40000008
> [13697.4] ata5.00: cmd 60/98:28:7f:af:fa/00:00:31:00:00/40 tag 5 ncq 77824 in
> [13697.4]          res 41/40:00:f7:af:fa/09:00:31:00:00/40 Emask 0x409 (media error) <F>
> [13697.4] ata5.00: status: { DRDY ERR }
> [13697.4] ata5.00: error: { UNC }
> [13697.4] ata5.00: configured for UDMA/133
> [13697.4] sd 4:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [13697.4] sd 4:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]

"Medium Error" is not good.  It implies you have lost data.  Though it
might be transient due to heat? or something.

> [13697.4] Descriptor sense data with sense descriptors (in hex):
> [13697.4]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [13697.4]         31 fa af f7
> [13697.4] sd 4:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
> [13697.4] end_request: I/O error, dev sdc, sector 838512631
> [13697.4] raid5:md13: read error not correctable (sector 838512568 on sdc1).
> [13697.4] raid5: Disk failure on sdc1, disabling device.
> [13697.4] raid5: Operation continuing on 5 devices.
> 
> This last line is something I have been baffled by -- how does a RAID-5
> or RAID-6 device continue as "active" when fewer than the minimum number
> of disks is present?  This happened with my RAID-5 swap array losing 2
> disks, and happened above on a RAID-6 with only 5 of 8 disks.  When I
> arrived home, it clearly said the array was still "active".

Just poorly worded messages I guess.  The array doesn't go completely
off-line.  It remains sufficiently active for you to be able to read
any block that isn't on a dead drive.  Possibly there isn't much point
in that. 

> 
> [13697.4] raid5:md13: read error not correctable (sector 838512576 on sdc1).
> [13697.4] raid5:md13: read error not correctable (sector 838512584 on sdc1).
> [13697.4] raid5:md13: read error not correctable (sector 838512592 on sdc1).
> [13697.4] ata5: EH complete
> [13697.4] sd 4:0:0:0: [sdc] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
> [13697.4] sd 4:0:0:0: [sdc] Write Protect is off
> [13697.4] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [13697.4] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [13711.0] md: md13: recovery done.
> 
> What is this "recovery done" referring to?  No recovery was completed.

It just means that it has done all the recovery that it can.  Given the
number of failed devices, that isn't very much.


> 
> I arrived home and performed the following commands
> (I have removed some of the duplicate commands):
> 
> # mdadm --verbose --verbose --detail --scan /dev/md13
> # mdadm --verbose --verbose --detail --scan /dev/md13
> # mdadm /dev/md13 --remove /dev/sdj1 /dev/sdi1
> # mdadm --verbose --verbose --detail --scan /dev/md13
> # mdadm /dev/md13 --remove /dev/sdc1
> # mdadm --verbose --verbose --detail --scan /dev/md13
> # mdadm /dev/md13 --re-add /dev/sdc1

This is where you went wrong.  This will have added /dev/sdc1 as a
spare, because the array was too degraded to have any hope of really
re-adding it.

That is why the metadata on sdc1 no longer reflects its old role in
the array.

Yes: mdadm does need to be improved in this area.

> 
> > As long as there are two missing devices no resync will happen so the
> > data will not be changed.  So after doing a --create you can fsck and
> > mount etc and ensure the data is safe before continuing.
> 
> Thank you, that is useful information.
> 
> Do you know if the data on /dev/sdc1 would be altered as a result of
> it becoming a Spare after it disconnected and reconnected itself?

No, the data will not have been altered.

> 
> > But if you cannot get through a sequential read of all devices without
> > any read error, you won't be able to rebuild redundancy.  (There are
> > plans to make raid6 more robust in this scenario, but they are a long
> > way from fruition yet).
> 
> Prior to attempting the rebuild, I did the following:
> 
> # dd if=/dev/sda1 of=/dev/null &
> # dd if=/dev/sdb1 of=/dev/null &
> # dd if=/dev/sdc1 of=/dev/null &
> # dd if=/dev/sdd1 of=/dev/null &
> # dd if=/dev/sde1 of=/dev/null &
> # dd if=/dev/sdf1 of=/dev/null &
> # dd if=/dev/sdi1 of=/dev/null &
> # dd if=/dev/sdj1 of=/dev/null &
> 
> I left it running for about an hour, and none of the disks had any errors.
> I really hope it is not a permanent fault 75% of the way through the disk.
> Though if it was just bad sectors, why would the disk be disconnecting
> from the system?

Multiple problems I expect.  Maybe something is over-heating or maybe
the controller is a bit dodgy.

You should be able to create the array with

 mdadm --create /dev/md13 -l6 -n8 missing /dev/sdd1 /dev/sda1 /dev/sdf1 \
                                  missing /dev/sdc1 /dev/sde1 /dev/sdb1
        
providing none of the devices have changed names.  Then you should be
able to get at your data.
You could try a recovery again - it might work.
But if it fails, don't remove and re-add drives that you think have
good data.  Rather stop the array and re-assemble with --force.

NeilBrown


* Re: RAID-6 mdadm disks out of sync issue (more questions)
  2009-06-16  6:00                         ` Neil Brown
@ 2009-06-16  8:13                           ` linux-raid.vger.kernel.org
  0 siblings, 0 replies; 19+ messages in thread
From: linux-raid.vger.kernel.org @ 2009-06-16  8:13 UTC (permalink / raw)
  To: linux-raid



> No.  They won't become spares unless you tell them to, but you
> cannot force them to be 100% read-only.

This would be a very nice feature to have, rebuilding disks while
guaranteeing that the data on the "good" disks is not modified in
any way.

> > [ 4628.7] mdadm[19700]: segfault at 0 ip 000000000041617f sp 00007fff87776290 error 4 in mdadm[400000+2a000]
> > 
> > mdadm - v2.6.7.1 - 15th October 2008
> 
> It would be great if you could get a stack trace of this.  Is the
> an "mdadm --monitor" that is dying, or mdadm running for some other
> reason?

It was not me running the command, so presumably it was the
/etc/init.d/mdadm service running.  I looked at that file, and as near
as I can tell (it's fairly confusing and calls many other files) it
runs one of these three commands:

# mdadm --monitor
# mdadm --syslog
# mdadm --monitor --syslog

Is there some way I could modify this script so it would capture the
debugging output when it seg faults?  Or maybe replacing the
/sbin/mdadm binary with a wrapper Bash script that runs the real mdadm
in a debug mode?  I know nothing about debugging mdadm.
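
For instance, would something like this work?  (An untested sketch; it
assumes the default kernel core_pattern, which drops a file named
"core" in the process's working directory.)

# mv /sbin/mdadm /sbin/mdadm.real
# cat > /sbin/mdadm <<'EOF'
#!/bin/sh
# enable core dumps, then hand off to the real binary
ulimit -c unlimited
cd /var/tmp
exec /sbin/mdadm.real "$@"
EOF
# chmod 755 /sbin/mdadm

Then, after the next segfault:

# gdb /sbin/mdadm.real /var/tmp/core
(gdb) bt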

> > I've noticed that dmesg most often lists disks as "ata1", "ata9"
> > etc.  Do you know how to translate these disk identifiers?
> 
> Sorry, I cannot help you there.  I would probably look in /sys and
> see if anything looks vaguely similar.

I spent quite a while looking in /proc and /sys, but wasn't able to
come up with anything.  The only method I came up with was to go back
in the dmesg history until there was a message from only one disk at
a time, in which case it was easy to deduce which "ata*" related to
which "/dev/sd*".
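
Something I intend to try next time (untested; it assumes the full
sysfs device path includes the ataN node, which I believe newer
kernels provide):

# for d in /sys/block/sd?; do printf '%s -> ' "${d##*/}"; readlink -f "$d/device" | grep -o 'ata[0-9]*' | head -n 1; done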

> What you could do is add both drives, then abort the recovery with
>   echo idle > /sys/block/md13/md/sync_action
> The recovery will then start again immediately, but using both drives.

Thank you.

> A future release of mdadm will 'freeze' the sync action before adding
> any drives, then unfreeze afterwards so this will work better in the
> future.

That sounds like a good way to deal with it.

> "Medium Error" is not good.  It implies you have lost data.
> Though it might be transient due to heat? or something.

So far I have only attempted the one rebuild (before this issue of
/dev/sdc1 being turned into a Spare).  I have switched the disk from
the motherboard SATA port to one on the PCI card.  Hopefully it can
get through the rebuild this time.

It wouldn't be due to heat currently, as the disks are as well
ventilated as is possible: in a plastic frame outside the computer,
open to the air, several inches from other disks, with a standing
room fan blowing air over them.

Before the crash a week ago, 4 of the 8 disks were close together in
a 3.5" metal drive cage and were quite hot to the touch.  I'm not
sure if the problematic /dev/sdc disk was in the group of hot disks
or not.  Also not sure if heat can cause permanent damage to a disk.
I think it's just a bad batch of disks; when I recently went looking
online for people with the same model disk, there were lots of
comments about them dying.

> The array doesn't go completely off-line.  It remains sufficiently
> active for you to be able to read any block that isn't on a dead
> drive.  Possibly there isn't much point in that.

I see.  It seems unlikely that I would be able to swapoff 100MB+ of
data in such a situation, but I was able to after the RAID-5 on
/dev/md9 lost two disks.

> > What is this "recovery done" referring to?
> > No recovery was completed.
> 
> I just means that it has done all the recovery that it can.

Perhaps a more appropriate message would be something like:

"unable to continue with recovery, aborting"

> > I arrived home and performed the following commands
> > (I have removed some of the duplicate commands):
> > 
> > # mdadm --verbose --verbose --detail --scan /dev/md13
> > # mdadm --verbose --verbose --detail --scan /dev/md13
> > # mdadm /dev/md13 --remove /dev/sdj1 /dev/sdi1
> > # mdadm --verbose --verbose --detail --scan /dev/md13
> > # mdadm /dev/md13 --remove /dev/sdc1
> > # mdadm --verbose --verbose --detail --scan /dev/md13
> > # mdadm /dev/md13 --re-add /dev/sdc1
> 
> This is where you went wrong.  This will have added /dev/sdc1 as
> a spare, because the array was too degraded to have any hope of
> really re-adding it.

Do you mean that "mdadm --detail --scan /dev/md13" caused the disk
to become marked as a Spare?  Because it was listed as a Spare the
first time I ran that command when I arrived home.

> You should be able to create the array with
> 
>  mdadm --create /dev/md13 -l6 -n8 missing /dev/sdd1 /dev/sda1 /dev/sdf1 \
>                                   missing /dev/sdc1 /dev/sde1 /dev/sdb1
>         
> providing none of the devices have changed names.
> Then you should be able to get at your data.
> You could try a recovery again - it might work.

Thank you.

> But if it fails, don't remove and re-add drives that you think have
> good data.  Rather stop the array and re-assemble with --force.

Okay.  But just to clarify: this last time, when the rebuild failed, I
did not --remove and --re-add any disks until after I saw that
/dev/sdc had become a Spare.

I am off to sleep now, will try the rebuild before I go to work.

Thanks for all your help, very much appreciated.

 - S.A.





