From: Florian Lampel <florian.lampel@gmail.com>
To: Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 dead on the water after Controller failure
Date: Sat, 15 Feb 2014 19:52:27 +0100
Message-ID: <8D85C29C-685E-457A-BA2A-5F9069122D88@gmail.com>
In-Reply-To: <52FF83F1.3030904@turmel.org>

Am 15.02.2014 um 16:12 schrieb Phil Turmel <philip@turmel.org>:

> Good morning Florian,

Good evening - it's 19:37 here in Austria.

> Device order has changed, summary:
> 
> /dev/sda1: WD-WMC300595440 Device #4 @442
> /dev/sdb1: WD-WMC300595880 Device #5 @442
> /dev/sdc1: WD-WMC1T1521826 Device #6 @442
> /dev/sdd1: WD-WMC300314126 spare
> /dev/sde1: WD-WMC300595645 Device #8 @435
> /dev/sdf1: WD-WMC300314217 Device #9 @435
> /dev/sdg1: WD-WMC300595957 Device #10 @435
> /dev/sdh1: WD-WMC300313432 Device #11 @435
> /dev/sdj1: WD-WMC300312702 Device #0 @442
> /dev/sdk1: WD-WMC300248734 Device #1 @442
> /dev/sdl1: WD-WMC300314248 Device #2 @442
> /dev/sdm1: WD-WMC300585843 Device #3 @442
> 
> and your SSD is now /dev/sdi.

Thank you again for going through all those logs and helping me. 

> Not quite.  What was 'h' is now 'd'.  Use:
> 
> mdadm -Afv /dev/md0 /dev/sd[abcefghjklm]1

Well, that did not go as well as I had hoped. Here is what happened:

root@Lserve:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@Lserve:~# mdadm -Afv /dev/md0 /dev/sd[abcefghjklm]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 9.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 10.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 11.
mdadm: /dev/sdj1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdk1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdl1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdm1 is identified as a member of /dev/md0, slot 3.
mdadm: forcing event count in /dev/sde1(8) from 435 upto 442
mdadm: forcing event count in /dev/sdf1(9) from 435 upto 442
mdadm: forcing event count in /dev/sdg1(10) from 435 upto 442
mdadm: forcing event count in /dev/sdh1(11) from 435 upto 442
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sde1
mdadm: clearing FAULTY flag for device 4 in /dev/md0 for /dev/sdf1
mdadm: clearing FAULTY flag for device 5 in /dev/md0 for /dev/sdg1
mdadm: clearing FAULTY flag for device 6 in /dev/md0 for /dev/sdh1
mdadm: Marking array /dev/md0 as 'clean'
mdadm: added /dev/sdk1 to /dev/md0 as 1
mdadm: added /dev/sdl1 to /dev/md0 as 2
mdadm: added /dev/sdm1 to /dev/md0 as 3
mdadm: added /dev/sda1 to /dev/md0 as 4
mdadm: added /dev/sdb1 to /dev/md0 as 5
mdadm: added /dev/sdc1 to /dev/md0 as 6
mdadm: no uptodate device for slot 7 of /dev/md0
mdadm: added /dev/sde1 to /dev/md0 as 8
mdadm: added /dev/sdf1 to /dev/md0 as 9
mdadm: added /dev/sdg1 to /dev/md0 as 10
mdadm: added /dev/sdh1 to /dev/md0 as 11
mdadm: added /dev/sdj1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 11 drives - not enough to start the array.

And here is what /proc/mdstat shows:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdj1[0](S) sdh1[11](S) sdg1[10](S) sdf1[9](S) sde1[8](S) sdc1[6](S) sdb1[5](S) sda1[4](S) sdm1[3](S) sdl1[2](S) sdk1[1](S)
      21488646696 blocks super 1.0
       
unused devices: <none>

It seems every HDD got marked as a spare. Why would mdadm do this, and how can I convince it that they are not spares?
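In case it helps, here is what I thought I could run to double-check what each member's superblock actually reports (just a sketch; the device list is the one from above, and I'm only grepping the fields I assume are relevant):

   # sketch: dump role/state/event count from each member's superblock
   for d in /dev/sd[abcefghjklm]1; do
       echo "== $d =="
       mdadm --examine "$d" | grep -E 'Device Role|Array State|Events'
   done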


> That would be a good time to backup any critical data that isn't
> already in a backup.

Crashplan had backed up about 30% before this happened. 20 TB is a lot to upload.

> One more thing:  your drives report never having a self-test run.  You
> should have a cron job that triggers a long background self-test on a
> regular basis.  Weekly, perhaps.
> 
> Similarly, you should have a cron job trigger an occasional "check"
> scrub on the array, too.  Not at the same time as the self-tests,
> though.  (I understand some distributions have this already.)

I will certainly do so in the future.
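
For the record, I am thinking of something along these lines (an untested sketch; the file path, schedule, and device names are my assumptions, with the SSD at sdi deliberately skipped):

   # /etc/cron.d/raid-maintenance (assumed path)
   # weekly long SMART self-test on all array members, Saturday 03:00
   0 3 * * 6  root  for d in /dev/sd[a-h] /dev/sd[j-m]; do /usr/sbin/smartctl -t long "$d"; done
   # monthly md "check" scrub, deliberately not on the same day as the self-tests
   0 4 1 * *  root  echo check > /sys/block/md0/md/sync_action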

Thanks again, everyone, and I hope this will all end well.

Thanks,
Florian Lampel


