* multiple disk failures in an md raid6 array
@ 2013-04-03 13:19 Vanhorn, Mike
  2013-04-03 23:33 ` Phil Turmel
  0 siblings, 1 reply; 7+ messages in thread
From: Vanhorn, Mike @ 2013-04-03 13:19 UTC (permalink / raw)
  To: linux-raid


Now, I don't think that 3 disks have all gone bad at the same time, but as
md seems to think that they have, how do I proceed with this?

Normally, it's a RAID 6 array, with sdc - sdi being active and sdj being a
spare (that is, 8 disks total with one spare).

Here's what my raid looks like now:

[root ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Dec 13 16:10:58 2012
     Raid Level : raid6
     Array Size : 9766901760 (9314.44 GiB 10001.31 GB)
  Used Dev Size : 1953380352 (1862.89 GiB 2000.26 GB)
   Raid Devices : 7
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Wed Apr  3 02:15:16 2013
          State : clean, FAILED
 Active Devices : 4
Working Devices : 5
 Failed Devices : 3
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : myhostname:0  (local to host myhostname)
           UUID : c98a2a7b:f051a80c:2fa73177:757a5be1
         Events : 5066

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       49        1      active sync   /dev/sdd1
       2       0        0        2      removed
       3       0        0        3      removed
       4       8       97        4      active sync   /dev/sdg1
       5       8      113        5      active sync   /dev/sdh1
       6       8      129        6      active sync   /dev/sdi1

       0       8       33        -      faulty spare   /dev/sdc1
       2       8       65        -      faulty spare
       3       8       81        -      faulty spare   /dev/sdf1
       7       8      145        -      spare   /dev/sdj1
[root ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc1[0](F) sdj1[7](S) sdi1[6] sdh1[5] sdg1[4]
sdf1[3](F) sde1[2](F) sdd1[1]
      9766901760 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/4]
[_U__UUU]
      
unused devices: <none>
[root ~]#

It seems that at some point last night, sde went bad and was taken out of
the array and the spare, sdj, was put in its place and the raid began to
rebuild. At that point, I would have waited until the rebuild was
complete, and then replaced sde and brought it all back. However, the
rebuild seems to have died, and now I have the situation shown above.

So, I can believe that sde actually is bad, but it seems unlikely to me
that all of them are bad, especially since the SMART tests I run have all
been coming back fine up to this point. Actually, according to SMART, most
of them are good:

sdc:
SMART overall-health self-assessment test result: PASSED
sdd:
SMART overall-health self-assessment test result: PASSED
sde:
sdf:
SMART overall-health self-assessment test result: PASSED
sdg:
SMART overall-health self-assessment test result: PASSED
sdh:
SMART overall-health self-assessment test result: PASSED
sdi:
SMART overall-health self-assessment test result: PASSED
sdj:
SMART overall-health self-assessment test result: FAILED!

And so it appears that sde has died (it seems to have disappeared from the
system entirely). And sdj appears to have enough bad blocks that SMART is
labeling it as bad:

[root ~]# /usr/sbin/smartctl -H -d ata /dev/sde
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local
build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/sde failed: No such device
[root ~]# /usr/sbin/smartctl -H -d ata /dev/sdj
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local
build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED
WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   058   058   140    Pre-fail  Always
FAILING_NOW 1134

Is there some way I can keep this array going? I do have one spare disk on
the shelf that I can put in (which is what I would have done), but how do
I get it to consider sdc and sdf as okay?

Thanks!




---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
  2013-04-03 13:19 multiple disk failures in an md raid6 array Vanhorn, Mike
@ 2013-04-03 23:33 ` Phil Turmel
  2013-04-05  8:25   ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 7+ messages in thread
From: Phil Turmel @ 2013-04-03 23:33 UTC (permalink / raw)
  To: Vanhorn, Mike; +Cc: linux-raid

Hi Mike,

On 04/03/2013 09:19 AM, Vanhorn, Mike wrote:

> Now, I don't think that 3 disks have all gone bad at the same time, but as
> md seems to think that they have, how do I proceed with this?

They generally don't all go bad together.  I smell a classic error-timeout
mismatch between non-raid drives and the Linux driver defaults.

Aside from that, it should be just an --assemble --force with at least the
five "best" drives (determined by event counts).  But you need to fix your
timeouts first, or the array will keep failing.
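
A quick way to compare event counts across the members--assuming the usual
"Events" line in mdadm's examine output--is something like:

for x in /dev/sd[cdfghi]1 ; do echo -n "$x: " ; mdadm -E $x | grep Events ; done

The members whose counts sit closest together are the ones --assemble
--force can use most safely.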

But first, before *any* other task, you need to completely document your
devices:

mdadm -E /dev/sd[cdfghij]1 >examine.txt
lsdrv >lsdrv.txt
for x in /dev/sd[cdfghij] ; do smartctl -x $x ; done >smart.txt
for x in /sys/block/sd[cdfghij] ; do echo $x: $(< $x/device/timeout) ; done >timeout.txt

{in lieu of lsdrv[1], you could excerpt "ls -l /dev/disk/by-id/"}

> Normally, it's a RAID 6 array, with sdc - sdi being active and sdj being a
> spare (that is, 8 disks total with one spare).

Ok.

[trim /]

> It seems that at some point last night, sde went bad and was taken out of
> the array and the spare, sdj, was put in its place and the raid began to
> rebuild. At that point, I would have waited until the rebuild was
> complete, and then replaced sde and brought it all back. However, the
> rebuild seems to have died, and now I have the situation shown above.

Ok.

> So, I can believe that sde actually is bad, but it seems unlikely to me
> that all of them are bad, especially since the SMART tests I run have all
> been coming back fine up to this point. Actually, according to SMART, most
> of them are good:

[trim /]

> system entirely). And sdj appears to have enough bad blocks that SMART is
> labeling it as bad:
> 
> [root ~]# /usr/sbin/smartctl -H -d ata /dev/sde
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local
> build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> Smartctl open device: /dev/sde failed: No such device
> [root ~]# /usr/sbin/smartctl -H -d ata /dev/sdj
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local
> build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> Failed Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED
> WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct   0x0033   058   058   140    Pre-fail  Always
> FAILING_NOW 1134

Yup. Toast.  Discard /dev/sdj along with /dev/sde.

> Is there some way I can keep this array going? I do have one spare disk on
> the shelf that I can put in (which is what I would have done), but how do
> I get it to consider sdc and sdf as okay?

I recommend:

1) Fix timeouts as needed.  Either set your drives' ERC to 7.0 seconds,
or raise the driver timeouts to ~180 seconds.  Modern *desktop* drives go to
great lengths to read bad sectors--trying for two minutes or more whenever
bad sectors are encountered.  Modern *enterprise* drives, and other drives
advertised as raid-capable, have short error timeouts by default (typically
7.0 seconds).  When a desktop drive is in error recovery, it *ignores* the
controller until it has an answer.  Linux MD raid sees the driver time out
after 30 seconds, tries to rewrite the problem sector, finds the drive isn't
listening, and kicks the drive out.
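
A minimal sketch of both options, assuming the drives show up under the
usual sysfs path and (for option A) support SCT ERC via smartctl--adjust
the device list to match yours:

# Option A: set ERC to 7.0 seconds (70 deciseconds) where supported
for x in /dev/sd[cdfghi] ; do smartctl -l scterc,70,70 $x ; done
# Option B: raise the kernel driver timeout where ERC is not supported
for x in /sys/block/sd[cdfghi] ; do echo 180 > $x/device/timeout ; done

Neither setting is expected to survive a reboot, hence step 5 below.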

2) Stop the array and re-assemble it with:

mdadm --assemble --force /dev/md0 /dev/sd[cdfghi]1

3) Manually scrub the degraded array (effectively raid5).  This will fix your
latent unrecoverable read errors, so long as you don't have too many.

echo check >/sys/block/md0/md/sync_action
cat /proc/mdstat
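
While it runs, /proc/mdstat shows progress; once it finishes, the mismatch
count can be read (assuming the kernel exposes md's usual sysfs attribute)
with:

cat /sys/block/md0/md/mismatch_cnt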

4) Add new drive(s) and let the array rebuild.  (Make sure the new drives have
proper timeouts, too.)
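
For example--assuming the replacement appears as /dev/sdk (a hypothetical
name) and is partitioned to match the other members:

mdadm --add /dev/md0 /dev/sdk1
cat /proc/mdstat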

5) Add appropriate instructions to rc.local to set proper timeouts on every boot.

6) Add cron jobs that will trigger a regular scrub (weekly?) and long SMART
self-tests.
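
A sketch of such a cron file; the schedule, file name, and device list are
assumptions to adapt:

# /etc/cron.d/md-maintenance (hypothetical)
# weekly scrub, Sunday 03:00
0 3 * * 0  root  echo check > /sys/block/md0/md/sync_action
# weekly long SMART self-test on each member, Saturday 02:00
0 2 * * 6  root  for x in /dev/sd[cdfghi] ; do /usr/sbin/smartctl -t long $x ; done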

HTH,

Phil

[1] http://github.com/pturmel/lsdrv






^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
  2013-04-03 23:33 ` Phil Turmel
@ 2013-04-05  8:25   ` Roy Sigurd Karlsbakk
  2013-04-05 12:05     ` Phil Turmel
  0 siblings, 1 reply; 7+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-05  8:25 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, Mike Vanhorn

> for x in /sys/block/sd[cdfghij] ; do echo $x: $(< $x/device/timeout) ;
> done >timeout.txt

I never got that one to work. This one did, though:

# for x in /sys/block/sd[cdfghij] ; do echo -n "$x: "; cat $x/device/timeout; done

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
[Translated from Norwegian:] In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases adequate and relevant synonyms exist in Norwegian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
  2013-04-05  8:25   ` Roy Sigurd Karlsbakk
@ 2013-04-05 12:05     ` Phil Turmel
  2013-04-05 17:06       ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 7+ messages in thread
From: Phil Turmel @ 2013-04-05 12:05 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: linux-raid, Mike Vanhorn

On 04/05/2013 04:25 AM, Roy Sigurd Karlsbakk wrote:
>> for x in /sys/block/sd[cdfghij] ; do echo $x: $(< $x/device/timeout) ;
>> done >timeout.txt
> 
> I never got that one to work. This one did, though:
> 
> # for x in /sys/block/sd[cdfghij] ; do echo -n "$x: "; cat $x/device/timeout; done
> 
> Vennlige hilsener / Best regards

Curious.  Works fine here:

# for x in /sys/block/sd[abcd] ; do echo $x $(< $x/device/timeout) ; done
/sys/block/sda 30
/sys/block/sdb 30
/sys/block/sdc 30
/sys/block/sdd 30

Not that it matters much.  Use what works for you.

Phil

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
  2013-04-05 12:05     ` Phil Turmel
@ 2013-04-05 17:06       ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 7+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-05 17:06 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid, Mike Vanhorn

> > # for x in /sys/block/sd[cdfghij] ; do echo -n "$x: "; cat
> > $x/device/timeout; done
> >
> > Vennlige hilsener / Best regards
> 
> Curious. Works fine here:
> 
> # for x in /sys/block/sd[abcd] ; do echo $x $(< $x/device/timeout) ;
> done
> /sys/block/sda 30
> /sys/block/sdb 30
> /sys/block/sdc 30
> /sys/block/sdd 30
> 
> Not that it matters much. Use what works for you.

probably a typo - works now :P

-- 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
[Translated from Norwegian:] In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases adequate and relevant synonyms exist in Norwegian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
  2013-04-11 20:36 ` Phil Turmel
@ 2013-04-11 20:48   ` Vanhorn, Mike
  0 siblings, 0 replies; 7+ messages in thread
From: Vanhorn, Mike @ 2013-04-11 20:48 UTC (permalink / raw)
  To: Phil Turmel, Mike VanHorn; +Cc: linux-raid

>>
>>
>>
>>Also, Microsoft's mail server from whence my message was
>> originating has been blacklisted on your server, so I am
>> sending this to you from my personal account on Yahoo!.
>
>You really need to fix your server, then, or just use this yahoo
>account for linux-raid.  My server just uses standard SPF validation
>and common dns blacklists.

Well, I have no control over that, as the University is a Microsoft
customer, but it appears it's been cleared up now because things are going
through again.

>Are you already doing weekly scrubs and drive self-tests?

Yes. I have it do a scrub by writing "check" to
/sys/block/md0/md/sync_action from cron.weekly, and I have a weekly script
that runs SMART tests, too.

>Do you still have the complete dmesg from the original triple
>failure?

Unfortunately, no. I thought I kept it, but I have either misplaced the
file or just didn't do it like I thought I did.

There has been a reboot since the failure, and sde has magically come back
and seems to be okay, so the only "bad" disk is actually sdj, which was
the spare.  However, see my other thread about "Odd --examine output" for
more info; I haven't been able to reassemble the array (even though I
should have enough disks) because the metadata doesn't seem to be in the
right place. On /dev/sd[cdefi], there is one set of metadata (from what I
believe was an earlier incarnation of the array, so that data is now
invalid), and on /dev/sd[gh]1 there is a newer set of metadata whose dates
correspond with when this array was created. So I'm thinking the current
metadata is actually on /dev/sd[cdefi]1, but I can't get to it because
those device nodes don't exist (I can't run, for example, mdadm -E
/dev/sdc1, because /dev/sdc1 doesn't exist).

As I stated in the other thread, I am very confused.

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: multiple disk failures in an md raid6 array
       [not found] <1365607598.94859.YahooMailNeo@web161904.mail.bf1.yahoo.com>
@ 2013-04-11 20:36 ` Phil Turmel
  2013-04-11 20:48   ` Vanhorn, Mike
  0 siblings, 1 reply; 7+ messages in thread
From: Phil Turmel @ 2013-04-11 20:36 UTC (permalink / raw)
  To: Mike VanHorn; +Cc: linux-raid

Hi Mike,

On 04/10/2013 11:26 AM, Mike VanHorn wrote:
> For some reason, my replies to the linux-raid list aren't going
> through, and not all of the messages from the list seem to be
> getting to me, either, so I hope it is okay that I am replying
> to you directly.

It's ok, but I am adding the list back.

> Also, Microsoft's mail server from whence my message was
> originating has been blacklisted on your server, so I am
> sending this to you from my personal account on Yahoo!.

You really need to fix your server, then, or just use this yahoo
account for linux-raid.  My server just uses standard SPF validation
and common DNS blacklists.

> In your reply, you said
> 
>> I recommend:
>>
>> 1) Fix timeouts as needed.  Either set your drives' ERC to 7.0
>> seconds, or raise the driver timeouts to ~180 seconds.
> 
> As it turns out, the drives in question aren't ERC capable:
> 
> # smartctl -l scterc,70,70 /dev/sdc
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local
> build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> <http://smartmontools.sourceforge.net/>
> 
> Warning: device does not support SCT Error Recovery Control command
> #
> 
> However, when I do the following
> 
> for x in /sys/block/sd[cdfghij] ; do echo $x: $(< $x/device/timeout) ;
> done>timeout.txt
> 
> I get output such as
> 
> /sys/block/sdj: 180
> 
> because it seems that I've previously discovered that they aren't ERC capable, as I'm setting the timeout in /etc/rc.local like so:
> 
> echo 180 >/sys/block/sdc/device/timeout
> echo 180 >/sys/block/sdd/device/timeout
> echo 180 >/sys/block/sde/device/timeout
> echo 180 >/sys/block/sdf/device/timeout
> echo 180 >/sys/block/sdg/device/timeout
> echo 180 >/sys/block/sdh/device/timeout
> echo 180 >/sys/block/sdi/device/timeout
> echo 180 >/sys/block/sdj/device/timeout
> 
> Doing this is what is meant by changing the driver's timeout, correct?

Yes.

> Should I be setting this for an even longer period of time?

No.

> Thank you for helping me to understand what is going on!

Are you already doing weekly scrubs and drive self-tests?

Do you still have the complete dmesg from the original triple
failure?

> Mike VanHorn
> Senior Computer Systems Administrator
> College of Engineering and Computer Science
> Wright State University
> 265 Russ Engineering Center
> 937-775-5157
> michael.vanhorn@wright.edu
> http://www.cecs.wright.edu/~mvanhorn/

Phil

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-04-11 20:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-03 13:19 multiple disk failures in an md raid6 array Vanhorn, Mike
2013-04-03 23:33 ` Phil Turmel
2013-04-05  8:25   ` Roy Sigurd Karlsbakk
2013-04-05 12:05     ` Phil Turmel
2013-04-05 17:06       ` Roy Sigurd Karlsbakk
     [not found] <1365607598.94859.YahooMailNeo@web161904.mail.bf1.yahoo.com>
2013-04-11 20:36 ` Phil Turmel
2013-04-11 20:48   ` Vanhorn, Mike
