* md raid6 not working
@ 2012-08-20 19:06 Vanhorn, Mike
  2012-08-20 22:34 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Vanhorn, Mike @ 2012-08-20 19:06 UTC (permalink / raw)
  To: linux-raid


I have (or had) an 8-disk md RAID6, /dev/md0. At some point over the
weekend, two of the disks suddenly became marked as "spare" and another has
disappeared completely (at least as far as mdadm is concerned).

All eight disks seem to be just fine, so I think the data is okay, and if
I could just convince it to start the array with all 8 disks, I actually
think everything would be fine. However, everything I've tried has come to
nothing, and now I think I am stuck.

Is there some way to just "force" it to change the two spare disks from
"spare" to "active", and then let it go?

Here's what I think are relevant details:

The RAID is/was composed of /dev/sd[bcdefghi]1.

/proc/mdstat says:

# cat /proc/mdstat
Personalities : [raid6]
md0 : inactive sdc1[1] sdd1[10] sdi1[8] sdg1[5] sdf1[4] sde1[3] sdh1[2]
      13674583552 blocks
       
unused devices: <none>
# 
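As a sanity check (my own arithmetic, not anything mdadm prints): for an inactive array, the "blocks" line in /proc/mdstat is just the sum of the raw member sizes, and 13674583552 is exactly seven times the per-device size of 1953511936 KB that --examine reports below, which is consistent with seven of the eight members being present.

```python
# Sanity check on the mdstat "blocks" figure for the inactive array
# (my arithmetic, assuming the figure is the sum of member sizes).
per_device_kb = 1953511936   # Used Dev Size from mdadm --examine, in KB
members_present = 7          # sdc1 sdd1 sdi1 sdg1 sdf1 sde1 sdh1

total_kb = per_device_kb * members_present
assert total_kb == 13674583552  # matches the /proc/mdstat blocks line
print(total_kb)
```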

So, here, sdb is the only one missing. However, if I try to start the array:

# mdadm --assemble /dev/md0
mdadm: /dev/sdi1 has no superblock - assembly aborted
#


So, I check /dev/sdi1:

# mdadm --examine /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6b8b4567:327b23c6:643c9869:66334873
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 0

    Update Time : Mon Aug 20 12:10:18 2012
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 297da62d - correct
         Events : 59235337

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     8       8      129        8      spare   /dev/sdi1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      active sync   /dev/sdf1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed
   8     8       8      129        8      spare   /dev/sdi1
#


The fact that that command worked on /dev/sdi1 indicates that there is, in
fact, a superblock, doesn't it?
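The size fields in that output look internally consistent, too. For RAID6, two devices' worth of space goes to parity, so usable capacity is (n - 2) times the device size, and the reported Array Size matches that (a quick check of my own, not something mdadm verifies for you):

```python
# RAID6 capacity check (my own arithmetic): usable space is
# (raid_devices - 2) * used_dev_size, since two devices' worth of
# space holds parity. Values in KB, taken from --examine above.
raid_devices = 8
used_dev_size_kb = 1953511936

array_size_kb = (raid_devices - 2) * used_dev_size_kb
assert array_size_kb == 11721071616  # matches the reported Array Size
print(array_size_kb)
```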

At any rate, going from the output of --examine on sdi1, it would seem
that /dev/sdd1 is also not working. So,

# mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6b8b4567:327b23c6:643c9869:66334873
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
   Raid Devices : 8
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 20 12:10:21 2012
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 297da583 - correct
         Events : 59235338

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    10       8       49       -1      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      active sync   /dev/sdf1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed
# 


Which would seem to indicate that sdd1 is fine, too. So, then, what about
sdb1?

# mdadm --examine /dev/sdb1
mdadm: No md superblock detected on /dev/sdb1.
#


Okay, fine, maybe something actually has happened to sdb1. However, since
it's a RAID6, having that one bad disk should be survivable, if I could
just get the other two disks (sdi1 and sdd1) to not be spares.
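As I understand the startability rule (my reading, not the kernel's actual code), that is exactly the problem: RAID6 can run with up to two members missing, but spares hold no array data and don't count toward the minimum, so 5 active + 2 spare is one short.

```python
# Sketch of the md startability rule for RAID6 (my reading, not the
# kernel's actual logic): the array can run if at least
# raid_devices - 2 members are in-sync. Spares don't count.
def raid6_can_start(raid_devices, active_members):
    return active_members >= raid_devices - 2

assert raid6_can_start(8, 6)      # one disk lost: degraded but startable
assert not raid6_can_start(8, 5)  # this array: 5 active + 2 spares fails
```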


---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: md raid6 not working
  2012-08-20 19:06 md raid6 not working Vanhorn, Mike
@ 2012-08-20 22:34 ` NeilBrown
  2012-08-21 11:17   ` Vanhorn, Mike
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2012-08-20 22:34 UTC (permalink / raw)
  To: Vanhorn, Mike; +Cc: linux-raid


On Mon, 20 Aug 2012 19:06:17 +0000 "Vanhorn, Mike"
<michael.vanhorn@wright.edu> wrote:

> 
> I have (or had) an 8-disk md RAID6, /dev/md0. At some point over the
> weekend, two of the disks suddenly became marked as "spare" and another has
> disappeared completely (at least as far as mdadm is concerned).
> 
> All eight disks seem to be just fine, so I think the data is okay, and if
> I could just convince it to start the array with all 8 disks, I actually
> think everything would be fine. However, everything I've tried has come to
> nothing, and now I think I am stuck.
> 
> Is there some way to just "force" it to change the two spare disks from
> "spare" to "active", and then let it go?
> 
> Here's what I think are relevant details:
> 
> The RAID is/was composed of /dev/sd[bcdefghi]1.
> 
> /proc/mdstat says:
> 
> # cat /proc/mdstat
> Personalities : [raid6]
> md0 : inactive sdc1[1] sdd1[10] sdi1[8] sdg1[5] sdf1[4] sde1[3] sdh1[2]
>       13674583552 blocks
>        
> unused devices: <none>
> # 
> 
> So, here, sdb is the only one missing. However, if I try to start the array
> 
> # mdadm --assemble /dev/md0
> mdadm: /dev/sdi1 has no superblock - assembly aborted
> #

What is the result of:

  mdadm -S /dev/md0
  mdadm -Avvv /dev/md0
??

NeilBrown





* RE: md raid6 not working
  2012-08-20 22:34 ` NeilBrown
@ 2012-08-21 11:17   ` Vanhorn, Mike
  2012-08-21 22:38     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Vanhorn, Mike @ 2012-08-21 11:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thank you for your response. Here is the output you requested.

>What is the result of:
>
>  mdadm -S /dev/md0

# mdadm -S /dev/md0
mdadm: stopped /dev/md0
#

>  mdadm -Avvv /dev/md0

# mdadm -Avvv /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdi1: Device or resource busy
mdadm: /dev/sdi1 has no superblock - assembly aborted
#

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/



* Re: md raid6 not working
  2012-08-21 11:17   ` Vanhorn, Mike
@ 2012-08-21 22:38     ` NeilBrown
  2012-08-22 12:57       ` Vanhorn, Mike
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2012-08-21 22:38 UTC (permalink / raw)
  To: Vanhorn, Mike; +Cc: linux-raid


On Tue, 21 Aug 2012 11:17:57 +0000 "Vanhorn, Mike"
<michael.vanhorn@wright.edu> wrote:

> Thank you for your response. Here is the output you requested.
> 
> >What is the result of:
> >
> >  mdadm -S /dev/md0
> 
> # mdadm -S /dev/md0
> mdadm: stopped /dev/md0
> #
> 
> >  mdadm -Avvv /dev/md0
> 
> # mdadm -Avvv /dev/md0
> mdadm: looking for devices for /dev/md0
> mdadm: cannot open device /dev/sdi1: Device or resource busy
> mdadm: /dev/sdi1 has no superblock - assembly aborted
> #

So /dev/sdi1 is busy.  You need to find out why.  (The "no superblock"
message is a bit misleading; I might have fixed that in newer mdadm, I'm
not sure.)

The "/proc/mdstat" that you showed in the original email had sdi1 as a member
of md0, so it clearly wasn't being used by anything else then.
"mdadm -S /dev/md0" would have removed it from md0 so it shouldn't have been
busy.
The fact that it is busy is very odd.

A device can be busy if:
 - it is mounted as a filesystem
 - it is active as swap
 - it is part of an md array
 - it is part of a dm device
 - probably something else, but those are the main ones.
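The first three of those checks can be scripted against /proc/mounts, /proc/swaps, and /proc/mdstat. A rough sketch of my own (not an mdadm feature; it takes the file contents as strings so the parsing is easy to test):

```python
# Rough script of the busy-device checklist above (my own sketch).
# The real checks are reading /proc/mounts, /proc/swaps, and
# /proc/mdstat; contents are passed in as strings here.
def busy_reasons(dev, mounts, swaps, mdstat):
    reasons = []
    # /proc/mounts: first field of each line is the mounted device
    if any(line.split()[0] == dev
           for line in mounts.splitlines() if line.strip()):
        reasons.append("mounted filesystem")
    # /proc/swaps: header line first, then one line per swap device
    if any(line.startswith(dev) for line in swaps.splitlines()[1:]):
        reasons.append("active swap")
    # /proc/mdstat lists members as e.g. "sdi1[8]"
    base = dev.rsplit("/", 1)[-1]
    if any(base + "[" in line for line in mdstat.splitlines()):
        reasons.append("member of an md array")
    return reasons

# Example with made-up /proc contents:
print(busy_reasons("/dev/sdi1",
                   mounts="/dev/sda1 / ext4 rw 0 0",
                   swaps="Filename Type Size Used Priority",
                   mdstat="md0 : inactive sdi1[8] sdc1[1]"))
# -> ['member of an md array']
```

Checking for dm holders would mean `dmsetup table` or looking under /sys/block/*/holders/, which this sketch leaves out.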

NeilBrown



* Re: md raid6 not working
  2012-08-21 22:38     ` NeilBrown
@ 2012-08-22 12:57       ` Vanhorn, Mike
  0 siblings, 0 replies; 5+ messages in thread
From: Vanhorn, Mike @ 2012-08-22 12:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 8/21/12 6:38 PM, "NeilBrown" <neilb@suse.de> wrote:

>># mdadm -Avvv /dev/md0
>>mdadm: looking for devices for /dev/md0
>>mdadm: cannot open device /dev/sdi1: Device or resource busy
>>mdadm: /dev/sdi1 has no superblock - assembly aborted
>>#
>
>So /dev/sdi1 is busy.  You need to find out why.  (the "no superblock"
>message is a bit misleading... I might have fixed that in newer mdadm, I'm
>not sure).
>
>The "/proc/mdstat" that you showed in the original email had sdi1 as a
>member
>of md0, so it clearly wasn't being used by anything else then.
>"mdadm -S /dev/md0" would have removed it from md0 so it shouldn't have
>been
>busy.
>The fact that it is busy is very odd.
>
>A device can be busy if:
>- it is mounted as a filesystem
>- it is active as swap
>- it is part of an md array
>- it is part of a dm device
>- probably something else, but those are the main ones.
>
>


Okay, I went to investigate what was using /dev/sdi1 yesterday morning
when I tried to assemble the array. I couldn't find anything at all that
would have been doing anything with that disk, so I simply tried the
assemble again, and this time it worked (well, sort of):

# mdadm -Avvv /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is not one of
/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdh1 to /dev/md0 as 2
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: added /dev/sdf1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: no uptodate device for slot 6 of /dev/md0
mdadm: no uptodate device for slot 7 of /dev/md0
mdadm: added /dev/sdi1 to /dev/md0 as 8
mdadm: added /dev/sdd1 to /dev/md0 as -1
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: /dev/md0 assembled from 5 drives and 2 spares - not enough to start
the array.
#

So, sdi1 seems to be just fine. However, since two of the disks are
getting marked as spares, it can't start the array. I don't ever recall
setting those two disks up as spares, and even if I had, wouldn't one of
the spares have kicked in when sdb1 went bad? Or am I not understanding
the concept of a spare as it applies to RAID6?
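For what it's worth, the Events counters in the two --examine outputs above differ by only one (59235337 on sdi1 vs 59235338 on sdd1), which is the kind of near-agreement that `mdadm --assemble --force` is designed to resolve by working from the freshest superblock. A toy version of that comparison (my sketch, not mdadm's actual algorithm):

```python
# mdadm --assemble --force works from the superblock with the highest
# event count and pulls nearly-current members back in. Toy version of
# that comparison (my sketch, not mdadm's code), using the Events
# values from the --examine outputs above.
events = {"sdi1": 59235337, "sdd1": 59235338}

newest = max(events.values())
stale = {dev: newest - ev for dev, ev in events.items() if ev < newest}
print(stale)  # -> {'sdi1': 1}; a gap this small is a good --force candidate
```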

At this point, I'm thinking that sdd1 and sdi1 really should be in slot
0, 6, or 7, but I'm not sure which ones. Is there a way to use trial and
error to assemble the array with, for example, sdd1 in slot 0, and see if
it works ("working" meaning that I could then mount the XFS file system)
and, if it doesn't, stop the array and then try it in slot 6?

I am, I guess, making the assumption that its being marked as a spare is
incorrect, and that it does, in fact, have data on it.
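If it comes to trial and error, the search space is small enough to enumerate: the two uncertain disks plus the one genuinely missing member must fill slots 0, 6, and 7 in some order. A hypothetical helper of my own (not an mdadm feature; actually testing each layout should be done read-only or on overlays, never with writes):

```python
from itertools import permutations

# The two uncertain disks plus one missing member (None) must fill
# slots 0, 6, and 7 in some order. Enumerate every candidate layout
# (my own helper, not an mdadm feature).
slots = (0, 6, 7)
candidates = [dict(zip(slots, order))
              for order in permutations(["sdd1", "sdi1", None])]

print(len(candidates))  # 6 layouts to try
for layout in candidates:
    print(layout)       # e.g. {0: 'sdd1', 6: 'sdi1', 7: None}
```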


---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/





