* Raid 5: all devices marked spare, cannot assemble
@ 2015-03-12 12:21 Paul Boven
  2015-03-12 13:48 ` Phil Turmel
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Boven @ 2015-03-12 12:21 UTC (permalink / raw)
  To: linux-raid

Hi folks,

I have a rather curious issue with one of our storage machines. The 
machine has 36x 4TB disks (SuperMicro 847 chassis) which are divided 
over 4 dual SAS-HBAs and the on-board SAS. These disks are in RAID5 
configurations, 6 raids of 6 disks each. Recently the machine ran out of 
memory (it has 32GB, and no swapspace as it boots from SATA-DOM) and the 
last entries in the syslog are from the OOM-killer. The machine is 
running Ubuntu 14.04.02 LTS, mdadm 3.2.5-5ubuntu4.1.

After doing a hard reset, the machine booted fine but one of the raids 
needed to resync. Worse, another of the raid5s will not assemble at all. 
All the drives are marked SPARE. Relevant output from /proc/mdstat (one 
working and the broken array):

md14 : active raid5 sdc1[2] sdag1[6] sde1[4] sdi1[3] sdz1[0] sdu1[1]
       19534425600 blocks super 1.2 level 5, 512k chunk, algorithm 2 
[6/6] [UUUUUU]

md15 : inactive sdd1[6](S) sdad1[0](S) sdy1[3](S) sdv1[4](S) sdm1[2](S) 
sdq1[1](S)
       23441313792 blocks super 1.2

Using 'mdadm --examine' on each of the drives from the broken md15, I get:

sdd1: Spare, Events: 0
sdad1: Active device 0, Events 194
sdy1: Active device 3, Events 194
sdv1: Active device 4, Events 194
sdm1: Active device 2, Events 194
sdq1: Active device 1, Events 194

This numbering corresponds to how the raid5 was created when I installed 
the machine:

mdadm --create /dev/md15 -l 5 -n 6 /dev/sdad1 /dev/sdq1 /dev/sdm1 
/dev/sdy1 /dev/sdv1 /dev/sdd1
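
I'm only summarizing the --examine output above; the full reports can 
be regenerated with something along these lines (same member list):

for d in /dev/sd{ad,q,m,y,v,d}1; do
    echo "== $d =="
    mdadm --examine "$d"
done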

Possible clues from /var/log/syslog:

md/raid:md13: not clean -- starting background reconstruction
(at 14 seconds uptime).

md15 isn't even mentioned in the boot-time syslog; only once I manually 
tried to assemble it did I get these errors:

md: kicking non-fresh sdd1 from array!
md: unbind<sdd1>
md: export_rdev(sdd1)
md/raid:md15: not clean -- starting background reconstruction
md/raid:md15: device sdy1 operational as raid disk 3
md/raid:md15: device sdv1 operational as raid disk 4
md/raid:md15: device sdad1 operational as raid disk 0
md/raid:md15: device sdq1 operational as raid disk 1
md/raid:md15: device sdm1 operational as raid disk 2
md/raid:md15: allocated 0kB
md/raid:md15: cannot start dirty degraded array.
RAID conf printout:
--- level:5 rd:6 wd:5
disk 0, o:1, dev:sdad1
disk 1, o:1, dev:sdq1
disk 2, o:1, dev:sdm1
disk 3, o:1, dev:sdy1
disk 4, o:1, dev:sdv1
md/raid:md15: failed to run raid set.
md: pers->run() failed ...

So the questions I'd like to pose are:

* Why does this raid5 not assemble? Only one drive (sdd) seems to be 
missing (marked spare), although I see no real issues with it and can 
read from it fine. There should still be enough drives to start the array.

# mdadm --assemble /dev/md15 --run

Returns without any error message, but leaves /proc/mdstat unchanged.
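
(The actual refusal only shows up in the kernel log; something like 
dmesg | tail gives the same messages as quoted above.)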

* How can the data be recovered, and the machine brought into 
production again?

And of course

* What went wrong, and how can we guard against this?

Any insights and help are much appreciated.

Regards, Paul Boven.
-- 
Paul Boven <boven@jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


* Re: Raid 5: all devices marked spare, cannot assemble
  2015-03-12 12:21 Raid 5: all devices marked spare, cannot assemble Paul Boven
@ 2015-03-12 13:48 ` Phil Turmel
  2015-03-12 14:28   ` Paul Boven
  2015-03-13 10:06   ` Bad block management in raid1 Ankur Bose
  0 siblings, 2 replies; 4+ messages in thread
From: Phil Turmel @ 2015-03-12 13:48 UTC (permalink / raw)
  To: Paul Boven, linux-raid

Good morning Paul,

On 03/12/2015 08:21 AM, Paul Boven wrote:
> Hi folks,
> 
> I have a rather curious issue with one of our storage machines. The
> machine has 36x 4TB disks (SuperMicro 847 chassis) which are divided
> over 4 dual SAS-HBAs and the on-board SAS. These disks are in RAID5
> configurations, 6 raids of 6 disks each. Recently the machine ran out of
> memory (it has 32GB, and no swapspace as it boots from SATA-DOM) and the
> last entries in the syslog are from the OOM-killer. The machine is
> running Ubuntu 14.04.02 LTS, mdadm 3.2.5-5ubuntu4.1.

{BTW, I think raid5 is *insane* for this size array.}

> After doing a hard reset, the machine booted fine but one of the raids
> needed to resync. Worse, another of the raid5s will not assemble at all.
> All the drives are marked SPARE. Relevant output from /proc/mdstat (one
> working and the broken array):
> 
> md14 : active raid5 sdc1[2] sdag1[6] sde1[4] sdi1[3] sdz1[0] sdu1[1]
>       19534425600 blocks super 1.2 level 5, 512k chunk, algorithm 2
> [6/6] [UUUUUU]
> 
> md15 : inactive sdd1[6](S) sdad1[0](S) sdy1[3](S) sdv1[4](S) sdm1[2](S)
> sdq1[1](S)
>       23441313792 blocks super 1.2

Although (S) implies spare, that's only true when the array is active.
md15 is assembled but not running.

> Using 'mdadm --examine' on each of the drives from the broken md15, I get:
> 
> sdd1: Spare, Events: 0
> sdad1: Active device 0, Events 194
> sdy1: Active device 3, Events 194
> sdv1: Active device 4, Events 194
> sdm1: Active device 2, Events 194
> sdq1: Active device 1, Events 194

Please don't trim the reports.  This implies that your array simply
didn't --run because it is unexpectedly degraded.

[trim /]

> md: kicking non-fresh sdd1 from array!
> md: unbind<sdd1>
> md: export_rdev(sdd1)
> md/raid:md15: not clean -- starting background reconstruction
> md/raid:md15: device sdy1 operational as raid disk 3
> md/raid:md15: device sdv1 operational as raid disk 4
> md/raid:md15: device sdad1 operational as raid disk 0
> md/raid:md15: device sdq1 operational as raid disk 1
> md/raid:md15: device sdm1 operational as raid disk 2
> md/raid:md15: allocated 0kB
> md/raid:md15: cannot start dirty degraded array.

Exactly.

> * Why does this raid5 not assemble? Only one drive (sdd) seems to be
> missing (marked spare), although I see no real issues with it and can
> read from it fine. There should still be enough drives to start the array.
> 
> # mdadm --assemble /dev/md15 --run

Wrong syntax.  It's already assembled.  Just try "mdadm --run /dev/md15"

> * How can the data be recovered, and the machine brought into production
> again

If the simple --run doesn't work, stop the array and force assemble the
good drives:

mdadm --stop /dev/md15
mdadm --assemble --force --verbose /dev/md15 /dev/sd{ad,q,m,y,v}1

If that doesn't work, show the complete output of the --assemble.

> * What went wrong, and how can we guard against this?

The crash prevented mdadm from writing the current state of the array to
the individual drives' metadata.  You didn't provide complete --examine
output, so I'm speculating, but the drives must disagree on the last
known state of the array ==> "dirty".  See "start_dirty_degraded" in man
md(4), and the --run and --no-degraded options to assemble.
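
If you ever do want the kernel to auto-start such an array, I believe 
the knob can also be set at boot with something along the lines of 
(untested by me):

md-mod.start_dirty_degraded=1

on the kernel command line.  For bulk data arrays I wouldn't.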

In other words, unclean shutdowns should have manual intervention,
unless the array in question contains the root filesystem, in which case
the risky "start_dirty_degraded" may be appropriate.  In that case, you
probably would want your initramfs to have a special mdadm.conf,
deferring assembly of bulk arrays to normal userspace.
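
As a rough sketch of what that initramfs mdadm.conf could look like 
(the UUID is a placeholder for your real root array):

DEVICE partitions
# assemble only the root array from the initramfs
ARRAY /dev/md0 UUID=<uuid-of-root-array>
# don't auto-assemble anything that isn't listed above
AUTO -all

The bulk arrays then get picked up later by the regular mdadm.conf once 
userspace is up.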

Phil


* Re: Raid 5: all devices marked spare, cannot assemble
  2015-03-12 13:48 ` Phil Turmel
@ 2015-03-12 14:28   ` Paul Boven
  2015-03-13 10:06   ` Bad block management in raid1 Ankur Bose
  1 sibling, 0 replies; 4+ messages in thread
From: Paul Boven @ 2015-03-12 14:28 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

Hi Phil,

Good morning and thanks for your quick reply.

On 03/12/2015 02:48 PM, Phil Turmel wrote:
>> I have a rather curious issue with one of our storage machines. The
>> machine has 36x 4TB disks (SuperMicro 847 chassis) which are divided
>> over 4 dual SAS-HBAs and the on-board SAS. These disks are in RAID5
>> configurations, 6 raids of 6 disks each. Recently the machine ran out of
>> memory (it has 32GB, and no swapspace as it boots from SATA-DOM) and the
>> last entries in the syslog are from the OOM-killer. The machine is
>> running Ubuntu 14.04.02 LTS, mdadm 3.2.5-5ubuntu4.1.
>
> {BTW, I think raid5 is *insane* for this size array.}

It's 6 raid5s, not a single big one. This is only a temporary holding 
space for data to be processed. In its original incarnation the machine 
had 36 distinct file-systems that we would read from in a software 
stripe, just to get enough IO performance. So this is a trade-off 
between IO-speed and lost capacity versus convenience in case a drive 
inevitably fails.

I guess you would recommend raid6? I would have liked a global hot 
spare, maybe 7 arrays of 5 disks, but then we lose 8 disks in total 
instead of the current 6.

> Wrong syntax.  It's already assembled.  Just try "mdadm --run /dev/md15"

Trying to 'run' md15 gives me the same errors as before:
md/raid:md15: not clean -- starting background reconstruction
md/raid:md15: device sdad1 operational as raid disk 0
md/raid:md15: device sdy1 operational as raid disk 3
md/raid:md15: device sdv1 operational as raid disk 4
md/raid:md15: device sdm1 operational as raid disk 2
md/raid:md15: device sdq1 operational as raid disk 1
md/raid:md15: allocated 0kB
md/raid:md15: cannot start dirty degraded array.
RAID conf printout:
--- level:5 rd:6 wd:5
  disk 0, o:1, dev:sdad1
  disk 1, o:1, dev:sdq1
  disk 2, o:1, dev:sdm1
  disk 3, o:1, dev:sdy1
  disk 4, o:1, dev:sdv1
md/raid:md15: failed to run raid set.
md: pers->run() failed ...

> If the simple --run doesn't work, stop the array and force assemble the
> good drives:
>
> mdadm --stop /dev/md15
> mdadm --assemble --force --verbose /dev/md15 /dev/sd{ad,q,m,y,v}1

That worked!
mdadm: looking for devices for /dev/md15
mdadm: /dev/sdad1 is identified as a member of /dev/md15, slot 0.
mdadm: /dev/sdq1 is identified as a member of /dev/md15, slot 1.
mdadm: /dev/sdm1 is identified as a member of /dev/md15, slot 2.
mdadm: /dev/sdy1 is identified as a member of /dev/md15, slot 3.
mdadm: /dev/sdv1 is identified as a member of /dev/md15, slot 4.
mdadm: Marking array /dev/md15 as 'clean'
mdadm: added /dev/sdq1 to /dev/md15 as 1
mdadm: added /dev/sdm1 to /dev/md15 as 2
mdadm: added /dev/sdy1 to /dev/md15 as 3
mdadm: added /dev/sdv1 to /dev/md15 as 4
mdadm: no uptodate device for slot 5 of /dev/md15
mdadm: added /dev/sdad1 to /dev/md15 as 0
mdadm: /dev/md15 has been started with 5 drives (out of 6).

I've checked that the filesystem is in good shape and added /dev/sdd1 
back in; the array is now resyncing. 680 minutes to go, but there are a 
few tricks I can do to speed that up a bit.
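
(For completeness: the re-add was essentially "mdadm /dev/md15 --add 
/dev/sdd1", and the usual knobs for nudging a resync along are 
something like:

echo 100000 > /proc/sys/dev/raid/speed_limit_min
echo 8192 > /sys/block/md15/md/stripe_cache_size

with the numbers picked fairly arbitrarily.)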

> In other words, unclean shutdowns should have manual intervention,
> unless the array in question contains the root filesystem, in which case
> the risky "start_dirty_degraded" may be appropriate.  In that case, you
> probably would want your initramfs to have a special mdadm.conf,
> deferring assembly of bulk arrays to normal userspace.

I'm perfectly happy with doing the recovery in userspace; these drives 
are not critical for booting. Except that Ubuntu, Plymouth and a few 
other things conspire against booting a machine with any disk problems, 
but that's a different rant for a different place.

Thank you very much for your very helpful reply, things look a lot 
better now.

Regards, Paul Boven.
-- 
Paul Boven <boven@jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


* Bad block management in raid1
  2015-03-12 13:48 ` Phil Turmel
  2015-03-12 14:28   ` Paul Boven
@ 2015-03-13 10:06   ` Ankur Bose
  1 sibling, 0 replies; 4+ messages in thread
From: Ankur Bose @ 2015-03-13 10:06 UTC (permalink / raw)
  To: linux-raid; +Cc: Suresh Babu Kandukuru

Hi There,

   Can you confirm the scenarios below in which a block is considered 
"bad"?

         1. A read error on a degraded array (a RAID state in which the 
array has lost one or more disks), where the data cannot be recovered 
from the other legs, marks that block as "bad" and it gets recorded.
         2. When recovering from a source leg to a target leg, if the 
source cannot be read for any reason, the corresponding block on the 
target leg gets recorded as "bad" (though the target block is writable 
and can be used in the future).
         3. A write to a block fails (though it leads to degraded mode).

Are all of these implemented, and are there any other scenarios?

  When exactly does raid1 decide to mark a device "Faulty"? Does that 
depend on the number of bad blocks in the list, i.e. 512?
  How much space does the metadata reserve for storing the bad block info?
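
  For context, I have been checking what actually gets recorded with 
something like this (sdX1 is just an example member device):

  mdadm --examine-badblocks /dev/sdX1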

Thanks,
Ankur

