* Trouble reassembling RAID10
@ 2017-02-20 21:42 Roger Roglans
  2017-02-21 15:16 ` Phil Turmel
  0 siblings, 1 reply; 6+ messages in thread
From: Roger Roglans @ 2017-02-20 21:42 UTC (permalink / raw)
  To: linux-raid

Hey, I'm new to the mailing list and fairly new to RAID in general. I
ran into an issue and was hoping someone could help.

Our server that runs a 14 drive RAID10 through a rocketraid 2470
controller refused to assemble. Our goal is not necessarily to recover
a working RAID, but to get as much data back as possible.

Possibly as a consequence of the assembly failure, the server got stuck
in boot loops after being shut down, so I'm currently running Ubuntu
16.04.1 from a USB stick. I've determined that 2 of the 14 disks are
faulty and have identified which ones they are.

Here is the output of a mdadm --examine call.

    ubuntu@ubuntu:~$ sudo mdadm --examine /dev/sd[c-p]1 | egrep 'Events | /dev/sd'
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 21988
       Events : 560
       Events : 21944
       Events : 560

However, I keep running into an error:

    ubuntu@ubuntu:~$ sudo mdadm --assemble --verbose --force /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1
    mdadm: looking for devices for /dev/md0
    mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
    mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
    mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
    mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
    mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
    mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 5.
    mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 6.
    mdadm: /dev/sdj1 is identified as a member of /dev/md0, slot 7.
    mdadm: /dev/sdk1 is identified as a member of /dev/md0, slot 8.
    mdadm: /dev/sdl1 is identified as a member of /dev/md0, slot 9.
    mdadm: /dev/sdm1 is identified as a member of /dev/md0, slot 10.
    mdadm: /dev/sdn1 is identified as a member of /dev/md0, slot 11.
    mdadm: /dev/sdo1 is identified as a member of /dev/md0, slot 12.
    mdadm: /dev/sdp1 is identified as a member of /dev/md0, slot 13.
    mdadm: added /dev/sdd1 to /dev/md0 as 1
    mdadm: added /dev/sde1 to /dev/md0 as 2
    mdadm: added /dev/sdf1 to /dev/md0 as 3
    mdadm: added /dev/sdg1 to /dev/md0 as 4
    mdadm: added /dev/sdh1 to /dev/md0 as 5
    mdadm: added /dev/sdi1 to /dev/md0 as 6
    mdadm: added /dev/sdj1 to /dev/md0 as 7
    mdadm: added /dev/sdk1 to /dev/md0 as 8
    mdadm: added /dev/sdl1 to /dev/md0 as 9
    mdadm: added /dev/sdm1 to /dev/md0 as 10
    mdadm: added /dev/sdn1 to /dev/md0 as 11 (possibly out of date)
    mdadm: added /dev/sdo1 to /dev/md0 as 12 (possibly out of date)
    mdadm: added /dev/sdp1 to /dev/md0 as 13 (possibly out of date)
    mdadm: added /dev/sdc1 to /dev/md0 as 0
    mdadm: /dev/md0 assembled from 11 drives - not enough to start the array.

and trying to add --run gives this error:

    ubuntu@ubuntu:~$ sudo mdadm --assemble --verbose --run --force /dev/md1 /dev/sdc1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdo1
    mdadm: looking for devices for /dev/md1
    mdadm: /dev/sdc1 is identified as a member of /dev/md1, slot 0.
    mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 2.
    mdadm: /dev/sdf1 is identified as a member of /dev/md1, slot 3.
    mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 4.
    mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 5.
    mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 6.
    mdadm: /dev/sdj1 is identified as a member of /dev/md1, slot 7.
    mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 8.
    mdadm: /dev/sdl1 is identified as a member of /dev/md1, slot 9.
    mdadm: /dev/sdm1 is identified as a member of /dev/md1, slot 10.
    mdadm: /dev/sdo1 is identified as a member of /dev/md1, slot 12.
    mdadm: no uptodate device for slot 2 of /dev/md1
    mdadm: added /dev/sde1 to /dev/md1 as 2
    mdadm: added /dev/sdf1 to /dev/md1 as 3
    mdadm: added /dev/sdg1 to /dev/md1 as 4
    mdadm: added /dev/sdh1 to /dev/md1 as 5
    mdadm: added /dev/sdi1 to /dev/md1 as 6
    mdadm: added /dev/sdj1 to /dev/md1 as 7
    mdadm: added /dev/sdk1 to /dev/md1 as 8
    mdadm: added /dev/sdl1 to /dev/md1 as 9
    mdadm: added /dev/sdm1 to /dev/md1 as 10
    mdadm: no uptodate device for slot 22 of /dev/md1
    mdadm: added /dev/sdo1 to /dev/md1 as 12 (possibly out of date)
    mdadm: no uptodate device for slot 26 of /dev/md1
    mdadm: added /dev/sdc1 to /dev/md1 as 0
    mdadm: failed to RUN_ARRAY /dev/md1: Input/output error
    mdadm: Not enough devices to start the array.

So it's clear that the last three drives are out of date. It's
possible that drives 11 and 13 were never really active, but since
they were only partners in a raid 1, the array was unaffected until
now. I'm hoping that if I can reassemble with the 12th drive, then I
will be able to recover most of the data. Does anyone know what I can
do about that? I've also tried without the "inactive" drives but it
still isn't assembling drive 12. I know that I can try using --run,
but I'm not sure if I'll lose data that way. I'm also hesitant to zero
the superblock because I've always heard it was a last resort option.
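
The reasoning about mirror partners above can be checked mechanically.
What follows is only a sketch under a stated assumption: the layout of
this array is not known in the thread, so the function assumes md's
"near" layout with two copies and an even number of devices, where
slots 2k and 2k+1 hold the same data. The function name lost_pairs is
made up for illustration.

```shell
#!/bin/sh
# ASSUMPTION: md RAID10, "near" layout, 2 copies, even device count,
# so slots 2k and 2k+1 mirror each other. The real layout of the array
# in this thread is unknown.
#
# Usage: lost_pairs TOTAL_SLOTS MISSING_SLOT...
# Prints one line for every mirror pair that has no surviving copy.
lost_pairs() {
    total=$1
    shift
    missing=" $* "          # padded so that " 1 " does not match " 11 "
    slot=0
    while [ "$slot" -lt "$total" ]; do
        partner=$((slot + 1))
        case "$missing" in
            *" $slot "*)
                case "$missing" in
                    *" $partner "*)
                        echo "pair $slot/$partner: both copies missing" ;;
                esac
                ;;
        esac
        slot=$((slot + 2))
    done
}

# Slots reported as "possibly out of date" in the assemble output above.
lost_pairs 14 11 12 13
```

Under that assumption, losing slots 11 and 13 alone would leave every
pair with one copy, while also losing slot 12 removes both copies of
pair 12/13. That would be consistent with mdadm refusing to start the
array from 11 drives, and with slot 12 being the one worth recovering.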

Note that because I'm running this off a USB the usual `cat
/proc/mdstat` doesn't return the previous array. Also, I don't know
the exact structure of the array (if I did this would be a lot
easier).

Thanks in advance for the help,
Roger


* Re: Trouble reassembling RAID10
  2017-02-20 21:42 Trouble reassembling RAID10 Roger Roglans
@ 2017-02-21 15:16 ` Phil Turmel
  2017-02-21 18:38   ` Roger Roglans
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Turmel @ 2017-02-21 15:16 UTC (permalink / raw)
  To: Roger Roglans, linux-raid

Hi Roger,

On 02/20/2017 04:42 PM, Roger Roglans wrote:
> Hey new to the mailing list and fairly new to RAIDs in general. I
> ran into an issue and was hoping someone could help.

We probably can.

> Our server that runs a 14 drive RAID10 through a rocketraid 2470 
> controller refused to assemble. Our goal is not necessarily to
> recover a working RAID, but to get as much data back as possible.

Amounts to the same thing.

> Maybe as a consequence of the assembly failure, upon shutting down
> the server, it would get stuck in boot loops. So I'm currently
> running Ubuntu 16.04.1 from a USB. I've determined that 2 of 14
> disks are faulty and have determined which ones they are.

Three.  Two have been faulty for a very long time.  No-one noticed
the degraded status.
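
The event counts in the original report are what point to three stale
members: two sit at 560 and one at 21944, against 21988 for the rest.
A rough way to tabulate that from full --examine output is sketched
below. It assumes the usual output shape, a "/dev/...:" header line per
device followed by an "Events : N" line; the function name event_lag
is hypothetical.

```shell
#!/bin/sh
# Reads "mdadm --examine" output on stdin and prints, per device:
#   <device> <event count> <events behind the newest member>
# ASSUMPTION: each device section starts with a "/dev/...:" line and
# contains a line of the form "Events : N".
event_lag() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events" && $2 == ":" {
            n++
            name[n] = dev
            ev[n] = $3 + 0
            if (ev[n] > max) max = ev[n]
        }
        END {
            for (i = 1; i <= n; i++)
                print name[i], ev[i], max - ev[i]
        }
    '
}

# Example (needs root, not run here):
#   mdadm --examine /dev/sd[c-p]1 | event_lag
```

A member that is only a few dozen events behind dropped out recently
and is a candidate for forced assembly; one that is tens of thousands
of events behind has been out of the array for a long time.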

> Here is the output of a mdadm --examine call.

Please re-do this, combined with smartctl, and without grep.  This
will tell us everything about your array.  Like so:

for x in /dev/sd[a-p] ; do mdadm -E ${x}1 ; smartctl -iA -l scterc $x ; done

Paste the output *inline* in your plain-text reply with line
wrapping disabled.  If your draft email is larger than 100k, split
into multiple emails.
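
For anyone who wants to capture that report in a file first and check
its size against the 100k limit, a minimal sketch follows. The function
name collect_report and the MDADM and SMARTCTL override variables are
hypothetical; they exist only so the commands can be previewed without
touching any disk.

```shell
#!/bin/sh
# Usage: collect_report OUTPUT_FILE DISK...
# Runs "mdadm -E" on the first partition of each disk and smartctl on
# the disk itself, appends everything to OUTPUT_FILE, and prints the
# file size in bytes.
# MDADM and SMARTCTL are hypothetical overrides, e.g. MDADM="echo mdadm"
# to preview the commands instead of running them.
collect_report() {
    out=$1
    shift
    : > "$out"                      # start with an empty report
    for disk in "$@"; do
        {
            echo "=== $disk ==="
            ${MDADM:-mdadm} -E "${disk}1"
            ${SMARTCTL:-smartctl} -iA -l scterc "$disk"
        } >> "$out" 2>&1
    done
    wc -c < "$out"
}

# Example (needs root, not run here):
#   collect_report raid-report.txt /dev/sd[a-p]
```

Errors from either tool are kept in the report on purpose, since a disk
that fails to answer is itself useful information for the list.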

You are likely to need an alternate bootable USB stick -- your
report sounds like one of the versions of mdadm that had a bug
in forced assembly.  I usually use the latest one from
https://www.system-rescue-cd.org/

Please also read the recent thread and its references starting
here: https://marc.info/?l=linux-raid&m=148755536616025&w=2

Phil


* Re: Trouble reassembling RAID10
  2017-02-21 15:16 ` Phil Turmel
@ 2017-02-21 18:38   ` Roger Roglans
  2017-02-21 19:33     ` Wols Lists
  0 siblings, 1 reply; 6+ messages in thread
From: Roger Roglans @ 2017-02-21 18:38 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

Hi Phil,

That seems very useful to know in the future. I ended up just assuming
clean and using "--create". Since I was able to discern the exact
configuration, I was able to mount it and am currently transferring
data. I know it was not the ideal solution, but I believe it worked
out with only minimal corruption. I might have problems with another
array soon. If so, I will certainly contact this mailing list again.

thanks for your help,

Roger




* Re: Trouble reassembling RAID10
  2017-02-21 18:38   ` Roger Roglans
@ 2017-02-21 19:33     ` Wols Lists
  2017-02-21 20:24       ` Roger Roglans
  0 siblings, 1 reply; 6+ messages in thread
From: Wols Lists @ 2017-02-21 19:33 UTC (permalink / raw)
  To: Roger Roglans, Phil Turmel; +Cc: linux-raid

On 21/02/17 18:38, Roger Roglans wrote:
> Hi Phil,
> 
> seems very useful to know in the future. I ended up just assuming
> clean and using "--create". Since I was able to discern the exact
> configurations, I was able to mount it and am currently transferring
> data. I know it was not the ideal solution but I believe that it
> worked out with only minimal corruption. I might have problems with
> another array soon. If so, I will certainly contact this mailing list
> again.

If I can plug my own work :-) there is now a section on the Linux RAID
wiki about troubleshooting an array, and what data to gather for the
list. Look at

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

That should be enough to fix simple problems or, for more serious ones,
you'll have the bulk if not all the information the members of the list
will need, saving the back-and-forth of "can we have this, can we have
that".

If you find any problems with the information on the wiki, let me know
and I'll endeavour to fix it.

Cheers,
Wol


* Re: Trouble reassembling RAID10
  2017-02-21 19:33     ` Wols Lists
@ 2017-02-21 20:24       ` Roger Roglans
  2017-02-21 21:14         ` Wols Lists
  0 siblings, 1 reply; 6+ messages in thread
From: Roger Roglans @ 2017-02-21 20:24 UTC (permalink / raw)
  To: Wols Lists; +Cc: Phil Turmel, linux-raid

Hey Wol,
Yes, I followed it and it was immensely helpful. I didn't want to make
my post too long, which is why I only included the shortened command
outputs; in the future I'll just include everything. I believe the big
issue was that the --force option did not work when it seemed like it
should have. I will keep a lookout for that bug in the future.

Just a note on the wiki: "When Things Go Wrogn" should be "When Things Go Wrong"

Best,
Roger



* Re: Trouble reassembling RAID10
  2017-02-21 20:24       ` Roger Roglans
@ 2017-02-21 21:14         ` Wols Lists
  0 siblings, 0 replies; 6+ messages in thread
From: Wols Lists @ 2017-02-21 21:14 UTC (permalink / raw)
  To: Roger Roglans; +Cc: linux-raid

On 21/02/17 20:24, Roger Roglans wrote:
> Just a note on the wiki: "When Things Go Wrogn" should be "When Things Go Wrong"

That's a long and hallowed deliberate smelling pistake :-) Goes back at
least 30 years ...

Cheers,
Wol

