Re: mdraid autodetect partially failing after disk replacement

From: Tregaron Bayly <tbayly@bluehost.com>
To: Ricky Burgin <ricky@burg.in>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdraid autodetect partially failing after disk replacement
Date: Thu, 09 May 2013 23:52:19 -0600
Message-ID: <1368165139.4029.41.camel@linux-lxtg.site>
In-Reply-To: <8tj2wq0eu1swie7m884u2d8i.1368161579257@email.android.com>

Ricky,

When talking about things that happen before the root device is mounted,
you are talking about the initial ramdisk.

I'm not sure of the specifics of what you're running, but can we presume
that this array holds the root filesystem?  If it does, we can deduce
that an mdadm binary, an mdadm.conf file and some udev rules to assemble
the array need to exist in the initrd/initramfs.  There may also be a
script that directly runs an mdadm incremental or assemble command.  If
the array isn't being started properly, then either there is a problem
detecting the disks (a driver issue), the mdadm.conf in the ramdisk is
incorrect (a DEVICE line that excludes some of the disks, for instance),
or for some reason the udev rule or mdadm command isn't firing.  None of
those seem likely given the sequence of events you described in your
first thread on this problem, but we have to focus on what this system
is doing at boot that your rescue environment isn't.
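To test the "DEVICE line excluding the disks" theory, you could copy the
mdadm.conf out of the ramdisk (e.g. by unpacking the initramfs on a
working system) and run it through something like the rough Python
sketch below.  It is a simplified approximation, not mdadm's actual
matching code.  It assumes the mdadm.conf(5) rules that a line starting
with whitespace continues the previous line and that a missing DEVICE
line means "DEVICE partitions containers".  It only recognises the full
DEVICE keyword, treats anything after '#' as a comment, matches shell
globs with fnmatch against names you supply (rather than expanding them
against /dev), and ignores "containers".  The function names are my own.

```python
import fnmatch
from typing import Iterable, List, Set

def device_patterns(conf_text: str) -> List[str]:
    """Collect the words following DEVICE in mdadm.conf text (simplified parser)."""
    logical: List[str] = []
    for raw in conf_text.splitlines():
        # Drop comments (this sketch treats everything after '#' as a comment).
        line = raw.split("#", 1)[0].rstrip()
        if not line:
            continue
        if line[0] in " \t" and logical:
            # A line starting with whitespace continues the previous line.
            logical[-1] += " " + line.strip()
        else:
            logical.append(line.strip())
    patterns: List[str] = []
    for line in logical:
        words = line.split()
        if words[0].upper() == "DEVICE":
            patterns.extend(words[1:])
    # With no DEVICE line at all, assume "DEVICE partitions containers".
    return patterns or ["partitions", "containers"]

def is_candidate(dev: str, patterns: List[str], proc_partitions: Set[str]) -> bool:
    """True if dev (e.g. "/dev/sdc1") would be considered under these patterns."""
    for pat in patterns:
        if pat == "partitions":
            # "partitions": any device listed in /proc/partitions (by short name).
            if dev.rsplit("/", 1)[-1] in proc_partitions:
                return True
        elif pat == "containers":
            continue  # container handling is ignored in this sketch
        elif fnmatch.fnmatchcase(dev, pat):
            return True
    return False

def excluded(devs: Iterable[str], conf_text: str, proc_partitions: Set[str]) -> List[str]:
    """Return the devices from devs that the DEVICE line(s) would skip."""
    pats = device_patterns(conf_text)
    return [d for d in devs if not is_candidate(d, pats, proc_partitions)]
```

For example, with "DEVICE /dev/sda1 /dev/sdb1" in the ramdisk's copy,
excluded() would report /dev/sdc1 and /dev/sdd1 as never scanned, which
would match "only 2 of 4 disks" quite neatly.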

If your system uses dracut, you can use rdshell to poke around at what
is happening when the root filesystem fails to mount.  You can look at
the device nodes that have been created, run commands like blkid, and
use mdadm to attempt to assemble the array, for instance.  If you manage
to get it assembled, you can type 'exit' to continue booting.  This
exercise could shed some light on your problem, helping you understand
what specifically is amiss in your pre-root environment that isn't wrong
when booting another OS (and its ramdisk).

Wish I could be more specific, but hopefully this gives you something
more to look at.  I think this is what Neil was hinting at when he
suggested that you might need to create a new initrd, which is what you
would probably end up doing to fix this anyway if you found something
wrong in your initial ramdisk.

Hope this helps,

Tregaron

On Fri, 2013-05-10 at 05:52 +0100, Ricky Burgin wrote:
> Hi Sam,
> 
> Thanks for the response. Unfortunately that's not the case; that was one of the first things I checked. What variables are considered when adding drives to, or excluding them from, a raid via autodetection? This problem feels so esoteric that it might just be a bug... 
> 
> I'll keep on trying!
> 
> Ricky
> 
> Sam Bingner <sam@bingner.com> wrote:
> 
> >On May 8, 2013, at 3:20 PM, Ricky Burgin <ricky@burg.in> wrote:
> >
> >> Hello again,
> >> 
> >> Little bit of progress since I last dropped a message in (sorry for the
> >> duplicate, didn't think the initial one got through).
> >> 
> >> The kernel has mdraid built into it and all disks are using 0.90
> >> superblocks, all 'fd' partitions, but only 2 of 4 disks are being
> >> recognised and added to the freshly created raid array, which works
> >> fine when mounted on any other OS.
> >> 
> >> Any suggestions for what could cause disks to be overlooked by mdraid
> >> before the kernel even attempts to mount the root device would be very
> >> helpful; I'm now totally at a loss as to what to do from here.
> >> 
> >> Kind regards,
> >> Ricky Burgin
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >Is it possible that you still have 1.x superblocks on two drives that are being detected before the 0.90 superblocks on this OS?  I don't know whether that is possible, but it is the only strangeness I see from your prior posts.
> >
> >You should probably boot that version of CentOS in rescue mode and see if you have the same issues there... 
> >
> >Sam
