* system update killed /boot RAID-1 array auto-assembly/mount. why?
@ 2009-11-02 20:24 Ben DJ
  2009-11-02 21:25 ` Jesse Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Ben DJ @ 2009-11-02 20:24 UTC (permalink / raw)
  To: linux-raid

Hi,

I've installed openSUSE 11.2 RC2 to

  /boot on  4-disk RAID-1, super=1.0
  /root  & (etc) on 4-disk RAID-10,f2 chunk=256, super=1.1

It's been running fine.

After a recent system upgrade via 'zypper dup', which completed
without any apparent error, reboot failed.  /boot @ /dev/md0 was not
mounting, and the RAID-1 wasn't even assembling.

A full day of reading, and trying various repair-the-array solutions
couldn't get me back.

Although I was able to manually mount the array, and it fsck'ed ok, I
couldn't get the array to auto-assemble.

Finally, I deleted the array, repartitioned the drive, reinstalled
kernel, grub and mdadm, and I'm back in business.  At the moment, the
RAID-10 array is resyncing (not sure why):

cat /proc/mdstat
 Personalities : [raid10] [raid0] [raid1] [raid6] [raid5] [raid4] [linear]
 md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
       160604 blocks super 1.0 [4/4] [UUUU]

 md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
       1953198080 blocks super 1.1 256K chunks 2 far-copies [4/4] [UUUU]
       [====>................]  resync = 23.8% (465007616/1953198080) finish=214.8min speed=115452K/sec

 unused devices: <none>
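
A resync like the one above can be tracked without reading the output by eye. The following is a minimal Python sketch, not part of the original post. The function name parse_mdstat and the dictionary layout are illustrative assumptions, and the sample text is simply the output pasted above. It assumes the plain "mdN : state level members..." layout shown here and does not handle extra state tokens such as "(auto-read-only)".

```python
import re

# Sample text taken from the /proc/mdstat output quoted in this message.
SAMPLE = """\
Personalities : [raid10] [raid0] [raid1] [raid6] [raid5] [raid4] [linear]
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      160604 blocks super 1.0 [4/4] [UUUU]

md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      1953198080 blocks super 1.1 256K chunks 2 far-copies [4/4] [UUUU]
      [====>................]  resync = 23.8% (465007616/1953198080) finish=214.8min speed=115452K/sec

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array_name: info} for each mdN entry found in mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"^\s*(md\d+)\s*:\s*(.*)$", line)
        if head:
            name, rest = head.groups()
            tokens = rest.split()
            # Member devices look like "sda1[0]"; everything else is state/level.
            members = [t.split("[")[0] for t in tokens if "[" in t]
            plain = [t for t in tokens if "[" not in t]
            current = {
                "state": plain[0] if plain else None,
                "level": plain[1] if len(plain) > 1 else None,
                "members": members,
                "expected": None,  # devices the array should have
                "active": None,    # devices currently in sync
                "sync": None,      # (operation, percent) while resyncing
            }
            arrays[name] = current
            continue
        if current is None:
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts:
            current["expected"] = int(counts.group(1))
            current["active"] = int(counts.group(2))
        progress = re.search(r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%", line)
        if progress:
            current["sync"] = (progress.group(1), float(progress.group(2)))
    return arrays


arrays = parse_mdstat(SAMPLE)
for name, info in sorted(arrays.items()):
    degraded = info["active"] != info["expected"]
    print(name, info["level"], "degraded" if degraded else "complete", info["sync"])
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.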


_Something_ happened at that system upgrade.  My mistake for not
paying closer attention to what was going on.  My goal is to not let
that happen again.

Knowing that I'm not providing any helpful detail -- I don't have it
atm -- can anyone speculate as to what might have happened @ the sys
update to cause this?  If possible, I'd like to start with some clue
as to what I'm watching out for.
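
One check that could be run before and after an update is sketched below. This is an assumption about where to look, not a diagnosis; nothing in this thread establishes the cause. The sketch compares the ARRAY lines in an mdadm configuration file with the ARRAY lines printed by 'mdadm --detail --scan' and reports arrays whose UUID is missing from, or differs in, the configuration. The helper names and the UUID values are hypothetical placeholders, and the configuration path (often /etc/mdadm.conf or /etc/mdadm/mdadm.conf) varies by distribution.

```python
def array_uuids(text):
    """Map array device -> UUID for every ARRAY line in the given text."""
    found = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.upper().startswith("ARRAY"):
            continue
        tokens = line.split()
        if len(tokens) < 2:
            continue
        device = tokens[1]
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            if key.lower() == "uuid":
                found[device] = value.lower()
    return found


def missing_or_changed(config_text, scan_text):
    """Return {device: (configured_uuid, running_uuid)} for mismatches."""
    configured = array_uuids(config_text)
    running = array_uuids(scan_text)
    return {
        device: (configured.get(device), uuid)
        for device, uuid in running.items()
        if configured.get(device) != uuid
    }


# Hypothetical inputs: placeholder UUIDs, not values from the poster's system.
CONFIG = """\
# mdadm.conf (illustrative)
ARRAY /dev/md1 metadata=1.1 UUID=11111111:11111111:11111111:11111111
"""
SCAN = """\
ARRAY /dev/md0 metadata=1.0 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md1 metadata=1.1 UUID=11111111:11111111:11111111:11111111
"""

print(missing_or_changed(CONFIG, SCAN))
```

In this made-up example /dev/md0 is running but has no matching ARRAY entry, which is the kind of difference worth noting before a reboot.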

Thanks,

BenDJ


* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-02 20:24 system update killed /boot RAID-1 array auto-assembly/mount. why? Ben DJ
@ 2009-11-02 21:25 ` Jesse Wheeler
  2009-11-02 22:33   ` Ben DJ
  0 siblings, 1 reply; 7+ messages in thread
From: Jesse Wheeler @ 2009-11-02 21:25 UTC (permalink / raw)
  To: linux-raid

Ben:

I've observed some similar behavior on RAID-5 w/ SuSE Enterprise
Server 10.2 SP2 with our eDirectory boxes.  While this is different
from my current issue that I posted today, I will say that I have had
-- on average -- better luck with RHEL-based distributions than
SuSE variants.

I've noticed that Novell tends to make just ever so slight changes to
the back-ports that they place into their 'Enterprise' kernel.  That
being said, Novell is also the only current Enterprise grade Linux
vendor to actually place 'sane' options into the default
partitioning/mounting scheme for Ext3 -- i.e.,
'barrier=1,data=journal,noatime'.  However, this doesn't apply to
software raid/linux MD schemes since (AFAIK), only linear/single drive
Ext3 partitions can mount w/ 'barrier=1'.

This has been my hell for the last month: 'Enterprise Linux'
distributions, filesystems, software vs. hardware RAID, and data loss.
I've lost many an hour of sleep ;o)!

--
Jesse W. Wheeler
Member-Owner
Devotio Consulting, L.L.C.
--

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-02 21:25 ` Jesse Wheeler
@ 2009-11-02 22:33   ` Ben DJ
  2009-11-02 23:36     ` Jesse Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Ben DJ @ 2009-11-02 22:33 UTC (permalink / raw)
  To: Jesse Wheeler; +Cc: linux-raid

Jesse

On Mon, Nov 2, 2009 at 1:25 PM, Jesse Wheeler <jwwstpete@gmail.com> wrote:
> I've observed some similar behavior on RAID-5 w/ SuSE Enterprise
> Server 10.2 SP2 with our eDirectory boxes.  While this is different
> from my current issue that I posted today, I will say that I have had
> -- on average -- better luck with RHEL-based distributions than
> SuSE variants.

I can't say I've made any of those comparisons.  Given a choice, I'm
sticking with openSUSE ...

11.2's RAID behavior 'feels' a bit more fragile than it was on 11.1.
That might be simply because I'm a bit paranoid now what with the
recent FAIL.

I'm trying to figure out if using kernel-desktop, rather than
kernel-server, might have anything to do with this.  I'm guessing, but
RAID1 + RAID10 is not your usual "desktop" config, so I thought it might
be worth a check.  So far, I haven't found enough detail on the
differences.
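
One way to look for such differences is to compare the MD-related options in the two kernels' configuration files. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the two sample configurations are invented for illustration and are NOT the actual openSUSE kernel-desktop or kernel-server settings, and the real files are assumed to be available (commonly as /boot/config-<version>).

```python
def md_options(config_text, prefixes=("CONFIG_BLK_DEV_MD", "CONFIG_MD_")):
    """Extract MD-related options from kernel config text as {name: value}."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# ") and line.endswith(" is not set"):
            # Disabled options appear as "# CONFIG_FOO is not set".
            name = line[2:-len(" is not set")]
            value = "n"
        elif "=" in line and not line.startswith("#"):
            name, _, value = line.partition("=")
        else:
            continue
        if name.startswith(prefixes):
            options[name] = value
    return options


def config_diff(first_text, second_text):
    """Return {option: (first_value, second_value)} where the two differ."""
    first, second = md_options(first_text), md_options(second_text)
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(set(first) | set(second))
        if first.get(name) != second.get(name)
    }


# Invented sample data, for illustration only.
DESKTOP = """\
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
CONFIG_EXT3_FS=y
"""
SERVER = """\
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_EXT3_FS=y
"""

print(config_diff(DESKTOP, SERVER))
```

An empty result would suggest the two kernel flavors do not differ in MD support, which would point the search elsewhere.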

BenDJ

* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-02 22:33   ` Ben DJ
@ 2009-11-02 23:36     ` Jesse Wheeler
  2009-11-03  0:09       ` Ben DJ
  0 siblings, 1 reply; 7+ messages in thread
From: Jesse Wheeler @ 2009-11-02 23:36 UTC (permalink / raw)
  To: Ben DJ; +Cc: linux-raid

Ben:

> I can't say I've made any of those comparisons.  Given a choice, I'm
> sticking with openSUSE ...

Totally understandable :o).

> That might be simply because I'm a bit paranoid now what with the
> recent FAIL.

That would make anyone nervous.  As someone who has lost quite a
bit-o'-data within a span of a few years, I can understand that
feeling.  Failures always suck.

> I'm trying to figure out if using kernel-desktop, rather than
> kernel-server, might have anything to do with this.  I'm guessing, but
> RAID1 + RAID10 is not your usual "desktop" config, so I thought it might
> be worth a check.  So far, I haven't found enough detail on the
> differences.

I just re-read your original post.  I guess I read too fast the first time
and missed that it was OpenSuSE 11.2 rather than the SLES distribution.
For that, again, I apologize.  However, is this deployment for server
purposes, desktop, or just learning your way around the Linux
way-of-functioning?  Please keep in mind that OpenSuSE is Novell's
equivalent of Red Hat's Fedora -- a proving ground/sandbox with some
form of additional public QA before it makes it into their mainline
Enterprise kernel.  I would shy away from anything that hasn't gone
through a formal QA process before putting it into production use.

Of course, I say that after a production install of CentOS 5.4, which
is the equivalent of RHEL 5.4, which was itself derived from Fedora.

Point taken.  You've gotta love open source!

Have you checked Novell's OpenSUSE mailing list and community forums?
The last time I checked, there were a few people that were quite
knowledgeable.  I might be able to give you some e-mail addresses if
you're at a dead-end.  If so, send me a msg off-list and I'd be
glad to help you.

In regards,

--
Jesse W. Wheeler
Member-Owner
Devotio Consulting, L.L.C.
--



* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-02 23:36     ` Jesse Wheeler
@ 2009-11-03  0:09       ` Ben DJ
  2009-11-03  0:11         ` Ben DJ
  2009-11-03  1:06         ` Jesse Wheeler
  0 siblings, 2 replies; 7+ messages in thread
From: Ben DJ @ 2009-11-03  0:09 UTC (permalink / raw)
  To: Jesse Wheeler; +Cc: linux-raid

Jesse,

On Mon, Nov 2, 2009 at 3:36 PM, Jesse Wheeler <jwwstpete@gmail.com> wrote:
> Please keep in mind that OpenSuSE is Novell's
> equivalent of Red Hat's Fedora -- a proving ground/sandbox with some
> form of additional public QA before it makes it into their mainline
> Enterprise kernel.  I would shy away from anything that hasn't gone
> through a formal QA process before putting it into production use.

This box is a model for next-gen 'production desktop' use around
here.  Typical users are a bit 'past middle' between Office Staff &
Kernel Hackers.  We're sticking with opensuse, as SLE* & Redhat/Centos
seem generally too far behind a reasonable 'edge'.  I know that there
will be issues; compromises will be required.

But that's a different discussion.

I just need to get these 'mysterious' behaviors under control; or, at
least the causes understood.

BenDJ

* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-03  0:09       ` Ben DJ
@ 2009-11-03  0:11         ` Ben DJ
  2009-11-03  1:06         ` Jesse Wheeler
  1 sibling, 0 replies; 7+ messages in thread
From: Ben DJ @ 2009-11-03  0:11 UTC (permalink / raw)
  To: Jesse Wheeler; +Cc: linux-raid

Jesse,

On Mon, Nov 2, 2009 at 3:36 PM, Jesse Wheeler <jwwstpete@gmail.com> wrote:
> Please keep in mind that OpenSuSE is Novell's
> equivalent of Red Hat's Fedora -- a proving ground/sandbox with some
> form of additional public QA before it makes it into their mainline
> Enterprise kernel.  I would shy away from anything that hasn't gone
> through a formal QA process before putting it into production use.

This box is a model for next-gen 'production desktop' use around
here.  Typical users are a bit 'past middle' between Office Staff &
Kernel Hackers.  We're sticking with opensuse, as SLE* & Redhat/Centos
seem generally too far behind a reasonable 'edge'.  I know that there
will be issues; compromises will be required.

But that's a different discussion.

I just need to get these 'mysterious' behaviors under control; or, at
least the causes understood.

BenDJ


* Re: system update killed /boot RAID-1 array auto-assembly/mount. why?
  2009-11-03  0:09       ` Ben DJ
  2009-11-03  0:11         ` Ben DJ
@ 2009-11-03  1:06         ` Jesse Wheeler
  1 sibling, 0 replies; 7+ messages in thread
From: Jesse Wheeler @ 2009-11-03  1:06 UTC (permalink / raw)
  To: Ben DJ; +Cc: linux-raid

Understood.  I wish you luck!



-- 
Jesse W. Wheeler
mailto: jwwstpete@gmail.com
Charlotte, North Carolina - U.S.A.

