* md extension to support booting from raid whole disks.
@ 2009-04-24 12:08 Daniel Reurich
  2009-04-27 15:08 ` Goswin von Brederlow
  2009-04-28 23:07 ` Neil Brown
  0 siblings, 2 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-24 12:08 UTC (permalink / raw)
  To: linux-raid

Hi,

I have got Linux successfully booting from a raid5 whole-disk set, with the
/ and /boot filesystems on that raid5 disk set.  This is possible thanks to
grub2 (with some hacking to make it install correctly).

The downside I found is that the system won't boot if the 1st disk is
missing, as that contains the boot sector and core.img that grub
requires to boot.  The Linux kernel also gets unhappy about the partition
table of the first disk, saying the volume is larger than the geometry of
the physical first disk.

I was wondering if it was worth extending the md superblock to make it
easier to boot raided whole disks.  There are several ideas I had
thought of that would make this achievable:

A first cylinder which needs to be mirrored across all the devices.
This would be for the Volume/Master Boot Record + boot sector code.
The grub2 boot sector + core.img should fit in here (or at least enough of
it to bring grub up with the appropriate raid drivers).

We could include a dummy partition table with the whole disk in the 1st
partition, labeled as something like linux-raid (0xfd) or Non-FS data
(0xda).

The second cylinder holds the md superblock and write-intent bitmap, and
the raid volume starts at the beginning of the 3rd cylinder.

This would allow the scheme to work for booting off whole-disk raid
arrays of all levels using grub2, without any significant changes
required in grub2.
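
For illustration, a minimal sketch of the layout proposed above, assuming
512-byte sectors and a notional 64-sector cylinder (the exact numbers are
assumptions for the example, not any existing md format):

    # sectors   0-63   : MBR + bootloader code, mirrored on every member disk
    # sectors  64-127  : md superblock + write-intent bitmap
    # sectors 128-end  : raid data area
    # peek at the reserved boot region on one member disk:
    dd if=/dev/sda bs=512 count=64 2>/dev/null | hexdump -C | head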

Thanks for the awesomeness that is linux software raid.
 
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-24 12:08 md extension to support booting from raid whole disks Daniel Reurich
@ 2009-04-27 15:08 ` Goswin von Brederlow
  2009-04-28  4:58   ` H. Peter Anvin
  2009-04-28  7:08   ` Daniel Reurich
  2009-04-28 23:07 ` Neil Brown
  1 sibling, 2 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-04-27 15:08 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: linux-raid

Daniel Reurich <daniel@centurion.net.nz> writes:

> Hi,
>
> I have got Linux successfully booting from a raid5 whole-disk set, with the
> / and /boot filesystems on that raid5 disk set.  This is possible thanks to
> grub2 (with some hacking to make it install correctly).
>
> The downside I found is that the system won't boot if the 1st disk is
> missing, as that contains the boot sector and core.img that grub
> requires to boot.  The Linux kernel also gets unhappy about the partition
> table of the first disk, saying the volume is larger than the geometry of
> the physical first disk.
>
> I was wondering if it was worth extending the md superblock to make it
> easier to boot raided whole disks.  There are several ideas I had
> thought of that would make this achievable:

Or grub2 could be taught to install itself into all MBRs of all
drives in a raid set.

> A first cylinder which needs to be mirrored across all the devices.
> This would be for the Volume/Master Boot Record + boot sector code.
> The grub2 boot sector + core.img should fit in here (or at least enough of
> it to bring grub up with the appropriate raid drivers).
>
> We could include a dummy partition table with the whole disk in the 1st
> partition, labeled as something like linux-raid (0xfd) or Non-FS data
> (0xda).
>
> The second cylinder holds the md superblock and write-intent bitmap, and
> the raid volume starts at the beginning of the 3rd cylinder.
>
> This would allow the scheme to work for booting off whole-disk raid
> arrays of all levels using grub2, without any significant changes
> required in grub2.

That part I really like. I'm just not sure how complicated the code
for this would be compared to teaching grub2 to handle this case
itself.

> Thanks for the awesomeness that is linux software raid.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-27 15:08 ` Goswin von Brederlow
@ 2009-04-28  4:58   ` H. Peter Anvin
  2009-04-28  6:26     ` Luca Berra
                       ` (2 more replies)
  2009-04-28  7:08   ` Daniel Reurich
  1 sibling, 3 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-28  4:58 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Reurich, linux-raid

Goswin von Brederlow wrote:
> 
> Or grub2 could be taught to install itself into all MBRs of all
> drives in a raid set.
> 

... which is obviously completely wrong, given that that would break the
whole RAID layer.

The right thing is to use a RAID-1 partition.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28  4:58   ` H. Peter Anvin
@ 2009-04-28  6:26     ` Luca Berra
  2009-04-28  9:35     ` Goswin von Brederlow
  2009-04-28 18:24     ` Dan Williams
  2 siblings, 0 replies; 76+ messages in thread
From: Luca Berra @ 2009-04-28  6:26 UTC (permalink / raw)
  To: linux-raid

On Mon, Apr 27, 2009 at 09:58:08PM -0700, H. Peter Anvin wrote:
>Goswin von Brederlow wrote:
>> 
>> Or grub2 could be taught to install itself into all MBRs of all
>> drives in a raid set.

This is what is usually done with grub 0.97, though the implementations I
have seen suck a lot.

>... which is obviously completely wrong, given that that would break the
>whole RAID layer.

why?

>The right thing is to use a RAID-1 partition.

I see no big problem in reserving some space on all boot drives for a
smallish boot loader.  Maybe it is just a matter of defining a 1.3 metadata
format which leaves more room at the start of the device than 1.2 does, for
this very purpose.

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-27 15:08 ` Goswin von Brederlow
  2009-04-28  4:58   ` H. Peter Anvin
@ 2009-04-28  7:08   ` Daniel Reurich
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28  7:08 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid


> >
> > I was wondering if it was worth extending the md superblock to make it
> > easier for booting raided whole disks.  There are several ideas I had
> > thought of that would make this achieveable:
> 
> Or grub2 could be taught to install itself into all MBRs of all
> drives in a raid set.

Grub2 largely already does this (at least for v0.90 superblocks).  It does
need some work to make it go for whole disks, but my proposal solves
this and reduces the needed work on grub.

> 
> > A first cylinder which needs to be mirrored across all the devices.
> > This would be for the Volume/Master Boot Record + boot sector code.
> > The grub2 boot sector + core.img should fit in here (or at least enough of
> > it to bring grub up with the appropriate raid drivers).
> >
> > We could include a dummy partition table with the whole disk in the 1st
> > partition, labeled as something like linux-raid (0xfd) or Non-FS data
> > (0xda).
> >
> > The second cylinder holds the md superblock and write-intent bitmap, and
> > the raid volume starts at the beginning of the 3rd cylinder.
> >
> > This would allow the scheme to work for booting off whole-disk raid
> > arrays of all levels using grub2, without any significant changes
> > required in grub2.
> 
> That part I really like. I'm just not sure how complicated the code
> for this would be compared to teaching grub2 to handle this case
> itself.

The biggest problem I see is with the creation/resync of devices, and the
replication of the important stuff across all the member disks (i.e. the
1st cylinder, sans the partition table if it exists, because the disk
geometry may differ between member disks; though I'd prefer no partition
table at all, simplifying the implementation here).

I'm not yet sure how much of this would be changes to mdadm versus the
md kernel module.  (I assume the replication of the first cylinder would
need to be in the kernel module, and I believe this is already done for
the write-intent bitmap anyway, so it shouldn't be too difficult.)

I guess the superblock location isn't even critical: it could go at
either the end of the device, as it already does for v0.90/1.0, or the
start of the device, as described in my previous post.
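
For reference, the standard md superblock placements:

    # 0.90, 1.0 : superblock near the end of the device
    # 1.1       : superblock at the very start of the device
    # 1.2       : superblock 4 KiB from the start of the device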




-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28  4:58   ` H. Peter Anvin
  2009-04-28  6:26     ` Luca Berra
@ 2009-04-28  9:35     ` Goswin von Brederlow
  2009-04-28 11:21       ` Daniel Reurich
  2009-04-28 17:36       ` H. Peter Anvin
  2009-04-28 18:24     ` Dan Williams
  2 siblings, 2 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-04-28  9:35 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Goswin von Brederlow, Daniel Reurich, linux-raid

"H. Peter Anvin" <hpa@zytor.com> writes:

> Goswin von Brederlow wrote:
>> 
>> Or grub2 could be taught to install itself into all MBRs of all
>> drives in a raid set.
>> 
>
> ... which is obviously completely wrong, given that that would break the
> whole RAID layer.

Not if there is unused space for the MBR on every raid disk. The 1.2
metadata format leaves the first 4k free on every disk.
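
mdadm can report exactly where a 1.x superblock and its data begin; the
"Offset" fields below are real mdadm --examine output fields, though the
numbers shown are only illustrative:

    mdadm --examine /dev/sda | grep -i offset
    #     Data Offset : 2048 sectors   (where the raid data area starts)
    #    Super Offset : 8 sectors      (the 1.2 superblock, 4 KiB in)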

> The right thing is to use a RAID-1 partition.
>
> 	-hpa

And how do you create a raid-1 partition on a whole disk raid?

MfG
        Goswin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28  9:35     ` Goswin von Brederlow
@ 2009-04-28 11:21       ` Daniel Reurich
  2009-04-28 17:36       ` H. Peter Anvin
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28 11:21 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: H. Peter Anvin, linux-raid

On Tue, 2009-04-28 at 11:35 +0200, Goswin von Brederlow wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> > Goswin von Brederlow wrote:
> >> 
> >> Or grub2 could be taught to install itself into all MBRs of all
> >> drives in a raid set.
> >> 
> >
> > ... which is obviously completely wrong, given that that would break the
> > whole RAID layer.
> 
> Not if there is unused space for the MBR on every raid disk. The 1.2
> metadata format leaves the first 4k free on every disk.

I had considered this, and figured that the space the 1.2 superblock uses
would be right in the middle of your typical bootloader code, and that the
space it takes up depends on how many member disks there are and on the
size of the write-intent bitmap, if one is used.  (I couldn't find details
of where the write-intent bitmap normally goes, so I have assumed that it
goes somewhere nearby, following the superblock.)

What's more, grub would pretty much use the whole of the 1st cylinder, sans
the first 512-byte MBR, for the core.img, especially in the case of
starting a raid array, where it needs the raid modules built into the
core image.
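
A sketch of such an install; --modules is a real grub2 option, but the
module names here are illustrative and vary between grub2 versions:

    # build a core.img that can assemble md arrays, and embed it in the
    # sectors between the MBR and the first partition/data area
    grub-install --modules="raid mdraid" /dev/sda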

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28  9:35     ` Goswin von Brederlow
  2009-04-28 11:21       ` Daniel Reurich
@ 2009-04-28 17:36       ` H. Peter Anvin
  2009-04-28 22:23         ` Daniel Reurich
  1 sibling, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-28 17:36 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Reurich, linux-raid

Goswin von Brederlow wrote:
> 
> And how do you create a raid-1 partition on a whole disk raid?
> 

What you're asking for is a partitioning layer somehow hidden away
inside md.  This is about as intelligent as Dick Cheney.  We already
have partitioning layers, use them.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28  4:58   ` H. Peter Anvin
  2009-04-28  6:26     ` Luca Berra
  2009-04-28  9:35     ` Goswin von Brederlow
@ 2009-04-28 18:24     ` Dan Williams
  2009-04-28 22:19       ` Daniel Reurich
  2 siblings, 1 reply; 76+ messages in thread
From: Dan Williams @ 2009-04-28 18:24 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Goswin von Brederlow, Daniel Reurich, linux-raid

On Mon, Apr 27, 2009 at 9:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Goswin von Brederlow wrote:
>>
>> Or grub2 could be taught to install itself into all MBRs of all
>> drives in a raid set.
>>
>
> ... which is obviously completely wrong, given that that would break the
> whole RAID layer.
>
> The right thing is to use a RAID-1 partition.
>

...or use a metadata format that your platform bios understands and
provides an int 13h vector.  See the new external metadata formats
supported by the mdadm devel-3.0 branch.
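
A minimal sketch of that approach, assuming a BIOS with Intel Matrix
Storage (IMSM) RAID support and mdadm >= 3.0; device names are examples:

    # create an IMSM container, then a volume inside it; the BIOS then
    # exposes the assembled volume to the bootloader via int 13h
    mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 /dev/sda /dev/sdb
    mdadm --create /dev/md/vol0 --level=1 --raid-devices=2 /dev/md/imsm0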

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 18:24     ` Dan Williams
@ 2009-04-28 22:19       ` Daniel Reurich
  2009-04-28 22:26         ` Dan Williams
  2009-04-28 23:05         ` Neil Brown
  0 siblings, 2 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28 22:19 UTC (permalink / raw)
  To: Dan Williams; +Cc: H. Peter Anvin, Goswin von Brederlow, linux-raid

On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:

> 
> ...or use a metadata format that your platform bios understands and
> provides an int 13h vector.  See the new external metadata formats
> supported by the mdadm devel-3.0 branch.

I don't think a metadata format is the right way either.

What we need is a new version of the superblock, with the first cylinder
(32 KB: 64 sectors of 512 bytes) set aside for the bootloader, the
superblock and write-intent bitmap going in the second cylinder, and the
raid data area starting in the 3rd cylinder.

It should be the bootloader's responsibility to install the bootloader
onto each disk's 1st cylinder, but md/mdadm would have to replicate it on
resync or the adding of a new disk.  However, we could consider remapping the
bootloader


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 17:36       ` H. Peter Anvin
@ 2009-04-28 22:23         ` Daniel Reurich
  2009-04-28 23:30           ` H. Peter Anvin
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28 22:23 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Goswin von Brederlow, linux-raid

On Tue, 2009-04-28 at 10:36 -0700, H. Peter Anvin wrote:
> Goswin von Brederlow wrote:
> > 
> > And how do you create a raid-1 partition on a whole disk raid?
> > 
> 
> What you're asking for is a partitioning layer somehow hidden away
> inside md.  This is about as intelligent as Dick Cheney.  We already
> have partitioning layers, use them.
> 
Dick Who???  

No.  What I'm asking for is a superblock layout that allows the boot
loader to be installed on each member disk, so that booting from a
software raid array becomes not only possible, but reliable and easy to
set up.

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 22:19       ` Daniel Reurich
@ 2009-04-28 22:26         ` Dan Williams
  2009-05-01 21:04           ` Goswin von Brederlow
  2009-04-28 23:05         ` Neil Brown
  1 sibling, 1 reply; 76+ messages in thread
From: Dan Williams @ 2009-04-28 22:26 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: H. Peter Anvin, Goswin von Brederlow, linux-raid

On Tue, Apr 28, 2009 at 3:19 PM, Daniel Reurich <daniel@centurion.net.nz> wrote:
> On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:
>
>>
>> ...or use a metadata format that your platform bios understands and
>> provides an int 13h vector.  See the new external metadata formats
>> supported by the mdadm devel-3.0 branch.
>
> I don't think a metadata format is the right way either.

Huh? The bootloader does not need to know anything about raid.  It
just uses int13 calls to read sectors off a "disk".  The fact that the
disk is a software raid5 array is completely hidden from grub.  This
is functionality that has been available via dmraid for some time and
is now being made available with the MD infrastructure and mdadm.

Regards,
Dan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 22:19       ` Daniel Reurich
  2009-04-28 22:26         ` Dan Williams
@ 2009-04-28 23:05         ` Neil Brown
  2009-04-28 23:20           ` H. Peter Anvin
                             ` (2 more replies)
  1 sibling, 3 replies; 76+ messages in thread
From: Neil Brown @ 2009-04-28 23:05 UTC (permalink / raw)
  To: Daniel Reurich
  Cc: Dan Williams, H. Peter Anvin, Goswin von Brederlow, linux-raid

On Wednesday April 29, daniel@centurion.net.nz wrote:
> On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:
> 
> > 
> > ...or use a metadata format that your platform bios understands and
> > provides an int 13h vector.  See the new external metadata formats
> > supported by the mdadm devel-3.0 branch.
> 
> I don't think a metadata format is the right way either.
> 
> What we need is a new version of the superblock, with the first cylinder
> (32 KB: 64 sectors of 512 bytes) set aside for the bootloader, the
> superblock and write-intent bitmap going in the second cylinder, and the
> raid data area starting in the 3rd cylinder.
> 
> It should be the bootloader's responsibility to install the bootloader
> onto each disk's 1st cylinder, but md/mdadm would have to replicate it on
> resync or the adding of a new disk.  However, we could consider remapping the
> bootloader

While I agree with Dan that having a BIOS which understands RAID is a
good way to make this sort of thing "just work", it would be nice if it
could work for people without such a BIOS too.

v1.x metadata has explicit knowledge of where the start of the data
is, so it is quite possible to leave the first few (dozen) sectors
unused (let's not talk about cylinders this century - OK?).
So mdadm could grow a --grub flag to use with --create which arranged
for data/bitmap to not use the first (say) 512 sectors of any device.
(1.1 and 1.2 would still use reserved blocks for the superblock).
[I can cut you a patch to experiment with if you like]
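
For concreteness, usage of such a flag might look like this (the --grub
option is hypothetical, exactly as proposed above; it is not in any
released mdadm):

    # reserve the first 512 sectors of every member for bootloader use
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
          --metadata=1.0 --grub /dev/sda /dev/sdb /dev/sdc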

grub could then write whatever it wants to write to any of these
sectors.

That only leaves the question of what happens when a spare is added to
the array - how does the grub data get written to the space on the
spare.
I would rather that grub were responsible for this, than for md to
treat that unused space as RAID1.
We already have a notification system based on "mdadm --monitor" to
process events.  We could possibly plug grub in to that somehow so
that it gets told to re-write all its special blocks every time
something significant changes in the array.
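
mdadm --monitor can already run an external program on events; a sketch of
wiring in a bootloader-rewrite hook, where the hook path is made up:

    # the hook is invoked as: <program> <event> <md device> [<member device>]
    mdadm --monitor --scan --daemonise \
          --program=/usr/local/sbin/rewrite-boot-blocks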

NeilBrown

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-24 12:08 md extension to support booting from raid whole disks Daniel Reurich
  2009-04-27 15:08 ` Goswin von Brederlow
@ 2009-04-28 23:07 ` Neil Brown
  2009-04-28 23:21   ` Daniel Reurich
  2009-04-28 23:37   ` H. Peter Anvin
  1 sibling, 2 replies; 76+ messages in thread
From: Neil Brown @ 2009-04-28 23:07 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: linux-raid

On Saturday April 25, daniel@centurion.net.nz wrote:
> Hi,
> 
> I have got Linux successfully booting from a raid5 whole-disk set, with the
> / and /boot filesystems on that raid5 disk set.  This is possible thanks to
> grub2 (with some hacking to make it install correctly).
> 
> The downside I found is that the system won't boot if the 1st disk is
> missing, as that contains the boot sector and core.img that grub
> requires to boot.  The Linux kernel also gets unhappy about the partition
> table of the first disk, saying the volume is larger than the geometry of
> the physical first disk.

So where does grub store the core.img in this setup?

While the kernel might be noisy about large partitions, it should
just be noise - everything should still work  - right?

NeilBrown

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:05         ` Neil Brown
@ 2009-04-28 23:20           ` H. Peter Anvin
  2009-04-29  0:00             ` Daniel Reurich
  2009-04-29  7:45             ` md extension to support booting from raid whole disks Luca Berra
  2009-04-28 23:41           ` Daniel Reurich
  2009-05-01 21:33           ` Goswin von Brederlow
  2 siblings, 2 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-28 23:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: Daniel Reurich, Dan Williams, Goswin von Brederlow, linux-raid

Neil Brown wrote:
> 
> That only leaves the question of what happens when a spare is added to
> the array - how does the grub data get written to the space on the
> spare.
> I would rather that grub were responsible for this, than for md to
> treat that unused space as RAID1.
> We already have a notification system based on "mdadm --monitor" to
> process events.  We could possibly plug grub in to that somehow so
> that it gets told to re-write all its special blocks every time
> something significant changes in the array.
> 

I have multiple issues with this concept (including promoting Grub2, but
let's not get into that.)

For this to be reliable, there is only one sensible configuration, which
is for /boot to be a RAID-1, which is better handled by -- guess what --
partitioning systems; and we already have quite a few of those that work
just fine, thank you.  Otherwise there WILL be configurations -- caused
by controller failures if nothing else -- that simply will not boot even
though the system is otherwise functional.  Promoting this kind of stuff
is criminally stupid.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:07 ` Neil Brown
@ 2009-04-28 23:21   ` Daniel Reurich
  2009-04-28 23:37   ` H. Peter Anvin
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28 23:21 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid


> 
> So where does grub store the core.img in this setup?

It's embedded after the MBR in the remaining sectors of the first
cylinder.
> 
> While the kernel might be noisy about large partitions, it should
> just be noise - everything should still work  - right?

It adds 20-30 seconds to the kernel start-up time, but yes, it does
work.


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 22:23         ` Daniel Reurich
@ 2009-04-28 23:30           ` H. Peter Anvin
  2009-04-29  0:02             ` Daniel Reurich
  0 siblings, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-28 23:30 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Goswin von Brederlow, linux-raid

Daniel Reurich wrote:
> On Tue, 2009-04-28 at 10:36 -0700, H. Peter Anvin wrote:
>> Goswin von Brederlow wrote:
>>> And how do you create a raid-1 partition on a whole disk raid?
>>>
>> What you're asking for is a partitioning layer somehow hidden away
>> inside md.  This is about as intelligent as Dick Cheney.  We already
>> have partitioning layers, use them.
>>
> Dick Who???  
> 
> No.  What I'm asking for is a superblock layout that allows the boot
> loader to be installed on each member disk, so that booting from a
> software raid array becomes not only possible, but reliable and easy to
> set up.
> 

What you're asking for is a partitioning scheme hidden away in md.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:07 ` Neil Brown
  2009-04-28 23:21   ` Daniel Reurich
@ 2009-04-28 23:37   ` H. Peter Anvin
  2009-04-29  0:05     ` Daniel Reurich
  1 sibling, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-28 23:37 UTC (permalink / raw)
  To: Neil Brown; +Cc: Daniel Reurich, linux-raid

Neil Brown wrote:
> 
> So where does grub store the core.img in this setup?
> 
> While the kernel might be noisy about large partitions, it should
> just be noise - everything should still work  - right?
> 

For large disks, don't use MS-DOS partition tables; use GPT or another
clean scheme.

(For what it's worth, the MS-DOS partition table will work just fine for
the purpose of carving out a small /boot at the beginning, too.)

	-hpa
-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:05         ` Neil Brown
  2009-04-28 23:20           ` H. Peter Anvin
@ 2009-04-28 23:41           ` Daniel Reurich
  2009-04-29  0:01             ` H. Peter Anvin
  2009-05-01 21:33           ` Goswin von Brederlow
  2 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-28 23:41 UTC (permalink / raw)
  To: Neil Brown; +Cc: Dan Williams, H. Peter Anvin, Goswin von Brederlow, linux-raid


> 
> v1.x metadata has explicit knowledge of where the start of the data
> is, so it is quite possible to leave the first few (dozen) sectors
> unused (let's not talk about cylinders this century - OK?).
> So mdadm could grow a --grub flag to use with --create which arranged
> for data/bitmap to not use the first (say) 512 sectors of any device.
> (1.1 and 1.2 would still use reserved blocks for the superblock).
> [I can cut you a patch to experiment with if you like]

That would be nice.  The 1.1 superblock is no good, as the boot sector
goes in the same place, and 1.2 is where I expect grub would be writing
data to.  I'll check this though.  Grub still needs patching to
understand v1.x superblocks, so that could include blacklisting the
location of a v1.2 superblock and some space for the write-intent
bitmap.  (Is there a way to determine from the superblock where the w-i
bitmap gets located and how big it is?)

I'd say let's reserve the first 64K (two cylinders) for the bootloader and
superblock.
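
As a starting point, mdadm can dump a bitmap straight from a member
device; a quick sketch, with an example device name:

    # print the write-intent bitmap header found on this member
    mdadm --examine-bitmap /dev/sda1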


> grub could then write whatever it wants to write to any of these
> sectors.
> 
> That only leaves the question of what happens when a spare is added to
> the array - how does the grub data get written to the space on the
> spare.
> I would rather that grub were responsible for this, than for md to
> treat that unused space as RAID1.

Fair enough for the short term, but I imagine in the long run it would
be a better way than calling an external application.  Isn't this
already done for the w-i bitmap anyway?

> We already have a notification system based on "mdadm --monitor" to
> process events.  We could possibly plug grub in to that somehow so
> that it gets told to re-write all its special blocks every time
> something significant changes in the array.

If mdmon can call external commands, it should only need to call the
appropriate grub-install [/dev/md/dX | "(md0)"], and it would rewrite its
boot blocks on all the devices.

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:20           ` H. Peter Anvin
@ 2009-04-29  0:00             ` Daniel Reurich
  2009-04-29  0:04               ` H. Peter Anvin
  2009-04-29  7:45             ` md extension to support booting from raid whole disks Luca Berra
  1 sibling, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:00 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

On Tue, 2009-04-28 at 16:20 -0700, H. Peter Anvin wrote:
> Neil Brown wrote:
> > 
> > That only leaves the question of what happens when a spare is added to
> > the array - how does the grub data get written to the space on the
> > spare.
> > I would rather that grub were responsible for this, than for md to
> > treat that unused space as RAID1.
> > We already have a notification system based on "mdadm --monitor" to
> > process events.  We could possibly plug grub in to that somehow so
> > that it gets told to re-write all its special blocks every time
> > something significant changes in the array.
> > 
> 
> I have multiple issues with this concept (including promoting Grub2, but
> let's not get into that.)
> 
It could be a call to a generic interface for reinstalling the
bootloader.  It's not intended to be grub-specific; it just so happens
that grub2 is the closest to having a working boot-from-software-raid
implementation.

> For this to be reliable, there is only one sensible configuration, which
> is for /boot to be a RAID-1, which is better handled by -- guess what --
> partitioning systems; and we already have quite a few of those that work
> just fine, thank you.  Otherwise there WILL be configurations -- caused
> by controller failures if nothing else -- that simply will not boot even
> though the system is otherwise functional.  Promoting this kind of stuff
> is criminally stupid.

I disagree.  Grub is quite capable of booting from and assembling a
raid5 volume and accessing its partitions' contents, even if the array
is degraded.  All I'm asking for is that the first 64 kbytes of the disk
be reserved, and some of it possibly (but not necessarily) replicated, so
that a bootloader capable of assembling a raid array can be installed at
the start of each member disk; then whichever disk the bios decides to
boot from, it will always boot.
 
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:41           ` Daniel Reurich
@ 2009-04-29  0:01             ` H. Peter Anvin
  0 siblings, 0 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  0:01 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

Daniel Reurich wrote:
> 
> That would be nice.  The 1.1 superblock is no good as the bootsector
> goes in the same place, and 1.2 is where I expect grub would be writing
> data too.  I'll check this though.  Grub still needs patching to
> understand v1.X superblocks, so that could include blacklisting the
> location of a v1.2 superblock location and some for the write-intent
> bitmap. (Is there a way to determine where the w-i bitmap gets located
> and how big it is from the super block.)
> 
> I'd say put lets reserve the first 64K (2 cylinders) for boot and
> superblock.
> 

I say let's tell people to do the only sane thing and carve out an
appropriate chunk using existing partitioning methods.

Anything else is reinventing the wheel, badly, at the wrong level.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:30           ` H. Peter Anvin
@ 2009-04-29  0:02             ` Daniel Reurich
  2009-04-29 11:32               ` John Robinson
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Goswin von Brederlow, linux-raid


> > No.  What I'm asking for is a superblock layout that allows the boot
> > loader to be installed on each member disk, so that booting from a
> > software raid array becomes not only possible, but reliable and easy to
> > set up.
> > 
> 
> What you're asking for is a partitioning scheme hidden away in md.
> 
Nope, just the first 64K of each member disk to be reserved for a
raid-aware bootloader.

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:00             ` Daniel Reurich
@ 2009-04-29  0:04               ` H. Peter Anvin
  2009-04-29  0:20                 ` Daniel Reurich
  0 siblings, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  0:04 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

Daniel Reurich wrote:
> 
>> For this to be reliable, there is only one sensible configuration, which
>> is for /boot to be a RAID-1, which is better handled by -- guess what --
>> partitioning systems; and we already have quite a few of those that work
>> just fine, thank you.  Otherwise there WILL be configurations -- caused
>> by controller failures if nothing else -- that simply will not boot even
>> though the system is otherwise functional.  Promoting this kind of stuff
>> is criminally stupid.
> 
> I disagree.  Grub is quite capable of booting from and assembling a
> raid5 volume and accessing its partitions' contents, even if the array
> is degraded.  All I'm asking for is that the first 64 kbytes of the disk
> be reserved, and some of it possibly (but not necessarily) replicated, so
> that a bootloader capable of assembling a raid array can be installed at
> the start of each member disk; then whichever disk the bios decides to
> boot from, it will always boot.
>  

Grub is capable of doing that IF THE FIRMWARE CAN REACH IT.

You seem to have the happy notion that this is something typical, which
frequently isn't the case.

What's worse, you're clearly of the opinion that this is something that
should be promoted to users, which is the "criminal" part of "criminally
stupid."

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:37   ` H. Peter Anvin
@ 2009-04-29  0:05     ` Daniel Reurich
  2009-04-29  0:06       ` H. Peter Anvin
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:05 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Neil Brown, linux-raid


> For large disks, don't use MS-DOS partition tables; use GPT or another
> clean scheme.

That's not the point.  Stop trying to derail what I'm trying to
achieve.
> 
> (For what it's worth, the MS-DOS partition table will work just fine for
> the purpose of carving out a small /boot at the beginning, too.)
> 
> 	-hpa
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:05     ` Daniel Reurich
@ 2009-04-29  0:06       ` H. Peter Anvin
  2009-04-29  0:36         ` Daniel Reurich
  0 siblings, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  0:06 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Neil Brown, linux-raid

Daniel Reurich wrote:
>> For large disks, don't use MS-DOS partition tables; use GPT or another
>> clean scheme.
> 
> That's not the point.  Stop trying to derail what I'm trying to
> achieve.

Not on your life.

You're trying to do something that will cause users untold hours of
pain and suffering, and I'm trying to stop your idiotic scheme any way I
can.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:04               ` H. Peter Anvin
@ 2009-04-29  0:20                 ` Daniel Reurich
  2009-04-29  0:28                   ` H. Peter Anvin
  2009-04-29 22:43                   ` md extension to support booting from raid whole disks, raid6, grub2, lvm2 Michael Ole Olsen
  0 siblings, 2 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

On Tue, 2009-04-28 at 17:04 -0700, H. Peter Anvin wrote:
> Daniel Reurich wrote:
> > 
> >> For this to be reliable, there is only one sensible configuration, which
> >> is for /boot to be a RAID-1, which is better handled by -- guess what --
> >> partitioning systems; and we already have quite a few of those that work
> >> just fine, thank you.  Otherwise there WILL be configurations -- caused
> >> by controller failures if nothing else -- that simply will not boot even
> >> though the system is otherwise functional.  Promoting this kind of stuff
> >> is criminally stupid.
> > 
> > I disagree.  Grub is quite capable of booting from and assembling a
> > raid5 volume and accessing its partitions' contents, even if the array
> > is degraded.  All I'm asking for is that the first 64 kbytes of the disk
> > be reserved, and some of it possibly (but not necessarily) replicated, so
> > that a bootloader capable of assembling a raid array can be installed at
> > the start of each member disk; then whichever disk the bios decides to
> > boot from, it will always boot.
> >  
> 
> Grub is capable of doing that IF THE FIRMWARE CAN REACH IT.

Well, if the firmware can't find one of the disks, then it doesn't matter
what scheme we have.  Even a single disk won't work.
> 
> You seem to have the happy notion that this is something typical, which
> frequently isn't the case.

I'd say it's typical of 100% of PCs, Macs and just about anything else
that boots off a hard disk without a hardware raid controller.
> 
> What's worse, you're clearly of the opinion that this is something that
> should be promoted to users, which is the "criminal" part of "criminally
> stupid."

I'd like it for me, and to prove it can be done and is a cleaner and
less administratively intensive way of doing it than teaching the
OS/user how to partition a disk and add each partition into its
respective raid array each time they need to replace or add a new disk
to their array(s).

Whether this proves reliable and stable enough to be promoted to users
can only be seen once it's proven (or not).

What's your beef?  MD already reserves some space for the superblock and
write-intent bitmap (which I believe is also replicated across the
member disks), so why not add some space to this to make it possible for
a bootloader as well?


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:20                 ` Daniel Reurich
@ 2009-04-29  0:28                   ` H. Peter Anvin
  2009-04-29  0:43                     ` Daniel Reurich
  2009-04-29 22:43                   ` md extension to support booting from raid whole disks, raid6, grub2, lvm2 Michael Ole Olsen
  1 sibling, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  0:28 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

Daniel Reurich wrote:
>>>  
>> Grub is capable of doing that IF THE FIRMWARE CAN REACH IT.
> 
> Well, if the firmware can't find one of the disks, then it doesn't matter
> what scheme we have.  Even a single disk won't work.

It is *quite* common that firmware can reach a subset of the disks.  If
not when the system is set up, then when a controller is blown and the
user has to install a new one.  I have seen this particular malfunction
up close more times than I can count.

> What's your beef?  MD already reserves some space for the superblock and
> write-intent bitmap (which I believe is also replicated across the
> member disks), so why not add some space to this to make it possible for
> a bootloader as well?

My beef is that you're actively promoting an extremely dangerous
concept, dangerous exactly because it is seductive -- "it seems so
easy."  Most users (you included, apparently) will typically have no
notion of the failure modes, and will pick the "easy" option.

Booting is ugly business.  I have dealt with the subtleties for almost
two decades, and it riles me when people go and foist off bad ideas on
users.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:06       ` H. Peter Anvin
@ 2009-04-29  0:36         ` Daniel Reurich
  2009-04-29  0:44           ` H. Peter Anvin
  2009-04-29  7:07           ` Gabor Gombas
  0 siblings, 2 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Neil Brown, linux-raid

On Tue, 2009-04-28 at 17:06 -0700, H. Peter Anvin wrote:
> Daniel Reurich wrote:
> >> For large disks, don't use MS-DOS partition tables; use GPT or another
> >> clean scheme.
> > 
> > That's not the point.  Stop trying to derail what I'm trying to
> > achieve.
> 
> Not on your life.
> 
> You're trying to do something that will cause users untold hours of
> pain and suffering, and I'm trying to stop your idiotic scheme any way I
> can.

We are already in pain from not having the capability to reliably boot
from a software raid array.

This is the last hurdle to supplanting proprietary hardware raid
controllers, with their proprietary drivers and management tool stacks,
and making software raid actually usable for the average Joe with his
small business server built from cheap commodity hardware running Linux.

Or is this what you're scared of?



  
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:28                   ` H. Peter Anvin
@ 2009-04-29  0:43                     ` Daniel Reurich
  2009-04-29  6:43                       ` Gabor Gombas
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  0:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Neil Brown, Dan Williams, Goswin von Brederlow, linux-raid

On Tue, 2009-04-28 at 17:28 -0700, H. Peter Anvin wrote:
> Daniel Reurich wrote:
> >>>  
> >> Grub is capable of doing that IF THE FIRMWARE CAN REACH IT.
> > 
> > Well, if the firmware can't find one of the disks, then it doesn't matter
> > what scheme we have.  Even a single disk won't work.
> 
> It is *quite* common that firmware can reach a subset of the disks.  If
> not when the system is set up, then when a controller is blown and the
> user has to install a new one.  I have seen this particular malfunction
> up close more times than I can count.
> 
In which case you're probably using a hardware raid controller anyway, so
it's not our problem.  Otherwise, if the array is broken by a failed
controller, we probably shouldn't boot off it anyway.

> > What's your beef?  MD already reserves some space for the superblock and
> > write-intent bitmap (which I believe is also replicated across the
> > member disks), so why not add some space to this to make it possible for
> > a bootloader as well?
> 
> My beef is that you're actively promoting an extremely dangerous
> concept, dangerous exactly because it is seductive -- "it seems so
> easy."  Most users (you included, apparently) will typically have no
> notion of the failure modes, and will pick the "easy" option.
> 
What's specifically dangerous about it?  Define the failure modes that
this scheme is unable to cope with but should.

> Booting is ugly business.  I have dealt with the subtleties for almost
> two decades, and it riles me when people go and foist off bad ideas on
> users.

Then stop being so emotive about it, and explain the failings of the
scheme rather than pointing the finger at me and saying I don't know.

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:36         ` Daniel Reurich
@ 2009-04-29  0:44           ` H. Peter Anvin
       [not found]             ` <1240968482.18303.1028.camel@ezra>
  2009-04-30  2:41             ` Daniel Reurich
  2009-04-29  7:07           ` Gabor Gombas
  1 sibling, 2 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  0:44 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Neil Brown, linux-raid

Daniel Reurich wrote:
> 
> We are already in pain from not having the capability to reliably boot
> from a software raid array.
> 

We have that just fine.  Use a RAID-1, which means that as long as the
firmware can find ANY disk you can boot from it, at which time you have
the full library of Linux drivers available, and it doesn't matter how
badly things work.

> This is the last hurdle to supplanting proprietary hardware raid
> controllers, with their proprietary drivers and management tool stacks,
> and making software raid actually usable for the average Joe with his
> small business server built from cheap commodity hardware running Linux.
>
> Or is this what you're scared of?

The cheap commodity hardware running Linux is exactly what I want to
make work correctly; in particular I want it to work even when Joe finds
that a card goes dead in his box and he buys another one off the shelf.
 At that point, it should JUST WORK, no matter how broken the firmware
is (extremely common) or even if the card comes with no firmware at all
(common enough).

We already have the technology for splitting the disk between the boot
region and the array region (partition tables) and for making sure that the
boot region is fully replicated (RAID-1).  That is what we should be
deploying, and if it is somehow too hard to deploy, that is what should
be fixed.
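
A minimal sketch of that conventional setup (device names and layout are
examples; 0.90 metadata keeps the superblock at the end of each partition,
so the boot code sees a plain filesystem):

    # each disk carries a small boot partition (sdX1) and a data partition (sdX2)
    mdadm --create /dev/md0 --level=1 --raid-devices=3 --metadata=0.90 \
          /dev/sda1 /dev/sdb1 /dev/sdc1    # mirrored /boot, bootable from any one disk
    mdadm --create /dev/md1 --level=5 --raid-devices=3 \
          /dev/sda2 /dev/sdb2 /dev/sdc2    # the big array for everything else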

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
       [not found]               ` <49F7B162.8060301@zytor.com>
@ 2009-04-29  2:08                 ` Daniel Reurich
  2009-04-29  2:33                   ` H. Peter Anvin
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-04-29  2:08 UTC (permalink / raw)
  To: linux-raid

> Grub2 can't be "fixed" with anything less than replicating the entire
> device driver system of Linux in it, which is equivalent to turning it
> into Linux.
> 
Ok then....

So we then need to make mdadm --add hook into some tools that can
automatically replicate the partition table layout, add each
partition into its respective raid volume, and
install/replicate the bootloader onto the disk.

Also, we should really have a tool to set this up properly on
the initial creation of the raid.
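
A sketch of what such a hook might run when a blank /dev/sdc replaces a
failed member (device names are examples):

    sfdisk -d /dev/sda | sfdisk /dev/sdc   # clone the partition layout of a healthy disk
    mdadm /dev/md0 --add /dev/sdc1         # add each partition to its array
    mdadm /dev/md1 --add /dev/sdc2
    grub-install /dev/sdc                  # reinstall the boot code on the new disk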


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  2:08                 ` Daniel Reurich
@ 2009-04-29  2:33                   ` H. Peter Anvin
  0 siblings, 0 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29  2:33 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: linux-raid

Daniel Reurich wrote:
>> Grub2 can't be "fixed" with anything less than replicating the entire
>> device driver system of Linux in it, which is equivalent to turning it
>> into Linux.
>>
> Ok then....
> 
> So we then need to make mdadm --add hook into some tools that can
> automatically replicate the partition table layout, add each
> partition into its respective raid volume, and
> install/replicate the bootloader onto the disk.
> 
> Also, we should really have a tool to set this up properly on
> the initial creation of the raid.
> 

Yes, it would definitely be nice to have this capability.  Whether or
not it should be done in mdadm or in some kind of wrapper I don't know
offhand.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:43                     ` Daniel Reurich
@ 2009-04-29  6:43                       ` Gabor Gombas
  2009-05-01 21:10                         ` Goswin von Brederlow
  0 siblings, 1 reply; 76+ messages in thread
From: Gabor Gombas @ 2009-04-29  6:43 UTC (permalink / raw)
  To: Daniel Reurich
  Cc: H. Peter Anvin, Neil Brown, Dan Williams, Goswin von Brederlow,
	linux-raid

On Wed, Apr 29, 2009 at 12:43:51PM +1200, Daniel Reurich wrote:

> In which case you're probably using a hardware raid controller anyway, so
> it's not our problem.  Otherwise, if the array is broken by a failed
> controller, we probably shouldn't boot off it anyway.

I have set up a box with 8 SATA disks attached to 2 on-board
controllers.  The BIOS can boot from any of the controllers, but then it
can only see the disks that are attached to the selected controller.
Which is quite reasonable if the BIOS handles the controller selection
by redirecting INT 13h (I have not checked).

With "/" on RAID1, I can boot in any failure scenario (I've actually
tested that).  With your setup, the box would never boot, since it
could never access enough disks in a RAID5/6 array, even if all the
disks/controllers are perfectly fine.

> What's specifically dangerous about it?  Define the failure modes that
> this scheme is unable to cope with but should.

There is no need for a failure mode. Your scheme does not work even when
everything is fine.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:36         ` Daniel Reurich
  2009-04-29  0:44           ` H. Peter Anvin
@ 2009-04-29  7:07           ` Gabor Gombas
  1 sibling, 0 replies; 76+ messages in thread
From: Gabor Gombas @ 2009-04-29  7:07 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: H. Peter Anvin, Neil Brown, linux-raid

On Wed, Apr 29, 2009 at 12:36:13PM +1200, Daniel Reurich wrote:

> This is the last hurdle to supplanting proprietary hardware raid
> controllers, with their proprietary drivers and management tool stacks,
> and making software raid actually usable for the average Joe with his
> small business server built from cheap commodity hardware running Linux.

Huh? You have just written:

> In which case you're probably using a hardware raid controller anyway, so
> it's not our problem.

You first say that people with any non-trivial configuration should use
a HW RAID controller to justify the breaking of their systems. Then you
say you would like MD to replace HW RAID in such setups. This makes no
sense.

I have set up both cheap commodity hardware and (not so) big HW RAID.
Both have their uses and their weaknesses. Your "boot directly from
RAID5/6" scheme will _never_ work reliably on cheap commodity hardware
unless you fix the hardware _first_. If you want to use cheap commodity
hardware and still have reliability, then boot from RAID1. (And use
lilo; grub is still less reliable in some failure modes. Being dumb
helps sometimes.)

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:20           ` H. Peter Anvin
  2009-04-29  0:00             ` Daniel Reurich
@ 2009-04-29  7:45             ` Luca Berra
  2009-04-29 16:55               ` H. Peter Anvin
  2009-04-30  6:59               ` Gabor Gombas
  1 sibling, 2 replies; 76+ messages in thread
From: Luca Berra @ 2009-04-29  7:45 UTC (permalink / raw)
  To: linux-raid

On Tue, Apr 28, 2009 at 04:20:06PM -0700, H. Peter Anvin wrote:
>Neil Brown wrote:
>> 
>> That only leaves the question of what happens when a spare is added to
>> the array - how does the grub data get written to the space on the
>> spare.
>> I would rather that grub were responsible for this, than for md to
>> treat that unused space as RAID1.
>> We already have a notification system based on "mdadm --monitor" to
>> process events.  We could possibly plug grub in to that somehow so
>> that it gets told to re-write all its special blocks every time
>> something significant changes in the array.
>> 
>
>I have multiple issues with this concept (including promoting Grub2, but
>let's not get into that.)

I believe your only issue is promoting grub 2, everything else you say
does not seem to be backed by any other reason.
And, please, quit insulting people; acting like a child does no justice
to your intelligence.

>For this to be reliable, there is only one sensible configuration, which
>is for /boot to be a RAID-1, which is better handled by -- guess what --
>partitioning systems; and we already have quite a few of those that work
>just fine, thank you.  Otherwise there WILL be configurations -- caused
>by controller failures if nothing else -- that simply will not boot even
>though the system is otherwise functional.  Promoting this kind of stuff
>is criminally stupid.

The very funny thing about this is that the solution you endorse works
in the very same way as the solution the OP would like to see
implemented (well, at least a cleaned-up version of it; if I read the
word cylinder one more time in AD 2009 I will scream :)

Your 'only sensible configuration' puts a hunk of code at the beginning
of the disk, which the machine firmware will run when booting.
This particular piece of code has to be put there by something other than
mdadm.

In the partition table case, this piece of code is very limited and can
only access data on the disk if the firmware is able to.

The OP's idea is to put a larger block of code at the beginning of the
disk.  We can safely assume that this larger block of code, being at the
_beginning_ of the disk, is completely readable by the firmware; this
piece of code would be responsible for reading the raid array and
loading a Linux kernel from it.

It has its pluses; whether these are worth it versus another scheme is
debatable.

Personally I despise disk partitioning schemes; it is a concept that
should have died long ago.  Even GPT, while being more sensible than PC
partitions, is of no real use to me.  OK, on ia64 the firmware will read
a GPT partition table and load the EFI binary from a partition, so yes, on
Itaniums this would be the way to go, but do we really care?

I stumbled upon a lovely failure scenario that shows even your scheme is
fragile at best.  Due to issues I will not dwell on here, the first disk
was kicked from the raid1 containing /boot, but it was still perfectly
readable by the BIOS.  Result: it took me a while to realize why the hell,
after upgrading the kernel, the system insisted on booting with the
previous kernel ;)

Also, in your 'one sensible configuration', /boot should not only be
raid-1; it should also be entirely contained in the portion of the
disk accessible via int-13. I have seen distribution installers enforce
the first constraint, not the second.


Regards,
L.


-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:02             ` Daniel Reurich
@ 2009-04-29 11:32               ` John Robinson
  0 siblings, 0 replies; 76+ messages in thread
From: John Robinson @ 2009-04-29 11:32 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Linux RAID

On 29/04/2009 01:02, Daniel Reurich wrote:
>>> No. What I'm asking for is a superblock layout that allows the boot
>>> loader to be installed on each member disk, so that booting from a
>>> software raid array becomes not only possible, but reliable and easy to
>>> set up.
>>>
>> What you're asking for is a partitioning scheme hidden away in md.
>>
> Nope, just the first 64K of each member disk to be reserved for a
> raid-aware bootloader.

Which is a 64K RAID-1 partition at the beginning of each disc, with the 
remaining partition of the disc being the main RAID member. You can 
already do that with visible partitions, so what you're asking for sounds 
to me very much like a partitioning scheme hidden away in md.
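
For concreteness, the visible-partition equivalent is something like 
this (a sketch, assuming three disks whose small first partitions and 
large second partitions already exist, with type 0xfd):

   mdadm --create /dev/md0 --level=1 --raid-devices=3 \
         /dev/sda1 /dev/sdb1 /dev/sdc1     # small RAID-1 for /boot
   mdadm --create /dev/md1 --level=5 --raid-devices=3 \
         /dev/sda2 /dev/sdb2 /dev/sdc2     # the rest as RAID-5

Then the bootloader goes in each disc's MBR and /boot lives on md0.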

Cheers,

John.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  7:45             ` md extension to support booting from raid whole disks Luca Berra
@ 2009-04-29 16:55               ` H. Peter Anvin
  2009-04-29 20:38                 ` Luca Berra
  2009-04-30  6:59               ` Gabor Gombas
  1 sibling, 1 reply; 76+ messages in thread
From: H. Peter Anvin @ 2009-04-29 16:55 UTC (permalink / raw)
  To: linux-raid

Luca Berra wrote:
> 
> I believe your only issue is promoting grub 2, everything else you say
> does not seem to be backed by any other reason.
> 

If you don't believe me, then see the post by Gabor Gombas.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29 16:55               ` H. Peter Anvin
@ 2009-04-29 20:38                 ` Luca Berra
  0 siblings, 0 replies; 76+ messages in thread
From: Luca Berra @ 2009-04-29 20:38 UTC (permalink / raw)
  To: linux-raid

On Wed, Apr 29, 2009 at 09:55:09AM -0700, H. Peter Anvin wrote:
>Luca Berra wrote:
>> 
>> I believe your only issue is promoting grub 2, everything else you say
>> does not seem to be backed by any other reason.
>> 
>
>If you don't believe me, then see the post by Gabor Gombas.
>
Bah, this is turning into a holy war, and I really have better uses for
my time. I will consider future posts on the matter only if they contain
facts.  I'm sorry it turned into meaningless bickering.

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks, raid6, grub2, lvm2
  2009-04-29  0:20                 ` Daniel Reurich
  2009-04-29  0:28                   ` H. Peter Anvin
@ 2009-04-29 22:43                   ` Michael Ole Olsen
  2009-05-01 21:36                     ` Goswin von Brederlow
  1 sibling, 1 reply; 76+ messages in thread
From: Michael Ole Olsen @ 2009-04-29 22:43 UTC (permalink / raw)
  To: Daniel Reurich
  Cc: H. Peter Anvin, Neil Brown, Dan Williams, Goswin von Brederlow,
	linux-raid

I tried recently with grub2 and also the old grub in lenny (even the lenny
installer fails, though it seems to complete fine, it just doesn't boot
afterwards).

I didn't think I would get an issue trying to get /boot on my lvm2 raid6
volume, as I recalled grub supported mdadm raids and also lvm2.

It seems it has to be partitioned separately as a partition with mdadm and
no lvm2 on top of the boot partition; that's the only thing grub supports,
if I am not mistaken?

I just made one big volume, then created the logical volume /boot, and
neither grub nor lilo would touch it (lilo only supports mdadm raid1
volumes).

It would be cool to be able to boot lvm2 /boot volumes with grub.

Just wanted to give my recent experience with grub+lvm2+mdadm raid6.


/Michael Ole Olsen




On Wed, 29 Apr 2009, Daniel Reurich wrote:

> On Tue, 2009-04-28 at 17:04 -0700, H. Peter Anvin wrote:
> > Daniel Reurich wrote:
> > > 
> > >> For this to be reliable, there is only one sensible configuration, which
> > >> is for /boot to be a RAID-1, which is better handled by -- guess what --
> > >> partitioning systems; and we already have quite a few of those that work
> > >> just fine, thank you.  Otherwise there WILL be configurations -- caused
> > >> by controller failures if nothing else -- that simply will not boot even
> > >> though the system is otherwise functional.  Promoting this kind of stuff
> > >> is criminally stupid.
> > > 
> > > I disagree.  Grub is quite capable of booting from and assembling a
> > > raid5 volume and accessing its partitions' contents, even if the array
> > > is degraded.  All I'm asking for is that the first 64 kbytes of the disk
> > > be reserved and some of it possibly (but not necessarily) replicated so
> > > that a bootloader capable of assembling a raid array can be installed at
> > > the start of each member disk, so that whatever disk the bios decides to
> > > boot from, it will always boot.
> > >  
> > 
> > Grub is capable of doing that IF THE FIRMWARE CAN REACH IT.
> 
> Well, if the firmware can't find one of the disks, then it doesn't matter
> what scheme we have.  Even a single disk won't work.
> > 
> > You seem to have the happy notion that this is something typical, which
> > frequently isn't the case.
> 
> I'd say it's typical of 100% of PCs, Macs and just about anything else
> that boots off a hard disk without a hardware raid controller.
> > 
> > What's worse, you're clearly of the opinion that this is something that
> > should be promoted to users, which is the "criminal" part of "criminally
> > stupid."
> 
> I'd like it for me, and to prove it can be done and is a cleaner and
> less administratively intensive way of doing it than teaching the
> OS/user how to partition a disk and add each partition into its
> respective raid array each time they need to replace or add a new disk
> to their array(s). 
> 
> Whether this proves reliable and stable enough to be promoted to users
> can only be seen once it's proven (or not).
> 
> What's your beef?  MD already reserves some space for the superblock and
> write-intent bitmap (which I believe is also replicated across the
> member disks), so why not add some space to this to make it possible to
> install a bootloader as well.
> 
> 
> -- 
> Daniel Reurich
> 
> Centurion Computer Technology (2005) Ltd
> Ph 021 797 722
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  0:44           ` H. Peter Anvin
       [not found]             ` <1240968482.18303.1028.camel@ezra>
@ 2009-04-30  2:41             ` Daniel Reurich
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-04-30  2:41 UTC (permalink / raw)
  To: linux-raid


> We already have the technology to splitting the disk between the boot
> region and the array region (partition tables) and making sure that the
> boot region is fully replicated (RAID-1).  That is what we should be
> deploying, and if it somehow is too hard to deploy, that is what should
> be fixed.

Can we do this using containers like Intel's matrix raid?  I believe
mdadm v3 will support assembling Intel matrix raid, but I haven't seen
anything that says it will allow us to create raid volumes like this.

Could this be used to allocate the first piece of the disks as a raid1
disk, and the rest of the disk as a data area as whatever raid level
does the job?

This solution allows for the replication of the first raid volume
from block zero, thus including the boot loader with /boot (assuming
that the superblock is at the end of the disk), and raid 4|5|6|10 for
the rest of the disk.  And we shouldn't actually need a partition table
at all anywhere if the data volume is overlaid with lvm, and the
bootloader will work regardless of whether it knows about raid.

And best of all, adding member disks becomes as simple as mdadm --add ...
which is what I really want.
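
If mdadm can create (and not just assemble) such layouts, I'd imagine the
commands would look roughly like this (untested, and the imsm metadata
may well restrict which levels and disk counts are allowed):

   mdadm --create /dev/md/imsm0 --metadata=imsm \
         --raid-devices=4 /dev/sd[abcd]                 # the container
   mdadm --create /dev/md/boot --level=1 --raid-devices=4 \
         --size=262144 /dev/md/imsm0                    # small boot mirror
   mdadm --create /dev/md/data --level=5 --raid-devices=4 \
         /dev/md/imsm0                                  # rest of the disks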
 
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  7:45             ` md extension to support booting from raid whole disks Luca Berra
  2009-04-29 16:55               ` H. Peter Anvin
@ 2009-04-30  6:59               ` Gabor Gombas
  2009-04-30  8:11                 ` Luca Berra
  1 sibling, 1 reply; 76+ messages in thread
From: Gabor Gombas @ 2009-04-30  6:59 UTC (permalink / raw)
  To: linux-raid

On Wed, Apr 29, 2009 at 09:45:59AM +0200, Luca Berra wrote:

> Personally I despise disk partitioning schemes; it is a concept that
> should have died long ago. Even GPT, while being more sensible than PC
> partitions, is of no real use to me.  OK, on ia64 the firmware will read
> a GPT partition table and load the EFI bootloader from a partition, so yes,
> on Itaniums this would be the way to go; but do we really care?

Just because _you_ do not use it does not mean that it is useless. Sure,
on server machines you can dedicate the whole disk for Linux and do
whatever you want. But on desktop machines dual booting is still popular
so unless you fix Windows to boot from an LVM2 volume, partitioning is
going to stay for quite some time. Virtualization helps in some cases to
get rid of the extra partition, but not always.

> I stumbled upon a lovely failure scenario that shows even your scheme is
> fragile at best.
> Due to issues I would not dwell on here, the first disk was kicked from the
> raid1 containing /boot, but it was still very well readable by the BIOS.
> Result: it took me a while to realize why the hell, after upgrading the
> kernel, the system insisted on booting with the previous kernel ;)

This failure mode also happens exactly the same way in the "reserve some
space at the beginning and turn it into a RAID1 without telling anyone"
scheme.

> Also, in your 'one sensible configuration', /boot should not only be
> raid-1; it should also be entirely contained in the portion of the
> disk accessible via int-13. I have seen distribution installers enforce
> the first constraint, not the second.

If you have such an old BIOS then you will have problems with just a
single disk as well. This has nothing to do with RAID, so I fail to see
why you bring it up.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-30  6:59               ` Gabor Gombas
@ 2009-04-30  8:11                 ` Luca Berra
  2009-04-30 13:01                   ` John Robinson
  0 siblings, 1 reply; 76+ messages in thread
From: Luca Berra @ 2009-04-30  8:11 UTC (permalink / raw)
  To: linux-raid

On Thu, Apr 30, 2009 at 08:59:35AM +0200, Gabor Gombas wrote:
>On Wed, Apr 29, 2009 at 09:45:59AM +0200, Luca Berra wrote:
>
>> Personally I despise disk partitioning schemes; it is a concept that
>> should have died long ago. Even GPT, while being more sensible than PC
>> partitions, is of no real use to me.  OK, on ia64 the firmware will read
>> a GPT partition table and load the EFI bootloader from a partition, so yes,
>> on Itaniums this would be the way to go; but do we really care?
>
>Just because _you_ do not use it does not mean that it is useless. Sure,
>on server machines you can dedicate the whole disk for Linux and do
>whatever you want. But on desktop machines dual booting is still popular
>so unless you fix Windows to boot from an LVM2 volume, partitioning is
>going to stay for quite some time. Virtualization helps in some cases to
>get rid of the extra partition, but not always.

I do understand that; if you are dual booting you are limited by the
minimum feature set of the systems you dual boot. Windows is not even
able to boot from an md raid; the viable choice when dual booting
windows and linux and wanting disk failure protection is using either
fakeraid or real hw raid. Note that Windows is able to boot from a
raid5 fakeraid; linux, afaik, is not.
I think the whole idea was for systems that do not dual boot.

>> I stumbled upon a lovely failure scenario that shows even your scheme is
>> fragile at best.
>> Due to issues I would not dwell on here, the first disk was kicked from the
>> raid1 containing /boot, but it was still very well readable by the BIOS.
>> Result: it took me a while to realize why the hell, after upgrading the
>> kernel, the system insisted on booting with the previous kernel ;)
>
>This failure mode also happens exactly the same way in the "reserve some
>space at the beginning and turn it into a RAID1 without telling anyone"
>scheme.
My idea is that the space at the beginning has no need of being a raid1.
It just needs to contain something smart enough to understand there is a
raid set, assemble it read-only and load a linux kernel from it.
This something does not change over time; you set it up once and forget
about it. At most you update it once in a while.
(Note: I never said this something has to be grub2.)

>> Also, in your 'one sensible configuration', /boot should not only be
>> raid-1; it should also be entirely contained in the portion of the
>> disk accessible via int-13. I have seen distribution installers enforce
>> the first constraint, not the second.
>
>If you have such an old BIOS then you will have problems with just a
>single disk as well. This has nothing to do with RAID, so I fail to see
>why you bring it up.
OK, let's forget about that.

L.


-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-30  8:11                 ` Luca Berra
@ 2009-04-30 13:01                   ` John Robinson
  0 siblings, 0 replies; 76+ messages in thread
From: John Robinson @ 2009-04-30 13:01 UTC (permalink / raw)
  To: linux-raid

On 30/04/2009 09:11, Luca Berra wrote:
> On Thu, Apr 30, 2009 at 08:59:35AM +0200, Gabor Gombas wrote:
[...]
> note that Windows is able to boot from a
> raid5 fakeraid, linux, afaik, is not.

I can't see why it wouldn't; a fakeraid BIOS lets you read from the RAID 
set via int13.

[...]
>> This failure mode also happens exactly the same way in the "reserve some
>> space at the beginning and turn it into a RAID1 without telling anyone"
>> scheme.
> My idea is that the space at the beginning has no need of being a raid1.

But it's effectively a RAID-1 since you have to put your boot code at 
the start of every drive, or at least every drive that might appear as 
the boot drive. And if you want to boot off a RAID-[456] set, the boot 
code is going to have to have some kind of reimplementation of software 
RAID, like a fakeraid BIOS or Linux md. This looks to me like an 
unnecessary duplication of effort.

This whole "boot from raid whole discs" thing seems to me to be a bit of 
a misnomer anyway; "raid whole discs" don't have boot sectors on them, 
so by definition you can't boot from them, so in any case we're talking 
about booting from discs split, partitioned if you will, into a boot 
area and the RAID area.

Cheers,

John.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 22:26         ` Dan Williams
@ 2009-05-01 21:04           ` Goswin von Brederlow
  2009-05-01 21:24             ` Dan Williams
  0 siblings, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 21:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Daniel Reurich, H. Peter Anvin, Goswin von Brederlow, linux-raid

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, Apr 28, 2009 at 3:19 PM, Daniel Reurich <daniel@centurion.net.nz> wrote:
>> On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:
>>
>>>
>>> ...or use a metadata format that your platform bios understands and
>>> provides an int 13h vector.  See the new external metadata formats
>>> supported by the mdadm devel-3.0 branch.
>>
>> I don't think a metadata format is the right way either.
>
> Huh? The bootloader does not need to know anything about raid.  It
> just uses int13 calls to read sectors off a "disk".  The fact that the
> disk is a software raid5 array is completely hidden from grub.  This
> is functionality that has been available via dmraid for some time and
> is now being made available with the MD infrastructure and mdadm.
>
> Regards,
> Dan

And is horribly ugly when you have to switch hardware. And try doing
it with disks from different controllers? What BIOS supports that?
What BIOS even supports raid5?

MfG
        Goswin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-29  6:43                       ` Gabor Gombas
@ 2009-05-01 21:10                         ` Goswin von Brederlow
  2009-05-01 22:36                           ` Rudy Zijlstra
  0 siblings, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 21:10 UTC (permalink / raw)
  To: Gabor Gombas
  Cc: Daniel Reurich, H. Peter Anvin, Neil Brown, Dan Williams,
	Goswin von Brederlow, linux-raid

Gabor Gombas <gombasg@sztaki.hu> writes:

> On Wed, Apr 29, 2009 at 12:43:51PM +1200, Daniel Reurich wrote:
>
> >> In which case you're probably using a hardware raid controller anyway, so
> >> not our problem.  Otherwise, if the array is broken by a failed
> >> controller, we probably shouldn't boot off it anyway.
>
> I have set up a box with 8 SATA disks attached to 2 on-board
> controllers. The BIOS can boot from any of the controllers, but then it
> can only see the disks that are attached to the selected controller.
> Which is quite reasonable if the BIOS handles the controller selection
> by redirecting INT 13h (I have not checked).
>
> With "/" on RAID1, I can boot in any failure scenarios (I've actually
> tested that anno). With your setup, the box would never boot, since it
> could never access enough disks in a RAID5/6 array, even if all the
> disks/controllers are perfectly fine.

So you have lost nothing. It doesn't boot now from raid5 (across all
disks) and it still doesn't boot from raid5 with the proposal. Your
hardware just doesn't support it.

>> What's specifically dangerous about it?  Define the failure modes that
>> this scheme should be able to cope with but cannot.
>
> There is no need for a failure mode. Your scheme does not work even when
> everything is fine.
>
> Gabor

It would only be a problem if, with the proposal, things got worse.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-01 21:04           ` Goswin von Brederlow
@ 2009-05-01 21:24             ` Dan Williams
  2009-05-01 22:33               ` Goswin von Brederlow
  0 siblings, 1 reply; 76+ messages in thread
From: Dan Williams @ 2009-05-01 21:24 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Reurich, H. Peter Anvin, linux-raid

On Fri, May 1, 2009 at 2:04 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> And is horribly ugly when you have to switch hardware. And try doing
> it with disks from different controlers? What bios supports that?

This is just a simple trade-off between two options.

1/ If you want to reliably boot from multiple disks then make a native
md-raid1 array for /boot like hpa suggests.
2/ If you do not want to make /boot into an md-raid1, and you still
want to boot from raid5, then use $VENDOR's raid5 option-rom and
mdadm's external metadata support.

There is no third option with native md-raid5 and grub2 because as
Gabor and hpa point out it is not reliable and will break.

> What bios even supports raid5?

Just look for any off-the-shelf motherboard that says it has onboard
raid5 support and is not using a dedicated hardware raid controller.

Regards,
Dan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-04-28 23:05         ` Neil Brown
  2009-04-28 23:20           ` H. Peter Anvin
  2009-04-28 23:41           ` Daniel Reurich
@ 2009-05-01 21:33           ` Goswin von Brederlow
  2 siblings, 0 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 21:33 UTC (permalink / raw)
  To: Neil Brown
  Cc: Daniel Reurich, Dan Williams, H. Peter Anvin,
	Goswin von Brederlow, linux-raid

Neil Brown <neilb@suse.de> writes:

> On Wednesday April 29, daniel@centurion.net.nz wrote:
>> On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:
>> 
>> > 
>> > ...or use a metadata format that your platform bios understands and
>> > provides an int 13h vector.  See the new external metadata formats
>> > supported by the mdadm devel-3.0 branch.
>> 
>> I don't think a metadata format is the right way either.  
>> 
>> What we need is a new version of the superblock with the first cylinder
>> (32kb on 512b sectors x64 sectors per cylinder) being set aside for the
>> bootloader, the superblock and w-i bitmap go in the second cylinder, and
>> the raid data area starting in the 3rd cylinder.  
>> 
>> It should be the bootloader's responsibility to install the bootloader
>> onto the disk's 1st cylinder, but md/mdadm would have to replicate it on
>> resync or adding of a new disk.  However we could consider remapping the
>> bootloader 
>
> While I agree with Dan that having a BIOS which understands RAID is a
> good way to make this sort of thing "just work", it would be nice if it
> could work for people without the bios too.
>
> v1.x metadata has explicit knowledge of where the start of the data
> is, so it is quite possible to leave the first few (dozen) sectors
> unused (let's not talk about cylinders this century - OK?).
> So mdadm could grow a --grub flag to use with --create which arranged
> for data/bitmap to not use the first (say) 512 sectors of any device.
> (1.1 and 1.2 would still use reserved blocks for the superblock).
> [I can cut you a patch to experiment with if you like]
>
> grub could then write whatever it wants to write to any of these
> sectors.

Actually, there you touch on a very good point. How is grub supposed to
write the data anyway? Initially I thought the proposal was to have:

sda	sdb	sdc	sdd	md0
0       0       0       0       0 (raid1)
1       1       1       1       1
2       2       2       2       2
3       3       3       3       3
..
meta    meta    meta    meta    -
meta    meta    meta    meta    -
64      65      66      xor     64-66 (raid5)
67      68      69      xor     67-69
...

I.e. at the beginning of the md0 device there would be a chunk of
raid1 that is also at the beginning of the raw devices, then the
metadata, followed by normal raid5 stripes. Grub would then install to
/dev/md0 and get automatically replicated across all disks.

I was against that because it seems awfully complicated to implement
and only works with an FS that leaves space for the bootloader.


What you are talking about is just moving the metadata back further (from
the 4k in the 1.2 format to 256k or whatever) and starting the raid5 just
a little bit later on the disk. The only change (so far) would be
increasing the offset at which the data starts.

> That only leaves the question of what happens when a spare is added to
> the array - how does the grub data get written to the space on the
> spare.
> I would rather that grub were responsible for this, than for md to
> treat that unused space as RAID1.
> We already have a notification system based on "mdadm --monitor" to
> process events.  We could possibly plug grub in to that somehow so
> that it gets told to re-write all its special blocks every time
> something significant changes in the array.
>
> NeilBrown

But now, indeed, how does this work with grub? Grub can't write to
/dev/md0 there, that wouldn't be bootable at all. And if grub writes
to /dev/sda then it doesn't get replicated.

I see two solutions for the initial write:

1) grub initially writes to all component devices (which already exists
   in some bootloaders)
2) mdadm --copy-reserved /dev/md0 /dev/sda
   After grub installs on /dev/sda it tells mdadm to copy the reserved
   block to all devices.

Also 2 solutions for what to do on changes:

A) mdadm --add copies the first 256k to new devices when syncing
   (possibly sparse too). The reserved 256k would basically become
   part of the superblock; as such, --zero-superblock would wipe it
   too. I'm assuming bootloaders can live with identical data on all
   devices.
B) Grub registers itself as a hook so it can trigger a copy command on any
   significant change (possibly running option 2 above).


I think options 1+A are easiest for both md and bootloaders to
implement.
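
For B, the existing notification mechanism would probably suffice. A
rough sketch (md-boot-sync is a made-up helper name; the event names
are mdadm's own):

   mdadm --monitor --scan --daemonise \
         --program=/usr/local/sbin/md-boot-sync

   #!/bin/sh
   # md-boot-sync: invoked by mdadm --monitor as: EVENT ARRAY [MEMBER]
   # When a spare has been rebuilt and activated, re-write the boot
   # blocks on it (grub-install here, but any bootloader would do).
   event="$1" array="$2" member="$3"
   case "$event" in
       SpareActive)
           [ -n "$member" ] && grub-install "$member"
           ;;
   esac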

MfG
        Goswin

PS: I'm using grub here as an example for any bootloader.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks, raid6, grub2, lvm2
  2009-04-29 22:43                   ` md extension to support booting from raid whole disks, raid6, grub2, lvm2 Michael Ole Olsen
@ 2009-05-01 21:36                     ` Goswin von Brederlow
  0 siblings, 0 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 21:36 UTC (permalink / raw)
  To: Michael Ole Olsen
  Cc: Daniel Reurich, H. Peter Anvin, Neil Brown, Dan Williams,
	Goswin von Brederlow, linux-raid

Michael Ole Olsen <gnu@gmx.net> writes:

> I tried recently with grub2 and also the old grub in lenny (even the lenny
>     installer fails, though it seems to complete fine, it just doesn't boot
>     afterwards).
>
> I didn't think I would get an issue trying to get /boot on my lvm2 raid6 volume,
>     as I recalled grub supported mdadm raids and also lvm2.

Grub supports neither. No idea if grub2 can do raid6.

> It seems it has to be partitioned separately as a partition with mdadm and no
> lvm2 on top of the boot partition; that's the only thing grub supports,
> if I am not mistaken?
>
> I just made one big volume, then created the logical volume /boot, and
> neither grub nor lilo would touch it (lilo only supports mdadm raid1 volumes).
>
> would be cool to be able to boot lvm2 /boot volumes with grub

yes it would.

> just wanted to give my recent experience with grub+lvm2+mdadm raid6.
>
>
> /Michael Ole Olsen

MfG
        Goswin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-01 21:24             ` Dan Williams
@ 2009-05-01 22:33               ` Goswin von Brederlow
  2009-05-02 12:07                 ` John Robinson
  0 siblings, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 22:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Goswin von Brederlow, Daniel Reurich, H. Peter Anvin, linux-raid

Dan Williams <dan.j.williams@intel.com> writes:

> On Fri, May 1, 2009 at 2:04 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> And is horribly ugly when you have to switch hardware. And try doing
>> it with disks from different controllers? What BIOS supports that?
>
> This is just a simple trade-off between two options.
>
> 1/ If you want to reliably boot from multiple disks then make a native
> md-raid1 array for /boot like hpa suggests.

Which still has the same problems as the raid5: how to handle the
bootloader. Say I have a raid1 over sda/b with lvm on it. How do I
boot that? How do I get the bootloader cloned to a new spare disk when
it gets added?

It really makes no difference whether it is a raid0/1/10/4/5/6.

> 2/ If you do not want to make /boot into an md-raid1, and you still
> want to boot from raid5, then use $VENDOR's raid5 option-rom and
> mdadm's external metadata support.

There is no such thing as $VENDOR's raid5 option-rom in pretty much
all cases. And even the raid0/1 roms are horrible.

> There is no third option with native md-raid5 and grub2 because as
> Gabor and hpa point out it is not reliable and will break.

And they can still be wrong.

>> What bios even supports raid5?
>
> Just look for any off-the-shelf motherboard that says it has onboard
> raid5 support and is not using a dedicated hardware raid controller.
>
> Regards,
> Dan

MfG
        Goswin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-01 21:10                         ` Goswin von Brederlow
@ 2009-05-01 22:36                           ` Rudy Zijlstra
  2009-05-02  1:04                             ` Daniel Reurich
       [not found]                             ` <87presxwu4.fsf@frosties.localdomain>
  0 siblings, 2 replies; 76+ messages in thread
From: Rudy Zijlstra @ 2009-05-01 22:36 UTC (permalink / raw)
  To: Goswin von Brederlow
  Cc: Gabor Gombas, Daniel Reurich, H. Peter Anvin, Neil Brown,
	Dan Williams, linux-raid

On Friday 2009-05-01 at 23:10 [timezone +0200], Goswin von
Brederlow wrote:
> Gabor Gombas <gombasg@sztaki.hu> writes:
> 
> > On Wed, Apr 29, 2009 at 12:43:51PM +1200, Daniel Reurich wrote:
> >
> >> In which case you're probably using a hardware raid controller anyway, so
> >> not our problem.  Otherwise, if the array is broken by a failed
> >> controller, we probably shouldn't boot off it anyway.
> >
> > I have set up a box with 8 SATA disks attached to 2 on-board
> > controllers. The BIOS can boot from any of the controllers, but then it
> > can only see the disks that are attached to the selected controller.
> > Which is quite reasonable if the BIOS handles the controller selection
> > by redirecting INT 13h (I have not checked).
> >
> > With "/" on RAID1, I can boot in any failure scenarios (I've actually
> > tested that anno). With your setup, the box would never boot, since it
> > could never access enough disks in a RAID5/6 array, even if all the
> > disks/controllers are perfectly fine.
> 
> So you have lost nothing. It doesn't boot now from raid5 (across all
> disks) and it still doesn't boot from raid5 with the proposal. Your
> hardware just doesn't support it.
> 
> >> What's specifically dangerous about it?  Define the failure modes that
> >> this scheme should be able to cope with but cannot.
> >
> > There is no need for a failure mode. Your scheme does not work even when
> > everything is fine.
> >
> > Gabor
> 
> It would only be a problem if, with the proposal, things got worse.
> 

It would get worse, as in many situations the installer would succeed
and the boot would fail. Many raid 5/6 configurations will spread over
several controllers, and no BIOS supports booting over several controllers. 

I really prefer the current situation, where you need to explicitly
configure a RAID1. This makes clear what is happening and reduces
confusion. This proposal would make debugging a failed boot just so much
more difficult. 

Cheers,

Rudy


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-01 22:36                           ` Rudy Zijlstra
@ 2009-05-02  1:04                             ` Daniel Reurich
  2009-05-02 17:02                               ` Michał Przyłuski
       [not found]                             ` <87presxwu4.fsf@frosties.localdomain>
  1 sibling, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-05-02  1:04 UTC (permalink / raw)
  To: linux-raid


> It would get worse, as in many situations the installer would succeed
> and the boot would fail. Many raid 5/6 configurations will spread over
> several controllers, and no BIOS supports booting over several controllers. 
> 
Then we teach the bootloader's installer to detect whether all the member
disks are on the same controller, and refuse to install (or at least warn
at that point) if not.  

That being said, I don't know whether grub2 is smart enough yet to probe
all the controllers for disks, not just the ones accessible via int13.
If it is, this particular failure mode is reduced to only being an issue
on unsupported controllers.


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-01 22:33               ` Goswin von Brederlow
@ 2009-05-02 12:07                 ` John Robinson
  2009-05-04 17:02                   ` Goswin von Brederlow
  2009-05-05  9:31                   ` Michal Soltys
  0 siblings, 2 replies; 76+ messages in thread
From: John Robinson @ 2009-05-02 12:07 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: linux-raid

On 01/05/2009 23:33, Goswin von Brederlow wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
>> On Fri, May 1, 2009 at 2:04 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>> And is horribly ugly when you have to switch hardware. And try doing
>>> it with disks from different controllers? What BIOS supports that?
>> This is just a simple trade-off between two options.
>>
>> 1/ If you want to reliably boot from multiple disks then make a native
>> md-raid1 array for /boot like hpa suggests.
> 
> Which still has the same problems as the raid5. How to handle the
> bootloader. Say I have a raid1 over sda/b with lvm on it. How do I
> boot that? How do I get the bootloader cloned to a new spare disk when
> it gets added?

`dd if=/dev/originaldrive of=/dev/newdrive bs=512 count=1` to clone the 
boot sector from one of the remaining original drives; this clones your 
partition table too. Once that's done, and you've added the new drive's 
partition(s) back into the RAID set(s) and it's synced, it will boot.

Or just tell your bootloader to install itself on the new drive.
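
With grub, for instance, that is just (assuming the replacement shows up
as /dev/sdc):

   grub-install /dev/sdc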

> It really makes no difference whether it is a raid0/1/10/4/5/6.

Yes it does - anything which needs access to more discs than whatever 
the BIOS offers as the boot drive via int13 is asking for trouble.

>> 2/ If you do not want to make /boot into an md-raid1, and you still
>> want to boot from raid5, then use $VENDOR's raid5 option-rom and
>> mdadm's external metadata support.
> 
> There is no such thing as $VENDOR's raid5 option-rom in pretty much
> all cases. And even the raid0/1 roms are horrible.

Dan was being modest referring to $VENDOR; any recent board with an 
Intel ICHxR has Intel Matrix RAID with RAID-5 in the BIOS/option ROM. I 
imagine other chipsets have RAID-5 too. I would have used this method on 
my Asus P5Q Pro but the CentOS 5 installer doesn't support it; I think 
recent Fedora and Ubuntu installers do, thanks to excellent work by Dan, 
Neil Brown and others improving md's support for 3rd-party metadata formats.

Cheers,

John.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-02  1:04                             ` Daniel Reurich
@ 2009-05-02 17:02                               ` Michał Przyłuski
  2009-05-03  1:33                                 ` Leslie Rhorer
  2009-05-08 22:06                                 ` Goswin von Brederlow
  0 siblings, 2 replies; 76+ messages in thread
From: Michał Przyłuski @ 2009-05-02 17:02 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: linux-raid

Hello,

2009/5/2 Daniel Reurich <daniel@centurion.net.nz>:
>> It would get worse, as in many situations the installer would succeed
>> and the boot would fail. Many raid 5/6 configurations will spread over
>> several controllers, and no BIOS supports booting over several controllers.
>>
> Then we teach the bootloader's installer to detect whether all the member
> disks are on the same controller, and refuse to install (or at least warn
> at that point) if not.

That has been an interesting discussion over the last week or so. I have
some thoughts at this point, not really technical...

First thing is, DO NOT boot from raid5/6. It's pointless anyway.
Let's think of raid5 as a bigger raid1. It won't add any extra
redundancy to our boot process over a separate raid1 for /boot. Making
it a hidden raid1 over the first few sectors just creates an
automation that gains nothing and makes things unnecessarily
complicated inside.
For example, if a user has a, say, 4-disk raid5 with a magic or normal
raid1 for boot, and loses 2 disks, he or she is still pretty angry,
no matter that the system can still boot, because all the data is lost.
I'm quite sure users care more about the data when they go raid5 than
the system itself.

If you can afford a real hw controller that "does it all" for raid5/6
and provides one int13 device to the bootloader, then there are no
problems. But then, who puts his or her /boot (and / and data) on one
huge partition?

If you can't, and want a reliable boot, then you should mirror
your drives. Going anywhere above raid1 is pointless. You should just
have a backup; /boot is small and changes rarely, and one can burn it to
a dvd easily.

So, as /boot (or even /) nowadays is really tiny compared to disk
sizes, you can easily carve out a few gigs at the start of each device
for raid1, and use the rest of all the disks for raid5. Also, lvm'ing
/boot sounds just wrong; I don't think resizing or other lvm features
are of any use for /boot.

Summing up, I don't get why anybody would really want to boot from raid5 or 6.

I second booting from one thing and storing data on another. It can
be a different partition, it can be a different disk, but mixing those
things together in one place is bad practice for many, many reasons.

And my very personal background: I chose mdadm because it allows me to
make raid sets across multiple controllers, and I don't use my raid6
for anything other than data. The system boots from a single (even EIDE)
disk; I'm totally not worried about my system, only the data matters.

So all in all, I think all levels of protection are already
available. And please, no new metadata format; we already have 4
mdadm metadata versions, and users are still not sure which one they
should choose.

Please don't eat me at once,
Have a nice spring day everybody,
Mike

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: md extension to support booting from raid whole disks.
  2009-05-02 17:02                               ` Michał Przyłuski
@ 2009-05-03  1:33                                 ` Leslie Rhorer
  2009-05-03  4:25                                   ` NeilBrown
  2009-05-08 22:06                                 ` Goswin von Brederlow
  1 sibling, 1 reply; 76+ messages in thread
From: Leslie Rhorer @ 2009-05-03  1:33 UTC (permalink / raw)
  To: 'Linux RAID'

> First thing is, DO NOT boot from raid5/6. It's pointless anyway.

I agree.

> Summing up, I don't get why anybody would really want to boot from raid5
> or 6.

Well, I can see the reasons, but to my mind they are outweighed by the
problems inherent in doing so and the fact the benefits are minimal.


> I second booting from one thing, and storing data on the other. It can
> be different partition, it can be different disk, but mixing those
> things together in one place is bad practice for many many reasons.

With hard drives being so inexpensive, I see no reason not to have a
separate boot drive or mirrored set and data devices.  Indeed, since a
human error is much more likely than a drive failure, what I like to do
is use a moderately sized (160G or so) hard drive to create all the boot
and OS partitions (including Grub and, if necessary, Windows) and keep a
hard drive on the shelf with a copy of the running system.  Of course
there are snapshot utilities which can make it fairly easy to revert to
earlier states, and those used in combination with mirrored drives can
serve the same purpose, but I find the spare drive on the shelf to be the
most comfortable solution.

> And my very personal background; I chose mdadm because it allows me to
> make raid sets across multiple controllers, and I don't use my raid6
> for anything other than data. System boots from single (even EIDE)
> disk, I'm totally not worried about my system, only data matter.

Well, it can take a while to set up a system and recover from a boot drive
failure, so some means of backup is definitely a good idea.  If nothing
else, one can simply tar all the files from the boot system to the RAID
array.  At the very least, a copy of /var and /etc is a good idea.
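
For instance (a sketch; adjust the destination path to taste):

   tar -cpzf /mnt/array/bootsys-backup.tar.gz \
       --one-file-system /etc /var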


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: md extension to support booting from raid whole disks.
  2009-05-03  1:33                                 ` Leslie Rhorer
@ 2009-05-03  4:25                                   ` NeilBrown
  2009-05-03 18:05                                     ` Leslie Rhorer
  2009-05-04  3:04                                     ` Daniel Reurich
  0 siblings, 2 replies; 76+ messages in thread
From: NeilBrown @ 2009-05-03  4:25 UTC (permalink / raw)
  To: lrhorer; +Cc: 'Linux RAID'

On Sun, May 3, 2009 11:33 am, Leslie Rhorer wrote:
>> First thing is, DO NOT boot from raid5/6. It's pointless anyway.
>
> I agree.

I don't.
I think it is very valuable for people to share their experiences
of what works and what doesn't.  Of what problem situations arise
and what solutions seem effective.  It is even good to give advice
when advice is sought.
But to assume that the thing which works best for you will always
work for everyone else - or vice versa - is wrong.   And people seem
to be coming close to that assumption.

>
>> Summing up, I don't get why anybody would really want to boot from
>> raid5 or 6.
>
> Well, I can see the reasons, but to my mind they are outweighed by the
> problems inherent in doing so and the fact the benefits are minimal.
>

You are very welcome to share what you see as the inherent problems.
But please be aware that others may see benefits that you do not.


> With hard drives being so inexpensive, I see no reason not to have
> separate
> boot drive or mirrored set and data devices.

Yes, hard drives are cheaper than they once were.  But they still
aren't free.  And they do consume measurable volume and current.

If I happen to have a box which has only got room for 4 drives, then
I'm not going to use two of them just for boot, nor am I going to
plug in external drives with all the risks associated with them.

My own "ideal" would be
 - simple boot loader in first sector of every drive that loaded a
   'second stage' linearly off the early sectors of the disk.
 - the 'second stage' is a linux kernel plus initramfs which finds
   the root filesystem loads the final kernel and initramfs, and
   uses kexec to boot into it.

Thus the final stage of the boot loader can understand any filesystem,
any raid level, any combination of controllers.
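
The kexec handoff at the end of that second stage would be roughly (a
sketch using kexec-tools; the paths are illustrative):

   kexec --load /mnt/root/boot/vmlinuz \
         --initrd=/mnt/root/boot/initrd.img \
         --append="root=/dev/md0 ro"
   kexec --exec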

The area that stores the second stage would be written rarely, and
always written as a whole - no incremental updates.  So much of the
power of md/raid1 would not be necessary.  Having some tool that
installed the boot loader and second stage on every bootable device would
seem an adequate solution.
Whether this space were kept free by traditional partitioning,
or by the filesystem or raid or whatever "knowing" to leave the first
few meg free is of relatively little interest to me.  I can see advantages
both ways.

While that would be my ideal, I'm quite happy to support other people in
experimenting with other approaches.  It is by being open to making mistakes
and learning from them that we grow.

So I still plan to offer a "--reserve-space=2M" option for mdadm to
allow the first 2M of each device to not be used for raid data.  Whether
any particular usage of this option is viable or not is a different
question altogether.
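
Usage would presumably be along the lines of (hypothetical, since the
option does not exist yet):

   mdadm --create /dev/md0 --level=5 --raid-devices=4 \
         --reserve-space=2M /dev/sd[abcd]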

NeilBrown


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: md extension to support booting from raid whole disks.
  2009-05-03  4:25                                   ` NeilBrown
@ 2009-05-03 18:05                                     ` Leslie Rhorer
  2009-05-04  3:04                                     ` Daniel Reurich
  1 sibling, 0 replies; 76+ messages in thread
From: Leslie Rhorer @ 2009-05-03 18:05 UTC (permalink / raw)
  To: 'Linux RAID'


> On Sun, May 3, 2009 11:33 am, Leslie Rhorer wrote:
> >> First thing is, DO NOT boot from raid5/6. It's pointless anyway.
> >
> > I agree.
> 
> I don't.
> I think it is very valuable for people to share their experiences
> of what works and what doesn't.  Of what problem situations arise
> and what solutions seem effective.  It is even good to give advice
> when advice is sought.
> But to assume that the thing which works best for you will always
> work for everyone else - or vice versa - is wrong.   And people seem
> to be coming close to that assumption.

I admit the case was overstated, and I thought about that very fact after I
sent the message.  See below.
 
> >> Summing up, I don't get why anybody would really want to boot from
> >> raid5 or 6.
> >
> > Well, I can see the reasons, but to my mind they are outweighed by the
> > problems inherent in doing so and the fact the benefits are minimal.
> >
> 
> You are very welcome to share what you see as the inherent problems.
> But please be aware that others may see benefits that you do not.

That's entirely possible, but note I qualified my statement, specifically
saying I did see advantages, and there are some: cost being one.  I stand
by my statement, fully qualified as my personal opinion above, that to my
mind the benefits of booting from a RAID array are outweighed by the
inherent issues.  Any other person is free to have a different viewpoint.

> > With hard drives being so inexpensive, I see no reason not to have
> > separate
> > boot drive or mirrored set and data devices.
> 
> Yes, hard drives are cheaper than they once were.  But they still
> aren't free.

Well, I have a dozen or so hard drives lying around which will never be
used, being too small for any other purpose, unless they get stuck into a
system as a boot drive.  Barring that potential use, they would just be
discarded.  That's about as close to free as it gets.  More to the point,
it would seem to me anyone who has enough cash to purchase 4 or more
large hard drives should not ordinarily have a problem getting a small one.

> And they do consume measurable volume and current.

True, but power supplies are also cheap.  Space can admittedly be a
bugbear of an issue, but a larger case or an external drive solution may
be viable.  Given we are talking about creating a system here, proper
planning is advisable.  Obtaining a system which is too small to meet
one's needs is not.

> If I happen to have a box which has only got room for 4 drives, then
> I'm not going to use two of them just for boot, nor am I going to
> plug in external drives with all the risks associated with them.

Well, as I mentioned, I prefer backing up the boot system with a cold drive
sitting on the shelf, but I have also never purchased a case intended to
host a RAID array which only has four drive slots.  In any case, mirrored
systems are far too prone to user errors for my tastes.  I make easily a
thousand stupid mistakes in configuring systems for every drive failure I
have ever had, and no level of RAID support can make up for that.  A cold
spare drive can.

> My own "ideal" would be
>  - simple boot loader in first sector of every drive that loaded a
>    'second stage' linearly off the early sectors of the disk.
>  - the 'second stage' is a linux kernel plus initramfs which finds
>    the root filesystem loads the final kernel and initramfs, and
>    uses kexec to boot into it.
> 
> Thus the final stage of the boot loader can understand any filesystem,
> any raid level, any combination of controllers.

Yeah, but I don't think that will work with a Windows partition, will it?
Much as I hate Windows and avoid it whenever possible, I do always include
a Windows boot in every x86 architecture machine, simply because many
manufacturers still refuse to supply the breadth of tools for their hardware
for Linux they do for Windows.  It also provides a comparison which can help
to determine if a problem is software or hardware related.  Recently I was
having a problem with a UPS, for example, and it was unclear if the problem
was the UPS or the Linux driver.  By booting Windows, I was able to
determine it was the Linux driver.  OTOH, the last thing I want to do is
waste relatively precious drive space on an array with a Windows partition.

> The area that stores the second stage would be written rarely, and
> always written as a whole - no incremental updates.  So much of the
> power of md/raid1 would not be necessary.  Having some tool that
> installed the boot loader and second stage on every bootable device would
> seem an adequate solution.
> Whether this space were kept free by traditional partitioning,
> or by the filesystem or raid or whatever "knowing" to leave the first
> few meg free is of relatively little interest to me.  I can see advantages
> both ways.

True.  With a completely separate drive, however, the admin can take an idle
machine, build his OS with drivers and utilities intact, test the machine to
work out bugs, and then upgrade the production machine by downing it and
replacing the boot hard drive.  If the system subsequently crashes or
exhibits unstable behavior, swap the drives back and go back to the drawing
board.

> While that would be my ideal, I'm quite happy to support other people in
> experimenting with other approaches.  It is by being open to making
> mistakes and learning from them that we grow.

Absolutely!  An important part of that process, however, is listening to the
opinions and experiences of other people, and learning from their successes
and failures.  That's why I decided to chime in.  As always, every observer
is perfectly welcome to take my opinions into whatever consideration  they
like, or ignore them completely.


> So I still plan to offer a "--reserve-space=2M" option for mdadm to
> allow the first 2M of each device to not be used for raid data.  Whether
> any particular usage of this option is viable or not is a different
> question altogether.

Having additional options available is never a bad thing, and I would never
discourage a developer from adding what he feels are useful features to a
system.  After all, one always has the option of simply not using the
feature.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: md extension to support booting from raid whole disks.
  2009-05-03  4:25                                   ` NeilBrown
  2009-05-03 18:05                                     ` Leslie Rhorer
@ 2009-05-04  3:04                                     ` Daniel Reurich
  2009-05-08 21:50                                       ` Goswin von Brederlow
  1 sibling, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-05-04  3:04 UTC (permalink / raw)
  To: Linux RAID


> My own "ideal" would be
>  - simple boot loader in first sector of every drive that loaded a
>    'second stage' linearly off the early sectors of the disk.
>  - the 'second stage' is a linux kernel plus initramfs which finds
>    the root filesystem loads the final kernel and initramfs, and
>    uses kexec to boot into it.
> 
Why not have it boot into the real linux kernel directly, instead of
kexec'ing from a boot linux kernel into the real one?  This would save
maintaining a boot kernel as well as the real one.  Even a simple first
stage bootloader would allow selection between multiple boot images
if there was enough reserved space to have multiple images available.
Of course this would require a userspace tool to embed them.

This whole discussion seems to revolve around where the complexity of
the boot process is best located, and that is the real question, in my
view at least.

I have asked whether grub2 has support for accessing disks across
multiple controllers, and the response I got was that grub2 has modules
for scsi and ata disk access, and these can be embedded in the stage 1
boot image, so access to disks across controllers may indeed be
possible.  I will run some tests myself to see if this is the reality.

> Thus the final stage of the boot loader can understand any filesystem,
> any raid level, any combination of controllers.
> 
> The area that stores the second stage would be written rarely, and
> always written as a whole - no incremental updates.  So much of the
> power of md/raid1 would not be necessary.  Having some tool that
> installed the boot loader and second stage on every bootable device would
> seem an adequate solution.

But the benefit of md/raid1 for this boot area would be that if the
disk being booted from is out of sync with the others for some reason,
the boot code still has enough know-how to assemble the raid1 (even if
it's limited to disks that are on only 1 controller) and get its second
stage boot image, linux kernel etc. off the raid1 volume rather than the
boot disk; we effectively remove one of the aforementioned modes of failure.


> Whether this space were kept free by traditional partitioning,
> or by the filesystem or raid or whatever "knowing" to leave the first
> few meg free is of relatively little interest to me.  I can see advantages
> both ways.
> 
I'd personally like to see the back of MSDOS v2/v3 style partition
tables when they're not required (and use lvm on a raided whole disk
set).  Both the grub2 and linux kexec methods could already do this in
theory.

> So I still plan to offer a "--reserve-space=2M" option for mdadm to
> allow the first 2M of each device to not be used for raid data.  Whether
> any particular usage of this option is viable or not is a different
> question altogether.
> 
Would it be better to allow for the creation of a metadata or superblock
format that describes the on-disk layout, a la Intel matrix style, so
that we could have a whole-disk raid which appears as X number of md
devices?  One could then ask for a layout of a 256M raid1 volume + 20G
raid10 + the rest of the disk as raid5, or whatever takes the user's
fancy.  I'd imagine that this could just be an additional option to
mdadm --create.  This may or may not need a superblock extension that
defines the raid volume layout, either in the superblock itself or in a
metadata block like one would expect from Intel matrix raid or a similar
3rd-party metadata format that mdadm 3 is said to support.


-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: md extension to support booting from raid whole disks.
  2009-05-02 12:07                 ` John Robinson
@ 2009-05-04 17:02                   ` Goswin von Brederlow
  2009-05-05  9:31                   ` Michal Soltys
  1 sibling, 0 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-04 17:02 UTC (permalink / raw)
  To: John Robinson; +Cc: Goswin von Brederlow, linux-raid

John Robinson <john.robinson@anonymous.org.uk> writes:

> On 01/05/2009 23:33, Goswin von Brederlow wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Fri, May 1, 2009 at 2:04 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>>> And is horribly ugly when you have to switch hardware. And try doing
>>>> it with disks from different controllers? What BIOS supports that?
>>> This is just a simple trade-off between two options.
>>>
>>> 1/ If you want to reliably boot from multiple disks then make a native
>>> md-raid1 array for /boot like hpa suggests.
>>
>> Which still has the same problems as the raid5. How to handle the
>> bootloader. Say I have a raid1 over sda/b with lvm on it. How do I
>> boot that? How do I get the bootloader cloned to a new spare disk when
>> it gets added?
>
> `dd if=/dev/originaldrive of=/dev/newdrive bs=512 count=1` to clone
> the boot sector from one of the remaining original drives; this clones
> your partition table too. Once that's done, and you've added the new
> drive's partition(s) back into the RAID set(s) and it's synced, it
> will boot.

Except that the bootblock then loads the next stage of the bootloader
which you did not copy. And the partition table might not work on that
disk at all if size and/or geometry differ.

> Or just tell your bootloader to install itself on the new drive.
>
>> It really makes no difference if it is a raid0/1/10/4/5/6.
>
> Yes it does - anything which needs access to more discs than whatever
> the BIOS offers as the boot drive via int13 is asking for trouble.

Which is totally irrelevant as THAT problem exists with partitions and
existing bootloaders just as well as with the proposed improvement.
Nothing can change that short of fixing your hardware architecture.
.oO(go open bios)

>>> 2/ If you do not want to make /boot into an md-raid1, and you still
>>> want to boot from raid5, then use $VENDOR's raid5 option-rom and
>>> mdadm's external metadata support.
>>
>> There is no such thing as $VENDOR's raid5 option-rom in pretty much
>> all cases. And even raid0/1 roms are horribly.
>
> Dan was being modest referring to $VENDOR; any recent board with an
> Intel ICHxR has Intel Matrix RAID with RAID-5 in the BIOS/option
> ROM. I imagine other chipsets have RAID-5 too. I would have used this
> method on my Asus P5Q Pro but the CentOS 5 installer doesn't support
> it; I think recent Fedora and Ubuntu installers do, thanks to
> excellent work by Dan, Neil Brown and others improving md's support
> for 3rd-party metadata formats.
>
> Cheers,
>
> John.

I still don't have a single system that can.

MfG
        Goswin


* Re: md extension to support booting from raid whole disks.
       [not found]                                 ` <87bpq8n6ym.fsf@frosties.localdomain>
@ 2009-05-04 20:57                                   ` Rudy Zijlstra
  2009-05-04 22:33                                     ` Daniel Reurich
  2009-05-08 21:18                                     ` Goswin von Brederlow
  0 siblings, 2 replies; 76+ messages in thread
From: Rudy Zijlstra @ 2009-05-04 20:57 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Linux Raid

Op maandag 04-05-2009 om 18:55 uur [tijdzone +0200], schreef Goswin von
Brederlow:
> Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
> 
> > Op zaterdag 02-05-2009 om 00:49 uur [tijdzone +0200], schreef Goswin von
> > Brederlow:
> >> Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
> >> 
> >> > I really prefer the current situation, where you need to explicitly
> >> > configure a RAID1. This makes clear what is happening, and reduces
> >> > confusion. This proposal would make debugging a failed boot just so much
> >> > more difficult. 
> >> >
> >> > Cheers,
> >> >
> >> > Rudy
> >> 
> >> As said in another mail the same problem is there with raid1. The
> >> proposal should allow creating a raid1 over sda/b without partitioning
> >> and any replacement drive automatically becoming bootable without you
> >> having to manually reinstall the bootloader to the new disk.
> >> 
> >> Wouldn't that be a huge plus?
> >> 
> >> MfG
> >>         Goswin
> >
> > Agreed to the automatic bootable of replacement disk, I still prefer the
> > partitioning though, as it makes clear what is happening. There are two
> > aspects here:
> > 1/ ease of installation
> 
> Which isn't true. Installing the bootloader on every component device
> is currently a pain and easy to forget when changing disks.

which means it is an important aspect... And I agree it is at the moment
easy to forget. So far I do not see improvement in this aspect from the
current proposal.

> 
> > 2/ ease of debug
> >
> > The latter is very important in boot situations. It gets worse with any
> > implied action. Please take a look at what you are specifying: an
> > implicit RAID1 over 2 "special" disks, within a RAIDX device... Now you
> > expect a user to debug that, in case it fails?? I have had too often
> > trouble to get my systems to boot the way i wanted to boot them, to
> > trust any BIOS to do the expected :(
> >
> > Cheers,
> >
> > Rudy
> 
> Not over 2 "special" disks. Plain across all disks.

Which is even worse, as I have not seen *any* BIOS able to handle
that... and I have an Intel based board which according to this email
thread should do that (and linux did not recognise the raid it had
configured, so let us forget about booting from it).



Cheers,

Rudy



* Re: md extension to support booting from raid whole disks.
  2009-05-04 20:57                                   ` Rudy Zijlstra
@ 2009-05-04 22:33                                     ` Daniel Reurich
  2009-05-05  0:26                                       ` John Robinson
  2009-05-08 21:18                                     ` Goswin von Brederlow
  1 sibling, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-05-04 22:33 UTC (permalink / raw)
  To: Rudy; +Cc: Goswin von Brederlow, Linux Raid

On Mon, 2009-05-04 at 22:57 +0200, Rudy Zijlstra wrote:
> Op maandag 04-05-2009 om 18:55 uur [tijdzone +0200], schreef Goswin von
> Brederlow:
> > Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
> > 
> > > Op zaterdag 02-05-2009 om 00:49 uur [tijdzone +0200], schreef Goswin von
> > > Brederlow:
> > >> Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
> > >> 
> > >> > I really prefer the current situation, where you need to explicitly
> > >> > configure a RAID1. This makes clear what is happening, and reduces
> > >> > confusion. This proposal would make debugging a failed boot just so much
> > >> > more difficult. 
> > >> >
> > >> > Cheers,
> > >> >
> > >> > Rudy
> > >> 
> > >> As said in another mail the same problem is there with raid1. The
> > >> proposal should allow creating a raid1 over sda/b without partitioning
> > >> and any replacement drive automatically becoming bootable without you
> > >> having to manually reinstall the bootloader to the new disk.
> > >> 
> > >> Wouldn't that be a huge plus?
> > >> 
> > >> MfG
> > >>         Goswin
> > >
> > > Agreed to the automatic bootable of replacement disk, I still prefer the
> > > partitioning though, as it makes clear what is happening. There are two
> > > aspects here:
> > > 1/ ease of installation
> > 
> > Which isn't true. Installing the bootloader on every component device
> > is currently a pain and easy to forget when changing disks.
> 
> which means it is an important aspect... And I agree it is at the moment
> easy to forget. So far I do not see improvement in this aspect from the
> current proposal.
> 
> > 
> > > 2/ ease of debug
> > >
> > > The latter is very important in boot situations. It gets worse with any
> > > implied action. Please take a look at what you are specifying: an
> > > implicit RAID1 over 2 "special" disks, within a RAIDX device... Now you
> > > expect a user to debug that, in case it fails?? I have had too often
> > > trouble to get my systems to boot the way i wanted to boot them, to
> > > trust any BIOS to do the expected :(
> > >
> > > Cheers,
> > >
> > > Rudy
> > 
> > Not over 2 "special" disks. Plain across all disks.
> 
> Which is even worse, as I have not seen *any* BIOS able to handle
> that... and I have an Intel based board which according to this email
> thread should do that (and linux did not recognise the raid it had
> configured, so let us forget about booting from it)
> 
I agree it shouldn't be the BIOS's problem to figure out how to boot
from software raid.  The BIOS should not have to handle anything other
than reading the bootsector of the first disk it finds and executing the
bootcode it finds there.  It is then the responsibility of the boot
loader, be that grub or linux kexec or whatever, to use int13h calls to
load enough of itself to get proper disk access working and then do
whatever it needs to do to boot the OS proper.  Grub2 is supposed to
have scsi/ata support so that it can access devices the bios doesn't
make available via int13, and does have support for assembling raid and
accessing LVM volumes.  Linux kexec based solutions will have full
support for accessing the disks, raid and LVM as well.  So the BIOS only
needs to find a valid boot sector on one of the attached disks and then
hand control over to the boot loader, which should assemble any raid and
boot the chosen OS.

Whether we use implicit volumes or explicit partitions for the boot
volume, whatever the solution is it should still replicate the bootsector
code and leave the assembly of the raid to the bootloader.

I'd like someone to tell me how to get a partitioned scenario where the
partition starts right at the start of the disk and includes the
bootsector, so that we can raid1 that partition and have the bootsector
replicated as well.
> 
> 
> Cheers,
> 
> Rudy
> 
-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722



* Re: md extension to support booting from raid whole disks.
  2009-05-04 22:33                                     ` Daniel Reurich
@ 2009-05-05  0:26                                       ` John Robinson
  2009-05-05  9:03                                         ` Keld Jørn Simonsen
  0 siblings, 1 reply; 76+ messages in thread
From: John Robinson @ 2009-05-05  0:26 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Linux Raid

On 04/05/2009 23:33, Daniel Reurich wrote:
> I'd like someone to tell me how to get a partitioned scenario where the
> partition starts right at the start of the disk and includes the
> bootsector, so that we can raid1 that partition and have the bootsector
> replicated as well.

I'm sure there are hooks to allow automatic rebuilding of arrays when 
drives fail and are replaced. I don't know what they are (hotplug?), or 
whether they're used in any current distros. I do know some people have 
this automatic rebuilding; for example Netgear (ex Infrant) ReadyNAS 
devices, which are Linux based and use md RAID, do this. It can't be 
such a big leap to use such a hook to reinstate the boot sector at the 
same time as reinstating RAID metadata and resyncing the array(s). Maybe 
current distros don't automatically do this because it only works if the 
user/OEM understands the particular case.

I don't believe it can't be done using the existing partition table and 
md RAID-1 /boot arrangements, and it would be great if we (and more 
distros) had suitable scripts available, but I suspect it's impossible 
to generalise for all RAID'ed installations.

Cheers,

John.



* Re: md extension to support booting from raid whole disks.
  2009-05-05  0:26                                       ` John Robinson
@ 2009-05-05  9:03                                         ` Keld Jørn Simonsen
  0 siblings, 0 replies; 76+ messages in thread
From: Keld Jørn Simonsen @ 2009-05-05  9:03 UTC (permalink / raw)
  To: John Robinson; +Cc: Daniel Reurich, Linux Raid

On Tue, May 05, 2009 at 01:26:11AM +0100, John Robinson wrote:
> On 04/05/2009 23:33, Daniel Reurich wrote:
>> I'd like someone to tell me how to get a partitioned scenario where the
>> partition starts right at the start of the disk and includes the
>> bootsector, so that we can raid1 that partition and have the bootsector
>> replicated as well.
>
> I'm sure there are hooks to allow automatic rebuilding of arrays when  
> drives fail and are replaced. I don't know what they are (hotplug?), or  
> whether they're used in any current distros. I do know some people have  
> this automatic rebuilding; for example Netgear (ex Infrant) ReadyNAS  
> devices, which are Linux based and use md RAID, do this. It can't be  
> such a big leap to use such a hook to reinstate the boot sector at the  
> same time as reinstating RAID metadata and resyncing the array(s). Maybe  
> current distros don't automatically do this because it only works if the  
> user/OEM understands the particular case.
>
> I don't believe it can't be done using the existing partition table and  
> md RAID-1 /boot arrangements, and it would be great if we (and more  
> distros) had suitable scripts available, but I suspect it's impossible  
> to generalise for all RAID'ed installations.

Hmm, what is your motivation for this?

I think that you can make a largely automated disk partitioning scheme,
along the lines of the following:

You want to have multiple partitions on your disks.
Like /boot, /root, /home etc.

This is because you can only boot off a raid1, and raid1 is relatively
slow, so you want to minimize the use of raid1, eg to just the boot
partition. The rest of the partitions can then be the more io-efficient
raid10,f2, or the more space-efficient raid5 (or raid6). 

You do not boot from MBR, but rather from the partition marked as
active.

Then make your partition scheme, and copy the whole scheme via eg

  sfdisk -d /dev/sda | sfdisk /dev/sdb

Then when you update your system, either by lilo or grub, you only need
to make the change once, and it will apply to all of the raids that
/boot and /root are located on.
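
As a rough sketch, assuming two disks with partition 1 for /boot (raid1)
and partition 2 for the rest (device names are only examples):

  # clone the partition table, then build the arrays on matching partitions
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sda2 /dev/sdb2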

This is further described in
http://linux-raid.osdl.org/index.php/Preventing_against_a_failing_disk

best regards
keld


* Re: md extension to support booting from raid whole disks.
  2009-05-02 12:07                 ` John Robinson
  2009-05-04 17:02                   ` Goswin von Brederlow
@ 2009-05-05  9:31                   ` Michal Soltys
  1 sibling, 0 replies; 76+ messages in thread
From: Michal Soltys @ 2009-05-05  9:31 UTC (permalink / raw)
  To: John Robinson; +Cc: Goswin von Brederlow, linux-raid

> John Robinson wrote:
>> Which still has the same problems as the raid5. How to handle the
>> bootloader. Say I have a raid1 over sda/b with lvm on it. How do I
>> boot that? How do I get the bootloader cloned to a new spare disk when
>> it gets added?
> 

Btw - if you have _only_ raid1 on the drives, then you can also set up a
partitionable md raid1 with its superblock at the end of the device.
Thus linux raid will cover the _whole_ disk, while remaining capable of
booting with a /boot partition in the typical way, as if /boot were raid1
over sd[abc..]1.
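
A minimal sketch of that setup, assuming two disks (device names and the
partitionable array node are only examples):

  # 1.0 metadata sits at the end of the device, so sector 0 stays usable
  mdadm --create /dev/md_d0 --metadata=1.0 --level=1 --raid-devices=2 \
        --auto=part /dev/sda /dev/sdb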

As for booting itself - the key thing is to have a "tiny" /boot partition
_with_ a simple, well understood and supported filesystem (such as fat or
ext2) at the beginning - outside of lvm or more complex raids. Such a
partition can have - apart from the linux kernel+initramfs - a bootable
freedos or linux, and host some good bootloader/manager (such as
syslinux, which is far more powerful and flexible than grub).

If /boot is created inside some container (lvm, raid5) and/or as part of
another filesystem - then it only ties your hands, imho.

>> On Fri, May 1, 2009 at 2:04 PM, Goswin von Brederlow 
>> <goswin-v-b@web.de> wrote:
> `dd if=/dev/originaldrive of=/dev/newdrive bs=512 count=1` to clone the 
> boot sector from one of the remaining original drives; this clones your 
> partition table too. Once that's done, and you've added the new drive's 
> partition(s) back into the RAID set(s) and it's synced, it will boot.
> 

That assumes you have no extended/logical partitions. If you do have such
a legacy partition layout, it's better to use:

sfdisk -d /dev/sda >pt.dump

and later to restore somewhere else

sfdisk /dev/sdb <pt.dump

Also - the 'dd' method would also clone the disk signature (4 bytes at
1B8h), which, depending on the needs, may not be desirable.
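
If that matters, you can copy only the boot code proper, since the
signature starts at offset 440 (1B8h) and the partition table at 446:

  # clone just the 440-byte bootstrap area, leaving signature and table alone
  dd if=/dev/sda of=/dev/sdb bs=440 count=1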



* Re: md extension to support booting from raid whole disks.
  2009-05-04 20:57                                   ` Rudy Zijlstra
  2009-05-04 22:33                                     ` Daniel Reurich
@ 2009-05-08 21:18                                     ` Goswin von Brederlow
  1 sibling, 0 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-08 21:18 UTC (permalink / raw)
  To: Rudy; +Cc: Goswin von Brederlow, Linux Raid

Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:

> Op maandag 04-05-2009 om 18:55 uur [tijdzone +0200], schreef Goswin von
> Brederlow:
>> Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
>> 
>> > Op zaterdag 02-05-2009 om 00:49 uur [tijdzone +0200], schreef Goswin von
>> > Brederlow:
>> >> Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:
>> >> 
>> >> > I really prefer the current situation, where you need to explicitly
>> >> > configure a RAID1. This makes clear what is happening, and reduces
>> >> > confusion. This proposal would make debugging a failed boot just so much
>> >> > more difficult. 
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Rudy
>> >> 
>> >> As said in another mail the same problem is there with raid1. The
>> >> proposal should allow creating a raid1 over sda/b without partitioning
>> >> and any replacement drive automatically becoming bootable without you
>> >> having to manually reinstall the bootloader to the new disk.
>> >> 
>> >> Wouldn't that be a huge plus?
>> >> 
>> >> MfG
>> >>         Goswin
>> >
>> > Agreed to the automatic bootable of replacement disk, I still prefer the
>> > partitioning though, as it makes clear what is happening. There are two
>> > aspects here:
>> > 1/ ease of installation
>> 
>> Which isn't true. Installing the bootloader on every component device
>> is currently a pain and easy to forget when changing disks.
>
> which means it is an important aspect... And I agree it is at the moment
> easy to forget. So far I do not see improvement in this aspect from the
> current proposal.

My preference is to have md consider the bootloader space part of the
metadata and copy it across to new disks when they get added to a
raid. That way you can't forget it.
 
>> > 2/ ease of debug
>> >
>> > The latter is very important in boot situations. It gets worse with any
>> > implied action. Please take a look at what you are specifying: an
>> > implicit RAID1 over 2 "special" disks, within a RAIDX device... Now you
>> > expect a user to debug that, in case it fails?? I have had too often
>> > trouble to get my systems to boot the way i wanted to boot them, to
>> > trust any BIOS to do the expected :(
>> >
>> > Cheers,
>> >
>> > Rudy
>> 
>> Not over 2 "special" disks. Plain across all disks.
>
> Which is even worse, as I have not seen *any* BIOS able to handle
> that... and I have an Intel based board which according to this email
> thread should do that (and linux did not recognise the raid it had
> configured, so let us forget about booting from it)

If it is mirrored across all disks, any one disk (of the raid) will be
bootable. Every BIOS supports that.

The part where you need BIOS support only comes later, once the
bootloader is loaded, activates its raid capabilities and then tries
to read a raid 0/4/5/6. If the BIOS does not give access to enough
component devices of such a raid then the bootloader will be stuck
at that point.

But that is really nothing new. With a partition table and raid on
partitions the bootloader is just as stuck there. That just can't be
helped, so I don't consider it an argument for or against anything.

MfG
        Goswin


* Re: md extension to support booting from raid whole disks.
  2009-05-04  3:04                                     ` Daniel Reurich
@ 2009-05-08 21:50                                       ` Goswin von Brederlow
  2009-05-08 22:16                                         ` NeilBrown
  0 siblings, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-08 21:50 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Linux RAID

Daniel Reurich <daniel@centurion.net.nz> writes:

>> My own "ideal" would be
>>  - simple boot loader in first sector of every drive that loaded a
>>    'second stage' linearly off the early sectors of the disk.
>>  - the 'second stage' is a linux kernel plus initramfs which finds
>>    the root filesystem loads the final kernel and initramfs, and
>>    uses kexec to boot into it.
>> 
> Why not have it boot into the real linux kernel instead of kexec from a
> boot linux kernel into the real one?  This would save the maintenance of

Because the boot kernel is simple, with just the disk drivers in it. I
build it once, install it and never have to touch it again.

The real kernel on the other hand is potentially much bigger and will
run under threat of being hacked. It is important to update it
regularly with security fixes or newer versions. When updating it is
crucial to have the old version available as a backup, so having a
choice of kernels to boot is essential.

> a boot kernel as well as the real one.  Even a simple first stage
> bootloader would allow for the selection of between multiple boot images
> if there was enough reserved space to have multiple images available.
> Of course this would require a userspace tool to emebed them.

Ok, how much space? 10MB? 100MB? 1GB? 10GB? I think people would kill
you if you reserve 1GB on their 8GB SSD disks. On the other hand 10MB
would only fit one kernel and initrd if at all. I have a 300MB rescue
system as initrd in my /boot. That would have to fit in the reserved
space too.

Ergo we need 2 stages. One small stage to get access to all the disk
space and present a menu and then boot the real deal from there.
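
Once the small stage has assembled the raid and mounted the real /boot,
the hand-over boils down to something like this (paths and the kernel
version are only examples):

  kexec -l /mnt/boot/vmlinuz-2.6.29 --initrd=/mnt/boot/initrd.img-2.6.29 \
        --append="root=/dev/md0 ro"
  kexec -e    # jump into the freshly loaded kernel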

> This whole discussion seems to revolve around where the complexity of
> the boot process should be best located, and the answer is this, in my
> view at least.
>
> I have asked whether grub2 also has support to access disks across
> multiple controllers, and the response I got was that grub2 has modules
> for using scsi and ata for disk access, and these can be embedded in the
> stage 1 bootimage, so access to disks across controllers may indeed be
> possible.  I will run some tests myself to see if this is the reality.
>
>> Thus the final stage of the boot loader can understand any filesystem,
>> any raid level, any combination of controllers.
>> 
>> The area that stores the second stage would be written rarely, and
>> always written as a whole - no incremental updates.  So much of the
>> power of md/raid1 would not be necessary.  Having some tool that
>> installed the boot loader and second stage on every bootable device would
>> seem an adequate solution.
>
> But the benefit of md/raid1 of this boot area would be that if the a
> disk that is booted from is out of sync with the others for some reason,
> yet has enough know how to assemble raid1 (even if it's limited to disks
> that are on only 1 controller), and get it's second stage boot image and
> linux kernel etc, off the raid1 volume rather than the boot disk, we
> effectively remove one of the aforementioned modes of failure.

If the system crashes during the installation of the bootloader then
it will 99.99999% not work no matter what you do. The probability that
it crashes just at the moment when disk 1 has finished writing but
disk 2 has not is so minuscule that we can ignore it. If it crashes
while installing the bootloader then you have to install it again from
a rescue medium.

Note that with grub you basically never need to install it a second
time.

>> Whether this space were kept free by traditional partitioning,
>> or by the filesystem or raid or whatever "knowing" to leave the first
>> few meg free is of relatively little interest to me.  I can see advantages
>> both ways.
>> 
> I'd personally like to see the back of the MSDOS v2/v3 style partiton
> tables when it's not required  (and use lvm on a raided whole disk set.
> Both grub2 and linux kexec methods already could do this in theory.
>
>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
>> allow the first 2M of each device to not used for raid data.  Whether
>> any particular usage of this option is viable or not, is a different
>> question altogether.

How exactly would that layout be then?

Block  0   bootblock
Block  1   raid metadata
Block  x   2M reserved space
Block x+2M start of raid data

Like this?

> Would it be better to allow for the creation of a metadata or superblock
> that described the on disk layout ala intel matrix style, so that we
> could have a whole disk raid, which appears as X number of md devices,
> so that one could ask for a layout of 256M raid1 volume + 20G raid10 +
> the rest of the disk as raid5 or whatever takes the users fancy.  I'd
> imagine that this could just be an additional option to mdadm --create.
> This may or may not need a superblock extension that defines the raid
> volumes layout either in the superblock, just a metadata block like one
> would expect from an intel matrix raid or similar 3rd party metadata
> format that the mdadm 3 is said to support.

Wouldn't it be easier to use lvm there and implement the missing raid
levels for the lvm userspace and device-mapper?

MfG
        Goswin


* Re: md extension to support booting from raid whole disks.
  2009-05-02 17:02                               ` Michał Przyłuski
  2009-05-03  1:33                                 ` Leslie Rhorer
@ 2009-05-08 22:06                                 ` Goswin von Brederlow
  2009-05-09  7:20                                   ` Peter Rabbitson
  1 sibling, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-08 22:06 UTC (permalink / raw)
  To: Michał Przyłuski; +Cc: Daniel Reurich, linux-raid

Michał Przyłuski <mikylie@gmail.com> writes:

> Hello,
>
> 2009/5/2 Daniel Reurich <daniel@centurion.net.nz>:
>>> It would get worse, as in many situations the installer would succeed,
>>> and the boot would fail. Many raid 5/6 configurations will spread over
>>> controllers, and not BIOS supports booting over several controllers.
>>>
>> Then we teach the bootloaders installer to detect whether all the member
>> disks are on the same controller, and refuse to install (or atleast warn
>> at that point) if not.
>
> That has been an interesting discussion over last week or so. I have
> some thoughts at this point, not really technical...
>
> First thing is, DO NOT boot from raid5/6. It's pointless anyway.
> Let's think of raid5 as a bigger raid1. It won't add any extra
> redundancy to our boot-process over a separate raid1 for /boot. Making

The point was not to add redundancy but to remove complexity.

Just look at what you need to do for the next generation of disks
(>2TB):

- Create a MS Dos partition table with a fake /boot partition in the
  first 2TB.
- Create a GPT table with a matching /boot partition and the rest
- Create a raid1 for /boot
- Create a raidX for the rest

Now you have to watch 2 raids and add/remove partitions from 2 raids.
You also need to copy the bootloader to every new disk you add.

The idea is to bring this down to:

- Create raidX over all disks
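
For example, with four disks (device names are only examples):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d]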

> it a hidden raid1 over first few sectors is just creating an
> automation that gains nothing and makes things unnecessarily
> complicated inside.

If the hidden raid1 is just reserved space that is considered part of
the raid metadata then this moves completely into mdadm userspace. The
extra complexity comes down to "read reserved space from old disk,
write reserved space to new disk". In the most basic form that is 3
lines of code (declare buffer, read, write).
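
In shell terms, a sketch of what mdadm would do when a new disk is added
(the offsets are placeholders; the real ones would come from the
superblock):

  # read the reserved space from a healthy member ...
  dd if=/dev/sda of=/tmp/reserved.img bs=512 skip=8 count=4096
  # ... and write it to the freshly added member
  dd if=/tmp/reserved.img of=/dev/sdc bs=512 seek=8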

> For example, if user has a, say, 4 disk raid5, with magic or normal
> raid1 for boot, and looses 2 disks, he or she is still pretty angry,
> no matter that they can boot, when they'd lost all their data. I'm
> quite sure users care more about data when they go raid5 than system
> itself.

Not the point.

> If you can afford real hw controller, that "does it all for raid5/6"
> and provides one int13 device for bootloader then no problems. But
> then, who makes his or hers /boot (and / and data) on one huge
> partition.

People that don't use software raid are not a target group for software
raid. So that's irrelevant.

> If you can't, and want to have reliable boot, then you should mirror
> your drives. Going anywhere over raid1 is pointless. You should just
> have a backup, boot is small and changes rarely, one can burn it to a
> dvd easily.
>
> So, as /boot (or even /) nowadays is really tiny, compared to disk
> sizes, you can easily carve out few gigs at the start of device for
> raid1, and use rest of all disks for raid5. Also, lvm'ing /boot sounds
> just wrong, I don't think resizing or other lvm features are of any
> use for /boot.

But being able to pvmove it can be very useful. Say you have a
system with a 2-disk raid1 that is hotplug capable, has space for 4
disks, and you want to migrate to bigger disks. Just plug in 2 new
disks, create a raid, extend the VG, pvmove everything, reduce the VG,
stop the old raid, remove the old disks.
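
For example (volume group and device names are made up):

  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  pvcreate /dev/md1
  vgextend vg0 /dev/md1
  pvmove /dev/md0            # migrate all extents off the old PV
  vgreduce vg0 /dev/md0
  mdadm --stop /dev/md0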

> Summing up, I don't get, why would anybody really want to boot from raid5 or 6.
>
> I second booting from one thing, and storing data on the other. It can
> be different partition, it can be different disk, but mixing those
> things together in one place is bad practice for many many reasons.
>
> And my very personal background; I chose mdadm because it allows me to
> make raid sets across multiple controllers, and I don't use my raid6
> for anything other than data. System boots from single (even EIDE)
> disk, I'm totally not worried about my system, only data matter.
>
> So all in all, I think all levels of protections are already
> available. And please, no next metadata format; we already have 4
> mdadm metadata versions, and users are still not sure which one they
> should choose.
>
> Please don't eat me at once,
> Have a nice spring day everybody,
> Mike

MfG
        Goswin


* Re: md extension to support booting from raid whole disks.
  2009-05-08 21:50                                       ` Goswin von Brederlow
@ 2009-05-08 22:16                                         ` NeilBrown
  2009-05-08 22:29                                           ` Goswin von Brederlow
  0 siblings, 1 reply; 76+ messages in thread
From: NeilBrown @ 2009-05-08 22:16 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Reurich, Linux RAID

On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:

>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
>>> allow the first 2M of each device to not used for raid data.  Whether
>>> any particular usage of this option is viable or not, is a different
>>> question altogether.
>
> How exactly would that layout be then?
>
> Block  0   bootblock
> Block  1   raid metadata
> Block  x   2M reserved space
> Block x+2M start of raid data
>
> Like this?

When using 1.2 metadata, yes, possible with bitmap
inserted  between the reserved space and the start of raid data.

When using 1.0, it would be

  Block 0..N-1   boot block and second stage
  Block N..near-the-end raid data
  Block x..y     bitmap
  block z        superblock

NeilBrown



* Re: md extension to support booting from raid whole disks.
  2009-05-08 22:16                                         ` NeilBrown
@ 2009-05-08 22:29                                           ` Goswin von Brederlow
  2009-05-12  5:39                                             ` Neil Brown
  0 siblings, 1 reply; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-08 22:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: Goswin von Brederlow, Daniel Reurich, Linux RAID

"NeilBrown" <neilb@suse.de> writes:

> On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:
>
>>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
>>>> allow the first 2M of each device to not used for raid data.  Whether
>>>> any particular usage of this option is viable or not, is a different
>>>> question altogether.
>>
>> How exactly would that layout be then?
>>
>> Block  0   bootblock
>> Block  1   raid metadata
>> Block  x   2M reserved space
>> Block x+2M start of raid data
>>
>> Like this?
>
> When using 1.2 metadata, yes, possible with bitmap
> inserted  between the reserved space and the start of raid data.

That really seems to be the best option. Simple to implement, simple to
use and if mdadm copies the reserved space from old to new drives when
adding one it gives us exactly what we want.

Are you working on that already or do you think it needs more discussion?

> When using 1.0, it would be
>
>   Block 0..N-1   boot block and second stage
>   Block N..near-the-end raid data
>   Block x..y     bitmap
>   block z        superblock

I never liked the idea of 1.0.

What actually happens when you have raid on partitions and resize a
partition? Am I right that the raid then can't be assembled until the
raid itself gets grown (and the superblock gets moved to the new end)?

> NeilBrown



* Re: md extension to support booting from raid whole disks.
  2009-05-08 22:06                                 ` Goswin von Brederlow
@ 2009-05-09  7:20                                   ` Peter Rabbitson
  2009-05-10  1:29                                     ` Goswin von Brederlow
  0 siblings, 1 reply; 76+ messages in thread
From: Peter Rabbitson @ 2009-05-09  7:20 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Michał Przyłuski, Daniel Reurich, linux-raid, neilb

Goswin von Brederlow wrote:
Michał Przyłuski <mikylie@gmail.com> writes:
> 
> 
> The point was not to add redundancy but to remove complexity.
> 
> Just look at what you need to do for the next generation of disks
> (>2TB):
> 
> - Create a MS Dos partition table with a fake /boot partition in the
>   first 2TB.
> - Create a GPT table with a matching /boot partition and the rest
> - Create a raid1 for /boot
> - Create a raidX for the rest
> 
> Now you have to watch 2 raids and add/remove partitions from 2 raids,
> You also need to copy the bootloader to every new disk you add.
> 
> The idea is to bring this down to:
> 
> - Create raidX over all disks
> 
>> it a hidden raid1 over first few sectors is just creating an
>> automation that gains nothing and makes things unnecessarily
>> complicated inside.
> 
> If the hidden raid1 is just reserved space that is considered part of
> the raid metadata then this moves completly into mdadm userspace. The
> extra complexity comes down to "read reserved space from old disk,
> write reserved space to new disk". In the most basic form that is 3
> lines of code (declare buffer, read, write).
> 

This is the best description of the problem/benefit so far. Also when
deciding on the size of the reserved space, factor in possible
bitmap size explosion when moving from say a 4x300G raid6 to a 4x2T raid6.

+1 on this feature request

Cheers


* Re: md extension to support booting from raid whole disks.
  2009-05-09  7:20                                   ` Peter Rabbitson
@ 2009-05-10  1:29                                     ` Goswin von Brederlow
  0 siblings, 0 replies; 76+ messages in thread
From: Goswin von Brederlow @ 2009-05-10  1:29 UTC (permalink / raw)
  To: Peter Rabbitson; +Cc: Michał Przyłuski, Daniel Reurich, linux-raid, neilb

Peter Rabbitson <rabbit+list@rabbit.us> writes:

> Goswin von Brederlow wrote:
>> If the hidden raid1 is just reserved space that is considered part of
>> the raid metadata then this moves completly into mdadm userspace. The
>> extra complexity comes down to "read reserved space from old disk,
>> write reserved space to new disk". In the most basic form that is 3
>> lines of code (declare buffer, read, write).
>> 
>
> This is the best description of the problem/benefit so far. Also when
> deciding on the size of the reserved space, factor in possible
> bitmap size explosion when moving from say a 4x300G raid6 to a 4x2T raid6.
>
> +1 on this feature request
>
> Cheers

From what Neil described it should be no problem to resize the amount
of space used for metadata. It would have to copy around all the data
of the raid, do a reshape basically, but that isn't hard. Or it could
be limited to adding drives, which would need to do a resync anyway,
e.g. something like

mdadm --add --reserved-space=3M --bitmap-space=4M /dev/md0 /dev/sdx

MfG
        Goswin

PS: tests seem to show though that smaller bitmaps are faster


* Re: md extension to support booting from raid whole disks.
  2009-05-08 22:29                                           ` Goswin von Brederlow
@ 2009-05-12  5:39                                             ` Neil Brown
  2009-05-12 19:44                                               ` Daniel Reurich
  2009-05-13 12:15                                               ` Bill Davidsen
  0 siblings, 2 replies; 76+ messages in thread
From: Neil Brown @ 2009-05-12  5:39 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Reurich, Linux RAID

On Saturday May 9, goswin-v-b@web.de wrote:
> "NeilBrown" <neilb@suse.de> writes:
> 
> > On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:
> >
> >>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
> >>>> allow the first 2M of each device to not used for raid data.  Whether
> >>>> any particular usage of this option is viable or not, is a different
> >>>> question altogether.
> >>
> >> How exactly would that layout be then?
> >>
> >> Block  0   bootblock
> >> Block  1   raid metadata
> >> Block  x   2M reserved space
> >> Block x+2M start of raid data
> >>
> >> Like this?
> >
> > When using 1.2 metadata, yes, possible with bitmap
> > inserted  between the reserved space and the start of raid data.
> 
> That really seems to be the best option. Simple to implement, simple to
> use and if mdadm copies the reserved space from old to new drives when
> adding one it gives us exactly what we want.
> 
> Are you working on that already or do you think it needs more discussion?

Discussion is good....

I have just pushed out some changes to the 'master' branch of
   git://neil.brown.name/mdadm

The last patch adds "--reserve-space=" support to create.
It only works with 1.x metadata (and causes the default to be 1.0).

You cannot hot-add a bitmap to a 1.1 or 1.2 array created with this
feature (the kernel cannot be told the right thing to do yet).

The space can have a K, M, or G suffix with the obvious meanings.
K is the default.

mdadm currently does not copy any data from one device to another.
This could possibly be added for "--add" but not for "--create".

Any reports of success or failure, or other comments would be most
welcome.
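
For the curious, creation looks something like this (device names are
examples, and remember the option is experimental):

  mdadm --create /dev/md0 --metadata=1.0 --reserve-space=2M \
        --level=1 --raid-devices=2 /dev/sda /dev/sdb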


> 
> > When using 1.0, it would be
> >
> >   Block 0..N-1   boot block and second stage
> >   Block N..near-the-end raid data
> >   Block x..y     bitmap
> >   block z        superblock
> 
> I never liked the idea of 1.0.
> 
> What actually happens when you have raid on partitions and resize a
> partition? Am I right that the raid then can't be assembled until the
> raid itself gets grown (and the superblock gets moved to the new end)?


If you resize the partition under a 0.90 or 1.0 array, then md will
lose track of the metadata and you won't be able to assemble the array
again (there is nothing that will move it to the end).

How often do you resize a partition when there is data on it?  I
suspect only when the partition is a logical volume.  In that case 1.0
is awkward.  In others it works fine.

NeilBrown


* Re: md extension to support booting from raid whole disks.
  2009-05-12  5:39                                             ` Neil Brown
@ 2009-05-12 19:44                                               ` Daniel Reurich
  2009-05-13 11:12                                                 ` Neil Brown
  2009-05-13 12:15                                               ` Bill Davidsen
  1 sibling, 1 reply; 76+ messages in thread
From: Daniel Reurich @ 2009-05-12 19:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: Goswin von Brederlow, Linux RAID

On Tue, 2009-05-12 at 15:39 +1000, Neil Brown wrote:
> On Saturday May 9, goswin-v-b@web.de wrote:
> > "NeilBrown" <neilb@suse.de> writes:
> > 
> > > On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:
> > >
> > >>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
> > >>>> allow the first 2M of each device to not used for raid data.  Whether
> > >>>> any particular usage of this option is viable or not, is a different
> > >>>> question altogether.
> > >>
> > >> How exactly would that layout be then?
> > >>
> > >> Block  0   bootblock
> > >> Block  1   raid metadata
> > >> Block  x   2M reserved space
> > >> Block x+2M start of raid data
> > >>
> > >> Like this?
> > >
> > > When using 1.2 metadata, yes, possible with bitmap
> > > inserted  between the reserved space and the start of raid data.
> > 
> > That really seems to be the best option. Simple to implement, simple to
> > use and if mdadm copies the reserved space from old to new drives when
> > adding one it gives us exactly what we want.
> > 
> > Are you working on that already or do you think it needs more discussion?
> 
> Discussion is good....
> 
> I have just pushed out some changes to the 'master' branch of
>    git://neil.brown.name/mdadm
> 
> The last patch adds "--reserve-space=" support to create.
> It only works with 1.x metadata (and causes the default to be 1.0).
> 
> You cannot hot-add a bitmap to a 1.1 or 1.2 array created with this
> feature (the kernel cannot be told the right thing to do yet).
> 
> The space can have a K, M, or G suffix with the obvious meanings.
> K is the default.
> 
> mdadm currently does not copy any data from one device to another.
> This could possibly be added for "--add" but not for "--create".

Could we do this better using containers and SNIA's DDF or Intel's
Matrix (or our own) metadata to define the data areas, and this way
create a raid1 container at the start of the disks and use a 1.0 format
superblock and metadata at the end of the drive (as long as this doesn't
mess with the metadata)?

This would solve the syncing of boot sectors because it would be done as
part of a normal raid1.  Hot add would simply be a matter of adding
members to the container.  The only issue I can see is whether you are
able to hot-resize the data areas in containers as part of the grow
feature, or whether we would have to do a metadata tweak to redefine the
size of the data areas.

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722



* Re: md extension to support booting from raid whole disks.
  2009-05-12 19:44                                               ` Daniel Reurich
@ 2009-05-13 11:12                                                 ` Neil Brown
  2009-05-14  2:21                                                   ` Daniel Reurich
  2009-05-15 16:13                                                   ` H. Peter Anvin
  0 siblings, 2 replies; 76+ messages in thread
From: Neil Brown @ 2009-05-13 11:12 UTC (permalink / raw)
  To: Daniel Reurich; +Cc: Goswin von Brederlow, Linux RAID

On Wednesday May 13, daniel@centurion.net.nz wrote:
> On Tue, 2009-05-12 at 15:39 +1000, Neil Brown wrote:
> > On Saturday May 9, goswin-v-b@web.de wrote:
> > > "NeilBrown" <neilb@suse.de> writes:
> > > 
> > > > On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:
> > > >
> > > >>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
> > > >>>> allow the first 2M of each device to not used for raid data.  Whether
> > > >>>> any particular usage of this option is viable or not, is a different
> > > >>>> question altogether.
> > > >>
> > > >> How exactly would that layout be then?
> > > >>
> > > >> Block  0   bootblock
> > > >> Block  1   raid metadata
> > > >> Block  x   2M reserved space
> > > >> Block x+2M start of raid data
> > > >>
> > > >> Like this?
> > > >
> > > > When using 1.2 metadata, yes, possible with bitmap
> > > > inserted  between the reserved space and the start of raid data.
> > > 
> > > That really seems to be the best option. Simple to implement, simple to
> > > use and if mdadm copies the reserved space from old to new drives when
> > > adding one it gives us exactly what we want.
> > > 
> > > Are you working on that already or do you think it needs more discussion?
> > 
> > Discussion is good....
> > 
> > I have just pushed out some changes to the 'master' branch of
> >    git://neil.brown.name/mdadm
> > 
> > The last patch adds "--reserve-space=" support to create.
> > It only works with 1.x metadata (and causes the default to be 1.0).
> > 
> > You cannot hot-add a bitmap to a 1.1 or 1.2 array created with this
> > feature (the kernel cannot be told the right thing to do yet).
> > 
> > The space can have a K, M, or G suffix with the obvious meanings.
> > K is the default.
> > 
> > mdadm currently does not copy any data from one device to another.
> > This could possibly be added for "--add" but not for "--create".
> 
> Could we do this better using containers and SNIA's DDF or Intel's
> Matrix (or our own) metadata to define the data areas, and this way
> create a raid1 container at the start of the disks and use a 1.0 format
> superblock and metadata at the end of the drive (as long as this doesn't
> mess with the metadata)?
> 
> This would solve the syncing of boot sectors because it would be done as
> part of a normal raid1.  Hot add would simply be a matter of adding
> members to the container.  The only issue I can see is whether you are
> able to hot-resize the data areas in containers as part of the grow
> feature, or whether we would have to do a metadata tweak to redefine the
> size of the data areas.

While it might be possible to do something vaguely like this, I don't
want to.

Having a replicated boot loader and having a raid1 are conceptually
quite different things.
A raid1 says "keep N (typically 2) copies of the data somewhere for
me".
A replicated boot loader says "store this boot loader on every
bootable device".
One is more abstract, the other is more concrete.

Maybe it is a very subtle distinction, but I think it is worth
maintaining.  Get your boot-loader-installer to install at the front
of every drive - don't bother having a raid1 there that is never read
and hardly ever written.
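
With grub that is just a loop over the member devices, e.g. (device
names are only examples):

  for d in /dev/sda /dev/sdb /dev/sdc; do grub-install "$d"; done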

NeilBrown


* Re: md extension to support booting from raid whole disks.
  2009-05-12  5:39                                             ` Neil Brown
  2009-05-12 19:44                                               ` Daniel Reurich
@ 2009-05-13 12:15                                               ` Bill Davidsen
  1 sibling, 0 replies; 76+ messages in thread
From: Bill Davidsen @ 2009-05-13 12:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: Goswin von Brederlow, Daniel Reurich, Linux RAID

Neil Brown wrote:
> On Saturday May 9, goswin-v-b@web.de wrote:
>   
>> "NeilBrown" <neilb@suse.de> writes:
>>
>>     
>>> On Sat, May 9, 2009 7:50 am, Goswin von Brederlow wrote:
>>>
>>>       
>>>>>> So I still plan to offer a "--reserve-space=2M" option for mdadm to
>>>>>> allow the first 2M of each device to not used for raid data.  Whether
>>>>>> any particular usage of this option is viable or not, is a different
>>>>>> question altogether.
>>>>>>             
>>>> How exactly would that layout be then?
>>>>
>>>> Block  0   bootblock
>>>> Block  1   raid metadata
>>>> Block  x   2M reserved space
>>>> Block x+2M start of raid data
>>>>
>>>> Like this?
>>>>         
>>> When using 1.2 metadata, yes, possible with bitmap
>>> inserted  between the reserved space and the start of raid data.
>>>       
>> That really seems to be the best option. Simple to implement, simple to
>> use and if mdadm copies the reserved space from old to new drives when
>> adding one it gives us exactly what we want.
>>
>> Are you working on that already or do you think it needs more discussion?
>>     
>
> Discussion is good....
>
> I have just pushed out some changes to the 'master' branch of
>    git://neil.brown.name/mdadm
>
> The last patch adds "--reserve-space=" support to create.
> It only works with 1.x metadata (and causes the default to be 1.0).
>
> You cannot hot-add a bitmap to a 1.1 or 1.2 array created with this
> feature (the kernel cannot be told the right thing to do yet).
>
> The space can have a K, M, or G suffix with the obvious meanings.
> K is the default.
>
> mdadm currently does not copy any data from one device to another.
> This could possibly be added for "--add" but not for "--create".
>
> Any reports of success or failure, or other comments would be most
> welcome.
>
>
>   
>>> When using 1.0, it would be
>>>
>>>   Block 0..N-1   boot block and second stage
>>>   Block N..near-the-end raid data
>>>   Block x..y     bitmap
>>>   block z        superblock
>>>       
>> I never liked the idea of 1.0.
>>
>> What actually happens when you have raid on partitions and resize a
>> partition? Am I right that the raid then can't be assembled until the
>> raid itself gets grown (and the superblock gets moved to the new end)?
>>     
>
>
> If you resize the partition under a 0.90 or 1.0 array, then md will
> lose track of the metadata and you wont be able to assemble the array
> again (there is nothing that will move it to the end).
>
> How often do you resize a partition when there is data on it?  I
> suspect only when the partition is a logical volume.  In that case 1.0
> is awkward.  In others it works fine.
>   

Resizing a partition is not an issue; gparted does that nicely for the 
case of a partition with a filesystem. But when the partition is part of a 
raid array? I'm trying to think just how you would do that, other than 
the very manual one partition at a time. I don't recall seeing gparted 
documentation mentioning that. Perhaps LVM would do that, my use of it 
is somewhat pedestrian and doesn't include swishing my data around like 
mouthwash as I see some people do. As someone said in a movie, "I am not 
a trusting person," and I don't trust LVM, possibly because of 
experiences when it was new.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.




* Re: md extension to support booting from raid whole disks.
  2009-05-13 11:12                                                 ` Neil Brown
@ 2009-05-14  2:21                                                   ` Daniel Reurich
  2009-05-15 16:13                                                   ` H. Peter Anvin
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Reurich @ 2009-05-14  2:21 UTC (permalink / raw)
  To: Neil Brown; +Cc: Goswin von Brederlow, Linux RAID


> > Could we do this better using containers and SNIA's DDF or Intel's
> > Matrix (or our own) metadata to define the data areas, and this way
> > create a raid1 container at the start of the disks and use a 1.0 format
> > superblock and metadata at the end of the drive (as long as this doesn't
> > mess with the metadata)?
> > 
> > This would solve the syncing of boot sectors because it would be done as
> > part of a normal raid1.  Hot add would simply be a matter of adding
> > members to the container.  The only issue I can see is whether you are
> > able to hot-resize the data areas in containers as part of the grow
> > feature, or whether we would have to do a metadata tweak to redefine the
> > size of the data areas.
> 
> While it might be possible to do something vaguely like this, I don't
> want to.
> 
Is it already possible to do with mdadm v3?

BTW, I'm not talking about just raiding the boot sector, but a raid1
volume with a filesystem for /boot with the embedded bootsector at the
start of it.  This means that for every member disk that includes
the /boot volume we have a mirror.

The description I got about ddf|matrix containers led me to believe this
was a robust way to do this.  

What's the point in supporting ddf|matrix containers if we can't create
the data areas within them to suit particular use cases like this?  

If we could/can then nothing further needs to be done to md/mdadm to
allow me to implement my preferred boot solution.

> don't bother having a raid1 there that is never read
> and hardly ever written..
????
It's read every time we boot!

I guess I've failed to communicate what I'm after effectively! :(

-- 
Daniel Reurich

Centurion Computer Technology (2005) Ltd
Ph 021 797 722



* Re: md extension to support booting from raid whole disks.
  2009-05-13 11:12                                                 ` Neil Brown
  2009-05-14  2:21                                                   ` Daniel Reurich
@ 2009-05-15 16:13                                                   ` H. Peter Anvin
  1 sibling, 0 replies; 76+ messages in thread
From: H. Peter Anvin @ 2009-05-15 16:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: Daniel Reurich, Goswin von Brederlow, Linux RAID

Neil Brown wrote:
> 
> Having a replicated boot loader and having a raid1 are conceptually
> quite different things.
> A raid1 says "keep N (typically 2) copies of the data somewhere for
> me".
> A replicated boot loader says "store this boot loader on every
> bootable device".
> One is more abstract, the other is more concrete.
> 
> Maybe it is a very subtle distinction, but I think it is worth
> maintaining.  Get your boot-loader-installer to install at the front
> of every drive - don't bother having a raid1 there that is never read
> and hardly ever written..
> 

I think it is an unfortunate distinction (although I see why you want to
make it), and one which goes in the wrong direction.

As I've stated many times (and not just in this debate), I believe using
the RAID-1 mechanism to replicate /boot across the entire span of
devices is the right thing to do.  Not just the boot loader, but all of
/boot.

Once you do that, you do want to write it on a regular basis, and using
the RAID-1 code is the obvious way to do it.  You can argue that it is
an accidental effect of the way the current Linux RAID-1 code does it,
but it's nevertheless extremely useful, widely deployed, and extremely
resilient.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Thread overview: 76+ messages
2009-04-24 12:08 md extension to support booting from raid whole disks Daniel Reurich
2009-04-27 15:08 ` Goswin von Brederlow
2009-04-28  4:58   ` H. Peter Anvin
2009-04-28  6:26     ` Luca Berra
2009-04-28  9:35     ` Goswin von Brederlow
2009-04-28 11:21       ` Daniel Reurich
2009-04-28 17:36       ` H. Peter Anvin
2009-04-28 22:23         ` Daniel Reurich
2009-04-28 23:30           ` H. Peter Anvin
2009-04-29  0:02             ` Daniel Reurich
2009-04-29 11:32               ` John Robinson
2009-04-28 18:24     ` Dan Williams
2009-04-28 22:19       ` Daniel Reurich
2009-04-28 22:26         ` Dan Williams
2009-05-01 21:04           ` Goswin von Brederlow
2009-05-01 21:24             ` Dan Williams
2009-05-01 22:33               ` Goswin von Brederlow
2009-05-02 12:07                 ` John Robinson
2009-05-04 17:02                   ` Goswin von Brederlow
2009-05-05  9:31                   ` Michal Soltys
2009-04-28 23:05         ` Neil Brown
2009-04-28 23:20           ` H. Peter Anvin
2009-04-29  0:00             ` Daniel Reurich
2009-04-29  0:04               ` H. Peter Anvin
2009-04-29  0:20                 ` Daniel Reurich
2009-04-29  0:28                   ` H. Peter Anvin
2009-04-29  0:43                     ` Daniel Reurich
2009-04-29  6:43                       ` Gabor Gombas
2009-05-01 21:10                         ` Goswin von Brederlow
2009-05-01 22:36                           ` Rudy Zijlstra
2009-05-02  1:04                             ` Daniel Reurich
2009-05-02 17:02                               ` Michał Przyłuski
2009-05-03  1:33                                 ` Leslie Rhorer
2009-05-03  4:25                                   ` NeilBrown
2009-05-03 18:05                                     ` Leslie Rhorer
2009-05-04  3:04                                     ` Daniel Reurich
2009-05-08 21:50                                       ` Goswin von Brederlow
2009-05-08 22:16                                         ` NeilBrown
2009-05-08 22:29                                           ` Goswin von Brederlow
2009-05-12  5:39                                             ` Neil Brown
2009-05-12 19:44                                               ` Daniel Reurich
2009-05-13 11:12                                                 ` Neil Brown
2009-05-14  2:21                                                   ` Daniel Reurich
2009-05-15 16:13                                                   ` H. Peter Anvin
2009-05-13 12:15                                               ` Bill Davidsen
2009-05-08 22:06                                 ` Goswin von Brederlow
2009-05-09  7:20                                   ` Peter Rabbitson
2009-05-10  1:29                                     ` Goswin von Brederlow
     [not found]                             ` <87presxwu4.fsf@frosties.localdomain>
     [not found]                               ` <1241219902.9516.6.camel@poledra.romunt.nl>
     [not found]                                 ` <87bpq8n6ym.fsf@frosties.localdomain>
2009-05-04 20:57                                   ` Rudy Zijlstra
2009-05-04 22:33                                     ` Daniel Reurich
2009-05-05  0:26                                       ` John Robinson
2009-05-05  9:03                                         ` Keld Jørn Simonsen
2009-05-08 21:18                                     ` Goswin von Brederlow
2009-04-29 22:43                   ` md extension to support booting from raid whole disks, raid6, grub2, lvm2 Michael Ole Olsen
2009-05-01 21:36                     ` Goswin von Brederlow
2009-04-29  7:45             ` md extension to support booting from raid whole disks Luca Berra
2009-04-29 16:55               ` H. Peter Anvin
2009-04-29 20:38                 ` Luca Berra
2009-04-30  6:59               ` Gabor Gombas
2009-04-30  8:11                 ` Luca Berra
2009-04-30 13:01                   ` John Robinson
2009-04-28 23:41           ` Daniel Reurich
2009-04-29  0:01             ` H. Peter Anvin
2009-05-01 21:33           ` Goswin von Brederlow
2009-04-28  7:08   ` Daniel Reurich
2009-04-28 23:07 ` Neil Brown
2009-04-28 23:21   ` Daniel Reurich
2009-04-28 23:37   ` H. Peter Anvin
2009-04-29  0:05     ` Daniel Reurich
2009-04-29  0:06       ` H. Peter Anvin
2009-04-29  0:36         ` Daniel Reurich
2009-04-29  0:44           ` H. Peter Anvin
     [not found]             ` <1240968482.18303.1028.camel@ezra>
     [not found]               ` <49F7B162.8060301@zytor.com>
2009-04-29  2:08                 ` Daniel Reurich
2009-04-29  2:33                   ` H. Peter Anvin
2009-04-30  2:41             ` Daniel Reurich
2009-04-29  7:07           ` Gabor Gombas
