linux-kernel.vger.kernel.org archive mirror
* Exporting which partitions to md-configure
@ 2006-01-31  0:52 H. Peter Anvin
  2006-01-31  1:10 ` Neil Brown
  2006-01-31  3:21 ` [klibc] " Greg KH
  0 siblings, 2 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  0:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

I'm putting the final touches on kinit, which is the user-space 
replacement (based on klibc) for the whole in-kernel root-mount complex. 
   Pretty much the one thing remaining -- other than lots of testing -- 
is to handle automatically mounted md devices.  In order to do that, 
without adding userspace versions of all the partition code (which may
be a future change, but a pretty big one) it would be good if the 
partition flag to auto-configure RAID was available in userspace, 
presumably through sysfs.

Any feeling how best to do that?  My current thinking is to export a 
"flags" entry in addition to the current ones, presumably based on 
"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which 
seems to be what causes md_autodetect_dev() to be called.

Note that this should be available even if md isn't compiled into the 
kernel, thus making it possible to load md as a module before running 
kinit, or to make the equivalent of the kernel mounting sequence from a 
totally runtime user tool.

	-hpa

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Exporting which partitions to md-configure
  2006-01-31  0:52 Exporting which partitions to md-configure H. Peter Anvin
@ 2006-01-31  1:10 ` Neil Brown
  2006-01-31  1:42   ` H. Peter Anvin
  2006-01-31  1:43   ` Kyle Moffett
  2006-01-31  3:21 ` [klibc] " Greg KH
  1 sibling, 2 replies; 24+ messages in thread
From: Neil Brown @ 2006-01-31  1:10 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

On Monday January 30, hpa@zytor.com wrote:
> 
> Any feeling how best to do that?  My current thinking is to export a 
> "flags" entry in addition to the current ones, presumably based on 
> "struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which 
> seems to be what causes md_autodetect_dev() to be called.
> 

I think I would prefer a 'type' attribute in each partition that
records the 'type' from the partition table.  This might be more
generally useful than just for md.
Then your userspace code would have to look for '253' and use just
those partitions.
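
A minimal sketch of that userspace check, assuming a hypothetical per-partition `type` attribute whose contents are the MS-DOS type byte as text. No such attribute exists at the time of this thread, and the dictionary below stands in for reading it from sysfs:

```python
LINUX_RAID_AUTODETECT = 0xFD  # 253 decimal: MS-DOS "Linux raid autodetect"


def raid_autodetect_partitions(types):
    """Return the sorted names of partitions whose type byte is 0xfd.

    `types` maps a partition name to the raw text of the hypothetical
    `type` attribute (decimal such as "253" or hex such as "0xfd").
    """
    selected = []
    for name, raw in types.items():
        try:
            value = int(raw.strip(), 0)  # base 0 accepts "253" and "0xfd"
        except ValueError:
            continue  # not a numeric MS-DOS type byte; skip it
        if value == LINUX_RAID_AUTODETECT:
            selected.append(name)
    return sorted(selected)


print(raid_autodetect_partitions({"hda1": "131", "hda2": "253", "hdc2": "0xfd"}))
# ['hda2', 'hdc2']
```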

NeilBrown


* Re: Exporting which partitions to md-configure
  2006-01-31  1:10 ` Neil Brown
@ 2006-01-31  1:42   ` H. Peter Anvin
  2006-01-31  2:01     ` Neil Brown
  2006-01-31  6:49     ` Chris Wedgwood
  2006-01-31  1:43   ` Kyle Moffett
  1 sibling, 2 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  1:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

Neil Brown wrote:
> On Monday January 30, hpa@zytor.com wrote:
> 
>>Any feeling how best to do that?  My current thinking is to export a 
>>"flags" entry in addition to the current ones, presumably based on 
>>"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which 
>>seems to be what causes md_autodetect_dev() to be called.
> 
> I think I would prefer a 'type' attribute in each partition that
> records the 'type' from the partition table.  This might be more
> generally useful than just for md.
> Then your userspace code would have to look for '253' and use just
> those partitions.
> 

What about non-DOS partitions?

	-hpa


* Re: Exporting which partitions to md-configure
  2006-01-31  1:10 ` Neil Brown
  2006-01-31  1:42   ` H. Peter Anvin
@ 2006-01-31  1:43   ` Kyle Moffett
  2006-01-31  1:45     ` H. Peter Anvin
  2006-01-31  2:01     ` Neil Brown
  1 sibling, 2 replies; 24+ messages in thread
From: Kyle Moffett @ 2006-01-31  1:43 UTC (permalink / raw)
  To: Neil Brown
  Cc: H. Peter Anvin, klibc list, Linux Kernel Mailing List, linux-raid

On Jan 30, 2006, at 20:10, Neil Brown wrote:
> On Monday January 30, hpa@zytor.com wrote:
>> Any feeling how best to do that?  My current thinking is to export
>> a "flags" entry in addition to the current ones, presumably based
>> on "struct parsed_partitions->parts[].flags" (fs/partitions/check.h),
>> which seems to be what causes md_autodetect_dev() to be called.
>
> I think I would prefer a 'type' attribute in each partition that  
> records the 'type' from the partition table.  This might be more  
> generally useful than just for md.  Then your userspace code would  
> have to look for '253' and use just those partitions.

Well, for an MSDOS partition table, you would look for '253', for a  
Mac partition table you could look for something like 'Linux_RAID' or  
similar (just arbitrarily define some name beginning with the Linux_  
prefix), etc.  This means that the partition table type would need to  
be exposed as well (I don't know if it is already).

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw  
knives at people who weren't supposed to be in your machine room.
   -- Anthony de Boer




* Re: Exporting which partitions to md-configure
  2006-01-31  1:43   ` Kyle Moffett
@ 2006-01-31  1:45     ` H. Peter Anvin
  2006-01-31  2:01     ` Neil Brown
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  1:45 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Neil Brown, klibc list, Linux Kernel Mailing List, linux-raid

Kyle Moffett wrote:
> 
> Well, for an MSDOS partition table, you would look for '253', for a  Mac 
> partition table you could look for something like 'Linux_RAID' or  
> similar (just arbitrarily define some name beginning with the Linux_  
> prefix), etc.  This means that the partition table type would need to  
> be exposed as well (I don't know if it is already).
> 

It's not, but perhaps exporting "format" and "type" as distinct 
attributes is the way to go.  The policy for which partitions to 
consider would live entirely in kinit that way.

type would be format-specific; in EFI it's a UUID.

This, of course, is a bigger change, but it just might be worth it.

	-hpa


* Re: Exporting which partitions to md-configure
  2006-01-31  1:42   ` H. Peter Anvin
@ 2006-01-31  2:01     ` Neil Brown
  2006-01-31  2:05       ` H. Peter Anvin
  2006-01-31  6:49     ` Chris Wedgwood
  1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2006-01-31  2:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

On Monday January 30, hpa@zytor.com wrote:
> Neil Brown wrote:
> > On Monday January 30, hpa@zytor.com wrote:
> > 
> >>Any feeling how best to do that?  My current thinking is to export a 
> >>"flags" entry in addition to the current ones, presumably based on 
> >>"struct parsed_partitions->parts[].flags" (fs/partitions/check.h), which 
> >>seems to be what causes md_autodetect_dev() to be called.
> > 
> > I think I would prefer a 'type' attribute in each partition that
> > records the 'type' from the partition table.  This might be more
> > generally useful than just for md.
> > Then your userspace code would have to look for '253' and use just
> > those partitions.
> > 
> 
> What about non-DOS partitions?

Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
 efi.c, msdos.c sgi.c sun.c

Of these, efi compares something against PARTITION_LINUX_RAID_GUID,
and msdos.c, sgi.c and sun.c compare something against
LINUX_RAID_PARTITION.

The former would look like
  a19d880f-05fc-4d3b-a006-743f0f84911e
in sysfs (I think);
The latter would look like
  fd
(I suspect).

These are both easily recognisable with no real room for confusion.

And if other partition styles wanted to add support for raid auto
detect, tell them "no". It is perfectly possible and even preferable
to live without autodetect.   We should support legacy usage (those
above) but should discourage any new usage.

NeilBrown



* Re: Exporting which partitions to md-configure
  2006-01-31  1:43   ` Kyle Moffett
  2006-01-31  1:45     ` H. Peter Anvin
@ 2006-01-31  2:01     ` Neil Brown
  2006-01-31  2:38       ` H. Peter Anvin
  2006-01-31  6:42       ` Kyle Moffett
  1 sibling, 2 replies; 24+ messages in thread
From: Neil Brown @ 2006-01-31  2:01 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: H. Peter Anvin, klibc list, Linux Kernel Mailing List, linux-raid

On Monday January 30, mrmacman_g4@mac.com wrote:
> On Jan 30, 2006, at 20:10, Neil Brown wrote:
> > On Monday January 30, hpa@zytor.com wrote:
> >> Any feeling how best to do that?  My current thinking is to export
> >> a "flags" entry in addition to the current ones, presumably based
> >> on "struct parsed_partitions->parts[].flags" (fs/partitions/check.h),
> >> which seems to be what causes md_autodetect_dev() to be called.
> >
> > I think I would prefer a 'type' attribute in each partition that  
> > records the 'type' from the partition table.  This might be more  
> > generally useful than just for md.  Then your userspace code would  
> > have to look for '253' and use just those partitions.
> 
> Well, for an MSDOS partition table, you would look for '253', for a  
> Mac partition table you could look for something like 'Linux_RAID' or  
> similar (just arbitrarily define some name beginning with the Linux_  
> prefix), etc.  This means that the partition table type would need to  
> be exposed as well (I don't know if it is already).

Mac partition tables don't currently support autodetect (as far as I
can tell).  Let's keep it that way.

NeilBrown


* Re: Exporting which partitions to md-configure
  2006-01-31  2:01     ` Neil Brown
@ 2006-01-31  2:05       ` H. Peter Anvin
  2006-02-06  1:46         ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  2:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

Neil Brown wrote:
> 
> Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
>  efi.c, msdos.c sgi.c sun.c
> 
> Of these, efi compares something against PARTITION_LINUX_RAID_GUID,
> and msdos.c, sgi.c and sun.c compare something against
> LINUX_RAID_PARTITION.
> 
> The former would look like
>   a19d880f-05fc-4d3b-a006-743f0f84911e
> in sysfs (I think);
> The latter would look like
>   fd
> (I suspect).
> 
> These are both easily recognisable with no real room for confusion.

Well, if we're going to have a generic facility it should make sense 
across the board.  If all we're doing is supporting legacy usage we 
might as well export a flag.

I guess we could have a single entry with a string of the form
"efi:a19d880f-05fc-4d3b-a006-743f0f84911e" or "msdos:fd" etc -- it
really doesn't make any difference to me, but it seems cleaner to have 
two pieces of data in two different sysfs entries.
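
A small sketch of what consuming the single-entry form would involve, assuming the hypothetical `format:type` encoding described above (nothing in the kernel exports such a string at the time of writing):

```python
def split_partition_id(value):
    """Split a composite 'format:type' string into (format, type).

    Only the first ':' separates the two fields, so a type string that
    itself contains ':' is preserved intact.
    """
    fmt, sep, ptype = value.strip().partition(":")
    if not sep or not fmt or not ptype:
        raise ValueError(f"expected 'format:type', got {value!r}")
    return fmt, ptype


print(split_partition_id("msdos:fd"))  # ('msdos', 'fd')
```

With two separate sysfs entries the split step disappears, since each file would already hold exactly one of the two fields.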

> 
> And if other partition styles wanted to add support for raid auto
> detect, tell them "no". It is perfectly possible and even preferable
> to live without autodetect.   We should support legacy usage (those
> above) but should discourage any new usage.
> 

Why is that, keeping in mind this will all be done in userspace?

	-hpa



* Re: Exporting which partitions to md-configure
  2006-01-31  2:01     ` Neil Brown
@ 2006-01-31  2:38       ` H. Peter Anvin
  2006-01-31  6:42       ` Kyle Moffett
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  2:38 UTC (permalink / raw)
  To: Neil Brown
  Cc: Kyle Moffett, klibc list, Linux Kernel Mailing List, linux-raid

Neil Brown wrote:
> 
> Mac partition tables don't currently support autodetect (as far as I
> can tell).  Let's keep it that way.
> 

For now I guess I'll just take the code from init/do_mounts_md.c; we can 
worry about ripping the RAID_AUTORUN code out of the kernel later.

	-hpa



* Re: [klibc] Exporting which partitions to md-configure
  2006-01-31  0:52 Exporting which partitions to md-configure H. Peter Anvin
  2006-01-31  1:10 ` Neil Brown
@ 2006-01-31  3:21 ` Greg KH
  2006-01-31  3:24   ` Greg KH
  2006-01-31  3:53   ` H. Peter Anvin
  1 sibling, 2 replies; 24+ messages in thread
From: Greg KH @ 2006-01-31  3:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Mon, Jan 30, 2006 at 04:52:08PM -0800, H. Peter Anvin wrote:
> I'm putting the final touches on kinit, which is the user-space 
> replacement (based on klibc) for the whole in-kernel root-mount complex. 
>   Pretty much the one thing remaining -- other than lots of testing -- 
> is to handle automatically mounted md devices.  In order to do that, 
> without adding userspace versions of all the partition code (which may
> be a future change, but a pretty big one) it would be good if the 
> partition flag to auto-configure RAID was available in userspace, 
> presumably through sysfs.

What are you looking for exactly?  udev has a great helper program,
volume_id, that identifies any type of filesystem that Linux knows about
(it was based on the ext2 lib code, but smaller, and much more sane, and
works better.)

Would that help out here?

thanks,

greg k-h


* Re: [klibc] Exporting which partitions to md-configure
  2006-01-31  3:21 ` [klibc] " Greg KH
@ 2006-01-31  3:24   ` Greg KH
  2006-01-31  6:53     ` Kyle Moffett
  2006-01-31  3:53   ` H. Peter Anvin
  1 sibling, 1 reply; 24+ messages in thread
From: Greg KH @ 2006-01-31  3:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Mon, Jan 30, 2006 at 07:21:33PM -0800, Greg KH wrote:
> On Mon, Jan 30, 2006 at 04:52:08PM -0800, H. Peter Anvin wrote:
> > I'm putting the final touches on kinit, which is the user-space 
> > replacement (based on klibc) for the whole in-kernel root-mount complex. 
> >   Pretty much the one thing remaining -- other than lots of testing -- 
> > is to handle automatically mounted md devices.  In order to do that, 
> > without adding userspace versions of all the partition code (which may
> > be a future change, but a pretty big one) it would be good if the 
> > partition flag to auto-configure RAID was available in userspace, 
> > presumably through sysfs.
> 
> What are you looking for exactly?  udev has a great helper program,
> volume_id, that identifies any type of filesystem that Linux knows about
> (it was based on the ext2 lib code, but smaller, and much more sane, and
> works better.)
> 
> Would that help out here?

Oh, an example of it working:
	# vol_id /dev/sda3
	ID_FS_USAGE=filesystem
	ID_FS_TYPE=ext3
	ID_FS_VERSION=1.0
	ID_FS_UUID=9d2efd53-6b5a-4f84-86cc-def71269b7ca
	ID_FS_LABEL=
	ID_FS_LABEL_SAFE=
	# vol_id -t /dev/sda3
	ext3
	# vol_id -u /dev/sda3
	9d2efd53-6b5a-4f84-86cc-def71269b7ca

It also shows just the label if you have one set (I don't on this
disk...)
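
For a caller that wants these values programmatically, here is a minimal sketch of parsing the KEY=VALUE output shown above. The sample text is copied from the transcript, and the function name is illustrative rather than part of udev:

```python
def parse_vol_id(output):
    """Parse the KEY=VALUE lines printed by vol_id into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            info[key] = value  # empty values (e.g. no label) are kept as ""
    return info


SAMPLE = """\
ID_FS_USAGE=filesystem
ID_FS_TYPE=ext3
ID_FS_VERSION=1.0
ID_FS_UUID=9d2efd53-6b5a-4f84-86cc-def71269b7ca
ID_FS_LABEL=
ID_FS_LABEL_SAFE=
"""

info = parse_vol_id(SAMPLE)
print(info["ID_FS_TYPE"], info["ID_FS_UUID"])
```

Note that every key here describes the filesystem found inside the partition, not the partition-table entry itself.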

thanks,

greg k-h


* Re: [klibc] Exporting which partitions to md-configure
  2006-01-31  3:21 ` [klibc] " Greg KH
  2006-01-31  3:24   ` Greg KH
@ 2006-01-31  3:53   ` H. Peter Anvin
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-01-31  3:53 UTC (permalink / raw)
  To: Greg KH; +Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

Greg KH wrote:
> 
> What are you looking for exactly?  udev has a great helper program,
> volume_id, that identifies any type of filesystem that Linux knows about
> (it was based on the ext2 lib code, but smaller, and much more sane, and
> works better.)
> 
> Would that help out here?
> 

It might, but it's also rather ugly to have two pieces of code, 
especially in the presence of very dynamic partitions.  In other words, 
if the kernel deals with partitions, you want to be able to get at the 
kernel's view of partitions, not necessarily the actual set of 
partitions on disk, which can be quite different.

	-hpa


* Re: Exporting which partitions to md-configure
  2006-01-31  2:01     ` Neil Brown
  2006-01-31  2:38       ` H. Peter Anvin
@ 2006-01-31  6:42       ` Kyle Moffett
  1 sibling, 0 replies; 24+ messages in thread
From: Kyle Moffett @ 2006-01-31  6:42 UTC (permalink / raw)
  To: Neil Brown
  Cc: H. Peter Anvin, klibc list, Linux Kernel Mailing List, linux-raid

On Jan 30, 2006, at 21:01, Neil Brown wrote:
> On Monday January 30, mrmacman_g4@mac.com wrote:
>> On Jan 30, 2006, at 20:10, Neil Brown wrote:
>>> On Monday January 30, hpa@zytor.com wrote:
>>>> Any feeling how best to do that?  My current thinking is to  
>>>> export a "flags" entry in addition to the current ones,  
>>>> presumably based on "struct parsed_partitions->parts[].flags"
>>>> (fs/partitions/check.h), which seems to be what causes
>>>> md_autodetect_dev() to be called.
>>>
>>> I think I would prefer a 'type' attribute in each partition that  
>>> records the 'type' from the partition table.  This might be more  
>>> generally useful than just for md.  Then your userspace code  
>>> would have to look for '253' and use just those partitions.
>>
>> Well, for an MSDOS partition table, you would look for '253', for  
>> a Mac partition table you could look for something like  
>> 'Linux_RAID' or similar (just arbitrarily define some name  
>> beginning with the Linux_ prefix), etc.  This means that the  
>> partition table type would need to
>> be exposed as well (I don't know if it is already).
>
> Mac partition tables don't currently support autodetect (as far
> as I can tell).  Let's keep it that way.

Well, no, the point would definitely *NOT* be to add kernel-level  
autodetect of stuff in the Mac partition tables.  The point would be  
to export the partition-table-format and partition-type information  
to userspace.  That way a custom mdadm-control-script could have a  
config file with "AUTODETECT_TYPE=Linux_RAID" or  
"AUTODETECT_TYPE=253" or "AUTODETECT_TYPE=<insert EFI UUID here>",  
etc.  The whole detection thing could be configured and done in  
userspace based on partition table info provided by the kernel if  
desired, or mdadm could just scan all disks for RAID headers like it  
does now.  The idea would be that any autodetection would be  
completely out of the kernel and userspace's responsibility; the  
kernel would just export info to make it easier.
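
As a rough sketch of such a control script, assuming a hypothetical shell-style config file with `AUTODETECT_TYPE=` lines and a hypothetical mapping of partition name to exported type string (neither exists today; both are stand-ins for illustration):

```python
def load_autodetect_types(config_text):
    """Collect every AUTODETECT_TYPE=value line from a shell-style config."""
    wanted = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line.startswith("AUTODETECT_TYPE="):
            value = line.split("=", 1)[1].strip().strip('"')
            if value:
                wanted.add(value)
    return wanted


def select_partitions(partition_types, wanted):
    """Return sorted names of partitions whose exported type is wanted."""
    return sorted(name for name, ptype in partition_types.items()
                  if ptype in wanted)


config = 'AUTODETECT_TYPE=Linux_RAID\nAUTODETECT_TYPE="253"\n'
exported = {"sda3": "Linux_RAID", "sda4": "Apple_UNIX_SVR2", "hda1": "253"}
print(select_partitions(exported, load_autodetect_types(config)))
# ['hda1', 'sda3']
```

All of the policy lives in the config file and the script; the kernel's only role in this sketch is to supply the `exported` mapping.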

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things,  
because that would also stop them from doing clever things.
   -- Doug Gwyn




* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-01-31  1:42   ` H. Peter Anvin
  2006-01-31  2:01     ` Neil Brown
@ 2006-01-31  6:49     ` Chris Wedgwood
  1 sibling, 0 replies; 24+ messages in thread
From: Chris Wedgwood @ 2006-01-31  6:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Mon, Jan 30, 2006 at 05:42:45PM -0800, H. Peter Anvin wrote:

> What about non-DOS partitions?

Is something like libblkid suitable as a starting point of something
you can cut-down-to-size?

   text    data     bss     dec     hex filename
  24978    2272      12   27262    6a7e /lib/libblkid.so.1


* Re: [klibc] Exporting which partitions to md-configure
  2006-01-31  3:24   ` Greg KH
@ 2006-01-31  6:53     ` Kyle Moffett
  0 siblings, 0 replies; 24+ messages in thread
From: Kyle Moffett @ 2006-01-31  6:53 UTC (permalink / raw)
  To: Greg KH
  Cc: H. Peter Anvin, Neil Brown, linux-raid, klibc list,
	Linux Kernel Mailing List

On Jan 30, 2006, at 22:24, Greg KH wrote:
> Oh, an example of it working:
> 	# vol_id /dev/sda3
> 	ID_FS_USAGE=filesystem
> 	ID_FS_TYPE=ext3
> 	ID_FS_VERSION=1.0
> 	ID_FS_UUID=9d2efd53-6b5a-4f84-86cc-def71269b7ca
> 	ID_FS_LABEL=
> 	ID_FS_LABEL_SAFE=
> 	# vol_id -t /dev/sda3
> 	ext3
> 	# vol_id -u /dev/sda3
> 	9d2efd53-6b5a-4f84-86cc-def71269b7ca

That shows filesystem information, but not at all the partition table  
information.  If I look at my mac partition table (this is using the  
apple-provided tool under OS X, but it's the same using the Linux  
tool), for example:

hc6524e32:~ kyle$ sudo -H /usr/sbin/pdisk -l /dev/disk1

Partition map (with 512 byte blocks) on '/dev/disk1'
#:                type name                               length   base      ( size )
1: Apple_partition_map Apple                                  63 @ 1
2:     Apple_Bootstrap bootstrap                            1600 @ 64
3:     Apple_UNIX_SVR2 linux_boot                        1048576 @ 1664      (512.0M)
4:     Apple_UNIX_SVR2 linux_swap                        2097152 @ 1050240   (  1.0G)
5:     Apple_UNIX_SVR2 linux_lvm                       241051200 @ 3147392   (114.9G)
6:          Apple_Boot eXternal booter                    262144 @ 244198592 (128.0M)
7:          Apple_RAID Apple_RAID_OfflineV2_Untitled_2 243936432 @ 244460736 (116.3G)

Device block size=512, Number of Blocks=488397168 (232.9G)
DeviceType=0x0, DeviceId=0x0


Now obviously linux_boot, linux_swap, and linux_lvm are not  
_actually_ Apple_UNIX_SVR2, but that's the type stored in the  
partition map.  They also have partition map labels "linux_*", but  
they do *not* have ext3 volume labels.  In fact, linux_boot is one  
slice of a RAID1 of 3 drives, linux_swap is one slice of a RAID5 of 3  
drives, and linux_lvm is one slice of a RAID5 of 3 drives that  
altogether make an LVM PV.  If I examine each of those partitions
individually, I get a lot of other information that is totally  
unrelated to that stored in the partition table.  It would be nice to  
be able to change the type of linux_* from Apple_UNIX_SVR2 to  
Linux_RAID (Max length is 31 characters), and have my userspace tools  
be able to detect that and do useful things with it and the pmap label.


Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially  
drag on in court for _years_, then you have no business playing with  
the legal system at all.
   -- Rob Landley





* Re: Exporting which partitions to md-configure
  2006-01-31  2:05       ` H. Peter Anvin
@ 2006-02-06  1:46         ` Neil Brown
  2006-02-06  3:29           ` Kyle Moffett
  2006-02-07  2:47           ` [klibc] " H. Peter Anvin
  0 siblings, 2 replies; 24+ messages in thread
From: Neil Brown @ 2006-02-06  1:46 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: klibc list, Linux Kernel Mailing List, linux-raid

On Monday January 30, hpa@zytor.com wrote:
> Neil Brown wrote:
> > 
> > Well, grepping through fs/partitions/*.c, the 'flags' thing is set by
> >  efi.c, msdos.c sgi.c sun.c
> > 
> > Of these, efi compares something against PARTITION_LINUX_RAID_GUID,
> > and msdos.c, sgi.c and sun.c compare something against
> > LINUX_RAID_PARTITION.
> > 
> > The former would look like
> >   a19d880f-05fc-4d3b-a006-743f0f84911e
> > in sysfs (I think);
> > The latter would look like
> >   fd
> > (I suspect).
> > 
> > These are both easily recognisable with no real room for confusion.
> 
> Well, if we're going to have a generic facility it should make sense 
> across the board.  If all we're doing is supporting legacy usage we 
> might as well export a flag.
> 
> I guess we could have a single entry with a string of the form
> "efi:a19d880f-05fc-4d3b-a006-743f0f84911e" or "msdos:fd" etc -- it
> really doesn't make any difference to me, but it seems cleaner to have 
> two pieces of data in two different sysfs entries.

What constitutes 'a piece of data'?  A bit? a byte?

I would say that 
   msdos:fd
is one piece of data.  The 'fd' is useless without the 'msdos'.
The 'msdos' is, I guess, not completely useless without the fd.

I would lean towards the composite, but I wouldn't fight a separation.


> 
> > 
> > And if other partition styles wanted to add support for raid auto
> > detect, tell them "no". It is perfectly possible and even preferable
> > to live without autodetect.   We should support legacy usage (those
> > above) but should discourage any new usage.
> > 
> 
> Why is that, keeping in mind this will all be done in userspace?
> 

partition-type based autodetect is easily fooled.
If you take a pair of drives from a failed computer, plug them into a
similar computer for data recovery, and boot:  you might have two
different pairs of drives that both want to be 'md0'.  Which wins?

I believe there needs to be a clear, unambiguous causality path from
the kernel parameters, initramfs, or machine hardware that leads to the
arrays to be assembled and hence the filesystem to be mounted.  These
should identify the array by some reasonably unique identifier.  The
'minor number' stored in the raid superblock and used by
partition-based autodetect is not 'reasonably unique'.

There are many, many possibilities.  Some are:

 kernel parameter  md=0,/dev/hda,/dev/hdc

    'hda' and 'hdc' on 'this' machine are (I think) still unique
    identifiers of hardware, and so this can identify drives to assemble
    into an array.  Note that this would *not* be reliable with
          md=0,/dev/sda,/dev/sdb

 kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
    This could be interpreted by an initramfs script to run mdadm 
    to find and assemble the array with that uuid.  The uuid of 
    each array is reasonably unique.

 initramfs containing preconfigured /etc/mdadm.conf
    This mdadm.conf would contain the uuid's of the arrays to
    assemble.

 kernel hardware MAC address
    This could be mapped through DHCP to a host name.  The host name
    is then given to mdadm such that it finds and assembles the array
    with 'name' (only supported in version-1 superblocks) of
         $HOST-root
    or whatever.
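
To make the first option concrete, here is a minimal sketch of how an initramfs tool might parse the simple `md=N,dev,dev,...` form shown above. It handles only that form; the kernel's `md=` option accepts other variants that this sketch does not attempt to cover, and the function name is illustrative:

```python
def parse_md_param(arg):
    """Parse 'md=N,dev0,dev1,...' into (N, [dev0, dev1, ...])."""
    prefix = "md="
    if not arg.startswith(prefix):
        raise ValueError(f"not an md= parameter: {arg!r}")
    fields = arg[len(prefix):].split(",")
    minor = int(fields[0])                    # md device number
    devices = [f for f in fields[1:] if f]    # component device paths
    if not devices:
        raise ValueError(f"no component devices in {arg!r}")
    return minor, devices


print(parse_md_param("md=0,/dev/hda,/dev/hdc"))
# (0, ['/dev/hda', '/dev/hdc'])
```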


Just as there is a direct unambiguous causal path from something
present at early boot to the root filesystem that is mounted (and the
root filesystem specifies all other filesystems through fstab)
similarly there should be an unambiguous causal path from something
present at early boot to the array which holds the root filesystem -
and the root filesystem should describe all other arrays via
mdadm.conf

Does that make sense?

NeilBrown



* Re: Exporting which partitions to md-configure
  2006-02-06  1:46         ` Neil Brown
@ 2006-02-06  3:29           ` Kyle Moffett
  2006-02-07  2:47           ` [klibc] " H. Peter Anvin
  1 sibling, 0 replies; 24+ messages in thread
From: Kyle Moffett @ 2006-02-06  3:29 UTC (permalink / raw)
  To: Neil Brown
  Cc: H. Peter Anvin, klibc list, Linux Kernel Mailing List, linux-raid

On Feb 05, 2006, at 20:46, Neil Brown wrote:
> On Monday January 30, hpa@zytor.com wrote:
>> Well, if we're going to have a generic facility it should make  
>> sense across the board.  If all we're doing is supporting legacy  
>> usage we might as well export a flag.
>>
>> I guess we could have a single entry with a string of the form
>> "efi:a19d880f-05fc-4d3b-a006-743f0f84911e" or "msdos:fd" etc -- it
>> really doesn't make any difference to me, but it seems cleaner to  
>> have two pieces of data in two different sysfs entries.
>
> What constitutes 'a piece of data'?  A bit? a byte?
>
> I would say that
>    msdos:fd
> is one piece of data.  The 'fd' is useless without the 'msdos'. The  
> 'msdos' is, I guess, not completely useless without the fd.  I would
> lean towards the composite, but I wouldn't fight a separation.

I think this boundary is blurred by the fact that several partition  
tables allow mostly-arbitrary partition type strings.  It would be  
convenient to not have to worry about the prefix in that case.  You  
would need just the partition type in the parent device anyways, so  
why munge it into the partition label too?

>>> And if other partition styles wanted to add support for raid auto
>>> detect, tell them "no". It is perfectly possible and even preferable
>>> to live without autodetect.   We should support legacy usage (those
>>> above) but should discourage any new usage.
>>
>> Why is that, keeping in mind this will all be done in userspace?
>
> partition-type based autodetect is easily fooled.  If you take a  
> pair of drives from a failed computer, plug them into a similar  
> computer for data recovery, and boot:  you might have two different  
> pairs of drives that both want to be 'md0'.  Which wins?

Nonono, not _just_ partition-type based autodetect, but a more  
complicated solution done _completely_ in userspace.  Essentially, by  
exporting this data you would merely be providing _extra_ pieces of  
data that could be verified on boot.  If I know that my boot RAID  
volumes for my desktop always have a partition table type string of  
"Linux_RAID_<unique-id>", then I can _also_ verify that in my  
initramdisk.  This isn't as useful on x86, but that's no reason to  
prevent it from being used on archs that do allow 31+ character  
strings for partition types.

> I believe there needs to be a clear, unambiguous causality path
> from the kernel parameters, initramfs, or machine hardware that
> leads to the arrays to be assembled and hence the filesystem to be  
> mounted.

This is one way of doing that on systems with Mac partition
tables.  The autoprobing is mostly useless on x86 hardware due to the  
limited range of partition types, but that's x86's problem.

Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially  
drag on in court for _years_, then you have no business playing with  
the legal system at all.
   -- Rob Landley





* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-06  1:46         ` Neil Brown
  2006-02-06  3:29           ` Kyle Moffett
@ 2006-02-07  2:47           ` H. Peter Anvin
  2006-02-07  9:03             ` Neil Brown
  2006-02-07 10:43             ` Luca Berra
  1 sibling, 2 replies; 24+ messages in thread
From: H. Peter Anvin @ 2006-02-07  2:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, klibc list, Linux Kernel Mailing List

Neil Brown wrote:
> 
> What constitutes 'a piece of data'?  A bit? a byte?
> 
> I would say that 
>    msdos:fd
> is one piece of data.  The 'fd' is useless without the 'msdos'.
> The 'msdos' is, I guess, not completely useless without the fd.
> 
> I would lean towards the composite, but I wouldn't fight a separation.
> 

Well, the two pieces come from different sources.

> 
> Just as there is a direct unambiguous causal path from something
> present at early boot to the root filesystem that is mounted (and the
> root filesystem specifies all other filesystems through fstab)
> similarly there should be an unambiguous causal path from something
> present at early boot to the array which holds the root filesystem -
> and the root filesystem should describe all other arrays via
> mdadm.conf
> 
> Does that make sense?
> 

It makes sense, but I disagree.  I believe you are correct in that the 
current "preferred minor" bit causes an invalid assumption that, e.g. 
/dev/md3 is always a certain thing, but since each array has a UUID, and 
one should be able to mount by either filesystem UUID or array UUID, 
there should be no need for such a conflict if one allows for dynamic md 
numbers.

Requiring that mdadm.conf describes the actual state of all volumes 
would be an enormous step in the wrong direction.  Right now, the Linux 
md system can handle some very oddball hardware changes (such as on 
hera.kernel.org, when the disks not just completely changed names due to 
a controller change, but changed from hd* to sd*!)

Dynamicity is a good thing, although it needs to be harnessed.

 > kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
 >    This could be interpreted by an initramfs script to run mdadm
 >    to find and assemble the array with that uuid.  The uuid of
 >    each array is reasonably unique.

This, in fact, is *EXACTLY* what we're talking about; it does require 
autoassemble.  Why do we care about the partition types at all?  The 
reason is that since the md superblock is at the end, it doesn't get 
automatically wiped if the partition is used as a raw filesystem, and so 
it's important that there is a qualifier for it.
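
A minimal sketch of the initramfs side of the md_root_uuid idea quoted above, assuming the parameter appears on the kernel command line as a plain key=value word. The function names and the /dev/md0 default are hypothetical, and the exact combination of mdadm options should be checked against mdadm(8) before use; the sketch only builds the argument list and does not run anything.

```python
def md_root_uuid(cmdline):
    """Return the value of md_root_uuid= from a kernel command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "md_root_uuid" and sep:
            return value
    return None


def assemble_command(uuid, md_device="/dev/md0"):
    # --assemble, --scan and --uuid are existing mdadm options; combining
    # them this way is an assumption made for illustration only.
    return ["mdadm", "--assemble", "--scan", md_device, "--uuid=" + uuid]


if __name__ == "__main__":
    # On a real system the input would come from /proc/cmdline.
    sample = "root=/dev/md0 md_root_uuid=xxyy:zzyy:aabb:ccdd ro"
    uuid = md_root_uuid(sample)
    if uuid is not None:
        print(" ".join(assemble_command(uuid)))
```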

	-hpa


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07  2:47           ` [klibc] " H. Peter Anvin
@ 2006-02-07  9:03             ` Neil Brown
  2006-02-07 10:43             ` Luca Berra
  1 sibling, 0 replies; 24+ messages in thread
From: Neil Brown @ 2006-02-07  9:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-raid, klibc list, Linux Kernel Mailing List

On Monday February 6, hpa@zytor.com wrote:
> Neil Brown wrote:
> > 
> > Just as there is a direct unambiguous causal path from something
> > present at early boot to the root filesystem that is mounted (and the
> > root filesystem specifies all other filesystems through fstab)
> > similarly there should be an unambiguous causal path from something
> > present at early boot to the array which holds the root filesystem -
> > and the root filesystem should describe all other arrays via
> > mdadm.conf
> > 
> > Does that make sense?
> > 
> 
> It makes sense, but I disagree.  I believe you are correct in that the 
> current "preferred minor" bit causes an invalid assumption that, e.g. 
> /dev/md3 is always a certain thing, but since each array has a UUID, and 
> one should be able to mount by either filesystem UUID or array UUID, 
> there should be no need for such a conflict if one allows for dynamic md 
> numbers.
> 
> Requiring that mdadm.conf describes the actual state of all volumes 
> would be an enormous step in the wrong direction.  Right now, the Linux 
> md system can handle some very oddball hardware changes (such as on 
> hera.kernel.org, when the disks not just completely changed names due to 
> a controller change, but changed from hd* to sd*!)

mdadm.conf doesn't need to, and normally shouldn't, list the devices
that compose an array (though it can if you want it to).

A typical mdadm.conf should look something like:

   DEVICE /dev/hd* /dev/sd*
   ARRAY /dev/md0 UUID=some:long:uuid
   ARRAY /dev/md1 UUID=some:other:long:uuid

So I think we are actually in agreement.

> 
> Dynamicity is a good thing, although it needs to be harnessed.
> 
>  > kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
>  >    This could be interpreted by an initramfs script to run mdadm
>  >    to find and assemble the array with that uuid.  The uuid of
>  >    each array is reasonably unique.
> 
> This, in fact is *EXACTLY* what we're talking about; it does require 
> autoassemble.  Why do we care about the partition types at all?  The 
> reason is that since the md superblock is at the end, it doesn't get 
> automatically wiped if the partition is used as a raw filesystem, and so 
> it's important that there is a qualifier for it.

Maybe I should be explicit about what I am against.
I am against the practice of choosing devices to assemble into arrays
based simply on a partition type - and assembling them into whatever
arrays they appear to comprise.

A device should not be able to say "pick me, pick me!".  Something
*outside* the array should say "pick all devices matching X", where X
is some arbitrary predicate, that could involve partition type
information if you like, but importantly should be precise enough not
to choose wrongly in any but very exceptional circumstances.

I am *not* against 'autoassemble' in the sense that some process hunts
through available devices trying to find the components for a given md
array: It was primarily to achieve this that I wrote mdadm.  I just
want the 'autoassemble' to be driven by some external description of
the array(s) - e.g. uuid.
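
The distinction can be sketched roughly as follows, assuming each candidate device has already been probed for its md superblock UUID and partition type. The Candidate record, its field names and both helper functions are hypothetical and are not part of mdadm; the point is only that the predicate comes from outside the devices.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    path: str
    array_uuid: Optional[str]  # UUID from the md superblock, None if absent
    part_type: Optional[str]   # e.g. "msdos:fd", None for a whole disk


def select_components(candidates: Iterable[Candidate],
                      predicate: Callable[[Candidate], bool]) -> List[str]:
    """Return the paths of all candidates accepted by an external predicate."""
    return [c.path for c in candidates if predicate(c)]


def by_uuid(wanted: str) -> Callable[[Candidate], bool]:
    # The description of the array (here its UUID) is supplied by the caller,
    # not volunteered by the device.
    return lambda c: c.array_uuid == wanted


if __name__ == "__main__":
    found = [
        Candidate("/dev/sda1", "xxyy:zzyy:aabb:ccdd", "msdos:fd"),
        Candidate("/dev/sdb", "xxyy:zzyy:aabb:ccdd", None),
        Candidate("/dev/sdc1", "1111:2222:3333:4444", "msdos:fd"),
    ]
    print(select_components(found, by_uuid("xxyy:zzyy:aabb:ccdd")))
```

Selecting on partition type alone would pick /dev/sdc1 as well and miss the unpartitioned /dev/sdb; selecting on UUID does neither.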

I don't accept your argument that partition types are of interest
because array components could still have their superblock after being
retargeted.  This is because
  - running "mdadm --zero-superblock" is as easy as changing the
    partition type, and equally, both are easy to forget to do.
  - If you have retargeted devices in an array, you presumably don't
    put the UUID of that array anywhere that would encourage mdadm to
    assemble it.  So the fact that the UUID won't be recognised is
    just as good at stopping the array from being assembled as the
    fact that the partition type has been changed.

This doesn't mean I am violently against partition types (and for
legacy support, we need to use them).  I just don't see a lot of value
in using them.

NeilBrown


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07  2:47           ` [klibc] " H. Peter Anvin
  2006-02-07  9:03             ` Neil Brown
@ 2006-02-07 10:43             ` Luca Berra
  2006-02-07 15:46               ` H. Peter Anvin
  1 sibling, 1 reply; 24+ messages in thread
From: Luca Berra @ 2006-02-07 10:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Mon, Feb 06, 2006 at 06:47:54PM -0800, H. Peter Anvin wrote:
>Neil Brown wrote:
>Requiring that mdadm.conf describes the actual state of all volumes 
>would be an enormous step in the wrong direction.  Right now, the Linux 
>md system can handle some very oddball hardware changes (such as on 
>hera.kernel.org, when the disks not just completely changed names due to 
>a controller change, but changed from hd* to sd*!)
DEVICE partitions
ARRAY /dev/md0 UUID=xxyy:zzyy:aabb:ccdd

would catch that
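
As a rough illustration of why this survives a rename from hd* to sd*: "DEVICE partitions" makes mdadm take its candidates from /proc/partitions rather than from fixed names. The sketch below only shows the kind of parsing involved, assuming the usual four-column layout of that file; the function name is hypothetical and this is not mdadm's own code.

```python
def partition_names(proc_partitions_text):
    """Return /dev paths for every entry in /proc/partitions-style text."""
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        # The header row and the blank line after it are skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        names.append("/dev/" + fields[3])
    return names


if __name__ == "__main__":
    sample = (
        "major minor  #blocks  name\n"
        "\n"
        "   8     0   78150744 sda\n"
        "   8     1   78148161 sda1\n"
        "   3     0   39082680 hda\n"
    )
    print(partition_names(sample))
```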


>Dynamicity is a good thing, although it needs to be harnessed.
>
> > kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
> >    This could be interpreted by an initramfs script to run mdadm
> >    to find and assemble the array with that uuid.  The uuid of
> >    each array is reasonably unique.
I could change mdassemble to accept a uuid on the command line
and assemble a /dev/md0 with the specified uuid (at the moment it only
accepts a configuration file, which i thought was enough for
initrd/initramfs).

>This, in fact is *EXACTLY* what we're talking about; it does require 
>autoassemble.  Why do we care about the partition types at all?  The 
>reason is that since the md superblock is at the end, it doesn't get 
>automatically wiped if the partition is used as a raw filesystem, and so 
>it's important that there is a qualifier for it.
I don't like using partition type as a qualifier, there are people who do
not wish to partition their drives, there are systems not supporting
msdos-like partitions, heck even m$ is migrating away from those.

In any case if that has to be done it should be done in mdadm, not
in a different script that is going to call mdadm (behaviour should be
consistent between mdadm invoked by initramfs and mdadm invoked on a
running system).

If the user wants to reutilize a device that was previously a member of
an md array he/she should use mdadm --zero-superblock to remove the
superblock.
I see no point in having a system that tries to compensate for users not
following correct procedures. sorry.

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07 10:43             ` Luca Berra
@ 2006-02-07 15:46               ` H. Peter Anvin
  2006-02-07 16:47                 ` Luca Berra
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2006-02-07 15:46 UTC (permalink / raw)
  To: Luca Berra; +Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

Luca Berra wrote:
> 
>> This, in fact is *EXACTLY* what we're talking about; it does require 
>> autoassemble.  Why do we care about the partition types at all?  The 
>> reason is that since the md superblock is at the end, it doesn't get 
>> automatically wiped if the partition is used as a raw filesystem, and 
>> so it's important that there is a qualifier for it.
> 
> I don't like using partition type as a qualifier, there are people who do
> not wish to partition their drives, there are systems not supporting
> msdos like partitions, heck even m$ is migrating away from those.
> 

That's why we're talking about non-msdos partitioning schemes.

> In any case if that has to be done it should be done in mdadm, not
> in a different script that is going to call mdadm (behaviour should be
> consistent between mdadm invoked by initramfs and mdadm invoked on a
> running system).

Agreed.  mdadm is the best place for it.

> If the user wants to reutilize a device that was previously a member of
> an md array he/she should use mdadm --zero-superblock to remove the
> superblock.
> I see no point in having a system that tries to compensate for users not
> following correct procedures. sorry.

You don't?  That surprises me... making it harder for the user to have 
accidental data loss sounds like a very good thing to me.
	
	-hpa


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07 15:46               ` H. Peter Anvin
@ 2006-02-07 16:47                 ` Luca Berra
  2006-02-07 16:55                   ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Luca Berra @ 2006-02-07 16:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Tue, Feb 07, 2006 at 07:46:59AM -0800, H. Peter Anvin wrote:
>Luca Berra wrote:
>>
>>I don't like using partition type as a qualifier, there are people who do
>>not wish to partition their drives, there are systems not supporting
>>msdos like partitions, heck even m$ is migrating away from those.
>>
>
>That's why we're talking about non-msdos partitioning schemes.

this still leaves whole disks

>>If the user wants to reutilize a device that was previously a member of
>>an md array he/she should use mdadm --zero-superblock to remove the
>>superblock.
>>I see no point in having a system that tries to compensate for users not
>>following correct procedures. sorry.
>
>You don't?  That surprises me... making it harder for the user to have 
>accidental data loss sounds like a very good thing to me.

making it harder for the user is a good thing, but please not at the
expense of usability

the only way i see a user can have data loss is if
- an md array is stopped
- two different filesystems are created on the component devices
- these filesystems are filled with data, but not to the point of
  damaging the superblock
- then the array is started again.

if only one device is removed using mdadm the event counter would
prevent the array from being assembled again.
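
A simplified sketch of the event-counter idea, assuming each member's superblock carries an event count that is bumped whenever the array state changes. The rule used here (keep only members holding the highest count) is an illustration and not the md driver's actual policy, which has more cases; the function name is hypothetical.

```python
def fresh_members(events):
    """Given {device: event_count}, return the devices with the newest count.

    A member removed earlier stops receiving updates, so its count falls
    behind and it is treated as stale.
    """
    if not events:
        return []
    newest = max(events.values())
    return sorted(dev for dev, count in events.items() if count == newest)


if __name__ == "__main__":
    counts = {"/dev/sda1": 42, "/dev/sdb1": 42, "/dev/sdc1": 17}
    print(fresh_members(counts))
```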

there are a lot of easier ways for shooting yourself in the feet :)

if we really want to be paranoid we should modify mkXXXfs to refuse
creating a filesystem if the device has an md superblock on it. (lvm2
tools are already able to ignore devices with md superblocks on them,
no clue about EVMS)
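
The check a mkfs-style tool would need can be sketched like this, assuming only the version 0.90 superblock format: it sits in the last 64 KiB-aligned 64 KiB block of the device and starts with the magic number 0xa92b4efc in the byte order of the host that wrote it. Version 1 superblocks live elsewhere and are not covered. The function names are hypothetical.

```python
import os
import struct

MD_SB_MAGIC = 0xA92B4EFC
MD_RESERVED_BYTES = 64 * 1024


def sb_offset_090(device_size):
    """Byte offset of a 0.90 superblock on a device of the given size."""
    return (device_size & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def has_md_superblock_090(path):
    """Return True if the file or block device carries the 0.90 magic."""
    with open(path, "rb") as dev:
        size = dev.seek(0, os.SEEK_END)
        offset = sb_offset_090(size)
        if offset < 0:
            return False  # too small to hold a superblock at all
        dev.seek(offset)
        raw = dev.read(4)
    if len(raw) < 4:
        return False
    # 0.90 superblocks are host-endian, so accept either byte order.
    little = struct.unpack("<I", raw)[0]
    big = struct.unpack(">I", raw)[0]
    return MD_SB_MAGIC in (little, big)
```

A tool could call has_md_superblock_090() before formatting and refuse, or ask for confirmation, when it returns True.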

L.
-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07 16:47                 ` Luca Berra
@ 2006-02-07 16:55                   ` H. Peter Anvin
  2006-02-07 17:03                     ` Luca Berra
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2006-02-07 16:55 UTC (permalink / raw)
  To: Luca Berra; +Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

Luca Berra wrote:
> 
> making it harder for the user is a good thing, but please not at the
> expense of usability
> 

What's the usability problem?

	-hpa


* Re: [klibc] Re: Exporting which partitions to md-configure
  2006-02-07 16:55                   ` H. Peter Anvin
@ 2006-02-07 17:03                     ` Luca Berra
  0 siblings, 0 replies; 24+ messages in thread
From: Luca Berra @ 2006-02-07 17:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Neil Brown, linux-raid, klibc list, Linux Kernel Mailing List

On Tue, Feb 07, 2006 at 08:55:21AM -0800, H. Peter Anvin wrote:
>Luca Berra wrote:
>>
>>making it harder for the user is a good thing, but please not at the
>>expense of usability
>>
>
>What's the usability problem?
>
if we fail to support all partitioning schemes and we do not support
non-partitioned devices.

if we manage to support all this without too much code bloat i'll shut
up.

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
