* RE: Proposed enhancements to MD
@ 2004-01-13 19:59 Cress, Andrew R
  0 siblings, 0 replies; 57+ messages in thread
From: Cress, Andrew R @ 2004-01-13 19:59 UTC (permalink / raw)
  To: Jeff Garzik, mutex; +Cc: linux-kernel

That discussion was mostly in Nov & Dec 2002.
The Subject line was "RFC - new raid superblock layout for md driver".

Andy

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Jeff Garzik
Sent: Tuesday, January 13, 2004 2:43 PM
To: mutex
Cc: Scott Long; linux-kernel@vger.kernel.org
Subject: Re: Proposed enhancements to MD


mutex wrote:
> On Tue, Jan 13, 2004 at 02:05:55PM -0500 or thereabouts, Jeff Garzik wrote:
> 
>>>How about an endian-safe superblock?  Seriously, is that a 'bug' or a
>>>'feature'?  Or do people just not care.
>>
>>
>>There was a thread discussing md's new superblock design, did you
>>research/follow that?  neilb was actively soliciting comments and there
>>was an amount of discussion.
>>
> 
> 
> hmm I don't remember that... was it on lkml or the raid development
> list ? Can you give me a string/date to search around ?


Other than "neil brown md superblock" don't recall.  In the past year or

two :)  There were patches, so it wasn't just discussion.

	Jeff




* Re: Proposed enhancements to MD
  2004-01-16 14:11               ` Matt Domsch
@ 2004-01-16 14:13               ` Christoph Hellwig
  -1 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2004-01-16 14:13 UTC (permalink / raw)
  To: Matt Domsch
  Cc: Lars Marowsky-Bree, Neil Brown, Scott Long, linux-kernel, linux-raid

On Fri, Jan 16, 2004 at 08:11:07AM -0600, Matt Domsch wrote:
> www.snia.org in the DDF TWG section, but requires you be a member of SNIA 
> to see at present.  The DDF chairperson is trying to make the draft 
> publicly available, and if/when I see that happen I'll post a link to it 
> here.

Oops.  That's not a good sign.  /me tries to remember a sane spec coming
from SNIA and fails..



* Re: Proposed enhancements to MD
  2004-01-16 14:06           ` Christoph Hellwig
@ 2004-01-16 14:11               ` Matt Domsch
  0 siblings, 0 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-16 14:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Lars Marowsky-Bree, Neil Brown, Scott Long, linux-kernel, linux-raid

On Fri, 16 Jan 2004, Christoph Hellwig wrote:
> On Fri, Jan 16, 2004 at 02:56:46PM +0100, Lars Marowsky-Bree wrote:
> > If it encodes the bus/id/lun, I can foresee bad effects if the device
> > enumeration changes because the HBAs get swapped in their slots ;-)

I believe it's just supposed to be a hint to the firmware that the drive 
has roamed from one physical slot to another.
 
> A bus/id/lun enumeration is completely bogus.  Think (S)ATA, FC or
> iSCSI.
> 
> So is there a pointer to the current version of the spec?  Just reading
> these multi-path enumerations starts to give me the feeling this spec
> is designed rather badly..

www.snia.org in the DDF TWG section, but requires you be a member of SNIA 
to see at present.  The DDF chairperson is trying to make the draft 
publicly available, and if/when I see that happen I'll post a link to it 
here.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

* Re: Proposed enhancements to MD
  2004-01-16 13:56         ` Lars Marowsky-Bree
@ 2004-01-16 14:06           ` Christoph Hellwig
  2004-01-16 14:11               ` Matt Domsch
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2004-01-16 14:06 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Matt Domsch, Neil Brown, Scott Long, linux-kernel, linux-raid

On Fri, Jan 16, 2004 at 02:56:46PM +0100, Lars Marowsky-Bree wrote:
> If it encodes the bus/id/lun, I can foresee bad effects if the device
> enumeration changes because the HBAs get swapped in their slots ;-)

A bus/id/lun enumeration is completely bogus.  Think (S)ATA, FC or
iSCSI.

So is there a pointer to the current version of the spec?  Just reading
these multi-path enumerations starts to give me the feeling this spec
is designed rather badly..


* Re: Proposed enhancements to MD
  2004-01-16 13:43       ` Matt Domsch
@ 2004-01-16 13:56         ` Lars Marowsky-Bree
  2004-01-16 14:06           ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Lars Marowsky-Bree @ 2004-01-16 13:56 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Neil Brown, Scott Long, linux-kernel, linux-raid

On 2004-01-16T07:43:36,
   Matt Domsch <Matt_Domsch@dell.com> said:

> > Do you know whether DDF can also support simple multipathing?
> Yes, the structure info for each physical disk allows for two (and
> only 2) paths to be represented.  But it's pretty limited, with only
> SCSI-like bus/id/lun paths described in the current draft.  At the
> same time, there's a per-physical-disk GUID, such that if you find
> the same disk by multiple paths you can tell.  There's room for
> enhancement/feedback in this space, for certain.

One would guess that for multipath, a mere media UUID would be entirely
sufficient; one can simply scan for where it is found.

If it encodes the bus/id/lun, I can foresee bad effects if the device
enumeration changes because the HBAs get swapped in their slots ;-)


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


* Re: Proposed enhancements to MD
  2004-01-16  9:24       ` Lars Marowsky-Bree
@ 2004-01-16 13:43       ` Matt Domsch
  2004-01-16 13:56         ` Lars Marowsky-Bree
  -1 siblings, 1 reply; 57+ messages in thread
From: Matt Domsch @ 2004-01-16 13:43 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Neil Brown, Scott Long, linux-kernel, linux-raid

On Fri, Jan 16, 2004 at 10:24:47AM +0100, Lars Marowsky-Bree wrote:
> Do you know whether DDF can also support simple multipathing?

Yes, the structure info for each physical disk allows for two (and
only 2) paths to be represented.  But it's pretty limited, with only
SCSI-like bus/id/lun paths described in the current draft.  At the
same time, there's a per-physical-disk GUID, such that if you find
the same disk by multiple paths you can tell.  There's room for
enhancement/feedback in this space, for certain.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com


* Re: Proposed enhancements to MD
  2004-01-16  9:31       ` Lars Marowsky-Bree
@ 2004-01-16  9:57       ` Arjan van de Ven
  -1 siblings, 0 replies; 57+ messages in thread
From: Arjan van de Ven @ 2004-01-16  9:57 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Matt Domsch, Jeff Garzik, Scott Long, Linux Kernel, linux-raid,
	Neil Brown


On Fri, 2004-01-16 at 10:31, Lars Marowsky-Bree wrote:
> On 2004-01-13T13:41:07,
>    Matt Domsch <Matt_Domsch@dell.com> said:
> 
> > > You sorta hit a bad time for 2.4 development.  Even though my employer 
> > > (Red Hat), Adaptec, and many others must continue to support new 
> > > products on 2.4.x kernels,
> > Indeed, enterprise class products based on 2.4.x kernels will need
> > some form of solution here too.
> 
> Yes, namely not supporting this feature and moving onwards to 2.6 in
> their next release ;-)

hear hear




* Re: Proposed enhancements to MD
  2004-01-13 19:41   ` Matt Domsch
@ 2004-01-16  9:31       ` Lars Marowsky-Bree
  1 sibling, 0 replies; 57+ messages in thread
From: Lars Marowsky-Bree @ 2004-01-16  9:31 UTC (permalink / raw)
  To: Matt Domsch, Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown

On 2004-01-13T13:41:07,
   Matt Domsch <Matt_Domsch@dell.com> said:

> > You sorta hit a bad time for 2.4 development.  Even though my employer 
> > (Red Hat), Adaptec, and many others must continue to support new 
> > products on 2.4.x kernels,
> Indeed, enterprise class products based on 2.4.x kernels will need
> some form of solution here too.

Yes, namely not supporting this feature and moving onwards to 2.6 in
their next release ;-)


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett

* Re: Proposed Enhancements to MD
  2004-01-13 18:03   ` Scott Long
@ 2004-01-16  9:29     ` Lars Marowsky-Bree
  0 siblings, 0 replies; 57+ messages in thread
From: Lars Marowsky-Bree @ 2004-01-16  9:29 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-raid

On 2004-01-13T11:03:40,
   Scott Long <scott_long@adaptec.com> said:

> The biggest issue here is that a real fdisk table needs to exist on the
> array in order for our BIOS to recognise it as a boot device.

Hm, ok.

> >Yes. Is anything missing from the 2.6 & hotplug & udev solution which
> >you require?
> 
> I'll admit that I'm not as familiar with 2.6 as I should be.  Does a
> disk arrival mechanism already exist?

Yes. hotplug already will get you events when new disks arrive.
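
For illustration, a minimal sketch of a 2.6 hotplug agent reacting to
such events; the mdadm invocation and file locations are assumptions,
not something prescribed in this thread:

#!/bin/sh
# /etc/hotplug/block.agent -- the kernel runs /sbin/hotplug with the
# subsystem name as its argument and ACTION/DEVPATH in the environment
case "$ACTION" in
add)
    # a new disk appeared; try to assemble any arrays it completes
    /sbin/mdadm --assemble --scan
    ;;
remove)
    logger "block device $DEVPATH went away"
    ;;
esac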

> >In particular, I'm wondering whether partitions using the new activity
> >logging features of md will still be bootable, or whether the boot
> >partitions need to be 'md classic'.
> 
> Our products will only recognise and boot off of DDF arrays.  They have
> no concept of classic MD metadata.

OK. The question was meant differently. In 2.6, we have the ability to
log resyncs and journal updates (see the discussions on linux-raid). I
was just wondering whether DDF would allow this, or whether it is a
simple-minded "this disk good, that disk bad", and thus the boot drive
might not be able to use the new md features with the DDF metadata.

> This work was originally started on 2.4.  With the closing of 2.4 and
release of 2.6, we are porting our work forward.  It would be nice to
> integrate the changes into 2.4 also, but we recognise the need for 2.4
> to remain as stable as possible.

2.4 is dead and shouldn't see new features.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


* Re: Proposed enhancements to MD
  2004-01-15 21:52     ` Matt Domsch
@ 2004-01-16  9:24       ` Lars Marowsky-Bree
  -1 siblings, 0 replies; 57+ messages in thread
From: Lars Marowsky-Bree @ 2004-01-16  9:24 UTC (permalink / raw)
  To: Matt Domsch, Neil Brown; +Cc: Scott Long, linux-kernel, linux-raid

On 2004-01-15T15:52:21,
   Matt Domsch <Matt_Domsch@dell.com> said:

> * Solution works in both 2.4 and 2.6 kernels
>   - less ideal if two different solutions are needed

Sure, this is important.

> * RAID 0,1 DDF format
> * Bootable from degraded R1

We were looking at extending the boot loader (grub/lilo) to have
additional support for R1 & multipath. (i.e., booting from the first
drive/path in the set where a consistent image can be read.) If the BIOS
supports DDF too, this would get even better.

For the boot drive, this is highly desirable!

Do you know whether DDF can also support simple multipathing?

> * Boot from degraded RAID1 requires setup method early in boot
>   process, either initrd or kernel code.

This is needed with DDF too; we need to parse the DDF data somewhere
after all.

> From what I see about md:
> * RAID 0,1 there today, no DDF

Supporting additional metadata is desirable. For 2.6, this is already
in the code, and I am looking forward to having this feature.

> Am I way off base here? :-)

I don't think so. But for 2.6, the functionality should go either into
DM or MD, not into emd. I don't care which, really; both sides have good
arguments, none of which _really_ matters from a user perspective ;-)

(If, in 2.7 time, we rip out MD and fully integrate it all into DM, then
we can see further.)


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett

* Re: Proposed enhancements to MD
  2004-01-14 23:07 ` Neil Brown
@ 2004-01-15 21:52     ` Matt Domsch
  1 sibling, 0 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-15 21:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: Scott Long, linux-kernel, linux-raid

On Thu, Jan 15, 2004 at 10:07:34AM +1100, Neil Brown wrote:
> On Monday January 12, scott_long@adaptec.com wrote:
> > All,
> > 
> > Adaptec has been looking at the MD driver for a foundation for their
> > Open-Source software RAID stack.  This will help us provide full
> > and open support for current and future Adaptec RAID products (as
> > opposed to the limited support through closed drivers that we have
> > now).
> 
> Sounds like a great idea.
>
> > - Metadata abstraction:  We intend to support multiple on-disk metadata
> >    formats, along with the 'native MD' format.  To do this, specific
> >    knowledge of MD on-disk structures must be abstracted out of the core
> >    and personalities modules.
> 
> In 2.4, this would be a massive amount of work and I don't recommend
> it.

Scott has made a decent stab at this already in 2.4, and I've encouraged
him to post the code he's got now.  Since it's too intrusive for 2.4,
perhaps it could be added in parallel as an "emd" driver, and one could
choose to use emd to get the DDF functionality, or continue to use md
without DDF.

Here are some of the features I know I'm looking for, and I've
compared solutions suggested. Comments/corrections welcome.

* Solution works in both 2.4 and 2.6 kernels
  - less ideal if two different solutions are needed
* RAID 0,1 DDF format
* Bootable from degraded R1
* Online Rebuild
* Mgmt tools/hooks
  - online create, delete, modify
* Event notification/logging
* Error Handling
* Installation - simple, i.e. without modifying distro installers
  significantly or at all; a driver-disk-only approach is ideal


From what I see about DM at present:
* RAID 0,1 possible, dm-raid1 module in Sistina CVS needs to get merged
* Boot drive - requires setup method early in boot process, either
  initrd or kernel code
* Boot from degraded RAID1 requires setup method early in boot
  process, either initrd or kernel code.
* Online Rebuild - dm-raid1 has this capability
* mgmt tools/hooks - DM today has a way to communicate the desired
  changes to the kernel. What remains is userspace tools that read and
  modify DDF metadata and call into these hooks.
* Event notification / logging - doesn't appear to exist in DM
* Error handling - unclear if/how DM handles this.  For instance, how
  is a disk failure on a dm-raid1 array handled?
* Installation - RHEL3 doesn't include DM yet, significant installer
  work necessary for several distros.


From what I see about md:
* RAID 0,1 there today, no DDF
* Boot drive - yes
* Boot from degraded RAID1 - possible but may require manual
  intervention depending on BIOS capabilities
* Online Rebuild - there today
* mgmt tools/hooks - mdadm there today
* Event notification / logging - mdadm there today (see the sketch
  after this list)
* Error handling - there today
* Installation - distro installer capable of this today
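
A hedged sketch of the monitoring item above; the mail address and the
handler program are placeholders, not part of Matt's setup:

# watch all arrays in the background; mail alerts and run a handler
/sbin/mdadm --monitor --scan --daemonise \
    --mail root@localhost --program /usr/local/sbin/md-event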


From what I see about emd:
* RAID 0,1 - code being developed by Adaptec today, DDF capable
* Boot drive - yes
* Boot from degraded RAID1 - possible without intervention due to
  Adaptec BIOS
* Online Rebuild - there today
* mgmt tools/hooks - mdadm there today, expect Adaptec to enhance mdadm to support DDF
* Event notification / logging - mdadm there today
* Error handling - there today
* Installation - could be done with only a driver disk which adds the
  emd module.

Am I way off base here? :-)

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

* Re: Proposed enhancements to MD
  2004-01-14 23:07 ` Neil Brown
@ 2004-01-15 11:10   ` Norman Schmidt
  2004-01-15 21:52     ` Matt Domsch
  1 sibling, 0 replies; 57+ messages in thread
From: Norman Schmidt @ 2004-01-15 11:10 UTC (permalink / raw)
  To: linux-raid

Neil Brown wrote:

> I'm beginning to think the best approach is to use a new major number
> (which will be dynamically allocated because Linus has forbidden new
> static allocations).  This should be fairly easy to do.

It seems that has changed:

http://www.lanana.org/docs/device-list/

Norman.

-- 

Norman Schmidt          Institut fuer Physikal. u. Theoret. Chemie
Dipl.-Chem.             Friedrich-Alexander-Universitaet
schmidt@naa.net         Erlangen-Nuernberg
                         IT-Systembetreuer Physikalische Chemie



* Re: Proposed enhancements to MD
       [not found]           ` <20040114222447.GL1594@srv-lnx2600.matchmail.com>
@ 2004-01-15  1:42             ` Jakob Oestergaard
  0 siblings, 0 replies; 57+ messages in thread
From: Jakob Oestergaard @ 2004-01-15  1:42 UTC (permalink / raw)
  To: Scott Long, linux-kernel

On Wed, Jan 14, 2004 at 02:24:47PM -0800, Mike Fedyk wrote:
...
> > "only" adding disks... How many people actually shrink stuff nowadays?
> > 
> 
> Going raid0 -> raid5 would shrink your filesystem.

Not if you add an extra disk  :)

> 
> > I'd say having hot-growth would solve 99% of the problems out there.
> > 
> 
> True.  Until now I didn't know I could resize my MD raid arrays!
> 
> Is it still true that you think it's a good idea to try to test the resizing
> code?  It's been around since 1999, so maybe it's a bit further than
> "experemental" now?

I haven't had much need for the program myself since shortly after I
wrote it, but maybe a handful or so of people have tested it and
reported results back to me (and that's since 1999!).

RedHat took the tool and shipped it with some changes. Don't know if
they have had feedback...

From the testing it has had, I wouldn't call it more than experimental.
As it turns out, it was "almost" correct from the beginning, and there
hasn't been much progress since then  :)

Now it's just lying on my site, rotting...   Mostly, I think the problem
is that the reconfiguration is not on-line.  It is not really useful to
do off-line reconfiguration. You need to make a full backup anyway - and
it is simply faster to just re-create the array and restore your data,
than to run the reconfiguration.  At least this holds true for most of
the cases I've heard of (except maybe the ones where users didn't back
up data first).

I think it's a pity that no one has taken the code and somehow
(userspace/kernel hybrid or pure kernel?) integrated it with the kernel
to make hot reconfiguration possible.

But I have not had the time to do so myself, and I cannot see myself
getting the time to do it in any foreseeable future.

I aired the idea with the EVMS folks about a year ago, and they liked the
idea but were too busy just getting EVMS into the kernel as it was,
making the necessary changes there...

I think most people agree that hot reconfiguration of RAID arrays would
be a cool feature.  It just seems that no one really has the time to do
it.   The logic as such should be fairly simple - raidreconf is maybe
not exactly 'trivial', but it's not rocket science either.  And if
nothing else, it's a skeleton that works (mostly)   :)

> 
> Has anyone tried to write a test suite for it?

Not that I know of.  But a certain commercial NAS vendor used the tool
in their products, so maybe they wrote a test suite, I don't know.

 / jakob



* Re: Proposed enhancements to MD
  2004-01-13  0:34 Proposed enhancements " Scott Long
                   ` (3 preceding siblings ...)
  2004-01-13 22:06 ` Arjan van de Ven
@ 2004-01-14 23:07 ` Neil Brown
  2004-01-15 11:10   ` Norman Schmidt
  2004-01-15 21:52     ` Matt Domsch
  4 siblings, 2 replies; 57+ messages in thread
From: Neil Brown @ 2004-01-14 23:07 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-kernel, linux-raid

On Monday January 12, scott_long@adaptec.com wrote:
> All,
> 
> Adaptec has been looking at the MD driver for a foundation for their
> Open-Source software RAID stack.  This will help us provide full
> and open support for current and future Adaptec RAID products (as
> opposed to the limited support through closed drivers that we have
> now).

Sounds like a great idea.

> 
> While MD is fairly functional and clean, there are a number of 
> enhancements to it that we have been working on for a while and would
> like to push out to the community for review and integration.  These
> include:

It would help if you said up-front if you were thinking of 2.4 or 2.6
or 2.7 or all of whatever.  I gather from subsequent emails in the
thread that you are thinking of 2.6 and hoping for 2.4.
It is definitely too late for any of this to go into kernel.org 2.4,
but some of it could live in an external patch set that people or
vendors can choose or not.

> 
> - partition support for md devices:  MD does not support the concept of
>    fdisk partitions; the only way to approximate this right now is by
>    creating multiple arrays on the same media.  Fixing this is required
>    for not only feature-completeness, but to allow our BIOS to recognise
>    the partitions on an array and properly boot them as it would boot a
>    normal disk.

Your attached patch is completely unacceptable as it breaks backwards
compatibility.  /dev/md1 (blockdev 9,1) changes from being the second
md array to being the first partition of the first md array.

I too would like to support partitions of md devices but there is no
really elegant way to do it.
I'm beginning to think the best approach is to use a new major number
(which will be dynamically allocated because Linus has forbidden new
static allocations).  This should be fairly easy to do.
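
As a sketch of what userspace would then do, assuming the driver
registers its dynamic major under the name "mdp" (the name and the
device nodes here are illustrative, not part of Neil's proposal):

# look up the dynamically allocated major and create device nodes
major=`grep -w mdp /proc/devices | awk '{print $1}'`
mknod /dev/md_d0   b $major 0    # whole array
mknod /dev/md_d0p1 b $major 1    # its first partition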

A reasonable alternative is to use DM.  As I understand it, DM can work
with any sort of metadata (as metadata is handled by user-space) so
this should work just fine.

Note that kernel-based autodetection is seriously a thing of the past.
As has been said already, it should be just as easy and much more
manageable to do autodetection in early user-space.  If it isn't, then
we need to improve the early user-space tools.
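
A minimal early-userspace fragment along those lines, assuming mdadm is
packed into the initrd (paths and device names are illustrative only):

#!/bin/sh
mount -t proc proc /proc
# probe superblocks on all disks, then assemble whatever was found
mdadm --examine --scan > /etc/mdadm.conf
mdadm --assemble --scan
mount /dev/md0 /newroot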

> 
> - generic device arrival notification mechanism:  This is needed to
>    support device hot-plug, and allow arrays to be automatically
>    configured regardless of when the md module is loaded or initialized.
>    RedHat EL3 has a scaled down version of this already, but it is
>    specific to MD and only works if MD is statically compiled into the
>    kernel.  A general mechanism will benefit MD as well as any other
>    storage system that wants hot-arrival notices.

This has largely been covered, but just to add or clarify slightly:

 This is not an md issue.  This is either a bus controller or
 userspace issue.
 2.6 has a "hotplug" infrastructure and each bus should report
 hotplug events to userspace.
 If they don't, they should be enhanced so they do.
 If they do, then userspace needs to be told what to do with these
 events, and when to assemble devices into arrays.

 
> 
> - RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
>    that spans a chunk boundary.  Modifications are needed so that it can
>    take a request and break it up into 1 or more per-disk requests.

In 2.4 it cannot, but arguably doesn't need to.  However I have a
fairly straight-forward patch which supports raid0 request splitting.
In 2.6, this should work properly already.

> 
> - Metadata abstraction:  We intend to support multiple on-disk metadata
>    formats, along with the 'native MD' format.  To do this, specific
>    knowledge of MD on-disk structures must be abstracted out of the core
>    and personalities modules.

In 2.4, this would be a massive amount of work and I don't recommend
it.
In 2.6, most of this is already done - the knowledge about superblock
format is very localised.  I would like to extend this so that a
loadable module can add a new format.  Patches welcome.

Note that the kernel does need to know about the format of the
superblock.
DM can manage without knowing, as its superblock is read-mostly and
the very few updates (for reconfiguration) are managed by userspace.
For raid1 and raid5 (which DM doesn't support), we need to update the
superblock on errors and I think that is best done in the kernel.


> 
> - DDF Metadata support: Future products will use the 'DDF' on-disk
>    metadata scheme.  These products will be bootable by the BIOS, but
>    must have DDF support in the OS.  This will plug into the abstraction
>    mentioned above.

I'm looking forward to seeing the specs for DDF (but isn't it pretty
dumb to develop a standard in a closed forum?).  If DDF turns out to
have real value I would be happy to have support for it in linux/md.

NeilBrown


* Re: Proposed enhancements to MD
       [not found]       ` <20040114194052.GK1594@srv-lnx2600.matchmail.com>
@ 2004-01-14 21:02         ` Jakob Oestergaard
       [not found]           ` <20040114222447.GL1594@srv-lnx2600.matchmail.com>
  0 siblings, 1 reply; 57+ messages in thread
From: Jakob Oestergaard @ 2004-01-14 21:02 UTC (permalink / raw)
  To: Scott Long, linux-kernel

On Wed, Jan 14, 2004 at 11:40:52AM -0800, Mike Fedyk wrote:
> On Wed, Jan 14, 2004 at 08:07:02PM +0100, Jakob Oestergaard wrote:
> > http://unthought.net/raidreconf/index.shtml
> > 
> > I know of one bug in it which will thoroughly smash user data beyond
> > recognition - it happens when you resize RAID-5 arrays on disks that are
> > not of equal size.  Should be easy to fix, if one tried  :)
> > 
> 
> Hmm, that's if the underlying blockdevs are of differing sizes, right?  I
> usually do my best to make the partitions the same size, so hopefully that
> won't hit for me.  (though, I don't need to resize any arrays right now)

Make backups anyway  :)

> 
> > If you want it in the kernel doing hot-resizing, you probably want to
> > add some sort of 'progress log' so that one can resume the
> > reconfiguration after a reboot - that should be doable, just isn't done
> > yet.
> 
> IIRC, most filesystems don't support hot shrinking if they support
> hot-resizing, so that would only help with adding a disk to an array.

"only" adding disks... How many people actually shrink stuff nowadays?

I'd say having hot-growth would solve 99% of the problems out there.

And I think that's at least a good part of the reason why so few FSes
can actually shrink.  Shrinking can be a much harder problem, though -
maybe that's part of the reason as well.

> 
> > Right now it's entirely a user-space tool and it is not integrated with
> > the MD code to make it do hot-reconfiguration - integrating it with DM
> > and MD would make it truely useful.
> 
> True, but an intermediate step would be to call parted for resizing to the
> exact size needed for a raid0 -> raid5 conversion for example.

Yep.

 / jakob



* Re: Proposed enhancements to MD
       [not found]   ` <20040113201058.GD1594@srv-lnx2600.matchmail.com>
@ 2004-01-14 19:07     ` Jakob Oestergaard
       [not found]       ` <20040114194052.GK1594@srv-lnx2600.matchmail.com>
  0 siblings, 1 reply; 57+ messages in thread
From: Jakob Oestergaard @ 2004-01-14 19:07 UTC (permalink / raw)
  To: Scott Long, linux-kernel

On Tue, Jan 13, 2004 at 12:10:58PM -0800, Mike Fedyk wrote:
> On Tue, Jan 13, 2004 at 05:26:36PM +0100, Jakob Oestergaard wrote:
> > The RAID conversion/resize code for userspace exists already, and it
> 
> That's news to me!
> 
> Where is the project that does this?

http://unthought.net/raidreconf/index.shtml

I know of one bug in it which will thoroughly smash user data beyond
recognition - it happens when you resize RAID-5 arrays on disks that are
not of equal size.  Should be easy to fix, if one tried  :)

If you want it in the kernel doing hot-resizing, you probably want to
add some sort of 'progress log' so that one can resume the
reconfiguration after a reboot - that should be doable, just isn't done
yet.

Right now it's entirely a user-space tool and it is not integrated with
the MD code to make it do hot-reconfiguration - integrating it with DM
and MD would make it truly useful.
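
For reference, raidreconf is driven by two raidtab-style files, the
layout as it is now and the layout as it should become; an invocation
looks roughly like this (paths are illustrative):

raidreconf -o /etc/raidtab.old -n /etc/raidtab.new -m /dev/md0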


 / jakob



* Re: Proposed enhancements to MD
  2004-01-14 16:16         ` Kevin Corry
@ 2004-01-14 16:53           ` Kevin P. Fleming
  0 siblings, 0 replies; 57+ messages in thread
From: Kevin P. Fleming @ 2004-01-14 16:53 UTC (permalink / raw)
  To: Kevin Corry; +Cc: Wakko Warner, Scott Long, linux-kernel

Kevin Corry wrote:

> I guess I simply don't understand the desire to partition MD devices when 
> putting LVM on top of MD provides *WAY* more flexibility. You can resize any 
> volume in your group, as well as add new disks or raid devices in the future 
> and expand existing volumes across those new devices. All of this is quite a 
> pain with just partitions.

In a nutshell: other OS compatibility. Not that I care, but they're 
trying to cater to the users that have both Linux and Windows (and other 
stuff) installed on a RAID-1 created by their BIOS RAID driver. In that 
situation, they can't use logical volumes for the other OS partitions, 
they've got to have an MSDOS partition table on top of the RAID device.

However, that does not mean this needs to be done in the kernel, they 
can easily use a (future) dm-partx that reads the partition table and 
tells DM what devices to make from the RAID device.



* Re: Proposed enhancements to MD
  2004-01-13 23:38       ` Wakko Warner
@ 2004-01-14 16:16         ` Kevin Corry
  2004-01-14 16:53           ` Kevin P. Fleming
  0 siblings, 1 reply; 57+ messages in thread
From: Kevin Corry @ 2004-01-14 16:16 UTC (permalink / raw)
  To: Wakko Warner; +Cc: Scott Long, linux-kernel

On Tuesday 13 January 2004 17:38, Wakko Warner wrote:
> > > As I've understood it, the configuration for DM is userspace and the
> > > kernel can't do any auto detection.  This would be a "put off" for me
> > > to use as a root filesystem.  Configurations like this (and lvm too
> > > last I looked at it) require an initrd or some other way of setting up
> > > the device.  Unfortunately this means that there's configs in 2
> > > locations (one not easily available,  if using initrd.  easily !=
> > > mounting via loop!)
> >
> > You can always do the following: use a mini root fs on the partition
> > where the kernel is located that does nothing but vgscan and friends and
> > then calls pivot_root. '/sbin/init' of the mini root fs looks like:
>
> What is the advantage of not putting the autodetector/setup in the kernel?

Because it can be incredibly complicated, bloated, and difficult to coordinate 
with the corresponding user-space tools.

> Not everyone is going to use this software (or am I wrong on that?) so that
> can be left as an option to compile in (or as a module if possible and if
> autodetection is not required).  How much work is it to maintain something
> like this in the kernel?

Enough to have had the idea shot down a year-and-a-half ago. EVMS did 
in-kernel volume discovery at one point, but the driver was enormous. Let's 
just say we finally "saw the light" and redesigned to do user-space 
discovery. Trust me, it works much better that way.

> I like the fact that MD can autodetect raids on boot when compiled in, I
> didn't like the fact it can't be partitioned.  That's the only thing that
> put me off with MD.  LVM put me off because it couldn't be auto detected at
> boot.  I was going to play with DM, but I haven't yet.

I guess I simply don't understand the desire to partition MD devices when 
putting LVM on top of MD provides *WAY* more flexibility. You can resize any 
volume in your group, as well as add new disks or raid devices in the future 
and expand existing volumes across those new devices. All of this is quite a 
pain with just partitions.
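
A short sketch of the LVM-over-MD stacking argued for here (device
names and sizes are assumptions):

pvcreate /dev/md0               # make the array an LVM physical volume
vgcreate vg0 /dev/md0           # build a volume group on it
lvcreate -L 10G -n home vg0     # carve out a volume
lvextend -L +5G /dev/vg0/home   # later: grow it instead of repartitioning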

And setting up an init-ramdisk to run the tools isn't that hard. EVMS even 
provides pre-built init-ramdisks with the EVMS tools which has worked for 
virtually all of our users who want their root filesystem on an EVMS volume. 
It really is fairly simple, and I run three of my own computers this way. If 
you'd like to give it a try, I'd be more than happy to help you out.

-- 
Kevin Corry
kevcorry@us.ibm.com
http://evms.sourceforge.net/



* Re: Proposed enhancements to MD
  2004-01-13 20:41   ` Scott Long
  2004-01-13 22:33       ` Jure Pečar
@ 2004-01-14 15:52     ` Kevin Corry
  1 sibling, 0 replies; 57+ messages in thread
From: Kevin Corry @ 2004-01-14 15:52 UTC (permalink / raw)
  To: Scott Long, Jeff Garzik; +Cc: Linux Kernel, linux-raid, Neil Brown

On Tuesday 13 January 2004 14:41, Scott Long wrote:
> A problem that we've encountered, though, is the following sequence:
>
> 1) md is initialized during boot
> 2) drives X Y and Z are probed during boot
> 3) root fs exists on array [X Y Z], but md didn't see them show up,
>     so it didn't auto-configure the array
>
> I'm not sure how this can be addressed by a userland daemon.  Remember
> that we are focused on providing RAID during boot; configuring a
> secondary array after boot is a much easier problem.

This can already be accomplished with an init-ramdisk (or initramfs in the 
future). These provide the ability to run user-space code before the real 
root filesystem is mounted.

> > I thought that raid0 was one of the few that actually did bio splitting
> > correctly?  Hum, maybe this is a 2.4-only issue.  Interesting, and
> > agreed, if so...
>
> This is definitely still a problem in 2.6.1

Device-Mapper does bio-splitting correctly, and already has a "stripe" module. 
It's pretty trivial to set up a raid0 device with DM.
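
For example, a two-disk stripe with a 64KB chunk can be set up roughly
like this (sizes are in 512-byte sectors; device names and lengths are
made up):

echo "0 4194304 striped 2 128 /dev/sda1 0 /dev/sdb1 0" | dmsetup create r0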

> As for the question of DM vs. MD, I think that you have to consider that
> DM right now has no concept of storing configuration data on the disk
> (at least that I can find, please correct me if I'm wrong).  I think
> that DM will make a good LVM-like layer on top of MD, but I don't see it
> replacing MD right now.

The DM core has no knowledge of any metadata, but that doesn't mean its 
sub-modules ("targets" in DM-speak) can't. Example, the dm-snapshot target 
has to record enough on-disk metadata for its snapshots to be persistent 
across reboots. Same with the persistent dm-mirror target that Joe Thornber 
and co. have been working on. You could certainly write a raid5 target that 
recorded parity and other state information on disk.
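
As an illustration of such a target, a mirror mapping in the table
syntax the (not yet merged) dm-mirror code uses, with an in-core dirty
log over 1024-sector regions; devices and sizes are assumptions:

echo "0 2097152 mirror core 2 1024 nosync 2 /dev/sda1 0 /dev/sdb1 0" \
    | dmsetup create m0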

The real key here is keeping the metadata that simply identifies the device 
separate from the metadata that keeps track of the device state. Using the 
snapshot example again, DM keeps a copy of the remapping table on disk, so an 
existing snapshot can be initialized when it's activated at boot-time. But 
this remapping table is completely separate from the metadata that identifies 
a device/volume as being a snapshot. In fact, EVMS and LVM2 have completely 
different ways of identifying snapshots (which is done in user-space), yet 
they both use the same kernel snapshot module.

-- 
Kevin Corry
kevcorry@us.ibm.com
http://evms.sourceforge.net/


* Re: Proposed enhancements to MD
  2004-01-13 23:09     ` Andreas Steinmetz
@ 2004-01-13 23:38       ` Wakko Warner
  2004-01-14 16:16         ` Kevin Corry
  0 siblings, 1 reply; 57+ messages in thread
From: Wakko Warner @ 2004-01-13 23:38 UTC (permalink / raw)
  To: Andreas Steinmetz; +Cc: Arjan van de Ven, Scott Long, linux-kernel

> > As I've understood it, the configuration for DM is userspace and the kernel
> > can't do any auto detection.  This would be a "put off" for me to use as a
> > root filesystem.  Configurations like this (and lvm too last I looked at it)
> > require an initrd or some other way of setting up the device.  Unfortunately
> > this means that there's configs in 2 locations (one not easily available,  if
> > using initrd.  easily != mounting via loop!)
> 
> You can always do the following: use a mini root fs on the partition 
> where the kernel is located that does nothing but vgscan and friends and 
> then calls pivot_root. '/sbin/init' of the mini root fs looks like:

What is the advantage of not putting the autodetector/setup in the kernel? 
Not everyone is going to use this software (or am I wrong on that?) so that
can be left as an option to compile in (or as a module if possible and if
autodetection is not required).  How much work is it to maintain something
like this in the kernel?

I ask because I'm not a kernel hacker, mostly an end user  (at least I can
compile my own kernels =)

I must say, the day that kernel-level IP configuration via BOOTP is
removed I'm going to be pissed =)

I like the fact that MD can autodetect raids on boot when compiled in, I
didn't like the fact it can't be partitioned.  That's the only thing that
put me off with MD.  LVM put me off because it couldn't be auto detected at
boot.  I was going to play with DM, but I haven't yet.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Proposed enhancements to MD
  2004-01-13 22:44   ` Wakko Warner
  2004-01-13 22:34     ` Arjan van de Ven
@ 2004-01-13 23:09     ` Andreas Steinmetz
  2004-01-13 23:38       ` Wakko Warner
  1 sibling, 1 reply; 57+ messages in thread
From: Andreas Steinmetz @ 2004-01-13 23:09 UTC (permalink / raw)
  To: Wakko Warner; +Cc: Arjan van de Ven, Scott Long, linux-kernel

Wakko Warner wrote:
> 
> As I've understood it, the configuration for DM is userspace and the kernel
> can't do any auto detection.  This would be a "put off" for me to use as a
> root filesystem.  Configurations like this (and lvm too last I looked at it)
> require an initrd or some other way of setting up the device.  Unfortunately
> this means that there's configs in 2 locations (one not easily available,  if
> using initrd.  easily != mounting via loop!)
> 

You can always do the following: use a mini root fs on the partition 
where the kernel is located that does nothing but vgscan and friends and 
then calls pivot_root. '/sbin/init' of the mini root fs looks like:


#!/bin/sh
case "$1" in
         -s|S|single|-a|auto)
                 opt=$1
         ;;
         -b|emergency)
                 export PATH=/bin:/sbin
                 /bin/mount /proc
                 /bin/loadkeys \
			/keymaps/i386/qwertz/de-latin1-nodeadkeys.map.gz
                 exec /bin/sh < /dev/console > /dev/console 2>&1
         ;;
esac
cd /
/bin/mount /proc
/bin/mount -o remount,rw,notail,noatime,nodiratime /
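# scan for LVM volume groups and activate them so the real root appears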
/sbin/vgscan > /dev/null
/sbin/vgchange -a y > /dev/null
/bin/mount -o remount,ro,notail,noatime,nodiratime /
/bin/mount /mnt
/bin/umount /proc
cd /mnt
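# swap roots: /mnt becomes / and the old root reappears under /boot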
/sbin/pivot_root . boot
exec /bin/chroot . /bin/sh -c \
	"/bin/umount /boot ; exec /sbin/init $opt" \
	< dev/console > dev/console 2>&1



And if you have partitions of the same size on other disks and fiddle a
bit with dd, you have perfectly working backups, including the boot
loader code of the master boot record, on the other disks.  No initrd
required.  As an add-on you have an on-disk rescue system.
-- 
Andreas Steinmetz



* Re: Proposed enhancements to MD
  2004-01-13 22:33       ` Jure Pečar
@ 2004-01-13 22:56       ` viro
  -1 siblings, 0 replies; 57+ messages in thread
From: viro @ 2004-01-13 22:56 UTC (permalink / raw)
  To: Jure Pečar; +Cc: Scott Long, jgarzik, linux-kernel, linux-raid, neilb

On Tue, Jan 13, 2004 at 11:33:20PM +0100, Jure Pečar wrote:
> Looking at this chicken-and-egg problem of booting from an array from
> administrator's point of view ...
> 
> What do you guys think about Intel's EFI? I think it would be the most
> apropriate place to put a piece of code that would scan the disks, assemble
> any arrays and present them to the OS as bootable devices ... If we're going
> to get a common metadata layout, that would be even easier.
> 
> Thoughts?

Why bother?  We can have userland code running before any device drivers
are initialized.  And have access to
	* all normal system calls
	* normal writable filesystem already present (ramfs)
	* normal multitasking
All of that - within the heavily tested codebase; regular kernel codepaths
that are used all the time by everything.  Oh, and it's portable.

What's the benefit of doing that from EFI?  Pure masochism?


* Re: Proposed enhancements to MD
  2004-01-13 22:33       ` Jure Pečar
@ 2004-01-13 22:44       ` Scott Long
  -1 siblings, 0 replies; 57+ messages in thread
From: Scott Long @ 2004-01-13 22:44 UTC (permalink / raw)
  To: Jure Pečar; +Cc: jgarzik, linux-kernel, linux-raid, neilb

Jure Pečar wrote:
> On Tue, 13 Jan 2004 13:41:07 -0700
> Scott Long <scott_long@adaptec.com> wrote:
> 
> 
>>A problem that we've encountered, though, is the following sequence:
>>
>>1) md is initialized during boot
>>2) drives X Y and Z are probed during boot
>>3) root fs exists on array [X Y Z], but md didn't see them show up,
>>    so it didn't auto-configure the array
>>
>>I'm not sure how this can be addressed by a userland daemon.  Remember
>>that we are focused on providing RAID during boot; configuring a
>>secondary array after boot is a much easier problem.
> 
> 
> Looking at this chicken-and-egg problem of booting from an array from
> administrator's point of view ...
> 
> What do you guys think about Intel's EFI? I think it would be the most
> appropriate place to put a piece of code that would scan the disks,
> assemble any arrays and present them to the OS as bootable devices ...
> If we're going to get a common metadata layout, that would be even
> easier.
> 
> Thoughts?
> 

The BIOS already scans the disks, assembles the arrays, finds the
boot sector, and presents the arrays to the loader/GRUB.  Are
you saying that EFI should be the interface by which the arrays are
communicated through, even after the kernel has booted?  Is this
possible right now?

Scott


* Re: Proposed enhancements to MD
  2004-01-13 22:06 ` Arjan van de Ven
@ 2004-01-13 22:44   ` Wakko Warner
  2004-01-13 22:34     ` Arjan van de Ven
  2004-01-13 23:09     ` Andreas Steinmetz
  0 siblings, 2 replies; 57+ messages in thread
From: Wakko Warner @ 2004-01-13 22:44 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Scott Long, linux-kernel

> > Adaptec has been looking at the MD driver for a foundation for their
> > Open-Source software RAID stack.
> 
> Hi,
> 
> Is there a (good) reason you didn't use Device Mapper for this? It
> really sounds like Device Mapper is the way to go to parse and use
> raid-like formats to the kernel, since it's designed to be independent
> of on disk formats, unlike MD.

As I've understood it, the configuration for DM is userspace and the kernel
can't do any auto detection.  This would be a "put off" for me to use as a
root filesystem.  Configurations like this (and lvm too last I looked at it)
require an initrd or some other way of setting up the device.  Unfortunately
this means that there's configs in 2 locations (one not easily available,  if
using initrd.  easily != mounting via loop!)

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals


* Re: Proposed enhancements to MD
  2004-01-13 18:44 ` Jeff Garzik
                     ` (2 preceding siblings ...)
  2004-01-13 20:41   ` Scott Long
@ 2004-01-13 22:42   ` Luca Berra
  3 siblings, 0 replies; 57+ messages in thread
From: Luca Berra @ 2004-01-13 22:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown

On Tue, Jan 13, 2004 at 01:44:05PM -0500, Jeff Garzik wrote:
>And I could have _sworn_ that Neil already posted a patch to do 
>partitions in md, but maybe my memory is playing tricks on me.
he did, and a long time ago also.
http://cgi.cse.unsw.edu.au/~neilb/patches/

>IMO, your post/effort all boils down to an open design question:  device 
>mapper or md, for doing stuff like vendor-raid1 or vendor-raid5?  And it 
>is even possible to share (for example) raid5 engine among all the 
>various vendor RAID5's?
I would believe the way to go is having md raid personalities turned
into device mapper targets.
The issue is that raid personalities need to be able to constantly
update the metadata, so a callback must be in place to communicate
`exceptions` to a layer that sits above device-mapper and handles
the metadata.

L.


-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: Proposed enhancements to MD
  2004-01-13 22:44   ` Wakko Warner
@ 2004-01-13 22:34     ` Arjan van de Ven
  2004-01-13 23:09     ` Andreas Steinmetz
  1 sibling, 0 replies; 57+ messages in thread
From: Arjan van de Ven @ 2004-01-13 22:34 UTC (permalink / raw)
  To: Wakko Warner; +Cc: Scott Long, linux-kernel


On Tue, Jan 13, 2004 at 05:44:22PM -0500, Wakko Warner wrote:
> > > Adaptec has been looking at the MD driver for a foundation for their
> > > Open-Source software RAID stack.
> > 
> > Hi,
> > 
> > Is there a (good) reason you didn't use Device Mapper for this? It
> > really sounds like Device Mapper is the way to go to parse and use
> > raid-like formats to the kernel, since it's designed to be independent
> > of on disk formats, unlike MD.
> 
> As I've understood it, the configuration for DM is userspace and the kernel
> can't do any auto detection.  This would be a "put off" for me to use as a
> root filesystem.  Configurations like this (and lvm too last I looked at it)
> require an initrd or some other way of setting up the device.  Unfortunately
> this means that there's configs in 2 locations (one not easily available,  if
> using initrd.  easily != mounting via loop!)

the kernel is moving in that direction fast, with initramfs etc etc...
It's not like the userspace autodetector needs configuration (although it
can have it of course)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 20:41   ` Scott Long
@ 2004-01-13 22:33       ` Jure Pečar
  2004-01-14 15:52     ` Kevin Corry
  1 sibling, 0 replies; 57+ messages in thread
From: Jure Pečar @ 2004-01-13 22:33 UTC (permalink / raw)
  To: Scott Long; +Cc: jgarzik, linux-kernel, linux-raid, neilb

On Tue, 13 Jan 2004 13:41:07 -0700
Scott Long <scott_long@adaptec.com> wrote:

> A problem that we've encountered, though, is the following sequence:
> 
> 1) md is initialized during boot
> 2) drives X Y and Z are probed during boot
> 3) root fs exists on array [X Y Z], but md didn't see them show up,
>     so it didn't auto-configure the array
> 
> I'm not sure how this can be addressed by a userland daemon.  Remember
> that we are focused on providing RAID during boot; configuring a
> secondary array after boot is a much easier problem.

Looking at this chicken-and-egg problem of booting from an array from an
administrator's point of view ...

What do you guys think about Intel's EFI? I think it would be the most
appropriate place to put a piece of code that would scan the disks, assemble
any arrays and present them to the OS as bootable devices ... If we're going
to get a common metadata layout, that would be even easier.

Thoughts?

-- 

Jure Pečar

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 17:13   ` Andreas Dilger
@ 2004-01-13 22:26     ` Andreas Dilger
  0 siblings, 0 replies; 57+ messages in thread
From: Andreas Dilger @ 2004-01-13 22:26 UTC (permalink / raw)
  To: Matt Domsch, Scott Long, linux-raid, linux-kernel

On Jan 13, 2004  10:13 -0700, Andreas Dilger wrote:
> So, why not use EVMS and/or Device Mapper to read the DDF metadata and
> set up the mappings that way?

PS - outgoing email was delayed, this has already been covered.  Sorry.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 19:41   ` Matt Domsch
@ 2004-01-13 22:10     ` Arjan van de Ven
  2004-01-16  9:31       ` Lars Marowsky-Bree
  1 sibling, 0 replies; 57+ messages in thread
From: Arjan van de Ven @ 2004-01-13 22:10 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Jeff Garzik, Scott Long, Linux Kernel, linux-raid, Neil Brown

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]


> Ideally in 2.6 one can use device mapper, but DM hasn't been
> incorporated into 2.4 stock, I know it's not in RHEL 3, and I don't
> believe it's included in SLES8.  Can anyone share thoughts on if a DDF
> solution were built on top of DM, that DM could be included in 2.4
> stock, RHEL3, or SLES8?  Otherwise, Adaptec will be stuck with two
> different solutions anyhow, one for 2.4 (they're proposing enhancing
> MD), and DM for 2.6.

Well it's either putting DM into 2.4 or forcing some sort of partitioned
MD into 2.4.  My strong preference would be DM in that case, since it's
already in 2.6 and is actually designed for the
multiple-superblock-formats case.



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13  0:34 Proposed enhancements " Scott Long
                   ` (2 preceding siblings ...)
  2004-01-13 18:44 ` Jeff Garzik
@ 2004-01-13 22:06 ` Arjan van de Ven
  2004-01-13 22:44   ` Wakko Warner
  2004-01-14 23:07 ` Neil Brown
  4 siblings, 1 reply; 57+ messages in thread
From: Arjan van de Ven @ 2004-01-13 22:06 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 453 bytes --]

On Tue, 2004-01-13 at 01:34, Scott Long wrote:
> All,
> 
> Adaptec has been looking at the MD driver for a foundation for their
> Open-Source software RAID stack.

Hi,

Is there a (good) reason you didn't use Device Mapper for this? It
really sounds like Device Mapper is the way to go to parse and use
raid-like formats to the kernel, since it's designed to be independent
of on-disk formats, unlike MD.

Greetings,
    Arjan van de Ven

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 18:19   ` Jeff Garzik
  2004-01-13 20:29     ` Chris Friesen
@ 2004-01-13 21:10     ` Matt Domsch
  1 sibling, 0 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-13 21:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Scott Long, linux-raid, linux-kernel

On Tue, Jan 13, 2004 at 01:19:56PM -0500, Jeff Garzik wrote:
> Matt Domsch wrote:
> > I haven't seen the spec yet myself, but I'm led to believe that
> > DDF allows for multiple logical drives to be created across a single
> > set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
> > two 80GB spindles), as well as whole disks be used.  It has a
> 
> 
> Me either.  Any idea if there will be a public comment period, or is the 
> spec "locked" into 1.0 when it's released in a month or so?

As it happens, Bill Dawkins of Dell is the DDF committee chair at
SNIA.  Here's what he's told me:

The current draft of the DDF specification is available for review
to any member of SNIA. This is a "Work in Progress" draft. Anyone in a
member company can go to www.snia.org and sign up for web access. They
will then have to sign up for the DDF Technical Working Group.
Acceptance to the DDF TWG is automatic and the current documents are
available there.
(As Dell is a member, I signed up for the DDF TWG as an observer.
Other companies are also on the member list, including Sistina, so
Jeff, you may be able to get a Sistina colleague to mail you a copy.
http://www.snia.org/about/member_list has the list of member
companies. - Matt)

For people and companies who are not members of SNIA, I am writing to
the SNIA Technical Director to see if I can release copies of the
draft spec now. I'll let you know when I get a response.

As for the timeline, we have a face-to-face meeting of the DDF TWG
next Tuesday and it is our intent to vote on releasing the
specification as a "Trial Use" specification for public review and
comment. If the vote is affirmative, the SNIA Technical Council will
have to meet to determine when and if to release the "Trial Use"
specification. This may take a few months, so we are probably looking
at March for full release. Feel free to share this information with
your Linux contacts.



So, for now, if you're in SNIA, you can get access to the draft spec,
and in a few months the draft spec should be publicly available.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 18:21 ` mutex
  2004-01-13 19:05   ` Jeff Garzik
@ 2004-01-13 20:44   ` Scott Long
  1 sibling, 0 replies; 57+ messages in thread
From: Scott Long @ 2004-01-13 20:44 UTC (permalink / raw)
  To: mutex; +Cc: linux-kernel

mutex wrote:
> On Mon, Jan 12, 2004 at 05:34:10PM -0700 or thereabouts, Scott Long
> wrote:
> 
>>All,
>>
>>Adaptec has been looking at the MD driver for a foundation for their
>>Open-Source software RAID stack.  This will help us provide full
>>and open support for current and future Adaptec RAID products (as
>>opposed to the limited support through closed drivers that we have
> 
> now).
> 
>>While MD is fairly functional and clean, there are a number of 
>>enhancements to it that we have been working on for a while and would
>>like to push out to the community for review and integration.  These
>>include:
>>
> 
> 
> 
> How about an endian-safe superblock?  Seriously, is that a 'bug' or a
> 'feature'?  Or do people just not care.

The DDF metadata module will be endian-safe.

Scott


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 18:44 ` Jeff Garzik
  2004-01-13 19:01   ` John Bradford
  2004-01-13 19:41   ` Matt Domsch
@ 2004-01-13 20:41   ` Scott Long
  2004-01-13 22:33       ` Jure Pečar
  2004-01-14 15:52     ` Kevin Corry
  2004-01-13 22:42   ` Luca Berra
  3 siblings, 2 replies; 57+ messages in thread
From: Scott Long @ 2004-01-13 20:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linux Kernel, linux-raid, Neil Brown

Jeff Garzik wrote:
> Scott Long wrote:
> 
>>I'm going to push these changes out in phases in order to keep the risk
>>and churn to a minimum.  The attached patch is for the partition
>>support.  It was originally from Ingo Molnar, but has changed quite a
>>bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
>>version works quite well, while the 2.6 version is fairly fresh.  One
>>problem that I have with it is that the created partitions show up in
>>/proc/partitions after running fdisk, but not after a reboot.
> 
> 
> You sorta hit a bad time for 2.4 development.  Even though my employer 
> (Red Hat), Adaptec, and many others must continue to support new 
> products on 2.4.x kernels, kernel development has shifted to 2.6.x (and 
> soon 2.7.x).
> 
> In general, you want a strategy of "develop on latest, then backport if 
> needed."  Once a solution is merged into the latest kernel, it 
> automatically appears in many companies' products (and perhaps more 
> importantly) product roadmaps.  Otherwise you will design various things
> into your software that have already been handled differently in the
> future, thus creating an automatically-obsolete solution and support 
> nightmare.
> 

Oh, I understand completely.  This work has actually been going on for a
number of years in an on-and-off fashion.  I'm just the latest person to
pick it up, and I happened to pick it up right when the big transition
to 2.6 happened.

> Now, addressing your specific issues...
> 
> 
>>While MD is fairly functional and clean, there are a number of
>>enhancements to it that we have been working on for a while and would
>>like to push out to the community for review and integration.  These
>>include:
>>
>>- partition support for md devices:  MD does not support the concept of
>>  fdisk partitions; the only way to approximate this right now is by
>>  creating multiple arrays on the same media.  Fixing this is required
>>  for not only feature-completeness, but to allow our BIOS to recognise
>>  the partitions on an array and properly boot them as it would boot a
>>  normal disk.
> 
> 
> Neil Brown has already done a significant amount of research into this 
> topic.  Given this, and his general status as md maintainer, you should 
> definitely make sure he's kept in the loop.
> 
> Partitioning for md was discussed in this thread:
> http://lkml.org/lkml/2003/11/13/182
> 
> In particular note Al Viro's response to Neil, in addition to Neil's own
> post.
> 
> And I could have _sworn_ that Neil already posted a patch to do 
> partitions in md, but maybe my memory is playing tricks on me.
> 

I thought that I had attached a patch to the end of my last mail, but I
could have messed it up.  The work to do partitioning in 2.6 looks to
be far less involved than in 2.4, thankfully =-)

> 
> 
>>- generic device arrival notification mechanism:  This is needed to
>>  support device hot-plug, and allow arrays to be automatically
>>  configured regardless of when the md module is loaded or initialized.
>>  RedHat EL3 has a scaled down version of this already, but it is
>>  specific to MD and only works if MD is statically compiled into the
>>  kernel.  A general mechanism will benefit MD as well as any other
>>  storage system that wants hot-arrival notices.
> 
> 
> This would be via /sbin/hotplug, in the Linux world.  SCSI already does 
> this, I think, so I suppose something similar would happen for md.
> 

A problem that we've encountered, though, is the following sequence:

1) md is initialized during boot
2) drives X Y and Z are probed during boot
3) root fs exists on array [X Y Z], but md didn't see them show up,
    so it didn't auto-configure the array

I'm not sure how this can be addressed by a userland daemon.  Remember
that we are focused on providing RAID during boot; configuring a
secondary array after boot is a much easier problem.

RHEL3 already has a mechanism to address this via the
md_autodetect_dev() hook.  This gets called by the partition code when
partition entities are discovered.  However, it is a static method, so
it only works when md is compiled into the kernel.  Our proposal is to
turn this into a generic registration mechanism, where md can
register as a listener.  When it does that, it gets a list of
previously announced devices, along with future devices as they are
discovered.

The code to do this is pretty small and simple.  The biggest question
is whether to implement it by enhancing add_partition(), or to create a
new call (i.e. device_register_partition()), as is done in RHEL3.
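
As a strawman (hypothetical names -- only the md_autodetect_dev() hook
exists today, in RHEL3), the registration side could look like:

#include <linux/list.h>

struct arrival_listener {
	struct list_head entry;
	void (*announce)(dev_t dev, void *ctx);	/* called once per device */
	void *ctx;
};

/* md (or any other subsystem) registers a listener; the registry first
 * replays every device announced before registration, then forwards
 * new arrivals as they are discovered */
void register_arrival_listener(struct arrival_listener *l);

/* called by the partition/block code (e.g. from add_partition()) for
 * each newly discovered device */
void announce_device(dev_t dev);

That replay step is what closes the race above: md no longer cares
whether it initialized before or after drives X, Y, and Z were probed.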

> 
> 
>>- RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
>>  that spans a chunk boundary.  Modifications are needed so that it can
>>  take a request and break it up into 1 or more per-disk requests.
> 
> 
> I thought that raid0 was one of the few that actually did bio splitting 
> correctly?  Hum, maybe this is a 2.4-only issue.  Interesting, and 
> agreed, if so...
> 

This is definitely still a problem in 2.6.1
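
The core of the fix is just offset arithmetic.  A rough sketch, assuming
a single zone of identical disks (submit_to_disk() is a stand-in for
building and submitting the real per-disk request; actual raid0 must
also handle zones and completion accounting):

extern void submit_to_disk(int disk, sector_t dsect, sector_t len);

static void raid0_split(sector_t sector, sector_t nr_sectors,
			sector_t chunk_sectors, int ndisks)
{
	while (nr_sectors) {
		sector_t chunk  = sector / chunk_sectors;  /* global chunk #  */
		sector_t offset = sector % chunk_sectors;  /* offset in chunk */
		/* each sub-request stops at the chunk boundary */
		sector_t len = nr_sectors < chunk_sectors - offset ?
			       nr_sectors : chunk_sectors - offset;
		int disk = chunk % ndisks;                 /* round-robin     */
		sector_t dsect = (chunk / ndisks) * chunk_sectors + offset;

		submit_to_disk(disk, dsect, len);
		sector += len;
		nr_sectors -= len;
	}
}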

> 
> 
>>- Metadata abstraction:  We intend to support multiple on-disk metadata
>>  formats, along with the 'native MD' format.  To do this, specific
>>  knowledge of MD on-disk structures must be abstracted out of the core
>>  and personalities modules.
>>
>>- DDF Metadata support: Future products will use the 'DDF' on-disk
>>  metadata scheme.  These products will be bootable by the BIOS, but
>>  must have DDF support in the OS.  This will plug into the abstraction
>>  mentioned above.
> 
> 
> Neil already did the work to make 'md' support multiple types of 
> superblocks, but I'm not sure if we want to hack 'md' to support the 
> various vendor RAIDs out there.  DDF support we _definitely_ want, of 
> course.  DDF follows a very nice philosophy:  open[1] standard with no 
> vendor lock-in.
> 
> IMO, your post/effort all boils down to an open design question:  device
> mapper or md, for doing stuff like vendor-raid1 or vendor-raid5?  And is
> it even possible to share (for example) a raid5 engine among all the
> various vendor RAID5's?
> 

The stripe and parity format is not the problem here; md can be enhanced
to support different stripe and parity rotation sequences without much
trouble.

Also, think beyond just DDF.  Having pluggable metadata personalities 
means that a module can be written for the existing Adaptec RAID 
products too (like the HostRAID functionality on our U320 adapters).
It also means that you can write personality modules for other vendors,
and even hardware RAID solutions.  Imagine having a PCI RAID card fail,
then plugging the drives directly into your computer and having the
array 'Just Work'.

As for the question of DM vs. MD, I think that you have to consider that
DM right now has no concept of storing configuration data on the disk
(at least that I can find, please correct me if I'm wrong).  I think
that DM will make a good LVM-like layer on top of MD, but I don't see it
replacing MD right now.


Scott


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 20:29     ` Chris Friesen
@ 2004-01-13 20:35       ` Matt Domsch
  0 siblings, 0 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-13 20:35 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jeff Garzik, Scott Long, linux-raid, linux-kernel

On Tue, Jan 13, 2004 at 03:29:45PM -0500, Chris Friesen wrote:
> How is this different from the 20GB RAID0 and 6 15GB RAID1s that I've 
> got on two 100GB spindles right now?

Indeed, md does this with partitions on the disks today, so it is
analogous; DDF does this with disk extents, which have the same
functionality as partitions but are defined by an on-disk metadata
format rather than an MSDOS partition table (yes, partition tables are
metadata too...).

The solution needs partitions/extents in two places.
1) below the logical drive, from which logical drives are created.
2) above the logical drive, on which multiple file systems are
created.

md provides 1) today, and as discussed, patches to do 2) exist but
have not yet been merged.


-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 18:19   ` Jeff Garzik
@ 2004-01-13 20:29     ` Chris Friesen
  2004-01-13 20:35       ` Matt Domsch
  2004-01-13 21:10     ` Matt Domsch
  1 sibling, 1 reply; 57+ messages in thread
From: Chris Friesen @ 2004-01-13 20:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Matt Domsch, Scott Long, linux-raid, linux-kernel

Jeff Garzik wrote:
> Matt Domsch wrote:
> 
>> I haven't seen the spec yet myself, but I'm led to believe that
>> DDF allows for multiple logical drives to be created across a single
>> set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
>> two 80GB spindles), as well as whole disks be used.  It has a

How is this different from the 20GB RAID0 and 6 15GB RAID1s that I've 
got on two 100GB spindles right now?

I think it's on 2.4, might even be 2.2.

Chris



-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 19:43       ` Jeff Garzik
@ 2004-01-13 20:00         ` mutex
  0 siblings, 0 replies; 57+ messages in thread
From: mutex @ 2004-01-13 20:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Scott Long, linux-kernel

On Tue, Jan 13, 2004 at 02:43:29PM -0500 or thereabouts, Jeff Garzik wrote:
> >hmm I don't remember that... was it on lkml or the raid development
> >list ? Can you give me a string/date to search around ?
> 
> 
> Other than "neil brown md superblock" I don't recall.  In the past year or 
> two :)  There were patches, so it wasn't just discussion.
> 

in case anybody else is curious, I think this is the thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103776556308924&w=2

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 19:30     ` mutex
@ 2004-01-13 19:43       ` Jeff Garzik
  2004-01-13 20:00         ` mutex
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff Garzik @ 2004-01-13 19:43 UTC (permalink / raw)
  To: mutex; +Cc: Scott Long, linux-kernel

mutex wrote:
> On Tue, Jan 13, 2004 at 02:05:55PM -0500 or thereabouts, Jeff Garzik wrote:
> 
>>>How about an endian-safe superblock?  Seriously, is that a 'bug' or a
>>>'feature'?  Or do people just not care.
>>
>>
>>There was a thread discussing md's new superblock design, did you 
>>research/follow that?  neilb was actively soliciting comments and there 
>>was an amount of discussion.
>>
> 
> 
> hmm I don't remember that... was it on lkml or the raid development
> list ? Can you give me a string/date to search around ?


Other than "neil brown md superblock" don't recall.  In the past year or 
two :)  There were patches, so it wasn't just discussion.

	Jeff




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 18:44 ` Jeff Garzik
  2004-01-13 19:01   ` John Bradford
@ 2004-01-13 19:41   ` Matt Domsch
  2004-01-13 22:10     ` Arjan van de Ven
  2004-01-16  9:31       ` Lars Marowsky-Bree
  2004-01-13 20:41   ` Scott Long
  2004-01-13 22:42   ` Luca Berra
  3 siblings, 2 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-13 19:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown

On Tue, Jan 13, 2004 at 01:44:05PM -0500, Jeff Garzik wrote:
> You sorta hit a bad time for 2.4 development.  Even though my employer 
> (Red Hat), Adaptec, and many others must continue to support new 
> products on 2.4.x kernels,

Indeed, enterprise-class products based on 2.4.x kernels will need
some form of solution here too.

> kernel development has shifted to 2.6.x (and soon 2.7.x).
> 
> In general, you want a strategy of "develop on latest, then backport if 
> needed."

Ideally in 2.6 one can use device mapper, but DM hasn't been
incorporated into 2.4 stock, I know it's not in RHEL 3, and I don't
believe it's included in SLES8.  Can anyone share thoughts on whether,
if a DDF solution were built on top of DM, DM could be included in 2.4
stock, RHEL3, or SLES8?  Otherwise, Adaptec will be stuck with two
different solutions anyhow: one for 2.4 (they're proposing enhancing
MD), and DM for 2.6.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 19:05   ` Jeff Garzik
@ 2004-01-13 19:30     ` mutex
  2004-01-13 19:43       ` Jeff Garzik
  0 siblings, 1 reply; 57+ messages in thread
From: mutex @ 2004-01-13 19:30 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Scott Long, linux-kernel

On Tue, Jan 13, 2004 at 02:05:55PM -0500 or thereabouts, Jeff Garzik wrote:
> >How about an endian-safe superblock?  Seriously, is that a 'bug' or a
> >'feature'?  Or do people just not care.
> 
> 
> There was a thread discussing md's new superblock design, did you 
> research/follow that?  neilb was actively soliciting comments and there 
> was an amount of discussion.
> 

hmm I don't remember that... was it on lkml or the raid development
list ? Can you give me a string/date to search around ?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 18:21 ` mutex
@ 2004-01-13 19:05   ` Jeff Garzik
  2004-01-13 19:30     ` mutex
  2004-01-13 20:44   ` Scott Long
  1 sibling, 1 reply; 57+ messages in thread
From: Jeff Garzik @ 2004-01-13 19:05 UTC (permalink / raw)
  To: mutex; +Cc: Scott Long, linux-kernel

mutex wrote:
> On Mon, Jan 12, 2004 at 05:34:10PM -0700 or thereabouts, Scott Long wrote:
> 
>>All,
>>
>>Adaptec has been looking at the MD driver for a foundation for their
>>Open-Source software RAID stack.  This will help us provide full
>>and open support for current and future Adaptec RAID products (as
>>opposed to the limited support through closed drivers that we have now).
>>
>>While MD is fairly functional and clean, there are a number of 
>>enhancements to it that we have been working on for a while and would
>>like to push out to the community for review and integration.  These
>>include:
>>
> 
> 
> 
> How about an endian-safe superblock?  Seriously, is that a 'bug' or a
> 'feature'?  Or do people just not care.


There was a thread discussing md's new superblock design, did you 
research/follow that?  neilb was actively soliciting comments and there 
was an amount of discussion.

	Jeff




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13 18:44 ` Jeff Garzik
@ 2004-01-13 19:01   ` John Bradford
  2004-01-13 19:41   ` Matt Domsch
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: John Bradford @ 2004-01-13 19:01 UTC (permalink / raw)
  To: Jeff Garzik, Scott Long; +Cc: Linux Kernel, linux-raid, Neil Brown

> [1] well, developed in secret, but published openly.  Not quite up to 
> Linux's standards, but decent for the h/w world.

..and patent-free?

John.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13  0:34 Proposed enhancements " Scott Long
  2004-01-13 16:26 ` Jakob Oestergaard
  2004-01-13 18:21 ` mutex
@ 2004-01-13 18:44 ` Jeff Garzik
  2004-01-13 19:01   ` John Bradford
                     ` (3 more replies)
  2004-01-13 22:06 ` Arjan van de Ven
  2004-01-14 23:07 ` Neil Brown
  4 siblings, 4 replies; 57+ messages in thread
From: Jeff Garzik @ 2004-01-13 18:44 UTC (permalink / raw)
  To: Scott Long; +Cc: Linux Kernel, linux-raid, Neil Brown

Scott Long wrote:
> I'm going to push these changes out in phases in order to keep the risk
> and churn to a minimum.  The attached patch is for the partition
> support.  It was originally from Ingo Molnar, but has changed quite a
> bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
> version works quite well, while the 2.6 version is fairly fresh.  One
> problem that I have with it is that the created partitions show up in
> /proc/partitions after running fdisk, but not after a reboot.

You sorta hit a bad time for 2.4 development.  Even though my employer 
(Red Hat), Adaptec, and many others must continue to support new 
products on 2.4.x kernels, kernel development has shifted to 2.6.x (and 
soon 2.7.x).

In general, you want a strategy of "develop on latest, then backport if 
needed."  Once a solution is merged into the latest kernel, it 
automatically appears in many companies' products (and perhaps more 
importantly) product roadmaps.  Otherwise you will design various things 
into your software that have already been handled differently in the 
future, thus creating an automatically-obsolete solution and support 
nightmare.

Now, addressing your specific issues...

> While MD is fairly functional and clean, there are a number of enhancements to it that we have been working on for a while and would
> like to push out to the community for review and integration.  These
> include:
> 
> - partition support for md devices:  MD does not support the concept of
>   fdisk partitions; the only way to approximate this right now is by
>   creating multiple arrays on the same media.  Fixing this is required
>   for not only feature-completeness, but to allow our BIOS to recognise
>   the partitions on an array and properly boot them as it would boot a
>   normal disk.

Neil Brown has already done a significant amount of research into this 
topic.  Given this, and his general status as md maintainer, you should 
definitely make sure he's kept in the loop.

Partitioning for md was discussed in this thread:
http://lkml.org/lkml/2003/11/13/182

In particular note Al Viro's response to Neil, in addition to Neil's own 
post.

And I could have _sworn_ that Neil already posted a patch to do 
partitions in md, but maybe my memory is playing tricks on me.


> - generic device arrival notification mechanism:  This is needed to
>   support device hot-plug, and allow arrays to be automatically
>   configured regardless of when the md module is loaded or initialized.
>   RedHat EL3 has a scaled down version of this already, but it is
>   specific to MD and only works if MD is statically compiled into the
>   kernel.  A general mechanism will benefit MD as well as any other
>   storage system that wants hot-arrival notices.

This would be via /sbin/hotplug, in the Linux world.  SCSI already does 
this, I think, so I suppose something similar would happen for md.


> - RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
>   that spans a chunk boundary.  Modifications are needed so that it can
>   take a request and break it up into 1 or more per-disk requests.

I thought that raid0 was one of the few that actually did bio splitting 
correctly?  Hum, maybe this is a 2.4-only issue.  Interesting, and 
agreed, if so...


> - Metadata abstraction:  We intend to support multiple on-disk metadata
>   formats, along with the 'native MD' format.  To do this, specific
>   knowledge of MD on-disk structures must be abstracted out of the core
>   and personalities modules.

> - DDF Metadata support: Future products will use the 'DDF' on-disk
>   metadata scheme.  These products will be bootable by the BIOS, but
>   must have DDF support in the OS.  This will plug into the abstraction
>   mentioned above.

Neil already did the work to make 'md' support multiple types of 
superblocks, but I'm not sure if we want to hack 'md' to support the 
various vendor RAIDs out there.  DDF support we _definitely_ want, of 
course.  DDF follows a very nice philosophy:  open[1] standard with no 
vendor lock-in.

IMO, your post/effort all boils down to an open design question:  device 
mapper or md, for doing stuff like vendor-raid1 or vendor-raid5?  And is 
it even possible to share (for example) a raid5 engine among all the 
various vendor RAID5's?

	Jeff


[1] well, developed in secret, but published openly.  Not quite up to 
Linux's standards, but decent for the h/w world.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13  0:34 Proposed enhancements " Scott Long
  2004-01-13 16:26 ` Jakob Oestergaard
@ 2004-01-13 18:21 ` mutex
  2004-01-13 19:05   ` Jeff Garzik
  2004-01-13 20:44   ` Scott Long
  2004-01-13 18:44 ` Jeff Garzik
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 57+ messages in thread
From: mutex @ 2004-01-13 18:21 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-kernel

On Mon, Jan 12, 2004 at 05:34:10PM -0700 or thereabouts, Scott Long wrote:
> All,
> 
> Adaptec has been looking at the MD driver for a foundation for their
> Open-Source software RAID stack.  This will help us provide full
> and open support for current and future Adaptec RAID products (as
> opposed to the limited support through closed drivers that we have now).
> 
> While MD is fairly functional and clean, there are a number of 
> enhancements to it that we have been working on for a while and would
> like to push out to the community for review and integration.  These
> include:
> 


How about an endian-safe superblock?  Seriously, is that a 'bug' or a
'feature'?  Or do people just not care.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 14:19 ` Matt Domsch
  2004-01-13 17:13   ` Andreas Dilger
  2004-01-13 18:19   ` Kevin P. Fleming
@ 2004-01-13 18:19   ` Jeff Garzik
  2004-01-13 20:29     ` Chris Friesen
  2004-01-13 21:10     ` Matt Domsch
  2 siblings, 2 replies; 57+ messages in thread
From: Jeff Garzik @ 2004-01-13 18:19 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel

Matt Domsch wrote:
> I haven't seen the spec yet myself, but I'm led to believe that
> DDF allows for multiple logical drives to be created across a single
> set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
> two 80GB spindles), as well as whole disks be used.  It has a


Me either.  Any idea if there will be a public comment period, or is the 
spec "locked" into 1.0 when it's released in a month or so?

	Jeff




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 14:19 ` Matt Domsch
  2004-01-13 17:13   ` Andreas Dilger
@ 2004-01-13 18:19   ` Kevin P. Fleming
  2004-01-13 18:19   ` Jeff Garzik
  2 siblings, 0 replies; 57+ messages in thread
From: Kevin P. Fleming @ 2004-01-13 18:19 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel

Matt Domsch wrote:

> DDF is quickly becoming important to RAID and system vendors, and I
> welcome Adaptec's work to implement DDF support on Linux.

Fully agreed, the days of vendor-specific metadata formats need to be 
numbered (with a small number). Speaking as a customer with a CMD 
FC-to-SCSI RAID controller, which used to be dual-redundant but is now 
single (because of a dead unit), we are not looking forward to the day 
when the remaining controller dies and we lose all the data on the array 
due to a forced metadata format change.

However, given that this will not likely be 2.6 material until after 
it's built and tested in 2.7 and then backported, it doesn't seem to 
make any sense to me to build any of this on top of the MD subsystem at 
all (see other replies about using DM instead). Additionally, it also 
does not seem to make any sense to build any of the DDF 
reading/writing/management in the kernel _at all_. There is no advantage 
to it being there once initramfs is a standard part of the boot process, 
so all of this should be done in userspace and just communicated into 
the kernel to tell it what logical devices to construct using which DM 
modules.
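
For illustration (device names and sizes invented): once userspace has
parsed the DDF metadata and found, say, a two-disk RAID0 with 64KB
(128-sector) chunks and 400000 sectors, it can hand the kernel a
ready-made mapping with dm's existing striped target:

	echo "0 400000 striped 2 128 /dev/sda 0 /dev/sdb 0" | \
		dmsetup create ddf0

No DDF knowledge in the kernel at all; the kernel just executes the
mapping it is given.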


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 10:24 ` Lars Marowsky-Bree
@ 2004-01-13 18:03   ` Scott Long
  2004-01-16  9:29     ` Lars Marowsky-Bree
  0 siblings, 1 reply; 57+ messages in thread
From: Scott Long @ 2004-01-13 18:03 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-raid

Lars Marowsky-Bree wrote:
> On 2004-01-12T20:41:54,
>    Scott Long <scott_long@adaptec.com> said:
> 
> Hi Scott, this is good to see!
> 
> 
>>- partition support for md devices:  MD does not support the concept of
>>  fdisk partitions; the only way to approximate this right now is by
>>  creating multiple arrays on the same media.  Fixing this is required
>>  for not only feature-completeness, but to allow our BIOS to recognise
>>  the partitions on an array and properly boot them as it would boot a
>>  normal disk.
> 
> 
> I'm not too excited about this, because Device Mapping on top of md is
> much more flexible, but I see that users want it, and it should be
> pretty easy to add.
> 

The biggest issue here is that a real fdisk table needs to exist on the
array in order for our BIOS to recognise it as a boot device.  While
Device Mapper can probably do a good job at creating logical storage
extents out of a single md device, it doesn't get us any closer to being
able to boot off of an MD array.

> 
>>- generic device arrival notification mechanism:  This is needed to
>>  support device hot-plug, and allow arrays to be automatically
>>  configured regardless of when the md module is loaded or initialized.
>>  RedHat EL3 has a scaled down version of this already, but it is
>>  specific to MD and only works if MD is statically compiled into the
>>  kernel.  A general mechanism will benefit MD as well as any other
>>  storage system that wants hot-arrival notices.
> 
> 
> Yes. Is anything missing from the 2.6 & hotplug & udev solution which
> you require?
> 

I'll admit that I'm not as familiar with 2.6 as I should be.  Does a
disk arrival mechanism already exist?

> 
>>- RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
>>  that spans a chunk boundary.  Modifications are needed so that it can
>>  take a request and break it up into 1 or more per-disk requests.
> 
> 
> Agreed.
> 
> 
>>- Metadata abstraction:  We intend to support multiple on-disk metadata
>>  formats, along with the 'native MD' format.  To do this, specific
>>  knowledge of MD on-disk structures must be abstracted out of the core
>>  and personalities modules.
> 
> 
> This can get difficult, of course, and needs to be implemented in a way
> which doesn't slow us down too much.
> 

Normal I/O doesn't touch the metadata.  Only during error recovery and
configuration would this be touched.  Instead of the core and 
personality modules directly manipulating the metadata, a set of
metadata-specific function pointers will be called through to handle
changing the on-disk metadata.  So, no significant operational overhead
is introduced.
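
As a strawman (the names below are invented for illustration, not taken
from our patches), the table could be as simple as:

struct md_metadata_ops {
	const char *name;			/* "md 0.90", "ddf", ...     */
	int  (*load)(mddev_t *mddev);		/* read/validate superblocks */
	int  (*sync)(mddev_t *mddev);		/* flush dirty metadata      */
	void (*mark_disk_faulty)(mddev_t *mddev, int disk);
	void (*record_resync)(mddev_t *mddev, sector_t checkpoint);
};

The core would call through such a table only on the configuration and
error-recovery paths, so the normal I/O path pays nothing.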

> 
>>- DDF Metadata support: Future products will use the 'DDF' on-disk
>>  metadata scheme.  These products will be bootable by the BIOS, but
>>  must have DDF support in the OS.  This will plug into the abstraction
>>  mentioned above.
> 
> 
> OK. How does the DDF metadata differ from the current md data? Is it
> merely the layout, or are there functional differences?
> 

I'm not sure if the DDF spec has been officially published yet.  It
defines a set of data structures and their locations on the disk that
allow disks to be uniquely identified, logical extents to be grouped
into arrays, disk and array state to be recorded, and events logged.
It is completely different from the metadata that is used for classic
MD.  However, it is still compatible with the high-level striping and
mirroring operations of MD.
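
Purely to illustrate the kind of information involved (these fields are
invented -- the real DDF structures are not public yet, as noted above):

#include <linux/types.h>

struct ddf_like_disk_record {
	__u8	disk_guid[24];		/* world-wide unique disk identity  */
	__u32	num_extents;		/* extents this disk contributes    */
	__u32	extent_array_ref[16];	/* array each extent belongs to     */
	__u32	disk_state;		/* online / failed / rebuilding ... */
	__u64	event_log_sector;	/* where the on-disk event log lives */
};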

> In particular, I'm wondering whether partitions using the new activity
> logging features of md will still be bootable, or whether the boot
> partitions need to be 'md classic'.

Our products will only recognise and boot off of DDF arrays.  They have
no concept of classic MD metadata.

The goal of the abstraction is to allow new metadata personalities to be
plugged in and 'Just Work', while not inhibiting the choice of using
whatever metadata is most suitable for existing arrays.  If you need to
boot off of a DDF-aware controller, but use classic MD for secondary
arrays, that will work.

> 
> 
>>bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
>>version works quite well, while the 2.6 version is fairly fresh. 
> 
> 
> I'd be reluctant doing any of the work for 2.4, but this is of course
> upto you.

This work was originally started on 2.4.  With the closing of 2.4 and
release of 2.6, we are porting our work forward.  It would be nice to
integrate the changes into 2.4 also, but we recognise the need for 2.4
to remain as stable as possible.

Scott


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13 14:19 ` Matt Domsch
@ 2004-01-13 17:13   ` Andreas Dilger
  2004-01-13 22:26     ` Andreas Dilger
  2004-01-13 18:19   ` Kevin P. Fleming
  2004-01-13 18:19   ` Jeff Garzik
  2 siblings, 1 reply; 57+ messages in thread
From: Andreas Dilger @ 2004-01-13 17:13 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel

On Jan 13, 2004  08:19 -0600, Matt Domsch wrote:
> On Mon, Jan 12, 2004 at 08:41:54PM -0700, Scott Long wrote:
> > - DDF Metadata support: Future products will use the 'DDF' on-disk
> >    metadata scheme.  These products will be bootable by the BIOS, but
> >    must have DDF support in the OS.  This will plug into the abstraction
> >    mentioned above.
> 
> For those unfamiliar with DDF (Disk Data Format), it is a Storage
> Networking Industry Association (SNIA) project ("Common RAID DDF
> TWG"), designed to provide a single metadata format to be used by all
> the RAID vendors (hardware and software alike).  It removes vendor
> lock-in by having a metadata format that all can use, thus in theory
> you could move disks from an Adaptec hardware RAID controller to an
> LSI software RAID solution without reformatting the disks or touching
> your file systems in any way.  Dell has been championing the DDF
> concept for quite a while, and is driving vendors from which we
> purchase RAID solutions to use DDF instead of their own individual
> metadata formats.
> 
> I haven't seen the spec yet myself, but I'm led to believe that
> DDF allows for multiple logical drives to be created across a single
> set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
> two 80GB spindles), as well as whole disks be used.  It has a
> mechanism to support reconstruction checkpointing, so you don't have
> to restart a reconstruct from the beginning after a reboot, but from
> where you left off.  And other useful features too that you'd expect
> in a common RAID solution.  

So, why not use EVMS and/or Device Mapper to read the DDF metadata and
set up the mappings that way?

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed enhancements to MD
  2004-01-13  0:34 Proposed enhancements " Scott Long
@ 2004-01-13 16:26 ` Jakob Oestergaard
       [not found]   ` <20040113201058.GD1594@srv-lnx2600.matchmail.com>
  2004-01-13 18:21 ` mutex
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 57+ messages in thread
From: Jakob Oestergaard @ 2004-01-13 16:26 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-kernel

On Mon, Jan 12, 2004 at 05:34:10PM -0700, Scott Long wrote:
> All,
> 
> Adaptec has been looking at the MD driver for a foundation for their
> Open-Source software RAID stack.  This will help us provide full
> and open support for current and future Adaptec RAID products (as
> opposed to the limited support through closed drivers that we have now).

Interesting...

> 
> While MD is fairly functional and clean, there are a number of 
> enhancements to it that we have been working on for a while and would
> like to push out to the community for review and integration.  These
> include:
> 
> - partition support for md devices:  MD does not support the concept of
>   fdisk partitions; the only way to approximate this right now is by
>   creating multiple arrays on the same media.  Fixing this is required
>   for not only feature-completeness, but to allow our BIOS to recognise
>   the partitions on an array and properly boot them as it would boot a
>   normal disk.

This change is probably not going to go into 2.6.X anytime soon anyway,
so what are your thoughts on doing this "right" - getting MD moved into
DM?

That would solve the problem, as I see it.

I'm not currently involved in either of those development efforts, but I
thought I'd bring your attention to the DM/MD issue - there was some
talk about it in the past.

Also, since DM will do on-line resizing and we want MD to do this as
well some day, I really think this is the way to go.  Getting
partition support on MD devices will solve a problem now, but for the
long run I really think MD should be a part of DM.

Anyway, that's my 0.02 Euro on that issue.

...
> - Metadata abstraction:  We intend to support multiple on-disk metadata
>   formats, along with the 'native MD' format.  To do this, specific
>   knowledge of MD on-disk structures must be abstracted out of the core
>   and personalities modules.

I think this one touches the DM issue as well.

So, how about Adaptec and IBM get someone to move MD into DM, and while
you're at it, add hot resizing and hot conversion between RAID levels  :)

2.7.1?  ;)

Jokes aside, I'd like to hear your opinions on this longer-term
perspective on things...

The RAID conversion/resize code for userspace exists already, and it
works except for some cases with RAID-5 and disks of non-equal size,
where it breaks horribly (fixable bug though).


 / jakob


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13  3:41 Proposed Enhancements " Scott Long
  2004-01-13 10:24 ` Lars Marowsky-Bree
@ 2004-01-13 14:19 ` Matt Domsch
  2004-01-13 17:13   ` Andreas Dilger
                     ` (2 more replies)
  1 sibling, 3 replies; 57+ messages in thread
From: Matt Domsch @ 2004-01-13 14:19 UTC (permalink / raw)
  To: Scott Long; +Cc: linux-raid, linux-kernel

On Mon, Jan 12, 2004 at 08:41:54PM -0700, Scott Long wrote:
> - DDF Metadata support: Future products will use the 'DDF' on-disk
>    metadata scheme.  These products will be bootable by the BIOS, but
>    must have DDF support in the OS.  This will plug into the abstraction
>    mentioned above.

For those unfamiliar with DDF (Disk Data Format), it is a Storage
Networking Industry Association (SNIA) project ("Common RAID DDF
TWG"), designed to provide a single metadata format to be used by all
the RAID vendors (hardware and software alike).  It removes vendor
lock-in by having a metadata format that all can use, thus in theory
you could move disks from an Adaptec hardware RAID controller to an
LSI software RAID solution without reformatting the disks or touching
your file systems in any way.  Dell has been championing the DDF
concept for quite a while, and is driving vendors from which we
purchase RAID solutions to use DDF instead of their own individual
metadata formats.

I haven't seen the spec yet myself, but I'm led to believe that
DDF allows for multiple logical drives to be created across a single
set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
two 80GB spindles), as well as whole disks be used.  It has a
mechanism to support reconstruction checkpointing, so after a reboot you
don't have to restart a reconstruct from the beginning, but can pick up
from where you left off.
in a common RAID solution.  

DDF is quickly becoming important to RAID and system vendors, and I
welcome Adaptec's work to implement DDF support on Linux.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Proposed Enhancements to MD
  2004-01-13  3:41 Proposed Enhancements " Scott Long
@ 2004-01-13 10:24 ` Lars Marowsky-Bree
  2004-01-13 18:03   ` Scott Long
  2004-01-13 14:19 ` Matt Domsch
  1 sibling, 1 reply; 57+ messages in thread
From: Lars Marowsky-Bree @ 2004-01-13 10:24 UTC (permalink / raw)
  To: Scott Long, linux-raid

On 2004-01-12T20:41:54,
   Scott Long <scott_long@adaptec.com> said:

Hi Scott, this is good to see!

> - partition support for md devices:  MD does not support the concept of
>   fdisk partitions; the only way to approximate this right now is by
>   creating multiple arrays on the same media.  Fixing this is required
>   for not only feature-completeness, but to allow our BIOS to recognise
>   the partitions on an array and properly boot them as it would boot a
>   normal disk.

I'm not too excited about this, because Device Mapping on top of md is
much more flexible, but I see that users want it, and it should be
pretty easy to add.

> - generic device arrival notification mechanism:  This is needed to
>   support device hot-plug, and allow arrays to be automatically
>   configured regardless of when the md module is loaded or initialized.
>   RedHat EL3 has a scaled down version of this already, but it is
>   specific to MD and only works if MD is statically compiled into the
>   kernel.  A general mechanism will benefit MD as well as any other
>   storage system that wants hot-arrival notices.

Yes. Is anything missing from the 2.6 & hotplug & udev solution which
you require?

> - RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
>   that spans a chunk boundary.  Modifications are needed so that it can
>   take a request and break it up into 1 or more per-disk requests.

Agreed.

> - Metadata abstraction:  We intend to support multiple on-disk metadata
>   formats, along with the 'native MD' format.  To do this, specific
>   knowledge of MD on-disk structures must be abstracted out of the core
>   and personalities modules.

This can get difficult, of course, and needs to be implemented in a way
which doesn't slow us down too much.

> - DDF Metadata support: Future products will use the 'DDF' on-disk
>   metadata scheme.  These products will be bootable by the BIOS, but
>   must have DDF support in the OS.  This will plug into the abstraction
>   mentioned above.

OK. How does the DDF metadata differ from the current md data? Is it
merely the layout, or are there functional differences?

In particular, I'm wondering whether partitions using the new activity
logging features of md will still be bootable, or whether the boot
partitions need to be 'md classic'.

> bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
> version works quite well, while the 2.6 version is fairly fresh. 

I'd be reluctant to do any of the work for 2.4, but this is of course
up to you.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Proposed Enhancements to MD
@ 2004-01-13  3:41 Scott Long
  2004-01-13 10:24 ` Lars Marowsky-Bree
  2004-01-13 14:19 ` Matt Domsch
  0 siblings, 2 replies; 57+ messages in thread
From: Scott Long @ 2004-01-13  3:41 UTC (permalink / raw)
  To: linux-raid

(I already posted this to LKML a few hours ago but forgot to post it
over here)

All,

Adaptec has been looking at the MD driver for a foundation for their
Open-Source software RAID stack.  This will help us provide full
and open support for current and future Adaptec RAID products (as
opposed to the limited support through closed drivers that we have now).

While MD is fairly functional and clean, there are a number of 
enhancements to it that we have been working on for a while and would
like to push out to the community for review and integration.  These
include:

- partition support for md devices:  MD does not support the concept of
   fdisk partitions; the only way to approximate this right now is by
   creating multiple arrays on the same media.  Fixing this is required
   for not only feature-completeness, but to allow our BIOS to recognise
   the partitions on an array and properly boot them as it would boot a
   normal disk.

- generic device arrival notification mechanism:  This is needed to
   support device hot-plug, and allow arrays to be automatically
   configured regardless of when the md module is loaded or initialized.
   RedHat EL3 has a scaled down version of this already, but it is
   specific to MD and only works if MD is statically compiled into the
   kernel.  A general mechanism will benefit MD as well as any other
   storage system that wants hot-arrival notices.

- RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
   that spans a chunk boundary.  Modifications are needed so that it can
   take a request and break it up into 1 or more per-disk requests.

- Metadata abstraction:  We intend to support multiple on-disk metadata
   formats, along with the 'native MD' format.  To do this, specific
   knowledge of MD on-disk structures must be abstracted out of the core
   and personalities modules.

- DDF Metadata support: Future products will use the 'DDF' on-disk
   metadata scheme.  These products will be bootable by the BIOS, but
   must have DDF support in the OS.  This will plug into the abstraction
   mentioned above.

I'm going to push these changes out in phases in order to keep the risk
and churn to a minimum.  The attached patch is for the partition
support.  It was originally from Ingo Molnar, but has changed quite a
bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
version works quite well, while the 2.6 version is fairly fresh.  One
problem that I have with it is that the created partitions show up in
/proc/partitions after running fdisk, but not after a reboot.

Scott




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Proposed enhancements to MD
@ 2004-01-13  0:34 Scott Long
  2004-01-13 16:26 ` Jakob Oestergaard
                   ` (4 more replies)
  0 siblings, 5 replies; 57+ messages in thread
From: Scott Long @ 2004-01-13  0:34 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]

All,

Adaptec has been looking at the MD driver for a foundation for their
Open-Source software RAID stack.  This will help us provide full
and open support for current and future Adaptec RAID products (as
opposed to the limited support through closed drivers that we have now).

While MD is fairly functional and clean, there are a number of 
enhancements to it that we have been working on for a while and would
like to push out to the community for review and integration.  These
include:

- partition support for md devices:  MD does not support the concept of
   fdisk partitions; the only way to approximate this right now is by
   creating multiple arrays on the same media.  Fixing this is required
   for not only feature-completeness, but to allow our BIOS to recognise
   the partitions on an array and properly boot them as it would boot a
   normal disk.

- generic device arrival notification mechanism:  This is needed to
   support device hot-plug, and allow arrays to be automatically
   configured regardless of when the md module is loaded or initialized.
   RedHat EL3 has a scaled down version of this already, but it is
   specific to MD and only works if MD is statically compiled into the
   kernel.  A general mechanism will benefit MD as well as any other
   storage system that wants hot-arrival notices.

- RAID-0 fixes:  The MD RAID-0 personality is unable to perform I/O
   that spans a chunk boundary.  Modifications are needed so that it can
   take a request and break it up into 1 or more per-disk requests.

- Metadata abstraction:  We intend to support multiple on-disk metadata
   formats, along with the 'native MD' format.  To do this, specific
   knowledge of MD on-disk structures must be abstracted out of the core
   and personalities modules.

- DDF Metadata support: Future products will use the 'DDF' on-disk
   metadata scheme.  These products will be bootable by the BIOS, but
   must have DDF support in the OS.  This will plug into the abstraction
   mentioned above.

I'm going to push these changes out in phases in order to keep the risk
and churn to a minimum.  The attached patch is for the partition
support.  It was originally from Ingo Molnar, but has changed quite a
bit due to the radical changes in the disk/block layer in 2.6.  The 2.4
version works quite well, while the 2.6 version is fairly fresh.  One
problem that I have with it is that the created partitions show up in
/proc/partitions after running fdisk, but not after a reboot.

Scott

[-- Attachment #2: md_partition.diff --]
[-- Type: text/plain, Size: 1907 bytes --]

--- linux-2.6.1/drivers/md/md.c	2004-01-08 23:59:19.000000000 -0700
+++ md/linux-2.6.1/drivers/md/md.c	2004-01-12 14:46:33.818544376 -0700
@@ -1446,6 +1446,9 @@
 	return 1;
 }
 
+/* MD Partition definitions */
+#define MDP_MINOR_COUNT		16
+#define MDP_MINOR_SHIFT		4
 
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {
@@ -1453,6 +1456,7 @@
 	int unit = *part;
 	mddev_t *mddev = mddev_find(unit);
 	struct gendisk *disk;
+	int index;
 
 	if (!mddev)
 		return NULL;
@@ -1463,15 +1467,22 @@
 		mddev_put(mddev);
 		return NULL;
 	}
-	disk = alloc_disk(1);
+	disk = alloc_disk(MDP_MINOR_COUNT);
 	if (!disk) {
 		up(&disks_sem);
 		mddev_put(mddev);
 		return NULL;
 	}
+	index = mdidx(mddev);
 	disk->major = MD_MAJOR;
-	disk->first_minor = mdidx(mddev);
-	sprintf(disk->disk_name, "md%d", mdidx(mddev));
+	disk->first_minor = index << MDP_MINOR_SHIFT;
+	disk->minors = MDP_MINOR_COUNT;
+	if (index >= 26) {
+		sprintf(disk->disk_name, "md%c%c",
+			'a' + index/26 - 1, 'a' + index % 26);
+	} else {
+		sprintf(disk->disk_name, "md%c", 'a' + index % 26);
+	}
 	disk->fops = &md_fops;
 	disk->private_data = mddev;
 	disk->queue = mddev->queue;
@@ -2512,18 +2523,21 @@
 	 * 4 sectors (with a BIG number of cylinders...). This drives
 	 * dosfs just mad... ;-)
 	 */
+#define MD_HEADS	254
+#define MD_SECTORS	60
 		case HDIO_GETGEO:
 			if (!loc) {
 				err = -EINVAL;
 				goto abort_unlock;
 			}
-			err = put_user (2, (char *) &loc->heads);
+			err = put_user (MD_HEADS, (char *) &loc->heads);
 			if (err)
 				goto abort_unlock;
-			err = put_user (4, (char *) &loc->sectors);
+			err = put_user (MD_SECTORS, (char *) &loc->sectors);
 			if (err)
 				goto abort_unlock;
-			err = put_user(get_capacity(disks[mdidx(mddev)])/8,
+			err = put_user(get_capacity(disks[mdidx(mddev)]) /
+						(MD_HEADS * MD_SECTORS),
 						(short *) &loc->cylinders);
 			if (err)
 				goto abort_unlock;
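
A quick sanity check on the fake geometry above: one cylinder is
254 * 60 = 15,240 sectors (about 7.4 MB), so the 16-bit cylinder count
returned through HDIO_GETGEO can describe arrays up to roughly
65535 * 15240 * 512 bytes, i.e. around 476 GB; the old 2-head,
4-sector geometry topped out near 256 MB.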

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2004-01-16 14:13 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-13 19:59 Proposed enhancements to MD Cress, Andrew R
  -- strict thread matches above, loose matches on Subject: below --
2004-01-13  3:41 Proposed Enhancements " Scott Long
2004-01-13 10:24 ` Lars Marowsky-Bree
2004-01-13 18:03   ` Scott Long
2004-01-16  9:29     ` Lars Marowsky-Bree
2004-01-13 14:19 ` Matt Domsch
2004-01-13 17:13   ` Andreas Dilger
2004-01-13 22:26     ` Andreas Dilger
2004-01-13 18:19   ` Kevin P. Fleming
2004-01-13 18:19   ` Jeff Garzik
2004-01-13 20:29     ` Chris Friesen
2004-01-13 20:35       ` Matt Domsch
2004-01-13 21:10     ` Matt Domsch
2004-01-13  0:34 Proposed enhancements " Scott Long
2004-01-13 16:26 ` Jakob Oestergaard
     [not found]   ` <20040113201058.GD1594@srv-lnx2600.matchmail.com>
2004-01-14 19:07     ` Jakob Oestergaard
     [not found]       ` <20040114194052.GK1594@srv-lnx2600.matchmail.com>
2004-01-14 21:02         ` Jakob Oestergaard
     [not found]           ` <20040114222447.GL1594@srv-lnx2600.matchmail.com>
2004-01-15  1:42             ` Jakob Oestergaard
2004-01-13 18:21 ` mutex
2004-01-13 19:05   ` Jeff Garzik
2004-01-13 19:30     ` mutex
2004-01-13 19:43       ` Jeff Garzik
2004-01-13 20:00         ` mutex
2004-01-13 20:44   ` Scott Long
2004-01-13 18:44 ` Jeff Garzik
2004-01-13 19:01   ` John Bradford
2004-01-13 19:41   ` Matt Domsch
2004-01-13 22:10     ` Arjan van de Ven
2004-01-16  9:31     ` Lars Marowsky-Bree
2004-01-16  9:57       ` Arjan van de Ven
2004-01-13 20:41   ` Scott Long
2004-01-13 22:33     ` Jure Pečar
2004-01-13 22:44       ` Scott Long
2004-01-13 22:56       ` viro
2004-01-14 15:52     ` Kevin Corry
2004-01-13 22:42   ` Luca Berra
2004-01-13 22:06 ` Arjan van de Ven
2004-01-13 22:44   ` Wakko Warner
2004-01-13 22:34     ` Arjan van de Ven
2004-01-13 23:09     ` Andreas Steinmetz
2004-01-13 23:38       ` Wakko Warner
2004-01-14 16:16         ` Kevin Corry
2004-01-14 16:53           ` Kevin P. Fleming
2004-01-14 23:07 ` Neil Brown
2004-01-15 11:10   ` Norman Schmidt
2004-01-15 21:52   ` Matt Domsch
2004-01-16  9:24     ` Lars Marowsky-Bree
2004-01-16 13:43       ` Matt Domsch
2004-01-16 13:56         ` Lars Marowsky-Bree
2004-01-16 14:06           ` Christoph Hellwig
2004-01-16 14:11             ` Matt Domsch
2004-01-16 14:13               ` Christoph Hellwig
