* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
       [not found] <7554605.886551236670855947.JavaMail.coremail@bj163app40.163.com>
@ 2009-03-13  1:00 ` Neil Brown
  0 siblings, 0 replies; 24+ messages in thread
From: Neil Brown @ 2009-03-13  1:00 UTC (permalink / raw)
  To: linux-jay; +Cc: Dan Williams, Yuri Tikhonov, Wolfgang Denk, dzu, linux-raid


> hi Neil:
> I got your latest patches (v2.6.29-rc5-335-g0ac4ee7) from git://neil.brown.name/md.
> I wanted to test the function that switches an array from raid5 to raid6,
> but I found a problem.
> I created a raid6 with 4 disks, wrote some files to it, and then removed two of the disks.
> If this raid6 were working correctly I should still be able to read the files I had written,
> but in fact I cannot read them correctly. Maybe there is some error in the raid6 code in v2.6.29-rc5-335-g0ac4ee7.
> 
> 
> The steps I performed are as follows:
> 1, mdadm -C /dev/md6 -l 6 -n 4 /dev/sda /dev/sdb /dev/sdc /dev/sdd --metadata=1.0 --size=1000000 -f
> 2, pvcreate /dev/md6
> 3, vgcreate md6vg /dev/md6
> 4, lvcreate -L 200M -n md6lv md6vg
> 5, mkfs.xfs /dev/mapper/md6vg-md6lv
> 6, mount /dev/mapper/md6vg-md6lv /tmp     (this works)
> 
> 7, mdadm -f /dev/md6 /dev/sda /dev/sdb  (mark two disks as faulty)
> 8, mount /dev/mapper/md6vg-md6lv /tmp   (now the filesystem can no longer be read correctly;
> mount prints "Structure needs cleaning")
>  
> Thanks Neil, I am waiting for your help!

Thanks for reporting this.  There must be something broken in the
changes to raid6 to support hardware-offload of the calculations.

If you try my 'md-scratch' branch it might work better.  It has all
the code for raid level conversion, but none of the raid6 rework.

NeilBrown


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
@ 2009-03-10  8:24 jzc-sina
  0 siblings, 0 replies; 24+ messages in thread
From: jzc-sina @ 2009-03-10  8:24 UTC (permalink / raw)
  To: linux-raid

hi Neil:
I got your latest patches (v2.6.29-rc5-335-g0ac4ee7) from git://neil.brown.name/md.
I wanted to test the function that switches an array from raid5 to raid6,
but I found a problem.
I created a raid6 with 4 disks, wrote some files to it, and then removed two of the disks.
If this raid6 were working correctly I should still be able to read the files I had written,
but in fact I cannot read them correctly. Maybe there is some error in the raid6 code in v2.6.29-rc5-335-g0ac4ee7.


The steps I performed are as follows:
1, mdadm -C /dev/md6 -l 6 -n 4 /dev/sda /dev/sdb /dev/sdc /dev/sdd --metadata=1.0 --size=1000000 -f
2, pvcreate /dev/md6
3, vgcreate md6vg /dev/md6
4, lvcreate -L 200M -n md6lv md6vg
5, mkfs.xfs /dev/mapper/md6vg-md6lv
6, mount /dev/mapper/md6vg-md6lv /tmp     (this works)

7, mdadm -f /dev/md6 /dev/sda /dev/sdb  (mark two disks as faulty)
8, mount /dev/mapper/md6vg-md6lv /tmp   (now the filesystem can no longer be read correctly;
mount prints "Structure needs cleaning")
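
The degraded state from steps 7-8 can be double-checked before mounting; a minimal
sketch (assuming the same device names as in the steps above) would be:

   cat /proc/mdstat                        # md6 should show only 2 of 4 members active
   mdadm --detail /dev/md6                 # expect "clean, degraded", with sda and sdb marked faulty
   xfs_repair -n /dev/mapper/md6vg-md6lv   # read-only consistency check of the XFS filesystem

With two of four raid6 members failed there is no redundancy left, so corruption seen
here most likely points at the raid6 reconstruction path rather than at XFS or LVM.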

Thanks Neil, I am waiting for your help!

zhenchengjin 


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-16  5:35           ` Neil Brown
@ 2009-02-16 17:31             ` Nagilum
  0 siblings, 0 replies; 24+ messages in thread
From: Nagilum @ 2009-02-16 17:31 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown


----- Message from neilb@suse.de ---------
     Date: Mon, 16 Feb 2009 16:35:52 +1100
     From: Neil Brown <neilb@suse.de>
  Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
       To: Bill Davidsen <davidsen@tmr.com>
       Cc: Julian Cowley <julian@lava.net>, Keld Jorn Simonsen  
<keld@dkuug.dk>, linux-raid@vger.kernel.org

>> Ob. plug for raid5E: the advantages of raid5E are two-fold. The most
>> obvious is that head motion is spread over N+2 drives (N being number of
>> data drives) which improves performance quite a bit in the common small
>> business case of 4-5 drive setups. It also puts some use on each drive,
>> so you don't suddenly start using a drive which may have been spun down
>> for a month, may have developed issues since SMART was last run, etc.
>>
>
> Are you thinking of raid5e, where all the spare space is at the end of
> the devices, or raid5ee where it is more evenly distributed?

raid5E I'd say.

> So raid5e is just a normal raid5 where you don't use all of the space.
> When a failure happens, you reshape to n-1 drives, thus absorbing the
> space.
>
> raid5ee is much like raid6, but you don't read or write the Q block.
> If you lose a drive, you rebuild it in the space where the Q block
> lives.
>
> So would you just use raid6 normally and transition to a contorted
> raid5 on device failure?  Or would you really want to leave those
> blocks fallow?

My understanding is that 5EE leaves those blocks empty. Doing real Q  
blocks would entail too much overhead, but it reminds me of an idea I had
some time ago. I call it lazy-Raid6 ;)

Problem: You have enough disks to run RAID6 but you don't want to pay  
the performance penalty* of RAID6.
The solution in those cases is usually RAID5+hotspare but maybe we can  
do better.
We could also use the hotspare to store the RAID6 Q parity, but calculate
it (or more specifically read/write the stripe/block)
only when the disks are idle. This of course means that the hotspare  
will have a number of invalid blocks after each write operation but  
the majority of blocks will be up-to-date. (use a bitmap to mark dirty  
blocks and "clean up" when the disks are idle)
The goal behind this is to have basically the same performance as with  
normal RAID5 but a higher failure resilience. In my experience  
hard disks often fail partially, so that if you have a partial and a
complete disk failure, chances are you will be able to recover. Even  
when two disks fail completely the number of dirty blocks should  
usually be pretty low, so we would be able to recover most of the data.
If there is a single disk failure we behave like a normal  
raid5+(hot)spare of course.
It is not intended as a replacement for normal RAID6 but it would give  
most of your data about the same protection while maintaining the  
speed of RAID5.

*) The main speed advantage of RAID5 vs. RAID6 comes from the fact  
that if you write one physical block**) in a RAID5 you only need to  
update***) one additional physical block.  If you write a
physical block in a RAID6 you have to read the whole stripe and then  
write the RAID6 chunk of the stripe.
**) A RAID chunk consists of several physical blocks. Several chunks  
make up a stripe.
***) read+write

Ok, I hope no one can claim a patent on it now. ;)
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-13 16:54         ` Bill Davidsen
@ 2009-02-16  5:35           ` Neil Brown
  2009-02-16 17:31             ` Nagilum
  0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2009-02-16  5:35 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Julian Cowley, Keld Jørn Simonsen, linux-raid

On Friday February 13, davidsen@tmr.com wrote:
> Julian Cowley wrote:
> And in this case locking the barn door after the horse has left is
> probably a path of least confusion.
> 
> > Perhaps instead the documentation in mdadm(8) and md(4) could be 
> > updated to mention that raid10 is a combination of the concepts in 
> > RAID 1 and RAID 0, but is generalized enough so that it can be done 
> > with just two drives at a minimum.  That would have caught my eye, at 
> > least.
> 
> Good idea.

Patches gladly accepted.


> 
> Ob. plug for raid5E: the advantages of raid5E are two-fold. The most 
> obvious is that head motion is spread over N+2 drives (N being number of 
> data drives) which improves performance quite a bit in the common small 
> business case of 4-5 drive setups. It also puts some use on each drive, 
> so you don't suddenly start using a drive which may have been spun down 
> for a month, may have developed issues since SMART was last run, etc.
> 

Are you thinking of raid5e, where all the spare space is at the end of
the devices, or raid5ee where it is more evenly distributed?

So raid5e is just a normal raid5 where you don't use all of the space.
When a failure happens, you reshape to n-1 drives, thus absorbing the
space.

raid5ee is much like raid6, but you don't read or write the Q block.
If you lose a drive, you rebuild it in the space where the Q block
lives. 

So would you just use raid6 normally and transition to a contorted
raid5 on device failure?  Or would you really want to leave those
blocks fallow?

I guess I could implement that by using 8 bits in the 'layout' number
to indicate which device in the array is 'failed', and run a reshape
pass that changes the layout, being careful not to re-write blocks
that hadn't changed....

Not impossible, but I would much rather someone else wrote (and
tested) the code while I watched...

> While the distributed spare idea could be extended to raid6 and raid10, 
> the mapping gets complex. Since Neil is currently adding code to allow 
> for orders other than sequential in raid6, being able to quickly deploy 
> the spare on a once-per-stripe basis might at least get him to rethink 
> the concept.

I think raid6e is trivial and raid6ee would be quite straightforward.

For raid10, if you used a far=3 layout but only used the first two
copies, you would effectively have raid10e.
If you used a near=3 layout but only used 2 copies, you would have
something like a raid10ee, but if you have 3 or 6 drives, all the
spare space would be on the 1 (or 2) device(s).



NeilBrown


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 11:17     ` Farkas Levente
@ 2009-02-13 17:02       ` Bill Davidsen
  0 siblings, 0 replies; 24+ messages in thread
From: Bill Davidsen @ 2009-02-13 17:02 UTC (permalink / raw)
  To: Farkas Levente; +Cc: NeilBrown, linux-raid

Farkas Levente wrote:
> NeilBrown wrote:
>   
>> On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
>>     
>>> NeilBrown wrote:
>>>       
>>>> Hi,
>>>>  following is my current patch queue for 2.6.30, in case anyone would
>>>> like to review or otherwise comment.
>>>> They should show up in -next shortly.
>>>>
>>>> Probably the most interesting are the last few which provide support
>>>> for converting a raid1 into a raid5, and a raid5 into a raid6.
>>>> I plan to do some more work here so the code might change a bit before
>>>> final submission, as I work out how best to factor the code.
>>>>
>>>> mdadm doesn't currently support these conversions, but you can
>>>> simply
>>>>    echo raid5 > /sys/block/md0/md/level
>>>> to change a 2-drive raid1 into a raid5.  Similarly for 5->6
>>>>         
>>> any plan for non-raid to raid1, or anything else like in Windows where one can
>>> convert a normal partition into a mirrored one online?
>>>       
>> No plan exactly, but I do think about it from time to time.
>>
>> There are two problems with this, and solving just one of them
>> doesn't help you much.  So you really have to solve both at once,
>> which reduces the motivation towards either ....
>>
>> One problem is the task of changing the implementation of the device
>> underneath the filesystem without the filesystem needing to care.
>>
>> i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
>> IO, then mdadm steps in and magically changes things so that /dev/sda1
>> is now on a raid1 array which happens to access the same data, but
>> through a different code path.
>> Figuring out exactly which data structure to install the redirection in,
>> and how to do it in a way that is guaranteed to be safe, is non-trivial.
>>
>> dm has a mechanism to change the implementation under a given dm
>> device, and md now has a mechanism to change the implementation
>> under a given md device.  But generalising that to 'any device' is
>> not entirely trivial.  Now that I have done it for md I'm in a better
>> position to understand how it might be done.
>>
>> The other problem is where to store the metadata.  You need at least a
>> few bytes and realistically 1K of space on the devices that is free to
>> be used by md to record information about device state to allow arrays to
>> be assembled correctly.
>>
>> One idea I had was to get the filesystem to allocate a block and make that
>> available to md, then md would copy the data from the last block of the
>> device into that block and redirect all IO requests aimed at the
>> last block so that they really access the relocated block.  Then md puts
>> its metadata in that last block.
>>
>> This could work but is a little too error-prone for my liking.  e.g.
>> if you fsck the device, you suddenly lose your guarantee that
>> the filesystem isn't going to write to that relocation block.
>>
>> I think it could only work if mdadm can inspect the device and ensure
>> that the last block isn't part of any partition, or any active filesystem.
>> This is possible, but messy.
>>
>> e.g. on my notebook which has a 250Gig drive whatever I used to partition
>> it (cfdisk?) insisted on using multiples of cylinders for partitions
>> (what an out-of-date concept!) and as the reported geometry is
>>
>> Disk /dev/sda: 250.0 GB, 250059350016 bytes
>> 255 heads, 63 sectors/track, 30401 cylinders
>>
>> There are 5013 unused sectors at the end - plenty of room for
>> md to put some metadata.  But if someone else had used sfdisk,
>> I think they would find no spare space and be unhappy.
>>
>> Maybe it is sufficient to support just those people who are
>> lucky enough to not be using the whole device...
>>
>>
>> So it might happen, but it is just a little too easy to stick this
>> one in the too-hard basket.
>>     
>
> The main reason here is real life. I have seen many cases where a
> system was installed on a single disk and later it would be nice to make it
> redundant (as most sysadmins say: it doesn't work on Linux even though it
> works on Windows - just put in a new disk and make it a mirror).
> I don't know the technical details, but it would be a very useful feature.
>
>   
I think you can get there for normal filesystem data by creating a 
raid1 on a new drive with the second member left 'missing'. Then copy the 
data from the unmirrored drive to the mirrored filesystem, unmount the 
original drive and mount the array in its place, and add the original 
drive to the new array. This is ugly, and a verified backup and restore 
is better, but it can be done.
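
A rough sketch of that sequence, assuming the existing data is on /dev/sdb1 (mounted
on /data) and /dev/sdc1 is the new disk - the device names and the ext3 filesystem
are only illustrative:

   mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdc1   # degraded mirror on the new disk
   mkfs.ext3 /dev/md0
   mount /dev/md0 /mnt/new
   cp -a /data/. /mnt/new/            # copy while the original is still mounted
   umount /data /mnt/new
   mount /dev/md0 /data               # switch over to the mirrored copy
   mdadm /dev/md0 --add /dev/sdb1     # the old disk becomes the second half and resyncs

Nothing on the old disk is overwritten until the final --add, so the original copy
survives until the new one has been checked.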

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:13   ` Steve Fairbairn
  2009-02-12  9:46     ` Keld Jørn Simonsen
  2009-02-12 22:57     ` Dan Williams
@ 2009-02-13 16:56     ` Bill Davidsen
  2 siblings, 0 replies; 24+ messages in thread
From: Bill Davidsen @ 2009-02-13 16:56 UTC (permalink / raw)
  To: Steve Fairbairn; +Cc: linux-raid

Steve Fairbairn wrote:
> Keld Jørn Simonsen wrote:
>>
>> I would rather have functionality to convert raid10 to raid5.
>> raid1 should be deprecated, as raid10,n2 is for all purposes the same
>> but with a better implementation and performance, and raid10,f2 and raid10,o2
>> are even better.  Nobody should use raid1 anymore.
>>
> Complete ignorance of raid10 here, but is raid10,<anything> bootable, 
> like raid1 is?  I use raid1 on my root and boot partitions.

Note: if you use an initrd you can get away with raid1 on only the boot 
partition (I do with Fedora/GRUB). And most "recovery CDs" will not use 
a raid10 swap without starting it by hand, important only on small 
memory machines.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 





* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 10:53       ` Julian Cowley
@ 2009-02-13 16:54         ` Bill Davidsen
  2009-02-16  5:35           ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Bill Davidsen @ 2009-02-13 16:54 UTC (permalink / raw)
  To: Julian Cowley; +Cc: Keld Jørn Simonsen, linux-raid, Neil Brown

Julian Cowley wrote:
> That would be a mistake, too.  It's not tied to RAID 1+0, but it's not 
> tied down to RAID 1 either, and its name should reflect that.  I'd 
> prefer something like raid1+0E, raid0E, raid1+1, or even just raid11.  
> The latter term is not used anywhere I know of, so it stands out.
>
Storage Computer (NH) has a trademark on one of the raid levels, from 
memory raid7.  Raid5E is used to denote a raid5 with distributed spare 
(something I would love to see in md).

> The horse has already probably left the barn on this one, though.
>
And in this case locking the barn door after the horse has left is 
probably a path of least confusion.

> Perhaps instead the documentation in mdadm(8) and md(4) could be 
> updated to mention that raid10 is a combination of the concepts in 
> RAID 1 and RAID 0, but is generalized enough so that it can be done 
> with just two drives at a minimum.  That would have caught my eye, at 
> least.

Good idea.

Ob. plug for raid5E: the advantages of raid5E are two-fold. The most 
obvious is that head motion is spread over N+2 drives (N being number of 
data drives) which improves performance quite a bit in the common small 
business case of 4-5 drive setups. It also puts some use on each drive, 
so you don't suddenly start using a drive which may have been spun down 
for a month, may have developed issues since SMART was last run, etc.

While the distributed spare idea could be extended to raid6 and raid10, 
the mapping gets complex. Since Neil is currently adding code to allow 
for orders other than sequential in raid6, being able to quickly deploy 
the spare on a once-per-stripe basis might at least get him to rethink 
the concept.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:13   ` Steve Fairbairn
  2009-02-12  9:46     ` Keld Jørn Simonsen
@ 2009-02-12 22:57     ` Dan Williams
  2009-02-13 16:56     ` Bill Davidsen
  2 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2009-02-12 22:57 UTC (permalink / raw)
  To: Steve Fairbairn; +Cc: linux-raid

On Thu, Feb 12, 2009 at 2:13 AM, Steve Fairbairn
<steve@fairbairn-family.com> wrote:
> Keld Jørn Simonsen wrote:
>>
>> I would rather have functionality to convert raid10 to raid5.
>> raid1 should be deprecated, as raid10,n2 is for all purposes the same
>> but with a better implementation and performance, and raid10,f2 and raid10,o2
>> are even better.  Nobody should use raid1 anymore.
>>
> Complete ignorance of raid10 here, but is raid10,<anything> bootable, like
> raid1 is?  I use raid1 on my root and boot partitions.
>

I would point out that raid1 is not, strictly speaking, bootable.
Yes, it happens to work because the bootloader can treat each disk as
a standalone device, but the bootloader has no idea if a given disk is
failed, rebuilding, or in-sync.  To reliably boot from raid you need
an option-rom that understands the on disk metadata format and exposes
the array as a single device to the bootloader.

Regards,
Dan


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 15:28         ` Wil Reichert
@ 2009-02-12 17:44           ` Keld Jørn Simonsen
  0 siblings, 0 replies; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12 17:44 UTC (permalink / raw)
  To: Wil Reichert; +Cc: NeilBrown, linux-raid

On Thu, Feb 12, 2009 at 07:28:59AM -0800, Wil Reichert wrote:
> On Thu, Feb 12, 2009 at 2:45 AM, NeilBrown <neilb@suse.de> wrote:
> > On Thu, February 12, 2009 8:53 pm, Keld Jørn Simonsen wrote:
> >> On Thu, Feb 12, 2009 at 08:21:12PM +1100, NeilBrown wrote:
> >>> On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
> >>> > On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
> >>> >> Comments and testing very welcome.
> >>> >
> >>> > I would rather have functionality to convert raid10 to raid5.
> >>> > raid1 should be deprecated, as raid10,n2 is for all purposes the same
> >>> > but with a better implementation and performance, and raid10,f2 and raid10,o2
> >>> > are even better.  Nobody should use raid1 anymore.
> >>>
> >>> That is a fairly simplistic view.
> >>
> >> It was also formulated to provoke some thoughts.
> >>
> >>> raid1 supports --write-mostly and --write-behind which raid10 is
> >>> unlikely
> >>> ever to support.
> >>
> >> why?
> >>
> >> Anyway would it not be possible that this functionality be implemented
> >> for raid10,n2?
> >
> > It would be possible, but it might not be sensible.
> >
> > write-mostly and write-behind only really make sense when you have the
> > clear distinction between drives that raid1 gives you.
> > These options don't make sense for raid10 in general.  Only in very specific
> > layouts.
> > If you like, raid1 is an implementation of a specific raid10 layout,
> > where it makes sense to add some extra functionality.
> >
> >>
> >> Some code to grow raid10 would also be desirable. Maybe it is some of
> >> the same operations that need to be applied: getting the old data in,
> >> have it restructured for the new format, in a safe way, and possibly
> >> with the help of an extra disk, or possibly not. It sounds non-trivial
> >> to me too.
> >
> > What particular growth scenarios are you interested in?
> > Just adding a drive and restriping onto that?  i.e keep that
> > same nominal layout but increase 'raid-disks'?
> >
> > That would be quite similar to the raid5 grow operation so it shouldn't
> > be too hard to achieve.
> > A 'grow' which changed the layout (e.g. near to far) would be a lot
> > harder.
> 
> I'd previously seen the wikipedia article regarding Linux RAID10 and
> its mention of the 3 disk case.  Out of academic curiosity, how does
> the 2 disk RAID10 work?  Is it just a matter of having 2 identical
> volumes and reading subsequent stripes from the alternate drives?  Or
> is the algorithm more complicated?

There are 3 different layouts for raid10: near, far and offset.
Basically raid10 with 2 disks works as RAID 1 - the "near" layout and Linux
raid1 are almost equivalent. The far and offset layouts place the blocks
somewhat differently on the disks, which should lead to faster operation
for some common tasks.
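
As a concrete illustration (the device names are examples only), a 2-disk raid10
can be created in either layout with mdadm:

   # near layout - behaves essentially like raid1
   mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=2 /dev/sda2 /dev/sdb2
   # far layout - same redundancy, but large sequential reads are striped over both disks
   mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=2 /dev/sda3 /dev/sdb3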

best regards
keld


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 10:45       ` NeilBrown
  2009-02-12 11:11         ` Keld Jørn Simonsen
@ 2009-02-12 15:28         ` Wil Reichert
  2009-02-12 17:44           ` Keld Jørn Simonsen
  1 sibling, 1 reply; 24+ messages in thread
From: Wil Reichert @ 2009-02-12 15:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: Keld Jørn Simonsen, linux-raid

On Thu, Feb 12, 2009 at 2:45 AM, NeilBrown <neilb@suse.de> wrote:
> On Thu, February 12, 2009 8:53 pm, Keld Jørn Simonsen wrote:
>> On Thu, Feb 12, 2009 at 08:21:12PM +1100, NeilBrown wrote:
>>> On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
>>> > On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
>>> >> Comments and testing very welcome.
>>> >
>>> > I would rather have functionality to convert raid10 to raid5.
>>> > raid1 should be deprecated, as raid10,n2 is for all purposes the same
>>> > but with a better implementation and performance, and raid10,f2 and raid10,o2
>>> > are even better.  Nobody should use raid1 anymore.
>>>
>>> That is a fairly simplistic view.
>>
>> It was also formulated to provoke some thoughts.
>>
>>> raid1 supports --write-mostly and --write-behind which raid10 is
>>> unlikely
>>> ever to support.
>>
>> why?
>>
>> Anyway would it not be possible that this functionality be implemented
>> for raid10,n2?
>
> It would be possible, but it might not be sensible.
>
> write-mostly and write-behind only really make sense when you have the
> clear distinction between drives that raid1 gives you.
> These options don't make sense for raid10 in general.  Only in very specific
> layouts.
> If you like, raid1 is an implementation of a specific raid10 layout,
> where it makes sense to add some extra functionality.
>
>>
>> Some code to grow raid10 would also be desirable. Maybe it is some of
>> the same operations that need to be applied: getting the old data in,
>> have it restructured for the new format, in a safe way, and possibly
>> with the help of an extra disk, or possibly not. It sounds non-trivial
>> to me too.
>
> What particular growth scenarios are you interested in?
> Just adding a drive and restriping onto that?  i.e keep that
> same nominal layout but increase 'raid-disks'?
>
> That would be quite similar to the raid5 grow operation so it shouldn't
> be too hard to achieve.
> A 'grow' which changed the layout (e.g. near to far) would be a lot
> harder.

I'd previously seen the wikipedia article regarding Linux RAID10 and
its mention of the 3 disk case.  Out of academic curiosity, how does
the 2 disk RAID10 work?  Is it just a matter of having 2 identical
volumes and reading subsequent stripes from the alternate drives?  Or
is the algorithm more complicated?

Wil


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 10:40   ` NeilBrown
@ 2009-02-12 11:17     ` Farkas Levente
  2009-02-13 17:02       ` Bill Davidsen
  0 siblings, 1 reply; 24+ messages in thread
From: Farkas Levente @ 2009-02-12 11:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown wrote:
> On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
>> NeilBrown wrote:
>>> Hi,
>>>  following is my current patch queue for 2.6.30, in case anyone would
>>> like to review or otherwise comment.
>>> They should show up in -next shortly.
>>>
>>> Probably the most interesting are the last few which provide support
>>> for converting a raid1 into a raid5, and a raid5 into a raid6.
>>> I plan to do some more work here so the code might change a bit before
>>> final submission, as I work out how best to factor the code.
>>>
>>> mdadm doesn't currently support these conversions, but you can
>>> simply
>>>    echo raid5 > /sys/block/md0/md/level
>>> to change a 2-drive raid1 into a raid5.  Similarly for 5->6
>> any plan for non-raid to raid1, or anything else like in Windows where one can
>> convert a normal partition into a mirrored one online?
> 
> No plan exactly, but I do think about it from time to time.
> 
> There are two problems with this, and solving just one of them
> doesn't help you much.  So you really have to solve both at once,
> which reduces the motivation towards either ....
> 
> One problem is the task of changing the implementation of the device
> underneath the filesystem without the filesystem needing to care.
> 
> i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
> IO, then mdadm steps in and magically changes things so that /dev/sda1
> is now on a raid1 array which happens to access the same data, but
> through a different code path.
> Figuring out exactly which data structure to install the redirection in,
> and how to do it in a way that is guaranteed to be safe, is non-trivial.
> 
> dm has a mechanism to change the implementation under a given dm
> device, and md now has a mechanism to change the implementation
> under a given md device.  But generalising that to 'any device' is
> not entirely trivial.  Now that I have done it for md I'm in a better
> position to understand how it might be done.
> 
> The other problem is where to store the metadata.  You need at least a
> few bytes and realistically 1K of space on the devices that is free to
> be used by md to record information about device state to allow arrays to
> be assembled correctly.
> 
> One idea I had was to get the filesystem to allocate a block and make that
> available to md, then md would copy the data from the last block of the
> device into that block and redirect all IO requests aimed at the
> last block so that they really access the relocated block.  Then md puts
> its metadata in that last block.
> 
> This could work but is a little too error-prone for my liking.  e.g.
> if you fsck the device, you suddenly lose your guarantee that
> the filesystem isn't going to write to that relocation block.
> 
> I think it could only work if mdadm can inspect the device and ensure
> that the last block isn't part of any partition, or any active filesystem.
> This is possible, but messy.
> 
> e.g. on my notebook which has a 250Gig drive whatever I used to partition
> it (cfdisk?) insisted on using multiples of cylinders for partitions
> (what an out-of-date concept!) and as the reported geometry is
> 
> Disk /dev/sda: 250.0 GB, 250059350016 bytes
> 255 heads, 63 sectors/track, 30401 cylinders
> 
> There are 5013 unused sectors at the end - plenty of room for
> md to put some metadata.  But if someone else had used sfdisk,
> I think they would find no spare space and be unhappy.
> 
> Maybe it is sufficient to support just those people who are
> lucky enough to not be using the whole device...
> 
> 
> So it might happen, but it is just a little too easy to stick this
> one in the too-hard basket.

The main reason here is real life. I have seen many cases where a
system was installed on a single disk and later it would be nice to make it
redundant (as most sysadmins say: it doesn't work on Linux even though it
works on Windows - just put in a new disk and make it a mirror).
I don't know the technical details, but it would be a very useful feature.

-- 
  Levente                               "Si vis pacem para bellum!"


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 10:52       ` NeilBrown
@ 2009-02-12 11:16         ` Keld Jørn Simonsen
  0 siblings, 0 replies; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12 11:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: Steve Fairbairn, linux-raid

On Thu, Feb 12, 2009 at 09:52:32PM +1100, NeilBrown wrote:
> On Thu, February 12, 2009 8:46 pm, Keld Jørn Simonsen wrote:
> > On Thu, Feb 12, 2009 at 09:13:17AM +0000, Steve Fairbairn wrote:
> >> Keld Jørn Simonsen wrote:
> >> >
> >> >I would rather have functionality to convert raid10 to raid5.
> >> >raid1 should be deprecated, as raid10,n2 is for all purposes the same
> >> >but with a better implementation and performance, and raid10,f2 and raid10,o2
> >> >are even better.  Nobody should use raid1 anymore.
> >> >
> >> Complete ignorance of raid10 here, but is raid10,<anything> bootable,
> >> like raid1 is?  I use raid1 on my root and boot partitions.
> >
> > AFAIK, raid10,n2 in default mode (superblock etc) is bootable, as it
> > looks like two copies of a normal FS. I think this was even reported
> > on this list at some time.
> >
> > You are not the only one that does not know much about raid10. I think
> > most Linux administrators don't.  And other system administrators most
> > likely don't either.
> >
> > Maybe we should rename raid10 to raid1?
> >
> > Raid10 should just be an enhanced raid1. And I understand from Neil that
> > raid10,o2 is mostly done because it is a standard raid1 layout. So it is
> > strange that it is not available with raid1 in Linux. And I also think
> > that the raid10,f2 layout is available from some HW raid controllers, as
> > their implementation of raid1. So all what is in raid10 is other places
> > considered raid1 stuff.
> >
> 
> The 'offset' layout came about to be able to support a DDF format
> which is called:
> 
> 4.2.18 Integrated Offset Stripe Mirroring (PRL=11, RLQ=01)
> 
> (4.2.18 is the section of the document
>  PRL is Primary Raid Level
>  RLQ is RAID Level Qualifier
> )
> There is also
> 
> 4.2.17 Integrated Adjacent Stripe Mirroring (PRL= 11, RLQ=00)
> 
> which is essentially the same as our n2 layout.
> 
> You should see their
> 4.3.4 Spanned Secondary RAID Level (SRL=03)
> 
> Though.  That would be really .. interesting to implement.
> 
> 
> You can download the DDF spec at
> 
> http://www.snia.org/tech_activities/standards/curr_standards/ddf/
> 
> NeilBrown

I did look at the spec, and I could not find something that looked like
f2. I then sent them a mail suggesting standardizing raid10,f2 - some
months ago. No answer (yet...) Am I right that raid10,f2 is not
described in their spec? A bit odd. The idea behind raid10,f2 is quite
straightforward, and gives good results.

Best regards
keld


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12 10:45       ` NeilBrown
@ 2009-02-12 11:11         ` Keld Jørn Simonsen
  2009-02-12 15:28         ` Wil Reichert
  1 sibling, 0 replies; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12 11:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Feb 12, 2009 at 09:45:38PM +1100, NeilBrown wrote:
> On Thu, February 12, 2009 8:53 pm, Keld Jørn Simonsen wrote:
> > On Thu, Feb 12, 2009 at 08:21:12PM +1100, NeilBrown wrote:
> >> On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
> >> > On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
> >> >> Comments and testing very welcome.
> >> >
> >> > I would rather have functionality to convert raid10 to raid5.
> >> > raid1 should be deprecated, as raid10,n2 is for all purposes the same
> >> > but with a better implementation and performance, and raid10,f2 and raid10,o2
> >> > are even better.  Nobody should use raid1 anymore.
> >>
> >> That is a fairly simplistic view.
> >
> > It was also formulated to provoke some thoughts.
> >
> >> raid1 supports --write-mostly and --write-behind which raid10 is
> >> unlikely
> >> ever to support.
> >
> > why?
> >
> > Anyway would it not be possible that this functionality be implemented
> > for raid10,n2?
> 
> It would be possible, but it might not be sensible.
> 
> write-mostly and write-behind only really make sense when you have the
> clear distinction between drives that raid1 gives you.
> These options don't make sense for raid10 in general.  Only in very specific
> layouts.
> If you like, raid1 is an implementation of a specific raid10 layout,
> where it makes sense to add some extra functionality.

Yes, I understand that.

> >
> > Some code to grow raid10 would also be desirable. Maybe it is some of
> > the same operations that need to be applied: getting the old data in,
> > have it restructured for the new format, in a safe way, and possibly
> > with the help of an extra disk, or possibly not. It sounds non-trivial
> > to me too.
> 
> What particular growth scenarios are you interested in?
> Just adding a drive and restriping onto that?  i.e keep that
> same nominal layout but increase 'raid-disks'?

yes. 

> That would be quite similar to the raid5 grow operation so it shouldn't
> be too hard to achieve.

Yes,  I was thinking about growing a raid10,f2 and that should not be
that difficult. You could rebuild each of the raid-0 layers one at a time,
from the other raid-0 layer. And a way to make it fast would be to use some bigger
buffers, say 20 MB, to minimize head moves.

I have some needs for such growing for one of my servers.

I think some similar techniques could be used to grow n2 and o2,
given that there is a clear strategy for filling up the new layout that
requires less space than the old one, so the old space can be reused.
This should even be possible for growing by more than one drive.

> A 'grow' which changed the layout (e.g. near to far) would be a lot
> harder.

Hmm, my ideas were that it should not be difficult, but it could be slow.
One could have a specific bitmap that could track where all data was
during the grow operation. But it could involve some intermediate
storing...

Best regards
keld



* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:46     ` Keld Jørn Simonsen
  2009-02-12 10:52       ` NeilBrown
@ 2009-02-12 10:53       ` Julian Cowley
  2009-02-13 16:54         ` Bill Davidsen
  1 sibling, 1 reply; 24+ messages in thread
From: Julian Cowley @ 2009-02-12 10:53 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: linux-raid


On Thu, 12 Feb 2009, Keld Jørn Simonsen wrote:
> On Thu, Feb 12, 2009 at 09:13:17AM +0000, Steve Fairbairn wrote:
>> Keld Jørn Simonsen wrote:
>>>
>>> I would rather have functionality to convert raid10 to raid5.
>>> raid1 should be deprecated, as raid10,n2 is for all purposes the same
>>> but with a better implementation and performance, and raid10,f2 and raid10,o2
>>> are even better.  Nobody should use raid1 anymore.

Interesting.

>> Complete ignorance of raid10 here, but is raid10,<anything> bootable,
>> like raid1 is?  I use raid1 on my root and boot partitions.
>
> AFAIK, raid10,n2 in default mode (superblock etc) is bootable, as it
> looks like two copies of a normal FS. I think this was even reported
> on this list at some time.
>
> You are not the only one that does not know much about raid10. I think
> most Linux administrators don't.  And other system administrators most
> likely don't either.

Probably because of its name, which is too close to RAID 1+0 and therefore 
can easily be ignored in situations where RAID 1+0 normally can't be used.

I always assumed (without looking into it) that raid10 was supposed to be 
a replacement/improvement on RAID 1+0, which requires a minimum of 4 
drives.  I'd seen the wikipedia page where 3 drives are used for raid10, 
but it never occurred to me that raid10 is such a generalization that it 
could be used with just two drives.

> Maybe we should rename raid10 to raid1?

That would be a mistake, too.  It's not tied to RAID 1+0, but it's not 
tied down to RAID 1 either, and its name should reflect that.  I'd prefer 
something like raid1+0E, raid0E, raid1+1, or even just raid11.  The latter 
term is not used anywhere I know of, so it stands out.

The horse has already probably left the barn on this one, though.

Perhaps instead the documentation in mdadm(8) and md(4) could be updated 
to mention that raid10 is a combination of the concepts in RAID 1 and RAID 
0, but is generalized enough so that it can be done with just two drives 
at a minimum.  That would have caught my eye, at least.


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:46     ` Keld Jørn Simonsen
@ 2009-02-12 10:52       ` NeilBrown
  2009-02-12 11:16         ` Keld Jørn Simonsen
  2009-02-12 10:53       ` Julian Cowley
  1 sibling, 1 reply; 24+ messages in thread
From: NeilBrown @ 2009-02-12 10:52 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Steve Fairbairn, linux-raid

On Thu, February 12, 2009 8:46 pm, Keld Jørn Simonsen wrote:
> On Thu, Feb 12, 2009 at 09:13:17AM +0000, Steve Fairbairn wrote:
>> Keld Jørn Simonsen wrote:
>> >
>> >I would rather have functionality to convert raid10 to raid5.
>> >raid1 should be deprecated, as raid10,n2 is for all purposes the same
>> >but with a better implementation and performance, and raid10,f2 and raid10,o2
>> >are even better.  Nobody should use raid1 anymore.
>> >
>> Complete ignorance of raid10 here, but is raid10,<anything> bootable,
>> like raid1 is?  I use raid1 on my root and boot partitions.
>
> AFAIK, raid10,n2 in default mode (superblock etc) is bootable, as it
> looks like two copies of a normal FS. I think this was even reported
> on this list at some time.
>
> You are not the only one that does not know much about raid10. I think
> most Linux administrators don't.  And other system administrators most
> likely don't either.
>
> Maybe we should rename raid10 to raid1?
>
> Raid10 should just be an enhanced raid1. And I understand from Neil that
> raid10,o2 is mostly done because it is a standard raid1 layout. So it is
> strange that it is not available with raid1 in Linux. And I also think
> that the raid10,f2 layout is available from some HW raid controllers, as
> their implementation of raid1. So everything that is in raid10 is
> elsewhere considered raid1 stuff.
>

The 'offset' layout came about to be able to support a DDF format
which is called:

4.2.18 Integrated Offset Stripe Mirroring (PRL=11, RLQ=01)

(4.2.18 is the section of the document
 PRL is Primary Raid Level
 RLQ is RAID Level Qualifier
)
There is also

4.2.17 Integrated Adjacent Stripe Mirroring (PRL= 11, RLQ=00)

which is essentially the same as our n2 layout.

You should see their
4.3.4 Spanned Secondary RAID Level (SRL=03)

Though.  That would be really .. interesting to implement.


You can download the DDF spec at

http://www.snia.org/tech_activities/standards/curr_standards/ddf/

NeilBrown




* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:53     ` Keld Jørn Simonsen
@ 2009-02-12 10:45       ` NeilBrown
  2009-02-12 11:11         ` Keld Jørn Simonsen
  2009-02-12 15:28         ` Wil Reichert
  0 siblings, 2 replies; 24+ messages in thread
From: NeilBrown @ 2009-02-12 10:45 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: linux-raid

On Thu, February 12, 2009 8:53 pm, Keld Jørn Simonsen wrote:
> On Thu, Feb 12, 2009 at 08:21:12PM +1100, NeilBrown wrote:
>> On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
>> > On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
>> >> Comments and testing very welcome.
>> >
>> > I would rather have functionality to convert raid10 to raid5.
>> > raid1 should be deprecated, as raid10,n2 is for all purposes the same
>> > but with a better implementation and performance, and raid10,f2 and raid10,o2
>> > are even better.  Nobody should use raid1 anymore.
>>
>> That is a fairly simplistic view.
>
> It was also formulated to provoke some thoughts.
>
>> raid1 supports --write-mostly and --write-behind which raid10 is
>> unlikely
>> ever to support.
>
> why?
>
> Anyway would it not be possible that this functionality be implemented
> for raid10,n2?

It would be possible, but it might not be sensible.

write-mostly and write-behind only really make sense when you have the
clear distinction between drives that raid1 gives you.
These options don't make sense for raid10 in general.  Only in very specific
layouts.
If you like, raid1 is an implementation of a specific raid10 layout,
where it makes sense to add some extra functionality.

>
> Some code to grow raid10 would also be desirable. Maybe it is some of
> the same operations that need to be applied: getting the old data in,
> have it restructured for the new format, in a safe way, and possibly
> with the help of an extra disk, or possibly not. It sounds non-trivial
> to me too.

What particular growth scenarios are you interested in?
Just adding a drive and restriping onto that?  i.e keep that
same nominal layout but increase 'raid-disks'?

That would be quite similar to the raid5 grow operation so it shouldn't
be too hard to achieve.
A 'grow' which changed the layout (e.g. near to far) would be a lot
harder.

NeilBrown




* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:42 ` Farkas Levente
@ 2009-02-12 10:40   ` NeilBrown
  2009-02-12 11:17     ` Farkas Levente
  0 siblings, 1 reply; 24+ messages in thread
From: NeilBrown @ 2009-02-12 10:40 UTC (permalink / raw)
  To: Farkas Levente; +Cc: linux-raid

On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
> NeilBrown wrote:
>> Hi,
>>  following is my current patch queue for 2.6.30, in case anyone would
>> like to review or otherwise comment.
>> They should show up in -next shortly.
>>
>> Probably the most interesting are the last few which provide support
>> for converting a raid1 into a raid5, and a raid5 into a raid6.
>> I plan to do some more work here so the code might change a bit before
>> final submission, as I work out how best to factor the code.
>>
>> mdadm doesn't currently support these conversions, but you can
>> simply
>>    echo raid5 > /sys/block/md0/md/level
>> to change a 2-drive raid1 into a raid5.  Similarly for 5->6
>
> any plan for non-raid to raid1, or anything else like in Windows where one can
> convert a normal partition into a mirrored one online?

No plan exactly, but I do think about it from time to time.

There are two problems with this, and solving just one of them
doesn't help you much.  So you really have to solve both at once,
which reduces the motivation towards either ....

One problem is the task of changing the implementation of the device
underneath the filesystem without the filesystem needing to care.

i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
IO, then mdadm steps in and magically changes things so that /dev/sda1
is now on a raid1 array which happens to access the same data, but
through a different code path.
Figuring out exactly which data structure to install the redirection in,
and how to do it in a way that is guaranteed to be safe, is non-trivial.

dm has a mechanism to change the implementation under a given dm
device, and md now has a mechanism to change the implementation
under a given md device.  But generalising that to 'any device' is
not entirely trivial.  Now that I have done it for md I'm in a better
position to understand how it might be done.

The other problem is where to store the metadata.  You need at least a
few bytes and realistically 1K of space on the devices that is free to
be used by md to record information about device state to allow arrays to
be assembled correctly.

One idea I had was to get the filesystem to allocate a block and make that
available to md, then md would copy the data from the last block of the
device into that block and redirect all IO requests aimed at the
last block so that they really access the relocated block.  Then md puts
its metadata in that last block.

This could work but is a little too error-prone for my liking.  e.g.
if you fsck the device, you suddenly lose your guarantee that
the filesystem isn't going to write to that relocation block.

I think it could only work if mdadm can inspect the device and ensure
that the last block isn't part of any partition, or any active filesystem.
This is possible, but messy.

e.g. on my notebook which has a 250Gig drive whatever I used to partition
it (cfdisk?) insisted on using multiples of cylinders for partitions
(what an out-of-date concept!) and as the reported geometry is

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders

There are 5013 unused sectors at the end - plenty of room for
md to put some metadata.  But if someone else had used sfdisk,
I think they would find no spare space and be unhappy.
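
A quick way to measure that unused tail (option spellings may differ between
util-linux versions) is:

   blockdev --getsz /dev/sda     # whole-disk size in 512-byte sectors
   sfdisk -l -uS /dev/sda        # partition table with start/end in sectors
   # unused tail = total sectors - (end sector of the last partition + 1)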

Maybe it is sufficient to support just those people who are
lucky enough to not be using the whole device...


So it might happen, but it is just a little too easy to stick this
one in the too-hard basket.

Thanks,
NeilBrown





* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:21   ` NeilBrown
@ 2009-02-12  9:53     ` Keld Jørn Simonsen
  2009-02-12 10:45       ` NeilBrown
  0 siblings, 1 reply; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12  9:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Feb 12, 2009 at 08:21:12PM +1100, NeilBrown wrote:
> On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
> > On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
> >> Comments and testing very welcome.
> >
> > I would rather have functionality to convert raid10 to raid5.
> > raid1 should be deprecated, as raid10,n2 is for all purposes the same
> > but with a better implementation and performance, and raid10,f2 and raid10,o2
> > are even better.  Nobody should use raid1 anymore.
> 
> That is a fairly simplistic view.

It was also formulated to provoke some thoughts.

> raid1 supports --write-mostly and --write-behind which raid10 is unlikely
> ever to support.

why?

Anyway would it not be possible that this functionality be implemented
for raid10,n2?

> Certainly in many cases raid10 is just as good or better than raid1
> though.
> 
> Certainly a raid10->raid5 conversion for a 2-drive n2 configuration
> is trivial to arrange.  Other conversions are less likely to be supported
> as they require significant non-trivial rearrangement of data.

Yes, possibly.


Some code to grow raid10 would also be desirable. Maybe it is some of
the same operations that need to be applied: getting the old data in,
have it restructured for the new format, in a safe way, and possibly
with the help of an extra disk, or possibly not. It sounds non-trivial
to me too.

Best regards
keld


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  9:13   ` Steve Fairbairn
@ 2009-02-12  9:46     ` Keld Jørn Simonsen
  2009-02-12 10:52       ` NeilBrown
  2009-02-12 10:53       ` Julian Cowley
  2009-02-12 22:57     ` Dan Williams
  2009-02-13 16:56     ` Bill Davidsen
  2 siblings, 2 replies; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12  9:46 UTC (permalink / raw)
  To: Steve Fairbairn; +Cc: linux-raid

On Thu, Feb 12, 2009 at 09:13:17AM +0000, Steve Fairbairn wrote:
> Keld Jørn Simonsen wrote:
> >
> >I would rather have functionality to convert raid10 to raid5.
> >raid1 should be deprecated, as raid10,n2 is for all purposes the same
> >but with a better implementation and performance, and raid10,f2 and raid10,o2
> >are even better.  Nobody should use raid1 anymore.
> >
> Complete ignorance of raid10 here, but is raid10,<anything> bootable, 
> like raid1 is?  I use raid1 on my root and boot partitions.

AFAIK, raid10,n2 in default mode (superblock etc) is bootable, as it
looks like two copies of a normal FS. I think this was even reported 
on this list at some time. 
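
For example (the device names are illustrative), creating the array with the
superblock stored at the end of the components keeps a plain filesystem at the
start of each disk, which is what a boot loader ends up reading:

   mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=2 \
         --metadata=1.0 /dev/sda1 /dev/sdb1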

You are not the only one that does not know much about raid10. I think
most Linux administrators don't.  And other system administrators most
likely don't either.

Maybe we should rename raid10 to raid1?

Raid10 should just be an enhanced raid1. And I understand from Neil that
raid10,o2 is mostly done because it is a standard raid1 layout. So it is
strange that it is not available with raid1 in Linux. And I also think
that the raid10,f2 layout is available from some HW raid controllers, as
their implementation of raid1. So everything that is in raid10 is
elsewhere considered raid1 stuff.

Or we could merge the two drivers. 

I think we will never get people in general to know about raid10 as well
as they know about raid1, not even by far. 

Best regards
keld


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  3:10 NeilBrown
  2009-02-12  8:11 ` Keld Jørn Simonsen
@ 2009-02-12  9:42 ` Farkas Levente
  2009-02-12 10:40   ` NeilBrown
  1 sibling, 1 reply; 24+ messages in thread
From: Farkas Levente @ 2009-02-12  9:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown wrote:
> Hi,
>  following is my current patch queue for 2.6.30, in case anyone would
> like to review or otherwise comment.
> They should show up in -next shortly.
> 
> Probably the most interesting are the last few which provide support
> for converting a raid1 into a raid5, and a raid5 into a raid6.
> I plan to do some more work here so the code might change a bit before
> final submission, as I work out how best to factor the code.
> 
> mdadm doesn't currently support these conversions, but you can
> simply
>    echo raid5 > /sys/block/md0/md/level
> to change a 2-drive raid1 into a raid5.  Similarly for 5->6

any plan for non-raid to raid1, or anything else like in Windows where one can
convert a normal partition into a mirrored one online?

-- 
  Levente                               "Si vis pacem para bellum!"


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  8:11 ` Keld Jørn Simonsen
  2009-02-12  9:13   ` Steve Fairbairn
@ 2009-02-12  9:21   ` NeilBrown
  2009-02-12  9:53     ` Keld Jørn Simonsen
  1 sibling, 1 reply; 24+ messages in thread
From: NeilBrown @ 2009-02-12  9:21 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: linux-raid

On Thu, February 12, 2009 7:11 pm, Keld Jørn Simonsen wrote:
> On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
>> Comments and testing very welcome.
>
> I would rather have functionality to convert raid10 to raid5.
> raid1 should be deprecated, as raid10,n2 is for all purposes the same
> but with a better implementation and performance, and raid10,f2 and raid10,o2
> are even better.  Nobody should use raid1 anymore.

That is a fairly simplistic view.
raid1 supports --write-mostly and --write-behind which raid10 is unlikely
ever to support.
Certainly in many cases raid10 is just as good or better than raid1
though.

Certainly a raid10->raid5 conversion for a 2-drive n2 configuration
is trivial to arrange.  Other conversions are less likely to be supported
as they require significant non-trivial rearrangement of data.

Thanks for the comment.

NeilBrown



* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  8:11 ` Keld Jørn Simonsen
@ 2009-02-12  9:13   ` Steve Fairbairn
  2009-02-12  9:46     ` Keld Jørn Simonsen
                       ` (2 more replies)
  2009-02-12  9:21   ` NeilBrown
  1 sibling, 3 replies; 24+ messages in thread
From: Steve Fairbairn @ 2009-02-12  9:13 UTC (permalink / raw)
  To: linux-raid

Keld Jørn Simonsen wrote:
> 
> I would rather have functionality to convert raid10 to raid5.
> raid1 should be deprecated, as raid10,n2 is for all purposes the same
> but with a better implementation and performance, and raid10,f2 and raid10,o2
> are even better.  Nobody should use raid1 anymore.
> 
Complete ignorance of raid10 here, but is raid10,<anything> bootable, 
like raid1 is?  I use raid1 on my root and boot partitions.

Steve.

PS.  Apologies to Keld for sending this directly to him first time.  I 
never remember that this group is reply to sender, not reply to group.


* Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
  2009-02-12  3:10 NeilBrown
@ 2009-02-12  8:11 ` Keld Jørn Simonsen
  2009-02-12  9:13   ` Steve Fairbairn
  2009-02-12  9:21   ` NeilBrown
  2009-02-12  9:42 ` Farkas Levente
  1 sibling, 2 replies; 24+ messages in thread
From: Keld Jørn Simonsen @ 2009-02-12  8:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Feb 12, 2009 at 02:10:10PM +1100, NeilBrown wrote:
> Hi,
>  following is my current patch queue for 2.6.30, in case anyone would
> like to review or otherwise comment.
> They should show up in -next shortly.
> 
> Probably the most interesting are the last few which provide support
> for converting a raid1 into a raid5, and a raid5 into a raid6.
> I plan to do some more work here so the code might change a bit before
> final submission, as I work out how best to factor the code.
> 
> mdadm doesn't currently support these conversions, but you can
> simply
>    echo raid5 > /sys/block/md0/md/level
> to change a 2-drive raid1 into a raid5.  Similarly for 5->6
> 
> The raid6 array created will have a somewhat unusual layout in that
> all the Q blocks will be on the last drive.  Later I'll create
> functionality to restripe the array so that the Q block is rotated
> around all the drives as you would expect.
> 
> Comments and testing very welcome.

I would rather have functionality to convert raid10 to raid5.
raid1 should be deprecated, as raid10,n2 is for all purposes the same
but with a better implementation and performance, and raid10,f2 and raid10,o2
are even better.  Nobody should use raid1 anymore.

Best regards
keld


* [PATCH 00/18] Assorted md patches headed for 2.6.30
@ 2009-02-12  3:10 NeilBrown
  2009-02-12  8:11 ` Keld Jørn Simonsen
  2009-02-12  9:42 ` Farkas Levente
  0 siblings, 2 replies; 24+ messages in thread
From: NeilBrown @ 2009-02-12  3:10 UTC (permalink / raw)
  To: linux-raid

Hi,
 following is my current patch queue for 2.6.30, in case anyone would
like to review or otherwise comment.
They should show up in -next shortly.

Probably the most interesting are the last few which provide support
for converting a raid1 into a raid5, and a raid5 into a raid6.
I plan to do some more work here so the code might change a bit before
final submission, as I work out how best to factor the code.

mdadm doesn't currently support these conversions, but you can
simply
   echo raid5 > /sys/block/md0/md/level
to change a 2-drive raid1 into a raid5.  Similarly for 5->6
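
Spelled out as a shell session, and assuming the array already has (or has been
given) enough member devices for the target level, the conversion path looks
roughly like:

   cat /sys/block/md0/md/level          # reports "raid1" for the 2-drive mirror
   echo raid5 > /sys/block/md0/md/level
   cat /sys/block/md0/md/level          # should now report "raid5"
   echo raid6 > /sys/block/md0/md/level # then raid5 -> raid6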

The raid6 array created will have a somewhat unusual layout in that
all the Q blocks will be on the last drive.  Later I'll create
functionality to restripe the array so that the Q block is rotated
around all the drives as you would expect.

Comments and testing very welcome.

Thanks,
NeilBrown





Thread overview: 24+ messages
     [not found] <7554605.886551236670855947.JavaMail.coremail@bj163app40.163.com>
2009-03-13  1:00 ` [PATCH 00/18] Assorted md patches headed for 2.6.30 Neil Brown
2009-03-10  8:24 jzc-sina
  -- strict thread matches above, loose matches on Subject: below --
2009-02-12  3:10 NeilBrown
2009-02-12  8:11 ` Keld Jørn Simonsen
2009-02-12  9:13   ` Steve Fairbairn
2009-02-12  9:46     ` Keld Jørn Simonsen
2009-02-12 10:52       ` NeilBrown
2009-02-12 11:16         ` Keld Jørn Simonsen
2009-02-12 10:53       ` Julian Cowley
2009-02-13 16:54         ` Bill Davidsen
2009-02-16  5:35           ` Neil Brown
2009-02-16 17:31             ` Nagilum
2009-02-12 22:57     ` Dan Williams
2009-02-13 16:56     ` Bill Davidsen
2009-02-12  9:21   ` NeilBrown
2009-02-12  9:53     ` Keld Jørn Simonsen
2009-02-12 10:45       ` NeilBrown
2009-02-12 11:11         ` Keld Jørn Simonsen
2009-02-12 15:28         ` Wil Reichert
2009-02-12 17:44           ` Keld Jørn Simonsen
2009-02-12  9:42 ` Farkas Levente
2009-02-12 10:40   ` NeilBrown
2009-02-12 11:17     ` Farkas Levente
2009-02-13 17:02       ` Bill Davidsen
