* Balance RAID10 with odd device count
@ 2012-02-21  0:35 Tom Cameron
  2012-02-21  0:45 ` Wes
  2012-02-21  1:13 ` Hugo Mills
  0 siblings, 2 replies; 19+ messages in thread
From: Tom Cameron @ 2012-02-21  0:35 UTC (permalink / raw)
  To: linux-btrfs

I had a four-drive RAID10 btrfs setup and added a fifth drive with the
"btrfs device add" command. Once the device was added, I ran a balance
to redistribute the data across the drives. This resulted in what
looked like an infinite run of the btrfs tool, with data moving back
and forth across the drives over and over. Watching "btrfs filesystem
show", I could see the same pattern repeating in the byte counts on
each of the drives.
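
For reference, the rough sequence was something like the following (device
name and mount point are illustrative, and the exact balance invocation
differs between btrfs-progs versions):

  # add the fifth drive to the existing filesystem
  btrfs device add /dev/sde /mnt/pool
  # rebalance so existing chunks get spread over all five drives
  btrfs filesystem balance /mnt/pool   # newer tools: btrfs balance start /mnt/pool
  # check the per-device byte counts while it runs
  btrfs filesystem show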

It would probably add more complexity to the code, but a check for
loops like this might be handy. While a 5-drive RAID10 array is a
weird configuration (I'm waiting for a case with 6 bays), it _should_
be possible with a filesystem like BTRFS. In my head the distribution
of data would be uneven across the drives, but the duplication and
striping would still work out correctly in the end. I'd imagine it
looking something like this:

D1: A1 B1 C1 D1
D2: A1 B1 C1    E1
D3: A2 B2    D1 E1
D4: A2    C2 D2 E2
D5:    B2 C2 D2 E2

This is obviously oversimplified, but the general idea is the same. I
haven't yet looked into how the "RAID"ing of objects works in BTRFS,
but because it's a filesystem rather than a block-based system, it
should be smart enough to care only about the duplication and striping
of the data, not about block-level or extent-level balancing.
Thoughts?

Thanks in advance!
Tom

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:35 Balance RAID10 with odd device count Tom Cameron
@ 2012-02-21  0:45 ` Wes
  2012-02-21  0:51   ` Wes
                     ` (2 more replies)
  2012-02-21  1:13 ` Hugo Mills
  1 sibling, 3 replies; 19+ messages in thread
From: Wes @ 2012-02-21  0:45 UTC (permalink / raw)
  To: tom; +Cc: linux-btrfs

I've noticed similar behavior even when RAID0'ing an odd number of
devices, which should be the more trivial case in practice.
You would expect something like:
sda A1 B1
sdb A2 B2
sdc A3 B3

or at least, if BTRFS can only handle block pairs,

sda  A1 B2
sdb  A2 C1
sdc  B1 C2

But the end result was that disk usage and reporting went completely
out of whack, allocation reporting got confused and started returning
impossible values, and very shortly afterwards the entire FS was
corrupted.  Rebalancing messed everything up royally, and in the end I
concluded it was best simply not to use an odd number of drives with
BTRFS.

I also tried RAID1 with an odd number of drives, expecting to get 2
redundant mirrors.  Instead the blocks were still only allocated in
pairs, and since the pairs were allocated round-robin across the
drives, I completely lost the ability to remove any single drive from
the array without data loss.

ie:
Instead of:
sda A1 B1
sdb A1 B1
sdc A1 B1

it ended up doing:

sda A1 B1
sdb A1 C1
sdc B1 C1

meaning removing any 1 drive would result in lost data.

I was told by a dev at Linuxconf that this issue had been resolved a
while ago; however, this test of mine was only about 2 months ago.
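
(If anyone wants to reproduce this, the setup is trivial -- something along
these lines, with device names and mount point purely illustrative:)

  # three-device raid0 data run, then a separate three-device raid1 run
  mkfs.btrfs -d raid0 -m raid1 /dev/sdx /dev/sdy /dev/sdz
  mkfs.btrfs -d raid1 -m raid1 /dev/sdx /dev/sdy /dev/sdz
  # fill with data, then compare what the tools report per device
  btrfs filesystem df /mnt/test
  btrfs filesystem show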




On Tue, Feb 21, 2012 at 11:35 AM, Tom Cameron <tomc603@gmail.com> wrote:
> I had a 4 drive RAID10 btrfs setup that I added a fifth drive to with
> the "btrfs device add" command. Once the device was added, I used the
> balance command to distribute the data through the drives. This
> resulted in an infinite run of the btrfs tool with data moving back
> and forth across the drives over and over again. When using the "btrfs
> filesystem show" command, I could see the same pattern repeated in the
> byte counts on each of the drives.
>
> It would probably add more complexity to the code, but adding a check
> for loops like this may be handy. While a 5-drive RAID10 array is a
> weird configuration (I'm waiting for a case with 6 bays), it _should_
> be possible with filesystems like BTRFS. In my head, the distribution
> of data would be uneven across drives, but the duplicate and stripe
> count should be even at the end. I'd imagine it to look something like
> this:
>
> D1: A1 B1 C1 D1
> D2: A1 B1 C1    E1
> D3: A2 B2    D1 E1
> D4: A2    C2 D2 E2
> D5:    B2 C2 D2 E2
>
> This is obviously over simplified, but the general idea is the same. I
> haven't looked into the way the "RAID"ing of objects works in BTRFS
> yet, but because it's a filesystem and not a block-based system it
> should be smart enough to care only about the duplication and striping
> of data, and not the actual block-level or extent-level balancing.
> Thoughts?
>
> Thanks in advance!
> Tom

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:45 ` Wes
@ 2012-02-21  0:51   ` Wes
  2012-02-21  1:07     ` Tom Cameron
  2012-02-21  1:07   ` Hugo Mills
  2012-02-21  1:16   ` Liu Bo
  2 siblings, 1 reply; 19+ messages in thread
From: Wes @ 2012-02-21  0:51 UTC (permalink / raw)
  To: tom; +Cc: linux-btrfs

Sorry, I meant 'removing 2 drives' in the RAID1-with-3-drives example.



On Tue, Feb 21, 2012 at 11:45 AM, Wes <anomaly256@gmail.com> wrote:
> I've noticed similar behavior when even RAID0'ing an odd number of
> devices which should be even more trivial in practice.
> You would expect something like:
> sda A1 B1
> sdb A2 B2
> sdc A3 B3
>
> or at least, if BTRFS can only handle block pairs,
>
> sda  A1 B2
> sdb  A2 C1
> sdc  B1 C2
>
> But the end result was that disk usage and reporting went all out of
> whack, allocation reporting got confused and started returning
> impossible values, and very shortly after the entire FS was corrupted.
>  Rebalancing messed everything up royally and in the end I concluded
> to simply not use an odd number of drives with BTRFS.
>
> I also tried RAID1 with an odd number of drives, expecting to have 2
> redundant mirrors.  Instead the end result was that the blocks were
> still only allocated in pairs, and since they were allocated
> round-robbin on the drives I completely lost the ability to remove any
> single drive from the array without data loss.
>
> ie:
> Instead of:
> sda A1 B1
> sdb A1 B1
> sdc A1 B1
>
> it ended up doing:
>
> sda A1 B1
> sdb A1 C1
> sdc B1 C1
>
> meaning removing any 1 drive would result in lost data.
>
> I was told that this issue should have been resolved a while ago by a
> dev at Linuxconf, however this test of mine was only about 2 months
> ago.
>
>
>
>
> On Tue, Feb 21, 2012 at 11:35 AM, Tom Cameron <tomc603@gmail.com> wrote:
>> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:45 ` Wes
  2012-02-21  0:51   ` Wes
@ 2012-02-21  1:07   ` Hugo Mills
  2012-02-21  1:13     ` Tom Cameron
  2012-02-21  1:27     ` Wes
  2012-02-21  1:16   ` Liu Bo
  2 siblings, 2 replies; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  1:07 UTC (permalink / raw)
  To: Wes; +Cc: tom, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]

On Tue, Feb 21, 2012 at 11:45:51AM +1100, Wes wrote:
> I've noticed similar behavior when even RAID0'ing an odd number of
> devices which should be even more trivial in practice.
> You would expect something like:
> sda A1 B1
> sdb A2 B2
> sdc A3 B3

   This is what it should do -- it'll use as many disks as it can find
to put stripes across at the time the allocator is asked to make
another block group.

> or at least, if BTRFS can only handle block pairs,
> 
> sda  A1 B2
> sdb  A2 C1
> sdc  B1 C2
> 
> But the end result was that disk usage and reporting went all out of
> whack, allocation reporting got confused and started returning
> impossible values, and very shortly after the entire FS was corrupted.
>  Rebalancing messed everything up royally and in the end I concluded
> to simply not use an odd number of drives with BTRFS.

   I can't see why that should have happened. What kernel were you
doing this with?

> I also tried RAID1 with an odd number of drives, expecting to have 2
> redundant mirrors.

   This isn't a valid expectation. Or rather, you can expect it, but
it's not what btrfs is designed to deliver. Btrfs's RAID-1
implementation is *precisely two* copies. Hence it isn't really much
like RAID-1, as you've found out.

>  Instead the end result was that the blocks were
> still only allocated in pairs, and since they were allocated
> round-robbin on the drives I completely lost the ability to remove any
> single drive from the array without data loss.
> 
> ie:
> Instead of:
> sda A1 B1
> sdb A1 B1
> sdc A1 B1
> 
> it ended up doing:
> 
> sda A1 B1
> sdb A1 C1
> sdc B1 C1
> 
> meaning removing any 1 drive would result in lost data.

   (Any 2 drives, as you corrected in your subsequent email)

   However, you can remove any one drive, and your data is fine, which
is what btrfs's RAID-1 guarantee is. I understand that there will be
additional features coming along Real Soon Now (possibly at the same
time that RAID-5 and -6 are integrated) which will allow the selection
of larger numbers of copies.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- People are too unreliable to be replaced by machines. ---      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:51   ` Wes
@ 2012-02-21  1:07     ` Tom Cameron
       [not found]       ` <CA+WRLO9BgqE+CwCUNgjwjVFyjDDp94SBX_EbdVciHUd0jpUqWQ@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Cameron @ 2012-02-21  1:07 UTC (permalink / raw)
  To: Wes; +Cc: linux-btrfs

I figured you meant that.

Using RAID1 on N drives would normally mean every drive has a copy of
the object. The upshot of this is that you can lose N-1 drives and
still access your data. In systems like ZFS or BTRFS you would also
expect roughly N-times the read speed, since you could theoretically
read from all drives in parallel as long as the checksums are valid.

It seems from the BTRFS documentation that the RAID1 profile is
actually "mirror", i.e. store exactly 2 copies of the object. Perhaps
when Oracle makes BTRFS a production option they should spell that
out more clearly.

So, if the fixes were done at Linuxconf, would we be looking at a 3.3
or a 3.4 release?


On Mon, Feb 20, 2012 at 7:51 PM, Wes <anomaly256@gmail.com> wrote:
> Sorry, I meant 'removing 2 drives' in the raid1 with 3 drives example
>
>
>
> On Tue, Feb 21, 2012 at 11:45 AM, Wes <anomaly256@gmail.com> wrote:
>> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:35 Balance RAID10 with odd device count Tom Cameron
  2012-02-21  0:45 ` Wes
@ 2012-02-21  1:13 ` Hugo Mills
  1 sibling, 0 replies; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  1:13 UTC (permalink / raw)
  To: tom; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]

On Mon, Feb 20, 2012 at 07:35:18PM -0500, Tom Cameron wrote:
> I had a 4 drive RAID10 btrfs setup that I added a fifth drive to with
> the "btrfs device add" command. Once the device was added, I used the
> balance command to distribute the data through the drives. This
> resulted in an infinite run of the btrfs tool with data moving back
> and forth across the drives over and over again. When using the "btrfs
> filesystem show" command, I could see the same pattern repeated in the
> byte counts on each of the drives.

   The balance operation should be guaranteed to complete. At least,
it does these days (back in the 2.6.35 days, it didn't always
complete). Having a repeating pattern of byte counts isn't
necessarily a sign that it's stuck in an infinite loop. It was
probably just taking a very long time.

   If you use 3.3-rc4, and apply the restriper patches to the
userspace tools, you can use the new restriper code, which adds
(amongst many other things) a progress counter to balances.
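
   With those patched tools you can also check on a running balance from
another shell; something like this (command spelling may differ slightly
depending on the patch version you apply):

  # report how far a running balance has got (restriper-enabled tools)
  btrfs balance status /mnt/pool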

> It would probably add more complexity to the code, but adding a check
> for loops like this may be handy. While a 5-drive RAID10 array is a
> weird configuration (I'm waiting for a case with 6 bays), it _should_
> be possible with filesystems like BTRFS.

   Indeed it should. I've not tested it yet myself, though.

> In my head, the distribution
> of data would be uneven across drives, but the duplicate and stripe
> count should be even at the end. I'd imagine it to look something like
> this:
> 
> D1: A1 B1 C1 D1
> D2: A1 B1 C1    E1
> D3: A2 B2    D1 E1
> D4: A2    C2 D2 E2
> D5:    B2 C2 D2 E2

   Yup, that's about right. Except that the empty spaces aren't there,
so it'll look more like this:

D1: A1 B1 C1 D1
D2: A1 B1 C1 E1
D3: A2 B2 D1 E1
D4: A2 C2 D2 E2
D5: B2 C2 D2 E2

> This is obviously over simplified, but the general idea is the same. I
> haven't looked into the way the "RAID"ing of objects works in BTRFS
> yet,

   See the "SysadminGuide" on the wiki[1] for a fuller explanation. I
should probably expand the example to show the case with odd numbers
of drives (and possibly with unbalanced disk sizes too).

> but because it's a filesystem and not a block-based system it
> should be smart enough to care only about the duplication and striping
> of data, and not the actual block-level or extent-level balancing.

   Hugo.

[1] http://btrfs.ipv5.de/index.php?title=SysadminGuide

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- I'd make a joke about UDP,  but I don't know if ---         
                     anyone's actually listening...                      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:07   ` Hugo Mills
@ 2012-02-21  1:13     ` Tom Cameron
  2012-02-21  1:21       ` Hugo Mills
  2012-02-21  1:27     ` Wes
  1 sibling, 1 reply; 19+ messages in thread
From: Tom Cameron @ 2012-02-21  1:13 UTC (permalink / raw)
  To: Hugo Mills, Wes, tom, linux-btrfs

On Mon, Feb 20, 2012 at 8:07 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>
>    However, you can remove any one drive, and your data is fine, which
> is what btrfs's RAID-1 guarantee is. I understand that there will be
> additional features coming along Real Soon Now (possibly at the same
> time that RAID-5 and -6 are integrated) which will allow the selection
> of larger numbers of copies.
>

Is there a projected timeframe for RAID5/6? I understand it's
currently not the development focus of the BTRFS team, and most
organizations want performance over capacity, making RAID10 the clear
choice. But there are still some situations where RAID6 is better
suited (large pools of archive storage).

Also, do we know if the RAID5/6 implementation will simply break data
into two data objects and one or two parity objects, or will it work
with an arbitrary number of devices? Meaning, if I have a RAID6 pool
of 12 drives, will I get 10 data objects and two parity objects?

Thanks all for your replies!
Tom

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  0:45 ` Wes
  2012-02-21  0:51   ` Wes
  2012-02-21  1:07   ` Hugo Mills
@ 2012-02-21  1:16   ` Liu Bo
  2012-02-21  1:22     ` Hugo Mills
  2 siblings, 1 reply; 19+ messages in thread
From: Liu Bo @ 2012-02-21  1:16 UTC (permalink / raw)
  To: Wes; +Cc: tom, linux-btrfs

On 02/21/2012 08:45 AM, Wes wrote:
> I've noticed similar behavior when even RAID0'ing an odd number of
> devices which should be even more trivial in practice.
> You would expect something like:
> sda A1 B1
> sdb A2 B2
> sdc A3 B3
> 
> or at least, if BTRFS can only handle block pairs,
> 
> sda  A1 B2
> sdb  A2 C1
> sdc  B1 C2
> 
> But the end result was that disk usage and reporting went all out of
> whack, allocation reporting got confused and started returning
> impossible values, and very shortly after the entire FS was corrupted.
>  Rebalancing messed everything up royally and in the end I concluded
> to simply not use an odd number of drives with BTRFS.
> 
> I also tried RAID1 with an odd number of drives, expecting to have 2
> redundant mirrors.  Instead the end result was that the blocks were
> still only allocated in pairs, and since they were allocated
> round-robbin on the drives I completely lost the ability to remove any
> single drive from the array without data loss.
> 
> ie:
> Instead of:
> sda A1 B1
> sdb A1 B1
> sdc A1 B1
> 
> it ended up doing:
> 
> sda A1 B1
> sdb A1 C1
> sdc B1 C1
> 
> meaning removing any 1 drive would result in lost data.
> 

Removing any disk will not lose data, because btrfs ensures that all the data on the
removed disk is safely relocated to the right places first.  And if there is not
enough remaining space for that data, the remove operation will fail.  Or what am I missing?
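
To be clear, the removal I mean is the explicit one, roughly (mount point
illustrative):

  # relocate everything off the device first, then drop it
  btrfs device delete /dev/sdc /mnt/pool

which migrates the data before the device goes away, rather than a disk
simply failing.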

thanks,
liubo

> I was told that this issue should have been resolved a while ago by a
> dev at Linuxconf, however this test of mine was only about 2 months
> ago.
> 
> 
> 
> 
> On Tue, Feb 21, 2012 at 11:35 AM, Tom Cameron <tomc603@gmail.com> wrote:
>> [...]
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:13     ` Tom Cameron
@ 2012-02-21  1:21       ` Hugo Mills
  2012-02-22 11:48         ` Duncan
  0 siblings, 1 reply; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  1:21 UTC (permalink / raw)
  To: tom; +Cc: Wes, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2371 bytes --]

On Mon, Feb 20, 2012 at 08:13:43PM -0500, Tom Cameron wrote:
> On Mon, Feb 20, 2012 at 8:07 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> >   However, you can remove any one drive, and your data is fine, which
> > is what btrfs's RAID-1 guarantee is. I understand that there will be
> > additional features coming along Real Soon Now (possibly at the same
> > time that RAID-5 and -6 are integrated) which will allow the selection
> > of larger numbers of copies.
> >
> 
> Is there a projected timeframe for RAID5/6? I understand it's
> currently not the development focus of the BTRFS team, and most
> organizations want performance over capacity making RAID10 the clear
> choice. But, there are still some situations where RAID6 is better
> suited (large pools of archive storage).

   Rumour has it that it's the next major thing after btrfsck is out
of the door. I don't know how accurate that is. I'm just some bloke on
the Internet. :)

> Also, do we know if the RAID5/6 implementation will simply break data
> into two data objects and one or two parity objects, or will it work
> with an arbitrary number of devices? Meaning, if I have a RAID6 pool
> of 12 drives, will I get 10 data objects and two parity objects?

   AFAIK, the original implementation looked something like the RAID-0
code, so if you have n drives with space for the next block group,
it'll use all n drives for that block group. Parity is then
allocated out of those n (with the parity blocks distributed across
different drives, as RAID-5 and -6 should do).

   So, allocating a RAID-6 block group of width 1G on your example
12-drive machine, you will indeed end up with 10G of space in that
block group, and 2G of parity data spread across all 12 drives.

   I don't know if the code that will be delivered will allow you to
set a smaller fixed-size stripe width (e.g. 4 data + 2 parity over 8
drives). If the 3-copies RAID-1 code rumour is also true, I would hope
so. Again, I'm just some bloke on the Internet...

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- I'd make a joke about UDP,  but I don't know if ---         
                     anyone's actually listening...                      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:16   ` Liu Bo
@ 2012-02-21  1:22     ` Hugo Mills
  0 siblings, 0 replies; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  1:22 UTC (permalink / raw)
  To: Liu Bo; +Cc: Wes, tom, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Tue, Feb 21, 2012 at 09:16:40AM +0800, Liu Bo wrote:
> On 02/21/2012 08:45 AM, Wes wrote:
> > meaning removing any 1 drive would result in lost data.
> 
> Removing any disk will not lose data cause btrfs ensure all the data
> in the removed disk is safely placed on right places.  And if there
> is not enough rest space for the data, the remove operations will
> fail.  Or what am I missing?

   The typo. :) He said he meant "removing any 2 drives" in the
follow-up mail.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- I'd make a joke about UDP,  but I don't know if ---         
                     anyone's actually listening...                      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:07   ` Hugo Mills
  2012-02-21  1:13     ` Tom Cameron
@ 2012-02-21  1:27     ` Wes
  2012-02-21  1:31       ` Hugo Mills
  1 sibling, 1 reply; 19+ messages in thread
From: Wes @ 2012-02-21  1:27 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs

@hugo

IIRC that was on ~3.0.8, but it might have been 3.0.0.  I'll revisit
the RAID0 setup on a newer kernel series and re-test before making
any more claims. :)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:27     ` Wes
@ 2012-02-21  1:31       ` Hugo Mills
  0 siblings, 0 replies; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  1:31 UTC (permalink / raw)
  To: Wes; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On Tue, Feb 21, 2012 at 12:27:56PM +1100, Wes wrote:
> @hugo
> 
> iirc that was on ~3.0.8 but it might have been 3.0.0.  I'll revisit
> the raid0 setup on a newer kernel series and test though before making
> any more claims. :)

   There's a repeating pattern of three log messages that comes out in
your syslogs. It's something like two "found n extents" messages, and
then "moving block group yyyyyyyyyyyy". As long as you keep getting
the latter message with different numbers, it's still working OK. The
block group numbers are monotonically decreasing (if they go up again,
there's a problem we need to know about), but aren't necessarily
linearly-spaced, particularly if you've done a balance or partial
balance before. i.e. they're an indication that something's happening,
but not how much more of it there is to go.
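
   A rough way to keep an eye on it is just to follow the kernel log, for
example (log path varies by distro, and the exact message wording varies by
kernel version, so adjust the pattern to whatever you actually see):

  # watch the relocation messages scroll past during a balance
  tail -f /var/log/kern.log | grep -E 'block group|found [0-9]+ extents'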

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- I'd make a joke about UDP,  but I don't know if ---         
                     anyone's actually listening...                      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
       [not found]       ` <CA+WRLO9BgqE+CwCUNgjwjVFyjDDp94SBX_EbdVciHUd0jpUqWQ@mail.gmail.com>
@ 2012-02-21  1:59         ` Tom Cameron
  2012-02-21  2:46           ` Gareth Pye
  2012-02-21  7:54           ` Hugo Mills
  0 siblings, 2 replies; 19+ messages in thread
From: Tom Cameron @ 2012-02-21  1:59 UTC (permalink / raw)
  To: Gareth Pye; +Cc: linux-btrfs

Gareth,

I would completely agree. I only use the RAID vernacular here because,
well, it's the unfortunate de facto standard way to talk about data
protection.

I'd go a step beyond saying dupe or dupe + stripe, because future
modifications could conceivably see the addition of multiple
duplicated sets. The case of 4 disks in a BTRFS filesystem with dupe
running across all of them would be a clear extension I could see. So
that would be something like 4D. I'm not really sure what terminology
you'd use, but something completely different from RAID-like terms is
almost certainly best. Just look at the ZFS documentation to see how
carefully they have to spell out what RAID-Z, Z2, and Z3 do, because
they used the RAID acronym.

On Mon, Feb 20, 2012 at 8:47 PM, Gareth Pye <gareth@cerberos.id.au> wrote:
> On Tue, Feb 21, 2012 at 12:07 PM, Tom Cameron <tomc603@gmail.com> wrote:
>>
>> It seems from the BTRFS documentation that the RAID1 profile is
>> actually "mirror", or store 2 copies of the object. Perhaps when
>> Oracle makes BTRFS a production option they should more clearly spell
>> that out.
>
>
> I'd really like BTRFS to not use RAID level terminology anywhere (other than
> maybe in parentheses along the lines of: "this is similar to RAIDX") and use
> less ambiguous options as the recommended way to talk about things. As there
> is good reason to talk about Dup and RAID1 differently as they aren't the
> same on more than 2 drives. Doing it that way will make people understand
> what is going on more often, which should be good.
>
> It also makes things much easier to remember. Like how much data can you fit
> on a 6 drive RAID10? I dunno, but I can more intuitively answer that same
> question when it is phrased as just simply 'dup', or maybe 'dup + stripe'.
>
> Is there a difference in BTRFS between dup and raid10?
>
> --
> Gareth Pye
> Level 2 Judge, Melbourne, Australia
> Australian MTG Forum: mtgau.com
> gareth@cerberos.id.au - www.rockpaperdynamite.wordpress.com
> "Dear God, I would like to file a bug report"
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:59         ` Tom Cameron
@ 2012-02-21  2:46           ` Gareth Pye
  2012-02-21  7:54           ` Hugo Mills
  1 sibling, 0 replies; 19+ messages in thread
From: Gareth Pye @ 2012-02-21  2:46 UTC (permalink / raw)
  To: tom; +Cc: linux-btrfs

I'd probably want to use DupeX to refer to what was classically RAID1
(duplicate across all disks), with Dupe as an alias for Dupe2, but one
could also choose Dupe3 through Dupe99.

And I keep forgetting to post to the list in plain text, so many of
you may not have noticed my original email, which only exists on the
mailing list as the quotation in Tom's reply.

On Tue, Feb 21, 2012 at 12:59 PM, Tom Cameron <tomc603@gmail.com> wrote:
> Gareth,
>
> I would completely agree. I only use the RAID vernacular here because,
> well, it's the unfortunate defacto standard way to talk about data
> protection.
>
> I'd go a step beyond saying dupe or dupe + stripe, because future
> modifications could conceivably see the addition of multiple
> duplicated sets. The case of 4 disks in a BTRFS filesystem with dupe
> running across all of them would be a clear extension I could see. So
> that would be something like 4D. I'm not real sure what you'd use for
> the terminology, but something completely different than RAID-like
> terms is almost certainly best. Just look at the ZFS documentation to
> see how carefully they have to spell out what RAID-Z, Z2, and Z3 do
> because they used the RAID acronym.
>
> On Mon, Feb 20, 2012 at 8:47 PM, Gareth Pye <gareth@cerberos.id.au> wrote:
>> [...]



--
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gareth@cerberos.id.au - www.rockpaperdynamite.wordpress.com
"Dear God, I would like to file a bug report"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:59         ` Tom Cameron
  2012-02-21  2:46           ` Gareth Pye
@ 2012-02-21  7:54           ` Hugo Mills
  2012-02-22  8:56             ` Xavier Nicollet
  1 sibling, 1 reply; 19+ messages in thread
From: Hugo Mills @ 2012-02-21  7:54 UTC (permalink / raw)
  To: tom; +Cc: Gareth Pye, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

On Mon, Feb 20, 2012 at 08:59:05PM -0500, Tom Cameron wrote:
> Gareth,
> 
> I would completely agree. I only use the RAID vernacular here because,
> well, it's the unfortunate defacto standard way to talk about data
> protection.
> 
> I'd go a step beyond saying dupe or dupe + stripe, because future
> modifications could conceivably see the addition of multiple
> duplicated sets. The case of 4 disks in a BTRFS filesystem with dupe
> running across all of them would be a clear extension I could see. So
> that would be something like 4D. I'm not real sure what you'd use for
> the terminology, but something completely different than RAID-like
> terms is almost certainly best. Just look at the ZFS documentation to
> see how carefully they have to spell out what RAID-Z, Z2, and Z3 do
> because they used the RAID acronym.

   /me opens a plate to put the can of worms on.

   Some time ago, I proposed the following scheme:

<n>C<m>S<p>P

   where n is the number of copies (suffixed by C), m is the number of
stripes for that data (suffixed by S), and p is the number of parity
blocks (suffixed by P). Values of zero are omitted.

   So btrfs's RAID-1 would be 2C, RAID-0 would be 1CnS, RAID-5 would
be 1CnS1P, and RAID-6 would be 1CnS2P. DUP would need a special
indicator to show that it wasn't redundant in the face of a whole-disk
failure: 2CN

   Hugo.

> On Mon, Feb 20, 2012 at 8:47 PM, Gareth Pye <gareth@cerberos.id.au> wrote:
> > On Tue, Feb 21, 2012 at 12:07 PM, Tom Cameron <tomc603@gmail.com> wrote:
> >>
> >> It seems from the BTRFS documentation that the RAID1 profile is
> >> actually "mirror", or store 2 copies of the object. Perhaps when
> >> Oracle makes BTRFS a production option they should more clearly spell
> >> that out.
> >
> >
> > I'd really like BTRFS to not use RAID level terminology anywhere (other than
> > maybe in parenthesis along the lines of: "this is similar to RAIDX") and use
> > less ambigious options as the recommended way to talk about things. As there
> > is good reason to talk about Dup and RAID1 differently as they aren't the
> > same on more than 2 drives. Doing it that way will make people understand
> > what is going on more often, which should be good.
> >
> > It also makes things much easier to remember. Like how much data can you fit
> > on a 6 drive RAID10? I dunno, but I can more intuitively answer that same
> > question when it is phrased as just simply 'dup', or maybe 'dup + stripe'.
> >
> > Is there a difference in BTRFS between dup and raid10?
> >

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Great oxymorons of the world, no. 4: Future Perfect ---       

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  7:54           ` Hugo Mills
@ 2012-02-22  8:56             ` Xavier Nicollet
  2012-02-22 10:22               ` Hubert Kario
  0 siblings, 1 reply; 19+ messages in thread
From: Xavier Nicollet @ 2012-02-22  8:56 UTC (permalink / raw)
  To: Hugo Mills, tom, Gareth Pye, linux-btrfs

On 21 February 2012 at 07:54, Hugo Mills wrote:
>    Some time ago, I proposed the following scheme:
>
> <n>C<m>S<p>P
>
>    where n is the number of copies (suffixed by C), m is the number of
> stripes for that data (suffixed by S), and p is the number of parity
> blocks (suffixed by P). Values of zero are omitted.
>
>    So btrfs's RAID-1 would be 2C, RAID-0 would be 1CnS, RAID-5 would
> be 1CnS1P, and RAID-6 would be 1CnS2P. DUP would need a special
> indicator to show that it wasn't redundant in the face of a whole-disk
> failure: 2CN

Seems clear. However, is the S really relevant?
It would be simpler without it, wouldn't it?

--
Xavier Nicollet

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-22  8:56             ` Xavier Nicollet
@ 2012-02-22 10:22               ` Hubert Kario
  2012-02-22 11:09                 ` Hugo Mills
  0 siblings, 1 reply; 19+ messages in thread
From: Hubert Kario @ 2012-02-22 10:22 UTC (permalink / raw)
  To: nicollet; +Cc: Hugo Mills, tom, Gareth Pye, linux-btrfs

On Wednesday 22 of February 2012 09:56:27 Xavier Nicollet wrote:
> On 21 February 2012 at 07:54, Hugo Mills wrote:
> >    Some time ago, I proposed the following scheme:
> >
> > <n>C<m>S<p>P
> >
> >    where n is the number of copies (suffixed by C), m is the number of
> > stripes for that data (suffixed by S), and p is the number of parity
> > blocks (suffixed by P). Values of zero are omitted.
> >
> >    So btrfs's RAID-1 would be 2C, RAID-0 would be 1CnS, RAID-5 would
> > be 1CnS1P, and RAID-6 would be 1CnS2P. DUP would need a special
> > indicator to show that it wasn't redundant in the face of a whole-disk
> > failure: 2CN
>
> Seems clear. However, is the S really relevant?
> It would be simpler without it, wouldn't it?

It depends on how striping will be implemented. Generally it provides
information on how many spindles the data is using. With a static
configuration it would be useless, but once you start changing the number
of drives in the set it becomes necessary to know whether you're under- or
over-utilising the disks.
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-22 10:22               ` Hubert Kario
@ 2012-02-22 11:09                 ` Hugo Mills
  0 siblings, 0 replies; 19+ messages in thread
From: Hugo Mills @ 2012-02-22 11:09 UTC (permalink / raw)
  To: Hubert Kario; +Cc: nicollet, tom, Gareth Pye, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

On Wed, Feb 22, 2012 at 11:22:08AM +0100, Hubert Kario wrote:
> On Wednesday 22 of February 2012 09:56:27 Xavier Nicollet wrote:
> > On 21 February 2012 at 07:54, Hugo Mills wrote:
> > >    Some time ago, I proposed the following scheme:
> > > <n>C<m>S<p>P
> > > 
> > >    where n is the number of copies (suffixed by C), m is the number of
> > > 
> > > stripes for that data (suffixed by S), and p is the number of parity
> > > blocks (suffixed by P). Values of zero are omitted.
> > > 
> > >    So btrfs's RAID-1 would be 2C, RAID-0 would be 1CnS, RAID-5 would
> > > 
> > > be 1CnS1P, and RAID-6 would be 1CnS2P. DUP would need a special
> > > indicator to show that it wasn't redundant in the face of a whole-disk
> > > failure: 2CN
> > 
> > Seems clear. However, is the S really relevant ?
> > It would be simpler without it, wouldn't it ?
> 
> It depends how striping will be implemented. Generally it provides 
> information on how much spindles is the data using. With static 
> configuration it will be useless, but when you start changing number of 
> drives in set then it's necessary to know if you're not under- or over-
> utilising the disks.

   Indeed. If the implementation always uses the largest number of
devices possible, then we'll always have nS. If it allows you to set a
fixed number of devices for a stripe, then the n will be a fixed
number, and it becomes useful.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
             --- Happiness is mandatory.  Are you happy? ---             

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Balance RAID10 with odd device count
  2012-02-21  1:21       ` Hugo Mills
@ 2012-02-22 11:48         ` Duncan
  0 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2012-02-22 11:48 UTC (permalink / raw)
  To: linux-btrfs

Hugo Mills posted on Tue, 21 Feb 2012 01:21:48 +0000 as excerpted:

> On Mon, Feb 20, 2012 at 08:13:43PM -0500, Tom Cameron wrote:
>> On Mon, Feb 20, 2012 at 8:07 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >
>> >   However, you can remove any one drive, and your data is fine, which
>> > is what btrfs's RAID-1 guarantee is. I understand that there will be
>> > additional features coming along Real Soon Now (possibly at the same
>> > time that RAID-5 and -6 are integrated) which will allow the
>> > selection of larger numbers of copies.
>> >
>> >
>> Is there a projected timeframe for RAID5/6? I understand it's currently
>> not the development focus of the BTRFS team, and most organizations
>> want performance over capacity making RAID10 the clear choice. But,
>> there are still some situations where RAID6 is better suited (large
>> pools of archive storage).
>
>    Rumour has it that it's the next major thing after btrfsck is out
> of the door. I don't know how accurate that is. I'm just some bloke on
> the Internet. :)

The report I read (on Phoronix, YMMV, but it was supposed to be from a
talk at SCALE, IIRC) said RAID-5/6 was planned for kernel 3.4 or 3.5,
with triple-copy mirroring said to piggyback on some of that code, so
presumably 3.5 or 3.6.

Triple-copy mirroring as a special case doesn't really make sense to me,
tho.  The first implementation being two-copy (dup) only makes sense, but
in generalizing that to allow triple copy, I'd think/hope they'd
generalize it to N-copy, IOW, traditional raid-1 style, instead.

I guess we'll see.

FWIW, I'm running on an older 4-spindle md-raid1 setup now, and I had
/hoped/ to convert that to 4-copy btrfs-raid1, but that's simply not
possible ATM, tho a hybrid 2-copy btrfs on dual dual-spindle md/raid1s
is possible, if a bit complex.

Given that the disks are older 300 gig SATA Seagates nearing half their
rated run-hours according to SMART (great on power and spinup cycles
tho), now's not the time to switch them to dual-copy-only!  I'd think
about triple-copy, but no less!  Thus, I'm eagerly awaiting the
introduction of tri- or preferably N-copy raid1 mode, in 3.5-ish.  But
the various articles had led me to believe that btrfs was almost ready
to have the experimental label removed, and it turns out not to be quite
that far along, maybe end-of-year if things go well, so letting btrfs
continue to stabilize in general while I wait certainly won't hurt.  =:^)

Meanwhile, I'm staying on-list so as to keep informed of what else is
going on, btrfs-wise, while I wait for triple-copy mode, minimum.


--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-02-22 11:48 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-21  0:35 Balance RAID10 with odd device count Tom Cameron
2012-02-21  0:45 ` Wes
2012-02-21  0:51   ` Wes
2012-02-21  1:07     ` Tom Cameron
     [not found]       ` <CA+WRLO9BgqE+CwCUNgjwjVFyjDDp94SBX_EbdVciHUd0jpUqWQ@mail.gmail.com>
2012-02-21  1:59         ` Tom Cameron
2012-02-21  2:46           ` Gareth Pye
2012-02-21  7:54           ` Hugo Mills
2012-02-22  8:56             ` Xavier Nicollet
2012-02-22 10:22               ` Hubert Kario
2012-02-22 11:09                 ` Hugo Mills
2012-02-21  1:07   ` Hugo Mills
2012-02-21  1:13     ` Tom Cameron
2012-02-21  1:21       ` Hugo Mills
2012-02-22 11:48         ` Duncan
2012-02-21  1:27     ` Wes
2012-02-21  1:31       ` Hugo Mills
2012-02-21  1:16   ` Liu Bo
2012-02-21  1:22     ` Hugo Mills
2012-02-21  1:13 ` Hugo Mills
