* heterogeneous raid1
@ 2012-03-23  6:11 Bob McElrath
  2012-03-23  6:47 ` cwillu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Bob McElrath @ 2012-03-23  6:11 UTC (permalink / raw)
  To: linux-btrfs

Greetings butter-heads,

I would like to implement a redundant (raid1) disk array on heterogeneous disks
using btrfs.  A more detailed description of what I want to do can be found here:

http://superuser.com/questions/387851/a-zfs-or-lvm-or-md-redundant-heterogeneous-storage-proposal/388536

In a nutshell: organize your heterogeneous disks into two "halves" whose total
sizes are roughly equal, and create a raid1 array across those two halves.
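
To make the grouping step concrete, here is a toy greedy split in shell (the
sizes in GB are made up; a real tool would read them from the actual devices):

    # greedy split: give each disk, largest first, to whichever half is smaller
    sizes="3000 2000 1500 1000 500"   # hypothetical disk sizes in GB
    a=0; b=0; half_a=""; half_b=""
    for s in $(echo $sizes | tr ' ' '\n' | sort -rn); do
        if [ "$a" -le "$b" ]; then
            a=$((a + s)); half_a="$half_a $s"
        else
            b=$((b + s)); half_b="$half_b $s"
        fi
    done
    echo "half A ($a GB):$half_a"
    echo "half B ($b GB):$half_b"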

For various reasons I decided to go with btrfs over zfs.  What I have done is to
create two lvm Logical Volumes, one using a single large disk, and another as a
linear concatenation of several smaller disks.  It works, so far, and I could
automate it with some scripts.
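
Roughly, the setup looks like this (device names are placeholders; one LV is
the single big disk, the other a linear concatenation of the smaller ones):

    pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
    vgcreate pool /dev/sdb /dev/sdc /dev/sdd /dev/sde
    lvcreate -n half1 -l 100%PVS pool /dev/sdb                    # the big disk
    lvcreate -n half2 -l 100%PVS pool /dev/sdc /dev/sdd /dev/sde  # small disks, linear
    mkfs.btrfs -d raid1 -m raid1 /dev/pool/half1 /dev/pool/half2
    btrfs device scan
    mount /dev/pool/half1 /mnt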

In the long term, I would like this to be something that btrfs could do by
itself, without LVM.  Having absolutely no knowledge of the btrfs code, this
seems easy, I'm sure you'll tell me otherwise.  ;)  But one needs:

1) The ability to "group" a heterogeneous set of disks into the "halves" of a
raid1.  I don't understand what btrfs is doing if you give it more than 2
devices and ask for raid1.

2) Intelligently rebalance when a new device is added or removed (e.g. rearrange
the halves, and rebalance as necessary)

While btrfs seems to support multi-device filesystems, in trying this I
encountered the following deadly error: a raid1 btrfs created on more than 2
devices cannot be mounted in degraded mode if one or more devices are missing.
(In the above plan, a filesystem should be mountable as long as one "half" is
intact.)  With 1 of 4 devices missing in such a circumstance, I get:

    device fsid 2ea954c6-d9ee-47c4-9f90-79a1342c71df devid 1 transid 31 /dev/loop0
    btrfs: allowing degraded mounts
    btrfs: failed to read chunk root on loop0
    btrfs: open_ctree failed

btrfs fi show:
    Label: none  uuid: 2ea954c6-d9ee-47c4-9f90-79a1342c71df
        Total devices 4 FS bytes used 1.78GB
        devid    1 size 1.00GB used 1.00GB path /dev/loop0
        devid    2 size 1.00GB used 1023.00MB path /dev/loop1
        devid    3 size 1.00GB used 1023.00MB path /dev/loop2
        *** Some devices missing
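
To reproduce on loop devices, something like this sketch should work (paths
and the data copied in are placeholders):

    # four 1GB loop devices, raid1 for both data and metadata
    for i in 0 1 2 3; do
        truncate -s 1G /tmp/disk$i
        losetup /dev/loop$i /tmp/disk$i
    done
    mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
    mount /dev/loop0 /mnt
    cp -a /some/data /mnt        # fill it with something
    umount /mnt
    # simulate a failed device, then attempt a degraded mount
    losetup -d /dev/loop3
    mount -o degraded /dev/loop0 /mnt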

Also I discovered that writing to a degraded 2-disk raid1 btrfs array quickly
fills up the disk.  It does not behave as a single disk.  

Both these errors were encountered with Ubuntu 11.10 (linux 3.0.9).  I tried
with 3.0.22 and I got "failed to read chunk tree" instead of the above "failed
to read chunk root"; furthermore, after mounting it degraded, I could not
mount it non-degraded, even after a balance and an fsck.

So, any comments on the general difficulty of implementing this proposal?  Can
someone explain the above errors?  What is btrfs doing with >2 disks and raid1?
Any comments on what parts of this should be inside btrfs, and which parts are
better in external scripts?  I think this feature would be extremely popular: it
turns btrfs into a Drobo.

P.S. why doesn't df work with btrfs raid1?  Why is 'btrfs fi df' necessary?

--
Cheers, Bob McElrath

"The individual has always had to struggle to keep from being overwhelmed by
the tribe.  If you try it, you will be lonely often, and sometimes frightened.
But no price is too high to pay for the privilege of owning yourself." 
    -- Friedrich Nietzsche

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23  6:11 heterogeneous raid1 Bob McElrath
@ 2012-03-23  6:47 ` cwillu
  2012-03-23 10:20 ` Hugo Mills
  2012-03-23 10:44 ` Roman Mamedov
  2 siblings, 0 replies; 9+ messages in thread
From: cwillu @ 2012-03-23  6:47 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-btrfs

> In a nutshell: organize your heterogeneous disks into two "halves" whose total
> sizes are roughly equal, and create a raid1 array across those two halves.
>
[snip]
>
> In the long term, I would like this to be something that btrfs could do by
> itself, without LVM.  Having absolutely no knowledge of the btrfs code, this
> seems easy, I'm sure you'll tell me otherwise.  ;)  But one needs:

It already does this, no organisation necessary.

> While btrfs seems to support multi-device filesystems, in trying this I
> encountered the following deadly error: a raid1 btrfs created on more than 2
> devices cannot be mounted in degraded mode if one or more devices are missing.
> (In the above plan, a filesystem should be mountable as long as one "half" is
> intact.)  With 1 of 4 devices missing in such a circumstance, I get:

I suspect you didn't make a raid1, but rather a raid1 metadata with
raid0 data.
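
If memory serves, a multi-device mkfs.btrfs defaults to raid0 for data and
raid1 for metadata; you have to ask for mirrored data explicitly.  Something
like (using your loop devices):

    mkfs.btrfs /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3                     # default: data raid0, metadata raid1
    mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3   # what you want
    # after mounting, check which profile each chunk type actually uses:
    btrfs filesystem df /mnt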

> Both these errors were encountered with Ubuntu 11.10 (linux 3.0.9).  I tried
> with 3.0.22 and I got "failed to read chunk tree" instead of the above "failed
> to read chunk root"; furthermore, after mounting it degraded, I could not
> mount it non-degraded, even after a balance and an fsck.

3.0 is seriously out of date: anything prior to 3.2 can cause problems
on a hard reboot, and a bunch of other things have also been fixed.


> P.S. why doesn't df work with btrfs raid1?  Why is 'btrfs fi df' necessary?

df works fine, but doesn't (and can't) give a complete picture:
there's no way for btrfs to extend the syscall df uses to return more
information without breaking the api for everybody else.
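
Both are worth running; they just answer different questions (mountpoint is
an example):

    df -h /mnt                  # one total/used/free figure, via statfs()
    btrfs filesystem df /mnt    # per-chunk-type breakdown (Data/Metadata/System)
                                # including the raid profile of each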

Note that the faq covers all of these points :p

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23  6:11 heterogeneous raid1 Bob McElrath
  2012-03-23  6:47 ` cwillu
@ 2012-03-23 10:20 ` Hugo Mills
  2012-03-24  7:15   ` Duncan
  2012-03-23 10:44 ` Roman Mamedov
  2 siblings, 1 reply; 9+ messages in thread
From: Hugo Mills @ 2012-03-23 10:20 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2468 bytes --]

On Fri, Mar 23, 2012 at 06:11:59AM +0000, Bob McElrath wrote:
> Greetings butter-heads,
> 
> I would like to implement a redundant (raid1) disk array on heterogeneous disks
> using btrfs.  A more detailed description of what I want to do can be found here:
> 
> http://superuser.com/questions/387851/a-zfs-or-lvm-or-md-redundant-heterogeneous-storage-proposal/388536
> 
> In a nutshell: organize your heterogeneous disks into two "halves" whose total
> sizes are roughly equal, and create a raid1 array across those two halves.
> 
> For various reasons I decided to go with btrfs over zfs.  What I have done is to
> create two lvm Logical Volumes, one using a single large disk, and another as a
> linear concatenation of several smaller disks.  It works, so far, and I could
> automate it with some scripts.

   btrfs doesn't quite do things this way. As well as the FAQ
suggested by Carey, you might want to look at the (rather misnamed)
SysadminGuide on the wiki at http://btrfs.ipv5.de/ .

> In the long term, I would like this to be something that btrfs could do by
> itself, without LVM.  Having absolutely no knowledge of the btrfs code, this
> seems easy, I'm sure you'll tell me otherwise.  ;)  But one needs:
> 
> 1) The ability to "group" a heterogeneous set of disks into the "halves" of a
> raid1.  I don't understand what btrfs is doing if you give it more than 2
> devices and ask for raid1.
> 
> 2) Intelligently rebalance when a new device is added or removed (e.g. rearrange
> the halves, and rebalance as necessary)

   A balance operation is incredibly expensive. It would be much
better to have a complex policy on when to rebalance. Think of trying
to add two new disks to a nearly-full 20TB array: you really don't
want to have to wait for 20TB of data to be rewritten before you add
the second drive. Such a complex policy doesn't belong in the kernel
(and probably doesn't belong in code, unless you've got some
mind-reading software, or a webcam and enough image-processing to
identify the stack of disks on the admin's desk).

   I'm not trying to argue that you shouldn't automatically rebalance
after a new device is added, but more that the feature probably
shouldn't be in the kernel.
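
   The mechanics are only two commands anyway (device and mountpoint here are
just examples); the interesting part is the policy that decides when to run
the second one:

    btrfs device add /dev/sdX /mnt
    # only once you've finished adding devices and can afford the I/O:
    btrfs filesystem balance /mnt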

   Hugo.

[snip]

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- UNIX: British manufacturer of modular shelving units. ---      

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23  6:11 heterogeneous raid1 Bob McElrath
  2012-03-23  6:47 ` cwillu
  2012-03-23 10:20 ` Hugo Mills
@ 2012-03-23 10:44 ` Roman Mamedov
  2012-03-23 16:49   ` Bob McElrath
  2 siblings, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2012-03-23 10:44 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1220 bytes --]

On Fri, 23 Mar 2012 06:11:59 +0000
Bob McElrath <bob@mcelrath.org> wrote:

> http://superuser.com/questions/387851/a-zfs-or-lvm-or-md-redundant-heterogeneous-storage-proposal/388536
> 
> In a nutshell: organize your heterogeneous disks into two "halves" whose total
> sizes are roughly equal, and create a raid1 array across those two halves.

This seems to be an extremely simplistic concept and also a very inefficient
use of storage space, while not even providing enough redundancy (can't
reliably tolerate an any-two-disks failure even).

I suggest that you go with http://linuxconfig.org/prouhd-raid-for-the-end-user
instead. Depending on how many drives you have, the widest portion can be raid
6, then decreasing to RAID5 for the second stage, then finally to RAID1 for
the tail.

Also remember that with MD you can also create arrays from arrays. So e.g. a
RAID0 of two 500GB members can join a RAID6 of 1TB members. More on this idea:
http://louwrentius.com/blog/2008/08/building-a-raid-6-array-of-mixed-drives/
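
A sketch of that idea with mdadm (device names are examples only):

    # join two 500GB disks into a ~1TB raid0...
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdf /dev/sdg
    # ...and use it as the sixth member of a raid6 of 1TB devices
    mdadm --create /dev/md1 --level=6 --raid-devices=6 \
          /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/md0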

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23 10:44 ` Roman Mamedov
@ 2012-03-23 16:49   ` Bob McElrath
  2012-03-23 17:13     ` Roman Mamedov
  0 siblings, 1 reply; 9+ messages in thread
From: Bob McElrath @ 2012-03-23 16:49 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

Roman Mamedov [rm@romanrm.ru] wrote:
> On Fri, 23 Mar 2012 06:11:59 +0000
> Bob McElrath <bob@mcelrath.org> wrote:
> 
> > http://superuser.com/questions/387851/a-zfs-or-lvm-or-md-redundant-heterogeneous-storage-proposal/388536
> > 
> > In a nutshell: organize your heterogeneous disks into two "halves" whose total
> > sizes are roughly equal, and create a raid1 array across those two halves.
> 
> This seems to be an extremely simplistic concept and also a very inefficient
> use of storage space, while not even providing enough redundancy (can't
> reliably tolerate an any-two-disks failure even).
>
> I suggest that you go with http://linuxconfig.org/prouhd-raid-for-the-end-user
> instead. Depending on how many drives you have, the widest portion can be raid
> 6, then decreasing to RAID5 for the second stage, then finally to RAID1 for
> the tail.

The algorithm I proposed wastes a lot less space.  The above article wastes 2TB
in its first example, while mine would waste 0 in a raid1: (2TB+1TB+1TB) as one
half mirrored against 4TB as the other.

And I've chosen not to worry about 2-disk failures.

> Also remember that with MD you can also create arrays from arrays. So e.g. a
> RAID0 of two 500GB members can join a RAID6 of 1TB members. More on this idea:
> http://louwrentius.com/blog/2008/08/building-a-raid-6-array-of-mixed-drives/

I'm aware of that, and decided against it.  The way btrfs does things is the way
of the future.  Using multiple raids there are so many layers (md+md+lvm+btrfs)
that it becomes an administration nightmare, and I've had enough of rebuilding
raid arrays by hand for one lifetime.

--
Cheers, Bob McElrath

"The individual has always had to struggle to keep from being overwhelmed by
the tribe.  If you try it, you will be lonely often, and sometimes frightened.
But no price is too high to pay for the privilege of owning yourself." 
    -- Friedrich Nietzsche

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23 16:49   ` Bob McElrath
@ 2012-03-23 17:13     ` Roman Mamedov
  2012-03-23 17:35       ` Bob McElrath
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2012-03-23 17:13 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2127 bytes --]

On Fri, 23 Mar 2012 16:49:32 +0000
Bob McElrath <bob@mcelrath.org> wrote:

> Roman Mamedov [rm@romanrm.ru] wrote:
> (can't reliably tolerate an any-two-disks failure even).

s/even/event/

> > I suggest that you go with http://linuxconfig.org/prouhd-raid-for-the-end-user
> > instead. Depending on how many drives you have, the widest portion can be raid
> > 6, then decreasing to RAID5 for the second stage, then finally to RAID1 for
> > the tail.
> 
> The algorithm I proposed wastes a lot less space.  The above article wastes 2TB
> in its first example, while mine would waste 0 in a raid1: (2TB+1TB+1TB) as one
> half mirrored against 4TB as the other.

Aye, but I consider space used for redundancy to be wasted as well, especially
when the same (or even higher) amount of redundancy can be achieved by spending
less storage space on it.

E.g. I'd consider a 16-disk LINEAR+RAID1 (which is kinda what your algorithm
is) more wasteful than a 16-disk RAID6.  Even with varying disk sizes, using
PROUHD and also implementing "stackable" RAIDs where needed, you can either
achieve complete coverage with parity-based redundancy (RAID5 and RAID6), or
at worst fall back to mirror-based redundancy for only a small "tail" portion
of the volume.

> I'm aware of that, and decided against it.  The way btrfs does things is the way
> of the future.  Using multiple raids there are so many layers (md+md+lvm+btrfs)
> that it becomes an administration nightmare, and I've had enough of rebuilding
> raid arrays by hand for one lifetime.

Again no argument here, just wanted to throw the link out there as it was an
eye opener for me, and for my primary storage I currently use a 6-member RAID6
consisting of 5x 2TB physical disks and a 2TB RAID0 from 1.5TB+500GB (yes,
mdadm can also do RAID0 of differently-sized drives! it'll stripe while it
can, and after that it's just the tail of the larger drive).

Sorry for all the mdadm off-topic. :)

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23 17:13     ` Roman Mamedov
@ 2012-03-23 17:35       ` Bob McElrath
  2012-03-25 11:48         ` Chris Samuel
  0 siblings, 1 reply; 9+ messages in thread
From: Bob McElrath @ 2012-03-23 17:35 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

Roman Mamedov [rm@romanrm.ru] wrote:
> On Fri, 23 Mar 2012 16:49:32 +0000
> Aye, but I consider space used for redundancy to be wasted as well, especially
> when the same (or even higher) amount of redundancy can be achieved by spending
> less storage space on it.

Point taken.  So how's the btrfs raid6 implementation coming along?  ;)

> Again no argument here, just wanted to throw the link out there as it was an
> eye opener for me, and for my primary storage I currently use a 6-member RAID6
> consisting of 5x 2TB physical disks and a 2TB RAID0 from 1.5TB+500GB (yes,
> mdadm can also do RAID0 of differently-sized drives! it'll stripe while it
> can, and after that it's just the tail of the larger drive).

I actually came up with that same algorithm, before switching to the simpler
raid1 arrangement.  The former provides more redundancy, at the expense of a
*lot* more complexity.  Having rebuilt raid arrays by hand, it's not so
difficult to make a mistake there, and nuke your array, rendering your fancy
2-disk failure protection moot.  e.g. one can also issue the wrong set of
commands to btrfs and zero the superblock by accident (or so I read)...in my
case it was a motherboard that rearranged sda/sdb/sdc on each boot, and older
mdadm which didn't handle that gracefully.  If that author had provided some
nice scripts that do what he described, I'd test it, but I didn't see any...

In my proposal I'm unhappy to have to use lvm at all, and would like to remove
that dependency, in the interest of fewer chances to fuck up during a
failure/rebuild.

I'm still dreaming of a fs/admin tool that I can throw disks at, and not have to
spend so much time with the details of partitioning/raid/lvm/fs.  Imagine a
"pool" with check-boxes for how much redundancy you want, and it tells you how
much space you'll have.

--
Cheers, Bob McElrath

"The individual has always had to struggle to keep from being overwhelmed by
the tribe.  If you try it, you will be lonely often, and sometimes frightened.
But no price is too high to pay for the privilege of owning yourself." 
    -- Friedrich Nietzsche

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23 10:20 ` Hugo Mills
@ 2012-03-24  7:15   ` Duncan
  0 siblings, 0 replies; 9+ messages in thread
From: Duncan @ 2012-03-24  7:15 UTC (permalink / raw)
  To: linux-btrfs

Hugo Mills posted on Fri, 23 Mar 2012 10:20:07 +0000 as excerpted:

> On Fri, Mar 23, 2012 at 06:11:59AM +0000, Bob McElrath wrote:
>> Greetings butter-heads,
>> 
>> I would like to implement a redundant (raid1) disk array on
>> heterogeneous disks using btrfs.  A more detailed description of what I
>> want to do can be found here:
>> 
>> http://superuser.com/questions/387851/a-zfs-or-lvm-or-md-redundant-heterogeneous-storage-proposal/388536
>> 
>> In a nutshell: organize your heterogeneous disks into two "halves" whose
>> total sizes are roughly equal, and create a raid1 array across those two
>> halves.
>> 
>> For various reasons I decided to go with btrfs over zfs.  What I have
>> done is to create two lvm Logical Volumes, one using a single large
>> disk, and another as a linear concatenation of several smaller disks. 
>> It works, so far, and I could automate it with some scripts.
> 
>    btrfs doesn't quite do things this way. As well as the FAQ
> suggested by Carey, you might want to look at the (rather misnamed)
> SysadminGuide on the wiki at http://btrfs.ipv5.de/ .

[ This post is targeted at the OP, but replied to Hugo as I'm building on 
what he stated. ]

Let me agree with the others here, but add a bit more.  You REALLY want 
to read the wiki, not just the FAQ and the sysadmin guide, but most of 
it, including the multi-device page and the use-cases page.

Additionally, let me stress that btrfs isn't stable yet.  The btrfs 
option in the kernel config not only says it's still experimental, but 
says that it's only appropriate for testing with non-critical data.  The 
wiki also makes the point on multiple pages that btrfs is still under 
heavy development.  Here's a bit of what it says on the source-code 
repositories page, for instance:

>>>>>

Since 2.6.29-rc1, Btrfs has been included in the mainline kernel.

Warning: Btrfs evolves very quickly; do not test it unless:

    You have good backups and you have tested the restore capability
    You have a backup installation that you can switch to when something
    breaks
    You are willing to report any issues you find
    You can apply patches and compile the latest btrfs code against your
    kernel (quite easy with git and dkms, see below)
    You acknowledge that btrfs may eat your data
    Backups! Backups! Backups!

<<<<<

Believe it!  While a lot of people are already using btrfs for their data 
without the level of backups and reliability skepticism displayed above 
and some may indeed never have problems, this list seems to get about two 
posts a week from folks who have lost data and are trying to recover it, 
not so they can better help with testing, but because they did NOT have 
reliable and tested current backups of the data on their btrfs 
filesystems!

With mature filesystems, you make backups in case of failure, but don't 
necessarily expect to have to use them.  With filesystems in development 
as is btrfs, the data on the filesystem should be considered test data, 
only copied to the filesystem for testing purposes, with what you 
consider your primary copy safely stored... as well as backed up... 
elsewhere.

Don't be one of those couple posts a week!

If that hasn't scared you off until such time as btrfs is a bit more 
mature, and yes, you have that sort of reliable primary storage and 
backup system in use, and btrfs will indeed only be used for data that 
you're willing to lose at any time... then there's a reasonable chance 
btrfs at this stage is ready for your testing. =:^)


It should now go without saying, but I'll repeat again another point the 
wiki makes repeatedly.  Given the rate at which btrfs is developing, if 
you're running it, you REALLY want to be running at LEAST the latest 
Linus tree stable release, now 3.3. (FWIW, btrfs updates don't hit the 
stable tree as btrfs isn't stable, so while you might do stable updates 
for security or other reasons, don't expect them to contain btrfs 
updates.  Use the -rcs or Chris's tree for that.)  Running the current 
development kernel, at least after rc1, is better, and for the latest, 
use the for-linus branch of Chris's tree or even keep up with the patches 
on the list.

That's why cwillu said 3.0 is "seriously out of date" -- for the purposes 
of those that *SHOULD* be testing/running btrfs at this point, it *IS* 
seriously out of date!  People should be on 3.2 at the very oldest, and 
preferably be updating to 3.3 by now, if they're testing/running btrfs at 
the stage it is now.  If they're not prepared for that sort of kernel 
update cycle, they shouldn't be on btrfs at this point.


You'll also want to read up on the userspace tools on the wiki, as the 
latest release, 0.19, is wayyy old and you'll probably want to do live-
git updates.  However, the userspace tools aren't quite as critical as 
the kernel since updates there are mostly to take advantage of new stuff 
in the kernel; the old stuff generally still works about as it did, so as 
long as you're /reasonably/ current there, you're probably fine.


Meanwhile, btrfs' so-called raid1 is really mis-named as it isn't in fact 
raid1 in the traditional sense at all.  Instead, it's two-way-mirroring 
only.  True raid1 on three or more devices would have as many copies as 
there are devices, while (current mainline) btrfs only does two-way 
mirroring, regardless of the number of devices.  There's patches around 
to add multi-way-mirroring (at least three-way, I'm not sure whether it's 
true-N-way, or only adding specifically three-way to the existing two-
way), but the feature is only available as patches, at this point.  It's 
planned for merge after the raid5/6 code is merged, which may be this 
cycle (3.4), so multi-way-mirroring might be kernel 3.5.

As it happens, the current two-way-mirroring suits your needs just fine, 
tho, so that bit is not a problem.  Again, that's why cwillu stated that 
btrfs already should work as you need, and why all three of us have 
pointed you at the wiki.  In particular, as Hugo mentions, the sysadmin 
page describes how btrfs arranges chunks, etc, across multiple drives to 
get the two-way-mirroring in its so-called raid1 and raid10 modes.

>> In the long term, I would like this to be something that btrfs could do
>> by itself, without LVM.

I can't fault you there!  FWIW I'm using md/raid here (my critical btrfs 
feature is N-way mirroring, which isn't there yet, as explained above, so 
I'm not even testing it yet, just keeping up with developments), but 
after initially trying LVM on md/raid, I decided the stacking was WAYYY 
too complex to have the necessary level of confidence in my ability to 
rebuild, especially under an already heavily stressful recovery scenario 
where I had limited access to documentation, etc.

So here, I'm using (gpt) partitioned md/raid on (gpt) partitioned 
hardware devices -- no lvm, no initrd, assemble the md containing the 
rootfs from the kernel commandline (via grub) and boot directly to it, 
multiple md/raid devices configured such that I have working and backup 
instances of most filesystems (including the rootfs, so I can switch to 
the backup directly from grub) on separate mds, and everything split up 
such that recovering individual mds goes quite fast as they're each only 
a few gigs to a hundred gigs or so each.

With LVM, not only did I have the additional complexity of the additional 
layer and separate administrative command set to master, but since lvm 
requires userspace configuration, I had to keep at least the rootfs and 
its backup on mds not handled by lvm, which substantially reduced the 
benefit of LVM in the first place.  It simply wasn't worth it!

So I'm with you all the way on wishing to be rid of the LVM layer, for 
sure!  Being rid of md as well would be nice, but as I said, btrfs 
doesn't have the three-way-plus mirroring I'm looking for yet, and even 
if it did, it's not mature enough to kill at least my backup mds, tho I'm 
leading edge enough that I'd consider it to replace my primary mds if I 
could.

>>  Having absolutely no knowledge of the btrfs
>> code, this seems easy, I'm sure you'll tell me otherwise.  ;)  But one
>> needs:
>> 
>> 1) The ability to "group" a heterogeneous set of disks into the
>> "halves" of a raid1.  I don't understand what btrfs is doing if you
>> give it more than 2 devices and ask for raid1.

Again, see that sysadmin page on the wiki.  Basically, it distributes two 
copies of each block, making sure each copy is on a different device, 
among all the devices.  But the diagram on that page explains it better 
than I can, here.

The difference against your description, however, is that while any 
single device can fail without loss, the distribution is such that you 
don't have two "sides", thus allowing multiple devices on the same side 
to fail as long as the other side is fine.  However, you state that 
single device failure is all you're looking to cover, so you should be 
fine.
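
You can see this for yourself with a few loop devices (a sketch; the loops 
are assumed to be set up as in your original test).  After writing some data, 
all of the devices show allocation, because each chunk's two copies go to 
whichever pair of devices currently has the most free space, not to fixed 
halves:

    mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1 /dev/loop2
    mount /dev/loop0 /mnt
    cp -a /some/data /mnt       # anything big enough to allocate a few chunks
    btrfs filesystem show       # "used" grows on all three devices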

>> 2) Intelligently rebalance when a new device is added or removed (e.g.
>> rearrange the halves, and rebalance as necessary)
> 
>    A balance operation is incredibly expensive. It would be much
> better to have a complex policy on when to rebalance. Think of trying to
> add two new disks to a nearly-full 20TB array: you really don't want to
> have to wait for 20TB of data to be rewritten before you add the second
> drive. Such a complex policy doesn't belong in the kernel (and probably
> doesn't belong in code, unless you've got some mind-reading software, or
> a webcam and enough image-processing to identify the stack of disks on
> the admin's desk).
> 
>    I'm not trying to argue that you shouldn't automatically rebalance
> after a new device is added, but more that the feature probably
> shouldn't be in the kernel.

Agreed.  There will likely be scripts available with that sort of 
intelligence if there aren't already, but they'd need to be customized to a 
degree that's definitely not appropriate for kernelspace, and probably not 
for the btrfs-tools userspace either, except perhaps as part of a collection 
of optional scripts, which would likely also include the snapshot scheduler 
scripts already discussed on other recent list threads.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: heterogeneous raid1
  2012-03-23 17:35       ` Bob McElrath
@ 2012-03-25 11:48         ` Chris Samuel
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Samuel @ 2012-03-25 11:48 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 734 bytes --]

On Saturday 24 March 2012 04:35:39 Bob McElrath wrote:

> I'm still dreaming of a fs/admin tool that I can throw disks at,
> and not have to spend so much time with the details of
> partitioning/raid/lvm/fs.

There was a tool called System Storage Manager (ssm) that someone from 
RedHat posted about late last year:

http://www.redhat.com/archives/linux-lvm/2011-December/msg00012.html

Unfortunately it looks like the git repo on SourceForge hasn't been 
touched since the code was pushed last December. :-(

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-03-25 11:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-23  6:11 heterogeneous raid1 Bob McElrath
2012-03-23  6:47 ` cwillu
2012-03-23 10:20 ` Hugo Mills
2012-03-24  7:15   ` Duncan
2012-03-23 10:44 ` Roman Mamedov
2012-03-23 16:49   ` Bob McElrath
2012-03-23 17:13     ` Roman Mamedov
2012-03-23 17:35       ` Bob McElrath
2012-03-25 11:48         ` Chris Samuel
