* Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Marc MERLIN @ 2014-03-23 22:56 UTC (permalink / raw)
  To: linux-btrfs

Ok, thanks to the help I got from you, and my own experiments, I've
written this:
http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html

If someone reminds me how to edit the btrfs wiki, I'm happy to copy that
there, or give anyone permission to take part or all of what I wrote
and use it for any purpose.



The highlights are if you're coming from the mdadm raid5 world:

- btrfs does not yet seem to know that if you removed a drive from an
  array and plug it back in later, that drive is out of date. It
  will auto-add the out-of-date drive back into the array, which will
  likely cause data loss by hiding files the array had but the old
  drive didn't. This means you should wipe a drive cleanly before you
  put it back into an array it used to be part of.
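  As a rough sketch (device name hypothetical), wiping such a drive
  before it comes near the array again could look like:

```shell
# Hypothetical stale member /dev/sdx1: clear its btrfs signature so the
# out-of-date drive cannot be auto-added back into the array.
wipefs -a /dev/sdx1
# Or, more bluntly, overwrite the start of the device:
# dd if=/dev/zero of=/dev/sdx1 bs=1M count=16
```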

- btrfs does not deal well with a drive that is present but not
  working. It does not know how to kick it from the array, nor can it
  be removed (btrfs device delete) because this causes reading from
  the drive that isn't working. This means btrfs will try to write to
  the bad drive forever. The solution there is to umount the array,
  remount it with the bad drive missing (it cannot be seen by btrfs,
  or it'll get automounted/added), and then rebuild on a new drive or
  rebuild/shrink the array to be one drive smaller (this is explained
  below).
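  A sketch of that sequence, with hypothetical mount point and device
  names (the dmsetup step matches the dm-crypt trick mentioned later in
  this thread; a plain drive would have to be physically pulled instead):

```shell
# Stop using the array and make the bad drive invisible to btrfs.
umount /mnt/btrfs_pool
dmsetup remove crypt_baddrive     # dm-crypt member; btrfs can no longer see it
# Remount from any surviving member, in degraded mode.
mount -o degraded /dev/mapper/crypt_sda1 /mnt/btrfs_pool
# Either rebuild onto a new drive:
btrfs device add /dev/mapper/crypt_new1 /mnt/btrfs_pool
btrfs device delete missing /mnt/btrfs_pool
# ...or skip the add and just 'delete missing' to shrink the array by one.
```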

- You can add and remove drives from an array and rebalance to
  grow/shrink it without umounting it. Note that this is slow since it
  forces rewriting of all data blocks, and takes about 3H per 100GB
  (or 30H per terabyte) with 10 drives on a dual-core CPU.
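  For example (paths hypothetical), growing a mounted array and
  restriping all existing data onto the new layout:

```shell
btrfs device add /dev/mapper/crypt_sdl1 /mnt/btrfs_pool
btrfs balance start /mnt/btrfs_pool   # rewrites every data block; slow
```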

- If you are missing a drive, btrfs will refuse to mount the array and
  give an obscure error unless you mount with -o degraded.

- btrfs has no special rebuild procedure. Rebuilding is done by
  rebalancing the array. You can actually rebalance a degraded array
  onto a smaller set of drives without adding one, or you can add a
  drive and rebalance onto it, which forces a read/rewrite of all data
  blocks and restripes them nicely.

- btrfs replace does not work yet, but you can easily do btrfs device
  add followed by btrfs device delete of the old drive, which achieves
  the same thing.
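  In other words, something like this (device names hypothetical):

```shell
btrfs device add /dev/mapper/crypt_new1 /mnt/btrfs_pool
btrfs device delete /dev/mapper/crypt_old1 /mnt/btrfs_pool   # migrates data off, then removes it
```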

- btrfs device add will not trigger an auto rebalance. You could choose
  not to rebalance existing data and only have new data be striped
  properly.

- btrfs device delete will force all data from the deleted drive to be
  rebalanced and the command completes when the drive has been freed up.

- The magic command to remove a drive that has already disappeared from
  the system is btrfs device delete missing.

- btrfs doesn't easily tell you that your array is in degraded mode (run
  btrfs fi show, and it'll show a missing drive as well as how much of
  your total data is still on it). This means you can have an array
  that is half degraded: half the files are striped over the current
  drives because they were written after the drive was removed, or were
  rewritten by a rebalance that hasn't finished, while the other half of
  your data is still in degraded mode.
  You can see this by looking at the amount of data on each drive:
  anything on drive 11 is properly striped 10 ways, while anything on
  drive 3 is in degraded mode:

polgara:~# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
        Total devices 11 FS bytes used 564.54GiB
        devid    1 size 465.76GiB used 63.14GiB path /dev/dm-0
        devid    2 size 465.76GiB used 63.14GiB path /dev/dm-1
        devid    3 size 465.75GiB used 30.00GiB path   <- this device is missing
        devid    4 size 465.76GiB used 63.14GiB path /dev/dm-2
        devid    5 size 465.76GiB used 63.14GiB path /dev/dm-3
        devid    6 size 465.76GiB used 63.14GiB path /dev/dm-4
        devid    7 size 465.76GiB used 63.14GiB path /dev/mapper/crypt_sdi1
        devid    8 size 465.76GiB used 63.14GiB path /dev/mapper/crypt_sdj1
        devid    9 size 465.76GiB used 63.14GiB path /dev/dm-7
        devid    10 size 465.76GiB used 63.14GiB path /dev/dm-8
        devid    11 size 465.76GiB used 33.14GiB path /dev/mapper/crypt_sde1 <- this device was added


Hope this helps,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Martin @ 2014-03-24 19:17 UTC (permalink / raw)
  To: linux-btrfs

On 23/03/14 22:56, Marc MERLIN wrote:
> Ok, thanks to the help I got from you, and my own experiments, I've
> written this:
> http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html
> 
> If someone reminds me how to edit the btrfs wiki, I'm happy to copy that
> there, or give anyone permission to take part or all of what I wrote
> and use it for any purpose.
> 
> 
> 
> The highlights are if you're coming from the mdadm raid5 world:
[---]
> 
> Hope this helps,
> Marc

Thanks for the very good summary.

So... In very brief summary, btrfs raid5 is very much a work in progress.


Question: Is the raid5 going to be seamlessly part of the
error-correcting raids whereby raid5, raid6,
raid-with-n-redundant-drives are all coded as one configurable raid?

Also (second question): What happened to the raid naming scheme that
better described the btrfs-style of raid by explicitly numbering the
number of devices used for mirroring, striping, and error-correction?


Thanks,
Martin




* Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Marc MERLIN @ 2014-03-24 21:52 UTC (permalink / raw)
  To: Martin; +Cc: linux-btrfs

On Mon, Mar 24, 2014 at 07:17:12PM +0000, Martin wrote:
> Thanks for the very good summary.
> 
> So... In very brief summary, btrfs raid5 is very much a work in progress.

If you know how to use it, which I didn't until now, it's technically very
usable as is. The corner case is a failing drive which you can't
hot-remove because you can't write to it.
It's unfortunate that you can't just "kill" a drive without umounting,
making the drive disappear so that btrfs can't see it (dmsetup remove
cryptname for me, so it's easy to do remotely), and remounting in degraded
mode.
 
> Question: Is the raid5 going to be seamlessly part of the
> error-correcting raids whereby raid5, raid6,
> raid-with-n-redundant-drives are all coded as one configurable raid?

I'm not sure I parse your question. As far as btrfs is concerned, you can
switch from non-raid to raid5 to raid6 by adding a drive and rebalancing,
which effectively reads and re-writes all the blocks in the new format.

> Also (second question): What happened to the raid naming scheme that
> better described the btrfs-style of raid by explicitly numbering the
> number of devices used for mirroring, striping, and error-correction?

btrfs fi show kind of tells you that if you know how to read it (I didn't
initially). What's missing for you?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Martin @ 2014-03-25  1:11 UTC (permalink / raw)
  To: linux-btrfs

On 24/03/14 21:52, Marc MERLIN wrote:
> On Mon, Mar 24, 2014 at 07:17:12PM +0000, Martin wrote:
>> Thanks for the very good summary.
>>
>> So... In very brief summary, btrfs raid5 is very much a work in progress.
> 
> If you know how to use it, which I didn't until now, it's technically very
> usable as is. The corner case is a failing drive which you can't
> hot-remove because you can't write to it.
> It's unfortunate that you can't just "kill" a drive without umounting,
> making the drive disappear so that btrfs can't see it (dmsetup remove
> cryptname for me, so it's easy to do remotely), and remounting in degraded
> mode.

Yes, looking good, but for my usage I need the option to run ok with a
failed drive. So, that's one to keep a development eye on for continued
progress...


>> Question: Is the raid5 going to be seamlessly part of the
>> error-correcting raids whereby raid5, raid6,
>> raid-with-n-redundant-drives are all coded as one configurable raid?
> 
> I'm not sure I parse your question. As far as btrfs is concerned you can
> switch from non raid to raid5 to raid6 by adding a drive and rebalancing
> which effectively reads and re-writes all the blocks in the new format.

There was a big thread a short while ago about using parity across
n devices, where the parity is spread such that you can have 1, 2, and up
to 6 redundant devices. That goes well beyond just raid5 and raid6:

http://lwn.net/Articles/579034/


>> Also (second question): What happened to the raid naming scheme that
>> better described the btrfs-style of raid by explicitly numbering the
>> number of devices used for mirroring, striping, and error-correction?
> 
> btrfs fi show kind of tells you that if you know how to read it (I didn't
> initially). What's missing for you?

btrfs raid1 at present is always just two copies of data spread
across whatever number of disks you have. A more flexible arrangement
would be to set, say, 3 copies of data across, say, 4 disks.
There's a new naming scheme proposed somewhere that enumerates all the
permutations of numbers of devices, copies and parity that btrfs can
support. For me, that is a 'killer' feature beyond what can be done
with md-raid, for example.


Regards,
Martin





* Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Marc MERLIN @ 2014-03-25  1:29 UTC (permalink / raw)
  To: Martin; +Cc: linux-btrfs

On Tue, Mar 25, 2014 at 01:11:43AM +0000, Martin wrote:
> Yes, looking good, but for my usage I need the option to run ok with a
> failed drive. So, that's one to keep a development eye on for continued
> progress...
 
So it does run with a failed drive; it'll just fill the logs with write
errors but continue working ok.
 
> There's a big thread a short while ago about using parity across
> n-devices where the parity is spread such that you can have 1, 2, and up
> to 6 redundant devices. Well beyond just raid5 and raid6:
> 
> http://lwn.net/Articles/579034/
 
Aah, ok. I didn't understand you meant that. I know nothing about that, but
to be honest, raid6 feels like it's enough for me :)

> btrfs raid1 at present is always just the two copies of data spread
> across whatever number of disks you have. A more flexible arrangement is
> to be able to set to have say 3 copies of data and use say 4 disks.
> There's a new naming scheme proposed somewhere that enumerates all the
> permutations possible for numbers of devices, copies and parity that
> btrfs can support. For me, that is a 'killer' feature beyond what can be
> done with md-raid for example.
 
Right. That's on the roadmap from what I read here, just not ready yet.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: Btrfs and raid5 status with kernel 3.14, documentation, and howto
From: Brendan Hide @ 2014-03-25  8:04 UTC (permalink / raw)
  To: Marc MERLIN, Martin; +Cc: linux-btrfs

On 25/03/14 03:29, Marc MERLIN wrote:
> On Tue, Mar 25, 2014 at 01:11:43AM +0000, Martin wrote:
>> There's a big thread a short while ago about using parity across
>> n-devices where the parity is spread such that you can have 1, 2, and up
>> to 6 redundant devices. Well beyond just raid5 and raid6:
>>
>> http://lwn.net/Articles/579034/
>   
> Aah, ok. I didn't understand you meant that. I know nothing about that, but
> to be honest, raid6 feels like it's enough for me :)

There are a few of us who are very much looking forward to these 
special/flexible RAID types - for example RAID15 (very good performance, 
very high redundancy, less than 50% diskspace efficiency). The csp 
notation will probably make it easier to develop these flexible raid 
types, and it is very much needed in order to manage them well.

A typical RAID15 with 12 disks would be written in csp notation as:
2c5s1p

And some would like to be able to use the exact same redundancy scheme 
even with extra disks:
2c5s1p on 16 disks (note: the example is not 2c7s1p, though that would 
also be a valid scheme, with 16 disks being its minimum number of disks)

The last thread on this (I think) can be viewed at 
http://www.spinics.net/lists/linux-btrfs/msg23137.html, where Hugo also 
explains and lists the notation for the existing schemes.
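The csp notation isn't implemented anywhere yet; as a sketch of the
arithmetic it implies (assuming the minimum device count for XcYsZp is
X * (Y + Z), which matches the 12- and 16-disk figures above):

```shell
# Parse a hypothetical csp string like "2c5s1p" and print the minimum
# number of devices, assuming copies * (stripes + parity) per copy set.
csp_min_devices() {
    c=${1%%c*}          # copies
    rest=${1#*c}
    s=${rest%%s*}       # stripe devices
    p=${rest#*s}
    p=${p%p}            # parity devices
    echo $(( c * (s + p) ))
}

csp_min_devices 2c5s1p   # 12 disks: the RAID15 example above
csp_min_devices 2c7s1p   # 16 disks minimum
```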

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


