* understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices
@ 2017-06-01 14:54 Alexander Peganz
  2017-06-01 17:55 ` Timofey Titovets
  2017-06-01 18:47 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 5+ messages in thread
From: Alexander Peganz @ 2017-06-01 14:54 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I am trying to understand what differences there are in using btrfs
raid1 vs raid10 in terms of recoverability and also performance.
This has proven itself to be more difficult than expected since all
search results I could come up with generally suffer from one of three
flaws: they either discuss terribly old versions of btrfs, only
discuss 4 disk settings, or are about traditional HW (or mdadm) RAID
modes.

From what I gathered so far, with raid1 btrfs just puts the 2 copies
of a file on 2 different devices.
And raid10 splits files into stripes, then writes 2 copies of each
stripe to 2 different devices. By splitting the files into stripes it
can write stripe 1 to devices A and B, while at the same time writing
stripe 2 to devices C and D, and so on. So a single copy of a file
might end up split across all devices, as does the second, but with
the stripes distributed in a way that the copies of each one stripe
are never on the same device.

So my first question is: is that actually correct? Or does btrfs raid1
create copies of blocks or something akin to stripes instead of files?
Because I imagine if it is at the file level there is a difference in
recoverability if the "wrong" 2 devices die.
For a raid1 I'd expect to only lose those files whose copies were
located on those 2 devices. Every file with a copy on one of the still
working devices would be recoverable. So the more devices there are,
the bigger the percentage of recoverable files could get.
With raid10, however, the copies of every file's first stripe might end up
on device A and device B, damaging every single file if A and B die at
the same time.
This might just be a reason for me to choose raid1 over raid10, so I'd
really appreciate it if someone could enlighten me ;)

As to performance, with raid1 write speed should (theoretically) be
the same as a single disk. (Writing the first half of the data to
device A while simultaneously writing the second half to device B
would let the first copy be written in half the time, with the second
copy created at some later point, but I highly doubt btrfs is quite
that adventurous.) And read speeds should be up to twice those of a
single device.
With raid10 write speeds should be N times those of a single disk to
create the first copy, and since of course a second one has to be
written as well, effectively up to N/2. Read speeds should be up to N
times that of a single disk. But I couldn't find useful comparisons
using more than 4 devices. Should I expect any weirdness if I don't
have a multiple of 4 devices? Or do I just need an even number of
devices? Or is everything ok, even odd numbers?

And finally, could using raid10 cause me more headache than raid1
farther down the line when adding additional devices? How about if
those devices are not the same size as the original ones, any
difference between raid1 and 10?

Thank you for your help!
Alexander


* Re: understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices
  2017-06-01 14:54 understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices Alexander Peganz
@ 2017-06-01 17:55 ` Timofey Titovets
  2017-06-01 19:26   ` Marat Khalili
  2017-06-01 18:47 ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 5+ messages in thread
From: Timofey Titovets @ 2017-06-01 17:55 UTC (permalink / raw)
  To: Alexander Peganz; +Cc: linux-btrfs

2017-06-01 17:54 GMT+03:00 Alexander Peganz <a.peganz@gmail.com>:
> Hello,
>
> I am trying to understand what differences there are in using btrfs
> raid1 vs raid10 in terms of recoverability and also performance.
> This has proven itself to be more difficult than expected since all
> search results I could come up with generally suffer from one of three
> flaws: they either discuss terribly old versions of btrfs, only
> discuss 4 disk settings, or are about traditional HW (or mdadm) RAID
> modes.
>
> From what I gathered so far, with raid1 btrfs just puts the 2 copies
> of a file on 2 different devices.
> And raid10 splits files into stripes, then writes 2 copies of each
> stripe to 2 different devices. By splitting the files into stripes it
> can write stripe 1 to devices A and B, while at the same time writing
> stripe 2 to devices C and D, and so on. So a single copy of a file
> might end up split across all devices, as does the second, but with
> the stripes distributed in a way that the copies of each one stripe
> are never on the same device.
>
> So my first question is: is that actually correct? Or does btrfs raid1
> create copies of blocks or something akin to stripes instead of files?
> Because I imagine if it is at the file level there is a difference in
> recoverability if the "wrong" 2 devices die.
> For a raid1 I'd expect to only lose those files whose copies were
> located on those 2 devices. Every file with a copy on one of the still
> working devices would be recoverable. So the more devices there are,
> the bigger the percentage of recoverable files could get.
> With raid10, however, the copies of every file's first stripe might end up
> on device A and device B, damaging every single file if A and B die at
> the same time.
> This might just be a reason for me to choose raid1 over raid10, so I'd
> really appreciate it if someone could enlighten me ;)

Btrfs uses an abstraction layer and does not stripe files.
Btrfs has chunks: chunks store data, and each chunk has a profile
(raid1, raid10, etc.).
For profiles like single, dup and raid1, the next chunk is allocated on
the device(s) with the least used space.
So in the general case, yes, raid1 data will be spread more or less
randomly over the disks.
Btrfs has some problems with raid profiles, so in general they must be
used with care.
(For example, if you lose 1 disk in a 2-disk raid1, you will have
problems remounting the filesystem read-write afterwards, etc.)
In general, follow a simple rule: you can't restore data if you lose 2
disks in btrfs raid1/10, regardless.
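The chunk placement Timofey describes can be illustrated with a toy model. This is a minimal sketch, not the allocator in fs/btrfs/volumes.c: it assumes fixed-size data chunks and puts both copies of each new chunk on the two devices with the most free space (for equal-sized disks, the same thing as "least used space"). `allocate_raid1`, `lost_chunks` and `CHUNK_GIB` are hypothetical names. The model only counts data chunks that lose both copies; it ignores metadata, whose loss can make far more data unreachable, so treat it as an illustration of placement rather than a recovery estimate.

```python
# Toy model of btrfs raid1 chunk placement, as described in this thread.
# Assumptions (not taken from the kernel source): every data chunk uses the
# same fixed amount of space per device, and the two copies of each new
# chunk go to the two devices with the most free space, ties broken by name.

CHUNK_GIB = 1  # hypothetical per-device chunk size used by this model


def allocate_raid1(free_gib: dict[str, int], n_chunks: int) -> list[tuple[str, str]]:
    """Return the (device, device) pair holding the two copies of each chunk."""
    free = dict(free_gib)  # work on a copy so the caller's dict is untouched
    placements = []
    for _ in range(n_chunks):
        # Most free space first; the name is only a deterministic tie-breaker.
        ranked = sorted(free, key=lambda dev: (-free[dev], dev))
        first, second = ranked[0], ranked[1]
        if free[second] < CHUNK_GIB:
            raise RuntimeError("no two devices have room for another raid1 chunk")
        free[first] -= CHUNK_GIB
        free[second] -= CHUNK_GIB
        placements.append((first, second))
    return placements


def lost_chunks(placements: list[tuple[str, str]], failed: set[str]) -> int:
    """A raid1 chunk is unrecoverable only if both copies were on failed devices."""
    return sum(1 for a, b in placements if a in failed and b in failed)


placements = allocate_raid1({"A": 10, "B": 10, "C": 10}, 15)
print(sorted(set(placements)))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
print(lost_chunks(placements, {"A", "B"}), "of", len(placements), "chunks lost")  # 5 of 15 chunks lost
```

With three equal disks the model cycles through all three device pairs, so losing any two disks takes out every chunk stored on exactly that pair.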

> As to performance, with raid1 write speed should (theoretically) be
> the same as a single disk. (Writing the first half of the data to
> device A while simultaneously writing the second half to device B
> would let the first copy be written in half the time, with the second
> copy created at some later point, but I highly doubt btrfs is quite
> that adventurous.) And read speeds should be up to twice those of a
> single device.

raid1 writes data to all disks synchronously, all the time; no tricks.
btrfs raid1 chooses which copy to read by PID % 2:
0 - first copy
1 - second copy

In general, think of raid1 as a single disk in terms of performance.
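The PID-based read selection can be sketched as follows. This assumes, per the summary above, a plain `pid % num_copies` choice; `preferred_copy` and `read_load` are hypothetical helpers, and the sketch ignores the kernel's handling of missing devices. It shows why a single reader only ever gets single-disk read speed, while several readers can spread over both copies.

```python
# Sketch of the raid1 read policy described above: the copy a reader uses is
# chosen by its process ID modulo the number of copies. The real kernel code
# also skips missing devices; that is not modelled here.

def preferred_copy(pid: int, num_copies: int = 2) -> int:
    """0 = first copy, 1 = second copy (for the usual 2-copy raid1)."""
    return pid % num_copies


def read_load(reader_pids: list[int], num_copies: int = 2) -> list[int]:
    """Count how many read requests end up on each copy."""
    load = [0] * num_copies
    for pid in reader_pids:
        load[preferred_copy(pid, num_copies)] += 1
    return load


# One process issuing 1000 reads: every read goes to the same copy.
print(read_load([4242] * 1000))  # [1000, 0]

# Four processes issuing 250 reads each: the load splits across both copies.
print(read_load([pid for pid in (100, 101, 102, 103) for _ in range(250)]))  # [500, 500]
```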

> With raid10 write speeds should be N times those of a single disk to
> create the first copy, and since of course a second one has to be
> written as well, effectively up to N/2. Read speeds should be up to N
> times that of a single disk. But I couldn't find useful comparisons
> using more than 4 devices. Should I expect any weirdness if I don't
> have a multiple of 4 devices? Or do I just need an even number of
> devices? Or is everything ok, even odd numbers?

Write speed is always N/2; read speed is between N/2 and N.
For raid10 you need 4 or more disks; 5 is also supported.

> And finally, could using raid10 cause me more headache than raid1
> farther down the line when adding additional devices? How about if
> those devices are not the same size as the original ones, any
> difference between raid1 and 10?

Btrfs creates chunks, and data is stored in chunks. In general you will
have more flexibility with raid1, because you don't need to rebalance
the whole FS after adding new disks, and btrfs will be more predictable
in terms of space usage with mixed-size devices.
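The space-usage predictability for raid1 can be made concrete. A minimal sketch, assuming the commonly used estimate for a layout with exactly 2 copies: usable space is roughly min(total / 2, total - largest device). `raid1_usable` is a hypothetical helper, and real free-space numbers will come out somewhat lower.

```python
# Rough usable-capacity estimate for btrfs raid1 (exactly 2 copies) with
# mixed-size devices. Assumption: chunk granularity, metadata and reserved
# space are ignored, so real numbers will be somewhat lower.

def raid1_usable(sizes_tb: list[float]) -> float:
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    # Each copy must sit on a different device, so the largest device can only
    # be filled as far as the other devices can hold the matching copies.
    return float(min(total / 2, total - largest))


print(raid1_usable([2, 2, 2]))  # 3.0
print(raid1_usable([1, 1, 4]))  # 2.0 (2 TB of the 4 TB disk can never be paired)
```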

So, if you want performance, use raid10; if data availability is
enough, use raid1.

Thanks.
-- 
Have a nice day,
Timofey.


* Re: understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices
  2017-06-01 14:54 understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices Alexander Peganz
  2017-06-01 17:55 ` Timofey Titovets
@ 2017-06-01 18:47 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 5+ messages in thread
From: Austin S. Hemmelgarn @ 2017-06-01 18:47 UTC (permalink / raw)
  To: Alexander Peganz, linux-btrfs

On 2017-06-01 10:54, Alexander Peganz wrote:
> Hello,
> 
> I am trying to understand what differences there are in using btrfs
> raid1 vs raid10 in terms of recoverability and also performance.
> This has proven itself to be more difficult than expected since all
> search results I could come up with generally suffer from one of three
> flaws: they either discuss terribly old versions of btrfs, only
> discuss 4 disk settings, or are about traditional HW (or mdadm) RAID
> modes.
> 
> From what I gathered so far, with raid1 btrfs just puts the 2 copies
> of a file on 2 different devices.
> And raid10 splits files into stripes, then writes 2 copies of each
> stripe to 2 different devices. By splitting the files into stripes it
> can write stripe 1 to devices A and B, while at the same time writing
> stripe 2 to devices C and D, and so on. So a single copy of a file
> might end up split across all devices, as does the second, but with
> the stripes distributed in a way that the copies of each one stripe
> are never on the same device.
Kind of, except for two things:
1. BTRFS doesn't replicate or stripe at the file level.  BTRFS uses a 
two-stage allocator, allocating chunks of disk space for various block 
types, then allocating blocks within those chunks, and the striping and 
replication is done at the chunk level (so how a block is 
replicated/striped is a property of what chunk it is stored in).  Note 
that this is not exactly the same as conventional RAID, which stripes or 
replicates at either the block (RAID 0, 1, 4, 5, 6 and 10) or bit (RAID 2 
and 3) level.  This doesn't have much impact on how it behaves from a 
userspace perspective though unless you're part way through converting 
profiles and you interrupt the conversion, in which case any given file 
_might_ have different replication profiles for different parts.
2. BTRFS will use a number of devices for each stripe in a raid10 setup 
equal to the total number of devices in the array, divided by 2, rounded 
down.  So if you have 4 or 5 devices, each stripe will be across 2 
devices, but if you have 6 or 7, each stripe will be across 3 devices. 
This also happens at the chunk level, so if you have devices of 
different sizes, you may get variable stripe widths depending on how 
many devices have free space when a chunk is allocated.
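Point 2 above can be written as a small formula. A minimal sketch, assuming the rule exactly as stated (count the devices that still have free space, each stripe spans half of them rounded down, at least 4 devices needed); `raid10_stripe_width` and `per_device_gib` are hypothetical names.

```python
# Stripe width for a new btrfs raid10 chunk, following the description above:
# count the devices that still have free space; each stripe spans half of
# them, rounded down. A minimum of 4 devices per chunk is assumed.

def raid10_stripe_width(free_gib: list[int], per_device_gib: int = 1) -> int:
    """Number of devices one copy of the chunk is striped across."""
    usable = sum(1 for free in free_gib if free >= per_device_gib)
    if usable < 4:
        raise ValueError("raid10 needs at least 4 devices with free space")
    return usable // 2


for n in (4, 5, 6, 7):
    print(n, "devices ->", raid10_stripe_width([1000] * n))
# 4 devices -> 2, 5 devices -> 2, 6 devices -> 3, 7 devices -> 3

# Mixed sizes: once the two small disks are full, new chunks get narrower.
print(raid10_stripe_width([0, 0, 800, 800, 800, 800]))  # 2
```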
> 
> So my first question is: is that actually correct? Or does btrfs raid1
> create copies of blocks or something akin to stripes instead of files?
> Because I imagine if it is at the file level there is a difference in
> recoverability if the "wrong" 2 devices die.
> For a raid1 I'd expect to only loose those files whose copies were
> located on those 2 devices. Every file with a copy on one of the still
> working devices would be recoverable. So the more devices there are
> the bigger the percentage of recoverable files could get.
> While with raid10 the copies of every file's first stripe might end up
> on device A and device B, damaging every single file if A and B die at
> the same time.
> This might just be a reason for me to choose raid1 over raid10, so I
> really appreciate if someone could enlighten me ;)
OK, to expound a bit more on this:
* BTRFS raid1 is currently exactly 2 copies.  This is different from LVM 
or MD RAID1, which have a number of replicas equal to the number of 
devices.  This means that if you lose 2 disks from a 3 disk BTRFS raid1 
volume, you will probably lose data, and the filesystem will refuse to 
mount.
* BTRFS raid10 is also exactly 2 copies, but there isn't a consistent 
mapping of devices to strips (segments of stripes), and it's not smart 
enough to fix things properly when you're missing different parts of 
each replica.  This in turn means that just like raid1 mode, if you lose 
2 disks, you've effectively got a dead filesystem.

Given this, the general consensus is that you only use raid10 mode if 
you need the best possible performance (and can't use more complicated 
setups, see the end of my response for suggestions regarding that), and 
use raid1 mode otherwise since it's marginally more reliable and it's 
more likely to allow you to recover entire files from a broken 
filesystem than raid10 mode is.
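The difference in recoverability can be illustrated by counting how many possible chunk layouts a given double-disk failure damages. This is a minimal sketch under a loudly simplifying assumption: every possible placement of a chunk is equally likely, and a raid10 chunk spans an even number of devices with consecutive positions forming mirror pairs (the real allocator does not enumerate layouts this way). The function names are hypothetical. Under this model the same two-disk failure damages a larger share of raid10 chunks, and since raid10 stripes data across the pairs, a damaged raid10 chunk leaves holes in most extents stored in it.

```python
# Compare how much of the chunk space a specific double-disk failure damages
# under btrfs raid1 and raid10. Assumption (not from the kernel): every
# possible placement of a chunk is equally likely, and a raid10 chunk spans
# an even number of devices, with consecutive positions forming mirror pairs.
from fractions import Fraction
from itertools import combinations, permutations


def raid1_damaged_fraction(devices: list[str], failed: set[str]) -> Fraction:
    layouts = list(combinations(devices, 2))  # the two devices holding the copies
    bad = sum(1 for pair in layouts if set(pair) <= failed)
    return Fraction(bad, len(layouts))


def raid10_damaged_fraction(devices: list[str], failed: set[str]) -> Fraction:
    width = len(devices) - len(devices) % 2  # even number of devices per chunk
    layouts = list(permutations(devices, width))
    bad = 0
    for order in layouts:
        mirror_pairs = [order[i:i + 2] for i in range(0, width, 2)]
        # The chunk is damaged if both copies of any strip sat on failed disks.
        if any(set(pair) <= failed for pair in mirror_pairs):
            bad += 1
    return Fraction(bad, len(layouts))


disks = ["A", "B", "C", "D"]
print(raid1_damaged_fraction(disks, {"A", "B"}))   # 1/6
print(raid10_damaged_fraction(disks, {"A", "B"}))  # 1/3
```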
> 
> As to performance, with raid1 write speed should (theoretically) be
> the same as a single disk. (Writing the first half of the data to
> device A while simultaneously writing the second half to device B
> would let the first copy be written in half the time, with the second
> copy created at some later point, but I highly doubt btrfs is quite
> that adventurous.) And read speeds should be up to twice those of a
> single device.
In theory yes, but in practice, this is not the case.  BTRFS currently 
serializes writes (it only writes to one device at a time), and it will 
only service a given read from a single device.  In practice, this means 
that your write speed in raid1 mode is usually half your write speed for 
single device mode with the same hardware, and your read speed is 
identical between the two for any given thread (but by using multiple 
threads, you can improve this to the theoretical double speed).

The same caveats apply to raid10 mode, with the only difference being 
that the serialization is done per-stripe instead of per-device (at 
least, I know it is for reads, I'm not certain for writes), equating to 
at best N/2 write speed and N/2 read speed for a single thread.
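The numbers above follow from simple arithmetic, sketched here as an idealized model. It assumes writes are fully serialized across devices, each read is served by one device (raid1) or one stripe set of N/2 disks (raid10), and, for the multi-reader raid1 case, that readers' PIDs spread evenly over both copies. `raid1_mb_s` and `raid10_single_thread_mb_s` are hypothetical helpers; seeks, caching and metadata overhead are ignored.

```python
# Back-of-the-envelope throughput model built from the description above:
# writes are serialized (one device at a time), and each read is served by a
# single device (raid1) or a single stripe set (raid10). Idealized: ignores
# seeks, caching and metadata overhead.

def raid1_mb_s(disk_mb_s: float, reader_threads: int = 1) -> dict[str, float]:
    copies = 2
    return {
        # Both copies are written one after the other, so throughput halves.
        "write": disk_mb_s / copies,
        # One device per reader; more readers (with PIDs spread evenly) can
        # use up to both copies.
        "read": disk_mb_s * min(reader_threads, copies),
    }


def raid10_single_thread_mb_s(n_devices: int, disk_mb_s: float) -> float:
    # Best case for one thread: one stripe set of n_devices // 2 disks.
    return (n_devices // 2) * disk_mb_s


print(raid1_mb_s(200.0))                    # {'write': 100.0, 'read': 200.0}
print(raid1_mb_s(200.0, reader_threads=4))  # {'write': 100.0, 'read': 400.0}
print(raid10_single_thread_mb_s(6, 200.0))  # 600.0
```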
> With raid10 write speeds should be N times those of a single disk to
> create the first copy, and since of course a second one has to be
> written as well, effectively up to N/2. Read speeds should be up to N
> times that of a single disk. But I couldn't find useful comparisons
> using more than 4 devices. Should I expect any weirdness if I don't
> have a multiple of 4 devices? Or do I just need an even number of
> devices? Or is everything ok, even odd numbers?
Any number is OK.  BTRFS will intelligently rotate which devices get 
used at the chunk level when it allocates new chunks so that things are 
roughly evenly distributed.  The only important part is that you need a 
minimum of 4 devices for raid10, or 2 for raid1.
> 
> And finally, could using raid10 cause me more headache than raid1
> farther down the line when adding additional devices? How about if
> those devices are not the same size as the original ones, any
> difference between raid1 and 10?
raid1 mode will handle this marginally better than raid10, but you are 
liable to get unexpected behavior when using variably sized devices 
regardless.

Now, if you are willing to use a slightly more complicated setup, you 
can actually get better performance than either option with roughly 
equivalent data safety by using BTRFS in raid1 mode on top of 2 LVM or 
MD RAID0 arrays.  Up until the last few months when I finally finished 
switching everything over to SSDs, this is what I had my systems set up 
for.  It gets you (based on my own testing) roughly 10-40% better 
performance depending on your workload compared to BTRFS raid10 mode, 
and it incurs no penalties in terms of data safety relative to BTRFS 
raid10 mode.  You can also do the same with other RAID levels below 
BTRFS to get varying ratios of performance and data safety (I've tested 
it with RAID1, RAID10, and RAID5, all three work well, but are somewhat 
slow).
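The data-safety side of the layered setup can be checked with a small failure model. A minimal sketch, assuming a RAID0 array is lost as soon as any member disk fails and that BTRFS raid1 across the two arrays keeps all data while at least one array is intact (the degraded-mount issues mentioned earlier in the thread are ignored). The array names and `layered_survives` are hypothetical. In this model every single-disk failure is survivable, and a second failure is survivable only if it hits the same RAID0 array as the first.

```python
# Failure model for BTRFS raid1 on top of two MD (or LVM) RAID0 arrays.
# Assumptions: a RAID0 array is lost as soon as any member disk fails, and
# BTRFS raid1 keeps all data as long as at most one of the two arrays is
# broken (degraded-mount issues are ignored).
from itertools import combinations


def layered_survives(raid0_arrays: tuple[set[str], set[str]], failed: set[str]) -> bool:
    """True if no data is lost: at most one of the two RAID0 arrays is broken."""
    broken = sum(1 for members in raid0_arrays if members & failed)
    return broken <= 1


arrays = ({"A", "B"}, {"C", "D"})  # hypothetical md0 = A+B, md1 = C+D
disks = sorted(arrays[0] | arrays[1])
for pair in combinations(disks, 2):
    print(pair, "survives" if layered_survives(arrays, set(pair)) else "data lost")
# Only ('A', 'B') and ('C', 'D') survive a double failure.
```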


* Re: understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices
  2017-06-01 17:55 ` Timofey Titovets
@ 2017-06-01 19:26   ` Marat Khalili
  2017-06-01 21:32     ` Timofey Titovets
  0 siblings, 1 reply; 5+ messages in thread
From: Marat Khalili @ 2017-06-01 19:26 UTC (permalink / raw)
  To: linux-btrfs

>raid1 writes data to all disks synchronously, all the time; no tricks.
>btrfs raid1 chooses which copy to read by PID % 2:
>0 - first copy
>1 - second copy

Meaning, a single-process database will only ever read one copy? At least the meaning of first/second relative to physical devices depends on the extent, right?
-- 

With Best Regards,
Marat Khalili


* Re: understanding differences in recoverability of raid1 vs raid10 and performance implications of unusual numbers of devices
  2017-06-01 19:26   ` Marat Khalili
@ 2017-06-01 21:32     ` Timofey Titovets
  0 siblings, 0 replies; 5+ messages in thread
From: Timofey Titovets @ 2017-06-01 21:32 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs

2017-06-01 22:26 GMT+03:00 Marat Khalili <mkh@rqc.ru>:
>>raid1 writes data to all disks synchronously, all the time; no tricks.
>>btrfs raid1 chooses which copy to read by PID % 2:
>>0 - first copy
>>1 - second copy
>
> Meaning, a single-process database will only ever read one copy? At least the meaning of first/second relative to physical devices depends on the extent, right?
> --
>
> With Best Regards,
> Marat Khalili

IIRC,
https://github.com/torvalds/linux/blob/dc9edaab90de9441cc28ac570b23b0d2bdba7879/fs/btrfs/volumes.c#L5764

So, for a single-process database, you will read from only one disk.
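The mapping Marat asks about can be sketched as a two-step lookup. A minimal sketch, assuming (as his question does) that each chunk records its two stripes in its own order, and that the copy index is `pid % number_of_copies`; `device_for_read` and the device names and chunk layouts are hypothetical. With two disks and the same order in every chunk, this reduces to the answer above (one disk); if the per-chunk order differs, the same process can touch several disks while always reading the same copy index.

```python
# Illustration of Marat's question: the copy index comes from the PID, but
# which physical device holds "copy 0" is recorded per chunk. Assumption: each
# chunk stores its two stripes in its own order; the device names and chunk
# layouts below are made up for the example.

def device_for_read(chunk_stripes: tuple[str, ...], pid: int) -> str:
    return chunk_stripes[pid % len(chunk_stripes)]


chunks = [("sda", "sdb"), ("sdb", "sdc"), ("sdc", "sda")]  # hypothetical 3-disk raid1
pid = 1000  # a single-process database always reads with the same PID
print([device_for_read(c, pid) for c in chunks])  # ['sda', 'sdb', 'sdc']
```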

-- 
Have a nice day,
Timofey.

