* Questions on using BtrFS for fileserver
@ 2014-08-19 16:21 M G Berberich
  2014-08-19 16:56 ` Kyle Manna
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: M G Berberich @ 2014-08-19 16:21 UTC (permalink / raw)
  To: linux-btrfs

Hello,

we are thinking about using BtrFS on standard hardware for a
fileserver with about 50T (100T raw) of storage (25×4TByte).

This is what I understood so far. Is this right?

· incremental send/receive works.

· There is no support for hotspares (spare disks that automatically
  replace a faulty disk).

· BtrFS with RAID1 is fairly stable.

· RAID 5/6 spreads all data over all devices, leading to performance
  problems on large disk arrays, and there is no option to limit the
  number of disks per stripe so far.

Some questions:

· There were reports that bcache with btrfs leads to corruption. Is
  this still the case?

· If a disk fails, does BtrFS rebalance automatically? (This would
  give a kind of hotspare behavior.)

· Besides using bcache, are there any ways to boost performance by
  adding (dedicated) cache SSDs to a BtrFS filesystem?

· Are there any reports/papers/web-pages about BtrFS systems of this
  size in use? Praise, complaints, performance reviews, whatever…

	Regards,
	bmg

-- 
"It doesn't matter at all what gets decided     | M G Berberich
 today: I'm against it anyway!"                 | berberic@fmi.uni-passau.de
(SPD city councillor Kurt Schindler; Regensburg)| www.fmi.uni-passau.de/~berberic


* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
@ 2014-08-19 16:56 ` Kyle Manna
  2014-08-19 19:05 ` Austin S Hemmelgarn
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Kyle Manna @ 2014-08-19 16:56 UTC (permalink / raw)
  To: M G Berberich; +Cc: linux-btrfs

> · Besides using bcache, are there any ways to boost performance by
>   adding (dedicated) cache SSDs to a BtrFS filesystem?

dm-cache is in the mainline kernel and lvm2 recently added support to
make devicemapper configuration automatic.  In my opinion, dm-cache is
a little easier to use because you can add/remove/resize the cache
without recreating the filesystem.  If you're interested, take a peek
at the man page for lvmcache.
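
Roughly, an lvmcache setup looks like this (untested sketch; the VG/LV
names, sizes and the SSD device are placeholders, see lvmcache(7) for
the real procedure):

  # slow origin LV vg0/data already exists on the HDDs; the SSD is /dev/sdz
  vgextend vg0 /dev/sdz
  lvcreate -n data_cache      -L 100G vg0 /dev/sdz
  lvcreate -n data_cache_meta -L 1G   vg0 /dev/sdz
  lvconvert --type cache-pool --poolmetadata vg0/data_cache_meta vg0/data_cache
  lvconvert --type cache --cachepool vg0/data_cache vg0/data

The cache pool can later be detached again (see the splitcache/uncache
operations of lvconvert) without touching the filesystem on vg0/data.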

- Kyle


* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
  2014-08-19 16:56 ` Kyle Manna
@ 2014-08-19 19:05 ` Austin S Hemmelgarn
  2014-08-19 21:09 ` Mitch Harder
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-19 19:05 UTC (permalink / raw)
  To: M G Berberich, linux-btrfs


On 2014-08-19 12:21, M G Berberich wrote:
> Hello,
> 
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
> 
> This is what I understood so far. Is this right?
> 
> · incremental send/receive works.
> 
> · There is no support for hotspares (spare disks that automatically
>   replace a faulty disk).
> 
> · BtrFS with RAID1 is fairly stable.
> 
> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.
> 
> Some questions:
> 
> · There were reports that bcache with btrfs leads to corruption. Is
>   this still the case?
Based on some testing I did last month, bcache with anything has the
potential to cause data corruption.
> 
> · If a disk fails, does BtrFS rebalance automatically? (This would
>   give a kind of hotspare behavior.)
No, but it wouldn't be hard to write a simple monitoring program to do
this from userspace.  IIRC, the big issue is that you need to add a
device in place of the failed one for the rebalance to work.
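
A minimal sketch of that manual recovery, with placeholder device names
and mount point (mount degraded, add a replacement, then drop the
missing device, which rebalances the data onto the new disk):

  mount -o degraded /dev/sdb /mnt/data
  btrfs device add /dev/sdnew /mnt/data
  btrfs device delete missing /mnt/data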
> 
> · Besides using bcache, are there any ways to boost performance by
>   adding (dedicated) cache SSDs to a BtrFS filesystem?
Like mentioned in one of the other responses, I would suggest looking
into dm-cache.  BTRFS itself does not have any functionality for this,
although there has been talk of implementing device priorities for
reads, which could provide a similar performance boost.
> 
> · Are there any reports/papers/web-pages about BtrFS systems of this
>   size in use? Praise, complaints, performance reviews, whatever…
While it doesn't quite fit the description, I have had very good success
with a very active 2TB BTRFS RAID10 filesystem on four unpartitioned 1TB
SATA III hard drives.  The filesystem gets in excess of 100GB of data
written to it each day (almost all rewrites, however) and holds my /home,
/var/log, and /var/lib.  I've had no issues with it that were caused by
BTRFS; in fact, BTRFS is what helped me recover data when the storage
controller the drives are connected to went bad.  On average, I get about
125% of raw disk performance on writes, and about 110% on reads.

If you are using a very large number of disks, then I would suggest
BTRFS RAID1 rather than BTRFS RAID10, as RAID10 will try to stripe
things across ALL of the devices in the filesystem, and unless each
storage controller has no more than about four disks attached to it,
the overhead outweighs the benefit of striping the data.

Also, just to make sure it's clear: in BTRFS RAID1, each block gets
written EXACTLY twice, no matter how many devices are in the filesystem.
On the plus side, this means that if you do set up a caching mechanism,
you may be able to keep most of the array spun down a majority of the
time.
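
(For reference, creating such a RAID1 filesystem is just something along
the lines of the following; the device names are placeholders:

  mkfs.btrfs -d raid1 -m raid1 /dev/sd[b-z]
)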




* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
  2014-08-19 16:56 ` Kyle Manna
  2014-08-19 19:05 ` Austin S Hemmelgarn
@ 2014-08-19 21:09 ` Mitch Harder
  2014-08-19 21:38 ` Andrej Manduch
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mitch Harder @ 2014-08-19 21:09 UTC (permalink / raw)
  To: M G Berberich; +Cc: linux-btrfs

On Tue, Aug 19, 2014 at 11:21 AM, M G Berberich
<btrfs@oss.m-berberich.de> wrote:
> Hello,
>
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
>

I would recommend carefully reading this thread titled: "1 week to
rebuid 4x 3TB raid10 is a long time!"

http://comments.gmane.org/gmane.comp.file-systems.btrfs/36969

There are multiple methods for replacing a device in a Btrfs RAID
array.  If I understand the conclusions of this thread, you might
still expect 12-14 hours to rebuild after replacing a 4 TByte device,
assuming you use the optimal replace commands.
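
(For reference, a sketch of the in-place replace, with placeholder
devid, device name and mount point; this is the single-pass method as
opposed to the older device add plus device delete missing route:

  btrfs replace start 3 /dev/sdnew /srv/data
  btrfs replace status /srv/data
)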

With 25 devices, that leaves an uncomfortable period of time where
another device might fail.


* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
                   ` (2 preceding siblings ...)
  2014-08-19 21:09 ` Mitch Harder
@ 2014-08-19 21:38 ` Andrej Manduch
  2014-08-20 15:23   ` Austin S Hemmelgarn
  2014-08-19 21:50 ` Roman Mamedov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Andrej Manduch @ 2014-08-19 21:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: M G Berberich

Hi,

On 08/19/2014 06:21 PM, M G Berberich wrote:
> · Are there any reports/papers/web-pages about BtrFS systems of this
>   size in use? Praise, complaints, performance reviews, whatever…

I don't know about papers or benchmarks, but a few weeks ago there was
someone who had problems with really long mount times on a btrfs
filesystem of similar size:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36226.html

And I would not recommend 3TB disks. *I'm not a btrfs dev*, but as far
as I know there is quite a difference between rebuilding a disk on real
RAID and on btrfs RAID. The problem is that btrfs does RAID at the
filesystem level, not at the hardware level, so there is more mechanical
overhead on the drives and thus a rebuild takes significantly longer
than on regular RAID.

--
b.


* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
                   ` (3 preceding siblings ...)
  2014-08-19 21:38 ` Andrej Manduch
@ 2014-08-19 21:50 ` Roman Mamedov
  2014-08-20  3:22 ` Marc MERLIN
  2014-08-21 20:20 ` Andrew E. Mileski
  6 siblings, 0 replies; 11+ messages in thread
From: Roman Mamedov @ 2014-08-19 21:50 UTC (permalink / raw)
  To: M G Berberich, linux-btrfs


On Tue, 19 Aug 2014 18:21:52 +0200
M G Berberich <btrfs@oss.m-berberich.de> wrote:

> · BtrFS with RAID1 is fairly stable.

Maybe, but it's not optimized for performance: reads are not balanced in the
most optimal way, and writes may end up being submitted sequentially rather
than in parallel to two devices, resulting in write performance that's way
less than that of a single device.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.

AFAIK Btrfs RAID 5/6 is not yet ready to be used in a production environment.

In your case I would recommend considering Btrfs on top of two 12-disk mdadm
RAID6 arrays, or three 8-disk ones, leaving one HDD as a shared hot spare.

To join the mdadm arrays into a larger block device you can use either LVM, or
Btrfs itself, with the "single" profile for data.
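
A rough sketch of that layout (device names are placeholders; the shared
hot spare would be handled on the mdadm side, e.g. with a spare-group
entry in mdadm.conf):

  mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[b-m]
  mdadm --create /dev/md1 --level=6 --raid-devices=12 /dev/sd[n-y]
  mkfs.btrfs -d single -m raid1 /dev/md0 /dev/md1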

-- 
With respect,
Roman



* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
                   ` (4 preceding siblings ...)
  2014-08-19 21:50 ` Roman Mamedov
@ 2014-08-20  3:22 ` Marc MERLIN
  2014-08-21 20:20 ` Andrew E. Mileski
  6 siblings, 0 replies; 11+ messages in thread
From: Marc MERLIN @ 2014-08-20  3:22 UTC (permalink / raw)
  To: M G Berberich; +Cc: linux-btrfs

On Tue, Aug 19, 2014 at 06:21:52PM +0200, M G Berberich wrote:
> · incremental send/receive works.
 
Yes.
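
A minimal sketch of the incremental flow (paths are placeholders):

  # initial full copy
  btrfs subvolume snapshot -r /data /data/.snap/2014-08-19
  btrfs send /data/.snap/2014-08-19 | btrfs receive /backup
  # later, send only the delta against the previous snapshot
  btrfs subvolume snapshot -r /data /data/.snap/2014-08-20
  btrfs send -p /data/.snap/2014-08-19 /data/.snap/2014-08-20 | btrfs receive /backup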

> · There is no support for hotspares (spare disks that automatically
>   replace a faulty disk).

Correct

> · BtrFS with RAID1 is fairly stable.

From what I know.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.

Not sure about the performance issue, but either way, don't use RAID5/6
with btrfs for anything other than playing around. The code is not
finished.

> · If a disk fails, does BtrFS rebalance automatically? (This would
>   give a kind of hotspare behavior.)
 
No, not for raid5/6.

> · Are there any reports/papers/web-pages about BtrFS systems of this
>   size in use? Praise, complaints, performance reviews, whatever…

Use md-raid5, which is tried and true, and put btrfs on top.
And still keep backups: be ready for btrfs to become unusable (speed
and/or deadlocks), get trashed, or hit some other problem.
It's not guaranteed to happen, but the odds are far from 0 either,
so either your data is throwaway, or you need good backups.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: Questions on using BtrFS for fileserver
  2014-08-19 21:38 ` Andrej Manduch
@ 2014-08-20 15:23   ` Austin S Hemmelgarn
  0 siblings, 0 replies; 11+ messages in thread
From: Austin S Hemmelgarn @ 2014-08-20 15:23 UTC (permalink / raw)
  To: Andrej Manduch, linux-btrfs; +Cc: M G Berberich

On 08/19/2014 05:38 PM, Andrej Manduch wrote:
> Hi,
> 
> On 08/19/2014 06:21 PM, M G Berberich wrote:
>> · Are there any reports/papers/web-pages about BtrFS systems of this
>>   size in use? Praise, complaints, performance reviews, whatever…
> 
> I don't know about papers or benchmarks, but a few weeks ago there was
> someone who had problems with really long mount times on a btrfs
> filesystem of similar size:
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36226.html
> 
> And I would not recommend 3TB disks. *I'm not a btrfs dev*, but as far
> as I know there is quite a difference between rebuilding a disk on real
> RAID and on btrfs RAID. The problem is that btrfs does RAID at the
> filesystem level, not at the hardware level, so there is more mechanical
> overhead on the drives and thus a rebuild takes significantly longer
> than on regular RAID.
It really surprises me that so many people come to this conclusion, but
maybe they don't provide as much slack space as I do on my systems.  In
general you will only have a longer rebuild on BTRFS than on hardware
RAID if the filesystem is more than about 50% full.  On my desktop array
(4x 1TB disks using BTRFS RAID10), I've replaced disks before and it
took less than an hour for the operation.  Of course that array is
usually not more than 10% full.  Interestingly, it took less time to
rebuild this array the last time I lost a disk than it did back when it
was 3x 1TB disks in a BTRFS RAID1, so things might improve overall with
a larger number of disks in the array.


* Re: Questions on using BtrFS for fileserver
  2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
                   ` (5 preceding siblings ...)
  2014-08-20  3:22 ` Marc MERLIN
@ 2014-08-21 20:20 ` Andrew E. Mileski
  6 siblings, 0 replies; 11+ messages in thread
From: Andrew E. Mileski @ 2014-08-21 20:20 UTC (permalink / raw)
  To: linux-btrfs

On 19/08/14 12:21 PM, M G Berberich wrote:
>
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
>
> ...
>
> · Are there any reports/papers/web-pages about BtrFS-systems this size
>    in use? Praises, complains, performance-reviews, whatever…

For what it is worth, I am running two btrfs filesystems:
1. Primary:  25 TiB, hardware RAID-6, LVM, PCIex8 (11x3TB)
2. Backup :  25 TiB, software RAID-5, LUKS, USB 3.0 (8x4TB)

I am not using btrfs RAID (the filesystems are -d single -m dup); the
redundancy comes from the hardware or software MD layer.  Neither
filesystem is partitioned (as they are not bootable).

I take hourly / daily / weekly / monthly / yearly snapshots of subvolumes
in the primary fs, and prune excess snapshots (for example, I only keep
24 hourly snapshots).
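
An hourly cron job for this can be as simple as the following sketch
(paths and retention are placeholders, not what I actually run):

  btrfs subvolume snapshot -r /srv/data /srv/data/.snapshots/hourly-$(date +%Y%m%d-%H)
  # delete everything except the newest 24 hourly snapshots
  ls -d /srv/data/.snapshots/hourly-* | head -n -24 | xargs -r -n1 btrfs subvolume delete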

Currently using stock Fedora 20, though I try to keep the btrfs utility
up-to-date by building from Git when an updated RPM is not available.


Overall impressions of btrfs:

* Very resilient.

It has suffered many hardware-related panics, and no data loss or
filesystem corruption has been detected.  I maintain a backup, which
includes hashes of everything, plus 5% par2 recovery data for some
critical data.

The data is fairly static though, with the vast majority of operations 
being reads.

* Much higher CPU load than ext4.

This exposes a known reset issue with the old 3Ware 9650SE-ML16 RAID 
controller.  Switching to the NOOP IO scheduler helped reduce the load 
considerably, but it still can get quite high [even without LUKS].
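
(The scheduler switch is just the usual sysfs knob, set per drive; the
device name below is only an example:

  echo noop > /sys/block/sdb/queue/scheduler
)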

CPU & motherboard replacement hardware is on-hand, and an upgrade is 
imminent (currently using an old Core2 Duo @ 3 GHz, 4 GiB DDR2).

* Slow to mount, but not unreasonably so.


~~ Andrew E. Mileski


* Re: Questions on using BtrFS for fileserver
  2014-08-20  9:23 Tomasz Chmielewski
@ 2014-08-20 13:41 ` Benjamin O'Connor
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin O'Connor @ 2014-08-20 13:41 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

As a counterpoint, I am using BTRFS right now as a fileserver on a filesystem with about 
280TB raw.  Note that this is mostly transient data (raid 0), and I have stuck with 3.14, 
having heard the horror stories of 3.15/3.16 locking up.

Even at that size (29TB LUNs), I have been able to add and remove devices and rebalance 
with no issues other than it causing increased IO and taking several weeks to move that 
much data around.

Definitely avoid 3.15/3.16, and test out your workload first if possible to make sure it 
performs properly.  Also build the filesystem, put data on it, and test device 
removals/rebuilds to make sure it works with your OS, btrfs tools, and kernel version. 
All that being said, it works great for us, since we value COW and expansion/rebalancing 
over performance and redundancy.

The filesystem is exported via NFS and rsync to over 200 clients over 10 Gbit/s Ethernet, 
and hits around 5-7 Gbit/s, balanced between reads and writes.

One of our alternative file storage needs that I'd also hoped to move to BTRFS consisted 
of subtrees of 255 directories, each containing 255 directories, each of those in turn 
containing 255 directories with 1 file in each (don't ask).  That *did not* work well 
under BTRFS -- probably due to the metadata juggling required to create or remove any one 
file that far down in such a bizarre tree.  We kept that particular area under XFS.

-ben


Tomasz Chmielewski wrote:
>>> we are thinking about using BtrFS on standard hardware for a
>>> fileserver with about 50T (100T raw) of storage (25×4TByte).
>>>
>>
>> I would recommend carefully reading this thread titled: "1 week to
>> rebuid 4x 3TB raid10 is a long time!"
>
> So I have 2 x 2.6 TB devices in btrfs RAID-1, 716G used. Linux 3.16.
>
> One of the disks failed.
>
> "btrfs device delete missing /home" is taking 9 days so far, on an idle system:
>
> root 4828 0.3 0.0 17844 260 pts/1 D+ Aug11 38:18 btrfs device delete missing /home
>
> There is some kind of btrfs debug info printed in dmesg which seems to tell me that the
> operation is working, like:
>
> [744657.598810] BTRFS info (device sda4): relocating block group 908951814144 flags 17
> [744672.021612] BTRFS info (device sda4): found 4784 extents
> [744688.604997] BTRFS info (device sda4): found 4784 extents
> [744689.133397] BTRFS info (device sda4): relocating block group 910025555968 flags 17
> [744701.162678] BTRFS info (device sda4): found 4196 extents
> [744725.000459] BTRFS info (device sda4): found 4196 extents
>
>
> but other than that, the recovery time doesn't look optimistic to me, and there is no way
> to check the progress, etc.
>
>

-- 
-----------------------------
Benjamin O'Connor
TechOps Systems Administrator
TripAdvisor Media Group

boconnor@tripadvisor.com
c. 617-312-9072
-----------------------------



* Re: Questions on using BtrFS for fileserver
@ 2014-08-20  9:23 Tomasz Chmielewski
  2014-08-20 13:41 ` Benjamin O'Connor
  0 siblings, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2014-08-20  9:23 UTC (permalink / raw)
  To: linux-btrfs

>> we are thinking about using BtrFS on standard hardware for a
>> fileserver with about 50T (100T raw) of storage (25×4TByte).
>> 
> 
> I would recommend carefully reading this thread titled: "1 week to
> rebuid 4x 3TB raid10 is a long time!"

So I have 2 x 2.6 TB devices in btrfs RAID-1, 716G used. Linux 3.16.

One of the disks failed.

"btrfs device delete missing /home" is taking 9 days so far, on an idle 
system:

root      4828  0.3  0.0  17844   260 pts/1    D+   Aug11  38:18 btrfs device delete missing /home

There is some kind of btrfs debug info printed in dmesg which seems to 
tell me that the operation is working, like:

[744657.598810] BTRFS info (device sda4): relocating block group 908951814144 flags 17
[744672.021612] BTRFS info (device sda4): found 4784 extents
[744688.604997] BTRFS info (device sda4): found 4784 extents
[744689.133397] BTRFS info (device sda4): relocating block group 910025555968 flags 17
[744701.162678] BTRFS info (device sda4): found 4196 extents
[744725.000459] BTRFS info (device sda4): found 4196 extents


but other than that, the recovery time doesn't look optimistic to me,
and there is no way to check the progress, etc.


-- 
Tomasz Chmielewski
http://www.sslrack.com




Thread overview: 11+ messages
2014-08-19 16:21 Questions on using BtrFS for fileserver M G Berberich
2014-08-19 16:56 ` Kyle Manna
2014-08-19 19:05 ` Austin S Hemmelgarn
2014-08-19 21:09 ` Mitch Harder
2014-08-19 21:38 ` Andrej Manduch
2014-08-20 15:23   ` Austin S Hemmelgarn
2014-08-19 21:50 ` Roman Mamedov
2014-08-20  3:22 ` Marc MERLIN
2014-08-21 20:20 ` Andrew E. Mileski
2014-08-20  9:23 Tomasz Chmielewski
2014-08-20 13:41 ` Benjamin O'Connor
