* speed up big btrfs volumes with ssds
@ 2017-09-03 18:32 Stefan Priebe - Profihost AG
  2017-09-03 19:16 ` Peter Grandi
  2017-09-04 10:53 ` Henk Slager
  0 siblings, 2 replies; 11+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-09-03 18:32 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I'm trying to speed up big btrfs volumes.

Some facts:
- Kernel will be 4.13-rc7
- needed volume size is 60TB

Currently, without any SSDs, I get the best speed with:
- 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices

and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.

I can live with a data loss every now and then ;-) so a raid 0 on
top of the 4x RAID 5 is acceptable for me.

Currently the write speed is not as good as I would like - especially
for random 8k-16k I/O.

My current idea is to use a PCIe flash card with bcache on top of each
RAID 5.

Is this something that makes sense to speed up the write speed?
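
Concretely, the layout I have in mind would be something like this
(untested sketch; device names are just examples):

  # one flash card as a shared cache set, the 4x RAID 5 as backing devices
  make-bcache -C /dev/nvme0n1
  make-bcache -B /dev/sda /dev/sdb /dev/sdc /dev/sdd
  # attach each bcacheN to the cache set (cset UUID from bcache-super-show)
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach
  # then btrfs raid 0 across the four cached devices
  mkfs.btrfs -d raid0 -m raid0 /dev/bcache0 /dev/bcache1 /dev/bcache2 /dev/bcache3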

Greets,
Stefan


* Re: speed up big btrfs volumes with ssds
  2017-09-03 18:32 speed up big btrfs volumes with ssds Stefan Priebe - Profihost AG
@ 2017-09-03 19:16 ` Peter Grandi
  2017-09-04 10:53 ` Henk Slager
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Grandi @ 2017-09-03 19:16 UTC (permalink / raw)
  To: Linux fs Btrfs

> [ ... ] - needed volume size is 60TB

I wonder how long that takes to 'scrub', 'balance', 'check',
'subvolume delete', 'find', etc.

> [ ... ] 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5"
> devices and using btrfs as raid 0 for data and metadata on top
> of those 4 RAID 5 arrays. [ ... ] the write speed is not as good as
> I would like - especially for random 8k-16k I/O. [ ... ]

Also I noticed that the rain is wet and cold - especially if one
walks around for a few hours in a t-shirt, shorts and sandals.
:-)

> My current idea is to use a PCIe flash card with bcache on top
> of each RAID 5. Is this something that makes sense to speed
> up the write speed?

Well, 'bcache' in the role of a write buffer allegedly helps turn
unaligned writes into aligned writes, so it might help, but I wonder
how effective that will be in this case; plus it won't turn low
random-IOPS-per-TB 4TB devices into high ones. Anyhow, if it is
battery-backed, the 1GB of HW HBA cache/buffer should already do
exactly that, except that again in this case that is rather
optimistic.

But this reminds me of the common story: "Doctor, if I repeatedly
stab my hand with a fork it hurts a lot, how do I fix that?"
"Don't do it."
:-)

PS Random writes of 8-16KiB over 60TB might seem like storing
small records/images in small files. That would be "brave".
On a 60TB RAID50 of 20x 4TB disk drives that might mean around
5-10MB/s of random small writes, including both data and
metadata.
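
(Back of the envelope, assuming ~120 random write IOPS per 7200rpm
drive and a RAID5 small-write penalty of ~4 disk I/Os per write:
20 drives x 120 IOPS / 4 = ~600 writes/s, and at 8-16KiB per write
that is roughly 5-10MB/s.)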


* Re: speed up big btrfs volumes with ssds
  2017-09-03 18:32 speed up big btrfs volumes with ssds Stefan Priebe - Profihost AG
  2017-09-03 19:16 ` Peter Grandi
@ 2017-09-04 10:53 ` Henk Slager
  2017-09-04 11:43   ` Peter Grandi
  2017-09-04 12:57   ` Stefan Priebe - Profihost AG
  1 sibling, 2 replies; 11+ messages in thread
From: Henk Slager @ 2017-09-04 10:53 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs

On Sun, Sep 3, 2017 at 8:32 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Hello,
>
> I'm trying to speed up big btrfs volumes.
>
> Some facts:
> - Kernel will be 4.13-rc7
> - needed volume size is 60TB
>
> Currently, without any SSDs, I get the best speed with:
> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>
> and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.
>
> I can live with a data loss every now and then ;-) so a raid 0 on
> top of the 4x RAID 5 is acceptable for me.
>
> Currently the write speed is not as good as I would like - especially
> for random 8k-16k I/O.
>
> My current idea is to use a PCIe flash card with bcache on top of each
> RAID 5.

Whether it can speed things up depends quite a lot on the use case;
for some not-so-parallel access it might work. So this 60TB is then
20 4TB disks or so, and the 4x 1GB cache is simply not very helpful, I
think - the working set doesn't fit in it, I guess. If there are mostly
single or only a few users of the fs, a single PCIe card bcaching all 4
devices can work, but with SATA SSDs I would use 1 SSD per HW raid5.

Then roughly make sure the complete set of metadata blocks fits in the
cache. For an fs of this size, let's estimate 150G. Then maybe the same
or double that for data, so an SSD of 500G would be a first try.
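
Rather than estimating, once the fs exists you can read the actual
metadata footprint off the 'Metadata' line of the standard tools:

  btrfs filesystem df /mountpoint
  btrfs filesystem usage /mountpoint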

You give the impression that reliability for this fs is not the
highest prio, so if you go full risk, put bcache in write-back mode;
then you will have your desired random 8k-16k I/O speedup after the
cache is warmed up. But any SW or HW failure will normally result in
total fs loss if SSD and HDD get out of sync somehow. Bcache
write-through might also be acceptable; you will need extensive
monitoring and tuning of all (bcache) parameters etc. to be sure of
the right choice of size and setup.
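
A minimal sketch of the write-back variant (these are the standard
bcache sysfs knobs; bcache0 is an example device):

  # full risk: write-back caching
  echo writeback > /sys/block/bcache0/bcache/cache_mode
  # don't bypass the cache for larger sequential I/O (default cutoff 4M)
  echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
  # things to monitor: dirty data that exists only on the SSD, hit rates
  cat /sys/block/bcache0/bcache/dirty_data
  cat /sys/block/bcache0/bcache/stats_day/cache_hit_ratio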


* Re: speed up big btrfs volumes with ssds
  2017-09-04 10:53 ` Henk Slager
@ 2017-09-04 11:43   ` Peter Grandi
  2017-09-04 12:57   ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Grandi @ 2017-09-04 11:43 UTC (permalink / raw)
  To: linux-btrfs

>> [ ... ] Currently the write speed is not as good as I would
>> like - especially for random 8k-16k I/O. [ ... ]

> [ ... ] So this 60TB is then 20 4TB disks or so, and the 4x 1GB
> cache is simply not very helpful, I think - the working set
> doesn't fit in it, I guess. If there are mostly single or only a
> few users of the fs, a single PCIe card bcaching all 4 devices
> can work, but with SATA SSDs I would use 1 SSD per HW raid5. [ ... ]

Probably the idea of the cacheable working set of a random small
write workload is a major new scientific discovery. :-)


* Re: speed up big btrfs volumes with ssds
  2017-09-04 10:53 ` Henk Slager
  2017-09-04 11:43   ` Peter Grandi
@ 2017-09-04 12:57   ` Stefan Priebe - Profihost AG
  2017-09-04 13:28     ` Timofey Titovets
  2017-09-04 13:38     ` Russell Coker
  1 sibling, 2 replies; 11+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-09-04 12:57 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

On 04.09.2017 at 12:53, Henk Slager wrote:
> On Sun, Sep 3, 2017 at 8:32 PM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> Hello,
>>
>> I'm trying to speed up big btrfs volumes.
>>
>> Some facts:
>> - Kernel will be 4.13-rc7
>> - needed volume size is 60TB
>>
>> Currently, without any SSDs, I get the best speed with:
>> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>>
>> and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.
>>
>> I can live with a data loss every now and then ;-) so a raid 0 on
>> top of the 4x RAID 5 is acceptable for me.
>>
>> Currently the write speed is not as good as I would like - especially
>> for random 8k-16k I/O.
>>
>> My current idea is to use a PCIe flash card with bcache on top of each
>> RAID 5.
> 
> Whether it can speed things up depends quite a lot on the use case;
> for some not-so-parallel access it might work. So this 60TB is then
> 20 4TB disks or so, and the 4x 1GB cache is simply not very helpful, I
> think - the working set doesn't fit in it, I guess. If there are mostly
> single or only a few users of the fs, a single PCIe card bcaching all 4
> devices can work, but with SATA SSDs I would use 1 SSD per HW raid5.

Yes, that's roughly my idea as well, and yes, the workload is 4 users
max writing data: 50% sequential, 50% random.

> Then roughly make sure the complete set of metadata blocks fits in the
> cache. For an fs of this size, let's estimate 150G. Then maybe the same
> or double that for data, so an SSD of 500G would be a first try.

I would use 1TB devices for each RAID, or a 4TB PCIe card.

> You give the impression that reliability for this fs is not the
> highest prio, so if you go full risk, put bcache in write-back mode;
> then you will have your desired random 8k-16k I/O speedup after the
> cache is warmed up. But any SW or HW failure will normally result in
> total fs loss if SSD and HDD get out of sync somehow. Bcache
> write-through might also be acceptable; you will need extensive
> monitoring and tuning of all (bcache) parameters etc. to be sure of
> the right choice of size and setup.

Yes, I wanted to use the write-back mode. Has anybody already done some
tests or gained experience with a setup like this?

Greets,
Stefan


* Re: speed up big btrfs volumes with ssds
  2017-09-04 12:57   ` Stefan Priebe - Profihost AG
@ 2017-09-04 13:28     ` Timofey Titovets
  2017-09-04 18:23       ` Peter Grandi
  2017-09-04 18:32       ` Stefan Priebe - Profihost AG
  2017-09-04 13:38     ` Russell Coker
  1 sibling, 2 replies; 11+ messages in thread
From: Timofey Titovets @ 2017-09-04 13:28 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Henk Slager, linux-btrfs

2017-09-04 15:57 GMT+03:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
> On 04.09.2017 at 12:53, Henk Slager wrote:
>> On Sun, Sep 3, 2017 at 8:32 PM, Stefan Priebe - Profihost AG
>> <s.priebe@profihost.ag> wrote:
>>> Hello,
>>>
>>> I'm trying to speed up big btrfs volumes.
>>>
>>> Some facts:
>>> - Kernel will be 4.13-rc7
>>> - needed volume size is 60TB
>>>
>>> Currently, without any SSDs, I get the best speed with:
>>> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>>>
>>> and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.
>>>
>>> I can live with a data loss every now and then ;-) so a raid 0 on
>>> top of the 4x RAID 5 is acceptable for me.
>>>
>>> Currently the write speed is not as good as I would like - especially
>>> for random 8k-16k I/O.
>>>
>>> My current idea is to use a PCIe flash card with bcache on top of each
>>> RAID 5.
>>
>> Whether it can speed things up depends quite a lot on the use case;
>> for some not-so-parallel access it might work. So this 60TB is then
>> 20 4TB disks or so, and the 4x 1GB cache is simply not very helpful, I
>> think - the working set doesn't fit in it, I guess. If there are mostly
>> single or only a few users of the fs, a single PCIe card bcaching all 4
>> devices can work, but with SATA SSDs I would use 1 SSD per HW raid5.
>
> Yes, that's roughly my idea as well, and yes, the workload is 4 users
> max writing data: 50% sequential, 50% random.
>
>> Then roughly make sure the complete set of metadata blocks fits in the
>> cache. For an fs of this size, let's estimate 150G. Then maybe the same
>> or double that for data, so an SSD of 500G would be a first try.
>
> I would use 1TB devices for each RAID, or a 4TB PCIe card.
>
>> You give the impression that reliability for this fs is not the
>> highest prio, so if you go full risk, put bcache in write-back mode;
>> then you will have your desired random 8k-16k I/O speedup after the
>> cache is warmed up. But any SW or HW failure will normally result in
>> total fs loss if SSD and HDD get out of sync somehow. Bcache
>> write-through might also be acceptable; you will need extensive
>> monitoring and tuning of all (bcache) parameters etc. to be sure of
>> the right choice of size and setup.
>
> Yes, I wanted to use the write-back mode. Has anybody already done some
> tests or gained experience with a setup like this?
>
> Greets,
> Stefan

Maybe you can make your RAID setup work faster by:
1. Using the single profile.
2. Using a different stripe size for the HW RAID5:
    I think 16KB will be optimal with 5 devices per RAID group.
    That will give you a 64KB data stripe and 16KB parity.
    Btrfs raid0 uses 64KB as its stripe size, so that can make data
access unaligned (or use the single profile for btrfs).
3. Using btrfs ssd_spread to decrease RMW cycles; a rough sketch of
1 and 3 follows below.
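
For example (untested; device names are examples):

  # single profile: no btrfs-level striping across the 4 arrays
  mkfs.btrfs -d single -m single /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mount -o ssd_spread /dev/sda /mnt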

Thanks.


-- 
Have a nice day,
Timofey.


* Re: speed up big btrfs volumes with ssds
  2017-09-04 12:57   ` Stefan Priebe - Profihost AG
  2017-09-04 13:28     ` Timofey Titovets
@ 2017-09-04 13:38     ` Russell Coker
  1 sibling, 0 replies; 11+ messages in thread
From: Russell Coker @ 2017-09-04 13:38 UTC (permalink / raw)
  To: linux-btrfs

On Monday, 4 September 2017 2:57:18 PM AEST Stefan Priebe - Profihost AG 
wrote:
> > Then roughly make sure the complete set of metadata blocks fits in the
> > cache. For an fs of this size, let's estimate 150G. Then maybe the same
> > or double that for data, so an SSD of 500G would be a first try.
> 
> I would use 1TB devices for each RAID, or a 4TB PCIe card.

One thing I've considered is to create a filesystem on a RAID-1 of
SSDs and then create lots of files with long names to use up a lot of
space on the SSDs. Then delete those files and add the disks to the
filesystem. BTRFS should then keep using the metadata blocks already
allocated on the SSDs for all metadata and use the disks for data only.
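
Something like this untested sketch (paths and the file count are just
examples):

  mkfs.btrfs -m raid1 -d raid1 /dev/ssd1 /dev/ssd2
  mount /dev/ssd1 /mnt
  # force lots of metadata chunk allocation with empty, long-named files
  for i in $(seq 1 1000000); do touch "/mnt/$(printf 'f%0200d' "$i")"; done
  find /mnt -maxdepth 1 -name 'f*' -delete
  # now add the spinning disks; new data chunks should land on them
  btrfs device add /dev/sda /dev/sdb /dev/sdc /dev/sdd /mnt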

I haven't yet tried bcache, but would prefer something simpler with one less 
layer.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



* Re: speed up big btrfs volumes with ssds
  2017-09-04 13:28     ` Timofey Titovets
@ 2017-09-04 18:23       ` Peter Grandi
  2017-09-04 18:32       ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Grandi @ 2017-09-04 18:23 UTC (permalink / raw)
  To: linux-btrfs

>>> [ ... ] Currently, without any SSDs, I get the best speed with:
>>> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>>> and using btrfs as raid 0 for data and metadata on top of
>>> those 4 RAID 5 arrays. [ ... ] the write speed is not as good as I
>>> would like - especially for random 8k-16k I/O. [ ... ]

> [ ... ] a 64KB data stripe and 16KB parity. Btrfs raid0 uses 64KB
> as its stripe size, so that can make data access unaligned (or use
> the single profile for btrfs). 3. Using btrfs ssd_spread to
> decrease RMW cycles.

This is not as "revolutionary" a scientific discovery as the idea of
the working set of a small-size random-write workload, but it still
takes a lot of "optimism" to imagine that it is possible to
"decrease RMW cycles" for "random 8k-16k" writes on 64KiB+16KiB
RAID5 stripes, whether with 'ssd_spread' or not.

To "decrease RMW cycles" seems inded to me the better aim than
following the "radical" aim of caching the working set of a
random-small-write workload, but it may be less easy to achieve
than desirable :-). http://www.baarf.dk/


* Re: speed up big btrfs volumes with ssds
  2017-09-04 13:28     ` Timofey Titovets
  2017-09-04 18:23       ` Peter Grandi
@ 2017-09-04 18:32       ` Stefan Priebe - Profihost AG
  2017-09-04 20:35         ` Timofey Titovets
  2017-09-11  9:22         ` Stefan Priebe - Profihost AG
  1 sibling, 2 replies; 11+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-09-04 18:32 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: Henk Slager, linux-btrfs

On 04.09.2017 at 15:28, Timofey Titovets wrote:
> 2017-09-04 15:57 GMT+03:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
>> On 04.09.2017 at 12:53, Henk Slager wrote:
>>> On Sun, Sep 3, 2017 at 8:32 PM, Stefan Priebe - Profihost AG
>>> <s.priebe@profihost.ag> wrote:
>>>> Hello,
>>>>
>>>> I'm trying to speed up big btrfs volumes.
>>>>
>>>> Some facts:
>>>> - Kernel will be 4.13-rc7
>>>> - needed volume size is 60TB
>>>>
>>>> Currently, without any SSDs, I get the best speed with:
>>>> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>>>>
>>>> and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.
>>>>
>>>> I can live with a data loss every now and then ;-) so a raid 0 on
>>>> top of the 4x RAID 5 is acceptable for me.
>>>>
>>>> Currently the write speed is not as good as I would like - especially
>>>> for random 8k-16k I/O.
>>>>
>>>> My current idea is to use a PCIe flash card with bcache on top of each
>>>> RAID 5.
>>>
>>> Whether it can speed things up depends quite a lot on the use case;
>>> for some not-so-parallel access it might work. So this 60TB is then
>>> 20 4TB disks or so, and the 4x 1GB cache is simply not very helpful, I
>>> think - the working set doesn't fit in it, I guess. If there are mostly
>>> single or only a few users of the fs, a single PCIe card bcaching all 4
>>> devices can work, but with SATA SSDs I would use 1 SSD per HW raid5.
>>
>> Yes, that's roughly my idea as well, and yes, the workload is 4 users
>> max writing data: 50% sequential, 50% random.
>>
>>> Then roughly make sure the complete set of metadata blocks fits in the
>>> cache. For an fs of this size, let's estimate 150G. Then maybe the same
>>> or double that for data, so an SSD of 500G would be a first try.
>>
>> I would use 1TB devices for each RAID, or a 4TB PCIe card.
>>
>>> You give the impression that reliability for this fs is not the
>>> highest prio, so if you go full risk, put bcache in write-back mode;
>>> then you will have your desired random 8k-16k I/O speedup after the
>>> cache is warmed up. But any SW or HW failure will normally result in
>>> total fs loss if SSD and HDD get out of sync somehow. Bcache
>>> write-through might also be acceptable; you will need extensive
>>> monitoring and tuning of all (bcache) parameters etc. to be sure of
>>> the right choice of size and setup.
>>
>> Yes, I wanted to use the write-back mode. Has anybody already done some
>> tests or gained experience with a setup like this?
>>
> 
> Maybe you can make your RAID setup work faster by:
> 1. Using the single profile.

I'm already using the raid0 profile - see below:

Data,RAID0: Size:22.57TiB, Used:21.08TiB
Metadata,RAID0: Size:90.00GiB, Used:82.28GiB
System,RAID0: Size:64.00MiB, Used:1.53MiB

> 2. Using a different stripe size for the HW RAID5:
>     I think 16KB will be optimal with 5 devices per RAID group.
>     That will give you a 64KB data stripe and 16KB parity.
>     Btrfs raid0 uses 64KB as its stripe size, so that can make data
> access unaligned (or use the single profile for btrfs).

That sounds like an interesting idea, except for the unaligned writes.
I will need to test this.

> 3. Using btrfs ssd_spread to decrease RMW cycles.
Can you explain this?

Stefan


* Re: speed up big btrfs volumes with ssds
  2017-09-04 18:32       ` Stefan Priebe - Profihost AG
@ 2017-09-04 20:35         ` Timofey Titovets
  2017-09-11  9:22         ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 11+ messages in thread
From: Timofey Titovets @ 2017-09-04 20:35 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Henk Slager, linux-btrfs

2017-09-04 21:32 GMT+03:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
>> Maybe you can make your RAID setup work faster by:
>> 1. Using the single profile.
>
> I'm already using the raid0 profile - see below:

If I understand correctly, you have a very big data set with random
RW access. I'm suggesting the single profile for compact writes to
one device, which can make the WB cache more effective: writes will
not be spread over several devices, and as a result the chance that a
full stripe gets overwritten increases. That will just work like
raid0 with a very big stripe size.

> Data,RAID0: Size:22.57TiB, Used:21.08TiB
> Metadata,RAID0: Size:90.00GiB, Used:82.28GiB
> System,RAID0: Size:64.00MiB, Used:1.53MiB
>
>> 2. Using a different stripe size for the HW RAID5:
>>     I think 16KB will be optimal with 5 devices per RAID group.
>>     That will give you a 64KB data stripe and 16KB parity.
>>     Btrfs raid0 uses 64KB as its stripe size, so that can make data
>> access unaligned (or use the single profile for btrfs).
>
> That sounds like an interesting idea, except for the unaligned writes.
> I will need to test this.

AFAIK btrfs also uses 64KB for metadata:
https://github.com/torvalds/linux/blob/e26f1bea3b833fb2c16fb5f0a949da1efa219de3/fs/btrfs/extent-tree.c#L6678

>> 3. Using btrfs ssd_spread to decrease RMW cycles.
> Can you explain this?

Long description:
https://www.spinics.net/lists/linux-btrfs/msg67515.html

Short:
that option changes the allocator logic. The allocator will spread
writes more aggressively and always try to write to a new/empty area.
So in theory new data goes to a new, empty chunk; if you have plenty
of free space, that gives some guarantee that old data is not
touched, so no RMW happens and, in theory, every write is a full
stripe write.

But if you expect the array to be near full and you don't want to
defragment it, that can easily get you an ENOSPC error.
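
(If that happens, the usual way out is to compact mostly-empty chunks
with a filtered balance, e.g. 'btrfs balance start -dusage=10 /mnt'.)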

> Stefan

That's just my IMHO.
Thanks.
-- 
Have a nice day,
Timofey.


* Re: speed up big btrfs volumes with ssds
  2017-09-04 18:32       ` Stefan Priebe - Profihost AG
  2017-09-04 20:35         ` Timofey Titovets
@ 2017-09-11  9:22         ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-09-11  9:22 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: Henk Slager, linux-btrfs

Hello,

On 04.09.2017 at 20:32, Stefan Priebe - Profihost AG wrote:
> On 04.09.2017 at 15:28, Timofey Titovets wrote:
>> 2017-09-04 15:57 GMT+03:00 Stefan Priebe - Profihost AG <s.priebe@profihost.ag>:
>>> On 04.09.2017 at 12:53, Henk Slager wrote:
>>>> On Sun, Sep 3, 2017 at 8:32 PM, Stefan Priebe - Profihost AG
>>>> <s.priebe@profihost.ag> wrote:
>>>>> Hello,
>>>>>
>>>>> I'm trying to speed up big btrfs volumes.
>>>>>
>>>>> Some facts:
>>>>> - Kernel will be 4.13-rc7
>>>>> - needed volume size is 60TB
>>>>>
>>>>> Currently, without any SSDs, I get the best speed with:
>>>>> - 4x HW RAID 5 (each with 1GB controller memory) of 4TB 3.5" devices
>>>>>
>>>>> and using btrfs as raid 0 for data and metadata on top of those 4 RAID 5 arrays.
>>>>>
>>>>> I can live with a data loss every now and then ;-) so a raid 0 on
>>>>> top of the 4x RAID 5 is acceptable for me.
>>>>>
>>>>> Currently the write speed is not as good as I would like - especially
>>>>> for random 8k-16k I/O.
>>>>>
>>>>> My current idea is to use a PCIe flash card with bcache on top of each
>>>>> RAID 5.
>>>>
>>>> Whether it can speed things up depends quite a lot on the use case;
>>>> for some not-so-parallel access it might work. So this 60TB is then
>>>> 20 4TB disks or so, and the 4x 1GB cache is simply not very helpful, I
>>>> think - the working set doesn't fit in it, I guess. If there are mostly
>>>> single or only a few users of the fs, a single PCIe card bcaching all 4
>>>> devices can work, but with SATA SSDs I would use 1 SSD per HW raid5.
>>>
>>> Yes, that's roughly my idea as well, and yes, the workload is 4 users
>>> max writing data: 50% sequential, 50% random.
>>>
>>>> Then roughly make sure the complete set of metadata blocks fits in the
>>>> cache. For an fs of this size, let's estimate 150G. Then maybe the same
>>>> or double that for data, so an SSD of 500G would be a first try.
>>>
>>> I would use 1TB devices for each RAID, or a 4TB PCIe card.
>>>
>>>> You give the impression that reliability for this fs is not the
>>>> highest prio, so if you go full risk, put bcache in write-back mode;
>>>> then you will have your desired random 8k-16k I/O speedup after the
>>>> cache is warmed up. But any SW or HW failure will normally result in
>>>> total fs loss if SSD and HDD get out of sync somehow. Bcache
>>>> write-through might also be acceptable; you will need extensive
>>>> monitoring and tuning of all (bcache) parameters etc. to be sure of
>>>> the right choice of size and setup.
>>>
>>> Yes, I wanted to use the write-back mode. Has anybody already done some
>>> tests or gained experience with a setup like this?
>>>
>>
>> Maybe you can make your RAID setup work faster by:
>> 1. Using the single profile.
> 
> I'm already using the raid0 profile - see below:
> 
> Data,RAID0: Size:22.57TiB, Used:21.08TiB
> Metadata,RAID0: Size:90.00GiB, Used:82.28GiB
> System,RAID0: Size:64.00MiB, Used:1.53MiB
> 
>> 2. Using a different stripe size for the HW RAID5:
>>     I think 16KB will be optimal with 5 devices per RAID group.
>>     That will give you a 64KB data stripe and 16KB parity.
>>     Btrfs raid0 uses 64KB as its stripe size, so that can make data
>> access unaligned (or use the single profile for btrfs).
> 
> That sounds like an interesting idea, except for the unaligned writes.
> I will need to test this.
> 
>> 3. Using btrfs ssd_spread to decrease RMW cycles.
> Can you explain this?
> 
> Stefan

I was able to fix this issue with ssd_spread. Could it be that the
default allocators, nossd and ssd, are searching too hard for free
space? Even space_tree did not help.

Greets,
Stefan

