* btrfs performance - ssd array
@ 2015-01-12 13:51 P. Remek
  2015-01-12 14:54 ` Austin S Hemmelgarn
  2015-01-13  3:59 ` Wang Shilong
  0 siblings, 2 replies; 9+ messages in thread
From: P. Remek @ 2015-01-12 13:51 UTC (permalink / raw)
  To: linux-btrfs

Hello,

we are currently investigating the possibilities and performance limits
of the Btrfs filesystem. At the moment we seem to be getting pretty poor
write performance, and I would like to ask whether our results make
sense and whether they are the result of some well-known performance
bottleneck.

Our setup:

Server:
   CPU: dual socket: E5-2630 v2
   RAM: 32 GB ram
   OS: Ubuntu server 14.10
   Kernel: 3.19.0-031900rc2-generic
   btrfs tools: Btrfs v3.14.1
   2x LSI 9300 HBAs - SAS3 12 Gb/s
   8x SSD Ultrastar SSD1600MM 400GB SAS3 12 Gb/s

Both HBAs see all 8 disks and we have set up multipathing using the
multipath command and device mapper. Then we use this command to
create the filesystem:

mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1
/dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4
/dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
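
For reference, the resulting layout can be sanity-checked after mounting
(the mount point /mnt/test below is just an illustrative placeholder):

mount /dev/mapper/prm-0 /mnt/test
btrfs filesystem show /mnt/test    # should list all 8 multipath devices
btrfs filesystem df /mnt/test      # data should report the raid10 profile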


We run the performance test using the following command:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
--name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G
--numjobs=24 --readwrite=randwrite


The results for random read are more or less comparable with the
performance of the EXT4 filesystem; we get approximately 300 000 IOPS
for random read.

For random write, however, we are getting only about 15 000 IOPS, which
is much lower than for EXT4 (~200 000 IOPS for RAID10).


Regards,
Premek

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 13:51 btrfs performance - ssd array P. Remek
@ 2015-01-12 14:54 ` Austin S Hemmelgarn
  2015-01-12 15:11   ` Patrik Lundquist
  2015-01-12 15:35   ` P. Remek
  2015-01-13  3:59 ` Wang Shilong
  1 sibling, 2 replies; 9+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-12 14:54 UTC (permalink / raw)
  To: P. Remek, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3627 bytes --]

On 2015-01-12 08:51, P. Remek wrote:
> Hello,
> 
> we are currently investigating the possibilities and performance limits
> of the Btrfs filesystem. At the moment we seem to be getting pretty poor
> write performance, and I would like to ask whether our results make
> sense and whether they are the result of some well-known performance
> bottleneck.
> 
> Our setup:
> 
> Server:
>     CPU: dual socket: E5-2630 v2
>     RAM: 32 GB ram
>     OS: Ubuntu server 14.10
>     Kernel: 3.19.0-031900rc2-generic
>     btrfs tools: Btrfs v3.14.1
>     2x LSI 9300 HBAs - SAS3 12 Gb/s
>     8x SSD Ultrastar SSD1600MM 400GB SAS3 12 Gb/s
> 
> Both HBAs see all 8 disks and we have set up multipathing using the
> multipath command and device mapper. Then we use this command to
> create the filesystem:
> 
> mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1
> /dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4
> /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
You almost certainly DO NOT want to use BTRFS raid10 unless you have known good backups and are willing to deal with the downtime associated with restoring them.  The current incarnation of raid10 in BTRFS is much worse than LVM/MD based soft-raid with respect to data recoverability.  I would suggest using BTRFS raid1 in this case (which behaves like MD-RAID10 when used with more than 2 devices), possibly on top of LVM/MD RAID0 if you really need the performance.
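
A rough sketch of what that alternative layout could look like - purely
illustrative; the MD device names and the split of the eight multipath
devices into two four-disk stripes are assumptions on my part:

# two 4-disk MD RAID0 stripes, mirrored against each other by btrfs raid1
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/mapper/prm-0 /dev/mapper/prm-1 /dev/mapper/prm-2 /dev/mapper/prm-3
mdadm --create /dev/md1 --level=0 --raid-devices=4 \
    /dev/mapper/prm-4 /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
mkfs.btrfs -f -d raid1 -m raid1 /dev/md0 /dev/md1
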
> 
> 
> We run the performance test using the following command:
> 
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G
> --numjobs=24 --readwrite=randwrite
> 
> 
> The results for random read are more or less comparable with the
> performance of the EXT4 filesystem; we get approximately 300 000 IOPS
> for random read.
> 
> For random write, however, we are getting only about 15 000 IOPS, which
> is much lower than for EXT4 (~200 000 IOPS for RAID10).
>

While I don't have any conclusive numbers, I have noticed myself that random-write-based AIO on BTRFS does tend to be slower than on other filesystems.  Also, LVM/MD based RAID10 does outperform BTRFS' raid10 implementation, and probably will for quite a while; however, I've also noticed that faster RAM does provide a bigger benefit for BTRFS than it does for LVM (~2.5% greater performance for BTRFS than for LVM when switching from DDR3-1333 to DDR3-1600 on otherwise identical hardware), so you might consider looking into that.

Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
1. slice_idle=0
2. back_seek_penalty=1
3. back_seek_max set equal to the size in sectors of the device
4. nr_requests and quantum set to the hardware command queue depth

You can easily set these persistently for a given device with a udev rule like this:
  KERNEL=='sda', SUBSYSTEM=='block', ACTION=='add', ATTR{queue/scheduler}='cfq', ATTR{queue/iosched/back_seek_penalty}='1', ATTR{queue/iosched/back_seek_max}='<device_size>', ATTR{queue/iosched/quantum}='128', ATTR{queue/iosched/slice_idle}='0', ATTR{queue/nr_requests}='128'

Make sure to replace '128' in the rule with whatever the command queue depth is for the device in question (It's usually 128 or 256, occasionally more), and <device_size> with the size of the device in kibibytes.
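
If you just want to experiment before committing to a udev rule, the same
knobs can be set at runtime through sysfs; a minimal sketch (sda and the
queue depth of 128 are placeholders, as in the rule above):

echo cfq > /sys/block/sda/queue/scheduler
echo 0 > /sys/block/sda/queue/iosched/slice_idle
echo 1 > /sys/block/sda/queue/iosched/back_seek_penalty
echo <device_size> > /sys/block/sda/queue/iosched/back_seek_max
echo 128 > /sys/block/sda/queue/iosched/quantum
echo 128 > /sys/block/sda/queue/nr_requests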



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 14:54 ` Austin S Hemmelgarn
@ 2015-01-12 15:11   ` Patrik Lundquist
  2015-01-12 16:32     ` Austin S Hemmelgarn
  2015-01-12 15:35   ` P. Remek
  1 sibling, 1 reply; 9+ messages in thread
From: Patrik Lundquist @ 2015-01-12 15:11 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: linux-btrfs

On 12 January 2015 at 15:54, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
> 1. slice_idle=0
> 2. back_seek_penalty=1
> 3. back_seek_max set equal to the size in sectors of the device
> 4. nr_requests and quantum set to the hardware command queue depth
>
> You can easily set these persistently for a given device with a udev rule like this:
>   KERNEL=='sda', SUBSYSTEM=='block', ACTION=='add', ATTR{queue/scheduler}='cfq', ATTR{queue/iosched/back_seek_penalty}='1', ATTR{queue/iosched/back_seek_max}='<device_size>', ATTR{queue/iosched/quantum}='128', ATTR{queue/iosched/slice_idle}='0', ATTR{queue/nr_requests}='128'
>
> Make sure to replace '128' in the rule with whatever the command queue depth is for the device in question (It's usually 128 or 256, occasionally more), and <device_size> with the size of the device in kibibytes.
>

So is it "size in sectors of the device" or "size of the device in
kibibytes" for back_seek_max? :-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 14:54 ` Austin S Hemmelgarn
  2015-01-12 15:11   ` Patrik Lundquist
@ 2015-01-12 15:35   ` P. Remek
  2015-01-12 16:43     ` Austin S Hemmelgarn
  1 sibling, 1 reply; 9+ messages in thread
From: P. Remek @ 2015-01-12 15:35 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: linux-btrfs

>Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
>1. slice_idle=0
>2. back_seek_penalty=1
>3. back_seek_max set equal to the size in sectors of the device
>4. nr_requests and quantum set to the hardware command queue depth

I will give these suggestions a try, but I don't expect any big gain.
Notice that the difference between EXT4 and BTRFS random write is
massive - 200 000 IOPS vs. 15 000 IOPS - and the device and kernel
parameters are exactly the same (it is the same machine) for both test
scenarios. It suggests that something is dragging down write performance
in the Btrfs implementation.

Notice also that we did some performance tuning (queue scheduling set
to noop, IRQ affinity distribution and pinning to specific NUMA nodes
and cores, etc.).
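
For reference, the tuning in question was along these lines (the device
names, IRQ number and NUMA node below are illustrative placeholders, not
our exact values):

for q in /sys/block/sd*/queue/scheduler; do echo noop > $q; done
echo <cpumask> > /proc/irq/<hba_irq>/smp_affinity   # spread HBA interrupts over local cores
numactl --cpunodebind=0 --membind=0 fio <jobfile>   # pin fio to the node local to the HBAs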

Regards,
Premek


On Mon, Jan 12, 2015 at 3:54 PM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2015-01-12 08:51, P. Remek wrote:
>> Hello,
>>
>> we are currently investigating the possibilities and performance limits
>> of the Btrfs filesystem. At the moment we seem to be getting pretty poor
>> write performance, and I would like to ask whether our results make
>> sense and whether they are the result of some well-known performance
>> bottleneck.
>>
>> Our setup:
>>
>> Server:
>>     CPU: dual socket: E5-2630 v2
>>     RAM: 32 GB ram
>>     OS: Ubuntu server 14.10
>>     Kernel: 3.19.0-031900rc2-generic
>>     btrfs tools: Btrfs v3.14.1
>>     2x LSI 9300 HBAs - SAS3 12 Gb/s
>>     8x SSD Ultrastar SSD1600MM 400GB SAS3 12 Gb/s
>>
>> Both HBAs see all 8 disks and we have set up multipathing using the
>> multipath command and device mapper. Then we use this command to
>> create the filesystem:
>>
>> mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1
>> /dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4
>> /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
> You almost certainly DO NOT want to use BTRFS raid10 unless you have known good backups and are willing to deal with the downtime associated with restoring them.  The current incarnation of raid10 in BTRFS is much worse than LVM/MD based soft-raid with respect to data recoverability.  I would suggest using BTRFS raid1 in this case (which behaves like MD-RAID10 when used with more than 2 devices), possibly on top of LVM/MD RAID0 if you really need the performance.
>>
>>
>> We run the performance test using the following command:
>>
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G
>> --numjobs=24 --readwrite=randwrite
>>
>>
>> The results for random read are more or less comparable with the
>> performance of the EXT4 filesystem; we get approximately 300 000 IOPS
>> for random read.
>>
>> For random write, however, we are getting only about 15 000 IOPS, which
>> is much lower than for EXT4 (~200 000 IOPS for RAID10).
>>
>
> While I don't have any conclusive numbers, I have noticed myself that random-write-based AIO on BTRFS does tend to be slower than on other filesystems.  Also, LVM/MD based RAID10 does outperform BTRFS' raid10 implementation, and probably will for quite a while; however, I've also noticed that faster RAM does provide a bigger benefit for BTRFS than it does for LVM (~2.5% greater performance for BTRFS than for LVM when switching from DDR3-1333 to DDR3-1600 on otherwise identical hardware), so you might consider looking into that.
>
> Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
> 1. slice_idle=0
> 2. back_seek_penalty=1
> 3. back_seek_max set equal to the size in sectors of the device
> 4. nr_requests and quantum set to the hardware command queue depth
>
> You can easily set these persistently for a given device with a udev rule like this:
>   KERNEL=='sda', SUBSYSTEM=='block', ACTION=='add', ATTR{queue/scheduler}='cfq', ATTR{queue/iosched/back_seek_penalty}='1', ATTR{queue/iosched/back_seek_max}='<device_size>', ATTR{queue/iosched/quantum}='128', ATTR{queue/iosched/slice_idle}='0', ATTR{queue/nr_requests}='128'
>
> Make sure to replace '128' in the rule with whatever the command queue depth is for the device in question (It's usually 128 or 256, occasionally more), and <device_size> with the size of the device in kibibytes.
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 15:11   ` Patrik Lundquist
@ 2015-01-12 16:32     ` Austin S Hemmelgarn
  0 siblings, 0 replies; 9+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-12 16:32 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1531 bytes --]

On 2015-01-12 10:11, Patrik Lundquist wrote:
> On 12 January 2015 at 15:54, Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>> Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
>> 1. slice_idle=0
>> 2. back_seek_penalty=1
>> 3. back_seek_max set equal to the size in sectors of the device
>> 4. nr_requests and quantum set to the hardware command queue depth
>>
>> You can easily set these persistently for a given device with a udev rule like this:
>>    KERNEL=='sda', SUBSYSTEM=='block', ACTION=='add', ATTR{queue/scheduler}='cfq', ATTR{queue/iosched/back_seek_penalty}='1', ATTR{queue/iosched/back_seek_max}='<device_size>', ATTR{queue/iosched/quantum}='128', ATTR{queue/iosched/slice_idle}='0', ATTR{queue/nr_requests}='128'
>>
>> Make sure to replace '128' in the rule with whatever the command queue depth is for the device in question (It's usually 128 or 256, occasionally more), and <device_size> with the size of the device in kibibytes.
>>
>
> So is it "size in sectors of the device" or "size of the device in
> kibibytes" for back_seek_max? :-)
>
Size in kibibytes; sorry about the confusion. I forgot to correct every 
instance of saying it was the size in sectors after I reread the documentation.
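
A quick way to get that number for a given device (sda below is just a
placeholder):

echo $(( $(blockdev --getsize64 /dev/sda) / 1024 ))    # device size in KiB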


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 15:35   ` P. Remek
@ 2015-01-12 16:43     ` Austin S Hemmelgarn
  0 siblings, 0 replies; 9+ messages in thread
From: Austin S Hemmelgarn @ 2015-01-12 16:43 UTC (permalink / raw)
  To: P. Remek; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

On 2015-01-12 10:35, P. Remek wrote:
>> Another thing to consider is that the kernel's default I/O scheduler and the default parameters for that I/O scheduler are almost always suboptimal for SSD's, and this tends to show far more with BTRFS than anything else.  Personally I've found that using the CFQ I/O scheduler with the following parameters works best for a majority of SSD's:
>> 1. slice_idle=0
>> 2. back_seek_penalty=1
>> 3. back_seek_max set equal to the size in sectors of the device
>> 4. nr_requests and quantum set to the hardware command queue depth
>
> I will give these suggestions a try, but I don't expect any big gain.
> Notice that the difference between EXT4 and BTRFS random write is
> massive - 200 000 IOPS vs. 15 000 IOPS - and the device and kernel
> parameters are exactly the same (it is the same machine) for both test
> scenarios. It suggests that something is dragging down write performance
> in the Btrfs implementation.
>
> Notice also that we did some performance tuning (queue scheduling set
> to noop, IRQ affinity distribution and pinning to specific NUMA nodes
> and cores, etc.).
>
The stuff about the I/O scheduler is more general advice for dealing 
with SSD's than anything BTRFS-specific.  I've found, though, that at 
least on SATA-connected SSD's (I don't have anywhere near the kind of 
budget needed for SAS disks, and even less so for SAS SSD's), using the 
no-op I/O scheduler gets better small-burst performance, but it causes 
horrible latency spikes whenever you try to do something that requires 
bulk throughput with random writes (rsync being an excellent example of 
this).

Something else I thought of after my initial reply: due to the COW 
nature of BTRFS, you will generally get better performance for metadata 
operations with shallower directory structures (largely because mtime 
updates propagate up the directory tree to the root of the filesystem).


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-12 13:51 btrfs performance - ssd array P. Remek
  2015-01-12 14:54 ` Austin S Hemmelgarn
@ 2015-01-13  3:59 ` Wang Shilong
  2015-01-15 13:32   ` P. Remek
  1 sibling, 1 reply; 9+ messages in thread
From: Wang Shilong @ 2015-01-13  3:59 UTC (permalink / raw)
  To: P. Remek; +Cc: linux-btrfs

Hello,

> Hello,
> 
> we are currently investigating the possibilities and performance limits
> of the Btrfs filesystem. At the moment we seem to be getting pretty poor
> write performance, and I would like to ask whether our results make
> sense and whether they are the result of some well-known performance
> bottleneck.
> 
> Our setup:
> 
> Server:
>   CPU: dual socket: E5-2630 v2
>   RAM: 32 GB ram
>   OS: Ubuntu server 14.10
>   Kernel: 3.19.0-031900rc2-generic
>   btrfs tools: Btrfs v3.14.1
>   2x LSI 9300 HBAs - SAS3 12 Gb/s
>   8x SSD Ultrastar SSD1600MM 400GB SAS3 12 Gb/s
> 
> Both HBAs see all 8 disks and we have set up multipathing using the
> multipath command and device mapper. Then we use this command to
> create the filesystem:
> 
> mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1
> /dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4
> /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7
> 
> 
> We run the performance test using the following command:
> 
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G
> --numjobs=24 --readwrite=randwrite

Could you check how many extents with BTRFS and Ext4:
# filefrag test1

That would show whether bad fragmentation is the problem for BTRFS. I am
still not sure how fio will test randwrite here, so here are the
possibilities:

case1:
     if fio does not rewrite the same positions several times, I think
     you could add --overwrite=0 and retest to see if it helps.

case2:
    if fio randwrite did write to the same positions several times, I think
    you could use the ‘-o nodatacow’ mount option to verify whether
    BTRFS COW is causing serious fragmentation (see the sketch below).
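
A minimal sketch of the case2 check (the mount point is a placeholder;
note that nodatacow also disables data checksumming and compression):

mount -o nodatacow /dev/mapper/prm-0 /mnt/test
# or per-file: mark the test file NOCOW while it is still empty
chattr +C /mnt/test/test1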

> 
> 
> The results for random read are more or less comparable with the
> performance of the EXT4 filesystem; we get approximately 300 000 IOPS
> for random read.
> 
> For random write, however, we are getting only about 15 000 IOPS, which
> is much lower than for EXT4 (~200 000 IOPS for RAID10).
> 
> 
> Regards,
> Premek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-13  3:59 ` Wang Shilong
@ 2015-01-15 13:32   ` P. Remek
  2015-01-18  5:11     ` Wang Shilong
  0 siblings, 1 reply; 9+ messages in thread
From: P. Remek @ 2015-01-15 13:32 UTC (permalink / raw)
  To: Wang Shilong; +Cc: linux-btrfs

Hello,

>
> Could you check how many extents with BTRFS and Ext4:
> # filefrag test1

So my findings are odd:

On BTRFS, when I run fio with a single worker thread (the target file is
12 GB and it is 100% random write of 4 kB blocks), the number of
extents reported by filefrag is around 3.
However, when I do the same with 4 worker threads, I get a crazy
number of extents - "test1: 3141866 extents found". Also, when running
with 4 threads, the sys% utilization takes 80% of CPU
(in the top output I see that it is all consumed by kworker processes).

On EXT4 I get only 13 extents even when running with 4 worker
threads. (Note that I created RAID10 using mdadm before setting up
ext4 on it, in order to get a "storage solution" comparable to what we
test with BTRFS.)
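
For reference, the comparison setup was along these lines (mdadm
defaults; the exact options are illustrative):

mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/mapper/prm-[0-7]
mkfs.ext4 /dev/md0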

Another odd thing is that it takes a very long time for the filefrag
utility to return the result on BTRFS, and not only for the case
where I got 3 million extents but also for the first case where I
ran a single worker and the number of extents was only 3. Filefrag on
EXT4 returns immediately.


> That would show whether bad fragmentation is the problem for BTRFS. I am
> still not sure how fio will test randwrite here, so here are the
> possibilities:
>
> case1:
>      if fio does not rewrite the same positions several times, I think
>      you could add --overwrite=0 and retest to see if it helps.

I am not sure which parameter you mean here.

> case2:
>     if fio randwrite did write to the same positions several times, I think
>     you could use the ‘-o nodatacow’ mount option to verify whether
>     BTRFS COW is causing serious fragmentation.
>

It seems that mounting with this option does have some effect, but it
is not very significant and not very deterministic. The IOPS are
slightly higher at the beginning (~25 000 IOPS), but the IOPS performance
is very spiky and I can still see that CPU sys% is very high. As soon
as the kworker threads start consuming CPU, the IOPS performance goes
down again to some ~15 000 IOPS.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs performance - ssd array
  2015-01-15 13:32   ` P. Remek
@ 2015-01-18  5:11     ` Wang Shilong
  0 siblings, 0 replies; 9+ messages in thread
From: Wang Shilong @ 2015-01-18  5:11 UTC (permalink / raw)
  To: P. Remek; +Cc: linux-btrfs

Hello,

> 
> Hello,
> 
>> 
>> Could you check how many extents with BTRFS and Ext4:
>> # filefrag test1
> 
> So my findings are odd:
> 
> On BTRFS, when I run fio with a single worker thread (the target file is
> 12 GB and it is 100% random write of 4 kB blocks), the number of
> extents reported by filefrag is around 3.
> However, when I do the same with 4 worker threads, I get a crazy
> number of extents - "test1: 3141866 extents found". Also, when running
> with 4 threads, the sys% utilization takes 80% of CPU
> (in the top output I see that it is all consumed by kworker processes).
> 
> On EXT4 I get only 13 extents even when running with 4 worker
> threads. (Note that I created RAID10 using mdadm before setting up
> ext4 on it, in order to get a "storage solution" comparable to what we
> test with BTRFS.)
> 
> Another odd thing is that it takes a very long time for the filefrag
> utility to return the result on BTRFS, and not only for the case
> where I got 3 million extents but also for the first case where I
> ran a single worker and the number of extents was only 3. Filefrag on
> EXT4 returns immediately.

So this looks like a Btrfs lock contention problem.

Take a look at btrfs_drop_extents(): even for nodatacow,
there will still be many item removals from the FS tree.

Unfortunately, btrfs lock contention problems seem very sensitive
to item removal operations.
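
If you want to confirm that, one quick check (assuming perf is available
on the box) is to profile the kernel while the fio run is in progress and
look for btrfs and locking symbols near the top:

perf record -a -g -- sleep 30
perf report --sort symbol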

Also, your SSDs here are very fast? Hmm, I would not read too much
into the exact IOPS numbers.

You could verify this problem by reducing the number of threads, for
example by comparing against the 1-thread results. Also, I guess the
problem should be less serious for btrfs sequential writes…
> 
> 
>> That would show whether bad fragmentation is the problem for BTRFS. I am
>> still not sure how fio will test randwrite here, so here are the
>> possibilities:
>> 
>> case1:
>>     if fio does not rewrite the same positions several times, I think
>>     you could add --overwrite=0 and retest to see if it helps.
> 
> I am not sure which parameter you mean here.


I mean ‘--overwrite' is an option for fio.

> 
>> case2:
>>    if fio randwrite did write to the same positions several times, I think
>>    you could use the ‘-o nodatacow’ mount option to verify whether
>>    BTRFS COW is causing serious fragmentation.
>> 
> 
> It seems that mounting with this option does have some effect, but it
> is not very significant and not very deterministic. The IOPS are
> slightly higher at the beginning (~25 000 IOPS), but the IOPS performance
> is very spiky and I can still see that CPU sys% is very high. As soon
> as the kworker threads start consuming CPU, the IOPS performance goes
> down again to some ~15 000 IOPS.

Best Regards,
Wang Shilong


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-01-18  5:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-12 13:51 btrfs performance - ssd array P. Remek
2015-01-12 14:54 ` Austin S Hemmelgarn
2015-01-12 15:11   ` Patrik Lundquist
2015-01-12 16:32     ` Austin S Hemmelgarn
2015-01-12 15:35   ` P. Remek
2015-01-12 16:43     ` Austin S Hemmelgarn
2015-01-13  3:59 ` Wang Shilong
2015-01-15 13:32   ` P. Remek
2015-01-18  5:11     ` Wang Shilong
