* Interesting btrfs csum and tree-checker performance penalty analyse
From: Qu Wenruo @ 2019-04-03  8:54 UTC
  To: linux-btrfs


Hi,

Recently the Intel LKP performance tests have been reporting a btrfs
performance regression.

The report points to the tree-checker code, and since I've been poking
around bcc/eBPF, I spent some time looking into the performance penalty
of both btrfs csum and tree-checker.

The code base is David's misc-next, which contains both the write-time
tree checker and the enhanced code to handle fuzzed images.

The tool can be found in my gist:
https://gist.github.com/adam900710/b5542f2e52ed4687986cf41f64b85253

To use the tool, one needs the bcc Python bindings and a kernel
configured for eBPF. The default Arch kernel at least has all the needed
config options, so anyone can try it on Arch.

The workload is:
 mkfs.btrfs -n 4K $DEV
 mount $DEV $MNT
 fsstress -n 10000 -w -d $MNT
 umount $MNT

 ## start my script ##
 mount $DEV $MNT
 ls -R $MNT > /dev/null # To read all fs tree blocks
 fsstress -n 1000 -w -d $MNT # Trigger enough writes
 umount $MNT
 ## stop my script ##


The result is very interesting. The basic numbers (times in
nanoseconds) are:
CSUM_TREE_BLOCK: nr=2311 total=10000612 avg=4327
TREE_CHECKER_READ: nr=461 total=41911553 avg=90914
TREE_CHECKER_WRITE: nr=1575 total=5783330 avg=3671
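
As a sanity check on the figures above, here is a minimal Python sketch
that parses the three result lines and recomputes each average. It
assumes the counters are in nanoseconds (which matches the 3~5μs
reading) and that avg is total divided by nr, truncated to an integer.
parse_line and recompute are hypothetical helper names, not part of the
gist.

```python
import re

# Result lines copied verbatim from the run above.
RESULTS = """\
CSUM_TREE_BLOCK: nr=2311 total=10000612 avg=4327
TREE_CHECKER_READ: nr=461 total=41911553 avg=90914
TREE_CHECKER_WRITE: nr=1575 total=5783330 avg=3671
"""

LINE_RE = re.compile(r"^(\w+): nr=(\d+) total=(\d+) avg=(\d+)$")


def parse_line(line):
    """Hypothetical helper: split one result line into (name, nr, total, avg)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    name, nr, total, avg = m.groups()
    return name, int(nr), int(total), int(avg)


def recompute(results=RESULTS):
    """Return {name: (nr, total_ns, avg_ns)} with avg recomputed as total // nr."""
    out = {}
    for line in results.splitlines():
        name, nr, total, reported_avg = parse_line(line)
        avg = total // nr  # assumption: the tool truncates to whole nanoseconds
        if avg != reported_avg:
            raise ValueError(f"{name}: recomputed {avg} != reported {reported_avg}")
        out[name] = (nr, total, avg)
    return out


if __name__ == "__main__":
    for name, (nr, total, avg) in recompute().items():
        print(f"{name}: {nr} calls, {avg / 1000:.1f} us per call")
```

All three reported averages are consistent with total // nr under that
assumption.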

So looking just at the averages, a csum calculation only costs 3~5μs.
And surprisingly, the write-time tree checker is even faster than the
checksum!

Also surprisingly, the read-time tree checker takes nearly 100μs per
call, roughly 20 to 25 times slower than csum or the write-time tree
checker.
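
To put the comparison in numbers, the following sketch derives the
per-call ratios and each function's share of the measured time from the
nr/total figures in the result lines. Nothing here is measured anew, the
unit is assumed to be nanoseconds, and the function names are
illustrative only.

```python
# nr and total (assumed nanoseconds) copied from the result lines.
STATS = {
    "CSUM_TREE_BLOCK": (2311, 10000612),
    "TREE_CHECKER_READ": (461, 41911553),
    "TREE_CHECKER_WRITE": (1575, 5783330),
}


def avg_ns(name):
    """Average time per call in nanoseconds."""
    nr, total = STATS[name]
    return total / nr


def ratio(slow, fast):
    """How many times slower one function is than another, per call."""
    return avg_ns(slow) / avg_ns(fast)


def share_of_total(name):
    """Fraction of all measured time spent in the given function."""
    grand_total = sum(total for _, total in STATS.values())
    return STATS[name][1] / grand_total


if __name__ == "__main__":
    read, csum, write = "TREE_CHECKER_READ", "CSUM_TREE_BLOCK", "TREE_CHECKER_WRITE"
    print(f"read checker vs csum:          {ratio(read, csum):.1f}x")
    print(f"read checker vs write checker: {ratio(read, write):.1f}x")
    print(f"read checker share of time:    {share_of_total(read):.0%}")
```

With these figures the read-time checker comes out at about 21 times the
csum cost and about 25 times the write-time checker cost per call, and
it accounts for roughly 73% of the time measured across the three
functions.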

So we have a new direction to enhance tree-checker performance.
BTW, bcc/ebpf is really awesome!

Thanks,
Qu



* Re: Interesting btrfs csum and tree-checker performance penalty analyse
From: Nikolay Borisov @ 2019-04-03  9:09 UTC
  To: Qu Wenruo, linux-btrfs



On 2019-04-03 at 11:54, Qu Wenruo wrote:
> The tool can be found in my gist:
> https://gist.github.com/adam900710/b5542f2e52ed4687986cf41f64b85253

So you are essentially trying to figure out the average run time of 3
functions. This could have been made simpler by using the funclatency
bcc tool from the iovisor repo:

https://github.com/iovisor/bcc/blob/master/tools/funclatency.py


Running this tool will show you a latency histogram, making it easier
to spot any latency outliers. An average value doesn't mean anything
without more context, i.e. the stddev.
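
To illustrate the point about averages, here is a small Python sketch
that buckets latency samples into power-of-two ranges, similar in spirit
to the histogram funclatency prints. The sample values are made up for
illustration and are not measurements from this thread; log2_histogram
and render are hypothetical helpers, not part of bcc.

```python
import statistics
from collections import Counter


def log2_histogram(samples_ns):
    """Count samples per power-of-two bucket; bucket k covers [2**k, 2**(k+1))."""
    hist = Counter()
    for ns in samples_ns:
        # int.bit_length() - 1 equals floor(log2(ns)) for ns >= 1.
        hist[max(ns, 1).bit_length() - 1] += 1
    return dict(sorted(hist.items()))


def render(hist):
    """Format the histogram as text rows with one bar per bucket."""
    rows = []
    for bucket, count in hist.items():
        low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
        rows.append(f"{low:>8} -> {high:<8} : {count:<4} |{'*' * count}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical data: nine fast calls and a single slow outlier.
    samples = [3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 900000]
    print(f"mean={statistics.mean(samples):.0f} ns "
          f"stdev={statistics.stdev(samples):.0f} ns")
    print(render(log2_histogram(samples)))
```

In this made-up data set the mean of about 93μs describes none of the
calls: nine of them take under 4μs and one takes 900μs, which the two
populated buckets make obvious.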


> The result is very interesting. The basic numbers (times in
> nanoseconds) are:
> CSUM_TREE_BLOCK: nr=2311 total=10000612 avg=4327
> TREE_CHECKER_READ: nr=461 total=41911553 avg=90914
> TREE_CHECKER_WRITE: nr=1575 total=5783330 avg=3671

Definitely something worth looking at.



* Re: Interesting btrfs csum and tree-checker performance penalty analyse
From: Qu Wenruo @ 2019-04-03  9:29 UTC
  To: Nikolay Borisov, linux-btrfs



On 2019/4/3 5:09 PM, Nikolay Borisov wrote:
> On 2019-04-03 at 11:54, Qu Wenruo wrote:
>> The tool can be found in my gist:
>> https://gist.github.com/adam900710/b5542f2e52ed4687986cf41f64b85253
>
> So you are essentially trying to figure out the average run time of 3
> functions. This could have been made simpler by using the funclatency
> bcc tool from the iovisor repo:
>
> https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
>
> Running this tool will show you a latency histogram, making it easier
> to spot any latency outliers. An average value doesn't mean anything
> without more context, i.e. the stddev.

That can be done easily in the Python part, although a histogram is a
better way to present it.
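
A minimal sketch of what that could look like on the Python side,
assuming the script additionally accumulated a sum of squared durations
next to nr and total. The gist may not do this; LatencyStats and its
fields are hypothetical names, and the samples are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class LatencyStats:
    """Running sums for one traced function (hypothetical layout)."""
    nr: int = 0          # number of calls seen
    total: int = 0       # sum of durations in ns
    total_sq: int = 0    # sum of squared durations in ns^2

    def add(self, duration_ns):
        self.nr += 1
        self.total += duration_ns
        self.total_sq += duration_ns * duration_ns

    def mean(self):
        return self.total / self.nr

    def stddev(self):
        """Population standard deviation derived from the running sums."""
        mean = self.mean()
        variance = self.total_sq / self.nr - mean * mean
        return math.sqrt(max(variance, 0.0))  # clamp float rounding below zero


if __name__ == "__main__":
    stats = LatencyStats()
    for ns in (2000, 4000, 4000, 4000, 5000, 5000, 7000, 9000):  # made-up samples
        stats.add(ns)
    print(f"nr={stats.nr} total={stats.total} "
          f"avg={stats.mean():.0f} stddev={stats.stddev():.0f}")
```

Keeping three running sums means no per-call samples have to be stored.
If the squared sum were kept in a fixed-width 64-bit counter on the
probe side, overflow for long durations would need checking.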

>> The result is very interesting. The basic numbers (times in
>> nanoseconds) are:
>> CSUM_TREE_BLOCK: nr=2311 total=10000612 avg=4327
>> TREE_CHECKER_READ: nr=461 total=41911553 avg=90914
>> TREE_CHECKER_WRITE: nr=1575 total=5783330 avg=3671
>
> Definitely something worth looking at.

And it already exposes a bug.

The write-time tree checker doesn't check the content of leaves, which
is why it's so fast.

As for the slow read part, it's the empty root owner check, which I'll
definitely remove.

Thanks,
Qu
