From: Dharmendra Singh <>
To: unlisted-recipients:; (no To-header on input)
Subject: Re: [PATCH v2 0/2] FUSE: Atomic lookup + open performance numbers
Date: Tue, 22 Mar 2022 17:42:12 +0530	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>


Thanks, Miklos. To measure performance, bonnie++ was run over a passthrough_ll mount backed by tmpfs.
When taking numbers on a VM, I saw non-deterministic results, so core binding was used to pin
passthrough_ll and bonnie++ to separate sets of cores.

Here are the Google Sheets containing the performance numbers.

The following libfuse patches (commits on March 7 and March 8 in the first link) were used to test
these changes:

Parameters used in mounting passthrough_ll:
 numactl --localalloc --physcpubind=16-23 passthrough_ll -f -osource=/tmp/source,allow_other,allow_root,
 cache=never -o max_idle_threads=1 /tmp/dest
     (Here cache=never results in direct I/O on the files)

Parameters used in bonnie++:
In sheet 0B:
numactl --localalloc --physcpubind=0-7  bonnie++ -x 4 -q -s0  -d /tmp/dest/ -n 10:0:0:10 -r 0 -u 0 2>/dev/null

In sheet 1B:
numactl --localalloc --physcpubind=0-7 bonnie++ -x 4 -q -s0 -d /tmp/dest/ -n 10:1:1:10 -r 0 -u 0 2>/dev/null

Additional settings done on the testing machine:
cpupower frequency-set -g performance

Running bonnie++ gives us results for Create/s, Read/s and Delete/s; the tables below summarise the
numbers for these three operations. Please note that for a read of 0 bytes, bonnie++ performs
create/open, close and stat, but no atomic open. The results in sheet 0B therefore carry the overhead
of the extra stat calls. In sheet 1B, we directed bonnie++ to read 1 byte, which does trigger the
atomic open call, but the numbers for that run include the overhead of the read operation itself
rather than just a plain open/close.

Here is the table summarising the performance numbers

Table: 0B
                                                Sequential                |            Random
                                            Create/s   Read/s    Del/s   |   Create/s   Read/s   Del/s
Patched Libfuse                              -3.55%    -4.9%    -4.43%   |   -0.4%      -1.6%    -1.0%
Patched Libfuse + No-Flush                  +22.3%     +6%      +5.15%   |  +27.9%     +14.5%    +2.8%
Patched Libfuse + Patched FuseK             +22.9%     +6.1%    +5.3%    |  +28.3%     +14.5%    +2.3%
Patched Libfuse + Patched FuseK + No-Flush  +33.4%     -4.4%    -3.73%   |  +38.8%     -2.5%     -2.0%

Table: 1B
                                                Sequential                |            Random
                                            Create/s   Read/s    Del/s   |   Create/s   Read/s   Del/s
Patched Libfuse                              -0.22%    -0.35%   -0.7%    |   -0.27%    -0.78%    -2.35%
Patched Libfuse + No-Flush                   +2.5%     +2.6%    -9.6%    |   +2.5%     -8.6%     -6.26%
Patched Libfuse + Patched FuseK              +1.63%    -1.0%   -11.45%   |   +4.48%    -6.84%    -4.0%
Patched Libfuse + Patched FuseK + No-Flush  +32.43%   +26.61%   +0.76%   |  +33.2%    +14.7%     -0.40%

No-Flush = No flush trigger from fuse kernel into libfuse

In Table 1B, the 4th row shows good improvements for both Create and Read, whereas Delete is almost
unchanged. In Table 0B, the 4th row shows reduced Read performance; this turned out to be caused by
some changes in libfuse. That was fixed, and in the same row of Table 1B we can see improved numbers.

In Table 0B, the 3rd row shows good numbers because bonnie++ read 0 bytes, which changed the
behaviour and affected the results, whereas the same row in Table 1B shows reduced numbers because
the 1-byte reads involved flush calls from the fuse kernel into libfuse.

These changes are not only about fuse kernel/user-space context switches; our main goal is to
improve performance for network file systems:
   - Reduce the number of network round trips
   - Reduce load on metadata servers with thousands of clients

Reduced kernel/userspace context switches are 'just' a side effect.



Thread overview: 11+ messages
2022-03-22 11:51 [PATCH v2 0/2] FUSE: Implement atomic lookup + open Dharmendra Singh
2022-03-22 11:51 ` [PATCH v2 1/2] " Dharmendra Singh
2022-04-22 15:29   ` Miklos Szeredi
2022-04-25  5:25     ` Dharmendra Hans
2022-04-25  7:37       ` Miklos Szeredi
2022-04-25 10:43         ` Dharmendra Hans
2022-04-29  4:34           ` Dharmendra Hans
2022-03-22 11:51 ` [PATCH v2 2/2] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
2022-03-22 12:12 ` Dharmendra Singh [this message]
2022-03-29 11:07 ` [PATCH v2 0/2] FUSE: Implement atomic lookup + open Dharmendra Hans
2022-04-07  9:57   ` Dharmendra Hans
