[PATCH RFC 0/3] btrfs: Performance profiler support

@ 2019-03-06  6:19 Qu Wenruo
From: Qu Wenruo @ 2019-03-06  6:19 UTC
  To: linux-btrfs

This patchset can be fetched from GitHub:
https://github.com/adam900710/linux/tree/perf_tree_lock
It is based on the v5.0-rc7 tag.

Although we have ftrace/perf for various kinds of performance analysis, in
most cases their granularity is too fine, resulting in a flood of data for
users.

This RFC patchset provides a btrfs-specific performance profiler.
It measures the duration of certain functions and accumulates those
durations.
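As a rough illustration of what "measure and accumulate durations" means,
here is a minimal userspace C sketch. It is not the code from patch 1/3:
the names struct perf_stat, perf_start() and perf_end() are hypothetical,
and it uses clock_gettime(CLOCK_MONOTONIC) where kernel code would use
something like ktime_get_ns().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-event accumulator: total time spent and number of calls. */
struct perf_stat {
	uint64_t total_ns;
	uint64_t count;
};

/* Read a monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Called when the measured section starts; returns the start timestamp. */
static uint64_t perf_start(void)
{
	return now_ns();
}

/* Called when the measured section ends; adds the elapsed time to @stat. */
static void perf_end(struct perf_stat *stat, uint64_t start)
{
	uint64_t end = now_ns();

	/* A monotonic clock never goes backwards. */
	assert(end >= start);
	stat->total_ns += end - start;
	stat->count++;
}
```

Wrapping, for example, a tree lock acquisition between perf_start() and
perf_end() yields the total time spent there plus the number of calls: one
aggregated figure instead of a per-event trace. This single-threaded sketch
uses plain additions; concurrent callers would need atomic or per-CPU
counters.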

The result is provided through a read-only (RO) sysfs interface,
/sys/fs/btrfs/<FSID>/profiler.

The content of that file is generated when it is read, so users have full
control over the sampling resolution by choosing how often to read it.
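Generating the text at read time means the counters are only formatted when
someone asks for them. The following userspace C sketch imitates a sysfs
show()-style callback; the function name profiler_show(), the per-tree
counter layout and the "name: total_ns count" output format are all
hypothetical, since the actual format is given in the last patch of the
series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accumulated lock statistics for one tree. */
struct tree_stat {
	const char *name;
	uint64_t total_ns;
	uint64_t count;
};

/*
 * Format all counters into @buf at read time, in the spirit of a sysfs
 * show() callback. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if @buf is too small.
 */
static int profiler_show(const struct tree_stat *stats, size_t nr,
			 char *buf, size_t size)
{
	size_t off = 0;

	for (size_t i = 0; i < nr; i++) {
		int ret = snprintf(buf + off, size - off, "%s: %llu %llu\n",
				   stats[i].name,
				   (unsigned long long)stats[i].total_ns,
				   (unsigned long long)stats[i].count);

		/* snprintf() reports the length it wanted; detect truncation. */
		if (ret < 0 || (size_t)ret >= size - off)
			return -1;
		off += (size_t)ret;
	}
	return (int)off;
}
```

Nothing is formatted between reads, so a tool that reads the file once per
second and one that reads it once per minute pay the same cost per read and
simply get coarser or finer samples.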

Example content can be found in the last patch.

An example of using this interface to profile fsstress can be found here:
https://docs.google.com/spreadsheets/d/1BVng8hqyyxFWPQF_1N0cpwiCA6R3SXtDTHmRqo8qyvo/edit?usp=sharing

The test script can be found here:
https://gist.github.com/adam900710/ca47b9a8d4b8db7168b261b6fba71ff1

The interesting results from the graph are:
- Concurrency on the fs tree is only high for the first 25 seconds.
  My initial expectation was that the hotness of the fs tree would be more
  or less stable, so this looks pretty interesting.

- The extent tree then gets more concurrency after 25 seconds.
  This again breaks my expectation, as writes to the extent tree should
  only be triggered by delayed refs. So there is something interesting
  here too.

- The root tree is pretty cold.
  Since the test only operates on the fs tree, the root tree is expected
  to be less contended.

- There is some minor load on other trees.
  My guess is that it comes from the csum tree.

Although the patchset is relatively small, there are some design points
that need extra comments before the patchset grows larger.

- How should this profiler be enabled?
  Should this feature be enabled by a mount option or a kernel config
  option? Or should it just run on all kernel builds?
  Currently the overhead should be pretty small, but it will grow as new
  telemetry is added.
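To make the trade-off concrete, here is a userspace C sketch combining both
options: a build-time switch that turns the hook into an empty inline
function, plus a runtime flag standing in for a mount option.
PROFILER_BUILT_IN, profiler_enabled and profiler_account() are hypothetical
names, not existing btrfs config symbols or mount options.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build-time switch, playing the role of a Kconfig option. */
#define PROFILER_BUILT_IN 1

/* Hypothetical runtime switch, playing the role of a mount option. */
static bool profiler_enabled;

static uint64_t profiler_total_ns;

#ifdef PROFILER_BUILT_IN
/* Built in: when the runtime flag is off, the cost is one branch per call. */
static inline void profiler_account(uint64_t ns)
{
	if (!profiler_enabled)
		return;
	profiler_total_ns += ns;
}
#else
/* Compiled out: the hook is an empty inline function. */
static inline void profiler_account(uint64_t ns)
{
	(void)ns;
}
#endif
```

With the build-time switch disabled the hook disappears entirely; with it
enabled, a filesystem mounted without the option pays only a predictable
branch per hook. A real mount option would be per filesystem, so the flag
would live in the per-fs structure rather than in a global.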

- Design of the interface
  Is this a valid use of sysfs, or an abuse of it?
  And can the content be improved for both humans and programs?

- Ideas for new telemetry
  My plan is to add transaction wait time.

Qu Wenruo (3):
  btrfs: Introduce performance profiler
  btrfs: locking: Add hooks for btrfs perf
  btrfs: perf: Add RO sysfs interface to collect perf result

 fs/btrfs/Makefile  |  2 +-
 fs/btrfs/ctree.h   |  3 ++
 fs/btrfs/disk-io.c |  6 +++
 fs/btrfs/locking.c | 11 ++++++
 fs/btrfs/perf.c    | 92 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/perf.h    | 44 ++++++++++++++++++++++
 fs/btrfs/sysfs.c   | 39 ++++++++++++++++++++
 7 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/perf.c
 create mode 100644 fs/btrfs/perf.h

-- 
2.21.0
