All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] btrfs: Performance profiler support
@ 2019-03-06  6:19 Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 1/3] btrfs: Introduce performance profiler Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-03-06  6:19 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/perf_tree_lock
Which is based on v5.0-rc7 tag.

Although we have ftrace/perf to do various performance analyse, under most
case the granularity is too small, resulting data flood for users.

This RFC patchset provides a btrfs specific performance profiler.
It calculates certain function duration and account the duration.

The result is provided through RO sysfs interface,
/sys/fs/btrfs/<FSID>/profiler.

The content of that file is genreated when read.
Users can have full control on the sample resolution.

The example content can be found in the last patch.

One example using the interface to profile fsstress can be found here:
https://docs.google.com/spreadsheets/d/1BVng8hqyyxFWPQF_1N0cpwiCA6R3SXtDTHmRqo8qyvo/edit?usp=sharing

The test script can be found here:
https://gist.github.com/adam900710/ca47b9a8d4b8db7168b261b6fba71ff1

The interesting result from the graph is:
- Concurrency on fs tree is only high for the initial 25 seconds
  My initial expectation is, the hotness on fs tree should be more or
  less stable. Which looks pretty interesting

- Then extent tree get more concurrency after 25 seconds
  This again breaks my expectation. As write to extent tree should only
  be triggered by delayed ref. So there is something interesting here
  too.

- Root tree is pretty cold
  Since the test is only happening on fs tree, it's expected to be less
  racy.
  
- There is some minor load on other trees.
  My guess is, that's from csum tree.

Although the patchset is relatively small, there are some design points
need extra commends before the patchset get larger and larger.

- How should this profiler get enabled?
  Should this feature get enabled by mount option or kernel config?
  Or just let it run for all kernel build?
  Currently the overhead should be pretty small, but the overhead should
  be larger and larger with new telemetry.

- Design of the interface
  Is this a valid usage of sysfs or an abuse?
  And if the content can be improved for both human or program?

- Idea on new telemetry
  My plan is to add transaction wait time.

Qu Wenruo (3):
  btrfs: Introduce performance profiler
  btrfs: locking: Add hooks for btrfs perf
  btrfs: perf: Add RO sysfs interface to collect perf result

 fs/btrfs/Makefile  |  2 +-
 fs/btrfs/ctree.h   |  3 ++
 fs/btrfs/disk-io.c |  6 +++
 fs/btrfs/locking.c | 11 ++++++
 fs/btrfs/perf.c    | 92 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/perf.h    | 44 ++++++++++++++++++++++
 fs/btrfs/sysfs.c   | 39 ++++++++++++++++++++
 7 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/perf.c
 create mode 100644 fs/btrfs/perf.h

-- 
2.21.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH RFC 1/3] btrfs: Introduce performance profiler
  2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
@ 2019-03-06  6:19 ` Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 2/3] btrfs: locking: Add hooks for btrfs perf Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-03-06  6:19 UTC (permalink / raw)
  To: linux-btrfs

This patch introduce the skeleton of btrfs performance profiler.

The objective of btrfs performance profiler is to provide various
indicator to locate possible performance bottleneck.

Initial btrfs profiler only supports sleepable tree lock anaylyse for
the following trees:
- fs/subvolume trees
- extent tree
- root tree
- all other trees

This patch is just the basic skeleton for alloc/free.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/Makefile  |  2 +-
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/disk-io.c |  6 ++++++
 fs/btrfs/perf.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/perf.h    | 27 ++++++++++++++++++++++++
 5 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/perf.c
 create mode 100644 fs/btrfs/perf.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index ca693dd554e9..7b20d93e4988 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -10,7 +10,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o free-space-tree.o tree-checker.o
+	   uuid-tree.o props.o free-space-tree.o tree-checker.o perf.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7a2a2621f0d9..a64f81da0c5e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1163,6 +1163,9 @@ struct btrfs_fs_info {
 	spinlock_t swapfile_pins_lock;
 	struct rb_root swapfile_pins;
 
+	/* performance profiler */
+	struct btrfs_perf_profiler *profiler;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6a2a2a951705..a2a4ae344c38 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -38,6 +38,7 @@
 #include "compression.h"
 #include "tree-checker.h"
 #include "ref-verify.h"
+#include "perf.h"
 
 #ifdef CONFIG_X86
 #include <asm/cpufeature.h>
@@ -2608,6 +2609,9 @@ int open_ctree(struct super_block *sb,
 	int clear_free_space_tree = 0;
 	int level;
 
+	fs_info->profiler = btrfs_perf_alloc_profiler();
+	if (IS_ERR(fs_info->profiler))
+		return PTR_ERR(fs_info->profiler);
 	tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
 	if (!tree_root || !chunk_root) {
@@ -3330,6 +3334,7 @@ int open_ctree(struct super_block *sb,
 fail:
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_close_devices(fs_info->fs_devices);
+	btrfs_perf_free_profiler(fs_info);
 	return err;
 
 recovery_tree_root:
@@ -4043,6 +4048,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_free_ref_cache(fs_info);
 
+	btrfs_perf_free_profiler(fs_info);
 	while (!list_empty(&fs_info->pinned_chunks)) {
 		struct extent_map *em;
 
diff --git a/fs/btrfs/perf.c b/fs/btrfs/perf.c
new file mode 100644
index 000000000000..2880111f3b0c
--- /dev/null
+++ b/fs/btrfs/perf.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
+ */
+
+#include "perf.h"
+
+/*
+ * Allocate and initialize the performance profiler
+ *
+ * Return valid pointer if everything worked.
+ * Return error pointer if something went wrong.
+ */
+struct btrfs_perf_profiler *btrfs_perf_alloc_profiler(void)
+{
+	struct btrfs_perf_profiler *profiler;
+	int ret;
+	int last_failed;
+	int i;
+
+	profiler = kmalloc(sizeof(*profiler), GFP_KERNEL);
+	if (!profiler)
+		return ERR_PTR(-ENOMEM);
+	for (i = 0; i < BTRFS_PERF_LAST; i++) {
+		ret = percpu_counter_init(&profiler->perf_counters[i], 0,
+					  GFP_KERNEL);
+		if (ret < 0) {
+			last_failed = i;
+			goto cleanup;
+		}
+	}
+	profiler->last_sample = ktime_get_ns();
+	return profiler;
+cleanup:
+	for (i = 0; i < last_failed; i++)
+		percpu_counter_destroy(&profiler->perf_counters[i]);
+	return ERR_PTR(ret);
+}
+
+void btrfs_perf_free_profiler(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_perf_profiler *profiler = fs_info->profiler;
+	int i;
+
+	if (!profiler)
+		return;
+	fs_info->profiler = NULL;
+	for (i = 0; i < BTRFS_PERF_LAST; i++)
+		percpu_counter_destroy(&profiler->perf_counters[i]);
+	kfree(profiler);
+}
+
diff --git a/fs/btrfs/perf.h b/fs/btrfs/perf.h
new file mode 100644
index 000000000000..9dbea6458d86
--- /dev/null
+++ b/fs/btrfs/perf.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
+ */
+
+#ifndef BTRFS_PERF_H
+#define BTRFS_PERF_H
+
+#include "ctree.h"
+
+enum {
+	BTRFS_PERF_TREE_LOCK_FS,
+	BTRFS_PERF_TREE_LOCK_EXTENT,
+	BTRFS_PERF_TREE_LOCK_ROOT,
+	BTRFS_PERF_TREE_LOCK_OTHER,
+	BTRFS_PERF_LAST,
+};
+
+struct btrfs_perf_profiler {
+	/* timestamp (ns) of last sample */
+	u64 last_sample;
+	struct percpu_counter perf_counters[BTRFS_PERF_LAST];
+};
+
+struct btrfs_perf_profiler *btrfs_perf_alloc_profiler(void);
+void btrfs_perf_free_profiler(struct btrfs_fs_info *fs_info);
+#endif
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 2/3] btrfs: locking: Add hooks for btrfs perf
  2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 1/3] btrfs: Introduce performance profiler Qu Wenruo
@ 2019-03-06  6:19 ` Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 3/3] btrfs: perf: Add RO sysfs interface to collect perf result Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-03-06  6:19 UTC (permalink / raw)
  To: linux-btrfs

For btrfs tree locking, there are only 2 functions can sleep:
- btrfs_tree_read_lock()
  It will wait for any blocking writers
- btrfs_tree_lock()
  It will wait for any blocking readers or writers

Other functions only depends on rwlock which won't sleep.
We doesn't really care about the spinning lock version.

The overheads introduced are:
- two ktime_get() calls
- several if branches
- percpu_counter_add()

Which should be smaller than to ftrace function_graph.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/locking.c | 11 +++++++++++
 fs/btrfs/perf.c    | 20 ++++++++++++++++++++
 fs/btrfs/perf.h    |  8 ++++++++
 3 files changed, 39 insertions(+)

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 1da768e5ef75..c23c8f7450e8 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -11,6 +11,7 @@
 #include "ctree.h"
 #include "extent_io.h"
 #include "locking.h"
+#include "perf.h"
 
 static void btrfs_assert_tree_read_locked(struct extent_buffer *eb);
 
@@ -85,10 +86,14 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
  */
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
+	u64 start_ns;
+
+	start_ns = btrfs_perf_start();
 again:
 	BUG_ON(!atomic_read(&eb->blocking_writers) &&
 	       current->pid == eb->lock_owner);
 
+
 	read_lock(&eb->lock);
 	if (atomic_read(&eb->blocking_writers) &&
 	    current->pid == eb->lock_owner) {
@@ -101,16 +106,19 @@ void btrfs_tree_read_lock(struct extent_buffer *eb)
 		BUG_ON(eb->lock_nested);
 		eb->lock_nested = 1;
 		read_unlock(&eb->lock);
+		btrfs_perf_end(eb->fs_info, btrfs_header_owner(eb), start_ns);
 		return;
 	}
 	if (atomic_read(&eb->blocking_writers)) {
 		read_unlock(&eb->lock);
+
 		wait_event(eb->write_lock_wq,
 			   atomic_read(&eb->blocking_writers) == 0);
 		goto again;
 	}
 	atomic_inc(&eb->read_locks);
 	atomic_inc(&eb->spinning_readers);
+	btrfs_perf_end(eb->fs_info, btrfs_header_owner(eb), start_ns);
 }
 
 /*
@@ -227,7 +235,9 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
  */
 void btrfs_tree_lock(struct extent_buffer *eb)
 {
+	u64 start_ns;
 	WARN_ON(eb->lock_owner == current->pid);
+	start_ns = btrfs_perf_start();
 again:
 	wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
 	wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
@@ -248,6 +258,7 @@ void btrfs_tree_lock(struct extent_buffer *eb)
 	atomic_inc(&eb->spinning_writers);
 	atomic_inc(&eb->write_locks);
 	eb->lock_owner = current->pid;
+	btrfs_perf_end(eb->fs_info, btrfs_header_owner(eb), start_ns);
 }
 
 /*
diff --git a/fs/btrfs/perf.c b/fs/btrfs/perf.c
index 2880111f3b0c..893bfad8e6d3 100644
--- a/fs/btrfs/perf.c
+++ b/fs/btrfs/perf.c
@@ -50,3 +50,23 @@ void btrfs_perf_free_profiler(struct btrfs_fs_info *fs_info)
 	kfree(profiler);
 }
 
+void btrfs_perf_end(struct btrfs_fs_info *fs_info, u64 eb_owner, u64 start_ns)
+{
+	struct btrfs_perf_profiler *profiler = fs_info->profiler;
+	u64 end_ns;
+	int i;
+
+	if (!profiler)
+		return;
+
+	end_ns = ktime_get_ns();
+	if (eb_owner == BTRFS_ROOT_TREE_OBJECTID)
+		i = BTRFS_PERF_TREE_LOCK_ROOT;
+	else if (is_fstree(eb_owner))
+		i = BTRFS_PERF_TREE_LOCK_FS;
+	else if (eb_owner == BTRFS_EXTENT_TREE_OBJECTID)
+		i = BTRFS_PERF_TREE_LOCK_EXTENT;
+	else
+		i = BTRFS_PERF_TREE_LOCK_OTHER;
+	percpu_counter_add(&profiler->perf_counters[i], end_ns - start_ns);
+}
diff --git a/fs/btrfs/perf.h b/fs/btrfs/perf.h
index 9dbea6458d86..7cf4b8c9a0ad 100644
--- a/fs/btrfs/perf.h
+++ b/fs/btrfs/perf.h
@@ -24,4 +24,12 @@ struct btrfs_perf_profiler {
 
 struct btrfs_perf_profiler *btrfs_perf_alloc_profiler(void);
 void btrfs_perf_free_profiler(struct btrfs_fs_info *fs_info);
+void btrfs_perf_update_lock(struct btrfs_fs_info *fs_info,
+			    u64 eb_owner, u64 ns_diff);
+static inline u64 btrfs_perf_start(void)
+{
+	return ktime_get_ns();
+}
+
+void btrfs_perf_end(struct btrfs_fs_info *fs_info, u64 eb_owner, u64 start_ns);
 #endif
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 3/3] btrfs: perf: Add RO sysfs interface to collect perf result
  2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 1/3] btrfs: Introduce performance profiler Qu Wenruo
  2019-03-06  6:19 ` [PATCH RFC 2/3] btrfs: locking: Add hooks for btrfs perf Qu Wenruo
@ 2019-03-06  6:19 ` Qu Wenruo
  2019-03-07 14:02 ` [PATCH RFC 0/3] btrfs: Performance profiler support David Sterba
  2019-03-10  3:08 ` Anand Jain
  4 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-03-06  6:19 UTC (permalink / raw)
  To: linux-btrfs

This patch adds a new sys fs interface, 'profiler', for user to get
real time performance data.

The content of /sys/fs/btrfs/<FDID>/profiler is generated at the time of
read, so user could have full control of the duration resolution.

The output example would be:
timestamp = 16364075995092ns
duration = 1025342515ns
TREE_LOCK_FS = 417160165ns (40.68%)
TREE_LOCK_EXTENT = 45087670ns (4.39%)
TREE_LOCK_ROOT = 1555506ns (0.15%)
TREE_LOCK_OTHER = 20387436ns (1.98%)

The 'timestamp' content is directly from ktime_get_ns(), which starts from
kernel boot, and doesn't count hibernation/suspension.
The 'duration' content is the time difference between last sample, in
nanoseconds. Doesn't count hibernation or suspension either.

The 'TREE_LOCK_*' content is the time spent on sleepable tree lock.
The percentage in the round brackets is the sleep time compared to
duration, which can be larger than 100%.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/perf.c  | 20 ++++++++++++++++++++
 fs/btrfs/perf.h  |  9 +++++++++
 fs/btrfs/sysfs.c | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/fs/btrfs/perf.c b/fs/btrfs/perf.c
index 893bfad8e6d3..d1c11b91eeb8 100644
--- a/fs/btrfs/perf.c
+++ b/fs/btrfs/perf.c
@@ -70,3 +70,23 @@ void btrfs_perf_end(struct btrfs_fs_info *fs_info, u64 eb_owner, u64 start_ns)
 		i = BTRFS_PERF_TREE_LOCK_OTHER;
 	percpu_counter_add(&profiler->perf_counters[i], end_ns - start_ns);
 }
+
+void btrfs_perf_report(struct btrfs_fs_info *fs_info,
+		       struct btrfs_perf_result *result)
+{
+	struct btrfs_perf_profiler *profiler = fs_info->profiler;
+	u64 end_ns = ktime_get_ns();
+	int i;
+
+	if (!profiler || !result)
+		return;
+
+	for (i = 0; i < BTRFS_PERF_LAST; i++) {
+		result->results_ns[i] =
+			percpu_counter_sum(&profiler->perf_counters[i]);
+		percpu_counter_set(&profiler->perf_counters[i], 0);
+	}
+	result->current_ns = end_ns;
+	result->duration_ns = end_ns - profiler->last_sample;
+	profiler->last_sample = end_ns;
+}
diff --git a/fs/btrfs/perf.h b/fs/btrfs/perf.h
index 7cf4b8c9a0ad..0af12aa148b4 100644
--- a/fs/btrfs/perf.h
+++ b/fs/btrfs/perf.h
@@ -22,6 +22,12 @@ struct btrfs_perf_profiler {
 	struct percpu_counter perf_counters[BTRFS_PERF_LAST];
 };
 
+struct btrfs_perf_result {
+	u64 current_ns;
+	u64 duration_ns;
+	u64 results_ns[BTRFS_PERF_LAST];
+};
+
 struct btrfs_perf_profiler *btrfs_perf_alloc_profiler(void);
 void btrfs_perf_free_profiler(struct btrfs_fs_info *fs_info);
 void btrfs_perf_update_lock(struct btrfs_fs_info *fs_info,
@@ -32,4 +38,7 @@ static inline u64 btrfs_perf_start(void)
 }
 
 void btrfs_perf_end(struct btrfs_fs_info *fs_info, u64 eb_owner, u64 start_ns);
+
+void btrfs_perf_report(struct btrfs_fs_info *fs_info,
+		       struct btrfs_perf_result *result);
 #endif
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 5a5930e3d32b..2b5d72b699d8 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -16,6 +16,7 @@
 #include "transaction.h"
 #include "sysfs.h"
 #include "volumes.h"
+#include "perf.h"
 
 static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj);
 static inline struct btrfs_fs_devices *to_fs_devs(struct kobject *kobj);
@@ -456,6 +457,43 @@ static ssize_t btrfs_sectorsize_show(struct kobject *kobj,
 
 BTRFS_ATTR(, sectorsize, btrfs_sectorsize_show);
 
+#define print_one_result(name)					\
+({								\
+	tmp = result.results_ns[BTRFS_PERF_##name];		\
+	size += snprintf(buf + size, PAGE_SIZE - size,		\
+			 "%s = %lluns (%llu.%02llu%%)\n",	\
+			 #name, tmp, tmp * 100 / duration,	\
+			 (tmp * 10000 / duration) % 100);	\
+})
+
+static ssize_t btrfs_profiler_show(struct kobject *kobj,
+				struct kobj_attribute *a, char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	struct btrfs_perf_result result = { 0 };
+	u64 tmp;
+	u64 duration;
+	ssize_t size = 0;
+
+	btrfs_perf_report(fs_info, &result);
+	if (result.duration_ns == 0)
+		return snprintf(buf, PAGE_SIZE, "profiler not running\n");
+
+	duration = result.duration_ns;
+	size += snprintf(buf + size, PAGE_SIZE - size, "timestamp = %lluns\n",
+			 result.current_ns);
+	size += snprintf(buf + size, PAGE_SIZE - size, "duration = %lluns\n",
+			 duration);
+	print_one_result(TREE_LOCK_FS);
+	print_one_result(TREE_LOCK_EXTENT);
+	print_one_result(TREE_LOCK_ROOT);
+	print_one_result(TREE_LOCK_OTHER);
+	return size;
+}
+#undef print_one_result
+
+BTRFS_ATTR(, profiler, btrfs_profiler_show);
+
 static ssize_t btrfs_clone_alignment_show(struct kobject *kobj,
 				struct kobj_attribute *a, char *buf)
 {
@@ -525,6 +563,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, clone_alignment),
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
+	BTRFS_ATTR_PTR(, profiler),
 	NULL,
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-03-06  6:19 ` [PATCH RFC 3/3] btrfs: perf: Add RO sysfs interface to collect perf result Qu Wenruo
@ 2019-03-07 14:02 ` David Sterba
  2019-03-07 14:18   ` Qu Wenruo
  2019-03-10  3:08 ` Anand Jain
  4 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2019-03-07 14:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 06, 2019 at 02:19:04PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/perf_tree_lock
> Which is based on v5.0-rc7 tag.
> 
> Although we have ftrace/perf to do various performance analyse, under most
> case the granularity is too small, resulting data flood for users.
> 
> This RFC patchset provides a btrfs specific performance profiler.
> It calculates certain function duration and account the duration.
> 
> The result is provided through RO sysfs interface,
> /sys/fs/btrfs/<FSID>/profiler.
> 
> The content of that file is genreated when read.
> Users can have full control on the sample resolution.
> 
> The example content can be found in the last patch.
> 
> One example using the interface to profile fsstress can be found here:
> https://docs.google.com/spreadsheets/d/1BVng8hqyyxFWPQF_1N0cpwiCA6R3SXtDTHmRqo8qyvo/edit?usp=sharing
> 
> The test script can be found here:
> https://gist.github.com/adam900710/ca47b9a8d4b8db7168b261b6fba71ff1
> 
> The interesting result from the graph is:
> - Concurrency on fs tree is only high for the initial 25 seconds
>   My initial expectation is, the hotness on fs tree should be more or
>   less stable. Which looks pretty interesting
> 
> - Then extent tree get more concurrency after 25 seconds
>   This again breaks my expectation. As write to extent tree should only
>   be triggered by delayed ref. So there is something interesting here
>   too.
> 
> - Root tree is pretty cold
>   Since the test is only happening on fs tree, it's expected to be less
>   racy.
>   
> - There is some minor load on other trees.
>   My guess is, that's from csum tree.
> 
> Although the patchset is relatively small, there are some design points
> need extra commends before the patchset get larger and larger.
> 
> - How should this profiler get enabled?
>   Should this feature get enabled by mount option or kernel config?
>   Or just let it run for all kernel build?
>   Currently the overhead should be pretty small, but the overhead should
>   be larger and larger with new telemetry.
> 
> - Design of the interface
>   Is this a valid usage of sysfs or an abuse?
>   And if the content can be improved for both human or program?

Well, most of that is answered by 'figure out how to use tracepoints and
perf for that'.

If there were not a whole substystem, actively maintained and
documented, implementing something like that might help, but that's not
the case.

I see what you were able to understand from the results, but it's more
like a custom analysis tool that should not merged as-is. It brings a
whole new interface and that's always hard to get right with all the
mistakes ahead that somebody has probably solved already.

It would be good to have list of the limitations of perf you see, and we
can find a solution ourselves or ask elsewhere.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-07 14:02 ` [PATCH RFC 0/3] btrfs: Performance profiler support David Sterba
@ 2019-03-07 14:18   ` Qu Wenruo
  2019-03-07 16:12     ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-03-07 14:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, linux-perf-users


[-- Attachment #1.1: Type: text/plain, Size: 4170 bytes --]



On 2019/3/7 下午10:02, David Sterba wrote:
> On Wed, Mar 06, 2019 at 02:19:04PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/perf_tree_lock
>> Which is based on v5.0-rc7 tag.
>>
>> Although we have ftrace/perf to do various performance analyse, under most
>> case the granularity is too small, resulting data flood for users.
>>
>> This RFC patchset provides a btrfs specific performance profiler.
>> It calculates certain function duration and account the duration.
>>
>> The result is provided through RO sysfs interface,
>> /sys/fs/btrfs/<FSID>/profiler.
>>
>> The content of that file is genreated when read.
>> Users can have full control on the sample resolution.
>>
>> The example content can be found in the last patch.
>>
>> One example using the interface to profile fsstress can be found here:
>> https://docs.google.com/spreadsheets/d/1BVng8hqyyxFWPQF_1N0cpwiCA6R3SXtDTHmRqo8qyvo/edit?usp=sharing
>>
>> The test script can be found here:
>> https://gist.github.com/adam900710/ca47b9a8d4b8db7168b261b6fba71ff1
>>
>> The interesting result from the graph is:
>> - Concurrency on fs tree is only high for the initial 25 seconds
>>   My initial expectation is, the hotness on fs tree should be more or
>>   less stable. Which looks pretty interesting
>>
>> - Then extent tree get more concurrency after 25 seconds
>>   This again breaks my expectation. As write to extent tree should only
>>   be triggered by delayed ref. So there is something interesting here
>>   too.
>>
>> - Root tree is pretty cold
>>   Since the test is only happening on fs tree, it's expected to be less
>>   racy.
>>   
>> - There is some minor load on other trees.
>>   My guess is, that's from csum tree.
>>
>> Although the patchset is relatively small, there are some design points
>> need extra commends before the patchset get larger and larger.
>>
>> - How should this profiler get enabled?
>>   Should this feature get enabled by mount option or kernel config?
>>   Or just let it run for all kernel build?
>>   Currently the overhead should be pretty small, but the overhead should
>>   be larger and larger with new telemetry.
>>
>> - Design of the interface
>>   Is this a valid usage of sysfs or an abuse?
>>   And if the content can be improved for both human or program?
> 
> Well, most of that is answered by 'figure out how to use tracepoints and
> perf for that'.
> 
> If there were not a whole substystem, actively maintained and
> documented, implementing something like that might help, but that's not
> the case.
> 
> I see what you were able to understand from the results, but it's more
> like a custom analysis tool that should not merged as-is. It brings a
> whole new interface and that's always hard to get right with all the
> mistakes ahead that somebody has probably solved already.
> 
> It would be good to have list of the limitations of perf you see, and we
> can find a solution ourselves or ask elsewhere.

Add linux-perf-users mail list.

I should mention the problem of ftrace (or my perf skill) in cover letter.

The biggest problem is the conflicts between detailed function execution
duration and classification.

For tree lock case, indeed we can use function graph to get execution
duration of btrfs_tree_read_lock() and btrfs_tree_lock().
But that's all. We can't really do much classification.

If just use trace event, with trace event added, then we can't get the
execution duration.

Even we can get both (my poor perf/ftrace skill must be blamed), it may
take some more time to get an easy graph.
We need to classify the result by eb owner, then account the execution
duration compared to the timestamp.

If perf users could provide better solution, I'm pretty willing to learn
some new tricks.
(Considering my poor perf skill is mostly from that LWN article, it
won't be a surprise for some perf black magic beating my hack).


And for the interface, that's why I send the patchset early to get some
feedback.
I'm really a bad interface designer.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-07 14:18   ` Qu Wenruo
@ 2019-03-07 16:12     ` David Sterba
  2019-03-09  6:21       ` Nikolay Borisov
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2019-03-07 16:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, linux-perf-users

On Thu, Mar 07, 2019 at 10:18:49PM +0800, Qu Wenruo wrote:
> > Well, most of that is answered by 'figure out how to use tracepoints and
> > perf for that'.
> > 
> > If there were not a whole substystem, actively maintained and
> > documented, implementing something like that might help, but that's not
> > the case.
> > 
> > I see what you were able to understand from the results, but it's more
> > like a custom analysis tool that should not merged as-is. It brings a
> > whole new interface and that's always hard to get right with all the
> > mistakes ahead that somebody has probably solved already.
> > 
> > It would be good to have list of the limitations of perf you see, and we
> > can find a solution ourselves or ask elsewhere.
> 
> Add linux-perf-users mail list.
> 
> I should mention the problem of ftrace (or my perf skill) in cover letter.
> 
> The biggest problem is the conflicts between detailed function execution
> duration and classification.
> 
> For tree lock case, indeed we can use function graph to get execution
> duration of btrfs_tree_read_lock() and btrfs_tree_lock().
> But that's all. We can't really do much classification.
> 
> If just use trace event, with trace event added, then we can't get the
> execution duration.

I think you can save the start and end times and put the delta to the
tracepoint output.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-07 16:12     ` David Sterba
@ 2019-03-09  6:21       ` Nikolay Borisov
  2019-03-09  6:32         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Borisov @ 2019-03-09  6:21 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs, linux-perf-users



On 7.03.19 г. 18:12 ч., David Sterba wrote:
> On Thu, Mar 07, 2019 at 10:18:49PM +0800, Qu Wenruo wrote:
>>> Well, most of that is answered by 'figure out how to use tracepoints and
>>> perf for that'.
>>>
>>> If there were not a whole substystem, actively maintained and
>>> documented, implementing something like that might help, but that's not
>>> the case.
>>>
>>> I see what you were able to understand from the results, but it's more
>>> like a custom analysis tool that should not merged as-is. It brings a
>>> whole new interface and that's always hard to get right with all the
>>> mistakes ahead that somebody has probably solved already.
>>>
>>> It would be good to have list of the limitations of perf you see, and we
>>> can find a solution ourselves or ask elsewhere.
>>
>> Add linux-perf-users mail list.
>>
>> I should mention the problem of ftrace (or my perf skill) in cover letter.
>>
>> The biggest problem is the conflicts between detailed function execution
>> duration and classification.
>>
>> For tree lock case, indeed we can use function graph to get execution
>> duration of btrfs_tree_read_lock() and btrfs_tree_lock().
>> But that's all. We can't really do much classification.
>>
>> If just use trace event, with trace event added, then we can't get the
>> execution duration.
> 
> I think you can save the start and end times and put the delta to the
> tracepoint output.
> 

Yes, this can be done and in fact JBD2 uses a similar approach. For
example check how journal->j_stats is being used in fs/jbd2/commit.c.
I.e stats are accumulated in a structure which is then printed by some
tracepoints. Concretely, rs_request_delayd calculates the delay between
transaction commit being requested and the transaction entering locked
(committing) state.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-09  6:21       ` Nikolay Borisov
@ 2019-03-09  6:32         ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-03-09  6:32 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, dsterba, linux-btrfs, linux-perf-users



On 2019/3/9 下午2:21, Nikolay Borisov wrote:
> 
> 
> On 7.03.19 г. 18:12 ч., David Sterba wrote:
>> On Thu, Mar 07, 2019 at 10:18:49PM +0800, Qu Wenruo wrote:
>>>> Well, most of that is answered by 'figure out how to use tracepoints and
>>>> perf for that'.
>>>>
>>>> If there were not a whole substystem, actively maintained and
>>>> documented, implementing something like that might help, but that's not
>>>> the case.
>>>>
>>>> I see what you were able to understand from the results, but it's more
>>>> like a custom analysis tool that should not merged as-is. It brings a
>>>> whole new interface and that's always hard to get right with all the
>>>> mistakes ahead that somebody has probably solved already.
>>>>
>>>> It would be good to have list of the limitations of perf you see, and we
>>>> can find a solution ourselves or ask elsewhere.
>>>
>>> Add linux-perf-users mail list.
>>>
>>> I should mention the problem of ftrace (or my perf skill) in cover letter.
>>>
>>> The biggest problem is the conflicts between detailed function execution
>>> duration and classification.
>>>
>>> For tree lock case, indeed we can use function graph to get execution
>>> duration of btrfs_tree_read_lock() and btrfs_tree_lock().
>>> But that's all. We can't really do much classification.
>>>
>>> If just use trace event, with trace event added, then we can't get the
>>> execution duration.
>>
>> I think you can save the start and end times and put the delta to the
>> tracepoint output.
>>
> 
> Yes, this can be done and in fact JBD2 uses a similar approach. For
> example check how journal->j_stats is being used in fs/jbd2/commit.c.
> I.e stats are accumulated in a structure which is then printed by some
> tracepoints.

The idea to use the trace events other than re-inventing a sysfs
interface indeed makes sense.
I'll go traditional trace events way.

Then my question is, there is no way to make the trace events happen at
certain time point. The example in jdb2 can only be triggered by
jdb2_journal_commit_transaction().

IMHO for such accumulated accounting, it would be much more useful to
get such stats at users' will.
So we still need a method to trigger trace event at users' will.

Thanks,
Qu

> Concretely, rs_request_delayd calculates the delay between
> transaction commit being requested and the transaction entering locked
> (committing) state.
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-03-07 14:02 ` [PATCH RFC 0/3] btrfs: Performance profiler support David Sterba
@ 2019-03-10  3:08 ` Anand Jain
  2019-03-10  9:29   ` Nikolay Borisov
  4 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2019-03-10  3:08 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


> This RFC patchset provides a btrfs specific performance profiler.
> It calculates certain function duration and account the duration.

  I agree we need btrfs specific performance measurements and its
  my list too.

  However my idea was to add it as a btrfs-progs subcommand such as

    btrfs inspect perf ...

  And implement by using the systemtap/perf/bpf/dtrace, as these
  can tap the kernel functions from the useland using which we
  can measure the time taken and no kernel changes will be required.
  But yes we need to update the btrfs-progs if we rename the kernel
  function, which I think is ok.

  I was too early trying this with bpf before, probably there are
  more tools now to do that same thing.

Thanks, Anand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  3:08 ` Anand Jain
@ 2019-03-10  9:29   ` Nikolay Borisov
  2019-03-10  9:34     ` Qu Wenruo
  2019-03-11  0:44     ` Anand Jain
  0 siblings, 2 replies; 16+ messages in thread
From: Nikolay Borisov @ 2019-03-10  9:29 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs



On 10.03.19 г. 5:08 ч., Anand Jain wrote:
> 
>  I agree we need btrfs specific performance measurements and its
>  my list too.
> 
>  However my idea was to add it as a btrfs-progs subcommand such as
> 
>    btrfs inspect perf ...
> 
>  And implement by using the systemtap/perf/bpf/dtrace, as these
>  can tap the kernel functions from the useland using which we
>  can measure the time taken and no kernel changes will be required.
>  But yes we need to update the btrfs-progs if we rename the kernel
>  function, which I think is ok.
> 
>  I was too early trying this with bpf before, probably there are
>  more tools now to do that same thing.

This is way too developer oriented to be included in the generic btrfs
tools. Frankly bpf makes sense but only as a separate script being
developed and possibly shared on github or whatnot so that other
interested people can use it. However, integrating with btrfs-progs
definitely seems the wrong thing to do.

On the same note - I'm highly against this patchset landing in the kernel.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  9:29   ` Nikolay Borisov
@ 2019-03-10  9:34     ` Qu Wenruo
  2019-03-10  9:40       ` Nikolay Borisov
  2019-03-11  0:44     ` Anand Jain
  1 sibling, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-03-10  9:34 UTC (permalink / raw)
  To: Nikolay Borisov, Anand Jain, linux-btrfs



On 2019/3/10 下午5:29, Nikolay Borisov wrote:
> 
> 
> On 10.03.19 г. 5:08 ч., Anand Jain wrote:
>>
>>  I agree we need btrfs specific performance measurements and its
>>  my list too.
>>
>>  However my idea was to add it as a btrfs-progs subcommand such as
>>
>>    btrfs inspect perf ...
>>
>>  And implement by using the systemtap/perf/bpf/dtrace, as these
>>  can tap the kernel functions from the useland using which we
>>  can measure the time taken and no kernel changes will be required.
>>  But yes we need to update the btrfs-progs if we rename the kernel
>>  function, which I think is ok.
>>
>>  I was too early trying this with bpf before, probably there are
>>  more tools now to do that same thing.
> 
> This is way too developer oriented to be included in the generic btrfs
> tools. Frankly bpf makes sense but only as a separate script being
> developed and possibly shared on github or whatnot so that other
> interested people can use it. However, integrating with btrfs-progs
> definitely seems the wrong thing to do.
> 
> On the same note - I'm highly against this patchset landing in the kernel.

Are you against the interface part or the btrfs specific perf part?

For the first half, indeed the new sysfs is not good and I'm OK to move
to any more established interface.

For the later half, if bfp/ftrace can do the same work, then I'm fine.
Otherwise the need is still here, both developer or experienced sys
admin will need this feature.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  9:34     ` Qu Wenruo
@ 2019-03-10  9:40       ` Nikolay Borisov
  2019-03-10  9:56         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Borisov @ 2019-03-10  9:40 UTC (permalink / raw)
  To: Qu Wenruo, Anand Jain, linux-btrfs



On 10.03.19 г. 11:34 ч., Qu Wenruo wrote:
> 
> 
> On 2019/3/10 下午5:29, Nikolay Borisov wrote:
>>
>>
>> On 10.03.19 г. 5:08 ч., Anand Jain wrote:
>>>
>>>  I agree we need btrfs specific performance measurements and its
>>>  my list too.
>>>
>>>  However my idea was to add it as a btrfs-progs subcommand such as
>>>
>>>    btrfs inspect perf ...
>>>
>>>  And implement by using the systemtap/perf/bpf/dtrace, as these
>>>  can tap the kernel functions from the useland using which we
>>>  can measure the time taken and no kernel changes will be required.
>>>  But yes we need to update the btrfs-progs if we rename the kernel
>>>  function, which I think is ok.
>>>
>>>  I was too early trying this with bpf before, probably there are
>>>  more tools now to do that same thing.
>>
>> This is way too developer oriented to be included in the generic btrfs
>> tools. Frankly bpf makes sense but only as a separate script being
>> developed and possibly shared on github or whatnot so that other
>> interested people can use it. However, integrating with btrfs-progs
>> definitely seems the wrong thing to do.
>>
>> On the same note - I'm highly against this patchset landing in the kernel.
> 
> Are you against the interface part or the btrfs specific perf part?
> 
> For the first half, indeed the new sysfs is not good and I'm OK to move
> to any more established interface.
> 
> For the later half, if bfp/ftrace can do the same work, then I'm fine.
> Otherwise the need is still here, both developer or experienced sys
> admin will need this feature.

I'm against this functionality being part of btrfs. What would be more
useful is a set of bpf scripts which might very well be developed
alongside the main project. They will be a lot more flexible also they
will add overhead only when you really need them, whereas the current
approach AFAIK (I might be wrong, correct me if so) always accumulates
certain stats irrespective of whether the sys admin needs them or not?
At the very least you will sprinkle code such as :

'if (perf measurement enabled) ..... '

I'd really we didn't add this clutter.

> 
> Thanks,
> Qu
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  9:40       ` Nikolay Borisov
@ 2019-03-10  9:56         ` Qu Wenruo
  2019-03-10 10:00           ` Nikolay Borisov
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-03-10  9:56 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, Anand Jain, linux-btrfs



On 2019/3/10 下午5:40, Nikolay Borisov wrote:
> 
> 
> On 10.03.19 г. 11:34 ч., Qu Wenruo wrote:
>>
>>
>> On 2019/3/10 下午5:29, Nikolay Borisov wrote:
>>>
>>>
>>> On 10.03.19 г. 5:08 ч., Anand Jain wrote:
>>>>
>>>>  I agree we need btrfs specific performance measurements and its
>>>>  my list too.
>>>>
>>>>  However my idea was to add it as a btrfs-progs subcommand such as
>>>>
>>>>    btrfs inspect perf ...
>>>>
>>>>  And implement by using the systemtap/perf/bpf/dtrace, as these
>>>>  can tap the kernel functions from the useland using which we
>>>>  can measure the time taken and no kernel changes will be required.
>>>>  But yes we need to update the btrfs-progs if we rename the kernel
>>>>  function, which I think is ok.
>>>>
>>>>  I was too early trying this with bpf before, probably there are
>>>>  more tools now to do that same thing.
>>>
>>> This is way too developer oriented to be included in the generic btrfs
>>> tools. Frankly bpf makes sense but only as a separate script being
>>> developed and possibly shared on github or whatnot so that other
>>> interested people can use it. However, integrating with btrfs-progs
>>> definitely seems the wrong thing to do.
>>>
>>> On the same note - I'm highly against this patchset landing in the kernel.
>>
>> Are you against the interface part or the btrfs specific perf part?
>>
>> For the first half, indeed the new sysfs is not good and I'm OK to move
>> to any more established interface.
>>
>> For the later half, if bfp/ftrace can do the same work, then I'm fine.
>> Otherwise the need is still here, both developer or experienced sys
>> admin will need this feature.
> 
> I'm against this functionality being part of btrfs. What would be more
> useful is a set of bpf scripts which might very well be developed
> alongside the main project. They will be a lot more flexible also they
> will add overhead only when you really need them, whereas the current
> approach AFAIK (I might be wrong, correct me if so) always accumulates
> certain stats irrespective of whether the sys admin needs them or not?

Since this patchset is mostly an RFC, I skipped a lot things like mount
option to skip all such events.

But the bfp part is really a good direction for me to investigate.
If we can do the same work using more established facility, then I'm
completely fine to discard the patchset.

Thanks,
Qu

> At the very least you will sprinkle code such as :
> 
> 'if (perf measurement enabled) ..... '
> 
> I'd really we didn't add this clutter.
> 
>>
>> Thanks,
>> Qu
>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  9:56         ` Qu Wenruo
@ 2019-03-10 10:00           ` Nikolay Borisov
  0 siblings, 0 replies; 16+ messages in thread
From: Nikolay Borisov @ 2019-03-10 10:00 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Anand Jain, linux-btrfs



On 10.03.19 г. 11:56 ч., Qu Wenruo wrote:
> 
> 
> On 2019/3/10 下午5:40, Nikolay Borisov wrote:
>>
>>
>> On 10.03.19 г. 11:34 ч., Qu Wenruo wrote:
>>>
>>>
>>> On 2019/3/10 下午5:29, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 10.03.19 г. 5:08 ч., Anand Jain wrote:
>>>>>
>>>>>  I agree we need btrfs specific performance measurements and its
>>>>>  my list too.
>>>>>
>>>>>  However my idea was to add it as a btrfs-progs subcommand such as
>>>>>
>>>>>    btrfs inspect perf ...
>>>>>
>>>>>  And implement by using the systemtap/perf/bpf/dtrace, as these
>>>>>  can tap the kernel functions from the useland using which we
>>>>>  can measure the time taken and no kernel changes will be required.
>>>>>  But yes we need to update the btrfs-progs if we rename the kernel
>>>>>  function, which I think is ok.
>>>>>
>>>>>  I was too early trying this with bpf before, probably there are
>>>>>  more tools now to do that same thing.
>>>>
>>>> This is way too developer oriented to be included in the generic btrfs
>>>> tools. Frankly bpf makes sense but only as a separate script being
>>>> developed and possibly shared on github or whatnot so that other
>>>> interested people can use it. However, integrating with btrfs-progs
>>>> definitely seems the wrong thing to do.
>>>>
>>>> On the same note - I'm highly against this patchset landing in the kernel.
>>>
>>> Are you against the interface part or the btrfs specific perf part?
>>>
>>> For the first half, indeed the new sysfs is not good and I'm OK to move
>>> to any more established interface.
>>>
>>> For the later half, if bfp/ftrace can do the same work, then I'm fine.
>>> Otherwise the need is still here, both developer or experienced sys
>>> admin will need this feature.
>>
>> I'm against this functionality being part of btrfs. What would be more
>> useful is a set of bpf scripts which might very well be developed
>> alongside the main project. They will be a lot more flexible also they
>> will add overhead only when you really need them, whereas the current
>> approach AFAIK (I might be wrong, correct me if so) always accumulates
>> certain stats irrespective of whether the sys admin needs them or not?
> 
> Since this patchset is mostly an RFC, I skipped a lot things like mount
> option to skip all such events.
> 
> But the bfp part is really a good direction for me to investigate.
> If we can do the same work using more established facility, then I'm
> completely fine to discard the patchset.

The only thing which migth have to be patched in btrfs is to add the
tree owner being part of struct extent_buffer { since there is no easy
way to query it outside of btrfs. This can be gated behind an ifdef
BTRFS_DEBUG .

There is currently a very similar script in iovisor toolset:
https://github.com/iovisor/bcc/blob/master/tools/funclatency.py

it equates to what function_graph currently does but can easily be
extended to mimic the logic in your patchset. I.e modify a map based on
the owner of an extent buffer.

> 
> Thanks,
> Qu
> 
>> At the very least you will sprinkle code such as :
>>
>> 'if (perf measurement enabled) ..... '
>>
>> I'd really we didn't add this clutter.
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 0/3] btrfs: Performance profiler support
  2019-03-10  9:29   ` Nikolay Borisov
  2019-03-10  9:34     ` Qu Wenruo
@ 2019-03-11  0:44     ` Anand Jain
  1 sibling, 0 replies; 16+ messages in thread
From: Anand Jain @ 2019-03-11  0:44 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs


> Frankly bpf makes sense but only as a separate script being
> developed and possibly shared on github or whatnot so that other
> interested people can use it. However, integrating with btrfs-progs
> definitely seems the wrong thing to do.

  The end users will definitely like it to be part of btrfs-progs.
  Just because one less hassle for them.

  More over as these tools are kernel function names dependent, it
  must be in par with kernel function name changes.

  If the tool is not upto date the trouble is passed on to the end user
  or to an maintainer externally. Its messy.

  These are very important tools which would evolve over time, its
  better to keep them with the core btrfs development.

Thanks, Anand

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-03-11  0:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-06  6:19 [PATCH RFC 0/3] btrfs: Performance profiler support Qu Wenruo
2019-03-06  6:19 ` [PATCH RFC 1/3] btrfs: Introduce performance profiler Qu Wenruo
2019-03-06  6:19 ` [PATCH RFC 2/3] btrfs: locking: Add hooks for btrfs perf Qu Wenruo
2019-03-06  6:19 ` [PATCH RFC 3/3] btrfs: perf: Add RO sysfs interface to collect perf result Qu Wenruo
2019-03-07 14:02 ` [PATCH RFC 0/3] btrfs: Performance profiler support David Sterba
2019-03-07 14:18   ` Qu Wenruo
2019-03-07 16:12     ` David Sterba
2019-03-09  6:21       ` Nikolay Borisov
2019-03-09  6:32         ` Qu Wenruo
2019-03-10  3:08 ` Anand Jain
2019-03-10  9:29   ` Nikolay Borisov
2019-03-10  9:34     ` Qu Wenruo
2019-03-10  9:40       ` Nikolay Borisov
2019-03-10  9:56         ` Qu Wenruo
2019-03-10 10:00           ` Nikolay Borisov
2019-03-11  0:44     ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.