kvm.vger.kernel.org archive mirror
* [RFC PATCH 0/5] Statsfs: a new ram-based file system for Linux kernel statistics
@ 2020-04-27 14:18 Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores Emanuele Giuseppe Esposito
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

The Linux kernel currently has no common mechanism for subsystems to
expose statistics to userspace; each subsystem has to gather and display
its statistics by itself, for example in the form of files in debugfs.
KVM, for instance, has its own code in virt/kvm/kvm_main.c that sets up
debugfs handlers for displaying values and aggregating them from various
subfolders to obtain information about the system state (e.g. displaying
the total number of exits, calculated by summing all exits of all cpus of
all running virtual machines).

Allowing each section of the kernel to do so has two disadvantages. First,
it introduces redundant code. Second, debugfs is not the right place for
statistics anyway (for example, it is affected by lockdown).

In this patch series I introduce statsfs, a synthetic ram-based virtual
filesystem that takes care of gathering and displaying statistics for the
Linux kernel subsystems.

The file system is mounted on /sys/kernel/stats and, with this series, is
already used by kvm. Statsfs was initially introduced by Paolo Bonzini [1].

Statsfs offers a generic and stable API, allowing any kind of
directory/file organization and supporting multiple kinds of aggregation
(not only sum, but also average, max, min and count_zero) and data types
(all unsigned and signed types plus boolean). The implementation, which is
a generalization of KVM’s debugfs statistics code, takes care of gathering
and displaying information at run time; users only need to specify the
values to be included in each source.
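
As an illustration of the aggregation kinds listed above, here is a minimal
userspace sketch for unsigned values only. The names struct agg, agg_init(),
agg_add(), agg_avg() and agg_of() are hypothetical and are not part of the
statsfs API; the sketch only shows which identity values each aggregate
starts from and how one pass over the inputs updates all of them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator, for illustration only (not the statsfs API). */
struct agg {
	uint64_t sum, min, max;
	uint32_t count, count_zero;
};

void agg_init(struct agg *a)
{
	a->sum = 0;
	a->count = 0;
	a->count_zero = 0;
	a->max = 0;		/* identity element for an unsigned max */
	a->min = UINT64_MAX;	/* identity element for an unsigned min */
}

/* Fold one sample into every aggregate at once. */
void agg_add(struct agg *a, uint64_t v)
{
	a->sum += v;
	a->count++;
	a->count_zero += (v == 0);
	if (v > a->max)
		a->max = v;
	if (v < a->min)
		a->min = v;
}

/* Integer average; defined as 0 when no sample was seen. */
uint64_t agg_avg(const struct agg *a)
{
	return a->count ? a->sum / a->count : 0;
}

/* Convenience helper: aggregate an array of n samples. */
struct agg agg_of(const uint64_t *v, size_t n)
{
	struct agg a;
	size_t i;

	agg_init(&a);
	for (i = 0; i < n; i++)
		agg_add(&a, v[i]);
	return a;
}
```

For the samples { 10, 0, 4, 6 } this gives sum 20, average 5, min 0, max 10
and count_zero 1. Signed values would need signed comparisons and different
starting values for min and max, which the sketch leaves out.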

Statsfs would also use a separate mountpoint from debugfs, and would
therefore not suffer from the restricted access imposed by the kernel
lockdown patches. Its main function is to display each statistic as a file
in the folder hierarchy defined through the API. Statsfs files can be read,
and possibly cleared if their file mode allows it.

Statsfs has two main components: the public API defined by
include/linux/statsfs.h, and the virtual file system which should end up in
/sys/kernel/stats.

The API has two main elements: values and sources. Kernel subsystems like
KVM can use the API to create a source, add child sources/values/aggregates
and register it with the root source (which on the virtual fs would be
/sys/kernel/statsfs).

Sources are created via statsfs_source_create(), and each source becomes a
directory in the file system. Sources form a parent-child relationship;
root sources are added to the file system via statsfs_source_register().
Every other source is added to or removed from a parent through the
statsfs_source_add_subordinate() and statsfs_source_remove_subordinate()
APIs. Once a source is created and added to the tree (via add_subordinate),
it will be used to compute aggregate values in the parent source.

Values represent quantities that are gathered by the statsfs user. Examples
of values include the number of vm exits of a given kind, the amount of
memory used by some data structure, the length of the longest hash table
chain, or anything like that. Values are defined with the
statsfs_source_add_values() function. Each value is defined by a struct
statsfs_value; the same statsfs_value can be added to many different
sources. A value can be considered "simple" if it fetches data from a
user-provided location, or "aggregate" if it groups all values in the
subordinate sources that include the same statsfs_value.
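
The distinction between simple and aggregate values can be sketched in
userspace as follows. This is only a model under stated assumptions: struct
value_desc, struct source, struct vcpu_stats, read_simple() and read_sum()
are hypothetical names chosen for the example, the tree is a fixed-size
array instead of a linked list, and no locking is shown. What it illustrates
is that one descriptor array (name plus offset) is shared by several
sources, that a simple read dereferences base + offset, and that an
aggregate walks the subordinates that carry the same descriptor array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: a name and the offset of the field from a base. */
struct value_desc {
	const char *name;
	size_t offset;
};

#define MAX_CHILDREN 8

/* Hypothetical source: one directory in the tree. */
struct source {
	const char *name;
	const struct value_desc *values;	/* shared descriptor array */
	void *base;				/* NULL: aggregate-only source */
	struct source *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Example per-vcpu counters; the field names are made up. */
struct vcpu_stats {
	uint64_t exits;
	uint64_t halt_polls;
};

/* One descriptor array, reused by every vcpu source and by the parent. */
const struct value_desc vcpu_values[] = {
	{ "exits", offsetof(struct vcpu_stats, exits) },
	{ "halt_polls", offsetof(struct vcpu_stats, halt_polls) },
	{ NULL, 0 },
};

/* "Simple" value: fetch the field located at base + offset. */
uint64_t read_simple(const struct source *src, const struct value_desc *val)
{
	return *(const uint64_t *)((const char *)src->base + val->offset);
}

/*
 * "Aggregate" value: sum the field over this source and, recursively, over
 * all subordinates that were given the same descriptor array.
 */
uint64_t read_sum(const struct source *src, const struct value_desc *table,
		  const struct value_desc *val)
{
	uint64_t sum = 0;
	size_t i;

	if (src->base && src->values == table)
		sum += read_simple(src, val);

	for (i = 0; i < src->nr_children; i++)
		sum += read_sum(src->children[i], table, val);

	return sum;
}
```

With two vcpu sources whose exits counters are 3 and 5, both attached to a
parent that has a NULL base, read_sum() on the parent returns 8 for "exits"
while read_simple() on each child returns that child's own counter.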

For more information, please consult the kerneldoc documentation in patch 2
and the sample uses in the kunit tests and in KVM.

This series of patches is based on my previous series "libfs: group and
simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
VM_STAT and VCPU_STAT definitions in a single place". The former simplifies
code duplicated in debugfs and tracefs (on which statsfs is based),
the latter groups all macro definitions for statistics in kvm in a single
common file shared by all architectures.

Patch 1 adds new refcount and kref destructor wrappers that take a
semaphore, as those are used later by statsfs. Patch 2 introduces the
statsfs API, patch 3 provides extensive tests that can also be used as
examples of how to use the API, and patch 4 adds the file system support.
Finally, patch 5 provides a real-life example of statsfs usage in KVM.

[1] https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa366d@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

Emanuele Giuseppe Esposito (5):
  refcount, kref: add dec-and-test wrappers for rw_semaphores
  statsfs API: create, add and remove statsfs sources and values
  kunit: tests for statsfs API
  statsfs fs: virtual fs to show stats to the end-user
  kvm_main: replace debugfs with statsfs

 arch/arm64/kvm/guest.c          |    2 +-
 arch/mips/kvm/mips.c            |    2 +-
 arch/powerpc/kvm/book3s.c       |    6 +-
 arch/powerpc/kvm/booke.c        |    8 +-
 arch/s390/kvm/kvm-s390.c        |   16 +-
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/Makefile           |    2 +-
 arch/x86/kvm/debugfs.c          |   64 --
 arch/x86/kvm/statsfs.c          |   49 ++
 arch/x86/kvm/x86.c              |    6 +-
 fs/Kconfig                      |   13 +
 fs/Makefile                     |    1 +
 fs/statsfs/Makefile             |    6 +
 fs/statsfs/inode.c              |  337 ++++++++++
 fs/statsfs/internal.h           |   35 +
 fs/statsfs/statsfs-tests.c      | 1067 +++++++++++++++++++++++++++++++
 fs/statsfs/statsfs.c            |  780 ++++++++++++++++++++++
 include/linux/kref.h            |   11 +
 include/linux/kvm_host.h        |   39 +-
 include/linux/refcount.h        |    2 +
 include/linux/statsfs.h         |  234 +++++++
 include/uapi/linux/magic.h      |    1 +
 lib/refcount.c                  |   32 +
 tools/lib/api/fs/fs.c           |   21 +
 virt/kvm/arm/arm.c              |    2 +-
 virt/kvm/kvm_main.c             |  314 ++-------
 26 files changed, 2670 insertions(+), 382 deletions(-)
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/statsfs.c
 create mode 100644 fs/statsfs/Makefile
 create mode 100644 fs/statsfs/inode.c
 create mode 100644 fs/statsfs/internal.h
 create mode 100644 fs/statsfs/statsfs-tests.c
 create mode 100644 fs/statsfs/statsfs.c
 create mode 100644 include/linux/statsfs.h

-- 
2.25.2



* [RFC PATCH 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores
  2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
@ 2020-04-27 14:18 ` Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Similar to the existing functions that take a mutex or spinlock if and
only if a reference count is decremented to zero, these new functions
take an rwsem for writing just before the refcount reaches 0 (and
call a user-provided function in the case of kref_put_rwsem).

These will be used for statsfs_source data structures, which are
protected by an rw_semaphore to allow concurrent sysfs reads.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/linux/kref.h     | 11 +++++++++++
 include/linux/refcount.h |  2 ++
 lib/refcount.c           | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/include/linux/kref.h b/include/linux/kref.h
index d32e21a2538c..2dc935445f45 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -79,6 +79,17 @@ static inline int kref_put_mutex(struct kref *kref,
 	return 0;
 }
 
+static inline int kref_put_rwsem(struct kref *kref,
+				 void (*release)(struct kref *kref),
+				 struct rw_semaphore *rwsem)
+{
+	if (refcount_dec_and_down_write(&kref->refcount, rwsem)) {
+		release(kref);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int kref_put_lock(struct kref *kref,
 				void (*release)(struct kref *kref),
 				spinlock_t *lock)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 0e3ee25eb156..a9d5038aec9a 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -99,6 +99,7 @@
 #include <linux/spinlock_types.h>
 
 struct mutex;
+struct rw_semaphore;
 
 /**
  * struct refcount_t - variant of atomic_t specialized for reference counts
@@ -313,6 +314,7 @@ static inline void refcount_dec(refcount_t *r)
 extern __must_check bool refcount_dec_if_one(refcount_t *r);
 extern __must_check bool refcount_dec_not_one(refcount_t *r);
 extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock);
+extern __must_check bool refcount_dec_and_down_write(refcount_t *r, struct rw_semaphore *rwsem);
 extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock);
 extern __must_check bool refcount_dec_and_lock_irqsave(refcount_t *r,
 						       spinlock_t *lock,
diff --git a/lib/refcount.c b/lib/refcount.c
index ebac8b7d15a7..03e113e1b43a 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <linux/refcount.h>
 #include <linux/spinlock.h>
 #include <linux/bug.h>
@@ -94,6 +95,37 @@ bool refcount_dec_not_one(refcount_t *r)
 }
 EXPORT_SYMBOL(refcount_dec_not_one);
 
+/**
+ * refcount_dec_and_down_write - return holding rwsem for writing if able to decrement
+ *                               refcount to 0
+ * @r: the refcount
+ * @lock: the rw_semaphore to be taken for writing
+ *
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at REFCOUNT_SATURATED.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ *
+ * Return: true and hold rwsem for writing if able to decrement refcount to 0, false
+ *         otherwise
+ */
+bool refcount_dec_and_down_write(refcount_t *r, struct rw_semaphore *lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	down_write(lock);
+	if (!refcount_dec_and_test(r)) {
+		up_write(lock);
+		return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(refcount_dec_and_down_write);
+
 /**
  * refcount_dec_and_mutex_lock - return holding mutex if able to decrement
  *                               refcount to 0
-- 
2.25.2



* [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores Emanuele Giuseppe Esposito
@ 2020-04-27 14:18 ` Emanuele Giuseppe Esposito
  2020-04-27 15:47   ` Matthew Wilcox
                     ` (2 more replies)
  2020-04-27 14:18 ` [RFC PATCH 3/5] kunit: tests for statsfs API Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Introduce the statsfs API, which makes it easy to create, add
and remove statsfs sources and values. The API is used to build
the statistics directory tree and to gather the statistics automatically
for the Linux kernel. The main functionalities are: create a source, add
child sources/values/aggregates, register it to the root source (that on
the virtual fs would be /sys/kernel/statsfs), and perform a search for
a value/aggregate.

This allows creating any kind of source tree, which also makes it easier
to readjust in the future.

The API representation is only logical and will be backed
by a virtual file system in patch 4.
Its usage will be shared between the statsfs file system
and the end-users like kvm, the former calling it when it needs to
display and clear statistics, the latter to add values and sources.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 fs/Kconfig              |   7 +
 fs/Makefile             |   1 +
 fs/statsfs/Makefile     |   4 +
 fs/statsfs/internal.h   |  20 ++
 fs/statsfs/statsfs.c    | 618 ++++++++++++++++++++++++++++++++++++++++
 include/linux/statsfs.h | 222 +++++++++++++++
 6 files changed, 872 insertions(+)
 create mode 100644 fs/statsfs/Makefile
 create mode 100644 fs/statsfs/internal.h
 create mode 100644 fs/statsfs/statsfs.c
 create mode 100644 include/linux/statsfs.h

diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..824fcf86d12b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -328,4 +328,11 @@ source "fs/unicode/Kconfig"
 config IO_WQ
 	bool
 
+config STATS_FS
+	bool "Statistics Filesystem"
+	default y
+	help
+	  statsfs is a virtual file system that provides counters and other
+	  statistics about the running kernel.
+
 endmenu
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..6942070f54b2 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)		+= befs/
 obj-$(CONFIG_HOSTFS)		+= hostfs/
 obj-$(CONFIG_CACHEFILES)	+= cachefiles/
 obj-$(CONFIG_DEBUG_FS)		+= debugfs/
+obj-$(CONFIG_STATS_FS)		+= statsfs/
 obj-$(CONFIG_TRACING)		+= tracefs/
 obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
 obj-$(CONFIG_BTRFS_FS)		+= btrfs/
diff --git a/fs/statsfs/Makefile b/fs/statsfs/Makefile
new file mode 100644
index 000000000000..d494a3f30ba5
--- /dev/null
+++ b/fs/statsfs/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+statsfs-objs	:= statsfs.o
+
+obj-$(CONFIG_STATS_FS)	+= statsfs.o
diff --git a/fs/statsfs/internal.h b/fs/statsfs/internal.h
new file mode 100644
index 000000000000..f124683a2ded
--- /dev/null
+++ b/fs/statsfs/internal.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _STATSFS_INTERNAL_H_
+#define _STATSFS_INTERNAL_H_
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/rwsem.h>
+#include <linux/statsfs.h>
+
+/* values, grouped by base */
+struct statsfs_value_source {
+	void *base_addr;
+	bool files_created;
+	struct statsfs_value *values;
+	struct list_head list_element;
+};
+
+int statsfs_val_get_mode(struct statsfs_value *val);
+
+#endif /* _STATSFS_INTERNAL_H_ */
diff --git a/fs/statsfs/statsfs.c b/fs/statsfs/statsfs.c
new file mode 100644
index 000000000000..0ad1d985be46
--- /dev/null
+++ b/fs/statsfs/statsfs.c
@@ -0,0 +1,618 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/limits.h>
+#include <linux/statsfs.h>
+
+#include "internal.h"
+
+struct statsfs_aggregate_value {
+	uint64_t sum, min, max;
+	uint32_t count, count_zero;
+};
+
+static int is_val_signed(struct statsfs_value *val)
+{
+	return val->type & STATSFS_SIGN;
+}
+
+int statsfs_val_get_mode(struct statsfs_value *val)
+{
+	return val->mode ? val->mode : 0644;
+}
+
+static struct statsfs_value *find_value(struct statsfs_value_source *src,
+					struct statsfs_value *val)
+{
+	struct statsfs_value *entry;
+
+	for (entry = src->values; entry->name; entry++) {
+		if (entry == val) {
+			WARN_ON(strcmp(entry->name, val->name) != 0);
+			return entry;
+		}
+	}
+	return NULL;
+}
+
+static struct statsfs_value *
+search_value_in_source(struct statsfs_source *src, struct statsfs_value *arg,
+		       struct statsfs_value_source **val_src)
+{
+	struct statsfs_value *entry;
+	struct statsfs_value_source *src_entry;
+
+	list_for_each_entry(src_entry, &src->values_head, list_element) {
+		entry = find_value(src_entry, arg);
+		if (entry) {
+			*val_src = src_entry;
+			return entry;
+		}
+	}
+
+	return NULL;
+}
+
+/* Called with rwsem held for writing */
+static struct statsfs_value_source *create_value_source(void *base)
+{
+	struct statsfs_value_source *val_src;
+
+	val_src = kzalloc(sizeof(struct statsfs_value_source), GFP_KERNEL);
+	if (!val_src)
+		return ERR_PTR(-ENOMEM);
+
+	val_src->base_addr = base;
+	val_src->list_element =
+		(struct list_head)LIST_HEAD_INIT(val_src->list_element);
+
+	return val_src;
+}
+
+int statsfs_source_add_values(struct statsfs_source *source,
+			      struct statsfs_value *stat, void *ptr)
+{
+	struct statsfs_value_source *val_src;
+	struct statsfs_value_source *entry;
+
+	down_write(&source->rwsem);
+
+	list_for_each_entry(entry, &source->values_head, list_element) {
+		if (entry->base_addr == ptr && entry->values == stat) {
+			up_write(&source->rwsem);
+			return -EEXIST;
+		}
+	}
+
+	val_src = create_value_source(ptr);
+	val_src->values = (struct statsfs_value *)stat;
+
+	/* add the val_src to the source list */
+	list_add(&val_src->list_element, &source->values_head);
+
+	up_write(&source->rwsem);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(statsfs_source_add_values);
+
+void statsfs_source_add_subordinate(struct statsfs_source *source,
+				    struct statsfs_source *sub)
+{
+	down_write(&source->rwsem);
+
+	statsfs_source_get(sub);
+	list_add(&sub->list_element, &source->subordinates_head);
+
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_add_subordinate);
+
+/* Called with rwsem held for writing */
+static void
+statsfs_source_remove_subordinate_locked(struct statsfs_source *source,
+					 struct statsfs_source *sub)
+{
+	struct list_head *it, *safe;
+	struct statsfs_source *src_entry;
+
+	list_for_each_safe(it, safe, &source->subordinates_head) {
+		src_entry = list_entry(it, struct statsfs_source, list_element);
+		if (src_entry == sub) {
+			WARN_ON(strcmp(src_entry->name, sub->name) != 0);
+			list_del_init(&src_entry->list_element);
+			statsfs_source_put(src_entry);
+			return;
+		}
+	}
+}
+
+void statsfs_source_remove_subordinate(struct statsfs_source *source,
+				       struct statsfs_source *sub)
+{
+	down_write(&source->rwsem);
+	statsfs_source_remove_subordinate_locked(source, sub);
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_remove_subordinate);
+
+/* Called with rwsem held for reading */
+static uint64_t get_simple_value(struct statsfs_value_source *src,
+				 struct statsfs_value *val)
+{
+	uint64_t value_found;
+	void *address;
+
+	address = src->base_addr + val->offset;
+
+	switch (val->type) {
+	case STATSFS_U8:
+		value_found = *((uint8_t *)address);
+		break;
+	case STATSFS_U8 | STATSFS_SIGN:
+		value_found = *((int8_t *)address);
+		break;
+	case STATSFS_U16:
+		value_found = *((uint16_t *)address);
+		break;
+	case STATSFS_U16 | STATSFS_SIGN:
+		value_found = *((int16_t *)address);
+		break;
+	case STATSFS_U32:
+		value_found = *((uint32_t *)address);
+		break;
+	case STATSFS_U32 | STATSFS_SIGN:
+		value_found = *((int32_t *)address);
+		break;
+	case STATSFS_U64:
+		value_found = *((uint64_t *)address);
+		break;
+	case STATSFS_U64 | STATSFS_SIGN:
+		value_found = *((int64_t *)address);
+		break;
+	case STATSFS_BOOL:
+		value_found = *((uint8_t *)address);
+		break;
+	default:
+		value_found = 0;
+		break;
+	}
+
+	return value_found;
+}
+
+/* Called with rwsem held for reading */
+static void clear_simple_value(struct statsfs_value_source *src,
+			       struct statsfs_value *val)
+{
+	void *address;
+
+	address = src->base_addr + val->offset;
+
+	switch (val->type) {
+	case STATSFS_U8:
+		*((uint8_t *)address) = 0;
+		break;
+	case STATSFS_U8 | STATSFS_SIGN:
+		*((int8_t *)address) = 0;
+		break;
+	case STATSFS_U16:
+		*((uint16_t *)address) = 0;
+		break;
+	case STATSFS_U16 | STATSFS_SIGN:
+		*((int16_t *)address) = 0;
+		break;
+	case STATSFS_U32:
+		*((uint32_t *)address) = 0;
+		break;
+	case STATSFS_U32 | STATSFS_SIGN:
+		*((int32_t *)address) = 0;
+		break;
+	case STATSFS_U64:
+		*((uint64_t *)address) = 0;
+		break;
+	case STATSFS_U64 | STATSFS_SIGN:
+		*((int64_t *)address) = 0;
+		break;
+	case STATSFS_BOOL:
+		*((uint8_t *)address) = 0;
+		break;
+	default:
+		break;
+	}
+}
+
+/* Called with rwsem held for reading */
+static void search_all_simple_values(struct statsfs_source *src,
+				     struct statsfs_value_source *ref_src_entry,
+				     struct statsfs_value *val,
+				     struct statsfs_aggregate_value *agg)
+{
+	struct statsfs_value_source *src_entry;
+	uint64_t value_found;
+
+	list_for_each_entry(src_entry, &src->values_head, list_element) {
+		/* skip aggregates */
+		if (src_entry->base_addr == NULL)
+			continue;
+
+		/* useless to search here */
+		if (src_entry->values != ref_src_entry->values)
+			continue;
+
+		/* must be here */
+		value_found = get_simple_value(src_entry, val);
+		agg->sum += value_found;
+		agg->count++;
+		agg->count_zero += (value_found == 0);
+
+		if (is_val_signed(val)) {
+			agg->max = (((int64_t)value_found) >=
+				    ((int64_t)agg->max)) ?
+					   value_found :
+					   agg->max;
+			agg->min = (((int64_t)value_found) <=
+				    ((int64_t)agg->min)) ?
+					   value_found :
+					   agg->min;
+		} else {
+			agg->max = (value_found >= agg->max) ? value_found :
+							       agg->max;
+			agg->min = (value_found <= agg->min) ? value_found :
+							       agg->min;
+		}
+	}
+}
+
+/* Called with rwsem held for reading */
+static void do_recursive_aggregation(struct statsfs_source *root,
+				     struct statsfs_value_source *ref_src_entry,
+				     struct statsfs_value *val,
+				     struct statsfs_aggregate_value *agg)
+{
+	struct statsfs_source *subordinate;
+
+	/* search all simple values in this folder */
+	search_all_simple_values(root, ref_src_entry, val, agg);
+
+	/* recursively search in all subfolders */
+	list_for_each_entry(subordinate, &root->subordinates_head,
+			     list_element) {
+		down_read(&subordinate->rwsem);
+		do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
+		up_read(&subordinate->rwsem);
+	}
+}
+
+/* Called with rwsem held for reading */
+static void init_aggregate_value(struct statsfs_aggregate_value *agg,
+				 struct statsfs_value *val)
+{
+	agg->count = agg->count_zero = agg->sum = 0;
+	if (is_val_signed(val)) {
+		agg->max = S64_MIN;
+		agg->min = S64_MAX;
+	} else {
+		agg->max = 0;
+		agg->min = U64_MAX;
+	}
+}
+
+/* Called with rwsem held for reading */
+static void store_final_value(struct statsfs_aggregate_value *agg,
+			    struct statsfs_value *val, uint64_t *ret)
+{
+	int operation;
+
+	operation = val->aggr_kind | is_val_signed(val);
+
+	switch (operation) {
+	case STATSFS_AVG:
+		*ret = agg->count ? agg->sum / agg->count : 0;
+		break;
+	case STATSFS_AVG | STATSFS_SIGN:
+		*ret = agg->count ? ((int64_t)agg->sum) / agg->count : 0;
+		break;
+	case STATSFS_SUM:
+	case STATSFS_SUM | STATSFS_SIGN:
+		*ret = agg->sum;
+		break;
+	case STATSFS_MIN:
+	case STATSFS_MIN | STATSFS_SIGN:
+		*ret = agg->min;
+		break;
+	case STATSFS_MAX:
+	case STATSFS_MAX | STATSFS_SIGN:
+		*ret = agg->max;
+		break;
+	case STATSFS_COUNT_ZERO:
+	case STATSFS_COUNT_ZERO | STATSFS_SIGN:
+		*ret = agg->count_zero;
+		break;
+	default:
+		break;
+	}
+}
+
+/* Called with rwsem held for reading */
+static int statsfs_source_get_value_locked(struct statsfs_source *source,
+					   struct statsfs_value *arg,
+					   uint64_t *ret)
+{
+	struct statsfs_value_source *src_entry;
+	struct statsfs_value *found;
+	struct statsfs_aggregate_value aggr;
+
+	*ret = 0;
+
+	if (!arg)
+		return -ENOENT;
+
+	/* look in simple values */
+	found = search_value_in_source(source, arg, &src_entry);
+
+	if (!found) {
+		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
+		       source->name);
+		return -ENOENT;
+	}
+
+	if (src_entry->base_addr != NULL) {
+		*ret = get_simple_value(src_entry, found);
+		return 0;
+	}
+
+	/* look in aggregates */
+	init_aggregate_value(&aggr, found);
+	do_recursive_aggregation(source, src_entry, found, &aggr);
+	store_final_value(&aggr, found, ret);
+
+	return 0;
+}
+
+int statsfs_source_get_value(struct statsfs_source *source,
+			     struct statsfs_value *arg, uint64_t *ret)
+{
+	int retval;
+
+	down_read(&source->rwsem);
+	retval = statsfs_source_get_value_locked(source, arg, ret);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(statsfs_source_get_value);
+
+/* Called with rwsem held for reading */
+static void set_all_simple_values(struct statsfs_source *src,
+				  struct statsfs_value_source *ref_src_entry,
+				  struct statsfs_value *val)
+{
+	struct statsfs_value_source *src_entry;
+
+	list_for_each_entry(src_entry, &src->values_head, list_element) {
+		/* skip aggregates */
+		if (src_entry->base_addr == NULL)
+			continue;
+
+		/* wrong to search here */
+		if (src_entry->values != ref_src_entry->values)
+			continue;
+
+		if (src_entry->base_addr &&
+			src_entry->values == ref_src_entry->values)
+			clear_simple_value(src_entry, val);
+	}
+}
+
+/* Called with rwsem held for reading */
+static void do_recursive_clean(struct statsfs_source *root,
+			       struct statsfs_value_source *ref_src_entry,
+			       struct statsfs_value *val)
+{
+	struct statsfs_source *subordinate;
+
+	/* search all simple values in this folder */
+	set_all_simple_values(root, ref_src_entry, val);
+
+	/* recursively search in all subfolders */
+	list_for_each_entry(subordinate, &root->subordinates_head,
+			     list_element) {
+		down_read(&subordinate->rwsem);
+		do_recursive_clean(subordinate, ref_src_entry, val);
+		up_read(&subordinate->rwsem);
+	}
+}
+
+/* Called with rwsem held for reading */
+static int statsfs_source_clear_locked(struct statsfs_source *source,
+				       struct statsfs_value *val)
+{
+	struct statsfs_value_source *src_entry;
+	struct statsfs_value *found;
+
+	if (!val)
+		return -ENOENT;
+
+	/* look in simple values */
+	found = search_value_in_source(source, val, &src_entry);
+
+	if (!found) {
+		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
+		       source->name);
+		return -ENOENT;
+	}
+
+	if (src_entry->base_addr != NULL) {
+		clear_simple_value(src_entry, found);
+		return 0;
+	}
+
+	/* look in aggregates */
+	do_recursive_clean(source, src_entry, found);
+
+	return 0;
+}
+
+int statsfs_source_clear(struct statsfs_source *source,
+			 struct statsfs_value *val)
+{
+	int retval;
+
+	down_read(&source->rwsem);
+	retval = statsfs_source_clear_locked(source, val);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+
+/* Called with rwsem held for reading */
+static struct statsfs_value *
+find_value_by_name(struct statsfs_value_source *src, char *val)
+{
+	struct statsfs_value *entry;
+
+	for (entry = src->values; entry->name; entry++)
+		if (!strcmp(entry->name, val))
+			return entry;
+
+	return NULL;
+}
+
+/* Called with rwsem held for reading */
+static struct statsfs_value *
+search_in_source_by_name(struct statsfs_source *src, char *name)
+{
+	struct statsfs_value *entry;
+	struct statsfs_value_source *src_entry;
+
+	list_for_each_entry(src_entry, &src->values_head, list_element) {
+		entry = find_value_by_name(src_entry, name);
+		if (entry)
+			return entry;
+	}
+
+	return NULL;
+}
+
+int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
+				     uint64_t *ret)
+{
+	struct statsfs_value *val;
+	int retval;
+
+	down_read(&source->rwsem);
+	val = search_in_source_by_name(source, name);
+
+	if (!val) {
+		*ret = 0;
+		up_read(&source->rwsem);
+		return -ENOENT;
+	}
+
+	retval = statsfs_source_get_value_locked(source, val, ret);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(statsfs_source_get_value_by_name);
+
+void statsfs_source_get(struct statsfs_source *source)
+{
+	kref_get(&source->refcount);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_get);
+
+void statsfs_source_revoke(struct statsfs_source *source)
+{
+	struct list_head *it, *safe;
+	struct statsfs_value_source *val_src_entry;
+
+	down_write(&source->rwsem);
+
+	list_for_each_safe(it, safe, &source->values_head) {
+		val_src_entry = list_entry(it, struct statsfs_value_source,
+					   list_element);
+		val_src_entry->base_addr = NULL;
+	}
+
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_revoke);
+
+/* Called with rwsem held for writing
+ *
+ * The refcount is 0 and the lock was taken before refcount
+ * went from 1 to 0
+ */
+static void statsfs_source_destroy(struct kref *kref_source)
+{
+	struct statsfs_value_source *val_src_entry;
+	struct list_head *it, *safe;
+	struct statsfs_source *child, *source;
+
+	source = container_of(kref_source, struct statsfs_source, refcount);
+
+	/* iterate through the values and delete them */
+	list_for_each_safe(it, safe, &source->values_head) {
+		val_src_entry = list_entry(it, struct statsfs_value_source,
+					   list_element);
+		kfree(val_src_entry);
+	}
+
+	/* iterate through the subordinates and delete them */
+	list_for_each_safe(it, safe, &source->subordinates_head) {
+		child = list_entry(it, struct statsfs_source, list_element);
+		statsfs_source_remove_subordinate_locked(source, child);
+	}
+
+
+	up_write(&source->rwsem);
+	kfree(source->name);
+	kfree(source);
+}
+
+void statsfs_source_put(struct statsfs_source *source)
+{
+	kref_put_rwsem(&source->refcount, statsfs_source_destroy,
+		       &source->rwsem);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_put);
+
+struct statsfs_source *statsfs_source_create(const char *fmt, ...)
+{
+	va_list ap;
+	char buf[100];
+	struct statsfs_source *ret;
+	int char_needed;
+
+	va_start(ap, fmt);
+	char_needed = vsnprintf(buf, 100, fmt, ap);
+	va_end(ap);
+
+	ret = kzalloc(sizeof(struct statsfs_source), GFP_KERNEL);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+
+	ret->name = kstrdup(buf, GFP_KERNEL);
+	if (!ret->name) {
+		kfree(ret);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	kref_init(&ret->refcount);
+	init_rwsem(&ret->rwsem);
+
+	INIT_LIST_HEAD(&ret->values_head);
+	INIT_LIST_HEAD(&ret->subordinates_head);
+	INIT_LIST_HEAD(&ret->list_element);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(statsfs_source_create);
diff --git a/include/linux/statsfs.h b/include/linux/statsfs.h
new file mode 100644
index 000000000000..3f01f094946d
--- /dev/null
+++ b/include/linux/statsfs.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ *  statsfs.h - a tiny little statistics file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito
+ *  Copyright (C) 2020 Redhat.
+ *
+ */
+
+#ifndef _STATSFS_H_
+#define _STATSFS_H_
+
+#include <linux/list.h>
+
+/* Used to distinguish signed types */
+#define STATSFS_SIGN 0x8000
+
+struct statsfs_source;
+
+enum stat_type {
+	STATSFS_U8 = 0,
+	STATSFS_U16 = 1,
+	STATSFS_U32 = 2,
+	STATSFS_U64 = 3,
+	STATSFS_BOOL = 4,
+	STATSFS_S8 = STATSFS_U8 | STATSFS_SIGN,
+	STATSFS_S16 = STATSFS_U16 | STATSFS_SIGN,
+	STATSFS_S32 = STATSFS_U32 | STATSFS_SIGN,
+	STATSFS_S64 = STATSFS_U64 | STATSFS_SIGN,
+};
+
+enum stat_aggr {
+	STATSFS_NONE = 0,
+	STATSFS_SUM,
+	STATSFS_MIN,
+	STATSFS_MAX,
+	STATSFS_COUNT_ZERO,
+	STATSFS_AVG,
+};
+
+struct statsfs_value {
+	/* Name of the stat */
+	char *name;
+
+	/* Offset from base address to field containing the value */
+	int offset;
+
+	/* Type of the stat BOOL,U64,... */
+	enum stat_type type;
+
+	/* Aggregate type: MIN, MAX, SUM,... */
+	enum stat_aggr aggr_kind;
+
+	/* File mode */
+	uint16_t mode;
+};
+
+struct statsfs_source {
+	struct kref refcount;
+
+	char *name;
+
+	/* list of struct statsfs_value_source */
+	struct list_head values_head;
+
+	/* list of struct statsfs_source for subordinate sources */
+	struct list_head subordinates_head;
+
+	struct list_head list_element;
+
+	struct rw_semaphore rwsem;
+
+	struct dentry *source_dentry;
+};
+
+/**
+ * statsfs_source_create - create a statsfs_source
+ * Creates a statsfs_source with the given name. This
+ * does not mean it will be backed by the filesystem yet; it will only
+ * be visible to the user once it or one of its parents is
+ * registered in statsfs.
+ *
+ * Returns a pointer to a statsfs_source if it succeeds.
+ * This pointer, or that of one of its parents, must be passed to
+ * statsfs_source_put() when the file is to be removed.  If an error occurs,
+ * ERR_PTR(-ERROR) will be returned.
+ */
+struct statsfs_source *statsfs_source_create(const char *fmt, ...);
+
+/**
+ * statsfs_source_add_values - adds values to the given source
+ * @source: a pointer to the source that will receive the values
+ * @val: a pointer to the NULL terminated statsfs_value array to add
+ * @base_ptr: a pointer to the base pointer used by these values
+ *
+ * In addition to adding values to the source, also create the
+ * files in the filesystem if the source is already backed by a directory.
+ *
+ * Returns 0 if it succeeds. If the values are already in the
+ * source and have the same base_ptr, -EEXIST is returned.
+ */
+int statsfs_source_add_values(struct statsfs_source *source,
+			      struct statsfs_value *val, void *base_ptr);
+
+/**
+ * statsfs_source_add_subordinate - adds a child to the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to add
+ *
+ * Recursively create all files in the statsfs filesystem,
+ * but only if the parent already has a dentry (created with
+ * statsfs_source_register).
+ * If the parent has not been registered yet, no files are created here.
+ */
+void statsfs_source_add_subordinate(struct statsfs_source *parent,
+				    struct statsfs_source *child);
+
+/**
+ * statsfs_source_remove_subordinate - removes a child from the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to remove
+ *
+ * Check whether the parent has such a child. If so, remove all of the
+ * child's files and call statsfs_source_put() on the child.
+ */
+void statsfs_source_remove_subordinate(struct statsfs_source *parent,
+				       struct statsfs_source *child);
+
+/**
+ * statsfs_source_get_value - search for a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the statsfs_value to search for
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Check whether a value with the same statsfs_value pointer exists
+ * in the source.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be returned.
+ * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
+ * subordinates will be recursively searched and every simple value match
+ * will be used to aggregate the final result. For example, if it's a sum,
+ * the matching values of all subordinates will be summed together.
+ *
+ * This function will return 0 if it succeeds.
+ */
+int statsfs_source_get_value(struct statsfs_source *source,
+			     struct statsfs_value *val, uint64_t *ret);
+
+/**
+ * statsfs_source_get_value_by_name - search a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @name: a pointer to the string representing the value to search
+ *        (for example "exits")
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Same as statsfs_source_get_value, but initially the name is used
+ * to search in the given source if there is a value with a matching
+ * name. If so, statsfs_source_get_value will be called with the found
+ * value, otherwise -ENOENT will be returned.
+ */
+int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
+				     uint64_t *ret);
+
+/**
+ * statsfs_source_clear - search for and clear a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the statsfs_value to search for
+ *
+ * Check whether a value with the same statsfs_value pointer exists
+ * in the source.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be set to 0.
+ * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
+ * subordinates will be recursively searched and every simple value match
+ * will be set to 0.
+ *
+ * This function will return 0 if it succeeds.
+ */
+int statsfs_source_clear(struct statsfs_source *source,
+			 struct statsfs_value *val);
+
+/**
+ * statsfs_source_revoke - disconnect the source from its backing data
+ * @source: a pointer to the source that will be revoked
+ *
+ * Ensure that statsfs will not access the data that were passed to
+ * statsfs_source_add_values for this source.
+ *
+ * Because open files increase the reference count for a statsfs_source,
+ * the source can end up living longer than the data that provides the
+ * values for the source.  Calling statsfs_source_revoke just before the
+ * backing data is freed avoids accesses to freed data structures.  A
+ * revoked source will return 0 for its values.
+ */
+void statsfs_source_revoke(struct statsfs_source *source);
+
+/**
+ * statsfs_source_get - increases refcount of source
+ * @source: a pointer to the source whose refcount will be increased
+ */
+void statsfs_source_get(struct statsfs_source *source);
+
+/**
+ * statsfs_source_put - decreases refcount of source and deletes if needed
+ * @source: a pointer to the source whose refcount will be decreased
+ *
+ * If the refcount reaches zero, delete and free the source's
+ * resources and files, by first recursively calling
+ * statsfs_source_remove_subordinate on each child and then deleting
+ * the source's own files and allocations.
+ */
+void statsfs_source_put(struct statsfs_source *source);
+
+/**
+ * statsfs_initialized - returns true if statsfs fs has been registered
+ */
+bool statsfs_initialized(void);
+
+#endif
-- 
2.25.2
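
The type encoding in enum stat_type above ORs STATSFS_SIGN into the
matching unsigned width, and statsfs_source_get_value() hands every
result back through a uint64_t. The following is a minimal userspace
sketch of how a reader could honour that encoding, assuming signed
fields are sign-extended before being widened. It is not the code in
fs/statsfs/statsfs.c; every sk_* name is hypothetical and only mirrors
the definitions in the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors STATSFS_SIGN and enum stat_type from the header above. */
#define SK_SIGN 0x8000

enum sk_type {
	SK_U8 = 0,
	SK_U16 = 1,
	SK_U32 = 2,
	SK_U64 = 3,
	SK_BOOL = 4,
	SK_S8 = SK_U8 | SK_SIGN,
	SK_S16 = SK_U16 | SK_SIGN,
	SK_S32 = SK_U32 | SK_SIGN,
	SK_S64 = SK_U64 | SK_SIGN,
};

/* Same field layout as test_values_struct in the kunit patch. */
struct sk_sample {
	uint64_t u64;
	int32_t s32;
	bool bo;
	uint8_t u8;
	int16_t s16;
};

/*
 * Hypothetical reader: load the field at base + offset and widen it to
 * uint64_t. Converting a negative signed value to uint64_t keeps the
 * sign-extended bit pattern, so the caller can cast the result back to
 * the narrow signed type.
 */
static uint64_t sk_read(const void *base, size_t offset, enum sk_type type)
{
	const unsigned char *addr = (const unsigned char *)base + offset;

	switch (type) {
	case SK_U8: { uint8_t v; memcpy(&v, addr, sizeof(v)); return v; }
	case SK_U16: { uint16_t v; memcpy(&v, addr, sizeof(v)); return v; }
	case SK_U32: { uint32_t v; memcpy(&v, addr, sizeof(v)); return v; }
	case SK_U64: { uint64_t v; memcpy(&v, addr, sizeof(v)); return v; }
	case SK_BOOL: { bool v; memcpy(&v, addr, sizeof(v)); return v; }
	case SK_S8: { int8_t v; memcpy(&v, addr, sizeof(v)); return (uint64_t)v; }
	case SK_S16: { int16_t v; memcpy(&v, addr, sizeof(v)); return (uint64_t)v; }
	case SK_S32: { int32_t v; memcpy(&v, addr, sizeof(v)); return (uint64_t)v; }
	case SK_S64: { int64_t v; memcpy(&v, addr, sizeof(v)); return (uint64_t)v; }
	}
	return 0;
}

/* Read a few fields back and check that the signed ones survive. */
void sk_demo(void)
{
	struct sk_sample s = {
		.u64 = 64, .s32 = INT32_MIN, .bo = true, .u8 = 127, .s16 = -20000,
	};
	uint64_t ret;

	ret = sk_read(&s, offsetof(struct sk_sample, s32), SK_S32);
	assert((int32_t)ret == INT32_MIN);
	ret = sk_read(&s, offsetof(struct sk_sample, s16), SK_S16);
	assert((int16_t)ret == -20000);
	ret = sk_read(&s, offsetof(struct sk_sample, u8), SK_U8);
	assert(ret == 127);
}
```

This is the same pattern the kunit tests rely on when they cast the
returned uint64_t, for example with (int32_t)ret, before comparing it
against a signed default.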


* [RFC PATCH 3/5] kunit: tests for statsfs API
  2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
@ 2020-04-27 14:18 ` Emanuele Giuseppe Esposito
  2020-04-28 17:50   ` Randy Dunlap
  2020-04-27 14:18 ` [RFC PATCH 4/5] statsfs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs Emanuele Giuseppe Esposito
  4 siblings, 1 reply; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Add kunit tests to extensively test the statsfs API functionality.

In order to run them, the kernel .config must set CONFIG_KUNIT=y,
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y.

The tests can then be started by running the following command from the
root directory of the Linux kernel source tree:
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`
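
For orientation, the aggregate results that the tests in this patch
expect for the two sample containers (cont and cont2) can be reproduced
in plain userspace C. This is only a sketch of the arithmetic, not the
statsfs implementation: ag_fold() and the other ag_* names are
hypothetical, and the average is assumed to use integer division, which
the test data cannot distinguish from rounding because (127 + 255) / 2
is exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors enum stat_aggr from include/linux/statsfs.h in patch 2/5. */
enum ag_kind { AG_NONE = 0, AG_SUM, AG_MIN, AG_MAX, AG_COUNT_ZERO, AG_AVG };

/*
 * Hypothetical helper: fold n simple values, already widened to int64_t,
 * into one aggregate. Callers must pass n > 0; what statsfs reports for
 * an aggregate with no values underneath is not modelled here.
 */
static int64_t ag_fold(enum ag_kind kind, const int64_t *vals, size_t n)
{
	int64_t sum = 0, min = vals[0], max = vals[0], zeros = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += vals[i];
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
		if (vals[i] == 0)
			zeros++;
	}

	switch (kind) {
	case AG_SUM:
		return sum;
	case AG_MIN:
		return min;
	case AG_MAX:
		return max;
	case AG_COUNT_ZERO:
		return zeros;
	case AG_AVG:
		return sum / (int64_t)n;	/* integer division */
	default:
		return 0;	/* AG_NONE: nothing to aggregate */
	}
}

/* Field values of cont and cont2 from the test file, one array per field. */
void ag_demo(void)
{
	const int64_t u64[] = { 64, 64 };
	const int64_t s32[] = { INT32_MIN, INT16_MAX };
	const int64_t bo[] = { 1, 0 };
	const int64_t u8[] = { 127, 255 };
	const int64_t s16[] = { 10000, -20000 };

	assert(ag_fold(AG_SUM, u64, 2) == 128);		/* def_u64 * 2 */
	assert(ag_fold(AG_MIN, s32, 2) == INT32_MIN);	/* def_val_s32 */
	assert(ag_fold(AG_COUNT_ZERO, bo, 2) == 1);	/* one false */
	assert(ag_fold(AG_AVG, u8, 2) == 191);		/* (127 + 255) / 2 */
	assert(ag_fold(AG_MAX, s16, 2) == 10000);	/* def_val_s16 */
}
```

These are the values checked by test_all_aggregations_agg_val_val, where
the parent holds the aggregates and each child holds one container.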

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 fs/Kconfig                 |    6 +
 fs/statsfs/Makefile        |    2 +
 fs/statsfs/statsfs-tests.c | 1067 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1075 insertions(+)
 create mode 100644 fs/statsfs/statsfs-tests.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 824fcf86d12b..6145b607e0bc 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -335,4 +335,10 @@ config STATS_FS
 	  statsfs is a virtual file system that provides counters and other
 	  statistics about the running kernel.
 
+config STATS_FS_TEST
+	bool "Tests for statsfs"
+	depends on STATS_FS && KUNIT
+	help
+	  statsfs tests for the statsfs API.
+
 endmenu
diff --git a/fs/statsfs/Makefile b/fs/statsfs/Makefile
index d494a3f30ba5..f546e3f03a12 100644
--- a/fs/statsfs/Makefile
+++ b/fs/statsfs/Makefile
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 statsfs-objs	:= statsfs.o
+statsfs-tests-objs	:= statsfs-tests.o
 
 obj-$(CONFIG_STATS_FS)	+= statsfs.o
+obj-$(CONFIG_STATS_FS_TEST) += statsfs-tests.o
diff --git a/fs/statsfs/statsfs-tests.c b/fs/statsfs/statsfs-tests.c
new file mode 100644
index 000000000000..98d1da2c7544
--- /dev/null
+++ b/fs/statsfs/statsfs-tests.c
@@ -0,0 +1,1067 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/anon_inodes.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+
+#include <linux/limits.h>
+#include <linux/statsfs.h>
+#include <kunit/test.h>
+#include "internal.h"
+
+#define STATSFS_STAT(el, x, ...)                                               \
+	{                                                                      \
+		.name = #x, .offset = offsetof(struct container, el.x),        \
+		##__VA_ARGS__                                                  \
+	}
+
+#define ARR_SIZE(el) ((int)(sizeof(el) / sizeof(struct statsfs_value) - 1))
+
+struct test_values_struct {
+	uint64_t u64;
+	int32_t s32;
+	bool bo;
+	uint8_t u8;
+	int16_t s16;
+};
+
+struct container {
+	struct test_values_struct vals;
+};
+
+struct statsfs_value test_values[6] = {
+	STATSFS_STAT(vals, u64, .type = STATSFS_U64, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	STATSFS_STAT(vals, s32, .type = STATSFS_S32, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	STATSFS_STAT(vals, bo, .type = STATSFS_BOOL, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	STATSFS_STAT(vals, u8, .type = STATSFS_U8, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	STATSFS_STAT(vals, s16, .type = STATSFS_S16, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	{ NULL },
+};
+
+struct statsfs_value test_aggr[4] = {
+	STATSFS_STAT(vals, s32, .type = STATSFS_S32, .aggr_kind = STATSFS_MIN,
+		     .mode = 0),
+	STATSFS_STAT(vals, bo, .type = STATSFS_BOOL, .aggr_kind = STATSFS_MAX,
+		     .mode = 0),
+	STATSFS_STAT(vals, u64, .type = STATSFS_U64, .aggr_kind = STATSFS_SUM,
+		     .mode = 0),
+	{ NULL },
+};
+
+struct statsfs_value test_same_name[3] = {
+	STATSFS_STAT(vals, s32, .type = STATSFS_S32, .aggr_kind = STATSFS_NONE,
+		     .mode = 0),
+	STATSFS_STAT(vals, s32, .type = STATSFS_S32, .aggr_kind = STATSFS_MIN,
+		     .mode = 0),
+	{ NULL },
+};
+
+struct statsfs_value test_all_aggr[6] = {
+	STATSFS_STAT(vals, s32, .type = STATSFS_S32, .aggr_kind = STATSFS_MIN,
+		     .mode = 0),
+	STATSFS_STAT(vals, bo, .type = STATSFS_BOOL,
+		     .aggr_kind = STATSFS_COUNT_ZERO, .mode = 0),
+	STATSFS_STAT(vals, u64, .type = STATSFS_U64, .aggr_kind = STATSFS_SUM,
+		     .mode = 0),
+	STATSFS_STAT(vals, u8, .type = STATSFS_U8, .aggr_kind = STATSFS_AVG,
+		     .mode = 0),
+	STATSFS_STAT(vals, s16, .type = STATSFS_S16, .aggr_kind = STATSFS_MAX,
+		     .mode = 0),
+	{ NULL },
+};
+
+#define def_u64 ((uint64_t) 64)
+
+#define def_val_s32 ((int32_t) S32_MIN)
+#define def_val_bool ((bool) true)
+#define def_val_u8 ((uint8_t) 127)
+#define def_val_s16 ((int16_t) 10000)
+
+#define def_val2_s32 ((int32_t) S16_MAX)
+#define def_val2_bool ((bool) false)
+#define def_val2_u8 ((uint8_t) 255)
+#define def_val2_s16 ((int16_t) -20000)
+
+struct container cont = {
+	.vals = {
+			.u64 = def_u64,
+			.s32 = def_val_s32,
+			.bo = def_val_bool,
+			.u8 = def_val_u8,
+			.s16 = def_val_s16,
+		},
+};
+
+struct container cont2 = {
+	.vals = {
+			.u64 = def_u64,
+			.s32 = def_val2_s32,
+			.bo = def_val2_bool,
+			.u8 = def_val2_u8,
+			.s16 = def_val2_s16,
+		},
+};
+
+static void get_stats_at_addr(struct statsfs_source *src, void *addr, int *aggr,
+			      int *val, int use_addr)
+{
+	struct statsfs_value *entry;
+	struct statsfs_value_source *src_entry;
+	int counter_val = 0, counter_aggr = 0;
+
+	list_for_each_entry(src_entry, &src->values_head, list_element) {
+		if (use_addr && src_entry->base_addr != addr)
+			continue;
+
+		for (entry = src_entry->values; entry->name; entry++) {
+			if (entry->aggr_kind == STATSFS_NONE)
+				counter_val++;
+			else
+				counter_aggr++;
+		}
+	}
+
+	if (aggr)
+		*aggr = counter_aggr;
+
+	if (val)
+		*val = counter_val;
+}
+
+int source_has_subsource(struct statsfs_source *src, struct statsfs_source *sub)
+{
+	struct statsfs_source *entry;
+
+	list_for_each_entry(entry, &src->subordinates_head, list_element) {
+		if (entry == sub)
+			return 1;
+	}
+	return 0;
+}
+
+int get_number_subsources(struct statsfs_source *src)
+{
+	struct statsfs_source *entry;
+	int counter = 0;
+
+	list_for_each_entry(entry, &src->subordinates_head, list_element) {
+		counter++;
+	}
+	return counter;
+}
+
+int get_number_values(struct statsfs_source *src)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, NULL, &counter, 0);
+	return counter;
+}
+
+int get_total_number_values(struct statsfs_source *src)
+{
+	struct statsfs_source *sub_entry;
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, NULL, &counter, 0);
+
+	list_for_each_entry(sub_entry, &src->subordinates_head, list_element) {
+		counter += get_total_number_values(sub_entry);
+	}
+
+	return counter;
+}
+
+int get_number_aggregates(struct statsfs_source *src)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, &counter, NULL, 1);
+	return counter;
+}
+
+int get_number_values_with_base(struct statsfs_source *src, void *addr)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, addr, NULL, &counter, 1);
+	return counter;
+}
+
+int get_number_aggr_with_base(struct statsfs_source *src, void *addr)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, addr, &counter, NULL, 1);
+	return counter;
+}
+
+static void test_empty_folder(struct kunit *test)
+{
+	struct statsfs_source *src;
+
+	src = statsfs_source_create("kvm_%d", 123);
+	KUNIT_EXPECT_EQ(test, strcmp(src->name, "kvm_123"), 0);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	statsfs_source_put(src);
+}
+
+static void test_add_subfolder(struct kunit *test)
+{
+	struct statsfs_source *src, *sub;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+	statsfs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), true);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	sub = statsfs_source_create("not a child");
+	KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), false);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+
+	statsfs_source_put(src);
+}
+
+static void test_add_value(struct kunit *test)
+{
+	struct statsfs_source *src;
+	int n;
+
+	src = statsfs_source_create("parent");
+
+	// add values
+	n = statsfs_source_add_values(src, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// add same values, nothing happens
+	n = statsfs_source_add_values(src, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// size is unchanged
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+
+	// no aggregates
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+
+	statsfs_source_put(src);
+}
+
+static void test_add_value_in_subfolder(struct kunit *test)
+{
+	struct statsfs_source *src, *sub, *sub_not;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+
+	// src -> sub
+	statsfs_source_add_subordinate(src, sub);
+
+	// add values
+	n = statsfs_source_add_values(sub, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), ARR_SIZE(test_values));
+	// no aggregates in sub
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+
+	// different folder
+	sub_not = statsfs_source_create("not a child");
+
+	// add values
+	n = statsfs_source_add_values(sub_not, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub_not, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	// remove sub, check that the value count is 0
+	statsfs_source_remove_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// re-add sub, check that the values are back
+	statsfs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	// add sub_not, check that there are twice as many values
+	statsfs_source_add_subordinate(src, sub_not);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values) * 2);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub_not),
+			ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not), 0);
+
+	statsfs_source_put(src);
+}
+
+static void test_search_value(struct kunit *test)
+{
+	struct statsfs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+
+	// add values
+	n = statsfs_source_add_values(src, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// get u64
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// get a non-added value
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	statsfs_source_put(src);
+}
+
+static void test_search_value_in_subfolder(struct kunit *test)
+{
+	struct statsfs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+
+	// src -> sub
+	statsfs_source_add_subordinate(src, sub);
+
+	// add values to sub
+	n = statsfs_source_add_values(sub, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	n = statsfs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	statsfs_source_put(src);
+}
+
+static void test_search_value_in_empty_folder(struct kunit *test)
+{
+	struct statsfs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("empty folder");
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	statsfs_source_put(src);
+}
+
+static void test_add_aggregate(struct kunit *test)
+{
+	struct statsfs_source *src;
+	int n;
+
+	src = statsfs_source_create("parent");
+
+	// add aggr to src, no values
+	n = statsfs_source_add_values(src, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// count aggregates
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	// add same array again, should not be added
+	n = statsfs_source_add_values(src, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+
+	statsfs_source_put(src);
+}
+
+static void test_add_aggregate_in_subfolder(struct kunit *test)
+{
+	struct statsfs_source *src, *sub, *sub_not;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+	// src->sub
+	statsfs_source_add_subordinate(src, sub);
+
+	// add aggr to sub
+	n = statsfs_source_add_values(sub, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), ARR_SIZE(test_aggr));
+
+	// not a child
+	sub_not = statsfs_source_create("not a child");
+
+	// add aggr to "not a child"
+	n = statsfs_source_add_values(sub_not, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub_not, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// remove sub
+	statsfs_source_remove_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// re-add both
+	statsfs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+	statsfs_source_add_subordinate(src, sub_not);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub_not), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not),
+			ARR_SIZE(test_aggr));
+
+	statsfs_source_put(src);
+}
+
+static void test_search_aggregate(struct kunit *test)
+{
+	struct statsfs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	n = statsfs_source_add_values(src, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	statsfs_source_put(src);
+}
+
+static void test_search_aggregate_in_subfolder(struct kunit *test)
+{
+	struct statsfs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+
+	statsfs_source_add_subordinate(src, sub);
+
+	n = statsfs_source_add_values(sub, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	n = get_number_aggr_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// u64 is a sum aggregate in test_aggr; with no values it yields 0
+	n = statsfs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = statsfs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	statsfs_source_put(src);
+}
+
+void test_search_same(struct kunit *test)
+{
+	struct statsfs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	n = statsfs_source_add_values(src, test_same_name, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+
+	n = statsfs_source_add_values(src, test_same_name, &cont);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+
+	// the simple value is returned first
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	statsfs_source_put(src);
+}
+
+static void test_add_mixed(struct kunit *test)
+{
+	struct statsfs_source *src;
+	int n;
+
+	src = statsfs_source_create("parent");
+
+	n = statsfs_source_add_values(src, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_add_values(src, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	n = statsfs_source_add_values(src, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = statsfs_source_add_values(src, test_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+	statsfs_source_put(src);
+}
+
+static void test_search_mixed(struct kunit *test)
+{
+	struct statsfs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub = statsfs_source_create("child");
+	statsfs_source_add_subordinate(src, sub);
+
+	// src has the aggregates, sub the values. Just search
+	n = statsfs_source_add_values(sub, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = statsfs_source_add_values(src, test_aggr, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	// u64 is sum so again same value
+	n = statsfs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// s32 is min so return the value also in the aggregate
+	n = statsfs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// bo is max
+	n = statsfs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = statsfs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = statsfs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	statsfs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val(struct kunit *test)
+{
+	struct statsfs_source *src, *sub1, *sub2;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub1 = statsfs_source_create("child1");
+	sub2 = statsfs_source_create("child2");
+	statsfs_source_add_subordinate(src, sub1);
+	statsfs_source_add_subordinate(src, sub2);
+
+	n = statsfs_source_add_values(sub1, test_all_aggr, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = statsfs_source_add_values(sub2, test_all_aggr, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub2, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_add_values(src, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = statsfs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+	// max
+	n = statsfs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	statsfs_source_put(src);
+}
+
+static void test_all_aggregations_val_agg_val(struct kunit *test)
+{
+	struct statsfs_source *src, *sub1, *sub2;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub1 = statsfs_source_create("child1");
+	sub2 = statsfs_source_create("child2");
+	statsfs_source_add_subordinate(src, sub1);
+	statsfs_source_add_subordinate(src, sub2);
+
+	n = statsfs_source_add_values(src, test_all_aggr, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = statsfs_source_add_values(sub2, test_all_aggr, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub2, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_add_values(sub1, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	n = statsfs_source_get_value_by_name(sub1, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = statsfs_source_get_value_by_name(sub2, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	n = statsfs_source_get_value_by_name(sub1, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX); // MIN
+	n = statsfs_source_get_value_by_name(sub2, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	n = statsfs_source_get_value_by_name(sub1, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = statsfs_source_get_value_by_name(sub2, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val2_bool);
+
+	n = statsfs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val_u8);
+	n = statsfs_source_get_value_by_name(sub1, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = statsfs_source_get_value_by_name(sub2, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+	n = statsfs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+	n = statsfs_source_get_value_by_name(sub1, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MIN); // MAX
+	n = statsfs_source_get_value_by_name(sub2, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+	statsfs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val_sub(struct kunit *test)
+{
+	struct statsfs_source *src, *sub1, *sub11;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub1 = statsfs_source_create("child1");
+	sub11 = statsfs_source_create("child11");
+	statsfs_source_add_subordinate(src, sub1);
+	statsfs_source_add_subordinate(sub1, sub11); // child11 is nested under child1
+
+	n = statsfs_source_add_values(sub1, test_values, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = statsfs_source_add_values(sub11, test_values, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values) * 2);
+
+	n = statsfs_source_add_values(sub1, test_all_aggr, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = statsfs_source_add_values(sub11, test_all_aggr, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_add_values(src, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = statsfs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+	// max
+	n = statsfs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	statsfs_source_put(src);
+}
+
+static void test_all_aggregations_agg_no_val_sub(struct kunit *test)
+{
+	struct statsfs_source *src, *sub1, *sub11;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub1 = statsfs_source_create("child1");
+	sub11 = statsfs_source_create("child11");
+	statsfs_source_add_subordinate(src, sub1);
+	statsfs_source_add_subordinate(sub1, sub11);
+
+	n = statsfs_source_add_values(sub11, test_all_aggr, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	n = statsfs_source_add_values(src, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+	// min
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+	// count_0
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = statsfs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+	// max
+	n = statsfs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+	statsfs_source_put(src);
+}
+
+static void test_all_aggregations_agg_agg_val_sub(struct kunit *test)
+{
+	struct statsfs_source *src, *sub1, *sub11, *sub12;
+	uint64_t ret;
+	int n;
+
+	src = statsfs_source_create("parent");
+	sub1 = statsfs_source_create("child1");
+	sub11 = statsfs_source_create("child11");
+	sub12 = statsfs_source_create("child12");
+	statsfs_source_add_subordinate(src, sub1);
+	statsfs_source_add_subordinate(sub1, sub11);
+	statsfs_source_add_subordinate(sub1, sub12);
+
+	n = statsfs_source_add_values(sub11, test_all_aggr, &cont2);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_add_values(sub12, test_all_aggr, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub12, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	n = statsfs_source_add_values(src, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = statsfs_source_add_values(sub1, test_all_aggr, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = statsfs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = statsfs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = statsfs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = statsfs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret,
+			(uint8_t)((def_val2_u8 + def_val_u8) / 2));
+
+	// max
+	n = statsfs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	statsfs_source_put(src);
+}
+
+static struct kunit_case statsfs_test_cases[] = {
+	KUNIT_CASE(test_empty_folder),
+	KUNIT_CASE(test_add_subfolder),
+	KUNIT_CASE(test_add_value),
+	KUNIT_CASE(test_add_value_in_subfolder),
+	KUNIT_CASE(test_search_value),
+	KUNIT_CASE(test_search_value_in_subfolder),
+	KUNIT_CASE(test_search_value_in_empty_folder),
+	KUNIT_CASE(test_add_aggregate),
+	KUNIT_CASE(test_add_aggregate_in_subfolder),
+	KUNIT_CASE(test_search_aggregate),
+	KUNIT_CASE(test_search_aggregate_in_subfolder),
+	KUNIT_CASE(test_search_same),
+	KUNIT_CASE(test_add_mixed),
+	KUNIT_CASE(test_search_mixed),
+	KUNIT_CASE(test_all_aggregations_agg_val_val),
+	KUNIT_CASE(test_all_aggregations_val_agg_val),
+	KUNIT_CASE(test_all_aggregations_agg_val_val_sub),
+	KUNIT_CASE(test_all_aggregations_agg_no_val_sub),
+	KUNIT_CASE(test_all_aggregations_agg_agg_val_sub),
+	{}
+};
+
+static struct kunit_suite statsfs_test_suite = {
+	.name = "statsfs",
+	.test_cases = statsfs_test_cases,
+};
+
+kunit_test_suite(statsfs_test_suite);
-- 
2.25.2
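
Note for readers of the test_all_aggregations_* cases above: the checks
imply a small set of folding rules (sum, min, max, avg, count_zero), and an
aggregate with nothing below it returns the identity of its operation
(S64_MAX for min, S64_MIN for max, 0 for the others). The following is only
a userspace sketch of those rules, not the statsfs implementation; the names
aggr_kind and aggregate() are hypothetical, and it assumes every sample has
already been widened to int64_t, whereas the real code distinguishes signed
and unsigned types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the aggregate kinds exercised by the tests. */
enum aggr_kind { AGGR_SUM, AGGR_MIN, AGGR_MAX, AGGR_AVG, AGGR_COUNT_ZERO };

/*
 * Fold n samples into a single aggregate. With n == 0 the result is the
 * identity of the operation: INT64_MAX for min, INT64_MIN for max and 0
 * for sum, avg and count_zero. Sum overflow is not handled in this sketch.
 */
static int64_t aggregate(enum aggr_kind kind, const int64_t *vals, size_t n)
{
	int64_t sum = 0, min = INT64_MAX, max = INT64_MIN;
	int64_t zeros = 0;
	size_t i;

	assert(n == 0 || vals != NULL);

	for (i = 0; i < n; i++) {
		sum += vals[i];
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
		if (vals[i] == 0)
			zeros++;
	}

	switch (kind) {
	case AGGR_SUM:
		return sum;
	case AGGR_MIN:
		return min;
	case AGGR_MAX:
		return max;
	case AGGR_AVG:
		/* integer average; an empty set averages to 0 */
		return n ? sum / (int64_t)n : 0;
	case AGGR_COUNT_ZERO:
		return zeros;
	}
	return 0;
}
```

Starting min at INT64_MAX and max at INT64_MIN means the empty case needs no
special branch, which would explain the S64_MAX and S64_MIN expectations
in test_all_aggregations_val_agg_val.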



* [RFC PATCH 4/5] statsfs fs: virtual fs to show stats to the end-user
  2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2020-04-27 14:18 ` [RFC PATCH 3/5] kunit: tests for statsfs API Emanuele Giuseppe Esposito
@ 2020-04-27 14:18 ` Emanuele Giuseppe Esposito
  2020-04-27 14:18 ` [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs Emanuele Giuseppe Esposito
  4 siblings, 0 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Add virtual fs that maps statsfs sources to directories, and values
(simple or aggregates) to files.

Every time a file is read/cleared, the fs internally invokes the statsfs
API to get/set the requested value.

fs/statsfs/inode.c is very similar to fs/debugfs/inode.c, with the
exception that the API consists only of statsfs_create_file,
statsfs_create_dir and statsfs_remove.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
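As a note on the read/clear behaviour described in the commit message: each
value file holds one decimal number followed by a newline, and writing 0
clears it (statsfs_attr_clear() rejects any other value with -EINVAL). A
minimal userspace sketch of a consumer could look like the code below. The
helper names stat_read_u64() and stat_clear() are hypothetical, the path of
any real statistic depends on the registered sources, and signed values
would need a signed variant of the reader.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>

/* Read one unsigned statistic stored as "<decimal>\n". */
static int stat_read_u64(const char *path, uint64_t *out)
{
	FILE *f;
	int matched;

	assert(path && out);

	f = fopen(path, "r");
	if (!f)
		return -errno;
	matched = fscanf(f, "%" SCNu64, out);
	fclose(f);

	return matched == 1 ? 0 : -EINVAL;
}

/* Clear a statistic by writing 0, the only value the write side accepts. */
static int stat_clear(const char *path)
{
	FILE *f;
	int err = 0;

	assert(path);

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs("0\n", f) == EOF)
		err = -EIO;
	if (fclose(f) == EOF && !err)
		err = -errno;

	return err;
}
```

Both helpers return 0 or a negative errno, so a caller can tell a missing
statistic (-ENOENT) apart from a read-only one (the open for writing fails).
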
 fs/statsfs/Makefile        |   2 +-
 fs/statsfs/inode.c         | 337 +++++++++++++++++++++++++++++++++++++
 fs/statsfs/internal.h      |  15 ++
 fs/statsfs/statsfs.c       | 162 ++++++++++++++++++
 include/linux/statsfs.h    |  12 ++
 include/uapi/linux/magic.h |   1 +
 tools/lib/api/fs/fs.c      |  21 +++
 7 files changed, 549 insertions(+), 1 deletion(-)
 create mode 100644 fs/statsfs/inode.c

diff --git a/fs/statsfs/Makefile b/fs/statsfs/Makefile
index f546e3f03a12..5df4513a2f34 100644
--- a/fs/statsfs/Makefile
+++ b/fs/statsfs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-statsfs-objs	:= statsfs.o
+statsfs-objs	:= inode.o statsfs.o
 statsfs-tests-objs	:= statsfs-tests.o
 
 obj-$(CONFIG_STATS_FS)	+= statsfs.o
diff --git a/fs/statsfs/inode.c b/fs/statsfs/inode.c
new file mode 100644
index 000000000000..f774c6618017
--- /dev/null
+++ b/fs/statsfs/inode.c
@@ -0,0 +1,337 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  inode.c - part of statsfs, a tiny little statsfs file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito <eesposit@redhat.com>
+ *  Copyright (C) 2020 Redhat
+ */
+#define pr_fmt(fmt)	"statsfs: " fmt
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/init.h>
+#include <linux/statsfs.h>
+#include <linux/string.h>
+#include <linux/seq_file.h>
+#include <linux/parser.h>
+#include <linux/magic.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+#define STATSFS_DEFAULT_MODE	0700
+
+static struct simple_fs statsfs;
+static bool statsfs_registered;
+
+struct statsfs_mount_opts {
+	kuid_t uid;
+	kgid_t gid;
+	umode_t mode;
+};
+
+enum {
+	Opt_uid,
+	Opt_gid,
+	Opt_mode,
+	Opt_err
+};
+
+static const match_table_t tokens = {
+	{Opt_uid, "uid=%u"},
+	{Opt_gid, "gid=%u"},
+	{Opt_mode, "mode=%o"},
+	{Opt_err, NULL}
+};
+
+struct statsfs_fs_info {
+	struct statsfs_mount_opts mount_opts;
+};
+
+static int statsfs_parse_options(char *data, struct statsfs_mount_opts *opts)
+{
+	substring_t args[MAX_OPT_ARGS];
+	int option;
+	int token;
+	kuid_t uid;
+	kgid_t gid;
+	char *p;
+
+	opts->mode = STATSFS_DEFAULT_MODE;
+
+	while ((p = strsep(&data, ",")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_uid:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid))
+				return -EINVAL;
+			opts->uid = uid;
+			break;
+		case Opt_gid:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid))
+				return -EINVAL;
+			opts->gid = gid;
+			break;
+		case Opt_mode:
+			if (match_octal(&args[0], &option))
+				return -EINVAL;
+			opts->mode = option & S_IALLUGO;
+			break;
+		/*
+		 * We might like to report bad mount options here;
+		 * but traditionally statsfs has ignored all mount options
+		 */
+		}
+	}
+
+	return 0;
+}
+
+static int statsfs_apply_options(struct super_block *sb)
+{
+	struct statsfs_fs_info *fsi = sb->s_fs_info;
+	struct inode *inode = d_inode(sb->s_root);
+	struct statsfs_mount_opts *opts = &fsi->mount_opts;
+
+	inode->i_mode &= ~S_IALLUGO;
+	inode->i_mode |= opts->mode;
+
+	inode->i_uid = opts->uid;
+	inode->i_gid = opts->gid;
+
+	return 0;
+}
+
+static int statsfs_remount(struct super_block *sb, int *flags, char *data)
+{
+	int err;
+	struct statsfs_fs_info *fsi = sb->s_fs_info;
+
+	sync_filesystem(sb);
+	err = statsfs_parse_options(data, &fsi->mount_opts);
+	if (err)
+		goto fail;
+
+	statsfs_apply_options(sb);
+
+fail:
+	return err;
+}
+
+static int statsfs_show_options(struct seq_file *m, struct dentry *root)
+{
+	struct statsfs_fs_info *fsi = root->d_sb->s_fs_info;
+	struct statsfs_mount_opts *opts = &fsi->mount_opts;
+
+	if (!uid_eq(opts->uid, GLOBAL_ROOT_UID))
+		seq_printf(m, ",uid=%u",
+			   from_kuid_munged(&init_user_ns, opts->uid));
+	if (!gid_eq(opts->gid, GLOBAL_ROOT_GID))
+		seq_printf(m, ",gid=%u",
+			   from_kgid_munged(&init_user_ns, opts->gid));
+	if (opts->mode != STATSFS_DEFAULT_MODE)
+		seq_printf(m, ",mode=%o", opts->mode);
+
+	return 0;
+}
+
+
+static void statsfs_free_inode(struct inode *inode)
+{
+	kfree(inode->i_private);
+	free_inode_nonrcu(inode);
+}
+
+static const struct super_operations statsfs_super_operations = {
+	.statfs		= simple_statfs,
+	.remount_fs	= statsfs_remount,
+	.show_options	= statsfs_show_options,
+	.free_inode	= statsfs_free_inode,
+};
+
+static int statsfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+	static const struct tree_descr statsfs_files[] = {{""}};
+	struct statsfs_fs_info *fsi;
+	int err;
+
+	fsi = kzalloc(sizeof(struct statsfs_fs_info), GFP_KERNEL);
+	sb->s_fs_info = fsi;
+	if (!fsi) {
+		err = -ENOMEM;
+		goto fail;
+	}
+
+	err = statsfs_parse_options(data, &fsi->mount_opts);
+	if (err)
+		goto fail;
+
+	err  =  simple_fill_super(sb, STATSFS_MAGIC, statsfs_files);
+	if (err)
+		goto fail;
+
+	sb->s_op = &statsfs_super_operations;
+
+	statsfs_apply_options(sb);
+
+	return 0;
+
+fail:
+	kfree(fsi);
+	sb->s_fs_info = NULL;
+	return err;
+}
+
+static struct dentry *statsfs_mount(struct file_system_type *fs_type,
+			int flags, const char *dev_name,
+			void *data)
+{
+	return mount_single(fs_type, flags, data, statsfs_fill_super);
+}
+
+static struct file_system_type statsfs_fs_type = {
+	.owner =	THIS_MODULE,
+	.name =		"statsfs",
+	.mount =	statsfs_mount,
+	.kill_sb =	kill_litter_super,
+};
+MODULE_ALIAS_FS("statsfs");
+
+
+/**
+ * statsfs_create_file - create a file in the statsfs filesystem
+ * @val: a pointer to a statsfs_value containing all the infos of
+ * the file to create (name, permission)
+ * @src: a pointer to a statsfs_source containing the dentry of where
+ * to add this file
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the statsfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * Val and src will also be wrapped in a statsfs_data_inode struct
+ * that will be internally stored as inode->i_private and used in the
+ * get/set attribute functions (see statsfs_ops in statsfs.c).
+ */
+struct dentry *statsfs_create_file(struct statsfs_value *val, struct statsfs_source *src)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+	struct statsfs_data_inode *val_inode;
+
+	val_inode = kzalloc(sizeof(struct statsfs_data_inode), GFP_KERNEL);
+	if (!val_inode) {
+		printk(KERN_ERR
+			"kzalloc failure in statsfs_create_file (ENOMEM)\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	val_inode->src = src;
+	val_inode->val = val;
+
+
+	dentry = simplefs_create_file(&statsfs, &statsfs_fs_type,
+				      val->name, statsfs_val_get_mode(val),
+					  src->source_dentry, val_inode, &inode);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode->i_fop = &statsfs_ops;
+
+	return simplefs_finish_dentry(dentry, inode);
+}
+/**
+ * statsfs_create_dir - create a directory in the statsfs filesystem
+ * @name: a pointer to a string containing the name of the directory to
+ *        create.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is NULL, then the
+ *          directory will be created in the root of the statsfs filesystem.
+ *
+ * This function creates a directory in statsfs with the given name.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the statsfs_remove() function when the
+ * directory is to be removed (no automatic cleanup happens if your module
+ * is unloaded, you are responsible here.)  If an error occurs,
+ * ERR_PTR(-ERROR) will be returned.
+ */
+struct dentry *statsfs_create_dir(const char *name, struct dentry *parent)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+
+	dentry = simplefs_create_dir(&statsfs, &statsfs_fs_type,
+				     name, 0755, parent, &inode);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode->i_op = &simple_dir_inode_operations;
+	return simplefs_finish_dentry(dentry, inode);
+}
+
+static void remove_one(struct dentry *victim)
+{
+	simple_release_fs(&statsfs);
+}
+
+/**
+ * statsfs_remove - recursively removes a directory
+ * @dentry: a pointer to the dentry of the directory to be removed.  If this
+ *          parameter is NULL or an error value, nothing will be done.
+ *
+ * This function recursively removes a directory tree in statsfs that
+ * was previously created with a call to another statsfs function
+ * (like statsfs_create_file() or variants thereof.)
+ *
+ * This function is required to be called in order for the file to be
+ * removed, no automatic cleanup of files will happen when a module is
+ * removed, you are responsible here.
+ */
+void statsfs_remove(struct dentry *dentry)
+{
+	if (IS_ERR_OR_NULL(dentry))
+		return;
+
+	simple_pin_fs(&statsfs, &statsfs_fs_type);
+	simple_recursive_removal(dentry, remove_one);
+	simple_release_fs(&statsfs);
+}
+/**
+ * statsfs_initialized - Tells whether statsfs has been registered
+ */
+bool statsfs_initialized(void)
+{
+	return statsfs_registered;
+}
+EXPORT_SYMBOL_GPL(statsfs_initialized);
+
+static int __init statsfs_init(void)
+{
+	int retval;
+
+	retval = sysfs_create_mount_point(kernel_kobj, "statsfs");
+	if (retval)
+		return retval;
+
+	retval = register_filesystem(&statsfs_fs_type);
+	if (retval)
+		sysfs_remove_mount_point(kernel_kobj, "statsfs");
+	else
+		statsfs_registered = true;
+
+	return retval;
+}
+core_initcall(statsfs_init);
diff --git a/fs/statsfs/internal.h b/fs/statsfs/internal.h
index f124683a2ded..64211f252d6c 100644
--- a/fs/statsfs/internal.h
+++ b/fs/statsfs/internal.h
@@ -15,6 +15,21 @@ struct statsfs_value_source {
 	struct list_head list_element;
 };
 
+struct statsfs_data_inode {
+	struct statsfs_source *src;
+	struct statsfs_value *val;
+};
+
+extern const struct file_operations statsfs_ops;
+
+struct dentry *statsfs_create_file(struct statsfs_value *val,
+				   struct statsfs_source *src);
+
+struct dentry *statsfs_create_dir(const char *name, struct dentry *parent);
+
+void statsfs_remove(struct dentry *dentry);
+#define statsfs_remove_recursive statsfs_remove
+
 int statsfs_val_get_mode(struct statsfs_value *val);
 
 #endif /* _STATSFS_INTERNAL_H_ */
diff --git a/fs/statsfs/statsfs.c b/fs/statsfs/statsfs.c
index 0ad1d985be46..5a56a2cef581 100644
--- a/fs/statsfs/statsfs.c
+++ b/fs/statsfs/statsfs.c
@@ -17,16 +17,114 @@ struct statsfs_aggregate_value {
 	uint32_t count, count_zero;
 };
 
+static void statsfs_source_remove_files(struct statsfs_source *src);
+
 static int is_val_signed(struct statsfs_value *val)
 {
 	return val->type & STATSFS_SIGN;
 }
 
+static int statsfs_attr_get(void *data, u64 *val)
+{
+	int r = -EFAULT;
+	struct statsfs_data_inode *val_inode =
+		(struct statsfs_data_inode *)data;
+
+	r = statsfs_source_get_value(val_inode->src, val_inode->val, val);
+	return r;
+}
+
+static int statsfs_attr_clear(void *data, u64 val)
+{
+	int r = -EFAULT;
+	struct statsfs_data_inode *val_inode =
+		(struct statsfs_data_inode *)data;
+
+	if (val)
+		return -EINVAL;
+
+	r = statsfs_source_clear(val_inode->src, val_inode->val);
+	return r;
+}
+
 int statsfs_val_get_mode(struct statsfs_value *val)
 {
 	return val->mode ? val->mode : 0644;
 }
 
+static int statsfs_attr_data_open(struct inode *inode, struct file *file)
+{
+	struct statsfs_data_inode *val_inode;
+	char *fmt;
+
+	val_inode = (struct statsfs_data_inode *)inode->i_private;
+
+	/* Inodes hold a pointer to the source which is not included in the
+	 * refcount, so files can be opened while destroy is running, but
+	 * values are removed (base_addr = NULL) before the source is destroyed.
+	 */
+	if (!kref_get_unless_zero(&val_inode->src->refcount))
+		return -ENOENT;
+
+	if (is_val_signed(val_inode->val))
+		fmt = "%lld\n";
+	else
+		fmt = "%llu\n";
+
+	if (simple_attr_open(inode, file, statsfs_attr_get,
+			     statsfs_val_get_mode(val_inode->val) & 0222 ?
+				     statsfs_attr_clear :
+				     NULL,
+			     fmt)) {
+		statsfs_source_put(val_inode->src);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static int statsfs_attr_release(struct inode *inode, struct file *file)
+{
+	struct statsfs_data_inode *val_inode;
+
+	val_inode = (struct statsfs_data_inode *)inode->i_private;
+
+	simple_attr_release(inode, file);
+	statsfs_source_put(val_inode->src);
+
+	return 0;
+}
+
+const struct file_operations statsfs_ops = {
+	.owner = THIS_MODULE,
+	.open = statsfs_attr_data_open,
+	.release = statsfs_attr_release,
+	.read = simple_attr_read,
+	.write = simple_attr_write,
+	.llseek = no_llseek,
+};
+
+/* Called with rwsem held for writing */
+static void statsfs_source_remove_files_locked(struct statsfs_source *src)
+{
+	struct statsfs_source *child;
+
+	if (src->source_dentry == NULL)
+		return;
+
+	list_for_each_entry(child, &src->subordinates_head, list_element)
+		statsfs_source_remove_files(child);
+
+	statsfs_remove_recursive(src->source_dentry);
+	src->source_dentry = NULL;
+}
+
+static void statsfs_source_remove_files(struct statsfs_source *src)
+{
+	down_write(&src->rwsem);
+	statsfs_source_remove_files_locked(src);
+	up_write(&src->rwsem);
+}
+
 static struct statsfs_value *find_value(struct statsfs_value_source *src,
 					struct statsfs_value *val)
 {
@@ -59,6 +157,61 @@ search_value_in_source(struct statsfs_source *src, struct statsfs_value *arg,
 	return NULL;
 }
 
+/* Called with rwsem held for writing */
+static void statsfs_create_files_locked(struct statsfs_source *source)
+{
+	struct statsfs_value_source *val_src;
+	struct statsfs_value *val;
+
+	if (!source->source_dentry)
+		return;
+
+	list_for_each_entry(val_src, &source->values_head, list_element) {
+		if (val_src->files_created)
+			continue;
+
+		for (val = val_src->values; val->name; val++)
+			statsfs_create_file(val, source);
+
+		val_src->files_created = true;
+	}
+}
+
+/* Called with rwsem held for writing */
+static void statsfs_create_files_recursive_locked(struct statsfs_source *source,
+						  struct dentry *parent_dentry)
+{
+	struct statsfs_source *child;
+
+	/* first check values in this folder, since it might be new */
+	if (!source->source_dentry) {
+		source->source_dentry =
+			statsfs_create_dir(source->name, parent_dentry);
+	}
+
+	statsfs_create_files_locked(source);
+
+	list_for_each_entry(child, &source->subordinates_head, list_element) {
+		if (child->source_dentry == NULL) {
+			/* assume that if a child has a folder,
+			 * its sub-children have one too.
+			 */
+			down_write(&child->rwsem);
+			statsfs_create_files_recursive_locked(
+				child, source->source_dentry);
+			up_write(&child->rwsem);
+		}
+	}
+}
+
+void statsfs_source_register(struct statsfs_source *source)
+{
+	down_write(&source->rwsem);
+	statsfs_create_files_recursive_locked(source, NULL);
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(statsfs_source_register);
+
 /* Called with rwsem held for writing */
 static struct statsfs_value_source *create_value_source(void *base)
 {
@@ -96,6 +249,9 @@ int statsfs_source_add_values(struct statsfs_source *source,
 	/* add the val_src to the source list */
 	list_add(&val_src->list_element, &source->values_head);
 
+	/* create the files if the source already has a directory */
+	statsfs_create_files_locked(source);
+
 	up_write(&source->rwsem);
 
 	return 0;
@@ -109,6 +265,10 @@ void statsfs_source_add_subordinate(struct statsfs_source *source,
 
 	statsfs_source_get(sub);
 	list_add(&sub->list_element, &source->subordinates_head);
+	if (source->source_dentry) {
+		statsfs_create_files_recursive_locked(sub,
+						      source->source_dentry);
+	}
 
 	up_write(&source->rwsem);
 }
@@ -127,6 +287,7 @@ statsfs_source_remove_subordinate_locked(struct statsfs_source *source,
 		if (src_entry == sub) {
 			WARN_ON(strcmp(src_entry->name, sub->name) != 0);
 			list_del_init(&src_entry->list_element);
+			statsfs_source_remove_files(src_entry);
 			statsfs_source_put(src_entry);
 			return;
 		}
@@ -572,6 +733,7 @@ static void statsfs_source_destroy(struct kref *kref_source)
 		statsfs_source_remove_subordinate_locked(source, child);
 	}
 
+	statsfs_source_remove_files_locked(source);
 
 	up_write(&source->rwsem);
 	kfree(source->name);
diff --git a/include/linux/statsfs.h b/include/linux/statsfs.h
index 3f01f094946d..f6e8eead1124 100644
--- a/include/linux/statsfs.h
+++ b/include/linux/statsfs.h
@@ -87,6 +87,18 @@ struct statsfs_source {
  */
 struct statsfs_source *statsfs_source_create(const char *fmt, ...);
 
+/**
+ * statsfs_source_register - register a source in the statsfs filesystem
+ * @source: a pointer to the source that will be registered
+ *
+ * Add the given folder as a direct child of /sys/kernel/statsfs.
+ * It also recursively searches its own children and creates all folders
+ * and files if they don't exist yet. All subsequent add_subordinate calls
+ * on the same source that is used in this function will create corresponding
+ * files and directories.
+ */
+void statsfs_source_register(struct statsfs_source *source);
+
 /**
  * statsfs_source_add_values - adds values to the given source
  * @source: a pointer to the source that will receive the values
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index d78064007b17..46c66ea3fc9e 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -10,6 +10,7 @@
 #define CRAMFS_MAGIC		0x28cd3d45	/* some random number */
 #define CRAMFS_MAGIC_WEND	0x453dcd28	/* magic number with the wrong endianess */
 #define DEBUGFS_MAGIC          0x64626720
+#define STATSFS_MAGIC          0x73746174
 #define SECURITYFS_MAGIC	0x73636673
 #define SELINUX_MAGIC		0xf97cff8c
 #define SMACK_MAGIC		0x43415d53	/* "SMAC" */
diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f7ed8c..6fe306206dfb 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -35,6 +35,10 @@
 #define TRACEFS_MAGIC          0x74726163
 #endif
 
+#ifndef STATSFS_MAGIC
+#define STATSFS_MAGIC          0x73746174
+#endif
+
 #ifndef HUGETLBFS_MAGIC
 #define HUGETLBFS_MAGIC        0x958458f6
 #endif
@@ -76,6 +80,16 @@ static const char * const tracefs__known_mountpoints[] = {
 	0,
 };
 
+#ifndef STATSFS_DEFAULT_PATH
+#define STATSFS_DEFAULT_PATH "/sys/kernel/statsfs"
+#endif
+
+static const char * const statsfs__known_mountpoints[] = {
+	STATSFS_DEFAULT_PATH,
+	"/statsfs",
+	0,
+};
+
 static const char * const hugetlbfs__known_mountpoints[] = {
 	0,
 };
@@ -100,6 +114,7 @@ enum {
 	FS__TRACEFS = 3,
 	FS__HUGETLBFS = 4,
 	FS__BPF_FS = 5,
+	FS__STATSFS = 6,
 };
 
 #ifndef TRACEFS_MAGIC
@@ -127,6 +142,11 @@ static struct fs fs__entries[] = {
 		.mounts	= tracefs__known_mountpoints,
 		.magic	= TRACEFS_MAGIC,
 	},
+	[FS__STATSFS] = {
+		.name	= "statsfs",
+		.mounts	= statsfs__known_mountpoints,
+		.magic	= STATSFS_MAGIC,
+	},
 	[FS__HUGETLBFS] = {
 		.name	= "hugetlbfs",
 		.mounts = hugetlbfs__known_mountpoints,
@@ -297,6 +317,7 @@ FS(sysfs,   FS__SYSFS);
 FS(procfs,  FS__PROCFS);
 FS(debugfs, FS__DEBUGFS);
 FS(tracefs, FS__TRACEFS);
+FS(statsfs, FS__STATSFS);
 FS(hugetlbfs, FS__HUGETLBFS);
 FS(bpf_fs, FS__BPF_FS);
 
-- 
2.25.2
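
Note on statsfs_attr_data_open() in the patch above: the inode's pointer to
the source is not counted in the refcount, so the open path may only take a
reference while the count is still nonzero, which is what
kref_get_unless_zero() provides. The following is a userspace sketch of that
pattern using C11 atomics, assuming a plain counter; struct ref,
ref_get_unless_zero() and ref_put() are hypothetical names and not the
kernel's kref implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ref {
	atomic_uint count;
};

/*
 * Take a reference only if the object is still alive. Returns false once
 * the count has reached zero, i.e. when teardown has already started.
 */
static bool ref_get_unless_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* on failure, old is reloaded with the current count */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(struct ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0); /* a put without a matching get is a bug */
	return old == 1;
}
```

A plain increment would not be enough here: it could bring a count of zero
back to one while the destroy path is already running, which is the race the
comment in statsfs_attr_data_open() describes.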



* [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs
  2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2020-04-27 14:18 ` [RFC PATCH 4/5] statsfs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
@ 2020-04-27 14:18 ` Emanuele Giuseppe Esposito
  2020-04-28 17:56   ` Randy Dunlap
  4 siblings, 1 reply; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 14:18 UTC (permalink / raw)
  To: kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Use statsfs API instead of debugfs to create sources and add values.

This also requires changing all architecture files to replace the old
debugfs_entries with statsfs_vcpu_entries and statsfs_vm_entries.

The files/folders name and organization is kept unchanged, and a symlink
in sys/kernel/debugfs/kvm is left for backward compatibility.
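
For readers unfamiliar with the table-driven approach, here is a minimal
userspace sketch of the idea behind the *_entries tables: each entry records
a name and the offset of a counter, a base pointer selects the object, and a
sum aggregate adds the same counter across objects. All demo_* names and
struct layouts are hypothetical stand-ins for illustration; they are not the
kernel definitions and not the statsfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-vCPU statistics struct. */
struct demo_vcpu {
	uint64_t halt_wakeup;
	uint64_t mmio_exits;
};

/* One table entry: a file name plus the offset of the counter. */
struct demo_value {
	const char *name;
	size_t offset;
};

#define DEMO_VCPU_STAT(n, x) { n, offsetof(struct demo_vcpu, x) }

static const struct demo_value demo_vcpu_entries[] = {
	DEMO_VCPU_STAT("halt_wakeup", halt_wakeup),
	DEMO_VCPU_STAT("mmio", mmio_exits),
	{ NULL, 0 }	/* terminator, mirroring the { NULL } in the patch */
};

/* Read one counter: base address of the object plus the entry's offset. */
static uint64_t demo_read(const void *base, const struct demo_value *val)
{
	uint64_t v;

	memcpy(&v, (const char *)base + val->offset, sizeof(v));
	return v;
}

/* Look up a table entry by name; returns NULL when there is no match. */
static const struct demo_value *demo_find(const struct demo_value *table,
					  const char *name)
{
	for (; table->name; table++)
		if (strcmp(table->name, name) == 0)
			return table;
	return NULL;
}

/* Sum aggregation: add the same counter over every object in an array. */
static uint64_t demo_sum(const struct demo_vcpu *vcpus, size_t n,
			 const struct demo_value *val)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += demo_read(&vcpus[i], val);
	return total;
}
```

With two demo_vcpu objects whose mmio_exits are 10 and 20, demo_sum() over
the "mmio" entry yields 30, which is the kind of per-VM total the old
debugfs code computed by hand.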

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 arch/arm64/kvm/guest.c          |   2 +-
 arch/mips/kvm/mips.c            |   2 +-
 arch/powerpc/kvm/book3s.c       |   6 +-
 arch/powerpc/kvm/booke.c        |   8 +-
 arch/s390/kvm/kvm-s390.c        |  16 +-
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/Makefile           |   2 +-
 arch/x86/kvm/debugfs.c          |  64 -------
 arch/x86/kvm/statsfs.c          |  49 +++++
 arch/x86/kvm/x86.c              |   6 +-
 include/linux/kvm_host.h        |  39 +---
 virt/kvm/arm/arm.c              |   2 +-
 virt/kvm/kvm_main.c             | 314 ++++----------------------------
 13 files changed, 130 insertions(+), 382 deletions(-)
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/statsfs.c

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 8417b200bec9..be024740aa67 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
 
 #include "trace.h"
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("halt_successful_poll", halt_successful_poll),
 	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
 	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index fdf1c14d9205..13266b0f5d2d 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,7 +39,7 @@
 #define VECTORSPACING 0x100	/* for EI/VI mode */
 #endif
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("wait", wait_exits),
 	VCPU_STAT("cache", cache_exits),
 	VCPU_STAT("signal", signal_exits),
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37508a356f28..5b1a78747267 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
 
 /* #define EXIT_DEBUG */
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("exits", sum_exits),
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
@@ -66,6 +66,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("pthru_all", pthru_all),
 	VCPU_STAT("pthru_host", pthru_host),
 	VCPU_STAT("pthru_bad_aff", pthru_bad_aff),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vm_entries[] = {
 	VM_STAT("largepages_2M", num_2M_pages, .mode = 0444),
 	VM_STAT("largepages_1G", num_1G_pages, .mode = 0444),
 	{ NULL }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c2984cb6dfa7..ef3e3bbab2d8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -35,7 +35,12 @@
 
 unsigned long kvmppc_booke_handlers;
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vm_entries[] = {
+	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
 	VCPU_STAT("itlb_r", itlb_real_miss_exits),
@@ -54,7 +59,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("halt_wakeup", halt_wakeup),
 	VCPU_STAT("doorbell", dbell_exits),
 	VCPU_STAT("guest doorbell", gdbell_exits),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
 	{ NULL }
 };
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dbeb7da07f18..c22378fdc1a2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -57,7 +57,16 @@
 #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
 			   (KVM_MAX_VCPUS + LOCAL_IRQS))
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vm_entries[] = {
+	VM_STAT("inject_float_mchk", inject_float_mchk),
+	VM_STAT("inject_io", inject_io),
+	VM_STAT("inject_pfault_done", inject_pfault_done),
+	VM_STAT("inject_service_signal", inject_service_signal),
+	VM_STAT("inject_virtio", inject_virtio),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("userspace_handled", exit_userspace),
 	VCPU_STAT("exit_null", exit_null),
 	VCPU_STAT("exit_validity", exit_validity),
@@ -95,18 +104,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("inject_ckc", inject_ckc),
 	VCPU_STAT("inject_cputm", inject_cputm),
 	VCPU_STAT("inject_external_call", inject_external_call),
-	VM_STAT("inject_float_mchk", inject_float_mchk),
 	VCPU_STAT("inject_emergency_signal", inject_emergency_signal),
-	VM_STAT("inject_io", inject_io),
 	VCPU_STAT("inject_mchk", inject_mchk),
-	VM_STAT("inject_pfault_done", inject_pfault_done),
 	VCPU_STAT("inject_program", inject_program),
 	VCPU_STAT("inject_restart", inject_restart),
-	VM_STAT("inject_service_signal", inject_service_signal),
 	VCPU_STAT("inject_set_prefix", inject_set_prefix),
 	VCPU_STAT("inject_stop_signal", inject_stop_signal),
 	VCPU_STAT("inject_pfault_init", inject_pfault_init),
-	VM_STAT("inject_virtio", inject_virtio),
 	VCPU_STAT("instruction_epsw", instruction_epsw),
 	VCPU_STAT("instruction_gs", instruction_gs),
 	VCPU_STAT("instruction_io_other", instruction_io_other),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..b360ce4b3c5e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/hyperv-tlfs.h>
 
-#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+#define __KVM_HAVE_ARCH_VCPU_STATSFS
 
 #define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a789759b7261..117b2f7e9c92 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+			   hyperv.o statsfs.o mmu/mmu.o mmu/page_track.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
deleted file mode 100644
index 018aebce33ff..000000000000
--- a/arch/x86/kvm/debugfs.c
+++ /dev/null
@@ -1,64 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Kernel-based Virtual Machine driver for Linux
- *
- * Copyright 2016 Red Hat, Inc. and/or its affiliates.
- */
-#include <linux/kvm_host.h>
-#include <linux/debugfs.h>
-#include "lapic.h"
-
-static int vcpu_get_timer_advance_ns(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.apic->lapic_timer.timer_advance_ns;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
-
-static int vcpu_get_tsc_offset(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.tsc_offset;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_offset_fops, vcpu_get_tsc_offset, NULL, "%lld\n");
-
-static int vcpu_get_tsc_scaling_ratio(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.tsc_scaling_ratio;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_fops, vcpu_get_tsc_scaling_ratio, NULL, "%llu\n");
-
-static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val)
-{
-	*val = kvm_tsc_scaling_ratio_frac_bits;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n");
-
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
-{
-	debugfs_create_file("tsc-offset", 0444, vcpu->debugfs_dentry, vcpu,
-			    &vcpu_tsc_offset_fops);
-
-	if (lapic_in_kernel(vcpu))
-		debugfs_create_file("lapic_timer_advance_ns", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_timer_advance_ns_fops);
-
-	if (kvm_has_tsc_control) {
-		debugfs_create_file("tsc-scaling-ratio", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_tsc_scaling_fops);
-		debugfs_create_file("tsc-scaling-ratio-frac-bits", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_tsc_scaling_frac_fops);
-	}
-}
diff --git a/arch/x86/kvm/statsfs.c b/arch/x86/kvm/statsfs.c
new file mode 100644
index 000000000000..31f58b5694ca
--- /dev/null
+++ b/arch/x86/kvm/statsfs.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Kernel-based Virtual Machine driver for Linux
+ *
+ * Copyright 2016 Red Hat, Inc. and/or its affiliates.
+ */
+#include <linux/kvm_host.h>
+#include <linux/statsfs.h>
+#include "lapic.h"
+
+#define VCPU_ARCH_STATSFS(n, s, x, ...)								\
+			{ n, offsetof(struct s, x), .aggr_kind = STATSFS_SUM, ##__VA_ARGS__ }
+
+struct statsfs_value statsfs_vcpu_tsc_offset[] = {
+	VCPU_ARCH_STATSFS("tsc-offset", kvm_vcpu_arch, tsc_offset,
+						.type = STATSFS_S64, .mode = 0444),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vcpu_arch_lapic_timer[] = {
+	VCPU_ARCH_STATSFS("lapic_timer_advance_ns", kvm_timer, timer_advance_ns,
+						.type = STATSFS_U64, .mode = 0444),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vcpu_arch_tsc_ratio[] = {
+	VCPU_ARCH_STATSFS("tsc-scaling-ratio", kvm_vcpu_arch, tsc_scaling_ratio,
+						.type = STATSFS_U64, .mode = 0444),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vcpu_arch_tsc_frac[] = {
+	{ "tsc-scaling-ratio-frac-bits", 0, .type = STATSFS_U64, .mode = 0444 },
+	{ NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
+};
+
+void kvm_arch_create_vcpu_statsfs(struct kvm_vcpu *vcpu)
+{
+	statsfs_source_add_values(vcpu->statsfs_src, statsfs_vcpu_tsc_offset, &vcpu->arch);
+
+	if (lapic_in_kernel(vcpu))
+		statsfs_source_add_values(vcpu->statsfs_src, statsfs_vcpu_arch_lapic_timer, &vcpu->arch.apic->lapic_timer);
+
+	if (kvm_has_tsc_control) {
+		statsfs_source_add_values(vcpu->statsfs_src, statsfs_vcpu_arch_tsc_ratio, &vcpu->arch);
+		statsfs_source_add_values(vcpu->statsfs_src, statsfs_vcpu_arch_tsc_frac,
+								&kvm_tsc_scaling_ratio_frac_bits);
+	}
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 35723dafedeb..53e4e8edeaee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -190,7 +190,7 @@ static u64 __read_mostly host_xss;
 u64 __read_mostly supported_xss;
 EXPORT_SYMBOL_GPL(supported_xss);
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct statsfs_value statsfs_vcpu_entries[] = {
 	VCPU_STAT("pf_fixed", pf_fixed),
 	VCPU_STAT("pf_guest", pf_guest),
 	VCPU_STAT("tlb_flush", tlb_flush),
@@ -217,6 +217,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("nmi_injections", nmi_injections),
 	VCPU_STAT("req_event", req_event),
 	VCPU_STAT("l1d_flush", l1d_flush),
+	{ NULL }
+};
+
+struct statsfs_value statsfs_vm_entries[] = {
 	VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
 	VM_STAT("mmu_pte_write", mmu_pte_write),
 	VM_STAT("mmu_pte_updated", mmu_pte_updated),
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3845f857ef7b..2b47d01e6ed7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -27,6 +27,7 @@
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <asm/signal.h>
+#include <linux/statsfs.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -318,7 +319,7 @@ struct kvm_vcpu {
 	bool preempted;
 	bool ready;
 	struct kvm_vcpu_arch arch;
-	struct dentry *debugfs_dentry;
+	struct statsfs_source *statsfs_src;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
@@ -498,8 +499,7 @@ struct kvm {
 	long tlbs_dirty;
 	struct list_head devices;
 	u64 manual_dirty_log_protect;
-	struct dentry *debugfs_dentry;
-	struct kvm_stat_data **debugfs_stat_data;
+	struct statsfs_source *statsfs_src;
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
@@ -880,8 +880,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
 
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);
+#ifdef __KVM_HAVE_ARCH_VCPU_STATSFS
+void kvm_arch_create_vcpu_statsfs(struct kvm_vcpu *vcpu);
 #endif
 
 int kvm_arch_hardware_enable(void);
@@ -1110,33 +1110,14 @@ static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
 	return kvm_is_error_hva(hva);
 }
 
-enum kvm_stat_kind {
-	KVM_STAT_VM,
-	KVM_STAT_VCPU,
-};
-
-struct kvm_stat_data {
-	struct kvm *kvm;
-	struct kvm_stats_debugfs_item *dbgfs_item;
-};
-
-struct kvm_stats_debugfs_item {
-	const char *name;
-	int offset;
-	enum kvm_stat_kind kind;
-	int mode;
-};
-
-#define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
-	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
-
 #define VM_STAT(n, x, ...) 													\
-	{ n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
+	{ n, offsetof(struct kvm, stat.x), STATSFS_U64, STATSFS_SUM, ## __VA_ARGS__ }
 #define VCPU_STAT(n, x, ...)												\
-	{ n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+	{ n, offsetof(struct kvm_vcpu, stat.x), STATSFS_U64, STATSFS_SUM, ## __VA_ARGS__ }
 
-extern struct kvm_stats_debugfs_item debugfs_entries[];
-extern struct dentry *kvm_debugfs_dir;
+extern struct statsfs_value statsfs_vcpu_entries[];
+extern struct statsfs_value statsfs_vm_entries[];
+extern struct statsfs_source *kvm_statsfs_dir;
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..7301f6cf4fcc 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -140,7 +140,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 }
 
-int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+int kvm_arch_create_vcpu_statsfs(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 74bdb7bf3295..4cb140371a84 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -25,6 +25,7 @@
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
 #include <linux/debugfs.h>
+#include <linux/statsfs.h>
 #include <linux/highmem.h>
 #include <linux/file.h>
 #include <linux/syscore_ops.h>
@@ -109,11 +110,8 @@ static struct kmem_cache *kvm_vcpu_cache;
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
 
-struct dentry *kvm_debugfs_dir;
-EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
-
-static int kvm_debugfs_num_entries;
-static const struct file_operations stat_fops_per_vm;
+struct statsfs_source *kvm_statsfs_dir;
+EXPORT_SYMBOL_GPL(kvm_statsfs_dir);
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
@@ -356,6 +354,8 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	statsfs_source_revoke(vcpu->statsfs_src);
+	statsfs_source_put(vcpu->statsfs_src);
 	kvm_arch_vcpu_destroy(vcpu);
 
 	/*
@@ -601,52 +601,27 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
 	kvfree(slots);
 }
 
-static void kvm_destroy_vm_debugfs(struct kvm *kvm)
+static void kvm_destroy_vm_statsfs(struct kvm *kvm)
 {
-	int i;
-
-	if (!kvm->debugfs_dentry)
-		return;
-
-	debugfs_remove_recursive(kvm->debugfs_dentry);
-
-	if (kvm->debugfs_stat_data) {
-		for (i = 0; i < kvm_debugfs_num_entries; i++)
-			kfree(kvm->debugfs_stat_data[i]);
-		kfree(kvm->debugfs_stat_data);
-	}
+	statsfs_source_remove_subordinate(kvm_statsfs_dir, kvm->statsfs_src);
+	statsfs_source_revoke(kvm->statsfs_src);
+	statsfs_source_put(kvm->statsfs_src);
 }
 
-static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
+static int kvm_create_vm_statsfs(struct kvm *kvm, int fd)
 {
 	char dir_name[ITOA_MAX_LEN * 2];
-	struct kvm_stat_data *stat_data;
-	struct kvm_stats_debugfs_item *p;
 
-	if (!debugfs_initialized())
+	if (!statsfs_initialized())
 		return 0;
 
 	snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
-	kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+	kvm->statsfs_src = statsfs_source_create(dir_name);
+	statsfs_source_add_subordinate(kvm_statsfs_dir, kvm->statsfs_src);
 
-	kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
-					 sizeof(*kvm->debugfs_stat_data),
-					 GFP_KERNEL_ACCOUNT);
-	if (!kvm->debugfs_stat_data)
-		return -ENOMEM;
+	statsfs_source_add_values(kvm->statsfs_src, statsfs_vm_entries, kvm);
 
-	for (p = debugfs_entries; p->name; p++) {
-		stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL_ACCOUNT);
-		if (!stat_data)
-			return -ENOMEM;
-
-		stat_data->kvm = kvm;
-		stat_data->dbgfs_item = p;
-		kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
-		debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
-				    kvm->debugfs_dentry, stat_data,
-				    &stat_fops_per_vm);
-	}
+	statsfs_source_add_values(kvm->statsfs_src, statsfs_vcpu_entries, NULL);
 	return 0;
 }
 
@@ -783,7 +758,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	struct mm_struct *mm = kvm->mm;
 
 	kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
-	kvm_destroy_vm_debugfs(kvm);
+	kvm_destroy_vm_statsfs(kvm);
 	kvm_arch_sync_events(kvm);
 	mutex_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
@@ -2946,7 +2921,6 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
 
-	debugfs_remove_recursive(vcpu->debugfs_dentry);
 	kvm_put_kvm(vcpu->kvm);
 	return 0;
 }
@@ -2970,19 +2944,22 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 	return anon_inode_getfd(name, &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
 }
 
-static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+static void kvm_create_vcpu_statsfs(struct kvm_vcpu *vcpu)
 {
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
 	char dir_name[ITOA_MAX_LEN * 2];
 
-	if (!debugfs_initialized())
+	if (!statsfs_initialized())
 		return;
 
 	snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
-	vcpu->debugfs_dentry = debugfs_create_dir(dir_name,
-						  vcpu->kvm->debugfs_dentry);
 
-	kvm_arch_create_vcpu_debugfs(vcpu);
+	vcpu->statsfs_src = statsfs_source_create(dir_name);
+	statsfs_source_add_subordinate(vcpu->kvm->statsfs_src, vcpu->statsfs_src);
+
+	statsfs_source_add_values(vcpu->statsfs_src, statsfs_vcpu_entries, vcpu);
+
+#ifdef __KVM_HAVE_ARCH_VCPU_STATSFS
+	kvm_arch_create_vcpu_statsfs(vcpu);
 #endif
 }
 
@@ -3031,8 +3008,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	if (r)
 		goto vcpu_free_run_page;
 
-	kvm_create_vcpu_debugfs(vcpu);
-
 	mutex_lock(&kvm->lock);
 	if (kvm_get_vcpu_by_id(kvm, id)) {
 		r = -EEXIST;
@@ -3061,11 +3036,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_postcreate(vcpu);
+	kvm_create_vcpu_statsfs(vcpu);
 	return r;
 
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
-	debugfs_remove_recursive(vcpu->debugfs_dentry);
 	kvm_arch_vcpu_destroy(vcpu);
 vcpu_free_run_page:
 	free_page((unsigned long)vcpu->run);
@@ -3839,7 +3814,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 	 * cases it will be called by the final fput(file) and will take
 	 * care of doing kvm_put_kvm(kvm).
 	 */
-	if (kvm_create_vm_debugfs(kvm, r) < 0) {
+	if (kvm_create_vm_statsfs(kvm, r) < 0) {
 		put_unused_fd(r);
 		fput(file);
 		return -ENOMEM;
@@ -4295,214 +4270,6 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 }
 EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
 
-static int kvm_debugfs_open(struct inode *inode, struct file *file,
-			   int (*get)(void *, u64 *), int (*set)(void *, u64),
-			   const char *fmt)
-{
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
-					  inode->i_private;
-
-	/* The debugfs files are a reference to the kvm struct which
-	 * is still valid when kvm_destroy_vm is called.
-	 * To avoid the race between open and the removal of the debugfs
-	 * directory we test against the users count.
-	 */
-	if (!refcount_inc_not_zero(&stat_data->kvm->users_count))
-		return -ENOENT;
-
-	if (simple_attr_open(inode, file, get,
-		    KVM_DBGFS_GET_MODE(stat_data->dbgfs_item) & 0222
-		    ? set : NULL,
-		    fmt)) {
-		kvm_put_kvm(stat_data->kvm);
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-static int kvm_debugfs_release(struct inode *inode, struct file *file)
-{
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
-					  inode->i_private;
-
-	simple_attr_release(inode, file);
-	kvm_put_kvm(stat_data->kvm);
-
-	return 0;
-}
-
-static int kvm_get_stat_per_vm(struct kvm *kvm, size_t offset, u64 *val)
-{
-	*val = *(ulong *)((void *)kvm + offset);
-
-	return 0;
-}
-
-static int kvm_clear_stat_per_vm(struct kvm *kvm, size_t offset)
-{
-	*(ulong *)((void *)kvm + offset) = 0;
-
-	return 0;
-}
-
-static int kvm_get_stat_per_vcpu(struct kvm *kvm, size_t offset, u64 *val)
-{
-	int i;
-	struct kvm_vcpu *vcpu;
-
-	*val = 0;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		*val += *(u64 *)((void *)vcpu + offset);
-
-	return 0;
-}
-
-static int kvm_clear_stat_per_vcpu(struct kvm *kvm, size_t offset)
-{
-	int i;
-	struct kvm_vcpu *vcpu;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		*(u64 *)((void *)vcpu + offset) = 0;
-
-	return 0;
-}
-
-static int kvm_stat_data_get(void *data, u64 *val)
-{
-	int r = -EFAULT;
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
-	switch (stat_data->dbgfs_item->kind) {
-	case KVM_STAT_VM:
-		r = kvm_get_stat_per_vm(stat_data->kvm,
-					stat_data->dbgfs_item->offset, val);
-		break;
-	case KVM_STAT_VCPU:
-		r = kvm_get_stat_per_vcpu(stat_data->kvm,
-					  stat_data->dbgfs_item->offset, val);
-		break;
-	}
-
-	return r;
-}
-
-static int kvm_stat_data_clear(void *data, u64 val)
-{
-	int r = -EFAULT;
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
-	if (val)
-		return -EINVAL;
-
-	switch (stat_data->dbgfs_item->kind) {
-	case KVM_STAT_VM:
-		r = kvm_clear_stat_per_vm(stat_data->kvm,
-					  stat_data->dbgfs_item->offset);
-		break;
-	case KVM_STAT_VCPU:
-		r = kvm_clear_stat_per_vcpu(stat_data->kvm,
-					    stat_data->dbgfs_item->offset);
-		break;
-	}
-
-	return r;
-}
-
-static int kvm_stat_data_open(struct inode *inode, struct file *file)
-{
-	__simple_attr_check_format("%llu\n", 0ull);
-	return kvm_debugfs_open(inode, file, kvm_stat_data_get,
-				kvm_stat_data_clear, "%llu\n");
-}
-
-static const struct file_operations stat_fops_per_vm = {
-	.owner = THIS_MODULE,
-	.open = kvm_stat_data_open,
-	.release = kvm_debugfs_release,
-	.read = simple_attr_read,
-	.write = simple_attr_write,
-	.llseek = no_llseek,
-};
-
-static int vm_stat_get(void *_offset, u64 *val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-	u64 tmp_val;
-
-	*val = 0;
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_get_stat_per_vm(kvm, offset, &tmp_val);
-		*val += tmp_val;
-	}
-	mutex_unlock(&kvm_lock);
-	return 0;
-}
-
-static int vm_stat_clear(void *_offset, u64 val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-
-	if (val)
-		return -EINVAL;
-
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_clear_stat_per_vm(kvm, offset);
-	}
-	mutex_unlock(&kvm_lock);
-
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, vm_stat_clear, "%llu\n");
-
-static int vcpu_stat_get(void *_offset, u64 *val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-	u64 tmp_val;
-
-	*val = 0;
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_get_stat_per_vcpu(kvm, offset, &tmp_val);
-		*val += tmp_val;
-	}
-	mutex_unlock(&kvm_lock);
-	return 0;
-}
-
-static int vcpu_stat_clear(void *_offset, u64 val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-
-	if (val)
-		return -EINVAL;
-
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_clear_stat_per_vcpu(kvm, offset);
-	}
-	mutex_unlock(&kvm_lock);
-
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, vcpu_stat_clear,
-			"%llu\n");
-
-static const struct file_operations *stat_fops[] = {
-	[KVM_STAT_VCPU] = &vcpu_stat_fops,
-	[KVM_STAT_VM]   = &vm_stat_fops,
-};
-
 static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 {
 	struct kobj_uevent_env *env;
@@ -4537,34 +4304,32 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 	}
 	add_uevent_var(env, "PID=%d", kvm->userspace_pid);
 
-	if (!IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+	if (!IS_ERR_OR_NULL(kvm->statsfs_src->source_dentry)) {
 		char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
 
 		if (p) {
-			tmp = dentry_path_raw(kvm->debugfs_dentry, p, PATH_MAX);
+			tmp = dentry_path_raw(kvm->statsfs_src->source_dentry, p, PATH_MAX);
 			if (!IS_ERR(tmp))
 				add_uevent_var(env, "STATS_PATH=%s", tmp);
 			kfree(p);
 		}
 	}
+
 	/* no need for checks, since we are adding at most only 5 keys */
 	env->envp[env->envp_idx++] = NULL;
 	kobject_uevent_env(&kvm_dev.this_device->kobj, KOBJ_CHANGE, env->envp);
 	kfree(env);
 }
 
-static void kvm_init_debug(void)
+static void kvm_init_statsfs(void)
 {
-	struct kvm_stats_debugfs_item *p;
+	kvm_statsfs_dir = statsfs_source_create("kvm");
+	/* symlink to debugfs */
+	debugfs_create_symlink("kvm", NULL, "/sys/kernel/statsfs/kvm");
+	statsfs_source_register(kvm_statsfs_dir);
 
-	kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
-
-	kvm_debugfs_num_entries = 0;
-	for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
-		debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
-				    kvm_debugfs_dir, (void *)(long)p->offset,
-				    stat_fops[p->kind]);
-	}
+	statsfs_source_add_values(kvm_statsfs_dir, statsfs_vcpu_entries, NULL);
+	statsfs_source_add_values(kvm_statsfs_dir, statsfs_vm_entries, NULL);
 }
 
 static int kvm_suspend(void)
@@ -4738,7 +4503,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	kvm_preempt_ops.sched_in = kvm_sched_in;
 	kvm_preempt_ops.sched_out = kvm_sched_out;
 
-	kvm_init_debug();
+	kvm_init_statsfs();
 
 	r = kvm_vfio_ops_init();
 	WARN_ON(r);
@@ -4767,7 +4532,8 @@ EXPORT_SYMBOL_GPL(kvm_init);
 
 void kvm_exit(void)
 {
-	debugfs_remove_recursive(kvm_debugfs_dir);
+	statsfs_source_revoke(kvm_statsfs_dir);
+	statsfs_source_put(kvm_statsfs_dir);
 	misc_deregister(&kvm_dev);
 	kmem_cache_destroy(kvm_vcpu_cache);
 	kvm_async_pf_deinit();
-- 
2.25.2



* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
@ 2020-04-27 15:47   ` Matthew Wilcox
  2020-04-27 16:48     ` Emanuele Giuseppe Esposito
  2020-04-29  9:49     ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs Emanuele Giuseppe Esposito
  2020-04-27 21:53   ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Andreas Dilger
  2020-04-28 17:47   ` Randy Dunlap
  2 siblings, 2 replies; 17+ messages in thread
From: Matthew Wilcox @ 2020-04-27 15:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, linux-fsdevel, mst, borntraeger, Paolo Bonzini

On Mon, Apr 27, 2020 at 04:18:13PM +0200, Emanuele Giuseppe Esposito wrote:
> +static struct statsfs_value *find_value(struct statsfs_value_source *src,
> +					struct statsfs_value *val)
> +{
> +	struct statsfs_value *entry;
> +
> +	for (entry = src->values; entry->name; entry++) {
> +		if (entry == val) {
> +			WARN_ON(strcmp(entry->name, val->name) != 0);

Umm.  'entry' and 'val' are pointers.  So if entry is equal to val,
how could entry->name and val->name not be the same thing?
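
To illustrate the point, a small userspace sketch (the demo_* names are
hypothetical, not taken from the patch): a pointer comparison is an identity
test, so a match already implies the same name field, while a different
object with an equal name does not match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of a value descriptor. */
struct demo_value {
	const char *name;
};

/*
 * Return true when val points at an element of the table, which is
 * terminated by an entry whose name is NULL.
 */
static bool demo_contains(const struct demo_value *table,
			  const struct demo_value *val)
{
	const struct demo_value *entry;

	for (entry = table; entry->name; entry++) {
		/*
		 * entry == val means both refer to the same object, so
		 * entry->name and val->name are the same field; comparing
		 * the strings afterwards cannot fail.
		 */
		if (entry == val)
			return true;
	}
	return false;
}
```

A lookup by name would need strcmp() instead of the pointer comparison; the
two checks answer different questions and do not need to be combined.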

> +/* Called with rwsem held for writing */
> +static struct statsfs_value_source *create_value_source(void *base)
> +{
> +	struct statsfs_value_source *val_src;
> +
> +	val_src = kzalloc(sizeof(struct statsfs_value_source), GFP_KERNEL);
> +	if (!val_src)
> +		return ERR_PTR(-ENOMEM);
> +
> +	val_src->base_addr = base;
> +	val_src->list_element =
> +		(struct list_head)LIST_HEAD_INIT(val_src->list_element);

This is not how LIST_HEAD_INIT is generally used, but see below.
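
For reference, a userspace sketch of the two initializers. The definitions
below are simplified re-implementations written for this example and are only
assumed to mirror the behaviour of the kernel's list helpers; the
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copy of a circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Static initializer: meant for a list head defined at compile time. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Run-time initializer: meant for a node embedded in an allocated object. */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert new_entry right after head. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
	new_entry->next = head->next;
	new_entry->prev = head;
	head->next->prev = new_entry;
	head->next = new_entry;
}

/* Hypothetical object with an embedded list node. */
struct demo_value_source {
	void *base_addr;
	struct list_head list_element;
};

/* Statically defined head: the usual place for LIST_HEAD_INIT. */
static struct list_head demo_values_head = LIST_HEAD_INIT(demo_values_head);
```

For an object obtained at run time, calling
INIT_LIST_HEAD(&val_src->list_element) sets both links to point at the node
itself, with no cast or struct assignment needed.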

> +int statsfs_source_add_values(struct statsfs_source *source,
> +			      struct statsfs_value *stat, void *ptr)
> +{
> +	struct statsfs_value_source *val_src;
> +	struct statsfs_value_source *entry;
> +
> +	down_write(&source->rwsem);
> +
> +	list_for_each_entry(entry, &source->values_head, list_element) {
> +		if (entry->base_addr == ptr && entry->values == stat) {
> +			up_write(&source->rwsem);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	val_src = create_value_source(ptr);
> +	val_src->values = (struct statsfs_value *)stat;
> +
> +	/* add the val_src to the source list */
> +	list_add(&val_src->list_element, &source->values_head);
> +
> +	up_write(&source->rwsem);

I dislike this use of doubly linked lists.  I would suggest using an
allocating XArray to store your values.  Something like this:

+int statsfs_source_add_values(struct statsfs_source *source,
+			      struct statsfs_value *stat, void *ptr)
+{
+	struct statsfs_value_source *entry, *val_src;
+	unsigned long index;
+	int err = -EEXIST;
+
+	val_src = create_value_source(ptr);
+	val_src->values = stat;
+
+	xa_lock(&source->values);
+	xa_for_each(&source->values, index, entry) {
+		if (entry->base_addr == ptr && entry->values == stat)
+			goto out;
+	}
+
+	err = __xa_alloc(&source->values, &val_src->id, val_src, xa_limit_32b,
+			GFP_KERNEL);
+out:
+	xa_unlock(&source->values);
+	if (err)
+		kfree(val_src);
+	return err;
+}

Using an XArray avoids the occasional latency problems you can see with
rwsems, as well as being more cache-effective than a linked list.
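
As a rough userspace analogy of the "check for a duplicate, then store at the
first free index" pattern shown above: the sketch below uses a small fixed
array and hypothetical demo_* names. It does not use or reproduce the XArray
API, its locking, or its memory behaviour; returning -EBUSY when no slot is
free is a choice made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_SLOTS 4

/* Hypothetical value source: a base pointer, a value table and an id. */
struct demo_value_source {
	void *base_addr;
	const void *values;
	unsigned int id;
};

/* Hypothetical source holding its value sources in indexed slots. */
struct demo_source {
	struct demo_value_source *slots[DEMO_SLOTS];
};

/*
 * Store val_src in the first free slot and record that index in
 * val_src->id. Returns -EEXIST if an entry with the same base address
 * and value table is already present, -EBUSY if every slot is taken.
 */
static int demo_add_values(struct demo_source *src,
			   struct demo_value_source *val_src)
{
	size_t i, free_slot = DEMO_SLOTS;

	for (i = 0; i < DEMO_SLOTS; i++) {
		struct demo_value_source *entry = src->slots[i];

		if (!entry) {
			/* Remember only the lowest free index. */
			if (free_slot == DEMO_SLOTS)
				free_slot = i;
			continue;
		}
		if (entry->base_addr == val_src->base_addr &&
		    entry->values == val_src->values)
			return -EEXIST;
	}

	if (free_slot == DEMO_SLOTS)
		return -EBUSY;

	val_src->id = (unsigned int)free_slot;
	src->slots[free_slot] = val_src;
	return 0;
}
```

The caller keeps ownership of val_src on failure, so it can free the object
when a negative value is returned, as the example above does with kfree().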



* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 15:47   ` Matthew Wilcox
@ 2020-04-27 16:48     ` Emanuele Giuseppe Esposito
  2020-04-29  9:49     ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs Emanuele Giuseppe Esposito
  1 sibling, 0 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-27 16:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: kvm, linux-fsdevel, mst, borntraeger, Paolo Bonzini



On 4/27/20 5:47 PM, Matthew Wilcox wrote:
> On Mon, Apr 27, 2020 at 04:18:13PM +0200, Emanuele Giuseppe Esposito wrote:
>> +static struct statsfs_value *find_value(struct statsfs_value_source *src,
>> +					struct statsfs_value *val)
>> +{
>> +	struct statsfs_value *entry;
>> +
>> +	for (entry = src->values; entry->name; entry++) {
>> +		if (entry == val) {
>> +			WARN_ON(strcmp(entry->name, val->name) != 0);
> 
> Umm.  'entry' and 'val' are pointers.  So if entry is equal to val,
> how could entry->name and val->name not be the same thing?

Good catch, I'll get rid of that check.
> 
> 
> +int statsfs_source_add_values(struct statsfs_source *source,
> +			      struct statsfs_value *stat, void *ptr)
> +{
> +	struct statsfs_value_source *entry, *val_src;
> +	unsigned long index;
> +	int err = -EEXIST;
> +
> +	val_src = create_value_source(ptr);
> +	val_src->values = stat;
> +
> +	xa_lock(&source->values);
> +	xa_for_each(&source->values, index, entry) {
> +		if (entry->base_addr == ptr && entry->values == stat)
> +			goto out;
> +	}
> +
> +	err = __xa_alloc(&source->values, &val_src->id, val_src, xa_limit_32b,
> +			GFP_KERNEL);
> +out:
> +	xa_unlock(&source->values);
> +	if (err)
> +		kfree(val_src);
> +	return err;
> +}
> 
> Using an XArray avoids the occasional latency problems you can see with
> rwsems, as well as being more cache-effective than a linked list.

I didn't know about XArrays; I'll take a look at them. I will also fix
the list initialization by using INIT_LIST_HEAD.

Thank you for the example above, but I don't think any source will have
more than 2 or 3 value_sources, so a linked list should be fine there.
An XArray might be a good fit for the subordinates list, though.

Regarding the locking, the rwsem also protects the other lists and the
dentry of the source, so it couldn't be removed anyway.
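
To make the linked-list variant concrete, here is a rough userspace
sketch of the list initialization and the duplicate check. It imitates
the kernel's circular list_head by hand; the names init_list_head,
list_add_front, value_source and add_values are made up for this
example, and locking is left out entirely:

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace imitation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points at itself; INIT_LIST_HEAD() does the same. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head, as list_add() does. */
static void list_add_front(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

struct value_source {
	struct list_head list_element;	/* first member, so a cast recovers the entry */
	void *base_addr;
	const void *values;
};

/* Returns 0 on success, -EEXIST for a duplicate, -ENOMEM if allocation fails. */
static int add_values(struct list_head *head, const void *values, void *base)
{
	struct list_head *it;
	struct value_source *entry;

	/* With 2 or 3 entries per source, a linear scan is cheap. */
	for (it = head->next; it != head; it = it->next) {
		entry = (struct value_source *)it;
		if (entry->base_addr == base && entry->values == values)
			return -EEXIST;
	}

	entry = calloc(1, sizeof(*entry));
	if (!entry)
		return -ENOMEM;

	init_list_head(&entry->list_element);
	entry->base_addr = base;
	entry->values = values;
	list_add_front(&entry->list_element, head);
	return 0;
}
```

Unlike the posted patch, this sketch checks the allocation result
before touching the new entry.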

Thank you,

Emanuele


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
  2020-04-27 15:47   ` Matthew Wilcox
@ 2020-04-27 21:53   ` Andreas Dilger
  2020-04-29 10:55     ` Emanuele Giuseppe Esposito
  2020-04-28 17:47   ` Randy Dunlap
  2 siblings, 1 reply; 17+ messages in thread
From: Andreas Dilger @ 2020-04-27 21:53 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, linux-fsdevel, mst, borntraeger, Paolo Bonzini

On Apr 27, 2020, at 8:18 AM, Emanuele Giuseppe Esposito <eesposit@redhat.com> wrote:
> 
> Introduce the statsfs API, which makes it easy to create, add
> and remove statsfs sources and values.

Not a huge issue, but IMHO the "statsfs" name is confusingly similar to
the existing "statfs" function name.  Could you name this interface
something more distinct?  Even "fs_stats" or "stats_fs" or similar would
at least be visibly different.

Cheers, Andreas

> The API makes it easy to build
> the statistics directory tree and to gather statistics automatically for
> the Linux kernel. The main functionalities are: create a source, add child
> sources/values/aggregates, register it to the root source (that on
> the virtual fs would be /sys/kernel/statsfs), and perform a search for
> a value/aggregate.
> 
> This allows creating any kind of source tree, which also makes it easier
> to adapt to future changes.
> 
> The API representation is only logical and will be backed
> by a virtual file system in patch 4.
> Its usage will be shared between the statsfs file system
> and end users like kvm: the former calls it when it needs to
> display and clear statistics, the latter to add values and sources.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> fs/Kconfig              |   7 +
> fs/Makefile             |   1 +
> fs/statsfs/Makefile     |   4 +
> fs/statsfs/internal.h   |  20 ++
> fs/statsfs/statsfs.c    | 618 ++++++++++++++++++++++++++++++++++++++++
> include/linux/statsfs.h | 222 +++++++++++++++
> 6 files changed, 872 insertions(+)
> create mode 100644 fs/statsfs/Makefile
> create mode 100644 fs/statsfs/internal.h
> create mode 100644 fs/statsfs/statsfs.c
> create mode 100644 include/linux/statsfs.h
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index f08fbbfafd9a..824fcf86d12b 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -328,4 +328,11 @@ source "fs/unicode/Kconfig"
> config IO_WQ
> 	bool
> 
> +config STATS_FS
> +	bool "Statistics Filesystem"
> +	default y
> +	help
> +	  statsfs is a virtual file system that provides counters and other
> +	  statistics about the running kernel.
> +
> endmenu
> diff --git a/fs/Makefile b/fs/Makefile
> index 2ce5112b02c8..6942070f54b2 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)		+= befs/
> obj-$(CONFIG_HOSTFS)		+= hostfs/
> obj-$(CONFIG_CACHEFILES)	+= cachefiles/
> obj-$(CONFIG_DEBUG_FS)		+= debugfs/
> +obj-$(CONFIG_STATS_FS)		+= statsfs/
> obj-$(CONFIG_TRACING)		+= tracefs/
> obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
> obj-$(CONFIG_BTRFS_FS)		+= btrfs/
> diff --git a/fs/statsfs/Makefile b/fs/statsfs/Makefile
> new file mode 100644
> index 000000000000..d494a3f30ba5
> --- /dev/null
> +++ b/fs/statsfs/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +statsfs-objs	:= statsfs.o
> +
> +obj-$(CONFIG_STATS_FS)	+= statsfs.o
> diff --git a/fs/statsfs/internal.h b/fs/statsfs/internal.h
> new file mode 100644
> index 000000000000..f124683a2ded
> --- /dev/null
> +++ b/fs/statsfs/internal.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _STATSFS_INTERNAL_H_
> +#define _STATSFS_INTERNAL_H_
> +
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/rwsem.h>
> +#include <linux/statsfs.h>
> +
> +/* values, grouped by base */
> +struct statsfs_value_source {
> +	void *base_addr;
> +	bool files_created;
> +	struct statsfs_value *values;
> +	struct list_head list_element;
> +};
> +
> +int statsfs_val_get_mode(struct statsfs_value *val);
> +
> +#endif /* _STATSFS_INTERNAL_H_ */
> diff --git a/fs/statsfs/statsfs.c b/fs/statsfs/statsfs.c
> new file mode 100644
> index 000000000000..0ad1d985be46
> --- /dev/null
> +++ b/fs/statsfs/statsfs.c
> @@ -0,0 +1,618 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/module.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/slab.h>
> +#include <linux/rwsem.h>
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/limits.h>
> +#include <linux/statsfs.h>
> +
> +#include "internal.h"
> +
> +struct statsfs_aggregate_value {
> +	uint64_t sum, min, max;
> +	uint32_t count, count_zero;
> +};
> +
> +static int is_val_signed(struct statsfs_value *val)
> +{
> +	return val->type & STATSFS_SIGN;
> +}
> +
> +int statsfs_val_get_mode(struct statsfs_value *val)
> +{
> +	return val->mode ? val->mode : 0644;
> +}
> +
> +static struct statsfs_value *find_value(struct statsfs_value_source *src,
> +					struct statsfs_value *val)
> +{
> +	struct statsfs_value *entry;
> +
> +	for (entry = src->values; entry->name; entry++) {
> +		if (entry == val) {
> +			WARN_ON(strcmp(entry->name, val->name) != 0);
> +			return entry;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +static struct statsfs_value *
> +search_value_in_source(struct statsfs_source *src, struct statsfs_value *arg,
> +		       struct statsfs_value_source **val_src)
> +{
> +	struct statsfs_value *entry;
> +	struct statsfs_value_source *src_entry;
> +
> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
> +		entry = find_value(src_entry, arg);
> +		if (entry) {
> +			*val_src = src_entry;
> +			return entry;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Called with rwsem held for writing */
> +static struct statsfs_value_source *create_value_source(void *base)
> +{
> +	struct statsfs_value_source *val_src;
> +
> +	val_src = kzalloc(sizeof(struct statsfs_value_source), GFP_KERNEL);
> +	if (!val_src)
> +		return ERR_PTR(-ENOMEM);
> +
> +	val_src->base_addr = base;
> +	val_src->list_element =
> +		(struct list_head)LIST_HEAD_INIT(val_src->list_element);
> +
> +	return val_src;
> +}
> +
> +int statsfs_source_add_values(struct statsfs_source *source,
> +			      struct statsfs_value *stat, void *ptr)
> +{
> +	struct statsfs_value_source *val_src;
> +	struct statsfs_value_source *entry;
> +
> +	down_write(&source->rwsem);
> +
> +	list_for_each_entry(entry, &source->values_head, list_element) {
> +		if (entry->base_addr == ptr && entry->values == stat) {
> +			up_write(&source->rwsem);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	val_src = create_value_source(ptr);
> +	val_src->values = (struct statsfs_value *)stat;
> +
> +	/* add the val_src to the source list */
> +	list_add(&val_src->list_element, &source->values_head);
> +
> +	up_write(&source->rwsem);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_add_values);
> +
> +void statsfs_source_add_subordinate(struct statsfs_source *source,
> +				    struct statsfs_source *sub)
> +{
> +	down_write(&source->rwsem);
> +
> +	statsfs_source_get(sub);
> +	list_add(&sub->list_element, &source->subordinates_head);
> +
> +	up_write(&source->rwsem);
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_add_subordinate);
> +
> +/* Called with rwsem held for writing */
> +static void
> +statsfs_source_remove_subordinate_locked(struct statsfs_source *source,
> +					 struct statsfs_source *sub)
> +{
> +	struct list_head *it, *safe;
> +	struct statsfs_source *src_entry;
> +
> +	list_for_each_safe(it, safe, &source->subordinates_head) {
> +		src_entry = list_entry(it, struct statsfs_source, list_element);
> +		if (src_entry == sub) {
> +			WARN_ON(strcmp(src_entry->name, sub->name) != 0);
> +			list_del_init(&src_entry->list_element);
> +			statsfs_source_put(src_entry);
> +			return;
> +		}
> +	}
> +}
> +
> +void statsfs_source_remove_subordinate(struct statsfs_source *source,
> +				       struct statsfs_source *sub)
> +{
> +	down_write(&source->rwsem);
> +	statsfs_source_remove_subordinate_locked(source, sub);
> +	up_write(&source->rwsem);
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_remove_subordinate);
> +
> +/* Called with rwsem held for reading */
> +static uint64_t get_simple_value(struct statsfs_value_source *src,
> +				 struct statsfs_value *val)
> +{
> +	uint64_t value_found;
> +	void *address;
> +
> +	address = src->base_addr + val->offset;
> +
> +	switch (val->type) {
> +	case STATSFS_U8:
> +		value_found = *((uint8_t *)address);
> +		break;
> +	case STATSFS_U8 | STATSFS_SIGN:
> +		value_found = *((int8_t *)address);
> +		break;
> +	case STATSFS_U16:
> +		value_found = *((uint16_t *)address);
> +		break;
> +	case STATSFS_U16 | STATSFS_SIGN:
> +		value_found = *((int16_t *)address);
> +		break;
> +	case STATSFS_U32:
> +		value_found = *((uint32_t *)address);
> +		break;
> +	case STATSFS_U32 | STATSFS_SIGN:
> +		value_found = *((int32_t *)address);
> +		break;
> +	case STATSFS_U64:
> +		value_found = *((uint64_t *)address);
> +		break;
> +	case STATSFS_U64 | STATSFS_SIGN:
> +		value_found = *((int64_t *)address);
> +		break;
> +	case STATSFS_BOOL:
> +		value_found = *((uint8_t *)address);
> +		break;
> +	default:
> +		value_found = 0;
> +		break;
> +	}
> +
> +	return value_found;
> +}
> +
> +/* Called with rwsem held for reading */
> +static void clear_simple_value(struct statsfs_value_source *src,
> +			       struct statsfs_value *val)
> +{
> +	void *address;
> +
> +	address = src->base_addr + val->offset;
> +
> +	switch (val->type) {
> +	case STATSFS_U8:
> +		*((uint8_t *)address) = 0;
> +		break;
> +	case STATSFS_U8 | STATSFS_SIGN:
> +		*((int8_t *)address) = 0;
> +		break;
> +	case STATSFS_U16:
> +		*((uint16_t *)address) = 0;
> +		break;
> +	case STATSFS_U16 | STATSFS_SIGN:
> +		*((int16_t *)address) = 0;
> +		break;
> +	case STATSFS_U32:
> +		*((uint32_t *)address) = 0;
> +		break;
> +	case STATSFS_U32 | STATSFS_SIGN:
> +		*((int32_t *)address) = 0;
> +		break;
> +	case STATSFS_U64:
> +		*((uint64_t *)address) = 0;
> +		break;
> +	case STATSFS_U64 | STATSFS_SIGN:
> +		*((int64_t *)address) = 0;
> +		break;
> +	case STATSFS_BOOL:
> +		*((uint8_t *)address) = 0;
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static void search_all_simple_values(struct statsfs_source *src,
> +				     struct statsfs_value_source *ref_src_entry,
> +				     struct statsfs_value *val,
> +				     struct statsfs_aggregate_value *agg)
> +{
> +	struct statsfs_value_source *src_entry;
> +	uint64_t value_found;
> +
> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
> +		/* skip aggregates */
> +		if (src_entry->base_addr == NULL)
> +			continue;
> +
> +		/* useless to search here */
> +		if (src_entry->values != ref_src_entry->values)
> +			continue;
> +
> +		/* must be here */
> +		value_found = get_simple_value(src_entry, val);
> +		agg->sum += value_found;
> +		agg->count++;
> +		agg->count_zero += (value_found == 0);
> +
> +		if (is_val_signed(val)) {
> +			agg->max = (((int64_t)value_found) >=
> +				    ((int64_t)agg->max)) ?
> +					   value_found :
> +					   agg->max;
> +			agg->min = (((int64_t)value_found) <=
> +				    ((int64_t)agg->min)) ?
> +					   value_found :
> +					   agg->min;
> +		} else {
> +			agg->max = (value_found >= agg->max) ? value_found :
> +							       agg->max;
> +			agg->min = (value_found <= agg->min) ? value_found :
> +							       agg->min;
> +		}
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static void do_recursive_aggregation(struct statsfs_source *root,
> +				     struct statsfs_value_source *ref_src_entry,
> +				     struct statsfs_value *val,
> +				     struct statsfs_aggregate_value *agg)
> +{
> +	struct statsfs_source *subordinate;
> +
> +	/* search all simple values in this folder */
> +	search_all_simple_values(root, ref_src_entry, val, agg);
> +
> +	/* recursively search in all subfolders */
> +	list_for_each_entry(subordinate, &root->subordinates_head,
> +			     list_element) {
> +		down_read(&subordinate->rwsem);
> +		do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
> +		up_read(&subordinate->rwsem);
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static void init_aggregate_value(struct statsfs_aggregate_value *agg,
> +				 struct statsfs_value *val)
> +{
> +	agg->count = agg->count_zero = agg->sum = 0;
> +	if (is_val_signed(val)) {
> +		agg->max = S64_MIN;
> +		agg->min = S64_MAX;
> +	} else {
> +		agg->max = 0;
> +		agg->min = U64_MAX;
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static void store_final_value(struct statsfs_aggregate_value *agg,
> +			    struct statsfs_value *val, uint64_t *ret)
> +{
> +	int operation;
> +
> +	operation = val->aggr_kind | is_val_signed(val);
> +
> +	switch (operation) {
> +	case STATSFS_AVG:
> +		*ret = agg->count ? agg->sum / agg->count : 0;
> +		break;
> +	case STATSFS_AVG | STATSFS_SIGN:
> +		*ret = agg->count ? ((int64_t)agg->sum) / agg->count : 0;
> +		break;
> +	case STATSFS_SUM:
> +	case STATSFS_SUM | STATSFS_SIGN:
> +		*ret = agg->sum;
> +		break;
> +	case STATSFS_MIN:
> +	case STATSFS_MIN | STATSFS_SIGN:
> +		*ret = agg->min;
> +		break;
> +	case STATSFS_MAX:
> +	case STATSFS_MAX | STATSFS_SIGN:
> +		*ret = agg->max;
> +		break;
> +	case STATSFS_COUNT_ZERO:
> +	case STATSFS_COUNT_ZERO | STATSFS_SIGN:
> +		*ret = agg->count_zero;
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static int statsfs_source_get_value_locked(struct statsfs_source *source,
> +					   struct statsfs_value *arg,
> +					   uint64_t *ret)
> +{
> +	struct statsfs_value_source *src_entry;
> +	struct statsfs_value *found;
> +	struct statsfs_aggregate_value aggr;
> +
> +	*ret = 0;
> +
> +	if (!arg)
> +		return -ENOENT;
> +
> +	/* look in simple values */
> +	found = search_value_in_source(source, arg, &src_entry);
> +
> +	if (!found) {
> +		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
> +		       source->name);
> +		return -ENOENT;
> +	}
> +
> +	if (src_entry->base_addr != NULL) {
> +		*ret = get_simple_value(src_entry, found);
> +		return 0;
> +	}
> +
> +	/* look in aggregates */
> +	init_aggregate_value(&aggr, found);
> +	do_recursive_aggregation(source, src_entry, found, &aggr);
> +	store_final_value(&aggr, found, ret);
> +
> +	return 0;
> +}
> +
> +int statsfs_source_get_value(struct statsfs_source *source,
> +			     struct statsfs_value *arg, uint64_t *ret)
> +{
> +	int retval;
> +
> +	down_read(&source->rwsem);
> +	retval = statsfs_source_get_value_locked(source, arg, ret);
> +	up_read(&source->rwsem);
> +
> +	return retval;
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_get_value);
> +
> +/* Called with rwsem held for reading */
> +static void set_all_simple_values(struct statsfs_source *src,
> +				  struct statsfs_value_source *ref_src_entry,
> +				  struct statsfs_value *val)
> +{
> +	struct statsfs_value_source *src_entry;
> +
> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
> +		/* skip aggregates */
> +		if (src_entry->base_addr == NULL)
> +			continue;
> +
> +		/* wrong to search here */
> +		if (src_entry->values != ref_src_entry->values)
> +			continue;
> +
> +		if (src_entry->base_addr &&
> +			src_entry->values == ref_src_entry->values)
> +			clear_simple_value(src_entry, val);
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static void do_recursive_clean(struct statsfs_source *root,
> +			       struct statsfs_value_source *ref_src_entry,
> +			       struct statsfs_value *val)
> +{
> +	struct statsfs_source *subordinate;
> +
> +	/* search all simple values in this folder */
> +	set_all_simple_values(root, ref_src_entry, val);
> +
> +	/* recursively search in all subfolders */
> +	list_for_each_entry(subordinate, &root->subordinates_head,
> +			     list_element) {
> +		down_read(&subordinate->rwsem);
> +		do_recursive_clean(subordinate, ref_src_entry, val);
> +		up_read(&subordinate->rwsem);
> +	}
> +}
> +
> +/* Called with rwsem held for reading */
> +static int statsfs_source_clear_locked(struct statsfs_source *source,
> +				       struct statsfs_value *val)
> +{
> +	struct statsfs_value_source *src_entry;
> +	struct statsfs_value *found;
> +
> +	if (!val)
> +		return -ENOENT;
> +
> +	/* look in simple values */
> +	found = search_value_in_source(source, val, &src_entry);
> +
> +	if (!found) {
> +		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
> +		       source->name);
> +		return -ENOENT;
> +	}
> +
> +	if (src_entry->base_addr != NULL) {
> +		clear_simple_value(src_entry, found);
> +		return 0;
> +	}
> +
> +	/* look in aggregates */
> +	do_recursive_clean(source, src_entry, found);
> +
> +	return 0;
> +}
> +
> +int statsfs_source_clear(struct statsfs_source *source,
> +			 struct statsfs_value *val)
> +{
> +	int retval;
> +
> +	down_read(&source->rwsem);
> +	retval = statsfs_source_clear_locked(source, val);
> +	up_read(&source->rwsem);
> +
> +	return retval;
> +}
> +
> +/* Called with rwsem held for reading */
> +static struct statsfs_value *
> +find_value_by_name(struct statsfs_value_source *src, char *val)
> +{
> +	struct statsfs_value *entry;
> +
> +	for (entry = src->values; entry->name; entry++)
> +		if (!strcmp(entry->name, val))
> +			return entry;
> +
> +	return NULL;
> +}
> +
> +/* Called with rwsem held for reading */
> +static struct statsfs_value *
> +search_in_source_by_name(struct statsfs_source *src, char *name)
> +{
> +	struct statsfs_value *entry;
> +	struct statsfs_value_source *src_entry;
> +
> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
> +		entry = find_value_by_name(src_entry, name);
> +		if (entry)
> +			return entry;
> +	}
> +
> +	return NULL;
> +}
> +
> +int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
> +				     uint64_t *ret)
> +{
> +	struct statsfs_value *val;
> +	int retval;
> +
> +	down_read(&source->rwsem);
> +	val = search_in_source_by_name(source, name);
> +
> +	if (!val) {
> +		*ret = 0;
> +		up_read(&source->rwsem);
> +		return -ENOENT;
> +	}
> +
> +	retval = statsfs_source_get_value_locked(source, val, ret);
> +	up_read(&source->rwsem);
> +
> +	return retval;
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_get_value_by_name);
> +
> +void statsfs_source_get(struct statsfs_source *source)
> +{
> +	kref_get(&source->refcount);
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_get);
> +
> +void statsfs_source_revoke(struct statsfs_source *source)
> +{
> +	struct list_head *it, *safe;
> +	struct statsfs_value_source *val_src_entry;
> +
> +	down_write(&source->rwsem);
> +
> +	list_for_each_safe(it, safe, &source->values_head) {
> +		val_src_entry = list_entry(it, struct statsfs_value_source,
> +					   list_element);
> +		val_src_entry->base_addr = NULL;
> +	}
> +
> +	up_write(&source->rwsem);
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_revoke);
> +
> +/* Called with rwsem held for writing
> + *
> + * The refcount is 0 and the lock was taken before refcount
> + * went from 1 to 0
> + */
> +static void statsfs_source_destroy(struct kref *kref_source)
> +{
> +	struct statsfs_value_source *val_src_entry;
> +	struct list_head *it, *safe;
> +	struct statsfs_source *child, *source;
> +
> +	source = container_of(kref_source, struct statsfs_source, refcount);
> +
> +	/* iterate through the values and delete them */
> +	list_for_each_safe(it, safe, &source->values_head) {
> +		val_src_entry = list_entry(it, struct statsfs_value_source,
> +					   list_element);
> +		kfree(val_src_entry);
> +	}
> +
> +	/* iterate through the subordinates and delete them */
> +	list_for_each_safe(it, safe, &source->subordinates_head) {
> +		child = list_entry(it, struct statsfs_source, list_element);
> +		statsfs_source_remove_subordinate_locked(source, child);
> +	}
> +
> +
> +	up_write(&source->rwsem);
> +	kfree(source->name);
> +	kfree(source);
> +}
> +
> +void statsfs_source_put(struct statsfs_source *source)
> +{
> +	kref_put_rwsem(&source->refcount, statsfs_source_destroy,
> +		       &source->rwsem);
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_put);
> +
> +struct statsfs_source *statsfs_source_create(const char *fmt, ...)
> +{
> +	va_list ap;
> +	char buf[100];
> +	struct statsfs_source *ret;
> +	int char_needed;
> +
> +	va_start(ap, fmt);
> +	char_needed = vsnprintf(buf, 100, fmt, ap);
> +	va_end(ap);
> +
> +	ret = kzalloc(sizeof(struct statsfs_source), GFP_KERNEL);
> +	if (!ret)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret->name = kstrdup(buf, GFP_KERNEL);
> +	if (!ret->name) {
> +		kfree(ret);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	kref_init(&ret->refcount);
> +	init_rwsem(&ret->rwsem);
> +
> +	INIT_LIST_HEAD(&ret->values_head);
> +	INIT_LIST_HEAD(&ret->subordinates_head);
> +	INIT_LIST_HEAD(&ret->list_element);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(statsfs_source_create);
> diff --git a/include/linux/statsfs.h b/include/linux/statsfs.h
> new file mode 100644
> index 000000000000..3f01f094946d
> --- /dev/null
> +++ b/include/linux/statsfs.h
> @@ -0,0 +1,222 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + *  statsfs.h - a tiny little statistics file system
> + *
> + *  Copyright (C) 2020 Emanuele Giuseppe Esposito
> + *  Copyright (C) 2020 Redhat.
> + *
> + */
> +
> +#ifndef _STATSFS_H_
> +#define _STATSFS_H_
> +
> +#include <linux/list.h>
> +
> +/* Used to distinguish signed types */
> +#define STATSFS_SIGN 0x8000
> +
> +struct statsfs_source;
> +
> +enum stat_type {
> +	STATSFS_U8 = 0,
> +	STATSFS_U16 = 1,
> +	STATSFS_U32 = 2,
> +	STATSFS_U64 = 3,
> +	STATSFS_BOOL = 4,
> +	STATSFS_S8 = STATSFS_U8 | STATSFS_SIGN,
> +	STATSFS_S16 = STATSFS_U16 | STATSFS_SIGN,
> +	STATSFS_S32 = STATSFS_U32 | STATSFS_SIGN,
> +	STATSFS_S64 = STATSFS_U64 | STATSFS_SIGN,
> +};
> +
> +enum stat_aggr {
> +	STATSFS_NONE = 0,
> +	STATSFS_SUM,
> +	STATSFS_MIN,
> +	STATSFS_MAX,
> +	STATSFS_COUNT_ZERO,
> +	STATSFS_AVG,
> +};
> +
> +struct statsfs_value {
> +	/* Name of the stat */
> +	char *name;
> +
> +	/* Offset from base address to field containing the value */
> +	int offset;
> +
> +	/* Type of the stat BOOL,U64,... */
> +	enum stat_type type;
> +
> +	/* Aggregate type: MIN, MAX, SUM,... */
> +	enum stat_aggr aggr_kind;
> +
> +	/* File mode */
> +	uint16_t mode;
> +};
> +
> +struct statsfs_source {
> +	struct kref refcount;
> +
> +	char *name;
> +
> +	/* list of struct statsfs_value_source */
> +	struct list_head values_head;
> +
> +	/* list of struct statsfs_source for subordinate sources */
> +	struct list_head subordinates_head;
> +
> +	struct list_head list_element;
> +
> +	struct rw_semaphore rwsem;
> +
> +	struct dentry *source_dentry;
> +};
> +
> +/**
> + * statsfs_source_create - create a statsfs_source
> + * Creates a statsfs_source with the given name. This
> + * does not mean it is backed by the filesystem yet: it only becomes
> + * visible to the user once it, or one of its parents, is
> + * registered in statsfs.
> + *
> + * Returns a pointer to a statsfs_source if it succeeds.
> + * This pointer, or that of one of its parents, must be passed to
> + * statsfs_source_put() when the file is to be removed.  If an error occurs,
> + * ERR_PTR(-ERROR) will be returned.
> + */
> +struct statsfs_source *statsfs_source_create(const char *fmt, ...);
> +
> +/**
> + * statsfs_source_add_values - adds values to the given source
> + * @source: a pointer to the source that will receive the values
> + * @val: a pointer to the NULL terminated statsfs_value array to add
> + * @base_ptr: a pointer to the base pointer used by these values
> + *
> + * In addition to adding values to the source, also create the
> + * files in the filesystem if the source is already backed by a directory.
> + *
> + * Returns 0 if it succeeds. If the values are already in the
> + * source and have the same base_ptr, -EEXIST is returned.
> + */
> +int statsfs_source_add_values(struct statsfs_source *source,
> +			      struct statsfs_value *val, void *base_ptr);
> +
> +/**
> + * statsfs_source_add_subordinate - adds a child to the given source
> + * @parent: a pointer to the parent source
> + * @child: a pointer to child source to add
> + *
> + * Recursively create all files in the statsfs filesystem
> + * only if the parent has already a dentry (created with
> + * statsfs_source_register).
> + * This avoids the case where this function is called before register.
> + */
> +void statsfs_source_add_subordinate(struct statsfs_source *parent,
> +				    struct statsfs_source *child);
> +
> +/**
> + * statsfs_source_remove_subordinate - removes a child from the given source
> + * @parent: a pointer to the parent source
> + * @child: a pointer to child source to remove
> + *
> + * Check whether the parent has such a child. If so,
> + * remove all its files and call statsfs_source_put on the child.
> + */
> +void statsfs_source_remove_subordinate(struct statsfs_source *parent,
> +				       struct statsfs_source *child);
> +
> +/**
> + * statsfs_source_get_value - search a value in the source (and
> + * subordinates)
> + * @source: a pointer to the source that will be searched
> + * @val: a pointer to the statsfs_value to search
> + * @ret: a pointer to the uint64_t that will hold the found value
> + *
> + * Check whether a value with the same value pointer exists in the
> + * source.
> + * If not, it will return -ENOENT. If it exists and it's a simple value
> + * (not an aggregate), the value that it points to will be returned.
> + * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
> + * subordinates will be recursively searched and every simple value match
> + * will be used to aggregate the final result. For example if it's a sum,
> + * all subordinates having the same value will be summed together.
> + *
> + * This function will return 0 if it succeeds.
> + */
> +int statsfs_source_get_value(struct statsfs_source *source,
> +			     struct statsfs_value *val, uint64_t *ret);
> +
> +/**
> + * statsfs_source_get_value_by_name - search a value in the source (and
> + * subordinates)
> + * @source: a pointer to the source that will be searched
> + * @name: a pointer to the string representing the value to search
> + *        (for example "exits")
> + * @ret: a pointer to the uint64_t that will hold the found value
> + *
> + * Same as statsfs_source_get_value, but initially the name is used
> + * to search in the given source if there is a value with a matching
> + * name. If so, statsfs_source_get_value will be called with the found
> + * value, otherwise -ENOENT will be returned.
> + */
> +int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
> +				     uint64_t *ret);
> +
> +/**
> + * statsfs_source_clear - search and clears a value in the source (and
> + * subordinates)
> + * @source: a pointer to the source that will be searched
> + * @val: a pointer to the statsfs_value to search
> + *
> + * Check whether a value with the same value pointer exists in the
> + * source.
> + * If not, it will return -ENOENT. If it exists and it's a simple value
> + * (not an aggregate), the value that it points to will be set to 0.
> + * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
> + * subordinates will be recursively searched and every simple value match
> + * will be set to 0.
> + *
> + * This function will return 0 if it succeeds.
> + */
> +int statsfs_source_clear(struct statsfs_source *source,
> +			 struct statsfs_value *val);
> +
> +/**
> + * statsfs_source_revoke - disconnect the source from its backing data
> + * @source: a pointer to the source that will be revoked
> + *
> + * Ensure that statsfs will not access the data that were passed to
> + * statsfs_source_add_values for this source.
> + *
> + * Because open files increase the reference count for a statsfs_source,
> + * the source can end up living longer than the data that provides the
> + * values for the source.  Calling statsfs_source_revoke just before the
> + * backing data is freed avoids accesses to freed data structures.  The
> + * sources will return 0.
> + */
> +void statsfs_source_revoke(struct statsfs_source *source);
> +
> +/**
> + * statsfs_source_get - increases refcount of source
> + * @source: a pointer to the source whose refcount will be increased
> + */
> +void statsfs_source_get(struct statsfs_source *source);
> +
> +/**
> + * statsfs_source_put - decreases refcount of source and deletes if needed
> + * @source: a pointer to the source whose refcount will be decreased
> + *
> + * If the refcount reaches zero, delete and free the source's
> + * resources and files, first by recursively calling
> + * statsfs_source_remove_subordinate on its children and then by deleting
> + * its own files and allocations.
> + */
> +void statsfs_source_put(struct statsfs_source *source);
> +
> +/**
> + * statsfs_initialized - returns true if statsfs fs has been registered
> + */
> +bool statsfs_initialized(void);
> +
> +#endif
> --
> 2.25.2
> 
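
As an aside, the aggregation semantics described in the
statsfs_source_get_value comment can be sketched in plain userspace C.
This is only an illustration under simplifying assumptions: it handles
unsigned values only, walks a flat array instead of the source tree,
and the names (aggr_kind, aggregate, aggregate_array, ...) are invented
for the example rather than taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

enum aggr_kind { AGGR_SUM, AGGR_MIN, AGGR_MAX, AGGR_COUNT_ZERO, AGGR_AVG };

struct aggregate {
	uint64_t sum, min, max;
	uint32_t count, count_zero;
};

/* Start min at the largest value and max at 0 so any sample replaces them. */
static void aggregate_init(struct aggregate *agg)
{
	agg->sum = 0;
	agg->count = 0;
	agg->count_zero = 0;
	agg->max = 0;
	agg->min = UINT64_MAX;
}

/* Fold one simple value into the running aggregate. */
static void aggregate_add(struct aggregate *agg, uint64_t v)
{
	agg->sum += v;
	agg->count++;
	agg->count_zero += (v == 0);
	if (v > agg->max)
		agg->max = v;
	if (v < agg->min)
		agg->min = v;
}

/* Pick the final result; the average of zero samples is reported as 0. */
static uint64_t aggregate_result(const struct aggregate *agg,
				 enum aggr_kind kind)
{
	switch (kind) {
	case AGGR_SUM:
		return agg->sum;
	case AGGR_MIN:
		return agg->min;
	case AGGR_MAX:
		return agg->max;
	case AGGR_COUNT_ZERO:
		return agg->count_zero;
	case AGGR_AVG:
		return agg->count ? agg->sum / agg->count : 0;
	}
	return 0;
}

static uint64_t aggregate_array(const uint64_t *vals, size_t n,
				enum aggr_kind kind)
{
	struct aggregate agg;
	size_t i;

	aggregate_init(&agg);
	for (i = 0; i < n; i++)
		aggregate_add(&agg, vals[i]);
	return aggregate_result(&agg, kind);
}
```

In the patch the same accumulation happens per matching value_source
while recursing through the subordinates, with a separate signed
comparison path for min and max.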


Cheers, Andreas







^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
  2020-04-27 15:47   ` Matthew Wilcox
  2020-04-27 21:53   ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Andreas Dilger
@ 2020-04-28 17:47   ` Randy Dunlap
  2020-04-29 10:34     ` Paolo Bonzini
  2 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2020-04-28 17:47 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini

On 4/27/20 7:18 AM, Emanuele Giuseppe Esposito wrote:
> diff --git a/fs/Kconfig b/fs/Kconfig
> index f08fbbfafd9a..824fcf86d12b 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -328,4 +328,11 @@ source "fs/unicode/Kconfig"
>  config IO_WQ
>  	bool
>  
> +config STATS_FS
> +	bool "Statistics Filesystem"
> +	default y

Not default y. We don't enable things that are not required.
Unless you have a convincing argument otherwise.

> +	help
> +	  statsfs is a virtual file system that provides counters and other
> +	  statistics about the running kernel.
> +
>  endmenu


-- 
~Randy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/5] kunit: tests for statsfs API
  2020-04-27 14:18 ` [RFC PATCH 3/5] kunit: tests for statsfs API Emanuele Giuseppe Esposito
@ 2020-04-28 17:50   ` Randy Dunlap
  0 siblings, 0 replies; 17+ messages in thread
From: Randy Dunlap @ 2020-04-28 17:50 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini

On 4/27/20 7:18 AM, Emanuele Giuseppe Esposito wrote:
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 824fcf86d12b..6145b607e0bc 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -335,4 +335,10 @@ config STATS_FS
>  	  statsfs is a virtual file system that provides counters and other
>  	  statistics about the running kernel.
>  
> +config STATS_FS_TEST
> +    bool "Tests for statsfs"
> +    depends on STATS_FS && KUNIT

The 2 lines above should be indented with one tab, not spaces.

> +	help
> +	  statsfs tests for the statsfs API.
> +
>  endmenu

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs
  2020-04-27 14:18 ` [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs Emanuele Giuseppe Esposito
@ 2020-04-28 17:56   ` Randy Dunlap
  2020-04-29 10:34     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2020-04-28 17:56 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, kvm
  Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini

On 4/27/20 7:18 AM, Emanuele Giuseppe Esposito wrote:
> Use the statsfs API instead of debugfs to create sources and add values.
>
> This also requires changing all architecture files to replace the old
> debugfs_entries with statsfs_vcpu_entries and statsfs_vm_entries.
>
> The file/folder names and organization are kept unchanged, and a symlink
> in /sys/kernel/debug/kvm is left for backward compatibility.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  arch/arm64/kvm/guest.c          |   2 +-
>  arch/mips/kvm/mips.c            |   2 +-
>  arch/powerpc/kvm/book3s.c       |   6 +-
>  arch/powerpc/kvm/booke.c        |   8 +-
>  arch/s390/kvm/kvm-s390.c        |  16 +-
>  arch/x86/include/asm/kvm_host.h |   2 +-
>  arch/x86/kvm/Makefile           |   2 +-
>  arch/x86/kvm/debugfs.c          |  64 -------
>  arch/x86/kvm/statsfs.c          |  49 +++++
>  arch/x86/kvm/x86.c              |   6 +-
>  include/linux/kvm_host.h        |  39 +---
>  virt/kvm/arm/arm.c              |   2 +-
>  virt/kvm/kvm_main.c             | 314 ++++----------------------------
>  13 files changed, 130 insertions(+), 382 deletions(-)
>  delete mode 100644 arch/x86/kvm/debugfs.c
>  create mode 100644 arch/x86/kvm/statsfs.c


You might want to select STATS_FS here (or depend on it if it is required),
or you could provide stubs in <linux/statsfs.h> for the case where STATS_FS
is not set/enabled.
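
The stub approach usually looks something like the sketch below. This is
only an illustration, not the header from this series: the two function
names come from the patch, but the opaque struct, the NULL return value of
the stub and the example_subsystem_init() caller are assumptions, and
CONFIG_STATS_FS is left undefined here so that the stub branch is the one
that gets compiled.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque type: callers only ever pass pointers around. */
struct statsfs_source;

#ifdef CONFIG_STATS_FS
/* The real implementations would live in fs/statsfs/statsfs.c. */
struct statsfs_source *statsfs_source_create(const char *fmt, ...);
void statsfs_source_put(struct statsfs_source *source);
#else
/*
 * Stubs for builds without statsfs. Returning NULL is an assumption of
 * this sketch; the series may prefer an error pointer instead.
 */
static inline struct statsfs_source *statsfs_source_create(const char *fmt, ...)
{
	(void)fmt;
	return NULL;
}

static inline void statsfs_source_put(struct statsfs_source *source)
{
	(void)source;
}
#endif

/* Hypothetical caller: compiles unchanged with or without statsfs. */
static inline int example_subsystem_init(void)
{
	struct statsfs_source *src = statsfs_source_create("example");

	if (!src)
		return -ENODEV;	/* statistics are simply unavailable */

	statsfs_source_put(src);
	return 0;
}
```

With stubs like these, a user such as KVM needs no #ifdef of its own and
no hard Kconfig dependency on STATS_FS.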

-- 
~Randy



* [RFC PATCH 2/5] statsfs API: create, add and remove statsfs
  2020-04-27 15:47   ` Matthew Wilcox
  2020-04-27 16:48     ` Emanuele Giuseppe Esposito
@ 2020-04-29  9:49     ` Emanuele Giuseppe Esposito
  1 sibling, 0 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-29  9:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: kvm, linux-fsdevel, mst, borntraeger, Paolo Bonzini,
	Emanuele Giuseppe Esposito

Hi Matthew,
I am trying to apply your XArray suggestion, but I cannot get it to work
properly. In particular, __xa_alloc() always returns -EINVAL.

I tried to follow the XArray kernel documentation and the example you
provided to replace the subordinates linked list, but the allocation always
fails with that error.

Below you can find the changes I intended to make.
Can you help me?

Thank you,
Emanuele

------ 8< -----------
From ad5d20b6ce7995b2d1164104cf958f7bc3e692fa Mon Sep 17 00:00:00 2001
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Date: Tue, 28 Apr 2020 12:21:00 +0200
Subject: [PATCH] statsfs: switch subordinate sources to xarray

---
 fs/statsfs/statsfs.c    | 45 +++++++++++++++++++++++++++--------------
 include/linux/statsfs.h |  5 ++---
 2 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/fs/statsfs/statsfs.c b/fs/statsfs/statsfs.c
index c8cfa590a3b0..0cf135c36776 100644
--- a/fs/statsfs/statsfs.c
+++ b/fs/statsfs/statsfs.c
@@ -107,11 +107,12 @@ const struct file_operations statsfs_ops = {
 static void statsfs_source_remove_files_locked(struct statsfs_source *src)
 {
 	struct statsfs_source *child;
+	unsigned long index;
 
 	if (src->source_dentry == NULL)
 		return;
 
-	list_for_each_entry(child, &src->subordinates_head, list_element)
+	xa_for_each (&src->subordinates, index, child)
 		statsfs_source_remove_files(child);
 
 	statsfs_remove_recursive(src->source_dentry);
@@ -180,6 +181,7 @@ static void statsfs_create_files_recursive_locked(struct statsfs_source *source,
 						  struct dentry *parent_dentry)
 {
 	struct statsfs_source *child;
+	unsigned long index;
 
 	/* first check values in this folder, since it might be new */
 	if (!source->source_dentry) {
@@ -189,7 +191,7 @@ static void statsfs_create_files_recursive_locked(struct statsfs_source *source,
 
 	statsfs_create_files_locked(source);
 
-	list_for_each_entry(child, &source->subordinates_head, list_element) {
+	xa_for_each (&source->subordinates, index, child) {
 		if (child->source_dentry == NULL) {
 			/* assume that if child has a folder,
 			 * also the sub-child have that.
@@ -258,10 +260,23 @@ EXPORT_SYMBOL_GPL(statsfs_source_add_values);
 void statsfs_source_add_subordinate(struct statsfs_source *source,
 				    struct statsfs_source *sub)
 {
+	int err;
+	uint32_t index;
+
 	down_write(&source->rwsem);
 
 	statsfs_source_get(sub);
-	list_add(&sub->list_element, &source->subordinates_head);
+	err = __xa_alloc(&source->subordinates, &index, sub, xa_limit_32b,
+		       GFP_KERNEL);
+
+	if (err) {
+		printk(KERN_ERR "Failed to insert subordinate %s\n"
+			"Too many subordinates in source %s\n",
+			sub->name, source->name);
+		up_write(&source->rwsem);
+		return;
+	}
+
 	if (source->source_dentry)
 		statsfs_create_files_recursive_locked(sub,
 						      source->source_dentry);
@@ -276,10 +291,11 @@ statsfs_source_remove_subordinate_locked(struct statsfs_source *source,
 					 struct statsfs_source *sub)
 {
 	struct statsfs_source *src_entry;
+	unsigned long index;
 
-	list_for_each_entry(src_entry, &source->subordinates_head, list_element) {
+	xa_for_each (&source->subordinates, index, src_entry) {
 		if (src_entry == sub) {
-			list_del_init(&src_entry->list_element);
+			xa_erase(&source->subordinates, index);
 			statsfs_source_remove_files(src_entry);
 			statsfs_source_put(src_entry);
 			return;
@@ -431,13 +447,13 @@ static void do_recursive_aggregation(struct statsfs_source *root,
 				     struct statsfs_aggregate_value *agg)
 {
 	struct statsfs_source *subordinate;
+	unsigned long index;
 
 	/* search all simple values in this folder */
 	search_all_simple_values(root, ref_src_entry, val, agg);
 
 	/* recursively search in all subfolders */
-	list_for_each_entry(subordinate, &root->subordinates_head,
-			     list_element) {
+	xa_for_each (&root->subordinates, index, subordinate) {
 		down_read(&subordinate->rwsem);
 		do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
 		up_read(&subordinate->rwsem);
@@ -571,13 +587,13 @@ static void do_recursive_clean(struct statsfs_source *root,
 			       struct statsfs_value *val)
 {
 	struct statsfs_source *subordinate;
+	unsigned long index;
 
 	/* search all simple values in this folder */
 	set_all_simple_values(root, ref_src_entry, val);
 
 	/* recursively search in all subfolders */
-	list_for_each_entry(subordinate, &root->subordinates_head,
-			     list_element) {
+	xa_for_each (&root->subordinates, index, subordinate) {
 		down_read(&subordinate->rwsem);
 		do_recursive_clean(subordinate, ref_src_entry, val);
 		up_read(&subordinate->rwsem);
@@ -703,9 +719,10 @@ EXPORT_SYMBOL_GPL(statsfs_source_revoke);
  */
 static void statsfs_source_destroy(struct kref *kref_source)
 {
-	struct statsfs_value_source *val_src_entry;
 	struct list_head *it, *safe;
+	struct statsfs_value_source *val_src_entry;
 	struct statsfs_source *child, *source;
+	unsigned long index;
 
 	source = container_of(kref_source, struct statsfs_source, refcount);
 
@@ -717,15 +734,14 @@ static void statsfs_source_destroy(struct kref *kref_source)
 	}
 
 	/* iterate through the subordinates and delete them */
-	list_for_each_safe(it, safe, &source->subordinates_head) {
-		child = list_entry(it, struct statsfs_source, list_element);
+	xa_for_each (&source->subordinates, index, child)
 		statsfs_source_remove_subordinate_locked(source, child);
-	}
 
 	statsfs_source_remove_files_locked(source);
 
 	up_write(&source->rwsem);
 	kfree(source->name);
+	xa_destroy(&source->subordinates);
 	kfree(source);
 }
 
@@ -761,8 +777,7 @@ struct statsfs_source *statsfs_source_create(const char *fmt, ...)
 	init_rwsem(&ret->rwsem);
 
 	INIT_LIST_HEAD(&ret->values_head);
-	INIT_LIST_HEAD(&ret->subordinates_head);
-	INIT_LIST_HEAD(&ret->list_element);
+	xa_init(&ret->subordinates);
 
 	return ret;
 }
diff --git a/include/linux/statsfs.h b/include/linux/statsfs.h
index f6e8eead1124..20153f50ffc0 100644
--- a/include/linux/statsfs.h
+++ b/include/linux/statsfs.h
@@ -11,6 +11,7 @@
 #define _STATSFS_H_
 
 #include <linux/list.h>
+#include <linux/xarray.h>
 
 /* Used to distinguish signed types */
 #define STATSFS_SIGN 0x8000
@@ -64,9 +65,7 @@ struct statsfs_source {
 	struct list_head values_head;
 
 	/* list of struct statsfs_source for subordinate sources */
-	struct list_head subordinates_head;
-
-	struct list_head list_element;
+	struct xarray subordinates;
 
 	struct rw_semaphore rwsem;
 
-- 
2.25.2



* Re: [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs
  2020-04-28 17:56   ` Randy Dunlap
@ 2020-04-29 10:34     ` Emanuele Giuseppe Esposito
  2020-04-29 10:35       ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-29 10:34 UTC (permalink / raw)
  To: Randy Dunlap, kvm; +Cc: linux-fsdevel, mst, borntraeger, Paolo Bonzini



On 4/28/20 7:56 PM, Randy Dunlap wrote:
> On 4/27/20 7:18 AM, Emanuele Giuseppe Esposito wrote:
>> Use the statsfs API instead of debugfs to create sources and add values.
>>
>> This also requires changing all architecture files to replace the old
>> debugfs_entries with statsfs_vcpu_entries and statsfs_vm_entries.
>>
>> The file/folder names and organization are kept unchanged, and a symlink
>> in /sys/kernel/debug/kvm is left for backward compatibility.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   arch/arm64/kvm/guest.c          |   2 +-
>>   arch/mips/kvm/mips.c            |   2 +-
>>   arch/powerpc/kvm/book3s.c       |   6 +-
>>   arch/powerpc/kvm/booke.c        |   8 +-
>>   arch/s390/kvm/kvm-s390.c        |  16 +-
>>   arch/x86/include/asm/kvm_host.h |   2 +-
>>   arch/x86/kvm/Makefile           |   2 +-
>>   arch/x86/kvm/debugfs.c          |  64 -------
>>   arch/x86/kvm/statsfs.c          |  49 +++++
>>   arch/x86/kvm/x86.c              |   6 +-
>>   include/linux/kvm_host.h        |  39 +---
>>   virt/kvm/arm/arm.c              |   2 +-
>>   virt/kvm/kvm_main.c             | 314 ++++----------------------------
>>   13 files changed, 130 insertions(+), 382 deletions(-)
>>   delete mode 100644 arch/x86/kvm/debugfs.c
>>   create mode 100644 arch/x86/kvm/statsfs.c
> 
> 
> You might want to select STATS_FS here (or depend on it if it is required),
> or you could provide stubs in <linux/statsfs.h> for the case where STATS_FS
> is not set/enabled.

Currently debugfs is not mentioned in the KVM Kconfig; instead it provides
empty stubs, as you suggested. I guess it would be a good idea to do the
same for statsfs.

Paolo, what do you think?

Regarding the other suggestions, you are right, I will apply them in v2.

Thank you,
Emanuele



* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-28 17:47   ` Randy Dunlap
@ 2020-04-29 10:34     ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2020-04-29 10:34 UTC (permalink / raw)
  To: Randy Dunlap, Emanuele Giuseppe Esposito, kvm
  Cc: linux-fsdevel, mst, borntraeger

On 28/04/20 19:47, Randy Dunlap wrote:
>> +config STATS_FS
>> +	bool "Statistics Filesystem"
>> +	default y
> Not default y. We don't enable things that are not required.
> Unless you have a convincing argument otherwise.
> 

I think the best solution is to add stubs to include/linux/statsfs.h,
and use "imply STATS_FS" in KVM.  This would still "default y" when
a subsystem that uses the filesystem is in use, but not otherwise.

Paolo



* Re: [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs
  2020-04-29 10:34     ` Emanuele Giuseppe Esposito
@ 2020-04-29 10:35       ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2020-04-29 10:35 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, Randy Dunlap, kvm
  Cc: linux-fsdevel, mst, borntraeger

On 29/04/20 12:34, Emanuele Giuseppe Esposito wrote:
>>
>>
>> You might want to select STATS_FS here (or depend on it if it is
>> required),
>> or you could provide stubs in <linux/statsfs.h> for the case where STATS_FS
>> is not set/enabled.
> 
> Currently debugfs is not mentioned in the KVM Kconfig; instead it provides
> empty stubs, as you suggested. I guess it would be a good idea to do the
> same for statsfs.
> 
> Paolo, what do you think?
> 
> Regarding the other suggestions, you are right, I will apply them in v2.

I replied in the patch 2/5 thread - basically "imply" STATS_FS here instead
of "selecting" it.

Paolo



* Re: [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values
  2020-04-27 21:53   ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Andreas Dilger
@ 2020-04-29 10:55     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 17+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-04-29 10:55 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: kvm, linux-fsdevel, mst, borntraeger, Paolo Bonzini



On 4/27/20 11:53 PM, Andreas Dilger wrote:
> On Apr 27, 2020, at 8:18 AM, Emanuele Giuseppe Esposito <eesposit@redhat.com> wrote:
>>
>> Introduce the statsfs API, which makes it easy to create, add
>> and remove statsfs sources and values.
> 
> Not a huge issue, but IMHO the "statsfs" name is confusingly similar to
> the existing "statfs" function name.  Could you name this interface
> something more distinct?  Even "fs_stats" or "stats_fs" or similar would
> at least be visibly different.

You're right, thanks for pointing that out. I am going to rename all
functions and files to stats_fs. The filesystem name, however, will
stay statsfs, because it follows the same naming as
debugfs/tracefs/securityfs and won't interfere or be confused with the
statfs functions.

Thank you,
Emanuele

> 
> Cheers, Andreas
> 
>> The API makes it easy to build
>> the statistics directory tree and to gather statistics automatically for
>> the Linux kernel. The main functionalities are: create a source, add child
>> sources/values/aggregates, register it to the root source (that on
>> the virtual fs would be /sys/kernel/statsfs), and perform a search for
>> a value/aggregate.
>>
>> This allows creating any kind of source tree, which also makes it easier
>> to accommodate future readjustments.
>>
>> The API representation is only logical and will be backed up
>> by a virtual file system in patch 4.
>> Its usage will be shared between the statsfs file system
>> and the end-users like kvm, the former calling it when it needs to
>> display and clear statistics, the latter to add values and sources.
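
To make the source tree and its aggregates concrete, here is a small
userspace sketch of the recursive aggregation described above. It is not
the code from the patch: the toy_source layout, the fixed-size arrays and
the unsigned-only values are simplifying assumptions, locking is left
out, and every value in the subtree is folded in, whereas the patch only
considers entries that share the same statsfs_value array.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 4
#define TOY_MAX_VALUES 4

/* A source holds some simple values and links to subordinate sources. */
struct toy_source {
	uint64_t values[TOY_MAX_VALUES];
	size_t nr_values;
	struct toy_source *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

struct toy_aggregate {
	uint64_t sum, min, max;
	uint32_t count, count_zero;
};

static void toy_aggregate_init(struct toy_aggregate *agg)
{
	agg->sum = 0;
	agg->count = 0;
	agg->count_zero = 0;
	agg->max = 0;
	agg->min = UINT64_MAX;	/* any real value will be <= this */
}

/* Fold the values of src and of every source below it into agg. */
static void toy_aggregate_walk(const struct toy_source *src,
			       struct toy_aggregate *agg)
{
	size_t i;

	for (i = 0; i < src->nr_values; i++) {
		uint64_t v = src->values[i];

		agg->sum += v;
		agg->count++;
		agg->count_zero += (v == 0);
		if (v > agg->max)
			agg->max = v;
		if (v < agg->min)
			agg->min = v;
	}

	for (i = 0; i < src->nr_children; i++)
		toy_aggregate_walk(src->children[i], agg);
}

/* The average is derived at the end; 0 when nothing was aggregated. */
static uint64_t toy_aggregate_avg(const struct toy_aggregate *agg)
{
	return agg->count ? agg->sum / agg->count : 0;
}
```

In KVM terms, a per-VM "exits" aggregate would be the sum computed by
such a walk over the vCPU sources that sit below the VM source.
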
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>> fs/Kconfig              |   7 +
>> fs/Makefile             |   1 +
>> fs/statsfs/Makefile     |   4 +
>> fs/statsfs/internal.h   |  20 ++
>> fs/statsfs/statsfs.c    | 618 ++++++++++++++++++++++++++++++++++++++++
>> include/linux/statsfs.h | 222 +++++++++++++++
>> 6 files changed, 872 insertions(+)
>> create mode 100644 fs/statsfs/Makefile
>> create mode 100644 fs/statsfs/internal.h
>> create mode 100644 fs/statsfs/statsfs.c
>> create mode 100644 include/linux/statsfs.h
>>
>> diff --git a/fs/Kconfig b/fs/Kconfig
>> index f08fbbfafd9a..824fcf86d12b 100644
>> --- a/fs/Kconfig
>> +++ b/fs/Kconfig
>> @@ -328,4 +328,11 @@ source "fs/unicode/Kconfig"
>> config IO_WQ
>> 	bool
>>
>> +config STATS_FS
>> +	bool "Statistics Filesystem"
>> +	default y
>> +	help
>> +	  statsfs is a virtual file system that provides counters and other
>> +	  statistics about the running kernel.
>> +
>> endmenu
>> diff --git a/fs/Makefile b/fs/Makefile
>> index 2ce5112b02c8..6942070f54b2 100644
>> --- a/fs/Makefile
>> +++ b/fs/Makefile
>> @@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)		+= befs/
>> obj-$(CONFIG_HOSTFS)		+= hostfs/
>> obj-$(CONFIG_CACHEFILES)	+= cachefiles/
>> obj-$(CONFIG_DEBUG_FS)		+= debugfs/
>> +obj-$(CONFIG_STATS_FS)		+= statsfs/
>> obj-$(CONFIG_TRACING)		+= tracefs/
>> obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
>> obj-$(CONFIG_BTRFS_FS)		+= btrfs/
>> diff --git a/fs/statsfs/Makefile b/fs/statsfs/Makefile
>> new file mode 100644
>> index 000000000000..d494a3f30ba5
>> --- /dev/null
>> +++ b/fs/statsfs/Makefile
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +statsfs-objs	:= statsfs.o
>> +
>> +obj-$(CONFIG_STATS_FS)	+= statsfs.o
>> diff --git a/fs/statsfs/internal.h b/fs/statsfs/internal.h
>> new file mode 100644
>> index 000000000000..f124683a2ded
>> --- /dev/null
>> +++ b/fs/statsfs/internal.h
>> @@ -0,0 +1,20 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _STATSFS_INTERNAL_H_
>> +#define _STATSFS_INTERNAL_H_
>> +
>> +#include <linux/list.h>
>> +#include <linux/kref.h>
>> +#include <linux/rwsem.h>
>> +#include <linux/statsfs.h>
>> +
>> +/* values, grouped by base */
>> +struct statsfs_value_source {
>> +	void *base_addr;
>> +	bool files_created;
>> +	struct statsfs_value *values;
>> +	struct list_head list_element;
>> +};
>> +
>> +int statsfs_val_get_mode(struct statsfs_value *val);
>> +
>> +#endif /* _STATSFS_INTERNAL_H_ */
>> diff --git a/fs/statsfs/statsfs.c b/fs/statsfs/statsfs.c
>> new file mode 100644
>> index 000000000000..0ad1d985be46
>> --- /dev/null
>> +++ b/fs/statsfs/statsfs.c
>> @@ -0,0 +1,618 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <linux/module.h>
>> +#include <linux/errno.h>
>> +#include <linux/file.h>
>> +#include <linux/fs.h>
>> +#include <linux/slab.h>
>> +#include <linux/rwsem.h>
>> +#include <linux/list.h>
>> +#include <linux/kref.h>
>> +#include <linux/limits.h>
>> +#include <linux/statsfs.h>
>> +
>> +#include "internal.h"
>> +
>> +struct statsfs_aggregate_value {
>> +	uint64_t sum, min, max;
>> +	uint32_t count, count_zero;
>> +};
>> +
>> +static int is_val_signed(struct statsfs_value *val)
>> +{
>> +	return val->type & STATSFS_SIGN;
>> +}
>> +
>> +int statsfs_val_get_mode(struct statsfs_value *val)
>> +{
>> +	return val->mode ? val->mode : 0644;
>> +}
>> +
>> +static struct statsfs_value *find_value(struct statsfs_value_source *src,
>> +					struct statsfs_value *val)
>> +{
>> +	struct statsfs_value *entry;
>> +
>> +	for (entry = src->values; entry->name; entry++) {
>> +		if (entry == val) {
>> +			WARN_ON(strcmp(entry->name, val->name) != 0);
>> +			return entry;
>> +		}
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static struct statsfs_value *
>> +search_value_in_source(struct statsfs_source *src, struct statsfs_value *arg,
>> +		       struct statsfs_value_source **val_src)
>> +{
>> +	struct statsfs_value *entry;
>> +	struct statsfs_value_source *src_entry;
>> +
>> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
>> +		entry = find_value(src_entry, arg);
>> +		if (entry) {
>> +			*val_src = src_entry;
>> +			return entry;
>> +		}
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/* Called with rwsem held for writing */
>> +static struct statsfs_value_source *create_value_source(void *base)
>> +{
>> +	struct statsfs_value_source *val_src;
>> +
>> +	val_src = kzalloc(sizeof(struct statsfs_value_source), GFP_KERNEL);
>> +	if (!val_src)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	val_src->base_addr = base;
>> +	val_src->list_element =
>> +		(struct list_head)LIST_HEAD_INIT(val_src->list_element);
>> +
>> +	return val_src;
>> +}
>> +
>> +int statsfs_source_add_values(struct statsfs_source *source,
>> +			      struct statsfs_value *stat, void *ptr)
>> +{
>> +	struct statsfs_value_source *val_src;
>> +	struct statsfs_value_source *entry;
>> +
>> +	down_write(&source->rwsem);
>> +
>> +	list_for_each_entry(entry, &source->values_head, list_element) {
>> +		if (entry->base_addr == ptr && entry->values == stat) {
>> +			up_write(&source->rwsem);
>> +			return -EEXIST;
>> +		}
>> +	}
>> +
>> +	val_src = create_value_source(ptr);
>> +	val_src->values = (struct statsfs_value *)stat;
>> +
>> +	/* add the val_src to the source list */
>> +	list_add(&val_src->list_element, &source->values_head);
>> +
>> +	up_write(&source->rwsem);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_add_values);
>> +
>> +void statsfs_source_add_subordinate(struct statsfs_source *source,
>> +				    struct statsfs_source *sub)
>> +{
>> +	down_write(&source->rwsem);
>> +
>> +	statsfs_source_get(sub);
>> +	list_add(&sub->list_element, &source->subordinates_head);
>> +
>> +	up_write(&source->rwsem);
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_add_subordinate);
>> +
>> +/* Called with rwsem held for writing */
>> +static void
>> +statsfs_source_remove_subordinate_locked(struct statsfs_source *source,
>> +					 struct statsfs_source *sub)
>> +{
>> +	struct list_head *it, *safe;
>> +	struct statsfs_source *src_entry;
>> +
>> +	list_for_each_safe(it, safe, &source->subordinates_head) {
>> +		src_entry = list_entry(it, struct statsfs_source, list_element);
>> +		if (src_entry == sub) {
>> +			WARN_ON(strcmp(src_entry->name, sub->name) != 0);
>> +			list_del_init(&src_entry->list_element);
>> +			statsfs_source_put(src_entry);
>> +			return;
>> +		}
>> +	}
>> +}
>> +
>> +void statsfs_source_remove_subordinate(struct statsfs_source *source,
>> +				       struct statsfs_source *sub)
>> +{
>> +	down_write(&source->rwsem);
>> +	statsfs_source_remove_subordinate_locked(source, sub);
>> +	up_write(&source->rwsem);
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_remove_subordinate);
>> +
>> +/* Called with rwsem held for reading */
>> +static uint64_t get_simple_value(struct statsfs_value_source *src,
>> +				 struct statsfs_value *val)
>> +{
>> +	uint64_t value_found;
>> +	void *address;
>> +
>> +	address = src->base_addr + val->offset;
>> +
>> +	switch (val->type) {
>> +	case STATSFS_U8:
>> +		value_found = *((uint8_t *)address);
>> +		break;
>> +	case STATSFS_U8 | STATSFS_SIGN:
>> +		value_found = *((int8_t *)address);
>> +		break;
>> +	case STATSFS_U16:
>> +		value_found = *((uint16_t *)address);
>> +		break;
>> +	case STATSFS_U16 | STATSFS_SIGN:
>> +		value_found = *((int16_t *)address);
>> +		break;
>> +	case STATSFS_U32:
>> +		value_found = *((uint32_t *)address);
>> +		break;
>> +	case STATSFS_U32 | STATSFS_SIGN:
>> +		value_found = *((int32_t *)address);
>> +		break;
>> +	case STATSFS_U64:
>> +		value_found = *((uint64_t *)address);
>> +		break;
>> +	case STATSFS_U64 | STATSFS_SIGN:
>> +		value_found = *((int64_t *)address);
>> +		break;
>> +	case STATSFS_BOOL:
>> +		value_found = *((uint8_t *)address);
>> +		break;
>> +	default:
>> +		value_found = 0;
>> +		break;
>> +	}
>> +
>> +	return value_found;
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void clear_simple_value(struct statsfs_value_source *src,
>> +			       struct statsfs_value *val)
>> +{
>> +	void *address;
>> +
>> +	address = src->base_addr + val->offset;
>> +
>> +	switch (val->type) {
>> +	case STATSFS_U8:
>> +		*((uint8_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U8 | STATSFS_SIGN:
>> +		*((int8_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U16:
>> +		*((uint16_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U16 | STATSFS_SIGN:
>> +		*((int16_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U32:
>> +		*((uint32_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U32 | STATSFS_SIGN:
>> +		*((int32_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U64:
>> +		*((uint64_t *)address) = 0;
>> +		break;
>> +	case STATSFS_U64 | STATSFS_SIGN:
>> +		*((int64_t *)address) = 0;
>> +		break;
>> +	case STATSFS_BOOL:
>> +		*((uint8_t *)address) = 0;
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void search_all_simple_values(struct statsfs_source *src,
>> +				     struct statsfs_value_source *ref_src_entry,
>> +				     struct statsfs_value *val,
>> +				     struct statsfs_aggregate_value *agg)
>> +{
>> +	struct statsfs_value_source *src_entry;
>> +	uint64_t value_found;
>> +
>> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
>> +		/* skip aggregates */
>> +		if (src_entry->base_addr == NULL)
>> +			continue;
>> +
>> +		/* useless to search here */
>> +		if (src_entry->values != ref_src_entry->values)
>> +			continue;
>> +
>> +		/* must be here */
>> +		value_found = get_simple_value(src_entry, val);
>> +		agg->sum += value_found;
>> +		agg->count++;
>> +		agg->count_zero += (value_found == 0);
>> +
>> +		if (is_val_signed(val)) {
>> +			agg->max = (((int64_t)value_found) >=
>> +				    ((int64_t)agg->max)) ?
>> +					   value_found :
>> +					   agg->max;
>> +			agg->min = (((int64_t)value_found) <=
>> +				    ((int64_t)agg->min)) ?
>> +					   value_found :
>> +					   agg->min;
>> +		} else {
>> +			agg->max = (value_found >= agg->max) ? value_found :
>> +							       agg->max;
>> +			agg->min = (value_found <= agg->min) ? value_found :
>> +							       agg->min;
>> +		}
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void do_recursive_aggregation(struct statsfs_source *root,
>> +				     struct statsfs_value_source *ref_src_entry,
>> +				     struct statsfs_value *val,
>> +				     struct statsfs_aggregate_value *agg)
>> +{
>> +	struct statsfs_source *subordinate;
>> +
>> +	/* search all simple values in this folder */
>> +	search_all_simple_values(root, ref_src_entry, val, agg);
>> +
>> +	/* recursively search in all subfolders */
>> +	list_for_each_entry(subordinate, &root->subordinates_head,
>> +			     list_element) {
>> +		down_read(&subordinate->rwsem);
>> +		do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
>> +		up_read(&subordinate->rwsem);
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void init_aggregate_value(struct statsfs_aggregate_value *agg,
>> +				 struct statsfs_value *val)
>> +{
>> +	agg->count = agg->count_zero = agg->sum = 0;
>> +	if (is_val_signed(val)) {
>> +		agg->max = S64_MIN;
>> +		agg->min = S64_MAX;
>> +	} else {
>> +		agg->max = 0;
>> +		agg->min = U64_MAX;
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void store_final_value(struct statsfs_aggregate_value *agg,
>> +			    struct statsfs_value *val, uint64_t *ret)
>> +{
>> +	int operation;
>> +
>> +	operation = val->aggr_kind | is_val_signed(val);
>> +
>> +	switch (operation) {
>> +	case STATSFS_AVG:
>> +		*ret = agg->count ? agg->sum / agg->count : 0;
>> +		break;
>> +	case STATSFS_AVG | STATSFS_SIGN:
>> +		*ret = agg->count ? ((int64_t)agg->sum) / agg->count : 0;
>> +		break;
>> +	case STATSFS_SUM:
>> +	case STATSFS_SUM | STATSFS_SIGN:
>> +		*ret = agg->sum;
>> +		break;
>> +	case STATSFS_MIN:
>> +	case STATSFS_MIN | STATSFS_SIGN:
>> +		*ret = agg->min;
>> +		break;
>> +	case STATSFS_MAX:
>> +	case STATSFS_MAX | STATSFS_SIGN:
>> +		*ret = agg->max;
>> +		break;
>> +	case STATSFS_COUNT_ZERO:
>> +	case STATSFS_COUNT_ZERO | STATSFS_SIGN:
>> +		*ret = agg->count_zero;
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static int statsfs_source_get_value_locked(struct statsfs_source *source,
>> +					   struct statsfs_value *arg,
>> +					   uint64_t *ret)
>> +{
>> +	struct statsfs_value_source *src_entry;
>> +	struct statsfs_value *found;
>> +	struct statsfs_aggregate_value aggr;
>> +
>> +	*ret = 0;
>> +
>> +	if (!arg)
>> +		return -ENOENT;
>> +
>> +	/* look in simple values */
>> +	found = search_value_in_source(source, arg, &src_entry);
>> +
>> +	if (!found) {
>> +		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
>> +		       source->name);
>> +		return -ENOENT;
>> +	}
>> +
>> +	if (src_entry->base_addr != NULL) {
>> +		*ret = get_simple_value(src_entry, found);
>> +		return 0;
>> +	}
>> +
>> +	/* look in aggregates */
>> +	init_aggregate_value(&aggr, found);
>> +	do_recursive_aggregation(source, src_entry, found, &aggr);
>> +	store_final_value(&aggr, found, ret);
>> +
>> +	return 0;
>> +}
>> +
>> +int statsfs_source_get_value(struct statsfs_source *source,
>> +			     struct statsfs_value *arg, uint64_t *ret)
>> +{
>> +	int retval;
>> +
>> +	down_read(&source->rwsem);
>> +	retval = statsfs_source_get_value_locked(source, arg, ret);
>> +	up_read(&source->rwsem);
>> +
>> +	return retval;
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_get_value);
>> +
>> +/* Called with rwsem held for reading */
>> +static void set_all_simple_values(struct statsfs_source *src,
>> +				  struct statsfs_value_source *ref_src_entry,
>> +				  struct statsfs_value *val)
>> +{
>> +	struct statsfs_value_source *src_entry;
>> +
>> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
>> +		/* skip aggregates */
>> +		if (src_entry->base_addr == NULL)
>> +			continue;
>> +
>> +		/* wrong to search here */
>> +		if (src_entry->values != ref_src_entry->values)
>> +			continue;
>> +
>> +		if (src_entry->base_addr &&
>> +			src_entry->values == ref_src_entry->values)
>> +			clear_simple_value(src_entry, val);
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static void do_recursive_clean(struct statsfs_source *root,
>> +			       struct statsfs_value_source *ref_src_entry,
>> +			       struct statsfs_value *val)
>> +{
>> +	struct statsfs_source *subordinate;
>> +
>> +	/* search all simple values in this folder */
>> +	set_all_simple_values(root, ref_src_entry, val);
>> +
>> +	/* recursively search in all subfolders */
>> +	list_for_each_entry(subordinate, &root->subordinates_head,
>> +			     list_element) {
>> +		down_read(&subordinate->rwsem);
>> +		do_recursive_clean(subordinate, ref_src_entry, val);
>> +		up_read(&subordinate->rwsem);
>> +	}
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static int statsfs_source_clear_locked(struct statsfs_source *source,
>> +				       struct statsfs_value *val)
>> +{
>> +	struct statsfs_value_source *src_entry;
>> +	struct statsfs_value *found;
>> +
>> +	if (!val)
>> +		return -ENOENT;
>> +
>> +	/* look in simple values */
>> +	found = search_value_in_source(source, val, &src_entry);
>> +
>> +	if (!found) {
>> +		printk(KERN_ERR "Statsfs: Value in source \"%s\" not found!\n",
>> +		       source->name);
>> +		return -ENOENT;
>> +	}
>> +
>> +	if (src_entry->base_addr != NULL) {
>> +		clear_simple_value(src_entry, found);
>> +		return 0;
>> +	}
>> +
>> +	/* look in aggregates */
>> +	do_recursive_clean(source, src_entry, found);
>> +
>> +	return 0;
>> +}
>> +
>> +int statsfs_source_clear(struct statsfs_source *source,
>> +			 struct statsfs_value *val)
>> +{
>> +	int retval;
>> +
>> +	down_read(&source->rwsem);
>> +	retval = statsfs_source_clear_locked(source, val);
>> +	up_read(&source->rwsem);
>> +
>> +	return retval;
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static struct statsfs_value *
>> +find_value_by_name(struct statsfs_value_source *src, char *val)
>> +{
>> +	struct statsfs_value *entry;
>> +
>> +	for (entry = src->values; entry->name; entry++)
>> +		if (!strcmp(entry->name, val))
>> +			return entry;
>> +
>> +	return NULL;
>> +}
>> +
>> +/* Called with rwsem held for reading */
>> +static struct statsfs_value *
>> +search_in_source_by_name(struct statsfs_source *src, char *name)
>> +{
>> +	struct statsfs_value *entry;
>> +	struct statsfs_value_source *src_entry;
>> +
>> +	list_for_each_entry(src_entry, &src->values_head, list_element) {
>> +		entry = find_value_by_name(src_entry, name);
>> +		if (entry)
>> +			return entry;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
>> +				     uint64_t *ret)
>> +{
>> +	struct statsfs_value *val;
>> +	int retval;
>> +
>> +	down_read(&source->rwsem);
>> +	val = search_in_source_by_name(source, name);
>> +
>> +	if (!val) {
>> +		*ret = 0;
>> +		up_read(&source->rwsem);
>> +		return -ENOENT;
>> +	}
>> +
>> +	retval = statsfs_source_get_value_locked(source, val, ret);
>> +	up_read(&source->rwsem);
>> +
>> +	return retval;
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_get_value_by_name);
>> +
>> +void statsfs_source_get(struct statsfs_source *source)
>> +{
>> +	kref_get(&source->refcount);
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_get);
>> +
>> +void statsfs_source_revoke(struct statsfs_source *source)
>> +{
>> +	struct list_head *it, *safe;
>> +	struct statsfs_value_source *val_src_entry;
>> +
>> +	down_write(&source->rwsem);
>> +
>> +	list_for_each_safe(it, safe, &source->values_head) {
>> +		val_src_entry = list_entry(it, struct statsfs_value_source,
>> +					   list_element);
>> +		val_src_entry->base_addr = NULL;
>> +	}
>> +
>> +	up_write(&source->rwsem);
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_revoke);
>> +
>> +/* Called with rwsem held for writing
>> + *
>> + * The refcount is 0 and the lock was taken before refcount
>> + * went from 1 to 0
>> + */
>> +static void statsfs_source_destroy(struct kref *kref_source)
>> +{
>> +	struct statsfs_value_source *val_src_entry;
>> +	struct list_head *it, *safe;
>> +	struct statsfs_source *child, *source;
>> +
>> +	source = container_of(kref_source, struct statsfs_source, refcount);
>> +
>> +	/* iterate through the values and delete them */
>> +	list_for_each_safe(it, safe, &source->values_head) {
>> +		val_src_entry = list_entry(it, struct statsfs_value_source,
>> +					   list_element);
>> +		kfree(val_src_entry);
>> +	}
>> +
>> +	/* iterate through the subordinates and delete them */
>> +	list_for_each_safe(it, safe, &source->subordinates_head) {
>> +		child = list_entry(it, struct statsfs_source, list_element);
>> +		statsfs_source_remove_subordinate_locked(source, child);
>> +	}
>> +
>> +
>> +	up_write(&source->rwsem);
>> +	kfree(source->name);
>> +	kfree(source);
>> +}
>> +
>> +void statsfs_source_put(struct statsfs_source *source)
>> +{
>> +	kref_put_rwsem(&source->refcount, statsfs_source_destroy,
>> +		       &source->rwsem);
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_put);
>> +
>> +struct statsfs_source *statsfs_source_create(const char *fmt, ...)
>> +{
>> +	va_list ap;
>> +	char buf[100];
>> +	struct statsfs_source *ret;
>> +
>> +	/* Names longer than sizeof(buf) - 1 are silently truncated */
>> +	va_start(ap, fmt);
>> +	vsnprintf(buf, sizeof(buf), fmt, ap);
>> +	va_end(ap);
>> +
>> +	ret = kzalloc(sizeof(struct statsfs_source), GFP_KERNEL);
>> +	if (!ret)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	ret->name = kstrdup(buf, GFP_KERNEL);
>> +	if (!ret->name) {
>> +		kfree(ret);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	kref_init(&ret->refcount);
>> +	init_rwsem(&ret->rwsem);
>> +
>> +	INIT_LIST_HEAD(&ret->values_head);
>> +	INIT_LIST_HEAD(&ret->subordinates_head);
>> +	INIT_LIST_HEAD(&ret->list_element);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(statsfs_source_create);
>> diff --git a/include/linux/statsfs.h b/include/linux/statsfs.h
>> new file mode 100644
>> index 000000000000..3f01f094946d
>> --- /dev/null
>> +++ b/include/linux/statsfs.h
>> @@ -0,0 +1,225 @@
>> +/* SPDX-License-Identifier: GPL-2.0
>> + *
>> + *  statsfs.h - a tiny little statistics file system
>> + *
>> + *  Copyright (C) 2020 Emanuele Giuseppe Esposito
>> + *  Copyright (C) 2020 Redhat.
>> + *
>> + */
>> +
>> +#ifndef _STATSFS_H_
>> +#define _STATSFS_H_
>> +
>> +#include <linux/kref.h>
>> +#include <linux/list.h>
>> +#include <linux/rwsem.h>
>> +#include <linux/types.h>
>> +
>> +/* Used to distinguish signed types */
>> +#define STATSFS_SIGN 0x8000
>> +
>> +struct statsfs_source;
>> +
>> +enum stat_type {
>> +	STATSFS_U8 = 0,
>> +	STATSFS_U16 = 1,
>> +	STATSFS_U32 = 2,
>> +	STATSFS_U64 = 3,
>> +	STATSFS_BOOL = 4,
>> +	STATSFS_S8 = STATSFS_U8 | STATSFS_SIGN,
>> +	STATSFS_S16 = STATSFS_U16 | STATSFS_SIGN,
>> +	STATSFS_S32 = STATSFS_U32 | STATSFS_SIGN,
>> +	STATSFS_S64 = STATSFS_U64 | STATSFS_SIGN,
>> +};
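
The encoding above keeps the width in the low bits and marks signed
types with the single STATSFS_SIGN flag. Below is a minimal userspace
sketch of how that could be decoded. The helpers stat_type_is_signed()
and stat_type_size() are hypothetical names, not part of the patch, and
treating STATSFS_BOOL as sizeof(bool) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the quoted header so the sketch builds outside the kernel. */
#define STATSFS_SIGN 0x8000

enum stat_type {
	STATSFS_U8 = 0,
	STATSFS_U16 = 1,
	STATSFS_U32 = 2,
	STATSFS_U64 = 3,
	STATSFS_BOOL = 4,
	STATSFS_S8 = STATSFS_U8 | STATSFS_SIGN,
	STATSFS_S16 = STATSFS_U16 | STATSFS_SIGN,
	STATSFS_S32 = STATSFS_U32 | STATSFS_SIGN,
	STATSFS_S64 = STATSFS_U64 | STATSFS_SIGN,
};

/* Hypothetical helper: true when the sign flag is set. */
static bool stat_type_is_signed(enum stat_type type)
{
	return (type & STATSFS_SIGN) != 0;
}

/* Hypothetical helper: storage size in bytes, 0 for unknown types. */
static size_t stat_type_size(enum stat_type type)
{
	/* Mask off the sign flag so S32 and U32 share one case. */
	switch (type & ~STATSFS_SIGN) {
	case STATSFS_U8:
		return sizeof(uint8_t);
	case STATSFS_U16:
		return sizeof(uint16_t);
	case STATSFS_U32:
		return sizeof(uint32_t);
	case STATSFS_U64:
		return sizeof(uint64_t);
	case STATSFS_BOOL:
		return sizeof(bool);	/* assumption, see above */
	default:
		return 0;
	}
}
```

With this layout a signed type differs from its unsigned counterpart by
exactly one bit, so the reader can pick the width first and apply sign
extension afterwards.
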
>> +
>> +enum stat_aggr {
>> +	STATSFS_NONE = 0,
>> +	STATSFS_SUM,
>> +	STATSFS_MIN,
>> +	STATSFS_MAX,
>> +	STATSFS_COUNT_ZERO,
>> +	STATSFS_AVG,
>> +};
>> +
>> +struct statsfs_value {
>> +	/* Name of the stat */
>> +	char *name;
>> +
>> +	/* Offset from base address to field containing the value */
>> +	int offset;
>> +
>> +	/* Type of the stat BOOL,U64,... */
>> +	enum stat_type type;
>> +
>> +	/* Aggregate type: MIN, MAX, SUM,... */
>> +	enum stat_aggr aggr_kind;
>> +
>> +	/* File mode */
>> +	uint16_t mode;
>> +};
>> +
>> +struct statsfs_source {
>> +	struct kref refcount;
>> +
>> +	char *name;
>> +
>> +	/* list of struct statsfs_value_source */
>> +	struct list_head values_head;
>> +
>> +	/* list of struct statsfs_source for subordinate sources */
>> +	struct list_head subordinates_head;
>> +
>> +	struct list_head list_element;
>> +
>> +	struct rw_semaphore rwsem;
>> +
>> +	struct dentry *source_dentry;
>> +};
>> +
>> +/**
>> + * statsfs_source_create - create a statsfs_source
>> + * @fmt: printf-style format string for the name of the source
>> + *
>> + * Creates a statsfs_source with the given name. The source is not backed
>> + * by the filesystem yet: it becomes visible to the user only once it, or
>> + * one of its parents, is registered in statsfs.
>> + *
>> + * Returns a pointer to a statsfs_source if it succeeds. This pointer, or
>> + * that of one of its parents, must be passed to statsfs_source_put() when
>> + * the file is to be removed. If an error occurs, ERR_PTR(-ERROR) is returned.
>> + */
>> +struct statsfs_source *statsfs_source_create(const char *fmt, ...);
>> +
>> +/**
>> + * statsfs_source_add_values - adds values to the given source
>> + * @source: a pointer to the source that will receive the values
>> + * @val: a pointer to the NULL terminated statsfs_value array to add
>> + * @base_ptr: a pointer to the base pointer used by these values
>> + *
>> + * In addition to adding values to the source, also creates the
>> + * files in the filesystem if the source is already backed by a directory.
>> + *
>> + * Returns 0 if it succeeds. If the values are already in the
>> + * source and have the same base_ptr, -EEXIST is returned.
>> + */
>> +int statsfs_source_add_values(struct statsfs_source *source,
>> +			      struct statsfs_value *val, void *base_ptr);
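
To make the offset/base_ptr relationship concrete, here is a userspace
sketch of how a NULL-terminated value array plus a base pointer
resolves to a field. It walks the array the same way
find_value_by_name() does in the patch. Everything prefixed demo_ is
hypothetical, the value struct is reduced to u64 fields only, and -1
stands in for -ENOENT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct statsfs_value: u64 fields only. */
struct demo_value {
	const char *name;	/* a NULL name terminates the array */
	size_t offset;		/* offset of the field from the base pointer */
};

/* Hypothetical per-VM counters, playing the role of the backing data. */
struct demo_vm_stats {
	uint64_t exits;
	uint64_t halts;
};

static const struct demo_value demo_values[] = {
	{ "exits", offsetof(struct demo_vm_stats, exits) },
	{ "halts", offsetof(struct demo_vm_stats, halts) },
	{ NULL, 0 },
};

/*
 * Search the array by name, then read the field at base + offset.
 * Returns 0 on success. If no value has that name, *ret is set to 0
 * and -1 is returned (the patch returns -ENOENT there).
 */
static int demo_read(const void *base, const struct demo_value *values,
		     const char *name, uint64_t *ret)
{
	const struct demo_value *entry;

	for (entry = values; entry->name; entry++) {
		if (!strcmp(entry->name, name)) {
			memcpy(ret, (const char *)base + entry->offset,
			       sizeof(*ret));
			return 0;
		}
	}

	*ret = 0;
	return -1;
}
```

Because the array only stores offsets, the same demo_values table can
be reused with a different base pointer for every VM, which is what
makes one static description per stat type sufficient.
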
>> +
>> +/**
>> + * statsfs_source_add_subordinate - adds a child to the given source
>> + * @parent: a pointer to the parent source
>> + * @child: a pointer to child source to add
>> + *
>> + * Recursively creates all files in the statsfs filesystem, but
>> + * only if the parent already has a dentry (created with
>> + * statsfs_source_register).
>> + * This handles the case where this function is called before register.
>> + */
>> +void statsfs_source_add_subordinate(struct statsfs_source *parent,
>> +				    struct statsfs_source *child);
>> +
>> +/**
>> + * statsfs_source_remove_subordinate - removes a child from the given source
>> + * @parent: a pointer to the parent source
>> + * @child: a pointer to child source to remove
>> + *
>> + * Checks whether the parent has such a child. If so, removes all of
>> + * the child's files and calls statsfs_source_put() on the child.
>> + */
>> +void statsfs_source_remove_subordinate(struct statsfs_source *parent,
>> +				       struct statsfs_source *child);
>> +
>> +/**
>> + * statsfs_source_get_value - search a value in the source (and
>> + * subordinates)
>> + * @source: a pointer to the source that will be searched
>> + * @val: a pointer to the statsfs_value to search
>> + * @ret: a pointer to the uint64_t that will hold the found value
>> + *
>> + * Looks up in the source whether a value with the same value pointer
>> + * exists.
>> + * If not, it returns -ENOENT. If it exists and it's a simple value
>> + * (not an aggregate), the value that it points to is returned.
>> + * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
>> + * subordinates are recursively searched and every simple value match
>> + * is used to aggregate the final result. For example, if it's a sum,
>> + * the matching values of all subordinates are summed together.
>> + *
>> + * This function returns 0 if it succeeds.
>> + */
>> +int statsfs_source_get_value(struct statsfs_source *source,
>> +			     struct statsfs_value *val, uint64_t *ret);
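
As an illustration of the aggregation kinds described above, here is a
userspace sketch that folds a flat array of matches into one result.
This is not the code from the patch: the recursion over subordinates
is replaced by an array, only unsigned values are handled, and both the
integer division for STATSFS_AVG and the result 0 for an empty set are
assumptions. demo_aggregate() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the quoted header so the sketch builds outside the kernel. */
enum stat_aggr {
	STATSFS_NONE = 0,
	STATSFS_SUM,
	STATSFS_MIN,
	STATSFS_MAX,
	STATSFS_COUNT_ZERO,
	STATSFS_AVG,
};

/* Fold n simple values into one aggregate of the requested kind. */
static uint64_t demo_aggregate(enum stat_aggr kind, const uint64_t *vals,
			       size_t n)
{
	uint64_t sum = 0, min = UINT64_MAX, max = 0, zeros = 0;
	size_t i;

	/* One pass collects everything any aggregate kind needs. */
	for (i = 0; i < n; i++) {
		sum += vals[i];
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
		if (vals[i] == 0)
			zeros++;
	}

	switch (kind) {
	case STATSFS_SUM:
		return sum;
	case STATSFS_MIN:
		return n ? min : 0;	/* assumption: empty set gives 0 */
	case STATSFS_MAX:
		return max;
	case STATSFS_COUNT_ZERO:
		return zeros;
	case STATSFS_AVG:
		return n ? sum / n : 0;	/* assumption: integer division */
	default:
		return 0;
	}
}
```

In the patch the values would come from every subordinate that holds a
matching simple value, for example one "exits" counter per vCPU summed
into a per-VM total.
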
>> +
>> +/**
>> + * statsfs_source_get_value_by_name - search a value in the source (and
>> + * subordinates)
>> + * @source: a pointer to the source that will be searched
>> + * @name: a pointer to the string representing the value to search
>> + *        (for example "exits")
>> + * @ret: a pointer to the uint64_t that will hold the found value
>> + *
>> + * Same as statsfs_source_get_value, but the given source is first
>> + * searched for a value with a matching name. If one is found,
>> + * statsfs_source_get_value is called with it; otherwise *ret is set
>> + * to 0 and -ENOENT is returned.
>> + */
>> +int statsfs_source_get_value_by_name(struct statsfs_source *source, char *name,
>> +				     uint64_t *ret);
>> +
>> +/**
>> + * statsfs_source_clear - searches and clears a value in the source (and
>> + * subordinates)
>> + * @source: a pointer to the source that will be searched
>> + * @val: a pointer to the statsfs_value to search
>> + *
>> + * Looks up in the source whether a value with the same value pointer
>> + * exists.
>> + * If not, it returns -ENOENT. If it exists and it's a simple value
>> + * (not an aggregate), the value that it points to is set to 0.
>> + * If it exists and it's an aggregate (aggr_kind != STATSFS_NONE), all
>> + * subordinates are recursively searched and every simple value match
>> + * is set to 0.
>> + *
>> + * This function returns 0 if it succeeds.
>> + */
>> +int statsfs_source_clear(struct statsfs_source *source,
>> +			 struct statsfs_value *val);
>> +
>> +/**
>> + * statsfs_source_revoke - disconnect the source from its backing data
>> + * @source: a pointer to the source that will be revoked
>> + *
>> + * Ensures that statsfs will not access the data that was passed to
>> + * statsfs_source_add_values for this source.
>> + *
>> + * Because open files increase the reference count for a statsfs_source,
>> + * the source can end up living longer than the data that provides the
>> + * values for the source.  Calling statsfs_source_revoke just before the
>> + * backing data is freed avoids accesses to freed data structures.  After
>> + * that, the values of the source read as 0.
>> + */
>> +void statsfs_source_revoke(struct statsfs_source *source);
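
The revoke semantics can be modelled in a few lines of userspace C.
This is only a sketch of the idea that a NULL base address detaches the
source from its data. The demo_ names are hypothetical, the struct is
reduced to a single u64, and the rwsem that the patch takes around both
operations is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct statsfs_value_source. */
struct demo_value_source {
	const uint64_t *base_addr;	/* NULL once the source is revoked */
};

/* Detach the source from its backing data, as revoke does per entry. */
static void demo_revoke(struct demo_value_source *src)
{
	src->base_addr = NULL;
}

/* Read the value; a revoked source reads as 0 instead of touching memory. */
static uint64_t demo_get(const struct demo_value_source *src)
{
	return src->base_addr ? *src->base_addr : 0;
}
```

The point of the pattern is ordering: the owner calls revoke first and
frees the backing data afterwards, so a file that is still open can
keep its reference on the source without ever dereferencing freed
memory.
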
>> +
>> +/**
>> + * statsfs_source_get - increases refcount of source
>> + * @source: a pointer to the source whose refcount will be increased
>> + */
>> +void statsfs_source_get(struct statsfs_source *source);
>> +
>> +/**
>> + * statsfs_source_put - decreases refcount of source and deletes if needed
>> + * @source: a pointer to the source whose refcount will be decreased
>> + *
>> + * If the refcount drops to zero, deletes and frees the source's
>> + * resources and files: first by recursively calling
>> + * statsfs_source_remove_subordinate on each child, then by deleting
>> + * the source's own files and allocations.
>> + */
>> +void statsfs_source_put(struct statsfs_source *source);
>> +
>> +/**
>> + * statsfs_initialized - returns true if statsfs fs has been registered
>> + */
>> +bool statsfs_initialized(void);
>> +
>> +#endif
>> --
>> 2.25.2
>>
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 



end of thread, other threads:[~2020-04-29 10:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2020-04-27 14:18 [RFC PATCH 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics Emanuele Giuseppe Esposito
2020-04-27 14:18 ` [RFC PATCH 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores Emanuele Giuseppe Esposito
2020-04-27 14:18 ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Emanuele Giuseppe Esposito
2020-04-27 15:47   ` Matthew Wilcox
2020-04-27 16:48     ` Emanuele Giuseppe Esposito
2020-04-29  9:49     ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs Emanuele Giuseppe Esposito
2020-04-27 21:53   ` [RFC PATCH 2/5] statsfs API: create, add and remove statsfs sources and values Andreas Dilger
2020-04-29 10:55     ` Emanuele Giuseppe Esposito
2020-04-28 17:47   ` Randy Dunlap
2020-04-29 10:34     ` Paolo Bonzini
2020-04-27 14:18 ` [RFC PATCH 3/5] kunit: tests for statsfs API Emanuele Giuseppe Esposito
2020-04-28 17:50   ` Randy Dunlap
2020-04-27 14:18 ` [RFC PATCH 4/5] statsfs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
2020-04-27 14:18 ` [RFC PATCH 5/5] kvm_main: replace debugfs with statsfs Emanuele Giuseppe Esposito
2020-04-28 17:56   ` Randy Dunlap
2020-04-29 10:34     ` Emanuele Giuseppe Esposito
2020-04-29 10:35       ` Paolo Bonzini
