linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
@ 2020-05-26 11:03 Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

There is currently no common way, shared throughout the Linux kernel, for
subsystems to expose statistics to userspace; each subsystem has to take
care of gathering and displaying statistics by itself, for example in the
form of files in debugfs. KVM, for instance, has its own code for this in
virt/kvm/kvm_main.c, where it sets up debugfs handlers for displaying
values and aggregating them from various subfolders to obtain information
about the system state (e.g. the total number of exits, calculated by
summing the exits of all vcpus of all running virtual machines).

Allowing each subsystem to do so has two disadvantages. First, it
introduces redundant code. Second, debugfs is not the right place for
statistics anyway (for example, it is affected by lockdown).

In this patch series I introduce statsfs, a synthetic ram-based virtual
filesystem that takes care of gathering and displaying statistics for the
Linux kernel subsystems.

The file system is mounted on /sys/kernel/stats and, with this series,
would already be used by KVM. Statsfs was initially introduced by Paolo
Bonzini [1].

Statsfs offers a generic and stable API, allowing any kind of
directory/file organization and supporting multiple kinds of aggregation
(not only sum, but also average, max, min and count_zero) and data types
(boolean, unsigned/signed and custom types). The implementation, which is
a generalization of KVM’s debugfs statistics code, takes care of gathering
and displaying information at run time; users only need to specify the
values to be included in each source.

Statsfs would also be a different mountpoint from debugfs, and would not
suffer from the limited access imposed by the security lockdown patches.
Its main function is to display each statistic as a file in the desired
folder hierarchy defined through the API. Statsfs files can be read, and
possibly cleared if their file mode allows it.

Statsfs has two main components: the public API defined by
include/linux/statsfs.h, and the virtual file system which should end up in
/sys/kernel/stats.

The API has two main elements, values and sources. Kernel subsystems like
KVM can use the API to create a source, add child sources/values/aggregates
and register it to the root source (which on the virtual fs corresponds to
/sys/kernel/stats).

Sources are created via statsfs_source_create(), and each source becomes a
directory in the file system. Sources form a parent-child relationship;
root sources are added to the file system via statsfs_source_register().
Every other source is added to or removed from a parent through the
statsfs_source_add_subordinate and statsfs_source_remove_subordinate APIs.
Once a source is created and added to the tree (via add_subordinate), it
will be used to compute aggregate values in the parent source.
A source can optionally be hidden from the filesystem
but still considered in the aggregation operations if the corresponding
flag is set during initialization.
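
As a rough sketch (error handling omitted, names such as vm_pid are just
placeholders), the source side of the API looks like this:

        struct stats_fs_source *vm_src, *vcpu_src;

        /* each source becomes a directory in the file system */
        vm_src = stats_fs_source_create(0, "kvm-%d", vm_pid);
        stats_fs_source_register(vm_src);  /* now visible under /sys/kernel/stats */

        /* hidden from the filesystem, but still part of the parent's aggregation */
        vcpu_src = stats_fs_source_create(STATS_FS_HIDDEN, "vcpu0");
        stats_fs_source_add_subordinate(vm_src, vcpu_src);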

Values represent quantities that are gathered by the statsfs user. Examples
of values include the number of vm exits of a given kind, the amount of
memory used by some data structure, the length of the longest hash table
chain, or anything like that. Values are defined with the
statsfs_source_add_values function. Each value is defined by a struct
statsfs_value; the same statsfs_value can be added to many different
sources. A value can be considered "simple" if it fetches data from a
user-provided location, or "aggregate" if it groups all values in the
subordinate sources that include the same statsfs_value.
Each value has a stats_fs_type pointer in order to allow the user to
provide custom get and clear functions. The library, however, also
exports default stats_fs_type structs for the standard types
(all unsigned and signed types plus boolean).
A value can also provide a show function that takes care
of displaying the value in a custom string format. This can be especially
useful when displaying enums.
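
For example, continuing the sketch above (struct vcpu_stats and vcpu are
placeholder names), a value array and its registration could look like:

        struct vcpu_stats {
                uint64_t exits;
        };

        struct stats_fs_value vcpu_stat_values[] = {
                { "exits", offsetof(struct vcpu_stats, exits),
                  &stats_fs_type_u64, STATS_FS_SUM },
                { NULL }
        };

        /* simple value: fetched from vcpu->stats.exits through the base pointer */
        stats_fs_source_add_values(vcpu_src, vcpu_stat_values, &vcpu->stats, 0);
        /* aggregate: NULL base pointer, sums "exits" over all subordinates */
        stats_fs_source_add_values(vm_src, vcpu_stat_values, NULL, 0);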

For more information, please consult the kerneldoc documentation in patch 2
and the sample uses in the kunit tests, KVM and networking.

This series of patches is based on my previous series "libfs: group and
simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
VM_STAT and VCPU_STAT definitions in a single place". The former simplifies
code duplicated in debugfs and tracefs (on which statsfs is based); the
latter groups all macro definitions for statistics in kvm in a single
common file shared by all architectures.

Patch 1 introduces the statsfs API, patch 2 adds its documentation, patch 3
provides extensive tests that can also be used as examples of how to use
the API, and patch 4 adds the file system support. Finally, patch 5
provides a real-life example of statsfs usage in KVM, with patch 6
providing a concrete example of the show function and patch 7 another
real-life example in the networking subsystem.

[1] https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa366d@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

v2 -> v3: move the kconfig entry into the pseudo filesystems menu, add
documentation, get/clear functions for value types, a show function, and
the floating/cumulative and hidden flags. Also added the netstat example.

Emanuele Giuseppe Esposito (7):
  stats_fs API: create, add and remove stats_fs sources and values
  documentation for stats_fs
  kunit: tests for stats_fs API
  stats_fs fs: virtual fs to show stats to the end-user
  kvm_main: replace debugfs with stats_fs
  [not for merge] kvm: example of stats_fs_value show function
  [not for merge] netstats: example use of stats_fs API

 Documentation/filesystems/index.rst    |    1 +
 Documentation/filesystems/stats_fs.rst |  222 +++++
 MAINTAINERS                            |    7 +
 arch/arm64/kvm/Kconfig                 |    1 +
 arch/arm64/kvm/guest.c                 |    2 +-
 arch/mips/kvm/Kconfig                  |    1 +
 arch/mips/kvm/mips.c                   |    2 +-
 arch/powerpc/kvm/Kconfig               |    1 +
 arch/powerpc/kvm/book3s.c              |   12 +-
 arch/powerpc/kvm/booke.c               |    8 +-
 arch/s390/kvm/Kconfig                  |    1 +
 arch/s390/kvm/kvm-s390.c               |   16 +-
 arch/x86/include/asm/kvm_host.h        |    2 +-
 arch/x86/kvm/Kconfig                   |    1 +
 arch/x86/kvm/Makefile                  |    2 +-
 arch/x86/kvm/debugfs.c                 |   64 --
 arch/x86/kvm/stats_fs.c                |  114 +++
 arch/x86/kvm/x86.c                     |   11 +-
 fs/Kconfig                             |   20 +
 fs/Makefile                            |    1 +
 fs/stats_fs/Makefile                   |    7 +
 fs/stats_fs/inode.c                    |  461 ++++++++++
 fs/stats_fs/internal.h                 |   34 +
 fs/stats_fs/stats_fs-tests.c           | 1097 ++++++++++++++++++++++++
 fs/stats_fs/stats_fs.c                 |  642 ++++++++++++++
 fs/stats_fs/stub.c                     |   13 +
 include/linux/kvm_host.h               |   45 +-
 include/linux/netdevice.h              |    2 +
 include/linux/stats_fs.h               |  381 ++++++++
 include/uapi/linux/magic.h             |    1 +
 net/Kconfig                            |    1 +
 net/core/dev.c                         |   68 ++
 tools/lib/api/fs/fs.c                  |   21 +
 virt/kvm/arm/arm.c                     |    2 +-
 virt/kvm/kvm_main.c                    |  317 +------
 35 files changed, 3193 insertions(+), 388 deletions(-)
 create mode 100644 Documentation/filesystems/stats_fs.rst
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/stats_fs.c
 create mode 100644 fs/stats_fs/Makefile
 create mode 100644 fs/stats_fs/inode.c
 create mode 100644 fs/stats_fs/internal.h
 create mode 100644 fs/stats_fs/stats_fs-tests.c
 create mode 100644 fs/stats_fs/stats_fs.c
 create mode 100644 fs/stats_fs/stub.c
 create mode 100644 include/linux/stats_fs.h

-- 
2.25.4



* [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 2/7] documentation for stats_fs Emanuele Giuseppe Esposito
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Introduce the stats_fs API, which allows to easily create, add and remove
stats_fs sources and values. The API makes it easy to build the statistics
directory tree and automatically gather statistics for the Linux kernel.
The main functionalities are: create a source, add child
sources/values/aggregates, register it to the root source (which on the
virtual fs would be /sys/kernel/stats), and perform a search for a
value/aggregate.

Each source and value has an optional flag parameter:
in a value, it represents whether the statistic is cumulative or floating;
in a source, whether it should be visible in the filesystem or not.
The defaults are, respectively, cumulative and visible.
Both flag fields are represented as a uint32_t to leave room for
future flags.

Each value also takes a struct stats_fs_type pointer that defines the
get and clear functions for that stat, allowing custom type handling.
The API also provides default types for
the supported standard types (stats_fs_type_*).
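
For instance, a custom type for an atomic64_t counter could look roughly
like this (just a sketch, the names are made up; requires <linux/atomic.h>):

        static uint64_t example_get_atomic64(struct stats_fs_value *val, void *base)
        {
                return atomic64_read((atomic64_t *)(base + (uintptr_t)val->offset));
        }

        static void example_clear_atomic64(struct stats_fs_value *val, void *base)
        {
                atomic64_set((atomic64_t *)(base + (uintptr_t)val->offset), 0);
        }

        static const struct stats_fs_type example_type_atomic64 = {
                .get = example_get_atomic64,
                .clear = example_clear_atomic64,
                .sign = false,
        };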

The API representation is only logical and will be backed
by a virtual file system in patch 4.
Its usage will be shared between the stats_fs file system
and end users like kvm: the former calls it when it needs to
display and clear statistics, the latter to add values and sources.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 MAINTAINERS              |   7 +
 fs/Kconfig               |  14 +
 fs/Makefile              |   1 +
 fs/stats_fs/Makefile     |   5 +
 fs/stats_fs/internal.h   |  19 ++
 fs/stats_fs/stats_fs.c   | 552 +++++++++++++++++++++++++++++++++++++++
 fs/stats_fs/stub.c       |  13 +
 include/linux/stats_fs.h | 363 +++++++++++++++++++++++++
 8 files changed, 974 insertions(+)
 create mode 100644 fs/stats_fs/Makefile
 create mode 100644 fs/stats_fs/internal.h
 create mode 100644 fs/stats_fs/stats_fs.c
 create mode 100644 fs/stats_fs/stub.c
 create mode 100644 include/linux/stats_fs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b816a453b10e..a8403d07cee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5229,6 +5229,13 @@ F:	include/linux/debugfs.h
 F:	include/linux/kobj*
 F:	lib/kobj*
 
+STATS_FS
+M:	Paolo Bonzini <pbonzini@redhat.com>
+R:	Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+S:	Supported
+F:	include/linux/stats_fs.h
+F:	fs/stats_fs
+
 DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS)
 M:	Kevin Hilman <khilman@kernel.org>
 M:	Nishanth Menon <nm@ti.com>
diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..684ad61129ab 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -221,6 +221,20 @@ config MEMFD_CREATE
 config ARCH_HAS_GIGANTIC_PAGE
 	bool
 
+config STATS_FS
+	bool "Statistics Filesystem"
+	help
+	  stats_fs is a virtual file system that provides counters and
+	  other statistics about the running kernel.
+
+config STATS_FS_API
+	bool
+	imply STATS_FS
+
+config STATS_FS_STUB
+	bool
+	default y if STATS_FS_API && !STATS_FS
+
 source "fs/configfs/Kconfig"
 source "fs/efivarfs/Kconfig"
 
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..91558eca0cf7 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)		+= befs/
 obj-$(CONFIG_HOSTFS)		+= hostfs/
 obj-$(CONFIG_CACHEFILES)	+= cachefiles/
 obj-$(CONFIG_DEBUG_FS)		+= debugfs/
+obj-$(CONFIG_STATS_FS)		+= stats_fs/
 obj-$(CONFIG_TRACING)		+= tracefs/
 obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
 obj-$(CONFIG_BTRFS_FS)		+= btrfs/
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
new file mode 100644
index 000000000000..bd988daa4c39
--- /dev/null
+++ b/fs/stats_fs/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+stats_fs-objs	:= stats_fs.o
+
+obj-$(CONFIG_STATS_FS)	    += stats_fs.o
+obj-$(CONFIG_STATS_FS_STUB) += stub.o
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
new file mode 100644
index 000000000000..4993afbb1e45
--- /dev/null
+++ b/fs/stats_fs/internal.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _STATS_FS_INTERNAL_H_
+#define _STATS_FS_INTERNAL_H_
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/rwsem.h>
+#include <linux/stats_fs.h>
+
+/* values, grouped by base */
+struct stats_fs_value_source {
+	void *base_addr;
+	bool files_created;
+	uint32_t common_flags;
+	struct stats_fs_value *values;
+	struct list_head list_element;
+};
+
+#endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
new file mode 100644
index 000000000000..b76ee44f6dac
--- /dev/null
+++ b/fs/stats_fs/stats_fs.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+
+#include "internal.h"
+
+struct stats_fs_aggregate_value {
+	uint64_t sum, min, max;
+	uint32_t count, count_zero;
+};
+
+#define STATS_FS_DEFINE_TYPE_STRUCT(gtype, stype, si)                          \
+	const struct stats_fs_type stats_fs_type_##gtype =		       \
+		{                                                              \
+			.get = stats_fs_get_##gtype,			       \
+			.clear = stats_fs_clear_##stype,		       \
+			.sign = si,                                            \
+		};							       \
+	EXPORT_SYMBOL_GPL(stats_fs_type_##gtype);
+
+#define STATS_FS_TYPE_STRUCT_US(len)				               \
+	STATS_FS_DEFINE_TYPE_STRUCT(u##len, len, false)			       \
+	STATS_FS_DEFINE_TYPE_STRUCT(s##len, len, true)
+
+#define STATS_FS_TYPE_STRUCT(type)				               \
+	STATS_FS_DEFINE_TYPE_STRUCT(type, type, false)
+
+STATS_FS_TYPE_STRUCT_US(8)
+STATS_FS_TYPE_STRUCT_US(16)
+STATS_FS_TYPE_STRUCT_US(32)
+STATS_FS_TYPE_STRUCT_US(64)
+STATS_FS_TYPE_STRUCT(bool)
+
+static int is_val_signed(struct stats_fs_value *val)
+{
+	return val->type->sign;
+}
+
+static struct stats_fs_value *find_value(struct stats_fs_value_source *src,
+					 struct stats_fs_value *val)
+{
+	struct stats_fs_value *entry;
+
+	for (entry = src->values; entry->name; entry++) {
+		if (entry == val)
+			return entry;
+	}
+	return NULL;
+}
+
+static struct stats_fs_value *
+search_value_in_source(struct stats_fs_source *src, struct stats_fs_value *arg,
+		       struct stats_fs_value_source **val_src)
+{
+	struct stats_fs_value *entry;
+	struct stats_fs_value_source *src_entry;
+
+	list_for_each_entry (src_entry, &src->values_head, list_element) {
+		entry = find_value(src_entry, arg);
+		if (entry) {
+			*val_src = src_entry;
+			return entry;
+		}
+	}
+
+	return NULL;
+}
+
+/* Called with rwsem held for writing */
+static struct stats_fs_value_source *create_value_source(void *base, uint32_t flags)
+{
+	struct stats_fs_value_source *val_src;
+
+	val_src = kzalloc(sizeof(struct stats_fs_value_source), GFP_KERNEL);
+	if (!val_src)
+		return ERR_PTR(-ENOMEM);
+
+	val_src->base_addr = base;
+	val_src->common_flags = flags;
+	INIT_LIST_HEAD(&val_src->list_element);
+
+	return val_src;
+}
+
+int stats_fs_source_add_values(struct stats_fs_source *source,
+			       struct stats_fs_value *stat, void *ptr,
+			       uint32_t flags)
+{
+	struct stats_fs_value_source *val_src;
+	struct stats_fs_value_source *entry;
+
+	down_write(&source->rwsem);
+
+	list_for_each_entry (entry, &source->values_head, list_element) {
+		if (entry->base_addr == ptr && entry->values == stat) {
+			up_write(&source->rwsem);
+			return -EEXIST;
+		}
+	}
+
+	val_src = create_value_source(ptr, flags);
+	if (IS_ERR(val_src)) {
+		up_write(&source->rwsem);
+		return PTR_ERR(val_src);
+	}
+
+	val_src->values = (struct stats_fs_value *)stat;
+
+	/* add the val_src to the source list */
+	list_add(&val_src->list_element, &source->values_head);
+
+	up_write(&source->rwsem);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_add_values);
+
+void stats_fs_source_add_subordinate(struct stats_fs_source *source,
+				     struct stats_fs_source *sub)
+{
+	down_write(&source->rwsem);
+
+	stats_fs_source_get(sub);
+	list_add(&sub->list_element, &source->subordinates_head);
+
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_add_subordinate);
+
+/* Called with rwsem held for writing */
+static void
+stats_fs_source_remove_subordinate_locked(struct stats_fs_source *source,
+					  struct stats_fs_source *sub)
+{
+	struct stats_fs_source *src_entry;
+
+	list_for_each_entry (src_entry, &source->subordinates_head,
+			     list_element) {
+		if (src_entry == sub) {
+			list_del_init(&src_entry->list_element);
+			stats_fs_source_put(src_entry);
+			return;
+		}
+	}
+}
+
+void stats_fs_source_remove_subordinate(struct stats_fs_source *source,
+					struct stats_fs_source *sub)
+{
+	down_write(&source->rwsem);
+	stats_fs_source_remove_subordinate_locked(source, sub);
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_remove_subordinate);
+
+/* Called with rwsem held for reading */
+static uint64_t get_simple_value(struct stats_fs_value_source *src,
+				 struct stats_fs_value *val)
+{
+	if (val->type->get)
+		return val->type->get(val, src->base_addr);
+	return 0;
+}
+
+/* Called with rwsem held for reading */
+static void
+search_all_simple_values(struct stats_fs_source *src,
+			 struct stats_fs_value_source *ref_src_entry,
+			 struct stats_fs_value *val,
+			 struct stats_fs_aggregate_value *agg)
+{
+	struct stats_fs_value_source *src_entry;
+	uint64_t value_found;
+
+	list_for_each_entry (src_entry, &src->values_head, list_element) {
+		/* skip aggregates */
+		if (src_entry->base_addr == NULL)
+			continue;
+
+		/* useless to search here */
+		if (src_entry->values != ref_src_entry->values)
+			continue;
+
+		/* must be here */
+		value_found = get_simple_value(src_entry, val);
+
+		agg->sum += value_found;
+		agg->count++;
+		agg->count_zero += (value_found == 0);
+
+		if (is_val_signed(val)) {
+			agg->max = (((int64_t)value_found) >=
+				    ((int64_t)agg->max)) ?
+					   value_found :
+					   agg->max;
+			agg->min = (((int64_t)value_found) <=
+				    ((int64_t)agg->min)) ?
+					   value_found :
+					   agg->min;
+		} else {
+			agg->max = (value_found >= agg->max) ? value_found :
+							       agg->max;
+			agg->min = (value_found <= agg->min) ? value_found :
+							       agg->min;
+		}
+	}
+}
+
+/* Called with rwsem held for reading */
+static void
+do_recursive_aggregation(struct stats_fs_source *root,
+			 struct stats_fs_value_source *ref_src_entry,
+			 struct stats_fs_value *val,
+			 struct stats_fs_aggregate_value *agg)
+{
+	struct stats_fs_source *subordinate;
+
+	/* search all simple values in this folder */
+	search_all_simple_values(root, ref_src_entry, val, agg);
+
+	/* recursively search in all subfolders */
+	list_for_each_entry (subordinate, &root->subordinates_head,
+			     list_element) {
+		down_read(&subordinate->rwsem);
+		do_recursive_aggregation(subordinate, ref_src_entry, val, agg);
+		up_read(&subordinate->rwsem);
+	}
+}
+
+/* Called with rwsem held for reading */
+static void init_aggregate_value(struct stats_fs_aggregate_value *agg,
+				 struct stats_fs_value *val)
+{
+	agg->count = agg->count_zero = agg->sum = 0;
+	if (is_val_signed(val)) {
+		agg->max = S64_MIN;
+		agg->min = S64_MAX;
+	} else {
+		agg->max = 0;
+		agg->min = U64_MAX;
+	}
+}
+
+/* Called with rwsem held for reading */
+static void store_final_value(struct stats_fs_aggregate_value *agg,
+			      struct stats_fs_value *val, uint64_t *ret)
+{
+	switch (val->aggr_kind) {
+	case STATS_FS_AVG:{
+		if (is_val_signed(val))
+			*ret = agg->count ? ((int64_t)agg->sum) / agg->count : 0;
+		else
+			*ret = agg->count ? agg->sum / agg->count : 0;
+		break;
+	}
+	case STATS_FS_SUM:
+		*ret = agg->sum;
+		break;
+	case STATS_FS_MIN:
+		*ret = agg->min;
+		break;
+	case STATS_FS_MAX:
+		*ret = agg->max;
+		break;
+	case STATS_FS_COUNT_ZERO:
+		*ret = agg->count_zero;
+		break;
+	default:
+		break;
+	}
+}
+
+/* Called with rwsem held for reading */
+static int stats_fs_source_get_value_locked(struct stats_fs_source *source,
+					    struct stats_fs_value *arg,
+					    uint64_t *ret)
+{
+	struct stats_fs_value_source *src_entry;
+	struct stats_fs_value *found;
+	struct stats_fs_aggregate_value aggr;
+
+	*ret = 0;
+
+	if (!arg)
+		return -ENOENT;
+
+	/* look in simple values */
+	found = search_value_in_source(source, arg, &src_entry);
+
+	if (!found) {
+		printk(KERN_ERR "Stats_fs: Value in source \"%s\" not found!\n",
+		       source->name);
+		return -ENOENT;
+	}
+
+	if (src_entry->base_addr != NULL) {
+		*ret = get_simple_value(src_entry, found);
+		return 0;
+	}
+
+	/* look in aggregates */
+	init_aggregate_value(&aggr, found);
+	do_recursive_aggregation(source, src_entry, found, &aggr);
+	store_final_value(&aggr, found, ret);
+
+	return 0;
+}
+
+int stats_fs_source_get_value(struct stats_fs_source *source,
+			      struct stats_fs_value *arg, uint64_t *ret)
+{
+	int retval;
+
+	down_read(&source->rwsem);
+	retval = stats_fs_source_get_value_locked(source, arg, ret);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get_value);
+
+/* Called with rwsem held for reading */
+static void set_all_simple_values(struct stats_fs_source *src,
+				  struct stats_fs_value_source *ref_src_entry,
+				  struct stats_fs_value *val)
+{
+	struct stats_fs_value_source *src_entry;
+
+	list_for_each_entry (src_entry, &src->values_head, list_element) {
+		/* skip aggregates */
+		if (src_entry->base_addr == NULL)
+			continue;
+
+		/* only consider values from the same stats_fs_value array */
+		if (src_entry->values != ref_src_entry->values)
+			continue;
+
+		if (val->type->clear)
+			val->type->clear(val, src_entry->base_addr);
+	}
+}
+
+/* Called with rwsem held for reading */
+static void do_recursive_clean(struct stats_fs_source *root,
+			       struct stats_fs_value_source *ref_src_entry,
+			       struct stats_fs_value *val)
+{
+	struct stats_fs_source *subordinate;
+
+	/* search all simple values in this folder */
+	set_all_simple_values(root, ref_src_entry, val);
+
+	/* recursively search in all subfolders */
+	list_for_each_entry (subordinate, &root->subordinates_head,
+			     list_element) {
+		down_read(&subordinate->rwsem);
+		do_recursive_clean(subordinate, ref_src_entry, val);
+		up_read(&subordinate->rwsem);
+	}
+}
+
+/* Called with rwsem held for reading */
+static int stats_fs_source_clear_locked(struct stats_fs_source *source,
+					struct stats_fs_value *val)
+{
+	struct stats_fs_value_source *src_entry;
+	struct stats_fs_value *found;
+
+	if (!val)
+		return -ENOENT;
+
+	/* look in simple values */
+	found = search_value_in_source(source, val, &src_entry);
+
+	if (!found) {
+		printk(KERN_ERR "Stats_fs: Value in source \"%s\" not found!\n",
+		       source->name);
+		return -ENOENT;
+	}
+
+	if (!(stats_fs_val_get_mode(val) & 0222))
+		return -EPERM;
+
+	if (src_entry->base_addr != NULL && found->type->clear) {
+		found->type->clear(found, src_entry->base_addr);
+		return 0;
+	}
+
+	/* look in aggregates */
+	do_recursive_clean(source, src_entry, found);
+
+	return 0;
+}
+
+int stats_fs_source_clear(struct stats_fs_source *source,
+			  struct stats_fs_value *val)
+{
+	int retval;
+
+	down_read(&source->rwsem);
+	retval = stats_fs_source_clear_locked(source, val);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+
+/* Called with rwsem held for reading */
+static struct stats_fs_value *
+find_value_by_name(struct stats_fs_value_source *src, char *val)
+{
+	struct stats_fs_value *entry;
+
+	for (entry = src->values; entry->name; entry++)
+		if (!strcmp(entry->name, val))
+			return entry;
+
+	return NULL;
+}
+
+/* Called with rwsem held for reading */
+static struct stats_fs_value *
+search_in_source_by_name(struct stats_fs_source *src, char *name)
+{
+	struct stats_fs_value *entry;
+	struct stats_fs_value_source *src_entry;
+
+	list_for_each_entry (src_entry, &src->values_head, list_element) {
+		entry = find_value_by_name(src_entry, name);
+		if (entry)
+			return entry;
+	}
+
+	return NULL;
+}
+
+int stats_fs_source_get_value_by_name(struct stats_fs_source *source,
+				      char *name, uint64_t *ret)
+{
+	struct stats_fs_value *val;
+	int retval;
+
+	down_read(&source->rwsem);
+	val = search_in_source_by_name(source, name);
+
+	if (!val) {
+		*ret = 0;
+		up_read(&source->rwsem);
+		return -ENOENT;
+	}
+
+	retval = stats_fs_source_get_value_locked(source, val, ret);
+	up_read(&source->rwsem);
+
+	return retval;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get_value_by_name);
+
+void stats_fs_source_get(struct stats_fs_source *source)
+{
+	kref_get(&source->refcount);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_get);
+
+void stats_fs_source_revoke(struct stats_fs_source *source)
+{
+	struct stats_fs_value_source *val_src_entry;
+
+	down_write(&source->rwsem);
+
+	list_for_each_entry (val_src_entry, &source->values_head, list_element)
+		val_src_entry->base_addr = NULL;
+
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_revoke);
+
+/* Called with rwsem held for writing
+ *
+ * The refcount is 0 and the lock was taken before refcount
+ * went from 1 to 0
+ */
+static void stats_fs_source_destroy(struct kref *kref_source)
+{
+	struct stats_fs_value_source *val_src_entry;
+	struct list_head *it, *safe;
+	struct stats_fs_source *child, *source;
+
+	source = container_of(kref_source, struct stats_fs_source, refcount);
+
+	/* iterate through the values and delete them */
+	list_for_each_safe (it, safe, &source->values_head) {
+		val_src_entry = list_entry(it, struct stats_fs_value_source,
+					   list_element);
+		kfree(val_src_entry);
+	}
+
+	/* iterate through the subordinates and delete them */
+	list_for_each_safe (it, safe, &source->subordinates_head) {
+		child = list_entry(it, struct stats_fs_source, list_element);
+		stats_fs_source_remove_subordinate_locked(source, child);
+	}
+
+	up_write(&source->rwsem);
+	kfree(source->name);
+	kfree(source);
+}
+
+void stats_fs_source_put(struct stats_fs_source *source)
+{
+	kref_put_rwsem(&source->refcount, stats_fs_source_destroy,
+		       &source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_put);
+
+struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt, ...)
+{
+	va_list ap;
+	char buf[100];
+	struct stats_fs_source *ret;
+	int char_needed;
+
+	va_start(ap, fmt);
+	char_needed = vsnprintf(buf, 100, fmt, ap);
+	va_end(ap);
+
+	ret = kzalloc(sizeof(struct stats_fs_source), GFP_KERNEL);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+
+	ret->name = kstrdup(buf, GFP_KERNEL);
+	if (!ret->name) {
+		kfree(ret);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret->common_flags = flags;
+
+	kref_init(&ret->refcount);
+	init_rwsem(&ret->rwsem);
+
+	INIT_LIST_HEAD(&ret->values_head);
+	INIT_LIST_HEAD(&ret->subordinates_head);
+	INIT_LIST_HEAD(&ret->list_element);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_create);
diff --git a/fs/stats_fs/stub.c b/fs/stats_fs/stub.c
new file mode 100644
index 000000000000..0843b58ad3be
--- /dev/null
+++ b/fs/stats_fs/stub.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+
+const struct stats_fs_type stats_fs_type_stub;
diff --git a/include/linux/stats_fs.h b/include/linux/stats_fs.h
new file mode 100644
index 000000000000..93847383f597
--- /dev/null
+++ b/include/linux/stats_fs.h
@@ -0,0 +1,363 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ *  stats_fs.h - a tiny little statistics file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito
+ *  Copyright (C) 2020 Redhat.
+ *
+ */
+
+#ifndef _STATS_FS_H_
+#define _STATS_FS_H_
+
+#include <linux/list.h>
+
+enum stat_aggr {
+	STATS_FS_NONE = 0,
+	STATS_FS_SUM,
+	STATS_FS_MIN,
+	STATS_FS_MAX,
+	STATS_FS_COUNT_ZERO,
+	STATS_FS_AVG,
+};
+
+/* flags used by source and internal source_value structure */
+enum source_value_common_flags {
+	STATS_FS_HIDDEN = 0x1,
+};
+
+enum stat_fs_value_flag {
+	STATS_FS_FLOATING_VALUE = 0x1,
+};
+
+struct stats_fs_value;
+
+struct stats_fs_type {
+	uint64_t (*get)(struct stats_fs_value *, void *);
+	void (*clear)(struct stats_fs_value *, void *);
+	bool sign;
+};
+
+struct stats_fs_value {
+	/* Name of the stat */
+	char *name;
+
+	/* Offset from base address to field containing the value */
+	int offset;
+
+	/* Type of the stat BOOL,U64,... */
+	const struct stats_fs_type *type;
+
+	/* Aggregate type: MIN, MAX, SUM,... */
+	enum stat_aggr aggr_kind;
+
+	uint32_t value_flag;
+};
+
+struct stats_fs_source {
+	struct kref refcount;
+
+	char *name;
+
+	uint32_t common_flags;
+
+	/* list of source stats_fs_value_source*/
+	struct list_head values_head;
+
+	/* list of struct stats_fs_source for subordinate sources */
+	struct list_head subordinates_head;
+
+	struct list_head list_element;
+
+	struct rw_semaphore rwsem;
+
+	struct dentry *source_dentry;
+};
+
+static inline int stats_fs_val_get_mode(struct stats_fs_value *val)
+{
+	return (val->value_flag & STATS_FS_FLOATING_VALUE || !val->type->clear)
+		? 0444 : 0644;
+}
+
+#define STATS_FS_DEFINE_GET(name, type)					       \
+	static inline uint64_t stats_fs_get_##name(struct stats_fs_value *val, \
+						   void *base)                 \
+	{                                                                      \
+		return *((type *)(base + (uintptr_t)val->offset));	       \
+	}
+
+#define STATS_FS_DEFINE_CLEAR(name, type)                                      \
+	static inline void stats_fs_clear_##name(struct stats_fs_value *val,   \
+						 void *base)                   \
+	{                                                                      \
+		*((type *)(base + (uintptr_t)val->offset)) = 0;                \
+	}
+
+#define STATS_FS_DEFINE_FUNCT_US(len)					       \
+	STATS_FS_DEFINE_GET(u##len, u##len)				       \
+	STATS_FS_DEFINE_GET(s##len, s##len)				       \
+	STATS_FS_DEFINE_CLEAR(len, u##len)
+
+#define STATS_FS_DEFINE_FUNCT(type)					       \
+	STATS_FS_DEFINE_GET(type, type)					       \
+	STATS_FS_DEFINE_CLEAR(type, type)
+
+STATS_FS_DEFINE_FUNCT_US(8)
+STATS_FS_DEFINE_FUNCT_US(16)
+STATS_FS_DEFINE_FUNCT_US(32)
+STATS_FS_DEFINE_FUNCT_US(64)
+STATS_FS_DEFINE_FUNCT(bool)
+
+#undef STATS_FS_DEFINE_FUNCT
+#undef STATS_FS_DEFINE_FUNCT_US
+#undef STATS_FS_DEFINE_GET
+#undef STATS_FS_DEFINE_CLEAR
+
+#if defined(CONFIG_STATS_FS)
+
+extern const struct stats_fs_type stats_fs_type_u8;
+extern const struct stats_fs_type stats_fs_type_s8;
+extern const struct stats_fs_type stats_fs_type_u16;
+extern const struct stats_fs_type stats_fs_type_s16;
+extern const struct stats_fs_type stats_fs_type_u32;
+extern const struct stats_fs_type stats_fs_type_s32;
+extern const struct stats_fs_type stats_fs_type_u64;
+extern const struct stats_fs_type stats_fs_type_s64;
+extern const struct stats_fs_type stats_fs_type_bool;
+
+/**
+ * stats_fs_source_create - create a stats_fs_source
+ * @flags: an integer containing all source flags (STATS_FS_HIDDEN, ...)
+ * @fmt: source name format
+ *
+ * Creates a stats_fs_source with the given name. This
+ * does not mean it will be backed by the filesystem yet, it will only
+ * be visible to the user once one of its parents (or itself) are
+ * registered in stats_fs.
+ *
+ * Returns a pointer to a stats_fs_source if it succeeds.
+ * This pointer (or one of its parents') must be passed to the
+ * stats_fs_source_put() function when the source is to be removed.  If an
+ * error occurs, ERR_PTR(-ERROR) will be returned.
+ */
+struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt,
+					       ...);
+
+/**
+ * stats_fs_source_add_values - adds values to the given source
+ * @source: a pointer to the source that will receive the values
+ * @val: a pointer to the NULL terminated stats_fs_value array to add
+ * @base_ptr: a pointer to the base pointer used by these values
+ * @flags: an integer containing common value flags (STATS_FS_HIDDEN, ...)
+ *
+ * In addition to adding the values to the source, this also creates the
+ * corresponding files in the filesystem if the source is already backed by
+ * a directory.
+ *
+ * Returns 0 if it succeeds. If the values are already in the
+ * source with the same base_ptr, -EEXIST is returned.
+ */
+int stats_fs_source_add_values(struct stats_fs_source *source,
+			       struct stats_fs_value *val, void *base_ptr,
+			       uint32_t flags);
+
+/**
+ * stats_fs_source_add_subordinate - adds a child to the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to add
+ *
+ * Recursively creates all files in the stats_fs filesystem,
+ * but only if the parent already has a dentry (created with
+ * stats_fs_source_register).
+ * This covers the case where this function is called before register.
+ */
+void stats_fs_source_add_subordinate(struct stats_fs_source *parent,
+				     struct stats_fs_source *child);
+
+/**
+ * stats_fs_source_remove_subordinate - removes a child from the given source
+ * @parent: a pointer to the parent source
+ * @child: a pointer to child source to remove
+ *
+ * Looks for the given child in the parent. If found, all of the child's
+ * files are removed and stats_fs_source_put is called on the child.
+ */
+void stats_fs_source_remove_subordinate(struct stats_fs_source *parent,
+					struct stats_fs_source *child);
+
+/**
+ * stats_fs_source_get_value - search a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the stats_fs_value to search
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Looks up in the source whether a value with the same value pointer
+ * exists.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be returned.
+ * If it exists and it's an aggregate (aggr_kind != STATS_FS_NONE), all
+ * subordinates will be recursively searched and every matching simple value
+ * will be used to aggregate the final result. For example, if it's a sum,
+ * all subordinates having the same value will be summed together.
+ *
+ * This function returns 0 if it succeeds.
+ */
+int stats_fs_source_get_value(struct stats_fs_source *source,
+			      struct stats_fs_value *val, uint64_t *ret);
+
+/**
+ * stats_fs_source_get_value_by_name - search a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @name: a pointer to the string representing the value to search
+ *        (for example "exits")
+ * @ret: a pointer to the uint64_t that will hold the found value
+ *
+ * Same as stats_fs_source_get_value, but initially the name is used
+ * to search in the given source if there is a value with a matching
+ * name. If so, stats_fs_source_get_value will be called with the found
+ * value, otherwise -ENOENT will be returned.
+ */
+int stats_fs_source_get_value_by_name(struct stats_fs_source *source,
+				      char *name, uint64_t *ret);
+
+/**
+ * stats_fs_source_clear - search and clears a value in the source (and
+ * subordinates)
+ * @source: a pointer to the source that will be searched
+ * @val: a pointer to the stats_fs_value to search
+ *
+ * Looks up in the source whether a value with the same value pointer
+ * exists.
+ * If not, it will return -ENOENT. If it exists and it's a simple value
+ * (not an aggregate), the value that it points to will be set to 0.
+ * If it exists and it's an aggregate (aggr_kind != STATS_FS_NONE), all
+ * subordinates will be recursively searched and every matching simple value
+ * will be set to 0.
+ *
+ * This function returns 0 if it succeeds.
+ */
+int stats_fs_source_clear(struct stats_fs_source *source,
+			  struct stats_fs_value *val);
+
+/**
+ * stats_fs_source_revoke - disconnect the source from its backing data
+ * @source: a pointer to the source that will be revoked
+ *
+ * Ensure that stats_fs will not access the data that were passed to
+ * stats_fs_source_add_values for this source.
+ *
+ * Because open files increase the reference count for a stats_fs_source,
+ * the source can end up living longer than the data that provides the
+ * values for the source.  Calling stats_fs_source_revoke just before the
+ * backing data is freed avoids accesses to freed data structures.  The
+ * sources will return 0.
+ */
+void stats_fs_source_revoke(struct stats_fs_source *source);
+
+/**
+ * stats_fs_source_get - increases refcount of source
+ * @source: a pointer to the source whose refcount will be increased
+ */
+void stats_fs_source_get(struct stats_fs_source *source);
+
+/**
+ * stats_fs_source_put - decreases refcount of source and deletes if needed
+ * @source: a pointer to the source whose refcount will be decreased
+ *
+ * If the refcount reaches zero, the source's resources and files are
+ * deleted and freed, by first recursively calling
+ * stats_fs_source_remove_subordinate on the children and then deleting
+ * the source's own files and allocations.
+ */
+void stats_fs_source_put(struct stats_fs_source *source);
+
+/**
+ * stats_fs_initialized - returns true if stats_fs fs has been registered
+ */
+bool stats_fs_initialized(void);
+
+#else
+
+#include <linux/err.h>
+
+#define stats_fs_type_u8 stats_fs_type_stub
+#define stats_fs_type_s8 stats_fs_type_stub
+#define stats_fs_type_u16 stats_fs_type_stub
+#define stats_fs_type_s16 stats_fs_type_stub
+#define stats_fs_type_u32 stats_fs_type_stub
+#define stats_fs_type_s32 stats_fs_type_stub
+#define stats_fs_type_u64 stats_fs_type_stub
+#define stats_fs_type_s64 stats_fs_type_stub
+#define stats_fs_type_bool stats_fs_type_stub
+
+extern const struct stats_fs_type stats_fs_type_stub;
+
+/*
+ * We do not return NULL from these functions if CONFIG_STATS_FS is not enabled
+ * so users have a chance to detect if there was a real error or not.  We don't
+ * want to duplicate the design decision mistakes of procfs and devfs again.
+ */
+
+static inline struct stats_fs_source *stats_fs_source_create(uint32_t flags,
+							     const char *fmt,
+							     ...)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline int stats_fs_source_add_values(struct stats_fs_source *source,
+					     struct stats_fs_value *val,
+					     void *base_ptr, uint32_t flags)
+{
+	return -ENODEV;
+}
+
+static inline void
+stats_fs_source_add_subordinate(struct stats_fs_source *parent,
+				struct stats_fs_source *child)
+{ }
+
+static inline void
+stats_fs_source_remove_subordinate(struct stats_fs_source *parent,
+				   struct stats_fs_source *child)
+{ }
+
+static inline int stats_fs_source_get_value(struct stats_fs_source *source,
+					    struct stats_fs_value *val,
+					    uint64_t *ret)
+{
+	return -ENODEV;
+}
+
+static inline int
+stats_fs_source_get_value_by_name(struct stats_fs_source *source, char *name,
+				  uint64_t *ret)
+{
+	return -ENODEV;
+}
+
+static inline int stats_fs_source_clear(struct stats_fs_source *source,
+					struct stats_fs_value *val)
+{
+	return -ENODEV;
+}
+
+static inline void stats_fs_source_revoke(struct stats_fs_source *source)
+{ }
+
+static inline void stats_fs_source_get(struct stats_fs_source *source)
+{ }
+
+static inline void stats_fs_source_put(struct stats_fs_source *source)
+{ }
+
+static inline bool stats_fs_initialized(void)
+{
+	return false;
+}
+
+#endif
+
+#endif
-- 
2.25.4



* [PATCH v3 2/7] documentation for stats_fs
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-06-04  0:23   ` Randy Dunlap
  2020-05-26 11:03 ` [PATCH v3 3/7] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Add HTML documentation covering the stats_fs API, filesystem and usage.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/stats_fs.rst | 222 +++++++++++++++++++++++++
 2 files changed, 223 insertions(+)
 create mode 100644 Documentation/filesystems/stats_fs.rst

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e7b46dac7079..9a46fd851c6e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,6 +89,7 @@ Documentation for filesystem implementations.
    relay
    romfs
    squashfs
+   stats_fs
    sysfs
    sysv-fs
    tmpfs
diff --git a/Documentation/filesystems/stats_fs.rst b/Documentation/filesystems/stats_fs.rst
new file mode 100644
index 000000000000..292c689ffb98
--- /dev/null
+++ b/Documentation/filesystems/stats_fs.rst
@@ -0,0 +1,222 @@
+========
+Stats_FS
+========
+
+Stats_fs is a synthetic ram-based virtual filesystem that takes care of
+gathering and displaying statistics for the Linux kernel subsystems.
+
+The motivation for stats_fs comes from the fact that there is no common
+way, shared throughout the Linux kernel, for subsystems to expose statistics
+to userspace; each subsystem has to take care of gathering and displaying
+statistics by itself, for example in the form of files in debugfs.
+
+Allowing each subsystem of the kernel to do so has two disadvantages.
+First, it introduces redundant code. Second, debugfs is not the right
+place for statistics anyway (for example it is affected by lockdown).
+
+Stats_fs offers a generic and stable API, allowing any kind of
+directory/file organization and supporting multiple kinds of aggregation
+(not only sum, but also average, max, min and count_zero) and data types
+(boolean, all unsigned/signed and custom types). The implementation takes
+care of gathering and displaying information at run time; users only need
+to specify the values to be included in each source. Optionally, users can
+also provide a display function for each value, which will take care of
+displaying the provided value in a custom format.
+
+Its main function is to display each statistic as a file in the desired
+folder hierarchy defined through the API. Stats_fs files can be read, and
+possibly cleared if their file mode allows it.
+
+Stats_fs is typically mounted with a command like::
+
+    mount -t stats_fs stats_fs /sys/kernel/stats_fs
+
+(Or an equivalent /etc/fstab line).
+
+Stats_fs has two main components: the public API defined by
+include/linux/stats_fs.h, and the virtual file system in
+/sys/kernel/stats.
+
+The API has two main elements, values and sources. Kernel
+subsystems will create a source, add child
+sources/values/aggregates and register it to the root source (that on the
+virtual fs would be /sys/kernel/stats).
+
+The stats_fs API is defined in ``<linux/stats_fs.h>``.
+
+    Sources
+        Sources are created via ``stats_fs_source_create()``, and each
+        source becomes a directory in the file system. Sources form a
+        parent-child relationship; root sources are added to the file
+        system via ``stats_fs_source_register()``. Therefore each Linux
+        subsystem will add its own entry to the root of the filesystem,
+        similar to what is done in debugfs. Every other source is added to or
+        removed from a parent through the
+        ``stats_fs_source_add_subordinate()`` and
+        ``stats_fs_source_remove_subordinate()`` APIs. Once a source is
+        created and added to the tree (via add_subordinate), it will be
+        used to compute aggregate values in the parent source. A source
+        can optionally be hidden from the filesystem but still considered
+        in the aggregation operations if the corresponding flag is set
+        during initialization.
+
+    Values
+        Values represent quantities that are gathered by the stats_fs user.
+        Examples of values include the number of vm exits of a given kind,
+        the amount of memory used by some data structure, the length of
+        the longest hash table chain, or anything like that. Values are
+        defined with the stats_fs_source_add_values function. Each value
+        is defined by a ``struct stats_fs_value``; the same
+        ``stats_fs_value`` can be added to many different sources. A value
+        can be considered "simple" if it fetches data from a user-provided
+        location, or "aggregate" if it groups all values in the
+        subordinate sources that include the same ``stats_fs_value``.
+        Values by default are considered to be cumulative, meaning the
+        value they represent never decreases, but can also be defined as
+        floating if they exhibit a different behavior. The main difference
+        between the two is reflected in the file permissions, since a
+        floating value file does not allow the user to clear it. Each
+        value has a ``stats_fs_type`` pointer in order to allow the user
+        to provide custom get and clear functions. The library, however,
+        also exports default ``stats_fs_type`` structs for the standard
+        types (all unsigned and signed types plus boolean). A value can
+        also provide a show function that takes care of displaying the
+        value in a custom string format. This can be especially useful
+        when displaying enums.
+
+Because stats_fs is a different mountpoint than debugfs, it is not affected
+by security lockdown.
+
+Using Stats_fs
+================
+
+Define a value::
+
+        struct statistics{
+                uint64_t exit;
+                ...
+        };
+
+        struct kvm {
+                ...
+                struct statistics stat;
+        };
+
+        struct stats_fs_value kvm_stats[] = {
+                { "exit_vm", offsetof(struct kvm, stat.exit), &stats_fs_type_u64,
+                  STATS_FS_SUM },
+                { NULL }
+        };
+
+The same ``struct stats_fs_value`` is used for both simple and aggregate
+values, though the type and offset are only used for simple values.
+Aggregates merge all values that use the same ``struct stats_fs_value``.
+
+Create the parent source::
+
+        struct stats_fs_source *parent_source = stats_fs_source_create(0, "parent");
+
+Register it (files and folders
+will only be visible after this function is called)::
+
+        stats_fs_source_register(parent_source);
+
+Create and add a child::
+
+        struct stats_fs_source *child_source = stats_fs_source_create(STATS_FS_HIDDEN, "child");
+
+        stats_fs_source_add_subordinate(parent_source, child_source);
+
+The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
+block the creation of the files.
+
+Add values to parent and child (here too, the order doesn't matter)::
+
+        struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
+        ...
+        stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
+        stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
+
+The values added to ``child_source`` will be simple values, since they have
+a non-NULL base pointer, while those added to ``parent_source`` will be
+aggregates. During the adding phase, values can also optionally be marked
+as hidden, so that the folder and the other values are still shown.
+
+Of course the same ``struct stats_fs_value`` array can also be passed with a
+different base pointer, to represent the same values in another instance
+of the kvm struct.
+
+Search:
+
+Fetch a value from the child source, returning the value
+pointed to by ``(uint64_t *)((void *)base_ptr + kvm_stats[0].offset)``::
+
+        uint64_t ret_child, ret_parent;
+
+        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+
+Fetch an aggregate value, searching all subsources of ``parent_source`` for
+the specified ``struct stats_fs_value``::
+
+        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+
+        assert(ret_child == ret_parent); // check expected result
+
+To make it more interesting, add another child::
+
+        struct stats_fs_source *child_source2 = stats_fs_source_create(0, "child2");
+
+        stats_fs_source_add_subordinate(parent_source, child_source2);
+        // now  the structure is parent -> child1
+        //                              -> child2
+
+        struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
+        ...
+        stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
+
+Note that other_base_ptr points to another instance of kvm, so the struct
+stats_fs_value entries are the same but the addresses they point to are not.
+
+Now get the aggregate value::
+
+        uint64_t ret_child, ret_child2, ret_parent;
+
+        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+        stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
+
+        assert((ret_child + ret_child2) == ret_parent);
+
+Cleanup::
+
+        stats_fs_source_remove_subordinate(parent_source, child_source);
+        stats_fs_source_revoke(child_source);
+        stats_fs_source_put(child_source);
+
+        stats_fs_source_remove_subordinate(parent_source, child_source2);
+        stats_fs_source_revoke(child_source2);
+        stats_fs_source_put(child_source2);
+
+        stats_fs_source_put(parent_source);
+        kfree(other_base_ptr);
+        kfree(base_ptr);
+
+Calling stats_fs_source_revoke is very important, because it ensures
+that stats_fs will not access the data that was passed to
+stats_fs_source_add_values for this source.
+
+Because open files increase the reference count for a stats_fs_source, the
+source can end up living longer than the data that provides the values for
+the source.  Calling stats_fs_source_revoke just before the backing data
+is freed avoids accesses to freed data structures. The sources will return
+0.
+
+This is not needed for the parent_source, since it just contains
+aggregates that would be 0 anyway if no matching child values exist.
+
+API Documentation
+=================
+
+.. kernel-doc:: include/linux/stats_fs.h
+   :export: fs/stats_fs/*.c
\ No newline at end of file
-- 
2.25.4



* [PATCH v3 3/7] kunit: tests for stats_fs API
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 2/7] documentation for stats_fs Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-27 10:05   ` Alan Maguire
  2020-05-26 11:03 ` [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Add kunit tests to extensively test the stats_fs API functionality.

In order to run them, the kernel .config must set CONFIG_KUNIT=y
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y.
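
A minimal .kunitconfig would then just contain (sketch; CONFIG_KUNIT=y is
assumed to come from the .config as described above):

  CONFIG_STATS_FS=y
  CONFIG_STATS_FS_TEST=y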

Tests can then be started by running the following command from the root
directory of the Linux kernel source tree:
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 fs/Kconfig                   |    6 +
 fs/stats_fs/Makefile         |    2 +
 fs/stats_fs/stats_fs-tests.c | 1097 ++++++++++++++++++++++++++++++++++
 3 files changed, 1105 insertions(+)
 create mode 100644 fs/stats_fs/stats_fs-tests.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 684ad61129ab..02bbb0e4cdf7 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -227,6 +227,12 @@ config STATS_FS
 	  stats_fs is a virtual file system that provides counters and
 	  other statistics about the running kernel.
 
+config STATS_FS_TEST
+	bool "Tests for stats_fs"
+	depends on STATS_FS && KUNIT
+	help
+	  tests for the stats_fs API.
+
 config STATS_FS_API
 	bool
 	imply STATS_FS
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bd988daa4c39..bc59a54d5721 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 stats_fs-objs	:= stats_fs.o
+stats_fs-tests-objs	:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS)	    += stats_fs.o
 obj-$(CONFIG_STATS_FS_STUB) += stub.o
+obj-$(CONFIG_STATS_FS_TEST) += stats_fs-tests.o
diff --git a/fs/stats_fs/stats_fs-tests.c b/fs/stats_fs/stats_fs-tests.c
new file mode 100644
index 000000000000..bbac133d7fe7
--- /dev/null
+++ b/fs/stats_fs/stats_fs-tests.c
@@ -0,0 +1,1097 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/anon_inodes.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+
+#include <linux/limits.h>
+#include <linux/stats_fs.h>
+#include <kunit/test.h>
+#include "internal.h"
+
+#define STATS_FS_STAT(el, x, ...)                                              \
+	{                                                                      \
+		.name = #x, .offset = offsetof(struct container, el.x),        \
+		##__VA_ARGS__                                                  \
+	}
+
+#define ARR_SIZE(el) ((int)(sizeof(el) / sizeof(struct stats_fs_value) - 1))
+
+struct test_values_struct {
+	uint64_t u64;
+	int32_t s32;
+	bool bo;
+	uint8_t u8;
+	int16_t s16;
+};
+
+struct container {
+	struct test_values_struct vals;
+};
+
+struct stats_fs_value test_values[6] = {
+	STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+		      .aggr_kind = STATS_FS_NONE,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+		      .aggr_kind = STATS_FS_NONE),
+	STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+		      .aggr_kind = STATS_FS_NONE,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+		      .aggr_kind = STATS_FS_NONE),
+	STATS_FS_STAT(vals, s16, .type = &stats_fs_type_s16,
+		      .aggr_kind = STATS_FS_NONE,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL },
+};
+
+struct stats_fs_value test_aggr[4] = {
+	STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+		      .aggr_kind = STATS_FS_MIN,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+		      .aggr_kind = STATS_FS_MAX,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+		      .aggr_kind = STATS_FS_SUM,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL },
+};
+
+struct stats_fs_value test_same_name[3] = {
+	STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+		      .aggr_kind = STATS_FS_NONE),
+	STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+		      .aggr_kind = STATS_FS_MIN),
+	{ NULL },
+};
+
+struct stats_fs_value test_all_aggr[6] = {
+	STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+		      .aggr_kind = STATS_FS_MIN),
+	STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+		      .aggr_kind = STATS_FS_COUNT_ZERO,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+		      .aggr_kind = STATS_FS_SUM),
+	STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+		      .aggr_kind = STATS_FS_AVG,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	STATS_FS_STAT(vals, s16, .type = &stats_fs_type_s16,
+		      .aggr_kind = STATS_FS_MAX,
+		      .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL },
+};
+
+#define def_u64 ((uint64_t)64)
+
+#define def_val_s32 ((int32_t)S32_MIN)
+#define def_val_bool ((bool)true)
+#define def_val_u8 ((uint8_t)127)
+#define def_val_s16 ((int16_t)10000)
+
+#define def_val2_s32 ((int32_t)S16_MAX)
+#define def_val2_bool ((bool)false)
+#define def_val2_u8 ((uint8_t)255)
+#define def_val2_s16 ((int16_t)-20000)
+
+struct container cont = {
+	.vals = {
+			.u64 = def_u64,
+			.s32 = def_val_s32,
+			.bo = def_val_bool,
+			.u8 = def_val_u8,
+			.s16 = def_val_s16,
+		},
+};
+
+struct container cont2 = {
+	.vals = {
+			.u64 = def_u64,
+			.s32 = def_val2_s32,
+			.bo = def_val2_bool,
+			.u8 = def_val2_u8,
+			.s16 = def_val2_s16,
+		},
+};
+
+static void get_stats_at_addr(struct stats_fs_source *src, void *addr,
+			      int *aggr, int *val, int use_addr)
+{
+	struct stats_fs_value *entry;
+	struct stats_fs_value_source *src_entry;
+	int counter_val = 0, counter_aggr = 0;
+
+	list_for_each_entry (src_entry, &src->values_head, list_element) {
+		if (use_addr && src_entry->base_addr != addr)
+			continue;
+
+		for (entry = src_entry->values; entry->name; entry++) {
+			if (entry->aggr_kind == STATS_FS_NONE)
+				counter_val++;
+			else
+				counter_aggr++;
+		}
+	}
+
+	if (aggr)
+		*aggr = counter_aggr;
+
+	if (val)
+		*val = counter_val;
+}
+
+int source_has_subsource(struct stats_fs_source *src,
+			 struct stats_fs_source *sub)
+{
+	struct stats_fs_source *entry;
+
+	list_for_each_entry (entry, &src->subordinates_head, list_element) {
+		if (entry == sub)
+			return 1;
+	}
+	return 0;
+}
+
+int get_number_subsources(struct stats_fs_source *src)
+{
+	struct stats_fs_source *entry;
+	int counter = 0;
+
+	list_for_each_entry (entry, &src->subordinates_head, list_element) {
+		counter++;
+	}
+	return counter;
+}
+
+int get_number_values(struct stats_fs_source *src)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, NULL, &counter, 0);
+	return counter;
+}
+
+int get_total_number_values(struct stats_fs_source *src)
+{
+	struct stats_fs_source *sub_entry;
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, NULL, &counter, 0);
+
+	list_for_each_entry (sub_entry, &src->subordinates_head, list_element) {
+		counter += get_total_number_values(sub_entry);
+	}
+
+	return counter;
+}
+
+int get_number_aggregates(struct stats_fs_source *src)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, NULL, &counter, NULL, 1);
+	return counter;
+}
+
+int get_number_values_with_base(struct stats_fs_source *src, void *addr)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, addr, NULL, &counter, 1);
+	return counter;
+}
+
+int get_number_aggr_with_base(struct stats_fs_source *src, void *addr)
+{
+	int counter = 0;
+
+	get_stats_at_addr(src, addr, &counter, NULL, 1);
+	return counter;
+}
+
+static void test_empty_folder(struct kunit *test)
+{
+	struct stats_fs_source *src;
+
+	src = stats_fs_source_create(0, "kvm_%d", 123);
+	KUNIT_EXPECT_EQ(test, strcmp(src->name, "kvm_123"), 0);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	stats_fs_source_put(src);
+}
+
+static void test_add_subfolder(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+	stats_fs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), true);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	stats_fs_source_put(sub);
+	sub = stats_fs_source_create(0, "not a child");
+	KUNIT_EXPECT_EQ(test, source_has_subsource(src, sub), false);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 1);
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(src);
+}
+
+static void test_add_value(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+
+	// add values
+	n = stats_fs_source_add_values(src, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// add same values, nothing happens
+	n = stats_fs_source_add_values(src, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// size is unchanged
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+
+	// no aggregates
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+
+	stats_fs_source_put(src);
+}
+
+static void test_add_value_in_subfolder(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub, *sub_not;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+
+	// src -> sub
+	stats_fs_source_add_subordinate(src, sub);
+
+	// add values
+	n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), ARR_SIZE(test_values));
+	// no aggregates in sub
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), 0);
+
+	// different folder
+	sub_not = stats_fs_source_create(0, "not a child");
+
+	// add values
+	n = stats_fs_source_add_values(sub_not, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub_not, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	// remove sub, check values is 0
+	stats_fs_source_remove_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// re-add sub, check values are added back
+	stats_fs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values));
+
+	// add sub_not, check values are twice as many
+	stats_fs_source_add_subordinate(src, sub_not);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values) * 2);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub_not),
+			ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not), 0);
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(sub_not);
+	stats_fs_source_put(src);
+}
+
+static void test_search_value(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+
+	// add values
+	n = stats_fs_source_add_values(src, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	// get u64
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// get a non-added value
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	stats_fs_source_put(src);
+}
+
+static void test_search_value_in_subfolder(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+
+	// src -> sub
+	stats_fs_source_add_subordinate(src, sub);
+
+	// add values to sub
+	n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ((bool)ret), def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(src);
+}
+
+static void test_search_value_in_empty_folder(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "empty folder");
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_subsources(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	stats_fs_source_put(src);
+}
+
+static void test_add_aggregate(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+
+	// add aggr to src, no values
+	n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// count aggregates
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	// add same array again, should not be added
+	n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+
+	stats_fs_source_put(src);
+}
+
+static void test_add_aggregate_in_subfolder(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub, *sub_not;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+	// src->sub
+	stats_fs_source_add_subordinate(src, sub);
+
+	// add aggr to sub
+	n = stats_fs_source_add_values(sub, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub), ARR_SIZE(test_aggr));
+
+	// not a child
+	sub_not = stats_fs_source_create(0, "not a child");
+
+	// add aggr to "not a child"
+	n = stats_fs_source_add_values(sub_not, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub_not, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	KUNIT_EXPECT_EQ(test, get_number_values(src), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), 0);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// remove sub
+	stats_fs_source_remove_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	// re-add both
+	stats_fs_source_add_subordinate(src, sub);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+	stats_fs_source_add_subordinate(src, sub_not);
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	KUNIT_EXPECT_EQ(test, get_number_values(sub_not), 0);
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(sub_not),
+			ARR_SIZE(test_aggr));
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(sub_not);
+	stats_fs_source_put(src);
+}
+
+static void test_search_aggregate(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	stats_fs_source_put(src);
+}
+
+static void test_search_aggregate_in_subfolder(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+
+	stats_fs_source_add_subordinate(src, sub);
+
+	n = stats_fs_source_add_values(sub, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+	n = get_number_aggr_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// the u64 SUM aggregate has no values beneath it, so it reads 0
+	n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(src);
+}
+
+static void test_search_same(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	n = stats_fs_source_add_values(src, test_same_name, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+
+	n = stats_fs_source_add_values(src, test_same_name, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, 1);
+
+	// the plain value (added first) is returned, not the MIN aggregate
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	stats_fs_source_put(src);
+}
+
+static void test_add_mixed(struct kunit *test)
+{
+	struct stats_fs_source *src;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+
+	n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_add_values(src, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	n = stats_fs_source_add_values(src, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_values_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = stats_fs_source_add_values(src, test_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, -EEXIST);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_number_values(src), ARR_SIZE(test_values));
+	KUNIT_EXPECT_EQ(test, get_number_aggregates(src), ARR_SIZE(test_aggr));
+	stats_fs_source_put(src);
+}
+
+static void test_search_mixed(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub = stats_fs_source_create(0, "child");
+	stats_fs_source_add_subordinate(src, sub);
+
+	// src has the aggregates, sub the values. Just search
+	n = stats_fs_source_add_values(sub, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = stats_fs_source_add_values(src, test_aggr, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_aggr));
+
+	// u64 is a sum, so the aggregate equals the single value
+	n = stats_fs_source_get_value_by_name(sub, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// s32 is a min, so the aggregate returns the same value
+	n = stats_fs_source_get_value_by_name(sub, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	// bo is max
+	n = stats_fs_source_get_value_by_name(sub, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	KUNIT_EXPECT_EQ(test, n, 0);
+
+	n = stats_fs_source_get_value_by_name(sub, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+	n = stats_fs_source_get_value_by_name(src, "does not exist", &ret);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	KUNIT_EXPECT_EQ(test, n, -ENOENT);
+
+	stats_fs_source_put(sub);
+	stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub1, *sub2;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub1 = stats_fs_source_create(0, "child1");
+	sub2 = stats_fs_source_create(0, "child2");
+	stats_fs_source_add_subordinate(src, sub1);
+	stats_fs_source_add_subordinate(src, sub2);
+
+	n = stats_fs_source_add_values(sub1, test_all_aggr, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = stats_fs_source_add_values(sub2, test_all_aggr, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub2, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+	// max
+	n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	stats_fs_source_put(sub1);
+	stats_fs_source_put(sub2);
+	stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_val_agg_val(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub1, *sub2;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub1 = stats_fs_source_create(0, "child1");
+	sub2 = stats_fs_source_create(0, "child2");
+	stats_fs_source_add_subordinate(src, sub1);
+	stats_fs_source_add_subordinate(src, sub2);
+
+	n = stats_fs_source_add_values(src, test_all_aggr, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = stats_fs_source_add_values(sub2, test_all_aggr, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub2, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_add_values(sub1, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+	n = stats_fs_source_get_value_by_name(sub1, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = stats_fs_source_get_value_by_name(sub2, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+	n = stats_fs_source_get_value_by_name(sub1, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MAX); // MIN over no values
+	n = stats_fs_source_get_value_by_name(sub2, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val_bool);
+	n = stats_fs_source_get_value_by_name(sub1, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = stats_fs_source_get_value_by_name(sub2, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (bool)ret, def_val2_bool);
+
+	n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val_u8);
+	n = stats_fs_source_get_value_by_name(sub1, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 0ull);
+	n = stats_fs_source_get_value_by_name(sub2, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+	n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+	n = stats_fs_source_get_value_by_name(sub1, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int64_t)ret, S64_MIN); // MAX over no values
+	n = stats_fs_source_get_value_by_name(sub2, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+	stats_fs_source_put(sub1);
+	stats_fs_source_put(sub2);
+	stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_val_val_sub(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub1, *sub11;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub1 = stats_fs_source_create(0, "child1");
+	sub11 = stats_fs_source_create(0, "child11");
+	stats_fs_source_add_subordinate(src, sub1);
+	stats_fs_source_add_subordinate(sub1, sub11); // sub11 is nested under sub1, not src
+
+	n = stats_fs_source_add_values(sub1, test_values, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+	n = stats_fs_source_add_values(sub11, test_values, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_values_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_values));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src),
+			ARR_SIZE(test_values) * 2);
+
+	n = stats_fs_source_add_values(sub1, test_all_aggr, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+	n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 191ull);
+
+	// max
+	n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	stats_fs_source_put(sub1);
+	stats_fs_source_put(sub11);
+	stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_no_val_sub(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub1, *sub11;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub1 = stats_fs_source_create(0, "child1");
+	sub11 = stats_fs_source_create(0, "child11");
+	stats_fs_source_add_subordinate(src, sub1);
+	stats_fs_source_add_subordinate(sub1, sub11);
+
+	n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64);
+
+	// min
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val2_s32);
+
+	// count_0
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret, def_val2_u8);
+
+	// max
+	n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val2_s16);
+
+	stats_fs_source_put(sub1);
+	stats_fs_source_put(sub11);
+	stats_fs_source_put(src);
+}
+
+static void test_all_aggregations_agg_agg_val_sub(struct kunit *test)
+{
+	struct stats_fs_source *src, *sub1, *sub11, *sub12;
+	uint64_t ret;
+	int n;
+
+	src = stats_fs_source_create(0, "parent");
+	sub1 = stats_fs_source_create(0, "child1");
+	sub11 = stats_fs_source_create(0, "child11");
+	sub12 = stats_fs_source_create(0, "child12");
+	stats_fs_source_add_subordinate(src, sub1);
+	stats_fs_source_add_subordinate(sub1, sub11);
+	stats_fs_source_add_subordinate(sub1, sub12);
+
+	n = stats_fs_source_add_values(sub11, test_all_aggr, &cont2, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub11, &cont2);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_add_values(sub12, test_all_aggr, &cont, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub12, &cont);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	KUNIT_EXPECT_EQ(test, get_total_number_values(src), 0);
+
+	n = stats_fs_source_add_values(src, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(src, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	n = stats_fs_source_add_values(sub1, test_all_aggr, NULL, 0);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	n = get_number_aggr_with_base(sub1, NULL);
+	KUNIT_EXPECT_EQ(test, n, ARR_SIZE(test_all_aggr));
+
+	// sum
+	n = stats_fs_source_get_value_by_name(src, "u64", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, def_u64 * 2);
+
+	// min
+	n = stats_fs_source_get_value_by_name(src, "s32", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ((int32_t)ret), def_val_s32);
+
+	// count_0
+	n = stats_fs_source_get_value_by_name(src, "bo", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, ret, 1ull);
+
+	// avg
+	n = stats_fs_source_get_value_by_name(src, "u8", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (uint8_t)ret,
+			(uint8_t)((def_val2_u8 + def_val_u8) / 2));
+
+	// max
+	n = stats_fs_source_get_value_by_name(src, "s16", &ret);
+	KUNIT_EXPECT_EQ(test, n, 0);
+	KUNIT_EXPECT_EQ(test, (int16_t)ret, def_val_s16);
+
+	stats_fs_source_put(sub1);
+	stats_fs_source_put(sub11);
+	stats_fs_source_put(sub12);
+	stats_fs_source_put(src);
+}
+
+static struct kunit_case stats_fs_test_cases[] = {
+	KUNIT_CASE(test_empty_folder),
+	KUNIT_CASE(test_add_subfolder),
+	KUNIT_CASE(test_add_value),
+	KUNIT_CASE(test_add_value_in_subfolder),
+	KUNIT_CASE(test_search_value),
+	KUNIT_CASE(test_search_value_in_subfolder),
+	KUNIT_CASE(test_search_value_in_empty_folder),
+	KUNIT_CASE(test_add_aggregate),
+	KUNIT_CASE(test_add_aggregate_in_subfolder),
+	KUNIT_CASE(test_search_aggregate),
+	KUNIT_CASE(test_search_aggregate_in_subfolder),
+	KUNIT_CASE(test_search_same),
+	KUNIT_CASE(test_add_mixed),
+	KUNIT_CASE(test_search_mixed),
+	KUNIT_CASE(test_all_aggregations_agg_val_val),
+	KUNIT_CASE(test_all_aggregations_val_agg_val),
+	KUNIT_CASE(test_all_aggregations_agg_val_val_sub),
+	KUNIT_CASE(test_all_aggregations_agg_no_val_sub),
+	KUNIT_CASE(test_all_aggregations_agg_agg_val_sub),
+	{}
+};
+
+static struct kunit_suite stats_fs_test_suite = {
+	.name = "stats_fs",
+	.test_cases = stats_fs_test_cases,
+};
+
+kunit_test_suite(stats_fs_test_suite);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2020-05-26 11:03 ` [PATCH v3 3/7] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Add a virtual fs that maps stats_fs sources to directories, and values
(simple or aggregates) to files.

Every time a file is read/cleared, the fs internally invokes the stats_fs
API to get/set the requested value.

Also introduce an optional show function in each value, which allows
customizing how the value is displayed inside a file. This can be
especially useful with enums.
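
As a rough sketch of how a caller might use it, a show callback for an
enum-like statistic could look as follows. The names struct vcpu_stats,
mode_show and vcpu_stat_values are made up for illustration and are not
part of this series; only the stats_fs_value fields and the .show
signature come from the API, and ownership of the returned string is
left aside here:

	/* illustrative only: map a numeric mode field to a string */
	static char *mode_show(uint64_t val)
	{
		switch (val) {
		case 0:
			return "outside_guest\n";
		case 1:
			return "in_guest\n";
		default:
			return "unknown\n";
		}
	}

	static struct stats_fs_value vcpu_stat_values[] = {
		{ .name = "mode",
		  .offset = offsetof(struct vcpu_stats, mode),
		  .type = &stats_fs_type_u64,
		  .aggr_kind = STATS_FS_NONE,
		  .show = mode_show },
		{ NULL },
	};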

fs/stats_fs/inode.c closely mirrors what is done in fs/debugfs/inode.c,
with the exception that the API is only composed of stats_fs_create_file,
stats_fs_create_dir and stats_fs_remove.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 fs/stats_fs/Makefile       |   2 +-
 fs/stats_fs/inode.c        | 461 +++++++++++++++++++++++++++++++++++++
 fs/stats_fs/internal.h     |  15 ++
 fs/stats_fs/stats_fs.c     |  92 +++++++-
 include/linux/stats_fs.h   |  18 ++
 include/uapi/linux/magic.h |   1 +
 tools/lib/api/fs/fs.c      |  21 ++
 7 files changed, 608 insertions(+), 2 deletions(-)
 create mode 100644 fs/stats_fs/inode.c

diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bc59a54d5721..19b7e13f6c3d 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-stats_fs-objs	:= stats_fs.o
+stats_fs-objs	:= inode.o stats_fs.o
 stats_fs-tests-objs	:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS)	    += stats_fs.o
diff --git a/fs/stats_fs/inode.c b/fs/stats_fs/inode.c
new file mode 100644
index 000000000000..eaa0a8bc7466
--- /dev/null
+++ b/fs/stats_fs/inode.c
@@ -0,0 +1,461 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  inode.c - part of stats_fs, a tiny little statistics file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito <eesposit@redhat.com>
+ *  Copyright (C) 2020 Red Hat
+ */
+#define pr_fmt(fmt)	"stats_fs: " fmt
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/init.h>
+#include <linux/stats_fs.h>
+#include <linux/string.h>
+#include <linux/seq_file.h>
+#include <linux/parser.h>
+#include <linux/magic.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+#define STATS_FS_DEFAULT_MODE	0700
+
+static struct simple_fs stats_fs;
+static bool stats_fs_registered;
+
+struct stats_fs_mount_opts {
+	kuid_t uid;
+	kgid_t gid;
+	umode_t mode;
+};
+
+enum {
+	Opt_uid,
+	Opt_gid,
+	Opt_mode,
+	Opt_err
+};
+
+static const match_table_t tokens = {
+	{Opt_uid, "uid=%u"},
+	{Opt_gid, "gid=%u"},
+	{Opt_mode, "mode=%o"},
+	{Opt_err, NULL}
+};
+
+struct stats_fs_fs_info {
+	struct stats_fs_mount_opts mount_opts;
+};
+
+static int stats_fs_parse_options(char *data, struct stats_fs_mount_opts *opts)
+{
+	substring_t args[MAX_OPT_ARGS];
+	int option;
+	int token;
+	kuid_t uid;
+	kgid_t gid;
+	char *p;
+
+	opts->mode = STATS_FS_DEFAULT_MODE;
+
+	while ((p = strsep(&data, ",")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_uid:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid))
+				return -EINVAL;
+			opts->uid = uid;
+			break;
+		case Opt_gid:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid))
+				return -EINVAL;
+			opts->gid = gid;
+			break;
+		case Opt_mode:
+			if (match_octal(&args[0], &option))
+				return -EINVAL;
+			opts->mode = option & S_IALLUGO;
+			break;
+		/*
+		 * We might like to report bad mount options here;
+		 * but, as debugfs does, stats_fs simply ignores them.
+		 */
+		}
+	}
+
+	return 0;
+}
+
+static int stats_fs_apply_options(struct super_block *sb)
+{
+	struct stats_fs_fs_info *fsi = sb->s_fs_info;
+	struct inode *inode = d_inode(sb->s_root);
+	struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+	inode->i_mode &= ~S_IALLUGO;
+	inode->i_mode |= opts->mode;
+
+	inode->i_uid = opts->uid;
+	inode->i_gid = opts->gid;
+
+	return 0;
+}
+
+static int stats_fs_remount(struct super_block *sb, int *flags, char *data)
+{
+	int err;
+	struct stats_fs_fs_info *fsi = sb->s_fs_info;
+
+	sync_filesystem(sb);
+	err = stats_fs_parse_options(data, &fsi->mount_opts);
+	if (err)
+		goto fail;
+
+	stats_fs_apply_options(sb);
+
+fail:
+	return err;
+}
+
+static int stats_fs_show_options(struct seq_file *m, struct dentry *root)
+{
+	struct stats_fs_fs_info *fsi = root->d_sb->s_fs_info;
+	struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+	if (!uid_eq(opts->uid, GLOBAL_ROOT_UID))
+		seq_printf(m, ",uid=%u",
+			   from_kuid_munged(&init_user_ns, opts->uid));
+	if (!gid_eq(opts->gid, GLOBAL_ROOT_GID))
+		seq_printf(m, ",gid=%u",
+			   from_kgid_munged(&init_user_ns, opts->gid));
+	if (opts->mode != STATS_FS_DEFAULT_MODE)
+		seq_printf(m, ",mode=%o", opts->mode);
+
+	return 0;
+}
+
+
+static void stats_fs_free_inode(struct inode *inode)
+{
+	kfree(inode->i_private);
+	free_inode_nonrcu(inode);
+}
+
+static const struct super_operations stats_fs_super_operations = {
+	.statfs		= simple_statfs,
+	.remount_fs	= stats_fs_remount,
+	.show_options	= stats_fs_show_options,
+	.free_inode	= stats_fs_free_inode,
+};
+
+static int stats_fs_fill_super(struct super_block *sb, void *data, int silent)
+{
+	static const struct tree_descr stats_fs_files[] = {{""}};
+	struct stats_fs_fs_info *fsi;
+	int err;
+
+	fsi = kzalloc(sizeof(struct stats_fs_fs_info), GFP_KERNEL);
+	sb->s_fs_info = fsi;
+	if (!fsi) {
+		err = -ENOMEM;
+		goto fail;
+	}
+
+	err = stats_fs_parse_options(data, &fsi->mount_opts);
+	if (err)
+		goto fail;
+
+	err  =  simple_fill_super(sb, STATSFS_MAGIC, stats_fs_files);
+	if (err)
+		goto fail;
+
+	sb->s_op = &stats_fs_super_operations;
+
+	stats_fs_apply_options(sb);
+
+	return 0;
+
+fail:
+	kfree(fsi);
+	sb->s_fs_info = NULL;
+	return err;
+}
+
+static struct dentry *stats_fs_mount(struct file_system_type *fs_type,
+			int flags, const char *dev_name,
+			void *data)
+{
+	return mount_single(fs_type, flags, data, stats_fs_fill_super);
+}
+
+static struct file_system_type stats_fs_fs_type = {
+	.owner =	THIS_MODULE,
+	.name =		"statsfs",
+	.mount =	stats_fs_mount,
+	.kill_sb =	kill_litter_super,
+};
+MODULE_ALIAS_FS("statsfs");
+
+static int stats_fs_u64_attr_get(void *data, u64 *val)
+{
+	int r = -EFAULT;
+	struct stats_fs_data_inode *val_inode =
+		(struct stats_fs_data_inode *)data;
+
+	r = stats_fs_source_get_value(val_inode->src, val_inode->val, val);
+	return r;
+}
+
+static int stats_fs_u64_attr_clear(void *data, u64 val)
+{
+	int r = -EFAULT;
+	struct stats_fs_data_inode *val_inode =
+		(struct stats_fs_data_inode *)data;
+
+	if (val)
+		return -EINVAL;
+
+	r = stats_fs_source_clear(val_inode->src, val_inode->val);
+	return r;
+}
+
+static int stats_fs_u64_attr_open(struct inode *inode, struct file *file)
+{
+	struct stats_fs_data_inode *val_inode;
+	char *fmt;
+
+	val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+	/* Inodes hold a pointer to the source which is not included in the
+	 * refcount, so files may be opened while destroy is running, but
+	 * values are removed (base_addr = NULL) before the source is destroyed.
+	 */
+	if (!kref_get_unless_zero(&val_inode->src->refcount))
+		return -ENOENT;
+
+	if (is_val_signed(val_inode->val))
+		fmt = "%lld\n";
+	else
+		fmt = "%llu\n";
+
+	if (simple_attr_open(inode, file, stats_fs_u64_attr_get,
+			     stats_fs_val_get_mode(val_inode->val) & 0222 ?
+				     stats_fs_u64_attr_clear :
+				     NULL,
+			     fmt)) {
+		stats_fs_source_put(val_inode->src);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static ssize_t stats_fs_string_attr_read(struct file *file, char __user *buf,
+					 size_t len, loff_t *ppos)
+{
+	const char *str = file->private_data;
+	size_t size = strlen(str);
+	return simple_read_from_buffer(buf, len, ppos, str, size);
+}
+
+static int file_string_attr_open(struct inode *inode, struct file *file,
+				 char *str)
+{
+	file->private_data = str;
+	return nonseekable_open(inode, file);
+}
+
+static int stats_fs_string_attr_open(struct inode *inode, struct file *file)
+{
+	struct stats_fs_data_inode *val_inode;
+	char *str;
+	u64 val;
+
+	val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+	WARN_ON(val_inode->val->value_flag & STATS_FS_FLOATING_VALUE);
+
+	/* Inodes hold a pointer to the source which is not included in the
+	 * refcount, so files may be opened while destroy is running, but
+	 * values are removed (base_addr = NULL) before the source is destroyed.
+	 */
+	if (!kref_get_unless_zero(&val_inode->src->refcount))
+		return -ENOENT;
+
+	stats_fs_source_get_value(val_inode->src, val_inode->val, &val);
+	str = val_inode->val->show(val);
+
+	if (file_string_attr_open(inode, file, str)) {
+		stats_fs_source_put(val_inode->src);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static int stats_fs_attr_release(struct inode *inode, struct file *file)
+{
+	struct stats_fs_data_inode *val_inode;
+
+	val_inode = (struct stats_fs_data_inode *)inode->i_private;
+
+	simple_attr_release(inode, file);
+	stats_fs_source_put(val_inode->src);
+
+	return 0;
+}
+
+static const struct file_operations stats_fs_u64_ops = {
+	.owner = THIS_MODULE,
+	.open = stats_fs_u64_attr_open,
+	.release = stats_fs_attr_release,
+	.read = simple_attr_read,
+	.write = simple_attr_write,
+	.llseek = no_llseek,
+};
+
+static const struct file_operations stats_fs_string_ops = {
+	.owner = THIS_MODULE,
+	.open = stats_fs_string_attr_open,
+	.release = stats_fs_attr_release,
+	.read = stats_fs_string_attr_read,
+	.write = simple_attr_write,
+	.llseek = no_llseek,
+};
+
+/**
+ * stats_fs_create_file - create a file in the stats_fs filesystem
+ * @val: a pointer to a stats_fs_value containing all the information
+ * about the file to create (name, permissions)
+ * @src: a pointer to a stats_fs_source containing the dentry of where
+ * to add this file
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the stats_fs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * val and src are also wrapped in a stats_fs_data_inode struct that is
+ * stored internally as inode->i_private and used by the get/set
+ * attribute functions defined above.
+ */
+struct dentry *stats_fs_create_file(struct stats_fs_value *val, struct stats_fs_source *src)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+	struct stats_fs_data_inode *val_inode;
+
+	val_inode = kzalloc(sizeof(struct stats_fs_data_inode), GFP_KERNEL);
+	if (!val_inode) {
+		printk(KERN_ERR
+			"kzalloc failure in stats_fs_create_file (ENOMEM)\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	val_inode->src = src;
+	val_inode->val = val;
+
+
+	dentry = simplefs_create_file(&stats_fs, &stats_fs_fs_type,
+				      val->name, stats_fs_val_get_mode(val),
+					  src->source_dentry, val_inode, &inode);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode->i_fop = val->show ? &stats_fs_string_ops : &stats_fs_u64_ops;
+
+	return simplefs_finish_dentry(dentry, inode);
+}
+/**
+ * stats_fs_create_dir - create a directory in the stats_fs filesystem
+ * @name: a pointer to a string containing the name of the directory to
+ *        create.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is NULL, then the
+ *          directory will be created in the root of the stats_fs filesystem.
+ *
+ * This function creates a directory in stats_fs with the given name.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the stats_fs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ */
+struct dentry *stats_fs_create_dir(const char *name, struct dentry *parent)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+
+	dentry = simplefs_create_dir(&stats_fs, &stats_fs_fs_type,
+				     name, 0755, parent, &inode);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode->i_op = &simple_dir_inode_operations;
+	return simplefs_finish_dentry(dentry, inode);
+}
+
+static void remove_one(struct dentry *victim)
+{
+	simple_release_fs(&stats_fs);
+}
+
+/**
+ * stats_fs_remove - recursively removes a directory
+ * @dentry: a pointer to the dentry of the directory to be removed.  If this
+ *          parameter is NULL or an error value, nothing will be done.
+ *
+ * This function recursively removes a directory tree in stats_fs that
+ * was previously created with a call to another stats_fs function
+ * (like stats_fs_create_file() or variants thereof.)
+ *
+ * This function is required to be called in order for the file to be
+ * removed, no automatic cleanup of files will happen when a module is
+ * removed, you are responsible here.
+ */
+void stats_fs_remove(struct dentry *dentry)
+{
+	if (IS_ERR_OR_NULL(dentry))
+		return;
+
+	simple_pin_fs(&stats_fs, &stats_fs_fs_type);
+	simple_recursive_removal(dentry, remove_one);
+	simple_release_fs(&stats_fs);
+}
+/**
+ * stats_fs_initialized - Tells whether stats_fs has been registered
+ */
+bool stats_fs_initialized(void)
+{
+	return stats_fs_registered;
+}
+EXPORT_SYMBOL_GPL(stats_fs_initialized);
+
+static int __init stats_fs_init(void)
+{
+	int retval;
+
+	retval = sysfs_create_mount_point(kernel_kobj, "stats");
+	if (retval)
+		return retval;
+
+	retval = register_filesystem(&stats_fs_fs_type);
+	if (retval)
+		sysfs_remove_mount_point(kernel_kobj, "stats");
+	else
+		stats_fs_registered = true;
+
+	return retval;
+}
+core_initcall(stats_fs_init);
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
index 4993afbb1e45..50977f332feb 100644
--- a/fs/stats_fs/internal.h
+++ b/fs/stats_fs/internal.h
@@ -16,4 +16,19 @@ struct stats_fs_value_source {
 	struct list_head list_element;
 };
 
+struct stats_fs_data_inode {
+	struct stats_fs_source *src;
+	struct stats_fs_value *val;
+};
+
+struct dentry *stats_fs_create_file(struct stats_fs_value *val,
+				   struct stats_fs_source *src);
+
+struct dentry *stats_fs_create_dir(const char *name, struct dentry *parent);
+
+void stats_fs_remove(struct dentry *dentry);
+#define stats_fs_remove_recursive stats_fs_remove
+
+int is_val_signed(struct stats_fs_value *val);
+
 #endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
index b76ee44f6dac..f99c3ae6ce25 100644
--- a/fs/stats_fs/stats_fs.c
+++ b/fs/stats_fs/stats_fs.c
@@ -39,11 +39,35 @@ STATS_FS_TYPE_STRUCT_US(32)
 STATS_FS_TYPE_STRUCT_US(64)
 STATS_FS_TYPE_STRUCT(bool)
 
-static int is_val_signed(struct stats_fs_value *val)
+static void stats_fs_source_remove_files(struct stats_fs_source *src);
+
+int is_val_signed(struct stats_fs_value *val)
 {
 	return val->type->sign;
 }
 
+/* Called with rwsem held for writing */
+static void stats_fs_source_remove_files_locked(struct stats_fs_source *src)
+{
+	struct stats_fs_source *child;
+
+	if (src->source_dentry == NULL)
+		return;
+
+	list_for_each_entry (child, &src->subordinates_head, list_element)
+		stats_fs_source_remove_files(child);
+
+	stats_fs_remove_recursive(src->source_dentry);
+	src->source_dentry = NULL;
+}
+
+static void stats_fs_source_remove_files(struct stats_fs_source *src)
+{
+	down_write(&src->rwsem);
+	stats_fs_source_remove_files_locked(src);
+	up_write(&src->rwsem);
+}
+
 static struct stats_fs_value *find_value(struct stats_fs_value_source *src,
 					 struct stats_fs_value *val)
 {
@@ -74,6 +98,63 @@ search_value_in_source(struct stats_fs_source *src, struct stats_fs_value *arg,
 	return NULL;
 }
 
+/* Called with rwsem held for writing */
+static void stats_fs_create_files_locked(struct stats_fs_source *source)
+{
+	struct stats_fs_value_source *val_src;
+	struct stats_fs_value *val;
+
+	if (!source->source_dentry)
+		return;
+
+	list_for_each_entry (val_src, &source->values_head, list_element) {
+		if (val_src->files_created ||
+		    (val_src->common_flags & STATS_FS_HIDDEN))
+			continue;
+
+		for (val = val_src->values; val->name; val++)
+			stats_fs_create_file(val, source);
+
+		val_src->files_created = true;
+	}
+}
+
+/* Called with rwsem held for writing */
+static void
+stats_fs_create_files_recursive_locked(struct stats_fs_source *source,
+				       struct dentry *parent_dentry)
+{
+	struct stats_fs_source *child;
+
+	/* first check values in this folder, since it might be new */
+	if (!source->source_dentry && !(source->common_flags & STATS_FS_HIDDEN)) {
+		source->source_dentry =
+			stats_fs_create_dir(source->name, parent_dentry);
+	}
+
+	stats_fs_create_files_locked(source);
+
+	list_for_each_entry (child, &source->subordinates_head, list_element) {
+		if (child->source_dentry == NULL) {
+			/* assume that if a child has a folder,
+			 * its sub-children have one too.
+			 */
+			down_write(&child->rwsem);
+			stats_fs_create_files_recursive_locked(
+				child, source->source_dentry);
+			up_write(&child->rwsem);
+		}
+	}
+}
+
+void stats_fs_source_register(struct stats_fs_source *source)
+{
+	down_write(&source->rwsem);
+	stats_fs_create_files_recursive_locked(source, NULL);
+	up_write(&source->rwsem);
+}
+EXPORT_SYMBOL_GPL(stats_fs_source_register);
+
 /* Called with rwsem held for writing */
 static struct stats_fs_value_source *create_value_source(void *base, uint32_t flags)
 {
@@ -112,6 +193,9 @@ int stats_fs_source_add_values(struct stats_fs_source *source,
 	/* add the val_src to the source list */
 	list_add(&val_src->list_element, &source->values_head);
 
+	/* create files for the new values if the directory already exists */
+	stats_fs_create_files_locked(source);
+
 	up_write(&source->rwsem);
 
 	return 0;
@@ -125,6 +209,9 @@ void stats_fs_source_add_subordinate(struct stats_fs_source *source,
 
 	stats_fs_source_get(sub);
 	list_add(&sub->list_element, &source->subordinates_head);
+	if (source->source_dentry)
+		stats_fs_create_files_recursive_locked(sub,
+						       source->source_dentry);
 
 	up_write(&source->rwsem);
 }
@@ -141,6 +228,7 @@ stats_fs_source_remove_subordinate_locked(struct stats_fs_source *source,
 			     list_element) {
 		if (src_entry == sub) {
 			list_del_init(&src_entry->list_element);
+			stats_fs_source_remove_files(src_entry);
 			stats_fs_source_put(src_entry);
 			return;
 		}
@@ -505,6 +593,8 @@ static void stats_fs_source_destroy(struct kref *kref_source)
 		stats_fs_source_remove_subordinate_locked(source, child);
 	}
 
+	stats_fs_source_remove_files_locked(source);
+
 	up_write(&source->rwsem);
 	kfree(source->name);
 	kfree(source);
diff --git a/include/linux/stats_fs.h b/include/linux/stats_fs.h
index 93847383f597..db1c3ae9ff8b 100644
--- a/include/linux/stats_fs.h
+++ b/include/linux/stats_fs.h
@@ -52,6 +52,9 @@ struct stats_fs_value {
 	enum stat_aggr aggr_kind;
 
 	uint32_t value_flag;
+
+	/* optional show function */
+	char *(*show)(uint64_t);
 };
 
 struct stats_fs_source {
@@ -144,6 +147,18 @@ extern const struct stats_fs_type stats_fs_type_bool;
 struct stats_fs_source *stats_fs_source_create(uint32_t flags, const char *fmt,
 					       ...);
 
+/**
+ * stats_fs_source_register - register a source in the stats_fs filesystem
+ * @source: a pointer to the source that will be registered
+ *
+ * Add the given folder as a direct child of /sys/kernel/stats.
+ * It also recursively walks the source's children and creates any folder
+ * and file that does not exist yet. All subsequent add_subordinate calls
+ * on a source that has been registered will create the corresponding
+ * files and directories.
+ */
+void stats_fs_source_register(struct stats_fs_source *source);
+
 /**
  * stats_fs_source_add_values - adds values to the given source
  * @source: a pointer to the source that will receive the values
@@ -307,6 +322,9 @@ static inline struct stats_fs_source *stats_fs_source_create(uint32_t flags,
 	return ERR_PTR(-ENODEV);
 }
 
+static inline void stats_fs_source_register(struct stats_fs_source *source)
+{ }
+
 static inline int stats_fs_source_add_values(struct stats_fs_source *source,
 					     struct stats_fs_value *val,
 					     void *base_ptr, uint32_t flags)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index d78064007b17..46c66ea3fc9e 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -10,6 +10,7 @@
 #define CRAMFS_MAGIC		0x28cd3d45	/* some random number */
 #define CRAMFS_MAGIC_WEND	0x453dcd28	/* magic number with the wrong endianess */
 #define DEBUGFS_MAGIC          0x64626720
+#define STATSFS_MAGIC          0x73746174
 #define SECURITYFS_MAGIC	0x73636673
 #define SELINUX_MAGIC		0xf97cff8c
 #define SMACK_MAGIC		0x43415d53	/* "SMAC" */
diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f7ed8c..c512e69b3cfe 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -35,6 +35,10 @@
 #define TRACEFS_MAGIC          0x74726163
 #endif
 
+#ifndef STATSFS_MAGIC
+#define STATSFS_MAGIC          0x73746174
+#endif
+
 #ifndef HUGETLBFS_MAGIC
 #define HUGETLBFS_MAGIC        0x958458f6
 #endif
@@ -76,6 +80,16 @@ static const char * const tracefs__known_mountpoints[] = {
 	0,
 };
 
+#ifndef STATSFS_DEFAULT_PATH
+#define STATSFS_DEFAULT_PATH "/sys/kernel/stats"
+#endif
+
+static const char * const statsfs__known_mountpoints[] = {
+	STATSFS_DEFAULT_PATH,
+	"/stats",
+	0,
+};
+
 static const char * const hugetlbfs__known_mountpoints[] = {
 	0,
 };
@@ -100,6 +114,7 @@ enum {
 	FS__TRACEFS = 3,
 	FS__HUGETLBFS = 4,
 	FS__BPF_FS = 5,
+	FS__STATSFS = 6,
 };
 
 #ifndef TRACEFS_MAGIC
@@ -127,6 +142,11 @@ static struct fs fs__entries[] = {
 		.mounts	= tracefs__known_mountpoints,
 		.magic	= TRACEFS_MAGIC,
 	},
+	[FS__STATSFS] = {
+		.name	= "statsfs",
+		.mounts	= statsfs__known_mountpoints,
+		.magic	= STATSFS_MAGIC,
+	},
 	[FS__HUGETLBFS] = {
 		.name	= "hugetlbfs",
 		.mounts = hugetlbfs__known_mountpoints,
@@ -297,6 +317,7 @@ FS(sysfs,   FS__SYSFS);
 FS(procfs,  FS__PROCFS);
 FS(debugfs, FS__DEBUGFS);
 FS(tracefs, FS__TRACEFS);
+FS(statsfs, FS__STATSFS);
 FS(hugetlbfs, FS__HUGETLBFS);
 FS(bpf_fs, FS__BPF_FS);
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2020-05-26 11:03 ` [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Use stats_fs API instead of debugfs to create sources and add values.

This also requires changing all architecture files to replace the old
debugfs_entries array with stats_fs_vcpu_entries and stats_fs_vm_entries.

The file/folder names and organization are kept unchanged, and a symlink
at /sys/kernel/debug/kvm is left for backward compatibility.
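
For illustration only (the real definitions are in the diff below),
each architecture now provides two NULL-terminated arrays of
stats_fs_value instead of a single debugfs_entries[] array, e.g. for
booke (trimmed):

  struct stats_fs_value stats_fs_vm_entries[] = {
  	VM_STAT("remote_tlb_flush", remote_tlb_flush),
  	{ NULL }
  };

  struct stats_fs_value stats_fs_vcpu_entries[] = {
  	VCPU_STAT("mmio", mmio_exits),
  	VCPU_STAT("halt_wakeup", halt_wakeup),
  	{ NULL }
  };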

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 arch/arm64/kvm/Kconfig          |   1 +
 arch/arm64/kvm/guest.c          |   2 +-
 arch/mips/kvm/Kconfig           |   1 +
 arch/mips/kvm/mips.c            |   2 +-
 arch/powerpc/kvm/Kconfig        |   1 +
 arch/powerpc/kvm/book3s.c       |  12 +-
 arch/powerpc/kvm/booke.c        |   8 +-
 arch/s390/kvm/Kconfig           |   1 +
 arch/s390/kvm/kvm-s390.c        |  16 +-
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/Kconfig            |   1 +
 arch/x86/kvm/Makefile           |   2 +-
 arch/x86/kvm/debugfs.c          |  64 -------
 arch/x86/kvm/stats_fs.c         |  60 ++++++
 arch/x86/kvm/x86.c              |  11 +-
 include/linux/kvm_host.h        |  45 ++---
 virt/kvm/arm/arm.c              |   2 +-
 virt/kvm/kvm_main.c             | 318 +++++---------------------------
 18 files changed, 161 insertions(+), 388 deletions(-)
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/stats_fs.c

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449386d76441..f95f6d1c3610 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
 	depends on OF
 	# for TASKSTATS/TASK_DELAY_ACCT:
 	depends on NET && MULTIUSER
+	select STATS_FS_API
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 8417b200bec9..235ed44e4353 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
 
 #include "trace.h"
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("halt_successful_poll", halt_successful_poll),
 	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
 	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index b91d145aa2d5..b19fbc5297b4 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -19,6 +19,7 @@ config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
 	depends on MIPS_FP_SUPPORT
+	select STATS_FS_API
 	select EXPORT_UASM
 	select PREEMPT_NOTIFIERS
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index fdf1c14d9205..a47d21f35444 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,7 +39,7 @@
 #define VECTORSPACING 0x100	/* for EI/VI mode */
 #endif
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("wait", wait_exits),
 	VCPU_STAT("cache", cache_exits),
 	VCPU_STAT("signal", signal_exits),
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 12885eda324e..6f0675edfe7c 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -19,6 +19,7 @@ if VIRTUALIZATION
 
 config KVM
 	bool
+	select STATS_FS_API
 	select PREEMPT_NOTIFIERS
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37508a356f28..e3346b3087d0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
 
 /* #define EXIT_DEBUG */
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("exits", sum_exits),
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
@@ -66,8 +66,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("pthru_all", pthru_all),
 	VCPU_STAT("pthru_host", pthru_host),
 	VCPU_STAT("pthru_bad_aff", pthru_bad_aff),
-	VM_STAT("largepages_2M", num_2M_pages, .mode = 0444),
-	VM_STAT("largepages_1G", num_1G_pages, .mode = 0444),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
+	VM_STAT("largepages_2M", num_2M_pages,
+		.value_flag = STATS_FS_FLOATING_VALUE),
+	VM_STAT("largepages_1G", num_1G_pages,
+		.value_flag = STATS_FS_FLOATING_VALUE),
 	{ NULL }
 };
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c2984cb6dfa7..b14c07786cc8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -35,7 +35,12 @@
 
 unsigned long kvmppc_booke_handlers;
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vm_entries[] = {
+	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
 	VCPU_STAT("itlb_r", itlb_real_miss_exits),
@@ -54,7 +59,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("halt_wakeup", halt_wakeup),
 	VCPU_STAT("doorbell", dbell_exits),
 	VCPU_STAT("guest doorbell", gdbell_exits),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
 	{ NULL }
 };
 
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index def3b60f1fe8..ec8b2e04d698 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
 	def_tristate y
 	prompt "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
+	select STATS_FS_API
 	select PREEMPT_NOTIFIERS
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dbeb7da07f18..f2f090b78529 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -57,7 +57,16 @@
 #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
 			   (KVM_MAX_VCPUS + LOCAL_IRQS))
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vm_entries[] = {
+	VM_STAT("inject_float_mchk", inject_float_mchk),
+	VM_STAT("inject_io", inject_io),
+	VM_STAT("inject_pfault_done", inject_pfault_done),
+	VM_STAT("inject_service_signal", inject_service_signal),
+	VM_STAT("inject_virtio", inject_virtio),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("userspace_handled", exit_userspace),
 	VCPU_STAT("exit_null", exit_null),
 	VCPU_STAT("exit_validity", exit_validity),
@@ -95,18 +104,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("inject_ckc", inject_ckc),
 	VCPU_STAT("inject_cputm", inject_cputm),
 	VCPU_STAT("inject_external_call", inject_external_call),
-	VM_STAT("inject_float_mchk", inject_float_mchk),
 	VCPU_STAT("inject_emergency_signal", inject_emergency_signal),
-	VM_STAT("inject_io", inject_io),
 	VCPU_STAT("inject_mchk", inject_mchk),
-	VM_STAT("inject_pfault_done", inject_pfault_done),
 	VCPU_STAT("inject_program", inject_program),
 	VCPU_STAT("inject_restart", inject_restart),
-	VM_STAT("inject_service_signal", inject_service_signal),
 	VCPU_STAT("inject_set_prefix", inject_set_prefix),
 	VCPU_STAT("inject_stop_signal", inject_stop_signal),
 	VCPU_STAT("inject_pfault_init", inject_pfault_init),
-	VM_STAT("inject_virtio", inject_virtio),
 	VCPU_STAT("instruction_epsw", instruction_epsw),
 	VCPU_STAT("instruction_gs", instruction_gs),
 	VCPU_STAT("instruction_io_other", instruction_io_other),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..6a04f590963f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/hyperv-tlfs.h>
 
-#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+#define __KVM_HAVE_ARCH_VCPU_STATS_FS
 
 #define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d8154e0684b6..0b53bb14c97e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
 	# for TASKSTATS/TASK_DELAY_ACCT:
 	depends on NET && MULTIUSER
 	depends on X86_LOCAL_APIC
+	select STATS_FS_API
 	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
 	select HAVE_KVM_IRQCHIP
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a789759b7261..18285a382eba 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+			   hyperv.o stats_fs.o mmu/mmu.o mmu/page_track.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
deleted file mode 100644
index 018aebce33ff..000000000000
--- a/arch/x86/kvm/debugfs.c
+++ /dev/null
@@ -1,64 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Kernel-based Virtual Machine driver for Linux
- *
- * Copyright 2016 Red Hat, Inc. and/or its affiliates.
- */
-#include <linux/kvm_host.h>
-#include <linux/debugfs.h>
-#include "lapic.h"
-
-static int vcpu_get_timer_advance_ns(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.apic->lapic_timer.timer_advance_ns;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
-
-static int vcpu_get_tsc_offset(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.tsc_offset;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_offset_fops, vcpu_get_tsc_offset, NULL, "%lld\n");
-
-static int vcpu_get_tsc_scaling_ratio(void *data, u64 *val)
-{
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->arch.tsc_scaling_ratio;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_fops, vcpu_get_tsc_scaling_ratio, NULL, "%llu\n");
-
-static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val)
-{
-	*val = kvm_tsc_scaling_ratio_frac_bits;
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n");
-
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
-{
-	debugfs_create_file("tsc-offset", 0444, vcpu->debugfs_dentry, vcpu,
-			    &vcpu_tsc_offset_fops);
-
-	if (lapic_in_kernel(vcpu))
-		debugfs_create_file("lapic_timer_advance_ns", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_timer_advance_ns_fops);
-
-	if (kvm_has_tsc_control) {
-		debugfs_create_file("tsc-scaling-ratio", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_tsc_scaling_fops);
-		debugfs_create_file("tsc-scaling-ratio-frac-bits", 0444,
-				    vcpu->debugfs_dentry, vcpu,
-				    &vcpu_tsc_scaling_frac_fops);
-	}
-}
diff --git a/arch/x86/kvm/stats_fs.c b/arch/x86/kvm/stats_fs.c
new file mode 100644
index 000000000000..f6edebb9c559
--- /dev/null
+++ b/arch/x86/kvm/stats_fs.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Kernel-based Virtual Machine driver for Linux
+ *
+ * Copyright 2016 Red Hat, Inc. and/or its affiliates.
+ */
+#include <linux/kvm_host.h>
+#include <linux/stats_fs.h>
+#include "lapic.h"
+
+#define VCPU_ARCH_STATS_FS(n, s, x, ...)					\
+			{ n, offsetof(struct s, x), .aggr_kind = STATS_FS_SUM,	\
+			  ##__VA_ARGS__ }
+
+struct stats_fs_value stats_fs_vcpu_tsc_offset[] = {
+	VCPU_ARCH_STATS_FS("tsc-offset", kvm_vcpu_arch, tsc_offset,
+			   .type = &stats_fs_type_s64,
+			   .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_lapic_timer[] = {
+	VCPU_ARCH_STATS_FS("lapic_timer_advance_ns", kvm_timer, timer_advance_ns,
+			   .type = &stats_fs_type_u64,
+			   .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_tsc_ratio[] = {
+	VCPU_ARCH_STATS_FS("tsc-scaling-ratio", kvm_vcpu_arch, tsc_scaling_ratio,
+			   .type = &stats_fs_type_u64,
+			   .value_flag = STATS_FS_FLOATING_VALUE),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_arch_tsc_frac[] = {
+	{ "tsc-scaling-ratio-frac-bits", 0, .type = &stats_fs_type_u64,
+	  .value_flag = STATS_FS_FLOATING_VALUE },
+	{ NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
+};
+
+void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
+{
+	stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_tsc_offset,
+				   &vcpu->arch, 0);
+
+	if (lapic_in_kernel(vcpu))
+		stats_fs_source_add_values(vcpu->stats_fs_src,
+					   stats_fs_vcpu_arch_lapic_timer,
+					   &vcpu->arch.apic->lapic_timer, 0);
+
+	if (kvm_has_tsc_control) {
+		stats_fs_source_add_values(vcpu->stats_fs_src,
+					   stats_fs_vcpu_arch_tsc_ratio,
+					   &vcpu->arch, 0);
+		stats_fs_source_add_values(vcpu->stats_fs_src,
+					   stats_fs_vcpu_arch_tsc_frac,
+					   &kvm_tsc_scaling_ratio_frac_bits, 0);
+	}
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 35723dafedeb..e441fbc00c03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -190,7 +190,7 @@ static u64 __read_mostly host_xss;
 u64 __read_mostly supported_xss;
 EXPORT_SYMBOL_GPL(supported_xss);
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
 	VCPU_STAT("pf_fixed", pf_fixed),
 	VCPU_STAT("pf_guest", pf_guest),
 	VCPU_STAT("tlb_flush", tlb_flush),
@@ -217,6 +217,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("nmi_injections", nmi_injections),
 	VCPU_STAT("req_event", req_event),
 	VCPU_STAT("l1d_flush", l1d_flush),
+	{ NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
 	VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped),
 	VM_STAT("mmu_pte_write", mmu_pte_write),
 	VM_STAT("mmu_pte_updated", mmu_pte_updated),
@@ -226,8 +230,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VM_STAT("mmu_cache_miss", mmu_cache_miss),
 	VM_STAT("mmu_unsync", mmu_unsync),
 	VM_STAT("remote_tlb_flush", remote_tlb_flush),
-	VM_STAT("largepages", lpages, .mode = 0444),
-	VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
+	VM_STAT("largepages", lpages, .value_flag = STATS_FS_FLOATING_VALUE),
+	VM_STAT("nx_largepages_splitted", nx_lpage_splits,
+		.value_flag = STATS_FS_FLOATING_VALUE),
 	VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
 	{ NULL }
 };
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3845f857ef7b..f7b6a48bac8f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -27,6 +27,7 @@
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <asm/signal.h>
+#include <linux/stats_fs.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -318,7 +319,7 @@ struct kvm_vcpu {
 	bool preempted;
 	bool ready;
 	struct kvm_vcpu_arch arch;
-	struct dentry *debugfs_dentry;
+	struct stats_fs_source *stats_fs_src;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
@@ -498,8 +499,7 @@ struct kvm {
 	long tlbs_dirty;
 	struct list_head devices;
 	u64 manual_dirty_log_protect;
-	struct dentry *debugfs_dentry;
-	struct kvm_stat_data **debugfs_stat_data;
+	struct stats_fs_source *stats_fs_src;
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
@@ -880,8 +880,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
 
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
-void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);
+#ifdef __KVM_HAVE_ARCH_VCPU_STATS_FS
+void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu);
 #endif
 
 int kvm_arch_hardware_enable(void);
@@ -1110,33 +1110,16 @@ static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
 	return kvm_is_error_hva(hva);
 }
 
-enum kvm_stat_kind {
-	KVM_STAT_VM,
-	KVM_STAT_VCPU,
-};
-
-struct kvm_stat_data {
-	struct kvm *kvm;
-	struct kvm_stats_debugfs_item *dbgfs_item;
-};
-
-struct kvm_stats_debugfs_item {
-	const char *name;
-	int offset;
-	enum kvm_stat_kind kind;
-	int mode;
-};
-
-#define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
-	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
-
-#define VM_STAT(n, x, ...) 													\
-	{ n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
-#define VCPU_STAT(n, x, ...)												\
-	{ n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+#define VM_STAT(n, x, ...)						       \
+	{ n, offsetof(struct kvm, stat.x), &stats_fs_type_u64,		       \
+	  STATS_FS_SUM, ## __VA_ARGS__ }
+#define VCPU_STAT(n, x, ...)						       \
+	{ n, offsetof(struct kvm_vcpu, stat.x), &stats_fs_type_u64,	       \
+	  STATS_FS_SUM, ## __VA_ARGS__ }
 
-extern struct kvm_stats_debugfs_item debugfs_entries[];
-extern struct dentry *kvm_debugfs_dir;
+extern struct stats_fs_value stats_fs_vcpu_entries[];
+extern struct stats_fs_value stats_fs_vm_entries[];
+extern struct stats_fs_source *kvm_stats_fs_dir;
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..4171f92fa473 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -140,7 +140,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 }
 
-int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+int kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 74bdb7bf3295..3d2dccb5234e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -25,6 +25,7 @@
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
 #include <linux/debugfs.h>
+#include <linux/stats_fs.h>
 #include <linux/highmem.h>
 #include <linux/file.h>
 #include <linux/syscore_ops.h>
@@ -109,11 +110,8 @@ static struct kmem_cache *kvm_vcpu_cache;
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
 
-struct dentry *kvm_debugfs_dir;
-EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
-
-static int kvm_debugfs_num_entries;
-static const struct file_operations stat_fops_per_vm;
+struct stats_fs_source *kvm_stats_fs_dir;
+EXPORT_SYMBOL_GPL(kvm_stats_fs_dir);
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
@@ -356,6 +354,8 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	stats_fs_source_revoke(vcpu->stats_fs_src);
+	stats_fs_source_put(vcpu->stats_fs_src);
 	kvm_arch_vcpu_destroy(vcpu);
 
 	/*
@@ -601,52 +601,29 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
 	kvfree(slots);
 }
 
-static void kvm_destroy_vm_debugfs(struct kvm *kvm)
+static void kvm_destroy_vm_stats_fs(struct kvm *kvm)
 {
-	int i;
-
-	if (!kvm->debugfs_dentry)
-		return;
-
-	debugfs_remove_recursive(kvm->debugfs_dentry);
-
-	if (kvm->debugfs_stat_data) {
-		for (i = 0; i < kvm_debugfs_num_entries; i++)
-			kfree(kvm->debugfs_stat_data[i]);
-		kfree(kvm->debugfs_stat_data);
-	}
+	stats_fs_source_remove_subordinate(kvm_stats_fs_dir, kvm->stats_fs_src);
+	stats_fs_source_revoke(kvm->stats_fs_src);
+	stats_fs_source_put(kvm->stats_fs_src);
 }
 
-static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
+static int kvm_create_vm_stats_fs(struct kvm *kvm, int fd)
 {
 	char dir_name[ITOA_MAX_LEN * 2];
-	struct kvm_stat_data *stat_data;
-	struct kvm_stats_debugfs_item *p;
 
-	if (!debugfs_initialized())
+	if (!stats_fs_initialized())
 		return 0;
 
 	snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
-	kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+	kvm->stats_fs_src = stats_fs_source_create(0, dir_name);
+	stats_fs_source_add_subordinate(kvm_stats_fs_dir, kvm->stats_fs_src);
 
-	kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
-					 sizeof(*kvm->debugfs_stat_data),
-					 GFP_KERNEL_ACCOUNT);
-	if (!kvm->debugfs_stat_data)
-		return -ENOMEM;
+	stats_fs_source_add_values(kvm->stats_fs_src, stats_fs_vm_entries,
+				   kvm, 0);
 
-	for (p = debugfs_entries; p->name; p++) {
-		stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL_ACCOUNT);
-		if (!stat_data)
-			return -ENOMEM;
-
-		stat_data->kvm = kvm;
-		stat_data->dbgfs_item = p;
-		kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
-		debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
-				    kvm->debugfs_dentry, stat_data,
-				    &stat_fops_per_vm);
-	}
+	stats_fs_source_add_values(kvm->stats_fs_src, stats_fs_vcpu_entries,
+				   NULL, 0);
 	return 0;
 }
 
@@ -783,7 +760,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	struct mm_struct *mm = kvm->mm;
 
 	kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
-	kvm_destroy_vm_debugfs(kvm);
+	kvm_destroy_vm_stats_fs(kvm);
 	kvm_arch_sync_events(kvm);
 	mutex_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
@@ -2946,7 +2923,6 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
 
-	debugfs_remove_recursive(vcpu->debugfs_dentry);
 	kvm_put_kvm(vcpu->kvm);
 	return 0;
 }
@@ -2970,19 +2946,23 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 	return anon_inode_getfd(name, &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
 }
 
-static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+static void kvm_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
 {
-#ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
 	char dir_name[ITOA_MAX_LEN * 2];
 
-	if (!debugfs_initialized())
+	if (!stats_fs_initialized())
 		return;
 
 	snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
-	vcpu->debugfs_dentry = debugfs_create_dir(dir_name,
-						  vcpu->kvm->debugfs_dentry);
 
-	kvm_arch_create_vcpu_debugfs(vcpu);
+	vcpu->stats_fs_src = stats_fs_source_create(0, dir_name);
+	stats_fs_source_add_subordinate(vcpu->kvm->stats_fs_src, vcpu->stats_fs_src);
+
+	stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_entries, vcpu,
+				   STATS_FS_HIDDEN);
+
+#ifdef __KVM_HAVE_ARCH_VCPU_STATS_FS
+	kvm_arch_create_vcpu_stats_fs(vcpu);
 #endif
 }
 
@@ -3031,8 +3011,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	if (r)
 		goto vcpu_free_run_page;
 
-	kvm_create_vcpu_debugfs(vcpu);
-
 	mutex_lock(&kvm->lock);
 	if (kvm_get_vcpu_by_id(kvm, id)) {
 		r = -EEXIST;
@@ -3061,11 +3039,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_postcreate(vcpu);
+	kvm_create_vcpu_stats_fs(vcpu);
 	return r;
 
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
-	debugfs_remove_recursive(vcpu->debugfs_dentry);
 	kvm_arch_vcpu_destroy(vcpu);
 vcpu_free_run_page:
 	free_page((unsigned long)vcpu->run);
@@ -3839,7 +3817,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 	 * cases it will be called by the final fput(file) and will take
 	 * care of doing kvm_put_kvm(kvm).
 	 */
-	if (kvm_create_vm_debugfs(kvm, r) < 0) {
+	if (kvm_create_vm_stats_fs(kvm, r) < 0) {
 		put_unused_fd(r);
 		fput(file);
 		return -ENOMEM;
@@ -4295,214 +4273,6 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 }
 EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
 
-static int kvm_debugfs_open(struct inode *inode, struct file *file,
-			   int (*get)(void *, u64 *), int (*set)(void *, u64),
-			   const char *fmt)
-{
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
-					  inode->i_private;
-
-	/* The debugfs files are a reference to the kvm struct which
-	 * is still valid when kvm_destroy_vm is called.
-	 * To avoid the race between open and the removal of the debugfs
-	 * directory we test against the users count.
-	 */
-	if (!refcount_inc_not_zero(&stat_data->kvm->users_count))
-		return -ENOENT;
-
-	if (simple_attr_open(inode, file, get,
-		    KVM_DBGFS_GET_MODE(stat_data->dbgfs_item) & 0222
-		    ? set : NULL,
-		    fmt)) {
-		kvm_put_kvm(stat_data->kvm);
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-static int kvm_debugfs_release(struct inode *inode, struct file *file)
-{
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
-					  inode->i_private;
-
-	simple_attr_release(inode, file);
-	kvm_put_kvm(stat_data->kvm);
-
-	return 0;
-}
-
-static int kvm_get_stat_per_vm(struct kvm *kvm, size_t offset, u64 *val)
-{
-	*val = *(ulong *)((void *)kvm + offset);
-
-	return 0;
-}
-
-static int kvm_clear_stat_per_vm(struct kvm *kvm, size_t offset)
-{
-	*(ulong *)((void *)kvm + offset) = 0;
-
-	return 0;
-}
-
-static int kvm_get_stat_per_vcpu(struct kvm *kvm, size_t offset, u64 *val)
-{
-	int i;
-	struct kvm_vcpu *vcpu;
-
-	*val = 0;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		*val += *(u64 *)((void *)vcpu + offset);
-
-	return 0;
-}
-
-static int kvm_clear_stat_per_vcpu(struct kvm *kvm, size_t offset)
-{
-	int i;
-	struct kvm_vcpu *vcpu;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		*(u64 *)((void *)vcpu + offset) = 0;
-
-	return 0;
-}
-
-static int kvm_stat_data_get(void *data, u64 *val)
-{
-	int r = -EFAULT;
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
-	switch (stat_data->dbgfs_item->kind) {
-	case KVM_STAT_VM:
-		r = kvm_get_stat_per_vm(stat_data->kvm,
-					stat_data->dbgfs_item->offset, val);
-		break;
-	case KVM_STAT_VCPU:
-		r = kvm_get_stat_per_vcpu(stat_data->kvm,
-					  stat_data->dbgfs_item->offset, val);
-		break;
-	}
-
-	return r;
-}
-
-static int kvm_stat_data_clear(void *data, u64 val)
-{
-	int r = -EFAULT;
-	struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
-
-	if (val)
-		return -EINVAL;
-
-	switch (stat_data->dbgfs_item->kind) {
-	case KVM_STAT_VM:
-		r = kvm_clear_stat_per_vm(stat_data->kvm,
-					  stat_data->dbgfs_item->offset);
-		break;
-	case KVM_STAT_VCPU:
-		r = kvm_clear_stat_per_vcpu(stat_data->kvm,
-					    stat_data->dbgfs_item->offset);
-		break;
-	}
-
-	return r;
-}
-
-static int kvm_stat_data_open(struct inode *inode, struct file *file)
-{
-	__simple_attr_check_format("%llu\n", 0ull);
-	return kvm_debugfs_open(inode, file, kvm_stat_data_get,
-				kvm_stat_data_clear, "%llu\n");
-}
-
-static const struct file_operations stat_fops_per_vm = {
-	.owner = THIS_MODULE,
-	.open = kvm_stat_data_open,
-	.release = kvm_debugfs_release,
-	.read = simple_attr_read,
-	.write = simple_attr_write,
-	.llseek = no_llseek,
-};
-
-static int vm_stat_get(void *_offset, u64 *val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-	u64 tmp_val;
-
-	*val = 0;
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_get_stat_per_vm(kvm, offset, &tmp_val);
-		*val += tmp_val;
-	}
-	mutex_unlock(&kvm_lock);
-	return 0;
-}
-
-static int vm_stat_clear(void *_offset, u64 val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-
-	if (val)
-		return -EINVAL;
-
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_clear_stat_per_vm(kvm, offset);
-	}
-	mutex_unlock(&kvm_lock);
-
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, vm_stat_clear, "%llu\n");
-
-static int vcpu_stat_get(void *_offset, u64 *val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-	u64 tmp_val;
-
-	*val = 0;
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_get_stat_per_vcpu(kvm, offset, &tmp_val);
-		*val += tmp_val;
-	}
-	mutex_unlock(&kvm_lock);
-	return 0;
-}
-
-static int vcpu_stat_clear(void *_offset, u64 val)
-{
-	unsigned offset = (long)_offset;
-	struct kvm *kvm;
-
-	if (val)
-		return -EINVAL;
-
-	mutex_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_clear_stat_per_vcpu(kvm, offset);
-	}
-	mutex_unlock(&kvm_lock);
-
-	return 0;
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, vcpu_stat_clear,
-			"%llu\n");
-
-static const struct file_operations *stat_fops[] = {
-	[KVM_STAT_VCPU] = &vcpu_stat_fops,
-	[KVM_STAT_VM]   = &vm_stat_fops,
-};
-
 static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 {
 	struct kobj_uevent_env *env;
@@ -4537,34 +4307,33 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 	}
 	add_uevent_var(env, "PID=%d", kvm->userspace_pid);
 
-	if (!IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+	if (!IS_ERR_OR_NULL(kvm->stats_fs_src->source_dentry)) {
 		char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
 
 		if (p) {
-			tmp = dentry_path_raw(kvm->debugfs_dentry, p, PATH_MAX);
+			tmp = dentry_path_raw(kvm->stats_fs_src->source_dentry,
+					      p, PATH_MAX);
 			if (!IS_ERR(tmp))
 				add_uevent_var(env, "STATS_PATH=%s", tmp);
 			kfree(p);
 		}
 	}
+
 	/* no need for checks, since we are adding at most only 5 keys */
 	env->envp[env->envp_idx++] = NULL;
 	kobject_uevent_env(&kvm_dev.this_device->kobj, KOBJ_CHANGE, env->envp);
 	kfree(env);
 }
 
-static void kvm_init_debug(void)
+static void kvm_init_stats_fs(void)
 {
-	struct kvm_stats_debugfs_item *p;
+	kvm_stats_fs_dir = stats_fs_source_create(0, "kvm");
+	/* symlink to debugfs */
+	debugfs_create_symlink("kvm", NULL, "/sys/kernel/stats/kvm");
+	stats_fs_source_register(kvm_stats_fs_dir);
 
-	kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
-
-	kvm_debugfs_num_entries = 0;
-	for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
-		debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
-				    kvm_debugfs_dir, (void *)(long)p->offset,
-				    stat_fops[p->kind]);
-	}
+	stats_fs_source_add_values(kvm_stats_fs_dir, stats_fs_vcpu_entries, NULL, 0);
+	stats_fs_source_add_values(kvm_stats_fs_dir, stats_fs_vm_entries, NULL, 0);
 }
 
 static int kvm_suspend(void)
@@ -4738,7 +4507,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	kvm_preempt_ops.sched_in = kvm_sched_in;
 	kvm_preempt_ops.sched_out = kvm_sched_out;
 
-	kvm_init_debug();
+	kvm_init_stats_fs();
 
 	r = kvm_vfio_ops_init();
 	WARN_ON(r);
@@ -4767,7 +4536,8 @@ EXPORT_SYMBOL_GPL(kvm_init);
 
 void kvm_exit(void)
 {
-	debugfs_remove_recursive(kvm_debugfs_dir);
+	stats_fs_source_revoke(kvm_stats_fs_dir);
+	stats_fs_source_put(kvm_stats_fs_dir);
 	misc_deregister(&kvm_dev);
 	kmem_cache_destroy(kvm_vcpu_cache);
 	kvm_async_pf_deinit();
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2020-05-26 11:03 ` [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-26 11:03 ` [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Add an example of the show function using the mp_state value.

mp_state is an enum that represents the VCPU state; instead of
displaying its raw integer representation, the show function translates
it into a more meaningful string (e.g. "RUNNABLE" instead of 0).

The VCPU state is shown in the kvm/<vmid>/vcpu<cpuid>/mp_state file.
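
A value opts into this by setting the new .show callback in its
stats_fs_value definition; the fragment below (taken from this patch)
shows the mp_state hookup:

  struct stats_fs_value stats_fs_vcpu_mp_state[] = {
  	VCPU_ARCH_STATS_FS("mp_state", kvm_vcpu_arch, mp_state,
  			   .type = &stats_fs_type_u32,
  			   .show = stats_fs_vcpu_get_mpstate),
  	{ NULL }
  };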

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 arch/x86/kvm/stats_fs.c | 54 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/kvm/stats_fs.c b/arch/x86/kvm/stats_fs.c
index f6edebb9c559..902be18562da 100644
--- a/arch/x86/kvm/stats_fs.c
+++ b/arch/x86/kvm/stats_fs.c
@@ -39,11 +39,65 @@ struct stats_fs_value stats_fs_vcpu_arch_tsc_frac[] = {
 	{ NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
 };
 
+char *stats_fs_vcpu_get_mpstate(uint64_t state)
+{
+	char *state_str;
+
+	state_str = kzalloc(20, GFP_KERNEL);
+	if (!state_str)
+		return ERR_PTR(-ENOMEM);
+
+	switch (state) {
+	case KVM_MP_STATE_RUNNABLE:
+		strcpy(state_str, "RUNNABLE");
+		break;
+	case KVM_MP_STATE_UNINITIALIZED:
+		strcpy(state_str, "UNINITIALIZED");
+		break;
+	case KVM_MP_STATE_INIT_RECEIVED:
+		strcpy(state_str, "INIT_RECEIVED");
+		break;
+	case KVM_MP_STATE_HALTED:
+		strcpy(state_str, "HALTED");
+		break;
+	case KVM_MP_STATE_SIPI_RECEIVED:
+		strcpy(state_str, "SIPI_RECEIVED");
+		break;
+	case KVM_MP_STATE_STOPPED:
+		strcpy(state_str, "STOPPED");
+		break;
+	case KVM_MP_STATE_CHECK_STOP:
+		strcpy(state_str, "CHECK_STOP");
+		break;
+	case KVM_MP_STATE_OPERATING:
+		strcpy(state_str, "OPERATING");
+		break;
+	case KVM_MP_STATE_LOAD:
+		strcpy(state_str, "LOAD");
+		break;
+	default:
+		strcpy(state_str, "UNRECOGNIZED");
+		break;
+	}
+
+	return state_str;
+}
+
+struct stats_fs_value stats_fs_vcpu_mp_state[] = {
+	VCPU_ARCH_STATS_FS("mp_state", kvm_vcpu_arch, mp_state,
+			   .type = &stats_fs_type_u32,
+			   .show = stats_fs_vcpu_get_mpstate),
+	{ NULL }
+};
+
 void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
 {
 	stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_tsc_offset,
 				   &vcpu->arch, 0);
 
+	stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_mp_state,
+				   &vcpu->arch, 0);
+
 	if (lapic_in_kernel(vcpu))
 		stats_fs_source_add_values(vcpu->stats_fs_src,
 					   stats_fs_vcpu_arch_lapic_timer,
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (5 preceding siblings ...)
  2020-05-26 11:03 ` [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function Emanuele Giuseppe Esposito
@ 2020-05-26 11:03 ` Emanuele Giuseppe Esposito
  2020-05-26 14:16   ` Andrew Lunn
  2020-05-26 22:31 ` [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Jakub Kicinski
  2020-05-27 21:17 ` Andrew Lunn
  8 siblings, 1 reply; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 11:03 UTC (permalink / raw)
  To: kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Emanuele Giuseppe Esposito

Apply stats_fs on the networking statistics subsystem.

Currently it only works with network namespaces disabled
(CONFIG_NET_NS=n), because multiple namespaces can contain devices with
the same name under the same root source, which would cause a name
conflict in stats_fs.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 include/linux/netdevice.h |  2 ++
 net/Kconfig               |  1 +
 net/core/dev.c            | 66 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 130a668049ab..408c4e7b0e21 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -48,6 +48,7 @@
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
 #include <linux/hashtable.h>
+#include <linux/stats_fs.h>
 
 struct netpoll_info;
 struct device;
@@ -2117,6 +2118,7 @@ struct net_device {
 	unsigned		wol_enabled:1;
 
 	struct list_head	net_notifier_list;
+	struct stats_fs_source	*stats_fs_src;
 
 #if IS_ENABLED(CONFIG_MACSEC)
 	/* MACsec management functions */
diff --git a/net/Kconfig b/net/Kconfig
index df8d8c9bd021..3441d5bb6107 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
 	select NLATTR
 	select GENERIC_NET_UTILS
 	select BPF
+	select STATS_FS_API
 	---help---
 	  Unless you really know what you are doing, you should say Y here.
 	  The reason is that some programs need kernel networking support even
diff --git a/net/core/dev.c b/net/core/dev.c
index 522288177bbd..3db48cd1a097 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -142,6 +142,7 @@
 #include <linux/net_namespace.h>
 #include <linux/indirect_call_wrapper.h>
 #include <net/devlink.h>
+#include <linux/stats_fs.h>
 
 #include "net-sysfs.h"
 
@@ -150,6 +151,11 @@
 /* This should be increased if a protocol with a bigger head is added. */
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
+#define NETDEV_STAT(str, m, ...)						\
+	{ str, offsetof(struct rtnl_link_stats64, m),				\
+	  &stats_fs_type_netdev_u64,						\
+	  STATS_FS_SUM, ## __VA_ARGS__ }
+
 static DEFINE_SPINLOCK(ptype_lock);
 static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
@@ -196,6 +202,53 @@ static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8);
 
 static seqcount_t devnet_rename_seq;
 
+static uint64_t stats_fs_get_netdev_u64(struct stats_fs_value *val,
+					void *base)
+{
+	struct net_device *netdev = (struct net_device *)base;
+	struct rtnl_link_stats64 net_stats;
+
+	dev_get_stats(netdev, &net_stats);
+
+	return stats_fs_get_u64(val, &net_stats);
+}
+
+static struct stats_fs_type stats_fs_type_netdev_u64 = {
+	.get = stats_fs_get_netdev_u64,
+	.clear = NULL,
+	.sign = false
+};
+
+static struct stats_fs_source *netdev_root;
+
+static struct stats_fs_value stats_fs_netdev_entries[] = {
+	NETDEV_STAT("rx_packets", rx_packets),
+	NETDEV_STAT("tx_packets", tx_packets),
+	NETDEV_STAT("rx_bytes", rx_bytes),
+	NETDEV_STAT("tx_bytes", tx_bytes),
+	NETDEV_STAT("rx_errors", rx_errors),
+	NETDEV_STAT("tx_errors", tx_errors),
+	NETDEV_STAT("rx_dropped", rx_dropped),
+	NETDEV_STAT("tx_dropped", tx_dropped),
+	NETDEV_STAT("multicast", multicast),
+	NETDEV_STAT("collisions", collisions),
+	NETDEV_STAT("rx_length_errors", rx_length_errors),
+	NETDEV_STAT("rx_over_errors", rx_over_errors),
+	NETDEV_STAT("rx_crc_errors", rx_crc_errors),
+	NETDEV_STAT("rx_frame_errors", rx_frame_errors),
+	NETDEV_STAT("rx_fifo_errors", rx_fifo_errors),
+	NETDEV_STAT("rx_missed_errors", rx_missed_errors),
+	NETDEV_STAT("tx_aborted_errors", tx_aborted_errors),
+	NETDEV_STAT("tx_carrier_errors", tx_carrier_errors),
+	NETDEV_STAT("tx_fifo_errors", tx_fifo_errors),
+	NETDEV_STAT("tx_heartbeat_errors", tx_heartbeat_errors),
+	NETDEV_STAT("tx_window_errors", tx_window_errors),
+	NETDEV_STAT("rx_compressed", rx_compressed),
+	NETDEV_STAT("tx_compressed", tx_compressed),
+	NETDEV_STAT("rx_nohandler", rx_nohandler),
+	{ NULL }
+};
+
 static inline void dev_base_seq_inc(struct net *net)
 {
 	while (++net->dev_base_seq == 0)
@@ -8783,6 +8836,11 @@ static void rollback_registered_many(struct list_head *head)
 	ASSERT_RTNL();
 
 	list_for_each_entry_safe(dev, tmp, head, unreg_list) {
+		stats_fs_source_remove_subordinate(netdev_root,
+						   dev->stats_fs_src);
+		stats_fs_source_revoke(dev->stats_fs_src);
+		stats_fs_source_put(dev->stats_fs_src);
+
 		/* Some devices call without registering
 		 * for initialization unwind. Remove those
 		 * devices and proceed with the remaining.
@@ -9436,6 +9494,11 @@ int register_netdevice(struct net_device *dev)
 	    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
 		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
 
+	dev->stats_fs_src = stats_fs_source_create(0, dev->name);
+	stats_fs_source_add_subordinate(netdev_root, dev->stats_fs_src);
+	stats_fs_source_add_values(dev->stats_fs_src, stats_fs_netdev_entries,
+				   dev, 0);
+
 out:
 	return ret;
 
@@ -10500,6 +10563,9 @@ static int __init net_dev_init(void)
 	if (netdev_kobject_init())
 		goto out;
 
+	netdev_root = stats_fs_source_create(0, "net");
+	stats_fs_source_register(netdev_root);
+
 	INIT_LIST_HEAD(&ptype_all);
 	for (i = 0; i < PTYPE_HASH_SIZE; i++)
 		INIT_LIST_HEAD(&ptype_base[i]);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
  2020-05-26 11:03 ` [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API Emanuele Giuseppe Esposito
@ 2020-05-26 14:16   ` Andrew Lunn
  2020-05-26 15:45     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2020-05-26 14:16 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev

On Tue, May 26, 2020 at 01:03:17PM +0200, Emanuele Giuseppe Esposito wrote:
> Apply stats_fs on the networking statistics subsystem.
> 
> Currently it only works with disabled network namespace
> (CONFIG_NET_NS=n), because multiple namespaces will have the same
> device name under the same root source that will cause a conflict in
> stats_fs.

Hi Emanuele

How do you atomically get and display a group of statistics?

If you look at how the netlink socket works, you will see code like:

                do {
                        start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
                        rx_packets = cpu_stats->rx_packets;
                        rx_bytes = cpu_stats->rx_bytes;
			....
                } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));

It will ensure that rx_packets and rx_bytes are consistent with each
other. If the value of the sequence counter changes while inside the
loop, the loop is repeated until it does not change.

In general, hardware counters in NICs are the same.  You tell it to
take a snapshot of the statistics counters, and then read them all
back, to give a consistent view across all the statistics.

I've not looked at this new code in detail, but it looks like you have
one file per statistic, and assume each statistic is independent of
every other statistic. This independence can limit how you use the
values, particularly when debugging. The netlink interface we use does
not have this limitation.

	     Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API
  2020-05-26 14:16   ` Andrew Lunn
@ 2020-05-26 15:45     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-26 15:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev


Hi Andrew

> How do you atomically get and display a group of statistics?
> 
> If you look at how the netlink socket works, you will see code like:
> 
>                  do {
>                          start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
>                          rx_packets = cpu_stats->rx_packets;
>                          rx_bytes = cpu_stats->rx_bytes;
> 			....
>                  } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
> 
> It will ensure that rx_packets and rx_bytes are consistent with each
> other. If the value of the sequence counter changes while inside the
> loop, the loop so repeated until it does not change.
> 
> In general, hardware counters in NICs are the same.  You tell it to
> take a snapshot of the statistics counters, and then read them all
> back, to give a consistent view across all the statistics.
> 
> I've not looked at this new code in detail, but it looks like you have
> one file per statistic, and assume each statistic is independent of
> every other statistic. This independence can limit how you use the
> values, particularly when debugging. The netlink interface we use does
> not have this limitation.

You're right, statistics are treated independently, so what you describe
is currently not supported.

In KVM the usage is more qualitative, so there isn't such a problem.
But as long as the interface is based on file access, snapshotting might
not be very useful; however, it could still be considered later, together
with the binary access.

Jonathan, how is your metricfs handling this case?

Thank you,
Emanuele


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (6 preceding siblings ...)
  2020-05-26 11:03 ` [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API Emanuele Giuseppe Esposito
@ 2020-05-26 22:31 ` Jakub Kicinski
  2020-05-27 13:14   ` Emanuele Giuseppe Esposito
  2020-05-27 21:17 ` Andrew Lunn
  8 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2020-05-26 22:31 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev

On Tue, 26 May 2020 13:03:10 +0200 Emanuele Giuseppe Esposito wrote:
> There is currently no common way for Linux kernel subsystems to expose
> statistics to userspace shared throughout the Linux kernel; subsystems have
> to take care of gathering and displaying statistics by themselves, for
> example in the form of files in debugfs. For example KVM has its own code
> section that takes care of this in virt/kvm/kvm_main.c, where it sets up
> debugfs handlers for displaying values and aggregating them from various
> subfolders to obtain information about the system state (i.e. displaying
> the total number of exits, calculated by summing all exits of all cpus of
> all running virtual machines).
> 
> Allowing each section of the kernel to do so has two disadvantages. First,
> it will introduce redundant code. Second, debugfs is anyway not the right
> place for statistics (for example it is affected by lockdown)
> 
> In this patch series I introduce statsfs, a synthetic ram-based virtual
> filesystem that takes care of gathering and displaying statistics for the
> Linux kernel subsystems.
> 
> The file system is mounted on /sys/kernel/stats and would be already used
> by kvm. Statsfs was initially introduced by Paolo Bonzini [1].

What's the direct motivation for this work? Moving KVM stats out of
debugfs?

In my experience stats belong in the API used for creating/enumerating
objects, statsfs sounds like going in the exact opposite direction -
creating a parallel structure / hierarchy for exposing stats. I know
nothing about KVM but are you sure all the info that has to be exposed
will be stats?

In case of networking we have the basic stats in sysfs, under the
netdevice's kobject. But since we're not using sysfs much any more 
for config, new stats are added in netlink APIs. Again - same APIs
used for enumeration and config.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/7] kunit: tests for stats_fs API
  2020-05-26 11:03 ` [PATCH v3 3/7] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
@ 2020-05-27 10:05   ` Alan Maguire
  2020-05-27 13:26     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Maguire @ 2020-05-27 10:05 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, brendanhiggins, linux-kselftest, kunit-dev

On Tue, 26 May 2020, Emanuele Giuseppe Esposito wrote:

> Add kunit tests to extensively test the stats_fs API functionality.
>

I've added in the kunit-related folks.
 
> In order to run them, the kernel .config must set CONFIG_KUNIT=y
> and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
> and CONFIG_STATS_FS_TEST=y
>

It looks like CONFIG_STATS_FS is built-in, but it exports
much of the functionality you are testing.  However could the
tests also be built as a module (i.e. make CONFIG_STATS_FS_TEST
a tristate variable)? To test this you'd need to specify
CONFIG_KUNIT=m and CONFIG_STATS_FS_TEST=m, and testing would
simply be a case of "modprobe"ing the stats fs module and collecting
results in /sys/kernel/debug/kunit/<module_name> (rather 
than running kunit.py). Are you relying on unexported internals in
the tests that would prevent building them as a module?

Thanks!

Alan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-26 22:31 ` [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Jakub Kicinski
@ 2020-05-27 13:14   ` Emanuele Giuseppe Esposito
  2020-05-27 13:33     ` Andrew Lunn
  2020-05-27 20:23     ` Jakub Kicinski
  0 siblings, 2 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-27 13:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Andrew Lunn


>>
>> The file system is mounted on /sys/kernel/stats and would be already used
>> by kvm. Statsfs was initially introduced by Paolo Bonzini [1].
> 
> What's the direct motivation for this work? Moving KVM stats out of
> debugfs?

There are many reasons: one of them is not using debugfs for statistics,
but the main one is to have a single tool that automatically takes care
of gathering and displaying them, instead of leaving each subsystem "on
its own".

Sure, everyone gathers and processes stats in different ways, and the 
aim of this tool is to hopefully be extensible enough to cover all needs.
> In my experience stats belong in the API used for creating/enumerating
> objects, statsfs sounds like going in the exact opposite direction -
> creating a parallel structure / hierarchy for exposing stats. I know
> nothing about KVM but are you sure all the info that has to be exposed
> will be stats?

I don't understand, what do you mean here?

> 
> In case of networking we have the basic stats in sysfs, under the
> netdevice's kobject. But since we're not using sysfs much any more
> for config, new stats are added in netlink APIs. Again - same APIs
> used for enumeration and config.

I don't really know a lot about the networking subsystem, and as it was
pointed out in another email on patch 7 by Andrew, networking needs to
atomically gather and display statistics in order to make them
consistent; currently this is not supported by stats_fs but could be
added in the future.

In addition, right now it won't work properly if network namespaces are
enabled. That is another issue to take into consideration. That's also
why I marked patch 7 as "not for merge".

Regarding the config, as I said the idea is to gather multiple
subsystems' statistics, so there wouldn't be a single configuration
method like in netlink.
For example, in KVM there are file descriptors for configuration, and
creating them requires no privilege, unlike network interfaces.

Thank you,
Emanuele


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/7] kunit: tests for stats_fs API
  2020-05-27 10:05   ` Alan Maguire
@ 2020-05-27 13:26     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-05-27 13:26 UTC (permalink / raw)
  To: Alan Maguire
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, brendanhiggins, linux-kselftest, kunit-dev


>> In order to run them, the kernel .config must set CONFIG_KUNIT=y
>> and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
>> and CONFIG_STATS_FS_TEST=y
>>
> 
> It looks like CONFIG_STATS_FS is built-in, but it exports
> much of the functionality you are testing.  However could the
> tests also be built as a module (i.e. make CONFIG_STATS_FS_TEST
> a tristate variable)? To test this you'd need to specify
> CONFIG_KUNIT=m and CONFIG_STATS_FS_TEST=m, and testing would
> simply be a case of "modprobe"ing the stats fs module and collecting
> results in /sys/kernel/debug/kunit/<module_name> (rather
> than running kunit.py). Are you relying on unexported internals in
> the the tests that would prevent building them as a module?
> 

I haven't checked it yet, but the tests should work as a separate module.
I will look into it, thanks.

Emanuele


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 13:14   ` Emanuele Giuseppe Esposito
@ 2020-05-27 13:33     ` Andrew Lunn
  2020-05-27 15:00       ` Paolo Bonzini
  2020-05-27 20:23     ` Jakub Kicinski
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Lunn @ 2020-05-27 13:33 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: Jakub Kicinski, kvm, Christian Borntraeger, Paolo Bonzini,
	Jim Mattson, Alexander Viro, Emanuele Giuseppe Esposito,
	David Rientjes, Jonathan Adams, linux-doc, linux-kernel,
	linux-arm-kernel, linux-mips, kvm-ppc, linuxppc-dev, linux-s390,
	linux-fsdevel, netdev

> I don't really know a lot about the networking subsystem, and as it was
> pointed out in another email on patch 7 by Andrew, networking needs to
> atomically gather and display statistics in order to make them consistent,
> and currently this is not supported by stats_fs but could be added in
> future.

Hi Emanuele

Do you have any idea how you will support atomic access? It does not
seem easy to implement in a filesystem based model.

     Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 13:33     ` Andrew Lunn
@ 2020-05-27 15:00       ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-05-27 15:00 UTC (permalink / raw)
  To: Andrew Lunn, Emanuele Giuseppe Esposito
  Cc: Jakub Kicinski, kvm, Christian Borntraeger, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev

On 27/05/20 15:33, Andrew Lunn wrote:
>> I don't really know a lot about the networking subsystem, and as it was
>> pointed out in another email on patch 7 by Andrew, networking needs to
>> atomically gather and display statistics in order to make them consistent,
>> and currently this is not supported by stats_fs but could be added in
>> future.
> 
> Do you have any idea how you will support atomic access? It does not
> seem easy to implement in a filesystem based model.

Hi Andrew,

there are plans to support binary access.  Emanuele and I don't really
have a plan for how to implement it, but there are developers from
Google that have ideas (because Google has a similar "metricfs" thing
in-house).

I think atomic access would use some kind of "source_ops" struct
containing create_snapshot and release_snapshot function pointers.
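
Just to give an idea, a minimal sketch of what such a hook could look
like (purely hypothetical, names made up for illustration):

  struct stats_fs_source_ops {
  	/* take a consistent snapshot of all values in the source */
  	void *(*create_snapshot)(struct stats_fs_source *source);
  	/* release the snapshot returned by create_snapshot */
  	void (*release_snapshot)(struct stats_fs_source *source, void *snap);
  };

Readers of the source would then operate on the snapshot instead of the
live counters, similar to what the netlink code does with
u64_stats_fetch_begin_irq/u64_stats_fetch_retry_irq.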

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 13:14   ` Emanuele Giuseppe Esposito
  2020-05-27 13:33     ` Andrew Lunn
@ 2020-05-27 20:23     ` Jakub Kicinski
  2020-05-27 21:07       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2020-05-27 20:23 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev, Andrew Lunn

On Wed, 27 May 2020 15:14:41 +0200 Emanuele Giuseppe Esposito wrote:
> Regarding the config, as I said the idea is to gather multiple 
> subsystems' statistics, therefore there wouldn't be a single 
> configuration method like in netlink.
> For example in kvm there are file descriptors for configuration, and 
> creating them requires no privilege, contrary to the network interfaces.

Enumerating networking interfaces, addresses, and almost all of the
configuration requires no extra privilege. In fact I'd hope that
whatever daemon collects network stats doesn't run as root :)

I think enumerating objects is of primary importance, and statistics 
of those objects are subordinate.

Again, I have little KVM knowledge, but BPF also uses a fd-based API,
and carries stats over the same syscall interface.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 20:23     ` Jakub Kicinski
@ 2020-05-27 21:07       ` Paolo Bonzini
  2020-05-27 21:27         ` Jakub Kicinski
  2020-05-27 22:21         ` David Ahern
  0 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-05-27 21:07 UTC (permalink / raw)
  To: Jakub Kicinski, Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Jim Mattson, Alexander Viro,
	Emanuele Giuseppe Esposito, David Rientjes, Jonathan Adams,
	linux-doc, linux-kernel, linux-arm-kernel, linux-mips, kvm-ppc,
	linuxppc-dev, linux-s390, linux-fsdevel, netdev, Andrew Lunn

On 27/05/20 22:23, Jakub Kicinski wrote:
> On Wed, 27 May 2020 15:14:41 +0200 Emanuele Giuseppe Esposito wrote:
>> Regarding the config, as I said the idea is to gather multiple 
>> subsystems' statistics, therefore there wouldn't be a single 
>> configuration method like in netlink.
>> For example in kvm there are file descriptors for configuration, and 
>> creating them requires no privilege, contrary to the network interfaces.
>
> Enumerating networking interfaces, addresses, and almost all of the
> configuration requires no extra privilege. In fact I'd hope that
> whatever daemon collects network stats doesn't run as root :)
> 
> I think enumerating objects is of primary importance, and statistics 
> of those objects are subordinate.

I see what you meant now.  statsfs can also be used to enumerate objects
if one is so inclined (with the prototype in patch 7, for example, each
network interface becomes a directory).
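
Something along these lines (interface and counter names are only
illustrative; the exact layout depends on what the driver registers):

    /sys/kernel/stats/net/
        eth0/
            rx_bytes
            tx_bytes
        eth1/
            rx_bytes
            tx_bytes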

> Again, I have little KVM knowledge, but BPF also uses a fd-based API,
> and carries stats over the same syscall interface.

Can BPF stats (for BPF scripts created by whatever process is running in
the system) be collected by an external daemon that does not have access
to the file descriptor?  For KVM it's of secondary importance to gather
stats in the program; it can be nice to have and we are thinking of a
way to export the stats over the fd-based API, but it's less useful than
system-wide monitoring.  Perhaps this is a difference between the two.

Another case where stats and configuration are separate is CPUs, where
CPU enumeration is done in sysfs but statistics are exposed in various
procfs files such as /proc/interrupts and /proc/stat.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
                   ` (7 preceding siblings ...)
  2020-05-26 22:31 ` [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Jakub Kicinski
@ 2020-05-27 21:17 ` Andrew Lunn
  8 siblings, 0 replies; 25+ messages in thread
From: Andrew Lunn @ 2020-05-27 21:17 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito
  Cc: kvm, linux-s390, linux-doc, netdev, Emanuele Giuseppe Esposito,
	linux-kernel, kvm-ppc, Jonathan Adams, Christian Borntraeger,
	Alexander Viro, David Rientjes, linux-fsdevel, Paolo Bonzini,
	linux-mips, linuxppc-dev, linux-arm-kernel, Jim Mattson

On Tue, May 26, 2020 at 01:03:10PM +0200, Emanuele Giuseppe Esposito wrote:
> There is currently no common way for Linux kernel subsystems to expose
> statistics to userspace shared throughout the Linux kernel; subsystems have
> to take care of gathering and displaying statistics by themselves, for
> example in the form of files in debugfs. For example KVM has its own code
> section that takes care of this in virt/kvm/kvm_main.c, where it sets up
> debugfs handlers for displaying values and aggregating them from various
> subfolders to obtain information about the system state (i.e. displaying
> the total number of exits, calculated by summing all exits of all cpus of
> all running virtual machines).
> 
> Allowing each section of the kernel to do so has two disadvantages. First,
> it will introduce redundant code. Second, debugfs is anyway not the right
> place for statistics (for example it is affected by lockdown)
> 
> In this patch series I introduce statsfs, a synthetic ram-based virtual
> filesystem that takes care of gathering and displaying statistics for the
> Linux kernel subsystems.
> 
> The file system is mounted on /sys/kernel/stats and would be already used
> by kvm. Statsfs was initially introduced by Paolo Bonzini [1].
> 
> Statsfs offers a generic and stable API, allowing any kind of
> directory/file organization and supporting multiple kind of aggregations
> (not only sum, but also average, max, min and count_zero) and data types
> (boolean, unsigned/signed and custom types). The implementation, which is
> a generalization of KVM’s debugfs statistics code, takes care of gathering
> and displaying information at run time; users only need to specify the
> values to be included in each source.
> 
> Statsfs would also be a different mountpoint from debugfs, and would not
> suffer from limited access due to the security lock down patches. Its main
> function is to display each statistics as a file in the desired folder
> hierarchy defined through the API. Statsfs files can be read, and possibly
> cleared if their file mode allows it.
> 
> Statsfs has two main components: the public API defined by
> include/linux/statsfs.h, and the virtual file system which should end up in
> /sys/kernel/stats.
> 

Hi Emanuele

> The API has two main elements, values and sources. Kernel subsystems like
> KVM can use the API to create a source, add child sources/values/aggregates
> and register it to the root source (that on the virtual fs would be
> /sys/kernel/statsfs).

Another issue I see with networking is that statistic counters can be
dynamic. They can come and go. One of the drivers I work on has extra
statistics available when a fibre interface is used, compared to a
copper interface. And this happens at run time. The netlink API has no
problems with this. It is a snapshot of what counters are currently
available. There is no state in the API.

In my humble opinion, networking is unlikely to adopt your approach.
You probably want to look around for other subsystems which have
statistics, and see if you can cover their requirements, and get them
on board.

   Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 21:07       ` Paolo Bonzini
@ 2020-05-27 21:27         ` Jakub Kicinski
  2020-05-27 21:44           ` Paolo Bonzini
  2020-05-27 22:21         ` David Ahern
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2020-05-27 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Emanuele Giuseppe Esposito, kvm, Christian Borntraeger,
	Jim Mattson, Alexander Viro, Emanuele Giuseppe Esposito,
	David Rientjes, Jonathan Adams, linux-doc, linux-kernel,
	linux-arm-kernel, linux-mips, kvm-ppc, linuxppc-dev, linux-s390,
	linux-fsdevel, netdev, Andrew Lunn

On Wed, 27 May 2020 23:07:53 +0200 Paolo Bonzini wrote:
> > Again, I have little KVM knowledge, but BPF also uses a fd-based API,
> > and carries stats over the same syscall interface.  
> 
> Can BPF stats (for BPF scripts created by whatever process is running in
> the system) be collected by an external daemon that does not have access
> to the file descriptor?  For KVM it's of secondary importance to gather
> stats in the program; it can be nice to have and we are thinking of a
> way to export the stats over the fd-based API, but it's less useful than
> system-wide monitoring.  Perhaps this is a difference between the two.

Yes, check out bpftool prog list (bpftool code is under tools/bpf/ in
the kernel tree). BPF statistics are under a static key, so you may not
see any on your system. My system shows e.g.:

81: kprobe  name abc  tag cefaa9376bdaae75  gpl run_time_ns 80941 run_cnt 152
	loaded_at 2020-05-26T13:00:24-0700  uid 0
	xlated 512B  jited 307B  memlock 4096B  map_ids 66,64
	btf_id 16

In this example run_time_ns and run_cnt are stats.

The first number on the left is the program ID. BPF has an IDA, and
each object gets an integer id. So admin (or CAP_BPF, I think) can
iterate over the ids and open fds to objects of interest.
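
Roughly like this with libbpf (error handling trimmed; take it as a
sketch rather than a complete tool):

    #include <bpf/bpf.h>
    #include <linux/bpf.h>
    #include <stdio.h>
    #include <unistd.h>

    /* walk all loaded BPF programs and print their run stats */
    static void dump_bpf_prog_stats(void)
    {
            __u32 id = 0;

            while (!bpf_prog_get_next_id(id, &id)) {
                    struct bpf_prog_info info = {};
                    __u32 len = sizeof(info);
                    int fd = bpf_prog_get_fd_by_id(id); /* privileged */

                    if (fd < 0)
                            continue;
                    if (!bpf_obj_get_info_by_fd(fd, &info, &len))
                            printf("%u: %s run_time_ns %llu run_cnt %llu\n",
                                   id, info.name,
                                   (unsigned long long)info.run_time_ns,
                                   (unsigned long long)info.run_cnt);
                    close(fd);
            }
    }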

> Another case where stats and configuration are separate is CPUs, where
> CPU enumeration is done in sysfs but statistics are exposed in various
> procfs files such as /proc/interrupts and /proc/stat.

True, but I'm guessing everyone is just okay living with the legacy
procfs format there. Otherwise I'd guess the stats would have been added
to sysfs. I'd be curious to hear the full story there.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 21:27         ` Jakub Kicinski
@ 2020-05-27 21:44           ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-05-27 21:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Emanuele Giuseppe Esposito, kvm, Christian Borntraeger,
	Jim Mattson, Alexander Viro, Emanuele Giuseppe Esposito,
	David Rientjes, Jonathan Adams, linux-doc, linux-kernel,
	linux-arm-kernel, linux-mips, kvm-ppc, linuxppc-dev, linux-s390,
	linux-fsdevel, netdev, Andrew Lunn

On 27/05/20 23:27, Jakub Kicinski wrote:
> On Wed, 27 May 2020 23:07:53 +0200 Paolo Bonzini wrote:
>>> Again, I have little KVM knowledge, but BPF also uses a fd-based API,
>>> and carries stats over the same syscall interface.  
>>
>> Can BPF stats (for BPF scripts created by whatever process is running in
>> the system) be collected by an external daemon that does not have access
>> to the file descriptor?  For KVM it's of secondary importance to gather
>> stats in the program; it can be nice to have and we are thinking of a
>> way to export the stats over the fd-based API, but it's less useful than
>> system-wide monitoring.  Perhaps this is a difference between the two.
> 
> Yes, check out bpftool prog list (bpftool code is under tools/bpf/ in
> the kernel tree). BPF statistics are under a static key, so you may not
> see any on your system. My system shows e.g.:
> 
> 81: kprobe  name abc  tag cefaa9376bdaae75  gpl run_time_ns 80941 run_cnt 152
> 	loaded_at 2020-05-26T13:00:24-0700  uid 0
> 	xlated 512B  jited 307B  memlock 4096B  map_ids 66,64
> 	btf_id 16
> 
> In this example run_time_ns and run_cnt are stats.
> 
> The first number on the left is the program ID. BPF has an IDA, and
> each object gets an integer id. So admin (or CAP_BPF, I think) can
> iterate over the ids and open fds to objects of interest.

Got it, thanks.  But then "I'd hope that whatever daemon collects [BPF]
stats doesn't run as root". :)

>> Another case where stats and configuration are separate is CPUs, where
>> CPU enumeration is done in sysfs but statistics are exposed in various
>> procfs files such as /proc/interrupts and /proc/stat.
> 
> True, but I'm guessing everyone is just okay living with the legacy
> procfs format there. Otherwise I'd guess the stats would have been added
> to sysfs. I'd be curious to hear the full story there.

Yeah, it's a chicken-and-egg problem in that there's no good place in
sysfs to put statistics right now, which is part of what this filesystem
is trying to solve (the other part is the API).

You can read more about Google's use case at
http://lkml.iu.edu/hypermail/linux/kernel/2005.0/08056.html, it does
include both network and interrupt stats and it's something that they've
been using in production for quite some time.  We'd like the statsfs API
to be the basis for including something akin to that in Linux.

To be honest, it's unlikely that Emanuele (who has just finished his
internship at Red Hat) and I will pursue the networking stats further
than the demo patch at the end of this series. However, we're trying to
make sure that the API is at least ready for that, and to probe whether
any developers from other subsystems would be interested in using
statsfs.  So thanks for bringing your point of view!

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 21:07       ` Paolo Bonzini
  2020-05-27 21:27         ` Jakub Kicinski
@ 2020-05-27 22:21         ` David Ahern
  2020-05-28  5:22           ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: David Ahern @ 2020-05-27 22:21 UTC (permalink / raw)
  To: Paolo Bonzini, Jakub Kicinski, Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Jim Mattson, Alexander Viro,
	Emanuele Giuseppe Esposito, David Rientjes, Jonathan Adams,
	linux-doc, linux-kernel, linux-arm-kernel, linux-mips, kvm-ppc,
	linuxppc-dev, linux-s390, linux-fsdevel, netdev, Andrew Lunn

On 5/27/20 3:07 PM, Paolo Bonzini wrote:
> I see what you meant now.  statsfs can also be used to enumerate objects
> if one is so inclined (with the prototype in patch 7, for example, each
> network interface becomes a directory).

There are many use cases that have 100s to 1000s of network devices.
Having a sysfs entry per device already bloats memory usage for these
use cases; another filesystem with an entry per device makes that worse.
Really the wrong direction for large-scale systems.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics
  2020-05-27 22:21         ` David Ahern
@ 2020-05-28  5:22           ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2020-05-28  5:22 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Emanuele Giuseppe Esposito
  Cc: kvm, Christian Borntraeger, Jim Mattson, Alexander Viro,
	Emanuele Giuseppe Esposito, David Rientjes, Jonathan Adams,
	linux-doc, linux-kernel, linux-arm-kernel, linux-mips, kvm-ppc,
	linuxppc-dev, linux-s390, linux-fsdevel, netdev, Andrew Lunn

On 28/05/20 00:21, David Ahern wrote:
> On 5/27/20 3:07 PM, Paolo Bonzini wrote:
>> I see what you meant now.  statsfs can also be used to enumerate objects
>> if one is so inclined (with the prototype in patch 7, for example, each
>> network interface becomes a directory).
> 
> There are many use cases that have 100s to 1000s of network devices.
> Having a sysfs entry per device already bloats memory usage for these
> use cases; another filesystem with an entry per device makes that worse.
> Really the wrong direction for large-scale systems.

Hi David,

IMO the important part for now is having a flexible kernel API for
exposing statistics across multiple subsystems, so that they can be
harvested in an efficient way.  The userspace API is secondary, and
multiple APIs can be added to cater for different use cases.

For example, as of the first five patches the memory usage is the same
as what is now in the mainline kernel, since all the patchset does is
take existing debugfs inodes and move them to statsfs.  I agree that, if
the concept is extended to the whole kernel, scalability and memory
usage become an issue; and indeed, the long-term plan is to support a
binary format that is actually _more_ efficient than the status quo for
large scale systems.
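
Purely as an illustration of the direction (nothing like this exists in
the posted series, every name below is made up), a binary interface
could hand back a packed array of records plus a string table instead
of one inode and one open()/read() round trip per value:

    /* hypothetical record for a bulk statistics dump */
    struct stats_fs_bin_record {
            __u32 name_off;  /* offset of the value's name in a string table */
            __u32 flags;
            __u64 value;
    };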

In the meanwhile, the new filesystem can be disabled (see the difference
between "STATS_FS" and "STATS_FS_API") if it imposes undesirable overhead.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/7] documentation for stats_fs
  2020-05-26 11:03 ` [PATCH v3 2/7] documentation for stats_fs Emanuele Giuseppe Esposito
@ 2020-06-04  0:23   ` Randy Dunlap
  2020-06-04 15:34     ` Emanuele Giuseppe Esposito
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Dunlap @ 2020-06-04  0:23 UTC (permalink / raw)
  To: Emanuele Giuseppe Esposito, kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, Emanuele Giuseppe Esposito, David Rientjes,
	Jonathan Adams, linux-doc, linux-kernel, linux-arm-kernel,
	linux-mips, kvm-ppc, linuxppc-dev, linux-s390, linux-fsdevel,
	netdev

Hi--

Here are a few comments for you.


On 5/26/20 4:03 AM, Emanuele Giuseppe Esposito wrote:
> Html docs for a complete documentation of the stats_fs API,
> filesystem and usage.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  Documentation/filesystems/index.rst    |   1 +
>  Documentation/filesystems/stats_fs.rst | 222 +++++++++++++++++++++++++
>  2 files changed, 223 insertions(+)
>  create mode 100644 Documentation/filesystems/stats_fs.rst

> diff --git a/Documentation/filesystems/stats_fs.rst b/Documentation/filesystems/stats_fs.rst
> new file mode 100644
> index 000000000000..292c689ffb98
> --- /dev/null
> +++ b/Documentation/filesystems/stats_fs.rst
> @@ -0,0 +1,222 @@
> +========
> +Stats_FS
> +========
> +
> +Stats_fs is a synthetic ram-based virtual filesystem that takes care of

                           RAM-based

> +gathering and displaying statistics for the Linux kernel subsystems.
> +
> +The motivation for stats_fs comes from the fact that there is no common
> +way for Linux kernel subsystems to expose statistics to userspace shared
> +throughout the Linux kernel; subsystems have to take care of gathering and
> +displaying statistics by themselves, for example in the form of files in
> +debugfs.
> +
> +Allowing each subsystem of the kernel to do so has two disadvantages.
> +First, it will introduce redundant code. Second, debugfs is anyway not the
> +right place for statistics (for example it is affected by lockdown).
> +
> +Stats_fs offers a generic and stable API, allowing any kind of
> +directory/file organization and supporting multiple kind of aggregations

                                                       kinds

> +(not only sum, but also average, max, min and count_zero) and data types
> +(boolean, all unsigned/signed and custom types). The implementation takes
> +care of gathering and displaying information at run time; users only need
> +to specify the values to be included in each source. Optionally, users can
> +also provide a display function for each value, that will take care of
> +displaying the provided value in a custom format.
> +
> +Its main function is to display each statistics as a file in the desired

                                        statistic

> +folder hierarchy defined through the API. Stats_fs files can be read, and
> +possibly cleared if their file mode allows it.
> +
> +Stats_fs is typically mounted with a command like::
> +
> +    mount -t stats_fs stats_fs /sys/kernel/stats_fs
> +
> +(Or an equivalent /etc/fstab line).
> +
> +Stats_fs has two main components: the public API defined by
> +include/linux/stats_fs.h, and the virtual file system in
> +/sys/kernel/stats.
> +
> +The API has two main elements, values and sources. Kernel
> +subsystems will create a source, add child
> +sources/values/aggregates and register it to the root source (that on the
> +virtual fs would be /sys/kernel/stats).
> +
> +The stats_fs API is defined in ``<linux/stats_fs.h>``.
> +
> +    Sources
> +        Sources are created via ``stats_fs_source_create()``, and each
> +        source becomes a directory in the file system. Sources form a
> +        parent-child relationship; root sources are added to the file
> +        system via ``stats_fs_source_register()``. Therefore each Linux
> +        subsystem will add its own entry to the root, filesystem similar

                                                   root filesystem

> +        to what it is done in debugfs. Every other source is added to or
> +        removed from a parent through the
> +        ``stats_fs_source_add_subordinate()`` and
> +        ``stats_fs_source_remove_subordinate()`` APIs. Once a source is
> +        created and added to the tree (via add_subordinate), it will be
> +        used to compute aggregate values in the parent source. A source
> +        can optionally be hidden from the filesystem but still considered
> +        in the aggregation operations if the corresponding flag is set
> +        during initialization.
> +
> +    Values
> +        Values represent quantites that are gathered by the stats_fs user.

                            quantities

> +        Examples of values include the number of vm exits of a given kind,

                                                    VM

> +        the amount of memory used by some data structure, the length of
> +        the longest hash table chain, or anything like that. Values are
> +        defined with the stats_fs_source_add_values function. Each value
> +        is defined by a ``struct stats_fs_value``; the same
> +        ``stats_fs_value`` can be added to many different sources. A value
> +        can be considered "simple" if it fetches data from a user-provided
> +        location, or "aggregate" if it groups all values in the
> +        subordinate sources that include the same ``stats_fs_value``.
> +        Values by default are considered to be cumulative, meaning the
> +        value they represent never decreases, but can also be defined as
> +        floating if they exibith a different behavior. The main difference

                            exhibit

> +        between these two is reflected into the file permission, since a
> +        floating value file does not allow the user to clear it. Each
> +        value has a ``stats_fs_type`` pointer in order to allow the user
> +        to provide custom get and clear functions. The library, however,
> +        also exports default ``stats_fs_type`` structs for the standard
> +        types (all unsigned and signed types plus boolean). A value can
> +        also provide a show function that takes care of displaying the
> +        value in a custom string format. This can be especially useful
> +        when displaying enums.
> +
> +Because stats_fs is a different mountpoint than debugfs, it is not affected
> +by security lockdown.
> +
> +Using Stats_fs
> +================
> +
> +Define a value::
> +
> +        struct statistics{
> +                uint64_t exit;
> +                ...
> +        };
> +
> +        struct kvm {
> +                ...
> +                struct statistics stat;
> +        };
> +
> +        struct stats_fs_value kvm_stats[] = {
> +                { "exit_vm", offsetof(struct kvm, stat.exit), &stats_fs_type_u64,
> +                  STATS_FS_SUM },
> +                { NULL }
> +        };
> +
> +The same ``struct stats_fs_value`` is used for both simple and aggregate
> +values, though the type and offset are only used for simple values.
> +Aggregates merge all values that use the same ``struct stats_fs_value``.
> +
> +Create the parent source::
> +
> +        struct stats_fs_source parent_source = stats_fs_source_create(0, "parent");
> +
> +Register it (files and folders
> +will only be visible after this function is called)::
> +
> +        stats_fs_source_register(parent_source);
> +
> +Create and add a child::
> +
> +        struct stats_fs_source child_source = stats_fs_source_create(STATS_FS_HIDDEN, "child");
> +
> +        stats_fs_source_add_subordinate(parent_source, child_source);
> +
> +The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
> +block the creation of the files.

Why does HIDDEN block the creation of files?  instead of their visibility?

> +
> +Add values to parent and child (also here order doesn't matter)::
> +
> +        struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
> +        ...
> +        stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
> +        stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
> +
> +``child_source`` will be a simple value, since it has a non-NULL base
> +pointer, while ``parent_source`` will be an aggregate. During the adding
> +phase, also values can optionally be marked as hidden, so that the folder
> +and other values can be still shown.
> +
> +Of course the same ``struct stats_fs_value`` array can be also passed with a
> +different base pointer, to represent the same value but in another instance
> +of the kvm struct.
> +
> +Search:
> +
> +Fetch a value from the child source, returning the value
> +pointed by ``(uint64_t *) base_ptr + kvm_stats[0].offset``::
> +
> +        uint64_t ret_child, ret_parent;
> +
> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
> +
> +Fetch an aggregate value, searching all subsources of ``parent_source`` for
> +the specified ``struct stats_fs_value``::
> +
> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
> +
> +        assert(ret_child == ret_parent); // check expected result
> +
> +To make it more interesting, add another child::
> +
> +        struct stats_fs_source child_source2 = stats_fs_source_create(0, "child2");
> +
> +        stats_fs_source_add_subordinate(parent_source, child_source2);
> +        // now  the structure is parent -> child1
> +        //                              -> child2

Is that the same as                 parent -> child1 -> child2
?  It could almost be read as
                                    parent -> child1
                                    parent -> child2

Whichever it is, can you make it more explicit, please?


> +
> +        struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
> +        ...
> +        stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
> +
> +Note that other_base_ptr points to another instance of kvm, so the struct
> +stats_fs_value is the same but the address at which they point is not.
> +
> +Now get the aggregate value::
> +
> +        uint64_t ret_child, ret_child2, ret_parent;
> +
> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
> +        stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
> +
> +        assert((ret_child + ret_child2) == ret_parent);
> +
> +Cleanup::
> +
> +        stats_fs_source_remove_subordinate(parent_source, child_source);
> +        stats_fs_source_revoke(child_source);
> +        stats_fs_source_put(child_source);
> +
> +        stats_fs_source_remove_subordinate(parent_source, child_source2);
> +        stats_fs_source_revoke(child_source2);
> +        stats_fs_source_put(child_source2);
> +
> +        stats_fs_source_put(parent_source);
> +        kfree(other_base_ptr);
> +        kfree(base_ptr);
> +
> +Calling stats_fs_source_revoke is very important, because it will ensure

           stats_fs_source_revoke()

> +that stats_fs will not access the data that were passed to
> +stats_fs_source_add_value for this source.
> +
> +Because open files increase the reference count for a stats_fs_source, the
> +source can end up living longer than the data that provides the values for
> +the source.  Calling stats_fs_source_revoke just before the backing data

                        stats_fs_source_revoke()

> +is freed avoids accesses to freed data structures. The sources will return
> +0.
> +
> +This is not needed for the parent_source, since it just contains
> +aggregates that would be 0 anyways if no matching child value exist.
> +
> +API Documentation
> +=================
> +
> +.. kernel-doc:: include/linux/stats_fs.h
> +   :export: fs/stats_fs/*.c
> \ No newline at end of file

Please fix that. ^^^^^


Thanks for the documentation.

-- 
~Randy


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/7] documentation for stats_fs
  2020-06-04  0:23   ` Randy Dunlap
@ 2020-06-04 15:34     ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 25+ messages in thread
From: Emanuele Giuseppe Esposito @ 2020-06-04 15:34 UTC (permalink / raw)
  To: Randy Dunlap, Emanuele Giuseppe Esposito, kvm
  Cc: Christian Borntraeger, Paolo Bonzini, Jim Mattson,
	Alexander Viro, David Rientjes, Jonathan Adams, linux-doc,
	linux-kernel, linux-arm-kernel, linux-mips, kvm-ppc,
	linuxppc-dev, linux-s390, linux-fsdevel, netdev

Hi,

>> +
>> +The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
>> +block the creation of the files.
> 
> Why does HIDDEN block the creation of files?  instead of their visibility?

The file itself is what lets the user view the content of a value. In 
order to make a value hidden, the framework simply doesn't create the 
file; the underlying structure is still present and is still considered 
in statsfs.

"Hidden" in this case means not visible at all because the file is never 
created, not the hidden-file convention of dotted names (".filename").
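
In terms of the example in the documentation, something like this
(minimal sketch; it assumes stats_fs_source_create() hands back a
pointer to the new source):

    /* takes part in the parent's aggregation, but creates no directory
     * (and therefore no files) under /sys/kernel/stats */
    struct stats_fs_source *hidden;

    hidden = stats_fs_source_create(STATS_FS_HIDDEN, "hidden-child");
    stats_fs_source_add_subordinate(parent_source, hidden);
    stats_fs_source_add_values(hidden, kvm_stats, base_ptr, 0);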

> 
>> +
>> +Add values to parent and child (also here order doesn't matter)::
>> +
>> +        struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
>> +        ...
>> +        stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
>> +        stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
>> +
>> +``child_source`` will be a simple value, since it has a non-NULL base
>> +pointer, while ``parent_source`` will be an aggregate. During the adding
>> +phase, also values can optionally be marked as hidden, so that the folder
>> +and other values can be still shown.
>> +
>> +Of course the same ``struct stats_fs_value`` array can be also passed with a
>> +different base pointer, to represent the same value but in another instance
>> +of the kvm struct.
>> +
>> +Search:
>> +
>> +Fetch a value from the child source, returning the value
>> +pointed by ``(uint64_t *) base_ptr + kvm_stats[0].offset``::
>> +
>> +        uint64_t ret_child, ret_parent;
>> +
>> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
>> +
>> +Fetch an aggregate value, searching all subsources of ``parent_source`` for
>> +the specified ``struct stats_fs_value``::
>> +
>> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
>> +
>> +        assert(ret_child == ret_parent); // check expected result
>> +
>> +To make it more interesting, add another child::
>> +
>> +        struct stats_fs_source child_source2 = stats_fs_source_create(0, "child2");
>> +
>> +        stats_fs_source_add_subordinate(parent_source, child_source2);
>> +        // now  the structure is parent -> child1
>> +        //                              -> child2
> 
> Is that the same as                 parent -> child1 -> child2
> ?  It could almost be read as
>                                      parent -> child1
>                                      parent -> child2

No, the example in the documentation shows the relationship
parent -> child1 and
parent -> child2.
It's not the same as
parent -> child1 -> child2.
In order to do the latter, one would need to do:

stats_fs_source_add_subordinate(parent_source, child_source1);
stats_fs_source_add_subordinate(child_source1, child_source2);

Hope that this clarifies it.

> 
> Whichever it is, can you make it more explicit, please?
> 
> 
>> +
>> +        struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
>> +        ...
>> +        stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
>> +
>> +Note that other_base_ptr points to another instance of kvm, so the struct
>> +stats_fs_value is the same but the address at which they point is not.
>> +
>> +Now get the aggregate value::
>> +
>> +        uint64_t ret_child, ret_child2, ret_parent;
>> +
>> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
>> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
>> +        stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
>> +
>> +        assert((ret_child + ret_child2) == ret_parent);
>> +
>> +Cleanup::
>> +
>> +        stats_fs_source_remove_subordinate(parent_source, child_source);
>> +        stats_fs_source_revoke(child_source);
>> +        stats_fs_source_put(child_source);
>> +
>> +        stats_fs_source_remove_subordinate(parent_source, child_source2);
>> +        stats_fs_source_revoke(child_source2);
>> +        stats_fs_source_put(child_source2);
>> +
>> +        stats_fs_source_put(parent_source);
>> +        kfree(other_base_ptr);
>> +        kfree(base_ptr);
>> +
>> +Calling stats_fs_source_revoke is very important, because it will ensure
> 
>             stats_fs_source_revoke()
> 
>> +that stats_fs will not access the data that were passed to
>> +stats_fs_source_add_value for this source.
>> +
>> +Because open files increase the reference count for a stats_fs_source, the
>> +source can end up living longer than the data that provides the values for
>> +the source.  Calling stats_fs_source_revoke just before the backing data
> 
>                          stats_fs_source_revoke()
> 
>> +is freed avoids accesses to freed data structures. The sources will return
>> +0.
>> +
>> +This is not needed for the parent_source, since it just contains
>> +aggregates that would be 0 anyways if no matching child value exist.
>> +
>> +API Documentation
>> +=================
>> +
>> +.. kernel-doc:: include/linux/stats_fs.h
>> +   :export: fs/stats_fs/*.c
>> \ No newline at end of file
> 
> Please fix that. ^^^^^
> 
> 
> Thanks for the documentation.
> 

Thank you for the feedback,
Emanuele

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-06-04 15:34 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 2/7] documentation for stats_fs Emanuele Giuseppe Esposito
2020-06-04  0:23   ` Randy Dunlap
2020-06-04 15:34     ` Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 3/7] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
2020-05-27 10:05   ` Alan Maguire
2020-05-27 13:26     ` Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API Emanuele Giuseppe Esposito
2020-05-26 14:16   ` Andrew Lunn
2020-05-26 15:45     ` Emanuele Giuseppe Esposito
2020-05-26 22:31 ` [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Jakub Kicinski
2020-05-27 13:14   ` Emanuele Giuseppe Esposito
2020-05-27 13:33     ` Andrew Lunn
2020-05-27 15:00       ` Paolo Bonzini
2020-05-27 20:23     ` Jakub Kicinski
2020-05-27 21:07       ` Paolo Bonzini
2020-05-27 21:27         ` Jakub Kicinski
2020-05-27 21:44           ` Paolo Bonzini
2020-05-27 22:21         ` David Ahern
2020-05-28  5:22           ` Paolo Bonzini
2020-05-27 21:17 ` Andrew Lunn
