All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 00/10] tracing: introducing eventfs
@ 2023-07-13 11:33 Ajay Kaher
  2023-07-13 11:33 ` [PATCH v4 01/10] tracing: Require all trace events to have a TRACE_SYSTEM Ajay Kaher
                   ` (10 more replies)
  0 siblings, 11 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.

Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.

Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.

v3:
Patch 3,4,5,7,9:
         removed all the eventfs_rwsem code and replaced it with an srcu
         lock for the readers, and a mutex to synchronize the writers of
         the list.

Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10

Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
         as it really should not be exposed to all users.

Patch 5: added a recursion check to eventfs_remove_rec() as it is really
         dangerous to have unchecked recursion in the kernel (we do have
         a fixed size stack).

         have the free use srcu callbacks. After the srcu grace periods
         are done, it adds the eventfs_file onto a llist (lockless link
         list) and wakes up a work queue. Then the work queue does the
         freeing (this needs to be done in task/workqueue context, as
         srcu callbacks are done in softirq context).

Patch 6: renamed:
         eventfs_create_file() -> create_file()
         eventfs_create_dir() -> create_dir()

v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
          As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
          helper function to add files or directories to eventfs
          fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
          used eventfs_prepare_ef() to add files
          fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
          fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
          rebased because of v3 01/10
Patch 10: moved from v1 9/9

v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
         protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
         removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
         fix kprobe test in eventfs_root_lookup()
         protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
         called eventfs_remove_events_dir() instead of tracefs_remove()
         from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case

Ajay Kaher (9):
  tracefs: Rename some tracefs function
  eventfs: Implement eventfs dir creation functions
  eventfs: Implement eventfs file add functions
  eventfs: Implement eventfs file, directory remove function
  eventfs: Implement functions to create eventfs files and directories
  eventfs: Implement eventfs lookup, read, open functions
  eventfs: Implement tracefs_inode_cache
  eventfs: Move tracing/events to eventfs
  test: ftrace: Fix kprobe test for eventfs

Steven Rostedt (Google) (1):
  tracing: Require all trace events to have a TRACE_SYSTEM

 fs/tracefs/Makefile                           |   1 +
 fs/tracefs/event_inode.c                      | 711 ++++++++++++++++++
 fs/tracefs/inode.c                            | 124 ++-
 fs/tracefs/internal.h                         |  25 +
 include/linux/trace_events.h                  |   1 +
 include/linux/tracefs.h                       |  32 +
 kernel/trace/trace.h                          |   2 +-
 kernel/trace/trace_events.c                   |  78 +-
 .../ftrace/test.d/kprobe/kprobe_args_char.tc  |   4 +-
 .../test.d/kprobe/kprobe_args_string.tc       |   4 +-
 10 files changed, 930 insertions(+), 52 deletions(-)
 create mode 100644 fs/tracefs/event_inode.c
 create mode 100644 fs/tracefs/internal.h

-- 
2.39.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v4 01/10] tracing: Require all trace events to have a TRACE_SYSTEM
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-13 11:33 ` [PATCH v4 02/10] tracefs: Rename some tracefs function Ajay Kaher
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The creation of the trace event directory requires that a TRACE_SYSTEM is
defined that the trace event directory is added within the system it was
defined in.

The code handled the case where a TRACE_SYSTEM was not added, and would
then add the event at the events directory. But nothing should be doing
this. This code also prevents the implementation of creating dynamic
dentrys for the eventfs system.

As this path has never been hit on correct code, remove it. If it does get
hit, issues a WARN_ON_ONCE() and return ENODEV.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 kernel/trace/trace_events.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 57e539d47989..12ed71428939 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2426,14 +2426,15 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 
 	/*
 	 * If the trace point header did not define TRACE_SYSTEM
-	 * then the system would be called "TRACE_SYSTEM".
+	 * then the system would be called "TRACE_SYSTEM". This should
+	 * never happen.
 	 */
-	if (strcmp(call->class->system, TRACE_SYSTEM) != 0) {
-		d_events = event_subsystem_dir(tr, call->class->system, file, parent);
-		if (!d_events)
-			return -ENOMEM;
-	} else
-		d_events = parent;
+	if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0))
+		return -ENODEV;
+
+	d_events = event_subsystem_dir(tr, call->class->system, file, parent);
+	if (!d_events)
+		return -ENOMEM;
 
 	name = trace_event_name(call);
 	file->dir = tracefs_create_dir(name, d_events);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 02/10] tracefs: Rename some tracefs function
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
  2023-07-13 11:33 ` [PATCH v4 01/10] tracing: Require all trace events to have a TRACE_SYSTEM Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 15:11   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions Ajay Kaher
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Renaming following functions as these would require by eventfs
as well:

start_creating -> tracefs_start_creating
failed_creating -> tracefs_failed_creating
end_creating -> tracefs_end_creating

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/inode.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 57ac8aa4a724..b0348efc0238 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
 };
 MODULE_ALIAS_FS("tracefs");
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
+static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
 {
 	struct dentry *dentry;
 	int error;
@@ -437,7 +437,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
 	return dentry;
 }
 
-static struct dentry *failed_creating(struct dentry *dentry)
+static struct dentry *tracefs_failed_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	dput(dentry);
@@ -445,7 +445,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
 	return NULL;
 }
 
-static struct dentry *end_creating(struct dentry *dentry)
+static struct dentry *tracefs_end_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	return dentry;
@@ -490,14 +490,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	if (!(mode & S_IFMT))
 		mode |= S_IFREG;
 	BUG_ON(!S_ISREG(mode));
-	dentry = start_creating(name, parent);
+	dentry = tracefs_start_creating(name, parent);
 
 	if (IS_ERR(dentry))
 		return NULL;
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return failed_creating(dentry);
+		return tracefs_failed_creating(dentry);
 
 	inode->i_mode = mode;
 	inode->i_fop = fops ? fops : &tracefs_file_operations;
@@ -506,13 +506,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	inode->i_gid = d_inode(dentry->d_parent)->i_gid;
 	d_instantiate(dentry, inode);
 	fsnotify_create(d_inode(dentry->d_parent), dentry);
-	return end_creating(dentry);
+	return tracefs_end_creating(dentry);
 }
 
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
 				   const struct inode_operations *ops)
 {
-	struct dentry *dentry = start_creating(name, parent);
+	struct dentry *dentry = tracefs_start_creating(name, parent);
 	struct inode *inode;
 
 	if (IS_ERR(dentry))
@@ -520,7 +520,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return failed_creating(dentry);
+		return tracefs_failed_creating(dentry);
 
 	/* Do not set bits for OTH */
 	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP;
@@ -534,7 +534,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
 	d_instantiate(dentry, inode);
 	inc_nlink(d_inode(dentry->d_parent));
 	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
-	return end_creating(dentry);
+	return tracefs_end_creating(dentry);
 }
 
 /**
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
  2023-07-13 11:33 ` [PATCH v4 01/10] tracing: Require all trace events to have a TRACE_SYSTEM Ajay Kaher
  2023-07-13 11:33 ` [PATCH v4 02/10] tracefs: Rename some tracefs function Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 15:44   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 04/10] eventfs: Implement eventfs file add functions Ajay Kaher
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Adding eventfs_file structure which will hold properties of file or dir.

Adding following functions to add dir in eventfs:

eventfs_create_events_dir() will directly create events dir within
tracing folder.

eventfs_add_subsystem_dir() adds the information of subsystem_dir
to eventfs, and dynamically creates subsystem_dir directories when
they are accessed.

eventfs_add_dir() adds the information of the dir, within a
subsystem_dir, to eventfs and dynamically creates these directories
when they are accessed.

Adding tracefs_inode structure, this will help eventfs to keep track
of inode, flags and pointer to private date.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/Makefile      |   1 +
 fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/inode.c       |   8 +-
 fs/tracefs/internal.h    |  25 +++++
 include/linux/tracefs.h  |  11 ++
 5 files changed, 258 insertions(+), 4 deletions(-)
 create mode 100644 fs/tracefs/event_inode.c
 create mode 100644 fs/tracefs/internal.h

diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
index 7c35a282b484..73c56da8e284 100644
--- a/fs/tracefs/Makefile
+++ b/fs/tracefs/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 tracefs-objs	:= inode.o
+tracefs-objs	+= event_inode.o
 
 obj-$(CONFIG_TRACING)	+= tracefs.o
 
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
new file mode 100644
index 000000000000..4e7a8eccaa0b
--- /dev/null
+++ b/fs/tracefs/event_inode.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  event_inode.c - part of tracefs, a pseudo file system for activating tracing
+ *
+ *  Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) <rostedt@goodmis.org>
+ *  Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher <akaher@vmware.com>
+ *
+ *  eventfs is used to show trace events with one set of dentries
+ *
+ *  eventfs stores meta-data of files/dirs and skip to create object of
+ *  inodes/dentries. As and when requires, eventfs will create the
+ *  inodes/dentries for only required files/directories. Also eventfs
+ *  would delete the inodes/dentries once no more requires but preserve
+ *  the meta data.
+ */
+#include <linux/fsnotify.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/workqueue.h>
+#include <linux/security.h>
+#include <linux/tracefs.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include "internal.h"
+
+struct eventfs_inode {
+	struct list_head	e_top_files;
+};
+
+struct eventfs_file {
+	const char			*name;
+	struct dentry			*d_parent;
+	struct dentry			*dentry;
+	struct list_head		list;
+	struct eventfs_inode		*ei;
+	const struct file_operations	*fop;
+	const struct inode_operations	*iop;
+	union {
+		struct rcu_head		rcu;
+		struct llist_node	llist;  /* For freeing after RCU */
+	};
+	void				*data;
+	umode_t				mode;
+	bool				created;
+};
+
+static DEFINE_MUTEX(eventfs_mutex);
+
+static const struct file_operations eventfs_file_operations = {
+};
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+};
+
+/**
+ * eventfs_prepare_ef - helper function to prepare eventfs_file
+ * @name: the name of the file/directory to create.
+ * @mode: the permission that the file should have.
+ * @fop: struct file_operations that should be used for this file/directory.
+ * @iop: struct inode_operations that should be used for this file/directory.
+ * @data: something that the caller will want to get to later on. The
+ *        inode.i_private pointer will point to this value on the open() call.
+ *
+ * This function allocates and fills the eventfs_file structure.
+ */
+static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
+					const struct file_operations *fop,
+					const struct inode_operations *iop,
+					void *data)
+{
+	struct eventfs_file *ef;
+
+	ef = kzalloc(sizeof(*ef), GFP_KERNEL);
+	if (!ef)
+		return ERR_PTR(-ENOMEM);
+
+	ef->name = kstrdup(name, GFP_KERNEL);
+	if (!ef->name) {
+		kfree(ef);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (S_ISDIR(mode)) {
+		ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
+		if (!ef->ei) {
+			kfree(ef->name);
+			kfree(ef);
+			return ERR_PTR(-ENOMEM);
+		}
+		INIT_LIST_HEAD(&ef->ei->e_top_files);
+	} else {
+		ef->ei = NULL;
+	}
+
+	ef->iop = iop;
+	ef->fop = fop;
+	ef->mode = mode;
+	ef->data = data;
+	return ef;
+}
+
+/**
+ * eventfs_create_events_dir - create the trace event structure
+ * @name: the name of the directory to create.
+ * @parent: parent dentry for this file.  This should be a directory dentry
+ *          if set.  If this parameter is NULL, then the directory will be
+ *          created in the root of the tracefs filesystem.
+ *
+ * This function creates the top of the trace event directory.
+ */
+struct dentry *eventfs_create_events_dir(const char *name,
+					 struct dentry *parent)
+{
+	struct dentry *dentry = tracefs_start_creating(name, parent);
+	struct eventfs_inode *ei;
+	struct tracefs_inode *ti;
+	struct inode *inode;
+
+	if (IS_ERR(dentry))
+		return dentry;
+
+	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+	if (!ei)
+		return ERR_PTR(-ENOMEM);
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode)) {
+		kfree(ei);
+		tracefs_failed_creating(dentry);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	INIT_LIST_HEAD(&ei->e_top_files);
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+	ti->private = ei;
+
+	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+	inode->i_op = &eventfs_root_dir_inode_operations;
+	inode->i_fop = &eventfs_file_operations;
+
+	/* directory inodes start off with i_nlink == 2 (for "." entry) */
+	inc_nlink(inode);
+	d_instantiate(dentry, inode);
+	inc_nlink(dentry->d_parent->d_inode);
+	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+	return tracefs_end_creating(dentry);
+}
+
+/**
+ * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
+ * @name: the name of the file to create.
+ * @parent: parent dentry for this dir.
+ *
+ * This function adds eventfs subsystem dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+					       struct dentry *parent)
+{
+	struct tracefs_inode *ti_parent;
+	struct eventfs_inode *ei_parent;
+	struct eventfs_file *ef;
+
+	if (!parent)
+		return ERR_PTR(-EINVAL);
+
+	ti_parent = get_tracefs(parent->d_inode);
+	ei_parent = ti_parent->private;
+
+	ef = eventfs_prepare_ef(name,
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+		&eventfs_file_operations,
+		&eventfs_root_dir_inode_operations, NULL);
+
+	if (IS_ERR(ef))
+		return ef;
+
+	mutex_lock(&eventfs_mutex);
+	list_add_tail(&ef->list, &ei_parent->e_top_files);
+	ef->d_parent = parent;
+	mutex_unlock(&eventfs_mutex);
+	return ef;
+}
+
+/**
+ * eventfs_add_dir - add eventfs dir to list to create later
+ * @name: the name of the file to create.
+ * @ef_parent: parent eventfs_file for this dir.
+ *
+ * This function adds eventfs dir to list.
+ * And all these dirs are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+struct eventfs_file *eventfs_add_dir(const char *name,
+				     struct eventfs_file *ef_parent)
+{
+	struct eventfs_file *ef;
+
+	if (!ef_parent)
+		return ERR_PTR(-EINVAL);
+
+	ef = eventfs_prepare_ef(name,
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
+		&eventfs_file_operations,
+		&eventfs_root_dir_inode_operations, NULL);
+
+	if (IS_ERR(ef))
+		return ef;
+
+	mutex_lock(&eventfs_mutex);
+	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+	ef->d_parent = ef_parent->dentry;
+	mutex_unlock(&eventfs_mutex);
+	return ef;
+}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index b0348efc0238..7ef3a02766f5 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = {
 	.rmdir		= tracefs_syscall_rmdir,
 };
 
-static struct inode *tracefs_get_inode(struct super_block *sb)
+struct inode *tracefs_get_inode(struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
 	if (inode) {
@@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
 };
 MODULE_ALIAS_FS("tracefs");
 
-static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
 {
 	struct dentry *dentry;
 	int error;
@@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa
 	return dentry;
 }
 
-static struct dentry *tracefs_failed_creating(struct dentry *dentry)
+struct dentry *tracefs_failed_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	dput(dentry);
@@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry)
 	return NULL;
 }
 
-static struct dentry *tracefs_end_creating(struct dentry *dentry)
+struct dentry *tracefs_end_creating(struct dentry *dentry)
 {
 	inode_unlock(d_inode(dentry->d_parent));
 	return dentry;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
new file mode 100644
index 000000000000..c443a0c32a8c
--- /dev/null
+++ b/fs/tracefs/internal.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACEFS_INTERNAL_H
+#define _TRACEFS_INTERNAL_H
+
+enum {
+	TRACEFS_EVENT_INODE     = BIT(1),
+};
+
+struct tracefs_inode {
+	unsigned long           flags;
+	void                    *private;
+	struct inode            vfs_inode;
+};
+
+static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
+{
+	return container_of(inode, struct tracefs_inode, vfs_inode);
+}
+
+struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
+struct dentry *tracefs_end_creating(struct dentry *dentry);
+struct dentry *tracefs_failed_creating(struct dentry *dentry);
+struct inode *tracefs_get_inode(struct super_block *sb);
+
+#endif /* _TRACEFS_INTERNAL_H */
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 99912445974c..432e5e6f7901 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -21,6 +21,17 @@ struct file_operations;
 
 #ifdef CONFIG_TRACING
 
+struct eventfs_file;
+
+struct dentry *eventfs_create_events_dir(const char *name,
+					 struct dentry *parent);
+
+struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
+					       struct dentry *parent);
+
+struct eventfs_file *eventfs_add_dir(const char *name,
+				     struct eventfs_file *ef_parent);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 04/10] eventfs: Implement eventfs file add functions
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (2 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 16:23   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function Ajay Kaher
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Adding following function to eventfs to add files:

eventfs_add_top_file() adds the info of top file to
eventfs and dynamically create these files when they
are accessed.

eventfs_add_file() adds the info of nested files
to eventfs and dynamically create these files when
they are accessed.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/event_inode.c | 84 ++++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  8 ++++
 2 files changed, 92 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4e7a8eccaa0b..75dc8953d813 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -215,3 +215,87 @@ struct eventfs_file *eventfs_add_dir(const char *name,
 	mutex_unlock(&eventfs_mutex);
 	return ef;
 }
+
+/**
+ * eventfs_add_top_file - add event top file to list to create later
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: parent dentry for this file.
+ * @data: something that the caller will want to get to later on. The
+ *        inode.i_private pointer will point to this value on the open() call.
+ * @fop: struct file_operations that should be used for this file.
+ *
+ * This function adds top files of event dir to list.
+ * And all these files are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+int eventfs_add_top_file(const char *name, umode_t mode,
+			 struct dentry *parent, void *data,
+			 const struct file_operations *fop)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct eventfs_file *ef;
+
+	if (!parent)
+		return -EINVAL;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	if (!parent->d_inode)
+		return -EINVAL;
+
+	ti = get_tracefs(parent->d_inode);
+	if (!(ti->flags & TRACEFS_EVENT_INODE))
+		return -EINVAL;
+
+	ei = ti->private;
+	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+
+	if (IS_ERR(ef))
+		return -ENOMEM;
+
+	mutex_lock(&eventfs_mutex);
+	list_add_tail(&ef->list, &ei->e_top_files);
+	ef->d_parent = parent;
+	mutex_unlock(&eventfs_mutex);
+	return 0;
+}
+
+/**
+ * eventfs_add_file - add eventfs file to list to create later
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @ef_parent: parent eventfs_file for this file.
+ * @data: something that the caller will want to get to later on. The
+ *        inode.i_private pointer will point to this value on the open() call.
+ * @fop: struct file_operations that should be used for this file.
+ *
+ * This function adds top files of event dir to list.
+ * And all these files are created on the fly when they are looked up,
+ * and the dentry and inodes will be removed when they are done.
+ */
+int eventfs_add_file(const char *name, umode_t mode,
+		     struct eventfs_file *ef_parent,
+		     void *data,
+		     const struct file_operations *fop)
+{
+	struct eventfs_file *ef;
+
+	if (!ef_parent)
+		return -EINVAL;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
+	if (IS_ERR(ef))
+		return -ENOMEM;
+
+	mutex_lock(&eventfs_mutex);
+	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
+	ef->d_parent = ef_parent->dentry;
+	mutex_unlock(&eventfs_mutex);
+	return 0;
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 432e5e6f7901..a51312ff803c 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
 struct eventfs_file *eventfs_add_dir(const char *name,
 				     struct eventfs_file *ef_parent);
 
+int eventfs_add_file(const char *name, umode_t mode,
+		     struct eventfs_file *ef_parent, void *data,
+		     const struct file_operations *fops);
+
+int eventfs_add_top_file(const char *name, umode_t mode,
+			 struct dentry *parent, void *data,
+			 const struct file_operations *fops);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (3 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 04/10] eventfs: Implement eventfs file add functions Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 16:35   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories Ajay Kaher
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Adding eventfs_remove(), this function will recursively remove
dir or file info from eventfs.

added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).

have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/
---
 fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |   4 ++
 2 files changed, 114 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 75dc8953d813..322a77be5a56 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -45,6 +45,7 @@ struct eventfs_file {
 };
 
 static DEFINE_MUTEX(eventfs_mutex);
+DEFINE_STATIC_SRCU(eventfs_srcu);
 
 static const struct file_operations eventfs_file_operations = {
 };
@@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode,
 	mutex_unlock(&eventfs_mutex);
 	return 0;
 }
+
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+	struct eventfs_file *ef, *tmp;
+	struct llist_node *llnode;
+
+	llnode = llist_del_all(&free_list);
+	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
+		if (ef->created && ef->dentry)
+			dput(ef->dentry);
+		kfree(ef->name);
+		kfree(ef->ei);
+		kfree(ef);
+	}
+}
+
+DECLARE_WORK(eventfs_work, eventfs_workfn);
+
+static void free_ef(struct rcu_head *head)
+{
+	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
+
+	if (!llist_add(&ef->llist, &free_list))
+		return;
+
+	queue_work(system_unbound_wq, &eventfs_work);
+}
+
+
+
+/**
+ * eventfs_remove_rec - remove eventfs dir or file from list
+ * @ef: eventfs_file to be removed.
+ *
+ * This function recursively remove eventfs_file which
+ * contains info of file or dir.
+ */
+static void eventfs_remove_rec(struct eventfs_file *ef, int level)
+{
+	struct eventfs_file *ef_child;
+
+	if (!ef)
+		return;
+	/*
+	 * Check recursion depth. It should never be greater than 3:
+	 * 0 - events/
+	 * 1 - events/group/
+	 * 2 - events/group/event/
+	 * 3 - events/group/event/file
+	 */
+	if (WARN_ON_ONCE(level > 3))
+		return;
+
+	if (ef->ei) {
+		/* search for nested folders or files */
+		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
+					 lockdep_is_held(&eventfs_mutex)) {
+			eventfs_remove_rec(ef_child, level + 1);
+		}
+	}
+
+	if (ef->created && ef->dentry)
+		d_invalidate(ef->dentry);
+
+	list_del_rcu(&ef->list);
+	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+}
+
+/**
+ * eventfs_remove - remove eventfs dir or file from list
+ * @ef: eventfs_file to be removed.
+ *
+ * This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
+ */
+void eventfs_remove(struct eventfs_file *ef)
+{
+	if (!ef)
+		return;
+
+	mutex_lock(&eventfs_mutex);
+	eventfs_remove_rec(ef, 0);
+	mutex_unlock(&eventfs_mutex);
+}
+
+/**
+ * eventfs_remove_events_dir - remove eventfs dir or file from list
+ * @dentry: events's dentry to be removed.
+ *
+ * This function remove events main directory
+ */
+void eventfs_remove_events_dir(struct dentry *dentry)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+
+	if (!dentry || !dentry->d_inode)
+		return;
+
+	ti = get_tracefs(dentry->d_inode);
+	if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
+		return;
+
+	ei = ti->private;
+	d_invalidate(dentry);
+	dput(dentry);
+	kfree(ei);
+}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index a51312ff803c..2c08edd4a739 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode,
 			 struct dentry *parent, void *data,
 			 const struct file_operations *fops);
 
+void eventfs_remove(struct eventfs_file *ef);
+
+void eventfs_remove_events_dir(struct dentry *dentry);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (4 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 16:47   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions Ajay Kaher
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Adding create_file(), create_dir() to create file, dir
at runtime when they are accessed.

These function will be called either from lookup
of inode_operations or open of file_operations.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/inode.c       |  47 +++++++++++++++++
 include/linux/tracefs.h  |   7 +++
 3 files changed, 164 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 322a77be5a56..34b5d3d8005b 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -46,6 +46,116 @@ struct eventfs_file {
 
 static DEFINE_MUTEX(eventfs_mutex);
 DEFINE_STATIC_SRCU(eventfs_srcu);
+/**
+ * create_file - create a file in the tracefs filesystem
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: parent dentry for this file.
+ * @data: something that the caller will want to get to later on.
+ *        The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: struct file_operations that should be used for this file.
+ *
+ * This is the basic "create a file" function for tracefs.  It allows for a
+ * wide range of flexibility in creating a file.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the tracefs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, %NULL will be returned.
+ *
+ * If tracefs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *create_file(const char *name, umode_t mode,
+				  struct dentry *parent, void *data,
+				  const struct file_operations *fop)
+{
+	struct tracefs_inode *ti;
+	struct dentry *dentry;
+	struct inode *inode;
+
+	if (!(mode & S_IFMT))
+		mode |= S_IFREG;
+
+	if (WARN_ON_ONCE(!S_ISREG(mode)))
+		return NULL;
+
+	dentry = eventfs_start_creating(name, parent);
+
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode))
+		return eventfs_failed_creating(dentry);
+
+	inode->i_mode = mode;
+	inode->i_fop = fop;
+	inode->i_private = data;
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+	d_instantiate(dentry, inode);
+	fsnotify_create(dentry->d_parent->d_inode, dentry);
+	return eventfs_end_creating(dentry);
+}
+
+/**
+ * create_dir - create a dir in the tracefs filesystem
+ * @name: the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: parent dentry for this file.
+ * @data: something that the caller will want to get to later on.
+ *        The inode.i_private pointer will point to this value on
+ *        the open() call.
+ * @fop: struct file_operations that should be used for this dir.
+ * @iop: struct inode_operations that should be used for this dir.
+ *
+ * This is the basic "create a dir" function for eventfs.  It allows for a
+ * wide range of flexibility in creating a dir.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the tracefs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, %NULL will be returned.
+ *
+ * If tracefs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.
+ */
+struct dentry *create_dir(const char *name, umode_t mode,
+				 struct dentry *parent, void *data,
+				 const struct file_operations *fop,
+				 const struct inode_operations *iop)
+{
+	struct tracefs_inode *ti;
+	struct dentry *dentry;
+	struct inode *inode;
+
+	WARN_ON(!S_ISDIR(mode));
+
+	dentry = eventfs_start_creating(name, parent);
+	if (IS_ERR(dentry))
+		return dentry;
+
+	inode = tracefs_get_inode(dentry->d_sb);
+	if (unlikely(!inode))
+		return eventfs_failed_creating(dentry);
+
+	inode->i_mode = mode;
+	inode->i_op = iop;
+	inode->i_fop = fop;
+	inode->i_private = data;
+
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_EVENT_INODE;
+
+	inc_nlink(inode);
+	d_instantiate(dentry, inode);
+	inc_nlink(dentry->d_parent->d_inode);
+	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
+	return eventfs_end_creating(dentry);
+}
 
 static const struct file_operations eventfs_file_operations = {
 };
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 7ef3a02766f5..7dc692a5fee1 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -451,6 +451,53 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
 	return dentry;
 }
 
+struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
+{
+	struct dentry *dentry;
+	int error;
+
+	error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
+			      &tracefs_mount_count);
+	if (error)
+		return ERR_PTR(error);
+
+	/*
+	 * If the parent is not specified, we create it in the root.
+	 * We need the root dentry to do this, which is in the super
+	 * block. A pointer to that is in the struct vfsmount that we
+	 * have around.
+	 */
+	if (!parent)
+		parent = tracefs_mount->mnt_root;
+
+	if (unlikely(IS_DEADDIR(parent->d_inode)))
+		dentry = ERR_PTR(-ENOENT);
+	else
+		dentry = lookup_one_len(name, parent, strlen(name));
+
+	if (!IS_ERR(dentry) && dentry->d_inode) {
+		dput(dentry);
+		dentry = ERR_PTR(-EEXIST);
+	}
+
+	if (IS_ERR(dentry))
+		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+
+	return dentry;
+}
+
+struct dentry *eventfs_failed_creating(struct dentry *dentry)
+{
+	dput(dentry);
+	simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+	return NULL;
+}
+
+struct dentry *eventfs_end_creating(struct dentry *dentry)
+{
+	return dentry;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 2c08edd4a739..47c1b4d21735 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -23,6 +23,13 @@ struct file_operations;
 
 struct eventfs_file;
 
+struct dentry *eventfs_start_creating(const char *name,
+				      struct dentry *parent);
+
+struct dentry *eventfs_failed_creating(struct dentry *dentry);
+
+struct dentry *eventfs_end_creating(struct dentry *dentry);
+
 struct dentry *eventfs_create_events_dir(const char *name,
 					 struct dentry *parent);
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (5 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 20:17   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache Ajay Kaher
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Adding following inode_operations, file_operations and helper functions to
eventfs:
dcache_dir_open_wrapper()
eventfs_root_lookup()
eventfs_release()
eventfs_set_ef_status_free()
eventfs_post_create_dir()

inode_operations, file_operations will be called from vfs.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++-
 include/linux/tracefs.h  |   2 +
 2 files changed, 194 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 34b5d3d8005b..7167340ac29e 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu);
  * If tracefs is not enabled in the kernel, the value -%ENODEV will be
  * returned.
  */
-struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *create_file(const char *name, umode_t mode,
 				  struct dentry *parent, void *data,
 				  const struct file_operations *fop)
 {
@@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t mode,
  * If tracefs is not enabled in the kernel, the value -%ENODEV will be
  * returned.
  */
-struct dentry *create_dir(const char *name, umode_t mode,
+static struct dentry *create_dir(const char *name, umode_t mode,
 				 struct dentry *parent, void *data,
 				 const struct file_operations *fop,
 				 const struct inode_operations *iop)
@@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name, umode_t mode,
 	return eventfs_end_creating(dentry);
 }
 
+/**
+ * eventfs_set_ef_status_free - set the ef->status to free
+ * @dentry: dentry who's status to be freed
+ *
+ * eventfs_set_ef_status_free will be called if no more
+ * reference remains
+ */
+void eventfs_set_ef_status_free(struct dentry *dentry)
+{
+	struct tracefs_inode *ti_parent;
+	struct eventfs_file *ef;
+
+	ti_parent = get_tracefs(dentry->d_parent->d_inode);
+	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
+		return;
+
+	ef = dentry->d_fsdata;
+	if (!ef)
+		return;
+	ef->created = false;
+	ef->dentry = NULL;
+}
+
+/**
+ * eventfs_post_create_dir - post create dir routine
+ * @ef: eventfs_file of recently created dir
+ *
+ * Files with-in eventfs dir should know dentry of parent dir
+ */
+static void eventfs_post_create_dir(struct eventfs_file *ef)
+{
+	struct eventfs_file *ef_child;
+	struct tracefs_inode *ti;
+
+	/* srcu lock already held */
+	/* fill parent-child relation */
+	list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
+				 srcu_read_lock_held(&eventfs_srcu)) {
+		ef_child->d_parent = ef->dentry;
+	}
+
+	ti = get_tracefs(ef->dentry->d_inode);
+	ti->private = ef->ei;
+}
+
+/**
+ * eventfs_root_lookup - lookup routine to create file/dir
+ * @dir: directory in which lookup to be done
+ * @dentry: file/dir dentry
+ * @flags:
+ *
+ * Used to create dynamic file/dir with-in @dir, search with-in ei
+ * list, if @dentry found go ahead and create the file/dir
+ */
+
+static struct dentry *eventfs_root_lookup(struct inode *dir,
+					  struct dentry *dentry,
+					  unsigned int flags)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct eventfs_file *ef;
+	struct dentry *ret = NULL;
+	int idx;
+
+	ti = get_tracefs(dir);
+	if (!(ti->flags & TRACEFS_EVENT_INODE))
+		return NULL;
+
+	ei = ti->private;
+	idx = srcu_read_lock(&eventfs_srcu);
+	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
+				 srcu_read_lock_held(&eventfs_srcu)) {
+		if (strcmp(ef->name, dentry->d_name.name))
+			continue;
+		ret = simple_lookup(dir, dentry, flags);
+		if (ef->created)
+			continue;
+		mutex_lock(&eventfs_mutex);
+		ef->created = true;
+		if (ef->ei)
+			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
+						ef->data, ef->fop, ef->iop);
+		else
+			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
+						 ef->data, ef->fop);
+
+		if (IS_ERR_OR_NULL(ef->dentry)) {
+			ef->created = false;
+			mutex_unlock(&eventfs_mutex);
+		} else {
+			if (ef->ei)
+				eventfs_post_create_dir(ef);
+			ef->dentry->d_fsdata = ef;
+			mutex_unlock(&eventfs_mutex);
+			dput(ef->dentry);
+		}
+		break;
+	}
+	srcu_read_unlock(&eventfs_srcu, idx);
+	return ret;
+}
+
+/**
+ * eventfs_release - called to release eventfs file/dir
+ * @inode: inode to be released
+ * @file: file to be released (not used)
+ */
+static int eventfs_release(struct inode *inode, struct file *file)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct eventfs_file *ef;
+	int idx;
+
+	ti = get_tracefs(inode);
+	if (!(ti->flags & TRACEFS_EVENT_INODE))
+		return -EINVAL;
+
+	ei = ti->private;
+	idx = srcu_read_lock(&eventfs_srcu);
+	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
+				 srcu_read_lock_held(&eventfs_srcu)) {
+		if (ef->created)
+			dput(ef->dentry);
+	}
+	srcu_read_unlock(&eventfs_srcu, idx);
+	return dcache_dir_close(inode, file);
+}
+
+/**
+ * dcache_dir_open_wrapper - eventfs open wrapper
+ * @inode: not used
+ * @file: dir to be opened (to create it's child)
+ *
+ * Used to dynamic create file/dir with-in @file, all the
+ * file/dir will be created. If already created then reference
+ * will be increased
+ */
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
+{
+	struct tracefs_inode *ti;
+	struct eventfs_inode *ei;
+	struct eventfs_file *ef;
+	struct dentry *dentry = file_dentry(file);
+	struct inode *f_inode = file_inode(file);
+	int idx;
+
+	ti = get_tracefs(f_inode);
+	if (!(ti->flags & TRACEFS_EVENT_INODE))
+		return -EINVAL;
+
+	ei = ti->private;
+	idx = srcu_read_lock(&eventfs_srcu);
+	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
+		if (ef->created) {
+			dget(ef->dentry);
+			continue;
+		}
+		mutex_lock(&eventfs_mutex);
+		ef->created = true;
+
+		inode_lock(dentry->d_inode);
+		if (ef->ei)
+			ef->dentry = create_dir(ef->name, ef->mode, dentry,
+						ef->data, ef->fop, ef->iop);
+		else
+			ef->dentry = create_file(ef->name, ef->mode, dentry,
+						 ef->data, ef->fop);
+		inode_unlock(dentry->d_inode);
+
+		if (IS_ERR_OR_NULL(ef->dentry)) {
+			ef->created = false;
+		} else {
+			if (ef->ei)
+				eventfs_post_create_dir(ef);
+			ef->dentry->d_fsdata = ef;
+		}
+		mutex_unlock(&eventfs_mutex);
+	}
+	srcu_read_unlock(&eventfs_srcu, idx);
+	return dcache_dir_open(inode, file);
+}
+
 static const struct file_operations eventfs_file_operations = {
+	.open           = dcache_dir_open_wrapper,
+	.read		= generic_read_dir,
+	.iterate_shared	= dcache_readdir,
+	.llseek		= generic_file_llseek,
+	.release        = eventfs_release,
 };
 
 static const struct inode_operations eventfs_root_dir_inode_operations = {
+	.lookup		= eventfs_root_lookup,
 };
 
 /**
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 47c1b4d21735..4d30b0cafc5f 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef);
 
 void eventfs_remove_events_dir(struct dentry *dentry);
 
+void eventfs_set_ef_status_free(struct dentry *dentry);
+
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (6 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 20:27   ` Steven Rostedt
  2023-07-13 11:33 ` [PATCH v4 09/10] eventfs: Move tracing/events to eventfs Ajay Kaher
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Creating tracefs_inode_cache which is a cache of tracefs_inode.
Adding helping functions:
tracefs_alloc_inode()
tracefs_free_inode()

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 7dc692a5fee1..76820d3e97eb 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -21,13 +21,33 @@
 #include <linux/parser.h>
 #include <linux/magic.h>
 #include <linux/slab.h>
+#include "internal.h"
 
 #define TRACEFS_DEFAULT_MODE	0700
+static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
 
 static struct vfsmount *tracefs_mount;
 static int tracefs_mount_count;
 static bool tracefs_registered;
 
+static struct inode *tracefs_alloc_inode(struct super_block *sb)
+{
+	struct tracefs_inode *ti;
+
+	ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
+	if (!ti)
+		return NULL;
+
+	ti->flags = 0;
+
+	return &ti->vfs_inode;
+}
+
+static void tracefs_free_inode(struct inode *inode)
+{
+	kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+}
+
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
 {
@@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
 }
 
 static const struct super_operations tracefs_super_operations = {
+	.alloc_inode    = tracefs_alloc_inode,
+	.free_inode     = tracefs_free_inode,
+	.drop_inode     = generic_delete_inode,
 	.statfs		= simple_statfs,
 	.remount_fs	= tracefs_remount,
 	.show_options	= tracefs_show_options,
@@ -675,10 +698,26 @@ bool tracefs_initialized(void)
 	return tracefs_registered;
 }
 
+static void init_once(void *foo)
+{
+	struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+
+	inode_init_once(&ti->vfs_inode);
+}
+
 static int __init tracefs_init(void)
 {
 	int retval;
 
+	tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
+						 sizeof(struct tracefs_inode),
+						 0, (SLAB_RECLAIM_ACCOUNT|
+						     SLAB_MEM_SPREAD|
+						     SLAB_ACCOUNT),
+						 init_once);
+	if (!tracefs_inode_cachep)
+		return -ENOMEM;
+
 	retval = sysfs_create_mount_point(kernel_kobj, "tracing");
 	if (retval)
 		return -EINVAL;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (7 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14 21:06   ` Steven Rostedt
  2023-09-08 12:14   ` Sven Schnelle
  2023-07-13 11:33 ` [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs Ajay Kaher
  2023-07-14 22:58 ` [PATCH v4 00/10] tracing: introducing eventfs Steven Rostedt
  10 siblings, 2 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

Till now /sys/kernel/debug/tracing/events is a part of tracefs,
with-in this patch creating 'events' and it's sub-dir as eventfs.
Basically replacing tracefs calls with eventfs calls for 'events'.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/inode.c           | 18 ++++++++++
 include/linux/trace_events.h |  1 +
 kernel/trace/trace.h         |  2 +-
 kernel/trace/trace_events.c  | 67 +++++++++++++++++++-----------------
 4 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 76820d3e97eb..a098d7153498 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options	= tracefs_show_options,
 };
 
+static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	struct tracefs_inode *ti;
+
+	if (!dentry || !inode)
+		return;
+
+	ti = get_tracefs(inode);
+	if (ti && ti->flags & TRACEFS_EVENT_INODE)
+		eventfs_set_ef_status_free(dentry);
+	iput(inode);
+}
+
+static const struct dentry_operations tracefs_dentry_operations = {
+	.d_iput = tracefs_dentry_iput,
+};
+
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static const struct tree_descr trace_files[] = {{""}};
@@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
 		goto fail;
 
 	sb->s_op = &tracefs_super_operations;
+	sb->s_d_op = &tracefs_dentry_operations;
 
 	tracefs_apply_options(sb, false);
 
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 7c4a0b72334e..c6bb74cccea3 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -635,6 +635,7 @@ struct trace_event_file {
 	struct list_head		list;
 	struct trace_event_call		*event_call;
 	struct event_filter __rcu	*filter;
+	struct eventfs_file             *ef;
 	struct dentry			*dir;
 	struct trace_array		*tr;
 	struct trace_subsystem_dir	*system;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 79bdefe9261b..d9a0ec2b918e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1309,7 +1309,7 @@ struct trace_subsystem_dir {
 	struct list_head		list;
 	struct event_subsystem		*subsystem;
 	struct trace_array		*tr;
-	struct dentry			*entry;
+	struct eventfs_file             *ef;
 	int				ref_count;
 	int				nr_events;
 };
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 12ed71428939..4058178de682 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,7 +990,8 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
 		return;
 
 	if (!--dir->nr_events) {
-		tracefs_remove(dir->entry);
+		if (dir->ef)
+			eventfs_remove(dir->ef);
 		list_del(&dir->list);
 		__put_system_dir(dir);
 	}
@@ -1011,7 +1012,8 @@ static void remove_event_file_dir(struct trace_event_file *file)
 
 		tracefs_remove(dir);
 	}
-
+	if (file->ef)
+		eventfs_remove(file->ef);
 	list_del(&file->list);
 	remove_subsystem(file->system);
 	free_event_filter(file->filter);
@@ -2297,13 +2299,13 @@ create_new_subsystem(const char *name)
 	return NULL;
 }
 
-static struct dentry *
+static struct eventfs_file *
 event_subsystem_dir(struct trace_array *tr, const char *name,
 		    struct trace_event_file *file, struct dentry *parent)
 {
 	struct event_subsystem *system, *iter;
 	struct trace_subsystem_dir *dir;
-	struct dentry *entry;
+	int res;
 
 	/* First see if we did not already create this dir */
 	list_for_each_entry(dir, &tr->systems, list) {
@@ -2311,7 +2313,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
 		if (strcmp(system->name, name) == 0) {
 			dir->nr_events++;
 			file->system = dir;
-			return dir->entry;
+			return dir->ef;
 		}
 	}
 
@@ -2335,8 +2337,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
 	} else
 		__get_system(system);
 
-	dir->entry = tracefs_create_dir(name, parent);
-	if (!dir->entry) {
+	dir->ef = eventfs_add_subsystem_dir(name, parent);
+	if (IS_ERR(dir->ef)) {
 		pr_warn("Failed to create system directory %s\n", name);
 		__put_system(system);
 		goto out_free;
@@ -2351,22 +2353,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
 	/* the ftrace system is special, do not create enable or filter files */
 	if (strcmp(name, "ftrace") != 0) {
 
-		entry = tracefs_create_file("filter", TRACE_MODE_WRITE,
-					    dir->entry, dir,
+		res = eventfs_add_file("filter", TRACE_MODE_WRITE,
+					    dir->ef, dir,
 					    &ftrace_subsystem_filter_fops);
-		if (!entry) {
+		if (res) {
 			kfree(system->filter);
 			system->filter = NULL;
 			pr_warn("Could not create tracefs '%s/filter' entry\n", name);
 		}
 
-		trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir,
+		eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir,
 				  &ftrace_system_enable_fops);
 	}
 
 	list_add(&dir->list, &tr->systems);
 
-	return dir->entry;
+	return dir->ef;
 
  out_free:
 	kfree(dir);
@@ -2420,7 +2422,7 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 {
 	struct trace_event_call *call = file->event_call;
 	struct trace_array *tr = file->tr;
-	struct dentry *d_events;
+	struct eventfs_file *ef_subsystem = NULL;
 	const char *name;
 	int ret;
 
@@ -2432,24 +2434,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0))
 		return -ENODEV;
 
-	d_events = event_subsystem_dir(tr, call->class->system, file, parent);
-	if (!d_events)
+	ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent);
+	if (!ef_subsystem)
 		return -ENOMEM;
 
 	name = trace_event_name(call);
-	file->dir = tracefs_create_dir(name, d_events);
-	if (!file->dir) {
+	file->ef = eventfs_add_dir(name, ef_subsystem);
+	if (IS_ERR(file->ef)) {
 		pr_warn("Could not create tracefs '%s' directory\n", name);
 		return -1;
 	}
 
 	if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
-		trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file,
+		eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file,
 				  &ftrace_enable_fops);
 
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg)
-		trace_create_file("id", TRACE_MODE_READ, file->dir,
+		eventfs_add_file("id", TRACE_MODE_READ, file->ef,
 				  (void *)(long)call->event.type,
 				  &ftrace_event_id_fops);
 #endif
@@ -2465,27 +2467,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	 * triggers or filters.
 	 */
 	if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
-		trace_create_file("filter", TRACE_MODE_WRITE, file->dir,
+		eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef,
 				  file, &ftrace_event_filter_fops);
 
-		trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
+		eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef,
 				  file, &event_trigger_fops);
 	}
 
 #ifdef CONFIG_HIST_TRIGGERS
-	trace_create_file("hist", TRACE_MODE_READ, file->dir, file,
+	eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file,
 			  &event_hist_fops);
 #endif
 #ifdef CONFIG_HIST_TRIGGERS_DEBUG
-	trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file,
+	eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file,
 			  &event_hist_debug_fops);
 #endif
-	trace_create_file("format", TRACE_MODE_READ, file->dir, call,
+	eventfs_add_file("format", TRACE_MODE_READ, file->ef, call,
 			  &ftrace_event_format_fops);
 
 #ifdef CONFIG_TRACE_EVENT_INJECT
 	if (call->event.type && call->class->reg)
-		trace_create_file("inject", 0200, file->dir, file,
+		eventfs_add_file("inject", 0200, file->ef, file,
 				  &event_inject_fops);
 #endif
 
@@ -3638,21 +3640,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
 {
 	struct dentry *d_events;
 	struct dentry *entry;
+	int error = 0;
 
 	entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
 				  tr, &ftrace_set_event_fops);
 	if (!entry)
 		return -ENOMEM;
 
-	d_events = tracefs_create_dir("events", parent);
-	if (!d_events) {
+	d_events = eventfs_create_events_dir("events", parent);
+	if (IS_ERR(d_events)) {
 		pr_warn("Could not create tracefs 'events' directory\n");
 		return -ENOMEM;
 	}
 
-	entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events,
+	error = eventfs_add_top_file("enable", TRACE_MODE_WRITE, d_events,
 				  tr, &ftrace_tr_enable_fops);
-	if (!entry)
+	if (error)
 		return -ENOMEM;
 
 	/* There are not as crucial, just warn if they are not created */
@@ -3665,11 +3668,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
 			  &ftrace_set_event_notrace_pid_fops);
 
 	/* ring buffer internal formats */
-	trace_create_file("header_page", TRACE_MODE_READ, d_events,
+	eventfs_add_top_file("header_page", TRACE_MODE_READ, d_events,
 				  ring_buffer_print_page_header,
 				  &ftrace_show_header_fops);
 
-	trace_create_file("header_event", TRACE_MODE_READ, d_events,
+	eventfs_add_top_file("header_event", TRACE_MODE_READ, d_events,
 				  ring_buffer_print_entry_header,
 				  &ftrace_show_header_fops);
 
@@ -3757,7 +3760,7 @@ int event_trace_del_tracer(struct trace_array *tr)
 
 	down_write(&trace_event_sem);
 	__trace_remove_event_dirs(tr);
-	tracefs_remove(tr->event_dir);
+	eventfs_remove_events_dir(tr->event_dir);
 	up_write(&trace_event_sem);
 
 	tr->event_dir = NULL;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (8 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 09/10] eventfs: Move tracing/events to eventfs Ajay Kaher
@ 2023-07-13 11:33 ` Ajay Kaher
  2023-07-14  2:37   ` Steven Rostedt
  2023-07-14 22:58 ` [PATCH v4 00/10] tracing: introducing eventfs Steven Rostedt
  10 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-13 11:33 UTC (permalink / raw)
  To: rostedt, shuah, mhiramat
  Cc: chinglinyu, linux-kernel, linux-kselftest, linux-trace-kernel,
	lkp, namit, oe-lkp, akaher, amakhalov, er.ajay.kaher, srivatsa,
	tkundu, vsirnapalli

kprobe_args_char.tc, kprobe_args_string.tc has validation check
for tracefs_create_dir, for eventfs it should be eventfs_create_dir.

Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc        | 4 ++--
 .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc      | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
index 285b4770efad..523cfb64539f 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
@@ -34,14 +34,14 @@ mips*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
+echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t'" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
+echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a4f8e7c53c1f..b9f8c3f8bae8 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -37,14 +37,14 @@ loongarch*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\"" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs
  2023-07-13 11:33 ` [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs Ajay Kaher
@ 2023-07-14  2:37   ` Steven Rostedt
  2023-07-14 13:27     ` Masami Hiramatsu
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14  2:37 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli

On Thu, 13 Jul 2023 17:03:24 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> kprobe_args_char.tc, kprobe_args_string.tc has validation check
> for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

As this patch as is will break when running on older kernels, I was
wondering if we should do this instead?

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
index 285b4770efad..ff7499eb98d6 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
@@ -34,14 +34,19 @@ mips*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t'" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a4f8e7c53c1f..a202b2ea4baf 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -37,14 +37,19 @@ loongarch*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\"" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace

-- Steve

> ---
>  .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc        | 4 ++--
>  .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc      | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> index 285b4770efad..523cfb64539f 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> @@ -34,14 +34,14 @@ mips*)
>  esac
>  
>  : "Test get argument (1)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1='t'" trace
>  
>  echo 0 > events/kprobes/testprobe/enable
>  : "Test get argument (2)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> index a4f8e7c53c1f..b9f8c3f8bae8 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> @@ -37,14 +37,14 @@ loongarch*)
>  esac
>  
>  : "Test get argument (1)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1=\"test\"" trace
>  
>  echo 0 > events/kprobes/testprobe/enable
>  : "Test get argument (2)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs
  2023-07-14  2:37   ` Steven Rostedt
@ 2023-07-14 13:27     ` Masami Hiramatsu
  2023-07-17  5:24       ` Ajay Kaher
  0 siblings, 1 reply; 44+ messages in thread
From: Masami Hiramatsu @ 2023-07-14 13:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ajay Kaher, shuah, mhiramat, chinglinyu, linux-kernel,
	linux-kselftest, linux-trace-kernel, lkp, namit, oe-lkp,
	amakhalov, er.ajay.kaher, srivatsa, tkundu, vsirnapalli

On Thu, 13 Jul 2023 22:37:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 13 Jul 2023 17:03:24 +0530
> Ajay Kaher <akaher@vmware.com> wrote:
> 
> > kprobe_args_char.tc, kprobe_args_string.tc has validation check
> > for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
> > 
> > Signed-off-by: Ajay Kaher <akaher@vmware.com>
> > Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Tested-by: Ching-lin Yu <chinglinyu@google.com>
> > Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> As this patch as is will break when running on older kernels, I was
> wondering if we should do this instead?

+1 since the latest kselftest is used also for checking the older
stable kernels, the test case has to check the environment and
change the parameter (or make it unsupported for new feature)
So below looks good to me.

Thanks,

> 
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> index 285b4770efad..ff7499eb98d6 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> @@ -34,14 +34,19 @@ mips*)
>  esac
>  
>  : "Test get argument (1)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
> +if grep -q eventfs_add_dir available_filter_functions; then
> +  DIR_NAME="eventfs_add_dir"
> +else
> +  DIR_NAME="tracefs_create_dir"
> +fi
> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1='t'" trace
>  
>  echo 0 > events/kprobes/testprobe/enable
>  : "Test get argument (2)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> index a4f8e7c53c1f..a202b2ea4baf 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> @@ -37,14 +37,19 @@ loongarch*)
>  esac
>  
>  : "Test get argument (1)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
> +if grep -q eventfs_add_dir available_filter_functions; then
> +  DIR_NAME="eventfs_add_dir"
> +else
> +  DIR_NAME="tracefs_create_dir"
> +fi
> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1=\"test\"" trace
>  
>  echo 0 > events/kprobes/testprobe/enable
>  : "Test get argument (2)"
> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>  echo 1 > events/kprobes/testprobe/enable
>  echo "p:test $FUNCTION_FORK" >> kprobe_events
>  grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
> 
> -- Steve
> 
> > ---
> >  .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc        | 4 ++--
> >  .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc      | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> > index 285b4770efad..523cfb64539f 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
> > @@ -34,14 +34,14 @@ mips*)
> >  esac
> >  
> >  : "Test get argument (1)"
> > -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
> > +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> >  echo "p:test $FUNCTION_FORK" >> kprobe_events
> >  grep -qe "testprobe.* arg1='t'" trace
> >  
> >  echo 0 > events/kprobes/testprobe/enable
> >  : "Test get argument (2)"
> > -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
> > +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> >  echo "p:test $FUNCTION_FORK" >> kprobe_events
> >  grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
> > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > index a4f8e7c53c1f..b9f8c3f8bae8 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > @@ -37,14 +37,14 @@ loongarch*)
> >  esac
> >  
> >  : "Test get argument (1)"
> > -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
> > +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> >  echo "p:test $FUNCTION_FORK" >> kprobe_events
> >  grep -qe "testprobe.* arg1=\"test\"" trace
> >  
> >  echo 0 > events/kprobes/testprobe/enable
> >  : "Test get argument (2)"
> > -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
> > +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> >  echo "p:test $FUNCTION_FORK" >> kprobe_events
> >  grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 02/10] tracefs: Rename some tracefs function
  2023-07-13 11:33 ` [PATCH v4 02/10] tracefs: Rename some tracefs function Ajay Kaher
@ 2023-07-14 15:11   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 15:11 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


Some nits.

First, the subject should be:

 tracefs: Rename and export some tracefs functions

Yes, we should export them here too, even though they are not used yet.

On Thu, 13 Jul 2023 17:03:16 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Renaming following functions as these would require by eventfs
> as well:

I would have this state:

Export a few tracefs functions that will be needed by the eventfs dynamic
file system. Rename them to start with "tracefs_" to keep with the name
space.

> 
> start_creating -> tracefs_start_creating
> failed_creating -> tracefs_failed_creating
> end_creating -> tracefs_end_creating
> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/inode.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 57ac8aa4a724..b0348efc0238 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
>  };
>  MODULE_ALIAS_FS("tracefs");
>  
> -static struct dentry *start_creating(const char *name, struct dentry *parent)
> +static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)

Remove the static, and create the fs/tracefs/internal.h file in this patch.

-- Steve


>  {
>  	struct dentry *dentry;
>  	int error;
> @@ -437,7 +437,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
>  	return dentry;
>  }
>  
> -static struct dentry *failed_creating(struct dentry *dentry)
> +static struct dentry *tracefs_failed_creating(struct dentry *dentry)
>  {
>  	inode_unlock(d_inode(dentry->d_parent));
>  	dput(dentry);
> @@ -445,7 +445,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
>  	return NULL;
>  }
>  
> -static struct dentry *end_creating(struct dentry *dentry)
> +static struct dentry *tracefs_end_creating(struct dentry *dentry)
>  {
>  	inode_unlock(d_inode(dentry->d_parent));
>  	return dentry;
> @@ -490,14 +490,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  	if (!(mode & S_IFMT))
>  		mode |= S_IFREG;
>  	BUG_ON(!S_ISREG(mode));
> -	dentry = start_creating(name, parent);
> +	dentry = tracefs_start_creating(name, parent);
>  
>  	if (IS_ERR(dentry))
>  		return NULL;
>  
>  	inode = tracefs_get_inode(dentry->d_sb);
>  	if (unlikely(!inode))
> -		return failed_creating(dentry);
> +		return tracefs_failed_creating(dentry);
>  
>  	inode->i_mode = mode;
>  	inode->i_fop = fops ? fops : &tracefs_file_operations;
> @@ -506,13 +506,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  	inode->i_gid = d_inode(dentry->d_parent)->i_gid;
>  	d_instantiate(dentry, inode);
>  	fsnotify_create(d_inode(dentry->d_parent), dentry);
> -	return end_creating(dentry);
> +	return tracefs_end_creating(dentry);
>  }
>  
>  static struct dentry *__create_dir(const char *name, struct dentry *parent,
>  				   const struct inode_operations *ops)
>  {
> -	struct dentry *dentry = start_creating(name, parent);
> +	struct dentry *dentry = tracefs_start_creating(name, parent);
>  	struct inode *inode;
>  
>  	if (IS_ERR(dentry))
> @@ -520,7 +520,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
>  
>  	inode = tracefs_get_inode(dentry->d_sb);
>  	if (unlikely(!inode))
> -		return failed_creating(dentry);
> +		return tracefs_failed_creating(dentry);
>  
>  	/* Do not set bits for OTH */
>  	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP;
> @@ -534,7 +534,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
>  	d_instantiate(dentry, inode);
>  	inc_nlink(d_inode(dentry->d_parent));
>  	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> -	return end_creating(dentry);
> +	return tracefs_end_creating(dentry);
>  }
>  
>  /**


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions
  2023-07-13 11:33 ` [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions Ajay Kaher
@ 2023-07-14 15:44   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 15:44 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


Some more nits.

On Thu, 13 Jul 2023 17:03:17 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Adding eventfs_file structure which will hold properties of file or dir.

Add eventfs_file structure which will hold the properties of the eventfs
files and directories.

> 
> Adding following functions to add dir in eventfs:

Add following functions to create the directories in eventfs:

> 
> eventfs_create_events_dir() will directly create events dir within
> tracing folder.

eventfs_create_events_dir() will create the top level "events" directory
within the tracefs file system.

> 
> eventfs_add_subsystem_dir() adds the information of subsystem_dir
> to eventfs, and dynamically creates subsystem_dir directories when
> they are accessed.

eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the
given name of the subsystem.

> 
> eventfs_add_dir() adds the information of the dir, within a
> subsystem_dir, to eventfs and dynamically creates these directories
> when they are accessed.

eventfs_add_dir() creates an eventfs_file descriptor with the given name of
the directory and attached to a eventfs_file of a subsystem.

> 
> Adding tracefs_inode structure, this will help eventfs to keep track
> of inode, flags and pointer to private date.

Add tracefs_inode structure to hold the inodes, flags and pointers to
private data used by eventfs.

> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
> ---
>  fs/tracefs/Makefile      |   1 +
>  fs/tracefs/event_inode.c | 217 +++++++++++++++++++++++++++++++++++++++
>  fs/tracefs/inode.c       |   8 +-
>  fs/tracefs/internal.h    |  25 +++++
>  include/linux/tracefs.h  |  11 ++
>  5 files changed, 258 insertions(+), 4 deletions(-)
>  create mode 100644 fs/tracefs/event_inode.c
>  create mode 100644 fs/tracefs/internal.h
> 
> diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile
> index 7c35a282b484..73c56da8e284 100644
> --- a/fs/tracefs/Makefile
> +++ b/fs/tracefs/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  tracefs-objs	:= inode.o
> +tracefs-objs	+= event_inode.o
>  
>  obj-$(CONFIG_TRACING)	+= tracefs.o
>  
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> new file mode 100644
> index 000000000000..4e7a8eccaa0b
> --- /dev/null
> +++ b/fs/tracefs/event_inode.c
> @@ -0,0 +1,217 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  event_inode.c - part of tracefs, a pseudo file system for activating tracing
> + *
> + *  Copyright (C) 2020-22 VMware Inc, author: Steven Rostedt (VMware) <rostedt@goodmis.org>
> + *  Copyright (C) 2020-22 VMware Inc, author: Ajay Kaher <akaher@vmware.com>
> + *
> + *  eventfs is used to show trace events with one set of dentries

 eventfs is used to dynamically create inodes and dentries based on the
 meta data provided by the tracing system.


> + *
> + *  eventfs stores meta-data of files/dirs and skip to create object of
> + *  inodes/dentries. As and when requires, eventfs will create the
> + *  inodes/dentries for only required files/directories. Also eventfs
> + *  would delete the inodes/dentries once no more requires but preserve
> + *  the meta data.

 eventfs stores the meta-data of files/dirs and holds off on creating
 inodes/dentries of the files. When accessed, the eventfs will create the
 inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up
 and delete the inodes/dentries when they are no longer referenced.


> + */
> +#include <linux/fsnotify.h>
> +#include <linux/fs.h>
> +#include <linux/namei.h>
> +#include <linux/workqueue.h>
> +#include <linux/security.h>
> +#include <linux/tracefs.h>
> +#include <linux/kref.h>
> +#include <linux/delay.h>
> +#include "internal.h"
> +
> +struct eventfs_inode {
> +	struct list_head	e_top_files;
> +};
> +

We probably want to document the below structure, and only add 
fields as they are added by the patches.

> +struct eventfs_file {
> +	const char			*name;
> +	struct dentry			*d_parent;
> +	struct dentry			*dentry;

dentry is never assigned, although at the end in eventfs_add_dir() we have:

	ef->d_parent = ef_parent->dentry;

Both the above assignment and entry in the structure should be removed from
this patch. It makes it easier to review. Because that assignment is
meaningless. When the above line is added in later patches, it should have
meaning.

When breaking up patches, you don't just want to cut and paste in blocks of
functions. You want to build the functionality, so the added code makes
sense.

> +	struct list_head		list;
> +	struct eventfs_inode		*ei;
> +	const struct file_operations	*fop;
> +	const struct inode_operations	*iop;

> +	union {
> +		struct rcu_head		rcu;
> +		struct llist_node	llist;  /* For freeing after RCU */
> +	};

The above union should be added when the clean up code is added.

> +	void				*data;
> +	umode_t				mode;

> +	bool				created;

The "created" field should be added when its use case is added.

> +};
> +
> +static DEFINE_MUTEX(eventfs_mutex);
> +
> +static const struct file_operations eventfs_file_operations = {
> +};
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> +};
> +
> +/**
> + * eventfs_prepare_ef - helper function to prepare eventfs_file
> + * @name: the name of the file/directory to create.
> + * @mode: the permission that the file should have.
> + * @fop: struct file_operations that should be used for this file/directory.
> + * @iop: struct inode_operations that should be used for this file/directory.
> + * @data: something that the caller will want to get to later on. The
> + *        inode.i_private pointer will point to this value on the open() call.
> + *
> + * This function allocates and fills the eventfs_file structure.
> + */
> +static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
> +					const struct file_operations *fop,
> +					const struct inode_operations *iop,
> +					void *data)
> +{
> +	struct eventfs_file *ef;
> +
> +	ef = kzalloc(sizeof(*ef), GFP_KERNEL);
> +	if (!ef)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ef->name = kstrdup(name, GFP_KERNEL);
> +	if (!ef->name) {
> +		kfree(ef);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	if (S_ISDIR(mode)) {
> +		ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
> +		if (!ef->ei) {
> +			kfree(ef->name);
> +			kfree(ef);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +		INIT_LIST_HEAD(&ef->ei->e_top_files);
> +	} else {
> +		ef->ei = NULL;
> +	}
> +
> +	ef->iop = iop;
> +	ef->fop = fop;
> +	ef->mode = mode;
> +	ef->data = data;
> +	return ef;
> +}
> +
> +/**
> + * eventfs_create_events_dir - create the trace event structure
> + * @name: the name of the directory to create.
> + * @parent: parent dentry for this file.  This should be a directory dentry
> + *          if set.  If this parameter is NULL, then the directory will be
> + *          created in the root of the tracefs filesystem.
> + *
> + * This function creates the top of the trace event directory.
> + */
> +struct dentry *eventfs_create_events_dir(const char *name,
> +					 struct dentry *parent)
> +{
> +	struct dentry *dentry = tracefs_start_creating(name, parent);
> +	struct eventfs_inode *ei;
> +	struct tracefs_inode *ti;
> +	struct inode *inode;
> +
> +	if (IS_ERR(dentry))
> +		return dentry;
> +
> +	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
> +	if (!ei)
> +		return ERR_PTR(-ENOMEM);
> +	inode = tracefs_get_inode(dentry->d_sb);
> +	if (unlikely(!inode)) {
> +		kfree(ei);
> +		tracefs_failed_creating(dentry);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	INIT_LIST_HEAD(&ei->e_top_files);
> +
> +	ti = get_tracefs(inode);
> +	ti->flags |= TRACEFS_EVENT_INODE;
> +	ti->private = ei;
> +
> +	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> +	inode->i_op = &eventfs_root_dir_inode_operations;
> +	inode->i_fop = &eventfs_file_operations;
> +
> +	/* directory inodes start off with i_nlink == 2 (for "." entry) */
> +	inc_nlink(inode);
> +	d_instantiate(dentry, inode);
> +	inc_nlink(dentry->d_parent->d_inode);
> +	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
> +	return tracefs_end_creating(dentry);
> +}
> +
> +/**
> + * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
> + * @name: the name of the file to create.
> + * @parent: parent dentry for this dir.
> + *
> + * This function adds eventfs subsystem dir to list.
> + * And all these dirs are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.
> + */
> +struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> +					       struct dentry *parent)
> +{
> +	struct tracefs_inode *ti_parent;
> +	struct eventfs_inode *ei_parent;
> +	struct eventfs_file *ef;
> +
> +	if (!parent)
> +		return ERR_PTR(-EINVAL);
> +
> +	ti_parent = get_tracefs(parent->d_inode);
> +	ei_parent = ti_parent->private;
> +
> +	ef = eventfs_prepare_ef(name,
> +		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> +		&eventfs_file_operations,
> +		&eventfs_root_dir_inode_operations, NULL);
> +
> +	if (IS_ERR(ef))
> +		return ef;
> +
> +	mutex_lock(&eventfs_mutex);
> +	list_add_tail(&ef->list, &ei_parent->e_top_files);
> +	ef->d_parent = parent;
> +	mutex_unlock(&eventfs_mutex);
> +	return ef;
> +}
> +
> +/**
> + * eventfs_add_dir - add eventfs dir to list to create later
> + * @name: the name of the file to create.
> + * @ef_parent: parent eventfs_file for this dir.
> + *
> + * This function adds eventfs dir to list.
> + * And all these dirs are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.
> + */
> +struct eventfs_file *eventfs_add_dir(const char *name,
> +				     struct eventfs_file *ef_parent)
> +{
> +	struct eventfs_file *ef;
> +
> +	if (!ef_parent)
> +		return ERR_PTR(-EINVAL);
> +
> +	ef = eventfs_prepare_ef(name,
> +		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> +		&eventfs_file_operations,
> +		&eventfs_root_dir_inode_operations, NULL);
> +
> +	if (IS_ERR(ef))
> +		return ef;
> +
> +	mutex_lock(&eventfs_mutex);
> +	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);

> +	ef->d_parent = ef_parent->dentry;

As mentioned above. Let's not add the above assignment yet, as the
ef_parent->dentry will not be anything but NULL here.

-- Steve


> +	mutex_unlock(&eventfs_mutex);
> +	return ef;
> +}
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index b0348efc0238..7ef3a02766f5 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -127,7 +127,7 @@ static const struct inode_operations tracefs_dir_inode_operations = {
>  	.rmdir		= tracefs_syscall_rmdir,
>  };
>  
> -static struct inode *tracefs_get_inode(struct super_block *sb)
> +struct inode *tracefs_get_inode(struct super_block *sb)
>  {
>  	struct inode *inode = new_inode(sb);
>  	if (inode) {
> @@ -399,7 +399,7 @@ static struct file_system_type trace_fs_type = {
>  };
>  MODULE_ALIAS_FS("tracefs");
>  
> -static struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
> +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
>  {
>  	struct dentry *dentry;
>  	int error;
> @@ -437,7 +437,7 @@ static struct dentry *tracefs_start_creating(const char *name, struct dentry *pa
>  	return dentry;
>  }
>  
> -static struct dentry *tracefs_failed_creating(struct dentry *dentry)
> +struct dentry *tracefs_failed_creating(struct dentry *dentry)
>  {
>  	inode_unlock(d_inode(dentry->d_parent));
>  	dput(dentry);
> @@ -445,7 +445,7 @@ static struct dentry *tracefs_failed_creating(struct dentry *dentry)
>  	return NULL;
>  }
>  
> -static struct dentry *tracefs_end_creating(struct dentry *dentry)
> +struct dentry *tracefs_end_creating(struct dentry *dentry)
>  {
>  	inode_unlock(d_inode(dentry->d_parent));
>  	return dentry;
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> new file mode 100644
> index 000000000000..c443a0c32a8c
> --- /dev/null
> +++ b/fs/tracefs/internal.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _TRACEFS_INTERNAL_H
> +#define _TRACEFS_INTERNAL_H
> +
> +enum {
> +	TRACEFS_EVENT_INODE     = BIT(1),
> +};
> +
> +struct tracefs_inode {
> +	unsigned long           flags;
> +	void                    *private;
> +	struct inode            vfs_inode;
> +};
> +
> +static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
> +{
> +	return container_of(inode, struct tracefs_inode, vfs_inode);
> +}
> +
> +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
> +struct dentry *tracefs_end_creating(struct dentry *dentry);
> +struct dentry *tracefs_failed_creating(struct dentry *dentry);
> +struct inode *tracefs_get_inode(struct super_block *sb);
> +
> +#endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 99912445974c..432e5e6f7901 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -21,6 +21,17 @@ struct file_operations;
>  
>  #ifdef CONFIG_TRACING
>  
> +struct eventfs_file;
> +
> +struct dentry *eventfs_create_events_dir(const char *name,
> +					 struct dentry *parent);
> +
> +struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> +					       struct dentry *parent);
> +
> +struct eventfs_file *eventfs_add_dir(const char *name,
> +				     struct eventfs_file *ef_parent);
> +
>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 04/10] eventfs: Implement eventfs file add functions
  2023-07-13 11:33 ` [PATCH v4 04/10] eventfs: Implement eventfs file add functions Ajay Kaher
@ 2023-07-14 16:23   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 16:23 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


More nits.

On Thu, 13 Jul 2023 17:03:18 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Adding following function to eventfs to add files:

Add the following functions to add files to evenfs:

> 
> eventfs_add_top_file() adds the info of top file to
> eventfs and dynamically create these files when they
> are accessed.

I wonder if we should rename this, because I didn't really know what "top"
meant. Perhaps:

 eventfs_add_events_file() to add the data needed to create a specific file
 located at the top level events directory. The dentry/inode will be
 created when the events directory is scanned.

> 
> eventfs_add_file() adds the info of nested files
> to eventfs and dynamically create these files when
> they are accessed.

 eventfs_add_file() to add the data needed for files within the directories
 below the top level events directory. The dentry/inode of the file will be
 created when the directory that the file is in is scanned.

> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
> ---
>  fs/tracefs/event_inode.c | 84 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/tracefs.h  |  8 ++++
>  2 files changed, 92 insertions(+)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4e7a8eccaa0b..75dc8953d813 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -215,3 +215,87 @@ struct eventfs_file *eventfs_add_dir(const char *name,
>  	mutex_unlock(&eventfs_mutex);
>  	return ef;
>  }
> +
> +/**
> + * eventfs_add_top_file - add event top file to list to create later

I'm still thinking we should rename this to eventfs_add_events_file() to
match the "eventfs_create_events_dir()".

 eventfs_add_events_file - add the data needed to create file in events dir


> + * @name: the name of the file to create.
> + * @mode: the permission that the file should have.
> + * @parent: parent dentry for this file.
> + * @data: something that the caller will want to get to later on. The
> + *        inode.i_private pointer will point to this value on the open() call.

Note, for kerneldoc, it's best to have the above be one line, and then add
anything else below it.
 * @data: something that the caller will want to get to later on.

> + * @fop: struct file_operations that should be used for this file.
> + *
> + * This function adds top files of event dir to list.
> + * And all these files are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.

 * This function is used to add the information needed to create a
 * dentry/inode within the top level events directory. The file created
 * will have the @mode permissions. The @data will be used to fill the
 * inode.i_private when the open() call is done. The dentry and inodes are
 * all created when they are referenced, and removed when they are no
 * longer referenced.


> + */
> +int eventfs_add_top_file(const char *name, umode_t mode,
> +			 struct dentry *parent, void *data,
> +			 const struct file_operations *fop)
> +{
> +	struct tracefs_inode *ti;
> +	struct eventfs_inode *ei;
> +	struct eventfs_file *ef;
> +
> +	if (!parent)
> +		return -EINVAL;
> +
> +	if (!(mode & S_IFMT))
> +		mode |= S_IFREG;
> +
> +	if (!parent->d_inode)
> +		return -EINVAL;
> +
> +	ti = get_tracefs(parent->d_inode);
> +	if (!(ti->flags & TRACEFS_EVENT_INODE))
> +		return -EINVAL;
> +
> +	ei = ti->private;
> +	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
> +
> +	if (IS_ERR(ef))
> +		return -ENOMEM;
> +
> +	mutex_lock(&eventfs_mutex);
> +	list_add_tail(&ef->list, &ei->e_top_files);
> +	ef->d_parent = parent;
> +	mutex_unlock(&eventfs_mutex);
> +	return 0;
> +}
> +
> +/**
> + * eventfs_add_file - add eventfs file to list to create later

Same idea here.

 eventfs_add_file - add the data needed to create a file for later reference

> + * @name: the name of the file to create.
> + * @mode: the permission that the file should have.
> + * @ef_parent: parent eventfs_file for this file.
> + * @data: something that the caller will want to get to later on. The
> + *        inode.i_private pointer will point to this value on the open() call.

And same note about the line.

> + * @fop: struct file_operations that should be used for this file.
> + *
> + * This function adds top files of event dir to list.

Seems you have a cut and paste error here.

> + * And all these files are created on the fly when they are looked up,
> + * and the dentry and inodes will be removed when they are done.

 * This function is used to add the information needed to create a
 * file within a subdirectory of the events directory. The file created
 * will have the @mode permissions. The @data will be used to fill the
 * inode.i_private when the open() call is done. The dentry and inodes are
 * all created when they are referenced, and removed when they are no
 * longer referenced.

-- Steve

> + */
> +int eventfs_add_file(const char *name, umode_t mode,
> +		     struct eventfs_file *ef_parent,
> +		     void *data,
> +		     const struct file_operations *fop)
> +{
> +	struct eventfs_file *ef;
> +
> +	if (!ef_parent)
> +		return -EINVAL;
> +
> +	if (!(mode & S_IFMT))
> +		mode |= S_IFREG;
> +
> +	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
> +	if (IS_ERR(ef))
> +		return -ENOMEM;
> +
> +	mutex_lock(&eventfs_mutex);
> +	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
> +	ef->d_parent = ef_parent->dentry;
> +	mutex_unlock(&eventfs_mutex);
> +	return 0;
> +}
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 432e5e6f7901..a51312ff803c 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
>  struct eventfs_file *eventfs_add_dir(const char *name,
>  				     struct eventfs_file *ef_parent);
>  
> +int eventfs_add_file(const char *name, umode_t mode,
> +		     struct eventfs_file *ef_parent, void *data,
> +		     const struct file_operations *fops);
> +
> +int eventfs_add_top_file(const char *name, umode_t mode,
> +			 struct dentry *parent, void *data,
> +			 const struct file_operations *fops);
> +
>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function
  2023-07-13 11:33 ` [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function Ajay Kaher
@ 2023-07-14 16:35   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 16:35 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli

Some more nits:

Subject:

 eventfs: Implement removal of meta data from eventfs


On Thu, 13 Jul 2023 17:03:19 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Adding eventfs_remove(), this function will recursively remove
> dir or file info from eventfs.

 When events are removed from tracefs, the eventfs must be aware of this.
 The eventfs_remove() removes the meta data from eventfs so that it will no
 longer create the files associated with that event.

 When an instance is removed from tracefs, eventfs_remove_events_dir() will
 remove and clean up the entire "events" directory.


> 
> added a recursion check to eventfs_remove_rec() as it is really
> dangerous to have unchecked recursion in the kernel (we do have
> a fixed size stack).

The above doesn't need to be in the change log. It's an update from the
previous version. The eventfs_remove_rec() is added here, so the recursion
check was not.

> 
> have the free use srcu callbacks. After the srcu grace periods
> are done, it adds the eventfs_file onto a llist (lockless link
> list) and wakes up a work queue. Then the work queue does the
> freeing (this needs to be done in task/workqueue context, as
> srcu callbacks are done in softirq context).

If you want to document the above, we can say:

 The helper function eventfs_remove_rec() is used to clean up and free the
 associated data from eventfs for both of the added functions. SRCU is used
 to protect the lists of meta data stored in the eventfs. The eventfs_mutex
 is used to protect the content of the items in the list.

 As lookups may be happening as deletions of events are made, the freeing
 of of dentry/inodes and relative information is done after the SRCU grace
 period has passed. As the callback of SRCU is in a softirq context, a work
 queue is added to perform the cleanups in a task context.

 The struct evenfs_file is given a union of an rcu_head and a llist_node.
 The SRCU callback uses the rcu_head from this structure to insert it into
 the SRCU queue. When the SRCU grace periods are complete, the callback
 will then insert the eventfs_file struct onto a lockless llist using the
 llist_node of the structure. A union is used as this process is just a
 hand off from SRCU to workqueue, and only one field is necessary for this
 to work.

You can also add the above as a comment in the code (and keep it in the
change log as well).

-- Steve


> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/
> ---
>  fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
>  include/linux/tracefs.h  |   4 ++
>  2 files changed, 114 insertions(+)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 75dc8953d813..322a77be5a56 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -45,6 +45,7 @@ struct eventfs_file {
>  };
>  
>  static DEFINE_MUTEX(eventfs_mutex);
> +DEFINE_STATIC_SRCU(eventfs_srcu);
>  
>  static const struct file_operations eventfs_file_operations = {
>  };
> @@ -299,3 +300,112 @@ int eventfs_add_file(const char *name, umode_t mode,
>  	mutex_unlock(&eventfs_mutex);
>  	return 0;
>  }
> +
> +static LLIST_HEAD(free_list);
> +
> +static void eventfs_workfn(struct work_struct *work)
> +{
> +	struct eventfs_file *ef, *tmp;
> +	struct llist_node *llnode;
> +
> +	llnode = llist_del_all(&free_list);
> +	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> +		if (ef->created && ef->dentry)
> +			dput(ef->dentry);
> +		kfree(ef->name);
> +		kfree(ef->ei);
> +		kfree(ef);
> +	}
> +}
> +
> +DECLARE_WORK(eventfs_work, eventfs_workfn);
> +
> +static void free_ef(struct rcu_head *head)
> +{
> +	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
> +
> +	if (!llist_add(&ef->llist, &free_list))
> +		return;
> +
> +	queue_work(system_unbound_wq, &eventfs_work);
> +}
> +
> +
> +
> +/**
> + * eventfs_remove_rec - remove eventfs dir or file from list
> + * @ef: eventfs_file to be removed.
> + *
> + * This function recursively remove eventfs_file which
> + * contains info of file or dir.
> + */
> +static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +{
> +	struct eventfs_file *ef_child;
> +
> +	if (!ef)
> +		return;
> +	/*
> +	 * Check recursion depth. It should never be greater than 3:
> +	 * 0 - events/
> +	 * 1 - events/group/
> +	 * 2 - events/group/event/
> +	 * 3 - events/group/event/file
> +	 */
> +	if (WARN_ON_ONCE(level > 3))
> +		return;
> +
> +	if (ef->ei) {
> +		/* search for nested folders or files */
> +		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> +					 lockdep_is_held(&eventfs_mutex)) {
> +			eventfs_remove_rec(ef_child, level + 1);
> +		}
> +	}
> +
> +	if (ef->created && ef->dentry)
> +		d_invalidate(ef->dentry);
> +
> +	list_del_rcu(&ef->list);
> +	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +}
> +
> +/**
> + * eventfs_remove - remove eventfs dir or file from list
> + * @ef: eventfs_file to be removed.
> + *
> + * This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
> + */
> +void eventfs_remove(struct eventfs_file *ef)
> +{
> +	if (!ef)
> +		return;
> +
> +	mutex_lock(&eventfs_mutex);
> +	eventfs_remove_rec(ef, 0);
> +	mutex_unlock(&eventfs_mutex);
> +}
> +
> +/**
> + * eventfs_remove_events_dir - remove eventfs dir or file from list
> + * @dentry: events's dentry to be removed.
> + *
> + * This function remove events main directory
> + */
> +void eventfs_remove_events_dir(struct dentry *dentry)
> +{
> +	struct tracefs_inode *ti;
> +	struct eventfs_inode *ei;
> +
> +	if (!dentry || !dentry->d_inode)
> +		return;
> +
> +	ti = get_tracefs(dentry->d_inode);
> +	if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
> +		return;
> +
> +	ei = ti->private;
> +	d_invalidate(dentry);
> +	dput(dentry);
> +	kfree(ei);
> +}
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index a51312ff803c..2c08edd4a739 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -40,6 +40,10 @@ int eventfs_add_top_file(const char *name, umode_t mode,
>  			 struct dentry *parent, void *data,
>  			 const struct file_operations *fops);
>  
> +void eventfs_remove(struct eventfs_file *ef);
> +
> +void eventfs_remove_events_dir(struct dentry *dentry);
> +
>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories
  2023-07-13 11:33 ` [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories Ajay Kaher
@ 2023-07-14 16:47   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 16:47 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


Some more nits.

Subject: eventfs: Implement functions to create files and dirs when accessed

On Thu, 13 Jul 2023 17:03:20 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Adding create_file(), create_dir() to create file, dir
> at runtime when they are accessed.
> 
> These function will be called either from lookup
> of inode_operations or open of file_operations.

Add create_file() and create_dir() functions to create the files and
directories respectively when they are accessed. The functions will be
called from the lookup operation of the inode_operations or from the open
function of file_operations.


> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/event_inode.c | 110 +++++++++++++++++++++++++++++++++++++++
>  fs/tracefs/inode.c       |  47 +++++++++++++++++
>  include/linux/tracefs.h  |   7 +++
>  3 files changed, 164 insertions(+)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 322a77be5a56..34b5d3d8005b 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -46,6 +46,116 @@ struct eventfs_file {
>  
>  static DEFINE_MUTEX(eventfs_mutex);
>  DEFINE_STATIC_SRCU(eventfs_srcu);
> +/**
> + * create_file - create a file in the tracefs filesystem
> + * @name: the name of the file to create.
> + * @mode: the permission that the file should have.
> + * @parent: parent dentry for this file.
> + * @data: something that the caller will want to get to later on.
> + *        The inode.i_private pointer will point to this value on
> + *        the open() call.

Again, should move the extra lines into the description.

> + * @fop: struct file_operations that should be used for this file.
> + *
> + * This is the basic "create a file" function for tracefs.  It allows for a
> + * wide range of flexibility in creating a file.
> + *
> + * This function will return a pointer to a dentry if it succeeds.  This
> + * pointer must be passed to the tracefs_remove() function when the file is
> + * to be removed (no automatic cleanup happens if your module is unloaded,
> + * you are responsible here.)  If an error occurs, %NULL will be returned.
> + *
> + * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> + * returned.
> + */
> +struct dentry *create_file(const char *name, umode_t mode,
> +				  struct dentry *parent, void *data,
> +				  const struct file_operations *fop)
> +{
> +	struct tracefs_inode *ti;
> +	struct dentry *dentry;
> +	struct inode *inode;
> +
> +	if (!(mode & S_IFMT))
> +		mode |= S_IFREG;
> +
> +	if (WARN_ON_ONCE(!S_ISREG(mode)))
> +		return NULL;
> +
> +	dentry = eventfs_start_creating(name, parent);
> +
> +	if (IS_ERR(dentry))
> +		return dentry;
> +
> +	inode = tracefs_get_inode(dentry->d_sb);
> +	if (unlikely(!inode))
> +		return eventfs_failed_creating(dentry);
> +
> +	inode->i_mode = mode;
> +	inode->i_fop = fop;
> +	inode->i_private = data;
> +
> +	ti = get_tracefs(inode);
> +	ti->flags |= TRACEFS_EVENT_INODE;
> +	d_instantiate(dentry, inode);
> +	fsnotify_create(dentry->d_parent->d_inode, dentry);
> +	return eventfs_end_creating(dentry);
> +}
> +
> +/**
> + * create_dir - create a dir in the tracefs filesystem
> + * @name: the name of the file to create.
> + * @mode: the permission that the file should have.
> + * @parent: parent dentry for this file.
> + * @data: something that the caller will want to get to later on.
> + *        The inode.i_private pointer will point to this value on
> + *        the open() call.

Move the extra lines down into the description here too.

> + * @fop: struct file_operations that should be used for this dir.
> + * @iop: struct inode_operations that should be used for this dir.
> + *
> + * This is the basic "create a dir" function for eventfs.  It allows for a
> + * wide range of flexibility in creating a dir.
> + *
> + * This function will return a pointer to a dentry if it succeeds.  This
> + * pointer must be passed to the tracefs_remove() function when the file is
> + * to be removed (no automatic cleanup happens if your module is unloaded,
> + * you are responsible here.)  If an error occurs, %NULL will be returned.
> + *
> + * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> + * returned.
> + */
> +struct dentry *create_dir(const char *name, umode_t mode,
> +				 struct dentry *parent, void *data,
> +				 const struct file_operations *fop,
> +				 const struct inode_operations *iop)
> +{
> +	struct tracefs_inode *ti;
> +	struct dentry *dentry;
> +	struct inode *inode;
> +
> +	WARN_ON(!S_ISDIR(mode));
> +
> +	dentry = eventfs_start_creating(name, parent);
> +	if (IS_ERR(dentry))
> +		return dentry;
> +
> +	inode = tracefs_get_inode(dentry->d_sb);
> +	if (unlikely(!inode))
> +		return eventfs_failed_creating(dentry);
> +
> +	inode->i_mode = mode;
> +	inode->i_op = iop;
> +	inode->i_fop = fop;
> +	inode->i_private = data;
> +
> +	ti = get_tracefs(inode);
> +	ti->flags |= TRACEFS_EVENT_INODE;
> +
> +	inc_nlink(inode);
> +	d_instantiate(dentry, inode);
> +	inc_nlink(dentry->d_parent->d_inode);
> +	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
> +	return eventfs_end_creating(dentry);
> +}
>  
>  static const struct file_operations eventfs_file_operations = {
>  };
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 7ef3a02766f5..7dc692a5fee1 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -451,6 +451,53 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
>  	return dentry;
>  }
>  

This needs to be documented. What about:

/**
 * eventfs_start_creating - start the process of creating a dentry
 * @name: Name of the file created for the dentry
 * @parent: The parent dentry where this dentry will be created
 *
 * This is a simple helper function for the dynamically created eventfs
 * files. When the directory of the eventfs files are accessed, their
 * dentries are created on the fly. This function is used to start that
 * process.
 */
> +struct dentry *eventfs_start_creating(const char *name, struct dentry
>    *parent) +{
> +	struct dentry *dentry;
> +	int error;
> +
> +	error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
> +			      &tracefs_mount_count);
> +	if (error)
> +		return ERR_PTR(error);
> +
> +	/*
> +	 * If the parent is not specified, we create it in the root.
> +	 * We need the root dentry to do this, which is in the super
> +	 * block. A pointer to that is in the struct vfsmount that we
> +	 * have around.
> +	 */
> +	if (!parent)
> +		parent = tracefs_mount->mnt_root;
> +
> +	if (unlikely(IS_DEADDIR(parent->d_inode)))
> +		dentry = ERR_PTR(-ENOENT);
> +	else
> +		dentry = lookup_one_len(name, parent, strlen(name));
> +
> +	if (!IS_ERR(dentry) && dentry->d_inode) {
> +		dput(dentry);
> +		dentry = ERR_PTR(-EEXIST);
> +	}
> +
> +	if (IS_ERR(dentry))
> +		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
> +
> +	return dentry;
> +}
> +

/**
 * eventfs_failed_creating - clean up a failed eventfs dentry creation
 * @dentry: The dentry to clean up
 *
 * If after calling eventfs_start_creating(), a failure is detected, the
 * resources created by eventfs_start_creating() needs to be cleaned up. In
 * that case, this function should be called to perform that clean up.
 */
> +struct dentry *eventfs_failed_creating(struct dentry *dentry)
> +{
> +	dput(dentry);
> +	simple_release_fs(&tracefs_mount, &tracefs_mount_count);
> +	return NULL;
> +}
> +

/**
 * eventfs_end_creating - Finish the process of creating a eventfs dentry
 * @dentry: The dentry that has successfully been created.
 *
 * This function is currently just a place holder to match
 * eventfs_start_creating(). In case any synchronization needs to be added,
 * this function will be used to implement that without having to modify
 * the callers of eventfs_start_creating().
 */
> +struct dentry *eventfs_end_creating(struct dentry *dentry)
> +{
> +	return dentry;
> +}

-- Steve

> +
>  /**
>   * tracefs_create_file - create a file in the tracefs filesystem
>   * @name: a pointer to a string containing the name of the file to
>    create. diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 2c08edd4a739..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -23,6 +23,13 @@ struct file_operations;
>  
>  struct eventfs_file;
>  
> +struct dentry *eventfs_start_creating(const char *name,
> +				      struct dentry *parent);
> +
> +struct dentry *eventfs_failed_creating(struct dentry *dentry);
> +
> +struct dentry *eventfs_end_creating(struct dentry *dentry);
> +
>  struct dentry *eventfs_create_events_dir(const char *name,
>  					 struct dentry *parent);
>  


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions
  2023-07-13 11:33 ` [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions Ajay Kaher
@ 2023-07-14 20:17   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 20:17 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


Some more nits.

On Thu, 13 Jul 2023 17:03:21 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Adding following inode_operations, file_operations and helper functions to
> eventfs:

 Add the inode_operations, file_operations, and helper functions to eventfs:

> dcache_dir_open_wrapper()
> eventfs_root_lookup()
> eventfs_release()
> eventfs_set_ef_status_free()
> eventfs_post_create_dir()
> 
> inode_operations, file_operations will be called from vfs.

The inode_operations and file_operations functions will be called from the
VFS layer.

> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/event_inode.c | 194 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/tracefs.h  |   2 +
>  2 files changed, 194 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 34b5d3d8005b..7167340ac29e 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -67,7 +67,7 @@ DEFINE_STATIC_SRCU(eventfs_srcu);
>   * If tracefs is not enabled in the kernel, the value -%ENODEV will be
>   * returned.
>   */


> -struct dentry *create_file(const char *name, umode_t mode,
> +static struct dentry *create_file(const char *name, umode_t mode,
>  				  struct dentry *parent, void *data,
>  				  const struct file_operations *fop)
>  {
> @@ -123,7 +123,7 @@ struct dentry *create_file(const char *name, umode_t
> mode,
>   * If tracefs is not enabled in the kernel, the value -%ENODEV will be
>   * returned.
>   */
> -struct dentry *create_dir(const char *name, umode_t mode,
> +static struct dentry *create_dir(const char *name, umode_t mode,
>  				 struct dentry *parent, void *data,
>  				 const struct file_operations *fop,
>  				 const struct inode_operations *iop)

Please fold these changes into the patches that added these functions. They
should have been created as static functions.

If you are worried about the warnings that they caused by being static and
not used, perhaps these patches should be reordered, and we add this patch
first, with those functions as stubs.

That is, build the VFS functions first with just place holders for the
internals, and then fill in the internals. That would actually make more
sense in reviewing the code.

> @@ -157,10 +157,200 @@ struct dentry *create_dir(const char *name,
> umode_t mode, return eventfs_end_creating(dentry);
>  }
>  
> +/**
> + * eventfs_set_ef_status_free - set the ef->status to free
> + * @dentry: dentry who's status to be freed
> + *
> + * eventfs_set_ef_status_free will be called if no more
> + * reference remains

 "references remain"

> + */
> +void eventfs_set_ef_status_free(struct dentry *dentry)
> +{
> +	struct tracefs_inode *ti_parent;
> +	struct eventfs_file *ef;
> +
> +	ti_parent = get_tracefs(dentry->d_parent->d_inode);
> +	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> +		return;
> +
> +	ef = dentry->d_fsdata;
> +	if (!ef)
> +		return;
> +	ef->created = false;
> +	ef->dentry = NULL;
> +}
> +
> +/**
> + * eventfs_post_create_dir - post create dir routine
> + * @ef: eventfs_file of recently created dir
> + *
> + * Files with-in eventfs dir should know dentry of parent dir

 * Map the meta-data of files within an eventfs dir to their parent dentry.

> + */
> +static void eventfs_post_create_dir(struct eventfs_file *ef)
> +{
> +	struct eventfs_file *ef_child;
> +	struct tracefs_inode *ti;
> +
> +	/* srcu lock already held */
> +	/* fill parent-child relation */
> +	list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> +				 srcu_read_lock_held(&eventfs_srcu)) {
> +		ef_child->d_parent = ef->dentry;
> +	}
> +
> +	ti = get_tracefs(ef->dentry->d_inode);
> +	ti->private = ef->ei;
> +}
> +
> +/**
> + * eventfs_root_lookup - lookup routine to create file/dir
> + * @dir: directory in which lookup to be done

  "in which a lookup is being done"

> + * @dentry: file/dir dentry
> + * @flags:

  ?

 "unused" or "to pass as flags parameter to simple lookup"?

> + *
> + * Used to create dynamic file/dir with-in @dir, search with-in ei
> + * list, if @dentry found go ahead and create the file/dir

"within" is a real word ;-)

 * Used to create a dynamic file/dir within @dir. Use the eventfs_inode
 * list of meta data to find the information needed to create the file/dir.

> + */
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> +					  struct dentry *dentry,
> +					  unsigned int flags)
> +{
> +	struct tracefs_inode *ti;
> +	struct eventfs_inode *ei;
> +	struct eventfs_file *ef;
> +	struct dentry *ret = NULL;
> +	int idx;
> +
> +	ti = get_tracefs(dir);
> +	if (!(ti->flags & TRACEFS_EVENT_INODE))
> +		return NULL;
> +
> +	ei = ti->private;
> +	idx = srcu_read_lock(&eventfs_srcu);
> +	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> +				 srcu_read_lock_held(&eventfs_srcu)) {
> +		if (strcmp(ef->name, dentry->d_name.name))
> +			continue;
> +		ret = simple_lookup(dir, dentry, flags);
> +		if (ef->created)
> +			continue;
> +		mutex_lock(&eventfs_mutex);
> +		ef->created = true;
> +		if (ef->ei)
> +			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> +						ef->data, ef->fop, ef->iop);
> +		else
> +			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> +						 ef->data, ef->fop);
> +
> +		if (IS_ERR_OR_NULL(ef->dentry)) {
> +			ef->created = false;
> +			mutex_unlock(&eventfs_mutex);
> +		} else {
> +			if (ef->ei)
> +				eventfs_post_create_dir(ef);
> +			ef->dentry->d_fsdata = ef;
> +			mutex_unlock(&eventfs_mutex);
> +			dput(ef->dentry);
> +		}
> +		break;
> +	}
> +	srcu_read_unlock(&eventfs_srcu, idx);
> +	return ret;
> +}
> +
> +/**
> + * eventfs_release - called to release eventfs file/dir
> + * @inode: inode to be released
> + * @file: file to be released (not used)
> + */
> +static int eventfs_release(struct inode *inode, struct file *file)
> +{
> +	struct tracefs_inode *ti;
> +	struct eventfs_inode *ei;
> +	struct eventfs_file *ef;
> +	int idx;
> +
> +	ti = get_tracefs(inode);
> +	if (!(ti->flags & TRACEFS_EVENT_INODE))
> +		return -EINVAL;
> +
> +	ei = ti->private;
> +	idx = srcu_read_lock(&eventfs_srcu);
> +	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> +				 srcu_read_lock_held(&eventfs_srcu)) {
> +		if (ef->created)
> +			dput(ef->dentry);
> +	}
> +	srcu_read_unlock(&eventfs_srcu, idx);
> +	return dcache_dir_close(inode, file);
> +}
> +
> +/**
> + * dcache_dir_open_wrapper - eventfs open wrapper
> + * @inode: not used
> + * @file: dir to be opened (to create it's child)

  "its child"

> + *
> + * Used to dynamic create file/dir with-in @file, all the
> + * file/dir will be created. If already created then reference
> + * will be increased

 * Used to dynamically create the file/dir within @file. @file is really a
 * directory and all the files/dirs of the children within @file will be
 * created. If any of the files/dirs have already been created, their
 * reference count will be incremented.


> + */
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file
> *file) +{
> +	struct tracefs_inode *ti;
> +	struct eventfs_inode *ei;
> +	struct eventfs_file *ef;
> +	struct dentry *dentry = file_dentry(file);
> +	struct inode *f_inode = file_inode(file);
> +	int idx;
> +
> +	ti = get_tracefs(f_inode);
> +	if (!(ti->flags & TRACEFS_EVENT_INODE))
> +		return -EINVAL;
> +
> +	ei = ti->private;
> +	idx = srcu_read_lock(&eventfs_srcu);
> +	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> +		if (ef->created) {
> +			dget(ef->dentry);
> +			continue;
> +		}
> +		mutex_lock(&eventfs_mutex);
> +		ef->created = true;
> +
> +		inode_lock(dentry->d_inode);
> +		if (ef->ei)
> +			ef->dentry = create_dir(ef->name, ef->mode, dentry,
> +						ef->data, ef->fop, ef->iop);
> +		else
> +			ef->dentry = create_file(ef->name, ef->mode, dentry,
> +						 ef->data, ef->fop);
> +		inode_unlock(dentry->d_inode);
> +
> +		if (IS_ERR_OR_NULL(ef->dentry)) {
> +			ef->created = false;
> +		} else {
> +			if (ef->ei)
> +				eventfs_post_create_dir(ef);
> +			ef->dentry->d_fsdata = ef;
> +		}
> +		mutex_unlock(&eventfs_mutex);
> +	}
> +	srcu_read_unlock(&eventfs_srcu, idx);
> +	return dcache_dir_open(inode, file);
> +}
> +
>  static const struct file_operations eventfs_file_operations = {
> +	.open           = dcache_dir_open_wrapper,
> +	.read		= generic_read_dir,
> +	.iterate_shared	= dcache_readdir,
> +	.llseek		= generic_file_llseek,
> +	.release        = eventfs_release,
>  };
>  
>  static const struct inode_operations eventfs_root_dir_inode_operations =
> {
> +	.lookup		= eventfs_root_lookup,
>  };
>  
>  /**
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 47c1b4d21735..4d30b0cafc5f 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,6 +51,8 @@ void eventfs_remove(struct eventfs_file *ef);
>  
>  void eventfs_remove_events_dir(struct dentry *dentry);
>  
> +void eventfs_set_ef_status_free(struct dentry *dentry);

Shouldn't this be internal to the tracefs code, and not something exposed
to users of tracefs?

> +
>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache
  2023-07-13 11:33 ` [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache Ajay Kaher
@ 2023-07-14 20:27   ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 20:27 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli


I wonder if this should be patch 2?

On Thu, 13 Jul 2023 17:03:22 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Creating tracefs_inode_cache which is a cache of tracefs_inode.

Create a kmem cache of tracefs_inodes. To be more efficient, as there are
lots of tracefs inodes, create its own cache. This also allows to see how
many tracefs inodes have been created.

> Adding helping functions:

Add helper functions:

> tracefs_alloc_inode()
> tracefs_free_inode()
> 

-- Steve

> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/inode.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 7dc692a5fee1..76820d3e97eb 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -21,13 +21,33 @@
>  #include <linux/parser.h>
>  #include <linux/magic.h>
>  #include <linux/slab.h>
> +#include "internal.h"
>  
>  #define TRACEFS_DEFAULT_MODE	0700
> +static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
>  
>  static struct vfsmount *tracefs_mount;
>  static int tracefs_mount_count;
>  static bool tracefs_registered;
>  
> +static struct inode *tracefs_alloc_inode(struct super_block *sb)
> +{
> +	struct tracefs_inode *ti;
> +
> +	ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL);
> +	if (!ti)
> +		return NULL;
> +
> +	ti->flags = 0;
> +
> +	return &ti->vfs_inode;
> +}
> +
> +static void tracefs_free_inode(struct inode *inode)
> +{
> +	kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
> +}
> +
>  static ssize_t default_read_file(struct file *file, char __user *buf,
>  				 size_t count, loff_t *ppos)
>  {
> @@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root)
>  }
>  
>  static const struct super_operations tracefs_super_operations = {
> +	.alloc_inode    = tracefs_alloc_inode,
> +	.free_inode     = tracefs_free_inode,
> +	.drop_inode     = generic_delete_inode,
>  	.statfs		= simple_statfs,
>  	.remount_fs	= tracefs_remount,
>  	.show_options	= tracefs_show_options,
> @@ -675,10 +698,26 @@ bool tracefs_initialized(void)
>  	return tracefs_registered;
>  }
>  
> +static void init_once(void *foo)
> +{
> +	struct tracefs_inode *ti = (struct tracefs_inode *) foo;
> +
> +	inode_init_once(&ti->vfs_inode);
> +}
> +
>  static int __init tracefs_init(void)
>  {
>  	int retval;
>  
> +	tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
> +						 sizeof(struct tracefs_inode),
> +						 0, (SLAB_RECLAIM_ACCOUNT|
> +						     SLAB_MEM_SPREAD|
> +						     SLAB_ACCOUNT),
> +						 init_once);
> +	if (!tracefs_inode_cachep)
> +		return -ENOMEM;
> +
>  	retval = sysfs_create_mount_point(kernel_kobj, "tracing");
>  	if (retval)
>  		return -EINVAL;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-07-13 11:33 ` [PATCH v4 09/10] eventfs: Move tracing/events to eventfs Ajay Kaher
@ 2023-07-14 21:06   ` Steven Rostedt
  2023-07-19 11:08     ` Ajay Kaher
  2023-07-19 11:08     ` Ajay Kaher
  2023-09-08 12:14   ` Sven Schnelle
  1 sibling, 2 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 21:06 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli

On Thu, 13 Jul 2023 17:03:23 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Till now /sys/kernel/debug/tracing/events is a part of tracefs,
> with-in this patch creating 'events' and it's sub-dir as eventfs.
> Basically replacing tracefs calls with eventfs calls for 'events'.

[ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]

Up until now, /sys/kernel/tracing/events was no different than any other
part of tracefs. The files and directories within the events directory was
created when the tracefs was mounted, and also created for the instances in
/sys/kernel/tracing/instances/<instance>/events. Most of these files and
directories will never be referenced. Since there are thousands of these
files and directories they spend their time wasting precious memory
resources.

Move the "events" directory to the new eventfs. The eventfs will take the
meta data of the events that they represent and store that. When the files
in the events directory are referenced, the dentry and inodes to represent
them are then created. When the files are no longer referenced, they are
freed. This saves the precious memory resources that were wasted on these
seldom referenced dentries and inodes.

We should also include memory metrics of both improvements from
/proc/slabinfo and /proc/meminfo.

> 
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/inode.c           | 18 ++++++++++
>  include/linux/trace_events.h |  1 +
>  kernel/trace/trace.h         |  2 +-
>  kernel/trace/trace_events.c  | 67 +++++++++++++++++++-----------------
>  4 files changed, 55 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 76820d3e97eb..a098d7153498 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = {
>  	.show_options	= tracefs_show_options,
>  };
>  
> +static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
> +{
> +	struct tracefs_inode *ti;
> +
> +	if (!dentry || !inode)
> +		return;
> +
> +	ti = get_tracefs(inode);
> +	if (ti && ti->flags & TRACEFS_EVENT_INODE)
> +		eventfs_set_ef_status_free(dentry);
> +	iput(inode);
> +}
> +
> +static const struct dentry_operations tracefs_dentry_operations = {
> +	.d_iput = tracefs_dentry_iput,
> +};
> +
>  static int trace_fill_super(struct super_block *sb, void *data, int silent)
>  {
>  	static const struct tree_descr trace_files[] = {{""}};
> @@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
>  		goto fail;
>  
>  	sb->s_op = &tracefs_super_operations;
> +	sb->s_d_op = &tracefs_dentry_operations;
>  
>  	tracefs_apply_options(sb, false);
>  
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 7c4a0b72334e..c6bb74cccea3 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -635,6 +635,7 @@ struct trace_event_file {
>  	struct list_head		list;
>  	struct trace_event_call		*event_call;
>  	struct event_filter __rcu	*filter;
> +	struct eventfs_file             *ef;
>  	struct dentry			*dir;
>  	struct trace_array		*tr;
>  	struct trace_subsystem_dir	*system;
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 79bdefe9261b..d9a0ec2b918e 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1309,7 +1309,7 @@ struct trace_subsystem_dir {
>  	struct list_head		list;
>  	struct event_subsystem		*subsystem;
>  	struct trace_array		*tr;
> -	struct dentry			*entry;
> +	struct eventfs_file             *ef;
>  	int				ref_count;
>  	int				nr_events;
>  };
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 12ed71428939..4058178de682 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -990,7 +990,8 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
>  		return;
>  
>  	if (!--dir->nr_events) {
> -		tracefs_remove(dir->entry);
> +		if (dir->ef)
> +			eventfs_remove(dir->ef);

Why the if statement? eventfs_remove() has:

void eventfs_remove(struct eventfs_file *ef)
{
	if (!ef)
		return;
	[..]
}

Let's get rid of that.

>  		list_del(&dir->list);
>  		__put_system_dir(dir);
>  	}
> @@ -1011,7 +1012,8 @@ static void remove_event_file_dir(struct trace_event_file *file)
>  
>  		tracefs_remove(dir);
>  	}
> -
> +	if (file->ef)
> +		eventfs_remove(file->ef);

Same here.

>  	list_del(&file->list);
>  	remove_subsystem(file->system);
>  	free_event_filter(file->filter);
> @@ -2297,13 +2299,13 @@ create_new_subsystem(const char *name)
>  	return NULL;
>  }
>  
> -static struct dentry *
> +static struct eventfs_file *
>  event_subsystem_dir(struct trace_array *tr, const char *name,
>  		    struct trace_event_file *file, struct dentry *parent)
>  {
>  	struct event_subsystem *system, *iter;
>  	struct trace_subsystem_dir *dir;
> -	struct dentry *entry;
> +	int res;
>  
>  	/* First see if we did not already create this dir */
>  	list_for_each_entry(dir, &tr->systems, list) {
> @@ -2311,7 +2313,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
>  		if (strcmp(system->name, name) == 0) {
>  			dir->nr_events++;
>  			file->system = dir;
> -			return dir->entry;
> +			return dir->ef;
>  		}
>  	}
>  
> @@ -2335,8 +2337,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
>  	} else
>  		__get_system(system);
>  
> -	dir->entry = tracefs_create_dir(name, parent);
> -	if (!dir->entry) {
> +	dir->ef = eventfs_add_subsystem_dir(name, parent);
> +	if (IS_ERR(dir->ef)) {
>  		pr_warn("Failed to create system directory %s\n", name);
>  		__put_system(system);
>  		goto out_free;
> @@ -2351,22 +2353,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
>  	/* the ftrace system is special, do not create enable or filter files */
>  	if (strcmp(name, "ftrace") != 0) {
>  
> -		entry = tracefs_create_file("filter", TRACE_MODE_WRITE,
> -					    dir->entry, dir,
> +		res = eventfs_add_file("filter", TRACE_MODE_WRITE,
> +					    dir->ef, dir,
>  					    &ftrace_subsystem_filter_fops);
> -		if (!entry) {
> +		if (res) {
>  			kfree(system->filter);
>  			system->filter = NULL;
>  			pr_warn("Could not create tracefs '%s/filter' entry\n", name);
>  		}
>  
> -		trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir,
> +		eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir,
>  				  &ftrace_system_enable_fops);
>  	}
>  
>  	list_add(&dir->list, &tr->systems);
>  
> -	return dir->entry;
> +	return dir->ef;
>  
>   out_free:
>  	kfree(dir);
> @@ -2420,7 +2422,7 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
>  {
>  	struct trace_event_call *call = file->event_call;
>  	struct trace_array *tr = file->tr;
> -	struct dentry *d_events;
> +	struct eventfs_file *ef_subsystem = NULL;

Move this around to keep the upside-down-xmas tree order.

-- Steve


>  	const char *name;
>  	int ret;
>  
> @@ -2432,24 +2434,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
>  	if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0))
>  		return -ENODEV;
>  
> -	d_events = event_subsystem_dir(tr, call->class->system, file, parent);
> -	if (!d_events)
> +	ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent);
> +	if (!ef_subsystem)
>  		return -ENOMEM;
>  
>  	name = trace_event_name(call);
> -	file->dir = tracefs_create_dir(name, d_events);
> -	if (!file->dir) {
> +	file->ef = eventfs_add_dir(name, ef_subsystem);
> +	if (IS_ERR(file->ef)) {
>  		pr_warn("Could not create tracefs '%s' directory\n", name);
>  		return -1;
>  	}
>  
>  	if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
> -		trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file,
> +		eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file,
>  				  &ftrace_enable_fops);
>  
>  #ifdef CONFIG_PERF_EVENTS
>  	if (call->event.type && call->class->reg)
> -		trace_create_file("id", TRACE_MODE_READ, file->dir,
> +		eventfs_add_file("id", TRACE_MODE_READ, file->ef,
>  				  (void *)(long)call->event.type,
>  				  &ftrace_event_id_fops);
>  #endif
> @@ -2465,27 +2467,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
>  	 * triggers or filters.
>  	 */
>  	if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
> -		trace_create_file("filter", TRACE_MODE_WRITE, file->dir,
> +		eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef,
>  				  file, &ftrace_event_filter_fops);
>  
> -		trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
> +		eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef,
>  				  file, &event_trigger_fops);
>  	}
>  
>  #ifdef CONFIG_HIST_TRIGGERS
> -	trace_create_file("hist", TRACE_MODE_READ, file->dir, file,
> +	eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file,
>  			  &event_hist_fops);
>  #endif
>  #ifdef CONFIG_HIST_TRIGGERS_DEBUG
> -	trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file,
> +	eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file,
>  			  &event_hist_debug_fops);
>  #endif
> -	trace_create_file("format", TRACE_MODE_READ, file->dir, call,
> +	eventfs_add_file("format", TRACE_MODE_READ, file->ef, call,
>  			  &ftrace_event_format_fops);
>  
>  #ifdef CONFIG_TRACE_EVENT_INJECT
>  	if (call->event.type && call->class->reg)
> -		trace_create_file("inject", 0200, file->dir, file,
> +		eventfs_add_file("inject", 0200, file->ef, file,
>  				  &event_inject_fops);
>  #endif
>  
> @@ -3638,21 +3640,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
>  {
>  	struct dentry *d_events;
>  	struct dentry *entry;
> +	int error = 0;
>  
>  	entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent,
>  				  tr, &ftrace_set_event_fops);
>  	if (!entry)
>  		return -ENOMEM;
>  
> -	d_events = tracefs_create_dir("events", parent);
> -	if (!d_events) {
> +	d_events = eventfs_create_events_dir("events", parent);
> +	if (IS_ERR(d_events)) {
>  		pr_warn("Could not create tracefs 'events' directory\n");
>  		return -ENOMEM;
>  	}
>  
> -	entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events,
> +	error = eventfs_add_top_file("enable", TRACE_MODE_WRITE, d_events,
>  				  tr, &ftrace_tr_enable_fops);
> -	if (!entry)
> +	if (error)
>  		return -ENOMEM;
>  
>  	/* There are not as crucial, just warn if they are not created */
> @@ -3665,11 +3668,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
>  			  &ftrace_set_event_notrace_pid_fops);
>  
>  	/* ring buffer internal formats */
> -	trace_create_file("header_page", TRACE_MODE_READ, d_events,
> +	eventfs_add_top_file("header_page", TRACE_MODE_READ, d_events,
>  				  ring_buffer_print_page_header,
>  				  &ftrace_show_header_fops);
>  
> -	trace_create_file("header_event", TRACE_MODE_READ, d_events,
> +	eventfs_add_top_file("header_event", TRACE_MODE_READ, d_events,
>  				  ring_buffer_print_entry_header,
>  				  &ftrace_show_header_fops);
>  
> @@ -3757,7 +3760,7 @@ int event_trace_del_tracer(struct trace_array *tr)
>  
>  	down_write(&trace_event_sem);
>  	__trace_remove_event_dirs(tr);
> -	tracefs_remove(tr->event_dir);
> +	eventfs_remove_events_dir(tr->event_dir);
>  	up_write(&trace_event_sem);
>  
>  	tr->event_dir = NULL;


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
                   ` (9 preceding siblings ...)
  2023-07-13 11:33 ` [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs Ajay Kaher
@ 2023-07-14 22:58 ` Steven Rostedt
  2023-07-16 17:32   ` Ajay Kaher
  10 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-14 22:58 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, chinglinyu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, namit, oe-lkp, amakhalov, er.ajay.kaher,
	srivatsa, tkundu, vsirnapalli

On Thu, 13 Jul 2023 17:03:14 +0530
Ajay Kaher <akaher@vmware.com> wrote:

> Events Tracing infrastructure contains lot of files, directories
> (internally in terms of inodes, dentries). And ends up by consuming
> memory in MBs. We can have multiple events of Events Tracing, which
> further requires more memory.
> 
> Instead of creating inodes/dentries, eventfs could keep meta-data and
> skip the creation of inodes/dentries. As and when require, eventfs will
> create the inodes/dentries only for required files/directories.
> Also eventfs would delete the inodes/dentries once no more requires
> but preserve the meta data.
> 
> Tracing events took ~9MB, with this approach it took ~4.5MB
> for ~10K files/dir.

I think we are very close to getting this in for the next merge window. I
ran several tests and so far it's holding up!

I made a bunch of nits for this series, but nothing major. Mostly fixing up
change logs and comments, as well as some naming conventions and
reorganizing the series a little bit.

Anyway, I'm hoping that v5 will be ready to go into linux-next.

Thanks a lot Ajay for working on this!

-- Steve


> 
> v3:
> Patch 3,4,5,7,9:
>          removed all the eventfs_rwsem code and replaced it with an srcu
>          lock for the readers, and a mutex to synchronize the writers of
>          the list.
> 
> Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
> 
> Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
>          as it really should not be exposed to all users.
> 
> Patch 5: added a recursion check to eventfs_remove_rec() as it is really
>          dangerous to have unchecked recursion in the kernel (we do have
>          a fixed size stack).
> 
>          have the free use srcu callbacks. After the srcu grace periods
>          are done, it adds the eventfs_file onto a llist (lockless link
>          list) and wakes up a work queue. Then the work queue does the
>          freeing (this needs to be done in task/workqueue context, as
>          srcu callbacks are done in softirq context).
> 
> Patch 6: renamed:
>          eventfs_create_file() -> create_file()
>          eventfs_create_dir() -> create_dir()
> 
> v2:
> Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
> Patch 02: moved from v1 1/9
> Patch 03: moved from v1 2/9
>           As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
>           helper function to add files or directories to eventfs
>           fix WARNING reported by kernel test robot in v1 8/9
> Patch 04: moved from v1 3/9
>           used eventfs_prepare_ef() to add files
>           fix WARNING reported by kernel test robot in v1 8/9
> Patch 05: moved from v1 4/9
>           fix compiling warning reported by kernel test robot in v1 4/9
> Patch 06: moved from v1 5/9
> Patch 07: moved from v1 6/9
> Patch 08: moved from v1 7/9
> Patch 09: moved from v1 8/9
>           rebased because of v3 01/10
> Patch 10: moved from v1 9/9
> 
> v1:
> Patch 1: add header file
> Patch 2: resolved kernel test robot issues
>          protecting eventfs lists using nested eventfs_rwsem
> Patch 3: protecting eventfs lists using nested eventfs_rwsem
> Patch 4: improve events cleanup code to fix crashes
> Patch 5: resolved kernel test robot issues
>          removed d_instantiate_anon() calls
> Patch 6: resolved kernel test robot issues
>          fix kprobe test in eventfs_root_lookup()
>          protecting eventfs lists using nested eventfs_rwsem
> Patch 7: remove header file
> Patch 8: pass eventfs_rwsem as argument to eventfs functions
>          called eventfs_remove_events_dir() instead of tracefs_remove()
>          from event_trace_del_tracer()
> Patch 9: new patch to fix kprobe test case
> 
> Ajay Kaher (9):
>   tracefs: Rename some tracefs function
>   eventfs: Implement eventfs dir creation functions
>   eventfs: Implement eventfs file add functions
>   eventfs: Implement eventfs file, directory remove function
>   eventfs: Implement functions to create eventfs files and directories
>   eventfs: Implement eventfs lookup, read, open functions
>   eventfs: Implement tracefs_inode_cache
>   eventfs: Move tracing/events to eventfs
>   test: ftrace: Fix kprobe test for eventfs
> 
> Steven Rostedt (Google) (1):
>   tracing: Require all trace events to have a TRACE_SYSTEM
> 
>  fs/tracefs/Makefile                           |   1 +
>  fs/tracefs/event_inode.c                      | 711 ++++++++++++++++++
>  fs/tracefs/inode.c                            | 124 ++-
>  fs/tracefs/internal.h                         |  25 +
>  include/linux/trace_events.h                  |   1 +
>  include/linux/tracefs.h                       |  32 +
>  kernel/trace/trace.h                          |   2 +-
>  kernel/trace/trace_events.c                   |  78 +-
>  .../ftrace/test.d/kprobe/kprobe_args_char.tc  |   4 +-
>  .../test.d/kprobe/kprobe_args_string.tc       |   4 +-
>  10 files changed, 930 insertions(+), 52 deletions(-)
>  create mode 100644 fs/tracefs/event_inode.c
>  create mode 100644 fs/tracefs/internal.h
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-14 22:58 ` [PATCH v4 00/10] tracing: introducing eventfs Steven Rostedt
@ 2023-07-16 17:32   ` Ajay Kaher
  2023-07-18 13:40     ` Steven Rostedt
  0 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-16 17:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli



> On 15-Jul-2023, at 4:28 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Thu, 13 Jul 2023 17:03:14 +0530
> Ajay Kaher <akaher@vmware.com> wrote:
> 
>> Events Tracing infrastructure contains lot of files, directories
>> (internally in terms of inodes, dentries). And ends up by consuming
>> memory in MBs. We can have multiple events of Events Tracing, which
>> further requires more memory.
>> 
>> Instead of creating inodes/dentries, eventfs could keep meta-data and
>> skip the creation of inodes/dentries. As and when require, eventfs will
>> create the inodes/dentries only for required files/directories.
>> Also eventfs would delete the inodes/dentries once no more requires
>> but preserve the meta data.
>> 
>> Tracing events took ~9MB, with this approach it took ~4.5MB
>> for ~10K files/dir.
> 
> I think we are very close to getting this in for the next merge window. I
> ran several tests and so far it's holding up!
> 
> I made a bunch of nits for this series, but nothing major. Mostly fixing up
> change logs and comments, as well as some naming conventions and
> reorganizing the series a little bit.
> 
> Anyway, I'm hoping that v5 will be ready to go into linux-next.
> 
> Thanks a lot Ajay for working on this!


Thanks Steve, hopefully I will fix all the pending nits in v5.
Here is the checkpatch.pl report:


./scripts/checkpatch.pl v4/*
--------------------------
v4/0000-cover-letter.patch
--------------------------
total: 0 errors, 0 warnings, 0 lines checked

v4/0000-cover-letter.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 22 lines checked

v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch has no obvious style problems and is ready for submission.
--------------------------------------------------
v4/0002-tracefs-Rename-some-tracefs-function.patch
--------------------------------------------------
total: 0 errors, 0 warnings, 71 lines checked

v4/0002-tracefs-Rename-some-tracefs-function.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------
v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch
--------------------------------------------------------------
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#52:
new file mode 100644

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#194: FILE: fs/tracefs/event_inode.c:138:
+	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#229: FILE: fs/tracefs/event_inode.c:173:
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,

WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'.
#261: FILE: fs/tracefs/event_inode.c:205:
+		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,

total: 0 errors, 4 warnings, 297 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch has style problems, please review.
----------------------------------------------------------
v4/0004-eventfs-Implement-eventfs-file-add-functions.patch
----------------------------------------------------------
total: 0 errors, 0 warnings, 101 lines checked

v4/0004-eventfs-Implement-eventfs-file-add-functions.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 129 lines checked

v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 182 lines checked

v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------
v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch
------------------------------------------------------------------
total: 0 errors, 0 warnings, 224 lines checked

v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch has no obvious style problems and is ready for submission.
---------------------------------------------------
v4/0008-eventfs-Implement-tracefs_inode_cache.patch
---------------------------------------------------
total: 0 errors, 0 warnings, 68 lines checked

v4/0008-eventfs-Implement-tracefs_inode_cache.patch has no obvious style problems and is ready for submission.
----------------------------------------------------
v4/0009-eventfs-Move-tracing-events-to-eventfs.patch
----------------------------------------------------
total: 0 errors, 0 warnings, 241 lines checked

v4/0009-eventfs-Move-tracing-events-to-eventfs.patch has no obvious style problems and is ready for submission.
-----------------------------------------------------
v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch
-----------------------------------------------------
total: 0 errors, 0 warnings, 32 lines checked

v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch has no obvious style problems and is ready for submission.

-Ajay



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs
  2023-07-14 13:27     ` Masami Hiramatsu
@ 2023-07-17  5:24       ` Ajay Kaher
  2023-07-17 12:24         ` Steven Rostedt
  0 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-17  5:24 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt
  Cc: shuah, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, kernel test robot, Nadav Amit, oe-lkp,
	Alexey Makhalov, er.ajay.kaher, srivatsa, Tapas Kundu,
	Vasavi Sirnapalli



> On 14-Jul-2023, at 6:57 PM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> !! External Email
> 
> On Thu, 13 Jul 2023 22:37:58 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> On Thu, 13 Jul 2023 17:03:24 +0530
>> Ajay Kaher <akaher@vmware.com> wrote:
>> 
>>> kprobe_args_char.tc, kprobe_args_string.tc has validation check
>>> for tracefs_create_dir, for eventfs it should be eventfs_create_dir.
>>> 
>>> Signed-off-by: Ajay Kaher <akaher@vmware.com>
>>> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>> Tested-by: Ching-lin Yu <chinglinyu@google.com>
>>> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>> 
>> As this patch as is will break when running on older kernels, I was
>> wondering if we should do this instead?
> 
> +1 since the latest kselftest is used also for checking the older
> stable kernels, the test case has to check the environment and
> change the parameter (or make it unsupported for new feature)
> So below looks good to me.

+1, many ftrace tests are unsupported in my setup and may few require
changes, not sure. Does any auto job takes care of this?

- Ajay

>> 
>> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>> index 285b4770efad..ff7499eb98d6 100644
>> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>> @@ -34,14 +34,19 @@ mips*)
>> esac
>> 
>> : "Test get argument (1)"
>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
>> +if grep -q eventfs_add_dir available_filter_functions; then
>> +  DIR_NAME="eventfs_add_dir"
>> +else
>> +  DIR_NAME="tracefs_create_dir"
>> +fi
>> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
>> echo 1 > events/kprobes/testprobe/enable
>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>> grep -qe "testprobe.* arg1='t'" trace
>> 
>> echo 0 > events/kprobes/testprobe/enable
>> : "Test get argument (2)"
>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>> echo 1 > events/kprobes/testprobe/enable
>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>> grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
>> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>> index a4f8e7c53c1f..a202b2ea4baf 100644
>> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>> @@ -37,14 +37,19 @@ loongarch*)
>> esac
>> 
>> : "Test get argument (1)"
>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
>> +if grep -q eventfs_add_dir available_filter_functions; then
>> +  DIR_NAME="eventfs_add_dir"
>> +else
>> +  DIR_NAME="tracefs_create_dir"
>> +fi
>> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
>> echo 1 > events/kprobes/testprobe/enable
>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>> grep -qe "testprobe.* arg1=\"test\"" trace
>> 
>> echo 0 > events/kprobes/testprobe/enable
>> : "Test get argument (2)"
>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>> +echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>> echo 1 > events/kprobes/testprobe/enable
>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>> grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
>> 
>> -- Steve
>> 
>>> ---
>>> .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc        | 4 ++--
>>> .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc      | 4 ++--
>>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>>> index 285b4770efad..523cfb64539f 100644
>>> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>>> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
>>> @@ -34,14 +34,14 @@ mips*)
>>> esac
>>> 
>>> : "Test get argument (1)"
>>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
>>> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char" > kprobe_events
>>> echo 1 > events/kprobes/testprobe/enable
>>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>>> grep -qe "testprobe.* arg1='t'" trace
>>> 
>>> echo 0 > events/kprobes/testprobe/enable
>>> : "Test get argument (2)"
>>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>>> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
>>> echo 1 > events/kprobes/testprobe/enable
>>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>>> grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
>>> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>>> index a4f8e7c53c1f..b9f8c3f8bae8 100644
>>> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>>> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
>>> @@ -37,14 +37,14 @@ loongarch*)
>>> esac
>>> 
>>> : "Test get argument (1)"
>>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
>>> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string" > kprobe_events
>>> echo 1 > events/kprobes/testprobe/enable
>>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>>> grep -qe "testprobe.* arg1=\"test\"" trace
>>> 
>>> echo 0 > events/kprobes/testprobe/enable
>>> : "Test get argument (2)"
>>> -echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>>> +echo "p:testprobe eventfs_add_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
>>> echo 1 > events/kprobes/testprobe/enable
>>> echo "p:test $FUNCTION_FORK" >> kprobe_events
>>> grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs
  2023-07-17  5:24       ` Ajay Kaher
@ 2023-07-17 12:24         ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-17 12:24 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: Masami Hiramatsu, shuah, Ching-lin Yu, linux-kernel,
	linux-kselftest, linux-trace-kernel, kernel test robot,
	Nadav Amit, oe-lkp, Alexey Makhalov, er.ajay.kaher, srivatsa,
	Tapas Kundu, Vasavi Sirnapalli

On Mon, 17 Jul 2023 05:24:43 +0000
Ajay Kaher <akaher@vmware.com> wrote:

> >> As this patch as is will break when running on older kernels, I was
> >> wondering if we should do this instead?  
> > 
> > +1 since the latest kselftest is used also for checking the older
> > stable kernels, the test case has to check the environment and
> > change the parameter (or make it unsupported for new feature)
> > So below looks good to me.  
> 
> +1, many ftrace tests are unsupported in my setup and may few require
> changes, not sure. Does any auto job takes care of this?

You mean like some kernel CI? Not that I know of.

Shuah, do you run these selftests on older kernels to make sure they don't
fail just because the test is unsupported?

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-16 17:32   ` Ajay Kaher
@ 2023-07-18 13:40     ` Steven Rostedt
  2023-07-19 10:25       ` Ajay Kaher
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-18 13:40 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Sun, 16 Jul 2023 17:32:35 +0000
Ajay Kaher <akaher@vmware.com> wrote:

> Thanks Steve, hopefully I will fix all the pending nits in v5.
> Here is the checkpatch.pl report:

Hold off on v5. I hit the following on v4:

[  220.170527] BUG: unable to handle page fault for address: fffffffffffffff0
[  220.172792] #PF: supervisor read access in kernel mode
[  220.174618] #PF: error_code(0x0000) - not-present page
[  220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0
[  220.178559] Oops: 0000 [#1] PREEMPT SMP PTI
[  220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15
[  220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  220.186629] Workqueue: events_unbound eventfs_workfn
[  220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40
[  220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40
[  220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287
[  220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000
[  220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000
[  220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022
[  220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058
[  220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848
[  220.205685] FS:  0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000
[  220.207476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0
[  220.210342] Call Trace:
[  220.210879]  <TASK>
[  220.211359]  ? __die+0x23/0x70
[  220.212036]  ? page_fault_oops+0xa4/0x180
[  220.212904]  ? exc_page_fault+0xf6/0x190
[  220.213738]  ? asm_exc_page_fault+0x26/0x30
[  220.214586]  ? eventfs_set_ef_status_free+0x17/0x40
[  220.216081]  tracefs_dentry_iput+0x39/0x50
[  220.217370]  __dentry_kill+0xdc/0x170
[  220.218581]  dput+0x142/0x310
[  220.219647]  eventfs_workfn+0x42/0x70
[  220.220805]  process_one_work+0x1e2/0x3e0
[  220.222031]  worker_thread+0x1da/0x390
[  220.223204]  ? __pfx_worker_thread+0x10/0x10
[  220.224476]  kthread+0xf7/0x130
[  220.225543]  ? __pfx_kthread+0x10/0x10
[  220.226735]  ret_from_fork+0x2c/0x50
[  220.227898]  </TASK>
[  220.228792] Modules linked in:
[  220.229860] CR2: fffffffffffffff0
[  220.230960] ---[ end trace 0000000000000000 ]---


I think I know the issue, and looking to see if I can fix it.

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-18 13:40     ` Steven Rostedt
@ 2023-07-19 10:25       ` Ajay Kaher
  2023-07-19 14:23         ` Steven Rostedt
  0 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-19 10:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli



> On 18-Jul-2023, at 7:10 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Sun, 16 Jul 2023 17:32:35 +0000
> Ajay Kaher <akaher@vmware.com> wrote:
> 
>> Thanks Steve, hopefully I will fix all the pending nits in v5.
>> Here is the checkpatch.pl report:
> 
> Hold off on v5. I hit the following on v4:

OK.

> 
> [  220.170527] BUG: unable to handle page fault for address: fffffffffffffff0
> [  220.172792] #PF: supervisor read access in kernel mode
> [  220.174618] #PF: error_code(0x0000) - not-present page
> [  220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0
> [  220.178559] Oops: 0000 [#1] PREEMPT SMP PTI
> [  220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15
> [  220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [  220.186629] Workqueue: events_unbound eventfs_workfn
> [  220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40
> [  220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40
> [  220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287
> [  220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000
> [  220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000
> [  220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022
> [  220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058
> [  220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848
> [  220.205685] FS:  0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000
> [  220.207476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0
> [  220.210342] Call Trace:
> [  220.210879]  <TASK>
> [  220.211359]  ? __die+0x23/0x70
> [  220.212036]  ? page_fault_oops+0xa4/0x180
> [  220.212904]  ? exc_page_fault+0xf6/0x190
> [  220.213738]  ? asm_exc_page_fault+0x26/0x30
> [  220.214586]  ? eventfs_set_ef_status_free+0x17/0x40
> [  220.216081]  tracefs_dentry_iput+0x39/0x50
> [  220.217370]  __dentry_kill+0xdc/0x170
> [  220.218581]  dput+0x142/0x310
> [  220.219647]  eventfs_workfn+0x42/0x70
> [  220.220805]  process_one_work+0x1e2/0x3e0
> [  220.222031]  worker_thread+0x1da/0x390
> [  220.223204]  ? __pfx_worker_thread+0x10/0x10
> [  220.224476]  kthread+0xf7/0x130
> [  220.225543]  ? __pfx_kthread+0x10/0x10
> [  220.226735]  ret_from_fork+0x2c/0x50
> [  220.227898]  </TASK>
> [  220.228792] Modules linked in:
> [  220.229860] CR2: fffffffffffffff0
> [  220.230960] ---[ end trace 0000000000000000 ]---
> 
> 
> I think I know the issue, and looking to see if I can fix it.


- Is it also reproducible on v3?
- Is it manually reproducible or reproducible using any specific script?

Let me know if I can help.

-Ajay


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-07-14 21:06   ` Steven Rostedt
@ 2023-07-19 11:08     ` Ajay Kaher
  2023-07-19 11:08     ` Ajay Kaher
  1 sibling, 0 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-19 11:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli


> On 15-Jul-2023, at 2:36 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Thu, 13 Jul 2023 17:03:23 +0530
> Ajay Kaher <akaher@vmware.com> wrote:
> 
>> Till now /sys/kernel/debug/tracing/events is a part of tracefs,
>> with-in this patch creating 'events' and it's sub-dir as eventfs.
>> Basically replacing tracefs calls with eventfs calls for 'events'.
> 
> [ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]
> 
> Up until now, /sys/kernel/tracing/events was no different than any other
> part of tracefs. The files and directories within the events directory was
> created when the tracefs was mounted, and also created for the instances in
> /sys/kernel/tracing/instances/<instance>/events. Most of these files and
> directories will never be referenced. Since there are thousands of these
> files and directories they spend their time wasting precious memory
> resources.
> 
> Move the "events" directory to the new eventfs. The eventfs will take the
> meta data of the events that they represent and store that. When the files
> in the events directory are referenced, the dentry and inodes to represent
> them are then created. When the files are no longer referenced, they are
> freed. This saves the precious memory resources that were wasted on these
> seldom referenced dentries and inodes.
> 

Some correction here:

The dentry and inodes to represent eventfs files or directories will be freed only
during drop cache or eventfs_remove(). This is same as with other dynamic fs
e.g. sysfs or procfs.

We can achieve ‘free the dentry and inodes if no longer requires’ using
create_file()->d_instantiate_anon() instead create_file()->d_instantiate(), but I faced 
some issues. This optimisation we may consider in future along with sysfs, procfs.

-Ajay


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-07-14 21:06   ` Steven Rostedt
  2023-07-19 11:08     ` Ajay Kaher
@ 2023-07-19 11:08     ` Ajay Kaher
  1 sibling, 0 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-19 11:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli


> On 15-Jul-2023, at 2:36 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Thu, 13 Jul 2023 17:03:23 +0530
> Ajay Kaher <akaher@vmware.com> wrote:
> 
>> Till now /sys/kernel/debug/tracing/events is a part of tracefs,
>> with-in this patch creating 'events' and it's sub-dir as eventfs.
>> Basically replacing tracefs calls with eventfs calls for 'events'.
> 
> [ note: /sys/kernel/debug/tracing is deprecated. Please avoid referencing it. ]
> 
> Up until now, /sys/kernel/tracing/events was no different than any other
> part of tracefs. The files and directories within the events directory was
> created when the tracefs was mounted, and also created for the instances in
> /sys/kernel/tracing/instances/<instance>/events. Most of these files and
> directories will never be referenced. Since there are thousands of these
> files and directories they spend their time wasting precious memory
> resources.
> 
> Move the "events" directory to the new eventfs. The eventfs will take the
> meta data of the events that they represent and store that. When the files
> in the events directory are referenced, the dentry and inodes to represent
> them are then created. When the files are no longer referenced, they are
> freed. This saves the precious memory resources that were wasted on these
> seldom referenced dentries and inodes.
> 

Some correction here:

The dentry and inodes to represent eventfs files or directories will be freed only
during drop cache or eventfs_remove(). This is same as with other dynamic fs
e.g. sysfs or procfs.

We can achieve ‘free the dentry and inodes if no longer requires’ using
create_file()->d_instantiate_anon() instead create_file()->d_instantiate(), but I faced 
some issues. This optimisation we may consider in future along with sysfs, procfs.

-Ajay


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-19 10:25       ` Ajay Kaher
@ 2023-07-19 14:23         ` Steven Rostedt
  2023-07-19 18:37           ` Ajay Kaher
  0 siblings, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-19 14:23 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Wed, 19 Jul 2023 10:25:28 +0000
Ajay Kaher <akaher@vmware.com> wrote:

> - Is it also reproducible on v3?
> - Is it manually reproducible or reproducible using any specific script?
> 
> Let me know if I can help.

Just tried it against v3, and it gave me the splat that I originally had
and starting to fix, which now gives me another splat. I'll spend a couple
more days on it and start sharing code and seeing if we can work together
on this.

Here's the reproducer (of both v3 splat and the bug I'm hitting now).

 ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
 ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
 ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

v3 gives me (and my updates too)

======================================================
 WARNING: possible circular locking dependency detected
 6.5.0-rc1-test+ #576 Not tainted
 ------------------------------------------------------
 trace-cmd/840 is trying to acquire lock:
 ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
 
 but task is already holding lock:
 ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
 
 which lock already depends on the new lock.

 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
        down_read_nested+0x41/0x180
        eventfs_root_lookup+0x42/0x120
        __lookup_slow+0xff/0x1b0
        walk_component+0xdb/0x150
        path_lookupat+0x67/0x1a0
        filename_lookup+0xe4/0x1f0
        vfs_statx+0x9e/0x180
        vfs_fstatat+0x51/0x70
        __do_sys_newfstatat+0x3f/0x80
        do_syscall_64+0x3a/0xc0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 
 -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
        __lock_acquire+0x165d/0x2390
        lock_acquire+0xd4/0x2d0
        down_write+0x3b/0xd0
        dcache_dir_open_wrapper+0xc1/0x1b0
        do_dentry_open+0x20c/0x510
        path_openat+0x7ad/0xc60
        do_filp_open+0xaf/0x160
        do_sys_openat2+0xab/0xe0
        __x64_sys_openat+0x6a/0xa0
        do_syscall_64+0x3a/0xc0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 
 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(eventfs_rwsem/1);
                                lock(&sb->s_type->i_mutex_key#5);
                                lock(eventfs_rwsem/1);
   lock(&sb->s_type->i_mutex_key#5);
 
  *** DEADLOCK ***

 1 lock held by trace-cmd/840:
  #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
 
 stack backtrace:
 CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x90
  check_noncircular+0x14b/0x160
  __lock_acquire+0x165d/0x2390
  lock_acquire+0xd4/0x2d0
  ? dcache_dir_open_wrapper+0xc1/0x1b0
  down_write+0x3b/0xd0
  ? dcache_dir_open_wrapper+0xc1/0x1b0
  dcache_dir_open_wrapper+0xc1/0x1b0
  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
  do_dentry_open+0x20c/0x510
  path_openat+0x7ad/0xc60
  do_filp_open+0xaf/0x160
  do_sys_openat2+0xab/0xe0
  __x64_sys_openat+0x6a/0xa0
  do_syscall_64+0x3a/0xc0
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 RIP: 0033:0x7f1743267e41
 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
 RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
  </TASK>


I moved the code around a bit, and it appears that kprobes is getting
dput() more than once.

I moved the d_invalidate() and dput() into the workqueue function, and on
kprobes going away, d_invalidate() frees it, and dput() is now corrupted.

Still investigating. The VFS layer is a magic box that needs the right
wizard hat to deal with, but I unfortunately am waiting on back order to
retrieve that specific hat :-p

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-19 14:23         ` Steven Rostedt
@ 2023-07-19 18:37           ` Ajay Kaher
  2023-07-19 18:40             ` Steven Rostedt
  0 siblings, 1 reply; 44+ messages in thread
From: Ajay Kaher @ 2023-07-19 18:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli



> On 19-Jul-2023, at 7:53 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Wed, 19 Jul 2023 10:25:28 +0000
> Ajay Kaher <akaher@vmware.com> wrote:
> 
>> - Is it also reproducible on v3?
>> - Is it manually reproducible or reproducible using any specific script?
>> 
>> Let me know if I can help.
> 
> Just tried it against v3, and it gave me the splat that I originally had
> and starting to fix, which now gives me another splat. I'll spend a couple
> more days on it and start sharing code and seeing if we can work together
> on this.
> 
> Here's the reproducer (of both v3 splat and the bug I'm hitting now).
> 
> ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events

I tried above steps on v4 but couldn’t reproduce:

root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
enable  filter  format  id  trigger
root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
-bash: echo: write error: No such file or directory

I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
will have unordered list and could cause problem.
 

> 
> v3 gives me (and my updates too)
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.5.0-rc1-test+ #576 Not tainted
> ------------------------------------------------------
> trace-cmd/840 is trying to acquire lock:
> ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
> 
> but task is already holding lock:
> ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:
>        down_read_nested+0x41/0x180
>        eventfs_root_lookup+0x42/0x120
>        __lookup_slow+0xff/0x1b0
>        walk_component+0xdb/0x150
>        path_lookupat+0x67/0x1a0
>        filename_lookup+0xe4/0x1f0
>        vfs_statx+0x9e/0x180
>        vfs_fstatat+0x51/0x70
>        __do_sys_newfstatat+0x3f/0x80
>        do_syscall_64+0x3a/0xc0
>        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:
>        __lock_acquire+0x165d/0x2390
>        lock_acquire+0xd4/0x2d0
>        down_write+0x3b/0xd0
>        dcache_dir_open_wrapper+0xc1/0x1b0
>        do_dentry_open+0x20c/0x510
>        path_openat+0x7ad/0xc60
>        do_filp_open+0xaf/0x160
>        do_sys_openat2+0xab/0xe0
>        __x64_sys_openat+0x6a/0xa0
>        do_syscall_64+0x3a/0xc0
>        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   rlock(eventfs_rwsem/1);
>                                lock(&sb->s_type->i_mutex_key#5);
>                                lock(eventfs_rwsem/1);
>   lock(&sb->s_type->i_mutex_key#5);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by trace-cmd/840:
>  #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> 
> stack backtrace:
> CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x57/0x90
>  check_noncircular+0x14b/0x160
>  __lock_acquire+0x165d/0x2390
>  lock_acquire+0xd4/0x2d0
>  ? dcache_dir_open_wrapper+0xc1/0x1b0
>  down_write+0x3b/0xd0
>  ? dcache_dir_open_wrapper+0xc1/0x1b0
>  dcache_dir_open_wrapper+0xc1/0x1b0
>  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
>  do_dentry_open+0x20c/0x510
>  path_openat+0x7ad/0xc60
>  do_filp_open+0xaf/0x160
>  do_sys_openat2+0xab/0xe0
>  __x64_sys_openat+0x6a/0xa0
>  do_syscall_64+0x3a/0xc0
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f1743267e41
> Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
> RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
> RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
> RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
> R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
> R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
>  </TASK>
> 

This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not
reproduced on v3 then it’s v4 issue.

-Ajay

> 
> I moved the code around a bit, and it appears that kprobes is getting
> dput() more than once.
> 
> I moved the d_invalidate() and dput() into the workqueue function, and on
> kprobes going away, d_invalidate() frees it, and dput() is now corrupted.
> 
> Still investigating. The VFS layer is a magic box that needs the right
> wizard hat to deal with, but I unfortunately am waiting on back order to
> retrieve that specific hat :-p
> 
> -- Steve
> 
> !! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-19 18:37           ` Ajay Kaher
@ 2023-07-19 18:40             ` Steven Rostedt
  2023-07-21 12:48               ` Steven Rostedt
  2023-07-21 13:18               ` Steven Rostedt
  0 siblings, 2 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-19 18:40 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Wed, 19 Jul 2023 18:37:12 +0000
Ajay Kaher <akaher@vmware.com> wrote:

> > Here's the reproducer (of both v3 splat and the bug I'm hitting now).
> > 
> > ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> > ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> > ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events  
> 
> I tried above steps on v4 but couldn’t reproduce:
> 
> root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/
> enable  filter  format  id  trigger
> root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events
> -bash: echo: write error: No such file or directory
> 
> I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
> will have unordered list and could cause problem.

I modified the srcu portion a bit. Will post soon. I think I got something
working.

I'm having doubt that the dput()s were needed in the eventfs_remove_rec(),
as the d_invalidate() appears to be enough. I'm still testing.

>  
> 
> > 
> > v3 gives me (and my updates too)
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.5.0-rc1-test+ #576 Not tainted
> > ------------------------------------------------------
> > trace-cmd/840 is trying to acquire lock:
> > ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0
> > 
> > but task is already holding lock:
> > ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> >   
> > -> #1 (eventfs_rwsem/1){.+.+}-{3:3}:  
> >        down_read_nested+0x41/0x180
> >        eventfs_root_lookup+0x42/0x120
> >        __lookup_slow+0xff/0x1b0
> >        walk_component+0xdb/0x150
> >        path_lookupat+0x67/0x1a0
> >        filename_lookup+0xe4/0x1f0
> >        vfs_statx+0x9e/0x180
> >        vfs_fstatat+0x51/0x70
> >        __do_sys_newfstatat+0x3f/0x80
> >        do_syscall_64+0x3a/0xc0
> >        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >   
> > -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}:  
> >        __lock_acquire+0x165d/0x2390
> >        lock_acquire+0xd4/0x2d0
> >        down_write+0x3b/0xd0
> >        dcache_dir_open_wrapper+0xc1/0x1b0
> >        do_dentry_open+0x20c/0x510
> >        path_openat+0x7ad/0xc60
> >        do_filp_open+0xaf/0x160
> >        do_sys_openat2+0xab/0xe0
> >        __x64_sys_openat+0x6a/0xa0
> >        do_syscall_64+0x3a/0xc0
> >        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > 
> > other info that might help us debug this:
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   rlock(eventfs_rwsem/1);
> >                                lock(&sb->s_type->i_mutex_key#5);
> >                                lock(eventfs_rwsem/1);
> >   lock(&sb->s_type->i_mutex_key#5);
> > 
> >  *** DEADLOCK ***
> > 
> > 1 lock held by trace-cmd/840:
> >  #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0
> > 
> > stack backtrace:
> > CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x57/0x90
> >  check_noncircular+0x14b/0x160
> >  __lock_acquire+0x165d/0x2390
> >  lock_acquire+0xd4/0x2d0
> >  ? dcache_dir_open_wrapper+0xc1/0x1b0
> >  down_write+0x3b/0xd0
> >  ? dcache_dir_open_wrapper+0xc1/0x1b0
> >  dcache_dir_open_wrapper+0xc1/0x1b0
> >  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
> >  do_dentry_open+0x20c/0x510
> >  path_openat+0x7ad/0xc60
> >  do_filp_open+0xaf/0x160
> >  do_sys_openat2+0xab/0xe0
> >  __x64_sys_openat+0x6a/0xa0
> >  do_syscall_64+0x3a/0xc0
> >  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > RIP: 0033:0x7f1743267e41
> > Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
> > RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41
> > RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c
> > RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040
> > R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0
> > R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b
> >  </TASK>
> >   
> 
> This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not
> reproduced on v3 then it’s v4 issue.

The issue comes from fixing the above ;-)

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-19 18:40             ` Steven Rostedt
@ 2023-07-21 12:48               ` Steven Rostedt
  2023-07-21 13:16                 ` Steven Rostedt
  2023-07-21 13:19                 ` Steven Rostedt
  2023-07-21 13:18               ` Steven Rostedt
  1 sibling, 2 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 12:48 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel@vger.kernel.org"        
	<linux-trace-kernel@vger.kernel.org>,

On Wed, 19 Jul 2023 14:40:46 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
> > will have unordered list and could cause problem.  
> 
> I modified the srcu portion a bit. Will post soon. I think I got something
> working.
> 
> I'm having doubt that the dput()s were needed in the eventfs_remove_rec(),
> as the d_invalidate() appears to be enough. I'm still testing.

OK, I got this working and it appears to pass all my tests. I actually got
it working Wednesday night, but I tried a different approach on Thursday
that got rid of the evenfs_file and only used eventfs_inodes and makes the
files more dynamic. There's still a couple of corner cases that are not
working with this approach (the dentry counters are getting out of sync).
This should not stop this from going in. I'll continue working on that
approach for the next merge cycle. But for now, here's the patch to this
series that works.

I'm going to post it here, and then reply to it with comments about my
changes.

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4db048250cdb..2718de1533e6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -36,16 +36,36 @@ struct eventfs_file {
 	const struct file_operations	*fop;
 	const struct inode_operations	*iop;
 	union {
+		struct list_head	del_list;
 		struct rcu_head		rcu;
-		struct llist_node	llist;  /* For freeing after RCU */
+		unsigned long		is_freed; /* Freed if one of the above is set */
 	};
 	void				*data;
 	umode_t				mode;
-	bool				created;
+	unsigned int			flags;
 };
 
 static DEFINE_MUTEX(eventfs_mutex);
 DEFINE_STATIC_SRCU(eventfs_srcu);
+
+static struct dentry *eventfs_root_lookup(struct inode *dir,
+					  struct dentry *dentry,
+					  unsigned int flags);
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
+static int eventfs_release(struct inode *inode, struct file *file);
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+	.lookup		= eventfs_root_lookup,
+};
+
+static const struct file_operations eventfs_file_operations = {
+	.open           = dcache_dir_open_wrapper,
+	.read		= generic_read_dir,
+	.iterate_shared	= dcache_readdir,
+	.llseek		= generic_file_llseek,
+	.release        = eventfs_release,
+};
+
 /**
  * create_file - create a file in the tracefs filesystem
  * @name: the name of the file to create.
@@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
  * If tracefs is not enabled in the kernel, the value -%ENODEV will be
  * returned.
  */
-static struct dentry *create_dir(const char *name, umode_t mode,
-				 struct dentry *parent, void *data,
-				 const struct file_operations *fop,
-				 const struct inode_operations *iop)
+static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
 {
 	struct tracefs_inode *ti;
 	struct dentry *dentry;
 	struct inode *inode;
 
-	WARN_ON(!S_ISDIR(mode));
-
 	dentry = eventfs_start_creating(name, parent);
 	if (IS_ERR(dentry))
 		return dentry;
@@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
 	if (unlikely(!inode))
 		return eventfs_failed_creating(dentry);
 
-	inode->i_mode = mode;
-	inode->i_op = iop;
-	inode->i_fop = fop;
+	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+	inode->i_op = &eventfs_root_dir_inode_operations;
+	inode->i_fop = &eventfs_file_operations;
 	inode->i_private = data;
 
 	ti = get_tracefs(inode);
@@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
 	struct tracefs_inode *ti_parent;
 	struct eventfs_file *ef;
 
+	mutex_lock(&eventfs_mutex);
 	ti_parent = get_tracefs(dentry->d_parent->d_inode);
 	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
-		return;
+		goto out;
 
 	ef = dentry->d_fsdata;
 	if (!ef)
-		return;
-	ef->created = false;
+		goto out;
+	/*
+	 * If ef was freed, then the LSB bit is set for d_fsdata.
+	 * But this should not happen, as it should still have a
+	 * ref count that prevents it. Warn in case it does.
+	 */
+	if (WARN_ON_ONCE((unsigned long)ef & 1))
+		goto out;
+
+	dentry->d_fsdata = NULL;
+
 	ef->dentry = NULL;
+ out:
+	mutex_unlock(&eventfs_mutex);
 }
 
 /**
@@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
 	ti->private = ef->ei;
 }
 
+static struct dentry *
+create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
+{
+	bool invalidate = false;
+	struct dentry *dentry;
+
+	mutex_lock(&eventfs_mutex);
+	if (ef->is_freed) {
+		mutex_unlock(&eventfs_mutex);
+		return NULL;
+	}
+	if (ef->dentry) {
+		dentry = ef->dentry;
+		/* On dir open, up the ref count */
+		if (!lookup)
+			dget(dentry);
+		mutex_unlock(&eventfs_mutex);
+		return dentry;
+	}
+	mutex_unlock(&eventfs_mutex);
+
+	if (!lookup)
+		inode_lock(parent->d_inode);
+
+	if (ef->ei)
+		dentry = create_dir(ef->name, parent, ef->data);
+	else
+		dentry = create_file(ef->name, ef->mode, parent,
+				     ef->data, ef->fop);
+
+	if (!lookup)
+		inode_unlock(parent->d_inode);
+
+	mutex_lock(&eventfs_mutex);
+	if (IS_ERR_OR_NULL(dentry)) {
+		/* If the ef was already updated get it */
+		dentry = ef->dentry;
+		if (dentry && !lookup)
+			dget(dentry);
+		mutex_unlock(&eventfs_mutex);
+		return dentry;
+	}
+
+	if (!ef->dentry && !ef->is_freed) {
+		ef->dentry = dentry;
+		if (ef->ei)
+			eventfs_post_create_dir(ef);
+		dentry->d_fsdata = ef;
+	} else {
+		/* A race here, should try again (unless freed) */
+		invalidate = true;
+	}
+	mutex_unlock(&eventfs_mutex);
+	if (invalidate)
+		d_invalidate(dentry);
+
+	if (lookup || invalidate)
+		dput(dentry);
+
+	return invalidate ? NULL : dentry;
+}
+
+static bool match_event_file(struct eventfs_file *ef, const char *name)
+{
+	bool ret;
+
+	mutex_lock(&eventfs_mutex);
+	ret = !ef->is_freed && strcmp(ef->name, name) == 0;
+	mutex_unlock(&eventfs_mutex);
+
+	return ret;
+}
+
 /**
  * eventfs_root_lookup - lookup routine to create file/dir
  * @dir: directory in which lookup to be done
@@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
  * Used to create dynamic file/dir with-in @dir, search with-in ei
  * list, if @dentry found go ahead and create the file/dir
  */
-
 static struct dentry *eventfs_root_lookup(struct inode *dir,
 					  struct dentry *dentry,
 					  unsigned int flags)
@@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		if (strcmp(ef->name, dentry->d_name.name))
+		if (!match_event_file(ef, dentry->d_name.name))
 			continue;
 		ret = simple_lookup(dir, dentry, flags);
-		if (ef->created)
-			continue;
-		mutex_lock(&eventfs_mutex);
-		ef->created = true;
-		if (ef->ei)
-			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
-						ef->data, ef->fop, ef->iop);
-		else
-			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
-						 ef->data, ef->fop);
-
-		if (IS_ERR_OR_NULL(ef->dentry)) {
-			ef->created = false;
-			mutex_unlock(&eventfs_mutex);
-		} else {
-			if (ef->ei)
-				eventfs_post_create_dir(ef);
-			ef->dentry->d_fsdata = ef;
-			mutex_unlock(&eventfs_mutex);
-			dput(ef->dentry);
-		}
+		create_dentry(ef, ef->d_parent, true);
 		break;
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
@@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
 	struct tracefs_inode *ti;
 	struct eventfs_inode *ei;
 	struct eventfs_file *ef;
+	struct dentry *dentry;
 	int idx;
 
 	ti = get_tracefs(inode);
@@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		if (ef->created)
-			dput(ef->dentry);
+		mutex_lock(&eventfs_mutex);
+		dentry = ef->dentry;
+		mutex_unlock(&eventfs_mutex);
+		if (dentry)
+			dput(dentry);
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
 	return dcache_dir_close(inode, file);
@@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
 	ei = ti->private;
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
-		if (ef->created) {
-			dget(ef->dentry);
-			continue;
-		}
-		mutex_lock(&eventfs_mutex);
-		ef->created = true;
-
-		inode_lock(dentry->d_inode);
-		if (ef->ei)
-			ef->dentry = create_dir(ef->name, ef->mode, dentry,
-						ef->data, ef->fop, ef->iop);
-		else
-			ef->dentry = create_file(ef->name, ef->mode, dentry,
-						 ef->data, ef->fop);
-		inode_unlock(dentry->d_inode);
-
-		if (IS_ERR_OR_NULL(ef->dentry)) {
-			ef->created = false;
-		} else {
-			if (ef->ei)
-				eventfs_post_create_dir(ef);
-			ef->dentry->d_fsdata = ef;
-		}
-		mutex_unlock(&eventfs_mutex);
+		create_dentry(ef, dentry, false);
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
 	return dcache_dir_open(inode, file);
 }
 
-static const struct file_operations eventfs_file_operations = {
-	.open           = dcache_dir_open_wrapper,
-	.read		= generic_read_dir,
-	.iterate_shared	= dcache_readdir,
-	.llseek		= generic_file_llseek,
-	.release        = eventfs_release,
-};
-
-static const struct inode_operations eventfs_root_dir_inode_operations = {
-	.lookup		= eventfs_root_lookup,
-};
-
 /**
  * eventfs_prepare_ef - helper function to prepare eventfs_file
  * @name: the name of the file/directory to create.
@@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
 	ti_parent = get_tracefs(parent->d_inode);
 	ei_parent = ti_parent->private;
 
-	ef = eventfs_prepare_ef(name,
-		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
-		&eventfs_file_operations,
-		&eventfs_root_dir_inode_operations, NULL);
-
+	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
 	if (IS_ERR(ef))
 		return ef;
 
@@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
 	if (!ef_parent)
 		return ERR_PTR(-EINVAL);
 
-	ef = eventfs_prepare_ef(name,
-		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
-		&eventfs_file_operations,
-		&eventfs_root_dir_inode_operations, NULL);
-
+	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
 	if (IS_ERR(ef))
 		return ef;
 
@@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
 	return 0;
 }
 
-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
-	struct eventfs_file *ef, *tmp;
-	struct llist_node *llnode;
-
-	llnode = llist_del_all(&free_list);
-	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
-		if (ef->created && ef->dentry)
-			dput(ef->dentry);
-		kfree(ef->name);
-		kfree(ef->ei);
-		kfree(ef);
-	}
-}
-
-DECLARE_WORK(eventfs_work, eventfs_workfn);
-
 static void free_ef(struct rcu_head *head)
 {
 	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
 
-	if (!llist_add(&ef->llist, &free_list))
-		return;
-
-	queue_work(system_unbound_wq, &eventfs_work);
+	kfree(ef->name);
+	kfree(ef->ei);
+	kfree(ef);
 }
 
-
-
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ef: eventfs_file to be removed.
@@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
  * This function recursively remove eventfs_file which
  * contains info of file or dir.
  */
-static void eventfs_remove_rec(struct eventfs_file *ef, int level)
+static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
 {
 	struct eventfs_file *ef_child;
 
@@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
 		/* search for nested folders or files */
 		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
 					 lockdep_is_held(&eventfs_mutex)) {
-			eventfs_remove_rec(ef_child, level + 1);
+			eventfs_remove_rec(ef_child, head, level + 1);
 		}
 	}
 
-	if (ef->created && ef->dentry)
-		d_invalidate(ef->dentry);
-
 	list_del_rcu(&ef->list);
-	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+	list_add_tail(&ef->del_list, head);
 }
 
 /**
@@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
  */
 void eventfs_remove(struct eventfs_file *ef)
 {
+	struct eventfs_file *tmp;
+	LIST_HEAD(ef_del_list);
+	struct dentry *dentry_list = NULL;
+	struct dentry *dentry;
+
 	if (!ef)
 		return;
 
 	mutex_lock(&eventfs_mutex);
-	eventfs_remove_rec(ef, 0);
+	eventfs_remove_rec(ef, &ef_del_list, 0);
+
+	list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
+		if (ef->dentry) {
+			unsigned long ptr = (unsigned long)dentry_list;
+
+			/* Keep the dentry from being freed yet */
+			dget(ef->dentry);
+
+			/*
+			 * Paranoid: The dget() above should prevent the dentry
+			 * from being freed and calling eventfs_set_ef_status_free().
+			 * But just in case, set the link list LSB pointer to 1
+			 * and have eventfs_set_ef_status_free() check that to
+			 * make sure that if it does happen, it will not think
+			 * the d_fsdata is an event_file.
+			 *
+			 * For this to work, no event_file should be allocated
+			 * on a odd space, as the ef should always be allocated
+			 * to be at least word aligned. Check for that too.
+			 */
+			WARN_ON_ONCE(ptr & 1);
+
+			ef->dentry->d_fsdata = (void *)(ptr | 1);
+			dentry_list = ef->dentry;
+			ef->dentry = NULL;
+		}
+		call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+	}
 	mutex_unlock(&eventfs_mutex);
+
+	while (dentry_list) {
+		unsigned long ptr;
+
+		dentry = dentry_list;
+		ptr = (unsigned long)dentry->d_fsdata & ~1UL;
+		dentry_list = (struct dentry *)ptr;
+		dentry->d_fsdata = NULL;
+		d_invalidate(dentry);
+		mutex_lock(&eventfs_mutex);
+		/* dentry should now have at least a single reference */
+		WARN_ONCE((int)d_count(dentry) < 1,
+			  "dentry %px less than one reference (%d) after invalidate\n",
+			  dentry, d_count(dentry));
+		mutex_unlock(&eventfs_mutex);
+		dput(dentry);
+	}
 }
 
 /**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index c443a0c32a8c..1b880b5cd29d 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
 
+void eventfs_set_ef_status_free(struct dentry *dentry);
+
 #endif /* _TRACEFS_INTERNAL_H */
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 4d30b0cafc5f..47c1b4d21735 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
 
 void eventfs_remove_events_dir(struct dentry *dentry);
 
-void eventfs_set_ef_status_free(struct dentry *dentry);
-
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 12:48               ` Steven Rostedt
@ 2023-07-21 13:16                 ` Steven Rostedt
  2023-07-21 13:19                 ` Steven Rostedt
  1 sibling, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 13:16 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel@vger.kernel.org"        
	<linux-trace-kernel@vger.kernel.org>,

On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
>  	const struct file_operations	*fop;
>  	const struct inode_operations	*iop;
>  	union {
> +		struct list_head	del_list;
>  		struct rcu_head		rcu;
> -		struct llist_node	llist;  /* For freeing after RCU */
> +		unsigned long		is_freed; /* Freed if one of the above is set */

I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().

I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!

>  	};
>  	void				*data;
>  	umode_t				mode;
> -	bool				created;
> +	unsigned int			flags;

Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.

The flags isn't used and can be removed. I just forgot to do so.

>  };
>  
>  static DEFINE_MUTEX(eventfs_mutex);
>  DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> +					  struct dentry *dentry,
> +					  unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> +	.lookup		= eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> +	.open           = dcache_dir_open_wrapper,
> +	.read		= generic_read_dir,
> +	.iterate_shared	= dcache_readdir,
> +	.llseek		= generic_file_llseek,
> +	.release        = eventfs_release,
> +};
> +

In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.

>  /**
>   * create_file - create a file in the tracefs filesystem
>   * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
>   * If tracefs is not enabled in the kernel, the value -%ENODEV will be
>   * returned.
>   */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> -				 struct dentry *parent, void *data,
> -				 const struct file_operations *fop,
> -				 const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
>  {

As stated, the directories always used the same *op values, so I just hard
coded it.

>  	struct tracefs_inode *ti;
>  	struct dentry *dentry;
>  	struct inode *inode;
>  
> -	WARN_ON(!S_ISDIR(mode));
> -
>  	dentry = eventfs_start_creating(name, parent);
>  	if (IS_ERR(dentry))
>  		return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
>  	if (unlikely(!inode))
>  		return eventfs_failed_creating(dentry);
>  
> -	inode->i_mode = mode;
> -	inode->i_op = iop;
> -	inode->i_fop = fop;
> +	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> +	inode->i_op = &eventfs_root_dir_inode_operations;
> +	inode->i_fop = &eventfs_file_operations;
>  	inode->i_private = data;
>  
>  	ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
>  	struct tracefs_inode *ti_parent;
>  	struct eventfs_file *ef;
>  
> +	mutex_lock(&eventfs_mutex);

To synchronize with the removals, I needed to add locking here.

>  	ti_parent = get_tracefs(dentry->d_parent->d_inode);
>  	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> -		return;
> +		goto out;
>  
>  	ef = dentry->d_fsdata;
>  	if (!ef)
> -		return;
> -	ef->created = false;
> +		goto out;
> +	/*
> +	 * If ef was freed, then the LSB bit is set for d_fsdata.
> +	 * But this should not happen, as it should still have a
> +	 * ref count that prevents it. Warn in case it does.
> +	 */
> +	if (WARN_ON_ONCE((unsigned long)ef & 1))
> +		goto out;

During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.

> +
> +	dentry->d_fsdata = NULL;
> +
>  	ef->dentry = NULL;
> + out:
> +	mutex_unlock(&eventfs_mutex);
>  }
>  
>  /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
>  	ti->private = ef->ei;
>  }
>  
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{

Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.

> +	bool invalidate = false;
> +	struct dentry *dentry;
> +
> +	mutex_lock(&eventfs_mutex);
> +	if (ef->is_freed) {
> +		mutex_unlock(&eventfs_mutex);
> +		return NULL;
> +	}

Ignore if the ef is on its way to be freed.

> +	if (ef->dentry) {
> +		dentry = ef->dentry;

If the ef already has a dentry (created) then use it.

> +		/* On dir open, up the ref count */
> +		if (!lookup)
> +			dget(dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return dentry;
> +	}
> +	mutex_unlock(&eventfs_mutex);
> +
> +	if (!lookup)
> +		inode_lock(parent->d_inode);
> +
> +	if (ef->ei)
> +		dentry = create_dir(ef->name, parent, ef->data);
> +	else
> +		dentry = create_file(ef->name, ef->mode, parent,
> +				     ef->data, ef->fop);
> +
> +	if (!lookup)
> +		inode_unlock(parent->d_inode);
> +
> +	mutex_lock(&eventfs_mutex);
> +	if (IS_ERR_OR_NULL(dentry)) {

With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.

Note, if the ef is freed, it should not have a dentry.

> +		/* If the ef was already updated get it */
> +		dentry = ef->dentry;
> +		if (dentry && !lookup)
> +			dget(dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return dentry;
> +	}
> +
> +	if (!ef->dentry && !ef->is_freed) {

With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.

> +		ef->dentry = dentry;
> +		if (ef->ei)
> +			eventfs_post_create_dir(ef);
> +		dentry->d_fsdata = ef;
> +	} else {
> +		/* A race here, should try again (unless freed) */
> +		invalidate = true;

I had a WARN_ON() once here. Probably could add a:

		WARN_ON_ONCE(!ef->is_freed);

> +	}
> +	mutex_unlock(&eventfs_mutex);
> +	if (invalidate)
> +		d_invalidate(dentry);
> +
> +	if (lookup || invalidate)
> +		dput(dentry);
> +
> +	return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{

A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.

> +	bool ret;
> +
> +	mutex_lock(&eventfs_mutex);
> +	ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> +	mutex_unlock(&eventfs_mutex);
> +
> +	return ret;
> +}
> +
>  /**
>   * eventfs_root_lookup - lookup routine to create file/dir
>   * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
>   * Used to create dynamic file/dir with-in @dir, search with-in ei
>   * list, if @dentry found go ahead and create the file/dir
>   */
> -
>  static struct dentry *eventfs_root_lookup(struct inode *dir,
>  					  struct dentry *dentry,
>  					  unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
>  				 srcu_read_lock_held(&eventfs_srcu)) {
> -		if (strcmp(ef->name, dentry->d_name.name))
> +		if (!match_event_file(ef, dentry->d_name.name))
>  			continue;
>  		ret = simple_lookup(dir, dentry, flags);
> -		if (ef->created)
> -			continue;
> -		mutex_lock(&eventfs_mutex);
> -		ef->created = true;
> -		if (ef->ei)
> -			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> -						ef->data, ef->fop, ef->iop);
> -		else
> -			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> -						 ef->data, ef->fop);
> -
> -		if (IS_ERR_OR_NULL(ef->dentry)) {
> -			ef->created = false;
> -			mutex_unlock(&eventfs_mutex);
> -		} else {
> -			if (ef->ei)
> -				eventfs_post_create_dir(ef);
> -			ef->dentry->d_fsdata = ef;
> -			mutex_unlock(&eventfs_mutex);
> -			dput(ef->dentry);
> -		}
> +		create_dentry(ef, ef->d_parent, true);
>  		break;
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
>  	struct tracefs_inode *ti;
>  	struct eventfs_inode *ei;
>  	struct eventfs_file *ef;
> +	struct dentry *dentry;
>  	int idx;
>  
>  	ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
>  				 srcu_read_lock_held(&eventfs_srcu)) {
> -		if (ef->created)
> -			dput(ef->dentry);
> +		mutex_lock(&eventfs_mutex);
> +		dentry = ef->dentry;
> +		mutex_unlock(&eventfs_mutex);
> +		if (dentry)
> +			dput(dentry);
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
>  	return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
>  	ei = ti->private;
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> -		if (ef->created) {
> -			dget(ef->dentry);
> -			continue;
> -		}
> -		mutex_lock(&eventfs_mutex);
> -		ef->created = true;
> -
> -		inode_lock(dentry->d_inode);
> -		if (ef->ei)
> -			ef->dentry = create_dir(ef->name, ef->mode, dentry,
> -						ef->data, ef->fop, ef->iop);
> -		else
> -			ef->dentry = create_file(ef->name, ef->mode, dentry,
> -						 ef->data, ef->fop);
> -		inode_unlock(dentry->d_inode);
> -
> -		if (IS_ERR_OR_NULL(ef->dentry)) {
> -			ef->created = false;
> -		} else {
> -			if (ef->ei)
> -				eventfs_post_create_dir(ef);
> -			ef->dentry->d_fsdata = ef;
> -		}
> -		mutex_unlock(&eventfs_mutex);
> +		create_dentry(ef, dentry, false);
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
>  	return dcache_dir_open(inode, file);
>  }
>  
> -static const struct file_operations eventfs_file_operations = {
> -	.open           = dcache_dir_open_wrapper,
> -	.read		= generic_read_dir,
> -	.iterate_shared	= dcache_readdir,
> -	.llseek		= generic_file_llseek,
> -	.release        = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> -	.lookup		= eventfs_root_lookup,
> -};
> -
>  /**
>   * eventfs_prepare_ef - helper function to prepare eventfs_file
>   * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
>  	ti_parent = get_tracefs(parent->d_inode);
>  	ei_parent = ti_parent->private;
>  
> -	ef = eventfs_prepare_ef(name,
> -		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> -		&eventfs_file_operations,
> -		&eventfs_root_dir_inode_operations, NULL);
> -
> +	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

For directories, just use the hard coded values.

>  	if (IS_ERR(ef))
>  		return ef;
>  
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
>  	if (!ef_parent)
>  		return ERR_PTR(-EINVAL);
>  
> -	ef = eventfs_prepare_ef(name,
> -		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> -		&eventfs_file_operations,
> -		&eventfs_root_dir_inode_operations, NULL);
> -
> +	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

ditto.

>  	if (IS_ERR(ef))
>  		return ef;
>  
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
>  	return 0;
>  }
>  
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> -	struct eventfs_file *ef, *tmp;
> -	struct llist_node *llnode;
> -
> -	llnode = llist_del_all(&free_list);
> -	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> -		if (ef->created && ef->dentry)
> -			dput(ef->dentry);
> -		kfree(ef->name);
> -		kfree(ef->ei);
> -		kfree(ef);
> -	}
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
>  static void free_ef(struct rcu_head *head)
>  {
>  	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>  
> -	if (!llist_add(&ef->llist, &free_list))
> -		return;
> -
> -	queue_work(system_unbound_wq, &eventfs_work);
> +	kfree(ef->name);
> +	kfree(ef->ei);
> +	kfree(ef);

Since I did not do the dput() or d_invalidate() here I don't need call this
from task context. This simplifies the process.

>  }
>  
> -
> -
>  /**
>   * eventfs_remove_rec - remove eventfs dir or file from list
>   * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
>   * This function recursively remove eventfs_file which
>   * contains info of file or dir.
>   */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
>  {
>  	struct eventfs_file *ef_child;
>  
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
>  		/* search for nested folders or files */
>  		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
>  					 lockdep_is_held(&eventfs_mutex)) {
> -			eventfs_remove_rec(ef_child, level + 1);
> +			eventfs_remove_rec(ef_child, head, level + 1);
>  		}
>  	}
>  
> -	if (ef->created && ef->dentry)
> -		d_invalidate(ef->dentry);
> -
>  	list_del_rcu(&ef->list);
> -	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +	list_add_tail(&ef->del_list, head);

Hold off on freeing the ef. Add it to a link list to do so later.

>  }
>  
>  /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
>   */
>  void eventfs_remove(struct eventfs_file *ef)
>  {
> +	struct eventfs_file *tmp;
> +	LIST_HEAD(ef_del_list);
> +	struct dentry *dentry_list = NULL;
> +	struct dentry *dentry;
> +
>  	if (!ef)
>  		return;
>  
>  	mutex_lock(&eventfs_mutex);
> -	eventfs_remove_rec(ef, 0);
> +	eventfs_remove_rec(ef, &ef_del_list, 0);

The above returns back with ef_del_list holding all the ef's to be freed.

I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.

> +
> +	list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> +		if (ef->dentry) {
> +			unsigned long ptr = (unsigned long)dentry_list;
> +
> +			/* Keep the dentry from being freed yet */
> +			dget(ef->dentry);
> +
> +			/*
> +			 * Paranoid: The dget() above should prevent the dentry
> +			 * from being freed and calling eventfs_set_ef_status_free().
> +			 * But just in case, set the link list LSB pointer to 1
> +			 * and have eventfs_set_ef_status_free() check that to
> +			 * make sure that if it does happen, it will not think
> +			 * the d_fsdata is an event_file.
> +			 *
> +			 * For this to work, no event_file should be allocated
> +			 * on a odd space, as the ef should always be allocated
> +			 * to be at least word aligned. Check for that too.
> +			 */
> +			WARN_ON_ONCE(ptr & 1);
> +
> +			ef->dentry->d_fsdata = (void *)(ptr | 1);

Set the d_fsdata to be a link list. The comment above needs to say to say
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.


> +			dentry_list = ef->dentry;
> +			ef->dentry = NULL;
> +		}
> +		call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +	}
>  	mutex_unlock(&eventfs_mutex);
> +
> +	while (dentry_list) {
> +		unsigned long ptr;
> +
> +		dentry = dentry_list;
> +		ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> +		dentry_list = (struct dentry *)ptr;
> +		dentry->d_fsdata = NULL;

With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!

> +		d_invalidate(dentry);
> +		mutex_lock(&eventfs_mutex);
> +		/* dentry should now have at least a single reference */
> +		WARN_ONCE((int)d_count(dentry) < 1,
> +			  "dentry %px less than one reference (%d) after invalidate\n",

I did update the above to:

		WARN_ONCE((int)d_count(dentry) < 1,
			  "dentry %px (%s) less than one reference (%d) after invalidate\n",
			  dentry, dentry->d_name.name, d_count(dentry));

To include the name of the dentry (my current work is triggering this still).

> +			  dentry, d_count(dentry));
> +		mutex_unlock(&eventfs_mutex);
> +		dput(dentry);
> +	}
>  }
>  
>  /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
>  struct dentry *tracefs_failed_creating(struct dentry *dentry);
>  struct inode *tracefs_get_inode(struct super_block *sb);
>  
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
>  #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>  
>  void eventfs_remove_events_dir(struct dentry *dentry);
>  
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -

Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.

-- Steve


>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-19 18:40             ` Steven Rostedt
  2023-07-21 12:48               ` Steven Rostedt
@ 2023-07-21 13:18               ` Steven Rostedt
  2023-07-21 19:14                 ` Steven Rostedt
  1 sibling, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 13:18 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli


[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]

On Wed, 19 Jul 2023 14:40:46 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list
> > will have unordered list and could cause problem.  
> 
> I modified the srcu portion a bit. Will post soon. I think I got something
> working.
> 
> I'm having doubt that the dput()s were needed in the eventfs_remove_rec(),
> as the d_invalidate() appears to be enough. I'm still testing.

OK, I got this working and it appears to pass all my tests. I actually got
it working Wednesday night, but I tried a different approach on Thursday
that got rid of the evenfs_file and only used eventfs_inodes and makes the
files more dynamic. There's still a couple of corner cases that are not
working with this approach (the dentry counters are getting out of sync).
This should not stop this from going in. I'll continue working on that
approach for the next merge cycle. But for now, here's the patch to this
series that works.

I'm going to post it here, and then reply to it with comments about my
changes.

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4db048250cdb..2718de1533e6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -36,16 +36,36 @@ struct eventfs_file {
 	const struct file_operations	*fop;
 	const struct inode_operations	*iop;
 	union {
+		struct list_head	del_list;
 		struct rcu_head		rcu;
-		struct llist_node	llist;  /* For freeing after RCU */
+		unsigned long		is_freed; /* Freed if one of the above is set */
 	};
 	void				*data;
 	umode_t				mode;
-	bool				created;
+	unsigned int			flags;
 };
 
 static DEFINE_MUTEX(eventfs_mutex);
 DEFINE_STATIC_SRCU(eventfs_srcu);
+
+static struct dentry *eventfs_root_lookup(struct inode *dir,
+					  struct dentry *dentry,
+					  unsigned int flags);
+static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
+static int eventfs_release(struct inode *inode, struct file *file);
+
+static const struct inode_operations eventfs_root_dir_inode_operations = {
+	.lookup		= eventfs_root_lookup,
+};
+
+static const struct file_operations eventfs_file_operations = {
+	.open           = dcache_dir_open_wrapper,
+	.read		= generic_read_dir,
+	.iterate_shared	= dcache_readdir,
+	.llseek		= generic_file_llseek,
+	.release        = eventfs_release,
+};
+
 /**
  * create_file - create a file in the tracefs filesystem
  * @name: the name of the file to create.
@@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
  * If tracefs is not enabled in the kernel, the value -%ENODEV will be
  * returned.
  */
-static struct dentry *create_dir(const char *name, umode_t mode,
-				 struct dentry *parent, void *data,
-				 const struct file_operations *fop,
-				 const struct inode_operations *iop)
+static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
 {
 	struct tracefs_inode *ti;
 	struct dentry *dentry;
 	struct inode *inode;
 
-	WARN_ON(!S_ISDIR(mode));
-
 	dentry = eventfs_start_creating(name, parent);
 	if (IS_ERR(dentry))
 		return dentry;
@@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
 	if (unlikely(!inode))
 		return eventfs_failed_creating(dentry);
 
-	inode->i_mode = mode;
-	inode->i_op = iop;
-	inode->i_fop = fop;
+	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+	inode->i_op = &eventfs_root_dir_inode_operations;
+	inode->i_fop = &eventfs_file_operations;
 	inode->i_private = data;
 
 	ti = get_tracefs(inode);
@@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
 	struct tracefs_inode *ti_parent;
 	struct eventfs_file *ef;
 
+	mutex_lock(&eventfs_mutex);
 	ti_parent = get_tracefs(dentry->d_parent->d_inode);
 	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
-		return;
+		goto out;
 
 	ef = dentry->d_fsdata;
 	if (!ef)
-		return;
-	ef->created = false;
+		goto out;
+	/*
+	 * If ef was freed, then the LSB bit is set for d_fsdata.
+	 * But this should not happen, as it should still have a
+	 * ref count that prevents it. Warn in case it does.
+	 */
+	if (WARN_ON_ONCE((unsigned long)ef & 1))
+		goto out;
+
+	dentry->d_fsdata = NULL;
+
 	ef->dentry = NULL;
+ out:
+	mutex_unlock(&eventfs_mutex);
 }
 
 /**
@@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
 	ti->private = ef->ei;
 }
 
+static struct dentry *
+create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
+{
+	bool invalidate = false;
+	struct dentry *dentry;
+
+	mutex_lock(&eventfs_mutex);
+	if (ef->is_freed) {
+		mutex_unlock(&eventfs_mutex);
+		return NULL;
+	}
+	if (ef->dentry) {
+		dentry = ef->dentry;
+		/* On dir open, up the ref count */
+		if (!lookup)
+			dget(dentry);
+		mutex_unlock(&eventfs_mutex);
+		return dentry;
+	}
+	mutex_unlock(&eventfs_mutex);
+
+	if (!lookup)
+		inode_lock(parent->d_inode);
+
+	if (ef->ei)
+		dentry = create_dir(ef->name, parent, ef->data);
+	else
+		dentry = create_file(ef->name, ef->mode, parent,
+				     ef->data, ef->fop);
+
+	if (!lookup)
+		inode_unlock(parent->d_inode);
+
+	mutex_lock(&eventfs_mutex);
+	if (IS_ERR_OR_NULL(dentry)) {
+		/* If the ef was already updated get it */
+		dentry = ef->dentry;
+		if (dentry && !lookup)
+			dget(dentry);
+		mutex_unlock(&eventfs_mutex);
+		return dentry;
+	}
+
+	if (!ef->dentry && !ef->is_freed) {
+		ef->dentry = dentry;
+		if (ef->ei)
+			eventfs_post_create_dir(ef);
+		dentry->d_fsdata = ef;
+	} else {
+		/* A race here, should try again (unless freed) */
+		invalidate = true;
+	}
+	mutex_unlock(&eventfs_mutex);
+	if (invalidate)
+		d_invalidate(dentry);
+
+	if (lookup || invalidate)
+		dput(dentry);
+
+	return invalidate ? NULL : dentry;
+}
+
+static bool match_event_file(struct eventfs_file *ef, const char *name)
+{
+	bool ret;
+
+	mutex_lock(&eventfs_mutex);
+	ret = !ef->is_freed && strcmp(ef->name, name) == 0;
+	mutex_unlock(&eventfs_mutex);
+
+	return ret;
+}
+
 /**
  * eventfs_root_lookup - lookup routine to create file/dir
  * @dir: directory in which lookup to be done
@@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
  * Used to create dynamic file/dir with-in @dir, search with-in ei
  * list, if @dentry found go ahead and create the file/dir
  */
-
 static struct dentry *eventfs_root_lookup(struct inode *dir,
 					  struct dentry *dentry,
 					  unsigned int flags)
@@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		if (strcmp(ef->name, dentry->d_name.name))
+		if (!match_event_file(ef, dentry->d_name.name))
 			continue;
 		ret = simple_lookup(dir, dentry, flags);
-		if (ef->created)
-			continue;
-		mutex_lock(&eventfs_mutex);
-		ef->created = true;
-		if (ef->ei)
-			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
-						ef->data, ef->fop, ef->iop);
-		else
-			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
-						 ef->data, ef->fop);
-
-		if (IS_ERR_OR_NULL(ef->dentry)) {
-			ef->created = false;
-			mutex_unlock(&eventfs_mutex);
-		} else {
-			if (ef->ei)
-				eventfs_post_create_dir(ef);
-			ef->dentry->d_fsdata = ef;
-			mutex_unlock(&eventfs_mutex);
-			dput(ef->dentry);
-		}
+		create_dentry(ef, ef->d_parent, true);
 		break;
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
@@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
 	struct tracefs_inode *ti;
 	struct eventfs_inode *ei;
 	struct eventfs_file *ef;
+	struct dentry *dentry;
 	int idx;
 
 	ti = get_tracefs(inode);
@@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		if (ef->created)
-			dput(ef->dentry);
+		mutex_lock(&eventfs_mutex);
+		dentry = ef->dentry;
+		mutex_unlock(&eventfs_mutex);
+		if (dentry)
+			dput(dentry);
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
 	return dcache_dir_close(inode, file);
@@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
 	ei = ti->private;
 	idx = srcu_read_lock(&eventfs_srcu);
 	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
-		if (ef->created) {
-			dget(ef->dentry);
-			continue;
-		}
-		mutex_lock(&eventfs_mutex);
-		ef->created = true;
-
-		inode_lock(dentry->d_inode);
-		if (ef->ei)
-			ef->dentry = create_dir(ef->name, ef->mode, dentry,
-						ef->data, ef->fop, ef->iop);
-		else
-			ef->dentry = create_file(ef->name, ef->mode, dentry,
-						 ef->data, ef->fop);
-		inode_unlock(dentry->d_inode);
-
-		if (IS_ERR_OR_NULL(ef->dentry)) {
-			ef->created = false;
-		} else {
-			if (ef->ei)
-				eventfs_post_create_dir(ef);
-			ef->dentry->d_fsdata = ef;
-		}
-		mutex_unlock(&eventfs_mutex);
+		create_dentry(ef, dentry, false);
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
 	return dcache_dir_open(inode, file);
 }
 
-static const struct file_operations eventfs_file_operations = {
-	.open           = dcache_dir_open_wrapper,
-	.read		= generic_read_dir,
-	.iterate_shared	= dcache_readdir,
-	.llseek		= generic_file_llseek,
-	.release        = eventfs_release,
-};
-
-static const struct inode_operations eventfs_root_dir_inode_operations = {
-	.lookup		= eventfs_root_lookup,
-};
-
 /**
  * eventfs_prepare_ef - helper function to prepare eventfs_file
  * @name: the name of the file/directory to create.
@@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
 	ti_parent = get_tracefs(parent->d_inode);
 	ei_parent = ti_parent->private;
 
-	ef = eventfs_prepare_ef(name,
-		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
-		&eventfs_file_operations,
-		&eventfs_root_dir_inode_operations, NULL);
-
+	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
 	if (IS_ERR(ef))
 		return ef;
 
@@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
 	if (!ef_parent)
 		return ERR_PTR(-EINVAL);
 
-	ef = eventfs_prepare_ef(name,
-		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
-		&eventfs_file_operations,
-		&eventfs_root_dir_inode_operations, NULL);
-
+	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
 	if (IS_ERR(ef))
 		return ef;
 
@@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
 	return 0;
 }
 
-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
-	struct eventfs_file *ef, *tmp;
-	struct llist_node *llnode;
-
-	llnode = llist_del_all(&free_list);
-	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
-		if (ef->created && ef->dentry)
-			dput(ef->dentry);
-		kfree(ef->name);
-		kfree(ef->ei);
-		kfree(ef);
-	}
-}
-
-DECLARE_WORK(eventfs_work, eventfs_workfn);
-
 static void free_ef(struct rcu_head *head)
 {
 	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
 
-	if (!llist_add(&ef->llist, &free_list))
-		return;
-
-	queue_work(system_unbound_wq, &eventfs_work);
+	kfree(ef->name);
+	kfree(ef->ei);
+	kfree(ef);
 }
 
-
-
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ef: eventfs_file to be removed.
@@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
  * This function recursively remove eventfs_file which
  * contains info of file or dir.
  */
-static void eventfs_remove_rec(struct eventfs_file *ef, int level)
+static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
 {
 	struct eventfs_file *ef_child;
 
@@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
 		/* search for nested folders or files */
 		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
 					 lockdep_is_held(&eventfs_mutex)) {
-			eventfs_remove_rec(ef_child, level + 1);
+			eventfs_remove_rec(ef_child, head, level + 1);
 		}
 	}
 
-	if (ef->created && ef->dentry)
-		d_invalidate(ef->dentry);
-
 	list_del_rcu(&ef->list);
-	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+	list_add_tail(&ef->del_list, head);
 }
 
 /**
@@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
  */
 void eventfs_remove(struct eventfs_file *ef)
 {
+	struct eventfs_file *tmp;
+	LIST_HEAD(ef_del_list);
+	struct dentry *dentry_list = NULL;
+	struct dentry *dentry;
+
 	if (!ef)
 		return;
 
 	mutex_lock(&eventfs_mutex);
-	eventfs_remove_rec(ef, 0);
+	eventfs_remove_rec(ef, &ef_del_list, 0);
+
+	list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
+		if (ef->dentry) {
+			unsigned long ptr = (unsigned long)dentry_list;
+
+			/* Keep the dentry from being freed yet */
+			dget(ef->dentry);
+
+			/*
+			 * Paranoid: The dget() above should prevent the dentry
+			 * from being freed and calling eventfs_set_ef_status_free().
+			 * But just in case, set the link list LSB pointer to 1
+			 * and have eventfs_set_ef_status_free() check that to
+			 * make sure that if it does happen, it will not think
+			 * the d_fsdata is an event_file.
+			 *
+			 * For this to work, no event_file should be allocated
+			 * on a odd space, as the ef should always be allocated
+			 * to be at least word aligned. Check for that too.
+			 */
+			WARN_ON_ONCE(ptr & 1);
+
+			ef->dentry->d_fsdata = (void *)(ptr | 1);
+			dentry_list = ef->dentry;
+			ef->dentry = NULL;
+		}
+		call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
+	}
 	mutex_unlock(&eventfs_mutex);
+
+	while (dentry_list) {
+		unsigned long ptr;
+
+		dentry = dentry_list;
+		ptr = (unsigned long)dentry->d_fsdata & ~1UL;
+		dentry_list = (struct dentry *)ptr;
+		dentry->d_fsdata = NULL;
+		d_invalidate(dentry);
+		mutex_lock(&eventfs_mutex);
+		/* dentry should now have at least a single reference */
+		WARN_ONCE((int)d_count(dentry) < 1,
+			  "dentry %px less than one reference (%d) after invalidate\n",
+			  dentry, d_count(dentry));
+		mutex_unlock(&eventfs_mutex);
+		dput(dentry);
+	}
 }
 
 /**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index c443a0c32a8c..1b880b5cd29d 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
 
+void eventfs_set_ef_status_free(struct dentry *dentry);
+
 #endif /* _TRACEFS_INTERNAL_H */
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 4d30b0cafc5f..47c1b4d21735 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
 
 void eventfs_remove_events_dir(struct dentry *dentry);
 
-void eventfs_set_ef_status_free(struct dentry *dentry);
-
 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 12:48               ` Steven Rostedt
  2023-07-21 13:16                 ` Steven Rostedt
@ 2023-07-21 13:19                 ` Steven Rostedt
  2023-07-21 17:17                   ` Nadav Amit
  2023-07-21 20:40                   ` Steven Rostedt
  1 sibling, 2 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 13:19 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli


[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]

On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
>  	const struct file_operations	*fop;
>  	const struct inode_operations	*iop;
>  	union {
> +		struct list_head	del_list;
>  		struct rcu_head		rcu;
> -		struct llist_node	llist;  /* For freeing after RCU */
> +		unsigned long		is_freed; /* Freed if one of the above is set */

I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().

I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!

>  	};
>  	void				*data;
>  	umode_t				mode;
> -	bool				created;
> +	unsigned int			flags;

Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.

The flags isn't used and can be removed. I just forgot to do so.

>  };
>  
>  static DEFINE_MUTEX(eventfs_mutex);
>  DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> +					  struct dentry *dentry,
> +					  unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> +	.lookup		= eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> +	.open           = dcache_dir_open_wrapper,
> +	.read		= generic_read_dir,
> +	.iterate_shared	= dcache_readdir,
> +	.llseek		= generic_file_llseek,
> +	.release        = eventfs_release,
> +};
> +

In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.

>  /**
>   * create_file - create a file in the tracefs filesystem
>   * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
>   * If tracefs is not enabled in the kernel, the value -%ENODEV will be
>   * returned.
>   */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> -				 struct dentry *parent, void *data,
> -				 const struct file_operations *fop,
> -				 const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
>  {

As stated, the directories always used the same *op values, so I just hard
coded it.

>  	struct tracefs_inode *ti;
>  	struct dentry *dentry;
>  	struct inode *inode;
>  
> -	WARN_ON(!S_ISDIR(mode));
> -
>  	dentry = eventfs_start_creating(name, parent);
>  	if (IS_ERR(dentry))
>  		return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
>  	if (unlikely(!inode))
>  		return eventfs_failed_creating(dentry);
>  
> -	inode->i_mode = mode;
> -	inode->i_op = iop;
> -	inode->i_fop = fop;
> +	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> +	inode->i_op = &eventfs_root_dir_inode_operations;
> +	inode->i_fop = &eventfs_file_operations;
>  	inode->i_private = data;
>  
>  	ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
>  	struct tracefs_inode *ti_parent;
>  	struct eventfs_file *ef;
>  
> +	mutex_lock(&eventfs_mutex);

To synchronize with the removals, I needed to add locking here.

>  	ti_parent = get_tracefs(dentry->d_parent->d_inode);
>  	if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> -		return;
> +		goto out;
>  
>  	ef = dentry->d_fsdata;
>  	if (!ef)
> -		return;
> -	ef->created = false;
> +		goto out;
> +	/*
> +	 * If ef was freed, then the LSB bit is set for d_fsdata.
> +	 * But this should not happen, as it should still have a
> +	 * ref count that prevents it. Warn in case it does.
> +	 */
> +	if (WARN_ON_ONCE((unsigned long)ef & 1))
> +		goto out;

During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.

> +
> +	dentry->d_fsdata = NULL;
> +
>  	ef->dentry = NULL;
> + out:
> +	mutex_unlock(&eventfs_mutex);
>  }
>  
>  /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
>  	ti->private = ef->ei;
>  }
>  
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{

Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.

> +	bool invalidate = false;
> +	struct dentry *dentry;
> +
> +	mutex_lock(&eventfs_mutex);
> +	if (ef->is_freed) {
> +		mutex_unlock(&eventfs_mutex);
> +		return NULL;
> +	}

Ignore if the ef is on its way to be freed.

> +	if (ef->dentry) {
> +		dentry = ef->dentry;

If the ef already has a dentry (created) then use it.

> +		/* On dir open, up the ref count */
> +		if (!lookup)
> +			dget(dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return dentry;
> +	}
> +	mutex_unlock(&eventfs_mutex);
> +
> +	if (!lookup)
> +		inode_lock(parent->d_inode);
> +
> +	if (ef->ei)
> +		dentry = create_dir(ef->name, parent, ef->data);
> +	else
> +		dentry = create_file(ef->name, ef->mode, parent,
> +				     ef->data, ef->fop);
> +
> +	if (!lookup)
> +		inode_unlock(parent->d_inode);
> +
> +	mutex_lock(&eventfs_mutex);
> +	if (IS_ERR_OR_NULL(dentry)) {

With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.

Note, if the ef is freed, it should not have a dentry.

> +		/* If the ef was already updated get it */
> +		dentry = ef->dentry;
> +		if (dentry && !lookup)
> +			dget(dentry);
> +		mutex_unlock(&eventfs_mutex);
> +		return dentry;
> +	}
> +
> +	if (!ef->dentry && !ef->is_freed) {

With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.

> +		ef->dentry = dentry;
> +		if (ef->ei)
> +			eventfs_post_create_dir(ef);
> +		dentry->d_fsdata = ef;
> +	} else {
> +		/* A race here, should try again (unless freed) */
> +		invalidate = true;

I had a WARN_ON() once here. Probably could add a:

		WARN_ON_ONCE(!ef->is_freed);

> +	}
> +	mutex_unlock(&eventfs_mutex);
> +	if (invalidate)
> +		d_invalidate(dentry);
> +
> +	if (lookup || invalidate)
> +		dput(dentry);
> +
> +	return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{

A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.

> +	bool ret;
> +
> +	mutex_lock(&eventfs_mutex);
> +	ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> +	mutex_unlock(&eventfs_mutex);
> +
> +	return ret;
> +}
> +
>  /**
>   * eventfs_root_lookup - lookup routine to create file/dir
>   * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
>   * Used to create dynamic file/dir with-in @dir, search with-in ei
>   * list, if @dentry found go ahead and create the file/dir
>   */
> -
>  static struct dentry *eventfs_root_lookup(struct inode *dir,
>  					  struct dentry *dentry,
>  					  unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
>  				 srcu_read_lock_held(&eventfs_srcu)) {
> -		if (strcmp(ef->name, dentry->d_name.name))
> +		if (!match_event_file(ef, dentry->d_name.name))
>  			continue;
>  		ret = simple_lookup(dir, dentry, flags);
> -		if (ef->created)
> -			continue;
> -		mutex_lock(&eventfs_mutex);
> -		ef->created = true;
> -		if (ef->ei)
> -			ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> -						ef->data, ef->fop, ef->iop);
> -		else
> -			ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> -						 ef->data, ef->fop);
> -
> -		if (IS_ERR_OR_NULL(ef->dentry)) {
> -			ef->created = false;
> -			mutex_unlock(&eventfs_mutex);
> -		} else {
> -			if (ef->ei)
> -				eventfs_post_create_dir(ef);
> -			ef->dentry->d_fsdata = ef;
> -			mutex_unlock(&eventfs_mutex);
> -			dput(ef->dentry);
> -		}
> +		create_dentry(ef, ef->d_parent, true);
>  		break;
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
>  	struct tracefs_inode *ti;
>  	struct eventfs_inode *ei;
>  	struct eventfs_file *ef;
> +	struct dentry *dentry;
>  	int idx;
>  
>  	ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_srcu(ef, &ei->e_top_files, list,
>  				 srcu_read_lock_held(&eventfs_srcu)) {
> -		if (ef->created)
> -			dput(ef->dentry);
> +		mutex_lock(&eventfs_mutex);
> +		dentry = ef->dentry;
> +		mutex_unlock(&eventfs_mutex);
> +		if (dentry)
> +			dput(dentry);
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
>  	return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
>  	ei = ti->private;
>  	idx = srcu_read_lock(&eventfs_srcu);
>  	list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> -		if (ef->created) {
> -			dget(ef->dentry);
> -			continue;
> -		}
> -		mutex_lock(&eventfs_mutex);
> -		ef->created = true;
> -
> -		inode_lock(dentry->d_inode);
> -		if (ef->ei)
> -			ef->dentry = create_dir(ef->name, ef->mode, dentry,
> -						ef->data, ef->fop, ef->iop);
> -		else
> -			ef->dentry = create_file(ef->name, ef->mode, dentry,
> -						 ef->data, ef->fop);
> -		inode_unlock(dentry->d_inode);
> -
> -		if (IS_ERR_OR_NULL(ef->dentry)) {
> -			ef->created = false;
> -		} else {
> -			if (ef->ei)
> -				eventfs_post_create_dir(ef);
> -			ef->dentry->d_fsdata = ef;
> -		}
> -		mutex_unlock(&eventfs_mutex);
> +		create_dentry(ef, dentry, false);
>  	}
>  	srcu_read_unlock(&eventfs_srcu, idx);
>  	return dcache_dir_open(inode, file);
>  }
>  
> -static const struct file_operations eventfs_file_operations = {
> -	.open           = dcache_dir_open_wrapper,
> -	.read		= generic_read_dir,
> -	.iterate_shared	= dcache_readdir,
> -	.llseek		= generic_file_llseek,
> -	.release        = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> -	.lookup		= eventfs_root_lookup,
> -};
> -
>  /**
>   * eventfs_prepare_ef - helper function to prepare eventfs_file
>   * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
>  	ti_parent = get_tracefs(parent->d_inode);
>  	ei_parent = ti_parent->private;
>  
> -	ef = eventfs_prepare_ef(name,
> -		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> -		&eventfs_file_operations,
> -		&eventfs_root_dir_inode_operations, NULL);
> -
> +	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

For directories, just use the hard coded values.

>  	if (IS_ERR(ef))
>  		return ef;
>  
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
>  	if (!ef_parent)
>  		return ERR_PTR(-EINVAL);
>  
> -	ef = eventfs_prepare_ef(name,
> -		S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> -		&eventfs_file_operations,
> -		&eventfs_root_dir_inode_operations, NULL);
> -
> +	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);

ditto.

>  	if (IS_ERR(ef))
>  		return ef;
>  
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
>  	return 0;
>  }
>  
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> -	struct eventfs_file *ef, *tmp;
> -	struct llist_node *llnode;
> -
> -	llnode = llist_del_all(&free_list);
> -	llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> -		if (ef->created && ef->dentry)
> -			dput(ef->dentry);
> -		kfree(ef->name);
> -		kfree(ef->ei);
> -		kfree(ef);
> -	}
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
>  static void free_ef(struct rcu_head *head)
>  {
>  	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>  
> -	if (!llist_add(&ef->llist, &free_list))
> -		return;
> -
> -	queue_work(system_unbound_wq, &eventfs_work);
> +	kfree(ef->name);
> +	kfree(ef->ei);
> +	kfree(ef);

Since I did not do the dput() or d_invalidate() here I don't need call this
from task context. This simplifies the process.

>  }
>  
> -
> -
>  /**
>   * eventfs_remove_rec - remove eventfs dir or file from list
>   * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
>   * This function recursively remove eventfs_file which
>   * contains info of file or dir.
>   */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
>  {
>  	struct eventfs_file *ef_child;
>  
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
>  		/* search for nested folders or files */
>  		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
>  					 lockdep_is_held(&eventfs_mutex)) {
> -			eventfs_remove_rec(ef_child, level + 1);
> +			eventfs_remove_rec(ef_child, head, level + 1);
>  		}
>  	}
>  
> -	if (ef->created && ef->dentry)
> -		d_invalidate(ef->dentry);
> -
>  	list_del_rcu(&ef->list);
> -	call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +	list_add_tail(&ef->del_list, head);

Hold off on freeing the ef. Add it to a link list to do so later.

>  }
>  
>  /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
>   */
>  void eventfs_remove(struct eventfs_file *ef)
>  {
> +	struct eventfs_file *tmp;
> +	LIST_HEAD(ef_del_list);
> +	struct dentry *dentry_list = NULL;
> +	struct dentry *dentry;
> +
>  	if (!ef)
>  		return;
>  
>  	mutex_lock(&eventfs_mutex);
> -	eventfs_remove_rec(ef, 0);
> +	eventfs_remove_rec(ef, &ef_del_list, 0);

The above returns back with ef_del_list holding all the ef's to be freed.

I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.

> +
> +	list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> +		if (ef->dentry) {
> +			unsigned long ptr = (unsigned long)dentry_list;
> +
> +			/* Keep the dentry from being freed yet */
> +			dget(ef->dentry);
> +
> +			/*
> +			 * Paranoid: The dget() above should prevent the dentry
> +			 * from being freed and calling eventfs_set_ef_status_free().
> +			 * But just in case, set the link list LSB pointer to 1
> +			 * and have eventfs_set_ef_status_free() check that to
> +			 * make sure that if it does happen, it will not think
> +			 * the d_fsdata is an event_file.
> +			 *
> +			 * For this to work, no event_file should be allocated
> +			 * on a odd space, as the ef should always be allocated
> +			 * to be at least word aligned. Check for that too.
> +			 */
> +			WARN_ON_ONCE(ptr & 1);
> +
> +			ef->dentry->d_fsdata = (void *)(ptr | 1);

Set the d_fsdata to be a link list. The comment above needs to say to say
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.


> +			dentry_list = ef->dentry;
> +			ef->dentry = NULL;
> +		}
> +		call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> +	}
>  	mutex_unlock(&eventfs_mutex);
> +
> +	while (dentry_list) {
> +		unsigned long ptr;
> +
> +		dentry = dentry_list;
> +		ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> +		dentry_list = (struct dentry *)ptr;
> +		dentry->d_fsdata = NULL;

With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!

> +		d_invalidate(dentry);
> +		mutex_lock(&eventfs_mutex);
> +		/* dentry should now have at least a single reference */
> +		WARN_ONCE((int)d_count(dentry) < 1,
> +			  "dentry %px less than one reference (%d) after invalidate\n",

I did update the above to:

		WARN_ONCE((int)d_count(dentry) < 1,
			  "dentry %px (%s) less than one reference (%d) after invalidate\n",
			  dentry, dentry->d_name.name, d_count(dentry));

To include the name of the dentry (my current work is triggering this still).

> +			  dentry, d_count(dentry));
> +		mutex_unlock(&eventfs_mutex);
> +		dput(dentry);
> +	}
>  }
>  
>  /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
>  struct dentry *tracefs_failed_creating(struct dentry *dentry);
>  struct inode *tracefs_get_inode(struct super_block *sb);
>  
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
>  #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>  
>  void eventfs_remove_events_dir(struct dentry *dentry);
>  
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -

Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.

-- Steve


>  struct dentry *tracefs_create_file(const char *name, umode_t mode,
>  				   struct dentry *parent, void *data,
>  				   const struct file_operations *fops);


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 13:19                 ` Steven Rostedt
@ 2023-07-21 17:17                   ` Nadav Amit
  2023-07-21 17:24                     ` Steven Rostedt
  2023-07-21 17:30                     ` Steven Rostedt
  2023-07-21 20:40                   ` Steven Rostedt
  1 sibling, 2 replies; 44+ messages in thread
From: Nadav Amit @ 2023-07-21 17:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ajay Kaher, shuah, mhiramat, Ching-lin Yu,
	Linux Kernel Mailing List, linux-kselftest, linux-trace-kernel,
	kernel test robot, oe-lkp, Alexey Makhalov, er.ajay.kaher,
	srivatsa, Tapas Kundu, Vasavi Sirnapalli


> On Jul 21, 2023, at 6:19 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>      union {
>> +             struct list_head        del_list;
>>              struct rcu_head         rcu;
>> -             struct llist_node       llist;  /* For freeing after RCU */
>> +             unsigned long           is_freed; /* Freed if one of the above is set */
> 
> I changed the freeing around. The dentries are freed before returning from
> eventfs_remove_dir().
> 
> I also added a "is_freed" field that is part of the union and is set if
> list elements have content. Note, since the union was criticized before, I
> will state the entire purpose of doing this patch set is to save memory.
> This structure will be used for every event file. What's the point of
> getting rid of dentries if we are replacing it with something just as big?
> Anyway, struct dentry does the exact same thing!

Hey, don’t shoot me…

[And admittedly, I didn’t review the whole series after v1.]

I understand your position, but I think that at least is_freed should not
be in the union, and you can just put it after umode_t.

Even for the matter of size, it should not matter in most architectures
since umode_t is 16-bit, as natural alignment is at least 32-bits.

[ And “bool" is clearer type for is_freed. ]


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 17:17                   ` Nadav Amit
@ 2023-07-21 17:24                     ` Steven Rostedt
  2023-07-21 17:30                     ` Steven Rostedt
  1 sibling, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 17:24 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Ajay Kaher, shuah, mhiramat, Ching-lin Yu,
	Linux Kernel Mailing List, linux-kselftest, linux-trace-kernel,
	kernel test robot, oe-lkp, Alexey Makhalov, er.ajay.kaher,
	srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Fri, 21 Jul 2023 17:17:24 +0000
Nadav Amit <namit@vmware.com> wrote:


> > I also added a "is_freed" field that is part of the union and is set if
> > list elements have content. Note, since the union was criticized before, I
> > will state the entire purpose of doing this patch set is to save memory.
> > This structure will be used for every event file. What's the point of
> > getting rid of dentries if we are replacing it with something just as big?
> > Anyway, struct dentry does the exact same thing!  
> 
> Hey, don’t shoot me…

 ;-)

> 
> [And admittedly, I didn’t review the whole series after v1.]
> 
> I understand your position, but I think that at least is_freed should not
> be in the union, and you can just put it after umode_t.
> 
> Even for the matter of size, it should not matter in most architectures
> since umode_t is 16-bit, as natural alignment is at least 32-bits.

My new code I'm working on removes the umode_t and will require this then.
I rather have that part tested before adding other drastic changes.

-- Steve


> 
> [ And “bool" is clearer type for is_freed. ]
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 17:17                   ` Nadav Amit
  2023-07-21 17:24                     ` Steven Rostedt
@ 2023-07-21 17:30                     ` Steven Rostedt
  1 sibling, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 17:30 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Ajay Kaher, shuah, mhiramat, Ching-lin Yu,
	Linux Kernel Mailing List, linux-kselftest, linux-trace-kernel,
	kernel test robot, oe-lkp, Alexey Makhalov, er.ajay.kaher,
	srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Fri, 21 Jul 2023 17:17:24 +0000
Nadav Amit <namit@vmware.com> wrote:

> [ And “bool" is clearer type for is_freed. ]

Oh, and to answer why I didn't use bool. I want to make sure that it
doesn't use just part of the other elements in the union. If the compiler
decides to use one byte for the bool (which it is perfectly valid to do
so), and it happens to map over a zero of the other elements in the union,
then it will give a false negative.

By using unsigned long, it will be guaranteed to contain some content.

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 13:18               ` Steven Rostedt
@ 2023-07-21 19:14                 ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 19:14 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Fri, 21 Jul 2023 09:18:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> OK, I got this working and it appears to pass all my tests. I actually got
> it working Wednesday night, but I tried a different approach on Thursday
> that got rid of the evenfs_file and only used eventfs_inodes and makes the
> files more dynamic. There's still a couple of corner cases that are not
> working with this approach (the dentry counters are getting out of sync).
> This should not stop this from going in. I'll continue working on that
> approach for the next merge cycle. But for now, here's the patch to this
> series that works.

Just figured out my bug with my new design. It was in the eventfs_release()
code, where I have a loop that does the dput on the children, but I wasn't
dput(child), I was dput(parent) in that loop!!

Anyway, for this merge window I much rather have this code in, and for the
next merge window I'll add this update.

I can post the new design too, but first let's focus on this.

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 13:19                 ` Steven Rostedt
  2023-07-21 17:17                   ` Nadav Amit
@ 2023-07-21 20:40                   ` Steven Rostedt
  2023-07-26 18:54                     ` Ajay Kaher
  1 sibling, 1 reply; 44+ messages in thread
From: Steven Rostedt @ 2023-07-21 20:40 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli

On Fri, 21 Jul 2023 09:19:47 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > +	} else {
> > +		/* A race here, should try again (unless freed) */
> > +		invalidate = true;  
> 
> I had a WARN_ON() once here. Probably could add a:
> 
> 		WARN_ON_ONCE(!ef->is_freed);

Yeah this should have a WARN_ON_ONCE() because the only way to get here
with having a dentry and the ef->dentry being set is if we have two
dentries with the same name in the same directory. Which should never
happen.

I think we can add:

		/*
		 * Should never happen unless we get here due to being freed.
		 * Otherwise it means two dentries exist with the same name.
		 */
		WARN_ON_ONCE(!ef->is_freed);

> 
> > +	}


-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 00/10] tracing: introducing eventfs
  2023-07-21 20:40                   ` Steven Rostedt
@ 2023-07-26 18:54                     ` Ajay Kaher
  0 siblings, 0 replies; 44+ messages in thread
From: Ajay Kaher @ 2023-07-26 18:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: shuah, mhiramat, Ching-lin Yu, linux-kernel, linux-kselftest,
	linux-trace-kernel, lkp, Nadav Amit, oe-lkp, Alexey Makhalov,
	er.ajay.kaher, srivatsa, Tapas Kundu, Vasavi Sirnapalli



> On 22-Jul-2023, at 2:10 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> !! External Email
> 
> On Fri, 21 Jul 2023 09:19:47 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> +   } else {
>>> +           /* A race here, should try again (unless freed) */
>>> +           invalidate = true;
>> 
>> I had a WARN_ON() once here. Probably could add a:
>> 
>>              WARN_ON_ONCE(!ef->is_freed);
> 
> Yeah this should have a WARN_ON_ONCE() because the only way to get here
> with having a dentry and the ef->dentry being set is if we have two
> dentries with the same name in the same directory. Which should never
> happen.
> 
> I think we can add:
> 
>                /*
>                 * Should never happen unless we get here due to being freed.
>                 * Otherwise it means two dentries exist with the same name.
>                 */
>                WARN_ON_ONCE(!ef->is_freed);

I missed WARN_ON_ONCE in v5 06/10, I will add in v6 06/10.

- Ajay


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-07-13 11:33 ` [PATCH v4 09/10] eventfs: Move tracing/events to eventfs Ajay Kaher
  2023-07-14 21:06   ` Steven Rostedt
@ 2023-09-08 12:14   ` Sven Schnelle
  2023-09-08 12:31     ` Steven Rostedt
  1 sibling, 1 reply; 44+ messages in thread
From: Sven Schnelle @ 2023-09-08 12:14 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: rostedt, shuah, mhiramat, chinglinyu, linux-kernel,
	linux-kselftest, linux-trace-kernel, lkp, namit, oe-lkp,
	amakhalov, er.ajay.kaher, srivatsa, tkundu, vsirnapalli

Ajay Kaher <akaher@vmware.com> writes:

> Till now /sys/kernel/debug/tracing/events is a part of tracefs,
> with-in this patch creating 'events' and it's sub-dir as eventfs.
> Basically replacing tracefs calls with eventfs calls for 'events'.
>
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Tested-by: Ching-lin Yu <chinglinyu@google.com>
> ---
>  fs/tracefs/inode.c           | 18 ++++++++++
>  include/linux/trace_events.h |  1 +
>  kernel/trace/trace.h         |  2 +-
>  kernel/trace/trace_events.c  | 67 +++++++++++++++++++-----------------
>  4 files changed, 55 insertions(+), 33 deletions(-)

With this patchset in next-20230908 the following crash is observed on
s390 while running ftracetest test.d/instances/instance-event.tc:

(It also crashes without KASAN, just not on every try)

[   26.728436] ==================================================================
[   26.728450] BUG: KASAN: slab-use-after-free in __ftrace_event_enable_disable+0x56c/0x648
[   26.728460] Read of size 8 at addr 00000000107d3838 by task ftracetest/514
[   26.728465]
[   26.728468] CPU: 1 PID: 514 Comm: ftracetest Not tainted 6.5.0-rc4-00014-g27152bceea1d #958
[   26.728474] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[   26.728478] Call Trace:
[   26.728482]  [<000000000262349e>] dump_stack_lvl+0x106/0x1c8
[   26.728490]  [<00000000009e3b8c>] print_address_description.constprop.0+0x34/0x378
[   26.728498]  [<00000000009e3f7c>] print_report+0xac/0x240
[   26.728504]  [<00000000009e42fa>] kasan_report+0xf2/0x130
[   26.728509]  [<00000000005c61c4>] __ftrace_event_enable_disable+0x56c/0x648
[   26.728516]  [<00000000005c6d02>] event_enable_write+0x132/0x218
[   26.728522]  [<0000000000a89598>] vfs_write+0x208/0x930
[   26.728528]  [<0000000000a8a148>] ksys_write+0x118/0x200
[   26.728535]  [<000000000010c5fc>] do_syscall+0x22c/0x328
[   26.728547]  [<000000000268e8b2>] __do_syscall+0x9a/0xf8
[   26.728553]  [<00000000026c1d28>] system_call+0x70/0x98
[   26.728559] 3 locks held by ftracetest/514:
[   26.728562]  #0: 00000000226addd8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x84/0xa0
[   26.728577]  #1: 0000000008f19440 (sb_writers#14){.+.+}-{0:0}, at: ksys_write+0x118/0x200
[   26.728594]  #2: 00000000033205d0 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0xdc/0x218
[   26.728609]
[   26.728613] Allocated by task 515:
[   26.728617]  kasan_save_stack+0x40/0x68
[   26.728627]  kasan_set_track+0x36/0x48
[   26.728632]  __kasan_slab_alloc+0x8e/0xb8
[   26.728638]  kmem_cache_alloc+0x1ec/0x568
[   26.728643]  trace_create_new_event+0x88/0x3d8
[   26.728648]  event_trace_add_tracer+0x13a/0x2a0
[   26.728654]  trace_array_create_dir+0x80/0x1a0
[   26.728660]  trace_array_create+0x344/0x700
[   26.728665]  instance_mkdir+0xbe/0x118
[   26.728670]  tracefs_syscall_mkdir+0xa8/0xf8
[   26.728678]  vfs_mkdir+0x454/0x6c0
[   26.728685]  do_mkdirat+0x24e/0x2a8
[   26.728692]  __s390x_sys_mkdir+0xf4/0x168
[   26.728696]  do_syscall+0x22c/0x328
[   26.728700]  __do_syscall+0x9a/0xf8
[   26.728704]  system_call+0x70/0x98
[   26.728710]
[   26.728713] Freed by task 553:
[   26.728716]  kasan_save_stack+0x40/0x68
[   26.728721]  kasan_set_track+0x36/0x48
[   26.728726]  kasan_save_free_info+0x42/0x60
[   26.728733]  ____kasan_slab_free+0x17c/0x1e0
[   26.728741]  __kasan_slab_free+0x24/0x30
[   26.728747]  slab_free_freelist_hook+0x272/0x3a0
[   26.728751]  kmem_cache_free+0xec/0x540
[   26.728757]  event_trace_del_tracer+0x108/0x1f8
[   26.728763]  __remove_instance+0x242/0x668
[   26.728781]  instance_rmdir+0xc4/0x108
[   26.728788]  tracefs_syscall_rmdir+0xd4/0x160
[   26.728796]  vfs_rmdir+0x18a/0x538
[   26.728802]  do_rmdir+0x2c0/0x358
[   26.728807]  __s390x_sys_rmdir+0xd2/0x138
[   26.728811]  do_syscall+0x22c/0x328
[   26.728817]  __do_syscall+0x9a/0xf8
[   26.728823]  system_call+0x70/0x98
[   26.728829]
[   26.728833] The buggy address belongs to the object at 00000000107d3828
[   26.728833]  which belongs to the cache trace_event_file of size 96
[   26.728841] The buggy address is located 16 bytes inside of
[   26.728841]  freed 96-byte region [00000000107d3828, 00000000107d3888)
[   26.728846]
[   26.728848] The buggy address belongs to the physical page:
[   26.728851] page:000040000041f4c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x107d3
[   26.728871] flags: 0x1ffff00000000200(slab|node=0|zone=0|lastcpupid=0x1ffff)
[   26.728880] page_type: 0xffffffff()
[   26.728889] raw: 1ffff00000000200 0000000005a2c100 0000400000891490 00004000006a89d0
[   26.728894] raw: 0000000000000000 0013002600000000 ffffffff00000001 0000000000000000
[   26.728898] page dumped because: kasan: bad access detected
[   26.728902]
[   26.728906] Memory state around the buggy address:
[   26.728912]  00000000107d3700: fc fc fc fc fc fc fc fc fc fc fc fa fb fb fb fb
[   26.728919]  00000000107d3780: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[   26.728928] >00000000107d3800: fc fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb
[   26.728933]                                         ^
[   26.728940]  00000000107d3880: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fa
[   26.728943]  00000000107d3900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
[   26.728946] ==================================================================
[   26.728955] Disabling lock debugging due to kernel taint
[   27.089717] Unable to handle kernel pointer dereference in virtual kernel address space
[   27.089725] Failing address: 0d896d6d6d6d6000 TEID: 0d896d6d6d6d6803
[   27.089729] Fault in home space mode while using kernel ASCE.
[   27.089737] AS:00000000046b800b R2:0000000000000028
[   27.089770] Oops: 0038 ilc:2 [#1] PREEMPT SMP
[   27.089779] Modules linked in:
[   27.089784] CPU: 1 PID: 514 Comm: ftracetest Tainted: G    B              6.5.0-rc4-00014-g27152bceea1d #958
[   27.089790] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[   27.089794] Krnl PSW : 0404e00180000000 00000000005c4488 (trace_event_buffer_reserve+0xf0/0x3e8)
[   27.089804]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[   27.089810] Krnl GPRS: 0000000000000001 0000000000000001 6b6b6b6b6b6b6c5b 001c000000000000
[   27.089816]            0d896d6d6d6d6d8b 00000000005c4702 001bff80013cf170 6b6b6b6b6b6b6b6b
[   27.089822]            6b6b6b6b6b6b6b6b 00037ff000279e26 00000000107d3828 001bff80013cf208
[   27.089827]            00000000031669f8 00000000026c1c1c 00000000005c4702 001bff80013cf078
[   27.089843] Krnl Code: 00000000005c447a: 412080f0            la      %r2,240(%r8)
[   27.089843]            00000000005c447e: eb420003000c        srlg    %r4,%r2,3
[   27.089843]           #00000000005c4484: b9080043            agr     %r4,%r3
[   27.089843]           >00000000005c4488: 95004000            cli     0(%r4),0
[   27.089843]            00000000005c448c: a7740142            brc     7,00000000005c4710
[   27.089843]            00000000005c4490: e3e080f00004        lg      %r14,240(%r8)
[   27.089843]            00000000005c4496: e3e0f0a80024        stg     %r14,168(%r15)
[   27.089843]            00000000005c449c: 412080f8            la      %r2,248(%r8)
[   27.089885] Call Trace:
[   27.089890]  [<00000000005c4488>] trace_event_buffer_reserve+0xf0/0x3e8
[   27.089897] ([<00000000005c4702>] trace_event_buffer_reserve+0x36a/0x3e8)
[   27.089903]  [<000000000028979e>] trace_event_raw_event_sched_switch+0xce/0x430
[   27.089910]  [<00000000026a1cd4>] __schedule+0xba4/0x1f58
[   27.089916]  [<00000000026a4544>] preempt_schedule_irq+0xdc/0x190
[   27.089925]  [<000000000268f99c>] irqentry_exit+0xac/0x128
[   27.089931]  [<00000000026c1fac>] ext_int_handler+0xc4/0xf0
[   27.089936]  [<0000000000135a98>] __unwind_start+0x190/0x568
[   27.089944] ([<0000000000135a7c>] __unwind_start+0x174/0x568)
[   27.089950]  [<000000000013bb90>] arch_stack_walk+0x130/0x1d0
[   27.089957]  [<0000000000424198>] stack_trace_save+0xc8/0xf8
[   27.089968]  [<00000000009d3468>] set_track_prepare+0x38/0x58
[   27.089974]  [<00000000009df35a>] free_to_partial_list+0x1ea/0x2f0
[   27.089979]  [<00000000009df656>] __slab_free+0x1f6/0x3c0
[   27.089984]  [<00000000009e19e8>] ___cache_free+0x158/0x188
[   27.089989]  [<00000000009e6df6>] qlist_free_all+0x7e/0x150
[   27.089994]  [<00000000009e752a>] kasan_quarantine_reduce+0x17a/0x1d8
[   27.089999]  [<00000000009e3488>] __kasan_slab_alloc+0x98/0xb8
[   27.090005]  [<00000000009de3fc>] kmem_cache_alloc+0x1ec/0x568
[   27.090011]  [<0000000000ab1d8e>] getname_flags.part.0+0x56/0x420
[   27.090018]  [<0000000000a81244>] do_sys_openat2+0xc4/0x178
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
[   27.090027]  [<0000000000a815ae>] do_sys_open+0xce/0x118
[   27.090033]  [<000000000010c5fc>] do_syscall+0x22c/0x328
[   27.090040]  [<000000000268e8b2>] __do_syscall+0x9a/0xf8
[   27.090048]  [<00000000026c1d28>] system_call+0x70/0x98
[   27.090056] INFO: lockdep is turned off.
[   27.090060] Last Breaking-Event-Address:
[   27.090066]  [<00000000005c4702>] trace_event_buffer_reserve+0x36a/0x3e8
[   27.090077] Kernel panic - not syncing: Fatal exception: panic_on_oops

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/10] eventfs: Move tracing/events to eventfs
  2023-09-08 12:14   ` Sven Schnelle
@ 2023-09-08 12:31     ` Steven Rostedt
  0 siblings, 0 replies; 44+ messages in thread
From: Steven Rostedt @ 2023-09-08 12:31 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Ajay Kaher, shuah, mhiramat, chinglinyu, linux-kernel,
	linux-kselftest, linux-trace-kernel, lkp, namit, oe-lkp,
	amakhalov, er.ajay.kaher, srivatsa, tkundu, vsirnapalli

On Fri, 08 Sep 2023 14:14:20 +0200
Sven Schnelle <svens@linux.ibm.com> wrote:

> Ajay Kaher <akaher@vmware.com> writes:
> 
> > Till now /sys/kernel/debug/tracing/events is a part of tracefs,
> > with-in this patch creating 'events' and it's sub-dir as eventfs.
> > Basically replacing tracefs calls with eventfs calls for 'events'.
> >
> > Signed-off-by: Ajay Kaher <akaher@vmware.com>
> > Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Tested-by: Ching-lin Yu <chinglinyu@google.com>
> > ---
> >  fs/tracefs/inode.c           | 18 ++++++++++
> >  include/linux/trace_events.h |  1 +
> >  kernel/trace/trace.h         |  2 +-
> >  kernel/trace/trace_events.c  | 67 +++++++++++++++++++-----------------
> >  4 files changed, 55 insertions(+), 33 deletions(-)  
> 
> With this patchset in next-20230908 the following crash is observed on
> s390 while running ftracetest test.d/instances/instance-event.tc:
> 
> (It also crashes without KASAN, just not on every try)

Yep, and I've spent the last few days fixing this :-)

  https://lore.kernel.org/linux-trace-kernel/20230907024710.866917011@goodmis.org/

-- Steve

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2023-09-08 12:31 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13 11:33 [PATCH v4 00/10] tracing: introducing eventfs Ajay Kaher
2023-07-13 11:33 ` [PATCH v4 01/10] tracing: Require all trace events to have a TRACE_SYSTEM Ajay Kaher
2023-07-13 11:33 ` [PATCH v4 02/10] tracefs: Rename some tracefs function Ajay Kaher
2023-07-14 15:11   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 03/10] eventfs: Implement eventfs dir creation functions Ajay Kaher
2023-07-14 15:44   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 04/10] eventfs: Implement eventfs file add functions Ajay Kaher
2023-07-14 16:23   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 05/10] eventfs: Implement eventfs file, directory remove function Ajay Kaher
2023-07-14 16:35   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 06/10] eventfs: Implement functions to create eventfs files and directories Ajay Kaher
2023-07-14 16:47   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 07/10] eventfs: Implement eventfs lookup, read, open functions Ajay Kaher
2023-07-14 20:17   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 08/10] eventfs: Implement tracefs_inode_cache Ajay Kaher
2023-07-14 20:27   ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 09/10] eventfs: Move tracing/events to eventfs Ajay Kaher
2023-07-14 21:06   ` Steven Rostedt
2023-07-19 11:08     ` Ajay Kaher
2023-07-19 11:08     ` Ajay Kaher
2023-09-08 12:14   ` Sven Schnelle
2023-09-08 12:31     ` Steven Rostedt
2023-07-13 11:33 ` [PATCH v4 10/10] test: ftrace: Fix kprobe test for eventfs Ajay Kaher
2023-07-14  2:37   ` Steven Rostedt
2023-07-14 13:27     ` Masami Hiramatsu
2023-07-17  5:24       ` Ajay Kaher
2023-07-17 12:24         ` Steven Rostedt
2023-07-14 22:58 ` [PATCH v4 00/10] tracing: introducing eventfs Steven Rostedt
2023-07-16 17:32   ` Ajay Kaher
2023-07-18 13:40     ` Steven Rostedt
2023-07-19 10:25       ` Ajay Kaher
2023-07-19 14:23         ` Steven Rostedt
2023-07-19 18:37           ` Ajay Kaher
2023-07-19 18:40             ` Steven Rostedt
2023-07-21 12:48               ` Steven Rostedt
2023-07-21 13:16                 ` Steven Rostedt
2023-07-21 13:19                 ` Steven Rostedt
2023-07-21 17:17                   ` Nadav Amit
2023-07-21 17:24                     ` Steven Rostedt
2023-07-21 17:30                     ` Steven Rostedt
2023-07-21 20:40                   ` Steven Rostedt
2023-07-26 18:54                     ` Ajay Kaher
2023-07-21 13:18               ` Steven Rostedt
2023-07-21 19:14                 ` Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.