linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8
@ 2024-01-04 16:47 Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 1/3] eventfs: Remove "lookup" parameter from create_dir/file_dentry() Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 16:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
eventfs/for-next

Head SHA1: 8186fff7ab649085e2c60d032d9a20a85af1d87c


Steven Rostedt (Google) (3):
      eventfs: Remove "lookup" parameter from create_dir/file_dentry()
      eventfs: Stop using dcache_readdir() for getdents()
      tracefs/eventfs: Use root and instance inodes as default ownership

----
 fs/tracefs/event_inode.c | 320 +++++++++++++++++++++++------------------------
 fs/tracefs/inode.c       | 198 ++++++++++++++++-------------
 fs/tracefs/internal.h    |   3 +
 3 files changed, 270 insertions(+), 251 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [for-next][PATCH 1/3] eventfs: Remove "lookup" parameter from create_dir/file_dentry()
  2024-01-04 16:47 [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8 Steven Rostedt
@ 2024-01-04 16:47 ` Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
  2 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 16:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ajay Kaher, Al Viro, Christian Brauner

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The "lookup" parameter is a way to differentiate the call to
create_file/dir_dentry() from when it's just a lookup (no need to up the
dentry refcount) and accessed via a readdir (need to up the refcount).

But reality, it just makes the code more complex. Just up the refcount and
let the caller decide to dput() the result or not.

Link: https://lore.kernel.org/linux-trace-kernel/20240103102553.17a19cea@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.517502710@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 fs/tracefs/event_inode.c | 55 +++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index f0677ea0ec24..c360300fb866 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -390,16 +390,14 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
  * @mode: The mode of the file.
  * @data: The data to use to set the inode of the file with on open()
  * @fops: The fops of the file to be created.
- * @lookup: If called by the lookup routine, in which case, dput() the created dentry.
  *
  * Create a dentry for a file of an eventfs_inode @ei and place it into the
- * address located at @e_dentry. If the @e_dentry already has a dentry, then
- * just do a dget() on it and return. Otherwise create the dentry and attach it.
+ * address located at @e_dentry.
  */
 static struct dentry *
 create_file_dentry(struct eventfs_inode *ei, int idx,
 		   struct dentry *parent, const char *name, umode_t mode, void *data,
-		   const struct file_operations *fops, bool lookup)
+		   const struct file_operations *fops)
 {
 	struct eventfs_attr *attr = NULL;
 	struct dentry **e_dentry = &ei->d_children[idx];
@@ -414,9 +412,7 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 	}
 	/* If the e_dentry already has a dentry, use it */
 	if (*e_dentry) {
-		/* lookup does not need to up the ref count */
-		if (!lookup)
-			dget(*e_dentry);
+		dget(*e_dentry);
 		mutex_unlock(&eventfs_mutex);
 		return *e_dentry;
 	}
@@ -441,13 +437,12 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 		 * way to being freed, don't return it. If e_dentry is NULL
 		 * it means it was already freed.
 		 */
-		if (ei->is_freed)
+		if (ei->is_freed) {
 			dentry = NULL;
-		else
+		} else {
 			dentry = *e_dentry;
-		/* The lookup does not need to up the dentry refcount */
-		if (dentry && !lookup)
 			dget(dentry);
+		}
 		mutex_unlock(&eventfs_mutex);
 		return dentry;
 	}
@@ -465,9 +460,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 	}
 	mutex_unlock(&eventfs_mutex);
 
-	if (lookup)
-		dput(dentry);
-
 	return dentry;
 }
 
@@ -500,13 +492,12 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei)
  * @pei: The eventfs_inode parent of ei.
  * @ei: The eventfs_inode to create the directory for
  * @parent: The dentry of the parent of this directory
- * @lookup: True if this is called by the lookup code
  *
  * This creates and attaches a directory dentry to the eventfs_inode @ei.
  */
 static struct dentry *
 create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
-		  struct dentry *parent, bool lookup)
+		  struct dentry *parent)
 {
 	struct dentry *dentry = NULL;
 
@@ -518,11 +509,9 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
 		return NULL;
 	}
 	if (ei->dentry) {
-		/* If the dentry already has a dentry, use it */
+		/* If the eventfs_inode already has a dentry, use it */
 		dentry = ei->dentry;
-		/* lookup does not need to up the ref count */
-		if (!lookup)
-			dget(dentry);
+		dget(dentry);
 		mutex_unlock(&eventfs_mutex);
 		return dentry;
 	}
@@ -542,7 +531,7 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
 		 * way to being freed.
 		 */
 		dentry = ei->dentry;
-		if (dentry && !lookup)
+		if (dentry)
 			dget(dentry);
 		mutex_unlock(&eventfs_mutex);
 		return dentry;
@@ -562,9 +551,6 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
 	}
 	mutex_unlock(&eventfs_mutex);
 
-	if (lookup)
-		dput(dentry);
-
 	return dentry;
 }
 
@@ -589,8 +575,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 	struct eventfs_inode *ei;
 	struct dentry *ei_dentry = NULL;
 	struct dentry *ret = NULL;
+	struct dentry *d;
 	const char *name = dentry->d_name.name;
-	bool created = false;
 	umode_t mode;
 	void *data;
 	int idx;
@@ -626,13 +612,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 		ret = simple_lookup(dir, dentry, flags);
 		if (IS_ERR(ret))
 			goto out;
-		create_dir_dentry(ei, ei_child, ei_dentry, true);
-		created = true;
-		break;
-	}
-
-	if (created)
+		d = create_dir_dentry(ei, ei_child, ei_dentry);
+		dput(d);
 		goto out;
+	}
 
 	for (i = 0; i < ei->nr_entries; i++) {
 		entry = &ei->entries[i];
@@ -650,8 +633,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 			ret = simple_lookup(dir, dentry, flags);
 			if (IS_ERR(ret))
 				goto out;
-			create_file_dentry(ei, i, ei_dentry, name, mode, cdata,
-					   fops, true);
+			d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
+			dput(d);
 			break;
 		}
 	}
@@ -768,9 +751,10 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
 	inode_lock(parent->d_inode);
 	list_for_each_entry_srcu(ei_child, &ei->children, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		d = create_dir_dentry(ei, ei_child, parent, false);
+		d = create_dir_dentry(ei, ei_child, parent);
 		if (d) {
 			ret = add_dentries(&dentries, d, cnt);
+			dput(d);
 			if (ret < 0)
 				break;
 			cnt++;
@@ -790,9 +774,10 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
 		mutex_unlock(&eventfs_mutex);
 		if (r <= 0)
 			continue;
-		d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false);
+		d = create_file_dentry(ei, i, parent, name, mode, cdata, fops);
 		if (d) {
 			ret = add_dentries(&dentries, d, cnt);
+			dput(d);
 			if (ret < 0)
 				break;
 			cnt++;
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 16:47 [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8 Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 1/3] eventfs: Remove "lookup" parameter from create_dir/file_dentry() Steven Rostedt
@ 2024-01-04 16:47 ` Steven Rostedt
  2024-01-04 18:46   ` Linus Torvalds
  2024-01-15 20:58   ` Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
  2 siblings, 2 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 16:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ajay Kaher, Al Viro, Christian Brauner

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The eventfs creates dynamically allocated dentries and inodes. Using the
dcache_readdir() logic for its own directory lookups requires hiding the
cursor of the dcache logic and playing games to allow the dcache_readdir()
to still have access to the cursor while the eventfs saved what it created
and what it needs to release.

Instead, just have eventfs have its own iterate_shared callback function
that will fill in the dent entries. This simplifies the code quite a bit.

Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 fs/tracefs/event_inode.c | 194 +++++++++++++--------------------------
 1 file changed, 64 insertions(+), 130 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index c360300fb866..41af56f44f0a 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -52,9 +52,7 @@ enum {
 static struct dentry *eventfs_root_lookup(struct inode *dir,
 					  struct dentry *dentry,
 					  unsigned int flags);
-static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
-static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx);
-static int eventfs_release(struct inode *inode, struct file *file);
+static int eventfs_iterate(struct file *file, struct dir_context *ctx);
 
 static void update_attr(struct eventfs_attr *attr, struct iattr *iattr)
 {
@@ -148,11 +146,9 @@ static const struct inode_operations eventfs_file_inode_operations = {
 };
 
 static const struct file_operations eventfs_file_operations = {
-	.open		= dcache_dir_open_wrapper,
 	.read		= generic_read_dir,
-	.iterate_shared	= dcache_readdir_wrapper,
+	.iterate_shared	= eventfs_iterate,
 	.llseek		= generic_file_llseek,
-	.release	= eventfs_release,
 };
 
 /* Return the evenfs_inode of the "events" directory */
@@ -643,128 +639,87 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 	return ret;
 }
 
-struct dentry_list {
-	void			*cursor;
-	struct dentry		**dentries;
-};
-
-/**
- * eventfs_release - called to release eventfs file/dir
- * @inode: inode to be released
- * @file: file to be released (not used)
- */
-static int eventfs_release(struct inode *inode, struct file *file)
-{
-	struct tracefs_inode *ti;
-	struct dentry_list *dlist = file->private_data;
-	void *cursor;
-	int i;
-
-	ti = get_tracefs(inode);
-	if (!(ti->flags & TRACEFS_EVENT_INODE))
-		return -EINVAL;
-
-	if (WARN_ON_ONCE(!dlist))
-		return -EINVAL;
-
-	for (i = 0; dlist->dentries && dlist->dentries[i]; i++) {
-		dput(dlist->dentries[i]);
-	}
-
-	cursor = dlist->cursor;
-	kfree(dlist->dentries);
-	kfree(dlist);
-	file->private_data = cursor;
-	return dcache_dir_close(inode, file);
-}
-
-static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt)
-{
-	struct dentry **tmp;
-
-	tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS);
-	if (!tmp)
-		return -1;
-	tmp[cnt] = d;
-	tmp[cnt + 1] = NULL;
-	*dentries = tmp;
-	return 0;
-}
-
-/**
- * dcache_dir_open_wrapper - eventfs open wrapper
- * @inode: not used
- * @file: dir to be opened (to create it's children)
- *
- * Used to dynamic create file/dir with-in @file, all the
- * file/dir will be created. If already created then references
- * will be increased
+/*
+ * Walk the children of a eventfs_inode to fill in getdents().
  */
-static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
+static int eventfs_iterate(struct file *file, struct dir_context *ctx)
 {
 	const struct file_operations *fops;
+	struct inode *f_inode = file_inode(file);
 	const struct eventfs_entry *entry;
 	struct eventfs_inode *ei_child;
 	struct tracefs_inode *ti;
 	struct eventfs_inode *ei;
-	struct dentry_list *dlist;
-	struct dentry **dentries = NULL;
-	struct dentry *parent = file_dentry(file);
-	struct dentry *d;
-	struct inode *f_inode = file_inode(file);
-	const char *name = parent->d_name.name;
+	struct dentry *ei_dentry = NULL;
+	struct dentry *dentry;
+	const char *name;
 	umode_t mode;
-	void *data;
-	int cnt = 0;
 	int idx;
-	int ret;
-	int i;
-	int r;
+	int ret = -EINVAL;
+	int ino;
+	int i, r, c;
+
+	if (!dir_emit_dots(file, ctx))
+		return 0;
 
 	ti = get_tracefs(f_inode);
 	if (!(ti->flags & TRACEFS_EVENT_INODE))
 		return -EINVAL;
 
-	if (WARN_ON_ONCE(file->private_data))
-		return -EINVAL;
+	c = ctx->pos - 2;
 
 	idx = srcu_read_lock(&eventfs_srcu);
 
 	mutex_lock(&eventfs_mutex);
 	ei = READ_ONCE(ti->private);
+	if (ei && !ei->is_freed)
+		ei_dentry = READ_ONCE(ei->dentry);
 	mutex_unlock(&eventfs_mutex);
 
-	if (!ei) {
-		srcu_read_unlock(&eventfs_srcu, idx);
-		return -EINVAL;
-	}
-
-
-	data = ei->data;
+	if (!ei || !ei_dentry)
+		goto out;
 
-	dlist = kmalloc(sizeof(*dlist), GFP_KERNEL);
-	if (!dlist) {
-		srcu_read_unlock(&eventfs_srcu, idx);
-		return -ENOMEM;
-	}
+	ret = 0;
 
-	inode_lock(parent->d_inode);
+	/*
+	 * Need to create the dentries and inodes to have a consistent
+	 * inode number.
+	 */
 	list_for_each_entry_srcu(ei_child, &ei->children, list,
 				 srcu_read_lock_held(&eventfs_srcu)) {
-		d = create_dir_dentry(ei, ei_child, parent);
-		if (d) {
-			ret = add_dentries(&dentries, d, cnt);
-			dput(d);
-			if (ret < 0)
-				break;
-			cnt++;
+
+		if (c > 0) {
+			c--;
+			continue;
 		}
+
+		if (ei_child->is_freed)
+			continue;
+
+		name = ei_child->name;
+
+		dentry = create_dir_dentry(ei, ei_child, ei_dentry);
+		if (!dentry)
+			goto out;
+		ino = dentry->d_inode->i_ino;
+		dput(dentry);
+
+		if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR))
+			goto out;
+		ctx->pos++;
 	}
 
 	for (i = 0; i < ei->nr_entries; i++) {
-		void *cdata = data;
+		void *cdata = ei->data;
+
+		if (c > 0) {
+			c--;
+			continue;
+		}
+
 		entry = &ei->entries[i];
 		name = entry->name;
+
 		mutex_lock(&eventfs_mutex);
 		/* If ei->is_freed, then the event itself may be too */
 		if (!ei->is_freed)
@@ -774,42 +729,21 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
 		mutex_unlock(&eventfs_mutex);
 		if (r <= 0)
 			continue;
-		d = create_file_dentry(ei, i, parent, name, mode, cdata, fops);
-		if (d) {
-			ret = add_dentries(&dentries, d, cnt);
-			dput(d);
-			if (ret < 0)
-				break;
-			cnt++;
-		}
-	}
-	inode_unlock(parent->d_inode);
-	srcu_read_unlock(&eventfs_srcu, idx);
-	ret = dcache_dir_open(inode, file);
 
-	/*
-	 * dcache_dir_open() sets file->private_data to a dentry cursor.
-	 * Need to save that but also save all the dentries that were
-	 * opened by this function.
-	 */
-	dlist->cursor = file->private_data;
-	dlist->dentries = dentries;
-	file->private_data = dlist;
-	return ret;
-}
+		dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
+		if (!dentry)
+			goto out;
+		ino = dentry->d_inode->i_ino;
+		dput(dentry);
 
-/*
- * This just sets the file->private_data back to the cursor and back.
- */
-static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx)
-{
-	struct dentry_list *dlist = file->private_data;
-	int ret;
+		if (!dir_emit(ctx, name, strlen(name), ino, DT_REG))
+			goto out;
+		ctx->pos++;
+	}
+	ret = 1;
+ out:
+	srcu_read_unlock(&eventfs_srcu, idx);
 
-	file->private_data = dlist->cursor;
-	ret = dcache_readdir(file, ctx);
-	dlist->cursor = file->private_data;
-	file->private_data = dlist;
 	return ret;
 }
 
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership
  2024-01-04 16:47 [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8 Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 1/3] eventfs: Remove "lookup" parameter from create_dir/file_dentry() Steven Rostedt
  2024-01-04 16:47 ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
@ 2024-01-04 16:47 ` Steven Rostedt
  2024-01-04 18:38   ` Linus Torvalds
  2 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 16:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Al Viro, Christian Brauner, Greg Kroah-Hartman

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Instead of walking the dentries on mount/remount to update the gid values of
all the dentries if a gid option is specified on mount, just update the root
inode. Add .getattr, .setattr, and .permissions on the tracefs inode
operations to update the permissions of the files and directories.

For all files and directories in the top level instance:

 /sys/kernel/tracing/*

It will use the root inode as the default permissions. The inode that
represents: /sys/kernel/tracing (or wherever it is mounted).

When an instance is created:

 mkdir /sys/kernel/tracing/instance/foo

The directory "foo" and all its files and directories underneath will use
the default of what foo is when it was created. A remount of tracefs will
not affect it.

If a user were to modify the permissions of any file or directory in
tracefs, it will also no longer be modified by a change in ownership of a
remount.

The events directory, if it is in the top level instance, will use the
tracefs root inode as the default ownership for itself and all the files and
directories below it.

For the events directory in an instance ("foo"), it will keep the ownership
of what it was when it was created, and that will be used as the default
ownership for the files and directories beneath it.

Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wjVdGkjDXBbvLn2wbZnqP4UsH46E3gqJ9m7UG6DpX2+WA@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240103215016.1e0c9811@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 fs/tracefs/event_inode.c |  79 +++++++++++++++-
 fs/tracefs/inode.c       | 198 ++++++++++++++++++++++-----------------
 fs/tracefs/internal.h    |   3 +
 3 files changed, 190 insertions(+), 90 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 41af56f44f0a..72912b5f9a90 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -45,6 +45,7 @@ enum {
 	EVENTFS_SAVE_MODE	= BIT(16),
 	EVENTFS_SAVE_UID	= BIT(17),
 	EVENTFS_SAVE_GID	= BIT(18),
+	EVENTFS_TOPLEVEL	= BIT(19),
 };
 
 #define EVENTFS_MODE_MASK	(EVENTFS_SAVE_MODE - 1)
@@ -115,10 +116,17 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
 		 * The events directory dentry is never freed, unless its
 		 * part of an instance that is deleted. It's attr is the
 		 * default for its child files and directories.
-		 * Do not update it. It's not used for its own mode or ownership
+		 * Do not update it. It's not used for its own mode or ownership.
 		 */
-		if (!ei->is_events)
+		if (ei->is_events) {
+			/* But it still needs to know if it was modified */
+			if (iattr->ia_valid & ATTR_UID)
+				ei->attr.mode |= EVENTFS_SAVE_UID;
+			if (iattr->ia_valid & ATTR_GID)
+				ei->attr.mode |= EVENTFS_SAVE_GID;
+		} else {
 			update_attr(&ei->attr, iattr);
+		}
 
 	} else {
 		name = dentry->d_name.name;
@@ -136,9 +144,66 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
 	return ret;
 }
 
+static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry)
+{
+	struct inode *inode;
+
+	/* Only update if the "events" was on the top level */
+	if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL))
+		return;
+
+	/* Get the tracefs root inode. */
+	inode = d_inode(dentry->d_sb->s_root);
+	ei->attr.uid = inode->i_uid;
+	ei->attr.gid = inode->i_gid;
+}
+
+static void set_top_events_ownership(struct inode *inode)
+{
+	struct tracefs_inode *ti = get_tracefs(inode);
+	struct eventfs_inode *ei = ti->private;
+	struct dentry *dentry;
+
+	/* The top events directory doesn't get automatically updated */
+	if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL))
+		return;
+
+	dentry = ei->dentry;
+
+	update_top_events_attr(ei, dentry);
+
+	if (!(ei->attr.mode & EVENTFS_SAVE_UID))
+		inode->i_uid = ei->attr.uid;
+
+	if (!(ei->attr.mode & EVENTFS_SAVE_GID))
+		inode->i_gid = ei->attr.gid;
+}
+
+static int eventfs_get_attr(struct mnt_idmap *idmap,
+			    const struct path *path, struct kstat *stat,
+			    u32 request_mask, unsigned int flags)
+{
+	struct dentry *dentry = path->dentry;
+	struct inode *inode = d_backing_inode(dentry);
+
+	set_top_events_ownership(inode);
+
+	generic_fillattr(idmap, request_mask, inode, stat);
+	return 0;
+}
+
+static int eventfs_permission(struct mnt_idmap *idmap,
+			      struct inode *inode, int mask)
+{
+	set_top_events_ownership(inode);
+	return generic_permission(idmap, inode, mask);
+}
+
 static const struct inode_operations eventfs_root_dir_inode_operations = {
 	.lookup		= eventfs_root_lookup,
 	.setattr	= eventfs_set_attr,
+	.getattr	= eventfs_get_attr,
+	.permission	= eventfs_permission,
 };
 
 static const struct inode_operations eventfs_file_inode_operations = {
@@ -174,6 +239,8 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 	} while (!ei->is_events);
 	mutex_unlock(&eventfs_mutex);
 
+	update_top_events_attr(ei, dentry);
+
 	return ei;
 }
 
@@ -887,6 +954,14 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
 	uid = d_inode(dentry->d_parent)->i_uid;
 	gid = d_inode(dentry->d_parent)->i_gid;
 
+	/*
+	 * If the events directory is of the top instance, then parent
+	 * is NULL. Set the attr.mode to reflect this and its permissions will
+	 * default to the tracefs root dentry.
+	 */
+	if (!parent)
+		ei->attr.mode = EVENTFS_TOPLEVEL;
+
 	/* This is used as the default ownership of the files and directories */
 	ei->attr.uid = uid;
 	ei->attr.gid = gid;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index bc86ffdb103b..e1b172c0e091 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -91,6 +91,7 @@ static int tracefs_syscall_mkdir(struct mnt_idmap *idmap,
 				 struct inode *inode, struct dentry *dentry,
 				 umode_t mode)
 {
+	struct tracefs_inode *ti;
 	char *name;
 	int ret;
 
@@ -98,6 +99,15 @@ static int tracefs_syscall_mkdir(struct mnt_idmap *idmap,
 	if (!name)
 		return -ENOMEM;
 
+	/*
+	 * This is a new directory that does not take the default of
+	 * the rootfs. It becomes the default permissions for all the
+	 * files and directories underneath it.
+	 */
+	ti = get_tracefs(inode);
+	ti->flags |= TRACEFS_INSTANCE_INODE;
+	ti->private = inode;
+
 	/*
 	 * The mkdir call can call the generic functions that create
 	 * the files within the tracefs system. It is up to the individual
@@ -141,10 +151,76 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 	return ret;
 }
 
-static const struct inode_operations tracefs_dir_inode_operations = {
+static void set_tracefs_inode_owner(struct inode *inode)
+{
+	struct tracefs_inode *ti = get_tracefs(inode);
+	struct inode *root_inode = ti->private;
+
+	/*
+	 * If this inode has never been referenced, then update
+	 * the permissions to the superblock.
+	 */
+	if (!(ti->flags & TRACEFS_UID_PERM_SET))
+		inode->i_uid = root_inode->i_uid;
+
+	if (!(ti->flags & TRACEFS_GID_PERM_SET))
+		inode->i_gid = root_inode->i_gid;
+}
+
+static int tracefs_permission(struct mnt_idmap *idmap,
+			      struct inode *inode, int mask)
+{
+	set_tracefs_inode_owner(inode);
+	return generic_permission(idmap, inode, mask);
+}
+
+static int tracefs_getattr(struct mnt_idmap *idmap,
+			   const struct path *path, struct kstat *stat,
+			   u32 request_mask, unsigned int flags)
+{
+	struct inode *inode = d_backing_inode(path->dentry);
+
+	set_tracefs_inode_owner(inode);
+	generic_fillattr(idmap, request_mask, inode, stat);
+	return 0;
+}
+
+static int tracefs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+			   struct iattr *attr)
+{
+	unsigned int ia_valid = attr->ia_valid;
+	struct inode *inode = d_inode(dentry);
+	struct tracefs_inode *ti = get_tracefs(inode);
+
+	if (ia_valid & ATTR_UID)
+		ti->flags |= TRACEFS_UID_PERM_SET;
+
+	if (ia_valid & ATTR_GID)
+		ti->flags |= TRACEFS_GID_PERM_SET;
+
+	return simple_setattr(idmap, dentry, attr);
+}
+
+static const struct inode_operations tracefs_instance_dir_inode_operations = {
 	.lookup		= simple_lookup,
 	.mkdir		= tracefs_syscall_mkdir,
 	.rmdir		= tracefs_syscall_rmdir,
+	.permission	= tracefs_permission,
+	.getattr	= tracefs_getattr,
+	.setattr	= tracefs_setattr,
+};
+
+static const struct inode_operations tracefs_dir_inode_operations = {
+	.lookup		= simple_lookup,
+	.permission	= tracefs_permission,
+	.getattr	= tracefs_getattr,
+	.setattr	= tracefs_setattr,
+};
+
+static const struct inode_operations tracefs_file_inode_operations = {
+	.permission	= tracefs_permission,
+	.getattr	= tracefs_getattr,
+	.setattr	= tracefs_setattr,
 };
 
 struct inode *tracefs_get_inode(struct super_block *sb)
@@ -183,87 +259,6 @@ struct tracefs_fs_info {
 	struct tracefs_mount_opts mount_opts;
 };
 
-static void change_gid(struct dentry *dentry, kgid_t gid)
-{
-	if (!dentry->d_inode)
-		return;
-	dentry->d_inode->i_gid = gid;
-}
-
-/*
- * Taken from d_walk, but without he need for handling renames.
- * Nothing can be renamed while walking the list, as tracefs
- * does not support renames. This is only called when mounting
- * or remounting the file system, to set all the files to
- * the given gid.
- */
-static void set_gid(struct dentry *parent, kgid_t gid)
-{
-	struct dentry *this_parent;
-	struct list_head *next;
-
-	this_parent = parent;
-	spin_lock(&this_parent->d_lock);
-
-	change_gid(this_parent, gid);
-repeat:
-	next = this_parent->d_subdirs.next;
-resume:
-	while (next != &this_parent->d_subdirs) {
-		struct tracefs_inode *ti;
-		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
-		next = tmp->next;
-
-		/* Note, getdents() can add a cursor dentry with no inode */
-		if (!dentry->d_inode)
-			continue;
-
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-
-		change_gid(dentry, gid);
-
-		/* If this is the events directory, update that too */
-		ti = get_tracefs(dentry->d_inode);
-		if (ti && (ti->flags & TRACEFS_EVENT_INODE))
-			eventfs_update_gid(dentry, gid);
-
-		if (!list_empty(&dentry->d_subdirs)) {
-			spin_unlock(&this_parent->d_lock);
-			spin_release(&dentry->d_lock.dep_map, _RET_IP_);
-			this_parent = dentry;
-			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
-			goto repeat;
-		}
-		spin_unlock(&dentry->d_lock);
-	}
-	/*
-	 * All done at this level ... ascend and resume the search.
-	 */
-	rcu_read_lock();
-ascend:
-	if (this_parent != parent) {
-		struct dentry *child = this_parent;
-		this_parent = child->d_parent;
-
-		spin_unlock(&child->d_lock);
-		spin_lock(&this_parent->d_lock);
-
-		/* go into the first sibling still alive */
-		do {
-			next = child->d_child.next;
-			if (next == &this_parent->d_subdirs)
-				goto ascend;
-			child = list_entry(next, struct dentry, d_child);
-		} while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
-		rcu_read_unlock();
-		goto resume;
-	}
-	rcu_read_unlock();
-	spin_unlock(&this_parent->d_lock);
-	return;
-}
-
 static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
 {
 	substring_t args[MAX_OPT_ARGS];
@@ -336,10 +331,8 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
 	if (!remount || opts->opts & BIT(Opt_uid))
 		inode->i_uid = opts->uid;
 
-	if (!remount || opts->opts & BIT(Opt_gid)) {
-		/* Set all the group ids to the mount option */
-		set_gid(sb->s_root, opts->gid);
-	}
+	if (!remount || opts->opts & BIT(Opt_gid))
+		inode->i_gid = opts->gid;
 
 	return 0;
 }
@@ -573,6 +566,26 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
 	return dentry;
 }
 
+/* Find the inode that this will use for default */
+static struct inode *instance_inode(struct dentry *parent, struct inode *inode)
+{
+	struct tracefs_inode *ti;
+
+	/* If parent is NULL then use root inode */
+	if (!parent)
+		return d_inode(inode->i_sb->s_root);
+
+	/* Find the inode that is flagged as an instance or the root inode */
+	while (!IS_ROOT(parent)) {
+		ti = get_tracefs(d_inode(parent));
+		if (ti->flags & TRACEFS_INSTANCE_INODE)
+			break;
+		parent = parent->d_parent;
+	}
+
+	return d_inode(parent);
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
@@ -603,6 +616,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
 				   const struct file_operations *fops)
 {
+	struct tracefs_inode *ti;
 	struct dentry *dentry;
 	struct inode *inode;
 
@@ -621,7 +635,11 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	if (unlikely(!inode))
 		return tracefs_failed_creating(dentry);
 
+	ti = get_tracefs(inode);
+	ti->private = instance_inode(parent, inode);
+
 	inode->i_mode = mode;
+	inode->i_op = &tracefs_file_inode_operations;
 	inode->i_fop = fops ? fops : &tracefs_file_operations;
 	inode->i_private = data;
 	inode->i_uid = d_inode(dentry->d_parent)->i_uid;
@@ -634,6 +652,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
 				   const struct inode_operations *ops)
 {
+	struct tracefs_inode *ti;
 	struct dentry *dentry = tracefs_start_creating(name, parent);
 	struct inode *inode;
 
@@ -651,6 +670,9 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
 	inode->i_uid = d_inode(dentry->d_parent)->i_uid;
 	inode->i_gid = d_inode(dentry->d_parent)->i_gid;
 
+	ti = get_tracefs(inode);
+	ti->private = instance_inode(parent, inode);
+
 	/* directory inodes start off with i_nlink == 2 (for "." entry) */
 	inc_nlink(inode);
 	d_instantiate(dentry, inode);
@@ -681,7 +703,7 @@ struct dentry *tracefs_create_dir(const char *name, struct dentry *parent)
 	if (security_locked_down(LOCKDOWN_TRACEFS))
 		return NULL;
 
-	return __create_dir(name, parent, &simple_dir_inode_operations);
+	return __create_dir(name, parent, &tracefs_dir_inode_operations);
 }
 
 /**
@@ -712,7 +734,7 @@ __init struct dentry *tracefs_create_instance_dir(const char *name,
 	if (WARN_ON(tracefs_ops.mkdir || tracefs_ops.rmdir))
 		return NULL;
 
-	dentry = __create_dir(name, parent, &tracefs_dir_inode_operations);
+	dentry = __create_dir(name, parent, &tracefs_instance_dir_inode_operations);
 	if (!dentry)
 		return NULL;
 
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 42bdeb471a07..12b7d0150ae9 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -5,6 +5,9 @@
 enum {
 	TRACEFS_EVENT_INODE		= BIT(1),
 	TRACEFS_EVENT_TOP_INODE		= BIT(2),
+	TRACEFS_GID_PERM_SET		= BIT(3),
+	TRACEFS_UID_PERM_SET		= BIT(4),
+	TRACEFS_INSTANCE_INODE		= BIT(5),
 };
 
 struct tracefs_inode {
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership
  2024-01-04 16:47 ` [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
@ 2024-01-04 18:38   ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2024-01-04 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Al Viro, Christian Brauner, Greg Kroah-Hartman

On Thu, 4 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Instead of walking the dentries on mount/remount to update the gid values of
> all the dentries if a gid option is specified on mount, just update the root
> inode. Add .getattr, .setattr, and .permissions on the tracefs inode
> operations to update the permissions of the files and directories.

Looks mostly good, thanks. This may add more lines than it removes,
but the lines it adds are *much* simpler than the removed ones.

I don't understand why you do those odd TRACEFS_INSTANCE_INODE games.
That seems entirely new functionality. The old'set_gid()' thing did
none of that, and just forced everything to new gid values.

IOW, this seems entirely random. I *suspect* that you have just tried
to retain some odd random semantics that happened to be the result of
a random implementation detail that came out of the dentry tree not
necessarily being fully populated by the time you did the remount.

So this seems wrong.

             Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 16:47 ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
@ 2024-01-04 18:46   ` Linus Torvalds
  2024-01-04 19:02     ` Steven Rostedt
  2024-01-15 20:58   ` Steven Rostedt
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2024-01-04 18:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>         list_for_each_entry_srcu(ei_child, &ei->children, list,
>                                  srcu_read_lock_held(&eventfs_srcu)) {
> +
> +               if (c > 0) {
> +                       c--;
> +                       continue;
>                 }

Thanks for putting that at the top, I really do think it's not just
more efficient, but "more correct" too - ie if some entry that *used*
to exist and was previously counted by 'pos' went away, it's actually
*better* to count it again if we still see it, in order to not skip
subsequent entries that haven't been seen..

And that very fact actually makes me wonder:

>         for (i = 0; i < ei->nr_entries; i++) {
> +               void *cdata = ei->data;
> +
> +               if (c > 0) {
> +                       c--;
> +                       continue;
> +               }

The 'ei->nr_entries' things are in a stable array, so the indexing for
them cannot change (ie even if "is_freed" were to be set the array is
still stable).

So I wonder if - just from a 'pos' iterator stability standpoint - you
should change the tracefs directory iterator to always start with the
non-directory entries in ei->entries[]?

That way, even if concurrent dynamic add/remove events might change
the 'ei->children' list, it could never cause an 'ei->entry[]' to
disappear (or be returned twice).

This is very nitpicky and I doubt it matters, because I doubt the
whole "ls on a tracefs directory while changing it" case matters, but
I thought I'd mention it.

              Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 18:46   ` Linus Torvalds
@ 2024-01-04 19:02     ` Steven Rostedt
  2024-01-04 20:05       ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 10:46:04 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> And that very fact actually makes me wonder:
> 
> >         for (i = 0; i < ei->nr_entries; i++) {
> > +               void *cdata = ei->data;
> > +
> > +               if (c > 0) {
> > +                       c--;
> > +                       continue;
> > +               }  
> 
> The 'ei->nr_entries' things are in a stable array, so the indexing for
> them cannot change (ie even if "is_freed" were to be set the array is
> still stable).

Yeah, the entries is fixed. If the ei itself gets freed, so does all the
entries at the same time. The individual entries should too.

Hmm, it probably doesn't even make sense to continue the loop if is_freed
is set as the same variable will be checked every time. :-/  Should just be
a "break;" not a "continue;"

> 
> So I wonder if - just from a 'pos' iterator stability standpoint - you
> should change the tracefs directory iterator to always start with the
> non-directory entries in ei->entries[]?

Sure I can do that. I only did it this way because I followed the way the
other functions did the child directories first and then the files (the
entries array represents the files, as the eventfs_inode represents
directories).

> 
> That way, even if concurrent dynamic add/remove events might change
> the 'ei->children' list, it could never cause an 'ei->entry[]' to
> disappear (or be returned twice).
> 
> This is very nitpicky and I doubt it matters, because I doubt the
> whole "ls on a tracefs directory while changing it" case matters, but
> I thought I'd mention it.

I may add this update, but as a separate patch.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 19:02     ` Steven Rostedt
@ 2024-01-04 20:05       ` Steven Rostedt
  2024-01-04 20:18         ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 20:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 14:02:46 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > And that very fact actually makes me wonder:
> >   
> > >         for (i = 0; i < ei->nr_entries; i++) {
> > > +               void *cdata = ei->data;
> > > +
> > > +               if (c > 0) {
> > > +                       c--;
> > > +                       continue;
> > > +               }    
> > 
> > The 'ei->nr_entries' things are in a stable array, so the indexing for
> > them cannot change (ie even if "is_freed" were to be set the array is
> > still stable).  
> 
> Yeah, the entries is fixed. If the ei itself gets freed, so does all the
> entries at the same time. The individual entries should too.
> 
> Hmm, it probably doesn't even make sense to continue the loop if is_freed
> is set as the same variable will be checked every time. :-/  Should just be
> a "break;" not a "continue;"

Also, I just realized it breaks if we update the 'c--' before the callback. :-/

I have to put this check *after* the callback check.

Reason being, the callback can say "this event doesn't get this file" and
return 0, which tells eventfs to skip this file.

For example, we have

 # ls /sys/kernel/tracing/events/ftrace/function
format  hist  hist_debug  id  inject

and

 # ls /sys/kernel/tracing/events/sched/sched_switch/
enable  filter  format  hist  hist_debug  id  inject  trigger

The "ftrace" event files are for information only. They describe the
internal events. For example, the "function" event is what is written into
the ring buffer by the function tracer. That event is not enabled by the
events directory. It is only enabled when "function" is written into
current_tracer.

Same for filtering. The filter logic for function events is done by the
"set_ftrace_filter" file in tracefs.

But the "sched_switch" event is enabled and filtered by the eventfs
directory. Here we see "enable" and "filter" files that the user could
write into to enabled and/or filter the sched_switch event.

Now because the "function" and "sched_switch" registered the same way, the
callback that handles the two has:

static int event_callback(const char *name, umode_t *mode, void **data,
			  const struct file_operations **fops)
{
	struct trace_event_file *file = *data;
	struct trace_event_call *call = file->event_call;

[..]
	if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
		if (call->class->reg && strcmp(name, "enable") == 0) {
			*mode = TRACE_MODE_WRITE;
			*fops = &ftrace_enable_fops;
			return 1;
		}

		if (strcmp(name, "filter") == 0) {
			*mode = TRACE_MODE_WRITE;
			*fops = &ftrace_event_filter_fops;
			return 1;
		}
	}
[..]
	return 0;
}

Where the function event has that "TRACE_EVENT_FL_IGNORE_ENABLE" flag set,
so when the entry for "enable" or "filter" is passed to it, it returns 0.

Before, where we had c-- after the callback, we had:

 # ls /sys/kernel/tracing/events/ftrace/function
format  hist  hist_debug  id  inject

After changing the c-- to before the callback I now get:

 # ls /sys/kernel/tracing/events/ftrace/function/
format  hist  hist  hist_debug  hist_debug  id  inject  inject

Where the missing "enable" and "filter" files caused the indexing to be
off, and the ls repeated "hist" and "hist_debug" because of it. Oh, and the
function event doesn't have "trigger" either which is why we get two
"inject" files.

I have to move the c-- update down. I can still do it before creating the
dentry though, as if that fails, we should just bail.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 20:05       ` Steven Rostedt
@ 2024-01-04 20:18         ` Linus Torvalds
  2024-01-04 20:33           ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2024-01-04 20:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 at 12:04, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Also, I just realized it breaks if we update the 'c--' before the callback. :-/
>
> I have to put this check *after* the callback check.

What? No.

> Reason being, the callback can say "this event doesn't get this file" and
> return 0, which tells eventfs to skip this file.

So yes, there seems to be a bug there, in that ctx->pos is only
updated for successful callbacks (and not for "ignored entry").

But that just means that you should always update 'ctx->pos' as you
'continue' the loop.

The logical place to do that would be in the for-loop itself, which
actually is very natural for the simple case, ie you should just do

        for (i = 0; i < ei->nr_entries; i++, ctx->pos++) {

but in the list_for_each_entry_srcu() case the 'update' part of the
for-loop isn't actually accessible, so it would have to be at the
'continue' point(s).

Which is admittedly a bit annoying.

Looking at that I'm actually surprised that I don't recall that we'd
have hit that issue with our 'for_each_xyz()' loops before.

The update for our "for_each_xyz()" helpers are all hardcoded to just
do the "next iterator" thing, and there's no nice way to take
advantage of the normal for-loop semantics of "do this at the end of
the loop"

            Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 20:18         ` Linus Torvalds
@ 2024-01-04 20:33           ` Steven Rostedt
  0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-01-04 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 12:18:06 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 4 Jan 2024 at 12:04, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Also, I just realized it breaks if we update the 'c--' before the callback. :-/
> >
> > I have to put this check *after* the callback check.  
> 
> What? No.
> 
> > Reason being, the callback can say "this event doesn't get this file" and
> > return 0, which tells eventfs to skip this file.  
> 
> So yes, there seems to be a bug there, in that ctx->pos is only
> updated for successful callbacks (and not for "ignored entry").

OK, I wasn't sure if it was OK to update the ctx->pos for something we
didn't add, so I avoided doing so.

> 
> But that just means that you should always update 'ctx->pos' as you
> 'continue' the loop.
> 
> The logical place to do that would be in the for-loop itself, which
> actually is very natural for the simple case, ie you should just do
> 
>         for (i = 0; i < ei->nr_entries; i++, ctx->pos++) {

Well, we don't want to do that and c-- at the same time. But of course, if
we do the shortcut, we can have:

	for (i = c; i < ei->nr_entries; i++, ctx->pos++) {

which would be OK. And better if we move it before the ei->children list walk.

> 
> but in the list_for_each_entry_srcu() case the 'update' part of the
> for-loop isn't actually accessible, so it would have to be at the
> 'continue' point(s).
> 
> Which is admittedly a bit annoying.

But not really an issue as we just have:

	list_for_each_entry_srcu(ei_child, &ei->children, list,
				 srcu_read_lock_held(&eventfs_srcu)) {

		if (c > 0) {
			c--;
			continue;
		}

		ctx->pos++;

> 
> Looking at that I'm actually surprised that I don't recall that we'd
> have hit that issue with our 'for_each_xyz()' loops before.
> 
> The update for our "for_each_xyz()" helpers are all hardcoded to just
> do the "next iterator" thing, and there's no nice way to take
> advantage of the normal for-loop semantics of "do this at the end of
> the loop"

Anyway, if I do count ctx->pos++ for every iteration, whether it added
something or not, it appears to work. I'll write up a couple of patches to
handle this.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-04 16:47 ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
  2024-01-04 18:46   ` Linus Torvalds
@ 2024-01-15 20:58   ` Steven Rostedt
  1 sibling, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2024-01-15 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 04 Jan 2024 11:47:05 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The eventfs creates dynamically allocated dentries and inodes. Using the
> dcache_readdir() logic for its own directory lookups requires hiding the
> cursor of the dcache logic and playing games to allow the dcache_readdir()
> to still have access to the cursor while the eventfs saved what it created
> and what it needs to release.
> 
> Instead, just have eventfs have its own iterate_shared callback function
> that will fill in the dent entries. This simplifies the code quite a bit.
> 
> Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org

So now I know why Ajay did what he did with the
dcach_dir_open_wrapper():

  https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/

Basically it crashed with:

> [   41.602625][ T4377] kernel BUG at fs/dcache.c:2031!
> [   41.602625][ T4374] RSP: 0018:ffa000000fcdfcd0 EFLAGS: 00010286
> [   41.602629][ T4374] RAX: 0000000000000002 RBX: ff11000109392980 RCX: 0000000000000000
> [   41.602630][ T4374] RDX: 0000000000000000 RSI: ff1100405e46c6f0 RDI: ff1100405f05afc0
> [   41.602631][ T4374] RBP: ff1100405f05afc0 R08: ffffffff830ad0e0 R09: 0000000000000000
> [   41.602632][ T4374] R10: 0000000000000280 R11: ffffffff8162036a R12: 0000000000000000
> [   41.602633][ T4374] R13: ff1100405e46c6f0 R14: ff1100405f05aff8 R15: 0000000000000000
> [   41.602634][ T4374] FS:  00007f2582ff9740(0000) GS:ff1100407fa80000(0000) knlGS:0000000000000000
> [   41.602635][ T4374] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.602635][ T4374] CR2: 00005624511f3328 CR3: 000000208a342006 CR4: 0000000000771ef0
> [   41.602636][ T4374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   41.602637][ T4374] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   41.602638][ T4374] PKRU: 55555554
> [   41.602638][ T4374] Call Trace:
> [   41.602640][ T4374]  <TASK>
> [ 41.602642][ T4374] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447) 
> [ 41.602644][ T4374] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153) 
> [ 41.602645][ T4374] ? d_instantiate (fs/dcache.c:2031 (discriminator 1)) 
> [ 41.602647][ T4374] ? do_error_trap (arch/x86/include/asm/traps.h:59 arch/x86/kernel/traps.c:174) 
> [ 41.602648][ T4374] ? d_instantiate (fs/dcache.c:2031 (discriminator 1)) 
> [ 41.602649][ T4374] ? exc_invalid_op (arch/x86/kernel/traps.c:265) 
> [ 41.602652][ T4374] ? d_instantiate (fs/dcache.c:2031 (discriminator 1)) 
> [ 41.602653][ T4374] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) 
> [ 41.602655][ T4374] ? tracefs_alloc_inode (fs/tracefs/inode.c:38) 
> [ 41.602657][ T4374] ? d_instantiate (fs/dcache.c:2031 (discriminator 1)) 
> [ 41.602659][ T4374] create_dir_dentry (fs/tracefs/event_inode.c:329 fs/tracefs/event_inode.c:516) 
> [ 41.602661][ T4374] eventfs_root_lookup (fs/tracefs/event_inode.c:611) 
> [ 41.602662][ T4374] ? terminate_walk (fs/namei.c:691) 
> [ 41.602665][ T4374] __lookup_slow (fs/namei.c:1694) 
> [ 41.602667][ T4374] lookup_one_len (fs/namei.c:2746 (discriminator 1)) 
> [ 41.602669][ T4374] eventfs_start_creating (fs/tracefs/inode.c:536) 
> [ 41.602671][ T4374] create_dir_dentry (fs/tracefs/event_inode.c:309 fs/tracefs/event_inode.c:516) 
> [ 41.602673][ T4374] eventfs_iterate (fs/tracefs/event_inode.c:701) 
> [ 41.602674][ T4374] ? atime_needs_update (fs/inode.c:1842 fs/inode.c:1994) 
> [ 41.602677][ T4374] iterate_dir (fs/readdir.c:106) 
> [ 41.602680][ T4374] __x64_sys_getdents (fs/readdir.c:323 fs/readdir.c:307 fs/readdir.c:307) 
> [ 41.602682][ T4374] ? __pfx_filldir (fs/readdir.c:260) 
> [ 41.602684][ T4374] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [ 41.602686][ T4374] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) 

I guess that's because I'm creating the inodes and dentries in the
iterare_shared() and not at open. The inodes and dentries are created
with a ref count of zero, as they should be freed when no longer used.
But I don't think the getdents() can handle that in the iterator part.
I do need to create the inodes/dentries in the open, up their ref count
and dec the ref counts at the release.

But I still do not need the dcache_readdir() code to do it. I just have
to create an array of inodes to use at the open (like the code did
before this update), and have the iterate_shared() use that array, and
then the release just decrement the inodes/dentries.

I'm off today, so I will likely fix that tomorrow.

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-01-15 20:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-04 16:47 [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8 Steven Rostedt
2024-01-04 16:47 ` [for-next][PATCH 1/3] eventfs: Remove "lookup" parameter from create_dir/file_dentry() Steven Rostedt
2024-01-04 16:47 ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
2024-01-04 18:46   ` Linus Torvalds
2024-01-04 19:02     ` Steven Rostedt
2024-01-04 20:05       ` Steven Rostedt
2024-01-04 20:18         ` Linus Torvalds
2024-01-04 20:33           ` Steven Rostedt
2024-01-15 20:58   ` Steven Rostedt
2024-01-04 16:47 ` [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
2024-01-04 18:38   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).