linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy)
@ 2024-01-31 18:49 Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it Steven Rostedt
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman


Linus took the time to massively clean up the eventfs logic.
I took his code and made tweaks to address some of the feedback
from Al Viro and to fix issues that came up in testing.

The diff between v1 and this can be found here:
  https://lore.kernel.org/linux-trace-kernel/20240131105847.3e9afcb8@gandalf.local.home/
 
  Note that since that update, I changed the first patch to use
  memset_after().

I would like to have this entire series go all the way back to 6.6 (after it
is accepted in mainline, of course) and replace everything since the creation
of the eventfs code.  That is, stable releases may need to add all the
patches that are in fs/tracefs to make that happen. The reason is that
this rewrite likely fixed a lot of hidden bugs, and I honestly believe it's
more stable than the code that currently exists.

Note, there are more cleanups that can happen. One is cleaning up
the eventfs_inode structure. But that's not critical now and can be
added later.

This made it through one round of my testing. I'm going to run it
again but with the part of testing that also runs some tests on
each patch in the series to make sure it doesn't break bisection.

In Linus's first version, patch 5 broke some of the tests but was fixed
in patch 6. I swapped the order and moved patch 6 before patch 5
and it appears to work. I still need to run this through all
my testing again.

Version 1 is at: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-1-torvalds@linux-foundation.org/



Linus Torvalds (6):
      eventfs: Initialize the tracefs inode properly
      tracefs: Avoid using the ei->dentry pointer unnecessarily
      tracefs: dentry lookup crapectomy
      eventfs: Remove unused 'd_parent' pointer field
      eventfs: Clean up dentry ops and add revalidate function
      eventfs: Get rid of dentry pointers without refcounts

Steven Rostedt (Google) (1):
      tracefs: Zero out the tracefs_inode when allocating it

----
 fs/tracefs/event_inode.c | 551 ++++++++++++-----------------------------------
 fs/tracefs/inode.c       | 102 ++-------
 fs/tracefs/internal.h    |  18 +-
 3 files changed, 167 insertions(+), 504 deletions(-)


* [PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 2/7] eventfs: Initialize the tracefs inode properly Steven Rostedt
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable, kernel test robot

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

eventfs uses the tracefs_inode and assumes that it's already initialized
to zero. That is, it doesn't set fields to zero (like ti->private) after
getting its tracefs_inode. This causes bugs due to stale values.

Just initialize the entire structure to zero on allocation so there aren't
any more surprises.

This is a partial fix for the access to ti->private. The assignment still needs
to be made before the dentry is instantiated.

Cc: stable@vger.kernel.org
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since last version: https://lore.kernel.org/all/20240130230612.377a1933@gandalf.local.home/

- Moved vfs_inode to top of tracefs_inode structure so that the rest can
  be initialized with memset_after() as the vfs_inode portion is already
  cleared with a memset() itself in inode_init_once().
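
As a rough userspace illustration of the layout trick described above (not
the kernel code itself), the sketch below zeroes everything that follows a
leading member. The struct names, OFFSET_END() and ZERO_AFTER() are
hypothetical stand-ins; the kernel provides its own memset_after() helper,
and the first memset() here only stands in for what inode_init_once() does
to the vfs_inode portion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct inode. */
struct fake_inode {
	unsigned long i_ino;
	unsigned int  i_mode;
};

/* Hypothetical stand-in for struct tracefs_inode, vfs_inode first. */
struct fake_tracefs_inode {
	struct fake_inode vfs_inode;
	unsigned long     flags;
	void              *private;
};

/* Offset of the first byte past 'member' within 'type'. */
#define OFFSET_END(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Zero every byte of *obj that follows 'member' (padding included). */
#define ZERO_AFTER(type, obj, member) \
	memset((unsigned char *)(obj) + OFFSET_END(type, member), 0, \
	       sizeof(type) - OFFSET_END(type, member))

static void fake_init_once(struct fake_tracefs_inode *ti)
{
	/* Stands in for inode_init_once() clearing the vfs_inode portion. */
	memset(&ti->vfs_inode, 0, sizeof(ti->vfs_inode));

	/* Zero out the rest. */
	ZERO_AFTER(struct fake_tracefs_inode, ti, vfs_inode);
}
```

Placing the embedded inode first means any field added after it later is
covered by the same call without touching the init code.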

 fs/tracefs/inode.c    | 6 ++++--
 fs/tracefs/internal.h | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index e1b172c0e091..888e42087847 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -38,8 +38,6 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
 	if (!ti)
 		return NULL;
 
-	ti->flags = 0;
-
 	return &ti->vfs_inode;
 }
 
@@ -779,7 +777,11 @@ static void init_once(void *foo)
 {
 	struct tracefs_inode *ti = (struct tracefs_inode *) foo;
 
+	/* inode_init_once() calls memset() on the vfs_inode portion */
 	inode_init_once(&ti->vfs_inode);
+
+	/* Zero out the rest */
+	memset_after(ti, 0, vfs_inode);
 }
 
 static int __init tracefs_init(void)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 91c2bf0b91d9..7d84349ade87 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -11,9 +11,10 @@ enum {
 };
 
 struct tracefs_inode {
+	struct inode            vfs_inode;
+	/* The below gets initialized with memset_after(ti, 0, vfs_inode) */
 	unsigned long           flags;
 	void                    *private;
-	struct inode            vfs_inode;
 };
 
 /*
-- 
2.43.0




* [PATCH v2 2/7] eventfs: Initialize the tracefs inode properly
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 3/7] tracefs: Avoid using the ei->dentry pointer unnecessarily Steven Rostedt
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable, kernel test robot

From: Linus Torvalds <torvalds@linux-foundation.org>

The tracefs-specific fields in the inode were not initialized before the
inode was exposed to others through the dentry with 'd_instantiate()'.

Move the field initializations up to before the d_instantiate.
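
The ordering rule can be sketched in userspace C as "fill in every field,
then publish the pointer". This is only an analogy under stated assumptions:
struct fake_node, publish_node() and read_private() are hypothetical names,
and the C11 release/acquire pair merely stands in for making the object
visible to others; it makes no claim about how d_instantiate() is
implemented.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the tracefs-specific inode fields. */
struct fake_node {
	unsigned long flags;
	void          *private;
};

/* The slot through which other threads can find the node. */
static _Atomic(struct fake_node *) published;

static void publish_node(struct fake_node *n, void *private)
{
	/* Initialize every field first ... */
	n->flags |= 1UL;
	n->private = private;

	/* ... and only then make the node reachable by others. */
	atomic_store_explicit(&published, n, memory_order_release);
}

static void *read_private(void)
{
	struct fake_node *n;

	n = atomic_load_explicit(&published, memory_order_acquire);
	return n ? n->private : NULL;
}
```

A reader that finds the node through the published slot then never observes
a half-initialized private pointer.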

Cc: stable@vger.kernel.org
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-2-torvalds@linux-foundation.org

-  Since another patch zeroed out the entire tracefs_inode, there's no need
   to initialize any of its fields to NULL.

 fs/tracefs/event_inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 1c3dd0ad4660..824b1811e342 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -370,6 +370,8 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
 
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;
+	/* Only directories have ti->private set to an ei, not files */
+	ti->private = ei;
 
 	inc_nlink(inode);
 	d_instantiate(dentry, inode);
@@ -515,7 +517,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 static void eventfs_post_create_dir(struct eventfs_inode *ei)
 {
 	struct eventfs_inode *ei_child;
-	struct tracefs_inode *ti;
 
 	lockdep_assert_held(&eventfs_mutex);
 
@@ -525,9 +526,6 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei)
 				 srcu_read_lock_held(&eventfs_srcu)) {
 		ei_child->d_parent = ei->dentry;
 	}
-
-	ti = get_tracefs(ei->dentry->d_inode);
-	ti->private = ei;
 }
 
 /**
-- 
2.43.0




* [PATCH v2 3/7] tracefs: Avoid using the ei->dentry pointer unnecessarily
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 2/7] eventfs: Initialize the tracefs inode properly Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 4/7] tracefs: dentry lookup crapectomy Steven Rostedt
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

The eventfs_find_events() code tries to walk up the tree to find the
event directory that a dentry belongs to, in order to then find the
eventfs inode that is associated with that event directory.

However, it uses an odd combination of walking the dentry parent,
looking up the eventfs inode associated with that, and then looking up
the dentry from there.  Repeat.

But the code shouldn't have back-pointers to dentries in the first
place, and it should just walk the dentry parenthood chain directly.

Similarly, 'set_top_events_ownership()' looks up the dentry from the
eventfs inode, but the only reason it wants a dentry is to look up the
superblock in order to look up the root dentry.

But it already has the real filesystem inode, which has that same
superblock pointer.  So just pass in the superblock pointer using the
information that's already there, instead of looking up extraneous data
that is irrelevant.
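
A minimal userspace sketch of walking the parent chain directly, with no
back-pointer from the per-directory data to the dentry. The fake_dentry and
fake_ei types are hypothetical stand-ins, and the NULL/is_freed check is an
assumption about what the elided part of the loop tests for.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct eventfs_inode. */
struct fake_ei {
	int is_events;	/* set only on the top "events" directory */
	int is_freed;
};

/* Hypothetical stand-in for struct dentry. */
struct fake_dentry {
	struct fake_dentry *d_parent;
	struct fake_ei     *d_fsdata;
};

static struct fake_ei *find_events(struct fake_dentry *dentry)
{
	struct fake_ei *ei;

	do {
		/* Step up one level; parents are stable without renames. */
		dentry = dentry->d_parent;
		ei = dentry->d_fsdata;

		/* Assumed bail-out: a missing or dying directory ends the walk. */
		if (!ei || ei->is_freed)
			return NULL;
	} while (!ei->is_events);

	return ei;
}
```

The walk only ever follows d_parent, so the per-directory data does not need
to remember which dentry it belongs to.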

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/

Cc: stable@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Original patch: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-1-torvalds@linux-foundation.org

 fs/tracefs/event_inode.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 824b1811e342..e9819d719d2a 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -156,33 +156,30 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
 	return ret;
 }
 
-static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry)
+static void update_top_events_attr(struct eventfs_inode *ei, struct super_block *sb)
 {
-	struct inode *inode;
+	struct inode *root;
 
 	/* Only update if the "events" was on the top level */
 	if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL))
 		return;
 
 	/* Get the tracefs root inode. */
-	inode = d_inode(dentry->d_sb->s_root);
-	ei->attr.uid = inode->i_uid;
-	ei->attr.gid = inode->i_gid;
+	root = d_inode(sb->s_root);
+	ei->attr.uid = root->i_uid;
+	ei->attr.gid = root->i_gid;
 }
 
 static void set_top_events_ownership(struct inode *inode)
 {
 	struct tracefs_inode *ti = get_tracefs(inode);
 	struct eventfs_inode *ei = ti->private;
-	struct dentry *dentry;
 
 	/* The top events directory doesn't get automatically updated */
 	if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL))
 		return;
 
-	dentry = ei->dentry;
-
-	update_top_events_attr(ei, dentry);
+	update_top_events_attr(ei, inode->i_sb);
 
 	if (!(ei->attr.mode & EVENTFS_SAVE_UID))
 		inode->i_uid = ei->attr.uid;
@@ -235,8 +232,10 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 
 	mutex_lock(&eventfs_mutex);
 	do {
-		/* The parent always has an ei, except for events itself */
-		ei = dentry->d_parent->d_fsdata;
+		// The parent is stable because we do not do renames
+		dentry = dentry->d_parent;
+		// ... and directories always have d_fsdata
+		ei = dentry->d_fsdata;
 
 		/*
 		 * If the ei is being freed, the ownership of the children
@@ -246,12 +245,11 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 			ei = NULL;
 			break;
 		}
-
-		dentry = ei->dentry;
+		// Walk upwards until you find the events inode
 	} while (!ei->is_events);
 	mutex_unlock(&eventfs_mutex);
 
-	update_top_events_attr(ei, dentry);
+	update_top_events_attr(ei, dentry->d_sb);
 
 	return ei;
 }
-- 
2.43.0




* [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
                   ` (2 preceding siblings ...)
  2024-01-31 18:49 ` [PATCH v2 3/7] tracefs: Avoid using the ei->dentry pointer unnecessarily Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-02-01  0:27   ` Al Viro
  2024-01-31 18:49 ` [PATCH v2 5/7] eventfs: Remove unused d_parent pointer field Steven Rostedt
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

The dentry lookup for eventfs files was very broken, and had lots of
signs of the old situation where the filesystem names were all created
statically in the dentry tree, rather than being looked up dynamically
based on the eventfs data structures.

You could see it in the naming - how it claimed to "create" dentries
rather than just look up the dentries that were given to it.

You could see it in various nonsensical and very incorrect operations,
like using "simple_lookup()" on the dentries that were passed in, which
only results in those dentries becoming negative dentries.  Which meant
that any other lookup would possibly return ENOENT if it saw that
negative dentry before the data was then later filled in.

You could see it in the immense amount of nonsensical code that didn't
actually just do lookups.
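
The shape of a lookup that consults only the in-memory data and reports a
miss as an error, rather than leaving a negative entry behind, might look
like the userspace sketch below. All names (fake_child, fake_lookup(), the
FAKE_FOUND_* values) are hypothetical; the real code hands back dentries and
ERR_PTR() values rather than integers.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a child directory entry. */
struct fake_child {
	const char *name;
	int         is_freed;
};

enum { FAKE_FOUND_DIR = 1, FAKE_FOUND_FILE = 2 };

/*
 * Look up 'name' among the child directories first, then among the
 * file entries. Returns what was found, or -ENOENT on a miss.
 */
static int fake_lookup(const struct fake_child *children, size_t nr_children,
		       const char *const *entries, size_t nr_entries,
		       const char *name)
{
	for (size_t i = 0; i < nr_children; i++) {
		if (strcmp(children[i].name, name) != 0)
			continue;
		/* A directory that is being torn down counts as missing. */
		if (children[i].is_freed)
			return -ENOENT;
		return FAKE_FOUND_DIR;
	}

	for (size_t i = 0; i < nr_entries; i++) {
		if (strcmp(entries[i], name) == 0)
			return FAKE_FOUND_FILE;
	}

	/* Nothing is cached for a miss; just report the error. */
	return -ENOENT;
}
```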

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/

Cc: stable@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-3-torvalds@linux-foundation.org

- Fixed the lookup case of not found dentry, to return an error.
  This was added in a later patch when it should have been in this one.

- Removed the calls to eventfs_{start,end,failed}_creating()

 fs/tracefs/event_inode.c | 285 ++++++++-------------------------------
 fs/tracefs/inode.c       |  69 ----------
 fs/tracefs/internal.h    |   3 -
 3 files changed, 58 insertions(+), 299 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index e9819d719d2a..4878f4d578be 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -230,7 +230,6 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 {
 	struct eventfs_inode *ei;
 
-	mutex_lock(&eventfs_mutex);
 	do {
 		// The parent is stable because we do not do renames
 		dentry = dentry->d_parent;
@@ -247,7 +246,6 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 		}
 		// Walk upwards until you find the events inode
 	} while (!ei->is_events);
-	mutex_unlock(&eventfs_mutex);
 
 	update_top_events_attr(ei, dentry->d_sb);
 
@@ -280,11 +278,10 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
 }
 
 /**
- * create_file - create a file in the tracefs filesystem
- * @name: the name of the file to create.
+ * lookup_file - look up a file in the tracefs filesystem
+ * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
- * @parent: parent dentry for this file.
  * @data: something that the caller will want to get to later on.
  * @fop: struct file_operations that should be used for this file.
  *
@@ -292,13 +289,13 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *lookup_file(struct dentry *dentry,
+				  umode_t mode,
 				  struct eventfs_attr *attr,
-				  struct dentry *parent, void *data,
+				  void *data,
 				  const struct file_operations *fop)
 {
 	struct tracefs_inode *ti;
-	struct dentry *dentry;
 	struct inode *inode;
 
 	if (!(mode & S_IFMT))
@@ -307,15 +304,9 @@ static struct dentry *create_file(const char *name, umode_t mode,
 	if (WARN_ON_ONCE(!S_ISREG(mode)))
 		return NULL;
 
-	WARN_ON_ONCE(!parent);
-	dentry = eventfs_start_creating(name, parent);
-
-	if (IS_ERR(dentry))
-		return dentry;
-
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return eventfs_failed_creating(dentry);
+		return ERR_PTR(-ENOMEM);
 
 	/* If the user updated the directory's attributes, use them */
 	update_inode_attr(dentry, inode, attr, mode);
@@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, umode_t mode,
 
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;
-	d_instantiate(dentry, inode);
+
+	d_add(dentry, inode);
 	fsnotify_create(dentry->d_parent->d_inode, dentry);
-	return eventfs_end_creating(dentry);
+	return dentry;
 };
 
 /**
- * create_dir - create a dir in the tracefs filesystem
+ * lookup_dir_entry - look up a dir in the tracefs filesystem
+ * @dentry: the directory to look up
  * @ei: the eventfs_inode that represents the directory to create
- * @parent: parent dentry for this file.
  *
- * This function will create a dentry for a directory represented by
+ * This function will look up a dentry for a directory represented by
  * a eventfs_inode.
  */
-static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent)
+static struct dentry *lookup_dir_entry(struct dentry *dentry,
+	struct eventfs_inode *pei, struct eventfs_inode *ei)
 {
 	struct tracefs_inode *ti;
-	struct dentry *dentry;
 	struct inode *inode;
 
-	dentry = eventfs_start_creating(ei->name, parent);
-	if (IS_ERR(dentry))
-		return dentry;
-
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
-		return eventfs_failed_creating(dentry);
+		return ERR_PTR(-ENOMEM);
 
 	/* If the user updated the directory's attributes, use them */
 	update_inode_attr(dentry, inode, &ei->attr,
@@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
 	/* Only directories have ti->private set to an ei, not files */
 	ti->private = ei;
 
+	dentry->d_fsdata = ei;
+        ei->dentry = dentry;	// Remove me!
+
 	inc_nlink(inode);
-	d_instantiate(dentry, inode);
+	d_add(dentry, inode);
 	inc_nlink(dentry->d_parent->d_inode);
 	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-	return eventfs_end_creating(dentry);
+	return dentry;
 }
 
 static void free_ei(struct eventfs_inode *ei)
@@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
 }
 
 /**
- * create_file_dentry - create a dentry for a file of an eventfs_inode
+ * lookup_file_dentry - create a dentry for a file of an eventfs_inode
  * @ei: the eventfs_inode that the file will be created under
  * @idx: the index into the d_children[] of the @ei
  * @parent: The parent dentry of the created file.
@@ -438,157 +429,21 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
  * address located at @e_dentry.
  */
 static struct dentry *
-create_file_dentry(struct eventfs_inode *ei, int idx,
-		   struct dentry *parent, const char *name, umode_t mode, void *data,
+lookup_file_dentry(struct dentry *dentry,
+		   struct eventfs_inode *ei, int idx,
+		   umode_t mode, void *data,
 		   const struct file_operations *fops)
 {
 	struct eventfs_attr *attr = NULL;
 	struct dentry **e_dentry = &ei->d_children[idx];
-	struct dentry *dentry;
-
-	WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
 
-	mutex_lock(&eventfs_mutex);
-	if (ei->is_freed) {
-		mutex_unlock(&eventfs_mutex);
-		return NULL;
-	}
-	/* If the e_dentry already has a dentry, use it */
-	if (*e_dentry) {
-		dget(*e_dentry);
-		mutex_unlock(&eventfs_mutex);
-		return *e_dentry;
-	}
-
-	/* ei->entry_attrs are protected by SRCU */
 	if (ei->entry_attrs)
 		attr = &ei->entry_attrs[idx];
 
-	mutex_unlock(&eventfs_mutex);
-
-	dentry = create_file(name, mode, attr, parent, data, fops);
-
-	mutex_lock(&eventfs_mutex);
-
-	if (IS_ERR_OR_NULL(dentry)) {
-		/*
-		 * When the mutex was released, something else could have
-		 * created the dentry for this e_dentry. In which case
-		 * use that one.
-		 *
-		 * If ei->is_freed is set, the e_dentry is currently on its
-		 * way to being freed, don't return it. If e_dentry is NULL
-		 * it means it was already freed.
-		 */
-		if (ei->is_freed) {
-			dentry = NULL;
-		} else {
-			dentry = *e_dentry;
-			dget(dentry);
-		}
-		mutex_unlock(&eventfs_mutex);
-		return dentry;
-	}
-
-	if (!*e_dentry && !ei->is_freed) {
-		*e_dentry = dentry;
-		dentry->d_fsdata = ei;
-	} else {
-		/*
-		 * Should never happen unless we get here due to being freed.
-		 * Otherwise it means two dentries exist with the same name.
-		 */
-		WARN_ON_ONCE(!ei->is_freed);
-		dentry = NULL;
-	}
-	mutex_unlock(&eventfs_mutex);
-
-	return dentry;
-}
-
-/**
- * eventfs_post_create_dir - post create dir routine
- * @ei: eventfs_inode of recently created dir
- *
- * Map the meta-data of files within an eventfs dir to their parent dentry
- */
-static void eventfs_post_create_dir(struct eventfs_inode *ei)
-{
-	struct eventfs_inode *ei_child;
-
-	lockdep_assert_held(&eventfs_mutex);
-
-	/* srcu lock already held */
-	/* fill parent-child relation */
-	list_for_each_entry_srcu(ei_child, &ei->children, list,
-				 srcu_read_lock_held(&eventfs_srcu)) {
-		ei_child->d_parent = ei->dentry;
-	}
-}
-
-/**
- * create_dir_dentry - Create a directory dentry for the eventfs_inode
- * @pei: The eventfs_inode parent of ei.
- * @ei: The eventfs_inode to create the directory for
- * @parent: The dentry of the parent of this directory
- *
- * This creates and attaches a directory dentry to the eventfs_inode @ei.
- */
-static struct dentry *
-create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
-		  struct dentry *parent)
-{
-	struct dentry *dentry = NULL;
-
-	WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
-
-	mutex_lock(&eventfs_mutex);
-	if (pei->is_freed || ei->is_freed) {
-		mutex_unlock(&eventfs_mutex);
-		return NULL;
-	}
-	if (ei->dentry) {
-		/* If the eventfs_inode already has a dentry, use it */
-		dentry = ei->dentry;
-		dget(dentry);
-		mutex_unlock(&eventfs_mutex);
-		return dentry;
-	}
-	mutex_unlock(&eventfs_mutex);
+	dentry->d_fsdata = ei;		// NOTE: ei of _parent_
+	lookup_file(dentry, mode, attr, data, fops);
 
-	dentry = create_dir(ei, parent);
-
-	mutex_lock(&eventfs_mutex);
-
-	if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) {
-		/*
-		 * When the mutex was released, something else could have
-		 * created the dentry for this e_dentry. In which case
-		 * use that one.
-		 *
-		 * If ei->is_freed is set, the e_dentry is currently on its
-		 * way to being freed.
-		 */
-		dentry = ei->dentry;
-		if (dentry)
-			dget(dentry);
-		mutex_unlock(&eventfs_mutex);
-		return dentry;
-	}
-
-	if (!ei->dentry && !ei->is_freed) {
-		ei->dentry = dentry;
-		eventfs_post_create_dir(ei);
-		dentry->d_fsdata = ei;
-	} else {
-		/*
-		 * Should never happen unless we get here due to being freed.
-		 * Otherwise it means two dentries exist with the same name.
-		 */
-		WARN_ON_ONCE(!ei->is_freed);
-		dentry = NULL;
-	}
-	mutex_unlock(&eventfs_mutex);
+	*e_dentry = dentry;	// Remove me
 
 	return dentry;
 }
@@ -607,79 +462,55 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 					  struct dentry *dentry,
 					  unsigned int flags)
 {
-	const struct file_operations *fops;
-	const struct eventfs_entry *entry;
 	struct eventfs_inode *ei_child;
 	struct tracefs_inode *ti;
 	struct eventfs_inode *ei;
-	struct dentry *ei_dentry = NULL;
-	struct dentry *ret = NULL;
-	struct dentry *d;
 	const char *name = dentry->d_name.name;
-	umode_t mode;
-	void *data;
-	int idx;
-	int i;
-	int r;
+	struct dentry *result = NULL;
 
 	ti = get_tracefs(dir);
 	if (!(ti->flags & TRACEFS_EVENT_INODE))
-		return NULL;
-
-	/* Grab srcu to prevent the ei from going away */
-	idx = srcu_read_lock(&eventfs_srcu);
+		return ERR_PTR(-EIO);
 
-	/*
-	 * Grab the eventfs_mutex to consistent value from ti->private.
-	 * This s
-	 */
 	mutex_lock(&eventfs_mutex);
-	ei = READ_ONCE(ti->private);
-	if (ei && !ei->is_freed)
-		ei_dentry = READ_ONCE(ei->dentry);
-	mutex_unlock(&eventfs_mutex);
-
-	if (!ei || !ei_dentry)
-		goto out;
 
-	data = ei->data;
+	ei = ti->private;
+	if (!ei || ei->is_freed)
+		goto enoent;
 
-	list_for_each_entry_srcu(ei_child, &ei->children, list,
-				 srcu_read_lock_held(&eventfs_srcu)) {
+	list_for_each_entry(ei_child, &ei->children, list) {
 		if (strcmp(ei_child->name, name) != 0)
 			continue;
-		ret = simple_lookup(dir, dentry, flags);
-		if (IS_ERR(ret))
-			goto out;
-		d = create_dir_dentry(ei, ei_child, ei_dentry);
-		dput(d);
+		if (ei_child->is_freed)
+			goto enoent;
+		lookup_dir_entry(dentry, ei, ei_child);
 		goto out;
 	}
 
-	for (i = 0; i < ei->nr_entries; i++) {
-		entry = &ei->entries[i];
-		if (strcmp(name, entry->name) == 0) {
-			void *cdata = data;
-			mutex_lock(&eventfs_mutex);
-			/* If ei->is_freed, then the event itself may be too */
-			if (!ei->is_freed)
-				r = entry->callback(name, &mode, &cdata, &fops);
-			else
-				r = -1;
-			mutex_unlock(&eventfs_mutex);
-			if (r <= 0)
-				continue;
-			ret = simple_lookup(dir, dentry, flags);
-			if (IS_ERR(ret))
-				goto out;
-			d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
-			dput(d);
-			break;
-		}
+	for (int i = 0; i < ei->nr_entries; i++) {
+		void *data;
+		umode_t mode;
+		const struct file_operations *fops;
+		const struct eventfs_entry *entry = &ei->entries[i];
+
+		if (strcmp(name, entry->name) != 0)
+			continue;
+
+		data = ei->data;
+		if (entry->callback(name, &mode, &data, &fops) <= 0)
+			goto enoent;
+
+		lookup_file_dentry(dentry, ei, i, mode, data, fops);
+		goto out;
 	}
+
+ enoent:
+	/* Don't cache negative lookups, just return an error */
+	result = ERR_PTR(-ENOENT);
+
  out:
-	srcu_read_unlock(&eventfs_srcu, idx);
-	return ret;
+	mutex_unlock(&eventfs_mutex);
+	return result;
 }
 
 /*
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 888e42087847..5c84460feeeb 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -495,75 +495,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
 	return dentry;
 }
 
-/**
- * eventfs_start_creating - start the process of creating a dentry
- * @name: Name of the file created for the dentry
- * @parent: The parent dentry where this dentry will be created
- *
- * This is a simple helper function for the dynamically created eventfs
- * files. When the directory of the eventfs files are accessed, their
- * dentries are created on the fly. This function is used to start that
- * process.
- */
-struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
-{
-	struct dentry *dentry;
-	int error;
-
-	/* Must always have a parent. */
-	if (WARN_ON_ONCE(!parent))
-		return ERR_PTR(-EINVAL);
-
-	error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
-			      &tracefs_mount_count);
-	if (error)
-		return ERR_PTR(error);
-
-	if (unlikely(IS_DEADDIR(parent->d_inode)))
-		dentry = ERR_PTR(-ENOENT);
-	else
-		dentry = lookup_one_len(name, parent, strlen(name));
-
-	if (!IS_ERR(dentry) && dentry->d_inode) {
-		dput(dentry);
-		dentry = ERR_PTR(-EEXIST);
-	}
-
-	if (IS_ERR(dentry))
-		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
-
-	return dentry;
-}
-
-/**
- * eventfs_failed_creating - clean up a failed eventfs dentry creation
- * @dentry: The dentry to clean up
- *
- * If after calling eventfs_start_creating(), a failure is detected, the
- * resources created by eventfs_start_creating() needs to be cleaned up. In
- * that case, this function should be called to perform that clean up.
- */
-struct dentry *eventfs_failed_creating(struct dentry *dentry)
-{
-	dput(dentry);
-	simple_release_fs(&tracefs_mount, &tracefs_mount_count);
-	return NULL;
-}
-
-/**
- * eventfs_end_creating - Finish the process of creating a eventfs dentry
- * @dentry: The dentry that has successfully been created.
- *
- * This function is currently just a place holder to match
- * eventfs_start_creating(). In case any synchronization needs to be added,
- * this function will be used to implement that without having to modify
- * the callers of eventfs_start_creating().
- */
-struct dentry *eventfs_end_creating(struct dentry *dentry)
-{
-	return dentry;
-}
-
 /* Find the inode that this will use for default */
 static struct inode *instance_inode(struct dentry *parent, struct inode *inode)
 {
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 7d84349ade87..09037e2c173d 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -80,9 +80,6 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
 struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
-struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
-struct dentry *eventfs_failed_creating(struct dentry *dentry);
-struct dentry *eventfs_end_creating(struct dentry *dentry);
 void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
 
 #endif /* _TRACEFS_INTERNAL_H */
-- 
2.43.0




* [PATCH v2 5/7] eventfs: Remove unused d_parent pointer field
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
                   ` (3 preceding siblings ...)
  2024-01-31 18:49 ` [PATCH v2 4/7] tracefs: dentry lookup crapectomy Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 6/7] eventfs: Clean up dentry ops and add revalidate function Steven Rostedt
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

It's never used.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/

Cc: stable@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Original-patch: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-4-torvalds@linux-foundation.org

 fs/tracefs/event_inode.c | 4 +---
 fs/tracefs/internal.h    | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4878f4d578be..0289ec787367 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -686,10 +686,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
 	INIT_LIST_HEAD(&ei->list);
 
 	mutex_lock(&eventfs_mutex);
-	if (!parent->is_freed) {
+	if (!parent->is_freed)
 		list_add_tail(&ei->list, &parent->children);
-		ei->d_parent = parent->dentry;
-	}
 	mutex_unlock(&eventfs_mutex);
 
 	/* Was the parent freed? */
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 09037e2c173d..932733a2696a 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -36,7 +36,6 @@ struct eventfs_attr {
  * @name:	the name of the directory to create
  * @children:	link list into the child eventfs_inode
  * @dentry:     the dentry of the directory
- * @d_parent:   pointer to the parent's dentry
  * @d_children: The array of dentries to represent the files when created
  * @entry_attrs: Saved mode and ownership of the @d_children
  * @attr:	Saved mode and ownership of eventfs_inode itself
@@ -51,7 +50,6 @@ struct eventfs_inode {
 	const char			*name;
 	struct list_head		children;
 	struct dentry			*dentry; /* Check is_freed to access */
-	struct dentry			*d_parent;
 	struct dentry			**d_children;
 	struct eventfs_attr		*entry_attrs;
 	struct eventfs_attr		attr;
-- 
2.43.0




* [PATCH v2 6/7] eventfs: Clean up dentry ops and add revalidate function
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
                   ` (4 preceding siblings ...)
  2024-01-31 18:49 ` [PATCH v2 5/7] eventfs: Remove unused d_parent pointer field Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 18:49 ` [PATCH v2 7/7] eventfs: Get rid of dentry pointers without refcounts Steven Rostedt
  2024-01-31 19:17 ` [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.
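
As a rough illustration of the idea (a userspace sketch, not kernel code),
the effect of a revalidate hook on a cached entry can be modelled as below.
The names model_ei, model_dentry, model_d_revalidate() and
model_cached_lookup() are hypothetical stand-ins invented for this sketch;
only the 'is_freed' check mirrors what the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eventfs_inode: only the freed bit. */
struct model_ei {
	int is_freed;
};

/* Hypothetical stand-in for a cached dentry. */
struct model_dentry {
	struct model_ei *d_fsdata;	/* NULL for non-eventfs entries */
	int cached;			/* 1 while the entry sits in the model cache */
};

/* Same shape as the check in the patch: valid unless the backing ei is freed. */
static int model_d_revalidate(const struct model_dentry *dentry)
{
	const struct model_ei *ei = dentry->d_fsdata;

	return !(ei && ei->is_freed);
}

/*
 * Cache-hit path of the model: return the entry if it is still valid,
 * otherwise drop it from the cache and return NULL so the caller would
 * fall back to a fresh lookup.
 */
static struct model_dentry *model_cached_lookup(struct model_dentry *dentry)
{
	if (!dentry->cached)
		return NULL;
	if (model_d_revalidate(dentry))
		return dentry;
	dentry->cached = 0;
	return NULL;
}
```

In this model an entry with no d_fsdata always revalidates successfully,
which matches the intent that only eventfs dentries are affected.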

Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function.  We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.

It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files.  But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.

Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching.  We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.

As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively.  But that's a separate
issue.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/

Cc: stable@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Original-patch: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-6-torvalds@linux-foundation.org

 fs/tracefs/event_inode.c |  5 ++---
 fs/tracefs/inode.c       | 27 ++++++++++++++++++---------
 fs/tracefs/internal.h    |  3 ++-
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0289ec787367..0213a3375d53 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -378,13 +378,12 @@ static void free_ei(struct eventfs_inode *ei)
 }
 
 /**
- * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode
- * @ti: the tracefs_inode of the dentry
+ * eventfs_d_release - dentry is going away
  * @dentry: dentry which has the reference to remove.
  *
  * Remove the association between a dentry from an eventfs_inode.
  */
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
+void eventfs_d_release(struct dentry *dentry)
 {
 	struct eventfs_inode *ei;
 	int i;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 5c84460feeeb..d65ffad4c327 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -377,21 +377,30 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options	= tracefs_show_options,
 };
 
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+/*
+ * It would be cleaner if eventfs had its own dentry ops.
+ *
+ * Note that d_revalidate is called potentially under RCU,
+ * so it can't take the eventfs mutex etc. It's fine - if
+ * we open a file just as it's marked dead, things will
+ * still work just fine, and just see the old stale case.
+ */
+static void tracefs_d_release(struct dentry *dentry)
 {
-	struct tracefs_inode *ti;
+	if (dentry->d_fsdata)
+		eventfs_d_release(dentry);
+}
 
-	if (!dentry || !inode)
-		return;
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct eventfs_inode *ei = dentry->d_fsdata;
 
-	ti = get_tracefs(inode);
-	if (ti && ti->flags & TRACEFS_EVENT_INODE)
-		eventfs_set_ei_status_free(ti, dentry);
-	iput(inode);
+	return !(ei && ei->is_freed);
 }
 
 static const struct dentry_operations tracefs_dentry_operations = {
-	.d_iput = tracefs_dentry_iput,
+	.d_revalidate = tracefs_d_revalidate,
+	.d_release = tracefs_d_release,
 };
 
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 932733a2696a..4b50a0668055 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -78,6 +78,7 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
 struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
+
+void eventfs_d_release(struct dentry *dentry);
 
 #endif /* _TRACEFS_INTERNAL_H */
-- 
2.43.0




* [PATCH v2 7/7] eventfs: Get rid of dentry pointers without refcounts
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
                   ` (5 preceding siblings ...)
  2024-01-31 18:49 ` [PATCH v2 6/7] eventfs: Clean up dentry ops and add revalidate function Steven Rostedt
@ 2024-01-31 18:49 ` Steven Rostedt
  2024-01-31 19:17 ` [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 18:49 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer.  That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.

There were two reasons why eventfs tried to keep a pointer to a dentry:

 - the creation of a 'events' directory would actually have a stable
   dentry pointer that it created with tracefs_start_creating().

   And it needed that dentry when tearing it all down again in
   eventfs_remove_events_dir().

   This use is actually ok, because the special top-level events
   directory dentries are actually stable, not just a temporary cache of
   the eventfs data structures.

 - the 'eventfs_inode' (aka ei) needs to stay around as long as there
   are dentries that refer to it.

   It then used these dentry pointers as a replacement for doing
   reference counting: it would try to make sure that there was only
   ever one dentry associated with an eventfs_inode, and keep a child
   dentry array around to see which dentries might still refer to the
   parent ei.

This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.

The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei.  The
directory dentries do the same.  That makes the broken use case go away.
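
The lifetime rule described above can be sketched in userspace roughly as
follows.  This is only a model under stated assumptions: a plain integer
stands in for the kref, free() stands in for the RCU-deferred free, and
every model_* name is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct eventfs_inode. */
struct model_ei {
	int refcount;	/* plays the role of the kref */
	int is_freed;	/* set when the directory is removed */
};

/* Hypothetical stand-in for a dentry pinning its ei via d_fsdata. */
struct model_dentry {
	struct model_ei *d_fsdata;
};

static int model_live_eis;	/* allocated but not yet released objects */

static struct model_ei *model_alloc_ei(void)
{
	struct model_ei *ei = calloc(1, sizeof(*ei));

	if (!ei)
		return NULL;
	ei->refcount = 1;	/* creator's reference */
	model_live_eis++;
	return ei;
}

static struct model_ei *model_get_ei(struct model_ei *ei)
{
	if (ei)
		ei->refcount++;
	return ei;
}

static void model_put_ei(struct model_ei *ei)
{
	if (ei && --ei->refcount == 0) {
		model_live_eis--;
		free(ei);
	}
}

/* Lookup: the dentry takes its own reference on the ei. */
static void model_lookup(struct model_dentry *d, struct model_ei *ei)
{
	d->d_fsdata = model_get_ei(ei);
}

/* Release: the dentry drops the reference it took at lookup time. */
static void model_d_release(struct model_dentry *d)
{
	model_put_ei(d->d_fsdata);
	d->d_fsdata = NULL;
}

/* Removal: mark the ei dead and drop only the creator's reference. */
static void model_remove(struct model_ei *ei)
{
	ei->is_freed = 1;
	model_put_ei(ei);
}

/* Returns the number of live eis at the end, or a negative step marker. */
static int model_scenario(void)
{
	struct model_dentry dir = { 0 }, file = { 0 };
	struct model_ei *ei = model_alloc_ei();

	if (!ei)
		return -1;
	model_lookup(&dir, ei);		/* directory dentry: refcount 2 */
	model_lookup(&file, ei);	/* file dentry uses parent's ei: 3 */
	model_remove(ei);		/* refcount 2, object must survive */
	if (model_live_eis != 1 || !dir.d_fsdata->is_freed)
		return -2;
	model_d_release(&file);		/* refcount 1 */
	if (model_live_eis != 1)
		return -3;
	model_d_release(&dir);		/* refcount 0, object released */
	return model_live_eis;
}
```

The point of the scenario is that removal never has to find the dentries:
the object simply outlives the removal until the last dentry lets go of it.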

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/

Cc: stable@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-5-torvalds@linux-foundation.org

- Put back the kstrdup_const()

- use kfree_rcu(ei, rcu);

- Replace simple_recursive_removal() with d_invalidate().

 fs/tracefs/event_inode.c | 247 ++++++++++++---------------------------
 fs/tracefs/internal.h    |   7 +-
 2 files changed, 77 insertions(+), 177 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0213a3375d53..31cbe38739fa 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -62,6 +62,35 @@ enum {
 
 #define EVENTFS_MODE_MASK	(EVENTFS_SAVE_MODE - 1)
 
+/*
+ * eventfs_inode reference count management.
+ *
+ * NOTE! We count only references from dentries, in the
+ * form 'dentry->d_fsdata'. There are also references from
+ * directory inodes ('ti->private'), but the dentry reference
+ * count is always a superset of the inode reference count.
+ */
+static void release_ei(struct kref *ref)
+{
+	struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+	kfree(ei->entry_attrs);
+	kfree_const(ei->name);
+	kfree_rcu(ei, rcu);
+}
+
+static inline void put_ei(struct eventfs_inode *ei)
+{
+	if (ei)
+		kref_put(&ei->kref, release_ei);
+}
+
+static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
+{
+	if (ei)
+		kref_get(&ei->kref);
+	return ei;
+}
+
 static struct dentry *eventfs_root_lookup(struct inode *dir,
 					  struct dentry *dentry,
 					  unsigned int flags);
@@ -289,7 +318,8 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *lookup_file(struct dentry *dentry,
+static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
+				  struct dentry *dentry,
 				  umode_t mode,
 				  struct eventfs_attr *attr,
 				  void *data,
@@ -302,7 +332,7 @@ static struct dentry *lookup_file(struct dentry *dentry,
 		mode |= S_IFREG;
 
 	if (WARN_ON_ONCE(!S_ISREG(mode)))
-		return NULL;
+		return ERR_PTR(-EIO);
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
@@ -321,9 +351,12 @@ static struct dentry *lookup_file(struct dentry *dentry,
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;
 
+	// Files have their parent's ei as their fsdata
+	dentry->d_fsdata = get_ei(parent_ei);
+
 	d_add(dentry, inode);
 	fsnotify_create(dentry->d_parent->d_inode, dentry);
-	return dentry;
+	return NULL;
 };
 
 /**
@@ -359,22 +392,29 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
 	/* Only directories have ti->private set to an ei, not files */
 	ti->private = ei;
 
-	dentry->d_fsdata = ei;
-        ei->dentry = dentry;	// Remove me!
+	dentry->d_fsdata = get_ei(ei);
 
 	inc_nlink(inode);
 	d_add(dentry, inode);
 	inc_nlink(dentry->d_parent->d_inode);
 	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-	return dentry;
+	return NULL;
 }
 
-static void free_ei(struct eventfs_inode *ei)
+static inline struct eventfs_inode *alloc_ei(const char *name)
 {
-	kfree_const(ei->name);
-	kfree(ei->d_children);
-	kfree(ei->entry_attrs);
-	kfree(ei);
+	struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+
+	if (!ei)
+		return NULL;
+
+	ei->name = kstrdup_const(name, GFP_KERNEL);
+	if (!ei->name) {
+		kfree(ei);
+		return NULL;
+	}
+	kref_init(&ei->kref);
+	return ei;
 }
 
 /**
@@ -385,39 +425,13 @@ static void free_ei(struct eventfs_inode *ei)
  */
 void eventfs_d_release(struct dentry *dentry)
 {
-	struct eventfs_inode *ei;
-	int i;
-
-	mutex_lock(&eventfs_mutex);
-
-	ei = dentry->d_fsdata;
-	if (!ei)
-		goto out;
-
-	/* This could belong to one of the files of the ei */
-	if (ei->dentry != dentry) {
-		for (i = 0; i < ei->nr_entries; i++) {
-			if (ei->d_children[i] == dentry)
-				break;
-		}
-		if (WARN_ON_ONCE(i == ei->nr_entries))
-			goto out;
-		ei->d_children[i] = NULL;
-	} else if (ei->is_freed) {
-		free_ei(ei);
-	} else {
-		ei->dentry = NULL;
-	}
-
-	dentry->d_fsdata = NULL;
- out:
-	mutex_unlock(&eventfs_mutex);
+	put_ei(dentry->d_fsdata);
 }
 
 /**
  * lookup_file_dentry - create a dentry for a file of an eventfs_inode
  * @ei: the eventfs_inode that the file will be created under
- * @idx: the index into the d_children[] of the @ei
+ * @idx: the index into the entry_attrs[] of the @ei
  * @parent: The parent dentry of the created file.
  * @name: The name of the file to create
  * @mode: The mode of the file.
@@ -434,17 +448,11 @@ lookup_file_dentry(struct dentry *dentry,
 		   const struct file_operations *fops)
 {
 	struct eventfs_attr *attr = NULL;
-	struct dentry **e_dentry = &ei->d_children[idx];
 
 	if (ei->entry_attrs)
 		attr = &ei->entry_attrs[idx];
 
-	dentry->d_fsdata = ei;		// NOTE: ei of _parent_
-	lookup_file(dentry, mode, attr, data, fops);
-
-	*e_dentry = dentry;	// Remove me
-
-	return dentry;
+	return lookup_file(ei, dentry, mode, attr, data, fops);
 }
 
 /**
@@ -465,7 +473,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 	struct tracefs_inode *ti;
 	struct eventfs_inode *ei;
 	const char *name = dentry->d_name.name;
-	struct dentry *result = NULL;
+	struct dentry *result;
 
 	ti = get_tracefs(dir);
 	if (!(ti->flags & TRACEFS_EVENT_INODE))
@@ -482,7 +490,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 			continue;
 		if (ei_child->is_freed)
 			goto enoent;
-		lookup_dir_entry(dentry, ei, ei_child);
+		result = lookup_dir_entry(dentry, ei, ei_child);
 		goto out;
 	}
 
@@ -499,7 +507,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
 		if (entry->callback(name, &mode, &data, &fops) <= 0)
 			goto enoent;
 
-		lookup_file_dentry(dentry, ei, i, mode, data, fops);
+		result = lookup_file_dentry(dentry, ei, i, mode, data, fops);
 		goto out;
 	}
 
@@ -659,25 +667,10 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
 	if (!parent)
 		return ERR_PTR(-EINVAL);
 
-	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+	ei = alloc_ei(name);
 	if (!ei)
 		return ERR_PTR(-ENOMEM);
 
-	ei->name = kstrdup_const(name, GFP_KERNEL);
-	if (!ei->name) {
-		kfree(ei);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	if (size) {
-		ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
-		if (!ei->d_children) {
-			kfree_const(ei->name);
-			kfree(ei);
-			return ERR_PTR(-ENOMEM);
-		}
-	}
-
 	ei->entries = entries;
 	ei->nr_entries = size;
 	ei->data = data;
@@ -691,7 +684,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
 
 	/* Was the parent freed? */
 	if (list_empty(&ei->list)) {
-		free_ei(ei);
+		put_ei(ei);
 		ei = NULL;
 	}
 	return ei;
@@ -726,28 +719,20 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
 	if (IS_ERR(dentry))
 		return ERR_CAST(dentry);
 
-	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+	ei = alloc_ei(name);
 	if (!ei)
-		goto fail_ei;
+		goto fail;
 
 	inode = tracefs_get_inode(dentry->d_sb);
 	if (unlikely(!inode))
 		goto fail;
 
-	if (size) {
-		ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
-		if (!ei->d_children)
-			goto fail;
-	}
-
-	ei->dentry = dentry;
+	// Note: we have a ref to the dentry from tracefs_start_creating()
+	ei->events_dir = dentry;
 	ei->entries = entries;
 	ei->nr_entries = size;
 	ei->is_events = 1;
 	ei->data = data;
-	ei->name = kstrdup_const(name, GFP_KERNEL);
-	if (!ei->name)
-		goto fail;
 
 	/* Save the ownership of this directory */
 	uid = d_inode(dentry->d_parent)->i_uid;
@@ -778,7 +763,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
 	inode->i_op = &eventfs_root_dir_inode_operations;
 	inode->i_fop = &eventfs_file_operations;
 
-	dentry->d_fsdata = ei;
+	dentry->d_fsdata = get_ei(ei);
 
 	/* directory inodes start off with i_nlink == 2 (for "." entry) */
 	inc_nlink(inode);
@@ -790,72 +775,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
 	return ei;
 
  fail:
-	kfree(ei->d_children);
-	kfree(ei);
- fail_ei:
+	put_ei(ei);
 	tracefs_failed_creating(dentry);
 	return ERR_PTR(-ENOMEM);
 }
 
-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
-        struct eventfs_inode *ei, *tmp;
-        struct llist_node *llnode;
-
-	llnode = llist_del_all(&free_list);
-        llist_for_each_entry_safe(ei, tmp, llnode, llist) {
-		/* This dput() matches the dget() from unhook_dentry() */
-		for (int i = 0; i < ei->nr_entries; i++) {
-			if (ei->d_children[i])
-				dput(ei->d_children[i]);
-		}
-		/* This should only get here if it had a dentry */
-		if (!WARN_ON_ONCE(!ei->dentry))
-			dput(ei->dentry);
-        }
-}
-
-static DECLARE_WORK(eventfs_work, eventfs_workfn);
-
-static void free_rcu_ei(struct rcu_head *head)
-{
-	struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
-
-	if (ei->dentry) {
-		/* Do not free the ei until all references of dentry are gone */
-		if (llist_add(&ei->llist, &free_list))
-			queue_work(system_unbound_wq, &eventfs_work);
-		return;
-	}
-
-	/* If the ei doesn't have a dentry, neither should its children */
-	for (int i = 0; i < ei->nr_entries; i++) {
-		WARN_ON_ONCE(ei->d_children[i]);
-	}
-
-	free_ei(ei);
-}
-
-static void unhook_dentry(struct dentry *dentry)
-{
-	if (!dentry)
-		return;
-	/*
-	 * Need to add a reference to the dentry that is expected by
-	 * simple_recursive_removal(), which will include a dput().
-	 */
-	dget(dentry);
-
-	/*
-	 * Also add a reference for the dput() in eventfs_workfn().
-	 * That is required as that dput() will free the ei after
-	 * the SRCU grace period is over.
-	 */
-	dget(dentry);
-}
-
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ei: eventfs_inode to be removed.
@@ -868,8 +792,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
 {
 	struct eventfs_inode *ei_child;
 
-	if (!ei)
-		return;
 	/*
 	 * Check recursion depth. It should never be greater than 3:
 	 * 0 - events/
@@ -881,28 +803,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
 		return;
 
 	/* search for nested folders or files */
-	list_for_each_entry_srcu(ei_child, &ei->children, list,
-				 lockdep_is_held(&eventfs_mutex)) {
-		/* Children only have dentry if parent does */
-		WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+	list_for_each_entry(ei_child, &ei->children, list)
 		eventfs_remove_rec(ei_child, level + 1);
-	}
-
 
 	ei->is_freed = 1;
-
-	for (int i = 0; i < ei->nr_entries; i++) {
-		if (ei->d_children[i]) {
-			/* Children only have dentry if parent does */
-			WARN_ON_ONCE(!ei->dentry);
-			unhook_dentry(ei->d_children[i]);
-		}
-	}
-
-	unhook_dentry(ei->dentry);
-
-	list_del_rcu(&ei->list);
-	call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
+	list_del(&ei->list);
+	put_ei(ei);
 }
 
 /**
@@ -913,22 +819,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
  */
 void eventfs_remove_dir(struct eventfs_inode *ei)
 {
-	struct dentry *dentry;
-
 	if (!ei)
 		return;
 
 	mutex_lock(&eventfs_mutex);
-	dentry = ei->dentry;
 	eventfs_remove_rec(ei, 0);
 	mutex_unlock(&eventfs_mutex);
-
-	/*
-	 * If any of the ei children has a dentry, then the ei itself
-	 * must have a dentry.
-	 */
-	if (dentry)
-		simple_recursive_removal(dentry, NULL);
 }
 
 /**
@@ -941,7 +837,11 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
 {
 	struct dentry *dentry;
 
-	dentry = ei->dentry;
+	dentry = ei->events_dir;
+	if (!dentry)
+		return;
+
+	ei->events_dir = NULL;
 	eventfs_remove_dir(ei);
 
 	/*
@@ -951,5 +851,6 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
 	 * sticks around while the other ei->dentry are created
 	 * and destroyed dynamically.
 	 */
+	d_invalidate(dentry);
 	dput(dentry);
 }
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 4b50a0668055..1886f1826cd8 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -35,8 +35,7 @@ struct eventfs_attr {
  * @entries:	the array of entries representing the files in the directory
  * @name:	the name of the directory to create
  * @children:	link list into the child eventfs_inode
- * @dentry:     the dentry of the directory
- * @d_children: The array of dentries to represent the files when created
+ * @events_dir: the dentry of the events directory
  * @entry_attrs: Saved mode and ownership of the @d_children
  * @attr:	Saved mode and ownership of eventfs_inode itself
  * @data:	The private data to pass to the callbacks
@@ -45,12 +44,12 @@ struct eventfs_attr {
  * @nr_entries: The number of items in @entries
  */
 struct eventfs_inode {
+	struct kref			kref;
 	struct list_head		list;
 	const struct eventfs_entry	*entries;
 	const char			*name;
 	struct list_head		children;
-	struct dentry			*dentry; /* Check is_freed to access */
-	struct dentry			**d_children;
+	struct dentry			*events_dir;
 	struct eventfs_attr		*entry_attrs;
 	struct eventfs_attr		attr;
 	void				*data;
-- 
2.43.0




* Re: [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy)
  2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
                   ` (6 preceding siblings ...)
  2024-01-31 18:49 ` [PATCH v2 7/7] eventfs: Get rid of dentry pointers without refcounts Steven Rostedt
@ 2024-01-31 19:17 ` Steven Rostedt
  7 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-01-31 19:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, linux-fsdevel
  Cc: Linus Torvalds, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Christian Brauner, Al Viro, Ajay Kaher,
	Greg Kroah-Hartman

On Wed, 31 Jan 2024 13:49:18 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I would like to have this entire series go all the way back to 6.6 (after it
> is accepted in mainline of course) and replace everything since the creation
> of the eventfs code.  That is, stable releases may need to add all the
> patches that are in fs/tracefs to make that happen. The reason being is that
> this rewrite likely fixed a lot of hidden bugs and I honestly believe it's
> more stable than the code that currently exists.

If no more issues are found here, and Linus pulls it into 6.8, I'll
make the backport series for both 6.7 and 6.6.

-- Steve


* Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-01-31 18:49 ` [PATCH v2 4/7] tracefs: dentry lookup crapectomy Steven Rostedt
@ 2024-02-01  0:27   ` Al Viro
  2024-02-01  2:26     ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-02-01  0:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, linux-fsdevel, Linus Torvalds,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Greg Kroah-Hartman, stable

On Wed, Jan 31, 2024 at 01:49:22PM -0500, Steven Rostedt wrote:

> @@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, umode_t mode,
>  
>  	ti = get_tracefs(inode);
>  	ti->flags |= TRACEFS_EVENT_INODE;
> -	d_instantiate(dentry, inode);
> +
> +	d_add(dentry, inode);
>  	fsnotify_create(dentry->d_parent->d_inode, dentry);

Seriously?  stat(2), have it go away from dcache on memory pressure,
lather, rinse, repeat...  Won't *snotify get confused by the stream
of creations of the same thing, with not a removal in sight?

> -	return eventfs_end_creating(dentry);
> +	return dentry;
>  };
>  
>  /**
> - * create_dir - create a dir in the tracefs filesystem
> + * lookup_dir_entry - look up a dir in the tracefs filesystem
> + * @dentry: the directory to look up
>   * @ei: the eventfs_inode that represents the directory to create
> - * @parent: parent dentry for this file.
>   *
> - * This function will create a dentry for a directory represented by
> + * This function will look up a dentry for a directory represented by
>   * a eventfs_inode.
>   */
> -static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent)
> +static struct dentry *lookup_dir_entry(struct dentry *dentry,
> +	struct eventfs_inode *pei, struct eventfs_inode *ei)
>  {
>  	struct tracefs_inode *ti;
> -	struct dentry *dentry;
>  	struct inode *inode;
>  
> -	dentry = eventfs_start_creating(ei->name, parent);
> -	if (IS_ERR(dentry))
> -		return dentry;
> -
>  	inode = tracefs_get_inode(dentry->d_sb);
>  	if (unlikely(!inode))
> -		return eventfs_failed_creating(dentry);
> +		return ERR_PTR(-ENOMEM);
>  
>  	/* If the user updated the directory's attributes, use them */
>  	update_inode_attr(dentry, inode, &ei->attr,
> @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
>  	/* Only directories have ti->private set to an ei, not files */
>  	ti->private = ei;
>  
> +	dentry->d_fsdata = ei;
> +        ei->dentry = dentry;	// Remove me!
> +
>  	inc_nlink(inode);
> -	d_instantiate(dentry, inode);
> +	d_add(dentry, inode);
>  	inc_nlink(dentry->d_parent->d_inode);

What will happen when that thing gets evicted from dcache,
gets looked up again, and again, and...?
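
To make the concern concrete, here is a toy userspace model of the cycle
being asked about.  It assumes that nothing on the eviction path undoes the
increment done at lookup time, which is the scenario the question raises;
model_inode and the model_* functions are hypothetical names for this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the parent directory's inode. */
struct model_inode {
	unsigned int i_nlink;
};

/* Models the quoted lookup path: each directory lookup bumps the parent. */
static void model_lookup_dir(struct model_inode *parent)
{
	parent->i_nlink++;
}

/*
 * Models the child dentry being evicted under memory pressure.
 * Assumption of this sketch: the parent's link count is left untouched.
 */
static void model_evict_dir(struct model_inode *parent)
{
	(void)parent;
}

/* Parent link count after a number of lookup/evict cycles of one child. */
static unsigned int model_nlink_after(unsigned int cycles)
{
	struct model_inode parent = { .i_nlink = 2 };	/* empty directory */

	for (unsigned int i = 0; i < cycles; i++) {
		model_lookup_dir(&parent);
		model_evict_dir(&parent);
	}
	return parent.i_nlink;
}
```

Under that assumption the parent's count grows by one per cycle even though
the directory only ever has the one child.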

>  	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);

Same re snotify confusion...

> -	return eventfs_end_creating(dentry);
> +	return dentry;
>  }
>  
>  static void free_ei(struct eventfs_inode *ei)
> @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
>  }
>  
>  /**
> - * create_file_dentry - create a dentry for a file of an eventfs_inode
> + * lookup_file_dentry - create a dentry for a file of an eventfs_inode
>   * @ei: the eventfs_inode that the file will be created under
>   * @idx: the index into the d_children[] of the @ei
>   * @parent: The parent dentry of the created file.
> @@ -438,157 +429,21 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
>   * address located at @e_dentry.
>   */
>  static struct dentry *
> -create_file_dentry(struct eventfs_inode *ei, int idx,
> -		   struct dentry *parent, const char *name, umode_t mode, void *data,
> +lookup_file_dentry(struct dentry *dentry,
> +		   struct eventfs_inode *ei, int idx,
> +		   umode_t mode, void *data,
>  		   const struct file_operations *fops)
>  {
>  	struct eventfs_attr *attr = NULL;
>  	struct dentry **e_dentry = &ei->d_children[idx];
> -	struct dentry *dentry;
> -
> -	WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
>  
> -	mutex_lock(&eventfs_mutex);
> -	if (ei->is_freed) {
> -		mutex_unlock(&eventfs_mutex);
> -		return NULL;
> -	}
> -	/* If the e_dentry already has a dentry, use it */
> -	if (*e_dentry) {
> -		dget(*e_dentry);
> -		mutex_unlock(&eventfs_mutex);
> -		return *e_dentry;
> -	}
> -
> -	/* ei->entry_attrs are protected by SRCU */
>  	if (ei->entry_attrs)
>  		attr = &ei->entry_attrs[idx];
>  
> -	mutex_unlock(&eventfs_mutex);
> -
> -	dentry = create_file(name, mode, attr, parent, data, fops);
> -
> -	mutex_lock(&eventfs_mutex);
> -
> -	if (IS_ERR_OR_NULL(dentry)) {
> -		/*
> -		 * When the mutex was released, something else could have
> -		 * created the dentry for this e_dentry. In which case
> -		 * use that one.
> -		 *
> -		 * If ei->is_freed is set, the e_dentry is currently on its
> -		 * way to being freed, don't return it. If e_dentry is NULL
> -		 * it means it was already freed.
> -		 */
> -		if (ei->is_freed) {
> -			dentry = NULL;
> -		} else {
> -			dentry = *e_dentry;
> -			dget(dentry);
> -		}
> -		mutex_unlock(&eventfs_mutex);
> -		return dentry;
> -	}
> -
> -	if (!*e_dentry && !ei->is_freed) {
> -		*e_dentry = dentry;
> -		dentry->d_fsdata = ei;
> -	} else {
> -		/*
> -		 * Should never happen unless we get here due to being freed.
> -		 * Otherwise it means two dentries exist with the same name.
> -		 */
> -		WARN_ON_ONCE(!ei->is_freed);
> -		dentry = NULL;
> -	}
> -	mutex_unlock(&eventfs_mutex);
> -
> -	return dentry;
> -}
> -
> -/**
> - * eventfs_post_create_dir - post create dir routine
> - * @ei: eventfs_inode of recently created dir
> - *
> - * Map the meta-data of files within an eventfs dir to their parent dentry
> - */
> -static void eventfs_post_create_dir(struct eventfs_inode *ei)
> -{
> -	struct eventfs_inode *ei_child;
> -
> -	lockdep_assert_held(&eventfs_mutex);
> -
> -	/* srcu lock already held */
> -	/* fill parent-child relation */
> -	list_for_each_entry_srcu(ei_child, &ei->children, list,
> -				 srcu_read_lock_held(&eventfs_srcu)) {
> -		ei_child->d_parent = ei->dentry;
> -	}
> -}
> -
> -/**
> - * create_dir_dentry - Create a directory dentry for the eventfs_inode
> - * @pei: The eventfs_inode parent of ei.
> - * @ei: The eventfs_inode to create the directory for
> - * @parent: The dentry of the parent of this directory
> - *
> - * This creates and attaches a directory dentry to the eventfs_inode @ei.
> - */
> -static struct dentry *
> -create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
> -		  struct dentry *parent)
> -{
> -	struct dentry *dentry = NULL;
> -
> -	WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
> -
> -	mutex_lock(&eventfs_mutex);
> -	if (pei->is_freed || ei->is_freed) {
> -		mutex_unlock(&eventfs_mutex);
> -		return NULL;
> -	}
> -	if (ei->dentry) {
> -		/* If the eventfs_inode already has a dentry, use it */
> -		dentry = ei->dentry;
> -		dget(dentry);
> -		mutex_unlock(&eventfs_mutex);
> -		return dentry;
> -	}
> -	mutex_unlock(&eventfs_mutex);
> +	dentry->d_fsdata = ei;		// NOTE: ei of _parent_
> +	lookup_file(dentry, mode, attr, data, fops);
>  
> -	dentry = create_dir(ei, parent);
> -
> -	mutex_lock(&eventfs_mutex);
> -
> -	if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) {
> -		/*
> -		 * When the mutex was released, something else could have
> -		 * created the dentry for this e_dentry. In which case
> -		 * use that one.
> -		 *
> -		 * If ei->is_freed is set, the e_dentry is currently on its
> -		 * way to being freed.
> -		 */
> -		dentry = ei->dentry;
> -		if (dentry)
> -			dget(dentry);
> -		mutex_unlock(&eventfs_mutex);
> -		return dentry;
> -	}
> -
> -	if (!ei->dentry && !ei->is_freed) {
> -		ei->dentry = dentry;
> -		eventfs_post_create_dir(ei);
> -		dentry->d_fsdata = ei;
> -	} else {
> -		/*
> -		 * Should never happen unless we get here due to being freed.
> -		 * Otherwise it means two dentries exist with the same name.
> -		 */
> -		WARN_ON_ONCE(!ei->is_freed);
> -		dentry = NULL;
> -	}
> -	mutex_unlock(&eventfs_mutex);
> +	*e_dentry = dentry;	// Remove me
>  
>  	return dentry;
>  }
> @@ -607,79 +462,55 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
>  					  struct dentry *dentry,
>  					  unsigned int flags)
>  {
> -	const struct file_operations *fops;
> -	const struct eventfs_entry *entry;
>  	struct eventfs_inode *ei_child;
>  	struct tracefs_inode *ti;
>  	struct eventfs_inode *ei;
> -	struct dentry *ei_dentry = NULL;
> -	struct dentry *ret = NULL;
> -	struct dentry *d;
>  	const char *name = dentry->d_name.name;
> -	umode_t mode;
> -	void *data;
> -	int idx;
> -	int i;
> -	int r;
> +	struct dentry *result = NULL;
>  
>  	ti = get_tracefs(dir);
>  	if (!(ti->flags & TRACEFS_EVENT_INODE))

	Can that ever happen?  I mean, why set ->i_op to something that
has this for ->lookup() on a directory without TRACEFS_EVENT_INODE in
its inode?  It's not as if you ever removed that flag...

> -		return NULL;
> -
> -	/* Grab srcu to prevent the ei from going away */
> -	idx = srcu_read_lock(&eventfs_srcu);
> +		return ERR_PTR(-EIO);
>  
> -	/*
> -	 * Grab the eventfs_mutex to consistent value from ti->private.
> -	 * This s
> -	 */
>  	mutex_lock(&eventfs_mutex);
> -	ei = READ_ONCE(ti->private);
> -	if (ei && !ei->is_freed)
> -		ei_dentry = READ_ONCE(ei->dentry);
> -	mutex_unlock(&eventfs_mutex);
> -
> -	if (!ei || !ei_dentry)
> -		goto out;
>  
> -	data = ei->data;
> +	ei = ti->private;
> +	if (!ei || ei->is_freed)
> +		goto enoent;
>  
> -	list_for_each_entry_srcu(ei_child, &ei->children, list,
> -				 srcu_read_lock_held(&eventfs_srcu)) {
> +	list_for_each_entry(ei_child, &ei->children, list) {
>  		if (strcmp(ei_child->name, name) != 0)
>  			continue;
> -		ret = simple_lookup(dir, dentry, flags);
> -		if (IS_ERR(ret))
> -			goto out;
> -		d = create_dir_dentry(ei, ei_child, ei_dentry);
> -		dput(d);
> +		if (ei_child->is_freed)
> +			goto enoent;

Out of curiosity - can that happen now?  You've got exclusion with
eventfs_remove_rec(), so you shouldn't be able to catch the moment
between setting ->is_freed and removal from the list...

> +		lookup_dir_entry(dentry, ei, ei_child);
>  		goto out;
>  	}
>  
> -	for (i = 0; i < ei->nr_entries; i++) {
> -		entry = &ei->entries[i];
> -		if (strcmp(name, entry->name) == 0) {
> -			void *cdata = data;
> -			mutex_lock(&eventfs_mutex);
> -			/* If ei->is_freed, then the event itself may be too */
> -			if (!ei->is_freed)
> -				r = entry->callback(name, &mode, &cdata, &fops);
> -			else
> -				r = -1;
> -			mutex_unlock(&eventfs_mutex);
> -			if (r <= 0)
> -				continue;
> -			ret = simple_lookup(dir, dentry, flags);
> -			if (IS_ERR(ret))
> -				goto out;
> -			d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
> -			dput(d);
> -			break;
> -		}
> +	for (int i = 0; i < ei->nr_entries; i++) {
> +		void *data;
> +		umode_t mode;
> +		const struct file_operations *fops;
> +		const struct eventfs_entry *entry = &ei->entries[i];
> +
> +		if (strcmp(name, entry->name) != 0)
> +			continue;
> +
> +		data = ei->data;
> +		if (entry->callback(name, &mode, &data, &fops) <= 0)
> +			goto enoent;
> +
> +		lookup_file_dentry(dentry, ei, i, mode, data, fops);
> +		goto out;
>  	}
> +
> + enoent:
> +	/* Don't cache negative lookups, just return an error */
> +	result = ERR_PTR(-ENOENT);

Huh?  Just return NULL and be done with that - you'll get an
unhashed negative dentry and let the caller turn that into
-ENOENT...

>   out:
> -	srcu_read_unlock(&eventfs_srcu, idx);
> -	return ret;
> +	mutex_unlock(&eventfs_mutex);
> +	return result;
>  }
>  
>  /*


* Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-02-01  0:27   ` Al Viro
@ 2024-02-01  2:26     ` Steven Rostedt
  2024-02-01  3:02       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2024-02-01  2:26 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, linux-trace-kernel, linux-fsdevel, Linus Torvalds,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Greg Kroah-Hartman, stable

On Thu, 1 Feb 2024 00:27:19 +0000
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Wed, Jan 31, 2024 at 01:49:22PM -0500, Steven Rostedt wrote:
> 
> > @@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, umode_t mode,
> >  
> >  	ti = get_tracefs(inode);
> >  	ti->flags |= TRACEFS_EVENT_INODE;
> > -	d_instantiate(dentry, inode);
> > +
> > +	d_add(dentry, inode);
> >  	fsnotify_create(dentry->d_parent->d_inode, dentry);  
> 
> Seriously?  stat(2), have it go away from dcache on memory pressure,
> lather, rinse, repeat...  Won't *snotify get confused by the stream
> of creations of the same thing, with not a removal in sight?
> 

That looks to be cut and paste from the old create in tracefs. I don't know
of a real use case for that. I think we could possibly delete it without
anyone noticing.


> > -	return eventfs_end_creating(dentry);
> > +	return dentry;
> >  };
> >  


> > @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
> >  	/* Only directories have ti->private set to an ei, not files */
> >  	ti->private = ei;
> >  
> > +	dentry->d_fsdata = ei;
> > +        ei->dentry = dentry;	// Remove me!
> > +
> >  	inc_nlink(inode);
> > -	d_instantiate(dentry, inode);
> > +	d_add(dentry, inode);
> >  	inc_nlink(dentry->d_parent->d_inode);  
> 
> What will happen when that thing gets evicted from dcache,
> gets looked up again, and again, and...?
> 
> >  	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);  
> 
> Same re snotify confusion...

Yeah, again, I think it's useless. Doing that is more useless than tar'ing
the tracefs directory ;-)

> 
> > -	return eventfs_end_creating(dentry);
> > +	return dentry;
> >  }
> >  
> >  static void free_ei(struct eventfs_inode *ei)
> > @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
> >  }
> >  


> > @@ -607,79 +462,55 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> >  					  struct dentry *dentry,
> >  					  unsigned int flags)
> >  {
> > -	const struct file_operations *fops;
> > -	const struct eventfs_entry *entry;
> >  	struct eventfs_inode *ei_child;
> >  	struct tracefs_inode *ti;
> >  	struct eventfs_inode *ei;
> > -	struct dentry *ei_dentry = NULL;
> > -	struct dentry *ret = NULL;
> > -	struct dentry *d;
> >  	const char *name = dentry->d_name.name;
> > -	umode_t mode;
> > -	void *data;
> > -	int idx;
> > -	int i;
> > -	int r;
> > +	struct dentry *result = NULL;
> >  
> >  	ti = get_tracefs(dir);
> >  	if (!(ti->flags & TRACEFS_EVENT_INODE))  
> 
> 	Can that ever happen?  I mean, why set ->i_op to something that
> has this for ->lookup() on a directory without TRACEFS_EVENT_INODE in
> its inode?  It's not as if you ever removed that flag...

That's been there mostly as paranoia. Should probably be switched to:

	if (WARN_ON_ONCE(!(ti->flags & TRACEFS_EVENT_INODE)))


> 
> > -		return NULL;
> > -
> > -	/* Grab srcu to prevent the ei from going away */
> > -	idx = srcu_read_lock(&eventfs_srcu);
> > +		return ERR_PTR(-EIO);
> >  
> > -	/*
> > -	 * Grab the eventfs_mutex to consistent value from ti->private.
> > -	 * This s
> > -	 */
> >  	mutex_lock(&eventfs_mutex);
> > -	ei = READ_ONCE(ti->private);
> > -	if (ei && !ei->is_freed)
> > -		ei_dentry = READ_ONCE(ei->dentry);
> > -	mutex_unlock(&eventfs_mutex);
> > -
> > -	if (!ei || !ei_dentry)
> > -		goto out;
> >  
> > -	data = ei->data;
> > +	ei = ti->private;
> > +	if (!ei || ei->is_freed)
> > +		goto enoent;
> >  
> > -	list_for_each_entry_srcu(ei_child, &ei->children, list,
> > -				 srcu_read_lock_held(&eventfs_srcu)) {
> > +	list_for_each_entry(ei_child, &ei->children, list) {
> >  		if (strcmp(ei_child->name, name) != 0)
> >  			continue;
> > -		ret = simple_lookup(dir, dentry, flags);
> > -		if (IS_ERR(ret))
> > -			goto out;
> > -		d = create_dir_dentry(ei, ei_child, ei_dentry);
> > -		dput(d);
> > +		if (ei_child->is_freed)
> > +			goto enoent;  
> 
> Out of curiosity - can that happen now?  You've got exclusion with
> eventfs_remove_rec(), so you shouldn't be able to catch the moment
> between setting ->is_freed and removal from the list...

Yeah, that's from when we just used SRCU. If anything, it too should just
have a WARN_ON_ONCE() added to it.

> 
> > +		lookup_dir_entry(dentry, ei, ei_child);
> >  		goto out;
> >  	}
> >  
> > -	for (i = 0; i < ei->nr_entries; i++) {
> > -		entry = &ei->entries[i];
> > -		if (strcmp(name, entry->name) == 0) {
> > -			void *cdata = data;
> > -			mutex_lock(&eventfs_mutex);
> > -			/* If ei->is_freed, then the event itself may be too */
> > -			if (!ei->is_freed)
> > -				r = entry->callback(name, &mode, &cdata, &fops);
> > -			else
> > -				r = -1;
> > -			mutex_unlock(&eventfs_mutex);
> > -			if (r <= 0)
> > -				continue;
> > -			ret = simple_lookup(dir, dentry, flags);
> > -			if (IS_ERR(ret))
> > -				goto out;
> > -			d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
> > -			dput(d);
> > -			break;
> > -		}
> > +	for (int i = 0; i < ei->nr_entries; i++) {
> > +		void *data;
> > +		umode_t mode;
> > +		const struct file_operations *fops;
> > +		const struct eventfs_entry *entry = &ei->entries[i];
> > +
> > +		if (strcmp(name, entry->name) != 0)
> > +			continue;
> > +
> > +		data = ei->data;
> > +		if (entry->callback(name, &mode, &data, &fops) <= 0)
> > +			goto enoent;
> > +
> > +		lookup_file_dentry(dentry, ei, i, mode, data, fops);
> > +		goto out;
> >  	}
> > +
> > + enoent:
> > +	/* Don't cache negative lookups, just return an error */
> > +	result = ERR_PTR(-ENOENT);  
> 
> Huh?  Just return NULL and be done with that - you'll get an
> unhashed negative dentry and let the caller turn that into
> -ENOENT...

We had a problem here with just returning NULL. It leaves the negative
dentry around and doesn't get refreshed.

I did this:

 # cd /sys/kernel/tracing
 # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory
 # echo 'p:sched schedule' >> kprobe_events
 # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

When it should have been:

 # ls events/kprobes/sched/
enable  filter  format  hist  hist_debug  id  inject  trigger

Leaving the negative dentry there makes the lookup fail the next time,
even though the directory now exists.

-- Steve



> 
> >   out:
> > -	srcu_read_unlock(&eventfs_srcu, idx);
> > -	return ret;
> > +	mutex_unlock(&eventfs_mutex);
> > +	return result;
> >  }
> >  
> >  /*  



* Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-02-01  2:26     ` Steven Rostedt
@ 2024-02-01  3:02       ` Al Viro
  2024-02-01  3:21         ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2024-02-01  3:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, linux-fsdevel, Linus Torvalds,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Greg Kroah-Hartman, stable

On Wed, Jan 31, 2024 at 09:26:42PM -0500, Steven Rostedt wrote:

> > Huh?  Just return NULL and be done with that - you'll get an
> > unhashed negative dentry and let the caller turn that into
> > -ENOENT...
> 
> We had a problem here with just returning NULL. It leaves the negative
> dentry around and doesn't get refreshed.

Why would that dentry stick around?  And how would anyone find
it, anyway, when it's not hashed?

> I did this:
> 
>  # cd /sys/kernel/tracing
>  # ls events/kprobes/sched/
> ls: cannot access 'events/kprobes/sched/': No such file or directory
>  # echo 'p:sched schedule' >> kprobe_events
>  # ls events/kprobes/sched/
> ls: cannot access 'events/kprobes/sched/': No such file or directory
> 
> When it should have been:
> 
>  # ls events/kprobes/sched/
> enable  filter  format  hist  hist_debug  id  inject  trigger
> 
> Leaving the negative dentry there will have it fail when the directory
> exists the next time.

Then you have something very deeply fucked up.  NULL or ERR_PTR(-ENOENT)
from ->lookup() in the last component of open() would do exactly the
same thing: dput() whatever had been passed to ->lookup() and fail
open(2) with -ENOENT.
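
To make the equivalence above concrete, here is a userspace sketch, not
kernel code. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers are simplified
stand-ins for the kernel ones; struct toy_dentry, the two lookup functions
and toy_open_last() are hypothetical names invented for illustration. It
assumes the caller treats a NULL return as "use the dentry I passed in",
and that a dentry without an inode is negative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical toy dentry: inode == NULL means the dentry is negative. */
struct toy_dentry {
	const void *inode;
};

typedef struct toy_dentry *(*toy_lookup_t)(struct toy_dentry *);

/* Style 1: name not found, leave the dentry negative and return NULL. */
static struct toy_dentry *lookup_returns_null(struct toy_dentry *dentry)
{
	(void)dentry;
	return NULL;
}

/* Style 2: name not found, return an explicit error pointer. */
static struct toy_dentry *lookup_returns_enoent(struct toy_dentry *dentry)
{
	(void)dentry;
	return ERR_PTR(-ENOENT);
}

/* Hypothetical caller, loosely modelled on the last component of open(). */
static int toy_open_last(toy_lookup_t lookup)
{
	struct toy_dentry dentry = { .inode = NULL };
	struct toy_dentry *res;
	struct toy_dentry *found;

	assert(lookup != NULL);
	res = lookup(&dentry);
	if (IS_ERR(res))
		return (int)PTR_ERR(res);	/* error is passed through */

	/* NULL means "use the dentry that was passed in". */
	found = res ? res : &dentry;
	return found->inode ? 0 : -ENOENT;	/* negative dentry => -ENOENT */
}
```

In this model both toy_open_last(lookup_returns_null) and
toy_open_last(lookup_returns_enoent) yield -ENOENT, and nothing is left
behind for a later lookup to find.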


* Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-02-01  3:02       ` Al Viro
@ 2024-02-01  3:21         ` Steven Rostedt
  2024-02-01  4:18           ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2024-02-01  3:21 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, linux-trace-kernel, linux-fsdevel, Linus Torvalds,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Greg Kroah-Hartman, stable

On Thu, 1 Feb 2024 03:02:05 +0000
Al Viro <viro@zeniv.linux.org.uk> wrote:

> > We had a problem here with just returning NULL. It leaves the negative
> > dentry around and doesn't get refreshed.  
> 
> Why would that dentry stick around?  And how would anyone find
> it, anyway, when it's not hashed?

We (Linus and I) got it wrong. It originally had:

	d_add(dentry, NULL);
	[..]
	return NULL;

and it caused the:


  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

  # echo 'p:sched schedule' >> /sys/kernel/tracing/kprobe_events 
  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

I just changed the code to simply return NULL, and it had no issues:

  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

  # echo 'p:sched schedule' >> /sys/kernel/tracing/kprobe_events 
  # ls events/kprobes/sched/
enable  filter  format  hist  hist_debug  id  inject  trigger

But then I added back the d_add(dentry, NULL); that we originally had, and
it caused the issue again.

So it wasn't returning NULL that was causing the problem; it was calling
d_add(dentry, NULL).

I'll update the patch.
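
The two runs above can be modelled with a small userspace sketch. This is
a toy, not the dcache: struct toy_dir and toy_lookup() are hypothetical
names, and the model assumes that a hashed entry is always served from the
cache without reaching ->lookup() again (it ignores revalidation and
eviction). The hash_negative flag stands in for calling
d_add(dentry, NULL) on a miss.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one name ("sched") inside one directory. */
struct toy_dir {
	bool exists;		/* backing state: has the event been created? */
	bool cached;		/* is a dentry for the name hashed? */
	bool cached_positive;	/* does the hashed dentry have an inode? */
	int lookup_calls;	/* how often the ->lookup() path was reached */
};

/*
 * Look the name up. With hash_negative set, a miss hashes a negative
 * entry (the d_add(dentry, NULL) case); otherwise a miss caches nothing.
 */
static int toy_lookup(struct toy_dir *dir, bool hash_negative)
{
	assert(dir != NULL);

	/* Cache hit: answer from the cache, ->lookup() is not called. */
	if (dir->cached)
		return dir->cached_positive ? 0 : -ENOENT;

	dir->lookup_calls++;
	if (dir->exists) {
		dir->cached = true;
		dir->cached_positive = true;
		return 0;
	}

	if (hash_negative) {
		dir->cached = true;
		dir->cached_positive = false;
	}
	return -ENOENT;
}
```

With hash_negative set, a lookup before the kprobe exists poisons the
cache, so the second lookup still fails after exists becomes true. With it
clear, the second lookup reaches the lookup path again and succeeds.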

-- Steve


* Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy
  2024-02-01  3:21         ` Steven Rostedt
@ 2024-02-01  4:18           ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2024-02-01  4:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, linux-trace-kernel, linux-fsdevel, Linus Torvalds,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Greg Kroah-Hartman, stable

On Wed, 31 Jan 2024 22:21:27 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> We (Linus and I) got it wrong. It originally had:
> 
> 	d_add(dentry, NULL);
> 	[..]
> 	return NULL;

OK, so I changed that function to this:

static struct dentry *eventfs_root_lookup(struct inode *dir,
					  struct dentry *dentry,
					  unsigned int flags)
{
	struct eventfs_inode *ei_child;
	struct tracefs_inode *ti;
	struct eventfs_inode *ei;
	const char *name = dentry->d_name.name;

	ti = get_tracefs(dir);
	if (!(ti->flags & TRACEFS_EVENT_INODE))
		return ERR_PTR(-EIO);

	mutex_lock(&eventfs_mutex);

	ei = ti->private;
	if (!ei || ei->is_freed)
		goto out;

	list_for_each_entry(ei_child, &ei->children, list) {
		if (strcmp(ei_child->name, name) != 0)
			continue;
		if (ei_child->is_freed)
			goto out;
		lookup_dir_entry(dentry, ei, ei_child);
		goto out;
	}

	for (int i = 0; i < ei->nr_entries; i++) {
		void *data;
		umode_t mode;
		const struct file_operations *fops;
		const struct eventfs_entry *entry = &ei->entries[i];

		if (strcmp(name, entry->name) != 0)
			continue;

		data = ei->data;
		if (entry->callback(name, &mode, &data, &fops) <= 0)
			goto out;

		lookup_file_dentry(dentry, ei, i, mode, data, fops);
		goto out;
	}
 out:
	mutex_unlock(&eventfs_mutex);
	return NULL;
}

And it passes the make kprobe test. I'll send out a v3 of this patch, and
remove the inc_nlink(dentry->d_parent->d_inode) and the fsnotify as
separate patches as that code was there before Linus touched it.
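
For anyone who wants to poke at the control flow outside the kernel, the
search order of the function above (child directories first, then the
entries array, where a callback result <= 0 hides the file) can be
sketched in userspace as below. Everything prefixed with toy_ is a
hypothetical stand-in; locking, dentries and inodes are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_NONE, TOY_DIR, TOY_FILE };

/* Hypothetical callback: a result > 0 means the file should exist. */
typedef int (*toy_callback_t)(const char *name);

struct toy_entry {
	const char *name;
	toy_callback_t callback;
};

/* Hypothetical stand-in for an eventfs_inode. */
struct toy_ei {
	const char *const *children;	/* names of sub-directories */
	size_t nr_children;
	const struct toy_entry *entries;	/* files in this directory */
	size_t nr_entries;
	int is_freed;
};

static enum toy_kind toy_root_lookup(const struct toy_ei *ei, const char *name)
{
	assert(name != NULL);

	if (!ei || ei->is_freed)
		return TOY_NONE;

	/* Sub-directories are searched first. */
	for (size_t i = 0; i < ei->nr_children; i++) {
		if (strcmp(ei->children[i], name) == 0)
			return TOY_DIR;
	}

	/* Then the files; the callback can veto a matching name. */
	for (size_t i = 0; i < ei->nr_entries; i++) {
		if (strcmp(ei->entries[i].name, name) != 0)
			continue;
		if (ei->entries[i].callback(name) <= 0)
			return TOY_NONE;
		return TOY_FILE;
	}

	return TOY_NONE;
}

/* Two trivial callbacks for experimenting with the sketch. */
static int toy_always(const char *name) { (void)name; return 1; }
static int toy_never(const char *name) { (void)name; return 0; }
```

A miss simply falls through to TOY_NONE, which corresponds to the single
"return NULL" at the out: label.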

Thanks,

-- Steve


end of thread, other threads:[~2024-02-01  4:17 UTC | newest]

Thread overview: 14+ messages
2024-01-31 18:49 [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 2/7] eventfs: Initialize the tracefs inode properly Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 3/7] tracefs: Avoid using the ei->dentry pointer unnecessarily Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 4/7] tracefs: dentry lookup crapectomy Steven Rostedt
2024-02-01  0:27   ` Al Viro
2024-02-01  2:26     ` Steven Rostedt
2024-02-01  3:02       ` Al Viro
2024-02-01  3:21         ` Steven Rostedt
2024-02-01  4:18           ` Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 5/7] eventfs: Remove unused d_parent pointer field Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 6/7] eventfs: Clean up dentry ops and add revalidate function Steven Rostedt
2024-01-31 18:49 ` [PATCH v2 7/7] eventfs: Get rid of dentry pointers without refcounts Steven Rostedt
2024-01-31 19:17 ` [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy) Steven Rostedt
