* [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
@ 2022-08-20  0:05 Tejun Heo
  2022-08-20  0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team

Hello,

Currently, deactivated kernfs nodes are used for two purposes - during
removal to kill and drain nodes, and during creation to make the creation
of multiple kernfs_nodes succeed or fail as a group.

This patchset makes kernfs [de]activation generic so that it can be used
anytime to deactivate (hide and drain) and activate (show) kernfs nodes,
and, on top, implements cgroup_file_show() which allows toggling cgroup file
visibility.

This is for the following pending patchset to allow disabling PSI on a
per-cgroup basis:

 https://lore.kernel.org/all/20220808110341.15799-1-zhouchengming@bytedance.com/t/#u

which requires hiding the corresponding cgroup interface files while
disabled.
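
For illustration, a minimal sketch of how the pending PSI patchset could use
this (the cgroup_psi_toggle() helper and the psi_io_file field are
hypothetical placeholders, not part of this series):

	/* hypothetical sketch - helper and field names are placeholders */
	static void cgroup_psi_toggle(struct cgroup *cgrp, bool enable)
	{
		/* populated via a cftype with ->file_offset set (see patch 7) */
		struct cgroup_file *cfile = &cgrp->psi_io_file;

		cgroup_file_show(cfile, enable);
	}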

This patchset contains the following seven patches.

 0001-kernfs-Simply-by-replacing-kernfs_deref_open_node-wi.patch
 0002-kernfs-Drop-unnecessary-mutex-local-variable-initial.patch
 0003-kernfs-Refactor-kernfs_get_open_node.patch
 0004-kernfs-Skip-kernfs_drain_open_files-more-aggressivel.patch
 0005-kernfs-Make-kernfs_drain-skip-draining-more-aggressi.patch
 0006-kernfs-Allow-kernfs-nodes-to-be-deactivated-and-re-a.patch
 0007-cgroup-Implement-cgroup_file_show.patch

0001-0003 are misc prep patches. 0004-0006 implement kernfs_deactivate().
0007 implements cgroup_file_show() on top. The patches are also available in
the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-deactivate

diffstat follows. Thanks.

 fs/kernfs/dir.c             |  120 +++++++++++++++++++++++++++++++++++++++++-------------------
 fs/kernfs/file.c            |  139 +++++++++++++++++++++++++++++++---------------------------------------
 fs/kernfs/kernfs-internal.h |    1
 include/linux/cgroup.h      |    1
 include/linux/kernfs.h      |    2 +
 kernel/cgroup/cgroup.c      |   23 +++++++++++
 6 files changed, 172 insertions(+), 114 deletions(-)

--
tejun



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on()
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:15   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU
dereference is currently only used in kernfs_notify(). Everywhere else, we're
either holding the lock which protects it or we know that the
kernfs_open_node is pinned because we have a pointer to a kernfs_open_file
which is hanging off of it.

kernfs_deref_open_node() is used for the latter case - accessing
kernfs_open_node from kernfs_open_file. The lifetime and visibility rules
are simple and clear here. To someone who can access a kernfs_open_file, its
kernfs_open_node is pinned and visible through of->kn->attr.open.

Replace kernfs_deref_open_node() with the simpler of_on(). The former takes
both @kn and @of and RCU derefs @kn->attr.open while sanity checking against
@of. The latter takes @of and uses a protected deref on of->kn->attr.open.

As the return value can't be NULL, remove the error handling in the callers
too.

This shouldn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
---
 fs/kernfs/file.c | 56 +++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 43 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index b3ec34386b43..32b16fe00a9e 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -57,31 +57,17 @@ static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
- *
- * @of: associated kernfs_open_file instance.
- * @kn: target kernfs_node.
- *
- * Fetch and return ->attr.open of @kn if @of->list is non empty.
- * If @of->list is not empty we can safely assume that @of is on
- * @kn->attr.open->files list and this guarantees that @kn->attr.open
- * will not vanish i.e. dereferencing outside RCU read-side critical
- * section is safe here.
- *
- * The caller needs to make sure that @of->list is not empty.
+ * of_on - Return the kernfs_open_node of the specified kernfs_open_file
+ * @of: target kernfs_open_file
  */
-static struct kernfs_open_node *
-kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
+static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
 {
-	struct kernfs_open_node *on;
-
-	on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
-
-	return on;
+	return rcu_dereference_protected(of->kn->attr.open,
+					 !list_empty(&of->list));
 }
 
 /**
- * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
+ * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
  *
  * @kn: target kernfs_node.
  *
@@ -96,7 +82,7 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
  * The caller needs to make sure that kernfs_open_file_mutex is held.
  */
 static struct kernfs_open_node *
-kernfs_deref_open_node_protected(struct kernfs_node *kn)
+kernfs_deref_open_node_locked(struct kernfs_node *kn)
 {
 	return rcu_dereference_protected(kn->attr.open,
 				lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
@@ -207,12 +193,8 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
 static int kernfs_seq_show(struct seq_file *sf, void *v)
 {
 	struct kernfs_open_file *of = sf->private;
-	struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
-
-	if (!on)
-		return -EINVAL;
 
-	of->event = atomic_read(&on->event);
+	of->event = atomic_read(&of_on(of)->event);
 
 	return of->kn->attr.ops->seq_show(sf, v);
 }
@@ -235,7 +217,6 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
 	ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
 	const struct kernfs_ops *ops;
-	struct kernfs_open_node *on;
 	char *buf;
 
 	buf = of->prealloc_buf;
@@ -257,14 +238,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		goto out_free;
 	}
 
-	on = kernfs_deref_open_node(of, of->kn);
-	if (!on) {
-		len = -EINVAL;
-		mutex_unlock(&of->mutex);
-		goto out_free;
-	}
-
-	of->event = atomic_read(&on->event);
+	of->event = atomic_read(&of_on(of)->event);
 
 	ops = kernfs_ops(of->kn);
 	if (ops->read)
@@ -584,7 +558,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 	struct mutex *mutex = NULL;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
-	on = kernfs_deref_open_node_protected(kn);
+	on = kernfs_deref_open_node_locked(kn);
 
 	if (on) {
 		list_add_tail(&of->list, &on->files);
@@ -629,7 +603,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 
 	mutex = kernfs_open_file_mutex_lock(kn);
 
-	on = kernfs_deref_open_node_protected(kn);
+	on = kernfs_deref_open_node_locked(kn);
 	if (!on) {
 		mutex_unlock(mutex);
 		return;
@@ -839,7 +813,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 		return;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
-	on = kernfs_deref_open_node_protected(kn);
+	on = kernfs_deref_open_node_locked(kn);
 	if (!on) {
 		mutex_unlock(mutex);
 		return;
@@ -874,11 +848,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
  */
 __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
 {
-	struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
-	struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
-
-	if (!on)
-		return EPOLLERR;
+	struct kernfs_open_node *on = of_on(of);
 
 	poll_wait(of->file, &on->poll, wait);
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
  2022-08-20  0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:15   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

These are unnecessary and unconventional. Remove them. Also move the variable
declaration into the block where it's used. No functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
---
 fs/kernfs/file.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 32b16fe00a9e..6437f7c7162d 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -555,7 +555,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 				struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on, *new_on = NULL;
-	struct mutex *mutex = NULL;
+	struct mutex *mutex;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_locked(kn);
@@ -599,7 +599,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 				 struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on;
-	struct mutex *mutex = NULL;
+	struct mutex *mutex;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
 
@@ -776,9 +776,10 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 {
 	struct kernfs_node *kn = inode->i_private;
 	struct kernfs_open_file *of = kernfs_of(filp);
-	struct mutex *mutex = NULL;
 
 	if (kn->flags & KERNFS_HAS_RELEASE) {
+		struct mutex *mutex;
+
 		mutex = kernfs_open_file_mutex_lock(kn);
 		kernfs_release_file(kn, of);
 		mutex_unlock(mutex);
@@ -796,7 +797,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
 	struct kernfs_open_file *of;
-	struct mutex *mutex = NULL;
+	struct mutex *mutex;
 
 	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
 		return;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] kernfs: Refactor kernfs_get_open_node()
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
  2022-08-20  0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
  2022-08-20  0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:16   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

Factor out the common part. This is cleaner and should help with future
changes. No functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 6437f7c7162d..7060a2a714b8 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -554,31 +554,28 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 static int kernfs_get_open_node(struct kernfs_node *kn,
 				struct kernfs_open_file *of)
 {
-	struct kernfs_open_node *on, *new_on = NULL;
+	struct kernfs_open_node *on;
 	struct mutex *mutex;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_locked(kn);
 
-	if (on) {
-		list_add_tail(&of->list, &on->files);
-		mutex_unlock(mutex);
-		return 0;
-	} else {
+	if (!on) {
 		/* not there, initialize a new one */
-		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
-		if (!new_on) {
+		on = kmalloc(sizeof(*on), GFP_KERNEL);
+		if (!on) {
 			mutex_unlock(mutex);
 			return -ENOMEM;
 		}
-		atomic_set(&new_on->event, 1);
-		init_waitqueue_head(&new_on->poll);
-		INIT_LIST_HEAD(&new_on->files);
-		list_add_tail(&of->list, &new_on->files);
-		rcu_assign_pointer(kn->attr.open, new_on);
+		atomic_set(&on->event, 1);
+		init_waitqueue_head(&on->poll);
+		INIT_LIST_HEAD(&on->files);
+		rcu_assign_pointer(kn->attr.open, on);
 	}
-	mutex_unlock(mutex);
 
+	list_add_tail(&of->list, &on->files);
+
+	mutex_unlock(mutex);
 	return 0;
 }
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
                   ` (2 preceding siblings ...)
  2022-08-20  0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:27   ` Chengming Zhou
  2022-08-25 12:11   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

Track the number of mmapped files and files that need to be released and
skip kernfs_drain_open_files() if both are zero, which are precisely the
conditions that require draining open files. The early exit test is factored
into kernfs_should_drain_open_files() which is now tested by
kernfs_drain_open_files()'s caller - kernfs_drain().

This isn't a meaningful optimization on its own but will enable a future
stand-alone kernfs_deactivate() implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c             |  3 ++-
 fs/kernfs/file.c            | 51 +++++++++++++++++++++++++------------
 fs/kernfs/kernfs-internal.h |  1 +
 3 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 1cc88ba6de90..8ae44db920d4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
 		rwsem_release(&kn->dep_map, _RET_IP_);
 	}
 
-	kernfs_drain_open_files(kn);
+	if (kernfs_should_drain_open_files(kn))
+		kernfs_drain_open_files(kn);
 
 	down_write(&root->kernfs_rwsem);
 }
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 7060a2a714b8..b510589af427 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -23,6 +23,8 @@ struct kernfs_open_node {
 	atomic_t		event;
 	wait_queue_head_t	poll;
 	struct list_head	files; /* goes through kernfs_open_file.list */
+	unsigned int		nr_mmapped;
+	unsigned int		nr_to_release;
 };
 
 /*
@@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 
 	rc = 0;
 	of->mmapped = true;
+	of_on(of)->nr_mmapped++;
 	of->vm_ops = vma->vm_ops;
 	vma->vm_ops = &kernfs_vm_ops;
 out_put:
@@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 	}
 
 	list_add_tail(&of->list, &on->files);
+	if (kn->flags & KERNFS_HAS_RELEASE)
+		on->nr_to_release++;
 
 	mutex_unlock(mutex);
 	return 0;
@@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 		return;
 	}
 
-	if (of)
+	if (of) {
+		WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
+		if (of->mmapped)
+			on->nr_mmapped--;
 		list_del(&of->list);
+	}
 
 	if (list_empty(&on->files)) {
 		rcu_assign_pointer(kn->attr.open, NULL);
@@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
 		 */
 		kn->attr.ops->release(of);
 		of->released = true;
+		of_on(of)->nr_to_release--;
 	}
 }
 
@@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-void kernfs_drain_open_files(struct kernfs_node *kn)
+bool kernfs_should_drain_open_files(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
-	struct kernfs_open_file *of;
-	struct mutex *mutex;
-
-	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
-		return;
+	bool ret;
 
 	/*
-	 * lockless opportunistic check is safe below because no one is adding to
-	 * ->attr.open at this point of time. This check allows early bail out
-	 * if ->attr.open is already NULL. kernfs_unlink_open_file makes
-	 * ->attr.open NULL only while holding kernfs_open_file_mutex so below
-	 * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
-	 * ->attr.open became NULL while waiting for the mutex.
+	 * @kn being deactivated guarantees that @kn->attr.open can't change
+	 * beneath us making the lockless test below safe.
 	 */
-	if (!rcu_access_pointer(kn->attr.open))
-		return;
+	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
+
+	rcu_read_lock();
+	on = rcu_dereference(kn->attr.open);
+	ret = on && (on->nr_mmapped || on->nr_to_release);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+void kernfs_drain_open_files(struct kernfs_node *kn)
+{
+	struct kernfs_open_node *on;
+	struct kernfs_open_file *of;
+	struct mutex *mutex;
 
 	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_locked(kn);
@@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 	list_for_each_entry(of, &on->files, list) {
 		struct inode *inode = file_inode(of->file);
 
-		if (kn->flags & KERNFS_HAS_MMAP)
+		if (of->mmapped) {
 			unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+			of->mmapped = false;
+			on->nr_mmapped--;
+		}
 
 		if (kn->flags & KERNFS_HAS_RELEASE)
 			kernfs_release_file(kn, of);
 	}
 
+	WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
 	mutex_unlock(mutex);
 }
 
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 3ae214d02d44..fc5821effd97 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
  */
 extern const struct file_operations kernfs_file_fops;
 
+bool kernfs_should_drain_open_files(struct kernfs_node *kn);
 void kernfs_drain_open_files(struct kernfs_node *kn);
 
 /*
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] kernfs: Make kernfs_drain() skip draining more aggressively
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
                   ` (3 preceding siblings ...)
  2022-08-20  0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:33   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

kernfs_drain() is updated to decide whether draining is necessary instead
of leaving that decision to its caller. __kernfs_remove() now always calls
kernfs_drain() instead of filtering based on KERNFS_ACTIVATED.

kernfs_drain() now tests kn->active and kernfs_should_drain_open_files() to
determine whether draining is necessary at all. If not, it returns %false
without doing anything. Otherwise, it unlocks kernfs_rwsem, drains as
before and returns %true. The return value isn't used yet.

Using the precise conditions allows skipping more aggressively. This isn't a
meaningful optimization on its own but will enable a future stand-alone
kernfs_deactivate() implementation.

While at it, make minor comment updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 8ae44db920d4..f857731598cd 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -460,10 +460,14 @@ void kernfs_put_active(struct kernfs_node *kn)
  * @kn: kernfs_node to drain
  *
  * Drain existing usages and nuke all existing mmaps of @kn.  Mutiple
- * removers may invoke this function concurrently on @kn and all will
+ * callers may invoke this function concurrently on @kn and all will
  * return after draining is complete.
+ *
+ * RETURNS:
+ * %false if nothing needed to be drained; otherwise, %true. On %true return,
+ * kernfs_rwsem has been released and re-acquired.
  */
-static void kernfs_drain(struct kernfs_node *kn)
+static bool kernfs_drain(struct kernfs_node *kn)
 	__releases(&kernfs_root(kn)->kernfs_rwsem)
 	__acquires(&kernfs_root(kn)->kernfs_rwsem)
 {
@@ -472,6 +476,16 @@ static void kernfs_drain(struct kernfs_node *kn)
 	lockdep_assert_held_write(&root->kernfs_rwsem);
 	WARN_ON_ONCE(kernfs_active(kn));
 
+	/*
+	 * Skip draining if already fully drained. This avoids draining and its
+	 * lockdep annotations for nodes which have never been activated
+	 * allowing embedding kernfs_remove() in create error paths without
+	 * worrying about draining.
+	 */
+	if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
+	    kernfs_should_drain_open_files(kn))
+		return false;
+
 	up_write(&root->kernfs_rwsem);
 
 	if (kernfs_lockdep(kn)) {
@@ -480,7 +494,6 @@ static void kernfs_drain(struct kernfs_node *kn)
 			lock_contended(&kn->dep_map, _RET_IP_);
 	}
 
-	/* but everyone should wait for draining */
 	wait_event(root->deactivate_waitq,
 		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
 
@@ -493,6 +506,8 @@ static void kernfs_drain(struct kernfs_node *kn)
 		kernfs_drain_open_files(kn);
 
 	down_write(&root->kernfs_rwsem);
+
+	return true;
 }
 
 /**
@@ -1370,23 +1385,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
 		pos = kernfs_leftmost_descendant(kn);
 
 		/*
-		 * kernfs_drain() drops kernfs_rwsem temporarily and @pos's
+		 * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
 		 * base ref could have been put by someone else by the time
 		 * the function returns.  Make sure it doesn't go away
 		 * underneath us.
 		 */
 		kernfs_get(pos);
 
-		/*
-		 * Drain iff @kn was activated.  This avoids draining and
-		 * its lockdep annotations for nodes which have never been
-		 * activated and allows embedding kernfs_remove() in create
-		 * error paths without worrying about draining.
-		 */
-		if (kn->flags & KERNFS_ACTIVATED)
-			kernfs_drain(pos);
-		else
-			WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
+		kernfs_drain(pos);
 
 		/*
 		 * kernfs_unlink_sibling() succeeds once per node.  Use it
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
                   ` (4 preceding siblings ...)
  2022-08-20  0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-23  5:49   ` Chengming Zhou
  2022-08-20  0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
  2022-08-22  1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
  7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

Currently, kernfs nodes can be created deactivated and activated later to
allow the creation of multiple nodes to succeed or fail as a group. Extend
this capability so that kernfs nodes can be deactivated and re-activated at
any time, any number of times. This can be used to toggle interface files
for features which can be dynamically turned on and off.

kernfs_activate()'s skip conditions are updated so that it no longer ignores
re-activations while still suppressing re-activation of files which are
being removed.
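
For a kernfs user, the intended usage is symmetric. A minimal sketch,
assuming @kn is a file node which the caller created earlier and still owns:

	/* minimal usage sketch, not taken from an in-tree user */
	kernfs_deactivate(kn);	/* hide @kn and drain in-flight ops and mmaps */

	/* ... the node is invisible to userland while disabled ... */

	kernfs_activate(kn);	/* make @kn (and its subtree) visible again */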

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/kernfs/dir.c        | 89 ++++++++++++++++++++++++++++++------------
 include/linux/kernfs.h |  2 +
 2 files changed, 65 insertions(+), 26 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index f857731598cd..6db031362585 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -483,7 +483,7 @@ static bool kernfs_drain(struct kernfs_node *kn)
 	 * worrying about draining.
 	 */
 	if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
-	    kernfs_should_drain_open_files(kn))
+	    !kernfs_should_drain_open_files(kn))
 		return false;
 
 	up_write(&root->kernfs_rwsem);
@@ -1321,14 +1321,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
 }
 
 /**
- * kernfs_activate - activate a node which started deactivated
+ * kernfs_activate - activate a node's subtree
  * @kn: kernfs_node whose subtree is to be activated
  *
- * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node
- * needs to be explicitly activated.  A node which hasn't been activated
- * isn't visible to userland and deactivation is skipped during its
- * removal.  This is useful to construct atomic init sequences where
- * creation of multiple nodes should either succeed or fail atomically.
+ * If newly created on a root w/ %KERNFS_ROOT_CREATE_DEACTIVATED or after a
+ * kernfs_deactivate() call, @kn is deactivated and invisible to userland. This
+ * function activates all nodes in @kn's inclusive subtree making them visible.
+ *
+ * %KERNFS_ROOT_CREATE_DEACTIVATED is useful when constructing init sequences
+ * where creation of multiple nodes should either succeed or fail atomically.
  *
  * The caller is responsible for ensuring that this function is not called
  * after kernfs_remove*() is invoked on @kn.
@@ -1342,7 +1343,7 @@ void kernfs_activate(struct kernfs_node *kn)
 
 	pos = NULL;
 	while ((pos = kernfs_next_descendant_post(pos, kn))) {
-		if (pos->flags & KERNFS_ACTIVATED)
+		if (kernfs_active(pos) || (kn->flags & KERNFS_REMOVING))
 			continue;
 
 		WARN_ON_ONCE(pos->parent && RB_EMPTY_NODE(&pos->rb));
@@ -1355,6 +1356,58 @@ void kernfs_activate(struct kernfs_node *kn)
 	up_write(&root->kernfs_rwsem);
 }
 
+static void kernfs_deactivate_locked(struct kernfs_node *kn, bool removing)
+{
+	struct kernfs_root *root = kernfs_root(kn);
+	struct kernfs_node *pos;
+
+	lockdep_assert_held_write(&root->kernfs_rwsem);
+
+	/* prevent any new usage under @kn by deactivating all nodes */
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		if (kernfs_active(pos))
+			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+		if (removing)
+			pos->flags |= KERNFS_REMOVING;
+	}
+
+	/*
+	 * No new active usage can be created. Drain existing ones. As
+	 * kernfs_drain() may drop kernfs_rwsem temporarily, pin @pos so that it
+	 * doesn't go away underneath us.
+	 *
+	 * If kernfs_rwsem was released, restart from the beginning. Forward
+	 * progress is guaranteed as a drained node is guaranteed to stay
+	 * drained. In the unlikely case that the loop restart ever becomes a
+	 * problem, we should be able to work around by batching up the
+	 * draining.
+	 */
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		kernfs_get(pos);
+		if (kernfs_drain(pos))
+			pos = NULL;
+		kernfs_put(pos);
+	}
+}
+
+/**
+ * kernfs_deactivate - deactivate a node's subtree
+ * @kn: kernfs_node whose subtree is to be deactivated
+ *
+ * Deactivate @kn's inclusive subtree. On return, the subtree is invisible to
+ * userland and there are no in-flight file operations.
+ */
+void kernfs_deactivate(struct kernfs_node *kn)
+{
+	struct kernfs_root *root = kernfs_root(kn);
+
+	down_write(&root->kernfs_rwsem);
+	kernfs_deactivate_locked(kn, false);
+	up_write(&root->kernfs_rwsem);
+}
+
 static void __kernfs_remove(struct kernfs_node *kn)
 {
 	struct kernfs_node *pos;
@@ -1374,26 +1427,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
 
 	pr_debug("kernfs %s: removing\n", kn->name);
 
-	/* prevent any new usage under @kn by deactivating all nodes */
-	pos = NULL;
-	while ((pos = kernfs_next_descendant_post(pos, kn)))
-		if (kernfs_active(pos))
-			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+	kernfs_deactivate_locked(kn, true);
 
-	/* deactivate and unlink the subtree node-by-node */
+	/* unlink the subtree node-by-node */
 	do {
 		pos = kernfs_leftmost_descendant(kn);
 
-		/*
-		 * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
-		 * base ref could have been put by someone else by the time
-		 * the function returns.  Make sure it doesn't go away
-		 * underneath us.
-		 */
-		kernfs_get(pos);
-
-		kernfs_drain(pos);
-
 		/*
 		 * kernfs_unlink_sibling() succeeds once per node.  Use it
 		 * to decide who's responsible for cleanups.
@@ -1410,8 +1449,6 @@ static void __kernfs_remove(struct kernfs_node *kn)
 
 			kernfs_put(pos);
 		}
-
-		kernfs_put(pos);
 	} while (pos != kn);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 367044d7708c..657eea1395b6 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -112,6 +112,7 @@ enum kernfs_node_flag {
 	KERNFS_SUICIDED		= 0x0800,
 	KERNFS_EMPTY_DIR	= 0x1000,
 	KERNFS_HAS_RELEASE	= 0x2000,
+	KERNFS_REMOVING		= 0x4000,
 };
 
 /* @flags for kernfs_create_root() */
@@ -429,6 +430,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 				       const char *name,
 				       struct kernfs_node *target);
 void kernfs_activate(struct kernfs_node *kn);
+void kernfs_deactivate(struct kernfs_node *kn);
 void kernfs_remove(struct kernfs_node *kn);
 void kernfs_break_active_protection(struct kernfs_node *kn);
 void kernfs_unbreak_active_protection(struct kernfs_node *kn);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] cgroup: Implement cgroup_file_show()
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
                   ` (5 preceding siblings ...)
  2022-08-20  0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
@ 2022-08-20  0:05 ` Tejun Heo
  2022-08-22  1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
  7 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20  0:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
	Imran Khan, kernel-team, Tejun Heo

Add cgroup_file_show() which allows toggling visibility of a cgroup file
using the new kernfs_deactivate(). This will be used to hide psi interface
files on cgroups where it's disabled.
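
A sketch of the expected wiring on the user side follows. The "foo.pressure"
name, the psi_io_file field in struct cgroup and the foo_pressure_show()
callback are hypothetical placeholders, not something this patch adds:

	/* hypothetical sketch - names are placeholders */
	static struct cftype foo_files[] = {
		{
			.name		= "foo.pressure",
			.file_offset	= offsetof(struct cgroup, psi_io_file),
			.seq_show	= foo_pressure_show,
		},
		{ }	/* terminate */
	};

	/* called when the feature is toggled on @cgrp */
	static void foo_update_visibility(struct cgroup *cgrp, bool enabled)
	{
		cgroup_file_show(&cgrp->psi_io_file, enabled);
	}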

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/cgroup.h |  1 +
 kernel/cgroup/cgroup.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed53bfe7c46c..f61dd135efbe 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -114,6 +114,7 @@ int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 int cgroup_rm_cftypes(struct cftype *cfts);
 void cgroup_file_notify(struct cgroup_file *cfile);
+void cgroup_file_show(struct cgroup_file *cfile, bool show);
 
 int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
 int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ffaccd6373f1..eb096499c72b 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4339,6 +4339,29 @@ void cgroup_file_notify(struct cgroup_file *cfile)
 	spin_unlock_irqrestore(&cgroup_file_kn_lock, flags);
 }
 
+/**
+ * cgroup_file_show - show or hide a hidden cgroup file
+ * @cfile: target cgroup_file obtained by setting cftype->file_offset
+ * @show: whether to show or hide
+ */
+void cgroup_file_show(struct cgroup_file *cfile, bool show)
+{
+	struct kernfs_node *kn;
+
+	spin_lock_irq(&cgroup_file_kn_lock);
+	kn = cfile->kn;
+	kernfs_get(kn);
+	spin_unlock_irq(&cgroup_file_kn_lock);
+
+	if (show)
+		kernfs_activate(kn);
+	else
+		kernfs_deactivate(kn);
+
+	kernfs_put(kn);
+}
+
+
 /**
  * css_next_child - find the next child of a given css
  * @pos: the current position (%NULL to initiate traversal)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
  2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
                   ` (6 preceding siblings ...)
  2022-08-20  0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
@ 2022-08-22  1:58 ` Chengming Zhou
  2022-08-22  7:10   ` Tejun Heo
  7 siblings, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-22  1:58 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> Hello,
> 
> Currently, deactivated kernfs nodes are used for two purposes - during
> removal to kill and drain nodes, and during creation to make the creation
> of multiple kernfs_nodes succeed or fail as a group.
> 
> This patchset makes kernfs [de]activation generic so that it can be used
> anytime to deactivate (hide and drain) and activate (show) kernfs nodes,
> and, on top, implements cgroup_file_show() which allows toggling cgroup file
> visibility.
> 
> This is for the following pending patchset to allow disabling PSI on a
> per-cgroup basis:
> 
>  https://lore.kernel.org/all/20220808110341.15799-1-zhouchengming@bytedance.com/t/#u
> 
> which requires hiding the corresponding cgroup interface files while
> disabled.

Hello,

It's so nice of you to implement this! I will rebase my per-cgroup PSI
disable/re-enable work on top of yours and test it today.

Thanks!

> 
> This patchset contains the following seven patches.
> 
>  0001-kernfs-Simply-by-replacing-kernfs_deref_open_node-wi.patch
>  0002-kernfs-Drop-unnecessary-mutex-local-variable-initial.patch
>  0003-kernfs-Refactor-kernfs_get_open_node.patch
>  0004-kernfs-Skip-kernfs_drain_open_files-more-aggressivel.patch
>  0005-kernfs-Make-kernfs_drain-skip-draining-more-aggressi.patch
>  0006-kernfs-Allow-kernfs-nodes-to-be-deactivated-and-re-a.patch
>  0007-cgroup-Implement-cgroup_file_show.patch
> 
> 0001-0003 are misc prep patches. 0004-0006 implement kernfs_deactivate().
> 0007 implements cgroup_file_show() on top. The patches are also available in
> the following git branch:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-deactivate
> 
> diffstat follows. Thanks.
> 
>  fs/kernfs/dir.c             |  120 +++++++++++++++++++++++++++++++++++++++++-------------------
>  fs/kernfs/file.c            |  139 +++++++++++++++++++++++++++++++---------------------------------------
>  fs/kernfs/kernfs-internal.h |    1
>  include/linux/cgroup.h      |    1
>  include/linux/kernfs.h      |    2 +
>  kernel/cgroup/cgroup.c      |   23 +++++++++++
>  6 files changed, 172 insertions(+), 114 deletions(-)
> 
> --
> tejun
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
  2022-08-22  1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
@ 2022-08-22  7:10   ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-22  7:10 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
	Imran Khan, kernel-team

Hello,

On Mon, Aug 22, 2022 at 09:58:22AM +0800, Chengming Zhou wrote:
> It's so nice of you for implementing this! I will rebase on your work to
> do per-cgroup PSI disable/re-enable and test it today.

Oh, I always kinda wanted it and am likely to need it for something else
that I'm working on, so it isn't all that altruistic.

Thanks for working on PSI optimizations!

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on()
  2022-08-20  0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
@ 2022-08-23  5:15   ` Chengming Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:15 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU
> dereference is currently only used in kernfs_notify(). Everywhere else, we're
> either holding the lock which protects it or we know that the
> kernfs_open_node is pinned because we have a pointer to a kernfs_open_file
> which is hanging off of it.
> 
> kernfs_deref_open_node() is used for the latter case - accessing
> kernfs_open_node from kernfs_open_file. The lifetime and visibility rules
> are simple and clear here. To someone who can access a kernfs_open_file, its
> kernfs_open_node is pinned and visible through of->kn->attr.open.
> 
> Replace kernfs_deref_open_node() with the simpler of_on(). The former takes
> both @kn and @of and RCU derefs @kn->attr.open while sanity checking against
> @of. The latter takes @of and uses a protected deref on of->kn->attr.open.
> 
> As the return value can't be NULL, remove the error handling in the callers
> too.
> 
> This shouldn't cause any functional changes.

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

Thanks.

> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> ---
>  fs/kernfs/file.c | 56 +++++++++++-------------------------------------
>  1 file changed, 13 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index b3ec34386b43..32b16fe00a9e 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -57,31 +57,17 @@ static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
>  }
>  
>  /**
> - * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
> - *
> - * @of: associated kernfs_open_file instance.
> - * @kn: target kernfs_node.
> - *
> - * Fetch and return ->attr.open of @kn if @of->list is non empty.
> - * If @of->list is not empty we can safely assume that @of is on
> - * @kn->attr.open->files list and this guarantees that @kn->attr.open
> - * will not vanish i.e. dereferencing outside RCU read-side critical
> - * section is safe here.
> - *
> - * The caller needs to make sure that @of->list is not empty.
> + * of_on - Return the kernfs_open_node of the specified kernfs_open_file
> + * @of: target kernfs_open_file
>   */
> -static struct kernfs_open_node *
> -kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
> +static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
>  {
> -	struct kernfs_open_node *on;
> -
> -	on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
> -
> -	return on;
> +	return rcu_dereference_protected(of->kn->attr.open,
> +					 !list_empty(&of->list));
>  }
>  
>  /**
> - * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
> + * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
>   *
>   * @kn: target kernfs_node.
>   *
> @@ -96,7 +82,7 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
>   * The caller needs to make sure that kernfs_open_file_mutex is held.
>   */
>  static struct kernfs_open_node *
> -kernfs_deref_open_node_protected(struct kernfs_node *kn)
> +kernfs_deref_open_node_locked(struct kernfs_node *kn)
>  {
>  	return rcu_dereference_protected(kn->attr.open,
>  				lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
> @@ -207,12 +193,8 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
>  static int kernfs_seq_show(struct seq_file *sf, void *v)
>  {
>  	struct kernfs_open_file *of = sf->private;
> -	struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
> -
> -	if (!on)
> -		return -EINVAL;
>  
> -	of->event = atomic_read(&on->event);
> +	of->event = atomic_read(&of_on(of)->event);
>  
>  	return of->kn->attr.ops->seq_show(sf, v);
>  }
> @@ -235,7 +217,6 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
>  	ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
>  	const struct kernfs_ops *ops;
> -	struct kernfs_open_node *on;
>  	char *buf;
>  
>  	buf = of->prealloc_buf;
> @@ -257,14 +238,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  		goto out_free;
>  	}
>  
> -	on = kernfs_deref_open_node(of, of->kn);
> -	if (!on) {
> -		len = -EINVAL;
> -		mutex_unlock(&of->mutex);
> -		goto out_free;
> -	}
> -
> -	of->event = atomic_read(&on->event);
> +	of->event = atomic_read(&of_on(of)->event);
>  
>  	ops = kernfs_ops(of->kn);
>  	if (ops->read)
> @@ -584,7 +558,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
>  	struct mutex *mutex = NULL;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
> -	on = kernfs_deref_open_node_protected(kn);
> +	on = kernfs_deref_open_node_locked(kn);
>  
>  	if (on) {
>  		list_add_tail(&of->list, &on->files);
> @@ -629,7 +603,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  
> -	on = kernfs_deref_open_node_protected(kn);
> +	on = kernfs_deref_open_node_locked(kn);
>  	if (!on) {
>  		mutex_unlock(mutex);
>  		return;
> @@ -839,7 +813,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>  		return;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
> -	on = kernfs_deref_open_node_protected(kn);
> +	on = kernfs_deref_open_node_locked(kn);
>  	if (!on) {
>  		mutex_unlock(mutex);
>  		return;
> @@ -874,11 +848,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>   */
>  __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
>  {
> -	struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
> -	struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
> -
> -	if (!on)
> -		return EPOLLERR;
> +	struct kernfs_open_node *on = of_on(of);
>  
>  	poll_wait(of->file, &on->poll, wait);
>  

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization
  2022-08-20  0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
@ 2022-08-23  5:15   ` Chengming Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:15 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> These are unnecessary and unconventional. Remove them. Also move the variable
> declaration into the block where it's used. No functional changes.
> 

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

Thanks.

> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> ---
>  fs/kernfs/file.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 32b16fe00a9e..6437f7c7162d 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -555,7 +555,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
>  				struct kernfs_open_file *of)
>  {
>  	struct kernfs_open_node *on, *new_on = NULL;
> -	struct mutex *mutex = NULL;
> +	struct mutex *mutex;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  	on = kernfs_deref_open_node_locked(kn);
> @@ -599,7 +599,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
>  				 struct kernfs_open_file *of)
>  {
>  	struct kernfs_open_node *on;
> -	struct mutex *mutex = NULL;
> +	struct mutex *mutex;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  
> @@ -776,9 +776,10 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
>  {
>  	struct kernfs_node *kn = inode->i_private;
>  	struct kernfs_open_file *of = kernfs_of(filp);
> -	struct mutex *mutex = NULL;
>  
>  	if (kn->flags & KERNFS_HAS_RELEASE) {
> +		struct mutex *mutex;
> +
>  		mutex = kernfs_open_file_mutex_lock(kn);
>  		kernfs_release_file(kn, of);
>  		mutex_unlock(mutex);
> @@ -796,7 +797,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>  {
>  	struct kernfs_open_node *on;
>  	struct kernfs_open_file *of;
> -	struct mutex *mutex = NULL;
> +	struct mutex *mutex;
>  
>  	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
>  		return;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] kernfs: Refactor kernfs_get_open_node()
  2022-08-20  0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
@ 2022-08-23  5:16   ` Chengming Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:16 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> Factor out the common part. This is cleaner and should help with future
> changes. No functional changes.
> 

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

Thanks.

> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/kernfs/file.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 6437f7c7162d..7060a2a714b8 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -554,31 +554,28 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>  static int kernfs_get_open_node(struct kernfs_node *kn,
>  				struct kernfs_open_file *of)
>  {
> -	struct kernfs_open_node *on, *new_on = NULL;
> +	struct kernfs_open_node *on;
>  	struct mutex *mutex;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  	on = kernfs_deref_open_node_locked(kn);
>  
> -	if (on) {
> -		list_add_tail(&of->list, &on->files);
> -		mutex_unlock(mutex);
> -		return 0;
> -	} else {
> +	if (!on) {
>  		/* not there, initialize a new one */
> -		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
> -		if (!new_on) {
> +		on = kmalloc(sizeof(*on), GFP_KERNEL);
> +		if (!on) {
>  			mutex_unlock(mutex);
>  			return -ENOMEM;
>  		}
> -		atomic_set(&new_on->event, 1);
> -		init_waitqueue_head(&new_on->poll);
> -		INIT_LIST_HEAD(&new_on->files);
> -		list_add_tail(&of->list, &new_on->files);
> -		rcu_assign_pointer(kn->attr.open, new_on);
> +		atomic_set(&on->event, 1);
> +		init_waitqueue_head(&on->poll);
> +		INIT_LIST_HEAD(&on->files);
> +		rcu_assign_pointer(kn->attr.open, on);
>  	}
> -	mutex_unlock(mutex);
>  
> +	list_add_tail(&of->list, &on->files);
> +
> +	mutex_unlock(mutex);
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
  2022-08-20  0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
@ 2022-08-23  5:27   ` Chengming Zhou
  2022-08-23 19:37     ` Tejun Heo
  2022-08-25 12:11   ` Chengming Zhou
  1 sibling, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:27 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> Track the number of mmapped files and files that need to be released and
> skip kernfs_drain_open_files() if both are zero, which are precisely the
> conditions that require draining open files. The early exit test is factored
> into kernfs_should_drain_open_files() which is now tested by
> kernfs_drain_open_files()'s caller - kernfs_drain().
> 
> This isn't a meaningful optimization on its own but will enable a future
> stand-alone kernfs_deactivate() implementation.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/kernfs/dir.c             |  3 ++-
>  fs/kernfs/file.c            | 51 +++++++++++++++++++++++++------------
>  fs/kernfs/kernfs-internal.h |  1 +
>  3 files changed, 38 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 1cc88ba6de90..8ae44db920d4 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
>  		rwsem_release(&kn->dep_map, _RET_IP_);
>  	}
>  
> -	kernfs_drain_open_files(kn);
> +	if (kernfs_should_drain_open_files(kn))
> +		kernfs_drain_open_files(kn);
>  
>  	down_write(&root->kernfs_rwsem);
>  }
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 7060a2a714b8..b510589af427 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -23,6 +23,8 @@ struct kernfs_open_node {
>  	atomic_t		event;
>  	wait_queue_head_t	poll;
>  	struct list_head	files; /* goes through kernfs_open_file.list */
> +	unsigned int		nr_mmapped;
> +	unsigned int		nr_to_release;
>  };
>  
>  /*
> @@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  	rc = 0;
>  	of->mmapped = true;
> +	of_on(of)->nr_mmapped++;
>  	of->vm_ops = vma->vm_ops;
>  	vma->vm_ops = &kernfs_vm_ops;
>  out_put:
> @@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
>  	}
>  
>  	list_add_tail(&of->list, &on->files);
> +	if (kn->flags & KERNFS_HAS_RELEASE)
> +		on->nr_to_release++;
>  
>  	mutex_unlock(mutex);
>  	return 0;
> @@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
>  		return;
>  	}
>  
> -	if (of)
> +	if (of) {
> +		WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);

kernfs_unlink_open_file() is also used in the "err_put_node" error case in
kernfs_fop_open(); shouldn't that path also decrement on->nr_to_release?

Thanks.

> +		if (of->mmapped)
> +			on->nr_mmapped--;
>  		list_del(&of->list);
> +	}
>  
>  	if (list_empty(&on->files)) {
>  		rcu_assign_pointer(kn->attr.open, NULL);
> @@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
>  		 */
>  		kn->attr.ops->release(of);
>  		of->released = true;
> +		of_on(of)->nr_to_release--;
>  	}
>  }
>  
> @@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
>  	return 0;
>  }
>  
> -void kernfs_drain_open_files(struct kernfs_node *kn)
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn)
>  {
>  	struct kernfs_open_node *on;
> -	struct kernfs_open_file *of;
> -	struct mutex *mutex;
> -
> -	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> -		return;
> +	bool ret;
>  
>  	/*
> -	 * lockless opportunistic check is safe below because no one is adding to
> -	 * ->attr.open at this point of time. This check allows early bail out
> -	 * if ->attr.open is already NULL. kernfs_unlink_open_file makes
> -	 * ->attr.open NULL only while holding kernfs_open_file_mutex so below
> -	 * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
> -	 * ->attr.open became NULL while waiting for the mutex.
> +	 * @kn being deactivated guarantees that @kn->attr.open can't change
> +	 * beneath us making the lockless test below safe.
>  	 */
> -	if (!rcu_access_pointer(kn->attr.open))
> -		return;
> +	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> +
> +	rcu_read_lock();
> +	on = rcu_dereference(kn->attr.open);
> +	ret = on && (on->nr_mmapped || on->nr_to_release);
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +void kernfs_drain_open_files(struct kernfs_node *kn)
> +{
> +	struct kernfs_open_node *on;
> +	struct kernfs_open_file *of;
> +	struct mutex *mutex;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  	on = kernfs_deref_open_node_locked(kn);
> @@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>  	list_for_each_entry(of, &on->files, list) {
>  		struct inode *inode = file_inode(of->file);
>  
> -		if (kn->flags & KERNFS_HAS_MMAP)
> +		if (of->mmapped) {
>  			unmap_mapping_range(inode->i_mapping, 0, 0, 1);
> +			of->mmapped = false;
> +			on->nr_mmapped--;
> +		}
>  
>  		if (kn->flags & KERNFS_HAS_RELEASE)
>  			kernfs_release_file(kn, of);
>  	}
>  
> +	WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
>  	mutex_unlock(mutex);
>  }
>  
> diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> index 3ae214d02d44..fc5821effd97 100644
> --- a/fs/kernfs/kernfs-internal.h
> +++ b/fs/kernfs/kernfs-internal.h
> @@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
>   */
>  extern const struct file_operations kernfs_file_fops;
>  
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn);
>  void kernfs_drain_open_files(struct kernfs_node *kn);
>  
>  /*

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] kernfs: Make kernfs_drain() skip draining more aggressively
  2022-08-20  0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
@ 2022-08-23  5:33   ` Chengming Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:33 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> kernfs_drain() is updated to decide whether draining is necessary instead
> of leaving that decision to its caller. __kernfs_remove() now always calls
> kernfs_drain() instead of filtering based on KERNFS_ACTIVATED.
> 
> kernfs_drain() now tests kn->active and kernfs_should_drain_open_files() to
> determine whether draining is necessary at all. If not, it returns %false
> without doing anything. Otherwise, it unlocks kernfs_rwsem, drains as
> before and returns %true. The return value isn't used yet.
> 
> Using the precise conditions allows skipping more aggressively. This isn't a
> meaningful optimization on its own but will enable a future stand-alone
> kernfs_deactivate() implementation.
> 
> While at it, make minor comment updates.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/kernfs/dir.c | 34 ++++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 8ae44db920d4..f857731598cd 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -460,10 +460,14 @@ void kernfs_put_active(struct kernfs_node *kn)
>   * @kn: kernfs_node to drain
>   *
>   * Drain existing usages and nuke all existing mmaps of @kn.  Mutiple
> - * removers may invoke this function concurrently on @kn and all will
> + * callers may invoke this function concurrently on @kn and all will
>   * return after draining is complete.
> + *
> + * RETURNS:
> + * %false if nothing needed to be drained; otherwise, %true. On %true return,
> + * kernfs_rwsem has been released and re-acquired.
>   */
> -static void kernfs_drain(struct kernfs_node *kn)
> +static bool kernfs_drain(struct kernfs_node *kn)
>  	__releases(&kernfs_root(kn)->kernfs_rwsem)
>  	__acquires(&kernfs_root(kn)->kernfs_rwsem)
>  {
> @@ -472,6 +476,16 @@ static void kernfs_drain(struct kernfs_node *kn)
>  	lockdep_assert_held_write(&root->kernfs_rwsem);
>  	WARN_ON_ONCE(kernfs_active(kn));
>  
> +	/*
> +	 * Skip draining if already fully drained. This avoids draining and its
> +	 * lockdep annotations for nodes which have never been activated
> +	 * allowing embedding kernfs_remove() in create error paths without
> +	 * worrying about draining.
> +	 */
> +	if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
> +	    kernfs_should_drain_open_files(kn))

Should this be !kernfs_should_drain_open_files(kn)? I see the fix is put in patch 6/7.

Thanks.


> +		return false;
> +
>  	up_write(&root->kernfs_rwsem);
>  
>  	if (kernfs_lockdep(kn)) {
> @@ -480,7 +494,6 @@ static void kernfs_drain(struct kernfs_node *kn)
>  			lock_contended(&kn->dep_map, _RET_IP_);
>  	}
>  
> -	/* but everyone should wait for draining */
>  	wait_event(root->deactivate_waitq,
>  		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
>  
> @@ -493,6 +506,8 @@ static void kernfs_drain(struct kernfs_node *kn)
>  		kernfs_drain_open_files(kn);
>  
>  	down_write(&root->kernfs_rwsem);
> +
> +	return true;
>  }
>  
>  /**
> @@ -1370,23 +1385,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
>  		pos = kernfs_leftmost_descendant(kn);
>  
>  		/*
> -		 * kernfs_drain() drops kernfs_rwsem temporarily and @pos's
> +		 * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
>  		 * base ref could have been put by someone else by the time
>  		 * the function returns.  Make sure it doesn't go away
>  		 * underneath us.
>  		 */
>  		kernfs_get(pos);
>  
> -		/*
> -		 * Drain iff @kn was activated.  This avoids draining and
> -		 * its lockdep annotations for nodes which have never been
> -		 * activated and allows embedding kernfs_remove() in create
> -		 * error paths without worrying about draining.
> -		 */
> -		if (kn->flags & KERNFS_ACTIVATED)
> -			kernfs_drain(pos);
> -		else
> -			WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> +		kernfs_drain(pos);
>  
>  		/*
>  		 * kernfs_unlink_sibling() succeeds once per node.  Use it

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
  2022-08-20  0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
@ 2022-08-23  5:49   ` Chengming Zhou
  2022-08-23 20:31     ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23  5:49 UTC (permalink / raw)
  To: Tejun Heo, Greg Kroah-Hartman
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> Currently, kernfs nodes can be created deactivated and activated later to
> allow creation of multiple nodes to succeed or fail as a group. Extend this
> capability so that kernfs nodes can be deactivated and re-activated at any
> time, any number of times. This can be used to toggle interface files for
> features which can be dynamically turned on and off.
> 
> kernfs_activate()'s skip conditions are updated so that it no longer ignores
> re-activations and so that it suppresses re-activation of files which are
> being removed.
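
To illustrate the capability described above, a minimal usage sketch (only
kernfs_activate()/kernfs_deactivate() come from this patch; the wrapper and
its assumptions are illustrative):

```
/*
 * Minimal sketch: toggle visibility of a kernfs subtree with the pair of
 * calls this patch provides. @kn is assumed to be a live kernfs_node that
 * the caller owns; the wrapper itself is illustrative only.
 */
static void toggle_interface_files(struct kernfs_node *kn, bool show)
{
	if (show)
		/* re-activate: subtree becomes visible to userland again */
		kernfs_activate(kn);
	else
		/* deactivate: hide subtree and drain in-flight operations */
		kernfs_deactivate(kn);
}
```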
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  fs/kernfs/dir.c        | 89 ++++++++++++++++++++++++++++++------------
>  include/linux/kernfs.h |  2 +
>  2 files changed, 65 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index f857731598cd..6db031362585 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -483,7 +483,7 @@ static bool kernfs_drain(struct kernfs_node *kn)
>  	 * worrying about draining.
>  	 */
>  	if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
> -	    kernfs_should_drain_open_files(kn))
> +	    !kernfs_should_drain_open_files(kn))
>  		return false;
>  
>  	up_write(&root->kernfs_rwsem);
> @@ -1321,14 +1321,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
>  }
>  
>  /**
> - * kernfs_activate - activate a node which started deactivated
> + * kernfs_activate - activate a node's subtree
>   * @kn: kernfs_node whose subtree is to be activated
>   *
> - * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node
> - * needs to be explicitly activated.  A node which hasn't been activated
> - * isn't visible to userland and deactivation is skipped during its
> - * removal.  This is useful to construct atomic init sequences where
> - * creation of multiple nodes should either succeed or fail atomically.
> + * If newly created on a root w/ %KERNFS_ROOT_CREATE_DEACTIVATED or after a
> + * kernfs_deactivate() call, @kn is deactivated and invisible to userland. This
> + * function activates all nodes in @kn's inclusive subtree making them visible.
> + *
> + * %KERNFS_ROOT_CREATE_DEACTIVATED is useful when constructing init sequences
> + * where creation of multiple nodes should either succeed or fail atomically.
>   *
>   * The caller is responsible for ensuring that this function is not called
>   * after kernfs_remove*() is invoked on @kn.
> @@ -1342,7 +1343,7 @@ void kernfs_activate(struct kernfs_node *kn)
>  
>  	pos = NULL;
>  	while ((pos = kernfs_next_descendant_post(pos, kn))) {
> -		if (pos->flags & KERNFS_ACTIVATED)
> +		if (kernfs_active(pos) || (kn->flags & KERNFS_REMOVING))

May I ask a question, what's the difference between kernfs_active() and KERNFS_ACTIVATED?

KERNFS_ACTIVATED is always set by kernfs_activate() and never cleared, so I think it means:

1. !KERNFS_ACTIVATED : allocated but not activated
2. KERNFS_ACTIVATED && !kernfs_active() : made deactivated by kernfs_deactivate_locked()

I see most code checks kernfs_active(), but two places check KERNFS_ACTIVATED. I'm not sure where
we should check KERNFS_ACTIVATED, or is there any chance we can remove KERNFS_ACTIVATED?

Thanks!


>  			continue;
>  
>  		WARN_ON_ONCE(pos->parent && RB_EMPTY_NODE(&pos->rb));
> @@ -1355,6 +1356,58 @@ void kernfs_activate(struct kernfs_node *kn)
>  	up_write(&root->kernfs_rwsem);
>  }
>  
> +static void kernfs_deactivate_locked(struct kernfs_node *kn, bool removing)
> +{
> +	struct kernfs_root *root = kernfs_root(kn);
> +	struct kernfs_node *pos;
> +
> +	lockdep_assert_held_write(&root->kernfs_rwsem);
> +
> +	/* prevent any new usage under @kn by deactivating all nodes */
> +	pos = NULL;
> +	while ((pos = kernfs_next_descendant_post(pos, kn))) {
> +		if (kernfs_active(pos))
> +			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
> +		if (removing)
> +			pos->flags |= KERNFS_REMOVING;
> +	}
> +
> +	/*
> +	 * No new active usage can be created. Drain existing ones. As
> +	 * kernfs_drain() may drop kernfs_rwsem temporarily, pin @pos so that it
> +	 * doesn't go away underneath us.
> +	 *
> +	 * If kernfs_rwsem was released, restart from the beginning. Forward
> +	 * progress is guaranteed as a drained node is guaranteed to stay
> +	 * drained. In the unlikely case that the loop restart ever becomes a
> +	 * problem, we should be able to work around by batching up the
> +	 * draining.
> +	 */
> +	pos = NULL;
> +	while ((pos = kernfs_next_descendant_post(pos, kn))) {
> +		kernfs_get(pos);
> +		if (kernfs_drain(pos))
> +			pos = NULL;
> +		kernfs_put(pos);
> +	}
> +}
> +
> +/**
> + * kernfs_deactivate - deactivate a node's subtree
> + * @kn: kernfs_node whose subtree is to be deactivated
> + *
> + * Deactivate @kn's inclusive subtree. On return, the subtree is invisible to
> + * userland and there are no in-flight file operations.
> + */
> +void kernfs_deactivate(struct kernfs_node *kn)
> +{
> +	struct kernfs_root *root = kernfs_root(kn);
> +
> +	down_write(&root->kernfs_rwsem);
> +	kernfs_deactivate_locked(kn, false);
> +	up_write(&root->kernfs_rwsem);
> +}
> +
>  static void __kernfs_remove(struct kernfs_node *kn)
>  {
>  	struct kernfs_node *pos;
> @@ -1374,26 +1427,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
>  
>  	pr_debug("kernfs %s: removing\n", kn->name);
>  
> -	/* prevent any new usage under @kn by deactivating all nodes */
> -	pos = NULL;
> -	while ((pos = kernfs_next_descendant_post(pos, kn)))
> -		if (kernfs_active(pos))
> -			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
> +	kernfs_deactivate_locked(kn, true);
>  
> -	/* deactivate and unlink the subtree node-by-node */
> +	/* unlink the subtree node-by-node */
>  	do {
>  		pos = kernfs_leftmost_descendant(kn);
>  
> -		/*
> -		 * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
> -		 * base ref could have been put by someone else by the time
> -		 * the function returns.  Make sure it doesn't go away
> -		 * underneath us.
> -		 */
> -		kernfs_get(pos);
> -
> -		kernfs_drain(pos);
> -
>  		/*
>  		 * kernfs_unlink_sibling() succeeds once per node.  Use it
>  		 * to decide who's responsible for cleanups.
> @@ -1410,8 +1449,6 @@ static void __kernfs_remove(struct kernfs_node *kn)
>  
>  			kernfs_put(pos);
>  		}
> -
> -		kernfs_put(pos);
>  	} while (pos != kn);
>  }
>  
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 367044d7708c..657eea1395b6 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -112,6 +112,7 @@ enum kernfs_node_flag {
>  	KERNFS_SUICIDED		= 0x0800,
>  	KERNFS_EMPTY_DIR	= 0x1000,
>  	KERNFS_HAS_RELEASE	= 0x2000,
> +	KERNFS_REMOVING		= 0x4000,
>  };
>  
>  /* @flags for kernfs_create_root() */
> @@ -429,6 +430,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
>  				       const char *name,
>  				       struct kernfs_node *target);
>  void kernfs_activate(struct kernfs_node *kn);
> +void kernfs_deactivate(struct kernfs_node *kn);
>  void kernfs_remove(struct kernfs_node *kn);
>  void kernfs_break_active_protection(struct kernfs_node *kn);
>  void kernfs_unbreak_active_protection(struct kernfs_node *kn);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
  2022-08-23  5:27   ` Chengming Zhou
@ 2022-08-23 19:37     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-23 19:37 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
	Imran Khan, kernel-team

On Tue, Aug 23, 2022 at 01:27:22PM +0800, Chengming Zhou wrote:
> > +	if (of) {
> > +		WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
> 
> kernfs_unlink_open_file() is also used in the "err_put_node" error path of kernfs_fop_open(),
> which should also decrement on->nr_to_release?

Ah, you're right. Will fix. Thanks for pointing it out.
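
For reference, one possible shape for that fix (just a sketch, not
necessarily what the updated patch will do; the open_failed parameter is
hypothetical):

```
	/*
	 * Sketch only: if kernfs_unlink_open_file() were told it is being
	 * called from kernfs_fop_open()'s error path (hypothetical
	 * @open_failed parameter), it could undo the nr_to_release bump
	 * taken in kernfs_get_open_node() and keep the warning meaningful.
	 */
	if (of) {
		if (kn->flags & KERNFS_HAS_RELEASE) {
			WARN_ON_ONCE(of->released == open_failed);
			if (open_failed)
				on->nr_to_release--;
		}
		if (of->mmapped)
			on->nr_mmapped--;
		list_del(&of->list);
	}
```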

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
  2022-08-23  5:49   ` Chengming Zhou
@ 2022-08-23 20:31     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-23 20:31 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
	Imran Khan, kernel-team

Hello,

On Tue, Aug 23, 2022 at 01:49:07PM +0800, Chengming Zhou wrote:
> > -		if (pos->flags & KERNFS_ACTIVATED)
> > +		if (kernfs_active(pos) || (kn->flags & KERNFS_REMOVING))
> 
> May I ask a question, what's the difference between kernfs_active() and KERNFS_ACTIVATED?
> 
> KERNFS_ACTIVATED is always set by kernfs_activate() and never cleared, so I think it means:
> 
> 1. !KERNFS_ACTIVATED : allocated but not activated
> 2. KERNFS_ACTIVATED && !kernfs_active() : made deactivated by kernfs_deactivate_locked()
> 
> I see most code checks kernfs_active(), but two places check KERNFS_ACTIVATED. I'm not sure where
> we should check KERNFS_ACTIVATED, or is there any chance we can remove KERNFS_ACTIVATED?

Yeah, ACTIVATED distinguishes nodes that were created but never activated,
while kernfs_active() means currently active. I tried to substitute all
ACTIVATED tests with kernfs_active() and remove the former, but I wasn't sure
about changing kernfs_add_one()'s behavior.

I think it's too confusing to combine the initial activation state with the
user-requested show/hide state, and it causes other problems, like
kernfs_activate(), which is used to activate newly created files, unhiding
files that were explicitly deactivated. Lemme separate out the show/hide
state into something distinct so that the difference is clear.
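
To make the distinction concrete, roughly (a simplified sketch, locking
assertions omitted):

```
/* currently active: the active count isn't carrying KN_DEACTIVATED_BIAS */
static bool example_kernfs_active(struct kernfs_node *kn)
{
	return atomic_read(&kn->active) >= 0;
}

/* ever activated: set by kernfs_activate() and never cleared */
static bool example_was_activated(struct kernfs_node *kn)
{
	return kn->flags & KERNFS_ACTIVATED;
}
```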

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
  2022-08-20  0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
  2022-08-23  5:27   ` Chengming Zhou
@ 2022-08-25 12:11   ` Chengming Zhou
  1 sibling, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-25 12:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Greg Kroah-Hartman,
	Imran Khan, kernel-team

On 2022/8/20 08:05, Tejun Heo wrote:
> Track the number of mmapped files and files that need to be released, and
> skip kernfs_drain_open_files() if both are zero - precisely the conditions
> that require draining open files. The early-exit test is factored into
> kernfs_should_drain_open_files(), which is now tested by
> kernfs_drain_open_files()'s caller - kernfs_drain().
> 
> This isn't a meaningful optimization on its own but will enable the future
> stand-alone kernfs_deactivate() implementation.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/kernfs/dir.c             |  3 ++-
>  fs/kernfs/file.c            | 51 +++++++++++++++++++++++++------------
>  fs/kernfs/kernfs-internal.h |  1 +
>  3 files changed, 38 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 1cc88ba6de90..8ae44db920d4 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
>  		rwsem_release(&kn->dep_map, _RET_IP_);
>  	}
>  
> -	kernfs_drain_open_files(kn);
> +	if (kernfs_should_drain_open_files(kn))
> +		kernfs_drain_open_files(kn);
>  
>  	down_write(&root->kernfs_rwsem);
>  }
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 7060a2a714b8..b510589af427 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -23,6 +23,8 @@ struct kernfs_open_node {
>  	atomic_t		event;
>  	wait_queue_head_t	poll;
>  	struct list_head	files; /* goes through kernfs_open_file.list */
> +	unsigned int		nr_mmapped;
> +	unsigned int		nr_to_release;
>  };
>  
>  /*
> @@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  	rc = 0;
>  	of->mmapped = true;
> +	of_on(of)->nr_mmapped++;
>  	of->vm_ops = vma->vm_ops;
>  	vma->vm_ops = &kernfs_vm_ops;
>  out_put:
> @@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
>  	}

The current code uses kmalloc() to allocate kernfs_open_node, leaving nr_to_release
and nr_mmapped uninitialized.

Found by below stress test:

```
cd /sys/fs/cgroup

while true; do echo 0 > cgroup.pressure; echo 1 > cgroup.pressure; done

while true; do cat *.pressure; done

```

Thanks.
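
For reference, a minimal sketch of how the counters could start out zeroed
(whether via kzalloc() or explicit assignment is up to the actual fix):

```
	/* Sketch only: in kernfs_get_open_node(), zero the allocation ... */
	on = kzalloc(sizeof(*on), GFP_KERNEL);

	/* ... or keep kmalloc() and initialize the new fields explicitly: */
	on->nr_mmapped = 0;
	on->nr_to_release = 0;
```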

>  
>  	list_add_tail(&of->list, &on->files);
> +	if (kn->flags & KERNFS_HAS_RELEASE)
> +		on->nr_to_release++;
>  
>  	mutex_unlock(mutex);
>  	return 0;
> @@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
>  		return;
>  	}
>  
> -	if (of)
> +	if (of) {
> +		WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
> +		if (of->mmapped)
> +			on->nr_mmapped--;
>  		list_del(&of->list);
> +	}
>  
>  	if (list_empty(&on->files)) {
>  		rcu_assign_pointer(kn->attr.open, NULL);
> @@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
>  		 */
>  		kn->attr.ops->release(of);
>  		of->released = true;
> +		of_on(of)->nr_to_release--;
>  	}
>  }
>  
> @@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
>  	return 0;
>  }
>  
> -void kernfs_drain_open_files(struct kernfs_node *kn)
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn)
>  {
>  	struct kernfs_open_node *on;
> -	struct kernfs_open_file *of;
> -	struct mutex *mutex;
> -
> -	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> -		return;
> +	bool ret;
>  
>  	/*
> -	 * lockless opportunistic check is safe below because no one is adding to
> -	 * ->attr.open at this point of time. This check allows early bail out
> -	 * if ->attr.open is already NULL. kernfs_unlink_open_file makes
> -	 * ->attr.open NULL only while holding kernfs_open_file_mutex so below
> -	 * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
> -	 * ->attr.open became NULL while waiting for the mutex.
> +	 * @kn being deactivated guarantees that @kn->attr.open can't change
> +	 * beneath us making the lockless test below safe.
>  	 */
> -	if (!rcu_access_pointer(kn->attr.open))
> -		return;
> +	WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> +
> +	rcu_read_lock();
> +	on = rcu_dereference(kn->attr.open);
> +	ret = on && (on->nr_mmapped || on->nr_to_release);
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +void kernfs_drain_open_files(struct kernfs_node *kn)
> +{
> +	struct kernfs_open_node *on;
> +	struct kernfs_open_file *of;
> +	struct mutex *mutex;
>  
>  	mutex = kernfs_open_file_mutex_lock(kn);
>  	on = kernfs_deref_open_node_locked(kn);
> @@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
>  	list_for_each_entry(of, &on->files, list) {
>  		struct inode *inode = file_inode(of->file);
>  
> -		if (kn->flags & KERNFS_HAS_MMAP)
> +		if (of->mmapped) {
>  			unmap_mapping_range(inode->i_mapping, 0, 0, 1);
> +			of->mmapped = false;
> +			on->nr_mmapped--;
> +		}
>  
>  		if (kn->flags & KERNFS_HAS_RELEASE)
>  			kernfs_release_file(kn, of);
>  	}
>  
> +	WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
>  	mutex_unlock(mutex);
>  }
>  
> diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> index 3ae214d02d44..fc5821effd97 100644
> --- a/fs/kernfs/kernfs-internal.h
> +++ b/fs/kernfs/kernfs-internal.h
> @@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
>   */
>  extern const struct file_operations kernfs_file_fops;
>  
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn);
>  void kernfs_drain_open_files(struct kernfs_node *kn);
>  
>  /*

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-08-25 12:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-20  0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
2022-08-20  0:05 ` [PATCH 1/7] kernfs: Simply by replacing kernfs_deref_open_node() with of_on() Tejun Heo
2022-08-23  5:15   ` Chengming Zhou
2022-08-20  0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
2022-08-23  5:15   ` Chengming Zhou
2022-08-20  0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
2022-08-23  5:16   ` Chengming Zhou
2022-08-20  0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
2022-08-23  5:27   ` Chengming Zhou
2022-08-23 19:37     ` Tejun Heo
2022-08-25 12:11   ` Chengming Zhou
2022-08-20  0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
2022-08-23  5:33   ` Chengming Zhou
2022-08-20  0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
2022-08-23  5:49   ` Chengming Zhou
2022-08-23 20:31     ` Tejun Heo
2022-08-20  0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
2022-08-22  1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
2022-08-22  7:10   ` Tejun Heo
