* [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
@ 2022-08-20 0:05 Tejun Heo
2022-08-20 0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team
Hello,
Currently, deactivated kernfs nodes are used for two purposes - during
removal, to kill and drain nodes, and during creation, to make the creation
of multiple kernfs_nodes succeed or fail as a group.
This patchset makes kernfs [de]activation generic so that it can be used
anytime to deactivate (hide and drain) and activate (show) kernfs nodes,
and, on top, implements cgroup_file_show() which allows toggling cgroup file
visibility.
This is for the following pending patchset which allows disabling PSI on a
per-cgroup basis:
https://lore.kernel.org/all/20220808110341.15799-1-zhouchengming@bytedance.com/t/#u
which requires hiding the corresponding cgroup interface files while
disabled.
This patchset contains the following seven patches.
0001-kernfs-Simply-by-replacing-kernfs_deref_open_node-wi.patch
0002-kernfs-Drop-unnecessary-mutex-local-variable-initial.patch
0003-kernfs-Refactor-kernfs_get_open_node.patch
0004-kernfs-Skip-kernfs_drain_open_files-more-aggressivel.patch
0005-kernfs-Make-kernfs_drain-skip-draining-more-aggressi.patch
0006-kernfs-Allow-kernfs-nodes-to-be-deactivated-and-re-a.patch
0007-cgroup-Implement-cgroup_file_show.patch
0001-0003 are misc prep patches. 0004-0006 implement kernfs_deactivate().
0007 implements cgroup_file_show() on top. The patches are also available in
the following git branch:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-deactivate
diffstat follows. Thanks.
fs/kernfs/dir.c | 120 +++++++++++++++++++++++++++++++++++++++++-------------------
fs/kernfs/file.c | 139 +++++++++++++++++++++++++++++++---------------------------------------
fs/kernfs/kernfs-internal.h | 1
include/linux/cgroup.h | 1
include/linux/kernfs.h | 2 +
kernel/cgroup/cgroup.c | 23 +++++++++++
6 files changed, 172 insertions(+), 114 deletions(-)
--
tejun
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on()
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:15 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU
dereference is currently only used in kernfs_notify(). Everywhere else,
either we're holding the lock which protects it or we know that the
kernfs_open_node is pinned because we have a pointer to a kernfs_open_file
which is hanging off of it.
kernfs_deref_open_node() is used for the latter case - accessing
kernfs_open_node from kernfs_open_file. The lifetime and visibility rules
are simple and clear here. To someone who can access a kernfs_open_file, its
kernfs_open_node is pinned and visible through of->kn->attr.open.
Replace kernfs_deref_open_node() with the simpler of_on(). The former takes
both @kn and @of and RCU-dereferences @kn->attr.open while sanity-checking
against @of. The latter takes @of and uses a protected dereference on
of->kn->attr.open.
As the return value can't be NULL, remove the error handling in the callers
too.
This shouldn't cause any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
---
fs/kernfs/file.c | 56 +++++++++++-------------------------------------
1 file changed, 13 insertions(+), 43 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index b3ec34386b43..32b16fe00a9e 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -57,31 +57,17 @@ static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
}
/**
- * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
- *
- * @of: associated kernfs_open_file instance.
- * @kn: target kernfs_node.
- *
- * Fetch and return ->attr.open of @kn if @of->list is non empty.
- * If @of->list is not empty we can safely assume that @of is on
- * @kn->attr.open->files list and this guarantees that @kn->attr.open
- * will not vanish i.e. dereferencing outside RCU read-side critical
- * section is safe here.
- *
- * The caller needs to make sure that @of->list is not empty.
+ * of_on - Return the kernfs_open_node of the specified kernfs_open_file
+ * @of: target kernfs_open_file
*/
-static struct kernfs_open_node *
-kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
+static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
{
- struct kernfs_open_node *on;
-
- on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
-
- return on;
+ return rcu_dereference_protected(of->kn->attr.open,
+ !list_empty(&of->list));
}
/**
- * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
+ * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
*
* @kn: target kernfs_node.
*
@@ -96,7 +82,7 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
* The caller needs to make sure that kernfs_open_file_mutex is held.
*/
static struct kernfs_open_node *
-kernfs_deref_open_node_protected(struct kernfs_node *kn)
+kernfs_deref_open_node_locked(struct kernfs_node *kn)
{
return rcu_dereference_protected(kn->attr.open,
lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
@@ -207,12 +193,8 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
static int kernfs_seq_show(struct seq_file *sf, void *v)
{
struct kernfs_open_file *of = sf->private;
- struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
-
- if (!on)
- return -EINVAL;
- of->event = atomic_read(&on->event);
+ of->event = atomic_read(&of_on(of)->event);
return of->kn->attr.ops->seq_show(sf, v);
}
@@ -235,7 +217,6 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
const struct kernfs_ops *ops;
- struct kernfs_open_node *on;
char *buf;
buf = of->prealloc_buf;
@@ -257,14 +238,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
goto out_free;
}
- on = kernfs_deref_open_node(of, of->kn);
- if (!on) {
- len = -EINVAL;
- mutex_unlock(&of->mutex);
- goto out_free;
- }
-
- of->event = atomic_read(&on->event);
+ of->event = atomic_read(&of_on(of)->event);
ops = kernfs_ops(of->kn);
if (ops->read)
@@ -584,7 +558,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
struct mutex *mutex = NULL;
mutex = kernfs_open_file_mutex_lock(kn);
- on = kernfs_deref_open_node_protected(kn);
+ on = kernfs_deref_open_node_locked(kn);
if (on) {
list_add_tail(&of->list, &on->files);
@@ -629,7 +603,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
mutex = kernfs_open_file_mutex_lock(kn);
- on = kernfs_deref_open_node_protected(kn);
+ on = kernfs_deref_open_node_locked(kn);
if (!on) {
mutex_unlock(mutex);
return;
@@ -839,7 +813,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
return;
mutex = kernfs_open_file_mutex_lock(kn);
- on = kernfs_deref_open_node_protected(kn);
+ on = kernfs_deref_open_node_locked(kn);
if (!on) {
mutex_unlock(mutex);
return;
@@ -874,11 +848,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
*/
__poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
{
- struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
- struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
-
- if (!on)
- return EPOLLERR;
+ struct kernfs_open_node *on = of_on(of);
poll_wait(of->file, &on->poll, wait);
--
2.37.2
* [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
2022-08-20 0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:15 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
` (5 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
These are unnecessary and unconventional. Remove them. Also, move a variable
declaration into the block in which it's used. No functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
---
fs/kernfs/file.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 32b16fe00a9e..6437f7c7162d 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -555,7 +555,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
struct kernfs_open_file *of)
{
struct kernfs_open_node *on, *new_on = NULL;
- struct mutex *mutex = NULL;
+ struct mutex *mutex;
mutex = kernfs_open_file_mutex_lock(kn);
on = kernfs_deref_open_node_locked(kn);
@@ -599,7 +599,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
struct kernfs_open_file *of)
{
struct kernfs_open_node *on;
- struct mutex *mutex = NULL;
+ struct mutex *mutex;
mutex = kernfs_open_file_mutex_lock(kn);
@@ -776,9 +776,10 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
{
struct kernfs_node *kn = inode->i_private;
struct kernfs_open_file *of = kernfs_of(filp);
- struct mutex *mutex = NULL;
if (kn->flags & KERNFS_HAS_RELEASE) {
+ struct mutex *mutex;
+
mutex = kernfs_open_file_mutex_lock(kn);
kernfs_release_file(kn, of);
mutex_unlock(mutex);
@@ -796,7 +797,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
{
struct kernfs_open_node *on;
struct kernfs_open_file *of;
- struct mutex *mutex = NULL;
+ struct mutex *mutex;
if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
return;
--
2.37.2
* [PATCH 3/7] kernfs: Refactor kernfs_get_open_node()
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
2022-08-20 0:05 ` [PATCH 1/7] kernfs: Simplify by replacing kernfs_deref_open_node() with of_on() Tejun Heo
2022-08-20 0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:16 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
Factor out the common part. This is cleaner and should help with future
changes. No functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
fs/kernfs/file.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 6437f7c7162d..7060a2a714b8 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -554,31 +554,28 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
static int kernfs_get_open_node(struct kernfs_node *kn,
struct kernfs_open_file *of)
{
- struct kernfs_open_node *on, *new_on = NULL;
+ struct kernfs_open_node *on;
struct mutex *mutex;
mutex = kernfs_open_file_mutex_lock(kn);
on = kernfs_deref_open_node_locked(kn);
- if (on) {
- list_add_tail(&of->list, &on->files);
- mutex_unlock(mutex);
- return 0;
- } else {
+ if (!on) {
/* not there, initialize a new one */
- new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
- if (!new_on) {
+ on = kmalloc(sizeof(*on), GFP_KERNEL);
+ if (!on) {
mutex_unlock(mutex);
return -ENOMEM;
}
- atomic_set(&new_on->event, 1);
- init_waitqueue_head(&new_on->poll);
- INIT_LIST_HEAD(&new_on->files);
- list_add_tail(&of->list, &new_on->files);
- rcu_assign_pointer(kn->attr.open, new_on);
+ atomic_set(&on->event, 1);
+ init_waitqueue_head(&on->poll);
+ INIT_LIST_HEAD(&on->files);
+ rcu_assign_pointer(kn->attr.open, on);
}
- mutex_unlock(mutex);
+ list_add_tail(&of->list, &on->files);
+
+ mutex_unlock(mutex);
return 0;
}
--
2.37.2
* [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
` (2 preceding siblings ...)
2022-08-20 0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:27 ` Chengming Zhou
2022-08-25 12:11 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
` (3 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
Track the number of mmapped files and files that need to be released, and
skip kernfs_drain_open_files() if both are zero, which are the precise
conditions that require draining open files. The early exit test is
factored into kernfs_should_drain_open_files(), which is now tested by
kernfs_drain_open_files()'s caller - kernfs_drain().
This isn't a meaningful optimization on its own but will enable future
stand-alone kernfs_deactivate() implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
fs/kernfs/dir.c | 3 ++-
fs/kernfs/file.c | 51 +++++++++++++++++++++++++------------
fs/kernfs/kernfs-internal.h | 1 +
3 files changed, 38 insertions(+), 17 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 1cc88ba6de90..8ae44db920d4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
rwsem_release(&kn->dep_map, _RET_IP_);
}
- kernfs_drain_open_files(kn);
+ if (kernfs_should_drain_open_files(kn))
+ kernfs_drain_open_files(kn);
down_write(&root->kernfs_rwsem);
}
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 7060a2a714b8..b510589af427 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -23,6 +23,8 @@ struct kernfs_open_node {
atomic_t event;
wait_queue_head_t poll;
struct list_head files; /* goes through kernfs_open_file.list */
+ unsigned int nr_mmapped;
+ unsigned int nr_to_release;
};
/*
@@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
rc = 0;
of->mmapped = true;
+ of_on(of)->nr_mmapped++;
of->vm_ops = vma->vm_ops;
vma->vm_ops = &kernfs_vm_ops;
out_put:
@@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
}
list_add_tail(&of->list, &on->files);
+ if (kn->flags & KERNFS_HAS_RELEASE)
+ on->nr_to_release++;
mutex_unlock(mutex);
return 0;
@@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
return;
}
- if (of)
+ if (of) {
+ WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
+ if (of->mmapped)
+ on->nr_mmapped--;
list_del(&of->list);
+ }
if (list_empty(&on->files)) {
rcu_assign_pointer(kn->attr.open, NULL);
@@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
*/
kn->attr.ops->release(of);
of->released = true;
+ of_on(of)->nr_to_release--;
}
}
@@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
return 0;
}
-void kernfs_drain_open_files(struct kernfs_node *kn)
+bool kernfs_should_drain_open_files(struct kernfs_node *kn)
{
struct kernfs_open_node *on;
- struct kernfs_open_file *of;
- struct mutex *mutex;
-
- if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
- return;
+ bool ret;
/*
- * lockless opportunistic check is safe below because no one is adding to
- * ->attr.open at this point of time. This check allows early bail out
- * if ->attr.open is already NULL. kernfs_unlink_open_file makes
- * ->attr.open NULL only while holding kernfs_open_file_mutex so below
- * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
- * ->attr.open became NULL while waiting for the mutex.
+ * @kn being deactivated guarantees that @kn->attr.open can't change
+ * beneath us making the lockless test below safe.
*/
- if (!rcu_access_pointer(kn->attr.open))
- return;
+ WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
+
+ rcu_read_lock();
+ on = rcu_dereference(kn->attr.open);
+ ret = on && (on->nr_mmapped || on->nr_to_release);
+ rcu_read_unlock();
+
+ return ret;
+}
+
+void kernfs_drain_open_files(struct kernfs_node *kn)
+{
+ struct kernfs_open_node *on;
+ struct kernfs_open_file *of;
+ struct mutex *mutex;
mutex = kernfs_open_file_mutex_lock(kn);
on = kernfs_deref_open_node_locked(kn);
@@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
list_for_each_entry(of, &on->files, list) {
struct inode *inode = file_inode(of->file);
- if (kn->flags & KERNFS_HAS_MMAP)
+ if (of->mmapped) {
unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+ of->mmapped = false;
+ on->nr_mmapped--;
+ }
if (kn->flags & KERNFS_HAS_RELEASE)
kernfs_release_file(kn, of);
}
+ WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
mutex_unlock(mutex);
}
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 3ae214d02d44..fc5821effd97 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
*/
extern const struct file_operations kernfs_file_fops;
+bool kernfs_should_drain_open_files(struct kernfs_node *kn);
void kernfs_drain_open_files(struct kernfs_node *kn);
/*
--
2.37.2
* [PATCH 5/7] kernfs: Make kernfs_drain() skip draining more aggressively
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
` (3 preceding siblings ...)
2022-08-20 0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:33 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
kernfs_drain() is updated to decide whether draining is necessary rather
than leaving that to its caller. __kernfs_remove() now always calls
kernfs_drain() instead of filtering based on KERNFS_ACTIVATED.
kernfs_drain() now tests kn->active and kernfs_should_drain_open_files() to
determine whether draining is necessary at all. If not, it returns %false
without doing anything. Otherwise, it unlocks kernfs_rwsem, drains as
before and returns %true. The return value isn't used yet.
Using the precise conditions allows skipping more aggressively. This isn't a
meaningful optimization on its own but will enable future stand-alone
kernfs_deactivate() implementation.
While at it, make minor comment updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
fs/kernfs/dir.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 8ae44db920d4..f857731598cd 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -460,10 +460,14 @@ void kernfs_put_active(struct kernfs_node *kn)
* @kn: kernfs_node to drain
*
* Drain existing usages and nuke all existing mmaps of @kn. Multiple
- * removers may invoke this function concurrently on @kn and all will
+ * callers may invoke this function concurrently on @kn and all will
* return after draining is complete.
+ *
+ * RETURNS:
+ * %false if nothing needed to be drained; otherwise, %true. On %true return,
+ * kernfs_rwsem has been released and re-acquired.
*/
-static void kernfs_drain(struct kernfs_node *kn)
+static bool kernfs_drain(struct kernfs_node *kn)
__releases(&kernfs_root(kn)->kernfs_rwsem)
__acquires(&kernfs_root(kn)->kernfs_rwsem)
{
@@ -472,6 +476,16 @@ static void kernfs_drain(struct kernfs_node *kn)
lockdep_assert_held_write(&root->kernfs_rwsem);
WARN_ON_ONCE(kernfs_active(kn));
+ /*
+ * Skip draining if already fully drained. This avoids draining and its
+ * lockdep annotations for nodes which have never been activated
+ * allowing embedding kernfs_remove() in create error paths without
+ * worrying about draining.
+ */
+ if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
+ kernfs_should_drain_open_files(kn))
+ return false;
+
up_write(&root->kernfs_rwsem);
if (kernfs_lockdep(kn)) {
@@ -480,7 +494,6 @@ static void kernfs_drain(struct kernfs_node *kn)
lock_contended(&kn->dep_map, _RET_IP_);
}
- /* but everyone should wait for draining */
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
@@ -493,6 +506,8 @@ static void kernfs_drain(struct kernfs_node *kn)
kernfs_drain_open_files(kn);
down_write(&root->kernfs_rwsem);
+
+ return true;
}
/**
@@ -1370,23 +1385,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
pos = kernfs_leftmost_descendant(kn);
/*
- * kernfs_drain() drops kernfs_rwsem temporarily and @pos's
+ * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
* base ref could have been put by someone else by the time
* the function returns. Make sure it doesn't go away
* underneath us.
*/
kernfs_get(pos);
- /*
- * Drain iff @kn was activated. This avoids draining and
- * its lockdep annotations for nodes which have never been
- * activated and allows embedding kernfs_remove() in create
- * error paths without worrying about draining.
- */
- if (kn->flags & KERNFS_ACTIVATED)
- kernfs_drain(pos);
- else
- WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
+ kernfs_drain(pos);
/*
* kernfs_unlink_sibling() succeeds once per node. Use it
--
2.37.2
* [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
` (4 preceding siblings ...)
2022-08-20 0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-23 5:49 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
2022-08-22 1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
7 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
Currently, kernfs nodes can be created deactivated and activated later to
allow the creation of multiple nodes to succeed or fail as a group. Extend
this capability so that kernfs nodes can be deactivated and re-activated
anytime, as many times as needed. This can be used to toggle interface files
for features which can be dynamically turned on and off.
kernfs_activate()'s skip conditions are updated so that it no longer ignores
re-activations and so that it suppresses re-activation of files which are
being removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
fs/kernfs/dir.c | 89 ++++++++++++++++++++++++++++++------------
include/linux/kernfs.h | 2 +
2 files changed, 65 insertions(+), 26 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index f857731598cd..6db031362585 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -483,7 +483,7 @@ static bool kernfs_drain(struct kernfs_node *kn)
* worrying about draining.
*/
if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
- kernfs_should_drain_open_files(kn))
+ !kernfs_should_drain_open_files(kn))
return false;
up_write(&root->kernfs_rwsem);
@@ -1321,14 +1321,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
}
/**
- * kernfs_activate - activate a node which started deactivated
+ * kernfs_activate - activate a node's subtree
* @kn: kernfs_node whose subtree is to be activated
*
- * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node
- * needs to be explicitly activated. A node which hasn't been activated
- * isn't visible to userland and deactivation is skipped during its
- * removal. This is useful to construct atomic init sequences where
- * creation of multiple nodes should either succeed or fail atomically.
+ * If newly created on a root w/ %KERNFS_ROOT_CREATE_DEACTIVATED or after a
+ * kernfs_deactivate() call, @kn is deactivated and invisible to userland. This
+ * function activates all nodes in @kn's inclusive subtree making them visible.
+ *
+ * %KERNFS_ROOT_CREATE_DEACTIVATED is useful when constructing init sequences
+ * where creation of multiple nodes should either succeed or fail atomically.
*
* The caller is responsible for ensuring that this function is not called
* after kernfs_remove*() is invoked on @kn.
@@ -1342,7 +1343,7 @@ void kernfs_activate(struct kernfs_node *kn)
pos = NULL;
while ((pos = kernfs_next_descendant_post(pos, kn))) {
- if (pos->flags & KERNFS_ACTIVATED)
- if (kernfs_active(pos) || (pos->flags & KERNFS_REMOVING))
continue;
WARN_ON_ONCE(pos->parent && RB_EMPTY_NODE(&pos->rb));
@@ -1355,6 +1356,58 @@ void kernfs_activate(struct kernfs_node *kn)
up_write(&root->kernfs_rwsem);
}
+static void kernfs_deactivate_locked(struct kernfs_node *kn, bool removing)
+{
+ struct kernfs_root *root = kernfs_root(kn);
+ struct kernfs_node *pos;
+
+ lockdep_assert_held_write(&root->kernfs_rwsem);
+
+ /* prevent any new usage under @kn by deactivating all nodes */
+ pos = NULL;
+ while ((pos = kernfs_next_descendant_post(pos, kn))) {
+ if (kernfs_active(pos))
+ atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+ if (removing)
+ pos->flags |= KERNFS_REMOVING;
+ }
+
+ /*
+ * No new active usage can be created. Drain existing ones. As
+ * kernfs_drain() may drop kernfs_rwsem temporarily, pin @pos so that it
+ * doesn't go away underneath us.
+ *
+ * If kernfs_rwsem was released, restart from the beginning. Forward
+ * progress is guaranteed as a drained node is guaranteed to stay
+ * drained. In the unlikely case that the loop restart ever becomes a
+ * problem, we should be able to work around by batching up the
+ * draining.
+ */
+ pos = NULL;
+ while ((pos = kernfs_next_descendant_post(pos, kn))) {
+ kernfs_get(pos);
+ if (kernfs_drain(pos))
+ pos = NULL;
+ kernfs_put(pos);
+ }
+}
+
+/**
+ * kernfs_deactivate - deactivate a node's subtree
+ * @kn: kernfs_node whose subtree is to be deactivated
+ *
+ * Deactivate @kn's inclusive subtree. On return, the subtree is invisible to
+ * userland and there are no in-flight file operations.
+ */
+void kernfs_deactivate(struct kernfs_node *kn)
+{
+ struct kernfs_root *root = kernfs_root(kn);
+
+ down_write(&root->kernfs_rwsem);
+ kernfs_deactivate_locked(kn, false);
+ up_write(&root->kernfs_rwsem);
+}
+
static void __kernfs_remove(struct kernfs_node *kn)
{
struct kernfs_node *pos;
@@ -1374,26 +1427,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
pr_debug("kernfs %s: removing\n", kn->name);
- /* prevent any new usage under @kn by deactivating all nodes */
- pos = NULL;
- while ((pos = kernfs_next_descendant_post(pos, kn)))
- if (kernfs_active(pos))
- atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+ kernfs_deactivate_locked(kn, true);
- /* deactivate and unlink the subtree node-by-node */
+ /* unlink the subtree node-by-node */
do {
pos = kernfs_leftmost_descendant(kn);
- /*
- * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
- * base ref could have been put by someone else by the time
- * the function returns. Make sure it doesn't go away
- * underneath us.
- */
- kernfs_get(pos);
-
- kernfs_drain(pos);
-
/*
* kernfs_unlink_sibling() succeeds once per node. Use it
* to decide who's responsible for cleanups.
@@ -1410,8 +1449,6 @@ static void __kernfs_remove(struct kernfs_node *kn)
kernfs_put(pos);
}
-
- kernfs_put(pos);
} while (pos != kn);
}
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 367044d7708c..657eea1395b6 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -112,6 +112,7 @@ enum kernfs_node_flag {
KERNFS_SUICIDED = 0x0800,
KERNFS_EMPTY_DIR = 0x1000,
KERNFS_HAS_RELEASE = 0x2000,
+ KERNFS_REMOVING = 0x4000,
};
/* @flags for kernfs_create_root() */
@@ -429,6 +430,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
const char *name,
struct kernfs_node *target);
void kernfs_activate(struct kernfs_node *kn);
+void kernfs_deactivate(struct kernfs_node *kn);
void kernfs_remove(struct kernfs_node *kn);
void kernfs_break_active_protection(struct kernfs_node *kn);
void kernfs_unbreak_active_protection(struct kernfs_node *kn);
--
2.37.2
* [PATCH 7/7] cgroup: Implement cgroup_file_show()
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
` (5 preceding siblings ...)
2022-08-20 0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
@ 2022-08-20 0:05 ` Tejun Heo
2022-08-22 1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
7 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-20 0:05 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Chengming Zhou, Johannes Weiner,
Imran Khan, kernel-team, Tejun Heo
Add cgroup_file_show() which allows toggling the visibility of a cgroup file
using the new kernfs_deactivate(). This will be used to hide psi interface
files on cgroups where psi is disabled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/cgroup.h | 1 +
kernel/cgroup/cgroup.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed53bfe7c46c..f61dd135efbe 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -114,6 +114,7 @@ int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_rm_cftypes(struct cftype *cfts);
void cgroup_file_notify(struct cgroup_file *cfile);
+void cgroup_file_show(struct cgroup_file *cfile, bool show);
int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ffaccd6373f1..eb096499c72b 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4339,6 +4339,29 @@ void cgroup_file_notify(struct cgroup_file *cfile)
spin_unlock_irqrestore(&cgroup_file_kn_lock, flags);
}
+/**
+ * cgroup_file_show - show or hide a hidden cgroup file
+ * @cfile: target cgroup_file obtained by setting cftype->file_offset
+ * @show: whether to show or hide
+ */
+void cgroup_file_show(struct cgroup_file *cfile, bool show)
+{
+ struct kernfs_node *kn;
+
+ spin_lock_irq(&cgroup_file_kn_lock);
+ kn = cfile->kn;
+ kernfs_get(kn);
+ spin_unlock_irq(&cgroup_file_kn_lock);
+
+ if (show)
+ kernfs_activate(kn);
+ else
+ kernfs_deactivate(kn);
+
+ kernfs_put(kn);
+}
+
+
/**
* css_next_child - find the next child of a given css
* @pos: the current position (%NULL to initiate traversal)
--
2.37.2
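As a rough userspace model (illustrative names, a pthread mutex standing in for cgroup_file_kn_lock, a plain refcount for kernfs_get/put), the pin-then-operate shape of cgroup_file_show() above - snapshot and pin the node pointer under the lock, run the possibly-sleeping toggle outside it, then drop the reference - can be sketched as:

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of the locking pattern in cgroup_file_show():
 * the node is pinned under the spinlock so it cannot be freed while
 * the (potentially sleeping) activate/deactivate runs unlocked.
 * All names here are illustrative, not the kernel API. */

struct node {
	int refcnt;
	bool active;
};

static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

static void node_get(struct node *n) { n->refcnt++; }
static void node_put(struct node *n) { n->refcnt--; }

static void file_show(struct node *file_node, bool show)
{
	struct node *n;

	pthread_mutex_lock(&file_lock);
	n = file_node;		/* snapshot under the lock */
	node_get(n);		/* pin so it survives the unlock */
	pthread_mutex_unlock(&file_lock);

	n->active = show;	/* stands in for kernfs_[de]activate() */

	node_put(n);
}
```

The split matters because the real kernfs_activate()/kernfs_deactivate() can sleep, so they cannot run under the irq-disabled spinlock that protects cfile->kn.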
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
` (6 preceding siblings ...)
2022-08-20 0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
@ 2022-08-22 1:58 ` Chengming Zhou
2022-08-22 7:10 ` Tejun Heo
7 siblings, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-22 1:58 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> Hello,
>
> Currently, deactivated kernfs nodes are used for two purposes - during
> removal to kill and drain nodes, and during creation to make the creation
> of multiple kernfs_nodes succeed or fail as a group.
>
> This patchset makes kernfs [de]activation generic so that it can be used
> anytime to deactivate (hide and drain) and activate (show) kernfs nodes,
> and, on top, implements cgroup_file_show() which allows toggling cgroup file
> visibility.
>
> This is for the following pending patchset to allow disabling PSI on
> per-cgroup basis:
>
> https://lore.kernel.org/all/20220808110341.15799-1-zhouchengming@bytedance.com/t/#u
>
> which requires hiding the corresponding cgroup interface files while
> disabled.
Hello,
It's so nice of you to implement this! I will rebase on your work to
do per-cgroup PSI disable/re-enable and test it today.
Thanks!
>
> This patchset contains the following seven patches.
>
> 0001-kernfs-Simply-by-replacing-kernfs_deref_open_node-wi.patch
> 0002-kernfs-Drop-unnecessary-mutex-local-variable-initial.patch
> 0003-kernfs-Refactor-kernfs_get_open_node.patch
> 0004-kernfs-Skip-kernfs_drain_open_files-more-aggressivel.patch
> 0005-kernfs-Make-kernfs_drain-skip-draining-more-aggressi.patch
> 0006-kernfs-Allow-kernfs-nodes-to-be-deactivated-and-re-a.patch
> 0007-cgroup-Implement-cgroup_file_show.patch
>
> 0001-0003 are misc prep patches. 0004-0006 implement kernfs_deactivate().
> 0007 implements cgroup_file_show() on top. The patches are also available in
> the following git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git kernfs-deactivate
>
> diffstat follows. Thanks.
>
> fs/kernfs/dir.c | 120 +++++++++++++++++++++++++++++++++++++++++-------------------
> fs/kernfs/file.c | 139 +++++++++++++++++++++++++++++++---------------------------------------
> fs/kernfs/kernfs-internal.h | 1
> include/linux/cgroup.h | 1
> include/linux/kernfs.h | 2 +
> kernel/cgroup/cgroup.c | 23 +++++++++++
> 6 files changed, 172 insertions(+), 114 deletions(-)
>
> --
> tejun
>
>
* Re: [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show()
2022-08-22 1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
@ 2022-08-22 7:10 ` Tejun Heo
0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-22 7:10 UTC (permalink / raw)
To: Chengming Zhou
Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
Imran Khan, kernel-team
Hello,
On Mon, Aug 22, 2022 at 09:58:22AM +0800, Chengming Zhou wrote:
> It's so nice of you to implement this! I will rebase on your work to
> do per-cgroup PSI disable/re-enable and test it today.
Oh, I always kinda wanted it and am likely to need it for something else
that I'm working on, so it isn't all that altruistic.
Thanks for working on PSI optimizations!
--
tejun
* Re: [PATCH 1/7] kernfs: Simply by replacing kernfs_deref_open_node() with of_on()
2022-08-20 0:05 ` [PATCH 1/7] kernfs: Simply by replacing kernfs_deref_open_node() with of_on() Tejun Heo
@ 2022-08-23 5:15 ` Chengming Zhou
0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:15 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU
> dereference is currently only used in kernfs_notify(). Everywhere else,
> either we're holding the lock which protects it or know that the
> kernfs_open_node is pinned because we have a pointer to a kernfs_open_file
> which is hanging off of it.
>
> kernfs_deref_open_node() is used for the latter case - accessing
> kernfs_open_node from kernfs_open_file. The lifetime and visibility rules
> are simple and clear here. To someone who can access a kernfs_open_file, its
> kernfs_open_node is pinned and visible through of->kn->attr.open.
>
> Replace kernfs_deref_open_node() with the simpler of_on(). The former takes
> both @kn and @of and RCU deref @kn->attr.open while sanity checking with
> @of. The latter takes @of and uses protected deref on of->kn->attr.open.
>
> As the return value can't be NULL, remove the error handling in the callers
> too.
>
> This shouldn't cause any functional changes.
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Thanks.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> ---
> fs/kernfs/file.c | 56 +++++++++++-------------------------------------
> 1 file changed, 13 insertions(+), 43 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index b3ec34386b43..32b16fe00a9e 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -57,31 +57,17 @@ static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
> }
>
> /**
> - * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
> - *
> - * @of: associated kernfs_open_file instance.
> - * @kn: target kernfs_node.
> - *
> - * Fetch and return ->attr.open of @kn if @of->list is non empty.
> - * If @of->list is not empty we can safely assume that @of is on
> - * @kn->attr.open->files list and this guarantees that @kn->attr.open
> - * will not vanish i.e. dereferencing outside RCU read-side critical
> - * section is safe here.
> - *
> - * The caller needs to make sure that @of->list is not empty.
> + * of_on - Return the kernfs_open_node of the specified kernfs_open_file
> + * @of: taret kernfs_open_file
> */
> -static struct kernfs_open_node *
> -kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
> +static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
> {
> - struct kernfs_open_node *on;
> -
> - on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
> -
> - return on;
> + return rcu_dereference_protected(of->kn->attr.open,
> + !list_empty(&of->list));
> }
>
> /**
> - * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
> + * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
> *
> * @kn: target kernfs_node.
> *
> @@ -96,7 +82,7 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
> * The caller needs to make sure that kernfs_open_file_mutex is held.
> */
> static struct kernfs_open_node *
> -kernfs_deref_open_node_protected(struct kernfs_node *kn)
> +kernfs_deref_open_node_locked(struct kernfs_node *kn)
> {
> return rcu_dereference_protected(kn->attr.open,
> lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
> @@ -207,12 +193,8 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
> static int kernfs_seq_show(struct seq_file *sf, void *v)
> {
> struct kernfs_open_file *of = sf->private;
> - struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
> -
> - if (!on)
> - return -EINVAL;
>
> - of->event = atomic_read(&on->event);
> + of->event = atomic_read(&of_on(of)->event);
>
> return of->kn->attr.ops->seq_show(sf, v);
> }
> @@ -235,7 +217,6 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
> ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
> const struct kernfs_ops *ops;
> - struct kernfs_open_node *on;
> char *buf;
>
> buf = of->prealloc_buf;
> @@ -257,14 +238,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> goto out_free;
> }
>
> - on = kernfs_deref_open_node(of, of->kn);
> - if (!on) {
> - len = -EINVAL;
> - mutex_unlock(&of->mutex);
> - goto out_free;
> - }
> -
> - of->event = atomic_read(&on->event);
> + of->event = atomic_read(&of_on(of)->event);
>
> ops = kernfs_ops(of->kn);
> if (ops->read)
> @@ -584,7 +558,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
> struct mutex *mutex = NULL;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> - on = kernfs_deref_open_node_protected(kn);
> + on = kernfs_deref_open_node_locked(kn);
>
> if (on) {
> list_add_tail(&of->list, &on->files);
> @@ -629,7 +603,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
>
> mutex = kernfs_open_file_mutex_lock(kn);
>
> - on = kernfs_deref_open_node_protected(kn);
> + on = kernfs_deref_open_node_locked(kn);
> if (!on) {
> mutex_unlock(mutex);
> return;
> @@ -839,7 +813,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> return;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> - on = kernfs_deref_open_node_protected(kn);
> + on = kernfs_deref_open_node_locked(kn);
> if (!on) {
> mutex_unlock(mutex);
> return;
> @@ -874,11 +848,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> */
> __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
> {
> - struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
> - struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
> -
> - if (!on)
> - return EPOLLERR;
> + struct kernfs_open_node *on = of_on(of);
>
> poll_wait(of->file, &on->poll, wait);
>
* Re: [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization
2022-08-20 0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
@ 2022-08-23 5:15 ` Chengming Zhou
0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:15 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> These are unnecessary and unconventional. Remove them. Also move a variable
> declaration into the block where it's used. No functional changes.
>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Thanks.
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Imran Khan <imran.f.khan@oracle.com>
> ---
> fs/kernfs/file.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 32b16fe00a9e..6437f7c7162d 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -555,7 +555,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
> struct kernfs_open_file *of)
> {
> struct kernfs_open_node *on, *new_on = NULL;
> - struct mutex *mutex = NULL;
> + struct mutex *mutex;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> on = kernfs_deref_open_node_locked(kn);
> @@ -599,7 +599,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
> struct kernfs_open_file *of)
> {
> struct kernfs_open_node *on;
> - struct mutex *mutex = NULL;
> + struct mutex *mutex;
>
> mutex = kernfs_open_file_mutex_lock(kn);
>
> @@ -776,9 +776,10 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> {
> struct kernfs_node *kn = inode->i_private;
> struct kernfs_open_file *of = kernfs_of(filp);
> - struct mutex *mutex = NULL;
>
> if (kn->flags & KERNFS_HAS_RELEASE) {
> + struct mutex *mutex;
> +
> mutex = kernfs_open_file_mutex_lock(kn);
> kernfs_release_file(kn, of);
> mutex_unlock(mutex);
> @@ -796,7 +797,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> {
> struct kernfs_open_node *on;
> struct kernfs_open_file *of;
> - struct mutex *mutex = NULL;
> + struct mutex *mutex;
>
> if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> return;
* Re: [PATCH 3/7] kernfs: Refactor kernfs_get_open_node()
2022-08-20 0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
@ 2022-08-23 5:16 ` Chengming Zhou
0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:16 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> Factor out the common part. This is cleaner and should help with future
> changes. No functional changes.
>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Thanks.
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> fs/kernfs/file.c | 25 +++++++++++--------------
> 1 file changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 6437f7c7162d..7060a2a714b8 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -554,31 +554,28 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
> static int kernfs_get_open_node(struct kernfs_node *kn,
> struct kernfs_open_file *of)
> {
> - struct kernfs_open_node *on, *new_on = NULL;
> + struct kernfs_open_node *on;
> struct mutex *mutex;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> on = kernfs_deref_open_node_locked(kn);
>
> - if (on) {
> - list_add_tail(&of->list, &on->files);
> - mutex_unlock(mutex);
> - return 0;
> - } else {
> + if (!on) {
> /* not there, initialize a new one */
> - new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
> - if (!new_on) {
> + on = kmalloc(sizeof(*on), GFP_KERNEL);
> + if (!on) {
> mutex_unlock(mutex);
> return -ENOMEM;
> }
> - atomic_set(&new_on->event, 1);
> - init_waitqueue_head(&new_on->poll);
> - INIT_LIST_HEAD(&new_on->files);
> - list_add_tail(&of->list, &new_on->files);
> - rcu_assign_pointer(kn->attr.open, new_on);
> + atomic_set(&on->event, 1);
> + init_waitqueue_head(&on->poll);
> + INIT_LIST_HEAD(&on->files);
> + rcu_assign_pointer(kn->attr.open, on);
> }
> - mutex_unlock(mutex);
>
> + list_add_tail(&of->list, &on->files);
> +
> + mutex_unlock(mutex);
> return 0;
> }
>
* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
2022-08-20 0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
@ 2022-08-23 5:27 ` Chengming Zhou
2022-08-23 19:37 ` Tejun Heo
2022-08-25 12:11 ` Chengming Zhou
1 sibling, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:27 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> Track the number of mmapped files and files that need to be released and
> skip kernfs_drain_open_files() if both are zero, which are the precise
> conditions which require draining open_files. The early exit test is
> factored into kernfs_should_drain_open_files() which is now tested by
> kernfs_drain_open_files()'s caller - kernfs_drain().
>
> This isn't a meaningful optimization on its own but will enable future
> stand-alone kernfs_deactivate() implementation.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> fs/kernfs/dir.c | 3 ++-
> fs/kernfs/file.c | 51 +++++++++++++++++++++++++------------
> fs/kernfs/kernfs-internal.h | 1 +
> 3 files changed, 38 insertions(+), 17 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 1cc88ba6de90..8ae44db920d4 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
> rwsem_release(&kn->dep_map, _RET_IP_);
> }
>
> - kernfs_drain_open_files(kn);
> + if (kernfs_should_drain_open_files(kn))
> + kernfs_drain_open_files(kn);
>
> down_write(&root->kernfs_rwsem);
> }
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 7060a2a714b8..b510589af427 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -23,6 +23,8 @@ struct kernfs_open_node {
> atomic_t event;
> wait_queue_head_t poll;
> struct list_head files; /* goes through kernfs_open_file.list */
> + unsigned int nr_mmapped;
> + unsigned int nr_to_release;
> };
>
> /*
> @@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>
> rc = 0;
> of->mmapped = true;
> + of_on(of)->nr_mmapped++;
> of->vm_ops = vma->vm_ops;
> vma->vm_ops = &kernfs_vm_ops;
> out_put:
> @@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
> }
>
> list_add_tail(&of->list, &on->files);
> + if (kn->flags & KERNFS_HAS_RELEASE)
> + on->nr_to_release++;
>
> mutex_unlock(mutex);
> return 0;
> @@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
> return;
> }
>
> - if (of)
> + if (of) {
> + WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
kernfs_unlink_open_file() is also used in the "err_put_node" error path of kernfs_fop_open();
shouldn't that case also decrement on->nr_to_release?
Thanks.
> + if (of->mmapped)
> + on->nr_mmapped--;
> list_del(&of->list);
> + }
>
> if (list_empty(&on->files)) {
> rcu_assign_pointer(kn->attr.open, NULL);
> @@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
> */
> kn->attr.ops->release(of);
> of->released = true;
> + of_on(of)->nr_to_release--;
> }
> }
>
> @@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> return 0;
> }
>
> -void kernfs_drain_open_files(struct kernfs_node *kn)
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn)
> {
> struct kernfs_open_node *on;
> - struct kernfs_open_file *of;
> - struct mutex *mutex;
> -
> - if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> - return;
> + bool ret;
>
> /*
> - * lockless opportunistic check is safe below because no one is adding to
> - * ->attr.open at this point of time. This check allows early bail out
> - * if ->attr.open is already NULL. kernfs_unlink_open_file makes
> - * ->attr.open NULL only while holding kernfs_open_file_mutex so below
> - * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
> - * ->attr.open became NULL while waiting for the mutex.
> + * @kn being deactivated guarantees that @kn->attr.open can't change
> + * beneath us making the lockless test below safe.
> */
> - if (!rcu_access_pointer(kn->attr.open))
> - return;
> + WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> +
> + rcu_read_lock();
> + on = rcu_dereference(kn->attr.open);
> + ret = on && (on->nr_mmapped || on->nr_to_release);
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +void kernfs_drain_open_files(struct kernfs_node *kn)
> +{
> + struct kernfs_open_node *on;
> + struct kernfs_open_file *of;
> + struct mutex *mutex;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> on = kernfs_deref_open_node_locked(kn);
> @@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> list_for_each_entry(of, &on->files, list) {
> struct inode *inode = file_inode(of->file);
>
> - if (kn->flags & KERNFS_HAS_MMAP)
> + if (of->mmapped) {
> unmap_mapping_range(inode->i_mapping, 0, 0, 1);
> + of->mmapped = false;
> + on->nr_mmapped--;
> + }
>
> if (kn->flags & KERNFS_HAS_RELEASE)
> kernfs_release_file(kn, of);
> }
>
> + WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
> mutex_unlock(mutex);
> }
>
> diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> index 3ae214d02d44..fc5821effd97 100644
> --- a/fs/kernfs/kernfs-internal.h
> +++ b/fs/kernfs/kernfs-internal.h
> @@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
> */
> extern const struct file_operations kernfs_file_fops;
>
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn);
> void kernfs_drain_open_files(struct kernfs_node *kn);
>
> /*
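As a rough userspace model (illustrative names and fields, not the kernel structures), the counter bookkeeping this patch adds - nr_mmapped and nr_to_release track exactly the open files that still need drain work, so draining can be skipped when both are zero - can be sketched as:

```c
#include <stdbool.h>

/* Userspace model of the patch's counters.  open/mmap/release update
 * the counts and drain() brings both back to zero; names mirror the
 * patch loosely and are stand-ins only. */

struct open_node {
	unsigned int nr_mmapped;
	unsigned int nr_to_release;
};

struct open_file {
	bool mmapped;
	bool released;
	bool has_release;	/* the node's ops have a ->release() */
};

static void file_open(struct open_node *on, struct open_file *of)
{
	if (of->has_release)
		on->nr_to_release++;
}

static void file_mmap(struct open_node *on, struct open_file *of)
{
	of->mmapped = true;
	on->nr_mmapped++;
}

static void file_release(struct open_node *on, struct open_file *of)
{
	if (of->has_release && !of->released) {
		of->released = true;
		on->nr_to_release--;
	}
}

/* Models kernfs_should_drain_open_files(). */
static bool should_drain(const struct open_node *on)
{
	return on->nr_mmapped || on->nr_to_release;
}

static void drain(struct open_node *on, struct open_file *of)
{
	if (of->mmapped) {		/* unmap_mapping_range() stand-in */
		of->mmapped = false;
		on->nr_mmapped--;
	}
	file_release(on, of);		/* kernfs_release_file() stand-in */
}
```

Chengming's review comment above maps directly onto this model: an error-path unlink that skips the file_release() step would leave nr_to_release permanently elevated.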
* Re: [PATCH 5/7] kernfs: Make kernfs_drain() skip draining more aggressively
2022-08-20 0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
@ 2022-08-23 5:33 ` Chengming Zhou
0 siblings, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:33 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> kernfs_drain() is updated to handle whether draining is necessary or not
> rather than its caller. __kernfs_remove() now always calls kernfs_drain()
> instead of filtering based on KERNFS_ACTIVATED.
>
> kernfs_drain() now tests kn->active and kernfs_should_drain_open_files() to
> determine whether draining is necessary at all. If not, it returns %false
> without doing anything. Otherwise, it unlocks kernfs_rwsem and drains as
> before and returns %true. The return value isn't used yet.
>
> Using the precise conditions allows skipping more aggressively. This isn't a
> meaningful optimization on its own but will enable future stand-alone
> kernfs_deactivate() implementation.
>
> While at it, make minor comment updates.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> fs/kernfs/dir.c | 34 ++++++++++++++++++++--------------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 8ae44db920d4..f857731598cd 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -460,10 +460,14 @@ void kernfs_put_active(struct kernfs_node *kn)
> * @kn: kernfs_node to drain
> *
> * Drain existing usages and nuke all existing mmaps of @kn. Mutiple
> - * removers may invoke this function concurrently on @kn and all will
> + * callers may invoke this function concurrently on @kn and all will
> * return after draining is complete.
> + *
> + * RETURNS:
> + * %false if nothing needed to be drained; otherwise, %true. On %true return,
> + * kernfs_rwsem has been released and re-acquired.
> */
> -static void kernfs_drain(struct kernfs_node *kn)
> +static bool kernfs_drain(struct kernfs_node *kn)
> __releases(&kernfs_root(kn)->kernfs_rwsem)
> __acquires(&kernfs_root(kn)->kernfs_rwsem)
> {
> @@ -472,6 +476,16 @@ static void kernfs_drain(struct kernfs_node *kn)
> lockdep_assert_held_write(&root->kernfs_rwsem);
> WARN_ON_ONCE(kernfs_active(kn));
>
> + /*
> + * Skip draining if already fully drained. This avoids draining and its
> + * lockdep annotations for nodes which have never been activated
> + * allowing embedding kernfs_remove() in create error paths without
> + * worrying about draining.
> + */
> + if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
> + kernfs_should_drain_open_files(kn))
Shouldn't this be !kernfs_should_drain_open_files(kn)? I see the correcting diff ends up in patch 6/7.
Thanks.
> + return false;
> +
> up_write(&root->kernfs_rwsem);
>
> if (kernfs_lockdep(kn)) {
> @@ -480,7 +494,6 @@ static void kernfs_drain(struct kernfs_node *kn)
> lock_contended(&kn->dep_map, _RET_IP_);
> }
>
> - /* but everyone should wait for draining */
> wait_event(root->deactivate_waitq,
> atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
>
> @@ -493,6 +506,8 @@ static void kernfs_drain(struct kernfs_node *kn)
> kernfs_drain_open_files(kn);
>
> down_write(&root->kernfs_rwsem);
> +
> + return true;
> }
>
> /**
> @@ -1370,23 +1385,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
> pos = kernfs_leftmost_descendant(kn);
>
> /*
> - * kernfs_drain() drops kernfs_rwsem temporarily and @pos's
> + * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
> * base ref could have been put by someone else by the time
> * the function returns. Make sure it doesn't go away
> * underneath us.
> */
> kernfs_get(pos);
>
> - /*
> - * Drain iff @kn was activated. This avoids draining and
> - * its lockdep annotations for nodes which have never been
> - * activated and allows embedding kernfs_remove() in create
> - * error paths without worrying about draining.
> - */
> - if (kn->flags & KERNFS_ACTIVATED)
> - kernfs_drain(pos);
> - else
> - WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> + kernfs_drain(pos);
>
> /*
> * kernfs_unlink_sibling() succeeds once per node. Use it
* Re: [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
2022-08-20 0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
@ 2022-08-23 5:49 ` Chengming Zhou
2022-08-23 20:31 ` Tejun Heo
0 siblings, 1 reply; 19+ messages in thread
From: Chengming Zhou @ 2022-08-23 5:49 UTC (permalink / raw)
To: Tejun Heo, Greg Kroah-Hartman
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> Currently, kernfs nodes can be created deactivated and activated later to
> allow creation of multiple nodes to succeed or fail as a group. Extend this
> capability so that kernfs nodes can be deactivated and re-activated anytime
> and however many times. This can be used to toggle interface files for
> features which can be dynamically turned on and off.
>
> kernfs_activate()'s skip conditions are updated so that it doesn't ignore
> re-activations and suppress re-activations of files which are being removed.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
> fs/kernfs/dir.c | 89 ++++++++++++++++++++++++++++++------------
> include/linux/kernfs.h | 2 +
> 2 files changed, 65 insertions(+), 26 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index f857731598cd..6db031362585 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -483,7 +483,7 @@ static bool kernfs_drain(struct kernfs_node *kn)
> * worrying about draining.
> */
> if (atomic_read(&kn->active) == KN_DEACTIVATED_BIAS &&
> - kernfs_should_drain_open_files(kn))
> + !kernfs_should_drain_open_files(kn))
> return false;
>
> up_write(&root->kernfs_rwsem);
> @@ -1321,14 +1321,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
> }
>
> /**
> - * kernfs_activate - activate a node which started deactivated
> + * kernfs_activate - activate a node's subtree
> * @kn: kernfs_node whose subtree is to be activated
> *
> - * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node
> - * needs to be explicitly activated. A node which hasn't been activated
> - * isn't visible to userland and deactivation is skipped during its
> - * removal. This is useful to construct atomic init sequences where
> - * creation of multiple nodes should either succeed or fail atomically.
> + * If newly created on a root w/ %KERNFS_ROOT_CREATE_DEACTIVATED or after a
> + * kernfs_deactivate() call, @kn is deactivated and invisible to userland. This
> + * function activates all nodes in @kn's inclusive subtree making them visible.
> + *
> + * %KERNFS_ROOT_CREATE_DEACTIVATED is useful when constructing init sequences
> + * where creation of multiple nodes should either succeed or fail atomically.
> *
> * The caller is responsible for ensuring that this function is not called
> * after kernfs_remove*() is invoked on @kn.
> @@ -1342,7 +1343,7 @@ void kernfs_activate(struct kernfs_node *kn)
>
> pos = NULL;
> while ((pos = kernfs_next_descendant_post(pos, kn))) {
> - if (pos->flags & KERNFS_ACTIVATED)
> + if (kernfs_active(pos) || (kn->flags & KERNFS_REMOVING))
May I ask a question: what's the difference between kernfs_active() and KERNFS_ACTIVATED?
KERNFS_ACTIVATED is always set by kernfs_activate() and never cleared, so I think it means:
1. !KERNFS_ACTIVATED : allocated but not yet activated
2. KERNFS_ACTIVATED && !kernfs_active() : deactivated by kernfs_deactivate_locked()
I see most code checks kernfs_active(), but two places check KERNFS_ACTIVATED. I'm not sure
where we should check KERNFS_ACTIVATED, or whether there's any chance we can remove KERNFS_ACTIVATED.
Thanks!
> continue;
>
> WARN_ON_ONCE(pos->parent && RB_EMPTY_NODE(&pos->rb));
> @@ -1355,6 +1356,58 @@ void kernfs_activate(struct kernfs_node *kn)
> up_write(&root->kernfs_rwsem);
> }
>
> +static void kernfs_deactivate_locked(struct kernfs_node *kn, bool removing)
> +{
> + struct kernfs_root *root = kernfs_root(kn);
> + struct kernfs_node *pos;
> +
> + lockdep_assert_held_write(&root->kernfs_rwsem);
> +
> + /* prevent any new usage under @kn by deactivating all nodes */
> + pos = NULL;
> + while ((pos = kernfs_next_descendant_post(pos, kn))) {
> + if (kernfs_active(pos))
> + atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
> + if (removing)
> + pos->flags |= KERNFS_REMOVING;
> + }
> +
> + /*
> + * No new active usage can be created. Drain existing ones. As
> + * kernfs_drain() may drop kernfs_rwsem temporarily, pin @pos so that it
> + * doesn't go away underneath us.
> + *
> + * If kernfs_rwsem was released, restart from the beginning. Forward
> + * progress is guaranteed as a drained node is guaranteed to stay
> + * drained. In the unlikely case that the loop restart ever becomes a
> + * problem, we should be able to work around by batching up the
> + * draining.
> + */
> + pos = NULL;
> + while ((pos = kernfs_next_descendant_post(pos, kn))) {
> + kernfs_get(pos);
> + if (kernfs_drain(pos))
> + pos = NULL;
> + kernfs_put(pos);
> + }
> +}
> +
> +/**
> + * kernfs_deactivate - deactivate a node's subtree
> + * @kn: kernfs_node whose subtree is to be deactivated
> + *
> + * Deactivate @kn's inclusive subtree. On return, the subtree is invisible to
> + * userland and there are no in-flight file operations.
> + */
> +void kernfs_deactivate(struct kernfs_node *kn)
> +{
> + struct kernfs_root *root = kernfs_root(kn);
> +
> + down_write(&root->kernfs_rwsem);
> + kernfs_deactivate_locked(kn, false);
> + up_write(&root->kernfs_rwsem);
> +}
> +
> static void __kernfs_remove(struct kernfs_node *kn)
> {
> struct kernfs_node *pos;
> @@ -1374,26 +1427,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
>
> pr_debug("kernfs %s: removing\n", kn->name);
>
> - /* prevent any new usage under @kn by deactivating all nodes */
> - pos = NULL;
> - while ((pos = kernfs_next_descendant_post(pos, kn)))
> - if (kernfs_active(pos))
> - atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
> + kernfs_deactivate_locked(kn, true);
>
> - /* deactivate and unlink the subtree node-by-node */
> + /* unlink the subtree node-by-node */
> do {
> pos = kernfs_leftmost_descendant(kn);
>
> - /*
> - * kernfs_drain() may drop kernfs_rwsem temporarily and @pos's
> - * base ref could have been put by someone else by the time
> - * the function returns. Make sure it doesn't go away
> - * underneath us.
> - */
> - kernfs_get(pos);
> -
> - kernfs_drain(pos);
> -
> /*
> * kernfs_unlink_sibling() succeeds once per node. Use it
> * to decide who's responsible for cleanups.
> @@ -1410,8 +1449,6 @@ static void __kernfs_remove(struct kernfs_node *kn)
>
> kernfs_put(pos);
> }
> -
> - kernfs_put(pos);
> } while (pos != kn);
> }
>
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 367044d7708c..657eea1395b6 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -112,6 +112,7 @@ enum kernfs_node_flag {
> KERNFS_SUICIDED = 0x0800,
> KERNFS_EMPTY_DIR = 0x1000,
> KERNFS_HAS_RELEASE = 0x2000,
> + KERNFS_REMOVING = 0x4000,
> };
>
> /* @flags for kernfs_create_root() */
> @@ -429,6 +430,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
> const char *name,
> struct kernfs_node *target);
> void kernfs_activate(struct kernfs_node *kn);
> +void kernfs_deactivate(struct kernfs_node *kn);
> void kernfs_remove(struct kernfs_node *kn);
> void kernfs_break_active_protection(struct kernfs_node *kn);
> void kernfs_unbreak_active_protection(struct kernfs_node *kn);
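As a rough userspace model (a flat array standing in for the subtree, illustrative names throughout), the restart logic in kernfs_deactivate_locked() - when draining a node had to drop the lock, the post-order walk restarts from the beginning, and forward progress holds because a drained node stays drained - can be sketched as:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the drain-restart loop: drain() returns true iff
 * it had to "drop the lock" to do real work, in which case the walk
 * restarts from NULL.  Already-drained nodes are skipped, so the loop
 * terminates.  Names are illustrative, not the kernfs API. */

struct node {
	bool needs_drain;
};

/* Post-order iterator stand-in: NULL starts a fresh walk. */
static struct node *next_node(struct node *pos, struct node *nodes,
			      size_t n)
{
	if (!pos)
		return n ? &nodes[0] : NULL;
	return (size_t)(pos - nodes) + 1 < n ? pos + 1 : NULL;
}

/* Returns true iff the lock was dropped to do real drain work. */
static bool drain(struct node *pos, int *drops)
{
	if (!pos->needs_drain)
		return false;		/* already drained: skip */
	pos->needs_drain = false;
	(*drops)++;			/* models releasing kernfs_rwsem */
	return true;
}

/* Returns how many times the walk had to restart. */
static int deactivate_all(struct node *nodes, size_t n)
{
	struct node *pos = NULL;
	int drops = 0;

	while ((pos = next_node(pos, nodes, n)))
		if (drain(pos, &drops))
			pos = NULL;	/* lock was dropped: restart */

	return drops;
}
```

Each node is drained at most once, so the number of restarts is bounded by the subtree size - the forward-progress argument the patch's comment makes.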
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
2022-08-23 5:27 ` Chengming Zhou
@ 2022-08-23 19:37 ` Tejun Heo
0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-23 19:37 UTC (permalink / raw)
To: Chengming Zhou
Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
Imran Khan, kernel-team
On Tue, Aug 23, 2022 at 01:27:22PM +0800, Chengming Zhou wrote:
> > + if (of) {
> > + WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
>
> kernfs_unlink_open_file() is also used in the "err_put_node" error path of
> kernfs_fop_open(); shouldn't that path also decrement on->nr_to_release?
Ah, you're right. Will fix. Thanks for pointing it out.
--
tejun
* Re: [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated
2022-08-23 5:49 ` Chengming Zhou
@ 2022-08-23 20:31 ` Tejun Heo
0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-08-23 20:31 UTC (permalink / raw)
To: Chengming Zhou
Cc: Greg Kroah-Hartman, linux-kernel, linux-fsdevel, Johannes Weiner,
Imran Khan, kernel-team
Hello,
On Tue, Aug 23, 2022 at 01:49:07PM +0800, Chengming Zhou wrote:
> > - if (pos->flags & KERNFS_ACTIVATED)
> > + if (kernfs_active(pos) || (kn->flags & KERNFS_REMOVING))
>
> May I ask a question: what's the difference between kernfs_active() and KERNFS_ACTIVATED?
>
> KERNFS_ACTIVATED is always set by kernfs_activate() and never cleared, so I think it means:
>
> 1. !KERNFS_ACTIVATED : allocated but not yet activated
> 2. KERNFS_ACTIVATED && !kernfs_active() : deactivated by kernfs_deactivate_locked()
>
> Most code checks kernfs_active(), but two places check KERNFS_ACTIVATED. I'm not sure
> which places should check KERNFS_ACTIVATED, or whether we can remove KERNFS_ACTIVATED entirely?
Yeah, !ACTIVATED means created but never activated, while kernfs_active()
means currently active. I tried to substitute all ACTIVATED tests with
kernfs_active() and remove the former, but I wasn't sure about changing
kernfs_add_one()'s behavior.
I think combining the initial activation state with the user-requested
show/hide state is too confusing, and it causes other problems, like
kernfs_activate() (which is used to activate newly created files) unhiding
files that were explicitly deactivated. Lemme split the show/hide state
into something separate so that the distinction is clear.
Thanks.
--
tejun
* Re: [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively
2022-08-20 0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
2022-08-23 5:27 ` Chengming Zhou
@ 2022-08-25 12:11 ` Chengming Zhou
1 sibling, 0 replies; 19+ messages in thread
From: Chengming Zhou @ 2022-08-25 12:11 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-kernel, linux-fsdevel, Johannes Weiner, Greg Kroah-Hartman,
Imran Khan, kernel-team
On 2022/8/20 08:05, Tejun Heo wrote:
> Track the number of mmapped files and files that need to be released, and
> skip kernfs_drain_open_files() if both are zero, as those are the precise
> conditions that require draining open files. The early exit test is
> factored into kernfs_should_drain_open_files(), which is now tested by
> kernfs_drain_open_files()'s caller, kernfs_drain().
>
> This isn't a meaningful optimization on its own but will enable a future
> stand-alone kernfs_deactivate() implementation.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> fs/kernfs/dir.c | 3 ++-
> fs/kernfs/file.c | 51 +++++++++++++++++++++++++------------
> fs/kernfs/kernfs-internal.h | 1 +
> 3 files changed, 38 insertions(+), 17 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 1cc88ba6de90..8ae44db920d4 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -489,7 +489,8 @@ static void kernfs_drain(struct kernfs_node *kn)
> rwsem_release(&kn->dep_map, _RET_IP_);
> }
>
> - kernfs_drain_open_files(kn);
> + if (kernfs_should_drain_open_files(kn))
> + kernfs_drain_open_files(kn);
>
> down_write(&root->kernfs_rwsem);
> }
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 7060a2a714b8..b510589af427 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -23,6 +23,8 @@ struct kernfs_open_node {
> atomic_t event;
> wait_queue_head_t poll;
> struct list_head files; /* goes through kernfs_open_file.list */
> + unsigned int nr_mmapped;
> + unsigned int nr_to_release;
> };
>
> /*
> @@ -527,6 +529,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>
> rc = 0;
> of->mmapped = true;
> + of_on(of)->nr_mmapped++;
> of->vm_ops = vma->vm_ops;
> vma->vm_ops = &kernfs_vm_ops;
> out_put:
> @@ -574,6 +577,8 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
> }
The current code uses kmalloc() to allocate kernfs_open_node, leaving
nr_to_release and nr_mmapped uninitialized.
Found by below stress test:
```
cd /sys/fs/cgroup
while true; do echo 0 > cgroup.pressure; echo 1 > cgroup.pressure; done
while true; do cat *.pressure; done
```
Thanks.
>
> list_add_tail(&of->list, &on->files);
> + if (kn->flags & KERNFS_HAS_RELEASE)
> + on->nr_to_release++;
>
> mutex_unlock(mutex);
> return 0;
> @@ -606,8 +611,12 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
> return;
> }
>
> - if (of)
> + if (of) {
> + WARN_ON_ONCE((kn->flags & KERNFS_HAS_RELEASE) && !of->released);
> + if (of->mmapped)
> + on->nr_mmapped--;
> list_del(&of->list);
> + }
>
> if (list_empty(&on->files)) {
> rcu_assign_pointer(kn->attr.open, NULL);
> @@ -766,6 +775,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
> */
> kn->attr.ops->release(of);
> of->released = true;
> + of_on(of)->nr_to_release--;
> }
> }
>
> @@ -790,25 +800,30 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> return 0;
> }
>
> -void kernfs_drain_open_files(struct kernfs_node *kn)
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn)
> {
> struct kernfs_open_node *on;
> - struct kernfs_open_file *of;
> - struct mutex *mutex;
> -
> - if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> - return;
> + bool ret;
>
> /*
> - * lockless opportunistic check is safe below because no one is adding to
> - * ->attr.open at this point of time. This check allows early bail out
> - * if ->attr.open is already NULL. kernfs_unlink_open_file makes
> - * ->attr.open NULL only while holding kernfs_open_file_mutex so below
> - * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
> - * ->attr.open became NULL while waiting for the mutex.
> + * @kn being deactivated guarantees that @kn->attr.open can't change
> + * beneath us making the lockless test below safe.
> */
> - if (!rcu_access_pointer(kn->attr.open))
> - return;
> + WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
> +
> + rcu_read_lock();
> + on = rcu_dereference(kn->attr.open);
> + ret = on && (on->nr_mmapped || on->nr_to_release);
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +void kernfs_drain_open_files(struct kernfs_node *kn)
> +{
> + struct kernfs_open_node *on;
> + struct kernfs_open_file *of;
> + struct mutex *mutex;
>
> mutex = kernfs_open_file_mutex_lock(kn);
> on = kernfs_deref_open_node_locked(kn);
> @@ -820,13 +835,17 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> list_for_each_entry(of, &on->files, list) {
> struct inode *inode = file_inode(of->file);
>
> - if (kn->flags & KERNFS_HAS_MMAP)
> + if (of->mmapped) {
> unmap_mapping_range(inode->i_mapping, 0, 0, 1);
> + of->mmapped = false;
> + on->nr_mmapped--;
> + }
>
> if (kn->flags & KERNFS_HAS_RELEASE)
> kernfs_release_file(kn, of);
> }
>
> + WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
> mutex_unlock(mutex);
> }
>
> diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
> index 3ae214d02d44..fc5821effd97 100644
> --- a/fs/kernfs/kernfs-internal.h
> +++ b/fs/kernfs/kernfs-internal.h
> @@ -157,6 +157,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
> */
> extern const struct file_operations kernfs_file_fops;
>
> +bool kernfs_should_drain_open_files(struct kernfs_node *kn);
> void kernfs_drain_open_files(struct kernfs_node *kn);
>
> /*
Thread overview: 19+ messages
2022-08-20 0:05 [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Tejun Heo
2022-08-20 0:05 ` [PATCH 1/7] kernfs: Simply by replacing kernfs_deref_open_node() with of_on() Tejun Heo
2022-08-23 5:15 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 2/7] kernfs: Drop unnecessary "mutex" local variable initialization Tejun Heo
2022-08-23 5:15 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 3/7] kernfs: Refactor kernfs_get_open_node() Tejun Heo
2022-08-23 5:16 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 4/7] kernfs: Skip kernfs_drain_open_files() more aggressively Tejun Heo
2022-08-23 5:27 ` Chengming Zhou
2022-08-23 19:37 ` Tejun Heo
2022-08-25 12:11 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 5/7] kernfs: Make kernfs_drain() skip draining " Tejun Heo
2022-08-23 5:33 ` Chengming Zhou
2022-08-20 0:05 ` [PATCH 6/7] kernfs: Allow kernfs nodes to be deactivated and re-activated Tejun Heo
2022-08-23 5:49 ` Chengming Zhou
2022-08-23 20:31 ` Tejun Heo
2022-08-20 0:05 ` [PATCH 7/7] cgroup: Implement cgroup_file_show() Tejun Heo
2022-08-22 1:58 ` [PATCHSET for-6.1] kernfs, cgroup: implement kernfs_deactivate() and cgroup_file_show() Chengming Zhou
2022-08-22 7:10 ` Tejun Heo