* [PATCH v7 0/4] kernfs: make ->attr.open RCU protected.
@ 2022-06-15  2:10 Imran Khan
  2022-06-15  2:10 ` [PATCH v7 1/4] " Imran Khan
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Imran Khan @ 2022-06-15  2:10 UTC (permalink / raw)
  To: tj, gregkh, viro; +Cc: linux-kernel

The patches in this version of the patch set are as follows:

PATCH-1: Make kernfs_open_node->attr.open RCU protected.

PATCH-2: Change kernfs_notify_list to llist.

PATCH-3: Introduce interface to access kernfs_open_file_mutex.

PATCH-4: Replace global kernfs_open_file_mutex with hashed mutexes.

Changes since v6:
 - Use rcu_dereference_protected for ->attr.open dereferencing under
   kernfs_open_file_mutex

Changes since v5:
 - Use 2 helpers for ->attr.open dereferencing

Changes since v4:
 - Rebase on tag next-20220610
 - Use one helper for all ->attr.open dereferencing.

Changes since v3:
 - Rebase on tag next-20220601
 - Rename RCU access related helpers and update their description
 - Address errors reported by kernel test robot
 - Include Acked-by tags from Tejun for the acked patches (PATCH-2,3,4)

Changes since v2:
 - Rebase on tag next-20220510
 - Remove PATCH-1 of v2 because it is present in tag next-20220510
 - Include Acked-by tags from Tejun for the acked patches (PATCH-2 and PATCH-3)


Cover letter for v2:
--------------------------------------------------------------------------

I have not yet received any feedback about v1 of this patchset [2], but
in the meantime an old version of the first patch from [3] has been integrated
into linux-next. Functionally, the first patch in [2] and [3] is identical;
it's just that [2] has renamed one of the functions to better reflect the fact
that we are no longer using reference counting for kernfs_open_node.

In this version, I have just modified the first patch of v1 so that we use the
renamed function as done in [2] and avoid those parts that are already
present in linux-next. The remaining 4 patches (PATCH-2 to PATCH-5) are
identical in v1 and v2, albeit v2 has been rebased on tag next-20220503.

Changes since v1:
 - Rebase on tag next-20220503

[2]: https://lore.kernel.org/lkml/20220428055431.3826852-1-imran.f.khan@oracle.com/
[3]: https://lore.kernel.org/lkml/20220324103040.584491-1-imran.f.khan@oracle.com/

Original cover letter
-------------------------------------------------------

This patchset contains a subset of the patches (after addressing review
comments) discussed at [1]. Since [1] replaces multiple global locks and
each of these locks can be removed independently, it was decided that we
should make these changes in parts, i.e. first get one set of optimizations
integrated and then build further on top of them.

The patches in this change set introduce the following changes:

PATCH-1: Remove reference counting for kernfs_open_node.

PATCH-2: Make kernfs_open_node->attr.open RCU protected.

PATCH-3: Change kernfs_notify_list to llist.

PATCH-4: Introduce interface to access kernfs_open_file_mutex.

PATCH-5: Replace global kernfs_open_file_mutex with hashed mutexes.

[1] https://lore.kernel.org/lkml/YmLfxHcekrr89IFl@slm.duckdns.org/

----------------------------------------------------------------

Imran Khan (4):
  kernfs: make ->attr.open RCU protected.
  kernfs: Change kernfs_notify_list to llist.
  kernfs: Introduce interface to access global kernfs_open_file_mutex.
  kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.

 fs/kernfs/file.c            | 249 ++++++++++++++++++++++--------------
 fs/kernfs/kernfs-internal.h |   4 +
 fs/kernfs/mount.c           |  19 +++
 include/linux/kernfs.h      |  61 ++++++++-
 4 files changed, 235 insertions(+), 98 deletions(-)


base-commit: 6d0c806803170f120f8cb97b321de7bd89d3a791
-- 
2.30.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-15  2:10 [PATCH v7 0/4] kernfs: make ->attr.open RCU protected Imran Khan
@ 2022-06-15  2:10 ` Imran Khan
  2022-06-15  6:12   ` Tejun Heo
  2022-06-15  2:10 ` [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist Imran Khan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Imran Khan @ 2022-06-15  2:10 UTC (permalink / raw)
  To: tj, gregkh, viro; +Cc: linux-kernel

After removal of kernfs_open_node->refcnt in the previous patch,
kernfs_open_node_lock can be removed as well by making ->attr.open
RCU protected. kernfs_put_open_node can delegate freeing of ->attr.open
to RCU, and other readers of ->attr.open can dereference it under
rcu_read_(un)lock.
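
For reference, the resulting access pattern looks roughly like the sketch
below (illustrative only; the actual helpers and call sites are in the diff
that follows):

    /* reader side, e.g. kernfs_notify(): no spinlock needed any more */
    rcu_read_lock();
    on = rcu_dereference(kn->attr.open);
    if (on)
            wake_up_interruptible(&on->poll);
    rcu_read_unlock();

    /* updater side, still serialized by kernfs_open_file_mutex */
    rcu_assign_pointer(kn->attr.open, NULL);
    kfree_rcu(on, rcu_head);        /* freed after a grace period */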

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
 fs/kernfs/file.c       | 147 ++++++++++++++++++++++++++++-------------
 include/linux/kernfs.h |   2 +-
 2 files changed, 102 insertions(+), 47 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e3abfa843879c..706aebcfb8f69 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -23,16 +23,16 @@
  * for each kernfs_node with one or more open files.
  *
  * kernfs_node->attr.open points to kernfs_open_node.  attr.open is
- * protected by kernfs_open_node_lock.
+ * RCU protected.
  *
  * filp->private_data points to seq_file whose ->private points to
  * kernfs_open_file.  kernfs_open_files are chained at
  * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
  */
-static DEFINE_SPINLOCK(kernfs_open_node_lock);
 static DEFINE_MUTEX(kernfs_open_file_mutex);
 
 struct kernfs_open_node {
+	struct rcu_head		rcu_head;
 	atomic_t		event;
 	wait_queue_head_t	poll;
 	struct list_head	files; /* goes through kernfs_open_file.list */
@@ -51,6 +51,52 @@ struct kernfs_open_node {
 static DEFINE_SPINLOCK(kernfs_notify_lock);
 static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
 
+/**
+ * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
+ *
+ * @of: associated kernfs_open_file instance.
+ * @kn: target kernfs_node.
+ *
+ * Fetch and return ->attr.open of @kn if @of->list is non empty.
+ * If @of->list is not empty we can safely assume that @of is on
+ * @kn->attr.open->files list and this guarantees that @kn->attr.open
+ * will not vanish i.e. dereferencing outside RCU read-side critical
+ * section is safe here.
+ *
+ * The caller needs to make sure that @of->list is not empty.
+ */
+static struct kernfs_open_node *
+kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
+{
+	struct kernfs_open_node *on;
+
+	on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
+
+	return on;
+}
+
+/**
+ * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
+ *
+ * @kn: target kernfs_node.
+ *
+ * Fetch and return ->attr.open of @kn when caller holds the
+ * kernfs_open_file_mutex.
+ *
+ * Update of ->attr.open happens under kernfs_open_file_mutex. So when
+ * the caller guarantees that this mutex is being held, other updaters can't
+ * change ->attr.open and this means that we can safely deref ->attr.open
+ * outside RCU read-side critical section.
+ *
+ * The caller needs to make sure that kernfs_open_file_mutex is held.
+ */
+static struct kernfs_open_node *
+kernfs_deref_open_node_protected(struct kernfs_node *kn)
+{
+	return rcu_dereference_protected(kn->attr.open,
+				lockdep_is_held(&kernfs_open_file_mutex));
+}
+
 static struct kernfs_open_file *kernfs_of(struct file *file)
 {
 	return ((struct seq_file *)file->private_data)->private;
@@ -156,8 +202,12 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
 static int kernfs_seq_show(struct seq_file *sf, void *v)
 {
 	struct kernfs_open_file *of = sf->private;
+	struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
 
-	of->event = atomic_read(&of->kn->attr.open->event);
+	if (!on)
+		return -EINVAL;
+
+	of->event = atomic_read(&on->event);
 
 	return of->kn->attr.ops->seq_show(sf, v);
 }
@@ -180,6 +230,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
 	ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
 	const struct kernfs_ops *ops;
+	struct kernfs_open_node *on;
 	char *buf;
 
 	buf = of->prealloc_buf;
@@ -201,7 +252,15 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		goto out_free;
 	}
 
-	of->event = atomic_read(&of->kn->attr.open->event);
+	on = kernfs_deref_open_node(of, of->kn);
+	if (!on) {
+		len = -EINVAL;
+		mutex_unlock(&of->mutex);
+		goto out_free;
+	}
+
+	of->event = atomic_read(&on->event);
+
 	ops = kernfs_ops(of->kn);
 	if (ops->read)
 		len = ops->read(of, buf, len, iocb->ki_pos);
@@ -519,36 +578,29 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 {
 	struct kernfs_open_node *on, *new_on = NULL;
 
- retry:
 	mutex_lock(&kernfs_open_file_mutex);
-	spin_lock_irq(&kernfs_open_node_lock);
-
-	if (!kn->attr.open && new_on) {
-		kn->attr.open = new_on;
-		new_on = NULL;
-	}
-
-	on = kn->attr.open;
-	if (on)
-		list_add_tail(&of->list, &on->files);
-
-	spin_unlock_irq(&kernfs_open_node_lock);
-	mutex_unlock(&kernfs_open_file_mutex);
+	on = kernfs_deref_open_node_protected(kn);
 
 	if (on) {
-		kfree(new_on);
+		list_add_tail(&of->list, &on->files);
+		mutex_unlock(&kernfs_open_file_mutex);
 		return 0;
+	} else {
+		/* not there, initialize a new one */
+		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
+		if (!new_on) {
+			mutex_unlock(&kernfs_open_file_mutex);
+			return -ENOMEM;
+		}
+		atomic_set(&new_on->event, 1);
+		init_waitqueue_head(&new_on->poll);
+		INIT_LIST_HEAD(&new_on->files);
+		list_add_tail(&of->list, &new_on->files);
+		rcu_assign_pointer(kn->attr.open, new_on);
 	}
+	mutex_unlock(&kernfs_open_file_mutex);
 
-	/* not there, initialize a new one and retry */
-	new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
-	if (!new_on)
-		return -ENOMEM;
-
-	atomic_set(&new_on->event, 1);
-	init_waitqueue_head(&new_on->poll);
-	INIT_LIST_HEAD(&new_on->files);
-	goto retry;
+	return 0;
 }
 
 /**
@@ -567,24 +619,25 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 static void kernfs_unlink_open_file(struct kernfs_node *kn,
 				 struct kernfs_open_file *of)
 {
-	struct kernfs_open_node *on = kn->attr.open;
-	unsigned long flags;
+	struct kernfs_open_node *on;
 
 	mutex_lock(&kernfs_open_file_mutex);
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
+
+	on = kernfs_deref_open_node_protected(kn);
+	if (!on) {
+		mutex_unlock(&kernfs_open_file_mutex);
+		return;
+	}
 
 	if (of)
 		list_del(&of->list);
 
-	if (list_empty(&on->files))
-		kn->attr.open = NULL;
-	else
-		on = NULL;
+	if (list_empty(&on->files)) {
+		rcu_assign_pointer(kn->attr.open, NULL);
+		kfree_rcu(on, rcu_head);
+	}
 
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
 	mutex_unlock(&kernfs_open_file_mutex);
-
-	kfree(on);
 }
 
 static int kernfs_fop_open(struct inode *inode, struct file *file)
@@ -774,17 +827,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 	 * check under kernfs_open_file_mutex will ensure bailing out if
 	 * ->attr.open became NULL while waiting for the mutex.
 	 */
-	if (!kn->attr.open)
+	if (!rcu_access_pointer(kn->attr.open))
 		return;
 
 	mutex_lock(&kernfs_open_file_mutex);
-	if (!kn->attr.open) {
+	on = kernfs_deref_open_node_protected(kn);
+	if (!on) {
 		mutex_unlock(&kernfs_open_file_mutex);
 		return;
 	}
 
-	on = kn->attr.open;
-
 	list_for_each_entry(of, &on->files, list) {
 		struct inode *inode = file_inode(of->file);
 
@@ -815,7 +867,10 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
 {
 	struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
-	struct kernfs_open_node *on = kn->attr.open;
+	struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
+
+	if (!on)
+		return EPOLLERR;
 
 	poll_wait(of->file, &on->poll, wait);
 
@@ -922,13 +977,13 @@ void kernfs_notify(struct kernfs_node *kn)
 		return;
 
 	/* kick poll immediately */
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
-	on = kn->attr.open;
+	rcu_read_lock();
+	on = rcu_dereference(kn->attr.open);
 	if (on) {
 		atomic_inc(&on->event);
 		wake_up_interruptible(&on->poll);
 	}
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+	rcu_read_unlock();
 
 	/* schedule work to kick fsnotify */
 	spin_lock_irqsave(&kernfs_notify_lock, flags);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index e2ae15a6225e8..13f54f078a52a 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -114,7 +114,7 @@ struct kernfs_elem_symlink {
 
 struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
-	struct kernfs_open_node	*open;
+	struct kernfs_open_node __rcu	*open;
 	loff_t			size;
 	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-06-15  2:10 [PATCH v7 0/4] kernfs: make ->attr.open RCU protected Imran Khan
  2022-06-15  2:10 ` [PATCH v7 1/4] " Imran Khan
@ 2022-06-15  2:10 ` Imran Khan
       [not found]   ` <CGME20220701112210eucas1p2d2db45881086f41b73527f7536537aa5@eucas1p2.samsung.com>
  2022-06-15  2:10 ` [PATCH v7 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
  2022-06-15  2:10 ` [PATCH v7 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
  3 siblings, 1 reply; 11+ messages in thread
From: Imran Khan @ 2022-06-15  2:10 UTC (permalink / raw)
  To: tj, gregkh, viro; +Cc: linux-kernel

At present kernfs_notify_list is implemented as a singly linked
list of kernfs_node(s), where the last element points to itself and
the value of ->attr.notify_next tells whether a node is present on the
list or not. Both addition to and deletion from the list happen under
kernfs_notify_lock.

Change kernfs_notify_list to llist so that addition to the list can
happen locklessly.
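
With this, the two ends of the notification path reduce to the following
(sketch mirroring the hunks below):

    /* kernfs_notify(): lockless producer */
    kernfs_get(kn);
    llist_add(&kn->attr.notify_next, &kernfs_notify_list);
    schedule_work(&kernfs_notify_work);

    /* kernfs_notify_workfn(): consumer pops one node at a time */
    free = llist_del_first(&kernfs_notify_list);
    if (free == NULL)
            return;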

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c       | 47 ++++++++++++++++++------------------------
 include/linux/kernfs.h |  2 +-
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 706aebcfb8f69..dce654ad2cea9 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -38,18 +38,16 @@ struct kernfs_open_node {
 	struct list_head	files; /* goes through kernfs_open_file.list */
 };
 
-/*
- * kernfs_notify() may be called from any context and bounces notifications
- * through a work item.  To minimize space overhead in kernfs_node, the
- * pending queue is implemented as a singly linked list of kernfs_nodes.
- * The list is terminated with the self pointer so that whether a
- * kernfs_node is on the list or not can be determined by testing the next
- * pointer for NULL.
+/**
+ * attribute_to_node - get kernfs_node object corresponding to a kernfs attribute
+ * @ptr:	&struct kernfs_elem_attr
+ * @type:	struct kernfs_node
+ * @member:	name of member (i.e attr)
  */
-#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
+#define attribute_to_node(ptr, type, member)	\
+	container_of(ptr, type, member)
 
-static DEFINE_SPINLOCK(kernfs_notify_lock);
-static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
+static LLIST_HEAD(kernfs_notify_list);
 
 /**
  * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
@@ -903,18 +901,16 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	struct kernfs_node *kn;
 	struct kernfs_super_info *info;
 	struct kernfs_root *root;
+	struct llist_node *free;
+	struct kernfs_elem_attr *attr;
 repeat:
 	/* pop one off the notify_list */
-	spin_lock_irq(&kernfs_notify_lock);
-	kn = kernfs_notify_list;
-	if (kn == KERNFS_NOTIFY_EOL) {
-		spin_unlock_irq(&kernfs_notify_lock);
+	free = llist_del_first(&kernfs_notify_list);
+	if (free == NULL)
 		return;
-	}
-	kernfs_notify_list = kn->attr.notify_next;
-	kn->attr.notify_next = NULL;
-	spin_unlock_irq(&kernfs_notify_lock);
 
+	attr = llist_entry(free, struct kernfs_elem_attr, notify_next);
+	kn = attribute_to_node(attr, struct kernfs_node, attr);
 	root = kernfs_root(kn);
 	/* kick fsnotify */
 	down_write(&root->kernfs_rwsem);
@@ -970,12 +966,14 @@ static void kernfs_notify_workfn(struct work_struct *work)
 void kernfs_notify(struct kernfs_node *kn)
 {
 	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
-	unsigned long flags;
 	struct kernfs_open_node *on;
 
 	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
 		return;
 
+	/* Because we are using llist for kernfs_notify_list */
+	WARN_ON_ONCE(in_nmi());
+
 	/* kick poll immediately */
 	rcu_read_lock();
 	on = rcu_dereference(kn->attr.open);
@@ -986,14 +984,9 @@ void kernfs_notify(struct kernfs_node *kn)
 	rcu_read_unlock();
 
 	/* schedule work to kick fsnotify */
-	spin_lock_irqsave(&kernfs_notify_lock, flags);
-	if (!kn->attr.notify_next) {
-		kernfs_get(kn);
-		kn->attr.notify_next = kernfs_notify_list;
-		kernfs_notify_list = kn;
-		schedule_work(&kernfs_notify_work);
-	}
-	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
+	kernfs_get(kn);
+	llist_add(&kn->attr.notify_next, &kernfs_notify_list);
+	schedule_work(&kernfs_notify_work);
 }
 EXPORT_SYMBOL_GPL(kernfs_notify);
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 13f54f078a52a..2dd9c8df0f4f6 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -116,7 +116,7 @@ struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
 	struct kernfs_open_node __rcu	*open;
 	loff_t			size;
-	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
+	struct llist_node	notify_next;	/* for kernfs_notify() */
 };
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v7 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex.
  2022-06-15  2:10 [PATCH v7 0/4] kernfs: make ->attr.open RCU protected Imran Khan
  2022-06-15  2:10 ` [PATCH v7 1/4] " Imran Khan
  2022-06-15  2:10 ` [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist Imran Khan
@ 2022-06-15  2:10 ` Imran Khan
  2022-06-15  2:10 ` [PATCH v7 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
  3 siblings, 0 replies; 11+ messages in thread
From: Imran Khan @ 2022-06-15  2:10 UTC (permalink / raw)
  To: tj, gregkh, viro; +Cc: linux-kernel

This allows changing the underlying mutex locking without needing to change
the users of the lock. For example, the next patch modifies this interface to
use hashed mutexes in place of the single global kernfs_open_file_mutex.
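
Callers then take and release the lock through the new helpers instead of
naming the mutex directly, roughly as in this sketch:

    struct mutex *mutex;

    mutex = kernfs_open_file_mutex_lock(kn);  /* currently the one global mutex */
    /* ... manipulate kn->attr.open and its ->files list ... */
    mutex_unlock(mutex);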

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c | 56 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index dce654ad2cea9..fa0154211076d 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -49,6 +49,22 @@ struct kernfs_open_node {
 
 static LLIST_HEAD(kernfs_notify_list);
 
+static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
+{
+	return &kernfs_open_file_mutex;
+}
+
+static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
+{
+	struct mutex *lock;
+
+	lock = kernfs_open_file_mutex_ptr(kn);
+
+	mutex_lock(lock);
+
+	return lock;
+}
+
 /**
  * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
  *
@@ -79,9 +95,9 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
  * @kn: target kernfs_node.
  *
  * Fetch and return ->attr.open of @kn when caller holds the
- * kernfs_open_file_mutex.
+ * kernfs_open_file_mutex_ptr(kn).
  *
- * Update of ->attr.open happens under kernfs_open_file_mutex. So when
+ * Update of ->attr.open happens under kernfs_open_file_mutex_ptr(kn). So when
  * the caller guarantees that this mutex is being held, other updaters can't
  * change ->attr.open and this means that we can safely deref ->attr.open
  * outside RCU read-side critical section.
@@ -92,7 +108,7 @@ static struct kernfs_open_node *
 kernfs_deref_open_node_protected(struct kernfs_node *kn)
 {
 	return rcu_dereference_protected(kn->attr.open,
-				lockdep_is_held(&kernfs_open_file_mutex));
+				lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
 }
 
 static struct kernfs_open_file *kernfs_of(struct file *file)
@@ -575,19 +591,20 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 				struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on, *new_on = NULL;
+	struct mutex *mutex = NULL;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_protected(kn);
 
 	if (on) {
 		list_add_tail(&of->list, &on->files);
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return 0;
 	} else {
 		/* not there, initialize a new one */
 		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
 		if (!new_on) {
-			mutex_unlock(&kernfs_open_file_mutex);
+			mutex_unlock(mutex);
 			return -ENOMEM;
 		}
 		atomic_set(&new_on->event, 1);
@@ -596,7 +613,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 		list_add_tail(&of->list, &new_on->files);
 		rcu_assign_pointer(kn->attr.open, new_on);
 	}
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 
 	return 0;
 }
@@ -618,12 +635,13 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 				 struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on;
+	struct mutex *mutex = NULL;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 
 	on = kernfs_deref_open_node_protected(kn);
 	if (!on) {
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return;
 	}
 
@@ -635,7 +653,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 		kfree_rcu(on, rcu_head);
 	}
 
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 }
 
 static int kernfs_fop_open(struct inode *inode, struct file *file)
@@ -773,11 +791,11 @@ static void kernfs_release_file(struct kernfs_node *kn,
 	/*
 	 * @of is guaranteed to have no other file operations in flight and
 	 * we just want to synchronize release and drain paths.
-	 * @kernfs_open_file_mutex is enough.  @of->mutex can't be used
+	 * @kernfs_open_file_mutex_ptr(kn) is enough. @of->mutex can't be used
 	 * here because drain path may be called from places which can
 	 * cause circular dependency.
 	 */
-	lockdep_assert_held(&kernfs_open_file_mutex);
+	lockdep_assert_held(kernfs_open_file_mutex_ptr(kn));
 
 	if (!of->released) {
 		/*
@@ -794,11 +812,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 {
 	struct kernfs_node *kn = inode->i_private;
 	struct kernfs_open_file *of = kernfs_of(filp);
+	struct mutex *mutex = NULL;
 
 	if (kn->flags & KERNFS_HAS_RELEASE) {
-		mutex_lock(&kernfs_open_file_mutex);
+		mutex = kernfs_open_file_mutex_lock(kn);
 		kernfs_release_file(kn, of);
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 	}
 
 	kernfs_unlink_open_file(kn, of);
@@ -813,6 +832,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
 	struct kernfs_open_file *of;
+	struct mutex *mutex = NULL;
 
 	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
 		return;
@@ -822,16 +842,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 	 * ->attr.open at this point of time. This check allows early bail out
 	 * if ->attr.open is already NULL. kernfs_unlink_open_file makes
 	 * ->attr.open NULL only while holding kernfs_open_file_mutex so below
-	 * check under kernfs_open_file_mutex will ensure bailing out if
+	 * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if
 	 * ->attr.open became NULL while waiting for the mutex.
 	 */
 	if (!rcu_access_pointer(kn->attr.open))
 		return;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_protected(kn);
 	if (!on) {
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return;
 	}
 
@@ -845,7 +865,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 			kernfs_release_file(kn, of);
 	}
 
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v7 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.
  2022-06-15  2:10 [PATCH v7 0/4] kernfs: make ->attr.open RCU protected Imran Khan
                   ` (2 preceding siblings ...)
  2022-06-15  2:10 ` [PATCH v7 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
@ 2022-06-15  2:10 ` Imran Khan
  3 siblings, 0 replies; 11+ messages in thread
From: Imran Khan @ 2022-06-15  2:10 UTC (permalink / raw)
  To: tj, gregkh, viro; +Cc: linux-kernel

In the current kernfs design a single mutex, kernfs_open_file_mutex, protects
the list of kernfs_open_file instances corresponding to a sysfs attribute.
So even if different tasks are opening or closing different sysfs files,
they can contend on the osq_lock of this mutex. The contention is more
apparent in large scale systems with a few hundred CPUs, where most of the
CPUs have running tasks that are opening, accessing or closing sysfs files
at any point of time.

Using hashed mutexes in place of a single global mutex can significantly
reduce contention around the global mutex and hence provide better
scalability. Moreover, as these hashed mutexes are not part of kernfs_node
objects, we will not see any significant change in memory utilization of
kernfs-based file systems like sysfs, cgroupfs etc.

Modify the interface introduced in the previous patch to make use of hashed
mutexes. Use the kernfs_node address as the hashing key.
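
Lock selection then becomes a pointer hash into a global array of mutexes,
roughly as in this sketch (the array is allocated once in kernfs_init(); see
the diff below):

    /* one mutex per hash bucket */
    struct kernfs_global_locks {
            struct mutex open_file_mutex[NR_KERNFS_LOCKS];
    };

    /* the kernfs_node address picks the bucket */
    int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
    struct mutex *lock = &kernfs_locks->open_file_mutex[idx];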

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c            | 17 ++---------
 fs/kernfs/kernfs-internal.h |  4 +++
 fs/kernfs/mount.c           | 19 +++++++++++++
 include/linux/kernfs.h      | 57 +++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index fa0154211076d..06fc4be4032b2 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -18,19 +18,6 @@
 
 #include "kernfs-internal.h"
 
-/*
- * There's one kernfs_open_file for each open file and one kernfs_open_node
- * for each kernfs_node with one or more open files.
- *
- * kernfs_node->attr.open points to kernfs_open_node.  attr.open is
- * RCU protected.
- *
- * filp->private_data points to seq_file whose ->private points to
- * kernfs_open_file.  kernfs_open_files are chained at
- * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
- */
-static DEFINE_MUTEX(kernfs_open_file_mutex);
-
 struct kernfs_open_node {
 	struct rcu_head		rcu_head;
 	atomic_t		event;
@@ -51,7 +38,9 @@ static LLIST_HEAD(kernfs_notify_list);
 
 static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
 {
-	return &kernfs_open_file_mutex;
+	int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
+
+	return &kernfs_locks->open_file_mutex[idx];
 }
 
 static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index eeaa779b929c7..3ae214d02d441 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -164,4 +164,8 @@ void kernfs_drain_open_files(struct kernfs_node *kn);
  */
 extern const struct inode_operations kernfs_symlink_iops;
 
+/*
+ * kernfs locks
+ */
+extern struct kernfs_global_locks *kernfs_locks;
 #endif	/* __KERNFS_INTERNAL_H */
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cfa79715fc1a7..d0859f72d2d64 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -20,6 +20,7 @@
 #include "kernfs-internal.h"
 
 struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
+struct kernfs_global_locks *kernfs_locks;
 
 static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
 {
@@ -387,6 +388,22 @@ void kernfs_kill_sb(struct super_block *sb)
 	kfree(info);
 }
 
+static void __init kernfs_mutex_init(void)
+{
+	int count;
+
+	for (count = 0; count < NR_KERNFS_LOCKS; count++)
+		mutex_init(&kernfs_locks->open_file_mutex[count]);
+}
+
+static void __init kernfs_lock_init(void)
+{
+	kernfs_locks = kmalloc(sizeof(struct kernfs_global_locks), GFP_KERNEL);
+	WARN_ON(!kernfs_locks);
+
+	kernfs_mutex_init();
+}
+
 void __init kernfs_init(void)
 {
 	kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
@@ -397,4 +414,6 @@ void __init kernfs_init(void)
 	kernfs_iattrs_cache  = kmem_cache_create("kernfs_iattrs_cache",
 					      sizeof(struct kernfs_iattrs),
 					      0, SLAB_PANIC, NULL);
+
+	kernfs_lock_init();
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 2dd9c8df0f4f6..13e703f615f79 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -18,6 +18,7 @@
 #include <linux/uidgid.h>
 #include <linux/wait.h>
 #include <linux/rwsem.h>
+#include <linux/cache.h>
 
 struct file;
 struct dentry;
@@ -34,6 +35,62 @@ struct kernfs_fs_context;
 struct kernfs_open_node;
 struct kernfs_iattrs;
 
+/*
+ * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash
+ * table of locks.
+ * Having a small hash table would impact scalability, since
+ * more and more kernfs_node objects will end up using same lock
+ * and having a very large hash table would waste memory.
+ *
+ * At the moment size of hash table of locks is being set based on
+ * the number of CPUs as follows:
+ *
+ * NR_CPU      NR_KERNFS_LOCK_BITS      NR_KERNFS_LOCKS
+ *   1                  1                       2
+ *  2-3                 2                       4
+ *  4-7                 4                       16
+ *  8-15                6                       64
+ *  16-31               8                       256
+ *  32 and more         10                      1024
+ *
+ * The above relation between NR_CPU and number of locks is based
+ * on some internal experimentation which involved booting qemu
+ * with different values of smp, performing some sysfs operations
+ * on all CPUs and observing how increase in number of locks impacts
+ * completion time of these sysfs operations on each CPU.
+ */
+#ifdef CONFIG_SMP
+#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32)))
+#else
+#define NR_KERNFS_LOCK_BITS     1
+#endif
+
+#define NR_KERNFS_LOCKS     (1 << NR_KERNFS_LOCK_BITS)
+
+/*
+ * There's one kernfs_open_file for each open file and one kernfs_open_node
+ * for each kernfs_node with one or more open files.
+ *
+ * filp->private_data points to seq_file whose ->private points to
+ * kernfs_open_file.
+ *
+ * kernfs_open_files are chained at kernfs_open_node->files, which is
+ * protected by kernfs_global_locks.open_file_mutex[i].
+ *
+ * To reduce possible contention in sysfs access, arising due to single
+ * locks, use an array of locks (e.g. open_file_mutex) and use kernfs_node
+ * object address as hash keys to get the index of these locks.
+ *
+ * Hashed mutexes are safe to use here because operations using these don't
+ * rely on global exclusion.
+ *
+ * In future we intend to replace other global locks with hashed ones as well.
+ * kernfs_global_locks acts as a holder for all such hash tables.
+ */
+struct kernfs_global_locks {
+	struct mutex open_file_mutex[NR_KERNFS_LOCKS];
+};
+
 enum kernfs_node_type {
 	KERNFS_DIR		= 0x0001,
 	KERNFS_FILE		= 0x0002,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-15  2:10 ` [PATCH v7 1/4] " Imran Khan
@ 2022-06-15  6:12   ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2022-06-15  6:12 UTC (permalink / raw)
  To: Imran Khan; +Cc: gregkh, viro, linux-kernel

On Wed, Jun 15, 2022 at 12:10:56PM +1000, Imran Khan wrote:
> After removal of kernfs_open_node->refcnt in the previous patch,
> kernfs_open_node_lock can be removed as well by making ->attr.open
> RCU protected. kernfs_put_open_node can delegate freeing to ->attr.open
> to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock.
> 
> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
       [not found]   ` <CGME20220701112210eucas1p2d2db45881086f41b73527f7536537aa5@eucas1p2.samsung.com>
@ 2022-07-01 11:22     ` Marek Szyprowski
  2022-07-01 12:20       ` Imran Khan
  0 siblings, 1 reply; 11+ messages in thread
From: Marek Szyprowski @ 2022-07-01 11:22 UTC (permalink / raw)
  To: Imran Khan, tj, gregkh, viro; +Cc: linux-kernel

Hi All,

On 15.06.2022 04:10, Imran Khan wrote:
> At present kernfs_notify_list is implemented as a singly linked
> list of kernfs_node(s), where last element points to itself and
> value of ->attr.next tells if node is present on the list or not.
> Both addition and deletion to list happen under kernfs_notify_lock.
>
> Change kernfs_notify_list to llist so that addition to list can heppen
> locklessly.
>
> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
> Acked-by: Tejun Heo <tj@kernel.org>

This patch landed in linux next-20220630 as commit b8f35fa1188b
("kernfs: Change kernfs_notify_list to llist."). Unfortunately, it
causes a serious regression on my test systems. It can easily be noticed
in the logs by the following warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 34 at fs/kernfs/dir.c:531 kernfs_put.part.0+0x1a4/0x1d8
kernfs_put: console/active: released with incorrect active_ref 0
Modules linked in:
CPU: 1 PID: 34 Comm: kworker/1:4 Not tainted 
5.19.0-rc4-05465-g5732b42edfd1 #12317
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events kernfs_notify_workfn
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x40/0x4c
  dump_stack_lvl from __warn+0xc8/0x13c
  __warn from warn_slowpath_fmt+0x90/0xb4
  warn_slowpath_fmt from kernfs_put.part.0+0x1a4/0x1d8
  kernfs_put.part.0 from kernfs_notify_workfn+0x1a0/0x1d0
  kernfs_notify_workfn from process_one_work+0x1ec/0x4cc
  process_one_work from worker_thread+0x58/0x54c
  worker_thread from kthread+0xd0/0xec
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf099dfb0 to 0xf099dff8)
...
---[ end trace 0000000000000000 ]---

then, after some standard kernel messages, booting stops with the 
following log:

rcu: INFO: rcu_sched self-detected stall on CPU
rcu:     1-...!: (2099 ticks this GP) idle=403c/1/0x40000002 
softirq=33/33 fqs=1
  (t=2100 jiffies g=-1135 q=4 ncpus=2)
rcu: rcu_sched kthread starved for 2098 jiffies! g-1135 f0x0 
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now 
expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_sched       state:R  running task     stack:    0 pid: 10 
ppid:     2 flags:0x00000000
  __schedule from schedule+0x48/0xb0
  schedule from schedule_timeout+0x84/0x138
  schedule_timeout from rcu_gp_fqs_loop+0x124/0x478
  rcu_gp_fqs_loop from rcu_gp_kthread+0x150/0x1c8
  rcu_gp_kthread from kthread+0xd0/0xec
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf088dfb0 to 0xf088dff8)
...
rcu: Stack dump where RCU GP kthread last ran:
NMI backtrace for cpu 1
CPU: 1 PID: 34 Comm: kworker/1:4 Tainted: G        W 
5.19.0-rc4-05465-g5732b42edfd1 #12317
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events kernfs_notify_workfn
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x40/0x4c
  dump_stack_lvl from nmi_cpu_backtrace+0xd0/0x104
  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xd8/0x120
  nmi_trigger_cpumask_backtrace from 
rcu_check_gp_kthread_starvation+0x138/0x154
  rcu_check_gp_kthread_starvation from rcu_sched_clock_irq+0x664/0x9f4
  rcu_sched_clock_irq from update_process_times+0x6c/0x94
  update_process_times from tick_sched_timer+0x60/0xe8
  tick_sched_timer from __hrtimer_run_queues+0x1b4/0x328
  __hrtimer_run_queues from hrtimer_interrupt+0x124/0x2b0
  hrtimer_interrupt from exynos4_mct_tick_isr+0x44/0x7c
  exynos4_mct_tick_isr from __handle_irq_event_percpu+0x7c/0x1cc
  __handle_irq_event_percpu from handle_irq_event+0x44/0x8c
  handle_irq_event from handle_fasteoi_irq+0x98/0x18c
  handle_fasteoi_irq from generic_handle_domain_irq+0x24/0x34
  generic_handle_domain_irq from gic_handle_irq+0x88/0xa8
  gic_handle_irq from generic_handle_arch_irq+0x34/0x44
  generic_handle_arch_irq from call_with_stack+0x18/0x20
  call_with_stack from __irq_svc+0x98/0xb0
Exception stack(0xf099de78 to 0xf099dec0)
...
  __irq_svc from up_write+0x4/0x3c
  up_write from 0xef7cbb00


It was not easy to find this, because bisecting points to commit
5732b42edfd1 ("Merge branch 'driver-core-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git").
This means that there are some non-trivial dependencies between the two
merged branches. I rebased 5732b42edfd1^2 onto 5732b42edfd1^1 and then
bisected the rebased commits, which finally pointed to this one. I then
confirmed that reverting it on top of 5732b42edfd1 and on next-20220630
really fixes the issue.

Let me know how I can help fix this issue. I've observed it on an ARM
32-bit kernel compiled from multi_v7_defconfig on the following boards:
Raspberry Pi 3b and 4b, as well as the Samsung Exynos based Odroid U3
and XU4. The issue doesn't happen on QEMU's ARM 32-bit 'virt' machine.


> ---
>   fs/kernfs/file.c       | 47 ++++++++++++++++++------------------------
>   include/linux/kernfs.h |  2 +-
>   2 files changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 706aebcfb8f69..dce654ad2cea9 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -38,18 +38,16 @@ struct kernfs_open_node {
>   	struct list_head	files; /* goes through kernfs_open_file.list */
>   };
>   
> -/*
> - * kernfs_notify() may be called from any context and bounces notifications
> - * through a work item.  To minimize space overhead in kernfs_node, the
> - * pending queue is implemented as a singly linked list of kernfs_nodes.
> - * The list is terminated with the self pointer so that whether a
> - * kernfs_node is on the list or not can be determined by testing the next
> - * pointer for NULL.
> +/**
> + * attribute_to_node - get kernfs_node object corresponding to a kernfs attribute
> + * @ptr:	&struct kernfs_elem_attr
> + * @type:	struct kernfs_node
> + * @member:	name of member (i.e attr)
>    */
> -#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
> +#define attribute_to_node(ptr, type, member)	\
> +	container_of(ptr, type, member)
>   
> -static DEFINE_SPINLOCK(kernfs_notify_lock);
> -static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> +static LLIST_HEAD(kernfs_notify_list);
>   
>   /**
>    * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
> @@ -903,18 +901,16 @@ static void kernfs_notify_workfn(struct work_struct *work)
>   	struct kernfs_node *kn;
>   	struct kernfs_super_info *info;
>   	struct kernfs_root *root;
> +	struct llist_node *free;
> +	struct kernfs_elem_attr *attr;
>   repeat:
>   	/* pop one off the notify_list */
> -	spin_lock_irq(&kernfs_notify_lock);
> -	kn = kernfs_notify_list;
> -	if (kn == KERNFS_NOTIFY_EOL) {
> -		spin_unlock_irq(&kernfs_notify_lock);
> +	free = llist_del_first(&kernfs_notify_list);
> +	if (free == NULL)
>   		return;
> -	}
> -	kernfs_notify_list = kn->attr.notify_next;
> -	kn->attr.notify_next = NULL;
> -	spin_unlock_irq(&kernfs_notify_lock);
>   
> +	attr = llist_entry(free, struct kernfs_elem_attr, notify_next);
> +	kn = attribute_to_node(attr, struct kernfs_node, attr);
>   	root = kernfs_root(kn);
>   	/* kick fsnotify */
>   	down_write(&root->kernfs_rwsem);
> @@ -970,12 +966,14 @@ static void kernfs_notify_workfn(struct work_struct *work)
>   void kernfs_notify(struct kernfs_node *kn)
>   {
>   	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
> -	unsigned long flags;
>   	struct kernfs_open_node *on;
>   
>   	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
>   		return;
>   
> +	/* Because we are using llist for kernfs_notify_list */
> +	WARN_ON_ONCE(in_nmi());
> +
>   	/* kick poll immediately */
>   	rcu_read_lock();
>   	on = rcu_dereference(kn->attr.open);
> @@ -986,14 +984,9 @@ void kernfs_notify(struct kernfs_node *kn)
>   	rcu_read_unlock();
>   
>   	/* schedule work to kick fsnotify */
> -	spin_lock_irqsave(&kernfs_notify_lock, flags);
> -	if (!kn->attr.notify_next) {
> -		kernfs_get(kn);
> -		kn->attr.notify_next = kernfs_notify_list;
> -		kernfs_notify_list = kn;
> -		schedule_work(&kernfs_notify_work);
> -	}
> -	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
> +	kernfs_get(kn);
> +	llist_add(&kn->attr.notify_next, &kernfs_notify_list);
> +	schedule_work(&kernfs_notify_work);
>   }
>   EXPORT_SYMBOL_GPL(kernfs_notify);
>   
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 13f54f078a52a..2dd9c8df0f4f6 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -116,7 +116,7 @@ struct kernfs_elem_attr {
>   	const struct kernfs_ops	*ops;
>   	struct kernfs_open_node __rcu	*open;
>   	loff_t			size;
> -	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
> +	struct llist_node	notify_next;	/* for kernfs_notify() */
>   };
>   
>   /*

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-07-01 11:22     ` Marek Szyprowski
@ 2022-07-01 12:20       ` Imran Khan
  2022-07-01 12:49         ` Marek Szyprowski
  0 siblings, 1 reply; 11+ messages in thread
From: Imran Khan @ 2022-07-01 12:20 UTC (permalink / raw)
  To: Marek Szyprowski, tj, gregkh, viro; +Cc: linux-kernel

Hello Marek,


On 1/7/22 9:22 pm, Marek Szyprowski wrote:
> Hi All,
> 
> On 15.06.2022 04:10, Imran Khan wrote:
>> At present kernfs_notify_list is implemented as a singly linked
>> list of kernfs_node(s), where last element points to itself and
>> value of ->attr.next tells if node is present on the list or not.
>> Both addition and deletion to list happen under kernfs_notify_lock.
>>
>> Change kernfs_notify_list to llist so that addition to list can heppen
>> locklessly.
>>
>> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
>> Acked-by: Tejun Heo <tj@kernel.org>
> 
> This patch landed in linux next-20220630 as commit b8f35fa1188b 
> ("kernfs: Change kernfs_notify_list to llist."). Unfortunately, it 
> causes serious regression on my test systems. It can be easily noticed 
> in the logs by the following warning:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 34 at fs/kernfs/dir.c:531 kernfs_put.part.0+0x1a4/0x1d8
> kernfs_put: console/active: released with incorrect active_ref 0
> Modules linked in:
> CPU: 1 PID: 34 Comm: kworker/1:4 Not tainted 
> 5.19.0-rc4-05465-g5732b42edfd1 #12317
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Workqueue: events kernfs_notify_workfn
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x40/0x4c
>   dump_stack_lvl from __warn+0xc8/0x13c
>   __warn from warn_slowpath_fmt+0x90/0xb4
>   warn_slowpath_fmt from kernfs_put.part.0+0x1a4/0x1d8
>   kernfs_put.part.0 from kernfs_notify_workfn+0x1a0/0x1d0
>   kernfs_notify_workfn from process_one_work+0x1ec/0x4cc
>   process_one_work from worker_thread+0x58/0x54c
>   worker_thread from kthread+0xd0/0xec
>   kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf099dfb0 to 0xf099dff8)
> ...
> ---[ end trace 0000000000000000 ]---
> 

Thanks for reporting this issue. It has been reported earlier in [1] as well. I
am unable to reproduce it locally. Could you please test with the following
patch on top of linux next-20220630 and let me know if it helps:


From 6bf7f1adc4b091dc6d6c60e0dd0f16247f61f374 Mon Sep 17 00:00:00 2001
From: Imran Khan <imran.f.khan@oracle.com>
Date: Fri, 1 Jul 2022 14:27:52 +1000
Subject: [PATCH] kernfs: Avoid re-adding kernfs_node into kernfs_notify_list.

---
 fs/kernfs/file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index bb933221b4bae..e8ec054e11c63 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -917,6 +917,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
        if (free == NULL)
                return;

+       free->next = NULL;
        attr = llist_entry(free, struct kernfs_elem_attr, notify_next);
        kn = attribute_to_node(attr, struct kernfs_node, attr);
        root = kernfs_root(kn);
@@ -992,9 +993,11 @@ void kernfs_notify(struct kernfs_node *kn)
        rcu_read_unlock();

        /* schedule work to kick fsnotify */
-       kernfs_get(kn);
-       llist_add(&kn->attr.notify_next, &kernfs_notify_list);
-       schedule_work(&kernfs_notify_work);
+       if (kn->attr.notify_next.next != NULL) {
+               kernfs_get(kn);
+               llist_add(&kn->attr.notify_next, &kernfs_notify_list);
+               schedule_work(&kernfs_notify_work);
+       }
 }
 EXPORT_SYMBOL_GPL(kernfs_notify);


base-commit: 6cc11d2a1759275b856e464265823d94aabd5eaf
-- 
2.30.2

Thanks
-- Imran

[1]: https://lore.kernel.org/lkml/0b3c1362-5f0d-44d5-d3e7-c01de59a4e86@oracle.com/

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-07-01 12:20       ` Imran Khan
@ 2022-07-01 12:49         ` Marek Szyprowski
  2022-07-01 15:06           ` Imran Khan
  2022-07-05 15:18           ` Geert Uytterhoeven
  0 siblings, 2 replies; 11+ messages in thread
From: Marek Szyprowski @ 2022-07-01 12:49 UTC (permalink / raw)
  To: Imran Khan, tj, gregkh, viro; +Cc: linux-kernel

Hi,

On 01.07.2022 14:20, Imran Khan wrote:
> On 1/7/22 9:22 pm, Marek Szyprowski wrote:
>> On 15.06.2022 04:10, Imran Khan wrote:
>>> At present kernfs_notify_list is implemented as a singly linked
>>> list of kernfs_node(s), where last element points to itself and
>>> value of ->attr.next tells if node is present on the list or not.
>>> Both addition and deletion to list happen under kernfs_notify_lock.
>>>
>>> Change kernfs_notify_list to llist so that addition to list can heppen
>>> locklessly.
>>>
>>> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
>>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
>>> Acked-by: Tejun Heo <tj@kernel.org>
>> This patch landed in linux next-20220630 as commit b8f35fa1188b
>> ("kernfs: Change kernfs_notify_list to llist."). Unfortunately, it
>> causes serious regression on my test systems. It can be easily noticed
>> in the logs by the following warning:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 34 at fs/kernfs/dir.c:531 kernfs_put.part.0+0x1a4/0x1d8
>> kernfs_put: console/active: released with incorrect active_ref 0
>> Modules linked in:
>> CPU: 1 PID: 34 Comm: kworker/1:4 Not tainted
>> 5.19.0-rc4-05465-g5732b42edfd1 #12317
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>> Workqueue: events kernfs_notify_workfn
>>    unwind_backtrace from show_stack+0x10/0x14
>>    show_stack from dump_stack_lvl+0x40/0x4c
>>    dump_stack_lvl from __warn+0xc8/0x13c
>>    __warn from warn_slowpath_fmt+0x90/0xb4
>>    warn_slowpath_fmt from kernfs_put.part.0+0x1a4/0x1d8
>>    kernfs_put.part.0 from kernfs_notify_workfn+0x1a0/0x1d0
>>    kernfs_notify_workfn from process_one_work+0x1ec/0x4cc
>>    process_one_work from worker_thread+0x58/0x54c
>>    worker_thread from kthread+0xd0/0xec
>>    kthread from ret_from_fork+0x14/0x2c
>> Exception stack(0xf099dfb0 to 0xf099dff8)
>> ...
>> ---[ end trace 0000000000000000 ]---
>>
> Thanks for reporting this issue. It has been reported earlier in [1] as well. I
> am unable to reproduce it locally. Could you please test with following patch on
> top of linux next-20220630 and let me know if it helps:

Yes, this fixes the issue. Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Maybe it is related to the fact that I have earlycon enabled on those
machines?

> >From 6bf7f1adc4b091dc6d6c60e0dd0f16247f61f374 Mon Sep 17 00:00:00 2001
> From: Imran Khan <imran.f.khan@oracle.com>
> Date: Fri, 1 Jul 2022 14:27:52 +1000
> Subject: [PATCH] kernfs: Avoid re-adding kernfs_node into kernfs_notify_list.
>
> ---
>   fs/kernfs/file.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index bb933221b4bae..e8ec054e11c63 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -917,6 +917,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
>          if (free == NULL)
>                  return;
>
> +       free->next = NULL;
>          attr = llist_entry(free, struct kernfs_elem_attr, notify_next);
>          kn = attribute_to_node(attr, struct kernfs_node, attr);
>          root = kernfs_root(kn);
> @@ -992,9 +993,11 @@ void kernfs_notify(struct kernfs_node *kn)
>          rcu_read_unlock();
>
>          /* schedule work to kick fsnotify */
> -       kernfs_get(kn);
> -       llist_add(&kn->attr.notify_next, &kernfs_notify_list);
> -       schedule_work(&kernfs_notify_work);
> +       if (kn->attr.notify_next.next != NULL) {
> +               kernfs_get(kn);
> +               llist_add(&kn->attr.notify_next, &kernfs_notify_list);
> +               schedule_work(&kernfs_notify_work);
> +       }
>   }
>   EXPORT_SYMBOL_GPL(kernfs_notify);
>
>
> base-commit: 6cc11d2a1759275b856e464265823d94aabd5eaf

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-07-01 12:49         ` Marek Szyprowski
@ 2022-07-01 15:06           ` Imran Khan
  2022-07-05 15:18           ` Geert Uytterhoeven
  1 sibling, 0 replies; 11+ messages in thread
From: Imran Khan @ 2022-07-01 15:06 UTC (permalink / raw)
  To: Marek Szyprowski, tj, gregkh, viro; +Cc: linux-kernel

Hello,

On 1/7/22 10:49 pm, Marek Szyprowski wrote:
> Hi,
> 
> On 01.07.2022 14:20, Imran Khan wrote:
>> On 1/7/22 9:22 pm, Marek Szyprowski wrote:
>>> On 15.06.2022 04:10, Imran Khan wrote:
>>>> At present kernfs_notify_list is implemented as a singly linked
>>>> list of kernfs_node(s), where last element points to itself and
>>>> value of ->attr.next tells if node is present on the list or not.
>>>> Both addition and deletion to list happen under kernfs_notify_lock.
>>>>
>>>> Change kernfs_notify_list to llist so that addition to list can heppen
>>>> locklessly.
>>>>
>>>> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
>>>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
>>>> Acked-by: Tejun Heo <tj@kernel.org>
>>> This patch landed in linux next-20220630 as commit b8f35fa1188b
>>> ("kernfs: Change kernfs_notify_list to llist."). Unfortunately, it
>>> causes serious regression on my test systems. It can be easily noticed
>>> in the logs by the following warning:
>>>
[...]
> 
> Yes, this fixes the issue. Feel free to add:
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 

Thanks a lot for testing. Sure, I have added these tags. I have
sent the patch for review at [1].

> Maybe it is related to the fact, that I have earlycon enabled on those 
> machines?
> 
It is definitely occurring with some tweaking of console settings; so far
both reported occurrences have that in common. I will be able to confirm
further once I can reproduce this locally, and I am trying that at the
moment. I will share when I have some more findings.


Thanks
-- Imran

[1]: https://lore.kernel.org/lkml/20220701145047.2206900-1-imran.f.khan@oracle.com/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-07-01 12:49         ` Marek Szyprowski
  2022-07-01 15:06           ` Imran Khan
@ 2022-07-05 15:18           ` Geert Uytterhoeven
  1 sibling, 0 replies; 11+ messages in thread
From: Geert Uytterhoeven @ 2022-07-05 15:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Imran Khan, Tejun Heo, Greg KH, Al Viro, Linux Kernel Mailing List

Hi Marek,

On Fri, Jul 1, 2022 at 2:51 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> On 01.07.2022 14:20, Imran Khan wrote:
> > On 1/7/22 9:22 pm, Marek Szyprowski wrote:
> >> On 15.06.2022 04:10, Imran Khan wrote:
> >>> At present kernfs_notify_list is implemented as a singly linked
> >>> list of kernfs_node(s), where last element points to itself and
> >>> value of ->attr.next tells if node is present on the list or not.
> >>> Both addition and deletion to list happen under kernfs_notify_lock.
> >>>
> >>> Change kernfs_notify_list to llist so that addition to list can heppen
> >>> locklessly.
> >>>
> >>> Suggested by: Al Viro <viro@zeniv.linux.org.uk>
> >>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
> >>> Acked-by: Tejun Heo <tj@kernel.org>
> >> This patch landed in linux next-20220630 as commit b8f35fa1188b
> >> ("kernfs: Change kernfs_notify_list to llist."). Unfortunately, it
> >> causes serious regression on my test systems. It can be easily noticed
> >> in the logs by the following warning:
> >>
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 1 PID: 34 at fs/kernfs/dir.c:531 kernfs_put.part.0+0x1a4/0x1d8
> >> kernfs_put: console/active: released with incorrect active_ref 0
> >> Modules linked in:
> >> CPU: 1 PID: 34 Comm: kworker/1:4 Not tainted
> >> 5.19.0-rc4-05465-g5732b42edfd1 #12317
> >> Hardware name: Samsung Exynos (Flattened Device Tree)
> >> Workqueue: events kernfs_notify_workfn
> >>    unwind_backtrace from show_stack+0x10/0x14
> >>    show_stack from dump_stack_lvl+0x40/0x4c
> >>    dump_stack_lvl from __warn+0xc8/0x13c
> >>    __warn from warn_slowpath_fmt+0x90/0xb4
> >>    warn_slowpath_fmt from kernfs_put.part.0+0x1a4/0x1d8
> >>    kernfs_put.part.0 from kernfs_notify_workfn+0x1a0/0x1d0
> >>    kernfs_notify_workfn from process_one_work+0x1ec/0x4cc
> >>    process_one_work from worker_thread+0x58/0x54c
> >>    worker_thread from kthread+0xd0/0xec
> >>    kthread from ret_from_fork+0x14/0x2c
> >> Exception stack(0xf099dfb0 to 0xf099dff8)
> >> ...
> >> ---[ end trace 0000000000000000 ]---
> >>
> > Thanks for reporting this issue. It has been reported earlier in [1] as well. I
> > am unable to reproduce it locally. Could you please test with following patch on
> > top of linux next-20220630 and let me know if it helps:
>
> Yes, this fixes the issue. Feel free to add:
>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> Maybe it is related to the fact, that I have earlycon enabled on those
> machines?

Probably.
I see the issue on
  - rbtx4927 (CONFIG_EARLY_PRINTK=y),
  - SiPEED MAiXBiT (chosen/bootargs = "earlycon console=ttySIF0",
     chosen/stdout-path = "serial0:115200n8")
  - Litex/VexRiscV (chosen/bootargs = "console=liteuart earlycon=sbi").

It doesn't happen on the boards that just provide chosen/stdout-path
in DT.

Reverting commit b8f35fa1188b8403 ("kernfs: Change kernfs_notify_list
to llist.") fixes the issue.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 11+ messages in thread

