* [PATCH RESEND v4 0/4] kernfs: make ->attr.open RCU protected.
@ 2022-06-02  6:39 Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 1/4] " Imran Khan
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Imran Khan @ 2022-06-02  6:39 UTC (permalink / raw)
  To: tj, viro, gregkh; +Cc: matthew.wilcox, konrad.wilk, linux-kernel

I have not received any feedback on v4 of this patchset, so I am RESENDing it
after a rebase on tag next-20220601.

The patches in this version of the patch set are as follows:

PATCH-1: Make kernfs_open_node->attr.open RCU protected.

PATCH-2: Change kernfs_notify_list to llist.

PATCH-3: Introduce interface to access kernfs_open_file_mutex.

PATCH-4: Replace global kernfs_open_file_mutex with hashed mutexes.

Changes since v3:
 - Rebase on tag next-20220601
 - Rename RCU access related helpers and update their description
 - Address errors reported by kernel test robot
 - Include Acked-by tags from Tejun for the acked patches (PATCH-2,3,4)

Changes since v2:
 - Rebase on tag next-20220510
 - Remove PATCH-1 of v2 because it is present in tag next-20220510
 - Include Acked-by tags from Tejun for the acked patches (PATCH-2 and PATCH-3)


Cover letter for v2:
--------------------------------------------------------------------------

I have not yet received any feedback on v1 of this patchset [2], but in the
meantime an old version of the first patch from [3] has been integrated into
linux-next. Functionally, the first patch in [2] and [3] is identical; [2]
merely renames one of the functions to better reflect the fact that we are no
longer using reference counting for kernfs_open_node.

In this version, I have only modified the first patch of v1 so that we use the
renamed function as done in [2] and avoid the parts that are already present
in linux-next. The remaining 4 patches (PATCH-2 to PATCH-5) are identical in
v1 and v2, although v2 has been rebased on tag next-20220503.

Changes since v1:
 - Rebase on tag next-20220503

[2]: https://lore.kernel.org/lkml/20220428055431.3826852-1-imran.f.khan@oracle.com/
[3]: https://lore.kernel.org/lkml/20220324103040.584491-1-imran.f.khan@oracle.com/

Original cover letter
-------------------------------------------------------

This patchset contains a subset of the patches (after addressing review
comments) discussed at [1]. Since [1] replaces multiple global locks and
each of these locks can be removed independently, it was decided that we
should make these changes in parts, i.e. first get one set of optimizations
integrated and then do further work on top of that.

The patches in this change set introduce the following changes:

PATCH-1: Remove reference counting for kernfs_open_node.

PATCH-2: Make kernfs_open_node->attr.open RCU protected.

PATCH-3: Change kernfs_notify_list to llist.

PATCH-4: Introduce interface to access kernfs_open_file_mutex.

PATCH-5: Replace global kernfs_open_file_mutex with hashed mutexes.

[1] https://lore.kernel.org/lkml/YmLfxHcekrr89IFl@slm.duckdns.org/

----------------------------------------------------------------

Imran Khan (4):
  kernfs: make ->attr.open RCU protected.
  kernfs: Change kernfs_notify_list to llist.
  kernfs: Introduce interface to access global kernfs_open_file_mutex.
  kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.

 fs/kernfs/file.c            | 270 +++++++++++++++++++++++-------------
 fs/kernfs/kernfs-internal.h |   4 +
 fs/kernfs/mount.c           |  19 +++
 include/linux/kernfs.h      |  61 +++++++-
 4 files changed, 258 insertions(+), 96 deletions(-)


base-commit: 5d8e7e3bbaaf115a935accd269873469928848c0
-- 
2.30.2



* [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-02  6:39 [PATCH RESEND v4 0/4] kernfs: make ->attr.open RCU protected Imran Khan
@ 2022-06-02  6:39 ` Imran Khan
  2022-06-12 17:58   ` Tejun Heo
  2022-06-02  6:39 ` [PATCH RESEND v4 2/4] kernfs: Change kernfs_notify_list to llist Imran Khan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Imran Khan @ 2022-06-02  6:39 UTC (permalink / raw)
  To: tj, viro, gregkh; +Cc: matthew.wilcox, konrad.wilk, linux-kernel

After the removal of kernfs_open_node->refcnt in the previous patch,
kernfs_open_node_lock can be removed as well by making ->attr.open
RCU protected. kernfs_put_open_node can delegate freeing of ->attr.open
to RCU, and other readers of ->attr.open can access it under rcu_read_(un)lock.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
 fs/kernfs/file.c       | 172 ++++++++++++++++++++++++++++++-----------
 include/linux/kernfs.h |   2 +-
 2 files changed, 127 insertions(+), 47 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e3abfa843879..63f1a8f3efb6 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -23,16 +23,16 @@
  * for each kernfs_node with one or more open files.
  *
  * kernfs_node->attr.open points to kernfs_open_node.  attr.open is
- * protected by kernfs_open_node_lock.
+ * RCU protected.
  *
  * filp->private_data points to seq_file whose ->private points to
  * kernfs_open_file.  kernfs_open_files are chained at
  * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
  */
-static DEFINE_SPINLOCK(kernfs_open_node_lock);
 static DEFINE_MUTEX(kernfs_open_file_mutex);
 
 struct kernfs_open_node {
+	struct rcu_head		rcu_head;
 	atomic_t		event;
 	wait_queue_head_t	poll;
 	struct list_head	files; /* goes through kernfs_open_file.list */
@@ -51,6 +51,77 @@ struct kernfs_open_node {
 static DEFINE_SPINLOCK(kernfs_notify_lock);
 static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
 
+/**
+ * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
+ *
+ * @of: associated kernfs_open_file instance.
+ * @kn: target kernfs_node.
+ *
+ * Fetch and return ->attr.open of @kn if @of->list is non empty.
+ * If @of->list is not empty we can safely assume that @of is on
+ * @kn->attr.open->files list and this guarantees that @kn->attr.open
+ * will not vanish i.e. dereferencing outside RCU read-side critical
+ * section is safe here.
+ *
+ * This should ONLY be used by readers of ->attr.open and caller needs
+ * to make sure that @of->list is not empty.
+ */
+static struct kernfs_open_node *
+kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
+{
+	struct kernfs_open_node *on;
+
+	on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list));
+
+	return on;
+}
+
+/**
+ * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
+ *
+ * @kn: target kernfs_node.
+ *
+ * Fetch and return ->attr.open of @kn when caller(writer) holds
+ * kernfs_open_file_mutex.
+ *
+ * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
+ * the current updater (caller) is holding this mutex, other updaters will not
+ * be able to change ->attr.open and this means that we can safely deref
+ * ->attr.open outside RCU read-side critical section.
+ *
+ * This should ONLY be used by updaters of ->attr.open and caller needs to make
+ * sure that kernfs_open_file_mutex is held.
+ */
+static struct kernfs_open_node *
+kernfs_deref_open_node_protected(struct kernfs_node *kn)
+{
+	return rcu_dereference_protected(kn->attr.open,
+				lockdep_is_held(&kernfs_open_file_mutex));
+}
+
+/**
+ * kernfs_check_open_node_protected - Get kernfs_open_node corresponding to @kn
+ *
+ * @kn: target kernfs_node.
+ *
+ * Fetch and return ->attr.open of @kn when caller(reader) holds
+ * kernfs_open_file_mutex.
+ *
+ * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
+ * the current reader (caller) is holding this mutex, updaters will not be
+ * able to change ->attr.open and this means that we can safely deref
+ * ->attr.open outside RCU read-side critical section.
+ *
+ * This should ONLY be used by readers of ->attr.open and caller needs to make
+ * sure that kernfs_open_file_mutex is held.
+ */
+static struct kernfs_open_node *
+kernfs_check_open_node_protected(struct kernfs_node *kn)
+{
+	return rcu_dereference_check(kn->attr.open,
+				      lockdep_is_held(&kernfs_open_file_mutex));
+}
+
 static struct kernfs_open_file *kernfs_of(struct file *file)
 {
 	return ((struct seq_file *)file->private_data)->private;
@@ -156,8 +227,12 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
 static int kernfs_seq_show(struct seq_file *sf, void *v)
 {
 	struct kernfs_open_file *of = sf->private;
+	struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn);
+
+	if (!on)
+		return -EINVAL;
 
-	of->event = atomic_read(&of->kn->attr.open->event);
+	of->event = atomic_read(&on->event);
 
 	return of->kn->attr.ops->seq_show(sf, v);
 }
@@ -180,6 +255,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
 	ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);
 	const struct kernfs_ops *ops;
+	struct kernfs_open_node *on;
 	char *buf;
 
 	buf = of->prealloc_buf;
@@ -201,7 +277,15 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		goto out_free;
 	}
 
-	of->event = atomic_read(&of->kn->attr.open->event);
+	on = kernfs_deref_open_node(of, of->kn);
+	if (!on) {
+		len = -EINVAL;
+		mutex_unlock(&of->mutex);
+		goto out_free;
+	}
+
+	of->event = atomic_read(&on->event);
+
 	ops = kernfs_ops(of->kn);
 	if (ops->read)
 		len = ops->read(of, buf, len, iocb->ki_pos);
@@ -519,36 +603,29 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 {
 	struct kernfs_open_node *on, *new_on = NULL;
 
- retry:
 	mutex_lock(&kernfs_open_file_mutex);
-	spin_lock_irq(&kernfs_open_node_lock);
-
-	if (!kn->attr.open && new_on) {
-		kn->attr.open = new_on;
-		new_on = NULL;
-	}
-
-	on = kn->attr.open;
-	if (on)
-		list_add_tail(&of->list, &on->files);
-
-	spin_unlock_irq(&kernfs_open_node_lock);
-	mutex_unlock(&kernfs_open_file_mutex);
+	on = kernfs_deref_open_node_protected(kn);
 
 	if (on) {
-		kfree(new_on);
+		list_add_tail(&of->list, &on->files);
+		mutex_unlock(&kernfs_open_file_mutex);
 		return 0;
+	} else {
+		/* not there, initialize a new one */
+		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
+		if (!new_on) {
+			mutex_unlock(&kernfs_open_file_mutex);
+			return -ENOMEM;
+		}
+		atomic_set(&new_on->event, 1);
+		init_waitqueue_head(&new_on->poll);
+		INIT_LIST_HEAD(&new_on->files);
+		list_add_tail(&of->list, &new_on->files);
+		rcu_assign_pointer(kn->attr.open, new_on);
 	}
+	mutex_unlock(&kernfs_open_file_mutex);
 
-	/* not there, initialize a new one and retry */
-	new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
-	if (!new_on)
-		return -ENOMEM;
-
-	atomic_set(&new_on->event, 1);
-	init_waitqueue_head(&new_on->poll);
-	INIT_LIST_HEAD(&new_on->files);
-	goto retry;
+	return 0;
 }
 
 /**
@@ -567,24 +644,25 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 static void kernfs_unlink_open_file(struct kernfs_node *kn,
 				 struct kernfs_open_file *of)
 {
-	struct kernfs_open_node *on = kn->attr.open;
-	unsigned long flags;
+	struct kernfs_open_node *on;
 
 	mutex_lock(&kernfs_open_file_mutex);
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
+
+	on = kernfs_deref_open_node_protected(kn);
+	if (!on) {
+		mutex_unlock(&kernfs_open_file_mutex);
+		return;
+	}
 
 	if (of)
 		list_del(&of->list);
 
-	if (list_empty(&on->files))
-		kn->attr.open = NULL;
-	else
-		on = NULL;
+	if (list_empty(&on->files)) {
+		rcu_assign_pointer(kn->attr.open, NULL);
+		kfree_rcu(on, rcu_head);
+	}
 
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
 	mutex_unlock(&kernfs_open_file_mutex);
-
-	kfree(on);
 }
 
 static int kernfs_fop_open(struct inode *inode, struct file *file)
@@ -774,17 +852,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 	 * check under kernfs_open_file_mutex will ensure bailing out if
 	 * ->attr.open became NULL while waiting for the mutex.
 	 */
-	if (!kn->attr.open)
+	if (!rcu_access_pointer(kn->attr.open))
 		return;
 
 	mutex_lock(&kernfs_open_file_mutex);
-	if (!kn->attr.open) {
+	on = kernfs_check_open_node_protected(kn);
+	if (!on) {
 		mutex_unlock(&kernfs_open_file_mutex);
 		return;
 	}
 
-	on = kn->attr.open;
-
 	list_for_each_entry(of, &on->files, list) {
 		struct inode *inode = file_inode(of->file);
 
@@ -815,7 +892,10 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
 {
 	struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
-	struct kernfs_open_node *on = kn->attr.open;
+	struct kernfs_open_node *on = kernfs_deref_open_node(of, kn);
+
+	if (!on)
+		return EPOLLERR;
 
 	poll_wait(of->file, &on->poll, wait);
 
@@ -922,13 +1002,13 @@ void kernfs_notify(struct kernfs_node *kn)
 		return;
 
 	/* kick poll immediately */
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
-	on = kn->attr.open;
+	rcu_read_lock();
+	on = rcu_dereference(kn->attr.open);
 	if (on) {
 		atomic_inc(&on->event);
 		wake_up_interruptible(&on->poll);
 	}
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+	rcu_read_unlock();
 
 	/* schedule work to kick fsnotify */
 	spin_lock_irqsave(&kernfs_notify_lock, flags);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index e2ae15a6225e..13f54f078a52 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -114,7 +114,7 @@ struct kernfs_elem_symlink {
 
 struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
-	struct kernfs_open_node	*open;
+	struct kernfs_open_node __rcu	*open;
 	loff_t			size;
 	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
 };
-- 
2.30.2
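
The change above is an instance of the standard RCU-protected pointer
pattern: publish with rcu_assign_pointer() and free with kfree_rcu() while
holding the update-side mutex, and dereference either under rcu_read_lock()
or with rcu_dereference_protected() while the mutex is held. A condensed,
self-contained sketch of that pattern follows; example_node, example_state
and example_mutex are made-up names for illustration and are not kernfs code.

#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct example_state {
	struct rcu_head rcu_head;
	int data;
};

struct example_node {
	struct example_state __rcu *state;
};

static DEFINE_MUTEX(example_mutex);

/* Updater: swap the pointer under the mutex, free the old copy via RCU. */
static void example_update(struct example_node *n, struct example_state *new)
{
	struct example_state *old;

	mutex_lock(&example_mutex);
	old = rcu_dereference_protected(n->state,
					lockdep_is_held(&example_mutex));
	rcu_assign_pointer(n->state, new);
	if (old)
		kfree_rcu(old, rcu_head);	/* freed after a grace period */
	mutex_unlock(&example_mutex);
}

/* Reader: only dereference inside an RCU read-side critical section. */
static int example_read(struct example_node *n)
{
	struct example_state *s;
	int val = -1;

	rcu_read_lock();
	s = rcu_dereference(n->state);
	if (s)
		val = s->data;
	rcu_read_unlock();
	return val;
}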



* [PATCH RESEND v4 2/4] kernfs: Change kernfs_notify_list to llist.
  2022-06-02  6:39 [PATCH RESEND v4 0/4] kernfs: make ->attr.open RCU protected Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 1/4] " Imran Khan
@ 2022-06-02  6:39 ` Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
  3 siblings, 0 replies; 13+ messages in thread
From: Imran Khan @ 2022-06-02  6:39 UTC (permalink / raw)
  To: tj, viro, gregkh; +Cc: matthew.wilcox, konrad.wilk, linux-kernel

At present kernfs_notify_list is implemented as a singly linked
list of kernfs_node(s), where the last element points to itself and
the value of ->attr.notify_next tells whether a node is on the list or not.
Both addition to and deletion from the list happen under kernfs_notify_lock.

Change kernfs_notify_list to llist so that additions to the list can happen
locklessly and kernfs_notify_lock can be dropped. The consumer
(kernfs_notify_workfn) pops nodes off the list one at a time with
llist_del_first().

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c       | 47 ++++++++++++++++++------------------------
 include/linux/kernfs.h |  2 +-
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 63f1a8f3efb6..b21c8e3b6a8d 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -38,18 +38,16 @@ struct kernfs_open_node {
 	struct list_head	files; /* goes through kernfs_open_file.list */
 };
 
-/*
- * kernfs_notify() may be called from any context and bounces notifications
- * through a work item.  To minimize space overhead in kernfs_node, the
- * pending queue is implemented as a singly linked list of kernfs_nodes.
- * The list is terminated with the self pointer so that whether a
- * kernfs_node is on the list or not can be determined by testing the next
- * pointer for NULL.
+/**
+ * attribute_to_node - get kernfs_node object corresponding to a kernfs attribute
+ * @ptr:	&struct kernfs_elem_attr
+ * @type:	struct kernfs_node
+ * @member:	name of member (i.e attr)
  */
-#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
+#define attribute_to_node(ptr, type, member)	\
+	container_of(ptr, type, member)
 
-static DEFINE_SPINLOCK(kernfs_notify_lock);
-static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
+static LLIST_HEAD(kernfs_notify_list);
 
 /**
  * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
@@ -928,18 +926,16 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	struct kernfs_node *kn;
 	struct kernfs_super_info *info;
 	struct kernfs_root *root;
+	struct llist_node *free;
+	struct kernfs_elem_attr *attr;
 repeat:
 	/* pop one off the notify_list */
-	spin_lock_irq(&kernfs_notify_lock);
-	kn = kernfs_notify_list;
-	if (kn == KERNFS_NOTIFY_EOL) {
-		spin_unlock_irq(&kernfs_notify_lock);
+	free = llist_del_first(&kernfs_notify_list);
+	if (free == NULL)
 		return;
-	}
-	kernfs_notify_list = kn->attr.notify_next;
-	kn->attr.notify_next = NULL;
-	spin_unlock_irq(&kernfs_notify_lock);
 
+	attr = llist_entry(free, struct kernfs_elem_attr, notify_next);
+	kn = attribute_to_node(attr, struct kernfs_node, attr);
 	root = kernfs_root(kn);
 	/* kick fsnotify */
 	down_write(&root->kernfs_rwsem);
@@ -995,12 +991,14 @@ static void kernfs_notify_workfn(struct work_struct *work)
 void kernfs_notify(struct kernfs_node *kn)
 {
 	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
-	unsigned long flags;
 	struct kernfs_open_node *on;
 
 	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
 		return;
 
+	/* Because we are using llist for kernfs_notify_list */
+	WARN_ON_ONCE(in_nmi());
+
 	/* kick poll immediately */
 	rcu_read_lock();
 	on = rcu_dereference(kn->attr.open);
@@ -1011,14 +1009,9 @@ void kernfs_notify(struct kernfs_node *kn)
 	rcu_read_unlock();
 
 	/* schedule work to kick fsnotify */
-	spin_lock_irqsave(&kernfs_notify_lock, flags);
-	if (!kn->attr.notify_next) {
-		kernfs_get(kn);
-		kn->attr.notify_next = kernfs_notify_list;
-		kernfs_notify_list = kn;
-		schedule_work(&kernfs_notify_work);
-	}
-	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
+	kernfs_get(kn);
+	llist_add(&kn->attr.notify_next, &kernfs_notify_list);
+	schedule_work(&kernfs_notify_work);
 }
 EXPORT_SYMBOL_GPL(kernfs_notify);
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 13f54f078a52..2dd9c8df0f4f 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -116,7 +116,7 @@ struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
 	struct kernfs_open_node __rcu	*open;
 	loff_t			size;
-	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
+	struct llist_node	notify_next;	/* for kernfs_notify() */
 };
 
 /*
-- 
2.30.2
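
For readers who have not used llist before, the lockless-producer pattern
that the patch relies on boils down to the sketch below. The names
(example_item, example_list and the payload field) are made up for
illustration; only the use of llist_add(), llist_del_first() and
llist_entry() mirrors what the patch does.

#include <linux/llist.h>
#include <linux/slab.h>

struct example_item {
	int payload;
	struct llist_node node;
};

static LLIST_HEAD(example_list);

/* Producer: safe from almost any context, no lock taken. */
static void example_produce(struct example_item *item)
{
	llist_add(&item->node, &example_list);
}

/*
 * Consumer: pops entries one at a time. llist_del_first() tolerates
 * concurrent producers but only a single concurrent consumer, which
 * matches the single kernfs_notify_work work item.
 */
static void example_consume(void)
{
	struct llist_node *n;

	while ((n = llist_del_first(&example_list)) != NULL) {
		struct example_item *item =
			llist_entry(n, struct example_item, node);

		/* ... process item->payload here ... */
		kfree(item);
	}
}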



* [PATCH RESEND v4 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex.
  2022-06-02  6:39 [PATCH RESEND v4 0/4] kernfs: make ->attr.open RCU protected Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 1/4] " Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 2/4] kernfs: Change kernfs_notify_list to llist Imran Khan
@ 2022-06-02  6:39 ` Imran Khan
  2022-06-02  6:39 ` [PATCH RESEND v4 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
  3 siblings, 0 replies; 13+ messages in thread
From: Imran Khan @ 2022-06-02  6:39 UTC (permalink / raw)
  To: tj, viro, gregkh; +Cc: matthew.wilcox, konrad.wilk, linux-kernel

This allows the underlying mutex locking to be changed without needing to
change the users of the lock. For example, the next patch modifies this
interface to use hashed mutexes in place of the single global
kernfs_open_file_mutex.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c | 72 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index b21c8e3b6a8d..d35d01d30fa0 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -49,6 +49,22 @@ struct kernfs_open_node {
 
 static LLIST_HEAD(kernfs_notify_list);
 
+static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
+{
+	return &kernfs_open_file_mutex;
+}
+
+static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
+{
+	struct mutex *lock;
+
+	lock = kernfs_open_file_mutex_ptr(kn);
+
+	mutex_lock(lock);
+
+	return lock;
+}
+
 /**
  * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn.
  *
@@ -80,21 +96,21 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
  * @kn: target kernfs_node.
  *
  * Fetch and return ->attr.open of @kn when caller(writer) holds
- * kernfs_open_file_mutex.
+ * kernfs_open_file_mutex_ptr(kn).
  *
- * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
- * the current updater (caller) is holding this mutex, other updaters will not
- * be able to change ->attr.open and this means that we can safely deref
- * ->attr.open outside RCU read-side critical section.
+ * Update of ->attr.open happens under kernfs_open_file_mutex_ptr(kn). So as
+ * long as the current updater (caller) is holding this mutex, other updaters
+ * will not be able to change ->attr.open and this means that we can safely
+ * deref ->attr.open outside RCU read-side critical section.
  *
  * This should ONLY be used by updaters of ->attr.open and caller needs to make
- * sure that kernfs_open_file_mutex is held.
+ * sure that kernfs_open_file_mutex_ptr(kn) is held.
  */
 static struct kernfs_open_node *
 kernfs_deref_open_node_protected(struct kernfs_node *kn)
 {
 	return rcu_dereference_protected(kn->attr.open,
-				lockdep_is_held(&kernfs_open_file_mutex));
+			lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
 }
 
 /**
@@ -103,21 +119,21 @@ kernfs_deref_open_node_protected(struct kernfs_node *kn)
  * @kn: target kernfs_node.
  *
  * Fetch and return ->attr.open of @kn when caller(reader) holds
- * kernfs_open_file_mutex.
+ * kernfs_open_file_mutex_ptr(kn).
  *
- * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
- * the current reader (caller) is holding this mutex, updaters will not be
- * able to change ->attr.open and this means that we can safely deref
+ * Update of ->attr.open happens under kernfs_open_file_mutex_ptr(kn). So as
+ * long as the current reader (caller) is holding this mutex, updaters will
+ * not be able to change ->attr.open and this means that we can safely deref
  * ->attr.open outside RCU read-side critical section.
  *
  * This should ONLY be used by readers of ->attr.open and caller needs to make
- * sure that kernfs_open_file_mutex is held.
+ * sure that kernfs_open_file_mutex_ptr(kn) is held.
  */
 static struct kernfs_open_node *
 kernfs_check_open_node_protected(struct kernfs_node *kn)
 {
 	return rcu_dereference_check(kn->attr.open,
-				      lockdep_is_held(&kernfs_open_file_mutex));
+			lockdep_is_held(kernfs_open_file_mutex_ptr(kn)));
 }
 
 static struct kernfs_open_file *kernfs_of(struct file *file)
@@ -600,19 +616,20 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 				struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on, *new_on = NULL;
+	struct mutex *mutex = NULL;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_deref_open_node_protected(kn);
 
 	if (on) {
 		list_add_tail(&of->list, &on->files);
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return 0;
 	} else {
 		/* not there, initialize a new one */
 		new_on = kmalloc(sizeof(*new_on), GFP_KERNEL);
 		if (!new_on) {
-			mutex_unlock(&kernfs_open_file_mutex);
+			mutex_unlock(mutex);
 			return -ENOMEM;
 		}
 		atomic_set(&new_on->event, 1);
@@ -621,7 +638,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
 		list_add_tail(&of->list, &new_on->files);
 		rcu_assign_pointer(kn->attr.open, new_on);
 	}
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 
 	return 0;
 }
@@ -643,12 +660,13 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 				 struct kernfs_open_file *of)
 {
 	struct kernfs_open_node *on;
+	struct mutex *mutex = NULL;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 
 	on = kernfs_deref_open_node_protected(kn);
 	if (!on) {
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return;
 	}
 
@@ -660,7 +678,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn,
 		kfree_rcu(on, rcu_head);
 	}
 
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 }
 
 static int kernfs_fop_open(struct inode *inode, struct file *file)
@@ -802,7 +820,7 @@ static void kernfs_release_file(struct kernfs_node *kn,
 	 * here because drain path may be called from places which can
 	 * cause circular dependency.
 	 */
-	lockdep_assert_held(&kernfs_open_file_mutex);
+	lockdep_assert_held(kernfs_open_file_mutex_ptr(kn));
 
 	if (!of->released) {
 		/*
@@ -819,11 +837,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 {
 	struct kernfs_node *kn = inode->i_private;
 	struct kernfs_open_file *of = kernfs_of(filp);
+	struct mutex *mutex = NULL;
 
 	if (kn->flags & KERNFS_HAS_RELEASE) {
-		mutex_lock(&kernfs_open_file_mutex);
+		mutex = kernfs_open_file_mutex_lock(kn);
 		kernfs_release_file(kn, of);
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 	}
 
 	kernfs_unlink_open_file(kn, of);
@@ -838,6 +857,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
 	struct kernfs_open_file *of;
+	struct mutex *mutex = NULL;
 
 	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
 		return;
@@ -853,10 +873,10 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 	if (!rcu_access_pointer(kn->attr.open))
 		return;
 
-	mutex_lock(&kernfs_open_file_mutex);
+	mutex = kernfs_open_file_mutex_lock(kn);
 	on = kernfs_check_open_node_protected(kn);
 	if (!on) {
-		mutex_unlock(&kernfs_open_file_mutex);
+		mutex_unlock(mutex);
 		return;
 	}
 
@@ -870,7 +890,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
 			kernfs_release_file(kn, of);
 	}
 
-	mutex_unlock(&kernfs_open_file_mutex);
+	mutex_unlock(mutex);
 }
 
 /*
-- 
2.30.2



* [PATCH RESEND v4 4/4] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.
  2022-06-02  6:39 [PATCH RESEND v4 0/4] kernfs: make ->attr.open RCU protected Imran Khan
                   ` (2 preceding siblings ...)
  2022-06-02  6:39 ` [PATCH RESEND v4 3/4] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
@ 2022-06-02  6:39 ` Imran Khan
  3 siblings, 0 replies; 13+ messages in thread
From: Imran Khan @ 2022-06-02  6:39 UTC (permalink / raw)
  To: tj, viro, gregkh; +Cc: matthew.wilcox, konrad.wilk, linux-kernel

In the current kernfs design a single mutex, kernfs_open_file_mutex, protects
the list of kernfs_open_file instances corresponding to a sysfs attribute.
So even if different tasks are opening or closing different sysfs files,
they can contend on the osq_lock of this mutex. The contention is more
apparent on large-scale systems with a few hundred CPUs, where most CPUs have
running tasks that are opening, accessing or closing sysfs files at any
point of time.

Using hashed mutexes in place of a single global mutex can significantly
reduce contention around the global mutex and hence provide better
scalability. Moreover, as these hashed mutexes are not part of kernfs_node
objects, we will not see any significant change in the memory utilization
of kernfs-based file systems like sysfs, cgroupfs etc.

Modify the interface introduced in the previous patch to make use of hashed
mutexes, using the kernfs_node address as the hashing key.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/file.c            | 17 ++---------
 fs/kernfs/kernfs-internal.h |  4 +++
 fs/kernfs/mount.c           | 19 +++++++++++++
 include/linux/kernfs.h      | 57 +++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index d35d01d30fa0..987ef7165acc 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -18,19 +18,6 @@
 
 #include "kernfs-internal.h"
 
-/*
- * There's one kernfs_open_file for each open file and one kernfs_open_node
- * for each kernfs_node with one or more open files.
- *
- * kernfs_node->attr.open points to kernfs_open_node.  attr.open is
- * RCU protected.
- *
- * filp->private_data points to seq_file whose ->private points to
- * kernfs_open_file.  kernfs_open_files are chained at
- * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
- */
-static DEFINE_MUTEX(kernfs_open_file_mutex);
-
 struct kernfs_open_node {
 	struct rcu_head		rcu_head;
 	atomic_t		event;
@@ -51,7 +38,9 @@ static LLIST_HEAD(kernfs_notify_list);
 
 static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
 {
-	return &kernfs_open_file_mutex;
+	int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
+
+	return &kernfs_locks->open_file_mutex[idx];
 }
 
 static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index eeaa779b929c..3ae214d02d44 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -164,4 +164,8 @@ void kernfs_drain_open_files(struct kernfs_node *kn);
  */
 extern const struct inode_operations kernfs_symlink_iops;
 
+/*
+ * kernfs locks
+ */
+extern struct kernfs_global_locks *kernfs_locks;
 #endif	/* __KERNFS_INTERNAL_H */
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cfa79715fc1a..d0859f72d2d6 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -20,6 +20,7 @@
 #include "kernfs-internal.h"
 
 struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
+struct kernfs_global_locks *kernfs_locks;
 
 static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
 {
@@ -387,6 +388,22 @@ void kernfs_kill_sb(struct super_block *sb)
 	kfree(info);
 }
 
+static void __init kernfs_mutex_init(void)
+{
+	int count;
+
+	for (count = 0; count < NR_KERNFS_LOCKS; count++)
+		mutex_init(&kernfs_locks->open_file_mutex[count]);
+}
+
+static void __init kernfs_lock_init(void)
+{
+	kernfs_locks = kmalloc(sizeof(struct kernfs_global_locks), GFP_KERNEL);
+	WARN_ON(!kernfs_locks);
+
+	kernfs_mutex_init();
+}
+
 void __init kernfs_init(void)
 {
 	kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
@@ -397,4 +414,6 @@ void __init kernfs_init(void)
 	kernfs_iattrs_cache  = kmem_cache_create("kernfs_iattrs_cache",
 					      sizeof(struct kernfs_iattrs),
 					      0, SLAB_PANIC, NULL);
+
+	kernfs_lock_init();
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 2dd9c8df0f4f..13e703f615f7 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -18,6 +18,7 @@
 #include <linux/uidgid.h>
 #include <linux/wait.h>
 #include <linux/rwsem.h>
+#include <linux/cache.h>
 
 struct file;
 struct dentry;
@@ -34,6 +35,62 @@ struct kernfs_fs_context;
 struct kernfs_open_node;
 struct kernfs_iattrs;
 
+/*
+ * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash
+ * table of locks.
+ * Having a small hash table would impact scalability, since
+ * more and more kernfs_node objects will end up using same lock
+ * and having a very large hash table would waste memory.
+ *
+ * At the moment size of hash table of locks is being set based on
+ * the number of CPUs as follows:
+ *
+ * NR_CPU      NR_KERNFS_LOCK_BITS      NR_KERNFS_LOCKS
+ *   1                  1                       2
+ *  2-3                 2                       4
+ *  4-7                 4                       16
+ *  8-15                6                       64
+ *  16-31               8                       256
+ *  32 and more         10                      1024
+ *
+ * The above relation between NR_CPU and number of locks is based
+ * on some internal experimentation which involved booting qemu
+ * with different values of smp, performing some sysfs operations
+ * on all CPUs and observing how increase in number of locks impacts
+ * completion time of these sysfs operations on each CPU.
+ */
+#ifdef CONFIG_SMP
+#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32)))
+#else
+#define NR_KERNFS_LOCK_BITS     1
+#endif
+
+#define NR_KERNFS_LOCKS     (1 << NR_KERNFS_LOCK_BITS)
+
+/*
+ * There's one kernfs_open_file for each open file and one kernfs_open_node
+ * for each kernfs_node with one or more open files.
+ *
+ * filp->private_data points to seq_file whose ->private points to
+ * kernfs_open_file.
+ *
+ * kernfs_open_files are chained at kernfs_open_node->files, which is
+ * protected by kernfs_global_locks.open_file_mutex[i].
+ *
+ * To reduce possible contention in sysfs access, arising due to single
+ * locks, use an array of locks (e.g. open_file_mutex) and use kernfs_node
+ * object address as hash keys to get the index of these locks.
+ *
+ * Hashed mutexes are safe to use here because operations using these don't
+ * rely on global exclusion.
+ *
+ * In future we intend to replace other global locks with hashed ones as well.
+ * kernfs_global_locks acts as a holder for all such hash tables.
+ */
+struct kernfs_global_locks {
+	struct mutex open_file_mutex[NR_KERNFS_LOCKS];
+};
+
 enum kernfs_node_type {
 	KERNFS_DIR		= 0x0001,
 	KERNFS_FILE		= 0x0002,
-- 
2.30.2
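
Stripped of the kernfs specifics, the hashed-lock scheme introduced by this
patch and the previous one amounts to the sketch below. The names
(example_locks, example_lock_ptr, EXAMPLE_LOCK_BITS) are made up for
illustration; the real code sizes the array from NR_CPUS and initializes it
from kernfs_init(), as shown in the diff above.

#include <linux/hash.h>
#include <linux/mutex.h>

#define EXAMPLE_LOCK_BITS	4			/* 2^4 = 16 locks */
#define EXAMPLE_NR_LOCKS	(1 << EXAMPLE_LOCK_BITS)

/* Each element must be mutex_init()'d once, e.g. from an __init hook. */
static struct mutex example_locks[EXAMPLE_NR_LOCKS];

/* Hash the object's address to pick one mutex out of the array. */
static struct mutex *example_lock_ptr(const void *obj)
{
	return &example_locks[hash_ptr(obj, EXAMPLE_LOCK_BITS)];
}

/*
 * Callers always go through the accessor, so the number of locks or the
 * hashing scheme can change later without touching any call site.
 */
static void example_op(const void *obj)
{
	struct mutex *lock = example_lock_ptr(obj);

	mutex_lock(lock);
	/* ... critical section for objects hashing to this lock ... */
	mutex_unlock(lock);
}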



* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-02  6:39 ` [PATCH RESEND v4 1/4] " Imran Khan
@ 2022-06-12 17:58   ` Tejun Heo
  2022-06-13  2:36     ` Imran Khan
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2022-06-12 17:58 UTC (permalink / raw)
  To: Imran Khan; +Cc: viro, gregkh, matthew.wilcox, konrad.wilk, linux-kernel

Hello,

Sorry about the long delay.

On Thu, Jun 02, 2022 at 04:39:04PM +1000, Imran Khan wrote:
> +/**
> + * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn
> + *
> + * @kn: target kernfs_node.
> + *
> + * Fetch and return ->attr.open of @kn when caller(writer) holds
> + * kernfs_open_file_mutex.
> + *
> + * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
> + * the current updater (caller) is holding this mutex, other updaters will not
> + * be able to change ->attr.open and this means that we can safely deref
> + * ->attr.open outside RCU read-side critical section.
> + *
> + * This should ONLY be used by updaters of ->attr.open and caller needs to make
> + * sure that kernfs_open_file_mutex is held.
> + */
> +static struct kernfs_open_node *
> +kernfs_deref_open_node_protected(struct kernfs_node *kn)
> +{
> +	return rcu_dereference_protected(kn->attr.open,
> +				lockdep_is_held(&kernfs_open_file_mutex));
> +}
> +
> +/**
> + * kernfs_check_open_node_protected - Get kernfs_open_node corresponding to @kn
> + *
> + * @kn: target kernfs_node.
> + *
> + * Fetch and return ->attr.open of @kn when caller(reader) holds
> + * kernfs_open_file_mutex.
> + *
> + * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
> + * the current reader (caller) is holding this mutex, updaters will not be
> + * able to change ->attr.open and this means that we can safely deref
> + * ->attr.open outside RCU read-side critical section.
> + *
> + * This should ONLY be used by readers of ->attr.open and caller needs to make
> + * sure that kernfs_open_file_mutex is held.
> + */
> +static struct kernfs_open_node *
> +kernfs_check_open_node_protected(struct kernfs_node *kn)
> +{
> +	return rcu_dereference_check(kn->attr.open,
> +				      lockdep_is_held(&kernfs_open_file_mutex));
> +}

I don't understand why the above is necessary. Whether you're a reader or
writer, you can deref the pointer w/ _protected as long as you're holding
the lock, right? If I'm mistaken and somehow a reader needs to use a
deref_check, I don't see a reason for this to be a separate accessor. Why
not just merge the condition into the kernfs_deref_open_node()?

Thanks.

-- 
tejun


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-12 17:58   ` Tejun Heo
@ 2022-06-13  2:36     ` Imran Khan
  2022-06-13  2:46       ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Imran Khan @ 2022-06-13  2:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: viro, gregkh, linux-kernel

Hello Tejun,

On 13/6/22 3:58 am, Tejun Heo wrote:
> Hello,
> 
> Sorry about the long delay.
>
No problem.

[...]
>> +/**
>> + * kernfs_check_open_node_protected - Get kernfs_open_node corresponding to @kn
>> + *
>> + * @kn: target kernfs_node.
>> + *
>> + * Fetch and return ->attr.open of @kn when caller(reader) holds
>> + * kernfs_open_file_mutex.
>> + *
>> + * Update of ->attr.open happens under kernfs_open_file_mutex. So as long as
>> + * the current reader (caller) is holding this mutex, updaters will not be
>> + * able to change ->attr.open and this means that we can safely deref
>> + * ->attr.open outside RCU read-side critical section.
>> + *
>> + * This should ONLY be used by readers of ->attr.open and caller needs to make
>> + * sure that kernfs_open_file_mutex is held.
>> + */
>> +static struct kernfs_open_node *
>> +kernfs_check_open_node_protected(struct kernfs_node *kn)
>> +{
>> +	return rcu_dereference_check(kn->attr.open,
>> +				      lockdep_is_held(&kernfs_open_file_mutex));
>> +}
> 
> I don't understand why the above is necessary. Whether you're a reader or
> writer, you can deref the pointer w/ _protected as long as you're holding
> the lock, right?

As per [1], we should use rcu_dereference_check() for the reader side when we
are holding the lock.

> If I'm mistaken and somehow a reader needs to use a
> deref_check, I don't see a reason for this to be a separate accessor. Why
> not just merge the condition into the kernfs_deref_open_node()?
>

I have not merged lockdep_is_held(&kernfs_open_file_mutex) into
kernfs_deref_open_node() because not all users of kernfs_deref_open_node() (i.e.
kernfs_seq_show(), kernfs_file_read_iter(), kernfs_generic_poll()) can guarantee
that kernfs_open_file_mutex has been acquired. So at the moment we have 2
helpers around rcu_dereference_check(): kernfs_check_open_node_protected() for
readers under kernfs_open_file_mutex, and kernfs_deref_open_node() for readers
that rely on the existence of the kernfs_open_file by making sure that ->list
is not empty.

Could you please let me know if this sounds reasonable to you?

[1]:
https://www.kernel.org/doc/html/v5.6/RCU/rcu_dereference.html#which-member-of-the-rcu-dereference-family-should-you-use


Thanks again for your review.
  -- Imran


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13  2:36     ` Imran Khan
@ 2022-06-13  2:46       ` Tejun Heo
  2022-06-13  2:55         ` Imran Khan
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2022-06-13  2:46 UTC (permalink / raw)
  To: Imran Khan; +Cc: viro, gregkh, linux-kernel

Hello,

On Mon, Jun 13, 2022 at 12:36:12PM +1000, Imran Khan wrote:
> >> +static struct kernfs_open_node *
> >> +kernfs_check_open_node_protected(struct kernfs_node *kn)
> >> +{
> >> +	return rcu_dereference_check(kn->attr.open,
> >> +				      lockdep_is_held(&kernfs_open_file_mutex));
> >> +}
> > 
> > I don't understand why the above is necessary. Whether you're a reader or
> > writer, you can deref the pointer w/ _protected as long as you're holding
> > the lock, right?
> 
> As per [1], we should use rcu_dereference_check() for the reader side when we
> are holding the lock.

Hmm.... can you quote the exact phrase that you took the above from?

Thanks.

-- 
tejun


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13  2:46       ` Tejun Heo
@ 2022-06-13  2:55         ` Imran Khan
  2022-06-13  3:09           ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Imran Khan @ 2022-06-13  2:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: viro, gregkh, linux-kernel

Hello Tejun,

On 13/6/22 12:46 pm, Tejun Heo wrote:
> Hello,
> 
> On Mon, Jun 13, 2022 at 12:36:12PM +1000, Imran Khan wrote:
>>>> +static struct kernfs_open_node *
>>>> +kernfs_check_open_node_protected(struct kernfs_node *kn)
>>>> +{
>>>> +	return rcu_dereference_check(kn->attr.open,
>>>> +				      lockdep_is_held(&kernfs_open_file_mutex));
>>>> +}
>>>
>>> I don't understand why the above is necessary. Whether you're a reader or
>>> writer, you can deref the pointer w/ _protected as long as you're holding
>>> the lock, right?
>>
>> As per [1], we should use rcu_dereference_check() for the reader side when we
>> are holding the lock.
> 
> Hmm.... can you quote the exact phrase that you took the above from?
> 
I took below phrases as reference:

If the access might be within an RCU read-side critical section on the one hand,
or protected by (say) my_lock on the other, use rcu_dereference_check(), for
example:

p1 = rcu_dereference_check(p->rcu_protected_pointer,
                           lockdep_is_held(&my_lock));


and


If the access might be within an RCU read-side critical section on the one hand,
or protected by either my_lock or your_lock on the other, again use
rcu_dereference_check(), for example:

p1 = rcu_dereference_check(p->rcu_protected_pointer,
                           lockdep_is_held(&my_lock) ||
                           lockdep_is_held(&your_lock));


Thanks,
 -- Imran


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13  2:55         ` Imran Khan
@ 2022-06-13  3:09           ` Tejun Heo
  2022-06-13  3:56             ` Imran Khan
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2022-06-13  3:09 UTC (permalink / raw)
  To: Imran Khan; +Cc: viro, gregkh, linux-kernel

On Mon, Jun 13, 2022 at 12:55:14PM +1000, Imran Khan wrote:
> I took below phrases as reference:
> 
> If the access might be within an RCU read-side critical section on the one hand,
> or protected by (say) my_lock on the other, use rcu_dereference_check(), for
> example:
> 
> p1 = rcu_dereference_check(p->rcu_protected_pointer,
>                            lockdep_is_held(&my_lock));
> 
> 
> and
> 
> 
> If the access might be within an RCU read-side critical section on the one hand,
> or protected by either my_lock or your_lock on the other, again use
> rcu_dereference_check(), for example:
> 
> p1 = rcu_dereference_check(p->rcu_protected_pointer,
>                            lockdep_is_held(&my_lock) ||
>                            lockdep_is_held(&your_lock));

So, both are saying that if a given reference can be under both read
critical section or a lock which blocks updates, you can use deref_check to
cover both cases - we're just using the stronger form of derefing even
though that's not necessary while update side is locked out, which is fine.

The protected one is different in that it doesn't enforce the load ordering
which is required for accesses with only RCU read lock. Given that all
that's required is dependency ordering, I doubt it makes any actual
difference and it likely is more useful in marking a specific dereference as
always being with the update side locked.

tl;dr is that you're way over-thinking the rcu deref code. Just make one
deref accessor which encompasses all three use cases.

Thanks.

-- 
tejun
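
To make the distinction above concrete, a rough illustration (my_foo, gp and
my_lock are hypothetical names, not kernfs symbols) might look like this:

#include <linux/mutex.h>
#include <linux/rcupdate.h>

static struct my_foo __rcu *gp;		/* hypothetical RCU-protected pointer */
static DEFINE_MUTEX(my_lock);		/* hypothetical update-side lock */

/*
 * Update side only: the lock is always held, so no read-side ordering is
 * needed and lockdep checks just the lock.
 */
static struct my_foo *deref_for_update(void)
{
	return rcu_dereference_protected(gp, lockdep_is_held(&my_lock));
}

/*
 * May be called either under rcu_read_lock() or with my_lock held: keeps
 * the dependency-ordered load and is satisfied by either condition.
 */
static struct my_foo *deref_for_read(void)
{
	return rcu_dereference_check(gp, lockdep_is_held(&my_lock));
}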


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13  3:09           ` Tejun Heo
@ 2022-06-13  3:56             ` Imran Khan
  2022-06-13 17:30               ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Imran Khan @ 2022-06-13  3:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: viro, gregkh, linux-kernel

Hello Tejun,

On 13/6/22 1:09 pm, Tejun Heo wrote:
> On Mon, Jun 13, 2022 at 12:55:14PM +1000, Imran Khan wrote:
>> I took below phrases as reference:
>>
>> If the access might be within an RCU read-side critical section on the one hand,
>> or protected by (say) my_lock on the other, use rcu_dereference_check(), for
>> example:
>>
>> p1 = rcu_dereference_check(p->rcu_protected_pointer,
>>                            lockdep_is_held(&my_lock));
>>
>>
>> and
>>
>>
>> If the access might be within an RCU read-side critical section on the one hand,
>> or protected by either my_lock or your_lock on the other, again use
>> rcu_dereference_check(), for example:
>>
>> p1 = rcu_dereference_check(p->rcu_protected_pointer,
>>                            lockdep_is_held(&my_lock) ||
>>                            lockdep_is_held(&your_lock));
> 
> So, both are saying that if a given reference can be under both read
> critical section or a lock which blocks updates, you can use deref_check to
> cover both cases - we're just using the stronger form of derefing even
> though that's not necessary while update side is locked out, which is fine.
> 
> The protected one is different in that it doesn't enforce the load ordering
> which is required for accesses with only RCU read lock. Given that all
> that's required is dependency ordering, I doubt it makes any actual
> difference and it likely is more useful in marking a specific dereference as
> always being with the update side locked.
> 
> tl;dr is that you're way over-thinking the rcu deref code. Just make one
> deref accessor which encompasses all three use cases.
> 
> 

Agree. I did overthink this and went for the safest interface that I could
think of in each of the use cases. I will remove
kernfs_check_open_node_protected and use kernfs_deref_open_node_protected in its
place as well. This will cover all accesses under kernfs_open_file_mutex.

But we will still need kernfs_deref_open_node for cases where
!list_empty(&of->list) ensures safe access of ->attr.open and where we can't
ensure holding of kernfs_open_file_mutex. So we will need 2 deref accessors,
right? I am just asking because you mentioned above coming up with one deref
accessor that can be used in all three use cases.

Please let me know if this sounds okay. I can send an updated patch-set with
these changes in place.

Thanks,
-- Imran



* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13  3:56             ` Imran Khan
@ 2022-06-13 17:30               ` Tejun Heo
  2022-06-14  2:31                 ` Imran Khan
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2022-06-13 17:30 UTC (permalink / raw)
  To: Imran Khan; +Cc: viro, gregkh, linux-kernel

On Mon, Jun 13, 2022 at 01:56:43PM +1000, Imran Khan wrote:
> Agree. I did overthink this and went for the safest interface that I could
> think of in each of the use cases. I will remove
> kernfs_check_open_node_protected and use kernfs_deref_open_node_protected in its
> place as well. This will cover all accesses under kernfs_open_file_mutex.
> 
> But we will still need kernfs_deref_open_node for cases where
> !list_empty(&of->list) ensures safe access of ->attr.open and where we can't
> ensure holding of kernfs_open_file_mutex. So we will need 2 deref accessors,
> right? I am just asking because you mentioned above coming up with one deref
> accessor that can be used in all three use cases.
> 
> Please let me know if this sounds okay. I can send an updated patch-set with
> these changes in place.

Just merge all three into one accessor. You can list both the !list_empty
condition and lock held conditions on the same rcu_dereference_check() call.

Thanks.

-- 
tejun


* Re: [PATCH RESEND v4 1/4] kernfs: make ->attr.open RCU protected.
  2022-06-13 17:30               ` Tejun Heo
@ 2022-06-14  2:31                 ` Imran Khan
  0 siblings, 0 replies; 13+ messages in thread
From: Imran Khan @ 2022-06-14  2:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: viro, gregkh, linux-kernel

Hello Tejun,

On 14/6/22 3:30 am, Tejun Heo wrote:
[...]
> 
> Just merge all three into one accessor. You can list both the !list_empty
> condition and lock held conditions on the same rcu_dereference_check() call.
> 

I have merged all dereferencing into one accessor in v5 [1] of this
patch-set. Could you please let me know your feedback?

Thanks
 -- Imran

[1]: https://lore.kernel.org/lkml/20220614020840.546013-1-imran.f.khan@oracle.com/
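
As a rough sketch of where the discussion landed (the actual v5 code is in
the link above and may differ in detail, and for simplicity this ignores the
hashed-mutex conversion from the later patches), the single merged accessor
could look something like:

static struct kernfs_open_node *
kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn)
{
	/*
	 * ->attr.open may be dereferenced when any one of these holds:
	 * we are inside an RCU read-side critical section, @of is still
	 * on ->attr.open->files (so the node cannot be freed under us),
	 * or the update-side kernfs_open_file_mutex is held.
	 */
	return rcu_dereference_check(kn->attr.open,
				     !list_empty(&of->list) ||
				     lockdep_is_held(&kernfs_open_file_mutex));
}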

