linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
@ 2014-01-10 13:57 Tejun Heo
  2014-01-10 13:57 ` [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*() Tejun Heo
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

Hello,

This is v3 of kernfs self-removal patchset.  v2 posting mistakenly
sent out slightly old set of patches, so the v3.  Sorry about the
noise.  Changes from the first take[L] are,

* Patches reordered so that more trivial ones are in the front.

* Deactivation split into a separate stage instead of being part of
  unlinking.  Deactivation is now exposed as kernfs API via four new
  functions - kernfs_{de|re}activate[_self]().  These functions can
  nest and allow implementation of "deactivate, lock subsys, try
  removal, unlock subsys, reactivate" sequence where removal may or
  may not succeed.  This will be used to convert cgroup to kernfs.
  (prototype seems happy with this API)

* As this means that deactivation can be temporary,
  kernfs_get_active() is updated to block if the node is deactivated
  but not removed.

* kernfs_remove_self() is now implemented using the new deactivation
  API.  Its behavior remains the same.

Original patch description follows.

kernfs / sysfs implement the "sever" semantic for userland accesses.
When a node is removed, no further userland operations are allowed and
the in-flight ones are drained before removal is finished.  This makes
policing post-mortem userland accesses trivial for its users;
unfortunately, this comes with a drawback - a node which tries to
delete oneself through one of its userland operations deadlocks.
Removal wants to drain the active access that the operation itself is
running on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which punts the actual removal to a work
item.  While making the operation asynchronous kinda works, it's a bit
cumbersome to use and its behavior isn't quite correct as the caller
has no way of telling when or even whether the operation is actually
complete.  If such self-removal is followed by another operation which
expects the removed name to be available, there's no way to make the
second operation reliable - e.g. something like "echo 1 > asdf/delete;
echo asdf > create_new_child" can't work properly.

This patchset improves kernfs removal path and implements
kernfs_remove_self() which is to be called from an on-going kernfs
operation and removes the self node.  The function can be called
concurrently and only one will return %true and all others will wait
until the winner's file operation is complete (not the
kernfs_remove_self() call itself but the enclosing file operation
which invoked the function).  This ensures that if there are multiple
concurrent "echo 1 > asdf/delete", all of them would finish only after
the whole store_delete() method is complete.

kernfs_remove_self() is exposed to upper layers through
sysfs_remove_file_self() and device_remove_file_self().  The existing
users of device_schedule_callback() are converted to use remove_self
and the unused async mechanism is removed.

This patchset contains the following 14 patches.

 0001-kernfs-fix-get_active-failure-handling-in-kernfs_seq.patch
 0002-kernfs-replace-kernfs_node-u.completion-with-kernfs_.patch
 0003-kernfs-remove-KERNFS_ACTIVE_REF-and-add-kernfs_lockd.patch
 0004-kernfs-remove-KERNFS_REMOVED.patch
 0005-kernfs-restructure-removal-path-to-fix-possible-prem.patch
 0006-kernfs-invoke-kernfs_unmap_bin_file-directly-from-__.patch
 0007-kernfs-remove-kernfs_addrm_cxt.patch
 0008-kernfs-make-kernfs_get_active-block-if-the-node-is-d.patch
 0009-kernfs-implement-kernfs_-de-re-activate-_self.patch
 0010-kernfs-sysfs-driver-core-implement-kernfs_remove_sel.patch
 0011-pci-use-device_remove_file_self-instead-of-device_sc.patch
 0012-scsi-use-device_remove_file_self-instead-of-device_s.patch
 0013-s390-use-device_remove_file_self-instead-of-device_s.patch
 0014-sysfs-driver-core-remove-unused-sysfs-device-_schedu.patch

0001 fixes -ENODEV failure handling in kernfs.  I *think* this could
be the fix for the issue Sasha reported with trinity fuzzying.  Sasha,
would it be possible to confirm whether the issue is reproducible with
this patch applied?

0002 replaces kernfs_node->u.completion with a hierarchy-wide
wait_queue_head.  This will be used to fix concurrent removal
behavior.

0003-0004 simplifies removal path to prepare for restructuring.

0005 fixes premature completion of node removal when multiple removers
are competing.  This shouldn't matter for the existing sysfs users.

0006-0007 cleans up removal path.  The size of kernfs_node gets
reduced by one pointer.

0008-0010 implement kernfs_{de|re}activate[_self](),
kernfs_remove_self() and friends.

0011-0014 convert the existing users of device_schedule_callback() to
device_remove_file_self() and remove now unused async mechanism.

After the changes, kernfs_node is shrunken by a pointer.
Unfortunately, the addition of deactivation API makes LOC go up by
above a hundred lines.  Oh well....

The patchset is on top of the current driver-core-next eb4c69033fd1
("Revert "kobject: introduce kobj_completion"") and also available in
the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-kernfs-suicide

diffstat follows.

 arch/s390/include/asm/ccwgroup.h |    1 
 arch/s390/pci/pci_sysfs.c        |   18 -
 drivers/base/core.c              |   50 +--
 drivers/pci/pci-sysfs.c          |   24 -
 drivers/s390/block/dcssblk.c     |   14 
 drivers/s390/cio/ccwgroup.c      |   26 +
 drivers/scsi/scsi_sysfs.c        |   15 -
 fs/kernfs/dir.c                  |  585 ++++++++++++++++++++++++++-------------
 fs/kernfs/file.c                 |   62 +++-
 fs/kernfs/kernfs-internal.h      |   17 -
 fs/kernfs/symlink.c              |    6 
 fs/sysfs/file.c                  |  115 +------
 include/linux/device.h           |   13 
 include/linux/kernfs.h           |   24 +
 include/linux/sysfs.h            |   16 -
 15 files changed, 555 insertions(+), 431 deletions(-)

Thanks.

--
tejun

[L] git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-kernfs-suicide

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 02/14] kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq Tejun Heo
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo, Sasha Levin

When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work it around by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
---
 fs/kernfs/file.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 316604c..bdd3885 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -54,6 +54,38 @@ static const struct kernfs_ops *kernfs_ops(struct kernfs_node *kn)
 	return kn->attr.ops;
 }
 
+/*
+ * As kernfs_seq_stop() is also called after kernfs_seq_start() or
+ * kernfs_seq_next() failure, it needs to distinguish whether it's stopping
+ * a seq_file iteration which is fully initialized with an active reference
+ * or an aborted kernfs_seq_start() due to get_active failure.  The
+ * position pointer is the only context for each seq_file iteration and
+ * thus the stop condition should be encoded in it.  As the return value is
+ * directly visible to userland, ERR_PTR(-ENODEV) is the only acceptable
+ * choice to indicate get_active failure.
+ *
+ * Unfortunately, this is complicated due to the optional custom seq_file
+ * operations which may return ERR_PTR(-ENODEV) too.  kernfs_seq_stop()
+ * can't distinguish whether ERR_PTR(-ENODEV) is from get_active failure or
+ * custom seq_file operations and thus can't decide whether put_active
+ * should be performed or not only on ERR_PTR(-ENODEV).
+ *
+ * This is worked around by factoring out the custom seq_stop() and
+ * put_active part into kernfs_seq_stop_active(), skipping it from
+ * kernfs_seq_stop() if ERR_PTR(-ENODEV) while invoking it directly after
+ * custom seq_file operations fail with ERR_PTR(-ENODEV) - this ensures
+ * that kernfs_seq_stop_active() is skipped only after get_active failure.
+ */
+static void kernfs_seq_stop_active(struct seq_file *sf, void *v)
+{
+	struct kernfs_open_file *of = sf->private;
+	const struct kernfs_ops *ops = kernfs_ops(of->kn);
+
+	if (ops->seq_stop)
+		ops->seq_stop(sf, v);
+	kernfs_put_active(of->kn);
+}
+
 static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 {
 	struct kernfs_open_file *of = sf->private;
@@ -69,7 +101,11 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 
 	ops = kernfs_ops(of->kn);
 	if (ops->seq_start) {
-		return ops->seq_start(sf, ppos);
+		void *next = ops->seq_start(sf, ppos);
+		/* see the comment above kernfs_seq_stop_active() */
+		if (next == ERR_PTR(-ENODEV))
+			kernfs_seq_stop_active(sf, next);
+		return next;
 	} else {
 		/*
 		 * The same behavior and code as single_open().  Returns
@@ -85,7 +121,11 @@ static void *kernfs_seq_next(struct seq_file *sf, void *v, loff_t *ppos)
 	const struct kernfs_ops *ops = kernfs_ops(of->kn);
 
 	if (ops->seq_next) {
-		return ops->seq_next(sf, v, ppos);
+		void *next = ops->seq_next(sf, v, ppos);
+		/* see the comment above kernfs_seq_stop_active() */
+		if (next == ERR_PTR(-ENODEV))
+			kernfs_seq_stop_active(sf, next);
+		return next;
 	} else {
 		/*
 		 * The same behavior and code as single_open(), always
@@ -99,12 +139,9 @@ static void *kernfs_seq_next(struct seq_file *sf, void *v, loff_t *ppos)
 static void kernfs_seq_stop(struct seq_file *sf, void *v)
 {
 	struct kernfs_open_file *of = sf->private;
-	const struct kernfs_ops *ops = kernfs_ops(of->kn);
 
-	if (ops->seq_stop)
-		ops->seq_stop(sf, v);
-
-	kernfs_put_active(of->kn);
+	if (v != ERR_PTR(-ENODEV))
+		kernfs_seq_stop_active(sf, v);
 	mutex_unlock(&of->mutex);
 }
 
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/14] kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
  2014-01-10 13:57 ` [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*() Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 03/14] kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() Tejun Heo
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate().  We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal.  The removal path will be restructured to address the issue.

To help such restructure which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq.  This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c        | 27 +++++++++++----------------
 include/linux/kernfs.h |  4 ++--
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 510b506..ed62de6 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -8,6 +8,7 @@
  * This file is released under the GPLv2.
  */
 
+#include <linux/sched.h>
 #include <linux/fs.h>
 #include <linux/namei.h>
 #include <linux/idr.h>
@@ -151,6 +152,7 @@ struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
  */
 void kernfs_put_active(struct kernfs_node *kn)
 {
+	struct kernfs_root *root = kernfs_root(kn);
 	int v;
 
 	if (unlikely(!kn))
@@ -162,11 +164,7 @@ void kernfs_put_active(struct kernfs_node *kn)
 	if (likely(v != KN_DEACTIVATED_BIAS))
 		return;
 
-	/*
-	 * atomic_dec_return() is a mb(), we'll always see the updated
-	 * kn->u.completion.
-	 */
-	complete(kn->u.completion);
+	wake_up_all(&root->deactivate_waitq);
 }
 
 /**
@@ -177,26 +175,22 @@ void kernfs_put_active(struct kernfs_node *kn)
  */
 static void kernfs_deactivate(struct kernfs_node *kn)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
-	int v;
+	struct kernfs_root *root = kernfs_root(kn);
 
 	BUG_ON(!(kn->flags & KERNFS_REMOVED));
 
 	if (!(kernfs_type(kn) & KERNFS_ACTIVE_REF))
 		return;
 
-	kn->u.completion = (void *)&wait;
-
 	rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
-	/* atomic_add_return() is a mb(), put_active() will always see
-	 * the updated kn->u.completion.
-	 */
-	v = atomic_add_return(KN_DEACTIVATED_BIAS, &kn->active);
 
-	if (v != KN_DEACTIVATED_BIAS) {
+	atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
+
+	if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
 		lock_contended(&kn->dep_map, _RET_IP_);
-		wait_for_completion(&wait);
-	}
+
+	wait_event(root->deactivate_waitq,
+		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
 
 	lock_acquired(&kn->dep_map, _RET_IP_);
 	rwsem_release(&kn->dep_map, 1, _RET_IP_);
@@ -613,6 +607,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_dir_ops *kdops, void *priv)
 
 	root->dir_ops = kdops;
 	root->kn = kn;
+	init_waitqueue_head(&root->deactivate_waitq);
 
 	return root;
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index d2c439d..232f1a6 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -15,7 +15,7 @@
 #include <linux/lockdep.h>
 #include <linux/rbtree.h>
 #include <linux/atomic.h>
-#include <linux/completion.h>
+#include <linux/wait.h>
 
 struct file;
 struct iattr;
@@ -91,7 +91,6 @@ struct kernfs_node {
 	struct rb_node		rb;
 
 	union {
-		struct completion	*completion;
 		struct kernfs_node	*removed_list;
 	} u;
 
@@ -132,6 +131,7 @@ struct kernfs_root {
 	/* private fields, do not use outside kernfs proper */
 	struct ida		ino_ida;
 	struct kernfs_dir_ops	*dir_ops;
+	wait_queue_head_t	deactivate_waitq;
 };
 
 struct kernfs_open_file {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/14] kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
  2014-01-10 13:57 ` [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*() Tejun Heo
  2014-01-10 13:57 ` [PATCH 02/14] kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 04/14] kernfs: remove KERNFS_REMOVED Tejun Heo
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().

While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths.  Let's drop KERNFS_ACTIVE_REF and use
KERNFS_LOCKDEP in kernfs_deactivate() too.

While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c        | 31 ++++++++++++++++++++-----------
 include/linux/kernfs.h |  1 -
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index ed62de6..1c9130a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -22,6 +22,15 @@ DEFINE_MUTEX(kernfs_mutex);
 
 #define rb_to_kn(X) rb_entry((X), struct kernfs_node, rb)
 
+static bool kernfs_lockdep(struct kernfs_node *kn)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	return kn->flags & KERNFS_LOCKDEP;
+#else
+	return false;
+#endif
+}
+
 /**
  *	kernfs_name_hash
  *	@name: Null terminated string to hash
@@ -138,7 +147,7 @@ struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
 	if (!atomic_inc_unless_negative(&kn->active))
 		return NULL;
 
-	if (kn->flags & KERNFS_LOCKDEP)
+	if (kernfs_lockdep(kn))
 		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
 	return kn;
 }
@@ -158,7 +167,7 @@ void kernfs_put_active(struct kernfs_node *kn)
 	if (unlikely(!kn))
 		return;
 
-	if (kn->flags & KERNFS_LOCKDEP)
+	if (kernfs_lockdep(kn))
 		rwsem_release(&kn->dep_map, 1, _RET_IP_);
 	v = atomic_dec_return(&kn->active);
 	if (likely(v != KN_DEACTIVATED_BIAS))
@@ -179,21 +188,21 @@ static void kernfs_deactivate(struct kernfs_node *kn)
 
 	BUG_ON(!(kn->flags & KERNFS_REMOVED));
 
-	if (!(kernfs_type(kn) & KERNFS_ACTIVE_REF))
-		return;
-
-	rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
-
 	atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
 
-	if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
-		lock_contended(&kn->dep_map, _RET_IP_);
+	if (kernfs_lockdep(kn)) {
+		rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
+		if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
+			lock_contended(&kn->dep_map, _RET_IP_);
+	}
 
 	wait_event(root->deactivate_waitq,
 		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
 
-	lock_acquired(&kn->dep_map, _RET_IP_);
-	rwsem_release(&kn->dep_map, 1, _RET_IP_);
+	if (kernfs_lockdep(kn)) {
+		lock_acquired(&kn->dep_map, _RET_IP_);
+		rwsem_release(&kn->dep_map, 1, _RET_IP_);
+	}
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 232f1a6..42ad32f 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -34,7 +34,6 @@ enum kernfs_node_type {
 };
 
 #define KERNFS_TYPE_MASK	0x000f
-#define KERNFS_ACTIVE_REF	KERNFS_FILE
 #define KERNFS_FLAG_MASK	~KERNFS_TYPE_MASK
 
 enum kernfs_node_flag {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/14] kernfs: remove KERNFS_REMOVED
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (2 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 03/14] kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 05/14] kernfs: restructure removal path to fix possible premature return Tejun Heo
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and deny adding new nodes under or
renaming it; however, its role overlaps those of deactivation and
removal from rbtree.

It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations.  There's no reason to have them separate making
things more complex than necessary.

KERNFS_REMOVED is also used to decide whether a node is still visible
to vfs layer, which is rather redundant as equivalent determination
can be made by testing whether the node is on its parent's children
rbtree or not.

This patch removes KERNFS_REMOVED.

* Instead of KERNFS_REMOVED, each node now starts its life
  deactivated.  This means that we now use both atomic_add() and
  atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN.  The compiler
  generates an overflow warnings when negating INT_MIN as the negation
  can't be represented as a positive number.  Nothing is actually
  broken but let's bump BIAS by one to avoid the warnings for archs
  which negates the subtrahend..

* KERNFS_REMOVED tests in add and rename paths are replaced with
  kernfs_get/put_active() of the target nodes.  Due to the way the add
  path is structured now, active ref handling is done in the callers
  of kernfs_add_one().  This will be consolidated up later.

* kernfs_remove_one() is updated to deactivate instead of setting
  KERNFS_REMOVED.  This removes deactivation from kernfs_deactivate(),
  which is now renamed to kernfs_drain().

* kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
  KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
  dropped.  A node which is removed from the children rbtree is not
  included in the iteration in the first place.  This means that a
  node may be visible through vfs a bit longer - it's now also visible
  after deactivation until the actual removal.  This slightly enlarged
  window difference doesn't make any difference to the userland.

* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
  checks on the active ref.

* Some comment style updates in the affected area.

v2: Reordered before removal path restructuring.  kernfs_active()
    dropped and kernfs_get/put_active() used instead.  RB_EMPTY_NODE()
    used in the lookup paths.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c             | 79 +++++++++++++++++++++++++--------------------
 fs/kernfs/file.c            | 10 ++++--
 fs/kernfs/kernfs-internal.h |  3 +-
 fs/kernfs/symlink.c         | 10 ++++--
 include/linux/kernfs.h      |  1 -
 5 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 1c9130a..7f8afc1 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -127,6 +127,7 @@ static void kernfs_unlink_sibling(struct kernfs_node *kn)
 		kn->parent->dir.subdirs--;
 
 	rb_erase(&kn->rb, &kn->parent->dir.children);
+	RB_CLEAR_NODE(&kn->rb);
 }
 
 /**
@@ -177,18 +178,16 @@ void kernfs_put_active(struct kernfs_node *kn)
 }
 
 /**
- *	kernfs_deactivate - deactivate kernfs_node
- *	@kn: kernfs_node to deactivate
+ * kernfs_drain - drain kernfs_node
+ * @kn: kernfs_node to drain
  *
- *	Deny new active references and drain existing ones.
+ * Drain existing usages.
  */
-static void kernfs_deactivate(struct kernfs_node *kn)
+static void kernfs_drain(struct kernfs_node *kn)
 {
 	struct kernfs_root *root = kernfs_root(kn);
 
-	BUG_ON(!(kn->flags & KERNFS_REMOVED));
-
-	atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
+	WARN_ON_ONCE(atomic_read(&kn->active) >= 0);
 
 	if (kernfs_lockdep(kn)) {
 		rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
@@ -233,13 +232,15 @@ void kernfs_put(struct kernfs_node *kn)
 		return;
 	root = kernfs_root(kn);
  repeat:
-	/* Moving/renaming is always done while holding reference.
+	/*
+	 * Moving/renaming is always done while holding reference.
 	 * kn->parent won't change beneath us.
 	 */
 	parent = kn->parent;
 
-	WARN(!(kn->flags & KERNFS_REMOVED), "kernfs: free using entry: %s/%s\n",
-	     parent ? parent->name : "", kn->name);
+	WARN_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS,
+		  "kernfs_put: %s/%s: released with incorrect active_ref %d\n",
+		  parent ? parent->name : "", kn->name, atomic_read(&kn->active));
 
 	if (kernfs_type(kn) == KERNFS_LINK)
 		kernfs_put(kn->symlink.target_kn);
@@ -281,8 +282,8 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	kn = dentry->d_fsdata;
 	mutex_lock(&kernfs_mutex);
 
-	/* The kernfs node has been deleted */
-	if (kn->flags & KERNFS_REMOVED)
+	/* Force fresh lookup if removed */
+	if (kn->parent && RB_EMPTY_NODE(&kn->rb))
 		goto out_bad;
 
 	/* The kernfs node has been moved? */
@@ -350,11 +351,12 @@ struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
 	kn->ino = ret;
 
 	atomic_set(&kn->count, 1);
-	atomic_set(&kn->active, 0);
+	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
+	RB_CLEAR_NODE(&kn->rb);
 
 	kn->name = name;
 	kn->mode = mode;
-	kn->flags = flags | KERNFS_REMOVED;
+	kn->flags = flags;
 
 	return kn;
 
@@ -413,6 +415,8 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 	struct kernfs_iattrs *ps_iattr;
 	int ret;
 
+	WARN_ON_ONCE(atomic_read(&parent->active) < 0);
+
 	if (has_ns != (bool)kn->ns) {
 		WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
 		     has_ns ? "required" : "invalid", parent->name, kn->name);
@@ -422,9 +426,6 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 	if (kernfs_type(parent) != KERNFS_DIR)
 		return -EINVAL;
 
-	if (parent->flags & KERNFS_REMOVED)
-		return -ENOENT;
-
 	kn->hash = kernfs_name_hash(kn->name, kn->ns);
 	kn->parent = parent;
 	kernfs_get(parent);
@@ -441,8 +442,7 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 	}
 
 	/* Mark the entry added into directory tree */
-	kn->flags &= ~KERNFS_REMOVED;
-
+	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
 	return 0;
 }
 
@@ -470,7 +470,7 @@ static void kernfs_remove_one(struct kernfs_addrm_cxt *acxt,
 	 * Removal can be called multiple times on the same node.  Only the
 	 * first invocation is effective and puts the base ref.
 	 */
-	if (kn->flags & KERNFS_REMOVED)
+	if (atomic_read(&kn->active) < 0)
 		return;
 
 	if (kn->parent) {
@@ -484,7 +484,7 @@ static void kernfs_remove_one(struct kernfs_addrm_cxt *acxt,
 		}
 	}
 
-	kn->flags |= KERNFS_REMOVED;
+	atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
 	kn->u.removed_list = acxt->removed;
 	acxt->removed = kn;
 }
@@ -512,7 +512,7 @@ void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt)
 
 		acxt->removed = kn->u.removed_list;
 
-		kernfs_deactivate(kn);
+		kernfs_drain(kn);
 		kernfs_unmap_bin_file(kn);
 		kernfs_put(kn);
 	}
@@ -610,7 +610,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_dir_ops *kdops, void *priv)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	kn->flags &= ~KERNFS_REMOVED;
+	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
 	kn->priv = priv;
 	kn->dir.root = root;
 
@@ -662,9 +662,13 @@ struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent,
 	kn->priv = priv;
 
 	/* link in */
-	kernfs_addrm_start(&acxt);
-	rc = kernfs_add_one(&acxt, kn, parent);
-	kernfs_addrm_finish(&acxt);
+	rc = -ENOENT;
+	if (kernfs_get_active(parent)) {
+		kernfs_addrm_start(&acxt);
+		rc = kernfs_add_one(&acxt, kn, parent);
+		kernfs_addrm_finish(&acxt);
+		kernfs_put_active(parent);
+	}
 
 	if (!rc)
 		return kn;
@@ -899,27 +903,29 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 {
 	int error;
 
-	mutex_lock(&kernfs_mutex);
-
 	error = -ENOENT;
-	if ((kn->flags | new_parent->flags) & KERNFS_REMOVED)
+	if (!kernfs_get_active(new_parent))
 		goto out;
+	if (!kernfs_get_active(kn))
+		goto out_put_new_parent;
+
+	mutex_lock(&kernfs_mutex);
 
 	error = 0;
 	if ((kn->parent == new_parent) && (kn->ns == new_ns) &&
 	    (strcmp(kn->name, new_name) == 0))
-		goto out;	/* nothing to rename */
+		goto out_unlock;	/* nothing to rename */
 
 	error = -EEXIST;
 	if (kernfs_find_ns(new_parent, new_name, new_ns))
-		goto out;
+		goto out_unlock;
 
 	/* rename kernfs_node */
 	if (strcmp(kn->name, new_name) != 0) {
 		error = -ENOMEM;
 		new_name = kstrdup(new_name, GFP_KERNEL);
 		if (!new_name)
-			goto out;
+			goto out_unlock;
 
 		if (kn->flags & KERNFS_STATIC_NAME)
 			kn->flags &= ~KERNFS_STATIC_NAME;
@@ -941,8 +947,12 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	kernfs_link_sibling(kn);
 
 	error = 0;
- out:
+out_unlock:
 	mutex_unlock(&kernfs_mutex);
+	kernfs_put_active(kn);
+out_put_new_parent:
+	kernfs_put_active(new_parent);
+out:
 	return error;
 }
 
@@ -962,8 +972,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
 	struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
 	if (pos) {
-		int valid = !(pos->flags & KERNFS_REMOVED) &&
-			pos->parent == parent && hash == pos->hash;
+		int valid = pos->parent == parent && hash == pos->hash;
 		kernfs_put(pos);
 		if (!valid)
 			pos = NULL;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index bdd3885..231a171 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -856,9 +856,13 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 	if (ops->mmap)
 		kn->flags |= KERNFS_HAS_MMAP;
 
-	kernfs_addrm_start(&acxt);
-	rc = kernfs_add_one(&acxt, kn, parent);
-	kernfs_addrm_finish(&acxt);
+	rc = -ENOENT;
+	if (kernfs_get_active(parent)) {
+		kernfs_addrm_start(&acxt);
+		rc = kernfs_add_one(&acxt, kn, parent);
+		kernfs_addrm_finish(&acxt);
+		kernfs_put_active(parent);
+	}
 
 	if (rc) {
 		kernfs_put(kn);
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index c6ba5bc..57a93f4 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -26,7 +26,8 @@ struct kernfs_iattrs {
 	struct simple_xattrs	xattrs;
 };
 
-#define KN_DEACTIVATED_BIAS		INT_MIN
+/* +1 to avoid triggering overflow warning when negating it */
+#define KN_DEACTIVATED_BIAS		(INT_MIN + 1)
 
 /* KERNFS_TYPE_MASK and types are defined in include/linux/kernfs.h */
 
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index a03e260..b2c106c 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -40,9 +40,13 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 	kn->symlink.target_kn = target;
 	kernfs_get(target);	/* ref owned by symlink */
 
-	kernfs_addrm_start(&acxt);
-	error = kernfs_add_one(&acxt, kn, parent);
-	kernfs_addrm_finish(&acxt);
+	error = -ENOENT;
+	if (kernfs_get_active(parent)) {
+		kernfs_addrm_start(&acxt);
+		error = kernfs_add_one(&acxt, kn, parent);
+		kernfs_addrm_finish(&acxt);
+		kernfs_put_active(parent);
+	}
 
 	if (!error)
 		return kn;
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 42ad32f..289d4f6 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -37,7 +37,6 @@ enum kernfs_node_type {
 #define KERNFS_FLAG_MASK	~KERNFS_TYPE_MASK
 
 enum kernfs_node_flag {
-	KERNFS_REMOVED		= 0x0010,
 	KERNFS_NS		= 0x0020,
 	KERNFS_HAS_SEQ_SHOW	= 0x0040,
 	KERNFS_HAS_MMAP		= 0x0080,
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/14] kernfs: restructure removal path to fix possible premature return
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (3 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 04/14] kernfs: remove KERNFS_REMOVED Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 06/14] kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() Tejun Heo
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants.  While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such rule is unnecessarily restrictive and very difficult
to enforce.  It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.

The current behavior in such situations is broken.  Whoever enters
removal path first takes the node off the hierarchy and then
deactivates.  Following removers either return as soon as it notices
that it's not the first one or can't even find the target node as it
has already been removed from the hierarchy.  In both cases, the
following removers may finish prematurely while the nodes which should
be removed and drained are still being processed by the first one.

This patch restructures so that multiple removers, whether through
recursion or direction invocation, always follow the following rules.

* When there are multiple concurrent removers, only one puts the base
  ref.

* Regardless of which one puts the base ref, all removers are blocked
  until the target node is fully deactivated and removed.

To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one.  __kernfs_deactivate() is
called directly from __kernfs_removal() and drops and regrabs
kernfs_mutex for each descendant to drain active refs.  As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.

The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained.  Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.

v2: Draining separated into a separate loop (used to be in the same
    loop as unlink) and done from __kernfs_deactivate().  This is to
    allow exposing deactivation as a separate interface later.

    Root node removal was broken in v1 patch.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c        | 139 ++++++++++++++++++++++++++++++-------------------
 include/linux/kernfs.h |   1 +
 2 files changed, 87 insertions(+), 53 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 7f8afc1..e565ec0 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -181,14 +181,38 @@ void kernfs_put_active(struct kernfs_node *kn)
  * kernfs_drain - drain kernfs_node
  * @kn: kernfs_node to drain
  *
- * Drain existing usages.
+ * Drain existing usages of @kn.  Mutiple removers may invoke this function
+ * concurrently on @kn and all will return after draining is complete.
+ * Returns %true if drain is performed and kernfs_mutex was temporarily
+ * released.  %false if @kn was already drained and no operation was
+ * necessary.
+ *
+ * The caller is responsible for ensuring @kn stays pinned while this
+ * function is in progress even if it gets removed by someone else.
  */
-static void kernfs_drain(struct kernfs_node *kn)
+static bool kernfs_drain(struct kernfs_node *kn)
+	__releases(&kernfs_mutex) __acquires(&kernfs_mutex)
 {
 	struct kernfs_root *root = kernfs_root(kn);
 
+	lockdep_assert_held(&kernfs_mutex);
 	WARN_ON_ONCE(atomic_read(&kn->active) >= 0);
 
+	/*
+	 * We want to go through the active ref lockdep annotation at least
+	 * once for all node removals, but the lockdep annotation can't be
+	 * nested inside kernfs_mutex and deactivation can't make forward
+	 * progress if we keep dropping the mutex.  Use JUST_ACTIVATED to
+	 * force the slow path once for each deactivation if lockdep is
+	 * enabled.
+	 */
+	if ((!kernfs_lockdep(kn) || !(kn->flags & KERNFS_JUST_DEACTIVATED)) &&
+	    atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
+		return false;
+
+	kn->flags &= ~KERNFS_JUST_DEACTIVATED;
+	mutex_unlock(&kernfs_mutex);
+
 	if (kernfs_lockdep(kn)) {
 		rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
 		if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
@@ -202,6 +226,9 @@ static void kernfs_drain(struct kernfs_node *kn)
 		lock_acquired(&kn->dep_map, _RET_IP_);
 		rwsem_release(&kn->dep_map, 1, _RET_IP_);
 	}
+
+	mutex_lock(&kernfs_mutex);
+	return true;
 }
 
 /**
@@ -447,49 +474,6 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 }
 
 /**
- *	kernfs_remove_one - remove kernfs_node from parent
- *	@acxt: addrm context to use
- *	@kn: kernfs_node to be removed
- *
- *	Mark @kn removed and drop nlink of parent inode if @kn is a
- *	directory.  @kn is unlinked from the children list.
- *
- *	This function should be called between calls to
- *	kernfs_addrm_start() and kernfs_addrm_finish() and should be
- *	passed the same @acxt as passed to kernfs_addrm_start().
- *
- *	LOCKING:
- *	Determined by kernfs_addrm_start().
- */
-static void kernfs_remove_one(struct kernfs_addrm_cxt *acxt,
-			      struct kernfs_node *kn)
-{
-	struct kernfs_iattrs *ps_iattr;
-
-	/*
-	 * Removal can be called multiple times on the same node.  Only the
-	 * first invocation is effective and puts the base ref.
-	 */
-	if (atomic_read(&kn->active) < 0)
-		return;
-
-	if (kn->parent) {
-		kernfs_unlink_sibling(kn);
-
-		/* Update timestamps on the parent */
-		ps_iattr = kn->parent->iattr;
-		if (ps_iattr) {
-			ps_iattr->ia_iattr.ia_ctime = CURRENT_TIME;
-			ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
-		}
-	}
-
-	atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
-	kn->u.removed_list = acxt->removed;
-	acxt->removed = kn;
-}
-
-/**
  *	kernfs_addrm_finish - finish up kernfs_node add/remove
  *	@acxt: addrm context to finish up
  *
@@ -512,7 +496,6 @@ void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt)
 
 		acxt->removed = kn->u.removed_list;
 
-		kernfs_drain(kn);
 		kernfs_unmap_bin_file(kn);
 		kernfs_put(kn);
 	}
@@ -822,23 +805,73 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
 	return pos->parent;
 }
 
+static void __kernfs_deactivate(struct kernfs_node *kn)
+{
+	struct kernfs_node *pos;
+
+	lockdep_assert_held(&kernfs_mutex);
+
+	/* prevent any new usage under @kn by deactivating all nodes */
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		if (atomic_read(&pos->active) >= 0) {
+			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+			pos->flags |= KERNFS_JUST_DEACTIVATED;
+		}
+	}
+
+	/*
+	 * Drain the subtree.  If kernfs_drain() blocked to drain, which is
+	 * indicated by %true return, it temporarily released kernfs_mutex
+	 * and the rbtree might have been modified inbetween breaking our
+	 * future walk.  Restart the walk after each %true return.
+	 */
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		bool drained;
+
+		kernfs_get(pos);
+		drained = kernfs_drain(pos);
+		kernfs_put(pos);
+		if (drained)
+			pos = NULL;
+	}
+}
+
 static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
 			    struct kernfs_node *kn)
 {
-	struct kernfs_node *pos, *next;
+	struct kernfs_node *pos;
+
+	lockdep_assert_held(&kernfs_mutex);
 
 	if (!kn)
 		return;
 
 	pr_debug("kernfs %s: removing\n", kn->name);
 
-	next = NULL;
+	__kernfs_deactivate(kn);
+
+	/* unlink the subtree node-by-node */
 	do {
-		pos = next;
-		next = kernfs_next_descendant_post(pos, kn);
-		if (pos)
-			kernfs_remove_one(acxt, pos);
-	} while (next);
+		struct kernfs_iattrs *ps_iattr;
+
+		pos = kernfs_leftmost_descendant(kn);
+
+		if (pos->parent) {
+			kernfs_unlink_sibling(pos);
+
+			/* update timestamps on the parent */
+			ps_iattr = pos->parent->iattr;
+			if (ps_iattr) {
+				ps_iattr->ia_iattr.ia_ctime = CURRENT_TIME;
+				ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
+			}
+		}
+
+		pos->u.removed_list = acxt->removed;
+		acxt->removed = pos;
+	} while (pos != kn);
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 289d4f6..61bd7ae 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -37,6 +37,7 @@ enum kernfs_node_type {
 #define KERNFS_FLAG_MASK	~KERNFS_TYPE_MASK
 
 enum kernfs_node_flag {
+	KERNFS_JUST_DEACTIVATED	= 0x0010, /* used to aid lockdep annotation */
 	KERNFS_NS		= 0x0020,
 	KERNFS_HAS_SEQ_SHOW	= 0x0040,
 	KERNFS_HAS_MMAP		= 0x0080,
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/14] kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (4 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 05/14] kernfs: restructure removal path to fix possible premature return Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 07/14] kernfs: remove kernfs_addrm_cxt Tejun Heo
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
the target file before kernfs_remove() finishes; however, it currently
is being called from kernfs_addrm_finish() and has the same race
problem as the original implementation of deactivation when there are
multiple removers - only the remover which snatches the node to its
addrm_cxt->removed list is guaranteed to wait for its completion
before returning.

It can be fixed by moving kernfs_unmap_bin_file() invocation from
kernfs_addrm_finish() to __kernfs_remove().  The function may be
called multiple times but that shouldn't do any harm.

We end up dropping kernfs_mutex in the removal loop and the node may
be removed inbetween by someone else.  kernfs_unlink_sibling() is
updated to test whether the node has already been removed and return
accordingly.  __kernfs_remove() in turn performs post-unlinking
cleanup only if it actually unlinked the node.

KERNFS_HAS_MMAP test is moved out of the unmap function into
__kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
While at it, drop the now meaningless "bin" qualifier from the
function name.

v2: Rewritten to fit the v2 restructuring of removal path.  HAS_MMAP
    test relocated.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c             | 42 +++++++++++++++++++++++++++++++++---------
 fs/kernfs/file.c            |  5 +----
 fs/kernfs/kernfs-internal.h |  2 +-
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index e565ec0..f878e4f 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -121,13 +121,17 @@ static int kernfs_link_sibling(struct kernfs_node *kn)
  *	Locking:
  *	mutex_lock(kernfs_mutex)
  */
-static void kernfs_unlink_sibling(struct kernfs_node *kn)
+static bool kernfs_unlink_sibling(struct kernfs_node *kn)
 {
+	if (RB_EMPTY_NODE(&kn->rb))
+		return false;
+
 	if (kernfs_type(kn) == KERNFS_DIR)
 		kn->parent->dir.subdirs--;
 
 	rb_erase(&kn->rb, &kn->parent->dir.children);
 	RB_CLEAR_NODE(&kn->rb);
+	return true;
 }
 
 /**
@@ -496,7 +500,6 @@ void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt)
 
 		acxt->removed = kn->u.removed_list;
 
-		kernfs_unmap_bin_file(kn);
 		kernfs_put(kn);
 	}
 }
@@ -854,23 +857,44 @@ static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
 
 	/* unlink the subtree node-by-node */
 	do {
-		struct kernfs_iattrs *ps_iattr;
-
 		pos = kernfs_leftmost_descendant(kn);
 
-		if (pos->parent) {
-			kernfs_unlink_sibling(pos);
+		/*
+		 * We're gonna release kernfs_mutex to unmap bin files,
+		 * Make sure @pos doesn't go away inbetween.
+		 */
+		kernfs_get(pos);
+
+		/*
+		 * This must be come before unlinking; otherwise, when
+		 * there are multiple removers, some may finish before
+		 * unmapping is complete.
+		 */
+		if (pos->flags & KERNFS_HAS_MMAP) {
+			mutex_unlock(&kernfs_mutex);
+			kernfs_unmap_file(pos);
+			mutex_lock(&kernfs_mutex);
+		}
+
+		/*
+		 * kernfs_unlink_sibling() succeeds once per node.  Use it
+		 * to decide who's responsible for cleanups.
+		 */
+		if (!pos->parent || kernfs_unlink_sibling(pos)) {
+			struct kernfs_iattrs *ps_iattr =
+				pos->parent ? pos->parent->iattr : NULL;
 
 			/* update timestamps on the parent */
-			ps_iattr = pos->parent->iattr;
 			if (ps_iattr) {
 				ps_iattr->ia_iattr.ia_ctime = CURRENT_TIME;
 				ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
 			}
+
+			pos->u.removed_list = acxt->removed;
+			acxt->removed = pos;
 		}
 
-		pos->u.removed_list = acxt->removed;
-		acxt->removed = pos;
+		kernfs_put(pos);
 	} while (pos != kn);
 }
 
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 231a171..404ffd2 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -700,14 +700,11 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-void kernfs_unmap_bin_file(struct kernfs_node *kn)
+void kernfs_unmap_file(struct kernfs_node *kn)
 {
 	struct kernfs_open_node *on;
 	struct kernfs_open_file *of;
 
-	if (!(kn->flags & KERNFS_HAS_MMAP))
-		return;
-
 	spin_lock_irq(&kernfs_open_node_lock);
 	on = kn->attr.open;
 	if (on)
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 57a93f4..e9ec38c 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -113,7 +113,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
  */
 extern const struct file_operations kernfs_file_fops;
 
-void kernfs_unmap_bin_file(struct kernfs_node *kn);
+void kernfs_unmap_file(struct kernfs_node *kn);
 
 /*
  * symlink.c
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/14] kernfs: remove kernfs_addrm_cxt
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (5 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 06/14] kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 08/14] kernfs: make kernfs_get_active() block if the node is deactivated but not removed Tejun Heo
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
added because there were operations which should be performed outside
kernfs_mutex after adding and removing kernfs_nodes.  The necessary
operations were recorded in kernfs_addrm_cxt and performed by
kernfs_addrm_finish(); however, after the recent changes which
relocated deactivation and unmapping so that they're performed
directly during removal, the only operation kernfs_addrm_finish()
performs is kernfs_put(), which can be moved inside the removal path
too.

This patch moves the kernfs_put() of the base ref to __kernfs_remove()
and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().

* kernfs_add_one() is updated to grab and release the parent's active
  ref and kernfs_mutex itself.  kernfs_get/put_active() and
  kernfs_addrm_start/finish() invocations around it are removed from
  all users.

* __kernfs_remove() puts an unlinked node directly instead of chaining
  it to kernfs_addrm_cxt.  Its callers are updated to grab and release
  kernfs_mutex instead of calling kernfs_addrm_start/finish() around
  it.

v2: Updated to fit the v2 restructuring of removal path.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c             | 114 ++++++++++----------------------------------
 fs/kernfs/file.c            |  10 +---
 fs/kernfs/kernfs-internal.h |  12 +----
 fs/kernfs/symlink.c         |  10 +---
 include/linux/kernfs.h      |   4 --
 5 files changed, 29 insertions(+), 121 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index f878e4f..770d687 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -399,28 +399,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
 }
 
 /**
- *	kernfs_addrm_start - prepare for kernfs_node add/remove
- *	@acxt: pointer to kernfs_addrm_cxt to be used
- *
- *	This function is called when the caller is about to add or remove
- *	kernfs_node.  This function acquires kernfs_mutex.  @acxt is used
- *	to keep and pass context to other addrm functions.
- *
- *	LOCKING:
- *	Kernel thread context (may sleep).  kernfs_mutex is locked on
- *	return.
- */
-void kernfs_addrm_start(struct kernfs_addrm_cxt *acxt)
-	__acquires(kernfs_mutex)
-{
-	memset(acxt, 0, sizeof(*acxt));
-
-	mutex_lock(&kernfs_mutex);
-}
-
-/**
  *	kernfs_add_one - add kernfs_node to parent without warning
- *	@acxt: addrm context to use
  *	@kn: kernfs_node to be added
  *	@parent: the parent kernfs_node to add @kn to
  *
@@ -428,34 +407,29 @@ void kernfs_addrm_start(struct kernfs_addrm_cxt *acxt)
  *	parent inode if @kn is a directory and link into the children list
  *	of the parent.
  *
- *	This function should be called between calls to
- *	kernfs_addrm_start() and kernfs_addrm_finish() and should be passed
- *	the same @acxt as passed to kernfs_addrm_start().
- *
- *	LOCKING:
- *	Determined by kernfs_addrm_start().
- *
  *	RETURNS:
  *	0 on success, -EEXIST if entry with the given name already
  *	exists.
  */
-int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
-		  struct kernfs_node *parent)
+int kernfs_add_one(struct kernfs_node *kn, struct kernfs_node *parent)
 {
-	bool has_ns = kernfs_ns_enabled(parent);
 	struct kernfs_iattrs *ps_iattr;
+	bool has_ns;
 	int ret;
 
-	WARN_ON_ONCE(atomic_read(&parent->active) < 0);
+	if (!kernfs_get_active(parent))
+		return -ENOENT;
 
-	if (has_ns != (bool)kn->ns) {
-		WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
-		     has_ns ? "required" : "invalid", parent->name, kn->name);
-		return -EINVAL;
-	}
+	mutex_lock(&kernfs_mutex);
+
+	ret = -EINVAL;
+	has_ns = kernfs_ns_enabled(parent);
+	if (WARN(has_ns != (bool)kn->ns, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
+		 has_ns ? "required" : "invalid", parent->name, kn->name))
+		goto out_unlock;
 
 	if (kernfs_type(parent) != KERNFS_DIR)
-		return -EINVAL;
+		goto out_unlock;
 
 	kn->hash = kernfs_name_hash(kn->name, kn->ns);
 	kn->parent = parent;
@@ -463,7 +437,7 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 
 	ret = kernfs_link_sibling(kn);
 	if (ret)
-		return ret;
+		goto out_unlock;
 
 	/* Update timestamps on the parent */
 	ps_iattr = parent->iattr;
@@ -474,34 +448,11 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
 
 	/* Mark the entry added into directory tree */
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
-	return 0;
-}
-
-/**
- *	kernfs_addrm_finish - finish up kernfs_node add/remove
- *	@acxt: addrm context to finish up
- *
- *	Finish up kernfs_node add/remove.  Resources acquired by
- *	kernfs_addrm_start() are released and removed kernfs_nodes are
- *	cleaned up.
- *
- *	LOCKING:
- *	kernfs_mutex is released.
- */
-void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt)
-	__releases(kernfs_mutex)
-{
-	/* release resources acquired by kernfs_addrm_start() */
+	ret = 0;
+out_unlock:
 	mutex_unlock(&kernfs_mutex);
-
-	/* kill removed kernfs_nodes */
-	while (acxt->removed) {
-		struct kernfs_node *kn = acxt->removed;
-
-		acxt->removed = kn->u.removed_list;
-
-		kernfs_put(kn);
-	}
+	kernfs_put_active(parent);
+	return ret;
 }
 
 /**
@@ -633,7 +584,6 @@ struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent,
 					 const char *name, umode_t mode,
 					 void *priv, const void *ns)
 {
-	struct kernfs_addrm_cxt acxt;
 	struct kernfs_node *kn;
 	int rc;
 
@@ -648,14 +598,7 @@ struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent,
 	kn->priv = priv;
 
 	/* link in */
-	rc = -ENOENT;
-	if (kernfs_get_active(parent)) {
-		kernfs_addrm_start(&acxt);
-		rc = kernfs_add_one(&acxt, kn, parent);
-		kernfs_addrm_finish(&acxt);
-		kernfs_put_active(parent);
-	}
-
+	rc = kernfs_add_one(kn, parent);
 	if (!rc)
 		return kn;
 
@@ -841,8 +784,7 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 	}
 }
 
-static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
-			    struct kernfs_node *kn)
+static void __kernfs_remove(struct kernfs_node *kn)
 {
 	struct kernfs_node *pos;
 
@@ -890,8 +832,7 @@ static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
 				ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
 			}
 
-			pos->u.removed_list = acxt->removed;
-			acxt->removed = pos;
+			kernfs_put(pos);
 		}
 
 		kernfs_put(pos);
@@ -906,11 +847,9 @@ static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
  */
 void kernfs_remove(struct kernfs_node *kn)
 {
-	struct kernfs_addrm_cxt acxt;
-
-	kernfs_addrm_start(&acxt);
-	__kernfs_remove(&acxt, kn);
-	kernfs_addrm_finish(&acxt);
+	mutex_lock(&kernfs_mutex);
+	__kernfs_remove(kn);
+	mutex_unlock(&kernfs_mutex);
 }
 
 /**
@@ -925,7 +864,6 @@ void kernfs_remove(struct kernfs_node *kn)
 int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 			     const void *ns)
 {
-	struct kernfs_addrm_cxt acxt;
 	struct kernfs_node *kn;
 
 	if (!parent) {
@@ -934,13 +872,13 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 		return -ENOENT;
 	}
 
-	kernfs_addrm_start(&acxt);
+	mutex_lock(&kernfs_mutex);
 
 	kn = kernfs_find_ns(parent, name, ns);
 	if (kn)
-		__kernfs_remove(&acxt, kn);
+		__kernfs_remove(kn);
 
-	kernfs_addrm_finish(&acxt);
+	mutex_unlock(&kernfs_mutex);
 
 	if (kn)
 		return 0;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 404ffd2..ffe1beb 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -817,7 +817,6 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 					 bool name_is_static,
 					 struct lock_class_key *key)
 {
-	struct kernfs_addrm_cxt acxt;
 	struct kernfs_node *kn;
 	unsigned flags;
 	int rc;
@@ -853,14 +852,7 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 	if (ops->mmap)
 		kn->flags |= KERNFS_HAS_MMAP;
 
-	rc = -ENOENT;
-	if (kernfs_get_active(parent)) {
-		kernfs_addrm_start(&acxt);
-		rc = kernfs_add_one(&acxt, kn, parent);
-		kernfs_addrm_finish(&acxt);
-		kernfs_put_active(parent);
-	}
-
+	rc = kernfs_add_one(kn, parent);
 	if (rc) {
 		kernfs_put(kn);
 		return ERR_PTR(rc);
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index e9ec38c..4bc5784 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -46,13 +46,6 @@ static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
 }
 
 /*
- * Context structure to be used while adding/removing nodes.
- */
-struct kernfs_addrm_cxt {
-	struct kernfs_node	*removed;
-};
-
-/*
  * mount.c
  */
 struct kernfs_super_info {
@@ -101,10 +94,7 @@ extern const struct inode_operations kernfs_dir_iops;
 
 struct kernfs_node *kernfs_get_active(struct kernfs_node *kn);
 void kernfs_put_active(struct kernfs_node *kn);
-void kernfs_addrm_start(struct kernfs_addrm_cxt *acxt);
-int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
-		   struct kernfs_node *parent);
-void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt);
+int kernfs_add_one(struct kernfs_node *kn, struct kernfs_node *parent);
 struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
 				    umode_t mode, unsigned flags);
 
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index b2c106c..3a939c2 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -27,7 +27,6 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 				       struct kernfs_node *target)
 {
 	struct kernfs_node *kn;
-	struct kernfs_addrm_cxt acxt;
 	int error;
 
 	kn = kernfs_new_node(kernfs_root(parent), name, S_IFLNK|S_IRWXUGO,
@@ -40,14 +39,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 	kn->symlink.target_kn = target;
 	kernfs_get(target);	/* ref owned by symlink */
 
-	error = -ENOENT;
-	if (kernfs_get_active(parent)) {
-		kernfs_addrm_start(&acxt);
-		error = kernfs_add_one(&acxt, kn, parent);
-		kernfs_addrm_finish(&acxt);
-		kernfs_put_active(parent);
-	}
-
+	error = kernfs_add_one(kn, parent);
 	if (!error)
 		return kn;
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 61bd7ae..9b5a4bb 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -89,10 +89,6 @@ struct kernfs_node {
 
 	struct rb_node		rb;
 
-	union {
-		struct kernfs_node	*removed_list;
-	} u;
-
 	const void		*ns;	/* namespace tag */
 	unsigned int		hash;	/* ns + name hash */
 	union {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/14] kernfs: make kernfs_get_active() block if the node is deactivated but not removed
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (6 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 07/14] kernfs: remove kernfs_addrm_cxt Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]() Tejun Heo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

Currently, kernfs_get_active() fails if the target node is
deactivated.  This is fine as a node always gets removed after
deactivation; however, we're gonna add reactivation so the assumption
won't hold.  It'd be incorrect for kernfs_get_active() to fail for a
node which was deactivated only temporarily.

This patch makes kernfs_get_active() block if the node is deactivated
but not removed.  If the node gets reactivated (not yet implemented),
it will be retried and succeed.  If the node gets removed, it will be
woken up and fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 770d687..37dd640 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -149,12 +149,25 @@ struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
 	if (unlikely(!kn))
 		return NULL;
 
-	if (!atomic_inc_unless_negative(&kn->active))
-		return NULL;
-
 	if (kernfs_lockdep(kn))
 		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
-	return kn;
+
+	/*
+	 * Try to obtain an active ref.  If @kn is deactivated, we block
+	 * till either it's reactivated or killed.
+	 */
+	do {
+		if (atomic_inc_unless_negative(&kn->active))
+			return kn;
+
+		wait_event(kernfs_root(kn)->deactivate_waitq,
+			   atomic_read(&kn->active) >= 0 ||
+			   RB_EMPTY_NODE(&kn->rb));
+	} while (!RB_EMPTY_NODE(&kn->rb));
+
+	if (kernfs_lockdep(kn))
+		rwsem_release(&kn->dep_map, 1, _RET_IP_);
+	return NULL;
 }
 
 /**
@@ -786,6 +799,7 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 
 static void __kernfs_remove(struct kernfs_node *kn)
 {
+	struct kernfs_root *root = kernfs_root(kn);
 	struct kernfs_node *pos;
 
 	lockdep_assert_held(&kernfs_mutex);
@@ -837,6 +851,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
 
 		kernfs_put(pos);
 	} while (pos != kn);
+
+	/* some nodes killed, kick get_active waiters */
+	wake_up_all(&root->deactivate_waitq);
 }
 
 /**
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (7 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 08/14] kernfs: make kernfs_get_active() block if the node is deactivated but not removed Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 10/14] kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers Tejun Heo
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

This patch implements four functions to manipulate deactivation state
- deactivate, reactivate and the _self suffixed pair.  A new fields
kernfs_node->deact_depth is added so that concurrent and nested
deactivations are handled properly.  kernfs_node->hash is moved so
that it's paired with the new field so that it doesn't increase the
size of kernfs_node.

A kernfs user's lock would normally nest inside active ref but during
removal the user may want to perform kernfs_remove() while holding the
said lock, which would introduce a reverse locking dependency.  This
function can be used to break such reverse dependency by allowing
deactivation step to performed separately outside user's critical
section.

This will also be used implement kernfs_remove_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c        | 118 ++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/kernfs.h |   7 ++-
 2 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 37dd640..1aeb579 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -396,6 +396,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
 
 	atomic_set(&kn->count, 1);
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
+	kn->deact_depth = 1;
 	RB_CLEAR_NODE(&kn->rb);
 
 	kn->name = name;
@@ -461,6 +462,7 @@ int kernfs_add_one(struct kernfs_node *kn, struct kernfs_node *parent)
 
 	/* Mark the entry added into directory tree */
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
+	kn->deact_depth--;
 	ret = 0;
 out_unlock:
 	mutex_unlock(&kernfs_mutex);
@@ -561,6 +563,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_dir_ops *kdops, void *priv)
 	}
 
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
+	kn->deact_depth--;
 	kn->priv = priv;
 	kn->dir.root = root;
 
@@ -773,7 +776,8 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 	/* prevent any new usage under @kn by deactivating all nodes */
 	pos = NULL;
 	while ((pos = kernfs_next_descendant_post(pos, kn))) {
-		if (atomic_read(&pos->active) >= 0) {
+		if (!pos->deact_depth++) {
+			WARN_ON_ONCE(atomic_read(&pos->active) < 0);
 			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
 			pos->flags |= KERNFS_JUST_DEACTIVATED;
 		}
@@ -797,6 +801,118 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 	}
 }
 
+static void __kernfs_reactivate(struct kernfs_node *kn)
+{
+	struct kernfs_node *pos;
+
+	lockdep_assert_held(&kernfs_mutex);
+
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		if (!--pos->deact_depth) {
+			WARN_ON_ONCE(atomic_read(&pos->active) >= 0);
+			atomic_sub(KN_DEACTIVATED_BIAS, &pos->active);
+		}
+		WARN_ON_ONCE(pos->deact_depth < 0);
+	}
+
+	/* some nodes reactivated, kick get_active waiters */
+	wake_up_all(&kernfs_root(kn)->deactivate_waitq);
+}
+
+static void __kernfs_deactivate_self(struct kernfs_node *kn)
+{
+	/*
+	 * Take out ourself out of the active ref dependency chain and
+	 * deactivate.  If we're called without an active ref, lockdep will
+	 * complain.
+	 */
+	kernfs_put_active(kn);
+	__kernfs_deactivate(kn);
+}
+
+static void __kernfs_reactivate_self(struct kernfs_node *kn)
+{
+	__kernfs_reactivate(kn);
+	/*
+	 * Restore active ref dropped by deactivate_self() so that it's
+	 * balanced on return.  put_active() will soon be called on @kn, so
+	 * this can't break anything regardless of @kn's state.
+	 */
+	atomic_inc(&kn->active);
+	if (kernfs_lockdep(kn))
+		rwsem_acquire(&kn->dep_map, 0, 1, _RET_IP_);
+}
+
+/**
+ * kernfs_deactivate - deactivate subtree of a node
+ * @kn: kernfs_node to deactivate subtree of
+ *
+ * Deactivate the subtree of @kn.  On return, there's no active operation
+ * going on under @kn and creation or renaming of a node under @kn is
+ * blocked until @kn is reactivated or removed.  This function can be
+ * called multiple times and nests properly.  Each invocation should be
+ * paired with kernfs_reactivate().
+ *
+ * For a kernfs user which uses simple locking, the subsystem lock would
+ * nest inside active reference.  This becomes problematic if the user
+ * tries to remove nodes while holding the subystem lock as it would create
+ * a reverse locking dependency from the subsystem lock to active ref.
+ * This function can be used to break such reverse dependency.  The user
+ * can call this function outside the subsystem lock and then proceed to
+ * invoke kernfs_remove() while holding the subsystem lock without
+ * introducing such reverse dependency.
+ */
+void kernfs_deactivate(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_deactivate(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_reactivate - reactivate subtree of a node
+ * @kn: kernfs_node to reactivate subtree of
+ *
+ * Undo kernfs_deactivate().
+ */
+void kernfs_reactivate(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_reactivate(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_deactivate_self - deactivate subtree of a node from its own method
+ * @kn: the self kernfs_node to deactivate subtree of
+ *
+ * The caller must be running off of a kernfs operation which is invoked
+ * with an active reference - e.g. one of kernfs_ops.  Once this function
+ * is called, @kn may be removed by someone else while the enclosing method
+ * is in progress.  Other than that, this function is equivalent to
+ * kernfs_deactivate() and should be paired with kernfs_reactivate_self().
+ */
+void kernfs_deactivate_self(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_deactivate_self(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_reactivate_self - reactivate subtree of a node from its own method
+ * @kn: the self kernfs_node to reactivate subtree of
+ *
+ * Undo kernfs_deactivate_self().
+ */
+void kernfs_reactivate_self(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_reactivate_self(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
 static void __kernfs_remove(struct kernfs_node *kn)
 {
 	struct kernfs_root *root = kernfs_root(kn);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 9b5a4bb..ac86930 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -80,6 +80,8 @@ struct kernfs_elem_attr {
 struct kernfs_node {
 	atomic_t		count;
 	atomic_t		active;
+	int			deact_depth;
+	unsigned int		hash;	/* ns + name hash */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
@@ -90,7 +92,6 @@ struct kernfs_node {
 	struct rb_node		rb;
 
 	const void		*ns;	/* namespace tag */
-	unsigned int		hash;	/* ns + name hash */
 	union {
 		struct kernfs_elem_dir		dir;
 		struct kernfs_elem_symlink	symlink;
@@ -233,6 +234,10 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 				       const char *name,
 				       struct kernfs_node *target);
+void kernfs_deactivate(struct kernfs_node *kn);
+void kernfs_reactivate(struct kernfs_node *kn);
+void kernfs_deactivate_self(struct kernfs_node *kn);
+void kernfs_reactivate_self(struct kernfs_node *kn);
 void kernfs_remove(struct kernfs_node *kn);
 int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 			     const void *ns);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/14] kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (8 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]() Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 11/14] pci: use device_remove_file_self() instead of device_schedule_callback() Tejun Heo
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo, kbuild test robot

Sometimes it's necessary to implement a node which wants to delete
nodes including itself.  This isn't straightforward because of kernfs
active reference.  While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing.  For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that itself
is sitting on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously.  If a removal
operation is immmediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.

The thing is there's no inherent reason for this to be asynchrnous.
All that's necessary to do this synchronous is a dedicated operation
which drops its own active ref and deactivates self.  This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core.  kernfs_remove_self() is to be called from one of the file
operations, drops the active ref and deactivates using
__kernfs_deactivate_self(), removes the self node, and restores active
ref to the dead node using __kernfs_reactivate_self() so that the ref
is balanced afterwards.  __kernfs_remove() is updated so that it takes
an early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.

This makes implementing self-deleting nodes very easy.  The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node.  The method can
invoke kernfs_remove_self() on itself before proceeding the normal
removal path.  kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.

This will replace sysfs_schedule_callback().  A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once.  An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion.  All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false.  This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.

v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
    and sysfs_remove_file_self() had incorrect return type.  Fix it.
    Reported by kbuild test bot.

v3: Updated to use __kernfs_{de|re}activate_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
---
 drivers/base/core.c    | 17 ++++++++++++
 fs/kernfs/dir.c        | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/sysfs/file.c        | 23 ++++++++++++++++
 include/linux/device.h |  2 ++
 include/linux/kernfs.h |  6 +++++
 include/linux/sysfs.h  |  7 +++++
 6 files changed, 127 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b56717..9db57af 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -571,6 +571,23 @@ void device_remove_file(struct device *dev,
 EXPORT_SYMBOL_GPL(device_remove_file);
 
 /**
+ * device_remove_file_self - remove sysfs attribute file from its own method.
+ * @dev: device.
+ * @attr: device attribute descriptor.
+ *
+ * See kernfs_remove_self() for details.
+ */
+bool device_remove_file_self(struct device *dev,
+			     const struct device_attribute *attr)
+{
+	if (dev)
+		return sysfs_remove_file_self(&dev->kobj, &attr->attr);
+	else
+		return false;
+}
+EXPORT_SYMBOL_GPL(device_remove_file_self);
+
+/**
  * device_create_bin_file - create sysfs binary attribute file for device.
  * @dev: device.
  * @attr: device binary attribute descriptor.
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 1aeb579..a8028be 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -986,6 +986,78 @@ void kernfs_remove(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_remove_self - remove a kernfs_node from its own method
+ * @kn: the self kernfs_node to remove
+ *
+ * The caller must be running off of a kernfs operation which is invoked
+ * with an active reference - e.g. one of kernfs_ops.  This can be used to
+ * implement a file operation which deletes itself.
+ *
+ * For example, the "delete" file for a sysfs device directory can be
+ * implemented by invoking kernfs_remove_self() on the "delete" file
+ * itself.  This function breaks the circular dependency of trying to
+ * deactivate self while holding an active ref itself.  It isn't necessary
+ * to modify the usual removal path to use kernfs_remove_self().  The
+ * "delete" implementation can simply invoke kernfs_remove_self() on self
+ * before proceeding with the usual removal path.  kernfs will ignore later
+ * kernfs_remove() on self.
+ *
+ * kernfs_remove_self() can be called multiple times concurrently on the
+ * same kernfs_node.  Only the first one actually performs removal and
+ * returns %true.  All others will wait until the kernfs operation which
+ * won self-removal finishes and return %false.  Note that the losers wait
+ * for the completion of not only the winning kernfs_remove_self() but also
+ * the whole kernfs_ops which won the arbitration.  This can be used to
+ * guarantee, for example, all concurrent writes to a "delete" file to
+ * finish only after the whole operation is complete.
+ */
+bool kernfs_remove_self(struct kernfs_node *kn)
+{
+	bool ret;
+
+	mutex_lock(&kernfs_mutex);
+	__kernfs_deactivate_self(kn);
+
+	/*
+	 * SUICIDAL is used to arbitrate among competing invocations.  Only
+	 * the first one will actually perform removal.  When the removal
+	 * is complete, SUICIDED is set and the active ref is restored
+	 * while holding kernfs_mutex.  The ones which lost arbitration
+	 * waits for SUICDED && drained which can happen only after the
+	 * enclosing kernfs operation which executed the winning instance
+	 * of kernfs_remove_self() finished.
+	 */
+	if (!(kn->flags & KERNFS_SUICIDAL)) {
+		kn->flags |= KERNFS_SUICIDAL;
+		__kernfs_remove(kn);
+		kn->flags |= KERNFS_SUICIDED;
+		ret = true;
+	} else {
+		wait_queue_head_t *waitq = &kernfs_root(kn)->deactivate_waitq;
+		DEFINE_WAIT(wait);
+
+		while (true) {
+			prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE);
+
+			if ((kn->flags & KERNFS_SUICIDED) &&
+			    atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
+				break;
+
+			mutex_unlock(&kernfs_mutex);
+			schedule();
+			mutex_lock(&kernfs_mutex);
+		}
+		finish_wait(waitq, &wait);
+		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
+		ret = false;
+	}
+
+	__kernfs_reactivate_self(kn);
+	mutex_unlock(&kernfs_mutex);
+	return ret;
+}
+
+/**
  * kernfs_remove_by_name_ns - find a kernfs_node by name and remove it
  * @parent: parent of the target
  * @name: name of the kernfs_node to remove
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 810cf6e..1b8b91b 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -372,6 +372,29 @@ void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr,
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_ns);
 
+/**
+ * sysfs_remove_file_self - remove an object attribute from its own method
+ * @kobj: object we're acting for
+ * @attr: attribute descriptor
+ *
+ * See kernfs_remove_self() for details.
+ */
+bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
+{
+	struct kernfs_node *parent = kobj->sd;
+	struct kernfs_node *kn;
+	bool ret;
+
+	kn = kernfs_find_and_get(parent, attr->name);
+	if (WARN_ON_ONCE(!kn))
+		return false;
+
+	ret = kernfs_remove_self(kn);
+
+	kernfs_put(kn);
+	return ret;
+}
+
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **ptr)
 {
 	int i;
diff --git a/include/linux/device.h b/include/linux/device.h
index 952b010..1ff3f16 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -560,6 +560,8 @@ extern int device_create_file(struct device *device,
 			      const struct device_attribute *entry);
 extern void device_remove_file(struct device *dev,
 			       const struct device_attribute *attr);
+extern bool device_remove_file_self(struct device *dev,
+				    const struct device_attribute *attr);
 extern int __must_check device_create_bin_file(struct device *dev,
 					const struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index ac86930..0b7b7cc 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -43,6 +43,8 @@ enum kernfs_node_flag {
 	KERNFS_HAS_MMAP		= 0x0080,
 	KERNFS_LOCKDEP		= 0x0100,
 	KERNFS_STATIC_NAME	= 0x0200,
+	KERNFS_SUICIDAL		= 0x0400,
+	KERNFS_SUICIDED		= 0x0800,
 };
 
 /* type-specific structures for kernfs_node union members */
@@ -239,6 +241,7 @@ void kernfs_reactivate(struct kernfs_node *kn);
 void kernfs_deactivate_self(struct kernfs_node *kn);
 void kernfs_reactivate_self(struct kernfs_node *kn);
 void kernfs_remove(struct kernfs_node *kn);
+bool kernfs_remove_self(struct kernfs_node *kn);
 int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 			     const void *ns);
 int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
@@ -296,6 +299,9 @@ kernfs_create_link(struct kernfs_node *parent, const char *name,
 
 static inline void kernfs_remove(struct kernfs_node *kn) { }
 
+static inline bool kernfs_remove_self(struct kernfs_node *kn)
+{ return false; }
+
 static inline int kernfs_remove_by_name_ns(struct kernfs_node *kn,
 					   const char *name, const void *ns)
 { return -ENOSYS; }
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 30b2ebe..bd96c60 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -198,6 +198,7 @@ int __must_check sysfs_chmod_file(struct kobject *kobj,
 				  const struct attribute *attr, umode_t mode);
 void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr,
 			  const void *ns);
+bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr);
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **attr);
 
 int __must_check sysfs_create_bin_file(struct kobject *kobj,
@@ -301,6 +302,12 @@ static inline void sysfs_remove_file_ns(struct kobject *kobj,
 {
 }
 
+static inline bool sysfs_remove_file_self(struct kobject *kobj,
+					  const struct attribute *attr)
+{
+	return false;
+}
+
 static inline void sysfs_remove_files(struct kobject *kobj,
 				     const struct attribute **attr)
 {
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 11/14] pci: use device_remove_file_self() instead of device_schedule_callback()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (9 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 10/14] kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 12/14] scsi: " Tejun Heo
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo, linux-pci

driver-core now supports synchrnous self-deletion of attributes and
the asynchrnous removal mechanism is scheduled for removal.  Use it
instead of device_schedule_callback().  This makes "remove" behave
synchronously.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/pci-sysfs.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c91e6c1..94d1cb8 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -352,32 +352,20 @@ static struct device_attribute dev_rescan_attr = __ATTR(rescan,
 							(S_IWUSR|S_IWGRP),
 							NULL, dev_rescan_store);
 
-static void remove_callback(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-
-	mutex_lock(&pci_remove_rescan_mutex);
-	pci_stop_and_remove_bus_device(pdev);
-	mutex_unlock(&pci_remove_rescan_mutex);
-}
-
 static ssize_t
-remove_store(struct device *dev, struct device_attribute *dummy,
+remove_store(struct device *dev, struct device_attribute *attr,
 	     const char *buf, size_t count)
 {
-	int ret = 0;
 	unsigned long val;
 
 	if (kstrtoul(buf, 0, &val) < 0)
 		return -EINVAL;
 
-	/* An attribute cannot be unregistered by one of its own methods,
-	 * so we have to use this roundabout approach.
-	 */
-	if (val)
-		ret = device_schedule_callback(dev, remove_callback);
-	if (ret)
-		count = ret;
+	if (val && device_remove_file_self(dev, attr)) {
+		mutex_lock(&pci_remove_rescan_mutex);
+		pci_stop_and_remove_bus_device(to_pci_dev(dev));
+		mutex_unlock(&pci_remove_rescan_mutex);
+	}
 	return count;
 }
 static struct device_attribute dev_remove_attr = __ATTR(remove,
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 12/14] scsi: use device_remove_file_self() instead of device_schedule_callback()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (10 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 11/14] pci: use device_remove_file_self() instead of device_schedule_callback() Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 13/14] s390: " Tejun Heo
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo, linux-scsi

driver-core now supports synchrnous self-deletion of attributes and
the asynchrnous removal mechanism is scheduled for removal.  Use it
instead of device_schedule_callback().  This makes "delete" behave
synchronously.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/scsi_sysfs.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..33e6199 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -635,23 +635,12 @@ store_rescan_field (struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
-static void sdev_store_delete_callback(struct device *dev)
-{
-	scsi_remove_device(to_scsi_device(dev));
-}
-
 static ssize_t
 sdev_store_delete(struct device *dev, struct device_attribute *attr,
 		  const char *buf, size_t count)
 {
-	int rc;
-
-	/* An attribute cannot be unregistered by one of its own methods,
-	 * so we have to use this roundabout approach.
-	 */
-	rc = device_schedule_callback(dev, sdev_store_delete_callback);
-	if (rc)
-		count = rc;
+	if (device_remove_file_self(dev, attr))
+		scsi_remove_device(to_scsi_device(dev));
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 13/14] s390: use device_remove_file_self() instead of device_schedule_callback()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (11 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 12/14] scsi: " Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 13:57 ` [PATCH 14/14] sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() Tejun Heo
  2014-01-10 15:08 ` [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Greg KH
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo, linux390, linux-s390

driver-core now supports synchrnous self-deletion of attributes and
the asynchrnous removal mechanism is scheduled for removal.  Use it
instead of device_schedule_callback().

* Conversions in arch/s390/pci/pci_sysfs.c and
  drivers/s390/block/dcssblk.c are straightforward.

* drivers/s390/cio/ccwgroup.c is a bit more tricky because
  ccwgroup_notifier() was (ab)using device_schedule_callback() to
  purely obtain a process context to kick off ungroup operation which
  may block from a notifier callback.

  Rename ccwgroup_ungroup_callback() to ccwgroup_ungroup() and make it
  take ccwgroup_device * instead.  The new function is now called
  directly from ccwgroup_ungroup_store().

  ccwgroup_notifier() chain is updated to explicitly bounce through
  ccwgroup_device->ungroup_work.  This also removes possible failure
  from memory pressure.

Only compile-tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
---
 arch/s390/include/asm/ccwgroup.h |  1 +
 arch/s390/pci/pci_sysfs.c        | 18 ++++++++----------
 drivers/s390/block/dcssblk.c     | 14 +++++++-------
 drivers/s390/cio/ccwgroup.c      | 26 ++++++++++++++++----------
 4 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/arch/s390/include/asm/ccwgroup.h b/arch/s390/include/asm/ccwgroup.h
index 23723ce..6e670f8 100644
--- a/arch/s390/include/asm/ccwgroup.h
+++ b/arch/s390/include/asm/ccwgroup.h
@@ -23,6 +23,7 @@ struct ccwgroup_device {
 	unsigned int count;
 	struct device	dev;
 	struct ccw_device *cdev[0];
+	struct work_struct ungroup_work;
 };
 
 /**
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index cf8a12f..ab4a913 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -48,29 +48,27 @@ static ssize_t show_pfgid(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(pfgid, S_IRUGO, show_pfgid, NULL);
 
-static void recover_callback(struct device *dev)
+static ssize_t store_recover(struct device *dev, struct device_attribute *attr,
+			     const char *buf, size_t count)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct zpci_dev *zdev = get_zdev(pdev);
 	int ret;
 
+	if (!device_remove_file_self(dev, attr))
+		return count;
+
 	pci_stop_and_remove_bus_device(pdev);
 	ret = zpci_disable_device(zdev);
 	if (ret)
-		return;
+		return ret;
 
 	ret = zpci_enable_device(zdev);
 	if (ret)
-		return;
+		return ret;
 
 	pci_rescan_bus(zdev->bus);
-}
-
-static ssize_t store_recover(struct device *dev, struct device_attribute *attr,
-			     const char *buf, size_t count)
-{
-	int rc = device_schedule_callback(dev, recover_callback);
-	return rc ? rc : count;
+	return count;
 }
 static DEVICE_ATTR(recover, S_IWUSR, NULL, store_recover);
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 6eca019..2e2f454 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -304,12 +304,6 @@ dcssblk_load_segment(char *name, struct segment_info **seg_info)
 	return rc;
 }
 
-static void dcssblk_unregister_callback(struct device *dev)
-{
-	device_unregister(dev);
-	put_device(dev);
-}
-
 /*
  * device attribute for switching shared/nonshared (exclusive)
  * operation (show + store)
@@ -397,7 +391,13 @@ removeseg:
 	blk_cleanup_queue(dev_info->dcssblk_queue);
 	dev_info->gd->queue = NULL;
 	put_disk(dev_info->gd);
-	rc = device_schedule_callback(dev, dcssblk_unregister_callback);
+	up_write(&dcssblk_devices_sem);
+
+	if (device_remove_file_self(dev, attr)) {
+		device_unregister(dev);
+		put_device(dev);
+	}
+	return rc;
 out:
 	up_write(&dcssblk_devices_sem);
 	return rc;
diff --git a/drivers/s390/cio/ccwgroup.c b/drivers/s390/cio/ccwgroup.c
index 959135a..67b9dc9 100644
--- a/drivers/s390/cio/ccwgroup.c
+++ b/drivers/s390/cio/ccwgroup.c
@@ -168,14 +168,12 @@ static ssize_t ccwgroup_online_show(struct device *dev,
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
-static void ccwgroup_ungroup_callback(struct device *dev)
+static void ccwgroup_ungroup(struct ccwgroup_device *gdev)
 {
-	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
-
 	mutex_lock(&gdev->reg_mutex);
 	if (device_is_registered(&gdev->dev)) {
 		__ccwgroup_remove_symlinks(gdev);
-		device_unregister(dev);
+		device_unregister(&gdev->dev);
 		__ccwgroup_remove_cdev_refs(gdev);
 	}
 	mutex_unlock(&gdev->reg_mutex);
@@ -195,10 +193,9 @@ static ssize_t ccwgroup_ungroup_store(struct device *dev,
 		rc = -EINVAL;
 		goto out;
 	}
-	/* Note that we cannot unregister the device from one of its
-	 * attribute methods, so we have to use this roundabout approach.
-	 */
-	rc = device_schedule_callback(dev, ccwgroup_ungroup_callback);
+
+	if (device_remove_file_self(dev, attr))
+		ccwgroup_ungroup(gdev);
 out:
 	if (rc) {
 		if (rc != -EAGAIN)
@@ -224,6 +221,14 @@ static const struct attribute_group *ccwgroup_attr_groups[] = {
 	NULL,
 };
 
+static void ccwgroup_ungroup_workfn(struct work_struct *work)
+{
+	struct ccwgroup_device *gdev =
+		container_of(work, struct ccwgroup_device, ungroup_work);
+
+	ccwgroup_ungroup(gdev);
+}
+
 static void ccwgroup_release(struct device *dev)
 {
 	kfree(to_ccwgroupdev(dev));
@@ -323,6 +328,7 @@ int ccwgroup_create_dev(struct device *parent, struct ccwgroup_driver *gdrv,
 	atomic_set(&gdev->onoff, 0);
 	mutex_init(&gdev->reg_mutex);
 	mutex_lock(&gdev->reg_mutex);
+	INIT_WORK(&gdev->ungroup_work, ccwgroup_ungroup_workfn);
 	gdev->count = num_devices;
 	gdev->dev.bus = &ccwgroup_bus_type;
 	gdev->dev.parent = parent;
@@ -404,10 +410,10 @@ EXPORT_SYMBOL(ccwgroup_create_dev);
 static int ccwgroup_notifier(struct notifier_block *nb, unsigned long action,
 			     void *data)
 {
-	struct device *dev = data;
+	struct ccwgroup_device *gdev = to_ccwgroupdev(data);
 
 	if (action == BUS_NOTIFY_UNBIND_DRIVER)
-		device_schedule_callback(dev, ccwgroup_ungroup_callback);
+		schedule_work(&gdev->ungroup_work);
 
 	return NOTIFY_OK;
 }
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 14/14] sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (12 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 13/14] s390: " Tejun Heo
@ 2014-01-10 13:57 ` Tejun Heo
  2014-01-10 15:08 ` [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Greg KH
  14 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:57 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

All device_schedule_callback_owner() users are converted to use
device_remove_file_self().  Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 drivers/base/core.c    | 33 ------------------
 fs/sysfs/file.c        | 92 --------------------------------------------------
 include/linux/device.h | 11 +-----
 include/linux/sysfs.h  |  9 -----
 4 files changed, 1 insertion(+), 144 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 9db57af..4195364 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -615,39 +615,6 @@ void device_remove_bin_file(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
-/**
- * device_schedule_callback_owner - helper to schedule a callback for a device
- * @dev: device.
- * @func: callback function to invoke later.
- * @owner: module owning the callback routine
- *
- * Attribute methods must not unregister themselves or their parent device
- * (which would amount to the same thing).  Attempts to do so will deadlock,
- * since unregistration is mutually exclusive with driver callbacks.
- *
- * Instead methods can call this routine, which will attempt to allocate
- * and schedule a workqueue request to call back @func with @dev as its
- * argument in the workqueue's process context.  @dev will be pinned until
- * @func returns.
- *
- * This routine is usually called via the inline device_schedule_callback(),
- * which automatically sets @owner to THIS_MODULE.
- *
- * Returns 0 if the request was submitted, -ENOMEM if storage could not
- * be allocated, -ENODEV if a reference to @owner isn't available.
- *
- * NOTE: This routine won't work if CONFIG_SYSFS isn't set!  It uses an
- * underlying sysfs routine (since it is intended for use by attribute
- * methods), and if sysfs isn't available you'll get nothing but -ENOSYS.
- */
-int device_schedule_callback_owner(struct device *dev,
-		void (*func)(struct device *), struct module *owner)
-{
-	return sysfs_schedule_callback(&dev->kobj,
-			(void (*)(void *)) func, dev, owner);
-}
-EXPORT_SYMBOL_GPL(device_schedule_callback_owner);
-
 static void klist_children_get(struct klist_node *n)
 {
 	struct device_private *p = to_device_private_parent(n);
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1b8b91b..28cc1acd 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -453,95 +453,3 @@ void sysfs_remove_bin_file(struct kobject *kobj,
 	kernfs_remove_by_name(kobj->sd, attr->attr.name);
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_bin_file);
-
-struct sysfs_schedule_callback_struct {
-	struct list_head	workq_list;
-	struct kobject		*kobj;
-	void			(*func)(void *);
-	void			*data;
-	struct module		*owner;
-	struct work_struct	work;
-};
-
-static struct workqueue_struct *sysfs_workqueue;
-static DEFINE_MUTEX(sysfs_workq_mutex);
-static LIST_HEAD(sysfs_workq);
-static void sysfs_schedule_callback_work(struct work_struct *work)
-{
-	struct sysfs_schedule_callback_struct *ss = container_of(work,
-			struct sysfs_schedule_callback_struct, work);
-
-	(ss->func)(ss->data);
-	kobject_put(ss->kobj);
-	module_put(ss->owner);
-	mutex_lock(&sysfs_workq_mutex);
-	list_del(&ss->workq_list);
-	mutex_unlock(&sysfs_workq_mutex);
-	kfree(ss);
-}
-
-/**
- * sysfs_schedule_callback - helper to schedule a callback for a kobject
- * @kobj: object we're acting for.
- * @func: callback function to invoke later.
- * @data: argument to pass to @func.
- * @owner: module owning the callback code
- *
- * sysfs attribute methods must not unregister themselves or their parent
- * kobject (which would amount to the same thing).  Attempts to do so will
- * deadlock, since unregistration is mutually exclusive with driver
- * callbacks.
- *
- * Instead methods can call this routine, which will attempt to allocate
- * and schedule a workqueue request to call back @func with @data as its
- * argument in the workqueue's process context.  @kobj will be pinned
- * until @func returns.
- *
- * Returns 0 if the request was submitted, -ENOMEM if storage could not
- * be allocated, -ENODEV if a reference to @owner isn't available,
- * -EAGAIN if a callback has already been scheduled for @kobj.
- */
-int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
-		void *data, struct module *owner)
-{
-	struct sysfs_schedule_callback_struct *ss, *tmp;
-
-	if (!try_module_get(owner))
-		return -ENODEV;
-
-	mutex_lock(&sysfs_workq_mutex);
-	list_for_each_entry_safe(ss, tmp, &sysfs_workq, workq_list)
-		if (ss->kobj == kobj) {
-			module_put(owner);
-			mutex_unlock(&sysfs_workq_mutex);
-			return -EAGAIN;
-		}
-	mutex_unlock(&sysfs_workq_mutex);
-
-	if (sysfs_workqueue == NULL) {
-		sysfs_workqueue = create_singlethread_workqueue("sysfsd");
-		if (sysfs_workqueue == NULL) {
-			module_put(owner);
-			return -ENOMEM;
-		}
-	}
-
-	ss = kmalloc(sizeof(*ss), GFP_KERNEL);
-	if (!ss) {
-		module_put(owner);
-		return -ENOMEM;
-	}
-	kobject_get(kobj);
-	ss->kobj = kobj;
-	ss->func = func;
-	ss->data = data;
-	ss->owner = owner;
-	INIT_WORK(&ss->work, sysfs_schedule_callback_work);
-	INIT_LIST_HEAD(&ss->workq_list);
-	mutex_lock(&sysfs_workq_mutex);
-	list_add_tail(&ss->workq_list, &sysfs_workq);
-	mutex_unlock(&sysfs_workq_mutex);
-	queue_work(sysfs_workqueue, &ss->work);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(sysfs_schedule_callback);
diff --git a/include/linux/device.h b/include/linux/device.h
index 1ff3f16..fb1ba13 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -566,12 +566,6 @@ extern int __must_check device_create_bin_file(struct device *dev,
 					const struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   const struct bin_attribute *attr);
-extern int device_schedule_callback_owner(struct device *dev,
-		void (*func)(struct device *dev), struct module *owner);
-
-/* This is a macro to avoid include problems with THIS_MODULE */
-#define device_schedule_callback(dev, func)			\
-	device_schedule_callback_owner(dev, func, THIS_MODULE)
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
@@ -931,10 +925,7 @@ extern int device_online(struct device *dev);
 extern struct device *__root_device_register(const char *name,
 					     struct module *owner);
 
-/*
- * This is a macro to avoid include problems with THIS_MODULE,
- * just as per what is done for device_schedule_callback() above.
- */
+/* This is a macro to avoid include problems with THIS_MODULE */
 #define root_device_register(name) \
 	__root_device_register(name, THIS_MODULE)
 
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index bd96c60..14df054 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -178,9 +178,6 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
-int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
-			    void *data, struct module *owner);
-
 int __must_check sysfs_create_dir_ns(struct kobject *kobj, const void *ns);
 void sysfs_remove_dir(struct kobject *kobj);
 int __must_check sysfs_rename_dir_ns(struct kobject *kobj, const char *new_name,
@@ -249,12 +246,6 @@ int __must_check sysfs_init(void);
 
 #else /* CONFIG_SYSFS */
 
-static inline int sysfs_schedule_callback(struct kobject *kobj,
-		void (*func)(void *), void *data, struct module *owner)
-{
-	return -ENOSYS;
-}
-
 static inline int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
 {
 	return 0;
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
                   ` (13 preceding siblings ...)
  2014-01-10 13:57 ` [PATCH 14/14] sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() Tejun Heo
@ 2014-01-10 15:08 ` Greg KH
  2014-01-11  0:19   ` Greg KH
  14 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2014-01-10 15:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

On Fri, Jan 10, 2014 at 08:57:17AM -0500, Tejun Heo wrote:
> Hello,
> 
> This is v3 of kernfs self-removal patchset.  v2 posting mistakenly
> sent out slightly old set of patches, so the v3.  Sorry about the
> noise.  Changes from the first take[L] are,

It's really late in the -rc cycle for me to take this for 3.14, but I
see patch 1 is a good one to have, so I'll take that now, and queue the
rest up for after 3.14-rc1 is out for 3.15.  Is that ok with you, or do
you have patches that depend on this series for 3.14?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-10 15:08 ` [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Greg KH
@ 2014-01-11  0:19   ` Greg KH
  2014-01-11 18:45     ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2014-01-11  0:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

On Fri, Jan 10, 2014 at 07:08:56AM -0800, Greg KH wrote:
> On Fri, Jan 10, 2014 at 08:57:17AM -0500, Tejun Heo wrote:
> > Hello,
> > 
> > This is v3 of kernfs self-removal patchset.  v2 posting mistakenly
> > sent out slightly old set of patches, so the v3.  Sorry about the
> > noise.  Changes from the first take[L] are,
> 
> It's really late in the -rc cycle for me to take this for 3.14, but I
> see patch 1 is a good one to have, so I'll take that now, and queue the
> rest up for after 3.14-rc1 is out for 3.15.  Is that ok with you, or do
> you have patches that depend on this series for 3.14?

Oh nevermind, these are all good, now applied :)

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-11  0:19   ` Greg KH
@ 2014-01-11 18:45     ` Tejun Heo
  2014-01-13 21:17       ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-01-11 18:45 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

Hey, Greg.

On Fri, Jan 10, 2014 at 04:19:53PM -0800, Greg KH wrote:
> > It's really late in the -rc cycle for me to take this for 3.14, but I
> > see patch 1 is a good one to have, so I'll take that now, and queue the
> > rest up for after 3.14-rc1 is out for 3.15.  Is that ok with you, or do
> > you have patches that depend on this series for 3.14?
> 
> Oh nevermind, these are all good, now applied :)

I don't have anything depending on the series for the up coming merge
window so 3.15 merge window would have been fine but 3.14 merge
windows wokrs too. :)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-11 18:45     ` Tejun Heo
@ 2014-01-13 21:17       ` Tejun Heo
  2014-01-13 22:50         ` Greg KH
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-01-13 21:17 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

On Sat, Jan 11, 2014 at 01:45:13PM -0500, Tejun Heo wrote:
> Hey, Greg.
> 
> On Fri, Jan 10, 2014 at 04:19:53PM -0800, Greg KH wrote:
> > > It's really late in the -rc cycle for me to take this for 3.14, but I
> > > see patch 1 is a good one to have, so I'll take that now, and queue the
> > > rest up for after 3.14-rc1 is out for 3.15.  Is that ok with you, or do
> > > you have patches that depend on this series for 3.14?
> > 
> > Oh nevermind, these are all good, now applied :)
> 
> I don't have anything depending on the series for the up coming merge
> window so 3.15 merge window would have been fine but 3.14 merge
> windows wokrs too. :)

Greg, I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential to lead
to deadlock and that deactivate/reactivate interface is something
fundamentally flawed and that cgroup will have to work with the
remove_self() like everybody else.  IOW, I think the first posting was
correct.

I think we better defer this to the next window and I'll do the whole
thing - kernfs updates & cgroup conversion - together and then push
out patches so that I don't repeat these mistakes.

Sorry about the mess.  I got tunnel-visioned thinking about cgroup
semantics too much.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-13 21:17       ` Tejun Heo
@ 2014-01-13 22:50         ` Greg KH
  2014-01-13 22:54           ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2014-01-13 22:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

On Mon, Jan 13, 2014 at 04:17:15PM -0500, Tejun Heo wrote:
> On Sat, Jan 11, 2014 at 01:45:13PM -0500, Tejun Heo wrote:
> > Hey, Greg.
> > 
> > On Fri, Jan 10, 2014 at 04:19:53PM -0800, Greg KH wrote:
> > > > It's really late in the -rc cycle for me to take this for 3.14, but I
> > > > see patch 1 is a good one to have, so I'll take that now, and queue the
> > > > rest up for after 3.14-rc1 is out for 3.15.  Is that ok with you, or do
> > > > you have patches that depend on this series for 3.14?
> > > 
> > > Oh nevermind, these are all good, now applied :)
> > 
> > I don't have anything depending on the series for the up coming merge
> > window so 3.15 merge window would have been fine but 3.14 merge
> > windows wokrs too. :)
> 
> Greg, I'm sorry but can you please revert the whole series?
> get_active() waiting while a node is deactivated has potential to lead
> to deadlock and that deactivate/reactivate interface is something
> fundamentally flawed and that cgroup will have to work with the
> remove_self() like everybody else.  IOW, I think the first posting was
> correct.
> 
> I think we better defer this to the next window and I'll do the whole
> thing - kernfs updates & cgroup conversion - together and then push
> out patches so that I don't repeat these mistakes.
> 
> Sorry about the mess.  I got tunnel-visioned thinking about cgroup
> semantics too much.

No worries, but it is sad, I really liked seeing that odd "remove self"
function go away.  I've now reverted all 15 patches, please verify that
I didn't mess anything up in my tree somehow.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal
  2014-01-13 22:50         ` Greg KH
@ 2014-01-13 22:54           ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-13 22:54 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley, bhelgaas

Hello, Greg.

On Mon, Jan 13, 2014 at 02:50:34PM -0800, Greg KH wrote:
> No worries, but it is sad, I really liked seeing that odd "remove self"
> function go away.  I've now reverted all 15 patches, please verify that

Oh, I'll repost it and it's probably gonna be about the same as my
initial posting which was simpler and actually reduced LOC.  I just
wanna make sure cgroup conversion works fully before reposting it.  I
think it should but I've been wrong a couple times already, so...

> I didn't mess anything up in my tree somehow.

diff'ed before & after and the only difference is the new component
thing, so it looks all good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]()
  2014-01-10 13:46 [PATCHSET v2 " Tejun Heo
@ 2014-01-10 13:46 ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-01-10 13:46 UTC (permalink / raw)
  To: gregkh
  Cc: linux-kernel, schwidefsky, heiko.carstens, stern, JBottomley,
	bhelgaas, Tejun Heo

This patch implements four functions to manipulate deactivation state
- deactivate, reactivate and the _self suffixed pair.  A new fields
kernfs_node->deact_depth is added so that concurrent and nested
deactivations are handled properly.  kernfs_node->hash is moved so
that it's paired with the new field so that it doesn't increase the
size of kernfs_node.

A kernfs user's lock would normally nest inside active ref but during
removal the user may want to perform kernfs_remove() while holding the
said lock, which would introduce a reverse locking dependency.  This
function can be used to break such reverse dependency by allowing
deactivation step to performed separately outside user's critical
section.

This will also be used implement kernfs_remove_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/kernfs/dir.c        | 120 ++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/kernfs.h |   7 ++-
 2 files changed, 121 insertions(+), 6 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index f18fc5e..cc76af4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -396,6 +396,7 @@ struct kernfs_node *kernfs_new_node(struct kernfs_root *root, const char *name,
 
 	atomic_set(&kn->count, 1);
 	atomic_set(&kn->active, KN_DEACTIVATED_BIAS);
+	kn->deact_depth = 1;
 	RB_CLEAR_NODE(&kn->rb);
 
 	kn->name = name;
@@ -461,6 +462,7 @@ int kernfs_add_one(struct kernfs_node *kn, struct kernfs_node *parent)
 
 	/* Mark the entry added into directory tree */
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
+	kn->deact_depth--;
 	ret = 0;
 out_unlock:
 	mutex_unlock(&kernfs_mutex);
@@ -561,6 +563,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_dir_ops *kdops, void *priv)
 	}
 
 	atomic_sub(KN_DEACTIVATED_BIAS, &kn->active);
+	kn->deact_depth--;
 	kn->priv = priv;
 	kn->dir.root = root;
 
@@ -773,7 +776,8 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 	/* prevent any new usage under @kn by deactivating all nodes */
 	pos = NULL;
 	while ((pos = kernfs_next_descendant_post(pos, kn))) {
-		if (atomic_read(&pos->active) >= 0) {
+		if (!pos->deact_depth++) {
+			WARN_ON_ONCE(atomic_read(&pos->active) < 0);
 			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
 			pos->flags |= KERNFS_JUST_DEACTIVATED;
 		}
@@ -797,9 +801,118 @@ static void __kernfs_deactivate(struct kernfs_node *kn)
 	}
 }
 
+static void __kernfs_reactivate(struct kernfs_node *kn)
+{
+	struct kernfs_node *pos;
+
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn))) {
+		if (!--pos->deact_depth) {
+			WARN_ON_ONCE(atomic_read(&pos->active) >= 0);
+			atomic_sub(KN_DEACTIVATED_BIAS, &pos->active);
+		}
+		WARN_ON_ONCE(pos->deact_depth < 0);
+	}
+
+	/* some nodes reactivated, kick get_active waiters */
+	wake_up_all(&kernfs_root(kn)->deactivate_waitq);
+}
+
+static void __kernfs_deactivate_self(struct kernfs_node *kn)
+{
+	/*
+	 * Take out ourself out of the active ref dependency chain and
+	 * deactivate.  If we're called without an active ref, lockdep will
+	 * complain.
+	 */
+	kernfs_put_active(kn);
+	__kernfs_deactivate(kn);
+}
+
+static void __kernfs_reactivate_self(struct kernfs_node *kn)
+{
+	__kernfs_reactivate(kn);
+	/*
+	 * Restore active ref dropped by deactivate_self() so that it's
+	 * balanced on return.  put_active() will soon be called on @kn, so
+	 * this can't break anything regardless of @kn's state.
+	 */
+	atomic_inc(&kn->active);
+	if (kernfs_lockdep(kn))
+		rwsem_acquire(&kn->dep_map, 0, 1, _RET_IP_);
+}
+
+/**
+ * kernfs_deactivate - deactivate subtree of a node
+ * @kn: kernfs_node to deactivate subtree of
+ *
+ * Deactivate the subtree of @kn.  On return, there's no active operation
+ * going on under @kn and creation or renaming of a node under @kn is
+ * blocked until @kn is reactivated or removed.  This function can be
+ * called multiple times and nests properly.  Each invocation should be
+ * paired with kernfs_reactivate().
+ *
+ * For a kernfs user which uses simple locking, the subsystem lock would
+ * nest inside active reference.  This becomes problematic if the user
+ * tries to remove nodes while holding the subystem lock as it would create
+ * a reverse locking dependency from the subsystem lock to active ref.
+ * This function can be used to break such reverse dependency.  The user
+ * can call this function outside the subsystem lock and then proceed to
+ * invoke kernfs_remove() while holding the subsystem lock without
+ * introducing such reverse dependency.
+ */
+void kernfs_deactivate(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_deactivate(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_reactivate - reactivate subtree of a node
+ * @kn: kernfs_node to reactivate subtree of
+ *
+ * Undo kernfs_deactivate().
+ */
+void kernfs_reactivate(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_reactivate(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_deactivate_self - deactivate subtree of a node from its own method
+ * @kn: the self kernfs_node to deactivate subtree of
+ *
+ * The caller must be running off of a kernfs operation which is invoked
+ * with an active reference - e.g. one of kernfs_ops.  Once this function
+ * is called, @kn may be removed by someone else while the enclosing method
+ * is in progress.  Other than that, this function is equivalent to
+ * kernfs_deactivate() and should be paired with kernfs_reactivate_self().
+ */
+void kernfs_deactivate_self(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_deactivate_self(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
+/**
+ * kernfs_reactivate_self - reactivate subtree of a node from its own method
+ * @kn: the self kernfs_node to reactivate subtree of
+ *
+ * Undo kernfs_deactivate_self().
+ */
+void kernfs_reactivate_self(struct kernfs_node *kn)
+{
+	mutex_lock(&kernfs_mutex);
+	__kernfs_reactivate_self(kn);
+	mutex_unlock(&kernfs_mutex);
+}
+
 static void __kernfs_remove(struct kernfs_node *kn)
 {
-	struct kernfs_root *root = kernfs_root(kn);
 	struct kernfs_node *pos;
 
 	lockdep_assert_held(&kernfs_mutex);
@@ -844,9 +957,6 @@ static void __kernfs_remove(struct kernfs_node *kn)
 
 		kernfs_put(pos);
 	} while (pos != kn);
-
-	/* some nodes killed, kick get_active waiters */
-	wake_up_all(&root->deactivate_waitq);
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 9b5a4bb..ac86930 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -80,6 +80,8 @@ struct kernfs_elem_attr {
 struct kernfs_node {
 	atomic_t		count;
 	atomic_t		active;
+	int			deact_depth;
+	unsigned int		hash;	/* ns + name hash */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
@@ -90,7 +92,6 @@ struct kernfs_node {
 	struct rb_node		rb;
 
 	const void		*ns;	/* namespace tag */
-	unsigned int		hash;	/* ns + name hash */
 	union {
 		struct kernfs_elem_dir		dir;
 		struct kernfs_elem_symlink	symlink;
@@ -233,6 +234,10 @@ struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent,
 struct kernfs_node *kernfs_create_link(struct kernfs_node *parent,
 				       const char *name,
 				       struct kernfs_node *target);
+void kernfs_deactivate(struct kernfs_node *kn);
+void kernfs_reactivate(struct kernfs_node *kn);
+void kernfs_deactivate_self(struct kernfs_node *kn);
+void kernfs_reactivate_self(struct kernfs_node *kn);
 void kernfs_remove(struct kernfs_node *kn);
 int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 			     const void *ns);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-01-13 22:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-10 13:57 [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
2014-01-10 13:57 ` [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*() Tejun Heo
2014-01-10 13:57 ` [PATCH 02/14] kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq Tejun Heo
2014-01-10 13:57 ` [PATCH 03/14] kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() Tejun Heo
2014-01-10 13:57 ` [PATCH 04/14] kernfs: remove KERNFS_REMOVED Tejun Heo
2014-01-10 13:57 ` [PATCH 05/14] kernfs: restructure removal path to fix possible premature return Tejun Heo
2014-01-10 13:57 ` [PATCH 06/14] kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() Tejun Heo
2014-01-10 13:57 ` [PATCH 07/14] kernfs: remove kernfs_addrm_cxt Tejun Heo
2014-01-10 13:57 ` [PATCH 08/14] kernfs: make kernfs_get_active() block if the node is deactivated but not removed Tejun Heo
2014-01-10 13:57 ` [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]() Tejun Heo
2014-01-10 13:57 ` [PATCH 10/14] kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers Tejun Heo
2014-01-10 13:57 ` [PATCH 11/14] pci: use device_remove_file_self() instead of device_schedule_callback() Tejun Heo
2014-01-10 13:57 ` [PATCH 12/14] scsi: " Tejun Heo
2014-01-10 13:57 ` [PATCH 13/14] s390: " Tejun Heo
2014-01-10 13:57 ` [PATCH 14/14] sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() Tejun Heo
2014-01-10 15:08 ` [PATCHSET v3 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Greg KH
2014-01-11  0:19   ` Greg KH
2014-01-11 18:45     ` Tejun Heo
2014-01-13 21:17       ` Tejun Heo
2014-01-13 22:50         ` Greg KH
2014-01-13 22:54           ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2014-01-10 13:46 [PATCHSET v2 " Tejun Heo
2014-01-10 13:46 ` [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]() Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).