All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock.
Date: Tue, 21 Feb 2017 14:41:34 +1300	[thread overview]
Message-ID: <87a89gqq0h.fsf_-_@xmission.com> (raw)
In-Reply-To: <20170219084202.GJ29622@ZenIV.linux.org.uk> (Al Viro's message of "Sun, 19 Feb 2017 08:42:12 +0000")


Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G             L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659]  (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882]  (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.

Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine.  So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after.  Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there.  Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.

I agree with Al Viro's analsysis of the situtation.

Fixes: 802e348c6b77 ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

This is my cleaned up version of Al Viro's proposed fix.
I have tested it and the lockdep warnings go away, and
I have fixed a few trivial to ensure things work as intended.

Unless someone sees a problem I am going to add this fix to my tree and
then send a pull request to Linus.

 fs/proc/proc_sysctl.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8efb1e10b025..3e64c6502dc8 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -266,21 +266,19 @@ static void proc_sys_prune_dcache(struct ctl_table_header *head)
 	struct inode *inode, *prev = NULL;
 	struct proc_inode *ei;
 
-	list_for_each_entry(ei, &head->inodes, sysctl_inodes) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
 		inode = igrab(&ei->vfs_inode);
 		if (inode) {
-			spin_unlock(&sysctl_lock);
+			rcu_read_unlock();
 			iput(prev);
 			prev = inode;
 			d_prune_aliases(inode);
-			spin_lock(&sysctl_lock);
+			rcu_read_lock();
 		}
 	}
-	if (prev) {
-		spin_unlock(&sysctl_lock);
-		iput(prev);
-		spin_lock(&sysctl_lock);
-	}
+	rcu_read_unlock();
+	iput(prev);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -296,10 +294,10 @@ static void start_unregistering(struct ctl_table_header *p)
 		p->unregistering = &wait;
 		spin_unlock(&sysctl_lock);
 		wait_for_completion(&wait);
-		spin_lock(&sysctl_lock);
 	} else {
 		/* anything non-NULL; we'll never dereference it */
 		p->unregistering = ERR_PTR(-EINVAL);
+		spin_unlock(&sysctl_lock);
 	}
 	/*
 	 * Prune dentries for unregistered sysctls: namespaced sysctls
@@ -310,6 +308,7 @@ static void start_unregistering(struct ctl_table_header *p)
 	 * do not remove from the list until nobody holds it; walking the
 	 * list in do_sysctl() relies on that.
 	 */
+	spin_lock(&sysctl_lock);
 	erase_header(p);
 }
 
@@ -455,11 +454,17 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	inode->i_ino = get_next_ino();
 
 	ei = PROC_I(inode);
-	ei->sysctl = head;
-	ei->sysctl_entry = table;
 
 	spin_lock(&sysctl_lock);
-	list_add(&ei->sysctl_inodes, &head->inodes);
+	if (unlikely(head->unregistering)) {
+		spin_unlock(&sysctl_lock);
+		iput(inode);
+		inode = NULL;
+		goto out;
+	}
+	ei->sysctl = head;
+	ei->sysctl_entry = table;
+	list_add_rcu(&ei->sysctl_inodes, &head->inodes);
 	head->count++;
 	spin_unlock(&sysctl_lock);
 
@@ -487,7 +492,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
-	list_del(&PROC_I(inode)->sysctl_inodes);
+	list_del_rcu(&PROC_I(inode)->sysctl_inodes);
 	if (!--head->count)
 		kfree_rcu(head, rcu);
 	spin_unlock(&sysctl_lock);
-- 
2.10.1

  reply	other threads:[~2017-02-21  1:46 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-08 10:48 [PATCH] proc/sysctl: drop unregistered stale dentries as soon as possible Konstantin Khlebnikov
2017-02-08 21:48 ` Andrew Morton
2017-02-09  3:53   ` Al Viro
2017-02-09  7:36     ` Konstantin Khlebnikov
2017-02-09  8:40       ` Al Viro
2017-02-10  7:35         ` [PATCH] proc/sysctl: prune stale dentries during unregistering Konstantin Khlebnikov
2017-02-10  7:47           ` Al Viro
2017-02-10  7:54             ` Konstantin Khlebnikov
2017-02-13  9:54               ` Eric W. Biederman
2017-02-18 18:55           ` Konstantin Khlebnikov
2017-02-19  8:42             ` Al Viro
2017-02-21  1:41               ` Eric W. Biederman [this message]
2017-02-21  8:40                 ` [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock Konstantin Khlebnikov
2017-02-21 19:29                   ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a89gqq0h.fsf_-_@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.