From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751223AbdBUBqT (ORCPT ); Mon, 20 Feb 2017 20:46:19 -0500 Received: from out03.mta.xmission.com ([166.70.13.233]:36523 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750928AbdBUBqR (ORCPT ); Mon, 20 Feb 2017 20:46:17 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Al Viro Cc: Konstantin Khlebnikov , linux-kernel@vger.kernel.org, Andrew Morton References: <20170209084016.GL13195@ZenIV.linux.org.uk> <148671210259.52694.13774349516906955456.stgit@buzz> <20170219084202.GJ29622@ZenIV.linux.org.uk> Date: Tue, 21 Feb 2017 14:41:34 +1300 In-Reply-To: <20170219084202.GJ29622@ZenIV.linux.org.uk> (Al Viro's message of "Sun, 19 Feb 2017 08:42:12 +0000") Message-ID: <87a89gqq0h.fsf_-_@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1cfzWr-0000uv-Nm;;;mid=<87a89gqq0h.fsf_-_@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=101.100.131.232;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19D+RMGgBR9r8qGabSzaQYvYDZk/InoaIA= X-SA-Exim-Connect-IP: 101.100.131.232 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 1.5 TR_Symld_Words too many words that have symbols inside * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] * 0.5 XM_Body_Dirty_Words Contains a dirty word * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.0 T_TooManySym_02 5+ unique symbols in subject * 0.0 T_TooManySym_03 6+ unique symbols in subject X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Al Viro X-Spam-Relay-Country: X-Spam-Timing: total 5740 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 2.9 (0.1%), b_tie_ro: 2.0 (0.0%), parse: 1.18 (0.0%), extract_message_metadata: 20 (0.3%), get_uri_detail_list: 3.3 (0.1%), tests_pri_-1000: 8 (0.1%), tests_pri_-950: 1.13 (0.0%), tests_pri_-900: 0.94 (0.0%), tests_pri_-400: 29 (0.5%), check_bayes: 28 (0.5%), b_tokenize: 10 (0.2%), b_tok_get_all: 10 (0.2%), b_comp_prob: 2.8 (0.0%), b_tok_touch_all: 3.8 (0.1%), b_finish: 0.67 (0.0%), tests_pri_0: 296 (5.2%), check_dkim_signature: 0.52 (0.0%), check_dkim_adsp: 2.9 (0.1%), tests_pri_500: 5377 (93.7%), poll_dns_idle: 5367 (93.5%), rewrite_mail: 0.00 (0.0%) Subject: [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock. X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Konstantin Khlebnikov writes: > This patch has locking problem. I've got lockdep splat under LTP. > > [ 6633.115456] ====================================================== > [ 6633.115502] [ INFO: possible circular locking dependency detected ] > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L > [ 6633.115584] ------------------------------------------------------- > [ 6633.115627] ksm02/284980 is trying to acquire lock: > [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [] igrab+0x1e/0x80 > [ 6633.115834] but task is already holding lock: > [ 6633.115882] (sysctl_lock){+.+...}, at: [] unregister_sysctl_table+0x6b/0x110 > [ 6633.116026] which lock already depends on the new lock. > [ 6633.116026] > [ 6633.116080] > [ 6633.116080] the existing dependency chain (in reverse order) is: > [ 6633.116117] > -> #2 (sysctl_lock){+.+...}: > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}: > -> #0 (&sb->s_type->i_lock_key#4){+.+...}: > > d_lock nests inside i_lock > sysctl_lock nests inside d_lock in d_compare > > This patch adds i_lock nesting inside sysctl_lock. Al Viro replied: > Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(), > drop sysctl_lock() before it and regain after. Make sure that no inodes > are added to the list ones ->unregistering has been set and use RCU list > primitives for modifying the inode list, with sysctl_lock still used to > serialize its modifications. > > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing > igrab() is safe there. Since we don't drop inode reference until after we'd > passed beyond it in the list, list_for_each_entry_rcu() should be fine. I agree with Al Viro's analsysis of the situtation. Fixes: 802e348c6b77 ("proc/sysctl: prune stale dentries during unregistering") Reported-by: Konstantin Khlebnikov Suggested-by: Al Viro Signed-off-by: "Eric W. Biederman" --- This is my cleaned up version of Al Viro's proposed fix. I have tested it and the lockdep warnings go away, and I have fixed a few trivial to ensure things work as intended. Unless someone sees a problem I am going to add this fix to my tree and then send a pull request to Linus. fs/proc/proc_sysctl.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 8efb1e10b025..3e64c6502dc8 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -266,21 +266,19 @@ static void proc_sys_prune_dcache(struct ctl_table_header *head) struct inode *inode, *prev = NULL; struct proc_inode *ei; - list_for_each_entry(ei, &head->inodes, sysctl_inodes) { + rcu_read_lock(); + list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) { inode = igrab(&ei->vfs_inode); if (inode) { - spin_unlock(&sysctl_lock); + rcu_read_unlock(); iput(prev); prev = inode; d_prune_aliases(inode); - spin_lock(&sysctl_lock); + rcu_read_lock(); } } - if (prev) { - spin_unlock(&sysctl_lock); - iput(prev); - spin_lock(&sysctl_lock); - } + rcu_read_unlock(); + iput(prev); } /* called under sysctl_lock, will reacquire if has to wait */ @@ -296,10 +294,10 @@ static void start_unregistering(struct ctl_table_header *p) p->unregistering = &wait; spin_unlock(&sysctl_lock); wait_for_completion(&wait); - spin_lock(&sysctl_lock); } else { /* anything non-NULL; we'll never dereference it */ p->unregistering = ERR_PTR(-EINVAL); + spin_unlock(&sysctl_lock); } /* * Prune dentries for unregistered sysctls: namespaced sysctls @@ -310,6 +308,7 @@ static void start_unregistering(struct ctl_table_header *p) * do not remove from the list until nobody holds it; walking the * list in do_sysctl() relies on that. */ + spin_lock(&sysctl_lock); erase_header(p); } @@ -455,11 +454,17 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, inode->i_ino = get_next_ino(); ei = PROC_I(inode); - ei->sysctl = head; - ei->sysctl_entry = table; spin_lock(&sysctl_lock); - list_add(&ei->sysctl_inodes, &head->inodes); + if (unlikely(head->unregistering)) { + spin_unlock(&sysctl_lock); + iput(inode); + inode = NULL; + goto out; + } + ei->sysctl = head; + ei->sysctl_entry = table; + list_add_rcu(&ei->sysctl_inodes, &head->inodes); head->count++; spin_unlock(&sysctl_lock); @@ -487,7 +492,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) { spin_lock(&sysctl_lock); - list_del(&PROC_I(inode)->sysctl_inodes); + list_del_rcu(&PROC_I(inode)->sysctl_inodes); if (!--head->count) kfree_rcu(head, rcu); spin_unlock(&sysctl_lock); -- 2.10.1