From: "Paul E. McKenney"
Date: Tue, 18 Apr 2017 08:17:00 -0700
To: Johannes Berg
Cc: Nicolai Stange, Greg Kroah-Hartman, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170418151700.GU3956@linux.vnet.ibm.com>
In-Reply-To: <1492522832.18845.1.camel@sipsolutions.net>

On Tue, Apr 18, 2017 at 03:40:32PM +0200, Johannes Berg wrote:
> On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> > >
> > > > If you have not already done so, please run this with debug
> > > > enabled, especially CONFIG_PROVE_LOCKING=y (which implies
> > > > CONFIG_PROVE_RCU=y).
> > > > This is important because there are configurations for which the
> > > > deadlocks you saw with SRCU turn into silent failure, including
> > > > memory corruption.
> > > > CONFIG_PROVE_RCU=y will catch many of those situations.
> > >
> > > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > > enabled in the builds where we saw the problem, but I'm not sure.
> >
> > CONFIG_PROVE_RCU=y will reliably catch things like this:
> >
> > 1.      rcu_read_lock();
> >         synchronize_rcu();
> >         rcu_read_unlock();
>
> Ok, that's not something that happens here either.
>
> > 2.      rcu_read_lock();
> >         schedule_timeout_interruptible(HZ);
> >         rcu_read_unlock();
>
> Neither is this happening.
>
> > There are more, but this should get you the flavor of the types
> > of bugs CONFIG_PROVE_RCU=y can locate for you.
>
> Makes sense. However, the issue at hand is what we (you and I)
> discussed earlier wrt. lockdep -- from SRCU's point of view everything
> is actually OK, except that the one thread is waiting for something and
> we can never finish the grace period, and thus synchronize_srcu() will
> never return. But there's no real SRCU bug here.
>
> > > Nicolai probably never even ran into this problem, though it should
> > > be easy to reproduce.
> >
> > I am just worried that the situation resulting in the earlier SRCU
> > deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> > CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n.  Or some other bug
> > hiding behind some other set of Kconfig options.
>
> There's no SRCU deadlock though. I know exactly why it happens, in my
> case, which is the following:
>
> Thread 1
>         userspace: read(debugfs_file_1)
>         srcu_read_lock(&debugfs_srcu); // in debugfs bowels
>         wait_event_interruptible(...); // in my driver's debugfs read method
>
> Thread 2:
>         debugfs_remove(debugfs_file_2);
>         srcu_synchronize(&debugfs_srcu); // in debugfs bowels
>
>
> This is the live-lock. The deadlock is something I posited but never
> ran into:
>
> CPU 1                           CPU 2
> srcu_read_lock(&debugfs_srcu);
>                                 rtnl_lock();
> rtnl_lock();
>                                 srcu_synchronize(&debugfs_srcu);
>
> Again, no (S)RCU abuse here, just an ABBA deadlock.

OK, please accept my apologies for failing to follow the thread.

I nevertheless reiterate my advice to run at least some tests with
CONFIG_PROVE_RCU=y.  And yes, it would be good to upgrade lockdep to
find the above theoretical deadlock.

                                                        Thanx, Paul
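
For illustration only, the CPU 1 / CPU 2 ordering quoted above can be
modelled in userspace as a lock-order graph. This is a rough sketch of
the kind of check being asked of lockdep, not a description of how
lockdep works or of any proposed patch. It assumes an SRCU read-side
section can be treated as "holding" a pseudo-lock, and that the call
written as srcu_synchronize() above (the kernel function is
synchronize_srcu()) can be treated as a blocking acquisition of that
pseudo-lock. Every identifier in the block is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this sketch; not kernel identifiers. */
enum lock_class { DEBUGFS_SRCU, RTNL, NR_CLASSES };

/* edge[a][b] is set once some context blocked on b while holding a. */
struct order_graph {
	bool edge[NR_CLASSES][NR_CLASSES];
};

/* One execution context (thread or CPU) and the classes it holds. */
struct context {
	bool held[NR_CLASSES];
};

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record that ctx is about to block on "want" while holding whatever it
 * holds.  Returns true if the new held->want edge closes a cycle.
 */
static bool block_on(struct order_graph *g, const struct context *ctx,
		     int want)
{
	bool cycle = false;

	assert(want >= 0 && want < NR_CLASSES);
	for (int h = 0; h < NR_CLASSES; h++) {
		bool seen[NR_CLASSES] = { false };

		if (!ctx->held[h] || h == want)
			continue;
		if (reachable(g, want, h, seen))
			cycle = true;
		g->edge[h][want] = true;
	}
	return cycle;
}

/* The interleaving from the mail: CPU 2 holds rtnl while synchronizing. */
int abba_cycle_detected(void)
{
	struct order_graph g;
	struct context cpu1, cpu2;
	bool cycle = false;

	memset(&g, 0, sizeof(g));
	memset(&cpu1, 0, sizeof(cpu1));
	memset(&cpu2, 0, sizeof(cpu2));

	cpu1.held[DEBUGFS_SRCU] = true;		/* srcu_read_lock(): never blocks */
	cycle |= block_on(&g, &cpu1, RTNL);	/* rtnl_lock() inside the reader */

	cycle |= block_on(&g, &cpu2, RTNL);	/* rtnl_lock() */
	cpu2.held[RTNL] = true;
	cycle |= block_on(&g, &cpu2, DEBUGFS_SRCU); /* synchronize_srcu() */
	return cycle;
}

/* Same reader, but the updater synchronizes without holding rtnl. */
int cycle_without_rtnl_held(void)
{
	struct order_graph g;
	struct context cpu1, cpu2;
	bool cycle = false;

	memset(&g, 0, sizeof(g));
	memset(&cpu1, 0, sizeof(cpu1));
	memset(&cpu2, 0, sizeof(cpu2));

	cpu1.held[DEBUGFS_SRCU] = true;
	cycle |= block_on(&g, &cpu1, RTNL);
	cycle |= block_on(&g, &cpu2, DEBUGFS_SRCU); /* nothing held: no edge */
	return cycle;
}
```

In this model abba_cycle_detected() reports a cycle because the edges
DEBUGFS_SRCU -> RTNL and RTNL -> DEBUGFS_SRCU are both recorded, even
though the two sequences are only replayed one after the other and never
actually deadlock. cycle_without_rtnl_held() reports none. The live-lock
case (a reader sleeping in wait_event_interruptible()) is not a lock-order
cycle and is outside what this sketch can flag.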