From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Apr 2017 06:31:36 -0700
From: "Paul E. McKenney"
To: Johannes Berg
Cc: Nicolai Stange, Greg Kroah-Hartman, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Reply-To: paulmck@linux.vnet.ibm.com
References: <871stdyg0u.fsf@gmail.com>
 <20170416095137.2784-1-nicstange@gmail.com>
 <20170416095137.2784-10-nicstange@gmail.com>
 <20170417160121.GP3956@linux.vnet.ibm.com>
 <1492508367.2472.9.camel@sipsolutions.net>
In-Reply-To: <1492508367.2472.9.camel@sipsolutions.net>
Message-Id: <20170418133136.GS3956@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
>
> > If you have not already done so, please run this with debug enabled,
> > especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
> > This is important because there are configurations for which the
> > deadlocks you saw with SRCU turn into silent failure, including
> > memory corruption.
> > CONFIG_PROVE_RCU=y will catch many of those situations.
>
> Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> enabled in the builds where we saw the problem, but I'm not sure.

CONFIG_PROVE_RCU=y will reliably catch things like this:

1.  rcu_read_lock();
    synchronize_rcu();
    rcu_read_unlock();

    With CONFIG_PROVE_RCU=n and CONFIG_PREEMPT=n, this will result in
    too-short grace periods, which can free things out from under the
    read-side critical section, which in turn can result in arbitrary
    memory corruption.  You might not even get a "scheduling while
    atomic", though CONFIG_PREEMPT_COUNT=y will produce this message.

    With CONFIG_PREEMPT=y, on the other hand, this should deadlock in
    a manner similar to the earlier SRCU deadlocks seen in debugfs.

2.  rcu_read_lock();
    schedule_timeout_interruptible(HZ);
    rcu_read_unlock();

    With CONFIG_PROVE_RCU=y and CONFIG_PREEMPT=y, this will just work,
    more or less.  Until someone runs with CONFIG_PREEMPT=n, which
    will produce "scheduling while atomic".  (I have a fix for this
    queued for 4.13, FWIW, so that in the future CONFIG_PROVE_RCU=y
    and CONFIG_PREEMPT=y will complain about this.  But for now,
    silent bug.)

There are more, but this should get you the flavor of the types of
bugs CONFIG_PROVE_RCU=y can locate for you.

> Can you say which configurations you're thinking of? And perhaps what
> kind of corruption you're thinking of also? I'm having a hard time
> imagining any corruption that should happen?

#1 is the silent corruption case given CONFIG_PROVE_RCU=n,
CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n.

> Nicolai probably never even ran into this problem, though it should be
> easy to reproduce.

I am just worried that the situation resulting in the earlier SRCU
deadlocks might be hiding behind CONFIG_PROVE_RCU=n, CONFIG_PREEMPT=n,
and CONFIG_PREEMPT_COUNT=n.  Or some other bug hiding behind some
other set of Kconfig options.

							Thanx, Paul
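
[Editor's sketch, not part of the original message.] The outcomes
described above for pattern #1 (synchronize_rcu() called inside an
rcu_read_lock() section) can be summarized as a small decision table.
What follows is a user-space toy in C, not kernel code. Every
identifier in it (toy_outcome, toy_pattern1_outcome, and the enum
values) is hypothetical and invented for illustration, and the order
in which the options are checked is only one reading of the text
above. In particular, the message does not say whether a
CONFIG_PROVE_RCU=y kernel would also go on to deadlock, so the sketch
simply reports "caught" in that case.

```c
#include <stdbool.h>

/*
 * Toy model of the Kconfig-dependent outcomes described for:
 *     rcu_read_lock(); synchronize_rcu(); rcu_read_unlock();
 * Hypothetical names throughout; this does not call any kernel API.
 */
enum toy_outcome {
    TOY_REPORTED_BY_PROVE_RCU, /* CONFIG_PROVE_RCU=y catches the bug */
    TOY_DEADLOCK,              /* CONFIG_PREEMPT=y: expected to deadlock */
    TOY_SCHED_WHILE_ATOMIC,    /* message printed; the too-short grace
                                * period is still possible */
    TOY_SILENT_CORRUPTION,     /* no diagnostic, too-short grace period */
};

/* Each parameter stands in for the Kconfig option of the same name. */
enum toy_outcome toy_pattern1_outcome(bool prove_rcu, bool preempt,
                                      bool preempt_count)
{
    if (prove_rcu)
        return TOY_REPORTED_BY_PROVE_RCU;
    if (preempt)
        return TOY_DEADLOCK;
    if (preempt_count)
        return TOY_SCHED_WHILE_ATOMIC;
    /* PROVE_RCU=n, PREEMPT=n, PREEMPT_COUNT=n: the silent case. */
    return TOY_SILENT_CORRUPTION;
}
```

Calling toy_pattern1_outcome(false, false, false) selects the silent
corruption case, which is the combination the message is worried about.
The table covers pattern #1 only; pattern #2 is left out because its
behavior under each option is described less precisely above.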