From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756883AbcKKS3D (ORCPT ); Fri, 11 Nov 2016 13:29:03 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:39282 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756014AbcKKS3C (ORCPT ); Fri, 11 Nov 2016 13:29:02 -0500 Date: Fri, 11 Nov 2016 10:28:56 -0800 From: "Paul E. McKenney" To: Guenter Roeck Cc: linux-kernel@vger.kernel.org Subject: Re: RCU related tracebacks in various boot tests in -next Reply-To: paulmck@linux.vnet.ibm.com References: <20161111143243.GA13198@roeck-us.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161111143243.GA13198@roeck-us.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16111118-0016-0000-0000-000005279B3C X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006059; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000189; SDB=6.00779645; UDB=6.00375664; IPR=6.00556964; BA=6.00004872; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013294; XFM=3.00000011; UTC=2016-11-11 18:28:59 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16111118-0017-0000-0000-00003492B9CF Message-Id: <20161111182856.GH4127@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-11-11_08:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1611110327 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 11, 2016 at 06:32:43AM -0800, Guenter Roeck wrote: > Hi, > > I see a number of RCU related tracebacks in recent versions of -next. > The problem was first seen in next-20161109. > > I didn't bisect, but my wild guess is to blame Paul's related changes. > Please let me know if a bisect is needed. > > Detailed logs are at http://kerneltests.org/builders. > Build/test scripts are in the various subdirectories of > https://github.com/groeck/linux-build-test/tree/master/rootfs/. > > Thanks, > Guenter > > -------- This is a failure on my part to convert fully to the new scheme, will fix. (The third trace makes the situation clear, this one at first glance looks like a failure to invoke irq_enter(), but it is most likely another symptom of the problem in the third trace.) > arm: > > [ 1.699096] =============================== > [ 1.699188] suspicious RCU usage. ] > [ 1.699340] 4.9.0-rc4-next-20161111 #4 Not tainted > [ 1.699462] ------------------------------- > [ 1.699554] ./include/trace/events/irq.h:69 suspicious rcu_dereference_check() usage! > [ 1.699707] > [ 1.699707] other info that might help us debug this: > [ 1.699707] > [ 1.699890] > [ 1.699890] RCU used illegally from idle CPU! > [ 1.699890] rcu_scheduler_active = 1, debug_locks = 0 > [ 1.700103] RCU used illegally from extended quiescent state! > [ 1.700256] no locks held by swapper/0/0. > [ 1.700317] > [ 1.700317] stack backtrace: > [ 1.700500] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #4 > [ 1.700653] Hardware name: Generic OMAP3 (Flattened Device Tree) > [ 1.700988] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) > [ 1.701171] [] (show_stack) from [] (dump_stack+0xb0/0xe4) > [ 1.701354] [] (dump_stack) from [] (__handle_irq_event_percpu+0x2fc/0x3d0) > [ 1.701538] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x1c/0x58) > [ 1.701751] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x38/0x5c) > [ 1.701934] [] (handle_irq_event) from [] (handle_level_irq+0xb8/0x14c) > [ 1.702087] [] (handle_level_irq) from [] (generic_handle_irq+0x20/0x34) > [ 1.702270] [] (generic_handle_irq) from [] (__handle_domain_irq+0x64/0xe0) > [ 1.702453] [] (__handle_domain_irq) from [] (__irq_svc+0x70/0x98) > [ 1.702606] [] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c) > [ 1.702789] [] (arch_cpu_idle) from [] (cpu_startup_entry+0x180/0x224) > [ 1.702941] [] (cpu_startup_entry) from [] (start_kernel+0x344/0x3b8) > [ 1.703186] [] (start_kernel) from [<8000807c>] (0x8000807c) > > arm64: For this second one.... > [ 1.247359] ------------[ cut here ]------------ > [ 1.247761] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:380 rcu_momentary_dyntick_idle+0xc0/0xc8 > [ 1.247884] Modules linked in: > [ 1.247987] > [ 1.248150] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc4-next-20161111 #1 > [ 1.248241] Hardware name: ZynqMP EP108 (DT) > [ 1.248350] task: ffff80007c4d1900 task.stack: ffff80007c4e0000 > [ 1.248445] PC is at rcu_momentary_dyntick_idle+0xc0/0xc8 > [ 1.248523] LR is at rcu_note_context_switch+0x1b4/0x2c0 > [ 1.248595] pc : [] lr : [] pstate: 600000c5 > [ 1.248676] sp : ffff80007c4e3ea0 > [ 1.248738] x29: ffff80007c4e3ea0 x28: ffff000008d88000 > [ 1.248842] x27: ffff80007c4e0000 x26: ffff000008da4803 > [ 1.248919] x25: 0000000000000000 x24: ffff80007c4e0000 > [ 1.248992] x23: ffff0000088b3544 x22: ffff80007c4d1900 > [ 1.249065] x21: ffff80007efb1880 x20: ffff000008d88000 > [ 1.249138] x19: ffff80007c4d1900 x18: 000000000000000e > [ 1.249211] x17: 0000000000000007 x16: 0000000000000001 > [ 1.249285] x15: 000000000000004c x14: 0000000000000000 > [ 1.249362] x13: 000000000000000b x12: 0000000000000006 > [ 1.249435] x11: ffff0000088e0ad8 x10: ffff80007efb18b0 > [ 1.249586] x9 : ffff0000088e0ad4 x8 : 00000000029f6300 > [ 1.249715] x7 : 7fffffffffffffff x6 : ffff80007efabbd0 > [ 1.249839] x5 : 0000000000000004 x4 : ffff000008da0688 > [ 1.249963] x3 : 0000000000000004 x2 : ffff80007efabbdc > [ 1.250088] x1 : 0000000000000001 x0 : ffff000008da4811 > [ 1.250213] > [ 1.250325] ---[ end trace 2930a57464ba65da ]--- > [ 1.250551] Call trace: > [ 1.251134] Exception stack(0xffff80007c4e3cd0 to 0xffff80007c4e3e00) > [ 1.251396] 3cc0: ffff80007c4d1900 0001000000000000 > [ 1.251605] 3ce0: ffff80007c4e3ea0 ffff000008114df0 ffff000008e6c388 ffff80007c4e3f80 > [ 1.251790] 3d00: ffff80007c4e3d90 ffff00000838b804 ffff000008e6c3ac ffff80007c4e3fd0 > [ 1.251972] 3d20: 00000000ffffffd8 ffff000008910b8c 00000000000003e0 ffff000008b4c1c1 > [ 1.252151] 3d40: ffff000008e6c388 ffff80007c4e3fd0 ffff000008b4c1bd ffff000008107264 > [ 1.252331] 3d60: ffff80007c4e3d90 ffff00000838b4e0 ffff000008da4811 0000000000000001 > [ 1.252509] 3d80: ffff80007efabbdc 0000000000000004 ffff000008da0688 0000000000000004 > [ 1.252688] 3da0: ffff80007efabbd0 7fffffffffffffff 00000000029f6300 ffff0000088e0ad4 > [ 1.252866] 3dc0: ffff80007efb18b0 ffff0000088e0ad8 0000000000000006 000000000000000b > [ 1.253043] 3de0: 0000000000000000 000000000000004c 0000000000000001 0000000000000007 > [ 1.253269] [] rcu_momentary_dyntick_idle+0xc0/0xc8 > [ 1.253442] [] __schedule+0x68/0x570 > [ 1.253598] [] schedule+0x3c/0xa8 > [ 1.253726] [] schedule_preempt_disabled+0x20/0x38 > [ 1.253884] [] cpu_startup_entry+0xd4/0x1d8 > [ 1.254026] [] secondary_start_kernel+0x158/0x198 > [ 1.254268] [<00000000008bd1a4>] 0x8bd1a4 Looks like I got too excited about adding WARN_ON_ONCE(), will fix. > s390: > > =============================== > > suspicious RCU usage. ] > 4.9.0-rc4-next-20161111 #1 Not tainted > ------------------------------- > ./include/trace/events/rcu.h:454 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > RCU used illegally from idle CPU! > rcu_scheduler_active = 1, debug_locks = 0 > RCU used illegally from extended quiescent state! > no locks held by swapper/0/0. > stack backtrace: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #1 > Hardware name: QEMU QEMU QEMU (KVM) > Stack: > 000000001fffbcb0 000000001fffbd40 0000000000000003 0000000000000000 > 000000001fffbde0 000000001fffbd58 000000001fffbd58 0000000000000020 > 0000000001e20b70 ffffffffffffffff 0000000000000000 0404c0010000000a > 000000000000000c 000000001fffbda8 0000000000000000 0000000000000000 > 0000000000000000 00000000001168f4 000000001fffbd40 000000001fffbd88 > > Call Trace: > ([<00000000001167b0>] show_trace+0x100/0x104) > [<0000000000116896>] show_stack+0x6e/0xec > [<00000000007985e0>] dump_stack+0x94/0xd4 > [<000000000021882e>] rcu_eqs_exit_common.isra.49+0x1ee/0x21c > [<000000000022172a>] rcu_irq_enter+0x1ca/0x284 > [<0000000000182178>] irq_enter+0x40/0xc0 > [<000000000010dcca>] do_IRQ+0x5a/0x94 [<0000000000ad234e>] ext_int_handler+0x11e/0x124 > [<0000000000103c50>] enabled_wait+0x64/0x110 > ([<0000000000103c3c>] enabled_wait+0x50/0x110) > [<0000000000103eca>] arch_cpu_idle+0x42/0x54 > [<0000000000ad0576>] default_idle_call+0x4e/0x64 > [<00000000001ddcb0>] cpuidle_idle_call+0x54/0xa4 > [<00000000001dde62>] cpu_startup_entry+0xea/0x12c > [<0000000000eaec82>] start_kernel+0x3ee/0x3fc > [<0000000000100020>] _stext+0x20/0x80 And here I failed to convert one of the accesses to the ->dynticks counter to the new scheme. Again, will fix, patch below for the -next you were using. And thank you for your testing efforts! Thanx, Paul ------------------------------------------------------------------------ commit ee17957e96a196d3ea70a4b1c20ca09f8394ce66 Author: Paul E. McKenney Date: Fri Nov 11 10:24:56 2016 -0800 fixup! rcu: Abstract the dynticks snapshot operation diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index c1e3d4a333b2..03f7bcbc2e99 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1139,7 +1139,9 @@ void rcu_nmi_exit(void) */ bool notrace __rcu_is_watching(void) { - return atomic_read(this_cpu_ptr(&rcu_dynticks.dynticks)) & 0x1; + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); + + return rcu_dynticks_in_eqs(rcu_dynticks_snap(rdtp)); } /**