From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932523AbbA3T7F (ORCPT ); Fri, 30 Jan 2015 14:59:05 -0500
Received: from aserp1040.oracle.com ([141.146.126.69]:17630 "EHLO
        aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1759811AbbA3T7B (ORCPT );
        Fri, 30 Jan 2015 14:59:01 -0500
Message-ID: <54CBE23F.3010003@oracle.com>
Date: Fri, 30 Jan 2015 14:57:51 -0500
From: Sasha Levin
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Andy Lutomirski, Paul McKenney
CC: Borislav Petkov, X86 ML, Linus Torvalds,
        "linux-kernel@vger.kernel.org", Peter Zijlstra, Oleg Nesterov,
        Tony Luck, Andi Kleen, Josh Triplett, Frédéric Weisbecker
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context
References: <7665538633a500255d7da9ca5985547f6a2aa191.1416604491.git.luto@amacapital.net>
        <54C17139.1040706@oracle.com> <20150123180455.GA3192@pd.tnic>
        <54C2B396.9090106@oracle.com> <20150128174817.GQ19109@linux.vnet.ibm.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Source-IP: acsinet21.oracle.com [141.146.126.237]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/28/2015 04:02 PM, Andy Lutomirski wrote:
> On Wed, Jan 28, 2015 at 9:48 AM, Paul E. McKenney wrote:
>> On Wed, Jan 28, 2015 at 08:33:06AM -0800, Andy Lutomirski wrote:
>>> On Fri, Jan 23, 2015 at 5:25 PM, Andy Lutomirski wrote:
>>>> On Fri, Jan 23, 2015 at 12:48 PM, Sasha Levin wrote:
>>>>> On 01/23/2015 01:34 PM, Andy Lutomirski wrote:
>>>>>> On Fri, Jan 23, 2015 at 10:04 AM, Borislav Petkov wrote:
>>>>>>> On Fri, Jan 23, 2015 at 09:58:01AM -0800, Andy Lutomirski wrote:
>>>>>>>>> [ 543.999079] Call Trace:
>>>>>>>>> [ 543.999079] dump_stack (lib/dump_stack.c:52)
>>>>>>>>> [ 543.999079] lockdep_rcu_suspicious (kernel/locking/lockdep.c:4259)
>>>>>>>>> [ 543.999079] atomic_notifier_call_chain (include/linux/rcupdate.h:892 kernel/notifier.c:182 kernel/notifier.c:193)
>>>>>>>>> [ 543.999079] ? atomic_notifier_call_chain (kernel/notifier.c:192)
>>>>>>>>> [ 543.999079] notify_die (kernel/notifier.c:538)
>>>>>>>>> [ 543.999079] ? atomic_notifier_call_chain (kernel/notifier.c:538)
>>>>>>>>> [ 543.999079] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
>>>>>>>>> [ 543.999079] do_debug (arch/x86/kernel/traps.c:652)
>>>>>>>>> [ 543.999079] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
>>>>>>>>> [ 543.999079] ? do_int3 (arch/x86/kernel/traps.c:610)
>>>>>>>>> [ 543.999079] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
>>>>>>>>> [ 543.999079] debug (arch/x86/kernel/entry_64.S:1310)
>>>>>>>>
>>>>>>>> I don't know how to read this stack trace. Are we in do_int3,
>>>>>>>> do_debug, or both? I didn't change do_debug at all.
>>>>>>>
>>>>>>> It looks like we're in do_debug. do_int3 is only on the stack but not
>>>>>>> part of the current frame if I can trust the '?' ...
>>>>>>>
>>>>>>
>>>>>> It's possible that an int3 happened and I did something wrong on
>>>>>> return that caused a subsequent do_debug to screw up, but I don't see
>>>>>> how my patch would have caused that.
>>>>>>
>>>>>> Were there any earlier log messages?
>>>>>
>>>>> Nope, nothing odd before or after.
>>>>
>>>> Trinity just survived for a decent amount of time for me with my
>>>> patches, other than a bunch of apparently expected OOM kills. I have
>>>> no idea how to tell trinity how much memory to use.
>>>
>>> A longer trinity run on a larger VM survived (still with some OOM
>>> kills, but no taint) with these patches. I suspect that it's a
>>> regression somewhere else in the RCU changes. I have
>>> CONFIG_PROVE_RCU=y, so I should have seen the failure if it was there,
>>> I think.
>>
>> If by "RCU changes" you mean my changes to the RCU infrastructure, I am
>> going to need more of a hint than I see in this thread thus far. ;-)
>>
>
> I can't help much, since I can't reproduce the problem. Presumably if
> it's a bug in -tip, someone else will trigger it, too.

I'm not sure what to tell you here, I'm not using any weird options for
trinity to reproduce it. It doesn't happen too frequently, but I still see
it happening.

Would you like me to try a debug patch or something similar?


Thanks,
Sasha