From: Corey Minyard
Subject: Re: [PATCH][RT] x86: Fix an RT MCE crash
Date: Thu, 30 Jun 2016 12:18:01 -0500
Message-ID: <57755449.7070302@acm.org>
In-Reply-To: <20160630170134.GA3932@pd.tnic>
Reply-To: minyard@acm.org
To: Borislav Petkov
Cc: "Luck, Tony" , Steven Rostedt , "linux-rt-users@vger.kernel.org" , Corey Minyard

On 06/30/2016 12:01 PM, Borislav Petkov wrote:
> On Thu, Jun 30, 2016 at 11:40:17AM -0500, Corey Minyard wrote:
>> I'm not sure. I've included the entire boot log below...
> ...
>
>> [ 0.164185] [] try_to_wake_up+0x28/0x320
>> [ 0.164188] [] wake_up_process+0x10/0x20
>> [ 0.164207] [] mce_notify_irq+0x28/0x30
>> [ 0.164210] [] intel_threshold_interrupt+0xb5/0xd0
>> [ 0.164213] [] smp_threshold_interrupt+0x1c/0x40
>> [ 0.164221] [] threshold_interrupt+0x6a/0x70
>> [ 0.164223]
>> [ 0.164226] [] ? cmci_recheck+0x67/0x70
>> [ 0.164241] [] setup_local_APIC+0x276/0x283
>> [ 0.164259] [] native_smp_prepare_cpus+0x379/0x43b
>> [ 0.164266] [] kernel_init_freeable+0xd7/0x21a
>> [ 0.164270] [] ? rest_init+0x90/0x90
>> [ 0.164272] [] kernel_init+0x9/0x180
>> [ 0.164275] [] ret_from_fork+0x58/0x90
>> [ 0.164277] [] ? rest_init+0x90/0x90
>> [ 0.164295] Code: e7 ff ff 48 8b 7d 08 e8 02 1a 95 ff 5d c3 55 48 89 e5
>> 41 54 53 48 89 fb 9c 41 5c fa bf 01 00 00 00 e8 a8 38 00 00 ba 00 01 00 00
>> 66 0f c1 13 0f b6 ce 38 d1 74 10 0f 1f 80 00 00 00 00 f3 90
>> [ 0.164298] RIP [] _raw_spin_lock_irqsave+0x1d/0x50
>> [ 0.164298] RSP
>> [ 0.164299] CR2: 0000000000000600
>> [ 0.656225] ---[ end trace 0000000000000001 ]---
>> [ 0.656233] Kernel panic - not syncing: Fatal exception in interrupt
> Hmm, we have that setup_local_APIC -> cmci_recheck path on latest kernel
> too. However, we do init CMCI earlier, down the start_kernel() path.
>
> Would it be possible to boot latest upstream kernel on it to see whether
> it explodes the same way?
>
> Thanks.

This is on 3.10-rt with PREEMPT_RT enabled. From 3.18-rt onward, the kernel
appears to have code similar to the change I proposed, so it does not crash
there.

I could add something to 4.6-rt to check whether the interrupt is arriving
early. Is that what you are looking for?

-corey