From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964970AbaKNAuk (ORCPT ); Thu, 13 Nov 2014 19:50:40 -0500
Received: from mail-lb0-f169.google.com ([209.85.217.169]:62950 "EHLO
	mail-lb0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S964816AbaKNAuj (ORCPT );
	Thu, 13 Nov 2014 19:50:39 -0500
MIME-Version: 1.0
In-Reply-To:
References: <20141112220058.GA5295@redhat.com>
	<3908561D78D1C84285E8C5FCA982C28F3292BAB4@ORSMSX114.amr.corp.intel.com>
	<3908561D78D1C84285E8C5FCA982C28F3292BD44@ORSMSX114.amr.corp.intel.com>
	<3908561D78D1C84285E8C5FCA982C28F3292CB9A@ORSMSX114.amr.corp.intel.com>
	<3908561D78D1C84285E8C5FCA982C28F3292D57B@ORSMSX114.amr.corp.intel.com>
From: Andy Lutomirski
Date: Thu, 13 Nov 2014 16:50:16 -0800
Message-ID:
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace
To: "Luck, Tony"
Cc: Oleg Nesterov, Borislav Petkov, X86 ML,
	"linux-kernel@vger.kernel.org", Peter Zijlstra, Andi Kleen
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 13, 2014 at 3:13 PM, Andy Lutomirski wrote:
> On Thu, Nov 13, 2014 at 2:47 PM, Andy Lutomirski wrote:
>> On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote:
>>>> Are you sure that this works in an unmodified kernel
>>>
>>> Unmodified kernel has run tens of thousands of injection/consumption/recovery cycles.
>>>
>>> I did get a crash with the entry/exit traces you asked for. Last 20000 lines of console log
>>> attached. There are a couple of OOPs before things fall apart completely. I haven't yet
>>> counted all the entry/exits from the last cycle to see if they match.
>>>
>>
>> That log was a good hint, and I am a fool. I'll send a v3 once I test it.
>
> ...or not. I confused myself there. I thought I had a bug, but I was wrong.
>
> I'm stress-testing sleeping in an int3 handler that entered from user
> space, and I'm not seeing any problems, even with perf firing lots of
> NMIs. I'm also passing the kprobes smoke test with my patch applied,
> and the stack switching code is correctly not switching stacks.
>
> Any chance you could try to trigger this again with regs->sp,
> regs->ip, and regs->cs added to the cpu=%d regs=... message? I feel
> like I'm missing something weird here.

Can you also try rebasing onto what will probably be v3?

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9

It adds debugging for inappropriate reschedules from the wrong stack.
Setting CONFIG_DEBUG_ATOMIC_SLEEP might also be a good idea.

It seems plausible to me that the failure mode is worst ==
MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel). This could blow
up in one of two ways:

1. current is the idle thread. This causes the warning you saw.

2. current is a real user process, but that MCE was nested inside
another exception somehow or otherwise didn't switch stacks. Now
we're on an IST stack and we schedule. So far so good, but the next
thing that tries to use that IST stack causes lots of corruption.

It looks like if bug 1 is happening, then you might never notice it
without mce-stack.patch applied -- you'll set TIF_MCE_NOTIFY on the
idle thread, but the idle thread never returns to userspace, so the
mce notifier never has a chance to crash.

--Andy