From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934670AbaKLCCR (ORCPT ); Tue, 11 Nov 2014 21:02:17 -0500
Received: from mail-lb0-f182.google.com ([209.85.217.182]:32888 "EHLO
        mail-lb0-f182.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S932917AbaKLCCM (ORCPT );
        Tue, 11 Nov 2014 21:02:12 -0500
MIME-Version: 1.0
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F3292A157@ORSMSX114.amr.corp.intel.com>
References: <20141111213628.GP31490@pd.tnic>
        <20141111223316.GQ31490@pd.tnic>
        <20141111230926.GR31490@pd.tnic>
        <3908561D78D1C84285E8C5FCA982C28F3292A03B@ORSMSX114.amr.corp.intel.com>
        <3908561D78D1C84285E8C5FCA982C28F3292A157@ORSMSX114.amr.corp.intel.com>
From: Andy Lutomirski
Date: Tue, 11 Nov 2014 18:01:50 -0800
Message-ID:
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace
To: "Luck, Tony"
Cc: Borislav Petkov, X86 ML, "linux-kernel@vger.kernel.org",
        Peter Zijlstra, Oleg Nesterov, Andi Kleen
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 11, 2014 at 5:06 PM, Luck, Tony wrote:
>> I've thought about one sneaky option.  If we can reliably determine
>> that we're an innocent bystander of a broadcast #MC, can we send an
>> IPI-to-self and return without clearing MCIP?  Then we get another
>> interrupt as soon as interrupts are enabled, and we can clear MCIP at
>> a time when we're definitely not running on the IST stack.
>
> Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they
> are quite easy to spot.  Perhaps we might look at subverting the silly
> broadcast by just having them immediately clear MCG_STATUS and iret
> (i.e. not go to do_machine_check() at all).  That would require lots of
> surgery to do_machine_check() and friends - now it wouldn't be sure
> how many processors to expect to show up.
> It also opens a different
> window - once they are back running normal code they might trip another
> machine check while the victims of the first are still processing - so
> another "boom, you're dead".  The advantage of hitting everyone
> with the machine check is that it lessens the chance that another will
> happen as everyone is running looking at a few pages of kernel code
> & data.
>
> The worrying part in that is "as soon as interrupts are enabled".  Until
> we do clear MCIP we're sitting in a mode where another machine check
> means instant death no saving throw.  Nominally better than the "we'll
> mess the stack up for you" that we are trying to avoid - but the old window
> is quite short and known to be bounded.  The new one might be a lot bigger.

Yeah, fair enough.  The annoying thing is that there's no way to
atomically return from interrupt and clear MCIP.

Here's a different idea.  In do_machine_check, check if (regs->sp
points at the machine check IST stack && !user_mode(regs)) and, if so,
declare the machine check to be unrecoverable.  There are a couple
ways this can happen:

 - This is a second #MC that hit after clearing MCIP and before
   returning.  It's genuinely unrecoverable (we're well and truly
   screwed at this point), but we probably won't actually crash unless
   we try to return.

 - This is a normal #MC that hit in kernel mode during a time when sp
   was bogus and coincidentally pointed at the #MC IST stack.

This isn't perfect.  A malicious user can do dummy syscalls in a loop
on one CPU with rsp pointing at the IST stack and try to cause a
machine check on a different CPU, causing the system to panic when it
thinks that the first CPU had a recursive IST usage.  I think that we
probably have bigger problems if a malicious user can cause machine
checks, though.

--Andy
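
For concreteness, the check proposed above (regs->sp inside the #MC IST
stack && !user_mode(regs)) could look roughly like the userspace sketch
below.  This is not kernel code: struct regs_sketch, struct
ist_stack_sketch and all three function names are hypothetical stand-ins
made up for illustration, and treating the stack as the half-open range
[base, base + size) is an assumption about where the bounds should sit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register frame (not struct pt_regs). */
struct regs_sketch {
	uint64_t sp;	/* stack pointer of the interrupted context */
	uint64_t cs;	/* code segment selector of the interrupted context */
};

/* Hypothetical description of the #MC IST stack: [base, base + size). */
struct ist_stack_sketch {
	uint64_t base;
	uint64_t size;
};

/* The low two bits of CS hold the privilege level; 3 means user mode. */
bool from_user_mode(const struct regs_sketch *regs)
{
	return (regs->cs & 3) == 3;
}

/* Unsigned subtraction wraps when sp < base, so one compare covers both ends. */
bool sp_on_mc_ist_stack(const struct regs_sketch *regs,
			const struct ist_stack_sketch *ist)
{
	return regs->sp - ist->base < ist->size;
}

/*
 * The proposed rule: a machine check that interrupted kernel mode while
 * sp pointed into the #MC IST stack is declared unrecoverable, because
 * this entry may already have overwritten the previous frame.
 */
bool mce_looks_recursive(const struct regs_sketch *regs,
			 const struct ist_stack_sketch *ist)
{
	return sp_on_mc_ist_stack(regs, ist) && !from_user_mode(regs);
}
```

Note that the user-mode test is what keeps a user task with rsp aimed at
the IST stack from tripping the check on its own CPU; the dummy-syscall
case described above still gets through because sp is sampled while the
CPU is in kernel mode.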