From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754111AbbCISFT (ORCPT ); Mon, 9 Mar 2015 14:05:19 -0400 Received: from mail-lb0-f175.google.com ([209.85.217.175]:37688 "EHLO mail-lb0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751596AbbCISFR (ORCPT ); Mon, 9 Mar 2015 14:05:17 -0400 MIME-Version: 1.0 In-Reply-To: References: <1425909943-14687-1-git-send-email-dvlasenk@redhat.com> <54FDDB90.9010706@zytor.com> From: Andy Lutomirski Date: Mon, 9 Mar 2015 11:04:55 -0700 Message-ID: Subject: Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) To: Linus Torvalds Cc: "H. Peter Anvin" , Denys Vlasenko , Steven Rostedt , Ingo Molnar , Borislav Petkov , Oleg Nesterov , Frederic Weisbecker , Alexei Starovoitov , Will Drewry , Kees Cook , "the arch/x86 maintainers" , Linux Kernel Mailing List Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 9, 2015 at 10:59 AM, Linus Torvalds wrote: > On Mon, Mar 9, 2015 at 10:45 AM, Andy Lutomirski wrote: >> >> If sp0 is set to the very top of the stack, then an NMI immediately >> after sysenter will have OLDSS off the top of the stack, and reading >> it can crash. This is why 32-bit kernels have a (buggy!) 8 byte >> offset in sp0. > > So I think that for sysenter, we *should* have that 8-byte buffer. > > Not in general for sp0, but for MSR_IA32_SYSENTER_ESP (which is sp1, afaik). Bah. I'm slightly wrong here. It's not NMI immediately after sysenter (that's handled by the emergency stack gunk, although we might need to fix that too). It's this: ENTRY(ia32_sysenter_target) CFI_STARTPROC simple CFI_SIGNAL_FRAME CFI_DEF_CFA esp, 0 CFI_REGISTER esp, ebp movl TSS_sysenter_sp0(%esp),%esp sysenter_past_esp: /* * Interrupts are disabled here, but we can't trace it until * enough kernel state to call TRACE_IRQS_OFF can be called - but * we immediately enable interrupts at that point anyway. */ --> NMI here pushl_cfi $__USER_DS --> or here /*CFI_REL_OFFSET ss, 0*/ pushl_cfi %ebp We could fix that locally by loading sp0 - 8 atomically (w/ percpu help) and populating the top two words with mov instead of push. One option would be to change the NMI entry code to move itself down 8 bytes if this happens (came from kernel mode or sp == sp0 - 12, perhaps). Or we could suck it up and keep that 8 byte offset (and fix the cpu_tss initializer bug that threw me off in the first place). --Andy > > Just make the rule be that you can never ever have a kernel stack > frame that doesn't contain room for ss/sp at the top. > > We have various code that looks at and touches "pt_regs" anyway, and > accesses things out for debugging/oopsing/tracing etc. Let's not make > the rule be that you cannot look at regs->ss without checking various > random other fields first. > > Linus -- Andy Lutomirski AMA Capital Management, LLC