From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754039AbbLRQEc (ORCPT ); Fri, 18 Dec 2015 11:04:32 -0500 Received: from mail-ob0-f181.google.com ([209.85.214.181]:34877 "EHLO mail-ob0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752087AbbLRQEb (ORCPT ); Fri, 18 Dec 2015 11:04:31 -0500 MIME-Version: 1.0 In-Reply-To: References: <56736BD1.5080700@linux.intel.com> <5673750B.606@linux.intel.com> From: Andy Lutomirski Date: Fri, 18 Dec 2015 08:04:10 -0800 Message-ID: Subject: Re: Rethinking sigcontext's xfeatures slightly for PKRU's benefit? To: "H. Peter Anvin" Cc: Dave Hansen , Oleg Nesterov , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Brian Gerst , "linux-kernel@vger.kernel.org" , Linus Torvalds , Christoph Hellwig Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 17, 2015 at 10:43 PM, H. Peter Anvin wrote: > On December 17, 2015 9:29:21 PM PST, Andy Lutomirski wrote: >>On Dec 17, 2015 6:53 PM, "Dave Hansen" >>wrote: >>> >>> On 12/17/2015 06:32 PM, Andy Lutomirski wrote: >>> > On Thu, Dec 17, 2015 at 6:13 PM, Dave Hansen >> wrote: >>> >> But what about the register state when delivering a signal? Don't >>we >>> >> set the registers to the init state? Do we need to preserve PKRU >>state >>> >> instead of init'ing it? The init state _is_ nice here because it >>is >>> >> permissive allows us to do something useful no matter what PKRU >>gets set to. >>> > >>> > I think we leave the extended regs alone. Don't we? >>> > >>> > I think that, for most signals, we want to leave PKRU as is, >>> > especially for things that aren't SIGSEGV. For SIGSEGV, maybe we >>want >>> > an option to reset PKRU on delivery (and then set the flag to >>restore >>> > on return?). >>> >>> Is there some precedent for doing the state differently for different >>> signals? >> >>Yes, to a very limited extent: SA_ONSTACK. >> >>> >>> >> Well, the signal handler isn't necessarily going to clobber it, >>but the >>> >> delivery code already clobbers it when going to the init state. >>> > >>> > Can you point to that code? >>> >>> handle_signal() -> fpu__clear() >>> >>> The comment around it says: >>> >>> "Ensure the signal handler starts with the new fpu state." >>> >> >>You win this round :) >> >>So maybe we should have a mask of xfeatures that aren't cleared on >>signal delivery (e.g. PKRU, perhaps) and that are, by default, >>preserved across signal returns. >> [...] > > I find the notion of PKRU not being initialized at the entry to a signal handler a bit odd. Saving/restoring it seems like the right thing to me. > > What am I missing? On Fri, Dec 18, 2015 at 12:32 AM, Ingo Molnar wrote: > > So the principle is this: signals are conceptually like creating a new thread of > execution, and that's a very powerful programming concept, like fork() or > pthread_create() are powerful concepts. So we definitely want to keep that default > behavior, and I don't think we want to deviate from that for typical new extended > CPU context features, even if signal delivery slows down as a result. I think that PKRU is in almost no respect a typical no extended CPU context feature. It's a control register, not a data register. Let's try a concrete example. One big use case for PKRU will be in conjunction with DAX-style pmem. Imagine some program (database server, perhaps) that mmaps 1 TB of pmem MAP_SHARED, PROT_READ | PROT_WRITE with protection key 1. It sets PKRU so that key 1 can't be written. All of its accesses to the pmem store are in tight code regions that do wrpkru; mov; wrpkru (just like what the kernel does with stac/clac on SMAP systems). >>From the app's perspective, it's imperative that stray writes that coincidentally hit the pmem region fail. (1 TB is big enough that a random write to canonical user memory hits it almost 1% of the time.) This means: 1. If a signal is delivered, the app wants PKRU to be set to some safe value as early as possible. The app can do it by itself if it doesn't use libraries that interfere, but it would be IMO much, much nicer if the kernel made this happen automatically. 1b. If the app malfunctions such that RSP points to pmem, the kernel MUST NOT clobber the pmem space. I think that this basically mandates that PKRU needs to have some safe state (i.e. definitely not the init state) on signal delivery: the kernel is going to write a signal frame at the address identified by RSP, and that address is in pmem, so those writes need to fail. 2. clone with shared VM should preserve PKRU. So in this regard, I think signals are kind of like new threads after all. Now that I think about it more, there are at least two possibilities that work for this use case. Option A: signal delivery preserves PKRU. If we go this route, I think we should decide whether PKRU is preserved on exit. Given that we're talking about adding syscalls to adjust PKRU, it seems a bit odd to me that sigreturn would, by default, undo whatever those syscalls do. Option B: signal delivery resets PKRU to user-specified values. We're talking about having syscalls to write PKRU. We could have clone and signal entry use the values set by syscall instead of the values in the actual PKRU register. That way: set_protection_key(1, no writes); ... wrpkru(allow writes to key 1) mov something <<<<<<<<<<< fault or async signal here wrpkru(disallow writes to key 1) leaves key 1 protected in the signal handler. If we go that route, I think we must restore PKRU just like any other xstate on sigreturn, so in some sense it's the simplest of all to implement. We'll need to change the signal delivery stuff to do the fpu__clear and the automatic PKRU write before trying to set up the signal frame so that the frame is written with the syscall-specified protections instead of the wrpkru-overridden values, but that should be easy. Which reminds me: __get_user, etc all respect PKRU because the SDM says that PKRU affects all user *and* kernel accesses to user memory (i.e. anything with _PAGE_USER). Should get_user_pages also respect PKRU? > > But we've been arguing about 'lightweight signals' for up to two decades that I > can remember. (The first such suggestion was to not save the FPU state, back when > FPU saves were ridiculously slow compared to other parts of saving/restoring a > context.) I'm not making any claims about lightweightness here. While performance is an issue in some cases, I'd like to hash out the baseline functionality first. > > So having a well-enumerated, extensible opt-in mask (which defaults to 'all state > saved') that allows smart signal handlers to skip the save/restore of certain CPU > context components would be acceptable I think. I have a patch that helps considerably with a probably-unnoticeable ABI change by letting signal delivery use sysret. I'll dust it off. --Andy