From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753520AbcGLW4w (ORCPT ); Tue, 12 Jul 2016 18:56:52 -0400 Received: from mail-lf0-f44.google.com ([209.85.215.44]:35369 "EHLO mail-lf0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753065AbcGLW4b (ORCPT ); Tue, 12 Jul 2016 18:56:31 -0400 MIME-Version: 1.0 In-Reply-To: <578524E0.6080401@intel.com> References: <20160707124719.3F04C882@viggo.jf.intel.com> <20160707124728.C1116BB1@viggo.jf.intel.com> <20160707144508.GZ11498@techsingularity.net> <577E924C.6010406@sr71.net> <20160708071810.GA27457@gmail.com> <577FD587.6050101@sr71.net> <20160709083715.GA29939@gmail.com> <5783AE8F.3@sr71.net> <5783BFB0.70203@intel.com> <578524E0.6080401@intel.com> From: Andy Lutomirski Date: Tue, 12 Jul 2016 15:55:45 -0700 Message-ID: Subject: Re: [PATCH 6/9] x86, pkeys: add pkey set/get syscalls To: Dave Hansen Cc: Thomas Gleixner , Dave Hansen , Al Viro , X86 ML , Hugh Dickins , Andrew Morton , Linux API , Ingo Molnar , Mel Gorman , Linus Torvalds , linux-arch , "linux-mm@kvack.org" , Arnd Bergmann , Peter Zijlstra , "linux-kernel@vger.kernel.org" , "H. Peter Anvin" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 12, 2016 at 10:12 AM, Dave Hansen wrote: > On 07/12/2016 09:32 AM, Andy Lutomirski wrote: >> I think it's more or less impossible to get sensible behavior passing >> pkey != 0 data to legacy functions. If you call: >> >> void frob(struct foo *p); >> >> If frob in turn passes p to a thread, what PKRU is it supposed to use? > > The thread inheritance of PKRU can be nice. It actually gives things a > good chance of working if you can control PKRU before clone(). I'd > describe the semantics like this: > > PKRU values are inherited at the time of a clone() system > call. Threads unaware of protection keys may work on > protection-key-protected data as long as PKRU is set up in > advance of the clone() and never needs to be changed inside the > thread. > > If a thread is created before PKRU is set appropriately, the > thread may not be able to act on protection-key-protected data. Given the apparent need for seccomp's TSYNC, I'm a bit nervous that this will be restrictive to a problematic degree. > > Otherwise, the semantics are simpler, but they basically give threads no > chance of ever working: > > Threads unaware of protection keys and which can not manage > PKRU may not operate on data where a non-zero key has been > passed to pkey_mprotect(). > > It isn't clear to me that one of these is substantially better than the > other. It's fairly easy in either case for an app that cares to get the > behavior of the other. > > But, one is clearly easier to implement in the kernel. :) > >>>> So how is user code supposed lock down all of its threads? >>>> >>>> seccomp has TSYNC for this, but I don't think that PKRU allows >>>> something like that. >>> >>> I'm not sure this is possible for PKRU. Think of a simple PKRU >>> manipulation in userspace: >>> >>> pkru = rdpkru(); >>> pkru |= PKEY_DENY_ACCESS<>> wrpkru(pkru); >>> >>> If we push a PKRU value into a thread between the rdpkru() and wrpkru(), >>> we'll lose the content of that "push". I'm not sure there's any way to >>> guarantee this with a user-controlled register. >> >> We could try to insist that user code uses some vsyscall helper that >> tracks which bits are as-yet-unassigned. That's quite messy, though. > > Yeah, doable, but not without some new data going out to userspace, plus > the vsyscall code itself. > >> We could also arbitrarily partition the key space into >> initially-wide-open, initially-read-only, and initially-no-access and >> let pkey_alloc say which kind it wants. > > The point is still that wrpkru destroyed the 'push' operation. You > always end up with a PKRU that (at least temporarily) ignored the 'push'. > Not with my partitioning proposal. We'd never asynchronously modify another thread's state -- we'd start start with a mask that gives us a good chance of having the initial state always be useful. To be completely precise, the initial state would be something like: 0 = all access, 1 (PROT_EXEC) = deny read and write, 2-11: deny read and write, 12-21: deny write, 22-31: all access Then pkru_alloc would take a parameter giving the requested initial state, and it would only work if a key with that initial state is available. If we went with the vdso approach, the API could look like: pkru_state_t prev = pkru_push(mask, value); ... pkru_pop(prev); // or pkru_pop(mask, prev)? This doesn't fundamentally require the vdso, except that implementing bitwise operations on PKRU can't be done atomically with RDPKRU / WRPKRU. Grr. This also falls apart pretty badly when sigreturn happens, so I don't think I like this approach. --Andy