From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>, Jann Horn <jannh@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	Rich Felker <dalias@libc.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jethro Beekman <jethro@fortanix.com>,
	Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
	Florian Weimer <fweimer@redhat.com>,
	Linux API <linux-api@vger.kernel.org>, X86 ML <x86@kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, <nhorman@redhat.com>,
	<npmccallum@redhat.com>, "Ayoun, Serge" <serge.ayoun@intel.com>,
	<shay.katz-zamir@intel.com>, <linux-sgx@vger.kernel.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "Carlos O'Donell" <carlos@redhat.com>,
	<adhemerval.zanella@linaro.org>
Subject: Re: RFC: userspace exception fixups
Date: Tue, 6 Nov 2018 16:02:35 -0800
Message-ID: <20181107000235.GC11101@linux.intel.com>
In-Reply-To: <CALCETrVySfV64YN7DWf3rsAxfiugJKsRJCNmEn-AKQ4dPYeG4Q@mail.gmail.com>

On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > >
> > >
> > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > > >>
> > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > > >> like the *CPU* could give a big hint, but I don't see where there is
> > > >> any architectural indication of why the AEX code got called or any
> > > >> obvious way for the user code to know whether the exit was fixed up by
> > > >> the kernel?
> > > >
> > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > > a bit misleading because its signal handler may muck with the context's
> > > > RIP, e.g. to abort the enclave on a fatal fault.
> > > >
> > > > On an event/exception from within an enclave, the event is immediately
> > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > ucode preamble, but from software's perspective it's a normal event
> > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > And because the signals the SDK cares about are all synchronous, the
> > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > > into the enclave.
> > > >
> > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > case, after the trap handler has run.
> > > >
> > > > Jumping back a bit, how much do we care about preventing userspace
> > > > from doing stupid things?
> > >
> > > My general feeling is that userspace should be allowed to do apparently
> > > stupid things.  For example, as far as the kernel is concerned, Wine and
> > > DOSEMU are just user programs that do stupid things.  Linux generally tries
> > > to provide a reasonably complete view of architectural behavior.  This is
> > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > > cause very odd behavior indeed.  So magic fixups that do non-architectural
> > > things are not so great.
> >
> > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > that the enclave can EEXIT to immediately after the EENTER location.
> >
>
> How does that even work, though?  On an AEX, RIP points to the ERESUME
> instruction, not the EENTER instruction, so if we skip it we just end
> up in lala land.

Userspace would obviously need to be aware of the fixup behavior, but it
actually works out fairly nicely to have a separate path for ERESUME
fixup since a fault on EENTER is generally fatal, whereas a fault on
ERESUME might be recoverable.

do_eenter:
	mov	tcs, %rbx
	lea	async_exit, %rcx
	mov	$EENTER, %rax
	ENCLU

/*
 * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
 * fault indicator, e.g. -EFAULT.
 */
eexit_or_eenter_fault:
	ret

async_exit:
	ENCLU

fixup_handler:
	<do fault stuff>

> How averse would everyone be to making enclave entry be a syscall?
> The user code would do sys_sgx_enter_enclave(), and the kernel would
> stash away the register state (vm86()-style), point RIP to the vDSO's
> ENCLU instruction, point RCX to another vDSO ENCLU instruction, and
> SYSRET.  The trap handlers would understand what's going on and
> restore register state accordingly.

Wouldn't that blast away any stack changes made by the enclave?
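
To make the opt-in fixup idea above concrete, here is a minimal sketch of the
decision the trap handler would make, written as a plain userspace C model
rather than kernel code.  Everything named here is an assumption for
illustration: the thread does not say which ignored prefix would mark an
opted-in ENCLU (a DS segment override, 0x3e, is assumed), and struct
fault_regs and try_enclu_fixup() are hypothetical names, not kernel APIs.
Only the ENCLU encoding (0F 01 D7) is architectural.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENCLU is encoded as 0F 01 D7. */
static const uint8_t ENCLU_OPCODE[3] = { 0x0f, 0x01, 0xd7 };

/*
 * Hypothetical opt-in marker.  The thread only says "a specific (ignored)
 * prefix pattern"; a single DS segment-override byte is assumed here.
 */
#define FIXUP_PREFIX 0x3e

/* Hypothetical, simplified register snapshot taken at fault time. */
struct fault_regs {
	uint64_t rip;	/* modeled as an offset into the code buffer */
	int64_t  rax;
};

/*
 * Model of the proposed decision.  If the faulting RIP points at a
 * prefixed ENCLU, store the fault indicator in RAX and advance RIP past
 * the instruction, then return true.  Otherwise leave the registers
 * untouched and return false so the caller falls back to a signal.
 */
static bool try_enclu_fixup(const uint8_t *text, size_t len,
			    struct fault_regs *regs, int64_t err)
{
	size_t ip = (size_t)regs->rip;

	/* Need room for the prefix byte plus the three opcode bytes. */
	if (ip >= len || len - ip < 1 + sizeof(ENCLU_OPCODE))
		return false;
	if (text[ip] != FIXUP_PREFIX)
		return false;
	if (memcmp(&text[ip + 1], ENCLU_OPCODE, sizeof(ENCLU_OPCODE)) != 0)
		return false;

	regs->rax = err;
	regs->rip += 1 + sizeof(ENCLU_OPCODE);
	return true;
}
```

Applied to the assembly above, a fixup at the ENCLU in do_eenter would resume
at eexit_or_eenter_fault with the fault indicator in %RAX, and a fixup at the
ENCLU in async_exit would resume at fixup_handler.  An ENCLU without the
prefix is left alone, so the fault is delivered as a signal as it is today.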