From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 2 Nov 2018 09:30:34 -0700
From: Sean Christopherson
To: Andy Lutomirski
CC: Linus Torvalds, Rich Felker, Jann Horn, Dave Hansen, Jethro Beekman, "Jarkko Sakkinen", Florian Weimer, Linux API, X86 ML, linux-arch, LKML, Peter Zijlstra, nhorman@redhat.com, npmccallum@redhat.com, "Ayoun, Serge", shay.katz-zamir@intel.com, linux-sgx@vger.kernel.org, Andy Shevchenko, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "Carlos O'Donell", adhemerval.zanella@linaro.org
Subject: Re: RFC: userspace exception fixups
Message-ID: <20181102163034.GB7393@linux.intel.com>
References: <20181101185225.GC5150@brightrain.aerifal.cx> <20181101193107.GE5150@brightrain.aerifal.cx>
Content-Type: text/plain; charset="us-ascii"
In-Reply-To:
Return-Path: sean.j.christopherson@intel.com
MIME-Version: 1.0
List-ID:

On Thu, Nov 01, 2018 at 04:22:55PM -0700, Andy Lutomirski wrote:
> On Thu, Nov 1, 2018 at 2:24 PM Linus Torvalds
> wrote:
> >
> > On Thu, Nov 1, 2018 at 12:31 PM Rich Felker wrote:
> > >
> > > See my other emails in this thread. You would register the *address*
> > > (in TLS) of a function pointer object pointing to the handler, rather
> > > than the function address of the handler. Then switching handler is
> > > just a single store in userspace, no syscalls involved.
> >
> > Yes.
> >
> > And for just EENTER, maybe that's the right model.
> >
> > If we want to generalize it to other thread-synchronous faults, it
> > needs way more information and a list of handlers, but if we limit the
> > thing to _only_ EENTER getting an SGX fault, then a single "this is
> > the fault handler" address is probably the right thing to do.
>
> It sounds like you're saying that the kernel should know, *before*
> running any user fixup code, whether the fault in question is one that
> wants a fixup. Sounds reasonable.
>
> I think it would be nice, but not absolutely necessary, if user code
> didn't need to poke some value into TLS each time it ran a function
> that had a fixup.
> With the poke-into-TLS approach, it looks a lot
> like rseq, and rseq doesn't nest very nicely. I think we really want
> this mechanism to Just Work. So we could maybe have a syscall that
> associates a list of fixups with a given range of text addresses. We
> might want the kernel to automatically zap the fixups when the text in
> question is unmapped.

If this is EENTER-specific then nesting isn't an issue. But I don't see a
simple way to restrict the mechanism to EENTER.

What if, rather than having userspace register a fixup address, the kernel
instead unconditionally does fixup on the ENCLU opcode? For example, skip
the instruction and put fault info into some combination of RDX/RSI/RDI
(they're cleared on asynchronous enclave exits). The decode logic is
straightforward since ENCLU doesn't have operands; we'd just have to eat
any ignored prefixes. The intended convention for EENTER is to have an
ENCLU at the AEX target (to automatically do ERESUME after INTR, etc.),
so this would work regardless of whether the fault happened on EENTER or
in the enclave. EENTER/ERESUME are the only ENCLU functions that are
allowed outside of an enclave, so there's no danger of accidentally
clobbering something else.

This way we wouldn't need a VDSO blob and we'd enforce the kernel's ABI,
e.g. a library that tried to use signal handling would go off the rails
when the kernel mucked with the registers. We could even have the SGX EPC
fault handler return VM_FAULT_SIGBUS if the faulting instruction isn't
ENCLU, e.g. to further enforce that the AEX target needs to be ENCLU.
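The decode-and-skip step proposed above could look roughly like the following, written as plain C so it can be compiled and tested outside the kernel. It is a sketch under stated assumptions, not the kernel's code: `struct mock_regs`, `enclu_insn_len()` and `sgx_enclu_fixup()` are hypothetical names; the set of prefixes treated as ignorable is a conservative guess that should be checked against the Intel SDM (which lists ENCLU as `NP 0F 01 D7`, so 66/F2/F3 are rejected here); and putting the trap number, error code and fault address in RDI, RSI and RDX is only one possible "combination of RDX/RSI/RDI". A real implementation would more likely reuse the kernel's existing x86 instruction decoder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_MAX_INSN_LEN 15

/* Hypothetical stand-in for the user registers saved at fault time. */
struct mock_regs {
	uint64_t rip, rdx, rsi, rdi;
};

/*
 * Prefixes treated as ignorable in front of ENCLU (an assumption): segment
 * overrides and the address-size prefix.  66/F2/F3 are not accepted since
 * ENCLU is encoded as NP 0F 01 D7, and LOCK (F0) would #UD anyway.
 */
static bool enclu_ignored_prefix(uint8_t b)
{
	switch (b) {
	case 0x26: case 0x2e: case 0x36: case 0x3e:	/* ES/CS/SS/DS */
	case 0x64: case 0x65:				/* FS/GS */
	case 0x67:					/* address size */
		return true;
	default:
		return false;
	}
}

/*
 * Return the total length of an ENCLU (0F 01 D7) at insn, including any
 * accepted prefixes and at most one REX byte directly before the opcode,
 * or 0 if the bytes are not ENCLU.  Odd encodings are rejected, which
 * errs on the side of not applying the fixup.
 */
static size_t enclu_insn_len(const uint8_t *insn, size_t avail)
{
	size_t i = 0;

	if (avail > X86_MAX_INSN_LEN)
		avail = X86_MAX_INSN_LEN;

	while (i < avail && enclu_ignored_prefix(insn[i]))
		i++;
	if (i < avail && (insn[i] & 0xf0) == 0x40)	/* REX */
		i++;
	if (avail - i < 3 || insn[i] != 0x0f || insn[i + 1] != 0x01 ||
	    insn[i + 2] != 0xd7)
		return 0;
	return i + 3;
}

/*
 * Hypothetical fixup: if the faulting instruction is ENCLU, skip it so that
 * execution falls through to the code after the AEX target, and report the
 * fault in RDI/RSI/RDX.  Returns false when the instruction is not ENCLU,
 * the case in which the proposal would have the EPC fault handler return
 * VM_FAULT_SIGBUS instead.
 */
static bool sgx_enclu_fixup(struct mock_regs *regs, const uint8_t *insn,
			    size_t avail, uint64_t trapnr,
			    uint64_t error_code, uint64_t addr)
{
	size_t len = enclu_insn_len(insn, avail);

	if (!len)
		return false;
	regs->rip += len;	/* resume just past ENCLU */
	regs->rdi = trapnr;	/* illustrative register assignment */
	regs->rsi = error_code;
	regs->rdx = addr;
	return true;
}
```

For example, a #PF (vector 14) reported with RIP at a bare `0F 01 D7` would advance RIP by 3 and leave 14 in RDI, a `3E 48`-prefixed ENCLU decodes as 5 bytes, and any non-ENCLU bytes make `sgx_enclu_fixup()` return false without touching the registers.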

Userspace would look something like this:

	mov	tcs, %rbx		/* Thread Control Structure address */
	leaq	async_exit(%rip), %rcx	/* AEX target for EENTER/ERESUME */
	mov	$SGX_EENTER, %rax	/* EENTER leaf */

async_exit:
	ENCLU

fault_handler:				/* reached when the kernel skips ENCLU on a fault */

enclave_exit:				/* EEXIT target */