From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: text/plain; charset="utf-8"
Subject: Re: RFC: userspace exception fixups
From: Andy Lutomirski
In-Reply-To: <1541541565.8854.13.camel@intel.com>
Date: Tue, 6 Nov 2018 15:00:56 -0800
CC: Andy Lutomirski, Dave Hansen, Jann Horn, Linus Torvalds, Rich Felker,
 Dave Hansen, Jethro Beekman, Jarkko Sakkinen, Florian Weimer, Linux API,
 X86 ML, linux-arch, LKML, Peter Zijlstra, nhorman@redhat.com,
 npmccallum@redhat.com, "Ayoun, Serge", shay.katz-zamir@intel.com,
 linux-sgx@vger.kernel.org, Andy Shevchenko, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Carlos O'Donell, adhemerval.zanella@linaro.org
Message-ID: <7FF4802E-FBC5-4E6D-A8F6-8A65114F18C7@amacapital.net>
References: <20181102170627.GD7393@linux.intel.com>
 <20181102173350.GF7393@linux.intel.com>
 <20181102182712.GG7393@linux.intel.com>
 <20181102220437.GI7393@linux.intel.com>
 <1541518670.7839.31.camel@intel.com>
 <1541524750.7839.51.camel@intel.com>
 <22596E35-F5D1-4935-86AB-B510DCA0FABE@amacapital.net>
 <1C426267-492F-4AE7-8BE8-C7FE278531F9@amacapital.net>
 <209cf4a5-eda9-2495-539f-fed22252cf02@intel.com>
 <9B76E95B-5745-412E-8007-7FAA7F83D6FB@amacapital.net>
 <1541541565.8854.13.camel@intel.com>
To: Sean Christopherson
MIME-Version: 1.0

>> On Nov 6, 2018, at 1:59 PM, Sean Christopherson wrote:
>>
>>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
>>>> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski wrote:
>>>>
>>>>> On Nov 6, 2018, at 1:00 PM, Dave Hansen wrote:
>>>>>
>>>>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>>>>> True, but what if we have a nasty enclave that writes to memory just
>>>>> below SP *before* decrementing SP?
>>>> Yeah, that would be unfortunate.  If an enclave did this (roughly):
>>>>
>>>>    1. EENTER
>>>>    2. Hardware sets eenter_hwframe->sp = %sp
>>>>    3. Enclave runs... wants to do out-call
>>>>    4. Enclave sets up parameters:
>>>>        memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>>>>        ...
>>>>    5. Enclave sets eenter_hwframe->sp -= offset
>>>>
>>>> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
>>>> was on the stack.  The enclave could easily fix this by moving ->sp first.
>>>>
>>>> But, this is one of those "fun" parts of the ABI that I think we need to
>>>> talk about.  If we do this, we also basically require that the code
>>>> which handles asynchronous exits must *not* write to the stack.  That's
>>>> not hard because it's typically just a single ERESUME instruction, but
>>>> it *is* a requirement.
>>> I was assuming that the async exit stuff was completely hidden by the
>>> API. The AEP code would decide whether the exit got fixed up by the
>>> kernel (which may or may not be easy to tell -- can the code even tell
>>> without kernel help whether it was, say, an IRQ vs #UD?) and then
>>> either do ERESUME or cause sgx_enter_enclave() to return with an
>>> appropriate return value.
>> Sean, how does the current SDK AEX handler decide whether to do
>> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
>> like the *CPU* could give a big hint, but I don't see where there is
>> any architectural indication of why the AEX code got called or any
>> obvious way for the user code to know whether the exit was fixed up by
>> the kernel?
>
> The SDK "unconditionally" does ERESUME at the AEP location, but that's
> a bit misleading because its signal handler may muck with the context's
> RIP, e.g. to abort the enclave on a fatal fault.
>
> On an event/exception from within an enclave, the event is immediately
> delivered after loading synthetic state and changing RIP to the AEP.
> In other words, jamming CPU state is essentially a bunch of vectoring
> ucode preamble, but from software's perspective it's a normal event
> that happens to point at the AEP instead of somewhere in the enclave.
> And because the signals the SDK cares about are all synchronous, the
> SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> resides in its signal handler.  IRQs and whatnot simply trampoline back
> into the enclave.
>
> Userspace can do something funky instead of ERESUME, but only *after*
> IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> case, after the trap handler has run.
>
> Jumping back a bit, how much do we care about preventing userspace
> from doing stupid things?

My general feeling is that userspace should be allowed to do apparently
stupid things. For example, as far as the kernel is concerned, Wine and
DOSEMU are just user programs that do stupid things. Linux generally tries
to provide a reasonably complete view of architectural behavior. This is in
contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may cause
very odd behavior indeed. So magic fixups that do non-architectural things
are not so great.

The flip side, of course, is that the architecture is arguably inherently
erratic here, and it’s apparently impossible to have an SGX library with
sane semantics without some kernel assistance.

So if we can make my straw man API work, perhaps with vDSO or rseq-like
help, then the official SDK can use it, but less well-behaved programs can
still mostly work. (Modulo Linux’s non-support for EINITTOKEN, of course.)

Thinking about it some more, the major sticking point may be finding the
RIP and stack frame of EENTER in the AEP code or in its fixup. The vDSO
can’t use TLS without serious hackery. We could massively abuse WRFSBASE,
but that’s really ugly.

(How does the Windows case work? If there’s an exception after the
untrusted stack allocation and before EEXIT and SEH tries to handle it, how
does the unwinder figure out where to start?)

> I did a quick POC on the idea of hardcoding
> fixup for the ENCLU opcode, and the basic idea checks out.  The code
> is fairly minimal and doesn't impact the core functionality of the SDK.
> They'd need to redo their trap handling to move it from the signal
> handler to inline, but their stack shenanigans won't be any more broken
> than they already are.