From: Andy Lutomirski
Date: Tue, 28 Jul 2020 10:31:59 -0700
Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
To: madvenka@linux.microsoft.com
Cc: Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel,
 linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML
In-Reply-To: <20200728131050.24443-1-madvenka@linux.microsoft.com>

> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>
> From: "Madhavan T. Venkataraman"
>
> The kernel creates the trampoline mapping without any permissions. When
> the trampoline is executed by user code, a page fault happens and the
> kernel gets control. The kernel recognizes that this is a trampoline
> invocation. It sets up the user registers based on the specified
> register context, and/or pushes values on the user stack based on the
> specified stack context, and sets the user PC to the requested target
> PC. When the kernel returns, execution continues at the target PC.
> So, the kernel does the work of the trampoline on behalf of the
> application.

This is quite clever, but now I'm wondering just how much kernel help
is really needed. In your series, the trampoline is a non-executable
page. I can think of at least three alternative approaches, and I'd
like to know the pros and cons.

1. Entirely userspace: a return trampoline would be something like:

1:
pushq %rax
pushq %rbx
pushq %rcx
...
pushq %r15
movq %rsp, %rdi   # pointer to saved regs
leaq 1b(%rip), %rsi   # pointer to the trampoline itself
callq trampoline_handler   # see below

You would fill a page with a bunch of these, possibly compacted to get
more per page, and then you would remap as many copies as needed. The
'callq trampoline_handler' part would need to be a bit clever to make
it continue to work despite this remapping. This will be *much*
faster than trampfd. How much of your use case would it cover? For
the inverse, it's not too hard to write a bit of asm to set all
registers and jump somewhere.
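For concreteness, trampoline_handler itself could be a small piece of C.
The sketch below is purely illustrative: the slot size, the closure
table, and how that table gets filled in are assumptions, not anything
the series (or this email's asm stub) defines.

#include <stddef.h>
#include <stdint.h>

#define TRAMP_SLOT_SIZE 64	/* assumed size of one asm stub copy */

struct tramp_closure {
	void (*target)(void *cookie, uint64_t *saved_regs);
	void *cookie;
};

/* One entry per trampoline slot; whoever hands out trampolines fills
 * these in when a trampoline is allocated. */
static struct tramp_closure closures[4096];
static const uint8_t *tramp_base;	/* base of the remapped stub area */

/* %rdi = the registers pushed by the stub,
 * %rsi = the stub's own address (the leaq 1b(%rip) value), which
 * identifies the particular remapped copy that was called. */
void trampoline_handler(uint64_t *saved_regs, const uint8_t *tramp_addr)
{
	size_t slot = (size_t)(tramp_addr - tramp_base) / TRAMP_SLOT_SIZE;
	struct tramp_closure *c = &closures[slot];

	c->target(c->cookie, saved_regs);
}

The only thing that distinguishes one remapped copy from another is its
virtual address, so that address is the natural key for looking up the
per-trampoline data.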
2. Use existing kernel functionality. Raise a signal, modify the
state, and return from the signal. This is very flexible and may not
be all that much slower than trampfd.
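On x86-64 with glibc, a minimal sketch of that approach could look like
the following; the signal number, the tramp_regs layout, and the
gregs[] indices are illustrative and glibc/x86-64 specific, not part of
the proposal.

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Register state to impose on return from the signal; filled in by the
 * caller before raising the signal.  Illustrative only. */
struct tramp_regs {
	uint64_t rdi, rsi, rip;
};

static struct tramp_regs pending;

static void tramp_sig_handler(int sig, siginfo_t *info, void *uc_void)
{
	ucontext_t *uc = uc_void;

	(void)sig;
	(void)info;

	/* Edit the interrupted context; the kernel's sigreturn path
	 * restores it when the handler returns. */
	uc->uc_mcontext.gregs[REG_RDI] = (greg_t)pending.rdi;
	uc->uc_mcontext.gregs[REG_RSI] = (greg_t)pending.rsi;
	uc->uc_mcontext.gregs[REG_RIP] = (greg_t)pending.rip;
}

void install_tramp_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = tramp_sig_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGUSR1, &sa, NULL);	/* any otherwise-unused signal */
}

The caller fills in pending and raises the signal; when the handler
returns, execution resumes at the new RIP with the new argument
registers, so the trampoline's work is done entirely with existing
machinery.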
3. Use a syscall. Instead of having the kernel handle page faults,
have the trampoline code push the syscall nr register, load a special
new syscall nr into the syscall nr register, and do a syscall. On
x86_64, this would be:

pushq %rax
movq $__NR_magic_trampoline, %rax
syscall

with some adjustment if the stack slot you're clobbering is important.

Also, will using trampfd cause issues with various unwinders? I can
easily imagine unwinders expecting code to be readable, although this
is slowly going away for other reasons.

All this being said, I think that the kernel should absolutely add a
sensible interface for JITs to use to materialize their code. This
would integrate sanely with LSMs and wouldn't require hacks like using
files, etc. A cleverly designed JIT interface could function without
serialization IPIs, and even lame architectures like x86 could
potentially avoid shootdown IPIs if the interface copied code instead
of playing virtual memory games. At its very simplest, this could be:

void *jit_create_code(const void *source, size_t len);

and the result would be a new anonymous mapping that contains exactly
the code requested. There could also be:

int jitfd_create(...);

that does something similar but creates a memfd. A nicer
implementation for short JIT sequences would allow appending more code
to an existing JIT region. On x86, an appendable JIT region would
start filled with 0xCC, and I bet there's a way to materialize new
code into a previously 0xcc-filled virtual page without any
synchronization. One approach would be to start with:

<some code>
0xcc
0xcc
...
0xcc

and to create a whole new page like:

<some code>
<some more code>
0xcc
...
0xcc

so that the only difference is that some code changed to some more
code. Then replace the PTE to swap from the old page to the new page,
and arrange to avoid freeing the old page until we're sure it's gone
from all TLBs. This may not work if <some more code> spans a page
boundary. The #BP fixup would zap the TLB and retry. Even just
directly copying code over some 0xcc bytes almost works, but there's a
nasty corner case involving instructions that span I$ fetch
boundaries. I'm not sure to what extent I$ snooping helps.

--Andy