From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Date: Thu, 30 Jul 2020 13:54:03 -0700
Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
To: "Madhavan T. Venkataraman"
Cc: Andy Lutomirski, Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel, linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML
References: <20200728131050.24443-1-madvenka@linux.microsoft.com> <6540b4b7-3f70-adbf-c922-43886599713a@linux.microsoft.com>
In-Reply-To: <6540b4b7-3f70-adbf-c922-43886599713a@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 30, 2020 at 7:24 AM Madhavan T. Venkataraman wrote:
>
> Sorry for the delay. I just wanted to think about this a little.
> In this email, I will respond to your first suggestion. I will
> respond to the rest in separate emails if that is alright with
> you.
>
> On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>
> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>
> From: "Madhavan T. Venkataraman"
>
> The kernel creates the trampoline mapping without any permissions. When
> the trampoline is executed by user code, a page fault happens and the
> kernel gets control.
> The kernel recognizes that this is a trampoline
> invocation. It sets up the user registers based on the specified
> register context, and/or pushes values on the user stack based on the
> specified stack context, and sets the user PC to the requested target
> PC. When the kernel returns, execution continues at the target PC.
> So, the kernel does the work of the trampoline on behalf of the
> application.
>
> This is quite clever, but now I’m wondering just how much kernel help
> is really needed. In your series, the trampoline is a non-executable
> page. I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
> pushq %rax
> pushq %rbx
> pushq %rcx
> ...
> pushq %r15
> movq %rsp, %rdi # pointer to saved regs
> leaq 1b(%rip), %rsi # pointer to the trampoline itself
> callq trampoline_handler # see below
>
> You would fill a page with a bunch of these, possibly compacted to get
> more per page, and then you would remap as many copies as needed. The
> 'callq trampoline_handler' part would need to be a bit clever to make
> it continue to work despite this remapping. This will be *much*
> faster than trampfd. How much of your use case would it cover? For
> the inverse, it's not too hard to write a bit of asm to set all
> registers and jump somewhere.
>
> Let me state what I have understood about this suggestion. Correct me if
> I get anything wrong. If you don't mind, I will also take the liberty
> of generalizing and paraphrasing your suggestion.
>
> The goal is to create two page mappings that are adjacent to each other:
>
> - a code page that contains template code for a trampoline. Since the
>   template code would tend to be small in size, pack as many of them
>   as possible within a page to conserve memory. In other words, create
>   an array of the template code fragments. Each element in the array
>   would be used for one trampoline instance.
>
> - a data page that contains an array of data elements. Corresponding
>   to each code element in the code page, there would be a data element
>   in the data page that would contain data that is specific to a
>   trampoline instance.
>
> - Code will access data using PC-relative addressing.
>
> The management of the code pages and allocation for each trampoline
> instance would all be done in user space.
>
> Is this the general idea?

Yes.

>
> Creating a code page
> --------------------
>
> We can do this in one of the following ways:
>
> - Allocate a writable page at run time, write the template code into
>   the page and have execute permissions on the page.
>
> - Allocate a writable page at run time, write the template code into
>   the page and remap the page with just execute permissions.
>
> - Allocate a writable page at run time, write the template code into
>   the page, write the page into a temporary file and map the file with
>   execute permissions.
>
> - Include the template code in a code page at build time itself and
>   just remap the code page each time you need a code page.

This latter part shouldn't need any special permissions as far as I know.

>
> Pros and Cons
> -------------
>
> As long as the OS provides the functionality to do this and the security
> subsystem in the OS allows the actions, this is totally feasible. If not,
> we need something like trampfd.
>
> As Floren mentioned, libffi does implement something like this for MACH.
>
> In fact, in my libffi changes, I use trampfd only after all the other methods
> have failed because of security settings.
>
> But the above approach only solves the problem for this simple type of
> trampoline. It does not provide a framework for addressing more complex types
> or even other forms of dynamic code.
>
> Also, each application would need to implement this solution for itself
> as opposed to relying on one implementation provided by the kernel.

I would argue this is a benefit.
If the whole implementation is in userspace, there is no ABI
compatibility issue. The user program contains the trampoline code
and the code that uses it.

>
> Trampfd-based solution
> ----------------------
>
> I outlined an enhancement to trampfd in a response to David Laight. In this
> enhancement, the kernel is the one that would set up the code page.
>
> The kernel would call an arch-specific support function to generate the
> code required to load registers, push values on the stack and jump to a PC
> for a trampoline instance based on its current context. The trampoline
> instance data could be baked into the code.
>
> My initial idea was to only have one trampoline instance per page. But I
> think I can implement multiple instances per page. I just have to manage
> the trampfd file private data and VMA private data accordingly to map an
> element in a code page to its trampoline object.
>
> The two approaches are similar except for the detail about who sets up
> and manages the trampoline pages. In both approaches, the performance problem
> is addressed. But trampfd can be used even when security settings are
> restrictive.
>
> Is my solution acceptable?

Perhaps. In general, before adding a new ABI to the kernel, it's nice
to understand how it's better than doing the same thing in userspace.
Saying that it's easier for user code to work with if it's in the
kernel isn't necessarily an adequate justification.

Why would remapping two pages of actual application text ever fail?
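
For reference, the adjacent code page / data page layout discussed above
can be illustrated with a small sketch of the address arithmetic. This is
only a model, written in Python as a calculator; it maps no memory and
generates no code. PAGE_SIZE, SLOT_SIZE and both helper functions are
hypothetical names chosen for illustration and come from neither the
patch series nor libffi. It assumes each template and its data element
occupy slots of the same size, and that the data page is mapped
immediately after the code page.

```python
PAGE_SIZE = 4096   # assumed page size
SLOT_SIZE = 64     # hypothetical size of one template and of one data element
SLOTS_PER_PAGE = PAGE_SIZE // SLOT_SIZE


def slot_addresses(pair_base, index):
    """Return (code_addr, data_addr) for trampoline `index`.

    `pair_base` is where the code page is mapped; the data page is
    assumed to be mapped immediately after it.
    """
    if not 0 <= index < SLOTS_PER_PAGE:
        raise IndexError("no such trampoline slot")
    code_addr = pair_base + index * SLOT_SIZE
    data_addr = pair_base + PAGE_SIZE + index * SLOT_SIZE
    return code_addr, data_addr


def pc_relative_displacement(pair_base, index):
    """Distance from a template to its own data element."""
    code_addr, data_addr = slot_addresses(pair_base, index)
    return data_addr - code_addr


# Collect the displacement for every slot at two arbitrary mapping addresses.
displacements = {
    pc_relative_displacement(base, i)
    for base in (0x7f0000000000, 0x7f00dead0000)
    for i in range(SLOTS_PER_PAGE)
}
print(displacements)
```

Under these assumptions the set contains a single value, PAGE_SIZE: the
displacement does not depend on the slot index or on where the pair is
mapped. That is why a template that reaches its data PC-relatively could
be remapped any number of times without being rewritten.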