Kernel-hardening archive on lore.kernel.org
 help / color / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: madvenka@linux.microsoft.com
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>,
	 Linux API <linux-api@vger.kernel.org>,
	 linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	 Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	 linux-integrity <linux-integrity@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 LSM List <linux-security-module@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,  X86 ML <x86@kernel.org>
Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
Date: Tue, 28 Jul 2020 10:31:59 -0700
Message-ID: <CALCETrVy5OMuUx04-wWk9FJbSxkrT2vMfN_kANinudrDwC4Cig@mail.gmail.com> (raw)
In-Reply-To: <20200728131050.24443-1-madvenka@linux.microsoft.com>

> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>

> The kernel creates the trampoline mapping without any permissions. When
> the trampoline is executed by user code, a page fault happens and the
> kernel gets control. The kernel recognizes that this is a trampoline
> invocation. It sets up the user registers based on the specified
> register context, and/or pushes values on the user stack based on the
> specified stack context, and sets the user PC to the requested target
> PC. When the kernel returns, execution continues at the target PC.
> So, the kernel does the work of the trampoline on behalf of the
> application.

This is quite clever, but now I’m wondering just how much kernel help
is really needed. In your series, the trampoline is an non-executable
page.  I can think of at least two alternative approaches, and I'd
like to know the pros and cons.

1. Entirely userspace: a return trampoline would be something like:

1:
pushq %rax
pushq %rbc
pushq %rcx
...
pushq %r15
movq %rsp, %rdi # pointer to saved regs
leaq 1b(%rip), %rsi # pointer to the trampoline itself
callq trampoline_handler # see below

You would fill a page with a bunch of these, possibly compacted to get
more per page, and then you would remap as many copies as needed.  The
'callq trampoline_handler' part would need to be a bit clever to make
it continue to work despite this remapping.  This will be *much*
faster than trampfd. How much of your use case would it cover?  For
the inverse, it's not too hard to write a bit of asm to set all
registers and jump somewhere.

2. Use existing kernel functionality.  Raise a signal, modify the
state, and return from the signal.  This is very flexible and may not
be all that much slower than trampfd.

3. Use a syscall.  Instead of having the kernel handle page faults,
have the trampoline code push the syscall nr register, load a special
new syscall nr into the syscall nr register, and do a syscall. On
x86_64, this would be:

pushq %rax
movq __NR_magic_trampoline, %rax
syscall

with some adjustment if the stack slot you're clobbering is important.


Also, will using trampfd cause issues with various unwinders?  I can
easily imagine unwinders expecting code to be readable, although this
is slowly going away for other reasons.

All this being said, I think that the kernel should absolutely add a
sensible interface for JITs to use to materialize their code.  This
would integrate sanely with LSMs and wouldn't require hacks like using
files, etc.  A cleverly designed JIT interface could function without
seriailization IPIs, and even lame architectures like x86 could
potentially avoid shootdown IPIs if the interface copied code instead
of playing virtual memory games.  At its very simplest, this could be:

void *jit_create_code(const void *source, size_t len);

and the result would be a new anonymous mapping that contains exactly
the code requested.  There could also be:

int jittfd_create(...);

that does something similar but creates a memfd.  A nicer
implementation for short JIT sequences would allow appending more code
to an existing JIT region.  On x86, an appendable JIT region would
start filled with 0xCC, and I bet there's a way to materialize new
code into a previously 0xcc-filled virtual page wthout any
synchronization.  One approach would be to start with:

<some code>
0xcc
0xcc
...
0xcc

and to create a whole new page like:

<some code>
<some more code>
0xcc
...
0xcc

so that the only difference is that some code changed to some more
code.  Then replace the PTE to swap from the old page to the new page,
and arrange to avoid freeing the old page until we're sure it's gone
from all TLBs.  This may not work if <some more code> spans a page
boundary.  The #BP fixup would zap the TLB and retry.  Even just
directly copying code over some 0xcc bytes almost works, but there's a
nasty corner case involving instructions that fetch I$ fetch
boundaries.  I'm not sure to what extent I$ snooping helps.

--Andy

  parent reply index

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <aefc85852ea518982e74b233e11e16d2e707bc32>
2020-07-28 13:10 ` madvenka
2020-07-28 13:10   ` [PATCH v1 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
2020-07-28 14:50     ` Oleg Nesterov
2020-07-28 14:58       ` Madhavan T. Venkataraman
2020-07-28 16:06         ` Oleg Nesterov
2020-07-28 13:10   ` [PATCH v1 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
2020-07-30  9:06     ` Greg KH
2020-07-30 14:25       ` Madhavan T. Venkataraman
2020-07-28 13:10   ` [PATCH v1 3/4] [RFC] arm64/trampfd: " madvenka
2020-07-28 13:10   ` [PATCH v1 4/4] [RFC] arm/trampfd: " madvenka
2020-07-28 15:13   ` [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor David Laight
2020-07-28 16:32     ` Madhavan T. Venkataraman
2020-07-28 17:16       ` Andy Lutomirski
2020-07-28 17:39         ` Madhavan T. Venkataraman
2020-07-29  5:16           ` Andy Lutomirski
2020-07-28 18:52         ` Madhavan T. Venkataraman
2020-07-29  8:36           ` David Laight
2020-07-29 17:55             ` Madhavan T. Venkataraman
2020-07-28 16:05   ` Casey Schaufler
2020-07-28 16:49     ` Madhavan T. Venkataraman
2020-07-28 17:05     ` James Morris
2020-07-28 17:08       ` Madhavan T. Venkataraman
2020-07-28 17:31   ` Andy Lutomirski [this message]
2020-07-28 19:01     ` Madhavan T. Venkataraman
2020-07-29 13:29     ` Florian Weimer
2020-07-30 13:09     ` David Laight
2020-08-02 11:56       ` Pavel Machek
2020-08-03  8:08         ` David Laight
2020-08-03 15:57           ` Madhavan T. Venkataraman
2020-07-30 14:24     ` Madhavan T. Venkataraman
2020-07-30 20:54       ` Andy Lutomirski
2020-07-31 17:13         ` Madhavan T. Venkataraman
2020-07-31 18:31           ` Mark Rutland
2020-08-03  8:27             ` David Laight
2020-08-03 16:03               ` Madhavan T. Venkataraman
2020-08-03 16:57                 ` David Laight
2020-08-03 17:00                   ` Madhavan T. Venkataraman
2020-08-03 17:58             ` Madhavan T. Venkataraman
2020-08-04 13:55               ` Mark Rutland
2020-08-04 14:33                 ` David Laight
2020-08-04 14:44                   ` David Laight
2020-08-04 14:48                   ` Madhavan T. Venkataraman
2020-08-04 15:46                 ` Madhavan T. Venkataraman
2020-08-02 13:57           ` Florian Weimer
2020-07-30 14:42     ` Madhavan T. Venkataraman
2020-08-02 18:54     ` Madhavan T. Venkataraman
2020-08-02 20:00       ` Andy Lutomirski
2020-08-02 22:58         ` Madhavan T. Venkataraman
2020-08-03 18:36         ` Madhavan T. Venkataraman
2020-08-10 17:20         ` Madhavan T. Venkataraman
2020-08-10 17:34         ` Madhavan T. Venkataraman
2020-08-11 21:12           ` Madhavan T. Venkataraman
2020-08-03  8:23       ` David Laight
2020-08-03 15:59         ` Madhavan T. Venkataraman
2020-07-31 18:09   ` Mark Rutland
2020-07-31 20:08     ` Madhavan T. Venkataraman
2020-08-03 16:57     ` Madhavan T. Venkataraman
2020-08-04 14:30       ` Mark Rutland
2020-08-06 17:26         ` Madhavan T. Venkataraman
2020-08-08 22:17           ` Pavel Machek
2020-08-11 12:41             ` Madhavan T. Venkataraman
2020-08-11 13:08               ` Pavel Machek
2020-08-11 15:54                 ` Madhavan T. Venkataraman
2020-08-12 10:06           ` Mark Rutland
2020-08-12 18:47             ` Madhavan T. Venkataraman
2020-08-19 18:53             ` Mickaël Salaün
2020-09-01 15:42               ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrVy5OMuUx04-wWk9FJbSxkrT2vMfN_kANinudrDwC4Cig@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=madvenka@linux.microsoft.com \
    --cc=oleg@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Kernel-hardening archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernel-hardening/0 kernel-hardening/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernel-hardening kernel-hardening/ https://lore.kernel.org/kernel-hardening \
		kernel-hardening@lists.openwall.com
	public-inbox-index kernel-hardening

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/com.openwall.lists.kernel-hardening


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git