All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Linux API <linux-api@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>, X86 ML <x86@kernel.org>
Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
Date: Tue, 28 Jul 2020 14:01:12 -0500	[thread overview]
Message-ID: <8b28f4a5-2d9e-0686-40e5-2ea9e37c5933@linux.microsoft.com> (raw)
In-Reply-To: <CALCETrVy5OMuUx04-wWk9FJbSxkrT2vMfN_kANinudrDwC4Cig@mail.gmail.com>

I am working on a response to this. I will send it soon.

Thanks.

Madhavan

On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>>
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> The kernel creates the trampoline mapping without any permissions. When
>> the trampoline is executed by user code, a page fault happens and the
>> kernel gets control. The kernel recognizes that this is a trampoline
>> invocation. It sets up the user registers based on the specified
>> register context, and/or pushes values on the user stack based on the
>> specified stack context, and sets the user PC to the requested target
>> PC. When the kernel returns, execution continues at the target PC.
>> So, the kernel does the work of the trampoline on behalf of the
>> application.
> This is quite clever, but now I’m wondering just how much kernel help
> is really needed. In your series, the trampoline is an non-executable
> page.  I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
> pushq %rax
> pushq %rbc
> pushq %rcx
> ...
> pushq %r15
> movq %rsp, %rdi # pointer to saved regs
> leaq 1b(%rip), %rsi # pointer to the trampoline itself
> callq trampoline_handler # see below
>
> You would fill a page with a bunch of these, possibly compacted to get
> more per page, and then you would remap as many copies as needed.  The
> 'callq trampoline_handler' part would need to be a bit clever to make
> it continue to work despite this remapping.  This will be *much*
> faster than trampfd. How much of your use case would it cover?  For
> the inverse, it's not too hard to write a bit of asm to set all
> registers and jump somewhere.
>
> 2. Use existing kernel functionality.  Raise a signal, modify the
> state, and return from the signal.  This is very flexible and may not
> be all that much slower than trampfd.
>
> 3. Use a syscall.  Instead of having the kernel handle page faults,
> have the trampoline code push the syscall nr register, load a special
> new syscall nr into the syscall nr register, and do a syscall. On
> x86_64, this would be:
>
> pushq %rax
> movq __NR_magic_trampoline, %rax
> syscall
>
> with some adjustment if the stack slot you're clobbering is important.
>
>
> Also, will using trampfd cause issues with various unwinders?  I can
> easily imagine unwinders expecting code to be readable, although this
> is slowly going away for other reasons.
>
> All this being said, I think that the kernel should absolutely add a
> sensible interface for JITs to use to materialize their code.  This
> would integrate sanely with LSMs and wouldn't require hacks like using
> files, etc.  A cleverly designed JIT interface could function without
> seriailization IPIs, and even lame architectures like x86 could
> potentially avoid shootdown IPIs if the interface copied code instead
> of playing virtual memory games.  At its very simplest, this could be:
>
> void *jit_create_code(const void *source, size_t len);
>
> and the result would be a new anonymous mapping that contains exactly
> the code requested.  There could also be:
>
> int jittfd_create(...);
>
> that does something similar but creates a memfd.  A nicer
> implementation for short JIT sequences would allow appending more code
> to an existing JIT region.  On x86, an appendable JIT region would
> start filled with 0xCC, and I bet there's a way to materialize new
> code into a previously 0xcc-filled virtual page wthout any
> synchronization.  One approach would be to start with:
>
> <some code>
> 0xcc
> 0xcc
> ...
> 0xcc
>
> and to create a whole new page like:
>
> <some code>
> <some more code>
> 0xcc
> ...
> 0xcc
>
> so that the only difference is that some code changed to some more
> code.  Then replace the PTE to swap from the old page to the new page,
> and arrange to avoid freeing the old page until we're sure it's gone
> from all TLBs.  This may not work if <some more code> spans a page
> boundary.  The #BP fixup would zap the TLB and retry.  Even just
> directly copying code over some 0xcc bytes almost works, but there's a
> nasty corner case involving instructions that fetch I$ fetch
> boundaries.  I'm not sure to what extent I$ snooping helps.
>
> --Andy


WARNING: multiple messages have this Message-ID (diff)
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Linux API <linux-api@vger.kernel.org>, X86 ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	LSM List <linux-security-module@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
Date: Tue, 28 Jul 2020 14:01:12 -0500	[thread overview]
Message-ID: <8b28f4a5-2d9e-0686-40e5-2ea9e37c5933@linux.microsoft.com> (raw)
In-Reply-To: <CALCETrVy5OMuUx04-wWk9FJbSxkrT2vMfN_kANinudrDwC4Cig@mail.gmail.com>

I am working on a response to this. I will send it soon.

Thanks.

Madhavan

On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>> On Jul 28, 2020, at 6:11 AM, madvenka@linux.microsoft.com wrote:
>>
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> The kernel creates the trampoline mapping without any permissions. When
>> the trampoline is executed by user code, a page fault happens and the
>> kernel gets control. The kernel recognizes that this is a trampoline
>> invocation. It sets up the user registers based on the specified
>> register context, and/or pushes values on the user stack based on the
>> specified stack context, and sets the user PC to the requested target
>> PC. When the kernel returns, execution continues at the target PC.
>> So, the kernel does the work of the trampoline on behalf of the
>> application.
> This is quite clever, but now I’m wondering just how much kernel help
> is really needed. In your series, the trampoline is an non-executable
> page.  I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
> pushq %rax
> pushq %rbc
> pushq %rcx
> ...
> pushq %r15
> movq %rsp, %rdi # pointer to saved regs
> leaq 1b(%rip), %rsi # pointer to the trampoline itself
> callq trampoline_handler # see below
>
> You would fill a page with a bunch of these, possibly compacted to get
> more per page, and then you would remap as many copies as needed.  The
> 'callq trampoline_handler' part would need to be a bit clever to make
> it continue to work despite this remapping.  This will be *much*
> faster than trampfd. How much of your use case would it cover?  For
> the inverse, it's not too hard to write a bit of asm to set all
> registers and jump somewhere.
>
> 2. Use existing kernel functionality.  Raise a signal, modify the
> state, and return from the signal.  This is very flexible and may not
> be all that much slower than trampfd.
>
> 3. Use a syscall.  Instead of having the kernel handle page faults,
> have the trampoline code push the syscall nr register, load a special
> new syscall nr into the syscall nr register, and do a syscall. On
> x86_64, this would be:
>
> pushq %rax
> movq __NR_magic_trampoline, %rax
> syscall
>
> with some adjustment if the stack slot you're clobbering is important.
>
>
> Also, will using trampfd cause issues with various unwinders?  I can
> easily imagine unwinders expecting code to be readable, although this
> is slowly going away for other reasons.
>
> All this being said, I think that the kernel should absolutely add a
> sensible interface for JITs to use to materialize their code.  This
> would integrate sanely with LSMs and wouldn't require hacks like using
> files, etc.  A cleverly designed JIT interface could function without
> seriailization IPIs, and even lame architectures like x86 could
> potentially avoid shootdown IPIs if the interface copied code instead
> of playing virtual memory games.  At its very simplest, this could be:
>
> void *jit_create_code(const void *source, size_t len);
>
> and the result would be a new anonymous mapping that contains exactly
> the code requested.  There could also be:
>
> int jittfd_create(...);
>
> that does something similar but creates a memfd.  A nicer
> implementation for short JIT sequences would allow appending more code
> to an existing JIT region.  On x86, an appendable JIT region would
> start filled with 0xCC, and I bet there's a way to materialize new
> code into a previously 0xcc-filled virtual page wthout any
> synchronization.  One approach would be to start with:
>
> <some code>
> 0xcc
> 0xcc
> ...
> 0xcc
>
> and to create a whole new page like:
>
> <some code>
> <some more code>
> 0xcc
> ...
> 0xcc
>
> so that the only difference is that some code changed to some more
> code.  Then replace the PTE to swap from the old page to the new page,
> and arrange to avoid freeing the old page until we're sure it's gone
> from all TLBs.  This may not work if <some more code> spans a page
> boundary.  The #BP fixup would zap the TLB and retry.  Even just
> directly copying code over some 0xcc bytes almost works, but there's a
> nasty corner case involving instructions that fetch I$ fetch
> boundaries.  I'm not sure to what extent I$ snooping helps.
>
> --Andy


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-07-28 19:01 UTC|newest]

Thread overview: 146+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <aefc85852ea518982e74b233e11e16d2e707bc32>
2020-07-28 13:10 ` [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor madvenka
2020-07-28 13:10   ` madvenka
2020-07-28 13:10   ` [PATCH v1 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
2020-07-28 13:10     ` madvenka
2020-07-28 14:50     ` Oleg Nesterov
2020-07-28 14:50       ` Oleg Nesterov
2020-07-28 14:58       ` Madhavan T. Venkataraman
2020-07-28 14:58         ` Madhavan T. Venkataraman
2020-07-28 16:06         ` Oleg Nesterov
2020-07-28 16:06           ` Oleg Nesterov
2020-07-28 19:48     ` kernel test robot
2020-07-29  2:33     ` kernel test robot
2020-07-28 13:10   ` [PATCH v1 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
2020-07-28 13:10     ` madvenka
2020-07-28 18:38     ` kernel test robot
2020-07-30  9:06     ` Greg KH
2020-07-30  9:06       ` Greg KH
2020-07-30 14:25       ` Madhavan T. Venkataraman
2020-07-30 14:25         ` Madhavan T. Venkataraman
2020-07-28 13:10   ` [PATCH v1 3/4] [RFC] arm64/trampfd: " madvenka
2020-07-28 13:10     ` madvenka
2020-07-28 13:10   ` [PATCH v1 4/4] [RFC] arm/trampfd: " madvenka
2020-07-28 13:10     ` madvenka
2020-07-28 15:13   ` [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor David Laight
2020-07-28 15:13     ` David Laight
2020-07-28 15:13     ` David Laight
2020-07-28 16:32     ` Madhavan T. Venkataraman
2020-07-28 16:32       ` Madhavan T. Venkataraman
2020-07-28 16:32       ` Madhavan T. Venkataraman
2020-07-28 17:16       ` Andy Lutomirski
2020-07-28 17:16         ` Andy Lutomirski
2020-07-28 17:16         ` Andy Lutomirski
2020-07-28 17:39         ` Madhavan T. Venkataraman
2020-07-29  5:16           ` Andy Lutomirski
2020-07-29  5:16             ` Andy Lutomirski
2020-07-29  5:16             ` Andy Lutomirski
2020-07-28 18:52         ` Madhavan T. Venkataraman
2020-07-28 18:52           ` Madhavan T. Venkataraman
2020-07-28 18:52           ` Madhavan T. Venkataraman
2020-07-29  8:36           ` David Laight
2020-07-29  8:36             ` David Laight
2020-07-29  8:36             ` David Laight
2020-07-29 17:55             ` Madhavan T. Venkataraman
2020-07-29 17:55               ` Madhavan T. Venkataraman
2020-07-29 17:55               ` Madhavan T. Venkataraman
2020-07-28 16:05   ` Casey Schaufler
2020-07-28 16:05     ` Casey Schaufler
2020-07-28 16:49     ` Madhavan T. Venkataraman
2020-07-28 16:49       ` Madhavan T. Venkataraman
2020-07-28 17:05     ` James Morris
2020-07-28 17:05       ` James Morris
2020-07-28 17:08       ` Madhavan T. Venkataraman
2020-07-28 17:08         ` Madhavan T. Venkataraman
2020-07-28 17:31   ` Andy Lutomirski
2020-07-28 17:31     ` Andy Lutomirski
2020-07-28 17:31     ` Andy Lutomirski
2020-07-28 19:01     ` Madhavan T. Venkataraman [this message]
2020-07-28 19:01       ` Madhavan T. Venkataraman
2020-07-29 13:29     ` Florian Weimer
2020-07-29 13:29       ` Florian Weimer
2020-07-29 13:29       ` Florian Weimer
2020-07-30 13:09     ` David Laight
2020-07-30 13:09       ` David Laight
2020-08-02 11:56       ` Pavel Machek
2020-08-02 11:56         ` Pavel Machek
2020-08-03  8:08         ` David Laight
2020-08-03  8:08           ` David Laight
2020-08-03 15:57           ` Madhavan T. Venkataraman
2020-08-03 15:57             ` Madhavan T. Venkataraman
2020-07-30 14:24     ` Madhavan T. Venkataraman
2020-07-30 20:54       ` Andy Lutomirski
2020-07-30 20:54         ` Andy Lutomirski
2020-07-30 20:54         ` Andy Lutomirski
2020-07-31 17:13         ` Madhavan T. Venkataraman
2020-07-31 17:13           ` Madhavan T. Venkataraman
2020-07-31 18:31           ` Mark Rutland
2020-07-31 18:31             ` Mark Rutland
2020-08-03  8:27             ` David Laight
2020-08-03  8:27               ` David Laight
2020-08-03 16:03               ` Madhavan T. Venkataraman
2020-08-03 16:03                 ` Madhavan T. Venkataraman
2020-08-03 16:57                 ` David Laight
2020-08-03 16:57                   ` David Laight
2020-08-03 17:00                   ` Madhavan T. Venkataraman
2020-08-03 17:00                     ` Madhavan T. Venkataraman
2020-08-03 17:58             ` Madhavan T. Venkataraman
2020-08-03 17:58               ` Madhavan T. Venkataraman
2020-08-04 13:55               ` Mark Rutland
2020-08-04 13:55                 ` Mark Rutland
2020-08-04 14:33                 ` David Laight
2020-08-04 14:33                   ` David Laight
2020-08-04 14:44                   ` David Laight
2020-08-04 14:44                     ` David Laight
2020-08-04 14:48                   ` Madhavan T. Venkataraman
2020-08-04 14:48                     ` Madhavan T. Venkataraman
2020-08-04 15:46                 ` Madhavan T. Venkataraman
2020-08-04 15:46                   ` Madhavan T. Venkataraman
2020-08-02 13:57           ` Florian Weimer
2020-08-02 13:57             ` Florian Weimer
2020-08-02 13:57             ` Florian Weimer
2020-07-30 14:42     ` Madhavan T. Venkataraman
2020-07-30 14:42       ` Madhavan T. Venkataraman
2020-08-02 18:54     ` Madhavan T. Venkataraman
2020-08-02 18:54       ` Madhavan T. Venkataraman
2020-08-02 20:00       ` Andy Lutomirski
2020-08-02 20:00         ` Andy Lutomirski
2020-08-02 20:00         ` Andy Lutomirski
2020-08-02 22:58         ` Madhavan T. Venkataraman
2020-08-02 22:58           ` Madhavan T. Venkataraman
2020-08-03 18:36         ` Madhavan T. Venkataraman
2020-08-03 18:36           ` Madhavan T. Venkataraman
2020-08-10 17:20         ` Madhavan T. Venkataraman
2020-08-10 17:34         ` Madhavan T. Venkataraman
2020-08-10 17:34           ` Madhavan T. Venkataraman
2020-08-11 21:12           ` Madhavan T. Venkataraman
2020-08-11 21:12             ` Madhavan T. Venkataraman
2020-08-03  8:23       ` David Laight
2020-08-03  8:23         ` David Laight
2020-08-03 15:59         ` Madhavan T. Venkataraman
2020-08-03 15:59           ` Madhavan T. Venkataraman
2020-07-31 18:09   ` Mark Rutland
2020-07-31 18:09     ` Mark Rutland
2020-07-31 20:08     ` Madhavan T. Venkataraman
2020-07-31 20:08       ` Madhavan T. Venkataraman
2020-08-03 16:57     ` Madhavan T. Venkataraman
2020-08-03 16:57       ` Madhavan T. Venkataraman
2020-08-04 14:30       ` Mark Rutland
2020-08-04 14:30         ` Mark Rutland
2020-08-06 17:26         ` Madhavan T. Venkataraman
2020-08-06 17:26           ` Madhavan T. Venkataraman
2020-08-08 22:17           ` Pavel Machek
2020-08-08 22:17             ` Pavel Machek
2020-08-11 12:41             ` Madhavan T. Venkataraman
2020-08-11 12:41               ` Madhavan T. Venkataraman
2020-08-11 13:08               ` Pavel Machek
2020-08-11 13:08                 ` Pavel Machek
2020-08-11 15:54                 ` Madhavan T. Venkataraman
2020-08-11 15:54                   ` Madhavan T. Venkataraman
2020-08-12 10:06           ` Mark Rutland
2020-08-12 10:06             ` Mark Rutland
2020-08-12 18:47             ` Madhavan T. Venkataraman
2020-08-12 18:47               ` Madhavan T. Venkataraman
2020-08-19 18:53             ` Mickaël Salaün
2020-08-19 18:53               ` Mickaël Salaün
2020-09-01 15:42               ` Mark Rutland
2020-09-01 15:42                 ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b28f4a5-2d9e-0686-40e5-2ea9e37c5933@linux.microsoft.com \
    --to=madvenka@linux.microsoft.com \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.