linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
       [not found] <210d7cd762d5307c2aa1676705b392bd445f1baa>
@ 2020-09-16 15:08 ` madvenka
  2020-09-16 15:08   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
                     ` (4 more replies)
  2020-09-22 21:53 ` madvenka
  1 sibling, 5 replies; 50+ messages in thread
From: madvenka @ 2020-09-16 15:08 UTC
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc.), which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.

Examples of trampolines
=======================

libffi (A Portable Foreign Function Interface Library):

libffi allows a user to define functions with an arbitrary list of
arguments and return value through a feature called "Closures".
Closures use trampolines to jump to ABI handlers that handle calling
conventions and call a target function. libffi is used by a lot
of different applications. To name a few:

	- Python
	- Java
	- JavaScript
	- Ruby FFI
	- Lisp
	- Objective-C

GCC nested functions:

GCC has traditionally used trampolines for implementing nested
functions. The trampoline is placed on the user stack. So, the stack
needs to be executable.

Currently available solution
============================

One solution that has been proposed to allow trampolines to be executed
without making security exceptions is Trampoline Emulation. See:

https://pax.grsecurity.net/docs/emutramp.txt

In this solution, the kernel recognizes certain sequences of instructions
as "well-known" trampolines. When such a trampoline is executed, a page
fault happens because the trampoline page does not have execute permission.
The kernel recognizes the trampoline and emulates it. Basically, the
kernel does the work of the trampoline on behalf of the application.

Currently, the emulated trampolines are the ones used in libffi and GCC
nested functions. To my knowledge, only X86 is supported at this time.

As noted in emutramp.txt, this is not a generic solution. For every new
trampoline that needs to be supported, new instruction sequences need to
be recognized by the kernel and emulated. And this has to be done for
every architecture that needs to be supported.

emutramp.txt notes the following:

"... the real solution is not in emulation but by designing a kernel API
for runtime code generation and modifying userland to make use of it."

Solution proposed in this RFC
=============================

From this RFC's perspective, there are two scenarios for dynamic code:

Scenario 1
----------

We know what code we need only at runtime. For instance, JIT code generated
for frequently executed Java methods. Only at runtime do we know what
methods need to be JIT compiled. Such code cannot be statically defined. It
has to be generated at runtime.

Scenario 2
----------

We know what code we need in advance. User trampolines are a good example of
this. It is possible to define such code statically with some help from the
kernel.

This RFC addresses (2). (1) needs a general purpose trusted code generator
and is out of scope for this RFC.

For (2), the solution is to convert dynamic code to static code and place it
in a source file. The binary generated from the source can be signed. The
kernel can use signature verification to authenticate the binary and
allow the code to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution proposed here is:

1. Write the static code assuming that the data needed by the code is already
   pointed to by a designated register.

2. Get the kernel to supply a small universal trampoline that does the
   following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Trampoline File Descriptor (trampfd)
====================================

I am proposing a kernel API using anonymous file descriptors that can be
used to create the trampolines. The API is described in patch 1/4 of this
patchset. I provide a summary here:

	- Create a trampoline file object

	- Write a code descriptor into the trampoline file and specify:

		- the number of trampolines desired
		- the name of the code register
		- user pointer to a table of code addresses, one address
		  per trampoline

	- Write a data descriptor into the trampoline file and specify:

		- the name of the data register
		- user pointer to a table of data addresses, one address
		  per trampoline

	- mmap() the trampoline file. The kernel generates a table of
	  trampolines in a page and returns the trampoline table address

	- munmap() a trampoline file mapping

	- Close the trampoline file

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.

It is recommended that the code descriptor and the code table be placed
in the .rodata section so an attacker cannot modify them.

Trampoline use and reuse
========================

The code for trampoline X in the trampoline table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

The addresses &code_table[X] and &data_table[X] are baked into the
trampoline code. So, PC-relative data references are not needed. The user
can modify code_table[X] and data_table[X] dynamically.

For instance, within libffi, the same trampoline X can be used for different
closures at different times by setting:

	data_table[X] = closure;
	code_table[X] = ABI handling code;
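
As a rough illustration (not part of the patchset), the behaviour of
trampoline X and its reuse can be modelled in plain C. The real trampoline
is a few machine instructions generated by the kernel; here the two loads
and the jump are written as ordinary C statements. The names struct closure,
abi_handler(), bind_trampoline(), invoke_trampoline() and reuse_demo() are
made up for this sketch, and the "data register" is modelled as a function
argument.

```c
#include <assert.h>
#include <stddef.h>

#define NTRAMPOLINES	4

/* Static code: receives its data through one designated "register". */
typedef int (*static_code_t)(void *data_reg);

/* User-managed tables; a trampoline only knows the address of its slot. */
static static_code_t	code_table[NTRAMPOLINES];
static void		*data_table[NTRAMPOLINES];

/* Hypothetical closure: a callee plus a bound argument. */
struct closure {
	int	(*target)(int);
	int	arg;
};

/* Stand-in for the ABI handling code of a library such as libffi. */
static int abi_handler(void *data_reg)
{
	struct closure	*c = data_reg;

	return c->target(c->arg);
}

static int twice(int v)  { return 2 * v; }
static int negate(int v) { return -v; }

/* Models trampoline X: two loads through fixed slots, then a jump. */
int invoke_trampoline(size_t x)
{
	static_code_t	code_reg;
	void		*data_reg;

	assert(x < NTRAMPOLINES);
	code_reg = code_table[x];	/* load &code_table[X]; load (code_reg) */
	data_reg = data_table[x];	/* load &data_table[X]; load (data_reg) */
	return code_reg(data_reg);	/* jump code_reg */
}

/* Bind slot X to a closure; calling this again reuses the same slot. */
void bind_trampoline(size_t x, struct closure *c)
{
	assert(x < NTRAMPOLINES);
	data_table[x] = c;
	code_table[x] = abi_handler;
}

/* Use slot 1 for two different closures, one after the other. */
void reuse_demo(int *first, int *second)
{
	struct closure	a = { twice, 21 };
	struct closure	b = { negate, 7 };

	bind_trampoline(1, &a);
	*first = invoke_trampoline(1);	/* runs twice(21) */

	bind_trampoline(1, &b);
	*second = invoke_trampoline(1);	/* runs negate(7) */
}
```

Only the table entries change between the two invocations. The trampoline
itself is never rewritten, which is why its page can stay mapped R-X.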

Advantages of the Trampoline File Descriptor approach
=====================================================

- Using this support from the kernel, dynamic code can be converted to
  static code with a little effort so applications and libraries can move to
  a more secure model. In the simplest cases such as libffi, dynamic code can
  even be eliminated.

- This initial work is targeted towards X86 and ARM. But it can be supported
  easily on all architectures. We don't need any special ISA features such
  as PC-relative data referencing.

- The only code generation needed is for this small, universal trampoline.

- The kernel does not have to deal with any ABI issues in the generation of
  this trampoline.

- The kernel provides a trampoline table to conserve memory.

- An SELinux setting called "exectramp" can be implemented along the
  lines of "execmem", "execstack" and "execheap" to selectively allow the
  use of trampolines on a per application basis.

- In version 1, a trip to the kernel was required to execute the trampoline.
  In version 2, that is not required: the trampoline runs entirely in user
  space, so invoking it adds no system call overhead.

libffi
======

I have implemented my solution for libffi and provided the changes for
X86 and ARM, 32-bit and 64-bit. Here is the reference patch:

http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt

If the trampfd patchset gets accepted, I will send the libffi changes
to the maintainers for a review. BTW, I have also successfully executed
the libffi self tests.

Work that is pending
====================

- I am working on implementing the SELinux setting - "exectramp".

- I have a test program to test the kernel API. I am working on adding it
  to selftests.

References
==========

[1] https://microsoft.github.io/ipe/
---

Changelog:

v1
	Introduced the Trampfd feature.

v2
	- Changed the system call. Version 2 does not support different
	  trampoline types and their associated type structures. It only
	  supports a kernel generated trampoline.

	  The system call now returns information to the user that is
	  used to define trampoline descriptors. E.g., the maximum
	  number of trampolines that can be packed in a single page.

	- Removed all the trampoline contexts such as register contexts
	  and stack contexts. This is based on the feedback that the kernel
	  should not have to worry about ABI issues and H/W features that
	  may deal with the context of a process.

	- Removed the need to make a trip into the kernel on trampoline
	  invocation. This is based on the feedback about performance.

	- Removed the ability to share trampolines across address spaces.
	  Sharing would have made sense for some trampoline types, depending
	  on their semantics. But since I support only one specific
	  trampoline, sharing does not make sense.

	- Added calls to specify trampoline descriptors that the kernel
	  uses to generate trampolines.

	- Added architecture-specific code to generate the small, universal
	  trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.

	- Implemented the trampoline table in a page.

Madhavan T. Venkataraman (4):
  Implement the kernel API for the trampoline file descriptor.
  Implement i386 and X86 support for the trampoline file descriptor.
  Implement ARM64 support for the trampoline file descriptor.
  Implement ARM support for the trampoline file descriptor.

 arch/arm/include/uapi/asm/ptrace.h     |  21 +++
 arch/arm/kernel/Makefile               |   1 +
 arch/arm/kernel/trampfd.c              | 124 +++++++++++++
 arch/arm/tools/syscall.tbl             |   1 +
 arch/arm64/include/asm/unistd.h        |   2 +-
 arch/arm64/include/asm/unistd32.h      |   2 +
 arch/arm64/include/uapi/asm/ptrace.h   |  59 ++++++
 arch/arm64/kernel/Makefile             |   2 +
 arch/arm64/kernel/trampfd.c            | 244 +++++++++++++++++++++++++
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 ++++++++++++++++++++++++
 fs/Makefile                            |   1 +
 fs/trampfd/Makefile                    |   5 +
 fs/trampfd/trampfd_fops.c              | 241 ++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c               | 142 ++++++++++++++
 include/linux/syscalls.h               |   2 +
 include/linux/trampfd.h                |  49 +++++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/trampfd.h           | 184 +++++++++++++++++++
 init/Kconfig                           |   7 +
 kernel/sys_ni.c                        |   3 +
 24 files changed, 1371 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/kernel/trampfd.c
 create mode 100644 arch/arm64/kernel/trampfd.c
 create mode 100644 arch/x86/kernel/trampfd.c
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

-- 
2.17.1



* [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
@ 2020-09-16 15:08   ` madvenka
  2020-09-16 15:08   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-16 15:08 UTC
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc.), which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. My first use case for this
RFC is libffi.

Solution
========

The solution is to convert dynamic code to static code and place it in a
source file. The binary generated from the source can be signed. The kernel
can use signature verification to authenticate the binary and allow the code
to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution has two parts:

1. The maintainer of the code writes the static code assuming that the data
   needed by the code is already pointed to by a designated register.

2. The kernel supplies a small universal trampoline that does the following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Kernel API
==========

A kernel API based on anonymous file descriptors is defined to create
trampolines. The following sections describe the API.

Create trampfd
==============

This feature introduces a new trampfd system call.

	struct trampfd_info	info;
	int			trampfd;

	trampfd = syscall(440, &info);

The kernel creates a trampoline file object and returns the following items
in info:

ntrampolines
	The number of trampolines that can be created with one trampfd. The
	user may create fewer trampolines if he wishes.

code_size
	The size of each trampoline.

code_offset
	The file offset to be used in mmap() to map the trampoline code.

Initialize trampfd
==================

A trampfd is initialized in this manner:

	struct trampfd_code	code;
	struct trampfd_data	data;

	/*
	 * Code descriptor.
	 */
	code.ntrampolines = number of desired trampolines;
	code.reg = code register name;
	code.table = array of code addresses;

	/*
	 * Data descriptor.
	 */
	data.reg = data register name;
	data.table = array of data addresses;

	pwrite(trampfd, &code, sizeof(code), TRAMPFD_CODE);
	pwrite(trampfd, &data, sizeof(data), TRAMPFD_DATA);

The register names are defined in ptrace.h (reg_32_name and reg_64_name).

It is recommended that the code descriptor and code array be placed in the
.rodata section so that an attacker cannot modify their contents.

Instead of pwrite(), the user can also do lseek() and write().
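
The lseek() and write() variant might look like the helper below. This is
only a sketch: write_descriptor() is a hypothetical name, and the caller is
assumed to pass TRAMPFD_CODE or TRAMPFD_DATA as the offset together with the
matching descriptor and its exact size.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one descriptor with lseek() + write() instead of pwrite().
 * "which" selects the descriptor (TRAMPFD_CODE or TRAMPFD_DATA in the
 * text above). Returns 0 on success, -1 on error or on a short write.
 */
int write_descriptor(int fd, const void *desc, size_t size, off_t which)
{
	assert(desc != NULL);

	/* Position the file at the slot for this descriptor. */
	if (lseek(fd, which, SEEK_SET) == (off_t) -1)
		return -1;

	/* The whole descriptor must be written in a single call. */
	if (write(fd, desc, size) != (ssize_t) size)
		return -1;
	return 0;
}
```

With the structures from the example above, the calls would be
write_descriptor(trampfd, &code, sizeof(code), TRAMPFD_CODE) followed by
write_descriptor(trampfd, &data, sizeof(data), TRAMPFD_DATA).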

Map trampfd
===========

The user uses mmap() to map the trampoline table into user address space.

	len = info.code_size * code.ntrampolines;
	prot = PROT_READ | PROT_EXEC;
	flags = MAP_PRIVATE;
	offset = info.code_offset;

	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);

The kernel generates the trampoline table. The code for trampoline X in the
table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mmapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.

The trampoline code is generated with &code_table[X] and &data_table[X] hard
coded in it. But code_table[X] and data_table[X] can be modified by user
code dynamically to supply the code and data to trampoline X.
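
The mapping step can be wrapped in a small helper. This is a sketch rather
than code from the patchset: map_trampoline_table() is a hypothetical name,
and its parameters stand for info.code_size, code.ntrampolines and
info.code_offset from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Map the trampoline table of an initialized trampfd.
 * Returns the table address, or NULL if mmap() fails.
 */
void *map_trampoline_table(int trampfd, size_t code_size,
			   size_t ntrampolines, off_t code_offset)
{
	size_t	len;
	void	*table;

	assert(code_size > 0 && ntrampolines > 0);
	len = code_size * ntrampolines;

	/* The mapping is private, readable and executable, never writable. */
	table = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE,
		     trampfd, code_offset);
	return table == MAP_FAILED ? NULL : table;
}
```

Returning NULL instead of MAP_FAILED is a choice made for this sketch so
that callers can use an ordinary pointer check.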

Trampoline table management
===========================

The user manages the trampoline table. The address of trampoline X is:

	trampoline_table + info.code_size * X;

Prior to invoking trampoline X, the user must initialize code_table[X] and
data_table[X].
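
A minimal sketch of the address computation, with a bounds check added.
trampoline_addr() is a hypothetical helper name; code_size and ntrampolines
stand for info.code_size and code.ntrampolines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Address of trampoline X inside a mapped trampoline table:
 * trampoline_table + code_size * X.
 * Returns NULL if the table is not mapped or X is out of range.
 */
void *trampoline_addr(void *trampoline_table, size_t code_size,
		      size_t ntrampolines, size_t x)
{
	assert(code_size > 0);

	if (trampoline_table == NULL || x >= ntrampolines)
		return NULL;
	return (char *) trampoline_table + code_size * x;
}
```

The returned address is what the user would hand out as the callable
function pointer for slot X, after filling in code_table[X] and
data_table[X].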

Unmap trampfd
=============

Once the user is done with the trampoline table, it may be unmapped:

	len = info.code_size * code.ntrampolines;
	munmap(trampoline_table, len);

Remove trampfd
==============

To remove the trampfd:

	close(trampfd);

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 fs/Makefile                       |   1 +
 fs/trampfd/Makefile               |   5 +
 fs/trampfd/trampfd_fops.c         | 241 ++++++++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c          | 142 ++++++++++++++++++
 include/linux/syscalls.h          |   2 +
 include/linux/trampfd.h           |  49 ++++++
 include/uapi/asm-generic/unistd.h |   4 +-
 include/uapi/linux/trampfd.h      | 184 +++++++++++++++++++++++
 init/Kconfig                      |   7 +
 kernel/sys_ni.c                   |   3 +
 10 files changed, 637 insertions(+), 1 deletion(-)
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..227761302000 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -136,3 +136,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_TRAMPFD)		+= trampfd/
diff --git a/fs/trampfd/Makefile b/fs/trampfd/Makefile
new file mode 100644
index 000000000000..ae09a0b1f841
--- /dev/null
+++ b/fs/trampfd/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_TRAMPFD) += trampfd.o
+
+trampfd-y += trampfd_fops.o trampfd_map.o
diff --git a/fs/trampfd/trampfd_fops.c b/fs/trampfd/trampfd_fops.c
new file mode 100644
index 000000000000..7164dd4d9039
--- /dev/null
+++ b/fs/trampfd/trampfd_fops.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - System call and File operations.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/syscalls.h>
+#include <linux/seq_file.h>
+#include <linux/anon_inodes.h>
+#include <linux/trampfd.h>
+
+char	*trampfd_name = "[trampfd]";
+
+struct kmem_cache	*trampfd_cache;
+
+/*
+ * Arch stub function to return info for the trampfd syscall.
+ */
+int __attribute__((weak)) trampfd_arch(struct trampfd_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a code
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_code_arch(struct trampfd_code *code)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a data
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_data_arch(struct trampfd_data *data)
+{
+	return -EOPNOTSUPP;
+}
+
+#ifdef CONFIG_PROC_FS
+static void trampfd_show_fdinfo(struct seq_file *sfile, struct file *file)
+{
+	seq_puts(sfile, "Trampoline FD\n");
+}
+#endif
+
+static loff_t trampfd_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	if (whence != SEEK_SET)
+		return -EINVAL;
+
+	if ((offset < 0) || (offset >= TRAMPFD_NUM_OFFSETS))
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+	mutex_unlock(&trampfd->lock);
+	return offset;
+}
+
+int trampfd_code(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_code	code;
+	int			rc = 0;
+
+	if (count != sizeof(code))
+		return -EINVAL;
+
+	if (copy_from_user(&code, arg, sizeof(code)))
+		return -EFAULT;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->code) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_code_arch(&code);
+	if (rc)
+		goto unlock;
+
+	trampfd->code_reg = code.reg;
+	trampfd->ntrampolines = code.ntrampolines;
+	trampfd->code = (void *) (uintptr_t) code.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+int trampfd_data(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_data	data;
+	int			rc = 0;
+
+	if (count != sizeof(data))
+		return -EINVAL;
+
+	if (copy_from_user(&data, arg, sizeof(data)))
+		return -EFAULT;
+
+	if (data.reserved)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->data) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_data_arch(&data);
+	if (rc)
+		goto unlock;
+
+	trampfd->data_reg = data.reg;
+	trampfd->data = (void *) (uintptr_t) data.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+static ssize_t trampfd_write(struct file *file, const char __user *arg,
+			     size_t count, loff_t *ppos)
+{
+	int		rc;
+
+	if (!arg || !count)
+		return -EINVAL;
+
+	switch (*ppos) {
+	case TRAMPFD_CODE:
+		rc = trampfd_code(file, arg, count);
+		break;
+
+	case TRAMPFD_DATA:
+		rc = trampfd_data(file, arg, count);
+		break;
+
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+out:
+	return rc ? rc : (ssize_t) count;
+}
+
+static int trampfd_release(struct inode *inode, struct file *file)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	mutex_destroy(&trampfd->lock);
+	kmem_cache_free(trampfd_cache, trampfd);
+	return 0;
+}
+
+const struct file_operations trampfd_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo		= trampfd_show_fdinfo,
+#endif
+	.llseek			= trampfd_llseek,
+	.write			= trampfd_write,
+	.release		= trampfd_release,
+	.mmap			= trampfd_mmap,
+	.get_unmapped_area	= trampfd_get_unmapped_area,
+};
+
+SYSCALL_DEFINE1(trampfd, struct trampfd_info *, info_arg)
+{
+	struct trampfd		*trampfd;
+	struct trampfd_info	info;
+	struct file		*file;
+	int			fd;
+	int			rc;
+
+	if (!trampfd_cache)
+		return -ENOMEM;
+
+	if (!info_arg)
+		return -EINVAL;
+
+	trampfd = kmem_cache_zalloc(trampfd_cache, GFP_KERNEL);
+	if (!trampfd)
+		return -ENOMEM;
+	mutex_init(&trampfd->lock);
+	trampfd->creator = current->pid;
+
+	trampfd_arch(&info);
+
+	if (copy_to_user(info_arg, &info, sizeof(info))) {
+		rc = -EFAULT;
+		goto freetramp;
+	}
+
+	rc = get_unused_fd_flags(O_CLOEXEC);
+	if (rc < 0)
+		goto freetramp;
+	fd = rc;
+
+	file = anon_inode_getfile(trampfd_name, &trampfd_fops, trampfd, O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+		goto freefd;
+	}
+	file->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+	fd_install(fd, file);
+	return fd;
+freefd:
+	put_unused_fd(fd);
+freetramp:
+	kmem_cache_free(trampfd_cache, trampfd);
+	return rc;
+}
+
+int __init trampfd_feature_init(void)
+{
+	trampfd_cache = kmem_cache_create("trampfd_cache",
+		sizeof(struct trampfd), 0, SLAB_HWCACHE_ALIGN, NULL);
+	if (trampfd_cache == NULL) {
+		pr_warn("%s: kmem_cache_create failed\n", __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+core_initcall(trampfd_feature_init);
diff --git a/fs/trampfd/trampfd_map.c b/fs/trampfd/trampfd_map.c
new file mode 100644
index 000000000000..679b29768491
--- /dev/null
+++ b/fs/trampfd/trampfd_map.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - Memory mapping.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/mman.h>
+#include <linux/highmem.h>
+#include <linux/trampfd.h>
+
+/*
+ * Arch stub function to populate a page with trampolines based on a
+ * trampoline specification.
+ */
+void __attribute__((weak)) trampfd_code_fill(struct trampfd *trampfd,
+					     char *addr)
+{
+}
+
+static void trampfd_close(struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->page) {
+		__free_pages(trampfd->page, 0);
+		trampfd->page = NULL;
+	}
+	trampfd->mapped = false;
+
+	mutex_unlock(&trampfd->lock);
+}
+
+static vm_fault_t trampfd_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct	*vma = vmf->vma;
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	struct page		*new_page = NULL;
+	void			*addr;
+
+	/*
+	 * Check this outside the lock so the lock does not have to be
+	 * dropped in order to allocate a page. Races are benign.
+	 */
+	if (!trampfd->page) {
+		new_page = alloc_pages(GFP_KERNEL, 0);
+		if (!new_page)
+			return VM_FAULT_OOM;
+	}
+
+	mutex_lock(&trampfd->lock);
+
+	if (!trampfd->page) {
+		trampfd->page = new_page;
+		new_page = NULL;
+		/*
+		 * Populate the page with trampolines.
+		 */
+		addr = kmap(trampfd->page);
+		trampfd_code_fill(trampfd, addr);
+		kunmap(trampfd->page);
+	}
+	vmf->page = trampfd->page;
+	get_page(vmf->page);
+
+	mutex_unlock(&trampfd->lock);
+
+	if (new_page)
+		__free_pages(new_page, 0);
+	return 0;
+}
+
+static const struct vm_operations_struct trampfd_vm_ops = {
+	.close	= trampfd_close,
+	.fault	= trampfd_fault,
+};
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	int			rc = 0;
+
+	/*
+	 * A trampfd cannot be mapped into multiple address spaces.
+	 */
+	if (current->pid != trampfd->creator)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	/*
+	 * trampfd must be initialized before it can be mapped.
+	 */
+	if (!trampfd->code || !trampfd->data) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	/*
+	 * A trampfd cannot be mapped multiple times in the same address space.
+	 */
+	if (trampfd->mapped) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	/*
+	 * prot should be R-X.
+	 */
+	if ((vma->vm_flags & VM_WRITE) || !(vma->vm_flags & VM_READ) ||
+	    !(vma->vm_flags & VM_EXEC)) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+	trampfd->mapped = true;
+	vma->vm_ops = &trampfd_vm_ops;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+unsigned long
+trampfd_get_unmapped_area(struct file *file, unsigned long orig_addr,
+			  unsigned long len, unsigned long pgoff,
+			  unsigned long flags)
+{
+	const typeof_member(struct file_operations, get_unmapped_area)
+	get_area = current->mm->get_unmapped_area;
+
+	if (pgoff != TRAMPFD_CODE_PGOFF || flags != MAP_PRIVATE ||
+	    len != PAGE_SIZE)
+		return -EINVAL;
+
+	return get_area(file, orig_addr, len, pgoff, flags);
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b951a87da987..91f55ff3cdac 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -69,6 +69,7 @@ union bpf_attr;
 struct io_uring_params;
 struct clone_args;
 struct open_how;
+struct trampfd_info;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -1005,6 +1006,7 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_trampfd(struct trampfd_info *info);
 
 /*
  * Architecture-specific system calls
diff --git a/include/linux/trampfd.h b/include/linux/trampfd.h
new file mode 100644
index 000000000000..c98fa1741c36
--- /dev/null
+++ b/include/linux/trampfd.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Trampoline FD - Internal structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _LINUX_TRAMPFD_H
+#define _LINUX_TRAMPFD_H
+
+#include <uapi/linux/trampfd.h>
+
+/*
+ * mmap() offsets.
+ */
+enum trampfd_pgoff {
+	TRAMPFD_CODE_PGOFF = 1,
+	TRAMPFD_NUM_PGOFF,
+};
+
+/*
+ * Trampoline structure.
+ */
+struct trampfd {
+	struct mutex		lock;		/* to serialize access */
+	pid_t			creator;	/* to prevent sharing */
+
+	short			code_reg;	/* code register name */
+	short			data_reg;	/* data register name */
+	int			ntrampolines;	/* number of trampolines */
+
+	void			*code;		/* user code address table */
+	void			*data;		/* user data address table */
+
+	struct page		*page;		/* code page */
+	bool			mapped;		/* mapped into address space? */
+};
+
+#ifdef CONFIG_TRAMPFD
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma);
+unsigned long trampfd_get_unmapped_area(struct file *file, unsigned long addr,
+					unsigned long len, unsigned long pgoff,
+					unsigned long flags);
+
+#endif /* CONFIG_TRAMPFD */
+
+#endif /* _LINUX_TRAMPFD_H */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a6..3b1ad4b75c7a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/trampfd.h b/include/uapi/linux/trampfd.h
new file mode 100644
index 000000000000..9bbc0450e16d
--- /dev/null
+++ b/include/uapi/linux/trampfd.h
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Trampoline FD - API structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _UAPI_LINUX_TRAMPFD_H
+#define _UAPI_LINUX_TRAMPFD_H
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+/*
+ * All structure fields are defined so that they are the same width and at the
+ * same structure offset on 32-bit and 64-bit to avoid compat code.
+ *
+ * All fields named "reserved" must be set to 0. They are there primarily for
+ * alignment. But they may be used in the future.
+ */
+
+/* ---------------------------- Trampfd Feature --------------------------- */
+
+/*
+ * This feature can be used to help convert dynamic user code to static user
+ * code so that the code can be in the text segment of a binary file. This
+ * allows the kernel to authenticate the code. E.g., using signature
+ * verification of the binary file.
+ *
+ * The problem in converting dynamic code to static code is that the static
+ * code needs to be able to locate its data dynamically. So, its data needs
+ * to be loaded in a designated register before jumping to the static code.
+ *
+ * This feature uses the kernel to generate a small, secure trampoline to do
+ * this. The trampoline code looks like this:
+ *
+ *	- load the address of the static code in a register (code_reg)
+ *	- load the address of its data in a register (data_reg)
+ *	- jump to code_reg
+ *
+ * The kernel places the trampoline in a user page and maps it into the user
+ * address space. To conserve memory, the kernel packs multiple trampolines in
+ * a page and creates a trampoline table.
+ */
+
+/* -------------------------- Trampoline Creation ------------------------- */
+
+/*
+ * This feature introduces a new trampfd system call.
+ *
+ *	struct trampfd_info	info;
+ *
+ *	trampfd = syscall(__NR_trampfd, &info);
+ *
+ * The kernel returns the following items in info:
+ *
+ * ntrampolines
+ *	The maximum number of trampolines that can be created with one trampfd.
+ *	The user may create fewer.
+ *
+ * code_size
+ *	The size of each trampoline.
+ *
+ * code_offset
+ *	The file offset to be used in mmap() to map the trampoline code.
+ */
+struct trampfd_info {
+	__u32		ntrampolines;
+	__u32		code_size;
+	__u32		code_offset;
+	__u32		reserved;
+};
+
+/* ----------------------- Trampoline Initialization ---------------------- */
+
+/*
+ * Trampoline code descriptor.
+ *
+ * ntrampolines
+ *	User specified number of trampolines. This number cannot exceed
+ *	info.ntrampolines.
+ *
+ * reg
+ *	User specified code register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of code addresses, one address per trampoline.
+ *
+ */
+struct trampfd_code {
+	__u32		ntrampolines;
+	__u32		reg;
+	__u64		table;
+};
+
+/*
+ * Trampoline data descriptor.
+ *
+ * reg
+ *	User specified data register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of data addresses, one address per trampoline.
+ *
+ */
+struct trampfd_data {
+	__u32		reg;
+	__u32		reserved;
+	__u64		table;
+};
+
+/*
+ * A trampfd is initialized in this manner:
+ *
+ *	struct trampfd_code	code;
+ *	struct trampfd_data	data;
+ *
+ *	code.ntrampolines = number of desired trampolines;
+ *	code.reg = code register name;
+ *	code.table = array of code addresses
+ *
+ *	data.reg = data register name;
+ *	data.table = array of data addresses
+ *
+ *	pwrite(trampfd, &code, sizeof(code), TRAMPFD_CODE);
+ *	pwrite(trampfd, &data, sizeof(data), TRAMPFD_DATA);
+ *
+ * It is recommended that the code descriptor and code array be placed in the
+ * .rodata section so that an attacker cannot modify their contents.
+ */
+
+/* ---------------------------- Trampoline mapping ------------------------ */
+
+/*
+ * The user uses mmap() to map the trampoline table into user address space.
+ *
+ *	len = info.code_size * code.ntrampolines;
+ *	prot = PROT_READ | PROT_EXEC;
+ *	flags = MAP_PRIVATE;
+ *	offset = info.code_offset;
+ *
+ *	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);
+ *
+ * The kernel generates the trampoline table. The code for trampoline X in the
+ * table is:
+ *
+ *	load code_table[X] into code_reg
+ *	load data_table[X] into data_reg
+ *	jump code_reg
+ *
+ * The user manages the trampoline table. The address of trampoline X is:
+ *
+ *	trampoline_table + info.code_size * X;
+ *
+ * Prior to invoking trampoline X, the user must initialize code_table[X] and
+ * data_table[X].
+ */
+
+/* ------------------------- Symbolic offsets -------------------------- */
+
+/*
+ * trampfd can have different actions/parameters associated with it. Each one
+ * has a symbolic file offset. Action/Parameter structures are read or written
+ * at their file offsets.
+ *
+ * Offset		Operation	Data
+ * ------------------------------------------------------------------------
+ * TRAMPFD_CODE		Write		struct trampfd_code
+ * TRAMPFD_DATA		Write		struct trampfd_data
+ * ------------------------------------------------------------------------
+ */
+enum trampfd_offsets {
+	TRAMPFD_CODE,
+	TRAMPFD_DATA,
+	TRAMPFD_NUM_OFFSETS,
+};
+
+
+/* ----------------------------------------------------------------------- */
+
+#endif /* _UAPI_LINUX_TRAMPFD_H */
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..bb3ecca5b8e7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2313,3 +2313,10 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+config TRAMPFD
+	bool "Enable trampfd() system call"
+	depends on MMU
+	help
+	  Enable the trampfd() system call that allows a process to map
+	  kernel generated trampolines within its address space.
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..93c5972aba85 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -349,6 +349,9 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* Trampoline fd */
+COND_SYSCALL(trampfd);
+
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.17.1
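
As a reading aid for the mapping rules in this patch, here is a minimal
userspace sketch in C. It mirrors, as far as this patch shows, the checks
in trampfd_mmap() and trampfd_get_unmapped_area() (R-X protection,
MAP_PRIVATE only, one page at page offset TRAMPFD_CODE_PGOFF) and the
address formula from the uapi header. The helper names
trampfd_mmap_args_ok(), max_trampolines() and trampoline_addr() are
hypothetical and not part of the patch, and the prot check is
deliberately stricter than the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* TRAMPFD_CODE_PGOFF from include/linux/trampfd.h in this patch. */
#define SKETCH_CODE_PGOFF	1

/*
 * Hypothetical pre-check: returns 1 if the mmap() arguments would pass the
 * checks shown in this patch, 0 otherwise. mmap() rounds len up to a page
 * multiple before the kernel compares it with PAGE_SIZE, so any non-zero
 * length up to one page is acceptable here.
 */
static int trampfd_mmap_args_ok(size_t len, int prot, int flags,
				uint64_t offset, size_t page_size)
{
	if (len == 0 || len > page_size)
		return 0;
	if (prot != (PROT_READ | PROT_EXEC))	/* R-X only */
		return 0;
	if (flags != MAP_PRIVATE)		/* no MAP_FIXED, no MAP_SHARED */
		return 0;
	if (offset != (uint64_t)SKETCH_CODE_PGOFF * page_size)
		return 0;
	return 1;
}

/* Upper bound on trampolines per trampfd: PAGE_SIZE / code_size. */
static uint32_t max_trampolines(size_t page_size, uint32_t code_size)
{
	assert(code_size != 0);
	return (uint32_t)(page_size / code_size);
}

/* Address of trampoline x: trampoline_table + info.code_size * x. */
static uintptr_t trampoline_addr(uintptr_t table, uint32_t code_size,
				 uint32_t x)
{
	return table + (uintptr_t)code_size * x;
}
```

A caller would pass info.code_offset as offset and
info.code_size * code.ntrampolines as len, as described in the uapi
header comment.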



* [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
  2020-09-16 15:08   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
@ 2020-09-16 15:08   ` madvenka
  2020-09-16 15:08   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 +++++++++++++++++++++++++
 5 files changed, 279 insertions(+)
 create mode 100644 arch/x86/kernel/trampfd.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..d4f17806c9ab 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	trampfd			sys_trampfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..91b37bc4b6f0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	trampfd			sys_trampfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/x86/include/uapi/asm/ptrace.h b/arch/x86/include/uapi/asm/ptrace.h
index 85165c0edafc..b4be362929b3 100644
--- a/arch/x86/include/uapi/asm/ptrace.h
+++ b/arch/x86/include/uapi/asm/ptrace.h
@@ -9,6 +9,44 @@
 
 #ifndef __ASSEMBLY__
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	x32_min = 0,
+	x32_eax = x32_min,
+	x32_ebx,
+	x32_ecx,
+	x32_edx,
+	x32_esi,
+	x32_edi,
+	x32_ebp,
+	x32_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	x64_min = x32_max,
+	x64_rax = x64_min,
+	x64_rbx,
+	x64_rcx,
+	x64_rdx,
+	x64_rsi,
+	x64_rdi,
+	x64_rbp,
+	x64_r8,
+	x64_r9,
+	x64_r10,
+	x64_r11,
+	x64_r12,
+	x64_r13,
+	x64_r14,
+	x64_r15,
+	x64_max,
+};
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
    stack during a system call. */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261db2391..feb7f4f311fd 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -157,3 +157,4 @@ ifeq ($(CONFIG_X86_64),y)
 endif
 
 obj-$(CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT)	+= ima_arch.o
+obj-$(CONFIG_TRAMPFD)				+= trampfd.o
diff --git a/arch/x86/kernel/trampfd.c b/arch/x86/kernel/trampfd.c
new file mode 100644
index 000000000000..7b812c200d01
--- /dev/null
+++ b/arch/x86/kernel/trampfd.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - X86 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		24
+#define TRAMPFD_CODE_64_SIZE		40
+
+static inline bool is_compat(void)
+{
+	return (IS_ENABLED(CONFIG_X86_32) ||
+		(IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_ADDR32)));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = x64_min;
+		max = x64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+	} else {
+		min = x64_min;
+		max = x64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * X32 register encodings.
+ */
+static unsigned char	reg_32[] = {
+	0,	/* x32_eax */
+	3,	/* x32_ebx */
+	1,	/* x32_ecx */
+	2,	/* x32_edx */
+	6,	/* x32_esi */
+	7,	/* x32_edi */
+	5,	/* x32_ebp */
+};
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x32_min;
+	int		dreg = trampfd->data_reg - x32_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr32 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfb;
+
+		/* mov code, %creg */
+		addr[4] = 0xB8 | reg_32[creg];			/* opcode+reg */
+		memcpy(&addr[5], &code, sizeof(u32));		/* imm32 */
+
+		/* mov (%creg), %creg */
+		addr[9] = 0x8B;				/* opcode */
+		addr[10] = 0x00 |				/* MODRM.mode */
+			   reg_32[creg] << 3 |			/* MODRM.reg */
+			   reg_32[creg];			/* MODRM.r/m */
+
+		/* mov data, %dreg */
+		addr[11] = 0xB8 | reg_32[dreg];			/* opcode+reg */
+		memcpy(&addr[12], &data, sizeof(u32));		/* imm32 */
+
+		/* mov (%dreg), %dreg */
+		addr[16] = 0x8B;				/* opcode */
+		addr[17] = 0x00 |				/* MODRM.mode */
+			   reg_32[dreg] << 3 |			/* MODRM.reg */
+			   reg_32[dreg];			/* MODRM.r/m */
+
+		/* jmp *%creg */
+		addr[18] = 0xff;				/* opcode */
+		addr[19] = 0xe0 | reg_32[creg];			/* MODRM.r/m */
+
+		/* nopl (%eax) */
+		addr[20] = 0x0f;
+		addr[21] = 0x1f;
+		addr[22] = 0x00;
+
+		/* pad to 4-byte boundary */
+		memset(&addr[23], 0, TRAMPFD_CODE_32_SIZE - 23);
+		addr += TRAMPFD_CODE_32_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+/*
+ * X64 register encodings.
+ */
+static unsigned char	reg_64[] = {
+	0,	/* x64_rax */
+	3,	/* x64_rbx */
+	1,	/* x64_rcx */
+	2,	/* x64_rdx */
+	6,	/* x64_rsi */
+	7,	/* x64_rdi */
+	5,	/* x64_rbp */
+	8,	/* x64_r8 */
+	9,	/* x64_r9 */
+	10,	/* x64_r10 */
+	11,	/* x64_r11 */
+	12,	/* x64_r12 */
+	13,	/* x64_r13 */
+	14,	/* x64_r14 */
+	15,	/* x64_r15 */
+};
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x64_min;
+	int		dreg = trampfd->data_reg - x64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr64 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfa;
+
+		/* movabs code, %creg */
+		addr[4] = 0x48 |				/* REX.W */
+			  ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[5] = 0xB8 | (reg_64[creg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[6], &code, sizeof(u64));		/* imm64 */
+
+		/* movq (%creg), %creg */
+		addr[14] = 0x48 |				/* REX.W */
+			   ((reg_64[creg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[15] = 0x8B;				/* opcode */
+		addr[16] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[creg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[creg] & 0x7));		/* MODRM.r/m */
+
+		/* movabs data, %dreg */
+		addr[17] = 0x48 |				/* REX.W */
+			  ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[18] = 0xB8 | (reg_64[dreg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[19], &data, sizeof(u64));		/* imm64 */
+
+		/* movq (%dreg), %dreg */
+		addr[27] = 0x48 |				/* REX.W */
+			   ((reg_64[dreg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[28] = 0x8B;				/* opcode */
+		addr[29] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[dreg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[dreg] & 0x7));		/* MODRM.r/m */
+
+		/* jmpq *%creg */
+		addr[30] = 0x40 |				/* REX, W=0 */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[31] = 0xff;				/* opcode */
+		addr[32] = 0xe0 | (reg_64[creg] & 0x7);		/* MODRM.r/m */
+
+		/* nopl (%rax) */
+		addr[33] = 0x0f;
+		addr[34] = 0x1f;
+		addr[35] = 0x00;
+
+		/* pad to 8-byte boundary */
+		memset(&addr[36], 0, TRAMPFD_CODE_64_SIZE - 36);
+		addr += TRAMPFD_CODE_64_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1
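
To make the 64-bit byte layout easier to check outside the kernel, the
following standalone C sketch emits one 40-byte trampoline into a buffer
using the same opcodes as trampfd_code_fill_64(). It is an illustration,
not the patch's code: emit_tramp64() and its helpers are hypothetical
names, and the register arguments are hardware encodings (the values in
the reg_64[] table), not the x64_* enum values. One assumption is made
explicit: in ModRM encoding, mod=00 with r/m=100 selects a SIB byte and
r/m=101 selects a disp32 (RIP-relative in 64-bit mode) operand, so the
three-byte load is not a plain register-indirect load for rbp, r12 or
r13. The sketch rejects those encodings rather than guess how the patch
intends to handle them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAMP64_SIZE	40	/* TRAMPFD_CODE_64_SIZE in this patch */

/* mod=00 with r/m=100 (SIB) or r/m=101 (disp32) is not a plain [reg]. */
static int plain_base_ok(unsigned reg)
{
	return (reg & 7) != 4 && (reg & 7) != 5;
}

/* movabs $imm, %reg: REX.W(+B), B8+r, imm64 in little-endian order. */
static size_t put_movabs(uint8_t *out, size_t n, unsigned reg, uint64_t imm)
{
	int	i;

	out[n++] = (uint8_t)(0x48 | (reg >> 3));
	out[n++] = (uint8_t)(0xB8 | (reg & 7));
	for (i = 0; i < 8; i++)
		out[n++] = (uint8_t)(imm >> (8 * i));
	return n;
}

/* movq (%reg), %reg: REX.W(+R+B), 8B, ModRM with mod=00. */
static size_t put_load(uint8_t *out, size_t n, unsigned reg)
{
	out[n++] = (uint8_t)(0x48 | ((reg >> 3) << 2) | (reg >> 3));
	out[n++] = 0x8B;
	out[n++] = (uint8_t)(((reg & 7) << 3) | (reg & 7));
	return n;
}

/* Returns 0 on success, -1 if a register cannot be used as a plain base. */
static int emit_tramp64(uint8_t out[TRAMP64_SIZE], unsigned creg,
			unsigned dreg, uint64_t code_slot, uint64_t data_slot)
{
	static const uint8_t	endbr64[] = { 0xf3, 0x0f, 0x1e, 0xfa };
	static const uint8_t	nopl[] = { 0x0f, 0x1f, 0x00 };
	size_t			n = 0;

	assert(creg < 16 && dreg < 16);
	if (!plain_base_ok(creg) || !plain_base_ok(dreg))
		return -1;

	memcpy(out, endbr64, sizeof(endbr64));
	n = sizeof(endbr64);
	n = put_movabs(out, n, creg, code_slot);	/* &code_table[X] */
	n = put_load(out, n, creg);
	n = put_movabs(out, n, dreg, data_slot);	/* &data_table[X] */
	n = put_load(out, n, dreg);

	/* jmpq *%creg: REX with W=0, FF /4 */
	out[n++] = (uint8_t)(0x40 | (creg >> 3));
	out[n++] = 0xff;
	out[n++] = (uint8_t)(0xe0 | (creg & 7));

	memcpy(out + n, nopl, sizeof(nopl));
	n += sizeof(nopl);
	memset(out + n, 0, TRAMP64_SIZE - n);		/* pad 36..39 */
	return 0;
}
```

With creg = 10 (r10) and dreg = 11 (r11), the load and jump bytes come
out as 4d 8b 12, 4d 8b 1b and 41 ff e2, which can be compared against a
disassembler.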



* [PATCH v2 3/4] [RFC] arm64/trampfd: Provide support for the trampoline file descriptor
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
  2020-09-16 15:08   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
  2020-09-16 15:08   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
@ 2020-09-16 15:08   ` madvenka
  2020-09-16 15:08   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
  2020-09-17  1:04   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Florian Weimer
  4 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/unistd.h      |   2 +-
 arch/arm64/include/asm/unistd32.h    |   2 +
 arch/arm64/include/uapi/asm/ptrace.h |  59 +++++++
 arch/arm64/kernel/Makefile           |   2 +
 arch/arm64/kernel/trampfd.c          | 244 +++++++++++++++++++++++++++
 5 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/trampfd.c

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..c0493c5322d9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 42cbe34d95ce..2778789c1cbe 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -88,6 +88,65 @@ struct user_pt_regs {
 	__u64		pstate;
 };
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	arm64_min = arm_max,
+	arm64_r0 = arm64_min,
+	arm64_r1,
+	arm64_r2,
+	arm64_r3,
+	arm64_r4,
+	arm64_r5,
+	arm64_r6,
+	arm64_r7,
+	arm64_r8,
+	arm64_r9,
+	arm64_r10,
+	arm64_r11,
+	arm64_r12,
+	arm64_r13,
+	arm64_r14,
+	arm64_r15,
+	arm64_r16,
+	arm64_r17,
+	arm64_r18,
+	arm64_r19,
+	arm64_r20,
+	arm64_r21,
+	arm64_r22,
+	arm64_r23,
+	arm64_r24,
+	arm64_r25,
+	arm64_r26,
+	arm64_r27,
+	arm64_r28,
+	arm64_r29,
+	arm64_max,
+};
+
 struct user_fpsimd_state {
 	__uint128_t	vregs[32];
 	__u32		fpsr;
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..18d373fb1208 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,3 +71,5 @@ extra-y					+= $(head-y) vmlinux.lds
 ifeq ($(CONFIG_DEBUG_EFI),y)
 AFLAGS_head.o += -DVMLINUX_PATH="\"$(realpath $(objtree)/vmlinux)\""
 endif
+
+obj-$(CONFIG_TRAMPFD)			+= trampfd.o
diff --git a/arch/arm64/kernel/trampfd.c b/arch/arm64/kernel/trampfd.c
new file mode 100644
index 000000000000..3b40ebb12907
--- /dev/null
+++ b/arch/arm64/kernel/trampfd.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM64 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <asm/compat.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		28
+#define TRAMPFD_CODE_64_SIZE		48
+
+static inline bool is_compat(void)
+{
+	return is_compat_thread(task_thread_info(current));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVARM(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDRARM(ins, reg)						\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BXARM(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVARM(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDRARM(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVARM(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDRARM(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BXARM(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+#define MOVQ(ins, reg, imm64)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm64);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 4; _hw++) {					\
+		/* movz or movk */					\
+		_opcode = _hw ? 0xf2800000 : 0xd2800000;		\
+		*ins++ = _opcode | _hw << 21 | _imm16[_hw] << 5 | (reg);\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xf9400000 | (reg) << 5 | (reg);			\
+}
+
+#define BR(ins, reg)							\
+{									\
+	*ins++ = 0xd61f0000 | (reg) << 5;				\
+}
+
+#define PAD(ins)							\
+{									\
+	while ((uintptr_t) ins & 7)					\
+		*ins++ = 0;						\
+}
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm64_min;
+	int		dreg = trampfd->data_reg - arm64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq creg, code
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	creg, code & 0xFFFF
+		 * movk	creg, (code >> 16) & 0xFFFF, lsl 16
+		 * movk	creg, (code >> 32) & 0xFFFF, lsl 32
+		 * movk	creg, (code >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq dreg, data
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	dreg, data & 0xFFFF
+		 * movk	dreg, (data >> 16) & 0xFFFF, lsl 16
+		 * movk	dreg, (data >> 32) & 0xFFFF, lsl 32
+		 * movk	dreg, (data >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * br	creg
+		 */
+		BR(instruction, creg);
+
+		/*
+		 * Pad to 8-byte boundary
+		 */
+		PAD(instruction);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1
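
The A64 sequence built by the MOVQ, LDR, BR and PAD macros can be
reproduced in a standalone C sketch for checking the instruction words.
This is an illustration under stated assumptions: emit_tramp_a64() and
put_movq() are hypothetical names, the immediate is split with shifts
instead of the u16 pointer cast used in the patch, and padding is done
on the word count rather than on the buffer address. The opcode
constants are the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAMP_A64_WORDS	12	/* TRAMPFD_CODE_64_SIZE (48 bytes) / 4 */

/* movz reg, imm[15:0] followed by three movk for the upper halfwords. */
static size_t put_movq(uint32_t *ins, size_t n, unsigned reg, uint64_t imm)
{
	unsigned	hw;

	for (hw = 0; hw < 4; hw++) {
		uint32_t imm16 = (uint32_t)(imm >> (16 * hw)) & 0xFFFF;
		uint32_t opcode = hw ? 0xf2800000u : 0xd2800000u;

		ins[n++] = opcode | hw << 21 | imm16 << 5 | reg;
	}
	return n;
}

/* Emits one trampoline and returns the number of 32-bit words written. */
static size_t emit_tramp_a64(uint32_t out[TRAMP_A64_WORDS], unsigned creg,
			     unsigned dreg, uint64_t code_slot,
			     uint64_t data_slot)
{
	size_t	n = 0;

	assert(creg < 30 && dreg < 30);		/* r0..r29, as in the enum */

	n = put_movq(out, n, creg, code_slot);	/* &code_table[X] */
	out[n++] = 0xf9400000u | creg << 5 | creg;	/* ldr creg, [creg] */
	n = put_movq(out, n, dreg, data_slot);	/* &data_table[X] */
	out[n++] = 0xf9400000u | dreg << 5 | dreg;	/* ldr dreg, [dreg] */
	out[n++] = 0xd61f0000u | creg << 5;		/* br creg */

	while (n & 1)				/* pad to 8 bytes */
		out[n++] = 0;
	return n;
}
```

Eleven instructions plus one pad word give the 48 bytes that
TRAMPFD_CODE_64_SIZE reserves per trampoline.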



* [PATCH v2 4/4] [RFC] arm/trampfd: Provide support for the trampoline file descriptor
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
                     ` (2 preceding siblings ...)
  2020-09-16 15:08   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
@ 2020-09-16 15:08   ` madvenka
  2020-09-17  1:04   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Florian Weimer
  4 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-16 15:08 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm/include/uapi/asm/ptrace.h |  21 +++++
 arch/arm/kernel/Makefile           |   1 +
 arch/arm/kernel/trampfd.c          | 124 +++++++++++++++++++++++++++++
 arch/arm/tools/syscall.tbl         |   1 +
 4 files changed, 147 insertions(+)
 create mode 100644 arch/arm/kernel/trampfd.c

diff --git a/arch/arm/include/uapi/asm/ptrace.h b/arch/arm/include/uapi/asm/ptrace.h
index e61c65b4018d..598047768f9b 100644
--- a/arch/arm/include/uapi/asm/ptrace.h
+++ b/arch/arm/include/uapi/asm/ptrace.h
@@ -151,6 +151,27 @@ struct pt_regs {
 #define ARM_r0		uregs[0]
 #define ARM_ORIG_r0	uregs[17]
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
 /*
  * The size of the user-visible VFP state as seen by PTRACE_GET/SETVFPREGS
  * and core dumps.
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..652c54c2f19a 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -105,5 +105,6 @@ obj-$(CONFIG_SMP)		+= psci_smp.o
 endif
 
 obj-$(CONFIG_HAVE_ARM_SMCCC)	+= smccc-call.o
+obj-$(CONFIG_TRAMPFD)		+= trampfd.o
 
 extra-y := $(head-y) vmlinux.lds
diff --git a/arch/arm/kernel/trampfd.c b/arch/arm/kernel/trampfd.c
new file mode 100644
index 000000000000..45146ed489e8
--- /dev/null
+++ b/arch/arm/kernel/trampfd.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_SIZE		28
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	info->code_size = TRAMPFD_CODE_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+	ntrampolines = PAGE_SIZE / TRAMPFD_CODE_SIZE;
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVW(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BX(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVW(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVW(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BX(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d5cae5ffede0..85dcbc9e08ee 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -452,3 +452,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	trampfd				sys_trampfd
-- 
2.17.1
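
For readers checking the MOVW, LDR and BX macros in arch/arm/kernel/trampfd.c
above, the encodings can be reproduced in a small user-space sketch. This is
not part of the patch. The function names are hypothetical, and plain ARM
register numbers (0-15) stand in for the patch's register names, which are
defined in ptrace.h and translated by subtracting arm_min. The sketch emits
the same 7-instruction (28-byte) sequence that trampfd_code_fill() generates
for one trampoline.

```c
#include <assert.h>
#include <stdint.h>

/* movw rd, #imm16 (A1 encoding, cond = AL): imm4 in bits 19:16, imm12 in bits 11:0. */
uint32_t arm_movw(unsigned rd, uint16_t imm16)
{
	assert(rd < 16);
	return 0xe3000000u | ((uint32_t)(imm16 >> 12) << 16) |
	       ((uint32_t)rd << 12) | (imm16 & 0xfffu);
}

/* movt rd, #imm16: same field layout as movw, different opcode bits. */
uint32_t arm_movt(unsigned rd, uint16_t imm16)
{
	assert(rd < 16);
	return 0xe3400000u | ((uint32_t)(imm16 >> 12) << 16) |
	       ((uint32_t)rd << 12) | (imm16 & 0xfffu);
}

/* ldr reg, [reg]: base register in bits 19:16, destination in bits 15:12. */
uint32_t arm_ldr(unsigned reg)
{
	assert(reg < 16);
	return 0xe5900000u | ((uint32_t)reg << 16) | ((uint32_t)reg << 12);
}

/* bx reg */
uint32_t arm_bx(unsigned reg)
{
	assert(reg < 16);
	return 0xe12fff10u | reg;
}

/*
 * Emit one trampoline. code_slot and data_slot are the 32-bit addresses
 * of code_table[X] and data_table[X]. Returns the number of words written.
 */
int emit_trampoline(uint32_t *ins, unsigned creg, unsigned dreg,
		    uint32_t code_slot, uint32_t data_slot)
{
	int n = 0;

	ins[n++] = arm_movw(creg, (uint16_t)(code_slot & 0xffffu));
	ins[n++] = arm_movt(creg, (uint16_t)(code_slot >> 16));
	ins[n++] = arm_ldr(creg);
	ins[n++] = arm_movw(dreg, (uint16_t)(data_slot & 0xffffu));
	ins[n++] = arm_movt(dreg, (uint16_t)(data_slot >> 16));
	ins[n++] = arm_ldr(dreg);
	ins[n++] = arm_bx(creg);
	return n;
}
```

Seven 4-byte instructions per trampoline is consistent with
TRAMPFD_CODE_SIZE being 28 in the patch.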



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
                     ` (3 preceding siblings ...)
  2020-09-16 15:08   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
@ 2020-09-17  1:04   ` Florian Weimer
  2020-09-17 15:36     ` Madhavan T. Venkataraman
  4 siblings, 1 reply; 50+ messages in thread
From: Florian Weimer @ 2020-09-17  1:04 UTC (permalink / raw)
  To: madvenka
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	libffi-discuss

* madvenka:

> Examples of trampolines
> =======================
>
> libffi (A Portable Foreign Function Interface Library):
>
> libffi allows a user to define functions with an arbitrary list of
> arguments and return value through a feature called "Closures".
> Closures use trampolines to jump to ABI handlers that handle calling
> conventions and call a target function. libffi is used by a lot
> of different applications. To name a few:
>
> 	- Python
> 	- Java
> 	- Javascript
> 	- Ruby FFI
> 	- Lisp
> 	- Objective C

libffi does not actually need this.  It currently collocates
trampolines and the data they need on the same page, but that's
actually unnecessary.  It's possible to avoid doing this just by
changing libffi, without any kernel changes.

I think this has already been done for the iOS port.

> The code for trampoline X in the trampoline table is:
> 
> 	load	&code_table[X], code_reg
> 	load	(code_reg), code_reg
> 	load	&data_table[X], data_reg
> 	load	(data_reg), data_reg
> 	jump	code_reg
> 
> The addresses &code_table[X] and &data_table[X] are baked into the
> trampoline code. So, PC-relative data references are not needed. The user
> can modify code_table[X] and data_table[X] dynamically.

You can put this code into the libffi shared object and map it from
there, just like the rest of the libffi code.  To get more
trampolines, you can map the page containing the trampolines multiple
times, each instance preceded by a separate data page with the control
information.

I think the previous patch submission has also resulted in several
comments along those lines, so I'm not sure why you are reposting
this.

> libffi
> ======
>
> I have implemented my solution for libffi and provided the changes for
> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
>
> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt

The URL does not appear to work, I get a 403 error.

> If the trampfd patchset gets accepted, I will send the libffi changes
> to the maintainers for a review. BTW, I have also successfully executed
> the libffi self tests.

I have not seen your libffi changes, but I expect that the complexity
is about the same as a userspace-only solution.


Cc:ing libffi upstream for awareness.  The start of the thread is
here:

<https://lore.kernel.org/linux-api/20200916150826.5990-1-madvenka@linux.microsoft.com/>


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-17  1:04   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Florian Weimer
@ 2020-09-17 15:36     ` Madhavan T. Venkataraman
  2020-09-17 15:57       ` Madhavan T. Venkataraman
  2020-09-23  1:46       ` Arvind Sankar
  0 siblings, 2 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-17 15:36 UTC (permalink / raw)
  To: Florian Weimer
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	libffi-discuss



On 9/16/20 8:04 PM, Florian Weimer wrote:
> * madvenka:
> 
>> Examples of trampolines
>> =======================
>>
>> libffi (A Portable Foreign Function Interface Library):
>>
>> libffi allows a user to define functions with an arbitrary list of
>> arguments and return value through a feature called "Closures".
>> Closures use trampolines to jump to ABI handlers that handle calling
>> conventions and call a target function. libffi is used by a lot
>> of different applications. To name a few:
>>
>> 	- Python
>> 	- Java
>> 	- Javascript
>> 	- Ruby FFI
>> 	- Lisp
>> 	- Objective C
> 
> libffi does not actually need this.  It currently collocates
> trampolines and the data they need on the same page, but that's
> actually unnecessary.  It's possible to avoid doing this just by
> changing libffi, without any kernel changes.
> 
> I think this has already been done for the iOS port.
> 

The trampoline table that has been implemented for the iOS port (MACH)
is based on PC-relative data referencing. That is, the code and data
are placed in adjacent pages so that the code can access the data using
an address relative to the current PC.

This is an ISA feature that is not supported on all architectures.

Now, if it is a performance feature, we can include some architectures
and exclude others. But this is a security feature. IMO, we cannot
exclude any architecture even if it is a legacy one as long as Linux
is running on the architecture. So, we need a solution that does
not assume any specific ISA feature.

>> The code for trampoline X in the trampoline table is:
>>
>> 	load	&code_table[X], code_reg
>> 	load	(code_reg), code_reg
>> 	load	&data_table[X], data_reg
>> 	load	(data_reg), data_reg
>> 	jump	code_reg
>>
>> The addresses &code_table[X] and &data_table[X] are baked into the
>> trampoline code. So, PC-relative data references are not needed. The user
>> can modify code_table[X] and data_table[X] dynamically.
> 
> You can put this code into the libffi shared object and map it from
> there, just like the rest of the libffi code.  To get more
> trampolines, you can map the page containing the trampolines multiple
> times, each instance preceded by a separate data page with the control
> information.
> 

If you put the code in the libffi shared object, how do you pass data to
the code at runtime? If the code we are talking about is a function, then
there is an ABI defined way to pass data to the function. But if the
code we are talking about is some arbitrary code such as a trampoline,
there is no ABI defined way to pass data to it except in a couple of
platforms such as HP PA-RISC that have support for function descriptors
in the ABI itself.

As mentioned before, if the ISA supports PC-relative data references
(e.g., X86 64-bit platforms support RIP-relative data references)
then we can pass data to that code by placing the code and data in
adjacent pages. So, you can implement the trampoline table for X64.
i386 does not support it.


> I think the previous patch submission has also resulted in several
> comments along those lines, so I'm not sure why you are reposting
> this.

IIRC, I have answered all of those comments by mentioning the point
that we need to support all architectures without requiring special
ISA features. Taking the kernel's help in this is one solution.


> 
>> libffi
>> ======
>>
>> I have implemented my solution for libffi and provided the changes for
>> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
>>
>> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
> 
> The URL does not appear to work, I get a 403 error.

I apologize for that. That site is supposed to be accessible publicly.
I will contact the administrator and get this resolved.

Sorry for the annoyance.

> 
>> If the trampfd patchset gets accepted, I will send the libffi changes
>> to the maintainers for a review. BTW, I have also successfully executed
>> the libffi self tests.
> 
> I have not seen your libffi changes, but I expect that the complexity
> is about the same as a userspace-only solution.
> 
> 

I agree. The complexity is about the same. But the support is for all
architectures. Once the common code is in place, the changes for each
architecture are trivial.

Madhavan

> Cc:ing libffi upstream for awareness.  The start of the thread is
> here:
> 
> <https://lore.kernel.org/linux-api/20200916150826.5990-1-madvenka@linux.microsoft.com/>
> 


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-17 15:36     ` Madhavan T. Venkataraman
@ 2020-09-17 15:57       ` Madhavan T. Venkataraman
  2020-09-17 16:01         ` Florian Weimer
  2020-09-23  1:46       ` Arvind Sankar
  1 sibling, 1 reply; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-17 15:57 UTC (permalink / raw)
  To: Florian Weimer
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	libffi-discuss



On 9/17/20 10:36 AM, Madhavan T. Venkataraman wrote:
>>> libffi
>>> ======
>>>
>>> I have implemented my solution for libffi and provided the changes for
>>> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
>>>
>>> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
>> The URL does not appear to work, I get a 403 error.
> I apologize for that. That site is supposed to be accessible publicly.
> I will contact the administrator and get this resolved.
> 
> Sorry for the annoyance.
> 

Could you try the link again and confirm that you can access it?
Again, sorry for the trouble.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-17 15:57       ` Madhavan T. Venkataraman
@ 2020-09-17 16:01         ` Florian Weimer
  0 siblings, 0 replies; 50+ messages in thread
From: Florian Weimer @ 2020-09-17 16:01 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	libffi-discuss

* Madhavan T. Venkataraman:

> On 9/17/20 10:36 AM, Madhavan T. Venkataraman wrote:
>>>> libffi
>>>> ======
>>>>
>>>> I have implemented my solution for libffi and provided the changes for
>>>> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
>>>>
>>>> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
>>> The URL does not appear to work, I get a 403 error.
>> I apologize for that. That site is supposed to be accessible publicly.
>> I will contact the administrator and get this resolved.
>> 
>> Sorry for the annoyance.

> Could you try the link again and confirm that you can access it?
> Again, sorry for the trouble.

Yes, it works now.  Thanks for having it fixed.


* [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
       [not found] <210d7cd762d5307c2aa1676705b392bd445f1baa>
  2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
@ 2020-09-22 21:53 ` madvenka
  2020-09-22 21:53   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
                     ` (6 more replies)
  1 sibling, 7 replies; 50+ messages in thread
From: madvenka @ 2020-09-22 21:53 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc) which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.

Examples of trampolines
=======================

libffi (A Portable Foreign Function Interface Library):

libffi allows a user to define functions with an arbitrary list of
arguments and return value through a feature called "Closures".
Closures use trampolines to jump to ABI handlers that handle calling
conventions and call a target function. libffi is used by a lot
of different applications. To name a few:

	- Python
	- Java
	- Javascript
	- Ruby FFI
	- Lisp
	- Objective C

GCC nested functions:

GCC has traditionally used trampolines for implementing nested
functions. The trampoline is placed on the user stack. So, the stack
needs to be executable.

Currently available solution
============================

One solution that has been proposed to allow trampolines to be executed
without making security exceptions is Trampoline Emulation. See:

https://pax.grsecurity.net/docs/emutramp.txt

In this solution, the kernel recognizes certain sequences of instructions
as "well-known" trampolines. When such a trampoline is executed, a page
fault happens because the trampoline page does not have execute permission.
The kernel recognizes the trampoline and emulates it. Basically, the
kernel does the work of the trampoline on behalf of the application.

Currently, the emulated trampolines are the ones used in libffi and GCC
nested functions. To my knowledge, only X86 is supported at this time.

As noted in emutramp.txt, this is not a generic solution. For every new
trampoline that needs to be supported, new instruction sequences need to
be recognized by the kernel and emulated. And this has to be done for
every architecture that needs to be supported.

emutramp.txt notes the following:

"... the real solution is not in emulation but by designing a kernel API
for runtime code generation and modifying userland to make use of it."

Solution proposed in this RFC
=============================

From this RFC's perspective, there are two scenarios for dynamic code:

Scenario 1
----------

We know what code we need only at runtime. For instance, JIT code generated
for frequently executed Java methods. Only at runtime do we know what
methods need to be JIT compiled. Such code cannot be statically defined. It
has to be generated at runtime.

Scenario 2
----------

We know what code we need in advance. User trampolines are a good example of
this. It is possible to define such code statically with some help from the
kernel.

This RFC addresses (2). (1) needs a general purpose trusted code generator
and is out of scope for this RFC.

For (2), the solution is to convert dynamic code to static code and place it
in a source file. The binary generated from the source can be signed. The
kernel can use signature verification to authenticate the binary and
allow the code to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution proposed here is:

1. Write the static code assuming that the data needed by the code is already
   pointed to by a designated register.

2. Get the kernel to supply a small universal trampoline that does the
   following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Trampoline File Descriptor (trampfd)
====================================

I am proposing a kernel API using anonymous file descriptors that can be
used to create the trampolines. The API is described in patch 1/4 of this
patchset. I provide a summary here:

	- Create a trampoline file object

	- Write a code descriptor into the trampoline file and specify:

		- the number of trampolines desired
		- the name of the code register
		- user pointer to a table of code addresses, one address
		  per trampoline

	- Write a data descriptor into the trampoline file and specify:

		- the name of the data register
		- user pointer to a table of data addresses, one address
		  per trampoline

	- mmap() the trampoline file. The kernel generates a table of
	  trampolines in a page and returns the trampoline table address

	- munmap() a trampoline file mapping

	- Close the trampoline file

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.

It is recommended that the code descriptor and the code table be placed
in the .rodata section so an attacker cannot modify them.

Trampoline use and reuse
========================

The code for trampoline X in the trampoline table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

The addresses &code_table[X] and &data_table[X] are baked into the
trampoline code. So, PC-relative data references are not needed. The user
can modify code_table[X] and data_table[X] dynamically.

For instance, within libffi, the same trampoline X can be used for different
closures at different times by setting:

	data_table[X] = closure;
	code_table[X] = ABI handling code;
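
The reuse pattern above can be illustrated with a portable user-space sketch.
It only models the trampoline's behaviour: a real trampoline is kernel
generated machine code that leaves the data pointer in a register rather
than passing it as an argument. All names here (bind_trampoline,
invoke_trampoline, add_one, negate) are hypothetical and are not part of
trampfd or libffi.

```c
#include <assert.h>
#include <stddef.h>

#define NTRAMPOLINES 4

typedef long (*handler_fn)(void *data);

/* User-managed tables, one slot per trampoline. */
static handler_fn code_table[NTRAMPOLINES];
static void *data_table[NTRAMPOLINES];

/* Point trampoline x at a new handler and data (e.g. a closure). */
void bind_trampoline(size_t x, handler_fn code, void *data)
{
	assert(x < NTRAMPOLINES);
	data_table[x] = data;
	code_table[x] = code;
}

/* Stand-in for executing trampoline x. */
long invoke_trampoline(size_t x)
{
	handler_fn code;
	void *data;

	assert(x < NTRAMPOLINES && code_table[x] != NULL);
	code = code_table[x];	/* load (code_reg), code_reg */
	data = data_table[x];	/* load (data_reg), data_reg */
	return code(data);	/* jump code_reg */
}

/* Two example handlers that interpret the data pointer differently. */
long add_one(void *data)
{
	return *(long *)data + 1;
}

long negate(void *data)
{
	return -*(long *)data;
}
```

Rebinding slot X with bind_trampoline() changes what the same trampoline
address does on its next invocation, without touching the trampoline code
itself.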

Advantages of the Trampoline File Descriptor approach
=====================================================

- Using this support from the kernel, dynamic code can be converted to
  static code with a little effort so applications and libraries can move to
  a more secure model. In the simplest cases such as libffi, dynamic code can
  even be eliminated.

- This initial work is targeted towards X86 and ARM. But it can be supported
  easily on all architectures. We don't need any special ISA features such
  as PC-relative data referencing.

- The only code generation needed is for this small, universal trampoline.

- The kernel does not have to deal with any ABI issues in the generation of
  this trampoline.

- The kernel provides a trampoline table to conserve memory.

- An SELinux setting called "exectramp" can be implemented along the
  lines of "execmem", "execstack" and "execheap" to selectively allow the
  use of trampolines on a per application basis.

- In version 1, a trip to the kernel was required to execute the trampoline.
  In version 2, that is not required. So, there are no performance
  concerns in this approach.

libffi
======

I have implemented my solution for libffi and provided the changes for
X86 and ARM, 32-bit and 64-bit. Here is the reference patch:

http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt

If the trampfd patchset gets accepted, I will send the libffi changes
to the maintainers for a review. BTW, I have also successfully executed
the libffi self tests.

Work that is pending
====================

- I am working on implementing the SELinux setting - "exectramp".

- I have a test program to test the kernel API. I am working on adding it
  to selftests.

References
==========

[1] https://microsoft.github.io/ipe/
---

Changelog:

v1
	Introduced the Trampfd feature.

v2
	- Changed the system call. Version 2 does not support different
	  trampoline types and their associated type structures. It only
	  supports a kernel generated trampoline.

	  The system call now returns information to the user that is
	  used to define trampoline descriptors. E.g., the maximum
	  number of trampolines that can be packed in a single page.

	- Removed all the trampoline contexts such as register contexts
	  and stack contexts. This is based on the feedback that the kernel
	  should not have to worry about ABI issues and H/W features that
	  may deal with the context of a process.

	- Removed the need to make a trip into the kernel on trampoline
	  invocation. This is based on the feedback about performance.

	- Removed the ability to share trampolines across address spaces.
	  This would have made sense for different trampoline types based
	  on their semantics. But since I support only one specific
	  trampoline, sharing does not make sense.

	- Added calls to specify trampoline descriptors that the kernel
	  uses to generate trampolines.

	- Added architecture-specific code to generate the small, universal
	  trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.

	- Implemented the trampoline table in a page.

Madhavan T. Venkataraman (4):
  Implement the kernel API for the trampoline file descriptor.
  Implement i386 and X86 support for the trampoline file descriptor.
  Implement ARM64 support for the trampoline file descriptor.
  Implement ARM support for the trampoline file descriptor.

 arch/arm/include/uapi/asm/ptrace.h     |  21 +++
 arch/arm/kernel/Makefile               |   1 +
 arch/arm/kernel/trampfd.c              | 124 +++++++++++++
 arch/arm/tools/syscall.tbl             |   1 +
 arch/arm64/include/asm/unistd.h        |   2 +-
 arch/arm64/include/asm/unistd32.h      |   2 +
 arch/arm64/include/uapi/asm/ptrace.h   |  59 ++++++
 arch/arm64/kernel/Makefile             |   2 +
 arch/arm64/kernel/trampfd.c            | 244 +++++++++++++++++++++++++
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 ++++++++++++++++++++++++
 fs/Makefile                            |   1 +
 fs/trampfd/Makefile                    |   5 +
 fs/trampfd/trampfd_fops.c              | 241 ++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c               | 142 ++++++++++++++
 include/linux/syscalls.h               |   2 +
 include/linux/trampfd.h                |  49 +++++
 include/uapi/asm-generic/unistd.h      |   4 +-
 include/uapi/linux/trampfd.h           | 184 +++++++++++++++++++
 init/Kconfig                           |   7 +
 kernel/sys_ni.c                        |   3 +
 24 files changed, 1371 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/kernel/trampfd.c
 create mode 100644 arch/arm64/kernel/trampfd.c
 create mode 100644 arch/x86/kernel/trampfd.c
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

-- 
2.17.1



* [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API
  2020-09-22 21:53 ` madvenka
@ 2020-09-22 21:53   ` madvenka
  2020-09-22 21:53   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-22 21:53 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Introduction
============

Dynamic code is used in many different user applications. Dynamic code is
often generated at runtime. Dynamic code can also just be a pre-defined
sequence of machine instructions in a data buffer. Examples of dynamic
code are trampolines, JIT code, DBT code, etc.

Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface for hackers. Attackers can use this to inject malicious
code, modify existing code or do other harm.

To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
genuine applications to run, exceptions have to be made for them (by setting
execmem, etc) which opens the door to security issues.

The W^X implementation today is not complete. There exist many user level
tricks that can be used to load and execute dynamic code. E.g.,

- Load the code into a file and map the file with R-X.

- Load the code in an RW- page. Change the permissions to R--. Then,
  change the permissions to R-X.

- Load the code in an RW- page. Remap the page with R-X to get a separate
  mapping to the same underlying physical page.

IMO, these are all security holes as an attacker can exploit them to inject
his own code.

In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:

	- user level tricks using anonymous pages will fail as anonymous
	  pages have no file identity

	- loading the code in a temporary file and mapping it with R-X
	  will fail as the temporary file would not have a signature

We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.

Solution
========

The solution is to convert dynamic code to static code and place it in a
source file. The binary generated from the source can be signed. The kernel
can use signature verification to authenticate the binary and allow the code
to be mapped and executed.

The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard ABI
compliant way to pass data to the code for most architectures. Each instance
of dynamic code defines its own way. For instance, co-location of code and
data and PC-relative data referencing are used in cases where the ISA
supports it.

We need one standard way that would work for all architectures and ABIs.

The solution has two parts:

1. The maintainer of the code writes the static code assuming that the data
   needed by the code is already pointed to by a designated register.

2. The kernel supplies a small universal trampoline that does the following:

	- Load the address of the data in a designated register
	- Load the address of the static code in a designated register
	- Jump to the static code

User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.

To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.

Kernel API
==========

A kernel API based on anonymous file descriptors is defined to create
trampolines. The following sections describe the API.

Create trampfd
==============

This feature introduces a new trampfd system call.

	struct trampfd_info	info;
	int			trampfd;

	trampfd = syscall(440, &info);

The kernel creates a trampoline file object and returns the following items
in info:

ntrampolines
	The number of trampolines that can be created with one trampfd. The
	user may create fewer trampolines if he wishes.

code_size
	The size of each trampoline.

code_offset
	The file offset to be used in mmap() to map the trampoline code.
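
As a rough illustration of how ntrampolines relates to code_size, the ARM
support in patch 4/4 sets ntrampolines to PAGE_SIZE / code_size with a
code_size of 28. The sketch below repeats that arithmetic in user space. The
helper names are hypothetical, and a 4096-byte page is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole trampolines that fit in one page. */
size_t trampolines_per_page(size_t page_size, size_t code_size)
{
	assert(code_size > 0 && code_size <= page_size);
	return page_size / code_size;
}

/* Bytes left over at the end of the page after the last trampoline. */
size_t unused_tail_bytes(size_t page_size, size_t code_size)
{
	return page_size - trampolines_per_page(page_size, code_size) * code_size;
}
```

With a 4096-byte page and 28-byte trampolines this gives 146 trampolines and
an 8-byte tail. In the ARM code, trampfd_code_fill() zero-fills whatever
remains of the page after the last generated trampoline.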

Initialize trampfd
==================

A trampfd is initialized in this manner:

	struct trampfd_code	code;
	struct trampfd_data	data;

	/*
	 * Code descriptor.
	 */
	code.ntrampolines = number of desired trampolines;
	code.reg = code register name;
	code.table = array of code addresses;

	/*
	 * Data descriptor.
	 */
	data.reg = data register name;
	data.table = array of data addresses;

	pwrite(trampfd, &code, sizeof(code), TRAMPFD_CODE);
	pwrite(trampfd, &data, sizeof(data), TRAMPFD_DATA);

The register names are defined in ptrace.h (reg_32_name and reg_64_name).

It is recommended that the code descriptor and code array be placed in the
.rodata section so that an attacker cannot modify their contents.

Instead of pwrite(), the user can also do lseek() and write().

Map trampfd
===========

The user uses mmap() to map the trampoline table into user address space.

	len = info.code_size * code.ntrampolines;
	prot = PROT_READ | PROT_EXEC;
	flags = MAP_PRIVATE;
	offset = info.code_offset;

	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);

The kernel generates the trampoline table. The code for trampoline X in the
table is:

	load	&code_table[X], code_reg
	load	(code_reg), code_reg
	load	&data_table[X], data_reg
	load	(data_reg), data_reg
	jump	code_reg

Each mmap() will only map a single base page. Large pages are not supported.

A trampoline file can only be mmapped once in an address space.

Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.
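
The argument checks behind these rules can be approximated in user space
before calling mmap(). The helper below is a hypothetical sketch: it
condenses the checks in trampfd_mmap() and trampfd_get_unmapped_area() from
this series, simplifies the protection test to an exact R-X match, and
hard-codes a code page offset of 1.

```c
#include <stdbool.h>
#include <sys/mman.h>

#define SKETCH_CODE_PGOFF	1ul	/* TRAMPFD_CODE_PGOFF in this series */

/*
 * Return true if the arguments would pass the trampfd mapping checks:
 * R-X protection, a private mapping, the code page offset, and exactly
 * one page.
 */
static bool sketch_map_args_ok(int prot, int flags, unsigned long pgoff,
			       unsigned long npages)
{
	if (prot != (PROT_READ | PROT_EXEC))
		return false;
	if (flags != MAP_PRIVATE)
		return false;
	return pgoff == SKETCH_CODE_PGOFF && npages == 1;
}
```

The sketch does not model the remaining conditions: the trampfd must already
be initialized, must not already be mapped, and must be mapped by the
process that created it.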

The trampoline code is generated with &code_table[X] and &data_table[X] hard
coded in it. But code_table[X] and data_table[X] can be modified by user
code dynamically to supply the code and data to trampoline X.
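
The double indirection can be modelled in portable C. This is a user-space
analogy only: the real trampoline places the data pointer in data_reg and
jumps to the code address, whereas the sketch passes the data pointer as a
function argument. All sketch_* names are hypothetical.

```c
#include <stddef.h>

typedef long (*sketch_fn)(void *data);

/* User-owned tables; the trampolines only hard-code the slot addresses. */
struct sketch_tramp_table {
	sketch_fn	*code_table;
	void		**data_table;
	size_t		ntrampolines;
};

/*
 * Emulate trampoline X: read the current code_table[X] and data_table[X]
 * at call time, then transfer control to the code with its data.
 */
static long sketch_invoke(const struct sketch_tramp_table *t, size_t x)
{
	sketch_fn code = t->code_table[x];
	void *data = t->data_table[x];

	return code(data);
}

/* Two example targets that interpret their data as a long. */
static long sketch_add_one(void *data)
{
	return *(long *)data + 1;
}

static long sketch_negate(void *data)
{
	return -*(long *)data;
}
```

Because the slots are read on every invocation, storing a new function in
code_table[X] retargets trampoline X without regenerating any code.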

Trampoline table management
===========================

The user manages the trampoline table. The address of trampoline X is:

	trampoline_table + info.code_size * X;

Prior to invoking trampoline X, the user must initialize code_table[X] and
data_table[X].
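
The address computation can be wrapped in a bounds-checked helper. The
function name and its NULL-on-error convention are assumptions made for this
sketch; only the formula comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the address of trampoline x inside a mapped trampoline table,
 * or NULL if x is outside the number of trampolines that were requested.
 */
static void *sketch_tramp_addr(void *trampoline_table, uint32_t code_size,
			       uint32_t ntrampolines, uint32_t x)
{
	if (x >= ntrampolines)
		return NULL;
	return (char *)trampoline_table + (size_t)code_size * x;
}
```

The returned pointer would be cast to the appropriate function pointer type
by the caller once code_table[x] and data_table[x] have been set.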

Unmap trampfd
=============

Once the user is done with the trampoline table, it may be unmapped:

	len = info.code_size * code.ntrampolines;
	munmap(trampoline_table, len);

Remove trampfd
==============

To remove the trampfd:

	close(trampfd);

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 fs/Makefile                       |   1 +
 fs/trampfd/Makefile               |   5 +
 fs/trampfd/trampfd_fops.c         | 241 ++++++++++++++++++++++++++++++
 fs/trampfd/trampfd_map.c          | 142 ++++++++++++++++++
 include/linux/syscalls.h          |   2 +
 include/linux/trampfd.h           |  49 ++++++
 include/uapi/asm-generic/unistd.h |   4 +-
 include/uapi/linux/trampfd.h      | 184 +++++++++++++++++++++++
 init/Kconfig                      |   7 +
 kernel/sys_ni.c                   |   3 +
 10 files changed, 637 insertions(+), 1 deletion(-)
 create mode 100644 fs/trampfd/Makefile
 create mode 100644 fs/trampfd/trampfd_fops.c
 create mode 100644 fs/trampfd/trampfd_map.c
 create mode 100644 include/linux/trampfd.h
 create mode 100644 include/uapi/linux/trampfd.h

diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..227761302000 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -136,3 +136,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_TRAMPFD)		+= trampfd/
diff --git a/fs/trampfd/Makefile b/fs/trampfd/Makefile
new file mode 100644
index 000000000000..ae09a0b1f841
--- /dev/null
+++ b/fs/trampfd/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_TRAMPFD) += trampfd.o
+
+trampfd-y += trampfd_fops.o trampfd_map.o
diff --git a/fs/trampfd/trampfd_fops.c b/fs/trampfd/trampfd_fops.c
new file mode 100644
index 000000000000..7164dd4d9039
--- /dev/null
+++ b/fs/trampfd/trampfd_fops.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - System call and File operations.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/syscalls.h>
+#include <linux/seq_file.h>
+#include <linux/anon_inodes.h>
+#include <linux/trampfd.h>
+
+char	*trampfd_name = "[trampfd]";
+
+struct kmem_cache	*trampfd_cache;
+
+/*
+ * Arch stub function to return info for the trampfd syscall.
+ */
+int __attribute__((weak)) trampfd_arch(struct trampfd_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a code
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_code_arch(struct trampfd_code *code)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Arch stub function to do arch specific initialization for a data
+ * descriptor.
+ */
+int __attribute__((weak)) trampfd_data_arch(struct trampfd_data *data)
+{
+	return -EOPNOTSUPP;
+}
+
+#ifdef CONFIG_PROC_FS
+static void trampfd_show_fdinfo(struct seq_file *sfile, struct file *file)
+{
+	seq_puts(sfile, "Trampoline FD\n");
+}
+#endif
+
+static loff_t trampfd_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	if (whence != SEEK_SET)
+		return -EINVAL;
+
+	if ((offset < 0) || (offset >= TRAMPFD_NUM_OFFSETS))
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+	mutex_unlock(&trampfd->lock);
+	return offset;
+}
+
+int trampfd_code(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_code	code;
+	int			rc = 0;
+
+	if (count != sizeof(code))
+		return -EINVAL;
+
+	if (copy_from_user(&code, arg, sizeof(code)))
+		return -EFAULT;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->code) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_code_arch(&code);
+	if (rc)
+		goto unlock;
+
+	trampfd->code_reg = code.reg;
+	trampfd->ntrampolines = code.ntrampolines;
+	trampfd->code = (void *) (uintptr_t) code.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+int trampfd_data(struct file *file, const char __user *arg, size_t count)
+{
+	struct trampfd		*trampfd = file->private_data;
+	struct trampfd_data	data;
+	int			rc = 0;
+
+	if (count != sizeof(data))
+		return -EINVAL;
+
+	if (copy_from_user(&data, arg, sizeof(data)))
+		return -EFAULT;
+
+	if (data.reserved)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->data) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	rc = trampfd_data_arch(&data);
+	if (rc)
+		goto unlock;
+
+	trampfd->data_reg = data.reg;
+	trampfd->data = (void *) (uintptr_t) data.table;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+static ssize_t trampfd_write(struct file *file, const char __user *arg,
+			     size_t count, loff_t *ppos)
+{
+	int		rc;
+
+	if (!arg || !count)
+		return -EINVAL;
+
+	switch (*ppos) {
+	case TRAMPFD_CODE:
+		rc = trampfd_code(file, arg, count);
+		break;
+
+	case TRAMPFD_DATA:
+		rc = trampfd_data(file, arg, count);
+		break;
+
+	default:
+		rc = -EINVAL;
+		goto out;
+	}
+out:
+	return rc ? rc : (ssize_t) count;
+}
+
+static int trampfd_release(struct inode *inode, struct file *file)
+{
+	struct trampfd		*trampfd = file->private_data;
+
+	mutex_destroy(&trampfd->lock);
+	kmem_cache_free(trampfd_cache, trampfd);
+	return 0;
+}
+
+const struct file_operations trampfd_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo		= trampfd_show_fdinfo,
+#endif
+	.llseek			= trampfd_llseek,
+	.write			= trampfd_write,
+	.release		= trampfd_release,
+	.mmap			= trampfd_mmap,
+	.get_unmapped_area	= trampfd_get_unmapped_area,
+};
+
+SYSCALL_DEFINE1(trampfd, struct trampfd_info *, info_arg)
+{
+	struct trampfd		*trampfd;
+	struct trampfd_info	info;
+	struct file		*file;
+	int			fd;
+	int			rc;
+
+	if (!trampfd_cache)
+		return -ENOMEM;
+
+	if (!info_arg)
+		return -EINVAL;
+
+	trampfd = kmem_cache_zalloc(trampfd_cache, GFP_KERNEL);
+	if (!trampfd)
+		return -ENOMEM;
+	mutex_init(&trampfd->lock);
+	trampfd->creator = current->pid;
+
+	trampfd_arch(&info);
+
+	if (copy_to_user(info_arg, &info, sizeof(info))) {
+		rc = -EFAULT;
+		goto freetramp;
+	}
+
+	rc = get_unused_fd_flags(O_CLOEXEC);
+	if (rc < 0)
+		goto freetramp;
+	fd = rc;
+
+	file = anon_inode_getfile(trampfd_name, &trampfd_fops, trampfd, O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+		goto freefd;
+	}
+	file->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+	fd_install(fd, file);
+	return fd;
+freefd:
+	put_unused_fd(fd);
+freetramp:
+	kmem_cache_free(trampfd_cache, trampfd);
+	return rc;
+}
+
+int __init trampfd_feature_init(void)
+{
+	trampfd_cache = kmem_cache_create("trampfd_cache",
+		sizeof(struct trampfd), 0, SLAB_HWCACHE_ALIGN, NULL);
+	if (trampfd_cache == NULL) {
+		pr_warn("%s: kmem_cache_create failed\n", __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+core_initcall(trampfd_feature_init);
diff --git a/fs/trampfd/trampfd_map.c b/fs/trampfd/trampfd_map.c
new file mode 100644
index 000000000000..679b29768491
--- /dev/null
+++ b/fs/trampfd/trampfd_map.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - Memory mapping.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@microsoft.com)
+ *
+ * Copyright (C) 2020 Microsoft Corporation.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/mman.h>
+#include <linux/highmem.h>
+#include <linux/trampfd.h>
+
+/*
+ * Arch stub function to populate a page with trampolines based on a
+ * trampoline specification.
+ */
+void __attribute__((weak)) trampfd_code_fill(struct trampfd *trampfd,
+					     char *addr)
+{
+}
+
+static void trampfd_close(struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+
+	mutex_lock(&trampfd->lock);
+
+	if (trampfd->page) {
+		__free_pages(trampfd->page, 0);
+		trampfd->page = NULL;
+	}
+	trampfd->mapped = false;
+
+	mutex_unlock(&trampfd->lock);
+}
+
+static vm_fault_t trampfd_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct	*vma = vmf->vma;
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	struct page		*new_page = NULL;
+	void			*addr;
+
+	/*
+	 * Check this outside the lock so the lock does not have to be
+	 * dropped in order to allocate a page. Races are benign.
+	 */
+	if (!trampfd->page) {
+		new_page = alloc_pages(GFP_KERNEL, 0);
+		if (!new_page)
+			return VM_FAULT_OOM;
+	}
+
+	mutex_lock(&trampfd->lock);
+
+	if (!trampfd->page) {
+		trampfd->page = new_page;
+		new_page = NULL;
+		/*
+		 * Populate the page with trampolines.
+		 */
+		addr = kmap(trampfd->page);
+		trampfd_code_fill(trampfd, addr);
+		kunmap(trampfd->page);
+	}
+	vmf->page = trampfd->page;
+	get_page(vmf->page);
+
+	mutex_unlock(&trampfd->lock);
+
+	if (new_page)
+		__free_pages(new_page, 0);
+	return 0;
+}
+
+static const struct vm_operations_struct trampfd_vm_ops = {
+	.close	= trampfd_close,
+	.fault	= trampfd_fault,
+};
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct trampfd		*trampfd = vma->vm_file->private_data;
+	int			rc = 0;
+
+	/*
+	 * A trampfd cannot be mapped into multiple address spaces.
+	 */
+	if (current->pid != trampfd->creator)
+		return -EINVAL;
+
+	mutex_lock(&trampfd->lock);
+
+	/*
+	 * trampfd must be initialized before it can be mapped.
+	 */
+	if (!trampfd->code || !trampfd->data) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	/*
+	 * A trampfd cannot be mapped multiple times in the same address space.
+	 */
+	if (trampfd->mapped) {
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	/*
+	 * prot should be R-X.
+	 */
+	if ((vma->vm_flags & VM_WRITE) || !(vma->vm_flags & VM_READ) ||
+	    !(vma->vm_flags & VM_EXEC)) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+	trampfd->mapped = true;
+	vma->vm_ops = &trampfd_vm_ops;
+unlock:
+	mutex_unlock(&trampfd->lock);
+	return rc;
+}
+
+unsigned long
+trampfd_get_unmapped_area(struct file *file, unsigned long orig_addr,
+			  unsigned long len, unsigned long pgoff,
+			  unsigned long flags)
+{
+	const typeof_member(struct file_operations, get_unmapped_area)
+	get_area = current->mm->get_unmapped_area;
+
+	if (pgoff != TRAMPFD_CODE_PGOFF || flags != MAP_PRIVATE ||
+	    len != PAGE_SIZE)
+		return -EINVAL;
+
+	return get_area(file, orig_addr, len, pgoff, flags);
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b951a87da987..91f55ff3cdac 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -69,6 +69,7 @@ union bpf_attr;
 struct io_uring_params;
 struct clone_args;
 struct open_how;
+struct trampfd_info;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -1005,6 +1006,7 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_trampfd(struct trampfd_info *info);
 
 /*
  * Architecture-specific system calls
diff --git a/include/linux/trampfd.h b/include/linux/trampfd.h
new file mode 100644
index 000000000000..c98fa1741c36
--- /dev/null
+++ b/include/linux/trampfd.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Trampoline FD - Internal structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _LINUX_TRAMPFD_H
+#define _LINUX_TRAMPFD_H
+
+#include <uapi/linux/trampfd.h>
+
+/*
+ * mmap() offsets.
+ */
+enum trampfd_pgoff {
+	TRAMPFD_CODE_PGOFF = 1,
+	TRAMPFD_NUM_PGOFF,
+};
+
+/*
+ * Trampoline structure.
+ */
+struct trampfd {
+	struct mutex		lock;		/* to serialize access */
+	pid_t			creator;	/* to prevent sharing */
+
+	short			code_reg;	/* code register name */
+	short			data_reg;	/* data register name */
+	int			ntrampolines;	/* number of trampolines */
+
+	void			*code;		/* user code address table */
+	void			*data;		/* user data address table */
+
+	struct page		*page;		/* code page */
+	bool			mapped;		/* mapped into address space? */
+};
+
+#ifdef CONFIG_TRAMPFD
+
+int trampfd_mmap(struct file *file, struct vm_area_struct *vma);
+unsigned long trampfd_get_unmapped_area(struct file *file, unsigned long addr,
+					unsigned long len, unsigned long pgoff,
+					unsigned long flags);
+
+#endif /* CONFIG_TRAMPFD */
+
+#endif /* _LINUX_TRAMPFD_H */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a6..3b1ad4b75c7a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/trampfd.h b/include/uapi/linux/trampfd.h
new file mode 100644
index 000000000000..9bbc0450e16d
--- /dev/null
+++ b/include/uapi/linux/trampfd.h
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Trampoline FD - API structures and definitions.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+#ifndef _UAPI_LINUX_TRAMPFD_H
+#define _UAPI_LINUX_TRAMPFD_H
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+/*
+ * All structure fields are defined so that they are the same width and at the
+ * same structure offset on 32-bit and 64-bit to avoid compat code.
+ *
+ * All fields named "reserved" must be set to 0. They are there primarily for
+ * alignment. But they may be used in the future.
+ */
+
+/* ---------------------------- Trampfd Feature --------------------------- */
+
+/*
+ * This feature can be used to help convert dynamic user code to static user
+ * code so that the code can be in the text segment of a binary file. This
+ * allows the kernel to authenticate the code. E.g., using signature
+ * verification of the binary file.
+ *
+ * The problem in converting dynamic code to static code is that the static
+ * code needs to be able to locate its data dynamically. So, its data needs
+ * to be loaded in a designated register before jumping to the static code.
+ *
+ * This feature uses the kernel to generate a small, secure trampoline to do
+ * this. The trampoline code looks like this:
+ *
+ *	- load the address of the static code in a register (code_reg)
+ *	- load the address of its data in a register (data_reg)
+ *	- jump to code_reg
+ *
+ * The kernel places the trampoline in a user page and maps it into the user
+ * address space. To conserve memory, the kernel packs multiple trampolines in
+ * a page and creates a trampoline table.
+ */
+
+/* -------------------------- Trampoline Creation ------------------------- */
+
+/*
+ * This feature introduces a new trampfd system call.
+ *
+ *	struct trampfd_info	info;
+ *
+ *	trampfd = syscall(440, &info);
+ *
+ * The kernel returns the following items in info:
+ *
+ * ntrampolines
+ *	The number of trampolines that can be created with one trampfd. The
+ *	user may create fewer trampolines if desired.
+ *
+ * code_size
+ *	The size of each trampoline.
+ *
+ * code_offset
+ *	The file offset to be used in mmap() to map the trampoline code.
+ */
+struct trampfd_info {
+	__u32		ntrampolines;
+	__u32		code_size;
+	__u32		code_offset;
+	__u32		reserved;
+};
+
+/* ----------------------- Trampoline Initialization ---------------------- */
+
+/*
+ * Trampoline code descriptor.
+ *
+ * ntrampolines
+ *	User specified number of trampolines. This number cannot exceed
+ *	info.ntrampolines.
+ *
+ * reg
+ *	User specified code register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of code addresses, one address per trampoline.
+ *
+ */
+struct trampfd_code {
+	__u32		ntrampolines;
+	__u32		reg;
+	__u64		table;
+};
+
+/*
+ * Trampoline data descriptor.
+ *
+ * reg
+ *	User specified data register name. This is architecture specific and
+ *	can be obtained from ptrace.h.
+ *
+ * table
+ *	User array of data addresses, one address per trampoline.
+ *
+ */
+struct trampfd_data {
+	__u32		reg;
+	__u32		reserved;
+	__u64		table;
+};
+
+/*
+ * A trampfd is initialized in this manner:
+ *
+ *	struct trampfd_code	code;
+ *	struct trampfd_data	data;
+ *
+ *	code.ntrampolines = number of desired trampolines;
+ *	code.reg = code register name;
+ *	code.table = array of code addresses
+ *
+ *	data.reg = data register name;
+ *	data.table = array of data addresses
+ *
+ *	pwrite(trampfd, &code, sizeof(code), TRAMPFD_CODE);
+ *	pwrite(trampfd, &data, sizeof(data), TRAMPFD_DATA);
+ *
+ * It is recommended that the code descriptor and code array be placed in the
+ * .rodata section so that an attacker cannot modify its contents.
+ */
+
+/* ---------------------------- Trampoline mapping ------------------------ */
+
+/*
+ * The user uses mmap() to map the trampoline table into user address space.
+ *
+ *	len = info.code_size * code.ntrampolines;
+ *	prot = PROT_READ | PROT_EXEC;
+ *	flags = MAP_PRIVATE;
+ *	offset = info.code_offset;
+ *
+ *	trampoline_table = mmap(NULL, len, prot, flags, trampfd, offset);
+ *
+ * The kernel generates the trampoline table. The code for trampoline X in the
+ * table is:
+ *
+ *	load code_table[X] into code_reg
+ *	load data_table[X] into data_reg
+ *	jump code_reg
+ *
+ * The user manages the trampoline table. The address of trampoline X is:
+ *
+ *	trampoline_table + info.code_size * X;
+ *
+ * Prior to invoking trampoline X, the user must initialize code_table[X] and
+ * data_table[X].
+ */
+
+/* ------------------------- Symbolic offsets -------------------------- */
+
+/*
+ * trampfd can have different actions/parameters associated with it. Each one
+ * has a symbolic file offset. Action/Parameter structures are read or written
+ * at their file offsets.
+ *
+ * Offset		Operation	Data
+ * ------------------------------------------------------------------------
+ * TRAMPFD_CODE		Write		struct trampfd_code
+ * TRAMPFD_DATA		Write		struct trampfd_data
+ * ------------------------------------------------------------------------
+ */
+enum trampfd_offsets {
+	TRAMPFD_CODE,
+	TRAMPFD_DATA,
+	TRAMPFD_NUM_OFFSETS,
+};
+
+
+/* ----------------------------------------------------------------------- */
+
+#endif /* _UAPI_LINUX_TRAMPFD_H */
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..bb3ecca5b8e7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2313,3 +2313,10 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+config TRAMPFD
+	bool "Enable trampfd() system call"
+	depends on MMU
+	help
+	  Enable the trampfd() system call that allows a process to map
+	  kernel generated trampolines within its address space.
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..93c5972aba85 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -349,6 +349,9 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* Trampoline fd */
+COND_SYSCALL(trampfd);
+
 
 /*
  * Architecture specific weak syscall entries.
-- 
2.17.1



* [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor
  2020-09-22 21:53 ` madvenka
  2020-09-22 21:53   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
@ 2020-09-22 21:53   ` madvenka
  2020-09-22 21:53   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-22 21:53 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/trampfd.c              | 238 +++++++++++++++++++++++++
 5 files changed, 279 insertions(+)
 create mode 100644 arch/x86/kernel/trampfd.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..d4f17806c9ab 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	trampfd			sys_trampfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..91b37bc4b6f0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	trampfd			sys_trampfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/x86/include/uapi/asm/ptrace.h b/arch/x86/include/uapi/asm/ptrace.h
index 85165c0edafc..b4be362929b3 100644
--- a/arch/x86/include/uapi/asm/ptrace.h
+++ b/arch/x86/include/uapi/asm/ptrace.h
@@ -9,6 +9,44 @@
 
 #ifndef __ASSEMBLY__
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	x32_min = 0,
+	x32_eax = x32_min,
+	x32_ebx,
+	x32_ecx,
+	x32_edx,
+	x32_esi,
+	x32_edi,
+	x32_ebp,
+	x32_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	x64_min = x32_max,
+	x64_rax = x64_min,
+	x64_rbx,
+	x64_rcx,
+	x64_rdx,
+	x64_rsi,
+	x64_rdi,
+	x64_rbp,
+	x64_r8,
+	x64_r9,
+	x64_r10,
+	x64_r11,
+	x64_r12,
+	x64_r13,
+	x64_r14,
+	x64_r15,
+	x64_max,
+};
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
    stack during a system call. */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261db2391..feb7f4f311fd 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -157,3 +157,4 @@ ifeq ($(CONFIG_X86_64),y)
 endif
 
 obj-$(CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT)	+= ima_arch.o
+obj-$(CONFIG_TRAMPFD)				+= trampfd.o
diff --git a/arch/x86/kernel/trampfd.c b/arch/x86/kernel/trampfd.c
new file mode 100644
index 000000000000..7b812c200d01
--- /dev/null
+++ b/arch/x86/kernel/trampfd.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - X86 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		24
+#define TRAMPFD_CODE_64_SIZE		40
+
+static inline bool is_compat(void)
+{
+	return (IS_ENABLED(CONFIG_X86_32) ||
+		(IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_ADDR32)));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = x64_min;
+		max = x64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = x32_min;
+		max = x32_max;
+	} else {
+		min = x64_min;
+		max = x64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * X32 register encodings.
+ */
+static unsigned char	reg_32[] = {
+	0,	/* x32_eax */
+	3,	/* x32_ebx */
+	1,	/* x32_ecx */
+	2,	/* x32_edx */
+	6,	/* x32_esi */
+	7,	/* x32_edi */
+	5,	/* x32_ebp */
+};
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x32_min;
+	int		dreg = trampfd->data_reg - x32_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr32 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfb;
+
+		/* mov code, %creg */
+		addr[4] = 0xB8 | reg_32[creg];			/* opcode+reg */
+		memcpy(&addr[5], &code, sizeof(u32));		/* imm32 */
+
+		/* mov (%creg), %creg */
+		addr[9] = 0x8B;				/* opcode */
+		addr[10] = 0x00 |				/* MODRM.mode */
+			   reg_32[creg] << 3 |			/* MODRM.reg */
+			   reg_32[creg];			/* MODRM.r/m */
+
+		/* mov data, %dreg */
+		addr[11] = 0xB8 | reg_32[dreg];			/* opcode+reg */
+		memcpy(&addr[12], &data, sizeof(u32));		/* imm32 */
+
+		/* mov (%dreg), %dreg */
+		addr[16] = 0x8B;				/* opcode */
+		addr[17] = 0x00 |				/* MODRM.mode */
+			   reg_32[dreg] << 3 |			/* MODRM.reg */
+			   reg_32[dreg];			/* MODRM.r/m */
+
+		/* jmp *%creg */
+		addr[18] = 0xff;				/* opcode */
+		addr[19] = 0xe0 | reg_32[creg];			/* MODRM.r/m */
+
+		/* nopl (%eax) */
+		addr[20] = 0x0f;
+		addr[21] = 0x1f;
+		addr[22] = 0x00;
+
+		/* pad to 4-byte boundary */
+		memset(&addr[23], 0, TRAMPFD_CODE_32_SIZE - 23);
+		addr += TRAMPFD_CODE_32_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+/*
+ * X64 register encodings.
+ */
+static unsigned char	reg_64[] = {
+	0,	/* x64_rax */
+	3,	/* x64_rbx */
+	1,	/* x64_rcx */
+	2,	/* x64_rdx */
+	6,	/* x64_rsi */
+	7,	/* x64_rdi */
+	5,	/* x64_rbp */
+	8,	/* x64_r8 */
+	9,	/* x64_r9 */
+	10,	/* x64_r10 */
+	11,	/* x64_r11 */
+	12,	/* x64_r12 */
+	13,	/* x64_r13 */
+	14,	/* x64_r14 */
+	15,	/* x64_r15 */
+};
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - x64_min;
+	int		dreg = trampfd->data_reg - x64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/* endbr64 */
+		addr[0] = 0xf3;
+		addr[1] = 0x0f;
+		addr[2] = 0x1e;
+		addr[3] = 0xfa;
+
+		/* movabs code, %creg */
+		addr[4] = 0x48 |				/* REX.W */
+			  ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[5] = 0xB8 | (reg_64[creg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[6], &code, sizeof(u64));		/* imm64 */
+
+		/* movq (%creg), %creg */
+		addr[14] = 0x48 |				/* REX.W */
+			   ((reg_64[creg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[15] = 0x8B;				/* opcode */
+		addr[16] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[creg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[creg] & 0x7));		/* MODRM.r/m */
+
+		/* movabs data, %dreg */
+		addr[17] = 0x48 |				/* REX.W */
+			  ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[18] = 0xB8 | (reg_64[dreg] & 0x7);		/* opcode+reg */
+		memcpy(&addr[19], &data, sizeof(u64));		/* imm64 */
+
+		/* movq (%dreg), %dreg */
+		addr[27] = 0x48 |				/* REX.W */
+			   ((reg_64[dreg] & 0x8) >> 1) |	/* REX.R */
+			   ((reg_64[dreg] & 0x8) >> 3);		/* REX.B */
+		addr[28] = 0x8B;				/* opcode */
+		addr[29] = 0x00 |				/* MODRM.mode */
+			   ((reg_64[dreg] & 0x7)) << 3 |	/* MODRM.reg */
+			   ((reg_64[dreg] & 0x7));		/* MODRM.r/m */
+
+		/* jmpq *%creg */
+		addr[30] = 0x40 |				/* REX */
+			   ((reg_64[creg] & 0x8) >> 3);		/* REX.B */
+		addr[31] = 0xff;				/* opcode */
+		addr[32] = 0xe0 | (reg_64[creg] & 0x7);		/* MODRM.r/m */
+
+		/* nopl (%rax) */
+		addr[33] = 0x0f;
+		addr[34] = 0x1f;
+		addr[35] = 0x00;
+
+		/* pad to 8-byte boundary */
+		memset(&addr[36], 0, TRAMPFD_CODE_64_SIZE - 36);
+		addr += TRAMPFD_CODE_64_SIZE;
+	}
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1



* [PATCH v2 3/4] [RFC] arm64/trampfd: Provide support for the trampoline file descriptor
  2020-09-22 21:53 ` madvenka
  2020-09-22 21:53   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
  2020-09-22 21:53   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
@ 2020-09-22 21:53   ` madvenka
  2020-09-22 21:53   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-22 21:53 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table for:
		- 32-bit user process
		- 64-bit user process

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/unistd.h      |   2 +-
 arch/arm64/include/asm/unistd32.h    |   2 +
 arch/arm64/include/uapi/asm/ptrace.h |  59 +++++++
 arch/arm64/kernel/Makefile           |   2 +
 arch/arm64/kernel/trampfd.c          | 244 +++++++++++++++++++++++++++
 5 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/trampfd.c

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..c0493c5322d9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 42cbe34d95ce..2778789c1cbe 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -88,6 +88,65 @@ struct user_pt_regs {
 	__u64		pstate;
 };
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+	arm64_min = arm_max,
+	arm64_r0 = arm64_min,
+	arm64_r1,
+	arm64_r2,
+	arm64_r3,
+	arm64_r4,
+	arm64_r5,
+	arm64_r6,
+	arm64_r7,
+	arm64_r8,
+	arm64_r9,
+	arm64_r10,
+	arm64_r11,
+	arm64_r12,
+	arm64_r13,
+	arm64_r14,
+	arm64_r15,
+	arm64_r16,
+	arm64_r17,
+	arm64_r18,
+	arm64_r19,
+	arm64_r20,
+	arm64_r21,
+	arm64_r22,
+	arm64_r23,
+	arm64_r24,
+	arm64_r25,
+	arm64_r26,
+	arm64_r27,
+	arm64_r28,
+	arm64_r29,
+	arm64_max,
+};
+
 struct user_fpsimd_state {
 	__uint128_t	vregs[32];
 	__u32		fpsr;
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..18d373fb1208 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,3 +71,5 @@ extra-y					+= $(head-y) vmlinux.lds
 ifeq ($(CONFIG_DEBUG_EFI),y)
 AFLAGS_head.o += -DVMLINUX_PATH="\"$(realpath $(objtree)/vmlinux)\""
 endif
+
+obj-$(CONFIG_TRAMPFD)			+= trampfd.o
diff --git a/arch/arm64/kernel/trampfd.c b/arch/arm64/kernel/trampfd.c
new file mode 100644
index 000000000000..3b40ebb12907
--- /dev/null
+++ b/arch/arm64/kernel/trampfd.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM64 support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <asm/compat.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE		28
+#define TRAMPFD_CODE_64_SIZE		48
+
+static inline bool is_compat(void)
+{
+	return is_compat_thread(task_thread_info(current));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	if (is_compat())
+		info->code_size = TRAMPFD_CODE_32_SIZE;
+	else
+		info->code_size = TRAMPFD_CODE_64_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+		ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+	}
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	if (is_compat()) {
+		min = arm_min;
+		max = arm_max;
+	} else {
+		min = arm64_min;
+		max = arm64_max;
+	}
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVARM(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDRARM(ins, reg)						\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BXARM(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVARM(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDRARM(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVARM(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDRARM(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BXARM(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+#define MOVQ(ins, reg, imm64)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm64);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 4; _hw++) {					\
+		/* movz or movk */					\
+		_opcode = _hw ? 0xf2800000 : 0xd2800000;		\
+		*ins++ = _opcode | _hw << 21 | _imm16[_hw] << 5 | (reg);\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xf9400000 | (reg) << 5 | (reg);			\
+}
+
+#define BR(ins, reg)							\
+{									\
+	*ins++ = 0xd61f0000 | (reg) << 5;				\
+}
+
+#define PAD(ins)							\
+{									\
+	while ((uintptr_t) ins & 7)					\
+		*ins++ = 0;						\
+}
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm64_min;
+	int		dreg = trampfd->data_reg - arm64_min;
+	u64		*code = trampfd->code;
+	u64		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq creg, code
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	creg, code & 0xFFFF
+		 * movk	creg, (code >> 16) & 0xFFFF, lsl 16
+		 * movk	creg, (code >> 32) & 0xFFFF, lsl 32
+		 * movk	creg, (code >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * Pseudo instruction:
+		 *
+		 * movq dreg, data
+		 *
+		 * Actual instructions:
+		 *
+		 * movz	dreg, data & 0xFFFF
+		 * movk	dreg, (data >> 16) & 0xFFFF, lsl 16
+		 * movk	dreg, (data >> 32) & 0xFFFF, lsl 32
+		 * movk	dreg, (data >> 48) & 0xFFFF, lsl 48
+		 */
+		MOVQ(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * br	creg
+		 */
+		BR(instruction, creg);
+
+		/*
+		 * Pad to 8-byte boundary
+		 */
+		PAD(instruction);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	if (is_compat())
+		trampfd_code_fill_32(trampfd, addr);
+	else
+		trampfd_code_fill_64(trampfd, addr);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 4/4] [RFC] arm/trampfd: Provide support for the trampoline file descriptor
  2020-09-22 21:53 ` madvenka
                     ` (2 preceding siblings ...)
  2020-09-22 21:53   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
@ 2020-09-22 21:53   ` madvenka
  2020-09-22 21:54   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Madhavan T. Venkataraman
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: madvenka @ 2020-09-22 21:53 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel, madvenka

From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

	- Define architecture specific register names
	- Architecture specific functions for:
		- system call init
		- code descriptor check
		- data descriptor check
	- Fill a page with a trampoline table

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
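Note on the encoding used by MOVW(), LDR() and BX() in the diff below: each
32-bit table address is built with a movw/movt pair, where the 16-bit
immediate is split into imm4 (bits 19:16) and imm12 (bits 11:0). The
following is only a user-space sketch of that bit layout, not part of the
patch. The sketch_*() names are made up for illustration, and the halves
are extracted with shifts instead of a u16 pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode "movw reg, #lo16" and "movt reg, #hi16" for a 32-bit immediate.
 * The opcode constants are the ones used by MOVW() in the patch.
 */
static void sketch_mov32(uint32_t out[2], unsigned int reg, uint32_t imm32)
{
	unsigned int hw;

	assert(reg < 13);	/* r0-r12, the range the patch accepts */
	for (hw = 0; hw < 2; hw++) {
		uint32_t opcode = hw ? 0xe3400000u : 0xe3000000u;
		uint32_t imm16 = (imm32 >> (16 * hw)) & 0xffffu;

		out[hw] = opcode | ((imm16 >> 12) << 16) | (reg << 12) |
			  (imm16 & 0xfffu);
	}
}

/* ldr reg, [reg] */
static uint32_t sketch_ldr32(unsigned int reg)
{
	return 0xe5900000u | (reg << 16) | (reg << 12);
}

/* bx reg */
static uint32_t sketch_bx(unsigned int reg)
{
	return 0xe12fff10u | reg;
}
```

One trampoline is 2 + 1 + 2 + 1 + 1 = 7 instructions, that is 28 bytes,
which matches TRAMPFD_CODE_SIZE. Assuming a 4K page, the table holds
4096 / 28 = 146 trampolines.
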
 arch/arm/include/uapi/asm/ptrace.h |  21 +++++
 arch/arm/kernel/Makefile           |   1 +
 arch/arm/kernel/trampfd.c          | 124 +++++++++++++++++++++++++++++
 arch/arm/tools/syscall.tbl         |   1 +
 4 files changed, 147 insertions(+)
 create mode 100644 arch/arm/kernel/trampfd.c

diff --git a/arch/arm/include/uapi/asm/ptrace.h b/arch/arm/include/uapi/asm/ptrace.h
index e61c65b4018d..598047768f9b 100644
--- a/arch/arm/include/uapi/asm/ptrace.h
+++ b/arch/arm/include/uapi/asm/ptrace.h
@@ -151,6 +151,27 @@ struct pt_regs {
 #define ARM_r0		uregs[0]
 #define ARM_ORIG_r0	uregs[17]
 
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+	arm_min,
+	arm_r0 = arm_min,
+	arm_r1,
+	arm_r2,
+	arm_r3,
+	arm_r4,
+	arm_r5,
+	arm_r6,
+	arm_r7,
+	arm_r8,
+	arm_r9,
+	arm_r10,
+	arm_r11,
+	arm_r12,
+	arm_max,
+};
+
 /*
  * The size of the user-visible VFP state as seen by PTRACE_GET/SETVFPREGS
  * and core dumps.
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..652c54c2f19a 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -105,5 +105,6 @@ obj-$(CONFIG_SMP)		+= psci_smp.o
 endif
 
 obj-$(CONFIG_HAVE_ARM_SMCCC)	+= smccc-call.o
+obj-$(CONFIG_TRAMPFD)		+= trampfd.o
 
 extra-y := $(head-y) vmlinux.lds
diff --git a/arch/arm/kernel/trampfd.c b/arch/arm/kernel/trampfd.c
new file mode 100644
index 000000000000..45146ed489e8
--- /dev/null
+++ b/arch/arm/kernel/trampfd.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM support.
+ *
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_SIZE		28
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+	info->code_size = TRAMPFD_CODE_SIZE;
+	info->ntrampolines = PAGE_SIZE / info->code_size;
+	info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+	info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+	int	ntrampolines;
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+	ntrampolines = PAGE_SIZE / TRAMPFD_CODE_SIZE;
+
+	if (code->reg < min || code->reg >= max)
+		return -EINVAL;
+
+	if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+	int	min, max;
+
+	min = arm_min;
+	max = arm_max;
+
+	if (data->reg < min || data->reg >= max)
+		return -EINVAL;
+	return 0;
+}
+
+#define MOVW(ins, reg, imm32)						\
+{									\
+	u16	*_imm16 = (u16 *) &(imm32);	/* little endian */	\
+	int	_hw, _opcode;						\
+									\
+	for (_hw = 0; _hw < 2; _hw++) {					\
+		/* movw or movt */					\
+		_opcode = _hw ? 0xe3400000 : 0xe3000000;		\
+		*ins++ = _opcode | (_imm16[_hw] >> 12) << 16 |		\
+			 (reg) << 12 | (_imm16[_hw] & 0xFFF);		\
+	}								\
+}
+
+#define LDR(ins, reg)							\
+{									\
+	*ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12;		\
+}
+
+#define BX(ins, reg)							\
+{									\
+	*ins++ = 0xe12fff10 | (reg);					\
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+	char		*eaddr = addr + PAGE_SIZE;
+	int		creg = trampfd->code_reg - arm_min;
+	int		dreg = trampfd->data_reg - arm_min;
+	u32		*code = trampfd->code;
+	u32		*data = trampfd->data;
+	u32		*instruction = (u32 *) addr;
+	int		i;
+
+	for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+		/*
+		 * movw creg, code & 0xFFFF
+		 * movt creg, code >> 16
+		 */
+		MOVW(instruction, creg, code);
+
+		/*
+		 * ldr	creg, [creg]
+		 */
+		LDR(instruction, creg);
+
+		/*
+		 * movw dreg, data & 0xFFFF
+		 * movt dreg, data >> 16
+		 */
+		MOVW(instruction, dreg, data);
+
+		/*
+		 * ldr	dreg, [dreg]
+		 */
+		LDR(instruction, dreg);
+
+		/*
+		 * bx	creg
+		 */
+		BX(instruction, creg);
+	}
+	addr = (char *) instruction;
+	memset(addr, 0, eaddr - addr);
+}
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d5cae5ffede0..85dcbc9e08ee 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -452,3 +452,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	trampfd				sys_trampfd
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-22 21:53 ` madvenka
                     ` (3 preceding siblings ...)
  2020-09-22 21:53   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
@ 2020-09-22 21:54   ` Madhavan T. Venkataraman
  2020-09-23  8:14   ` Pavel Machek
  2020-09-23  8:42   ` Pavel Machek
  6 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-22 21:54 UTC (permalink / raw)
  To: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, pavel

I have just re-sent the trampfd v2 RFC because I forgot to CC the reviewers
who provided comments earlier. Sorry for the noise.

Madhavan

On 9/22/20 4:53 PM, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Introduction
> ============
> 
> Dynamic code is used in many different user applications. Dynamic code is
> often generated at runtime. Dynamic code can also just be a pre-defined
> sequence of machine instructions in a data buffer. Examples of dynamic
> code are trampolines, JIT code, DBT code, etc.
> 
> Dynamic code is placed either in a data page or in a stack page. In order
> to execute dynamic code, the page it resides in needs to be mapped with
> execute permissions. Writable pages with execute permissions provide an
> attack surface for hackers. Attackers can use this to inject malicious
> code, modify existing code or do other harm.
> 
> To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> allow pages to have both write and execute permissions. This prevents
> dynamic code from executing and blocks applications that use it. To allow
> genuine applications to run, exceptions have to be made for them (by setting
> execmem, etc) which opens the door to security issues.
> 
> The W^X implementation today is not complete. There exist many user level
> tricks that can be used to load and execute dynamic code. E.g.,
> 
> - Load the code into a file and map the file with R-X.
> 
> - Load the code in an RW- page. Change the permissions to R--. Then,
>   change the permissions to R-X.
> 
> - Load the code in an RW- page. Remap the page with R-X to get a separate
>   mapping to the same underlying physical page.
> 
> IMO, these are all security holes as an attacker can exploit them to inject
> his own code.
> 
> In the future, these holes will definitely be closed. For instance, LSMs
> (such as the IPE proposal [1]) may only allow code in properly signed object
> files to be mapped with execute permissions. This will do two things:
> 
> 	- user level tricks using anonymous pages will fail as anonymous
> 	  pages have no file identity
> 
> 	- loading the code in a temporary file and mapping it with R-X
> 	  will fail as the temporary file would not have a signature
> 
> We need a way to execute such code without making security exceptions.
> Trampolines are a good example of dynamic code. A couple of examples
> of trampolines are given below. My first use case for this RFC is
> libffi.
> 
> Examples of trampolines
> =======================
> 
> libffi (A Portable Foreign Function Interface Library):
> 
> libffi allows a user to define functions with an arbitrary list of
> arguments and return value through a feature called "Closures".
> Closures use trampolines to jump to ABI handlers that handle calling
> conventions and call a target function. libffi is used by a lot
> of different applications. To name a few:
> 
> 	- Python
> 	- Java
> 	- Javascript
> 	- Ruby FFI
> 	- Lisp
> 	- Objective C
> 
> GCC nested functions:
> 
> GCC has traditionally used trampolines for implementing nested
> functions. The trampoline is placed on the user stack. So, the stack
> needs to be executable.
> 
> Currently available solution
> ============================
> 
> One solution that has been proposed to allow trampolines to be executed
> without making security exceptions is Trampoline Emulation. See:
> 
> https://pax.grsecurity.net/docs/emutramp.txt
> 
> In this solution, the kernel recognizes certain sequences of instructions
> as "well-known" trampolines. When such a trampoline is executed, a page
> fault happens because the trampoline page does not have execute permission.
> The kernel recognizes the trampoline and emulates it. Basically, the
> kernel does the work of the trampoline on behalf of the application.
> 
> Currently, the emulated trampolines are the ones used in libffi and GCC
> nested functions. To my knowledge, only X86 is supported at this time.
> 
> As noted in emutramp.txt, this is not a generic solution. For every new
> trampoline that needs to be supported, new instruction sequences need to
> be recognized by the kernel and emulated. And this has to be done for
> every architecture that needs to be supported.
> 
> emutramp.txt notes the following:
> 
> "... the real solution is not in emulation but by designing a kernel API
> for runtime code generation and modifying userland to make use of it."
> 
> Solution proposed in this RFC
> =============================
> 
> From this RFC's perspective, there are two scenarios for dynamic code:
> 
> Scenario 1
> ----------
> 
> We know what code we need only at runtime. For instance, JIT code generated
> for frequently executed Java methods. Only at runtime do we know what
> methods need to be JIT compiled. Such code cannot be statically defined. It
> has to be generated at runtime.
> 
> Scenario 2
> ----------
> 
> We know what code we need in advance. User trampolines are a good example of
> this. It is possible to define such code statically with some help from the
> kernel.
> 
> This RFC addresses (2). (1) needs a general purpose trusted code generator
> and is out of scope for this RFC.
> 
> For (2), the solution is to convert dynamic code to static code and place it
> in a source file. The binary generated from the source can be signed. The
> kernel can use signature verification to authenticate the binary and
> allow the code to be mapped and executed.
> 
> The problem is that the static code has to be able to find the data that it
> needs when it executes. For functions, the ABI defines the way to pass
> parameters. But, for arbitrary dynamic code, there isn't a standard ABI
> compliant way to pass data to the code for most architectures. Each instance
> of dynamic code defines its own way. For instance, co-location of code and
> data and PC-relative data referencing are used in cases where the ISA
> supports it.
> 
> We need one standard way that would work for all architectures and ABIs.
> 
> The solution proposed here is:
> 
> 1. Write the static code assuming that the data needed by the code is already
>    pointed to by a designated register.
> 
> 2. Get the kernel to supply a small universal trampoline that does the
>    following:
> 
> 	- Load the address of the data in a designated register
> 	- Load the address of the static code in a designated register
> 	- Jump to the static code
> 
> User code would use a kernel supplied API to create and map the trampoline.
> The address values would be baked into the code so that no special ISA
> features are needed.
> 
> To conserve memory, the kernel will pack as many trampolines as possible in
> a page and provide a trampoline table to user code. The table itself is
> managed by the user.
> 
> Trampoline File Descriptor (trampfd)
> ====================================
> 
> I am proposing a kernel API using anonymous file descriptors that can be
> used to create the trampolines. The API is described in patch 1/4 of this
> patchset. I provide a summary here:
> 
> 	- Create a trampoline file object
> 
> 	- Write a code descriptor into the trampoline file and specify:
> 
> 		- the number of trampolines desired
> 		- the name of the code register
> 		- user pointer to a table of code addresses, one address
> 		  per trampoline
> 
> 	- Write a data descriptor into the trampoline file and specify:
> 
> 		- the name of the data register
> 		- user pointer to a table of data addresses, one address
> 		  per trampoline
> 
> 	- mmap() the trampoline file. The kernel generates a table of
> 	  trampolines in a page and returns the trampoline table address
> 
> 	- munmap() a trampoline file mapping
> 
> 	- Close the trampoline file
> 
> Each mmap() will only map a single base page. Large pages are not supported.
> 
> A trampoline file can only be mapped once in an address space.
> 
> Trampoline file mappings cannot be shared across address spaces. So,
> sending the trampoline file descriptor over a unix domain socket and
> mapping it in another process will not work.
> 
> It is recommended that the code descriptor and the code table be placed
> in the .rodata section so an attacker cannot modify them.
> 
> Trampoline use and reuse
> ========================
> 
> The code for trampoline X in the trampoline table is:
> 
> 	load	&code_table[X], code_reg
> 	load	(code_reg), code_reg
> 	load	&data_table[X], data_reg
> 	load	(data_reg), data_reg
> 	jump	code_reg
> 
> The addresses &code_table[X] and &data_table[X] are baked into the
> trampoline code. So, PC-relative data references are not needed. The user
> can modify code_table[X] and data_table[X] dynamically.
> 
> For instance, within libffi, the same trampoline X can be used for different
> closures at different times by setting:
> 
> 	data_table[X] = closure;
> 	code_table[X] = ABI handling code;
> 
> Advantages of the Trampoline File Descriptor approach
> =====================================================
> 
> - Using this support from the kernel, dynamic code can be converted to
>   static code with a little effort so applications and libraries can move to
>   a more secure model. In the simplest cases such as libffi, dynamic code can
>   even be eliminated.
> 
> - This initial work is targeted towards X86 and ARM. But it can be supported
>   easily on all architectures. We don't need any special ISA features such
>   as PC-relative data referencing.
> 
> - The only code generation needed is for this small, universal trampoline.
> 
> - The kernel does not have to deal with any ABI issues in the generation of
>   this trampoline.
> 
> - The kernel provides a trampoline table to conserve memory.
> 
> - An SELinux setting called "exectramp" can be implemented along the
>   lines of "execmem", "execstack" and "execheap" to selectively allow the
>   use of trampolines on a per application basis.
> 
> - In version 1, a trip to the kernel was required to execute the trampoline.
>   In version 2, that is not required. So, there are no performance
>   concerns in this approach.
> 
> libffi
> ======
> 
> I have implemented my solution for libffi and provided the changes for
> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
> 
> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
> 
> If the trampfd patchset gets accepted, I will send the libffi changes
> to the maintainers for a review. BTW, I have also successfully executed
> the libffi self tests.
> 
> Work that is pending
> ====================
> 
> - I am working on implementing the SELinux setting - "exectramp".
> 
> - I have a test program to test the kernel API. I am working on adding it
>   to selftests.
> 
> References
> ==========
> 
> [1] https://microsoft.github.io/ipe/
> ---
> 
> Changelog:
> 
> v1
> 	Introduced the Trampfd feature.
> 
> v2
> 	- Changed the system call. Version 2 does not support different
> 	  trampoline types and their associated type structures. It only
> 	  supports a kernel generated trampoline.
> 
> 	  The system call now returns information to the user that is
> 	  used to define trampoline descriptors. E.g., the maximum
> 	  number of trampolines that can be packed in a single page.
> 
> 	- Removed all the trampoline contexts such as register contexts
> 	  and stack contexts. This is based on the feedback that the kernel
> 	  should not have to worry about ABI issues and H/W features that
> 	  may deal with the context of a process.
> 
> 	- Removed the need to make a trip into the kernel on trampoline
> 	  invocation. This is based on the feedback about performance.
> 
> 	- Removed the ability to share trampolines across address spaces.
> 	  This would have made sense to different trampoline types based
> 	  on their semantics. But since I support only one specific
> 	  trampoline, sharing does not make sense.
> 
> 	- Added calls to specify trampoline descriptors that the kernel
> 	  uses to generate trampolines.
> 
> 	- Added architecture-specific code to generate the small, universal
> 	  trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.
> 
> 	- Implemented the trampoline table in a page.
> 
> Madhavan T. Venkataraman (4):
>   Implement the kernel API for the trampoline file descriptor.
>   Implement i386 and X86 support for the trampoline file descriptor.
>   Implement ARM64 support for the trampoline file descriptor.
>   Implement ARM support for the trampoline file descriptor.
> 
>  arch/arm/include/uapi/asm/ptrace.h     |  21 +++
>  arch/arm/kernel/Makefile               |   1 +
>  arch/arm/kernel/trampfd.c              | 124 +++++++++++++
>  arch/arm/tools/syscall.tbl             |   1 +
>  arch/arm64/include/asm/unistd.h        |   2 +-
>  arch/arm64/include/asm/unistd32.h      |   2 +
>  arch/arm64/include/uapi/asm/ptrace.h   |  59 ++++++
>  arch/arm64/kernel/Makefile             |   2 +
>  arch/arm64/kernel/trampfd.c            | 244 +++++++++++++++++++++++++
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  arch/x86/include/uapi/asm/ptrace.h     |  38 ++++
>  arch/x86/kernel/Makefile               |   1 +
>  arch/x86/kernel/trampfd.c              | 238 ++++++++++++++++++++++++
>  fs/Makefile                            |   1 +
>  fs/trampfd/Makefile                    |   5 +
>  fs/trampfd/trampfd_fops.c              | 241 ++++++++++++++++++++++++
>  fs/trampfd/trampfd_map.c               | 142 ++++++++++++++
>  include/linux/syscalls.h               |   2 +
>  include/linux/trampfd.h                |  49 +++++
>  include/uapi/asm-generic/unistd.h      |   4 +-
>  include/uapi/linux/trampfd.h           | 184 +++++++++++++++++++
>  init/Kconfig                           |   7 +
>  kernel/sys_ni.c                        |   3 +
>  24 files changed, 1371 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm/kernel/trampfd.c
>  create mode 100644 arch/arm64/kernel/trampfd.c
>  create mode 100644 arch/x86/kernel/trampfd.c
>  create mode 100644 fs/trampfd/Makefile
>  create mode 100644 fs/trampfd/trampfd_fops.c
>  create mode 100644 fs/trampfd/trampfd_map.c
>  create mode 100644 include/linux/trampfd.h
>  create mode 100644 include/uapi/linux/trampfd.h
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-17 15:36     ` Madhavan T. Venkataraman
  2020-09-17 15:57       ` Madhavan T. Venkataraman
@ 2020-09-23  1:46       ` Arvind Sankar
  2020-09-23  9:11         ` Arvind Sankar
  1 sibling, 1 reply; 50+ messages in thread
From: Arvind Sankar @ 2020-09-23  1:46 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

On Thu, Sep 17, 2020 at 10:36:02AM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 9/16/20 8:04 PM, Florian Weimer wrote:
> > * madvenka:
> > 
> >> Examples of trampolines
> >> =======================
> >>
> >> libffi (A Portable Foreign Function Interface Library):
> >>
> >> libffi allows a user to define functions with an arbitrary list of
> >> arguments and return value through a feature called "Closures".
> >> Closures use trampolines to jump to ABI handlers that handle calling
> >> conventions and call a target function. libffi is used by a lot
> >> of different applications. To name a few:
> >>
> >> 	- Python
> >> 	- Java
> >> 	- Javascript
> >> 	- Ruby FFI
> >> 	- Lisp
> >> 	- Objective C
> > 
> > libffi does not actually need this.  It currently collocates
> > trampolines and the data they need on the same page, but that's
> > actually unnecessary.  It's possible to avoid doing this just by
> > changing libffi, without any kernel changes.
> > 
> > I think this has already been done for the iOS port.
> > 
> 
> The trampoline table that has been implemented for the iOS port (MACH)
> is based on PC-relative data referencing. That is, the code and data
> are placed in adjacent pages so that the code can access the data using
> an address relative to the current PC.
> 
> This is an ISA feature that is not supported on all architectures.
> 
> Now, if it is a performance feature, we can include some architectures
> and exclude others. But this is a security feature. IMO, we cannot
> exclude any architecture even if it is a legacy one as long as Linux
> is running on the architecture. So, we need a solution that does
> not assume any specific ISA feature.

Which ISA does not support PIC objects? You mentioned i386 below, but
i386 does support them, it just needs to copy the PC into a GPR first
(see below).

> 
> >> The code for trampoline X in the trampoline table is:
> >>
> >> 	load	&code_table[X], code_reg
> >> 	load	(code_reg), code_reg
> >> 	load	&data_table[X], data_reg
> >> 	load	(data_reg), data_reg
> >> 	jump	code_reg
> >>
> >> The addresses &code_table[X] and &data_table[X] are baked into the
> >> trampoline code. So, PC-relative data references are not needed. The user
> >> can modify code_table[X] and data_table[X] dynamically.
> > 
> > You can put this code into the libffi shared object and map it from
> > there, just like the rest of the libffi code.  To get more
> > trampolines, you can map the page containing the trampolines multiple
> > times, each instance preceded by a separate data page with the control
> > information.
> > 
> 
> If you put the code in the libffi shared object, how do you pass data to
> the code at runtime? If the code we are talking about is a function, then
> there is an ABI defined way to pass data to the function. But if the
> code we are talking about is some arbitrary code such as a trampoline,
> there is no ABI defined way to pass data to it except in a couple of
> platforms such as HP PA-RISC that have support for function descriptors
> in the ABI itself.
> 
> As mentioned before, if the ISA supports PC-relative data references
> (e.g., X86 64-bit platforms support RIP-relative data references)
> then we can pass data to that code by placing the code and data in
> adjacent pages. So, you can implement the trampoline table for X64.
> i386 does not support it.
> 

i386 just needs a tiny bit of code to copy the PC into a GPR first, i.e.
the trampoline would be:

	call	1f
1:	pop	%data_reg
	movl	(code_table + X - 1b)(%data_reg), %code_reg
	movl	(data_table + X - 1b)(%data_reg), %data_reg
	jmp	*%code_reg

I do not understand the point about passing data at runtime. This
trampoline is to achieve exactly that, no? 

Thanks.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-22 21:53 ` madvenka
                     ` (4 preceding siblings ...)
  2020-09-22 21:54   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Madhavan T. Venkataraman
@ 2020-09-23  8:14   ` Pavel Machek
  2020-09-23  9:14     ` Solar Designer
                       ` (2 more replies)
  2020-09-23  8:42   ` Pavel Machek
  6 siblings, 3 replies; 50+ messages in thread
From: Pavel Machek @ 2020-09-23  8:14 UTC (permalink / raw)
  To: madvenka
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic


Hi!

> Introduction
> ============
> 
> Dynamic code is used in many different user applications. Dynamic code is
> often generated at runtime. Dynamic code can also just be a pre-defined
> sequence of machine instructions in a data buffer. Examples of dynamic
> code are trampolines, JIT code, DBT code, etc.
> 
> Dynamic code is placed either in a data page or in a stack page. In order
> to execute dynamic code, the page it resides in needs to be mapped with
> execute permissions. Writable pages with execute permissions provide an
> attack surface for hackers. Attackers can use this to inject malicious
> code, modify existing code or do other harm.
> 
> To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> allow pages to have both write and execute permissions. This prevents
> dynamic code from executing and blocks applications that use it. To allow
> genuine applications to run, exceptions have to be made for them (by setting
> execmem, etc) which opens the door to security issues.
> 
> The W^X implementation today is not complete. There exist many user level
> tricks that can be used to load and execute dynamic code. E.g.,
> 
> - Load the code into a file and map the file with R-X.
> 
> - Load the code in an RW- page. Change the permissions to R--. Then,
>   change the permissions to R-X.
> 
> - Load the code in an RW- page. Remap the page with R-X to get a separate
>   mapping to the same underlying physical page.
> 
> IMO, these are all security holes as an attacker can exploit them to inject
> his own code.

IMO, you are smoking crack^H^H very seriously misunderstanding what
W^X is supposed to protect from.

W^X is not supposed to protect you from attackers that can already do
system calls. So loading code into a file then mapping the file as R-X
is in no way security hole in W^X.

If you want to provide protection from attackers that _can_ do system
calls, fine, but please don't talk about W^X and please specify what
types of attacks you want to prevent and why that's good thing.

Hint: an attacker that can "Load the code into a file and map the file
with R-X." can probably also load the code into /foo and
os.system("/usr/bin/python /foo").

This is not the first crazy patch from your company. Perhaps you should
have a person with strong Unix/Linux experience performing a "straight
face test" on outgoing patches?

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-22 21:53 ` madvenka
                     ` (5 preceding siblings ...)
  2020-09-23  8:14   ` Pavel Machek
@ 2020-09-23  8:42   ` Pavel Machek
  2020-09-23 18:56     ` Madhavan T. Venkataraman
  6 siblings, 1 reply; 50+ messages in thread
From: Pavel Machek @ 2020-09-23  8:42 UTC (permalink / raw)
  To: madvenka
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic


Hi!

> Solution proposed in this RFC
> =============================
> 
> From this RFC's perspective, there are two scenarios for dynamic code:
> 
> Scenario 1
> ----------
> 
> We know what code we need only at runtime. For instance, JIT code generated
> for frequently executed Java methods. Only at runtime do we know what
> methods need to be JIT compiled. Such code cannot be statically defined. It
> has to be generated at runtime.
> 
> Scenario 2
> ----------
> 
> We know what code we need in advance. User trampolines are a good example of
> this. It is possible to define such code statically with some help from the
> kernel.
> 
> This RFC addresses (2). (1) needs a general purpose trusted code generator
> and is out of scope for this RFC.

This is slightly less crazy talk than the introduction talking about
holes in W^X. But it is very, very far from a normal Unix system, where
you have a selection of interpreters to run your malware on (sh, python,
awk, emacs, ...) and often you can even compile malware from sources.

And as you noted, we don't have "a general purpose trusted code
generator" for our systems.

I believe you should simply delete the confusing "introduction" and
provide details of the super-secure system where your patches would be
useful, instead.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  1:46       ` Arvind Sankar
@ 2020-09-23  9:11         ` Arvind Sankar
  2020-09-23 19:17           ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 50+ messages in thread
From: Arvind Sankar @ 2020-09-23  9:11 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Madhavan T. Venkataraman, Florian Weimer, kernel-hardening,
	linux-api, linux-arm-kernel, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, oleg, x86, libffi-discuss,
	luto, David.Laight, mark.rutland, mic, pavel

On Tue, Sep 22, 2020 at 09:46:16PM -0400, Arvind Sankar wrote:
> On Thu, Sep 17, 2020 at 10:36:02AM -0500, Madhavan T. Venkataraman wrote:
> > 
> > 
> > On 9/16/20 8:04 PM, Florian Weimer wrote:
> > > * madvenka:
> > > 
> > >> Examples of trampolines
> > >> =======================
> > >>
> > >> libffi (A Portable Foreign Function Interface Library):
> > >>
> > >> libffi allows a user to define functions with an arbitrary list of
> > >> arguments and return value through a feature called "Closures".
> > >> Closures use trampolines to jump to ABI handlers that handle calling
> > >> conventions and call a target function. libffi is used by a lot
> > >> of different applications. To name a few:
> > >>
> > >> 	- Python
> > >> 	- Java
> > >> 	- Javascript
> > >> 	- Ruby FFI
> > >> 	- Lisp
> > >> 	- Objective C
> > > 
> > > libffi does not actually need this.  It currently collocates
> > > trampolines and the data they need on the same page, but that's
> > > actually unnecessary.  It's possible to avoid doing this just by
> > > changing libffi, without any kernel changes.
> > > 
> > > I think this has already been done for the iOS port.
> > > 
> > 
> > The trampoline table that has been implemented for the iOS port (MACH)
> > is based on PC-relative data referencing. That is, the code and data
> > are placed in adjacent pages so that the code can access the data using
> > an address relative to the current PC.
> > 
> > This is an ISA feature that is not supported on all architectures.
> > 
> > Now, if it is a performance feature, we can include some architectures
> > and exclude others. But this is a security feature. IMO, we cannot
> > exclude any architecture even if it is a legacy one as long as Linux
> > is running on the architecture. So, we need a solution that does
> > not assume any specific ISA feature.
> 
> Which ISA does not support PIC objects? You mentioned i386 below, but
> i386 does support them, it just needs to copy the PC into a GPR first
> (see below).
> 
> > 
> > >> The code for trampoline X in the trampoline table is:
> > >>
> > >> 	load	&code_table[X], code_reg
> > >> 	load	(code_reg), code_reg
> > >> 	load	&data_table[X], data_reg
> > >> 	load	(data_reg), data_reg
> > >> 	jump	code_reg
> > >>
> > >> The addresses &code_table[X] and &data_table[X] are baked into the
> > >> trampoline code. So, PC-relative data references are not needed. The user
> > >> can modify code_table[X] and data_table[X] dynamically.
> > > 
> > > You can put this code into the libffi shared object and map it from
> > > there, just like the rest of the libffi code.  To get more
> > > trampolines, you can map the page containing the trampolines multiple
> > > times, each instance preceded by a separate data page with the control
> > > information.
> > > 
> > 
> > If you put the code in the libffi shared object, how do you pass data to
> > the code at runtime? If the code we are talking about is a function, then
> > there is an ABI defined way to pass data to the function. But if the
> > code we are talking about is some arbitrary code such as a trampoline,
> > there is no ABI defined way to pass data to it except in a couple of
> > platforms such as HP PA-RISC that have support for function descriptors
> > in the ABI itself.
> > 
> > As mentioned before, if the ISA supports PC-relative data references
> > (e.g., X86 64-bit platforms support RIP-relative data references)
> > then we can pass data to that code by placing the code and data in
> > adjacent pages. So, you can implement the trampoline table for X64.
> > i386 does not support it.
> > 
> 
> i386 just needs a tiny bit of code to copy the PC into a GPR first, i.e.
> the trampoline would be:
> 
> 	call	1f
> 1:	pop	%data_reg
> 	movl	(code_table + X - 1b)(%data_reg), %code_reg
> 	movl	(data_table + X - 1b)(%data_reg), %data_reg
> 	jmp	*%code_reg
> 
> I do not understand the point about passing data at runtime. This
> trampoline is to achieve exactly that, no? 
> 
> Thanks.

For libffi, I think the proposed standard trampoline won't actually
work, because not all ABIs have two scratch registers available to use
as code_reg and data_reg. E.g., i386 fastcall has only one, and the
register calling convention has none. I believe 32-bit ARM has only one
scratch register as well.

For i386 you'd need something that saves a register on the stack first,
maybe like the below with a 16-byte trampoline and a 16-byte context
structure that has the address of the code to jump to in the first
dword:

	.balign 4096
	trampoline_page:

	.rept	4096/16-1
	0:	endbr32
		push	%eax
		call	__x86.get_pc_thunk.ax
	1:	jmp	trampoline
	.balign 16
	.endr

	.org trampoline_page + 4096 - 16
	__x86.get_pc_thunk.ax:
		movl	(%esp), %eax
		ret
	trampoline:
		subl	$(1b-0b), %eax
		jmp	*(table-trampoline_page)(%eax)

	.org trampoline_page + 4096
	table:
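
The address arithmetic implied by the layout above can be sketched in C.
This is only an illustration under the sizes used in the assembly (a
4096-byte page and 16-byte slots); the names tramp_entry, tramp_context
and TRAMP_COUNT are hypothetical and are not part of libffi or of the
patch set.

```c
#include <stdint.h>

#define TRAMP_PAGE_SIZE 4096u
#define TRAMP_SLOT_SIZE 16u
/* The last 16-byte slot holds the PC thunk and the shared tail, so it is
 * not usable as a trampoline (this mirrors ".rept 4096/16-1"). */
#define TRAMP_COUNT (TRAMP_PAGE_SIZE / TRAMP_SLOT_SIZE - 1u)

/* Entry point of trampoline i inside the code page, or 0 if i is out of
 * range.  Hypothetical helper, for illustration only. */
uintptr_t tramp_entry(uintptr_t code_page, unsigned i)
{
    if (i >= TRAMP_COUNT)
        return 0;
    return code_page + (uintptr_t)i * TRAMP_SLOT_SIZE;
}

/* Context slot that the shared tail dereferences for a given entry point:
 * "jmp *(table-trampoline_page)(%eax)" reads the dword exactly one page
 * above the trampoline's own address. */
uintptr_t tramp_context(uintptr_t entry)
{
    return entry + TRAMP_PAGE_SIZE;
}
```

With a code page at 0x10000, trampoline 3 starts at 0x10030 and its
context sits at 0x11030. As the assembly is written, the saved %eax is
still on the stack when the indirect jump is taken, so the jump target
would presumably have to pop it.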


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  8:14   ` Pavel Machek
@ 2020-09-23  9:14     ` Solar Designer
  2020-09-23 14:11       ` Solar Designer
                         ` (2 more replies)
  2020-09-23 18:10     ` James Morris
  2020-09-23 18:32     ` Madhavan T. Venkataraman
  2 siblings, 3 replies; 50+ messages in thread
From: Solar Designer @ 2020-09-23  9:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > Introduction
> > ============
> > 
> > Dynamic code is used in many different user applications. Dynamic code is
> > often generated at runtime. Dynamic code can also just be a pre-defined
> > sequence of machine instructions in a data buffer. Examples of dynamic
> > code are trampolines, JIT code, DBT code, etc.
> > 
> > Dynamic code is placed either in a data page or in a stack page. In order
> > to execute dynamic code, the page it resides in needs to be mapped with
> > execute permissions. Writable pages with execute permissions provide an
> > attack surface for hackers. Attackers can use this to inject malicious
> > code, modify existing code or do other harm.
> > 
> > To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> > allow pages to have both write and execute permissions. This prevents
> > dynamic code from executing and blocks applications that use it. To allow
> > genuine applications to run, exceptions have to be made for them (by setting
> > execmem, etc) which opens the door to security issues.
> > 
> > The W^X implementation today is not complete. There exist many user level
> > tricks that can be used to load and execute dynamic code. E.g.,
> > 
> > - Load the code into a file and map the file with R-X.
> > 
> > - Load the code in an RW- page. Change the permissions to R--. Then,
> >   change the permissions to R-X.
> > 
> > - Load the code in an RW- page. Remap the page with R-X to get a separate
> >   mapping to the same underlying physical page.
> > 
> > IMO, these are all security holes as an attacker can exploit them to inject
> > his own code.
> 
> IMO, you are smoking crack^H^H very seriously misunderstanding what
> W^X is supposed to protect from.
> 
> W^X is not supposed to protect you from attackers that can already do
> system calls. So loading code into a file then mapping the file as R-X
> is in no way security hole in W^X.
> 
> If you want to provide protection from attackers that _can_ do system
> calls, fine, but please don't talk about W^X and please specify what
> types of attacks you want to prevent and why that's good thing.

On one hand, Pavel is absolutely right.  It is ridiculous to say that
"these are all security holes as an attacker can exploit them to inject
his own code."

On the other hand, "what W^X is supposed to protect from" depends on how
the term W^X is defined (historically, by PaX and OpenBSD).  It may be
that W^X is partially not a feature to defeat attacks per se, but also a
policy enforcement feature preventing use of dangerous techniques (JIT).

Such policy might or might not make sense.  It might make sense for ease
of reasoning, e.g. "I've flipped this setting, and now I'm certain the
system doesn't have JIT within a process (can still have it through
dynamically creating and invoking an entire new program), so there are
no opportunities for an attacker to inject code nor generate previously
non-existing ROP gadgets into an executable mapping within a process."

I do find it questionable whether such policy and such reasoning make
sense beyond academia.

Then, there might be even more ways in which W^X is not perfect enough
to enable such reasoning.  What about using ptrace(2) to inject code?
Should enabling W^X also disable ability to debug programs by non-root?
We already have Yama ptrace_scope, which can achieve that at the highest
setting, although that's rather inconvenient and is probably unexpected
by most to be a requirement for having (ridiculously?) full W^X allowing
for the academic reasoning.

Personally, I am for policies that make more practical sense.  For
example, years ago I advocated here on kernel-hardening that we should
have a mode where ELF flags enabling/disabling executable stack are
ignored, and non-executable stack is always enforced.  This should also
be extended to default (at program startup) permissions on more than
just stack (but also on .bss, typical libcs' heap allocations, etc.)
However, I am not convinced there's enough value in extending the policy
to restricting explicit uses of mprotect(2).

Yes, PaX did that, and its emutramp.txt said "runtime code generation is
by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
features, therefore the real solution is not in emulation but by
designing a kernel API for runtime code generation and modifying
userland to make use of it."  However, not being convinced in the
MPROTECT feature having enough practical value, I am also not convinced
"a kernel API for runtime code generation and modifying userland to make
use of it" is the way to go.

Having static instead of dynamically-generated trampolines in userland
code where possible (and making other userland/ABI changes to make that
possible in more/all cases) is an obvious improvement, and IMO should be
a priority over the above.

While I share my opinion here, I don't mean that to block Madhavan's
work.  I'd rather defer to people more knowledgeable in current userland
and ABI issues/limitations and plans on dealing with those, especially
to Florian Weimer.  I haven't seen Florian say anything specific for or
against Madhavan's proposal, and I'd like to.  (Have I missed that?)
It'd be wrong to introduce a kernel API that userland doesn't need, and
it'd be right to introduce one that userland actually intends to use.

I've also added Rich Felker to CC here, for musl libc and its possible
intent to use the proposed API.  (My guess is there's no such need, and
thus no intent, but Rich might want to confirm that or correct me.)

Alexander


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  9:14     ` Solar Designer
@ 2020-09-23 14:11       ` Solar Designer
  2020-09-23 15:18         ` Pavel Machek
  2020-09-23 14:39       ` Florian Weimer
  2020-09-23 19:41       ` Madhavan T. Venkataraman
  2 siblings, 1 reply; 50+ messages in thread
From: Solar Designer @ 2020-09-23 14:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 11:14:56AM +0200, Solar Designer wrote:
> On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > > Introduction
> > > ============
> > > 
> > > Dynamic code is used in many different user applications. Dynamic code is
> > > often generated at runtime. Dynamic code can also just be a pre-defined
> > > sequence of machine instructions in a data buffer. Examples of dynamic
> > > code are trampolines, JIT code, DBT code, etc.
> > > 
> > > Dynamic code is placed either in a data page or in a stack page. In order
> > > to execute dynamic code, the page it resides in needs to be mapped with
> > > execute permissions. Writable pages with execute permissions provide an
> > > attack surface for hackers. Attackers can use this to inject malicious
> > > code, modify existing code or do other harm.
> > > 
> > > To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> > > allow pages to have both write and execute permissions. This prevents
> > > dynamic code from executing and blocks applications that use it. To allow
> > > genuine applications to run, exceptions have to be made for them (by setting
> > > execmem, etc) which opens the door to security issues.
> > > 
> > > The W^X implementation today is not complete. There exist many user level
> > > tricks that can be used to load and execute dynamic code. E.g.,
> > > 
> > > - Load the code into a file and map the file with R-X.
> > > 
> > > - Load the code in an RW- page. Change the permissions to R--. Then,
> > >   change the permissions to R-X.
> > > 
> > > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > >   mapping to the same underlying physical page.
> > > 
> > > IMO, these are all security holes as an attacker can exploit them to inject
> > > his own code.
> > 
> > IMO, you are smoking crack^H^H very seriously misunderstanding what
> > W^X is supposed to protect from.
> > 
> > W^X is not supposed to protect you from attackers that can already do
> > system calls. So loading code into a file then mapping the file as R-X
> > is in no way security hole in W^X.
> > 
> > If you want to provide protection from attackers that _can_ do system
> > calls, fine, but please don't talk about W^X and please specify what
> > types of attacks you want to prevent and why that's good thing.
> 
> On one hand, Pavel is absolutely right.  It is ridiculous to say that
> "these are all security holes as an attacker can exploit them to inject
> his own code."

I stand corrected, due to Brad's tweet and follow-ups here:

https://twitter.com/spendergrsec/status/1308728284390318082

It sure does make sense to combine ret2libc/ROP to mprotect() with one's
own injected shellcode.  Compared to doing everything from ROP, this is
easier and more reliable across versions/builds if the desired payload
is non-trivial.  My own example: invoking a shell in a local attack on
Linux is trivial enough to do via ret2libc only, but a connect-back
shell in a remote attack might be easier and more reliably done via
mprotect() + shellcode.

Per the follow-ups, this was an established technique on Windows and iOS
until further hardening prevented it.  So it does make sense for Linux
to do the same (as an option because of it breaking existing stuff), and
not so much as policy enforcement for the sake of it and ease of
reasoning, but mostly to force real-world exploits to be more complex
and less reliable.

> On the other hand, "what W^X is supposed to protect from" depends on how
> the term W^X is defined (historically, by PaX and OpenBSD).  It may be
> that W^X is partially not a feature to defeat attacks per se, but also a
> policy enforcement feature preventing use of dangerous techniques (JIT).
> 
> Such policy might or might not make sense.  It might make sense for ease
> of reasoning, e.g. "I've flipped this setting, and now I'm certain the
> system doesn't have JIT within a process (can still have it through
> dynamically creating and invoking an entire new program), so there are
> no opportunities for an attacker to inject code nor generate previously
> non-existing ROP gadgets into an executable mapping within a process."
> 
> I do find it questionable whether such policy and such reasoning make
> sense beyond academia.

I was wrong in the above, focusing on the wrong thing.

> Then, there might be even more ways in which W^X is not perfect enough
> to enable such reasoning.  What about using ptrace(2) to inject code?
> Should enabling W^X also disable ability to debug programs by non-root?
> We already have Yama ptrace_scope, which can achieve that at the highest
> setting, although that's rather inconvenient and is probably unexpected
> by most to be a requirement for having (ridiculously?) full W^X allowing
> for the academic reasoning.

Thinking out loud:

Technically, ptrace() is also usable from a ROP chain.  It might be too
cumbersome to bother using to get a shellcode going, but OTOH it's just
one function to be invoked in a similar fashion multiple times, so might
be more reliable than having a ROP chain depend on multiple actually
needed functions directly (moving that dependency into the shellcode).

> Personally, I am for policies that make more practical sense.  For
> example, years ago I advocated here on kernel-hardening that we should
> have a mode where ELF flags enabling/disabling executable stack are
> ignored, and non-executable stack is always enforced.  This should also
> be extended to default (at program startup) permissions on more than
> just stack (but also on .bss, typical libcs' heap allocations, etc.)
> However, I am not convinced there's enough value in extending the policy
> to restricting explicit uses of mprotect(2).
> 
> Yes, PaX did that, and its emutramp.txt said "runtime code generation is
> by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
> features, therefore the real solution is not in emulation but by
> designing a kernel API for runtime code generation and modifying
> userland to make use of it."  However, not being convinced in the
> MPROTECT feature having enough practical value,

I am convinced now, however:

> I am also not convinced
> "a kernel API for runtime code generation and modifying userland to make
> use of it" is the way to go.

doesn't automatically follow from the above, because:

> Having static instead of dynamically-generated trampolines in userland
> code where possible (and making other userland/ABI changes to make that
> possible in more/all cases) is an obvious improvement, and IMO should be
> a priority over the above.
> 
> While I share my opinion here, I don't mean that to block Madhavan's
> work.  I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer.  I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to.  (Have I missed that?)
> It'd be wrong to introduce a kernel API that userland doesn't need, and
> it'd be right to introduce one that userland actually intends to use.
> 
> I've also added Rich Felker to CC here, for musl libc and its possible
> intent to use the proposed API.  (My guess is there's no such need, and
> thus no intent, but Rich might want to confirm that or correct me.)

So we need to hear more from the userland folks, I guess.

Alexander


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  9:14     ` Solar Designer
  2020-09-23 14:11       ` Solar Designer
@ 2020-09-23 14:39       ` Florian Weimer
  2020-09-23 18:09         ` Andy Lutomirski
                           ` (2 more replies)
  2020-09-23 19:41       ` Madhavan T. Venkataraman
  2 siblings, 3 replies; 50+ messages in thread
From: Florian Weimer @ 2020-09-23 14:39 UTC (permalink / raw)
  To: Solar Designer
  Cc: Pavel Machek, madvenka, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight,
	mark.rutland, mic, Rich Felker

* Solar Designer:

> While I share my opinion here, I don't mean that to block Madhavan's
> work.  I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer.  I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to.  (Have I missed that?)

There was a previous discussion, where I provided feedback (not much
different from the feedback here, given that the mechanism is mostly the
same).

I think it's unnecessary for the libffi use case.  Precompiled code can
be loaded from disk because the libffi trampolines are so regular.  On
most architectures, it's not even the code that's patched, but some of
the data driving it, which happens to be located on the same page due to
a libffi quirk.

The libffi use case is a bit strange anyway: its trampolines are
type-generic, and the per-call adjustment is data-driven.  This means
that once you have libffi in the process, you have a generic
data-to-function-call mechanism available that can be abused (it's even
fully CET compatible in recent versions).  And then you need to look at
the processes that use libffi.  A lot of them contain bytecode
interpreters, and those enable data-driven arbitrary code execution as
well.  I know that there are efforts under way to harden Python, but
it's going to be tough to get to the point where things are still
difficult for an attacker once they have the ability to make mprotect
calls.

It was pointed out to me that libffi is doing things wrong, and the
trampolines should not be type-generic, but generated so that they match
the function being called.  That is, the marshal/unmarshal code would be
open-coded in the trampoline, rather than using some generic mechanism
plus run-time dispatch on data tables describing the function type.
That is a very different design (and typically used by compilers (JIT or
not JIT) to implement native calls).  Mapping some code page with a
repeating pattern would no longer work to defeat anti-JIT measures
because it's closer to real JIT.  I don't know if kernel support could
make sense in this context, but it would be a completely different
patch.
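
To illustrate what "type-generic, data-driven" means here, a minimal C
sketch follows. It is not libffi's implementation or API; the names
call_desc, sig_kind and generic_dispatch are hypothetical. One generic
entry point inspects a data record describing the callee and performs
the call accordingly, so whoever controls the record controls which
function is called and how.

```c
/* Hypothetical illustration of a data-driven call mechanism. */
typedef void (*generic_fn)(void);

enum sig_kind { SIG_UNARY, SIG_BINARY };

struct call_desc {
    enum sig_kind kind;   /* describes the callee's signature */
    generic_fn target;    /* the function to call */
};

/* One generic handler for every callee: the per-call behavior comes
 * entirely from the data in *d, not from generated code. */
long generic_dispatch(const struct call_desc *d, const long *args)
{
    switch (d->kind) {
    case SIG_UNARY:
        return ((long (*)(long))d->target)(args[0]);
    case SIG_BINARY:
        return ((long (*)(long, long))d->target)(args[0], args[1]);
    }
    return 0;
}

/* Two sample callees with different signatures. */
long sample_neg(long a) { return -a; }
long sample_add(long a, long b) { return a + b; }
```

The open-coded alternative described above would instead emit a
dedicated stub per signature, with the argument marshalling fixed in the
stub's code rather than selected at run time from a descriptor.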

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 14:11       ` Solar Designer
@ 2020-09-23 15:18         ` Pavel Machek
  2020-09-23 18:00           ` Solar Designer
  0 siblings, 1 reply; 50+ messages in thread
From: Pavel Machek @ 2020-09-23 15:18 UTC (permalink / raw)
  To: Solar Designer
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic, Rich Felker


Hi!

> > > > The W^X implementation today is not complete. There exist many user level
> > > > tricks that can be used to load and execute dynamic code. E.g.,
> > > > 
> > > > - Load the code into a file and map the file with R-X.
> > > > 
> > > > - Load the code in an RW- page. Change the permissions to R--. Then,
> > > >   change the permissions to R-X.
> > > > 
> > > > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > > >   mapping to the same underlying physical page.
> > > > 
> > > > IMO, these are all security holes as an attacker can exploit them to inject
> > > > his own code.
> > > 
> > > IMO, you are smoking crack^H^H very seriously misunderstanding what
> > > W^X is supposed to protect from.
> > > 
> > > W^X is not supposed to protect you from attackers that can already do
> > > system calls. So loading code into a file then mapping the file as R-X
> > > is in no way security hole in W^X.
> > > 
> > > If you want to provide protection from attackers that _can_ do system
> > > calls, fine, but please don't talk about W^X and please specify what
> > > types of attacks you want to prevent and why that's good thing.
> > 
> > On one hand, Pavel is absolutely right.  It is ridiculous to say that
> > "these are all security holes as an attacker can exploit them to inject
> > his own code."
> 
> I stand corrected, due to Brad's tweet and follow-ups here:
> 
> https://twitter.com/spendergrsec/status/1308728284390318082
> 
> It sure does make sense to combine ret2libc/ROP to mprotect() with one's
> own injected shellcode.  Compared to doing everything from ROP, this is
> easier and more reliable across versions/builds if the desired
> payload

Ok, so this starts to be a bit confusing.

I thought W^X is to protect from attackers that have overflowed a buffer
somewhere, but cannot do arbitrary syscalls yet.

You are saying that there's an important class of attackers that can do
some syscalls but not arbitrary ones.

I'd like to see a definition of that attacker (and perhaps a description
of the system the protection is expected to be useful on -- if it is
not close to common Linux distros).

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 15:18         ` Pavel Machek
@ 2020-09-23 18:00           ` Solar Designer
  2020-09-23 18:21             ` Solar Designer
  0 siblings, 1 reply; 50+ messages in thread
From: Solar Designer @ 2020-09-23 18:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 05:18:35PM +0200, Pavel Machek wrote:
> > It sure does make sense to combine ret2libc/ROP to mprotect() with one's
> > own injected shellcode.  Compared to doing everything from ROP, this is
> > easier and more reliable across versions/builds if the desired
> > payload
> 
> Ok, so this starts to be a bit confusing.
> 
> I thought W^X is to protect from attackers that have overflowed buffer
> somewhere, but can not to do arbitrary syscalls, yet.
> 
> You are saying that there's important class of attackers that can do
> some syscalls but not arbitrary ones.

They might be able to do many, most, or all arbitrary syscalls via
ret2libc or such.  The crucial detail is that each time they do that,
they risk incompatibility with the given target system (version, build,
maybe ASLR if gadgets from multiple libraries are involved).  By using
mprotect(), they only take this risk once (need to get the address of an
mprotect() gadget and of what to change protections on right), and then
they can invoke multiple syscalls from their shellcode more reliably.
So for doing a lot of work, mprotect() combined with injected code can
be easier and more reliable.  It is also an extra option an attacker can
use, in addition to doing everything via borrowed code.  More
flexibility for the attacker means the attacker may choose whichever
approach works better in a given case (or try several).
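
For readers following along, the RW- to R-X transition under discussion
can be sketched in C as below.  This is only an illustration under stated
assumptions: the helper name load_then_make_executable() is made up for
this note, the bytes are never executed, and the claim that PaX MPROTECT
or SELinux execmem would refuse the mprotect() call reflects my reading of
this thread rather than a tested result.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS with strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: copy `len` bytes into a fresh anonymous mapping
 * while it is writable, then drop write permission and add execute
 * permission.  The mapping is never writable and executable at the same
 * time.  Returns the mapping, or NULL if mmap() or mprotect() fails.
 */
static void *load_then_make_executable(const void *bytes, size_t len)
{
	void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return NULL;

	memcpy(page, bytes, len);

	/*
	 * This single call is the step that a policy restricting
	 * mprotect() is meant to refuse (assumption, see above).
	 */
	if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(page, len);
		return NULL;
	}
	return page;
}
```

The point of the sketch is that only one call needs to succeed; everything
after it can run from the newly executable page instead of from borrowed
code.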

I am embarrassed for not thinking/recalling this when I first posted
earlier today.  It's actually obvious.  I'm just getting old and rusty.

> I'd like to see definition of that attacker (and perhaps description
> of the system the protection is expected to be useful on -- if it is
> not close to common Linux distros).

There's nothing unusual about that attacker and the system.

A couple of other things Brad kindly pointed out:

SELinux already has similar protections (execmem, execmod):

http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
https://danwalsh.livejournal.com/6117.html

PaX MPROTECT is implemented in a way or at a layer that covers ptrace()
abuse that I mentioned.  (At least that's how I understood Brad.)

Alexander

P.S. Meanwhile, Twitter locked my account "for security purposes".  Fun.
I'll just let it be for now.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 14:39       ` Florian Weimer
@ 2020-09-23 18:09         ` Andy Lutomirski
  2020-09-23 18:11         ` Solar Designer
  2020-09-23 23:53         ` Madhavan T. Venkataraman
  2 siblings, 0 replies; 50+ messages in thread
From: Andy Lutomirski @ 2020-09-23 18:09 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Solar Designer, Pavel Machek, Madhavan T. Venkataraman,
	Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel,
	linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML,
	Andrew Lutomirski, David Laight, Mark Rutland,
	Mickaël Salaün, Rich Felker

On Wed, Sep 23, 2020 at 7:39 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Solar Designer:
>
> > While I share my opinion here, I don't mean that to block Madhavan's
> > work.  I'd rather defer to people more knowledgeable in current userland
> > and ABI issues/limitations and plans on dealing with those, especially
> > to Florian Weimer.  I haven't seen Florian say anything specific for or
> > against Madhavan's proposal, and I'd like to.  (Have I missed that?)
>
> There was a previous discussion, where I provided feedback (not much
> different from the feedback here, given that the mechanism is mostly the
> same).
>
> I think it's unnecessary for the libffi use case.  Precompiled code can
> be loaded from disk because the libffi trampolines are so regular.  On
> most architectures, it's not even the code that's patched, but some of
> the data driving it, which happens to be located on the same page due to
> a libffi quirk.
>
> The libffi use case is a bit strange anyway: its trampolines are
> type-generic, and the per-call adjustment is data-driven.  This means
> that once you have libffi in the process, you have a generic
> data-to-function-call mechanism available that can be abused (it's even
> fully CET compatible in recent versions).  And then you need to look at
> the processes that use libffi.  A lot of them contain bytecode
> interpreters, and those enable data-driven arbitrary code execution as
> well.  I know that there are efforts under way to harden Python, but
> it's going to be tough to get to the point where things are still
> difficult for an attacker once they have the ability to make mprotect
> calls.
>
> It was pointed out to me that libffi is doing things wrong, and the
> trampolines should not be type-generic, but generated so that they match
> the function being called.  That is, the marshal/unmarshal code would be
> open-coded in the trampoline, rather than using some generic mechanism
> plus run-time dispatch on data tables describing the function type.
> That is a very different design (and typically used by compilers (JIT or
> not JIT) to implement native calls).  Mapping some code page with a
> repeating pattern would no longer work to defeat anti-JIT measures
> because it's closer to real JIT.  I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.

I would very much like to see a well-designed kernel facility for
helping userspace do JIT in a safer manner, but designing such a thing
is likely to be distinctly nontrivial.  To throw a half-baked idea
out there, suppose a program could pre-declare a list of JIT
verifiers:

static bool ffi_trampoline_verifier(void *target_address, size_t
target_size, void *source_data, void *context);

struct jit_verifier {
  .magic = 0xMAGIC_HERE,
  .verifier = ffi_trampoline_verifier,
} my_verifier __attribute__((section("something special here?")));

and then a system call something like:

instantiate_jit_code(target, source, size, &my_verifier, context);

The idea being that even an attacker that can force a call to
instantiate_jit_code() can only create code that passes verification
by one of the pre-declared verifiers in the process.
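
A userspace mock of this idea might look like the sketch below.  It is not
a kernel interface: the magic value, the declared_verifiers[] table (a
stand-in for the special section), the example nop_only_verifier() and the
0/-1 return convention are all assumptions made up for illustration, and
the "instantiation" is just a memcpy() into an ordinary buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Verifier signature, following the prototype in the message above. */
typedef bool (*jit_verifier_fn)(void *target_address, size_t target_size,
				void *source_data, void *context);

struct jit_verifier {
	unsigned long magic;		/* placeholder; real value unspecified */
	jit_verifier_fn verifier;
};

#define JIT_VERIFIER_MAGIC 0x4a495456UL	/* made up for this sketch */

/* Example verifier: accept only buffers made entirely of 0x90 bytes. */
static bool nop_only_verifier(void *target_address, size_t target_size,
			      void *source_data, void *context)
{
	const unsigned char *src = source_data;

	(void)target_address;
	(void)context;
	for (size_t i = 0; i < target_size; i++)
		if (src[i] != 0x90)
			return false;
	return true;
}

/* Stand-in for the special section: the fixed, pre-declared verifiers. */
static const struct jit_verifier declared_verifiers[] = {
	{ JIT_VERIFIER_MAGIC, nop_only_verifier },
};

/* Mock of the proposed call: 0 on success, -1 if the request is refused. */
static int instantiate_jit_code(void *target, void *source, size_t size,
				const struct jit_verifier *v, void *context)
{
	size_t n = sizeof(declared_verifiers) / sizeof(declared_verifiers[0]);
	bool declared = false;

	/* Only verifiers that were declared up front are honoured. */
	for (size_t i = 0; i < n; i++)
		if (v == &declared_verifiers[i])
			declared = true;
	if (!declared || v->magic != JIT_VERIFIER_MAGIC)
		return -1;

	/* The code is copied only if the chosen verifier accepts it. */
	if (!v->verifier(target, size, source, context))
		return -1;
	memcpy(target, source, size);
	return 0;
}
```

In this mock, a caller that passes a verifier structure built at run time
is refused even if it points at a legitimate verifier function, which is
the property the pre-declaration is meant to provide.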


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  8:14   ` Pavel Machek
  2020-09-23  9:14     ` Solar Designer
@ 2020-09-23 18:10     ` James Morris
  2020-09-23 18:32     ` Madhavan T. Venkataraman
  2 siblings, 0 replies; 50+ messages in thread
From: James Morris @ 2020-09-23 18:10 UTC (permalink / raw)
  To: Pavel Machek
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic

On Wed, 23 Sep 2020, Pavel Machek wrote:

> This is not first crazy patch from your company. Perhaps you should
> have a person with strong Unix/Linux experience performing "straight
> face test" on outgoing patches?

Just for the record: the author of the code has 30+ years of experience in
SunOS, Solaris, UnixWare, Realtime, SVR4, and Linux.


-- 
James Morris
<jmorris@namei.org>



* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 14:39       ` Florian Weimer
  2020-09-23 18:09         ` Andy Lutomirski
@ 2020-09-23 18:11         ` Solar Designer
  2020-09-23 18:49           ` Arvind Sankar
  2020-09-23 23:53         ` Madhavan T. Venkataraman
  2 siblings, 1 reply; 50+ messages in thread
From: Solar Designer @ 2020-09-23 18:11 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Pavel Machek, madvenka, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight,
	mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 04:39:31PM +0200, Florian Weimer wrote:
> * Solar Designer:
> 
> > While I share my opinion here, I don't mean that to block Madhavan's
> > work.  I'd rather defer to people more knowledgeable in current userland
> > and ABI issues/limitations and plans on dealing with those, especially
> > to Florian Weimer.  I haven't seen Florian say anything specific for or
> > against Madhavan's proposal, and I'd like to.  (Have I missed that?)

[...]
> I think it's unnecessary for the libffi use case.
[...]

> I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.

Thanks.  Are there currently relevant use cases where the proposed
trampfd would be useful and likely actually made use of by userland -
e.g., specific userland project developers saying they'd use it, or
Madhavan intending to develop and contribute userland patches?

Alexander


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 18:00           ` Solar Designer
@ 2020-09-23 18:21             ` Solar Designer
  0 siblings, 0 replies; 50+ messages in thread
From: Solar Designer @ 2020-09-23 18:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: madvenka, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 08:00:07PM +0200, Solar Designer wrote:
> A couple of other things Brad kindly pointed out:
> 
> SELinux already has similar protections (execmem, execmod):
> 
> http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
> https://danwalsh.livejournal.com/6117.html

Actually, that's right in Madhavan's "Introduction": "LSMs such as
SELinux implement W^X" and "The W^X implementation today is not
complete."  I'm sorry I jumped into this thread out of context.

Alexander


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  8:14   ` Pavel Machek
  2020-09-23  9:14     ` Solar Designer
  2020-09-23 18:10     ` James Morris
@ 2020-09-23 18:32     ` Madhavan T. Venkataraman
  2 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 18:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic

...
>> The W^X implementation today is not complete. There exist many user level
>> tricks that can be used to load and execute dynamic code. E.g.,
>>
>> - Load the code into a file and map the file with R-X.
>>
>> - Load the code in an RW- page. Change the permissions to R--. Then,
>>   change the permissions to R-X.
>>
>> - Load the code in an RW- page. Remap the page with R-X to get a separate
>>   mapping to the same underlying physical page.
>>
>> IMO, these are all security holes as an attacker can exploit them to inject
>> his own code.
> 
> IMO, you are smoking crack^H^H very seriously misunderstanding what
> W^X is supposed to protect from.
> 
> W^X is not supposed to protect you from attackers that can already do
> system calls. So loading code into a file then mapping the file as R-X
> is in no way security hole in W^X.
> 
> If you want to provide protection from attackers that _can_ do system
> calls, fine, but please don't talk about W^X and please specify what
> types of attacks you want to prevent and why that's good thing.
> 


There are two things here - the idea behind W^X and the current realization
of that idea in actual implementation. The idea behind W^X, as I understand it,
is to prevent a user from loading arbitrary code into a page and getting it
to execute. If the user code contains a vulnerability, an attacker can
exploit it to potentially inject his own code and get it to execute. This
cannot be denied.

From that perspective, all of the above tricks I have mentioned are tricks
that user code can use to load arbitrary code into a page and get it to
execute.

Now, I don't want the discussion to be stuck on a mere name. If what I am
suggesting needs a name other than "W^X" in the opinion of the reviewers,
that is fine with me. But I don't believe there is any disagreement that
the above user tricks are security holes.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 18:11         ` Solar Designer
@ 2020-09-23 18:49           ` Arvind Sankar
  0 siblings, 0 replies; 50+ messages in thread
From: Arvind Sankar @ 2020-09-23 18:49 UTC (permalink / raw)
  To: Solar Designer
  Cc: Florian Weimer, Pavel Machek, madvenka, kernel-hardening,
	linux-api, linux-arm-kernel, linux-fsdevel, linux-integrity,
	linux-kernel, linux-security-module, oleg, x86, luto,
	David.Laight, mark.rutland, mic, Rich Felker

On Wed, Sep 23, 2020 at 08:11:36PM +0200, Solar Designer wrote:
> On Wed, Sep 23, 2020 at 04:39:31PM +0200, Florian Weimer wrote:
> > * Solar Designer:
> > 
> > > While I share my opinion here, I don't mean that to block Madhavan's
> > > work.  I'd rather defer to people more knowledgeable in current userland
> > > and ABI issues/limitations and plans on dealing with those, especially
> > > to Florian Weimer.  I haven't seen Florian say anything specific for or
> > > against Madhavan's proposal, and I'd like to.  (Have I missed that?)
> 
> [...]
> > I think it's unnecessary for the libffi use case.
> [...]
> 
> > I don't know if kernel support could
> > make sense in this context, but it would be a completely different
> > patch.
> 
> Thanks.  Are there currently relevant use cases where the proposed
> trampfd would be useful and likely actually made use of by userland -
> e.g., specific userland project developers saying they'd use it, or
> Madhavan intending to develop and contribute userland patches?
> 
> Alexander

The trampoline it provides in this version can be implemented completely
in userspace. The kernel part of it is essentially just providing a way
to do text relocations without needing a WX mapping, but the text
relocations would be unnecessary in the first place if the trampoline
was position-independent code.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  8:42   ` Pavel Machek
@ 2020-09-23 18:56     ` Madhavan T. Venkataraman
  2020-09-23 20:51       ` Pavel Machek
  0 siblings, 1 reply; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 18:56 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic



On 9/23/20 3:42 AM, Pavel Machek wrote:
> Hi!
> 
>> Solution proposed in this RFC
>> =============================
>>
>> From this RFC's perspective, there are two scenarios for dynamic code:
>>
>> Scenario 1
>> ----------
>>
>> We know what code we need only at runtime. For instance, JIT code generated
>> for frequently executed Java methods. Only at runtime do we know what
>> methods need to be JIT compiled. Such code cannot be statically defined. It
>> has to be generated at runtime.
>>
>> Scenario 2
>> ----------
>>
>> We know what code we need in advance. User trampolines are a good example of
>> this. It is possible to define such code statically with some help from the
>> kernel.
>>
>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>> and is out of scope for this RFC.
> 
> This is slightly less crazy talk than introduction talking about holes
> in W^X. But it is very, very far from normal Unix system, where you
> have selection of interpreters to run your malware on (sh, python,
> awk, emacs, ...) and often you can even compile malware from sources. 
> 
> And as you noted, we don't have "a general purpose trusted code
> generator" for our systems.
> 
> I believe you should simply delete confusing "introduction" and
> provide details of super-secure system where your patches would be
> useful, instead.
> 
> Best regards,
> 									Pavel
> 

This RFC talks about converting dynamic code (which cannot be authenticated)
to static code that can be authenticated using signature verification. That
is the scope of this RFC.

If I have not been clear before, by dynamic code, I mean machine code that is
dynamic in nature. Scripts are beyond the scope of this RFC.

Also, malware compiled from sources is not dynamic code. That is orthogonal
to this RFC. If such malware has a valid signature, so that the kernel permits
its execution, we have a systemic problem.

I am not saying that script authentication or compiled malware are not problems.
I am just saying that this RFC is not trying to solve all of the security problems.
It is trying to define one way to convert dynamic code to static code to address
one class of problems.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  9:11         ` Arvind Sankar
@ 2020-09-23 19:17           ` Madhavan T. Venkataraman
  2020-09-23 19:51             ` Arvind Sankar
  0 siblings, 1 reply; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 19:17 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/23/20 4:11 AM, Arvind Sankar wrote:
> For libffi, I think the proposed standard trampoline won't actually
> work, because not all ABIs have two scratch registers available to use
> as code_reg and data_reg. Eg i386 fastcall only has one, and register
> has zero scratch registers. I believe 32-bit ARM only has one scratch
> register as well.

The trampoline is invoked as a function call in the libffi case. Any
caller saved register can be used as code_reg, can it not? And the
scratch register is needed only to jump to the code. After that, it
can be reused for any other purpose.

However, for ARM, you are quite correct. There is only one scratch
register. This means that I have to provide two types of trampolines:

	- If an architecture has enough scratch registers, use the currently
	  defined trampoline.

	- If the architecture has only one scratch register, but has PC-relative
	  data references, then embed the code address at the bottom of the
	  trampoline and access it using PC-relative addressing.

Thanks for pointing this out.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23  9:14     ` Solar Designer
  2020-09-23 14:11       ` Solar Designer
  2020-09-23 14:39       ` Florian Weimer
@ 2020-09-23 19:41       ` Madhavan T. Venkataraman
  2 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 19:41 UTC (permalink / raw)
  To: Solar Designer, Pavel Machek
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic, Rich Felker



On 9/23/20 4:14 AM, Solar Designer wrote:
>>> The W^X implementation today is not complete. There exist many user level
>>> tricks that can be used to load and execute dynamic code. E.g.,
>>>
>>> - Load the code into a file and map the file with R-X.
>>>
>>> - Load the code in an RW- page. Change the permissions to R--. Then,
>>>   change the permissions to R-X.
>>>
>>> - Load the code in an RW- page. Remap the page with R-X to get a separate
>>>   mapping to the same underlying physical page.
>>>
>>> IMO, these are all security holes as an attacker can exploit them to inject
>>> his own code.
>> IMO, you are smoking crack^H^H very seriously misunderstanding what
>> W^X is supposed to protect from.
>>
>> W^X is not supposed to protect you from attackers that can already do
>> system calls. So loading code into a file then mapping the file as R-X
>> is in no way security hole in W^X.
>>
>> If you want to provide protection from attackers that _can_ do system
>> calls, fine, but please don't talk about W^X and please specify what
>> types of attacks you want to prevent and why that's good thing.
> On one hand, Pavel is absolutely right.  It is ridiculous to say that
> "these are all security holes as an attacker can exploit them to inject
> his own code."
> 

Why? Isn't it possible that an attacker can exploit some vulnerability such
as a buffer overflow and overwrite the buffer that contains the dynamic code?


> On the other hand, "what W^X is supposed to protect from" depends on how
> the term W^X is defined (historically, by PaX and OpenBSD).  It may be
> that W^X is partially not a feature to defeat attacks per se, but also a
> policy enforcement feature preventing use of dangerous techniques (JIT).
> 
> Such policy might or might not make sense.  It might make sense for ease
> of reasoning, e.g. "I've flipped this setting, and now I'm certain the
> system doesn't have JIT within a process (can still have it through
> dynamically creating and invoking an entire new program), so there are
> no opportunities for an attacker to inject code nor generate previously
> non-existing ROP gadgets into an executable mapping within a process."
> 
> I do find it questionable whether such policy and such reasoning make
> sense beyond academia.
> 
> Then, there might be even more ways in which W^X is not perfect enough
> to enable such reasoning.  What about using ptrace(2) to inject code?
> Should enabling W^X also disable ability to debug programs by non-root?
> We already have Yama ptrace_scope, which can achieve that at the highest
> setting, although that's rather inconvenient and is probably unexpected
> by most to be a requirement for having (ridiculously?) full W^X allowing
> for the academic reasoning.
> 

I am not suggesting that W^X be fixed. That is up to the maintainers of that
code. I am saying that if the security subsystem is enhanced in the future with
policies and settings that prevent the user tricks I mentioned, then it becomes
impossible to execute dynamic code except by making security exceptions on a
case-by-case basis.

As an alternative to making security exceptions, one could convert dynamic code
to static code which can then be authenticated.

> Personally, I am for policies that make more practical sense.  For
> example, years ago I advocated here on kernel-hardening that we should
> have a mode where ELF flags enabling/disabling executable stack are
> ignored, and non-executable stack is always enforced.  This should also
> be extended to default (at program startup) permissions on more than
> just stack (but also on .bss, typical libcs' heap allocations, etc.)
> However, I am not convinced there's enough value in extending the policy
> to restricting explicit uses of mprotect(2).
> 
> Yes, PaX did that, and its emutramp.txt said "runtime code generation is
> by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
> features, therefore the real solution is not in emulation but by
> designing a kernel API for runtime code generation and modifying
> userland to make use of it."  However, not being convinced in the
> MPROTECT feature having enough practical value, I am also not convinced
> "a kernel API for runtime code generation and modifying userland to make
> use of it" is the way to go.
> 

In a separate email, I will try to answer this and provide justification
for why it is better to do it in the kernel.

> Having static instead of dynamically-generated trampolines in userland
> code where possible (and making other userland/ABI changes to make that
> possible in more/all cases) is an obvious improvement, and IMO should be
> a priority over the above.
>

> While I share my opinion here, I don't mean that to block Madhavan's
> work.  I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer.  I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to.  (Have I missed that?)
> It'd be wrong to introduce a kernel API that userland doesn't need, and
> it'd be right to introduce one that userland actually intends to use.
> 
> I've also added Rich Felker to CC here, for musl libc and its possible
> intent to use the proposed API.  (My guess is there's no such need, and
> thus no intent, but Rich might want to confirm that or correct me.)
> 
> Alexander

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 19:17           ` Madhavan T. Venkataraman
@ 2020-09-23 19:51             ` Arvind Sankar
  2020-09-23 23:51               ` Madhavan T. Venkataraman
  2020-09-24 20:23               ` Madhavan T. Venkataraman
  0 siblings, 2 replies; 50+ messages in thread
From: Arvind Sankar @ 2020-09-23 19:51 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Arvind Sankar, Florian Weimer, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

On Wed, Sep 23, 2020 at 02:17:30PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 9/23/20 4:11 AM, Arvind Sankar wrote:
> > For libffi, I think the proposed standard trampoline won't actually
> > work, because not all ABIs have two scratch registers available to use
> > as code_reg and data_reg. Eg i386 fastcall only has one, and register
> > has zero scratch registers. I believe 32-bit ARM only has one scratch
> > register as well.
> 
> The trampoline is invoked as a function call in the libffi case. Any
> caller saved register can be used as code_reg, can it not? And the
> scratch register is needed only to jump to the code. After that, it
> can be reused for any other purpose.
> 
> However, for ARM, you are quite correct. There is only one scratch
> register. This means that I have to provide two types of trampolines:
> 
> 	- If an architecture has enough scratch registers, use the currently
> 	  defined trampoline.
> 
> 	- If the architecture has only one scratch register, but has PC-relative
> 	  data references, then embed the code address at the bottom of the
> 	  trampoline and access it using PC-relative addressing.
> 
> Thanks for pointing this out.
> 
> Madhavan

libffi is trying to provide closures with non-standard ABIs as well: the
actual user function is standard ABI, but the closure can be called with
a different ABI. If the closure was created with the FFI_REGISTER ABI, there
are no registers available for the trampoline to use: EAX, EDX and ECX
contain the first three arguments of the function, and every other
register is callee-save.

I provided a sample of the kind of trampoline that would be needed in
this case -- it's position-independent and doesn't clobber any registers
at all, and you get 255 trampolines per page. If I take another 16-byte
slot out of the page for the end trampoline that does the actual work,
I'm sure I could even come up with one that can just call a normal C
function, only the return might need special handling depending on the
return type.
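
To make the slot arithmetic concrete, here is a small C sketch.  It assumes
4096-byte pages and 16-byte slots, with the last slot of the code page
reserved for the shared tail (which gives the 255 figure), and it assumes a
data page placed directly after the code page with one 16-byte record per
trampoline.  The names and the data-page layout are hypothetical and are
not taken from the sample mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define TRAMP_PAGE_SIZE	4096u	/* assumed page size */
#define TRAMP_SLOT_SIZE	16u	/* slot size mentioned in the message */
#define TRAMP_SLOTS	(TRAMP_PAGE_SIZE / TRAMP_SLOT_SIZE)	/* 256 */
#define TRAMP_COUNT	(TRAMP_SLOTS - 1)	/* 255; last slot is the tail */

/* Index of the trampoline containing `addr`, or -1 for the tail slot. */
static int tramp_index(uintptr_t addr)
{
	unsigned int slot =
		(unsigned int)((addr % TRAMP_PAGE_SIZE) / TRAMP_SLOT_SIZE);

	return slot < TRAMP_COUNT ? (int)slot : -1;
}

/*
 * Hypothetical layout: the record for trampoline i sits in the page that
 * follows the code page, at offset i * 16.  Because that offset from the
 * code is fixed, nothing in the code page needs to be patched.
 * Returns 0 if `addr` is not inside a trampoline slot.
 */
static uintptr_t tramp_data_addr(uintptr_t addr)
{
	int idx = tramp_index(addr);
	uintptr_t code_page;

	if (idx < 0)
		return 0;
	code_page = addr - (addr % TRAMP_PAGE_SIZE);
	return code_page + TRAMP_PAGE_SIZE + (uintptr_t)idx * TRAMP_SLOT_SIZE;
}
```

With such a layout the code page can be mapped read-only and executable
from a file, and only the data page has to be writable.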

And again, do you actually have any example of an architecture that
cannot run position-independent code? PC-relative addressing is an
implementation detail: the fact that it's available for x86_64 but not
for i386 just makes position-independent code more cumbersome on i386,
but it doesn't make it impossible. For the tiny trampolines here, it
makes almost no difference.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 18:56     ` Madhavan T. Venkataraman
@ 2020-09-23 20:51       ` Pavel Machek
  2020-09-23 23:04         ` Madhavan T. Venkataraman
  2020-09-24 16:44         ` Mickaël Salaün
  0 siblings, 2 replies; 50+ messages in thread
From: Pavel Machek @ 2020-09-23 20:51 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic

Hi!

> >> Scenario 2
> >> ----------
> >>
> >> We know what code we need in advance. User trampolines are a good example of
> >> this. It is possible to define such code statically with some help from the
> >> kernel.
> >>
> >> This RFC addresses (2). (1) needs a general purpose trusted code generator
> >> and is out of scope for this RFC.
> > 
> > This is slightly less crazy talk than introduction talking about holes
> > in W^X. But it is very, very far from normal Unix system, where you
> > have selection of interpreters to run your malware on (sh, python,
> > awk, emacs, ...) and often you can even compile malware from sources. 
> > 
> > And as you noted, we don't have "a general purpose trusted code
> > generator" for our systems.
> > 
> > I believe you should simply delete confusing "introduction" and
> > provide details of super-secure system where your patches would be
> > useful, instead.
> 
> This RFC talks about converting dynamic code (which cannot be authenticated)
> to static code that can be authenticated using signature verification. That
> is the scope of this RFC.
> 
> If I have not been clear before, by dynamic code, I mean machine code that is
> dynamic in nature. Scripts are beyond the scope of this RFC.
> 
> Also, malware compiled from sources is not dynamic code. That is orthogonal
> to this RFC. If such malware has a valid signature that the kernel permits its
> execution, we have a systemic problem.
> 
> I am not saying that script authentication or compiled malware are not problems.
> I am just saying that this RFC is not trying to solve all of the security problems.
> It is trying to define one way to convert dynamic code to static code to address
> one class of problems.

Well, you don't have to solve all problems at once.

But solutions have to exist, and AFAIK in this case they don't. You
are armoring doors, but ignoring open windows.

Or very probably you are thinking about something different than
normal desktop distros (Debian 10). Because on my systems, I have
python, gdb and gcc...

It would be nice to specify what other pieces need to be present for
this to make sense -- because it makes no sense on Debian 10.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 20:51       ` Pavel Machek
@ 2020-09-23 23:04         ` Madhavan T. Venkataraman
  2020-09-24 16:44         ` Mickaël Salaün
  1 sibling, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 23:04 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland, mic



On 9/23/20 3:51 PM, Pavel Machek wrote:
> Hi!
> 
>>>> Scenario 2
>>>> ----------
>>>>
>>>> We know what code we need in advance. User trampolines are a good example of
>>>> this. It is possible to define such code statically with some help from the
>>>> kernel.
>>>>
>>>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>>>> and is out of scope for this RFC.
>>>
>>> This is slightly less crazy talk than introduction talking about holes
>>> in W^X. But it is very, very far from normal Unix system, where you
>>> have selection of interpreters to run your malware on (sh, python,
>>> awk, emacs, ...) and often you can even compile malware from sources. 
>>>
>>> And as you noted, we don't have "a general purpose trusted code
>>> generator" for our systems.
>>>
>>> I believe you should simply delete confusing "introduction" and
>>> provide details of super-secure system where your patches would be
>>> useful, instead.
>>
>> This RFC talks about converting dynamic code (which cannot be authenticated)
>> to static code that can be authenticated using signature verification. That
>> is the scope of this RFC.
>>
>> If I have not been clear before, by dynamic code, I mean machine code that is
>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>
>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>> to this RFC. If such malware has a valid signature that the kernel permits its
>> execution, we have a systemic problem.
>>
>> I am not saying that script authentication or compiled malware are not problems.
>> I am just saying that this RFC is not trying to solve all of the security problems.
>> It is trying to define one way to convert dynamic code to static code to address
>> one class of problems.
> 
> Well, you don't have to solve all problems at once.
> 
> But solutions have to exist, and AFAIK in this case they don't. You
> are armoring doors, but ignoring open windows.
> 

I am afraid I don't agree that the other open security issues must be
addressed for this RFC to make sense. If you think that any of those
issues actually has a bad interaction/intersection with this RFC,
let me know how and I will address it.

> Or very probably you are thinking about something different than
> normal desktop distros (Debian 10). Because on my systems, I have
> python, gdb and gcc...
> 
> It would be nice to specify what other pieces need to be present for
> this to make sense -- because it makes no sense on Debian 10.
> 

Since this RFC pertains to converting dynamic machine code to static
code, it has nothing to do with the other items you have mentioned.
I am not disagreeing that the other items need to be addressed. But
they are orthogonal.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 19:51             ` Arvind Sankar
@ 2020-09-23 23:51               ` Madhavan T. Venkataraman
  2020-09-24 20:23               ` Madhavan T. Venkataraman
  1 sibling, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 23:51 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/23/20 2:51 PM, Arvind Sankar wrote:
> On Wed, Sep 23, 2020 at 02:17:30PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 9/23/20 4:11 AM, Arvind Sankar wrote:
>>> For libffi, I think the proposed standard trampoline won't actually
>>> work, because not all ABIs have two scratch registers available to use
>>> as code_reg and data_reg. Eg i386 fastcall only has one, and register
>>> has zero scratch registers. I believe 32-bit ARM only has one scratch
>>> register as well.
>>
>> The trampoline is invoked as a function call in the libffi case. Any
>> caller saved register can be used as code_reg, can it not? And the
>> scratch register is needed only to jump to the code. After that, it
>> can be reused for any other purpose.
>>
>> However, for ARM, you are quite correct. There is only one scratch
>> register. This means that I have to provide two types of trampolines:
>>
>> 	- If an architecture has enough scratch registers, use the currently
>> 	  defined trampoline.
>>
>> 	- If the architecture has only one scratch register, but has PC-relative
>> 	  data references, then embed the code address at the bottom of the
>> 	  trampoline and access it using PC-relative addressing.
>>
>> Thanks for pointing this out.
>>
>> Madhavan
> 
> libffi is trying to provide closures with non-standard ABIs as well: the
> actual user function is standard ABI, but the closure can be called with
> a different ABI. If the closure was created with FFI_REGISTER abi, there
> are no registers available for the trampoline to use: EAX, EDX and ECX
> contain the first three arguments of the function, and every other
> register is callee-save.
> 
> I provided a sample of the kind of trampoline that would be needed in
> this case -- it's position-independent and doesn't clobber any registers
> at all, and you get 255 trampolines per page. If I take another 16-byte
> slot out of the page for the end trampoline that does the actual work,
> I'm sure I could even come up with one that can just call a normal C
> function, only the return might need special handling depending on the
> return type.
> 
> And again, do you actually have any example of an architecture that
> cannot run position-independent code? PC-relative addressing is an
> implementation detail: the fact that it's available for x86_64 but not
> for i386 just makes position-independent code more cumbersome on i386,
> but it doesn't make it impossible. For the tiny trampolines here, it
> makes almost no difference.
> 

Hi Arvind,

I am preparing a response for all of your comments. I will send it out
tomorrow. Sorry for the delay.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 14:39       ` Florian Weimer
  2020-09-23 18:09         ` Andy Lutomirski
  2020-09-23 18:11         ` Solar Designer
@ 2020-09-23 23:53         ` Madhavan T. Venkataraman
  2 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-23 23:53 UTC (permalink / raw)
  To: Florian Weimer, Solar Designer
  Cc: Pavel Machek, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight,
	mark.rutland, mic, Rich Felker



On 9/23/20 9:39 AM, Florian Weimer wrote:
> * Solar Designer:
> 
>> While I share my opinion here, I don't mean that to block Madhavan's
>> work.  I'd rather defer to people more knowledgeable in current userland
>> and ABI issues/limitations and plans on dealing with those, especially
>> to Florian Weimer.  I haven't seen Florian say anything specific for or
>> against Madhavan's proposal, and I'd like to.  (Have I missed that?)
> 
> There was a previous discussion, where I provided feedback (not much
> different from the feedback here, given that the mechanism is mostly the
> same).
> 
> I think it's unnecessary for the libffi use case.  Precompiled code can
> be loaded from disk because the libffi trampolines are so regular.  On
> most architectures, it's not even the code that's patched, but some of
> the data driving it, which happens to be located on the same page due to
> a libffi quirk.
> 
> The libffi use case is a bit strange anyway: its trampolines are
> type-generic, and the per-call adjustment is data-driven.  This means
> that once you have libffi in the process, you have a generic
> data-to-function-call mechanism available that can be abused (it's even
> fully CET compatible in recent versions).  And then you need to look at
> the processes that use libffi.  A lot of them contain bytecode
> interpreters, and those enable data-driven arbitrary code execution as
> well.  I know that there are efforts under way to harden Python, but
> it's going to be tough to get to the point where things are still
> difficult for an attacker once they have the ability to make mprotect
> calls.
> 
> It was pointed out to me that libffi is doing things wrong, and the
> trampolines should not be type-generic, but generated so that they match
> the function being called.  That is, the marshal/unmarshal code would be
> open-coded in the trampoline, rather than using some generic mechanism
> plus run-time dispatch on data tables describing the function type.
> That is a very different design (and typically used by compilers (JIT or
> not JIT) to implement native calls).  Mapping some code page with a
> repeating pattern would no longer work to defeat anti-JIT measures
> because it's closer to real JIT.  I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.
> 
> Thanks,
> Florian
> 
Hi Florian,

I am making myself familiar with anti-JIT measures before I can respond
to this comment. Bear with me. I will also respond to the above
libffi comment.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 20:51       ` Pavel Machek
  2020-09-23 23:04         ` Madhavan T. Venkataraman
@ 2020-09-24 16:44         ` Mickaël Salaün
  2020-09-24 22:05           ` Pavel Machek
  1 sibling, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2020-09-24 16:44 UTC (permalink / raw)
  To: Pavel Machek, Madhavan T. Venkataraman
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86,
	luto, David.Laight, fweimer, mark.rutland


On 23/09/2020 22:51, Pavel Machek wrote:
> Hi!
> 
>>>> Scenario 2
>>>> ----------
>>>>
>>>> We know what code we need in advance. User trampolines are a good example of
>>>> this. It is possible to define such code statically with some help from the
>>>> kernel.
>>>>
>>>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>>>> and is out of scope for this RFC.
>>>
>>> This is slightly less crazy talk than introduction talking about holes
>>> in W^X. But it is very, very far from normal Unix system, where you
>>> have selection of interpreters to run your malware on (sh, python,
>>> awk, emacs, ...) and often you can even compile malware from sources. 
>>>
>>> And as you noted, we don't have "a general purpose trusted code
>>> generator" for our systems.
>>>
>>> I believe you should simply delete confusing "introduction" and
>>> provide details of super-secure system where your patches would be
>>> useful, instead.
>>
>> This RFC talks about converting dynamic code (which cannot be authenticated)
>> to static code that can be authenticated using signature verification. That
>> is the scope of this RFC.
>>
>> If I have not been clear before, by dynamic code, I mean machine code that is
>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>
>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>> to this RFC. If such malware has a valid signature that the kernel permits its
>> execution, we have a systemic problem.
>>
>> I am not saying that script authentication or compiled malware are not problems.
>> I am just saying that this RFC is not trying to solve all of the security problems.
>> It is trying to define one way to convert dynamic code to static code to address
>> one class of problems.
> 
> Well, you don't have to solve all problems at once.
> 
> But solutions have to exist, and AFAIK in this case they don't. You
> are armoring doors, but ignoring open windows.

FYI, script execution is being addressed (for the kernel part) by this
patch series:
https://lore.kernel.org/lkml/20200924153228.387737-1-mic@digikod.net/

> 
> Or very probably you are thinking about something different than
> normal desktop distros (Debian 10). Because on my systems, I have
> python, gdb and gcc...

It doesn't make sense for a tailored security system to leave all these
tools available to an attacker.

> 
> It would be nice to specify what other pieces need to be present for
> this to make sense -- because it makes no sense on Debian 10.

Not all kernel features make sense for a generic/undefined usage,
especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
they can still be definitely useful.

> 
> Best regards,
> 									Pavel
> 


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-23 19:51             ` Arvind Sankar
  2020-09-23 23:51               ` Madhavan T. Venkataraman
@ 2020-09-24 20:23               ` Madhavan T. Venkataraman
  2020-09-24 20:52                 ` Florian Weimer
                                   ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-24 20:23 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/23/20 2:51 PM, Arvind Sankar wrote:
> On Wed, Sep 23, 2020 at 02:17:30PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 9/23/20 4:11 AM, Arvind Sankar wrote:
>>> For libffi, I think the proposed standard trampoline won't actually
>>> work, because not all ABIs have two scratch registers available to use
>>> as code_reg and data_reg. Eg i386 fastcall only has one, and register
>>> has zero scratch registers. I believe 32-bit ARM only has one scratch
>>> register as well.
>>
>> The trampoline is invoked as a function call in the libffi case. Any
>> caller saved register can be used as code_reg, can it not? And the
>> scratch register is needed only to jump to the code. After that, it
>> can be reused for any other purpose.
>>
>> However, for ARM, you are quite correct. There is only one scratch
>> register. This means that I have to provide two types of trampolines:
>>
>> 	- If an architecture has enough scratch registers, use the currently
>> 	  defined trampoline.
>>
>> 	- If the architecture has only one scratch register, but has PC-relative
>> 	  data references, then embed the code address at the bottom of the
>> 	  trampoline and access it using PC-relative addressing.
>>
>> Thanks for pointing this out.
>>
>> Madhavan
> 
> libffi is trying to provide closures with non-standard ABIs as well: the
> actual user function is standard ABI, but the closure can be called with
> a different ABI. If the closure was created with FFI_REGISTER abi, there
> are no registers available for the trampoline to use: EAX, EDX and ECX
> contain the first three arguments of the function, and every other
> register is callee-save.
> 
> I provided a sample of the kind of trampoline that would be needed in
> this case -- it's position-independent and doesn't clobber any registers
> at all, and you get 255 trampolines per page. If I take another 16-byte
> slot out of the page for the end trampoline that does the actual work,
> I'm sure I could even come up with one that can just call a normal C
> function, only the return might need special handling depending on the
> return type.
> 
> And again, do you actually have any example of an architecture that
> cannot run position-independent code? PC-relative addressing is an
> implementation detail: the fact that it's available for x86_64 but not
> for i386 just makes position-independent code more cumbersome on i386,
> but it doesn't make it impossible. For the tiny trampolines here, it
> makes almost no difference.
> 

I have tried to answer all of your previous comments here. Let me know
if I missed anything:


> Which ISA does not support PIC objects? You mentioned i386 below, but
> i386 does support them, it just needs to copy the PC into a GPR first
> (see below).

Position Independent Code needs PC-relative branches. I was referring
to PC-relative data references. Like RIP-relative data references in
X64. i386 ISA does not support this.

> i386 just needs a tiny bit of code to copy the PC into a GPR first, i.e.
> the trampoline would be:
> 
> 	call	1f
> 1:	pop	%data_reg
> 	movl	(code_table + X - 1b)(%data_reg), %code_reg
> 	movl	(data_table + X - 1b)(%data_reg), %data_reg
> 	jmp	*(%code_reg)
> 
> I do not understand the point about passing data at runtime. This
> trampoline is to achieve exactly that, no?

PC-relative data referencing
----------------------------

I agree that the current PC value can be loaded in a GPR using the trick
of call, pop on i386.

Perhaps, on other architectures, we can do similar things. For instance,
in architectures that load the return address in a designated register
instead of pushing it on the stack, the trampoline could call a leaf function
that moves the value of that register into data_reg so that at the location
after the call instruction, the current PC is already loaded in data_reg.
SPARC is one example I can think of.

My take is - if the ISA supports PC-relative data referencing explicitly (like
X64 or ARM64), then we can use it. Or, if the ABI specification documents an
approved way to load the PC into a GPR, we can use it.

Otherwise, using an ABI quirk or a calling convention side effect to load the
PC into a GPR is, IMO, non-standard or non-compliant or non-approved or
whatever you want to call it. I would be conservative and not use it. Who knows
what incompatibility there will be with some future software or hardware
features?

For instance, in the i386 example, we do a call without a matching return.
Also, we use a pop to undo the call. Can anyone tell me if this kind of use
is an ABI approved one?

Kernel supplied trampoline
--------------------------

One advantage in doing this in the kernel is that we don't need to use
non-standard or non-ABI compliant code.

To minimize the number of registers used by the trampoline, I will redefine
the kernel generated trampoline as follows:

- The kernel loads the trampoline and the code and the data addresses to be
  dereferenced like this:

	A ---->	-------------------
		| Trampoline code |
	B ---->	-------------------
		| Data Address    |
		-------------------
		| Code Address    |
		-------------------

So, the trampoline code would be:

	mov B, %data_reg
	jump (%data_reg + sizeof(Data address))

The kernel will hard code B into the trampoline.

The static code that the trampoline jumps to looks like this:

	load (%data_reg), %data_reg
	rest of the code
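
To make the data flow concrete, here is a minimal C model of the layout
above. It is only a sketch under stated assumptions: the names
tramp_params, tramp_invoke and add_one_target are made up for
illustration, and ordinary C calls stand in for the mov/jump/load
instructions that the kernel generated trampoline and the static code
would actually use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two words the kernel places at address B. */
struct tramp_params;
typedef long (*tramp_target_fn)(const struct tramp_params *data_reg);

struct tramp_params {
	void *data_addr;		/* "Data Address", at B */
	tramp_target_fn code_addr;	/* "Code Address", right after it */
};

/* The code address sits at B + sizeof(Data address), as in the diagram. */
_Static_assert(offsetof(struct tramp_params, code_addr) == sizeof(void *),
	       "code address must follow the data address");

/* Models the trampoline: mov B, %data_reg; jump via the code address. */
long tramp_invoke(const struct tramp_params *b)
{
	const struct tramp_params *data_reg = b;	/* mov B, %data_reg */

	assert(data_reg != NULL && data_reg->code_addr != NULL);
	return data_reg->code_addr(data_reg);		/* jump to the code */
}

/* Models the static code: load (%data_reg), %data_reg; rest of the code. */
long add_one_target(const struct tramp_params *data_reg)
{
	const long *data = data_reg->data_addr;		/* load (%data_reg) */

	return *data + 1;				/* rest of the code */
}
```

In this model, pointing data_addr at a long holding 41 and code_addr at
add_one_target makes tramp_invoke() return 42. Only one register
(data_reg) is live across the jump.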

Use of scratch registers
------------------------

With this new trampoline, we only use one scratch register. So, the same
RFC will work for libffi on ARM.

You pointed out that in the FFI_REGISTER ABI no scratch registers can
be used. Read the section "Secure vs Performant trampoline" below where
this is addressed.

Standard API for all userland for all architectures
---------------------------------------------------

The next advantage in using the kernel is standardization.

If the kernel supplies this, then all applications and libraries can use
it for all architectures with one single, simple API. Without this, each
application/library has to roll its own solution for every architecture-ABI
combo it wants to support.

Furthermore, if this work gets accepted, I plan to add a glibc wrapper for
the kernel API. The glibc API would look something like this:

	Allocate a trampoline
	---------------------

	tramp = alloc_tramp();

	Set trampoline parameters
	-------------------------

	init_tramp(tramp, code, data);

	Free the trampoline
	-------------------

	free_tramp(tramp);

glibc will allocate and manage the code and data tables, handle kernel API
details and manage the trampoline table.

As an example, in libffi:

	ffi_closure_alloc() would call alloc_tramp()

	ffi_prep_closure_loc() would call init_tramp()

	ffi_closure_free() would call free_tramp()

That is it! It works on all the architectures supported in the kernel for
trampfd.
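
As a rough illustration of the bookkeeping such a wrapper might do, here
is a small C sketch. Everything beyond the three function names is an
assumption: the struct tramp layout, the fixed-size table, and the
return conventions are hypothetical, and a real wrapper would talk to
trampfd and map trampoline pages rather than just record pointers.

```c
#include <assert.h>
#include <stddef.h>

#define TRAMP_TABLE_SIZE 4	/* arbitrary size for this sketch */

/* Hypothetical per-trampoline record; not a real glibc type. */
struct tramp {
	int in_use;
	void *code;
	void *data;
};

struct tramp tramp_table[TRAMP_TABLE_SIZE];

/* Hand out the first free slot, or NULL if the table is full. */
struct tramp *alloc_tramp(void)
{
	for (size_t i = 0; i < TRAMP_TABLE_SIZE; i++) {
		if (!tramp_table[i].in_use) {
			tramp_table[i].in_use = 1;
			tramp_table[i].code = NULL;
			tramp_table[i].data = NULL;
			return &tramp_table[i];
		}
	}
	return NULL;
}

/* Record the code and data addresses; -1 if the slot is not allocated. */
int init_tramp(struct tramp *t, void *code, void *data)
{
	if (t == NULL || !t->in_use)
		return -1;
	t->code = code;
	t->data = data;
	return 0;
}

/* Return the slot to the table. Freeing NULL is a no-op. */
void free_tramp(struct tramp *t)
{
	if (t == NULL)
		return;
	assert(t->in_use);	/* catches a double free in this model */
	t->in_use = 0;
	t->code = NULL;
	t->data = NULL;
}
```

The point of the sketch is only that the caller never touches
executable memory: it sees an opaque handle and three calls.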

This makes it really easy for maintainers to adopt the API and move their
code to a more secure model (which is the fundamental idea behind this work).
For this advantage alone, IMO, it is worth doing it in the kernel.

Secure vs Performant trampoline
-------------------------------

If you recall, in version 1, I presented a trampoline type that is
implemented in the kernel. When an application invokes the trampoline,
it traps into the kernel and the kernel performs the work of the trampoline.

The disadvantage is that a trip to the kernel is needed. That can be
expensive.

The advantage is that the kernel can add security checks before doing the
work. Mainly, I am looking at checks that might prevent the trampoline
from being used in an ROP/BOP chain. Some half-baked ideas:

	- Check that the invocation is at the starting point of the
	  trampoline

	- Check if the trampoline is jumping to an allowed PC

	- Check if the trampoline is being invoked from an allowed
	  calling PC or PC range

Allowed PCs can be input using the trampfd API mentioned in version 1.
Basically, an array of PCs is written into trampfd.
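
The three checks listed above could look roughly like this in C. This is
a sketch, not the trampfd implementation: struct pc_range, struct
tramp_policy and the function names are hypothetical, and the policy is
assumed to have been filled in from the array of PCs written into
trampfd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pc_range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/* Hypothetical per-trampoline policy. */
struct tramp_policy {
	uintptr_t tramp_start;			/* only valid entry point */
	const uintptr_t *allowed_targets;	/* PCs the trampoline may jump to */
	size_t n_targets;
	const struct pc_range *allowed_callers;	/* PC ranges allowed to invoke it */
	size_t n_callers;
};

static bool target_allowed(const struct tramp_policy *p, uintptr_t pc)
{
	for (size_t i = 0; i < p->n_targets; i++)
		if (p->allowed_targets[i] == pc)
			return true;
	return false;
}

static bool caller_allowed(const struct tramp_policy *p, uintptr_t pc)
{
	for (size_t i = 0; i < p->n_callers; i++)
		if (pc >= p->allowed_callers[i].start &&
		    pc < p->allowed_callers[i].end)
			return true;
	return false;
}

/* All three checks must pass before the trampoline's work is done. */
bool tramp_invocation_allowed(const struct tramp_policy *p,
			      uintptr_t invoke_pc, uintptr_t target_pc,
			      uintptr_t caller_pc)
{
	assert(p != NULL);
	return invoke_pc == p->tramp_start &&
	       target_allowed(p, target_pc) &&
	       caller_allowed(p, caller_pc);
}
```

An invocation that enters the trampoline in the middle, jumps to an
unlisted PC, or comes from outside the listed caller ranges is rejected
in this model.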

Suggestions for other checks are most welcome!

I would like to implement an option in the trampfd API. The user can
choose a secure trampoline or a performant trampoline. For a performant
trampoline, the kernel will generate the code. For a secure trampoline,
the kernel will do the work itself.

In order to address the FFI_REGISTER ABI in libffi, we could use the secure
trampoline. In FFI_REGISTER, the data is pushed on the stack and the code
is jumped to without using any registers.

As outlined in version 1, the kernel can push the data address on the stack
and write the code address into the PC and return to userland.
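
A toy C model of that step, purely for illustration: struct
user_ctx_model and secure_tramp_dispatch are hypothetical names, the
"stack" is a small array rather than user memory, and a real
implementation would operate on the saved user register state of the
trapping task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STACK_WORDS 8

/* Hypothetical stand-in for the saved user context. */
struct user_ctx_model {
	uintptr_t pc;
	size_t sp;	/* index of the top of stack; the stack grows down */
	uintptr_t stack[MODEL_STACK_WORDS];
};

/*
 * Push the data address on the (model) user stack and redirect the PC to
 * the code address. No general purpose register is modified.
 * Returns 0 on success, -1 if there is no stack space left.
 */
int secure_tramp_dispatch(struct user_ctx_model *ctx,
			  uintptr_t code_addr, uintptr_t data_addr)
{
	assert(ctx != NULL && ctx->sp <= MODEL_STACK_WORDS);
	if (ctx->sp == 0)
		return -1;
	ctx->stack[--ctx->sp] = data_addr;	/* push the data address */
	ctx->pc = code_addr;			/* return to userland here */
	return 0;
}
```

Since nothing but the stack and the PC changes, this shape fits an ABI
like FFI_REGISTER where no scratch register is available.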

For doing all of this, we need trampfd.

Permitting the use of trampfd
-----------------------------

An "exectramp" setting can be implemented in SELinux to selectively allow the
use of trampfd for applications.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 20:23               ` Madhavan T. Venkataraman
@ 2020-09-24 20:52                 ` Florian Weimer
  2020-09-25 22:22                   ` Madhavan T. Venkataraman
  2020-09-24 22:13                 ` Pavel Machek
  2020-09-24 23:43                 ` Arvind Sankar
  2 siblings, 1 reply; 50+ messages in thread
From: Florian Weimer @ 2020-09-24 20:52 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Arvind Sankar, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

* Madhavan T. Venkataraman:

> Otherwise, using an ABI quirk or a calling convention side effect to
> load the PC into a GPR is, IMO, non-standard or non-compliant or
> non-approved or whatever you want to call it. I would be
> conservative and not use it. Who knows what incompatibility there
> will be with some future software or hardware features?

AArch64 PAC makes a backwards-incompatible change that touches this
area, but we'll see if they can actually get away with it.

In general, these things are baked into the ABI, even if they are not
spelled out explicitly in the psABI supplement.

> For instance, in the i386 example, we do a call without a matching return.
> Also, we use a pop to undo the call. Can anyone tell me if this kind of use
> is an ABI approved one?

Yes, for i386, this is completely valid from an ABI point of view.
It's equally possible to use a regular function call and just read the
return address that has been pushed to the stack.  Then there's no
stack mismatch at all.  Return stack predictors (including the one
used by SHSTK) also recognize the CALL 0 construct, so that's fine as
well.  The i386 psABI does not use function descriptors, and either
approach (out-of-line thunk or CALL 0) is in common use to materialize
the program counter in a register and construct the GOT pointer.

> If the kernel supplies this, then all applications and libraries can use
> it for all architectures with one single, simple API. Without this, each
> application/library has to roll its own solution for every architecture-ABI
> combo it wants to support.

Is there any other user for these type-generic trampolines?
Everything else I've seen generates machine code specific to the
function being called.  libffi is quite the outlier in my experience
because the trampoline calls a generic data-driven
marshaller/unmarshaller.  The other trampoline generators put this
marshalling code directly into the generated trampoline.

I'm still not convinced that this can't be done directly in libffi,
without kernel help.  Hiding the architecture-specific code in the
kernel doesn't reduce overall system complexity.

> As an example, in libffi:
>
> 	ffi_closure_alloc() would call alloc_tramp()
>
> 	ffi_prep_closure_loc() would call init_tramp()
>
> 	ffi_closure_free() would call free_tramp()
>
> That is it! It works on all the architectures supported in the kernel for
> trampfd.

ffi_prep_closure_loc would still need to check whether the trampoline
has been allocated by alloc_tramp because some applications supply
their own (executable and writable) mapping.  ffi_closure_alloc would
need to support different sizes (not matching the trampoline).  It's
also unclear to me to what extent software out there writes to the
trampoline data directly, bypassing the libffi API (the structs are
not opaque, after all).  And all the existing libffi memory management
code (including the embedded dlmalloc copy) would be needed to support
kernels without trampfd for years to come.

I very much agree that we have a gap in libffi when it comes to
JIT-less operation.  But I'm not convinced that kernel support is
needed to close it, or that it is even the right design.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 16:44         ` Mickaël Salaün
@ 2020-09-24 22:05           ` Pavel Machek
  2020-09-25 10:12             ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Pavel Machek @ 2020-09-24 22:05 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Madhavan T. Venkataraman, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland


Hi!

> >>> I believe you should simply delete confusing "introduction" and
> >>> provide details of super-secure system where your patches would be
> >>> useful, instead.
> >>
> >> This RFC talks about converting dynamic code (which cannot be authenticated)
> >> to static code that can be authenticated using signature verification. That
> >> is the scope of this RFC.
> >>
> >> If I have not been clear before, by dynamic code, I mean machine code that is
> >> dynamic in nature. Scripts are beyond the scope of this RFC.
> >>
> >> Also, malware compiled from sources is not dynamic code. That is orthogonal
> >> to this RFC. If such malware has a valid signature that the kernel permits its
> >> execution, we have a systemic problem.
> >>
> >> I am not saying that script authentication or compiled malware are not problems.
> >> I am just saying that this RFC is not trying to solve all of the security problems.
> >> It is trying to define one way to convert dynamic code to static code to address
> >> one class of problems.
> > 
> > Well, you don't have to solve all problems at once.
> > 
> > But solutions have to exist, and AFAIK in this case they don't. You
> > are armoring doors, but ignoring open windows.
> 
> FYI, script execution is being addressed (for the kernel part) by this
> patch series:
> https://lore.kernel.org/lkml/20200924153228.387737-1-mic@digikod.net/

Ok.

> > Or very probably you are thinking about something different than
> > normal desktop distros (Debian 10). Because on my systems, I have
> > python, gdb and gcc...
> 
> It doesn't make sense for a tailored security system to leave all these
> tools available to an attacker.

And it also does not make sense to use "trampoline file descriptor" on
a generic system... while W^X should make sense there.

> > It would be nice to specify what other pieces need to be present for
> > this to make sense -- because it makes no sense on Debian 10.
> 
> Not all kernel features make sense for a generic/undefined usage,
> especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
> SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
> they can still be definitely useful.

Yep... so... I'd expect something like... "so you have a single-purpose
system with all script interpreters removed, IMA hashing all the files
to make sure they are not modified, and W^X enabled. Attacker can
still execute code after buffer overflow by .... and trampoline file
descriptor addresses that"... so that people running generic systems
can stop reading after the first sentence.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 20:23               ` Madhavan T. Venkataraman
  2020-09-24 20:52                 ` Florian Weimer
@ 2020-09-24 22:13                 ` Pavel Machek
  2020-09-24 23:43                 ` Arvind Sankar
  2 siblings, 0 replies; 50+ messages in thread
From: Pavel Machek @ 2020-09-24 22:13 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Arvind Sankar, Florian Weimer, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic


Hi!

> PC-relative data referencing
> ----------------------------
> 
> I agree that the current PC value can be loaded in a GPR using the trick
> of call, pop on i386.
> 
> Perhaps, on other architectures, we can do similar things. For instance,
> in architectures that load the return address in a designated register
> instead of pushing it on the stack, the trampoline could call a leaf function
> that moves the value of that register into data_reg so that at the location
> after the call instruction, the current PC is already loaded in data_reg.
> SPARC is one example I can think of.
> 
> My take is - if the ISA supports PC-relative data referencing explicitly (like
> X64 or ARM64), then we can use it. Or, if the ABI specification documents an
> approved way to load the PC into a GPR, we can use it.
> 
> Otherwise, using an ABI quirk or a calling convention side effect to load the
> PC into a GPR is, IMO, non-standard or non-compliant or non-approved or
> whatever you want to call it. I would be conservative and not use

ISAs are very well defined, and basically not changing. If you want to
argue we should not use something, you should have a very clear picture
of _why_ it is bad. "Non-standard or non-approved or whatever" just does
not cut it.

And yes, certain tricks may be seriously slow on modern CPUs, and we
might want to avoid those. But other than that... you should have a
better argument than "it is non-standard".

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 20:23               ` Madhavan T. Venkataraman
  2020-09-24 20:52                 ` Florian Weimer
  2020-09-24 22:13                 ` Pavel Machek
@ 2020-09-24 23:43                 ` Arvind Sankar
  2020-09-25 22:44                   ` Madhavan T. Venkataraman
  2 siblings, 1 reply; 50+ messages in thread
From: Arvind Sankar @ 2020-09-24 23:43 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Arvind Sankar, Florian Weimer, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

On Thu, Sep 24, 2020 at 03:23:52PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> > Which ISA does not support PIC objects? You mentioned i386 below, but
> > i386 does support them, it just needs to copy the PC into a GPR first
> > (see below).
> 
> Position Independent Code needs PC-relative branches. I was referring
> to PC-relative data references. Like RIP-relative data references in
> X64. i386 ISA does not support this.

I was talking about PC-relative data references too: they are a
requirement for PIC code that wants to access any global data. They can
be implemented easily on i386 even though it doesn't have an addressing
mode that uses the PC.

> Otherwise, using an ABI quirk or a calling convention side effect to load the
> PC into a GPR is, IMO, non-standard or non-compliant or non-approved or
> whatever you want to call it. I would be conservative and not use it. Who knows
> what incompatibility there will be with some future software or hardware
> features?
> 
> For instance, in the i386 example, we do a call without a matching return.
> Also, we use a pop to undo the call. Can anyone tell me if this kind of use
> is an ABI approved one?

This doesn't have anything to do with the ABI, since what happened here
isn't visible to any caller or callee. Any machine instruction sequence
that has the effect of copying the PC into a GPR is acceptable, but this
is basically the only possible solution on i386. If you don't like the
call/pop mismatch (though that's supported by the hardware, and is what
clang likes to use), you can use the slightly different technique used
in my example, which copies the top of stack into a GPR after a call.

This is how all i386 PIC code has always worked.

> Standard API for all userland for all architectures
> ---------------------------------------------------
> 
> The next advantage in using the kernel is standardization.
> 
> If the kernel supplies this, then all applications and libraries can use
> it for all architectures with one single, simple API. Without this, each
> application/library has to roll its own solution for every architecture-ABI
> combo it wants to support.

But you can get even more standardization out of a userspace library,
because that can work even on non-linux OS's, as well as versions of
linux where the new syscall isn't available.

> 
> Furthermore, if this work gets accepted, I plan to add a glibc wrapper for
> the kernel API. The glibc API would look something like this:
> 
> 	Allocate a trampoline
> 	---------------------
> 
> 	tramp = alloc_tramp();
> 
> 	Set trampoline parameters
> 	-------------------------
> 
> 	init_tramp(tramp, code, data);
> 
> 	Free the trampoline
> 	-------------------
> 
> 	free_tramp(tramp);
> 
> glibc will allocate and manage the code and data tables, handle kernel API
> details and manage the trampoline table.

glibc could do this already if it wants, even without the syscall,
because this can be done in userspace already.

> 
> Secure vs Performant trampoline
> -------------------------------
> 
> If you recall, in version 1, I presented a trampoline type that is
> implemented in the kernel. When an application invokes the trampoline,
> it traps into the kernel and the kernel performs the work of the trampoline.
> 
> The disadvantage is that a trip to the kernel is needed. That can be
> expensive.
> 
> The advantage is that the kernel can add security checks before doing the
> work. Mainly, I am looking at checks that might prevent the trampoline
> from being used in an ROP/BOP chain. Some half-baked ideas:
> 
> 	- Check that the invocation is at the starting point of the
> 	  trampoline
> 
> 	- Check if the trampoline is jumping to an allowed PC
> 
> 	- Check if the trampoline is being invoked from an allowed
> 	  calling PC or PC range
> 
> Allowed PCs can be input using the trampfd API mentioned in version 1.
> Basically, an array of PCs is written into trampfd.

The source PC will generally not be available if the compiler decided to
tail-call optimize the call to the trampoline into a jump.

What's special about these trampolines anyway? Any indirect function
call could have these same problems -- an attacker could have
overwritten the pointer the same way, whether it's supposed to point to
a normal function or it is the target of this trampoline.

For making them a bit safer, userspace could just map the page holding
the data pointers/destination address(es) as read-only after
initialization.
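
A minimal sketch of that read-only-after-initialization idea, for
illustration only. It assumes a POSIX system with anonymous mmap(); the
struct and function names (tramp_entry, table_alloc, table_seal) are
hypothetical and do not come from libffi or from the patch series.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical table entry: one destination address plus its data pointer. */
struct tramp_entry {
	void (*code)(void *);
	void *data;
};

/* Allocate one writable page for the table; returns NULL on failure. */
static struct tramp_entry *table_alloc(size_t *nentries)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	*nentries = (size_t)page / sizeof(struct tramp_entry);
	return p;
}

/* Drop write permission once every entry has been filled in. */
static int table_seal(struct tramp_entry *table)
{
	return mprotect(table, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}
```

After table_seal() succeeds, a stray write to the table faults instead of
silently redirecting a trampoline. An attacker who can call mprotect() can
still undo this, so it raises the bar rather than closing the hole.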

> 
> Suggestions for other checks are most welcome!
> 
> I would like to implement an option in the trampfd API. The user can
> choose a secure trampoline or a performant trampoline. For a performant
> trampoline, the kernel will generate the code. For a secure trampoline,
> the kernel will do the work itself.
> 
> In order to address the FFI_REGISTER ABI in libffi, we could use the secure
> trampoline. In FFI_REGISTER, the data is pushed on the stack and the code
> is jumped to without using any registers.
> 
> As outlined in version 1, the kernel can push the data address on the stack
> and write the code address into the PC and return to userland.
> 
> For doing all of this, we need trampfd.

We don't need this for FFI_REGISTER. I presented a solution that works
in userspace. Even if you want to use a trampoline created by the
kernel, there's no reason it needs to trap into the kernel at trampoline
execution time. libffi's trampolines already handle this case today.

> 
> Permitting the use of trampfd
> -----------------------------
> 
> An "exectramp" setting can be implemented in SELinux to selectively allow the
> use of trampfd for applications.
> 
> Madhavan

Applications can use their own userspace trampolines regardless of this
setting, so it doesn't provide any additional security benefit by
preventing usage of trampfd.

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 22:05           ` Pavel Machek
@ 2020-09-25 10:12             ` Mickaël Salaün
  0 siblings, 0 replies; 50+ messages in thread
From: Mickaël Salaün @ 2020-09-25 10:12 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Madhavan T. Venkataraman, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, luto, David.Laight, fweimer,
	mark.rutland


On 25/09/2020 00:05, Pavel Machek wrote:
> Hi!
> 
>>>>> I believe you should simply delete confusing "introduction" and
>>>>> provide details of super-secure system where your patches would be
>>>>> useful, instead.
>>>>
>>>> This RFC talks about converting dynamic code (which cannot be authenticated)
>>>> to static code that can be authenticated using signature verification. That
>>>> is the scope of this RFC.
>>>>
>>>> If I have not been clear before, by dynamic code, I mean machine code that is
>>>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>>>
>>>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>>>> to this RFC. If such malware has a valid signature that the kernel permits its
>>>> execution, we have a systemic problem.
>>>>
>>>> I am not saying that script authentication or compiled malware are not problems.
>>>> I am just saying that this RFC is not trying to solve all of the security problems.
>>>> It is trying to define one way to convert dynamic code to static code to address
>>>> one class of problems.
>>>
>>> Well, you don't have to solve all problems at once.
>>>
>>> But solutions have to exist, and AFAIK in this case they don't. You
>>> are armoring doors, but ignoring open windows.
>>
>> FYI, script execution is being addressed (for the kernel part) by this
>> patch series:
>> https://lore.kernel.org/lkml/20200924153228.387737-1-mic@digikod.net/
> 
> Ok.
> 
>>> Or very probably you are thinking about something different than
>>> normal desktop distros (Debian 10). Because on my systems, I have
>>> python, gdb and gcc...
>>
>> It doesn't make sense for a tailored security system to leave all these
>> tools available to an attacker.
> 
> And it also does not make sense to use "trampoline file descriptor" on
> generic system... while W^X should make sense there.

Well, as said before, (full/original/system-wide) W^X may require
trampfd (as well as other building-blocks).

I guess most Linux deployments are not on "generic systems"
anyway (even if they may be based on generic distros), and W^X
contradicts the fact that users/attackers can do whatever they want on
the system.

> 
>>> It would be nice to specify what other pieces need to be present for
>>> this to make sense -- because it makes no sense on Debian 10.
>>
>> Not all kernel features make sense for a generic/undefined usage,
>> especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
>> SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
>> they can still be definitely useful.
> 
> Yep... so... I'd expect something like... "so you have single-purpose
> system

No one talked about a single-purpose system.

> with all script interpreters removed,

Not necessarily with the patch series I pointed out just before.

> IMA hashing all the files
> to make sure they are not modified, and W^X enabled.

System-wide W^X is not only for memory, and as Madhavan said: "this RFC
pertains to converting dynamic [writable] machine code to static
[non-writable] code".

> Attacker can
> still execute code after buffer overflow by .... and trampoline file
> descriptor addresses that"... so that people running generic systems
> can stop reading after first sentence.

Are you proposing to add a
"[feature-not-useful-without-a-proper-system-configuration]" tag in
subjects? :)

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 20:52                 ` Florian Weimer
@ 2020-09-25 22:22                   ` Madhavan T. Venkataraman
  2020-09-27 18:25                     ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-25 22:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Arvind Sankar, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/24/20 3:52 PM, Florian Weimer wrote:
> * Madhavan T. Venkataraman:
> 
>> Otherwise, using an ABI quirk or a calling convention side effect to
>> load the PC into a GPR is, IMO, non-standard or non-compliant or
>> non-approved or whatever you want to call it. I would be
>> conservative and not use it. Who knows what incompatibility there
>> will be with some future software or hardware features?
> 
> AArch64 PAC makes a backwards-incompatible change that touches this
> area, but we'll see if they can actually get away with it.
> 
> In general, these things are baked into the ABI, even if they are not
> spelled out explicitly in the psABI supplement.
> 
>> For instance, in the i386 example, we do a call without a matching return.
>> Also, we use a pop to undo the call. Can anyone tell me if this kind of use
>> is an ABI approved one?
> 
> Yes, for i386, this is completely valid from an ABI point of view.
> It's equally possible to use a regular function call and just read the
> return address that has been pushed to the stack.  Then there's no
> stack mismatch at all.  Return stack predictors (including the one
> used by SHSTK) also recognize the CALL 0 construct, so that's fine as
> well.  The i386 psABI does not use function descriptors, and either
> approach (out-of-line thunk or CALL 0) is in common use to materialize
> the program counter in a register and construct the GOT pointer.
> 
>> If the kernel supplies this, then all applications and libraries can use
>> it for all architectures with one single, simple API. Without this, each
>> application/library has to roll its own solution for every architecture-ABI
>> combo it wants to support.
> 
> Is there any other user for these type-generic trampolines?
> Everything else I've seen generates machine code specific to the
> function being called.  libffi is quite the outlier in my experience
> because the trampoline calls a generic data-driven
> marshaller/unmarshaller.  The other trampoline generators put this
> marshalling code directly into the generated trampoline.
> 
> I'm still not convinced that this can't be done directly in libffi,
> without kernel help.  Hiding the architecture-specific code in the
> kernel doesn't reduce overall system complexity.
> 

See below. I have accepted the community's recommendation to implement it
in user land. However, this is not just for libffi. It is for all dynamic
code. libffi is just the first use case I am addressing with this.

>> As an example, in libffi:
>>
>> 	ffi_closure_alloc() would call alloc_tramp()
>>
>> 	ffi_prep_closure_loc() would call init_tramp()
>>
>> 	ffi_closure_free() would call free_tramp()
>>
>> That is it! It works on all the architectures supported in the kernel for
>> trampfd.
> 
> ffi_prep_closure_loc would still need to check whether the trampoline
> has been allocated by alloc_tramp because some applications supply
> their own (executable and writable) mapping.  ffi_closure_alloc would
> need to support different sizes (not matching the trampoline).  It's
> also unclear to me to what extent software out there writes to the
> trampoline data directly, bypassing the libffi API (the structs are
> not opaque, after all).  And all the existing libffi memory management
> code (including the embedded dlmalloc copy) would be needed to support
> kernels without trampfd for years to come.
> 

In the libffi patch I have included, I have handled this. The closure
structure contains a tramp field:

  char tramp[FFI_TRAMPOLINE_SIZE];

If trampfd is not used, this array will contain the actual
trampoline code. If trampfd is used, then we don't need the array for
storing any trampoline code. That space can be used for storing trampfd
related information.

So, there is no change to the closure structure.

Also, the code can tell if the closure has been allocated from dlmalloc()
called from ffi_closure_alloc() or has been allocated by the caller
directly without calling ffi_closure_alloc(). I have written this function:

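/* Return nonzero if closure lies inside a dlmalloc segment, i.e. it was
 * allocated by ffi_closure_alloc() rather than supplied by the caller. */
int ffi_closure_alloc_called(void *closure)
{
  msegmentptr seg = segment_holding (gm, closure);
  return (seg != NULL);
}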

Using this function, I can tell how the closure has been allocated. I use
trampfd only for closures that have been allocated using ffi_closure_alloc().
So, I believe I have handled all the cases. If I have missed anything,
let me know. I will address it.

> I very much agree that we have a gap in libffi when it comes to
> JIT-less operation.  But I'm not convinced that kernel support is
> needed to close it, or that it is even the right design.
> 

I have taken into account most of the comments received so far and I have
come up with a proposal:

I would like to do this in two separate RFCs:

library RFC
-----------

I accept the recommendation of the reviewers about implementing it in
user land in a library.

Just for the sake of context, I would like to reiterate the problem being
solved and what the library will contain. Bear with me.

My goal is to help convert existing dynamic code to static code as far as
possible. The binary generated from the static code can be signed. The
kernel can use signature verification to authenticate the code. This way,
we don't need to disable W^X or make exceptions for the code (execmem, etc.) or
use any user level methods to somehow map and execute the code.

The dynamic code can be very simple like the libffi trampoline. Or, it can
be a lot more complex. E.g., a trampoline that uses data marshaling as Florian
mentioned. In all cases, when the code is converted to static code, the static
code needs to know where its data will be located at runtime. If static code is
a function, then one can just pass parameters. But if it is arbitrary code,
then one needs a way to inform the static code where it can find its data.

The code can use PC-relative referencing where available. For the sake of this
discussion, let us assume that we can use some trick or the other to load the
current PC into a GPR on all architectures. Then, we can use PC-relative
referencing. Let us assume that these tricks will not cause ABI compliance
issues in the future.

The maintainer of the dynamic code who wishes to convert it to static code
should not have to deal with all of these details. The static code should
be able to assume that its data is pointed to by a designated register. Or,
it should be able to assume that the data pointer has been pushed on the
stack. Then, it is easier for maintainers to adopt this and move their code
to a more secure model.

This can be achieved by providing a small, minimal trampoline that loads the
data pointer in a register or pushes it on the stack and jumps to the static
code.

The reviewers felt that the minimal trampoline can be provided in user land.
So, I will provide a user library. The user library will:

	- define the minimal trampoline statically for different architectures
	  using some flavor of PC-relative data referencing
	
	- provide a table of trampolines in a page
	
	- create and manage code and data pages
	
	- present a simple API to dynamic code maintainers

This overall approach has pretty much been agreed upon by the community so
far. I will send out an RFC for the library once I have the code ready.
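
To make the shape of such a library concrete, here is a rough sketch in
portable C. It is only a model: the names alloc_tramp(), init_tramp() and
free_tramp() come from this thread, but their signatures, the fixed
trampoline type long (*)(long), the table size and the example target
add_base() are all assumptions. A real library would define the stubs in
per-architecture assembly using PC-relative data references; here each
stub is a small C function bound to one slot of a data table.

```c
#include <stddef.h>

/* Assumed signatures: the thread names these calls but not their types. */
typedef long (*tramp_fn)(long arg);
typedef long (*target_fn)(void *data, long arg);

struct tramp_slot {
	target_fn code;	/* static code the trampoline jumps to */
	void *data;	/* data pointer handed to that code */
	int in_use;
};

#define NTRAMPS 4
static struct tramp_slot slots[NTRAMPS];

/* Each stub is fixed at build time; only its slot's contents change. */
#define DEFINE_STUB(i)						\
	static long stub##i(long arg)				\
	{							\
		return slots[i].code(slots[i].data, arg);	\
	}
DEFINE_STUB(0)
DEFINE_STUB(1)
DEFINE_STUB(2)
DEFINE_STUB(3)

static const tramp_fn stubs[NTRAMPS] = { stub0, stub1, stub2, stub3 };

/* Map a trampoline back to its slot index, or -1 if it is not ours. */
static int tramp_index(tramp_fn t)
{
	for (int i = 0; i < NTRAMPS; i++)
		if (stubs[i] == t)
			return i;
	return -1;
}

/* Hand out a free trampoline; it must be initialized before it is called. */
tramp_fn alloc_tramp(void)
{
	for (int i = 0; i < NTRAMPS; i++) {
		if (!slots[i].in_use) {
			slots[i].in_use = 1;
			return stubs[i];
		}
	}
	return NULL;	/* table exhausted */
}

/* Bind code and data to an allocated trampoline; 0 on success, -1 on error. */
int init_tramp(tramp_fn t, target_fn code, void *data)
{
	int i = tramp_index(t);

	if (i < 0 || !slots[i].in_use || code == NULL)
		return -1;
	slots[i].code = code;
	slots[i].data = data;
	return 0;
}

/* Return the trampoline to the pool and clear its slot. */
void free_tramp(tramp_fn t)
{
	int i = tramp_index(t);

	if (i >= 0)
		slots[i] = (struct tramp_slot){ 0 };
}

/* Example target (hypothetical): adds the value behind data to arg. */
static long add_base(void *data, long arg)
{
	return *(long *)data + arg;
}
```

A caller would do t = alloc_tramp(), then init_tramp(t, add_base, &base),
then invoke t(2) like any other function pointer. No page is ever both
writable and executable: the stubs live in the text segment and only the
slot table is written.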

Which library?
--------------

I need a recommendation from the community on this. Should I just place the
code in glibc? Or, should I create a libtramp for this? I prefer glibc as
it will make for easier adoption. But I will defer to the community on this.
What do you recommend?

trampfd RFC version 3
---------------------

Once the library RFC is accepted, I would, however, like to submit
version 3 of trampfd.

The library would support a choice of trampoline:

	- fast user trampoline described above

	- slow kernel trampoline described below that supports security
	  checks each time the trampoline is invoked

The minimal trampoline mentioned above would also be implemented in the
kernel. The mechanism is outlined in version 1. When the application executes
the trampoline, it would trap into the kernel and the kernel would do the
work: load the data pointer in a user register or push it on the user stack,
set the user PC to the target code, and return. The kernel will perform
security checks when the trampoline is invoked, for instance, to reduce or
eliminate the possibility of the trampoline being used in an ROP/BOP chain.
The checks are work in progress. But I think I can nail them.

Note that there is no code generation involved in this proposal. The kernel
is the trampoline.

Would you guys be willing to consider this approach?

Madhavan

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-24 23:43                 ` Arvind Sankar
@ 2020-09-25 22:44                   ` Madhavan T. Venkataraman
  2020-09-26 15:55                     ` Arvind Sankar
  0 siblings, 1 reply; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-25 22:44 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/24/20 6:43 PM, Arvind Sankar wrote:
> On Thu, Sep 24, 2020 at 03:23:52PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>>> Which ISA does not support PIC objects? You mentioned i386 below, but
>>> i386 does support them, it just needs to copy the PC into a GPR first
>>> (see below).
>>
>> Position Independent Code needs PC-relative branches. I was referring
>> to PC-relative data references. Like RIP-relative data references in
>> X64. i386 ISA does not support this.
> 
> I was talking about PC-relative data references too: they are a
> requirement for PIC code that wants to access any global data. They can
> be implemented easily on i386 even though it doesn't have an addressing
> mode that uses the PC.
> 
>> Otherwise, using an ABI quirk or a calling convention side effect to load the
>> PC into a GPR is, IMO, non-standard or non-compliant or non-approved or
>> whatever you want to call it. I would be conservative and not use it. Who knows
>> what incompatibility there will be with some future software or hardware
>> features?
>>
>> For instance, in the i386 example, we do a call without a matching return.
>> Also, we use a pop to undo the call. Can anyone tell me if this kind of use
>> is an ABI approved one?
> 
> This doesn't have anything to do with the ABI, since what happened here
> isn't visible to any caller or callee. Any machine instruction sequence
> that has the effect of copying the PC into a GPR is acceptable, but this
> is basically the only possible solution on i386. If you don't like the
> call/pop mismatch (though that's supported by the hardware, and is what
> clang likes to use), you can use the slightly different technique used
> in my example, which copies the top of stack into a GPR after a call.
> 
> This is how all i386 PIC code has always worked.
> 

I have responded to this in my reply to Florian. Basically, I accept the opinion
of the reviewers. I will assume that any trick we use to get the current PC into a
GPR will not cause ABI compliance issues in the future.

>> Standard API for all userland for all architectures
>> ---------------------------------------------------
>>
>> The next advantage in using the kernel is standardization.
>>
>> If the kernel supplies this, then all applications and libraries can use
>> it for all architectures with one single, simple API. Without this, each
>> application/library has to roll its own solution for every architecture-ABI
>> combo it wants to support.
> 
> But you can get even more standardization out of a userspace library,
> because that can work even on non-linux OS's, as well as versions of
> linux where the new syscall isn't available.
> 

Dealing with old vs new kernels is the same as dealing with old vs new libs.

In any case, what you have suggested above has already been suggested before
and I have accepted everyone's opinion. Please see my response to Florian's email.

>>
>> Furthermore, if this work gets accepted, I plan to add a glibc wrapper for
>> the kernel API. The glibc API would look something like this:
>>
>> 	Allocate a trampoline
>> 	---------------------
>>
>> 	tramp = alloc_tramp();
>>
>> 	Set trampoline parameters
>> 	-------------------------
>>
>> 	init_tramp(tramp, code, data);
>>
>> 	Free the trampoline
>> 	-------------------
>>
>> 	free_tramp(tramp);
>>
>> glibc will allocate and manage the code and data tables, handle kernel API
>> details and manage the trampoline table.
> 
> glibc could do this already if it wants, even without the syscall,
> because this can be done in userspace already.
> 

I am wary of using ABI tricks or calling convention side-effects. However,
since the reviewers feel it is OK, I have accepted that opinion. I have
assumed now that any trick to load the current PC into a GPR can be used
without any risk. I hope that assumption is correct.

>>
>> Secure vs Performant trampoline
>> -------------------------------
>>
>> If you recall, in version 1, I presented a trampoline type that is
>> implemented in the kernel. When an application invokes the trampoline,
>> it traps into the kernel and the kernel performs the work of the trampoline.
>>
>> The disadvantage is that a trip to the kernel is needed. That can be
>> expensive.
>>
>> The advantage is that the kernel can add security checks before doing the
>> work. Mainly, I am looking at checks that might prevent the trampoline
>> from being used in an ROP/BOP chain. Some half-baked ideas:
>>
>> 	- Check that the invocation is at the starting point of the
>> 	  trampoline
>>
>> 	- Check if the trampoline is jumping to an allowed PC
>>
>> 	- Check if the trampoline is being invoked from an allowed
>> 	  calling PC or PC range
>>
>> Allowed PCs can be input using the trampfd API mentioned in version 1.
>> Basically, an array of PCs is written into trampfd.
> 
> The source PC will generally not be available if the compiler decided to
> tail-call optimize the call to the trampoline into a jump.
> 

This is still work in progress. But I am thinking that labels can be used.
So, if the code is:

	invoke_tramp:
		(*tramp)();

then, invoke_tramp can be supplied as the calling PC.

Similarly, labels can be used in assembly functions as well.

Like I said, I have to think about this more.

> What's special about these trampolines anyway? Any indirect function
> call could have these same problems -- an attacker could have
> overwritten the pointer the same way, whether it's supposed to point to
> a normal function or it is the target of this trampoline.
> 
> For making them a bit safer, userspace could just map the page holding
> the data pointers/destination address(es) as read-only after
> initialization.
> 

You need to look at version 1 of trampfd for how to do "allowed PCs".
As an example, libffi defines ABI handlers for every arch-ABI combo.
These ABI handler pointers could be placed in an array in .rodata.
Then, the array can be written into trampfd for setting allowed PCs.
When the target PC is set for a trampoline, the kernel will check
it against allowed PCs and reject it if it is not among them, e.g.
because it has been overwritten.
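
As a rough userspace model of that check (the real one would run in the
kernel against the array written into trampfd), assuming hypothetical
handler names that merely stand in for libffi's per-ABI handlers:

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Stand-ins for per-ABI handlers; names and bodies are hypothetical. */
static int handler_sysv(void) { return 1; }
static int handler_win64(void) { return 2; }
static int not_a_handler(void) { return 3; }

/* const, so the toolchain can place it in read-only data. */
static const handler_fn allowed_pcs[] = { handler_sysv, handler_win64 };

/* Return 1 if pc is one of the allowed targets, 0 otherwise. */
static int pc_is_allowed(handler_fn pc)
{
	size_t n = sizeof(allowed_pcs) / sizeof(allowed_pcs[0]);

	for (size_t i = 0; i < n; i++)
		if (allowed_pcs[i] == pc)
			return 1;
	return 0;
}
```

With this model, pc_is_allowed(handler_sysv) is accepted and
pc_is_allowed(not_a_handler) is rejected, which is the behavior described
for a target PC that has been overwritten with an unlisted address.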

>>
>> Suggestions for other checks are most welcome!
>>
>> I would like to implement an option in the trampfd API. The user can
>> choose a secure trampoline or a performant trampoline. For a performant
>> trampoline, the kernel will generate the code. For a secure trampoline,
>> the kernel will do the work itself.
>>
>> In order to address the FFI_REGISTER ABI in libffi, we could use the secure
>> trampoline. In FFI_REGISTER, the data is pushed on the stack and the code
>> is jumped to without using any registers.
>>
>> As outlined in version 1, the kernel can push the data address on the stack
>> and write the code address into the PC and return to userland.
>>
>> For doing all of this, we need trampfd.
> 
> We don't need this for FFI_REGISTER. I presented a solution that works
> in userspace. Even if you want to use a trampoline created by the
> kernel, there's no reason it needs to trap into the kernel at trampoline
> execution time. libffi's trampolines already handle this case today.
> 

libffi handles this using user level dynamic code which needs to be executed.
If the security subsystem prevents that, then the dynamic code cannot execute.
That is the whole point of this RFC.

>>
>> Permitting the use of trampfd
>> -----------------------------
>>
>> An "exectramp" setting can be implemented in SELinux to selectively allow the
>> use of trampfd for applications.
>>
>> Madhavan
> 
> Applications can use their own userspace trampolines regardless of this
> setting, so it doesn't provide any additional security benefit by
> preventing usage of trampfd.
> 

The background for all of this is that dynamic code such as trampolines
needs to be placed in a page with executable permissions so it can
execute. If security measures such as W^X are present, this will not
be possible. Admittedly, today some user level tricks exist to get around
W^X. I have alluded to those. IMO, they are all security holes and will
get plugged sooner or later. Then, these trampolines cannot execute.
Currently, there exist security exceptions such as execmem to let them
execute. But we would like to do it without making security exceptions.

Madhavan

* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-25 22:44                   ` Madhavan T. Venkataraman
@ 2020-09-26 15:55                     ` Arvind Sankar
  2020-09-27 17:59                       ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 50+ messages in thread
From: Arvind Sankar @ 2020-09-26 15:55 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Arvind Sankar, Florian Weimer, kernel-hardening, linux-api,
	linux-arm-kernel, linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

On Fri, Sep 25, 2020 at 05:44:56PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 9/24/20 6:43 PM, Arvind Sankar wrote:
> > 
> > The source PC will generally not be available if the compiler decided to
> > tail-call optimize the call to the trampoline into a jump.
> > 
> 
> This is still work in progress. But I am thinking that labels can be used.
> So, if the code is:
> 
> 	invoke_tramp:
> 		(*tramp)();
> 
> then, invoke_tramp can be supplied as the calling PC.
> 
> Similarly, labels can be used in assembly functions as well.
> 
> Like I said, I have to think about this more.

What I mean is that the kernel won't have access to the actual source
PC. If I followed your v1 correctly, it works by making any branch to
the trampoline code trigger a page fault. At this point, the PC has
already been updated to the trampoline entry, so the only thing the
fault handler can know is the return address on the top of the stack,
which (a) might not be where the branch actually originated, either
because it was a jump, or you've already been hacked and you got here
using a ret; (b) is available to userspace anyway.

> 
> > What's special about these trampolines anyway? Any indirect function
> > call could have these same problems -- an attacker could have
> > overwritten the pointer the same way, whether it's supposed to point to
> > a normal function or it is the target of this trampoline.
> > 
> > For making them a bit safer, userspace could just map the page holding
> > the data pointers/destination address(es) as read-only after
> > initialization.
> > 
> 
> You need to look at version 1 of trampfd for how to do "allowed PCs".
> As an example, libffi defines ABI handlers for every arch-ABI combo.
> These ABI handler pointers could be placed in an array in .rodata.
> Then, the array can be written into trampfd for setting allowed PCs.
> When the target PC is set for a trampoline, the kernel will check
> it against allowed PCs and reject it if it has been overwritten.

I'm not asking how it's implemented. I'm asking what's the point? On a
typical linux system, at least on x86, every library function call is an
indirect branch. The protection they get is that the dynamic linker can
map the pointer table read-only after initializing it.

For the RO mapping, libffi could be mapping both the entire closure
structure, as well as the structure that describes the arguments and
return types of the function, read-only once they are initialized.
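
As a rough illustration of the read-only mapping idea, here is a minimal C
sketch, assuming a Linux/POSIX system with mmap() and mprotect(). The
struct closure layout and the closure_create_ro()/closure_invoke() helpers
are made up for this example and are not libffi's actual types or API.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical closure record; not libffi's ffi_closure layout. */
struct closure {
	long (*handler)(long);	/* code target */
	long arg;		/* per-closure data */
};

/* Example handler used to exercise the closure. */
static long twice(long x)
{
	return 2 * x;
}

/*
 * Allocate one page, fill in the closure while the page is writable,
 * then drop write permission. Returns NULL on failure.
 */
static struct closure *closure_create_ro(long (*handler)(long), long arg)
{
	long page = sysconf(_SC_PAGESIZE);
	struct closure *c;

	if (page < 0 || (size_t)page < sizeof(*c))
		return NULL;

	c = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (c == MAP_FAILED)
		return NULL;

	c->handler = handler;
	c->arg = arg;

	/* From here on, any write to *c faults. */
	if (mprotect(c, (size_t)page, PROT_READ) != 0) {
		munmap(c, (size_t)page);
		return NULL;
	}
	return c;
}

/* Reading the closure and calling through it still works after mprotect(). */
static long closure_invoke(const struct closure *c)
{
	return c->handler(c->arg);
}
```

The page is still writable between mmap() and mprotect(), so this only
protects the closure after initialization.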

For libffi, there are three indirect branches for every trampoline call
with your suggested trampoline: one to get to the trampoline, one to
jump to the handler, and one to call the actual user function. If we are
particularly concerned about the trampoline to handler branch for some
reason, we could just replace it with a direct branch: if the kernel was
generating the code, there's no reason to allow the data pointer or code
target to be changed after the trampoline was created. It can just
hard-code them in the generated code and be done with it. Even with
user-space trampolines, you can use a direct call. All you need is
libffi-trampoline.so which contains a few thousand trampolines all
jumping to one handler, which then decides what to do based on which
trampoline was called. Sure libffi currently dispatches to one of 2-3
handlers based on the ABI, but there's no technical reason it couldn't
dispatch to just one that handled all the ABIs, and the trampoline could
be boiled down to just:
	endbr
	call handler
	ret
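
A portable C approximation of that pool of static trampolines, as a sketch
only. In the assembly version the handler would work out which trampoline
was called from the return address pushed by the call; here each stub
passes its own index as a constant instead. All names (tramp_slot,
tramp_alloc(), add_base() and so on) are invented for the example.

```c
#include <stddef.h>

#define NR_TRAMPS 4	/* a real pool would hold a few thousand */

/* Hypothetical per-trampoline record, filled in when a closure is bound. */
struct tramp_slot {
	long (*user_fn)(void *data, long arg);
	void *data;
};

static struct tramp_slot slots[NR_TRAMPS];

/* The single handler: picks the target from the slot index alone. */
static long handler(size_t idx, long arg)
{
	const struct tramp_slot *s = &slots[idx];

	return s->user_fn ? s->user_fn(s->data, arg) : -1;
}

/* Each stub is fixed at build time and makes a direct call to handler(). */
#define DEFINE_TRAMP(n) \
	static long tramp_##n(long arg) { return handler(n, arg); }

DEFINE_TRAMP(0)
DEFINE_TRAMP(1)
DEFINE_TRAMP(2)
DEFINE_TRAMP(3)

typedef long (*tramp_fn)(long);

static const tramp_fn tramps[NR_TRAMPS] = {
	tramp_0, tramp_1, tramp_2, tramp_3,
};

/* Bind an unused stub to (fn, data); returns NULL if the pool is full. */
static tramp_fn tramp_alloc(long (*fn)(void *, long), void *data)
{
	for (size_t i = 0; i < NR_TRAMPS; i++) {
		if (!slots[i].user_fn) {
			slots[i].user_fn = fn;
			slots[i].data = data;
			return tramps[i];
		}
	}
	return NULL;
}

/* Example user function: adds the closure's data to the argument. */
static long add_base(void *data, long arg)
{
	return *(long *)data + arg;
}
```

No code is generated at run time here: only the slots[] data changes, and
the stubs live in the ordinary read-only, executable text of the library.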

> >>
> >> In order to address the FFI_REGISTER ABI in libffi, we could use the secure
> >> trampoline. In FFI_REGISTER, the data is pushed on the stack and the code
> >> is jumped to without using any registers.
> >>
> >> As outlined in version 1, the kernel can push the data address on the stack
> >> and write the code address into the PC and return to userland.
> >>
> >> For doing all of this, we need trampfd.
> > 
> > We don't need this for FFI_REGISTER. I presented a solution that works
> > in userspace. Even if you want to use a trampoline created by the
> > kernel, there's no reason it needs to trap into the kernel at trampoline
> > execution time. libffi's trampolines already handle this case today.
> > 
> 
> libffi handles this using user level dynamic code which needs to be executed.
> If the security subsystem prevents that, then the dynamic code cannot execute.
> That is the whole point of this RFC.

/If/ you are using a trampoline created by the kernel, it can just
create the one that libffi is using today; which doesn't need trapping
into the kernel at execution time.

And if you aren't, you can use the trampoline I wrote, which has no
dynamic code, and doesn't need to trap into the kernel at execution time
either.

> 
> >>
> >> Permitting the use of trampfd
> >> -----------------------------
> >>
> >> An "exectramp" setting can be implemented in SELinux to selectively allow the
> >> use of trampfd for applications.
> >>
> >> Madhavan
> > 
> > Applications can use their own userspace trampolines regardless of this
> > setting, so it doesn't provide any additional security benefit by
> > preventing usage of trampfd.
> > 
> 
> The background for all of this is that dynamic code such as trampolines
> need to be placed in a page with executable permissions so they can
> execute. If security measures such as W^X are present, this will not
> be possible. Admitted, today some user level tricks exist to get around
> W^X. I have alluded to those. IMO, they are all security holes and will
> get plugged sooner or later. Then, these trampolines cannot execute.
> Currently, there exist security exceptions such as execmem to let them
> execute. But we would like to do it without making security exceptions.
> 
> Madhavan

How can you still say this after this whole discussion? Applications can
get the exact same functionality as your proposed trampfd using static
code, no W^X tricks needed.

This only matters if you have a trampfd that generates _truly_ dynamic
code, not just code that can be trivially made static.


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-26 15:55                     ` Arvind Sankar
@ 2020-09-27 17:59                       ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-27 17:59 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Florian Weimer, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel



On 9/26/20 10:55 AM, Arvind Sankar wrote:
> On Fri, Sep 25, 2020 at 05:44:56PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 9/24/20 6:43 PM, Arvind Sankar wrote:
>>>
>>> The source PC will generally not be available if the compiler decided to
>>> tail-call optimize the call to the trampoline into a jump.
>>>
>>
>> This is still work in progress. But I am thinking that labels can be used.
>> So, if the code is:
>>
>> 	invoke_tramp:
>> 		(*tramp)();
>>
>> then, invoke_tramp can be supplied as the calling PC.
>>
>> Similarly, labels can be used in assembly functions as well.
>>
>> Like I said, I have to think about this more.
> 
> What I mean is that the kernel won't have access to the actual source
> PC. If I followed your v1 correctly, it works by making any branch to
> the trampoline code trigger a page fault. At this point, the PC has
> already been updated to the trampoline entry, so the only thing the
> fault handler can know is the return address on the top of the stack,
> which (a) might not be where the branch actually originated, either
> because it was a jump, or you've already been hacked and you got here
> using a ret; (b) is available to userspace anyway.

Like I said, this is work in progress. I have to spend time to figure out
how this would work or if this would work. So, let us brainstorm this
a little bit.

There are two ways to invoke the trampoline:

(1) By just branching to the trampoline address.

(2) Or, by treating the address as a function pointer and calling it.
    In the libffi case, it is (2).

If it is (2), it is easier. We can figure out the return address of the
call which would be the location after the call instruction.

If it is (1), it is harder as you point out. So, we can support this
at least for (2). The user can inform trampfd as to the type of
invocation for the trampoline.

For (1), the return address would be that of the call to the function
that contains the branch. If the kernel can get that call instruction
and figure out the function address, then we can do something.

I admit this is a bit hairy at the moment. I have to work it out.
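
For case (2), the value in question is one that a called function can
already read for itself. A minimal sketch, assuming GCC or Clang
(__builtin_return_address() is a compiler builtin; tramp_entry() and
probe_call_sites() are names invented for this example):

```c
#include <stdint.h>

/* Last return address seen by tramp_entry(); volatile so stores are kept. */
static volatile uintptr_t last_caller;

/* Stand-in for the trampoline target. noinline keeps it a real call. */
__attribute__((noinline))
static void tramp_entry(void)
{
	/* GCC/Clang builtin: the address the current function returns to. */
	last_caller = (uintptr_t)__builtin_return_address(0);
}

/* Call through a function pointer from two distinct call sites. */
static void probe_call_sites(uintptr_t pcs[2])
{
	/* volatile pointer, so the compiler cannot turn this into a direct call */
	void (*volatile tramp)(void) = tramp_entry;

	tramp();
	pcs[0] = last_caller;	/* location just after the first call */
	tramp();
	pcs[1] = last_caller;	/* location just after the second call */
}
```

If the compiler turns the call into a jump (a tail call), the value
reported is the return address of the enclosing function's caller
instead, which is case (1).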

> 
>>
>>> What's special about these trampolines anyway? Any indirect function
>>> call could have these same problems -- an attacker could have
>>> overwritten the pointer the same way, whether it's supposed to point to
>>> a normal function or it is the target of this trampoline.
>>>
>>> For making them a bit safer, userspace could just map the page holding
>>> the data pointers/destination address(es) as read-only after
>>> initialization.
>>>
>>
>> You need to look at version 1 of trampfd for how to do "allowed pcs".
>> As an example, libffi defines ABI handlers for every arch-ABI combo.
>> These ABI handler pointers could be placed in an array in .rodata.
>> Then, the array can be written into trampfd for setting allowed PCS.
>> When the target PC is set for a trampoline, the kernel will check
>> it against allowed PCs and reject it if it has been overwritten.
> 
> I'm not asking how it's implemented. I'm asking what's the point? On a
> typical linux system, at least on x86, every library function call is an
> indirect branch. The protection they get is that the dynamic linker can
> map the pointer table read-only after initializing it.
> 

The security subsystem is concerned about dynamic code, not the indirect
branches set up for dynamic linking.


> For the RO mapping, libffi could be mapping both the entire closure
> structure, as well as the structure that describes the arguments and
> return types of the function, read-only once they are initialized.
> 

This has been suggested in some form before. The general problem with
this approach is that while the page is still writable, an attacker
can potentially inject code. Making the page read-only after the
fact may not help. It may work in specific use cases, but it is
not OK as a general approach to solving this problem.

> For libffi, there are three indirect branches for every trampoline call
> with your suggested trampoline: one to get to the trampoline, one to
> jump to the handler, and one to call the actual user function. If we are
> particularly concerned about the trampoline to handler branch for some
> reason, we could just replace it with a direct branch: if the kernel was
> generating the code, there's no reason to allow the data pointer or code
> target to be changed after the trampoline was created. It can just
> hard-code them in the generated code and be done with it. Even with
> user-space trampolines, you can use a direct call. All you need is
> libffi-trampoline.so which contains a few thousand trampolines all
> jumping to one handler, which then decides what to do based on which
> trampoline was called. Sure libffi currently dispatches to one of 2-3
> handlers based on the ABI, but there's no technical reason it couldn't
> dispatch to just one that handled all the ABIs, and the trampoline could
> be boiled down to just:
> 	endbr
> 	call handler
> 	ret
> 
One still needs this trampoline:

	load closure in some register
	jump to single_handler

In the kernel based solution, the user would specify to the kernel the
target PC in a code context.

	pwrite(trampfd, code_context, size, CODE_OFFSET);

code_context itself can be hacked unless it is in .rodata. The allowed_pcs
thing exists for apps/libs that are unable or unwilling to place code_context
in .rodata.
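
The allowed-PCs check can be sketched in user space as follows. This only
illustrates the idea and is not the trampfd implementation: struct tramp,
tramp_set_target() and the handler names are hypothetical, and the real
check would be done by the kernel when the code context is written.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*abi_handler_t)(void);

/* Stand-ins for per-ABI handlers; the bodies only make them distinct. */
static int handler_sysv(void)  { return 1; }
static int handler_win64(void) { return 2; }
static int rogue_target(void)  { return 3; }

/*
 * const at file scope: read-only once the binary is loaded (.rodata or
 * .data.rel.ro, depending on how the binary is linked).
 */
static const abi_handler_t allowed_pcs[] = {
	handler_sysv,
	handler_win64,
};

/* True if target is one of the registered handler addresses. */
static bool pc_is_allowed(abi_handler_t target)
{
	for (size_t i = 0; i < sizeof(allowed_pcs) / sizeof(allowed_pcs[0]); i++) {
		if (allowed_pcs[i] == target)
			return true;
	}
	return false;
}

/* Hypothetical trampoline state holding only the code target. */
struct tramp {
	abi_handler_t target;
};

/* Set the target PC, rejecting anything outside allowed_pcs. */
static int tramp_set_target(struct tramp *t, abi_handler_t target)
{
	if (!pc_is_allowed(target))
		return -1;
	t->target = target;
	return 0;
}
```

A rejected target leaves the previously set one in place, so an
overwritten pointer cannot redirect an existing trampoline.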

I would like to not just focus on how to solve things for libffi alone.


>>>>
>>>> In order to address the FFI_REGISTER ABI in libffi, we could use the secure
>>>> trampoline. In FFI_REGISTER, the data is pushed on the stack and the code
>>>> is jumped to without using any registers.
>>>>
>>>> As outlined in version 1, the kernel can push the data address on the stack
>>>> and write the code address into the PC and return to userland.
>>>>
>>>> For doing all of this, we need trampfd.
>>>
>>> We don't need this for FFI_REGISTER. I presented a solution that works
>>> in userspace. Even if you want to use a trampoline created by the
>>> kernel, there's no reason it needs to trap into the kernel at trampoline
>>> execution time. libffi's trampolines already handle this case today.
>>>
>>
>> libffi handles this using user level dynamic code which needs to be executed.
>> If the security subsystem prevents that, then the dynamic code cannot execute.
>> That is the whole point of this RFC.
> 
> /If/ you are using a trampoline created by the kernel, it can just
> create the one that libffi is using today; which doesn't need trapping
> into the kernel at execution time.
> 
> And if you aren't, you can use the trampoline I wrote, which has no
> dynamic code, and doesn't need to trap into the kernel at execution time
> either.
> 

The kernel based solution gives you the opportunity to make additional
security checks at the time a trampoline is invoked. A purely user level
solution cannot do that. E.g., I would like to prevent even the minimal
trampoline from being used in BOP/ROP chains.

>>
>>>>
>>>> Permitting the use of trampfd
>>>> -----------------------------
>>>>
>>>> An "exectramp" setting can be implemented in SELinux to selectively allow the
>>>> use of trampfd for applications.
>>>>
>>>> Madhavan
>>>
>>> Applications can use their own userspace trampolines regardless of this
>>> setting, so it doesn't provide any additional security benefit by
>>> preventing usage of trampfd.
>>>
>>
>> The background for all of this is that dynamic code such as trampolines
>> need to be placed in a page with executable permissions so they can
>> execute. If security measures such as W^X are present, this will not
>> be possible. Admitted, today some user level tricks exist to get around
>> W^X. I have alluded to those. IMO, they are all security holes and will
>> get plugged sooner or later. Then, these trampolines cannot execute.
>> Currently, there exist security exceptions such as execmem to let them
>> execute. But we would like to do it without making security exceptions.
>>
>> Madhavan
> 
> How can you still say this after this whole discussion? Applications can
> get the exact same functionality as your proposed trampfd using static
> code, no W^X tricks needed.
> 
> This only matters if you have a trampfd that generates _truly_ dynamic
> code, not just code that can be trivially made static.
> 

How can *you* still say this after all this discussion?

I have already explained all of this. The trivial bootstrap trampoline can
be provided in a user library as well as in the kernel. The user land solution
provides a fast trampoline that does the job. The kernel solution
is slower but allows for additional security checks that a user land
solution does not allow. IMO, the type of trampoline to use should be the
user's choice.

And this is not just about libffi, where we could somehow do this within
libffi itself. I would like to provide something that the maintainers of
other dynamic code can use to convert their dynamic code to static code
when that code is a lot more complex than the libffi trampoline.

I am already willing to implement a user land only solution. I don't see
the problem.

Madhavan


* Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor
  2020-09-25 22:22                   ` Madhavan T. Venkataraman
@ 2020-09-27 18:25                     ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 50+ messages in thread
From: Madhavan T. Venkataraman @ 2020-09-27 18:25 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Arvind Sankar, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86, libffi-discuss, luto,
	David.Laight, mark.rutland, mic, pavel

Before I implement the user land solution recommended by reviewers, I just want
an opinion on where the code should reside.

I am thinking glibc. The other choice would be a separate library, say, libtramp.
What do you recommend?

Madhavan


Thread overview: 50+ messages
     [not found] <210d7cd762d5307c2aa1676705b392bd445f1baa>
2020-09-16 15:08 ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor madvenka
2020-09-16 15:08   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
2020-09-16 15:08   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
2020-09-16 15:08   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
2020-09-16 15:08   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
2020-09-17  1:04   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Florian Weimer
2020-09-17 15:36     ` Madhavan T. Venkataraman
2020-09-17 15:57       ` Madhavan T. Venkataraman
2020-09-17 16:01         ` Florian Weimer
2020-09-23  1:46       ` Arvind Sankar
2020-09-23  9:11         ` Arvind Sankar
2020-09-23 19:17           ` Madhavan T. Venkataraman
2020-09-23 19:51             ` Arvind Sankar
2020-09-23 23:51               ` Madhavan T. Venkataraman
2020-09-24 20:23               ` Madhavan T. Venkataraman
2020-09-24 20:52                 ` Florian Weimer
2020-09-25 22:22                   ` Madhavan T. Venkataraman
2020-09-27 18:25                     ` Madhavan T. Venkataraman
2020-09-24 22:13                 ` Pavel Machek
2020-09-24 23:43                 ` Arvind Sankar
2020-09-25 22:44                   ` Madhavan T. Venkataraman
2020-09-26 15:55                     ` Arvind Sankar
2020-09-27 17:59                       ` Madhavan T. Venkataraman
2020-09-22 21:53 ` madvenka
2020-09-22 21:53   ` [PATCH v2 1/4] [RFC] fs/trampfd: Implement the trampoline file descriptor API madvenka
2020-09-22 21:53   ` [PATCH v2 2/4] [RFC] x86/trampfd: Provide support for the trampoline file descriptor madvenka
2020-09-22 21:53   ` [PATCH v2 3/4] [RFC] arm64/trampfd: " madvenka
2020-09-22 21:53   ` [PATCH v2 4/4] [RFC] arm/trampfd: " madvenka
2020-09-22 21:54   ` [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor Madhavan T. Venkataraman
2020-09-23  8:14   ` Pavel Machek
2020-09-23  9:14     ` Solar Designer
2020-09-23 14:11       ` Solar Designer
2020-09-23 15:18         ` Pavel Machek
2020-09-23 18:00           ` Solar Designer
2020-09-23 18:21             ` Solar Designer
2020-09-23 14:39       ` Florian Weimer
2020-09-23 18:09         ` Andy Lutomirski
2020-09-23 18:11         ` Solar Designer
2020-09-23 18:49           ` Arvind Sankar
2020-09-23 23:53         ` Madhavan T. Venkataraman
2020-09-23 19:41       ` Madhavan T. Venkataraman
2020-09-23 18:10     ` James Morris
2020-09-23 18:32     ` Madhavan T. Venkataraman
2020-09-23  8:42   ` Pavel Machek
2020-09-23 18:56     ` Madhavan T. Venkataraman
2020-09-23 20:51       ` Pavel Machek
2020-09-23 23:04         ` Madhavan T. Venkataraman
2020-09-24 16:44         ` Mickaël Salaün
2020-09-24 22:05           ` Pavel Machek
2020-09-25 10:12             ` Mickaël Salaün
